AsoSoft text corpus

AsoSoft text corpus

The AsoSoft text corpus is the first large-scale Kurdish text corpus, collected and processed by the AsoSoft research and development group. It contains 458,000 documents (188 million tokens) that are collected from sources such as websites, news agencies, books, and magazines. The corpus is partially tagged by topic, so it can be used for topic identification tasks. Also, it is applicable for extracting language model and computational lexicon information. Part of the corpus (75 million tokens) is available online for non-commercial use. The corpus uses the TEI format.

Film recorder

A film recorder is a graphical output device for transferring images to photographic film from a digital source. In a typical film recorder, an image is passed from a host computer to a mechanism to expose film through a variety of methods, historically by direct photography of a high-resolution cathode-ray tube (CRT) display. The exposed film can then be developed using conventional developing techniques, and displayed with a slide or motion picture projector. The use of film recorders predates the current use of digital projectors, which eliminate the time and cost involved in the intermediate step of transferring computer images to film stock, instead directly displaying the image signal from a computer. Motion picture film scanners are the opposite of film recorders, copying content from film stock to a computer system. Film recorders can be thought of as modern versions of kinescopes. == Design == === Operation === All film recorders typically work in the same manner. The image is fed from a host computer as a raster stream over a digital interface. A film recorder exposes film through various mechanisms; flying spot (early recorders); photographing a high resolution video monitor; electron beam recorder (Sony HDVS); a CRT scanning dot (Celco); focused beam of light from a light valve technology (LVT) recorder; a scanning laser beam (Arrilaser); or recently, full-frame LCD array chips. For color image recording on a CRT film recorder, the red, green, and blue channels are sequentially displayed on a single gray scale CRT, and exposed to the same piece of film as a multiple exposure through a filter of the appropriate color. This approach yields better resolution and color quality than possible with a tri-phosphor color CRT. The three filters are usually mounted on a motor-driven wheel. The filter wheel, as well as the camera's shutter, aperture, and film motion mechanism are usually controlled by the recorder's electronics and/or the driving software. CRT film recorders are further divided into analog and digital types. The analog film recorder uses the native video signal from the computer, while the digital type uses a separate display board in the computer to produce a digital signal for a display in the recorder. Digital CRT recorders provide a higher resolution at a higher cost compared to analog recorders due to the additional specialized hardware. Typical resolutions for digital recorders were quoted as 2K and 4K, referring to 2048×1366 and 4096×2732 pixels, respectively, while analog recorders provided a resolution of 640×428 pixels in comparison. Higher-quality LVT film recorders use a focused beam of light to write the image directly onto a film loaded spinning drum, one pixel at a time. In one example, the light valve was a liquid-crystal shutter, the light beam was steered with a lens, and text was printed using a pre-cut optical mask. The LVT will record pixel beyond grain. Some machines can burn 120-res or 120 lines per millimeter. The LVT is basically a reverse drum scanner. The exposed film is developed and printed by regular photographic chemical processing. === Formats === Film recorders are available for a variety of film types and formats. The 35 mm negative film and transparencies are popular because they can be processed by any photo shop. Single-image 4×5 film and 8×10 are often used for high-quality, large format printing. Some models have detachable film holders to handle multiple formats with the same camera or with Polaroid backs to provide on-site review of output before exposing film. == Uses == Film recorders are used in digital printing to generate master negatives for offset and other bulk printing processes. For preview, archiving, and small-volume reproduction, film recorders have been rendered obsolete by modern printers that produce photographic-quality hardcopies directly on plain paper. They are also used to produce the master copies of movies that use computer animation or other special effects based on digital image processing. However, most cinemas nowadays use Digital Cinema Packages on hard drives instead of film stock. === Computer graphics === Film recorders were among the earliest computer graphics output devices; for example, the IBM 740 CRT Recorder was announced in 1954. Film recorders were also commonly used to produce slides for slide projectors; but this need is now largely met by video projectors that project images directly from a computer to a screen. The terms "slide" and "slide deck" are still commonly used in presentation programs. === Current uses === Currently, film recorders are primarily used in the motion picture film-out process for the ever increasing amount of digital intermediate work being done. Although significant advances in large venue video projection alleviates the need to output to film, there remains a deadlock between the motion picture studios and theater owners over who should pay for the cost of these very costly projection systems. This, combined with the increase in international and independent film production, will keep the demand for film recording steady for at least a decade. == Key manufacturers == Traditional film recorder manufacturers have all but vanished from the scene or have evolved their product lines to cater to the motion picture industry. Dicomed was one such early provider of digital color film recorders. Polaroid, Management Graphics, Inc, MacDonald-Detwiler, Information International, Inc., and Agfa were other producers of film recorders. Arri is the only current major manufacturer of film recorders. Kodak Lightning I film recorder. One of the first laser recorders. Needed an engineering staff to set up. Kodak Lightning II film recorder used both gas and diode laser to record on to film. The last LVT machines produced by Kodak / Durst-Dice stopped production in 2002. There are no LVT film recorders currently being produced. LVT Saturn 1010 uses a LED exposure (RGB) to 8"x10" film at 1000-3000ppi. LUX Laser Cinema Recorder from Autologic/Information International in Thousand Oaks, California. Sales end in March 2000. Used on the 1997 film “Titanic”. Arri produces the Arrilaser line of laser-based motion picture film recorders. MGI produced the Solitaire line of CRT-based motion picture film recorders. Matrix, originally ImaPRO, a branch of Agfa Division, produced the QCR line of CRT-based motion picture film recorders. CCG, formerly Agfa film recorders, has been a steady manufacturer of film recorders based in Germany. In 2004 CCG introduced Definity, a motion picture film recorder utilizing LCD technology. In 2010 CCG introduced the first full LED LCD film recorder as a new step in film recording. Cinevator was made by Cinevation AS, in Drammen, Norway. The Cinevator was a real-time digital film recorder. It could record IN, IP and prints with and without sound Oxberry produced the Model 3100 film recorder camera system, with interchangeable pin-registered movements (shuttles) for 35 mm (full frame/Silent, 1.33:1) and 16 mm (regular 16, "2R"), and others have adapted the Oxberry movements for CinemaScope, 1.85:1, 1.75:1, 1.66:1, as well as Academy/Sound (1.37:1) in 35 mm and Super-16 in 16 mm ("1R"). For instance, the "Solitaire" and numerous others employed the Oxberry 3100 camera system. == History == Before video tape recorders or VTRs were invented, TV shows were either broadcast live or recorded to film for later showing, using the kinescope process. In 1967, CBS Laboratories introduced the Electronic Video Recording format, which used video and telecined-to-video film sources, which were then recorded with an electron-beam recorder at CBS' EVR mastering plant at the time to 35mm film stock in a rank of 4 strips on the film, which was then slit down to 4 8.75 mm (0.344 in) film copies, for playback in an EVR player. All types of CRT recorders were (and still are) used for film recording. Some early examples used for computer-output recording were the 1954 IBM 740 CRT Recorder, and the 1962 Stromberg-Carlson SC-4020, the latter using a Charactron CRT for text and vector graphic output to either 16 mm motion picture film, 16 mm microfilm, or hard-copy paper output. Later 1970 and 80s-era recording to B&W (and color, with 3 separate exposures for red, green, and blue)) 16 mm film was done with an EBR (Electron Beam Recorder), the most prominent examples made by 3M), for both video and COM (Computer Output Microfilm) applications. Image Transform in Universal City, California used specially modified 3M EBR film recorders that could perform color film-out recording on 16 mm by exposing three 16 mm frames in a row (one red, one green and one blue). The film was then printed to color 16 mm or 35 mm film. The video fed to the recorder could either be NTSC, PAL or SECAM. Later, Image Transform used specially modified VTRs to record 24 frame for their "Image Vision" system. The modified 1 inch type B videotape VTRs would record

Comparison of machine translation applications

Machine translation is an algorithm which attempts to translate text or speech from one natural language to another. == General information == Basic general information for popular machine translation applications. == Languages features comparison == The following table compares the number of languages which the following machine translation programs can translate between. (Moses and Moses for Mere Mortals allow you to train translation models for any language pair, though collections of translated texts (parallel corpus) need to be provided by the user. The Moses site provides links to training corpora.) This is not an all-encompassing list. Some applications have many more language pairs than those listed below. This is a general comparison of key languages only. A full and accurate list of language pairs supported by each product should be found on each of the product's websites. === Multi-pair translations === === Paired translations ===

Babak Hodjat

Babak Hodjat (Persian: بابک حجت; born November 1, 1967) is a British computer scientist, entrepreneur, and writer. He was the co-founder and CEO of Sentient Technologies and now holds the position of Chief Technology Officer AI at Cognizant. He is a specialist in the field of artificial intelligence and machine learning. In 1998 Hodjat co-founded Dejima Inc and served as CEO and CTO, his patented work on artificial intelligence led to the technology used by Apple for their digital assistant Siri. == Biography == === Early life === Babak Hodjat was born on November 1, 1967, in Wimbledon. His father was a retired university professor in entomology who worked at the British Museum. As a child, he did not like insects and would wander off to the nearby science museum, where he would spend long hours in front of a computer they had on display. He attended middle school in the United States. He studied at the Sharif University of Technology from 1986 to 1995, and received his Master of Science degree in software engineering. In 1994, together with another computer department student Hormoz Shahrzad presented their research titled Introducing a dynamic problem solving scheme based on a learning algorithm in artificial life environments at the first IEEE Conference on Computational Intelligence held at Orlando. Hodjat received a PhD in machine intelligence from Kyushu University in 2003 During his time there, he published several works on adaptive agent oriented software architecture and natural language user interfaces. === Career in science and business === Hodjat moved to Silicon Valley, California in 1998 and founded Dejima Inc. (named after the historic Japanese Dejima artificial island). The firm was based on a patented adaptive agent-oriented software engineering platform developed by Hodjat, Christopher Savoie and Makoto Amamiya. Hodjat served as the CTO and as the CEO for 9 months from October 2000. By 2000 the company had offices in San Jose, London and Tokyo. In 2002, the company developed a voice control Natural Interaction Platform (NPI) in collaboration with the Stanford University's research group Archimedes Project. During these years Hodjat continued his research on agent oriented software architecture and natural language user interfaces. In July 2003, Dejima got funding from SRI International within the Cognitive Assistant that Learns and Organizes (CALO) project of DARPA and worked on a Perceptive Assistant that Learns (PAL) initiative. Hodjat was the primary inventor of the firm's agent-oriented technology applied to intelligent interfaces for mobile and enterprise computing – a technology that eventually led to Siri. In April 2004, Dejima was acquired by Sybase iAnywhere. Hodjat served as senior director of engineering at Sybase iAnywhere from 2004 to 2008, where he developed AvantGo Platform, mBusiness Anywhere, and Answers Anywhere. In 2006, he co-founded MobileVerbs Inc., a mobile marketing service company, which was acquired by iLoop Mobile in February 2010. In 2007, he teamed with Antoine Blondeau (former CEO of Dejima) and Adam Cheyer (Dejima's vice president and Chief Architect of the CALO project) to establish Genetic Finance Holding Ltd. (where he began as CTO). In 2014 the firm became Sentient Technologies. Hodjat was joined by his long-time research fellow Hormoz Shahrzad who became principal scientist, while Hodjat held the position of chief scientist. In the following years Hodjat has worked on developing massively distributed computing technology and improving machine-learning technique known as evolutionary algorithms. One area that gained special attention from the press was applying Sentient Technologies algorithms to a stock market trading through specially created Sentient Investment Management hedge fund. Following the management change within Sentient Technologies, Hodjat became the company's CEO in February 2017. He continues his business and educational projects (he was on the jury of IBM Watson AI XPRIZE and the Merit Awards committee for the ISAL Award). == Writing == Hodjat is the author of multiple books such as The Konar and the Apple: Fun, Beauty, and Dread--From Ahwaz to California and the science fiction novel "The Narrator" (January 2022; ISBN 978-1-7354860-1-7)(March 2023; ISBN 978-1-7354860-0-0). == Selected publications == Hodjat, B.; Shahrzad, H. (1994). "Introducing a dynamic problem solving scheme based on a learning algorithm in artificial life environments". IEEE International Joint Conference on neural networks (IJCNN-94). Vol. 4. IEEE International Joint Conference on neural networks. pp. 2333–2338. doi:10.1109/ICNN.1994.374583. ISBN 978-0-7803-1901-1. S2CID 60497133. Hodjat, B.; Savoie, C.J.; Amamiya, M. (2006) [1998]. "An adaptive agent oriented software architecture". PRICAI'98: Topics in Artificial Intelligence. Springer. pp. 33–46. arXiv:cs/9812014. doi:10.1007/BFb0095256. ISBN 978-3-540-49461-4. S2CID 5317786. Hodjat, B.; Amamiya, M. (2000-05-25). "Applying the Adaptive Agent Oriented Software Architecture to the Parsing of Context Sensitive Grammars". IEICE Transactions on Information and Systems. E83-D (5): 1142–1152. ISSN 0916-8532. Retrieved 2017-12-14. Hodjat, Babak; Hodjat, Siamak; Treadgold, Nick; Jonsson, Ing-Marie (2006). "CRUSE: a context reactive natural language mobile interface". Proceedings of the 2nd annual international workshop on Wireless internet. WICON. doi:10.1145/1234161.1234181. ISBN 978-1-59593-510-6. S2CID 2388254. O'Reilly, Una-May; Wagy, Mark; Hodjat, Babak (2013). "Chapter 6: EC-Star: A Massive-Scale, Hub and Spoke, Distributed Genetic Programming System". In Riolo, R.; Vladislavleva, E.; Ritchie, M.; Moore, J.H. (eds.). Genetic Programming Theory and Practice X. Springer-Verlag New York. pp. 73–85. doi:10.1007/978-1-4614-6846-2. ISBN 978-1-4614-6845-5. S2CID 39650969. Retrieved 2017-12-14. Hodjat, Babak; Hemberg, Erik; Shahrzad, Hormoz; O'Reilly, Una-May (2014). "Chapter 4: Maintenance of a Long Running Distributed Genetic Programming System for Solving Problems Requiring Big Data". In Riolo, Rick; Moore, Jason H.; Kotanchek, Mark (eds.). Genetic Programming Theory and Practice XI. Springer-Verlag New York. pp. 65–83. doi:10.1007/978-1-4939-0375-7. ISBN 978-1-4939-0374-0. S2CID 28843739. Retrieved 2017-12-14. Shahrzad, Hormoz; Hodjat, Babak; Miikkulainen, Risto (2016). "Estimating the Advantage of Age-Layering in Evolutionary Algorithms". Proceedings of the Genetic and Evolutionary Computation Conference 2016. Genetic and Evolutionary Computation Conference. pp. 693–699. doi:10.1145/2908812.2908911. ISBN 978-1-4503-4206-3. S2CID 215516530. == Patents == Babak Hodjat holds 21 patents in the fields of agent-oriented programming, natural language decision engines, distributed evolutionary algorithms for asset management and trading and data mining.

Noisy channel model

The noisy channel model is a framework used in spell checkers, question answering, speech recognition, and machine translation. In this model, the goal is to find the intended word given a word where the letters have been scrambled in some manner. == In spell-checking == See Chapter B of. Given an alphabet Σ {\displaystyle \Sigma } , let Σ ∗ {\displaystyle \Sigma ^{}} be the set of all finite strings over Σ {\displaystyle \Sigma } . Let the dictionary D {\displaystyle D} of valid words be some subset of Σ ∗ {\displaystyle \Sigma ^{}} , i.e., D ⊆ Σ ∗ {\displaystyle D\subseteq \Sigma ^{}} . The noisy channel is the matrix Γ w s = Pr ( s | w ) {\displaystyle \Gamma _{ws}=\Pr(s|w)} , where w ∈ D {\displaystyle w\in D} is the intended word and s ∈ Σ ∗ {\displaystyle s\in \Sigma ^{}} is the scrambled word that was actually received. The goal of the noisy channel model is to find the intended word given the scrambled word that was received. The decision function σ : Σ ∗ → D {\displaystyle \sigma :\Sigma ^{}\to D} is a function that, given a scrambled word, returns the intended word. Methods of constructing a decision function include the maximum likelihood rule, the maximum a posteriori rule, and the minimum distance rule. In some cases, it may be better to accept the scrambled word as the intended word rather than attempt to find an intended word in the dictionary. For example, the word schönfinkeling may not be in the dictionary, but might in fact be the intended word. === Example === Consider the English alphabet Σ = { a , b , c , . . . , y , z , A , B , . . . , Z , . . . } {\displaystyle \Sigma =\{a,b,c,...,y,z,A,B,...,Z,...\}} . Some subset D ⊆ Σ ∗ {\displaystyle D\subseteq \Sigma ^{}} makes up the dictionary of valid English words. There are several mistakes that may occur while typing, including: Missing letters, e.g., leter instead of letter Accidental letter additions, e.g., misstake instead of mistake Swapping letters, e.g., recieved instead of received Replacing letters, e.g., fimite instead of finite To construct the noisy channel matrix Γ {\displaystyle \Gamma } , we must consider the probability of each mistake, given the intended word ( Pr ( s | w ) {\displaystyle \Pr(s|w)} for all w ∈ D {\displaystyle w\in D} and s ∈ Σ ∗ {\displaystyle s\in \Sigma ^{}} ). These probabilities may be gathered, for example, by considering the Damerau–Levenshtein distance between s {\displaystyle s} and w {\displaystyle w} or by comparing the draft of an essay with one that has been manually edited for spelling. == In machine translation == One naturally wonders if the problem of translation could conceivably be treated as a problem in cryptography. When I look at an article in Russian, I say: 'This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode. See chapter 1, and chapter 25 of. Suppose we want to translate a foreign language to English, we could model P ( E | F ) {\displaystyle P(E|F)} directly: the probability that we have English sentence E given foreign sentence F, then we pick the most likely one E ^ = arg ⁡ max E P ( E | F ) {\displaystyle {\hat {E}}=\arg \max _{E}P(E|F)} . However, by Bayes law, we have the equivalent equation: E ^ = argmax E ∈ English P ( F ∣ E ) ⏞ translation model P ( E ) ⏞ language model {\displaystyle {\hat {E}}={\underset {E\in {\text{ English }}}{\operatorname {argmax} }}\overbrace {P(F\mid E)} ^{\text{translation model }}\overbrace {P(E)} ^{\text{language model}}} The benefit of the noisy-channel model is in terms of data: If collecting a parallel corpus is costly, then we would have only a small parallel corpus, so we can only train a moderately good English-to-foreign translation model, and a moderately good foreign-to-English translation model. However, we can collect a large corpus in the foreign language only, and a large corpus in the English language only, to train two good language models. Combining these four models, we immediately get a good English-to-foreign translator and a good foreign-to-English translator. The cost of noisy-channel model is that using Bayesian inference is more costly than using a translation model directly. Instead of reading out the most likely translation by arg ⁡ max E P ( E | F ) {\displaystyle \arg \max _{E}P(E|F)} , it would have to read out predictions by both the translation model and the language model, multiply them, and search for the highest number. == In speech recognition == Speech recognition can be thought of as translating from a sound-language to a text-language. Consequently, we have T ^ = argmax T ∈ Text P ( S ∣ T ) ⏞ speech model P ( T ) ⏞ language model {\displaystyle {\hat {T}}={\underset {T\in {\text{ Text }}}{\operatorname {argmax} }}\overbrace {P(S\mid T)} ^{\text{speech model }}\overbrace {P(T)} ^{\text{language model}}} where P ( S | T ) {\displaystyle P(S|T)} is the probability that a speech sound S is produced if the speaker is intending to say text T. Intuitively, this equation states that the most likely text is a text that's both a likely text in the language, and produces the speech sound with high probability. The utility of the noisy-channel model is not in capacity. Theoretically, any noisy-channel model can be replicated by a direct P ( T | S ) {\displaystyle P(T|S)} model. However, the noisy-channel model factors the model into two parts which are appropriate for the situation, and consequently it is generally more well-behaved. When a human speaks, it does not produce the sound directly, but first produces the text it wants to speak in the language centers of the brain, then the text is translated into sound by the motor cortex, vocal cords, and other parts of the body. The noisy-channel model matches this model of the human, and so it is appropriate. This is justified in the practical success of noisy-channel model in speech recognition. === Example === Consider the sound-language sentence (written in IPA for English) S = aɪ wʊd laɪk wʌn tuː. There are three possible texts T 1 , T 2 , T 3 {\displaystyle T_{1},T_{2},T_{3}} : T 1 = {\displaystyle T_{1}=} I would like one to. T 2 = {\displaystyle T_{2}=} I would like one too. T 3 = {\displaystyle T_{3}=} I would like one two. that are equally likely, in the sense that P ( S | T 1 ) = P ( S | T 2 ) = P ( S | T 3 ) {\displaystyle P(S|T_{1})=P(S|T_{2})=P(S|T_{3})} . With a good English language model, we would have P ( T 2 ) > P ( T 1 ) > P ( T 3 ) {\displaystyle P(T_{2})>P(T_{1})>P(T_{3})} , since the second sentence is grammatical, the first is not quite, but close to a grammatical one (such as "I would like one to [go]."), while the third one is far from grammatical. Consequently, the noisy-channel model would output T 2 {\displaystyle T_{2}} as the best transcription.

Night Sky (app)

Night Sky (app) is an application developed and published by indie studio iCandi Apps Ltd. from the UK. Night Sky is a stargazing reference app, where the user can explore a virtual representation of the night sky to identify stars, planets, constellations and satellites. The app is developed specifically for iOS, tvOS and watchOS devices. Night Sky was first released on November 1, 2011 for iOS, and has had multiple updates since launch. Night Sky was mentioned in the September 2016 Apple Keynote during the Apple Watch Series 2 announcement. In October 2016, Night Sky was featured as the Free App of The Week on the Apple App Store. == Reception == Night Sky was featured in Apple's 'Best of 2012' and has also been pre-installed onto iPads in Apple retail stores worldwide.

Is an AI Text-to-video Tool Worth It in 2026?

In search of the best AI text-to-video tool? An AI text-to-video tool is software that uses machine learning to help you get more done — it turns a rough idea into a polished result in seconds. When choosing one, weigh output quality, pricing, export formats, and how well it fits the tools you already use. Whether you are a beginner or a pro, the right AI text-to-video tool slots into your workflow and pays for itself fast. Below we compare features, pricing, and real output so you can choose with confidence.