STATISTICAL TRANSLATION USING A LARGE MONOLINGUAL CORPUS
CROSS-REFERENCE TO RELATED APPLICATIONS
 This application claims priority to U.S. Provisional Application Serial No. 60/368,071, filed on Mar. 26, 2002, the disclosure of which is incorporated herein by reference.
ORIGIN OF INVENTION
 The research and development described in this application were supported by DARPA under grant number N66001-00-1-8914. The U.S. Government may have certain rights in the claimed inventions.
BACKGROUND
 Corpus-based approaches to machine translation usually begin with a bilingual training corpus. One approach is to extract from the corpus generalized statistical knowledge that can be applied to new, unseen test sentences. A different approach is to simply memorize the bilingual corpus. This is called translation memory, and it provides excellent translation quality in the case of a "hit" (i.e., a test sentence to be translated has actually been observed before in the memorized corpus). However, it provides no output in the more frequent case of a "miss".
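 For illustration only, a minimal sketch of the translation-memory idea, with a hypothetical in-memory dictionary standing in for the memorized bilingual corpus (the example sentence is the one discussed later in the description):

```python
# Hypothetical translation memory: exact-match lookup over a memorized
# bilingual corpus (toy data for illustration).
translation_memory = {
    "elle a beaucoup de cran": "she has a lot of guts",
}

def translate_from_memory(source_sentence):
    """Return the stored translation on a "hit", or None on a "miss"."""
    return translation_memory.get(source_sentence)

print(translate_from_memory("elle a beaucoup de cran"))  # hit
print(translate_from_memory("il a du cran"))             # miss -> None
```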
SUMMARY
 In an embodiment, a statistical machine translation (MT) system may use a large monolingual corpus (or, e.g., the World Wide Web ("Web")) to improve the accuracy of translated phrases/sentences. The MT system may produce alternative translations and use the large monolingual corpus (or the Web) to (re)rank the alternative translations.
 The MT system may receive an input text string in a source language, compare alternate translations for said input text string in a target language to text segments in the large monolingual corpus in the target language, and record the number of occurrences of each alternate translation in the large monolingual corpus. The MT system may then re-rank the alternate translations based, at least in part, on the number of occurrences of each translation in the corpus.
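 A minimal sketch of this count-and-re-rank step, assuming the monolingual corpus is available as one in-memory string; the multiplicative way the count is combined with the model score below is a hypothetical choice, since the method only requires that the counts influence the ranking:

```python
def count_occurrences(phrase, corpus_text):
    """Number of times a candidate translation occurs verbatim in the
    large monolingual target-language corpus (here, one plain string)."""
    return corpus_text.count(phrase)

def rerank(candidates, corpus_text):
    """Re-rank (model_score, translation) pairs using corpus counts."""
    rescored = [(score * (1 + count_occurrences(t, corpus_text)), t)
                for score, t in candidates]
    return sorted(rescored, reverse=True)

# Toy usage with the example from the detailed description below:
corpus = "she has a lot of guts . she has a lot of guts . it is raining"
print(rerank([(0.5, "she has a lot of guts"),
              (0.5, "it has a lot of guts")], corpus))
```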
 The MT system may build a finite state acceptor (FSA) for the input text string which encodes alternative translations for the input text string in the target language.
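 One way to realize such an acceptor, sketched here with hypothetical states and arcs over the two candidate translations used as an example in the detailed description:

```python
# Hypothetical acceptor: states are integers; arcs map (state, word)
# to the next state, so words shared by the alternatives share arcs.
arcs = {
    (0, "she"): 1, (0, "it"): 1,
    (1, "has"): 2, (2, "a"): 3, (3, "lot"): 4,
    (4, "of"): 5, (5, "guts"): 6,
}
final_states = {6}

def accepts(words):
    """True if the word sequence is one of the encoded translations."""
    state = 0
    for word in words:
        if (state, word) not in arcs:
            return False
        state = arcs[(state, word)]
    return state in final_states

print(accepts("she has a lot of guts".split()))  # True
print(accepts("he has a lot of guts".split()))   # False
```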
BRIEF DESCRIPTION OF THE DRAWINGS
 FIG. 1 is a block diagram of a statistical machine translation system.
 FIG. 2 shows a word alignment between parallel phrases.
 FIG. 3 is a flowchart describing a stochastic process by which a source language string gets converted into a target language string.
 FIG. 4 is a block diagram of a finite state acceptor.
 FIG. 5 is a block diagram of a finite state transducer.
 FIG. 6 is a block diagram of a finite state acceptor.
 FIG. 7 is a block diagram of a finite state machine which may be used to model the NULL word insertions.
 FIG. 8 is a flow diagram describing a machine translation operation.
DETAILED DESCRIPTION
 FIG. 1 illustrates a statistical machine translation (MT) system according to an embodiment. The MT system 100 may be used to translate from a source language (e.g., French) to a target language (e.g., English). The MT system 100 may include a language model 102, a translation model 105, a decoder 110, and a large monolingual corpus 115.
 The MT system 100 may use the large monolingual corpus 115 (or, e.g., the World Wide Web ("Web")) to improve the accuracy of translated phrases/sentences. The MT system 100 may produce alternative translations and use the large monolingual corpus (or the Web) to (re)rank the alternative translations. For example, the French sentence "elle a beaucoup de cran" may be translated by the MT system 100 as both "she has a lot of guts" and "it has a lot of guts", with similar probabilities. Because "she has a lot of guts" is found more often in a large monolingual English corpus (or on the Web), its score increases significantly and it becomes the higher-ranked translation.
 The MT system 100 may be based on a source-channel model. The language model (the source) provides an a priori probability distribution P(e) indicating which English text strings are more likely, e.g., which are grammatically correct and which are not. The language model 102 may be an n-gram model trained on a large, naturally generated monolingual corpus (e.g., English text) to determine the probability of a word sequence.
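 For concreteness, a toy maximum-likelihood bigram model in the spirit of such an n-gram language model; the two training sentences are hypothetical, and a production model would be trained on a much larger corpus and smoothed:

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Maximum-likelihood bigram model: P(e) is the product of the
    relative frequencies P(w_i | w_{i-1}) over the sentence."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(words[:-1])
        bigrams.update(zip(words[:-1], words[1:]))

    def prob(e):
        words = ["<s>"] + e.split() + ["</s>"]
        p = 1.0
        for prev, cur in zip(words[:-1], words[1:]):
            if unigrams[prev] == 0:
                return 0.0
            p *= bigrams[(prev, cur)] / unigrams[prev]
        return p

    return prob

# Hypothetical two-sentence training corpus.
P = train_bigram_lm(["she has a lot of guts", "it has a lot of charm"])
print(P("she has a lot of guts"))  # nonzero
print(P("guts a she has of lot"))  # 0.0 (ungrammatical order)
```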
 The translation model 105 may be used to determine the probability of correctness for a translation. The translation model may be, for example, an IBM Model 4, described in U.S. Pat. No. 5,477,451. The IBM Model 4 revolves around the notion of a word alignment over a pair of sentences, such as that shown in FIG. 2. A word alignment assigns a single home (English string position) to each French word. If two French words align to the same English word, then that English word is said to have a fertility of two. Likewise, if an English word remains unaligned-to, then it has fertility zero. If a word has fertility greater than one, it is called very fertile.
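 A small sketch of reading fertilities off such a word alignment, with the alignment represented as hypothetical (French position, English position) pairs:

```python
from collections import Counter

def fertilities(alignment, english_length):
    """Fertility of each English position, given an alignment in which
    every French word has exactly one English home. Unaligned-to
    English positions get fertility zero."""
    homes = Counter(e for _, e in alignment)
    return [homes[e] for e in range(english_length)]

# Toy alignment over 3 English and 3 French words: English word 1 is
# the home of two French words (fertility two), and English word 2 is
# unaligned-to (fertility zero).
print(fertilities([(0, 0), (1, 1), (2, 1)], 3))  # [1, 2, 0]
```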
 The word alignment in FIG. 2 is shorthand for a hypothetical stochastic process by which an English string 200 gets converted into a French string 205. FIG. 3 is a flowchart describing, at a high level, such a stochastic process 300. Every English word in the string is first assigned a fertility (block 305). These assignments may be made stochastically according to a table n(φ|ei). Any word with fertility zero is deleted from the string, any word with fertility two is duplicated, etc. After each English word in the new string, the fertility of an invisible English NULL element is incremented with probability p1, typically about 0.02 (block 310). The NULL element may ultimately produce "spurious" French words. A word-for-word replacement of English words (including NULL) by French words is performed, according to the table t(fj|ei) (which together form a translation table, or T-table) (block 315). Finally, the French words are permuted according to certain distortion probabilities.
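 Putting the steps together, a toy simulation of this generative story; the fertility and translation tables below are hypothetical inputs, and a uniform shuffle crudely stands in for the real distortion probabilities:

```python
import random

def generate_french(english, n_table, t_table, p1=0.02):
    """Toy simulation of the generative story in FIG. 3. n_table maps
    an English word to a fertility distribution; t_table (the T-table)
    maps a word, including "NULL", to a French-word distribution."""
    # Block 305: stochastically assign a fertility to each English word
    # and copy it that many times (fertility zero deletes it, fertility
    # two duplicates it, and so on).
    expanded = []
    for e in english:
        phis, weights = zip(*n_table[e].items())
        expanded.extend([e] * random.choices(phis, weights)[0])
    # Block 310: after each word, increment the NULL fertility with
    # probability p1; each NULL later yields a "spurious" French word.
    with_null = []
    for e in expanded:
        with_null.append(e)
        if random.random() < p1:
            with_null.append("NULL")
    # Block 315: word-for-word replacement according to the T-table.
    french = []
    for e in with_null:
        words, weights = zip(*t_table[e].items())
        french.append(random.choices(words, weights)[0])
    # Final block: permute the French words; a uniform shuffle stands
    # in for the real distortion probabilities here.
    random.shuffle(french)
    return french

# Hypothetical toy tables.
n_table = {"she": {1: 1.0}, "guts": {1: 0.9, 2: 0.1}}
t_table = {"she": {"elle": 1.0}, "guts": {"cran": 1.0}, "NULL": {"de": 1.0}}
print(generate_french(["she", "guts"], n_table, t_table))
```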