WO1999062000A2 - Spelling and grammar checking system - Google Patents

Publication number
WO1999062000A2
Authority
WO
WIPO (PCT)
Prior art keywords
word
words
text
fsm
character
Prior art date
Application number
PCT/US1999/011713
Other languages
French (fr)
Other versions
WO1999062000A3 (en)
WO1999062000A8 (en)
Inventor
Yves Schabes
Emmanuel Roche
Original Assignee
Teragram Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Teragram Corporation
Priority to AU41003/99A
Priority to EP99924524A
Priority to CA002333402A
Publication of WO1999062000A2
Publication of WO1999062000A3
Publication of WO1999062000A8

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/232: Orthographic correction, e.g. spell checking or vowelisation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/253: Grammatical analysis; Style critique
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10: TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S: TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00: Data processing: database and file management or data structures
    • Y10S707/99931: Database or file accessing
    • Y10S707/99933: Query processing, i.e. searching
    • Y10S707/99935: Query augmenting and refining, e.g. inexact access

Definitions

  • the present invention relates generally to a spelling and grammar checking system, and more particularly to a spelling and grammar checking system which corrects misspelled words, incorrectly-used words, and contextual and grammatical errors.
  • the invention has particular utility in connection with machine translation systems, word processing systems, and text indexing and retrieval systems such as World Wide Web search engines.
  • Conventional spelling correction systems check whether each word in a document is found in a dictionary database. When a word is not found in the dictionary, the word is flagged as being incorrectly spelled. Suggestions for replacing the incorrectly-spelled word with its correctly-spelled counterpart are then determined by inserting, deleting and/or transposing characters in the misspelled word. For example, in a sentence like "My son thre a ball at me", the word "thre" is not correctly spelled.
  • the incorrectly-spelled word thre should be replaced by threw.
  • the word thre should be replaced by the.
  • the word thre should be replaced by three.
  • conventional spelling correction systems suggest the same list of alternative words, ranked in the same order, for all three of the foregoing sentences.
  • the spelling correction program provided in Microsoft ® Word '97 suggests the following words, in the following order, for all three of the foregoing sentences: three, there, the, throe, threw.
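The conventional, context-free approach described above can be sketched as follows. This is only an illustration under stated assumptions: the tiny `DICTIONARY` and the function names `single_edits` and `suggest` are hypothetical, and the sketch covers a single insertion, deletion, or transposition only.

```python
import string

# Tiny illustrative dictionary (an assumption); a real system would load a full lexicon.
DICTIONARY = {"the", "three", "threw", "there", "throe", "a", "ball", "my", "son", "at", "me"}


def single_edits(word):
    """Return every string one insertion, deletion, or transposition away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    transposes = {
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    }
    inserts = {left + ch + right for left, right in splits for ch in letters}
    return deletes | transposes | inserts


def suggest(word):
    """Context-free suggestions: dictionary words reachable by one edit."""
    if word in DICTIONARY:
        return []  # the word is found in the dictionary, so it is not flagged
    return sorted(single_edits(word) & DICTIONARY)


print(suggest("thre"))
```

Because `suggest` never looks at the neighbouring words, it returns the same list for "thre" whatever sentence the word appears in, which is the limitation the passage describes.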
  • the present invention addresses the foregoing needs by providing a system which corrects both the spelling and grammar of words using finite state machines, such as finite state transducers and finite state automata.
  • For each word in a text sequence, the present invention provides a list of alternative words ranked according to a context of the text sequence, and then uses this list to correct words in the text (either interactively or automatically).
  • the invention has a variety of uses, and is of particular use in the fields of word processing, machine translation, text indexing and retrieval, and optical character recognition, to name a few.
  • the present invention determines alternatives for misspelled words, and ranks these alternatives based on a context in which the misspelled word occurs. For example, for the sentence My son thre a ball through the window, the present invention suggests the word threw as the best correction for the word thre, whereas for the sentence He broke thre window, the present invention suggests the word the as the best correction for the word thre.
  • the invention displays alternative word suggestions to a user and then corrects misspelled words in response to a user's selection of an alternative word.
  • the present invention determines, on its own, which of the alternatives should be used, and then implements any necessary corrections automatically (i.e., without user input).
  • the invention also addresses incorrect word usage in the same manner that it addresses misspelled words.
  • the invention can be used to correct improper use of commonly-confused words such as who and whom, homophones such as then and than, and other such words that are spelled correctly, but that are improper in context.
  • the invention will correct the sentence He thre the ball to the sentence He threw the ball (and not three, the, ...); the sentence fragment flight smulator to flight simulator (and not stimulator); the sentence fragment air chilie to air base (and not baize, bass, babe, or bade); the phrase Thre Miles Island to Three Miles Island (and not The or Threw); and the phrase ar traffic controller to air traffic controller (and not are, arc,
  • the invention also can be used to restore accents (such as á, à, é, ...) or diacritic marks (such as ñ, ç, ...) in languages such as French and Spanish.
  • the current invention corrects the sentence il l'a releve to il l'a relevé (and not relève, relèvent, ...).
  • the present invention is a system (i.e., an apparatus, a method and/or computer-executable process steps) for correcting misspelled words in input text.
  • the system detects a misspelled word in the input text, and determines a list of alternative words for the misspelled word. The list of alternative words is then ranked based on a context of the input text.
  • the present invention is a word processing system for creating and editing text documents.
  • the word processing system inputs text into a text document, spell-checks the text so as to replace misspelled words in the text with correctly-spelled words, and outputs the document.
  • the spell-checking performed by the system comprises detecting misspelled words in the text, and, for each misspelled word, determining a list of alternative words for the misspelled word, ranking the list of alternative words based on a context in the text, selecting one of the alternative words from the list, and replacing the misspelled word in the text with the selected one of the alternative words.
  • the present invention is a machine translation system for translating text from a first language into a second language.
  • the machine translation system inputs text in the first language, spell-checks the text in the first language so as to replace misspelled words in the text with correctly-spelled words, translates the text from the first language into the second language, and outputs translated text.
  • the spell-checking performed by the system comprises detecting misspelled words in the text, and, for each misspelled word, determining a list of alternative words for the misspelled word, ranking the list of alternative words based on a context in the text, selecting one of the alternative words from the list, and replacing the misspelled word in the document with the selected one of the alternative words.
  • the present invention is a machine translation system for translating text from a first language into a second language.
  • the machine translation system inputs text in the first language, translates the text from the first language into the second language, spell-checks the text in the second language so as to replace misspelled words in the text with correctly-spelled words, and outputs the text.
  • the spell-checking performed by the system comprises detecting misspelled words in the text, and, for each misspelled word, determining a list of alternative words for the misspelled word, ranking the list of alternative words based on a context in the text, selecting one of the alternative words from the list, and replacing the misspelled word in the document with the selected one of the alternative words.
  • the present invention is an optical character recognition system for recognizing input character images.
  • the optical character recognition system inputs a document image, parses character images from the document image, performs recognition processing on parsed character images so as to produce document text, spell-checks the document text so as to replace misspelled words in the document text with correctly-spelled words, and outputs the document text.
  • the spell-checking performed by the system comprises detecting misspelled words in the document text, and, for each misspelled word, determining a list of alternative words for the misspelled word, ranking the list of alternative words based on a context in the text, selecting one of the alternative words from the list, and replacing the misspelled word in the document text with the selected one of the alternative words.
  • the present invention is a system for retrieving text from a source.
  • the system inputs a search word, corrects a spelling of the search word to produce a corrected search word, and retrieves text from the source that includes the corrected search word.
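A minimal sketch of correcting a search word before retrieval is shown below. It is an assumption-laden illustration: the correction step uses the similarity ratio from Python's standard `difflib` module as a stand-in, not the finite-state approach of the invention, and `LEXICON`, `correct_search_word`, and `retrieve` are hypothetical names.

```python
import difflib

# Hypothetical lexicon of reference words (an assumption for this sketch).
LEXICON = ["simulator", "stimulator", "flight", "manual", "muscle", "guide"]


def correct_search_word(word, lexicon=LEXICON):
    """Return the closest lexicon word, or the word itself if nothing is close."""
    if word in lexicon:
        return word
    # difflib is a stand-in corrector, swapped in for illustration only.
    matches = difflib.get_close_matches(word, lexicon, n=1, cutoff=0.6)
    return matches[0] if matches else word


def retrieve(search_word, documents):
    """Retrieve the documents that contain the corrected search word."""
    corrected = correct_search_word(search_word.lower())
    return [doc for doc in documents if corrected in doc.lower().split()]


docs = ["Flight simulator manual", "Muscle stimulator guide"]
print(retrieve("smulator", docs))
```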
  • the present invention is a system for retrieving text from a source.
  • the system inputs a search phrase comprised of a plurality of words, at least one of the plurality of words being an incorrect word, and replaces the incorrect word in the search phrase with a corrected word in order to produce a corrected search phrase. Text is then retrieved from the source based on the corrected search phrase.
  • the present invention is a system for correcting misspelled words in input text sequences received from a plurality of different clients.
  • the system stores, in a memory on a server, a lexicon comprised of a plurality of reference words, and receives the input text sequences from the plurality of different clients.
  • the system then spell-checks the input text sequences using the reference words in the lexicon, and outputs spell-checked text sequences to the plurality of different clients.
  • the present invention is a system for selecting a replacement word for an input word in a phrase.
  • the system determines alternative words for the input word, the alternative words including at least one compound word which is comprised of two or more separate words, each alternative word having a rank associated therewith.
  • the system selects, as the replacement word, an alternative word having a highest rank.
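As a rough sketch of this selection step, each alternative can carry a tuple of one or more words, so that compound alternatives are handled the same way as single words. The `Alternative` type, the example ranks, and the convention that a higher rank is better are all assumptions made for illustration.

```python
from typing import NamedTuple, Tuple


class Alternative(NamedTuple):
    words: Tuple[str, ...]  # one element for a simple word, several for a compound word
    rank: float             # assumed convention: higher is better


def select_replacement(alternatives):
    """Return the text of the alternative with the highest rank."""
    best = max(alternatives, key=lambda alt: alt.rank)
    return " ".join(best.words)


# Invented ranks for the input word "alot".
alternatives = [
    Alternative(("allot",), 0.2),
    Alternative(("a", "lot"), 0.7),  # compound alternative made of two separate words
    Alternative(("alto",), 0.1),
]
print(select_replacement(alternatives))
```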
  • the present invention is a system for correcting grammatical errors in input text.
  • the system generates a first finite state machine ("FSM") for the input text, the first finite state machine including alternative words for at least one word in the input text and a rank associated with each alternative word, and adjusts the ranks in the first FSM in accordance with one or more of a plurality of predetermined grammatical rules.
  • the system determines which of the alternative words is grammatically correct based on the ranks associated with the alternative words, and replaces the at least one word in the input text with a grammatically-correct alternative word determined in the determining step.
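The rank-adjustment idea can be sketched in a much simplified form. Here a plain dictionary of alternatives stands in for the arcs of the FSM, and the part-of-speech table `POS`, the rule list `RULES`, and all numeric values are invented for illustration; they are not the predetermined grammatical rules of the invention.

```python
# Hypothetical part-of-speech lexicon (an assumption).
POS = {"he": "PRON", "threw": "VERB", "three": "NUM", "the": "DET", "ball": "NOUN"}

# Invented rules: (tag of previous word, tag of candidate, rank adjustment).
RULES = [
    ("PRON", "VERB", +2.0),  # a verb directly after a pronoun is favoured
    ("PRON", "DET", -1.0),
    ("PRON", "NUM", -1.0),
]


def adjust_ranks(prev_word, alternatives):
    """Return a copy of alternatives (word -> rank) with grammar adjustments applied."""
    prev_tag = POS.get(prev_word.lower())
    adjusted = dict(alternatives)
    for word in adjusted:
        for left_tag, candidate_tag, delta in RULES:
            if prev_tag == left_tag and POS.get(word) == candidate_tag:
                adjusted[word] += delta
    return adjusted


def correct(tokens, index, alternatives):
    """Replace tokens[index] with the alternative that ranks highest after adjustment."""
    ranks = adjust_ranks(tokens[index - 1], alternatives) if index > 0 else dict(alternatives)
    best = max(ranks, key=ranks.get)
    return tokens[:index] + [best] + tokens[index + 1:]


print(correct(["He", "thre", "the", "ball"], 1, {"three": 1.0, "the": 0.9, "threw": 0.5}))
```

In this toy example "threw" starts with the lowest rank but ends with the highest once the pronoun-verb rule is applied.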
  • the present invention is a word processing system for creating and editing text documents.
  • the word processing system inputs text into a text document, checks the document for grammatically-incorrect words, replaces grammatically-incorrect words in the document with grammatically-correct words, and outputs the document.
  • the checking performed by the system comprises (i) generating a finite state machine ("FSM") for text in the text document, the finite state machine including alternative words for at least one word in the text and a rank associated with each alternative word, (ii) adjusting the ranks in the FSM in accordance with one or more of a plurality of predetermined grammatical rules, and (iii) determining which of the alternative words is grammatically correct based on ranks for the alternative words.
  • the present invention is a machine translation system for translating text from a first language into a second language.
  • the machine translation system inputs text in the first language, checks the text in the first language for grammatically-incorrect words, and replaces grammatically-incorrect words in the text with grammatically-correct words.
  • the machine translation system then translates the text with the grammatically-correct words from the first language into the second language, and outputs the text in the second language.
  • the checking performed by the machine translation system comprises (i) generating a finite state machine ("FSM") for the text in the first language, the finite state machine including alternative words for at least one word in the text and a rank associated with each alternative word, (ii) adjusting the ranks in the FSM in accordance with one or more of a plurality of predetermined grammatical rules, and (iii) determining which of the alternative words is grammatically correct based on ranks for the alternative words.
  • the present invention is a machine translation system for translating text from a first language into a second language.
  • the machine translation system inputs text in the first language, translates the text from the first language into the second language, checks the text in the second language for grammatically-incorrect words, replaces grammatically-incorrect words in the text with grammatically-correct words, and outputs the text with the grammatically-correct words.
  • the checking performed by the system comprises (i) generating a finite state machine ("FSM") for the text in the second language, the finite state machine including alternative words for at least one word in the text and a rank associated with each alternative word, (ii) adjusting the ranks in the FSM in accordance with one or more of a plurality of predetermined grammatical rules, and (iii) determining which of the alternative words is grammatically correct based on ranks for the alternative words.
  • the present invention is an optical character recognition system for recognizing input character images.
  • the optical character recognition system inputs a document image, parses character images from the document image, performs recognition processing on parsed character images so as to produce document text, checks the document text for grammatically-incorrect words, replaces grammatically-incorrect words in the document text with grammatically correct words, and outputs the document text.
  • the checking performed by the system comprises (i) generating a finite state machine ("FSM") for the document text, the finite state machine including alternative words for at least one word in the text and a rank associated with each alternative word, (ii) adjusting the ranks in the FSM in accordance with one or more of a plurality of predetermined grammatical rules, and (iii) determining which of the alternative words is grammatically correct based on ranks for the alternative words.
  • the present invention is a system for retrieving text from a source.
  • the system inputs a search phrase comprised of a plurality of words, at least one of the plurality of words being a grammatically-incorrect word, replaces the grammatically-incorrect word in the search phrase with a grammatically-correct word in order to produce a corrected search phrase, and retrieves text from the source based on the corrected search phrase.
  • the present invention is a system of spell-checking input text.
  • the system detects a misspelled word in the input text, stores one or more lexicon finite state machines ("FSM") in a memory, each of the lexicon FSMs including plural reference words, generates an input FSM for the misspelled word, selects one or more reference words from the lexicon FSMs based on the input FSM, the one or more reference words substantially corresponding to a spelling of the misspelled word, and outputs selected ones of the one or more reference words.
  • Figure 1 shows representative computer-hardware on which the spelling and grammar checking system of the present invention may be executed.
  • Figure 2 shows the internal construction of the hardware shown in Figure 1.
  • Figure 3 depicts operation of the spelling and grammar checking system of the present invention in a manual mode.
  • Figure 4 depicts operation of the spelling and grammar checking system of the present invention in an automatic mode.
  • Figure 5 depicts operation of a spelling suggestion module used in the spelling and grammar checking system of the present invention.
  • Figure 6 depicts an input finite state transducer ("FST") generated by the spelling suggestion module depicted in Figure 5.
  • Figure 7 shows another example of an FST generated by the spelling suggestion module depicted in Figure 5.
  • Figure 8 shows an example of a lexicon FST used in the spelling suggestion module depicted in Figure 5.
  • Figure 9 shows an example of a spelling FST used in the spelling suggestion module depicted in Figure 5.
  • Figure 10 illustrates an FST generated by an automaton conversion module used in the spelling and grammar checking code shown in Figures 3 and 4.
  • Figure 11 shows another example of an FST generated by the automaton conversion module used in the spelling and grammar checking code shown in Figures 3 and 4.
  • Figure 12 shows process steps used by the automaton conversion module to generate FSTs.
  • Figure 13 shows process steps executed by a contextual ranking module in the spelling and grammar checking code to generate a ranked list of alternative words for a misspelled word.
  • Figure 14 shows an FST which includes a compound word which is used by the contextual ranking module to generate the ranked list.
  • Figure 15 shows an FST stored in a morphological dictionary which is used by the contextual ranking module to generate the ranked list.
  • Figure 16 shows an FST generated by a morphology module in the contextual ranking module.
  • Figure 17 shows operation of a grammar application module included in the contextual ranking module.
  • Figure 18 shows an FST generated by the grammar application module in the contextual ranking module.
  • Figure 19 shows an FST generated by a morphological deletion module of the present invention.
  • Figure 20 shows process steps for a word processing system which includes the spelling and grammar checking system of the present invention.
  • Figure 21 shows process steps for a machine translation system which includes the spelling and grammar checking system of the present invention.
  • Figure 22 shows process steps for an optical character recognition system which includes the spelling and grammar checking system of the present invention.
  • Figure 23 shows process steps for a text indexing and retrieving system which includes the spelling and grammar checking system of the present invention.
  • Figure 24 shows a client-server architecture which implements the present invention.
  • Figure 25 shows a text indexing and retrieving system implemented using the architecture shown in Figure 24.
  • FIG. 1 shows a representative embodiment of a computer system on which the present invention may be implemented.
  • PC 4 includes network connection 9 for interfacing to a network, such as a local area network ("LAN") or the World Wide Web (hereinafter "WWW"), and fax/modem connection 10 for interfacing with other remote sources.
  • PC 4 also includes display screen 11 for displaying information to a user, keyboard 12 for inputting text and user commands, mouse 14 for positioning a cursor on display screen 11 and for inputting user commands, disk drive 16 for reading from and writing to floppy disks installed therein, and CD-ROM drive 17 for accessing information stored on CD-ROM.
  • PC 4 may also have one or more peripheral devices attached thereto, such as scanner 13 for inputting document text images, graphics images, or the like, and printer 19 for outputting images, text, or the like.
  • FIG. 2 shows the internal structure of PC 4.
  • PC 4 includes memory 20, which comprises a computer-readable medium such as a computer hard disk.
  • Memory 20 stores data 21, applications 22, print driver 24, and an operating system 26.
  • operating system 26 is a windowing operating system, such as Microsoft ® Windows 95, although the invention may be used with other operating systems as well.
  • word processing programs 41 such as WordPerfect ® and Microsoft ® Word '97
  • Internet access program 42 (i.e., a web browser)
  • Netscape ® which includes one or more search engines, such as Infoseek, Lycos, Yahoo!, Excite, AOL NetFind, HotBot, LookSmart, Snap!
  • Processor 38 preferably comprises a microprocessor or the like for executing applications, such as those noted above, out of RAM 37.
  • applications including spelling and grammar checking code 49 of the present invention, may be stored in memory 20 (as noted above) or, alternatively, on a floppy disk in disk drive 16 or a CD-ROM in CD-ROM drive 17.
  • processor 38 accesses applications (or other data) stored on a floppy disk via disk drive interface 32 and accesses applications (or other data) stored on a CD-ROM via CD-ROM drive interface 34.
  • Application execution and other tasks of PC 4 may be initiated using keyboard 12 or mouse 14, commands from which are transmitted to processor 38 via keyboard interface 30 and mouse interface 31, respectively.
  • Output results from applications running on PC 4 may be processed by display interface 29 and then displayed to a user on display
  • display interface 29 preferably comprises a display processor for forming images based on data provided by processor 38 over computer bus 36, and for outputting those images to display 11.
  • Output results from applications, such as spelling and grammar checking code 49, running on PC 4 may also be provided to printer 19 via printer interface
  • processor 38 also executes print driver 24 so as to perform appropriate formatting of the output results prior to their transmission to printer 19.
  • this code is comprised of computer-executable process steps for, among other things, detecting a misspelled word in input text, determining a list of alternative words for the misspelled word, ranking the list of alternative words based on a context of the input text, selecting one of the alternative words from the list, and replacing the misspelled word in the text with the selected one of the alternative words.
  • the present invention is operable in an interactive mode, in which the selecting step is performed manually (i.e., a user selects an alternative word from the list), or in an automatic mode, in which the selecting step is performed automatically (i.e., without user intervention) based on predetermined criteria.
  • Interactive Mode: Figure 3 depicts operation of spelling and grammar checking code 49 in the interactive mode, and the various modules (i.e., computer-executable process steps) included therein.
  • text 50 is input into the spelling and grammar checking system.
  • in step 51, a misspelled word in the text is detected by a spell-checking module (not shown).
  • step 51 detects misspelled words by comparing each word in the input text to a dictionary database and characterizing a word as misspelled when the word does not match any words in the dictionary database.
  • step 51 also checks for proper placement of accent marks and/or diacritic marks in the input word. In cases where these marks are improperly placed, step 51 characterizes the word as misspelled.
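The detection step can be sketched as a dictionary lookup with an extra check for missing or misplaced accent and diacritic marks. This is a minimal illustration, assuming a toy `DICTIONARY`; the function names and the two reason labels are hypothetical. Both kinds of result count as misspelled, as the text describes.

```python
import re
import unicodedata

# Toy dictionary database (an assumption for this sketch).
DICTIONARY = {"il", "l'a", "relevé", "he", "threw", "the", "ball"}


def strip_marks(word):
    """Remove accent and diacritic marks, e.g. 'relevé' becomes 'releve'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


UNMARKED = {strip_marks(word) for word in DICTIONARY}


def detect_misspelled(text):
    """Return (word, reason) pairs for every word characterized as misspelled."""
    found = []
    for word in re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text):
        lowered = word.lower()
        if lowered in DICTIONARY:
            continue
        if strip_marks(lowered) in UNMARKED:
            found.append((word, "marks"))    # letters match, marks are wrong or missing
        else:
            found.append((word, "unknown"))  # no dictionary entry matches
    return found


print(detect_misspelled("He thre the ball"))
print(detect_misspelled("il l'a releve"))
```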
  • misspelled word is passed to spelling suggestion module 52.
  • Spelling suggestion module 52 suggests "out-of-context" corrections for the misspelled word. That is, spelling suggestion module 52 determines a list of correctly-spelled alternative (or
  • spelling suggestion module 52 determines this list of alternative words by inserting, deleting, replacing, and/or transposing characters in the misspelled word until correctly-spelled alternative words are obtained. Spelling suggestion module 52 also identifies portions (e.g., characters) of the misspelled word which sound substantially similar to portions of correctly-spelled alternative words in order to obtain additional correctly-spelled alternative words. Once all alternative words have been determined, spelling suggestion module 52 ranks these words in a list based, e.g., on a number of typographical and/or phonetic modifications that must be made to the misspelled word in order to arrive at each alternative word.
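One way to count the typographical modifications mentioned above is an edit distance that allows insertion, deletion, replacement, and transposition of adjacent characters (the optimal string alignment variant of Damerau-Levenshtein distance). The sketch below is an assumption about how such a count could be computed; it ignores the phonetic modifications, and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Count insertions, deletions, replacements and adjacent transpositions from a to b."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # replacement (or match)
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def rank_alternatives(misspelled, candidates):
    """Order candidates by number of modifications, breaking ties alphabetically."""
    return sorted(candidates, key=lambda word: (edit_distance(misspelled, word), word))


print(rank_alternatives("smulator", ["stimulator", "simulator"]))
```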
  • Automaton conversion module 55 converts text 50 and list 54 into an input finite state machine (hereinafter "FSM"), such as a finite state transducer (hereinafter "FST") or a finite state automaton (hereinafter "FSA"), having a plurality of arcs.
  • Each arc in the input FSM includes an alternative word and a rank (e.g., a weight, a probability, etc.) associated with each alternative word. This rank corresponds to a likelihood that the alternative word, taken out of context, comprises a correctly-spelled version of the original misspelled word.
  • FSMs have a finite number of states with arcs between the states. These arcs have one input and one or more outputs.
  • an FST functions as a particular method for mapping inputs to outputs.
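A small data-structure sketch may help make the states, arcs, and input-to-output mapping concrete. The class below is a deterministic toy transducer written for illustration only; the class name `FST`, its methods, and the example arcs are assumptions rather than the representation used by the invention.

```python
class FST:
    """A toy deterministic finite state transducer."""

    def __init__(self, start, finals):
        self.start = start
        self.finals = set(finals)
        self.arcs = {}  # (source state, input symbol) -> (target state, output symbols)

    def add_arc(self, source, input_symbol, output_symbols, target):
        """Add an arc with one input symbol and one or more output symbols."""
        self.arcs[(source, input_symbol)] = (target, tuple(output_symbols))

    def transduce(self, symbols):
        """Map an input sequence to its output, or None if the input is not accepted."""
        state, output = self.start, []
        for symbol in symbols:
            if (state, symbol) not in self.arcs:
                return None
            state, emitted = self.arcs[(state, symbol)]
            output.extend(emitted)
        return output if state in self.finals else None


# Toy machine that maps the letters of "thre" to the letters of "threw".
fst = FST(start=0, finals=[4])
fst.add_arc(0, "t", ["t"], 1)
fst.add_arc(1, "h", ["h"], 2)
fst.add_arc(2, "r", ["r"], 3)
fst.add_arc(3, "e", ["e", "w"], 4)  # one input symbol, two output symbols

print("".join(fst.transduce("thre")))
```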
  • the present invention uses FSTs with weights, such as those described in Pereira et al. "Weighted Rational Transductions and Their Application to
  • automaton conversion module 55 also identifies predetermined words in the input text which are commonly confused, but which are correctly spelled. Examples of such words are principal and principle, and who and whom. Specifically, in these embodiments of the invention, automaton conversion module 55 identifies such words by reference to a pre-stored database, and then either adds such words to the FSM or creates a new FSM specifically for these words.
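The lookup of commonly-confused words can be sketched as follows. The `CONFUSION_SETS` list stands in for the pre-stored database and is an assumption, as is the function name; the result maps each token position to the alternatives that would be added for that position.

```python
# Stand-in for the pre-stored database of commonly-confused words (an assumption).
CONFUSION_SETS = [
    {"principal", "principle"},
    {"who", "whom"},
    {"then", "than"},
]


def confusable_alternatives(tokens):
    """Map each token position holding a commonly-confused word to its alternatives."""
    result = {}
    for position, token in enumerate(tokens):
        for group in CONFUSION_SETS:
            if token.lower() in group:
                result[position] = sorted(group)
    return result


print(confusable_alternatives(["better", "then", "me"]))
```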
  • these commonly-confused words may be identified by spelling suggestion module 52, characterized as misspelled words by virtue of their identification, and then processed in the same manner as misspelled words.
  • the output of the automaton conversion module 55 is the same, i.e., an FSM containing arcs with alternative words for a misspelled word.
  • Automaton conversion module 55 then transmits input FSM
  • Contextual ranking module 57 ranks alternative words in input FSM 56 by taking into account the context (e.g., grammar, parts-of-speech, etc.) of text 50.
  • contextual ranking module 57 generates a second FSM for text 50 and the alternative words in accordance with one or more of a plurality of predetermined grammatical rules.
  • This second FSM is comprised of a plurality of arcs which include the alternative words and ranks (e.g., weights) associated therewith, where a rank of each alternative word corresponds to a likelihood that the alternative word, taken in grammatical context, comprises a correctly-spelled version of the misspelled word.
  • Contextual ranking module 57 then combines corresponding ranks of input FSM 56 and the second FSM (e.g., contextual ranking module 57 adds weights from respective FSMs) so as to generate an "in-context" ranking of the alternative words. Then, contextual ranking module 57 outputs a list 59 of alternative words for the misspelled word, which are ranked according to context. A more detailed description of the operation of contextual ranking module 57 is provided below.
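The combination step, in which weights from the two FSMs are added, can be sketched with plain dictionaries standing in for the arcs. The integer weights below are invented, and the convention that a larger weight is better is an assumption.

```python
def combine_ranks(out_of_context, grammatical):
    """Add corresponding weights and return (word, weight) pairs, best first."""
    combined = {
        word: weight + grammatical.get(word, 0)
        for word, weight in out_of_context.items()
    }
    # Sort by descending weight, breaking ties alphabetically.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


# Invented weights for the misspelled word "thre" in "My son thre a ball".
out_of_context = {"three": 5, "there": 4, "the": 3, "throe": 2, "threw": 1}
grammatical = {"threw": 6, "the": 1}

ranked = combine_ranks(out_of_context, grammatical)
print([word for word, _ in ranked])
```

With these numbers "threw" is last in the out-of-context list but first after the grammatical weights are added.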
  • Ranked list 59 of alternative words, which was generated by contextual ranking module 57, is then displayed to a user, e.g., on display screen 11.
  • the user can then manually select (using, e.g., mouse 14, keyboard 12, etc.) one of the alternative words from ranked list 59.
  • the selected one of the alternative words (i.e., selected alternative 61)
  • replacement module 62 replaces the misspelled word in text 50 with user-selected alternative word 61, and then outputs corrected text 63.
  • FIG. 4 shows the operation of the automatic mode of the present invention. More specifically, Figure 4 depicts operation of spelling and grammar checking code 49 in the automatic mode, and the various modules (i.e., computer-executable process steps) included therein. Those modules which are identical to modules described above with respect to the interactive mode are described only briefly.
  • Spell checking module 64 is identical to that described above in the interactive mode, except that, in this mode, spell checking module 64 searches through all of text 50 in order to detect all misspelled words. Which mode (i.e., interactive or automatic) spell checking module 64 operates in is set beforehand, e.g., in response to a user input. Once all misspelled words have been detected, spell checking module 64 outputs text 66 with the incorrectly-spelled words appropriately identified.
  • spelling suggestion module 52 determines and outputs a list of correctly-spelled alternative (or "replacement") words for every misspelled word in text 50, rather than for just one misspelled word.
  • Which mode (i.e., interactive or automatic) spelling suggestion module 52 operates in is set beforehand, e.g., in response to a user input.
  • spelling suggestion module 52 outputs a list of "out-of-context" alternative words to automaton conversion module 55.
  • Automaton conversion module 55 is identical to that described above, except that, in this mode, automaton conversion module 55 generates an FSM 56 (see above) for each misspelled word in input text 50. These FSMs are then transmitted to contextual ranking module 57.
  • Contextual ranking module 57 is identical to that described above, in that it generates a second FSM for input text 50 based on a plurality of predetermined grammatical rules and combines this second FSM with FSM 56 generated by automaton conversion module 55 in order to provide a contextually-ranked list 59 of the alternatives for each misspelled word in text 50.
  • list 59 is provided from contextual ranking module 57 to best suggestion selection module 60.
  • Best suggestion selection module 60 selects the "best" alternative for each misspelled word, replaces each misspelled word in the text with its corresponding best alternative, and outputs corrected text 61, which includes these best alternatives in place of the misspelled words.
  • best suggestion selection module 60 selects each best alternative based on list 59 without any user intervention. For example, best suggestion module 60 may select the first, or highest, ranked alternative word in list 59, and then use that word to correct the input text.
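The selection step just described can be pictured with a minimal sketch. It assumes that list 59 is available as (word, weight) pairs and that a higher weight means a better candidate. The function names and the weights are hypothetical and chosen only for illustration.

```python
def select_best(ranked):
    """Return the highest-weight word from a list of (word, weight) pairs."""
    return max(ranked, key=lambda pair: pair[1])[0]


def autocorrect(words, ranked_lists):
    """Replace each flagged token with its top-ranked alternative.

    words: tokens of the input text.
    ranked_lists: dict mapping a token index to its list of
        (alternative, weight) pairs; unflagged tokens are absent.
    """
    return [select_best(ranked_lists[i]) if i in ranked_lists else w
            for i, w in enumerate(words)]


text = ["he", "thre", "a", "ball"]
# Hypothetical contextual weights for the misspelled token at index 1.
ranked = {1: [("three", 4), ("there", 2), ("the", 1)]}
corrected = autocorrect(text, ranked)
```

No user intervention is involved: the top-ranked entry simply overwrites the flagged token.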
  • spelling suggestion module 52 determines and outputs alternative words for a misspelled word in input text 50.
  • these alternative words are not ranked according to context, but rather are ranked based on the number of typographical changes that must be made to the misspelled word to arrive at an alternative word.
  • spelling suggestion module 52 is comprised of computer-executable process steps to store one or more lexicon FSTs (in general, FSMs), where each of the lexicon FSTs includes plural reference words and a phonetic representation of each reference word, and to generate an input FST (in general, an FSM) for a misspelled word, where the input FST includes the misspelled word and a phonetic representation of the misspelled word.
  • Spelling suggestion module 52 also includes computer-executable process steps to select one or more reference words from the lexicon FSTs based on the input FST, where the one or more reference words substantially corresponds to either a spelling of the misspelled word or to the phonetic representation of the misspelled word.
  • Figure 5 shows process steps comprising spelling suggestion module 52, together with sub-modules included therein.
  • word 70 is input from a spell-checking module (see, e.g., Figure 4).
  • Pronunciation conversion module 73 then converts input word 70 into input FST 71. The details of the operation of pronunciation conversion module 73 are provided below.
  • Input FST 71 represents the spelling and pronunciation of input word 70. More specifically, each arc of input FST 71 includes a pair of characters c/p, where c is a character in input word 70 and p is a phonetic symbol representing the pronunciation of character c.
  • Figure 6 shows such an input FST for the word asthmatic (with its pronunciation azmatic).
  • Figure 7 shows an example of another input FST, this time for the misspelled word cati (with its pronunciation c@ti).
  • the phonetic symbol "-" shown in Figure 6 is used to represent a character which is not pronounced.
  • Although the present invention mostly employs standard characters to illustrate pronunciation, the invention is not limited to using such characters. In fact, any convention can be adopted.
  • Lexicon FST 74 is preferably stored in a single memory, and comprises one or more lexicon FSTs (or FSMs, in general) which have been generated by the process steps of the present invention.
  • Each of these lexicon FSTs includes plural reference words (e.g. , English-language words, French-language words, German-language words, etc.) and a phonetic representation of each reference word.
  • An example of a lexicon FST is shown in Figure 8. This FST represents the following word/pronunciation pairs: cacti/k@ktA, caws/kc-s, face/fes-, fire/fAr-, and foci/fosA.
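A lexicon FST of this kind can be sketched as a trie whose arcs carry character/phonetic-symbol pairs. The sketch below uses the five word/pronunciation pairs listed above. The trie layout (no sharing of common suffixes) and the function names are assumptions, so the state numbering need not match Figure 8.

```python
from collections import defaultdict

# Word/pronunciation pairs listed for the lexicon FST of Figure 8.
LEXICON = {
    "cacti": "k@ktA",
    "caws": "kc-s",
    "face": "fes-",
    "fire": "fAr-",
    "foci": "fosA",
}


def build_lexicon_fst(pairs):
    """Build a trie-shaped transducer.

    Returns (arcs, finals): arcs[state] maps a (char, phone) label to the
    next state; finals is the set of states that end a reference word.
    """
    arcs = defaultdict(dict)
    finals = set()
    next_state = 1  # state 0 is the initial state
    for word, phones in pairs.items():
        assert len(word) == len(phones), "one phonetic symbol per character"
        state = 0
        for label in zip(word, phones):
            if label not in arcs[state]:
                arcs[state][label] = next_state
                next_state += 1
            state = arcs[state][label]
        finals.add(state)
    return arcs, finals


def spell_out(arcs, finals, state=0, prefix=""):
    """Enumerate every reference word accepted by the transducer."""
    if state in finals:
        yield prefix
    for (char, _phone), nxt in arcs.get(state, {}).items():
        yield from spell_out(arcs, finals, nxt, prefix + char)


arcs, finals = build_lexicon_fst(LEXICON)
words = sorted(spell_out(arcs, finals))
```

Because cacti and caws differ in the pronunciation of their second character (a/@ versus a/c), they share only the first arc in this layout.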
  • Spelling FSA 76 comprises an additional FSM which has been generated by the process steps of the present invention.
  • spelling FSA 76 includes a plurality of states, the states comprising at least states of lexicon FST 74 and states of input FST 71. Spelling FSA 76 is used to select one or more reference words from lexicon FST 74. These selected reference words comprise the alternative words for output by spelling suggestion module 52.
  • each state of spelling FSA 76 is identified by a quadruple (i, l, t, cost), in which the first element i is a state in input FST 71 and records which portion of input word 70 has been already processed; the second element l is a state in lexicon FST 74 which records words that are potential alternatives for the input word; the third element t indicates whether a character transposition has occurred in the input word (e.g.
  • the fourth element cost is the cost associated with a current suggested alternative to input word 70, meaning an indication of the likelihood that the current suggested alternative is actually the correct spelling of input word 70.
  • the lower the cost of a state in spelling FSA 76 the more likely that state represents a path to the correct spelling of input word 70.
  • Figure 9 shows a representative embodiment of spelling FSA 76.
  • the arcs of spelling FSA 76 are labeled with characters which represent suggested alternatives for input word 70.
  • spelling suggestion module 52 includes state selection module 77.
  • State selection module 77 selects which states of spelling FSA 76 are to be processed. For example, state selection module 77 may select states having lowest costs, so as to assure that potentially best solutions are processed first. Other embodiments of the present invention, of course, may use a different strategy.
  • Once state selection module 77 has selected a state (i, l, t, cost) to be processed, this state is provided as input to each of the following modules: character identity module 80, phonetic identity module 81, character insertion module 82, character deletion module 83, character replacement module 84, character transposition module 85, and character transposition completion module 86.
  • Each of these modules processes the current state (i, l, t, cost) 78 of spelling FSA 76 (as set by state selection module 77), and may also add new states to spelling FSA 76.
  • character identity module 80 determines whether characters of a reference word in lexicon FST 74 match characters of word
  • Phonetic identity module 81 determines whether characters of the reference word are pronounced the same as characters of the input word.
  • Character insertion module 82 determines whether a character inserted in the input word causes at least part of the input word to match at least part of the reference word.
  • Character deletion module 83 determines whether a character deleted from the input word causes at least part of the input word to match at least part of the reference word.
  • Character replacement module 84 replaces characters in the input word with characters in the reference word in order to determine whether at least part of the input word matches at least part of the reference word.
  • Character transposition module 85 changes the order of two or more characters in the input word and compares a changed character in the input word to a corresponding character in the reference word. Finally, character transposition completion module 86 compares characters in the input word which were not compared by character transposition module 85 in order to determine if at least part of the input word matches at least part of the reference word.
  • character identity module 80 checks whether there is a word in lexicon FST 74 which starts at state l and which has a next character that is the same as the next character in input FST 71 at state i. Given a current spelling FSA state of (i, l, t, cost), for all outgoing arcs from state l in lexicon FST 74 going to a state l' and labeled with the pair c/p (where c is a character and p is a pronunciation of that character), and for all outgoing arcs from state i in input FST 71 going to a state i' and labeled with the pair c/p' (where c is a character and p' is a pronunciation of the character), character identity module 80 creates an arc in spelling FSA 76 from state (i, l, t, cost) to a newly-added state (i', l', 0, cost), and labels that arc with character c.
  • Phonetic identity module 81 checks whether there is a word in lexicon FST 74 starting at state l whose next character is pronounced the same as the next character in input FST 71 at state i. For this processing, the phonetic representations of characters are processed.
  • phonetic identity module 81 creates an arc in spelling FSA 76 from state (i, l, t, cost) to a newly-added state
  • This newly-added state has its cost increased by a predetermined cost, called phonetic_identity_cost, which has a pre-set value that is associated with the fact that the pronunciation of a current character in input FST 71 is identical to the pronunciation of the current character in lexicon FST 74 even though the characters are different.
  • Character insertion module 82 inserts a character from lexicon FST 74 into input word 70 in input FST 71. More specifically, given a current spelling FSA state of (i, l, t, cost), for all outgoing arcs from state l in lexicon FST 74 going to a state l' and labeled with the pair c/p (where c is a character and p is a pronunciation of that character), character insertion module 82 creates an arc in spelling FSA 76 from state (i, l, t, cost) to state (i, l', 0, cost + insertion_cost), and labels that arc with character c.
  • This newly-added state has its cost increased by a predetermined cost, called insertion_cost, which has a pre-set value that is associated with the fact that a character has been inserted into word 70 in input FST 71.
  • Character deletion module 83 deletes a character from input word 70 in input FST 71.
  • character deletion module 83 creates an arc in spelling FSA 76, which is labeled with the "empty character" ε, from state (i, l, t, cost) to a newly added state (i', l, 0, cost + deletion_cost).
  • This newly added state has a cost that is increased by a predetermined cost, called deletion_cost, which has a pre-set value that is associated with the fact that a character has been deleted from input word 70 in input FST 71.
  • Character replacement module 84 replaces a next character in input word 70 with a next character in lexicon FST 74. More specifically, given a current spelling FSA state of (i, l, t, cost), for all outgoing arcs from state l in lexicon FST 74 going to a state l' and labeled with the pair c/p (where c is a character and p
  • character replacement module 84 creates an arc in spelling FSA 76 from state (i, l, t, cost) to a newly added state (i', l', 0, cost + replacement_cost), and labels that arc with character c.
  • This newly-added state has its cost increased by a predetermined cost, called replacement_cost, that has a pre-set value and that is associated with the fact that a character has been replaced by another character in input word 70.
  • Character transposition module 85 interchanges the order of two consecutive characters in input word 70, and checks the validity of the next character while remembering the original order of the characters. More specifically, given a current spelling FSA state of (i, l, t, cost), for all outgoing arcs from state i in input FST 71 going to a state i1 and labeled with the pair c1/p1 (where c1 is a character and p1 is a pronunciation of that character), for all outgoing arcs from state i1 in input FST 71 going to a state i2 and labeled with the pair c2/p2 (where c2 is a character and p2 is a pronunciation of that character), and for all outgoing arcs in lexicon FST 74 going from state l to state l' labeled with the pair c2/p' (where c2 is a character and p'
  • character transposition module 85 creates an arc in spelling FSA 76 from state (i, l, t, cost) to a newly-added state (i2, l', c1, cost + transposition_cost), and labels that arc with character c2.
  • This newly-added state has its cost increased by a predetermined cost, called transposition_cost, which has a value that is pre-set and that is associated with the fact that two characters have been transposed in input word 70.
  • Character transposition completion module 86 completes the transposition of two characters that was started by character transposition module 85. More specifically, given a current spelling FSA state (i, l, t, cost), where t is not zero (indicating that character transposition has occurred), for all outgoing arcs in lexicon FST 74 going from state l to state l' labeled with the pair t/p' (where t is a character and p' is a pronunciation of that character), character transposition completion module
  • transposition_completion_cost a predetermined cost
  • spelling FSA 76 has no cost.
  • state selection module 77 selects the following additional states
  • character identity module 80 was used.
  • input FST 71 remains at state 2, while lexicon FST 74 moves from state 3 to state 4, thereby creating state 90 in spelling FSA 76, which has a state of (2,4,0, 1).
  • an additional character namely a c
  • character insertion module 82 was used.
  • a cost of 1 is added to state 90 of spelling FSA 76.
  • input FST 71 moves from state 2 (i) to state 3 (i'), and lexicon FST 74 moves from state 4 (l) to state 5 (l'), thereby creating state 91 in spelling FSA 76, which has a state of (3,5,0,1) or (i',l',0,cost) and an arc with the character t.
  • character identity module
  • spelling suggestion module 52 and the rest of the invention for that matter, is described with respect to a word in an input text sequence comprised of plural words, the spell-checking aspect of the invention can be used equally well with a single-word input.
  • grammar checking aspects of the invention would not apply in this instance. Accordingly, those modules shown in Figures 2 and 3 which deal solely with grammar checking would simply be skipped when checking a single-word input.
  • Path enumeration module 104 analyzes the spelling FSA in order to associate words therein with appropriate costs, and outputs list 105 of suggested alternative words with their associated costs (e.g. , weight). Thereafter, processing ends.
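The interplay of state selection, the seven edit modules, and path enumeration can be sketched as a lowest-cost-first search over states (i, l, t). This is a simplified illustration, not the patented implementation. All cost values are hypothetical apart from the insertion cost of 1 used in the worked example above. The lexicon is a plain trie. The empty string stands for t = 0, and a state with a pending transposition is only passed to the completion step. The result of the completion step, (i, l', 0, cost + transposition_completion_cost), is also an assumption, since that part of the description is truncated.

```python
import heapq

# Hypothetical pre-set costs; only the insertion cost of 1 comes from the text.
COSTS = {"phonetic": 0.5, "insert": 1, "delete": 1, "replace": 1,
         "transpose": 1, "complete": 0}

LEXICON = {"cacti": "k@ktA", "caws": "kc-s", "face": "fes-",
           "fire": "fAr-", "foci": "fosA"}


def build_trie(pairs):
    """arcs[l] maps (char, phone) -> l'; prefix[l] is the spelling so far."""
    arcs, prefix, finals = {0: {}}, {0: ""}, set()
    for word, phones in pairs.items():
        l = 0
        for label in zip(word, phones):
            if label not in arcs[l]:
                new = len(arcs)
                arcs[l][label] = new
                arcs[new] = {}
                prefix[new] = prefix[l] + label[0]
            l = arcs[l][label]
        finals.add(l)
    return arcs, prefix, finals


def suggest(word, phones, lexicon, max_cost=2):
    """Return [(cost, reference_word)] sorted by increasing cost."""
    arcs, prefix, finals = build_trie(lexicon)
    n = len(word)
    heap, seen, found = [(0, 0, 0, "")], set(), []
    while heap:
        cost, i, l, t = heapq.heappop(heap)      # state selection: lowest cost
        if (i, l, t) in seen or cost > max_cost:
            continue
        seen.add((i, l, t))
        if i == n and l in finals and not t:     # whole input matched a word
            found.append((cost, prefix[l]))

        def push(extra, i2, l2, t2=""):
            heapq.heappush(heap, (cost + extra, i2, l2, t2))

        if t:                                    # transposition completion
            for (c, _p), l2 in arcs[l].items():
                if c == t:
                    push(COSTS["complete"], i, l2)
            continue
        if i < n:
            push(COSTS["delete"], i + 1, l)      # character deletion
        for (c, p), l2 in arcs[l].items():
            push(COSTS["insert"], i, l2)         # character insertion
            if i < n:
                if c == word[i]:
                    push(0, i + 1, l2)           # character identity
                elif p == phones[i]:
                    push(COSTS["phonetic"], i + 1, l2)   # phonetic identity
                else:
                    push(COSTS["replace"], i + 1, l2)    # character replacement
            if i + 1 < n and c == word[i + 1] and word[i] != c:
                push(COSTS["transpose"], i + 2, l2, word[i])  # transposition
    return sorted(found)


suggestions = suggest("cati", "c@ti", LEXICON)
```

For the misspelled word cati the cheapest suggestion is cacti, reached with a single insertion of the character c, which mirrors the walk through states (2,4,0,1) and (3,5,0,1) above (the state numbers of this trie differ from those in the figures).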
  • pronunciation conversion module 73 converts input word 70 into input FST 71.
  • pronunciation conversion module 73 converts any word, whether correctly spelled or misspelled, into an input FST which includes a phonetic representation of the input word, together with the input word.
  • Figure 6 shows an input FST for the word asthmatic with its pronunciation azmatic.
  • Pronunciation conversion module 73 utilizes a pre-stored phonetic dictionary of words, in which a pronunciation of each character of a word is associated with a phonetic symbol which represents the pronunciation of that character in the context of a word. In order to associate each character of an input word with a pronunciation, pronunciation conversion module 73 reads the input word from left to right and finds the longest context in the phonetic dictionary which matches the input word. Pronunciation conversion module 73 then transcribes that longest match with phonetic characters until no characters in the input word are left unpronounced. The output is represented as an FST (see, e.g., Figure 6), in which each arc is labeled with a pair c/p.
  • Automaton Conversion Module
  • automaton conversion module 55 is comprised of computer-executable process steps to generate an FSM for input text 50, which includes a plurality of arcs.
  • Each of these arcs includes an alternative word provided by spelling suggestion module 52 and a corresponding rank (e.g. , weight) of that word.
  • a rank (e.g., a weight)
  • the ranks may be derived from the cost provided by spelling suggestion module 52.
  • automaton conversion module 55 generates an FST; although an FSM may be used in the present invention as well.
  • An FST comprises a finite number of states, with arcs between the states.
  • Each arc is labeled with a pair of symbols.
  • the first symbol in each pair is an alternative word to the misspelled word found in text 50.
  • the second symbol of each pair is a number representing a rank for that word.
  • these rankings are determined based on the number of character transpositions, deletions, additions, etc. that must be performed on the misspelled word in order to arrive at each alternative word.
  • Figure 10 illustrates an FST generated by automaton conversion module 55 for the input text he thre a ball.
  • the word thre is misspelled (as determined by the spell-checking module).
  • spelling suggestion module 52 provides the following alternative words to automaton conversion module 55: then, there, the, thew and three.
  • the number and identity of these alternative words may vary depending upon the exact implementation of spelling suggestion module 52.
  • the alternative words are limited to those shown above.
  • ranks associated with the alternative words are negative, and correspond to a number of typographical changes that were made to the original word thre to arrive at each alternative word. For example, then has an associated weight of -2 because then can be obtained from thre by deleting the letter r and then inserting the letter n.
  • Figure 11 shows another example of an FST generated by automaton conversion module 55.
  • the FST is generated for the text He left the air baze.
  • the incorrectly spelled word is baze
  • the "out-of-context" alternative words provided by spelling suggestion module 52 are baize, bass, babe, base, bade.
  • the second symbol of each arc in the FST comprises a ranking, in this case a weight, for the alternative word on that arc. The higher the weight, the more likely the alternative word associated with that weight is the correct replacement word for the misspelled word.
  • suggested alternative words have negative weights which reflect the number of typographical and phonetic changes that were made to the original misspelled word.
  • Figure 12 shows computer-executable process steps in automaton conversion module 55 for generating such an FST. More specifically, in step S1201, text 50 is input into automaton conversion module 55, together with alternative words from spelling suggestion module 52. In step S1202, variables are initialized. Specifically, in this example, word number i is set to 1 so that, initially, the FST has a single state labeled i. Also, the variable n is set to the number of words in the input text.
  • step S1203 determines whether the i-th input word in the text is misspelled and, in preferred embodiments of the invention, whether the i-th word is one of a plurality of predetermined words that are commonly confused. This aspect of automaton conversion module 55 is described in more detail below.
  • If step S1203 determines that the i-th input word is misspelled, step S1204 generates a new state labeled i+1 for each of the alternative words provided by spelling suggestion module 52. Step S1204 also adds a transition from state i to state i+1. This transition is labeled with an alternative word and with a ranking (e.g., a negative weight). If, on the other hand, step S1203 determines that the i-th input word is not misspelled, step S1205 creates a new state i+1 and a transition from state i to state i+1. This transition is labeled with the i-th word and has a weight of zero.
  • In step S1206, the current state i is increased by one, and processing proceeds to step S1207. If step S1207 determines that the current state i is less than the number of words n, meaning that there are words in the input text still to be processed, flow returns to step S1203. If i equals n, processing ends, and the FST generated by steps S1201 to S1207 is output in step S1208.
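Steps S1201 to S1208 can be sketched as a single loop over the words of the input text. The arc tuple layout and the function name are assumptions. Of the weights shown, only the -2 for then is stated in the description; the others are hypothetical edit counts. The loop simply visits every word rather than reproducing the exact termination test of step S1207.

```python
def build_word_fst(words, alternatives):
    """Build the arcs of a linear word lattice.

    words: tokens of the input text.
    alternatives: dict mapping a misspelled (or commonly confused) token
        to a list of (alternative_word, negative_weight) pairs.
    Returns a list of arcs (from_state, to_state, label, weight).
    """
    arcs = []
    for i, word in enumerate(words, start=1):    # states are numbered from 1
        if word in alternatives:                 # S1204: one arc per alternative
            for alt, weight in alternatives[word]:
                arcs.append((i, i + 1, alt, weight))
        else:                                    # S1205: word kept, weight zero
            arcs.append((i, i + 1, word, 0))
    return arcs


# Only the weight of "then" (-2) is given in the text; the rest are assumed.
ALTS = {"thre": [("then", -2), ("there", -1), ("the", -1),
                 ("thew", -2), ("three", -1)]}
fst = build_word_fst("he thre a ball".split(), ALTS)
```

The resulting lattice has one arc for each correctly spelled word and five parallel arcs between states 2 and 3 for the alternatives to thre.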
  • automaton conversion module 55 may characterize words which are correctly spelled, but which are commonly confused, as misspelled words.
  • Appendix B shows a short list of such words.
  • this list is merely representative, and, in the actual invention, the list is much more extensive.
  • This list is preferably stored in a database, e.g. , in memory 20, and can be updated or modified via, e.g. , fax/modem line 10. Alternatively, this list may be accessed from a remote location via network connection 9.
  • automaton conversion module 55 identifies words which are often misused or confused based on such a list, and treats these words in the same manner as misspelled words provided by spelling suggestion module 52. That is, such words are included in arcs in the FST generated by automaton conversion module 55.
  • contextual ranking module 57 includes computer executable process steps to generate a second FST for the input text and the alternative words in accordance with one or more of a plurality of predetermined grammatical rules (with the first FSM being FST 56 described above) .
  • the second FST has a plurality of arcs therein which include the alternative words and ranks (e.g. , weights) associated therewith.
  • a weight of each alternative word corresponds to a likelihood that the alternative word, taken in grammatical context, comprises a correctly-spelled version of the misspelled word.
  • Contextual ranking module 57 also includes computer-executable process steps to add corresponding weights of the first FST and the second FSM, to rank the alternative words in accordance with the added weights, and to output a list of the alternative words ranked according to context.
  • Figure 13 shows computer-executable process steps in contextual ranking unit 57, together with executable modules included therein.
  • FST 56 is input.
  • FST 56 was generated by automaton conversion module 55, and includes alternative words (e.g. , misspelled words, commonly-confused words, etc.) ranked out of context.
  • an example of such an FST is shown in Figure 11 for the input text he left the air baze.
  • FST 56 is provided to compound words and lexical phrases module 110.
  • Compound words and lexical phrases module 110 identifies words which may comprise part of a predetermined list of compound words (i.e., a word comprised of two separate words), and also adds these words as arcs in FST 56.
  • the word stimulators is not necessarily misspelled, but is incorrect in context. That is, the typist meant to type flight simulators, but accidentally included an extra t in simulators.
  • Compound words and lexical phrases module 110 compares the word stimulators to a pre-stored database of compound words. In a case that an input word, in this case stimulators, is similar to a word in a compound word (as measured, e.g.
  • compound words and lexical phrases module 110 includes the compound word as an alternative word in an arc of FST 56, together with a single rank associated with the compound word.
  • a database of compound words is preferably pre-stored, e.g. , in memory 20.
  • each of the compound words in the database is associated with a part-of-speech that defines a syntactic behavior of the compound word in a sentence.
  • a noun-noun compound such as air base may be stored in the database and defined therein as a noun ("N").
  • each compound word or phrase has a single part-of-speech (e.g., part-of-speech tag "N", "Adv", etc.) associated therewith.
  • these words and phrases exhibit very little morphological or syntactic variation. For example, according to exhibits no morphological or syntactic variation.
  • air base can be pluralized (air bases), but little else.
  • Appendix C shows a list of representative compound words and phrases, together with their associated parts-of-speech, that are included in the database that is used by compound words and lexical phrase module 110.
  • compound words and lexical phrases module 110 also adds, to FST 56, a part-of-speech tag for each compound word or phrase.
  • compound words and lexical phrases module 110 also adds a relatively large weight to arcs containing potential compound words, reflecting the fact that a word may, more likely than not, be a compound word.
  • compound words and lexical phrases module 110 produces the FST shown in Figure 14. That is, compound words and lexical phrases module 110 adds a new arc labeled "air base#NOUN/9" from state 3 to state 5 in Figure 14. As shown in the figure, this arc passes over both the word air and the five alternative words (baize, bass, babe, base, and bade). This new arc treats "air base" as if it were one word acting as a noun with a relatively high weight of 9.
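The addition of a spanning arc for a compound word can be sketched as follows. The arc layout, the function name, and the weights of the single-word alternatives are hypothetical. States are numbered from 0 so that the new arc runs from state 3 to state 5, as in Figure 14. The sketch looks up exact two-word sequences only, which simplifies the similarity test described above.

```python
def add_compound_arcs(arcs, compounds, weight=9):
    """Add one spanning arc for every two-word sequence found in `compounds`.

    arcs: list of (src, dst, label, weight) arcs for single words.
    compounds: dict mapping "word1 word2" to a part-of-speech tag.
    """
    out = list(arcs)
    by_src = {}
    for src, dst, label, _w in arcs:
        by_src.setdefault(src, []).append((dst, label))
    for src, mid, first, _w in arcs:
        for dst, second in by_src.get(mid, []):
            phrase = f"{first} {second}"
            if phrase in compounds:
                arc = (src, dst, f"{phrase}#{compounds[phrase]}", weight)
                if arc not in out:
                    out.append(arc)
    return out


# Hypothetical out-of-context weights for the five alternatives.
LATTICE = [
    (0, 1, "he", 0), (1, 2, "left", 0), (2, 3, "the", 0), (3, 4, "air", 0),
    (4, 5, "baize", -1), (4, 5, "bass", -2), (4, 5, "babe", -2),
    (4, 5, "base", -1), (4, 5, "bade", -2),
]
with_compounds = add_compound_arcs(LATTICE, {"air base": "NOUN"})
```

The single-word arcs are kept, so later stages can still choose between the compound reading and the individual alternatives.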
  • FST 111 output by compound words and lexical phrase module 110 is provided to morphology module 112.
  • Morphology module 112 adds all possible morphological analyses of each word to FST 111.
  • This morphological analysis is performed using a pre-stored morphological dictionary of words.
  • this morphological dictionary is represented as a collection of small FSTs, each representing a possible morphological analysis of each word. Weights in such FSTs correspond to a relative likelihood that a word is a particular part-of-speech. For example, for the word left, FST 114 shown in Figure 15 is stored in the morphological dictionary.
  • each path of the FST has a length of three, with a first element being the initial word (in this case left) with a corresponding weight, the second element being a part-of-speech tag with a corresponding weight, and the third element being a root form of the initial word with a corresponding weight.
  • FST 114 shown in Figure 15 indicates that left can be an adjective ("ADJ") having a base form of left and a weight of 5, a noun (“N”) having a base form of left and a weight of 1, a verb in past participle form (“Vpp”) having a base form of leave and a weight of 4, or a verb in past tense form having a base form of leave and a weight of 3.
  • a weight of a particular path through an FST is computed as the sum of the weights of each of the arcs in the FST.
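The morphological dictionary entry for left, and the rule that a path weight is the sum of its arc weights, can be sketched as below. Placing each total on the part-of-speech arc is an assumption, since the description gives only one weight per analysis. The tag name "Vpast" for the past tense reading is hypothetical.

```python
# Each analysis of "left" is a three-arc path: word, part-of-speech tag, root.
# Putting the whole weight on the tag arc is an assumption for illustration;
# "Vpast" is a hypothetical tag name for the past tense reading.
LEFT_FST = [
    [("left", 0), ("ADJ", 5), ("left", 0)],
    [("left", 0), ("N", 1), ("left", 0)],
    [("left", 0), ("Vpp", 4), ("leave", 0)],
    [("left", 0), ("Vpast", 3), ("leave", 0)],
]


def path_weight(path):
    """Sum the weights of the arcs along one path."""
    return sum(weight for _label, weight in path)


def ranked_analyses(paths):
    """Return (tag, root, total_weight) triples, most likely first."""
    triples = [(p[1][0], p[2][0], path_weight(p)) for p in paths]
    return sorted(triples, key=lambda t: t[2], reverse=True)


analyses = ranked_analyses(LEFT_FST)
```

Out of context, the adjective reading ranks first; the grammar stage can later raise the weight of a verb reading when the surrounding words support it.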
  • Morphology module 112 replaces every arc in the FST which does not represent a compound word or a lexical phrase with an FST from the morphological dictionary.
  • an arc is replaced by three arcs, where a first arc includes the compound word or lexical phrase, the second arc includes the part-of-speech of the compound word or lexical phrase, and the third arc also includes the compound word or lexical phrase.
  • morphology module 112 outputs FST 116 shown in Figure 16.
  • grammar application module 117 comprises computer-executable process steps to receive a first FST 116 (or, in general, an FSM) from morphology module 112, where the first FST includes alternative words for at least one word in the input text and a weight (or, in general, a rank) associated with each alternative word. Grammar application module 117 then executes process steps to adjust the ranks in the first FST in accordance with one or more of a plurality of predetermined grammatical rules.
  • grammar application module 117 does this by generating a second FST (or, in general, an FSM) for the input text based on the predetermined grammatical rules, where the second FST includes the alternative words and ranks associated with each alternative word. The ranks in the second FST are then combined with the ranks in the first FST in order to generate a "contextual" FST in which weights of words therein are adjusted according to grammar.
  • Figure 17 depicts operation of grammar application module 117.
  • grammar application module 117 includes weight application module 119.
  • Weight application module 119 inputs FST 116 which was generated by morphology unit 112, together with grammar FST 120 (described below) which includes corresponding weights.
  • grammar FST 120 comprises general grammatical structures of a language, such as French, English, Spanish, etc., together with predetermined phrases in that language.
  • Grammar FST 120 has substantially the same format as parts of input FST 116. Every path in grammar FST 120 has a length which is a multiple of three.
  • Each arc therein includes three elements, with a first element comprising a reference word with a corresponding weight, a second element comprising a part-of-speech tag with a corresponding weight, and a third element comprising a root form of the reference word with a corresponding weight.
  • Weight application module 119 combines (e.g., adds) weights of input FST 116 and grammar FST 120 in order to produce a combined FST 121 in which weights therein are adjusted according to grammatical rules. More specifically, for each path from an initial state to a final state of grammar FST 120, weight application module 119 finds a corresponding path in input FST 116. Thereafter, weight application module 119 replaces weights of input FST 116 with the combined weights of input FST 116 and grammar FST 120. By doing this, weight application module 119 reinforces paths in input FST 116 which are also found in grammar FST 120.
  • grammar FST 120 might include a path which indicates that a singular noun precedes a verb in the third person. Such a path can be used to reinforce portions of input FST 116 where a noun precedes a verb in third person.
  • Figure 18 is an example of FST 121 which was produced by grammar application module 117 from the FST shown in Figure 16.
  • the weights on the path 125 corresponding to he left, where he is analyzed as a pronoun and left is analyzed as a verb in past tense, have been increased by weights application module 119.
  • the weight for this path has been increased since it matches the subject-verb agreement rule, which indicates that a pronoun can be the subject of a verb. This and other rules are described in more detail below.
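The reinforcement of a matching path can be sketched with one grammar path reduced to a sequence of part-of-speech tags and bonuses. The tuple layout, the tag names "PRON" and "Vpast", and the bonus values are all hypothetical. A real grammar FST would also constrain words and root forms.

```python
def apply_grammar(path, pattern):
    """Add pattern bonuses to every contiguous stretch of `path` whose
    part-of-speech tags match `pattern`.

    path: list of (word, tag, root, weight) analyses, one per token.
    pattern: list of (tag, bonus) pairs standing in for one grammar path.
    """
    result = list(path)
    tags = [tag for tag, _bonus in pattern]
    for start in range(len(path) - len(pattern) + 1):
        window = path[start:start + len(pattern)]
        if [tok[1] for tok in window] == tags:
            for offset, (_tag, bonus) in enumerate(pattern):
                word, tag, root, weight = result[start + offset]
                result[start + offset] = (word, tag, root, weight + bonus)
    return result


# Two competing readings of "he left"; tags and weights are illustrative.
VERB_READING = [("he", "PRON", "he", 0), ("left", "Vpast", "leave", 3)]
ADJ_READING = [("he", "PRON", "he", 0), ("left", "ADJ", "left", 5)]
SUBJECT_VERB = [("PRON", 10), ("Vpast", 10)]   # hypothetical bonuses

reinforced = apply_grammar(VERB_READING, SUBJECT_VERB)
```

After the bonuses are added, the verb reading outweighs the adjective reading, even though the adjective reading had the higher out-of-context weight.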
  • Grammar FST 120 (see Figure 17) is constructed from contextual grammatical rules, examples of which are set forth in Appendix A.
  • application rules indicate which rules must be applied, whereas definition rules define the rules themselves. Taking application rules first, application rules comprise items which do not contain an "equals" sign. For example, the application rule "*NP/0" indicates that a noun phrase rule (i.e. , a rule stating that all nouns must be preceded by determiners, such as a, an, this, etc.) must be applied with a weight of 0.
  • the weight of 0 means that, in the event that words in input FST 116 comply with this rule, a value of 0 is added to the weight of the matching words in input FST 116.
  • a "*" before an item in a rule indicates that the item is defined elsewhere by a definition rule. When there is no "*" before an item, the item refers to a word which can be specified with the word itself, its root form, and its part of speech. For example, the application rule
  • weights 10 and 20 are added to weights of the matching words in input FST 116.
  • WORD is a word
  • POS a part-of-speech
  • NUMBER is a weight
  • the root form is not specified and matches any root form.
  • WORD;ROOT POS/NUMBER
  • WORD is a word
  • ROOT its root form
  • POS its part-of-speech
  • NUMBER is a weight
  • NUMBER a weight; in this item, the word and its root form are not specified and match any word and root form.
  • Definition rules include an "equal" sign.
  • the left side of the equal sign includes an item of the form "*SYMBOL"; and the right side of the equal sign includes any sequence of items. For example,
  • *NP3S indicates that a noun phrase in the third person singular is formed by an adjective (*ADJP/0) and a noun (:N/10).
  • a noun in such words is incremented by 10 (from the 10 in ":N/10") and the adjective is not incremented (from the 0 in "*ADJP/0").
  • the grammatical rules are non-recursive, meaning that at no point does a symbol refer to itself.
  • the rules can be combined into a grammar FST for comparison with input FST 116.
  • items with a "*" preceding them are recursively replaced by their definitions.
  • the grammatical rules are converted into an FST by concatenating an FST of each obtained item.
  • Application rules are then used to define paths from an initial state to a final state in the constructed FST.
  • the present invention also includes specific grammatical constructions in the grammar FST.
  • general grammatical rules such as subject-verb agreement rules
  • Grammar FST 120 also includes auxiliary verb groups ("*VG"), examples of which are also shown in Appendix A.
  • FST 121 generated by grammar application module 117 is output to morphology deletion module 130.
  • Morphology deletion module 130 deletes unnecessary morphological information from the FST, such as part-of-speech information. Morphology deletion module 130 also reorganizes weights in the FST so that the weights correspond to possible alternatives to a misspelled word.
  • An example of such an FST is shown in Figure 19, in which only words and weights remain. As shown in Figure 19, base 132 has a weight of 14, since morphology deletion module 130 moved the weight of the compound "air base" to "base" (see Figure 18).
  • FST 134 having words and weights only, is then output from morphology deletion module 130 to best path enumeration module 135.
  • Best path enumeration module 135 sums the weights of each path of FST 134, and outputs a ranked list 136 of alternative words that can be used to replace a misspelled word or a grammatically-incorrect word in the input text.
  • this list of alternative words may contain words having an accent mark and/or a diacritic which is different from, and/or missing from, the original word.
  • this ranked list ranks the alternative words according to which have the highest weights. Of course, in a case that weights are not used, or different types of weights are used, the ranking can be performed differently.
  • the spelling and grammar checking system of the present invention may be used in conjunction with a variety of different types of applications. Examples of such uses of the invention are provided in more detail below.
  • Spelling and grammar checking code 49 of the present invention may be used in the context of a word processing application, such as those described above.
  • Figure 20 is a flow diagram depicting computer-executable process steps which are used in such a word processing application.
  • step S2001 inputs text into a text document.
  • step S2002 spell-checks the text so as to replace misspelled words in the text with correctly-spelled words.
  • step S2002 is performed in accordance with Figures 3 or 4 described above, and comprises detecting misspelled words in the text, and, for each misspelled word, determining a list of alternative words for the misspelled word, ranking the list of alternative words based on a context in the text, selecting one of the alternative words from the list, and replacing the misspelled word in the text with the selected one of the alternative words.
  • step S2003 checks the document for grammatically-incorrect words.
  • step S2003 checks the document by (i) generating a finite state machine (“FSM") for text in the text document, the finite state machine including alternative words for at least one word in the text and a rank associated with each alternative word, (ii) adjusting the ranks in the FSM in accordance with one or more of a plurality of predetermined grammatical rules, and (iii) determining which of the alternative words is grammatically correct based on ranks for the alternative words.
  • step S2004 replaces grammatically-incorrect words in the document with grammatically-correct words.
  • step S2005 outputs the document with little or no grammatical and/or spelling errors.
  • Spelling and grammar checking code 49 of the present invention may be used in the context of a machine translation system which translates documents from one language to another language, such as those described above.
  • Figure 21 is a flow diagram depicting computer-executable process steps which are used in such a machine translation system.
  • step S2101 inputs text in a first language
  • step S2102 spell-checks the text in the first language so as to replace misspelled words in the text with correctly-spelled words.
  • this spell-checking step is performed in accordance with Figures 3 or 4 described above, and comprises detecting misspelled words in the text, and, for each misspelled word, determining a list of alternative words for the misspelled word, ranking the list of alternative words based on a context in the text, selecting one of the alternative words from the list, and replacing the misspelled word in the document with the selected one of the alternative words.
  • step S2103 checks the text in the first language for grammatically-incorrect words.
  • Step S2103 does this by (i) generating a finite state machine ("FSM") for the text in the first language, the finite state machine including alternative words for at least one word in the text and a rank associated with each alternative word, (ii) adjusting the ranks in the FSM in accordance with one or more of a plurality of predetermined grammatical rules, and (iii) determining which of the alternative words is grammatically correct based on ranks for the alternative words. Grammatically-incorrect words in the text are then replaced with grammatically-correct words in step S2104.
  • step S2105 translates the text from the first language into the second language, and step S2106 spell-checks the text in the second language so as to replace misspelled words in the text with correctly-spelled words.
  • step S2106 spell checks the text in the same manner as did step S2102. Accordingly, a detailed description of this process is not repeated.
  • step S2107 checks the text in the second language for grammatically-incorrect words in the same manner that step S2103 checked the text in the first language. Accordingly, a detailed description of this process is not repeated.
  • Step S2108 then replaces grammatically-incorrect words in the text with grammatically-correct words, and step S2109 outputs the text with little or no grammatical and/or spelling errors.
  • Spelling and grammar checking code 49 of the present invention may be used in the context of an optical character recognition system which recognizes input character images.
  • Figure 22 is a flow diagram depicting computer-executable process steps which are used in such an optical character recognition system.
  • step S2201 inputs a document image, e.g., via scanner 13, and step S2202 parses character images from the document image. Thereafter, step S2203 performs character recognition processing on parsed character images so as to produce document text. Step S2204 then spell-checks the document text so as to replace misspelled words in the document text with correctly-spelled words.
  • This spell checking is performed in accordance with Figures 3 or 4 described above, and comprises detecting misspelled words in the document text, and, for each misspelled word, determining a list of alternative words for the misspelled word, ranking the list of alternative words based on a context in the text, selecting one of the alternative words from the list, and replacing the misspelled word in the document text with the selected one of the alternative words.
  • step S2205 checks the document text for grammatically-incorrect words. This checking is performed in accordance with Figures 3 or 4 described above, and comprises (i) generating a finite state machine ("FSM") for the document text, the finite state machine including alternative words for at least one word in the text and a rank associated with each alternative word, (ii) adjusting the ranks in the FSM in accordance with one or more of a plurality of predetermined grammatical rules, and (iii) determining which of the alternative words is grammatically correct based on ranks associated with the alternative words. Thereafter, step S2206 replaces grammatically-incorrect words in the document text with grammatically-correct words, and step S2207 outputs the document text with little or no grammatical and/or spelling errors.
  • Spelling and grammar checking code 49 of the present invention may be used in the context of a text indexing and retrieval system for retrieving text from a source based on an input search word.
  • Examples of such text indexing and retrieving systems in which the present invention may be used include, but are not limited to, Internet search engines, document retrieval software, etc.
  • Figure 23 is a flow diagram depicting computer-executable process steps which are used in such a text indexing and retrieval system.
  • step S2301 comprises inputting a search word or a search phrase comprised of plural search words
  • step S2302 comprises correcting a spelling of each search word to produce corrected search word(s).
  • step S2303 replaces grammatically-incorrect words in the search phrase with a grammatically-correct word in order to produce a corrected search phrase.
  • steps S2302 and S2303 are preferably performed by spelling and grammar checking code 49 shown in Figures 3 or 4.
  • Step S2304 then retrieves text from a source (e.g., a pre-stored database or a remote location such as a URL on the WWW) that includes the corrected search word/phrase, and step S2305 displays the retrieved text on a local display, such as display screen 11.
  • the spelling and grammar checking system of the present invention may also be utilized in a plurality of different hardware contexts.
  • the invention may be utilized in a client-server context.
  • a single computer such as PC 4
  • processor 38 is multi-tasking.
  • this aspect of the invention comprises computer-executable process steps to correct misspelled words in input text sequences received from a plurality of different clients.
  • the process steps include code to store in a memory on a server (e.g., PC 4 shown in Figure 1 or a stand-alone server), a lexicon comprised of a plurality of reference words, code to receive the input text sequences from the plurality of different clients (e.g., over fax/modem line 10, network interface 9, etc.), code to spell-check the input text sequences using the reference words in the lexicon, and code to output spell-checked text sequences to the plurality of different clients.
  • the lexicon comprises one or more lexicon FSTs (in general, FSMs), stored in a single memory, where the lexicon FSTs include the plurality of reference words and a phonetic representation of each reference word.
  • the spell-checking code comprises a code to correct misspelled words in each of the input text sequences substantially in parallel using the lexicon FSTs stored in the single memory. This code corresponds to that described above in Figures 3 and 4.
  • Figure 24 shows representative architecture of the client-server multi-threaded spelling correction system of the present invention. As shown in Figure 24, lexicon memory 150 (which stores lexicon FSTs of the type described above) is shared across each program thread 151, 152 and 153 of the client-server spelling correction system. In this regard, each program thread comprises a substantially complete copy of spelling and grammar checking code 49.
  • Each of program threads 151 to 153 contains a corresponding memory (i.e., memories 154, 155 and 156) that is used by processor 38 to execute that thread, as well as to perform other processing in relation thereto.
  • Each spelling memory also stores an FSA generated by spelling suggestion module 52 (see Figure 5), and may also store additional programs and variables.
  • Lexicon memory 150 is identical to a memory used to store the lexicon FSTs described with respect to Figure 5, but, unlike that in Figure 5, is being shared by plural program threads on the server.
  • multiple text sequences (TEXT1 160, TEXT2 161...TEXTn 162) from a plurality of different clients are input to the server from remote sources, such as a LAN, the Internet, a modem, or the like, and are processed by respective program threads. Specifically, each program thread identifies misspelled words in the text, and, using lexicon memory 150, outputs corrected text, as shown in Figure 24.
  • the operation of the spelling and grammar checking code used in this aspect of the invention is identical to that described above, with the only difference being memory allocation.
  • Figure 25 shows the multi-threaded client-server spelling correction system described above used in a text indexing and retrieval context (e.g., in conjunction with a WWW search engine, database searching software, etc.).
  • textual queries are sent to a database, and information related to the textual queries is retrieved from the database.
  • queries are misspelled and, as a result, correct information cannot be retrieved from the database.
  • the system shown in Figure 25 addresses this problem.
  • # relief to relief,N/10 to,Prep/40 # give birth to #give+V/10 birth,N/10 to,Prep/40 give;:V/10 birth,N/10 to,Prep/40 gave;give:Vpt/10 birth,N/10 to,Prep/40 given;give:Vpp/10 birth,N/10 to,Prep/40 gives;give:V3sg/10 birth,N/10 to,Prep/40
  • *VG-Complex *VG-BD/0
  • *VG-D is,V3sg/10 :Vpp/10
  • *VG-D are;be:xx/10 :Vpp/10
  • *VG-AC *MODAL/10 be,V/10 :Ving/10 # AD # may be examined
  • *VG-AD *MODAL/10 be,V/10 :Vpp/10
  • VG-not-Complex *VG-not-AB/0
  • VG-not-Complex *VG-not-AC/0
  • VG-not-Complex *VG-not-CD/0
  • VG-not-Complex *VG-not-ABC/0
  • *VG-not-C is,V3sg/10 not,Adv/10 :Ving/10
  • *VG-not-C are;be:xx/10 not,Adv/10 :Ving/10 # VG-not-D TO FD
  • VG-not-ACD *MODALnt 10 be,V/10 being, Ving:/ 10 :Vpp/10 # BCD # has been being examined
  • *NP *NP-SS/0
  • *NP3S *DETSING/0 *ADJP/0 :N/10

Abstract

A system of correcting misspelled words in input text detects a misspelled word in the input text, determines a list of alternative words for the misspelled word, and ranks the list of alternative words based on a context of the input text. The system then selects one of the alternative words from the list, and replaces the misspelled word in the text with the selected one of the alternative words.

Description

SPELLING AND GRAMMAR CHECKING SYSTEM
BACKGROUND OF THE INVENTION
Field Of The Invention
The present invention relates generally to a spelling and grammar checking system, and more particularly to a spelling and grammar checking system which corrects misspelled words, incorrectly-used words, and contextual and grammatical errors. The invention has particular utility in connection with machine translation systems, word processing systems, and text indexing and retrieval systems such as World Wide Web search engines.
Description Of The Related Art
Conventional spelling correction systems, such as those found in most common word processing applications, check whether each word in a document is found in a dictionary database. When a word is not found in the dictionary, the word is flagged as being incorrectly spelled. Suggestions for replacing the incorrectly-spelled word with its correctly-spelled counterpart are then determined by inserting, deleting and/or transposing characters in the misspelled word. For example, in a sentence like My son thre a ball at me, the word thre is not correctly spelled.
Conventional spelling correction systems, such as those described in U.S. Patent No. 4,580,241 (Kucera) and U.S. Patent No. 4,730,269 (Kucera), suggest words such as threw, three, there and the, as possible alternatives for the misspelled word by adding and deleting characters at different locations in the misspelled word. These alternative words are then displayed to a user, who must then select one of the alternatives.
One of the drawbacks of conventional systems is that they lack the ability to suggest alternative words based on the context in which the misspelled word appears. For example, in the following three sentences, the word thre appears in different contexts and, therefore, should be corrected differently in each sentence.
My son thre a ball through the window.
He broke thre window.
He moved thre years ago.
More specifically, in the first sentence, the incorrectly-spelled word thre should be replaced by threw. In the second sentence, the word thre should be replaced by the. In the third sentence, the word thre should be replaced by three. In spite of these differences in context, conventional spelling correction systems suggest the same list of alternative words, ranked in the same order, for all three of the foregoing sentences. For example, the spelling correction program provided in Microsoft® Word '97 suggests the following words, in the following order, for all three of the foregoing sentences: three, there, the, throe, threw.
Since conventional spelling correction systems do not rank alternative words according to context, such systems are not able to correct spelling mistakes automatically, since to do so often leads to an inordinate number of incorrectly corrected words. Rather, such systems typically use an interactive approach to correcting misspelled words. While such an approach can be effective, it is inefficient, and oftentimes very slow, particularly when large documents are involved. Accordingly, there exists a need for a spell checking system which is capable of ranking alternative words according to context, and which is also capable of automatically correcting misspelled words without significant user intervention.
Conventional spelling correction systems are also unable to correct grammatical errors in a document or other input text, particularly where words are spelled correctly but are misused in context. By way of example, although the word too is misused in the sentence He would like too go home, conventional spelling correction systems would not change too to to, since too is correctly spelled. In this regard, grammar checking systems are available which correct improperly used words (see, e.g., U.S. Patent No. 4,674,065 (Lange), U.S. Patent No. 5,258,909 (Damerau), U.S. Patent No. 5,537,317 (Schabes), U.S. Patent No. 4,672,571 (Bass), and U.S. Patent No. 4,847,766 (McRae)). Such systems, however, are of limited use, since they are only capable of correcting relatively short lists of predefined words. More importantly, such systems are not capable of performing grammar corrections on words that have been misspelled.
Accordingly, there exists a need for a spelling and grammar checking system which is capable of correcting words that have been misused in a given context, both in cases where the words have been spelled incorrectly and in cases where the words have been spelled correctly.
SUMMARY OF THE INVENTION
The present invention addresses the foregoing needs by providing a system which corrects both the spelling and grammar of words using finite state machines, such as finite state transducers and finite state automata. For each word in a text sequence, the present invention provides a list of alternative words ranked according to a context of the text sequence, and then uses this list to correct words in the text (either interactively or automatically). The invention has a variety of uses, and is of particular use in the fields of word processing, machine translation, text indexing and retrieval, and optical character recognition, to name a few.
In brief, the present invention determines alternatives for misspelled words, and ranks these alternatives based on a context in which the misspelled word occurs. For example, for the sentence My son thre a ball through the window, the present invention suggests the word threw as the best correction for the word thre, whereas for the sentence He broke thre window, the present invention suggests the word the as the best correction for the word thre. In its interactive mode, the invention displays alternative word suggestions to a user and then corrects misspelled words in response to a user's selection of an alternative word. In contrast, in its automatic mode, the present invention determines, on its own, which of the alternatives should be used, and then implements any necessary corrections automatically (i.e., without user input).
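The effect of context-dependent ranking can be shown with a deliberately simplified sketch. It substitutes a table of adjacent-word counts for the finite state machinery of the invention, so it illustrates only the idea that the same alternatives are ordered differently in different sentences. The counts in PAIR_COUNTS and the function name rank_alternatives are hypothetical.

```python
# Hypothetical counts of adjacent word pairs, invented for illustration.
PAIR_COUNTS = {
    ("son", "threw"): 5, ("threw", "a"): 9,
    ("broke", "the"): 8, ("the", "window"): 12,
    ("moved", "three"): 2, ("three", "years"): 11,
}


def rank_alternatives(left, alternatives, right, pair_counts):
    """Order alternatives by how often they appear next to their neighbours."""
    def score(word):
        return pair_counts.get((left, word), 0) + pair_counts.get((word, right), 0)
    # sorted() is stable, so alternatives with equal scores keep their input order.
    return sorted(alternatives, key=score, reverse=True)


ALTERNATIVES = ["three", "there", "the", "throe", "threw"]

BEST_1 = rank_alternatives("son", ALTERNATIVES, "a", PAIR_COUNTS)[0]         # My son thre a ball ...
BEST_2 = rank_alternatives("broke", ALTERNATIVES, "window", PAIR_COUNTS)[0]  # He broke thre window.
BEST_3 = rank_alternatives("moved", ALTERNATIVES, "years", PAIR_COUNTS)[0]   # He moved thre years ago.
```

In an interactive mode the full ranked list would be shown to the user; in an automatic mode only the first entry would be substituted into the text.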
Advantageously, the invention also addresses incorrect word usage in the same manner that it addresses misspelled words. Thus, the invention can be used to correct improper use of commonly-confused words such as who and whom, homophones such as then and than, and other such words that are spelled correctly, but that are improper in context. For example, the invention will correct the sentence He thre the ball to the sentence He threw the ball (and not three, the, ...); the sentence fragment flight smulator to flight simulator (and not stimulator); the sentence fragment air baze to air base (and not baize, bass, babe, or bade); the phrase Thre Miles Island to Three Miles Island (and not The or Threw); and the phrase ar traffic controller to air traffic controller (and not are, arc, ...). The invention also can be used to restore accents (such as á, à, é, ...) or diacritic marks (such as ñ, ç, ...) in languages such as French and Spanish. For example, the current invention corrects the sentence il l'a releve to il l'a relevé (and not relève, relèvent, ...).
According to one aspect, the present invention is a system (i.e., an apparatus, a method and/or computer-executable process steps) for correcting misspelled words in input text. The system detects a misspelled word in the input text, and determines a list of alternative words for the misspelled word. The list of alternative words is then ranked based on a context of the input text.
According to another aspect, the present invention is a word processing system for creating and editing text documents. The word processing system inputs text into a text document, spell-checks the text so as to replace misspelled words in the text with correctly-spelled words, and outputs the document. The spell-checking performed by the system comprises detecting misspelled words in the text, and, for each misspelled word, determining a list of alternative words for the misspelled word, ranking the list of alternative words based on a context in the text, selecting one of the alternative words from the list, and replacing the misspelled word in the text with the selected one of the alternative words.
According to another aspect, the present invention is a machine translation system for translating text from a first language into a second language. The machine translation system inputs text in the first language, spell-checks the text in the first language so as to replace misspelled words in the text with correctly-spelled words, translates the text from the first language into the second language, and outputs translated text. The spell-checking performed by the system comprises detecting misspelled words in the text, and, for each misspelled word, determining a list of alternative words for the misspelled word, ranking the list of alternative words based on a context in the text, selecting one of the alternative words from the list, and replacing the misspelled word in the document with the selected one of the alternative words.
According to another aspect, the present invention is a machine translation system for translating text from a first language into a second language. The machine translation system inputs text in the first language, translates the text from the first language into the second language, spell-checks the text in the second language so as to replace misspelled words in the text with correctly-spelled words, and outputs the text. The spell-checking performed by the system comprises detecting misspelled words in the text, and, for each misspelled word, determining a list of alternative words for the misspelled word, ranking the list of alternative words based on a context in the text, selecting one of the alternative words from the list, and replacing the misspelled word in the document with the selected one of the alternative words.
According to another aspect, the present invention is an optical character recognition system for recognizing input character images. The optical character recognition system inputs a document image, parses character images from the document image, performs recognition processing on parsed character images so as to produce document text, spell-checks the document text so as to replace misspelled words in the document text with correctly-spelled words, and outputs the document text. The spell-checking performed by the system comprises detecting misspelled words in the document text, and, for each misspelled word, determining a list of alternative words for the misspelled word, ranking the list of alternative words based on a context in the text, selecting one of the alternative words from the list, and replacing the misspelled word in the document text with the selected one of the alternative words.
According to another aspect, the present invention is a system for retrieving text from a source. The system inputs a search word, corrects a spelling of the search word to produce a corrected search word, and retrieves text from the source that includes the corrected search word.
According to another aspect, the present invention is a system for retrieving text from a source. The system inputs a search phrase comprised of a plurality of words, at least one of the plurality of words being an incorrect word, and replaces the incorrect word in the search phrase with a corrected word in order to produce a corrected search phrase. Text is then retrieved from the source based on the corrected search phrase.
According to another aspect, the present invention is a system for correcting misspelled words in input text sequences received from a plurality of different clients. The system stores, in a memory on a server, a lexicon comprised of a plurality of reference words, and receives the input text sequences from the plurality of different clients. The system then spell-checks the input text sequences using the reference words in the lexicon, and outputs spell-checked text sequences to the plurality of different clients.
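The client-server arrangement described above, in which one lexicon held in server memory is used to check text from many clients, might be sketched as follows. This is an illustration under stated assumptions rather than the described system: the lexicon is a plain set instead of lexicon FSMs, the function names check_text and serve are hypothetical, and each request merely reports the words not found in the lexicon instead of correcting them.

```python
from concurrent.futures import ThreadPoolExecutor

# One read-only lexicon shared by every worker thread (hypothetical contents).
LEXICON = frozenset({"he", "threw", "the", "ball", "broke", "window"})


def check_text(text):
    """Handle one client's text sequence.

    All working state is local to this call, which stands in for the
    per-thread memory; only LEXICON is shared.
    """
    return [word for word in text.lower().split() if word not in LEXICON]


def serve(texts, workers=3):
    """Process the clients' text sequences concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(check_text, texts))


RESULTS = serve(["He threw the ball", "He thre the ball", "He broke thre window"])
# RESULTS == [[], ['thre'], ['thre']]
```

Because the shared lexicon is never modified after start-up, the threads can read it without locking; only the per-request data needs to be private.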
According to another aspect, the present invention is a system for selecting a replacement word for an input word in a phrase. The system determines alternative words for the input word, the alternative words including at least one compound word which is comprised of two or more separate words, each alternative word having a rank associated therewith. The system then selects, as the replacement word, an alternative word having a highest rank.
According to another aspect, the present invention is a system for correcting grammatical errors in input text. The system generates a first finite state machine ("FSM") for the input text, the first finite state machine including alternative words for at least one word in the input text and a rank associated with each alternative word, and adjusts the ranks in the first FSM in accordance with one or more of a plurality of predetermined grammatical rules. The system then determines which of the alternative words is grammatically correct based on the ranks associated with the alternative words, and replaces the at least one word in the input text with a grammatically-correct alternative word determined in the determining step.
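The three operations in this aspect (a structure holding ranked alternatives, rank adjustment by grammatical rules, and selection by rank) can be illustrated without a real finite state machine. In the sketch below the input is a simple list of word slots, every path is enumerated by brute force, and a rule adds its weight wherever its part-of-speech pattern occurs. The part-of-speech tags, the two rules and their weights, the function name rank_paths, and the choice to treat the highest total as best are all assumptions made for illustration.

```python
from itertools import product

# Each slot lists (word, part of speech, initial weight) alternatives.
# Slot 2 holds the alternatives for the word "thre" in "He thre the ball".
LATTICE = [
    [("He", "Pron", 0)],
    [("three", "Num", 0), ("threw", "V", 0), ("the", "Det", 0)],
    [("the", "Det", 0)],
    [("ball", "N", 0)],
]

# Hypothetical rules: a part-of-speech pattern and the weight added on a match.
RULES = [
    (("Pron", "V"), 10),
    (("Det", "N"), 10),
]


def rank_paths(lattice, rules):
    """Sum the weights along every path, add rule weights, and rank the paths."""
    ranked = []
    for path in product(*lattice):
        tags = [pos for _, pos, _ in path]
        total = sum(weight for _, _, weight in path)
        for pattern, bonus in rules:
            n = len(pattern)
            matches = sum(tuple(tags[i:i + n]) == pattern
                          for i in range(len(tags) - n + 1))
            total += bonus * matches
        ranked.append((total, [word for word, _, _ in path]))
    ranked.sort(key=lambda entry: entry[0], reverse=True)
    return ranked


RANKED = rank_paths(LATTICE, RULES)
# RANKED[0] == (20, ['He', 'threw', 'the', 'ball'])
```

Enumerating every path is exponential in the number of slots and is used here only to keep the example short; an FSM-based implementation would avoid materialising each path.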
According to another aspect, the present invention is a word processing system for creating and editing text documents. The word processing system inputs text into a text document, checks the document for grammatically-incorrect words, replaces grammatically-incorrect words in the document with grammatically-correct words, and outputs the document. The checking performed by the system comprises (i) generating a finite state machine ("FSM") for text in the text document, the finite state machine including alternative words for at least one word in the text and a rank associated with each alternative word, (ii) adjusting the ranks in the FSM in accordance with one or more of a plurality of predetermined grammatical rules, and (iii) determining which of the alternative words is grammatically correct based on ranks for the alternative words.
According to another aspect, the present invention is a machine translation system for translating text from a first language into a second language. The machine translation system inputs text in the first language, checks the text in the first language for grammatically-incorrect words, and replaces grammatically-incorrect words in the text with grammatically-correct words. The machine translation system then translates the text with the grammatically-correct words from the first language into the second language, and outputs the text in the second language. The checking performed by the machine translation system comprises (i) generating a finite state machine ("FSM") for the text in the first language, the finite state machine including alternative words for at least one word in the text and a rank associated with each alternative word, (ii) adjusting the ranks in the FSM in accordance with one or more of a plurality of predetermined grammatical rules, and (iii) determining which of the alternative words is grammatically correct based on ranks for the alternative words.
According to another aspect, the present invention is a machine translation system for translating text from a first language into a second language. The machine translation system inputs text in the first language, translates the text from the first language into the second language, checks the text in the second language for grammatically-incorrect words, replaces grammatically-incorrect words in the text with grammatically-correct words, and outputs the text with the grammatically-correct words. The checking performed by the system comprises (i) generating a finite state machine ("FSM") for the text in the second language, the finite state machine including alternative words for at least one word in the text and a rank associated with each alternative word, (ii) adjusting the ranks in the FSM in accordance with one or more of a plurality of predetermined grammatical rules, and (iii) determining which of the alternative words is grammatically correct based on ranks for the alternative words.
According to another aspect, the present invention is an optical character recognition system for recognizing input character images. The optical character recognition system inputs a document image, parses character images from the document image, performs recognition processing on parsed character images so as to produce document text, checks the document text for grammatically-incorrect words, replaces grammatically-incorrect words in the document text with grammatically correct words, and outputs the document text. The checking performed by the system comprises (i) generating a finite state machine ("FSM") for the document text, the finite state machine including alternative words for at least one word in the text and a rank associated with each alternative word, (ii) adjusting the ranks in the FSM in accordance with one or more of a plurality of predetermined grammatical rules, and (iii) determining which of the alternative words is grammatically correct based on ranks for the alternative words.
According to another aspect, the present invention is a system for retrieving text from a source. The system inputs a search phrase comprised of a plurality of words, at least one of the plurality of words being a grammatically-incorrect word, replaces the grammatically-incorrect word in the search phrase with a grammatically-correct word in order to produce a corrected search phrase, and retrieves text from the source based on the corrected search phrase.
According to another aspect, the present invention is a system of spell-checking input text. The system detects a misspelled word in the input text, stores one or more lexicon finite state machines ("FSM") in a memory, each of the lexicon FSMs including plural reference words, generates an input FSM for the misspelled word, selects one or more reference words from the lexicon FSMs based on the input FSM, the one or more reference words substantially corresponding to a spelling of the misspelled word, and outputs selected ones of the one or more reference words.
This brief summary has been provided so that the nature of the invention may be understood quickly. A more complete understanding of the invention can be obtained by reference to the following detailed description of the preferred embodiments thereof in connection with the attached drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 shows representative computer-hardware on which the spelling and grammar checking system of the present invention may be executed.
Figure 2 shows the internal construction of the hardware shown in Figure 1.
Figure 3 depicts operation of the spelling and grammar checking system of the present invention in a manual mode.
Figure 4 depicts operation of the spelling and grammar checking system of the present invention in an automatic mode.
Figure 5 depicts operation of a spelling suggestion module used in the spelling and grammar checking system of the present invention.
Figure 6 depicts an input finite state transducer ("FST") generated by the spelling suggestion module depicted in Figure 5.
Figure 7 shows another example of an FST generated by the spelling suggestion module depicted in Figure 5.
Figure 8 shows an example of a lexicon FST used in the spelling suggestion module depicted in Figure 5.
Figure 9 shows an example of a spelling FST used in the spelling suggestion module depicted in Figure 5.
Figure 10 illustrates an FST generated by an automaton conversion module used in the spelling and grammar checking code shown in Figures 3 and 4.
Figure 11 shows another example of an FST generated by the automaton conversion module used in the spelling and grammar checking code shown in Figures 3 and 4.
Figure 12 shows process steps used by the automaton conversion module to generate FSTs.
Figure 13 shows process steps executed by a contextual ranking module in the spelling and grammar checking code to generate a ranked list of alternative words for a misspelled word.
Figure 14 shows an FST which includes a compound word which is used by the contextual ranking module to generate the ranked list.
Figure 15 shows an FST stored in a morphological dictionary which is used by the contextual ranking module to generate the ranked list.
Figure 16 shows an FST generated by a morphology module in the contextual ranking module.
Figure 17 shows operation of a grammar application module included in the contextual ranking module.
Figure 18 shows an FST generated by the grammar application module in the contextual ranking module.
Figure 19 shows an FST generated by a morphological deletion module of the present invention.
Figure 20 shows process steps for a word processing system which includes the spelling and grammar checking system of the present invention.
Figure 21 shows process steps for a machine translation system which includes the spelling and grammar checking system of the present invention.
Figure 22 shows process steps for an optical character recognition system which includes the spelling and grammar checking system of the present invention.
Figure 23 shows process steps for a text indexing and retrieving system which includes the spelling and grammar checking system of the present invention.
Figure 24 shows a client-server architecture which implements the present invention.
Figure 25 shows a text indexing and retrieving system implemented using the architecture shown in Figure 24.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Figure 1 shows a representative embodiment of a computer system on which the present invention may be implemented. As shown in Figure 1, PC 4 includes network connection 9 for interfacing to a network, such as a local area network ("LAN") or the World Wide Web (hereinafter "WWW"), and fax/modem connection 10 for interfacing with other remote sources. PC 4 also includes display screen 11 for displaying information to a user, keyboard 12 for inputting text and user commands, mouse 14 for positioning a cursor on display screen 11 and for inputting user commands, disk drive 16 for reading from and writing to floppy disks installed therein, and CD-ROM drive 17 for accessing information stored on CD-ROM. PC 4 may also have one or more peripheral devices attached thereto, such as scanner 13 for inputting document text images, graphics images, or the like, and printer 19 for outputting images, text, or the like.
Figure 2 shows the internal structure of PC 4. As shown in Figure 2, PC 4 includes memory 20, which comprises a computer-readable medium such as a computer hard disk. Memory 20 stores data 21, applications 22, print driver 24, and an operating system 26. In preferred embodiments of the invention, operating system 26 is a windowing operating system, such as Microsoft® Windows95, although the invention may be used with other operating systems as well. Among the applications stored in memory 20 are word processing programs 41, such as WordPerfect® and Microsoft® Word '97; Internet access program 42 (i.e., a web browser), such as Netscape®, which includes one or more search engines, such as Infoseek, Lycos, Yahoo!, Excite, AOL NetFind, HotBot, LookSmart, Snap!, and WebCrawler; other text indexing and retrieving programs 44, such as programs for accessing Lexis®-Nexis® and Westlaw® databases; machine translation system 46, such as Professional by Systran®, which translates words and/or documents retrieved, e.g., from the WWW, from one language (e.g., French) to another language (e.g., English); and optical character recognition (hereinafter "OCR") system 47 for recognizing characters from scanned-in documents or the like. Other applications may be stored in memory 20 as well. Among these other applications is spelling and grammar checking code 49 which comprises computer-executable process steps for performing contextual spelling and grammatical correction in the manner set forth in detail below.
Also included in PC 4 are display interface 29, keyboard interface 30, mouse interface 31, disk drive interface 32, CD-ROM drive interface 34, computer bus 36, RAM 37, processor 38, and printer interface 40. Processor 38 preferably comprises a microprocessor or the like for executing applications, such as those noted above, out of RAM 37. Such applications, including spelling and grammar checking code 49 of the present invention, may be stored in memory 20 (as noted above) or, alternatively, on a floppy disk in disk drive 16 or a CD-ROM in CD-ROM drive 17. In this regard, processor 38 accesses applications (or other data) stored on a floppy disk via disk drive interface 32 and accesses applications (or other data) stored on a CD-ROM via CD-ROM drive interface 34.
Application execution and other tasks of PC 4 may be initiated using keyboard 12 or mouse 14, commands from which are transmitted to processor 38 via keyboard interface 30 and mouse interface 31, respectively. Output results from applications running on PC 4 may be processed by display interface 29 and then displayed to a user on display 11. To this end, display interface 29 preferably comprises a display processor for forming images based on data provided by processor 38 over computer bus 36, and for outputting those images to display 11. Output results from applications, such as spelling and grammar checking code 49, running on PC 4 may also be provided to printer 19 via printer interface 40. In this case, processor 38 also executes print driver 24 so as to perform appropriate formatting of the output results prior to their transmission to printer 19.
Turning to spelling and grammar checking code 49, this code is comprised of computer-executable process steps for, among other things, detecting a misspelled word in input text, determining a list of alternative words for the misspelled word, ranking the list of alternative words based on a context of the input text, selecting one of the alternative words from the list, and replacing the misspelled word in the text with the selected one of the alternative words. In preferred embodiments, the present invention is operable in an interactive mode, in which the selecting step is performed manually (i.e. , a user selects an alternative word from the list), or in an automatic mode, in which the selecting step is performed automatically (i.e. , without user intervention) based on predetermined criteria. These modes are described in more detail below.
Interactive Mode
Figure 3 depicts operation of spelling and grammar checking code 49 in the interactive mode, and the various modules (i.e., computer-executable process steps) included therein. To begin, text 50 is input into the spelling and grammar checking system. Next, in step 51, a misspelled word in the text is detected by a spell-checking module (not shown). In preferred embodiments of the invention, step 51 detects misspelled words by comparing each word in the input text to a dictionary database and characterizing a word as misspelled when the word does not match any words in the dictionary database. To this end, step 51 also checks for proper placement of accent marks and/or diacritic marks in the input word. In cases where these marks are improperly placed, step 51 characterizes the word as misspelled.
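The dictionary comparison of step 51 can be illustrated with a minimal sketch. The function name `find_misspelled`, the punctuation handling, and the toy dictionary are assumptions made for illustration only; they are not taken from the specification. Because the comparison is an exact match, a word whose accent or diacritic marks are misplaced also fails the lookup.

```python
def find_misspelled(text, dictionary):
    """Return the words of `text` that have no exact match in `dictionary`."""
    # Hypothetical tokenization: split on whitespace and trim punctuation.
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    return [w for w in words if w and w not in dictionary]


# Toy dictionary database, for illustration only.
DICTIONARY = {"the", "cacti", "grow", "slowly", "café"}

print(find_misspelled("The cati grow slowly.", DICTIONARY))
print(find_misspelled("The cafe", DICTIONARY))
```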
Following step 51, the misspelled word is passed to spelling suggestion module 52. Spelling suggestion module 52 suggests "out-of-context" corrections for the misspelled word. That is, spelling suggestion module 52 determines a list of correctly-spelled alternative (or "replacement") words for the misspelled word without regard to the context in which the misspelled word appears in input text 50. A detailed description of the operation of spelling suggestion module 52 is provided below. For now, suffice it to say that spelling suggestion module 52 determines this list of alternative words by inserting, deleting, replacing, and/or transposing characters in the misspelled word until correctly-spelled alternative words are obtained. Spelling suggestion module 52 also identifies portions (e.g., characters) of the misspelled word which sound substantially similar to portions of correctly-spelled alternative words in order to obtain additional correctly-spelled alternative words. Once all alternative words have been determined, spelling suggestion module 52 ranks these words in a list based, e.g., on a number of typographical and/or phonetic modifications that must be made to the misspelled word in order to arrive at each alternative word.
List 54 of alternative words, which was output by spelling suggestion module 52, is then passed to automaton conversion module 55, along with original text 50. A detailed description of the operation of automaton conversion module 55 is provided below. For now, suffice it to say that automaton conversion module 55 converts text 50 and list 54 into an input finite state machine (hereinafter "FSM"), such as a finite state transducer (hereinafter "FST") or a finite state automaton (hereinafter "FSA"), having a plurality of arcs. Each arc in the input FSM includes an alternative word and a rank (e.g., a weight, a probability, etc.) associated with that alternative word. This rank corresponds to a likelihood that the alternative word, taken out of context, comprises a correctly-spelled version of the original misspelled word.
In this regard, the concept of FSTs is described in Roche, Emmanuel, "Text Disambiguation by Finite-State Automata: An Algorithm and Experiments on Corpora", Proceedings of the Conference, Nantes (1992), Roche, Emmanuel and Schabes, Yves, "Introduction to Finite-State Language Processing", Finite-State Language Processing (1997), Koskenniemi, Kimmo, "Finite-State Parsing and Disambiguation", Proceedings of the Thirteenth International Conference on Computational Linguistics, Helsinki, Finland (1990), and Koskenniemi et al., "Compiling and using Finite-State Syntactic Rules", Proceedings of the Fifteenth International Conference on Computational Linguistics (1992). The contents of these articles are hereby incorporated by reference into the subject application as if set forth herein in full. To summarize, FSTs are FSMs that have a finite number of states with arcs between the states. These arcs have one input and one or more outputs. Generally speaking, an FST functions as a particular method for mapping inputs to outputs. The present invention uses FSTs with weights, such as those described in Pereira et al., "Weighted Rational Transductions and Their Application to Human Language Processing", ARPA Workshop on Human Language Technology (1994). The contents of this article are hereby incorporated by reference into the subject application as if set forth herein in full.
Returning to Figure 3, in preferred embodiments of the invention, automaton conversion module 55 also identifies predetermined words in the input text which are commonly confused, but which are correctly spelled. Examples of such words are principal and principle, and who and whom. Specifically, in these embodiments of the invention, automaton conversion module 55 identifies such words by reference to a pre-stored database, and then either adds such words to the FSM or creates a new FSM specifically for these words. In other embodiments of the invention, these commonly-confused words may be identified by spelling suggestion module 52, characterized as misspelled words by virtue of their identification, and then processed in the same manner as misspelled words. In either case, the output of the automaton conversion module 55 is the same, i.e., an FSM containing arcs with alternative words for a misspelled word.
Automaton conversion module 55 then transmits input FSM 56 (which in preferred embodiments is an FST) to contextual ranking module 57. Contextual ranking module 57 ranks alternative words in input FSM 56 by taking into account the context (e.g., grammar, parts-of-speech, etc.) of text 50. In brief, contextual ranking module 57 generates a second FSM for text 50 and the alternative words in accordance with one or more of a plurality of predetermined grammatical rules. This second FSM is comprised of a plurality of arcs which include the alternative words and ranks (e.g., weights) associated therewith, where a rank of each alternative word corresponds to a likelihood that the alternative word, taken in grammatical context, comprises a correctly-spelled version of the misspelled word. Contextual ranking module 57 then combines corresponding ranks of input FSM 56 and the second FSM (e.g., contextual ranking module 57 adds weights from respective FSMs) so as to generate an "in-context" ranking of the alternative words. Then, contextual ranking module 57 outputs a list 59 of alternative words for the misspelled word, which are ranked according to context. A more detailed description of the operation of contextual ranking module 57 is provided below.
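The combination of ranks described above can be sketched as follows. This is only an illustration of the "adds weights from respective FSMs" example: the function `combine_ranks`, the flat dictionaries standing in for FSM arcs, the numeric weights, and the convention that a lower weight means a more likely alternative are all assumptions and are not stated in the specification.

```python
def combine_ranks(out_of_context, in_context):
    """Add the two weights of each alternative word and return the words
    ordered from most to least likely (assumed: lower weight is better)."""
    combined = {
        word: weight + in_context.get(word, 0.0)
        for word, weight in out_of_context.items()
    }
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(combined, key=lambda word: (combined[word], word))


# Hypothetical weights taken from the arcs of the input FSM and the second FSM.
input_fsm_weights = {"cacti": 1.0, "cat": 1.0, "cart": 2.0}
second_fsm_weights = {"cacti": 0.5, "cat": 2.0, "cart": 1.0}

print(combine_ranks(input_fsm_weights, second_fsm_weights))
```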
Ranked list 59 of alternative words, which was generated by contextual ranking module 57, is then displayed to a user, e.g., on display screen 11. In step 60, the user can then manually select (using, e.g. , mouse 14, keyboard 12, etc.) one of the alternative words from ranked list 59. Thereafter, the selected one of the alternative words (i.e. , selected alternative 61) is provided to replacement module 62, along with original text 50. Replacement module 62 replaces the misspelled word in text 50 with user-selected alternative word 61, and then outputs corrected text 63.
Automatic Mode
Figure 4 shows the operation of the automatic mode of the present invention. More specifically, Figure 4 depicts operation of spelling and grammar checking code 49 in the automatic mode, and the various modules (i.e., computer-executable process steps) included therein. Those modules which are identical to modules described above with respect to the interactive mode are described only briefly.
To begin, text 50 is input to spell checking module 64. Spell checking module 64 is identical to that described above in the interactive mode, except that, in this mode, spell checking module 64 searches through all of text 50 in order to detect all misspelled words. Which mode (i.e., interactive or automatic) spell checking module 64 operates in is set beforehand, e.g., in response to a user input. Once all misspelled words have been detected, spell checking module 64 outputs text 66 with the incorrectly-spelled words appropriately identified.
Next, text 66, i.e. , the text with the incorrectly spelled words identified, is provided to spelling suggestion module 52. Spelling suggestion module 52 is identical to that described above, except that, in this mode, spelling suggestion module 52 determines and outputs a list of correctly-spelled alternative (or "replacement") words for every misspelled word in text 50, rather than for just one misspelled word. Which mode (i.e. , interactive or automatic) spelling suggestion module 52 operates in is set beforehand, e.g. , in response to a user input.
As before, spelling suggestion module 52 outputs a list of "out-of-context" alternative words to automaton conversion module 55. Automaton conversion module 55 is identical to that described above, except that, in this mode, automaton conversion module 55 generates an FSM 56 (see above) for each misspelled word in input text 50. These FSMs are then transmitted to contextual ranking module 57. Contextual ranking module 57 is identical to that described above, in that it generates a second FSM for input text 50 based on a plurality of predetermined grammatical rules and combines this second FSM with FSM 56 generated by automaton conversion module 55 in order to provide a contextually-ranked list 59 of the alternatives for each misspelled word in text 50. Thereafter, list 59 is provided from contextual ranking module 57 to best suggestion selection module 60. Best suggestion selection module 60 selects the "best" alternative for each misspelled word, replaces each misspelled word in the text with its corresponding best alternative, and outputs corrected text 61, which includes these best alternatives in place of the misspelled words. In preferred embodiments of the invention, best suggestion selection module 60 selects each best alternative based on list 59 without any user intervention. For example, best suggestion selection module 60 may select the first, or highest, ranked alternative word in list 59, and then use that word to correct the input text.
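As a rough illustration of the "first, or highest, ranked" selection strategy mentioned above, the following sketch replaces each flagged word with the first entry of its ranked list. The function `auto_correct` and the dictionary-of-lists representation of list 59 are hypothetical; other selection criteria are equally possible.

```python
def auto_correct(words, ranked_alternatives):
    """Replace each misspelled word with its top-ranked alternative.

    `ranked_alternatives` maps a misspelled word to its contextually
    ranked list (best first); words without an entry are left unchanged.
    """
    corrected = []
    for word in words:
        alternatives = ranked_alternatives.get(word)
        corrected.append(alternatives[0] if alternatives else word)
    return corrected


print(auto_correct(["the", "cati", "grow"], {"cati": ["cacti", "cat"]}))
```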
Spelling Suggestion Module
In brief, spelling suggestion module 52 determines and outputs alternative words for a misspelled word in input text 50. In preferred embodiments of the invention, these alternative words are not ranked according to context, but rather are ranked based on the number of typographical changes that must be made to the misspelled word to arrive at an alternative word. To this end, spelling suggestion module 52 is comprised of computer-executable process steps to store one or more lexicon FSTs (in general, FSMs), where each of the lexicon FSTs includes plural reference words and a phonetic representation of each reference word, and to generate an input FST (in general, an FSM) for a misspelled word, where the input FST includes the misspelled word and a phonetic representation of the misspelled word. Spelling suggestion module 52 also includes computer-executable process steps to select one or more reference words from the lexicon FSTs based on the input FST, where the one or more reference words substantially correspond to either a spelling of the misspelled word or to the phonetic representation of the misspelled word.
In more detail, Figure 5 shows process steps comprising spelling suggestion module 52, together with sub-modules included therein. To begin, word 70 is input from a spell-checking module (see, e.g., Figure 4). Pronunciation conversion module 73 then converts input word 70 into input FST 71. The details of the operation of pronunciation conversion module 73 are provided below.
Input FST 71 represents the spelling and pronunciation of input word 70. More specifically, each arc of input FST 71 includes a pair c/p, where c is a character in input word 70 and p is a phonetic symbol representing the pronunciation of character c. Figure 6 shows such an input FST for the word asthmatic (with its pronunciation azmatic). Figure 7 shows an example of another input FST, this time for the misspelled word cati (with its pronunciation c@ti). The phonetic symbol "-" shown in Figure 6 is used to represent a character which is not pronounced. In this regard, although the present invention mostly employs standard characters to illustrate pronunciation, the invention is not limited to using such characters. In fact, any convention can be adopted.
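A minimal sketch of such an input FST is given below, using the cati/c@ti example. It assumes that the word and its pronunciation have already been aligned character by character (with "-" standing for a silent character); the function `build_input_fst` and the dictionary-of-arcs layout are illustrative assumptions, not the data structure of the specification.

```python
def build_input_fst(word, pronunciation):
    """Build a linear FST: state i has one arc (character, phoneme, i + 1)."""
    if len(word) != len(pronunciation):
        raise ValueError("word and pronunciation must be aligned one-to-one")
    return {
        i: [(c, p, i + 1)]
        for i, (c, p) in enumerate(zip(word, pronunciation))
    }


fst = build_input_fst("cati", "c@ti")
for state, arcs in fst.items():
    print(state, arcs)
```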
Lexicon FST 74 is preferably stored in a single memory, and comprises one or more lexicon FSTs (or FSMs, in general) which have been generated by the process steps of the present invention. Each of these lexicon FSTs includes plural reference words (e.g. , English-language words, French-language words, German-language words, etc.) and a phonetic representation of each reference word. An example of a lexicon FST is shown in Figure 8. This FST represents the following word/pronunciation pairs: cacti/k@ktA, caws/kc-s, face/fes-, fire/fAr-, and foci/fosA.
Spelling FSA 76 comprises an additional FSM which has been generated by the process steps of the present invention. Specifically, spelling FSA 76 includes a plurality of states, the states comprising at least states of lexicon FST 74 and states of input FST 71. Spelling FSA 76 is used to select one or more reference words from lexicon FST 74. These selected reference words comprise the alternative words for output by spelling suggestion module 52. In more detail, each state of spelling FSA 76 is identified by a quadruple (i, l, t, cost), in which the first element i is a state in input FST 71 and records which portion of input word 70 has been already processed; the second element l is a state in lexicon FST 74 which records words that are potential alternatives for the input word; the third element t indicates whether a character transposition has occurred in the input word (e.g., rluer to ruler, in which the l and u have been transposed) and thus whether characters preceding the transposed characters must be re-examined; and the fourth element cost is the cost associated with a current suggested alternative to input word 70, meaning an indication of the likelihood that the current suggested alternative is actually the correct spelling of input word 70. In this regard, in preferred embodiments of the invention, the lower the cost of a state in spelling FSA 76, the more likely that state represents a path to the correct spelling of input word 70.
Figure 9 shows a representative embodiment of spelling FSA 76. As shown in Figure 9, the arcs of spelling FSA 76 are labeled with characters which represent suggested alternatives for input word 70. To begin operation, spelling FSA 76 is initialized to state (i=0,l=0,t=0,cost=0) , which represents the fact that the process starts at the initial state 0 in input FST 71, and at initial state 0 in lexicon FST 74, with no character transpositions (represented by t=0) and a 0 cost.
Thereafter, each state of spelling FSA 76 is processed. Of course, the invention can be modified to process less than all states of spelling FSA 76. To this end, spelling suggestion module 52 includes state selection module 77. State selection module 77 selects which states of spelling FSA 76 are to be processed. For example, state selection module 77 may select states having lowest costs, so as to assure that potentially best solutions are processed first. Other embodiments of the present invention, of course, may use a different strategy. Once state selection module 77 has selected a state (i, l, t, cost) to be processed, this state is provided as input to each of the following modules: character identity module 80, phonetic identity module 81, character insertion module 82, character deletion module 83, character replacement module 84, character transposition module 85, and character transposition completion module 86. Each of these modules processes the current state (i, l, t, cost) 78 of spelling FSA 76 (as set by state selection module 77), and may also add new states to spelling FSA 76.
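The lowest-cost-first strategy mentioned above can be sketched with a priority queue. The names `State` and `StateSelector` are hypothetical, and the first-in, first-out handling of equal costs is an assumption; the specification leaves the selection strategy open.

```python
import heapq
from collections import namedtuple

# The quadruple (i, l, t, cost) identifying a state of the spelling FSA.
State = namedtuple("State", "i l t cost")


class StateSelector:
    """Hands out pending states in order of increasing cost."""

    def __init__(self):
        self._heap = []
        self._count = 0  # insertion counter; breaks ties between equal costs

    def add(self, state):
        heapq.heappush(self._heap, (state.cost, self._count, state))
        self._count += 1

    def select(self):
        return heapq.heappop(self._heap)[2]


selector = StateSelector()
for s in (State(2, 4, 0, 1), State(1, 2, 0, 0), State(3, 5, 0, 1)):
    selector.add(s)
print(selector.select())
```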
In brief, character identity module 80 determines whether characters of a reference word in lexicon FST 74 match characters of word 70 in input FST 71. Phonetic identity module 81 determines whether characters of the reference word are pronounced the same as characters of the input word. Character insertion module 82 determines whether a character inserted in the input word causes at least part of the input word to match at least part of the reference word. Character deletion module 83 determines whether a character deleted from the input word causes at least part of the input word to match at least part of the reference word. Character replacement module 84 replaces characters in the input word with characters in the reference word in order to determine whether at least part of the input word matches at least part of the reference word. Character transposition module 85 changes the order of two or more characters in the input word and compares a changed character in the input word to a corresponding character in the reference word. Finally, character transposition completion module 86 compares characters in the input word which were not compared by character transposition module 85 in order to determine if at least part of the input word matches at least part of the reference word.
In more detail, character identity module 80 checks whether there is a word in lexicon FST 74 which starts at state l and which has a next character that is the same as the next character in input FST 71 at state i. Given a current spelling FSA state of (i,l,t,cost), for all outgoing arcs from state l in lexicon FST 74 going to a state l' and labeled with the pair c/p (where c is a character and p is a pronunciation of that character), and for all outgoing arcs from state i in input FST 71 going to a state i' and labeled with the pair c/p' (where c is a character and p' is a pronunciation of the character), character identity module 80 creates an arc in spelling FSA 76 from state (i,l,t,cost) to a newly-added state (i',l',0,cost), and labels that arc with character c.
Phonetic identity module 81 checks whether there is a word in lexicon FST 74 starting at state l whose next character is pronounced the same as the next character in input FST 71 at state i. For this processing, the phonetic representations of characters are processed. That is, given a current spelling FSA state of (i,l,t,cost), for all outgoing arcs from state l in lexicon FST 74 going to a state l' and labeled with the pair c/p (where c is a character and p is a pronunciation of that character), and for all outgoing arcs from state i in input FST 71 going to a state i' and labeled with the pair c'/p (where c' is a character and p is a pronunciation of the character), phonetic identity module 81 creates an arc in spelling FSA 76 from state (i,l,t,cost) to a newly-added state (i',l',0,cost+phonetic_identity_cost), and labels that arc with character c. This newly-added state has its cost increased by a predetermined cost, called phonetic_identity_cost, which has a pre-set value that is associated with the fact that the pronunciation of a current character in input FST 71 is identical to the pronunciation of the current character in lexicon FST 74 even though the characters are different.
Character insertion module 82 inserts a character from lexicon FST 74 into input word 70 in input FST 71. More specifically, given a current spelling FSA state of (i,l,t,cost), for all outgoing arcs from state l in lexicon FST 74 going to a state l' and labeled with the pair c/p (where c is a character and p is a pronunciation of that character), character insertion module 82 creates an arc in spelling FSA 76 from state (i,l,t,cost) to a newly-added state (i,l',0,cost+insertion_cost), and labels that arc with character c. This newly-added state has its cost increased by a predetermined cost, called insertion_cost, which has a pre-set value that is associated with the fact that a character has been inserted into word 70 in input FST 71.
Character deletion module 83 deletes a character from input word 70 in input FST 71. More specifically, given a current spelling FSA state of (i,l,t,cost), for all outgoing arcs from state i in input FST 71 going to a state i' and labeled with the pair c/p (where c is a character and p is a pronunciation of that character), character deletion module 83 creates an arc in spelling FSA 76, which is labeled with "empty character" ε, from state (i,l,t,cost) to a newly-added state (i',l,0,cost+deletion_cost). This newly-added state has a cost that is increased by a predetermined cost, called deletion_cost, which has a pre-set value that is associated with the fact that a character has been deleted from input word 70 in input FST 71.
Character replacement module 84 replaces a next character in input word 70 with a next character in lexicon FST 74. More specifically, given a current spelling FSA state of (i,l,t,cost), for all outgoing arcs from state l in lexicon FST 74 going to a state l' and labeled with the pair c/p (where c is a character and p is a pronunciation of that character), and for all outgoing arcs from state i in input FST 71 going to a state i' and labeled with the pair c'/p' (where c' is a character and p' is a pronunciation of that character), character replacement module 84 creates an arc in spelling FSA 76 from state (i,l,t,cost) to a newly-added state (i',l',0,cost+replacement_cost), and labels that arc with character c. This newly-added state has its cost increased by a predetermined cost, called replacement_cost, that has a pre-set value and that is associated with the fact that a character has been replaced by another character in input word 70.
Character transposition module 85 interchanges the order of two consecutive characters in input word 70, and checks the validity of the next character while remembering the original order of the characters. More specifically, given a current spelling FSA state of (i,l,t,cost), for all outgoing arcs from state i in input FST 71 going to a state i1 and labeled with the pair c1/p1 (where c1 is a character and p1 is a pronunciation of that character), for all outgoing arcs from state i1 in input FST 71 going to a state i2 and labeled with the pair c2/p2 (where c2 is a character and p2 is a pronunciation of that character), and for all outgoing arcs in lexicon FST 74 going from state l to state l' labeled with the pair c2/p' (where c2 is a character and p' is a pronunciation of that character), character transposition module 85 creates an arc in spelling FSA 76 from state (i,l,t,cost) to a newly-added state (i2,l',c1,cost+transposition_cost), and labels that arc with character c2. This newly-added state has its cost increased by a predetermined cost, called transposition_cost, which has a value that is pre-set and that is associated with the fact that two characters have been transposed in input word 70.
Character transposition completion module 86 completes the transposition of two characters that was started by character transposition module 85. More specifically, given a current spelling FSA state (i, l, t, cost), where t is not zero (indicating that character transposition has occurred), for all outgoing arcs in lexicon FST 74 going from state l to state l' labeled with the pair t/p' (where t is a character and p' is a pronunciation of that character), character transposition completion module 86 creates an arc in spelling FSA 76 from the state (i, l, t, cost) to a newly-added state (i, l', 0, cost + transposition_completion_cost), and labels that arc with the character t. This newly-added state has its cost increased by a predetermined cost, called transposition_completion_cost, which has a value that is pre-set and that is associated with the fact that the second of the transposed characters has been read.
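The replacement, transposition and transposition completion steps can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the implementation of modules 84 to 86: the dictionary-based FST representation, the function names and the three cost values are all hypothetical, and an empty string stands in for the "0" that marks the absence of a pending transposition.

```python
from typing import Dict, List, Tuple

Arc = Tuple[str, str, int]          # (character, pronunciation, target state)
Fst = Dict[int, List[Arc]]          # state -> outgoing arcs
State = Tuple[int, int, str, int]   # (i, l, t, cost); t == "" means no pending transposition

# Assumed cost values; the text only says that they are pre-set.
REPLACEMENT_COST = 1
TRANSPOSITION_COST = 1
TRANSPOSITION_COMPLETION_COST = 0


def replace(state: State, input_fst: Fst, lexicon_fst: Fst) -> List[Tuple[str, State]]:
    """Advance both FSTs by one arc and label the new arc with the lexicon character."""
    i, l, _t, cost = state
    out = []
    for c, _p, l2 in lexicon_fst.get(l, []):
        for _c2, _p2, i2 in input_fst.get(i, []):
            out.append((c, (i2, l2, "", cost + REPLACEMENT_COST)))
    return out


def transpose(state: State, input_fst: Fst, lexicon_fst: Fst) -> List[Tuple[str, State]]:
    """Read c1 then c2 from the input, match c2 in the lexicon, remember c1 in t."""
    i, l, _t, cost = state
    out = []
    for c1, _p1, i1 in input_fst.get(i, []):
        for c2, _p2, i2 in input_fst.get(i1, []):
            for c, _p, l2 in lexicon_fst.get(l, []):
                if c == c2:
                    out.append((c2, (i2, l2, c1, cost + TRANSPOSITION_COST)))
    return out


def complete_transposition(state: State, lexicon_fst: Fst) -> List[Tuple[str, State]]:
    """Match the remembered character t in the lexicon and clear t."""
    i, l, t, cost = state
    if not t:
        return []
    return [(t, (i, l2, "", cost + TRANSPOSITION_COMPLETION_COST))
            for c, _p, l2 in lexicon_fst.get(l, []) if c == t]


# Input word "teh" against a one-word lexicon containing "the".
# Pronunciations are set equal to the characters purely for brevity.
input_fst = {0: [("t", "t", 1)], 1: [("e", "e", 2)], 2: [("h", "h", 3)]}
lexicon_fst = {0: [("t", "t", 1)], 1: [("h", "h", 2)], 2: [("e", "e", 3)]}

started = transpose((1, 1, "", 0), input_fst, lexicon_fst)
finished = complete_transposition(started[0][1], lexicon_fst)
```

In this sketch the transposition consumes both input characters at once and defers the second lexicon character to the completion step, which mirrors how the state component t carries the first of the two transposed characters.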
The following describes operation of some of the foregoing modules in an actual example. More specifically, with reference to Figures 7, 8 and 9, when input FST 71 (see Figure 7) moves from state 0 (i) to state 1 (i'), and lexicon FST 74 (see Figure 8) moves from state 0 (l) to state 2 (l'), state 88 is created in spelling FSA 76 (see Figure 9), which has a state of (1,2,0,0) or (i',l',0,cost) and an arc with the character c. In this example, there is no character transposition or cost, since character identity module 80 was used (i.e., there is a "c" in the arcs of both input FST 71 and lexicon FST 74). Accordingly, at state 88, spelling FSA 76 has no cost. Following this processing (i.e., if state selection module 77 selects the following additional states), when input FST 71 moves from state 1 (i) to state 2 (i'), and lexicon FST 74 moves from state 2 (l) to state 3 (l'), state 89 is created in spelling FSA 76, which has a state of (2,3,0,0) or (i',l',0,cost) and an arc with the character a. Again, there is no character transposition or cost, since character identity module 80 was used. Next, input FST 71 remains at state 2, while lexicon FST 74 moves from state 3 to state 4, thereby creating state 90 in spelling FSA 76, which has a state of (2,4,0,1). In this case, an additional character, namely a c, is added in lexicon FST 74 which is not present in input FST 71, i.e., character insertion module 82 was used. As a result, a cost of 1 is added to state 90 of spelling FSA 76. Next, input FST 71 moves from state 2 (i) to state 3 (i'), and lexicon FST 74 moves from state 4 (l) to state 5 (l'), thereby creating state 91 in spelling FSA 76, which has a state of (3,5,0,1) or (i',l',0,cost) and an arc with the character t. In this case, there is no character transposition or additional cost, since character identity module 80 was used. Finally, input FST 71 moves from state 3 (i) to end state 4 (i') (marked by double circle 93), and lexicon FST 74 moves from state 5 (l) to end state 13 (l') (marked by double circle 94), thereby generating state 95 in spelling FSA 76, which has a state of (4,13,0,1) or (i',l',0,cost) and an arc with the character i. Again, there is no character transposition or additional cost, since character identity module 80 was used.
Similar processing is also performed for the other states shown in lexicon FST 74 to create additional states 97 to 101 with character deletion module 83 being used between states 97 and 99, and with an ε in arcs between those states indicating that a character has been deleted from the word in input FST 71. Once this processing is finished, as shown in Figure 9, the cost of state 101 (i.e., 4) is higher than the cost of state 95 (i.e., 1). Accordingly, the word corresponding to the path of state 95 (in this case, cacti) is ranked by spelling suggestion module 52 higher than the word corresponding to the path of state 101 (in this case, caws).
At this point, it is noted that although spelling suggestion module 52, and the rest of the invention for that matter, is described with respect to a word in an input text sequence comprised of plural words, the spell-checking aspect of the invention can be used equally well with a single-word input. Of course, the grammar checking aspects of the invention would not apply in this instance. Accordingly, those modules shown in Figures 2 and 3 which deal solely with grammar checking would simply be skipped when checking a single-word input.
Once all states of input FST 71 and lexicon FST 74 have been processed in the foregoing manner, as determined in block 103 of Figure 5, the spelling FSA generated by the process is provided to path enumeration module 104. Path enumeration module 104 analyzes the spelling FSA in order to associate words therein with appropriate costs, and outputs list 105 of suggested alternative words with their associated costs (e.g. , weight). Thereafter, processing ends.
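As a rough illustration of what path enumeration module 104 produces, the following Python sketch walks an acyclic spelling FSA and reads the cost of each suggested word from its final state. The data layout and the function name are assumptions made for this sketch. The states on the cacti path follow the example above; the states on the caws path are hypothetical placeholders chosen only so that the final cost is 4.

```python
from typing import Dict, List, Set, Tuple

State = Tuple[int, int, int, int]              # (i, l, t, cost)
Fsa = Dict[State, List[Tuple[str, State]]]     # state -> (label, next state)


def enumerate_paths(arcs: Fsa, start: State, finals: Set[State]) -> List[Tuple[str, int]]:
    """Return (word, cost) pairs, lowest cost first. Assumes the FSA has no cycles."""
    best: Dict[str, int] = {}
    stack = [(start, "")]
    while stack:
        state, word = stack.pop()
        if state in finals:
            cost = state[3]
            if word not in best or cost < best[word]:
                best[word] = cost
        for label, nxt in arcs.get(state, []):
            stack.append((nxt, word + label))
    return sorted(best.items(), key=lambda wc: (wc[1], wc[0]))


spelling_fsa: Fsa = {
    (0, 0, 0, 0): [("c", (1, 2, 0, 0))],
    (1, 2, 0, 0): [("a", (2, 3, 0, 0))],
    (2, 3, 0, 0): [("c", (2, 4, 0, 1)),        # inserted character, cost 1
                   ("w", (3, 7, 0, 2))],       # hypothetical state on the "caws" path
    (2, 4, 0, 1): [("t", (3, 5, 0, 1))],
    (3, 5, 0, 1): [("i", (4, 13, 0, 1))],
    (3, 7, 0, 2): [("s", (4, 8, 0, 4))],       # hypothetical final state with cost 4
}
suggestions = enumerate_paths(
    spelling_fsa, (0, 0, 0, 0), {(4, 13, 0, 1), (4, 8, 0, 4)}
)
```

With these inputs, cacti (cost 1) is listed before caws (cost 4), matching the ranking described for states 95 and 101.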
Pronunciation Conversion Module
As noted above, pronunciation conversion module 73 converts input word 70 into input FST 71. In general, pronunciation conversion module 73 converts any word, whether correctly spelled or misspelled, into an input FST which includes a phonetic representation of the input word, together with the input word. As noted above, Figure 6 shows an input FST for the word asthmatic with its pronunciation azmatic.
Pronunciation conversion module 73 utilizes a pre-stored phonetic dictionary of words, in which a pronunciation of each character of a word is associated with a phonetic symbol which represents the pronunciation of that character in the context of a word. In order to associate each character of an input word with a pronunciation, pronunciation conversion module 73 reads the input word from left to right and finds the longest context in the phonetic dictionary which matches the input word. Pronunciation conversion module 73 then transcribes that longest match with phonetic characters until no characters in the input word are left unpronounced. The output is represented as an FST (see, e.g., Figure 6), in which each arc is labeled with a pair c/p.
Automaton Conversion Module
Returning to Figure 3, in brief, automaton conversion module 55 is comprised of computer-executable process steps to generate an FSM for input text 50, which includes a plurality of arcs. Each of these arcs includes an alternative word provided by spelling suggestion module 52 and a corresponding rank (e.g., weight) of that word. As noted above, a rank (e.g., a weight) of each alternative word corresponds to a likelihood that the alternative word, taken out of grammatical context, comprises a correctly-spelled version of a misspelled word. The ranks may be derived from the cost provided by spelling suggestion module 52.
In more detail, in preferred embodiments of the invention, automaton conversion module 55 generates an FST, although an FSM may be used in the present invention as well. For the sake of brevity, however, the invention will be described with respect to an FST. In this regard, such an FST comprises a finite number of states, with arcs between the states. Each arc is labeled with a pair of symbols. The first symbol in each pair is an alternative word to the misspelled word found in text 50. The second symbol of each pair is a number representing a rank for that word. As noted above, these rankings are determined based on the number of character transpositions, deletions, additions, etc. that must be performed on the misspelled word in order to arrive at each alternative word.
Figure 10 illustrates an FST generated by automaton conversion module 55 for the input text he thre a ball. In this text, the word thre is misspelled (as determined by the spell-checking module).
Accordingly, spelling suggestion module 52 provides the following alternative words to automaton conversion module 55: then, there, the, thew and three. Of course, the number and identity of these alternative words may vary depending upon the exact implementation of spelling suggestion module 52. In this embodiment of the invention, however, the alternative words are limited to those shown above. As shown in Figure 10, ranks associated with the alternative words are negative, and correspond to a number of typographical changes that were made to the original word thre to arrive at each alternative word. For example, then has an associated weight of -2 because then can be obtained from thre by deleting the letter r from thre and then inserting the letter n.
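The relationship between typographical changes and negative weights can be illustrated with a plain insert/delete/substitute edit distance. This is a simplifying assumption made for the sketch: the spelling suggestion module described earlier also handles transpositions and pronunciation, so its costs need not coincide with this metric. The function names are hypothetical.

```python
from typing import Dict, List


def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def out_of_context_weights(misspelled: str, alternatives: List[str]) -> Dict[str, int]:
    """Map each alternative to the negated number of edits needed to reach it."""
    return {word: -edit_distance(misspelled, word) for word in alternatives}


weights = out_of_context_weights("thre", ["then", "there", "the", "thew", "three"])
```

Under this metric, then receives a weight of -2, which agrees with the deletion of r followed by the insertion of n described above.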
Figure 11 shows another example of an FST generated by automaton conversion module 55. In the example shown in Figure 11, the FST is generated for the text He left the air baze. In this text, the incorrectly spelled word is baze, and the "out-of-context" alternative words provided by spelling suggestion module 52 are baize, bass, babe, base, bade. As noted above, the second symbol of each arc in the FST comprises a ranking, in this case a weight, for the alternative word on that arc. The higher the weight, the more likely the alternative word associated with that weight is the correct replacement word for the misspelled word. In the examples shown in Figures 10 and 11, suggested alternative words have negative weights which reflect the number of typographical and phonetic changes that were made to the original misspelled word. In this regard, as shown in Figure 11, the alternative words baize, babe, base and bade have the same weight, since each of these words differs from the misspelled word baze by the same number of typographical changes.
Figure 12 shows computer-executable process steps in automaton conversion module 55 for generating such an FST. More specifically, in step S1201, text 50 is input into automaton conversion module 55, together with alternative words from spelling suggestion module 52. In step S1202, variables are initialized. Specifically, in this example, word number i is set to 1 so that, initially, the FST has a single state labeled i. Also, the variable n is set to the number of words in the input text. Thereafter, step S1203 determines whether the ith input word in the text is misspelled and, in preferred embodiments of the invention, whether the ith word is one of a plurality of predetermined words that are commonly confused. This aspect of automaton conversion module 55 is described in more detail below.
If step S1203 determines that the ith input word is misspelled, step S1204 generates a new state labeled i+1 for each of the alternative words provided by spelling suggestion module 52. Step S1204 also adds a transition from state i to state i+1. This transition is labeled with an alternative word and with a ranking (e.g., a negative weight). If, on the other hand, step S1203 determines that the ith input word is not misspelled, step S1205 creates a new state i+1 and a transition from state i to state i+1. This transition is labeled with the ith word and has a weight of zero. Thereafter, in step S1206, current state i is increased by one, and processing proceeds to step S1207. If step S1207 determines that a current state i is less than the number of words n, meaning that there are words in the input text still to be processed, flow returns to step S1203. If i equals n, processing ends, and the FST generated by steps S1201 to S1207 is output in step S1208.
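The loop of Figure 12 can be approximated by the following Python sketch, which builds one arc per correctly spelled word and one arc per alternative for each flagged word. The arc tuple layout and the function name are assumptions. The alternatives mapping stands in for the output of spelling suggestion module 52; only the weight of -2 for then comes from the text, and the other weights are illustrative guesses rather than values read from Figure 10.

```python
from typing import Dict, List, Tuple

WordArc = Tuple[int, int, str, int]   # (source state, target state, word, weight)


def build_word_fst(words: List[str],
                   alternatives: Dict[str, List[Tuple[str, int]]]) -> List[WordArc]:
    """Create arcs from state i to state i+1 for the i-th word (states start at 1)."""
    arcs: List[WordArc] = []
    for i, word in enumerate(words, start=1):
        if word in alternatives:
            # Flagged word: one arc per alternative, carrying its out-of-context rank.
            for alt, weight in alternatives[word]:
                arcs.append((i, i + 1, alt, weight))
        else:
            # Accepted word: a single arc with weight zero.
            arcs.append((i, i + 1, word, 0))
    return arcs


alternatives = {
    "thre": [("then", -2), ("there", -1), ("the", -1), ("thew", -2), ("three", -1)],
}
fst_arcs = build_word_fst("he thre a ball".split(), alternatives)
```

Commonly confused words could be handled in the same way, by adding an entry for each such word to the alternatives mapping.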
As noted above, in preferred embodiments of the invention, automaton conversion module 55 may characterize words which are correctly spelled, but which are commonly confused, as misspelled words.
This is done in order to flag these words as possible candidates for the grammar correction process which is described in more detail below. Examples of such words include who and whom. That is, these words are often misused, such as in the sentence I need an assistant who I can trust. Similarly, homophones, such as principal and principle, are often confused.
Appendix B shows a short list of such words. Of course, this list is merely representative, and, in the actual invention, the list is much more extensive. This list is preferably stored in a database, e.g., in memory 20, and can be updated or modified via, e.g., fax/modem line 10. Alternatively, this list may be accessed from a remote location via network connection 9. Thus, automaton conversion module 55 identifies words which are often misused or confused based on such a list, and treats these words in the same manner as misspelled words provided by spelling suggestion module 52. That is, such words are included in arcs in the FST generated by automaton conversion module 55.
Contextual Ranking Module
Returning to Figure 3, in brief, contextual ranking module 57 includes computer-executable process steps to generate a second FST for the input text and the alternative words in accordance with one or more of a plurality of predetermined grammatical rules (with the first FST being FST 56 described above). The second FST has a plurality of arcs therein which include the alternative words and ranks (e.g., weights) associated therewith. In this second FST, a weight of each alternative word corresponds to a likelihood that the alternative word, taken in grammatical context, comprises a correctly-spelled version of the misspelled word. Contextual ranking module 57 also includes computer-executable process steps to add corresponding weights of the first FST and the second FST, to rank the alternative words in accordance with the added weights, and to output a list of the alternative words ranked according to context.
In more detail, Figure 13 shows computer-executable process steps in contextual ranking unit 57, together with executable modules included therein. To begin, FST 56 is input. As noted above, FST 56 was generated by automaton conversion module 55, and includes alternative words (e.g. , misspelled words, commonly-confused words, etc.) ranked out of context. As also noted above, an example of such an FST is shown in Figure 11 for the input text he left the air baze. As shown in Figure 13, FST 56 is provided to compound words and lexical phrases module 110.
Compound Words And Lexical Phrases Module
Compound words and lexical phrases module 110 identifies words which may comprise part of a predetermined list of compound words (i.e., a word comprised of two separate words), and also adds these words as arcs in FST 56. By way of example, in the sentence Pilots practice with flight stimulators, the word stimulators is not necessarily misspelled, but is incorrect in context. That is, the typist meant to type flight simulators, but accidentally included an extra t in simulators. Compound words and lexical phrases module 110 compares the word stimulators to a pre-stored database of compound words. In a case that an input word, in this case stimulators, is similar to a word in a compound word (as measured, e.g., by a number of typographical changes between the input word and a word in a compound word, in this case simulators), compound words and lexical phrases module 110 includes the compound word as an alternative word in an arc of FST 56, together with a single rank associated with the compound word. In the present invention, a database of compound words is preferably pre-stored, e.g., in memory 20. In preferred embodiments of the invention, each of the compound words in the database is associated with a part-of-speech that defines a syntactic behavior of the compound word in a sentence. For example, a noun-noun compound, such as air base, may be stored in the database and defined therein as a noun ("N").
Another example of a compound word is commercial passenger flight, which is defined in the database as a noun ("N"). Similarly, the phrase according to will be defined in the database as a preposition ("Prep"). As borne out in the examples provided above, in the database, each compound word or phrase has a single part-of-speech (e.g., part-of-speech tag "N", "Adv", etc.) associated therewith. Moreover, these words and phrases exhibit very little morphological or syntactic variation. For example, according to exhibits no morphological or syntactic variation. Similarly, air base can be pluralized (air bases), but little else. Appendix C shows a list of representative compound words and phrases, together with their associated parts-of-speech, that are included in the database that is used by compound words and lexical phrases module 110.
In preferred embodiments of the invention, compound words and lexical phrases module 110 also adds, to FST 56, a part-of-speech tag for each compound word or phrase. In addition, compound words and lexical phrases module 110 also adds a relatively large weight to arcs containing potential compound words, reflecting the fact that a word may, more likely than not, be a compound word. For the example FST shown in Figure 11, compound words and lexical phrases module 110 produces the FST shown in Figure 14. That is, compound words and lexical phrases module 110 adds a new arc labeled "air base#NOUN/9" from state 3 to state 5 in Figure 14. As shown in the figure, this arc passes over both the word air and the five alternative words (baize, bass, babe, base, and bade). This new arc treats "air base" as if it were one word acting as a noun with a relatively high weight of 9. Returning to Figure 13, FST 111 output by compound words and lexical phrases module 110 is provided to morphology module 112.
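The addition of the "air base#NOUN/9" arc can be sketched as below. This Python fragment is an illustration under several assumptions: the arc tuple layout and function name are hypothetical, the weights on the five alternative words are placeholders rather than values from Figure 11, and compounds are matched exactly against alternatives already present in the FST, whereas the module described above also measures similarity against the compound word database.

```python
from typing import Dict, List, Tuple

WordArc = Tuple[int, int, str, int]   # (source state, target state, label, weight)


def add_compound_arcs(arcs: List[WordArc],
                      compounds: Dict[str, str],
                      weight: int = 9) -> List[WordArc]:
    """Add one spanning arc for each pair of consecutive arcs that spells a compound."""
    by_source: Dict[int, List[WordArc]] = {}
    for arc in arcs:
        by_source.setdefault(arc[0], []).append(arc)

    added: List[WordArc] = []
    for src, mid, first, _w1 in arcs:
        for _mid, dst, second, _w2 in by_source.get(mid, []):
            phrase = f"{first} {second}"
            if phrase in compounds:
                # The label carries the part-of-speech tag, as in "air base#NOUN".
                added.append((src, dst, f"{phrase}#{compounds[phrase]}", weight))
    return arcs + added


arcs = [
    (3, 4, "air", 0),
    (4, 5, "baize", -1), (4, 5, "bass", -2), (4, 5, "babe", -1),   # placeholder weights
    (4, 5, "base", -1), (4, 5, "bade", -1),
]
with_compounds = add_compound_arcs(arcs, {"air base": "NOUN"})
```

The new arc runs from state 3 directly to state 5, so it bypasses both the arc for air and the arcs for the five alternatives, as described for Figure 14.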
Morphology Module
Morphology module 112 adds all possible morphological analyses of each word to FST 111. This morphological analysis is performed using a pre-stored morphological dictionary of words. In preferred embodiments of the invention, this morphological dictionary is represented as a collection of small FSTs, each representing a possible morphological analysis of each word. Weights in such FSTs correspond to a relative likelihood that a word is a particular part-of-speech. For example, for the word left, FST 114 shown in Figure 15 is stored in the morphological dictionary. As shown in Figure 15, each path of the FST has a length of three, with a first element being the initial word (in this case left) with a corresponding weight, the second element being a part-of-speech tag with a corresponding weight, and the third element being a root form of the initial word with a corresponding weight. Thus, FST 114 shown in Figure 15 indicates that left can be an adjective ("ADJ") having a base form of left and a weight of 5, a noun ("N") having a base form of left and a weight of 1, a verb in past participle form ("Vpp") having a base form of leave and a weight of 4, or a verb in past tense form having a base form of leave and a weight of 3.
In the present invention, a weight of a particular path through an FST is computed as the sum of the weights of each of the arcs in that path. For example, in the FST shown in Figure 15, the path from states 1 to 2 to 4 to 5, in which the word left is a verb in past participle form of the base verb leave, has a weight of 4 (i.e., 0+4+0 = 4).
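The path-weight computation can be illustrated with a small Python sketch of the dictionary entry for left. The list-of-paths representation is an assumption, the tag name "Vpast" is a hypothetical stand-in because the text does not name the past tense tag, and the placement of each weight on the part-of-speech arc is inferred from the 0+4+0 example.

```python
from typing import List, Tuple

LabeledArc = Tuple[str, int]      # (label, weight)
Path = List[LabeledArc]           # word arc, part-of-speech arc, root arc

# Morphological analyses of "left", each as a three-arc path.
LEFT_PATHS: List[Path] = [
    [("left", 0), ("ADJ", 5), ("left", 0)],
    [("left", 0), ("N", 1), ("left", 0)],
    [("left", 0), ("Vpp", 4), ("leave", 0)],
    [("left", 0), ("Vpast", 3), ("leave", 0)],   # "Vpast" is a hypothetical tag name
]


def path_weight(path: Path) -> int:
    """Sum the weights of the arcs along one path."""
    return sum(weight for _label, weight in path)


past_participle_weight = path_weight(LEFT_PATHS[2])
most_likely = max(LEFT_PATHS, key=path_weight)
```

Out of context, the adjective analysis has the highest weight in this entry; the grammatical rules applied later are what can change that ordering.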
Morphology module 112 replaces every arc in the FST which does not represent a compound word or a lexical phrase with an FST from the morphological dictionary. In addition, for each arc corresponding to a compound word or lexical phrase, such an arc is replaced by three arcs, where a first arc includes the compound word or lexical phrase, the second arc includes the part-of-speech of the compound word or lexical phrase, and the third arc also includes the compound word or lexical phrase. Thus, given as input the FST shown in Figure 14, morphology module 112 outputs FST 116 shown in Figure 16.
Grammar Application Module
Returning to Figure 13, FST 116 produced by morphology module 112 is provided to grammar application module 117. In brief, grammar application module 117 comprises computer-executable process steps to receive a first FST 116 (or, in general, an FSM) from morphology module 112, where the first FST includes alternative words for at least one word in the input text and a weight (or, in general, a rank) associated with each alternative word. Grammar application module 117 then executes process steps to adjust the ranks in the first FST in accordance with one or more of a plurality of predetermined grammatical rules. Specifically, grammar application module 117 does this by generating a second FST (or, in general, an FSM) for the input text based on the predetermined grammatical rules, where the second FST includes the alternative words and ranks associated with each alternative word. The ranks in the second FST are then combined with the ranks in the first FST in order to generate a "contextual" FST in which weights of words therein are adjusted according to grammar.
In more detail, Figure 17 depicts operation of grammar application module 117. As shown in Figure 17, grammar application module 117 includes weight application module 119. Weight application module 119 inputs FST 116 which was generated by morphology unit 112, together with grammar FST 120 (described below) which includes corresponding weights. In this regard, grammar FST 120 comprises general grammatical structures of a language, such as French, English, Spanish, etc., together with predetermined phrases in that language. Grammar FST 120 has substantially the same format as parts of input FST 116. Every path in grammar FST 120 has a length which is a multiple of three. Each arc therein includes three elements, with a first element comprising a reference word with a corresponding weight, a second element comprising a part-of-speech tag with a corresponding weight, and a third element comprising a root form of the reference word with a corresponding weight. A detailed description of the construction of grammar FST 120 is provided below.
Weights application module 119 combines (e.g., adds) weights of input FST 116 and grammar FST 120 in order to produce a combined FST 121 in which weights therein are adjusted according to grammatical rules. More specifically, for each path from an initial state to a final state of grammar FST 120, weights application module 119 finds a corresponding path in input FST 116. Thereafter, weights application module 119 replaces weights of input FST 116 with the combined weights of input FST 116 and grammar FST 120. By doing this, weights application module 119 reinforces paths in input FST 116 which are also found in grammar FST 120. For example, grammar FST 120 might include a path which indicates that a singular noun precedes a verb in the third person. Such a path can be used to reinforce portions of input FST 116 where a noun precedes a verb in third person.
Figure 18 is an example of FST 121 which was produced by grammar application module 117 from the FST shown in Figure 16. As shown in Figure 18, the weights on the path 125 corresponding to he left, where he is analyzed as a pronoun and left is analyzed as a verb in past tense, have been increased by weights application module 119. The weight for this path has been increased since it matches the subject-verb agreement rule, which indicates that a pronoun can be the subject of a verb. This and other rules are described in more detail below.
Construction Of Grammar FST
Grammar FST 120 (see Figure 17) is constructed from contextual grammatical rules, examples of which are set forth in Appendix A. In the present invention, there are two types of such rules: application rules and definition rules. Application rules indicate which rules must be applied, whereas definition rules define the rules themselves. Taking application rules first, application rules comprise items which do not contain an "equals" sign. For example, the application rule "*NP/0" indicates that a noun phrase rule (i.e. , a rule stating that all nouns must be preceded by determiners, such as a, an, this, etc.) must be applied with a weight of 0. The weight of 0 means that, in the event that words in input FST 116 comply with this rule, a value of 0 is added to the weight of the matching words in input FST 116. A "*" before an item in a rule, such as "*NP/0", indicates that the item is defined elsewhere by a definition rule. When there is no "*" before an item, the item refers to a word which can be specified with the word itself, its root form, and its part of speech. For example, the application rule
there,Adv/10 is;be:V3sg/20
indicates that the word there should be matched as an adverb, followed immediately by the word be in the third person singular, i.e., is. If a match is found, meaning that words in input FST 116 comply with this rule, weights 10 and 20 are added to weights of the matching words in input FST 116.
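How the weights 10 and 20 are added on a match can be sketched in Python as follows. This is a deliberately reduced illustration, not the FST-based combination performed on input FST 116: each word is given a single analysis instead of a lattice of alternatives, the tuple layouts and function names are assumptions, and the "Det" tag is hypothetical.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str, str, int]                              # (word, tag, root, weight)
RuleItem = Tuple[Optional[str], Optional[str], Optional[str], int]
# RuleItem is (word, tag, root, weight to add); None matches anything.


def matches(token: Token, item: RuleItem) -> bool:
    """Check one token against one rule item, treating None as a wildcard."""
    word, tag, root, _weight = token
    want_word, want_tag, want_root, _bonus = item
    return ((want_word is None or want_word == word)
            and (want_tag is None or want_tag == tag)
            and (want_root is None or want_root == root))


def apply_rule(tokens: List[Token], rule: List[RuleItem]) -> List[Token]:
    """Add the rule's weights to every run of consecutive tokens that matches it."""
    out = list(tokens)
    for start in range(len(tokens) - len(rule) + 1):
        window = tokens[start:start + len(rule)]
        if all(matches(tok, item) for tok, item in zip(window, rule)):
            for offset, item in enumerate(rule):
                word, tag, root, weight = out[start + offset]
                out[start + offset] = (word, tag, root, weight + item[3])
    return out


# The rule "there,Adv/10 is;be:V3sg/20" expressed as rule items.
there_is = [("there", "Adv", None, 10), ("is", "V3sg", "be", 20)]
sentence = [("there", "Adv", "there", 0), ("is", "V3sg", "be", 0),
            ("a", "Det", "a", 0), ("car", "N", "car", 0)]
reinforced = apply_rule(sentence, there_is)
```

Only the two matching words gain weight; the remaining words keep the weights they already had.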
Items to be matched by application rules can have any of the following formats:
*SYMBOL/NUMBER where SYMBOL is any symbol, and NUMBER is a weight; * indicates that the SYMBOL is defined elsewhere in the grammar.
WORD,POS/NUMBER where WORD is a word, POS a part-of-speech, and NUMBER is a weight; the root form is not specified and matches any root form.
WORD;ROOT:POS/NUMBER where WORD is a word, ROOT its root form, POS its part-of-speech, and NUMBER is a weight.
:POS/NUMBER where POS is a part-of-speech and NUMBER a weight; in this item, the word and its root form are not specified and match any word and root form.
Examples of some of the foregoing items are shown in the FST of Figure 10.
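The item formats listed above lend themselves to a small parser. The following Python sketch is one possible reading of that notation, assuming that a leading "*" marks a symbol (as in "*NP/0") and that weights are integers; the Item type and the function name are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Item(NamedTuple):
    symbol: Optional[str]   # set only for *SYMBOL items
    word: Optional[str]     # None matches any word
    root: Optional[str]     # None matches any root form
    pos: Optional[str]      # part-of-speech tag
    weight: int


# One pattern per item format, tried in order.
_PATTERNS = [
    re.compile(r"^\*(?P<symbol>[^/]+)/(?P<weight>-?\d+)$"),
    re.compile(r"^(?P<word>[^,;:/]+);(?P<root>[^:/]+):(?P<pos>[^/]+)/(?P<weight>-?\d+)$"),
    re.compile(r"^(?P<word>[^,;:/]+),(?P<pos>[^/]+)/(?P<weight>-?\d+)$"),
    re.compile(r"^:(?P<pos>[^/]+)/(?P<weight>-?\d+)$"),
]


def parse_item(text: str) -> Item:
    """Parse one rule item such as '*NP/0', 'there,Adv/10', 'is;be:V3sg/20' or ':N/10'."""
    for pattern in _PATTERNS:
        match = pattern.match(text)
        if match:
            groups = match.groupdict()
            return Item(groups.get("symbol"), groups.get("word"), groups.get("root"),
                        groups.get("pos"), int(groups["weight"]))
    raise ValueError(f"unrecognized rule item: {text!r}")


rule = [parse_item(part) for part in "there,Adv/10 is;be:V3sg/20".split()]
```

Fields that a format leaves unspecified come back as None, which a matcher can then treat as "matches anything".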
Definition rules include an "equal" sign. The left side of the equal sign includes an item of the form "*SYMBOL"; and the right side of the equal sign includes any sequence of items. For example,
*NP3S = *ADJP/0 :N/10
is a definition rule. In this example, *NP3S indicates that a noun phrase in the third person singular is formed by an adjective (*ADJP/0) and a noun (:N/10). In a case that words in input FST comply with this rule, a noun in such words is incremented by 10 (from the 10 in " :N/10") and the adjective is not incremented (from the 0 in "*ADJP/0").
In the present invention, the grammatical rules are non-recursive, meaning that at no point does a symbol refer to itself. As a result, the rules can be combined into a grammar FST for comparison with input FST 116. Specifically, to generate grammar FST 120, items with a "*" preceding them are recursively replaced by their definitions. Next, the grammatical rules are converted into an FST by concatenating an FST of each obtained item. Application rules are then used to define paths from an initial state to a final state in the constructed FST.
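The replacement of "*" items by their definitions can be sketched as a simple recursive expansion, which is guaranteed to terminate because no symbol refers to itself. In this Python sketch the dictionary layout and function name are assumptions, the definition given for *ADJP is hypothetical, and the weight attached to a "*" item is simply dropped because the text does not say how it is distributed over the expansion.

```python
from typing import Dict, List


def expand(items: List[str], definitions: Dict[str, List[str]]) -> List[str]:
    """Replace every '*SYMBOL/weight' item by the items of its definition."""
    expanded: List[str] = []
    for item in items:
        if item.startswith("*"):
            name = item.split("/")[0]          # "*NP3S/0" -> "*NP3S"
            # Simplification: the weight on the '*' item itself is ignored here.
            expanded.extend(expand(definitions[name], definitions))
        else:
            expanded.append(item)
    return expanded


definitions = {
    "*NP3S": ["*ADJP/0", ":N/10"],   # definition rule from the text
    "*ADJP": [":A/0"],               # hypothetical definition, for illustration only
}
flat = expand(["*NP3S/0"], definitions)
```

The flattened item sequence is what would then be turned into a chain of arcs when the grammar FST is assembled.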
In addition to general grammatical rules (such as subject-verb agreement rules), the present invention also includes specific grammatical constructions in the grammar FST. For example, the application rule
too,Adv/10 ,A/40 to,Prep/10
corresponds to the construction "too ADJECTIVE to", as in the sentence "He is too young to drive". Another example of such a construction is:
there,Adv/10 is;be:V3sg/20,
which is used for sentences such as "There is a car in his parking space". Grammar FST 120 also includes auxiliary verb groups ("*VG"), examples of which are also shown in Appendix A.
Post-Grammar Application Module Processing
Returning to Figure 13, FST 121 generated by grammar application module 117 (see, e.g. , Figure 18) is output to morphology deletion module 130. Morphology deletion module 130 deletes unnecessary morphological information from the FST, such as part-of-speech information. Morphology deletion module 130 also reorganizes weights in the FST so that the weights correspond to possible alternatives to a misspelled word. An example of such an FST is shown in Figure 19, in which only words and weights remain. As shown in Figure 19, base 132 has a weight of 14, since morphology deletion module 130 moved the weight of the compound "air base" to "base" (see Figure 18). FST 134, having words and weights only, is then output from morphology deletion module 130 to best path enumeration module 135. Best path enumeration module 135 sums the weights of each path of FST 134, and outputs a ranked list 136 of alternative words that can be used to replace a misspelled word or a grammatically-incorrect word in the input text. In accordance with the invention, and particularly in cases where the invention is used in a non-English-language context, this list of alternative words may contain words having an accent mark and/or a diacritic which is different from, and/or missing from, the original word. In addition, in preferred embodiments of the invention, this ranked list ranks the alternative words according to which have the highest weights. Of course, in a case that weights are not used, or different types of weights are used, the ranking can be performed differently.
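The final summing and ranking step can be illustrated with the Python sketch below, which enumerates every path through a small word-and-weight FST and sorts the paths by total weight. The arc layout and function name are assumptions. Only the weight of 14 for base comes from the text; every other weight here is a placeholder and does not reproduce Figure 19.

```python
from typing import Dict, List, Tuple

WordArc = Tuple[int, int, str, int]   # (source state, target state, word, weight)


def ranked_paths(arcs: List[WordArc], start: int, final: int) -> List[Tuple[int, str]]:
    """Return (total weight, sentence) for every path, highest weight first."""
    outgoing: Dict[int, List[Tuple[int, str, int]]] = {}
    for src, dst, word, weight in arcs:
        outgoing.setdefault(src, []).append((dst, word, weight))

    results: List[Tuple[int, str]] = []

    def walk(state: int, words: List[str], total: int) -> None:
        if state == final:
            results.append((total, " ".join(words)))
            return
        for dst, word, weight in outgoing.get(state, []):
            walk(dst, words + [word], total + weight)

    walk(start, [], 0)
    return sorted(results, key=lambda result: -result[0])


arcs = [
    (0, 1, "he", 0), (1, 2, "left", 0), (2, 3, "the", 0), (3, 4, "air", 0),  # placeholder weights
    (4, 5, "base", 14),                                                       # weight from the text
    (4, 5, "bass", -2), (4, 5, "bade", -1),                                   # placeholder weights
]
ranking = ranked_paths(arcs, 0, 5)
```

The first entry of the ranking is the sentence containing base, since its arc carries the weight moved over from the compound "air base".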
The spelling and grammar checking system of the present invention may be used in conjunction with a variety of different types of applications. Examples of such uses of the invention are provided in more detail below.
Word Processing
Spelling and grammar checking code 49 of the present invention may be used in the context of a word processing application, such as those described above. Figure 20 is a flow diagram depicting computer-executable process steps which are used in such a word processing application.
More specifically, step S2001 inputs text into a text document. Next, step S2002 spell-checks the text so as to replace misspelled words in the text with correctly-spelled words. In preferred embodiments of the invention, step S2002 is performed in accordance with Figures 3 or 4 described above, and comprises detecting misspelled words in the text, and, for each misspelled word, determining a list of alternative words for the misspelled word, ranking the list of alternative words based on a context in the text, selecting one of the alternative words from the list, and replacing the misspelled word in the text with the selected one of the alternative words. Next, step S2003 checks the document for grammatically-incorrect words. In preferred embodiments of the invention, step S2003 checks the document by (i) generating a finite state machine ("FSM") for text in the text document, the finite state machine including alternative words for at least one word in the text and a rank associated with each alternative word, (ii) adjusting the ranks in the FSM in accordance with one or more of a plurality of predetermined grammatical rules, and (iii) determining which of the alternative words is grammatically correct based on ranks for the alternative words. Finally, step S2004 replaces grammatically-incorrect words in the document with grammatically-correct words, and step S2005 outputs the document with little or no grammatical and/or spelling errors.
Machine Translation
Spelling and grammar checking code 49 of the present invention may be used in the context of a machine translation system which translates documents from one language to another language, such as those described above. Figure 21 is a flow diagram depicting computer-executable process steps which are used in such a machine translation system.
More specifically, step S2101 inputs text in a first language, and step S2102 spell-checks the text in the first language so as to replace misspelled words in the text with correctly-spelled words. In preferred embodiments of the invention, this spell-checking step is performed in accordance with Figures 3 or 4 described above, and comprises detecting misspelled words in the text, and, for each misspelled word, determining a list of alternative words for the misspelled word, ranking the list of alternative words based on a context in the text, selecting one of the alternative words from the list, and replacing the misspelled word in the document with the selected one of the alternative words. Next, step S2103 checks the text in the first language for grammatically-incorrect words. Step S2103 does this by (i) generating a finite state machine ("FSM") for the text in the first language, the finite state machine including alternative words for at least one word in the text and a rank associated with each alternative word, (ii) adjusting the ranks in the FSM in accordance with one or more of a plurality of predetermined grammatical rules, and (iii) determining which of the alternative words is grammatically correct based on ranks for the alternative words. Grammatically-incorrect words in the text are then replaced with grammatically-correct words in step S2104.
Following step S2104, step S2105 translates the text from the first language into the second language, and step S2106 spell-checks the text in the second language so as to replace misspelled words in the text with correctly-spelled words. In preferred embodiments of the invention, step S2106 spell checks the text in the same manner as did step S2102. Accordingly, a detailed description of this process is not repeated. Thereafter, step S2107 checks the text in the second language for grammatically-incorrect words in the same manner that step S2103 checked the text in the first language. Accordingly, a detailed description of this process is not repeated. Step S2108 then replaces grammatically-incorrect words in the text with grammatically-correct words, and step S2109 outputs the text with little or no grammatical and/or spelling errors.
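By way of illustration only, the ordering of steps S2101 to S2109 may be sketched in Python as a chain of text-to-text stages. The names `translate_with_checks`, `TextStep` and `logging_step` are hypothetical and do not appear in the foregoing description; the spell-checking, grammar-checking and translation stages are supplied by the caller and are not implemented here.

```python
from typing import Callable, List

# Hypothetical type: any stage that maps text to text.
TextStep = Callable[[str], str]


def translate_with_checks(
    text: str,
    spell_first: TextStep,
    grammar_first: TextStep,
    translate: TextStep,
    spell_second: TextStep,
    grammar_second: TextStep,
) -> str:
    """Chain the stages in the order given for steps S2101 to S2109."""
    text = spell_first(text)      # S2102: spell-check in the first language
    text = grammar_first(text)    # S2103-S2104: grammar check and replacement
    text = translate(text)        # S2105: first language to second language
    text = spell_second(text)     # S2106: spell-check in the second language
    text = grammar_second(text)   # S2107-S2108: grammar check and replacement
    return text                   # S2109: output


def logging_step(name: str, log: List[str]) -> TextStep:
    """Build a placeholder stage that records its name and returns the text unchanged."""
    def step(text: str) -> str:
        log.append(name)
        return text
    return step
```

Passing the stages in as functions keeps the sketch independent of any particular spelling, grammar or translation engine; the placeholder stages merely make the order of execution observable.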
Optical Character Recognition
Spelling and grammar checking code 49 of the present invention may be used in the context of an optical character recognition system which recognizes input character images. Figure 22 is a flow diagram depicting computer-executable process steps which are used in such an optical character recognition system.
More specifically, step S2201 inputs a document image, e.g., via scanner 13, and step S2202 parses character images from the document image. Thereafter, step S2203 performs character recognition processing on parsed character images so as to produce document text. Step S2204 then spell-checks the document text so as to replace misspelled words in the document text with correctly-spelled words. This spell checking is performed in accordance with Figures 3 or 4 described above, and comprises detecting misspelled words in the document text, and, for each misspelled word, determining a list of alternative words for the misspelled word, ranking the list of alternative words based on a context in the text, selecting one of the alternative words from the list, and replacing the misspelled word in the document text with the selected one of the alternative words. Next, step S2205 checks the document text for grammatically-incorrect words. This checking is performed in accordance with Figures 3 or 4 described above, and comprises (i) generating a finite state machine ("FSM") for the document text, the finite state machine including alternative words for at least one word in the text and a rank associated with each alternative word, (ii) adjusting the ranks in the FSM in accordance with one or more of a plurality of predetermined grammatical rules, and (iii) determining which of the alternative words is grammatically correct based on ranks for the alternative words. Thereafter, step S2206 replaces grammatically-incorrect words in the document text with grammatically-correct words, and step S2207 outputs the document text with little or no grammatical and/or spelling errors.
Text Indexing And Retrieval
Spelling and grammar checking code 49 of the present invention may be used in the context of a text indexing and retrieval system for retrieving text from a source based on an input search word. Examples of such text indexing and retrieving systems in which the present invention may be used include, but are not limited to, Internet search engines, document retrieval software, etc. Figure 23 is a flow diagram depicting computer-executable process steps which are used in such a text indexing and retrieval system.
More specifically, step S2301 comprises inputting a search word or a search phrase comprised of plural search words, and step S2302 comprises correcting a spelling of each search word to produce corrected search word(s). Next, in a case that a search phrase is input, step S2303 replaces grammatically-incorrect words in the search phrase with grammatically-correct words in order to produce a corrected search phrase. In the invention, steps S2302 and S2303 are preferably performed by spelling and grammar checking code 49 shown in Figures 3 or 4. Step S2304 then retrieves text from a source (e.g., a pre-stored database or a remote location such as a URL on the WWW) that includes the corrected search word/phrase, and step S2305 displays the retrieved text on a local display, such as display screen 11.
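By way of illustration only, steps S2301 to S2304 may be sketched in Python as follows. The function `corrected_search`, the toy correction table `FIXES` and the two stand-in correctors are hypothetical; in particular, a simple substring match over an in-memory list stands in for retrieval from a database or remote location, which the description does not specify in detail.

```python
from typing import Callable, Iterable, List, Tuple


def corrected_search(
    query: str,
    correct_word: Callable[[str], str],
    correct_phrase: Callable[[List[str]], List[str]],
    source: Iterable[str],
) -> Tuple[str, List[str]]:
    """Correct a search word or phrase, then retrieve texts that contain it."""
    # S2302: correct the spelling of each search word.
    words = [correct_word(w) for w in query.split()]
    # S2303: grammar correction applies only when a phrase was input.
    if len(words) > 1:
        words = correct_phrase(words)
    corrected = " ".join(words)
    # S2304: retrieve text that includes the corrected word or phrase
    # (assumed here to be a case-insensitive substring match).
    hits = [t for t in source if corrected.lower() in t.lower()]
    return corrected, hits


# Toy stand-ins for spelling and grammar checking code 49 (assumptions).
FIXES = {"speling": "spelling"}


def toy_correct_word(word: str) -> str:
    return FIXES.get(word.lower(), word)


def toy_correct_phrase(words: List[str]) -> List[str]:
    return words
```

Because the correction runs before retrieval, a misspelled query such as "speling checker" is matched against the source as "spelling checker".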
Client-Server Configuration
The spelling and grammar checking system of the present invention may also be utilized in a plurality of different hardware contexts. For example, the invention may be utilized in a client-server context. In this aspect of the invention, a single computer, such as PC 4, can service multiple requests for spelling correction at the same time by executing multiple threads of the same program, such as spelling suggestion module 52. To perform this function, in this embodiment of the invention, processor 38 is multi-tasking.
In brief, this aspect of the invention comprises computer-executable process steps to correct misspelled words in input text sequences received from a plurality of different clients. The process steps include code to store in a memory on a server (e.g., PC 4 shown in Figure 1 or a stand-alone server), a lexicon comprised of a plurality of reference words, code to receive the input text sequences from the plurality of different clients (e.g., over fax/modem line 10, network interface 9, etc.), code to spell-check the input text sequences using the reference words in the lexicon, and code to output spell-checked text sequences to the plurality of different clients. In preferred embodiments of the invention, the lexicon comprises one or more lexicon FSTs (in general, FSMs), stored in a single memory, where the lexicon FSTs include the plurality of reference words and a phonetic representation of each reference word. In these embodiments, the spell-checking code comprises code to correct misspelled words in each of the input text sequences substantially in parallel using the lexicon FSTs stored in the single memory. This code corresponds to that described above in Figures 3 and 4.
Figure 24 shows a representative architecture of the client-server multi-threaded spelling correction system of the present invention. As shown in Figure 24, lexicon memory 150 (which stores lexicon FSTs of the type described above) is shared across each program thread 151, 152 and 153 of the client-server spelling correction system. In this regard, each program thread comprises a substantially complete copy of spelling and grammar checking code 49.
Each of program threads 151 to 153 contains a corresponding memory (i.e. , memories 154, 155 and 156) that is used by processor 38 to execute that thread, as well as to perform other processing in relation thereto. Each spelling memory also stores an FSA generated by spelling suggestion module 52 (see Figure 5), and may also store additional programs and variables. Lexicon memory 150 is identical to a memory used to store the lexicon FSTs described with respect to Figure 5, but, unlike that in Figure 5, is being shared by plural program threads on the server. In operation, multiple text sequences (TEXT1 160, TEXT2 161...TEXTn 162) from a plurality of different clients are input to the server from remote sources, such as a LAN, the Internet, a modem, or the like, and are processed by respective program threads. Specifically, each program thread identifies misspelled words in the text, and, using lexicon memory 150, outputs corrected text, as shown in Figure 24. In this regard, the operation of the spelling and grammar checking code used in this aspect of the invention is identical to that described above, with the only difference being memory allocation.
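By way of illustration only, the memory arrangement of Figure 24 may be sketched in Python using the standard-library `ThreadPoolExecutor`. The names `LEXICON`, `check_text` and `serve` are hypothetical. A read-only `frozenset` stands in for the lexicon FSTs of lexicon memory 150, local variables stand in for the per-thread memories 154 to 156, and, for brevity, the sketch only flags words absent from the lexicon rather than correcting them.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import FrozenSet, List, Tuple

# Stand-in for lexicon memory 150: one read-only object shared by all threads.
LEXICON: FrozenSet[str] = frozenset({"the", "cat", "sat", "on", "mat"})


def check_text(text: str) -> Tuple[str, List[str]]:
    """Process one client's text sequence.

    The list `unknown` is local to each call, so each thread keeps its own
    working state while reading the shared lexicon.
    """
    unknown = [w for w in text.split() if w.lower() not in LEXICON]
    return text, unknown


def serve(texts: List[str], workers: int = 3) -> List[Tuple[str, List[str]]]:
    """Handle several input text sequences (TEXT1...TEXTn) on a pool of threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the order of the inputs.
        return list(pool.map(check_text, texts))
```

Sharing a single immutable lexicon avoids copying the reference words into each thread, which is the only difference from the single-user case that the description identifies.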
Client-Server Information Retrieval System
Figure 25 shows the multi-threaded client-server spelling correction system described above used in a text indexing and retrieval context (e.g. , in conjunction with a WWW search engine, database searching software, etc.). In this regard, in text indexing and retrieving systems, textual queries are sent to a database, and information related to the textual queries is retrieved from the database. Often, however, queries are misspelled and, as a result, correct information cannot be retrieved from the database. The system shown in Figure 25 addresses this problem.
More specifically, in Figure 25, as was the case above with respect to Figure 24, multiple queries are input at the same time to the server (i.e., PC 4). As was the case in Figure 24, lexicon memory 150 is shared among all of program threads 151, 152 and 153. In addition, as before, each program thread contains its own spelling memory. In operation, multiple queries (i.e., QUERY1 164, QUERY2 165...QUERYn 166) are input to the client-server spelling correction system of the present invention before each query is actually used to retrieve information from database 169. The present invention then corrects each query in the manner described above with respect to Figures 3, 4 and, in particular, Figure 5. Each corrected query is then used to retrieve information from database 169.
The present invention has been described with respect to particular illustrative embodiments. It is to be understood that the invention is not limited to the above-described embodiments and modifications thereto, and that various changes and modifications may be made by those of ordinary skill in the art without departing from the spirit and scope of the appended claims.
Appendix A
Copyright 1998 Teragram Corporation All Rights Reserved
# APPLICATIONS of RULES
#
# Subject-Verb agreement
#
*SubjectVerb/9 :PN/10 :PN/10
# verb groups
*VG/0
#
#
# Application of Noun phrases
# Noun phrase agreement
#
*NP/0
NP3S/0 that,C/10 NP3S/0 that,C/10 *DETSING/10
#
# whom/who
#
:Prep/10 whom,WPro/10
:Prep/10 whom,WPro/10
:Prep/10 whom,WPro/10 *SubjectVerb/10
#:Prep/10 whom,WPro/10 did,Vpt/10
#:Prep/10 whom,WPro/10 does,V3sg/10
#:Prep/10 whom,WPro/10 do,V/10
# who took the call
# whom I saw
who,WPro/10 *VG/10 *NP/10
whom,WPro/10 *NP/10 *VG/10
#
# whomever/whoever
#
Prep/10 whomever,WPro/10
Prep/10 whomever,WPro/10
Prep/10 whomever,WPro/10 *SubjectVerb/10
whoever,WPro/10 *VG/10 *NP/10
whomever,WPro/10 *NP/10 *VG/10
#
# to
#
want,V/10 to,Prep/20
wanted,Vpt/10 to,Prep/20
wants,V3sg/10 to,Prep/20
wish,V/10 to,Prep/20
wished,Vpt/10 to,Prep/20
wishes,V3sg/10 to,Prep/20
to,Prep/10 :V/10
:V/10 to,Prep/10
:Vpt/10 to,Prep/10
:V3sg/10 to,Prep/10
whether,C/10 to,Prep/10
:Ving/10 to,Prep/10
# to a new home
# to det
to,Prep/10 :Det/20
# 40 to 50
:digit/10 to,Prep/10 :digit/40
# resemblance to
resemblance,N/10 to,Prep/40
# membership to
membership,N/10 to,Prep/40
# transition to
transition,N/10 to,Prep/40
# critics to
critics,Npl/10 to,Prep/40
# trip to
trips,Npl/10 to,Prep/40
# extradition to
extradition,N/10 to,Prep/40
# ambassador to
ambassador,N/10 to,Prep/40
# subject to
subject,N/10 to,Prep/40
# in contrast to
in,Prep/10 contrast,N/10 to,Prep/40
# risk to
risk,N/10 to,Prep/40
# relief to
relief,N/10 to,Prep/40
# give birth to
#give+V/10 birth,N/10 to,Prep/40
give;:V/10 birth,N/10 to,Prep/40
gave;give:Vpt/10 birth,N/10 to,Prep/40
given;give:Vpp/10 birth,N/10 to,Prep/40
gives;give:V3sg/10 birth,N/10 to,Prep/40
#
# two
#
two,Num/10 :Npl/20
two,Num/10 :A/10 :Npl/20
two,Num/10 hundred,Num/30
two,Num/10 thousand,Num/30
two,Num/10 million,Num/30
two,Num/10 billion,Num/30
twenty,Num/10 two,Num/30
thirty,Num/10 two,Num/30
forty,Num/10 two,Num/30
fifty,Num/10 two,Num/30
sixty,Num/10 two,Num/30
seventy,Num/10 two,Num/30
eighty,Num/10 two,Num/30
ninety,Num/10 two,Num/30
#
# too
#
# too much
too,Adv/10 much,Adv/40
# too ADV too ADV
too,Adv/200 :Adv/200 too,Adv/200 :Adv/200
# far too many
far,Adv/10 too,Adv/40 many,A/10
# far too little
far,Adv/10 too,Adv/40 little,Adv/10
# far too much
far,Adv/10 too,Adv/40 much,Adv/10
# far too A
far,Adv/10 too,Adv/20 ,A/10
# too many
too,Adv/10 many,A/20
# too many of
too,Adv/40 many,A/10 of,Prep/10
# too A
too,Adv/10 ,A/20
# too self
too,Adv/10 self,A/1
# BE too A
# he is too big
is;be:V3sg/10 too,Adv/10 ,A/40
am,xx/10 too,Adv/10 ,A/40
are,xx/10 too,Adv/10 ,A/40
was;be:xx/10 too,Adv/10 ,A/40
were;be:xx/10 too,Adv/10 ,A/40
been;be:Vpp/10 too,Adv/10 ,A/40
being;be:Ving/10 too,Adv/10 ,A/40
for,Prep/10 too,Adv/10 ,A/40
# too small to
too,Adv/10 ,A/40 to,Prep/10
#
# then
#
.;:inc/10 then,Adv/20
#
# than
#
## Acomp than
# better than
,Acomp/10 than,C/20
## more A than
# more courageous than
more,Adv/10 ,A/10 than,C/20
## more than
more,Adv/10 than,C/20
## more often than
more,Adv/10 often,Adv/10 than,C/20
## more Npl than
# more duds than
more,Adv/10 ,Npl/10 than,C/20
## more Ns than
# more business than
more,Adv/10 ,N/10 than,C/20
## more Adv than
# more sharply than
more,Adv/10 ,Adv/10 than,C/20
## less A than
# less courageous than
less,Adv/10 ,A/10 than,C/20
## less than
less,Adv/10 than,C/20
## less often than
less,Adv/10 often,Adv/10 than,C/20
## less Npl than
# less duds than
less,Adv/10 ,Npl/10 than,C/20
## less Ns than
# less business than
less,Adv/10 ,N/10 than,C/20
## less Adv than
# less sharply than
less,Adv/10 ,Adv/10 than,C/20
## other than
other,Adv/10 than,C/20
## rather than
rather,Adv/10 than,C/20
## further than
further,Adv/10 than,C/20
## than most
than,C/20 most,A/20
# no other Ns than
# no other reason than
no,Adv/10 other,A/10 ,N/10 than,C/20
#
# cloth/clothe
# broach/broch
#
# there
there,Adv/10 is;be:V3sg/20
there,Adv/10 are;be:xx/20
there,Adv/10 was;be:xx/20
there,Adv/10 were;be:xx/20
# bridle/bridal
bridal,A/30 fashion,N/30
:V/10 :Adv/10
########################################
# DEFINITION of RULES
########################################
#
#
# Verb Group
#
#
*VG = *VG-Simple/0
*VG = *VG-Complex/0
*VG = *VG-not-Simple/0
*VG = *VG-not-Complex/0
#
#
# Verb Groups (not negated)
#
#
# VG-Simple
VG-Simple = :V3sg/10
VG-Simple = :Vpt/10
*VG-Simple = *VG-A/0
*VG-Simple = *VG-B/0
*VG-Simple = *VG-C/0
VG-Simple = *VG-D/0
# VG-Complex
VG-Complex = *VG-AB/0
VG-Complex = VG-AC/0
VG-Complex = *VG-AD/0
VG-Complex = *VG-BC/0
VG-Complex = VG-BD/0
*VG-Complex = *VG-CD/0
VG-Complex = *VG-ABC/0
*VG-Complex = *VG-ABD/0
*VG-Complex = *VG-ACD/0
*VG-Complex = *VG-BCD/0
*VG-Complex = *VG-ABCD/0
# should add the verb to be
# MODAL
MODAL = must,Md/10
*MODAL = could,Md/10
*MODAL = should,Md/10
MODAL = might,Md/10
MODAL = may,Md/10
MODAL = can,Md/10
MODAL = would,Md/10
MODAL = shall,Md/10
MODAL = will,Md/10
MODALnt = mustn't,Mdn't/10
MODALnt = couldn't,Mdn't/10
MODALnt = shouldn't,Mdn't/10
MODALnt = mightn't,Mdn't/10
MODALnt = can't,Mdn't/10
MODALnt = wouldn't,Mdn't/10
MODALnt = shan't,Mdn't/10
MODALnt = won't,Mdn't/10
# VG-A
# MODAL+inf
# must examine
VG-A = MODAL/0 :V/10
# VG-B
# HAVE + -ed
# has examined
VG-B = have,V/10 :Vpp/10
VG-B = has,V3sg/10 :Vpp/10
*VG-B = had,Vpt/10 :Vpp/10
# BE + -ing
# is examining
VG-C = am;be:xx/10 :Ving/10
VG-C = is,V3sg/10 :Ving/10
VG-C = are;be:xx/10 :Ving/10
VG-C = was;be:xx/10 :Ving/10
VG-C = were;be:xx/10 :Ving/10
# VG-D TO FIX
# BE + -ed
# is examined
VG-D = am;be:xx/10 :Vpp/10
VG-D = is,V3sg/10 :Vpp/10
VG-D = are;be:xx/10 :Vpp/10
VG-D = was;be:xx/10 :Vpp/10
VG-D = were;be:xx/10 :Vpp/10
# VG-AB
# may have examined
VG-AB = MODAL/10 have,V/10 :Vpp/10
# AC
# may be examining
VG-AC = MODAL/10 be,V/10 :Ving/10
# AD
# may be examined
VG-AD = MODAL/10 be,V/10 :Vpp/10
# BC
# has been examining
VG-BC = have,V/10 been,Vpp:/10 :Ving/10
VG-BC = has,V3sg/10 been,Vpp:/10 :Ving/10
VG-BC = had,Vpt/10 been,Vpp:/10 :Ving/10
# BD
# has been examined
VG-BD = have,V/10 been,Vpp:/10 :Vpp/10
VG-BD = has,V3sg/10 been,Vpp:/10 :Vpp/10
VG-BD = had,Vpt/10 been,Vpp:/10 :Vpp/10
# CD
# is being examined
VG-CD = am;be:xx/10 being,Ving/10 :Vpp/10
VG-CD = is,V3sg/10 being,Ving/10 :Vpp/10
VG-CD = are;be:xx/10 being,Ving/10 :Vpp/10
VG-CD = was;be:xx/10 being,Ving/10 :Vpp/10
VG-CD = were;be:xx/10 being,Ving/10 :Vpp/10
# ABC
# may have been examining
VG-ABC = MODAL/0 have,V/10 been,Vpp:/10 :Ving/10
# ABD
# may have been examined
VG-ABD = MODAL/0 have,V/10 been,Vpp:/10 :Vpp/10
# ACD
# may be being examined
VG-ACD = MODAL/0 be,V/10 being,Ving:/10 :Vpp/10
# BCD
# has been being examined
VG-BCD = have,V/10 been,Vpp:/10 being,Ving/10 :Vpp/10
VG-BCD = has,V3sg/10 been,Vpp:/10 being,Ving/10 :Vpp/10
VG-BCD = had,Vpt/10 been,Vpp:/10 being,Ving/10 :Vpp/10
# ABCD
# may have been being examined
VG-ABCD = MODAL/0 have,V/10 been,Vpp:/10 being,Ving/10 :Vpp/10
#
#
# Verb Groups negated
#
#
# VG-not-Simple
VG-not-Simple = does,V3sg/10 not,Adv/10 :V/10
VG-not-Simple = do,V/10 not,Adv/10 :V/10
VG-not-Simple = did,Vpt/10 not,Adv/10 :V/10
VG-not-Simple = MODAL/0 not,Adv/10 :V/10
VG-not-Simple = MODALnt/0 :V/10
# VG-not-Complex
VG-not-Complex = VG-not-AB/0
VG-not-Complex = VG-not-AC/0
VG-not-Complex = VG-not-AD/0
VG-not-Complex = VG-not-BC/0
VG-not-Complex = VG-not-BD/0
VG-not-Complex = VG-not-CD/0
VG-not-Complex = VG-not-ABC/0
VG-not-Complex = VG-not-ABD/0
VG-not-Complex = VG-not-ACD/0
# VG-not-Complex = VG-not-BCD/0
# VG-not-Complex = VG-not-ABCD/0
# should add the verb to be
# MODAL
MODAL = must,Md/10
MODAL = could,Md/10
MODAL = should,Md/10
MODAL = might,Md/10
MODAL = may,Md/10
MODAL = can,Md/10
MODAL = would,Md/10
MODAL = shall,Md/10
MODAL = will,Md/10
MODALnt = mustn't,Mdn't/10
MODALnt = couldn't,Mdn't/10
MODALnt = shouldn't,Mdn't/10
MODALnt = can't,Mdn't/10
MODALnt = wouldn't,Mdn't/10
MODALnt = shalln't,Mdn't/10
MODALnt = won't,Mdn't/10
# VG-not-A
# MODAL+inf
# must examine
VG-not-A = MODAL/0 not,Adv/10 :V/10
VG-not-A = MODALnt/0 :V/10
# VG-not-B
# HAVE + -ed
# has examined
VG-not-B = have,V/10 not,Adv/10 :Vpp/10
VG-not-B = has,V3sg/10 not,Adv/10 :Vpp/10
VG-not-B = had,Vpt/10 not,Adv/10 :Vpp/10
# VG-not-C TO FIX (tags for am,is,...)
# BE + -ing
# is examining
VG-not-C = am;be:xx/10 not,Adv/10 :Ving/10
VG-not-C = is,V3sg/10 not,Adv/10 :Ving/10
VG-not-C = are;be:xx/10 not,Adv/10 :Ving/10
# VG-not-D TO FIX
# BE + -ed
# is examined
VG-not-D = am;be:xx/10 not,Adv/10 :Vpp/10
VG-not-D = is,V3sg/10 not,Adv/10 :Vpp/10
VG-not-D = are;be:xx/10 not,Adv/10 :Vpp/10
# VG-not-AB
# may have examined
VG-not-AB = MODAL/10 not,Adv/10 have,V/10 :Vpp/10
VG-not-AB = MODALnt/10 have,V/10 :Vpp/10
# AC
# may be examining
VG-not-AC = MODAL/0 not,Adv/10 be,V/10 :Ving/10
VG-not-AC = MODALnt/0 be,V/10 :Ving/10
# AD
# may be examined
VG-not-AD = MODAL/0 not,Adv/10 be,V/10 :Vpp/10
VG-not-AD = MODALnt/0 be,V/10 :Vpp/10
# BC
# has been examining
VG-not-BC = have,V/10 not,Adv/10 been,Vpp:/10 :Ving/10
VG-not-BC = has,V3sg/10 not,Adv/10 been,Vpp:/10 :Ving/10
VG-not-BC = had,Vpt/10 not,Adv/10 been,Vpp:/10 :Ving/10
# BD
# has been examined
VG-not-BD = have,V/10 not,Adv/10 been,Vpp:/10 :Vpp/10
VG-not-BD = has,V3sg/10 not,Adv/10 been,Vpp:/10 :Vpp/10
VG-not-BD = had,Vpt/10 not,Adv/10 been,Vpp:/10 :Vpp/10
# CD
# is being examined
VG-not-CD = am;be:xx/10 not,Adv/10 being,Ving/10 :Vpp/10
VG-not-CD = is,V3sg/10 not,Adv/10 being,Ving/10 :Vpp/10
VG-not-CD = are;be:xx/10 not,Adv/10 being,Ving/10 :Vpp/10
# ABC
# may have been examining
VG-not-ABC = MODAL/10 not,Adv/10 have,V/10 been,Vpp:/10 :Ving/10
VG-not-ABC = MODALnt/10 have,V/10 been,Vpp:/10 :Ving/10
# ABD
# may have been examined
VG-not-ABD = MODAL/10 not,Adv/10 have,V/10 been,Vpp:/10 :Vpp/10
VG-not-ABD = MODALnt/10 have,V/10 been,Vpp:/10 :Vpp/10
# ACD
# may be being examined
VG-not-ACD = MODAL/10 not,Adv/10 be,V/10 being,Ving:/10 :Vpp/10
VG-not-ACD = MODALnt/10 be,V/10 being,Ving:/10 :Vpp/10
# BCD
# has been being examined
VG-not-BC = have,V/10 not,Adv/10 been,Vpp:/10 being,Ving/10 :Vpp/10
VG-not-BC = has,V3sg/10 not,Adv/10 been,Vpp:/10 being,Ving/10 :Vpp/10
VG-not-BC = had,Vpt/10 not,Adv/10 been,Vpp:/10 being,Ving/10 :Vpp/10
# ABCD
# may have been being examined
VG-not-ABC = MODAL/0 not,Adv/10 have,V/10 been,Vpp:/10 being,Ving/10 :Vpp/10
VG-not-ABC = MODALnt/0 have,V/10 been,Vpp:/10 being,Ving/10 :Vpp/10
#
# POSSPRO
# Possessive pronoun, no number agreement
#
POSSPRO = his,PossPro/10
POSSPRO = her,PossPro/10
POSSPRO = its,PossPro/10
POSSPRO = my,PossPro/10
POSSPRO = our,PossPro/10
POSSPRO = their,PossPro/10
POSSPRO = your,PossPro/10
#
# DETSING
#
DETSING = a,Det/10
DETSING = one,Num/11
DETSING = an,Det/10
DETSING = the,Det/10
DETSING = this,Det/10
DETSING = POSSPRO/0
#
# DETPL
#
DETPL = these,Det/10
DETPL = POSSPRO/0
DETPL = NUM/0
DETPL = NUM/0 NUM/1
DETPL = NUM/0 and,C/10 *NUM/1
DETPL = NUM/0 *NUM/1 *NUM/1
DETPL = *NUM/0 *NUM/1 and,C/10 *NUM/1
NUM = one,Num/10
NUM = two,Num/10
NUM = three,Num/10
NUM = four,Num/10
NUM = five,Num/10
NUM = six,Num/10
NUM = seven,Num/10
NUM = eight,Num/10
NUM = nine,Num/10
NUM = ten,Num/10
NUM = eleven,Num/10
NUM = twelve,Num/10
NUM = thirteen,Num/10
*NUM = fourteen,Num/10
NUM = fifteen,Num/10
NUM = sixteen,Num/10
NUM = seventeen,Num/10
NUM = eighteen,Num/10
*NUM = nineteen,Num/10
NUM = twenty,Num/10
NUM = thirty,Num/10
NUM = forty,Num/10
*NUM = fifty,Num/10
NUM = sixty,Num/10
NUM = seventy,Num/10
NUM = eighty,Num/10
NUM = ninety,Num/10
NUM = hundred,Num/10
NUM = thousand,Num/10
NUM = million,Num/10
#
# PRO-3S
# pronouns not with 3rd person singular morphological sign
#
PRO-3S = they,Pro/10
PRO-3S = we,Pro/10
PRO-3S = I,Pro/10
PRO-3S = you,Pro/10
PRO-3S = you,Pro/10
#
# PRO3S
# pronouns with 3rd person singular morphological sign
#
PRO3S = this,Pro/10
PRO3S = it,Pro/30
PRO3S = he,Pro/10
PRO3S = she,Pro/10
PRO3S = that,Pro/10
#
# ADJP Adjectival Phrase
#
ADJP = :Adv/10 ADJJ/0
ADJP = ADJJ/0
#
# ADJJ sequence of adjective(s)
#
# A/11 is there to compensate against Vpp/5 + 5 unigram
ADJJ = A/11
ADJJ = A/10 :A/10
ADJJ = Vpp/5
ADJJ = Ving/5
#
# NP
# NOUN Phrase
#
*NP = NP3S/0
NP = NP-3S/0
#
# NP3S
# NOUN Phrase with 3rd person singular morphological sign
#
NP3S = PRO3S/0
NP3S = :N/10
NP3S = ADJP/0 :N/10
NP3S = DETSING/0 :N/10
NP3S = DETSING/0 ADJP/0 :N/10
NP3S = each,Pro/10 of,Prep/10 PRO-3S/10
NP3S = each,Pro/10 of,Prep/10 DETPL/10 :Npl/10
NP3S = each,Pro/10 of,Prep/10 DETPL/10 ADJP/0 :Npl/10
#
# NP-3S
# NOUN Phrase without 3rd person singular morphological sign
#
NP-3S = PRO-3S/0
NP-3S = DETPL/0 :Npl/10
NP-3S = DETPL/0 ADJP/0 :Npl/10
NP-3S = ADJP/SO :Npl/10
NP-3S = :Npl/10
#
#
# Subject-Verb Agreement
#
#
SubjectVerb = NP3S/0 :V3sg/10
SubjectVerb = NP3S/0 :Vpt/10
SubjectVerb = NP-3S/0 :V/10
SubjectVerb = NP-3S/0 :Vpt/10
# he/John was
SubjectVerb = NP3S/0 was;be:xx/10
# I was
SubjectVerb = I,Pro/10 was;be:xx/10
# I am
SubjectVerb = I,Pro/10 am;be:xx/10
# you are
SubjectVerb = you,Pro/10 are;be:xx/10
# they are
SubjectVerb = they,Pro/10 are;be:xx/10
# WH Questions
# this should be done differently
# Where can I rent a car ...
# Who can rent a car?
# Can you rent a car?
:WAdv/10 MODAL/10
:WPro/10 MODAL/10
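By way of illustration only, a line of the rule listing above may be split into its parts with a few lines of Python. The reading assumed here is not stated in the listing itself: a line is taken to be an optional `NAME =` prefix followed by whitespace-separated items, each ending in `/` and an integer, which is assumed to be a weight. The names `Item` and `parse_rule` are hypothetical, and the part of each item before the slash (a `word,Tag` pair, a `:Tag`, or another rule name) is kept as an uninterpreted string.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Item:
    symbol: str   # e.g. "has,V3sg", ":Vpp" or "*VG-A" (left uninterpreted)
    weight: int   # the integer after the final "/" (assumed to be a weight)


def parse_rule(line: str) -> Optional[Tuple[Optional[str], List[Item]]]:
    """Parse one line; return None for blank lines and '#' comments."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    name: Optional[str] = None
    if " = " in line:
        # Definition line: "NAME = item item ..."
        name, line = line.split(" = ", 1)
        name = name.strip().lstrip("*")
    items = []
    for token in line.split():
        # Split on the last "/" so symbols may themselves contain punctuation.
        symbol, _, weight = token.rpartition("/")
        items.append(Item(symbol, int(weight)))
    return name, items
```

For example, `parse_rule("VG-B = has,V3sg/10 :Vpp/10")` yields the name `VG-B` with two items of weight 10, while an application line such as `*VG/0` yields no name and a single item.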
Appendix B
The following is a list of pairs of often-confused words. If a first word in a pair is found, the second word is also processed by the invention, meaning that a substantially similar word thereto is searched for and, if found, output.
adept,adapt adept,adopt adopt,adept adopt,adapt adapt,adept adapt,adopt ads,adds adds,ads advice,advise advise,advice aid,aide aide,aid alter,altar altar,alter backwards,backward backward,backwards bare,bear bear,bare beet,beat beat,beet border,boarder boarder,border breath,breathe breathe,breath to,two to,too too,to too,two two,to two,too than,then then,than cloth,clothe clothe,cloth broach,brooch brooch,broach bridal,bridle bridle,bridal brows,browse browse,brows cant,can't can't,cant confident,confidant confidant,confident decent,descent dependent,dependant dependant,dependent desert,dessert dessert,desert downward,downwards downwards,downward elicit,illicit illicit,elicit envelop,envelope envelope,envelop feat,feet feet,feat find,fined fined,find flare,flair flair,flare flea,flee flee,flea herd,heard heard,herd hew,hue hue,hew hoard,horde horde,hoard incite,insight insight,incite indoor,indoors indoors,indoor inward,inwards inwards,inward its,it's it's,its laps,lapse lapse,laps let's,lets lets,let's mind,mined mined,mind miner,minor minor,miner morn,mourn mourn,morn navel,naval naval,navel new,knew knew,new no,know know,no outdoor,outdoors outdoors,outdoor outwards,outward outward,outwards pedal,peddle pray,prey prey,pray pride,pried pried,pride principal,principle principle,principal thats,that's that's,thats their,there their,they're there,their there,they're they're,their tide,tied tied,tide upwards,upward upward,upwards who's,whose whose,who's who,whom whom,who whoever,whomever whomever,whoever wont,won't wont,won won't,wont you're,your your,you're
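By way of illustration only, the pair list may be loaded into a lookup table so that, when a first word of a pair is found, the corresponding second word or words can be retrieved. The function `build_confusion_map` and the abbreviated sample `PAIRS` are hypothetical; the sample reuses a handful of pairs from the list above.

```python
from collections import defaultdict
from typing import Dict, List

# A few pairs copied from the list above, in the same "first,second" form.
PAIRS = "to,two to,too too,to too,two two,to two,too than,then then,than its,it's it's,its"


def build_confusion_map(pairs_text: str) -> Dict[str, List[str]]:
    """Map each first word to the second words it is often confused with."""
    table: Dict[str, List[str]] = defaultdict(list)
    for pair in pairs_text.split():
        first, second = pair.split(",", 1)
        if second not in table[first]:   # ignore repeated pairs
            table[first].append(second)
    return dict(table)
```

With this table, finding "to" in the input text yields "two" and "too" as further candidates to be processed.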
Appendix C
according as = C
according to = Prep
again and again = Adv
all in all = Adv
all of a sudden = Adv
all the better = Adv
all the closer = Adv
all the farther = Adv
all the worse = Adv
alongside of = Prep
an arm and a leg = N
anything but = Prep
arm in arm = Adv
as much again = Adv
at all costs = Adv
at arm's length = Adv
away with = Prep
back and forth = Adv
behind his back = Adv
between ourselves = Adv
bit by bit = Adv
by and large = Adv
charter flight = N
commercial flight = N
commercial passenger flight = N
connecting flight = N
cream of the crop = N
cum lauda = Adv
daily flight = N
domestic flight = N
flight attendant = N
flight bag = N
flight control = N
flight crew = N
flight deck = N
flight engineer = N
flight feather = N
flight line = N
flight plan = N
flight shooting = N
flight simulator = N
flight strip = N
flight surgeon = N
for crying out loud = Adv
in advance = Adv
in due course = Adv
in the abstract = Adv
in the affirmative = Adv
in the altogether = Adv
in the bag = Adv
in the balance = Adv
in the course of = Prep
in the meantime = Adv
in the same boat = Adv
international flight = N
medical evacuation flight = N
military flight = N
nonsmoking flight = N
nonstop flight = N
of a certain age = Adv
on account of = Prep
on the alert = Adv
once in a blue moon = Adv
out of bounds = Adv
out of the blue = Adv
overnight flight = N
regular flight = N
under arrest = Adv
under cover = Adj
weekly flight = N
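By way of illustration only, entries of the above form may be applied to tokenized text by a greedy longest-match scan, so that a multi-word expression is treated as a single unit carrying the listed tag. This usage is an assumption, as are the names `MWE` and `tag_multiwords`; the table below reuses four entries from the list above.

```python
from typing import Dict, List, Optional, Tuple

# Four entries copied from the list above, keyed by their lower-case words.
MWE: Dict[Tuple[str, ...], str] = {
    ("according", "to"): "Prep",
    ("in", "the", "course", "of"): "Prep",
    ("in", "the", "meantime"): "Adv",
    ("flight", "attendant"): "N",
}


def tag_multiwords(
    tokens: List[str], table: Dict[Tuple[str, ...], str] = MWE
) -> List[Tuple[str, Optional[str]]]:
    """Group listed multi-word expressions, preferring the longest match."""
    max_len = max(len(key) for key in table)
    out: List[Tuple[str, Optional[str]]] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, down to two words.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in table:
                out.append((" ".join(tokens[i:i + n]), table[key]))
                i += n
                break
        else:
            # No expression starts here: pass the single word through untagged.
            out.append((tokens[i], None))
            i += 1
    return out
```

Trying the longest candidate first ensures that "in the course of" is grouped as one preposition rather than being split at a shorter entry.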

Claims

WHAT IS CLAIMED IS:
1. A method of correcting misspelled words in input text, the method comprising the steps of: detecting a misspelled word in the input text; determining a list of alternative words for the misspelled word; and ranking the list of alternative words based on a context of the input text.
2. A method according to Claim 1, further comprising the steps of: selecting one of the alternative words from the list; and replacing the misspelled word in the text with the selected one of the alternative words.
3. A method according to Claim 1, wherein the detecting step comprises comparing each word in the input text to a dictionary database and characterizing a word as misspelled when the word either (i) does not match any words in the dictionary database, or (ii) is spelled correctly but corresponds to one of a plurality of words which are substantially similar.
4. A method according to Claim 1, wherein the determining step comprises: storing one or more lexicon finite state machines ("FSM"), each of the lexicon FSMs including plural reference words and a phonetic representation of each reference word; generating an input FSM for the misspelled word, the input FSM including the misspelled word and a phonetic representation of the misspelled word; selecting one or more reference words from the lexicon FSMs based on the input FSM, the one or more reference words substantially corresponding to either a spelling of the misspelled word or to the phonetic representation of the misspelled word; and adding selected ones of the one or more reference words to the list of alternative words.
5. A method according to Claim 4, wherein the selecting step comprises the step of generating an additional FSM having a plurality of states, the states comprising at least states of a lexicon FSM and states of the input FSM, and selecting the one or more reference words from the lexicon FSM using the additional FSM.
6. A method according to Claim 5, wherein the lexicon FSMs and the input FSM comprise finite state transducers ("FST"), and the additional FSM comprises a finite state automaton ("FSA").
7. A method according to Claim 5, wherein the selecting step selects the one or more reference words by applying each state of the additional FSM to one or more of (i) a character identity module, (ii) a phonetic identity module, (iii) a character insertion module, (iv) a character deletion module, (v) a character replacement module, (vi) a character transposition module, and (vii) a character transposition completion module; and wherein (i) the character identity module determines whether characters of a reference word in the lexicon FSM match characters of the misspelled word in the input FSM, (ii) the phonetic identity module determines whether characters of the reference word are pronounced the same as characters of the misspelled word, (iii) the character insertion module determines whether a character inserted in the misspelled word causes at least part of the misspelled word to match at least part of the reference word, (iv) the character deletion module determines whether a character deleted from the misspelled word causes at least part of the misspelled word to match at least part of the reference word, (v) the character replacement module replaces characters in the misspelled word with characters in the reference word in order to determine whether at least part of the misspelled word matches at least part of the reference word, (vi) the character transposition module changes the order of two or more characters in the misspelled word and compares a changed character in the misspelled word to a corresponding character in the reference word, and (vii) the character transposition completion module compares characters in the misspelled word which were not compared by the character transposition module in order to determine if at least part of the misspelled word matches at least part of the reference word.
8. A method according to Claim 1, wherein the ranking step comprises: generating a first finite state machine ("FSM") for the input text, the first FSM having a plurality of arcs which include the alternative words and weights associated therewith, where a weight of each alternative word corresponds to a likelihood that the alternative word, taken out of grammatical context, comprises a correctly-spelled version of the misspelled word; generating a second FSM for the input text and the alternative words in accordance with one or more of a plurality of predetermined grammatical rules, the second FSM having a plurality of arcs which include the alternative words and weights associated therewith, where a weight of each alternative word corresponds to a likelihood that the alternative word, taken in grammatical context, comprises a correctly- spelled version of the misspelled word; and adding corresponding weights of the first FSM and the second FSM and ranking the alternative words in accordance with the weights of the alternative words.
9. A method according to Claim 2, wherein the selecting step comprises displaying the list of alternative words and manually selecting one of the alternative words.
10. A method according to Claim 2, wherein the selecting step is performed automatically.
11. A method according to Claim 10, wherein the selecting step selects a one of the alternative words that is ranked highest in the list of alternative words.
12. A word processing method for creating and editing text documents, the word processing method comprising the steps of: inputting text into a text document; spell-checking the text so as to replace misspelled words in the text with correctly-spelled words; and outputting the document; wherein the spell-checking step comprises detecting misspelled words in the text, and, for each misspelled word, determining a list of alternative words for the misspelled word, ranking the list of alternative words based on a context in the text, selecting one of the alternative words from the list, and replacing the misspelled word in the text with the selected one of the alternative words.
13. A machine translation method for translating text from a first language into a second language, the machine translation method comprising the steps of: inputting text in the first language; spell-checking the text in the first language so as to replace misspelled words in the text with correctly-spelled words; translating the text from the first language into the second language; and outputting translated text; wherein the spell-checking step comprises detecting misspelled words in the text, and, for each misspelled word, determining a list of alternative words for the misspelled word, ranking the list of alternative words based on a context in the text, selecting one of the alternative words from the list, and replacing the misspelled word in the document with the selected one of the alternative words.
14. A machine translation method for translating text from a first language into a second language, the machine translation method comprising the steps of: inputting text in the first language; translating the text from the first language into the second language; spell-checking the text in the second language so as to replace misspelled words in the text with correctly-spelled words; and outputting the text; wherein the spell-checking step comprises detecting misspelled words in the text, and, for each misspelled word, determining a list of alternative words for the misspelled word, ranking the list of alternative words based on a context in the text, selecting one of the alternative words from the list, and replacing the misspelled word in the document with the selected one of the alternative words.
15. An optical character recognition method for recognizing input character images, the optical character recognition method comprising the steps of: inputting a document image; parsing character images from the document image; performing recognition processing on parsed character images so as to produce document text; spell-checking the document text so as to replace misspelled words in the document text with correctly-spelled words; and outputting the document text; wherein the spell-checking step comprises detecting misspelled words in the document text, and, for each misspelled word, determining a list of alternative words for the misspelled word, ranking the list of alternative words based on a context in the text, selecting one of the alternative words from the list, and replacing the misspelled word in the document text with the selected one of the alternative words.
16. A method of retrieving text from a source, the method comprising the steps of: inputting a search word; correcting a spelling of the search word to produce a corrected search word; and retrieving text from the source that includes the corrected search word.
17. A method according to Claim 16, wherein the correcting step comprises the steps of: storing one or more lexicon finite state machines ("FSM"), each of the lexicon FSMs including plural reference words and a phonetic representation of each reference word; generating an input FSM for the search word, the input FSM including the search word and a phonetic representation of the search word; selecting one or more reference words from the lexicon FSMs based on the input FSM, the one or more reference words substantially corresponding to either a spelling of the search word or to the phonetic representation of the search word; and setting a selected one of the reference words as the corrected search word.
18. A method according to Claim 17, wherein the selecting step comprises the step of generating an additional FSM having a plurality of states, the states comprising at least states of a lexicon FSM and states of the input FSM, and selecting the one or more reference words from the lexicon FSM using the additional FSM.
19. A method according to Claim 17, wherein the lexicon FSMs and the input FSM comprise finite state transducers ("FST"), and the additional FSM comprises a finite state automaton ("FSA").
20. A method according to Claim 18, wherein the selecting step selects the one or more reference words by applying each state of the additional FSM to one or more of (i) a character identity module, (ii) a phonetic identity module, (iii) a character insertion module, (iv) a character deletion module, (v) a character replacement module, (vi) a character transposition module, and (vii) a character transposition completion module; and wherein (i) the character identity module determines whether characters of a reference word in the lexicon FSM match characters of the search word in the input FSM, (ii) the phonetic identity module determines whether characters of the reference word are pronounced the same as characters of the search word, (iii) the character insertion module determines whether a character inserted in the search word causes at least part of the search word to match at least part of the reference word, (iv) the character deletion module determines whether a character deleted from the search word causes at least part of the search word to match at least part of the reference word, (v) the character replacement module replaces characters in the search word with characters in the reference word in order to determine whether at least part of the search word matches at least part of the reference word, (vi) the character transposition module changes the order of two or more characters in the search word and compares a changed character in the search word to a corresponding character in the reference word, and (vii) the character transposition completion module compares characters in the search word which were not compared by the character transposition module in order to determine if at least part of the search word matches at least part of the reference word.
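By way of illustration only, the character-level operations recited in Claim 20 (identity, insertion, deletion, replacement and transposition) correspond to the edits counted by a restricted Damerau-Levenshtein distance. The sketch below uses a plain dynamic-programming table as a stand-in; it is not the FSM-based procedure of the claims, it omits the phonetic identity module entirely, and the names `edit_distance`, `select_reference_words` and `max_edits` are assumptions made for the example.

```python
def edit_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i                      # delete every character of a
    for j in range(cols):
        d[0][j] = j                      # insert every character of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # character deletion
                d[i][j - 1] + 1,         # character insertion
                d[i - 1][j - 1] + cost,  # identity (cost 0) or replacement
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def select_reference_words(search_word, lexicon, max_edits=1):
    """Return lexicon words within max_edits edits of search_word."""
    return [w for w in lexicon if edit_distance(search_word, w) <= max_edits]


candidates = select_reference_words("teh", ["the", "ten", "tea", "then", "dog"])
```

Here "the" is reached by one transposition and "ten" and "tea" by one replacement each, while "then" needs two edits and is excluded at the default threshold.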
21. A method according to Claim 16, wherein the source comprises a remote network location; and wherein the method further comprises the step of displaying the retrieved text on a local display screen.
22. A method according to Claim 16, wherein the correcting step comprises displaying one or more corrected search words and manually selecting one of plural corrected search words.
23. A method according to Claim 16, wherein the correcting step comprises automatically selecting one of the corrected search words.
24. A method according to Claim 16, wherein the source comprises a pre-stored database.
25. A method of retrieving text from a source, the method comprising the steps of: inputting a search phrase comprised of a plurality of words, at least one of the plurality of words being an incorrect word; replacing the incorrect word in the search phrase with a corrected word in order to produce a corrected search phrase; and retrieving text from the source based on the corrected search phrase.
26. A method according to Claim 25, further comprising, between the inputting and replacing steps, the steps of: generating a first finite state machine ("FSM") comprised of two or more arcs which include alternatives to the incorrect word, each alternative having a rank associated therewith; selecting, as the corrected word, an alternative having a highest rank.
27. A method according to Claim 26, wherein the generating step comprises the steps of: storing one or more lexicon FSMs, each of the lexicon FSMs including plural reference words and a phonetic representation of each reference word; generating an input FSM for the incorrect word, the input FSM including the incorrect word and a phonetic representation of the incorrect word; a second selecting step for selecting two or more reference words from the lexicon FSMs based on the input FSM, the two or more reference words substantially corresponding to either a spelling of the incorrect word or to the phonetic representation of the incorrect word and having a rank associated therewith; and storing, in a first FSM, the two or more reference words and corresponding ranks associated therewith.
28. A method according to Claim 27, wherein the second selecting step comprises the step of generating an additional FSM having a plurality of states, the states comprising at least states of a lexicon FSM and states of the input FSM, and selecting the two or more reference words from the lexicon FSMs using the additional FSM.
29. A method according to Claim 28, wherein the first FSM, the lexicon FSMs, and the input FSM comprise finite state transducers ("FST"), and wherein the additional FSM comprises a finite state automaton ("FSA").
30. A method according to Claim 28, wherein the second selecting step selects the two or more reference words by applying each state of the additional FSM to one or more of (i) a character identity module, (ii) a phonetic identity module, (iii) a character insertion module, (iv) a character deletion module, (v) a character replacement module, (vi) a character transposition module, and (vii) a character transposition completion module; and wherein (i) the character identity module determines whether characters of a reference word in the lexicon FSM match characters of the incorrect word in the input FSM, (ii) the phonetic identity module determines whether characters of the reference word are pronounced the same as characters of the incorrect word, (iii) the character insertion module determines whether a character inserted in the incorrect word causes at least part of the incorrect word to match at least part of the reference word, (iv) the character deletion module determines whether a character deleted in the incorrect word causes at least part of the incorrect word to match at least part of the reference word, (v) the character replacement module replaces characters in the incorrect word with characters in the reference word in order to determine whether at least part of the incorrect word matches at least part of the reference word, (vi) the character transposition module changes the order of two or more characters in the incorrect word and compares a changed character in the incorrect word to a corresponding character in the reference word, and (vii) the character transposition completion module compares characters in the incorrect word which were not compared by the character transposition module in order to determine if at least part of the incorrect word matches at least part of the reference word.
31. A method according to Claim 25, wherein the source comprises a remote network location; and wherein the method further comprises the step of displaying the retrieved text on a local display screen.
32. A method according to Claim 25, wherein the replacing step comprises displaying one or more corrected words and manually selecting one of the corrected words as a replacement for the incorrect word.
33. A method according to Claim 25, wherein the replacing step comprises automatically selecting one of plural corrected words as a replacement for the incorrect word.
34. A method according to Claim 25, wherein the source comprises a pre-stored database.
35. A method of correcting misspelled words in input text sequences received from a plurality of different clients, the method comprising the steps of: storing, in a memory on a server, a lexicon comprised of a plurality of reference words; receiving the input text sequences from the plurality of different clients; spell-checking the input text sequences using the reference words in the lexicon; and outputting spell-checked text sequences to the plurality of different clients.
36. A method according to Claim 35, wherein the lexicon comprises one or more lexicon finite state machines ("FSM"), the lexicon FSMs including the plurality of reference words and a phonetic representation of each reference word; and wherein the spell-checking step comprises a correcting step for correcting misspelled words in each of the input text sequences substantially in parallel using the lexicon FSMs stored in the single memory.
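By way of illustration only, the arrangement of Claims 35 and 36, in which one lexicon held in server memory is used to correct text sequences from several clients substantially in parallel, can be sketched with a thread pool. The tiny tuple `LEXICON`, the functions `spell_check` and `serve_clients`, and the use of `difflib.get_close_matches` in place of lexicon FSMs are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
import difflib

# A single read-only lexicon shared by every worker (stand-in for lexicon FSMs).
LEXICON = ("hello", "help", "world")


def spell_check(text):
    """Replace each word not found in the lexicon with its closest lexicon entry."""
    corrected = []
    for word in text.split():
        if word in LEXICON:
            corrected.append(word)
        else:
            matches = difflib.get_close_matches(word, LEXICON, n=1)
            corrected.append(matches[0] if matches else word)
    return " ".join(corrected)


def serve_clients(requests):
    """Spell-check one text sequence per client, using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = pool.map(spell_check, requests.values())
        return dict(zip(requests.keys(), results))


replies = serve_clients({"client_a": "helo world", "client_b": "hello wrld"})
```

Because the lexicon is only read, never written, the workers can share it without locking; `pool.map` returns results in request order, so each reply is paired with the client that sent it.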
37. A method according to Claim 36, wherein, for each text sequence, the correcting step comprises: generating an input FSM for a misspelled word in the text sequence, the input FSM including the misspelled word and a phonetic representation of the misspelled word; selecting one or more reference words from the lexicon FSMs based on the input FSM, the one or more reference words substantially corresponding to either a spelling of the misspelled word or to the phonetic representation of the misspelled word; and replacing the misspelled word in the text sequence with a selected one of the one or more reference words.
38. A method according to Claim 37, wherein the selecting step comprises the step of: generating an additional FSM having a plurality of states for the misspelled word, the states comprising at least states of a lexicon FSM and states of the input FSM; and a second selecting step for selecting the one or more reference words from the lexicon FSMs using the additional FSM.
39. A method according to Claim 38, wherein the lexicon FSMs and the input FSM comprise finite state transducers ("FST"), and wherein the additional FSM comprises a finite state automaton ("FSA").
40. A method according to Claim 35, further comprising, after the outputting step, the step of retrieving a document from a source using one of the spell-checked text sequences.
41. A method of selecting a replacement word for an input word in a phrase, the method comprising the steps of: determining alternative words for the input word, the alternative words including at least one compound word which is comprised of two or more separate words, each alternative word having a rank associated therewith; and selecting, as the replacement word, an alternative word having a highest rank.
42. A method according to Claim 41, further comprising, between the determining and selecting steps, the step of generating a finite state machine ("FSM") comprised of two or more arcs which include the alternatives to the input words.
43. A method according to Claim 41, wherein, in a case that the selecting step selects a compound word as the replacement word, the method further comprises the step of replacing the input word and at least one other word in the phrase with the compound word.
44. A method according to Claim 41, further comprising, between the detecting and selecting steps, the step of adjusting the rank of each alternative based on a grammatical context of the input word in the phrase.
45. A method according to Claim 44, wherein each word in the phrase has a part of speech associated therewith, and each of the alternative words has a part of speech associated therewith; and wherein the adjusting step adjusts the rank of each alternative word based on whether a part of speech of the alternative word fits with a part of speech of at least one word adjacent to the input word.
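By way of illustration only, the rank adjustment of Claim 45 can be sketched with a toy table of part-of-speech pairs. The table `ALLOWED_BIGRAMS`, the tag names, the fixed bonus and penalty, and the function `adjust_ranks` are hypothetical; the claims do not specify how fit between adjacent parts of speech is measured.

```python
# Hypothetical table of (previous tag, candidate tag) pairs considered to fit.
ALLOWED_BIGRAMS = {
    ("DET", "NOUN"),
    ("ADJ", "NOUN"),
    ("NOUN", "VERB"),
    ("PRON", "VERB"),
}


def adjust_ranks(alternatives, prev_pos, bonus=1.0, penalty=1.0):
    """Raise or lower each alternative's rank by how its tag fits the previous word.

    alternatives: list of (word, part_of_speech, rank) tuples
    prev_pos:     part of speech of the word preceding the input word
    """
    adjusted = []
    for word, pos, rank in alternatives:
        if (prev_pos, pos) in ALLOWED_BIGRAMS:
            rank += bonus
        else:
            rank -= penalty
        adjusted.append((word, pos, rank))
    # Highest adjusted rank first.
    return sorted(adjusted, key=lambda item: item[2], reverse=True)


# Input word "amn" following the pronoun "I": two hypothetical alternatives.
result = adjust_ranks([("an", "DET", 1.5), ("am", "VERB", 1.0)], prev_pos="PRON")
```

Before adjustment "an" outranks "am"; after adjustment the verb fits the preceding pronoun and is ranked first.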
46. A method according to Claim 45, wherein each compound word in the phrase has a single part of speech associated therewith.
47. A method according to Claim 41, further comprising, between the determining and selecting steps, the step of displaying the alternative words ranked in order; wherein the selecting step is performed manually.
48. A method according to Claim 41, wherein at least one of the alternative words comprises a word having an accent mark and/or a diacritic which is different from, and/or missing from, the input word.
49. A method of correcting grammatical errors in input text, the method comprising the steps of: generating a first finite state machine ("FSM") for the input text, the first finite state machine including alternative words for at least one word in the input text and a rank associated with each alternative word; adjusting the ranks in the first FSM in accordance with one or more of a plurality of predetermined grammatical rules; determining which of the alternative words is grammatically correct based on the ranks associated with the alternative words; and replacing the at least one word in the input text with a grammatically-correct alternative word determined in the determining step.
50. A method according to Claim 49, wherein the first FSM also includes one or more parts-of-speech for the alternative words; and wherein the determining step determines which of the alternative words is grammatically correct based in addition on the parts-of-speech of the alternative words.
51. A method according to Claim 50, wherein the adjusting step comprises: generating a second FSM for the input text based on the one or more of a plurality of predetermined grammatical rules, the second FSM including the alternative words and ranks associated with each alternative word; and combining the ranks in the second FSM with the ranks in the first FSM.
52. A method according to Claim 50, wherein the generating step comprises performing a morphological analysis on each word in the input text in order to provide the parts-of-speech and ranks.
53. A word processing method for creating and editing text documents, the word processing method comprising the steps of: inputting text into a text document; checking the document for grammatically-incorrect words; replacing grammatically-incorrect words in the document with grammatically-correct words; and outputting the document; wherein the checking step comprises (i) generating a finite state machine ("FSM") for text in the text document, the finite state machine including alternative words for at least one word in the text and a rank associated with each alternative word, (ii) adjusting the ranks in the FSM in accordance with one or more of a plurality of predetermined grammatical rules, and (iii) determining which of the alternative words is grammatically correct based on ranks for the alternative words.
54. A machine translation method for translating text from a first language into a second language, the machine translation method comprising the steps of: inputting text in the first language; checking the text in the first language for grammatically-incorrect words; replacing grammatically-incorrect words in the text with grammatically-correct words; translating the text with the grammatically-correct words from the first language into the second language; and outputting the text in the second language; wherein the checking step comprises (i) generating a finite state machine ("FSM") for the text in the first language, the finite state machine including alternative words for at least one word in the text and a rank associated with each alternative word, (ii) adjusting the ranks in the FSM in accordance with one or more of a plurality of predetermined grammatical rules, and (iii) determining which of the alternative words is grammatically correct based on ranks for the alternative words.
55. A machine translation method for translating text from a first language into a second language, the machine translation method comprising the steps of: inputting text in the first language; translating the text from the first language into the second language; checking the text in the second language for grammatically-incorrect words; replacing grammatically-incorrect words in the text with grammatically-correct words; and outputting the text with the grammatically-correct words; wherein the checking step comprises (i) generating a finite state machine ("FSM") for the text in the second language, the finite state machine including alternative words for at least one word in the text and a rank associated with each alternative word, (ii) adjusting the ranks in the FSM in accordance with one or more of a plurality of predetermined grammatical rules, and (iii) determining which of the alternative words is grammatically correct based on ranks for the alternative words.
56. An optical character recognition method for recognizing input character images, the optical character recognition method comprising the steps of: inputting a document image; parsing character images from the document image; performing recognition processing on parsed character images so as to produce document text; checking the document text for grammatically-incorrect words; replacing grammatically-incorrect words in the document text with grammatically-correct words; and outputting the document text; wherein the checking step comprises (i) generating a finite state machine ("FSM") for the document text, the finite state machine including alternative words for at least one word in the text and a rank associated with each alternative word, (ii) adjusting the ranks in the FSM in accordance with one or more of a plurality of predetermined grammatical rules, and (iii) determining which of the alternative words is grammatically correct based on ranks for the alternative words.
57. A method of retrieving text from a source, the method comprising the steps of: inputting a search phrase comprised of a plurality of words, at least one of the plurality of words being a grammatically-incorrect word; replacing the grammatically-incorrect word in the search phrase with a grammatically-correct word in order to produce a corrected search phrase; and retrieving text from the source based on the corrected search phrase.
58. A method of spell-checking input text, the method comprising the steps of: detecting a misspelled word in the input text; storing one or more lexicon finite state machines ("FSM") in a memory, each of the lexicon FSMs including plural reference words; generating an input FSM for the misspelled word; selecting one or more reference words from the lexicon FSMs based on the input FSM, the one or more reference words substantially corresponding to a spelling of the misspelled word; and outputting selected ones of the one or more reference words.
59. A method according to Claim 58, wherein each of the lexicon FSMs also includes a phonetic representation of each reference word; wherein the input FSM also includes a phonetic representation of the misspelled word; and wherein the selecting step selects reference words from the lexicon FSMs which also substantially correspond to the phonetic representation of the misspelled word.
60. A method according to Claim 59, wherein the selecting step comprises the step of generating an additional FSM having a plurality of states, the states comprising at least states of a lexicon FSM and states of the input FSM, and selecting the one or more reference words from the lexicon FSMs using the additional FSM.
61. A method according to Claim 60, wherein the lexicon FSMs and the input FSM comprise finite state transducers ("FST"), and wherein the additional FSM comprises a finite state automaton ("FSA").
62. A method according to Claim 61, wherein the selecting step selects the one or more reference words by applying each state of the additional FSM to one or more of (i) a character identity module, (ii) a phonetic identity module, (iii) a character insertion module, (iv) a character deletion module, (v) a character replacement module, (vi) a character transposition module, and (vii) a character transposition completion module; and wherein (i) the character identity module determines whether characters of a reference word in the lexicon FSM match characters of the misspelled word in the input FSM, (ii) the phonetic identity module determines whether characters of the reference word are pronounced the same as characters of the misspelled word, (iii) the character insertion module determines whether a character inserted in the misspelled word causes at least part of the misspelled word to match at least part of the reference word, (iv) the character deletion module determines whether a character deleted from the misspelled word causes at least part of the misspelled word to match at least part of the reference word, (v) the character replacement module replaces characters in the misspelled word with characters in the reference word in order to determine whether at least part of the misspelled word matches at least part of the reference word, (vi) the character transposition module changes the order of two or more characters in the misspelled word and compares a changed character in the misspelled word to a corresponding character in the reference word, and (vii) the character transposition completion module compares characters in the misspelled word which were not compared by the character transposition module in order to determine if at least part of the misspelled word matches at least part of the reference word.
63. A computer-readable medium which stores computer-executable process steps, the computer-executable process steps to correct misspelled words in input text, the computer-executable process steps comprising: code to detect a misspelled word in the input text; code to determine a list of alternative words for the misspelled word; and code to rank the list of alternative words based on a context of the input text.
64. A computer-readable medium according to Claim 63, further comprising: code to select one of the alternative words from the list; and code to replace the misspelled word in the text with the selected one of the alternative words.
65. A computer-readable medium according to Claim 63, wherein the code to detect comprises code to compare each word in the input text to a dictionary database and code to characterize a word as misspelled when the word either (i) does not match any words in the dictionary database, or (ii) is spelled correctly but corresponds to one of a plurality of words which are substantially similar.
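By way of illustration only, the two-part detection test of Claim 65 can be sketched as a membership check against a dictionary plus a check against groups of easily confused words. The function name `is_suspect`, the representation of the dictionary as a Python set, and the sample confusion sets are assumptions made for the example.

```python
def is_suspect(word, dictionary, confusion_sets):
    """Return True when a word should be treated as possibly misspelled.

    (i)  The word does not match any dictionary entry, or
    (ii) the word is spelled correctly but belongs to a group of
         substantially similar words (e.g. "their" / "there").
    """
    if word not in dictionary:
        return True
    return any(word in group for group in confusion_sets)


# Hypothetical dictionary and confusion sets.
dictionary = {"cat", "their", "there", "the"}
confusion_sets = [{"their", "there"}]
flags = {w: is_suspect(w, dictionary, confusion_sets) for w in ["teh", "their", "cat"]}
```

The second test is what allows a correctly spelled but contextually wrong word to be passed on to the ranking stage rather than being accepted outright.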
66. A computer-readable medium according to Claim 63, wherein code to determine comprises: code to store one or more lexicon finite state machines ("FSM"), each of the lexicon FSMs including plural reference words and a phonetic representation of each reference word; code to generate an input FSM for the misspelled word, the input FSM including the misspelled word and a phonetic representation of the misspelled word; code to select one or more reference words from the lexicon FSMs based on the input FSM, the one or more reference words substantially corresponding to either a spelling of the misspelled word or to the phonetic representation of the misspelled word; and code to add selected ones of the one or more reference words to the list of alternative words.
67. A computer-readable medium according to Claim 66, wherein the code to select comprises code to generate an additional FSM having a plurality of states, the states comprising at least states of a lexicon FSM and states of the input FSM, and code to select the one or more reference words from the lexicon FSM using the additional FSM.
68. A computer-readable medium according to Claim 67, wherein the lexicon FSMs and the input FSM comprise finite state transducers ("FST"), and the additional FSM comprises a finite state automaton ("FSA").
69. A computer-readable medium according to Claim 67, wherein the code to select selects the one or more reference words by applying each state of the additional FSM to one or more of (i) a character identity module, (ii) a phonetic identity module, (iii) a character insertion module, (iv) a character deletion module, (v) a character replacement module, (vi) a character transposition module, and (vii) a character transposition completion module; and wherein (i) the character identity module determines whether characters of a reference word in the lexicon FSM match characters of the misspelled word in the input FSM, (ii) the phonetic identity module determines whether characters of the reference word are pronounced the same as characters of the misspelled word, (iii) the character insertion module determines whether a character inserted in the misspelled word causes at least part of the misspelled word to match at least part of the reference word, (iv) the character deletion module determines whether a character deleted from the misspelled word causes at least part of the misspelled word to match at least part of the reference word, (v) the character replacement module replaces characters in the misspelled word with characters in the reference word in order to determine whether at least part of the misspelled word matches at least part of the reference word, (vi) the character transposition module changes the order of two or more characters in the misspelled word and compares a changed character in the misspelled word to a corresponding character in the reference word, and (vii) the character transposition completion module compares characters in the misspelled word which were not compared by the character transposition module in order to determine if at least part of the misspelled word matches at least part of the reference word.
70. A computer-readable medium according to Claim 63, wherein the code to rank comprises: code to generate a first finite state machine ("FSM") for the input text, the first FSM having a plurality of arcs which include the alternative words and weights associated therewith, where a weight of each alternative word corresponds to a likelihood that the alternative word, taken out of grammatical context, comprises a correctly-spelled version of the misspelled word; code to generate a second FSM for the input text and the alternative words in accordance with one or more of a plurality of predetermined grammatical rules, the second FSM having a plurality of arcs which include the alternative words and weights associated therewith, where a weight of each alternative word corresponds to a likelihood that the alternative word, taken in grammatical context, comprises a correctly-spelled version of the misspelled word; and code to add corresponding weights of the first FSM and the second FSM and ranking the alternative words in accordance with the weights of the alternative words.
71. A computer-readable medium according to Claim 64, wherein the code to select comprises code to display the list of alternative words and code to select one of the alternative words in response to a user input.
72. A computer-readable medium according to Claim 64, wherein the code to select performs selection automatically.
73. A computer-readable medium according to Claim 72, wherein the code to select selects a one of the alternative words that is ranked highest in the list of alternative words.
74. A computer-readable medium which stores computer-executable process steps, the computer-executable process steps to create and edit text documents, the computer-executable process steps comprising: code to input text into a text document; code to spell-check the text so as to replace misspelled words in the text with correctly-spelled words; and code to output the document; wherein the code to spell-check comprises code to detect misspelled words in the text, and, for each misspelled word, to determine a list of alternative words for the misspelled word, to rank the list of alternative words based on a context in the text, to select one of the alternative words from the list, and to replace the misspelled word in the text with the selected one of the alternative words.
75. A computer-readable medium which stores computer-executable process steps, the computer-executable process steps to translate text from a first language into a second language, the computer-executable process steps comprising: code to input text in the first language; code to spell-check the text in the first language so as to replace misspelled words in the text with correctly-spelled words; code to translate the text from the first language into the second language; and code to output translated text; wherein the code to spell-check comprises code to detect misspelled words in the text, and, for each misspelled word, to determine a list of alternative words for the misspelled word, to rank the list of alternative words based on a context in the text, to select one of the alternative words from the list, and to replace the misspelled word in the document with the selected one of the alternative words.
76. A computer-readable medium which stores computer-executable process steps, the computer-executable process steps to translate text from a first language into a second language, the computer-executable process steps comprising: code to input text in the first language; code to translate the text from the first language into the second language; code to spell-check the text in the second language so as to replace misspelled words in the text with correctly-spelled words; and code to output the text; wherein the code to spell-check comprises code to detect misspelled words in the text, and, for each misspelled word, to determine a list of alternative words for the misspelled word, to rank the list of alternative words based on a context in the text, to select one of the alternative words from the list, and to replace the misspelled word in the document with the selected one of the alternative words.
77. A computer-readable medium which stores computer-executable process steps, the computer-executable process steps to perform an optical character recognition method for recognizing input character images, the computer-executable process steps comprising: code to input a document image; code to parse character images from the document image; code to perform recognition processing on parsed character images so as to produce document text; code to spell-check the document text so as to replace misspelled words in the document text with correctly-spelled words; and code to output the document text; wherein the code to spell-check comprises code to detect misspelled words in the document text, and, for each misspelled word, to determine a list of alternative words for the misspelled word, to rank the list of alternative words based on a context in the text, to select one of the alternative words from the list, and to replace the misspelled word in the document text with the selected one of the alternative words.
78. A computer-readable medium which stores computer-executable process steps, the computer-executable process steps to retrieve text from a source, the computer-executable process steps comprising: code to input a search word; code to correct a spelling of the search word to produce a corrected search word; and code to retrieve text from the source that includes the corrected search word.
79. A computer-readable medium according to Claim 78, wherein the code to correct comprises: code to store one or more lexicon finite state machines ("FSM"), each of the lexicon FSMs including plural reference words and a phonetic representation of each reference word; code to generate an input FSM for the search word, the input FSM including the search word and a phonetic representation of the search word; code to select one or more reference words from the lexicon FSMs based on the input FSM, the one or more reference words substantially corresponding to either a spelling of the search word or to the phonetic representation of the search word; and code to set a selected one of the reference words as the corrected search word.
80. A computer-readable medium according to Claim 79, wherein the code to select comprises code to generate an additional FSM having a plurality of states, the states comprising at least states of a lexicon FSM and states of the input FSM, and code to select the one or more reference words from the lexicon FSM using the additional FSM.
81. A computer-readable medium according to Claim 79, wherein the lexicon FSMs and the input FSM comprise finite state transducers ("FST"), and the additional FSM comprises a finite state automaton ("FSA").
82. A computer-readable medium according to Claim 80, wherein the code to select selects the one or more reference words by applying each state of the additional FSM to one or more of (i) a character identity module, (ii) a phonetic identity module, (iii) a character insertion module, (iv) a character deletion module, (v) a character replacement module, (vi) a character transposition module, and (vii) a character transposition completion module; and wherein (i) the character identity module determines whether characters of a reference word in the lexicon FSM match characters of the search word in the input FSM, (ii) the phonetic identity module determines whether characters of the reference word are pronounced the same as characters of the search word, (iii) the character insertion module determines whether a character inserted in the search word causes at least part of the search word to match at least part of the reference word, (iv) the character deletion module determines whether a character deleted from the search word causes at least part of the search word to match at least part of the reference word, (v) the character replacement module replaces characters in the search word with characters in the reference word in order to determine whether at least part of the search word matches at least part of the reference word, (vi) the character transposition module changes the order of two or more characters in the search word and compares a changed character in the search word to a corresponding character in the reference word, and (vii) the character transposition completion module compares characters in the search word which were not compared by the character transposition module in order to determine if at least part of the search word matches at least part of the reference word.
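By way of illustration only, the character-level modules recited in Claim 82 can be approximated without any finite state machine by a dynamic-programming edit distance. The following Python sketch is not the claimed FSM traversal: it substitutes an optimal-string-alignment table, omits the phonetic identity module entirely, and the names `edit_distance`, `select_reference_words`, and `max_edits` are hypothetical.

```python
def edit_distance(search: str, reference: str) -> int:
    """Optimal-string-alignment distance between a search word and a reference word.

    Each branch loosely mirrors one module: identity, replacement, insertion,
    deletion, and transposition of two adjacent characters.
    """
    rows, cols = len(search) + 1, len(reference) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every remaining search character
    for j in range(cols):
        d[0][j] = j  # insert every remaining reference character
    for i in range(1, rows):
        for j in range(1, cols):
            same = search[i - 1] == reference[j - 1]
            d[i][j] = min(
                d[i - 1][j - 1] + (0 if same else 1),  # identity or replacement
                d[i][j - 1] + 1,  # insert a character into the search word
                d[i - 1][j] + 1,  # delete a character from the search word
            )
            if (i > 1 and j > 1
                    and search[i - 1] == reference[j - 2]
                    and search[i - 2] == reference[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[-1][-1]


def select_reference_words(search, lexicon, max_edits=1):
    """Return lexicon words within max_edits of the search word (hypothetical helper)."""
    return sorted(w for w in lexicon if edit_distance(search, w) <= max_edits)
```

For example, `select_reference_words("teh", {"the", "ten", "tea", "dog"})` keeps "the" (one transposition) together with "ten" and "tea" (one replacement each) and drops "dog".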
83. A computer-readable medium according to Claim 78, wherein the source comprises a remote network location; and wherein the computer-executable process steps further comprise code to display the retrieved text on a local display screen.
84. A computer-readable medium according to Claim 78, wherein the code to correct comprises code to display one or more corrected search words and code to select one of the corrected search words in response to a user input.
85. A computer-readable medium according to Claim 78, wherein the code to correct comprises code to select, automatically, one of plural corrected search words.
86. A computer-readable medium according to Claim 78, wherein the source comprises a pre-stored database.
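As a rough illustration of the retrieval flow of Claims 78 to 86 (input a search word, correct its spelling, retrieve text that includes the corrected word), the Python sketch below uses the standard-library `difflib.get_close_matches` in place of the lexicon FSMs of Claim 79. The function names, the in-memory `source` list standing in for a pre-stored database, and the 0.6 similarity cutoff are assumptions made for the example only.

```python
import difflib


def correct_search_word(search_word, lexicon):
    """Pick the closest lexicon entry, or keep the word if nothing is close enough."""
    matches = difflib.get_close_matches(search_word, lexicon, n=1, cutoff=0.6)
    return matches[0] if matches else search_word


def retrieve(search_word, lexicon, source):
    """Return every text in the source that contains the corrected search word."""
    corrected = correct_search_word(search_word.lower(), lexicon)
    return [text for text in source if corrected in text.lower().split()]
```

Here the best match is selected automatically, as in Claim 85; displaying several candidates for user selection, as in Claim 84, would only require raising `n`.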
87. A computer-readable medium which stores computer-executable process steps, the computer-executable process steps to retrieve text from a source, the computer-executable process steps comprising: code to input a search phrase comprised of a plurality of words, at least one of the plurality of words being an incorrect word; code to replace the incorrect word in the search phrase with a corrected word in order to produce a corrected search phrase; and code to retrieve text from the source based on the corrected search phrase.
88. A computer-readable medium according to Claim 87, further comprising: code to generate a first finite state machine ("FSM") comprised of two or more arcs which include alternatives to the incorrect word, each alternative having a rank associated therewith; and code to select, as the corrected word, an alternative having a highest rank.
89. A computer-readable medium according to Claim 88, wherein the code to generate comprises: code to store one or more lexicon FSMs, each of the lexicon FSMs including plural reference words and a phonetic representation of each reference word; code to generate an input FSM for the incorrect word, the input FSM including the incorrect word and a phonetic representation of the incorrect word; code to select two or more reference words from the lexicon FSMs based on the input FSM, the two or more reference words substantially corresponding to either a spelling of the incorrect word or to the phonetic representation of the incorrect word and having a rank associated therewith; and code to store, in a first FSM, the two or more reference words and corresponding ranks associated therewith.
90. A computer-readable medium according to Claim 89, wherein the code to select two or more reference words comprises code to generate an additional FSM having a plurality of states, the states comprising at least states of a lexicon FSM and states of the input FSM, and code to select the two or more reference words from the lexicon FSMs using the additional FSM.
91. A computer-readable medium according to Claim 90, wherein the first FSM, the lexicon FSMs, and the input FSM comprise finite state transducers ("FST"), and wherein the additional FSM comprises a finite state automaton ("FSA").
92. A computer-readable medium according to Claim 90, wherein the code to select two or more reference words selects the two or more reference words by applying each state of the additional FSM to one or more of (i) a character identity module, (ii) a phonetic identity module, (iii) a character insertion module, (iv) a character deletion module, (v) a character replacement module, (vi) a character transposition module, and (vii) a character transposition completion module; and wherein (i) the character identity module determines whether characters of a reference word in the lexicon FSM match characters of the incorrect word in the input FSM, (ii) the phonetic identity module determines whether characters of the reference word are pronounced the same as characters of the incorrect word, (iii) the character insertion module determines whether a character inserted in the incorrect word causes at least part of the incorrect word to match at least part of the reference word, (iv) the character deletion module determines whether a character deleted from the incorrect word causes at least part of the incorrect word to match at least part of the reference word, (v) the character replacement module replaces characters in the incorrect word with characters in the reference word in order to determine whether at least part of the incorrect word matches at least part of the reference word, (vi) the character transposition module changes the order of two or more characters in the incorrect word and compares a changed character in the incorrect word to a corresponding character in the reference word, and (vii) the character transposition completion module compares characters in the incorrect word which were not compared by the character transposition module in order to determine if at least part of the incorrect word matches at least part of the reference word.
93. A computer-readable medium according to Claim 87, wherein the source comprises a remote network location; and wherein the computer-executable process steps further comprise code to display the retrieved text on a local display screen.
94. A computer-readable medium according to Claim 87, wherein the code to replace comprises code to display one or more corrected words and code to permit manual selection of one of the corrected words as a replacement for the incorrect word.
95. A computer-readable medium according to Claim 87, wherein the code to replace comprises code to select, automatically, one of plural corrected words as a replacement for the incorrect word.
96. A computer-readable medium according to Claim 87, wherein the source comprises a pre-stored database.
97. A computer-readable medium which stores computer-executable process steps, the computer-executable process steps to correct misspelled words in input text sequences received from a plurality of different clients, the computer-executable process steps comprising: code to store, in a memory on a server, a lexicon comprised of a plurality of reference words; code to receive the input text sequences from the plurality of different clients; code to spell-check the input text sequences using the reference words in the lexicon; and code to output spell-checked text sequences to the plurality of different clients.
98. A computer-readable medium according to Claim 97, wherein the lexicon comprises one or more lexicon finite state machines ("FSM"), the lexicon FSMs including the plurality of reference words and a phonetic representation of each reference word; and wherein the code to spell-check comprises code to correct misspelled words in each of the input text sequences substantially in parallel using the lexicon FSMs stored in the single memory.
99. A computer-readable medium according to Claim 98, wherein, for each text sequence, the code to correct comprises: code to generate an input FSM for a misspelled word in the text sequence, the input FSM including the misspelled word and a phonetic representation of the misspelled word; code to select one or more reference words from the lexicon FSMs based on the input FSM, the one or more reference words substantially corresponding to either a spelling of the misspelled word or to the phonetic representation of the misspelled word; and code to replace the misspelled word in the text sequence with a selected one of the one or more reference words.
100. A computer-readable medium according to Claim 99, wherein the code to select comprises: code to generate an additional FSM having a plurality of states for the misspelled word, the states comprising at least states of a lexicon FSM and states of the input FSM; and code to select the one or more reference words from the lexicon FSMs using the additional FSM.
101. A computer-readable medium according to Claim 100, wherein the lexicon FSMs and the input FSM comprise finite state transducers ("FST"), and wherein the additional FSM comprises a finite state automaton ("FSA").
102. A computer-readable medium according to Claim 97, further comprising code to retrieve a document from a source using one of the spell-checked text sequences.
103. A computer-readable medium which stores computer-executable process steps, the computer-executable process steps to select a replacement word for an input word in a phrase, the computer-executable process steps comprising: code to determine alternative words for the input word, the alternative words including at least one compound word which is comprised of two or more separate words, each alternative word having a rank associated therewith; and code to select, as the replacement word, an alternative word having a highest rank.
104. A computer-readable medium according to Claim 103, further comprising code to generate a finite state machine ("FSM") comprised of two or more arcs which include the alternatives to the input words.
105. A computer-readable medium according to Claim 103, wherein, in a case that the code to select selects a compound word as the replacement word, the computer-executable process steps further comprise code to replace the input word and at least one other word in the phrase with the compound word.
106. A computer-readable medium according to Claim 103, further comprising code to adjust the rank of each alternative based on a grammatical context of the input word in the phrase.
107. A computer-readable medium according to Claim 106, wherein each word in the phrase has a part of speech associated therewith, and each of the alternative words has a part of speech associated therewith; and wherein code to adjust adjusts the rank of each alternative word based on whether a part of speech of the alternative word fits with a part of speech of at least one word adjacent to the input word.
108. A computer-readable medium according to Claim 107, wherein each compound word in the phrase has a single part of speech associated therewith.
109. A computer-readable medium according to Claim 103, further comprising code to display the alternative words ranked in order; wherein the code to select selects an alternative in response to a user input.
110. A computer-readable medium according to Claim 103, wherein at least one of the alternative words comprises a word having an accent mark and/or a diacritic which is different from, and/or missing from, the input word.
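The accent and diacritic case of Claim 110 can be sketched in Python by comparing words after their combining marks have been stripped. This is only one possible approach, not necessarily the one used in the described system; `strip_marks` and `accent_alternatives` are hypothetical names, and the sketch covers only marks that Unicode NFD normalization separates from their base letter.

```python
import unicodedata


def strip_marks(word):
    """Remove combining marks (accents, diacritics) after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accent_alternatives(input_word, lexicon):
    """Lexicon words that differ from the input word only in accents or diacritics."""
    key = strip_marks(input_word).lower()
    return sorted(
        w for w in lexicon
        if w != input_word and strip_marks(w).lower() == key
    )
```

For instance, with a lexicon containing "résumé", "resume", and "result", the input word "resume" yields "résumé" as an alternative.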
111. A computer-readable medium which stores computer-executable process steps, the computer-executable process steps to correct grammatical errors in input text, the computer-executable process steps comprising: code to generate a first finite state machine ("FSM") for the input text, the first finite state machine including alternative words for at least one word in the input text and a rank associated with each alternative word; code to adjust the ranks in the first FSM in accordance with one or more of a plurality of predetermined grammatical rules; code to determine which of the alternative words is grammatically correct based on the ranks associated with the alternative words; and code to replace the at least one word in the input text with a grammatically-correct alternative word determined by the code to determine.
112. A computer-readable medium according to Claim 111, wherein the first FSM also includes one or more parts-of-speech for the alternative words; and wherein the code to determine determines which of the alternative words is grammatically correct based in addition on the parts-of-speech of the alternative words.
113. A computer-readable medium according to Claim 112, wherein the code to adjust comprises: code to generate a second FSM for the input text based on the one or more of a plurality of predetermined grammatical rules, the second FSM including the alternative words and ranks associated with each alternative word; and code to combine the ranks in the second FSM with the ranks in the first FSM.
114. A computer-readable medium according to Claim 112, wherein the code to generate comprises code to perform a morphological analysis on each word in the input text in order to provide the parts-of-speech and ranks.
115. A computer-readable medium which stores computer-executable process steps, the computer-executable process steps to create and edit text documents, the computer-executable process steps comprising: code to input text into a text document; code to check the document for grammatically-incorrect words; code to replace grammatically-incorrect words in the document with grammatically-correct words; and code to output the document; wherein the code to check comprises code (i) to generate a finite state machine ("FSM") for text in the text document, the finite state machine including alternative words for at least one word in the text and a rank associated with each alternative word, (ii) to adjust the ranks in the FSM in accordance with one or more of a plurality of predetermined grammatical rules, and (iii) to determine which of the alternative words is grammatically correct based on ranks for the alternative words.
116. A computer-readable medium which stores computer-executable process steps, the computer-executable process steps to translate text from a first language into a second language, the computer-executable process steps comprising: code to input text in the first language; code to check the text in the first language for grammatically-incorrect words; code to replace grammatically-incorrect words in the text with grammatically-correct words; code to translate the text with the grammatically-correct words from the first language into the second language; and code to output the text in the second language; wherein the code to check comprises code (i) to generate a finite state machine ("FSM") for the text in the first language, the finite state machine including alternative words for at least one word in the text and a rank associated with each alternative word, (ii) to adjust the ranks in the FSM in accordance with one or more of a plurality of predetermined grammatical rules, and (iii) to determine which of the alternative words is grammatically correct based on ranks for the alternative words.
117. A computer-readable medium which stores computer-executable process steps, the computer-executable process steps to translate text from a first language into a second language, the computer-executable process steps comprising: code to input text in the first language; code to translate the text from the first language into the second language; code to check the text in the second language for grammatically-incorrect words; code to replace grammatically-incorrect words in the text with grammatically-correct words; and code to output the text with the grammatically-correct words; wherein the code to check comprises code (i) to generate a finite state machine ("FSM") for the text in the second language, the finite state machine including alternative words for at least one word in the text and a rank associated with each alternative word, (ii) to adjust the ranks in the FSM in accordance with one or more of a plurality of predetermined grammatical rules, and (iii) to determine which of the alternative words is grammatically correct based on ranks for the alternative words.
118. A computer-readable medium which stores computer-executable process steps, the computer-executable process steps for performing an optical character recognition method which recognizes input character images, the computer-executable process steps comprising: code to input a document image; code to parse character images from the document image; code to perform recognition processing on parsed character images so as to produce document text; code to check the document text for grammatically-incorrect words; code to replace grammatically-incorrect words in the document text with grammatically-correct words; and code to output the document text; wherein the code to check comprises code (i) to generate a finite state machine ("FSM") for the document text, the finite state machine including alternative words for at least one word in the text and a rank associated with each alternative word, (ii) to adjust the ranks in the FSM in accordance with one or more of a plurality of predetermined grammatical rules, and (iii) to determine which of the alternative words is grammatically correct based on ranks for the alternative words.
119. A computer-readable medium which stores computer-executable process steps, the computer-executable process steps to retrieve text from a source, the computer-executable process steps comprising: code to input a search phrase comprised of a plurality of words, at least one of the plurality of words being a grammatically-incorrect word; code to replace the grammatically-incorrect word in the search phrase with a grammatically-correct word in order to produce a corrected search phrase; and code to retrieve text from the source based on the corrected search phrase.
120. A computer-readable medium which stores computer-executable process steps, the computer-executable process steps to spell-check input text, the computer-executable process steps comprising: code to detect a misspelled word in the input text; code to store one or more lexicon finite state machines ("FSM") in a memory, each of the lexicon FSMs including plural reference words; code to generate an input FSM for the misspelled word; code to select one or more reference words from the lexicon FSMs based on the input FSM, the one or more reference words substantially corresponding to a spelling of the misspelled word; and code to output selected ones of the one or more reference words.
121. A computer-readable medium according to Claim 120, wherein each of the lexicon FSMs also includes a phonetic representation of each reference word; wherein the input FSM also includes a phonetic representation of the misspelled word; and wherein the code to select selects reference words from the lexicon FSMs which also substantially correspond to the phonetic representation of the misspelled word.
122. A computer-readable medium according to Claim 121, wherein the code to select comprises code to generate an additional FSM having a plurality of states, the states comprising at least states of a lexicon FSM and states of the input FSM, and code to select the one or more reference words from the lexicon FSMs using the additional FSM.
123. A computer-readable medium according to Claim 122, wherein the lexicon FSMs and the input FSM comprise finite state transducers ("FST"), and wherein the additional FSM comprises a finite state automaton ("FSA").
124. A computer-readable medium according to Claim 123, wherein the code to select selects the one or more reference words by applying each state of the additional FSM to one or more of (i) a character identity module, (ii) a phonetic identity module, (iii) a character insertion module, (iv) a character deletion module, (v) a character replacement module, (vi) a character transposition module, and (vii) a character transposition completion module; and wherein (i) the character identity module determines whether characters of a reference word in the lexicon FSM match characters of the misspelled word in the input FSM, (ii) the phonetic identity module determines whether characters of the reference word are pronounced the same as characters of the misspelled word, (iii) the character insertion module determines whether a character inserted in the misspelled word causes at least part of the misspelled word to match at least part of the reference word, (iv) the character deletion module determines whether a character deleted from the misspelled word causes at least part of the misspelled word to match at least part of the reference word, (v) the character replacement module replaces characters in the misspelled word with characters in the reference word in order to determine whether at least part of the misspelled word matches at least part of the reference word, (vi) the character transposition module changes the order of two or more characters in the misspelled word and compares a changed character in the misspelled word to a corresponding character in the reference word, and (vii) the character transposition completion module compares characters in the misspelled word which were not compared by the character transposition module in order to determine if at least part of the misspelled word matches at least part of the reference word.
125. An apparatus for correcting misspelled words in input text, the apparatus comprising: a memory which stores computer-executable process steps; and a processor which executes the computer-executable process steps so as to detect a misspelled word in the input text, to determine a list of alternative words for the misspelled word, and to rank the list of alternative words based on a context of the input text.
126. An apparatus according to Claim 125, wherein the processor further executes process steps to select one of the alternative words from the list, and to replace the misspelled word in the text with the selected one of the alternative words.
127. An apparatus according to Claim 125, wherein the processor detects the misspelled word by comparing each word in the input text to a dictionary database and characterizing a word as misspelled when the word either (i) does not match any words in the dictionary database, or (ii) is spelled correctly but corresponds to one of a plurality of words which are substantially similar.
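The two detection conditions of Claim 127 can be pictured with a short Python sketch. The set-based `dictionary`, the `confusable_sets` list of substantially similar words, and the function name `is_flagged` are assumptions introduced for illustration; the claim itself refers only to a dictionary database.

```python
def is_flagged(word, dictionary, confusable_sets):
    """Characterize a word as misspelled under either condition of Claim 127."""
    lower = word.lower()
    # Condition (i): the word matches no entry in the dictionary.
    if lower not in dictionary:
        return True
    # Condition (ii): the word is spelled correctly but belongs to a group of
    # substantially similar words (for example "their" and "there").
    return any(lower in group for group in confusable_sets)
```

With a dictionary of "the", "their", "there", and "cat" and one confusable group holding "their" and "there", the words "teh" and "their" are flagged while "cat" is not.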
128. An apparatus according to Claim 125, wherein the processor determines the list of alternative words by (i) storing one or more lexicon finite state machines ("FSM"), each of the lexicon FSMs including plural reference words and a phonetic representation of each reference word, (ii) generating an input FSM for the misspelled word, the input FSM including the misspelled word and a phonetic representation of the misspelled word, (iii) selecting one or more reference words from the lexicon FSMs based on the input FSM, the one or more reference words substantially corresponding to either a spelling of the misspelled word or to the phonetic representation of the misspelled word, and (iv) adding selected ones of the one or more reference words to the list of alternative words.
129. An apparatus according to Claim 128, wherein the processor selects the one or more reference words by (i) generating an additional FSM having a plurality of states, the states comprising at least states of a lexicon FSM and states of the input FSM, and (ii) selecting the one or more reference words from the lexicon FSM using the additional FSM.
130. An apparatus according to Claim 129, wherein the lexicon FSMs and the input FSM comprise finite state transducers ("FST"), and the additional FSM comprises a finite state automaton ("FSA").
131. An apparatus according to Claim 129, wherein the processor selects the one or more reference words by applying each state of the additional FSM to one or more of (i) a character identity module, (ii) a phonetic identity module, (iii) a character insertion module, (iv) a character deletion module, (v) a character replacement module, (vi) a character transposition module, and (vii) a character transposition completion module; and wherein (i) the character identity module determines whether characters of a reference word in the lexicon FSM match characters of the misspelled word in the input FSM, (ii) the phonetic identity module determines whether characters of the reference word are pronounced the same as characters of the misspelled word, (iii) the character insertion module determines whether a character inserted in the misspelled word causes at least part of the misspelled word to match at least part of the reference word, (iv) the character deletion module determines whether a character deleted from the misspelled word causes at least part of the misspelled word to match at least part of the reference word, (v) the character replacement module replaces characters in the misspelled word with characters in the reference word in order to determine whether at least part of the misspelled word matches at least part of the reference word, (vi) the character transposition module changes the order of two or more characters in the misspelled word and compares a changed character in the misspelled word to a corresponding character in the reference word, and (vii) the character transposition completion module compares characters in the misspelled word which were not compared by the character transposition module in order to determine if at least part of the misspelled word matches at least part of the reference word.
132. An apparatus according to Claim 125, wherein the processor ranks the list of alternative words by (i) generating a first finite state machine ("FSM") for the input text, the first FSM having a plurality of arcs which include the alternative words and weights associated therewith, where a weight of each alternative word corresponds to a likelihood that the alternative word, taken out of grammatical context, comprises a correctly-spelled version of the misspelled word, (ii) generating a second FSM for the input text and the alternative words in accordance with one or more of a plurality of predetermined grammatical rules, the second FSM having a plurality of arcs which include the alternative words and weights associated therewith, where a weight of each alternative word corresponds to a likelihood that the alternative word, taken in grammatical context, comprises a correctly-spelled version of the misspelled word, and (iii) adding corresponding weights of the first FSM and the second FSM and ranking the alternative words in accordance with the weights of the alternative words.
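Step (iii) of Claim 132, adding corresponding weights and ranking, can be illustrated with plain dictionaries standing in for the weighted arcs of the two FSMs. This Python sketch assumes that a larger weight means a more likely alternative, which the claim does not state; the function name and the example weights are hypothetical.

```python
def rank_alternatives(spelling_weights, context_weights):
    """Add the out-of-context and in-context weights, then rank best first.

    spelling_weights: alternative word -> weight from the first FSM
    context_weights:  alternative word -> weight from the second FSM
    """
    combined = {
        word: weight + context_weights.get(word, 0.0)
        for word, weight in spelling_weights.items()
    }
    # Sort by descending combined weight; break ties alphabetically.
    return sorted(combined, key=lambda word: (-combined[word], word))
```

With spelling weights of 0.6 for "their" and 0.7 for "there" and context weights of 0.5 and 0.1 respectively, "their" ranks first even though "there" scored higher out of context.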
133. An apparatus according to Claim 126, wherein the processor selects one of the alternative words by executing process steps to display the list of alternative words and in response to manual selection of one of the displayed alternative words.
134. An apparatus according to Claim 126, wherein the processor selects the one of the alternative words automatically.
135. An apparatus according to Claim 134, wherein the processor selects the one of the alternative words that is ranked highest in the list of alternative words.
136. An apparatus for creating and editing text documents, the apparatus comprising: a memory which stores computer-executable process steps; and a processor which executes the process steps stored in the memory so as (i) to input text into a text document, (ii) to spell-check the text so as to replace misspelled words in the text with correctly-spelled words, and (iii) to output the document; wherein the processor spell-checks the text by detecting misspelled words in the text, and, for each misspelled word, determining a list of alternative words for the misspelled word, ranking the list of alternative words based on a context in the text, selecting one of the alternative words from the list, and replacing the misspelled word in the text with the selected one of the alternative words.
137. An apparatus for translating text from a first language into a second language, the apparatus comprising: a memory which stores computer-executable process steps; and a processor which executes the process steps stored in the memory so as (i) to input text in the first language, (ii) to spell-check the text in the first language so as to replace misspelled words in the text with correctly-spelled words, (iii) to translate the text from the first language into the second language, and (iv) to output translated text; wherein the processor spell-checks the text in the first language by detecting misspelled words in the text, and, for each misspelled word, determining a list of alternative words for the misspelled word, ranking the list of alternative words based on a context in the text, selecting one of the alternative words from the list, and replacing the misspelled word in the document with the selected one of the alternative words.
138. An apparatus for translating text from a first language into a second language, the apparatus comprising: a memory which stores computer-executable process steps; and a processor which executes the process steps stored in the memory so as (i) to input text in the first language, (ii) to translate the text from the first language into the second language, (iii) to spell-check the text in the second language so as to replace misspelled words in the text with correctly-spelled words, and (iv) to output text; wherein the processor spell-checks the text by detecting misspelled words in the text, and, for each misspelled word, determining a list of alternative words for the misspelled word, ranking the list of alternative words based on a context in the text, selecting one of the alternative words from the list, and replacing the misspelled word in the document with the selected one of the alternative words.
139. An apparatus for performing an optical character recognition method to recognize input character images, the apparatus comprising: a memory which stores computer-executable process steps; and a processor which executes the process steps stored in the memory so as (i) to input a document image, (ii) to parse character images from the document image, (iii) to perform recognition processing on parsed character images so as to produce document text, (iv) to spell-check the document text so as to replace misspelled words in the document text with correctly-spelled words, and (v) to output the document text; wherein the processor spell-checks the document text by detecting misspelled words in the document text, and, for each misspelled word, determining a list of alternative words for the misspelled word, ranking the list of alternative words based on a context in the text, selecting one of the alternative words from the list, and replacing the misspelled word in the document text with the selected one of the alternative words.
140. An apparatus for retrieving text from a source, the apparatus comprising: a memory which stores computer-executable process steps; and a processor which executes the process steps stored in the memory so as (i) to input a search word, (ii) to correct a spelling of the search word to produce a corrected search word, and (iii) to retrieve text from the source that includes the corrected search word.
141. An apparatus according to Claim 140, wherein the processor corrects the spelling of the search word by (i) storing one or more lexicon finite state machines ("FSM"), each of the lexicon FSMs including plural reference words and a phonetic representation of each reference word, (ii) generating an input FSM for the search word, the input FSM including the search word and a phonetic representation of the search word, (iii) selecting one or more reference words from the lexicon FSMs based on the input FSM, the one or more reference words substantially corresponding to either a spelling of the search word or to the phonetic representation of the search word, and (iv) setting a selected one of the reference words as the corrected search word.
142. An apparatus according to Claim 141, wherein the processor selects the one or more reference words by generating an additional FSM having a plurality of states, the states comprising at least states of a lexicon FSM and states of the input FSM, and selecting the one or more reference words from the lexicon FSM using the additional FSM.
143. An apparatus according to Claim 141, wherein the lexicon FSMs and the input FSM comprise finite state transducers ("FST"), and the additional FSM comprises a finite state automaton ("FSA").
144. An apparatus according to Claim 142, wherein the processor selects the one or more reference words by applying each state of the additional FSM to one or more of (i) a character identity module, (ii) a phonetic identity module, (iii) a character insertion module, (iv) a character deletion module, (v) a character replacement module, (vi) a character transposition module, and (vii) a character transposition completion module; and wherein (i) the character identity module determines whether characters of a reference word in the lexicon FSM match characters of the search word in the input FSM, (ii) the phonetic identity module determines whether characters of the reference word are pronounced the same as characters of the search word, (iii) the character insertion module determines whether a character inserted in the search word causes at least part of the search word to match at least part of the reference word, (iv) the character deletion module determines whether a character deleted from the search word causes at least part of the search word to match at least part of the reference word, (v) the character replacement module replaces characters in the search word with characters in the reference word in order to determine whether at least part of the search word matches at least part of the reference word, (vi) the character transposition module changes the order of two or more characters in the search word and compares a changed character in the search word to a corresponding character in the reference word, and (vii) the character transposition completion module compares characters in the search word which were not compared by the character transposition module in order to determine if at least part of the search word matches at least part of the reference word.
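The character-level operations named in Claim 144 (identity, insertion, deletion, replacement, and transposition) can be illustrated with a small sketch. This is not the FSM-based selection recited in the claims: it substitutes a plain dynamic-programming edit distance over an in-memory word set, and it omits the phonetic identity module entirely. The names `edit_distance`, `select_reference_words`, and `max_edits` are hypothetical.

```python
def edit_distance(search: str, reference: str) -> int:
    """Count the edits needed to turn `search` into `reference`.

    Allowed edits: insertion, deletion, replacement, and transposition
    of two adjacent characters (optimal string alignment distance).
    """
    rows, cols = len(search) + 1, len(reference) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of the search prefix
    for j in range(cols):
        d[0][j] = j  # insert every character of the reference prefix
    for i in range(1, rows):
        for j in range(1, cols):
            same = search[i - 1] == reference[j - 1]
            d[i][j] = min(
                d[i - 1][j] + 1,                       # deletion
                d[i][j - 1] + 1,                       # insertion
                d[i - 1][j - 1] + (0 if same else 1),  # identity or replacement
            )
            # Transposition: the last two characters are swapped.
            if (i > 1 and j > 1
                    and search[i - 1] == reference[j - 2]
                    and search[i - 2] == reference[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def select_reference_words(search, lexicon, max_edits=1):
    """Return lexicon words within `max_edits` edits of the search word."""
    return sorted(w for w in lexicon if edit_distance(search, w) <= max_edits)
```

For example, `select_reference_words("teh", {"the", "ten", "tea", "dog"})` keeps "the" (one transposition) along with "tea" and "ten" (one replacement each) and drops "dog".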
145. An apparatus according to Claim 140, wherein the source comprises a remote network location; and wherein the processor further executes process steps to display the retrieved text on a local display screen.
146. An apparatus according to Claim 140, wherein the processor corrects the spelling of the search word by displaying one or more corrected search words and selecting one of the corrected search words in response to a user input.
147. An apparatus according to Claim 140, wherein the processor corrects the spelling of the search word by automatically selecting one of plural corrected search words.
148. An apparatus according to Claim 140, wherein the source comprises a pre-stored database.
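The retrieval flow of Claims 140 to 148 (input a search word, correct its spelling, retrieve text containing the corrected word) might look roughly like the sketch below. It assumes a list of strings as the "pre-stored database" and uses Python's standard `difflib.get_close_matches` as a stand-in for the lexicon FSM matching of Claim 141; the function names and the 0.6 cutoff are illustrative assumptions, not part of the claims.

```python
import difflib


def correct_search_word(word, lexicon):
    """Return the closest lexicon word, or the word itself if none is close."""
    if word in lexicon:
        return word
    # Stand-in for FSM-based selection: similarity-ratio matching.
    matches = difflib.get_close_matches(word, lexicon, n=1, cutoff=0.6)
    return matches[0] if matches else word


def retrieve(search_word, lexicon, source):
    """Retrieve every text in `source` that contains the corrected word."""
    corrected = correct_search_word(search_word, lexicon)
    return [text for text in source if corrected in text.lower().split()]
```

Here the selection is automatic, as in Claim 147; a variant following Claim 146 would present several matches (a larger `n`) and let the user choose.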
149. An apparatus for retrieving text from a source, the apparatus comprising: a memory which stores computer-executable process steps; and a processor which executes the process steps stored in the memory so as (i) to input a search phrase comprised of a plurality of words, at least one of the plurality of words being an incorrect word, (ii) to replace the incorrect word in the search phrase with a corrected word in order to produce a corrected search phrase, and (iii) to retrieve text from the source based on the corrected search phrase.
150. An apparatus according to Claim 149, wherein the processor, between inputting the search phrase and replacing the incorrect word, executes computer-executable process steps so as (i) to generate a first finite state machine ("FSM") comprised of two or more arcs which include alternatives to the incorrect word, each alternative having a rank associated therewith, and (ii) to select, as the corrected word, an alternative having a highest rank.
151. An apparatus according to Claim 150, wherein the processor generates the first FSM by (i) storing one or more lexicon FSMs, each of the lexicon FSMs including plural reference words and a phonetic representation of each reference word, (ii) generating an input FSM for the incorrect word, the input FSM including the incorrect word and a phonetic representation of the incorrect word, (iii) selecting two or more reference words from the lexicon FSMs based on the input FSM, the two or more reference words substantially corresponding to either a spelling of the incorrect word or to the phonetic representation of the incorrect word and having a rank associated therewith, and (iv) storing, in a first FSM, the two or more reference words and corresponding ranks associated therewith.
152. An apparatus according to Claim 151, wherein the processor selects the two or more reference words by generating an additional FSM having a plurality of states, the states comprising at least states of a lexicon FSM and states of the input FSM, and selecting the two or more reference words from the lexicon FSMs using the additional FSM.
153. An apparatus according to Claim 152, wherein the first FSM, the lexicon FSMs, and the input FSM comprise finite state transducers ("FST"), and wherein the additional FSM comprises a finite state automaton ("FSA").
154. An apparatus according to Claim 152, wherein the processor selects the two or more reference words by applying each state of the additional FSM to one or more of (i) a character identity module, (ii) a phonetic identity module, (iii) a character insertion module, (iv) a character deletion module, (v) a character replacement module, (vi) a character transposition module, and (vii) a character transposition completion module; and wherein (i) the character identity module determines whether characters of a reference word in the lexicon FSM match characters of the incorrect word in the input FSM, (ii) the phonetic identity module determines whether characters of the reference word are pronounced the same as characters of the incorrect word, (iii) the character insertion module determines whether a character inserted in the incorrect word causes at least part of the incorrect word to match at least part of the reference word, (iv) the character deletion module determines whether a character deleted in the incorrect word causes at least part of the incorrect word to match at least part of the reference word, (v) the character replacement module replaces characters in the incorrect word with characters in the reference word in order to determine whether at least part of the incorrect word matches at least part of the reference word, (vi) the character transposition module changes the order of two or more characters in the incorrect word and compares a changed character in the incorrect word to a corresponding character in the reference word, and (vii) the character transposition completion module compares characters in the incorrect word which were not compared by the character transposition module in order to determine if at least part of the incorrect word matches at least part of the reference word.
155. An apparatus according to Claim 149, wherein the source comprises a remote network location; and wherein the processor further executes process steps to display the retrieved text on a local display screen.
156. An apparatus according to Claim 149, wherein the processor replaces the incorrect word by displaying one or more corrected words and replacing the incorrect word with a user-selected one of the one or more corrected words.
157. An apparatus according to Claim 149, wherein the processor replaces the incorrect word automatically by selecting one of plural corrected words as a replacement for the incorrect word.
158. An apparatus according to Claim 149, wherein the source comprises a pre-stored database.
159. An apparatus for correcting misspelled words in input text sequences received from a plurality of different clients, the apparatus comprising: a server which includes a memory to store computer-executable process steps; and a processor which executes the process steps stored in the memory so as (i) to store, in a memory on the server, a lexicon comprised of a plurality of reference words, (ii) to receive the input text sequences from the plurality of different clients, (iii) to spell-check the input text sequences using the reference words in the lexicon, and (iv) to output spell-checked text sequences to the plurality of different clients.
160. An apparatus according to Claim 159, wherein the lexicon comprises one or more lexicon finite state machines ("FSM"), the lexicon FSMs including the plurality of reference words and a phonetic representation of each reference word; and wherein the processor spell-checks the input text sequences by correcting misspelled words in each of the input text sequences substantially in parallel using the lexicon FSMs stored in the memory.
161. An apparatus according to Claim 160, wherein, for each text sequence, the processor corrects the misspelled words by (i) generating an input FSM for a misspelled word in the text sequence, the input FSM including the misspelled word and a phonetic representation of the misspelled word, (ii) selecting one or more reference words from the lexicon FSMs based on the input FSM, the one or more reference words substantially corresponding to either a spelling of the misspelled word or to the phonetic representation of the misspelled word, and (iii) replacing the misspelled word in the text sequence with a selected one of the one or more reference words.
162. An apparatus according to Claim 161, wherein the processor selects the one or more reference words by (i) generating an additional FSM having a plurality of states for the misspelled word, the states comprising at least states of a lexicon FSM and states of the input FSM, and (ii) selecting the one or more reference words from the lexicon FSMs using the additional FSM.
163. An apparatus according to Claim 162, wherein the lexicon FSMs and the input FSM comprise finite state transducers ("FST"), and wherein the additional FSM comprises a finite state automaton ("FSA").
164. An apparatus according to Claim 159, wherein the processor executes process steps to retrieve a document from a source using one of the spell-checked text sequences.
165. An apparatus for selecting a replacement word for an input word in a phrase, the apparatus comprising: a memory which stores computer-executable process steps; and a processor which executes the process steps stored in the memory so as (i) to determine alternative words for the input word, the alternative words including at least one compound word which is comprised of two or more separate words, each alternative word having a rank associated therewith, and (ii) to select, as the replacement word, an alternative word having a highest rank.
166. An apparatus according to Claim 165, wherein the processor, between determining and selecting, further executes process steps to generate a finite state machine ("FSM") comprised of two or more arcs which include the alternatives to the input words.
167. An apparatus according to Claim 165, wherein, in a case that the processor selects a compound word as the replacement word, the processor further executes process steps to replace the input word and at least one other word in the phrase with the compound word.
168. An apparatus according to Claim 165, wherein, between determining and selecting, the processor executes process steps to adjust the rank of each alternative based on a grammatical context of the input word in the phrase.
169. An apparatus according to Claim 168, wherein each word in the phrase has a part of speech associated therewith, and each of the alternative words has a part of speech associated therewith; and wherein the processor adjusts the rank of each alternative word based on whether a part of speech of the alternative word fits with a part of speech of at least one word adjacent to the input word.
170. An apparatus according to Claim 169, wherein each compound word in the phrase has a single part of speech associated therewith.
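The rank adjustment of Claims 168 to 170 can be sketched as follows. The claims do not say how "fits" is decided or how much a rank changes, so the table of allowed part-of-speech pairs, the tag names, and the fixed `bonus` are all assumptions made for illustration. A compound alternative carries a single tag, in line with Claim 170.

```python
# Hypothetical table of part-of-speech pairs (previous word, candidate)
# that are treated as fitting together.
ALLOWED_PAIRS = {
    ("DET", "NOUN"),
    ("ADJ", "NOUN"),
    ("NOUN", "VERB"),
    ("PRON", "VERB"),
    ("VERB", "DET"),
}


def adjust_ranks(alternatives, prev_pos, bonus=1.0):
    """Raise the rank of alternatives whose tag fits the preceding word.

    `alternatives` is a list of (word, part_of_speech, rank) tuples.
    Returns a new list sorted by adjusted rank, highest first.
    """
    adjusted = []
    for word, pos, rank in alternatives:
        if (prev_pos, pos) in ALLOWED_PAIRS:
            rank += bonus
        adjusted.append((word, pos, rank))
    return sorted(adjusted, key=lambda alt: alt[2], reverse=True)


def select_replacement(alternatives, prev_pos):
    """Select the alternative with the highest adjusted rank."""
    return adjust_ranks(alternatives, prev_pos)[0][0]
```

With the input word "wrok" following a determiner, the alternatives `[("wrote", "VERB", 0.6), ("work", "NOUN", 0.5), ("work out", "VERB", 0.4)]` resolve to "work", because only the noun fits after "DET".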
171. An apparatus according to Claim 165, wherein the processor, between determining and selecting, executes process steps to display the alternative words ranked in order; and wherein the selecting step is performed by the processor in response to a user input.
172. An apparatus according to Claim 165, wherein at least one of the alternative words comprises a word having an accent mark and/or a diacritic which is different from, and/or missing from, the input word.
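One way to find alternatives that differ from the input word only by accent marks or diacritics, as in Claim 172, is to compare words after stripping combining marks. The sketch below uses Python's standard `unicodedata` module; the function names and the set-based lexicon are hypothetical and are not taken from the claims.

```python
import unicodedata


def strip_accents(word):
    """Remove combining marks (accents, diacritics) from a word."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accent_alternatives(input_word, lexicon):
    """Return lexicon words equal to the input word up to accent marks."""
    key = strip_accents(input_word).lower()
    return sorted(
        w for w in lexicon
        if w != input_word and strip_accents(w).lower() == key
    )
```

For instance, `accent_alternatives("cafe", {"café", "cafe", "cage"})` yields `["café"]`.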
173. An apparatus for correcting grammatical errors in input text, the apparatus comprising: a memory which stores computer-executable process steps; and a processor which executes the process steps stored in the memory so as (i) to generate a first finite state machine ("FSM") for the input text, the first finite state machine including alternative words for at least one word in the input text and a rank associated with each alternative word, (ii) to adjust the ranks in the first FSM in accordance with one or more of a plurality of predetermined grammatical rules, (iii) to determine which of the alternative words is grammatically correct based on the ranks associated with the alternative words, and (iv) to replace the at least one word in the input text with a grammatically-correct alternative word determined in the determining step.
174. An apparatus according to Claim 173, wherein the first FSM also includes one or more parts-of-speech for the alternative words; and wherein the processor determines which of the alternative words is grammatically correct based in addition on the parts-of-speech of the alternative words.
175. An apparatus according to Claim 174, wherein the processor adjusts the ranks of the first FSM by (i) generating a second FSM for the input text based on the one or more of a plurality of predetermined grammatical rules, the second FSM including the alternative words and ranks associated with each alternative word, and (ii) combining the ranks in the second FSM with the ranks in the first FSM.
176. An apparatus according to Claim 175, wherein the processor generates the second FSM by performing a morphological analysis on each word in the input text in order to provide the parts-of-speech and ranks.
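The rank combination of Claim 175 can be pictured with plain dictionaries standing in for the two FSMs. The claims do not state how the ranks are combined, so simple addition is assumed here, and the names `combine_ranks` and `best_alternative` are hypothetical.

```python
def combine_ranks(first, second):
    """Combine per-word ranks from two sources by addition (an assumption).

    `first` holds the spelling-based ranks and `second` the ranks derived
    from grammatical rules; both map an alternative word to a number.
    """
    combined = dict(first)
    for word, rank in second.items():
        combined[word] = combined.get(word, 0.0) + rank
    return combined


def best_alternative(first, second):
    """Pick the alternative with the highest combined rank."""
    combined = combine_ranks(first, second)
    return max(combined, key=combined.get)
```

If spelling alone ranks "their" at 0.75 and "there" at 0.5, and the grammatical pass adds 0.5 to "there" only, the combined ranks favor "there".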
177. An apparatus for creating and editing text documents, the apparatus comprising: a memory which stores computer-executable process steps; and a processor which executes the process steps stored in the memory so as (i) to input text into a text document, (ii) to check the document for grammatically-incorrect words, (iii) to replace grammatically-incorrect words in the document with grammatically-correct words, and (iv) to output the document; wherein the processor checks the document for grammatically-incorrect words by (i) generating a finite state machine ("FSM") for text in the text document, the finite state machine including alternative words for at least one word in the text and a rank associated with each alternative word, (ii) adjusting the ranks in the FSM in accordance with one or more of a plurality of predetermined grammatical rules, and (iii) determining which of the alternative words is grammatically correct based on ranks for the alternative words.
178. An apparatus for translating text from a first language into a second language, the apparatus comprising: a memory which stores computer-executable process steps; and a processor which executes the process steps stored in the memory so as (i) to input text in the first language, (ii) to check the text in the first language for grammatically-incorrect words, (iii) to replace grammatically-incorrect words in the text with grammatically-correct words, (iv) to translate the text with the grammatically-correct words from the first language into the second language, and (v) to output the text in the second language; wherein the processor checks the text in the first language by (i) generating a finite state machine ("FSM") for the text in the first language, the finite state machine including alternative words for at least one word in the text and a rank associated with each alternative word, (ii) adjusting the ranks in the FSM in accordance with one or more of a plurality of predetermined grammatical rules, and (iii) determining which of the alternative words is grammatically correct based on ranks for the alternative words.
179. An apparatus for translating text from a first language into a second language, the apparatus comprising: a memory which stores computer-executable process steps; and a processor which executes the process steps stored in the memory so as (i) to input text in the first language, (ii) to translate the text from the first language into the second language, (iii) to check the text in the second language for grammatically-incorrect words, (iv) to replace grammatically-incorrect words in the text with grammatically-correct words, and (v) to output the text with the grammatically-correct words; wherein the processor checks the text in the second language by (i) generating a finite state machine ("FSM") for the text in the second language, the finite state machine including alternative words for at least one word in the text and a rank associated with each alternative word, (ii) adjusting the ranks in the FSM in accordance with one or more of a plurality of predetermined grammatical rules, and (iii) determining which of the alternative words is grammatically correct based on ranks for the alternative words.
180. An apparatus for performing an optical character recognition method to recognize input character images, the apparatus comprising: a memory which stores computer-executable process steps; and a processor which executes the process steps stored in the memory so as (i) to input a document image, (ii) to parse images from the document image, (iii) to perform recognition processing on parsed character images so as to produce document text, (iv) to check the document text for grammatically-incorrect words, (v) to replace grammatically-incorrect words in the document text with grammatically correct words, and (vi) to output the document text; wherein the processor checks the document text by (i) generating a finite state machine ("FSM") for the document text, the finite state machine including alternative words for at least one word in the text and a rank associated with each alternative word, (ii) adjusting the ranks in the FSM in accordance with one or more of a plurality of predetermined grammatical rules, and (iii) determining which of the alternative words is grammatically correct based on ranks for the alternative words.
181. An apparatus for retrieving text from a source, the apparatus comprising: a memory which stores computer-executable process steps; and a processor which executes the process steps stored in the memory so as (i) to input a search phrase comprised of a plurality of words, at least one of the plurality of words being a grammatically-incorrect word, (ii) to replace the grammatically-incorrect word in the search phrase with a grammatically-correct word in order to produce a corrected search phrase, and (iii) to retrieve text from the source based on the corrected search phrase.
182. An apparatus for spell-checking input text, the apparatus comprising: a memory which stores computer-executable process steps; and a processor which executes the process steps stored in the memory so as (i) to detect a misspelled word in the input text, (ii) to store one or more lexicon finite state machines ("FSM") in the memory, each of the lexicon FSMs including plural reference words, (iii) to generate an input FSM for the misspelled word, (iv) to select one or more reference words from the lexicon FSMs based on the input FSM, the one or more reference words substantially corresponding to a spelling of the misspelled word, and (v) to output selected ones of the one or more reference words.
183. An apparatus according to Claim 182, wherein each of the lexicon FSMs also includes a phonetic representation of each reference word; wherein the input FSM also includes a phonetic representation of the misspelled word; and wherein the processor executes process steps to select reference words from the lexicon FSMs which also substantially correspond to the phonetic representation of the misspelled word.
184. An apparatus according to Claim 183, wherein the processor selects reference words by generating an additional FSM having a plurality of states, the states comprising at least states of a lexicon FSM and states of the input FSM, and selecting the one or more reference words from the lexicon FSMs using the additional FSM.
185. An apparatus according to Claim 184, wherein the lexicon FSMs and the input FSM comprise finite state transducers ("FST"), and wherein the additional FSM comprises a finite state automaton ("FSA").
186. An apparatus according to Claim 185, wherein the processor selects the one or more reference words by applying each state of the additional FSM to one or more of (i) a character identity module, (ii) a phonetic identity module, (iii) a character insertion module, (iv) a character deletion module, (v) a character replacement module, (vi) a character transposition module, and (vii) a character transposition completion module; and wherein (i) the character identity module determines whether characters of a reference word in the lexicon FSM match characters of the misspelled word in the input FSM, (ii) the phonetic identity module determines whether characters of the reference word are pronounced the same as characters of the misspelled word, (iii) the character insertion module determines whether a character inserted in the misspelled word causes at least part of the misspelled word to match at least part of the reference word, (iv) the character deletion module determines whether a character deleted from the misspelled word causes at least part of the misspelled word to match at least part of the reference word, (v) the character replacement module replaces characters in the misspelled word with characters in the reference word in order to determine whether at least part of the misspelled word matches at least part of the reference word, (vi) the character transposition module changes the order of two or more characters in the misspelled word and compares a changed character in the misspelled word to a corresponding character in the reference word, and (vii) the character transposition completion module compares characters in the misspelled word which were not compared by the character transposition module in order to determine if at least part of the misspelled word matches at least part of the reference word.
187. An apparatus for correcting misspelled words in input text, the apparatus comprising: detecting means for detecting a misspelled word in the input text; determining means for determining a list of alternative words for the misspelled word; and ranking means for ranking the list of alternative words based on a context of the input text.
188. An apparatus for retrieving text from a source, the apparatus comprising: inputting means for inputting a search word; correcting means for correcting a spelling of the search word to produce a corrected search word; and retrieving means for retrieving text from the source that includes the corrected search word.
189. An apparatus for retrieving text from a source, the apparatus comprising: inputting means for inputting a search phrase comprised of a plurality of words, at least one of the plurality of words being an incorrect word; replacing means for replacing the incorrect word in the search phrase with a corrected word in order to produce a corrected search phrase; and retrieving means for retrieving text from the source based on the corrected search phrase.
190. An apparatus for correcting misspelled words in input text sequences received from a plurality of different clients, the apparatus comprising: storing means for storing, in a memory on a server, a lexicon comprised of a plurality of reference words; receiving means for receiving the input text sequences from the plurality of different clients; spell-checking means for spell-checking the input text sequences using the reference words in the lexicon; and outputting means for outputting spell-checked text sequences to the plurality of different clients.
191. An apparatus for selecting a replacement word for an input word in a phrase, the apparatus comprising: determining means for determining alternative words for the input word, the alternative words including at least one compound word which is comprised of two or more separate words, each alternative word having a rank associated therewith; and selecting means for selecting, as the replacement word, an alternative word having a highest rank.
192. An apparatus for correcting grammatical errors in input text, the apparatus comprising: generating means for generating a first finite state machine ("FSM") for the input text, the first finite state machine including alternative words for at least one word in the input text and a rank associated with each alternative word; adjusting means for adjusting the ranks in the first FSM in accordance with one or more of a plurality of predetermined grammatical rules; determining means for determining which of the alternative words is grammatically correct based on the ranks associated with the alternative words; and replacing means for replacing the at least one word in the input text with a grammatically-correct alternative word determined by the determining means.
193. An apparatus for retrieving text from a source, the apparatus comprising: inputting means for inputting a search phrase comprised of a plurality of words, at least one of the plurality of words being a grammatically-incorrect word; replacing means for replacing the grammatically-incorrect word in the search phrase with a grammatically-correct word in order to produce a corrected search phrase; and retrieving means for retrieving text from the source based on the corrected search phrase.
194. An apparatus for spell-checking input text, the apparatus comprising: detecting means for detecting a misspelled word in the input text; storing means for storing one or more lexicon finite state machines ("FSM") in a memory, each of the lexicon FSMs including plural reference words; generating means for generating an input FSM for the misspelled word; selecting means for selecting one or more reference words from the lexicon FSMs based on the input FSM, the one or more reference words substantially corresponding to a spelling of the misspelled word; and outputting means for outputting selected ones of the one or more reference words.
PCT/US1999/011713 1998-05-26 1999-05-26 Spelling and grammar checking system WO1999062000A2 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
AU41003/99A AU4100399A (en) 1998-05-26 1999-05-26 Spelling and grammar checking system
EP99924524A EP1145141A3 (en) 1998-05-26 1999-05-26 Spelling and grammar checking system
CA002333402A CA2333402A1 (en) 1998-05-26 1999-05-26 Spelling and grammar checking system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US09/084,535 1998-05-26
US09/084,535 US6424983B1 (en) 1998-05-26 1998-05-26 Spelling and grammar checking system

Publications (3)

Publication Number Publication Date
WO1999062000A2 WO1999062000A2 (en) 1999-12-02
WO1999062000A3 WO1999062000A3 (en) 2001-06-07
WO1999062000A8 WO1999062000A8 (en) 2001-11-15

Family

ID=22185571

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1999/011713 WO1999062000A2 (en) 1998-05-26 1999-05-26 Spelling and grammar checking system

Country Status (5)

Country Link
US (3) US6424983B1 (en)
EP (1) EP1145141A3 (en)
AU (1) AU4100399A (en)
CA (1) CA2333402A1 (en)
WO (1) WO1999062000A2 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1244260A2 (en) * 2001-03-21 2002-09-25 Kabushiki Kaisha Toshiba Communication terminal unit capable of receiving a message and method for identifying a message sender in the same
EP1277135A1 (en) * 2000-04-25 2003-01-22 Microsoft Corporation Language model sharing
WO2006115598A2 (en) 2005-04-25 2006-11-02 Microsoft Corporation Method and system for generating spelling suggestions
WO2007094684A2 (en) * 2006-02-17 2007-08-23 Lumex As Method and system for verification of uncertainly recognized words in an ocr system
EP1953622A1 (en) * 2007-02-02 2008-08-06 Research In Motion Limited Handeld electronics device including predictive accent mechanism, and associated method
US7831911B2 (en) 2006-03-08 2010-11-09 Microsoft Corporation Spell checking system including a phonetic speller
US7831423B2 (en) 2006-05-25 2010-11-09 Multimodal Technologies, Inc. Replacing text representing a concept with an alternate written form of the concept
WO2013032617A1 (en) * 2011-09-01 2013-03-07 Google Inc. Server-based spell checking

Families Citing this family (376)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
IL123129A (en) * 1998-01-30 2010-12-30 Aviv Refuah Www addressing
US6144958A (en) * 1998-07-15 2000-11-07 Amazon.Com, Inc. System and method for correcting spelling errors in search queries
US7293231B1 (en) * 1999-03-18 2007-11-06 British Columbia Ltd. Data entry for personal computing devices
US6618697B1 (en) * 1999-05-14 2003-09-09 Justsystem Corporation Method for rule-based correction of spelling and grammar errors
US7750891B2 (en) 2003-04-09 2010-07-06 Tegic Communications, Inc. Selective input system based on tracking of motion parameters of an input device
US7821503B2 (en) 2003-04-09 2010-10-26 Tegic Communications, Inc. Touch screen and graphical user interface
US7030863B2 (en) * 2000-05-26 2006-04-18 America Online, Incorporated Virtual keyboard system with automatic correction
US7286115B2 (en) * 2000-05-26 2007-10-23 Tegic Communications, Inc. Directional input system with automatic correction
CN1176432C (en) * 1999-07-28 2004-11-17 国际商业机器公司 Method and system for providing national language inquiry service
US6742164B1 (en) * 1999-09-01 2004-05-25 International Business Machines Corporation Method, system, and program for generating a deterministic table to determine boundaries between characters
EP1093058A1 (en) * 1999-09-28 2001-04-18 Cloanto Corporation Method and apparatus for processing text and character data
US8726148B1 (en) 1999-09-28 2014-05-13 Cloanto Corporation Method and apparatus for processing text and character data
US6789231B1 (en) * 1999-10-05 2004-09-07 Microsoft Corporation Method and system for providing alternatives for text derived from stochastic input sources
US8645137B2 (en) 2000-03-16 2014-02-04 Apple Inc. Fast, language-independent method for user authentication by voice
US6760636B2 (en) * 2000-04-03 2004-07-06 Xerox Corporation Method and apparatus for extracting short runs of ambiguity from finite state transducers
US6965858B2 (en) * 2000-04-03 2005-11-15 Xerox Corporation Method and apparatus for reducing the intermediate alphabet occurring between cascaded finite state transducers
US7325201B2 (en) * 2000-05-18 2008-01-29 Endeca Technologies, Inc. System and method for manipulating content in a hierarchical data-driven search and navigation system
US7062483B2 (en) * 2000-05-18 2006-06-13 Endeca Technologies, Inc. Hierarchical data-driven search and navigation system and method for information retrieval
US7035864B1 (en) 2000-05-18 2006-04-25 Endeca Technologies, Inc. Hierarchical data-driven navigation system and method for information retrieval
US7617184B2 (en) * 2000-05-18 2009-11-10 Endeca Technologies, Inc. Scalable hierarchical data-driven navigation system and method for information retrieval
US6889361B1 (en) * 2000-06-13 2005-05-03 International Business Machines Corporation Educational spell checker
US7149970B1 (en) * 2000-06-23 2006-12-12 Microsoft Corporation Method and system for filtering and selecting from a candidate list generated by a stochastic input method
US8396859B2 (en) * 2000-06-26 2013-03-12 Oracle International Corporation Subject matter context search engine
DE10124429B4 (en) * 2000-07-07 2008-11-27 International Business Machines Corp. System and method for improved spell checking
CA2323856A1 (en) * 2000-10-18 2002-04-18 602531 British Columbia Ltd. Method, system and media for entering data in a personal computing device
US7136808B2 (en) * 2000-10-20 2006-11-14 Microsoft Corporation Detection and correction of errors in german grammatical case
US20020078106A1 (en) * 2000-12-18 2002-06-20 Carew David John Method and apparatus to spell check displayable text in computer source code
US7254773B2 (en) * 2000-12-29 2007-08-07 International Business Machines Corporation Automated spell analysis
US20020087604A1 (en) * 2001-01-04 2002-07-04 International Business Machines Corporation Method and system for intelligent spellchecking
US7072837B2 (en) * 2001-03-16 2006-07-04 International Business Machines Corporation Method for processing initially recognized speech in a speech recognition session
EP1246077A1 (en) * 2001-03-26 2002-10-02 LION Bioscience AG Method and apparatus for structuring and searching sets of signals
US20020194229A1 (en) * 2001-06-15 2002-12-19 Decime Jerry B. Network-based spell checker
US7003444B2 (en) * 2001-07-12 2006-02-21 Microsoft Corporation Method and apparatus for improved grammar checking using a stochastic parser
US7095513B2 (en) * 2001-10-11 2006-08-22 Hewlett-Packard Development Company, L.P. Method and apparatus for language translation of production job output
US6779934B2 (en) * 2001-10-25 2004-08-24 Hewlett-Packard Development Company, L.P. Printer having a spell checking feature
US20030120630A1 (en) * 2001-12-20 2003-06-26 Daniel Tunkelang Method and system for similarity search and clustering
EP1331630A3 (en) * 2002-01-07 2006-12-20 AT&T Corp. Systems and methods for generating weighted finite-state automata representing grammars
JPWO2003065245A1 (en) * 2002-01-29 2005-05-26 インターナショナル・ビジネス・マシーンズ・コーポレーション International Business Machines Corporation Translation method, translation output method, storage medium, program, and computer apparatus
JP2003223437A (en) * 2002-01-29 2003-08-08 Internatl Business Mach Corp <Ibm> Method of displaying candidate for correct word, method of checking spelling, computer device, and program
US7194684B1 (en) * 2002-04-09 2007-03-20 Google Inc. Method of spell-checking search queries
US20030210249A1 (en) * 2002-05-08 2003-11-13 Simske Steven J. System and method of automatic data checking and correction
US7548847B2 (en) * 2002-05-10 2009-06-16 Microsoft Corporation System for automatically annotating training data for a natural language understanding system
US20030214523A1 (en) * 2002-05-16 2003-11-20 Kuansan Wang Method and apparatus for decoding ambiguous input using anti-entities
US7607066B2 (en) * 2002-06-26 2009-10-20 Microsoft Corporation Auto suggestion of coding error correction
US20040002849A1 (en) * 2002-06-28 2004-01-01 Ming Zhou System and method for automatic retrieval of example sentences based upon weighted editing distance
MXPA05005100A (en) * 2002-11-14 2005-12-14 Educational Testing Service Automated evaluation of overly repetitive word use in an essay.
US20050038781A1 (en) * 2002-12-12 2005-02-17 Endeca Technologies, Inc. Method and system for interpreting multiple-term queries
US20040117366A1 (en) * 2002-12-12 2004-06-17 Ferrari Adam J. Method and system for interpreting multiple-term queries
GB0228942D0 (en) * 2002-12-12 2003-01-15 Ibm Linguistic dictionary and method for production thereof
JP4001283B2 (en) * 2003-02-12 2007-10-31 インターナショナル・ビジネス・マシーンズ・コーポレーション Morphological analyzer and natural language processor
US20040193399A1 (en) * 2003-03-31 2004-09-30 Microsoft Corporation System and method for word analysis
US7516404B1 (en) * 2003-06-02 2009-04-07 Colby Steven M Text correction
US7373102B2 (en) * 2003-08-11 2008-05-13 Educational Testing Service Cooccurrence and constructions
US7657832B1 (en) * 2003-09-18 2010-02-02 Adobe Systems Incorporated Correcting validation errors in structured documents
US7421386B2 (en) * 2003-10-23 2008-09-02 Microsoft Corporation Full-form lexicon with tagged data and methods of constructing and using the same
US7447627B2 (en) * 2003-10-23 2008-11-04 Microsoft Corporation Compound word breaker and spell checker
US6973332B2 (en) * 2003-10-24 2005-12-06 Motorola, Inc. Apparatus and method for forming compound words
US7376752B1 (en) 2003-10-28 2008-05-20 David Chudnovsky Method to resolve an incorrectly entered uniform resource locator (URL)
US7412385B2 (en) * 2003-11-12 2008-08-12 Microsoft Corporation System for identifying paraphrases using machine translation
US7717712B2 (en) * 2003-12-19 2010-05-18 Xerox Corporation Method and apparatus for language learning via controlled text authoring
US20050149499A1 (en) * 2003-12-30 2005-07-07 Google Inc., A Delaware Corporation Systems and methods for improving search quality
US7254774B2 (en) * 2004-03-16 2007-08-07 Microsoft Corporation Systems and methods for improved spell checking
US7779354B2 (en) 2004-05-13 2010-08-17 International Business Machines Corporation Method and data processing system for recognizing and correcting dyslexia-related spelling errors
US8321786B2 (en) * 2004-06-17 2012-11-27 Apple Inc. Routine and interface for correcting electronic text
GB0413743D0 (en) 2004-06-19 2004-07-21 Ibm Method and system for approximate string matching
US7664748B2 (en) * 2004-07-12 2010-02-16 John Eric Harrity Systems and methods for changing symbol sequences in documents
US7207004B1 (en) * 2004-07-23 2007-04-17 Harrity Paul A Correction of misspelled words
KR100636181B1 (en) * 2004-10-01 2006-10-19 삼성전자주식회사 Method and apparatus for inserting scanning document
US7966310B2 (en) * 2004-11-24 2011-06-21 At&T Intellectual Property I, L.P. Method, system, and software for correcting uniform resource locators
US7778821B2 (en) * 2004-11-24 2010-08-17 Microsoft Corporation Controlled manipulation of characters
FR2878991A1 (en) * 2004-12-08 2006-06-09 France Telecom Phonetizer e.g. stochastic phonetizer, constructing method for computing system, involves storing probabilities of node output transitions in database, and combining determined transitions and automaton for constructing phonetizer
EP1669886A1 (en) * 2004-12-08 2006-06-14 France Telecom Construction of an automaton compiling grapheme/phoneme transcription rules for a phonetiser
US8552984B2 (en) * 2005-01-13 2013-10-08 602531 British Columbia Ltd. Method, system, apparatus and computer-readable media for directing input associated with keyboard-type device
US20060190447A1 (en) * 2005-02-22 2006-08-24 Microsoft Corporation Query spelling correction method and system
US7461059B2 (en) * 2005-02-23 2008-12-02 Microsoft Corporation Dynamically updated search results based upon continuously-evolving search query that is based at least in part upon phrase suggestion, search engine uses previous result sets performing additional search tasks
JP2006276903A (en) * 2005-03-25 2006-10-12 Fuji Xerox Co Ltd Document processing device
US20060241932A1 (en) * 2005-04-20 2006-10-26 Carman Ron C Translation previewer and validator
US10699593B1 (en) * 2005-06-08 2020-06-30 Pearson Education, Inc. Performance support integration with E-learning system
US7711551B2 (en) * 2005-06-13 2010-05-04 Microsoft Corporation Static analysis to identify defects in grammars
US8677377B2 (en) 2005-09-08 2014-03-18 Apple Inc. Method and apparatus for building an intelligent automated assistant
US20070074254A1 (en) * 2005-09-27 2007-03-29 Microsoft Corporation Locating content in a television environment
US7908132B2 (en) * 2005-09-29 2011-03-15 Microsoft Corporation Writing assistance using machine translation techniques
US8019752B2 (en) 2005-11-10 2011-09-13 Endeca Technologies, Inc. System and method for information retrieval from object collections with complex interrelationships
KR100735559B1 (en) * 2005-11-18 2007-07-04 삼성전자주식회사 Apparatus and method for constructing language model
US8006180B2 (en) * 2006-01-10 2011-08-23 Microsoft Corporation Spell checking in network browser based applications
US8660244B2 (en) * 2006-02-17 2014-02-25 Microsoft Corporation Machine translation instant messaging applications
US8752062B2 (en) * 2006-03-17 2014-06-10 Verint Americas Inc. Monitoring of computer events and steps linked by dependency relationships to generate completed processes data and determining the completed processed data meet trigger criteria
US7991608B2 (en) * 2006-04-19 2011-08-02 Raytheon Company Multilingual data querying
US7853555B2 (en) * 2006-04-19 2010-12-14 Raytheon Company Enhancing multilingual data querying
JP2009537038A (en) 2006-05-07 2009-10-22 バーコード リミティド System and method for improving quality control in a product logistic chain
US7562811B2 (en) 2007-01-18 2009-07-21 Varcode Ltd. System and method for improved quality management in a product logistic chain
US20070265831A1 (en) * 2006-05-09 2007-11-15 Itai Dinur System-Level Correction Service
US7818332B2 (en) * 2006-08-16 2010-10-19 Microsoft Corporation Query speller
US9318108B2 (en) 2010-01-18 2016-04-19 Apple Inc. Intelligent automated assistant
US7624075B2 (en) * 2006-09-15 2009-11-24 Microsoft Corporation Transformation of modular finite state transducers
US7627541B2 (en) * 2006-09-15 2009-12-01 Microsoft Corporation Transformation of modular finite state transducers
US8073681B2 (en) 2006-10-16 2011-12-06 Voicebox Technologies, Inc. System and method for a cooperative conversational voice user interface
US7590626B2 (en) 2006-10-30 2009-09-15 Microsoft Corporation Distributional similarity-based models for query correction
US8035534B2 (en) 2006-11-10 2011-10-11 Research In Motion Limited Method for automatically preferring a diacritical version of a linguistic element on a handheld electronic device based on linguistic source and associated apparatus
US8676802B2 (en) 2006-11-30 2014-03-18 Oracle Otc Subsidiary Llc Method and system for information retrieval with clustering
US20080155399A1 (en) * 2006-12-20 2008-06-26 Yahoo! Inc. System and method for indexing a document that includes a misspelled word
US9275036B2 (en) 2006-12-21 2016-03-01 International Business Machines Corporation System and method for adaptive spell checking
US20080167876A1 (en) * 2007-01-04 2008-07-10 International Business Machines Corporation Methods and computer program products for providing paraphrasing in a text-to-speech system
US8225203B2 (en) 2007-02-01 2012-07-17 Nuance Communications, Inc. Spell-check for a keyboard system with automatic correction
US8201087B2 (en) 2007-02-01 2012-06-12 Tegic Communications, Inc. Spell-check for a keyboard system with automatic correction
US7818176B2 (en) 2007-02-06 2010-10-19 Voicebox Technologies, Inc. System and method for selecting and presenting advertisements based on natural language processing of voice-based input
US7991609B2 (en) * 2007-02-28 2011-08-02 Microsoft Corporation Web-based proofing and usage guidance
CA2581824A1 (en) * 2007-03-14 2008-09-14 602531 British Columbia Ltd. System, apparatus and method for data entry using multi-function keys
US8977255B2 (en) 2007-04-03 2015-03-10 Apple Inc. Method and system for operating a multi-function portable electronic device using voice-activation
EP2156369B1 (en) 2007-05-06 2015-09-02 Varcode Ltd. A system and method for quality management utilizing barcode indicators
US9037613B2 (en) * 2007-05-23 2015-05-19 Oracle International Corporation Self-learning data lenses for conversion of information from a source form to a target form
US9043367B2 (en) * 2007-05-23 2015-05-26 Oracle International Corporation Self-learning data lenses for conversion of information from a first form to a second form
US8812296B2 (en) 2007-06-27 2014-08-19 Abbyy Infopoisk Llc Method and system for natural language dictionary generation
US8082274B2 (en) * 2007-06-28 2011-12-20 Microsoft Corporation Scheduling application allowing freeform data entry
CA2694327A1 (en) 2007-08-01 2009-02-05 Ginger Software, Inc. Automatic context sensitive language correction and enhancement using an internet corpus
US8103498B2 (en) * 2007-08-10 2012-01-24 Microsoft Corporation Progressive display rendering of processed text
US20090055731A1 (en) * 2007-08-24 2009-02-26 Joyce Etta Knowles Homonym words dictionary
US7949516B2 (en) * 2007-08-31 2011-05-24 Research In Motion Limited Handheld electronic device and method employing logical proximity of characters in spell checking
WO2009040790A2 (en) * 2007-09-24 2009-04-02 Robert Iakobashvili Method and system for spell checking
US20090089057A1 (en) * 2007-10-02 2009-04-02 International Business Machines Corporation Spoken language grammar improvement tool and method of use
US7856434B2 (en) 2007-11-12 2010-12-21 Endeca Technologies, Inc. System and method for filtering rules for manipulating search results in a hierarchical search and navigation system
EP2218042B1 (en) 2007-11-14 2020-01-01 Varcode Ltd. A system and method for quality management utilizing barcode indicators
US8209164B2 (en) * 2007-11-21 2012-06-26 University Of Washington Use of lexical translations for facilitating searches
US8140335B2 (en) 2007-12-11 2012-03-20 Voicebox Technologies, Inc. System and method for providing a natural language voice user interface in an integrated voice navigation services environment
US8176419B2 (en) * 2007-12-19 2012-05-08 Microsoft Corporation Self learning contextual spell corrector
US10002189B2 (en) 2007-12-20 2018-06-19 Apple Inc. Method and apparatus for searching using an active ontology
US8131714B2 (en) * 2008-01-02 2012-03-06 Think Village-OIP, LLC Linguistic assistance systems and methods
US9330720B2 (en) 2008-01-03 2016-05-03 Apple Inc. Methods and apparatus for altering audio output signals
US8996376B2 (en) 2008-04-05 2015-03-31 Apple Inc. Intelligent text-to-speech conversion
WO2009144701A1 (en) * 2008-04-16 2009-12-03 Ginger Software, Inc. A system for teaching writing based on a user's past writing
US20100275118A1 (en) * 2008-04-22 2010-10-28 Robert Iakobashvili Method and system for user-interactive iterative spell checking
US10496753B2 (en) 2010-01-18 2019-12-03 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US20090300487A1 (en) * 2008-05-27 2009-12-03 International Business Machines Corporation Difference only document segment quality checker
US9305548B2 (en) 2008-05-27 2016-04-05 Voicebox Technologies Corporation System and method for an integrated, multi-modal, multi-device natural language voice services environment
US8464150B2 (en) 2008-06-07 2013-06-11 Apple Inc. Automatic language identification for dynamic text processing
US11704526B2 (en) 2008-06-10 2023-07-18 Varcode Ltd. Barcoded indicators for quality management
US20090319258A1 (en) * 2008-06-24 2009-12-24 Shaer Steven J Method and system for spell checking in two or more languages
US20100030549A1 (en) 2008-07-31 2010-02-04 Lee Michael M Mobile device having human language translation capability with positional feedback
US9317589B2 (en) * 2008-08-07 2016-04-19 International Business Machines Corporation Semantic search by means of word sense disambiguation using a lexicon
US8676904B2 (en) 2008-10-02 2014-03-18 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US20100138402A1 (en) * 2008-12-02 2010-06-03 Chacha Search, Inc. Method and system for improving utilization of human searchers
US9959870B2 (en) 2008-12-11 2018-05-01 Apple Inc. Speech recognition involving a mobile device
US8326637B2 (en) 2009-02-20 2012-12-04 Voicebox Technologies, Inc. System and method for processing multi-modal device interactions in a natural language voice services environment
US8160911B2 (en) * 2009-05-19 2012-04-17 Microsoft Corporation Project management applications utilizing summary tasks for top-down project planning
US10241644B2 (en) 2011-06-03 2019-03-26 Apple Inc. Actionable reminder entries
US10241752B2 (en) 2011-09-30 2019-03-26 Apple Inc. Interface for a virtual digital assistant
US9858925B2 (en) 2009-06-05 2018-01-02 Apple Inc. Using context information to facilitate processing of commands in a virtual assistant
US10255566B2 (en) 2011-06-03 2019-04-09 Apple Inc. Generating and processing task items that represent tasks to perform
US20100325539A1 (en) * 2009-06-18 2010-12-23 Microsoft Corporation Web based spell check
US20100332215A1 (en) * 2009-06-26 2010-12-30 Nokia Corporation Method and apparatus for converting text input
US20100332217A1 (en) * 2009-06-29 2010-12-30 Shalom Wintner Method for text improvement via linguistic abstractions
US9431006B2 (en) 2009-07-02 2016-08-30 Apple Inc. Methods and apparatuses for automatic speech recognition
US8307040B2 (en) 2009-07-14 2012-11-06 International Business Machines Corporation Reducing errors in sending file attachments
US20110099193A1 (en) * 2009-10-26 2011-04-28 Ancestry.Com Operations Inc. Automatic pedigree corrections
US8600152B2 (en) * 2009-10-26 2013-12-03 Ancestry.Com Operations Inc. Devices, systems and methods for transcription suggestions and completions
US8365059B2 (en) * 2009-11-03 2013-01-29 Oto Technologies, Llc E-reader semantic text manipulation
US8494852B2 (en) 2010-01-05 2013-07-23 Google Inc. Word-level correction of speech input
US10705794B2 (en) 2010-01-18 2020-07-07 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US10679605B2 (en) 2010-01-18 2020-06-09 Apple Inc. Hands-free list-reading by intelligent automated assistant
US10276170B2 (en) 2010-01-18 2019-04-30 Apple Inc. Intelligent automated assistant
US10553209B2 (en) 2010-01-18 2020-02-04 Apple Inc. Systems and methods for hands-free notification summaries
DE202011111062U1 (en) 2010-01-25 2019-02-19 Newvaluexchange Ltd. Device and system for a digital conversation management platform
US8600173B2 (en) * 2010-01-27 2013-12-03 Dst Technologies, Inc. Contextualization of machine indeterminable information based on machine determinable information
CN102884518A (en) * 2010-02-01 2013-01-16 金格软件有限公司 Automatic context sensitive language correction using an internet corpus particularly for small keyboard devices
JP5819860B2 (en) * 2010-02-12 2015-11-24 グーグル・インコーポレーテッド Compound word division
US8782556B2 (en) * 2010-02-12 2014-07-15 Microsoft Corporation User-centric soft keyboard predictive technologies
US8423351B2 (en) * 2010-02-19 2013-04-16 Google Inc. Speech correction for typed input
US8682667B2 (en) 2010-02-25 2014-03-25 Apple Inc. User profiling for selecting user specific voice input processing information
US9165065B2 (en) * 2010-03-26 2015-10-20 Paypal Inc. Terminology management database
JP4940325B2 (en) * 2010-03-29 2012-05-30 株式会社東芝 Document proofreading support apparatus, method and program
US9262397B2 (en) * 2010-10-08 2016-02-16 Microsoft Technology Licensing, Llc General purpose correction of grammatical and word usage errors
KR20120048140A (en) * 2010-11-05 2012-05-15 한국전자통신연구원 Automatic translation device and method thereof
US10762293B2 (en) 2010-12-22 2020-09-01 Apple Inc. Using parts-of-speech tagging and named entity recognition for spelling correction
US9239708B2 (en) * 2010-12-28 2016-01-19 Microsoft Technology Licensing, Llc Contextually intelligent code editing
US9787725B2 (en) 2011-01-21 2017-10-10 Qualcomm Incorporated User input back channel for wireless displays
US20120197628A1 (en) * 2011-01-28 2012-08-02 International Business Machines Corporation Cross-language spell checker
US9262612B2 (en) 2011-03-21 2016-02-16 Apple Inc. Device access using voice authentication
US10057736B2 (en) 2011-06-03 2018-08-21 Apple Inc. Active transport based notifications
US8855997B2 (en) 2011-07-28 2014-10-07 Microsoft Corporation Linguistic error detection
US8994660B2 (en) 2011-08-29 2015-03-31 Apple Inc. Text correction processing
US8700654B2 (en) 2011-09-13 2014-04-15 Microsoft Corporation Dynamic spelling correction of search queries
CN103918027B (en) * 2011-09-21 2016-08-24 纽安斯通信有限公司 Effective gradual modification of the optimum Finite State Transformer (FST) in voice application
US8290772B1 (en) * 2011-10-03 2012-10-16 Google Inc. Interactive text editing
US9934218B2 (en) * 2011-12-05 2018-04-03 Infosys Limited Systems and methods for extracting attributes from text content
US10134385B2 (en) 2012-03-02 2018-11-20 Apple Inc. Systems and methods for name pronunciation
US9483461B2 (en) 2012-03-06 2016-11-01 Apple Inc. Handling speech synthesis of content for multiple languages
US9002702B2 (en) 2012-05-03 2015-04-07 International Business Machines Corporation Confidence level assignment to information from audio transcriptions
US9280610B2 (en) 2012-05-14 2016-03-08 Apple Inc. Crowd sourcing information to fulfill user requests
US9721563B2 (en) 2012-06-08 2017-08-01 Apple Inc. Name recognition system
US9495129B2 (en) 2012-06-29 2016-11-15 Apple Inc. Device, method, and user interface for voice-activated navigation and browsing of a document
US9047540B2 (en) * 2012-07-19 2015-06-02 Qualcomm Incorporated Trellis based word decoder with reverse pass
US9135912B1 (en) * 2012-08-15 2015-09-15 Google Inc. Updating phonetic dictionaries
US9576574B2 (en) 2012-09-10 2017-02-21 Apple Inc. Context-sensitive handling of interruptions by intelligent digital assistant
US9547647B2 (en) 2012-09-19 2017-01-17 Apple Inc. Voice-based media searching
US9021380B2 (en) 2012-10-05 2015-04-28 Google Inc. Incremental multi-touch gesture recognition
US8782549B2 (en) 2012-10-05 2014-07-15 Google Inc. Incremental feature-based gesture-keyboard decoding
US8701032B1 (en) 2012-10-16 2014-04-15 Google Inc. Incremental multi-word recognition
AU2013237735B1 (en) * 2012-10-16 2013-11-14 Google Llc Correction of errors in character strings that include a word delimiter
US8850350B2 (en) 2012-10-16 2014-09-30 Google Inc. Partial gesture text entry
US8612213B1 (en) * 2012-10-16 2013-12-17 Google Inc. Correction of errors in character strings that include a word delimiter
US8713433B1 (en) 2012-10-16 2014-04-29 Google Inc. Feature-based autocorrection
US8843845B2 (en) 2012-10-16 2014-09-23 Google Inc. Multi-gesture text input prediction
US8819574B2 (en) 2012-10-22 2014-08-26 Google Inc. Space prediction for text input
US8807422B2 (en) 2012-10-22 2014-08-19 Varcode Ltd. Tamper-proof quality management barcode indicators
US9025877B2 (en) * 2013-01-04 2015-05-05 Ricoh Company, Ltd. Local scale, rotation and position invariant word detection for optical character recognition
US8832589B2 (en) 2013-01-15 2014-09-09 Google Inc. Touch keyboard using language and spatial models
US20140214401A1 (en) 2013-01-29 2014-07-31 Tencent Technology (Shenzhen) Company Limited Method and device for error correction model training and text error correction
CN103970765B (en) * 2013-01-29 2016-03-09 腾讯科技(深圳)有限公司 Error correction model training method and device, and text error correction method and device
US10228819B2 (en) 2013-02-04 2019-03-12 602531 British Columbia Ltd. Method, system, and apparatus for executing an action related to user selection
KR20230137475A (en) 2013-02-07 2023-10-04 애플 인크. Voice trigger for a digital assistant
US9031829B2 (en) 2013-02-08 2015-05-12 Machine Zone, Inc. Systems and methods for multi-user multi-lingual communications
US8996352B2 (en) 2013-02-08 2015-03-31 Machine Zone, Inc. Systems and methods for correcting translations in multi-user multi-lingual communications
US10650103B2 (en) 2013-02-08 2020-05-12 Mz Ip Holdings, Llc Systems and methods for incentivizing user feedback for translation processing
US8996353B2 (en) * 2013-02-08 2015-03-31 Machine Zone, Inc. Systems and methods for multi-user multi-lingual communications
US9231898B2 (en) 2013-02-08 2016-01-05 Machine Zone, Inc. Systems and methods for multi-user multi-lingual communications
US9298703B2 (en) 2013-02-08 2016-03-29 Machine Zone, Inc. Systems and methods for incentivizing user feedback for translation processing
US8996355B2 (en) 2013-02-08 2015-03-31 Machine Zone, Inc. Systems and methods for reviewing histories of text messages from multi-user multi-lingual communications
US8990068B2 (en) 2013-02-08 2015-03-24 Machine Zone, Inc. Systems and methods for multi-user multi-lingual communications
US9600473B2 (en) 2013-02-08 2017-03-21 Machine Zone, Inc. Systems and methods for multi-user multi-lingual communications
US9368114B2 (en) 2013-03-14 2016-06-14 Apple Inc. Context-sensitive handling of interruptions
WO2014144579A1 (en) 2013-03-15 2014-09-18 Apple Inc. System and method for updating an adaptive speech recognition model
AU2014233517B2 (en) 2013-03-15 2017-05-25 Apple Inc. Training an at least partial voice command system
US9081500B2 (en) 2013-05-03 2015-07-14 Google Inc. Alternative hypothesis error correction for gesture typing
WO2014197336A1 (en) 2013-06-07 2014-12-11 Apple Inc. System and method for detecting errors in interactions with a voice-based digital assistant
US9582608B2 (en) 2013-06-07 2017-02-28 Apple Inc. Unified ranking with entropy-weighted information for phrase-based semantic auto-completion
WO2014197334A2 (en) 2013-06-07 2014-12-11 Apple Inc. System and method for user-specified pronunciation of words for speech synthesis and recognition
WO2014197335A1 (en) 2013-06-08 2014-12-11 Apple Inc. Interpreting and acting upon commands that involve sharing information with remote devices
EP3937002A1 (en) 2013-06-09 2022-01-12 Apple Inc. Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant
US10176167B2 (en) 2013-06-09 2019-01-08 Apple Inc. System and method for inferring user intent from speech inputs
US9384303B2 (en) * 2013-06-10 2016-07-05 Google Inc. Evaluation of substitution contexts
US9613021B2 (en) 2013-06-13 2017-04-04 Red Hat, Inc. Style-based spellchecker tool
AU2014278595B2 (en) 2013-06-13 2017-04-06 Apple Inc. System and method for emergency calls initiated by voice command
US20140380169A1 (en) * 2013-06-20 2014-12-25 Google Inc. Language input method editor to disambiguate ambiguous phrases via diacriticization
DE112014003653B4 (en) 2013-08-06 2024-04-18 Apple Inc. Automatically activate intelligent responses based on activities from remote devices
WO2015069994A1 (en) * 2013-11-07 2015-05-14 NetaRose Corporation Methods and systems for natural language composition correction
US10296160B2 (en) 2013-12-06 2019-05-21 Apple Inc. Method for extracting salient dialog usage from live data
JP6301647B2 (en) * 2013-12-24 2018-03-28 株式会社東芝 SEARCH DEVICE, SEARCH METHOD, AND PROGRAM
KR20150086086A (en) * 2014-01-17 2015-07-27 삼성전자주식회사 server for correcting error in voice recognition result and error correcting method thereof
US9037967B1 (en) * 2014-02-18 2015-05-19 King Fahd University Of Petroleum And Minerals Arabic spell checking technique
US9620105B2 (en) 2014-05-15 2017-04-11 Apple Inc. Analyzing audio input for efficient speech and music recognition
US10592095B2 (en) 2014-05-23 2020-03-17 Apple Inc. Instantaneous speaking of content on touch devices
US9502031B2 (en) 2014-05-27 2016-11-22 Apple Inc. Method for supporting dynamic grammars in WFST-based ASR
US9842101B2 (en) 2014-05-30 2017-12-12 Apple Inc. Predictive conversion of language input
AU2015266863B2 (en) 2014-05-30 2018-03-15 Apple Inc. Multi-command single utterance input method
US10170123B2 (en) 2014-05-30 2019-01-01 Apple Inc. Intelligent assistant for home automation
US9633004B2 (en) 2014-05-30 2017-04-25 Apple Inc. Better resolution when referencing to concepts
US9760559B2 (en) 2014-05-30 2017-09-12 Apple Inc. Predictive text input
US9430463B2 (en) 2014-05-30 2016-08-30 Apple Inc. Exemplar-based natural language processing
US9734193B2 (en) 2014-05-30 2017-08-15 Apple Inc. Determining domain salience ranking from ambiguous words in natural speech
US10289433B2 (en) 2014-05-30 2019-05-14 Apple Inc. Domain specific language for encoding assistant dialog
US9785630B2 (en) 2014-05-30 2017-10-10 Apple Inc. Text prediction using combined word N-gram and unigram language models
US9715875B2 (en) 2014-05-30 2017-07-25 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US10078631B2 (en) 2014-05-30 2018-09-18 Apple Inc. Entropy-guided text prediction using combined word and character n-gram language models
US9338493B2 (en) 2014-06-30 2016-05-10 Apple Inc. Intelligent automated assistant for TV user interactions
US10659851B2 (en) 2014-06-30 2020-05-19 Apple Inc. Real-time digital assistant knowledge updates
US10446141B2 (en) 2014-08-28 2019-10-15 Apple Inc. Automatic speech recognition based on user feedback
RU2639684C2 (en) * 2014-08-29 2017-12-21 Общество С Ограниченной Ответственностью "Яндекс" Text processing method (versions) and non-transitory machine-readable medium (versions)
US9818400B2 (en) 2014-09-11 2017-11-14 Apple Inc. Method and apparatus for discovering trending terms in speech requests
US10789041B2 (en) 2014-09-12 2020-09-29 Apple Inc. Dynamic thresholds for always listening speech trigger
EP3195145A4 (en) 2014-09-16 2018-01-24 VoiceBox Technologies Corporation Voice commerce
WO2016044321A1 (en) 2014-09-16 2016-03-24 Min Tang Integration of domain information into state transitions of a finite state transducer for natural language processing
WO2016046232A1 (en) 2014-09-26 2016-03-31 British Telecommunications Public Limited Company Improved pattern matching
EP3198476A1 (en) * 2014-09-26 2017-08-02 British Telecommunications Public Limited Company Efficient pattern matching
US10776427B2 (en) 2014-09-26 2020-09-15 British Telecommunications Public Limited Company Efficient conditional state mapping in a pattern matching automaton
US9606986B2 (en) 2014-09-29 2017-03-28 Apple Inc. Integrated word N-gram and class M-gram language models
US9668121B2 (en) 2014-09-30 2017-05-30 Apple Inc. Social reminders
US9646609B2 (en) 2014-09-30 2017-05-09 Apple Inc. Caching apparatus for serving phonetic pronunciations
US10127911B2 (en) 2014-09-30 2018-11-13 Apple Inc. Speaker identification and unsupervised speaker adaptation techniques
US9886432B2 (en) 2014-09-30 2018-02-06 Apple Inc. Parsimonious handling of word inflection via categorical stem + suffix N-gram language models
US10074360B2 (en) 2014-09-30 2018-09-11 Apple Inc. Providing an indication of the suitability of speech recognition
US10083167B2 (en) * 2014-10-03 2018-09-25 At&T Intellectual Property I, L.P. System and method for unsupervised text normalization using distributed representation of words
US9747896B2 (en) 2014-10-15 2017-08-29 Voicebox Technologies Corporation System and method for providing follow-up responses to prior natural language inputs of a user
US10162811B2 (en) 2014-10-17 2018-12-25 Mz Ip Holdings, Llc Systems and methods for language detection
US9372848B2 (en) 2014-10-17 2016-06-21 Machine Zone, Inc. Systems and methods for language detection
US10614799B2 (en) 2014-11-26 2020-04-07 Voicebox Technologies Corporation System and method of providing intent predictions for an utterance prior to a system detection of an end of the utterance
US10431214B2 (en) 2014-11-26 2019-10-01 Voicebox Technologies Corporation System and method of determining a domain and/or an action related to a natural language input
US10552013B2 (en) 2014-12-02 2020-02-04 Apple Inc. Data detection
US9711141B2 (en) 2014-12-09 2017-07-18 Apple Inc. Disambiguating heteronyms in speech synthesis
KR102396983B1 (en) * 2015-01-02 2022-05-12 삼성전자주식회사 Method for correcting grammar and apparatus thereof
US9865280B2 (en) 2015-03-06 2018-01-09 Apple Inc. Structured dictation using intelligent automated assistants
US10152299B2 (en) 2015-03-06 2018-12-11 Apple Inc. Reducing response latency of intelligent automated assistants
US9721566B2 (en) 2015-03-08 2017-08-01 Apple Inc. Competing devices responding to voice triggers
US10567477B2 (en) 2015-03-08 2020-02-18 Apple Inc. Virtual assistant continuity
US9886953B2 (en) 2015-03-08 2018-02-06 Apple Inc. Virtual assistant activation
US9899019B2 (en) 2015-03-18 2018-02-20 Apple Inc. Systems and methods for structured stem and suffix language models
US9842105B2 (en) 2015-04-16 2017-12-12 Apple Inc. Parsimonious continuous-space phrase representations for natural language processing
EP3089159B1 (en) 2015-04-28 2019-08-28 Google LLC Correcting voice recognition using selective re-speak
EP3298367B1 (en) 2015-05-18 2020-04-29 Varcode Ltd. Thermochromic ink indicia for activatable quality labels
US10083688B2 (en) 2015-05-27 2018-09-25 Apple Inc. Device voice control for selecting a displayed affordance
US10127220B2 (en) 2015-06-04 2018-11-13 Apple Inc. Language identification from short strings
US10101822B2 (en) 2015-06-05 2018-10-16 Apple Inc. Language input correction
US9578173B2 (en) 2015-06-05 2017-02-21 Apple Inc. Virtual assistant aided communication with 3rd party service in a communication session
US11025565B2 (en) 2015-06-07 2021-06-01 Apple Inc. Personalized prediction of responses for instant messaging
US10186254B2 (en) 2015-06-07 2019-01-22 Apple Inc. Context-based endpoint detection
US10255907B2 (en) 2015-06-07 2019-04-09 Apple Inc. Automatic accent detection using acoustic models
WO2017006326A1 (en) 2015-07-07 2017-01-12 Varcode Ltd. Electronic quality indicator
US10747498B2 (en) 2015-09-08 2020-08-18 Apple Inc. Zero latency digital assistant
US10671428B2 (en) 2015-09-08 2020-06-02 Apple Inc. Distributed personal assistant
US9697820B2 (en) 2015-09-24 2017-07-04 Apple Inc. Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks
US10366158B2 (en) 2015-09-29 2019-07-30 Apple Inc. Efficient word encoding for recurrent neural network language models
US11010550B2 (en) 2015-09-29 2021-05-18 Apple Inc. Unified language modeling framework for word prediction, auto-completion and auto-correction
US11587559B2 (en) 2015-09-30 2023-02-21 Apple Inc. Intelligent device identification
US10691473B2 (en) 2015-11-06 2020-06-23 Apple Inc. Intelligent automated assistant in a messaging environment
US10049668B2 (en) 2015-12-02 2018-08-14 Apple Inc. Applying neural network language models to weighted finite state transducers for automatic speech recognition
US9930168B2 (en) * 2015-12-14 2018-03-27 International Business Machines Corporation System and method for context aware proper name spelling
US10223066B2 (en) 2015-12-23 2019-03-05 Apple Inc. Proactive assistance based on dialog communication between devices
US10765956B2 (en) 2016-01-07 2020-09-08 Machine Zone Inc. Named entity recognition on chat data
US10446143B2 (en) 2016-03-14 2019-10-15 Apple Inc. Identification of voice inputs providing credentials
CN105824804A (en) * 2016-03-31 2016-08-03 长安大学 English spelling error correction tool and method based on word bank
US9934775B2 (en) 2016-05-26 2018-04-03 Apple Inc. Unit-selection text-to-speech synthesis based on predicted concatenation parameters
US9972304B2 (en) 2016-06-03 2018-05-15 Apple Inc. Privacy preserving distributed evaluation framework for embedded personalized systems
US10249300B2 (en) 2016-06-06 2019-04-02 Apple Inc. Intelligent list reading
US10049663B2 (en) 2016-06-08 2018-08-14 Apple Inc. Intelligent automated assistant for media exploration
DK179588B1 (en) 2016-06-09 2019-02-22 Apple Inc. Intelligent automated assistant in a home environment
US10490187B2 (en) 2016-06-10 2019-11-26 Apple Inc. Digital assistant providing automated status report
US10192552B2 (en) 2016-06-10 2019-01-29 Apple Inc. Digital assistant providing whispered speech
US10067938B2 (en) 2016-06-10 2018-09-04 Apple Inc. Multilingual word prediction
US10586535B2 (en) 2016-06-10 2020-03-10 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US10509862B2 (en) 2016-06-10 2019-12-17 Apple Inc. Dynamic phrase expansion of language input
DK179049B1 (en) 2016-06-11 2017-09-18 Apple Inc. Data driven natural language event detection and classification
DK201670540A1 (en) 2016-06-11 2018-01-08 Apple Inc. Application integration with a digital assistant
DK179343B1 (en) 2016-06-11 2018-05-14 Apple Inc. Intelligent task discovery
DK179415B1 (en) 2016-06-11 2018-06-14 Apple Inc. Intelligent device arbitration and control
US10331784B2 (en) 2016-07-29 2019-06-25 Voicebox Technologies Corporation System and method of disambiguating natural language processing requests
US10474753B2 (en) 2016-09-07 2019-11-12 Apple Inc. Language identification using recurrent neural networks
US10043516B2 (en) 2016-09-23 2018-08-07 Apple Inc. Intelligent automated assistant
JP2018067159A (en) * 2016-10-19 2018-04-26 京セラドキュメントソリューションズ株式会社 Image processing apparatus and image forming apparatus
US11281993B2 (en) 2016-12-05 2022-03-22 Apple Inc. Model and ensemble compression for metric learning
US10593346B2 (en) 2016-12-22 2020-03-17 Apple Inc. Rank-reduced token representation for automatic speech recognition
US11204787B2 (en) 2017-01-09 2021-12-21 Apple Inc. Application integration with a digital assistant
US11151130B2 (en) * 2017-02-04 2021-10-19 Tata Consultancy Services Limited Systems and methods for assessing quality of input text using recurrent neural networks
US10417266B2 (en) 2017-05-09 2019-09-17 Apple Inc. Context-aware ranking of intelligent response suggestions
DK201770383A1 (en) 2017-05-09 2018-12-14 Apple Inc. User interface for correcting recognition errors
US10395654B2 (en) 2017-05-11 2019-08-27 Apple Inc. Text normalization based on a data-driven learning network
DK201770439A1 (en) 2017-05-11 2018-12-13 Apple Inc. Offline personal assistant
US10726832B2 (en) 2017-05-11 2020-07-28 Apple Inc. Maintaining privacy of personal information
DK201770429A1 (en) 2017-05-12 2018-12-14 Apple Inc. Low-latency intelligent automated assistant
DK179496B1 (en) 2017-05-12 2019-01-15 Apple Inc. User-specific acoustic models
US11301477B2 (en) 2017-05-12 2022-04-12 Apple Inc. Feedback analysis of a digital assistant
DK179745B1 (en) 2017-05-12 2019-05-01 Apple Inc. Synchronization and task delegation of a digital assistant
DK201770431A1 (en) 2017-05-15 2018-12-20 Apple Inc. Optimizing dialogue policy decisions for digital assistants using implicit feedback
DK201770432A1 (en) 2017-05-15 2018-12-21 Apple Inc. Hierarchical belief states for digital assistants
US10303715B2 (en) 2017-05-16 2019-05-28 Apple Inc. Intelligent automated assistant for media exploration
US10311144B2 (en) 2017-05-16 2019-06-04 Apple Inc. Emoji word sense disambiguation
US10403278B2 (en) 2017-05-16 2019-09-03 Apple Inc. Methods and systems for phonetic matching in digital assistant services
DK179560B1 (en) 2017-05-16 2019-02-18 Apple Inc. Far-field extension for digital assistant services
US10657328B2 (en) 2017-06-02 2020-05-19 Apple Inc. Multi-task recurrent neural network architecture for efficient morphology handling in neural language modeling
US10970481B2 (en) * 2017-06-28 2021-04-06 Apple Inc. Intelligently deleting back to a typographical error
JP6979294B2 (en) * 2017-07-06 2021-12-08 株式会社朝日新聞社 Proofreading support device, proofreading support method and proofreading support program
US20190050391A1 (en) * 2017-08-09 2019-02-14 Lenovo (Singapore) Pte. Ltd. Text suggestion based on user context
US10565304B2 (en) 2017-09-16 2020-02-18 Noredink Corp. System and method for implementing a proficiency-driven feedback and improvement platform
WO2019060353A1 (en) 2017-09-21 2019-03-28 Mz Ip Holdings, Llc System and method for translating chat messages
US10445429B2 (en) 2017-09-21 2019-10-15 Apple Inc. Natural language understanding using vocabularies with compressed serialized tries
US10755051B2 (en) 2017-09-29 2020-08-25 Apple Inc. Rule-based natural language processing
US11222056B2 (en) * 2017-11-13 2022-01-11 International Business Machines Corporation Gathering information on user interactions with natural language processor (NLP) items to order presentation of NLP items in documents
US10636424B2 (en) 2017-11-30 2020-04-28 Apple Inc. Multi-turn canned dialog
US10733982B2 (en) 2018-01-08 2020-08-04 Apple Inc. Multi-directional dialog
US10733375B2 (en) 2018-01-31 2020-08-04 Apple Inc. Knowledge-based framework for improving natural language understanding
US10789959B2 (en) 2018-03-02 2020-09-29 Apple Inc. Training speaker recognition models for digital assistants
US10592604B2 (en) 2018-03-12 2020-03-17 Apple Inc. Inverse text normalization for automatic speech recognition
CN108595410B (en) * 2018-03-19 2023-03-24 小船出海教育科技(北京)有限公司 Automatic correction method and device for handwritten composition
US10818288B2 (en) 2018-03-26 2020-10-27 Apple Inc. Natural assistant interaction
US10909331B2 (en) 2018-03-30 2021-02-02 Apple Inc. Implicit identification of translation payload with neural machine translation
US11145294B2 (en) 2018-05-07 2021-10-12 Apple Inc. Intelligent automated assistant for delivering content from user experiences
US10928918B2 (en) 2018-05-07 2021-02-23 Apple Inc. Raise to speak
US10984780B2 (en) 2018-05-21 2021-04-20 Apple Inc. Global semantic word embeddings using bi-directional recurrent neural networks
DK180639B1 (en) 2018-06-01 2021-11-04 Apple Inc. Deactivation of attention-aware virtual assistant
US10892996B2 (en) 2018-06-01 2021-01-12 Apple Inc. Variable latency device coordination
US11386266B2 (en) * 2018-06-01 2022-07-12 Apple Inc. Text correction
DK201870355A1 (en) 2018-06-01 2019-12-16 Apple Inc. Virtual assistant operation in multi-device environments
DK179822B1 (en) 2018-06-01 2019-07-12 Apple Inc. Voice interaction at a primary device to access call functionality of a companion device
US10504518B1 (en) 2018-06-03 2019-12-10 Apple Inc. Accelerated task performance
CN112328737B (en) * 2019-07-17 2023-05-05 北方工业大学 Spelling data generation method
CA3150031A1 (en) * 2019-08-05 2021-02-11 Ai21 Labs Systems and methods of controllable natural language generation
CN110991166B (en) * 2019-12-03 2021-07-30 中国标准化研究院 Chinese wrongly-written character recognition method and system based on pattern matching
CN112541342B (en) * 2020-12-08 2022-07-22 北京百度网讯科技有限公司 Text error correction method and device, electronic equipment and storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0283685A2 (en) * 1987-03-27 1988-09-28 International Business Machines Corporation A spelling assistance method for compound words
JPH0242574A (en) * 1988-08-02 1990-02-13 Ricoh Co Ltd Spelling check system for translation system
JPH0283664A (en) * 1988-09-20 1990-03-23 Ricoh Co Ltd Reference part qualifying and analyzing system
JPH02103662A (en) * 1988-10-12 1990-04-16 Ricoh Co Ltd Sentence dividing system
US4994966A (en) * 1988-03-31 1991-02-19 Emerson & Stern Associates, Inc. System and method for natural language parsing by initiating processing prior to entry of complete sentences
US5610812A (en) * 1994-06-24 1997-03-11 Mitsubishi Electric Information Technology Center America, Inc. Contextual tagger utilizing deterministic finite state transducer
EP0788062A2 (en) * 1996-01-30 1997-08-06 Sun Microsystems, Inc. Internet-based spelling checker dictionary system with automatic updating
US5659771A (en) * 1995-05-19 1997-08-19 Mitsubishi Electric Information Technology Center America, Inc. System for spelling correction in which the context of a target word in a sentence is utilized to determine which of several possible words was intended
WO1997049043A1 (en) * 1996-06-20 1997-12-24 Microsoft Corporation Method and system for verifying accuracy of spelling and grammatical composition of a document
US5737734A (en) * 1995-09-15 1998-04-07 Infonautics Corporation Query word relevance adjustment in a search of an information retrieval system

Family Cites Families (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4674065A (en) 1982-04-30 1987-06-16 International Business Machines Corporation System for detecting and correcting contextual errors in a text processing system
US4580241A (en) 1983-02-18 1986-04-01 Houghton Mifflin Company Graphic word spelling correction using automated dictionary comparisons with phonetic skeletons
US4730269A (en) 1983-02-18 1988-03-08 Houghton Mifflin Company Method and apparatus for generating word skeletons utilizing alpha set replacement and omission
US4701851A (en) 1984-10-24 1987-10-20 International Business Machines Corporation Compound word spelling verification
US4672571A (en) 1984-10-24 1987-06-09 International Business Machines Corporation Compound word suitability for spelling verification
US4818131A (en) 1985-12-29 1989-04-04 Brother Kogyo Kabushiki Kaisha Typewriter having means for automatic indication of candidate correct word for misspelled word, and/or automatic correction of misspelled word
US4864502A (en) 1987-10-07 1989-09-05 Houghton Mifflin Company Sentence analyzer
US4868750A (en) 1987-10-07 1989-09-19 Houghton Mifflin Company Collocational grammar system
US4847766A (en) 1988-01-05 1989-07-11 Smith Corona Corporation Dictionary typewriter with correction of commonly confused words
US5258909A (en) 1989-08-31 1993-11-02 International Business Machines Corporation Method and apparatus for "wrong word" spelling error detection and correction
US5604897A (en) 1990-05-18 1997-02-18 Microsoft Corporation Method and system for correcting the spelling of misspelled words
US5369577A (en) * 1991-02-01 1994-11-29 Wang Laboratories, Inc. Text searching system
US5625554A (en) * 1992-07-20 1997-04-29 Xerox Corporation Finite-state transduction of related word forms for text indexing and retrieval
AU677605B2 (en) * 1992-09-04 1997-05-01 Caterpillar Inc. Integrated authoring and translation system
US5606690A (en) * 1993-08-20 1997-02-25 Canon Inc. Non-literal textual search using fuzzy finite non-deterministic automata
US5537317A (en) 1994-06-01 1996-07-16 Mitsubishi Electric Research Laboratories Inc. System for correcting grammer based parts on speech probability
US5485372A (en) * 1994-06-01 1996-01-16 Mitsubishi Electric Research Laboratories, Inc. System for underlying spelling recovery
US6047300A (en) * 1997-05-15 2000-04-04 Microsoft Corporation System and method for automatically correcting a misspelled word
US6219453B1 (en) * 1997-08-11 2001-04-17 At&T Corp. Method and apparatus for performing an automatic correction of misrecognized words produced by an optical character recognition technique by using a Hidden Markov Model based algorithm
US6016471A (en) * 1998-04-29 2000-01-18 Matsushita Electric Industrial Co., Ltd. Method and apparatus using decision trees to generate and score multiple pronunciations for a spelled word

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
KUDO I ET AL: "ENGLISH CAI: A USER-INITIATIVE CAI SYSTEM WITH MACHINE TRANSLATION TECHNIQUES" SYSTEMS & COMPUTERS IN JAPAN,US,SCRIPTA TECHNICA JOURNALS. NEW YORK, vol. 21, no. 9, 1 January 1990 (1990-01-01), pages 46-59, XP000220500 ISSN: 0882-1666 *
PATENT ABSTRACTS OF JAPAN vol. 014, no. 201 (P-1041), 24 April 1990 (1990-04-24) & JP 02 042574 A (RICOH CO LTD), 13 February 1990 (1990-02-13) *
PATENT ABSTRACTS OF JAPAN vol. 014, no. 283 (P-1063), 19 June 1990 (1990-06-19) & JP 02 083664 A (RICOH CO LTD), 23 March 1990 (1990-03-23) *
PATENT ABSTRACTS OF JAPAN vol. 014, no. 317 (P-1073), 9 July 1990 (1990-07-09) & JP 02 103662 A (RICOH CO LTD), 16 April 1990 (1990-04-16) *
UTHUTASAMY R ET AL: "EXTRACTING KNOWLEDGE FROM DIAGNOSTIC DATABASES" IEEE EXPERT,US,IEEE INC. NEW YORK, vol. 8, no. 6, 1 December 1993 (1993-12-01), pages 27-38, XP000414493 ISSN: 0885-9000 *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1277135A1 (en) * 2000-04-25 2003-01-22 Microsoft Corporation Language model sharing
US7895031B2 (en) 2000-04-25 2011-02-22 Microsoft Corporation Language model sharing
EP1244260A2 (en) * 2001-03-21 2002-09-25 Kabushiki Kaisha Toshiba Communication terminal unit capable of receiving a message and method for identifying a message sender in the same
EP1244260A3 (en) * 2001-03-21 2003-09-03 Kabushiki Kaisha Toshiba Communication terminal unit capable of receiving a message and method for identifying a message sender in the same
US7016700B2 (en) 2001-03-21 2006-03-21 Kabushiki Kaisha Toshiba Communication terminal unit capable of receiving a message and method for identifying a message sender in the same
US7167720B2 (en) 2001-03-21 2007-01-23 Kabushiki Kaisha Toshiba Communication terminal unit capable of receiving a message and method for identifying a message sender in the same
US7822435B2 (en) 2001-03-21 2010-10-26 Kabushiki Kaisha Toshiba Communication terminal unit capable of receiving a message and method for identifying a message sender in the same
US7266387B2 (en) 2001-03-21 2007-09-04 Kabushiki Kaisha Toshiba Communication terminal unit capable of receiving a message and method for identifying a message sender in the same
EP1875462A4 (en) * 2005-04-25 2010-08-11 Microsoft Corp Method and system for generating spelling suggestions
EP1875462A2 (en) * 2005-04-25 2008-01-09 Microsoft Corporation Method and system for generating spelling suggestions
JP2008539476A (en) * 2005-04-25 2008-11-13 マイクロソフト コーポレーション Spelling presentation generation method and system
WO2006115598A2 (en) 2005-04-25 2006-11-02 Microsoft Corporation Method and system for generating spelling suggestions
WO2007094684A3 (en) * 2006-02-17 2007-12-13 Lumex As Method and system for verification of uncertainly recognized words in an ocr system
WO2007094684A2 (en) * 2006-02-17 2007-08-23 Lumex As Method and system for verification of uncertainly recognized words in an ocr system
US8315484B2 (en) 2006-02-17 2012-11-20 Lumex As Method and system for verification of uncertainly recognized words in an OCR system
US7831911B2 (en) 2006-03-08 2010-11-09 Microsoft Corporation Spell checking system including a phonetic speller
US7831423B2 (en) 2006-05-25 2010-11-09 Multimodal Technologies, Inc. Replacing text representing a concept with an alternate written form of the concept
EP1953622A1 (en) * 2007-02-02 2008-08-06 Research In Motion Limited Handeld electronics device including predictive accent mechanism, and associated method
WO2013032617A1 (en) * 2011-09-01 2013-03-07 Google Inc. Server-based spell checking

Also Published As

Publication number Publication date
EP1145141A3 (en) 2002-09-11
US6424983B1 (en) 2002-07-23
US7243305B2 (en) 2007-07-10
WO1999062000A3 (en) 2001-06-07
WO1999062000A8 (en) 2001-11-15
AU4100399A (en) 1999-12-13
US7853874B2 (en) 2010-12-14
EP1145141A2 (en) 2001-10-17
US20040093567A1 (en) 2004-05-13
US20080077859A1 (en) 2008-03-27
CA2333402A1 (en) 1999-12-02

Similar Documents

Publication Publication Date Title
US7243305B2 (en) Spelling and grammar checking system
US7574348B2 (en) Processing collocation mistakes in documents
EP0907924B1 (en) Identification of words in Japanese text by a computer system
JP4544674B2 (en) A system that provides information related to the selected string
CA1300272C (en) Word annotation system
US5680628A (en) Method and apparatus for automated search and retrieval process
EP0953192B1 (en) Natural language parser with dictionary-based part-of-speech probabilities
US5890103A (en) Method and apparatus for improved tokenization of natural language text
US6115683A (en) Automatic essay scoring system using content-based techniques
US20110270603A1 (en) Method and Apparatus for Language Processing
WO1997004405A9 (en) Method and apparatus for automated search and retrieval processing
JPH0782498B2 (en) Machine translation system
US7136803B2 (en) Japanese virtual dictionary
JP3231004B2 (en) Database access device and method
US7620541B2 (en) Critiquing clitic pronoun ordering in French
JP2002503849A (en) Word segmentation method in Kanji sentences
JP3136973B2 (en) Language analysis system and method
JP3884001B2 (en) Language analysis system and method
Trushkina Automatic error detection in second language learners' writing
Muller Treating 'KRE-8-WE' spellings for natural language processing
JPH0546612A (en) Sentence error detector
JPH0736885A (en) Method and device for controlling character information conversion in document preparing device
JPH02271467A (en) Morpheme analyzer

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH GM HR HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW SD SL SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 EP: The EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 2004-01-01)
ENP Entry into the national phase

Ref document number: 2333402

Country of ref document: CA

NENP Non-entry into the national phase

Ref country code: KR

WWE WIPO information: entry into national phase

Ref document number: 1999924524

Country of ref document: EP

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

AK Designated states

Kind code of ref document: A3

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH GM HR HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): GH GM KE LS MW SD SL SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

WWP WIPO information: published in national office

Ref document number: 1999924524

Country of ref document: EP

AK Designated states

Kind code of ref document: C1

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH GM HR HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: C1

Designated state(s): GH GM KE LS MW SD SL SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

WR Later publication of a revised version of an international search report
WWW WIPO information: withdrawn in national office

Ref document number: 1999924524

Country of ref document: EP