(19) United States
(12) Patent Application Publication (10) Pub. No.: US 2002/0102025 A1
WU et al. (43) Pub. Date: Aug. 1, 2002
(54) WORD SEGMENTATION IN CHINESE TEXT (52) U.S. Cl. 382/229
(76) Inventors: ANDI WU, BELLEVUE, WA (US);
STEPHEN D. RICHARDSON,
REDMOND, WA (US); ZIXIN JIANG,
REDMOND, WA (US)
WESTMAN CHAMPLIN & KELLY
900 SECOND AVENUE SOUTH
MINNEAPOLIS, MN 55402-3319
( * ) Notice: This is a publication of a continued prosecution application (CPA) filed under 37 CFR 1.53(d).
(21) Appl. No.: 09/087,468
(22) Filed: May 29, 1998
Related U.S. Application Data
(63) Continuation-in-part of application No. 09/023,586, filed on Feb. 13, 1998, now abandoned.
(51) Int. Cl.⁷ G06K 9/34
The present invention provides a facility for selecting from a sequence of natural language characters combinations of characters that may be words. The facility uses indications, for each of a plurality of characters, of (a) the characters that occur in the second position of words that begin with the character and (b) the positions in which the character occurs in words. For each of a plurality of contiguous combinations of characters occurring in the sequence, the facility determines whether the character occurring in the second position of the combination is indicated to occur in words that begin with the character occurring in the first position of the combination. If so, the facility determines whether every character of the combination is indicated to occur in words in a position in which it occurs in the combination. If so, the facility determines that the combination of characters may be a word. In some embodiments, the facility proceeds to compare the combination of characters to a list of valid words to determine whether the combination of characters is a word.
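The two-stage filter described in the abstract can be illustrated with a minimal Python sketch. It is not the implementation disclosed in the application: the function names `build_tables` and `candidate_words`, the 0-based position encoding, the `max_len` cutoff, and the tiny example lexicon are all assumptions made for illustration.

```python
from collections import defaultdict


def build_tables(lexicon):
    """Derive the two per-character indications from a word list (assumed source)."""
    second_chars = defaultdict(set)  # first char -> chars seen in second position
    positions = defaultdict(set)     # char -> 0-based positions it occupies in words
    for word in lexicon:
        if len(word) >= 2:
            second_chars[word[0]].add(word[1])
        for i, ch in enumerate(word):
            positions[ch].add(i)
    return second_chars, positions


def candidate_words(text, second_chars, positions, max_len=4):
    """Return contiguous combinations of `text` that may be words."""
    candidates = []
    for start in range(len(text) - 1):
        first, second = text[start], text[start + 1]
        # Test 1: is the second character known to follow the first in some word?
        if second not in second_chars.get(first, ()):
            continue
        for end in range(start + 2, min(len(text), start + max_len) + 1):
            combo = text[start:end]
            # Test 2: does every character occur in words at this position?
            if all(i in positions.get(ch, ()) for i, ch in enumerate(combo)):
                candidates.append(combo)
    return candidates


# Hypothetical two-word lexicon, for illustration only.
lexicon = {"中国人", "人民"}
second_chars, positions = build_tables(lexicon)

candidates = candidate_words("中国人民", second_chars, positions)
# Optional final step: compare candidates against the list of valid words.
words = [c for c in candidates if c in lexicon]

print(candidates)  # ['中国', '中国人', '人民']
print(words)       # ['中国人', '人民']
```

In this toy run, "中国" passes both tests but is absent from the lexicon. This shows why the tests only establish that a combination may be a word, and why some embodiments follow them with a lookup against the list of valid words.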