A system for recognition of characters on a medium. The system includes a scanner for scanning a medium such as a page of printed text and graphics and producing a bit-mapped representation of the page. The bit-mapped representation of the page is then stored in a memory means such as the memory of a computer system. A processor processes the bit-mapped image to produce an output comprising coded character representations of the text on the page. The present invention discloses parsing a page to allow for production of the output characters in a logical sequence; a combination of feature detection methods and template matching methods for recognition of characters; and a number of methods for feature detection, such as use of statistical data and polygon fitting.
Citations
| Cited patent | Filing date | Publication date | Applicant | Title |
| --- | --- | --- | --- | --- |
| US4783829 | Feb 22, 1984 | Nov 8, 1988 | Hitachi, Ltd. | Pattern recognition apparatus |
| US4802230 | Mar 13, 1987 | Jan 31, 1989 | GTX Corporation | Method and apparatus for generating size and orientation invariant shape features |
| US4837842 | Sep 19, 1986 | Jun 6, 1989 | | Character and pattern recognition machine and method |
| US5033098 | Aug 27, 1990 | Jul 16, 1991 | Sharp Kabushiki Kaisha | Method of processing character blocks with optical character reader |
| US5113453 | May 14, 1990 | May 12, 1992 | L'Etat Francais represente par le Ministre des Postes et Telecommunications Centre National d'Etudes des Telecommunications | Character recognition method and apparatus |
| US5131053 | Aug 10, 1988 | Jul 14, 1992 | Caere Corporation | Optical character recognition method and apparatus |
| US5133023 | May 19, 1988 | Jul 21, 1992 | The Palantir Corporation | Means for resolving ambiguities in text based upon character context |
| US5524453 | Aug 1, 1994 | Jun 11, 1996 | | Thermal energy storage apparatus for chilled water air-conditioning systems |
Referenced by
| Citing patent | Filing date | Publication date | Applicant | Title |
| --- | --- | --- | --- | --- |
| US5535313 | Jul 18, 1994 | Jul 9, 1996 | Motorola, Inc. | Automated quality control in a document conversion system |
| US5727220 | Nov 29, 1995 | Mar 10, 1998 | International Business Machines Corporation | Method and system for caching and referencing cached document pages utilizing a presentation data stream |
| US5856832 | Jan 13, 1997 | Jan 5, 1999 | Hewlett-Packard Company | System and method for parsing multiple sets of data |
| US5893127 | Nov 18, 1996 | Apr 6, 1999 | Canon Information Systems, Inc. | Generator for document with HTML tagged table having data elements which preserve layout relationships of information in bitmap image of original document |
| US6011877 | Dec 9, 1996 | Jan 4, 2000 | Minolta Co., Ltd. | Apparatus and method for determining the directional orientation of a document image based upon the location of detected punctuation marks within the document image |
| US6054990 | Jul 5, 1996 | Apr 25, 2000 | | Computer system with handwriting annotation |
| US6202060 | Oct 29, 1996 | Mar 13, 2001 | | Data management system |
| US6327388 | Aug 14, 1998 | Dec 4, 2001 | Matsushita Electric Industrial Co., Ltd. | Identification of logos from document images |
| US6336094 | Jun 30, 1995 | Jan 1, 2002 | Price Waterhouse World Firm Services BV. Inc. | Method for electronically recognizing and parsing information contained in a financial statement |
| US6341176 | Nov 13, 1997 | Jan 22, 2002 | Matsushita Electric Industrial Co., Ltd. | Method and apparatus for character recognition |
| US6496600 | Jun 17, 1996 | Dec 17, 2002 | Canon Kabushiki Kaisha | Font type identification |
| US6512848 | Nov 18, 1996 | Jan 28, 2003 | Canon Kabushiki Kaisha | Page analysis system |
| US6577771 | Jul 29, 1998 | Jun 10, 2003 | Matsushita Electric Industrial Co., Ltd. | Method for supplementing digital image with picture element, and digital image encoder and decoder using the same |
| US7039856 | Sep 30, 1998 | May 2, 2006 | Ricoh Co., Ltd. | Automatic document classification using text and images |
| US7657120 | Aug 24, 2007 | Feb 2, 2010 | SRI International | Method and apparatus for determination of text orientation |
| US7831098 | Nov 7, 2006 | Nov 9, 2010 | Recognition Robotics | System and method for visual searching of objects using lines |
Claims
1. A processor for identifying an unknown character in an optical character recognition system, said system for providing a bit-mapped representation of said unknown character and for providing shape information for a plurality of known characters, said processor comprising:
- a) means for generating shape information corresponding to said unknown character, said shape information including statistical data and polygonal representation data; and
- b) identifying means for identifying said unknown character if a comparison between said shape information corresponding to said unknown character and said shape information corresponding to one known character of said plurality of said known characters is within a predetermined range.
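As an illustration of element b), the comparison "within a predetermined range" can be sketched as a nearest-match search with a distance threshold. This is a minimal sketch under assumptions not stated in the claims: shape information is modeled as a flat list of numbers, the comparison is a sum of absolute differences, and the names `feature_distance`, `identify` and `max_distance` are hypothetical.

```python
from typing import Dict, Optional, Sequence


def feature_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences between two equal-length feature vectors.

    The metric is an assumption; the claims only require "a comparison".
    """
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def identify(
    unknown: Sequence[float],
    known: Dict[str, Sequence[float]],
    max_distance: float,
) -> Optional[str]:
    """Return the label of the closest known character, or None.

    A character is identified only if its distance to the best candidate
    falls within the predetermined range (here: <= max_distance).
    """
    best_label: Optional[str] = None
    best_dist = float("inf")
    for label, feats in known.items():
        d = feature_distance(unknown, feats)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label if best_dist <= max_distance else None


# Hypothetical feature vectors for two known characters.
KNOWN = {"o": [1.0, 0.0, 4.0], "l": [0.0, 2.0, 1.0]}
```

With these made-up vectors, `identify([1.0, 0.5, 4.0], KNOWN, 1.0)` selects `"o"`, while a vector far from every known character yields `None`, leaving the character unidentified by this stage.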
2. The processor of claim 1 wherein said means for generating shape information includes:
- means for generating a first window and a second window, said first window corresponding to a first portion of said bit-mapped representation, said second window corresponding to a second portion of said bit-mapped representation;
- means for generating profile data from said first window and said second window; and
- means for generating shape information from said profile data.
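The window and profile steps of claim 2 can be sketched as follows. The claims do not say how the windows are chosen or how a profile is measured, so this sketch assumes the two windows are the left and right halves of the bitmap and that a profile records, per row, the number of background pixels between the window's outer edge and the first ink pixel. All function names and the summary statistics are hypothetical.

```python
from typing import Dict, List, Tuple

Bitmap = List[List[int]]  # rows of 0 (background) and 1 (ink)


def split_windows(bitmap: Bitmap) -> Tuple[Bitmap, Bitmap]:
    """Split a bitmap into a left window and a right window (assumed layout)."""
    if not bitmap or not bitmap[0]:
        raise ValueError("bitmap must be non-empty")
    mid = len(bitmap[0]) // 2
    return [row[:mid] for row in bitmap], [row[mid:] for row in bitmap]


def left_profile(window: Bitmap) -> List[int]:
    """Per row: background pixels before the first ink pixel, scanning rightward.

    A row with no ink contributes the full window width.
    """
    return [row.index(1) if 1 in row else len(row) for row in window]


def right_profile(window: Bitmap) -> List[int]:
    """Per row: background pixels before the first ink pixel, scanning leftward."""
    return [left_profile([row[::-1]])[0] for row in window]


def profile_features(profile: List[int]) -> Dict[str, float]:
    """Simple statistics over a profile, standing in for 'shape information'."""
    return {
        "min": float(min(profile)),
        "max": float(max(profile)),
        "mean": sum(profile) / len(profile),
    }


# A small "C"-like glyph used only for illustration.
GLYPH_C = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 1, 1],
]
```

For the sample glyph, the left window yields a shallow profile and the right window a profile with a deep gap in the middle rows, which is the kind of asymmetry that distinguishes an open shape such as "C" from a closed one such as "O".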
3. The processor of claim 2 wherein said means for generating shape information includes:
- means for generating a first polygon from said profile data;
- means for generating a second shaded polygon from said profile data; and
- means for generating a third polygon by subtracting the first polygon from the second polygon.
4. The processor of claim 1 wherein said shape information includes phase change data.
5. The processor of claim 1 wherein said unknown character has segments of varying length, and wherein said shape information includes relative length of segment data.
6. The processor of claim 1 wherein said polygonal representation data includes polygon peaks data, and said identifying means compares polygon peaks data of said unknown character to polygon peaks data of known characters.
7. The processor of claim 1 wherein said identifying means compares loops represented in said unknown character shape information to loops represented in said known character shape information.
8. The processor of claim 1 wherein said system further provides a plurality of templates of known characters and a bit-mapped representation of said unknown character, and said processor further comprises:
- means for comparing said bit-mapped representation of said unknown character to said plurality of templates of known characters; and
- means for determining if said bit-mapped representation of said unknown character matches one of said plurality of templates of known characters.
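The template comparison of claim 8 can be illustrated with a pixel-by-pixel match. This is a sketch under assumptions: the unknown bitmap and every template have identical dimensions, a match is declared when the number of differing pixels does not exceed a tolerance, and the names `mismatch_count`, `match_template` and `max_mismatches` are hypothetical rather than taken from the patent.

```python
from typing import Dict, List, Optional

Bitmap = List[List[int]]  # rows of 0 (background) and 1 (ink)


def mismatch_count(a: Bitmap, b: Bitmap) -> int:
    """Count pixels that differ between two bitmaps of identical size."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("bitmaps must have identical dimensions")
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def match_template(
    unknown: Bitmap,
    templates: Dict[str, Bitmap],
    max_mismatches: int = 0,
) -> Optional[str]:
    """Return the label of the best-matching template, or None if no template
    is within the allowed number of mismatched pixels."""
    best_label: Optional[str] = None
    best_count: Optional[int] = None
    for label, template in templates.items():
        n = mismatch_count(unknown, template)
        if best_count is None or n < best_count:
            best_label, best_count = label, n
    if best_count is not None and best_count <= max_mismatches:
        return best_label
    return None


# Hypothetical 3x3 templates for illustration only.
TEMPLATES = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}
NOISY_I = [[0, 1, 0], [0, 1, 1], [0, 1, 0]]
```

Here `NOISY_I` differs from the "I" template by one pixel, so it matches only when `max_mismatches` is at least 1. A character that fails template matching could then be handed to the feature-based comparison of claim 1, which is one plausible way to combine the two methods.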
9. A method for identifying an unknown character in an optical character recognition system, said system for providing shape information for a plurality of known characters, said method comprising:
- a) generating shape information corresponding to said unknown character, said shape information including phase change data, statistical data and polygonal representation data; and
- b) identifying said unknown character if a comparison between said shape information corresponding to said unknown character and a shape information corresponding to one of said plurality of said known characters is within a predetermined range.
10. The method of claim 9 wherein said generating shape information includes:
- generating a first window and a second window, said first window corresponding to a first portion of said bit-mapped representation, said second window corresponding to a second portion of said bit-mapped representation;
- generating profile data from said first window and said second window; and
- generating shape information from said profile data.
11. The method of claim 9 wherein said generating shape information includes:
- generating a first polygon from said profile data;
- generating a second shaded polygon from said profile data; and
- generating a third polygon by subtracting the first polygon from the second polygon.
12. The method of claim 9 wherein said polygonal representation data includes polygon peaks data, and wherein said identifying step compares polygon peaks data of said unknown character to polygon peaks data of known characters.
13. The method of claim 9 wherein said identifying step compares loops represented in said unknown character.