
A system for recognition of characters on a medium. The system includes a scanner for scanning a medium, such as a page of printed text and graphics, and producing a bit-mapped representation of the page. The bit-mapped representation of the page is then stored in a memory means, such as the memory of a computer system. A processor processes the bit-mapped image to produce an output comprising coded character representations of the text on the page. The present invention discloses parsing a page so that the output characters are produced in a logical sequence; a combination of feature detection methods and template matching methods for recognizing characters; and a number of feature detection methods, such as the use of statistical data and polygon fitting.

Inventors: Philip Bernzott, John Dilworth, David George, Bryan Higgins, Jeremy Knight
Original Assignee: Caere Corporation
Primary Examiner: Michael R. Cammarata
Current U.S. Classification: 382/229; 382/197; 382/228
International Classification: G06K 9/72; G06K 9/48; G06K 9/62


Citations

Cited Patent | Filing date | Issue date | Original Assignee | Title
US4783829 | Feb 22, 1984 | Nov 8, 1988 | Hitachi, Ltd. | Pattern recognition apparatus
US4802230 | Mar 13, 1987 | Jan 31, 1989 | GTX Corporation | Method and apparatus for generating size and orientation invariant shape features
US4837842 | Sep 19, 1986 | Jun 6, 1989 |  | Character and pattern recognition machine and method
US5033098 | Aug 27, 1990 | Jul 16, 1991 | Sharp Kabushiki Kaisha | Method of processing character blocks with optical character reader
US5113453 | May 14, 1990 | May 12, 1992 | L'Etat Francais represente par le Ministre des Postes et Telecommunications Centre National d'Etudes des Telecommunications | Character recognition method and apparatus
US5131053 | Aug 10, 1988 | Jul 14, 1992 | Caere Corporation | Optical character recognition method and apparatus
US5133023 | May 19, 1988 | Jul 21, 1992 | The Palantir Corporation | Means for resolving ambiguities in text based upon character context
US5524453 | Aug 1, 1994 | Jun 11, 1996 |  | Thermal energy storage apparatus for chilled water air-conditioning systems

Referenced by

Citing Patent | Filing date | Issue date | Original Assignee | Title
US5535313 | Jul 18, 1994 | Jul 9, 1996 | Motorola, Inc. | Automated quality control in a document conversion system
US5727220 | Nov 29, 1995 | Mar 10, 1998 | International Business Machines Corporation | Method and system for caching and referencing cached document pages utilizing a presentation data stream
US5856832 | Jan 13, 1997 | Jan 5, 1999 | Hewlett-Packard Company | System and method for parsing multiple sets of data
US5893127 | Nov 18, 1996 | Apr 6, 1999 | Canon Information Systems, Inc. | Generator for document with HTML tagged table having data elements which preserve layout relationships of information in bitmap image of original document
US6011877 | Dec 9, 1996 | Jan 4, 2000 | Minolta Co., Ltd. | Apparatus and method for determining the directional orientation of a document image based upon the location of detected punctuation marks within the document image
US6054990 | Jul 5, 1996 | Apr 25, 2000 |  | Computer system with handwriting annotation
US6202060 | Oct 29, 1996 | Mar 13, 2001 |  | Data management system
US6327388 | Aug 14, 1998 | Dec 4, 2001 | Matsushita Electric Industrial Co., Ltd. | Identification of logos from document images
US6336094 | Jun 30, 1995 | Jan 1, 2002 | Price Waterhouse World Firm Services BV. Inc. | Method for electronically recognizing and parsing information contained in a financial statement
US6341176 | Nov 13, 1997 | Jan 22, 2002 | Matsushita Electric Industrial Co., Ltd. | Method and apparatus for character recognition
US6496600 | Jun 17, 1996 | Dec 17, 2002 | Canon Kabushiki Kaisha | Font type identification
US6512848 | Nov 18, 1996 | Jan 28, 2003 | Canon Kabushiki Kaisha | Page analysis system
US6577771 | Jul 29, 1998 | Jun 10, 2003 | Matsushita Electric Industrial Co., Ltd. | Method for supplementing digital image with picture element, and digital image encoder and decoder using the same
US7039856 | Sep 30, 1998 | May 2, 2006 | Ricoh Co., Ltd. | Automatic document classification using text and images
US7657120 | Aug 24, 2007 | Feb 2, 2010 | SRI International | Method and apparatus for determination of text orientation
US7831098 | Nov 7, 2006 | Nov 9, 2010 | Recognition Robotics | System and method for visual searching of objects using lines

Claims

1. A processor for identifying an unknown character in an optical character recognition system, said system for providing a bit-mapped representation of said unknown character and for providing shape information for a plurality of known characters, said processor comprising:

a) means for generating shape information corresponding to said unknown character, said shape information including statistical data and polygonal representation data; and
b) identifying means for identifying said unknown character if a comparison between said shape information corresponding to said unknown character and said shape information corresponding to one known character of said plurality of said known characters is within a predetermined range.

2. The processor of claim 1 wherein said means for generating shape information includes:

means for generating a first window and a second window, said first window corresponding to a first portion of said bit-mapped representation, said second window corresponding to a second portion of said bit-mapped representation;
means for generating profile data from said first window and said second window; and
means for generating shape information from said profile data.

3. The processor of claim 2 wherein said means for generating shape information includes:

means for generating a first polygon from said profile data;
means for generating a second shaded polygon from said profile data; and
means for generating a third polygon by subtracting the first polygon from the second polygon.

4. The processor of claim 1 wherein said shape information includes phase change data.

5. The processor of claim 1 wherein said unknown character has segments of varying length, and wherein said shape information includes relative length of segment data.

6. The processor of claim 1 wherein said polygonal representation data includes polygon peaks data, and said identifying means compares polygon peaks data of said unknown character to polygon peaks data of known characters.

7. The processor of claim 1 wherein said identifying means compares loops represented in said unknown character shape information to loops represented in said known character shape information.

8. The processor of claim 1 wherein said system further provides a plurality of templates of known characters and a bit-mapped representation of said unknown character, and said processor further comprises:

means for comparing said bit-mapped representation of said unknown character to said plurality of templates of known characters; and
means for determining if said bit-mapped representation of said unknown character matches one of said plurality of templates of known characters.

9. A method for identifying an unknown character in an optical character recognition system, said system for providing shape information for a plurality of known characters, said method comprising:

a) generating shape information corresponding to said unknown character, said shape information including phase change data, statistical data and polygonal representation data; and
b) identifying said unknown character if a comparison between said shape information corresponding to said unknown character and shape information corresponding to one of said plurality of said known characters is within a predetermined range.

10. The method of claim 9 wherein said generating shape information includes:

generating a first window and a second window, said first window corresponding to a first portion of said bit-mapped representation, said second window corresponding to a second portion of said bit-mapped representation;
generating profile data from said first window and said second window; and
generating shape information from said profile data.

11. The method of claim 9 wherein said generating shape information includes:

generating a first polygon from said profile data;
generating a second shaded polygon from said profile data; and
generating a third polygon by subtracting the first polygon from the second polygon.

12. The method of claim 9 wherein said polygonal representation data includes polygon peaks data, and wherein said identifying step compares polygon peaks data of said unknown character to polygon peaks data of known characters.

13. The method of claim 9 wherein said identifying step compares loops represented in said unknown character shape information to loops represented in said known character shape information.
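Claim 1 requires only that the comparison between the unknown character's shape information and a known character's shape information fall "within a predetermined range." The minimal sketch below makes three assumptions that the claim does not specify:

- Shape information is a flat record of numeric features. The feature names `loops` and `aspect` are hypothetical.
- Records are compared by Euclidean distance.
- The predetermined range is a distance threshold, `max_distance`.

It is one possible reading of the claim, not the patented implementation.

```python
import math
from typing import Dict, Optional

# Hypothetical shape-information record: feature name -> numeric value.
# All records are assumed to use the same feature names.
ShapeInfo = Dict[str, float]


def shape_distance(a: ShapeInfo, b: ShapeInfo) -> float:
    """Euclidean distance between two shape-information records."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))


def identify(unknown: ShapeInfo, known: Dict[str, ShapeInfo],
             max_distance: float) -> Optional[str]:
    """Return the closest known character if it lies within max_distance,
    otherwise None (the character remains unidentified)."""
    best_char, best_dist = None, math.inf
    for char, info in known.items():
        d = shape_distance(unknown, info)
        if d < best_dist:
            best_char, best_dist = char, d
    return best_char if best_dist <= max_distance else None


known = {
    "O": {"loops": 1.0, "aspect": 1.0},
    "I": {"loops": 0.0, "aspect": 0.2},
}
print(identify({"loops": 1.0, "aspect": 0.9}, known, max_distance=0.5))  # O
print(identify({"loops": 2.0, "aspect": 0.5}, known, max_distance=0.5))  # None
```

The second query is nearest to "O" but still about 1.12 away, so it falls outside the assumed range and is left unidentified.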
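Claims 2 and 10 derive shape information from profile data taken over two windows of the bitmap. The claims do not say how the windows are chosen or what a profile contains. This sketch assumes the following, all of which are illustrative rather than taken from the patent:

- The first window is the left half of the character and the second is the right half.
- A profile records, for each row, how many white pixels lie between the window's outer edge and the first black pixel.
- The helper names are hypothetical.

```python
from typing import Dict, List, Tuple

Bitmap = List[List[int]]  # 1 = black pixel, 0 = white pixel


def split_windows(bitmap: Bitmap) -> Tuple[Bitmap, Bitmap]:
    """Hypothetical window choice: first window = left half, second = right half."""
    mid = len(bitmap[0]) // 2
    return [row[:mid] for row in bitmap], [row[mid:] for row in bitmap]


def edge_profile(window: Bitmap, from_left: bool) -> List[int]:
    """For each row, count white pixels from the window's outer edge to the
    first black pixel (the full window width if the row is blank)."""
    profile = []
    for row in window:
        cells = row if from_left else row[::-1]
        profile.append(next((i for i, v in enumerate(cells) if v), len(cells)))
    return profile


def profile_shape_info(bitmap: Bitmap) -> Dict[str, float]:
    """Derive illustrative shape information from the two window profiles."""
    left, right = split_windows(bitmap)
    lp = edge_profile(left, from_left=True)
    rp = edge_profile(right, from_left=False)
    return {
        "left_mean": sum(lp) / len(lp),
        "right_mean": sum(rp) / len(rp),
        "left_max": max(lp),
        "right_max": max(rp),
    }


letter_l = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
]
print(profile_shape_info(letter_l))
```

For this "L", the left profile is all zeros (the vertical stroke touches the left edge). The right profile is [2, 2, 0], reflecting the open area above the foot.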
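Claims 3 and 11 build a third polygon by subtracting a first polygon from a second, "shaded" polygon, both generated from the profile data. The excerpt does not define the shaded polygon, so this sketch adopts one plausible reading as an explicit assumption:

- The first polygon is the region between the window edge and the profile.
- The second polygon is the same region with every valley raised to the lower of its two bounding peaks.
- Polygons are rasterized as sets of pixel cells, so their difference isolates the valleys.

The function names are hypothetical.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]  # (row, column) of one pixel


def polygon_under(profile: List[int]) -> Set[Cell]:
    """Rasterize the region between the window edge and the profile curve."""
    return {(r, c) for r, depth in enumerate(profile) for c in range(depth)}


def filled_envelope(profile: List[int]) -> List[int]:
    """Raise each valley of the profile to the lower of its two bounding peaks."""
    n = len(profile)
    left_max, right_max = [0] * n, [0] * n
    running = 0
    for i in range(n):
        running = max(running, profile[i])
        left_max[i] = running
    running = 0
    for i in reversed(range(n)):
        running = max(running, profile[i])
        right_max[i] = running
    return [min(a, b) for a, b in zip(left_max, right_max)]


def third_polygon(profile: List[int]) -> Set[Cell]:
    """Cells inside the (assumed) shaded polygon but outside the first polygon."""
    first = polygon_under(profile)
    second = polygon_under(filled_envelope(profile))  # assumed "shaded" polygon
    return second - first


print(sorted(third_polygon([3, 1, 1, 3])))  # [(1, 1), (1, 2), (2, 1), (2, 2)]
```

A monotonic profile such as [1, 2, 3] has no valley, so its third polygon is empty.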
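Claims 4 and 9 include "phase change data" in the shape information without defining it in this excerpt. A common OCR feature that fits the name is the number of white/black transitions along each scan line. The sketch below assumes that interpretation; the function names are hypothetical.

```python
from typing import List


def phase_changes(scan_line: List[int]) -> int:
    """Count white/black transitions between adjacent pixels in one scan line."""
    return sum(1 for a, b in zip(scan_line, scan_line[1:]) if a != b)


def phase_change_profile(bitmap: List[List[int]]) -> List[int]:
    """Phase-change count for every row of the character bitmap."""
    return [phase_changes(row) for row in bitmap]


letter_h = [
    [1, 0, 1],
    [1, 1, 1],
    [1, 0, 1],
]
print(phase_change_profile(letter_h))  # [2, 0, 2]
```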
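Claim 5 adds "relative length of segment data" for characters whose segments vary in length. The sketch below rests on two assumptions:

- A segment is a horizontal run of black pixels.
- "Relative" means each run's length is divided by the longest run in the character.

The patent may define segments differently (for example, as polygon edges), and the function names are hypothetical.

```python
from itertools import groupby
from typing import List


def black_runs(scan_line: List[int]) -> List[int]:
    """Lengths of consecutive runs of black (1) pixels in one scan line."""
    return [len(list(run)) for value, run in groupby(scan_line) if value == 1]


def relative_segment_lengths(bitmap: List[List[int]]) -> List[float]:
    """Each horizontal segment's length divided by the longest segment."""
    runs = [length for row in bitmap for length in black_runs(row)]
    longest = max(runs, default=0)
    return [length / longest for length in runs] if longest else []


print(relative_segment_lengths([[1, 1, 1, 1], [1, 0, 0, 1]]))  # [1.0, 0.25, 0.25]
```

Normalizing by the longest segment keeps the feature independent of point size, which appears to be the point of using relative rather than absolute lengths.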
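Claims 6 and 12 compare "polygon peaks data" of the unknown character with that of known characters. This sketch assumes the following, which the claims do not state:

- Peaks are the local maxima of a profile-derived polygon, with a plateau counting once.
- Two characters match when they have the same number of peaks and each peak's relative position agrees within a hypothetical `tolerance`.

```python
from typing import List


def peak_positions(profile: List[int]) -> List[float]:
    """Relative positions (0 to 1) of local maxima in a profile.
    A plateau counts as one peak, located at its first sample."""
    n = len(profile)
    peaks = []
    i = 0
    while i < n:
        j = i
        while j + 1 < n and profile[j + 1] == profile[i]:
            j += 1  # extend over a plateau of equal values
        rises = i == 0 or profile[i - 1] < profile[i]
        falls = j == n - 1 or profile[j + 1] < profile[i]
        if rises and falls and not (i == 0 and j == n - 1):
            peaks.append(i / max(n - 1, 1))
        i = j + 1
    return peaks


def peaks_match(unknown: List[int], known: List[int], tolerance: float = 0.1) -> bool:
    """Same number of peaks, each within tolerance of its counterpart."""
    a, b = peak_positions(unknown), peak_positions(known)
    return len(a) == len(b) and all(abs(x - y) <= tolerance for x, y in zip(a, b))


print(peak_positions([0, 2, 2, 0, 3, 0]))  # [0.2, 0.8]
```

Comparing relative rather than absolute positions lets profiles of different heights be compared directly.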
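Claims 7 and 13 compare the loops in the unknown character with the loops in the known characters. A standard way to count loops in a bitmap is to flood-fill the white background from the image border; any white region still unreached is an enclosed loop (for example, one in "O", two in "8"). This sketch uses 4-connected flood fill and compares loop counts only. That is an assumed simplification, and the function names are hypothetical.

```python
from collections import deque
from typing import List

Bitmap = List[List[int]]  # 1 = black pixel, 0 = white pixel


def count_loops(bitmap: Bitmap) -> int:
    """Count enclosed white regions (holes) using 4-connected flood fill."""
    h, w = len(bitmap), len(bitmap[0])
    seen = [[False] * w for _ in range(h)]

    def flood(r0: int, c0: int) -> None:
        queue = deque([(r0, c0)])
        seen[r0][c0] = True
        while queue:
            r, c = queue.popleft()
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < h and 0 <= nc < w and not seen[nr][nc] and bitmap[nr][nc] == 0:
                    seen[nr][nc] = True
                    queue.append((nr, nc))

    # White pixels reachable from the border are background, not loops.
    for r in range(h):
        for c in range(w):
            on_border = r in (0, h - 1) or c in (0, w - 1)
            if on_border and bitmap[r][c] == 0 and not seen[r][c]:
                flood(r, c)
    # Every remaining white region is one enclosed loop.
    loops = 0
    for r in range(h):
        for c in range(w):
            if bitmap[r][c] == 0 and not seen[r][c]:
                flood(r, c)
                loops += 1
    return loops


def loops_match(unknown: Bitmap, known: Bitmap) -> bool:
    return count_loops(unknown) == count_loops(known)


letter_o = [[0, 1, 1, 1, 0], [1, 0, 0, 0, 1], [1, 0, 0, 0, 1], [0, 1, 1, 1, 0]]
digit_8 = [[1, 1, 1], [1, 0, 1], [1, 1, 1], [1, 0, 1], [1, 1, 1]]
print(count_loops(letter_o), count_loops(digit_8))  # 1 2
```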
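Claim 8 adds a template-matching path that compares the unknown character's bitmap with stored templates of known characters and decides whether one matches. The claim does not give the match criterion. This sketch makes three assumptions:

- Bitmaps are already normalized to the template size.
- The score is the fraction of agreeing pixels.
- A match requires a hypothetical `threshold`.

```python
from typing import Dict, List, Optional

Bitmap = List[List[int]]  # 1 = black pixel, 0 = white pixel


def match_score(bitmap: Bitmap, template: Bitmap) -> float:
    """Fraction of pixel positions where the bitmap and template agree."""
    if len(bitmap) != len(template) or any(
        len(a) != len(b) for a, b in zip(bitmap, template)
    ):
        raise ValueError("bitmap and template must have the same dimensions")
    cells = [(a, b) for ra, rb in zip(bitmap, template) for a, b in zip(ra, rb)]
    return sum(a == b for a, b in cells) / len(cells)


def match_template(bitmap: Bitmap, templates: Dict[str, Bitmap],
                   threshold: float = 0.9) -> Optional[str]:
    """Best-scoring template's character if its score reaches threshold."""
    best_char, best_score = None, 0.0
    for char, template in templates.items():
        score = match_score(bitmap, template)
        if score > best_score:
            best_char, best_score = char, score
    return best_char if best_score >= threshold else None


templates = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}
noisy_i = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]
print(match_template(noisy_i, templates))                 # None (8/9 < 0.9)
print(match_template(noisy_i, templates, threshold=0.8))  # I
```

A strict threshold rejects the noisy "I" outright. That is one reason a system might fall back to feature-based comparison, as the abstract's combination of feature detection and template matching suggests.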