

A method for cutting character images from a line segment of pixel image data includes a first cutting layer step, in which non-touching and non-overlapping characters are cut from the line segment, and a second cutting layer step, in which touching characters are cut from the line segment.

Inventors: Mehrzad R. Vaezi, Christopher Allen Sherrick
Original Assignee: Canon Kabushiki Kaisha
Primary Examiner: Matthew C. Bella
Current U.S. Classification: 382/171; 382/174; 382/177; 382/178
International Classification: G06K 9/34; G06K 9/46


Citations

Cited Patent | Filing date | Issue date | Original Assignee | Title
US4933984 | Aug 25, 1989 | Jun 12, 1990 | Hitachi, Ltd. | Document analysis system
US5048107 | Jun 5, 1990 | Sep 10, 1991 | Ricoh Company, Ltd. | Table region identification method
US5065442 | May 1, 1990 | Nov 12, 1991 | Canon Kabushiki Kaisha | Character recognition apparatus determining whether read data is a character line
US5075895 | Mar 30, 1990 | Dec 24, 1991 | Ricoh Company, Ltd. | Method and apparatus for recognizing table area formed in binary image of document
US5091964 | Apr 1, 1991 | Feb 25, 1992 | Fuji Electric Co., Ltd.; Fujifacom Corporation | Apparatus for extracting a text region in a document image
US5093868 | Apr 18, 1990 | Mar 3, 1992 | Sharp Kabushiki Kaisha | Method for determining lines of character images for use in an optical reader
US5101439 | Aug 31, 1990 | Mar 31, 1992 | AT&T Bell Laboratories | Segmentation process for machine reading of handwritten information
US5101448 | Aug 21, 1989 | Mar 31, 1992 | Hitachi, Ltd. | Method and apparatus for processing a document by utilizing an image
US5129012 | Mar 21, 1990 | Jul 7, 1992 | Sony Corporation | Detecting line segments and predetermined patterns in an optically scanned document
US5253304 | Nov 27, 1991 | Oct 12, 1993 | AT&T Bell Laboratories | Method and apparatus for image segmentation
US5307422 | Jun 25, 1991 | Apr 26, 1994 | Industrial Technology Research Institute | Method and system for identifying lines of text in a document
US5313526 | Dec 30, 1991 | May 17, 1994 | Goldstar Co., Ltd. | Method for disconnecting character strings of a compressed image
US5335290 | Apr 6, 1992 | Aug 2, 1994 | Ricoh Corporation; Ricoh Company Ltd. | Segmentation of text, picture and lines of a document image
US5351314 | Oct 4, 1991 | Sep 27, 1994 | Canon Information Systems, Inc. | Method and apparatus for image enhancement using intensity dependent spread filtering
US5561720 | Oct 8, 1993 | Oct 1, 1996 | CGK Computer Gesellschaft Konstanz mbH | Method for extracting individual characters from raster images of a read-in handwritten or typed character sequence having a free pitch

Referenced by

Citing Patent | Filing date | Issue date | Original Assignee | Title
US6504955 | Aug 31, 1998 | Jan 7, 2003 | Canon Kabushiki Kaisha | Information processing apparatus, information processing method, storage medium, and printing system
US6563949 | Nov 24, 1998 | May 13, 2003 | Fujitsu Limited | Character string extraction apparatus and pattern extraction apparatus
US6711292 | Dec 30, 1998 | Mar 23, 2004 | Canon Kabushiki Kaisha | Block selection of table features
US6721452 | Sep 12, 2002 | Apr 13, 2004 | Auburn University | System and method of handwritten character recognition
US7039856 | Sep 30, 1998 | May 2, 2006 | Ricoh Co., Ltd. | Automatic document classification using text and images
US7283669 | Jan 29, 2003 | Oct 16, 2007 | Lockheed Martin Corporation | Fine segmentation refinement for an optical character recognition system
US7555711 | Jun 24, 2005 | Jun 30, 2009 | Hewlett-Packard Development Company, L.P. | Generating a text layout boundary from a text block in an electronic document
US7623716 | Sep 8, 2005 | Nov 24, 2009 | Fuji Xerox Co., Ltd. | Language translation device, image processing apparatus, image forming apparatus, language translation method and storage medium
US7681121 | Jun 8, 2005 | Mar 16, 2010 | Canon Kabushiki Kaisha | Image processing apparatus, control method therefor, and program
US8023741 | May 23, 2008 | Sep 20, 2011 | Sharp Laboratories of America, Inc. | Methods and systems for detecting numerals in a digital image
US8023770 | May 23, 2008 | Sep 20, 2011 | Sharp Laboratories of America, Inc. | Methods and systems for identifying the orientation of a digital image
US8027550 | Nov 30, 2007 | Sep 27, 2011 | Sharp Kabushiki Kaisha | Image-document retrieving apparatus, method of retrieving image document, program, and recording medium
US8036463 | Sep 13, 2007 | Oct 11, 2011 | Keyence Corporation | Character extracting apparatus, method, and program
US8059895 | Oct 1, 2008 | Nov 15, 2011 | Canon Kabushiki Kaisha | Image processing apparatus
US8200043 | May 1, 2008 | Jun 12, 2012 | Xerox Corporation | Page orientation detection based on selective character recognition
US8229248 | Aug 30, 2011 | Jul 24, 2012 | Sharp Laboratories of America, Inc. | Methods and systems for identifying the orientation of a digital image
US8238662 | Jul 17, 2007 | Aug 7, 2012 | SMART Technologies ULC | Method for manipulating regions of a digital image

Claims

1. A method for cutting character images from a line segment of pixel image data comprising the steps of:

a first cutting layer step in which non-touching and non-overlapping characters are cut from a line segment;
a recognition step of recognition-processing characters cut in said first cutting layer step;
a second cutting layer step in which touching characters are cut from the line segment, said second cutting layer step being performed only for characters not recognized in said recognition step; and
a recombining step of recombining characters not recognized in said recognition step.
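
The control flow recited in claim 1 can be illustrated with a minimal sketch. The function name `layered_cut` and the four callables it takes are hypothetical and do not come from the patent. The sketch also assumes one possible reading of the recombining step: pieces that are still unrecognized after the second cut are merged back into a single block.

```python
from typing import Callable, List, Optional, Tuple, TypeVar

T = TypeVar("T")  # any image-like object; the sketch does not inspect it


def layered_cut(
    line: T,
    first_cut: Callable[[T], List[T]],
    recognize: Callable[[T], Optional[str]],
    second_cut: Callable[[T], List[T]],
    recombine: Callable[[List[T]], T],
) -> List[Tuple[T, Optional[str]]]:
    """Hypothetical driver for the layered flow of claim 1.

    Returns (block, label) pairs; label is None for unrecognized blocks.
    """
    results: List[Tuple[T, Optional[str]]] = []
    for block in first_cut(line):          # first cutting layer
        label = recognize(block)           # recognition step
        if label is not None:
            results.append((block, label))
            continue
        # Second cutting layer: applied only to unrecognized blocks.
        pieces = second_cut(block)
        labels = [recognize(piece) for piece in pieces]
        if pieces and all(lab is not None for lab in labels):
            results.extend(zip(pieces, labels))
        else:
            # Assumed reading of the recombining step: undo the cut.
            results.append((recombine(pieces), None))
    return results
```

In a toy setting, strings can stand in for images: `str.split` plays the first layer, `list` splits a block into single characters as the second layer, and `"".join` recombines.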

2. A method according to claim 1, further comprising the step of pre-processing line segment image data so as to compress high resolution image segments into low resolution.

3. A method according to claim 2, wherein different compression ratios are utilized for vertical and horizontal directions.
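
As an illustration of claims 2 and 3, the following sketch reduces a binary image with separate vertical and horizontal ratios. The claims do not say how pixels are combined; the OR-reduction used here (a tile becomes ink if any pixel in it is ink) is an assumption, as are the names `compress`, `ry`, and `rx`.

```python
def compress(image, ry, rx):
    """Reduce a binary image (rows of 0/1) by ry vertically and rx horizontally.

    Assumption: each ry-by-rx tile maps to 1 if it contains any ink pixel.
    """
    height = len(image)
    width = len(image[0]) if image else 0
    out = []
    for y in range(0, height, ry):
        row = []
        for x in range(0, width, rx):
            has_ink = any(
                image[yy][xx]
                for yy in range(y, min(y + ry, height))
                for xx in range(x, min(x + rx, width))
            )
            row.append(1 if has_ink else 0)
        out.append(row)
    return out
```

Choosing `ry` larger than `rx` would keep more horizontal detail, which is the direction in which character boundaries are searched.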

4. A method according to claim 1, further comprising an intermediate cutting layer step in which non-touching but overlapping character data is cut from the line segment.

5. A method according to claim 4, wherein said intermediate cutting layer step includes the step of cutting non-touching but overlapping characters by outlining contours of pixel data.
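
Characters that overlap horizontally without touching cannot be separated by a straight vertical cut, because no column between them is blank. The sketch below separates them by connected-component labeling with a breadth-first flood fill. This is a stand-in technique chosen for brevity, not the contour outlining recited in claim 5, although both group the pixels that belong to one connected shape. The function names are hypothetical.

```python
from collections import deque


def connected_components(img):
    """Group ink pixels (value 1) into 8-connected components.

    Returns a list of components, each a list of (row, col) pixels,
    in the order their first pixel is met in a row-major scan.
    """
    height = len(img)
    width = len(img[0]) if img else 0
    seen = [[False] * width for _ in range(height)]
    components = []
    for y in range(height):
        for x in range(width):
            if not img[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def column_span(pixels):
    """Leftmost and rightmost column occupied by a component."""
    columns = [x for _, x in pixels]
    return min(columns), max(columns)
```

Two components whose column spans intersect are overlapping but non-touching in the sense of claim 4.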

6. A method according to claim 4, wherein said intermediate cutting layer step includes the step of recombining inadvertently cut characters.

7. A method according to claim 1, wherein said first cutting layer step includes the step of sparsely stepping across said line segment until non-blank pixel data is found.

8. A method according to claim 7, further comprising the step of densely searching forwardly and backwardly in the event that a non-blank pixel is found in said sparsely stepping step.
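
The sparse-then-dense search of claims 7 and 8 might look like the sketch below. It assumes the line segment has already been reduced to one flag per column (`col_has_ink`), and the names `find_next_character`, `start`, and `stride` are hypothetical. A character narrower than the stride can be stepped over, so the stride is assumed not to exceed the narrowest expected character width.

```python
def find_next_character(col_has_ink, start=0, stride=3):
    """Locate the next run of non-blank columns at or after `start`.

    Returns (left, right) with `right` exclusive, or None if no sampled
    column contains ink.
    """
    n = len(col_has_ink)
    x = start
    # Sparse stepping: sample every `stride`-th column until ink is found.
    while x < n and not col_has_ink[x]:
        x += stride
    if x >= n:
        return None
    # Dense backward search for the left edge of the run.
    left = x
    while left > start and col_has_ink[left - 1]:
        left -= 1
    # Dense forward search for the right edge of the run.
    right = x
    while right + 1 < n and col_has_ink[right + 1]:
        right += 1
    return left, right + 1
```

Calling the function again with `start` set to the returned right edge continues the scan along the line.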

9. A method according to claim 1, wherein said second cutting layer step includes the step of cutting touching characters by non-vertical cuts.

10. A method according to claim 9, wherein the angle of the non-vertical cuts is determined in accordance with a vertical projection profile of pixel density.

11. A method according to claim 10, further comprising the step of obtaining at least one rotated projection profile based on the vertical projection profile.

12. A method according to claim 11, further comprising the step of obtaining plural rotated projection profiles at angles corresponding to angles in the vicinity of the angle of said vertical projection profile.

13. A method according to claim 12, wherein the angle of the non-vertical cut is obtained based on the minimum value of the projection profiles obtained from said rotated projection profiling steps and said vertical projection profiling steps.

14. A method according to claim 1, wherein said second cutting layer step is selectable based on whether characteristics of the characters in said line segment are known.

15. A method according to claim 14, wherein said second cutting layer step includes the step of cutting based on spacing statistics of character data in the character set.

16. A method according to claim 15, further comprising the step of recombining pairs of undersized characters.

17. A method for cutting between touching characters in a line segment of character pixel data comprising the steps of:

obtaining a vertical projection profile of pixel data in the line segment;
a first calculating step in which at least one angle is calculated from a minimum in the vertical projection profile to an adjacent maximum in the vertical projection profile;
a second calculating step in which at least one rotated projection profile is calculated based on the angle calculated in said first calculating step; and
cutting the line segment at the angle of said rotated projection profile and at a position corresponding to a minimum in the rotated projection profile.
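
The steps of claim 17 can be sketched on a small binary image (rows of 0/1 pixels). Several details below are assumptions rather than the patent's definitions: the angle is taken as the angle from vertical of the line joining the two profile points, a rotated profile bins each pixel by the bottom-row column where a line tilted by that angle through the pixel would land, and all function names are hypothetical.

```python
import math


def vertical_profile(img):
    """Count ink pixels in each column."""
    return [sum(column) for column in zip(*img)]


def angle_from_min_to_max(profile, i_min, i_max):
    """Assumed reading: angle from vertical of the line joining
    (i_min, profile[i_min]) and (i_max, profile[i_max])."""
    return math.atan2(i_max - i_min, profile[i_max] - profile[i_min])


def rotated_profile(img, theta):
    """Ink count along lines tilted `theta` radians from vertical.

    Keys are the bottom-row column each line passes through; every bin
    touched by any pixel is present, so empty gaps show up as 0.
    """
    height = len(img)
    slope = math.tan(theta)
    counts = {}
    for y, row in enumerate(img):
        shift = (height - 1 - y) * slope
        for x, pixel in enumerate(row):
            bin_index = math.floor(x - shift + 0.5)
            counts[bin_index] = counts.get(bin_index, 0) + (1 if pixel else 0)
    return dict(sorted(counts.items()))


def cut_position(img, theta):
    """Bottom-row position of the first minimum in the rotated profile."""
    profile = rotated_profile(img, theta)
    return min(profile, key=profile.get)
```

With `theta` equal to zero the rotated profile reduces to the vertical projection profile, so the same routine covers both cases.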

18. A method according to claim 17, wherein said first calculating step includes the step of comparing the vertical projection profile to plural thresholds to determine the location of the minimum and the adjacent maximum.

19. A method according to claim 18, further comprising the step of incrementing said thresholds in the event that a minimum and an adjacent maximum are not located.

20. A method according to claim 18, wherein a minimum is identified as a point on the vertical projection profile below a first threshold that is surrounded on at least one side by a maximum that is above a second threshold.
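
One way to read claims 18 to 20 is sketched below. A candidate minimum is a profile value below a first threshold (`low`); its adjacent maximum is found by climbing the profile on one side until it stops rising, and it qualifies if it exceeds a second threshold (`high`). If nothing qualifies, both thresholds are raised and the scan repeats. The climbing rule, the increment size, the round limit, and all names are assumptions made for illustration.

```python
def adjacent_peak(profile, i, direction):
    """Climb from index i in `direction` (+1 or -1) while values do not fall."""
    j = i
    while (0 <= j + direction < len(profile)
           and profile[j + direction] >= profile[j]):
        j += direction
    return j


def find_min_and_adjacent_max(profile, low, high, step=1, max_rounds=10):
    """Return (min_index, max_index), or None if nothing is located."""
    for _ in range(max_rounds):
        for i, value in enumerate(profile):
            if value >= low:
                continue  # not below the first threshold
            for direction in (1, -1):
                j = adjacent_peak(profile, i, direction)
                if j != i and profile[j] > high:
                    return i, j
        # Nothing located: increment both thresholds and try again.
        low += step
        high += step
    return None
```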

21. A method according to claim 17, further comprising the step of calculating plural rotated projection profiles, each of the plural rotated projection profiles being calculated at angles in the vicinity of a rotated projection profile calculated in said second calculating step.

22. A method according to claim 21, further comprising the step of identifying an overall minimum in the plural rotated projection profiles and the vertical projection profile, wherein said cutting step cuts at an angle corresponding to the angle from which the minimum was obtained and at a position corresponding to the position of the minimum.
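
The search of claims 21 and 22 can be sketched as a scan over a handful of angles near a starting angle, plus the vertical direction, keeping the smallest profile value seen. The angular spread, the number of steps, the binning used for rotated profiles, and the names `rotated_profile` and `best_cut` are all assumptions. On a coarse pixel grid several neighbouring angles can yield the same minimum, and this sketch keeps the first one found.

```python
import math


def rotated_profile(img, theta):
    """Ink count along lines tilted `theta` radians from vertical,
    keyed by the bottom-row column each line passes through."""
    height = len(img)
    slope = math.tan(theta)
    counts = {}
    for y, row in enumerate(img):
        shift = (height - 1 - y) * slope
        for x, pixel in enumerate(row):
            bin_index = math.floor(x - shift + 0.5)
            counts[bin_index] = counts.get(bin_index, 0) + (1 if pixel else 0)
    return dict(sorted(counts.items()))


def best_cut(img, theta, spread=math.radians(10), steps=2):
    """Return (count, angle, position) of the overall minimum.

    Candidates: the vertical profile (angle 0) and 2 * steps + 1 rotated
    profiles spaced evenly within `spread` on either side of `theta`.
    """
    angles = [0.0] + [theta + k * spread / steps
                      for k in range(-steps, steps + 1)]
    best = None
    for angle in angles:
        for position, count in rotated_profile(img, angle).items():
            if best is None or count < best[0]:
                best = (count, angle, position)
    return best
```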

23. An apparatus for cutting character images from a line segment of pixel image data comprising:

first cutting means for cutting a first layer of non-touching and non-overlapping characters from a line segment;
recognition means for recognition-processing characters cut by said first cutting means;
second cutting means for cutting a second layer of touching characters from the line segment, said second cutting means utilized only for characters not recognized by said recognition means; and
recombining means for recombining characters not recognized by said recognition means.

24. An apparatus according to claim 23, further comprising pre-processing means for pre-processing line segment image data so as to compress high resolution image segments into low resolution.

25. An apparatus according to claim 24, wherein different compression ratios are utilized for vertical and horizontal directions.

26. An apparatus according to claim 24, further comprising third cutting means for cutting an intermediate layer of non-touching but overlapping character data from the line segment.

27. An apparatus according to claim 26, wherein said third cutting means includes outlining means for outlining contours of pixel data of the non-touching but overlapping characters.

28. An apparatus according to claim 26, wherein said third cutting means includes first recombining means for recombining inadvertently cut characters.

29. An apparatus according to claim 23, wherein said first cutting means includes sparsely stepping means for sparsely stepping across said line segment until non-blank pixel data is found.

30. An apparatus according to claim 29, further comprising searching means for densely searching forwardly and backwardly in the event that a non-blank pixel is found by said sparsely stepping means.

31. An apparatus according to claim 25, wherein said second cutting means cuts touching characters by non-vertical cuts.

32. An apparatus according to claim 31, wherein the angle of the non-vertical cuts is determined in accordance with a vertical projection profile of pixel density.

33. An apparatus according to claim 32, further comprising obtaining means for obtaining at least one rotated projection profile based on the vertical projection profile.

34. An apparatus according to claim 33, further comprising second obtaining means for obtaining plural rotated projection profiles at angles corresponding to angles in the vicinity of the angle of said vertical projection profile.

35. An apparatus according to claim 34, wherein the angle of the non-vertical cut is obtained based on the minimum value of the projection profiles obtained from said rotated projection profiles and said vertical projection profiles.

36. An apparatus according to claim 23, wherein said second cutting means is selectable based on whether characteristics of the characters in said line segment are known.

37. An apparatus according to claim 36, wherein said second cutting means makes cuts based on known statistics of character data in the character set.

38. An apparatus according to claim 37, further comprising third recombining means for recombining pairs of undersized characters.

39. An apparatus for cutting between touching characters in a line segment of character pixel data comprising:

obtaining means for obtaining a vertical projection profile of pixel data in the line segment;
first calculating means for calculating at least one angle from a minimum in the vertical projection profile to an adjacent maximum in the vertical projection profile;
second calculating means for calculating at least one rotated projection profile based on the angle calculated by said first calculating means; and
cutting means for cutting the line segment at the angle of said rotated projection profile and at a position corresponding to a minimum in the rotated projection profile.

40. An apparatus according to claim 39, wherein said first calculating means includes comparing means for comparing the vertical projection profile to plural thresholds to determine the location of the minimum and the adjacent maximum.

41. An apparatus according to claim 40, further comprising incrementing means for incrementing said thresholds in the event that a minimum and an adjacent maximum are not located.

42. An apparatus according to claim 41, wherein a minimum is identified as a point on the vertical projection profile below a first threshold that is surrounded on at least one side by a maximum that is above a second threshold.

43. An apparatus according to claim 39, further comprising third calculating means for calculating plural rotated projection profiles, each of the plural rotated projection profiles being calculated at angles in the vicinity of a rotated projection profile calculated by said second calculating means.

44. An apparatus according to claim 43, further comprising identifying means for identifying an overall minimum in the plural rotated projection profiles and the vertical projection profile, wherein said cutting means makes a cut at an angle corresponding to the angle from which the minimum was obtained, at a position corresponding to the position of the minimum.

45. A computer-readable memory medium storing computer-executable process steps to cut character images from a line segment of pixel image data, the steps comprising:

a first cutting layer step to cut non-touching and non-overlapping characters from a line segment;
a recognition step to recognize characters cut in said first cutting layer step;
a second cutting layer step to cut touching characters from the line segment, said second cutting layer step being performed only for characters not recognized in said recognition step; and
a recombining step to recombine characters not recognized in said recognition step.

46. A computer-readable memory medium storing computer-executable process steps according to claim 45, further comprising a step to pre-process line segment image data so as to compress high resolution image segments into low resolution.

47. A computer-readable memory medium storing computer-executable process steps according to claim 46, wherein different compression ratios are utilized for vertical and horizontal directions.

48. A computer-readable memory medium storing computer-executable process steps according to claim 45, further comprising an intermediate cutting layer step to cut non-touching but overlapping character data from the line segment.

49. A computer-readable memory medium storing computer-executable process steps according to claim 48, wherein said intermediate cutting layer step includes a step to cut non-touching but overlapping characters by outlining contours of pixel data.

50. A computer-readable memory medium storing computer-executable process steps according to claim 48, wherein said intermediate cutting layer step includes a step to recombine inadvertently cut characters.

51. A computer-readable memory medium storing computer-executable process steps according to claim 45, wherein said first cutting layer step includes a step to sparsely step across said line segment until non-blank pixel data is found.

52. A computer-readable memory medium storing computer-executable process steps according to claim 51, further comprising a step to densely search forwardly and backwardly in the event that a non-blank pixel is found in said sparsely stepping step.

53. A computer-readable memory medium storing computer-executable process steps according to claim 45, wherein said second cutting layer step includes a step to cut touching characters by non-vertical cuts.

54. A computer-readable memory medium storing computer-executable process steps according to claim 53, wherein the angle of the non-vertical cuts is determined in accordance with a vertical projection profile of pixel density.

55. A computer-readable memory medium storing computer-executable process steps according to claim 54, further comprising a step to obtain at least one rotated projection profile based on the vertical projection profile.

56. A computer-readable memory medium storing computer-executable process steps according to claim 55, further comprising a step to obtain plural rotated projection profiles at angles corresponding to angles in the vicinity of the angle of said vertical projection profile.

57. A computer-readable memory medium storing computer-executable process steps according to claim 56, wherein the angle of the non-vertical cut is obtained based on the minimum value of the projection profiles obtained from said rotated projection profiling steps and said vertical projection profiling steps.

58. A computer-readable memory medium storing computer-executable process steps according to claim 45, wherein said second cutting layer step is selectable based on whether characteristics of the characters in said line segment are known.

59. A computer-readable memory medium storing computer-executable process steps according to claim 58, wherein said second cutting layer step includes a step to cut based on spacing statistics of character data in the character set.

60. A computer-readable memory medium storing computer-executable process steps according to claim 59, further comprising a step to recombine pairs of undersized characters.

61. A computer-readable memory medium storing computer-executable process steps to cut between touching characters in a line segment of character pixel data, the steps comprising:

an obtaining step to obtain a vertical projection profile of pixel data in the line segment;
a first calculating step to calculate at least one angle from a minimum in the vertical projection profile to an adjacent maximum in the vertical projection profile;
a second calculating step to calculate at least one rotated projection profile based on the angle calculated in said first calculating step; and
a cutting step to cut the line segment at the angle of said rotated projection profile and at a position corresponding to a minimum in the rotated projection profile.

62. A computer-readable memory medium storing computer-executable process steps according to claim 61, wherein said first calculating step includes a step to compare the vertical projection profile to plural thresholds to determine the location of the minimum and the adjacent maximum.

63. A computer-readable memory medium storing computer-executable process steps according to claim 62, further comprising a step to increment said thresholds in the event that a minimum and an adjacent maximum are not located.

64. A computer-readable memory medium storing computer-executable process steps according to claim 62, wherein a minimum is identified as a point on the vertical projection profile below a first threshold that is surrounded on at least one side by a maximum that is above a second threshold.

65. A computer-readable memory medium storing computer-executable process steps according to claim 61, further comprising a step to calculate plural rotated projection profiles, each of the plural rotated projection profiles being calculated at angles in the vicinity of a rotated projection profile calculated in said second calculating step.

66. A computer-readable memory medium storing computer-executable process steps according to claim 65, further comprising a step to identify an overall minimum in the plural rotated projection profiles and the vertical projection profile, wherein, in said cutting step, the line segment is cut at an angle corresponding to the angle from which the minimum was obtained and at a position corresponding to the position of the minimum.