WO2007103984A2 - Multiple image input for optical character recognition processing systems and methods - Google Patents

Publication number
WO2007103984A2
Authority
WO
WIPO (PCT)
Prior art keywords
output file
ocr
binarization
character recognition
optical character
Prior art date
Application number
PCT/US2007/063508
Other languages
French (fr)
Other versions
WO2007103984A3 (en
Inventor
Donald B. Curtis
Shawn Reid
Original Assignee
The Generations Network, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by The Generations Network, Inc.
Publication of WO2007103984A2
Publication of WO2007103984A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/254 Fusion techniques of classification results, e.g. of results related to same input data

Definitions

  • Metrics are maintained on the intermediate and final results. Metrics may include, for example, the number of characters recognized in an image, the number of dictionary words recognized, the number of unknown words, the degree of agreement among different output files, which binarization and OCR process produced the output file, the historical accuracy of the particular binarization or binarization/OCR combination, per-character and per-image confidence ratings, and the like.
  • The OCR output files are then passed to a voting process.
  • The voting process selects a particular set of characters from any one or more available output files to be the final output.
  • The image or information associated with the selected output file is thereafter stored for future use.
  • Fig. 2 illustrates an exemplary OCR production process 200 according to embodiments of the invention.
  • The process may be implemented in the system 100 of Fig. 1A or other appropriate system.
  • The process 200 is merely exemplary of a number of possible processes, which may include more, fewer, or different steps than those illustrated and described herein.
  • The steps illustrated and described herein may be traversed in a different order than that shown here.
  • The process 200 begins at block 202, at which point an image is received for processing.
  • The image may be physical or digital, color or black-and-white, etc.
  • The image may be bitonal, although the advantages of the present invention are particularly evident with respect to grayscale images.
  • Physical images are scanned or otherwise processed to produce electronic images.
  • Electronic images are thereafter passed to at least two binarizations 206, 208. In some cases, the electronic images are processed through additional binarizations 210.
  • Acceptable binarizations include clustering, global-thresholding, and adaptive thresholding. The binarizations produce bitonal images.
  • Bitonal images produced by the binarizations are thereafter processed through at least one OCR process 212. In some examples, the bitonal images are processed through additional OCR processes 214. The OCR processes produce output files.
  • The output files are analyzed, and metrics are collected related to them. Metrics may include any of a number of quality measures, including number of recognized characters, number of recognized words, ratio of recognized words to unrecognized words, and the like.
  • A voting process selects a set of characters for the final output file from among the output files. The results are thereafter stored and made available at block 220.
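
The metrics and voting steps listed above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Metrics` fields are a subset of the metrics named in the text, while the scoring weights, the exact-match notion of agreement, and the names `compute_metrics`, `score`, and `select_final` are hypothetical and do not come from the application. The sketch also simplifies the voting process by selecting one whole output file, whereas the text allows characters to be drawn from more than one output file.

```python
from dataclasses import dataclass
from typing import Sequence, Set


@dataclass
class Metrics:
    characters: int        # non-whitespace characters recognized
    dictionary_words: int  # words found in the dictionary
    unknown_words: int     # words not found in the dictionary
    agreement: float       # fraction of other outputs with identical text


def compute_metrics(text: str, others: Sequence[str], dictionary: Set[str]) -> Metrics:
    # Normalize words by trimming common punctuation and lowercasing.
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    words = [w for w in words if w]
    known = sum(1 for w in words if w in dictionary)
    # Assumption: "agreement" means an exact match with another output file.
    agreement = sum(1 for o in others if o == text) / len(others) if others else 0.0
    return Metrics(
        characters=sum(1 for c in text if not c.isspace()),
        dictionary_words=known,
        unknown_words=len(words) - known,
        agreement=agreement,
    )


def score(m: Metrics) -> float:
    # Hypothetical weighting: dictionary-word ratio plus half the agreement.
    total = m.dictionary_words + m.unknown_words
    ratio = m.dictionary_words / total if total else 0.0
    return ratio + 0.5 * m.agreement


def select_final(outputs: Sequence[str], dictionary: Set[str]) -> str:
    # Score every output against the others and keep the highest-scoring one.
    # Raises ValueError if outputs is empty.
    def score_at(i: int) -> float:
        others = [t for j, t in enumerate(outputs) if j != i]
        return score(compute_metrics(outputs[i], others, dictionary))

    return outputs[max(range(len(outputs)), key=score_at)]


dictionary = {"the", "quick", "brown", "fox"}
outputs = ["tbe qu1ck brown fox", "the quick brown fox", "the quick brown fox"]
final = select_final(outputs, dictionary)
```

In this toy run the two matching outputs outscore the garbled one both on dictionary-word ratio and on agreement, so one of them is selected. A production voting process would likely also weigh confidence levels and historical accuracy, which are omitted here.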

Abstract

A method of processing an image includes receiving a digital version of the image, processing the digital version of the image through at least two binarization processes to thereby create a first binarization and a second binarization, and processing the first binarization through a first optical character recognition process to thereby create a first OCR output file. Processing the first binarization through a first optical character recognition process includes compiling first metrics associated with the first OCR output file. The method also includes processing the second binarization through the first optical character recognition process to thereby create a second OCR output file. Processing the second binarization through the first optical character recognition process includes compiling second metrics associated with the second OCR output file. The method also includes using the metrics, at least in part, to select a final OCR output file from among the OCR output files.

Description

MULTIPLE IMAGE INPUT FOR OPTICAL CHARACTER RECOGNITION PROCESSING SYSTEMS AND METHODS
[0001] Embodiments of the present invention relate generally to image processing. More specifically, embodiments of the present invention relate to systems and methods for performing Optical Character Recognition on source images.
CROSS-REFERENCES TO RELATED APPLICATIONS
[0002] This application is a non-provisional of U.S. Patent Application No. 60/780,484, filed on March 7, 2006, and incorporates by reference U.S. Patent Application No. 11/188,137, entitled "ADAPTIVE CONTRAST CONTROL SYSTEMS AND METHODS," filed on July 21, 2005, by Curtis.
BACKGROUND OF THE INVENTION
[0003] Optical Character Recognition (OCR) engines are widely available. OCR engines differ in their approach to the problem of recognizing characters. Some entities that process documents using OCR have taken the approach of running multiple OCR engines on a single digital image and then using a technique such as voting to determine which text to actually output from the various engines. The idea behind this approach is to use the best of each OCR engine to obtain the overall highest-quality text output. Nevertheless, this approach is not optimal and improvements are desired.
BRIEF SUMMARY OF THE INVENTION
[0004] Embodiments of the invention provide a method of processing an image. The method includes receiving a digital version of the image, processing the digital version of the image through at least two binarization processes to thereby create a first binarization and a second binarization, and processing the first binarization through a first optical character recognition process to thereby create a first OCR output file. Processing the first binarization through a first optical character recognition process includes compiling first metrics associated with the first OCR output file. The method also includes processing the second binarization through the first optical character recognition process to thereby create a second OCR output file. Processing the second binarization through the first optical character recognition process includes compiling second metrics associated with the second OCR output file. The method also includes using the metrics, at least in part, to select a final OCR output file from among the OCR output files.
[0005] In some embodiments, the method includes processing the first binarization through a second optical character recognition process to thereby create a third OCR output file. Processing the first binarization through a second optical character recognition process may include compiling third metrics associated with the third OCR output file. The method also may include processing the second binarization through the second optical character recognition process to thereby create a fourth OCR output file. Processing the second binarization through the second optical character recognition process may include compiling fourth metrics associated with the fourth OCR output file. The binarization processes may include clustering, global-thresholding, adaptive thresholding, and/or the like. The first and second optical character recognition processes may be the same optical character recognition process. The first and second optical character recognition processes may be different optical character recognition processes. The metrics associated with a particular output file may include a number of characters recognized in the particular output file; a number of dictionary words in the particular output file; a number of unknown words in the particular output file; a per-character confidence level in the particular output file; a per-word confidence level in the particular output file; a per-image confidence level in the particular output file; a degree of agreement between the particular output file and other output files; which binarization and OCR process produced the particular output file; a measure of historical accuracy associated with the particular binarization/OCR combination that produced the particular output file. The method also may include creating the digital version of the image from a physical version of the image.
[0006] Other embodiments provide a method of optically recognizing characters in an image. The method includes creating multiple binarizations of the image using different binarization techniques, presenting each binarization to an optical character recognition (OCR) engine to produce an OCR output file for each binarization, developing metrics relating to each OCR output file, and using the metrics, at least in part, to select a final OCR output file from among the OCR output files. The different binarization techniques may include clustering, global-thresholding, adaptive thresholding, and/or the like. Presenting each binarization to an OCR engine may include presenting each binarization to a different OCR engine. The metrics may include a number of characters recognized in the particular OCR output file; a number of dictionary words in the particular OCR output file; a number of unknown words in the particular OCR output file; a per-character confidence level in the particular output file; a per-word confidence level in the particular output file; a per-image confidence level in the particular output file; a degree of agreement between the particular OCR output file and other OCR output files; which binarization and OCR process produced the particular OCR output file; a measure of historical accuracy associated with the particular binarization/OCR combination that produced the particular OCR output file; and/or the like.
[0007] Still other embodiments provide an optical character recognition system. The system includes at least two binarization processes configured to convert grayscale images to bitonal images, at least one optical character recognition process configured to process bitonal images into final output files having characters therein, a metrics generation process configured to analyze output files and produce metrics associated therewith, a voting process configured to select a final output file from among the output files based on the metrics, a storage arrangement configured to store final output files and serve the information therein to users, and at least one processor programmed to execute the at least two binarization processes, the at least one optical character recognition process, the metrics generation process, and the voting process. The at least two binarization processes may include clustering, global-thresholding, adaptive thresholding, and/or the like. The metrics associated with a particular output file may include a number of characters recognized in the particular output file; a number of dictionary words in the particular output file; a number of unknown words in the particular output file; a per-character confidence level in the particular output file; a per-word confidence level in the particular output file; a per-image confidence level in the particular output file; a degree of agreement between the particular output file and other output files; which binarization and OCR process produced the particular output file; a measure of historical accuracy associated with the particular binarization/OCR combination that produced the particular output file; and/or the like.
BRIEF DESCRIPTION OF THE DRAWINGS
[0008] A further understanding of the nature and advantages of the present invention may be realized by reference to the remaining portions of the specification and the drawings wherein like reference numerals are used throughout the several drawings to refer to similar components. Further, various components of the same type may be distinguished by following the reference label by a dash and a second label that distinguishes among the similar components. If only the first reference label is used in the specification, the description is applicable to any one of the similar components having the same first reference label irrespective of the second reference label.
[0009] Figure 1A depicts an exemplary Optical Character Recognition (OCR) system according to embodiments of the invention.
[0010] Figure 1B depicts a block diagram of an exemplary Binarization/OCR process according to embodiments of the invention, which process may be implemented in the system of Figure 1A.
[0011] Figure 2 depicts an exemplary OCR process according to embodiments of the invention, which process may be implemented in the system of Figure 1A.
DETAILED DESCRIPTION OF THE INVENTION
[0012] The present invention relates to systems and methods for improving the quality of document processing using Optical Character Recognition (OCR). The ensuing description provides preferred exemplary embodiment(s) only, and is not intended to limit the scope, applicability or configuration of the invention. Rather, the ensuing description of the preferred exemplary embodiment(s) will provide those skilled in the art with an enabling description for implementing a preferred exemplary embodiment of the invention. It is understood that various changes may be made in the function and arrangement of elements without departing from the spirit and scope of the invention as set forth in the appended claims.
[0013] Specific details are given in the following description to provide a thorough understanding of the embodiments. However, it will be understood by one of ordinary skill in the art that the embodiments may be practiced without these specific details. For example, circuits may be shown in block diagrams in order not to obscure the embodiments in unnecessary detail. In other instances, well-known circuits, structures and techniques may be shown without unnecessary detail in order to avoid obscuring the embodiments.
[0014] Also, it is noted that the embodiments may be described as a process which is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process is terminated when its operations are completed, but could have additional steps not included in the figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination corresponds to a return of the function to the calling function or the main function.
[0015] Moreover, as disclosed herein, the term "storage medium" may represent one or more devices for storing data, including read only memory (ROM), random access memory (RAM), magnetic RAM, core memory, magnetic disk storage mediums, optical storage mediums, flash memory devices and/or other machine readable mediums for storing information. The term "computer-readable medium" includes, but is not limited to portable or fixed storage devices, optical storage devices, wireless channels and various other mediums capable of storing, containing or carrying instruction(s) and/or data.
[0016] Furthermore, embodiments may be implemented by hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the necessary tasks may be stored in a machine readable medium such as storage medium. A processor(s) may perform the necessary tasks. A code segment may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment may be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, etc. may be passed, forwarded, or transmitted via any suitable means including memory sharing, message passing, token passing, network transmission, etc.
[0017] Most OCR technologies today operate on bitonal (black-and-white) digital images. Source images, however, typically begin as grayscale or color images. A process called binarization converts a grayscale or color image to a bitonal one. Many techniques have been developed for binarizing images, including global-thresholding, adaptive thresholding, clustering, and so on. In the area of thresholding, many techniques have been developed for choosing an appropriate threshold, at least one of which is described more fully in previously-incorporated U.S. Patent Application No. 11/188,137. The binarization process is not an exact science and different techniques yield different results for different types of images.
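
As a minimal sketch of how two of the named binarization techniques can differ, the following Python code applies a global threshold and a simple local-mean adaptive threshold to a grayscale image stored as nested lists. The function names, the default threshold of 128, the window radius, and the local-mean rule are illustrative assumptions; they are not taken from this application or from the incorporated Application No. 11/188,137.

```python
from typing import List

Gray = List[List[int]]     # pixel values from 0 (black) to 255 (white)
Bitonal = List[List[int]]  # 0 = black, 1 = white


def global_threshold(img: Gray, t: int = 128) -> Bitonal:
    # One threshold for the whole image.
    return [[1 if p >= t else 0 for p in row] for row in img]


def adaptive_threshold(img: Gray, radius: int = 1, offset: int = 0) -> Bitonal:
    # Each pixel is compared with the mean of its local window, so the
    # effective threshold follows local brightness.
    h, w = len(img), len(img[0])
    out: Bitonal = []
    for y in range(h):
        row = []
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            window = [img[j][i] for j in ys for i in xs]
            local_mean = sum(window) / len(window)
            row.append(1 if img[y][x] >= local_mean - offset else 0)
        out.append(row)
    return out


# A dark patch with one slightly lighter pixel in the middle.
patch = [[10, 10, 10],
         [10, 100, 10],
         [10, 10, 10]]
by_global = global_threshold(patch)
by_adaptive = adaptive_threshold(patch)
```

On this patch the global threshold turns every pixel black, while the adaptive threshold keeps the center pixel white because it is brighter than its neighborhood. This is the kind of disagreement between binarizers that the paragraph above describes.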
[0018] Using OCR errors as a quality measure, differences in OCR errors from different binarizations of the same image can be significant. Moreover, different binarizers yield the best results for different types of documents. Hence, rather than provide a single image to the set of OCR engines (whose output will then be voted on), the quality of OCR output is maximized by presenting several incarnations of a single image to a set of one or more OCR engines. Each incarnation may be the result of different scanning techniques (e.g. scanning with different light settings, with different resolution settings, etc.), different image processing techniques (e.g. brightening, contrast adjusting, sharpening, deskewing, resampling, etc.) or other image-modification processes, and/or different binarization algorithms. The OCR engines then process each incarnation (each input image), annotating each output with its metrics (e.g. confidence metrics). The outputs are voted on, using the metrics collected about the images, the OCR engines, and the confidence levels, to determine which outputs to actually send as the final result. New voting algorithms are not required, although data about each input image, the processes applied to it and their associated confidence levels could become integrated into the metrics that are used in the voting process.
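
The idea of producing several incarnations of one image can be sketched as follows. This is an assumption-laden illustration only: the brightening amount, the contrast factor, the fixed threshold of 128, and the names `brighten`, `adjust_contrast`, `binarize`, and `make_incarnations` are hypothetical and are not specified in the application.

```python
from typing import Callable, Dict, List

Gray = List[List[int]]     # pixel values from 0 (black) to 255 (white)
Bitonal = List[List[int]]  # 0 = black, 1 = white


def _clamp(value: float) -> int:
    # Keep adjusted pixel values inside the 0..255 range.
    return max(0, min(255, int(round(value))))


def brighten(img: Gray, delta: int = 40) -> Gray:
    return [[_clamp(p + delta) for p in row] for row in img]


def adjust_contrast(img: Gray, factor: float = 1.5) -> Gray:
    # Stretch values away from mid-gray (128).
    return [[_clamp(128 + factor * (p - 128)) for p in row] for row in img]


def binarize(img: Gray, threshold: int = 128) -> Bitonal:
    return [[1 if p >= threshold else 0 for p in row] for row in img]


# Hypothetical set of image-modification steps; each yields one incarnation.
PREPROCESSORS: Dict[str, Callable[[Gray], Gray]] = {
    "original": lambda img: img,
    "brightened": brighten,
    "contrast": adjust_contrast,
}


def make_incarnations(img: Gray) -> Dict[str, Bitonal]:
    # Label each incarnation so later metrics can record how it was produced.
    return {name: binarize(step(img)) for name, step in PREPROCESSORS.items()}


incarnations = make_incarnations([[100, 150, 120]])
```

Keeping the label of the step that produced each incarnation is what would allow data about the input image and the processes applied to it to be folded into the voting metrics.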
[0019] Having described embodiments of the present invention generally, attention is directed to Fig. IA, which illustrates an exemplary OCR production system 100 according to embodiments of the invention. Those skilled in the art will appreciate that the system 100 is merely exemplary of a number of possible embodiments. The system 100 operates on source images 102, which may be color or grayscale. Source images also may be physical 102-1 or digital 102-2. Physical images 102-1 are processed through a hardware scanner 104, or other appropriate process, to thereby produce a digital image 102-3 for further processing. Further processing typically takes place digitally.
[0020] The electronic images 102-2, 102-3 are then passed to a computing device 106, which may be a mainframe or other appropriate computing device, having a storage system 108 associated therewith. The images are then processed through a binarization and OCR process as will be described more fully with reference to Fig. IB.
[0021] The final image or information thereafter may be made available via a network 110, such as the Internet. The images or information may be, for example, hosted by a web server 112 and made available to subscribers who access the images or information via subscriber computers 114.
[0022] Fig. 1B depicts a block diagram of an exemplary binarization/OCR process. Source electronic images 102-3 are first subjected to at least two binarizations 130. Any suitable binarization method may be used. In this example, clustering 130-1, global-thresholding 130-2, and adaptive thresholding 130-3 are used. Additionally, a single binarization method (such as global thresholding) may provide more than one binarization by using more than one value for an input parameter (e.g. the global threshold value).
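The last point, that one method can supply several binarizations by varying an input parameter, might be sketched as follows. The helper name `threshold_sweep`, the labels it generates, and the image representation are hypothetical and serve only to illustrate the idea.

```python
from typing import Dict, List

Image = List[List[int]]  # grayscale rows, pixel values 0..255 (assumed layout)


def global_threshold(img: Image, t: int) -> Image:
    """Return a bitonal image: 1 (ink) where a pixel is darker than t, else 0."""
    return [[1 if p < t else 0 for p in row] for row in img]


def threshold_sweep(img: Image, thresholds: List[int]) -> Dict[str, Image]:
    """Produce one labeled binarization per global threshold value."""
    return {f"global-{t}": global_threshold(img, t) for t in thresholds}


# One row with a dark, a mid-gray, and a light pixel.
strip = [[30, 120, 220]]
binarizations = threshold_sweep(strip, [100, 150])
```

Here the mid-gray pixel is treated as background at threshold 100 and as ink at threshold 150, so the two binarizations differ even though the method is the same.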
[0023] Each binarization produces a bitonal image that is then passed to one or more OCR processes 132. Any suitable OCR process may be used. In this example, each of the three bitonal images is subjected to three different OCR processes, thereby producing nine OCR output files.
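The fan-out from three bitonal images and three OCR processes to nine output files can be pictured as a Cartesian product. In the sketch below the OCR engines are stand-in callables that return fixed strings; the names `run_all`, `engine_a`, and so on are assumptions for illustration and do not correspond to any real OCR product.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]          # bitonal rows of 0/1 (assumed layout)
OcrEngine = Callable[[Image], str]


def run_all(bitonal_images: Dict[str, Image],
            ocr_engines: Dict[str, OcrEngine]) -> Dict[Tuple[str, str], str]:
    """Run every OCR engine on every bitonal image.

    The result is keyed by (binarization name, engine name) so that later
    stages know which combination produced each output.
    """
    outputs: Dict[Tuple[str, str], str] = {}
    for (b_name, image), (e_name, engine) in product(bitonal_images.items(),
                                                     ocr_engines.items()):
        outputs[(b_name, e_name)] = engine(image)
    return outputs


# Stand-in inputs: three binarizations and three placeholder engines.
images = {"clustering": [[1]], "global": [[0]], "adaptive": [[1]]}
engines = {
    "engine_a": lambda img: "text from A",
    "engine_b": lambda img: "text from B",
    "engine_c": lambda img: "text from C",
}
output_files = run_all(images, engines)
```

Keying each output by its binarization and engine keeps the provenance that the metrics in the following paragraph rely on.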
[0024] During the binarization/OCR process, metrics are maintained on the intermediate and final results. Metrics may include, for example, the number of characters recognized in an image, the number of dictionary words recognized, the number of unknown words, degree of agreement among different output files, which binarization and OCR process produced the output file, historical accuracy of the particular binarization or binarization/OCR combination, per-character and per-image confidence ratings, and the like.
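A few of these metrics can be computed directly from the recognized text. The following sketch is one possible record layout, assuming a plain word set as the dictionary; the class name `OutputMetrics`, its fields, and the punctuation handling are illustrative assumptions rather than details of the described system.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class OutputMetrics:
    binarizer: str          # which binarization produced the bitonal image
    engine: str             # which OCR process produced the text
    char_count: int         # non-whitespace characters recognized
    dictionary_words: int   # words found in the dictionary
    unknown_words: int      # words not found in the dictionary


def compute_metrics(text: str, binarizer: str, engine: str,
                    dictionary: Set[str]) -> OutputMetrics:
    """Derive simple text-based metrics for one OCR output file."""
    # Normalize words: drop surrounding punctuation and ignore case.
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    words = [w for w in words if w]
    known = sum(1 for w in words if w in dictionary)
    chars = sum(1 for c in text if not c.isspace())
    return OutputMetrics(binarizer, engine, chars, known, len(words) - known)


sample = compute_metrics("The qu1ck brown fox.", "adaptive", "engine_a",
                         {"the", "quick", "brown", "fox"})
```

For the sample text, "qu1ck" is the only word missing from the dictionary, so the record reports three dictionary words and one unknown word.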
[0025] The OCR output files are then passed to a voting process. The voting process selects a particular set of characters from any one or more available output files to be the final output. The image or information associated with the selected output file is thereafter stored for future use.
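One way to picture a voting process that draws characters from more than one output file is a weighted vote per token position. The sketch below assumes the outputs have already been aligned into token lists of equal length, which is a strong simplification, and the weights stand in for whatever confidence or historical-accuracy metric is available. It is not presented as the voting algorithm used by the described system.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def vote(outputs: List[Tuple[List[str], float]]) -> List[str]:
    """Pick, for each token position, the candidate with the highest total weight.

    outputs: (tokens, weight) pairs, one per OCR output file. All token lists
    must have the same length (alignment is assumed to be done elsewhere).
    Ties go to the candidate seen first.
    """
    if not outputs:
        return []
    length = len(outputs[0][0])
    if any(len(tokens) != length for tokens, _ in outputs):
        raise ValueError("all outputs must be aligned to the same length")
    result: List[str] = []
    for i in range(length):
        scores: Dict[str, float] = defaultdict(float)
        for tokens, weight in outputs:
            scores[tokens[i]] += weight
        result.append(max(scores.items(), key=lambda kv: kv[1])[0])
    return result


final_tokens = vote([
    (["the", "qu1ck", "fox"], 0.6),
    (["the", "quick", "fox"], 0.5),
    (["tbe", "quick", "fox"], 0.4),
])
```

In this example no single output file is entirely correct, yet the combined result takes "the" from the first two files and "quick" from the last two.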
[0026] Attention is now directed to Fig. 2, which illustrates an exemplary OCR production process 200 according to embodiments of the invention. The process may be implemented in the system 100 of Fig. 1A or other appropriate system. Those skilled in the art will appreciate that the process 200 is merely exemplary of a number of possible processes, which may include more, fewer, or different steps than those illustrated and described herein. Moreover, the steps illustrated and described herein may be traversed in a different order than that shown here.
[0027] The process 200 begins at block 202, at which point an image is received for processing. The image may be physical or digital, color or black-and-white, etc. The image may be bitonal, although the advantages of the present invention are particularly evident with respect to grayscale images.
[0028] At block 204, physical images are scanned or otherwise processed to produce electronic images. Electronic images are thereafter passed to at least two binarizations 206, 208. In some cases, the electronic images are processed through additional binarizations 210. Acceptable binarizations include clustering, global-thresholding, and adaptive thresholding. The binarizations produce bitonal images.
[0029] Bitonal images produced by the binarizations are thereafter processed through at least one OCR process 212. In some examples, the bitonal images are processed through additional OCR processes 214. The OCR processes produce output files.
[0030] At block 216, the output files are analyzed, and metrics are collected related to them. Metrics may include any of a number of quality measures, including number of recognized characters, number of recognized words, ratio of recognized words to unrecognized words, and the like. At block 218 a voting process selects a set of characters for the final output file from among the output files. The results are thereafter stored and made available at block 220.
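Blocks 216 and 218 can also be read as scoring each output file and keeping the best one. The sketch below uses the fraction of recognized words as the score, a stand-in for the recognized-to-unrecognized ratio that avoids division by zero when every word is recognized. The names `recognized_fraction` and `select_final`, and the file labels, are hypothetical.

```python
from typing import Dict, Set


def recognized_fraction(text: str, dictionary: Set[str]) -> float:
    """Fraction of words in the text that appear in the dictionary."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    return sum(1 for w in words if w in dictionary) / len(words)


def select_final(output_files: Dict[str, str], dictionary: Set[str]) -> str:
    """Return the label of the output file with the highest score."""
    return max(output_files,
               key=lambda name: recognized_fraction(output_files[name], dictionary))


candidates = {
    "global/engine_a": "tbe quick fox",
    "adaptive/engine_a": "the quick fox",
    "clustering/engine_a": "tbe qu1ck f0x",
}
best = select_final(candidates, {"the", "quick", "fox"})
```

A production system would likely combine several metrics, such as confidence levels and historical accuracy, rather than rely on a single dictionary-based score.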
[0031] Having described several embodiments, it will be recognized by those of skill in the art that various modifications, alternative constructions, and equivalents may be used without departing from the spirit of the invention. Additionally, a number of well-known processes and elements have not been described in order to avoid unnecessarily obscuring the present invention. Accordingly, the above description should not be taken as limiting the scope of the invention, which is defined in the following claims.

Claims

WHAT IS CLAIMED IS:
1. A method of processing an image, comprising: receiving a digital version of the image; processing the digital version of the image through at least two binarization processes to thereby create a first binarization and a second binarization; processing the first binarization through a first optical character recognition process to thereby create a first OCR output file, wherein processing the first binarization through a first optical character recognition process comprises compiling first metrics associated with the first OCR output file; processing the second binarization through the first optical character recognition process to thereby create a second OCR output file, wherein processing the second binarization through the first optical character recognition process comprises compiling second metrics associated with the second OCR output file; and using the metrics, at least in part, to select a final OCR output file from among the OCR output files.
2. The method of claim 1, further comprising: processing the first binarization through a second optical character recognition process to thereby create a third OCR output file, wherein processing the first binarization through a second optical character recognition process comprises compiling third metrics associated with the third OCR output file; and processing the second binarization through the second optical character recognition process to thereby create a fourth OCR output file, wherein processing the second binarization through the second optical character recognition process comprises compiling fourth metrics associated with the fourth OCR output file.
3. The method of claim 1, wherein the binarization processes are selected from a group consisting of: clustering; global-thresholding; and adaptive thresholding.
4. The method of claim 1, wherein the first and second optical character recognition processes comprise the same optical character recognition process.
5. The method of claim 1, wherein the first and second optical character recognition processes comprise different optical character recognition processes.
6. The method of claim 1, wherein the metrics associated with a particular output file comprise one or more selections from a group consisting of: a number of characters recognized in the particular output file; a number of dictionary words in the particular output file; a number of unknown words in the particular output file; a per-character confidence level in the particular output file; a per-word confidence level in the particular output file; a per-image confidence level in the particular output file; a degree of agreement between the particular output file and other output files; which binarization and OCR process produced the particular output file; and a measure of historical accuracy associated with the particular binarization/OCR combination that produced the particular output file.
7. The method of claim 1, further comprising creating the digital version of the image from a physical version of the image.
8. A method of optically recognizing characters in an image, comprising: creating multiple binarizations of the image using different binarization techniques; presenting each binarization to an optical character recognition (OCR) engine to produce an OCR output file for each binarization; developing metrics relating to each OCR output file; and using the metrics, at least in part, to select a final OCR output file from among the OCR output files.
9. The method of claim 8, wherein the different binarization techniques comprise one or more selections from a group consisting of: clustering; global-thresholding; and adaptive thresholding.
10. The method of claim 8, wherein presenting each binarization to an OCR engine comprises presenting each binarization to a different OCR engine.
11. The method of claim 8, wherein the metrics relating to a particular OCR output file comprise one or more selections from a group consisting of: a number of characters recognized in the particular OCR output file; a number of dictionary words in the particular OCR output file; a number of unknown words in the particular OCR output file; a per-character confidence level in the particular output file; a per-word confidence level in the particular output file; a per-image confidence level in the particular output file; a degree of agreement between the particular OCR output file and other OCR output files; which binarization and OCR process produced the particular OCR output file; and a measure of historical accuracy associated with the particular binarization/OCR combination that produced the particular OCR output file.
12. An optical character recognition system, comprising: at least two binarization processes configured to convert grayscale images to bitonal images; at least one optical character recognition process configured to process bitonal images into final output files having characters therein; a metrics generation process configured to analyze output files and produce metrics associated therewith; a voting process configured to select a final output file from among the output files based on the metrics; a storage arrangement configured to store final output files and serve the information therein to users; and at least one processor programmed to execute the at least one optical character recognition processes, the at least one optical character recognition process, metrics generation process, and the voting process.
13. The optical character recognition system of claim 12, wherein the at least two binarization processes comprise at least one selection from a group consisting of: clustering; global-thresholding; and adaptive thresholding.
14. The optical character recognition system of claim 12, wherein the metrics associated with a particular output file comprise one or more selections from a group consisting of: a number of characters recognized in the particular output file; a number of dictionary words in the particular output file; a number of unknown words in the particular output file; a per-character confidence level in the particular output file; a per-word confidence level in the particular output file; a per-image confidence level in the particular output file; a degree of agreement between the particular output file and other output files; which binarization and OCR process produced the particular output file; and a measure of historical accuracy associated with the particular binarization/OCR combination that produced the particular output file.
PCT/US2007/063508 2006-03-07 2007-03-07 Multiple image input for optical character recognition processing systems and methods WO2007103984A2 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US78048406P 2006-03-07 2006-03-07
US60/780,484 2006-03-07
US11/560,026 US7734092B2 (en) 2006-03-07 2006-11-15 Multiple image input for optical character recognition processing systems and methods
US11/560,026 2006-11-15

Publications (2)

Publication Number Publication Date
WO2007103984A2 true WO2007103984A2 (en) 2007-09-13
WO2007103984A3 WO2007103984A3 (en) 2008-11-06

Family

ID=38475835

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2007/063508 WO2007103984A2 (en) 2006-03-07 2007-03-07 Multiple image input for optical character recognition processing systems and methods

Country Status (2)

Country Link
US (1) US7734092B2 (en)
WO (1) WO2007103984A2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8290273B2 (en) 2009-03-27 2012-10-16 Raytheon Bbn Technologies Corp. Multi-frame videotext recognition

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4713107B2 (en) * 2004-08-20 2011-06-29 日立オムロンターミナルソリューションズ株式会社 Character string recognition method and device in landscape
US8908998B2 (en) * 2007-12-07 2014-12-09 Educational Testing Service Method for automated quality control
US8073284B2 (en) * 2008-04-03 2011-12-06 Seiko Epson Corporation Thresholding gray-scale images to produce bitonal images
US8320674B2 (en) * 2008-09-03 2012-11-27 Sony Corporation Text localization for image and video OCR
US8452099B2 (en) * 2010-11-27 2013-05-28 Hewlett-Packard Development Company, L.P. Optical character recognition (OCR) engines having confidence values for text types
US9330323B2 (en) 2012-04-29 2016-05-03 Hewlett-Packard Development Company, L.P. Redigitization system and service
US8768058B2 (en) * 2012-05-23 2014-07-01 Eastman Kodak Company System for extracting text from a plurality of captured images of a document
US8773733B2 (en) 2012-05-23 2014-07-08 Eastman Kodak Company Image capture device for extracting textual information
US8908970B2 (en) 2012-05-23 2014-12-09 Eastman Kodak Company Textual information extraction method using multiple images
JP2014036314A (en) * 2012-08-08 2014-02-24 Canon Inc Scan service system, scan service method, and scan service program
US8947745B2 (en) 2013-07-03 2015-02-03 Symbol Technologies, Inc. Apparatus and method for scanning and decoding information in an identified location in a document
US9870520B1 (en) * 2013-08-02 2018-01-16 Intuit Inc. Iterative process for optimizing optical character recognition
US9922247B2 (en) * 2013-12-18 2018-03-20 Abbyy Development Llc Comparing documents using a trusted source
US9251139B2 (en) * 2014-04-08 2016-02-02 TitleFlow LLC Natural language processing for extracting conveyance graphs
US9619702B2 (en) * 2014-08-29 2017-04-11 Ancestry.Com Operations Inc. System and method for transcribing handwritten records using word grouping with assigned centroids
CN106874906B (en) * 2017-01-17 2023-02-28 腾讯科技(上海)有限公司 Image binarization method and device and terminal
US10984274B2 (en) * 2018-08-24 2021-04-20 Seagate Technology Llc Detecting hidden encoding using optical character recognition
US11961316B2 (en) * 2022-05-10 2024-04-16 Capital One Services, Llc Text extraction using optical character recognition

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020067851A1 (en) * 2000-12-06 2002-06-06 Lange Peter J. Device that scans both sides of a photo and associates information found on the back of the photo with the photo
US6571013B1 (en) * 1996-06-11 2003-05-27 Lockhead Martin Mission Systems Automatic method for developing custom ICR engines
US20030113016A1 (en) * 1996-01-09 2003-06-19 Fujitsu Limited Pattern recognizing apparatus
US20070047816A1 (en) * 2005-08-23 2007-03-01 Jamey Graham User Interface for Mixed Media Reality
US7236632B2 (en) * 2003-04-11 2007-06-26 Ricoh Company, Ltd. Automated techniques for comparing contents of images

Family Cites Families (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5617484A (en) * 1992-09-25 1997-04-01 Olympus Optical Co., Ltd. Image binarizing apparatus
DE69519323T2 (en) * 1994-04-15 2001-04-12 Canon Kk System for page segmentation and character recognition
US5519786A (en) * 1994-08-09 1996-05-21 Trw Inc. Method and apparatus for implementing a weighted voting scheme for multiple optical character recognition systems
US5920655A (en) * 1995-02-10 1999-07-06 Canon Kabushiki Kaisha Binarization image processing for multi-level image data
US6226094B1 (en) * 1996-01-05 2001-05-01 King Jim Co., Ltd. Apparatus and method for processing character information
JPH11232378A (en) * 1997-12-09 1999-08-27 Canon Inc Digital camera, document processing system using the same, computer readable storage medium and program code transmitter
US6269188B1 (en) * 1998-03-12 2001-07-31 Canon Kabushiki Kaisha Word grouping accuracy value generation
DE69822608T2 (en) * 1998-05-28 2005-01-05 International Business Machines Corp. Binarization method in a character recognition system
JP4018310B2 (en) * 1999-04-21 2007-12-05 株式会社リコー Image binarization apparatus, image imaging apparatus, image binarization method, image imaging method, and computer-readable recording medium storing a program for causing a computer to function as each step of the method
US6330003B1 (en) * 1999-07-30 2001-12-11 Microsoft Corporation Transformable graphical regions
DE10034629A1 (en) * 1999-08-11 2001-03-22 Ibm Combing optical character recognition, address block location for automatic postal sorting involves connecting both systems to enable all results from one to be passed to other for processing
JP4377494B2 (en) * 1999-10-22 2009-12-02 東芝テック株式会社 Information input device
US6868524B1 (en) * 1999-10-22 2005-03-15 Microsoft Corporation Method and apparatus for text layout across a region
US6577762B1 (en) * 1999-10-26 2003-06-10 Xerox Corporation Background surface thresholding
US6738496B1 (en) * 1999-11-01 2004-05-18 Lockheed Martin Corporation Real time binarization of gray images
WO2001058129A2 (en) * 2000-02-03 2001-08-09 Alst Technical Excellence Center Image resolution improvement using a color mosaic sensor
US6351566B1 (en) * 2000-03-02 2002-02-26 International Business Machines Method for image binarization
JP4150842B2 (en) * 2000-05-09 2008-09-17 コニカミノルタビジネステクノロジーズ株式会社 Image recognition apparatus, image recognition method, and computer-readable recording medium on which image recognition program is recorded
JP3575683B2 (en) * 2000-10-05 2004-10-13 松下電器産業株式会社 Multi-element type magnetoresistive element
JP4613397B2 (en) * 2000-06-28 2011-01-19 コニカミノルタビジネステクノロジーズ株式会社 Image recognition apparatus, image recognition method, and computer-readable recording medium on which image recognition program is recorded
JP3904840B2 (en) * 2000-08-15 2007-04-11 富士通株式会社 Ruled line extraction device for extracting ruled lines from multi-valued images
US7738706B2 (en) * 2000-09-22 2010-06-15 Sri International Method and apparatus for recognition of symbols in images of three-dimensional scenes
US7062093B2 (en) * 2000-09-27 2006-06-13 Mvtech Software Gmbh System and method for object recognition
US6741745B2 (en) * 2000-12-18 2004-05-25 Xerox Corporation Method and apparatus for formatting OCR text
JP4164272B2 (en) * 2001-04-24 2008-10-15 キヤノン株式会社 Image processing apparatus and image processing method
US6741351B2 (en) * 2001-06-07 2004-05-25 Koninklijke Philips Electronics N.V. LED luminaire with light sensor configurations for optical feedback
JP4100885B2 (en) * 2001-07-11 2008-06-11 キヤノン株式会社 Form recognition apparatus, method, program, and storage medium
US6922487B2 (en) * 2001-11-02 2005-07-26 Xerox Corporation Method and apparatus for capturing text images
US7339992B2 (en) * 2001-12-06 2008-03-04 The Trustees Of Columbia University In The City Of New York System and method for extracting text captions from video and generating video summaries
US20040146200A1 (en) * 2003-01-29 2004-07-29 Lockheed Martin Corporation Segmenting touching characters in an optical character recognition system to provide multiple segmentations
JP4713107B2 (en) * 2004-08-20 2011-06-29 日立オムロンターミナルソリューションズ株式会社 Character string recognition method and device in landscape
US7724981B2 (en) * 2005-07-21 2010-05-25 Ancestry.Com Operations Inc. Adaptive contrast control systems and methods
US7650041B2 (en) * 2006-02-24 2010-01-19 Symbol Technologies, Inc. System and method for optical character recognition in an image
US20080008383A1 (en) * 2006-07-07 2008-01-10 Lockheed Martin Corporation Detection and identification of postal metermarks
US7650035B2 (en) * 2006-09-11 2010-01-19 Google Inc. Optical character recognition based on shape clustering and multiple optical character recognition processes
US8155444B2 (en) * 2007-01-15 2012-04-10 Microsoft Corporation Image text to character information conversion

Also Published As

Publication number Publication date
US20070211942A1 (en) 2007-09-13
US7734092B2 (en) 2010-06-08
WO2007103984A3 (en) 2008-11-06

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase (Ref document number: 2007758095; Country of ref document: EP)
NENP Non-entry into the national phase (Ref country code: DE)