CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a continuation-in-part of U.S. patent application Ser. No. 10/022,023 filed on Dec. 13, 2001. The disclosure of the above application is incorporated herein by reference.
FIELD OF THE INVENTION
The present invention generally relates to speech recognition and particularly relates to automated form filling over a telephone system.
BACKGROUND OF THE INVENTION
Automatic electronic form filling by a user, particularly over a telephone system, is a notoriously laborious and error prone process. Use of numerical keypad entries to attempt retrieval of user information is one existing process that proves to be only as reliable as completeness and correctness of the user information database and user knowledge of the required information and successful operation of the telephone keypad. Other applications, such as confirming availability and/or delivery of a product over the phone or on a networked computer system, suffer from similar problems.
The key to improving the automated form filling process, by increasing reliability of information and decreasing effort on the part of a user, involves recognizing that information from multiple information sources can be fused in an intelligent manner. To be successful, the information fusion process should not trust the information sources to be reliable, and should intelligently use information from the multiple sources to constrain and supplement one another based on differences in reliability between sources and of particular information inputs. To be further successful, the fusion process should be able to incorporate a dialogue with a user to increase knowledge relating to reliability of information content, and/or gather additional information inputs.
Automated form filling processes do not currently succeed in intelligently fusing information from multiple information sources based on knowledge relating to differences in reliability of information from different sources, thereby simultaneously increasing reliability of form contents and decreasing effort on the part of the user. The need remains, therefore, for a solution to the problems associated with automated form filling as detailed above. The present invention provides such a solution.
SUMMARY OF THE INVENTION
In accordance with the present invention, an automated form filling system includes an input receptive of a plurality of information inputs from a plurality of information sources. An information fuser is operable to select information from the plurality of information inputs based on a comparison of the information inputs, and based on knowledge relating to reliability of the information sources. A form filler is operable to fill an electronic form with the selected information.
The form filling system of the present invention is advantageous over previous form filling systems in that it has knowledge relating to reliability of multiple information sources and is able to compare and select information content accordingly. In further aspects, a prompt formulator formulates a prompt based on a comparison of the information content of the sources and on a level of knowledge relating to reliability of the source content, the prompt being designed to elicit a response from a user that increases knowledge of the reliability of the multiple information sources. The resulting dialogue with the user fills out an electronic form while simultaneously decreasing effort on the user's part and increasing reliability of the form contents.
The present invention is particularly suitable for use with electronic form filling over a telephone, wherein user input and accessible databases are unreliable information sources. Further areas of applicability of the present invention will become apparent from the detailed description provided hereinafter. It should be understood that the detailed description and specific examples, while indicating the preferred embodiment of the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention will become more fully understood from the detailed description and the accompanying drawings, wherein:
FIG. 1 is an illustrated block diagram depicting a general implementation of the present invention with a telephone call center;
FIG. 2 is a flow chart depicting the method of the present invention;
FIG. 3 is a block diagram depicting a form filling system according to the present invention; and
FIG. 4 is a partial block and information flow diagram depicting a detailed implementation of the present invention with a telephone call center.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
The following description of the preferred embodiment(s) is merely exemplary in nature and is in no way intended to limit the invention, its application, or uses. In a preferred embodiment, the present invention is employed to perform form filling by fusing multimodal user input over a telephone with one or more user information databases. The form filling system 100 of FIG. 1 exemplifies one implementation of the preferred embodiment of the present invention to fill out a form requiring a user's name, address, and telephone number.
According to form filling system 100, a user, one Mr. Baker, is initially prompted to speak his name into the telephone receiver as at 102, and to spell his name via the telephone keypad as at 104. The speech input 106 is communicated to an automatic speech recognizer 108, whereas the keypad entry 110 is communicated to an information database 112 of names indexed by predefined classes defined by the telephone keypad. In turn, a constraint list 114 of candidate names is generated from the keypad entry 110 and the information database 112 of names, and the generated constraint list 114 is further communicated to the speech recognizer 108. The speech recognizer functions to recognize the speech input 106 by generating a plurality of speech recognition hypotheses, and then selects the N best generated hypotheses by comparing them to the constraint list 114. Thus, if “Bater” were one of the original speech recognition hypotheses, it would be discarded based on its absence from the constraint list.
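The keypad constraint described above can be illustrated with a minimal Python sketch. It is not the implementation of the disclosure: the function names, the sample name list, and the hypothesis scores are hypothetical, and the letter groups follow the conventional telephone keypad layout.

```python
# Conventional telephone keypad letter groups (assumed layout).
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {ch: digit for digit, letters in KEYPAD.items() for ch in letters}


def to_digits(name):
    """Return the key sequence a caller would press to spell the name."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in name.lower() if ch in LETTER_TO_DIGIT)


def build_constraint_list(keypad_entry, name_database):
    """Keep every database name whose key sequence equals the keypad entry."""
    return [name for name in name_database if to_digits(name) == keypad_entry]


def filter_hypotheses(hypotheses, constraint_list, n=3):
    """Discard hypotheses absent from the constraint list; keep the n best."""
    allowed = {name.lower() for name in constraint_list}
    kept = [(name, score) for name, score in hypotheses if name.lower() in allowed]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:n]


# Hypothetical name database and recognizer output (name, score).
names = ["Baker", "Bakes", "Bater", "Cakes", "Smith"]
constraints = build_constraint_list("22537", names)
hypotheses = [("Bater", 0.41), ("Baker", 0.38), ("Bakes", 0.35)]
n_best = filter_hypotheses(hypotheses, constraints)
```

In this sketch the keypad entry 22537 admits "Baker", "Bakes", and "Cakes" but not "Bater", so "Bater" is dropped even though it carries the highest recognizer score.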
The information database 112 further serves as an information source having the names and addresses of most residents of the nation in which the user, Mr. Baker, is located, so frequency information 116 relating to the frequency with which names appear in the database can be communicated to rescoring module 118 used to further rescore the N best speech recognition hypotheses. Thus, if “Baker” and “Bakes” are both present in the N best speech recognition hypotheses, they can be rescored to increase the ranking of “Baker” with respect to the ranking of “Bakes” based on a higher frequency of “Baker” in the information database 112 compared to the frequency of “Bakes”.
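One way such frequency-based rescoring might be realized is sketched below in Python. The disclosure does not specify a scoring formula; the additive log-prior, the smoothing, the weight, and the sample counts are all assumptions made for illustration.

```python
import math


def rescore(hypotheses, name_counts, weight=0.1):
    """Add a weighted log-frequency prior to each recognizer score.

    hypotheses: list of (name, log_score) pairs from the recognizer.
    name_counts: hypothetical mapping of name -> occurrences in the database.
    """
    total = sum(name_counts.values())
    rescored = []
    for name, score in hypotheses:
        # Add-one smoothing: names missing from the database are penalized,
        # not excluded outright.
        prior = (name_counts.get(name, 0) + 1) / (total + len(name_counts))
        rescored.append((name, score + weight * math.log(prior)))
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored


# Hypothetical counts and N best list; "Bakes" starts slightly ahead.
counts = {"Baker": 900, "Bakes": 20, "Cakes": 80}
n_best = [("Bakes", -1.00), ("Baker", -1.05)]
ranked = rescore(n_best, counts)
```

With these assumed numbers, the much higher database frequency of "Baker" outweighs its slightly lower recognizer score, so "Baker" moves to the top of the rescored list.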
The user is additionally prompted to enter his or her postal code, such as a zip code, and the received postal code 120, the rescored speech recognition hypotheses, and the information database 112 are communicated to an information fuser 122. The information fuser 122, in turn, selects information content of the information database 112 based on the rescored recognition hypotheses and the received zip code 120. The information fuser 122, however, does not merely trust the database 112, the rescored hypotheses, and the received zip code 120 to be accurate. Instead, the information fuser 122 recognizes and adjusts for several potential causes of unreliability.
The data communicated to information fuser 122 may be unreliable for several reasons. For example, the user's name may be spoken in such a way that the speech recognition system misrecognizes it. Also, the customer may mistype his or her postal code. Further, the customer database may be incomplete (address missing) or incorrect (name or address misspelled, information out of date). For these and similar reasons, the information fuser employs an information selection strategy that selects the most reliable information available.
The information fuser 122 selects the most reliable information available based on knowledge relating to reliability of various types of information sources, and based on a comparison of the information content from the different sources. For example, if the top-ranking speech recognition hypothesis does not match any name in the information database 112 having the received zip code 120, but the second-highest ranking speech recognition hypothesis has only a slightly lower score than the top-ranking hypothesis and does match a name in the information database 112 having the received zip code 120, then the information fuser can select the name indicated by the second-highest ranking speech recognition hypothesis and prompt the user for confirmation. Similarly, if the top-ranking speech recognition hypothesis does not match any name in the information database 112 having the received postal code 120, but has a much higher score than the second-highest ranking speech recognition hypothesis, then the information fuser 122 can select the name and address matching the highest ranked speech recognition hypothesis and/or prompt the user to reenter the postal code or confirm whether the postal code is correct. Further, the information fuser may take a different approach by distinguishing between an entirely incorrect postal code and one that is only partially incorrect, and further consider the first two digits of the postal code (in the case of a zip code) more reliable than the last three digits of the postal code.
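The first two selection rules described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the fuser of the disclosure: the function name `fuse`, the `margin` threshold separating "slightly lower" from "much higher" scores, the action labels, and the database layout are all hypothetical, and the partial postal code handling is omitted.

```python
def fuse(rescored, database, postal_code, margin=0.1):
    """Select a name and a follow-up action.

    rescored: list of (name, score) pairs, best first.
    database: hypothetical mapping of name -> postal codes listed for that name.
    Returns (selected_name, action).
    """
    top_name, top_score = rescored[0]
    if postal_code in database.get(top_name, []):
        return top_name, "accept"
    # A close runner-up that agrees with the postal code is preferred,
    # subject to confirmation by the user.
    for name, score in rescored[1:]:
        if top_score - score <= margin and postal_code in database.get(name, []):
            return name, "confirm_name"
    # Otherwise trust the recognizer and question the postal code instead.
    return top_name, "confirm_postal_code"


db = {"Baker": ["93101"], "Bakes": ["10001"]}
close_call = fuse([("Bakes", 0.52), ("Baker", 0.50)], db, "93101")
clear_winner = fuse([("Bakes", 0.90), ("Baker", 0.40)], db, "93101")
```

Here `close_call` selects "Baker" and asks for confirmation of the name, whereas `clear_winner` keeps "Bakes" and asks the user to confirm or reenter the postal code.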
Form filling system 100 preferably has an interviewer 123 for implementing a strategy to prompt the user for input increasing knowledge relating to reliability of information content. The interviewer 123 can be and/or incorporate a human agent to assist in confirming, correcting, selecting, and/or supplementing information. Alternatively, the interviewer 123 can be partially or wholly automated using a prompt formulator to request different inputs from the user in different situations. Accordingly, information fuser 122 can select, deselect, and/or reselect information content based on the increased knowledge relating to the reliability of the information content, and the user responses constitute new information sources to be used in the fusion process. Thus, the prompt formulator can preferably ask the user to supply supplemental information for fields of the form 124 for which reliable information cannot be obtained with the information sources currently available. Form filling system 100 also has a form filler for filling appropriate fields of the electronic form 124 with the selected information content.
The method 200 of the present invention is illustrated in FIG. 2, and begins at 202. Thence, the method 200 proceeds to step 203, wherein information inputs are sought, preferably by initially prompting a user for specific inputs where a user is an applicable information source. Thence, the method proceeds to step 204, wherein multiple information inputs are received from multiple information sources. In addition to information inputs provided by a user, these information inputs may also include data from an information database, and/or additional data such as that provided by caller ID or a biometric (measured physical characteristic: fingerprint, retina scan, voice pattern, DNA, etc.) of a user. Thence, the method 200 proceeds to step 206, wherein information content is selected from one or more of the information sources based on a comparison of source contents and knowledge relating to reliability of the information sources. In one aspect, the knowledge of reliability of a source relates to the type of source, and thus is prior knowledge. In another aspect, however, the knowledge relating to reliability of an information source and/or specific information contents stems from a comparison of the contents of the information sources. Thus, content of an information source of a reliable type may be deemed less reliable based on comparison with content of another information source when the information content conflicts. Similarly, content of an information source of an unreliable type may be deemed more reliable based on comparison with content of another information source when the information content matches. This process can be used to identify more and less reliable portions of information content within one or more information sources.
Once selection of information has taken place, method 200 proceeds to step 208, wherein an electronic form is filled with the selected information. If the form is deemed reliably completed as at 210, then the method ends at 214 and a filled form has been generated. On the other hand, some or all of the selected information may be deemed insufficiently reliable, and/or a sufficient amount of fields of the form may not be deemed completed. In either of these latter cases, method 200 returns to step 203.
In step 203, the user is prompted for additional information inputs, and the prompt is designed to elicit a response to increase knowledge of reliability relating to selectable information and/or gather supplemental information. Thus, the request for additional sources may be a request for confirmation of selected information, a request to reenter one of the user inputs, a request for a different information type, or a similar type of request. With this step, speech generation is preferred to communicate the request, especially over a telephone system. Following the prompt for additional information inputs, the method 200 returns to step 204, wherein the response is received. The method then proceeds again to step 206, wherein new information content is selected and/or it is determined whether selected content is reliable based on the new information sources. The method 200 further proceeds to step 208 and fills the form based on the revised selection. If the newly filled form is deemed reliably completed at step 210, then the method 200 ends at 216. Otherwise, processing continues in a recursive fashion until a reliably completed form is obtained or the process is otherwise interrupted.
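The prompt, receive, select, and fill cycle of method 200 can be sketched as a simple loop. The sketch below is illustrative only: the callback signatures, the `max_rounds` cutoff standing in for an interruption, and the tuple layout of an information input are assumptions rather than features of the disclosure.

```python
def fill_form(required_fields, gather_inputs, select, max_rounds=5):
    """Repeat prompting and selection until every required field is filled.

    gather_inputs(missing) returns new information inputs (steps 203-204).
    select(sources, missing) returns {field: value} for inputs deemed
    reliable (steps 206-208). Both callbacks are hypothetical.
    """
    form = {}
    sources = []
    for _ in range(max_rounds):
        missing = [field for field in required_fields if field not in form]
        sources.extend(gather_inputs(missing))
        form.update(select(sources, missing))
        if all(field in form for field in required_fields):  # step 210
            return form, True
    return form, False  # interrupted before reliable completion


# Stub information source: the first postal code entry is deemed unreliable.
answers = iter([
    [("name", "Baker", True), ("postal_code", "9310", False)],
    [("postal_code", "93101", True)],
])


def gather(missing):
    return next(answers)


def select_reliable(sources, missing):
    return {field: value for field, value, ok in sources if ok and field in missing}


form, done = fill_form(["name", "postal_code"], gather, select_reliable)
```

In this run the first pass fills only the name, and the second pass fills the postal code after the user reenters it.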
A form filling system 300 of the present invention is more generally illustrated in FIG. 3, wherein a first information input 302 from a first information source and a second information input 304 from a second information source are received by input 306 and communicated to information fuser 308. Information fuser 308 compares and selects information based on reliability of the information as determined based on the comparison and in accordance with predefined rules developed with regard to reliability of different types of information sources and/or information inputs. The selected information 310 is communicated to form filler 312, which fills an electronic form 314 with the selected information to generate a filled form 316 and communicate it to an outside system via output 316.
Information fuser 308 mutually communicates with prompt formulator 320, and prompt formulator 320 formulates a prompt 322 for additional information sources, if needed, based on reliability of the information sources, comparative and/or objective reliability of the available information inputs, and/or requirements for supplemental information content. Prompt formulator 320 further communicates its current state to information fuser 308 so that information fuser 308 is aware of the type of information input(s) requested and how to interpret its information content in view of the other information inputs. The formulated prompt 322 is communicated to a dialogue manager 324 that generates a prompt in a manner communicable to and understandable by a user, preferably by speech generation. The generated prompt is communicated to the user via output 328. A response from the user constitutes an additional information input communicable to information fuser 308 via input 306.
A detailed implementation of the present invention with a telephone call center is described with reference to FIG. 4. Therein, multi-modal information inputs 400 from an information source corresponding to a user include a user speech input 402 and a user keypad entry 404. The user speech input includes a spoken user name, and the user keypad entry includes a spelling and/or initials of the user name and a zip code for the user location. Also, textual information inputs 406 from database information sources include nickname data input 408 from a nickname database and personal data input 410 from a personal information database. Use of nickname data input 408 in the present detailed implementation represents an improvement over the more simplified implementation previously discussed with reference to FIG. 1. Further, additional data inputs 412 that also represent an improvement include telephone subscriber data 414 provided by a caller ID service. Other examples of additional data inputs 412 that can also be used include biometrics identifying a user and gathered, for example, by a handheld device the user employs to communicate with the call center. Further examples include an IP address for the user location, assuming the user communicates with the call center using a computer network. These types of information inputs all assist to varying degrees in identifying the caller, and the form filling system of the present invention is adapted to use some or all of these types of information inputs accordingly.
Form filling system 100 receives the multi-modal information inputs 400, textual information inputs 406, and additional data inputs 412 and uses the various information inputs to constrain and supplement one another according to their varying modalities, utilities, and levels of reliability. For example, user speech input 402 is processed by speech recognizer 108A to produce a plurality of speech recognition hypotheses 414. Also, user keypad entry 404, additional data inputs 412, nickname data input 408, and personal data input 410 are all communicated to constraint list generator 416.
Constraint list generator 416 is adapted in the present implementation to generate a constraint list of candidate names. A user keypad entry 404 containing a first and last name is used to access the nickname database and generate a plurality of first names based on the input first name, such that a constraint list generated from a keypad entry for “Bob” will also contain “Robert”, “Robby”, “Bobby”, and so on. The plurality of first names thus generated, a zip code from the user keypad entry, and any matching caller ID information are then used to access the personal information database, such as Phonedisc, containing names and addresses of all telephone subscribers in the United States, to generate a constraint list 114 that is communicated to N best hypotheses generator 108B.
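The nickname expansion performed by the constraint list generator can be sketched as follows. The nickname groups, the record layout, and the function names are hypothetical, and the sketch assumes the first and last name have already been resolved from the keypad entry; caller ID matching is omitted.

```python
# Hypothetical nickname database: each set groups interchangeable first names.
NICKNAME_GROUPS = [
    {"Robert", "Bob", "Bobby", "Robby", "Rob"},
    {"William", "Bill", "Will"},
]


def expand_first_name(first_name):
    """Return the input first name plus every variant grouped with it."""
    variants = {first_name}
    for group in NICKNAME_GROUPS:
        if first_name in group:
            variants |= group
    return variants


def constraint_list(first_name, last_name, zip_code, records):
    """List full names in the personal database matching any first-name
    variant, the last name, and the zip code."""
    firsts = expand_first_name(first_name)
    return sorted(
        f"{r['first']} {r['last']}"
        for r in records
        if r["first"] in firsts and r["last"] == last_name and r["zip"] == zip_code
    )


# Hypothetical personal information records.
records = [
    {"first": "Robert", "last": "Baker", "zip": "93101"},
    {"first": "Bob", "last": "Baker", "zip": "10001"},
    {"first": "Alice", "last": "Baker", "zip": "93101"},
]
candidates = constraint_list("Bob", "Baker", "93101", records)
```

A caller who enters "Bob" is thereby matched to a database record listed under "Robert", which a literal comparison of first names would miss.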
The speech recognition hypotheses 414 are processed by N best hypotheses generator 108B to generate an N best list of speech recognition hypotheses 418. One skilled in the art of speech recognition will recognize that an alternative embodiment may be realized with a word lattice. The N best list of speech recognition hypotheses 418 is communicated to rescoring mechanism 118, as is data from the personal information database pertaining to frequency of appearance of names in the personal information database. Rescoring mechanism 118 rescores the N best list of speech recognition hypotheses 418 to generate a list of rescored hypotheses 420. Caller ID information and a postal code portion of the user keypad entry 404 may alternatively and/or additionally be used during this process to rescore hypotheses based on the frequency of names in the database having the corresponding postal code and/or names matching the caller ID information.
The rescored hypotheses 420, caller ID information, postal code, and personal data input 410 are communicated to the information fuser 122 and the prompt formulator 320. The information fuser 122 selects the most reliable information from the personal data input 410, caller ID information, and rescored hypotheses 420, and the selected data 310 is communicated to the prompt formulator 320 and to form filler 312. Form filler 312 fills electronic form 314 to produce a filled form 316 that is partially or wholly filled with the selected data. Meanwhile, the prompt formulator 320 determines whether more information is needed based on the selected data, the information available for selection, and requirements of the electronic form. If more information inputs are required, the prompt formulator 320 formulates an appropriate prompt 322 and communicates a current state 422 to information fuser 122. Otherwise, the form is deemed completed.
The description of the invention is merely exemplary in nature and, thus, variations that do not depart from the gist of the invention are intended to be within the scope of the invention. For example, one may recognize that information fusion according to the present invention occurs at several levels and at several points in the information selection process. Thus, information fusion is used in generating a constraint list, and in altering confidence scores associated with speech recognition hypotheses. A multi-layered information fusion-based form filling system is thus within the scope of the present invention, and various embodiments may be realized with respect to various types of available inputs, various modalities of input, and various applications of form filling. Such variations are not to be regarded as a departure from the spirit and scope of the invention.