
Patents

A system, method and computer program product for providing answers to questions based on any corpus of data. The method generates a number of candidate passages from the corpus that answer an input query, and finds the correct resulting answer by collecting supporting evidence from the multiple passages. All retrieved passages and each passage's metadata are analyzed in parallel to generate an output plurality of data structures including candidate answers. Then, by each of a plurality of parallel operating modules, supporting passage retrieval operations are performed upon the set of candidate answers, and for each candidate answer, the data corpus is traversed to find those passages having the candidate answer in addition to query terms. All candidate answers are automatically scored using the supporting passages by a plurality of scoring modules, each producing a module score. The module scores are processed to determine one or more...

Referenced by

Citing Patent | Filing date | Issue date | Original Assignee | Title
US7904825 | Mar 14, 2007 | Mar 8, 2011 | Xerox Corporation | Graphical user interface for gathering image evaluation information

Claims

1. A computer-implemented method of generating answers to questions based on any corpus of data, said method comprising:

receiving an input query and performing query context analysis upon said query to break down said input query into query terms, all query terms comprising both searchable and non-searchable components;

utilizing one or more searchable components and conducting a search in any corpus of data including structured and unstructured data to obtain passages potentially including candidate answers, all passages potentially including candidate answers being stored in a data storage device;

analyzing all retrieved passages and that passage's metadata, in a candidate answer generation module, to generate an output plurality of data structures including candidate answers based upon the analyzing;

performing, by each of a plurality of parallel operating modules, supporting passage retrieval operation upon the set of candidate answers, and for each candidate answer, traversing the said data corpus and the said data storage device to find those passages having candidate answer in addition to query terms;

automatically scoring all candidate answers using the said supporting passages by a plurality of scoring modules, each producing a module score;

applying a candidate answer ranking function to the said module scores to determine one or more query answers; and,

generating a query response based on said one or more query answers for delivery to a user.
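The steps of claim 1 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the toy corpus, the stopword-based split into searchable and non-searchable terms, the capitalized-word candidate heuristic, the two scorers, and the weights in the ranking function are all hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stopword list standing in for "non-searchable" query components.
STOPWORDS = {"who", "what", "when", "the", "of", "is", "was", "a", "an", "in", "by"}

# Toy corpus; a real system would search structured and unstructured sources.
CORPUS = [
    "Marie Curie discovered radium in 1898.",
    "Radium was discovered by Marie Curie and Pierre Curie.",
    "Albert Einstein developed the theory of relativity.",
]

def analyze_query(query):
    """Break the query into searchable and non-searchable terms."""
    words = [w.strip("?.,").lower() for w in query.split()]
    searchable = [w for w in words if w and w not in STOPWORDS]
    non_searchable = [w for w in words if w in STOPWORDS]
    return searchable, non_searchable

def search(terms, corpus):
    """Retrieve passages that mention at least one searchable term."""
    return [p for p in corpus if any(t in p.lower() for t in terms)]

def generate_candidates(passages, terms):
    """Toy heuristic: runs of capitalized words that are not query terms."""
    candidates = set()
    for passage in passages:
        run = []
        for word in passage.replace(".", "").split() + [""]:  # "" flushes the last run
            if word[:1].isupper() and word.lower() not in terms:
                run.append(word)
            elif run:
                candidates.add(" ".join(run))
                run = []
    return sorted(candidates)

def supporting_passages(candidate, terms, corpus):
    """Passages containing the candidate answer in addition to query terms."""
    return [p for p in corpus
            if candidate.lower() in p.lower() and any(t in p.lower() for t in terms)]

def coverage_scorer(passages, terms):
    """Sum, over supporting passages, of the fraction of query terms present."""
    return sum(sum(t in p.lower() for t in terms) / len(terms) for p in passages)

def count_scorer(passages, terms):
    """Number of supporting passages."""
    return float(len(passages))

SCORERS = [coverage_scorer, count_scorer]

def rank(module_scores, weights=(1.0, 0.5)):
    """Hypothetical ranking function: weighted sum of the module scores."""
    return sum(w * s for w, s in zip(weights, module_scores))

def answer_question(query, corpus):
    terms, _ = analyze_query(query)
    if not terms:
        return []
    candidates = generate_candidates(search(terms, corpus), terms)

    def evaluate(candidate):
        support = supporting_passages(candidate, terms, corpus)
        return candidate, rank([scorer(support, terms) for scorer in SCORERS])

    # Each candidate is evaluated by a separate worker, mirroring the parallel modules.
    with ThreadPoolExecutor() as pool:
        scored = list(pool.map(evaluate, candidates))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

print(answer_question("Who discovered radium?", CORPUS))
```

In this toy run "Marie Curie" ranks above "Pierre Curie" because two passages support it rather than one.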

2. The computer-implemented method as claimed in claim 1, wherein said query context analysis includes determining, from said query, one or more predicate argument structures for each input query.

3. The computer-implemented method as claimed in claim 1, wherein said query context analysis includes determining, from said query, one or more lexical answer types for each input query.

4. The computer-implemented method as claimed in claim 1, further including extending said one or more searchable components using a functionality for term weighting and query expansion.

5. The computer-implemented method as claimed in claim 1, further comprising a candidate answer generation module operating in parallel on said passages retrieved by the said searchable components.

6. The computer-implemented method as claimed in claim 1, wherein said automatically scoring of supporting passages by multiple scorers includes conducting, in parallel, one or more analyses each producing a score.

7. The computer-implemented method as claimed in claim 6, wherein one score comprises a term match score obtained by implementing executable instructions for counting the number of terms in said supporting passage and determining if said number matches a number of terms in a candidate answer.
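One possible reading of the term match score in claim 7 is sketched below: count how many distinct candidate-answer terms occur in the supporting passage, then check whether that count equals the number of terms in the candidate answer. The tokenizer and the function name `term_match_score` are hypothetical, and the claim may admit other interpretations.

```python
import re

def tokenize(text):
    """Lowercase alphanumeric tokens; a deliberately simple, assumed tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())

def term_match_score(passage, candidate):
    """Return (matched term count, whether it equals the candidate's term count)."""
    passage_terms = set(tokenize(passage))
    candidate_terms = set(tokenize(candidate))
    matched = len(candidate_terms & passage_terms)
    return matched, matched > 0 and matched == len(candidate_terms)

print(term_match_score("Radium was discovered by Marie Curie.", "Marie Curie"))
print(term_match_score("Radium was discovered by Marie Curie.", "Pierre Curie"))
```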

8. The computer-implemented method as claimed in claim 6, wherein a further produced score comprises a textual alignment score obtained by implementing executable instructions for determining if placement of words in said supporting passages are in alignment with placement of words of said candidate answers.

9. The computer-implemented method as claimed in claim 8, wherein said determining if placement of words in said supporting passages are in alignment includes determining whether said words in said supporting passages are one of: a same order, a similar order, or with a similar distance between them.
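The alignment test of claims 8 and 9 (same order, similar order, or similar distance between words) might be approximated as follows. The scoring formula is an assumption chosen for illustration: a full score when the candidate's words appear in the passage in the same order and adjacent, a halved score when the order differs, and a penalty that grows with the number of extra words between them.

```python
import re

def tokenize(text):
    """Lowercase alphanumeric tokens; an assumed, simplistic tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())

def alignment_score(passage, candidate):
    """Score in [0, 1] for how closely word placement in the passage matches the candidate."""
    passage_tokens = tokenize(passage)
    candidate_tokens = tokenize(candidate)
    if not candidate_tokens or any(w not in passage_tokens for w in candidate_tokens):
        return 0.0  # some candidate word is missing from the passage
    # Position of the first occurrence of each candidate word in the passage.
    positions = [passage_tokens.index(w) for w in candidate_tokens]
    pairs = list(zip(positions, positions[1:]))
    in_order = all(a < b for a, b in pairs)
    # Words that sit between consecutive candidate words (adjacent words add 0).
    extra_gap = sum(max(abs(b - a) - 1, 0) for a, b in pairs)
    order_factor = 1.0 if in_order else 0.5
    return order_factor / (1 + extra_gap)

print(alignment_score("Marie Curie discovered radium.", "Marie Curie"))
print(alignment_score("Curie, Marie discovered radium.", "Marie Curie"))
print(alignment_score("Marie Sklodowska Curie discovered radium.", "Marie Curie"))
```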

10. The computer-implemented method as claimed in claim 6, wherein a further produced score comprises a deeper analysis score obtained by implementing executable instructions for determining the meaning of the supporting passages and input queries and computing how well lexical and semantic relations in the candidate answer passages are satisfied.

11. The computer-implemented method as claimed in claim 1, wherein a determined answer includes one of: a single query answer, or a ranked list of query answers.

12. The computer-implemented method as claimed in claim 11, wherein the generated query response is one of: the determined answer or an elaboration question, said elaboration question generated for delivery to a user, an elaboration question requiring user input information in response, and said user input response information being used in answer determining.

13. The computer-implemented method as claimed in claim 11, further comprising, prior to determining said query response, conducting an interactive session with said user and generating, for delivery to said user, one or more elaboration questions, an elaboration question requiring user input information in response, and said input response information being used in answer determining.

14. The computer-implemented method as claimed in claim 11, further comprising: determining if a query answer or ranked list of query answers is above a threshold rank level, and if below said threshold rank level, delivering a response to a user comprising one or more clarification questions, each clarification question requiring user input information in response, said user input information being added to said query.
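The threshold behaviour of claim 14 can be sketched as a small decision function. The threshold value, the response dictionary layout, and the wording of the clarification question are all hypothetical; the claim only specifies that a below-threshold result triggers clarification questions whose answers are added to the query.

```python
def build_response(ranked_answers, threshold=0.6):
    """Return the answers if the top one clears the threshold, else ask for clarification.

    ranked_answers: list of (answer, confidence) pairs, best first.
    """
    if ranked_answers and ranked_answers[0][1] >= threshold:
        return {"type": "answer", "answers": ranked_answers}
    return {
        "type": "clarification",
        "questions": ["Could you add more detail to your question?"],
    }

def refine_query(query, user_input):
    """Add the user's clarification to the original query before re-running it."""
    return f"{query} {user_input}".strip()

print(build_response([("Marie Curie", 0.9), ("Pierre Curie", 0.4)]))
print(build_response([("Marie Curie", 0.3)]))
print(refine_query("Who discovered it?", "radium"))
```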

15. The computer-implemented method as claimed in claim 11, wherein an input query or said query response or both said input query and query response is provided in accordance with one or more of multiple modalities including text, audio, image, video, tactile or gesture.

16. The computer-implemented method as claimed in claim 11, further comprising:

providing a previously obtained candidate answer ranking function operating on a collection of correctly scored examples by applying machine learning technique to a corpus of scored question answer pairs.

17. The computer-implemented method as claimed in claim 1, wherein said determining a single answer further comprises:

collecting results across all data structures having said scored candidate answers,

normalizing and merging candidate answers produced by a same answer scorer across multiple instances of the candidate answer, and aggregating the results; and,

implementing said candidate answer scoring function to produce said final candidate answer.
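The collecting, normalizing, merging and aggregating steps of claim 17 might look like the sketch below. The normalization rule (lowercasing and stripping periods), the choice of `max` to merge scores from the same scorer, and the weighted sum used as the final scoring function are assumptions for illustration only.

```python
from collections import defaultdict

def normalize(answer):
    """Map surface variants of an answer to one key (assumed rule: lowercase, no periods)."""
    return " ".join(answer.lower().replace(".", "").split())

def merge_candidates(scored_instances):
    """Merge (answer_text, scorer_name, score) triples across instances of the same answer.

    Scores from the same scorer are merged by keeping the maximum.
    """
    merged = defaultdict(dict)
    for answer, scorer, score in scored_instances:
        key = normalize(answer)
        merged[key][scorer] = max(score, merged[key].get(scorer, float("-inf")))
    return {answer: dict(scores) for answer, scores in merged.items()}

def final_answer(merged, weights):
    """Aggregate each answer's merged scores with a weighted sum and pick the best."""
    totals = {
        answer: sum(weights.get(scorer, 0.0) * value for scorer, value in scores.items())
        for answer, scores in merged.items()
    }
    return max(totals, key=totals.get) if totals else None

instances = [
    ("Marie Curie", "term_match", 0.8),
    ("marie curie", "term_match", 0.6),
    ("Marie Curie.", "alignment", 0.5),
    ("Pierre Curie", "term_match", 0.7),
]
merged = merge_candidates(instances)
print(merged)
print(final_answer(merged, {"term_match": 1.0, "alignment": 1.0}))
```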

18. The computer-implemented method as claimed in claim 1, wherein said applying a candidate answer scoring function includes one of:

performing context independent scoring where the answer is scored independently of the passage; and,

performing context dependent scoring where the answer score depends on the passage content.

19. The computer-implemented method as claimed in claim 16, wherein said applying said final candidate answer scoring function to produce said final candidate answer comprises: applying a logistic regression function or linear regression function to a complete feature set or subset.
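A logistic regression scoring function of the kind named in claim 19 can be sketched in a few lines. The feature names, weights and bias below are hypothetical; per claim 16 they would be learned from a corpus of scored question-answer pairs rather than set by hand. Passing a `weights` dictionary that covers only some features corresponds to using a feature subset.

```python
import math

def logistic_confidence(features, weights, bias=0.0):
    """Map a candidate's feature scores to a confidence in (0, 1).

    Only features that have a weight are used, so `weights` selects the subset.
    """
    z = bias + sum(weights[name] * value
                   for name, value in features.items() if name in weights)
    # Numerically stable sigmoid for both signs of z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Hypothetical weights; in practice these would come from training.
WEIGHTS = {"term_match": 2.0, "alignment": 1.5}

candidates = {
    "Marie Curie": {"term_match": 1.0, "alignment": 1.0, "unused_feature": 9.0},
    "Pierre Curie": {"term_match": 0.5, "alignment": 0.0},
}
ranked = sorted(
    ((name, logistic_confidence(feats, WEIGHTS, bias=-1.0)) for name, feats in candidates.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
print(ranked)
```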

20. A system for generating answers to questions based on any corpus of data comprising:

query analysis means for receiving an input query and performing query context analysis function to break down said input query into query terms, all query terms comprising both searchable and non-searchable components;

candidate answer generating means utilizing all searchable components and conducting a search in any corpus of data including structured and unstructured data to obtain passages potentially including candidate answers,

a data storage device for storing said obtained passages potentially including said candidate answers;

means for analyzing all retrieved passages and that passage's metadata to generate an output plurality of data structures including candidate answers based upon the analyzing;

a plurality of parallel operating means each for performing supporting passage retrieval operation upon the set of candidate answers, and for each candidate answer, traversing said data corpus and the said data storage device to find those passages having candidate answer in addition to query terms;

a plurality of scoring modules each for automatically scoring all candidate answers using the said supporting passages and producing a module score; and,

means for applying a candidate answer ranking function to the said module scores to determine one or more query answers, and, for generating a query response based on said one or more query answers for delivery to a user.

21. The system as claimed in claim 20, wherein an input query or said query response or both said input query and query response is provided in accordance with one or more of multiple modalities including text, audio, image, video, tactile or gesture.

22. The system as claimed in claim 20, wherein said query analysis means includes:

means for determining, from said query, one or more predicate argument structures for each input query; and,

means for determining, from said query, one or more lexical answer types for each input query.

23. The system as claimed in claim 20, wherein said query analysis means generates a first plurality of data structures each comprising said query terms including said searchable components, said system further including:

a first splitter means for initiating a parallel search for candidate answers by distributing said first plurality of data structures to enable concurrent search results processing providing said candidate answers.

24. The system as claimed in claim 20, wherein said candidate answer generating means generates a second plurality of data structures each comprising candidate answer sets, said system further including:

a second splitter means for splitting said candidate answer sets into separate data structures each including one or more candidate answers, and providing said data structures to said plurality of parallel operating means performing concurrent parallel supporting passage retrieval operations.
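The second splitter of claim 24 can be pictured as a function that cuts a candidate answer set into smaller data structures and hands each one to a parallel worker. The sketch below uses a thread pool as the parallel operating means; the chunk size, function names and substring-based retrieval are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def split(candidate_set, chunk_size=1):
    """Split a list of candidate answers into separate chunks of at most chunk_size."""
    return [candidate_set[i:i + chunk_size]
            for i in range(0, len(candidate_set), chunk_size)]

def retrieve_support(chunk, query_terms, corpus):
    """For each candidate in the chunk, find passages with the candidate and a query term."""
    return {
        candidate: [p for p in corpus
                    if candidate.lower() in p.lower()
                    and any(t in p.lower() for t in query_terms)]
        for candidate in chunk
    }

def parallel_support(candidates, query_terms, corpus, chunk_size=1):
    """Distribute the chunks to concurrent workers and merge their results."""
    results = {}
    with ThreadPoolExecutor() as pool:
        worker = lambda chunk: retrieve_support(chunk, query_terms, corpus)
        for partial in pool.map(worker, split(candidates, chunk_size)):
            results.update(partial)
    return results

corpus = [
    "Marie Curie discovered radium in 1898.",
    "Radium was discovered by Marie Curie and Pierre Curie.",
]
support = parallel_support(["Marie Curie", "Pierre Curie"], ["radium"], corpus)
print({candidate: len(passages) for candidate, passages in support.items()})
```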

25. The system as claimed in claim 20, wherein each said plurality of scoring modules for automatically scoring all candidate answers using the said supporting passages comprises:

means for conducting, in parallel, one or more analyses each producing a score, wherein one score comprises a term match score obtained by implementing executable instructions for counting the number of terms in said supporting passage and determining if said number matches a number of terms in a candidate answer; and,

wherein a further score comprises a textual alignment score obtained by implementing executable instructions for determining if placement of words in said supporting passages are in alignment with placement of words of said candidate answers; and,

wherein a further score comprises a deeper analysis score obtained by implementing executable instructions for determining the meaning of the supporting passages and input queries by analyzing lexical or semantic relations.

26. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform method steps for generating answers to questions based on any corpus of data, said method steps including the steps of:

receiving an input query and performing query context analysis upon said query to break down said input query into query terms, all query terms comprising both searchable and non-searchable components;

utilizing all searchable components and conducting a search in any corpus of data including structured and unstructured data to obtain passages potentially including candidate answers, all passages potentially including candidate answers being stored in a data storage device;

analyzing all retrieved passages and that passage's metadata, in a candidate answer generation module, to generate an output plurality of data structures including candidate answers based upon the analyzing;

performing, by each of a plurality of parallel operating modules, supporting passage retrieval operation upon the set of candidate answers, and for each candidate answer, traversing the said data corpus and the said data storage device to find those passages having candidate answer in addition to query terms;

automatically scoring all candidate answers using the said supporting passages by a plurality of scoring modules, each producing a module score;

applying a candidate answer ranking function to the said module scores to determine one or more query answers; and,

generating a query response based on said one or more query answers for delivery to a user.

27. A method of deploying a computer program product for generating answers to questions based on any corpus of data, wherein, when executed, the computer program performs the steps of:

receiving an input query and performing query context analysis upon said query to break down said input query into query terms, all query terms comprising both searchable and non-searchable components;

utilizing all searchable components and conducting a search in any corpus of data including structured and unstructured data to obtain passages potentially including candidate answers, all passages potentially including candidate answers being stored in a data storage device;

analyzing all retrieved passages and that passage's metadata, in a candidate answer generation module, to generate an output plurality of data structures including candidate answers based upon the analyzing;

performing, by each of a plurality of parallel operating modules, supporting passage retrieval operation upon the set of candidate answers, and for each candidate answer, traversing the said data corpus and the said data storage device to find those passages having candidate answer in addition to query terms;

automatically scoring all candidate answers using the said supporting passages by a plurality of scoring modules, each producing a module score;

applying a candidate answer ranking function to the said module scores to determine one or more query answers; and,

generating a query response based on said one or more query answers for delivery to a user.