CA2457693A1 - Method and system for enhanced data searching - Google Patents

Method and system for enhanced data searching

Info

Publication number
CA2457693A1
Authority
CA
Canada
Prior art keywords
meaningful
grammatical
terms
sentence
computer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CA002457693A
Other languages
French (fr)
Other versions
CA2457693C (en)
Inventor
Giovanni B. Marchisio
Krzysztof Koperski
Jisheng Liang
Alejandro Murua
Thien Nguyen
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
VCVC III LLC
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Publication of CA2457693A1
Application granted
Publication of CA2457693C
Anticipated expiration
Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G06F40/35 Discourse or dialogue representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/313 Selection or weighting of terms for indexing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99931 Database or file accessing
    • Y10S707/99933 Query processing, i.e. searching

Abstract

Methods and systems for syntactically indexing and searching data sets to achieve more accurate search results are provided. Example embodiments provide a Syntactic Query Engine ("SQE") that parses, indexes, and stores a data set, as well as processes natural language queries subsequently submitted against the data set. The SQE comprises a Query Preprocessor, a Data Set Preprocessor, a Query Builder, a Data Set Indexer, an Enhanced Natural Language Parser ("ENLP"), a data set repository, and, in some embodiments, a user interface.
After preprocessing the data set, the SQE parses the data set and determines the syntactic and grammatical roles of each term to generate enhanced data representations for each object in the data set. The SQE indexes and stores these enhanced data representations in the data set repository. Upon subsequently receiving a query, the SQE parses the query similarly and searches the indexed stored data set to locate data that contains similar terms used in similar grammatical roles. In this manner, the SQE is able to achieve more contextually accurate search results more frequently than using traditional search engines.
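The contrast with keyword search can be illustrated with a minimal sketch. It assumes a toy `parse` function that only handles three-word "subject verb object" sentences; that function, the role names, and the in-memory index are hypothetical stand-ins for the ENLP and the data set repository, not the SQE itself.

```python
from collections import defaultdict

def parse(sentence):
    """Toy stand-in for a parser: assumes exactly 'subject verb object'."""
    subject, verb, obj = sentence.lower().split()
    return {("subject", subject), ("verb", verb), ("object", obj)}

def build_index(sentences):
    """Map each (role, term) pair to the ids of the sentences containing it."""
    index = defaultdict(set)
    for sid, sentence in enumerate(sentences):
        for pair in parse(sentence):
            index[pair].add(sid)
    return index

def syntactic_search(index, query):
    """Return ids of sentences that use every query term in the same role."""
    postings = [index.get(pair, set()) for pair in parse(query)]
    return set.intersection(*postings)

def keyword_search(sentences, query):
    """Baseline: ids of sentences containing all query words in any role."""
    words = set(query.lower().split())
    return {i for i, s in enumerate(sentences) if words <= set(s.lower().split())}

corpus = ["Dog bites man", "Man bites dog"]
index = build_index(corpus)
```

With this corpus, `syntactic_search(index, "dog bites man")` selects only the first sentence, while `keyword_search(corpus, "dog bites man")` cannot tell the two apart.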

Claims (221)

1. A method in a computer system for transforming a document of a data set into a canonical representation, the document having a plurality of sentences, each sentence having a plurality of terms, comprising:
for each sentence, parsing the sentence to generate a parse structure having a plurality of syntactic elements;
determining a set of meaningful terms of the sentence from the syntactic elements;
determining from the structure of the parse structure and the syntactic elements a grammatical role for each meaningful term;
determining an additional grammatical role for at least one of the meaningful terms, such that the at least one meaningful term is associated with at least two different grammatical roles;
and storing in an enhanced data representation data structure a representation of each association between a meaningful term and its determined grammatical roles, in a manner that indicates a grammatical relationship between a plurality of the meaningful terms and such that at least one meaningful term is associated with a plurality of grammatical relationships.
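One way to picture the storing step of claim 1 is a small container that keeps, for each sentence, the roles of every term and the relationships linking them. The class name, field names, role labels, and the example sentence ("Argentina exports gas to Chile") are all assumptions made for illustration; the claim does not prescribe a layout.

```python
from dataclasses import dataclass, field

@dataclass
class EnhancedDataRepresentation:
    """Hypothetical container for one sentence's terms, roles and relationships."""
    roles: dict = field(default_factory=dict)        # term -> set of roles
    relationships: set = field(default_factory=set)  # (term, role, term, role)

    def add_role(self, term, role):
        self.roles.setdefault(term, set()).add(role)

    def relate(self, term_a, role_a, term_b, role_b):
        """Record both roles and the relationship that links the two terms."""
        self.add_role(term_a, role_a)
        self.add_role(term_b, role_b)
        self.relationships.add((term_a, role_a, term_b, role_b))

    def relationships_of(self, term):
        """Every stored relationship in which the term takes part."""
        return {r for r in self.relationships if term in (r[0], r[2])}

# Illustrative sentence: "Argentina exports gas to Chile"
edr = EnhancedDataRepresentation()
edr.relate("argentina", "subject", "export", "verb")
edr.relate("export", "verb", "gas", "object")
edr.relate("export", "verb", "chile", "verb_modifier")
# Additional role: 'chile' is also stored as an object of the verb.
edr.relate("export", "verb", "chile", "object")
```

Here `chile` ends up with two different roles and takes part in two relationships, which is the condition the claim places on at least one meaningful term.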
2. The method of claim 1 wherein heuristics are used to determine the additional grammatical role for the at least one of the meaningful terms.
3. The method of claim 2 wherein a meaningful term is associated with a verb modifier as the determined grammatical role and is associated with an object as the additional grammatical role.
4. The method of claim 2 wherein a meaningful term is associated with a verb modifier as the determined grammatical role and is associated with a subject as the additional grammatical role.
5. The method of claim 2 wherein a meaningful term is associated with a verb modifier as the determined grammatical role and is associated with a verb as the additional grammatical role.
6. The method of claim 2 wherein a meaningful term is associated with a subject as the determined grammatical role and is associated with an object as the additional grammatical role.
7. The method of claim 2 wherein a meaningful term is associated with an object as the determined grammatical role and is associated with a subject as the additional grammatical role.
8. The method of claim 2 wherein a meaningful term is associated with a noun modifier as the determined grammatical role and is associated with a subject as the additional grammatical role.
9. The method of claim 2 wherein a meaningful term is associated with a noun modifier as the determined grammatical role and is associated with an object as the additional grammatical role.
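The pairings in claims 3 to 9 can be collected into a lookup table, sketched below. The table contents come from those claims; the function name and role labels are hypothetical, and the claims do not say which pairing a heuristic picks for a given sentence, so the sketch simply lists every permitted one.

```python
# Determined role -> additional roles, read from claims 3 to 9.
ADDITIONAL_ROLES = {
    "verb_modifier": ("object", "subject", "verb"),
    "subject": ("object",),
    "object": ("subject",),
    "noun_modifier": ("subject", "object"),
}

def expand_roles(term, determined_role, allowed=ADDITIONAL_ROLES):
    """Return (term, role) pairs: the determined role first, then each additional role."""
    extras = allowed.get(determined_role, ())
    return [(term, determined_role)] + [(term, role) for role in extras]
```

A real heuristic would presumably filter this list using the sentence context rather than keep every candidate.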
10. The method of claim 1 wherein the determined additional grammatical role is a part of grammar that is not implied by the position of the at least one meaningful term relative to the structure of the sentence.
11. The method of claim 1 wherein heuristics are used to determine which grammatical relationships are to be stored in the enhanced data representation data structure.
12. The method of claim 1 wherein the determining the grammatical role for each meaningful term and the determining of the additional grammatical role for at least one of the meaningful terms yields a plurality of grammatical relationships between meaningful terms that are identical.
13. The method of claim 1 wherein the determining of a grammatical role for each meaningful term includes determining whether the term is at least one of a subject, object, verb, part of a prepositional phrase, noun modifier, and verb modifier.
14. The method of claim 1 wherein the document is part of a corpus of heterogeneous documents.
15. The method of claim 1 wherein the document comprises text and graphics and a sentence is created to correspond to and to describe each portion of graphics.
16. The method of claim 1 wherein the enhanced data representation data structure is used to index a corpus of documents.
17. The method of claim 1 wherein the enhanced data representation data structure is used to execute a query against objects in a corpus of documents.
18. The method of claim 17 wherein results are returned that satisfy the query when an object in the corpus contains similar terms associated with similar grammatical roles to the terms and their associated roles as stored in the enhanced data representation.
19. The method of claim 18 wherein the objects in the corpus are sentences and sentences are returned that satisfy the query.
20. The method of claim 18, further comprising returning paragraphs that contain similar terms to those found in an indicated sentence.
21. The method of claim 18, further comprising returning documents that contain similar terms to those found in an indicated sentence.
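Claims 20 and 21 return larger units (paragraphs or documents) whose terms resemble those of an indicated sentence. The sketch below assumes that "similar" means Jaccard overlap of word sets above a threshold; that measure, the stopword list, and the threshold value are illustrative assumptions, not taken from the claims.

```python
def terms(text, stopwords=frozenset({"the", "a", "an", "of", "to", "in"})):
    """Lowercased word set minus a small, hypothetical stopword list."""
    return {w.strip(".,;").lower() for w in text.split()} - stopwords - {""}

def jaccard(a, b):
    """Overlap of two term sets, from 0.0 (disjoint) to 1.0 (identical)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def similar_units(indicated_sentence, units, threshold=0.2):
    """Rank paragraphs or documents by term overlap with the indicated sentence."""
    target = terms(indicated_sentence)
    scored = [(jaccard(target, terms(u)), u) for u in units]
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [u for score, u in ranked if score >= threshold]
```

The same function serves both claims, since only the granularity of `units` changes.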
22. The method of claim 17 wherein terms that are associated with designated grammatical roles are returned for each object in the corpus that contains similar terms associated with similar grammatical roles to the terms and associated roles of designated relationships from the enhanced data representation data structure.
23. The method of claim 22 wherein heuristics are used to determine the designated relationships from the enhanced data representation data structure.
24. The method of claim 17 further comprising adding additional grammatical relationships to the enhanced data representation data structure to be used to execute a query against objects in a corpus of documents.
25. The method of claim 24 wherein heuristics are used to determine the additional grammatical relationships.
26. The method of claim 24 wherein at least one of entailed verbs and related verbs are used to add additional grammatical relationships.
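Adding relationships through entailed and related verbs might look like the following sketch. The two verb tables are hypothetical placeholders (a real system would draw them from a lexical resource), and relationships are simplified to (subject, verb, object) triples for brevity.

```python
# Hypothetical verb tables for illustration only.
ENTAILED = {"divorce": {"marry"}}
RELATED = {"buy": {"purchase", "acquire"}}

def expand_relationships(relationships, entailed=ENTAILED, related=RELATED):
    """Add a copy of each (subject, verb, object) triple for every entailed or related verb."""
    expanded = set(relationships)
    for subject, verb, obj in relationships:
        for alt in entailed.get(verb, set()) | related.get(verb, set()):
            expanded.add((subject, alt, obj))
    return expanded
```

Running the query against the expanded set lets a query about "buy" also reach sentences phrased with "purchase" or "acquire".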
27. The method of claim 17 wherein weighted results are returned that satisfy the query.
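The claim does not say how results are weighted. As one possible reading, the sketch below scores each corpus object by summing per-role weights over the (term, role) pairs it shares with the query; the weight values and function names are assumptions.

```python
# Hypothetical per-role weights; the source does not specify a weighting scheme.
ROLE_WEIGHTS = {
    "subject": 1.0, "verb": 1.0, "object": 1.0,
    "verb_modifier": 0.5, "noun_modifier": 0.5,
}

def score(query_pairs, object_pairs, weights=ROLE_WEIGHTS):
    """Sum the weights of (term, role) pairs shared by the query and a corpus object."""
    return sum(weights.get(role, 0.0) for term, role in query_pairs & object_pairs)

def weighted_results(query_pairs, corpus):
    """Return (score, object_id) for every object with a nonzero score, best first."""
    scored = [(score(query_pairs, pairs), oid) for oid, pairs in corpus.items()]
    return sorted([s for s in scored if s[0] > 0], key=lambda s: (-s[0], s[1]))
```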
28. A computer-readable memory medium containing instructions for controlling a computer processor to transform a document of a data set into a canonical representation, the document having a plurality of sentences, each sentence having a plurality of terms, by:
for each sentence, parsing the sentence to generate a parse structure having a plurality of syntactic elements;
determining a set of meaningful terms of the sentence from the syntactic elements;
determining from the structure of the parse structure and the syntactic elements a grammatical role for each meaningful term;
determining an additional grammatical role for at least one of the meaningful terms, such that the at least one meaningful term is associated with at least two different grammatical roles;
and storing in an enhanced data representation data structure a representation of each association between a meaningful term and its determined grammatical roles, in a manner that indicates a grammatical relationship between a plurality of the meaningful terms and such that at least one meaningful term is associated with a plurality of grammatical relationships.
29. The computer-readable memory medium of claim 28 wherein heuristics are used to determine the additional grammatical role for the at least one of the meaningful terms.
30. The computer-readable memory medium of claim 29 wherein a meaningful term is associated with a verb modifier as the determined grammatical role and is associated with an object as the additional grammatical role.
31. The computer-readable memory medium of claim 29 wherein a meaningful term is associated with a verb modifier as the determined grammatical role and is associated with a subject as the additional grammatical role.
32. The computer-readable memory medium of claim 29 wherein a meaningful term is associated with a verb modifier as the determined grammatical role and is associated with a verb as the additional grammatical role.
33. The computer-readable memory medium of claim 29 wherein a meaningful term is associated with a subject as the determined grammatical role and is associated with an object as the additional grammatical role.
34. The computer-readable memory medium of claim 29 wherein a meaningful term is associated with an object as the determined grammatical role and is associated with a subject as the additional grammatical role.
35. The computer-readable memory medium of claim 29 wherein a meaningful term is associated with a noun modifier as the determined grammatical role and is associated with a subject as the additional grammatical role.
36. The computer-readable memory medium of claim 29 wherein a meaningful term is associated with a noun modifier as the determined grammatical role and is associated with an object as the additional grammatical role.
37. The computer-readable memory medium of claim 28 wherein the determined additional grammatical role is a part of grammar that is not implied by the position of the at least one meaningful term relative to the structure of the sentence.
38. The computer-readable memory medium of claim 28 wherein heuristics are used to determine which grammatical relationships are to be stored in the enhanced data representation data structure.
39. The computer-readable memory medium of claim 28 wherein the determining the grammatical role for each meaningful term and the determining of the additional grammatical role for at least one of the meaningful terms yields a plurality of grammatical relationships between meaningful terms that are identical.
40. The computer-readable memory medium of claim 28 wherein the determining of a grammatical role for each meaningful term includes determining whether the term is at least one of a subject, object, verb, part of a prepositional phrase, noun modifier, and verb modifier.
41. The computer-readable memory medium of claim 28 wherein the document is part of a corpus of heterogeneous documents.
42. The computer-readable memory medium of claim 28 wherein the document comprises text and graphics and a sentence is created to correspond to and to describe each portion of graphics.
43. The computer-readable memory medium of claim 28 wherein the enhanced data representation data structure is used to index a corpus of documents.
44. The computer-readable memory medium of claim 28 wherein the enhanced data representation data structure is used to execute a query against objects in a corpus of documents.
45. The computer-readable memory medium of claim 44 wherein results are returned that satisfy the query when an object in the corpus contains similar terms associated with similar grammatical roles to the terms and their associated roles as stored in the enhanced data representation.
46. The computer-readable memory medium of claim 45 wherein the objects in the corpus are sentences and sentences are returned that satisfy the query.
47. The computer-readable memory medium of claim 45, the instructions further controlling the computer processor by returning paragraphs that contain similar terms to those found in an indicated sentence.
48. The computer-readable memory medium of claim 45, the instructions further controlling the computer processor by returning documents that contain similar terms to those found in an indicated sentence.
49. The computer-readable memory medium of claim 44 wherein terms that are associated with designated grammatical roles are returned for each object in the corpus that contains similar terms associated with similar grammatical roles to the terms and associated roles of designated relationships from the enhanced data representation data structure.
50. The computer-readable memory medium of claim 49 wherein heuristics are used to determine the designated relationships from the enhanced data representation data structure.
51. The computer-readable memory medium of claim 44, the instructions further controlling the computer processor by adding additional grammatical relationships to the enhanced data representation data structure to be used to execute a query against objects in a corpus of documents.
52. The computer-readable memory medium of claim 51 wherein heuristics are used to determine the additional grammatical relationships.
53. The computer-readable memory medium of claim 51 wherein at least one of entailed verbs and related verbs are used to add additional grammatical relationships.
54. The computer-readable memory medium of claim 44 wherein weighted results are returned that satisfy the query.
55. A syntactic query engine for transforming a document of a data set into a canonical representation, the document having a plurality of sentences, each sentence having a plurality of terms, comprising:
a parser that is structured to decompose each sentence to generate a parse structure for the sentence having a plurality of syntactic elements;
and a postprocessor that is structured to receive from the parser the parse structure of the sentence;
determine a set of meaningful terms of the sentence from the syntactic elements;
determine from the structure of the parse structure and the syntactic elements a grammatical role for each meaningful term;
determine an additional grammatical role for at least one of the meaningful terms, such that the at least one meaningful term is associated with at least two different grammatical roles;
and store, in an enhanced data representation data structure, a representation of each association between a meaningful term and its determined grammatical roles, in a manner that indicates a grammatical relationship between a plurality of the meaningful terms and such that at least one meaningful term is associated with a plurality of grammatical relationships.
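The division of labour between the parser and the postprocessor in claim 55 can be sketched as two small classes. Everything here is hypothetical: the lexicon-based `ToyParser`, the tag names, and the position-based role rules merely stand in for a real parser and real heuristics, and the sketch covers only meaningful-term selection and the first role assignment, not the additional role or the storing step.

```python
class ToyParser:
    """Hypothetical parser: tags each word using a tiny hand-written lexicon."""
    LEXICON = {"the": "DET", "a": "DET", "exports": "VERB", "to": "PREP"}

    def parse(self, sentence):
        words = sentence.lower().rstrip(".").split()
        return [(w, self.LEXICON.get(w, "NOUN")) for w in words]

class Postprocessor:
    """Turns a parse structure into (term, role) pairs for meaningful terms only."""
    MEANINGFUL_TAGS = {"NOUN", "VERB"}

    def process(self, parse_structure):
        pairs, seen_verb, after_prep = [], False, False
        for term, tag in parse_structure:
            if tag == "PREP":
                after_prep = True
            if tag not in self.MEANINGFUL_TAGS:
                continue  # determiners and prepositions are not stored as terms
            if tag == "VERB":
                role, seen_verb = "verb", True
            elif not seen_verb:
                role = "subject"
            elif after_prep:
                role = "verb_modifier"
            else:
                role = "object"
            pairs.append((term, role))
        return pairs
```

Keeping the two stages separate means the parser can be replaced without touching the role-assignment logic, provided the parse structure keeps the same shape.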
56. The query engine of claim 55 wherein the postprocessor uses heuristics to determine the additional grammatical role for the at least one of the meaningful terms.
57. The query engine of claim 56 wherein the postprocessor associates a meaningful term with a verb modifier as the determined grammatical role and with an object as the additional grammatical role.
58. The query engine of claim 56 wherein the postprocessor associates a meaningful term with a verb modifier as the determined grammatical role and with a subject as the additional grammatical role.
59. The query engine of claim 56 wherein the postprocessor associates a meaningful term with a verb modifier as the determined grammatical role and with a verb as the additional grammatical role.
60. The query engine of claim 56 wherein the postprocessor associates a meaningful term with a subject as the determined grammatical role and with an object as the additional grammatical role.
61. The query engine of claim 56 wherein the postprocessor associates a meaningful term with an object as the determined grammatical role and with a subject as the additional grammatical role.
62. The query engine of claim 56 wherein the postprocessor associates a meaningful term with a noun modifier as the determined grammatical role and with a subject as the additional grammatical role.
63. The query engine of claim 56 wherein the postprocessor associates a meaningful term with a noun modifier as the determined grammatical role and with an object as the additional grammatical role.
64. The query engine of claim 55 wherein the determined additional grammatical role is a part of grammar that is not implied by the position of the at least one meaningful term relative to the structure of the sentence.
65. The query engine of claim 55 wherein the postprocessor uses heuristics to determine which grammatical relationships are to be stored in the enhanced data representation data structure.
66. The query engine of claim 55 wherein the determining the grammatical role for each meaningful term and the determining of the additional grammatical role for at least one of the meaningful terms yields a plurality of grammatical relationships between meaningful terms that are identical.
67. The query engine of claim 55 wherein the determining of a grammatical role for each meaningful term includes determining whether the term is at least one of a subject, object, verb, part of a prepositional phrase, noun modifier, and verb modifier.
68. The query engine of claim 55 wherein the document is part of a corpus of heterogeneous documents.
69. The query engine of claim 55 wherein the document comprises text and graphics and a sentence is created to correspond to and to describe each portion of graphics.
70. The query engine of claim 55 wherein the enhanced data representation data structure is used to index a corpus of documents.
71. The query engine of claim 55, further comprising a query processor that uses the enhanced data representation data structure to execute a query against objects in a corpus of documents.
72. The query engine of claim 71 wherein the query processor returns results that satisfy the query when an object in the corpus contains similar terms associated with similar grammatical roles to the terms and their associated roles as stored in the enhanced data representation.
73. The query engine of claim 72 wherein the objects in the corpus are sentences and the query processor returns sentences that satisfy the query.
74. The query engine of claim 72 wherein the query processor returns paragraphs that contain similar terms to those found in an indicated sentence.
75. The query engine of claim 72 wherein the query processor returns documents that contain similar terms to those found in an indicated sentence.
76. The query engine of claim 71 wherein the query processor returns terms that are associated with designated grammatical roles for each object in the corpus that contains similar terms associated with similar grammatical roles to the terms and associated roles of designated relationships from the enhanced data representation data structure.
77. The query engine of claim 76 wherein heuristics are used to determine the designated relationships from the enhanced data representation data structure.
78. The query engine of claim 71 wherein the query processor adds additional grammatical relationships to the enhanced data representation data structure to be used to execute a query against objects in a corpus of documents.
79. The query engine of claim 78 wherein heuristics are used to determine the additional grammatical relationships.
80. The query engine of claim 78 wherein the query processor uses at least one of entailed verbs and related verbs to add additional grammatical relationships.
81. The query engine of claim 71 wherein the query processor returns weighted results that satisfy the query.
82. A method in a computer system for transforming a document of a data set into a canonical representation, the document having a plurality of sentences, each sentence having a plurality of terms, comprising:
for each sentence, parsing the sentence to generate a parse structure having a plurality of syntactic elements;
determining a set of meaningful terms of the sentence from these syntactic elements;
determining from the structure of the parse structure and the syntactic elements a grammatical role for each meaningful term, wherein at least one of the grammatical roles for a meaningful term is at least one of a verb modifier of a prepositional phrase and a noun modifier of a noun phrase;
and storing in an enhanced data representation data structure a representation of each meaningful term associated with its determined grammatical role, in a manner that indicates a grammatical relationship between a plurality of the meaningful units.
83. The method of claim 82, further comprising storing the full grammar of the sentence.
84. The method of claim 82, further comprising, when it is determined that a noun modifier grammatical role is associated with one of the meaningful terms, associating the one of the meaningful terms with a subject grammatical role, thereby indicating that the one of the meaningful terms is to be stored also as a subject of the sentence.
85. The method of claim 84 wherein the noun modifier is a modifier of a noun that is used as an object of the sentence.
86. The method of claim 84 wherein the noun modifier is a modifier of a noun that is used as a subject of the sentence.
87. The method of claim 82, further comprising, when it is determined that a noun modifier grammatical role is associated with one of the meaningful terms, associating the one of the meaningful terms with an object grammatical role, thereby indicating that the one of the meaningful terms is to be stored also as an object of the sentence.
88. The method of claim 87 wherein the noun modifier is a modifier of a noun that is stored as an object of the sentence.
89. The method of claim 87 wherein the noun modifier is a modifier of a noun that is used as a subject of the sentence.
90. The method of claim 82, further comprising, when it is determined that a verb modifier of a prepositional phrase is a grammatical role associated with one of the meaningful terms, associating the one of the meaningful terms with an object grammatical role, thereby indicating that the one of the meaningful terms is to be stored also as an object of the sentence.
91. The method of claim 82 wherein heuristics are used to determine which grammatical relationships are to be stored in the enhanced data representation data structure.
92. The method of claim 82 wherein a plurality of grammatical relationships between meaningful terms that are identical are stored in the enhanced data representation data structure.
93. The method of claim 82 wherein the document is part of a corpus of heterogeneous documents.
94. The method of claim 82 wherein the document comprises text and graphics and a sentence is created to correspond to and to describe each portion of graphics.
95. The method of claim 82 wherein the enhanced data representation data structure is used to index a corpus of documents.
96. The method of claim 82 wherein the enhanced data representation data structure is used to execute a query against objects in a corpus of documents.
97. The method of claim 96 wherein results are returned that satisfy the query when an object in the corpus contains similar terms associated with similar grammatical roles to the terms and their associated roles as stored in the enhanced data representation.
98. The method of claim 97 wherein the objects in the corpus are sentences and sentences are returned that satisfy the query.
99. The method of claim 97, further comprising returning paragraphs that contain similar terms to those found in an indicated sentence.
100. The method of claim 97, further comprising returning documents that contain similar terms to those found in an indicated sentence.
101. The method of claim 96 wherein terms that are associated with designated grammatical roles are returned for each object in the corpus that contains similar terms associated with similar grammatical roles to the terms and associated roles of designated relationships from the enhanced data representation data structure.
102. The method of claim 101 wherein heuristics are used to determine the designated relationships from the enhanced data representation data structure.
103. The method of claim 96 further comprising adding additional grammatical relationships to the enhanced data representation data structure to be used to execute a query against objects in a corpus of documents.
104. The method of claim 103 wherein heuristics are used to determine the additional grammatical relationships.
105. The method of claim 103 wherein at least one of entailed verbs and related verbs are used to add additional grammatical relationships.
106. The method of claim 96 wherein weighted results are returned that satisfy the query.
107. A computer-readable memory medium containing instructions for controlling a computer processor to transform a document of a data set into a canonical representation, the document having a plurality of sentences, each sentence having a plurality of terms, by:
for each sentence, parsing the sentence to generate a parse structure having a plurality of syntactic elements;
determining a set of meaningful terms of the sentence from these syntactic elements;
determining from the structure of the parse structure and the syntactic elements a grammatical role for each meaningful term, wherein at least one of the grammatical roles for a meaningful term is at least one of a verb modifier of a prepositional phrase and a noun modifier of a noun phrase;
and storing in an enhanced data representation data structure a representation of each meaningful term associated with its determined grammatical role, in a manner that indicates a grammatical relationship between a plurality of the meaningful units.
108. The computer-readable memory medium of claim 107, the instructions further controlling the computer processor to store the full grammar of the sentence.
109. The computer-readable memory medium of claim 107, the instructions further controlling the computer processor by, when it is determined that a noun modifier grammatical role is associated with one of the meaningful terms, associating the one of the meaningful terms with a subject grammatical role, thereby indicating that the one of the meaningful terms is to be stored also as a subject of the sentence.
110. The computer-readable memory medium of claim 109 wherein the noun modifier is a modifier of a noun that is used as an object of the sentence.
111. The computer-readable memory medium of claim 109 wherein the noun modifier is a modifier of a noun that is used as a subject of the sentence.
112. The computer-readable memory medium of claim 107, the instructions further controlling the computer processor by, when it is determined that a noun modifier grammatical role is associated with one of the meaningful terms, associating the one of the meaningful terms with an object grammatical role, thereby indicating that the one of the meaningful terms is to be stored also as an object of the sentence.
113. The computer-readable memory medium of claim 112 wherein the noun modifier is a modifier of a noun that is stored as an object of the sentence.
114. The computer-readable memory medium of claim 109 wherein the noun modifier is a modifier of a noun that is used as a subject of the sentence.
115. The computer-readable memory medium of claim 107, the instructions further controlling the computer processor by, when it is determined that a verb modifier of a prepositional phrase is a grammatical role associated with one of the meaningful terms, associating the one of the meaningful terms with an object grammatical role, thereby indicating that the one of the meaningful terms is to be stored also as an object of the sentence.
116. The computer-readable memory medium of claim 107 wherein heuristics are used to determine which grammatical relationships are to be stored in the enhanced data representation data structure.
117. The computer-readable memory medium of claim 107 wherein a plurality of grammatical relationships between meaningful terms that are identical are stored in the enhanced data representation data structure.
118. The computer-readable memory medium of claim 107 wherein the document is part of a corpus of heterogeneous documents.
119. The computer-readable memory medium of claim 107 wherein the document comprises text and graphics and a sentence is created to correspond to and to describe each portion of graphics.
120. The computer-readable memory medium of claim 107 wherein the enhanced data representation data structure is used to index a corpus of documents.
121. The computer-readable memory medium of claim 107 wherein the enhanced data representation data structure is used to execute a query against objects in a corpus of documents.
122. The computer-readable memory medium of claim 121 wherein results are returned that satisfy the query when an object in the corpus contains similar terms associated with similar grammatical roles to the terms and their associated roles as stored in the enhanced data representation.
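As a rough illustration of the matching described in claim 122, the following minimal Python sketch treats each indexed sentence as a mapping from grammatical role to a set of terms and returns the sentences whose terms fill the same roles as the query. The names `normalize`, `matches`, and `search`, the role labels, and the use of case-insensitive equality as the test for "similar" terms are assumptions made for illustration; the claims do not specify them.

```python
from typing import Dict, List, Set

# Role labels ("subject", "verb", "object") are illustrative assumptions.
Roles = Dict[str, Set[str]]


def normalize(term: str) -> str:
    """Assumed similarity test: terms are similar if equal ignoring case."""
    return term.strip().lower()


def matches(query: Roles, candidate: Roles) -> bool:
    """True when every role in the query is filled by a similar term in the candidate."""
    for role, wanted in query.items():
        have = {normalize(t) for t in candidate.get(role, set())}
        if not any(normalize(t) in have for t in wanted):
            return False
    return True


def search(query: Roles, corpus: Dict[str, Roles]) -> List[str]:
    """Return the ids of corpus objects (here, sentences) that satisfy the query."""
    return [sid for sid, roles in corpus.items() if matches(query, roles)]


# Hypothetical two-sentence corpus, already decomposed into roles.
corpus = {
    "s1": {"subject": {"Argentina"}, "verb": {"import"}, "object": {"gas"}},
    "s2": {"subject": {"Chile"}, "verb": {"import"}, "object": {"Argentina"}},
}
query = {"subject": {"argentina"}, "verb": {"import"}}
results = search(query, corpus)
```

In this sketch `results` contains only `s1`: the sentence `s2` also mentions Argentina, but as an object rather than as a subject, so a role-aware query does not return it.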
123. The computer-readable memory medium of claim 122 wherein the objects in the corpus are sentences and sentences are returned that satisfy the query.
124. The computer-readable memory medium of claim 122, the instructions further controlling the computer processor to return paragraphs that contain similar terms to those found in an indicated sentence.
125. The computer-readable memory medium of claim 122, the instructions further controlling the computer processor to return documents that contain similar terms to those found in an indicated sentence.
126. The computer-readable memory medium of claim 121 wherein terms that are associated with designated grammatical roles are returned for each object in the corpus that contains similar terms associated with similar grammatical roles to the terms and associated roles of designated relationships from the enhanced data representation data structure.
127. The computer-readable memory medium of claim 126 wherein heuristics are used to determine the designated relationships from the enhanced data representation data structure.
128. The computer-readable memory medium of claim 121, the instructions further controlling the computer processor to add additional grammatical relationships to the enhanced data representation data structure to be used to execute a query against objects in a corpus of documents.
129. The computer-readable memory medium of claim 128 wherein heuristics are used to determine the additional grammatical relationships.
130. The computer-readable memory medium of claim 128 wherein at least one of entailed verbs and related verbs are used to add additional grammatical relationships.
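One way to picture the expansion described in claims 128 to 130 is the short Python sketch below, which adds a copy of each query relationship for every entailed or related verb. The triple layout, the `expand` function, and the hand-written `RELATED_VERBS` lookup are hypothetical; the claims do not say how entailed or related verbs are obtained.

```python
from typing import Dict, List, Set, Tuple

# A grammatical relationship as a (subject, verb, object) triple; an assumed shape.
Relationship = Tuple[str, str, str]

# Hypothetical hand-written lookup of entailed and related verbs.
RELATED_VERBS: Dict[str, Set[str]] = {
    "buy": {"purchase", "acquire"},
    "snore": {"sleep"},  # snoring entails sleeping
}


def expand(relationships: List[Relationship]) -> List[Relationship]:
    """Return the original relationships plus one copy per related or entailed verb."""
    expanded = list(relationships)
    for subject, verb, obj in relationships:
        # Sort the related verbs so the output order is deterministic.
        for other in sorted(RELATED_VERBS.get(verb, set())):
            candidate = (subject, other, obj)
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


expanded_query = expand([("IBM", "buy", "Lotus")])
```

The expanded list keeps the original relationship first, so a caller could, for example, rank matches on the original verb above matches on an added one.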
131. The computer-readable memory medium of claim 121 wherein weighted results are returned that satisfy the query.
132. A syntactic query engine for transforming a document of a data set into a canonical representation, the document having a plurality of sentences, each sentence having a plurality of terms, comprising:
a parser that is structured to decompose each sentence to generate a parse structure for the sentence having a plurality of syntactic elements;
and a postprocessor that is structured to receive from the parser the parse structure of the sentence;
determine a set of meaningful terms of the sentence from the syntactic elements;
determine from the structure of the parse structure and the syntactic elements a grammatical role for each meaningful term, wherein at least one of the grammatical roles for a meaningful term is at least one of a verb modifier of a prepositional phrase and a noun modifier of a noun phrase;
and store in an enhanced data representation data structure a representation of each meaningful term associated with its determined grammatical role, in a manner that indicates a grammatical relationship between a plurality of the meaningful units.
133. The query engine of claim 132 wherein the postprocessor stores the full grammar of the sentence.
134. The query engine of claim 132 wherein the postprocessor, when it is determined that a noun modifier grammatical role is associated with one of the meaningful terms, is further structured to associate the one of the meaningful terms with a subject grammatical role, thereby indicating that the one of the meaningful terms is to be stored also as a subject of the sentence.
135. The query engine of claim 134 wherein the noun modifier is a modifier of a noun that is used as an object of the sentence.
136. The query engine of claim 134 wherein the noun modifier is a modifier of a noun that is used as a subject of the sentence.
137. The query engine of claim 132 wherein the postprocessor, when it is determined that a noun modifier grammatical role is associated with one of the meaningful terms, is further structured to associate the one of the meaningful terms with an object grammatical role, thereby indicating that the one of the meaningful terms is to be stored also as an object of the sentence.
138. The query engine of claim 137 wherein the noun modifier is a modifier of a noun that is stored as an object of the sentence.
139. The query engine of claim 137 wherein the noun modifier is a modifier of a noun that is used as a subject of the sentence.
140. The query engine of claim 132 wherein the postprocessor, when it is determined that a verb modifier of a prepositional phrase is a grammatical role associated with one of the meaningful terms, is further structured to associate the one of the meaningful terms with an object grammatical role, thereby indicating that the one of the meaningful terms is to be stored also as an object of the sentence.
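The additional role assignments of claims 134, 137, and 140 can be sketched in Python as follows. This is only an illustration: the `TermRole` type, the role labels, and the `EXTRA_ROLES` mapping are assumed names, and the sketch applies every additional role unconditionally, whereas claim 141 leaves the choice of stored relationships to heuristics.

```python
from typing import Dict, List, NamedTuple, Tuple


class TermRole(NamedTuple):
    term: str
    role: str  # e.g. "subject", "object", "verb", "noun_modifier", "verb_modifier"


# Assumed mapping: a modifier role and the additional roles it is also stored under.
EXTRA_ROLES: Dict[str, Tuple[str, ...]] = {
    "noun_modifier": ("subject", "object"),  # claims 134 and 137
    "verb_modifier": ("object",),            # claim 140
}


def add_promoted_roles(roles: List[TermRole]) -> List[TermRole]:
    """Return the input roles plus the additional roles assigned to modifier terms."""
    result = list(roles)
    for term, role in roles:
        for extra in EXTRA_ROLES.get(role, ()):
            promoted = TermRole(term, extra)
            if promoted not in result:
                result.append(promoted)
    return result


# Hypothetical postprocessor output for "The YPF plant ships gas to Chile".
parsed = [
    TermRole("YPF", "noun_modifier"),
    TermRole("plant", "subject"),
    TermRole("ship", "verb"),
    TermRole("gas", "object"),
    TermRole("Chile", "verb_modifier"),
]
stored = add_promoted_roles(parsed)
```

Storing "YPF" and "Chile" under these extra roles means a later query that asks for them as a subject or an object can still reach this sentence.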
141. The query engine of claim 132 wherein heuristics are used to determine which grammatical relationships are to be stored in the enhanced data representation data structure.
142. The query engine of claim 132 wherein a plurality of grammatical relationships between meaningful terms that are identical are stored in the enhanced data representation data structure.
143. The query engine of claim 132 wherein the document is part of a corpus of heterogeneous documents.
144. The query engine of claim 132 wherein the document comprises text and graphics and a sentence is created to correspond to and to describe each portion of graphics.
145. The query engine of claim 132 wherein the enhanced data representation data structure is used to index a corpus of documents.
146. The query engine of claim 132 wherein the enhanced data representation data structure is used to execute a query against objects in a corpus of documents.
147. The query engine of claim 146, further comprising a query processor that returns results that satisfy the query when an object in the corpus contains similar terms associated with similar grammatical roles to the terms and their associated roles as stored in the enhanced data representation.
148. The query engine of claim 147 wherein the objects in the corpus are sentences and the query processor returns sentences that satisfy the query.
149. The query engine of claim 147 wherein the query processor returns paragraphs that contain similar terms to those found in an indicated sentence.
150. The query engine of claim 147 wherein the query processor returns documents that contain similar terms to those found in an indicated sentence.
151. The query engine of claim 146 wherein the query processor returns terms that are associated with designated grammatical roles for each object in the corpus that contains similar terms associated with similar grammatical roles to the terms and associated roles of designated relationships from the enhanced data representation data structure.
152. The query engine of claim 151 wherein heuristics are used to determine the designated relationships from the enhanced data representation data structure.
153. The query engine of claim 146 wherein the query processor adds additional grammatical relationships to the enhanced data representation data structure to be used to execute a query against objects in a corpus of documents.
154. The query engine of claim 153 wherein heuristics are used to determine the additional grammatical relationships.
155. The query engine of claim 153 wherein the query processor uses at least one of entailed verbs and related verbs to add additional grammatical relationships.
156. The query engine of claim 146 wherein the query processor returns weighted results that satisfy the query.
157. A method in a computer system for storing a normalized data structure representing a document of a data set, the document having a plurality of sentences, each sentence having a plurality of terms, comprising:
for each sentence, determining a set of meaningful terms of the sentence and at least one grammatical role for each meaningful term; and storing sets of grammatical relationships between a plurality of meaningful terms based upon the determined grammatical role of each meaningful term relative to a meaningful term that is being used as a governing verb, wherein, for each meaningful term that is being used as a governing verb, the normalized data structure contains a set of meaningful terms that are subjects relative to the governing verb, a set of meaningful terms that are objects relative to the governing verb, and at least one of a set of meaningful terms that are verb modifiers of prepositional phrases that contain the governing verb and a set of meaningful terms that are noun modifiers of noun phrases that relate to the governing verb.
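A minimal Python sketch of a structure organized around governing verbs, in the spirit of claim 157, is shown below. The `VerbEntry` class, its field names, and the `(verb, role, value)` input format are assumptions chosen for illustration rather than the layout used by the claimed method.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple, Union

Value = Union[str, Tuple[str, str]]


@dataclass
class VerbEntry:
    """Terms grouped by their grammatical role relative to one governing verb."""
    subjects: Set[str] = field(default_factory=set)
    objects: Set[str] = field(default_factory=set)
    verb_modifiers: Set[Tuple[str, str]] = field(default_factory=set)  # (preposition, term)
    noun_modifiers: Set[Tuple[str, str]] = field(default_factory=set)  # (noun, modifier)


ALLOWED_ROLES = {"subjects", "objects", "verb_modifiers", "noun_modifiers"}


def build_index(relations: Iterable[Tuple[str, str, Value]]) -> Dict[str, VerbEntry]:
    """Group (verb, role, value) records into one VerbEntry per governing verb."""
    index: Dict[str, VerbEntry] = defaultdict(VerbEntry)
    for verb, role, value in relations:
        if role not in ALLOWED_ROLES:
            raise ValueError(f"unknown role: {role}")
        getattr(index[verb], role).add(value)
    return dict(index)


# Hypothetical records for "The YPF plant ships gas to Chile".
index = build_index([
    ("ship", "subjects", "plant"),
    ("ship", "objects", "gas"),
    ("ship", "verb_modifiers", ("to", "Chile")),
    ("ship", "noun_modifiers", ("plant", "YPF")),
])
```

Keying the structure on the governing verb lets a lookup such as `index["ship"].objects` answer "what is shipped" without rescanning the sentence text.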
158. The method of claim 157, further comprising storing meaningful terms that correspond to a designated attribute.
159. The method of claim 158 wherein the designated attribute is at least one of country name, date, money, amount, number, location, person, corporate name, and organization.
160. The method of claim 157 wherein the sets of meaningful terms are stored as a plurality of tables.
161. The method of claim 160 wherein the tables comprise a subject table, an object table, a subject-object table, and at least one of a preposition table and a noun modifier table.
162. The method of claim 161 wherein the tables further comprise a sentence table that stores the text of the sentence.
163. The method of claim 161 wherein the tables further comprise an attributes table that stores meaningful terms that are associated with designated attributes.
164. The method of claim 160 wherein the preposition table contains for each meaningful term used as a verb in the sentence, a list of meaningful terms that are prepositions of the meaningful term used as the verb and at least one meaningful term that is a verb modifier associated with each preposition.
165. The method of claim 160 wherein the noun modifier table contains a list of meaningful terms that are noun modifiers of a meaningful term that is used as a noun in the sentence.
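The preposition table and noun modifier table of claims 164 and 165 might be pictured as simple row lists, as in the Python sketch below. The row types, field names, and the two lookup helpers are hypothetical and serve only to show what each table relates.

```python
from typing import List, NamedTuple


class PrepositionRow(NamedTuple):
    verb: str           # meaningful term used as a verb
    preposition: str    # preposition attached to that verb
    verb_modifier: str  # term that modifies the verb through the preposition


class NounModifierRow(NamedTuple):
    noun: str      # meaningful term used as a noun
    modifier: str  # term that modifies the noun


# Hypothetical rows for "The YPF plant ships gas to Chile by pipeline".
preposition_table = [
    PrepositionRow("ship", "to", "Chile"),
    PrepositionRow("ship", "by", "pipeline"),
]
noun_modifier_table = [NounModifierRow("plant", "YPF")]


def verb_modifiers(table: List[PrepositionRow], verb: str, preposition: str) -> List[str]:
    """List the verb modifiers recorded for a given verb and preposition."""
    return [r.verb_modifier for r in table
            if r.verb == verb and r.preposition == preposition]


def noun_modifiers(table: List[NounModifierRow], noun: str) -> List[str]:
    """List the modifiers recorded for a given noun."""
    return [r.modifier for r in table if r.noun == noun]


destinations = verb_modifiers(preposition_table, "ship", "to")
plant_modifiers = noun_modifiers(noun_modifier_table, "plant")
```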
166. The method of claim 157 wherein the tables are tables in a data base.
167. The method of claim 157 wherein the tables are stored as part of a file system.
168. The method of claim 157 wherein a plurality of grammatical relationships between meaningful terms that are identical are stored in the normalized data structure.
169. The method of claim 157 wherein the document is part of a corpus of heterogeneous documents.
170. The method of claim 157 wherein the normalized data structure is used to index a corpus of documents.
171. The method of claim 157 wherein the enhanced data representation data structure is used to execute a query against objects in a corpus of documents.
172. A data processing system comprising a computer processor and a memory, the memory containing structured data that stores a normalized representation of sentence data, the structured data being manipulated by the computer processor under the control of program code and stored in the memory as:
a subject table having a set of meaningful term pairs, each pair having a meaningful term that is associated with a grammatical role of a verb and a meaningful term that is associated with a grammatical role of a subject relative to the verb;
an object table having a set of meaningful term pairs, each pair having a meaningful term that is associated with a grammatical role of a verb and a meaningful term that is associated with a grammatical role of an object relative to the verb;
a representation of associations between the subject table and the object table, the representation indicating, for each meaningful term associated with the grammatical role of the verb, the meaningful terms that are associated with the grammatical role of subject relative to the verb and the meaningful terms that are associated with the grammatical role of object relative to the verb;
a preposition table having a set of meaningful term groups, each group having a meaningful term that is associated with a grammatical role of a verb, a meaningful term that is associated with a grammatical role of a preposition relative to the verb, and a meaningful term that is associated with a grammatical role of a verb modifier relative to the verb; and a noun modifier table having a set of meaningful term pairs, each pair having a meaningful term that is associated with a grammatical role of a noun and a meaningful term that is associated with a grammatical role of a noun modifier relative to the noun.
173. The data processing machine of claim 172 wherein the representation of associations between the subject table and the object table is produced by a database join operation.
174. The data processing machine of claim 172 wherein the representation of associations between the subject table and the object table is a subject-object table that contains a set of meaningful term groups, each group having a meaningful term that is associated with a grammatical role of a verb, a meaningful term that is associated with the grammatical role of a subject relative to the verb, and a meaningful term that is associated with the grammatical role of an object relative to the verb.
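The join of claim 173 and the subject-object groups of claim 174 can be sketched with Python's standard `sqlite3` module, as below. The table and column names, and in particular the `sentence_id` column used to pair rows from the same sentence, are assumptions added for illustration; the claims describe the tables only as pairs of terms.

```python
import sqlite3

# In-memory database holding an assumed layout of the subject and object tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subject_table (sentence_id INTEGER, verb_term TEXT, subject_term TEXT);
    CREATE TABLE object_table  (sentence_id INTEGER, verb_term TEXT, object_term TEXT);
""")
conn.executemany(
    "INSERT INTO subject_table VALUES (?, ?, ?)",
    [(1, "ship", "plant"), (2, "import", "Chile")],
)
conn.executemany(
    "INSERT INTO object_table VALUES (?, ?, ?)",
    [(1, "ship", "gas"), (2, "import", "gas")],
)

# Join the two tables on sentence and verb to obtain (verb, subject, object) groups.
subject_object = conn.execute("""
    SELECT s.verb_term, s.subject_term, o.object_term
    FROM subject_table AS s
    JOIN object_table AS o
      ON o.sentence_id = s.sentence_id AND o.verb_term = s.verb_term
    ORDER BY s.sentence_id
""").fetchall()
conn.close()
```

Computing the groups with a join on demand (claim 173) avoids storing a third table, while materializing them as a subject-object table (claim 174) trades storage for faster lookups; which is preferable depends on the workload.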
175. The data processing machine of claim 172 wherein the structured data is manipulated by the computer processor under the control of program code to query objects of a data set that are indexed as the structured data.
176. The data processing machine of claim 172 wherein the program code decomposes objects of a data set and indexes the decomposed objects in the memory as the structured data.
177. A computer-readable memory medium containing instructions for controlling a computer processor to store a normalized data structure representing a document of a data set, the document having a plurality of sentences, each sentence having a plurality of terms, comprising:
for each sentence, determining a set of meaningful terms of the sentence and at least one grammatical role for each meaningful term; and storing sets of grammatical relationships between a plurality of meaningful terms based upon the determined grammatical role of each meaningful term relative to a meaningful term that is being used as a governing verb, wherein, for each meaningful term that is being used as a governing verb, the normalized data structure contains a set of meaningful terms that are subjects relative to the governing verb, a set of meaningful terms that are objects relative to the governing verb, and at least one of a set of meaningful terms that are verb modifiers of prepositional phrases that contain the governing verb and a set of meaningful terms that are noun modifiers of noun phrases that relate to the governing verb.
178. The computer-readable memory medium of claim 177, the instructions further controlling the computer processor to store meaningful terms that correspond to a designated attribute.
179. The computer-readable memory medium of claim 178 wherein the designated attribute is at least one of country name, date, money, amount, number, location, person, corporate name, and organization.
180. The computer-readable memory medium of claim 177 wherein the sets of meaningful terms are stored as a plurality of tables.
181. The computer-readable memory medium of claim 180 wherein the tables comprise a subject table, an object table, a subject-object table, and at least one of a preposition table and a noun modifier table.
182. The computer-readable memory medium of claim 181 wherein the tables further comprise a sentence table that stores the text of the sentence.
183. The computer-readable memory medium of claim 181 wherein the tables further comprise an attributes table that stores meaningful terms that are associated with designated attributes.
184. The computer-readable memory medium of claim 180 wherein the preposition table contains for each meaningful term used as a verb in the sentence, a list of meaningful terms that are prepositions of the meaningful term used as the verb and at least one meaningful term that is a verb modifier associated with each preposition.
185. The computer-readable memory medium of claim 180 wherein the noun modifier table contains a list of meaningful terms that are noun modifiers of a meaningful term that is used as a noun in the sentence.
186. The computer-readable memory medium of claim 177 wherein the tables are tables in a data base.
187. The computer-readable memory medium of claim 177 wherein the tables are stored as part of a file system.
188. The computer-readable memory medium of claim 177 wherein a plurality of grammatical relationships between meaningful terms that are identical are stored in the normalized data structure.
189. The computer-readable memory medium of claim 177 wherein the document is part of a corpus of heterogeneous documents.
190. The computer-readable memory medium of claim 177 wherein the normalized data structure is used to index a corpus of documents.
191. The computer-readable memory medium of claim 177 wherein the enhanced data representation data structure is used to execute a query against objects in a corpus of documents.
192. A computer system for storing a normalized data structure representing a document of a data set, the document having a plurality of sentences, each sentence having a plurality of terms, comprising:
an enhanced parsing mechanism that determines a set of meaningful terms for each sentence and at least one grammatical role for each meaningful term; and a storage mechanism structured to store sets of grammatical relationships between a plurality of the determined meaningful terms based upon the determined grammatical role of each meaningful term relative to a meaningful term that is being used as a governing verb, wherein, for each meaningful term that is being used as a governing verb, the normalized data structure contains a set of meaningful terms that are subjects relative to the governing verb, a set of meaningful terms that are objects relative to the governing verb, and at least one of a set of meaningful terms that are verb modifiers of prepositional phrases that contain the governing verb and a set of meaningful terms that are noun modifiers of noun phrases that relate to the governing verb.
193. The system of claim 192, the storage mechanism further structured to store meaningful terms that correspond to a designated attribute.
194. The system of claim 193 wherein the designated attribute is at least one of country name, date, money, amount, number, location, person, corporate name, and organization.
195. The system of claim 192 wherein the sets of meaningful terms are stored as a plurality of tables.
196. The system of claim 195 wherein the tables comprise a subject table, an object table, a subject-object table, and at least one of a preposition table and a noun modifier table.
197. The system of claim 196 wherein the tables further comprise a sentence table that stores the text of the sentence.
198. The system of claim 196 wherein the tables further comprise an attributes table that stores meaningful terms that are associated with designated attributes.
199. The system of claim 195 wherein the preposition table contains for each meaningful term used as a verb in the sentence, a list of meaningful terms that are prepositions of the meaningful term used as the verb and at least one meaningful term that is a verb modifier associated with each preposition.
200. The system of claim 195 wherein the noun modifier table contains a list of meaningful terms that are noun modifiers of a meaningful term that is used as a noun in the sentence.
201. The system of claim 192, further comprising a database in which the tables are stored.
202. The system of claim 192, further comprising a file system, wherein the tables are stored as part of the file system.
203. The system of claim 192 wherein a plurality of grammatical relationships between meaningful terms that are identical are stored in the normalized data structure.
204. The system of claim 192 wherein the document is part of a corpus of heterogeneous documents.
205. The system of claim 192 wherein the normalized data structure is used to index a corpus of documents.
206. The system of claim 192 wherein the enhanced data representation data structure is used to execute a query against objects in a corpus of documents.
207. A method in a computer system for transforming an object of a data set into a canonical representation for use in indexing the objects of the data set and in querying the data set, the object being other than a text-only document and having a plurality of units that are specified according to an object-specific grammar, comprising:
for each object, decomposing the object to generate a parse structure having a plurality of syntactic elements;
determining a set of meaningful units of the object from these syntactic elements;
determining from the structure of the parse structure and the syntactic elements a grammatical role for each meaningful unit; and storing in an enhanced data representation data structure a representation of each meaningful unit associated with its determined grammatical role, in a manner that indicates a grammatical relationship between a plurality of the meaningful units.
208. The method of claim 207 wherein the objects are audio data and the units of objects are portions of audio data.
209. The method of claim 207 wherein the objects are video data and the units of objects are portions of video data.
210. The method of claim 207 wherein the objects are images and the units of objects are graphical data.
211. The method of claim 207 wherein the data set is a document that contains text and graphical data and wherein each object is one of a text sentence and a sentence created to correspond to and describe a portion of graphical data.
212. A computer-readable memory medium containing instructions for controlling a computer processor to transform an object of a data set into a canonical representation for use in indexing the objects of the data set and in querying the data set, the object being other than a text-only document and having a plurality of units that are specified according to an object-specific grammar, by:
for each object, decomposing the object to generate a parse structure having a plurality of syntactic elements;
determining a set of meaningful units of the object from these syntactic elements;
determining from the structure of the parse structure and the syntactic elements a grammatical role for each meaningful unit; and storing in an enhanced data representation data structure a representation of each meaningful unit associated with its determined grammatical role, in a manner that indicates a grammatical relationship between a plurality of the meaningful units.
213. The computer-readable memory medium of claim 212 wherein the objects are audio data and the units of objects are portions of audio data.
214. The computer-readable memory medium of claim 212 wherein the objects are video data and the units of objects are portions of video data.
215. The computer-readable memory medium of claim 212 wherein the objects are images and the units of objects are graphical data.
216. The computer-readable memory medium of claim 212 wherein the data set is a document that contains text and graphical data and wherein each object is one of a text sentence and a sentence created to correspond to and describe a portion of graphical data.
217. A query engine in a computer system for transforming an object of a data set into a canonical representation for use in indexing the objects of the data set and in querying the data set, the object being other than a text-only document and having a plurality of units that are specified according to an object-specific grammar, comprising:
a decomposition processor that is structured to decompose each object to generate a parse structure having a plurality of syntactic elements;
and a postprocessor that is structured to receive from the decomposition processor the generated parse structure;
determine a set of meaningful units of the object from these syntactic elements;
determine from the structure of the parse structure and the syntactic elements a grammatical role for each meaningful unit; and store in an enhanced data representation data structure a representation of each meaningful unit associated with its determined grammatical role, in a manner that indicates a grammatical relationship between a plurality of the meaningful units.
218. The query engine of claim 217 wherein the objects are audio data and the units of objects are portions of audio data.
219. The query engine of claim 217 wherein the objects are video data and the units of objects are portions of video data.
220. The query engine of claim 217 wherein the objects are images and the units of objects are graphical data.
221. The query engine of claim 217 wherein the data set is a document that contains text and graphical data and wherein each object is one of a text sentence and a sentence created to correspond to and describe a portion of graphical data.
CA2457693A 2001-08-14 2002-08-14 Method and system for enhanced data searching Expired - Fee Related CA2457693C (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US31238501P 2001-08-14 2001-08-14
US60/312,385 2001-08-14
US10/007,299 US7283951B2 (en) 2001-08-14 2001-11-08 Method and system for enhanced data searching
US10/007,299 2001-11-08
PCT/US2002/025756 WO2003017143A2 (en) 2001-08-14 2002-08-14 Method and system for enhanced data searching

Publications (2)

Publication Number Publication Date
CA2457693A1 (en) 2003-02-27
CA2457693C (en) 2012-09-11

Family

ID=26676799

Family Applications (1)

Application Number Title Priority Date Filing Date
CA2457693A Expired - Fee Related CA2457693C (en) 2001-08-14 2002-08-14 Method and system for enhanced data searching

Country Status (6)

Country Link
US (1) US7283951B2 (en)
EP (1) EP1419461A2 (en)
CA (1) CA2457693C (en)
MX (1) MXPA04001488A (en)
NZ (1) NZ542960A (en)
WO (1) WO2003017143A2 (en)

Families Citing this family (165)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8645137B2 (en) 2000-03-16 2014-02-04 Apple Inc. Fast, language-independent method for user authentication by voice
US7526425B2 (en) * 2001-08-14 2009-04-28 Evri Inc. Method and system for extending keyword searching to syntactically and semantically annotated data
US7398201B2 (en) * 2001-08-14 2008-07-08 Evri Inc. Method and system for enhanced data searching
US7398209B2 (en) 2002-06-03 2008-07-08 Voicebox Technologies, Inc. Systems and methods for responding to natural language speech utterance
US7693720B2 (en) * 2002-07-15 2010-04-06 Voicebox Technologies, Inc. Mobile systems and methods for responding to natural language speech utterance
EP1391830A1 (en) * 2002-07-19 2004-02-25 Albert Inc. S.A. System for extracting informations from a natural language text
US7721192B2 (en) * 2002-11-27 2010-05-18 Microsoft Corporation User interface for a resource search tool
US20040122693A1 (en) * 2002-12-23 2004-06-24 Michael Hatscher Community builder
US7849175B2 (en) * 2002-12-23 2010-12-07 Sap Ag Control center pages
US8195631B2 (en) * 2002-12-23 2012-06-05 Sap Ag Resource finder tool
US10475116B2 (en) 2003-06-03 2019-11-12 Ebay Inc. Method to identify a suggested location for storing a data entry in a database
EP1661008A4 (en) * 2003-08-05 2007-01-24 Cnet Networks Inc Product placement engine and method
US7467131B1 (en) * 2003-09-30 2008-12-16 Google Inc. Method and system for query data caching and optimization in a search engine system
US7814127B2 (en) * 2003-11-20 2010-10-12 International Business Machines Corporation Natural language support for database applications
US20050149858A1 (en) * 2003-12-29 2005-07-07 Stern Mia K. System and method for managing documents with expression of dates and/or times
US20050187920A1 (en) * 2004-01-23 2005-08-25 Porto Ranelli, Sa Contextual searching
US7613719B2 (en) * 2004-03-18 2009-11-03 Microsoft Corporation Rendering tables with natural language commands
US20060080292A1 (en) * 2004-10-08 2006-04-13 Alanzi Faisal Saud M Enhanced interface utility for web-based searching
TWI269268B (en) * 2005-01-24 2006-12-21 Delta Electronics Inc Speech recognizing method and system
US20060225055A1 (en) * 2005-03-03 2006-10-05 Contentguard Holdings, Inc. Method, system, and device for indexing and processing of expressions
US20060253423A1 (en) * 2005-05-07 2006-11-09 Mclane Mark Information retrieval system and method
US8214310B2 (en) * 2005-05-18 2012-07-03 International Business Machines Corporation Cross descriptor learning system, method and program product therefor
US7640160B2 (en) 2005-08-05 2009-12-29 Voicebox Technologies, Inc. Systems and methods for responding to natural language speech utterance
US7620549B2 (en) 2005-08-10 2009-11-17 Voicebox Technologies, Inc. System and method of supporting adaptive misrecognition in conversational speech
US7949529B2 (en) 2005-08-29 2011-05-24 Voicebox Technologies, Inc. Mobile systems and methods of supporting natural language human-machine interactions
US8677377B2 (en) 2005-09-08 2014-03-18 Apple Inc. Method and apparatus for building an intelligent automated assistant
US20070078814A1 (en) * 2005-10-04 2007-04-05 Kozoru, Inc. Novel information retrieval systems and methods
US20070094250A1 (en) * 2005-10-20 2007-04-26 Yahoo! Inc. Using matrix representations of search engine operations to make inferences about documents in a search engine corpus
EP1949273A1 (en) 2005-11-16 2008-07-30 Evri Inc. Extending keyword searching to syntactically and semantically annotated data
US7870031B2 (en) * 2005-12-22 2011-01-11 Ebay Inc. Suggested item category systems and methods
US7786979B2 (en) * 2006-01-13 2010-08-31 Research In Motion Limited Handheld electronic device and method for disambiguation of text input and providing spelling substitution
US9318108B2 (en) 2010-01-18 2016-04-19 Apple Inc. Intelligent automated assistant
US20080065370A1 (en) * 2006-09-11 2008-03-13 Takashi Kimoto Support apparatus for object-oriented analysis and design
US9495358B2 (en) 2006-10-10 2016-11-15 Abbyy Infopoisk Llc Cross-language text clustering
US9235573B2 (en) 2006-10-10 2016-01-12 Abbyy Infopoisk Llc Universal difference measure
US8073681B2 (en) 2006-10-16 2011-12-06 Voicebox Technologies, Inc. System and method for a cooperative conversational voice user interface
US7818176B2 (en) 2007-02-06 2010-10-19 Voicebox Technologies, Inc. System and method for selecting and presenting advertisements based on natural language processing of voice-based input
CA2717462C (en) 2007-03-14 2016-09-27 Evri Inc. Query templates and labeled search tip system, methods, and techniques
US7680780B2 (en) * 2007-04-06 2010-03-16 International Business Machines Corporation Techniques for processing data from a multilingual database
ES2428546T3 (en) * 2007-05-21 2013-11-08 Google Inc. Statistical query provider
US8594996B2 (en) 2007-10-17 2013-11-26 Evri Inc. NLP-based entity recognition and disambiguation
WO2009052308A1 (en) 2007-10-17 2009-04-23 Roseman Neil S Nlp-based content recommender
US8140335B2 (en) 2007-12-11 2012-03-20 Voicebox Technologies, Inc. System and method for providing a natural language voice user interface in an integrated voice navigation services environment
US8126881B1 (en) 2007-12-12 2012-02-28 Vast.com, Inc. Predictive conversion systems and methods
US20090198488A1 (en) * 2008-02-05 2009-08-06 Eric Arno Vigen System and method for analyzing communications using multi-placement hierarchical structures
US8996376B2 (en) 2008-04-05 2015-03-31 Apple Inc. Intelligent text-to-speech conversion
US8589161B2 (en) 2008-05-27 2013-11-19 Voicebox Technologies, Inc. System and method for an integrated, multi-modal, multi-device natural language voice services environment
US9305548B2 (en) 2008-05-27 2016-04-05 Voicebox Technologies Corporation System and method for an integrated, multi-modal, multi-device natural language voice services environment
US9424339B2 (en) 2008-08-15 2016-08-23 Athena A. Smyros Systems and methods utilizing a search engine
US8326637B2 (en) 2009-02-20 2012-12-04 Voicebox Technologies, Inc. System and method for processing multi-modal device interactions in a natural language voice services environment
CA2796408A1 (en) * 2009-04-16 2010-10-21 Evri Inc. Enhanced advertisement targeting
US10241644B2 (en) 2011-06-03 2019-03-26 Apple Inc. Actionable reminder entries
US10241752B2 (en) 2011-09-30 2019-03-26 Apple Inc. Interface for a virtual digital assistant
US8296319B2 (en) * 2009-06-26 2012-10-23 Rakuten, Inc. Information retrieving apparatus, information retrieving method, information retrieving program, and recording medium on which information retrieving program is recorded
US20100332217A1 (en) * 2009-06-29 2010-12-30 Shalom Wintner Method for text improvement via linguistic abstractions
US9431006B2 (en) 2009-07-02 2016-08-30 Apple Inc. Methods and apparatuses for automatic speech recognition
CA2779208C (en) * 2009-10-30 2016-03-22 Evri, Inc. Improving keyword-based search engine results using enhanced query strategies
US9171541B2 (en) 2009-11-10 2015-10-27 Voicebox Technologies Corporation System and method for hybrid processing in a natural language voice services environment
WO2011059997A1 (en) 2009-11-10 2011-05-19 Voicebox Technologies, Inc. System and method for providing a natural language content dedication service
US9047283B1 (en) * 2010-01-29 2015-06-02 Guangsheng Zhang Automated topic discovery in documents and content categorization
US8682667B2 (en) 2010-02-25 2014-03-25 Apple Inc. User profiling for selecting user specific voice input processing information
US9710556B2 (en) 2010-03-01 2017-07-18 Vcvc Iii Llc Content recommendation based on collections of entities
US9773056B1 (en) * 2010-03-23 2017-09-26 Intelligent Language, LLC Object location and processing
US8645125B2 (en) 2010-03-30 2014-02-04 Evri, Inc. NLP-based systems and methods for providing quotations
US8527494B2 (en) * 2010-05-14 2013-09-03 Business Objects Software Limited Tools discovery in cloud computing
US9081767B2 (en) * 2010-07-26 2015-07-14 Radiant Logic, Inc. Browsing of contextual information
US8838633B2 (en) 2010-08-11 2014-09-16 Vcvc Iii Llc NLP-based sentiment analysis
US9405848B2 (en) 2010-09-15 2016-08-02 Vcvc Iii Llc Recommending mobile device activities
US9852732B2 (en) * 2010-10-07 2017-12-26 Avaya Inc. System and method for near real-time identification and definition query
US8725739B2 (en) 2010-11-01 2014-05-13 Evri, Inc. Category-based content recommendation
WO2012067586A1 (en) * 2010-11-15 2012-05-24 Agency For Science, Technology And Research Database searching
US20120166415A1 (en) * 2010-12-23 2012-06-28 Microsoft Corporation Supplementing search results with keywords derived therefrom
US9262612B2 (en) 2011-03-21 2016-02-16 Apple Inc. Device access using voice authentication
US9116995B2 (en) 2011-03-30 2015-08-25 Vcvc Iii Llc Cluster-based identification of news stories
US20130013616A1 (en) * 2011-07-08 2013-01-10 Jochen Lothar Leidner Systems and Methods for Natural Language Searching of Structured Data
US8994660B2 (en) 2011-08-29 2015-03-31 Apple Inc. Text correction processing
US20130110853A1 (en) * 2011-10-31 2013-05-02 Microsoft Corporation Sql constructs ported to non-sql domains
US9280610B2 (en) 2012-05-14 2016-03-08 Apple Inc. Crowd sourcing information to fulfill user requests
US9721563B2 (en) 2012-06-08 2017-08-01 Apple Inc. Name recognition system
CN103514181B (en) * 2012-06-19 2018-07-31 阿里巴巴集团控股有限公司 A kind of searching method and device
US9390174B2 (en) 2012-08-08 2016-07-12 Google Inc. Search result ranking and presentation
US9547647B2 (en) 2012-09-19 2017-01-17 Apple Inc. Voice-based media searching
US9411803B2 (en) * 2012-09-28 2016-08-09 Hewlett Packard Enterprise Development Lp Responding to natural language queries
US9152623B2 (en) * 2012-11-02 2015-10-06 Fido Labs, Inc. Natural language processing system and method
US20140236986A1 (en) * 2013-02-21 2014-08-21 Apple Inc. Natural language document search
US9104718B1 (en) 2013-03-07 2015-08-11 Vast.com, Inc. Systems, methods, and devices for measuring similarity of and generating recommendations for unique items
US10007946B1 (en) 2013-03-07 2018-06-26 Vast.com, Inc. Systems, methods, and devices for measuring similarity of and generating recommendations for unique items
US9465873B1 (en) 2013-03-07 2016-10-11 Vast.com, Inc. Systems, methods, and devices for identifying and presenting identifications of significant attributes of unique items
US9830635B1 (en) 2013-03-13 2017-11-28 Vast.com, Inc. Systems, methods, and devices for determining and displaying market relative position of unique items
US10438254B2 (en) 2013-03-15 2019-10-08 Ebay Inc. Using plain text to list an item on a publication system
US9674132B1 (en) * 2013-03-25 2017-06-06 Guangsheng Zhang System, methods, and user interface for effectively managing message communications
US9582608B2 (en) 2013-06-07 2017-02-28 Apple Inc. Unified ranking with entropy-weighted information for phrase-based semantic auto-completion
WO2014197336A1 (en) 2013-06-07 2014-12-11 Apple Inc. System and method for detecting errors in interactions with a voice-based digital assistant
WO2014197334A2 (en) 2013-06-07 2014-12-11 Apple Inc. System and method for user-specified pronunciation of words for speech synthesis and recognition
WO2014197335A1 (en) 2013-06-08 2014-12-11 Apple Inc. Interpreting and acting upon commands that involve sharing information with remote devices
EP3937002A1 (en) 2013-06-09 2022-01-12 Apple Inc. Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant
US10176167B2 (en) 2013-06-09 2019-01-08 Apple Inc. System and method for inferring user intent from speech inputs
US10127596B1 (en) 2013-12-10 2018-11-13 Vast.com, Inc. Systems, methods, and devices for generating recommendations of unique items
US9430463B2 (en) 2014-05-30 2016-08-30 Apple Inc. Exemplar-based natural language processing
US9338493B2 (en) 2014-06-30 2016-05-10 Apple Inc. Intelligent automated assistant for TV user interactions
US10585924B2 (en) * 2014-08-08 2020-03-10 Cuong Duc Nguyen Processing natural-language documents and queries
US9898459B2 (en) 2014-09-16 2018-02-20 Voicebox Technologies Corporation Integration of domain information into state transitions of a finite state transducer for natural language processing
CN107003996A (en) 2014-09-16 2017-08-01 声钰科技 VCommerce
US9668121B2 (en) 2014-09-30 2017-05-30 Apple Inc. Social reminders
WO2016061309A1 (en) 2014-10-15 2016-04-21 Voicebox Technologies Corporation System and method for providing follow-up responses to prior natural language inputs of a user
US20160124937A1 (en) * 2014-11-03 2016-05-05 Service Paradigm Pty Ltd Natural language execution system, method and computer readable medium
US9904667B2 (en) * 2014-11-20 2018-02-27 International Business Machines Corporation Entity-relation based passage scoring in a question answering computer system
US10431214B2 (en) 2014-11-26 2019-10-01 Voicebox Technologies Corporation System and method of determining a domain and/or an action related to a natural language input
US10614799B2 (en) 2014-11-26 2020-04-07 Voicebox Technologies Corporation System and method of providing intent predictions for an utterance prior to a system detection of an end of the utterance
US10318527B2 (en) 2015-01-27 2019-06-11 International Business Machines Corporation Search-based detection, link, and acquisition of data
WO2016141187A1 (en) * 2015-03-04 2016-09-09 The Allen Institute For Artificial Intelligence System and methods for generating treebanks for natural language processing by modifying parser operation through introduction of constraints on parse tree structure
US10567477B2 (en) 2015-03-08 2020-02-18 Apple Inc. Virtual assistant continuity
US10102275B2 (en) 2015-05-27 2018-10-16 International Business Machines Corporation User interface for a query answering system
US9727552B2 (en) * 2015-05-27 2017-08-08 International Business Machines Corporation Utilizing a dialectical model in a question answering system
US9578173B2 (en) 2015-06-05 2017-02-21 Apple Inc. Virtual assistant aided communication with 3rd party service in a communication session
US11423023B2 (en) 2015-06-05 2022-08-23 Apple Inc. Systems and methods for providing improved search functionality on a client device
US10769184B2 (en) * 2015-06-05 2020-09-08 Apple Inc. Systems and methods for providing improved search functionality on a client device
US10747498B2 (en) 2015-09-08 2020-08-18 Apple Inc. Zero latency digital assistant
US10671428B2 (en) 2015-09-08 2020-06-02 Apple Inc. Distributed personal assistant
US9697820B2 (en) 2015-09-24 2017-07-04 Apple Inc. Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks
US11010550B2 (en) 2015-09-29 2021-05-18 Apple Inc. Unified language modeling framework for word prediction, auto-completion and auto-correction
US10366158B2 (en) 2015-09-29 2019-07-30 Apple Inc. Efficient word encoding for recurrent neural network language models
US11587559B2 (en) 2015-09-30 2023-02-21 Apple Inc. Intelligent device identification
US10558718B2 (en) * 2015-11-03 2020-02-11 Dell Products, Lp Systems and methods for website improvement
US10691473B2 (en) 2015-11-06 2020-06-23 Apple Inc. Intelligent automated assistant in a messaging environment
US10049668B2 (en) 2015-12-02 2018-08-14 Apple Inc. Applying neural network language models to weighted finite state transducers for automatic speech recognition
US10146858B2 (en) 2015-12-11 2018-12-04 International Business Machines Corporation Discrepancy handler for document ingestion into a corpus for a cognitive computing system
US10223066B2 (en) 2015-12-23 2019-03-05 Apple Inc. Proactive assistance based on dialog communication between devices
US9842161B2 (en) 2016-01-12 2017-12-12 International Business Machines Corporation Discrepancy curator for documents in a corpus of a cognitive computing system
US10176250B2 (en) 2016-01-12 2019-01-08 International Business Machines Corporation Automated curation of documents in a corpus for a cognitive computing system
US10446143B2 (en) 2016-03-14 2019-10-15 Apple Inc. Identification of voice inputs providing credentials
US9934775B2 (en) 2016-05-26 2018-04-03 Apple Inc. Unit-selection text-to-speech synthesis based on predicted concatenation parameters
US9972304B2 (en) 2016-06-03 2018-05-15 Apple Inc. Privacy preserving distributed evaluation framework for embedded personalized systems
US10249300B2 (en) 2016-06-06 2019-04-02 Apple Inc. Intelligent list reading
US10049663B2 (en) 2016-06-08 2018-08-14 Apple, Inc. Intelligent automated assistant for media exploration
DK179309B1 (en) 2016-06-09 2018-04-23 Apple Inc Intelligent automated assistant in a home environment
US10586535B2 (en) 2016-06-10 2020-03-10 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US10192552B2 (en) 2016-06-10 2019-01-29 Apple Inc. Digital assistant providing whispered speech
US10509862B2 (en) 2016-06-10 2019-12-17 Apple Inc. Dynamic phrase expansion of language input
US10490187B2 (en) 2016-06-10 2019-11-26 Apple Inc. Digital assistant providing automated status report
US10067938B2 (en) 2016-06-10 2018-09-04 Apple Inc. Multilingual word prediction
DK201670540A1 (en) 2016-06-11 2018-01-08 Apple Inc Application integration with a digital assistant
DK179049B1 (en) 2016-06-11 2017-09-18 Apple Inc Data driven natural language event detection and classification
DK179343B1 (en) 2016-06-11 2018-05-14 Apple Inc Intelligent task discovery
DK179415B1 (en) 2016-06-11 2018-06-14 Apple Inc Intelligent device arbitration and control
WO2018023106A1 (en) 2016-07-29 2018-02-01 Erik SWART System and method of disambiguating natural language processing requests
US10043516B2 (en) 2016-09-23 2018-08-07 Apple Inc. Intelligent automated assistant
US10593346B2 (en) 2016-12-22 2020-03-17 Apple Inc. Rank-reduced token representation for automatic speech recognition
US10997227B2 (en) * 2017-01-18 2021-05-04 Google Llc Systems and methods for processing a natural language query in data tables
DK201770439A1 (en) 2017-05-11 2018-12-13 Apple Inc. Offline personal assistant
DK179496B1 (en) 2017-05-12 2019-01-15 Apple Inc. User-specific acoustic models
DK179745B1 (en) 2017-05-12 2019-05-01 Apple Inc. Synchronization and task delegation of a digital assistant
DK201770431A1 (en) 2017-05-15 2018-12-20 Apple Inc. Optimizing dialogue policy decisions for digital assistants using implicit feedback
DK201770432A1 (en) 2017-05-15 2018-12-21 Apple Inc. Hierarchical belief states for digital assistants
DK179560B1 (en) 2017-05-16 2019-02-18 Apple Inc. Far-field extension for digital assistant services
US11494395B2 (en) 2017-07-31 2022-11-08 Splunk Inc. Creating dashboards for viewing data in a data storage system based on natural language requests
US10901811B2 (en) 2017-07-31 2021-01-26 Splunk Inc. Creating alerts associated with a data storage system based on natural language requests
CN107679186B (en) * 2017-09-30 2021-12-21 北京奇虎科技有限公司 Method and device for searching entity based on entity library
US10268704B1 (en) 2017-10-12 2019-04-23 Vast.com, Inc. Partitioned distributed database systems, devices, and methods
US10810266B2 (en) * 2017-11-17 2020-10-20 Adobe Inc. Document search using grammatical units
US10956670B2 (en) 2018-03-03 2021-03-23 Samurai Labs Sp. Z O.O. System and method for detecting undesirable and potentially harmful online behavior
US11182539B2 (en) * 2018-11-30 2021-11-23 Thomson Reuters Enterprise Centre Gmbh Systems and methods for event summarization from data
WO2020113225A1 (en) 2018-11-30 2020-06-04 Thomson Reuters Enterprise Centre Gmbh Systems and methods for identifying an event in data
CA3139081A1 (en) * 2019-05-17 2020-11-26 Thomson Reuters Enterprise Centre Gmbh Systems and methods for event summarization from data
US11816102B2 (en) * 2020-08-12 2023-11-14 Oracle International Corporation Natural language query translation based on query graphs

Family Cites Families (49)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0280866A3 (en) 1987-03-03 1992-07-08 International Business Machines Corporation Computer method for automatic extraction of commonly specified information from business correspondence
US4839853A (en) 1988-09-15 1989-06-13 Bell Communications Research, Inc. Computer information retrieval using latent semantic structure
US5301109A (en) 1990-06-11 1994-04-05 Bell Communications Research, Inc. Computerized cross-language document retrieval using latent semantic indexing
US5317507A (en) 1990-11-07 1994-05-31 Gallant Stephen I Method for document retrieval and for word sense disambiguation using neural networks
US5325298A (en) 1990-11-07 1994-06-28 Hnc, Inc. Methods for generating or revising context vectors for a plurality of word stems
US5377103A (en) 1992-05-15 1994-12-27 International Business Machines Corporation Constrained natural language interface for a computer that employs a browse function
IL107482A (en) 1992-11-04 1998-10-30 Conquest Software Inc Method for resolution of natural-language queries against full-text databases
US5331556A (en) 1993-06-28 1994-07-19 General Electric Company Method for natural language data processing using morphological and part-of-speech information
US5619709A (en) 1993-09-20 1997-04-08 Hnc, Inc. System and method of context vector generation and retrieval
US5799268A (en) 1994-09-28 1998-08-25 Apple Computer, Inc. Method for extracting knowledge from online documentation and creating a glossary, index, help database or the like
US5794050A (en) 1995-01-04 1998-08-11 Intelligent Text Processing, Inc. Natural language understanding system
WO1997008604A2 (en) 1995-08-16 1997-03-06 Syracuse University Multilingual document retrieval system and method using semantic vector matching
US6026388A (en) 1995-08-16 2000-02-15 Textwise, Llc User interface and other enhancements for natural language information retrieval system and method
US5778362A (en) 1996-06-21 1998-07-07 Kdl Technologies Limited Method and system for revealing information structures in collections of data items
US5857179A (en) 1996-09-09 1999-01-05 Digital Equipment Corporation Computer method and apparatus for clustering documents and automatic generation of cluster keywords
US5836771A (en) * 1996-12-02 1998-11-17 Ho; Chi Fai Learning method and system based on questioning
US5950189A (en) 1997-01-02 1999-09-07 At&T Corp Retrieval system and method
GB9713019D0 (en) 1997-06-20 1997-08-27 Xerox Corp Linguistic search system
US5933822A (en) 1997-07-22 1999-08-03 Microsoft Corporation Apparatus and methods for an information retrieval system that employs natural language processing of search results to improve overall precision
KR980004126A (en) 1997-12-16 1998-03-30 양승택 Query Language Conversion Apparatus and Method for Searching Multilingual Web Documents
US6122647A (en) 1998-05-19 2000-09-19 Perspecta, Inc. Dynamic generation of contextual links in hypertext documents
US6006225A (en) 1998-06-15 1999-12-21 Amazon.Com Refining search queries by the suggestion of correlated terms from prior searches
US6192360B1 (en) 1998-06-23 2001-02-20 Microsoft Corporation Methods and apparatus for classifying text and for building a text classifier
US6167370A (en) 1998-09-09 2000-12-26 Invention Machine Corporation Document semantic analysis/selection with knowledge creativity capability utilizing subject-action-object (SAO) structures
US6480843B2 (en) 1998-11-03 2002-11-12 Nec Usa, Inc. Supporting web-query expansion efficiently using multi-granularity indexing and query processing
US6460029B1 (en) 1998-12-23 2002-10-01 Microsoft Corporation System for improving search text
US6584464B1 (en) 1999-03-19 2003-06-24 Ask Jeeves, Inc. Grammar template query system
US6510406B1 (en) 1999-03-23 2003-01-21 Mathsoft, Inc. Inverse inference engine for high performance web search
US6601026B2 (en) * 1999-09-17 2003-07-29 Discern Communications, Inc. Information retrieval by natural language querying
US6532469B1 (en) 1999-09-20 2003-03-11 Clearforest Corp. Determining trends using text mining
US20020010574A1 (en) * 2000-04-20 2002-01-24 Valery Tsourikov Natural language processing and query driven information retrieval
US20020007267A1 (en) 2000-04-21 2002-01-17 Leonid Batchilo Expanded search and display of SAO knowledge base information
US20040125877A1 (en) * 2000-07-17 2004-07-01 Shin-Fu Chang Method and system for indexing and content-based adaptive streaming of digital video content
US6738765B1 (en) 2000-08-11 2004-05-18 Attensity Corporation Relational text index creation and searching
US6732097B1 (en) 2000-08-11 2004-05-04 Attensity Corporation Relational text index creation and searching
US6741988B1 (en) 2000-08-11 2004-05-25 Attensity Corporation Relational text index creation and searching
US6728707B1 (en) 2000-08-11 2004-04-27 Attensity Corporation Relational text index creation and searching
US6732098B1 (en) 2000-08-11 2004-05-04 Attensity Corporation Relational text index creation and searching
AUPR082400A0 (en) 2000-10-17 2000-11-09 Telstra R & D Management Pty Ltd An information retrieval system
US20020091671A1 (en) 2000-11-23 2002-07-11 Andreas Prokoph Method and system for data retrieval in large collections of data
US7295965B2 (en) * 2001-06-29 2007-11-13 Honeywell International Inc. Method and apparatus for determining a measure of similarity between natural language sentences
US20030101182A1 (en) * 2001-07-18 2003-05-29 Omri Govrin Method and system for smart search engine and other applications
US7526425B2 (en) 2001-08-14 2009-04-28 Evri Inc. Method and system for extending keyword searching to syntactically and semantically annotated data
US7398201B2 (en) 2001-08-14 2008-07-08 Evri Inc. Method and system for enhanced data searching
US20030115191A1 (en) 2001-12-17 2003-06-19 Max Copperman Efficient and cost-effective content provider for customer relationship management (CRM) or other applications
AU2003239962A1 (en) 2002-06-03 2003-12-19 Arizona Board Of Regents Acting For And On Behalf Of Arizona State University System and method of analyzing the temporal evolution of text using dynamic centering resonance analysis
US20040167883A1 (en) 2002-12-06 2004-08-26 Attensity Corporation Methods and systems for providing a service for producing structured data elements from free text sources
GB2411014A (en) 2004-02-11 2005-08-17 Autonomy Corp Ltd Automatic searching for relevant information
DE102008041740A1 (en) 2007-08-31 2009-03-05 Profine Gmbh Plastic profile with photocatalytically active surface

Also Published As

Publication number Publication date
US20040221235A1 (en) 2004-11-04
WO2003017143A3 (en) 2003-10-30
WO2003017143A2 (en) 2003-02-27
NZ542960A (en) 2007-06-29
US7283951B2 (en) 2007-10-16
CA2457693C (en) 2012-09-11
MXPA04001488A (en) 2004-10-27
EP1419461A2 (en) 2004-05-19

Similar Documents

Publication Publication Date Title
CA2457693A1 (en) Method and system for enhanced data searching
US7987189B2 (en) Content data indexing and result ranking
US6161084A (en) Information retrieval utilizing semantic representation of text by identifying hypernyms and indexing multiple tokenized semantic structures to a same passage of text
US9448995B2 (en) Method and device for performing natural language searches
AU2005217413B2 (en) Intelligent search and retrieval system and method
US8131540B2 (en) Method and system for extending keyword searching to syntactically and semantically annotated data
US8041697B2 (en) Semi-automatic example-based induction of semantic translation rules to support natural language search
US6678677B2 (en) Apparatus and method for information retrieval using self-appending semantic lattice
US6901399B1 (en) System for processing textual inputs using natural language processing techniques
EP2347354B1 (en) Retrieval using a generalized sentence collocation
US20050203900A1 (en) Associative retrieval system and associative retrieval method
CN103136352A (en) Full-text retrieval system based on two-level semantic analysis
WO2001084376A2 (en) System for answering natural language questions
US20060184523A1 (en) Search methods and associated systems
Ngo et al. Ontology-based query expansion with latently related named entities for semantic text search
Litkowski Summarization experiments in DUC 2004
Osipov et al. Application of linguistic knowledge to search precision improvement
Rishel et al. Augmentation of a term/document matrix with part-of-speech tags to improve accuracy of latent semantic analysis.
Schwitter et al. ExtrAns-answer extraction from technical documents by minimal logical forms and selective highlighting
US20040039562A1 (en) Para-linguistic expansion
Klyuev An approach to implementing an intelligent web search
Zhang et al. Research on Lucene-based English-Chinese Cross-Language Information Retrieval.
Smirnov et al. Heterogeneous semantic networks for text representation in intelligent search engine EXACTUS
Han et al. TSS: A hybrid web searches
Stratica et al. Schema-based natural language semantic mapping

Legal Events

Date Code Title Description
EEER Examination request
MKLA Lapsed Effective date: 20200831