

The disclosed computer systems and techniques use an ensemble semantics framework to combine knowledge acquisition systems, yielding significantly higher quality resources than any single system in isolation. Gains in entity extraction are achieved by combining state-of-the-art distributional and pattern-based systems with a large set of features drawn from sources such as a webcrawl, query logs, and wisdom-of-the-crowd sources. This results in improved query interpretation and greater relevancy of search results and advertising.

Claims

1. A computer system for providing results to users, the computer system configured to:

extract instances from a plurality of sources using a plurality of knowledge extractors;

aggregate the instances;

extract a feature vector for an instance using a plurality of feature generators, wherein one of the feature generators:
extracts contexts of a query log for a plurality of seeds;
calculates an association statistic between the contexts and seeds;
sorts the contexts by the calculated association statistics and selects a group of the sorted contexts;
for each selected context, generates a feature for a candidate instance comprising the association statistic between the candidate instance and the context; and
wherein the computer system is configured to build a model by using a modeler, and utilizing features extracted by the plurality of feature generators and extracted instances.
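
The query-log feature generator recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation. It assumes a query log is a plain list of query strings, that a "context" is the query with the term replaced by a slot marker, that the association statistic is pointwise mutual information, and that a context is scored against the seed set by summing its statistic over all seeds. The names `SLOT`, `extract_context`, `count_pairs`, `pmi`, `query_log_features`, and the `top_k` parameter are hypothetical.

```python
import math
from collections import Counter

SLOT = "<X>"  # hypothetical marker for the position the term occupied


def extract_context(query, term):
    """Return the query with `term` replaced by SLOT, or None if the term is absent."""
    tokens, term_tokens = query.lower().split(), term.lower().split()
    n = len(term_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == term_tokens:
            return " ".join(tokens[:i] + [SLOT] + tokens[i + n:])
    return None


def count_pairs(query_log, terms):
    """Count (term, context) pairs plus the marginal counts of terms and contexts."""
    pair, term_n, ctx_n = Counter(), Counter(), Counter()
    for query in query_log:
        for term in terms:
            ctx = extract_context(query, term)
            if ctx is not None and ctx != SLOT:  # skip queries that are only the term
                pair[(term, ctx)] += 1
                term_n[term] += 1
                ctx_n[ctx] += 1
    return pair, term_n, ctx_n


def pmi(term, ctx, pair, term_n, ctx_n, total):
    """PMI estimated from counts; unseen pairs get 0.0 (a simplification, not -inf)."""
    joint = pair[(term, ctx)]
    if joint == 0:
        return 0.0
    return math.log(joint * total / (term_n[term] * ctx_n[ctx]))


def query_log_features(query_log, seeds, candidates, top_k=2):
    """Select the top_k contexts for the seeds, then score each candidate on them."""
    pair, term_n, ctx_n = count_pairs(query_log, list(seeds) + list(candidates))
    stats = (pair, term_n, ctx_n, sum(pair.values()))
    # Assumption: a context's score is the sum of its PMI with every seed.
    seed_score = {c: sum(pmi(s, c, *stats) for s in seeds) for c in ctx_n}
    selected = sorted(seed_score, key=lambda c: (-seed_score[c], c))[:top_k]
    features = {cand: [pmi(cand, c, *stats) for c in selected] for cand in candidates}
    return selected, features
```

Calling `query_log_features(log, ["paris", "london"], ["berlin", "banana"])` on a small log returns the selected contexts and, for each candidate, one feature per selected context. A candidate that never appears in a seed-like context receives zeros under the simplification noted in `pmi`.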

2. The computer system of claim 1, wherein the computer system is further configured to decode candidate instances with a decoder, based on the model.

3. The computer system of claim 1, wherein the association statistic is a pointwise mutual information value.
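
For reference, pointwise mutual information is conventionally defined as follows. The symbols are assumptions introduced here for illustration: $x$ is an instance (a seed or a candidate), $c$ is a context, $n(\cdot)$ is an observed count, and $N$ is the total number of observed instance-context pairs. The second form is the usual maximum-likelihood estimate from counts.

```latex
\mathrm{pmi}(x, c) \;=\; \log \frac{P(x, c)}{P(x)\,P(c)}
\;\approx\; \log \frac{n(x, c)\, N}{n(x)\, n(c)}
```

A positive value indicates that the instance and the context co-occur more often than would be expected if they were independent.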

4. The computer system of claim 1, wherein the computer system is further configured to:

generate a vector centroid for a group of seeds from the feature vectors of the seeds; and

for each candidate instance, calculate a vector similarity between a feature vector of the candidate instance and a feature vector of the centroid.

5. The computer system of claim 1, wherein the computer system is further configured to:

generate a centroid for a group of seeds; and

for each candidate instance, calculate a vector similarity between the feature vector of the candidate instance and a feature vector for each of the seeds.
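
The centroid-based and per-seed similarity features of claims 4 and 5 might be sketched as below. The claims do not name a particular vector similarity, so cosine similarity is an assumption, as are the sparse-dictionary vector representation and the function names `centroid`, `cosine`, and `similarity_features`.

```python
import math


def centroid(vectors):
    """Average a non-empty list of sparse vectors (dicts of feature name -> weight)."""
    total = {}
    for vec in vectors:
        for key, value in vec.items():
            total[key] = total.get(key, 0.0) + value
    return {key: value / len(vectors) for key, value in total.items()}


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either vector is all zeros."""
    dot = sum(weight * b.get(key, 0.0) for key, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_features(candidate_vec, seed_vecs):
    """Similarity to the seed centroid (claim 4) and to each seed (claim 5)."""
    return {
        "sim_to_centroid": cosine(candidate_vec, centroid(seed_vecs)),
        "sim_to_each_seed": [cosine(candidate_vec, s) for s in seed_vecs],
    }
```

The centroid feature summarizes how close a candidate is to the seed set as a whole, while the per-seed similarities preserve information that averaging would discard, for example a candidate that closely matches a single seed.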

6. The computer system of claim 1, wherein the computer system is further configured to extract a group of tables that contain a seed.

7. The computer system of claim 6, wherein the computer system is further configured to generate a feature that is the pointwise mutual information value between the seed and a candidate occurring in the same rows and columns of the extracted tables.

8. The computer system of claim 6, wherein the computer system is further configured to generate a feature that is an average of the pointwise mutual information value between the candidate and all seeds co-occurring in the same rows and columns of the extracted tables.
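
One possible reading of the table features in claims 6 through 8 is sketched below. It assumes each table is a rectangular list of rows of cell strings, that two terms co-occur when they share a row or a column of a kept table, and that probabilities are estimated over rows and columns as counting units. These choices, and the names `table_lines` and `table_pmi_features`, are illustrative assumptions rather than details taken from the claims.

```python
import math
from collections import Counter
from itertools import combinations


def table_lines(table):
    """Yield each row and each column of a rectangular table as a set of cells."""
    for row in table:
        yield set(row)
    for column in zip(*table):
        yield set(column)


def table_pmi_features(tables, seeds, candidate):
    """Return per-seed PMI with the candidate and the average over co-occurring seeds."""
    seeds = set(seeds)
    # Claim 6: keep only the tables that contain at least one seed.
    kept = [t for t in tables if any(cell in seeds for row in t for cell in row)]
    single, pair, n_lines = Counter(), Counter(), 0
    for table in kept:
        for line in table_lines(table):
            n_lines += 1
            single.update(line)
            pair.update(frozenset(p) for p in combinations(sorted(line), 2))

    def pmi(a, b):
        joint = pair[frozenset((a, b))]
        if joint == 0:
            return 0.0  # simplification: no co-occurrence is scored as 0.0
        return math.log(joint * n_lines / (single[a] * single[b]))

    # Claim 7: one PMI value per seed.
    per_seed = {s: pmi(s, candidate) for s in seeds}
    # Claim 8: average over the seeds that actually co-occur with the candidate.
    hits = [per_seed[s] for s in seeds if pair[frozenset((s, candidate))] > 0]
    average = sum(hits) / len(hits) if hits else 0.0
    return per_seed, average
```

Restricting the counts to tables that contain a seed keeps unrelated tables from diluting the statistics, at the cost of estimating the marginal counts from a smaller sample.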

9. The computer system of claim 1, wherein the computer system is further configured to build the model using manually annotated negative and positive instances and feature vectors.

10. The computer system of claim 9, wherein the computer system is configured to generate the training sets with trusted positive instances.

11. The computer system of claim 10, wherein the computer system is configured to generate the trusted positive instances with a trusted knowledge extractor of the plurality of knowledge extractors.

12. The computer system of claim 9, wherein the computer system is configured to generate the training sets with external positive instances.

13. The computer system of claim 9, wherein the computer system is configured to generate the training sets with same class negative instances.

14. The computer system of claim 9, wherein the computer system is configured to generate the training sets with near class negative instances.

15. The computer system of claim 13, wherein the computer system is configured to generate the training sets with same class negatives acquired as a random sample of instances extracted by only a distributional knowledge extractor of the plurality of knowledge extractors.

16. The computer system of claim 13, wherein the computer system is configured to generate the training sets with same class negatives acquired as a random sample of instances extracted by only a pattern-based knowledge extractor of the plurality of knowledge extractors.

17. The computer system of claim 10, wherein the computer system is configured to generate the training sets with generic negative instances.
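
The training-set options of claims 10 through 17 could be combined roughly as in the following sketch. It assumes the extractor outputs for one class are available as a dictionary from extractor name to a set of instances, that trusted positives are everything produced by one designated extractor, and that same-class negatives are sampled from instances produced by only one other extractor. The function name `build_training_set` and all of its parameters are hypothetical, and near-class and external instances are omitted.

```python
import random


def build_training_set(extracted, trusted_extractor, negative_extractor,
                       generic_negatives, n_negatives, seed=0):
    """Return a dict mapping instance -> label (1 = positive, 0 = negative)."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    labels = {instance: 1 for instance in extracted[trusted_extractor]}

    # Instances found by any extractor other than the negative source.
    others = set().union(
        *(found for name, found in extracted.items() if name != negative_extractor)
    )
    # Same-class negatives: instances extracted ONLY by the negative source.
    only_here = sorted(set(extracted[negative_extractor]) - others)
    for instance in rng.sample(only_here, min(n_negatives, len(only_here))):
        labels[instance] = 0

    # Generic negatives never override a trusted positive.
    for instance in generic_negatives:
        labels.setdefault(instance, 0)
    return labels
```

Sampling negatives from instances that only a single extractor proposed reflects the intuition that agreement between extractors is evidence of correctness. The sketch treats such instances as negative without the manual annotation mentioned in claim 9.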

18. A computer system for providing results to users, the computer system configured to:

extract instances from a plurality of sources using a plurality of knowledge extractors;

aggregate the instances;

extract a feature vector for an instance using one of a plurality of feature generators, wherein one of the feature generators is configured to calculate a distributional similarity on a query log between a seed and a candidate instance for each feature vector; and

build a model by using a modeler, and utilizing feature vectors extracted by the plurality of feature generators and extracted instances.

19. The computer system of claim 18, wherein the computer system is further configured to decode candidate instances with a decoder based on the model.

20. The computer system of claim 18, wherein the computer system, to calculate the distributional similarity, is further configured to:

generate a centroid for a group of seeds; and

for each candidate instance, calculate a vector similarity between a feature vector of the candidate instance and a feature vector of the centroid.

21. The computer system of claim 18, wherein the computer system, to calculate the distributional similarity, is further configured to:

generate a centroid for a group of seeds; and

for each candidate instance, calculate a vector similarity between the feature vector of the candidate instance and a feature vector for each of the group of seeds.

22. A computer system for providing results to users, the computer system configured to:

extract instances from a plurality of sources using a plurality of knowledge extractors;

aggregate the instances;

extract a feature vector for an instance using one of a plurality of feature generators, wherein one of the feature generators is configured to generate a feature that is a pointwise mutual information value between a seed and a candidate occurring in the same rows and columns of extracted tables; and

build a decoder utilizing feature vectors extracted by the plurality of feature generators and extracted instances.

23. The computer system of claim 22, wherein the computer system is further configured to generate a feature that is an average of a pointwise mutual information value between the candidate and all seeds co-occurring in the same rows and columns of the extracted tables.