Google Knowledge Graph Search API

nl · on Dec 20, 2015

This is pretty nice.

Google's knowledge graph is much, much better than any of the open data competition because they have done the work to make it consistent (not in the ACID sense but in the completeness sense).

For example, Wikidata appears good on the surface, but as soon as you try to build against it you find huge holes in the data.

As a more specific example, the most common example you will see on Wikidata is "list the cities with a female mayor in order of population." Great, except it turns out that many (most?) cities aren't marked up with the attribute that makes them count as cities for the purpose of that query.

Knowledge APIs add typing to search. That's really important because it lets you disambiguate queries well (Apple computer vs Apple fruit) and behave more intelligently based on that type.

Things like the DDG API (mentioned in this thread) don't do that. DBpedia/Wikidata/Yago do it, but so inconsistently that the benefits are hard to make useful (as you are coding for the multiple ways types are handled).

frik · on Dec 20, 2015

The data was originally open; they closed it (https://en.wikipedia.org/wiki/Freebase). You call it nice?

yohui · on Dec 20, 2015

According to that Wikipedia article, Freebase moved their original data over to Wikidata before closing: https://plus.google.com/109936836907132434202/posts/bu3z2wVq...

frik · on Dec 20, 2015

Wikidata has a different license; it's just a PR piece that means little. How many facts have been imported since Jan 2015? Wikidata is at 15,473,837, Freebase (summer 2015) is at 3,146,939,673. Basically, Google's Freebase shutdown threw AI research back at least two years (assuming Wikidata can catch up to 90% of Freebase's size in 2018). Now Google, Microsoft and IBM have a competitive advantage: each has its own closed knowledge base.

nl · on Dec 20, 2015

That's not what happened at all.

The knowledge graph is much, much more than Freebase. The Freebase data is still available to download, and they are moving it to Wikidata.

frik · on Dec 20, 2015

Wrong.

http://www.cs.cmu.edu/~nlao/publication/2014.kdd.pdf (page 3, comparison table) KnowledgeGraph is from the same group as Freebase, just more data and closed.

A stale six-month-old Freebase dump gets more useless over time. Wikidata has a different license; it's just a PR piece that means little. 15,473,837 (Wikidata) vs 3,146,939,673 (Freebase): little has changed since Jan 2015.

yohui · on Dec 20, 2015

If I'm reading it correctly, it looks like the comparison table says Knowledge Graph has an order of magnitude more data than Freebase did?

What's the issue with Wikidata's license? Both Freebase and Wikidata seem to be Creative Commons licensed. Is there a catch?

As for progress, if there's anything slowing the migration of Freebase data into Wikidata, I would guess it's Wikidata's different citation standards.

Given that the original Freebase data is still available, would it be correct to say the issue is Google isn't releasing their new data for free?

frik · on Dec 20, 2015

> would it be correct to say the issue is Google isn't releasing their new data for free?

How about: Google shut down a knowledge base that was curated by a community and that provided regular data dumps, an online interface and an API, all with an open license (the original data source is nevertheless Wikipedia et al.). Google's new venture is basically the same core technology and data, but the crawler also runs over scraped web content. And the only access for non-Googlers is via an API. Draw your own conclusion from that.

nl · on Dec 20, 2015

That 3,146,939,673 number is the number of statements (triples), not the number of resources (which is the Wikidata number). Wikidata has 900M statements, not 15M[1].
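To make the statements-vs-resources distinction concrete, here is a toy sketch. The triples are invented for illustration and are not taken from either dataset; it only shows why the two counts can differ so much for the same data.

```python
# Toy triple store: each statement is (subject, predicate, object).
# The data is made up for illustration only.
triples = [
    ("Q1", "instance of", "city"),
    ("Q1", "population", "100000"),
    ("Q1", "head of government", "Q2"),
    ("Q2", "instance of", "human"),
    ("Q2", "sex or gender", "female"),
]

# Number of statements (triples): the kind of figure quoted for Freebase.
statement_count = len(triples)

# Number of distinct resources that appear as a subject:
# the kind of figure quoted for Wikidata.
resource_count = len({subject for subject, _, _ in triples})

print(statement_count, resource_count)
```

Five statements describe only two resources here, so comparing a statement count against a resource count overstates the gap.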

Again, the Google Knowledge Base is much more than an expanded Freebase. It uses Google's Knowledge Vault project to extract from sources outside Freebase, as well as to evaluate and update the Freebase resources. To quote:

> In particular, KV has 1.6B triples, of which 324M have a confidence of 0.7 or higher, and 271M have a confidence of 0.9 or higher. This is about 38 times more than the largest previous comparable system (DeepDive [32]), which has 7M confident facts (Ce Zhang, personal communication). To create a knowledge base of such size, we extract facts from a large variety of sources of Web data, including free text, HTML DOM trees, HTML Web tables, and human annotations of Web pages. (Note that about 1/3 of the 271M confident triples were not previously in Freebase, so we are extracting new knowledge not contained in the prior.)[2]

[1] https://query.wikidata.org/#SELECT%20%28COUNT%28*%29%20AS%20...

[2] http://www.cs.ubc.ca/~murphyk/Papers/kv-kdd14.pdf

kevbin · on Dec 20, 2015

When I query for "Software Engineer", I get:

    {
      "@type" : "EntitySearchResult",
      "result" : {
        "@id" : "kg:/m/0y49634",
        "name" : "Software engineer",
        "@type" : [ "Thing" ],
        "description" : "Fictional Character"
      }
    }

nl · on Dec 20, 2015

It's picking up a bad reference from Freebase (see the id): https://www.googleapis.com/freebase/v1/topic/m/0y49634

That's disappointing.
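For anyone who wants to trace a result back the same way, here is a rough sketch. It assumes the `@id` is always the Freebase MID with a `kg:` prefix, which is what this one result suggests but is not something I can confirm in general. The helper name `freebase_mid` is mine.

```python
import json

# The result as posted above, with the closing brace restored.
raw = """
{
  "@type": "EntitySearchResult",
  "result": {
    "@id": "kg:/m/0y49634",
    "name": "Software engineer",
    "@type": ["Thing"],
    "description": "Fictional Character"
  }
}
"""

def freebase_mid(kg_id: str) -> str:
    """Strip the 'kg:' prefix, assuming the remainder is the Freebase MID."""
    prefix = "kg:"
    return kg_id[len(prefix):] if kg_id.startswith(prefix) else kg_id

entity = json.loads(raw)["result"]
mid = freebase_mid(entity["@id"])

print(mid)
# Same URL shape as the Freebase topic link above.
print("https://www.googleapis.com/freebase/v1/topic" + mid)
```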

dunham · on Dec 20, 2015

Another one:

The second result in a query for "Rogan Josh" is "Jeremy Clarkson", a Person of "Top Gear" fame. Digging up the Freebase record doesn't show any obvious reason why this would happen.

frik · on Dec 20, 2015

So will Google release open data dumps? Google Knowledge Graph is based on the data formerly known as Freebase (see various papers). Google is about to shut down Freebase at the end of 2015. Freebase got bought by Google and was kept open and had a great community. (Wikidata is still several orders of magnitude too small to be an alternative.)

And stale Freebase data from summer 2015 gets more useless every day.

https://en.wikipedia.org/wiki/Freebase

Paper: http://www.cs.cmu.edu/~nlao/publication/2014.kdd.pdf (comparison table on page 3, compare Google KnowledgeVault, KnowledgeGraph and Freebase)

Microsoft bought Powerset (for Bing and Cortana AI), IBM recently bought Blekko (for Watson AI). Google closed Freebase and reuses it for KnowledgeGraph (GoogleNow AI and Search). That recent development hurts independent AI research and smaller AI companies.

igravious · on Dec 20, 2015

Well, well, well. I know what I'll be playing with over the holidays. Some get Legos®, some get knowledge graphs.

But seriously, I've been playing around with Wikidata's Query Service[0]. Here's an example...[1], the example asks, "What is `nature' a part of?" (Once you click through the URL shortener you can click execute to run the SPARQL[2] query. SPARQL's a W3C recommendation, sort of like SQL but for triplestores, but its details are not readily graspable I think.)
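A minimal sketch of what such a query looks like when built by hand, for anyone who would rather not click through the shortener. `P361` is Wikidata's "part of" property. The item ID `Q42` below is only a placeholder and is not the ID for "nature"; the function names are made up for this example.

```python
from urllib.parse import urlencode

ENDPOINT = "https://query.wikidata.org/sparql"

def part_of_query(entity_id: str) -> str:
    """SPARQL asking what the given item is 'part of' (property P361)."""
    return (
        "SELECT ?whole ?wholeLabel WHERE {\n"
        f"  wd:{entity_id} wdt:P361 ?whole .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}"
    )

def query_url(sparql: str) -> str:
    """URL that asks the query service for JSON results via HTTP GET."""
    return ENDPOINT + "?" + urlencode({"query": sparql, "format": "json"})

# Q42 is a placeholder item ID; substitute the ID of the item you care about.
sparql = part_of_query("Q42")
print(sparql)
print(query_url(sparql))
```

The triple pattern `wd:ITEM wdt:P361 ?whole` is the whole query; the `SERVICE` line only asks for English labels so the results are readable.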

It seems like Google's Knowledge Graph is based on Wikidata? I used to think the Semantic Web was always going to be a decade away, but now I think that it is going to play a large part in the near future of the web, though if you pushed me to explain my change in reasoning I don't think I'd be able to. What we need is a Semantic Web browser; no idea what they'd look like, though :(

Here is Wikidata's table of properties from which it builds up its entire knowledge graph[3]. I think it's fascinating.

[0] https://query.wikidata.org/

[1] https://goo.gl/4mIJCL

[2] http://www.w3.org/TR/sparql11-query/

[3] https://www.wikidata.org/wiki/Wikidata:List_of_properties/Su...

jisaacso · on Dec 20, 2015

Google's knowledge graph was in part based on Freebase, a large open knowledge base written by Metaweb. Google acquired Metaweb, continued to grow its triple extractors [1] and eventually shut off write access to the graph. Wikidata is slowly extracting information from the last public version of Freebase to grow out their own knowledge base.

[1] http://www.cs.ubc.ca/~murphyk/Papers/kv-kdd14.pdf

chris_st · on Dec 20, 2015

Wow, thanks! I didn't know there was a Wikipedia knowledge graph. Looks like my holidays may be fun too :-)

iheartmemcache · on Dec 20, 2015

There are tons of RDF sources out there if you'd like: http://dbpedia.org/sparql is generally accepted as a better-curated resource (in both quality and quantity). There are also http://data.linkedmdb.org/ and http://www.rdfdata.org/. I think some parts of the US federal government release their data structured as such too.

chris_st · on Dec 20, 2015

Excellent sources, thanks!

ocdtrekkie · on Dec 20, 2015

Most of Google's data is scraped from Wikipedia, so either/or is probably pretty similar. I assume Google ranks the results of the data better, mind you.

iheartmemcache · on Dec 20, 2015

For a second I thought this actually gave access to the graph, so you could actually traverse it a la 'see the edges, visit the nodes', which surely would have piqued my curiosity. This doesn't seem to offer much more over HN user/DuckDuckGo operator Gabriel's API[1]. I have a console app I've written that queries WolframAlpha and DDG. Between the two, > 90% of my 'fast questions' get answered (with the added bonus of a decent privacy policy).

[1] https://duckduckgo.com/api

mark_l_watson · on Dec 20, 2015

I used Knowledge Graph when I was a contractor at Google. Graph queries are not cheap, but entity lookups should be inexpensive.

I am very happy that Google opened up this API. I used to use Freebase, and I use DBpedia a lot. When I get home from traveling I am looking forward to kicking the tires of the new API.

kuschku · on Dec 20, 2015

And it would be even better if Google would be required to provide full data dumps for download, like Freebase did before Google annexed them.

chris_st · on Dec 20, 2015

It would be interesting to see if one could start with (say) the OpenCyc ontology, and expand it using this or the DDG search.

kwrobel · on Dec 20, 2015

Talking about Cyc ontology, we are working on automatic Wikipedia articles classification: http://cycloped.io/

jisaacso · on Dec 20, 2015

I'm curious how the knowledge graph API performs disambiguation without any context. E.g., if you search for `Apple` will it return the company or the fruit?

finin · on Dec 20, 2015

The current service returns a ranked (with scores) list of up to 200 entities. You can specify a type in your query or filter the results to select types of interest (e.g., Person, Place or Organization). The top result for 'apple' is the Corporation 'Apple, Inc.' and #2 is Thing 'apple' (a fruit). The score is probably based on a graph popularity metric (e.g., number of inlinks) possibly augmented by pagerank. Interestingly, the knowledge graph ID is the same as the Freebase MID and the results for the KG search for 'apple' appear to be a subset of a similar Freebase search and also in the same order.
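A rough sketch of the type filtering and score ranking described above. It assumes the documented `entities:search` endpoint with `query`, `types`, `limit` and `key` parameters, and a response that lists results under `itemListElement` with a `resultScore` each. The function names are mine, and the sample response is hand-written with invented scores, not real API output.

```python
from urllib.parse import urlencode

ENDPOINT = "https://kgsearch.googleapis.com/v1/entities:search"

def search_url(query, api_key, types=(), limit=10):
    """Build a search URL; each requested type is sent as its own 'types' param."""
    params = [("query", query), ("limit", limit), ("key", api_key)]
    params += [("types", t) for t in types]
    return ENDPOINT + "?" + urlencode(params)

def ranked_entities(response):
    """Return (name, types, score) tuples from a parsed response, best first."""
    rows = [
        (
            item["result"].get("name"),
            item["result"].get("@type", []),
            item.get("resultScore", 0.0),
        )
        for item in response.get("itemListElement", [])
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)

def with_type(rows, wanted):
    """Client-side filter: keep only rows whose type list contains `wanted`."""
    return [row for row in rows if wanted in row[1]]

# Hand-written sample shaped like a response; names and scores are illustrative.
sample = {
    "itemListElement": [
        {"result": {"name": "apple", "@type": ["Thing"]}, "resultScore": 12.5},
        {
            "result": {"name": "Apple, Inc.", "@type": ["Corporation", "Thing"]},
            "resultScore": 40.0,
        },
    ]
}

print(search_url("apple", "YOUR_API_KEY", types=["Organization"], limit=5))
print(ranked_entities(sample))
print(with_type(ranked_entities(sample), "Corporation"))
```

Passing `types` in the request lets the service do the filtering; `with_type` is the fallback when you have already fetched an unfiltered list.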

axefrog · on Dec 20, 2015

I don't see how it could. If you search Google for apple, or even ask a person to give you information about the term "Apple", how can they give you what you need without further context?

iheartmemcache · on Dec 20, 2015

The Knowledge "Graph"[1] offers an optional disambiguation parameter you can query with[2]. DuckDuckGo (I swear I'm not a shill or associated with them!) offers a disambiguation API out-of-the-box and integrates some of the RDF material I mentioned below. Here's your "Apple" example[3].

Though, based on the amount of data Google has on the average user, and the fact that you have to sign up to get an API key which is presumably associated with your search history and Gmail history (either any conversations sent from your Gmail account, or any mail you received dispatched from a Gmail account directed at you), they could easily determine if you meant Apple the fruit [you work for the USDA], Apple the company [you're an engineer in SF with a User-Agent history that's very heavily skewed towards Safari], or the etymological basis of Apple, the word [you're a linguist], and disambiguate based on aggregate information. I'd imagine it'd be pretty trivial to do with their existing advertising profile + visit history of any site that either has Google Analytics or a DoubleClick ad.

[1] Again, I struggle to call it a graph. Even if it's implemented as a graph database on Google's end, until the end user can traverse it, it's just a Knowledge API.

[2] https://developers.google.com/knowledge-graph/reference/rest... See: `types'.

[3] http://api.duckduckgo.com/?q=apple&format=json&pretty=1
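A small sketch of reading the DuckDuckGo disambiguation result in [3]. It assumes the response lists candidate meanings under `RelatedTopics`, with some entries grouped under a nested `Topics` list; treat that layout as an assumption to check against a live response. The sample data and function names are invented for illustration.

```python
from urllib.parse import urlencode

def ddg_url(term: str) -> str:
    """Build the Instant Answer URL in the same form as link [3]."""
    return "http://api.duckduckgo.com/?" + urlencode(
        {"q": term, "format": "json", "pretty": 1}
    )

def candidate_meanings(response: dict) -> list:
    """Flatten 'RelatedTopics', descending into grouped entries under 'Topics'."""
    meanings = []
    for entry in response.get("RelatedTopics", []):
        if "Topics" in entry:  # grouped section (assumed layout)
            meanings.extend(t["Text"] for t in entry["Topics"] if "Text" in t)
        elif "Text" in entry:
            meanings.append(entry["Text"])
    return meanings

# Hand-written sample; the field layout is assumed and the texts are invented.
sample = {
    "RelatedTopics": [
        {"Text": "Apple Inc. A technology company."},
        {"Name": "Botany", "Topics": [{"Text": "Apple A fruit."}]},
    ]
}

print(ddg_url("apple"))
print(candidate_meanings(sample))
```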

jisaacso · on Dec 20, 2015

Really cool, thanks! The `types` seem to be a good way to add context if you know, a priori, the type you're looking for.

Google definitely fuses user data into their knowledge graph. This is seen in Freebase's `g.` identifier [1]. I'm curious if they'd influence their publicly facing API algorithms using that data.

[1] https://groups.google.com/forum/#!topic/freebase-discuss/_8x...

jisaacso · on Dec 20, 2015

Completely agree! I'm curious if the `query` parameter in the API performs well on long queries (with context) or if it needs to be focused on a single entity's name.

mrnismo92 · on Dec 20, 2015

Whoa, this is neat. Would love to use this to make small, subject-specific tools for K-12 schools.