When it comes to search engines, index size is important.
But different companies measure index size differently. We'd like
to discuss how we calculate search index size, and some simple tests
anyone can run to gauge the comprehensiveness of Google or any other
search engine.
To begin with, what's a search index?
A search index is the repository of all of the documents that a search
engine has available to search. Search engines build their indexes
by using web crawlers, software that moves from link to link on the
web, reading pages and adding their contents to the index.
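To make the crawl-and-index loop concrete, here is a minimal sketch. It assumes a tiny hypothetical in-memory "web" (the `TOY_WEB` dictionary and the `crawl` function are illustrative names, not anything Google has described). It follows links breadth-first and adds each page's words to a simple inverted index that maps each word to the pages containing it. A real crawler would fetch pages over HTTP, parse HTML, respect robots.txt, and run at vastly larger scale.

```python
from collections import deque

# Hypothetical toy "web": each URL maps to (page text, outgoing links).
# A real crawler would fetch pages over HTTP and parse their HTML instead.
TOY_WEB = {
    "a.example": ("red fox jumps", ["b.example", "c.example"]),
    "b.example": ("blue fox sleeps", ["a.example"]),
    "c.example": ("red hen", ["d.example"]),
    "d.example": ("quiet hen", []),
}


def crawl(seed, web):
    """Breadth-first crawl from seed, building an inverted index (word -> set of URLs)."""
    index = {}
    seen = {seed}             # URLs already queued, so no page is visited twice
    frontier = deque([seed])  # URLs waiting to be read
    while frontier:
        url = frontier.popleft()
        if url not in web:    # a link we cannot fetch: nothing to add to the index
            continue
        text, links = web[url]
        for word in text.split():
            index.setdefault(word, set()).add(url)
        for link in links:    # move from link to link
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return index


index = crawl("a.example", TOY_WEB)
print(sorted(index["red"]))  # ['a.example', 'c.example']
```

Breadth-first order is just one simple choice here; production crawlers prioritize which links to follow next.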
So how big is Google's index?
Search engines' published index-size figures vary greatly
and are no longer easily comparable. Often, for instance, web crawlers
retrieve duplicate copies of a single page, or links to documents that
they haven't crawled and whose content therefore isn't in the index. At
Google we believe the essential quality of an index isn't the total
number of documents but its comprehensiveness: which unique
documents are in the index. So we don't count duplicate or uncrawled
pages. According to our internal testing, our newly expanded search
index is more than three times larger than that of any other search
engine.
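The counting rule described above (skip uncrawled links, count each unique document once) can be illustrated with a small sketch. Everything here is hypothetical: the `CRAWL_LOG` data, the `fingerprint` and `count_unique_documents` helpers, and the choice to detect duplicates by hashing whitespace-normalized content are assumptions for illustration, not Google's actual method. Real systems typically use near-duplicate detection rather than this exact-match check.

```python
import hashlib

# Hypothetical crawl log: (URL, page content). None means the URL was
# discovered as a link but never fetched, so its content is not in the index.
CRAWL_LOG = [
    ("http://example.com/", "Welcome to Example"),
    ("http://example.com/index.html", "Welcome to Example"),  # duplicate content
    ("http://www.example.com/", "Welcome  to Example"),       # duplicate after whitespace normalization
    ("http://example.com/about", "About Example"),
    ("http://example.com/unfetched", None),                   # linked to but never crawled
]


def fingerprint(content):
    """Hash whitespace-normalized content so trivially different copies collide."""
    normalized = " ".join(content.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def count_unique_documents(log):
    """Count distinct crawled documents, skipping uncrawled URLs and duplicates."""
    fingerprints = set()
    for _url, content in log:
        if content is None:  # uncrawled: no content in the index to count
            continue
        fingerprints.add(fingerprint(content))
    return len(fingerprints)


print(len(CRAWL_LOG))                     # 5 entries in the raw log
print(count_unique_documents(CRAWL_LOG))  # 2 unique documents
```

The gap between the raw entry count and the unique-document count is exactly why totals reported under different rules are hard to compare.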
But can we prove it?
Yes, but even better, you can prove it yourself. The basic test for
search engine comprehensiveness is whether you can find uncommon information.
Popular queries return millions of results, but even the most obsessive
searcher isn't about to surf a few million pages, or even a tiny fraction
of them; in most of these cases, you'll either quickly find what you're
looking for or refine your search to be more focused.
To see for yourself, try searching for something very specific, or try a query that previously
returned very few results. For example, you could enter your name or
hometown, along with your favorite color or animal. Navigate to the
last page to see how many results the search engine really delivered.
(On the last page, you may have to click the "repeat the search with
the omitted results included" link to see all the results.) Do this
on different search engines for several queries and see what you come
up with. As you can imagine, we've run quite a few tests like this,
and we expect your results will be very similar to ours.
Why is this important?
Web search lies at the heart of how millions of people find things
online. At Google, we know that index comprehensiveness is key to
providing our users with the best and most useful results for every
search they do. If a document is not in our index, we can't show it
to users. We take comprehensiveness seriously, but even more than that, we care about the quality of the results we're able to offer.

