Librarian Central

In the debut issue of the Google Librarian newsletter, we published an article by quality engineer Matt Cutts explaining how Google collects and ranks search results. The most common question we heard in response was "How does Google determine which web sites are the most 'trusted'?" Here is his reply:

This question goes to the heart of what we do. You already know the short answer: Google uses more than 100 different factors, including the PageRank algorithm, to determine whether a site is trusted or reputable. If you think of the internet as a democracy, a web page that links to another page is "voting" for that page's value. As we explain in our Technology Overview, PageRank interprets a link from Page A to Page B as a vote for Page B by Page A. PageRank then assesses a page's importance by the votes it receives. But that's not the end of the story. If Page A itself has more votes from other pages, its vote carries more weight. Or to put it another way, if more people trust your site, your trust is more valuable.
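To make the "votes weighted by the voter's own votes" idea concrete, here is a minimal sketch of the simplified PageRank computation described in the original Stanford paper, not Google's production ranking. The function name `pagerank`, the damping factor of 0.85, the fixed iteration count, and the choice to spread a page with no outlinks evenly across all pages are illustrative assumptions. Each page splits its current score among the pages it links to, so a link from a high-scoring page is worth more than a link from an obscure one.

```python
# Minimal sketch of simplified PageRank (power iteration).
# Assumptions (illustrative, not Google's actual system): damping d = 0.85,
# a fixed number of iterations, and pages with no outlinks ("dangling" pages)
# spread their score evenly over every page.

def pagerank(links: dict[str, list[str]], d: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with every page equally "trusted".
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Baseline share every page receives regardless of links.
        new_rank = {p: (1.0 - d) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A link is a vote: the page splits its own score among the
                # pages it links to, so votes from high-scoring pages weigh more.
                share = d * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: distribute its score over all pages.
                share = d * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    web = {"A": ["B"], "B": ["C"], "C": ["A", "B"], "D": ["B"]}
    for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))
```

In this toy web, B collects the most votes and ranks highest. A and C each have exactly one inbound link, yet C outranks A because C's single vote comes from the highly ranked B, which gives all of its score to C. D, which nobody links to, keeps only the baseline share.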

In addition to using the PageRank algorithm, we automatically analyze the content of the pages we crawl. This goes beyond scanning on-page text, which webmasters can easily manipulate through meta tags. We also look at factors like fonts and the placement of words on a page. And we examine the content of neighboring pages, which can provide more clues as to whether the page we're looking at is trustworthy and will be relevant to users.

The long answer is more complicated. Because the way we determine search results is the core of our business, there are some ingredients in our "special sauce" that we can't share. In addition, it goes without saying that we're constantly on guard against people exploiting such information to achieve artificially high placement in our search results. At the same time, Google was born in a university research environment, and there is a large and growing body of academic work exploring and analyzing our technology. That includes the granddaddy of them all, The PageRank Citation Ranking: Bringing Order to the Web, the original Stanford University paper by Larry Page, Sergey Brin, Rajeev Motwani and Terry Winograd. If you'd like to take a look, Google Scholar is a good place to start (especially if you click on the citations as well as the papers themselves).

Finally, you might also want to check out this link, which takes you to a collection of technology papers written by people now at Google. It contains oldies-but-goodies like the Stanford paper on PageRank, but also brand new research about everything from algorithms to artificial intelligence. Enjoy!

Other questions? Send us a note. In each newsletter, we'll try to answer one or two of the most frequently asked questions.
