

US 20040210826A1

(19) United States

(12) Patent Application Publication (10) Pub. No.: US 2004/0210826 A1

Najork (43) Pub. Date: Oct. 21, 2004

(54) SYSTEM AND METHOD FOR MAINTAINING A DISTRIBUTED DATABASE OF HYPERLINKS

(75) Inventor: Marc A. Najork, Palo Alto, CA (US)

Correspondence Address:
WOODCOCK WASHBURN LLP
ONE LIBERTY PLACE, 46TH FLOOR
1650 MARKET STREET
PHILADELPHIA, PA 19103 (US)

(73) Assignee: Microsoft Corporation

(21) Appl. No.: 10/413,645

(22) Filed: Apr. 15, 2003

Publication Classification (51) Int. Cl.7 G06F 17/00

(52) U.S. Cl. 715/501.1

(57) ABSTRACT

Nodes of a web graph are distributed over a cluster of computers. Tables distributed over the computers map source (destination) locations to lists of destination (source) locations. To accommodate traversing hyperlinks forward, a table maps the location of a web page "X" to the locations of all the web pages that "X" links to. To accommodate traversing hyperlinks backward, a table maps the location of a web page "Y" to the locations of all web pages that link to "Y". URLs identifying web pages are mapped to fixed-sized checksums, reducing the storage required for each node while providing a way to map a URL to a node. The mapping is chosen to preserve information about the web server component of the URL. Nodes can then be partitioned across the machines in the cluster such that nodes corresponding to URLs on the same web server are assigned to the same machine in the cluster.
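
The arrangement described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the application itself: the 64-bit checksum layout (a 32-bit host hash in the high bits, a 32-bit whole-URL hash in the low bits), the use of SHA-256, the modulo partitioning rule, and every name (url_checksum, machine_for, LinkCluster) are assumptions chosen only to show how a host-preserving checksum lets pages from one web server land on the same machine, with one forward table and one backward table per machine.

```python
import hashlib
from collections import defaultdict
from urllib.parse import urlsplit

HOST_BITS = 32  # assumed width of the web-server part of the checksum
PATH_BITS = 32  # assumed width of the per-URL part of the checksum


def _hash_bits(text, bits):
    """Return the top `bits` bits of a 64-bit prefix of SHA-256(text)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") >> (64 - bits)


def url_checksum(url):
    """Map a URL to a fixed-size checksum whose high bits depend only on the host."""
    host = (urlsplit(url).hostname or "").lower()
    return (_hash_bits(host, HOST_BITS) << PATH_BITS) | _hash_bits(url, PATH_BITS)


def machine_for(checksum, num_machines):
    """Pick a machine from the host bits, so one web server maps to one machine."""
    return (checksum >> PATH_BITS) % num_machines


class LinkCluster:
    """In-process stand-in for a cluster: one forward and one backward table per machine."""

    def __init__(self, num_machines):
        self.num_machines = num_machines
        self.forward = [defaultdict(list) for _ in range(num_machines)]
        self.backward = [defaultdict(list) for _ in range(num_machines)]

    def add_link(self, src_url, dst_url):
        src, dst = url_checksum(src_url), url_checksum(dst_url)
        # Forward table lives on the source's machine, backward table on the destination's.
        self.forward[machine_for(src, self.num_machines)][src].append(dst)
        self.backward[machine_for(dst, self.num_machines)][dst].append(src)

    def links_from(self, url):
        """Checksums of all pages that `url` links to (forward traversal)."""
        c = url_checksum(url)
        return list(self.forward[machine_for(c, self.num_machines)].get(c, []))

    def links_to(self, url):
        """Checksums of all pages that link to `url` (backward traversal)."""
        c = url_checksum(url)
        return list(self.backward[machine_for(c, self.num_machines)].get(c, []))
```

In this sketch, calling add_link("http://a.example/x", "http://b.example/y") records the link in the forward table of the machine that owns a.example and in the backward table of the machine that owns b.example; links_from and links_to then each consult a single machine. A real deployment would place the per-machine tables on separate hosts and would need a policy for checksum collisions, neither of which is modeled here.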
