
Patents

A system and method are provided for searching for desired items in a network of information resources. In particular, the system and method are advantageously applicable to searching for World Wide Web pages having desired content. An initial set of pages is selected, preferably by running a conventional keyword-based query and then further selecting pages pointing to, or pointed to from, the pages found by the keyword-based query. Alternatively, the invention may be applied to a single page, in which case the initial set includes the pages pointed to by the single page and the pages which point to the single page. Then, iteratively, authoritativeness values are computed for the pages of the initial set, based on the number of links to and from the pages. One or more communities, or "neighborhoods", of related pages are defined based on the authoritativeness values thus produced. Such communities of pages are likely to be of particular interest and value to the user who is interested in the...
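The selection of the initial set described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_initial_set`, the dictionary layout of the link graph, and the toy page names are all hypothetical, and the sketch assumes the seed pages themselves stay in the set.

```python
from typing import Dict, Iterable, Set


def build_initial_set(seeds: Iterable[str], out_links: Dict[str, Set[str]]) -> Set[str]:
    """Return the seed pages plus every page they point to or are pointed to by.

    `out_links` is an assumed layout: page -> set of pages it links to.
    """
    seed_set = set(seeds)
    initial = set(seed_set)  # assumption: the seeds remain part of the set
    for page in seed_set:
        # Pages pointed to by a seed page.
        initial |= out_links.get(page, set())
    for source, targets in out_links.items():
        # Pages that point to at least one seed page.
        if targets & seed_set:
            initial.add(source)
    return initial


# Hypothetical toy link graph.
toy_links = {
    "a": {"b"},
    "b": {"c"},
    "c": set(),
    "d": {"a"},
    "e": {"c"},
}

# Keyword-based variant: "a" stands in for the result of a keyword query.
initial_from_query = build_initial_set(["a"], toy_links)
# Single-page variant: start from the page "c" alone.
initial_from_page = build_initial_set(["c"], toy_links)
```

The same helper serves both variants because a single page is treated as a one-element seed list.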

Inventor: Jon Michael Kleinberg
Original Assignee: International Business Machines Corporation
Primary Examiner: John Loomis
Current U.S. Classification: 1/1; 707/999.005; 707/999.009; 707/999.101; 707/E17.108
International Classification: G06F 17/30


Citations

Cited Patent | Filing date | Issue date | Original Assignee | Title
US5257185 | May 21, 1990 | Oct 26, 1993 | Ann W. Farley | Interactive, cross-referenced knowledge system
US5446891 | Nov 2, 1994 | Aug 29, 1995 | International Business Machines Corporation | System for adjusting hypertext links with weighed user goals and activities
US5778363 | Dec 30, 1996 | Jul 7, 1998 | Intel Corporation | Method for measuring thresholded relevance of a document to a specified topic
US5826031 | Jun 10, 1996 | Oct 20, 1998 | Sun Microsystems, Inc. | Method and system for prioritized downloading of embedded web objects
US5835905 | Apr 9, 1997 | Nov 10, 1998 | Xerox Corporation | System for predicting documents relevant to focus documents by spreading activation through network representations of a linked collection of documents

Claims

1. A computer program product, for use with a computer system, for directing the computer system to execute a search of information resources, the resources having content-based links between each other, to identify a desired subset of the information resources which satisfy a desired criterion, the computer program product comprising:

a computer-readable medium;
means, provided on the recording medium, for directing the computer system to identify an initial set of information resources;
means, provided on the recording medium, for directing the computer system to define initial authoritativeness information for the initial set;
means, provided on the recording medium, for directing the computer system to use the initial authoritativeness information as input authoritativeness information, to execute the steps of:
(i) producing first authoritativeness information about a set of information resources pointed to by links in resources of the input set, and
(ii) producing second authoritativeness information about a set of information resources having links that point to resources of the input set; and
means, provided on the recording medium, for directing the computer system to produce a final set of information resources based on the first and second authoritativeness information.

2. A computer program product as recited in claim 1, wherein the information resources include World Wide Web pages, and the content-based links include hyperlinks.

3. A computer program product as recited in claim 1, wherein the means for directing to identify an initial set of information resources includes means, provided on the recording medium, for directing the computer system to obtain, as an input, an information resource containing subject matter of interest.

4. A computer program product as recited in claim 3, wherein the means for directing to identify an initial set of information resources includes means, provided on the recording medium, for directing the computer system to identify a further set of information resources linked to the input information resource.

5. A computer program product as recited in claim 1, wherein:

the means for directing to execute the steps of producing first and second authoritativeness information is operative in a series of iterations;
the initial authoritativeness information is used as input authoritativeness information for a first iteration; and
the produced first and second authoritativeness information is a result of the iteration, the first and second authoritativeness information produced in a given iteration to be used as the input authoritativeness information for the next iteration.

6. A computer program product as recited in claim 1 further comprising means, provided on the recording medium, for directing the computer system to execute the steps of producing first authoritativeness information and producing second authoritativeness information in a series of iterations until a predetermined condition is met.

7. A computer program product as recited in claim 6, wherein the predetermined condition includes the execution of a specified number of iterations.

8. A computer program product as recited in claim 6, wherein the predetermined condition includes a steady state in which further iterations result in substantially the same results.
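The two stopping conditions recited in claims 7 and 8 can be combined in one loop. The sketch below is only an illustration: `iterate_until_done`, its parameters, and the choice of "largest per-entry change below a tolerance" as the test for a steady state are assumptions, since the claims do not say how "substantially the same results" is measured.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def iterate_until_done(
    step: Callable[[Vector], Vector],
    start: Vector,
    max_iterations: int = 50,
    tolerance: float = 1e-9,
) -> Tuple[Vector, int]:
    """Apply `step` repeatedly; return the last vector and the iterations used."""
    current = list(start)
    for iteration in range(1, max_iterations + 1):
        updated = step(current)
        # Assumed steady-state test: no entry moved by more than `tolerance`.
        change = max((abs(u - c) for u, c in zip(updated, current)), default=0.0)
        current = updated
        if change <= tolerance:
            return current, iteration
    # Specified number of iterations reached without a steady state.
    return current, max_iterations


# Usage with a stand-in update that halves every entry.
halve = lambda v: [x / 2 for x in v]
settled, used = iterate_until_done(halve, [1.0], tolerance=1e-3)
capped, used_capped = iterate_until_done(halve, [1.0], max_iterations=5, tolerance=1e-3)
```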

9. A computer program product as recited in claim 6, wherein the means for directing to identify an initial set of information resources includes means, provided on the recording medium, for directing the computer system to execute a keyword-based query search, results of the search including information resources to be included in the initial set.

10. A computer program product as recited in claim 9, wherein the means for directing to identify an initial set of information resources further includes means, provided on the recording medium, for directing the computer system to identify information resources linked to or from the information resources which are the results of the search, the former information resources also to be included in the initial set.

11. A computer program product as recited in claim 10, wherein the means for directing to define initial authoritativeness information includes means, provided on the recording medium, for directing the computer system to select an initial numerical authoritativeness value for each of the information resources of the initial set.

12. A computer program product as recited in claim 11, wherein the means for directing to define initial authoritativeness information further includes means, provided on the recording medium, for directing the computer system to define an authority value and a hub value for each of the information resources of the initial set.

13. A computer program product as recited in claim 12, wherein the defined authority values and hub values are processed as vectors, each vector containing a respective term corresponding with each respective one of the information resources of the initial set, and having stored therein the value defined for that respective one of the information resources of the initial set.

14. A computer program product as recited in claim 12, wherein:

an initial hub value is defined as 1 if the information resource was found by the keyword-based query search, and 0 if the information resource is linked to or from the information resources which are the results of the search; and
an initial authority value is defined as 0 for all information resources.
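The starting values of claim 14 can be written down directly. In this sketch the name `initial_values` and the use of dictionaries keyed by page (as a stand-in for the vectors of claim 13) are illustrative assumptions.

```python
from typing import Dict, Iterable, Tuple


def initial_values(
    initial_set: Iterable[str], query_hits: Iterable[str]
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (hub, authority) starting values for every page of the initial set."""
    hits = set(query_hits)
    # Hub value: 1 for pages found by the keyword query, 0 for pages that were
    # added only because they link to or from a query result.
    hub = {page: (1.0 if page in hits else 0.0) for page in initial_set}
    # Authority value: 0 for every page.
    authority = {page: 0.0 for page in hub}
    return hub, authority


hub0, authority0 = initial_values(["a", "b", "d"], ["a"])
```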

15. A computer program product as recited in claim 12, wherein, for each iteration:

the hub value for an information resource is updated as the sum of the authority values for authority information resources which point to the hub information resource; and
the authority value for an information resource is updated as the sum of the hub values for hub information resources which are pointed to by the information resource.
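One update pass of claim 15 might look like the sketch below. Several points are assumptions rather than statements of the source: the name `update_once`, the `out_links` layout, and the ordering (authority values first, then hub values from the fresh authority values), which is chosen so that the all-zero authority start of claim 14 does not zero out every score. The link direction follows the claim wording as written; the widely described hubs-and-authorities formulation uses the opposite direction, which amounts to swapping the two lookups.

```python
from typing import Dict, Set, Tuple

Scores = Dict[str, float]
Links = Dict[str, Set[str]]  # assumed layout: page -> pages it points to


def update_once(out_links: Links, hub: Scores) -> Tuple[Scores, Scores]:
    """Return (new_hub, new_authority) after one pass over the pages in `hub`."""
    pages = set(hub)
    # Authority of p: sum of the hub values of the pages p points to.
    new_authority = {
        p: sum((hub[q] for q in out_links.get(p, set()) if q in pages), 0.0)
        for p in pages
    }
    # Hub of p: sum of the authority values of the pages that point to p.
    new_hub = {p: 0.0 for p in pages}
    for q in pages:
        for p in out_links.get(q, set()):
            if p in pages:
                new_hub[p] += new_authority[q]
    return new_hub, new_authority


# Hypothetical three-page example: d -> a -> b, with "a" as the query result.
links = {"a": {"b"}, "b": set(), "d": {"a"}}
hub1, authority1 = update_once(links, {"a": 1.0, "b": 0.0, "d": 0.0})
```

Links that leave the initial set are ignored, so the sums only involve pages that carry a value.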

16. A computer program product as recited in claim 15, wherein each iteration further includes normalizing the hub and authority values for the information resources.
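The claim does not name a particular normalization. A minimal sketch, assuming the values are scaled so that their squares sum to 1 (the function name `normalize` is hypothetical), is:

```python
import math
from typing import Dict


def normalize(values: Dict[str, float]) -> Dict[str, float]:
    """Scale the values to unit Euclidean length (an assumed choice of norm)."""
    norm = math.sqrt(sum(v * v for v in values.values()))
    if norm == 0.0:
        # Nothing to scale; return a copy so callers never share the input.
        return dict(values)
    return {page: v / norm for page, v in values.items()}


scaled = normalize({"a": 3.0, "b": 4.0})
unchanged = normalize({"a": 0.0})
```

Normalizing after every iteration keeps the values bounded while leaving their relative order unchanged.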

17. A computer program product as recited in claim 1, wherein the means for directing to produce a final set of information resources includes means, provided on the recording medium, for directing the computer system to select information resources from the set based on their hub and authority values.

18. A computer program product as recited in claim 17, wherein the means for directing to select includes means, provided on the recording medium, for directing the computer system to select information resources whose hub values or authority values have greatest magnitudes.

19. A computer program product as recited in claim 17, wherein the means for directing to select includes means, provided on the recording medium, for directing the computer system to select a plurality of successive communities, selecting each successive community including selecting information resources whose hub values or authority values have greatest magnitudes of those information resources not already selected for a prior community.
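The selection steps of claims 18 and 19 can be sketched as below. This reads the claims literally against a single score table; whether scores are recomputed between communities is not stated in the claims, so that part is an assumption, as are the names `top_pages` and `successive_communities` and the alphabetical tie-break.

```python
from typing import AbstractSet, Dict, List


def top_pages(
    scores: Dict[str, float], count: int, exclude: AbstractSet[str] = frozenset()
) -> List[str]:
    """Return up to `count` pages with the greatest score magnitudes."""
    candidates = [p for p in scores if p not in exclude]
    # Sort by descending magnitude; break ties by name for a stable result.
    candidates.sort(key=lambda p: (-abs(scores[p]), p))
    return candidates[:count]


def successive_communities(
    scores: Dict[str, float], size: int, how_many: int
) -> List[List[str]]:
    """Pick communities one after another, skipping pages already selected."""
    chosen: set = set()
    communities: List[List[str]] = []
    for _ in range(how_many):
        community = top_pages(scores, size, chosen)
        if not community:
            break  # every page has already been assigned
        communities.append(community)
        chosen.update(community)
    return communities


example_scores = {"a": 0.9, "b": -0.7, "c": 0.2, "d": 0.1}
groups = successive_communities(example_scores, size=2, how_many=3)
```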

20. A method for executing a search of information resources, the resources having content-based links between each other, to identify a desired subset of the information resources which satisfy a desired criterion, the method comprising the steps of:

identifying an initial set of information resources;
defining initial authoritativeness information for the initial set;
using the initial authoritativeness information as input authoritativeness information, executing the steps of:
(i) producing first authoritativeness information about a set of information resources pointed to by links in resources of the input set, and
(ii) producing second authoritativeness information about a set of information resources having links that point to resources of the input set; and
producing a final set of information resources based on the first and second authoritativeness information.
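The four steps of the method can be strung together as in the sketch below. It is an illustration under stated assumptions, not the patented implementation: the name `link_search`, the link-graph layout, the fixed iteration count, unit-length normalization, and the authority-then-hub ordering are all choices made for the example, and the link direction follows the wording of claim 34 as written.

```python
import math
from typing import Dict, Iterable, List, Set, Tuple

Links = Dict[str, Set[str]]  # assumed layout: page -> pages it points to


def _unit(values: Dict[str, float]) -> Dict[str, float]:
    """Scale to unit Euclidean length; an all-zero input stays all zero."""
    norm = math.sqrt(sum(v * v for v in values.values()))
    return {p: (v / norm if norm else 0.0) for p, v in values.items()}


def link_search(
    seeds: Iterable[str], out_links: Links, iterations: int = 20, top: int = 3
) -> Tuple[List[str], List[str]]:
    seed_set = set(seeds)
    # Step 1: initial set = seeds plus pages linked to or from them.
    pages = set(seed_set)
    for source, targets in out_links.items():
        if source in seed_set:
            pages |= targets
        if targets & seed_set:
            pages.add(source)
    # Step 2: initial authoritativeness values.
    hub = {p: (1.0 if p in seed_set else 0.0) for p in pages}
    authority = {p: 0.0 for p in pages}
    # Step 3: repeated two-part update over links inside the initial set.
    inside = {p: out_links.get(p, set()) & pages for p in pages}
    for _ in range(iterations):
        authority = _unit({p: sum((hub[q] for q in inside[p]), 0.0) for p in pages})
        new_hub = {p: 0.0 for p in pages}
        for q in pages:
            for p in inside[q]:
                new_hub[p] += authority[q]
        hub = _unit(new_hub)

    # Step 4: final set = pages with the greatest values.
    def rank(scores: Dict[str, float]) -> List[str]:
        return sorted(pages, key=lambda p: (-abs(scores[p]), p))[:top]

    return rank(hub), rank(authority)


# Hypothetical toy graph: three pages point to "t"; "u" lies outside the set.
toy_links = {"x1": {"t", "u"}, "x2": {"t"}, "x3": {"t"}, "t": set()}
top_hubs, top_authorities = link_search(["t"], toy_links)
```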

21. A method as recited in claim 20, wherein the information resources include World Wide Web pages, and the content-based links include hyperlinks.

22. A method as recited in claim 20, wherein the step of identifying an initial set of information resources includes obtaining, as an input, an information resource containing subject matter of interest.

23. A method as recited in claim 22, wherein the step of identifying an initial set of information resources includes identifying a further set of information resources linked to the input information resource.

24. A method as recited in claim 20, wherein:

the step of executing the steps of producing first and second authoritativeness information is executed in a series of iterations;
the initial authoritativeness information is used as input authoritativeness information for a first iteration; and
the produced first and second authoritativeness information is a result of the iteration, the first and second authoritativeness information produced in a given iteration to be used as the input authoritativeness information for the next iteration.

25. A method as recited in claim 20, wherein the steps of producing first authoritativeness information and producing second authoritativeness information are executed in a series of iterations until a predetermined condition is met.

26. A method as recited in claim 25, wherein the predetermined condition includes the execution of a specified number of iterations.

27. A method as recited in claim 25, wherein the predetermined condition includes a steady state in which further iterations result in substantially the same results.

28. A method as recited in claim 25, wherein the step of identifying an initial set of information resources includes executing a keyword-based query search, results of the search including information resources to be included in the initial set.

29. A method as recited in claim 28, wherein the step of identifying an initial set of information resources further includes identifying information resources linked to or from the information resources which are the results of the search, the former information resources also to be included in the initial set.

30. A method as recited in claim 29, wherein the step of defining initial authoritativeness information includes selecting an initial numerical authoritativeness value for each of the information resources of the initial set.

31. A method as recited in claim 30, wherein the step of defining initial authoritativeness information further includes defining an authority value and a hub value for each of the information resources of the initial set.

32. A method as recited in claim 31, wherein the defined authority values and hub values are processed as vectors, each vector containing a respective term corresponding with each respective one of the information resources of the initial set, and having stored therein the value defined for that respective one of the information resources of the initial set.

33. A method as recited in claim 31, wherein:

an initial hub value is defined as 1 if the information resource was found by the keyword-based query search, and 0 if the information resource is linked to or from the information resources which are the results of the search; and
an initial authority value is defined as 0 for all information resources.

34. A method as recited in claim 31, wherein, for each iteration:

the hub value for an information resource is updated as the sum of the authority values for authority information resources which point to the hub information resource; and
the authority value for an information resource is updated as the sum of the hub values for hub information resources which are pointed to by the information resource.

35. A method as recited in claim 34, wherein each iteration further includes normalizing the hub and authority values for the information resources.
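The update of claim 34 and the normalization of claim 35 can be written compactly. The notation is an assumption introduced here: $A_{ij} = 1$ if resource $i$ has a link pointing to resource $j$ and $0$ otherwise, $h$ and $a$ are the hub and authority vectors of claim 32, and the Euclidean norm is one possible choice of normalization.

```latex
% Assumed notation: A_{ij} = 1 if resource i links to resource j, else 0.
\begin{align}
a_i &\leftarrow \sum_{j} A_{ij}\, h_j
    && \text{(hub values of the resources that } i \text{ points to)} \\
h_i &\leftarrow \sum_{j} A_{ji}\, a_j
    && \text{(authority values of the resources that point to } i\text{)} \\
a &\leftarrow \frac{a}{\lVert a \rVert_2}, \qquad
h \leftarrow \frac{h}{\lVert h \rVert_2}
\end{align}
```

In vector form the two sums read $a \leftarrow A h$ and $h \leftarrow A^{\mathsf{T}} a$. The commonly described hubs-and-authorities formulation uses the transposed matrix in each line.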

36. A method as recited in claim 20, wherein:

each information resource is associated with an authority value and a hub value; and
the step of producing a final set of information resources includes selecting information resources from the set based on the hub and authority values.

37. A method as recited in claim 36, wherein the step of selecting includes selecting information resources whose hub values or authority values have greatest magnitudes.

38. A method as recited in claim 36, wherein the step of selecting includes selecting a plurality of successive communities, selecting each successive community including selecting information resources whose hub values or authority values have greatest magnitudes of those information resources not already selected for a prior community.

39. A system for executing a search of information resources, the resources having content-based links between each other, to identify a desired subset of the information resources which satisfy a desired criterion, the system comprising:

means for identifying an initial set of information resources;
means for defining initial authoritativeness information for the initial set;
means for using the initial authoritativeness information as input authoritativeness information, to execute the steps of:
(i) producing first authoritativeness information about a set of information resources pointed to by links in resources of the input set, and
(ii) producing second authoritativeness information about a set of information resources having links that point to resources of the input set; and
means for producing a final set of information resources based on the first and second authoritativeness information.

40. A system as recited in claim 39, wherein the information resources include World Wide Web pages, and the content-based links include hyperlinks.

41. A system as recited in claim 39, wherein the means for identifying an initial set of information resources includes means for obtaining, as an input, an information resource containing subject matter of interest.

42. A system as recited in claim 41, wherein the means for identifying an initial set of information resources includes means for identifying a further set of information resources linked to the input information resource.

43. A system as recited in claim 39, wherein:

the means for executing the steps of producing first and second authoritativeness information is operative in a series of iterations;
the initial authoritativeness information is used as input authoritativeness information for a first iteration; and
the produced first and second authoritativeness information is a result of the iteration, the first and second authoritativeness information produced in a given iteration to be used as the input authoritativeness information for the next iteration.

44. A system as recited in claim 39 further comprising means for executing the steps of producing first authoritativeness information and producing second authoritativeness information in a series of iterations until a predetermined condition is met.

45. A system as recited in claim 44, wherein the predetermined condition includes the execution of a specified number of iterations.

46. A system as recited in claim 44, wherein the predetermined condition includes a steady state in which further iterations result in substantially the same results.

47. A system as recited in claim 44, wherein the means for identifying an initial set of information resources includes means for executing a keyword-based query search, results of the search including information resources to be included in the initial set.

48. A system as recited in claim 47, wherein the means for identifying an initial set of information resources further includes means for identifying information resources linked to or from the information resources which are the results of the search, the former information resources also to be included in the initial set.

49. A system as recited in claim 48, wherein the means for defining initial authoritativeness information includes means for selecting an initial numerical authoritativeness value for each of the information resources of the initial set.

50. A system as recited in claim 49, wherein the means for defining initial authoritativeness information further includes means for defining an authority value and a hub value for each of the information resources of the initial set.

51. A system as recited in claim 50, wherein the defined authority values and hub values are processed as vectors, each vector containing a respective term corresponding with each respective one of the information resources of the initial set, and having stored therein the value defined for that respective one of the information resources of the initial set.

52. A system as recited in claim 50, wherein:

an initial hub value is defined as 1 if the information resource was found by the keyword-based query search, and 0 if the information resource is linked to or from the information resources which are the results of the search; and
an initial authority value is defined as 0 for all information resources.

53. A system as recited in claim 50, wherein, for each iteration:

the hub value for an information resource is updated as the sum of the authority values for authority information resources which point to the hub information resource; and
the authority value for an information resource is updated as the sum of the hub values for hub information resources which are pointed to by the information resource.

54. A system as recited in claim 53, wherein each iteration further includes normalizing the hub and authority values for the information resources.

55. A system as recited in claim 39, wherein the means for producing a final set of information resources includes means for selecting information resources from the set based on their hub and authority values.

56. A system as recited in claim 55, wherein the means for selecting includes means for selecting information resources whose hub values or authority values have greatest magnitudes.

57. A system as recited in claim 55, wherein the means for selecting includes means for selecting a plurality of successive communities, selecting each successive community including selecting information resources whose hub values or authority values have greatest magnitudes of those information resources not already selected for a prior community.