(19) United States
(12) Patent Application Publication
Pedersen et al.
(10) Pub. No.: US 2007/0250498 A1 (43) Pub. Date: Oct. 25, 2007
(54) DETERMINING RELATED TERMS BASED ON LINK ANNOTATIONS OF DOCUMENTS BELONGING TO SEARCH RESULT SETS
(76) Inventors: Jan Pedersen, Los Altos Hills, CA
(US); Hadar Shemtov, Palo Alto, CA
Hickman Palermo Truong & Becker LLP/
2055 Gateway Place
San Jose, CA 95110-1089 (US)
(21) Appl. No.: 11/408,550
(22) Filed: Apr. 21, 2006
(51) Int. Cl.
G06F 17/30 (2006.01)
(52) U.S. Cl. 707/5
Techniques for automatically focusing searches conducted by a search engine are provided. According to one aspect, revised query terms are automatically generated based on text in links that are in incoming (and/or outgoing) link lists associated with documents that are referenced in initial search results generated based on initial query terms. For example, some of the phrases that appear in incoming (and/or outgoing) links associated with a result document may be selected. The selected phrases may be added to the initial query terms to generate revised query terms. These revised query terms may be submitted automatically to the search engine in order to produce a more focused list of revised search results. This process may be performed repeatedly, each iteration revising query terms generated by the previous iteration, until specified criteria are satisfied, at which point the final revised search results may be presented to a user.
Patent Application Publication Oct. 25, 2007 Sheet 1 of 2 US 2007/0250498 A1
RECEIVE INITIAL QUERY TERMS FROM USER
DETERMINE, BASED ON QUERY TERMS, A SET OF RESULT DOCUMENTS
RANK RESULT DOCUMENTS
SELECT LINKS THAT ARE IN TOP "N" RESULT DOCUMENTS AND THAT LINK TO OTHER TOP "N" RESULT DOCUMENTS
DETERMINE WEIGHTS FOR SELECTED LINKS
RANK SELECTED LINKS BASED ON ASSOCIATED WEIGHTS
ADD, TO QUERY TERMS, TEXT ASSOCIATED WITH EACH LINK IN THE TOP "X" SELECTED LINKS
PRESENT, TO USER, SEARCH RESULTS THAT REFER TO RESULT DOCUMENTS IN THE CURRENT SET OF RESULT DOCUMENTS
DETERMINING RELATED TERMS BASED ON LINK ANNOTATIONS OF DOCUMENTS BELONGING TO SEARCH RESULT SETS
FIELD OF THE INVENTION
 The present invention relates to search engines and, more specifically, to a technique for automatically focusing and narrowing search results.
BACKGROUND
 Search engines that enable computer users to obtain references to web pages that contain one or more specified words are now commonplace. Typically, a user can access a search engine by directing a web browser to a search engine "portal" web page. The portal page usually contains a text entry field and, sometimes, a button control. The user can initiate a search for web pages that contain specified query terms by typing those query terms into the text entry field. When the button control is activated, or when a script executing on the "portal" web page determines that a specified event has occurred, the query terms are sent to the search engine, which typically returns, to the user's web browser, a dynamically generated web page that contains a list of references to other web pages that contain the query terms.
 All too often, such a list of references includes references to web pages that have little or nothing to do with the subject matter in which the user is interested. Even if the referenced web pages contain the query terms that the user has submitted to the search engine, this is no guarantee that those web pages will be focused on the topic to which the query terms pertain; the occurrence of the query terms in a web page may be merely tangential to the web page's primary discussion. As a result, the user is forced to hunt and pick through multitudes of irrelevant search results in order to find a select few web pages in which the user is actually interested.
 What is needed is an automated way of focusing a search so that the web pages referenced in the list of search results have a higher probability of relevance to the subject matter in which the user is interested.
 The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
BRIEF DESCRIPTION OF THE DRAWINGS
 The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:
 FIG. 1 is a flow diagram that illustrates an example of a technique for automatically generating revised query terms based on link-related text, according to an embodiment of the invention; and
 FIG. 2 is a block diagram of a computer system on which embodiments of the invention may be implemented.
DETAILED DESCRIPTION
 In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the present invention.
 Automated mechanisms are provided through which searches conducted by a search engine, such as the Internet search engine provided by Yahoo!, are focused and narrowed. The automated mechanisms accomplish this narrowing and focusing based on link-associated data, such as the text that typically appears underlined in a hyperlink on a web page. Such text is typically referred to as "anchor text." As a result of the narrowing, the search engine returns search results that are typically more relevant to the interests of the user who submitted initial query terms to the search engine.
 According to one embodiment of the invention, a separate list of incoming and outgoing links is maintained, in an index, for each document (e.g., web page) in a set of documents. The list of incoming links associated with a particular document indicates other documents that contain links that reference the particular document. The list of outgoing links associated with a particular document indicates other documents that are referenced by links in the particular document. For example, assuming that web pages "A," "B," and "C" each contain links that refer to web page "D," the list of incoming links associated with web page "D" comprises those of the links in web pages "A," "B," and "C" that refer to web page "D." Similarly, assuming that web page "D" contains links that refer to web pages "E," "F," and "G," the list of outgoing links associated with web page "D" comprises those of the links in web page "D" that refer to web pages "E," "F," and "G."
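 By way of illustration only, the per-document link lists described above might be sketched in Python as follows. The names Link, LinkIndex, and add_link, and the placeholder anchor text, are hypothetical and do not appear in this description; the sketch merely reproduces the "A" through "G" example and is not intended to limit how an index may be organized.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One hyperlink (hypothetical structure, for illustration only)."""
    source: str       # document that contains the link
    target: str       # document that the link refers to
    anchor_text: str  # text displayed in the link


class LinkIndex:
    """Keeps a separate incoming list and outgoing list for each document."""

    def __init__(self):
        self.incoming = defaultdict(list)  # target document -> links that refer to it
        self.outgoing = defaultdict(list)  # source document -> links it contains

    def add_link(self, source, target, anchor_text):
        link = Link(source, target, anchor_text)
        self.outgoing[source].append(link)
        self.incoming[target].append(link)
        return link


# Web pages "A," "B," and "C" each link to "D"; "D" links to "E," "F," and "G."
index = LinkIndex()
for src in ("A", "B", "C"):
    index.add_link(src, "D", f"link from {src}")  # placeholder anchor text
for dst in ("E", "F", "G"):
    index.add_link("D", dst, f"link to {dst}")    # placeholder anchor text

print([link.source for link in index.incoming["D"]])  # ['A', 'B', 'C']
print([link.target for link in index.outgoing["D"]])  # ['E', 'F', 'G']
```

 In this sketch each link is recorded once and is reachable from both the outgoing list of its source document and the incoming list of its target document, so the anchor text of a link can be found from either side.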
 A search engine generates initial search results based at least in part on initial query terms that a user submits to the search engine. The initial search results refer to result documents. In one embodiment of the invention, revised query terms are automatically generated. The revised query terms are generated based at least in part on anchor text within links that are in the incoming (and/or outgoing) lists associated with the result documents. The specific manner in which the anchor text is used to generate the revised query terms is discussed in greater detail below.
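 As a minimal sketch of the idea in the preceding paragraph, the following Python function builds revised query terms from the anchor text of incoming links associated with result documents. The function name revise_query, the max_new_phrases parameter, the sample data, and in particular the choice to prefer phrases that occur most often are assumptions made only for illustration; they are not stated in this description.

```python
from collections import Counter


def revise_query(query_terms, result_docs, incoming_anchor_text, max_new_phrases=2):
    """Return the query terms plus a few anchor-text phrases.

    incoming_anchor_text maps a document identifier to the anchor text of
    the links in that document's incoming list (hypothetical layout).
    """
    existing = {term.lower() for term in query_terms}
    counts = Counter()
    for doc in result_docs:
        for phrase in incoming_anchor_text.get(doc, []):
            normalized = phrase.strip().lower()
            # Skip empty phrases and phrases already present in the query.
            if normalized and normalized not in existing:
                counts[normalized] += 1
    # Assumption: phrases that occur most often are preferred.
    new_phrases = [phrase for phrase, _ in counts.most_common(max_new_phrases)]
    return list(query_terms) + new_phrases


# Hypothetical anchor text for two result documents.
incoming_anchor_text = {
    "doc1": ["jaguar cars", "luxury sedan", "jaguar cars"],
    "doc2": ["jaguar cars", "big cats"],
}
revised = revise_query(["jaguar"], ["doc1", "doc2"], incoming_anchor_text, max_new_phrases=1)
print(revised)  # ['jaguar', 'jaguar cars']
```

 The revised list could then be submitted to the search engine in place of the initial query terms. Outgoing-link anchor text could be handled the same way by supplying a mapping built from outgoing lists.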
 For example, for each phrase that appears in an incoming (or, in some embodiments of the invention, outgoing) link associated with a result document, a separate weight may be determined. Some of the phrases may be selected based on their associated weights. The selected phrases may be added to the initial query terms in order to generate revised query terms. These revised query terms may be submitted automatically to the search engine in order to produce a narrowed and more focused list of revised search results. This process may be performed repeatedly, each iteration revising query terms generated by the previous iteration, until specified criteria are satisfied, at which