WO2008144457A3 - Efficient retrieval algorithm by query term discrimination - Google Patents

Publication number
WO2008144457A3
Authority
WO
WIPO (PCT)
Prior art keywords
document
efficient retrieval
query term
rows
search
Application number
PCT/US2008/063808
Other languages
French (fr)
Other versions
WO2008144457A2 (en)
Inventor
Chenxi Lin
Lei Ji
Huajun Zeng
Benyu Zhang
Zheng Chen
Jian Wang
Original Assignee
Microsoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2007-05-18
Filing date
2008-05-15
Publication date
2009-02-12
Application filed by Microsoft Corp
Publication of WO2008144457A2
Publication of WO2008144457A3

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/3331: Query processing
    • G06F16/334: Query execution

Abstract

Described is an efficient retrieval mechanism that quickly locates documents (e.g., documents corresponding to online advertisements) based on query term discrimination. A topmost subset (e.g., two) of the search terms is selected according to ranked importance, e.g., as ranked by inverse document frequency (IDF). The topmost terms are then used to narrow the number of rows of an inverted query index that are searched to find document identifiers and their associated scores, such as scores computed offline by a BM25 algorithm. For example, for each document identifier in the row of each important term, a fast search may be performed within each of the other rows in the narrowed subset (those that also contain that document identifier): document identifiers are compared to jump a pointer within each other row, and a binary search then locates the particular document. The scores of the documents located in this way may then be used to rank their relative importance for returning as results.
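
The retrieval flow in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation; it is one reading of the abstract under stated assumptions: the index is a plain dictionary mapping each term to a sorted list of document identifiers plus a parallel list of precomputed scores, the candidate set is the union of documents in the rows of the most important terms, IDF is taken as log(N/df), and the "jump a pointer" step is modeled as a doubling (galloping) jump followed by a binary search. The names `INDEX`, `NUM_DOCS`, `idf`, `find_from`, and `retrieve`, and all scores, are hypothetical.

```python
import math
from bisect import bisect_left

# Hypothetical toy index: term -> (sorted doc ids, precomputed per-term scores).
# In the abstract the scores would be computed offline, e.g. by BM25.
INDEX = {
    "cheap": ([1, 2, 3, 4, 5, 6, 7, 8], [0.1, 0.2, 0.1, 0.3, 0.1, 0.2, 0.1, 0.2]),
    "flights": ([2, 4, 6, 8], [0.5, 0.4, 0.6, 0.5]),
    "reykjavik": ([4, 7], [1.5, 1.2]),
}
NUM_DOCS = 8


def idf(term, index, num_docs):
    """Assumed IDF form log(N / df): rarer terms get a higher value."""
    return math.log(num_docs / len(index[term][0]))


def find_from(ids, doc_id, start):
    """Find doc_id in the sorted list `ids`, looking only at or after `start`.

    The pointer first jumps ahead in doubling steps until it reaches an
    id >= doc_id (or the end of the row); a binary search then runs inside
    the bracket that was jumped over. Returns (position, found).
    """
    n = len(ids)
    lo, hi, step = start, start, 1
    while hi < n and ids[hi] < doc_id:
        lo = hi + 1          # everything up to hi is smaller than doc_id
        hi += step
        step *= 2
    hi = min(hi, n)
    pos = bisect_left(ids, doc_id, lo, hi)
    return pos, pos < n and ids[pos] == doc_id


def retrieve(query_terms, index, num_docs, top_terms=2, limit=10):
    """Rank documents that contain at least one of the most important terms."""
    terms = [t for t in dict.fromkeys(query_terms) if t in index]
    important = sorted(terms, key=lambda t: idf(t, index, num_docs),
                       reverse=True)[:top_terms]
    # Assumption: candidates are the union of the important terms' rows.
    candidates = sorted({d for t in important for d in index[t][0]})
    pointers = {t: 0 for t in terms}  # one forward-only pointer per row
    totals = {}
    for doc_id in candidates:
        total = 0.0
        for t in terms:
            ids, scores = index[t]
            pos, found = find_from(ids, doc_id, pointers[t])
            pointers[t] = pos
            if found:
                total += scores[pos]
        totals[doc_id] = total
    # Highest total score first; ties broken by document id.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


if __name__ == "__main__":
    for doc_id, score in retrieve(["cheap", "flights", "reykjavik"], INDEX, NUM_DOCS):
        print(doc_id, round(score, 2))
```

Because candidates are visited in ascending identifier order, each row's pointer only moves forward, so a long row such as the one for a common term is never rescanned from the start. Documents that contain only the low-IDF term (here, documents 1, 3, and 5) are never scored, which is where the narrowing saves work.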

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US11/804,627 2007-05-18
US11/804,627 US7822752B2 (en) 2007-05-18 2007-05-18 Efficient retrieval algorithm by query term discrimination

Publications (2)

Publication Number Publication Date
WO2008144457A2 (en) 2008-11-27
WO2008144457A3 (en) 2009-02-12

Family

ID=40028576

Family Applications (1)

Application Number Publication Number Priority Date Filing Date Title
PCT/US2008/063808 WO2008144457A2 (en) 2007-05-18 2008-05-15 Efficient retrieval algorithm by query term discrimination

Country Status (2)

Country Link
US (1) US7822752B2 (en)
WO (1) WO2008144457A2 (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7925644B2 (en) * 2007-03-01 2011-04-12 Microsoft Corporation Efficient retrieval algorithm by query term discrimination
JP5309570B2 (en) * 2008-01-11 2013-10-09 株式会社リコー Information retrieval apparatus, information retrieval method, and control program
US20090204889A1 (en) * 2008-02-13 2009-08-13 Mehta Rupesh R Adaptive sampling of web pages for extraction
US20100145923A1 (en) * 2008-12-04 2010-06-10 Microsoft Corporation Relaxed filter set
US20100169311A1 (en) * 2008-12-30 2010-07-01 Ashwin Tengli Approaches for the unsupervised creation of structural templates for electronic documents
US20100228738A1 (en) * 2009-03-04 2010-09-09 Mehta Rupesh R Adaptive document sampling for information extraction
CN102270201B (en) * 2010-06-01 2013-07-17 富士通株式会社 Multi-dimensional indexing method and device for network files
CN103186650B (en) * 2011-12-30 2016-05-25 中国移动通信集团四川有限公司 A kind of searching method and device
JP6520052B2 (en) 2014-11-06 2019-05-29 富士ゼロックス株式会社 INFORMATION PROCESSING APPARATUS AND INFORMATION PROCESSING PROGRAM
US20170154107A1 (en) * 2014-12-11 2017-06-01 Hewlett Packard Enterprise Development Lp Determining term scores based on a modified inverse domain frequency
CN105653703A (en) * 2015-12-31 2016-06-08 武汉传神信息技术有限公司 Document retrieving and matching method
US11615149B2 (en) 2019-05-27 2023-03-28 Microsoft Technology Licensing, Llc Neural network for search retrieval and ranking
US11868413B2 (en) * 2020-12-22 2024-01-09 Direct Cursus Technology L.L.C Methods and servers for ranking digital documents in response to a query

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5765150A (en) * 1996-08-09 1998-06-09 Digital Equipment Corporation Method for statistically projecting the ranking of information

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0675265B2 (en) 1989-09-20 1994-09-21 インターナシヨナル・ビジネス・マシーンズ・コーポレーシヨン Information retrieval method and system
US5915249A (en) * 1996-06-14 1999-06-22 Excite, Inc. System and method for accelerated query evaluation of very large full-text databases
US5920859A (en) 1997-02-05 1999-07-06 Idd Enterprises, L.P. Hypertext document retrieval system and method
KR100285265B1 (en) 1998-02-25 2001-04-02 윤덕용 Db management system and inverted index storage structure using sub-index and large-capacity object
US7062483B2 (en) 2000-05-18 2006-06-13 Endeca Technologies, Inc. Hierarchical data-driven search and navigation system and method for information retrieval
US7496559B2 (en) 2002-09-03 2009-02-24 X1 Technologies, Inc. Apparatus and methods for locating data
US7111000B2 (en) 2003-01-06 2006-09-19 Microsoft Corporation Retrieval of structured documents
US7647299B2 (en) 2003-06-30 2010-01-12 Google, Inc. Serving advertisements using a search of advertiser web information
US7849063B2 (en) 2003-10-17 2010-12-07 Yahoo! Inc. Systems and methods for indexing content for fast and scalable retrieval
US20050138067A1 (en) * 2003-12-19 2005-06-23 Fuji Xerox Co., Ltd. Indexing for contexual revisitation and digest generation
US20050267872A1 (en) * 2004-06-01 2005-12-01 Yaron Galai System and method for automated mapping of items to documents
US7836076B2 (en) * 2004-08-20 2010-11-16 Hewlett-Packard Development Company, L.P. Distributing content indices
US20060047656A1 (en) * 2004-09-01 2006-03-02 Dehlinger Peter J Code, system, and method for retrieving text material from a library of documents
US7765214B2 (en) 2005-05-10 2010-07-27 International Business Machines Corporation Enhancing query performance of search engines using lexical affinities
US7469251B2 (en) 2005-06-07 2008-12-23 Microsoft Corporation Extraction of information from documents
US7689559B2 (en) * 2006-02-08 2010-03-30 Telenor Asa Document similarity scoring and ranking method, device and computer program product
US7685091B2 (en) * 2006-02-14 2010-03-23 Accenture Global Services Gmbh System and method for online information analysis

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ROBERTSON S.: "Understanding inverse document frequency: on theoretical arguments for IDF", JOURNAL OF DOCUMENTATION, vol. 60, 2004, pages 503-520 *

Also Published As

Publication number Publication date
US7822752B2 (en) 2010-10-26
US20080288483A1 (en) 2008-11-20
WO2008144457A2 (en) 2008-11-27

Similar Documents

Publication Publication Date Title
WO2008144457A3 (en) Efficient retrieval algorithm by query term discrimination
MX2020009283A (en) Search engine scoring and ranking.
WO2007112439A3 (en) Identifying the items most relevant to a current query based on user activity with respect to the results of similar queries
WO2007130716A3 (en) Methods and apparatus for computerized searching
NZ578672A (en) Information-retrieval systems, methods, and software with concept-based searching and ranking
WO2013173826A3 (en) Populating and searching a drug informatics database
WO2009152370A3 (en) Searching using patterns of usage
GB2468804A (en) Mobile search service
WO2008108297A1 (en) Homologous search system
Liu et al. Improving ranking-based recommendation by social information and negative similarity
Hamed et al. Measuring climate change on Twitter using Google’s algorithm: Perception and events
Basile et al. Aggregation strategies for linked open data-enabled recommender systems
WO2009101625A3 (en) Method for searching for homing endonucleases, their genes and their targets
Ozsoy et al. Result diversification for tweet search
Zhao et al. A time-enhanced topic clustering approach for news web search
Bilgic et al. Active query selection for learning rankers
WO2014078181A3 (en) Ranking signals for sparse corpora
US20110004521A1 (en) Techniques For Use In Sorting Partially Sorted Lists
Ernsting et al. Language modeling approaches to blog post and feed finding
Fink et al. Statute-enhanced lexical retrieval of court cases for COLIEE 2022
Teodoro et al. Automatic IPC encoding and novelty tracking for effective patent mining
Liu et al. A topic detection and tracking system with TF-Density
CN108090175A (en) Film temperature appraisal procedure
US20130301874A1 (en) Method and system for realtime de-duplication of objects in an entity-relationship graph
Goto et al. Exploiting symmetry in relational similarity for ranking relational search results

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 08755623; Country of ref document: EP; Kind code of ref document: A2)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 08755623; Country of ref document: EP; Kind code of ref document: A2)