WO2008144457A3 - Efficient retrieval algorithm by query term discrimination - Google Patents
- Publication number
- WO2008144457A3 (PCT/US2008/063808)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- document
- efficient retrieval
- query term
- rows
- search
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
Abstract
Described is an efficient retrieval mechanism that quickly locates documents (e.g., documents corresponding to online advertisements) based on query term discrimination. A topmost subset (e.g., two) of the search terms is selected according to their ranked importance, e.g., as ranked by inverse document frequency. The topmost terms are then used to narrow the number of rows of an inverted query index that are searched to find document identifiers and associated scores, such as scores computed offline by a BM25 algorithm. For example, for each document identifier of each important term, a fast search may be performed within each of the other rows in the narrowed subset that also contain that document identifier: document identifiers are compared to jump a pointer forward within each row, followed by a binary search to locate the particular document. The scores of the resulting set of documents may then be used to rank their relative importance for returning as results.
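The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation, and several details are assumptions made for illustration: each term's row is stored as parallel lists of document IDs (sorted ascending) and precomputed BM25 scores; term importance uses a BM25-style IDF formula; the "jump a pointer" step is implemented as a doubling (galloping) jump followed by `bisect`; and the names `idf`, `seek`, `retrieve`, `top_k_terms` and `max_results` are hypothetical.

```python
import bisect
import heapq
import math
from typing import Dict, List, Tuple

# Hypothetical row layout: (doc_ids, scores), sorted by doc ID, where each
# score is that term's BM25 contribution for the document, precomputed offline.
Row = Tuple[List[int], List[float]]


def idf(num_docs: int, doc_freq: int) -> float:
    """BM25-style inverse document frequency (assumed formula); rarer terms score higher."""
    return math.log(1.0 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def seek(doc_ids: List[int], start: int, target: int) -> int:
    """Return the first index >= start whose doc ID is >= target.

    The cursor jumps forward in doubling steps while the compared doc ID is
    still below the target, then a binary search narrows the bracketed range.
    """
    n = len(doc_ids)
    if start >= n or doc_ids[start] >= target:
        return start
    lo, step = start, 1  # invariant: doc_ids[lo] < target
    while lo + step < n and doc_ids[lo + step] < target:
        lo += step
        step *= 2
    return bisect.bisect_left(doc_ids, target, lo + 1, min(lo + step, n))


def retrieve(index: Dict[str, Row], num_docs: int, query: List[str],
             top_k_terms: int = 2, max_results: int = 10) -> List[Tuple[int, float]]:
    # Deduplicate query terms and drop terms that have no row in the index.
    terms = [t for t in dict.fromkeys(query) if t in index]
    # Rank terms by IDF (document frequency = row length), most discriminative first.
    terms.sort(key=lambda t: idf(num_docs, len(index[t][0])), reverse=True)
    important = terms[:top_k_terms]

    # Only documents found in the important terms' rows become candidates.
    candidates = sorted({doc for t in important for doc in index[t][0]})

    cursors = dict.fromkeys(terms, 0)  # one forward-only pointer per row
    results = []
    for doc in candidates:  # ascending doc IDs, so cursors never move backwards
        total = 0.0
        for t in terms:
            doc_ids, scores = index[t]
            pos = seek(doc_ids, cursors[t], doc)
            cursors[t] = pos
            if pos < len(doc_ids) and doc_ids[pos] == doc:
                total += scores[pos]  # add this term's precomputed score
        results.append((doc, total))
    # Rank candidates by their summed scores and keep the best ones.
    return heapq.nlargest(max_results, results, key=lambda r: r[1])


if __name__ == "__main__":
    demo_index = {
        "cheap": ([1, 2, 3, 4, 5, 6, 7, 8], [0.1] * 8),
        "flights": ([2, 4, 6, 8], [0.5, 0.6, 0.7, 0.8]),
        "paris": ([4, 8], [1.2, 1.5]),
    }
    ranked = retrieve(demo_index, 10, ["cheap", "flights", "paris"])
    print([doc for doc, _ in ranked])  # prints [8, 4, 6, 2]
```

With the default `top_k_terms=2`, "paris" and "flights" are the important terms, so documents 1, 3, 5 and 7, which match only the common term "cheap", are never scored. That skipped work is the intended saving; the trade-off in this sketch is that a document matching only low-IDF terms cannot appear in the results.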
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/804,627 | 2007-05-18 | ||
US11/804,627 US7822752B2 (en) | 2007-05-18 | 2007-05-18 | Efficient retrieval algorithm by query term discrimination |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2008144457A2 (en) | 2008-11-27 |
WO2008144457A3 (en) | 2009-02-12 |
Family ID: 40028576
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2008/063808 WO2008144457A2 (en) | 2007-05-18 | 2008-05-15 | Efficient retrieval algorithm by query term discrimination |
Country Status (2)
Country | Link |
---|---|
US (1) | US7822752B2 (en) |
WO (1) | WO2008144457A2 (en) |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7925644B2 (en) * | 2007-03-01 | 2011-04-12 | Microsoft Corporation | Efficient retrieval algorithm by query term discrimination |
JP5309570B2 (en) * | 2008-01-11 | 2013-10-09 | 株式会社リコー | Information retrieval apparatus, information retrieval method, and control program |
US20090204889A1 (en) * | 2008-02-13 | 2009-08-13 | Mehta Rupesh R | Adaptive sampling of web pages for extraction |
US20100145923A1 (en) * | 2008-12-04 | 2010-06-10 | Microsoft Corporation | Relaxed filter set |
US20100169311A1 (en) * | 2008-12-30 | 2010-07-01 | Ashwin Tengli | Approaches for the unsupervised creation of structural templates for electronic documents |
US20100228738A1 (en) * | 2009-03-04 | 2010-09-09 | Mehta Rupesh R | Adaptive document sampling for information extraction |
CN102270201B (en) * | 2010-06-01 | 2013-07-17 | 富士通株式会社 | Multi-dimensional indexing method and device for network files |
CN103186650B (en) * | 2011-12-30 | 2016-05-25 | 中国移动通信集团四川有限公司 | A kind of searching method and device |
JP6520052B2 (en) | 2014-11-06 | 2019-05-29 | 富士ゼロックス株式会社 | INFORMATION PROCESSING APPARATUS AND INFORMATION PROCESSING PROGRAM |
US20170154107A1 (en) * | 2014-12-11 | 2017-06-01 | Hewlett Packard Enterprise Development Lp | Determining term scores based on a modified inverse domain frequency |
CN105653703A (en) * | 2015-12-31 | 2016-06-08 | 武汉传神信息技术有限公司 | Document retrieving and matching method |
US11615149B2 (en) | 2019-05-27 | 2023-03-28 | Microsoft Technology Licensing, Llc | Neural network for search retrieval and ranking |
US11868413B2 (en) * | 2020-12-22 | 2024-01-09 | Direct Cursus Technology L.L.C | Methods and servers for ranking digital documents in response to a query |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5765150A (en) * | 1996-08-09 | 1998-06-09 | Digital Equipment Corporation | Method for statistically projecting the ranking of information |
Family Cites Families (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0675265B2 (en) | 1989-09-20 | 1994-09-21 | インターナシヨナル・ビジネス・マシーンズ・コーポレーシヨン | Information retrieval method and system |
US5915249A (en) * | 1996-06-14 | 1999-06-22 | Excite, Inc. | System and method for accelerated query evaluation of very large full-text databases |
US5920859A (en) | 1997-02-05 | 1999-07-06 | Idd Enterprises, L.P. | Hypertext document retrieval system and method |
KR100285265B1 (en) | 1998-02-25 | 2001-04-02 | 윤덕용 | Db management system and inverted index storage structure using sub-index and large-capacity object |
US7062483B2 (en) | 2000-05-18 | 2006-06-13 | Endeca Technologies, Inc. | Hierarchical data-driven search and navigation system and method for information retrieval |
US7496559B2 (en) | 2002-09-03 | 2009-02-24 | X1 Technologies, Inc. | Apparatus and methods for locating data |
US7111000B2 (en) | 2003-01-06 | 2006-09-19 | Microsoft Corporation | Retrieval of structured documents |
US7647299B2 (en) | 2003-06-30 | 2010-01-12 | Google, Inc. | Serving advertisements using a search of advertiser web information |
US7849063B2 (en) | 2003-10-17 | 2010-12-07 | Yahoo! Inc. | Systems and methods for indexing content for fast and scalable retrieval |
US20050138067A1 (en) * | 2003-12-19 | 2005-06-23 | Fuji Xerox Co., Ltd. | Indexing for contexual revisitation and digest generation |
US20050267872A1 (en) * | 2004-06-01 | 2005-12-01 | Yaron Galai | System and method for automated mapping of items to documents |
US7836076B2 (en) * | 2004-08-20 | 2010-11-16 | Hewlett-Packard Development Company, L.P. | Distributing content indices |
US20060047656A1 (en) * | 2004-09-01 | 2006-03-02 | Dehlinger Peter J | Code, system, and method for retrieving text material from a library of documents |
US7765214B2 (en) | 2005-05-10 | 2010-07-27 | International Business Machines Corporation | Enhancing query performance of search engines using lexical affinities |
US7469251B2 (en) | 2005-06-07 | 2008-12-23 | Microsoft Corporation | Extraction of information from documents |
US7689559B2 (en) * | 2006-02-08 | 2010-03-30 | Telenor Asa | Document similarity scoring and ranking method, device and computer program product |
US7685091B2 (en) * | 2006-02-14 | 2010-03-23 | Accenture Global Services Gmbh | System and method for online information analysis |
- 2007-05-18: US application US11/804,627 (US7822752B2), not active (Expired - Fee Related)
- 2008-05-15: WO application PCT/US2008/063808 (WO2008144457A2), active (Application Filing)
Non-Patent Citations (1)
Title |
---|
ROBERTSON S.: "Understanding inverse document frequency, on theoretical arguments for IDF", JOURNAL OF DOCUMENTATION, vol. 60, 2004, pages 503 - 520 * |
Also Published As
Publication number | Publication date |
---|---|
US7822752B2 (en) | 2010-10-26 |
US20080288483A1 (en) | 2008-11-20 |
WO2008144457A2 (en) | 2008-11-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2008144457A3 (en) | Efficient retrieval algorithm by query term discrimination | |
MX2020009283A (en) | Search engine scoring and ranking. | |
WO2007112439A3 (en) | Identifying the items most relevant to a current query based on user activity with respect to the results of similar queries | |
WO2007130716A3 (en) | Methods and apparatus for computerized searching | |
NZ578672A (en) | Information-retrieval systems, methods, and software with concept-based searching and ranking | |
WO2013173826A3 (en) | Populating and searching a drug informatics database | |
WO2009152370A3 (en) | Searching using patterns of usage | |
GB2468804A (en) | Mobile search service | |
WO2008108297A1 (en) | Homologous search system | |
Liu et al. | Improving ranking-based recommendation by social information and negative similarity | |
Hamed et al. | Measuring climate change on Twitter using Google’s algorithm: Perception and events | |
Basile et al. | Aggregation strategies for linked open data-enabled recommender systems | |
WO2009101625A3 (en) | Method for searching for homing endonucleases, their genes and their targets | |
Ozsoy et al. | Result diversification for tweet search | |
Zhao et al. | A time-enhanced topic clustering approach for news web search | |
Bilgic et al. | Active query selection for learning rankers | |
WO2014078181A3 (en) | Ranking signals for sparse corpora | |
US20110004521A1 (en) | Techniques For Use In Sorting Partially Sorted Lists | |
Ernsting et al. | Language modeling approaches to blog post and feed finding | |
Fink et al. | Statute-enhanced lexical retrieval of court cases for COLIEE 2022 | |
Teodoro et al. | Automatic IPC encoding and novelty tracking for effective patent mining | |
Liu et al. | A topic detection and tracking system with TF-Density | |
CN108090175A (en) | Film temperature appraisal procedure | |
US20130301874A1 (en) | Method and system for realtime de-duplication of objects in an entity-relationship graph | |
Goto et al. | Exploiting symmetry in relational similarity for ranking relational search results |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 08755623; Country of ref document: EP; Kind code of ref document: A2 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | EP: PCT application non-entry in European phase | Ref document number: 08755623; Country of ref document: EP; Kind code of ref document: A2 |