WO2000028441A3 - A density-based indexing method for efficient execution of high-dimensional nearest-neighbor queries on large databases - Google Patents

Info

Publication number
WO2000028441A3
WO2000028441A3 PCT/US1999/026366 US9926366W WO0028441A3 WO 2000028441 A3 WO2000028441 A3 WO 2000028441A3 US 9926366 W US9926366 W US 9926366W WO 0028441 A3 WO0028441 A3 WO 0028441A3
Authority
WO
WIPO (PCT)
Prior art keywords
database
record
cluster
data
model
Application number
PCT/US1999/026366
Other languages
French (fr)
Other versions
WO2000028441A2 (en)
Inventor
Usama Fayyad
Kristin P Bennett
Dan Geiger
Original Assignee
Microsoft Corp
Priority date
1998-11-11
Filing date
1999-11-09
Publication date
2000-10-05
Application filed by Microsoft Corp
Publication of WO2000028441A2
Publication of WO2000028441A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2264 Multidimensional index structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroïds
    • G06F18/2414 Smoothing the distance, e.g. radial basis function networks [RBFN]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99931 Database or file accessing
    • Y10S707/99933 Query processing, i.e. searching
    • Y10S707/99935 Query augmenting and refining, e.g. inexact access
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99931 Database or file accessing
    • Y10S707/99933 Query processing, i.e. searching
    • Y10S707/99936 Pattern matching access
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99941 Database schema or data structure
    • Y10S707/99943 Generating database or data structure, e.g. via user interface

Abstract

Method and apparatus for efficiently performing nearest neighbor queries on a database of records, wherein each record has a large number of attributes, by automatically extracting a multidimensional index from the data. The method is based on first obtaining a statistical model of the content of the data in the form of a probability density function. This density is then used to decide how data should be reorganized on disk for efficient nearest neighbor queries. At query time, the model decides the order in which data should be scanned. It also provides the means for evaluating the probability that the answer found so far in the partial scan of data determined by the model is correct. In this invention, a clustering process is performed on the database to produce multiple data clusters. Each cluster is characterized by a cluster model. The set of clusters represents a probability density function in the form of a mixture model. A new database of records is built having an augmented record format that contains the original record attributes and an additional record attribute containing a cluster number for each record based on the clustering step. The cluster model uses a probability density function for each cluster, so the attributes of each record are augmented by evaluating each record's probability with respect to each cluster. Once the augmented records are used to build a database, the augmented attribute is used as an index into the database so that nearest neighbor queries can be conducted very efficiently using an indexed lookup process. As the database is queried, the probability density function is used to determine the order in which clusters or database pages are scanned. The probability density function is also used to determine when scanning can stop because the nearest neighbor has been found with high probability.
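
The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes the mixture model has already been fitted and that every cluster is a spherical Gaussian; the names `clusters`, `score`, `build_index` and `nearest_neighbor` are hypothetical. The skip rule (ignore a cluster whose mean lies farther from the query than the current best distance plus a reach of `sigma * (sqrt(d) + 3)`) is a simple heuristic stand-in for the probabilistic stopping test mentioned in the abstract.

```python
import math

def score(x, cluster):
    """Log of (mixture weight * spherical Gaussian density) of x under one cluster."""
    d = len(x)
    sq = sum((a - b) ** 2 for a, b in zip(x, cluster["mean"]))
    log_pdf = (-0.5 * sq / cluster["sigma"] ** 2
               - d * math.log(cluster["sigma"])
               - 0.5 * d * math.log(2 * math.pi))
    return math.log(cluster["weight"]) + log_pdf

def build_index(records, clusters):
    """Augment each record with a cluster number and group records by it."""
    index = {k: [] for k in range(len(clusters))}
    for r in records:
        k = max(range(len(clusters)), key=lambda j: score(r, clusters[j]))
        index[k].append(r)
    return index

def nearest_neighbor(query, index, clusters):
    """Scan clusters in order of decreasing density at the query point."""
    d = len(query)
    order = sorted(index, key=lambda k: score(query, clusters[k]), reverse=True)
    best, best_dist, scanned = None, math.inf, []
    for k in order:
        c = clusters[k]
        # Assumed heuristic: almost all of a spherical Gaussian's mass lies
        # within sigma * (sqrt(d) + 3) of its mean.
        reach = c["sigma"] * (math.sqrt(d) + 3.0)
        lower = max(0.0, math.dist(query, c["mean"]) - reach)
        if best_dist <= lower:
            continue  # this cluster is unlikely to hold a closer record
        scanned.append(k)
        for r in index[k]:
            dist = math.dist(query, r)
            if dist < best_dist:
                best, best_dist = r, dist
    return best, best_dist, scanned

# Hypothetical, hand-picked model and data, for illustration only.
clusters = [
    {"mean": (0.0, 0.0), "sigma": 1.0, "weight": 0.5},
    {"mean": (10.0, 10.0), "sigma": 1.0, "weight": 0.5},
]
records = [(0.1, 0.2), (-0.5, 0.3), (1.0, -1.0),
           (9.8, 10.1), (10.5, 9.5), (11.0, 10.0)]

index = build_index(records, clusters)
best, best_dist, scanned = nearest_neighbor((0.0, 0.1), index, clusters)
print(best, scanned)
```

In this toy run only the cluster around the origin is scanned; the distant cluster is skipped because its estimated lower-bound distance already exceeds the best distance found. A real system would store the cluster number as an indexed column and derive the stopping test from the fitted densities rather than from a fixed reach.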

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US09/189,229 US6263334B1 (en) 1998-11-11 1998-11-11 Density-based indexing method for efficient execution of high dimensional nearest-neighbor queries on large databases
US09/189,229 1998-11-11

Publications (2)

Publication Number Publication Date
WO2000028441A2 WO2000028441A2 (en) 2000-05-18
WO2000028441A3 WO2000028441A3 (en) 2000-10-05

Family

ID=22696490

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1999/026366 WO2000028441A2 (en) 1998-11-11 1999-11-09 A density-based indexing method for efficient execution of high-dimensional nearest-neighbor queries on large databases

Country Status (2)

Country Link
US (1) US6263334B1 (en)
WO (1) WO2000028441A2 (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8954459B1 (en) 2008-06-26 2015-02-10 Experian Marketing Solutions, Inc. Systems and methods for providing an integrated identifier
US9147042B1 (en) 2010-11-22 2015-09-29 Experian Information Solutions, Inc. Systems and methods for data verification
US9152727B1 (en) 2010-08-23 2015-10-06 Experian Marketing Solutions, Inc. Systems and methods for processing consumer information for targeted marketing applications
US9251541B2 (en) 2007-05-25 2016-02-02 Experian Information Solutions, Inc. System and method for automated detection of never-pay data sets
US9342783B1 (en) 2007-03-30 2016-05-17 Consumerinfo.Com, Inc. Systems and methods for data verification
US9483606B1 (en) 2011-07-08 2016-11-01 Consumerinfo.Com, Inc. Lifescore
US9529851B1 (en) 2013-12-02 2016-12-27 Experian Information Solutions, Inc. Server architecture for electronic data quality processing
US9563916B1 (en) 2006-10-05 2017-02-07 Experian Information Solutions, Inc. System and method for generating a finance attribute from tradeline data
US9576030B1 (en) 2014-05-07 2017-02-21 Consumerinfo.Com, Inc. Keeping up with the joneses
US11847693B1 (en) 2014-02-14 2023-12-19 Experian Information Solutions, Inc. Automatic generation of code for attributes
US11954089B2 (en) 2007-09-27 2024-04-09 Experian Information Solutions, Inc. Database system for triggering event notifications based on updates to database records

Families Citing this family (169)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7268700B1 (en) 1998-01-27 2007-09-11 Hoffberg Steven M Mobile communication device
KR100284778B1 (en) * 1998-10-28 2001-03-15 정선종 Insertion method of high dimensional index structure for content-based image retrieval
US6397166B1 (en) * 1998-11-06 2002-05-28 International Business Machines Corporation Method and system for model-based clustering and signal-bearing medium for storing program of same
US6542894B1 (en) * 1998-12-09 2003-04-01 Unica Technologies, Inc. Execution of multiple models using data segmentation
JP3235660B2 (en) * 1998-12-24 2001-12-04 日本電気株式会社 Information retrieval apparatus and method, and storage medium storing information retrieval program
US6721759B1 (en) * 1998-12-24 2004-04-13 Sony Corporation Techniques for spatial representation of data and browsing based on similarity
US7035855B1 (en) * 2000-07-06 2006-04-25 Experian Marketing Solutions, Inc. Process and system for integrating information from disparate databases for purposes of predicting consumer behavior
US6704721B1 (en) * 1999-04-02 2004-03-09 International Business Machines Corporation Systems and methods for automated navigation between dynamic data with dissimilar structures
US6430539B1 (en) * 1999-05-06 2002-08-06 Hnc Software Predictive modeling of consumer financial behavior
KR100518860B1 (en) * 1999-07-05 2005-09-30 엘지전자 주식회사 Image searching method using feature normalizing information
US6424969B1 (en) * 1999-07-20 2002-07-23 Inmentia, Inc. System and method for organizing data
US6985885B1 (en) * 1999-09-21 2006-01-10 Intertrust Technologies Corp. Systems and methods for pricing and selling digital goods
GB2354609B (en) * 1999-09-25 2003-07-16 Ibm Method and system for predicting transactions
US7392185B2 (en) * 1999-11-12 2008-06-24 Phoenix Solutions, Inc. Speech based learning/training system using semantic decoding
US6446068B1 (en) * 1999-11-15 2002-09-03 Chris Alan Kortge System and method of finding near neighbors in large metric space databases
FR2801991B1 (en) * 1999-12-03 2002-05-03 Canon Kk CONTENT-BASED IMAGE SEARCHING METHOD AND DEVICE TAKING INTO ACCOUNT THE CONTENT OF REGIONS OF INTEREST
US6782395B2 (en) * 1999-12-03 2004-08-24 Canon Kabushiki Kaisha Method and devices for indexing and seeking digital images taking into account the definition of regions of interest
US6771841B1 (en) * 1999-12-29 2004-08-03 Intel Corporation Determining a bounding shape for a collection of points
US7318053B1 (en) * 2000-02-25 2008-01-08 International Business Machines Corporation Indexing system and method for nearest neighbor searches in high dimensional data spaces
US6470314B1 (en) * 2000-04-06 2002-10-22 International Business Machines Corporation Method and apparatus for rapid adapt via cumulative distribution function matching for continuous speech
US6532467B1 (en) * 2000-04-10 2003-03-11 Sas Institute Inc. Method for selecting node variables in a binary decision tree structure
US6701309B1 (en) * 2000-04-21 2004-03-02 Lycos, Inc. Method and system for collecting related queries
US7617184B2 (en) 2000-05-18 2009-11-10 Endeca Technologies, Inc. Scalable hierarchical data-driven navigation system and method for information retrieval
US7325201B2 (en) * 2000-05-18 2008-01-29 Endeca Technologies, Inc. System and method for manipulating content in a hierarchical data-driven search and navigation system
US7062483B2 (en) 2000-05-18 2006-06-13 Endeca Technologies, Inc. Hierarchical data-driven search and navigation system and method for information retrieval
US7035864B1 (en) * 2000-05-18 2006-04-25 Endeca Technologies, Inc. Hierarchical data-driven navigation system and method for information retrieval
US7096220B1 (en) 2000-05-24 2006-08-22 Reachforce, Inc. Web-based customer prospects harvester system
US7082427B1 (en) 2000-05-24 2006-07-25 Reachforce, Inc. Text indexing system to index, query the archive database document by keyword data representing the content of the documents and by contact data associated with the participant who generated the document
US7003517B1 (en) * 2000-05-24 2006-02-21 Inetprofit, Inc. Web-based system and method for archiving and searching participant-based internet text sources for customer lead data
US7120629B1 (en) 2000-05-24 2006-10-10 Reachforce, Inc. Prospects harvester system for providing contact data about customers of product or service offered by business enterprise extracting text documents selected from newsgroups, discussion forums, mailing lists, querying such data to provide customers who confirm to business profile data
AU6263101A (en) 2000-05-26 2001-12-03 Tzunami Inc. Method and system for organizing objects according to information categories
US20010048767A1 (en) * 2000-05-31 2001-12-06 Hyun-Doo Shin Indexing method of feature vector data space
KR100667741B1 (en) * 2000-05-31 2007-01-12 삼성전자주식회사 Indexing method of feature vector data space
US6816848B1 (en) * 2000-06-12 2004-11-09 Ncr Corporation SQL-based analytic algorithm for cluster analysis
AU6831801A (en) * 2000-06-12 2001-12-24 Previsor Inc Computer-implemented system for human resources management
US6697998B1 (en) * 2000-06-12 2004-02-24 International Business Machines Corporation Automatic labeling of unlabeled text data
WO2002003256A1 (en) * 2000-07-05 2002-01-10 Camo, Inc. Method and system for the dynamic analysis of data
EP1182588A3 (en) * 2000-08-21 2003-05-28 Samsung Electronics Co., Ltd. Signal indexing
KR100400500B1 (en) 2000-08-21 2003-10-08 삼성전자주식회사 Indexing method of feature vector data space
US7330850B1 (en) 2000-10-04 2008-02-12 Reachforce, Inc. Text mining system for web-based business intelligence applied to web site server logs
US7043531B1 (en) 2000-10-04 2006-05-09 Inetprofit, Inc. Web-based customer lead generator system with pre-emptive profiling
GB2367917A (en) 2000-10-12 2002-04-17 Qas Systems Ltd Retrieving data representing a postal address from a database of postal addresses using a trie structure
US8515959B2 (en) 2000-11-06 2013-08-20 International Business Machines Corporation Method and apparatus for maintaining and navigating a non-hierarchical personal spatial file system
KR100419575B1 (en) * 2000-12-05 2004-02-19 한국전자통신연구원 Method for bulkloading of high-dementional index structure
US7363308B2 (en) * 2000-12-28 2008-04-22 Fair Isaac Corporation System and method for obtaining keyword descriptions of records from a large database
US7283987B2 (en) 2001-03-05 2007-10-16 Sap Ag Compression scheme for improving cache behavior in database systems
DE60136491D1 (en) * 2001-03-30 2008-12-18 Nokia Corp METHOD FOR CONFIGURING A NETWORK BY DEFINING CLUSTERS
US6920453B2 (en) * 2001-12-31 2005-07-19 Nokia Corporation Method and system for finding a query-subset of events within a master-set of events
US6944619B2 (en) 2001-04-12 2005-09-13 Primentia, Inc. System and method for organizing data
US7149649B2 (en) 2001-06-08 2006-12-12 Panoratio Database Images Gmbh Statistical models for improving the performance of database operations
US6820073B1 (en) * 2001-06-20 2004-11-16 Microstrategy Inc. System and method for multiple pass cooperative processing
US7003512B1 (en) * 2001-06-20 2006-02-21 Microstrategy, Inc. System and method for multiple pass cooperative processing
US7080065B1 (en) * 2001-06-22 2006-07-18 Oracle International Corporation Query pruning using interior rectangles in an R-tree index
US7219108B2 (en) * 2001-06-22 2007-05-15 Oracle International Corporation Query prunning using exterior tiles in an R-tree index
US20030018623A1 (en) * 2001-07-18 2003-01-23 International Business Machines Corporation System and method of query processing of time variant objects
US7478103B2 (en) * 2001-08-24 2009-01-13 Rightnow Technologies, Inc. Method for clustering automation and classification techniques
US20030083924A1 (en) * 2001-10-31 2003-05-01 Yuchun Lee Multi-level audience processing
US7165058B2 (en) * 2001-12-27 2007-01-16 The Boeing Company Database analysis tool
WO2003063030A1 (en) * 2002-01-22 2003-07-31 Syngenta Participations Ag System and method for clustering data
US7031969B2 (en) * 2002-02-20 2006-04-18 Lawrence Technologies, Llc System and method for identifying relationships between database records
US8214391B2 (en) 2002-05-08 2012-07-03 International Business Machines Corporation Knowledge-based data mining system
US7010526B2 (en) 2002-05-08 2006-03-07 International Business Machines Corporation Knowledge-based data mining system
US6993534B2 (en) * 2002-05-08 2006-01-31 International Business Machines Corporation Data store for knowledge-based data mining system
US7174343B2 (en) * 2002-05-10 2007-02-06 Oracle International Corporation In-database clustering
US7747624B2 (en) * 2002-05-10 2010-06-29 Oracle International Corporation Data summarization
US7174344B2 (en) * 2002-05-10 2007-02-06 Oracle International Corporation Orthogonal partitioning clustering
US7080063B2 (en) * 2002-05-10 2006-07-18 Oracle International Corporation Probabilistic model generation
US7174336B2 (en) * 2002-05-10 2007-02-06 Oracle International Corporation Rule generation model building
US6996575B2 (en) * 2002-05-31 2006-02-07 Sas Institute Inc. Computer-implemented system and method for text-based document processing
US7133811B2 (en) * 2002-10-15 2006-11-07 Microsoft Corporation Staged mixture modeling
US20040103013A1 (en) * 2002-11-25 2004-05-27 Joel Jameson Optimal scenario forecasting, risk sharing, and risk trading
US7435306B2 (en) * 2003-01-22 2008-10-14 The Boeing Company Method for preparing rivets from cryomilled aluminum alloys and rivets produced thereby
US20040158561A1 (en) * 2003-02-04 2004-08-12 Gruenwald Bjorn J. System and method for translating languages using an intermediate content space
US9818136B1 (en) 2003-02-05 2017-11-14 Steven M. Hoffberg System and method for determining contingent relevance
US7181467B2 (en) * 2003-03-27 2007-02-20 Oracle International Corporation Delayed distance computations for nearest-neighbor queries in an R-tree index
US7870148B2 (en) * 2003-04-18 2011-01-11 Unica Corporation Scalable computation of data
DE10320419A1 (en) * 2003-05-07 2004-12-09 Siemens Ag Database query system and method for computer-aided query of a database
US7239989B2 (en) * 2003-07-18 2007-07-03 Oracle International Corporation Within-distance query pruning in an R-tree index
US7555441B2 (en) * 2003-10-10 2009-06-30 Kronos Talent Management Inc. Conceptualization of job candidate information
US7668845B1 (en) 2004-02-18 2010-02-23 Microsoft Corporation C-tree for multi-attribute indexing
EP1571571A1 (en) * 2004-03-02 2005-09-07 Henner Lüttich Multivariate logic for automated prioritisation and selection
US7406477B2 (en) * 2004-03-12 2008-07-29 Sybase, Inc. Database system with methodology for automated determination and selection of optimal indexes
US7428528B1 (en) 2004-03-31 2008-09-23 Endeca Technologies, Inc. Integrated application for manipulating content in a hierarchical data-driven search and navigation system
US7647356B2 (en) * 2004-05-07 2010-01-12 Oracle International Corporation Methods and apparatus for facilitating analysis of large data sets
WO2006018041A1 (en) * 2004-08-13 2006-02-23 Swiss Reinsurance Company Speech and textual analysis device and corresponding method
US7590589B2 (en) 2004-09-10 2009-09-15 Hoffberg Steven M Game theoretic prioritization scheme for mobile ad hoc networks permitting hierarchal deference
US7610272B2 (en) * 2004-11-29 2009-10-27 Sap Ag Materialized samples for a business warehouse query
WO2006066556A2 (en) * 2004-12-24 2006-06-29 Panoratio Database Images Gmbh Relational compressed data bank images (for accelerated interrogation of data banks)
US7877405B2 (en) * 2005-01-07 2011-01-25 Oracle International Corporation Pruning of spatial queries using index root MBRS on partitioned indexes
US8175889B1 (en) 2005-04-06 2012-05-08 Experian Information Solutions, Inc. Systems and methods for tracking changes of address based on service disconnect/connect data
US7908242B1 (en) 2005-04-11 2011-03-15 Experian Information Solutions, Inc. Systems and methods for optimizing database queries
US7571192B2 (en) * 2005-06-15 2009-08-04 Oracle International Corporation Methods and apparatus for maintaining consistency during analysis of large data sets
US8874477B2 (en) 2005-10-04 2014-10-28 Steven Mark Hoffberg Multifactorial optimization system and method
US7890510B2 (en) * 2005-10-05 2011-02-15 International Business Machines Corporation Method and apparatus for analyzing community evolution in graph data streams
US8019752B2 (en) 2005-11-10 2011-09-13 Endeca Technologies, Inc. System and method for information retrieval from object collections with complex interrelationships
US7461073B2 (en) * 2006-02-14 2008-12-02 Microsoft Corporation Co-clustering objects of heterogeneous types
US20070250476A1 (en) * 2006-04-21 2007-10-25 Lockheed Martin Corporation Approximate nearest neighbor search in metric space
US20080033809A1 (en) * 2006-07-24 2008-02-07 Black Andre B Techniques for promotion management
US8521786B2 (en) 2006-07-24 2013-08-27 International Business Machines Corporation Techniques for assigning promotions to contact entities
US8473344B2 (en) * 2006-07-24 2013-06-25 International Business Machines Corporation Contact history for promotion management
US8315904B2 (en) * 2006-07-24 2012-11-20 International Business Machines Corporation Organization for promotion management
US8473343B2 (en) * 2006-07-24 2013-06-25 International Business Machines Corporation Tracking responses to promotions
US20080028062A1 (en) * 2006-07-25 2008-01-31 Microsoft Corporation Determining measures of traffic accessing network locations
US8005759B2 (en) 2006-08-17 2011-08-23 Experian Information Solutions, Inc. System and method for providing a score for a used vehicle
US7912865B2 (en) 2006-09-26 2011-03-22 Experian Marketing Solutions, Inc. System and method for linking multiple entities in a business database
US8676802B2 (en) * 2006-11-30 2014-03-18 Oracle Otc Subsidiary Llc Method and system for information retrieval with clustering
US7743058B2 (en) * 2007-01-10 2010-06-22 Microsoft Corporation Co-clustering objects of heterogeneous types
US8606626B1 (en) * 2007-01-31 2013-12-10 Experian Information Solutions, Inc. Systems and methods for providing a direct marketing campaign planning environment
US8606666B1 (en) 2007-01-31 2013-12-10 Experian Information Solutions, Inc. System and method for providing an aggregation tool
US7742982B2 (en) 2007-04-12 2010-06-22 Experian Marketing Solutions, Inc. Systems and methods for determining thin-file records and determining thin-file risk levels
US7870136B1 (en) * 2007-05-24 2011-01-11 Hewlett-Packard Development Company, L.P. Clustering data with constraints
US8301574B2 (en) 2007-09-17 2012-10-30 Experian Marketing Solutions, Inc. Multimedia engagement study
US20090112533A1 (en) * 2007-10-31 2009-04-30 Caterpillar Inc. Method for simplifying a mathematical model by clustering data
US7958141B2 (en) * 2007-11-01 2011-06-07 Ebay Inc. Query utilization
US7856434B2 (en) 2007-11-12 2010-12-21 Endeca Technologies, Inc. System and method for filtering rules for manipulating search results in a hierarchical search and navigation system
US7991777B2 (en) * 2007-12-03 2011-08-02 Microsoft International Holdings B.V. Method for improving search efficiency in enterprise search system
US20090240699A1 (en) * 2008-03-18 2009-09-24 Morgan Christopher B Integration for intelligence data systems
JP5224868B2 (en) * 2008-03-28 2013-07-03 株式会社東芝 Information recommendation device and information recommendation method
US7991689B1 (en) 2008-07-23 2011-08-02 Experian Information Solutions, Inc. Systems and methods for detecting bust out fraud using credit data
US8825646B1 (en) * 2008-08-08 2014-09-02 Google Inc. Scalable system for determining short paths within web link network
US20100185672A1 (en) * 2009-01-21 2010-07-22 Rising Iii Hawley K Techniques for spatial representation of data and browsing based on similarity
JP2010277329A (en) * 2009-05-28 2010-12-09 Toshiba Corp Neighborhood retrieval device
US20100332292A1 (en) 2009-06-30 2010-12-30 Experian Information Solutions, Inc. System and method for evaluating vehicle purchase loyalty
US8364518B1 (en) 2009-07-08 2013-01-29 Experian Ltd. Systems and methods for forecasting household economics
TWI385544B (en) * 2009-09-01 2013-02-11 Univ Nat Pingtung Sci & Tech Density-based data clustering method
KR101092820B1 (en) * 2009-09-22 2011-12-12 현대자동차주식회사 Lipreading and Voice recognition combination multimodal interface system
TWI391837B (en) * 2009-09-23 2013-04-01 Univ Nat Pingtung Sci & Tech Data clustering method based on density
DE102010006450B4 (en) * 2010-02-01 2019-03-28 Bruker Daltonik Gmbh Stepped search for microbial spectra in reference libraries
US9652802B1 (en) 2010-03-24 2017-05-16 Consumerinfo.Com, Inc. Indirect monitoring and reporting of a user's credit data
US8725613B1 (en) 2010-04-27 2014-05-13 Experian Information Solutions, Inc. Systems and methods for early account score and notification
EP2390810B1 (en) * 2010-05-26 2019-10-16 Tata Consultancy Services Limited Taxonomic classification of metagenomic sequences
US8738440B2 (en) 2010-06-14 2014-05-27 International Business Machines Corporation Response attribution valuation
US8489606B2 (en) * 2010-08-31 2013-07-16 Electronics And Telecommunications Research Institute Music search apparatus and method using emotion model
US8639616B1 (en) 2010-10-01 2014-01-28 Experian Information Solutions, Inc. Business to contact linkage system
US8862498B2 (en) 2010-11-08 2014-10-14 International Business Machines Corporation Response attribution valuation
US8606771B2 (en) 2010-12-21 2013-12-10 Microsoft Corporation Efficient indexing of error tolerant set containment
US8484212B2 (en) 2011-01-21 2013-07-09 Cisco Technology, Inc. Providing reconstructed data based on stored aggregate data in response to queries for unavailable data
EP2518656B1 (en) * 2011-04-30 2019-09-18 Tata Consultancy Services Limited Taxonomic classification system
US9529915B2 (en) * 2011-06-16 2016-12-27 Microsoft Technology Licensing, Llc Search results based on user and result profiles
EP2541437A1 (en) 2011-06-30 2013-01-02 dimensio informatics GmbH Data base indexing
US8775299B2 (en) 2011-07-12 2014-07-08 Experian Information Solutions, Inc. Systems and methods for large-scale credit data processing
US9251191B2 (en) * 2012-03-09 2016-02-02 Raytheon Company System and method for indexing of geospatial data using three-dimensional Cartesian space
US9853959B1 (en) 2012-05-07 2017-12-26 Consumerinfo.Com, Inc. Storage and maintenance of personal data
US9336302B1 (en) 2012-07-20 2016-05-10 Zuci Realty Llc Insight and algorithmic clustering for automated synthesis
GB2508223A (en) 2012-11-26 2014-05-28 Ibm Estimating the size of a joined table in a database
GB2508603A (en) 2012-12-04 2014-06-11 Ibm Optimizing the order of execution of multiple join operations
US9697263B1 (en) 2013-03-04 2017-07-04 Experian Information Solutions, Inc. Consumer data request fulfillment system
US9213748B1 (en) * 2013-03-14 2015-12-15 Google Inc. Generating related questions for search queries
US10223401B2 (en) 2013-08-15 2019-03-05 International Business Machines Corporation Incrementally retrieving data for objects to provide a desired level of detail
US9767222B2 (en) 2013-09-27 2017-09-19 International Business Machines Corporation Information sets for data management
US10102536B1 (en) 2013-11-15 2018-10-16 Experian Information Solutions, Inc. Micro-geographic aggregation system
US10242019B1 (en) 2014-12-19 2019-03-26 Experian Information Solutions, Inc. User behavior segmentation using latent topic detection
CN107209773B (en) 2015-02-20 2021-01-01 惠普发展公司,有限责任合伙企业 Automatic invocation of unified visual interface
WO2016133543A1 (en) 2015-02-20 2016-08-25 Hewlett-Packard Development Company, L.P. Iterative visualization of a cohort for weighted high-dimensional categorical data
US10783268B2 (en) 2015-11-10 2020-09-22 Hewlett Packard Enterprise Development Lp Data allocation based on secure information retrieval
EP3469505A4 (en) 2016-06-13 2019-12-18 Affinio Inc. Method and apparatus for interacting with information distribution system
WO2018039377A1 (en) 2016-08-24 2018-03-01 Experian Information Solutions, Inc. Disambiguation and authentication of device users
US11080301B2 (en) 2016-09-28 2021-08-03 Hewlett Packard Enterprise Development Lp Storage allocation based on secure data comparisons via multiple intermediaries
US11205103B2 (en) 2016-12-09 2021-12-21 The Research Foundation for the State University Semisupervised autoencoder for sentiment analysis
US10163227B1 (en) 2016-12-28 2018-12-25 Shutterstock, Inc. Image file compression using dummy data for non-salient portions of images
CN116205724A (en) 2017-01-31 2023-06-02 益百利信息解决方案公司 Large scale heterogeneous data ingestion and user resolution
US20190034548A1 (en) * 2017-07-26 2019-01-31 International Business Machines Corporation Selecting a browser to launch a uniform resource locator (url)
US20200265450A1 (en) * 2017-09-13 2020-08-20 Affinio Inc. Composite Radial-Angular Clustering Of A Large-Scale Social Graph
US10963434B1 (en) 2018-09-07 2021-03-30 Experian Information Solutions, Inc. Data architecture for supporting multiple search models
CN109684518B (en) * 2018-11-02 2021-09-17 宁波大学 Variable-length Hash coding high-dimensional data nearest neighbor query method
US11941065B1 (en) 2019-09-13 2024-03-26 Experian Information Solutions, Inc. Single identifier platform for storing entity data
CN111354427B (en) * 2020-02-25 2022-04-29 南通大学 Nearest neighbor multi-granularity profit method for large-scale electronic health record knowledge collaborative reduction
CN111626321B (en) * 2020-04-03 2023-06-06 河南师范大学 Image data clustering method and device
US11880377B1 (en) 2021-03-26 2024-01-23 Experian Information Solutions, Inc. Systems and methods for entity resolution

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0797161A2 (en) * 1996-03-22 1997-09-24 Pilot Software Inc Computer system and computerimplemented process for applying database segment definitions to a database
US5787422A (en) * 1996-01-11 1998-07-28 Xerox Corporation Method and apparatus for information accesss employing overlapping clusters

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US5325298A (en) * 1990-11-07 1994-06-28 Hnc, Inc. Methods for generating or revising context vectors for a plurality of word stems
US5590242A (en) * 1994-03-24 1996-12-31 Lucent Technologies Inc. Signal bias removal for robust telephone speech recognition
US5832182A (en) * 1996-04-24 1998-11-03 Wisconsin Alumni Research Foundation Method and system for data clustering for very large databases
US5790426A (en) * 1996-04-30 1998-08-04 Athenium L.L.C. Automated collaborative filtering system
CA2187704C (en) * 1996-10-11 1999-05-04 Darcy Kim Rossmo Expert system method of performing crime site analysis

Non-Patent Citations (1)

Title
Indyk P. et al.: "Approximate nearest neighbors: towards removing the curse of dimensionality", Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing (STOC '98), Dallas, TX, USA, 23-26 May 1998, New York, NY, USA, ACM, pages 604-613, XP002138344, ISBN: 0-89791-962-9 *

Cited By (12)

Publication number Priority date Publication date Assignee Title
US9563916B1 (en) 2006-10-05 2017-02-07 Experian Information Solutions, Inc. System and method for generating a finance attribute from tradeline data
US11954731B2 (en) 2006-10-05 2024-04-09 Experian Information Solutions, Inc. System and method for generating a finance attribute from tradeline data
US9342783B1 (en) 2007-03-30 2016-05-17 Consumerinfo.Com, Inc. Systems and methods for data verification
US9251541B2 (en) 2007-05-25 2016-02-02 Experian Information Solutions, Inc. System and method for automated detection of never-pay data sets
US11954089B2 (en) 2007-09-27 2024-04-09 Experian Information Solutions, Inc. Database system for triggering event notifications based on updates to database records
US8954459B1 (en) 2008-06-26 2015-02-10 Experian Marketing Solutions, Inc. Systems and methods for providing an integrated identifier
US9152727B1 (en) 2010-08-23 2015-10-06 Experian Marketing Solutions, Inc. Systems and methods for processing consumer information for targeted marketing applications
US9147042B1 (en) 2010-11-22 2015-09-29 Experian Information Solutions, Inc. Systems and methods for data verification
US9483606B1 (en) 2011-07-08 2016-11-01 Consumerinfo.Com, Inc. Lifescore
US9529851B1 (en) 2013-12-02 2016-12-27 Experian Information Solutions, Inc. Server architecture for electronic data quality processing
US11847693B1 (en) 2014-02-14 2023-12-19 Experian Information Solutions, Inc. Automatic generation of code for attributes
US9576030B1 (en) 2014-05-07 2017-02-21 Consumerinfo.Com, Inc. Keeping up with the joneses

Also Published As

Publication number Publication date
WO2000028441A2 (en) 2000-05-18
US6263334B1 (en) 2001-07-17

Similar Documents

Publication Publication Date Title
WO2000028441A3 (en) A density-based indexing method for efficient execution of high-dimensional nearest-neighbor queries on large databases
US7143098B2 (en) Systems, methods, and computer program products to reduce computer processing in grid cell size determination for indexing of multidimensional databases
US6178417B1 (en) Method and means of matching documents based on text genre
US20040249808A1 (en) Query expansion using query logs
US20060155681A1 (en) Method and apparatus for automatic recommendation and selection of clustering indexes
CN106599052A (en) Data query system based on ApacheKylin, and method thereof
JPH11161670A (en) Method, device, and system for information filtering
CN110597876B (en) Approximate query method for predicting future query based on offline learning historical query
JP2009500760A (en) Method and apparatus for searching in multiple data sources for a selected user community
US6826563B1 (en) Supporting bitmap indexes on primary B+tree like structures
CN106611016A (en) Image retrieval method based on decomposable word pack model
Lomet A review of recent work on multi-attribute access methods
KR102345410B1 (en) Big data intelligent collecting method and device
US20050132269A1 (en) Method for retrieving image documents using hierarchy and context techniques
DeClaris et al. Information filtering and retrieval: Overview, issues and directions
Thomas et al. Creating a customized access method for blobworld
US20060039607A1 (en) Method and apparatus for extracting feature information, and computer product
Markellos et al. Knowledge discovery in patent databases
Wan et al. Image retrieval with an octree-based color indexing scheme
JP3632477B2 (en) Internet information retrieval method and storage medium storing internet information retrieval program
Hartzman et al. A relational approach to querying data streams
JPH081642B2 (en) Keyword search method
Pareti et al. On defining signatures for the retrieval and the classification of graphical drop caps
CN111625553B (en) Statistical information collection optimization method and system
Rishe et al. Knowledge Management for Database Interoperability

Legal Events

Date Code Title Description
AK Designated states; Kind code of ref document: A2; Designated state(s): JP
AL Designated countries for regional patents; Kind code of ref document: A2; Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE

121 EP: the EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
AK Designated states; Kind code of ref document: A3; Designated state(s): JP
AL Designated countries for regional patents; Kind code of ref document: A3; Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE

122 EP: PCT application non-entry in European phase