CA2172874A1 - Method and System for Minimizing Attribute Naming Errors in Set Oriented Duplicate Detection
Info
- Publication number
- CA2172874A1
- Authority
- CA
- Canada
- Prior art keywords
- list
- record
- records
- duplicate
- fields
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q99/00—Subject matter not provided for in other groups of this subclass
Abstract
The invention is a method for detecting duplicate records on a list or in a file. The steps include entering a list of one or more records into a data processing system, then applying a nickname lookup table to the records to determine a common first name. Once a common name has been determined, the method matches a first record from the list against at least one other record from the list by comparing their fields according to a set of pre-determined criteria. The matching sequence determines a duplicate set, which comprises at least two records with fields that match. The method then lists the matching records sequentially so that the system can create a new record by filling each empty field with the next available corresponding field from a subsequent record within the duplicate set. The newly created record is retained on the original list, and the duplicate records are placed on a second list.
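The steps in the paragraph above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the nickname table contents, the field names (`first`, `last`, `zip`, `phone`), and the exact-match criteria in `MATCH_FIELDS` are all hypothetical, since the abstract does not specify them.

```python
# Hypothetical nickname table; the abstract does not list the real entries.
NICKNAMES = {"bill": "william", "will": "william", "bob": "robert", "liz": "elizabeth"}

# Assumed matching criteria: two records match when these fields agree.
MATCH_FIELDS = ("first", "last", "zip")


def common_first_name(name):
    """Map a first name to its common form via the nickname lookup table."""
    key = name.strip().lower()
    return NICKNAMES.get(key, key)


def match_key(record):
    """Build the comparison key for a record from the assumed match fields."""
    first = common_first_name(record.get("first", ""))
    rest = [record.get(f, "").strip().lower() for f in MATCH_FIELDS[1:]]
    return (first, *rest)


def merge_duplicate_set(dup_set):
    """Create a new record, filling each empty field from the next
    record in the duplicate set that has a value for it."""
    merged = dict(dup_set[0])
    for later in dup_set[1:]:
        for field, value in later.items():
            if not merged.get(field):
                merged[field] = value
    return merged


def deduplicate(records):
    """Return (retained list, duplicates list) for the input records."""
    groups = {}
    for record in records:
        groups.setdefault(match_key(record), []).append(record)

    retained, duplicates = [], []
    for group in groups.values():
        if len(group) > 1:
            # The merged record stays on the original list;
            # the matched records go to the second list.
            retained.append(merge_duplicate_set(group))
            duplicates.extend(group)
        else:
            retained.append(group[0])
    return retained, duplicates
```

In this sketch, "Bill Smith" and "William Smith" at the same postal code would fall into one duplicate set, and a phone number present only on the second record would be copied into the merged record. Grouping by an exact key is a simplification chosen for brevity; the abstract only says the comparison follows pre-determined criteria.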
Pre-sorting of the list can occur just prior to the matching sequence as well as just prior to outputting the final list. Additionally, the system operator can be given a number of options to provide flexibility. These options can include:
manually correcting a record on the duplicate records list; deleting an address record from the list of duplicates; or, outputting the record.
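The pre-sorting step and the operator options might look something like the sketch below. The sort fields and the function names `presort` and `apply_operator_choice` are assumptions made for illustration; the abstract does not say which fields are used for sorting or how the options are presented.

```python
def presort(records, fields=("last", "first", "zip")):
    """Sort records so that likely duplicates end up next to each other.
    The sort fields are assumed; any stable ordering on them would do."""
    return sorted(records, key=lambda r: tuple(r.get(f, "").lower() for f in fields))


def apply_operator_choice(duplicates, index, choice, corrections=None):
    """Apply one operator option to the record at `index` on the duplicates list.

    choice is one of:
      "correct" - manually correct fields of the record (in place)
      "delete"  - remove the record from the duplicates list
      "output"  - return the record so the caller can output it
    """
    if choice == "correct":
        duplicates[index].update(corrections or {})
        return None
    if choice == "delete":
        del duplicates[index]
        return None
    if choice == "output":
        return duplicates[index]
    raise ValueError(f"unknown operator choice: {choice!r}")
```

The same `presort` function could be called once before matching and again before the final list is output, which is where the abstract places the sorting steps.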
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US08/413,579 | 1995-03-30 | | |
US08/413,579 US5799302A (en) | 1995-03-30 | 1995-03-30 | Method and system for minimizing attribute naming errors in set oriented duplicate detection |
Publications (2)
Publication Number | Publication Date |
---|---|
CA2172874A1 (en) | 1996-10-01 |
CA2172874C (en) | 1999-11-23 |
Family
ID=23637788
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA002172874A (CA2172874C, Expired - Fee Related) | 1995-03-30 | 1996-03-28 | Method and system for minimizing attribute naming errors in set oriented duplicate detection |
Country Status (3)
Country | Link |
---|---|
US (1) | US5799302A (en) |
CA (1) | CA2172874C (en) |
FR (1) | FR2732489B1 (en) |
Families Citing this family (41)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5822720A (en) | 1994-02-16 | 1998-10-13 | Sentius Corporation | System amd method for linking streams of multimedia data for reference material for display |
US6026398A (en) * | 1997-10-16 | 2000-02-15 | Imarket, Incorporated | System and methods for searching and matching databases |
US7089331B1 (en) | 1998-05-29 | 2006-08-08 | Oracle International Corporation | Method and mechanism for reducing client-side memory footprint of transmitted data |
CN100492388C (en) | 1998-08-14 | 2009-05-27 | 3M创新有限公司 | Radio frequency identification system applications |
US7496854B2 (en) * | 1998-11-10 | 2009-02-24 | Arendi Holding Limited | Method, system and computer readable medium for addressing handling from a computer program |
US7272604B1 (en) | 1999-09-03 | 2007-09-18 | Atle Hedloy | Method, system and computer readable medium for addressing handling from an operating system |
NO984066L (en) * | 1998-09-03 | 2000-03-06 | Arendi As | Computer function button |
US6662079B2 (en) * | 1998-11-30 | 2003-12-09 | Pitney Bowes Inc. | Method and system for preparation of mailpieces having a capability for processing intermixed qualified and non-qualified mailpieces |
US6529896B1 (en) | 2000-02-17 | 2003-03-04 | International Business Machines Corporation | Method of optimizing a query having an existi subquery and a not-exists subquery |
US7389284B1 (en) | 2000-02-29 | 2008-06-17 | Oracle International Corporation | Method and mechanism for efficient processing of remote-mapped queries |
US9070105B2 (en) * | 2000-04-21 | 2015-06-30 | United States Postal Service | Systems and methods for providing change of address services over a network |
US6738768B1 (en) * | 2000-06-27 | 2004-05-18 | Johnson William J | System and method for efficient information capture |
US7013455B1 (en) | 2000-10-19 | 2006-03-14 | International Business Machines Corporation | System for automatically altering environment variable to run proper executable file by removing references to all except one duplicate file in the path sequence |
CA2449702A1 (en) * | 2001-06-05 | 2002-12-12 | 3M Innovative Properties Company | Methods of managing the transfer and use of data |
US20020180588A1 (en) * | 2001-06-05 | 2002-12-05 | Erickson David P. | Radio frequency identification in document management |
EP1399871A2 (en) * | 2001-06-15 | 2004-03-24 | 3M Innovative Properties Company | Methods of managing the transfer, use, and importation of data |
US20020190862A1 (en) * | 2001-06-15 | 2002-12-19 | 3M Innovative Properties Company | Methods of managing the transfer, use, and importation of data |
US7130861B2 (en) | 2001-08-16 | 2006-10-31 | Sentius International Corporation | Automated creation and delivery of database content |
US7103590B1 (en) | 2001-08-24 | 2006-09-05 | Oracle International Corporation | Method and system for pipelined database table functions |
US7092956B2 (en) * | 2001-11-02 | 2006-08-15 | General Electric Capital Corporation | Deduplication system |
US6999975B1 (en) * | 2001-11-14 | 2006-02-14 | Cas, Inc. | System and method for identifying records with valid address, but invalid name information |
US7370044B2 (en) * | 2001-11-19 | 2008-05-06 | Equifax, Inc. | System and method for managing and updating information relating to economic entities |
US7321858B2 (en) * | 2001-11-30 | 2008-01-22 | United Negro College Fund, Inc. | Selection of individuals from a pool of candidates in a competition system |
US7610351B1 (en) | 2002-05-10 | 2009-10-27 | Oracle International Corporation | Method and mechanism for pipelined prefetching |
US6973457B1 (en) | 2002-05-10 | 2005-12-06 | Oracle International Corporation | Method and system for scrollable cursors |
US6968339B1 (en) * | 2002-08-20 | 2005-11-22 | Bellsouth Intellectual Property Corporation | System and method for selecting data to be corrected |
US7103603B2 (en) * | 2003-03-28 | 2006-09-05 | International Business Machines Corporation | Method, apparatus, and system for improved duplicate record processing in a sort utility |
US8423563B2 (en) * | 2003-10-16 | 2013-04-16 | Sybase, Inc. | System and methodology for name searches |
US7295120B2 (en) * | 2004-12-10 | 2007-11-13 | 3M Innovative Properties Company | Device for verifying a location of a radio-frequency identification (RFID) tag on an item |
US8204866B2 (en) * | 2007-05-18 | 2012-06-19 | Microsoft Corporation | Leveraging constraints for deduplication |
US20080301184A1 (en) * | 2007-05-30 | 2008-12-04 | Pitney Bowes Inc. | System and Method for Updating Mailing Lists |
US20090167502A1 (en) * | 2007-12-31 | 2009-07-02 | 3M Innovative Properties Company | Device for verifying a location and functionality of a radio-frequency identification (RFID) tag on an item |
US8429632B1 (en) | 2009-06-30 | 2013-04-23 | Google Inc. | Method and system for debugging merged functions within a program |
US8458681B1 (en) * | 2009-06-30 | 2013-06-04 | Google Inc. | Method and system for optimizing the object code of a program |
US8682898B2 (en) | 2010-04-30 | 2014-03-25 | International Business Machines Corporation | Systems and methods for discovering synonymous elements using context over multiple similar addresses |
US8683455B1 (en) | 2011-01-12 | 2014-03-25 | Google Inc. | Method and system for optimizing an executable program by selectively merging identical program entities |
US8689200B1 (en) | 2011-01-12 | 2014-04-01 | Google Inc. | Method and system for optimizing an executable program by generating special operations for identical program entities |
US8996548B2 (en) | 2011-01-19 | 2015-03-31 | Inmar Analytics, Inc. | Identifying consuming entity behavior across domains |
US20130007106A1 (en) * | 2011-07-01 | 2013-01-03 | Salesforce. Com Inc. | Asynchronous interaction in the report generator |
US9563677B2 (en) | 2012-12-11 | 2017-02-07 | Melissa Data Corp. | Systems and methods for clustered matching of records using geographic proximity |
US11055327B2 (en) | 2018-07-01 | 2021-07-06 | Quadient Technologies France | Unstructured data parsing for structured information |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5326181A (en) * | 1986-10-14 | 1994-07-05 | Bryce Office Systems Inc. | Envelope addressing system adapted to simultaneously print addresses and bar codes |
US4858907A (en) * | 1986-10-14 | 1989-08-22 | Bryce Office Systems, Inc. | System for feeding envelopes for simultaneous printing of addresses and bar codes |
US4853882A (en) * | 1987-11-02 | 1989-08-01 | A. C. Nielsen Company | System and method for protecting against redundant mailings |
US5079714A (en) * | 1989-10-03 | 1992-01-07 | Pitney Bowes Inc. | Mail deliverability by mail and database processing |
US5111395A (en) * | 1989-11-03 | 1992-05-05 | Smith Rodney A | Automated fund collection system including means to eliminate duplicate entries from a mailing list |
US5227970A (en) * | 1990-07-06 | 1993-07-13 | Bernard C. Harris Publishing | Methods and systems for updating group mailing lists |
AU9086891A (en) * | 1990-11-05 | 1992-05-26 | Johnson & Quin, Inc. | Document control and audit apparatus and method |
US5245533A (en) * | 1990-12-18 | 1993-09-14 | A. C. Nielsen Company | Marketing research method and system for management of manufacturer's discount coupon offers |
US5428777A (en) * | 1991-11-18 | 1995-06-27 | Taylor Publishing Company | Automatic index for yearbooks with spell checking capabilities |
US5377120A (en) * | 1992-06-11 | 1994-12-27 | Humes; Carl L. | Apparatus for commingling & addressing mail pieces |
US5680611A (en) * | 1995-09-29 | 1997-10-21 | Electronic Data Systems Corporation | Duplicate record detection |
1995
- 1995-03-30: US application US08/413,579, patent US5799302A (not active, Expired - Lifetime)
1996
- 1996-03-28: CA application CA002172874A, patent CA2172874C (not active, Expired - Fee Related)
- 1996-03-29: FR application FR9603952A, patent FR2732489B1 (not active, Expired - Lifetime)
Also Published As
Publication number | Publication date |
---|---|
FR2732489B1 (en) | 1999-03-19 |
US5799302A (en) | 1998-08-25 |
FR2732489A1 (en) | 1996-10-04 |
CA2172874C (en) | 1999-11-23 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CA2172874A1 (en) | Method and System for Minimizing Attribute Naming Errors in Set Oriented Duplicate Detection | |
US5276868A (en) | Method and apparatus for pointer compression in structured databases | |
Chernoff | Representations, automorphisms, and derivations of some operator algebras | |
US7814231B2 (en) | Method of synchronizing between three or more devices | |
KR950012220A (en) | Address Translation Mechanism of Virtual Memory Computer Systems Supporting Multiple Page Sizes | |
US8688659B2 (en) | Method for indexed-field based difference detection and correction | |
CA2049133A1 (en) | Methods and apparatus for implementing data bases to provide object-oriented invocation of applications | |
CA2223933A1 (en) | Method for managing globally distributed software components | |
WO1999023582B1 (en) | Methods and apparatus for a universal tracking system | |
CN104009921B (en) | The data message forwarding method matched based on arbitrary fields | |
GB1514829A (en) | Transversal multi-tap equalizers | |
CN104463460B (en) | Processing method and processing device for the waiting information that network data is launched | |
CN107451177A (en) | For the querying method and system of the block chain of the single corrigenda of increase block | |
US6760737B2 (en) | Spatial median filter | |
EP0400820A3 (en) | Content addressable memory | |
EP0394172A3 (en) | Method of performing file services given partial file names | |
JPH1040255A (en) | Hash table control device | |
CN105608139B (en) | Data matching system and method | |
JPH0315982A (en) | Logical simulation system | |
DE502004006212D1 (en) | METHOD FOR PROCESSING CDR INFORMATION | |
EP3683698A1 (en) | Data storage method and system for datasets | |
JP2560292B2 (en) | Vehicle work alignment device | |
CN106570024A (en) | Data increment processing method and apparatus | |
JPH06290100A (en) | Data base system | |
JPH0221026B2 (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
EEER | Examination request | ||
MKLA | Lapsed | ||
MKLA | Lapsed | Effective date: 20120328 |