WO2003012685A3 - A data quality system - Google Patents

A data quality system

Info

Publication number
WO2003012685A3
Authority
WO
WIPO (PCT)
Prior art keywords
vector
data quality
quality system
pairs
generate
Prior art date
Application number
PCT/IE2002/000117
Other languages
French (fr)
Other versions
WO2003012685A2 (en)
Inventor
Gary Ramsay
Sarah-Jane Delany
Brian Caulfield
Garry Moroney
Padraig Cunningham
Ronan Pearce
Original Assignee
Tristlam Ltd
Gary Ramsay
Sarah-Jane Delany
Brian Caulfield
Garry Moroney
Padraig Cunningham
Ronan Pearce
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tristlam Ltd, Gary Ramsay, Sarah-Jane Delany, Brian Caulfield, Garry Moroney, Padraig Cunningham, Ronan Pearce
Publication of WO2003012685A2
Priority to US10/768,979 (US7281001B2)
Publication of WO2003012685A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99931 Database or file accessing
    • Y10S707/99933 Query processing, i.e. searching
    • Y10S707/99935 Query augmenting and refining, e.g. inexact access
    • Y10S707/99941 Database schema or data structure
    • Y10S707/99943 Generating database or data structure, e.g. via user interface

Abstract

A system (1) generates an output indicating scores for the extent of matching of pairs of data records. Thresholds may be set for the scores for decision-making or human review. A vector extraction module (12) measures similarity of pairs of fields in a pair of records to generate a vector. The vector is then processed to generate a score for the record pair.
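The flow described in the abstract (per-field similarity, a similarity vector, a score, and thresholds) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the field names, the use of `difflib.SequenceMatcher` as the similarity measure, the plain average used to turn the vector into a score, and the threshold values.

```python
from difflib import SequenceMatcher

# Hypothetical field names; the abstract does not name specific fields.
FIELDS = ("name", "address", "city")


def field_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two field values (assumed measure)."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def extract_vector(rec_a: dict, rec_b: dict, fields=FIELDS) -> list:
    """Build one similarity value per field pair for a pair of records."""
    return [field_similarity(rec_a.get(f, ""), rec_b.get(f, "")) for f in fields]


def score_vector(vector: list) -> float:
    """Collapse the vector to one score; a plain mean is an assumption."""
    return sum(vector) / len(vector) if vector else 0.0


def decide(score: float, match_threshold: float = 0.9,
           review_threshold: float = 0.6) -> str:
    """Apply illustrative thresholds: auto-match, human review, or non-match."""
    if score >= match_threshold:
        return "match"
    if score >= review_threshold:
        return "review"
    return "non-match"


if __name__ == "__main__":
    a = {"name": "John Smith", "address": "1 Main St", "city": "Dublin"}
    b = {"name": "Jon Smith", "address": "1 Main Street", "city": "Dublin"}
    vec = extract_vector(a, b)
    score = score_vector(vec)
    print(vec, score, decide(score))
```

In this sketch the middle band between the two thresholds is where a record pair would be routed to human review, matching the abstract's statement that thresholds may be set for decision-making or review. A trained classifier or weighted combination could replace the mean in `score_vector` without changing the rest of the flow.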
PCT/IE2002/000117 2001-08-03 2002-08-02 A data quality system WO2003012685A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US10/768,979 US7281001B2 (en) 2001-08-03 2004-02-02 Data quality system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IE2001/0744 2001-08-03
IE20010744 2001-08-03

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US10/768,979 Continuation US7281001B2 (en) 2001-08-03 2004-02-02 Data quality system

Publications (2)

Publication Number Publication Date
WO2003012685A2 (en) 2003-02-13
WO2003012685A3 (en) 2004-02-12

Family

ID=11042824

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IE2002/000117 WO2003012685A2 (en) 2001-08-03 2002-08-02 A data quality system

Country Status (3)

Country Link
US (1) US7281001B2 (en)
IE (1) IES20020647A2 (en)
WO (1) WO2003012685A2 (en)

Families Citing this family (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7774715B1 (en) 2000-06-23 2010-08-10 Ecomsystems, Inc. System and method for computer-created advertisements
US8285590B2 (en) 2000-06-23 2012-10-09 Ecomsystems, Inc. Systems and methods for computer-created advertisements
US7392240B2 (en) * 2002-11-08 2008-06-24 Dun & Bradstreet, Inc. System and method for searching and matching databases
US7287019B2 (en) * 2003-06-04 2007-10-23 Microsoft Corporation Duplicate data elimination system
US20050034042A1 (en) * 2003-08-07 2005-02-10 Process Direction, Llc System and method for processing and identifying errors in data
US20050246330A1 (en) * 2004-03-05 2005-11-03 Giang Phan H System and method for blocking key selection
GB2419974A (en) * 2004-11-09 2006-05-10 Finsoft Ltd Calculating the quality of a data record
FR2880492B1 (en) * 2005-01-04 2015-10-30 Gred METHOD AND SYSTEM FOR MANAGING PATIENT IDENTITIES IN A COMPUTER NETWORK FOR PRODUCING AND STORING MEDICAL INFORMATION
US8285739B2 (en) 2005-07-28 2012-10-09 International Business Machines Corporation System and method for identifying qualifying data records from underlying databases
US20070067278A1 (en) * 2005-09-22 2007-03-22 Gtess Corporation Data file correlation system and method
DK1952285T3 (en) * 2005-11-23 2011-01-10 Dun & Bradstreet Inc System and method for crawling and comparing data that has word-like content
US7672942B2 (en) * 2006-05-01 2010-03-02 Sap, Ag Method and apparatus for matching non-normalized data values
US8296183B2 (en) 2009-11-23 2012-10-23 Ecomsystems, Inc. System and method for dynamic layout intelligence
US8407189B2 (en) * 2009-11-25 2013-03-26 International Business Machines Corporation Finding and fixing stability problems in personal computer systems
US20120150825A1 (en) 2010-12-13 2012-06-14 International Business Machines Corporation Cleansing a Database System to Improve Data Quality
US8635197B2 (en) * 2011-02-28 2014-01-21 International Business Machines Corporation Systems and methods for efficient development of a rule-based system using crowd-sourcing
US20120314249A1 (en) * 2011-06-13 2012-12-13 Xerox Corporation Methods and systems for reminding about print history
CN103823812A (en) * 2012-11-19 2014-05-28 苏州工业园区新宏博通讯科技有限公司 System data management method
US9110941B2 (en) * 2013-03-15 2015-08-18 International Business Machines Corporation Master data governance process driven by source data accuracy metric
US10515060B2 (en) * 2013-10-29 2019-12-24 Medidata Solutions, Inc. Method and system for generating a master clinical database and uses thereof
US20160147799A1 (en) * 2014-11-26 2016-05-26 Hewlett-Packard Development Company, L.P. Resolution of data inconsistencies
US10515101B2 (en) * 2016-04-19 2019-12-24 Strava, Inc. Determining clusters of similar activities
JP6664007B2 (en) 2016-04-20 2020-03-13 エーエスエムエル ネザーランズ ビー.ブイ. Method for aligning records, scheduling maintenance, and apparatus
US10558627B2 (en) * 2016-04-21 2020-02-11 Leantaas, Inc. Method and system for cleansing and de-duplicating data
US10147040B2 (en) 2017-01-20 2018-12-04 Alchemy IoT Device data quality evaluator
US11055327B2 (en) 2018-07-01 2021-07-06 Quadient Technologies France Unstructured data parsing for structured information
US11550766B2 (en) 2019-08-14 2023-01-10 Oracle International Corporation Data quality using artificial intelligence
CN113591485A (en) * 2021-06-17 2021-11-02 国网浙江省电力有限公司 Intelligent data quality auditing system and method based on data science
US11922357B2 (en) 2021-10-07 2024-03-05 Charter Communications Operating, Llc System and method for identifying and handling data quality anomalies
US20230244697A1 (en) * 2022-01-31 2023-08-03 Verizon Patent And Licensing Inc. Systems and methods for hybrid record unification using a combination of deterministic, probabilistic, and possibilistic operations
CN116894057B (en) * 2023-07-17 2023-12-22 云达信息技术有限公司 Python-based cloud service data collection processing method, device, equipment and medium

Citations (2)

Publication number Priority date Publication date Assignee Title
US5724597A (en) * 1994-07-29 1998-03-03 U S West Technologies, Inc. Method and system for matching names and addresses
US6065003A (en) * 1997-08-19 2000-05-16 Microsoft Corporation System and method for finding the closest match of a data entry

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
US5255310A (en) * 1989-08-11 1993-10-19 Korea Telecommunication Authority Method of approximately matching an input character string with a key word and vocally outputting data
AU631276B2 (en) * 1989-12-22 1992-11-19 Bull Hn Information Systems Inc. Name resolution in a directory database
US5619709A (en) * 1993-09-20 1997-04-08 Hnc, Inc. System and method of context vector generation and retrieval
US5930784A (en) * 1997-08-21 1999-07-27 Sandia Corporation Method of locating related items in a geometric space for data mining
US6289342B1 (en) * 1998-01-05 2001-09-11 Nec Research Institute, Inc. Autonomous citation indexing and literature browsing using citation context
US6263333B1 (en) * 1998-10-22 2001-07-17 International Business Machines Corporation Method for searching non-tokenized text and tokenized text for matches against a keyword data structure
US6665868B1 (en) * 2000-03-21 2003-12-16 International Business Machines Corporation Optimizing host application presentation space recognition events through matching prioritization

Non-Patent Citations (2)

Title
HERNANDEZ M A ET AL: "The merge/purge problem for large databases", 1995 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, SAN JOSE, CA, USA, 22-25 MAY 1995, vol. 24, no. 2, SIGMOD Record, June 1995, USA, pages 127 - 138, XP002251528, ISSN: 0163-5808 *
LUJAN-MORA S., PALOMAR M: "Reducing Inconsistency in Integrating Data from Different Sources", IEEE. INTERNATIONAL SYMPOSIUM ON DATABASE ENGINEERING & APPLICATIONS, 16 July 2001 (2001-07-16) - 18 July 2001 (2001-07-18), pages 209 - 218, XP002251529, Retrieved from the Internet <URL:http://ieeexplore.ieee.org/iel5/7469/20303/00938087.pdf?isNumber=20303&prod=CNF&arnumber=938087&arSt=209&ared=218&arAuthor=Lujan-Mora%2C+S.%3B+Palomar%2C+M.%3B> [retrieved on 20030818] *

Also Published As

Publication number Publication date
IE20020648A1 (en) 2003-03-19
IES20020647A2 (en) 2003-03-19
US20040158562A1 (en) 2004-08-12
WO2003012685A2 (en) 2003-02-13
US7281001B2 (en) 2007-10-09

Similar Documents

Publication Publication Date Title
WO2003012685A3 (en) A data quality system
WO2004059573A3 (en) Face recognition system and method
DE60045673D1 (en) Signal processing method and apparatus and recording medium
WO2001022285A3 (en) A probabilistic record linkage model derived from training data
EP1536638A4 (en) Metadata preparing device, preparing method therefor and retrieving device
WO2003058878A8 (en) Gaming device with biometric system
CA2259362A1 (en) Executing computations expressed as graphs
CA2110866A1 (en) Audience measurement system and method
WO2003013673A3 (en) Alternative player tracking techniques
WO2003049035A3 (en) Method and apparatus for automatic face blurring
WO2005052860A3 (en) System and method for detecting and matching anatomical structures using appearance and shape
EP1207475A3 (en) System and method for providing environmental impact information, recording medium recording the information, and computer data signal
EP1632172A3 (en) Plethysmograph pulse recognition processor
WO2005046431A3 (en) Removing chest compression artifacts from physiological signals
CA2195942A1 (en) Method and apparatus for inserting source identification data into a video signal
WO2007038612A3 (en) Apparatus and method for processing user-specified search image points
CA2091503A1 (en) Apparatus for studying large sets of data
AU1154895A (en) Handwriting input apparatus using more than one sensing technique
WO2007038642A3 (en) Apparatus and method for trajectory-based identification of digital data content
WO2005124630A3 (en) Transaction accounting processing system and approach
WO2003042880A1 (en) Information processing apparatus and method, and information processing system and method
ATE210323T1 (en) DETECTION SYSTEM FOR RECOGNIZING PERSONS
JPS58130393A (en) Voice recognition equipment
WO2005006278A3 (en) Systems and methods for training component-based object identification systems
HK1093805A1 (en) Personal identification method, identification system and apparatus for personal biometrical identification

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BY BZ CA CH CN CO CR CU CZ DE DM DZ EC EE ES FI GB GD GE GH HR HU ID IL IN IS JP KE KG KP KR LC LK LR LS LT LU LV MA MD MG MN MW MX MZ NO NZ OM PH PL PT RU SD SE SG SI SK SL TJ TM TN TR TZ UA UG US UZ VN YU ZA ZM

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG US UZ VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ UG ZM ZW AM AZ BY KG KZ RU TJ TM AT BE BG CH CY CZ DK EE ES FI FR GB GR IE IT LU MC PT SE SK TR BF BJ CF CG CI GA GN GQ GW ML MR NE SN TD TG

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR IE IT LU MC NL PT SE SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 EP: the EPO has been informed by WIPO that EP was designated in this application
WWE WIPO information: entry into national phase

Ref document number: 10768979

Country of ref document: US

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

32PN EP: public notification in the EP bulletin as address of the addressee cannot be established

Free format text: COMMUNICATION PURSUANT TO RULE 69 EPC (EPO FORM 1205A OF 260504)

122 EP: PCT application non-entry in European phase
NENP Non-entry into the national phase

Ref country code: JP

WWW WIPO information: withdrawn in national office

Country of ref document: JP