WO1998020432A1 - Automatic transmission of legacy system data - Google Patents

Publication number
WO1998020432A1
WO1998020432A1 PCT/US1997/018878 US9718878W WO9820432A1 WO 1998020432 A1 WO1998020432 A1 WO 1998020432A1 US 9718878 W US9718878 W US 9718878W WO 9820432 A1 WO9820432 A1 WO 9820432A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
map
legacy
fields
record
Prior art date
Application number
PCT/US1997/018878
Other languages
French (fr)
Inventor
Timothy Patrick Kelliher
Jeanette Marie Bruno
Original Assignee
General Electric Company
Priority date
Filing date
Publication date
Application filed by General Electric Company
Priority to CA002241514A (patent CA2241514C)
Priority to JP10521427A (patent JP2000505222A)
Priority to AT97946264T (patent ATE519166T1)
Priority to EP97946264A (patent EP0883848B1)
Publication of WO1998020432A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99931 Database or file accessing
    • Y10S707/99932 Access augmentation or optimizing
    • Y10S707/99941 Database schema or data structure
    • Y10S707/99942 Manipulating data structure, e.g. compression, compaction, compilation
    • Y10S707/99944 Object-oriented database structure
    • Y10S707/99945 Object-oriented database structure processing

Definitions

  • the present invention relates to electronic data transmission, and more specifically to a system which automatically finds and transmits information to a service provider in one of a number of standard formats.
  • POMS Physician Office Management Systems
  • a mapping tool typically supports implementing an insertion or extraction routine from the legacy system. It does not help design the routines.
  • the user of the mapping tool (typically a software engineer) supplies the mapping tool with a layout of the source data, a layout of the target data items and a mapping between the source and target data items. The mapping tool then implements this map.
  • a screen scraper is a tool which monitors (and stores) inputs to the computer to determine order and type of screen inputs which are required by the legacy system.
  • the screen scraper is programmed to monitor a specific legacy system.
  • the user determines the sequence of screens (in the legacy system) needed to retrieve/insert the data from/to the legacy system.
  • the user then runs the legacy application in conjunction with the screen scraper which monitors and identifies the screens that are to be scraped and tags fields in the screens that will be retrieved/filled by the screen scraper.
  • a user then writes a script (in the screen scraper's programming language) to start up the legacy application, maneuver from screen to screen and retrieve/insert data from/into the tagged fields in the legacy application screens.
  • the screen scraper can automatically run the legacy application entering values from an electronic source (such as a data file) into the appropriate user interface fields. It can also generate data files by automatically extracting data from the legacy system's screens.
  • the present invention works on existing legacy systems, such as a Physician's Office Management System (POMS), and automatically determines the data format of a storage device of the existing legacy system, extracts data required by a service company, such as an insurance company, and transmits the data in one of several predetermined industry standard formats.
  • a set of sample "seed data" is fed to the legacy system, either manually, or by an automated data feeder.
  • a Data Locator searches said legacy system storage device for locations (table, record, field position) of occurrences of the seed values. It makes a "raw map" indicating the seed values, their location, and the format of the legacy field where each was found.
  • a Map Refining device receives the raw map from the Data Locator, and culls out false hits from the raw map to produce a clean map.
  • a Control Flow Analyzer identifies key fields in the clean map, that is, fields which are used to acquire other data and must therefore be acquired first, and creates control flow information indicating the order of data extraction.
  • This control flow information and the clean map are passed to an Output Generator which scripts data extraction steps to extract the data in the required order from the legacy system storage device.
  • An Extraction device executes the data extraction steps and extracts the data from the legacy storage device and passes it to a Transmission device which reformats this data according to a predetermined format, contacts the service company by conventional communications systems, and sends the data in the predefined format to the service company.
  • An object of the present invention is to provide a system that determines the data format of an existing system, extracts the data, and transmits the data in a required format.
  • Figure 1 is a simplified block diagram of an embodiment of the present invention.
  • the invention described in this document works with an existing system, a "legacy system", and automatically discovers the disk data format, "legacy disk format", determines a map between the legacy data and a target description, which may be a standard format, or user-supplied, and implements this map by either: generating the source code to implement access to the data directly; or passing the information to a mapping tool, which transmits the legacy data to the service provider.
  • the present invention not only determines where the legacy values are stored in the legacy disk, but determines access order to key data fields that must be accessed first before other data is accessed.
  • Fig. 1 a simplified block diagram of the present invention is shown. Many of the blocks shown are functional blocks and represent a specific function to be accomplished. They may be purely hardware, or software running on hardware, as long as they perform the intended function.
  • An existing legacy system 10 is shown which may be any type of computer management system, for example a Physician Office Management System (POMS).
  • a user 1 initially provides a data feeder 15 with a suggested list of the data fields to extract/insert, the list defining a record.
  • the list of fields is a "domain description" for a record.
  • the domain descriptor should be representative of the legacy system being used. For example, a physician's Office Management System would use fields having information as to a patient's name, social security number, office visit date, diagnosis, charges, etc.
  • seed data representing a number of records.
  • the seed data is selected to be as unique as possible so as not to be confused with existing information on the legacy system. For example, when putting in zip code information, if the system is on the west coast, the majority of existing zip code information will be west coast zip codes, therefore the seed data chosen will be east coast zip codes.
  • the domain description does not contain any information about how the fields are stored in the legacy application.
  • Data feeder 15 inserts this information into a keyboard buffer 7 of legacy system 10.
  • In another embodiment, user 1 simply types the information directly into legacy system 10, and a monitor device 6 running a monitor program keeps track of keyboard input. Monitor device functions may be performed by a legacy system CPU 5, running in a multiprocessing mode.
  • Legacy system 10 employs CPU 5 running a legacy application stored in a legacy application memory 8, which may or may not be part of a contiguous memory device also having a screen buffer 9 and keyboard buffer 7.
  • Seed data from monitor device 6 or data feeder 15 is passed to a Data Locator 17 coupled to a storage device 13 of legacy system 10.
  • Data Locator 17 performs an exhaustive search of legacy system storage device 13 looking for occurrences of the seed values.
  • Data Locator 17 will use type information of the domain fields to recognize the data values that may be stored in a variety of formats. For example, if the domain field is a date type, then Data Locator 17 will look for matches in the form yyyymmdd, yymmdd, mmddyyyy, mmddyy, etc. Every time a match to one of the seed values is found, the tool records it in a "hit list", or "raw map", of the seed values going from domain fields to legacy fields (table, record, record position) where the seed value was found. The raw map also contains the format of the legacy field where the data was stored.
  • SQL Structured Query Language
  • Data Locator 17 may directly ask legacy system CPU 5 where each seed value is stored to produce the raw map.
  • Data Locator 17 then creates a file, such as an ASCII text file, containing the raw map information.
  • user 1 may interact with a manual modification unit 37 to read in and modify the ASCII file.
  • If the legacy system employs a flat database structure, or does not support SQL, an exhaustive search of the storage device 13 is performed.
  • Data Locator 17 may also have to characterize fields or delimiters that are not contained in the domain description. For instance, if legacy system 10 uses variable length records with comma delimited fields, the extraction routines will have to know to look for the commas. Delimiters can be recognized by finding one character that is repeatedly adjacent to the seed values across all the seed sets. Another example is when the legacy system implements an array of values with a variable length and uses either an array delimiter or stores the array length in some other field. When this occurs, the array delimiter or the length field, stored at some other location on storage device 13 must first be retrieved in order to locate the variable length data array.
  • Data Locator 17 produces false hits when the chosen seed field values are not unique across the legacy data fields. For example, seed data that represent a person's sex and a person's marital status may both have seed values of "M": one to indicate that the person is male, the other to indicate that the person is married. Assuming the legacy data includes fields for both sex and marital status, Data Locator 17 will find a match to the value "M" in a number of the application fields.
  • a Map Refining device 20 is coupled to Data Locator 17 and receives the raw map.
  • Map Refining device 20 then culls out false hits from the raw map to produce a clean map of domain fields to legacy fields.
  • Map Refining device 20 may contain several functional units, some of which are required, and others which are optional, but add to the performance of the system.
  • a Set Consistency device 19 is coupled to the Data Locator 17 and receives the raw map. It analyses the raw map, checking consistency from one seed set to the next. There will be multiple seed sets for a given domain description. Set Consistency device 19 verifies that if domain fieldA in set1 is mapped to legacy fieldB, then domain fieldA in all the other sets also maps to legacy fieldB. Maps that are found to be inconsistent are discarded.
  • a Single Record Cull device 21 culls out single record maps where only one domain field maps to a given record in the legacy application. For example, if seed valueA maps to file1, recordC and no other seed value in that set maps to file1, recordC, then this mapping is discarded.
  • the domain fields may be iterative fields.
  • an Iteration Consistency device 23 receives the map after it has been processed by devices 19 and 21 and checks if all iterations of a given domain field map consistently to the same legacy field. It checks to determine if the legacy system is using an array structure to implement an iterative domain field, or has implemented different iterations of the domain field employing a fixed set of fields. For example, if domain fieldA has 10 iterations in the first seed set, and there is a map entry associating iteration 1 to legacy application fieldB, then there should be 9 other maps associating iterations 2 through 10 either all to legacy application fieldB or to 9 different legacy application fields.
  • Map Refining device 20 may employ a Ranking device 25 which receives the map which may have been processed by devices 19, 21, and 23 (if present), that ranks the maps by the number of fields which map to a common record of legacy storage device 13. The set of maps that map the largest set of domain fields to a given legacy record are given the highest ranking. For example:
  • the map associating sex to record 1 is given a higher ranking than the maps associating sex to records 2 and 3.
  • Map Refining device 20 may optionally also include a Multiple Record Search device 27, which receives the map which may have been processed by any of devices 19, 21, 23, 25, if present, and identifies if differing subsets of maps map to different records in the same file. When this occurs, Multiple Record Search device 27 infers that legacy system 10 is using different record layouts within the same file. In the case of variable length records, and variable length arrays with array delimiters, Map Refining device 20 can recognize that it cannot determine a consistent positioning of the domain fields and it will expand its analysis to look for non-domain fields that regularly delimit the seed values or consistently represent the number of iterations for a given domain field in each seed set. Fields that define array lengths will be fields in the legacy application that do not map to any domain field and consistently contain the number of iterations in the corresponding seed set.
  • After each of the above devices within Map Refining device 20 has performed its function, an ASCII or spreadsheet file of the processed maps that were not culled out could be created, and modified by user 1 with manual modification unit 37.
  • the final output of Map Refining device 20 is a "clean map". The user may optionally update the clean map with entries for the missing domain fields.
  • a Control Flow Analyzer 31 is coupled to the Map Refining device 20 and receives its output clean map. It employs the clean map file to identify legacy fields that can be used as keys into the legacy application files. Key fields are fields which are required to get other data, and must be retrieved first.
  • a relational database file is comprised of three tables, entitled “Personal", “Gov't Nos.”, and "Insurance". Knowing only the patients name, the Personal table will provide one with a Patient number which the Physician uses to identify this patient. Providing the Patient number to the second table, Gov't. Nos. one receives the Social Security number. Providing the Social Security Number to the third table, Insurance, a list of Insurance Companies, and the past charges to each are provided.
  • Control Flow Analyzer 31 schedules the order for reading data from the legacy fields. Control Flow Analyzer 31 produces control flow information describing the derived control flow as output. Control Flow Analyzer 31 may also produce an ASCII file as output for possible user modification.
  • control flow information may also be determined by this device and passed to Control Flow Analyzer 31.
  • Data Locator 17 may also determine control flow from the schema acquired from legacy system 10.
  • An Output Generator 33 receives the clean map from Map Refining device 20 indicating where domain fields are located within the legacy Data Storage device 13. It also receives the control flow information from Control Flow Analyzer 31 indicating the sequence of extracting data. It then scripts instruction steps to extract the data in the required order.
  • An Extraction device 35 is coupled to Output Generator
  • Extraction device 35 executes the scripts having the information as to which data to extract, where it is located and the order of data extraction. The extracted data is passed to a Transmission device 39 which reformats this data according to a predetermined user-selected format, or an industry standard, such as Electronic Data Interchange (EDI), for example as set forth in the publication "Medicare Part A Specifications for the ANSI ASC X12 835" dated July 1, 1993.
  • Manual modification unit 37 may be used to select a company to send the extracted data to. This company has a predetermined format which is prestored in Transmission device 39. Transmission device 39 then contacts that company (via conventional communications systems) and sends the data in their predefined format.
  • EDI Electronic Data Interchange

Abstract

The present invention analyses an existing legacy system, such as a Physician's Office Management System, and automatically extracts, reformats and sends required data to a service company, which may be, for example, an insurance company. It begins by feeding the legacy system 'seed data' and monitors the legacy system storage device to determine a 'raw map' of where and how the seed data is stored. It then culls out multiple records, single records, and inconsistent records from the raw map. Control flow information is also extracted. This indicates which fields are 'key' fields and are used to extract other information. These key fields must be acquired before their related data. After the data is located and the order is determined, a script is automatically created to extract the data. The data is then extracted and reformatted in a predetermined format determined by the service company, and the required data is automatically sent, by conventional means, to the service company.

Description

AUTOMATIC TRANSMISSION OF LEGACY SYSTEM DATA
This US Patent Application is being filed on an invention from research which was partially funded under a cooperative agreement with the US Federal Government National Institute of Standards and Technology (NIST) Advanced Technology Program, "Health Care Information Infrastructure Technology (HIIT)", contract number: 70NANB5H1011.
This application claims priority of a Provisional Application of the same name by Bruno, Kelliher, Ser. No. 60/029,558 filed Nov. 7, 1996.
BACKGROUND OF THE INVENTION
1. Scope of the Invention
The present invention relates to electronic data transmission, and more specifically to a system which automatically finds and transmits information to a service provider in one of a number of standard formats.
2. Related Prior Art
It is sometimes necessary to transmit information from an existing system, a legacy system, to other systems each having a required transmission format. This is the case in many areas but especially for Physician Office Management Systems (POMS) communicating information to insurance providers, such as Aetna, Blue Cross, Prudential etc. POMS are small computer systems used for record keeping in physicians' offices.
Typically, small interface programs are manually written for each of the different provider standards for the specific POMS being used.
Today there are three ways to integrate with a legacy application:
• write the interface code by hand,
• use a mapping tool, or
• use a screen scraping tool to help implement software for the integration.
Mapping tools
A mapping tool typically supports implementing an insertion or extraction routine from the legacy system. It does not help design the routines. The user of the mapping tool (typically a software engineer) supplies the mapping tool with a layout of the source data, a layout of the target data items and a mapping between the source and target data items. The mapping tool then implements this map.
Screen scraper
A screen scraper is a tool which monitors (and stores) inputs to the computer to determine order and type of screen inputs which are required by the legacy system.
The screen scraper is programmed to monitor a specific legacy system. To program a screen scraper, the user determines the sequence of screens (in the legacy system) needed to retrieve/insert the data from/to the legacy system. The user then runs the legacy application in conjunction with the screen scraper which monitors and identifies the screens that are to be scraped and tags fields in the screens that will be retrieved/filled by the screen scraper.
A user then writes a script (in the screen scraper's programming language) to start up the legacy application, maneuver from screen to screen and retrieve/insert data from/into the tagged fields in the legacy application screens.
After it has been fully programmed, the screen scraper can automatically run the legacy application entering values from an electronic source (such as a data file) into the appropriate user interface fields. It can also generate data files by automatically extracting data from the legacy system's screens.
Currently, there is a need for a system which automatically determines the disk format of the legacy data, and transmits the data required by a service provider to it in its required format.
SUMMARY OF THE INVENTION
The present invention works on existing legacy systems, such as a Physician's Office Management System (POMS), and automatically determines the data format of a storage device of the existing legacy system, extracts data required by a service company, such as an insurance company, and transmits the data in one of several predetermined industry standard formats.
A set of sample "seed data" is fed to the legacy system, either manually, or by an automated data feeder.
A Data Locator searches said legacy system storage device for locations (table, record, field position) of occurrences of the seed values. It makes a "raw map" indicating the seed values, their location, and the format of the legacy field where each was found.
A Map Refining device receives the raw map from the Data Locator, and culls out false hits from the raw map to produce a clean map.
A Control Flow Analyzer identifies key fields in the clean map, that is, fields which are used to acquire other data and must therefore be acquired first, and creates control flow information indicating the order of data extraction.
This control flow information and the clean map are passed to an Output Generator which scripts data extraction steps to extract the data in the required order from the legacy system storage device. An Extraction device executes the data extraction steps and extracts the data from the legacy storage device and passes it to a Transmission device which reformats this data according to a predetermined format, contacts the service company by conventional communications systems, and sends the data in the predefined format to the service company.
OBJECTS OF THE INVENTION
An object of the present invention is to provide a system that determines the data format of an existing system, extracts the data, and transmits the data in a required format.
It is another object of the present invention to provide an automatic interface to transmit data from an existing computer system to a remote computer system in one of several standard formats.
BRIEF DESCRIPTION OF THE DRAWINGS
The features of the invention believed to be novel are set forth with particularity in the appended claims. The invention itself, however, both as to organization and method of operation, together with further objects and advantages thereof, may be best understood by reference to the following description taken in conjunction with the accompanying drawing in which:
Figure 1 is a simplified block diagram of an embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
The invention described in this document works with an existing system, a "legacy system", and automatically discovers the disk data format, "legacy disk format", determines a map between the legacy data and a target description, which may be a standard format, or user-supplied, and implements this map by either: generating the source code to implement access to the data directly; or passing the information to a mapping tool, which transmits the legacy data to the service provider.
The present invention not only determines where the legacy values are stored in the legacy disk, but determines access order to key data fields that must be accessed first before other data is accessed.
In Fig. 1 a simplified block diagram of the present invention is shown. Many of the blocks shown are functional blocks and represent a specific function to be accomplished. They may be purely hardware, or software running on hardware, as long as they perform the intended function.
An existing legacy system 10 is shown which may be any type of computer management system, for example a Physician Office Management System (POMS).
Slightly differing subsets of legacy system information are intended to be regularly sent to existing service companies, "Serv. Co. 1", "Serv. Co. 2", "Serv. Co. 3", such as insurance companies, each of which has its own unique format, shown here as "f1", "f2", "f3", respectively, or in an industry standard format, "fstnd".
In one embodiment, a user 1 initially provides a data feeder 15 with a suggested list of the data fields to extract/insert, the list defining a record. The list of fields is a "domain description" for a record. The domain descriptor should be representative of the legacy system being used. For example, a physician's Office Management System would use fields having information as to a patient's name, social security number, office visit date, diagnosis, charges, etc.
User 1 also provides data feeder 15 with several sets of sample values for each of these fields, termed "seed data", representing a number of records. The seed data is selected to be as unique as possible so as not to be confused with existing information on the legacy system. For example, when putting in zip code information, if the system is on the west coast, the majority of existing zip code information will be west coast zip codes, therefore the seed data chosen will be east coast zip codes. The domain description does not contain any information about how the fields are stored in the legacy application.
Data feeder 15 inserts this information into a keyboard buffer 7 of legacy system 10.
In another embodiment, user 1 simply types the information directly into legacy system 10, and a monitor device 6 running a monitor program keeps track of keyboard input. Monitor device functions may be performed by a legacy system CPU 5, running in a multiprocessing mode. Legacy system 10 employs CPU 5 running a legacy application stored in a legacy application memory 8, which may or may not be part of a contiguous memory device also having a screen buffer 9 and keyboard buffer 7.
Seed data from monitor device 6 or data feeder 15 is passed to a Data Locator 17 coupled to a storage device 13 of legacy system 10.
1) Data Locator
Data Locator 17 performs an exhaustive search of legacy system storage device 13 looking for occurrences of the seed values. Data Locator 17 will use type information of the domain fields to recognize the data values that may be stored in a variety of formats. For example, if the domain field is a date type, then Data Locator 17 will look for matches in the form yyyymmdd, yymmdd, mmddyyyy, mmddyy, etc. Every time a match to one of the seed values is found, the tool records it in a "hit list", or "raw map", of the seed values going from domain fields to legacy fields (table, record, record position) where the seed value was found. The raw map also contains the format of the legacy field where the data was stored.
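The date-matching step described above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function names, the raw-map entry layout (field, format, byte offset) and the assumption that dates are stored as ASCII digits are all hypothetical choices made for illustration.

```python
from datetime import date

# Candidate layouts named in the text; the strftime patterns are our assumption
# about how each layout would be rendered as ASCII digits.
DATE_LAYOUTS = {
    "yyyymmdd": "%Y%m%d",
    "yymmdd": "%y%m%d",
    "mmddyyyy": "%m%d%Y",
    "mmddyy": "%m%d%y",
}


def date_variants(seed):
    """Render one seed date in every candidate layout as ASCII bytes."""
    return {name: seed.strftime(fmt).encode("ascii")
            for name, fmt in DATE_LAYOUTS.items()}


def locate(storage, field, seed):
    """Scan the whole storage image and return one raw-map entry per hit."""
    hits = []
    for layout, needle in date_variants(seed).items():
        start = storage.find(needle)
        while start != -1:
            hits.append({"field": field, "format": layout, "offset": start})
            start = storage.find(needle, start + 1)
    return hits
```

Note that a short layout such as yymmdd can match inside a longer stored value (for example inside a yyyymmdd field), so a search of this kind naturally yields false hits that a later refinement step has to remove.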
In legacy systems using relational database designs, a query in Structured Query Language (SQL) is first attempted to try to determine the schema, defining the field names, the field types, and the order of the fields. SQL is beginning to become a standard and is widely used. If the legacy system supports SQL, Data Locator 17 may directly ask legacy system CPU 5 where each seed value is stored to produce the raw map. Data Locator 17 then creates a file, such as an ASCII text file, containing the raw map information. In an optional embodiment, user 1 may interact with a manual modification unit 37 to read in and modify the ASCII file.
If the legacy system employs a flat database structure, or does not support SQL, an exhaustive search of the storage device 13 is performed.
Data Locator 17 may also have to characterize fields or delimiters that are not contained in the domain description. For instance, if legacy system 10 uses variable length records with comma delimited fields, the extraction routines will have to know to look for the commas. Delimiters can be recognized by finding one character that is repeatedly adjacent to the seed values across all the seed sets. Another example is when the legacy system implements an array of values with a variable length and uses either an array delimiter or stores the array length in some other field. When this occurs, the array delimiter or the length field, stored at some other location on storage device 13 must first be retrieved in order to locate the variable length data array.
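The delimiter heuristic above (one character repeatedly adjacent to the seed values) can be sketched as follows. This is a simplified illustration under stated assumptions: it looks only at the first occurrence of each seed value, treats a single byte as the delimiter, and the function name `guess_delimiter` is hypothetical.

```python
from collections import Counter
from typing import List, Optional


def guess_delimiter(storage: bytes, seed_values: List[bytes]) -> Optional[bytes]:
    """Return the byte that borders every located seed value, if one exists."""
    neighbours = Counter()
    found = 0
    for value in seed_values:
        pos = storage.find(value)
        if pos == -1:
            continue  # seed value not present in this storage image
        found += 1
        adjacent = set()
        if pos > 0:
            adjacent.add(storage[pos - 1:pos])      # byte just before the value
        end = pos + len(value)
        if end < len(storage):
            adjacent.add(storage[end:end + 1])      # byte just after the value
        neighbours.update(adjacent)                 # count each byte once per value
    if not found or not neighbours:
        return None
    byte, count = neighbours.most_common(1)[0]
    # Accept the candidate only if it touches every seed value that was found.
    return byte if count == found else None
```

A fuller version would check every occurrence and every seed set, as the text requires, rather than only the first hit per value.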
2^ Map Refining Device
Data Locator 17 produces false hits when seed field values that are not unique across the legacy data fields are chosen. For example, seed data that represent a person's sex and a person's marital status will both have seed values of "M" one to indicate that the person is a male, the other to indicate that the person is married. Assuming the legacy data includes fields for both sex and marital status, Data Locator 17 will find a match to the value "M" in a number of the application fields.
Therefore, a Map Refining device 20 is coupled to Data Locator 17 and receives the raw map. Map Refining device 20 then culls out false hits from the raw map to produce a clean map of domain fields to legacy fields.
Map Refining device 20 may contain several functional units, some of which are required, and others which are optional, but add to the performance of the system.
a) Set Consistency device
A Set Consistency device 19 is coupled to the Data Locator 17 and receives the raw map. It analyses the raw map, checking consistency from one seed set to the next. There will be multiple seed sets for a given domain description. Set Consistency device 19 verifies that if domain field A in set 1 is mapped to legacy fieldB, then domain field A for all the other sets must also map to legacy fieldB. Maps that are found to be inconsistent are discarded.
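One way to picture the set consistency check is as an intersection across seed sets: a legacy field survives for a domain field only if it was hit in every seed set. The sketch below assumes a hypothetical raw map representation (one dictionary per seed set, mapping each domain field to the set of legacy fields that matched) and at least one seed set; neither the representation nor the name `set_consistency` comes from this description.

```python
def set_consistency(raw_maps):
    """raw_maps: list (one item per seed set) of {domain_field: set(legacy_fields)}.
    Keep only the legacy fields that the domain field hit in every seed set."""
    consistent = {}
    for domain_field in raw_maps[0]:
        candidates = set(raw_maps[0][domain_field])
        for per_set in raw_maps[1:]:
            # A legacy field missing from any seed set is inconsistent.
            candidates &= per_set.get(domain_field, set())
        consistent[domain_field] = candidates
    return consistent


# Hypothetical raw maps: in seed set 1 the patient is male and married, so
# the seed value "M" for sex also hits the marital status field.
raw_maps = [
    {"sex": {"SEX", "MARITAL"}, "name": {"NAME"}},
    {"sex": {"SEX"}, "name": {"NAME"}},
]
clean = set_consistency(raw_maps)
```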
b) Single Record Cull Device
A Single Record Cull device 21 culls out single record maps where only one domain field maps to a given record in the legacy application. For example, if seed valueA maps to file1, recordC and no other seed value in that set maps to file1, recordC, then this mapping is discarded.
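The single record cull can be sketched as a count of distinct domain fields per legacy record within one seed set. The tuple layout (domain field, file, record) and the name `single_record_cull` are assumptions made for illustration.

```python
from collections import defaultdict


def single_record_cull(entries):
    """entries: (domain_field, file_name, record) hits for one seed set.
    Drop hits on records that only a single domain field maps to."""
    fields_per_record = defaultdict(set)
    for domain_field, file_name, record in entries:
        fields_per_record[(file_name, record)].add(domain_field)
    return [e for e in entries if len(fields_per_record[(e[1], e[2])]) > 1]


# Hypothetical hits: two domain fields share file1/record 7, while the
# hit on file2/record 3 stands alone and is discarded.
entries = [
    ("name", "file1", 7),
    ("sex", "file1", 7),
    ("sex", "file2", 3),
]
kept = single_record_cull(entries)
```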
c) Iteration Consistency device (optional)
The domain fields may be iterative fields. In an optional embodiment, an Iteration Consistency device 23 receives the map after it has been processed by devices 19 and 21 and checks if all iterations of a given domain field map consistently to the same legacy field. It checks to determine if the legacy system is using an array structure to implement an iterative domain field, or has implemented different iterations of the domain field employing a fixed set of fields. For example, if domain fieldA has 10 iterations in the first seed set, and there is a map entry associating iteration 1 to legacy application fieldB, then there should be 9 other maps associating all iterations 2 through 10 to either legacy application fieldB or associating all iterations 2 through 10 to 9 different legacy application fields.
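The two acceptable outcomes of the iteration check, one shared legacy field (an array) or one distinct legacy field per iteration (a fixed set of fields), might be distinguished as in the sketch below. The input layout and the name `classify_iterations` are hypothetical, and the fixed-field case is simplified to require exactly one candidate legacy field per iteration.

```python
def classify_iterations(iteration_hits):
    """iteration_hits: {iteration_number: set of legacy fields that matched}.
    Returns "array", "fixed_fields" or "inconsistent"."""
    hit_sets = list(iteration_hits.values())
    if not hit_sets or any(not hits for hits in hit_sets):
        return "inconsistent"  # some iteration has no map entry at all
    if set.intersection(*hit_sets):
        return "array"         # one legacy field holds every iteration
    # Simplification: a fixed-field layout is accepted only when every
    # iteration has exactly one candidate and all candidates differ.
    singles = [next(iter(hits)) for hits in hit_sets if len(hits) == 1]
    if len(singles) == len(hit_sets) and len(set(singles)) == len(singles):
        return "fixed_fields"
    return "inconsistent"


# Hypothetical diagnosis-code field with three iterations.
as_array = classify_iterations({1: {"DX"}, 2: {"DX"}, 3: {"DX"}})
as_fields = classify_iterations({1: {"DX1"}, 2: {"DX2"}, 3: {"DX3"}})
mixed = classify_iterations({1: {"DX1"}, 2: {"DX1"}, 3: {"DX3"}})
```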
d) Ranking device (optional)
The domain fields are initially grouped by user 1. In another optional embodiment, Map Refining device 20 may employ a Ranking device 25 which receives the map, which may have been processed by devices 19, 21, and 23 (if present), and ranks the maps by the number of fields which map to a common record of legacy storage device 13. The set of maps that map the largest set of domain fields to a given legacy record is given the highest ranking. For example:
Given domain fields name, date of birth, sex.
If the sex maps to records 1, 2 and 3 (possibly in different files) and the name and date of birth both map to record 1, but not 2 and 3, then the map associating sex to record 1 is given a higher ranking than the maps associating sex to records 2 and 3.
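The ranking example above can be reproduced with a short sketch. The (domain field, record) tuples and the name `rank_maps` are assumptions for illustration; the score of a map entry is simply the number of distinct domain fields that map to the same record.

```python
from collections import defaultdict


def rank_maps(entries):
    """entries: (domain_field, record) map entries.
    Order entries so that those pointing at the record shared by the most
    domain fields come first."""
    fields_per_record = defaultdict(set)
    for domain_field, record in entries:
        fields_per_record[record].add(domain_field)
    # sorted() is stable, so entries with equal scores keep their input order.
    return sorted(entries, key=lambda e: len(fields_per_record[e[1]]), reverse=True)


# Sex maps to records 1, 2 and 3; name and date of birth map to record 1 only.
entries = [
    ("sex", 1), ("sex", 2), ("sex", 3),
    ("name", 1), ("date_of_birth", 1),
]
ranked = rank_maps(entries)
```

With this input, the entry associating sex with record 1 is ranked ahead of the entries associating sex with records 2 and 3.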
e) Multiple Record Search device
Map Refining device 20 may optionally also include a Multiple Record Search device 27 that receives the map, which may have been processed by any of devices 19, 21, 23, 25, if present, and identifies if differing subsets of maps map to different records in the same file. When this occurs, Multiple Record Search device 27 infers that legacy system 10 is using different record layouts within the same file. In the case of variable length records, and variable length arrays with array delimiters, Map Refining device 20 can recognize that it cannot determine a consistent positioning of the domain fields and it will expand its analysis to look for non-domain fields that regularly delimit the seed values or consistently represent the number of iterations for a given domain field in each seed set. Fields that define array lengths will be fields in the legacy application that do not map to any domain field and consistently contain the number of iterations in the corresponding seed set.
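The search for array length fields might look like the sketch below, which assumes that each seed set's legacy record has already been decoded into a dictionary of legacy field values. That decoding step, the field names, and the function name `find_length_fields` are all hypothetical.

```python
def find_length_fields(records, iteration_counts, mapped_fields):
    """records: one {legacy_field: value} dict per seed set.
    iteration_counts: number of iterations of the domain field in each seed set.
    mapped_fields: legacy fields already mapped to some domain field.
    Returns unmapped legacy fields that always equal the iteration count."""
    candidates = set(records[0]) - set(mapped_fields)
    return sorted(
        field for field in candidates
        if all(rec.get(field) == count
               for rec, count in zip(records, iteration_counts))
    )


# Hypothetical records: NDX tracks the iteration count in both seed sets,
# VISITS matches only by coincidence in the first one.
records = [
    {"NAME": "SMITH", "NDX": 3, "VISITS": 3},
    {"NAME": "JONES", "NDX": 1, "VISITS": 4},
]
length_fields = find_length_fields(records, [3, 1], {"NAME"})
```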
After each of the above devices within Map Refining device 20 has performed its function, an ASCII or spreadsheet file of the processed maps that were not culled out could be created, and modified by user 1 with manual modification unit 37. The final output of Map Refining device 20 is a "clean map". The user may optionally update the clean map with entries for the missing domain fields.
3. Control Flow Analyzer
A Control Flow Analyzer 31 is coupled to the Map Refining device 20 and receives its output clean map. It employs the clean map file to identify legacy fields that can be used as keys into the legacy application files. Key fields are fields which are required to get other data, and must be retrieved first. For example, a relational database file is comprised of three tables, entitled "Personal", "Gov't Nos.", and "Insurance". Knowing only the patient's name, the Personal table will provide one with a Patient number which the Physician uses to identify this patient. Providing the Patient number to the second table, Gov't Nos., one receives the Social Security number. Providing the Social Security number to the third table, Insurance, a list of Insurance Companies, and the past charges to each, are provided. Therefore, information from the first two tables is needed before the charges may be obtained. Based on the key field designations, Control Flow Analyzer 31 schedules the order for reading data from the legacy fields. Control Flow Analyzer 31 produces control flow information describing the derived control flow as output. Control Flow Analyzer 31 may also produce an ASCII file as output for possible user modification.
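The scheduling in the three-table example amounts to ordering tables so that each key is available before the table that needs it is read. The sketch below swaps in a plain topological sort from Python's standard `graphlib` module (Python 3.9 or later) as one possible way to do this; the description does not say how Control Flow Analyzer 31 derives the order. The table layout, the field names, and the function name `read_order` are illustrative assumptions.

```python
from graphlib import TopologicalSorter


def read_order(tables, known_fields):
    """tables: {table_name: (key_field, [fields_provided])}.
    known_fields: fields available before any table is read.
    Returns table names in an order that respects key dependencies."""
    provider = {}
    for name, (_, provided) in tables.items():
        for field in provided:
            provider[field] = name
    graph = {}
    for name, (key, _) in tables.items():
        # A table depends on whichever table supplies its key, unless the
        # key is already known. An unknown, unprovided key raises KeyError.
        graph[name] = set() if key in known_fields else {provider[key]}
    return list(TopologicalSorter(graph).static_order())


tables = {
    "Insurance": ("ssn", ["insurer", "charges"]),
    "Gov't Nos.": ("patient_number", ["ssn"]),
    "Personal": ("name", ["patient_number"]),
}
order = read_order(tables, known_fields={"name"})
```

Starting from the patient's name alone, the derived order is Personal, then Gov't Nos., then Insurance, matching the sequence described above.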
If a Monitor device 6 is employed, control flow information may also be determined by this device and passed to Control Flow Analyzer 31.
If SQL is operational on legacy system 10, Data Locator 17 may also determine control flow from the schema acquired from legacy system 10.
4. Output Generator
An Output Generator 33 receives the clean map from Map Refining device 20 indicating where domain fields are located within the legacy Data Storage device 13. It also receives the control flow information from Control Flow Analyzer 31 indicating the sequence of extracting data. It then scripts instruction steps to extract the data in the required order.
1. It may script C++ routines to extract the data directly from the legacy system storage device 13.
2. Or it may create Mapping tool scripts to direct the mapping tool to extract the data.
5. Extraction Device
An Extraction device 35 is coupled to Output Generator 33, and to legacy disk storage device 13. It is capable of executing scripts created by Output Generator 33. Extraction device 35 executes the scripts, which specify which data to extract, where it is located, and the order of data extraction.
6. Transmission Device
The extracted data is passed to a Transmission device 39 which reformats this data according to a predetermined user-selected format, or an industry standard, such as Electronic Data Interchange (EDI), for example as set forth in the publication "Medicare Part A Specifications for the ANSI ASC X12 835" dated July 1, 1993. Manual modification unit 37 may be used to select a company to send the extracted data to. This company has a predetermined format which is prestored in Transmission device 39. Transmission device 39 then contacts that company (via conventional communications systems) and sends the data in its predefined format.
While this is described in terms of transmitting data from company to company, it may also be used to transmit data between departments within the same company.
While several presently preferred embodiments of the novel invention have been described in detail herein, many modifications and variations will now become apparent to those skilled in the art. It is, therefore, to be understood that the appended claims are intended to cover all such modifications and variations as fall within the true spirit of the invention.

Claims

WHAT IS CLAIMED IS:
1. A system for automatically determining the data format of a storage device of an existing legacy system, extracting the data, and transmitting the data in one of several predetermined formats required by a service company, comprising:
a) a data input means for feeding seed data to said legacy system;
b) a Data Locator coupled to said legacy storage device, for searching said legacy system storage device and for storing location (table, record, record position) of occurrences of the seed values in a raw map along with the seed values and the format of the legacy field;
c) a Map Refining device coupled to, and for receiving the raw map from, the Data Locator, for culling out false hits from the raw map to produce a clean map of domain fields to legacy fields;
d) a Control Flow Analyzer coupled to, and receiving the output clean map from, the Map Refining device, for identifying key fields in the clean map used to acquire other data which must be acquired first, and for creating control flow information indicating the order of data extraction;
e) an Output Generator coupled to the Map Refining device and the Control Flow Analyzer, for receiving the clean map and the control flow information and for creating data extraction steps to extract the data in the required order;
f) an Extraction device coupled to the Output Generator, and to the legacy storage device, for executing the data extraction steps to acquire extracted data;
g) a Transmission device coupled to the Extraction device for receiving the extracted data, for reformatting this data according to a predetermined format, for contacting a service company by conventional communications systems, and for sending the data in the predefined format.
2. The system of claim 1 wherein the Map Refining device comprises: a Set Consistency device coupled to the Data Locator which receives the raw map and analyses the raw map checking for consistency from one seed set to the next, discarding inconsistent maps.
3. The system of claim 1 wherein the Map Refining device comprises: a Single Record Cull device which receives the raw map and culls out single record maps where only one domain field maps to a given record in the legacy application.
4. The system of claim 1 wherein the Map Refining device comprises: an iteration consistency device which receives the raw map and checks if all iterations of a given domain field map consistently to the same legacy field, culling out those which do not.
5. The system of claim 1 wherein the Map Refining device comprises: a ranking device which receives the raw map and ranks the maps by the number of fields which map to a common record of legacy storage device, the set of maps that map the largest set of domain fields to a given legacy record are given the highest ranking.
6. The system of claim 1 wherein the Map Refining device comprises: a Multiple Record Search device which receives the raw map and identifies if differing subsets of maps map to different records in the same file indicating that said legacy system is using different record layouts within the same file.
7. The system of claim 1 wherein the Map Refining device is adapted to search for repeated characters as delimiters, and indicate if the records are variable length to the Output Generator.
PCT/US1997/018878 1996-11-07 1997-10-28 Automatic transmission of legacy system data WO1998020432A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CA002241514A CA2241514C (en) 1996-11-07 1997-10-28 Automatic transmission of legacy system data
JP10521427A JP2000505222A (en) 1996-11-07 1997-10-28 Automatic transmission of legacy system data
AT97946264T ATE519166T1 (en) 1996-11-07 1997-10-28 AUTOMATIC TRANSFER OF DATA FROM AN ADOPTED SYSTEM
EP97946264A EP0883848B1 (en) 1996-11-07 1997-10-28 Automatic transmission of legacy system data

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US2955896P 1996-11-07 1996-11-07
US60/029,558 1996-11-07
US08/772,634 1996-12-23
US08/772,634 US5857194A (en) 1996-11-07 1996-12-23 Automatic transmission of legacy system data

Publications (1)

Publication Number Publication Date
WO1998020432A1 true WO1998020432A1 (en) 1998-05-14

Family

ID=26705073

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1997/018878 WO1998020432A1 (en) 1996-11-07 1997-10-28 Automatic transmission of legacy system data

Country Status (6)

Country Link
US (1) US5857194A (en)
EP (1) EP0883848B1 (en)
JP (1) JP2000505222A (en)
AT (1) ATE519166T1 (en)
CA (1) CA2241514C (en)
WO (1) WO1998020432A1 (en)

Families Citing this family (41)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6502236B1 (en) * 1999-03-16 2002-12-31 Fujitsu Network Communications, Inc. Method and apparatus for automatic generation of programs for processing data units of multiple formats
US6836780B1 (en) 1999-09-01 2004-12-28 Jacada, Ltd. Method and system for accessing data in legacy applications
US6308178B1 (en) 1999-10-21 2001-10-23 Darc Corporation System for integrating data among heterogeneous systems
US10002167B2 (en) * 2000-02-25 2018-06-19 Vilox Technologies, Llc Search-on-the-fly/sort-on-the-fly by a search engine directed to a plurality of disparate data sources
US6687873B1 (en) 2000-03-09 2004-02-03 Electronic Data Systems Corporation Method and system for reporting XML data from a legacy computer system
US6993745B1 (en) * 2000-03-09 2006-01-31 Electronic Data Systems Corporation Method and system for modeling a legacy computer system
US7111233B1 (en) 2000-03-09 2006-09-19 Electronic Data Systems Corporation Method and system for applying XML schema
US7114147B2 (en) * 2000-03-09 2006-09-26 Electronic Data Systems Corporation Method and system for reporting XML data based on precomputed context and a document object model
US20020169715A1 (en) * 2000-08-10 2002-11-14 Ruth Robin C. System and method for administering a financial program involving the collection of payments
EP1328889A4 (en) * 2000-10-11 2005-06-01 Healthtrio Inc System for communication of health care data
US20020091818A1 (en) * 2001-01-05 2002-07-11 International Business Machines Corporation Technique and tools for high-level rule-based customizable data extraction
US6584407B2 (en) * 2001-01-10 2003-06-24 Halliburton Energy Services, Inc. Formation resistivity measurement method that eliminates effects of lateral tool motion
WO2002063470A1 (en) * 2001-02-07 2002-08-15 Orchid Systems, Inc. System for and method of learning and automatically correcting business logic errors
EP1368769A2 (en) * 2001-03-14 2003-12-10 AAA Edv Vertriebs AG Data processing device for the preparation of a goods catalogue in the form of a graphics file
US6782400B2 (en) * 2001-06-21 2004-08-24 International Business Machines Corporation Method and system for transferring data between server systems
US7089245B1 (en) * 2001-08-31 2006-08-08 Bmc Software, Inc. Service desk data transfer interface
US20030069758A1 (en) * 2001-10-10 2003-04-10 Anderson Laura M. System and method for use in providing a healthcare information database
US20030187849A1 (en) * 2002-03-19 2003-10-02 Ocwen Technology Xchange, Inc. Management and reporting system and process for use with multiple disparate data bases
KR100512758B1 (en) * 2002-12-06 2005-09-07 한국전자통신연구원 Method for architecture-based reengineering using design patterns
US7500003B2 (en) * 2002-12-26 2009-03-03 Ricoh Company, Ltd. Method and system for using vectors of data structures for extracting information from web pages of remotely monitored devices
US7300418B2 (en) * 2003-03-10 2007-11-27 Siemens Medical Solutions Health Services Corporation Healthcare system supporting multiple network connected fluid administration pumps
US7415484B1 (en) * 2003-05-09 2008-08-19 Vignette Corporation Method and system for modeling of system content for businesses
US7676486B1 (en) 2003-05-23 2010-03-09 Vignette Software Llc Method and system for migration of legacy data into a content management system
US9690577B1 (en) * 2004-02-09 2017-06-27 Akana, Inc. Legacy applications as web services
US20050192844A1 (en) * 2004-02-27 2005-09-01 Cardiac Pacemakers, Inc. Systems and methods for automatically collecting, formatting, and storing medical device data in a database
US20050192843A1 (en) * 2004-02-27 2005-09-01 Cardiac Pacemakers, Inc. Systems and methods for validating patient and medical devices information
US20050192649A1 (en) * 2004-02-27 2005-09-01 Cardiac Pacemakers, Inc. Systems and methods for providing variable medical information
US7574516B2 (en) 2005-02-01 2009-08-11 Microsoft Corporation Mechanisms for transferring raw data from one data structure to another representing the same item
US20080074276A1 (en) * 2006-09-25 2008-03-27 Usa As Represented By The Administator Of The National Aeronautics And Space Ad Data Acquisition System
US20080097952A1 (en) * 2006-10-05 2008-04-24 Integrated Informatics Inc. Extending emr - making patient data emrcentric
US8484626B2 (en) * 2007-09-28 2013-07-09 Verizon Patent And Licensing Inc. Generic XML screen scraping
US9501619B2 (en) 2008-11-13 2016-11-22 Cerner Innovation, Inc. Integrated medication and infusion monitoring system
US8869028B2 (en) * 2009-05-18 2014-10-21 Xerox Corporation Interface structures and associated method for automated mining of legacy systems using visual configuration tools
US20110071844A1 (en) 2009-09-22 2011-03-24 Cerner Innovation, Inc. Pharmacy infusion management
CN103477363B (en) 2011-04-12 2017-09-08 应用科学公司 For managing the system and method donated blood
CN102297687B (en) * 2011-05-13 2012-07-04 北京理工大学 Calibrating method for electronic compass
US20210327009A1 (en) * 2014-04-10 2021-10-21 School Innovations & Achievement, Inc. System and method for student attendance management
US11426498B2 (en) 2014-05-30 2022-08-30 Applied Science, Inc. Systems and methods for managing blood donations
US9507823B2 (en) * 2014-06-18 2016-11-29 Sap Se Automated metadata lookup for legacy systems
US20160004685A1 (en) * 2014-07-02 2016-01-07 IGATE Global Solutions Ltd. Insurance Data Archiving and Retrieval System
US9922037B2 (en) 2015-01-30 2018-03-20 Splunk Inc. Index time, delimiter based extractions and previewing for use in indexing

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0449494A2 (en) * 1990-03-27 1991-10-02 International Business Machines Corporation Method and apparatus for controlling the transfer of data between heterogeneous data base systems
EP0625756A1 (en) * 1993-05-20 1994-11-23 Hughes Aircraft Company Federated information management architecture and system
EP0634718A2 (en) * 1993-07-13 1995-01-18 International Computers Limited Computer systems integration

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5301105A (en) * 1991-04-08 1994-04-05 Desmond D. Cummings All care health management system
US5678044A (en) * 1995-06-02 1997-10-14 Electronic Data Systems Corporation System and method for improved rehosting of software systems
US5664109A (en) * 1995-06-07 1997-09-02 E-Systems, Inc. Method for extracting pre-defined data items from medical service records generated by health care providers
US5634053A (en) * 1995-08-29 1997-05-27 Hughes Aircraft Company Federated information management (FIM) system and method for providing data site filtering and translation for heterogeneous databases

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002077815A2 (en) * 2001-03-23 2002-10-03 Orchid Systems, Inc. System for and method of automatically migrating data among multiple legacy applications
WO2002077815A3 (en) * 2001-03-23 2004-01-08 Orchid Systems Inc System for and method of automatically migrating data among multiple legacy applications
EP1936516A1 (en) * 2006-12-22 2008-06-25 PRB S.r.l. Method to directly and automatically load data from documents and/or extract data to documents

Also Published As

Publication number Publication date
EP0883848A1 (en) 1998-12-16
EP0883848B1 (en) 2011-08-03
US5857194A (en) 1999-01-05
ATE519166T1 (en) 2011-08-15
JP2000505222A (en) 2000-04-25
CA2241514A1 (en) 1998-05-14
CA2241514C (en) 2004-10-12

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): CA JP

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE

ENP Entry into the national phase

Ref document number: 2241514

Country of ref document: CA

Ref country code: CA

Ref document number: 2241514

Kind code of ref document: A

Format of ref document f/p: F

WWE Wipo information: entry into national phase

Ref document number: 1997946264

Country of ref document: EP

121 Ep: the epo has been informed by wipo that ep was designated in this application
WWP Wipo information: published in national office

Ref document number: 1997946264

Country of ref document: EP