A Data Processing Method and System
Description
Field of the invention
The present invention relates to the field of data processing, and more particularly to data models and query processing.
Background and prior art
Data models such as entity-relationship models are commonly used for database design. A data model is a conceptual description of data objects, i.e. entity types, their attributes, and the relationships between them. There are different types of data models, such as the relational data model, depending on the data structures to be defined.
ACM Transactions on Database Systems, Volume 1, No. 1, March 1976, Peter Pin-Shan Chen "The Entity-Relationship Model-Toward a Unified View of Data", pages 9-36 shows a data model. This model incorporates some of the important semantic information about the real world serving as a tool for database design.
From U.S. Pat. No. 4,479,196 to Ferrer et al. a diagrammatic technique to represent an entity relationship model is known for usage in a database system. This technique is directed to represent databases in a form which is readily processed and efficiently utilized by digital computers.
Large applications can be based on very complex data models. Searches are usually performed by using structured query language (SQL) expressions. For more information on the use of structured query language see "A Guide to SQL", Philip J. Pratt, Boyd & Fraser Pub Co, February 1995, ISBN: 0877095205. Search expressions may be quite complex and their results may be relied upon to produce appropriate reports within the database application.
The data of a data model can be stored on a single data processing system or it can be stored on various distributed internal and / or external data sources. A disadvantage of distributed data processing relying on various kinds of internal and / or external data sources is that each individual data source may require a dedicated interface. This makes prior art distributed data processing systems difficult and costly to implement. A further disadvantage is a lack of flexibility and high maintenance costs.
Summary of the invention
The present invention provides for a data processing method which relies on a data model having a set of entity types and a set of attributes for each entity type. The data model can be adapted to a customer's needs by customization. The customization is stored by means of customizing data which indicates the internal and / or external data sources for the attributes and the data structures provided by these internal and / or external data sources.
When a query is processed, it is first determined whether the query can be performed by using a single data source. If such a data source is not available, a set of data sources is determined. The combination of the data sources contains the information requested by the query. For processing the query, it is split up into a number of sub-queries, and the results of the sub-queries are combined to provide the query result.
It is a particular advantage of the present invention that this approach is generic and can be used by various applications. Further, it provides flexibility as far as the customization and the data sources are concerned.
Brief description of the drawings
In the following, preferred embodiments of the invention will be described in greater detail by making reference to the drawings, in which:
Figure 1 is a block diagram of a first embodiment of a data processing system of the invention,
Figure 2 is a block diagram of a more detailed embodiment of a data processing system of the present invention,
Figure 3 is illustrative of an embodiment of a method of the present invention.
Detailed description
Figure 1 shows data processing system 100 having a customized data model 102. Customized data model 102 is based on generic data model 104, which encompasses a number of entity types, i.e. entity type 1, entity type 2, ... entity type i, ... Each one of the entity types has a set of attributes. For example, entity type i has attribute 1, attribute 2, ... attribute j, ...
Customized data model 102 has customizing data 106. Customizing data 106 has table 108 for identification of data sources for attributes. In the example considered here, table 108 has one row for each entity type. Each data field of the row indicates a data source for one of the attributes of the particular entity type. For example, row i of table 108 contains entries for entity type i, i.e. the data sources for the attributes of entity type i. As indicated in Figure 1, data source k is the data source for attribute j of entity type i.
Further, customizing data 106 has data source descriptor 110, which describes the data structures being provided by the data sources of table 108.
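By way of illustration only, customizing data 106 could be held in memory as sketched below. The class name `CustomizingData`, its field and method names, and the placeholder identifiers such as `entity_type_i` and `data_source_k` are assumptions made for this sketch and are not prescribed by the embodiment.

```python
from dataclasses import dataclass, field


@dataclass
class CustomizingData:
    """Hypothetical in-memory form of customizing data 106."""

    # Table 108: one row per entity type; each field of the row maps an
    # attribute of that entity type to the identifier of its data source.
    source_table: dict[str, dict[str, str]] = field(default_factory=dict)
    # Data source descriptor 110: for each data source, the columns
    # (data structure) that the data source provides.
    descriptor: dict[str, list[str]] = field(default_factory=dict)

    def source_for(self, entity_type: str, attribute: str) -> str:
        """Look up the data source entered in table 108 for an attribute."""
        return self.source_table[entity_type][attribute]

    def columns_of(self, source_id: str) -> list[str]:
        """Look up the data structure described for a data source."""
        return self.descriptor[source_id]


# Placeholder entries mirroring Figure 1: data source k serves attribute j
# of entity type i.
customizing = CustomizingData(
    source_table={"entity_type_i": {"attribute_j": "data_source_k"}},
    descriptor={"data_source_k": ["key", "attribute_j"]},
)
```

Adding or replacing a data source then amounts to changing entries in the two mappings, without touching the application programs.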
Data processing system 100 has a number of application programs 112, 114, 116... which are used for various data processing purposes of the data contained in the customized data model 102. Program 118 serves as an interface between the application programs 112, 114, 116, ... and customized data model 102. Program 118 has query processing module 120 and result processing module 122.
Data processing system 100 has a number of internal data sources, i.e. data source 124 and is coupled to external data sources 126, 128, ... via computer network 130, such as the Internet.
In operation, one of the application programs, such as application program 116, sends query 132 to program 118. Query 132 specifies at least one entity type and one attribute of the specified entity type of data model 104. For example, query 132 is issued by application program 116 in order to obtain attribute data for attribute j of entity type i.
Query 132 is processed by query processing module 120. Query processing module 120 determines from customizing data 106 the data sources, and their data structures, which are available for providing the requested data. If there is a single database which can provide the requested data, program 118 forwards query 132 to that database.
If such a database is not available query processing module 120 determines a set of data sources which in combination contain the information requested by query 132. In this case query processing module 120 generates a sub-query for each one of the data sources which in combination contain the requested data.
For example, internal data source 124 and external data source 126 have been identified to contain in combination the information requested by query 132. In this instance, sub-query 134 and sub-query 136 are generated. Sub-query 134 is entered into internal data source 124, which in response provides tabular data 138. Likewise, sub-query 136 is sent over computer network 130 to data source 126, such as by means of an HTTP request. In response, data source 126 provides tabular data 140, which is transmitted over computer network 130 to data processing system 100, i.e. to program 118.
Result processing module 122 combines the information contained in tabular data 138 and 140 in order to provide the tabular data 142 containing the information requested by query 132.
It is a particular advantage of data processing system 100 that program 118 can be used as an interface for various application programs 112, 114, 116, ... These application programs do not need to have knowledge as to the location of data sources for attribute data and the data structures provided by these data sources. This information is encapsulated in customized data model 102 and is relied upon by program 118 for query processing. This generic approach enables an efficient administration of data processing system 100 as well as an efficient change management. For example, data sources can be added or replaced by making corresponding entries into table 108 and into data source descriptor 110.
As an alternative to fixed assignments of data sources to attributes a rule base can be utilised. In the rule base a set of rules specifies the assignments of attributes of entity types to data sources.
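A rule base of this kind might, for instance, be sketched as an ordered list of rules in which the first matching rule determines the data source. The rule representation, the function `resolve_source`, the first-match policy, and the source identifiers are assumptions of this sketch; the embodiment does not specify how rules are expressed or evaluated.

```python
from typing import Callable, Optional

# A rule pairs a predicate over (entity type, attribute) with the identifier
# of the data source to be used when the predicate holds.
Rule = tuple[Callable[[str, str], bool], str]


def resolve_source(rules: list[Rule], entity_type: str, attribute: str) -> Optional[str]:
    """Return the data source of the first matching rule, or None."""
    for predicate, source_id in rules:
        if predicate(entity_type, attribute):
            return source_id
    return None


# Hypothetical rules: addresses of companies come from an internal source,
# all other company attributes from an external source.
rules: list[Rule] = [
    (lambda e, a: e == "company" and a == "address", "internal_source"),
    (lambda e, a: e == "company", "external_source"),
]
```

With these rules, `resolve_source(rules, "company", "address")` yields the internal source, while any other attribute of 'company' resolves to the external source.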
Figure 2 shows a more detailed embodiment. Elements of the embodiment of Figure 2 which correspond to elements of the embodiment of Figure 1 are designated by like reference numerals having added 100.
In the embodiment of Figure 2, generic data model 204 has an entity type 'company' containing company related data. These data are the attributes 'address', Dun and Bradstreet number 'DUNS', and 'tax number'.
Customizing data 206 has a table 208 for identifying the data sources for the attributes of entity type 'company' and of other entity types which are contained in data model 204 but not shown in Figure 2 for convenience of explanation.
Data source k = 1, i.e. internal data source 224, is entered in table 208 as the data source for attribute 1, i.e. attribute 'address' of entity type 'company'. Data source k = 2, i.e. external data source 226, is entered in table 208 as the data source for attribute 2, i.e. DUNS, of entity type 'company'.
Data source descriptor 210 has an entry for each one of the data sources and describes the data structure of each data source. In the example considered here, data source k = 1 has a column for the DUNS number and a column for the address of the company having that DUNS number. Data source k = 2 has one column for the company names and another column for the DUNS numbers related to the company names. The corresponding database table 244 of data source 224 (k = 1) and database table 246 of data source 226 (k = 2) are shown in Figure 2.
Application program 216 issues query 232. Query 232 is a request for address data of all companies. When query 232 is received by program 218, query processing module 220 checks customizing data 206 for the availability of a data source containing the addresses for all companies. As such a data structure is not available in accordance with data source descriptor 210, query processing module 220 needs to determine a set of data sources which in combination contain the information requested in query 232. In the example considered here, this set of data sources consists of data sources k = 1 and k = 2.
Next query processing module 220 generates sub-queries 234 and 236 for each one of the data sources of the identified set of data sources.
Sub-query 236 is directed towards obtaining the DUNS numbers for all companies. The corresponding tabular data 240 is received from data source 226 via computer network 230. The DUNS numbers contained in tabular data 240 are used by query processing module 220 to generate sub-query 234 requesting the addresses of the companies having the DUNS numbers received by means of tabular data 240. In response, data source 224 provides tabular data 238 containing the addresses of the companies having the DUNS numbers of sub-query 234.
Alternatively, sub-query 234 is directed towards obtaining all addresses associated with DUNS numbers in data source 224.
Result processing module 222 combines tabular data 240 and 238 in order to provide tabular data 242 relating company names to company addresses. This combination is performed by using the DUNS numbers which unequivocally identify the companies as a link between tabular data 240 and tabular data 238.
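The exchange of sub-queries 236 and 234 and the combination performed by result processing module 222 can be sketched as follows. The function names, the representation of tables as lists of rows, and all sample values (company names, DUNS numbers, addresses) are invented for illustration; real data sources would be queried through their own interfaces.

```python
# Hypothetical contents of database table 246 (external data source k = 2):
# company name and DUNS number.
table_246 = [
    {"name": "Acme Corp", "duns": "000000001"},
    {"name": "Globex Inc", "duns": "000000002"},
]
# Hypothetical contents of database table 244 (internal data source k = 1):
# DUNS number and address.
table_244 = [
    {"duns": "000000001", "address": "1 Main Street"},
    {"duns": "000000002", "address": "2 Side Road"},
    {"duns": "000000003", "address": "3 Other Lane"},
]


def sub_query_236(source):
    """Obtain names and DUNS numbers of all companies (tabular data 240)."""
    return [{"name": row["name"], "duns": row["duns"]} for row in source]


def sub_query_234(source, duns_numbers):
    """Obtain the addresses for the given DUNS numbers (tabular data 238)."""
    wanted = set(duns_numbers)
    return [row for row in source if row["duns"] in wanted]


def combine(data_240, data_238):
    """Link names to addresses via the DUNS number (tabular data 242)."""
    address_by_duns = {row["duns"]: row["address"] for row in data_238}
    return [
        {"name": row["name"], "address": address_by_duns[row["duns"]]}
        for row in data_240
        if row["duns"] in address_by_duns
    ]


data_240 = sub_query_236(table_246)
data_238 = sub_query_234(table_244, [row["duns"] for row in data_240])
data_242 = combine(data_240, data_238)
```

In this sketch the third row of table 244 has no counterpart in table 246 and therefore does not appear in the combined result.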
Figure 3 is a corresponding flow chart. In step 300 a query is entered. The query specifies at least one attribute for at least one entity type. In other words, the query is directed towards obtaining certain attribute data for a certain entity type as defined in the customized data model.
In step 302 it is determined whether there is a single data source which is able to provide the entity type / attribute information. If this is the case the query is forwarded to that data source and the query is performed in step 304.
If the contrary is the case a sub-set of the available set of data sources is determined in step 306 which in combination contain the information requested in the query. In step 308 sub-queries are generated for each data source of the sub-set. In step 310 the sub-queries are performed and the sub-query results are provided to the common interface program where the sub-query results are combined in step 312. This yields the query result which is returned to the requesting application program.
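Steps 302 to 312 can be sketched as a single function under simplifying assumptions: each data source is described by the list of columns it provides, every data source used for sub-queries carries a common linking column `key`, and a caller-supplied function `fetch` executes a query against one data source. The names `process_query`, `fetch`, `descriptor`, and `tables`, the simple in-order selection of data sources, and the sample data are all assumptions of this sketch rather than features of the described method.

```python
def process_query(attributes, descriptor, fetch, key):
    """Answer a query for `attributes` using one or several data sources."""
    requested = set(attributes)

    # Step 302: is there a single data source providing everything?
    for source_id, columns in descriptor.items():
        if requested <= set(columns):
            # Step 304: forward the query to that data source.
            return fetch(source_id, list(attributes))

    # Step 306: determine a sub-set of data sources which in combination
    # provide the requested attributes (simple in-order selection).
    missing = set(requested)
    subset = []
    for source_id, columns in descriptor.items():
        provided = missing & set(columns)
        if provided and key in columns:
            subset.append((source_id, sorted(provided)))
            missing -= provided
    if missing:
        raise LookupError(f"no data source provides: {sorted(missing)}")

    # Step 308: generate one sub-query per data source of the sub-set.
    sub_queries = [
        (source_id, [key] + [c for c in cols if c != key])
        for source_id, cols in subset
    ]

    # Step 310: perform the sub-queries.
    results = [fetch(source_id, cols) for source_id, cols in sub_queries]

    # Step 312: combine the sub-query results using the linking column.
    combined = {row[key]: dict(row) for row in results[0]}
    for rows in results[1:]:
        by_key = {row[key]: row for row in rows}
        combined = {
            k: {**row, **by_key[k]} for k, row in combined.items() if k in by_key
        }
    return [{a: row[a] for a in attributes} for row in combined.values()]


# Hypothetical in-memory data sources standing in for real databases.
tables = {
    "external": [{"name": "Acme Corp", "duns": "000000001"}],
    "internal": [{"duns": "000000001", "address": "1 Main Street"}],
}
descriptor = {source_id: list(rows[0]) for source_id, rows in tables.items()}


def fetch(source_id, columns):
    """Return the requested columns of every row of one data source."""
    return [{c: row[c] for c in columns} for row in tables[source_id]]


result = process_query(["name", "address"], descriptor, fetch, key="duns")
```

A query for `["duns", "address"]` would be answered by the internal data source alone in step 304, whereas the query shown requires both data sources and the combination of step 312.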
It is important to note that the same principles as described above for a query can also be used for writing of data. Further, the term 'attribute' as used herein can also encompass the entity type, i.e. the identifier of an entity type.
List of Reference Numerals
100 data processing system
102 customized data model
104 generic data model
106 customizing data
108 table
110 data source descriptor
112 application program
114 application program
116 application program
118 program
120 query processing module
122 result processing module
124 data source
126 data source
128 data source
130 computer network
132 query
134 sub-query
136 sub-query
138 tabular data
140 tabular data
142 tabular data
200 data processing system
202 customized data model
204 generic data model
206 customizing data
208 table
210 data source descriptor
212 application program
214 application program
216 application program
218 program
220 query processing module
222 result processing module
224 data source
226 data source
228 data source
230 computer network
232 query
234 sub-query
236 sub-query
238 tabular data
240 tabular data
242 tabular data
244 database table
246 database table