Sign in

Apparatus and method for performing data transformations in data warehousing

 M S Kiumarse Zamanian et al
A transformation description language (TDL) for specifying how data is to be manipulated in a data warehousing application. The TDL is comprised of a source for storing raw data, one or more transformation objects for processing the raw data according to predefined instructions, and a target for...
Inventors: M S Kiumarse Zamanian, Diaz Nesamoney
Assignee: Informatica Corporation
Primary Examiner: Sanjiv Shah

U.S. Classification
707/101; 707/100

International Classification
G06F 1730

View patent at USPTO

Citations

Patent NumberTitleIssue date
5692181System and method for generating reports from a computer databaseNov 25, 1997
5706495Encoded-vector indices for decision support and warehousingJan 6, 1998
5708828System for converting data from input data environment using first format to output data environment using second format by executing the associations between their fields Jan 13, 1998
5721903System and method for generating reports from a computer databaseFeb 24, 1998
5781911Integrated system and method of data warehousing and deliveryJul 14, 1998
5794228Database system with buffer manager providing per page native data compression and decompression Aug 11, 1998
5794229Database system with methodology for storing a database table by vertically partitioning all columns of the table Aug 11, 1998
5794246Method for incremental aggregation of dynamically increasing database data sets Aug 11, 1998
5826258Method and apparatus for structuring the querying and interpretation of semistructured information Oct 20, 1998
5832496System and method for performing intelligent analysis of a computer database Nov 3, 1998
5842213Method for modeling, storing, and transferring data in neutral formNov 24, 1998
5870746System and method for segmenting a database based upon data attributesFeb 9, 1999
5870747Generalized key indexesFeb 9, 1999
5873102Pluggable tablespaces on a transportable mediumFeb 16, 1999

Claims

What is claimed is:

1. A computer implemented method for transforming data in a data warehousing application, comprising the steps of:

specifying at least one source table containing data, wherein the source table includes at least a first source table output port;
storing metadata corresponding to a plurality of transformation objects, wherein the transformation objects have at least one transformation object input port for accepting data and at least one transformation object output port for outputting transformed data and particular transformation objects transform data according to the metadata corresponding to that particular transformation object;
specifying a target table for storing manipulated data, wherein the target table includes at least a first target table input port;
selecting at least one of the transformation objects;
mapping data from the first source table output port to a first transformation object input port of a first selected transformation object, wherein the mapping is defined by a human user;
transforming the data according to the metadata corresponding to the first selected transformation object;
mapping the transformed data from a first transformation object output port to the first target table input port.

2. The computer implemented method of claim 1, further comprising the steps of:

selecting a second transformation object having at least a second transformation object input port for accepting data and at least a second transformation object output port for outputting data, wherein the second selected transformation object manipulates input data according to metadata corresponding to the second selected transformation object;
mapping data from the first source table output port to the second transformation object input port;
mapping transformed data from the second transformation object output port to the first target table input port.

3. The computer implemented method of claim 1, further comprising the steps of:

specifying a second source table containing data, wherein the second source table includes at least a second source table output port;
mapping data from the second source table output port to the first transformation object input port.

4. The computer implemented method of claim 1, further comprising the steps of:

specifying a second target table for storing manipulated data, wherein the second target table includes at least a second target table input port;
mapping the manipulated data from the first transformation object output port to the second target table input port.

5. The computer implemented method of claim 1, wherein the first source table output port, the first transformation object input port, the first transformation object output port, and the first target table input port are comprised of a name, a data type, and a flow type.

6. The computer implemented method of claim 1, wherein the first output port, the first input port, the second output port, the second input port, the third output port, and the third input port are further comprised of a precision value and a scale value.

7. The computer implemented method of claim 1, wherein the first source table output port, the first transformation object input port, the first transformation object output port, and the first target table input port include attributes and specific behavior defining how the first set of data and the second set of data are to be manipulated.

8. The computer implemented method of claim 1, wherein the first transformation object is comprised of an aggregator which performs an aggregation function.

9. The computer implemented method of claim 1, wherein the first transformation object is comprised of an expression which associates an expression statement with either the first input port or the second output port.

10. The computer implemented method of claim 1, wherein the first transformation object is comprised of a filter which applies an expression to either the first input port or the second output port according to an evaluation.

11. The computer implemented method of claim 1, wherein the first transformation object is comprised of a lookup, wherein either the first input port or the second output port is translated into another port based on predefined fields that are specified in a relational table in the target database.

12. The computer implemented method of claim 1, wherein the first transformation object is comprised of a query for tailoring input data for mapping from the source table using SQL expressions.

13. The computer implemented method of claim 1, wherein the first transformation object is comprised of a sequence for creating unique keys for records.

14. The computer implemented method of claim 1, wherein the first transformation object is comprised of a stored procedure for executing a parametized function.

15. The computer implemented method of claim 1, wherein the first transformation object is comprised of an update for specifying how rows are used to update the target table.

16. The computer implemented method of claim 1, wherein the behavior corresponding to the first transformation object is different from the behavior corresponding to the second transformation object.

17. The computer implemented method of claim 1 further comprising the step of defining an application specific behavior for the first transformation object and for the second transformation object.

18. A computer implemented method for transforming data in a data mart application, comprising the steps of:

identifying a set of source tables containing data, wherein each source table has at least one source output port;
storing a set of transformation objects, wherein each transformation ofject has at least one transformation object input port and at least one transformation object output port and each transformation object has corresponding metadata which defines how data is to be transformed for each of the transformation objects;
identifying a set of target tables for storing transformed data, wherein the target tables include at least one target table input port;
selecting one or more source tables, transformation objects, and target tables for constructing a mapping;
mapping data from one or more selected source table ouput ports to one or more of the transformation object input ports;
mapping transformed data from one or more selected transformation object output ports to one or more of a different transformation object input port;
mapping transformed data from one or more selected transformation object output ports to one or more target table input ports.

19. The computer implemented method of claim 18, wherein the ports are comprised of a name, a data file, and a flow type.

20. The computer implemented method of claim 18, wherein one of the transformation objects is comprised of an aggregator which performs an aggregation function.

21. The computer implemented method of claim 18, wherein one of the transformation objects is comprised of an expression which associates an expression statement with either the first input port or the second output port.

22. The computer implemented method of claim 18, wherein one of the transformation objects is comprised of a filter which applies an expression to either the first input port or the second output port according to an evaluation.

23. The computer implemented method of claim 18, wherein one of the transformation objects is comprised of a lookup, wherein either the first input port or the second output port is translated into another port based on redefined fields that are specified in a relational table in the target.

24. The computer implemented method of claim 18, wherein one of the transformation objects is comprised of a query for tailoring input data for mapping from the source table using SQL expressions.

25. The computer implemented method of claim 18, wherein one of the transformation objects is comprised of a sequence for creating unique keys for records.

26. The computer implemented method of claim 18, wherein one of the transformation objects is comprised of a stored procedure for executing a parametized function.

27. The computer implemented method of claim 18, wherein one of the transformation objects is comprised of an update for specifying how rows are used to update the target table.

28. The computer implemented method of claim 18, wherein the behavior corresponding to a first transformation object is different from the behavior corresponding to a second transformation object.

29. The computer implemented method of claim 18 further comprising the step of defining an application specific behavior for a first transformation object in a first application and a different application specific behavior for the first transformation object in a second application.

30. A computer-readable medium having stored thereon instructions for causing a computer to modify data comprising:

a source for storing untransformed data, wherein the source includes an output port for outputting the untransformed data;
a plurality of transformation objects for processing the untransformed data according to predefined instructions, wherein the plurality of transformation objects include at least one input port for accepting data and at least one output port for outputting processed data and wherein each of the transformation objects have a predefined set of rules for modifying data;
a target for storing modified data, wherein the target includes at least one input port for accepting modified data;
a mapping for directing a flow of data between ports corresponding to the source, one or more transformation objects which were selected from the plurality of transformation objects, and the target.

31. The computer-readable medium of claim 30, wherein a source output port is capable of being mapped to an input port of a first transformation object and an output port of the first transformation object is capable of being mapped to an input port of a first target.

32. The computer-readable medium of claim 31, wherein the source output port is capable of being mapped to an input port of a second transformation object and an output port of the second transformation object is capable of being mapped to the input port of the first target.

33. The computer-readable medium of claim 31, wherein a second source output port of a second source is capable of being mapped to the input port of the first transformation object.

34. The computer-readable medium of claim 31, wherein the output port of the first transformation object is capable of being mapped to an input port of a second target.

35. The computer-readable medium of claim 31, wherein the first transformation object is capable of being mapped to accept inputs from a plurality of source tables or a plurality of previous transformation objects.

36. The computer-readable medium of claim 30, wherein the ports are comprised of a name, a data type, and a flow type.

37. The computer computer-readable medium of claim 36, wherein the ports are further comprised of a precision value and a scale value.

38. The computer-readable medium of claim 30, wherein the transformation objects include attributes and specific behavior defining how the data is to be manipulated.

39. The computer-readable medium of claim 30, wherein the plurality of transformation objects are comprised of one or more of the following: an aggregator transformation, an expression transformation, a filter transformation, a lookup transformation, a query transformation, a sequence transformation, a stored procedure transformation, and an update strategy transformation.

40. The computer-readable medium of claim 30, wherein the aggregator transformation performs aggregation functions over one or more ports associated with the aggregator transformation and contains an attribute specifying an associated cache directory location.

41. The computer-readable medium of claim 30, wherein the filter transformation applies an expression to all associated ports if a result of an evaluation is true.

42. The computer-readable medium of claim 30, wherein the lookup transformation translates or a combination of associated ports into other ports based on predefined field.

43. The computer-readable medium of claim 30, wherein the query transformation modifies input data for a mapping from one or more sources using SQL expressions.

44. The computer-readable medium of claim 30, wherein the sequence transformation creates unique keys for records as the records are processed in a mapping.

45. The computer-readable medium of claim 44, wherein each instance of the sequence transformation is created with an initial value which is used at the start of an execution, an increment value to compute values of subsequent indexes, and an end value.

46. The computer-readable medium of claim 45, wherein the sequence transformation further includes two redefined output ports that contain the current value and a next value of a sequence index.

47. The computer-readable medium of claim 30, wherein the stored procedure transformation executes a parameterized function either in a pipeline mode or a stand-alone mode within the expression, lookup, or user-defined transformations.

48. The computer-readable medium of claim 30, wherein the update strategy transformation specifies how each individual row, after the row is processed by a mapping, is used to update the target.

49. The computer-readable medium of claim 48, wherein the update includes insert, delete, and override functions.

50. The computer-readable medium of claim 30, wherein the mapping is comprised of a combination of source, transformation, and target objects that are linked corresponding to a directed, acyclic graph.

51. The computer-readable medium of claim 50, wherein the mapping is further comprised of connectivity data for ports of connected objects and an order that the objects are connected.

Drawings