Publication number: US20080155500 A1
Publication type: Application
Application number: US 11/615,097
Publication date: Jun 26, 2008
Filing date: Dec 22, 2006
Priority date: Dec 22, 2006
Also published as: US8286146, US20080229279
Inventors: Michael Richmond
Original Assignee: International Business Machines Corporation
Method for automatic generation of schema mapping application code
Abstract
A method for automatic generation of schema mapping application code. The method includes loading a code generation tool with source and target schemas. The method further includes defining the mapping specification between the source and the target. The method proceeds by triggering the code generation tool. Afterwards, the method proceeds by compiling the generated code. Furthermore, the method includes executing the generated code to transform input data files.
Claims(2)
1-12. (canceled)
13. A method for automatic generation of schema mapping application code, including:
loading a code generation tool with source and target schemas;
defining the mapping specification between the source and the target;
triggering the code generation tool;
compiling the generated code; and
executing the generated code to transform input data files;
wherein the code generation tool is triggered by selecting a code generation menu item;
wherein when the code generation menu item is selected a code generation options dialog is displayed for allowing the user to control aspects of the code generation process;
wherein the user is allowed to specify the file system directory where generated files will be saved;
wherein the user is allowed to turn off generation of particular application implementation files;
wherein the user is allowed to customize the generated application by at least one of, (a) hand, and by (b) incorporating the generated code into the existing application of the user;
wherein the code generation tool is configured to generate at least one of, (i) a stand-alone application, (ii) a web service implementation, and (iii) a software component, for performing the specified map transformation from input data documents to produce the output data documents;
wherein the application includes a plurality of static library classes and a plurality of dynamic classes, the implementation of the static library classes being fixed and not dependent upon either schema nor map specification for any particular generated application, the implementation and quantity of the dynamic classes being dependent upon the schemas and the particular map specification that the code generation tool will generate at least one of, (i) a custom application to implement, (ii) a web service implementation, and (iii) a software component;
wherein the dynamic classes are separated into two groups, (a) the first group termed the singleton dynamic classes, the first group being dynamic classes for which the code generation tool is invoked once to produce a single version of these classes and, (b) the second group termed the map-specific dynamic classes, the second group being the dynamic classes for which the code generation tool is invoked multiple times to produce various versions of these classes;
wherein the code generation tool is configured to read in the source and target schemas along with the mapping specification and construct data structures in memory to represent the schemas and the mapping specification;
wherein the code generation process is coordinated by a generation coordinator object (GCO), the GCO handles the generation of the application code in two phases, (i) extraction of data to produce generation arguments, and (ii) invocation of individual code generators;
wherein one code generator exists for each type of class that is generated by the code generation tool.
Description
    TRADEMARKS
  • [0001]
    IBM® is a registered trademark of International Business Machines Corporation, Armonk, N.Y., U.S.A. Other names used herein may be registered trademarks, trademarks or product names of International Business Machines Corporation or other companies.
  • BACKGROUND OF THE INVENTION
  • [0002]
    1. Field of Invention
  • [0003]
    This invention relates in general to information systems, and more particularly to data exchange and data storage among information systems.
  • [0004]
    2. Description of Background
  • [0005]
    Modern information systems rely heavily on both data exchange and data storage. Data exchange enables interaction between different components in an information system. Additionally, data exchange makes it possible for an information system to interact with other information systems. Data exchange between information systems is used to achieve interoperability and integration of disparate information systems, which may exist in disjoint administrative and organizational domains and is a key feature of current enterprise systems.
  • [0006]
    Data storage is used extensively to handle the various data used by information systems. Information systems are increasingly attempting to share common data storage pools across organizations. In some cases data stores are being shared between organizations to support joint enterprise systems. Data storage is commonly used to integrate data from disparate systems to present a unified view of data that may originate from varying sources.
  • [0007]
    In order for data exchange and data storage to function, all parties involved must agree on a common format and structure before direct data exchange or sharing via a data store can be accomplished. This format and structure information is known as the data schema. With both data exchange technology and data storage technology, all data to be exchanged or stored must conform to a well-defined data schema in order for the information system to interpret the data.
  • [0008]
    In practice, data schemas are defined by the target data store, the integrated data view, or as a requirement on the data exchange process. The key requirement in all cases is that the data to be stored, integrated, or exchanged must conform to a shared data schema. That is, interaction between information systems relies upon both data producers and data consumers agreeing upon the data schema to be used.
  • [0009]
    When these data interactions cross organizational and administrative boundaries, problems arise. These problems stem from the difficulty of managing a common definition and ensuring data compliance with the agreed-upon data schema across those boundaries. It is common for each party involved in a data interaction to have its own internal data schema. This internal schema is often influenced by factors that are completely unrelated to, and likely to take precedence over, any data interaction requirements. Some factors that commonly influence internal schema designs include: the organization's existing internal data stores, internal application structures and behavior, business processes and needs, political and administrative structure of the organization, and software development constraints.
  • [0010]
    It is often possible to align an organization's internal data schemas with the schemas necessary to allow data interaction with other organizations. Organizations that need to perform data interactions with other parties generally invest significant development and maintenance effort to ensure that information systems conform to the agreed-upon common data schemas. When these schemas evolve, further effort to update, test, and deploy schema-dependent portions of the information systems is necessary. As organizations increase the types of data interactions they are party to, the effort required to maintain translation from the internal data schemas to the common data schema increases in direct proportion to the breadth of the interactions.
  • [0011]
    To address these issues the concept known as schema mapping has been investigated within the following disclosure. For example, given two schemas, A and B, it is possible to define a mapping specification, which captures the correspondences between elements in schema A and elements in schema B. With this mapping information and an input document which conforms to schema A, it is possible to automatically produce an output document that corresponds to the input document data and conforms to schema B. Throughout this application, this process is referred to as executing the mapping. One skilled in the art should know that a mapping may involve a single source and a single schema, or alternatively a mapping may involve multiple sources and multiple schemas.
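    As a rough illustration of "executing the mapping", the sketch below treats a mapping specification as a table of correspondences from schema B fields to schema A fields and applies it to one input document. The field names, the dictionary representation, and the function name execute_mapping are assumptions made for this example; the disclosure does not prescribe any of them.

```python
# Hypothetical mapping specification: schema B field -> schema A field.
# The field names are invented for illustration only.
MAPPING_A_TO_B = {
    "fullName": "name",
    "postalCode": "zip",
}


def execute_mapping(mapping, source_doc):
    """Produce a document conforming to schema B from one conforming to schema A."""
    return {target: source_doc[source] for target, source in mapping.items()}


doc_a = {"name": "Ada Lovelace", "zip": "10504"}
doc_b = execute_mapping(MAPPING_A_TO_B, doc_a)
print(doc_b)
```

    Real mappings generally involve nested structures and value transformations; a flat field-to-field table is only the simplest case.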
  • [0012]
    The disclosure pertains to a software tool, which automatically generates the source code for a custom application that executes a given mapping between schemas. That is, given a set of source schemas and a set of target schemas, together with a mapping specification that maps from the source schemas to the target schemas, the disclosed tool will generate the source code of a mapping application. This mapping application is able to read in input data documents that conform to the source schemas and produce output data documents that comprise the input document data in a form that corresponds to the target schemas based on the mapping specification. The disclosed invention may also be utilized to generate software artifacts other than applications; for example, and not meant to be limiting, it may be utilized to generate software artifacts for a web service or a software component.
  • [0013]
    XML to XML mappings can be expressed as transforms over XML documents using query/script based techniques. For example, the mapping can be expressed as an XQuery or XSLT script that performs the specified mapping. Earlier work with the disclosed mapping tool automatically produced XQuery and XSLT transformation scripts based on an XML-to-XML map specification. These scripts are executed over an XML data document by passing the transformation script, along with the input data document, into a script execution engine: an XQuery script is passed to an XQuery execution engine, and an XSLT script is passed to an XSLT execution engine.
  • [0014]
    A generic mapping engine could be used to address the problem described above. The generic mapping engine takes as input the source and target schemas, the map specification, and the data document to be transformed. Effectively a generic engine interprets the schemas and map specification at runtime to transform the input data document. Although practical, this kind of generic approach has two disadvantages when compared to the disclosed invention:
      • 1. Increased complexity of the engine implementation, and
      • 2. Longer execution times as a result of the indirection required to interpret the map specification at runtime.
  • [0017]
    Preliminary testing of the code generation approach against a generic mapping engine shows that the generated mapping application runs 45%-65% faster than the generic mapping engine over the same map specification and input document.
  • [0018]
    The generated applications are implemented in a person-friendly coding style making it easy for developers to understand, review and extend the generated code.
  • SUMMARY OF THE INVENTION
  • [0019]
    The shortcomings of the prior art are overcome and additional advantages are provided through the provision of a method for automatic generation of schema mapping application code. The method includes loading a code generation tool with source and target schemas. The method proceeds by defining the mapping specification between the source and the target schemas. Then, the method progresses by triggering the code generation tool. The method further includes compiling the generated code, and executing the generated code to transform input data files.
  • [0020]
    Additional features and advantages are realized through the techniques of the present invention. Other embodiments and aspects of the invention are described in detail herein and are considered a part of the claimed invention. For a better understanding of the invention with advantages and features, refer to the description and to the drawing.
  • TECHNICAL EFFECTS
  • [0021]
    As a result of the summarized invention, technically we have achieved a solution for a method for automatic generation of schema mapping application code.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • [0022]
    The subject regarded as the invention is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other objects, features, and advantages of the invention are apparent from the following detailed description taken in conjunction with the accompanying drawing in which:
  • [0023]
    FIG. 1 illustrates one example of a method for automatic generation of schema mapping application code.
  • [0024]
    The detailed description explains an exemplary embodiment of the invention, together with advantages and features, by way of example with reference to the drawing.
  • DETAILED DESCRIPTION OF THE INVENTION
  • [0025]
    This application discloses a software tool that automatically generates the program code for a schema mapping application. The generated code includes a complete stand-alone application that can read in data files conforming to a specific source data schema and produce as output corresponding data files in which the input data has been transformed to conform to specific target data schemas. The user of this generation tool defines both the source and target schemas, and specifies the transformation from source to target to be performed. This transformation is known as the mapping specification. Based on this input, the tool invokes a series of code generators that produce source code that implements the desired schema mapping. It should be known by one skilled in the art that the disclosed invention might be utilized to produce source code in any programming language desired by the user. At generation time the user can select from a number of options that control various aspects of the code generation process. It should also be well known by one skilled in the art that a mapping may involve a single source and a single schema, or alternatively a mapping may involve multiple sources and multiple schemas.
  • [0026]
    In effect, the code generation tool performs a compilation of the mapping specification into a code implementation that performs the mapping defined by the mapping specification. In comparison, a generic mapping engine interprets the mapping specification at runtime to transform input data. Although this application illustrates the disclosed invention being utilized to generate software artifacts for an application, the disclosed invention may also be utilized to generate software artifacts for a web service, or a software component, etc.
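    The contrast between interpreting a mapping specification and compiling it into code can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed tool: the flat MAPPING table, the function names interpret and generate_source, and the choice of Python as the generated language are all hypothetical.

```python
# Hypothetical mapping specification: target field -> source field.
MAPPING = {"fullName": "name", "postalCode": "zip"}


def interpret(mapping, doc):
    """Generic engine: consults the mapping specification on every call."""
    return {target: doc[source] for target, source in mapping.items()}


def generate_source(mapping, func_name="transform"):
    """Code generator: emits source text with the mapping baked in."""
    lines = [f"def {func_name}(doc):", "    return {"]
    for target, source in mapping.items():
        lines.append(f"        {target!r}: doc[{source!r}],")
    lines.append("    }")
    return "\n".join(lines) + "\n"


# Generate, compile, and load the mapping-specific function.
source_code = generate_source(MAPPING)
namespace = {}
exec(compile(source_code, "<generated>", "exec"), namespace)
transform = namespace["transform"]

# The generated code and the generic engine agree on the same input.
doc = {"name": "Ada Lovelace", "zip": "10504"}
assert transform(doc) == interpret(MAPPING, doc)
```

    The generated function no longer looks anything up in MAPPING at runtime, which is the indirection the generic engine has to pay for on every document.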
  • [0027]
    Referring to FIG. 1, a method for automatic generation of schema mapping application code is shown. At step 100, a code generation tool is loaded with source and target schemas. Then, at step 110, the mapping specification is defined between the source and the target schemas.
  • [0028]
    Subsequently, at step 120, the code generation tool is triggered. Then, at step 130, the generated code is compiled. Afterwards, at step 140, the generated code is executed to transform input data files. The processing of FIG. 1 is described in further detail below.
  • [0029]
    The generated application code is designed to be easy for non-expert programmers to read and understand. As such, this approach makes it relatively easy for users to extend the generated application to perform data validation, specialized transformation functions and/or integrate the generated code into existing user applications.
  • [0030]
    The code generation tool generates a stand-alone application that performs the specified map transformation from input data documents to produce the output data documents. The structure of this application consists of a number of static library classes and a number of dynamic classes. The implementation of the static library classes is fixed and does not depend on the schemas or map specification for any particular generated application. As such, these static classes can be compiled and shipped as part of the tooling distribution, although the code generation tool also supports the generation of these classes. The implementation and number of the dynamic classes depend on the schemas and the particular map specification for which the code generation tool generates a custom application. The static library classes are primarily interfaces and abstract classes that will be used as super-classes by the dynamic classes in the custom application.
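    The relationship between a static library class and a dynamic class might look like the sketch below, in which a fixed abstract super-class is extended by a schema-specific subclass of the kind a generator could emit. The class names AbstractMapper and CustomerMapper and the field names are hypothetical and do not come from the disclosure.

```python
from abc import ABC, abstractmethod


class AbstractMapper(ABC):
    """Static library class: fixed, independent of any schema or map."""

    @abstractmethod
    def map_record(self, record):
        """Transform one source record into one target record."""

    def map_all(self, records):
        # Shared behavior that every generated mapper inherits unchanged.
        return [self.map_record(r) for r in records]


class CustomerMapper(AbstractMapper):
    """Dynamic class: its body depends on the schemas and the map specification."""

    def map_record(self, record):
        return {"fullName": record["name"], "postalCode": record["zip"]}


result = CustomerMapper().map_all([{"name": "Ada Lovelace", "zip": "10504"}])
```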
  • [0031]
    The dynamic classes can be divided into two groups. The first group consists of the dynamic classes for which the code generation tool is invoked once to produce a single version of these classes in the resulting generated application. Classes in this group are referred to as the singleton dynamic classes. The second group consists of the dynamic classes for which the code generation tool is invoked multiple times to produce several versions of these classes in the resulting generated application. Classes in this group are referred to as the map-specific dynamic classes.
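    The difference between the two groups can be illustrated with two hypothetical generators: one invoked a single time per application, and one invoked once per element of the target schema. The generator names, the toy target schema, and the shape of the emitted Java text are assumptions made for this sketch.

```python
def generate_main_class(app_name):
    """Singleton dynamic class: generated once per application."""
    return f"public class {app_name}Main {{ /* entry point */ }}\n"


def generate_target_class(element_name, fields):
    """Map-specific dynamic class: generated once per target schema element."""
    body = "".join(f"    private String {field};\n" for field in fields)
    return f"public class {element_name} {{\n{body}}}\n"


# Hypothetical target schema: element name -> field names.
target_schema = {"Customer": ["fullName", "postalCode"], "Order": ["orderId"]}

# The singleton generator runs once.
generated = {"CustomerAppMain": generate_main_class("CustomerApp")}

# The map-specific generator runs once per element, with different arguments.
for element, fields in target_schema.items():
    generated[element] = generate_target_class(element, fields)
```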
  • [0032]
    The code generation tool reads in the source and target schemas along with the mapping specification and constructs data structures in memory to represent the schemas and the mapping specification. These in memory structures are referred to as the map specification data structures.
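    One plausible shape for such in-memory structures is sketched below using plain data classes. The class names, the path notation such as "Person/name", and the dictionary input format are assumptions for illustration; the disclosure does not specify how the map specification data structures are laid out.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaElement:
    name: str
    type_name: str


@dataclass
class Schema:
    name: str
    elements: list = field(default_factory=list)


@dataclass
class Correspondence:
    source_path: str  # e.g. "Person/name"
    target_path: str  # e.g. "Customer/fullName"


@dataclass
class MapSpecification:
    source: Schema
    target: Schema
    correspondences: list = field(default_factory=list)


def load_map_specification(raw):
    """Build in-memory structures from a plain dict (the input format is hypothetical)."""
    def to_schema(d):
        return Schema(d["name"], [SchemaElement(n, t) for n, t in d["elements"].items()])

    return MapSpecification(
        source=to_schema(raw["source"]),
        target=to_schema(raw["target"]),
        correspondences=[Correspondence(s, t) for s, t in raw["map"]],
    )


raw = {
    "source": {"name": "Person", "elements": {"name": "string", "zip": "string"}},
    "target": {"name": "Customer", "elements": {"fullName": "string", "postalCode": "string"}},
    "map": [("Person/name", "Customer/fullName"), ("Person/zip", "Customer/postalCode")],
}
spec = load_map_specification(raw)
```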
  • [0033]
    The code generation process is coordinated by a generation coordinator object (GCO). The GCO handles the generation of the application code in two phases:
      • 1. Extraction of data to produce generation arguments, and
      • 2. Invocation of individual code generators.
  • [0036]
    First, the GCO traverses the map specification data structures and extracts information from the map specification structures to build up a set of generation arguments (GA). These generation arguments are tied to the particular code generators involved in producing the desired product. The resulting set of generation arguments holds only the data values that are required during code generation. These values are stored in a form that is convenient for authors of the code generators. By extracting only the information that is relevant to the code generation the authors of the code generators can focus on the task of producing the relevant output code rather than on where the information that controls the output code is located. At the end of this stage there is no further need for the GCO to access the map specification data structures and generation can complete based solely on the data represented in the GA objects. The generation of the GA objects is a structuring mechanism to reduce the complexity of the code generators. It is possible to eliminate this extraction phase if the code generation tool author is willing to deal with the resulting increase in complexity of the code generator implementation.
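    The extraction phase might be sketched as a single pass over the map specification that produces small, generator-friendly argument objects. The TargetClassArgs class, the dictionary layout of map_spec, and the package name are hypothetical; the point is only that the generators later need nothing beyond these objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetClassArgs:
    """Generation arguments (GA) for one map-specific target class."""
    package: str
    class_name: str
    fields: tuple  # (field_name, source_path) pairs


def extract_generation_arguments(map_spec, package):
    """Phase 1: walk the map specification once, keeping only what generators need."""
    args = []
    for element, fields in map_spec["target"].items():
        pairs = tuple((f, map_spec["map"][f"{element}/{f}"]) for f in fields)
        args.append(TargetClassArgs(package, element, pairs))
    return args


# Hypothetical map specification structure.
map_spec = {
    "target": {"Customer": ["fullName", "postalCode"]},
    "map": {
        "Customer/fullName": "Person/name",
        "Customer/postalCode": "Person/zip",
    },
}
ga_objects = extract_generation_arguments(map_spec, "com.example.generated")
```

    After this step a generator can work from a TargetClassArgs value alone, without knowing where in the map specification each value was found.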
  • [0037]
    A code generator exists for each type of class that is generated by the application generation tool. That is, one code generator exists for each:
      • singleton dynamic class,
      • map-specific dynamic class form (e.g., one generator for the target.java map-specific dynamic class; this generator is invoked multiple times with different arguments to produce the set of classes used to represent the target schema),
      • static library class, and
      • the ant build script
        The GCO invokes each of the generators in turn based on the extracted GA objects and user input. The user input is collected by an on-screen dialog before code generation commences. This dialog allows the user to specify the destination directory for the generated files and provides checkboxes to enable or disable the invocation of various categories of code generators. For example, one checkbox allows the user to specify that the static library classes should be generated; another checkbox controls whether the map-specific dynamic classes used to represent the target schema should be generated.
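        How the dialog choices could gate generator invocation is sketched below. The GenerationOptions fields, the category names, and the assumption that the singleton classes and the build script are always generated are all hypothetical; the disclosure only states that checkboxes enable or disable categories of generators.

```python
from dataclasses import dataclass


@dataclass
class GenerationOptions:
    """User choices collected before generation (field names are hypothetical)."""
    destination_dir: str
    generate_static_library: bool = True
    generate_target_classes: bool = True


def select_generators(options):
    """Return the generator categories a coordinator would invoke, in order."""
    categories = ["singleton_dynamic"]  # assumed to be always generated
    if options.generate_static_library:
        categories.append("static_library")
    if options.generate_target_classes:
        categories.append("target_map_specific")
    categories.append("ant_build_script")  # assumed to be always generated
    return categories


options = GenerationOptions("/tmp/generated", generate_static_library=False)
selected = select_generators(options)
```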
        Invocation of a generator involves performing the following steps:
      • 1. Instantiation of the appropriate code generator class.
      • 2. Invocation of the appropriate generate ( . . . ) method passing the appropriate GA objects as arguments.
      • 3. Parsing the string result from the generate ( . . . ) call to extract the package name and class name from the string containing the source code for the generated class.
      • 4. Writing the string result from the generate ( . . . ) call to a file following source file naming conventions based on the extracted package and class names.
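    The four steps above can be sketched as follows. The generator class, its generate signature, and the regular expressions used to recover the package and class names are assumptions made for this example; the generated text is Java source because the disclosure mentions a target.java class.

```python
import re
import tempfile
from pathlib import Path


class TargetClassGenerator:
    """Hypothetical code generator that returns Java source as a string."""

    def generate(self, package, class_name):
        return f"package {package};\n\npublic class {class_name} {{\n}}\n"


def invoke_generator(generator_cls, ga, dest_dir):
    # Step 1: instantiate the appropriate code generator class.
    generator = generator_cls()
    # Step 2: call generate(...) with the generation arguments.
    source = generator.generate(**ga)
    # Step 3: parse the package and class names out of the generated source.
    package = re.search(r"^\s*package\s+([\w.]+)\s*;", source, re.MULTILINE).group(1)
    class_name = re.search(r"\bclass\s+(\w+)", source).group(1)
    # Step 4: write the source to <dest>/<package path>/<ClassName>.java.
    out = Path(dest_dir, *package.split("."), f"{class_name}.java")
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(source, encoding="utf-8")
    return out


dest_dir = tempfile.mkdtemp()
written = invoke_generator(
    TargetClassGenerator,
    {"package": "com.example.generated", "class_name": "Target"},
    dest_dir,
)
```

    Deriving the file path from the generated text itself, rather than from the arguments, means the output location always agrees with what the source actually declares.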
  • [0046]
    The capabilities of the present invention can be implemented in software, firmware, hardware or some combination thereof.
  • [0047]
    As one example, one or more aspects of the present invention can be included in an article of manufacture (e.g., one or more computer program products) having, for instance, computer usable media. The media has embodied therein, for instance, computer readable program code means for providing and facilitating the capabilities of the present invention. The article of manufacture can be included as a part of a computer system or sold separately.
  • [0048]
    Additionally, at least one program storage device readable by a machine, tangibly embodying at least one program of instructions executable by the machine to perform the capabilities of the present invention can be provided.
  • [0049]
    The flow diagrams depicted herein are just examples. There may be many variations to these diagrams or the steps (or operations) described therein without departing from the spirit of the invention. For instance, the steps may be performed in a differing order, or steps may be added, deleted or modified. All of these variations are considered a part of the claimed invention.
  • [0050]
    While the preferred embodiment to the invention has been described, it will be understood that those skilled in the art, both now and in the future, may make various improvements and enhancements which fall within the scope of the claims which follow. These claims should be construed to maintain the proper protection for the invention first described.
Classifications
U.S. Classification: 717/109, 717/106
International Classification: G06F9/44
Cooperative Classification: G06F8/24, G06F9/44505, G06F8/00
European Classification: G06F9/445C, G06F8/00, G06F8/24
Legal Events
Mar 12, 2007 (code AS): Assignment
Owner name: INTERNATIONAL BUSINESS MACHINES COPORATION, NEW YO
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:RICHMOND, MICHAEL;REEL/FRAME:018993/0151
Effective date: 20070108
Oct 3, 2008 (code AS): Assignment
Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y
Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE THE NAME OF THE RECEIVING PARTY FROM INTERNATIONAL BUSINESS MACHINES COPORATION TO INTERNATIONAL BUSINESS MACHINES CORPORATION PREVIOUSLY RECORDED ON REEL 018993 FRAME 0151;ASSIGNOR:RICHMOND, MICHAEL;REEL/FRAME:021628/0360
Effective date: 20070108