US 20050278139 A1 Abstract Methods and apparatus, including computer program products, for identifying matches between disparate schemas calculate a degree of similarity between elements of two schemas using each of multiple matching processes. The calculated degrees of similarity are combined using a first weighting vector to produce first combined degrees of similarity. The first weighting vector includes multiple weighting coefficients and each weighting coefficient corresponds to one of the matching processes. The weighting coefficients are tuned using information relating to a predicted degree of matching accuracy associated with the first weighting vector.
Claims(25) 1. A computer program product, tangibly embodied in an information carrier, for identifying matches between disparate schemas, the computer program product being operable to cause data processing apparatus to:
calculate a degree of similarity between elements of two schemas using each of a plurality of matching processes; combine the calculated degrees of similarity using a first weighting vector to produce first combined degrees of similarity, with the first weighting vector including a plurality of weighting coefficients and each weighting coefficient corresponding to one of the plurality of matching processes; and tune the weighting coefficients using information relating to a predicted degree of matching accuracy associated with the first weighting vector. 2. The computer program product of the calculated degrees of similarity are combined using each of a plurality of weighting vectors, with each weighting vector including a plurality of weighting coefficients and each weighting coefficient corresponding to one of the plurality of matching processes; and the weighting coefficients are tuned by determining, using the combined degrees of similarity for each of the plurality of weighting vectors, a predicted degree of matching accuracy associated with each of the plurality of weighting vectors and selecting a second weighting vector to determine possible matches between the elements of the two schemas, with the second weighting vector selected based on a comparison of information relating to the respective predicted degrees of matching accuracy associated with the first weighting vector and the second weighting vector. 3. The computer program product of 4. 
The computer program product of identifying a set of possible matches between the elements of the two schemas based on the first combined degrees of similarity; receiving user feedback relating to a subset of the possible matches and using the user feedback to produce the information relating to a predicted degree of matching accuracy associated with the first weighting vector; and modifying the first weighting vector based on the information relating to the predicted degree of matching accuracy to produce a second weighting vector. 5. The computer program product of combine the calculated degrees of similarity using the second weighting vector to produce second combined degrees of similarity; and identify a modified set of possible matches between the elements of the two schemas based on the second combined degrees of similarity. 6. The computer program product of 7. The computer program product of 8. A method for identifying matches between disparate schemas, the method comprising:
calculating a degree of similarity between elements of two schemas using each of a plurality of matching processes; combining the calculated degrees of similarity using each of a plurality of weighting vectors, with each weighting vector including a plurality of weighting coefficients and each weighting coefficient corresponding to one of the plurality of matching processes; determining, using the combined degrees of similarity, a level of ambiguity for each weighting vector; and selecting a particular weighting vector to determine possible matches between the elements of the two schemas, wherein the particular weighting vector is selected based on the level of ambiguity for each weighting vector. 9. The method of 10. The method of for each weighting vector, calculating a factor using at least one quantity selected from the group consisting of a number of ambiguous matches, a number of unambiguous matches, and a number of impossible matches; and wherein selecting the particular weighting vector is based on a value of the factor for the particular weighting vector relative to values of the factors for others of the plurality of weighting vectors. 11. The method of 12. The method of 13. The method of 14. The method of selecting a candidate weighting vector; and tuning the candidate weighting vector by modifying the weighting coefficients for the candidate weighting vector to produce the particular weighting vector, wherein the factor for the particular weighting vector indicates a favorable weighting relative to the factor for the candidate weighting vector. 15. 
The method of identifying, as representing an unambiguous match for a particular element, a maximum combined degree of similarity for the particular element; or identifying, as representing an unambiguous match for a particular element, a combined degree of similarity for the particular element that exceeds a predetermined threshold and that exceeds all other combined degrees of similarity for the particular element by at least a predetermined amount. 16. The method of identifying, as representing an ambiguous match for a particular element, a combined degree of similarity for the particular element that exceeds a first threshold and is less than a second threshold; or identifying, as representing an ambiguous match for a particular element, a combined degree of similarity for the particular element that exceeds a predetermined threshold and that is within a predetermined range of other combined degrees of similarity for the particular element. 17. The method of 18. The method of 19. The method of determining a set of possible matches between the elements of the two schemas using the combined degrees of similarity for the particular weighting vector; receiving user feedback relating to a subset of the possible matches; tuning the particular weighting vector based on the user feedback; combining the calculated degrees of similarity using the tuned weighting vector; and determining a new set of possible matches between the elements of the two schemas using the combined degrees of similarity for the tuned weighting vector. 20. A method for identifying matches between disparate schemas, the method comprising:
calculating a degree of similarity between elements of two schemas using each of a plurality of matching processes; combining the calculated degrees of similarity using a first weighting vector to produce first combined degrees of similarity, with the first weighting vector including a plurality of weighting coefficients and each weighting coefficient corresponding to one of the plurality of matching processes; identifying a set of possible matches between the elements of the two schemas based on the first combined degrees of similarity; receiving user feedback relating to a subset of the possible matches; modifying the first weighting vector based on the user feedback to produce a second weighting vector; combining the calculated degrees of similarity using the second weighting vector to produce second combined degrees of similarity; and identifying a modified set of possible matches between the elements of the two schemas based on the second combined degrees of similarity. 21. The method of 22. A system for identifying matches between disparate schemas, the system comprising:
means for calculating a degree of similarity between elements of two schemas using each of a plurality of matching processes; means for combining the calculated degrees of similarity using a first weighting vector to produce first combined degrees of similarity, with the first weighting vector including a plurality of weighting coefficients and each weighting coefficient corresponding to one of the plurality of matching processes; and means for tuning the weighting coefficients using information relating to a predicted degree of matching accuracy associated with the first weighting vector. 23. The system of means for determining, using the combined degrees of similarity for each of the plurality of weighting vectors, at least one quantity selected from the group consisting of a number of ambiguous matches, a number of unambiguous matches, and a number of impossible matches; and means for selecting a second weighting vector to determine possible matches between the elements of the two schemas, wherein the second weighting vector is selected based on a comparison of information relating to a predicted degree of accuracy associated with each of the first weighting vector and the second weighting vector, with the information relating to the predicted degree of accuracy determined using at least one quantity selected from the group consisting of a number of ambiguous matches, a number of unambiguous matches, and a number of impossible matches. 24. 
The system of means for identifying a set of possible matches between the elements of the two schemas based on the first combined degrees of similarity; means for receiving user feedback relating to a subset of the possible matches and using the user feedback to produce the information relating to a predicted degree of matching accuracy associated with the first weighting vector; and means for modifying the first weighting vector based on the information relating to the predicted degree of matching accuracy to produce a second weighting vector, the system further comprising: means for combining the calculated degrees of similarity using the second weighting vector to produce second combined degrees of similarity; and means for identifying a modified set of possible matches between the elements of the two schemas based on the second combined degrees of similarity. 25. The system of Description The present invention relates to data processing by digital computer, and more particularly to mapping elements between disparate schemas. Integration of applications in an enterprise can lead to more efficient operations. Enterprise application integration can require significant effort when migrating from disparate legacy applications to a more integrated framework. Enterprise application integration can be performed using a message exchange procedure, in which messages are exchanged between different data sets. Application data is typically organized according to the type of application or applications with which the data is designed to operate. As a result, the organization or structure of the data can be highly specialized. The messages used for enterprise application integration are generally structured sets of data in a well-defined syntax. The structure of the data can be referred to as its schema. Countless different schemas and/or schema domains (e.g., SQL DDL, XML-based dialects (such as xCBL), OWL, RDF, ODMG, SAP-IDoc, EDI, UBL, etc.) exist. 
Many different integration scenarios (e.g., business process integration, enterprise application integration, and master data management) require schema matching, in which a mapping between the elements of two schemas is produced. Schema matching can also be important in data translation applications (e.g., where data from a first database is migrated into a second database for use with a different application). Existing techniques for schema matching primarily rely upon manual mapping of elements from one schema to another. Some approaches exist, however, for partially automating the schema matching process using simple algorithms for field name or database structure matching or using machine learning technologies. Some approaches combine the criteria of different matching algorithms to produce a more complex matching technique (i.e., hybrid and composite matchers). Simple, hybrid, and composite matchers, however, are inflexible and tend to produce good results for some types of schemas while producing poor results for other types of schemas. Techniques have also been proposed for building ontologies for different schema domains. By building an ontology, schemas can be classified by type, and different weights can be applied to different individual matchers based on the class or classes of the schemas to be matched. For example, schemas in a first classification may use a composite matcher that heavily weights the contribution of a field name matcher that is a component of the composite matcher, while schemas in a second classification may use a composite matcher that heavily weights the contribution of a structural matcher that is a component of the composite matcher. Such an approach may provide improved performance relative to conventional simple, hybrid, or composite matchers but only works for schema domains that have previously been associated with a particular class of schema domains. 
The present invention provides methods and apparatus, including computer program products, that implement techniques for mapping schemas by tuning the relative contributions of different component matchers. The relative contributions (i.e., the weights) of different matchers can be tuned by optimizing a measure of ambiguity, which may be an algorithm that is based on a number of ambiguous matches, a number of unambiguous matches, and/or a number of impossible matches. In addition or as an alternative, the relative contributions of different matchers can be tuned by monitoring user interaction (e.g., user approvals and rejections of proposed matches) and using the user feedback to fine-tune the weights of the different matchers. In one general aspect, the techniques feature calculating a degree of similarity between elements of two schemas using each of multiple matching processes and combining the calculated degrees of similarity using a first weighting vector to produce first combined degrees of similarity. The first weighting vector includes multiple weighting coefficients and each weighting coefficient corresponds to one of the matching processes. The weighting coefficients are tuned using information relating to a predicted degree of matching accuracy associated with the first weighting vector. The invention can be implemented to include one or more of the following advantageous features. The calculated degrees of similarity are combined using each of multiple weighting vectors. Each weighting vector includes multiple weighting coefficients, and each weighting coefficient corresponds to one of the matching processes. The weighting coefficients are tuned by determining, using the combined degrees of similarity for each of the weighting vectors, a predicted degree of matching accuracy associated with each of the weighting vectors. A second weighting vector is selected to determine possible matches between the elements of the two schemas. 
The second weighting vector is selected based on a comparison of information relating to the respective predicted degrees of matching accuracy associated with the first weighting vector and the second weighting vector. Each predicted degree of matching accuracy is determined using a number of ambiguous matches, a number of unambiguous matches, and/or a number of impossible matches. The weighting coefficients are tuned by identifying a set of possible matches between the elements of the two schemas based on the first combined degrees of similarity and receiving user feedback relating to a subset of the possible matches and using the user feedback to produce the information relating to a predicted degree of matching accuracy associated with the first weighting vector. The first weighting vector is then modified based on the information relating to the predicted degree of matching accuracy to produce a second weighting vector. The calculated degrees of similarity are combined using the second weighting vector to produce second combined degrees of similarity, and a modified set of possible matches between the elements of the two schemas is identified based on the second combined degrees of similarity. The calculated degrees of similarity are combined by multiplying each calculated degree of similarity for each matching process by the corresponding weighting coefficient to obtain weighted degrees of similarity and summing the weighted degrees of similarity. A degree of similarity is calculated between multiple pairs of elements. Each pair of elements includes one element selected from a source schema and one element selected from a target schema. Multiple different weighting vectors can be used. A level of ambiguity is determined for each weighting vector, and a particular weighting vector to determine possible matches between the elements of the two schemas is selected based on the level of ambiguity for each weighting vector. 
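The weighted combination described above (multiply each matcher's degree of similarity by its weighting coefficient, then sum) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `combine_similarities`, the dictionary layout, and the example element names and scores are all assumptions made for the sketch.

```python
from typing import Dict, List, Tuple

# A pair is (source element, target element); names are hypothetical.
Pair = Tuple[str, str]


def combine_similarities(
    per_matcher: List[Dict[Pair, float]], weights: List[float]
) -> Dict[Pair, float]:
    """Weighted sum of per-matcher similarities for every element pair."""
    if len(per_matcher) != len(weights):
        raise ValueError("one weighting coefficient is needed per matcher")
    combined: Dict[Pair, float] = {}
    for scores, weight in zip(per_matcher, weights):
        for pair, score in scores.items():
            # Multiply by the matcher's coefficient and accumulate the sum.
            combined[pair] = combined.get(pair, 0.0) + weight * score
    return combined


# Illustrative scores from two assumed matchers (name-based and type-based).
name_scores = {("CustNo", "CustomerID"): 0.5, ("CustNo", "OrderID"): 0.25}
type_scores = {("CustNo", "CustomerID"): 1.0, ("CustNo", "OrderID"): 1.0}

combined = combine_similarities([name_scores, type_scores], [0.75, 0.25])
```

With these inputs the pair `("CustNo", "CustomerID")` receives the higher combined score, because the name matcher carries most of the weight.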
A level of ambiguity can be determined by determining a number of ambiguous matches, a number of unambiguous matches, and/or a number of impossible matches. For each weighting vector, a factor is calculated, and the particular weighting vector selected is based on a value of the factor for the particular weighting vector relative to values of the factors for other weighting vectors. The particular weighting vector selected can be a weighting vector having a factor that tends to indicate a relatively high number of ambiguous matches or a relatively high number of unambiguous matches. Alternatively, the particular weighting vector selected can be a weighting vector having a factor that tends to indicate a relatively low number of ambiguous matches and a relatively low number of impossible matches. Unambiguous matches can be determined by identifying a maximum combined degree of similarity for the particular element, or identifying a combined degree of similarity for the particular element that exceeds a predetermined threshold and that exceeds all other combined degrees of similarity for the particular element by at least a predetermined amount. Ambiguous matches can be determined by identifying a combined degree of similarity for the particular element that exceeds a first threshold and is less than a second threshold or identifying a combined degree of similarity for the particular element that exceeds a predetermined threshold and that is within a predetermined range of other combined degrees of similarity for the particular element. Impossible matches can be identified by determining, for a particular element, that no combined degree of similarity for the particular element exceeds a predetermined minimum threshold. The matching processes can include schema-based criteria, content-based criteria, per-element criteria, structural criteria, linguistic criteria, and/or constraint-based criteria. 
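One way to read the categorization above is as a per-element classification over the combined degrees of similarity. The sketch below uses the threshold-plus-margin variant for unambiguous matches and the minimum-threshold rule for impossible matches. The function name `classify_element` and the numeric defaults (0.7, 0.5, 0.1) are assumptions chosen for illustration; the text leaves the thresholds to the implementation.

```python
from typing import Dict


def classify_element(
    scores: Dict[str, float],
    match_threshold: float = 0.7,  # assumed "predetermined threshold"
    min_threshold: float = 0.5,    # assumed "predetermined minimum threshold"
    margin: float = 0.1,           # assumed "predetermined amount"
) -> str:
    """Classify one element from its combined similarity to each candidate."""
    # Impossible: no combined degree of similarity exceeds the minimum.
    if not scores or max(scores.values()) <= min_threshold:
        return "impossible"
    ranked = sorted(scores.values(), reverse=True)
    best = ranked[0]
    runner_up = ranked[1] if len(ranked) > 1 else 0.0
    # Unambiguous: above the threshold and ahead of all others by the margin.
    if best > match_threshold and best - runner_up >= margin:
        return "unambiguous"
    # Everything else is treated as ambiguous in this sketch.
    return "ambiguous"


clear = classify_element({"CustomerID": 0.9, "OrderID": 0.3})
close_call = classify_element({"CustomerID": 0.8, "ClientID": 0.78})
no_match = classify_element({"CustomerID": 0.4, "OrderID": 0.1})
```

Changing the three parameters shifts elements between categories, which is the lever the description mentions for regulating how many matches are proposed for manual review.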
User feedback relating to possible matches can be used to modify a first weighting vector to produce a second weighting vector. The calculated degrees of similarity can then be combined using the second weighting vector to produce second combined degrees of similarity, and a modified set of possible matches between the elements of the two schemas can be identified based on the second combined degrees of similarity. The first weighting vector can be selected based on a context associated with the two schemas and/or a similarity of one or more of the schemas to a schema for which the first weighting vector was previously used. The invention can be implemented to realize one or more of the following advantages. The invention can be used to provide enhanced matching performance, to improve the quality of matching, and/or, depending on the particular algorithms that are used, to regulate the number and types of possible matches that are identified for manual review and approval. In addition to providing improved matching results for schemas that previously have been classified, the invention can also be used to provide enhanced matching results for unclassified schemas. In addition, the invention can be used to assist users with manual finishing touches because the system can provide some different mapping examples as suggestions to the user. In other words, the elements of disparate schemas may be mapped without detailed knowledge of the characteristics of the schemas. In this regard, the techniques provide generic data model matching (i.e., the techniques can perform matching independent of the data model). Furthermore, mapping can be performed automatically or at least semi-automatically. One implementation of the invention provides all of the above advantages. Details of one or more implementations of the invention are set forth in the accompanying drawings and in the description below. 
Further features, aspects, and advantages of the invention will become apparent from the description, the drawings, and the claims. Like reference numbers and designations in the various drawings indicate like elements. The matching techniques can use matchers that implement particular matching processes. Any number of different types of matching processes can be used. For example, the matching processes may be implemented in individual matchers that are schema-based, content-based, type-based, or semantic-based matchers. Schema-based matchers consider schema information, while content-based matchers consider instance data within a particular schema. Schema-based matchers can include per-element matchers, which can be linguistic (e.g., using element names or descriptions) or constraint-based (e.g., using types or keys). Schema-based matchers can also include structural matchers, which match combinations of elements or nodes and may be constraint based (e.g., graph matchers). Content-based matchers can include per-element matchers, which can be linguistic (e.g., using word frequencies or key terms) or constraint-based (e.g., using value patterns and ranges). Type-based matchers can include per-element matchers, which can perform matching based on the type of node (e.g., characteristics, facets, regular expressions), and semantic matchers can analyze the semantical context of the definition and name of each node. Matching processes may also be implemented in combined matchers, which may be hybrid (e.g., using multiple match criteria) or composite (e.g., using manually or automatically determined combinations of results from different match algorithms). One or more of these various different matching techniques can be used in step Each matching technique produces results that indicate a degree of similarity between an element in a first schema and an element in a second schema. 
For example, for every pair of elements between the two schemas, a matching technique may assign a value between zero and one, which indicates a probability estimate that the two elements match, with a value of zero indicating an absolute impossibility and a value of one indicating an absolute certainty of a match. The calculated degrees of similarity are then combined using one or more weighting vectors to provide composite match results (step It is possible to define the weighting vector for each matching procedure. The initial weighting vector or vectors that are used may be selected based on characteristics of the schemas to be matched. When schemas are to be matched, parameters relating to the schemas and/or the matching process can be manually input into, or automatically generated by (e.g., by performing an automated analysis of a schema's structure, type, etc.), a system that performs the matching. These parameters can be used to influence which weighting vector or vectors are initially selected. The parameters may relate to, e.g., the schema domain, a context of the schema and/or the matching process, etc. For example, a schema that is similar to a previously mapped schema (e.g., a schema that is a different version of a previously mapped dialect) is assigned a weighting vector that is the same as or otherwise corresponds to (e.g., a modified or tuned weighting vector, as described below) the weighting vector for the previously mapped schema. Parameters that relate to the context of the schema can also affect the weighting vectors. For example, if a specific schema comes from a specific industry (e.g., automotive), the weighting vectors can be adjusted according to the requirements of the specific industry. Different industries may have different specific requirements for the matching process and thus the weighting vectors may be adjusted in accordance with these requirements. 
Context drivers can include, for example: a business process type, a business document type, an industry category, a product category, a geopolitical area, and/or a system type. Which weighting vectors are used for particular contexts can be manually preprogrammed or can be selected based on an automated or partially automated tuning process, through which weighting vectors used in a particular context are adjusted through a “learning” process and the adjusted weighting vectors are subsequently used for matching other schemas with the same context. To improve the accuracy of the composite match results, the weighting coefficients are tuned using information relating to a predicted degree of matching accuracy associated with the one or more weighting vectors (step In some implementations, the predicted degree of matching accuracy is a calculation of a level of ambiguity associated with a particular weighting vector. The combined degree of similarity for a particular pair of elements (i.e., an element from a source schema and a potentially matching element from a target schema) can be used to categorize the potential match as ambiguous, unambiguous, or impossible. Thereafter, the level of ambiguity can be calculated based on a number of ambiguous matches, a number of unambiguous matches, and a number of impossible matches. An ambiguous match generally means that a statistical possibility exists that the pair of elements actually match. In some implementations, multiple ambiguous matches can be associated with a particular element. For example, a particular source element might have several statistically possible matches in a target schema. Each of the statistically possible matches can be an ambiguous match. 
For purposes of this description, an unambiguous match generally means that it is at least statistically probable that the pair of elements actually matches, and an impossible match generally means that it is statistically improbable or impossible that the pair of elements actually match. For example, an unambiguous match can be defined by combined degrees of similarity for which the maximum probability of a match, among all possible matches, exceeds 70%, while an impossible match can be defined by combined degrees of similarity for which the maximum probability of a match, among all possible matches, is less than 50%. Classifying a match as unambiguous does not necessarily mean that two identified elements actually do match, just that the particular matching process (or combination of processes) used to predict matches generates matching results that suggest a statistical probability of a match. Similarly, classifying a match as impossible does not necessarily mean that a match does not exist, just that the particular matching process (or combination of processes) used to predict matches is unable to predict a match with a sufficient degree of confidence. Matches between two schemas can be categorized based on combined degrees of similarity in both directions or in only one direction (i.e., from a source to a target schema). For example, if matching is performed in both directions, a particular pair of elements may be identified as unambiguous only if the pair of elements meet the criteria for an unambiguous match in both directions (e.g., target element t and source element s represent an unambiguous match only if the corresponding probability of a match: (a) exceeds 70%, (b) is the maximum probability associated with target element t for all possible source elements, and (c) is the maximum probability associated with source element s for all possible target elements). 
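The bidirectional criteria (a) through (c) above can be sketched as a single check over the combined degrees of similarity. The function name `is_unambiguous_both_directions` and the example scores are assumptions for illustration; the 0.7 default mirrors the 70% figure used in the example.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]  # (source element s, target element t)


def is_unambiguous_both_directions(
    combined: Dict[Pair, float], s: str, t: str, threshold: float = 0.7
) -> bool:
    """Check criteria (a)-(c) for the pair (s, t)."""
    if (s, t) not in combined:
        return False
    score = combined[(s, t)]
    # (a) the probability of a match must exceed the threshold.
    if score <= threshold:
        return False
    # (b) maximum for target t over all possible source elements.
    best_for_t = max(v for (_src, tgt), v in combined.items() if tgt == t)
    # (c) maximum for source s over all possible target elements.
    best_for_s = max(v for (src, _tgt), v in combined.items() if src == s)
    return score >= best_for_t and score >= best_for_s


# Illustrative combined scores for two source and two target elements.
scores = {
    ("s1", "t1"): 0.9,
    ("s1", "t2"): 0.4,
    ("s2", "t1"): 0.95,
    ("s2", "t2"): 0.3,
}

s1_t1 = is_unambiguous_both_directions(scores, "s1", "t1")
s2_t1 = is_unambiguous_both_directions(scores, "s2", "t1")
```

Here `("s1", "t1")` passes criteria (a) and (c) but fails (b), because `s2` is a stronger candidate for `t1`, so only `("s2", "t1")` qualifies.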
If matching is performed in a single direction, on the other hand, the particular pair of elements may be identified as unambiguous if the pair of elements meet the criteria for an unambiguous match in only one direction (e.g., target element t and source element s represent an unambiguous match if the corresponding probability of a match exceeds 70% and is the maximum probability associated with target element t for all possible source elements, but is not necessarily the maximum probability associated with source element s for all possible target elements). Once a categorization is made among the different levels of ambiguity, a calculation of the overall level of ambiguity for a particular weighting vector can be made. For example, an overall level of ambiguity a can be calculated by:
The value of a for the particular weighting vector can then be compared to the value of a for other predefined weighting vectors to find the lowest overall level of ambiguity a. Alternatively, the weighting coefficients can be adjusted using an adjustment algorithm to optimize or improve (e.g., reduce) the overall level of ambiguity a. Thus, the calculated overall level of ambiguity can serve as a measure of a predicted degree of matching accuracy for weighting vectors. Other algorithms for calculating the overall level of ambiguity for weighting vectors can also be used. In the above example, the goal may be to reduce the overall level of ambiguity a as much as possible, thereby favoring weighting vectors that minimize the number of ambiguous matches. In other implementations, it may be desirable to reduce (or increase) the number of impossible assignments, to reduce (or increase) the number of unambiguous matches, or to perform some combination of these alternatives (e.g., to reduce the number of unambiguous matches while increasing (or maximizing) the number of ambiguous matches). Which type of weighting vector tends to be favored and how the level of ambiguity is calculated generally depends on the desired results. Typically, implementations of a matching process, such as process Furthermore, the tool may be used for different purposes at different stages of a mapping procedure. For example, the tool may be initially used to minimize the number of ambiguous matches. Subsequently, after the user has approved some of the proposed matches, settings for the tool can be changed to favor minimizing the number of unambiguous matches. In addition to favoring different levels of ambiguity using different weighting vectors, the results of the composite matcher can also be influenced by adjusting threshold levels or other criteria for determining whether pairs of elements represent unambiguous, ambiguous, or impossible matches. 
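Selecting among predefined weighting vectors by comparing their levels of ambiguity can be sketched as below. The specific formula for the overall level of ambiguity a is not reproduced in this text, so the sketch substitutes a stand-in measure (the share of source elements that are possible but have no clear winner); that measure, the thresholds, and the names `ambiguity_level` and `select_weighting_vector` are assumptions, not the patent's formula.

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]  # (source element, target element)


def ambiguity_level(
    combined: Dict[Pair, float],
    match_threshold: float = 0.7,
    min_threshold: float = 0.5,
    margin: float = 0.1,
) -> float:
    """Stand-in measure: fraction of source elements with an ambiguous best match."""
    by_source: Dict[str, List[float]] = {}
    for (src, _tgt), score in combined.items():
        by_source.setdefault(src, []).append(score)
    if not by_source:
        return 0.0
    ambiguous = 0
    for scores in by_source.values():
        ranked = sorted(scores, reverse=True)
        runner_up = ranked[1] if len(ranked) > 1 else 0.0
        possible = ranked[0] > min_threshold
        clear = ranked[0] > match_threshold and ranked[0] - runner_up >= margin
        if possible and not clear:
            ambiguous += 1
    return ambiguous / len(by_source)


def select_weighting_vector(
    per_matcher: List[Dict[Pair, float]],
    candidates: Sequence[Tuple[float, ...]],
) -> Tuple[float, ...]:
    """Return the candidate weighting vector with the lowest stand-in ambiguity."""

    def level(weights: Tuple[float, ...]) -> float:
        combined: Dict[Pair, float] = {}
        for scores, weight in zip(per_matcher, weights):
            for pair, score in scores.items():
                combined[pair] = combined.get(pair, 0.0) + weight * score
        return ambiguity_level(combined)

    return min(candidates, key=level)


# The name matcher separates the pairs; the type matcher cannot.
name_scores = {("a", "x"): 1.0, ("a", "y"): 0.0, ("b", "x"): 0.0, ("b", "y"): 1.0}
type_scores = {("a", "x"): 1.0, ("a", "y"): 1.0, ("b", "x"): 1.0, ("b", "y"): 1.0}

chosen = select_weighting_vector(
    [name_scores, type_scores], [(0.0, 1.0), (1.0, 0.0)]
)
```

A different objective, such as favoring more ambiguous matches for manual review, would only require swapping the measure or replacing `min` with `max`.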
The categorization among ambiguous, unambiguous, and impossible matches is arbitrary in that the categories can be defined differently for different implementations (e.g., what constitutes an unambiguous match can differ between different implementations or even in the same implementation depending on other characteristics of the element). The criteria used to categorize a particular combined degree of similarity as ambiguous, unambiguous, or impossible can be selected by a developer (e.g., programmer) of software that implements the process In other implementations, instead of defining the predicted degree of matching accuracy as a calculation of a level of ambiguity associated with a particular weighting vector, the predicted degree of matching accuracy can be based on feedback from a user. For example, the combined degrees of similarity generally provide composite match results that indicate which pairs of elements between the source and target schemas are likely and/or unlikely to represent actual matches. A user can review a subset (e.g., ten possible matches or 5% of the possible matches) of the total set of possible matches and provide feedback regarding whether the possible matches in the subset represent actual matches. This feedback can be used to modify the weighting vector. For instance, the correct matches identified by the user can be compared with results of the various matching processes to determine correlations (i.e., which matching processes were most likely to predict the correct match). The weighting vector can then be adjusted to more heavily weight the matching processes that showed the greatest correlations. The adjusted weighting vector can then be used to generate new combined degrees of similarity. Thus, the user feedback on a subset of the possible matches provides a measure of a predicted degree of matching accuracy for weighting vectors. 
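The feedback-driven adjustment can be sketched as follows. The text says only that the weighting vector is adjusted to favor the matching processes that correlate best with the user's confirmed matches; the particular update rule here (shift each weight by the gap between a matcher's mean score on approved pairs and on rejected pairs, then renormalize), the `rate` parameter, and the name `tune_weights` are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (source element, target element)


def tune_weights(
    weights: List[float],
    per_matcher: List[Dict[Pair, float]],
    feedback: Dict[Pair, bool],  # True = user approved, False = user rejected
    rate: float = 0.5,           # hypothetical step size
) -> List[float]:
    """Shift weight toward matchers that agree with the user's feedback."""
    adjusted: List[float] = []
    for weight, scores in zip(weights, per_matcher):
        approved = [scores.get(p, 0.0) for p, ok in feedback.items() if ok]
        rejected = [scores.get(p, 0.0) for p, ok in feedback.items() if not ok]
        mean_ok = sum(approved) / len(approved) if approved else 0.0
        mean_bad = sum(rejected) / len(rejected) if rejected else 0.0
        # A matcher that scores approved pairs high and rejected pairs low
        # gets a larger coefficient; weights are kept non-negative.
        adjusted.append(max(0.0, weight + rate * (mean_ok - mean_bad)))
    total = sum(adjusted)
    if total == 0.0:
        return list(weights)  # nothing usable; keep the original vector
    return [w / total for w in adjusted]


name_scores = {("CustNo", "CustomerID"): 0.9, ("CustNo", "OrderID"): 0.2}
type_scores = {("CustNo", "CustomerID"): 1.0, ("CustNo", "OrderID"): 1.0}
feedback = {("CustNo", "CustomerID"): True, ("CustNo", "OrderID"): False}

tuned = tune_weights([0.5, 0.5], [name_scores, type_scores], feedback)
```

In this example the type matcher scores the approved and the rejected pair identically, so it gains nothing, while the name matcher's coefficient grows. The tuned vector can then be used to recompute the combined degrees of similarity.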
The use of user feedback to adjust the weighting vector can be applied iteratively, such that the matching process continuously “learns” how to better predict matches between the particular schemas being mapped. The settings of the weighting vectors are changed according to feedback from the user. The user can influence the weighting coefficients associated with each type of matching. For example, if the user indicates that the matching results based on name or definition are mostly wrong, then the weighting coefficient of the semantic or name matcher will be changed.
User feedback can also be used to fine-tune a weighting vector that is selected from one or more candidate weighting vectors using a calculated level of ambiguity. For example, by identifying a particular weighting vector having a lowest calculated level of ambiguity among a set of predefined weighting vectors, the particular weighting vector can be selected as a “best” candidate for producing matching proposals. The particular weighting vector can then be fine-tuned by adjusting the weighting coefficients based on feedback from a user.
In general, the performance of a particular matching process can be assessed based on certain metrics. The precision of the matching process is a measure of the reliability of the proposed matches and can be calculated as the number of correct matches divided by the total number of proposed matches. The recall of the matching process indicates the percentage of correct matches found and can be calculated as the number of correct matches divided by the number of actual matches. Neither precision nor recall alone, however, provides a good assessment of performance. Generally, high precision can be obtained at the expense of recall, and vice versa. Performance can more accurately be assessed by an overall measurement, which is calculated as:
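As a minimal sketch of these metrics, assume proposed and actual matches are given as sets of (source, target) pairs. Precision and recall follow the definitions stated above. The combined measure shown is the "Overall" measure known from the schema-matching literature, Recall × (2 − 1/Precision); using it here is an assumption and it may not be the exact formula intended by this specification. All function names and sample values are illustrative.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # (source element, target element)


def precision(proposed: Set[Pair], actual: Set[Pair]) -> float:
    """Correct proposed matches divided by all proposed matches."""
    if not proposed:
        return 0.0
    return len(proposed & actual) / len(proposed)


def recall(proposed: Set[Pair], actual: Set[Pair]) -> float:
    """Correct proposed matches divided by all actual matches."""
    if not actual:
        raise ValueError("recall is undefined without actual matches")
    return len(proposed & actual) / len(actual)


def overall(proposed: Set[Pair], actual: Set[Pair]) -> float:
    """Assumed combined measure: Recall * (2 - 1/Precision).

    Computed in the algebraically equivalent form (correct - wrong) / actual,
    which avoids dividing by a precision of zero.
    """
    if not actual:
        raise ValueError("overall is undefined without actual matches")
    correct = len(proposed & actual)
    wrong = len(proposed - actual)
    return (correct - wrong) / len(actual)


# Illustrative data: four proposals, three of them correct, five actual matches.
actual = {("a", "A"), ("b", "B"), ("c", "C"), ("d", "D"), ("e", "E")}
proposed = {("a", "A"), ("b", "B"), ("c", "C"), ("d", "X")}

p = precision(proposed, actual)
r = recall(proposed, actual)
o = overall(proposed, actual)
```

For the sample sets, precision is 3/4, recall is 3/5, and the assumed overall measure is (3 − 1)/5. Under this measure a value can become negative when wrong proposals outnumber correct ones, which reflects the cost of removing false proposals.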
In some implementations, however, it may be unnecessary to calculate a degree of similarity for every source schema element-target schema element pair because some pairs (or entire branches of a schema) may be easily rejected without having to calculate a degree of similarity. For example, a branch of the source schema [...]
As shown in [...]
Which element pairs are identified as likely or possible matches depends on the type of selection algorithm used. A “threshold” selection algorithm identifies all element pairs with a combined degree of similarity over a certain threshold. A “MaxN” type of selection algorithm identifies the n largest combined degrees of similarity, where n is an integer greater than or equal to one, and a “Max Delta” type of selection algorithm identifies: (a) the element pair with the largest combined degree of similarity, and (b) all element pairs having a combined degree of similarity within some delta value of the largest value. These selection algorithms can be combined and/or other selection algorithms can be used.
Depending on the particular implementation, a set of combined degrees of similarity for a specific weighting vector can be used as an initial estimation for predicting matches or can simply be compared to combined degrees of similarity for other weighting vectors to narrow the selection of weighting vectors. In either case, the weighting coefficients are tuned to obtain an improved mapping of the schemas and/or to improve the identification of likely or probable matches.
When multiple weighting vectors are applied to the similarity cube [...]
Each level of the weighting vector similarity cube [...]
In some implementations, tuning (or fine-tuning) is performed by generating new weighting coefficients (e.g., identifying one or more additional candidate weighting vectors) after making an initial selection of a weighting vector.
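The three selection algorithms described above ("threshold", "MaxN", and "Max Delta") can be sketched as follows. This is a minimal illustration that assumes the combined degrees of similarity are held in a dictionary keyed by element pair; the function names, the data layout, and the sample values are hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (source element, target element)


def _ranked(combined: Dict[Pair, float]) -> List[Tuple[Pair, float]]:
    """Element pairs sorted by combined degree of similarity, largest first."""
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


def select_threshold(combined: Dict[Pair, float], threshold: float) -> List[Pair]:
    """All pairs whose combined degree of similarity is over the threshold."""
    return [pair for pair, value in _ranked(combined) if value > threshold]


def select_max_n(combined: Dict[Pair, float], n: int) -> List[Pair]:
    """The n pairs with the largest combined degrees of similarity (n >= 1)."""
    if n < 1:
        raise ValueError("n must be an integer greater than or equal to one")
    return [pair for pair, _ in _ranked(combined)[:n]]


def select_max_delta(combined: Dict[Pair, float], delta: float) -> List[Pair]:
    """The best pair plus every pair within delta of the best value."""
    if not combined:
        return []
    best = max(combined.values())
    return [pair for pair, value in _ranked(combined) if best - value <= delta]


# Illustrative combined degrees of similarity for one source element.
combined = {
    ("name", "title"): 0.85,
    ("name", "label"): 0.80,
    ("name", "code"): 0.55,
    ("name", "id"): 0.16,
}

over_half = select_threshold(combined, 0.5)
top_two = select_max_n(combined, 2)
near_best = select_max_delta(combined, 0.1)
```

Because each function returns a plain list of pairs, the algorithms can be combined, for example by intersecting the result of `select_threshold` with the result of `select_max_delta` so that near-best candidates are kept only if they also clear a minimum similarity.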
For example, when only one weighting vector is initially used to calculate combined degrees of similarity, the weighting coefficients for the weighting vector can be modified or tuned after obtaining the initial results.
As shown in [...]
In some implementations, optional user feedback (as indicated at [...]
In the illustrated example of [...]
The number of ambiguous, impossible, and/or unambiguous matches can be used to calculate a measure of ambiguity. The measure of ambiguity can, in turn, be used to compare the weighting vector used to generate the matching results with other weighting vectors or to otherwise tune the weighting vector (e.g., by comparing the measure of ambiguity with corresponding measures for similar weighting vectors in which the weighting coefficients have been adjusted).
The invention and all of the functional operations described in this specification can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structural means disclosed in this specification and structural equivalents thereof, or in combinations of them. The invention can be implemented as one or more computer program products, i.e., one or more computer programs tangibly embodied in an information carrier, e.g., in a machine-readable storage device or in a propagated signal, for execution by, or to control the operation of, data processing apparatus, e.g., a programmable processor, a computer, or multiple computers. A computer program (also known as a program, software, software application, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file.
A program can be stored in a portion of a file that holds other programs or data, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by a communication network. The processes and logic flows described herein, including the method steps of the invention, can be performed by one or more programmable processors executing one or more computer programs to perform functions of the invention by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus of the invention can be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit). Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. Information carriers suitable for embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. 
The processor and the memory can be supplemented by, or incorporated in special purpose logic circuitry. To provide for interaction with a user, the invention can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input. The invention can be implemented in a computing system that includes a back-end component (e.g., a data server), a middleware component (e.g., an application server), or a front-end component (e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the invention), or any combination of such back-end, middleware, and front-end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), e.g., the Internet. The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The invention has been described in terms of particular embodiments, but other embodiments can be implemented and are within the scope of the following claims. 
For example, the operations of the invention can be performed in a different order and still achieve desirable results. Other embodiments are within the scope of the following claims.