US 20050065955 A1 Abstract A method for providing a polyhierarchical classification includes identifying properties of objects useful for distinguishing objects under classification. A plurality of criteria are identified for specializing the identified properties. A form is chosen for attributive expressions that describe classification categories. The attributive expressions are customizable and encode compositions of object properties in terms of attributes from the plurality of criteria. A domain of applicability is identified for each criterion that is representable by attributive expressions, and a dependence relationship between criteria is defined by the inclusion of attributes in the attributive expressions, where a selected criterion depends on another criterion if its domain of applicability includes at least one attribute by the other criterion. A generating polyhierarchy of criteria is automatically established by the dependence relationships between the criteria. The generating polyhierarchy of criteria implicitly defines an induced polyhierarchy of classification categories.
Claims(31) 1. A method for providing a polyhierarchical classification, the method comprising:
identifying properties of objects considered useful for distinguishing objects under classification; identifying a plurality of criteria for specializing the identified properties of the objects, wherein each criterion of the plurality of criteria is defined by a set of mutually exclusive attributes so that a single classified object can be assigned no more than one attribute by the same criterion; choosing a form of attributive expressions for describing classification categories, wherein the attributive expressions are information structures encoding logical formulas that define compositions of object properties in terms of attributes from the plurality of criteria, while the form of the attributive expressions is customizable; and identifying a domain of applicability for each criterion, wherein the domains of applicability are representable by attributive expressions composed of attributes from other criteria or the empty attributive expression, and a dependence relationship between criteria is defined by the inclusion of attributes in the attributive expressions, wherein a selected criterion depends on another criterion if the attributive expression defining its domain of applicability includes at least one attribute by the other criterion, and a generating polyhierarchy of criteria is automatically established by the dependence relationships between the criteria, wherein,
the attributive expressions identifying domains of applicability of criteria, define corresponding root categories in the polyhierarchical classification, wherein each criterion originates from its respective root category, and
when established, the generating polyhierarchy of criteria implicitly defines an induced polyhierarchy of classification categories without requiring an explicit enumeration of the categories and an ordering between them.
2. The method of 3. The method of 4. The method of 5. The method of 6. The method of 7. The method of 8. The method of 9. The method of 10. The method of during run-time, automatically generating additional attributive expressions that describe corresponding categories for classifying objects, wherein the additional categories are part of the induced polyhierarchy of classification categories but are not necessary for defining the structure of the generating polyhierarchy of criteria. 11. The method of 12. The method of attributive expressions encoding a conjunction of elementary specializations, wherein each elementary specialization is encoded by a single attribute, and no one attributive expression includes more than one attribute from the same criterion; attributive expressions encoding a conjunction of logical terms, wherein each logical term is a disjunction of elementary specializations encoded by attributes from the same criterion, or a negation of an elementary specialization encoded by a complement of an attribute, and no one attributive expression includes codes of more than one disjunctive logical term with the same criterion; attributive expressions encoding a disjunction of logical terms, wherein each logical term is a conjunction of elementary specializations encoded by attributes from different criteria, and no one code of a logical term contained in an attributive expression includes more than one attribute from the same criterion; and attributive expressions encoding a disjunction of logical terms, wherein each logical term is a conjunction of simpler logical terms, wherein each simpler logical term is a disjunction of elementary specializations encoded by attributes from the same criterion, or a negation of an elementary specialization encoded by a complement of an attribute, and no one code of a conjunctive logical term includes codes of more than one simpler disjunctive logical term with the same criterion. 13. The method of 14. 
The method of 15. The method of adding a second polyhierarchy classification to the existing polyhierarchy classification; identifying a root category from which the second polyhierarchy classification is to originate in the existing polyhierarchy classification, wherein the identified root category is defined in terms of attributes from existing criteria, and the dependence relationships between the existing criteria and topmost criteria of the second polyhierarchy automatically incorporate the second polyhierarchy classification into the existing polyhierarchy classification. 16. The method of storing the generating polyhierarchy of criteria as a reusable template classification that is capable of being associated with a set of objects. 17. The method of using the template classification as a prototype classification for constructing a more comprehensive generating classification of criteria. 18. The method of using the template classification or at least one of its components as a component of another polyhierarchy classification. 19. The method of when classifying objects, automatically identifying persistent categories from the induced polyhierarchy of classification categories that serve as containers for the classified objects in the induced polyhierarchy of classification categories; and storing attributive expressions defining the identified persistent categories, wherein all other categories used with the application are capable of being dynamically restored in run-time using the generating polyhierarchy of criteria. 20. 
The method of browsing the polyhierarchy classification; and extracting user-specified sub-hierarchies in the induced polyhierarchy of classification categories, wherein the user-specified sub-hierarchies are automatically restored during run-time using algorithms for retrieving direct child categories and direct parent categories of selected classification categories, wherein the direct child and parent categories of the selected categories are defined by the structure of the generating polyhierarchy of criteria and the form of the attributive expressions. 21. The method of automatically performing tests for inclusion between classification categories to determine whether a general-specific relationship exists between the categories, wherein the algorithm used to test for inclusion depends on the chosen form of the attributive expressions representing the classification categories. 22. The method of intersection of categories; difference of categories; unification of categories; and complement of a category. 23. The method of classifying a set of available objects by associating the objects with attributive expressions defining categories in the induced polyhierarchy of classification categories, wherein software code supporting the polyhierarchy classification provides for an automatic extension of persistent categories serving as containers for the classified objects. 24. The method of 25. 
The method of facilitating an interactive classification of new objects, wherein the interactive classification includes specifying traits of a new object using the criteria of the generating polyhierarchy of criteria, wherein a set of specified traits determines a current specialization level in the polyhierarchical classification, and the polyhierarchical classification: provides automatic recognition of all criteria that are applicable to the new object at the current specialization level, provides random access to all the criteria applicable at the current specialization level, and automatically constructs attributive expressions for the respective persistent categories serving as containers for the classified objects. 26. The method of 27. The method of facilitating an interactive search and retrieval of information on specific objects from the set of classified objects by specifying a set of traits of the specific objects using criteria of the generating polyhierarchy, wherein the set of specified traits determines a current specialization level in the polyhierarchical classification, and the polyhierarchical classification automatically recognizes all criteria applicable to the search at the current specialization level and provides random access to all the applicable criteria. 28. The method of 29. The method of 30. The method of facilitating an automatic search for and retrieval of information on objects pertaining to a particular category of the induced polyhierarchy of classification categories, wherein the category is defined by a dynamically constructed attributive expression, and the polyhierarchical classification provides return information requested using an external application-specific programming environment. 31. The method of Description This application claims the benefit of U.S. provisional patent applications M 1. 
Field of the Invention This invention relates generally to construction and/or description of polyhierarchical classifications, and, in particular, to construction and/or description of computer-stored polyhierarchical multi-criteria classifications with intrinsic recognition of domains of classification criteria applicability and simultaneous (random) access to applicable classification criteria. 2. Description of the Related Art Classification of sets of arbitrary entities such as objects, relations, processes, concepts, subjects, etc, is a basic paradigm used by both the human mind and present-day information technologies for storage, retrieval, analysis and systematization of knowledge. The kernel principle of classification is decomposition of a classified set into a number of classes (categories) in accordance with a system of rules (criteria). If categories are ordered by a directed relationship, such as “abstract-concrete”, “general-specific”, or “parent-child” they form a polyhierarchical structure. The term “polyhierarchical structure” is intended to include both single and multiple inheritance relationships between categories. In other words, a category in a polyhierarchical structure may have one or more than one parent. Polyhierarchical classifications provide a dramatic increase of functionality as compared with classifications constructed without ordering categories by their abstraction level. In fact, the latter can be used only to store, search for, and retrieve information. In contrast, the former creates a well-developed formalism for manipulating systems of interrelated abstract entities, thus providing the ability to process information across different abstraction levels, create new languages, formalisms, concepts, and theories. Persistent polyhierarchical classifications include structures that are relatively stable. 
Persistence of a classification denotes that a set of categories and a system of relationships between them (for example, “general-specific” relationships) must be pre-designed and stored in a permanent descriptive repository. Further extensions and refinements of a persistent classification may include the introduction of new criteria, categories, and relationships. Previously developed parts of a persistent classification ordinarily remain unchanged when extending a classified set, adding new selection options to existing criteria, and introducing new criteria. Moreover, a run-time modification of a persistent classification is generally not permitted. This means, in particular, that the accessible search options, including keywords and ranges of parameters, are permanently stored in the descriptive repository. Persistent classifications are a foundation for collaborative development of general, reusable, and standardized systems. For example, hierarchies of classes, subjects, and aspects in object-oriented (‘OO’), subject-oriented (‘SO’), and aspect-oriented (‘AO’) programming, respectively, are persistent classifications. The classifications used in natural sciences, such as taxonomies of species, classifications of minerals, chemicals, astronomical objects, natural languages, fundamental particles, mathematical abstractions, and countless others, are persistent as well. Classification schemes are used in the vast majority of modern computer-aided information systems such as electronic data repositories, computer modeling environments, expert systems, and many others. In particular, electronic data repositories are increasingly being used to store, search for, and retrieve data. These repositories are capable of storing and providing access to large volumes of information. The Internet is one factor that has contributed to the demand for electronic data repositories and to their proliferation. 
A large number of websites on the Internet, for example, allow users to search through data repositories and retrieve information free of charge. Several well-known examples include websites advertising vehicles available for purchase. These websites typically allow the user to search through the repository by entering search criteria, such as the make of the vehicle, the model, price range, color, and the like. Internet search engines are another example of an application that searches for and retrieves information from an electronic repository. Other applications include catalogues and directories, online documentation, and components of operating systems, as well as countless others. In short, the ability to electronically search for and retrieve information has become essential in a number of different software and commercial environments. Data repositories are often very large in size. Managing, organizing, and classifying the data is essential in maximizing the usefulness of the repository. The usual approach is to organize and manage the repository using a multi-criteria classification scheme, which can be hierarchical and/or persistent depending on the desired functionality. A number of advanced applications work with sets of abstract entities rather than plain data. These applications may include OO, SO, and AO programming environments, as well as component-based software engineering (CBSE) systems, intelligent databases, content management systems, and expert systems. Such applications explicitly use persistent hierarchies of classes, aspects, etc. as formal schemes for defining entities of different abstraction levels, describing relations between them, and manipulating abstract entities rather than specific objects. The use of hierarchical classifications provides a mechanism for logical operations, such as generalization, specialization, and composition of information. 
For example, the OO programming paradigm is based on class hierarchies formed by inheritance relationships. Under this approach, a child class includes the data (instance variables) and functions (class methods) of its parents, along with some additional ones. In other words, the child class is similar to its parents except for some additional features. This creates a so-called abstraction mechanism (i.e., a way of accessing a class object by reference to its abstract parent class with automatic data mapping and class method dispatch). Object-oriented hierarchies can be treated as multi-criteria classifications whose criteria are represented by sets of inheritance relationships sharing common parent classes. Modern approaches to multi-criteria classification schemes generally use representations in terms of trees, directed acyclic graphs (‘DAGs’), compositions of trees, or set based formulas. These approaches, however, do not provide efficient support for development, maintenance, and use of general persistent polyhierarchical classifications. Several disadvantages of present-day multi-criteria classification schemes are discussed below for the case of a simplified classification of automobiles. In The criteria in this example include manufacturer name, model year, engine type, internal combustion (IC) engine family, electric power source, fuel type, gasoline grade, and battery type. Some criteria are applicable to only specific kinds of cars, but not to other types of cars. For example, the “gasoline grade” criterion is applicable only for cars with IC engines requiring gasoline fuel. Likewise, the “battery type” criterion, in this illustrative example, is applicable only for electric cars with battery power sources. Such criteria can be called conditional criteria because their applicability depends on specific selections made under more general criteria. 
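The notion of conditional criteria described above can be illustrated with a minimal sketch. It assumes each criterion is stored together with the selections that must already hold before it applies. Only the conditions for “gasoline grade” and “battery type” come from the text; the option names, the intermediate domains, and the helper names `is_conditional` and `prerequisites` are hypothetical.

```python
# Hypothetical encoding of the car example. Each criterion maps to its
# domain of applicability: the selections that must already hold.
# An empty dict means the criterion applies to all cars.
DOMAINS = {
    "manufacturer name": {},
    "model year": {},
    "engine type": {},
    "IC engine family": {"engine type": "internal combustion"},   # assumed
    "fuel type": {"engine type": "internal combustion"},          # assumed
    "gasoline grade": {"fuel type": "gasoline"},
    "electric power source": {"engine type": "electric"},         # assumed
    "battery type": {"electric power source": "battery"},
}


def is_conditional(criterion):
    """A criterion is conditional if its domain of applicability is non-empty."""
    return bool(DOMAINS[criterion])


def prerequisites(criterion):
    """All criteria that `criterion` depends on, directly or indirectly."""
    found = set()
    pending = list(DOMAINS[criterion])
    while pending:
        parent = pending.pop()
        if parent not in found:
            found.add(parent)
            pending.extend(DOMAINS[parent])
    return found


if __name__ == "__main__":
    print(is_conditional("model year"))             # False
    print(sorted(prerequisites("gasoline grade")))  # ['engine type', 'fuel type']
```

Under these assumptions, “gasoline grade” depends on “fuel type” directly and on “engine type” indirectly, while “model year” has no prerequisites at all.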
Information on available cars in a hypothetical electronic data repository may be organized and searched based on the criteria shown. For example, data entries related to Toyota cars manufactured in 2003 with internal combustion piston engines fueled with regular gasoline should be classified under node Unfortunately, the tree-structured hierarchical classification scheme Another disadvantage of tree-type hierarchies is the mutual exclusivity of subcategories corresponding to different selection options of a criterion. When a category of objects is specialized by a criterion, only one of the available options is selectable (i.e., different options are considered to be mutually exclusive). This may be confusing, for example, if a feature defined by a lower-ranking criterion is equally applicable for several options of higher-ranking criteria. For example, cars with internal combustion engines in the classification These disadvantages arise, at least in part, due to the conjunctive logical structure of tree hierarchies. Elementary specializations performed by selecting options by different criteria describe a set of traits connected by the logical operator ‘AND’. For example, node Another disadvantage of tree-structured classifications relates to fast multiplication of sub-trees with increases in simultaneously applicable criteria. Continuing with the example of Furthermore, a more comprehensive specialization of technical characteristics of piston engines (ICP) may require introduction of at least three more criteria: “ICP family”, “number of cylinders” and “cylinders volume range” with approximately 6 to 8 options each. In this case, the sub-tree starting from the criterion “fuel type” would be repeated 20,000,000 to 50,000,000 times. Finally, a full-scale commercial version of the car classification would implement about 70 criteria in total, and the respective tree structure would contain an astronomical number of nodes. 
A vast majority of these corresponding categories are intermediate abstract categories and empty leaf categories because there are only a limited number of different car models in the world. However, to support the appropriate sequences of transitions between categories and retrievals of respective criteria, in most cases, a large percentage of the intermediate nodes must be enumerated and stored. Therefore, such a structure would become unmanageable due to the amount of data stored in a repository or incorporated in a computer program to support the tree hierarchy. Directed acyclic graphs (‘DAGs’), which can be viewed as a generalization of trees, are one approach used to reduce the aforementioned predefined path problem. Similar to trees, DAGs represent hierarchical classifications as category sets strictly ordered by directed relationships, such as “abstract-concrete”, “general-specific”, “parent-child”, etc. However, in contrast to trees, DAGs allow each category to have more than one parent (i.e., DAGs utilize the so-called multiple inheritance concept). A search may be started with any of three criteria, “manufacturer name”, “model year”, or “engine type”, applicable to all cars. After a selection, the search progresses with the remaining originally applicable criteria (if any), as well as with other criteria that may become applicable due to the selection just made, and so on. For example, if “internal combustion” of the criterion “engine type” is selected, the next selection available includes one of the remaining criteria “model year”, “manufacturer name”, or the new criterion “IC engine family” applicable to all the cars with IC engines. In contrast to trees, DAGs provide simultaneous (random) access to all currently applicable criteria, and a sequence of selections corresponds to a particular path on the graph. 
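The selection behaviour just described can be sketched in a few lines. This is an illustration only: it computes the currently applicable criteria from per-criterion applicability rules rather than by traversing a stored graph, and the `DOMAINS` table and the function name `applicable_criteria` are assumptions introduced for the example.

```python
# Hypothetical applicability rules for the four criteria named in the text.
# An empty dict means the criterion is applicable to all cars.
DOMAINS = {
    "manufacturer name": {},
    "model year": {},
    "engine type": {},
    "IC engine family": {"engine type": "internal combustion"},
}


def applicable_criteria(selections):
    """Criteria not yet used whose applicability conditions are all met."""
    result = []
    for criterion, domain in DOMAINS.items():
        if criterion in selections:
            continue  # a selection by this criterion has already been made
        if all(selections.get(k) == v for k, v in domain.items()):
            result.append(criterion)
    return sorted(result)


if __name__ == "__main__":
    # Before any selection, only the globally applicable criteria are offered.
    print(applicable_criteria({}))
    # After choosing an IC engine, "IC engine family" becomes available.
    print(applicable_criteria({"engine type": "internal combustion"}))
```

Because the state is just the set of selections made so far, the same state is reached regardless of the order in which the selections were entered, which is the property that distinguishes this kind of access from a single predefined path in a tree.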
For example, the vertex Directed acyclic graph structured polyhierarchical classifications resolve the predefined path problem at the expense of an even more dramatic increase in the amount of descriptive data. To provide a full variety of possible selection sequences, all meaningful combinations of options from different criteria, and all possible transitions between them, must be represented by graph vertices and edges. To illustrate by example, a topmost sub-graph reflecting only five globally applicable criteria of the car classification: “manufacturer name”, “model year”, “brand”, “exterior category”, and “price range”, would contain 167,706 vertices and 768,355 edges. Due to the large amount of mandatory stored data, DAG representations are not practical for a vast majority of applications. Like tree-type hierarchies, DAGs also suffer from the mutual exclusivity of different selection options of a criterion, discussed above. Thus, logical disjunctions of traits are not allowed when developing and using DAG-structured polyhierarchical classifications. Directed acyclic graphs introduce an additional limitation in relation to testing for the “parent-child” relationships between mutually distant categories. In A DAG is usually stored in a computer as an array of vertices, where each vertex is supplied with lists of its immediate parents and children. Continuing with the example shown in To reduce the described problems with trees and DAGs, modern “synthetic” classification methods use compositions of multiple trees, changing the most preferable criteria for each tree. In particular, this approach may be implemented via the concept of “faceted classification”. The classification aspects represented by different facets are mutually exclusive and collectively form a description of object properties identifying classification categories. 
Mutual exclusivity of aspects means that a characteristic represented by a facet does not appear in another one. In this example, the sample classification When performing a search, a selection may be made from the facets in arbitrary order. For example, a selection may specify internal combustion engine (node Unfortunately, faceted classifications include a number of limitations. For example, faceted classification methods require splitting a classification into a set of independent hierarchies, which hides domains of criteria applicability. In the illustrative example of These techniques are used to describe multi-level systems of relationships between finite sets of units characterized by their relations to other units but not by their internal properties, and, in particular, to establish domains of facet applicability. Advanced FKR methods are capable of representing sophisticated systems of relationships, but when implemented for constructing complex polyhierarchical classifications based solely on “general-specific” relations, they become inconvenient for practical implementations due to the large number of auxiliary data structures. Such an approach becomes exasperating for the developer because it requires manipulating highly abstract concepts, but does not offer a clear logical approach to building classification. In addition, faceted classifications do not automatically provide a persistent polyhierarchical structure of a classification. In fact, faceted classifications implement persistent inheritance relationships only within separate facets. The final classification categories are formed dynamically in run-time and are described by combinations of independently specified properties. If some facets are not globally applicable, a global polyhierarchical structure is not defined unless supplementary rules for defining compatibility and priority of headings from different facets are introduced. 
For example, it is not possible to check directly whether the category “Toyota cars fueled with gasoline”, defined by a composition of the headings Moreover, in practical cases, it can be difficult to appropriately separate classification aspects for representation by a set of independent hierarchies. One approach is to build a relatively small number of large multi-criteria facets. If, for example, the facets “fuel type” and “battery type” shown in Smaller facets generally improve flexibility of the classification. If, for example, the criteria “IC engine family” and “electric power sources” are extracted and represented as independent facets, they may then be suitable for use in wider contexts. This classification design, however, would result in further encumbering supplementary data structures or program codes defining applicability and consistency of facets in terms of roles or purposes of facets, meta-facets, etc. Therefore, a classification developer has to find an optimal design that reduces the complexity of both individual facets and rules of their interactions (i.e., satisfy two contradictory requirements). In practice, the solution to this problem may be difficult or nonexistent. As a result, many faceted classification tools do not include mechanisms for the control of applicability and consistency of facets, thus creating an opportunity for errors when developing and using the classification tool. Other techniques of tree or DAG compositions are unified by the concepts of “separation of concerns” (‘SOC’) and “multi-dimensional separation of concerns” (‘MDSOC’). These approaches are currently used for building software engineering environments for subject and aspect oriented programming (‘SOP’ and ‘AOP’, respectively) and subject oriented databases (‘SOD’). SOC, for example, has been developed as a supplementary tool for existing OO programming languages, such as C++, Java, and Smalltalk. 
In an attempt to solve the predefined path problem, these approaches introduce one or more additional tree-structured hierarchies, similar to the unified modeling language (‘UML’) class diagrams that provide crosscutting access to categories of the dominant class hierarchy. In other words, different trees representing areas of concern are built and associated with the dominant tree of classes. In one example, SOC allows a developer to build any number of overlapped trees associated with the same set of classes. A set of user-defined composition rules describes application-specific scenarios of the class method dispatch and data mapping. MDSOC supports composing concerns into tree-structured hyperslices considered hyperplanes in the hyperspace of concerns, thus allowing so-called “multiple classifications” based on compositions of concerns. SOC and MDSOC are specialized approaches intended solely for efficient non-invasive extension of object-oriented computer codes while keeping the advantages of the object-oriented inheritance mechanism. They cannot realistically be considered as general principles for constructing complicated polyhierarchical classifications with dynamically retrieving particular sub-hierarchies in run time. For instance, both concerns and hyperslices are typically tree-structured hierarchies. Generation of a new hyperslice is a static procedure since it requires additional programming, recompiling, and re-debugging the code. In addition, the composition rules used for defining hyperslices depend on specific features of the basic object-oriented environment and descriptions of particular software system units. Structure of the dominant object-oriented class hierarchy imposes restrictions on construction of auxiliary hierarchies since the latter must refer to the same classes, instance variables, and class methods. This problem is commonly referred to as “tyranny of dominant concern”. 
If a classification scheme uses some heuristic criteria that cannot be formally derived from the existing source code, module configurations, and the like, then a comprehensive description of additional composition rules has to be manually developed. In general cases, it is expected to be an arduous job that should require a great deal of professional expertise. Moreover, due to their narrow specialization, SOC and MDSOC use comprehensive descriptive structures, such as sets of sub-trees describing concerns and hyperslices, rules of class method dispatch, and the like, which are unnecessary for the classification purpose itself. Even after removing the object-oriented specific components and leaving only descriptions of inheritance relationships, dependencies would not allow SOC or MDSOC to be implemented for real-world polyhierarchical classifications due to the amount of programming work and computer resources required for development, storage, and maintenance. Another classification approach is based on using set-theoretic operations and logical formulae for building a classification in run-time. These approaches generally use the concept of “set based classification”. They are typically implemented in the so-called dynamic classification tools, as well as in the rough sets theory and granular computing methods intended for machine learning and data mining. A set based classification typically uses an information table containing attributive descriptions of properties of classified objects. Table cells contain the attributes defining respective car characteristics, where each relevant attribute corresponds to one of the available selection options. The set of attributes from a table row exhaustively specifies a composition of characteristics definable by the eight-criteria classification. The attributes can be represented not only by enumerated identifiers but also by loose keywords or numerical parameters taking values from a continuous range. 
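The information table described above can be sketched as a list of rows, each holding one attribute per criterion. The rows, the option names, and the function name `select` are hypothetical and chosen only for illustration; `None` stands for a cell where the criterion does not apply to the object.

```python
# Hypothetical information table: one row per object, one column per criterion.
TABLE = [
    {"id": 1, "manufacturer name": "Toyota", "model year": 2003,
     "engine type": "internal combustion", "gasoline grade": "regular",
     "battery type": None},
    {"id": 2, "manufacturer name": "Toyota", "model year": 2001,
     "engine type": "electric", "gasoline grade": None,
     "battery type": "NiMH"},
    {"id": 3, "manufacturer name": "Ford", "model year": 2003,
     "engine type": "internal combustion", "gasoline grade": "premium",
     "battery type": None},
]


def select(rows, exact=None, ranges=None):
    """Return ids of rows matching all exact attributes and numeric ranges."""
    exact = exact or {}
    ranges = ranges or {}
    matched = []
    for row in rows:
        # Enumerated attributes must match exactly.
        if any(row.get(k) != v for k, v in exact.items()):
            continue
        # Numerical parameters must fall inside the closed interval [lo, hi].
        if any(row.get(k) is None or not (lo <= row[k] <= hi)
               for k, (lo, hi) in ranges.items()):
            continue
        matched.append(row["id"])
    return matched


if __name__ == "__main__":
    print(select(TABLE, exact={"manufacturer name": "Toyota"}))
    print(select(TABLE, ranges={"model year": (2002, 2004)}))
```

Even in this three-row table, a third of the "gasoline grade" and "battery type" cells are empty, because each of those criteria applies to only some of the objects.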
A search may be conducted that includes the selection of discrete attributes and ranges of attributive numerical parameters in arbitrary order. At each stage of selection, the repository management system retrieves a set of all objects having the specified subset of attributes. For example, using the table in Moreover, set based classifications permit retrieval of specific subsets defined by arbitrary compositions of set-theoretic operations, such as intersection, unification, and difference of subsets. When performing a search, compositions may be represented in terms of logical combinations of constraints imposed on the attributes. For example, the following illustrative formula may be used when searching the table Unfortunately, set based classifications are a specialized approach not generally applicable for development of real-world polyhierarchical classifications. The approach does not imply the existence of a global persistent polyhierarchy. For example, when performing a search with a dynamic classification tool, each category is described by a user-specified logical formula without any relation to other categories. Rough sets and granular computing based systems automatically build hierarchies of the so-called decision rules expressed in terms of logical formulae. However, these hierarchies are intended solely for making particular conclusions based on statistical correlations between properties of available objects, rather than for building pre-designed multi-criteria categorizations. They are not persistent because their structure depends on available sets of objects listed in the information table. Moreover, because of tree structuring, the decision rule hierarchies restore both predefined path and category multiplication problems. Information tables do not use domains of criteria applicability. In a typical case, many criteria will only be applicable to a few of the objects, thus resulting in numerous empty or “N/A” cells. The more conditional (i.e. 
locally applicable) criteria that are used, the greater the percentage of empty cells. As a result, when storing information on qualitatively diverse objects, information tables become very inefficient. Moreover, the lack of automatic control of criteria applicability creates an opportunity for errors during data input into the information table. In fact, when describing a new object with conventional classifications, a data entry person manually selects all the criteria applicable to the object and enters attributes for those criteria. In a real-world application, a classification can use dozens or even hundreds of criteria, while only a few of the criteria may be applicable to a particular object. Without the advantage of automatic recognition of criteria applicability, correct data input becomes unmanageable. For example, if a classification does not provide automatic recognition of criteria applicability, some applicable criteria may be missed, or attributes by non-applicable criteria may be mistakenly entered. Recently developed advanced search systems, such as the Universal Knowledge Processor (‘UKP’), which uses the ‘dynamic taxonomies’ technique (described in Italian Patent No.: 01303603), combine faceted and set based classification approaches. When interactively searching for information, the dynamic taxonomies provide a graphic user interface that allows for specializations to occur using different facets while concurrently performing set-theoretic operations between them. However, this approach inherits disadvantages of both set-based classifications, such as lack of a pre-designed global polyhierarchy and dependence on the amount of available data, and faceted classifications, such as predefined path and sub-tree multiplication problems. Its range of applicability is therefore limited. 
It cannot be used, for example, for non-interactive retrieval of information, for manipulating abstract categories without reference to available objects, or for describing diverse sets of objects. What is needed, therefore, is a more general approach to the construction of hierarchical classifications that may provide, for example, the following set of features:
- 1. Global polyhierarchical system of classification categories supporting intrinsic recognition of domains of criteria applicability and simultaneous (random) access to all the applicable criteria;
- 2. Persistence of the polyhierarchy and, in particular, invariance of its previously developed part with respect to extension of the classified set, addition of new selection options to existing criteria, and introduction of new classification criteria;
- 3. Compactness of descriptive data structures that provide the ability to avoid cumulative multiplication of explicitly enumerated and mandatory stored classification categories, as well as interrelations between them, or other descriptions;
- 4. Support for set-theoretic operations, including intersections, unifications, complements and differences of sub-categories;
- 5. Efficient realization of the algorithm of testing categories for distant inheritance relationships; and/or
- 6. Conceptual simplicity of the design process, as well as further unplanned extensions and refinements.
The present invention is directed to overcoming, or at least reducing the effects of, one or more of the problems set forth above.
In one aspect of the present invention, a method for providing a polyhierarchical classification is provided. The method includes identifying properties of objects considered useful for distinguishing the objects under classification. A plurality of criteria are identified for specializing the identified properties of the objects. Each criterion of the plurality of criteria is defined by a set of mutually exclusive attributes so that a single classified object can be assigned no more than one attribute by the same criterion. A form is chosen for attributive expressions that describe classification categories. The attributive expressions are information structures encoding logical formulas that define compositions of object properties in terms of attributes from the plurality of criteria, and the form of the attributive expressions is customizable. A domain of applicability is identified for each criterion. The domains of applicability are representable by attributive expressions composed of attributes from other criteria or the empty attributive expression, and a dependence relationship between criteria is defined by the inclusion of attributes in the attributive expressions, where a selected criterion depends on another criterion if the attributive expression defining its domain of applicability includes at least one attribute by the other criterion. A generating polyhierarchy of criteria is automatically established by the dependence relationships between the criteria. In the generating polyhierarchy of criteria, the attributive expressions identifying domains of applicability of criteria define corresponding root categories, and each criterion originates from its respective root category. When established, the generating polyhierarchy of criteria implicitly defines an induced polyhierarchy of classification categories without requiring an explicit enumeration of the categories and an order between them. 
These and other objects of the present invention will become apparent to those of skill in the art upon review of the present specification, including the drawings and the claims. The invention may be understood by reference to the following description taken in conjunction with the accompanying drawings, in which the leftmost significant digit(s) in the reference numerals denote(s) the first figure in which the respective reference numerals appear, and in which: While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and are herein described in detail. It should be understood, however, that the description herein of specific embodiments is not intended to limit the invention to the particular forms disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the appended claims. Illustrative embodiments of the invention are shown in Various illustrative embodiments of the present invention offer general, straightforward, mathematically rigorous approaches to construction of polyhierarchical classifications with intrinsic recognition of domains of criteria applicability and simultaneous (random) access to applicable classification criteria. Classifications in accordance with the present invention alleviate ambiguities and limitations that arise when constructing conventional classifications in terms of trees, directed acyclic graphs (DAGs), compositions of trees (facets), and information tables. A new approach in accordance with various illustrative embodiments of the present invention is based on the introduction of a kernel system of classification criteria that complies generally with the following guidelines: -
- Each criterion uniquely defines a particular disjoint decomposition of a classification category into a denumerable set of more specific subcategories;
- A domain of definition, (i.e., area of applicability) of each criterion is explicitly defined by composition of classifications by some more general criteria; and
- Subsets of criteria sharing a common domain of definition do not have to be mandatorily ordered by rank or any other property.
Decomposition by a particular criterion is associated with a denumerable set of the criterion's branches identified by respective distinct symbols, such as numbers, verbose names, database records, and the like. Any meaningful ordered pair (criterion, branch) denoting an elementary specialization is called an (elementary) attribute assigned by the corresponding criterion. Hence, each criterion is responsible for the specialization of a particular property of an object by specifying a value of the respective discrete attribute. Since any criterion has its own range of definition (i.e., domain of applicability), in accordance with the second rule above, that range is specified by attributes appointed by more general criteria, and so forth. In one embodiment, a criterion cannot be applied until a specialization by the more general criteria defining its domain of applicability is made, so one can say that a selected criterion depends on those more general criteria. Therefore, a recurrent sequence of the criteria forms a polyhierarchical structure established by the directed non-reflexive relation of criteria dependency and is called the generating polyhierarchy of criteria. Classification categories are implicitly identified as attributive expressions encoding compositions of elementary specializations represented in terms of attributes from different criteria. 
Depending on the required functionality of the target classification, the categories can be identified by (1) simple collections of attributes implying logical conjunction of elementary specializations encoded by attributes from different criteria, (2) collections with branch unions allowing, in addition, logical disjunction of elementary specializations encoded by attributes from the same criterion, (3) unions of simple collections encoding arbitrary logical statements on object properties representable in terms of elementary specializations by criteria using conjunctions, disjunctions, differences, and negations, or (4) other application-specific attributive structures encoding logical statements on object properties in terms of elementary specializations. These categories form an induced polyhierarchy of categories that is established by the directed relation of implication of logical statements on object properties represented by the respective attributive expressions. If criteria of the generating polyhierarchy are semantically related, some classification categories can appear to be identically empty. However, this does not restrict possibilities of application of various illustrative embodiments of the methods according to the present invention. The generating polyhierarchy implicitly and unambiguously defines the induced polyhierarchy, thus making redundant an explicit description of the equivalent DAG. The generating polyhierarchy is an independent re-usable information structure serving as a template classification for structuring information. In general, the generating polyhierarchy may be further applied to a number of classified sets, included in more general classifications as a component, or used as a prototype for more comprehensive classifications. 
The generating polyhierarchy provides a compact representation of the target classification, while requiring neither enumeration nor storage of a vast majority of the classification categories. For practical applications it is usually sufficient to store only:
- the root categories defining domains of applicability of the criteria;
- non-empty leaf (most derived) categories serving as containers for the classified objects; and
- non-empty abstract categories emerging if some objects have an incomplete description (i.e., they cannot be assigned attributes from some applicable criteria due to, for example, incomplete knowledge of their properties).
The basic operations such as selection by a superposition of criteria, retrieval of parent and child categories, tests for the pertinence of a given category to another one, and set-theoretic operations of intersection, unification, and complement (difference of subsets), can be performed directly in terms of the attributive expressions. Due to the reduction of the stored descriptive data structures, and the specifically non-local nature of that description, the managing algorithms appear to be quite simple and straightforward. To perform basic operations, such as database access, operations on attributive expressions, and user interface, reusable non-application specific software code may be developed to support using and managing the polyhierarchy classification. The functionality of the supporting software depends on the form of the attributive expressions (e.g., simple collections, collections with branch unions, unions of simple collections, or a custom form of attributive expressions) and the configuration of the data repository used to store the generating polyhierarchy of criteria and the persistent categories of the induced polyhierarchy of classification categories. However, unlike with conventional classification methods, the software code does not depend on application-specific features of the polyhierarchical classification and the complexity of the classification. The various illustrative embodiments of the present inventive methods offer a general tool for constructing polyhierarchical classifications that: -
- Describe general persistent polyhierarchical structures of dependencies that cannot be efficiently represented in terms of trees, general DAGs, or their compositions;
- Are automatically produced by generating polyhierarchies of criteria that can be developed and managed as primary reusable information structures separated from the target polyhierarchy of categories;
- Are highly flexible with respect to extending classified sets, introducing new kinds of classified items and classification criteria;
- Substantially reduce or eliminate programming work usually required for developing and managing classifications;
- Do not depend on specific features of a processing environment such as hardware configuration, operating system or database structure;
- Reduce the amount of hardware resources required for development, maintenance and use of client data repositories due to the dramatic simplification of descriptive structures and managing algorithms;
- Allow mathematically rigorous and clearly understandable (“look-and-feel”) ways of design that do not require special knowledge;
- Provide a natural approach to development of intelligent and flexible graphic user interfaces;
- Can be efficiently implemented with existing database management systems; and
- Create a new basic formalism for describing existing and building next-generation taxonomical systems as well as for developing software/middleware engineering environments.
Various illustrative embodiments of the present inventive methods have potential applications and intended uses such as the design, development, maintenance, and use of any hierarchically structured data repositories including (but not limited to):
- Taxonomical, expert, content management, machine learning, and artificial intelligence systems;
- Data and knowledge bases;
- Intelligent control systems and robots;
- Software and middleware engineering environments;
- Application-specific lists, catalogues, and directories;
- Components of operating systems (file and folder catalogues, registry, and the like);
- Internet search engines;
- Descriptive structures of object-, subject-, and aspect-oriented computer programs and compilers (specifically, when intensively using multiple inheritance); and
- On-line documentation and help subsystems.
One preferable illustrative embodiment of a method according to the present invention features the integration of additional descriptive data structures, such as connected lists of criteria, attributes, branches, root and non-empty categories, and the like into existing databases. This allows, for example, the use of standard and/or built-in database management systems for developing, maintaining, and using the resulting classifications. Let ‘A’ be a finite or an infinite set of unspecified objects. A classification of objects ‘a∈A’ may be built as a hierarchical decomposition of A into a system of subsets (categories of classification) using a system of loose specialization rules (criteria of classification). A simple case is a classification by a single criterion. The set A may be partitioned into mutually disjoint categories A(i) using some loose rule (criterion):
The partitioning above is equivalent to introducing a function attr(a) on the set A that takes integral values from 1 to N depending on the subset A(i) that the element a∈A belongs to:
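The displayed formulas for the partition and for the function attr do not survive in this text. A reconstruction from the surrounding definitions, offered as an assumption rather than as the original notation, with N denoting the number of categories generated by the criterion, is:

```latex
A=\bigcup_{i=1}^{N}A(i),\qquad A(i)\cap A(j)=\varnothing\quad\text{for } i\neq j,
\qquad\text{and}\qquad
\operatorname{attr}(a)=i\iff a\in A(i),\quad 1\le i\le N .
```

Under this reading, every element a∈A belongs to exactly one category A(i), which is why attr is single-valued.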
This partitioning may be considered a classification by criterion C, criterion C being defined by the function attr, and categories A(i) are generated by the criterion C. Distinct values attr(a)=i are called branches of the criterion C, and ordered pairs (C, i) are called attributes in the sense that these represent properties of elements a∈A distinguished under classification by the criterion C. The number of branches of a criterion is called its cardinality. Due to the unambiguousness of the function attr, attributes (C, i) are mutually exclusive for any given C (i.e., no single element a∈A may be assigned more than one attribute by any particular criterion). In addition, the numeric identification of branches (i=1, ..., N) is used here only for notational convenience. In practical implementations of various illustrative embodiments of the methods claimed herein, the branches of criteria may be represented by any unordered but denumerable collections of distinct symbols, such as verbose names, references to database records, binary strings, programming entities, and the like. Note that in practical implementations, it may sometimes be convenient to introduce criteria of cardinality N=1 that generate a single category, identical to the subset under classification. The use of such criteria does not impair the logic of further considerations nor limit possibilities of application of various illustrative embodiments of the methods according to the present invention. Practical cases typically require concurrent use of several classification criteria C [...] Now, classifications generated by superpositions of criteria may be considered. For example, the inclusion a∈A [...] This partitioning represents a “two-parameter” classification of the set A, as illustrated by [...] Classifications generated by superposition of more than two criteria may be built in a similar way. 
The inclusion a∈A Each of these partitionings, unambiguously defined by the collection of criteria numbers {p(s)}, represents an “L-parameter” classification of the set A. In the above-described formal classification scheme, criteria of classification are not ordered by any rank or other feature. This means that the resulting system of categories, as well as any algorithms using the resulting system of categories, are invariant with respect to the transposition (renumbering) of the criteria. Note that if criteria C Note that the above-described scheme is directly applicable to cases of infinite denumerable sets of criteria {C In practical applications, many useful classification criteria are applicable not to the whole set A, but only to some of its subsets. In this case, those subsets (criteria domains of applicability) are explicitly described by attributes from other criteria; (i.e., the domains of applicability are themselves categories of classification). Conditional criteria may be introduced that are applicable to those, and only those, elements a∈A that have attributes {C Note that if a classification uses some semantically related criteria whose root categories overlap, then some combinations of attributes may correspond to contradictory descriptions of object properties. This means that such categories would be identically empty sets by design. An example of such a case is the classification of substances by two criteria “phase state” and “magnetic properties” considered in the previous section “Illustrative Embodiment of a Classification by a System of Criteria”. However, that does not hinder further consideration of such categories nor limit methods of application of such categories. The subsets of attributes from different criteria defining categories of a classification are called simple collections. Since a root category of a conditional criterion is defined through other criteria, the construction of a criteria system is essentially recurrent. 
First, criteria may be introduced on the whole set A (the most general category). Then the categories formed by attributes from those criteria can be used for introducing additional conditional criteria. As a result of assigning attributes by those additional criteria, new categories are formed that can be used as roots for introducing yet other criteria, and so forth. A directed binary relation of dependence between conditional criteria may be introduced. We will say that criterion C [...] $C_{u}$, and is transitive, that is, from $C_{u}\subset C_{v}$ and $C_{v}\subset C_{w}$, it follows that $C_{u}\subset C_{w}$. Combination of these properties guarantees the absence of loops (cyclic paths) in the system of all possible relations of dependence defined on a set of criteria.
For the purpose of illustration, a subset of independent criteria may be considered whose shared root category is the whole set A. In this case, an additional imaginary criterion C [...] It may be observed that the introduction of the imaginary criterion C [...] It is easy to show that categories generated by a polyhierarchy of criteria form a polyhierarchy themselves, with a directed binary relation of inclusion, starting from one topmost category A. The inclusion relation A [...] Categories related to a given category by relations of inclusion and differing from it by only one attribute may be considered either immediate parent (immediate base) categories or immediate child (immediate derived) categories, depending on the direction of the inclusion relation. To prove the existence of a global polyhierarchical structure of a plurality of categories and give a guideline for practical implementations, three tasks are considered below: 1) find all immediate parent (base) categories of a given category; 2) find all immediate child (derived) categories for a given category; and 3) determine whether one of two given categories is a more general category than the other (i.e., check if they are related by inclusion). Consider, for example, any given category A [...] $C_{p(m)}$, $s=1,2,\ldots,L$. This criterion, $C_{p(m)}$, is called a leaf criterion of category $A_{\{p(s)\}}\{i_{s}\}$. For example, in [...], the category $A_{\{1,2,4\}}\{2,2,2\}$ defined by simple collection $\{(C_{1}, 2), (C_{2}, 2), (C_{4}, 2)\}$ has only one leaf criterion $C_{4}$, since $C_{1}\subset C_{4}$ and $C_{2}\subset C_{4}$ while $C_{4}\not\subset C_{1}$ and $C_{4}\not\subset C_{2}$. The category $A_{\{1,2\}}\{2,2\}$ defined by simple collection $\{(C_{1}, 2), (C_{2}, 2)\}$ has two leaf criteria $C_{1}$ and $C_{2}$, since $C_{1}\not\subset C_{2}$ and $C_{2}\not\subset C_{1}$.
If C [...] $\{(C_{p(s)}, i_{s})\}\supset\{(C_{q(t)}, k_{t})\}$. Because the immediate base category $A_{\{q(t)\}}\{k_{t}\}$ has fewer attributes, it corresponds to a more abstract classification level.
Thus, for each category A [...] A free criterion of a given category A [...] $\mathrm{root}(C_{f})$ and $f\notin\{p(s)\}$). For example, in [...], the categories $A_{\{2\}}\{2\}$ and $A_{\{1,2,4\}}\{2,2,2\}$ each have one free criterion, $C_{1}$ and $C_{5}$ respectively, since $A_{\{2\}}\{2\}\subset A=\mathrm{root}(C_{1})$ and $A_{\{1,2,4\}}\{2,2,2\}=\mathrm{root}(C_{5})$. Similarly, the topmost category A has two free criteria $C_{1}$ and $C_{2}$, since $\mathrm{root}(C_{1})=\mathrm{root}(C_{2})=A$. The sets of leaf criteria and free criteria of a given category do not intersect, since the former participate in the attributes forming the respective simple collection, while the latter do not. By adding one of the free attributes $(C_{f}, i_{f})$, $1\le i_{f}\le N_{f}$, to the simple collection of the category $A_{\{p(s)\}}\{i_{s}\}$, an immediate derived category $A_{\{r(t)\}}\{n_{t}\}$ is produced with one more attribute: $\{(C_{r(t)}, n_{t}),\ 1\le t\le L+1\}=\{(C_{p(s)}, i_{s}),\ 1\le s\le L\}\cup\{(C_{f}, i_{f})\}$. The immediate derived category $A_{\{r(t)\}}\{n_{t}\}$ is related to the original one by inclusion: $A_{\{p(s)\}}\{i_{s}\}\supset A_{\{r(t)\}}\{n_{t}\}\Leftrightarrow\{(C_{p(s)}, i_{s})\}\subset\{(C_{r(t)}, n_{t})\}$. Since the immediate derived category $A_{\{r(t)\}}\{n_{t}\}$ has more attributes than the given category $A_{\{p(s)\}}\{i_{s}\}$, it corresponds to a more concrete classification level. Thus, for each category with a non-empty set of free criteria there is a set of immediate derived categories, and their number exactly equals the sum of cardinalities of free criteria of the given category.
In addition, the problem of matching two given different categories A [...] $\{(C_{p(s)}, i_{s})\}\subset\{(C_{q(t)}, j_{t})\}$ (i.e., $L_{1}<L_{2}$, $p(s)=q(s)$ and $i_{s}=j_{s}$ for $s=1,2,\ldots,L_{1}$). Therefore, the solution of this problem amounts to a mere comparison of two attribute sets forming respective simple collections.
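The first two tasks can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: simple collections are modelled as frozensets of (criterion, branch) pairs, root(C5) follows the example in the text, and root(C4), the cardinalities, and all function names are assumptions made for the sketch. Criterion C3 is omitted because its root category is not stated here.

```python
# Root category of each criterion, written as a simple collection.
# root(C5) follows the text; root(C4) is an assumption for illustration.
ROOTS = {
    "C1": frozenset(),
    "C2": frozenset(),
    "C4": frozenset({("C1", 2), ("C2", 2)}),
    "C5": frozenset({("C1", 2), ("C2", 2), ("C4", 2)}),
}
# Hypothetical numbers of branches per criterion.
CARDINALITY = {"C1": 2, "C2": 3, "C4": 2, "C5": 2}


def criteria_of(collection):
    """Criteria that contribute an attribute to a simple collection."""
    return {criterion for criterion, _ in collection}


def leaf_criteria(category):
    """Criteria of the category on which no other criterion of it depends."""
    used = criteria_of(category)
    return sorted(
        c for c in used
        if not any(c in criteria_of(ROOTS[other]) for other in used if other != c)
    )


def immediate_parents(category):
    """Drop the attribute of one leaf criterion at a time."""
    leaves = set(leaf_criteria(category))
    return [category - {attr} for attr in sorted(category) if attr[0] in leaves]


def free_criteria(category):
    """Unused criteria whose root collection is contained in the category's."""
    used = criteria_of(category)
    return sorted(c for c, root in ROOTS.items()
                  if c not in used and root <= category)


def immediate_children(category):
    """Add one attribute of one free criterion at a time."""
    return [
        category | {(c, branch)}
        for c in free_criteria(category)
        for branch in range(1, CARDINALITY[c] + 1)
    ]
```

With these assumed roots, the topmost category (the empty collection) has free criteria C1 and C2, so it has 2 + 3 = 5 immediate derived categories, matching the rule that their number equals the sum of the cardinalities of the free criteria.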
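The third task, testing two categories for a distant inheritance relationship, reduces to a subset comparison. A minimal sketch, assuming simple collections are stored as frozensets of (criterion, branch) pairs; the function names are hypothetical:

```python
def is_more_general(general, specific):
    """True if `general` properly includes `specific` as a set of objects.

    A more general category has fewer attributes, so its simple collection
    must be a proper subset of the other category's simple collection.
    """
    return general < specific  # proper-subset test on frozensets


def relation(a, b):
    """Classify two distinct categories as 'base', 'derived', or unrelated."""
    if is_more_general(a, b):
        return "base"      # a is a (possibly distant) base of b
    if is_more_general(b, a):
        return "derived"   # a is a (possibly distant) derivative of b
    return None            # not related by inclusion
```

No path search over intermediate categories is needed, which is the contrast with the DAG representation.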
When the classification polyhierarchy is described by a conventional directed acyclic graph (DAG), for example, the solution of that problem amounts to finding a path, or sequence of edges, between two given vertices (see the section above titled “Description of the Prior Art”). If that graph is stored “as is” (i.e., without cumbersome auxiliary descriptions) finding a path requires a combinatorial search of intermediate vertices, and the cost of it dramatically increases with the complexity of the polyhierarchy. To optimize the path search, a redundant description including auxiliary data may be employed. However, in a general case, such optimization would lead to a no less dramatic increase in data storage requirements. Therefore, an effective solution of this problem is not possible for descriptions in terms of conventional DAGs. Implicit Description of Induced Polyhierarchies of Categories It can be observed that construction of a polyhierarchy of categories is induced (i.e. uniquely defined) by a generating polyhierarchy of criteria. Therefore, a generating polyhierarchy may be considered as primary with respect to a polyhierarchy of categories, not only when designing the classification itself, but also when developing data structures and user interfaces in real applications. When designing a classification system, one task is to choose classification criteria and establish dependencies between them. Because only those branches that define dependency relationships between criteria are required for a generating polyhierarchy, there is no need to detail all branches that will be necessary for the whole polyhierarchy at this initial stage. This allows a design of the classification in more abstract terms, without the use of additional classification principles (other than criteria dependencies) and without exhaustively enumerating all possible selection options. 
The specification of branches that participate in dependencies between criteria produces simple collections corresponding to root categories. At further stages, other branches of criteria are added, thereby automatically inducing (i.e., making meaningful) the corresponding categories of classification. This process allows an automatic and dynamic extension of the induced polyhierarchy. Since extension of the classified set typically requires the addition of new branches, cardinalities of criteria should generally not be fixed in advance. To summarize, the conditions of applicability of various illustrative embodiments of the methods include:
- Branches of each criterion are mutually exclusive, which means that a single classified object can be assigned no more than one attribute by the same criterion;
- Domains of applicability of criteria are defined by sets of attributes of more general criteria, (i.e., they coincide with some categories (roots) of the same polyhierarchical classification); and
- Criteria that share their root category are not ordered by rank or any other property. This means that only dependency relations between criteria should be used when designing a generating polyhierarchy.
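The conditions above can be checked mechanically when a generating polyhierarchy is designed. The following sketch derives the dependence relation from the root categories and verifies that it contains no loops. It assumes Python 3.9+ for `graphlib`; the criterion names, the root collections, and the function names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def is_mutually_exclusive(collection):
    """At most one attribute per criterion in a simple collection."""
    criteria = [criterion for criterion, _ in collection]
    return len(criteria) == len(set(criteria))


def dependencies(roots):
    """A criterion depends on every criterion named in its root collection."""
    return {c: {u for u, _ in root} for c, root in roots.items()}


def generation_order(roots):
    """Return criteria ordered so that each follows those it depends on.

    Raises ValueError for an invalid root collection and CycleError
    (a subclass of ValueError) if the dependence relation has a loop.
    """
    for criterion, root in roots.items():
        if not is_mutually_exclusive(root):
            raise ValueError(f"root of {criterion} repeats a criterion")
    return list(TopologicalSorter(dependencies(roots)).static_order())


# Hypothetical generating polyhierarchy used only for illustration.
ROOTS = {
    "C1": frozenset(),
    "C2": frozenset(),
    "C4": frozenset({("C1", 2), ("C2", 2)}),
    "C5": frozenset({("C1", 2), ("C2", 2), ("C4", 2)}),
}
```

Criteria sharing a root category (C1 and C2 here) may appear in either order, which reflects the third condition: they are not ranked relative to each other.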
The generating polyhierarchy together with the sets of criteria branches implicitly describes the structure of an induced polyhierarchical classification of categories. Therefore, the enumeration and storage of the overwhelming majority of categories become redundant, since categories can be dynamically retrieved anytime using the generating polyhierarchy of criteria. In this particular embodiment, the proposed classification method is fully synthetic. The subset of persistent categories that are permanently stored in the form of simple collections (or more general forms of attributive expressions introduced below) is defined by considerations of practical implementation. In one embodiment, the permanent storage of only the following categories is sufficient for effectively working with the induced polyhierarchy:
- Root categories that define the structure of a generating polyhierarchy;
- Nonempty leaf categories used as “containers” for classified objects; and
- Possibly, intermediate abstract categories as well, if they are non-empty because some objects are not fully classified (i.e., they cannot be assigned attributes from some applicable criteria due to, for example, incomplete knowledge of their properties).
For convenient interfacing with external applications, the storage of some additional categories can be useful, in particular:
- Identically empty categories, arising from the use of semantically related criteria with overlapping domains of definition (if any). As noted above, if a classification uses some semantically related criteria whose root categories overlap, then some combinations of attributes may correspond to contradictory descriptions of object properties. This means that such categories would be identically empty sets by design. Explicit presentation of these categories by simple collections may facilitate the logic of detecting contradictory queries to a client database; and
- Categories that define domains of applicability of additional, (i.e., external to this classification), search tools, such as keyword search engines, applications for sorting by dynamic criteria, and the like.
Illustrative embodiments of the proposed methods can be efficiently implemented by including additional constructs into existing databases. Below, a simplified illustrative example of a realization using the Microsoft Access 2000 environment is considered. The list of objects subject to classification (client objects) is stored in the table “Objects”. For each object, table fields “ID”, “ObjectName”, “Category_Ref” and “Data” contain, respectively, the object's unique identifier, verbose name, reference to object category and object-specific data unrelated to the purpose of classification. Of course, in practical applications, this table may contain other fields for object-specific data, comments, references to other tables, and the like; in particular, these additional data can be used by search tools external to the classification. The other four tables, “Attributes”, “Branches”, “Criteria” and “Categories”, store the description of the polyhierarchical classification. Each of these tables has the “ID” field with unique identifiers (such as auto-numbers) of respective description elements. The “Categories” table stores the list of persistent categories that are sufficient for comfortable work with the polyhierarchy (the categories that are sufficient, for example, were considered and described above at the end of the section titled “Implicit Description of Induced Polyhierarchies of Categories”). Since this table serves only for the identification of particular persistent simple collections, it has only one required field, “ID”. Attributes of each category, in this scheme, are stored in the “Attributes” table, discussed more fully below. The “Criteria” and “Branches” tables that describe, respectively, criteria and branches, include fields “CriterionName” and “BranchName” which are used for verbose human-readable definitions, but are not essential for the polyhierarchy structure. In particular, these names can be changed at any time and do not have to be unique. 
The field “RootCategory_Ref” of the “Criteria” table contains references to root categories of corresponding criteria, and the field “Criterion_Ref” of the “Branches” table contains references that define to which criterion every branch belongs. So, in this illustrative example, the “Branches” table contains all possible attributes that can form simple collections defining categories. Note that to provide the basic functionality, neither branch indices (within a particular criterion) nor the cardinalities of criteria are required, hence their absence from the illustrated database scheme. The “Attributes” table describes the composition of simple collections that define categories, as a “many-to-many” relation between tables “Branches” and “Categories”. Each instance of an attribute is represented by a reference “Branch_Ref” to the corresponding row in the table “Branches”. Instances of attributes are associated with categories by references “Category_Ref” to “IDs” of corresponding categories. The exemplary database configuration is intended for automatically performing low-level operations such as retrieving lists of branches of a selected criterion, finding a root category of a criterion, retrieving a simple collection of attributes defining a selected category, finding objects pertaining to a given category, and the like. These processes may be performed using standard management systems of relational databases. Implementations of these methods in environments other than relational databases may require development of supplementary platform-specific routines to support such low-level operations. In addition, supplementary software code may be used for supporting higher-level operations, such as database access, user interfaces, and operations on classification categories mentioned, for example, in the section titled “Illustrative Embodiments of the Induced Polyhierarchies of Categories”. 
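The five-table layout and the low-level operations listed above can be sketched with an in-memory SQLite database in place of Microsoft Access. The table and field names follow the text; the column types, the sample criteria ("Body type", "Payload class"), the sample object, and the helper function names are assumptions made for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Categories (ID INTEGER PRIMARY KEY);
CREATE TABLE Criteria (
    ID INTEGER PRIMARY KEY,
    CriterionName TEXT,
    RootCategory_Ref INTEGER REFERENCES Categories(ID));
CREATE TABLE Branches (
    ID INTEGER PRIMARY KEY,
    BranchName TEXT,
    Criterion_Ref INTEGER REFERENCES Criteria(ID));
CREATE TABLE Attributes (
    ID INTEGER PRIMARY KEY,
    Branch_Ref INTEGER REFERENCES Branches(ID),
    Category_Ref INTEGER REFERENCES Categories(ID));
CREATE TABLE Objects (
    ID INTEGER PRIMARY KEY,
    ObjectName TEXT,
    Category_Ref INTEGER REFERENCES Categories(ID),
    Data TEXT);
"""


def build_demo_db():
    """Create the schema and load a tiny hypothetical classification."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    # Category 1: whole set; 2: {Truck}; 3: {Truck, Heavy}.
    db.executemany("INSERT INTO Categories (ID) VALUES (?)", [(1,), (2,), (3,)])
    db.executemany("INSERT INTO Criteria VALUES (?, ?, ?)",
                   [(1, "Body type", 1), (2, "Payload class", 2)])
    db.executemany("INSERT INTO Branches VALUES (?, ?, ?)",
                   [(1, "Sedan", 1), (2, "Truck", 1),
                    (3, "Light", 2), (4, "Heavy", 2)])
    db.executemany(
        "INSERT INTO Attributes (Branch_Ref, Category_Ref) VALUES (?, ?)",
        [(2, 2), (2, 3), (4, 3)])
    db.execute("INSERT INTO Objects VALUES (1, 'Hauler X', 3, '')")
    return db


def branches_of(db, criterion_id):
    """List the branch names of a selected criterion."""
    rows = db.execute(
        "SELECT BranchName FROM Branches WHERE Criterion_Ref = ? ORDER BY ID",
        (criterion_id,)).fetchall()
    return [name for (name,) in rows]


def root_category(db, criterion_id):
    """Find the root category of a criterion."""
    return db.execute("SELECT RootCategory_Ref FROM Criteria WHERE ID = ?",
                      (criterion_id,)).fetchone()[0]


def simple_collection(db, category_id):
    """Retrieve the (criterion, branch) attributes defining a category."""
    return db.execute(
        "SELECT c.CriterionName, b.BranchName FROM Attributes a "
        "JOIN Branches b ON b.ID = a.Branch_Ref "
        "JOIN Criteria c ON c.ID = b.Criterion_Ref "
        "WHERE a.Category_Ref = ? ORDER BY c.ID", (category_id,)).fetchall()


def objects_in(db, category_id):
    """Find the objects stored directly in a given category."""
    rows = db.execute("SELECT ObjectName FROM Objects WHERE Category_Ref = ?",
                      (category_id,)).fetchall()
    return [name for (name,) in rows]
```

In this sample data, the "Payload class" criterion is rooted at the category {Truck}, so it only becomes applicable after the "Body type" specialization has been made.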
However, unlike with conventional classification methods, the supplementary software does not depend on application-specific features of the polyhierarchical classification and the complexity of the classification. The left view includes drop-down lists of free criteria (left column) and drop-down lists of criteria branches (right column). Before the selection process begins, there is only one drop-down list—a list of criteria that are applicable to the whole classified set. The selection is performed by a step-by-step specialization with an alternate selection of criteria and branches. At each step, when a criterion is selected, a drop-down list of its branches appears next to it in the right column. When a branch is selected, a new attribute is added to the currently selected category (i.e., superposition of attributes), and a new list of free criteria applicable to the currently selected category, if any such free criteria exist, appears below the last criterion choice. The rollback in selection can be performed simply by choosing another item or “deselect” from one of those lists where a selection has already been done. Doing so makes anything selected below the changed level disappear, because in this particular selection method, the choice available at each level depends on all previous levels. An improvement to this interface would include only removing those subsequently selected attributes that are inconsistent with the rollback change, rather than all of them. The central view in the application window visualizes the polyhierarchical classification in a form similar to the conventional one that is typically used to represent tree hierarchies. But, unlike the typical representation, the central view uses two kinds of expansion nodes: those corresponding to free criteria (a pair of vertical blue, or darker, arrows in the icon) and their branches (a horizontal green, or lighter, arrow in the icon). 
The user can expand and minimize the lists of free criteria and their branches by clicking on conventional tree expander icons “+” and “−.” Clicking on a particular branch performs a specialization by the respective criterion. If an available free criterion is not used for a specialization, it will stay available at the next specialization level, thereby appearing again in the list of free criteria. The central part (view) of the application window allows a step-by-step specialization by successive selection of criteria and branches, thereby duplicating the functionality of the drop-down lists in the left view. The two views are connected to each other: any selection or rollback in either of them triggers an automatic selection of the corresponding item or rollback in the other one. These selection tools can be used concurrently, so that each specialization step can be performed in either of the two views. The sequence of specializations performed in the left and central window divisions (views) defines at each step a particular category of classification. The right division (view) shows a list of all items of the classified set that pertain to that category. As the selection range is refined at each successive specialization step, the list of items is shortened. An item can be selected by clicking on it, whereupon its short description appears in the area below. This illustrative embodiment of the user interface can be easily adjusted to facilitate interactive data input when developing databases. It is sufficient to add just two ancillary controls: an input field for the object name and a ‘Record’ or ‘Confirm’ button for recording a new object name and associating it with a set of properties specified using the views described above.
In this section, several generalizations are presented of the formalism that describes categories by attributive expressions in the method of building polyhierarchical classifications described above. 
These generalizations are based on the introduction of disjunctive operations on categories: one generalization, for example, allows construction of new categories by uniting branches within a particular criterion, and another generalization, for example, goes further toward uniting arbitrary categories. Each version makes it possible to generalize the polyhierarchical system of relations (e.g., “general-specific”) between categories; the second of these generalizations, for example, turns the set of all possible categories into a ring (i.e., a system of subsets closed with respect to the operations of unification, intersection, subtraction, and symmetric difference). A detailed discussion of the respective semantic extensions of the notion of attribute collection, as well as of the algorithms required for efficient work with the classification in terms of attributive expressions, is provided herein. A ring (in the set-theoretic sense) is a non-empty system S of subsets satisfying the following conditions:
- 1. S is closed with respect to operation of intersection of subsets: For all pairs of subsets A, B∈S, A∩B∈S, and
- 2. S is closed with respect to operation of symmetric difference of subsets: For all pairs of subsets A, B∈S, AΔB∈S.
From the definition above it follows that any ring S of subsets satisfies also the following conditions: -
- 3. S includes the empty subset: Ø∈S, and
- 4. S is closed with respect to operation of union of subsets: For all pairs of subsets A, B∈S, A∪B∈S, and
- 5. S is closed with respect to operation of complement of subsets: For all pairs of subsets A, B∈S, A\B∈S.
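Conditions 3 to 5 follow from conditions 1 and 2 through the standard identities Ø = AΔA, A∪B = (AΔB)Δ(A∩B), and A\B = AΔ(A∩B). The short check below verifies these identities exhaustively on all subsets of a small universe; the function names are illustrative and not part of the described method.

```python
from itertools import combinations

def powerset(universe):
    """All subsets of a finite universe, as frozensets."""
    items = sorted(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

def check_ring_identities(universe):
    """Express the empty set, union, and difference via intersection and symmetric difference."""
    subsets = powerset(universe)
    for a in subsets:
        for b in subsets:
            inter = a & b        # condition 1: intersection
            sym = a ^ b          # condition 2: symmetric difference
            assert a ^ a == frozenset()      # condition 3: empty subset
            assert sym ^ inter == a | b      # condition 4: union
            assert a ^ inter == a - b        # condition 5: difference
    return True
```

For example, `check_ring_identities({1, 2, 3})` runs through all 64 ordered pairs of subsets without raising.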
One of the ideas behind the aforementioned method of classification is the use of the generating polyhierarchy of classification criteria for an automatic construction of the induced polyhierarchy of categories. Each category may be defined, for example, by a simple collection of attributes, where each attribute is assigned by a particular criterion, with no more than one attribute from each criterion. That simple collection uniquely defines a superposition (intersection) of partitionings of the classified set by separate features, (i.e., the induced polyhierarchy is constructed by using logical conjunction of elementary specializations defined by attributes). If the identically empty category is formally added to the set of categories of the induced polyhierarchy, the latter becomes a semiring of subsets. A semiring (in the set-theoretic sense) is a system S of subsets, satisfying the following conditions: -
- 1. S includes the empty subset: Ø∈S, and
- 2. S is closed with respect to operation of intersection of subsets: For all pairs of subsets A, B∈S, A∩B∈S, and
- 3. Existence of finite decomposition: For all pairs of subsets A, B∈S such that A⊂B, there exists a decomposition B=A_{1}∪A_{2}∪ . . . ∪A_{N}, where subsets A_{k}∈S (k=1,2, . . . ,N) are mutually disjoint, and A_{1}=A.
In some cases, however, definition of categories solely by means of a conjunction of features may not be sufficient. For example, some routines of the Matlab package take for input objects uncommon types such as “number or vector,” “vector or matrix,” and the like. A fragment of one of the possible classifications based on a conjunction of features that include such categories is shown in Categories shown in
By applying formal comparison rules to these collections it cannot be derived that “Number”⊂“Matrix OR Number,” since {ref, (C_{1}, 4), (C_{2}, 3)}, “Vector”=“Number OR Vector”∩“Vector OR Matrix,” since {ref, (C_{1}, 2)}≠{ref, (C_{1}, 4), (C_{2}, 1)}∩{ref, (C_{1}, 4), (C_{2}, 2)}={ref, (C_{1}, 4)}, and so forth. Therefore, this particular variant of the classification does not reflect some relations of “general-specific” between categories that are significant in the context of Matlab's interfaces.
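The failed comparisons can be reproduced mechanically. The sketch below assumes that a simple collection is modeled as a set of (criterion, branch) pairs plus the token "ref", which is treated here as an opaque marker, and that the formal inclusion rule for simple collections is "the more specific category carries every attribute of the more general one". Only the three collections quoted above are used.

```python
REF = "ref"  # opaque marker copied from the collections in the text

vector = frozenset({REF, ("C1", 2)})
number_or_vector = frozenset({REF, ("C1", 4), ("C2", 1)})
vector_or_matrix = frozenset({REF, ("C1", 4), ("C2", 2)})

def formally_included(specific, general):
    """Formal rule for simple collections: every attribute of the general
    category must also appear in the specific one."""
    return general <= specific

# Formal intersection of the two attribute collections, as computed in the text.
overlap = number_or_vector & vector_or_matrix

print(sorted(map(str, overlap)))                      # the shared attributes only
print(overlap == vector)                              # False
print(formally_included(vector, vector_or_matrix))    # False
```

The formal rules see neither “Vector”⊂“Vector OR Matrix” nor “Vector” as the intersection of the two composite categories, which is the shortcoming the paragraph describes.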
A more complex version of the conjunctive classification can be created, that uses three independent but semantically related criteria: C
Dashes in this table correspond to free criteria. Although this variant is able to test category inclusions via formal comparisons of the respective simple collections of attributes, it has two significant drawbacks. The first problem is that the criteria are semantically related, which causes numerous identically empty categories. The second problem lies in the non-uniqueness of object categorization. For example, an object “Number” can be put into these five categories: {ref, (C
These examples illustrate that classifications based exclusively on conjunctions of elementary specializations do not always allow for a neat implementation. This may be resolved through the use of disjunctive operations on categories in terms of attributive expressions. Formalisms based on generalized forms of attributive expressions may be introduced to combine operations of both logical conjunction and disjunction of elementary specializations when constructing generating and induced polyhierarchies. These illustrative examples are an extension of the automatic reproduction of the induced polyhierarchy of classification categories by the generating polyhierarchy of criteria discussed above. When introducing disjunctions of elementary specializations, it should be appreciated that “assigning attributes to a classified object” in the definition of classification criteria given, for example, in the beginning of the section “Illustrative Embodiments of a Classification by System of Criteria” above, is not the same as associating an object with a classification category that is defined by a disjunctive attributive expression, such as collections with branch unions and unions of simple collections (described in, for example, the sections below titled “Unions of Criteria Branches” and “Uniting Arbitrary Categories”). 
In the definition of classification criteria, “assignment of attributes to an object” means a set elementary specialization of object properties, which is essentially a conjunctive procedure, (i.e., elementary specializations encoded by the attributes are implied to be linked with logical AND). Therefore, assigning more than one attribute by the same criterion to an object results in a contradictive specialization of its properties. However, an object can be associated with a classification category that is defined by a disjunctive attributive expression containing several attributes by the same criterion. This may imply, for example, that properties of the object cannot be definitely specialized due to the lack of available information on that object. Associating an object with a category defined by a disjunctive attributive expression denotes a number of possible options for an unknown set of object properties. Those possible options are linked with logical OR, such a category may reflect, for example, an incomplete specialization of the set of object properties. Unions of Criterion Branches As described above, classification of a subset A by a criterion C When constructing a classification by superposition of criteria, each category A The semantics of simple collections can be generalized by including unions of criterion branches. For the purpose of illustration, it is convenient to adopt a convention that assigning several attributes by the same criterion is always performed in the sense of a disjunction of respective elementary specializations. Unlike the formalism of simple collections, this extended convention allows repetitions of criteria in attributive expressions, but all elementary specializations defined by branches of one criterion are united (disjuncted) rather than being intersected (conjuncted). As an example, consider extending a given category A In order to facilitate the illustration, it is helpful to introduce several definitions. 
Collections of attributes encoding only conjunctions of elementary specializations, as described above, (and therefore not including multiple attributes by any single criteria) {(C Union of branches, or branch union, may be defined as a fragment of a collection composed of attributes by distinct branches of the same criterion:
The above notation allows the representation of a collection of attributes as a set of branch unions (1):
Description of categories in terms of collections with branch unions (3) is equivalent to a valid superposition of intersections and unifications of subsets generated by separate criteria, taking into account criteria dependencies. In particular, there is no restriction on the use of composite categories as criteria roots, so the branch unions can be used in construction of the generating polyhierarchy. Therefore, the directed relation “general-specific,” (i.e., the relation of inclusion), that is the foundation of a polyhierarchical classification, retains its meaning in the new semantics. This extension increases the number of valid (meaningful) categories of the induced polyhierarchy by using disjunctions in definitions of specializations and generalizations. Operations on Collections with Branch Unions In general, practical applications require a formalism that would allow an efficient execution of typical operations with categories represented by attribute collections with branch unions. Discussed below are three important tasks including: 1) comparison of two given categories by the relation “general-specific,” (i.e., the test for inclusion), 2) calculation of the intersection of two categories, and determining 3) the direct parent (base) and 4) the direct child (derived) categories of a given category. As before, for a given category A, its direct parent and direct child are those categories B⊃A and D⊂A whose definitions differ from A by only one attribute. For convenience, it may be assumed that unions of branches in attribute collections (3) are numbered in the order of definition of dependency relations between criteria C _{p(t) }when 1≦s≦t≦M. Due to the hierarchical (acyclic) structure of relations between criteria, such an ordering should always exist. In some applications, it may be useful to define categories by collections that include complements of branch unions (2).
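The ordering of branch unions by criteria dependencies mentioned above is a topological ordering of the generating polyhierarchy. A minimal sketch, assuming the dependencies are given as a mapping from each criterion to the criteria it depends on (the criteria names and dependencies below are invented for illustration), can use the standard-library `graphlib` module:

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical generating polyhierarchy: criterion -> criteria it depends on.
depends_on = {
    "C1": set(),
    "C2": {"C1"},
    "C3": {"C1"},
    "C4": {"C2", "C3"},
}

def criteria_order(depends_on):
    """Return criteria so that each one appears after all criteria it depends on."""
    return list(TopologicalSorter(depends_on).static_order())

def has_valid_order(depends_on):
    """An ordering exists exactly when the dependency relation is acyclic."""
    try:
        criteria_order(depends_on)
        return True
    except CycleError:
        return False
```

Here `criteria_order(depends_on)` always places "C1" first and "C4" last, while a cyclic input such as `{"A": {"B"}, "B": {"A"}}` makes `has_valid_order` return `False`.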
Test for Inclusion Relations Consider two arbitrary categories
_{1}≧M_{2 }and U_{p(s)}{i_{n,s}}⊂U_{q(s)}{j_{m,s}} if 1≦s≦M_{2})(M_{1}>M_{2 }and p(s)=q(s), K_{1,s}≦K_{2,s}, i_{n,s}=j_{n,s }if 1≦n≦K_{1,s}, 1≦s≦M_{2}) (5)
For the inclusion to be strict, it is necessary and sufficient that at least one of the inequalities M_{1}≧M_{2} and K_{1,s}≦K_{2,s} (1≦s≦M_{2}) be strict. Note that when the compared categories are simple, K_{1,s}=1, U_{p(s)}{i_{n,s}}=(C_{p(s)}, i_{s}) (1≦s≦M_{1}) and K_{2,t}=1, U_{q(t)}{j_{m,t}}=(C_{q(t)}, j_{t}) (1≦t≦M_{2}). In that case (5) takes the form:
A_{{p(s)}}{i_{n,s}}⊂A_{{q(t)}}{j_{m,t}} ⇔ M_{1}≧M_{2} and p(s)=q(s), i_{s}=j_{s} if 1≦s≦M_{2},
which coincides with the condition of inclusion for categories of a purely conjunctive classification (see, for example, the section above titled “Illustrative Embodiments of the Induced Polyhierarchies of Categories”). Computing Intersection It is possible to combine sets of criteria indices of the two given categories (4)
Using the algorithm of testing for inclusion (5) it can be verified that the category A Retrieving Direct Derived Categories Now consider a category A Note that removing an attribute from a union of cardinality 1 results in the identically empty category that is not considered as derived. So, when computing direct children by this procedure, it should be used to reduce branch unions of cardinalities K Use of this formalism allows the addition of attributes by free criteria (see, for example, the section above titled “Illustrative Embodiments of the Induced Polyhierarchies of Categories”) to be represented in the more general terms of removing attributes from branch unions, as discussed in this section. If the initial category A Using the notation with total unions, the procedure of removing an attribute from a union discussed above (see (9)) can be directly applied to total unions U It should be appreciated that the method of determining direct child categories through assigning attributes by free criteria, as described, for example, in the section “Illustrative Embodiments of the Induced Polyhierarchies of Categories” does not have independent sense of an elementary specialization. This is because the granularity of elementary specialization attainable is dependent upon the chosen form of the attributive expressions. The attributive expression obtained by assigning a new attribute by a free criterion in the semantics of simple collections is equivalent, in the semantics of collections with branch unions considered here, to a sequence of elementary specializations: one-by-one removal of attributes from the total union of branches of that free criterion. 
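The relation between assigning an attribute by a free criterion and removing attributes from a total branch union can be illustrated with a small sketch. It makes several simplifying assumptions that are not part of the description: a category is a dictionary mapping each used criterion to a set of branch indices, a free criterion is simply absent from the dictionary, every criterion listed in `cardinality` is taken to be applicable to the category (criteria dependencies are ignored), and the function names are hypothetical.

```python
def is_included(a, b):
    """Test a ⊂ b: every branch union of b must be matched in a by a union
    of the same criterion with no additional branches."""
    return all(c in a and a[c] <= branches for c, branches in b.items())

def direct_children(category, cardinality):
    """Direct derived categories: remove one attribute from one branch union.

    A free criterion is treated as the total union of its branches.
    Unions of cardinality 1 are skipped, since removing their only
    attribute gives the identically empty category.
    """
    children = []
    for criterion, size in cardinality.items():
        total = frozenset(range(1, size + 1))
        union = category.get(criterion, total)
        if len(union) < 2:
            continue
        for branch in sorted(union):
            child = dict(category)
            child[criterion] = union - {branch}
            children.append(child)
    return children

# One free criterion "C" with three branches, applied to the unrestricted category.
cardinality = {"C": 3}
top = {}
kids = direct_children(top, cardinality)
grandkids = [g for k in kids for g in direct_children(k, cardinality)]
single = {"C": frozenset({1})}   # the result of assigning attribute (C, 1)
```

In this example `single` is not among `kids` but is among `grandkids`: with three branches, assigning one attribute corresponds to two successive removals from the total union.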
It follows that: a) the assignment of an individual attribute by a free criterion is equivalent to a superposition of elementary disjunctive specializations (a sequence of removals of attributes from the respective branch union), and b) the category resulting from assignment of an attribute by a free criterion cannot be considered a direct child of the initial category in a general case (if the cardinality of the free criterion exceeds two). Retrieving Direct Base Categories The disjunctive method of construction of direct base categories should be founded, by its meaning, on the addition of attributes to branch unions. However, in a general case, generalizing a category by extending one of its branch unions can result in violating domains of definitions of the criteria participating in a given attributive expression. Consideration may be given, therefore, to which attributes can be added to the collection without affecting the domains of definition, thereby preserving the dependencies between criteria participating in the collection. Consider a given category A The hull is the most broad (most abstract) category among all classification categories on which all the criteria C _{n,s}=i_{n,s }for 1≦n≦K_{1,s }and 1≦s≦M). The logical equivalence of this representation was proved in the section above titled “Retrieving Direct Derived Categories”.
If for a certain t the strict inequality K It can be observed that the method of retrieval of parents of simple categories by removing attributes corresponding to leaf criteria (see, for example, the section above titled “Illustrative Embodiments of the Induced Polyhierarchies of Categories”) does not have independent sense of an elementary specialization in this formalism. This is because the granularity of elementary specialization attainable is dependent upon the chosen form of the attributive expression. Leaf criteria, by their definition, do not participate in the definition of root categories root(C Therefore, retrieval of direct parent categories by removing leaf criterion attributes, as described above, loses its role as an independent method once branch unions are adopted. In fact, the collection resulting from the removal of a single attribute of a leaf criterion in the semantics of simple collections is obtained, in the semantics of collections with branch unions, by a sequence of elementary generalizations: one-by-one additions of attributes to the corresponding branch union. This means that a) removal of a single attribute by a leaf criterion can be represented by a superposition of elementary disjunctive generalizations (a sequence of additions of attributes to the respective branch union), and b) the resulting category can not, in a general case, be considered a direct parent of the initial category (for leaf criteria with cardinalities exceeding two). Uniting Arbitrary Categories In principle, it may be possible that a proposed formalism, even with the branch union generalization, turns out not to be convenient enough for the construction of a classification. For instance, consider building an extensive classification of material objects. 
Objects that have optical subsystems may require the introduction of criteria reflecting their optical properties (e.g., focal length, resolution, photosensitivity, and the like), but categories of such objects can be very specialized and significantly different. For example, both electronic devices and living animals may have optical subsystems. This creates the desirability to define criteria on a union of unrelated, or generally speaking, arbitrary categories. To resolve this problem, an even more general formalism may be needed, that: -
- Allows the description of arbitrary unions of categories represented in terms of attributive expressions, and in certain special cases reduces to branch unions;
- Preserves the meaning of dependency relations between criteria and that of the relation “general-specific” between categories; and
- Allows an efficient practical implementation in common programming environments and database management systems.
A convenient notation is useful for the description. Assignment of an attribute (C Domains of definitions of predicates P Generalization of this formalism for the case of unions of arbitrary categories can be performed by defining categories by using logical polynomial functions of the form:
Taking into account mutual distributivity of operations ∧ and ∨ it is possible to transform any of the functions (10) corresponding to a collection with branch unions (3) to the polynomial form (11). But as to an opposite conversion, a complete factorization of the polynomial (11) is necessary for its transformation to the form (10), which may not be possible in a general case. Therefore, polynomials (11) make a broader class of compositions of predicates P compared to conjunctive functions (10). Each polynomial (11) defines a category A _{{p(s,k)}}{i_{s,k}}=true). For any two such polynomials d_{1} and d_{2} the following statements are true:
A(d_{1}∨d_{2})=A(d_{1})∪A(d_{2}), (12)
A(d_{1}∧d_{2})=A(d_{1})∩A(d_{2}), (13)
A(d_{1})⊂A(d_{2}) ⇔ (d_{1}→d_{2}). (14)
The formula (14) means that the category A(d _{2}) includes A(d_{1}) if and only if the implication relationship between respective logical functions d_{1 }and d_{2 }is valid (i.e., from the statement d_{1}=true it follows that d_{2}=true, such that the inclusion of categories in terms of logical functions (11) is represented by the relation of implication between them). The meaning of relations (12)-(14) in the context of various illustrative embodiments is explained below.
First, since the induced polyhierarchy is automatically and uniquely determined by the generating polyhierarchy, no a priori information about the composition of the classified set need be used when building a classification. So, the categories are considered as subsets of all imaginary objects that can theoretically exist due to the compatibility of various properties determined by attributes from participating dependent criteria.
Second, in order to enable a gradual extension of the classification, it should be certain that an induced polyhierarchy remains valid when new branches are added to some criteria. In other words, the formalism in use is additive with respect to criteria cardinalities: all relations between categories are invariant with respect to increasing cardinalities. As an example, consider a classification of a set A by two mutually independent criteria C
Third, the semantics of the formalism considered does not allow the description of relations between categories that result from the semantic relation of criteria, because there are no criteria reflecting such relations. For example, in the conjunctive classification of the Matlab objects with three mutually independent but semantically related criteria C
In summary, categories of classification are treated as subsets of all imaginary (potentially existing) objects with combinations of properties permitted by the construction of the generating polyhierarchy. When performing set theory operations on categories and establishing relations between them, the requirement of invariance with respect to increasing criteria cardinalities should be considered. Any category relationships stipulated only by the “external” semantics of criteria and not reflected in the structure of the generating polyhierarchy are excluded from consideration. In one implementation of this methodology, it is convenient to represent the logical polynomial functions (11) in the form of assemblies:
_{{p(s,l)}}{i_{s,l}} if k≠l). Assemblies (15) are yet another form of attributive expressions called unions of simple collections. This representation, by definition, includes the conjunction of elementary specializations of properties within each simple collection (16) and the disjunction of specializations represented by separate simple collections.
To compute the complements of categories considered below, an expression for the negation of a logical polynomial will be needed. Simple transformations result in the formula
Since the semantics of unions of simple collections is based on set theory operations and rules, it preserves the meaning of the relation “general-specific,” which is equivalent to the relation of inclusion. Since it also preserves the meaning of dependency relations between criteria and imposes no restrictions on the use of composite categories as roots, unions of simple collections can be used in the construction of the generating polyhierarchy. This generalization turns the system of categories of the induced polyhierarchy into a ring (i.e., a system of subsets closed with respect to operations of unification, intersection, subtraction and symmetric difference). Note that the method considered here of describing categories by logical functions and collections of attributes reminds one of the formal language of “granular computing” used for an automatic construction of classifications by known properties of objects (as described, for example, in the article by Y. Y. Yao and J. T. Yao, titled “Granular Computing as a Basis for Consistent Classification Problems,” in -
- In various illustrative embodiments, the formalism introduces the basic concept of a generating polyhierarchy that enables development of classifications on a more abstract concept base, and efficient operations on sets of classifications, such as composition and splitting of polyhierarchies;
- The induced polyhierarchy is uniquely defined by the conditions of compatibility of the object properties encoded by the structure of the generating polyhierarchy, so the construction of the classification itself does not require reference to an available set of classified objects; and
- Definitions of set theory operations are invariant with respect to changes in the composition of the classification universe, the implied meaning of criteria, and the increase of criteria cardinalities.
- Also, procedures for elementary generalizations and specializations are introduced, thus allowing definitions of immediate child and parent categories for a selected category, which in turn automatically provides for the existence of a global polyhierarchical structure of classification.
Operations on Unions of Simple Collections
A number of basic tasks may be useful for working with a classification. These basic tasks may include: 1) the test for inclusion, 2) computing the union, 3) computing the intersection, 4) computing the complement, and 5) retrieving direct base (parent) and direct derived (child) categories. The algorithms that perform these tasks in terms of unions of simple collections form a basic set of operations on categories called taxonomy algebra. For simplicity, a number of technical details of operations with simple collections are omitted. Moreover, union components are defined as simple categories that correspond to individual simple collections from the unions.
Test for Inclusion
According to the formula (14), the relation of inclusion between categories is considered equivalent to the relation of implication between their logical polynomials. Due to the independence of predicates P _{j }if i≠j, and a simple category B⊂A_{1}∪A_{2}∪ . . . ∪A_{K}, there exists a number r (1≦r≦K) such that B⊂A_{r}.
Two arbitrary categories may be represented by unions of simple collections:
The algorithm is based on formula (12) of the disjunction of logical polynomials. The union of two given categories (18) is determined by concatenation of the lists of simple collections included in the unions {S This algorithm is based on the formula (13) of the conjunction of logical polynomials. The intersection of two given categories (18) is equivalent to the union of all non-empty pair-wise intersections of the union components:
_{{p(s,k)}}{i_{s,k}} and S_{{p(s,k)}}{i_{s,k}}S_{{q(t,m)}}{j_{t,m}}.
The resulting union of simple collections {T
Computing the Complement (Difference)
This algorithm is based on the formula (17) for the negation of logical polynomials. If the categories (18) are defined by the polynomials d Combination of the expressions (22) and (23) provides the ability to compute the complement as a superposition of unions and intersections of the categories B Since the composition of unions {T
Retrieving Direct Derived and Base Categories
It is natural to call direct parent (or direct base) categories and direct child (or direct derived) categories of a given category A those categories B⊃A and D⊂A that result from A after performing a single elementary extension (generalization) and, respectively, restriction (specialization). In other words, these are extensions and restrictions that cannot be represented as a composition of simpler operations. More exactly, there are no intermediate categories B* and D* such that B*≠A, B*≠B, B⊃B*⊃A and D*≠A, D*≠D, D⊂D*⊂A, respectively. In the semantics of unions of simple collections, any extension of a category A is performed by uniting it with any non-empty category not included in A, and any restriction is performed by subtracting from A one of its non-empty subcategories. Elementary extensions and restrictions correspond to addition and subtraction of various leaf categories of the induced polyhierarchy. Leaf categories are simple categories without free criteria. Thus, direct derived and direct base categories of a category A are all possible non-empty categories A\E and A∪F, respectively, where E⊂A and F⊄A are leaf categories of the polyhierarchical classification. 
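The inclusion test, union, and intersection on unions of simple collections described in this section can be sketched as follows. The sketch rests on assumptions not taken from the description: a simple collection is a frozenset of (criterion, branch) pairs, a category is a frozenset of simple collections, all criteria are mutually independent (domains of applicability are ignored), and the helper `extent`, which enumerates imaginary objects for fixed cardinalities, exists only to cross-check the symbolic operations. Complement and the removal of redundant union components are omitted.

```python
from itertools import product

def intersect_simple(s, t):
    """Merge two simple collections; None if they assign different
    branches of the same criterion (identically empty intersection)."""
    merged = dict(s)
    for criterion, branch in t:
        if merged.setdefault(criterion, branch) != branch:
            return None
    return frozenset(merged.items())

def union(a, b):
    """Formula (12): combine the lists of simple collections."""
    return a | b

def intersection(a, b):
    """Formula (13): union of all non-empty pair-wise intersections."""
    parts = (intersect_simple(s, t) for s, t in product(a, b))
    return frozenset(p for p in parts if p is not None)

def is_included(a, b):
    """Each component of a must lie inside some component of b, i.e. carry
    all attributes of that component."""
    return all(any(t <= s for t in b) for s in a)

def extent(category, cardinality):
    """Brute-force set of imaginary objects (one branch per criterion)
    that satisfy at least one simple collection of the category."""
    criteria = sorted(cardinality)
    ranges = (range(1, cardinality[c] + 1) for c in criteria)
    result = set()
    for values in product(*ranges):
        obj = dict(zip(criteria, values))
        if any(all(obj[c] == b for c, b in s) for s in category):
            result.add(values)
    return result

cardinality = {"C1": 2, "C2": 2}
a = frozenset({frozenset({("C1", 1)})})
b = frozenset({frozenset({("C2", 1)}),
               frozenset({("C1", 2), ("C2", 2)})})
whole = frozenset({frozenset()})
both_branches = frozenset({frozenset({("C1", 1)}), frozenset({("C1", 2)})})
```

With these values `intersection(a, b)` contains the single collection {(C1, 1), (C2, 1)}, matching the brute-force extents. Note that `is_included(whole, both_branches)` is `False` even though both categories have the same extent when C1 has exactly two branches: the symbolic test does not depend on current cardinalities, in line with the invariance requirement stated earlier.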
Clearly, the previously considered procedures of restriction and extension in terms of simple collections (see, for example, the section above titled “Illustrative Embodiments of the Induced Polyhierarchies of Categories”) and collections with unions of branches (see, for example, the sections above titled “Retrieving Direct Derived Categories” and “Retrieving Direct Base Categories”) can be performed as sequences of corresponding elementary operations in terms of unions of simple collections. The generalized forms of attributive expressions, described above, can be implemented using common database management systems (DBMS) as effectively as simpler versions of the method described, for example, in the section above titled “Illustrative Embodiments of Database Configuration Facilitating Simple Collections”. In one illustrative embodiment, the generalized form of the attributive expression may be implemented in the Microsoft Access 2000 environment. Compared with the initial construction of a sample database as described above in the description accompanying The exemplary database configurations are intended for automatically performing low-level operations such as retrieving a list of branches of a selected criterion, finding a root category of a criterion, retrieving a list of attributes of the attributive expression defining a selected category, finding objects pertaining to a given category, and the like. These processes may be performed using standard management systems of a relational database. Implementations of these methods in environments other than relational databases may require the development of supplementary platform-specific routines to support such low-level operations. 
In addition, supplementary software code may be used for supporting higher level operations, such as database access, user interfaces, and operations on classification categories mentioned, for example, in the sections titled “Operations on Collections with Branch Unions” and “Operations on Unions of Simple Collections”. However, unlike with conventional classification methods, the supplementary software does not depend on application-specific features of the polyhierarchical classification and the complexity of the classification.

Other Aspects of Practical Implementations

In the development of particular applications, additional technical challenges may arise that may be resolved with the knowledge of application functionality and specific features of the particular polyhierarchical classification. Some of the predictable issues include:
- 1. As already noted in the sections above, the generalized forms of attributive expressions greatly increase the granularity of classification (i.e., total number of available classification categories, in particular the numbers of direct parents of non-topmost categories and direct children of non-leaf categories). In some cases, this may lead to such a complexity of sub-trees of the induced polyhierarchy that they no longer allow for an observable graphical representation. In this case, the use of the “three-window interface” (see, for example, the description accompanying FIG. 12 in the section titled “Illustrative Embodiments of a Graphical User Interface”) becomes difficult, as does the use of any other graphical interface based on sub-tree visualization. Thus, the design of the user interface may become an important factor when developing applications with interactive access to the classification, such as interactive search systems.
- 2. The description of the classification categories in terms of unions of simple collections is much more general than the description based on collections with branch unions. On the other hand, classifications built with intensive use of branch unions may not be able to be implemented efficiently in terms of unions of simple collections. This follows from the observation that after the transformation of logical polynomials (10) to the form (11), the size of the data structures required for their representation in terms of attributive expressions could increase considerably.
Therefore, when constructing complicated classifications, an optimization of the formalism of unions of simple collections may be required. To combine advantages of the two method versions, it is possible to use a mixed form of the logical functions defining classification categories (see, for example, the section above titled “Uniting Arbitrary Categories”):
Particular terms g

The simplified database configurations considered in the subsections above titled “Illustrative Embodiments of Database Configuration Facilitating Simple Collections” and “Illustrative Embodiments of Database Configurations Facilitating Collections with Branch Unions and Unions of Simple Collections” provide efficient support for low-level operations in a relational database environment, thus allowing a reduction in the size of the program code that performs high-level operations, such as access to a database, user interfaces, operations on classification categories, etc. However, when optimized for particular applications, those configurations may require modification and/or supplementation by additional elements. For the purpose of illustration, several exemplary modifications are listed below that might be helpful for reducing the use of computer resources, extending the functionality, and enhancing the efficiency of the interfaces:
- 1. When building complex classifications that contain a large number of criteria and persistent categories, the largest space in the permanently stored descriptive data (without taking into account classified objects) may be occupied by the auxiliary table “Attributes” representing a “many-to-many” relation between other tables, see FIGS. 11, 14, and 15. Therefore, considerable reduction in storage requirements can be achieved by storing persistent attributive expressions in a compact form. For instance, the following techniques can be useful:
- a. Instead of representing persistent attributive expressions by “many-to-many” relations between dedicated tables, they can be compressed into a form of binary or text strings and stored in a special field of the “Categories” table. In some cases the compact format of the attributive expressions can be chosen so that typical operations, such as inclusion checking, may be executed directly with the compressed strings without decoding them. In addition to a savings in storage space, this solution provides for faster retrieval of the attributes of a specified category.
- b. In the “Categories” table, a set of intermediate “reference-point” categories can be specified, such that all other classification categories can be derived from them. Reference-point categories are described by their full attributive expressions, while all other persistent categories are described by references to the nearest base reference-point categories supplemented with additional attributive sub-expressions. Those additional sub-expressions can be stored in a compressed form as discussed above. One natural choice for reference-point categories is the set of root categories of criteria. This method of representation of attributive expressions by splitting them into two or more sub-expressions is useful, in particular, for scalable network-distributed classifications with sub-hierarchies stored on, for example, different network nodes.
- 2. To ensure data consistency when designing and maintaining distributed classifications, it is expedient to remove the direct link between the description of the generating polyhierarchy of criteria and the list of persistent categories. This can be attained by using a dedicated group of independent tables to store criteria and the sets of branches that define the structure of their dependencies (i.e., the generating polyhierarchy).
- 3. In certain contexts, (e.g., in graphical interfaces), it may be useful to quickly restore some sub-hierarchies of categories in the forms of trees or DAGs. This requires an efficient implementation of the retrieval of all direct parents and children of a given category (see, for example, the sections above titled “Illustrative Embodiments of the Induced Polyhierarchies of Categories”, “Retrieving Direct Derived Categories”, “Retrieving Direct Base Categories”, and “Retrieving Direct Derived and Base Categories”). In particular, it is useful to include in the descriptions of persistent categories additional information that would simplify the detection of their leaf and free criteria. This information can be encoded in a number of different ways. As an example, the information may be encoded as follows:
- a. In the descriptions of attributive expressions, the attributes (or branch unions) corresponding to leaf criteria may be assigned a special flag to distinguish them from other attributes (or branch unions).
- b. To explicitly list free criteria, the attributive expressions may be supplemented with new elements that can be called unspecified attributes. Each unspecified attribute has a reference to respective free criteria and has a flag that distinguishes it from other attributes (or branch unions), such as a reference to an “undefined branch” not associated with any criterion.
- 4. When using semantically related classification criteria, it is helpful to enable the recognition of attributive expressions that correspond to identically empty categories defined by inconsistent sets of object properties. To automate this task it may be beneficial to include the most abstract of identically empty categories in the “Categories” table with a flag to distinguish them from the rest. These categories can be also associated with diagnostic messages stored in a separate database table.
- 5. In some embodiments, the polyhierarchical classification can be combined with other tools for the search and the retrieval of data, such as interactive applications for search by keywords, parameter ranges, and the like. If these tools are applicable not to the whole classified set but only to some of its subsets, for automatically enabling and disabling them it may be useful to describe their domains of applicability in terms of classification categories. In a general case, classification categories can be defined by “natural” criteria, (i.e., criteria dictated by the nature of the classified objects), as well as additional “control” criteria introduced exclusively for the support of particular external tools. For the automation of interfaces, root categories of the “control” criteria may also be included into the “Categories” table with a flag to distinguish them from other categories.
- 6. To simplify development of interface program code, it can be convenient to use attributive expressions not only for describing classification categories, but also as a formal language for representing intermediate logical formulas arising at different stages of data processing when working with a classification. Consider, for example, a particular case when a classification is built on the basis of one of three forms of attributive expressions: simple collections, collections with branch unions, or unions of simple collections. If an application requires access to the classification using complex queries in the form of general logical formulas composed of elementary predicates (like those used in set-based and dynamic classifications), the interface program can encode such queries in the form of a union of simple collections for further processing. Since any of the three mentioned forms of attributive expressions used in construction of the classification is a particular case of unions of simple collections, descriptions of classification categories can be dynamically converted to the same form. Therefore, when processing client queries, both descriptions of classification categories and input queries can be represented in a unified form, thus allowing use of a standard software library that supports a full set of logical operations (or, equivalently, set theory operations) using, for example, algorithms described in the section titled “Operations on Unions of Simple Collections”. This manner of optimization may be useful for both interactive and automatic modes of accessing a classification. The examples considered here show that a particular classification implementation may use several different forms of attributive expressions for different operations within the classification.
For example, a first form of attributive expression may be used for developing and managing the classification, while a second form of attributive expression may be used for operating the classification to facilitate logical operations.
- 7. Applied polyhierarchical classifications may involve complex systems of criteria. As an example, a relatively small fragment of a generating polyhierarchy used as a foundation of a polyhierarchy of mathematical objects currently under development by QNT Software Development Inc. is presented in FIG. 16. If a classification includes too many simultaneously applicable criteria, then an appropriate ordering of those criteria may be required to provide convenient user interfaces.
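The compact storage idea in item 1(a) above can be illustrated with a bitmask encoding. The following is a minimal sketch, not the format used in the described embodiments: it assumes simple collections only, assigns each attribute a bit position, and relies on the conjunctive semantics of simple collections, under which a category is a subcategory of another when it carries all of the other's attributes. All function names are hypothetical.

```python
from typing import Dict, Iterable


def build_bit_index(attributes: Iterable[str]) -> Dict[str, int]:
    """Assign each known attribute a fixed bit position (assumed encoding)."""
    return {name: pos for pos, name in enumerate(sorted(set(attributes)))}


def encode(expression: Iterable[str], bit_index: Dict[str, int]) -> int:
    """Compress a simple collection of attributes into a single integer mask."""
    mask = 0
    for attr in expression:
        mask |= 1 << bit_index[attr]
    return mask


def pack(mask: int) -> bytes:
    """Serialize the mask as a binary string suitable for a table field."""
    return mask.to_bytes((mask.bit_length() + 7) // 8 or 1, "big")


def is_subcategory(specific: int, general: int) -> bool:
    """Inclusion test on compressed expressions, with no decoding step.

    The specific category must contain every attribute of the general one.
    """
    return (specific & general) == general
```

With this encoding the empty attributive expression is the mask 0, so every category tests as a subcategory of the topmost category, and a stored binary string can be compared after a single `int.from_bytes` call.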
A specific feature of various illustrative embodiments of the methods claimed herein is that their practical realization includes definitions of domains of applicability of the classification criteria. Therefore, restructuring many existing classification systems in order to represent them in the form of induced polyhierarchies may require auxiliary criteria introduced exclusively for defining domains of applicability of other criteria. If the original classification is not based on a well-reasoned system of criteria, the classification may require adjustments of user interfaces in order to fill the gap between the new structure of classification and the user's conservative perception. To illustrate these points,

The considerations above result in the conclusion that criteria presented to the user, as well as the order of their presentation, may not only be determined by the structure of the generating polyhierarchy, but also by considerations of the user's convenience, use of conventional terminology, and the like. Therefore, practical applications may require some auxiliary data structures specifying the interface protocol. Particularly, for the support of non-branching fragments of a generating polyhierarchy in the form of composite criteria, it may be sufficient to add a field for the “shown/hidden” flag to the “Criteria” table. However, it should be appreciated that such ordering of simultaneously applicable criteria and/or hiding criteria for the purpose of improving user interfaces may be accomplished without changing the underlying structure of the generating polyhierarchy.

The description of polyhierarchical classifications based on the use of generating polyhierarchies of criteria has several advantages over widely used conventional methods of description by trees, facets, directed acyclic graphs (DAGs), and their compositions (facets). Illustrative examples of some of these advantages include:
- 1. Compactness of descriptive data. The data structures required for describing a classification, which are usually stored in a data repository or represented by application-specific program code, are ordinarily an order of magnitude smaller than equivalent descriptions in terms of trees, DAGs, or facets. For basic operations, such as specifying object properties and searching for objects by superposition of dependent criteria, retrieving particular sub-hierarchies with dynamically generated attributive expressions that define intermediate categories, matching distant categories by the relation “general-specific”, and performing set theory operations on persistent and dynamic categories, it is only necessary to permanently store descriptions of the generating polyhierarchy and non-empty categories of the induced polyhierarchy (see, for example, the section above titled “Implicit Description of Induced Polyhierarchies of Categories”). Because the generating polyhierarchy contains information about relations between criteria (but not categories), its structure is vastly more compact than that of the induced polyhierarchy. Definition of categories in terms of attributive expressions makes it unnecessary to store information about inheritance relations between categories (which are usually represented by graph edges or relations in faceted thesauri) regardless of the complexity of the system of those relations.
- 2. Flexibility of the classification. Instead of listing consistent compositions of object properties that define classification categories, various illustrative embodiments of the classification claimed herein encode the full set of meaningful categories and inheritance relations between them by means of a generating polyhierarchy and sets of branches of its criteria. This simplifies modification of a polyhierarchical classification, for example, during its design, subsequent detailing, and when extending the classified set. So, for example, it proves useful not to list in advance the categories that are expected to be non-empty, but to form them automatically as new objects are included into the classification. Extension of the category polyhierarchy, required in order to introduce new options for defining object properties, is done simply by extending the set of branches of corresponding criteria. Somewhat more complex modifications necessary for a) increasing the level of detail in descriptions of object properties or b) composing several polyhierarchical classifications into one, can be performed by extending the generating polyhierarchy by adding new criteria or sub-hierarchies of criteria with the automatic expansion (or merger) of the previously formed persistent attributive expressions.
- 3. Simplification of managing algorithms. Describing categories of classification in terms of attributive expressions directly identifies compositions of object properties without involvement of any redundant information, such as the sequence of specializing those properties. Unlike the conventional methods of description in terms of trees, DAGs, or facets the description presented herein is essentially non-local, because each attributive expression defines an absolute location of a category in the induced polyhierarchy, and, therefore, encodes a full set of possible paths connecting categories in the equivalent DAG. This leads to a considerable reduction in computational costs for solving “non-local” problems, such as a) the check of the distant inheritance (i.e., inclusion) relation between two given categories (see, for example, the sections above titled “Illustrative Embodiments of the Induced Polyhierarchies of Categories”, “Test for Inclusion Relation”, and “Test for Inclusion”), b) determining the nearest common base category for a given set of categories, and c) determining the nearest common derived category for a given set of categories. Unlike algorithms attempting to solve these problems using local inheritance relations, the use of various illustrative embodiments of the methods claimed herein requires neither combinatorial search for a path nor storage of redundant descriptions.
- 4. Automatic unambiguousness and consistency of description. A generating polyhierarchy and the sets of criteria branches define the structure of an induced polyhierarchy of categories. Hence, the use of various illustrative embodiments of the methods claimed herein does not require the use of heuristics in determining what persistent categories are necessary and how they relate to each other. Also, consistency of sets of object properties encoded by attributive expressions automatically results from dependence relations between criteria, without the involvement of any auxiliary constructions, such as composition rules, roles and purposes, meta-facets and the like. As opposed to other methods of classification, auxiliary descriptions and/or computer programs are not required to ensure unambiguousness and consistency of input data when developing, maintaining, and using databases or other information repositories.
- 5. High abstraction level. As already noticed in the section above titled “Implicit Description of Induced Polyhierarchies of Categories”, the main stage of constructing a classification is the design of a generating polyhierarchy, which is performed by systemizing classification criteria that provide specialization of significant traits (i.e., properties of imaginary objects, distinguishable under the classification). Unlike the process of designing classifications described by trees, facets and DAGs, it is not required at this stage to a) prescribe an order between mutually independent criteria, b) list necessary vertices (categories), and c) introduce redundant edges or other auxiliary descriptive structures like meta-facets. So, various illustrative embodiments of the methods claimed herein allow the design of polyhierarchical classifications on the basis of broader concepts, without considering secondary, implementation-specific details. In addition, the generating polyhierarchy and/or its sub-hierarchies become primary information structures that can be developed independently and reused when creating various classifications.
- 6. Increased efficiency of interfaces. The absence of a prescribed order between criteria sharing a common root category and intrinsic recognition of criteria domains of applicability makes interactive data input when developing and using information repositories much more efficient. Thus, it is expedient to build an interface that provides a full set of criteria applicable at a current level of specialization, and allows any of these criteria to be applied for further specializing of the description. The use of such an interface for interactive specialization of object properties is similar to browsing partial sub-trees of the polyhierarchical classification, which are defined by the sequence of specializations. However, unlike algorithms implemented in conventional widespread interactive database management and search systems, various illustrative embodiments of the methods claimed herein allow these trees to be formed dynamically rather than to be predefined. This enables a considerable increase in the variety of criteria (i.e. level of detail in specialization of object properties) without paying for the additional costs entailed by an increase in the complexity of the interface or by the catastrophic expansion of descriptive data and/or managing programs.
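The automatic recognition of criteria applicable at the current level of specialization (advantage 6), together with the dependence relation that follows from the root categories, can be sketched as below. This is an illustration under stated assumptions: attributive expressions are simple collections stored as a mapping from criterion to branch, and the `CRITERIA` table with its criterion and branch names is an invented toy example, not data from the described embodiments.

```python
from typing import Dict, List, Set

Expression = Dict[str, str]   # criterion name -> selected branch (simple collection)

# Hypothetical generating polyhierarchy: each criterion lists its branches and
# its root category, i.e. the attributive expression of its domain of applicability.
CRITERIA = {
    "kind":    {"branches": ["vehicle", "building"], "root": {}},
    "engine":  {"branches": ["electric", "diesel"], "root": {"kind": "vehicle"}},
    "floors":  {"branches": ["single", "multi"], "root": {"kind": "building"}},
    "charger": {"branches": ["ac", "dc"],
                "root": {"kind": "vehicle", "engine": "electric"}},
}


def includes(general: Expression, specific: Expression) -> bool:
    """True if the category 'specific' lies inside the category 'general'."""
    return all(specific.get(c) == b for c, b in general.items())


def dependencies(criterion: str) -> Set[str]:
    """A criterion depends on every criterion whose attribute occurs in its root."""
    return set(CRITERIA[criterion]["root"])


def applicable_criteria(current: Expression) -> List[str]:
    """Criteria not yet specialized whose domain of applicability contains 'current'."""
    return sorted(
        name for name, spec in CRITERIA.items()
        if name not in current and includes(spec["root"], current)
    )
```

No order between criteria is stored anywhere: the list offered to the user is recomputed from the current attributive expression at each step, so the browsing tree is formed dynamically rather than predefined.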
General Guidelines for Implementation
In various illustrative embodiments of the present invention, as shown in A domain of applicability of each criterion is representable as a classification category defined by an attributive expression that is composed of attributes from other criteria, or by the empty attributive expression. Since some auxiliary criteria may be required for defining domains of applicability of the previously selected criteria, identifying classification criteria may be performed simultaneously with identifying their domains of applicability. The method -
- simple collections of attributes implying a logical conjunction of elementary specializations encoded by attributes from different criteria (see, for example, the section above titled “Illustrative Embodiments of Polyhierarchies of Criteria”);
- collections with branch unions allowing, in addition, a logical disjunction of elementary specializations encoded by attributes from the same criterion (see, for example, the section above titled “Unions of Criterion Branches”);
- unions of simple collections encoding arbitrary logical statements on object properties representable in terms of elementary specializations of criteria using conjunctions, disjunctions, differences, and negations (see, for example, the section above titled “Uniting Arbitrary Categories”);
- unions of collections with branch unions, which are an optimized version of unions of simple collections (see, for example, the section above titled “Other Aspects of Practical Implementation”);
- other application-specific attributive structures encoding logical statements of object properties in terms of elementary specializations.
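The first three forms listed above can be represented with small data types, and a collection with branch unions can be expanded into a union of simple collections. The sketch below is illustrative only; the type names and the `expand` helper are assumptions, and the criterion and branch names used with it are invented.

```python
from itertools import product
from typing import Dict, FrozenSet, Set, Tuple

Attribute = Tuple[str, str]                         # (criterion, branch)
SimpleCollection = FrozenSet[Attribute]             # conjunction, one branch per criterion
BranchUnionCollection = Dict[str, FrozenSet[str]]   # criterion -> union of its branches
UnionOfSimple = Set[SimpleCollection]               # disjunction of simple collections


def expand(collection: BranchUnionCollection) -> UnionOfSimple:
    """Rewrite a collection with branch unions as a union of simple collections.

    Every combination of one branch per criterion becomes a separate simple
    collection, so the result grows as the product of the branch-union sizes.
    """
    criteria = sorted(collection)
    choices = [sorted(collection[c]) for c in criteria]
    return {frozenset(zip(criteria, combo)) for combo in product(*choices)}
```

The multiplicative growth of the expanded form is one reason a mixed form, unions of collections with branch unions, can be preferable for classifications that use branch unions heavily.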
Since domains of criteria applicability should be representable as classification categories defined by attributive expressions, an optimal way of describing those domains may depend on the chosen form of the attributive expressions. On the other hand, describing domains of applicability for application-specific criteria may require support for some pre-defined set of logical operations that relate to application-specific forms of the attributive expressions. As a result, it is often the case that the steps of identifying a plurality of criteria (box

The method may further proceed by partially ordering the plurality of classification criteria into the generating polyhierarchy of criteria by identifying domains of criteria applicability in terms of their root categories described by respective attributive expressions, as set forth in box

The resulting generating polyhierarchy of criteria implicitly provides an unambiguous and exhaustive description of a structure of the target polyhierarchical classification (see, for example, the section above titled “Illustrative Embodiments of the Induced Polyhierarchy of Categories”). The generating polyhierarchy of criteria may be permanently stored in a data repository, or represented in an alternative form intended, for example, for distribution in a text format. The alternative form of representation of the generating polyhierarchy of criteria should be equivalent to the representation in terms of attributive expressions of root categories in the sense that the former can be automatically converted to the latter without using any extra information. On completing the step set forth in box
- superimposing the generating polyhierarchy to a stored set of object descriptions and interactively specializing new objects with an automatic recognition of domains of criteria applicability and random access to all the currently applicable criteria, as set forth in box **1950**, and/or
- supporting interactive search and retrieval of information on the classified objects with an automatic recognition of domains of criteria applicability and random access to all the currently applicable criteria, as set forth in box **1960**, and/or
- supporting automatic specialization of new objects using an auxiliary programming environment, and automatic search and retrieval of information on the classified objects specified by dynamically constructed attributive expressions, as set forth in box **1970**.
At oval -
- superimposed with several sets of classified objects having similar properties;
- added to more general template classifications as a component; or
- used as a prototype for constructing more comprehensive template classifications.
Accordingly, depending upon the implementation of the present invention, the steps set forth in boxes **1910**, **1920** and **1930** may be undertaken as separate steps from the steps described in boxes **1950**, **1960**, **1970**. For example, steps **1950**, **1960**, and **1970** may be repeated when superimposing different sets of classified objects with the template classification.
Further extensions and refinements of the target classification may include, for example:
- extending the set of objects superimposed with the generating polyhierarchy;
- introducing new branches to existing criteria;
- introducing new criteria to an existing generating polyhierarchy of criteria;
- extending an existing generating polyhierarchy of criteria by incorporating other generating polyhierarchies or their sub-hierarchies in the existing generating polyhierarchy of criteria.
When introducing a new criterion or incorporating a second polyhierarchy into an existing generating polyhierarchy, domains of applicability of the new components should be identified and represented by root categories in the existing generating polyhierarchy. To automatically establish a proper structure of dependency relationships between criteria of the original generating polyhierarchy and criteria of the new components, root categories of the new components should be defined in terms of attributive expressions composed of attributes from criteria of the original generating polyhierarchy,
A generating polyhierarchy generally encodes a target classification in a compact, clearly understandable form. For example,

To facilitate performing low-level operations with descriptive structures representing criteria, branches, attributes, attributive expressions and their components, the configuration of the data repository used for classification storage may be optimized. The optimal configuration of the data repository usually depends on the chosen form of the attributive expressions, as was schematically shown, for example, in the sections above titled “Illustrative Embodiments of Database Configuration Facilitating Simple Collections” and “Illustrative Embodiments of Database Configurations Facilitating Collections with Branch Unions and Unions of Simple Collections”. To support higher-level operations, such as access to a data repository, logical operations on attributive expressions, interactive input and output of object descriptions, programming interfaces for automatic specialization as set forth in box

At different stages of the method implementation, including construction, management, and use of the polyhierarchical classification, the software environment may generally support a number of operating modes. These operating modes may include, for example:
- describing root categories of new criteria and/or sub-hierarchies of criteria for an existing (in particular, empty) generating polyhierarchy of criteria and incorporating new components to the existing generating polyhierarchy;
- removing selected criteria and/or sub-hierarchies of criteria from an existing generating polyhierarchy;
- adding branches to existing criteria and removing selected branches of criteria;
- automatically constructing attributive expressions of classification categories defined by sequences of specializations by applicable criteria;
- automatically performing tests for inclusion between categories represented by their attributive expressions;
- automatically recognizing applicable criteria at a current specialization level;
- browsing an induced polyhierarchy of categories where the attributive expressions describing the categories are automatically constructed in run-time;
- dynamically extracting user-specified sub-hierarchies of an induced polyhierarchy of categories using algorithms for retrieving direct parent and direct child categories;
- automatically performing set-theory operations on categories represented by their attributive expressions, where a set of supported operations depends on the chosen form of the attributive expressions;
- interactively associating classified objects with classification categories via specifying object properties by superposition of applicable criteria with an automatic generation of the respective attributive expression;
- automatically associating classified objects with classification categories using programming interfaces specifically designed for an automatic identification of object properties in terms of attributive expressions;
- moving classified objects from one category to another and removing selected objects from a classification;
- interactively searching for particular objects via specifying object properties by superposition of applicable criteria, or using user-specified queries encoding logical statement of a composition of object properties;
- automatically searching for particular objects using programming interfaces designed for an automatic generation of queries encoding logical statement of a composition of object properties;
- automatically recognizing persistent categories required to be associated with new classified objects, and recording respective attributive expressions in a data repository;
- removing attributive expressions of selected persistent categories from a data repository.
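Several of the operating modes listed above (associating objects with categories, recording persistent categories automatically, moving and removing objects, and searching by a composition of properties) can be sketched with a small in-memory structure. This is a hypothetical illustration restricted to simple collections; the `Classification` class and its method names are not part of the described embodiments, which rely on a data repository such as a relational database.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Attribute = Tuple[str, str]          # (criterion, branch)
Expression = FrozenSet[Attribute]    # simple collection of attributes


class Classification:
    """Toy store of classified objects keyed by their attributive expressions."""

    def __init__(self) -> None:
        self.persistent: Set[Expression] = set()   # recorded persistent categories
        self.objects: Dict[str, Expression] = {}   # object id -> its category

    def associate(self, obj_id: str, expr: Expression) -> None:
        """Associate (or move) an object and record its category as persistent."""
        self.objects[obj_id] = expr
        self.persistent.add(expr)

    def remove_object(self, obj_id: str) -> None:
        """Remove an object from the classification; categories are kept."""
        self.objects.pop(obj_id, None)

    def remove_category(self, expr: Expression) -> None:
        """Drop a persistent category that no object refers to any longer."""
        if expr not in self.objects.values():
            self.persistent.discard(expr)

    def search(self, query: Expression) -> List[str]:
        """Objects whose category is included in the category given by the query."""
        return sorted(o for o, e in self.objects.items() if query <= e)
```

Here a query matches an object when every attribute of the query appears in the object's expression, which corresponds to the inclusion test for simple collections under conjunctive semantics.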
As indicated above, aspects of this invention pertain to specific “method functions” implementable through various information processing systems including, but not limited to, electronic, photonic, quantum, biological and mechanical systems. In an alternate embodiment, the invention may be implemented as a computer program product for use with a computer system, control device, interface subsystem, or their components such as integrated circuits. Those skilled in the art should readily appreciate that programs defining the functions of the present invention can be delivered to a computer in many forms, which include, but are not limited to: (a) information permanently stored on non-writeable storage media (e.g., read only memory devices within a computer such as ROMs or CD-ROM disks readable only by a computer I/O attachment); (b) information alterably stored on writeable storage media (e.g., floppy disks and hard drives); (c) information conveyed to a computer through communication media, such as a local area network, a telephone network, or a public network like the Internet; or (d) information encoded in a pre-designed structure of hardware component, such as a microchip. It should be understood, therefore, that such media, when carrying computer readable instructions that direct the method functions of the present invention, represent alternate embodiments of the present invention. The particular embodiments disclosed above, and described with particularity, are illustrative only, as the invention may be modified and practiced in different but equivalent manners apparent to those skilled in the art having the benefit of the teachings herein. Furthermore, no limitations are intended to the details of construction or design herein shown, other than as described in the claims below. It is therefore evident that the particular embodiments disclosed above may be altered or modified and all such variations are considered within the scope and spirit of the invention. 
Accordingly, the protection sought herein is as set forth in the claims below.