WO2001059610A2 - Generation and effectiveness evaluation of a multi-feature classification system using genetic algorithms - Google Patents

Generation and effectiveness evaluation of a multi-feature classification system using genetic algorithms

Info

Publication number
WO2001059610A2
Authority
WO
WIPO (PCT)
Prior art keywords
features
feature
classification
content material
combination
Prior art date
Application number
PCT/EP2001/000311
Other languages
French (fr)
Other versions
WO2001059610A3 (en
Inventor
James D. Schaffer
Original Assignee
Koninklijke Philips Electronics N.V.
Priority date
Filing date
Publication date
Application filed by Koninklijke Philips Electronics N.V. filed Critical Koninklijke Philips Electronics N.V.
Priority to KR1020017012812A priority Critical patent/KR20010113779A/en
Priority to DE60128405T priority patent/DE60128405D1/en
Priority to EP01951179A priority patent/EP1397759B1/en
Priority to JP2001558869A priority patent/JP2003534583A/en
Publication of WO2001059610A2 publication Critical patent/WO2001059610A2/en
Publication of WO2001059610A3 publication Critical patent/WO2001059610A3/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/12Computing arrangements based on biological models using genetic models
    • G06N3/126Evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211Selection of the most significant subset of features
    • G06F18/2111Selection of the most significant subset of features by using evolutionary computational techniques, e.g. genetic algorithms

Definitions

  • This invention relates to the field of classification systems, and in particular to the selection of the features and combinations of features that are used to determine a given sample's classification.
  • a number of methods are available for characterizing the content of a particular piece of material.
  • television guides containing a synopsis of each program are available, and automated systems have been proposed for categorizing programs, and segments of programs, based on an analysis of the images contained in each image frame.
  • web crawlers are used to extract key words and phrases from each web page to facilitate the search for material based on such key words or phrases, or synopses of select web pages are manually created to form an index to facilitate these searches.
  • speech recognition techniques may be employed to create an index of key words used in a television or radio program, or in the lyrics of a song, and so on. Other characterization methods are also employed based on other factors as well.
  • the time of day, day of the week, and season of the year may be included in the characterization of broadcast entertainment material, distinguishing, for example, between "prime time" programs and "before dawn" programs, as a potential indicator of program quality or popularity.
  • the producer, director, actors, broadcast network, type of provider, and so on may also be used to characterize a program.
  • similar parameters may also be used, such as the number of "hits" a particular web page experiences per day, the number of other web pages that reference this web page, the author of the web page, and so on.
  • the term "content material” is used hereinafter to refer to material that is related to the contents of information items, entertainment items, and other items that are potentially available for classification or characterization.
  • the content material may include the contents of the information or entertainment item itself, an abstract or synopsis of the item, information related to the creation or presentation of the item, and so on.
  • the term "feature" is used hereinafter to refer to a characteristic that is potentially available to facilitate the classification or characterization.
  • each word in a synopsis of a television program is a feature that can be used to facilitate the characterization of the content material of that television program; the director's name is also a feature, as is the time of day that the program is broadcast.
  • each key word of a web page is a feature, as is the provider of the web page, the family of pages to which this page belongs, and so on.
  • the effectiveness and efficiency of a classification system is highly dependent upon the choice of features used to classify the content material. This effectiveness and efficiency is particularly dependent upon the choice of features that comprise a combination of features.
  • the choice of features that comprise a combination of features is often a subjective choice, and is often a manually intensive process. For example, it is straightforward to use the words of a synopsis as the set of features that will be used to classify a television program. Each synopsis is processed to identify each word and to remove noise words. The resultant list of words used in the synopsis, potentially ordered by their frequency of occurrence, is stored in a database for subsequent processing to determine the subject matter classification for that content material, or to determine whether these words are correlated with words that are related to a user's preference, and so on.
  • a chromosome would contain combinations of features; in the above example, the chromosome would contain a subset of all the words used in the synopses of many programs. Different chromosomes would contain different subsets.
  • each chromosome that contains these words in its subset of words will generally exhibit a better classification performance than a similar chromosome with fewer of these particular words, whereas the presence or absence of words that are common to a variety of classifications will not significantly affect their chromosomes' classification performance.
  • the performance of the evolved chromosomes can be expected to increase.
  • a single chromosome, or subset of words, is selected as the best performing set of words for distinguishing among program classifications.
  • the need for a selection of a set of features that provides an effective and efficient means of characterizing or classifying content material is particularly important as the resources available for such characterizing or classifying become limited. For example, as technologies become available, viewers will expect their newly acquired home entertainment systems to provide program selection assistance, based on a "preferences" profile. These systems, however, will typically contain limited processing and storage capabilities, and may not, for example, be able to store every word and phrase of every synopsis available for such selection assistance. The inclusion of a non-discriminating word in the limited storage will be wasteful, and, more significantly, may also decrease the classification accuracy by introducing false distinctions. Thus, a classification system must be effective in the dual task of selecting effective discriminating features and excluding counter-productive non-discriminating features, and, in general, the effects of including or excluding a feature are non-additive.
  • Evolutionary algorithms hold the promise of providing an identification of the most effective words, or features, to include in a classification system having limited processing and storage capabilities, and this invention addresses a method and apparatus that further enhance the use of evolutionary algorithms for identifying effective feature subsets.
  • An initial set of features is defined that includes a large number of potential features, including the generated features that are combinations of other features. These features include, for example, all of the words used in a collection of content material that has been previously classified, as well as combination features based on these features, such as all the noun and verb phrases used.
  • This pool of original features and combination features is provided to an evolutionary algorithm for a subsequent evaluation, generation, and determination of the best subset of features to use for classification. In this evaluation and generation process, each combination feature is processed as an independent feature, independent of the features that were used, or not used, to form the combination feature.
  • a particular phrase that is generated as a combination of original feature words may be determined to be a better distinguishing feature than any of the original feature words and a more efficient distinguishing feature than an unrelated selection of the individual feature words, as might be provided by a conventional evolutionary algorithm.
  • the resultant best performing subset is subsequently used to characterize new content material for automated classification. If the automated classification includes a learning system, the evolutionary algorithm and the generated combination features are also used to train the learning system.
  • FIG. 1 illustrates an example block diagram of a feature set selection system with a feature combination generator in accordance with this invention.
  • FIG. 2 illustrates an example block diagram of a classification system for classifying content material based on a preferred feature set as determined by the feature set selection system in accordance with this invention.
  • FIG. 3 illustrates an example block diagram of a classification system for classifying content material via a learning system in accordance with this invention.
  • the same reference numerals indicate similar or corresponding features or functions.
  • a conventional feature selection system based on individual features may include both the features of "red" and "cross", but because they are virtually independent features, the distinguishing capabilities of these features are likely to be poorer than those of a single feature "red cross"; and, because the feature "red" may be strongly correlated to another classification, the resultant classification using the independent feature "red" may be in error.
  • the occurrence of "the red cross” would likely be more indicative of a classification than an occurrence of "a red cross”, whereas a conventional classification process would not use the word "the” as a distinguishing feature.
  • combination features are generated from the individual features that are conventionally used to classify content material.
  • these generated combination features are treated as being substantially independent of the features that form the combination. For example, if “red” is strongly correlated with an "art” classification, and “red cross” is strongly correlated to a "humanitarian” classification, both features, the original “red” feature and the generated “red cross” feature, may be included in the feature set that is used to classify content material.
  • word phrases that may have effective distinguishing capabilities
  • other feature combinations may be formed.
  • particular director-producer, director-actor, actor-actress combinations may provide for a better characterization of content material than the individual director, actor, actress, and producer features.
  • the combination of the provider of a web site and particular key words or phrases may facilitate a more effective characterization of a web page.
  • the combination of "Philips" as the provider of information on a web page and "entertainment systems" as a key phrase may characterize a page differently than the presence of "Philips” and "entertainment systems” on a web page that is provided by a different provider. Because the number of potential basic features and combination features is virtually limitless, requiring therefore the use of a subset of features for content material classification, an evolutionary algorithm is used in a preferred embodiment for selecting which of the features to use.
  • FIG. 1 illustrates an example block diagram of a feature set selection system 100 in accordance with this invention.
  • the example selection system 100 includes an evolutionary algorithm 160 that is used to generate sets of features that are evaluated with regard to their effectiveness in classifying content material.
  • the evolutionary algorithm 160 uses a pool of features 110 as candidate features for inclusion in the selected set of features that are used in the classification process.
  • a feature combination generator 140 is used to generate combination features 141 that include combinations of individual features. Techniques are conventionally available for identifying noun-phrases and verb-phrases, by identifying, for example, adjectives followed by nouns, adverbs followed by verbs, adverb-adjective-noun combinations, and so on. A simpler technique, such as the selection of every sequential pair of words and every sequential triplet of words, or every pairing of non-word features, such as actor-director, producer-director, actor-producer, and so on, may also be used in the feature combination generator 140.
  • the number of different features, including both the basic features and the combination features, that can be used to facilitate the classification of content material is very large, particularly when features can be formed as combinations of other features.
  • the number of possible subsets that may be drawn from a very large pool of features can be astronomically large.
  • the evolutionary algorithm 160 is used to determine a set of features that is likely to be more effective than other sets of features for a given classification task.
  • Evolutionary algorithms operate via an iterative offspring production process, and include genetic algorithms, mutation algorithms, and the like.
  • the offspring production process of an evolutionary algorithm is used to determine which particular sets of genes are most effective for performing a given task, using a directed trial and error search.
  • a set of genes, or attributes, is termed a chromosome.
  • a reproduction-recombination cycle is used to propagate generations of offspring.
  • members of a population having different chromosomes mate and generate offspring.
  • These offspring have attributes passed down from the parent members, typically as some random combination of genes from each parent.
  • In a classic genetic algorithm, the individuals that are more effective than others in performing the given task are provided a higher opportunity to mate and generate offspring.
  • the individuals having preferred chromosomes are given a higher opportunity to generate offspring, in the hope that the offspring will inherit whichever genes allowed the parents to perform the given task effectively.
  • the recombination phase of the reproduction-recombination cycle effects the formation of the next generation of parents based on a preference for those exhibiting effectiveness for performing the given task. In this manner, the number of offspring having attributes that are effective for performing the given task will tend to increase with each generation.
  • Paradigms of other methods of generating offspring such as asexual reproduction, mutation, and the like, are also used to produce generations of offspring having an increasing likelihood of improved abilities to perform the given task.
  • the population consists of members having features that may be effective in classifying content material.
  • some features represent combinations of other features, independent of the individual features. That is, for example, the phrase “flying saucer” may be a feature, whose effectiveness in characterizing and classifying content material is processed substantially independent of the "flying" feature and the “saucer” feature. That is, the feature “flying saucer” will be passed on to future generations, or not passed on to future generations, without regard to whether the features "flying” or “saucer” are passed on.
  • noun phrases and verb phrases are treated as features that are independent of the word features that form such phrase features; director-actor features are independent of the particular director or actor feature, and so on. It has been found that this independent consideration of combination features is particularly well suited for the selection of classification features for use in a limited-resource embodiment, such as the aforementioned embodiment for a home entertainment system. That is, if the number of features that can be utilized for a particular embodiment is limited, the independent consideration of combination features will often lead to an elimination of redundant feature items. Assume, for example, that the aforementioned "flying saucer" feature is one that is highly effective in determining whether a given program is classified as "science fiction".
  • the "flying" and "saucer" feature genes are likely to die out, because the marginal effectiveness gained or lost by including or not including the "flying" or the "saucer" feature gene is likely to be minimal in chromosomes that contain the "flying saucer" gene.
  • When the "flying" and "saucer" feature genes die out in a limited-feature embodiment, they are replaced by other features, such as a "murder" feature gene that is effective in determining whether a given program is classified as "mystery".
  • features are defined that may potentially facilitate the classification of content material, and, in a preferred embodiment, these features include combinations of other features.
  • Candidate sets of select features are encoded as chromosomes that reflect different sets of abilities for distinguishing the content material to facilitate classification of the content material. Some sets of features are more effective for classifying the content material than other sets. By generating offspring from the members having chromosomes that are more effective for classification than others, the effectiveness of the offspring for properly classifying content material is likely to increase.
  • a pool of features 110 is provided that includes those features that may potentially facilitate a classification of content material. As discussed above, these features may include the words used in content material, the words used in the synopses of content material, the creator of the content material, the performers in the content material, and so on.
  • the feature combination generator 140 augments this pool of features 110 with combination features 141, as discussed above.
  • multiple techniques are employed in the combination generator 140 to generate combination features: a phrase identifier creates a feature for each phrase, and a combinatorial generator generates a variety of combinations of the features that are not word based.
  • the combination generator 140 also allows for the predefinition of likely combinations, such as director-actor, as well as the generation of substantially random combinations.
  • a set selector 120 creates a set of features from this pool of features 110.
  • the set selector 120 provides an initial population of feature sets 130 to a classification evaluator 150 to evaluate the effectiveness of each set of features 131 for classifying a collection 190 of preclassified content material.
  • the collection 190 contains content material items 191 and the proper classification 192 of each of the content material items 191. That is, for example, the collection 190 may be a collection of information regarding television programs, and the proper classification 192 is the category within which an existing program guide placed each television program 191, such as comedy, drama, sci-fi, mystery, news, and so on.
  • the proper classification 192 may be provided by a potential viewer who classifies each program 191 as "strongly likes", “likes", “no opinion”, "dislikes", and
  • the collection 190 may contain the information regarding all television programs provided during the past month, and the proper classification 192 is whether a particular viewer "watched" or "didn't watch" each program 191. Using an on-line monitor of the programs selected for viewing, this simpler embodiment allows for a classification of each television program 191 into the two classes of watched and not-watched without requiring a direct user input.
  • the collection of content material 190 may be a collection of electronic documents, a collection of abstracts, a collection of web pages, and so on, and the proper classification 192 may be "fiction", "history", "gossip", and so on. Or, the classification 192 may merely be "viewed" and "not-viewed".
  • the classification evaluator 150 determines the effectiveness 151 of each candidate set of features 131 for providing a classification that corresponds to the proper classification 192 of each content material item 191.
  • the evolutionary algorithm 160 thereafter provides parameters to the selector 120 for creating the next generation of feature sets 130, based on the effectiveness 151 of the prior generation of feature sets 130.
  • the evolutionary algorithm 160 provides parameters 161 that favor the generation of sets having features common with the more effective sets of the prior generation.
  • the evolutionary algorithm 160 continues to produce, via the set selector 120, generation after generation of candidate sets of features until a preferred set 131' is identified, typically the best performing set of features 131 found during this offspring generation process.
  • a number of techniques are available for terminating the search for the preferred set 131'.
  • a fixed time limit may be placed on the offspring generation process, the number of generations may be limited, convergence characteristics may be used to terminate the process when the incremental gain of each generation is below a cutoff limit, and so on.
  • Once a feature set 131' is found that has been shown to be effective in classifying the collection of prior classified material 190, it is used by a classifier 240 to classify new content material 291.
  • the classifier 240 uses the same classification process that is used in the classification evaluator 150. If, for example, the preferred feature set 131' was effective for classifying programs 191 as comedy, mystery, drama, etc., and the same classification process is used in the classifier 240, it is reasonable to assume that this same set 131' will be effective for classifying unknown programs 291 into comedy, mystery, drama, etc.
  • the classifier 240 could use the set 131' for classifying synopses of upcoming programs, and present the results of the classification process as a list of "suggested programs to view", based on the viewer's prior classifications of programs 190.
  • a number of techniques can be applied to evaluate a set's effectiveness in classifying a collection 190 of content material.
  • a learning system is used to learn how to best apply each set of features 131 to the given classification task.
  • FIG. 3 illustrates an example block diagram of a classification system 300 for classifying content material 291 via a learning system 320 in accordance with this invention.
  • the classification system 300 provides the functions illustrated as the classification evaluator 150 of FIG. 1 and the classifier 240 of FIG. 2.
  • a portion of the collection of prior classified content material (item 190 in FIG. 1) is used to provide training content material 191A, the remainder providing evaluation content material 191B.
  • the training content material 191A is provided, via switch S1, to an input processor 310.
  • the input processor processes the content material 191A to provide feature values 311 corresponding to the set of features 131 being used to train the learning system 320. For example, if the set of features includes word or phrase features, the input processor determines whether or not the content material 191A contains each of the word or phrase features, and, depending upon the learning system 320, perhaps the number of occurrences of each word or phrase feature.
  • While the training content material 191A is being provided to the learning system 320, the learning system 320 is placed in a training mode, illustrated by the switch 329. Using techniques common in the art, such as the adjustment of weights of nodes in a neural network, or the adjustment of correlation factors in a Bayesian classifier, the learning system 320 is trained to increase the likelihood that the given feature set 131 will provide a classification corresponding to the proper classification 192A corresponding to the training content material 191A. Subsequent training content material items 191A are similarly applied to the learning system 320 to increase the overall likelihood that the feature set 131 would properly classify the training content material 191A.
  • After the learning system 320 is trained to optimize the performance of the feature set 131 relative to the training content material 191A, the previously classified evaluation content material 191B is provided to the input processor 310, via switch S1, and the corresponding feature values 311 are applied to learning system 320.
  • the learning system 320 is operated in an execute mode, illustrated by switch 329, when the evaluation content material 191B is applied, so that the learning system 320 provides a classification 241 of the evaluation content material 191B based on the feature set 131 that was used to train the learning system 320.
  • the determined classification 241 is provided to an evaluator 350, via the switch S2.
  • the evaluator 350 compares the determined classification 241 with the proper classification 192B corresponding to the content material 191B.
  • After processing each of the evaluation content material items 191B using the given feature set 131, the evaluator 350 provides a measure of effectiveness 151 to the evolutionary algorithm 160, corresponding to the classification effectiveness of the given feature set 131. As discussed above, the evolutionary algorithm 160 provides selection parameters 161 to the set selector 120, based on the effectiveness of previously evaluated feature sets 131.
  • After a sufficient number of feature sets 131 are processed and evaluated against the evaluation content material 191B, the evolutionary algorithm 160 and set selector 120 provide the preferred feature set 131' as a final input to the input processor 310. Depending upon whether the parameters corresponding to the training of the learning system are saved for each evaluated feature set 131, the learning system 320 is either reloaded with these parameters, or retrained using these parameters.
  • the entire collection 190 of previously classified content material 191 may be applied as the training content material 191A, to potentially improve the likelihood of the preferred set 131' being able to classify new content material 291, by exposing the preferred set 131' to a larger variety of content material 191.
  • the switch S1 is switched to receive the new content material 291
  • the switch 329 is switched to place the learning system 320 into the execute mode
  • the switch S2 is switched to the production mode.
  • the system 300 provides a determined classification 241 of that new content material 291, based on the preferred feature set 131'. Note that after the preferred feature set 131' is selected and the learning system
  • the classification system 300 can be embodied on a relatively large computing system to effect the training and evaluation required to determine the preferred set of features for a given classification task, and then the results of this determination, including the parameters that optimize the performance of a classifier 240 for the determined set of features, can be downloaded to a limited-capacity classifier 240.
  • a set-top box is used to interface with a classification system 300 that is located at a site on the Internet, and the results of the determination of the preferred feature set and related parameters are subsequently downloaded to the set-top box.
  • the combination features 141 are presented above as inclusive combinations, for ease of understanding. That is, for example, the feature “red cross” includes both the “red” and “cross” feature occurring sequentially. Alternatively, a combination feature may be defined as the occurrence of one feature in the absence of another feature, such as “red” without “cross” immediately following, or “cross” without “red” immediately preceding. Such variations, and others, will be evident to one of ordinary skill in the art, and included within the spirit and scope of the following claims.
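
The simpler combination techniques mentioned above (every sequential pair and triplet of words, and the alternative of one feature occurring in the absence of another) can be illustrated with a minimal Python sketch. The function names `sequential_ngrams`, `extract_features`, and `occurs_without_following` are hypothetical, and splitting on whitespace is an assumption made for brevity; the source does not specify how the feature combination generator 140 tokenizes text.

```python
from typing import List, Optional, Set


def sequential_ngrams(words: List[str], n: int) -> List[str]:
    """Return every run of n consecutive words, joined by a space."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def extract_features(text: str) -> Set[str]:
    """Collect individual word features plus pair and triplet combination features."""
    words = text.lower().split()  # assumption: naive whitespace tokenization
    features: Set[str] = set(words)
    features.update(sequential_ngrams(words, 2))
    features.update(sequential_ngrams(words, 3))
    return features


def occurs_without_following(words: List[str], first: str, second: str) -> bool:
    """True if `first` occurs at least once without `second` immediately after it."""
    for i, word in enumerate(words):
        if word == first:
            following: Optional[str] = words[i + 1] if i + 1 < len(words) else None
            if following != second:
                return True
    return False
```

Each string returned by `extract_features` would enter the pool as its own feature, so "red cross" can be kept or dropped independently of "red" and "cross".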

Abstract

The features that are presented to an evolutionary algorithm (160) are preprocessed to generate combination features (141) that may be more efficient in distinguishing among classifications than the individual features (110) that comprise the combination feature. An initial set of features is defined that includes a large number of potential features, including the generated features that are combinations of other features. These features include words used in a collection of content material that has been previously classified, as well as combination features based on these features. This pool of original features (110) and combination features (141) is provided to an evolutionary algorithm for a subsequent evaluation, generation, and determination of the best subset of features (131') to use for classification. In this evaluation and generation process, each combination feature is processed as an independent feature, independent of the features that were used, or not used, to form the combination feature. The resultant best performing subset (131') is subsequently used to characterize new content material for automated classification.
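
The selection loop summarized in this abstract can be sketched in Python under several loudly stated assumptions. The names `effectiveness`, `evolve`, `TRAIN`, `EVALUATION`, and `POOL` are hypothetical; the per-feature voting classifier is only a stand-in for whatever learning system is actually used; and the fixed subset size, truncation selection, and 20% mutation rate are illustrative choices that the source does not prescribe.

```python
import random
from collections import Counter, defaultdict
from typing import FrozenSet, List, Sequence, Set, Tuple

Item = Tuple[Set[str], str]  # (features present in an item, its proper classification)


def effectiveness(subset: FrozenSet[str], train: Sequence[Item],
                  evaluation: Sequence[Item]) -> float:
    """Fraction of evaluation items classified correctly using only `subset`.

    Stand-in classifier (an assumption): each selected feature votes for the
    class it most often accompanied in the training items.
    """
    counts = defaultdict(Counter)
    for feats, label in train:
        for f in sorted(feats & subset):
            counts[f][label] += 1
    correct = 0
    for feats, label in evaluation:
        votes = Counter()
        for f in sorted(feats & subset):
            if f in counts:
                votes[counts[f].most_common(1)[0][0]] += 1
        if votes and votes.most_common(1)[0][0] == label:
            correct += 1
    return correct / len(evaluation)


def evolve(pool: List[str], train: Sequence[Item], evaluation: Sequence[Item],
           set_size: int, pop_size: int = 20, generations: int = 30,
           seed: int = 0) -> FrozenSet[str]:
    """Search for an effective fixed-size feature subset (chromosome)."""
    rng = random.Random(seed)

    def score(subset: FrozenSet[str]) -> float:
        return effectiveness(subset, train, evaluation)

    population = [frozenset(rng.sample(pool, set_size)) for _ in range(pop_size)]
    best = max(population, key=score)
    for _ in range(generations):
        # Truncation selection: the more effective half become parents.
        parents = sorted(population, key=score, reverse=True)[: pop_size // 2]
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Recombination: the child draws its genes from both parents.
            child = set(rng.sample(sorted(a | b), set_size))
            if rng.random() < 0.2:  # mutation: swap one gene for a pool feature
                child.remove(rng.choice(sorted(child)))
                child.add(rng.choice([f for f in pool if f not in child]))
            children.append(frozenset(child))
        population = children
        best = max(population + [best], key=score)  # never lose the best so far
    return best


# Toy preclassified collection; combination features sit in the pool as
# independent features alongside the words that form them.
TRAIN: List[Item] = [
    ({"flying", "saucer", "flying saucer", "the"}, "sci-fi"),
    ({"murder", "the", "detective"}, "mystery"),
    ({"flying saucer", "alien", "the"}, "sci-fi"),
    ({"murder", "flying", "the"}, "mystery"),
]
EVALUATION: List[Item] = [
    ({"flying saucer", "the", "a"}, "sci-fi"),
    ({"murder", "a", "the"}, "mystery"),
    ({"flying saucer", "saucer"}, "sci-fi"),
    ({"murder", "detective"}, "mystery"),
]
POOL = sorted({"flying", "saucer", "flying saucer", "the", "murder",
               "detective", "alien", "a"})
```

On this toy data, with room for only two features, the subset {"flying saucer", "murder"} classifies every evaluation item correctly, whereas {"flying", "saucer"} classifies only one of the four. Calling `evolve(POOL, TRAIN, EVALUATION, set_size=2)` runs the search; the result depends on the seed and is not guaranteed to be the optimum.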

Description

Multi-Feature Combination Generation and Classification Effectiveness Evaluation Using Genetic Algorithms
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to the field of classification systems, and in particular to the selection of the features and combinations of features that are used to determine a given sample's classification.
2. Description of Related Art
Consumers are being provided an ever-increasing supply of information and entertainment options. Hundreds of television channels are available to consumers, via broadcast, cable, and satellite communications systems, and the Internet provides a virtually unlimited supply of material spanning most fields of potential interest. Because of the increasing supply of information, entertainment, and other material, it is becoming increasingly difficult for a consumer to locate material of specific interest. A number of techniques have been proposed for easing the selection task, most of which are based on a classification of the available material's content, and a corresponding classification of a user's interest.
A number of methods are available for characterizing the content of a particular piece of material. In the entertainment field, television guides containing a synopsis of each program are available, and automated systems have been proposed for categorizing programs, and segments of programs, based on an analysis of the images contained in each image frame. In the information field, web crawlers are used to extract key words and phrases from each web page to facilitate the search for material based on such key words or phrases, or synopses of select web pages are manually created to form an index to facilitate these searches. In like manner, speech recognition techniques may be employed to create an index of key words used in a television or radio program, or in the lyrics of a song, and so on. Other characterization methods are also employed based on other factors as well. For example, the time of day, day of the week, and season of the year may be included in the characterization of broadcast entertainment material, distinguishing, for example, between "prime time" programs and "before dawn" programs, as a potential indicator of program quality or popularity. The producer, director, actors, broadcast network, type of provider, and so on, may also be used to characterize a program. In the information field, similar parameters may also be used, such as the number of "hits" a particular web page experiences per day, the number of other web pages that reference this web page, the author of the web page, and so on. For ease of reference, the term "content material" is used hereinafter to refer to material that is related to the contents of information items, entertainment items, and other items that are potentially available for classification or characterization. The content material may include the contents of the information or entertainment item itself, an abstract or synopsis of the item, information related to the creation or presentation of the item, and so on. 
The term "feature" is used hereinafter to refer to a characteristic that is potentially available to facilitate the classification or characterization. For example, each word in a synopsis of a television program is a feature that can be used to facilitate the characterization of the content material of that television program; the director's name is also a feature, as is the time of day that the program is broadcast. In like manner, each key word of a web page is a feature, as is the provider of the web page, the family of pages to which this page belongs, and so on.
The effectiveness and efficiency of a classification system is highly dependent upon the choice of features used to classify the content material. This effectiveness and efficiency is particularly dependent upon the choice of features that comprise a combination of features. The choice of features that comprise a combination of features is often a subjective choice, and is often a manually intensive process. For example, it is straightforward to use the words of a synopsis as the set of features that will be used to classify a television program. Each synopsis is processed to identify each word and to remove noise words. The resultant list of words used in the synopsis, potentially ordered by their frequency of occurrence, is stored in a database for subsequent processing to determine the subject matter classification for that content material, or to determine whether these words are correlated with words that are related to a user's preference, and so on. Not every word, however, is equally effective in distinguishing among programs of different classifications. Some words, for example, may have a high frequency of occurrence in programs, regardless of the program's classification. Other words may have a low frequency of occurrence, but when they appear, are highly effective for distinguishing between program classifications. Evolutionary algorithms, discussed below, have been demonstrated to be particularly effective for determining the combination of features that provide a high degree of distinction among programs of differing classifications. In a traditional evolutionary algorithm, a chromosome is formed that contains combinations of features; in the above example, the chromosome would contain a subset of all the words used in the synopses of many programs. Different chromosomes would contain different subsets.
If a particular set of words is effective in distinguishing programs, each chromosome that contains these words in its subset of words will generally exhibit a better classification performance than a similar chromosome with fewer of these particular words, whereas the presence or absence of words that are common to a variety of classifications will not significantly affect their chromosomes' classification performance. By continually evolving alternative chromosomes based on the performance of prior chromosomes, with a preference for the evolution of chromosomes having traits (subsets of words) similar to those of the better performing prior chromosomes, the performance of the evolved chromosomes can be expected to increase. At the end of the evolutionary process, a single chromosome, or subset of words, is selected as the best performing set of words for distinguishing among program classifications.
The need for a selection of a set of features that provides an effective and efficient means of characterizing or classifying content material is particularly important as the resources available for such characterizing or classifying become limited. For example, as technologies become available, viewers will expect their newly acquired home entertainment systems to provide program selection assistance, based on a "preferences" profile. These systems, however, will typically contain limited processing and storage capabilities, and may not, for example, be able to store every word and phrase of every synopsis available for such selection assistance. The inclusion of a non-discriminating word in the limited storage will be wasteful, and, more significantly, may also decrease the classification accuracy by introducing false distinctions. Thus, a classification system must be effective in the dual task of selecting effective discriminating features and excluding counter-productive non-discriminating features, and, in general, the effects of including or excluding a feature are non-additive.
BRIEF SUMMARY OF THE INVENTION
Evolutionary algorithms hold the promise of providing an identification of the most effective words, or features, to include in a classification system having limited processing and storage capabilities, and this invention addresses a method and apparatus that further enhance the use of evolutionary algorithms for identifying effective feature subsets.
It is an object of this invention to facilitate the identification and choice of features that are used to characterize content material using an evolutionary algorithm. It is a further object of this invention to facilitate the formation of combination features that are used to characterize content material using an evolutionary algorithm.
These objects and others are achieved by preprocessing the features that are presented to an evolutionary algorithm to generate combination features that may be more efficient in distinguishing among classifications than the individual features that comprise the combination feature. An initial set of features is defined that includes a large number of potential features, including the generated features that are combinations of other features. These features include, for example, all of the words used in a collection of content material that has been previously classified, as well as combination features based on these features, such as all the noun and verb phrases used. This pool of original features and combination features is provided to an evolutionary algorithm for a subsequent evaluation, generation, and determination of the best subset of features to use for classification. In this evaluation and generation process, each combination feature is processed as an independent feature, independent of the features that were used, or not used, to form the combination feature. In this manner, for example, a particular phrase that is generated as a combination of original feature words may be determined to be a better distinguishing feature than any of the original feature words and a more efficient distinguishing feature than an unrelated selection of the individual feature words, as might be provided by a conventional evolutionary algorithm. The resultant best performing subset is subsequently used to characterize new content material for automated classification. If the automated classification includes a learning system, the evolutionary algorithm and the generated combination features are also used to train the learning system.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention is explained in further detail, and by way of example, with reference to the accompanying drawings wherein:
FIG. 1 illustrates an example block diagram of a feature set selection system with a feature combination generator in accordance with this invention.
FIG. 2 illustrates an example block diagram of a classification system for classifying content material based on a preferred feature set as determined by the feature set selection system in accordance with this invention.
FIG. 3 illustrates an example block diagram of a classification system for classifying content material via a learning system in accordance with this invention. Throughout the drawings, the same reference numerals indicate similar or corresponding features or functions.
DETAILED DESCRIPTION OF THE INVENTION
This invention is based on the observation that certain combinations of features, such as words, contain significantly more classification-sensitive information than the individual words that form the combination. Also, in many cases, the individual features may have a detrimental effect on the overall ability to distinguish among classifications. Consider, for example, highly descriptive phrases, such as "red cross", "flying saucer", "green beret", and so on. It would be rare that a program containing one of these phrases in its synopsis would be categorized in the same category as another program whose synopsis contains one of the other phrases. That is, each of these phrases is well suited to distinguish among program categories. The individual words, "red", "green", "cross", "saucer", "beret", and "flying", taken out of context from their distinguishing phrases, are likely not to be as effective for distinguishing classifications. Some of these words, such as "red" and "green", may be more suggestive of another classification, such as "art", and thereby serve to decrease the classification effectiveness of a system that uses these words out of context. A conventional feature selection system based on individual features, such as a conventional evolutionary algorithm that uses the words contained in a program synopsis, may include both the features of "red" and "cross", but because they are virtually independent features, the distinguishing capabilities of these features are likely to be poorer than those of a single feature "red cross"; and, because the feature "red" may be strongly correlated to another classification, the resultant classification using the independent feature "red" may be in error.
In like manner, the occurrence of "the red cross" would likely be more indicative of a classification than an occurrence of "a red cross", whereas a conventional classification process would not use the word "the" as a distinguishing feature.
In accordance with one aspect of this invention, combination features are generated from the individual features that are conventionally used to classify content material. In accordance with another aspect of this invention, these generated combination features are treated as being substantially independent of the features that form the combination. For example, if "red" is strongly correlated with an "art" classification, and "red cross" is strongly correlated to a "humanitarian" classification, both features, the original "red" feature and the generated "red cross" feature, may be included in the feature set that is used to classify content material. In addition to the generation of word phrases that may have effective distinguishing capabilities, other feature combinations may be formed. For example, particular director-producer, director-actor, actor-actress combinations may provide for a better characterization of content material than the individual director, actor, actress, and producer features. Similarly, the combination of the provider of a web site and particular key words or phrases may facilitate a more effective characterization of a web page. For example, the combination of "Philips" as the provider of information on a web page and "entertainment systems" as a key phrase may characterize a page differently than the presence of "Philips" and "entertainment systems" on a web page that is provided by a different provider. Because the number of potential basic features and combination features is virtually limitless, requiring therefore the use of a subset of features for content material classification, an evolutionary algorithm is used in a preferred embodiment for selecting which of the features to use.
FIG. 1 illustrates an example block diagram of a feature set selection system 100 in accordance with this invention. The example selection system 100 includes an evolutionary algorithm 160 that is used to generate sets of features that are evaluated with regard to their effectiveness in classifying content material. The evolutionary algorithm 160 uses a pool of features 110 as candidate features for inclusion in the selected set of features that are used in the classification process. In accordance with one aspect of this invention, a feature combination generator 140 is used to generate combination features 141 that include combinations of individual features. Techniques are conventionally available for identifying noun-phrases and verb-phrases, by identifying, for example, adjectives followed by nouns, adverbs followed by verbs, adverb-adjective-noun combinations, and so on. A simpler technique, such as the selection of every sequential pair of words and every sequential triplet of words, or every pairing of non-word features, such as actor-director, producer-director, actor-producer, and so on, may also be used in the feature combination generator 140.
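By way of illustration only, the simpler technique described above (every sequential pair and triplet of words, and every pairing of non-word features) might be sketched as follows. The function names word_ngrams and attribute_pairs, and the dictionary representation of non-word features, are assumptions made for this sketch and are not part of the disclosure.

```python
from itertools import combinations


def word_ngrams(words, sizes=(2, 3)):
    """Return every run of consecutive words of the given sizes as a phrase."""
    return [
        " ".join(words[i:i + n])
        for n in sizes
        for i in range(len(words) - n + 1)
    ]


def attribute_pairs(attributes):
    """Return every pairing of non-word features, e.g. actor with director.

    `attributes` is a hypothetical mapping such as {"director": "..."}.
    """
    return list(combinations(sorted(attributes.items()), 2))


if __name__ == "__main__":
    print(word_ngrams(["the", "red", "cross"]))
    print(attribute_pairs({"director": "D", "actor": "A", "producer": "P"}))
```

Each generated phrase or pairing would then be added to the pool as a feature in its own right, alongside the individual words and attributes.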
As discussed above, the number of different features, including both the basic features and the combination features, that can be used to facilitate the classification of content material is very large, particularly when features can be formed as combinations of other features. The number of possible subsets that may be drawn from a very large pool of features can be astronomically large. In accordance with this invention, the evolutionary algorithm 160 is used to determine a set of features that is likely to be more effective than other sets of features for a given classification task. Evolutionary algorithms operate via an iterative offspring production process, and include genetic algorithms, mutation algorithms, and the like. In a typical evolutionary algorithm, certain attributes, or genes, are assumed to be related to an ability to perform a given task, different sets of genes resulting in different levels of effectiveness for performing that task. The evolutionary algorithm is particularly effective for problems wherein the relation between the set of attributes and the effectiveness for performing the task does not have a closed form solution. Copending U.S. patent application "Code Compaction By Evolutionary Algorithm", U.S. serial number 09/217408, filed on 21st Dec. 1998 for Philips Electronics North America Corp., Attorney Docket PHA023579 (Disclosure 700241), incorporated by reference herein, discloses the use of evolutionary algorithms for compacting software code, data files, and the like. Copending U.S. patent application "Method For Improving Neural Network Architectures Using Evolutionary Algorithms", U.S. serial number 09/387488, filed on 1st Sept. 1999 for Philips Electronics North America Corp., Attorney Docket PHA023760 (Disclosure 700778), incorporated by reference herein, discloses the use of evolutionary algorithms for determining a preferred architecture for use in a neural network for solving a given task.
The offspring production process of an evolutionary algorithm is used to determine which particular sets of genes are most effective for performing a given task, using a directed trial and error search. A set of genes, or attributes, is termed a chromosome. In the genetic algorithm class of evolutionary algorithms, a reproduction-recombination cycle is used to propagate generations of offspring. In the reproduction phase of the reproduction-recombination cycle, members of a population having different chromosomes mate and generate offspring. These offspring have attributes passed down from the parent members, typically as some random combination of genes from each parent. In a classic genetic algorithm, the individuals that are more effective than others in performing the given task are provided a higher opportunity to mate and generate offspring. That is, the individuals having preferred chromosomes are given a higher opportunity to generate offspring, in the hope that the offspring will inherit whichever genes allowed the parents to perform the given task effectively. The recombination phase of the reproduction-recombination cycle effects the formation of the next generation of parents based on a preference for those exhibiting effectiveness for performing the given task. In this manner, the number of offspring having attributes that are effective for performing the given task will tend to increase with each generation. Paradigms of other methods of generating offspring, such as asexual reproduction, mutation, and the like, are also used to produce generations of offspring having an increasing likelihood of improved abilities to perform the given task.
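The reproduction-recombination cycle described above can be illustrated by the following minimal sketch. It assumes a chromosome is a list of booleans (one gene per feature in the pool), that the fitness function returns non-negative values, that mating chances are proportional to fitness, and that the next generation keeps the best of the parents and offspring; the name evolve and all parameter values are hypothetical, and other selection and survivor schemes are equally possible.

```python
import random


def evolve(fitness, n_genes, pop_size=20, generations=30, seed=0):
    """Toy genetic algorithm over boolean chromosomes; returns the best found."""
    rng = random.Random(seed)
    population = [
        [rng.random() < 0.5 for _ in range(n_genes)] for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Reproduction: fitter members get a higher opportunity to mate.
        weights = [fitness(member) + 1e-9 for member in population]
        offspring = []
        for _ in range(pop_size):
            mother, father = rng.choices(population, weights=weights, k=2)
            # Each gene is inherited at random from one of the two parents.
            offspring.append([rng.choice(genes) for genes in zip(mother, father)])
        # Recombination: form the next generation of parents, preferring
        # the members that perform the task most effectively.
        merged = population + offspring
        merged.sort(key=fitness, reverse=True)
        population = merged[:pop_size]
    return max(population, key=fitness)


if __name__ == "__main__":
    # Stand-in task: the more genes switched on, the better.
    best = evolve(lambda chromosome: sum(chromosome), n_genes=8)
    print(best, sum(best))
```

Because the best members of each generation are carried forward in this sketch, the best fitness in the population cannot decrease from one generation to the next.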
In the context of this disclosure, the population consists of members having features that may be effective in classifying content material. In accordance with this invention, some features represent combinations of other features, independent of the individual features. That is, for example, the phrase "flying saucer" may be a feature, whose effectiveness in characterizing and classifying content material is processed substantially independent of the "flying" feature and the "saucer" feature. That is, the feature "flying saucer" will be passed on to future generations, or not passed on to future generations, without regard to whether the features "flying" or "saucer" are passed on. Generally, for example, noun phrases and verb phrases are treated as features that are independent of the word features that form such phrase features; director-actor features are independent of the particular director or actor feature, and so on. It has been found that this independent consideration of combination features is particularly well suited for the selection of classification features for use in a limited-resource embodiment, such as the aforementioned embodiment for a home entertainment system. That is, if the number of features that can be utilized for a particular embodiment is limited, the independent consideration of combination features will often lead to an elimination of redundant feature items. Assume, for example, that the aforementioned "flying saucer" feature is one that is highly effective in determining whether a given program is classified as "science fiction". Once the "flying saucer" feature becomes a dominant gene in each generation of offspring, the "flying" and "saucer" feature genes are likely to die out, because the marginal effectiveness gained or lost by including or not including the "flying" or the "saucer" feature gene is likely to be minimal in chromosomes that contain the "flying saucer" gene.
As the "flying" and "saucer" feature genes die out in a limited-feature embodiment, they are replaced by other features, such as a "murder" feature gene that is effective in determining whether a given program is classified as "mystery".
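One possible way to picture a limited-feature embodiment is a chromosome of fixed size drawn from the pool, in which a combination feature such as "flying saucer" is simply another pool entry. The sketch below is an assumption for illustration only: the pool contents, the fixed-size representation, and the names random_chromosome and swap_one are hypothetical, and the disclosure does not prescribe this particular replacement operator.

```python
import random

# Hypothetical pool: individual words plus generated combination features.
POOL = ["flying", "saucer", "flying saucer", "murder", "red", "cross", "red cross"]


def random_chromosome(pool, size, rng):
    """Pick a fixed-size subset of the pool; phrases are ordinary entries."""
    return frozenset(rng.sample(pool, size))


def swap_one(chromosome, pool, rng):
    """Replace one feature gene with a pool feature not already present.

    The chromosome size is unchanged, so a feature that drops out frees a
    slot for another feature, whether a word or a combination.
    """
    outside = [feature for feature in pool if feature not in chromosome]
    if not outside:
        return chromosome
    dropped = rng.choice(sorted(chromosome))
    added = rng.choice(outside)
    return (chromosome - {dropped}) | {added}


if __name__ == "__main__":
    rng = random.Random(1)
    parent = random_chromosome(POOL, 3, rng)
    print(sorted(parent), "->", sorted(swap_one(parent, POOL, rng)))
```

Note that nothing in this representation ties the "flying saucer" entry to the "flying" or "saucer" entries; each may be present or absent independently.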
In summary, features are defined that may potentially facilitate the classification of content material, and, in a preferred embodiment, these features include combinations of other features. Candidate sets of select features are encoded as chromosomes that reflect different sets of abilities for distinguishing the content material to facilitate classification of the content material. Some sets of features are more effective for classifying the content material than other sets. By generating offspring from the members having chromosomes that are more effective for classification than others, the effectiveness of the offspring for properly classifying content material is likely to increase.
As illustrated in FIG. 1, a pool of features 110 is provided that includes those features that may potentially facilitate a classification of content material. As discussed above, these features may include the words used in content material, the words used in the synopses of content material, the creator of the content material, the performers in the content material, and so on. The feature combination generator 140 augments this pool of features 110 with combination features 141, as discussed above. In a preferred embodiment, multiple techniques are employed in the combination generator 140 to generate combination features: a phrase identifier creates a feature for each phrase, and a combinatorial generator generates a variety of combinations of the features that are not word based. The combination generator 140 also allows for the predefinition of likely combinations, such as director-actor, as well as the generation of substantially random combinations. These and other methods of generating combination features will be evident to one of ordinary skill in the art in view of this invention.
A set selector 120 creates a set of features from this pool of features 110. The set selector 120 provides an initial population of feature sets 130 to a classification evaluator 150 to evaluate the effectiveness of each set of features 131 for classifying a collection 190 of preclassified content material. The collection 190 contains content material items 191 and the proper classification 192 of each of the content material items 191. That is, for example, the collection 190 may be a collection of information regarding television programs, and the proper classification 192 is the category within which an existing program guide placed each television program 191, such as comedy, drama, sci-fi, mystery, news, and so on. Alternatively, the proper classification 192 may be provided by a potential viewer who classifies each program 191 as "strongly likes", "likes", "no opinion", "dislikes", and "strongly dislikes". In a simpler embodiment, the collection 190 may contain the information regarding all television programs provided during the past month, and the proper classification 192 is whether a particular viewer "watched" or "didn't watch" each program 191. Using an on-line monitor of the programs selected for viewing, this simpler embodiment allows for a classification of each television program 191 into the two classes of watched and not-watched without requiring a direct user input. In like manner, the collection of content material 190 may be a collection of electronic documents, a collection of abstracts, a collection of web pages, and so on, and the proper classification 192 may be "fiction", "history", "gossip", and so on. Or, the classification 192 may merely be "viewed" and "not-viewed".
Using techniques described hereinafter, the classification evaluator 150 determines the effectiveness 151 of each candidate set of features 131 for providing a classification that corresponds to the proper classification 192 of each content material item 191. The evolutionary algorithm 160 thereafter provides parameters to the selector 120 for creating the next generation of feature sets 130, based on the effectiveness 151 of the prior generation of feature sets 130. As is common in the art of evolutionary algorithms, discussed above, the evolutionary algorithm 160 provides parameters 161 that favor the generation of sets having features common with the more effective sets of the prior generation. The evolutionary algorithm 160 continues to produce, via the set selector 120, generation after generation of candidate sets of features until a preferred set 131' is identified, typically the best performing set of features 131 found during this offspring generation process. A number of techniques are available for terminating the search for the preferred set 131'. A fixed time limit may be placed on the offspring generation process, the number of generations may be limited, convergence characteristics may be used to terminate the process when the incremental gain of each generation is below a cutoff limit, and so on.
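The three termination techniques named above (a fixed time limit, a limit on the number of generations, and a convergence test on the incremental gain) might be combined as in the following sketch. The function name should_stop, its parameters, and the default limits are assumptions chosen for illustration.

```python
import time


def should_stop(best_scores, started_at, max_seconds=60.0,
                max_generations=100, min_gain=1e-3, now=None):
    """Return the reason to stop the search, or None to continue.

    `best_scores` holds the best effectiveness seen in each generation so far;
    `started_at` and `now` are readings of the same monotonic clock.
    """
    now = time.monotonic() if now is None else now
    if now - started_at >= max_seconds:
        return "time limit"
    if len(best_scores) >= max_generations:
        return "generation limit"
    # Convergence: the gain of the latest generation fell below the cutoff.
    if len(best_scores) >= 2 and best_scores[-1] - best_scores[-2] < min_gain:
        return "converged"
    return None


if __name__ == "__main__":
    print(should_stop([0.5, 0.7], started_at=0.0, now=1.0))
    print(should_stop([0.7, 0.7], started_at=0.0, now=1.0))
```

A single-generation convergence test is deliberately simple; a practical embodiment might require the gain to stay below the cutoff for several generations before terminating.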
As illustrated in FIG. 2, when a feature set 131' is found that has been shown to be effective in classifying the collection of prior classified material 190, it is used by a classifier 240 to classify new content material 291. The classifier 240 uses the same classification process that is used in the classification evaluator 150. If, for example, the preferred feature set 131' was effective for classifying programs 191 as comedy, mystery, drama, etc., and the same classification process is used in the classifier 240, it is reasonable to assume that this same set 131' will be effective for classifying unknown programs 291 into comedy, mystery, drama, etc. In like manner, if the preferred feature set 131' was effective for classifying programs 191 as "watched" and "not-watched" by a particular viewer or a group of viewers, it is reasonable to assume that this same set 131' will be effective for classifying unknown programs 291 as programs that the viewer or group of viewers will be likely to watch or not watch. Or, in another application, the classifier 240 could use the set 131' for classifying synopses of upcoming programs, and present the results of the classification process as a list of "suggested programs to view", based on the viewer's prior classifications of programs 190.
A number of techniques can be applied to evaluate a set's effectiveness in classifying a collection 190 of content material. In a preferred embodiment of the invention, because the features of each evaluation set 131 may be different, a learning system is used to learn how to best apply each set of features 131 to the given classification task.
FIG. 3 illustrates an example block diagram of a classification system 300 for classifying content material 291 via a learning system 320 in accordance with this invention. As will be evident from the following description, the classification system 300 provides the functions illustrated as the classification evaluator 150 of FIG. 1 and the classifier 240 of FIG. 2.
To train the learning system 320, a portion of the collection of prior classified content material (item 190 in FIG. 1) is used to provide training content material 191A, the remainder providing evaluation content material 191B. For each feature set 131 provided by the set selector 120, the training content material 191A is provided, via switch S1, to an input processor 310. The input processor processes the content material 191A to provide feature values 311 corresponding to the set of features 131 being used to train the learning system 320. For example, if the set of features includes word or phrase features, the input processor determines whether or not the content material 191A contains each of the word or phrase features, and, depending upon the learning system 320, perhaps the number of occurrences of each word or phrase feature. While the training content material 191A is being provided to the learning system 320, the learning system 320 is placed in a training mode, illustrated by the switch 329. Using techniques common in the art, such as the adjustment of weights of nodes in a neural network, or the adjustment of correlation factors in a Bayesian classifier, the learning system 320 is trained to increase the likelihood that the given feature set 131 will provide a classification corresponding to the proper classification 192A corresponding to the training content material 191A. Subsequent training content material items 191A are similarly applied to the learning system 320 to increase the overall likelihood that the feature set 131 would properly classify the training content material 191A.
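As a rough illustration of the input processing described above, the following sketch derives one value per word or phrase feature, either a presence flag or an occurrence count. The tokenization rule and the name feature_values are assumptions for this sketch; the disclosure does not specify how the input processor 310 tokenizes content material.

```python
import re


def feature_values(text, feature_set, counts=False):
    """Return one value per feature: presence (0/1) or occurrence count."""
    tokens = re.findall(r"[a-z']+", text.lower())
    values = []
    for feature in feature_set:
        phrase = feature.lower().split()
        width = len(phrase)
        # Count positions where the phrase's words appear consecutively.
        hits = sum(
            tokens[i:i + width] == phrase
            for i in range(len(tokens) - width + 1)
        )
        values.append(hits if counts else int(hits > 0))
    return values


if __name__ == "__main__":
    synopsis = "The Red Cross sent a red truck"
    features = ["red", "cross", "red cross", "saucer"]
    print(feature_values(synopsis, features))
    print(feature_values(synopsis, features, counts=True))
```

In this sketch the phrase feature "red cross" is valued on its own, regardless of whether "red" or "cross" is also in the feature set, consistent with the independent treatment of combination features.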
After the learning system 320 is trained to optimize the performance of the feature set 131 relative to the training content material 191A, the previously classified evaluation content material 191B is provided to the input processor 310, via switch S1, and the corresponding feature values 311 are applied to learning system 320. The learning system 320 is operated in an execute mode, illustrated by switch 329, when the evaluation content material 191B is applied, so that the learning system 320 provides a classification 241 of the evaluation content material 191B based on the feature set 131 that was used to train the learning system 320. The determined classification 241 is provided to an evaluator 350, via the switch S2. The evaluator 350 compares the determined classification 241 with the proper classification 192B corresponding to the content material 191B. After processing each of the evaluation content material items 191B using the given feature set 131, the evaluator 350 provides a measure of effectiveness 151 to the evolutionary algorithm 160, corresponding to the classification effectiveness of the given feature set 131. As discussed above, the evolutionary algorithm 160 provides selection parameters 161 to the set selector 120, based on the effectiveness of previously evaluated feature sets 131.
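The train-then-evaluate sequence described above can be sketched as follows. The learner here is a deliberately simple stand-in (each present feature votes for the classes it was seen with during training), not the neural network or Bayesian classifier mentioned in this description; the measure of effectiveness is assumed to be the fraction of evaluation items classified correctly. The names train, classify, and effectiveness, and the representation of each item as a set of present features paired with its proper classification, are hypothetical.

```python
from collections import Counter, defaultdict


def train(feature_set, training_items):
    """Toy learner: record, per feature, how often each class contained it."""
    votes = defaultdict(Counter)
    for present, proper in training_items:
        for feature in feature_set:
            if feature in present:
                votes[feature][proper] += 1
    return votes


def classify(votes, feature_set, present, default=None):
    """Sum the class votes of the features present in the item."""
    tally = Counter()
    for feature in feature_set:
        if feature in present:
            tally.update(votes.get(feature, {}))
    return tally.most_common(1)[0][0] if tally else default


def effectiveness(feature_set, training_items, evaluation_items):
    """Train on one portion, then score the fraction classified correctly."""
    votes = train(feature_set, training_items)
    hits = sum(
        classify(votes, feature_set, present) == proper
        for present, proper in evaluation_items
    )
    return hits / len(evaluation_items)


if __name__ == "__main__":
    training = [
        ({"flying saucer", "flying", "saucer"}, "sci-fi"),
        ({"murder", "red"}, "mystery"),
        ({"red", "paint"}, "art"),
    ]
    evaluation = [({"flying saucer"}, "sci-fi"), ({"murder"}, "mystery")]
    print(effectiveness(["flying saucer", "murder"], training, evaluation))
```

The returned fraction plays the role of the measure that is fed back to guide the choice of subsequent feature sets.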
After a sufficient number of feature sets 131 are processed and evaluated against the evaluation content material 191B, the evolutionary algorithm 160 and set selector 120 provide the preferred feature set 131' as a final input to the input processor 310. Depending upon whether the parameters corresponding to the training of the learning system are saved for each evaluated feature set 131, the learning system 320 is either reloaded with these parameters, or retrained using these parameters. In this final step, because the preferred feature set 131' has been selected, and need not be reevaluated, the entire collection 190 of previously classified content material 191 may be applied as the training content material 191A, to potentially improve the likelihood of the preferred set 131' being able to classify new content material 291, by exposing the preferred set 131' to a larger variety of content material 191.
After training the learning system 320 to optimize the classification effectiveness of the preferred feature set 131', the switch S1 is switched to receive the new content material 291, the switch 329 is switched to place the learning system 320 into the execute mode, and the switch S2 is switched to the production mode. Thereafter, when each new content material item 291 is applied to the system 300, the system 300 provides a determined classification 241 of that new content material 291, based on the preferred feature set 131'. Note that after the preferred feature set 131' is selected and the learning system 320 is trained, the evolutionary algorithm 160 and its related parts and other feature sets are no longer required to effect the classification of new content material 291. Thus, the components of the system 300 required to classify new content material 291 can be minimized to those illustrated in FIG. 2. In this manner, the classification system 300 can be embodied on a relatively large computing system to effect the training and evaluation required to determine the preferred set of features for a given classification task, and then the results of this determination, including the parameters that optimize the performance of a classifier 240 for the determined set of features, can be downloaded to a limited-capacity classifier 240. In one preferred embodiment, for example, a set-top box is used to interface with a classification system 300 that is located at a site on the Internet, and the results of the determination of the preferred feature set and related parameters are subsequently downloaded to the set-top box.
The foregoing merely illustrates the principles of the invention. It will thus be appreciated that those skilled in the art will be able to devise various arrangements which, although not explicitly described or shown herein, embody the principles of the invention and are thus within its spirit and scope. For example, the combination features 141 are presented above as inclusive combinations, for ease of understanding. That is, for example, the feature "red cross" includes both the "red" and "cross" features occurring sequentially. Alternatively, a combination feature may be defined as the occurrence of one feature in the absence of another feature, such as "red" without "cross" immediately following, or "cross" without "red" immediately preceding. Such variations, and others, will be evident to one of ordinary skill in the art, and are included within the spirit and scope of the following claims.
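The difference between an inclusive combination feature and the alternative "one feature in the absence of another" can be sketched as three small predicates over a tokenized item. The function names and the whitespace tokenization are assumptions for illustration, not part of the described system.

```python
def sequential(words, first, second):
    """Inclusive combination: `first` immediately followed by `second`."""
    return any(a == first and b == second for a, b in zip(words, words[1:]))


def first_without_second(words, first, second):
    """Some occurrence of `first` that is not immediately followed by `second`."""
    return any(
        w == first and (i + 1 == len(words) or words[i + 1] != second)
        for i, w in enumerate(words)
    )


def second_without_first(words, first, second):
    """Some occurrence of `second` that is not immediately preceded by `first`."""
    return any(
        w == second and (i == 0 or words[i - 1] != first)
        for i, w in enumerate(words)
    )


if __name__ == "__main__":
    words = "a red cross on a red door".split()
    print(sequential(words, "red", "cross"))
    print(first_without_second(words, "red", "cross"))
    print(second_without_first(words, "red", "cross"))
```

In the sample sentence, "red cross" occurs, "red" also occurs once without "cross" following, and "cross" never occurs without "red" preceding, so the three predicates distinguish the cases.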

Claims

CLAIMS:
1. A method for determining a preferred set of features (131') for classifying content material (291), comprising: augmenting (140) a pool of features (110) with combination features (141) that are based on two or more features in the pool of features (110), selecting (120) a first plurality of feature sets from the pool of features (110), evaluating (150) each feature set (131) of the first plurality of feature sets to provide a measure of effectiveness (151) of each feature set (131) relative to each evaluated feature set's ability to properly classify previously classified content material (191), and, selecting (120) at least one subsequent plurality of feature sets from the pool of features (110) based on the measure of effectiveness of each evaluated feature set (131), and evaluating (150) each feature set (131) of the at least one subsequent plurality of feature sets to provide the measure of effectiveness (151) of each evaluated feature set (131) relative to each feature set's ability to properly classify previously classified content material (191), and, selecting the preferred set of features (131') based on the measure of effectiveness (151) of each evaluated feature set (131).
2. The method of claim 1, wherein selecting (120) the at least one subsequent plurality of feature sets includes an evolutionary generation (160) of the plurality of feature sets based on the measure of effectiveness (151) of each evaluated feature set (131).
3. The method of claim 1 or 2, wherein evaluating (150) each feature set includes: classifying the previously classified content material (191) via a classifier (240, 320) to provide an evaluation classification (241), and comparing (350) the evaluation classification (241) to a proper classification (192).
4. The method of claim 3, further including training (320) the classifier (240, 320) using each feature set (131) of the first and at least one subsequent plurality of feature sets.
5. The method of claim 1, 2, 3 or 4, wherein the content material (291) includes at least one of: a video program, an audio program, an electronic document, and a web page.
6. A feature set selector (100, 200, 300) that selects a preferred set of features (131') that facilitates a classification of content material (291), comprising: a feature combination generator (140) that is configured to augment a pool of features (110) by forming combination features (141) that are based on two or more features of the pool of features (110), a subset selector (120) that is configured to select a plurality of subsets of features (130) from the pool of features (110), a classifier (240, 320) that is configured to classify prior classified content material (191) using each subset of features (131) of the plurality of subsets of features to provide an evaluation classification (241) associated with each subset of features (131), an evaluator (350) that is configured to evaluate each subset of features (131) to provide a measure of effectiveness (151) associated with each subset of features (131) by comparing the evaluation classification (241) associated with each subset of features (131) and a proper classification (192) associated with the prior classified content material (191), and an evolutionary algorithm (160) that is configured to provide selection parameters (161) based on the measure of effectiveness (151) associated with each subset of features (131), and wherein the subset selector (120) is configured to select subsequent pluralities of subsets of features based on the selection parameters (161), and the evolutionary algorithm (160) provides the preferred set of features (131') that facilitates the classification of content material (291) in dependence upon the measure of effectiveness (151) of each subset of features (131).
7. The feature set selector (100, 200, 300) of claim 6, wherein the classifier (240, 320) is configured as a learning system (320), and the evolutionary algorithm (160) and subset selector (130) are further configured to provide the plurality of subsets of features to the classifier (240, 320) to facilitate a training of the learning system (320).
8. A classification system (100, 200, 300) comprising: a feature combination generator (140) that augments a pool of features (110) with combination features (141) that are a combination of other features in the pool of features (110), an evolutionary algorithm (160) that is configured to generate sets of features (131) based on measures of effectiveness associated with other sets of features, and to determine thereby a preferred set of features (131'), and a classifier (240, 320) that is configured to classify content material (291) based on the preferred set of features (131').
9. The classification system (100, 200, 300) of claim 8, wherein the classifier (240, 320) includes a learning system (320) that is trained via the sets of features (131) generated by the evolutionary algorithm (160).
10. The method of any one or more of Claims 1 to 5, the feature set selector (100, 200, 300) of Claim 6 or 7, or the classification system of Claim 8 or 9, wherein the combination features (141) include at least one of: a noun phrase and a verb phrase.
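For orientation, the sequence of steps recited in claims 1 to 4 (augment the pool, select feature sets, train and evaluate each set, then generate subsequent sets from the measures of effectiveness) can be sketched as below. This is a minimal sketch under stated assumptions, not the claimed implementation: the toy data, the count-based learner, the truncation selection, the union-based crossover, and the 0.2 mutation rate are all stand-ins chosen for brevity.

```python
import random
from itertools import permutations

POOL = ["red", "cross", "game", "score"]  # hypothetical starting pool (110)
TRAINING = [  # previously classified items used for training (191A)
    ("the red cross sent aid".split(), True),
    ("red cross volunteers arrived".split(), True),
    ("red team won the game".split(), False),
    ("players cross the field to score".split(), False),
]
EVALUATION = [  # previously classified items held out for evaluation (191B)
    ("aid from the red cross".split(), True),
    ("a red card in the game".split(), False),
    ("cross the line to score".split(), False),
]


def has_feature(words, feature):
    """True if the feature's words occur consecutively in the word list."""
    target = feature.split()
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def augment(pool):
    """Step 140: add two-word sequential combination features to the pool."""
    return pool + [f"{a} {b}" for a, b in permutations(pool, 2)]


def train(feature_set, training):
    """Stand-in learner: weight = positive items minus negative items with the feature."""
    return {
        f: sum((1 if label else -1) for words, label in training if has_feature(words, f))
        for f in feature_set
    }


def effectiveness(feature_set, training, evaluation):
    """Step 150: fraction of evaluation items classified as their known label."""
    weights = train(feature_set, training)
    correct = 0
    for words, label in evaluation:
        score = sum(w for f, w in weights.items() if has_feature(words, f))
        correct += (score > 0) == label
    return correct / len(evaluation)


def evolve(pool, training, evaluation, set_size=3, population=8, generations=10, seed=0):
    """Steps 120/160: evolve feature sets and return (best score, best feature set)."""
    rng = random.Random(seed)
    sets = [frozenset(rng.sample(pool, set_size)) for _ in range(population)]
    best_score, best = -1.0, []
    for _ in range(generations):
        scored = sorted(
            ((effectiveness(s, training, evaluation), sorted(s)) for s in sets),
            reverse=True,
        )
        if scored[0][0] > best_score:
            best_score, best = scored[0]
        parents = [s for _, s in scored[: population // 2]]  # keep the better half
        sets = [frozenset(p) for p in parents]
        while len(sets) < population:
            a, b = rng.sample(parents, 2)
            genes = sorted(set(a) | set(b))  # crossover: draw from both parents
            child = rng.sample(genes, min(set_size, len(genes)))
            if rng.random() < 0.2:  # mutation: swap in a random pool feature
                child[rng.randrange(len(child))] = rng.choice(pool)
            sets.append(frozenset(child))
    return best_score, best


if __name__ == "__main__":
    print(evolve(augment(POOL), TRAINING, EVALUATION))
```

On this toy data a feature set containing only the combination feature "red cross" classifies every evaluation item correctly, whereas the single feature "red" misclassifies one of the three, which is the kind of difference the measure of effectiveness is meant to expose.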
PCT/EP2001/000311 2000-02-07 2001-01-11 Generation and effectiveness evaluation of a multi-feature classification system using genetic algorithms WO2001059610A2 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
KR1020017012812A KR20010113779A (en) 2000-02-07 2001-01-11 Multi-feature combination generation and classification effectiveness evaluation using genetic algorithms
DE60128405T DE60128405D1 (en) 2000-02-07 2001-01-11 CREATING AND EVALUATING THE USEFULNESS OF A CLASSIFICATION SYSTEM BASED ON SEVERAL CHARACTERISTICS USING GENETIC ALGORITHMS
EP01951179A EP1397759B1 (en) 2000-02-07 2001-01-11 Generation and effectiveness evaluation of a multi-feature classification system using genetic algorithms
JP2001558869A JP2003534583A (en) 2000-02-07 2001-01-11 Genetic Algorithm-Based Generation and Classification Effectiveness Evaluation of Multiple Feature Combinations

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US09/498,882 US6892191B1 (en) 2000-02-07 2000-02-07 Multi-feature combination generation and classification effectiveness evaluation using genetic algorithms
US09/498,882 2000-02-07

Publications (2)

Publication Number Publication Date
WO2001059610A2 true WO2001059610A2 (en) 2001-08-16
WO2001059610A3 WO2001059610A3 (en) 2003-12-24

Family

ID=23982888

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2001/000311 WO2001059610A2 (en) 2000-02-07 2001-01-11 Generation and effectiveness evaluation of a multi-feature classification system using genetic algorithms

Country Status (7)

Country Link
US (1) US6892191B1 (en)
EP (1) EP1397759B1 (en)
JP (1) JP2003534583A (en)
KR (1) KR20010113779A (en)
AT (1) ATE362141T1 (en)
DE (1) DE60128405D1 (en)
WO (1) WO2001059610A2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7130866B2 (en) 2002-07-30 2006-10-31 Koninklijke Philips Electronics N.V. Controlling the growth of a feature frequency profile by deleting selected frequency counts of features of events
US20210125080A1 (en) * 2019-10-24 2021-04-29 International Business Machines Corporation Method and apparatus for enhancing effectivity of machine learning solutions

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB0114236D0 (en) * 2001-06-12 2001-08-01 Hewlett Packard Co Artificial language generation
US7139738B2 (en) 2002-06-27 2006-11-21 Koninklijke Philips Electronics N.V. Face recognition using evolutionary algorithms
US20040010480A1 (en) * 2002-07-09 2004-01-15 Lalitha Agnihotri Method, apparatus, and program for evolving neural network architectures to detect content in media information
GB2407657B (en) * 2003-10-30 2006-08-23 Vox Generation Ltd Automated grammar generator (AGG)
US20060212279A1 (en) * 2005-01-31 2006-09-21 The Board of Trustees of the University of Illinois and Methods for efficient solution set optimization
US7529748B2 (en) * 2005-11-15 2009-05-05 Ji-Rong Wen Information classification paradigm
AU2006320692A1 (en) * 2005-11-29 2007-06-07 Google Inc. Detecting repeating content in broadcast media
US8131656B2 (en) * 2006-01-31 2012-03-06 The Board Of Trustees Of The University Of Illinois Adaptive optimization methods
US7979365B2 (en) * 2006-01-31 2011-07-12 The Board Of Trustees Of The University Of Illinois Methods and systems for interactive computing
US7831531B1 (en) 2006-06-22 2010-11-09 Google Inc. Approximate hashing functions for finding similar content
US8019593B2 (en) * 2006-06-30 2011-09-13 Robert Bosch Corporation Method and apparatus for generating features through logical and functional operations
US8411977B1 (en) 2006-08-29 2013-04-02 Google Inc. Audio identification using wavelet-based signatures
EP2297680A4 (en) * 2008-05-01 2013-06-19 Icosystem Corp Methods and systems for the design of choice experiments and deduction of human decision-making heuristics
US8346800B2 (en) * 2009-04-02 2013-01-01 Microsoft Corporation Content-based information retrieval
CN103631802B (en) * 2012-08-24 2015-05-20 腾讯科技(深圳)有限公司 Song information searching method, device and corresponding server
US10439891B2 (en) * 2014-04-08 2019-10-08 International Business Machines Corporation Hyperparameter and network topology selection in network demand forecasting
KR101871940B1 (en) 2014-05-12 2018-06-27 한화에어로스페이스 주식회사 Method and system for establishing predictive model of plant abnormality
EP3288409A4 (en) * 2015-04-26 2019-04-10 Samuel Lightstone Method, device and system for fitness tracking

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5897629A (en) * 1996-05-29 1999-04-27 Fujitsu Limited Apparatus for solving optimization problems and delivery planning system

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4821333A (en) * 1986-08-22 1989-04-11 Environmental Research Inst. Of Michigan Machine learning procedures for generating image domain feature detector structuring elements
US5146406A (en) * 1989-08-16 1992-09-08 International Business Machines Corporation Computer method for identifying predicate-argument structures in natural language text
US5048095A (en) * 1990-03-30 1991-09-10 Honeywell Inc. Adaptive image segmentation system
US5798785A (en) 1992-12-09 1998-08-25 Discovery Communications, Inc. Terminal for suggesting programs offered on a television program delivery system
US5343251A (en) 1993-05-13 1994-08-30 Pareto Partners, Inc. Method and apparatus for classifying patterns of television programs and commercials based on discerning of broadcast audio and video signals
US5410344A (en) 1993-09-22 1995-04-25 Arrowsmith Technologies, Inc. Apparatus and method of selecting video programs based on viewers' preferences
US5479523A (en) * 1994-03-16 1995-12-26 Eastman Kodak Company Constructing classification weights matrices for pattern recognition systems using reduced element feature subsets
US5758257A (en) 1994-11-29 1998-05-26 Herz; Frederick System and method for scheduling broadcast of and access to video programs and other data using customer profiles
US5682206A (en) 1995-09-25 1997-10-28 Thomson Consumer Electronics, Inc. Consumer interface for programming device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHANG E I ET AL: "Using genetic algorithms to select and create features for pattern classification" -, 17 June 1990 (1990-06-17), pages 747-752, XP010006887 *
JIHOON YANG, VASANT HONAVAR: "Feature Subset Selection Using A Genetic Algorithm" ACM, [Online] vol. 97, no. 02, 3 May 1997 (1997-05-03), pages 1-12, XP002256773 Retrieved from the Internet: <URL:http://citeseer.ist.psu.edu/yang98feature.html> [retrieved on 2003-10-07] *
SHETH B ET AL: "Evolving agents for personalized information filtering" PROCEEDINGS OF THE CONFERENCE ON ARTIFICIAL INTELLIGENCE FOR APPLICATIONS. ORLANDO, MAR. 1 - 5, 1993, LOS ALAMITOS, IEEE COMP. SOC. PRESS, US, vol. CONF. 9, 1 March 1993 (1993-03-01), pages 345-352, XP010125585 ISBN: 0-8186-3840-0 *

Also Published As

Publication number Publication date
JP2003534583A (en) 2003-11-18
DE60128405D1 (en) 2007-06-21
EP1397759A2 (en) 2004-03-17
ATE362141T1 (en) 2007-06-15
EP1397759B1 (en) 2007-05-09
US6892191B1 (en) 2005-05-10
WO2001059610A3 (en) 2003-12-24
KR20010113779A (en) 2001-12-28

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): JP KR

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR

WWE Wipo information: entry into national phase

Ref document number: 2001951179

Country of ref document: EP

ENP Entry into the national phase

Ref country code: JP

Ref document number: 2001 558869

Kind code of ref document: A

Format of ref document f/p: F

WWE Wipo information: entry into national phase

Ref document number: 1020017012812

Country of ref document: KR

121 Ep: the epo has been informed by wipo that ep was designated in this application
WWP Wipo information: published in national office

Ref document number: 1020017012812

Country of ref document: KR

WWP Wipo information: published in national office

Ref document number: 2001951179

Country of ref document: EP

WWG Wipo information: grant in national office

Ref document number: 2001951179

Country of ref document: EP