US 6226408 B1 Abstract A system, method, and software product provide for unsupervised identification of complex, nonlinear subspaces in high dimensional data. The system includes a vector quantization module, a weighted topology representing graph module, and an encoding module. The vector quantization module takes vector data inputs and extracts a group of inputs about a number of cluster centers, using a globally optimized clustering process. The weighted topology representing graph module creates a weighted graph of the vector space, using the cluster centers as nodes, weighting edges between nodes as a function of the density of the vectors between the linked nodes. The encoding module uses the weighted graph to recode the input vectors based on their proximity to the cluster centers and the connectedness of the graph. The recoded vectors are reinput into the vector quantization module, and the process repeated until termination, for example at a limited number of cluster centers. Upon termination, the clusters thus identified may be highly nonlinear in the original data space.
Claims (13)

1. A computer implemented method for unsupervised identification of associated sets of data items in a collection of data items without using labels encoded in the data items, the method comprising the operations of:
representing each data item by an input vector having a plurality of components representing portions of the data of the data item;
quantizing the plurality of input vectors by associating each input vector with a closest one of a number of cluster centers, the number of cluster centers less than the plurality of data items, each of the input vectors contributing to each cluster center;
linking the cluster centers with edges to form a graph, each edge between two cluster centers weighted according to a density of the input vectors between the two cluster centers;
encoding each input vector as an encoded vector having a coded vector component for each cluster center, each vector component determined as a function of a distance between the input vector and the respective cluster center, and a distance between the respective cluster center and a cluster center nearest the input vector;
repeating the operations of quantizing, linking, and encoding using the encoded vectors as the input vectors until a termination condition is satisfied; and
for each encoded vector remaining after the termination condition is satisfied, labeling the data item associated with the encoded vector with a label associated with the nearest cluster center.
2. The method of claim 1, wherein quantizing the plurality of input vectors further comprises: adjusting each cluster center as a function of a distance of the cluster center to each of the input vectors.
3. The method of claim 1, wherein quantizing the plurality of input vectors further comprises: selecting a number of the input vectors as cluster centers; and
updating each cluster center to a mean of its distance to each input vector.
4. The method of claim 3, wherein updating each cluster center to a mean of its distance to each input vector further comprises: for each input vector:
determining a distance of the input vector to each cluster center;
ranking the cluster centers by their distance to the input vector;
updating a partial sum vector of each cluster center using the input vector and the ranking of the cluster center;
updating an exponentially decayed contribution of the input vector to each cluster center using the ranking of the cluster center; and
updating each cluster center to the mean of the partial sum vectors for the cluster center.
5. The method of claim 1, wherein linking the cluster centers with edges to form a graph, each edge between two cluster centers weighted according to a density of the input vectors between the two cluster centers, further comprises: for each edge between two cluster centers, weighting the edge as a function of a number of input vectors for which the two cluster centers are the closest cluster centers to each of the number of input vectors.
6. The method of claim 1, wherein linking the cluster centers with edges to form a graph, each edge between two cluster centers weighted according to a density of the input vectors between the two cluster centers, further comprises: for each edge between two cluster centers:
determining a number of input vectors for which the two cluster centers are the closest cluster centers to each of the input vectors;
establishing a weight of the edge as an inverse of the determined number of input vectors.
7. The method of claim 1, wherein encoding each input vector as an encoded vector further comprises: encoding each input vector with an output code O_k for each cluster center, where O_k is the output code for the k-th cluster center; r_k is a distance rank of the k-th cluster center to the input vector being encoded; α is a weighting parameter; p_kq is a length of a shortest path in the graph from a cluster center closest to the input vector being encoded to the k-th cluster center; and τ is a decay parameter.
8. The method of claim 1, wherein the distance between the input vector and the respective cluster center vector is a distance rank, and the distance between the respective cluster center and the cluster center nearest the input vector is a distance rank.
9. A computer implemented method for identifying clusters of associated data items in a database containing N data items, without using encoded labeling information in the data items, the method comprising the operations of:
representing each data item by an input vector V having a plurality of components representing portions of the data of the data item;
selecting a subset of K input vectors as cluster centers C, where K<N;
updating each cluster center C_k (k=1 to K) as a function of a distance of each vector V_n (n=1 to N) to the cluster center C_k;
linking the cluster centers C with edges to form a graph, each edge between two cluster centers C_i and C_j weighted according to a density of the vectors V between the two cluster centers C_i and C_j;
encoding each input vector V with an output code O_k for each cluster center C_k to form an encoded vector V′, the output code O_k for each cluster center C_k determined as a function of a distance of the cluster center C_k to the input vector V and a shortest path from the cluster center C_k to a cluster center C_o nearest the input vector V;
repeating the selecting, linking, and encoding operations using the encoded vectors V′ as the input vectors V until a termination condition is satisfied; and
for each encoded vector V′, labeling the data item represented by the encoded vector V′ with a label associated with the nearest cluster center C_o.
10. A computer implemented method for unsupervised identification of associated sets of data items in a collection of data items without use of labeling information encoded in the data items, the method comprising the operations of:
representing each data item by an input vector having a plurality of components representing portions of the data of the data item;
quantizing the plurality of input vectors by associating each input vector with a closest one of a number of cluster center vectors, the number of cluster center vectors less than the plurality of data items, each of the input vectors contributing to each cluster center vector;
forming a graph by:
associating pairs of cluster center vectors with an edge, such that each cluster center vector is associated with at least one other cluster center vector; and
weighting each edge between a pair of cluster center vectors as a function of a number of input vectors to which the pair of cluster center vectors are the closest cluster center vectors;
encoding each input vector as an encoded vector having a coded vector component for each cluster center vector, each vector component determined as a function of a distance of the input vector to the respective cluster center vector and a shortest path in the graph from the respective cluster center vector to a cluster center vector nearest the input vector;
repeating the operations of quantizing, forming, and encoding using the encoded vectors as the input vectors until a termination condition is satisfied; and
for each encoded vector remaining after the termination condition is satisfied, labeling the data item associated with the encoded vector with a label defined for the cluster center vector nearest the encoded vector.
11. A computer implemented method for unsupervised identification of associated sets of data items in a collection of data items without use of labeling information encoded in the data items, the method comprising the operations of:
representing each data item by an input vector having a plurality of components representing portions of the data of the data item;
quantizing the plurality of input vectors by associating each input vector with one of a number of cluster center vectors, the number of cluster center vectors less than the plurality of data items, each cluster center vector having components determined as a function of a distance of the cluster center vector to each of the plurality of input vectors;
linking pairs of cluster center vectors with edges to form a graph, each edge between a pair of cluster center vectors weighted as a function of a number of input vectors to which the pair of cluster center vectors are the closest cluster center vectors;
encoding each input vector as an encoded vector having a vector component for each cluster center vector, each vector component determined according to a relationship between the input vector and the cluster center vector and a relationship between the cluster center vector and a cluster center vector closest to the input vector;
repeating the operations of quantizing, linking, and encoding using the encoded vectors as the input vectors until a termination condition is satisfied; and
for each encoded vector remaining after the termination condition is satisfied, labeling the data item associated with the encoded vector with a label defined for the cluster center vector nearest the encoded vector.
12. A computer system for unsupervised identification of associated sets of data items in a collection of data items without use of labeling information encoded in the data items, the system comprising:
a database containing a plurality of data items, each data item associated with a vector having a plurality of components representing portions of the data of the data item;
a quantizing module coupled to the database to receive the plurality of input vectors and associate each input vector with one of a number of cluster center vectors, the number of cluster center vectors less than the plurality of data items, each cluster center vector having components determined as a function of a distance of the cluster center vector to each of the plurality of input vectors;
a graph module coupled to the quantizing module to form a graph by linking pairs of cluster center vectors with edges, and to weight each edge between a pair of cluster center vectors as a function of a number of input vectors to which the pair of cluster center vectors are the closest cluster center vectors;
an encoding module coupled to the graph module and the database to encode each input vector as an encoded vector having a vector component for each cluster center vector, and to determine each vector component according to a relationship between the input vector and the cluster center vector and a relationship between the cluster center vector and a cluster center vector closest to the input vector; and
an executive module that iteratively inputs the encoded vectors to the quantizing module as the input vectors until a termination condition is satisfied, and that labels the data item associated with the encoded vector with a label defined for the cluster center with which the encoded vector is associated.
13. A computer system for unsupervised identification of associated sets of data items in a collection of data items without use of labeling information encoded in the data items, the system comprising:
means for representing each data item by an input vector having a plurality of components representing portions of the data of the data item;
means for quantizing the plurality of input vectors by associating each input vector with a closest one of a number of cluster centers, the number of cluster centers less than the plurality of data items, each of the input vectors contributing to each cluster center;
means for linking the cluster centers with edges to form a graph, each edge between two cluster centers weighted according to a density of the input vectors between the two cluster centers;
means for encoding each input vector as an encoded vector having a coded vector component for each cluster center vector, each vector component determined as a function of a distance between the input vector and the respective cluster center vector, and a distance between the respective cluster center and a cluster center nearest the input vector;
means for repeating the operations of quantizing, forming, and encoding using the encoded vectors as the input vectors until a termination condition is satisfied; and
means for labeling a data item associated with an encoded vector remaining after the termination condition is satisfied, with a label associated with the cluster center nearest to the encoded vector.
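For illustration only, the iterative quantize-link-encode loop recited in claim 1 above can be sketched as follows. The quantization routine here is a plain K-means stand-in rather than the preferred BaNG quantizer described in the specification, and the exponential output-code form, the τ value, and the halving center-count schedule are assumptions rather than claimed details.

```python
import numpy as np

def quantize(V, k, iters=5, seed=0):
    # Plain K-means stand-in for the quantization step: associate each
    # vector with its closest of k cluster centers, then move each center
    # to the mean of its members. (The patent's preferred quantizer, BaNG,
    # instead lets every vector contribute to every center.)
    rng = np.random.default_rng(seed)
    centers = V[rng.choice(len(V), size=k, replace=False)].astype(float)
    for _ in range(iters):
        nearest = np.argmin(
            np.linalg.norm(V[:, None, :] - centers[None, :, :], axis=2), axis=1)
        for c in range(k):
            members = V[nearest == c]
            if len(members):
                centers[c] = members.mean(axis=0)
    return centers

def link(V, centers):
    # Claim 6: weight the edge between two centers as the inverse of the
    # number of vectors for which those two centers are the closest pair.
    order = np.argsort(
        np.linalg.norm(V[:, None, :] - centers[None, :, :], axis=2), axis=1)
    counts = {}
    for i, j in zip(order[:, 0], order[:, 1]):
        e = (min(i, j), max(i, j))
        counts[e] = counts.get(e, 0) + 1
    return {e: 1.0 / n for e, n in counts.items()}

def encode(V, centers, weights, tau=2.0):
    # Claim 7's ingredients: one component per center, decaying with the
    # vector's distance rank to that center and with the shortest graph
    # path from the vector's nearest center to it. The exponential form
    # and tau value are assumptions; the claim gives only the parameters.
    k = len(centers)
    paths = np.full((k, k), np.inf)
    np.fill_diagonal(paths, 0.0)
    for (i, j), w in weights.items():
        paths[i, j] = paths[j, i] = w
    for m in range(k):  # Floyd-Warshall all-pairs shortest paths
        paths = np.minimum(paths, paths[:, m:m + 1] + paths[m:m + 1, :])
    paths[~np.isfinite(paths)] = k  # cap unreachable centers (assumption)
    d = np.linalg.norm(V[:, None, :] - centers[None, :, :], axis=2)
    ranks = np.argsort(np.argsort(d, axis=1), axis=1)
    return np.exp(-(ranks + paths[np.argmin(d, axis=1)]) / tau)

def identify(V, k=8, min_k=2):
    # Claim 1's loop: quantize, link, encode, feed the encoded vectors back
    # in, and stop when the center count reaches min_k (one example of a
    # termination condition); then label items by their nearest final center.
    while True:
        centers = quantize(V, k)
        if k <= min_k:
            d = np.linalg.norm(V[:, None, :] - centers[None, :, :], axis=2)
            return np.argmin(d, axis=1)
        V = encode(V, centers, link(V, centers))
        k = max(min_k, k // 2)
```

Each encoded vector has one component per cluster center, so each layer re-expresses the data in a K-dimensional code space in which the next layer's quantization operates.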
Description

1. Field of Invention

The present invention relates generally to data processing systems, and more particularly to data processing systems for data mining, pattern recognition, and data classification using unsupervised identification of complex data patterns.

2. Background of the Invention

Statistical classification (or "learning") methods attempt to segregate bodies of data into classes or categories based on objective parameters present in the data itself. Generally, classification methods may be either supervised or unsupervised. In supervised classification, training data containing examples of known categories are presented to a learning mechanism, which learns one or more sets of relationships that define each of the known classes. New data may then be applied to the learning mechanism, which then classifies the new data using the learned relationships. Examples of supervised classification systems include linear regression, and certain types of artificial neural networks, such as backpropagation and learning vector quantization networks.

Because supervised learning relies on the identification of known classes in the training data, supervised learning is not useful in exploratory data analysis, such as database mining, where the classes of the data are to be discovered. Unsupervised learning is used in these instances to discover these classes and the parameters or relationships that characterize them. Unsupervised classification attempts to learn the classification based on similarities between the data items themselves, and without external specification or reinforcement of the classes. Unsupervised learning includes methods such as cluster analysis or vector quantization. Cluster analysis attempts to divide the data into "clusters" or groups that ideally should have members that are very similar to each other, and very dissimilar to members of other clusters.
Similarity is then measured using some distance metric which measures the distance between data items, and clusters together data items that are closer to each other. Well-known clustering techniques include MacQueen's K-means algorithm and Kohonen's Self-Organizing Map algorithm.

One as-yet-unattained goal of unsupervised learning has been a general learning method that could be applied in multiple stages to discover increasingly complex structures in the input. A specific case of this problem is that of identifying, or separating, highly nonlinear subspaces of data, perhaps surrounded by noise. The unsupervised identification of nonlinear subspaces of data is an important problem in pattern recognition and data mining. Many data analysis problems reduce to this identification problem; examples include feature extraction, hierarchical cluster analysis, and transformation-invariant identification of patterns. Although work in this field dates back to the 1930s, good solutions exist only for cases where the data occur in approximately linear subspaces, or in cases where the data points group into well-separated, disjoint classes. If, instead, the data points lie along very nonlinear subspaces and are clouded by noise, conventional unsupervised methods fail. Such nonlinear subspaces are particularly likely in multidimensional real-world data.

Accordingly, it is desirable to provide a system and method of unsupervised learning that is particularly suited to identifying non-linear clusters in highly dimensional data, and thus suited for data exploration, such as database mining, feature extraction, and similar applications.

3. Summary of the Invention

The present invention overcomes the limitations of conventional unsupervised and supervised learning methods and systems by providing for unsupervised learning of non-linear subspaces in complex, high dimensional data.
The present invention uses a hierarchical combination of clustering/vector quantization and data encoding based on proximity and connectedness of the data distribution. Vector quantization partitions the input data distribution into regions; data encoding expresses relationships between connected regions, and enables these relationships to influence further vector quantization until the number of connected regions in the data reaches a desired number. The present invention accepts data that is organized as an unordered collection of vectors, unlabeled with respect to some variable of interest, and outputs a clustering of such data into relatively disjoint clusters, even when there is considerable noise present in the data.

The present invention constructs a hierarchical layering of clusters, with each layer successively building increasingly complex approximations to the data space. Each layer in the hierarchy takes its input from the previous layer (with the first receiving the data), learns a graph-based representation of its input, and re-encodes the input data, using distance and proximity features of the graph, into an encoded form, which is input into the next layer for processing. The repeated processing of the data results in the formation of increasingly larger clusters (containing more data items) which are not necessarily linear in the data space.

In a preferred implementation, the present invention uses three separate processes. First, a robust vector quantization process is used to distribute cluster centers throughout the input space. The vector quantization process provides a globally optimal distribution by positioning each cluster center as a function of the proximity of all of the data items being input, not merely those that are closest according to some distance metric. Second, a topological graph construction process links up the cluster centers into a graph that approximates the topology of the input space.
The graph models both the distribution of the cluster centers and the density of the input vectors between the cluster centers. Third, an encoding process encodes the input vectors using the topological graph, using path lengths along the graph and the rank-ordered distances from cluster centers to each input vector to generate an output code for each cluster. This encoded vector set is used as the input set on a subsequent pass through the data.

The present invention is particularly suited to large data mining operations in multidimensional real-world data. These include pattern recognition, data compression, and invariant feature extraction for representing natural data in symbolic form.

FIG. 1 is a flowchart illustrating the unsupervised identification process according to an embodiment of the present invention.
FIG. 2 is an illustration of the two spiral problem.
FIG. 3 is an illustration of the result of vector quantization of the two spirals data set.
FIGS. 4-8
FIG. 9 is a high-level block diagram illustrating a system for unsupervised identification of data vectors.
FIG. 10 is an illustration of a weighted topology representing graph.

Referring now to FIG. 1, there is shown a flowchart of the unsupervised identification process. The unsupervised identification process operates on a collection of data items, each represented by a vector. In one embodiment, the data collection is a collection of credit card transactions; each credit card transaction is represented by a vector, with vector components for the date, amount, standard industry codes, authorization codes, and any other quantitative measures of transactional behavior of the card holder. In another related embodiment, each data item is an account of a transacting entity, such as a credit card holder, merchant, point of sale device, or the like.
A vector for such a data item includes current account balance; average rolling balances for the past 30, 60, or 90 days; high balance amount; number and amount of late payments or overdrafts; and any other quantitative measures derived from raw transactions generated by the transacting entity. Raw transaction data may also be used. In a further embodiment, data items are medical insurance or worker compensation claims, with vector components for claim amount, injury type, medical procedure code for current treatment applied, prior treatment codes, hospital and doctor identifying codes, and so forth. In yet another embodiment, for pattern recognition and image analysis, the data items are bitmapped images, and the vector components may include color histograms, texture parameters, luminance maps, and other derived data values.

While the vectors may contain certain labeling or classifying information (e.g. zip code, SIC code, etc.), they are unlabeled or unclassified with respect to some independent variable of interest. Continuing the above example of credit card or medical claim transactions, the variable of interest may be whether a transaction is fraudulent or non-fraudulent. The particular variable of interest may have known values (e.g. fraudulent/non-fraudulent) or may have unknown values which are discovered by the unsupervised identification process.

With respect to a particular variable of interest, the data vectors are by their nature grouped into m relatively dense or cohesive clusters or data manifolds M_j, j=0 . . . m−1. These clusters may be highly nonlinear, perhaps wrapping or folding around each other throughout the D dimensional vector space. In addition, there may be background noise that lies between the manifolds, so that strict boundaries do not exist between them.
However, it is assumed that the manifolds remain sufficiently distinct and have few intersections; that is, relatively few, e.g., less than 5-10%, of the vectors are common to multiple manifolds. The present invention solves the problem of identifying to which manifold or cluster each data vector belongs, thereby providing a basis for appropriate labeling of the data vectors.

FIG. 2 illustrates a simple example of the classification problem solved by the present invention. In this illustrative example, the data items are 2-dimensional vectors densely distributed in two intertwined spiral distributions, and surrounded by uniformly random noise of low density. In the example of FIG. 2, there are 10,000 points, of which 3,976 are associated together to form the inner spiral.

The present invention provides a process that is applied repeatedly, in a hierarchy of stages, to extract increasingly larger scale clusters of vectors from the initial set of input vectors V. Generally, each stage, or layer, in the hierarchy takes as its input a set of vectors from the previous layer, encodes a representation of the input vectors, and re-encodes the input vectors for processing by the next layer.

The present invention entails three processes operating at each layer. Referring to FIG. 1, first, a vector quantization process distributes cluster centers throughout the input space; second, a graph construction process links the cluster centers into a weighted graph; and third, an encoding process re-encodes the input vectors using the graph. The set of encoded vectors is then used as the input vector set for the next layer. At some point a termination condition is satisfied, and the data vectors are labeled according to the clusters then identified.

The present invention is useful for identifying clusters of data in real-world data mining applications. For example, in credit card or medical claim analysis, a typical data item such as a credit card account or medical claim account may be represented by 200 to 400 different variables, most of which are derived from transactional data. In such applications there tend to be complex nonlinear clusters of data. For highly nonlinear clusters, it is preferred to use a robust, globally-optimal vector quantization algorithm.
Traditional algorithms such as K-means are inadequate because of their high degree of sensitivity to initial conditions and because of slow convergence properties. Although several better variants have been proposed (e.g. Linde et al. 1980, Jain 1988, Ng and Han 1994, Ester et al. 1996), most have a significant heuristic component, and often do not work well in high-dimensional spaces. In the preferred embodiment, the vector quantization process is a batched extension of the Neural Gas algorithm of Martinez et al. 1993. Neural Gas provably optimizes the same mean-squared distance cost function as the K-means algorithm, while achieving near-globally optimal solutions. However, Neural Gas is not designed to work efficiently in high-dimensional, data intensive environments. The batched extension, here termed Batch Neural Gas (BaNG), can converge in O(log K) passes through the data, increases robustness in high-dimensional spaces, and eliminates learning parameters, making it simpler to implement and use in practice than the original Neural Gas algorithm.

The Batch Neural Gas algorithm is best understood as a generalization of K-means. To clarify the similarities and differences, it is helpful to summarize K-means. Essentially, K-means determines for each vector which cluster center it is closest to, and then updates each cluster center to the mean of the vectors for which that cluster center is the closest. Thus, K-means only uses a portion of the vectors to update the cluster centers, ignoring the influence of any vector which is closer to another cluster center. Batch Neural Gas, by contrast, takes into account the location of all input vectors when updating the cluster centers. In an epoch of the BaNG algorithm, each cluster center is updated using all the input vectors, unlike K-means, which uses only the closest.
For a given cluster center C_k, this is accomplished as follows. For each data point V_n, the quantity e^(−r_k/λ), where r_k is the rank of C_k in the distance ordering of the cluster centers from V_n, is called the coefficient of vector V_n with respect to C_k. A preferred implementation of the vector quantization process is as follows:

1. Randomly select a set of K data vectors V as the initial cluster centers C.

2. Allocate an array of K vectors S to hold partial contributions for the cluster center updates; initialize this array to zero. Thus, each vector S_k holds the partial contribution to cluster center C_k.

3. Allocate an array of K scalar values T to hold the total coefficient contributions of the vectors to each cluster center; initialize this array to zero. Thus, each element T_k holds the total coefficient contribution to cluster center C_k.

4. Initialize the decay parameter λ to K, the number of cluster centers in the current level of the hierarchy.

5. Now loop through all of the data vectors V. For each data vector V_n:

5.1. Determine the distance from V_n to each of the cluster centers C_k.

5.2. Sort the distances in ascending order, and assign each cluster center C_k its rank r_k in this ordering.

5.3. Associate the current vector V_n with its closest cluster center.

5.4. For each of the K partial sum vectors S_k, add the contribution of V_n weighted by its coefficient e^(−r_k/λ), where r_k is the rank of cluster center C_k with respect to V_n.

5.5. For each of the K cluster centers C_k, add the coefficient of V_n to the total T_k.

6. Update each cluster center C_k to S_k/T_k. This operation normalizes the location of the cluster center in the vector space, but accounts for the influence or contribution of all of the input vectors V, and not merely those that are closest to the cluster center, as in K-means. This ensures that the cluster centers are globally optimized, rather than merely locally optimized.

7. Determine whether λ<ε, where ε is a decay threshold, preferably 0.1 to 0.5. If so, then terminate. Otherwise, divide λ by two, reset T and S to zero, and go to step 5.

To understand how the BaNG algorithm works, it is helpful to examine what cluster centers are produced when λ is infinitely large or infinitesimally small.
When λ→∞, all vectors contribute equally to updating all the cluster centers C_k, so each cluster center moves toward the mean of the entire data set. When λ→0, each vector effectively contributes only to its closest cluster center, and the update reduces to that of K-means. Accordingly, the decay parameter λ provides a mechanism to gradually guide the vector quantization process from a globally influenced solution toward a locally refined one. FIG. 3 shows the result of the vector quantization process on the two-spirals data set.

Weighted Topology Representing Graph

The weighted topology representing graph ("WTRG") process links the cluster centers produced by the vector quantization process into a graph. Generally, the objective of constructing a weighted topology representing graph is to model both the distribution of the cluster centers and the density of the input vectors between them. A simple algorithm for creating a topology representing graph includes taking each vector V_n, finding its two closest cluster centers, and linking those two cluster centers with an edge.

A preferred implementation of a method for constructing a weighted topology representing graph is as follows:

1. Loop over all vectors V. For each data vector V_n:

2. Find the two closest cluster centers C_i and C_j.

3. Test whether there is an existing edge between C_i and C_j.

3.1. If so, increment a density count d_ij for the edge.

3.2. If not, create an edge between C_i and C_j and initialize its density count d_ij.

4. For each edge between two cluster centers C_i and C_j, set the edge weight W_ij to the inverse of the density count d_ij.

5. Normalize the weights W. Divide each edge weight W_ij by the weight of an edge having the average point density.

In the normalized graph, an edge that has average point density has a weight of 1.0, edges with higher densities have lower weights, and those with lower densities have higher weights. FIG. 10 illustrates a weighted topology representing graph.

Between any two cluster centers in a normalized graph, a shortest path may be computed, using the weights as a proxy for edge length, rather than the actual Euclidean distance between the cluster centers. The length of the shortest path through the graph between any two cluster centers then is a measure of how well the cluster centers are connected through regions of high density: the shorter the path, the more tightly connected they are, and vice versa.
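The graph construction and shortest-path measure described above can be sketched as follows. The normalization step is implemented as multiplication by the average density count (one reading of the normalization described above), and the all-pairs shortest paths use the Floyd-Warshall method as an illustrative choice.

```python
import numpy as np

def build_wtrg(V, centers):
    # Steps 1-3: for each vector, find its two closest centers and count it
    # toward the density d_ij of the edge linking them.
    order = np.argsort(
        np.linalg.norm(V[:, None, :] - centers[None, :, :], axis=2), axis=1)
    density = {}
    for i, j in zip(order[:, 0], order[:, 1]):
        e = (min(i, j), max(i, j))
        density[e] = density.get(e, 0) + 1
    # Step 4: weight each edge as the inverse of its density count.
    # Step 5: normalize so an edge of average density gets weight 1.0;
    # multiplying by the mean density is one reading of that step.
    mean_d = sum(density.values()) / len(density)
    return {e: mean_d / d for e, d in density.items()}

def connectivity(weights, k):
    # Shortest path between every pair of centers, treating the normalized
    # weights as edge lengths (Floyd-Warshall): short paths mean the centers
    # are connected through regions of high data density.
    p = np.full((k, k), np.inf)
    np.fill_diagonal(p, 0.0)
    for (i, j), w in weights.items():
        p[i, j] = p[j, i] = w
    for m in range(k):
        p = np.minimum(p, p[:, m:m + 1] + p[m:m + 1, :])
    return p
```

For example, with three collinear centers and a dense pocket of vectors between the first two, the dense edge receives a weight below 1.0 and the sparse edge a weight above 1.0, so the path length from the first center to the third reflects how much low-density territory it crosses.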
This observation forms the basis for the next step in the overall process of identifying nonlinear data clusters, viz., encoding of the input vectors to represent neighborhood relationships in the graph.

Encoding Process

The key objective of the encoding process is to re-express each input vector in terms of its proximity to the cluster centers and the connectedness of the graph. In a preferred embodiment, for each input vector V_n an output code O_k is computed for each cluster center C_k. One preferred embodiment of the encoding process is as follows:

1. Compute All Pairs Shortest Paths between each pair of cluster centers C_i and C_j, using the normalized edge weights as edge lengths.

2. Loop over the input vectors V. For each data vector V_n:

2.1. Find the Euclidean distance of vector V_n to each cluster center C_k.

2.2. Sort the distances in ascending order and assign each cluster center C_k its rank r_k.

2.3. For each cluster center C_k, compute the output code O_k as a function of the rank r_k and the length p_kq of the shortest path from the cluster center closest to V_n to C_k.

The output codes are floating point values from 0.0 to 1.0. Note that the output codes that now characterize the input vectors are wholly distinct from the original data; rather, they represent abstractly the relationship of the input vector to the distribution and continuity of clusters in the original data. Through the parameter α, the encoding process controls the relative influence of the distance ranks and the graph path lengths on the output codes. The K dimensional output vectors V′ produced by the encoding process are then used as the input vector set for the next layer of the hierarchy.

To further explain how the present invention provides for unsupervised identification, it is helpful to examine how the identification process operates on the two-spirals example of FIG. 2. In the WTRG process, cluster centers within each spiral are linked through regions of high point density. When the second layer of the hierarchy is trained using the encoded vector set, increasingly larger clusters are formed along each spiral. As additional epochs of processing are executed, the noise vectors finally begin to be linked up into a single cluster.

If necessary, the noise vectors could have been removed using a simple filter based on thresholding the input distance to the cluster centers. Such filters are preferably executed between the encoding process of one layer and the quantization process of the next.

Referring now to FIG. 9, there is shown an illustration of a system architecture for implementing an embodiment of the present invention.
The system includes a computer having a processor and a memory. Stored in the memory of the computer are a vector quantization module, a weighted topology representing graph module, and an encoding module, along with an executive module that controls the overall identification process.

The present invention may be usefully applied to a number of different applications to identify clusters of data without supervision. These various embodiments are now described.

In a first preferred embodiment, the system is applied to transaction data of transacting entities. Where the transacting entity is a consumer with a credit card account, the transaction data items are individual credit transactions. One useful set of variables for credit card transaction data items includes:

Expiration date for the credit card;
Dollar amount spent in each SIC (Standard Industrial Classification) merchant group category during a selected time period;
Percentage of dollars spent by a customer in each SIC merchant group category during a selected time period;
Number of transactions in each SIC merchant group category during a selected time period;
Percentage of number of transactions in each SIC merchant group category during a selected time period;
Categorization of SIC merchant group categories by fraud rate (high, medium, or low risk);
Categorization of SIC merchant group categories by customer types (groups of customers that most frequently use certain SIC categories);
Categorization of geographic regions by fraud rate (high, medium, or low risk);
Categorization of geographic regions by customer types;
Mean number of days between transactions;
Variance of number of days between transactions;
Mean time between transactions in one day;
Variance of time between transactions in one day;
Number of multiple transaction declines at same merchant;
Number of out-of-state transactions;
Mean number of transaction declines;
Year-to-date high balance;
Transaction amount;
Transaction date and time;
Transaction type.

Preferably, these variables are processed to generate various averages or rates of change before being applied to the processing methods of the present invention.
Those of skill in the art will appreciate that a subset of these variables may be used, or other variables may be substituted in this application. Where the transacting entity is a merchant with a merchant account and the transaction data items are credit card charges, the transaction data items may be similar, and include variables particular to the merchant. Where the transacting entity is a point of sale device, such as a terminal at a retail outlet, the transaction variables would include variables particular to both the transaction, such as the item being purchased, the amount, department, date, time, and so forth, and the current status of the point of sale device, such as device ID, location, current bill and change count, user identification of current operator, and the like.

In one embodiment, the transaction data is stored in the database of the system. In this embodiment, the process identifies clusters of transactions with respect to a variable of interest, such as whether the transactions are fraudulent or non-fraudulent.

In pattern-recognition applications, the method can be utilized to learn a hierarchy of matched filters that are attuned to the statistics of the input data. The present invention may also be used for invariant feature extraction, to solve the problem of converting natural sensory data into high-level symbolic form. For example, consider the problem of learning invariances in images. A given image defines a single point in the high-dimensional pixel space. The smooth transformations of this image define a continuous manifold in this space: e.g. smoothly rotated versions of the image would define a nonlinear curve in the pixel space. Such a manifold can be learned by the present algorithm, thereby representing the invariance manifold of the image. If a collection of images, with several examples of such transformations, is used as the training input, one could in principle separate out the individual manifolds and classify images invariant to these transformations.
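As a closing sketch, the Batch Neural Gas quantizer described in the detailed description (steps 1-7) can be written compactly. The exp(−rank/λ) coefficient follows the "exponentially decayed contribution" language of claim 4 and is an illustrative reading rather than the patent's published formula; the decay threshold and the small denominator guard are likewise assumptions.

```python
import numpy as np

def bang(V, k, eps=0.25, seed=0):
    # Batch Neural Gas sketch: every vector contributes to every center,
    # weighted by exp(-rank/lambda); lambda starts at K and is halved each
    # epoch until it falls below the decay threshold eps (step 7).
    rng = np.random.default_rng(seed)
    centers = V[rng.choice(len(V), size=k, replace=False)].astype(float)  # step 1
    lam = float(k)                                                        # step 4
    while True:
        d = np.linalg.norm(V[:, None, :] - centers[None, :, :], axis=2)   # 5.1
        ranks = np.argsort(np.argsort(d, axis=1), axis=1)                 # 5.2
        coeff = np.exp(-ranks / lam)       # coefficient of each vector
        S = coeff.T @ V                    # 5.4: partial sum vectors
        T = coeff.sum(axis=0)              # 5.5: total coefficients
        centers = S / np.maximum(T, 1e-12)[:, None]  # 6: normalized update
        if lam < eps:                      # 7: terminate once lambda decays
            return centers
        lam /= 2.0
```

With λ large, every coefficient is near 1 and all centers move toward the global mean; as λ shrinks, each center is dominated by the vectors that rank it first, recovering a K-means-like update. This matches the limiting behavior of λ discussed in the description.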