WO2001003053A1 - Visualization method and visualization system

Visualization method and visualization system

Info

Publication number
WO2001003053A1
Authority: WO - WIPO (PCT)
Prior art keywords: data, predictive, model, visualization, distribution
Application number: PCT/FI2000/000603
Other languages: French (fr)
Inventors: Petri Tapani Kontkanen, Jussi Mika Antero Lahtinen, Petri Jukka Myllymäki, Tomi Viljam Silander, Henry Rainer Tirri, Kimmo Antero Valtonen
Original Assignee: Bayes Information Technology Ltd.
Application filed by Bayes Information Technology Ltd.
Priority to EP00944080A (published as EP1206752A1)
Priority to US10/019,477 (published as US6873325B1)
Priority to AU58316/00A (published as AU5831600A)
Publication of WO2001003053A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/213 - Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods

Definitions

  • in another embodiment of the invention, the pairwise distance between two data vectors x_i and x_j is defined by Equation 4, where MAP(x_j) denotes the maximum posterior probability assignment for the target attributes X_1, ..., X_m with respect to the selected predictive distribution.
  • as in the distance metric defined in Equation 3, also here the distance between two data vectors x_i and x_j is determined using a first instance P(X_1, ..., X_m | x_i) and a second instance P(X_1, ..., X_m | x_j) of the selected predictive distribution.
  • the distance metrics defined in Equations 3 and 4 are supervised, as some attributes are selected as target attributes. Consequently, a visualization method using either of these distance metrics is a supervised method.
  • it is also possible to determine the pairwise distances by using more than one conditional predictive distribution.
  • in one such embodiment, the pairwise distance between two data vectors x_i and x_j is defined by Equation 5, where MAP_k denotes the maximum posterior probability value of target attribute X_k with respect to the predictive distribution P(X_k | x_c).
  • in this metric, each attribute X_k is in turn selected as a target attribute in a conditional predictive distribution.
  • the distance metric defined in Equation 5 is unsupervised, as all attributes are treated equally. When this metric is used with unsupervised models, it is usually enough to construct one model, as various conditional predictive distributions can be obtained from an unsupervised model. If this metric is used with supervised models, it may be necessary to construct several probabilistic models. For example, if the naive Bayes model is used, typically n models are constructed for a certain data domain, and in each model a different attribute is selected as the class variable. From each model it is then possible to obtain a conditional predictive distribution relating to the class variable. A sketch of this unsupervised distance is given below.
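The following is a minimal Python sketch of this unsupervised distance. The body of Equation 5 is not reproduced in this extract, so the sketch assumes that the per-attribute mismatch probabilities are summed; the exact aggregation in the patent may differ. The callable `predict_attribute` is a hypothetical stand-in for the constructed model(s): given a data vector and an attribute index k, it returns the conditional predictive distribution P(X_k | x_c) as a dict mapping attribute values to probabilities.

```python
def unsupervised_distance(x_i, x_j, predict_attribute, n_attributes):
    """Sketch of an Equation 5 style unsupervised distance: each attribute
    X_k is in turn the target of a conditional predictive distribution, and
    the per-attribute mismatch probabilities are accumulated."""
    total = 0.0
    for k in range(n_attributes):
        p_i = predict_attribute(x_i, k)  # instance of P(X_k | ...) for x_i
        p_j = predict_attribute(x_j, k)  # instance of P(X_k | ...) for x_j
        # Probability that independent draws from the two instances agree.
        match = sum(prob * p_j.get(value, 0.0) for value, prob in p_i.items())
        total += 1.0 - match
    return total
```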
  • in a further embodiment, the pairwise distance between two data vectors x_i and x_j is defined as the symmetric Kullback-Leibler divergence (see, for example, (Gelman, Carlin, Stern, Rubin, 1995)) between a first instance P(X_1, ..., X_m | x_i) and a second instance P(X_1, ..., X_m | x_j) of the predictive distribution conditioned with the variable value assignments present in a data vector.
  • the Kullback-Leibler divergence has an infinite range, which may lead to computational problems in practical implementations; see the sketch below.
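A minimal sketch of the symmetric Kullback-Leibler divergence between two instances of a discrete predictive distribution. The epsilon smoothing is one common workaround for the infinite values mentioned above; it is an assumption of this sketch, not a technique claimed by the patent.

```python
import math

def symmetric_kl(p, q, eps=1e-12):
    """Symmetric KL divergence KL(p||q) + KL(q||p) between two instances of a
    discrete predictive distribution, given as dicts mapping attribute values
    to probabilities. The epsilon guards against log(0) and division by zero,
    the computational problem noted above."""
    values = set(p) | set(q)
    kl_pq = sum(p.get(v, 0.0) * math.log((p.get(v, 0.0) + eps) / (q.get(v, 0.0) + eps))
                for v in values)
    kl_qp = sum(q.get(v, 0.0) * math.log((q.get(v, 0.0) + eps) / (p.get(v, 0.0) + eps))
                for v in values)
    return kl_pq + kl_qp
```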
  • typically, the visualization vectors are then found by minimizing Equation 2, in other words by using Sammon's mapping.
  • in yet another embodiment, the visualization space is a space where each dimension directly represents a component of an instance of a predictive distribution.
  • a visualization vector x_i' corresponding to a data vector x_i could then be, for example, x_i' = (P(X_1 = x_11 | x_i), P(X_1 = x_12 | x_i)), where the first visual coordinate is the conditional probability that the attribute X_1 has the value x_11; a sketch of this direct mapping follows.
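A minimal sketch of this direct mapping; `predict` is a hypothetical callable returning the components of the predictive-distribution instance of a data vector as a list of probabilities.

```python
def direct_visualization(data, predict, dims=2):
    """Map each data vector straight to visual coordinates: the first `dims`
    components of its predictive-distribution instance, e.g.
    (P(X_1 = x_11 | x), P(X_1 = x_12 | x)) for a two-dimensional display."""
    return [tuple(predict(x)[:dims]) for x in data]
```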
  • in a method according to the first advantageous embodiment of the invention, one probabilistic model, which is the naive Bayes model mentioned above, is constructed.
  • by fixing the model structure to the naive Bayes model, the problem of searching for a good model structure is avoided.
  • the naive Bayes model produces very good results, and it is computationally quite simple.
  • the naive Bayes model is constructed, for example, using part of the available data as a training set and using the rest of the data in the visualization.
  • the class variable X n is used as the target attribute when the predictive distributions are calculated. Data vectors are thus visualized according to the classification distribution obtained by using the simple naive Bayesian network model.
  • the dimension of the visual space is two or three and the pairwise distance between data vectors in the data space is defined by Equation 3.
  • to minimize the criterion in Equation 2, any search algorithm can be used; in this embodiment, for example, the following very straightforward stochastic greedy algorithm is used.
  • the algorithm starts with a random visualization matrix X', changes a randomly selected visualization vector x_i' to a randomly selected new location, and accepts the change if the value of the criterion in Equation 2 decreases.
  • one visualization vector is changed at a time.
  • the new candidate visualization vectors are generated from a normal distribution centered around the current visualization vector, which means that small moves are more likely to be suggested than large ones. This stepwise procedure is repeated, for example, one million times; a sketch of the procedure is given below.
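The following is a minimal Python sketch of this stochastic greedy procedure, under stated assumptions: the pairwise domain-space distances d[i][j] (for example from Equation 3) are precomputed, Equation 2 is read as a Sammon-style weighted least-squares criterion, and the whole criterion is recomputed at every step for clarity, although only the pairs involving the changed vector actually change. All names are illustrative.

```python
import math
import random

def greedy_visualization(d, dims=2, steps=10000, step_sd=0.1, seed=0):
    """Stochastic greedy search for visualization vectors: start from a random
    visualization, perturb one randomly chosen vector with a normal step
    (small moves more likely than large ones), and accept the move only if
    the criterion decreases. `d` is the NxN matrix of pairwise distances
    between the data vectors in the domain space."""
    rng = random.Random(seed)
    n = len(d)
    vis = [[rng.uniform(-1.0, 1.0) for _ in range(dims)] for _ in range(n)]

    def criterion(v):
        # Sammon-style reading of Equation 2: squared distance differences
        # weighted by the domain-space distance; zero-distance pairs skipped.
        return sum((d[i][j] - math.dist(v[i], v[j])) ** 2 / d[i][j]
                   for i in range(n) for j in range(i + 1, n) if d[i][j] > 0.0)

    current = criterion(vis)
    for _ in range(steps):
        i = rng.randrange(n)
        old = vis[i]
        vis[i] = [c + rng.gauss(0.0, step_sd) for c in old]  # candidate move
        candidate = criterion(vis)
        if candidate < current:
            current = candidate  # accept: the criterion decreased
        else:
            vis[i] = old         # reject: restore the previous location
    return vis
```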
  • Figure 1 presents six illustrative examples of the two-dimensional visualization produced using a method according to the first advantageous embodiment of the invention.
  • Visualization vectors corresponding to data vectors having different class labels are indicated with different types of markers in Figure 1.
  • the datasets being visualized are publicly available classification datasets from the UCI data repository (Blake, Keogh, Merz, 1998).
  • visualizations of the following datasets are shown: Australian Credit, Balance Scale, Connect-4, German Credit, Thyroid disease and Vehicle Silhouettes.
  • the data shown in Figure 1 is varied: some datasets comprise information relating to credit card owners, one comprises information about patients having a certain disease, and one comprises information about vehicle silhouettes.
  • the visualizations in Figure 1 clearly show structures in the data domains, and the visualization method according to the first advantageous embodiment of the invention can thus be used to visualize various data domains successfully.
  • Figure 2 presents a comparative example, where a certain dataset (Breast Cancer from the UCI data repository) is visualized using a method according to the first advantageous embodiment of the invention (left-hand side panel of Figure 2) and using a Euclidean visualization method, where the distance between the data vectors is the Euclidean distance (right-hand side panel of Figure 2).
  • Equation 2 is here also minimized using a similar stochastic greedy algorithm as in the method according to the first advantageous embodiment of the invention, and the number of steps in the algorithm is the same for both visualizations presented in Figure 2.
  • the Euclidean visualization produces a scattered image without any noticeable trends.
  • the visualization which is the result of a method according to the first advantageous embodiment of the invention shows a clear structure.
  • the method according to the first advantageous embodiment of the invention is thus more applicable to visualization and data mining than the Euclidean visualization and produces typically better results than the Euclidean visualization.
  • a method according to the invention where, for example, the naive Bayes model, a single training set and a stochastic greedy algorithm are used, is quite simple and computationally comparable to, for example, conventional visualization schemes employing Euclidean distance metrics in the data domain.
  • the visualization can be obtained quite fast.
  • the quality of visualizations produced using a method according to the invention can be further enhanced, for example, by using a more versatile probabilistic model.
  • in general, if the naive Bayes model is used, the Sammon's mapping requires most of the computing resources. If more versatile models are used, then the construction of the probabilistic model may also require considerable computing resources.
  • Figure 3 presents four illustrative examples of the two-dimensional visualization produced using a method according to a second advantageous embodiment of the invention, where the unsupervised distance metric defined in Equation 5 and the naive Bayes model are used.
  • as the naive Bayes model is a supervised model, several naive Bayes models describing the data are constructed here.
  • Visualization vectors corresponding to data vectors having different class labels are indicated with different types of markers in Figure 3.
  • the datasets being visualized are from the UCI data repository.
  • visualizations of the following datasets are shown: Breast Cancer (Wisconsin), Heart Disease (Hungarian), Ionosphere and Vehicle Silhouettes.
  • an unsupervised visualization method according to the invention may clearly reveal hidden structures in data domains.
  • in one embodiment, the data to be visualized is data generated from said constructed model. This can be useful, for example, in domains where the amount of available data is so small that proper visualizations of the domains are hard to make. Generating data using the constructed probabilistic model, and then visualizing the generated data, can also be used as a tool in gaining insight into the constructed probabilistic model; a sampling sketch is given below.
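A minimal sketch of generating data from a constructed model, assuming for concreteness the naive Bayes model used in the embodiments above: a class value is drawn from its prior, then each attribute is drawn from its distribution conditioned on that class. The parameter containers `class_prior` and `cond_tables` are illustrative names, not the patent's notation.

```python
import random

def sample_from_naive_bayes(class_prior, cond_tables, n_samples, seed=0):
    """Generate data vectors from a naive Bayes model. `class_prior` maps
    class values to probabilities; `cond_tables[k][c]` maps values of
    attribute X_k to P(X_k = value | class = c). Each generated vector is
    the attribute values followed by the class value."""
    rng = random.Random(seed)

    def draw(dist):
        values = list(dist)
        return rng.choices(values, weights=[dist[v] for v in values])[0]

    data = []
    for _ in range(n_samples):
        c = draw(class_prior)
        data.append([draw(cond_tables[k][c]) for k in range(len(cond_tables))] + [c])
    return data
```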
  • the invention relates also to a computer system for visualizing multidimensional data.
  • the system comprises means for processing the data to achieve a model of the data domain, which can then be used for interactively developing and manipulating visual representations of the domain.
  • the implementation as a software tool advantageously comprises means for storing the probabilistic model structures, means for constructing a probabilistic model of the data domain using the stored probabilistic model structure, as well as means for using the constructed model in a visualization process as described previously.
  • the visual representation can be physically embodied in a computer-readable medium for visualization on a computer display device.
  • the stored probabilistic model structures may be any model structures discussed above, and the construction of the probabilistic model and the determining of the visual locations may be performed using any methods described above.
  • Figure 4 illustrates a third advantageous embodiment of the invention.
  • Figure 4 shows how various components of a computer system interact to provide the functionality of the inventive method.
  • the computer system comprises means 100 for model construction, means 110 for location determination, means 120 for data visualization, means 130 for providing a user interface, and a processing unit 140.
  • the means 130 for providing a user interface may for example comprise a display unit, a keyboard, a pointing device such as a mouse, and any other typical user interface elements of a computer system.
  • the means 100 for model construction, means 110 for location determination, and means 120 for data visualization can advantageously be realized as program instructions stored in a memory medium and executed by the processing unit 140.
  • one or more training data sets 150 may be used as inputs for the means 100 for model construction.
  • the means for model construction 100 may comprise, for example, a certain set of predefined structures of parametric models and means for selecting a proper model structure and suitable parameters for the selected model structure.
  • the probabilistic model or models 151 and at least one visualization data set 152 are input into means 110 for location determination for producing visual location data 153.
  • the visual location data 153 is input into means 120 for data visualization for producing a visual representation of data.
  • the data is visualized on a display device by using the visual locations determined according to the inventive method.
  • the computer system further comprises means for allowing the user to manipulate the visual presentation according to different domain variable characteristics by using, for example, colors, shapes and animation.
  • the visual display functions also as an interface to the data to be visualized, so that the user can study the contents of the original data vector through the corresponding visual location in the visual representation. This means that, for example, by pointing at a certain visual location on a display device with a mouse, the attributes of the corresponding data vector are shown to the user.

Abstract

The present invention relates to computerized system modeling, and more particularly to a method and system for transforming a multivariate data domain into a low-dimensional visual representation. Probabilistic models of the data domain are utilized, and at least one probabilistic model is used to produce at least one predictive distribution. The predictive distributions are used as inputs to the visualization process, where the multidimensional space is converted to a low-dimensional space. In this process data vectors are considered similar, for example, if the corresponding instances of a predictive distribution, conditioned with the variable value assignments found in the data vectors, are similar. Consequently, similarity is not defined directly using the physical properties of the data vectors, but indirectly through the probabilistic predictive model(s). This allows the use of heterogeneous data (with both continuous and discrete attributes with different value ranges) in a theoretically solid manner without the need for heuristic scaling and normalization schemes in data preprocessing.

Description

Visualization method and visualization system
TECHNICAL FIELD OF THE INVENTION
The present invention relates to computerized system modeling, and more particularly to a method for transforming a high-dimensional data domain into a low-dimensional visual representation. Specifically, the invention is directed to such a method as described in the preamble of claim 1.
BACKGROUND OF THE INVENTION
Computer visualization tools are needed for presenting the results of ever increasing amounts of processed data. The conventional approach is to take a few variables at a time, process them and their relations, for example with a spreadsheet, and display the result, for example, as bar charts and pie charts. In a complex domain, where each data point may have several attributes, this conventional approach typically produces a great number of charts with a very weak connection to each other. The charts are typically presented as a sequence of charts. From such a sequence of charts it is usually very difficult to see and comprehend the overall significance of the results. In a more advanced case the data is processed, instead of with a spreadsheet, with more elaborate techniques, such as statistical methods or neural networks, but the results are still typically presented in sequential form using conventional charts.
In the following description the term data vector, having a certain number of components, refers to a data point having a certain number of attributes. The attributes/components may have continuous or discrete numerical values, or they can have ordinal or nominal values. The data vectors are vectors of a data domain or a data space. In a visualization process, high-dimensional data vectors are typically displayed using a two- or three-dimensional device. A corresponding visualization vector, usually having two or three coordinates which determine the location of a point representing the data vector on the display device, is typically determined for each data vector.
Efforts exist to display data in low-dimensional presentations using, for example, conventional scatter plots that visually represent data vectors as graphical objects plotted along one, two, or three axes. If each data vector has a great number of components, which are usually called attributes, problems are encountered, since besides the three dimensions offered by a three-dimensional display, only a few additional dimensions can be represented in this manner, for example by using color and shape variations when representing the data.
Another even more significant limitation concerns the use of more elaborate conventional data dimension reduction methods that can be used to define a visualization vector for a data vector. The goal is to replace the original high-dimensional data vectors with much shorter vectors, while losing as little information as possible. Consequently, a pragmatically sensible data reduction scheme is such that when two data vectors are close to each other in the data space, the corresponding visualization vectors are also close to each other in the visualization space. Traditionally the closeness of data vectors in the data space is in these methods defined via a geometric distance measure such as the Euclidean distance. The attributes of the data can be various and heterogeneous, and therefore different dimensions of the data space can have different scalings and meanings. The geometric distances between the data vectors do not properly reflect the properties of complex data domains, where the data typically is not coded in a geometric or spatial form. In this type of domain, changing one bit in a vector may totally change the relevance of the vector, and make it in some sense a quite different vector, although geometrically the difference is only one bit. For example, as many data sets contain nominal or ordinal attributes, meaning that some of the data vector components have nominal or ordinal values, finding a reasonable coding with respect to a geometric distance metric, for example the Euclidean distance metric, is a difficult task. In a geometric distance metric, all attributes (vector components) are treated as equal. Therefore it is obvious that an attribute with a scale of, say, between -1000 and 1000 is more influential than an attribute with a range between -1 and 1. To circumvent this problem, the attributes can of course be normalized, but it is not at all clear what the optimal way to implement the normalization is. In addition, in real-world situations the similarity of two vectors is not a universal property, but depends on the specific focus of the user: even if two vectors can be regarded as similar from one point of view, they may appear quite dissimilar from another point of view.
A third significant limitation is related to data mining. Data mining is a process that uses specific techniques to find patterns in data, allowing a user to conduct a relatively broad search in databases for relevant information that may not be explicitly stored in the data. In a typical data mining process, a user initially specifies a search phrase or strategy and the system then extracts patterns and relations corresponding to that strategy from the stored data. Extracting the patterns usually takes some time, and therefore the extracted patterns and relations are presented to the user by a data analyst with a delay. The new requests that these results are likely to invoke cause a new processing cycle with a relatively long time delay. There is thus a need for a data visualization tool/method that visually approximates the whole data domain in one instance, although the domain includes a large number of variables. Furthermore, there is a need for a tool/method where the results of the data mining process are visualized instantly and the data mining process is typically carried out in one session.
SUMMARY OF THE INVENTION
An object of the invention is to realize a flexible visualization method. A further object of the invention is to realize a method, which is able to handle heterogeneous data straightforwardly and enables the visualization of heterogeneous data.
Objects of the invention are achieved by constructing a set of probabilistic models, generating predictive distributions from this set of probabilistic models, and determining visualization vectors corresponding to the data vectors using the predictive distributions.
The method according to the invention is a method for generating visual representations of multidimensional data domains, which method comprises the steps of:
- selecting data to be visualized from at least one data source, and
- choosing the number of dimensions to be used in the visualization,
and which method is characterized in that it further comprises the steps of:
- constructing a set of probabilistic models,
- generating a set of predictive distributions from said set of probabilistic models, and
- using at least one predictive distribution belonging to said set of predictive distributions, determining a visual location for each data vector to be visualized.
The dependent claims describe further advantageous embodiments of the invention.
The present invention is a method for transforming a multivariate data domain into a visual low-dimensional representation. The method utilizes probabilistic models of the data domain. A probabilistic model is a model which associates with each point of the data domain a certain probability. In a method according to the invention, there may be a certain set of predetermined models, and the construction of a set of probabilistic models for a certain visualization process may mean, for example, the selection of models describing the data domain from the set of predetermined models. The selection of models, or more generally the construction of models, can involve the use of a training data set, some expert knowledge of the data domain and/or some logical constraints.
In the visualization process the multidimensional space is converted to a low-dimensional space using a transformation, which maps each data vector in the domain space to a vector in a visual space having a lower dimension. The visual space typically has one, two or three dimensions. Typically it is required that the transformation is such that when two vectors are close to each other in the domain space, the corresponding vectors in the visual space are also close to each other. In a method according to the invention, usually a Euclidean distance is used to define the distance between vectors in the visual space, and the distance between vectors in the domain space is typically defined using at least one predictive distribution derived from the constructed probabilistic model. At least one of the constructed models is thus directly used in the visualization process to produce the predictive distribution(s).
The set of probabilistic models may consist of one or more probabilistic models. Similarly, the set of predictive distributions may consist of one or more predictive distributions. If more than one predictive distribution is generated, they may relate to one or more of the constructed probabilistic models. It is, for example, possible to have one constructed model and derive two predictive distributions from said model. A second example is to have two constructed models and two predictive distributions, where a first predictive distribution relates to one constructed model and a second predictive distribution relates to the other constructed model.
In a method according to the invention, the predictive distribution is used as input to the visualization process, where the visualization vectors corresponding to the data vectors are calculated. The predictive distribution can, for example, be used in estimating how close two data vectors are to each other. In a method according to the invention, similarity of data vectors (or, in other words, distance between data vectors) is not defined directly using the values of the components of the data vectors, but indirectly through the probabilistic predictive model(s). This allows the use of heterogeneous data (with both continuous and discrete attributes with different value ranges) in a theoretically solid manner without the need for heuristic scaling and normalization schemes in data preprocessing.
Consider an example of using one predictive distribution in determining a distance between two data vectors. Two data vectors in the domain space may be considered similar if they lead to similar predictions, when the data vectors are given as inputs to the constructed model. Typically a first instance of the predictive distribution relating to a first data vector in the domain space is calculated, and a second instance of the predictive distribution relating to a second data vector in the domain space is calculated. The distance between the first and the second data vector in the domain space depends on the similarity of the first and second instances of the predictive distribution; in other words, it depends on the distance between the first and second instances of the predictive distribution. Various distance metrics, where the distance between data vectors is determined using instances of the predictive distribution, are discussed in the detailed description of the invention.
In a method according to the invention, the predictive distribution corresponding to a data vector is typically a predictive distribution conditioned with the values of some components of the data vector. The data attributes whose values are not used as conditions are called target attributes. In a method according to the invention it is thus possible to change the focus of the visualization by changing the target attributes. A method according to the invention may thus be a supervised data visualization method. This is very useful, for example, when a user knows which data attributes he is interested in and can select these attributes as target attributes. Alternatively, it is possible to use an unsupervised probabilistic model and use a distance metric that does not involve a selection of certain target attributes. In this case, the visualization method according to the invention is an unsupervised method. When an unsupervised visualization method is used, the user does not have to select any data attribute as a target attribute. This is an advantage, for example, when among the data attributes there is no natural candidate for the target attribute. It is possible, for example, to make an unsupervised visualization work automatically, so that the system constructs the probabilistic model(s) using the data and then visualizes the data without a user intervening in the visualization.
Typically, after the visual locations corresponding to the data vectors are determined, a visual representation of the data domain is generated using the determined visual locations. In addition to plain visualization, a method according to the invention is very suitable for data mining, where domain experts try to capture interesting regularities from the visual image. Because at least one predictive distribution is used in determining the visual locations, visualization according to the invention often efficiently reveals hidden structures in the data. In data mining, it is furthermore possible to view visualizations that relate to various target attribute sets, i.e. to various predictive distributions.
In a method according to the invention, at least one probabilistic model is constructed and it may be stored for further use. Especially, if the probabilistic model is a Bayesian model, it is quite straightforward to produce predictive distributions using the probabilistic model.
The present invention provides procedures for visually displaying and manipulating multi-dimensional data with, for example, the following advantages. Data visualization can be simplified, as the visualization result is typically a two- or three-dimensional plot. Information can be synthesized from data, as the visualization results may reveal hidden structures of the data, and at least partly as a result of the revealed structures, decision making can be simplified. Trends and data relationships can be more easily visualized and uncovered, for example, when various colors and/or markers are used to mark different attribute values in the visual representation. Furthermore, report generation can be simplified, and data administration can be performed more easily and understandably when one understands the domain better.
The invention relates also to a visualization system, which comprises means for receiving data to be visualized, and which is characterized in that it further comprises
- means for constructing a set of probabilistic models using predetermined probabilistic model structures,
- means for generating a set of predictive distributions from said set of probabilistic models,
- means for determining, using at least one predictive distribution belonging to said set of predictive distributions, visual locations for data vectors, which constitute at least part of the data to be visualized, and
- means for producing a visualization using said visual locations.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention is described in more detail in the following with reference to the accompanying drawings, of which
Figure 1 illustrates examples of visualization results produced by a method according to a first advantageous embodiment of the invention,
Figure 2 illustrates first visualization results produced by a method according to the first advantageous embodiment of the invention and second visualization results produced using a conventional visualization method,
Figure 3 illustrates examples of visualization results produced by a method according to a second advantageous embodiment of the invention, and
Figure 4 illustrates a diagram of a system, which is an example of a system according to the present invention.
DETAILED DESCRIPTION
In the following description the letter M refers to a probabilistic model, which associates with each point of the data domain a certain probability. In other words, the model M relates to a probability distribution P(X_1, ..., X_n | M) on the space of possible data vectors x, where a data vector has n attributes/components X_i. A typical example of a probabilistic model is a parametric model, where M is the structure of the model and θ represents the parameters of the model. In this case, each parameterized instance (M, θ) of the parametric model produces a probability distribution P(X_1, ..., X_n | M, θ).
A probabilistic model used in a method according to the invention may be a supervised model or an unsupervised model. A supervised model means that, for example, one of the data attributes is selected as a class attribute, which is the focus of the visualization. In supervised models, the target attributes are thus typically selected already when the model is constructed. In unsupervised models it is not necessary to decide the target attributes when the model is constructed; they can be selected when the distances between the data vectors are determined. The probabilistic model M used in a method according to the invention may belong to a family of models known as Bayesian (belief) network models. A Bayesian network is a representation of a probability distribution over a set of (typically) discrete variables, consisting of an acyclic directed graph, where the nodes correspond to domain variables, and the arcs define a set of independence assumptions which allow the joint probability distribution for a data vector to be factorized as a product of simple conditional probabilities. For an introduction to Bayesian network models, see e.g., (Pearl, 1988). One example of a Bayes network model, which can be used in a method according to the invention, is the naive Bayes model. The naive Bayes model is a supervised model, where one of the data attributes is selected as a class variable. A description of the naive Bayes model can be found, for example, in (Kontkanen, Myllymaki, Silander, Tirri, 1998). A further example of a probabilistic model usable in a method according to the invention is a model belonging to a family of mixtures of Bayesian network models. A mixture of Bayesian network models is a weighted sum of several Bayesian network models.
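As a concrete illustration of how such a model yields predictive distributions, the following is a minimal Python sketch of the naive Bayes case: under its independence assumptions, P(class | x) is proportional to P(class) times the product of the per-attribute conditionals P(x_k | class). The containers `class_prior` and `cond_tables` are illustrative names for the model parameters, not notation from the patent.

```python
def naive_bayes_class_posterior(x, class_prior, cond_tables):
    """Instance of the naive Bayes predictive distribution for the class
    variable, given the attribute values in data vector `x`. `class_prior`
    maps class values to probabilities; `cond_tables[k][c][v]` is
    P(X_k = v | class = c)."""
    scores = {}
    for c, prior in class_prior.items():
        score = prior
        for k, v in enumerate(x):
            score *= cond_tables[k][c].get(v, 0.0)  # independence assumption
        scores[c] = score
    total = sum(scores.values())
    if total == 0.0:
        return scores  # degenerate case: all class scores vanished
    return {c: s / total for c, s in scores.items()}  # normalize
```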
A training set of sample data, or many training sets from one or more data sources, can be used in constructing the probabilistic model(s). In the case of parametric models, for example, construction of a model refers to selecting a suitable model structure and suitable parameters for the selected model structure. Theoretically justifiable techniques for learning models from sample data are discussed in (Heckerman, 1996). It is also possible to use, alternatively or in addition to a training set, further information about the data domain. For example, the model construction may be based at least partly on knowledge about the problem domain represented as prior distributions and/or as logical constraints. When a training set is used, it is possible to use, for example, part of the data to be visualized as a training set and still use the whole data in the visualization process. In other words, it is possible that the training set is a subset of the data to be visualized. Furthermore, it is possible that the data to be visualized is a subset of the training set or that the training set consists of the data to be visualized.
It is possible to produce predictive distributions given a probabilistic model. A predictive distribution may be a conditional distribution for one or more of the domain attributes X_i given the other attributes. Let X = {x_1, ..., x_N} denote a data matrix having N data vectors x_i. Each data vector consists of n components, in other words the data has n attributes X_1, ..., X_n. For simplicity, in the sequel we will assume the attributes X_i to be discrete. Let us assume that we wish to visualize data with respect to m target attributes X_1, ..., X_m. In this case the predictive distribution is typically a conditional predictive distribution

P(X_1, ..., X_m | x, M) = P(X_1, ..., X_m | X_{m+1} = x_{m+1}, ..., X_n = x_n, M),

where M is a constructed model, x_i is the value of the attribute X_i in data vector x, and x denotes that the values of those attributes which are outside the target set X_1, ..., X_m are assumed to have the attribute values of data vector x. The number of target attributes can be, for example, one, i.e. m = 1. If, for example, the naive Bayes model is used, the target set typically consists of the class attribute.
For a given data vector x_i it is possible to compute an instance of the predictive distribution. For example, an instance of the conditional predictive distribution is

P(X_1, ..., X_m | x_i, M) = P(X_1, ..., X_m | X_{m+1} = x_{m+1}^i, ..., X_n = x_n^i, M),    (1)

where x_k^i is the value of attribute X_k in data vector x_i. The instance of the predictive distribution means that a conditional probability (where the values of the other attributes are as indicated above) is associated with each possible value x_{k1}, x_{k2}, ... of each target attribute X_k.
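For a generic discrete model, an instance of the conditional predictive distribution in Equation 1 can be illustrated by brute force: condition the model's joint distribution on the values that the data vector assigns to the attributes outside the target set, and renormalize over the target-attribute value combinations. This sketch enumerates a full joint table, so it is viable only for tiny domains; a real model, such as a Bayesian network, would compute the conditional directly.

```python
def predictive_instance(joint, x, target_idx):
    """Sketch of Equation 1: from a joint table `joint` mapping full
    attribute-value tuples to probabilities P(X_1 = v_1, ..., X_n = v_n | M),
    compute P(targets | non-targets fixed to their values in `x`).
    `target_idx` is a set of target attribute positions."""
    scores = {}
    for values, p in joint.items():
        # Keep joint entries consistent with x outside the target set.
        if all(values[k] == x[k] for k in range(len(x)) if k not in target_idx):
            key = tuple(values[k] for k in sorted(target_idx))
            scores[key] = scores.get(key, 0.0) + p
    total = sum(scores.values())
    return {key: s / total for key, s in scores.items()} if total > 0.0 else scores
```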
If a constructed probabilistic model involves one or more latent attributes, the predictive distribution may be a conditional distribution for one or more latent attributes, given the constructed model. Furthermore, the predictive distribution may be a combination of a conditional distribution for at least one domain attribute and a conditional distribution for one or more latent attributes.
Let X' denote a visualization matrix where each ^-component data vector xt is replaced by a typically two or three-component visualization vector x,'. Such a visualization matrix X' can easily be plotted on a two- or three-dimensional display. Consequently, for visualizing high-dimensional data, we need to find a transformation (function), which maps each data vector in the domain space to a vector in the visual space. In order to have a meaningful visualization for two data vectors, which are close to each other in the domain space, the corresponding visualization vectors should be close to each other in the visualization space. One way to determine the visual locations (visualization vectors) is to determine them using pairwise distances between the data vectors to be visualized. Let us note the distance between between data vectors x, and x in the domain space with d(xh xj) and the distance between the corresponding visualization vectors x*' and x in the visual space with d(x , x'j). It is possible, for example, to find a best visualization matrix X' in least-square sense by minimizing the sum of the squares of the distance differences d(x xj) - d(x , x'j). This is called Sammon's mapping (see (Kohonen, 1995)). Formally, we can express this requirement, for example, in the following manners:
$$\text{Minimize} \sum_{i=1}^{N} \sum_{j=i+1}^{N} \left( d(x_i, x_j) - d'(x'_i, x'_j) \right)^2 \quad \text{or} \quad \text{Minimize} \sum_{i=1}^{N} \sum_{j=i+1}^{N} \frac{\left( d(x_i, x_j) - d'(x'_i, x'_j) \right)^2}{d(x_i, x_j)}. \quad (2)$$
In a method according to the invention, a criterion presented above is often minimized, but it is also possible to find the visualization vectors using other criteria.
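For illustration, here is a minimal Python sketch of the two minimization criteria of Equation 2, assuming the pairwise domain distances `D` and visualization distances `Dv` are given as symmetric matrices. Since the second form is only partially legible in the source, the standard Sammon weighting by $1/d(x_i, x_j)$ is assumed for it.

```python
import numpy as np

def least_squares_criterion(D, Dv):
    """First form of Equation 2: plain sum of squared distance differences."""
    i, j = np.triu_indices_from(D, k=1)          # all pairs with i < j
    return float(np.sum((D[i, j] - Dv[i, j]) ** 2))

def sammon_criterion(D, Dv):
    """Second form of Equation 2, assuming the standard Sammon weighting:
    each squared error is down-weighted by the domain distance d(x_i, x_j)."""
    i, j = np.triu_indices_from(D, k=1)
    d = D[i, j]
    return float(np.sum((d - Dv[i, j]) ** 2 / d))
```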
The geometric Euclidean distance seems a natural choice for the distance metric $d'(\cdot)$ in the visualization space, but this distance measure typically does not make a good similarity metric in the high-dimensional domain space. In many complex domains geometric distance measures reflect poorly the significant similarities and differences between the data vectors. In a method according to the invention, if the pairwise distances between data vectors are computed, they are computed by using at least one predictive distribution generated from a constructed probabilistic model $M$. Two vectors are typically considered similar if they lead to similar predictions when given as input to the same probabilistic model $M$. For example, data vectors $x_i$ and $x_j$ can be considered similar if the corresponding instances of a predictive distribution, i.e. $P(X_1, \ldots, X_m \mid x_i, M)$ and $P(X_1, \ldots, X_m \mid x_j, M)$, are similar. A distance metric which involves a predictive distribution or predictive distributions is typically scale invariant, as we have moved from the original attribute space to the probability space. This also allows us to handle different types of attributes (discrete or continuous) in the same consistent framework. Furthermore, the framework is theoretically on a more solid basis, as our domain assumptions must be formalized in the model $M$. There are various ways to define a similarity measure between, for example, two instances of a predictive distribution. In a method according to one embodiment of the invention, the following distance metric is used:
$$d(x_i, x_j) = 1.0 - P(\mathrm{MAP}(x_i) = \mathrm{MAP}(x_j)), \quad (3)$$
where $\mathrm{MAP}(x_i)$ denotes the maximum posterior probability (MAP) assignment for the target attributes $X_1, \ldots, X_m$ with respect to the selected predictive distribution, for example a predictive distribution presented in Equation 1. Of all the possible value combinations for the target attributes, the MAP assignment is the one with the highest probability. For example, if there is only one target attribute $X_1$, a conditional predictive distribution $P(X_1 \mid x^c)$ associates probabilities with each possible value $x_{11}, x_{12}, \ldots$ of the target attribute $X_1$, and the MAP assignment for the target attribute $X_1$ is the value $x_{1k}$ having the highest probability. In other words, $P(\mathrm{MAP}(x_i) = \mathrm{MAP}(x_j))$ is the probability that the values of the target attributes in data vector $x_j$ are the same as the values of the target attributes in data vector $x_i$, when the values of the attributes outside the target set are assumed to have the values they have in $x_i$ and $x_j$. Consider again the above example involving one target attribute $X_1$. In this case, a first instance $P(X_1 \mid x_i)$ of the predictive distribution associates first probabilities $(P_{i1}, P_{i2}, \ldots)$ and a second instance $P(X_1 \mid x_j)$ of the predictive distribution associates second probabilities $(P_{j1}, P_{j2}, \ldots)$ with each possible value $x_{11}, x_{12}, \ldots$ of the target attribute $X_1$, and $P(\mathrm{MAP}(x_i) = \mathrm{MAP}(x_j)) = P_{i1}P_{j1} + P_{i2}P_{j2} + \cdots$. A further wording for the distance metric in Equation 3 is that it is the probability that a first random outcome drawn from a first instance $P(X_1, \ldots, X_m \mid x_i)$ of a predictive distribution is different from a second random outcome drawn from a second instance $P(X_1, \ldots, X_m \mid x_j)$ of the predictive distribution.
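A minimal sketch of the distance of Equation 3 for a single target attribute, with the two instances of the predictive distribution given as probability vectors `p` and `q`, so that $P(\mathrm{MAP}(x_i) = \mathrm{MAP}(x_j))$ is their inner product as explained above:

```python
import numpy as np

def distance_eq3(p, q):
    """d(x_i, x_j) = 1.0 - P(MAP(x_i) = MAP(x_j))  (Equation 3).

    p, q : instances P(X_1 | x_i, M) and P(X_1 | x_j, M) as probability
    vectors. The match probability is the chance that independent draws
    from p and q coincide, i.e. the inner product of the distributions.
    """
    return 1.0 - float(np.dot(p, q))
```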
In a method according to a second embodiment of the invention, the pairwise distance between two data vectors $x_i$ and $x_j$ is defined by

$$d(x_i, x_j) = -\log P(\mathrm{MAP}(x_i) = \mathrm{MAP}(x_j)), \quad (4)$$

where $\mathrm{MAP}(x_i)$ denotes the maximum posterior probability assignment for the target attributes $X_1, \ldots, X_m$ with respect to the selected predictive distribution. Similarly to the distance metric defined in Equation 3, also here the distance between two data vectors $x_i$ and $x_j$ is determined using a first instance $P(X_1, \ldots, X_m \mid x_i)$ and a second instance $P(X_1, \ldots, X_m \mid x_j)$ of the selected predictive distribution. The distance metrics defined in Equations 3 and 4 are supervised, as some attributes are selected as target attributes. Consequently, a visualization method using either of these distance metrics is a supervised method.
It is possible to define the pairwise distances by using more than one conditional predictive distribution. In a method according to a third embodiment of the invention, the pairwise distance between two data vectors $x_i$ and $x_j$ is defined in the following way:

$$d(x_i, x_j) = -\sum_{k=1}^{n} \log P(\mathrm{MAP}_k(x_i) = \mathrm{MAP}_k(x_j)), \quad (5)$$

where $\mathrm{MAP}_k$ denotes the maximum posterior probability value of target attribute $X_k$ with respect to the predictive distribution $P(X_k \mid x^c)$. This means that each attribute $X_k$ is in turn selected as the target attribute in a conditional predictive distribution. The distance metric defined in Equation 5 is unsupervised, as all attributes are treated equally. When this metric is used with unsupervised models, it is usually enough to construct one model, as the various conditional predictive distributions can be obtained from a single unsupervised model. If this metric is used with supervised models, it may be necessary to construct several probabilistic models. For example, if the naive Bayes model is used, typically $n$ models are constructed for a certain data domain, and in each model a different attribute is selected as the class variable. From each model it is then possible to obtain a conditional predictive distribution relating to its class variable. Preferably, when a distance metric defined in Equation 3, 4 or 5 is used, the visualization vectors are found using Sammon's mapping.
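A sketch of the unsupervised metric of Equation 5; each data vector is assumed to come with a list of predictive-distribution instances, one per attribute $X_k$ selected in turn as the target (obtained, e.g., from the $n$ naive Bayes models mentioned above). The `eps` guard is an implementation assumption to avoid taking the logarithm of zero.

```python
import numpy as np

def distance_eq5(p_list, q_list, eps=1e-12):
    """d(x_i, x_j) = -sum_k log P(MAP_k(x_i) = MAP_k(x_j))  (Equation 5).

    p_list, q_list : for each attribute X_k, the instances P(X_k | x_i^c)
    and P(X_k | x_j^c) as probability vectors.
    """
    d = 0.0
    for p, q in zip(p_list, q_list):
        match = max(float(np.dot(p, q)), eps)   # guard against log(0)
        d -= np.log(match)
    return d
```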
In a method according to a fourth embodiment of the invention, the pairwise distance between two data vectors $x_i$ and $x_j$ is defined as the symmetric Kullback-Leibler divergence (see, for example, (Gelman, Carlin, Stern, Rubin, 1995)) between a first instance $P(X_1, \ldots, X_m \mid x_i)$ and a second instance $P(X_1, \ldots, X_m \mid x_j)$ of the predictive distribution conditioned with the variable value assignments present in a data vector. The Kullback-Leibler divergence has an infinite range, which may lead to computational problems in practical implementations. Preferably, the visualization vectors are found by minimizing Equation 2, in other words using Sammon's mapping.
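A sketch of the symmetric Kullback-Leibler divergence between two instances of the predictive distribution; as noted above its range is unbounded, so the probabilities are clipped away from zero here, which is an implementation assumption rather than part of the description.

```python
import numpy as np

def symmetric_kl(p, q, eps=1e-12):
    """Symmetric KL divergence KL(p||q) + KL(q||p) between two instances
    of a predictive distribution, given as probability vectors."""
    p = np.clip(p, eps, None)                   # avoid log(0) and division by 0
    q = np.clip(q, eps, None)
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))
```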
It is also possible to use a predictive distribution to define the visual locations directly. In a method according to a further embodiment of the invention, the visualization space is a space where each dimension represents directly a component of an instance of a predictive distribution. A component of an instance of a predictive distribution means here the probability that the target attributes have certain predetermined values, e.g. $X_1 = x_{11}$ and $X_2 = x_{21}$. In a three-dimensional visualization space, for example, a visualization vector $x'_i$ corresponding to a data vector $x_i$ could be

$$x'_i = \left( P(X_1 = x_{11} \mid x_i^c, M),\; P(X_1 = x_{12} \mid x_i^c, M),\; P(X_1 = x_{13} \mid x_i^c, M) \right).$$

Here, for example, the first visual coordinate is the conditional probability that the attribute $X_1$ has the value $x_{11}$.
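A minimal sketch of this direct construction: the visualization vector is read off from the first components of an instance of the predictive distribution, so no pairwise distances or mapping step are needed.

```python
def direct_visual_vector(p, dims=3):
    """Use components of an instance of the predictive distribution as
    visualization coordinates, e.g.
    x'_i = (P(X1=x11 | x_i^c, M), P(X1=x12 | x_i^c, M), ...)."""
    assert len(p) >= dims, "instance must have at least `dims` components"
    return tuple(float(v) for v in p[:dims])
```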
In a method according to a first advantageous embodiment of the invention, one probabilistic model, which is the naive Bayes model mentioned above, is constructed. By fixing the model structure to the naive Bayes model, the problem of searching for a good model structure is avoided. In many cases the naive Bayes model produces very good results, and it is computationally quite simple. The naive Bayes model is constructed, for example, using part of the available data as a training set and using the rest of the data in the visualization.
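A minimal sketch of constructing such a naive Bayes model from a discrete training set; the description does not fix the parameter estimation method, so simple Laplace-smoothed frequency counts are assumed here. The returned prior and conditional probability tables match the representation used in the earlier sketches.

```python
import numpy as np

def fit_naive_bayes(data, class_col, cardinalities, alpha=1.0):
    """Estimate a naive Bayes model from discrete training data.

    data          : integer array of shape (N, n), one row per data vector
    class_col     : index of the attribute used as the class variable
    cardinalities : number of possible values of each attribute
    Returns (prior, cpts), where cpts lists, for each non-class attribute
    X_k in order, an array with cpts[k][c, v] = P(X_k = v | class = c).
    """
    K = cardinalities[class_col]
    y = data[:, class_col]
    prior = np.bincount(y, minlength=K) + alpha  # smoothed class counts
    prior = prior / prior.sum()
    cpts = []
    for k in range(data.shape[1]):
        if k == class_col:
            continue
        counts = np.full((K, cardinalities[k]), alpha)
        for c, v in zip(y, data[:, k]):
            counts[c, v] += 1.0
        cpts.append(counts / counts.sum(axis=1, keepdims=True))
    return prior, cpts
```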
In a method according to the first advantageous embodiment, the class variable $X_n$ is used as the target attribute when the predictive distributions are calculated. Data vectors are thus visualized according to the classification distribution obtained by using the simple naive Bayesian network model.
In a method according to the first advantageous embodiment, the dimension of the visual space is two or three and the pairwise distance between data vectors in the data space is defined by Equation 3. For minimizing the criterion in Equation 2, any search algorithm can be used; for example, the following very straightforward stochastic greedy algorithm is used. The algorithm starts with a random visualization $X'$, changes a randomly selected visualization vector $x'_i$ to a randomly selected new location, and accepts the change if the value of the criterion in Equation 2 is decreased. In other words, one visualization vector is changed at a time. The new candidate visual vectors are generated from a normal distribution centered around the current visual vector, which means that small moves are more likely to be suggested than large ones. This stepwise procedure is repeated, for example, one million times.

Figure 1 presents six illustrative examples of the two-dimensional visualization produced using a method according to the first advantageous embodiment of the invention. Visualization vectors corresponding to data vectors having different class labels are indicated with different types of markers in Figure 1. The datasets being visualized are publicly available classification datasets from the UCI data repository (Blake, Keogh, Merz, 1998). In Figure 1, visualizations of the following datasets are shown: Australian Credit, Balance Scale, Connect-4, German Credit, Thyroid Disease and Vehicle Silhouettes.
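A minimal sketch of the stochastic greedy search described above, minimizing the first (plain least-squares) form of Equation 2; the pairwise domain distances `D` are assumed precomputed with, for example, Equation 3, and the proposal spread `sigma` is an illustrative assumption (the description only fixes the order of one million steps).

```python
import numpy as np

def greedy_sammon(D, dims=2, steps=1_000_000, sigma=0.1, seed=0):
    """Stochastic greedy minimization of the least-squares criterion (Eq. 2).

    D : (N, N) symmetric matrix of pairwise domain-space distances.
    One randomly chosen visualization vector is moved per step; the move
    is accepted only if the criterion decreases.
    """
    rng = np.random.default_rng(seed)
    N = D.shape[0]
    X = rng.standard_normal((N, dims))          # random initial visualization

    def local_cost(i, pos):
        # Terms of the criterion that involve point i.
        dv = np.linalg.norm(X - pos, axis=1)    # d'(., x'_j) for every j
        err = (D[i] - dv) ** 2
        err[i] = 0.0                            # exclude the i == j term
        return err.sum()

    for _ in range(steps):
        i = rng.integers(N)                     # pick one point at random
        proposal = X[i] + sigma * rng.standard_normal(dims)  # small normal move
        if local_cost(i, proposal) < local_cost(i, X[i]):
            X[i] = proposal                     # accept only improving moves
    return X
```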
As the names of these datasets indicate, the data shown in Figure 1 is varied: some datasets comprise information relating to credit card owners, one comprises information about patients having a certain disease, and one comprises information about vehicle silhouettes. The visualizations in Figure 1 clearly show structures in the data domains, and the visualization method according to the first advantageous embodiment of the invention can thus be used to visualize various data domains successfully.
Figure 2 presents a comparative example, where a certain dataset (Breast Cancer from the UCI data repository) is visualized using a method according to the first advantageous embodiment of the invention (left-hand panel of Figure 2) and using a Euclidean visualization method, where the distance between the data vectors is the Euclidean distance (right-hand panel of Figure 2). In the Euclidean method, Equation 2 is also minimized using a similar stochastic greedy algorithm as in the method according to the first advantageous embodiment of the invention, and the number of steps in the algorithm is the same for both visualizations presented in Figure 2.
As can be seen in Figure 2, the Euclidean visualization produces a scattered image without any noticeable trends. The visualization which results from a method according to the first advantageous embodiment of the invention shows a clear structure. The method according to the first advantageous embodiment of the invention is thus more applicable to visualization and data mining than the Euclidean visualization, and typically produces better results. A method according to the invention, where for example the naive Bayes model, a single training set and a stochastic greedy algorithm are used, is quite simple and computationally comparable to, for example, conventional visualization schemes employing Euclidean distance metrics in the data domain. The visualization can be obtained quite fast. Furthermore, as the simple method according to the first advantageous embodiment already produces good visualizations, the quality of visualizations produced using a method according to the invention can be further enhanced, for example, by using a more versatile probabilistic model. In general, if the naive Bayes model is used, Sammon's mapping requires most of the computing resources. If more versatile models are used, then the construction of the probabilistic model may also require considerable computing resources.
Figure 3 presents four illustrative examples of the two-dimensional visualization produced using a method according to a second advantageous embodiment of the invention, where the unsupervised distance metric defined in Equation 5 and the naive Bayes model are used. As explained in connection with Equation 5, several naive Bayes models describing the data are constructed here. Visualization vectors corresponding to data vectors having different class labels are indicated with different types of markers in Figure 3. The datasets being visualized are from the UCI data repository. In Figure 3, visualizations of the following datasets are shown: Breast Cancer (Wisconsin), Heart Disease (Hungarian), Ionosphere and Vehicle Silhouettes. As can be seen in Figure 3, an unsupervised visualization method according to the invention may also clearly reveal hidden structures in data domains.
For the visualization examples presented in Figures 1, 2 and 3, part of each data set derived from the UCI data repository is used as a training set. The training set is not included in the data visualized in Figures 1, 2 and 3.
In a further embodiment of the invention, the data to be visualized is data generated from said constructed model. This can be useful, for example, in domains where the amount of available data is so small that proper visualizations of the domains are hard to make. Generating data using the constructed probabilistic model, and then visualizing the generated data, can also be used as a tool for gaining insight into the constructed probabilistic model.
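A sketch of generating synthetic data from a constructed naive Bayes model (prior and conditional probability tables as in the earlier sketches); the generated vectors can then be selected as the data to be visualized.

```python
import numpy as np

def generate_data(prior, cpts, n_samples, seed=0):
    """Draw synthetic data vectors from a naive Bayes model.

    Samples a class from the prior, then each attribute value from its
    class-conditional distribution; returns the classes and the values.
    """
    rng = np.random.default_rng(seed)
    classes = rng.choice(len(prior), size=n_samples, p=prior)
    data = np.empty((n_samples, len(cpts)), dtype=int)
    for k, cpt in enumerate(cpts):
        for i, c in enumerate(classes):
            data[i, k] = rng.choice(cpt.shape[1], p=cpt[c])
    return classes, data
```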
The invention relates also to a computer system for visualizing multidimensional data. Preferably, the system comprises means for processing the data to achieve a model of the data domain, which can then be used for interactively developing and manipulating visual representations of the domain.
The implementation as a software tool advantageously comprises means for storing the probabilistic model structures, means for constructing a probabilistic model of the data domain using the stored probabilistic model structure, as well as means for using the constructed model in a visualization process as described previously. The visual representation can be physically embodied in a computer-readable medium for visualization on a computer display device.
In a visualization system according to the invention, the stored probabilistic model structures may be any model structures discussed above, and the construction of the probabilistic model and the determining of the visual locations may be performed using any methods described above.
Figure 4 illustrates a third advantageous embodiment of the invention. Figure 4 shows how various components of a computer system interact to provide the functionality of the inventive method. According to Figure 4, the computer system comprises means 100 for model construction, means 110 for location determination, means 120 for data visualization, means 130 for providing a user interface, and a processing unit 140.
The means 130 for providing a user interface may for example comprise a display unit, a keyboard, a pointing device such as a mouse, and any other typical user interface elements of a computer system. The means 100 for model construction, means 110 for location determination, and means 120 for data visualization can advantageously be realized as program instructions stored in a memory medium and executed by the processing unit 140.
According to the third advantageous embodiment of the invention, one or more training data sets 150 may be used as inputs to the means 100 for model construction for producing at least one probabilistic model 151. The means 100 for model construction may comprise, for example, a certain set of predefined structures of parametric models and means for selecting a proper model structure and suitable parameters for the selected model structure. The probabilistic model or models 151 and at least one visualization data set 152 are input into the means 110 for location determination for producing visual location data 153. The visual location data 153 is input into the means 120 for data visualization for producing a visual representation of the data.
Preferably, the data is visualized on a display device by using the visual locations determined according to the inventive method. Preferably, the computer system further comprises means for allowing the user to manipulate the visual presentation according to different domain variable characteristics by using, for example, colors, shapes and animation. Preferably, the visual display functions also as an interface to the data to be visualized, so that the user can study the contents of the original data vector through the corresponding visual location in the visual representation. This means that, for example, by pointing at a certain visual location on a display device with a mouse, the attributes of the corresponding data vector are shown to the user.
In view of the foregoing description it will be evident to a person skilled in the art that various modifications may be made within the scope of the invention. While advantageous embodiments of the invention have been described in detail, it should be apparent that many modifications and variations thereto are possible, all of which fall within the true spirit and scope of the invention.
References
Blake, C., Keogh, E., & Merz, C. (1998). UCI repository of machine learning databases. (URL: http://www.ics.uci.edu/~mlearn/MLRepository.html)
Gelman, A., Carlin, J., Stern, H., & Rubin, D. (1995). Bayesian data analysis. Chapman & Hall.
Heckerman, D. (1996). A tutorial on learning with Bayesian networks (Tech. Rep. No. MSR-TR-95-06). One Microsoft Way, Redmond, WA 98052: Microsoft Research, Advanced Technology Division.
Kohonen, T. (1995). Self-organizing maps. Berlin: Springer-Verlag.
Kontkanen, P., Myllymaki, P., Silander, T., & Tirri, H. (1998). BAYDA: Software for Bayesian classification and feature selection. In R. Agrawal, P. Stolorz, & G. Piatetsky-Shapiro (Eds.), Proceedings of the fourth international conference on knowledge discovery and data mining (KDD-98) (pp. 254-258). AAAI Press, Menlo Park.
Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. Morgan Kaufmann Publishers, San Mateo, CA.

Claims
1. Method for generating visual representations of multidimensional data domains, which method comprises the steps of:
- selecting data to be visualized from at least one data source, and
- choosing the number of dimensions to be used in the visualization,
characterized in that the method further comprises the steps of:
- constructing a set of probabilistic models,
- generating a set of predictive distributions from said set of probabilistic models, and
- using at least one predictive distribution belonging to said set of predictive distributions, determining a visual location for each data vector to be visualized.
2. A method according to claim 1, characterized in that it further comprises the step of storing at least one probabilistic model belonging to said set of probabilistic models.
3. A method according to claim 1, characterized in that it further comprises the step of generating a visual representation of the data domain using said determined visual locations.
4. Method according to claim 1, characterized in that in said step of constructing a set of probabilistic models, the model construction is based at least partly on a set of sample data from said at least one data source.
5. Method according to claim 4, characterized in that said set of sample data is a set of data consisting of the data selected in said step of selecting data to be visualized.
6. Method according to claim 4, characterized in that said set of sample data is a subset of the data selected in said step of selecting data to be visualized.
7. Method according to claim 4, characterized in that in said step of selecting data to be visualized, a subset of said set of sample data is selected.
8. Method according to claim 1, characterized in that in said step of constructing a set of probabilistic models, the model construction is based at least partly on knowledge about the problem domain represented as prior distributions.
9. Method according to claim 1, characterized in that in said step of constructing a set of probabilistic models, the model construction is based at least partly on knowledge about the problem domain represented as logical constraints.
10. Method according to claim 1, characterized in that at least one probabilistic model belonging to said set of probabilistic models belongs to the family of models known as Bayesian networks.
11. Method according to claim 1, characterized in that at least one probabilistic model belonging to said set of probabilistic models belongs to the family of mixtures of Bayesian network models.
12. Method according to claim 1, characterized in that it further comprises the step of generating data using at least one probabilistic model belonging to said set of probabilistic models, and in that in said step of selecting data to be visualized, said generated data is selected.
13. Method according to claim 1, characterized in that at least one predictive distribution belonging to said set of predictive distributions is the conditional distribution for at least one domain attribute.
14. Method according to claim 1, characterized in that at least one predictive distribution belonging to said set of predictive distributions is the conditional distribution for at least one latent attribute.
15. Method according to claim 1, characterized in that at least one predictive distribution belonging to said set of predictive distributions is a combination of the conditional distribution for at least one domain attribute and the conditional distribution for at least one latent attribute.
16. Method according to claim 1, characterized in that the number of dimensions used in the step of generating a visual representation is one.
17. Method according to claim 1, characterized in that the number of dimensions used in the step of generating a visual representation is two.
18. Method according to claim 1, characterized in that the number of dimensions used in the step of generating a visual representation is three.
19. Method according to claim 1, characterized in that in said step of determining the visual locations, said visual locations are determined by pairwise distances between data vectors to be visualized, where the pairwise distances are computed by using at least one predictive distribution belonging to said set of predictive distributions.
20. Method according to claim 19, characterized in that in said step of determining the visual locations, a technique known as Sammon's mapping is used.
21. Method according to claim 19, characterized in that said set of predictive distributions comprises a conditional distribution and the pairwise distance between a first data vector and a second data vector is the symmetric Kullback-Leibler distance between a first instance of the conditional distribution, where the conditional variables are assigned the values present in the first data vector, and a second instance of the conditional distribution, where the conditional variables are assigned the values present in the second data vector.
22. Method according to claim 19, characterized in that said set of predictive distributions comprises a conditional distribution and the pairwise distance between a first data vector and a second data vector is defined using at least the probability that a first random outcome drawn from a first instance of the conditional distribution, where the conditional variables are assigned the values present in the first data vector, is different from a second random outcome drawn from a second instance of the conditional distribution, where the conditional variables are assigned the values present in the second data vector.
23. Method according to claim 19, characterized in that in said step of deterrnining the visual locations, a technique known as Sammon's mapping is used.
24. Method according to claim 23, characterized in that said set of probabilistic models comprises a naive Bayes model.
25. Method according to claim 1, characterized in that said set of predictive distributions comprises a first conditional distribution for first domain attribute(s) and a second conditional distribution for second domain attribute(s), and in that in said step of determining the visual locations, said visual locations are determined by pairwise distances between data vectors to be visualized, where the pairwise distances are computed by using at least the first conditional distribution and the second conditional distribution.
26. Method according to claim 25, characterized in that said set of probabilistic models comprises a first probabilistic model and a second probabilistic model, and the first conditional distribution is related to the first probabilistic model and the second conditional distribution is related to the second probabilistic model.
27. Method according to claim 1, characterized in that in said step of determining the visual locations, the visual locations are determined by defining a coordinate system where each dimension represents one component of an instance of a predictive distribution belonging to said set of predictive distributions.
28. Method according to claim 1, characterized in that said set of probabilistic models consists of one probabilistic model.
29. Method according to claim 1, characterized in that said set of predictive distributions consists of one predictive distribution.
30. A visualization system, which comprises means for receiving data to be visualized, characterized in that it further comprises
- means for constructing a set of probabilistic models using predetermined probabilistic model structures,
- means for generating a set of predictive distributions from said set of probabilistic models,
- means for determining, using at least one predictive distribution belonging to said set of predictive distributions, visual locations for data vectors, which constitute at least part of the data to be visualized, and
- means for producing a visualization using said visual locations.
31. A visualization system according to claim 30, characterized in that it further comprises means for storing the probabilistic model structures.
32. A visualization system according to claim 30, characterized in that it further comprises means for providing a user interface.
33. A visualization system according to claim 30, characterized in that it further comprises means for displaying said visualization.
34. A visualization system according to claim 30, characterized in that it further comprises means for storing said visualization on a computer-readable medium.
35. A visualization system according to claim 30, characterized in that the means for constructing a set of probabilistic models, the means for generating a set of predictive distributions, the means for determining visual locations and the means for producing a visualization are realized as program instructions stored in a memory medium and in that the visualization system further comprises a processing unit for executing the program instructions.
PCT/FI2000/000603 1999-06-30 2000-06-30 Visualization method and visualization system WO2001003053A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP00944080A EP1206752A1 (en) 1999-06-30 2000-06-30 Visualization method and visualization system
US10/019,477 US6873325B1 (en) 1999-06-30 2000-06-30 Visualization method and visualization system
AU58316/00A AU5831600A (en) 1999-06-30 2000-06-30 Visualization method and visualization system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FI991490A FI991490A0 (en) 1999-06-30 1999-06-30 visualization method
FI991490 1999-06-30

Publications (1)

Publication Number Publication Date
WO2001003053A1 true WO2001003053A1 (en) 2001-01-11

Family

ID=8554992

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/FI2000/000603 WO2001003053A1 (en) 1999-06-30 2000-06-30 Visualization method and visualization system

Country Status (5)

Country Link
US (1) US6873325B1 (en)
EP (1) EP1206752A1 (en)
AU (1) AU5831600A (en)
FI (1) FI991490A0 (en)
WO (1) WO2001003053A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7647096B2 (en) 2001-05-14 2010-01-12 Kent Ridge Digital Labs Methods and apparatus for calculating and presenting the probabilistic functional maps of the human brain

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3335602B2 (en) * 1999-11-26 2002-10-21 株式会社クリエイティブ・ブレインズ Thinking system analysis method and analyzer
US7557805B2 (en) * 2003-04-01 2009-07-07 Battelle Memorial Institute Dynamic visualization of data streams
US20080071764A1 (en) * 2006-09-19 2008-03-20 Kazunari Omi Method and an apparatus to perform feature similarity mapping
US8060540B2 (en) 2007-06-18 2011-11-15 Microsoft Corporation Data relationship visualizer
US8423596B2 (en) * 2009-02-05 2013-04-16 Sean Gifford Methods of multivariate data cluster separation and visualization
KR102029055B1 (en) * 2013-02-08 2019-10-07 삼성전자주식회사 Method and apparatus for high-dimensional data visualization
DE102015111549A1 (en) * 2015-07-16 2017-01-19 Wolfgang Grond Method for visually displaying electronic output data sets
US10795566B1 (en) * 2017-06-05 2020-10-06 Mineset, Inc. Two dimensional evidence visualizer
US10229092B2 (en) 2017-08-14 2019-03-12 City University Of Hong Kong Systems and methods for robust low-rank matrix approximation
CN108038790B (en) * 2017-11-24 2021-10-15 东华大学 Situation analysis system with internal and external data fusion
US11847132B2 (en) 2019-09-03 2023-12-19 International Business Machines Corporation Visualization and exploration of probabilistic models
CN113096101A (en) * 2021-04-15 2021-07-09 深圳市玻尔智造科技有限公司 Defect detection method for mobile phone screen with default image-level label

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1993000651A1 (en) * 1991-06-28 1993-01-07 Digital Equipment Corporation Method for visually representing a volumetric set of non-geometric multidimensional data
EP0863469A2 (en) * 1997-02-10 1998-09-09 Nippon Telegraph And Telephone Corporation Scheme for automatic data conversion definition generation according to data feature in visual multidimensional data analysis tool

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5640468A (en) * 1994-04-28 1997-06-17 Hsu; Shin-Yi Method for identifying objects and features in an image
US6128613A (en) * 1997-06-26 2000-10-03 The Chinese University Of Hong Kong Method and apparatus for establishing topic word classes based on an entropy cost function to retrieve documents represented by the topic words
US6292771B1 (en) * 1997-09-30 2001-09-18 Ihc Health Services, Inc. Probabilistic method for natural language processing and for encoding free-text data into a medical database by utilizing a Bayesian network to perform spell checking of words
US6058206A (en) * 1997-12-01 2000-05-02 Kortge; Chris Alan Pattern recognizer with independent feature learning
US6192360B1 (en) * 1998-06-23 2001-02-20 Microsoft Corporation Methods and apparatus for classifying text and for building a text classifier
US6567814B1 (en) * 1998-08-26 2003-05-20 Thinkanalytics Ltd Method and apparatus for knowledge discovery in databases
US6466929B1 (en) * 1998-11-13 2002-10-15 University Of Delaware System for discovering implicit relationships in data and a method of using the same


Also Published As

Publication number Publication date
AU5831600A (en) 2001-01-22
US6873325B1 (en) 2005-03-29
EP1206752A1 (en) 2002-05-22
FI991490A0 (en) 1999-06-30

Similar Documents

Publication Publication Date Title
Barra et al. 3D shape retrieval using kernels on extended Reeb graphs
Soman et al. Machine learning with SVM and other kernel methods
Singh et al. Topological methods for the analysis of high dimensional data sets and 3d object recognition.
Talbot et al. EnsembleMatrix: interactive visualization to support machine learning with multiple classifiers
Seeger Bayesian Gaussian process models: PAC-Bayesian generalisation error bounds and sparse approximations
Schreck et al. Techniques for precision-based visual analysis of projected data
Froyen et al. Bayesian hierarchical grouping: Perceptual grouping as mixture estimation.
Bespalov et al. Scale-space representation of 3d models and topological matching
US6873325B1 (en) Visualization method and visualization system
US6970884B2 (en) Methods and apparatus for user-centered similarity learning
Flores et al. Domains of competence of the semi-naive Bayesian network classifiers
Elad et al. Directed search in a 3D objects database using SVM
Kontkanen et al. Supervised model-based visualization of high-dimensional data
Punera et al. Soft cluster ensembles
Nabney et al. Semisupervised learning of hierarchical latent trait models for data visualization
Chen et al. Experiments with rough set approach to face recognition
Lebbah et al. A probabilistic self-organizing map for binary data topographic clustering
Singh et al. Image-based machine learning for reduction of user fatigue in an interactive model calibration system
Runkler Relational Gustafson Kessel clustering using medoids and triangulation
Shan Probabilistic Models on Fibre Bundles
Siedlecki et al. Mapping techniques for exploratory pattern analysis
Mountrakis et al. Adaptable user profiles for intelligent geospatial queries
Aitnouri et al. On comparison of clustering techniques for histogram pdf estimation
Mu et al. Automatic generation of co-embeddings from relational data with adaptive shaping
Pechenizkiy et al. On the Use of Information Systems Research Methods in Datamining

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
WWE Wipo information: entry into national phase

Ref document number: 2000944080

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 10019477

Country of ref document: US

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

WWP Wipo information: published in national office

Ref document number: 2000944080

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: JP

WWR Wipo information: refused in national office

Ref document number: 2000944080

Country of ref document: EP

WWW Wipo information: withdrawn in national office

Ref document number: 2000944080

Country of ref document: EP