WO2001003053A1 - Visualization method and visualization system - Google Patents
Visualization method and visualization system
- Publication number
- WO2001003053A1 (PCT/FI2000/000603)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- predictive
- model
- visualization
- distribution
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
Definitions
- The present invention relates to computerized system modeling, and more particularly to a method for transforming a high-dimensional data domain into a low-dimensional visual representation. Specifically, the invention is directed to such a method as described in the preamble of claim 1.
- Computer visualization tools are needed for presenting the results of ever increasing amounts of processed data.
- The conventional approach is to take a few variables at a time, process them and their relations, for example, with a spreadsheet, and display the result, for example, as bar charts and pie charts.
- This conventional approach typically produces a great number of charts with only a very weak connection to each other.
- The charts are typically presented as a sequence. From such a sequence of charts it is usually very difficult to see and comprehend the overall significance of the results.
- The data may instead be processed with techniques more elaborate than a spreadsheet, such as statistical methods or neural networks, but the results are still typically presented in sequential form using conventional charts.
- a term data vector having a certain number of components refers to a data point having a certain number of attributes.
- the attributes/components may have continuous or discrete numerical values or they can have ordinal or nominal values.
- the data vectors are vectors of a data domain or a data space.
- High-dimensional data vectors are typically displayed using a two- or three-dimensional device.
- Typically, a corresponding visualization vector is determined for each data vector. It usually has two or three coordinates, which determine the location of the point representing the data vector on the display device.
- Another even more significant limitation concerns the use of more elaborate conventional data dimension reduction methods that can be used to define a visualization vector for a data vector.
- The goal is to replace the original high-dimensional data vectors with much shorter vectors, while losing as little information as possible. Consequently, a pragmatically sensible data reduction scheme is such that when two data vectors are close to each other in the data space, the corresponding visualization vectors are also close to each other in the visualization space.
- the closeness of data vectors in the data space is in these methods defined via a geometric distance measure such as the Euclidean distance.
- The attributes of the data can be various and heterogeneous, and therefore the various dimensions of the data space can have different scaling and meaning.
- the geometric distances between the data vectors do not properly reflect the properties of complex data domains, where the data typically is not coded in a geometric or spatial form.
- changing one bit in a vector may totally change the relevance of the vector, and make it in some sense a quite different vector, although geometrically the difference is only one bit.
- a geometric distance metric for example the Euclidean distance metric, is a difficult task.
- all attributes are treated as equal.
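The scaling problem described above can be illustrated with a small sketch. The records, attribute names and the helper functions `euclidean`, `nearest_neighbour` and `rescale_income` are hypothetical and serve only to show that the nearest neighbour under a Euclidean metric can change when one attribute is expressed in a different unit.

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbour(query, candidates):
    """Return the name of the candidate closest to the query."""
    return min(candidates, key=lambda name: euclidean(query, candidates[name]))

def rescale_income(vector, factor=1000):
    """Express income in thousands; the underlying record is unchanged."""
    age, income = vector
    return (age, income / factor)

# Hypothetical records: (age in years, yearly income in currency units).
query = (30, 50_000)
candidates = {"q": (60, 50_500), "r": (31, 52_000)}

nearest_raw = nearest_neighbour(query, candidates)
nearest_rescaled = nearest_neighbour(
    rescale_income(query),
    {name: rescale_income(v) for name, v in candidates.items()},
)
print(nearest_raw, nearest_rescaled)
```

With income in raw units the income difference dominates and `q` is nearest; with income in thousands the age difference dominates and `r` is nearest, although the data itself is the same.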
- a third significant limitation is related to data mining.
- Data mining is a process that uses specific techniques to find patterns in data, allowing a user to conduct a relatively broad search in databases for relevant information that may not be explicitly stored in the data.
- A user initially specifies a search phrase or strategy and the system then extracts patterns and relations corresponding to that strategy from the stored data. Extracting the patterns usually takes some time, and therefore the extracted patterns and relations are presented to the user by a data analyst with a delay. Any new requests that the results prompt cause a new processing cycle with a relatively long time delay.
- a data visualization tool/method that visually approximates in one instance the whole data domain although it includes a large number of variables.
- a tool/method where the results of the data mining process are visualized instantly and the data mining process is typically carried out in one session.
- An object of the invention is to realize a flexible visualization method.
- a further object of the invention is to realize a method, which is able to handle heterogeneous data straightforwardly and enables the visualization of heterogeneous data.
- Objects of the invention are achieved by constructing a set of probabilistic models, generating predictive distributions from this set of probabilistic models, and determining visualization vectors corresponding to the data vectors using the predictive distributions.
- the method according to the invention is a method for generating visual representations of multidimensional data domains, which method comprises the steps of:
- the present invention is a method for transforming a multivariate data domain into a visual low-dimensional representation.
- the method utilizes probabilistic models of the data domain.
- a probabilistic model is a model, which associates with each point of the data domain a certain probability.
- there may be a certain set of predetermined models and the construction of a set of probabilistic models for a certain visualization process may mean, for example, the selection of models describing the data domain from the set of predetermined models.
- The selection of models, or more generally the construction of models, can involve the use of a training data set, some expert knowledge of the data domain and/or some logical constraints.
- The multidimensional space is converted to a low-dimensional space using a transformation, which maps each data vector in the domain space to a vector in a visual space having a lower dimension.
- the visual space typically has one, two or three dimensions.
- the transformation is such that when two vectors are close to each other in the domain space, the corresponding vectors in the visual space are also close to each other.
- Usually a Euclidean distance is used to define the distance between vectors in the visual space, and the distance between vectors in the domain space is typically defined using at least one predictive distribution derived from the constructed probabilistic model. At least one of the constructed models is thus directly used in the visualization process to produce the predictive distribution(s).
- the set of probabilistic models may consist of one or more probabilistic models.
- The set of predictive distributions may consist of one or more predictive distributions. If more than one predictive distribution is generated, the distributions may relate to one or more of the constructed probabilistic models. It is, for example, possible to have one constructed model and derive two predictive distributions from said model. A second example is to have two constructed models and two predictive distributions, where a first predictive distribution relates to one constructed model and a second predictive distribution relates to the other constructed model.
- the predictive distribution is used as input to the visualization process, where the visualization vectors corresponding to the data vectors are calculated.
- the predictive distribution can, for example, be used in estimating how close two data vectors are to each other.
- similarity of data vectors or, in other words, distance between data vectors
- Two data vectors in the domain space may be considered similar if they lead to similar predictions, when the data vectors are given as inputs to the constructed model.
- a first instance of the predictive distribution relating to a first data vector in the domain space is calculated, and a second instance predictive distribution relating to a second data vector in the domain space is calculated.
- the distance between the first and the second data vector in the domain space depends on the similarity of the first and second instances of the predictive distribution, in other words it depends on the distance between the first and second instances of the predictive distribution.
- Various distance metrics, where the distance between data vectors is determined using instances of the predictive distribution, are discussed in the detailed description of the invention.
- the predictive distribution corresponding to a data vector is typically a predictive distribution conditioned with the values of some components of the data vector.
- the data attributes, whose values are not used as conditions, are called target attributes.
- A method according to the invention may thus be a supervised data visualization method. This is very useful, for example, when a user knows which data attributes are of interest and can select these attributes as target attributes.
- the visualization method according to the invention is an unsupervised method.
- When an unsupervised visualization method is used, the user does not have to select any data attribute as a target attribute. This is an advantage, for example, when among the data attributes there is no natural candidate for the target attribute. It is possible, for example, to make an unsupervised visualization work automatically, so that it constructs the probabilistic model(s) using the data and then visualizes the data without a user intervening in the visualization.
- a visual representation of the data domain is generated using the determined visual locations.
- A method according to the invention is very suitable for data mining, where domain experts try to capture interesting regularities from the visual image. Because at least one predictive distribution is used in determining the visual locations, visualization according to the invention often efficiently reveals hidden structures in the data. In data mining, it is furthermore possible to view visualizations that relate to various target attribute sets, i.e. to various predictive distributions.
- At least one probabilistic model is constructed and it may be stored for further use.
- the probabilistic model is a Bayesian model, it is quite straightforward to produce predictive distributions using the probabilistic model.
- the present invention provides procedures for visually displaying and manipulating multi-dimensional data with, for example, the following advantages.
- Data visualization can be simplified as the visualization result is typically a two or three- dimensional plot.
- Information can be synthesized from data, as the visualization results may reveal hidden structures of the data, and at least partly as a result of the revealed structures, decision making can be simplified.
- Trends and data relationships can be more easily visualized and uncovered, for example, when various colors and/or markers are used to mark different attribute values in the visual representation.
- report generation can be simplified, and data administration can be performed more easily and understandably when one understands the domain better.
- the invention relates also to a visualization system, which comprises means for receiving data to be visualized, and which is characterized in that it further comprises
- Figure 1 illustrates examples of visualization results produced by a method according to a first advantageous embodiment of the invention
- Figure 2 illustrates first visualization results produced by a method according to the first advantageous embodiment of the invention and second visualization results produced using a conventional visualization method
- Figure 3 illustrates examples of visualization results produced by a method according to a second advantageous embodiment of the invention.
- Figure 4 illustrates a diagram of a system, which is an example of a system according to the present invention.
- letter M refers to a probabilistic model, which associates with each point of the data domain a certain probability.
- The model M relates to a probability distribution $P(X_1, \ldots, X_n \mid M)$ on the space of possible data vectors $x$, where a data vector has n attributes/components $X_i$.
- A typical example of a probabilistic model is a parametric model where M is the structure of the model and $\theta$ represents the parameters of the model. In this case, each parameterized instance $(M, \theta)$ of the parametric model produces a probability distribution $P(X_1, \ldots, X_n \mid M, \theta)$.
- a probabilistic model used in a method according to the invention may be a supervised model or an unsupervised model.
- a supervised model means that, for example, one of the data attributes is selected as a class attribute, which is the focus of the visualization. In supervised models, the target attributes are thus typically selected already when the model is constructed. In unsupervised models it is not necessary to decide the target attributes when the model is constructed; they can be selected when the distances between the data vectors are determined.
- the probabilistic model M used in a method according to the invention may belong to a family of models known as Bayesian (belief) network models.
- a Bayesian network is a representation of a probability distribution over a set of (typically) discrete variables, consisting of an acyclic directed graph, where the nodes correspond to domain variables, and the arcs define a set of independence assumptions which allow the joint probability distribution for a data vector to be factorized as a product of simple conditional probabilities.
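The factorization described above can be sketched for a toy network. The three binary variables `A`, `B`, `C`, the chain structure A → B → C and all probability values are hypothetical; the sketch only shows that the joint probability of a data vector is the product of one conditional probability per node.

```python
from itertools import product

# Hypothetical acyclic directed graph: each node lists its parent nodes.
parents = {"A": (), "B": ("A",), "C": ("B",)}

# Conditional probability tables: node -> {parent values: {value: probability}}.
cpt = {
    "A": {(): {0: 0.7, 1: 0.3}},
    "B": {(0,): {0: 0.9, 1: 0.1}, (1,): {0: 0.4, 1: 0.6}},
    "C": {(0,): {0: 0.8, 1: 0.2}, (1,): {0: 0.25, 1: 0.75}},
}

def joint(assignment):
    """Joint probability of a full assignment as a product of conditionals."""
    prob = 1.0
    for node, node_parents in parents.items():
        parent_values = tuple(assignment[p] for p in node_parents)
        prob *= cpt[node][parent_values][assignment[node]]
    return prob

# Summing the factorized joint over all data vectors should give one.
total = sum(
    joint(dict(zip("ABC", values))) for values in product((0, 1), repeat=3)
)
example = joint({"A": 1, "B": 1, "C": 0})  # P(A=1) * P(B=1|A=1) * P(C=0|B=1)
```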
- For Bayesian network models, see e.g. (Pearl, 1988).
- One example of a Bayes network model which can be used in a method according to the invention, is the naive Bayes model.
- the naive Bayes model is a supervised model, where one of the data attributes is selected as a class variable.
- a description of the naive Bayes model can be found, for example, in (Kontkanen, Myllymaki, Silander, Tirri, 1998).
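As a minimal sketch of how a naive Bayes model yields a predictive distribution for the class variable, the following estimates the model from discrete data and conditions on the remaining attributes. The function names `fit_naive_bayes` and `predictive`, the toy data and the additive smoothing constant `alpha` are assumptions made for illustration; the source does not prescribe an estimation procedure.

```python
from collections import Counter, defaultdict

def fit_naive_bayes(rows, class_index, alpha=1.0):
    """Count-based naive Bayes model for discrete rows (tuples of values)."""
    n_attrs = len(rows[0])
    values = [sorted({row[i] for row in rows}) for i in range(n_attrs)]
    class_counts = Counter(row[class_index] for row in rows)
    cond_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    for row in rows:
        c = row[class_index]
        for i, v in enumerate(row):
            if i != class_index:
                cond_counts[(i, c)][v] += 1
    return {"class_index": class_index, "alpha": alpha, "n": len(rows),
            "values": values, "class_counts": class_counts,
            "cond_counts": cond_counts}

def predictive(model, row):
    """P(class | other attribute values of row); the row's class entry is ignored."""
    ci, alpha = model["class_index"], model["alpha"]
    classes = model["values"][ci]
    scores = {}
    for c in classes:
        n_c = model["class_counts"][c]
        score = (n_c + alpha) / (model["n"] + alpha * len(classes))
        for i, v in enumerate(row):
            if i == ci:
                continue
            count = model["cond_counts"][(i, c)][v]
            score *= (count + alpha) / (n_c + alpha * len(model["values"][i]))
        scores[c] = score
    norm = sum(scores.values())
    return {c: s / norm for c, s in scores.items()}

# Hypothetical data: (outlook, temperature, play); "play" is the class attribute.
rows = [("sunny", "hot", "no"), ("sunny", "mild", "no"), ("rainy", "mild", "yes"),
        ("rainy", "cool", "yes"), ("sunny", "cool", "yes")]
model = fit_naive_bayes(rows, class_index=2)
dist = predictive(model, ("rainy", "cool", None))
```

The returned dictionary is one instance of the predictive distribution: it assigns a conditional probability to each possible class value for the given data vector.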
- a further example of a probabilistic model usable in a method according to the invention is a model belonging to a family of mixtures of Bayesian network models.
- a mixture of Bayesian network models is a weighted sum of several Bayesian network models.
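Written out, the weighted sum reads as follows. The symbols $w_k$, $M_k$ and $K$, and the constraint that the weights are non-negative and sum to one, follow the usual mixture-model convention and are assumptions here, since the text does not spell them out.

```latex
P(X_1, \ldots, X_n) = \sum_{k=1}^{K} w_k \, P(X_1, \ldots, X_n \mid M_k),
\qquad w_k \ge 0, \quad \sum_{k=1}^{K} w_k = 1
```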
- a training set of sample data, or many training sets from one or more data sources, can be used in constructing the probabilistic model(s).
- Construction of a model refers to selecting a suitable model structure and suitable parameters for the selected model structure. Theoretically justifiable techniques for learning models from sample data are discussed in (Heckerman, 1996). It is also possible to use, alternatively or in addition to a training set, further information about the data domain. For example, the model construction may be based at least partly on knowledge about the problem domain represented as prior distributions and/or as logical constraints.
- When a training set is used, it is possible to use, for example, part of the data to be visualized as a training set and still use the whole data in the visualization process. In other words, it is possible that the training set is a subset of the data to be visualized. Furthermore, it is possible that the data to be visualized is a subset of the training set or that the training set consists of the data to be visualized.
- A predictive distribution may be a conditional distribution for one or more of the domain attributes $X_i$ given the other attributes.
- Let $X = \{x_1, \ldots, x_N\}$ denote a data matrix having N data vectors $x_i$.
- Each data vector consists of n components, in other words the data has n attributes $X_1, \ldots, X_n$.
- The attributes $X_i$ are assumed to be discrete.
- the predictive distribution is typically a conditional predictive distribution
- $x_i$ is the value of the attribute $X_i$ in data vector $x$
- $x^c$ denotes that the values of those attributes, which are outside the target set $X_1, \ldots, X_m$, are assumed to have the attribute values of data vector $x$.
- For a given data vector $x_i$ it is possible to compute an instance of the predictive distribution. For example, an instance of the conditional predictive distribution is
- $x_{ik}$ is the value of attribute $X_k$ in data vector $x_i$
- The instance of the predictive distribution means that a conditional probability (where the values of the other attributes are as indicated above) is associated with each possible value $x_{k1}, x_{k2}, \ldots$ of each target attribute $X_k$.
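One way to picture an instance of a conditional predictive distribution is to condition an explicit joint probability table on the non-target attribute values of a data vector. The function `conditional_instance` and the two-attribute table below are hypothetical; a real model would normally compute the same quantity from its factorized form rather than from a full table.

```python
def conditional_instance(joint, x, targets):
    """Condition a joint table on the non-target values of data vector x.

    joint   -- dict mapping full assignments (tuples) to probabilities
    x       -- data vector; entries at target positions are ignored
    targets -- indices of the target attributes
    Returns a dict mapping target value tuples to conditional probabilities.
    """
    weights = {}
    for assignment, prob in joint.items():
        matches = all(
            assignment[i] == x[i] for i in range(len(x)) if i not in targets
        )
        if matches:
            key = tuple(assignment[i] for i in targets)
            weights[key] = weights.get(key, 0.0) + prob
    norm = sum(weights.values())  # zero if the conditioning values never occur
    return {key: w / norm for key, w in weights.items()}

# Hypothetical joint distribution over two binary attributes (X1, X2).
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

# Instance of P(X1 | X2 = 1) for a data vector whose second attribute is 1.
instance = conditional_instance(joint, x=(None, 1), targets=[0])
```

Here `instance` associates a conditional probability with each possible value of the single target attribute, which is what the passage above calls an instance of the predictive distribution.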
- the predictive distribution may be a conditional distribution for one or more latent attributes, given the constructed model. Furthermore, the predictive distribution may be a combination of a conditional distribution for at least one domain attribute and a conditional distribution for one or more latent attributes.
- Let $X'$ denote a visualization matrix where each n-component data vector $x_i$ is replaced by a typically two- or three-component visualization vector $x_i'$.
- a visualization matrix X' can easily be plotted on a two- or three-dimensional display. Consequently, for visualizing high-dimensional data, we need to find a transformation (function), which maps each data vector in the domain space to a vector in the visual space.
- the corresponding visualization vectors should be close to each other in the visualization space.
- One way to determine the visual locations is to determine them using pairwise distances between the data vectors to be visualized.
- A criterion presented above is often minimized, but it is also possible to find visualization vectors using other criteria.
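A pairwise-distance criterion of this kind can be sketched as follows. The exact form of the criterion is not reproduced in the text, so this sketch assumes a plain sum of squared differences between domain-space and visual-space distances; the classical Sammon stress additionally normalizes each term by the domain-space distance. The name `layout_cost` is hypothetical.

```python
import math

def layout_cost(domain_dist, points):
    """Sum over all pairs of (domain distance - Euclidean visual distance)^2."""
    total = 0.0
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            visual = math.dist(points[i], points[j])
            total += (domain_dist[i][j] - visual) ** 2
    return total

# Hypothetical pairwise distances between three data vectors.
domain_dist = [[0, 1, 2],
               [1, 0, 1],
               [2, 1, 0]]

perfect = layout_cost(domain_dist, [(0, 0), (1, 0), (2, 0)])
distorted = layout_cost(domain_dist, [(0, 0), (1, 0), (1, 1)])
```

A layout that reproduces all pairwise distances exactly has cost zero, and any distortion increases the cost.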
- A distance metric which involves a predictive distribution or predictive distributions is typically scale invariant, as we have moved from the original attribute space to the probability space. This also allows us to handle different types of attributes (discrete or continuous) in the same consistent framework. Furthermore, the framework is theoretically on a more solid basis, as our domain assumptions must be formalized in the model M. There are various ways to define a similarity measure between, for example, two instances of a predictive distribution. In a method according to one embodiment of the invention, the following distance metric is used:
- $\mathrm{MAP}(x_i)$ denotes the maximum posterior probability (MAP) assignment for the target attributes $X_1, \ldots, X_m$ with respect to the selected predictive distribution, for example a predictive distribution presented in Equation 1.
- The MAP assignment is the one with the highest probability. For example, if there is only one target attribute, a conditional predictive distribution $P(X_1 \mid x^c)$ associates probabilities with each possible value $x_{11}, x_{12}, \ldots$ of the target attribute $X_1$, and the MAP assignment for the target attribute $X_1$ is the value $x_{1k}$ having the highest probability.
- A first instance $P(X_1 \mid x_i)$ of the predictive distribution associates first probabilities $(P_{11}, P_{12}, \ldots)$ and a second instance $P(X_1 \mid x_j)$ of the predictive distribution associates second probabilities $(P_{21}, P_{22}, \ldots)$ with each possible value $x_{11}, x_{12}, \ldots$
- In Equation 3, the distance is the probability that a first random outcome drawn from a first instance $P(X_1, \ldots, X_m \mid x_i)$ of a predictive distribution is different from a second random outcome drawn from a second instance $P(X_1, \ldots, X_m \mid x_j)$ of the predictive distribution.
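The verbal description of this distance can be sketched in code. The sketch assumes that the two outcomes are drawn independently, so that the probability of agreement is the sum over values of the product of the two probabilities; the function name `disagreement_distance` and the example distributions are hypothetical.

```python
def disagreement_distance(p, q):
    """Probability that independent draws from p and q give different outcomes.

    p, q -- instances of a predictive distribution as dicts value -> probability
    """
    agree = sum(prob * q.get(value, 0.0) for value, prob in p.items())
    return 1.0 - agree

# Hypothetical predictive instances for three data vectors, one target attribute.
p_i = {"a": 0.9, "b": 0.1}
p_j = {"a": 0.8, "b": 0.2}
p_k = {"a": 0.1, "b": 0.9}

d_ij = disagreement_distance(p_i, p_j)  # similar predictions, small distance
d_ik = disagreement_distance(p_i, p_k)  # opposite predictions, large distance
```

Under this reading the value for two identical instances is generally not zero (it is 0.18 for `p_i` with itself), so the quantity behaves as a dissimilarity score rather than a metric in the strict mathematical sense.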
- The pairwise distance between two data vectors $x_i$ and $x_j$ is defined by
- $\mathrm{MAP}(x_j)$ denotes the maximum posterior probability assignment for the target attributes $X_1, \ldots, X_m$ with respect to the selected predictive distribution.
- As with the distance metric defined in Equation 3, also here the distance between two data vectors $x_i$ and $x_j$ is determined using a first instance $P(X_1, \ldots, X_m \mid x_i)$ and a second instance $P(X_1, \ldots, X_m \mid x_j)$ of the selected predictive distribution.
- the distance metrics defined in Equations 3 and 4 are supervised, as some attributes are selected as target attributes. Consequently, a visualization method using either of these distance metrics is a supervised method.
- the pairwise distances by using more than one conditional predictive distribution.
- The pairwise distance between two data vectors $x_i$ and $x_j$ is defined in the following way
- $\mathrm{MAP}_k$ denotes the maximum posterior probability value of target attribute $X_k$ with respect to predictive distribution $P(X_k \mid x^c)$.
- each attribute X k is in turn selected as a target attribute in a conditional predictive distribution.
- The distance metric defined in Equation 5 is unsupervised, as all attributes are treated equally. When this metric is used with unsupervised models, it is usually enough to construct one model, as various conditional predictive distributions can be obtained from an unsupervised model. If this metric is used with supervised models, it may be necessary to construct several probabilistic models. For example, if the naive Bayes model is used, typically n models are constructed for a certain data domain, and in each model a different attribute is selected as the class variable. From each model it is then possible to obtain a conditional predictive distribution relating to the class variable.
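The idea of treating each attribute in turn as the target can be sketched as follows. The form of Equation 5 is not reproduced in the text, so the combination rule used here (an unweighted average of per-attribute disagreement probabilities) is an assumption, as are the function names and example values.

```python
def disagreement(p, q):
    """Probability that independent draws from p and q differ."""
    return 1.0 - sum(prob * q.get(value, 0.0) for value, prob in p.items())

def unsupervised_distance(instances_i, instances_j):
    """Average per-attribute disagreement between two data vectors.

    instances_i, instances_j -- one predictive instance (dict value -> probability)
    per attribute X_k, each obtained with X_k as the target attribute.
    """
    terms = [disagreement(p, q) for p, q in zip(instances_i, instances_j)]
    return sum(terms) / len(terms)

# Hypothetical per-attribute predictive instances for two data vectors.
vector_a = [{"0": 0.9, "1": 0.1}, {"x": 0.5, "y": 0.5}]
vector_b = [{"0": 0.2, "1": 0.8}, {"x": 0.5, "y": 0.5}]

d_ab = unsupervised_distance(vector_a, vector_b)
```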
- The pairwise distance between two data vectors $x_i$ and $x_j$ is defined as the symmetric Kullback-Leibler divergence (see, for example, (Gelman, Carlin, Stern, Rubin, 1995)) between a first instance $P(X_1, \ldots, X_m \mid x_i)$ and a second instance $P(X_1, \ldots, X_m \mid x_j)$ of the predictive distribution conditioned with the variable value assignments present in a data vector.
- a Kullback-Leibler divergence has an infinite range, which may lead to computational problems with practical implementations.
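The unbounded range mentioned above is easy to see in a small sketch. The symmetric variant is taken here as the sum of the two directed divergences, which is one common convention and an assumption about the intended definition; the function names are hypothetical.

```python
import math

def kl_divergence(p, q):
    """Directed Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    total = 0.0
    for value, p_v in p.items():
        if p_v == 0.0:
            continue  # terms with zero probability under p contribute nothing
        q_v = q.get(value, 0.0)
        if q_v == 0.0:
            return math.inf  # p puts mass where q has none
        total += p_v * math.log(p_v / q_v)
    return total

def symmetric_kl(p, q):
    """Symmetrized divergence, here the sum of both directions."""
    return kl_divergence(p, q) + kl_divergence(q, p)

p = {"a": 0.5, "b": 0.5}
q = {"a": 0.9, "b": 0.1}
r = {"a": 1.0, "b": 0.0}

finite = symmetric_kl(p, q)
unbounded = symmetric_kl(p, r)  # infinite, because r assigns zero mass to "b"
```

An implementation that feeds such distances into a layout criterion would need to clip or smooth them; how that is done is not specified in the text.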
- The visualization vectors are found by minimizing Equation 2, in other words using Sammon's mapping.
- the visualization space is a space where each dimension represents directly a component of an instance of a predictive distribution.
- a visualization vector $x_i'$ corresponding to a data vector $x_i$ could be
- the first visual coordinate is the conditional probability that the attribute $X_1$ has the value $x_{11}$.
- one probabilistic model which is the naive Bayes model mentioned above, is constructed.
- By fixing the model structure to the naive Bayes model, the problem of searching for a good model structure is avoided.
- the naive Bayes model produces very good results, and it is computationally quite simple.
- The naive Bayes model is constructed, for example, using part of the available data as a training set and using the rest of the data in the visualization.
- The class variable $X_n$ is used as the target attribute when the predictive distributions are calculated. Data vectors are thus visualized according to the classification distribution obtained by using the simple naive Bayesian network model.
- the dimension of the visual space is two or three and the pairwise distance between data vectors in the data space is defined by Equation 3.
- Any search algorithm can be used; for example, the following very straightforward stochastic greedy algorithm may be used.
- The algorithm starts with a random visualization $X'$, changes a randomly selected visualization vector $x_i'$ to a randomly selected new visualization vector, and accepts the change if the value of the criterion in Equation 2 is decreased.
- One visualization vector is changed at a time.
- The new candidate visual vectors are generated from a normal distribution centered around the current visual vector, which means that small moves are more likely to be suggested than large ones. This stepwise procedure is repeated, for example, one million times.
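The stochastic greedy search described above can be sketched as follows. The criterion is assumed to be a sum of squared differences between domain-space and visual-space distances, since Equation 2 is not reproduced in the text; the step count, the proposal width `sigma`, the seed and all function names are hypothetical choices for illustration.

```python
import math
import random

def point_cost(i, position, points, domain_dist):
    """Criterion terms that involve point i if it were placed at `position`."""
    return sum(
        (domain_dist[i][j] - math.dist(position, other)) ** 2
        for j, other in enumerate(points) if j != i
    )

def total_cost(points, domain_dist):
    """Full criterion: squared distance mismatch summed over all pairs."""
    n = len(points)
    return sum(
        (domain_dist[i][j] - math.dist(points[i], points[j])) ** 2
        for i in range(n) for j in range(i + 1, n)
    )

def greedy_layout(domain_dist, dim=2, steps=5000, sigma=0.1, seed=0):
    """Start from a random layout and accept only moves that lower the criterion."""
    rng = random.Random(seed)
    n = len(domain_dist)
    points = [[rng.random() for _ in range(dim)] for _ in range(n)]
    initial = total_cost(points, domain_dist)
    for _ in range(steps):
        i = rng.randrange(n)  # one visualization vector is changed at a time
        # Normal proposal centered on the current location favors small moves.
        candidate = [c + rng.gauss(0.0, sigma) for c in points[i]]
        # Only terms involving point i change, so comparing them is sufficient.
        if point_cost(i, candidate, points, domain_dist) < point_cost(
            i, points[i], points, domain_dist
        ):
            points[i] = candidate
    return points, initial, total_cost(points, domain_dist)

# Hypothetical pairwise domain-space distances between four data vectors.
domain_dist = [[0.0, 0.2, 0.8, 0.9],
               [0.2, 0.0, 0.7, 0.8],
               [0.8, 0.7, 0.0, 0.1],
               [0.9, 0.8, 0.1, 0.0]]
layout, initial_cost, final_cost = greedy_layout(domain_dist)
```

Because a move is accepted only when it lowers the criterion, the final cost can never exceed the initial cost, but the search may stop in a local minimum.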
- Figure 1 presents six illustrative examples of the two-dimensional visualization produced using a method according to the first advantageous embodiment of the invention.
- Visualization vectors corresponding to data vectors having different class labels are indicated with different types of markers in Figure 1.
- The datasets being visualized are publicly available classification datasets from the UCI data repository (Blake, Keogh, Merz, 1998).
- visualizations of the following datasets are shown: Australian Credit, Balance Scale, Connect-4, German Credit, Thyroid disease and Vehicle Silhouettes.
- The data shown in Figure 1 is varied: some datasets comprise information relating to credit card owners, one comprises information about patients having a certain disease, and one comprises information about vehicle silhouettes.
- the visualizations in Figure 1 show clearly structures in the data domains, and the visualization method according to the first advantageous embodiment of the invention can thus be used to visualize various data domains successfully.
- Figure 2 presents a comparative example, where a certain dataset (Breast Cancer from the UCI data repository) is visualized using a method according to the first advantageous embodiment of the invention (left-hand side panel of Figure 2) and using a Euclidean visualization method, where the distance between the data vectors is the Euclidean distance (right-hand side panel of Figure 2).
- Equation 2 is also minimized using a stochastic greedy algorithm similar to the one in a method according to the first advantageous embodiment of the invention, and the number of steps in the algorithm is the same for both visualizations presented in Figure 2.
- the Euclidean visualization produces a scattered image without any noticeable trends.
- the visualization which is the result of a method according to a first advantageous embodiment of the invention, shows a clear structure.
- the method according to the first advantageous embodiment of the invention is thus more applicable to visualization and data mining than the Euclidean visualization and produces typically better results than the Euclidean visualization.
- A method according to the invention, where for example the naive Bayes model, a single training set and a stochastic greedy algorithm are used, is quite simple and computationally comparable to, for example, conventional visualization schemes employing Euclidean distance metrics in the data domain.
- the visualization can be obtained quite fast.
- The quality of visualizations produced using a method according to the invention can be further enhanced, for example, using a more versatile probabilistic model.
- In general, if the naive Bayes model is used, Sammon's mapping requires the most computing resources. If more versatile models are used, then the construction of the probabilistic model may also require considerable computing resources.
- Figure 3 presents four illustrative examples of the two-dimensional visualization produced using a method according to a second advantageous embodiment of the invention, where the unsupervised distance metric defined in Equation 5 and the naive Bayes model are used.
- several naive Bayes models describing the data are constructed here.
- Visualization vectors corresponding to data vectors having different class labels are indicated with different types of markers in Figure 3.
- The datasets being visualized are from the UCI data repository.
- visualizations of the following datasets are shown: Breast Cancer (Wisconsin), Heart Disease (Hungarian), Ionosphere and Vehicle Silhouettes.
- an unsupervised visualization method according to the invention may clearly reveal hidden structures in data domains.
- the data to be visualized is data generated from said constructed model. This can be useful in e.g. domains where the amount of available data is so little that proper visualizations of the domains are hard to make. Generating data using the constructed probabilistic model, and then visualizing the generated data can also be used as a tool in gaining insight on the constructed probabilistic model.
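Generating data from a constructed model can be sketched for the naive Bayes case: draw a class value from the class distribution, then draw each attribute from its conditional distribution given that class. The model parameters, the function name `sample_vectors` and the fixed seed below are hypothetical.

```python
import random

def sample_vectors(class_prior, conditionals, count, seed=0):
    """Draw data vectors (attribute values..., class) from a naive Bayes model.

    class_prior  -- dict class -> probability
    conditionals -- one dict per attribute: class -> {value: probability}
    """
    rng = random.Random(seed)
    classes = list(class_prior)
    class_weights = [class_prior[c] for c in classes]
    samples = []
    for _ in range(count):
        c = rng.choices(classes, weights=class_weights)[0]
        row = []
        for table in conditionals:
            dist = table[c]
            values = list(dist)
            row.append(rng.choices(values, weights=[dist[v] for v in values])[0])
        samples.append((*row, c))
    return samples

# Hypothetical model with one class attribute and two discrete attributes.
class_prior = {"yes": 0.6, "no": 0.4}
conditionals = [
    {"yes": {"sunny": 0.3, "rainy": 0.7}, "no": {"sunny": 0.8, "rainy": 0.2}},
    {"yes": {"cool": 0.6, "hot": 0.4}, "no": {"cool": 0.2, "hot": 0.8}},
]
generated = sample_vectors(class_prior, conditionals, count=5)
```

The generated vectors can then be passed through the same visualization steps as observed data.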
- the invention relates also to a computer system for visualizing multidimensional data.
- the system comprises means for processing the data to achieve a model of the data domain, which can then be used for interactively developing and manipulating visual representations of the domain.
- the implementation as a software tool advantageously comprises means for storing the probabilistic model structures, means for constructing a probabilistic model of the data domain using the stored probabilistic model structure, as well as means for using the constructed model in a visualization process as described previously.
- the visual representation can be physically embodied in a computer-readable medium for visualization on a computer display device.
- the stored probabilistic model structures may be any model structures discussed above, and the construction of the probabilistic model and the determining of the visual locations may be performed using any methods described above.
- Figure 4 illustrates a third advantageous embodiment of the invention.
- Figure 4 shows how various components of a computer system interact to provide the functionality of the inventive method.
- the computer system comprises means 100 for model construction, means 110 for location determination, means 120 for data visualization, means 130 for providing a user interface, and a processing unit 140.
- the means 130 for providing a user interface may for example comprise a display unit, a keyboard, a pointing device such as a mouse, and any other typical user interface elements of a computer system.
- the means 100 for model construction, means 110 for location determination, and means 120 for data visualization can advantageously be realized as program instructions stored in a memory medium and executed by the processing unit 140.
- one or more training data sets 150 may be used as inputs for the means 100 for model construction.
- the means for model construction 100 may comprise, for example, a certain set of predefined structures of parametric models and means for selecting a proper model structure and suitable parameters for the selected model structure.
- the probabilistic model or models 151 and at least one visualization data set 152 are input into means 110 for location determination for producing visual location data 153.
- the visual location data 153 is input into means 120 for data visualization for producing a visual representation of data.
- the data is visualized on a display device by using the visual locations determined according to the inventive method.
- the computer system further comprises means for allowing the user to manipulate the visual presentation according to different domain variable characteristics by using for example colors, shapes and animation.
- the visual display also functions as an interface to the data to be visualized, so that the user can study the contents of the original data vector through the corresponding visual location in the visual representation. This means that, for example, by pointing at a certain visual location on a display device with a mouse, the attributes of the corresponding data vector are shown to the user.
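
The location-determination step (means 110) and the pointing interaction described above can be illustrated with a minimal Python sketch. It assumes that pairwise target distances between data vectors are already available. In the described method these would be derived from the probabilistic model, but the toy matrix below is made up for illustration. The function names `sammon_stress`, `greedy_layout` and `nearest_vector`, and the parameters `steps`, `step_size` and `seed`, are hypothetical and do not come from the patent. The search shown is one simple stochastic greedy scheme for reducing Sammon's stress, not necessarily the algorithm used by the inventors.

```python
import math
import random


def sammon_stress(dist, pos):
    """Sammon's stress of a 2-D layout `pos` against target distances `dist`."""
    err = 0.0
    norm = 0.0
    n = len(pos)
    for i in range(n):
        for j in range(i + 1, n):
            d = dist[i][j]
            if d <= 0.0:
                continue  # skip zero target distances to avoid dividing by zero
            e = math.dist(pos[i], pos[j])
            err += (d - e) ** 2 / d
            norm += d
    return err / norm if norm > 0.0 else 0.0


def greedy_layout(dist, steps=2000, step_size=0.3, seed=0):
    """Stochastic greedy search: move one random point, keep the move only if stress drops."""
    rng = random.Random(seed)
    n = len(dist)
    pos = [(rng.random(), rng.random()) for _ in range(n)]
    best = sammon_stress(dist, pos)
    for _ in range(steps):
        i = rng.randrange(n)
        old = pos[i]
        pos[i] = (old[0] + rng.gauss(0.0, step_size),
                  old[1] + rng.gauss(0.0, step_size))
        stress = sammon_stress(dist, pos)
        if stress < best:
            best = stress
        else:
            pos[i] = old  # reject the move and restore the previous location
    return pos, best


def nearest_vector(pos, pointer):
    """Index of the data vector whose visual location is closest to the pointer."""
    return min(range(len(pos)), key=lambda i: math.dist(pos[i], pointer))


# Toy, made-up pairwise distances between four data vectors (an assumption).
toy_dist = [
    [0.0, 1.0, 1.0, 2.0],
    [1.0, 0.0, 2.0, 1.0],
    [1.0, 2.0, 0.0, 1.0],
    [2.0, 1.0, 1.0, 0.0],
]
locations, stress = greedy_layout(toy_dist)
picked = nearest_vector(locations, locations[2])
print(f"final stress: {stress:.4f}, vector under pointer: {picked}")
```

Each greedy step recomputes the full stress, which costs on the order of n² distance evaluations. This is consistent with the remark above that, with a naive Bayes model, the Sammon's mapping is the most resource-intensive part. A practical implementation would probably update only the terms involving the moved point.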
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP00944080A EP1206752A1 (en) | 1999-06-30 | 2000-06-30 | Visualization method and visualization system |
US10/019,477 US6873325B1 (en) | 1999-06-30 | 2000-06-30 | Visualization method and visualization system |
AU58316/00A AU5831600A (en) | 1999-06-30 | 2000-06-30 | Visualization method and visualization system |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FI991490A FI991490A0 (en) | 1999-06-30 | 1999-06-30 | visualization method |
FI991490 | 1999-06-30 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2001003053A1 (en) | 2001-01-11 |
Family
ID=8554992
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/FI2000/000603 WO2001003053A1 (en) | 1999-06-30 | 2000-06-30 | Visualization method and visualization system |
Country Status (5)
Country | Link |
---|---|
US (1) | US6873325B1 (en) |
EP (1) | EP1206752A1 (en) |
AU (1) | AU5831600A (en) |
FI (1) | FI991490A0 (en) |
WO (1) | WO2001003053A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7647096B2 (en) | 2001-05-14 | 2010-01-12 | Kent Ridge Digital Labs | Methods and apparatus for calculating and presenting the probabilistic functional maps of the human brain |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3335602B2 (en) * | 1999-11-26 | 2002-10-21 | 株式会社クリエイティブ・ブレインズ | Thinking system analysis method and analyzer |
US7557805B2 (en) * | 2003-04-01 | 2009-07-07 | Battelle Memorial Institute | Dynamic visualization of data streams |
US20080071764A1 (en) * | 2006-09-19 | 2008-03-20 | Kazunari Omi | Method and an apparatus to perform feature similarity mapping |
US8060540B2 (en) | 2007-06-18 | 2011-11-15 | Microsoft Corporation | Data relationship visualizer |
US8423596B2 (en) * | 2009-02-05 | 2013-04-16 | Sean Gifford | Methods of multivariate data cluster separation and visualization |
KR102029055B1 (en) * | 2013-02-08 | 2019-10-07 | 삼성전자주식회사 | Method and apparatus for high-dimensional data visualization |
DE102015111549A1 (en) * | 2015-07-16 | 2017-01-19 | Wolfgang Grond | Method for visually displaying electronic output data sets |
US10795566B1 (en) * | 2017-06-05 | 2020-10-06 | Mineset, Inc. | Two dimensional evidence visualizer |
US10229092B2 (en) | 2017-08-14 | 2019-03-12 | City University Of Hong Kong | Systems and methods for robust low-rank matrix approximation |
CN108038790B (en) * | 2017-11-24 | 2021-10-15 | 东华大学 | Situation analysis system with internal and external data fusion |
US11847132B2 (en) | 2019-09-03 | 2023-12-19 | International Business Machines Corporation | Visualization and exploration of probabilistic models |
CN113096101A (en) * | 2021-04-15 | 2021-07-09 | 深圳市玻尔智造科技有限公司 | Defect detection method for mobile phone screen with default image-level label |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1993000651A1 (en) * | 1991-06-28 | 1993-01-07 | Digital Equipment Corporation | Method for visually representing a volumetric set of non-geometric multidimensional data |
EP0863469A2 (en) * | 1997-02-10 | 1998-09-09 | Nippon Telegraph And Telephone Corporation | Scheme for automatic data conversion definition generation according to data feature in visual multidimensional data analysis tool |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5640468A (en) * | 1994-04-28 | 1997-06-17 | Hsu; Shin-Yi | Method for identifying objects and features in an image |
US6128613A (en) * | 1997-06-26 | 2000-10-03 | The Chinese University Of Hong Kong | Method and apparatus for establishing topic word classes based on an entropy cost function to retrieve documents represented by the topic words |
US6292771B1 (en) * | 1997-09-30 | 2001-09-18 | Ihc Health Services, Inc. | Probabilistic method for natural language processing and for encoding free-text data into a medical database by utilizing a Bayesian network to perform spell checking of words |
US6058206A (en) * | 1997-12-01 | 2000-05-02 | Kortge; Chris Alan | Pattern recognizer with independent feature learning |
US6192360B1 (en) * | 1998-06-23 | 2001-02-20 | Microsoft Corporation | Methods and apparatus for classifying text and for building a text classifier |
US6567814B1 (en) * | 1998-08-26 | 2003-05-20 | Thinkanalytics Ltd | Method and apparatus for knowledge discovery in databases |
US6466929B1 (en) * | 1998-11-13 | 2002-10-15 | University Of Delaware | System for discovering implicit relationships in data and a method of using the same |
- 1999
  - 1999-06-30: FI application FI991490A (FI991490A0), status unknown
- 2000
  - 2000-06-30: WO application PCT/FI2000/000603 (WO2001003053A1), not active, Application Discontinuation
  - 2000-06-30: AU application AU58316/00A (AU5831600A), not active, Abandoned
  - 2000-06-30: EP application EP00944080A (EP1206752A1), not active, Ceased
  - 2000-06-30: US application US10/019,477 (US6873325B1), not active, Expired - Lifetime
Also Published As
Publication number | Publication date |
---|---|
AU5831600A (en) | 2001-01-22 |
US6873325B1 (en) | 2005-03-29 |
EP1206752A1 (en) | 2002-05-22 |
FI991490A0 (en) | 1999-06-30 |
Legal Events
Code | Title | Description |
---|---|---|
AK | Designated states | Kind code of ref document: A1; Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW |
AL | Designated countries for regional patents | Kind code of ref document: A1; Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101) | |
WWE | WIPO information: entry into national phase | Ref document number: 2000944080; Country of ref document: EP |
WWE | WIPO information: entry into national phase | Ref document number: 10019477; Country of ref document: US |
REG | Reference to national code | Ref country code: DE; Ref legal event code: 8642 |
WWP | WIPO information: published in national office | Ref document number: 2000944080; Country of ref document: EP |
NENP | Non-entry into the national phase | Ref country code: JP |
WWR | WIPO information: refused in national office | Ref document number: 2000944080; Country of ref document: EP |
WWW | WIPO information: withdrawn in national office | Ref document number: 2000944080; Country of ref document: EP |