(12) United States Patent (10) Patent No.: US 7,542,947 B2
Guyon et al. (45) Date of Patent: Jun. 2, 2009
(54) DATA MINING PLATFORM FOR BIOINFORMATICS AND OTHER KNOWLEDGE DISCOVERY
(75) Inventors: Isabelle Guyon, Berkeley, CA (US);
Edward P. Reiss, San Francisco, CA
(US); Rene Doursat, Sun Valley, NV
(US); Jason Aaron Edward Weston,
New York, NY (US); David D. Lewis,
Chicago, IL (US)
(73) Assignee: Health Discovery Corporation,
Savannah, GA (US)
( * ) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.
(21) Appl. No.: 11/928,641
(22) Filed: Oct. 30, 2007
(65) Prior Publication Data
US 2008/0097939 A1 Apr. 24, 2008
Related U.S. Application Data
(63) Continuation of application No. 10/481,068, filed as application No. PCT/US02/19202 on Jun. 17, 2002, now Pat. No. 7,444,308, said application No. 11/928,641 is a continuation-in-part of application No. 10/478,192, filed as application No. PCT/US02/16012 on May 20, 2002, now Pat. No. 7,318,051, which is a continuation-in-part of application No. 10/057,849, filed on Jan. 24, 2002, now Pat. No. 7,117,188, which is a continuation-in-part of application No. 09/633,410, filed on Aug. 7, 2000, now Pat. No. 6,882,990.
(60) Provisional application No. 60/298,842, filed on Jun. 15, 2001, provisional application No. 60/298,757, filed on Jun. 15, 2001, provisional application No. 60/298,867, filed on Jun. 15, 2001, provisional application No. 60/191,219, filed on Mar. 22, 2000, provisional application No. 60/184,596, filed on Feb. 24, 2000, provisional application No. 60/168,703, filed on Dec. 2, 1999, provisional application No. 60/161,806, filed on Oct. 27, 1999.
(51) Int. Cl.
G06F 7/00 (2006.01)
The data mining platform comprises a plurality of system modules, each formed from a plurality of components. Each module has an input data component, a data analysis engine for processing the input data, an output data component for outputting the results of the data analysis, and a web server to access and monitor the other modules within the unit and to provide communication to other units. Each module processes a different type of data, for example, a first module processes microarray (gene expression) data while a second module processes biomedical literature on the Internet for information supporting relationships between genes and diseases and gene functionality. In the preferred embodiment, the data analysis engine is a kernel-based learning machine, and in particular, one or more support vector machines (SVMs). The data analysis engine includes a pre-processing function for feature selection, for reducing the amount of data to be processed by selecting the optimum number of attributes, or "features", relevant to the information to be discovered.
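The feature-selection pre-processing step can be illustrated with a minimal sketch. The abstract does not say which selection criterion the data analysis engine uses, so the score below (absolute difference of class means divided by the sum of class standard deviations) is an assumption chosen for simplicity, and the names `rank_features` and `select_features` are hypothetical. Training the SVM on the reduced data is not shown.

```python
from statistics import mean, pstdev


def rank_features(samples, labels):
    """Return feature indices ordered from most to least discriminative.

    Assumed criterion (not taken from the patent): for each feature,
    |mean(positive class) - mean(negative class)| / (std_pos + std_neg).
    Labels are expected to be +1 or -1.
    """
    pos = [s for s, y in zip(samples, labels) if y == 1]
    neg = [s for s, y in zip(samples, labels) if y == -1]
    scores = []
    for j in range(len(samples[0])):
        p = [s[j] for s in pos]
        n = [s[j] for s in neg]
        spread = pstdev(p) + pstdev(n)
        # A small constant avoids division by zero for constant features.
        scores.append(abs(mean(p) - mean(n)) / (spread + 1e-12))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def select_features(samples, keep):
    """Keep only the columns listed in `keep`, in that order."""
    return [[s[j] for j in keep] for s in samples]


# Toy data: 4 samples x 3 features; only feature 1 separates the classes well.
samples = [
    [0.10, 5.0, 1.0],
    [0.20, 5.1, 0.9],
    [0.15, 1.0, 1.1],
    [0.12, 1.2, 1.0],
]
labels = [1, 1, -1, -1]

ranking = rank_features(samples, labels)
reduced = select_features(samples, ranking[:1])
```

In this sketch the reduced data would then be passed to the learning machine. How many features to keep is a separate choice that the abstract calls selecting the "optimum number" of features, and it is not addressed here.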
22 Claims, 12 Drawing Sheets
Ben-Dor et al., "Clustering Gene Expression Patterns", Journal of Computational Biology, vol. 6, Nos. 3/4, 1999, pp. 281-297.
Kim et al., "Retrieval of the Top N Matches with Support Vector Machines", 2000 IEEE.
Kelly, "An Algorithm for Merging Hyperellipsoidal Clusters", 1994.
Pavlidis et al., "Gene Functional Classification From Heterogeneous Data", Proceedings of the 5th International Conference on Computational Biology, Apr. 2001, pp. 249-255.
Syed et al., "A Study of Support Vectors on Model Independent Example Selection", Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Jul. 1999, pp. 272-276.
Sevon et al., "TreeDt: Gene Mapping by Tree Disequilibrium Test", 2000 ACM 1-58113-000-0/00/0000.
Pfahringer et al., "Preprocessing Tasks and Methods", Mar. 1999, Austrian Research Institute for AI.
Yang et al., "Data-Driven Theory Refinement Algorithms for Bioinformatics", International Joint Conference on Neural Networks, Jul. 1999, pp. 4064-4068.
Walker, R.L., "Parallel Clustering System Using the Methodologies of Evolutionary Computations", Proceedings of the 2001 Congress on Evolutionary Computation, pp. 831-838.
Moore, S.K., "Harmonizing Data, Setting Standards [Genomics, Information Sets]", IEEE Spectrum, Jan. 2001, vol. 38, Iss 1, pp.
PCT/US02/19202 International Search Report issued Jan. 2, 2003.
* cited by examiner