WO2001008036A2

WO2001008036A2 - Method and system for dynamic storage and validation of research data

Info

Publication number: WO2001008036A2
Application number: PCT/US2000/020287
Authority: WO
Inventors: Long Qu; Jian Wang; Christopher C. Harrington; D. Lansing Taylor; Mandy M. Raab Carson
Original assignee: Cellomics, Inc.
Priority date: 1999-07-27
Filing date: 2000-07-26
Publication date: 2001-02-01
Also published as: WO2001008036A3; AU6375800A

Abstract

A method and system for sharing validated experimental data (e.g., cell experimental data). Raw experimental data is accepted and divided into one or more raw data components using a modifiable data object model. The modifiable data object model includes a small hierarchical structure that does not waste storage space and can be expanded by users. The modifiable data object model can be modified by a user to add raw experimental data for virtually any experiment conducted by a user. The raw experimental data is validated and if desired made available to other researchers via a publicly accessible server on a computer network (e.g., the Internet), thereby allowing the validated experimental data to be shared by researchers via the computer network. The publicly accessible server is associated with a knowledge repository for accumulating a body of knowledge for researchers in a particular field (e.g., cell biology). The raw experimental data can also be automatically validated with a pre-determined validation process to create validated experimental data. The validated experimental data is made available immediately after validation on a publicly accessible server on a computer network. The present invention may also be used to further facilitate a user's understanding of biological functions, such as cell functions, to design experiments more intelligently and to analyze experimental results more thoroughly by making raw and validated experimental data immediately available via a computer network after input. Specifically, the present invention may help drug discovery scientists select better targets for pharmaceutical intervention in the hope of curing diseases. The method and system may also help facilitate the abstraction of knowledge from information for biological experimental data and provide new bioinformatic techniques.

Description

METHOD AND SYSTEM FOR DYNAMIC STORAGE AND VALIDATION OF RESEARCH DATA

CROSS REFERENCES TO RELATED APPLICATIONS

This application claims priority from U.S. Provisional Application No. 60/145,770, filed on July 27, 1999.

U.S. GOVERNMENT RIGHTS

This invention was made in part with support from the U.S. Government under Contract No. N00014-98-C-0326, awarded by the U.S. Office of Naval Research, an organization of the U.S. Department of Defense. The U.S. Government may have certain nonexclusive rights in this invention.

FIELD OF THE INVENTION

This invention relates to storing, retrieving, analyzing and distributing experimental information. More specifically, it relates to a method and system for dynamic storing and validating of research data.

BACKGROUND OF THE INVENTION

Traditionally, cell biology research has largely been a manual, labor intensive activity. With the advent of tools that can automate cell biology experimentation, the rate at which complex information is generated about the functioning of cells has increased dramatically. As a result, cell biology is not only an academic discipline, but also the new frontier for large-scale drug discovery. Cells are the basic units of life and integrate information from Deoxyribonucleic Acid ("DNA"), Ribonucleic Acid ("RNA"), proteins, metabolites, ions and other cellular components. New drug compounds that may look promising at a nucleotide level may be toxic at a cellular level. Thus, cell biology is becoming increasingly important to test new drug compounds. Fluorescence-based reagents can be applied to cells to determine ion concentrations, membrane potentials, enzyme activities, gene expression, as well as the presence of metabolites, proteins, lipids, carbohydrates, and other cellular components.

Innovations in automated screening systems for biological and other research are capable of generating enormous amounts of data. The massive volumes of feature-rich data being generated by these systems, and the effective management and use of information from the data, have created a number of very challenging problems. As is known in the art, "feature-rich" data includes data wherein one or more individual features of an object of interest (e.g., a cell) can be collected. For more information on feature-rich cell screening see "High content fluorescence-based screening," by Kenneth A. Giuliano, et al., Journal of Biomolecular Screening, Vol. 2, No. 4, pp. 249-259, Winter 1997, ISSN 1087-0571, "PTH receptor internalization," Bruce R. Conway, et al., Journal of Biomolecular Screening, Vol. 4, No. 2, pp. 75-68, April 1999, ISSN 1087-0571, "Fluorescent-protein biosensors: new tools for drug discovery," Kenneth A. Giuliano and D. Lansing Taylor, Trends in Biotechnology, ("TIBTECH"), Vol. 16, No. 3, pp. 99-146, March 1998, ISSN 0167-7799, all of which are incorporated herein by reference.

To fully exploit the potential of data from high-volume data generating screening instrumentation, there is a need for new informatic and bioinformatic tools. As is known in the art, "bioinformatic" techniques are used to address problems related to the collection, processing, storage, retrieval and analysis of biological information including cellular information. Bioinformatics is defined as the systematic development and application of information technologies and data processing techniques for collecting, analyzing and displaying data obtained by experiments, modeling, database searching, and instrumentation to make observations about biological processes. How to present, organize and analyze the complex information about cell functioning so that new knowledge can be generated is critical for both pharmaceutical research and basic cell biology research.

Experimental data collected by bioinformatic systems are typically published in scientific journals. There are several problems associated with relying on scientific journals to publish experimental data collected from high-volume data generating screening instrumentation. One problem is that traditionally cell biology researchers rely on paper journals as the sole media to publish research data and to learn new ideas obtained from screening instrumentation. The peer review and publishing process for paper journals typically requires an extended time period (e.g., several months to several years) before new research data is made available and mailed to subscribers. Submitting research data for publication in paper journals slows the distribution of research data and may result in duplicated or overlapping experiments, thereby wasting time and resources.

Another problem is that published research data is scattered across a wide spectrum of paper journals. Few researchers, scientists, or even organizations have the resources or the time to obtain and make use of even the most relevant articles that may relate to their current research interests. As a result, significant research efforts are often overlooked. Another problem is that innovative or radical new research data often have less chance of being published under the current peer review system used for paper journals. However, such radical new ideas often lead to significant scientific breakthroughs.

Another problem is that it is often difficult to validate research data published for the first time in paper journals until other scientists and researchers have conducted similar experiments. The long delays in the publishing process for paper journals typically slow innovations that could be learned once research data is validated by others via similar experiments.

There have been attempts to solve some of these problems with on-line publishing of research data on publicly accessible computer networks such as the Internet or intranets. However, many of these on-line publishing sites still suffer from many of the problems described above.

In addition, another problem with on-line publishing is that developing a format for generic electronic storage of research data is difficult. When building a database to store experimental data, a data object model is pre-determined by database developers and presented to users. The pre-determined data object model simplifies design, development, implementation, and maintenance of the data stored in a database. One approach known in the art is to construct a data object model based on a fixed tree structure with a fixed number of nodes. For example, a first level of nodes in a tree structure for biological research data may include nodes for all of the biological species on which researchers commonly conduct experiments.

However, such an approach is typically not feasible for storing research data from multiple labs with multiple researchers conducting experiments on many different species. Considering the large number of known biological species, such a fixed tree structure for a database may potentially include a node for every known biological species. Such a fixed tree structure would waste a tremendous amount of storage space and would be computationally expensive and slow to process. However, if the fixed tree structure does not include a node for all known biological species, then a user who desires a species that is not in the fixed tree structure could not add experimental data for the species to an on-line database.

Another problem with on-line publishing is that although experimental data can be immediately published, there is typically no timely validation, or any validation at all, of the experimental data stored on-line.

Thus, it is desirable to provide a bioinformatic system that allows for dynamic input, storage and validation of research data. The bioinformatic system should allow scientists and researchers to immediately publish, share, organize and validate research data via a computer network.

SUMMARY OF THE INVENTION

In accordance with preferred embodiments of the present invention, some of the problems associated with publishing experimental data are overcome. A method and system for dynamic storage and validation of experimental data is presented.

One aspect of the invention includes a method for sharing experimental data. The method includes accepting raw experimental data on a publicly accessible server on a computer network. The raw experimental data is divided into one or more raw data components using a modifiable data object model. The modifiable data object model includes a small hierarchical structure that can be expanded and used to capture new knowledge. The modifiable data object model can be modified by a user to add raw experimental data for virtually any experiment conducted by a user. In one embodiment of the present invention, the modifiable data object model may include a modifiable cell data object model. However, the present invention is not limited to such an embodiment.

If the raw experimental data can be validated using a pre-determined validation process, one or more knowledge objects are created and stored in a database. The one or more knowledge objects are made available to other researchers as validated experimental data via the publicly accessible server on the computer network, thereby allowing the validated experimental data to be shared by researchers via the computer network.

Another aspect of the invention includes a method for sharing validated experimental data. The method includes accepting raw experimental data on a publicly accessible server on a computer network. The publicly accessible server is associated with a knowledge repository for accumulating a body of knowledge for researchers in a particular field. The raw experimental data is automatically validated with a pre-determined validation process to create validated experimental data. The validated experimental data and associated raw experimental data is made available immediately after validation on the publicly accessible server on the computer network, thereby allowing the validated experimental data to be quickly shared by researchers via the computer network.

The present invention may also be used to further facilitate a user's understanding of biological functions, such as cell functions, to design experiments more intelligently and to analyze experimental results more thoroughly by making raw and validated experimental data immediately available via a computer network (e.g., the Internet) after input. Raw experimental data is continuously accepted to further validate a collected body of knowledge in a knowledge repository. Specifically, the present invention may help drug discovery scientists select better targets for pharmaceutical intervention in the hope of curing diseases.

The foregoing and other features and advantages of preferred embodiments of the present invention will be more readily apparent from the following detailed description. The detailed description proceeds with references to the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

Preferred embodiments of the present invention are described with reference to the following drawings, wherein:

FIG. 1 illustrates an exemplary experimental data storage system for sharing experimental data;

FIGS. 2A and 2B are a flow diagram illustrating a method for sharing experimental data;

FIG. 3 is a block diagram illustrating a user-modifiable data object model;

FIG. 4 is a block diagram visually illustrating the exemplary user-modifiable data object model of FIG. 3;

FIG. 5 is a flow diagram illustrating a method for sharing validated experimental data;

FIG. 6 is a block diagram illustrating an exemplary knowledge repository; and

FIG. 7 is a block diagram visually illustrating the creation of new knowledge with the method of FIG. 5.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

Exemplary data storage system

FIG. 1 illustrates an exemplary experimental data storage system 10 for one embodiment of the present invention. The data storage system 10 includes one or more internal user computers 12, 14 (only two of which are illustrated) for inputting, retrieving, validating and analyzing experimental data on a private local area network ("LAN") 16 (e.g., an intranet). The LAN 16 is connected to one or more internal proprietary databases 18, 20 (only two of which are illustrated) used to store private proprietary experimental information that is not available to the public. The LAN 16 is connected to a publicly accessible database server 22 that is connected to one or more internal experimental information databases 24, 26 (only two of which are illustrated) comprising a public part of a data store for experimental data. The publicly accessible database server 22 is connected to a public network 28 (e.g., the Internet). One or more external user computers 30, 32, 34, 36 (only four of which are illustrated) are connected to the public network 28, to plural public domain databases 38, 40, 42 (only three of which are illustrated) and databases 24, 26 including experimental data and other related experimental information available to the public. However, more, fewer or other equivalent data store components can also be used and the present invention is not limited to the data storage system 10 components illustrated in FIG. 1.

In one specific exemplary embodiment of the present invention, data storage system 10 includes the following specific components. However, the present invention is not limited to these specific components and other similar or equivalent components may also be used. The one or more internal user computers 12, 14, and the one or more external user computers 30, 32, 34, 36, are conventional personal computers that include a display application that provides a Graphical User Interface ("GUI") application. The GUI application is used to lead a scientist or lab technician through input, retrieval, analysis and validation of experimental data and supports custom viewing capabilities. The GUI application also supports data exported into standard desktop tools such as spreadsheets, graphics packages, and word processors. The internal user computers 12, 14 connect to the one or more private proprietary databases 18, 20, the publicly accessible database server 22 and the one or more public databases 24, 26 over the LAN 16. In one embodiment of the present invention, the LAN 16 is a 100 Mega-bit ("Mbit") per second or faster Ethernet LAN. However, other types of LANs could also be used (e.g., optical or coaxial cable networks). In addition, the present invention is not limited to these specific components and other similar components may also be used.

In one specific embodiment of the present invention, one or more protocols from the Internet Suite of protocols are used on the LAN 16 so LAN 16 comprises a private intranet. Such a private intranet can communicate with other public or private networks using protocols from the Internet Suite. As is known in the art, the Internet Suite of protocols includes such protocols as the Internet Protocol ("IP"), Transmission Control Protocol ("TCP"), User Datagram Protocol ("UDP"), Hypertext Transfer Protocol ("HTTP"), Hypertext Markup Language ("HTML"), eXtensible Markup Language ("XML") and others. The one or more private proprietary databases 18, 20, and the one or more publicly available databases 24, 26 are multi-user, multi-view databases that store experimental data. The databases 18, 20, 24, 26 use relational database tools and structures. The data stored within the one or more internal proprietary databases 18, 20 is not available to the public. Databases 24, 26 are made available to the public through publicly accessible database server 22 using selected security features (e.g., login, password, firewall, etc.).

The one or more external user computers 30, 32, 34, 36 are connected to the public network 28 and to plural public domain databases 38, 40, 42. The plural public domain databases 38, 40, 42 include experimental data and other information in the public domain and are also multi-user, multi-view databases. The plural public domain databases 38, 40, 42 include well-known databases such as those provided by Medline, GenBank, SwissProt, PDB, etc.

An operating environment for components of the data storage system 10 for preferred embodiments of the present invention includes a processing system with one or more high speed Central Processing Unit(s) ("CPU") or other processor(s) and a memory system. In accordance with the practices of persons skilled in the art of computer programming, the present invention is described below with reference to acts and symbolic representations of operations or instructions that are performed by the processing system, unless indicated otherwise. Such acts and operations or instructions are referred to as being "computer-executed," "CPU executed," or "processor executed."

It will be appreciated that acts and symbolically represented operations or instructions include the manipulation of electrical signals by the CPU. An electrical system represents data bits which cause a resulting transformation or reduction of the electrical signals, and the maintenance of data bits at memory locations in a memory system to thereby reconfigure or otherwise alter the CPU's operation, as well as other processing of signals. The memory locations where data bits are maintained are physical locations that have particular electrical, magnetic, optical, or organic properties corresponding to the data bits. The data bits may also be maintained on a computer readable medium including magnetic disks, optical disks, organic memory, and any other volatile (e.g., Random Access Memory ("RAM")) or non-volatile (e.g., Read-Only Memory ("ROM")) mass storage system readable by the CPU. The computer readable medium includes cooperating or interconnected computer readable media, which exist exclusively on the processing system or may be distributed among multiple interconnected cooperating processing systems that may be local or remote to the processing system.

Sharing experimental data

FIGS. 2A and 2B are a flow diagram illustrating a Method 46 for sharing experimental data. In FIG. 2A at Step 48, raw experimental data is accepted on a publicly accessible server on a computer network. At Step 50, the raw experimental data is divided into one or more raw data components using a modifiable data object model. The one or more raw data components include one or more raw data attributes. The modifiable data object model includes a small hierarchical structure that can be expanded by users. The modifiable data object model functions as a filter to capture potential new knowledge as raw data components. At Step 52, the one or more raw data components are stored in a database. The database is accessible via the publicly accessible server. At Step 54, the raw data components are made available as raw experimental data in the database. At Step 56, a test is conducted to determine whether any of the one or more raw data components can be validated using a pre-determined validation process.

If any of the one or more raw data components can be validated at Step 56 in FIG. 2A, in FIG. 2B at Step 58 one or more validated data components are saved as one or more knowledge objects. The one or more knowledge objects include one or more attributes. At Step 60, the one or more knowledge objects are stored in the database. At Step 62, the one or more knowledge objects are made available as validated experimental data in the database via the publicly accessible server on the computer network, thereby allowing the validated experimental data to be shared by researchers via the computer network.
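The flow of Method 46 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the `Database` class, the `share_experimental_data` and `divide_by_attribute` functions, and the toy validation rule are hypothetical names introduced here, and the in-memory lists stand in for the database behind the publicly accessible server.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Database:
    """Hypothetical in-memory stand-in for the database behind the public server."""
    raw_components: list = field(default_factory=list)
    knowledge_objects: list = field(default_factory=list)


def share_experimental_data(raw_data: dict, divide: Callable,
                            can_validate: Callable, db: Database) -> list:
    # Step 48: raw experimental data has been accepted (passed in as raw_data).
    # Step 50: divide it into raw data components using the data object model.
    components = divide(raw_data)
    # Steps 52-54: store the components so they are available as raw data.
    db.raw_components.extend(components)
    # Step 56: test which components can be validated.
    validated = [c for c in components if can_validate(c)]
    # Steps 58-62: save validated components as knowledge objects and store them.
    knowledge = [{"kind": "knowledge_object", **c} for c in validated]
    db.knowledge_objects.extend(knowledge)
    return knowledge


def divide_by_attribute(raw_data: dict) -> list:
    """Hypothetical divider: one raw data component per top-level attribute."""
    return [{"attribute": k, "value": v} for k, v in raw_data.items()]


db = Database()
result = share_experimental_data(
    {"localization": "nucleus", "function": None},
    divide_by_attribute,
    lambda c: c["value"] is not None,  # toy validation rule (an assumption)
    db,
)
```

In this sketch every component is stored as raw data, while only the components that pass the validation test are promoted to knowledge objects.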

Method 46 is illustrated with one specific exemplary embodiment of the present invention. However, the present invention is not limited to this embodiment. In such an exemplary embodiment at FIG. 2A at Step 48, raw experimental data is accepted on a publicly accessible server 22 on a computer network 28. For example, raw experimental data from feature-rich cell screening systems described above is accepted. However, the present invention is not limited to feature-rich cell data, and virtually any type of raw experimental data can be accepted.

In such an embodiment at Step 48, an electronic input form created in the Hyper Text Mark-up Language ("HTML"), or the extensible Mark-up Language ("XML") or other hardware independent mark-up languages known in the art is displayed for a user to enter raw experimental data. The input form allows the user to input raw experimental data. However, virtually any programming language can be used to create and display the electronic input form (e.g., C, C++, Visual Basic, Visual C++, Java, etc.) and the present invention is not limited to hardware independent mark-up languages.

At Step 50, the raw experimental data is divided into one or more raw data components using a modifiable data object model. The one or more raw data components include one or more raw data attributes. The raw data attributes include localization, origin, function, effect, component, etc. In one preferred embodiment of the present invention, the modifiable data object model is a user-modifiable data object model. The user-modifiable data object model includes a small hierarchical structure that can be expanded by users but does not waste storage space by including entries for all possible types of experimental data that could be input by a user.

FIG. 3 is a block diagram illustrating a user-modifiable data object model 64. In one preferred embodiment of the present invention, the user-modifiable data object model 64 is used for biological experimental data. In such an embodiment, the user-modifiable data object model 64 includes a user-modifiable data object model template including a small hierarchical structure with N-levels.

In one embodiment of the present invention the user-modifiable data object model includes multiple biological functional units with attributes. The attributes include, for example, a localization of a biological functional unit, an origin of a biological functional unit, a function of a biological functional unit, a pharmacological effect of a biological functional unit, etc. A localization attribute, for example, defines a physical location of a biological functional unit in a cell, system or species, etc. For example, an origin of a biological functional unit can be determined from the small hierarchical structure. An exemplary user-modifiable data object model for biology includes, but is not limited to an origin defined by a species 66, localization in a hierarchical structure including a system 68, an organ 70, a tissue 72, a cell 74 and a compartment 76.

However, the user-modifiable data object model 64 is not limited to a hierarchical structure and the user-modifiable data object model can be used for virtually any type of experimental data. The present invention is not limited to a user-modifiable data object model for biological experimental data. In such an embodiment, the raw experimental data is divided into raw data components using the user-modifiable data object model 64. The raw data components include one or more raw data attributes. The one or more attributes include, but are not limited to, for example, species 66, system 68, organ 70, tissue 72, cell 74 or compartment 76 based attributes. For example, the cell 74 data component may include a cell level process attribute, a cell level pathway attribute, a cell level space attribute, etc.

FIG. 4 is a block diagram visually illustrating an exemplary user-modifiable data object model 78 that is used to define two biological functional units. The user-modifiable data object model 78 allows users to dynamically modify the small hierarchy and insert new nodes at any level in the hierarchy to add their own raw experimental data. The user-modifiable data object model 78 illustrated in FIG. 4 currently includes only two biological functional units. The first biological functional unit 80 is experimental data that comes from a human cardiac cell (82, 84, 86, 88). The second biological functional unit is experimental data that comes from yeast 90. Not all species will include components for all levels in the hierarchy. For example, experimental data for yeast 90 may include only cell experimental data 92 because yeast is a single-cell organism without defined systems, organs or tissues. In addition, in this example no data has been included for cell compartments 76 for the human cardiac cells 80.
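A sparse, user-expandable hierarchy of this kind can be sketched as a tree whose nodes are created only when a user inserts data for them. The sketch below is illustrative only: the `Node` class, its `insert` and `count` methods, and the intermediate level names for the human cardiac cell (e.g., "cardiovascular", "heart") are assumptions made for this example and are not taken from FIG. 4.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    level: str                       # e.g. "species", "organ", "cell"
    name: str
    children: dict = field(default_factory=dict)
    data: list = field(default_factory=list)

    def insert(self, path, record=None):
        """Create nodes along `path` only as needed, then attach a record."""
        node = self
        for level, name in path:
            key = (level, name)
            if key not in node.children:
                node.children[key] = Node(level, name)
            node = node.children[key]
        if record is not None:
            node.data.append(record)
        return node

    def count(self):
        """Number of nodes in the subtree rooted at this node."""
        return 1 + sum(child.count() for child in self.children.values())


root = Node("root", "model")

# A human cardiac cell uses every level down to the cell (names are illustrative).
root.insert(
    [("species", "human"), ("system", "cardiovascular"), ("organ", "heart"),
     ("tissue", "cardiac muscle"), ("cell", "cardiac cell")],
    {"experiment": "exp-1"},
)

# Yeast skips the system, organ and tissue levels entirely.
yeast_cell = root.insert([("species", "yeast"), ("cell", "yeast cell")],
                         {"experiment": "exp-2"})
```

Only the nodes actually used exist in memory (eight here, including the root), in contrast to a fixed tree that reserves a node for every known species.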

In one embodiment of the present invention, the user-modifiable data object model is a "tree" with multiple nodes. As is known in the art, a tree is a data structure comprising one or more nodes that are linked together in a hierarchical fashion. One node is the root node; nodes except the root node are children of another node; and each node has zero or more nodes as children. In another embodiment of the present invention, the hierarchy can be represented with a graph. As is known in the art, a "graph" is a data structure comprising one or more nodes and one or more edges, which connect pairs of nodes. If any two nodes in a graph can be connected by a path along edges, the graph is said to be "connected."
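The definition of a connected graph given above can be checked with a breadth-first traversal. The following is a minimal sketch; the function name `is_connected` and the node labels are hypothetical and chosen only for illustration.

```python
from collections import deque


def is_connected(nodes, edges):
    """Return True if every node can be reached from any other along edges."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        # Edges are undirected, so record the link in both directions.
        adjacency[a].add(b)
        adjacency[b].add(a)
    if not adjacency:
        return True
    start = next(iter(adjacency))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return len(seen) == len(adjacency)


levels = ["species", "system", "organ", "tissue"]
chain = [("species", "system"), ("system", "organ"), ("organ", "tissue")]
```

With these inputs, `is_connected(levels, chain)` holds, while removing the middle edge splits the graph into two disconnected parts.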

In another embodiment of the present invention, the hierarchy can be represented with a directed graph. As is known in the art, a "directed graph" is a graph whose edges have a direction. An edge in a directed graph not only relates two nodes in a graph, but it also specifies a predecessor-successor relationship. A "directed path" through a directed graph is a sequence of nodes $(n_1, n_2, \ldots, n_k)$ such that there is a directed edge from $n_i$ to $n_{i+1}$ for all appropriate $i$.
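The directed path definition translates directly into a check over consecutive pairs of nodes. This is a minimal sketch; the function name `is_directed_path` and the example edges are assumptions introduced for illustration.

```python
def is_directed_path(edges, sequence):
    """Return True if each consecutive pair in `sequence` is a directed edge."""
    edge_set = set(edges)
    return all((a, b) in edge_set for a, b in zip(sequence, sequence[1:]))


# Directed edges from predecessor to successor (illustrative hierarchy levels).
hierarchy_edges = [("species", "system"), ("system", "organ"), ("organ", "tissue")]
```

Because the edges have a direction, the sequence species, system, organ is a directed path under these edges, but the reversed sequence is not.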

The user-modifiable data object model can be represented exclusively by a tree, exclusively by a graph, a directed graph, or any combination thereof. In addition, the present invention is not limited to trees or graphs and other hierarchical structures can also be used.

Returning to FIG. 2A at Step 52, the one or more raw data components are stored in a database 24, 26. The database 24, 26 is publicly available via publicly accessible database server 22. At Step 54, the raw data components are made available as raw experimental data in the database 24, 26 via the public network 28. At Step 56, a test is conducted to determine whether any of the one or more raw data components can be validated using a pre-determined validation process.

In one embodiment of the present invention, a validation index is assigned in one of two ways: (1) a manual assignment method by an editorial board; or (2) preferably, an automated method. If the manual assignment method is used, an editorial board made up of distinguished scientists will confer periodically to manually assess the credibility of the information associated with an entity or transformation. A validation index (e.g., from zero to one hundred) is assigned. A validation index of zero indicates a lowest level of validity for the information (e.g., results from a single experiment, an unknown researcher, etc.). A validation index of one hundred indicates a very high level of validity for the information (e.g., similar results obtained from many different experiments, a well-known researcher, etc.).

If automatic assignment is used, an automated (i.e., without further input) method is used to take into account multiple pre-determined factors that contribute to the validity of a piece of biological information. The pre-determined factors are evaluated to calculate a validation index. The pre-determined factors may include for example, but are not limited to, such factors as a number of experiments or references used to create the raw experimental data, a quality of a source of an experiment or reference, what type of experiment was used to acquire the raw experimental data, what was the quality of an experiment used to acquire the raw experimental data, and a reputation, if any, of the researcher that supplied the raw experimental data. More, fewer or equivalent factors can also be used.
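One way such an automated calculation could work is a weighted average of normalized factor scores, scaled to the zero to one hundred range. The text does not specify a formula, so the sketch below is purely an assumption: the factor names, the weights in `FACTOR_WEIGHTS`, and the `validation_index` function are all hypothetical.

```python
# Hypothetical weights for the pre-determined factors (not specified in the text).
FACTOR_WEIGHTS = {
    "experiment_count": 30,       # number of experiments or references
    "source_quality": 20,         # quality of the source
    "experiment_type": 15,        # type of experiment used
    "experiment_quality": 20,     # quality of the experiment
    "researcher_reputation": 15,  # reputation of the supplying researcher
}


def validation_index(scores, weights=FACTOR_WEIGHTS):
    """Combine factor scores in [0, 1] into an index from 0 to 100.

    A factor missing from `scores` is treated as 0; out-of-range scores
    are clamped to [0, 1] before weighting.
    """
    total_weight = sum(weights.values())
    weighted = sum(
        weight * min(max(scores.get(name, 0.0), 0.0), 1.0)
        for name, weight in weights.items()
    )
    return round(100 * weighted / total_weight)


all_high = {name: 1.0 for name in FACTOR_WEIGHTS}
all_mid = {name: 0.5 for name in FACTOR_WEIGHTS}
```

Under these assumed weights, data with no supporting factors scores zero and data with the maximum score on every factor scores one hundred, matching the endpoints described for the validation index.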

If any of the one or more raw data components can be validated at Step 56 in FIG. 2A, in FIG. 2B at Step 58 one or more knowledge objects are created. Raw experimental data is accepted and validated continuously to further validate the one or more knowledge objects created. At Step 60, the one or more knowledge objects are stored in the database 24, 26. At Step 62, the one or more knowledge objects are made available as validated experimental data in the database 24, 26 via the publicly accessible server 22 on the public computer network 28. This allows the validated experimental data to be shared by researchers via the public computer network 28. Method 46 may allow scientists and researchers to publish, share, organize, present, validate and distribute experimental data in an integrated on-line environment. This method may significantly speed up biomedical research and drug discovery by providing quick access to raw and validated experimental data using a modifiable data object model.

Modifiable cell data object model

In one embodiment of the present invention, the modifiable data object model 64 is designed specifically for cellular information at the cell 74 and compartment levels 76. In such an embodiment, a biological functional unit for a cell 74 includes a cell hierarchy and hierarchy sub-levels for attributes including type of cells, health of cells (e.g., cancerous), stage in a cell cycle (e.g., mitosis), line of cells (primary or modified), etc. A biological functional unit for a compartment 76 includes a compartment hierarchy and hierarchy sub-levels for attributes including a localization, an origin, function, pharmacological effect, etc. Not all attributes will be used in all instances for cells or compartments.

The modifiable cell object data model is illustrated with respect to the compartment level 76. The compartment localization attribute, for example, defines a physical location of a compartment in a cell. The origin attribute, for example, describes a specific type of a compartment. The function attribute, for example, describes a physiological function of a component. The pharmacological effect attribute, for example, describes potential effects a new drug candidate compound may activate or affect. In one embodiment of the present invention, the compartment hierarchy includes, but is not limited to, hierarchical sub-levels for cellular sub-components including: actin filaments, intermediate filaments, microtubules, golgi network, cis-cisterna, medial-cisterna, trans-cisterna, secretory vesicle, golgi vesicle, inner membrane, outer membrane, matrix, inter-membrane space, nuclear pores, nuclear envelope, nucleolus, chromatin, lipid bi-layers membrane, ion channels or receptors.

In one embodiment of the present invention, the compartment localization attribute includes, but is not limited to, hierarchical sub-levels for organelles including: chloroplasts, cytoplasm, golgi apparatus, lysosomes, mitochondria, nucleus, peroxisomes, plasma membranes, rough endoplasmic reticulums, smooth endoplasmic reticulums, or vacuoles.

However, the present invention is not limited to such embodiments and other embodiments with more, fewer or equivalent hierarchical levels or sub-hierarchical levels can also be used. In addition, more, fewer or equivalent cellular or compartment attributes can also be used.
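The hierarchy described above can be pictured as a small tree whose levels a user may extend. The following is a minimal sketch, assuming a simple in-memory tree; the class name `HierarchyNode`, its methods, and the placement of the user-added sub-level are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, Optional, Tuple


@dataclass
class HierarchyNode:
    """One level in a user-expandable hierarchy (hypothetical structure)."""
    name: str
    children: Dict[str, "HierarchyNode"] = field(default_factory=dict)

    def add(self, *path: str) -> "HierarchyNode":
        """Create any missing levels along path and return the last one."""
        node = self
        for part in path:
            node = node.children.setdefault(part, HierarchyNode(part))
        return node

    def find(self, *path: str) -> Optional["HierarchyNode"]:
        """Return the node at path, or None if a level is missing."""
        node: Optional[HierarchyNode] = self
        for part in path:
            node = node.children.get(part)
            if node is None:
                return None
        return node

    def paths(self, prefix: Tuple[str, ...] = ()) -> Iterator[Tuple[str, ...]]:
        """Yield the full path of this node and of every descendant."""
        here = prefix + (self.name,)
        yield here
        for child in self.children.values():
            yield from child.paths(here)


# Start small: the compartment unit with the four attributes named in the text.
compartment = HierarchyNode("compartment")
for attribute in ("localization", "origin", "function", "pharmacological effect"):
    compartment.add(attribute)
for organelle in ("cytoplasm", "golgi apparatus", "mitochondria", "nucleus"):
    compartment.add("localization", organelle)

# A user expands the model; placing this sub-level here is an assumption.
compartment.add("localization", "nucleus", "nucleolus")

for path in compartment.paths():
    print(" / ".join(path))
```

Because only the levels actually used are created, the structure stays small until a user adds to it, which is the property the description emphasizes.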

The user-modifiable data cell object model includes a small, cell-related hierarchical structure with small sub-hierarchical levels that can be expanded by users. This allows users to add virtually any type of experimental data related to cells 74 or compartments 76.

Sharing validated experimental data

FIG. 5 is a flow diagram illustrating a Method 94 for sharing validated experimental data. At Step 96, raw experimental data is accepted on a publicly accessible server on a computer network using a modifiable data object model. The publicly accessible server is associated with a knowledge repository for accumulating a body of knowledge for researchers in a particular field. As is known in the art, a "body of knowledge" includes reasoning and problem-solving approaches (i.e., experiments) that a researcher uses to obtain raw experimental data.

At Step 98, the raw experimental data is validated automatically (i.e., without additional input) with a pre-determined validation process to create validated experimental data. At Step 100, the validated experimental data and associated raw experimental data is made available immediately after validation on the publicly accessible server on the computer network. Validated and raw experimental data can be immediately shared by researchers via the computer network. New knowledge is added to the body of knowledge.

Method 94 is illustrated with one specific exemplary embodiment of the present invention. However, the present invention is not limited to this embodiment. In such an embodiment, at Step 96, raw experimental data is accepted on a publicly accessible server 22 on a computer network 28 using the modifiable data object model described above. The publicly accessible server 22 is associated with a knowledge repository (e.g., database 24, 26) for an accumulated body of knowledge for researchers in a particular field (e.g., cellular biology).

FIG. 6 is a block diagram illustrating an exemplary knowledge repository 102. The knowledge repository 102 includes a centralized or distributed database 24, 26 with a publicly accessible server 22 that accepts raw experimental data (e.g., raw cellular experimental data) from research labs (or researchers) 1 through N, 104, 106, 108, 110, via the public computer network 28.

At Step 98, the raw experimental data is validated automatically with a predetermined validation process to create validated experimental data. In one embodiment of the present invention, the pre-determined automatic validation process is the automated validation process described above for Method 46. However, other automatic validation processes can also be used and the present invention is not limited to the automatic validation process described.

At Step 100, the validated experimental data and associated raw experimental data is made available immediately after validation on the publicly accessible server 22 on the computer network 28. This allows the validated and raw experimental data to be immediately shared by researchers via the computer network 28 and adds new knowledge to the knowledge base in the knowledge repository 102.
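Steps 96, 98 and 100 can be read as an accept, validate, publish sequence. The sketch below is a simplified in-memory stand-in for the server and database, written under stated assumptions: `KnowledgeRepository`, `Record`, `toy_validator`, the required field names and the sample values are all hypothetical, and the validator is a placeholder for whatever pre-determined validation process is actually used.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Record:
    lab: str
    raw: Dict[str, object]    # raw experimental data as submitted
    validation_index: float   # result of the automatic validation


@dataclass
class KnowledgeRepository:
    """In-memory stand-in for the server and database (hypothetical)."""
    validate: Callable[[Dict[str, object]], float]
    published: List[Record] = field(default_factory=list)

    def submit(self, lab: str, raw: Dict[str, object]) -> Record:
        # Step 96: accept raw experimental data from a research lab.
        accepted = dict(raw)
        # Step 98: validate automatically, with no additional input.
        index = self.validate(accepted)
        # Step 100: make raw and validated data available immediately.
        record = Record(lab, accepted, index)
        self.published.append(record)
        return record


def toy_validator(raw: Dict[str, object]) -> float:
    """Placeholder rule: fraction of assumed required fields present."""
    required = ("cell_type", "compartment", "measurement")
    return sum(key in raw for key in required) / len(required)


repo = KnowledgeRepository(validate=toy_validator)
sample = {"cell_type": "example", "compartment": "nucleus", "measurement": 0.42}
record = repo.submit("lab 1", sample)
print(record.validation_index, len(repo.published))
```

Passing the validator in as a function reflects the statement that other automatic validation processes can also be used; swapping it does not change the accept and publish steps.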

FIG. 7 is a block diagram visually illustrating the creation of new knowledge 112 with Method 94. At Step 96, raw experimental data 114 is accepted using the modifiable data object model. At Step 98, the raw experimental data 114 is validated automatically with a pre-determined data validation process 116 to create validated experimental data 118. The validated experimental data 118 comprises new knowledge 120. At Step 100, the validated experimental data 118 and associated raw experimental data 114 is made available immediately after validation on the publicly accessible server 22 on the computer network 28.

The methods and system described herein can also be used in conjunction with the methods and system of co-pending U.S. Application No. 09/507,577, entitled "Method and System for Dynamic Storage, Retrieval and Analysis of Experimental Data with Determined Relationships," assigned to the same Assignee as the present application. For example, the methods and system described herein may be included in an implementation of a cell pathway editor such as the exemplary cell pathway editor described in this co-pending application, to help increase cell pathway knowledge.

The methods and system described herein allow scientists and researchers to electronically publish, share, organize, validate and retrieve research data in an integrated on-line environment. The method and system may also be used to further facilitate a user's understanding of biological functions, such as cell functions, to design experiments more intelligently and to analyze experimental results more thoroughly. Specifically, the present invention may help drug discovery scientists select better targets for pharmaceutical intervention in the hope of curing diseases.

In view of the wide variety of embodiments to which the principles of the present invention can be applied, it should be understood that the illustrated embodiments are exemplary only. The illustrated embodiments should not be taken as limiting the scope of the present invention. For example, the steps of the flow diagrams may be taken in sequences other than those described, and more or fewer elements may be used in the block diagrams. While various elements of the preferred embodiments have been described as being implemented in software, in other embodiments hardware or firmware implementations may alternatively be used, and vice-versa. The claims should not be read as limited to the described order or elements unless stated to that effect. Therefore, all embodiments that come within the scope and spirit of the following claims and equivalents thereto are claimed as the invention.

Claims

WE CLAIM:

1. A method for sharing experimental data, comprising: accepting raw experimental data on a publicly accessible server on a computer network; dividing the raw experimental data into a plurality of raw data components using a modifiable data object model, wherein the plurality of raw data components include a plurality of raw data attributes; storing the plurality of raw data components in a database, wherein the database is accessible via the publicly accessible server; making the plurality of raw data components available as raw experimental data in the database; and determining whether any of the plurality of raw data components can be validated using a pre-determined validation process, and if so, creating a plurality of knowledge objects using the modifiable data object model, wherein the plurality of knowledge objects include a plurality of validated data attributes; storing the plurality of knowledge objects in the database; and making the plurality of knowledge objects available as validated experimental data in the database via the publicly accessible server on the computer network, thereby allowing the validated experimental data to be shared by researchers via the computer network.

2. A computer readable medium having stored therein instructions for causing a central processing unit to execute the method of Claim 1.

3. The method of Claim 1 wherein the computer network is the Internet.

4. The method of Claim 1 wherein the step of determining whether any of the plurality of raw data components can be validated using a pre-determined validation process includes assigning a numerical validation index to the plurality of knowledge objects.

5. The method of Claim 4 wherein the numerical validation index is stored in the database.

6. The method of Claim 1 wherein the pre-determined validation process includes considering validation factors including a number of experiments or references used to create raw experimental data, a quality of a source of an experiment or reference, what type and quality of experiment was used to acquire the raw experimental data, or a reputation, if any, of the researcher that supplied the raw experimental data, that are used to create a numerical validation index.

7. The method of Claim 1 wherein the step of making the plurality of knowledge objects available as validated experimental data in the database includes making the validated experimental data components available with a numerical validation index, wherein the numerical validation index indicates a validity confidence level for the validated experimental data components.

8. The method of Claim 1 wherein the modifiable data object model includes a modifiable cell data object model.

9. The method of Claim 7 wherein the modifiable cell object model includes one or more biological functional unit hierarchies including a compartment hierarchy with a localization, origin, function or effect attribute sub-hierarchy.

10. The method of Claim 9 wherein the compartment hierarchy includes hierarchical levels for: actin filaments, intermediate filaments, microtubules, golgi network, cis-cisterna, medial-cisterna, trans-cisterna, secretory vesicles, golgi vesicles, inner membrane, outer membrane, matrix, inter-membrane space, nuclear pores, nuclear envelope, nucleolus, chromatin, lipid bi-layers membrane, ion channels or receptors.

11. The method of Claim 9 wherein the localization sub-hierarchy includes hierarchical sub-levels for: chloroplasts, cytoplasm, golgi apparatus, lysosomes, mitochondria, nucleus, peroxisomes, plasma membranes, rough endoplasmic reticulums, smooth endoplasmic reticulums, or vacuoles.

12. The method of Claim 1 wherein the raw data components are stored in a tree, a graph or a directed graph created with a modifiable data object model.

13. The method of Claim 1 wherein the knowledge objects are stored in a tree, a graph or a directed graph created with a modifiable cell data object model.

14. The method of Claim 1 wherein the raw experimental data includes raw cellular biology experimental data.

15. The method of Claim 1 wherein the knowledge data includes validated cellular biology experimental data.

16. A method for sharing validated experimental data, comprising: accepting raw experimental data on a publicly accessible server on a computer network using a modifiable data object model, wherein the publicly accessible server is associated with a knowledge repository for accumulating a body of knowledge for researchers in a particular field; validating automatically the raw experimental data with a pre-determined validation process to create validated experimental data; and making the validated experimental data and associated raw experimental data available immediately after validation on the publicly accessible server on the computer network, thereby allowing the validated experimental data to be shared by researchers via the computer network, and thereby adding new knowledge to the body of knowledge.

17. A computer readable medium having stored therein instructions for causing a central processing unit to execute the method of Claim 16.

18. The method of Claim 16 wherein the step of validating automatically the raw experimental data with a pre-determined validation process includes considering a number of experiments or references used to create raw experimental data, a quality of a source of an experiment or reference, what type and quality of experiment was used to acquire the raw experimental data, or a reputation, if any, of the researcher that supplied the raw experimental data to create a validation index for the validated experimental data.

19. The method of Claim 18 wherein the validation index is associated with the validated experimental data.

20. The method of Claim 16 wherein the computer network is the Internet.

21. The method of Claim 16 wherein the body of knowledge includes cellular biology knowledge.

22. An experimental data validation system, comprising in combination: a modifiable data object model for dividing raw experimental data into a plurality of raw data components and for creating a plurality of knowledge objects from the plurality of raw data components, wherein the modifiable data object model is modifiable by a user as raw experimental data is accepted; an experimental data validator for validating raw experimental data with a predetermined validation process to create validated experimental data; a database for storing raw experimental data accepted from a user and validated experimental data created by the experimental data validator; and a publicly accessible server on a computer network associated with the database for allowing raw experimental data to be shared by researchers via the computer network immediately after validation by the experimental data validator.