US 20050260663 A1
The invention develops models of functional proteomics. Simulation scenarios of protein pathway vectors and protein-protein interactions are modeled from limited information in protein databases. The system focuses on three integrated subsystems, including (1) a system to model protein-protein interactions using an evolvable Global Proteomic Model (GPM) of functional proteomics to ascertain healthy pathway operations, (2) a system to identify haplotypes customized for specific pathology using dysfunctional protein pathway simulations of the function of combinations of single nucleotide polymorphisms (SNPs) so as to ascertain pathology mutation sources and (3) a pharmacoproteomic modeling system to develop, test and refine proposed drug solutions based on the molecular structure and topology of mutant protein(s) in order to manage individual pathologies. The system focuses on simulating the degenerative genetic disease categories of cancer, neurodegenerative diseases, immunodegenerative diseases and aging. The system reveals approaches to reverse engineer and test personalized medicines based upon dysfunctional proteomic pathology simulations.
1. A bioinformatics system for functional proteomics modelling, the system comprising one, two or all three of the following:
a first subsystem which involves development of an evolvable Global Proteomics Model, which uses data from the Human Genome Project and from protein and genetic databases on structural proteomics and which supplies a foundation for simulations of healthy protein-protein interactions;
a second subsystem which involves development of simulations to identify the operation and source of individual diseases in dysfunctional protein-protein interactions; and
a third subsystem which involves development of simulations for pharmacoproteomics in which prospective drug targets are modelled, tested and refined for optimum effectiveness for individualized therapy.
2. A bioinformatics system as claimed in
3. A bioinformatics system as claimed in
4. A bioinformatics system as claimed in
5. A bioinformatics system as claimed in
6. A bioinformatics system as claimed in
7. A bioinformatics system as claimed in
8. A bioinformatics system as claimed in
9. A bioinformatics system as claimed in
10. A bioinformatics system as claimed in
11. An adaptive dynamic computer system for modelling functional proteomics having a plurality of system layers interconnected to one another, comprising:
a first layer including human genome databases;
a second layer including structural proteomic libraries;
a third layer including a global proteomic model;
a fourth layer including functional proteomic maps;
a fifth layer including modelling of protein behaviours;
a sixth layer including a multi agent system of intelligent mobile software agents;
a seventh layer including simulations of protein interactions;
an eighth layer including individual pathology identification of mutation combinations;
a ninth layer including pharmacoproteomics;
a tenth layer including a pathology applications category typology;
an eleventh layer including oncoproteomics, neuroproteomics, immunoproteomics and gerontoproteomics.
The present application claims the benefit of priority under 35 U.S.C. 119 from U.S. Provisional Patent Application Ser. No. 60/572,716, filed on May 19, 2004, the disclosures of which are hereby incorporated by reference in their entirety for all purposes.
Field of Invention
The present invention pertains to computational biology, post-genomic informatics, structural proteomics and functional proteomics. The invention uses evolutionary computation approaches to design and select simulation scenarios of protein-protein interactions for functional proteomic modeling.
Prior art patent applications that apply to the present invention mainly involve structural proteomics mapping, protein pathway discovery mapping and specific disease application protein mapping.
In Rzhetsky (molecular interaction network prediction), U.S. patent application publication No. 20030068610, Palsson (operational reaction pathway identification), U.S. patent application No. 20040072723, Heal (protein sequence interaction rule prediction), U.S. patent application No. 20030059844, and Gustafsson (functional biomolecule identification), U.S. patent application No. 20040072245, systems are developed to identify structural protein relationships. Unknown molecular interactions, protein sequence activity relationships and protein reaction pathways are mapped using computational methods involving data search space development, probabilistic analysis, comparison analysis or rule prediction. These approaches are limited to structural proteomics mapping.
Lett (image-based biological simulations), U.S. patent application No. 20030018457, teaches a method to simulate structural protein image data in time series to modify model predictions. Ramnarayan (structural protein modeling of polymorphisms for drug design), U.S. patent application No. 20030158672, compares healthy and mutant structural protein 3-D modeling for pharmacogenomics drug design. These patent applications model 3-D or time series data but are limited to isolated proteins' structures.
Liu (neurological disorder inhibitor), U.S. patent application No. 20020006606, presents a model to inhibit JNK and MLK kinase activity to prevent neuronal cell death in neurodegenerative disease. This approach does not model the process of protein function in this specific disease application to show how the proposed therapy is effective.
Most of the research history involving the technologies of the present system—including structural protein prediction, protein pathway prediction, protein model generation, SNP identification, personalized medicine and evolutionary computation—is represented in the academic literature described below.
The development of proteomics is fairly recent. The massive data sets derived from the human genome present a vast treasure of information about proteins. Theorists from biology and chemistry have built models in which the genetic data are useful for understanding individual protein structures. Data about the structure of individual proteins are input into a multiplicity of protein databases. These databases include the Berkeley Structural Genomics Center, Joint Center for Structural Genomics, Oxford Protein Production Facility, Protein Structure Factory and Structural Proteomics in Europe. In addition to structural proteomic (SP) data collection resources, there are a number of protein interaction databases: the Biomolecular Interaction Network Database, the Database of Interacting Proteins, The General Repository for Interacting Datasets, the Human Protein Interaction Database and the Human Protein Reference Database. These databases generally input protein information collected by biomolecular researchers. But the problem emerges of how to organize this vast data reservoir in order to improve our understanding of protein processes.
Much research in bioinformatics is directed to the prediction of protein structures from raw protein data. The goal here is to model individual proteins in a 3-D way akin to capturing portraits of a range of individuals. This work is preliminary to understanding the operation and functioning of proteins in specific cellular pathways.
Professor Kim et al., at the University of California, Berkeley, have taken a step towards providing order to these protein data sets. Kim used computer analyses to calculate the relationships within a sampling of human proteins in order to develop a structural proteomic computer model. In this research, a 3-D representation of the protein fold space is presented, which is generally considered to be a sort of protein periodic table (PPT). These SP data are organized to plainly show the evolution of protein structures from simple to complex forms. In this preliminary work, however, Kim does not place the PPT model into a functional model in order to give operational meaning to the fundamental protein structure data. Simulations based on the PPT are thus restricted in terms of their useful functional information.
Paek et al. at the University of Seoul in the Republic of Korea have presented a multi-layered model to represent cell signaling pathways. Software, such as Vector PathBlazer (and others), is also available to map biological pathways and present protein-protein interaction analysis, though it is generally limited and restricted because it relies on genomic and SP data sets. Using software tools for functional protein modeling, a new generation of biosystems modeling is available that will rapidly accelerate our understanding of genetic information.
The HAPMAP is a database that collects information about haplotypes, which are combinations of single nucleotide polymorphisms (SNPs). This genetic mutation information is significant for identifying disease sources. However, the HAPMAP focuses on common haplotypes and not specific individuals' haplotypes and hence is not useful in the development of personalized medicine.
Personalized medicine that takes information about an individual's disease, uses experimental biological and computer techniques to trace the source to the genetic level, develops a combination of drugs to treat the disease and refines the therapy in a customized way is the goal of physicians and biological researchers. Yet only since the human genome has been deciphered has this goal of pharmacogenomics been possible. So far, only small advances have been made in which specific mutations in individuals with specific diseases, such as forms of cancer, have been traced to the genomic source. In these cases, customized combination drug therapies targeted to individual pathologies manage the disease.
The field of bioinformatics applies computational analysis to the biological sciences. One main research model for bioinformatics has been the application of artificial intelligence to biological systems. Koza and G. Fogel have done early research in this field. Koza's research on genetic programming, building on Holland's research in genetic algorithms, generally emulates biological processes of evolution by developing multiple generations of programs based on principles of mutation, sexual reproduction and natural selection in order to solve complex optimization problems. Guyon (pattern identification in biological systems), U.S. patent application No. 20030172043, presents methods that use Support Vector Machines and Recursive Feature Elimination by optimizing training weights in a classifier for pattern identification. While this method applies evolutionary computation (EC) techniques to gene and SP classification, it does not produce functional proteomic (FP) activity patterns that are useful for understanding proteomic processes.
Finally, the Santa Fe Institute (SFI) has accomplished sophisticated computational analyses of biological processes. SFI researchers have developed EC models for application to biological self-organizing systems in an effort to emulate these complex processes. By simulating genetic interactions, these researchers have developed a paradigm to understand the functional operation of complex evolutionary systems. However, this highly theoretical work has failed to provide useful systematic functional proteomic models or pharmacoproteomic models.
While the identification of the architecture of genes in the Human Genome Project (HGP) presents information on the construction of individual proteomic structures, much more needs to be done to advance our understanding of proteomic function. For example, if genetic diseases are caused by unique combinations of genetic mutations, the identification of these mutations is critical to understanding disease sources and finding solutions. Development of the HGP thus enables a shift in the emphasis in the biological sciences toward the personalized identification and cure of disease. The field of human genetics shifts its emphasis to proteomics, pharmacogenomics and pharmacoproteomics.
The use of advanced computational analysis is fundamental to the field of proteomics. While most proteomics research so far has focused on predicting 3-D representations of protein structures, much work is yet to be done on understanding the operation of protein interactions in cellular pathways. One application of evolutionary computation to functional proteomics, for instance, is to compute the values of training weights of protein interactions so as to accurately emulate optimal FP operations.
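The weight-fitting idea mentioned above can be illustrated with a minimal sketch of a genetic algorithm. This is not the method of the present system; it is a generic example under stated assumptions. The function names (`fitness`, `evolve_weights`), the use of a squared-error score, the population size and the mutation scale are all hypothetical choices made for illustration, and the "observed" values stand in for whatever interaction strengths a real data set would supply.

```python
import random

def fitness(weights, observed):
    """Hypothetical score: negative squared error against observed interaction strengths."""
    return -sum((w - o) ** 2 for w, o in zip(weights, observed))

def evolve_weights(observed, pop_size=30, generations=60, mutation_scale=0.1, seed=0):
    """Evolve a vector of interaction weights toward the observed values."""
    rng = random.Random(seed)
    n = len(observed)
    # Initial population: random weight vectors in [0, 1].
    population = [[rng.uniform(0.0, 1.0) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the better half unchanged (elitism).
        population.sort(key=lambda w: fitness(w, observed), reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Recombination: one-point crossover between two parents.
            cut = rng.randrange(1, n) if n > 1 else 0
            child = a[:cut] + b[cut:]
            # Mutation: small Gaussian perturbation of every weight.
            child = [w + rng.gauss(0.0, mutation_scale) for w in child]
            children.append(child)
        population = parents + children
    return max(population, key=lambda w: fitness(w, observed))

if __name__ == "__main__":
    target = [0.2, 0.8, 0.5]  # made-up interaction strengths
    best = evolve_weights(target)
    print(best, fitness(best, target))
```

Because the better half of each generation is carried over unchanged, the best score in the population cannot get worse from one generation to the next. A realistic fitness function would compare simulated pathway behavior with experimental data rather than with a fixed target vector.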
Though preliminary to our understanding of protein operations, these research streams leave much yet to be done.
Now that the human genome has been sequenced, the next frontier for the biological sciences is post-genomic informatics and proteomics. Proteomics, the computational analysis of proteins, is divided into structural proteomics and functional proteomics. Structural proteomics seeks to understand the organizational properties of proteins from their twenty amino acid components, including geometrical and topological characteristics of protein configurations. Functional proteomics seeks to understand how proteins interact in a dynamic cellular environment.
Whereas genomics has been concerned with identifying the thirty-six thousand genes in the human genome, which consist of about three billion nucleic acid components, proteomics is concerned with a hundred times more information. Since cellular behavior is constituted of the interactions of hundreds of thousands of proteins, it is critical to understand interactions within this complex system if we are to understand the healthy, and pathological, operations of biology. By identifying the causes and organization of pathological proteomic interactions, researchers may be able not only to understand their genetic causes but also to design effective therapies.
There are several key questions raised by functional proteomics. How can functional maps of proteins be organized from limited information? How can genetic information be connected to proteomic function and pathology? How can the function of certain proteins be predicted based on analogous protein structures, functions and interactions? How can multivariate simulations be designed that posit various protein pathway scenarios? How can dynamic simulations of proteomic processes be designed that present a methodology to select optimal as well as suboptimal simulation scenarios?
How can protein irregularities and pathologies be modeled? How can cellular dysfunctions be isolated in silico and the conditions reverse engineered to discover the genetic source? How can dysfunctional protein-protein interactions be simulated?
How can pharmacoproteomic therapies be designed based on simulations of an individual's unique pathology and genetic mutations? How can these functional proteomics modeling approaches be used to engineer complex chemical compounds that repair genetic damage manifested in protein malfunctions? How can systems be designed to create DNA-based therapies and multivariate scenarios to test new chemical compounds so as to minimize side effects and injurious drug interactions?
The present invention addresses the challenges expressed in these questions.
The challenge of functional proteomics is to develop methods to visualize protein activity, typically with imperfect information. To do this, it is necessary to develop models from which simulations can be generated. Once healthy protein structures are mapped and functional proteomic activities are simulated, it becomes possible to analyze dysfunctional protein interaction processes. With information resources like the HGP and the HAPMAP, genetic information and mutation information can inform FP models about these dysfunctional protein operations. Not only can we trace the source of genetic diseases, we can now understand their complex operations, and thus move closer to developing effective therapies to manage them.
So far, a large knowledge gap remains between the massive genomic data sets that we already have, on the one hand, and the useful data for biological systems that need to be developed, on the other. The expedient application of novel computational and experimental techniques is proposed to solve these problems. As knowledge of functional proteomics increases, we should be able to identify the optimal parameters of good health, which will lead to increased longevity, and also identify the biochemical processes that cause and treat disease. In particular, the ability of the human body to fight various types of cancer and viruses, as well as degeneration manifested in aging, may be contingent on a better understanding of functional proteomics.
The present invention therefore seeks to identify novel methods to meet these challenges and demonstrate (1) protein function visualization, (2) protein pathology identification and (3) personalized drug discovery and testing.
The present invention integrates several subsystems into a bioinformatics system for functional proteomics modeling. The first subsystem involves development of an evolvable Global Proteomics Model (GPM), which relies on data from the HGP and protein periodic table (PPT) of structural proteins, and which supplies a foundation for simulations of healthy protein-protein interactions. The second subsystem involves development of simulations to identify the operation and source of individual diseases in dysfunctional protein-protein interactions. The third subsystem involves development of simulations for pharmacoproteomics in which prospective drug targets are modeled, tested and refined for optimum effectiveness for individualized therapy.
The core system uses novel hybrid evolutionary computation techniques for the search, analysis and organization of data sets and the development and selection of simulations for complex biological processes. The system employs intelligent mobile software agents (IMSAs) which operate in a multi-agent system (MAS) in order to carry out computational operations rapidly and efficiently. IMSAs work together to process parts of complex computations in order to successfully solve complex FP optimization problems. By using simulations generated by IMSAs in the three main categories of FP modeling, dysfunctional proteomic modeling and pharmacoproteomics modeling, we are able to emulate and reconstruct complex self-organizing biological systems.
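The division of labor among cooperating agents described above can be sketched as follows. This is only an illustration under stated assumptions: ordinary worker threads stand in for intelligent mobile software agents, and `score_scenario`, `agent_task` and `solve_with_agents` are hypothetical names. The scoring rule (a simple sum) is a placeholder for a real simulation-quality measure.

```python
from concurrent.futures import ThreadPoolExecutor

def score_scenario(scenario):
    """Hypothetical placeholder score: the sum of the scenario's values."""
    return sum(scenario)

def agent_task(chunk):
    """One agent evaluates its share of the scenarios and reports its local best."""
    return max(chunk, key=score_scenario)

def solve_with_agents(scenarios, n_agents=4):
    """Split the scenarios among agents, then merge their local results."""
    if not scenarios:
        raise ValueError("at least one scenario is required")
    # Round-robin partition; drop empty chunks when there are few scenarios.
    chunks = [scenarios[i::n_agents] for i in range(n_agents)]
    chunks = [c for c in chunks if c]
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        local_bests = list(pool.map(agent_task, chunks))
    # Merge step: the global best is the best of the local bests.
    return max(local_bests, key=score_scenario)

if __name__ == "__main__":
    print(solve_with_agents([(1, 2), (5, 5), (0, 9), (3, 3)]))
```

Each agent handles only a part of the computation, and a final merge step combines the partial answers. In a distributed setting the thread pool would presumably be replaced by processes or networked nodes.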
Simulations that emulate complex biological operations make it possible to process the temporal priority geometries of proteomic processes. Not only does the system emulate and predict healthy and dysfunctional protein interaction behaviors, but it also identifies ways to correct dysfunctional processes.
Development of a GPM is useful for supplying a baseline from which to compare healthy proteomic simulations. The GPM relies on data from genetic and structural proteomic databases in order to develop a functional proteomic model for understanding general protein-protein interactions. The GPM continually receives inputs from SP data sources, including protein pathway and protein-lipid pathway data sets, as these data become available.
The GPM is a meta-model that employs adaptive algorithms and is both evolvable and interactive: IMSAs draw on data sets from the GPM but also input data and analyses into the GPM from subsequent simulations drawn from the GPM. The GPM is continually optimized by active IMSA operations. Ultimately, the GPM develops models of self-organizing protein systems. The GPM is a central resource upon which FP simulations are generated.
The GPM is an important frame of reference regarding healthy protein functions against which dysfunctional protein operations may be compared. In the second sub-system of the present invention, IMSAs generate simulations from data sets involving dysfunctional protein interactions. Genetic diseases typically result from mutations that manifest in the operation of mutated proteins. Effectively modeling the operation of mutated proteins helps us to identify the structural proteomic source of the disease. Once a mutated protein is identified as the origin of the dysfunctional protein process, then the dysfunctional protein geometry can be analyzed and prospective corrections developed.
Genetic diseases can be traced to highly individualized genetic causes because they typically result from multiple mutations rather than a single universal mutation. Hence, the present system describes a personalized approach to discovering the unique combinations of mutations in each individual that will manifest in a genetic disease. By comparing the dysfunctional FP simulations to the GPM, we are able to track the process of the disease in a personalized manner. Because combinations of mutations occur in most diseases, multiple mutated proteins must be targeted for an effective therapy to manage the disease on the proteomic level. Identification and simulation of these processes and disease sources are critical to proposing effective solutions.
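The search for mutation combinations that account for an observed dysfunction, measured against a healthy baseline, can be sketched as a small exhaustive search. Everything in this example is hypothetical: the protein names, the SNP identifiers, the multiplicative effect of each SNP and the toy pathway model (a product of protein activities) are assumptions made for illustration. They are not data or methods taken from the present system.

```python
from itertools import combinations

# Hypothetical healthy baseline: relative activity of each protein in a pathway.
BASELINE = {"P1": 1.0, "P2": 1.0, "P3": 1.0}

# Hypothetical SNP effects: SNP id -> (affected protein, multiplicative change in activity).
SNP_EFFECTS = {
    "snpA": ("P1", 0.5),
    "snpB": ("P2", 0.8),
    "snpC": ("P3", 0.25),
    "snpD": ("P1", 0.9),
}

def pathway_output(activities):
    """Toy pathway model: output is the product of all protein activities."""
    out = 1.0
    for value in activities.values():
        out *= value
    return out

def simulate(snps):
    """Apply a combination of SNPs to the baseline and return the pathway output."""
    activities = dict(BASELINE)
    for snp in snps:
        protein, factor = SNP_EFFECTS[snp]
        activities[protein] *= factor
    return pathway_output(activities)

def candidate_combinations(observed_output, max_size=3, tol=1e-9):
    """List every SNP combination whose simulated output matches the observation."""
    hits = []
    for k in range(1, max_size + 1):
        for combo in combinations(sorted(SNP_EFFECTS), k):
            if abs(simulate(combo) - observed_output) <= tol:
                hits.append(combo)
    return hits

if __name__ == "__main__":
    print(candidate_combinations(0.125))
```

Exhaustive enumeration grows combinatorially with the number of SNPs, which is one reason a heuristic search such as evolutionary computation might be preferred for realistic problem sizes.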
The third subsystem of the present invention involves development of a system for pharmacoproteomics. Once a disease is analyzed via proteomic simulations, the mutant proteins' structures are analyzed, and effective customized solutions are offered. The active computational system designs a compound to solve the problem with each distinctive mutant protein. The advent of personalized medicine depends upon these techniques and systems. The solutions offered include repairing, replacing or silencing (blocking) the affected proteins.
During the testing of the proposed solution designed from simulations in the system, feedback is provided to modify and refine the customized solution, a necessary process in complex multi-pathway dysfunctions. The present system's combination of active techniques provides a useful model for adaptive personalized medicine.
The three main subsystems of the present invention each employ distinctive hybrid EC techniques. In the case of the GPM, specific methods are designed to collect and analyze information for FP simulation presentations of intracellular protein pathway operations. In the case of the mutation combination identification, dysfunctional simulation scenarios are modeled and probable solutions identified. Finally, in the case of pharmacoproteomics, FP simulations propose and test prospective solutions to mutant protein structural problems.
The present system is applicable to several main degenerative genetic diseases. Cancer is a paradigm for analysis of this system because multiple mutations cause unique neoplasms which can be remedied through the understanding and repairing of proteomic processes. Neurodegenerative diseases, including Alzheimer's disease (AD), Parkinson's disease (PD) and Huntington's disease (HD), involve proteomic processes of cell death that can be curbed by applying this system. Immunodegenerative diseases, including rheumatoid arthritis, lupus and forms of diabetes, can be rendered manageable via the understanding of their proteomic function that is provided by this system. Aging involves processes that can be understood by simulating the proteomic processes modeled in this system. Finally, the identification of optimum health is made possible by using FP simulations that give us insight into equilibrium conditions. The present system affords both an understanding of these important healthy proteomic operations and the identification of, and generation of solutions for, dysfunctional proteomics.
The proteomic modeling of these disease categories produces the fields of oncoproteomics, neuroproteomics, immunoproteomics and gerontoproteomics, respectively. Taken together, these genetic diseases affect as many as half the population. An understanding of these complex proteomic processes may improve the quality of life for millions of patients.
Innovations of the Present System
The present system proposes numerous innovations. The GPM surpasses a structural protein database. The GPM and other database information sources generate simulations that emulate molecular protein interactions. Analysis of these complex data sources systematically organizes the protein interactions manifest in protein pathways. The production of simulations with multiple vectors and scenarios optimizes the modeling process of functional proteomics.
IMSAs are employed to link database data sets and the GPM and to analyze patterns in the data. The cooperative use of multiple IMSAs in a parallel computer environment under a MAS operating system solves complex problems efficiently in real time.
By using the GPM as a source of comparison, IMSAs assemble information about dysfunctional protein behavior. The analysis of combinations of genetic mutations and their FP dysfunctional manifestations as unique diseases represents a major advance in personalized disease discovery. Not only are the sources and consequences of unique mutation combinations traced and simulated, but solutions to the structural deformity of mutant proteins are identified as well.
The present system identifies ways to test and refine prospective solutions for problems involving dysfunctional proteins by developing a novel process of pharmacoproteomics. This process allows for an active approach to identification and testing of compounds for personalized medicine. The system model presented here for the active discovery of unique pathologies, mutation combinations and effective therapies is novel and useful.
The present system develops and integrates novel hybrid EC techniques for each subsystem. Evolutionary search solutions for the FP scenario problem are presented. Evolutionary solutions to the pathway identification problem are also presented. A method is provided to test sets of mutations to find optimal combinations at the core of individual pathology. Dysfunctional pathway scenario identification is performed using EC methods. Drug candidate solution generation is performed using EC techniques, as is drug candidate solution testing.
By showing how to identify and develop solutions to degenerative pathology problems, the present system suggests ways to fortify the immune system, slow the onset of neurodegenerative disorders, manage neoplasms on the proteomic level and identify effective anti-aging proteomics models.
Another implication of the system is that its employment of combined methods makes it possible to predict pathologies from FP simulations, which may prove useful in disease prevention.
Advantages of the Present System
Optimal therapies can be identified and selected for each individual by using the proposed biological system simulation scenarios. Since individual pathologies change, these methods and models represent a shift from universal medical approaches towards personalized medicine. Ultimately, these approaches will allow development of pharmacoproteomics, personalized medicine based on our emerging knowledge of protein interaction operations. Consequently, the methods of the present invention lie at the heart of solutions to problems involving post-genomic informatics.
The present system allows researchers to “see” specific protein interactions in both healthy and diseased processes by applying simulations. Ultimately, it is possible, with the use of the present system, to understand genes in terms of what they do and how they do it.
By using the present system, researchers will be able to focus on key mutation combinations and pathways without distraction from any irrelevant information in highly complex proteomic systems. Novel approaches to the discovery of the proteomic causes of diseases create opportunities to develop customized solutions. Identification on the proteomic level of geometric deformities allows the design of molecular level drug compounds for individual therapies which will not only accelerate drug discovery but increase efficiency and preserve valuable resources. The evolution from universal medicine to personalized medicine is thereby facilitated by the use of the present system.
Reference to the remaining portions of the specification, including the drawings and claims, will realize other features and advantages of the present invention. Further features and advantages of the present invention, as well as the structure and operation of various embodiments of the present invention, are described in detail below with respect to accompanying drawings.
The present disclosures illustrate in detail the main ideas of the present system. Since the present invention has numerous embodiments, it is not the intention herein to restrict the description of the invention to a single embodiment.
The system and methods incorporated in the present invention are implemented by using software program code applied to networks of computers. Specifically, the present invention represents a dynamic adaptive distributed computer system that includes a multi-agent system (MAS). The main embodiment of the distributed computer system is implemented with complex databases. The system incorporates intelligent mobile software agents (IMSAs) within the MAS that organize into groups for problem-solving functions.
The main biological challenges, after discovery of the human genome, are (a) understanding the normal functioning of proteins on the cellular level, (b) identifying the causes of biological pathologies and (c) predicting effective therapies. An assessment of the normal functioning of proteins reveals that healthy biological system equilibrium is optimized by good health; yet it is increasingly evident that some diseases are caused by genetic damage and consequent proteomic pathology. Genetic damage may result from natural mutations or exogenous factors such as carcinogens. The process of aging, for instance, produces genetic damage at the DNA level that manifests as cellular degradation.
While genomics provides a vast amount of information about the sequencing of genes and the production of amino acids that allows us to identify the structure of proteins, it does not provide us with information about the complex operation of protein interactions on the molecular and cellular levels. We need to know the precise operational functioning of protein interactions if we are to develop and validate drug therapies. So far, drug development has been a highly inaccurate and risky proposition. Functional proteomics promises new ways to search for and design complex biochemical compounds for specific purposes. With proteomic techniques for mapping and predicting protein interactions, we can identify and test new drugs and at the same time reduce toxicity. Concomitantly, we may isolate tumor cells and identify proteins that healthy cells lack and thereby develop treatments that fortify immunological responsiveness, attack tumor development, or stifle accelerated cell development. Its focused attention on the molecular level gives functional proteomics an advantage over earlier drug therapies in solving complex problems.
A new class of drugs has proven the general functional proteomic approach to be worthwhile. Gleevec, Erbitux, Herceptin and Iressa have effectively limited the progress of some cancers. Gleevec isolates enzymes that fuel cancer growth, while Iressa blocks the EGFR (epidermal growth factor receptor) protein. In the case of Herceptin, the strategy is to target tumor cell receptors. Iressa and Erbitux are antiproliferative agents that operate as signal transduction inhibitors, which interfere with the pathways that fuel tumor growth. With Iressa, researchers may need to screen for mutations in order to target the drug most effectively. In the future, new classes of drugs may treat a range of diseases from cancer to diabetes and from viral infections to cellular degeneration associated with aging. Ultimately, the development of personalized medicine will allow individualized treatments based on our unique genetic configurations by simply identifying and repairing dysfunctional genes. Pharmacoproteomics will enable the alignment of gene with drug therapy, with the intended effect of allowing us to control various diseases that have a genetic origin.
Biological organisms are complex self-organizing systems that consist of dynamic interactions of subsystems. Biology bridges the gulf between molecular-level information and information about integrated biological systems. Dynamic functions of biological systems include metabolic pathway processes, feed-forward cellular networks, feedback from chemical inputs on the cellular level and feedback to regulate a stable environment within complex networks. The genetic information from DNA and RNA provides “time-release” aspects of self-organizational sequences in dynamic biological systems. Whereas most models, such as protein topology, have provided a transition from the genome to three-dimensional structural proteomics, we propose a deeper understanding of the complex processes that constitute functional proteomics; this insight involves time series intervals of protein interactions, predictions and causes. To assess and analyze metabolic pathways on the molecular and cellular levels, it is essential to model them as dynamic interactions.
Structural and Functional Proteomics
Since the completion of the Human Genome Project, much work has advanced understanding of the connections between DNA data, RNA data and protein structure. DNA and RNA data organize the twenty amino acid components of proteins in specific configurations. There are four levels of abstraction in protein geometry. The primary level provides information about amino acid sequences. The secondary level provides information about protein coils and loops. The tertiary level provides information about three-dimensional folding of proteins. Finally, the quaternary level provides information about the complex dynamic interactions between proteins. Structural proteomics deals with the first three levels, the geometric aspects of protein configuration. Understanding distinctive protein shapes is critical to understanding proteomic process interactions.
Databases of structural proteomics provide information about families of known protein shapes. Since proteins are assemblages of unique configurations of amino acids, these complex structures are ordered according to similarities between families and subfamilies of proteins. Native protein structures occur in thermodynamically optimal conditions whereby temperature, pH and electrical current are in equilibrium. If these qualities are not in equilibrium, the protein shapes will distort. In some cases, protein structure can also be computed by analysis of protein bond lengths, bond angles and torsion angles on the molecular level.
Much of structural proteomics derives from analysis of and comparison to these libraries of information. Comparisons of unknown proteins to parts or blocks of proteins in protein databases such as those at universities in the U.S., Europe and Japan illustrate the ability to test various combinations to assess and predict protein behaviors from structural information alone. The challenge is to make sense of protein analysis on the basis of limited information. Unfortunately, protein structure information and analysis alone are insufficient to understand the complexities of proteomics.
In contrast to structural proteomics, functional proteomics simulates protein macro-molecular behavior. Functional proteomics organizes functional maps that emulate the operation of protein interactions. As such, functional proteomics focuses on identifying cell signaling pathways and potential pathways.
As a first step toward organizing a fully developed functional proteomics with dynamic relationships, we must see protein reactions as simple biochemical mechanics with molecular causes and effects. In the simplest case, a protein molecule will act on another protein molecule, typically the nearest neighbor. This molecular cause and effect relationship leads to more complex biochemical kinetics in which reactive events are stimulated by turning chemical thresholds on and off. Chemical chain reactions of molecular level proteins based on DNA information supplied to the proteins' amino acid components thus occur in the context of a complex biological system. As we develop a fuller picture of multiple reactions, we discover that there are multiple protein reaction pathways.
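The threshold-driven chain of molecular cause and effect described above can be illustrated with a minimal sketch. The protein names, edge weights and the threshold value below are hypothetical and chosen only for illustration; the specification does not prescribe this particular update rule.

```python
def propagate(initial, edges, threshold=0.5, steps=3):
    """Discrete-time cascade over a directed interaction graph.

    initial: dict mapping protein name -> True for proteins active at the start.
    edges: list of (source, target, weight) tuples (hypothetical weights).
    A target switches on when the summed weight of its currently active
    sources reaches the threshold; once on, it stays on.
    """
    state = dict(initial)
    targets = {t for (_, t, _) in edges}
    for _ in range(steps):
        new_state = dict(state)
        for target in targets:
            total = sum(w for (s, t, w) in edges
                        if t == target and state.get(s, False))
            if total >= threshold:
                new_state[target] = True
        state = new_state
    return state


# Hypothetical pathway: A activates B, B activates C, A only weakly drives D.
edges = [("A", "B", 0.6), ("B", "C", 0.7), ("A", "D", 0.3)]
result = propagate({"A": True}, edges)
```

In this toy pathway the signal reaches B and then C, while D stays off because its input never reaches the threshold.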
The combining of multiple protein reaction pathways leads to construction of models which represent highly complex protein-protein interactions. The challenge here is to reconstruct, and phenomenologically describe, proteomic interaction processes. Multiple vector reactions in a complex self-organizing system require a mapping of dynamic protein interactions. Since these interactions occur in the cellular environment, complex multi-pathway cellular interactions that feature feedback mechanisms are also modeled in functional proteomics. Functional proteomics represents the manifestation of the dynamics of structural proteomics. Because much of structural proteomics is contingent on protein databases, functional proteomics also constructs complex databases that identify causal relationships between proteins. One way to track protein behaviors is to isolate key subsystems such as cell type and correlate this information with protein variables. Another way to track protein behaviors is to differentiate the main states of the protein both in and out of equilibrium.
The fundamental attribute of functional proteomics is its temporal dimension. Since protein interactions are temporally based, the temporal dynamics of protein network interactions represent genetically stimulated protein multi-pathway sequences. These sequences may be represented as isolated events or as complex interaction dynamics. One of the challenges for both genomic informatics and functional proteomics is to identify the genetic triggers of biochemical functions. Another challenge is to identify the mechanisms of cell receptor proteins that are targeted by other proteins acting as triggers which turn complex protein interaction sequences on and off. As we obtain more empirical evidence we are more precisely able to identify and model complex proteomic developmental processes such as embryonic growth.
A particularly useful benefit of functional proteomics is its ability to predict protein-protein interactions as well as protein reactions to biochemical substances. Protein pathway predictions can be made by identifying similar structural proteins in protein databases and comparing their behavior. As the interaction combinations become much more complex, it is necessary to rely on computational resources to model the network interactions between proteins. Because of the high number of variables in complex functional proteomic modeling, the adopted models have constraints on predictive capability in an inverse correlation to the degree of complication. One way to model multivariate protein interaction multivector pathways is to develop multiple scenario simulations. This approach allows us to add or remove variables in a visual replica and to predict the consequences of prospective reactions. This approach also allows us to reverse engineer chemicals based on the analysis and synthesis of our understanding of biochemical processes.
Proteomic Computational Modeling Methods
There are several main approaches to obtaining empirical data on the structure of proteins. These include 2-D gel, mass spectrometry, microarrays and X-ray crystallography. The first two approaches provide images of protein sequences, while microarrays measure slight differences between similar proteins. X-ray crystallography is a process in which atomic level images of proteins are obtained.
When combining these empirical methods for obtaining data on protein structure with genomic information about DNA sequencing, we are better able to assess primary and secondary information about individual proteins. However, the use of this empirical data to build 3-D models of protein folding on the tertiary level and 4-D models of protein interaction on the quaternary level requires adoption of advanced computation models.
Bioinformatics incorporates multiple evolutionary computation techniques to solve problems with the goal of obtaining information for building models of complex protein behaviors. These techniques include the use of artificial neural networks (ANNs), which learn and adapt for data mining, data search and pattern matching in large databases, and the development of self-organizing maps. For example, multiple sources of inputs in a complex pathway of numerous vectors may point to a dominant pathway in which the thresholds of inputs are ranked by priority; calculation of these inputs and thresholds may be performed by ANN processes. In another example, multivariate analyses and regression analyses may be used to perform these modeling calculations. Combining such computation methods into hybrid approaches yields the best effect.
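As a hedged illustration of ranking pathway inputs against a threshold, the sketch below scores each candidate pathway with a single untrained sigmoid unit and reports the highest-scoring pathway above the threshold. The weights, signal names and pathway names are assumptions made for the example; a real ANN would learn its weights from data.

```python
import math


def pathway_scores(inputs, weights, bias=0.0):
    """Score each pathway with one sigmoid unit over its weighted input signals."""
    scores = {}
    for name, signals in inputs.items():
        z = sum(weights[s] * v for s, v in signals.items()) + bias
        scores[name] = 1.0 / (1.0 + math.exp(-z))
    return scores


def dominant_pathway(scores, threshold=0.5):
    """Rank pathways by score; return the top one above threshold (or None)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    above = [name for name, s in ranked if s >= threshold]
    return (above[0] if above else None), ranked


# Hypothetical weights: a ligand signal promotes, an inhibitor suppresses.
weights = {"ligand": 2.0, "inhibitor": -3.0}
inputs = {
    "P1": {"ligand": 1.0, "inhibitor": 0.0},
    "P2": {"ligand": 1.0, "inhibitor": 1.0},
}
scores = pathway_scores(inputs, weights)
dominant, ranking = dominant_pathway(scores)
```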
Because part of the challenge of structural proteomics is to mine large protein databases in order to assess similar patterns, the use of complex data mining strategies that involve active search and pattern matching processes is computationally more efficient than passive approaches. Intelligent software agents for search are therefore proposed to produce complex dynamic mapping results.
Multivariate regression methods provide ways to isolate variables for multifactorial analysis. The classification of sequences according to the families and sub-families of protein classes allows researchers to isolate these variables on the atomic level. A comparison of new proteins with familiar proteins reveals new protein attributes. In addition, comparisons of unknown proteins with interspecies protein information reveal protein factors with features that are common to multiple species; hence we establish a larger database to draw upon than that derived from a single genome. Cluster analysis uses pairwise similarity analysis techniques to assess the parameters of similar groups. Decision tree analysis can also be useful for discovering the classification of protein structures.
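The pairwise-similarity clustering mentioned above can be sketched as follows. This is only an illustration: it uses Python's generic `difflib.SequenceMatcher` ratio as a stand-in for a proper sequence alignment score, groups sequences by single linkage at an assumed cutoff of 0.7, and the sequences themselves are invented.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Generic string similarity in [0, 1]; a stand-in for an alignment score."""
    return SequenceMatcher(None, a, b).ratio()


def cluster(seqs, cutoff=0.7):
    """Single-linkage clustering: join any two sequences whose similarity
    meets the cutoff, using a small union-find structure."""
    parent = {name: name for name in seqs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if similarity(seqs[a], seqs[b]) >= cutoff:
            parent[find(a)] = find(b)

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return sorted(groups.values(), key=lambda g: sorted(g))


# Invented amino acid sequences, for illustration only.
seqs = {"p1": "MKTAYIAKQR", "p2": "MKTAYIAKQK", "p3": "GGSGGSGGSG"}
clusters = cluster(seqs)
```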
Computational approaches to the development of structural protein databases include the use of complex statistical methodologies such as Bayesian learning, simulated annealing, Monte Carlo methods, Support Vector Machines and hidden Markov models. In most cases, these methods are adopted in environments with imperfect information in which random search is performed from a sample of data in order to narrow the range of model development. For the applications of functional proteomics, vector probabilities are created via these statistical techniques. These techniques allow us to identify factors that are missing from a solution and thus to identify missing components. The testing of multiple potential variables and their interpolation within a restricted search space to optimal solution candidates constitutes a way to solve this class of proteomic problems.
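One way such vector probabilities might be estimated is by Monte Carlo sampling. The sketch below assumes, purely for illustration, that a pathway is a chain of independent steps with known per-step success probabilities and estimates the chance that the whole chain completes; the step probabilities are invented.

```python
import random


def estimate_completion(step_probs, trials=20000, seed=0):
    """Monte Carlo estimate of the probability that every step in a pathway
    succeeds, assuming the steps are independent (an illustrative assumption)."""
    rng = random.Random(seed)
    completed = 0
    for _ in range(trials):
        if all(rng.random() < p for p in step_probs):
            completed += 1
    return completed / trials


# Hypothetical three-step pathway; the exact answer is 0.9 * 0.8 * 0.5 = 0.36.
step_probs = [0.9, 0.8, 0.5]
estimate = estimate_completion(step_probs)
```

For independent steps the product can be computed directly; sampling becomes useful when the steps depend on one another or on uncertain inputs.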
Combinatorial optimization techniques are used to assess the unique combinations of molecules of a given protein when only limited information is available. In particular, combinatorial optimization approaches are useful in developing models of functional proteomics in which a number of complex combinations of proteins interact with multiple vectors and pathways. Distributed and parallel computation systems are employed in order to calculate the optimization parameters of these complex functional proteomic models.
One of the goals of structural proteomics is to predict folding properties of protein behavior. With the use of probabilistic analytical techniques we are able to predict protein properties within a limited range of probability. Similarly with functional proteomics, the use of probabilistic analytical techniques allows us to predict vectors of protein reaction and interaction but only within limited ranges. Pathway matching techniques can be applied by comparing the pathways of known protein interactions with limited data about newly discovered proteins. Machine learning approaches to these predictive models accelerate their calculations.
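Pathway matching against limited data can be sketched very simply. In the hypothetical example below, each known pathway is a list of directed protein interactions, and known pathways are ranked by the fraction of the newly observed interactions they contain. The pathway names, protein labels and scoring rule are assumptions for illustration.

```python
def match_pathways(observed, known_pathways):
    """Rank known pathways by the fraction of observed interactions they contain."""
    observed = set(observed)
    ranked = []
    for name, interactions in known_pathways.items():
        overlap = observed & set(interactions)
        score = len(overlap) / len(observed) if observed else 0.0
        ranked.append((name, score))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Hypothetical known pathways, each a list of (source, target) interactions.
known = {
    "growth_signal": [("R", "K1"), ("K1", "K2"), ("K2", "TF")],
    "stress_response": [("S", "K3"), ("K3", "TF")],
}
# Partial observations for a newly studied protein network.
observed = [("R", "K1"), ("K2", "TF")]
ranking = match_pathways(observed, known)
```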
Evolutionary computation involves development of multiple generations of solutions to complex computational problems. The several types of evolutionary computation include genetic algorithms, genetic programming and automatic programming methods. It is useful to combine the best parts of these methods into an integrative model for applying hybrid evolutionary computation methods in order to solve complex functional proteomic problems. The application of EC techniques may be accelerated by using distributed artificial intelligence technologies. The use of multiple parallel computation approaches enables the testing of protein functioning. It is advanced that the use of intelligent mobile software agents in D-AI can solve the problems of functional proteomics. For example, an intelligent search agent can perform data mining with greater accuracy, predictive probability (within a range of scenarios) and greater speed.
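A minimal genetic algorithm illustrates the idea of evolving successive generations of candidate solutions. The sketch below is a toy: candidates are bit strings, the fitness function simply counts agreement with a hypothetical target pattern, and the population size, mutation rate and seed are arbitrary choices rather than values taken from the specification.

```python
import random


def fitness(candidate, target):
    """Number of positions at which the candidate matches the target."""
    return sum(c == t for c, t in zip(candidate, target))


def evolve(target, pop_size=30, generations=40, mutation_rate=0.05, seed=1):
    """Toy genetic algorithm with truncation selection, one-point crossover,
    bit-flip mutation and elitism (the best candidate always survives)."""
    rng = random.Random(seed)
    n = len(target)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(c, target), reverse=True)
        history.append(fitness(pop[0], target))
        parents = pop[: pop_size // 2]
        children = [pop[0][:]]  # elitism: carry the best forward unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
    best = max(pop, key=lambda c: fitness(c, target))
    return best, history


# Hypothetical target pattern standing in for an optimal configuration.
target = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
best, history = evolve(target)
```

Because of elitism, the best fitness recorded in `history` never decreases from one generation to the next.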
Intelligent mobile software agents (IMSAs) operate in distributed computer systems. In one operation, an IMSA makes an initial map of a protein interaction that provides information about a newly discovered protein network. By comparing the proteins with existing protein interaction databases, new statistical information is added and the map is updated. This information is used to design a dynamic map customized for a specific protein interaction sequence. Real time dynamic comparisons of active biochemical and cellular interactions with known databases provide a basis for customized proteomic model development. IMSAs are used for identification of specific protein relationships, for active pattern matching of similar functional protein database processes and for comparisons of different types of chemical analyses (including across genomes) and reactions. Using these methods and tools, we may work backwards from a particular problem involving cellular pathology and thereby narrow the pool of data to be analyzed. By assessing the classification of analogical protein structures we are able to identify similar functional protein pathways, a process that narrows the data scope appreciably. These computational approaches are active, efficient and synthetic and therefore well suited for functional protein interaction analysis and synthesis. The use of these computational methods and models markedly accelerates experimental processes and adds immeasurably to our acquisition of valuable knowledge.
Computer simulations are a central part of proteomic analysis. With them, information about proteins is organized, analyzed and evaluated. For example, in structural proteomics, protein folding calculations of possible geometric configurations are made based on sequence analyses. Modeling functional proteomic data sets using computer simulations is more complex.
The phenomenological modeling of protein interaction pathways is necessary for understanding protein reactions, protein effects and drug effects. Used in reverse, these same models assess the protein interaction causes of cellular pathologies. One of the best ways to model functional protein interactions is to develop contingency simulations of complex processes. Thus a functional proteomic model would have not only a limited range but also simulation scenarios with contingencies based on limited, and updated, information. These modeling scenarios are hybrid simulations, that is, they are both discrete and continuous based on multiple protein behaviors. The simulation and modeling system consists of a hypothetical model, a database, a simulation engine and a visualization engine.
Simplified simulations are created by removing as much inessential data as possible in order to focus on a particular problem. For instance, using this reduced information model we can assess the immediate consequences of a biochemical reaction, such as a small molecule ligand interacting with a large protein molecule. As the model evolves, we fill in the pieces of the puzzle, moving from a partial map, in which an outline is obtained with limited information, to a more robust model. This simple simulation is useful for assessing a limited range of protein reactions. This model also helps isolate anomalies.
Multivariate simulations that develop dynamic models for functional proteomics emphasize different phases of events, adding and subtracting variables to develop a map that emulates the operation of protein pathway vectors. The various factors are analyzed and evaluated by comparisons with known protein interaction sequences. With this model, we are able to color code the various pathways in order to separate the related proteins in a complex self-organizing system and thus assess more complex structural anomalies. The selection of optimal scenarios from among the various proteomic simulation runs will provide the most transparent understanding of functional protein interactions within the constraints of limited information. This multivariate simulation approach allows for the accelerated substitution of experimental processes.
Simulation scenarios apply experimental data to develop contingency scenarios based on the limits of information but are constrained to using probabilistic inference. We develop adaptive deterministic molecular spatio-temporal simulations based on an emerging knowledge bank. These multifaceted protein reactions are represented as contingencies in simulation scenarios in which input variables are modified to assess changes in outcomes. In this way, we can test various combinations of molecules with predicted results.
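The practice of modifying input variables across scenarios and comparing outcomes can be sketched with a very small deterministic model. The example below assumes simple mass-action binding of a ligand L to a protein P (P + L ⇌ PL) integrated with Euler steps; it has no spatial component, and the rate constants, concentrations and scenario names are hypothetical.

```python
def simulate_binding(p0, l0, k_on=1.0, k_off=0.1, dt=0.01, steps=1000):
    """Euler integration of P + L <-> PL under mass-action kinetics.

    Returns the final concentrations (P, L, PL). All parameters are
    illustrative and in arbitrary units.
    """
    p, l, pl = p0, l0, 0.0
    for _ in range(steps):
        rate = k_on * p * l - k_off * pl  # net rate of complex formation
        p -= rate * dt
        l -= rate * dt
        pl += rate * dt
    return p, l, pl


# Scenarios differ only in the initial ligand concentration.
scenarios = {"low": 0.5, "medium": 1.0, "high": 2.0}
results = {name: simulate_binding(1.0, l0) for name, l0 in scenarios.items()}
```

Comparing the final PL value across scenarios shows how the outcome shifts as one input variable is changed while the others are held fixed.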
By producing functional protein simulations with multiple scenarios based on input variable limits, we are able to increase the probabilities of accurate predictions of protein pathway vectors and protein-protein interactions based on information from similar known protein families. These techniques allow us to anticipate likely protein interactions by comparison with similar known cases. As an example of this, we can separate the healthy operation of cellular function from pathological operation and seek to identify the protein pathway functions that cause disease. By specifying the narrow conditions of optimal health we are better able to identify pathological conditions. By using these complex simulation scenarios for functional proteomics we are able to test and evaluate drugs for specific pathologies. By reversing this same approach, we may begin with pathologies and work to identify protein pathway causes of disease which allow us to develop drugs that target specific proteins for accelerated drug discovery.
General System Architecture and Dynamics
The main system incorporates a number of system layers or operational protocols.
After healthy proteomic function is revealed, the system identifies mutation combinations (haplotypes) for specific pathologies in distinctive disease categories (340). This dysfunctional proteomic information is sometimes reverse engineered from pathology to the genetic source (345). But in another mode of the system, a personalized medicine system is developed from the identification of dysfunctional proteomic information (350). By using IMSAs (355), the system builds customized model(s) of unique proteomic pathology (360) and develops customized solutions to a specific pathology (365). By applying the solution (370), testing the solution (375) and refining the solution (380), a customized management of the disease is possible (385).
The discussion below of the functional proteomic modeling system follows this general model. The first seven figures cover the general system architecture.
The Global Proteomic Model (GPM) is generally described in
The main simulation types indicated in
A third type of IMSA (740) develops the pharmacoproteomic model (745), from which solution development simulations (750) are produced in silico. Pathology FP pathway simulations also inform solution development simulations. From these solution development simulations, solution testing (755) is performed in which feedback is obtained about the effects of the solution. From this testing process, the solution refinement process (760) is implemented, eventually leading to optimum solution scenario (765) development and selection.
The functional classification of protein families is further delineated in
The protein development process is illustrated generally in
In order to further our understanding of proteomics, protein modeling systems generate specific representations of protein interactions.
On the functional side of the modeling of proteins represented in
Protein function relies on protein structure for its main building blocks. Consequently, we need to understand the main components of proteins, which are represented in
These SP conditions include binding, transport, regulation, signaling, receptor, target, inhibitor and disruption features. In all cases, the protein structure is evaluated according to conditions that are intracellular. That is, SP analysis is made based on understanding the context of protein operation. The challenge is to understand, and to model, the operation of a protein in action.
As the attempt to model proteins moves from understanding a particular protein's structure to the interoperation of multiple proteins, the range of complexity increases. The representation of protein function is made in the GPM with multiple criteria that go beyond the PPT descriptive categories limited to SP. Because the universe of proteins is extremely complex, it is necessary to narrow the range of criteria in order to focus our understanding of their behaviors.
The FP criteria specified in
FP interaction criteria generally emphasize the active mode of systemic operation in contrast to the emphasis in SP on merely portraying an individual protein. Physical motion of groups of interacting entities is a key aspect of protein function. In
In order to understand FP we must first understand the general principles of SP which provide the main building blocks of system operation.
Protein interactions occur within cellular pathways.
Much of the process of FP relies on the binding properties of interacting proteins.
For optimal binding, proteins require a compatible geometric fit. One of the generators of pathology on the molecular level is the dysfunctional geometric interaction of proteins that are generated from mutations. Examples of this phenomenon may be observed in sickle cell anemia (
In order to map FP operations, the GPM serves as a major model, informed by SP data sets and by the PPT, to draw from in order to develop healthy cellular pathway simulations and general protein function maps.
Knowledge of protein function affords greater insight into the meaning of individual proteins. Since many proteins appear to be very similar but function very differently, assessing the organization of protein structures by the criteria of protein function allows us to appropriately reorganize large sets of proteins. Since the structural protein data can be organized from simple to complex, generally mirroring the historical evolution of proteins, we can cross reference the SP with FP data to elucidate the more evolved functions. These data filtering processes provide us a context of empirical analysis of protein interactions in cellular systems and allow us to organize protein architectures and processes into a general protein model. See also the discussion at
According to the current dogma, protein pathology is caused by genetic mutations. These mutations combine in unique ways to present in each individual's pathology. The challenge of dysfunctional proteomics, or FP pathology, is to identify the unique combination of mutations, or haplotypes, that cause specific diseases and to simulate the specific dysfunctional protein interactions. Once the dysfunctional protein interactions are detected, then the source of the deformity in the geometries of specific proteins is identified and solutions presented to repair these specific (and sometimes unique) deformations. FIGS. 27 to 37 apply to identification and analysis of dysfunctional proteins that cause degenerative diseases.
A comparison of healthy protein function and an unhealthy FP process is shown in
Genetic mutations are the main cause of genetic diseases.
An individual's pathology assessment, based on an analysis of mutations, is shown in
The construction of an individualized haplotypes model is further described in
In order for pathologies to be understood, we need to assess their sources. Granting that genetic mutations cause pathologies, our goal is to trace the origins of disease from the pathology, through the proteomic (both functional and structural) operations to the genetic source. This process of understanding the genesis of disease involves a reverse engineering of pathology.
Simulations are an optimal format for modeling pathology prediction from FP data. In
Since unique combinations of SNPs are shared between individuals, typically caused by genetic inheritance in families, there are general haplotypes shared by groups of individuals.
The need to identify the combinations of genetic mutations that create mutant proteins which, in turn, cause dysfunctional protein behaviors that are responsible for genetic diseases is just the first part of understanding these diseases. While it is true that the invention of the GPM is important to our functional understanding of the operation of interactive proteins in that it provides a baseline model for the understanding of dysfunctional protein operations, the main goal in this system is to identify the proteomic sources of diseases so that we can develop solutions that will allow us to manage these diseases on the proteomic level. FIGS. 38 to 46 generally discuss the process of customized medicine called pharmacoproteomics. The main objective of individualized medicine is to identify and to precisely describe, through the use of simulations, dysfunctional protein structures for each individual so that we may identify specific solutions to bring each unique pathological cellular pathway to optimum health and to manage genetic diseases at the proteomic level. The scientific community has completed a combination of discoveries that make this system possible.
As shown in
The left sequence of
In order to develop customized solutions to specific dysfunctional proteomic problems, it is necessary to precisely identify the problem. Since the functional proteomic pathology typically is merely the manifestation of a structural proteomic deformity, identifying the structural deformation is critical to solving the problem. Even if we can identify the mutations and the mutation combinations that create distinctive pathologies, we nevertheless are required to precisely isolate the SP deformity with computer-aided design techniques and with the collection and comparison of data sets from protein databases and the GPM. To identify appropriate solutions, then, the pharmacoproteomic model requires evaluation of both the structural and the functional proteomic dysfunctions.
Individualized diseases build up via cellular damage (and manifest as mutations) (4100). The dysfunctional proteins and the sources of the pathology are identified (4110) and compared to the GPM in order to assess healthy FP operations of a specific pathway (4120). FP dysfunction(s) are detected (4130), and the dysfunctional parameters are identified (4135). Individualized simulations of probable dysfunction are constructed, and the most probable simulation is selected within specific conditions (4140). Prospective solutions to correct the protein dysfunction at the structural protein deformity level are identified (4145). The defect is then corrected at the FP level (4150).
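The comparison of an individual's pathway against the healthy GPM baseline (steps 4120 to 4135) might be sketched as a simple range check. Everything in the example below is hypothetical: the pathway step names, the activity units, and the idea that the baseline is stored as a (low, high) healthy range per step are assumptions made for illustration.

```python
def detect_dysfunction(baseline, individual):
    """Flag pathway steps whose measured activity falls outside the healthy range.

    baseline: dict mapping step name -> (low, high) healthy activity range.
    individual: dict mapping step name -> measured activity.
    Returns a dict of step name -> 'missing', 'underactive' or 'overactive'.
    """
    flagged = {}
    for step, (low, high) in baseline.items():
        value = individual.get(step)
        if value is None:
            flagged[step] = "missing"
        elif value < low:
            flagged[step] = "underactive"
        elif value > high:
            flagged[step] = "overactive"
    return flagged


# Hypothetical healthy ranges and one individual's measurements.
baseline = {"receptor": (0.8, 1.2), "kinase": (0.9, 1.1), "factor": (0.5, 1.5)}
individual = {"receptor": 1.0, "kinase": 2.4}
flags = detect_dysfunction(baseline, individual)
```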
Because they are genetic diseases, with common genetic inheritances, some pathologies are shared between individuals in the same family or the same community. Consequently genetic diseases that are common to specific groups may be managed by combining specific combinations of medicines which treat specific combinations of shared mutations among a group. Though not considered personalized medicine, the modeling of medicines targeted at groups with inherited diseases is considered to be semi-customized. In
Following data collection from specific sub-populations (4300), common pathologies for various sub-populations are aggregated (4310), and specific diseases within substantial sub-groups are selected for efficient treatment (4320). As in the personalized medicine model, the structure of dysfunctional proteins causing common pathologies is identified (4330), and individuals are tested for common mutations that have a common pathology (4340). Combinations of drugs are finally applied to address the multiple specific genetic mutations associated with protein structure deformity (4350).
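The aggregation and screening steps (4310 and 4340) can be sketched with simple counting. In the hypothetical example below, each patient record is a list of mutation labels, mutations carried by at least half of the sub-population are treated as shared, and each patient is then screened against that shared set. The patient identifiers, mutation labels and the 50 percent cutoff are assumptions.

```python
from collections import Counter


def common_mutations(patients, min_fraction=0.5):
    """Return mutations carried by at least min_fraction of the patients."""
    counts = Counter(m for muts in patients.values() for m in set(muts))
    n = len(patients)
    return sorted(m for m, c in counts.items() if c / n >= min_fraction)


def screen(patients, shared):
    """For each patient, list which of the shared mutations they carry."""
    return {pid: sorted(set(muts) & set(shared)) for pid, muts in patients.items()}


# Hypothetical sub-population data.
patients = {
    "a": ["m1", "m2"],
    "b": ["m1"],
    "c": ["m3", "m1"],
    "d": ["m2"],
}
shared = common_mutations(patients)
screening = screen(patients, shared)
```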
The targeting of combinations of drugs to manage multiple disease-causing mutations may be observed in the example of one form of lung cancer. EGFR, a tyrosine kinase (TK) enzyme, is overabundant in eighty percent of lung cancers and plays a major role in over-stimulating cell division. The drug Iressa, a TK inhibitor, may be useful in limiting EGFR, but for it to be effective, the patient must possess a key mutation. Consequently, Iressa is effective in only a limited number of patients in whom the mutation is present. In another example, the HER-2 protein is a cell-surface receptor protein that plays a role in some forms of breast cancer. The drug Herceptin stops activation of the HER-2 protein in some patients with specific mutations. In both cases, patients must be screened for a combination of genetic mutations in order to assess the potential effectiveness of these drugs against their particular forms of the diseases. The present system introduces models that simulate the operation of proteins, giving researchers more precise tools to “see” the genetic and proteomic causes of disease as well as the effects of particular drugs on these unique combinations of genetic mutations.
Another model for managing dysfunctional proteins is the application of RNAi techniques, typically via adenovirus, to block the genetic production of malicious proteins. As
In still another model, the body's mechanisms to attack dysfunctional proteins are enhanced. In this paradigm, the immune system is fortified to resist proteomic dysfunctions.
In the main methods to manage the FP manifestations of disease summarized above, antibodies carry the respective remedies to the appropriate targets. In addition, vaccines may be customized for particular patients by taking their own cells and fashioning a response that fights particular diseases.
The goal of pharmacoproteomics is to develop customized therapies for specific diseases. The use of combinations of the above methods is therefore appropriate in order to tailor specific remedies to specific complex disease problems. Understanding the interoperation of the functional proteomics provides a crucial step toward identifying the causes of genetic disease which is itself preparatory to designing customized therapeutic solutions.
Once the personalized medicinal therapies of pharmacoproteomics are fashioned, they must be tested. In order to test specific proteomic solutions to complex problems, it is necessary to receive systemic feedback.
The most prominent applications of the present functional proteomics modeling system to genetic diseases include degenerative diseases of cancer, neurodegenerative diseases, immunodegenerative diseases and aging. FIGS. 47 to 53 discuss these main disease categories in the context of proteomic interactions.
In the case of immunodegenerative diseases such as forms of arthritis, allergies and diabetes, the problem lies in disequilibrium of the regulatory system: dysfunction of the protective mechanisms against exogenous diseases leaves patients susceptible to a range of secondary diseases. The cause is cellular degradation; the source is either endogenous (genetic) or exogenous (e.g., a virus that degrades or suppresses the immune system); the solution is to fortify the immune system or to delay accumulation of degradation; and the biomechanism is to fortify biological mechanisms or to block those processes which interfere with healthy operation. Finally, in the case of aging, the problem lies in the erosion or deterioration of cellular mechanisms; the cause is genetic intracellular mutations (such as in the mitochondria or mitochondrial lining because of oxidation); the source is endogenous; the solution is to delay degradation or stimulate healthy function (such as with antioxidants); and the biomechanism is to block cellular degradation or fortify proteomic mechanisms.
These four degenerative diseases clearly contrast with optimum health, in which biomechanisms are generally in equilibrium. Since the goal of personalized medicine is to provide corrections to genetic dysfunctions, it is useful to identify the healthy functioning of proteins and cellular systems and make comparisons with the range of diseases.
A mutated B-RAF protein is present in about twenty percent of colorectal cancers. Erbitux, which is antibody-based, is effective in delaying the progress of the disease, suggesting that multiple proteins affect these cellular processes. The B-RAF protein is also a factor in as many as eighty percent of skin cancers, intracellular mutations of which are sometimes caused by radiation. Kidney cancer, like lung cancer, is caused by EGFR protein surpluses that require a TK inhibitor such as Tarceva or an anti-angiogenic agent such as Avastin.
Finally, in the case of Chronic Myeloid Leukemia (CML), which presents as an over-generation of white blood cells, chromosomes 9 and 22 break and rejoin into a hybrid 9-22 chromosome. The BCR and ABL genes combine to form the BCR-ABL protein, a TK enzyme that produces a signal for the cell to grow. Gleevec is a medicine that acts as a TK inhibitor by filling the gap in the geometrical deformity of the dysfunctional protein created by the genetic mutation. Because there is only one mutation that causes this disease, there is a high rate of neoplasm control from Gleevec therapy applied to those patients with this unique mutation.
All of these examples demonstrate functional proteomic interactions and dysfunctional protein mutations as the cause of disease. Each of these classes of protein dysfunctions requires a different type of solution to be effective for specific combinations of mutations. The present system contains modeling and simulation subsystems that show healthy proteomic operation as well as dysfunctional operation of pathologies and pharmacoproteomic approaches to personalized medicine. It is argued that this general approach is the future of medicine.
Degradation of mitochondrial membrane integrity and cell membrane integrity is caused by oxidation and by exposure to the free radicals that arise in the process of producing energy (ATP) for the cell. Intracellular mechanisms cause dysfunctional processes that can be inhibited with the use of agents such as antioxidants; these correct for the oxidative effect of free radicals produced by the mitochondria. The present system makes it possible to identify the mechanisms by which free radicals act in cellular respiration and to enhance the defenses against them, and thus constitutes a key way to retard the aging process.
One strategy to slow the aging process is to slow the mitochondrial DNA (mtDNA) mutation rate, which in turn affects oxidation. In the underlying circular feedback mechanism, reduced ATP increases free radicals, which increases mtDNA mutation accumulation, which in turn further reduces ATP; weakening this loop makes it possible to slow the effects of aging. Though mtDNA, which includes thirteen protein-coding genes, controls some mitochondrial operation, nuclear DNA controls mtDNA regulation. Therefore, to address the problem of mitochondrial DNA mutations, proteins will be configured to block the effects of the accumulation of the mutations in both mtDNA and nuclear DNA.
Another model to slow the aging process involves the telomeres. Telomeres are “pre-programmed” to copy the DNA a specific number of times before decaying. After this period, mutations begin to accumulate. In order to extend the replication process in which the telomeres play a prominent part, several strategies are applied to affect the mechanisms involved in gene replication. First, the enzyme responsible for DNA replication will be refined and enhanced in order to increase the precision of its function. Second, the effect of this increase in precision will be more accurate replication of telomeres, in effect extending their effective copying life, which minimizes mutations and limits the corrosive effects of the aging process. Finally, in order to increase the accuracy and precision of the DNA replication process, it is necessary to identify and enhance the proteins involved in RNA replication precision. A combination of these strategies, which are identifiable and solvable using the present system, forms the groundwork for gerontoproteomics.
IMSAs employ multiple techniques to build functional proteomics models. The Monte Carlo (MC) simulation method uses random sampling to break data sets down into clusters for analysis over time sequences. Bayesian theory is used to simulate experiments in which an early phase will inform and guide a later phase; this is useful in reorganizing and refining the model generated by accumulating data sets over time.
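The way an early phase informs a later phase under a Bayesian approach can be illustrated with a minimal sketch. It assumes a simple Beta-Binomial model of a single hypothetical interaction probability; the function names and the observation counts are illustrative assumptions and are not taken from the present system.

```python
from fractions import Fraction

def update_beta(alpha, beta, successes, failures):
    """Return the posterior Beta parameters after observing new outcomes."""
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution, kept as an exact fraction."""
    return Fraction(alpha, alpha + beta)

# Phase 1: start from a uniform prior Beta(1, 1) and add hypothetical data.
prior = (1, 1)
phase1 = update_beta(*prior, successes=7, failures=3)

# Phase 2: the phase-1 posterior is used as the prior for the later phase.
phase2 = update_beta(*phase1, successes=12, failures=8)

print("estimate after phase 1:", float(beta_mean(*phase1)))
print("estimate after phase 2:", float(beta_mean(*phase2)))
```

Each phase reuses the previous posterior as its prior, so the estimate is refined as data sets accumulate rather than being recomputed from scratch.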
In the upper left grid of the chart in
FIGS. 55 to 65 apply to the development of proteomic models using IMSAs. FIGS. 55 to 58 delineate normal FP modeling, FIGS. 59 to 62 illuminate pathological modeling of dysfunctional proteomics, and FIGS. 63 to 65 elucidate pharmacoproteomic modeling.
The general modeling system architecture is presented as a foundation for organizing complex data sets based on self-organizing sets of the GPM, individual mutation combinations and pharmacoproteomics, each of which is a category of optimization problem. Consequently, various techniques are employed to model these problem categories as presented in this system.
IMSAs, core components of this modeling system, are software agents that move from machine to machine to collect and analyze data and generally build FP models. The IMSAs operate in a multi-agent system (MAS) as specialized sophisticated software entities that cooperate or compete to solve complex computational problems. In the context of this system, the IMSAs employ hybrid EC techniques and other computational techniques such as MC and Bayesian approaches and artificial neural networks (A-NN) in mobile software code that is programmed to model and simulate complex FP behaviors.
In general, EC consists of computational processes which emulate the theory of biological evolution, in which software algorithms or software programs are “bred” using principles of natural selection, mutation and sexual reproduction. The aim is to develop multiple runs of computer programs which lead, at each successive generation of development, to the selection of the strongest possible outcomes. Over time, this process is intended to identify solutions to hard problems. Given substantial computer hardware capabilities, the use of these computational strategies and techniques is intended to yield rapid solutions, since many generations of computer programs can be bred quickly.
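The breeding cycle of selection, mutation and recombination can be sketched with a small genetic algorithm. This is only an illustration under stated assumptions: the bit-string encoding, the toy fitness function (a count of 1-bits) and all parameter values are hypothetical and stand in for whatever objective an IMSA would actually optimize.

```python
import random

def fitness(bits):
    """Toy objective (an assumption): the number of 1-bits in the candidate."""
    return sum(bits)

def crossover(a, b, rng):
    """Single-point recombination of two parents."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]

def evolve(length=20, pop_size=30, generations=40, rate=0.02, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        best_history.append(fitness(pop[0]))
        parents = pop[: pop_size // 2]   # selection: keep the fitter half
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rate, rng)
            for _ in range(pop_size - len(parents))
        ]
        pop = parents + children         # parents survive unchanged (elitism)
    pop.sort(key=fitness, reverse=True)
    return pop[0], best_history

best, history = evolve()
print(fitness(best), history[0], history[-1])
```

Because the fitter half of each generation is carried forward unchanged, the best fitness never decreases from one generation to the next.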
Multiple IMSAs may work together to solve complex problems. An IMSA will send signals to other IMSAs requesting information on or analysis of a problem. In order to solve a combinatorial optimization problem more quickly, multiple IMSAs will divide the problem into parts or solve it in multiple phases. In this further embodiment of the system, multiple IMSAs perform functions to complete a task.
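Dividing a combinatorial optimization problem among several agents can be sketched as follows. The example assumes that each agent is simply a worker function that searches one slice of the candidate space and reports its local best; the item names, scores and the incompatibility penalty are hypothetical, and a thread pool stands in for mobile agents on separate machines.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

# Hypothetical scores for candidate items; the values are illustrative only.
SCORES = {"A": 3, "B": 7, "C": 2, "D": 9, "E": 4}
PAIR_PENALTY = {frozenset(("B", "D")): 10}  # assumed incompatibility

def score(combo):
    """Sum of item scores minus penalties for incompatible pairs."""
    total = sum(SCORES[x] for x in combo)
    for pair in combinations(combo, 2):
        total -= PAIR_PENALTY.get(frozenset(pair), 0)
    return total

def agent_search(first, k):
    """One agent explores only combinations whose smallest item is `first`."""
    rest = [x for x in sorted(SCORES) if x > first]
    candidates = [(first,) + tail for tail in combinations(rest, k - 1)]
    if not candidates:
        return None
    return max(candidates, key=score)

def solve(k=3):
    """Split the search by first item, then merge the agents' local results."""
    items = sorted(SCORES)
    with ThreadPoolExecutor() as pool:
        partial = list(pool.map(lambda f: agent_search(f, k), items))
    return max((c for c in partial if c is not None), key=score)

best = solve()
print(best, score(best))
```

The slices are disjoint and together cover every combination, so merging the local results gives the same answer as a single exhaustive search.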
Use of multiple IMSAs that employ various hybrid EC techniques simultaneously to solve aspects of larger problems allows the complex modeling of the GPM, individual mutation combination identification, and pharmacoproteomics to be performed. By employing modular EC techniques, IMSAs seamlessly integrate and automatically update AI for advanced IMSA operations. The central challenge shared by these main functions is how to identify the classification of data sets in an ordered way. Because organizing very large data sets requires experimentation, in silico techniques are utilized in conjunction with wet lab procedures to decipher, via a process of trial and error, initial organizational models. Hybrid EC techniques, as used by IMSAs, are expected to be a major resource for the biological sciences in coming generations.
The pathway generation process is described in
An optimal simulation generation process using IMSAs is described in
Whereas the general simulation construction is performed as described above, the pathology protein pathway simulation generation model using IMSAs is described in the following figures. In
Of course, once the haplotypes are identified, it is necessary to develop solutions to the problem mutations.
Once the FP problem is identified and the solution designed and applied, it is necessary to test it.
Several main types of simulations in the present system correspond to the three main problem categories of the GPM, protein mutation analysis and pharmacoproteomics.
Dysfunctional protein pathway simulations consist of (1) mutation combination simulations, (2) reverse engineering simulations from disease to genetic mutation(s), (3) variable based scenario simulations based on dysfunctional protein operations, (4) simulations of pathway scenarios of dysfunctional protein interactions, (5) optimal pathway selection process simulations, and (6) simulations to identify the SP profile of mutant protein(s) from dysfunctional pathway analyses.
Active and interactive pharmacoproteomics process simulations consist of (1) simulations to design a custom solution to (combinations of) mutant protein topologies, (2) simulations to test solution candidates using pathway scenarios and updated feedback data, and (3) simulations to refine solutions using real data from the solution candidate feedback process.
Since Monte Carlo (MC) statistical simulation techniques are suited to molecular modeling processes,
At this point in the operation of the GPM, the analysis of the FP scenarios informs the reorganization process of the SP data sets (7110) while it also informs the GPM (7150) at a second level. Further analysis of SP data from the simulations re-sorts and reclassifies FP data (7160). Though SP data inform FP models, FP data facilitate the organization of SP and gene data sets through filtering and re-sorting processes. This analysis of the SP data then informs the SP inputs (7100), and the GPM is updated with increased probabilistic certainty (7170) at the third level. Simulations are then generated (7130) from this level of the GPM with greater efficiency and certainty. In this evolutionary way, the GPM is updatable and accumulates more detail about FP processes. While there are limits to the completeness of the GPM, the multiple passes of the GPM operation make the GPM an evolvable, dynamic meta-model from which simulations are generated.
From the GPM it is possible to generate simulations that provide a hypothetical testing approach to understanding protein operations. The simulations available from GPM data analyze not only a single protein's multiple vectors and variables on a cellular pathway but also the complex interoperation of multiple proteins. In these multiple probabilistic scenarios, various values and training weights change. The simulations order and constantly reorder data sets by rapidly testing probabilities within limited ranges in order to identify various aspects of the problem.
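Testing probabilities within limited ranges across many scenarios can be sketched as a small Monte Carlo estimate. The three-step pathway, the probability ranges and the function names below are hypothetical; the sketch assumes that the steps are independent and that each step's success probability is known only to lie within a given interval.

```python
import random

# Hypothetical pathway: each step succeeds with a probability known only
# to lie within a limited range (low, high). The values are illustrative.
STEP_RANGES = [(0.8, 0.9), (0.6, 0.8), (0.9, 1.0)]

def run_scenario(rng):
    """Simulate one scenario; return True if every step of the pathway succeeds."""
    for low, high in STEP_RANGES:
        p = rng.uniform(low, high)   # draw this scenario's step probability
        if rng.random() >= p:        # the step fails with probability 1 - p
            return False
    return True

def estimate_completion(trials=20000, seed=1):
    """Fraction of simulated scenarios in which the pathway completes."""
    rng = random.Random(seed)
    return sum(run_scenario(rng) for _ in range(trials)) / trials

estimate = estimate_completion()
print(round(estimate, 3))
```

Under these assumptions the estimate should approach the product of the mean step probabilities, 0.85 × 0.70 × 0.95 ≈ 0.565, as the number of trials grows.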
Since protein binding is a key protein function that affects degenerative diseases, it is important to model aggregation scenarios between proteins. Such simulations identify docking sequences, optimal binding criteria, and binding blockage potentials as well as drug interaction probabilities. The application of combinatorial geometry to these classes of aggregation problems assists in computer-aided identification and design of virtual small molecules.
A range of functional simulations is generated from information in the GPM, which, taken together, provides a powerful toolkit for biochemical researchers. In addition to forward motion simulations (from cause to effect), there are also backward motion simulations (from effect to cause). Reverse simulations trace multiple probable causes of a dysfunction. Forward simulations trace multiple pathway-centric protein interactions. Multivariate simulations, generated from different prospective assumptions, present different scenarios within varied probabilistic ranges. Simulations may focus on various types of proteins, such as on binding aspects of specific macro-molecules, or may focus on different angles of a binding site in order to analyze scenarios and probabilities. Simulations also accommodate and emulate the complex feedback mechanisms of protein-protein system adaptation, effects of which are not available in SP analyses. Simulations also analyze the potential pathways of protein behavior under specific disequilibria conditions. Finally, comparative analysis of simulations provides valuable information about dysfunctional operation and sharpens our understanding of functional protein-protein operations. These approaches assist in predicting protein behavior through simulation.
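The contrast between forward and reverse simulations can be sketched on a small directed graph of cause-to-effect relations. The graph, its node names and the helper functions are hypothetical and purely illustrative; a plain breadth-first traversal stands in for the far richer probabilistic simulations described here.

```python
from collections import deque

# Hypothetical cause -> effect edges; node names are illustrative only.
EDGES = {
    "mutation_1": ["protein_X_misfold"],
    "mutation_2": ["protein_Y_overexpression"],
    "protein_X_misfold": ["pathway_stall"],
    "protein_Y_overexpression": ["pathway_stall", "excess_signal"],
    "pathway_stall": ["dysfunction"],
}

def forward(start):
    """Forward simulation: every effect reachable from `start`."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in EDGES.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def reverse(effect):
    """Reverse simulation: every upstream node that can lead to `effect`."""
    parents = {}
    for cause, effects in EDGES.items():
        for e in effects:
            parents.setdefault(e, []).append(cause)
    seen, queue = set(), deque([effect])
    while queue:
        node = queue.popleft()
        for prev in parents.get(node, []):
            if prev not in seen:
                seen.add(prev)
                queue.append(prev)
    return seen

def root_causes(effect):
    """Upstream nodes that have no recorded cause of their own."""
    has_parent = {e for effects in EDGES.values() for e in effects}
    return sorted(n for n in reverse(effect) if n not in has_parent)

print(root_causes("dysfunction"))
print(sorted(forward("mutation_2")))
```

In this toy graph the reverse trace from the dysfunction returns both mutations as probable sources, while the forward trace from one mutation lists every downstream effect it can reach.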
Simulations are time-sensitive representations of systems of interactive molecular protein phenomena. Rather than simulate protein phenomena in a time-consistent way, FP simulations present time-asynchronous processes. Specific processes are accelerated or decelerated within specific equilibria conditions. Identification and understanding of the enzyme processes which may accelerate a protein pathway reaction a thousand-fold are central to understanding threshold event catalysts and the challenges of modeling these processes. The way that simulations are time-modulated, then, represents a novelty in the process of modeling functional proteomics.
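One way to picture time modulation is to advance each process on a clock scaled to its own rate. The sketch below is an assumption-laden illustration, not the method of the present system: it models a first-order decay with an explicit Euler step, sizes the step inversely to the reaction rate, and compares an uncatalyzed reaction with one assumed to be accelerated a thousand-fold.

```python
import math

def simulate_decay(rate, fraction_remaining=0.5, step_fraction=0.01):
    """Integrate dA/dt = -rate * A with a step sized to the process's own rate.

    Returns (elapsed_time, steps) needed for A to fall to `fraction_remaining`.
    """
    dt = step_fraction / rate   # faster processes get proportionally finer steps
    a, t, steps = 1.0, 0.0, 0
    while a > fraction_remaining:
        a -= rate * a * dt      # explicit Euler update
        t += dt
        steps += 1
    return t, steps

uncatalyzed = simulate_decay(rate=1.0)
catalyzed = simulate_decay(rate=1000.0)   # assumed thousand-fold acceleration

print("uncatalyzed half-life:", uncatalyzed[0], "exact:", math.log(2))
print("catalyzed half-life:  ", catalyzed[0])
print("steps used:", uncatalyzed[1], catalyzed[1])
```

Both runs take the same number of steps, but the simulated elapsed time differs by the acceleration factor, so a fast catalyzed event and a slow background process can be resolved with comparable effort.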
Finally, as we have seen in the evolvability of the GPM, multiple generations of simulations are required to accurately represent protein functional relationships. This process of simulation refinement is limited by the quality and quantity of our information about specific pathways and FP interactions.
It is understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and scope of the appended claims. All publications, patents, and patent applications cited herein are hereby incorporated by reference for all purposes in their entirety.