WO2009095319A1 - Cancer prognosis by majority voting - Google Patents

Publication number
WO2009095319A1
Authority
WO
WIPO (PCT)
Prior art keywords
expression
predetermined
threshold level
atomic
classifier
Prior art date
Application number
PCT/EP2009/050478
Other languages
German (de)
French (fr)
Inventor
Mathias Gehrmann
Christian VON TÖRNE
Original Assignee
Siemens Healthcare Diagnostics Gmbh
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Siemens Healthcare Diagnostics Gmbh
Publication of WO2009095319A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10 Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/118 Prognosis of disease development
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis

Definitions

  • the present invention relates to methods, kits and systems for the prognosis of cancer in untreated patients, preferably breast cancer patients. More specifically, the present invention relates to the prognosis of cancer based on measurements of the expression levels of marker genes in tumor samples. Marker genes are disclosed which allow for an accurate classification of cancer patients using a majority voting scheme, comprising multiple "atomic" classifiers.
  • Breast cancer is one of the leading causes of cancer death in women in western countries. Breast cancer claims the lives of approximately 40,000 women and is diagnosed in approximately 200,000 women annually in the United States alone. Over the last few decades, adjuvant systemic therapy has led to markedly improved survival in early breast cancer. This clinical experience has led to consensus recommendations offering adjuvant systemic therapy for the vast majority of breast cancer patients. In breast cancer a multitude of treatment options are available which can be applied in addition to the routinely performed surgical removal of the tumor and subsequent radiation of the tumor bed.
  • a possible outcome of cancer, in particular breast cancer, in untreated patients.
  • This is also referred to as the "prognosis" of breast cancer in a patient (as opposed to e.g. the "prediction" of the possible outcome of a cancer therapy).
  • Determination of the risk for an unfavorable outcome of a disease is a valuable basis for deciding on the best possible treatment strategy for a cancer patient.
  • Van't Veer et al. identified a prognostic signature consisting of 70 and 231 genes in a finding cohort of 78 sporadic breast cancers of node-negative women younger than 53 years of age (Van't Veer et al., 2002. Gene expression profiling predicts clinical outcome of breast cancer. Nature 415: 530-536; Van de Vijver et al., 2002. A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med 347: 1999-2009).
  • Majority voting is a known method for robust classification based on multivariate data (e.g. James, G. (1998) "Majority Vote Classifiers: Theory and Applications", Stanford University Doctoral Thesis, available online at http://www-rcf.usc.edu/~gareth/research/thesis.ps). This document does not disclose the application of majority voting schemes to the prediction of cancer from expression profiling data.
  • Majority voting classification has been applied to the classification of breast cancer patients in non-published Patent Applications PCT/EP2006/005717 claiming priority of GB0512299 (16 June 2005), and EP 06020209.0 of 27 September 2006.
  • the specific classification scheme of the present invention (using multiple multivariate atomic classifiers), however, is not disclosed.
  • the present invention fulfills the need for advanced methods for the prognosis of breast cancer on the basis of readily accessible experimental data.
  • the present invention is based on the surprising finding that the outcome of cancer, preferably breast cancer, in patients who do not receive chemotherapy can be accurately predicted from the expression levels of a small number of marker genes, using a majority voting classification scheme. Accordingly, the present invention relates to prognostic methods for the determination of the outcome of breast cancer in non-treated breast cancer patients, using information on the expression of a small number of highly informative marker genes. Methods of the invention use simple "atomic" classifiers, each based on a small number of marker genes, which can classify a tumor sample into two or more prognostic groups, such as a "high risk" group and a "low risk" group. The results of multiple atomic classifiers are combined by majority voting into an accurate and robust prognosis.
  • prognosis within the meaning of the invention, shall be understood to be the prediction of the outcome of a disease under conditions where no systemic chemotherapy is applied in the adjuvant setting.
  • prognosis is an estimation of the likelihood of metastasis-free survival of said patient over a predetermined period of time, e.g. over a period of 5 years.
  • said prognosis is an estimation of the likelihood of death of disease of said patient over a predetermined period of time, e.g. over a period of 5 years.
  • Tumor sample shall be understood to be any sample taken from a tumor of the respective patient.
  • Tumor samples can be taken by biopsy, needle biopsy, by surgery or any other way of obtaining samples of a tumor of a cancer patient.
  • Formalin-fixed paraffin embedded (FFPE) samples are preferred. Fresh-frozen samples can also be used.
  • a "majority vote", within the meaning of the invention, shall be understood to be any algorithm combining votes of multiple classifiers in terms of a majority decision.
  • the term "majority vote" thus takes the ordinary meaning of this term in the art of statistics.
  • a "weighted majority vote" is understood to be a majority vote in which the votes of the individual classifiers are weighted by multiplication with a suitable scaling factor. In a weighted majority vote at least some of the scaling factors are different from 1.
  • a "classifier", within the meaning of the invention, shall be understood to be any algorithm or scheme for classifying objects into one of multiple classes, based on properties of said object.
  • Preferred objects of this invention are cancer patients, and preferred properties of said patients, within this invention, are expression levels of marker genes.
  • a "marker gene", within the meaning of the invention, shall be understood as being any gene of a patient, the expression level of which provides information useful in the prognosis of cancer in said patient.
  • Preferred marker genes of the invention are those mentioned in the claims and examples section.
  • a "multivariate classifier", within the meaning of the invention, shall be understood to be a classifier which depends on more than one variable, i.e., depends on the expression level of more than one marker gene.
  • a "risk class", within the meaning of the invention, shall be understood to be a group of patients who are binned together according to their risk of unfavorable disease outcome.
  • Unfavorable disease outcome, in this regard, can be death of disease within a certain period of time, e.g. within 5 years.
  • "Risk classes" of the current invention can be "high risk" classes, "low risk" classes and optionally "intermediate risk" classes. Any other definition of "risk classes", however, can be assumed as well.
  • an "expression level", within the present invention, shall be understood to be any measure for the strength of expression of a gene in a tissue sample. In preferred embodiments of the invention, the average expression of a gene in a tissue sample is measured.
  • a "system for the prognosis of cancer", within the meaning of the invention, shall be understood to be any collection of equipment, hardware and software that is capable of performing a prognosis of cancer, e.g. in accordance with the present invention.
  • the system can be built from separate pieces of hardware (and software), but can also be integral, e.g. combined in a single piece of hardware, having all necessary features combined within a housing.
  • the present invention relates to a method of prognosis of cancer in a patient from a tumor sample of said patient comprising the steps of: determining the expression level of a first, a second and a third marker gene of multiple atomic classifiers, said atomic classifiers being multivariate classifiers; performing a classification of said sample into one of multiple risk classes, for each of said multiple atomic classifiers; performing a majority vote using the outcome of said multiple classifications.
  • said majority vote is a weighted majority vote. This leads to improved sensitivity and specificity of the method.
  • the method involves two consecutive majority votes. This can further improve the sensitivity and specificity of the methods of the invention.
  • said cancer is breast cancer or ovarian cancer.
  • said determination of expression levels is in a formalin-fixed paraffin embedded sample.
  • Formalin-fixed paraffin embedded samples are routinely prepared when tissue samples are taken from tumors of cancer patients.
  • said determination of expression levels is in a fresh-frozen sample.
  • the prognosis is a classification of said patient into one of two distinct classes, said classes being a "high risk" class and a "low risk" class.
  • said prognosis is a classification of the patient into one of three classes, said three classes corresponding to a "high risk" class, an "intermediate risk" class and a "low risk" class.
  • said risk is a risk of death of said patient within a predetermined period of time, e.g. within 5 years from the taking of the sample.
  • said multiple atomic classifiers are distinct atomic classifiers selected from the group consisting of:
  • (i) an atomic classifier predicting "high risk" if PGR expression is below a predetermined first threshold level and ESR1 expression is below a predetermined second threshold level; said atomic classifier predicting "low risk" if PGR expression is below said predetermined first threshold level and ESR1 expression is above said predetermined second threshold level; said atomic classifier predicting "low risk" if PGR expression is above said predetermined first threshold level and MLPH expression is below a predetermined third threshold level; and said atomic classifier predicting "high risk" if PGR expression is above said predetermined first threshold level and MLPH expression is above said predetermined third threshold level;
  • (ii) an atomic classifier predicting "high risk" if PGR expression is below a predetermined first threshold level and IL1R1 expression is below a predetermined second threshold level; said atomic classifier predicting "low risk" if PGR expression is below said predetermined first threshold level and IL1R1 expression is above said predetermined second threshold level; said atomic classifier predicting "low risk" if PGR expression is above said predetermined first threshold level and MLPH expression is below a predetermined third threshold level; and said atomic classifier predicting "high risk" if PGR expression is above said predetermined first threshold level and MLPH expression is above said predetermined third threshold level;
  • (iv) an atomic classifier predicting "low risk" if PGR expression is below a predetermined first threshold level and TOP2A expression is below a predetermined second threshold level; said atomic classifier predicting "high risk" if PGR expression is below said predetermined first threshold level and TOP2A expression is above said predetermined second threshold level; said atomic classifier predicting "low risk" if PGR expression is above said predetermined first threshold level and MLPH expression is below a predetermined third threshold level; and said atomic classifier predicting "high risk" if PGR expression is above said predetermined first threshold level and MLPH expression is above said predetermined third threshold level;
  • (v) an atomic classifier predicting "low risk" if PGR expression is below a predetermined first threshold level and UBE2C expression is below a predetermined second threshold level; said atomic classifier predicting "high risk" if PGR expression is below said predetermined first threshold level and UBE2C expression is above said predetermined second threshold level; said atomic classifier predicting "low risk" if PGR expression is above said predetermined first threshold level and MLPH expression is below a predetermined third threshold level; and said atomic classifier predicting "high risk" if PGR expression is above said predetermined first threshold level and MLPH expression is above said predetermined third threshold level;
  • atomic classifier (vi) and atomic classifier (vii) are used in the majority voting step.
  • all atomic classifiers of said group of atomic classifiers are used for majority voting.
  • only a single atomic classifier of said group of atomic classifiers is used for majority voting.
  • an expression level of a marker gene is substituted with the expression level of a substitute gene, said substitute gene being co-regulated with said marker gene.
  • the threshold for PGR expression is about 3.8
  • the threshold for ESR1 expression is about 6.2
  • the threshold for MLPH expression is about 11.4
  • the threshold for IL1R1 expression is about 7.2
  • the threshold for TOP2A expression is about 9.8
  • the threshold for UBE2C expression is about 10.0
  • the threshold for TOP2A expression is about 9.8 in atomic classifier (iv)
  • the threshold for TOP2A expression is about 8.5 in atomic classifier
  • the threshold for TOP2A expression is about 8.1 in atomic classifier
  • the threshold for GREM1 expression is about 9.0
  • the present invention also relates to a system for the prognosis of cancer in a patient from samples taken from said patient, said system comprising means for determining the expression level of a first, a second and a third marker gene of multiple atomic classifiers; means for performing multiple classifications of said sample into one of multiple risk classes, with each of said multiple atomic classifiers; means for performing a majority vote using the outcome of said multiple classifications.
  • said means for determining the expression level is a gene chip system or a real-time PCR system.
  • said means for performing multiple classifications is a separate personal computer, or a computer integral with the remaining components of the system.
  • the computer receives expression level measurements directly from said means for determining the expression level. This reduces labor needed to perform the methods of the invention on said system.
  • a user can choose between at least two majority voting schemes. This allows the outcomes of two or more distinct majority voting schemes to be compared, which helps to increase confidence in the obtained results.
  • the present invention further relates to methods of prognosis, wherein an expression level of a marker gene is substituted with the expression level of a substitute gene, said substitute gene being co-regulated with said marker gene.
  • a first gene is considered to be co-regulated with a second gene if the expression level of said first gene is increased when the expression level of said second gene is increased, and if the expression level of said first gene is decreased when the expression level of said second gene is decreased.
  • substitute genes are used as marker genes.
  • the substitute genes are normally those genes which are co-regulated with the original marker genes and thus carry the same information content as the original marker genes. Examples:
  • Tissue samples were collected from a large number of test objects afflicted with breast cancer. Clinical data on all test objects was available. In particular, data on the survival of the test objects over a period of 5 years was available for all test objects. Tissue samples were formalin-fixed paraffin embedded (FFPE) samples of tumors of the test objects. These samples are routinely prepared for cancer patients.
  • Methods of the present invention can also be applied to fresh frozen samples, but it is envisioned that the present methods are optimized for FFPE samples.
  • Test data was collected by determining the gene expression of a large number of potential marker genes in multiple test objects.
  • Test objects were divided into two groups, namely a "case" group (i.e. objects that died of cancer within 5 years after biopsy) and a "control" group (i.e. objects with >5 years survival after biopsy).
  • Atomic classifiers which showed a sensitivity greater than 95% (i.e. more than 95% of the test objects of the "case" group of the training set were classified "high risk"), and a specificity of >50% (i.e. [[... add definition of specificity ...]]) were selected.
  • Suitable cutoff values for the expression levels of the marker genes were determined from the bimodal distribution of the expression level measurements. [[... add more detailed description of determination of cutoff values here ...]].
  • a first atomic classifier is shown in Fig. 1. It predicts
  • Threshold levels are preferably 3.8 for PGR expression, 6.2 for ESR1, and 11.4 for MLPH.
  • a second atomic classifier is shown in Fig. 2. It predicts "high risk" if PGR expression is below a predetermined first threshold level and IL1R1 expression is below a predetermined second threshold level; it predicts "low risk" if PGR expression is below said predetermined first threshold level and IL1R1 expression is above said predetermined second threshold level; it predicts "low risk" if PGR expression is above said predetermined first threshold level and MLPH expression is below a predetermined third threshold level; and it predicts "high risk" if PGR expression is above said predetermined first threshold level and MLPH expression is above said predetermined third threshold level.
  • Threshold levels are preferably 3.8 for PGR, 7.2 for IL1R1, and 11.4 for MLPH.
  • a 3rd atomic classifier is shown in Fig. 3. Said classifier predicts "high risk" if PGR expression is below a predetermined first threshold level; it predicts "low risk" if PGR expression is above said predetermined first threshold level and MLPH expression is below a predetermined third threshold level; and it predicts "high risk" if PGR expression is above said predetermined first threshold level and MLPH expression is above said predetermined third threshold level. Threshold levels are preferably 3.8 for PGR and 11.4 for MLPH.
  • a 4th atomic classifier is shown in Fig. 4.
  • Said classifier predicts "low risk" if PGR expression is below a predetermined first threshold level and TOP2A expression is below a predetermined second threshold level; it predicts "high risk" if PGR expression is below said predetermined first threshold level and TOP2A expression is above said predetermined second threshold level; it predicts "low risk" if PGR expression is above said predetermined first threshold level and MLPH expression is below a predetermined third threshold level; and it predicts "high risk" if PGR expression is above said predetermined first threshold level and MLPH expression is above said predetermined third threshold level.
  • Preferred thresholds are 3.8 for PGR expression, 9.8 for TOP2A, and 11.4 for MLPH.
  • a 5th atomic classifier is shown in Fig. 5.
  • This classifier predicts "low risk" if PGR expression is below a predetermined first threshold level and UBE2C expression is below a predetermined second threshold level; it predicts "high risk" if PGR expression is below said predetermined first threshold level and UBE2C expression is above said predetermined second threshold level; it predicts "low risk" if PGR expression is above said predetermined first threshold level and MLPH expression is below a predetermined third threshold level; and it predicts "high risk" if PGR expression is above said predetermined first threshold level and MLPH expression is above said predetermined third threshold level.
  • Preferred thresholds for this classifier are 3.8 for PGR, 10.0 for UBE2C and 11.4 for MLPH.
  • the 1st to 5th atomic classifiers depend on the expression of the MLPH gene. They are referred to as the classifiers of the "MLPH cluster".
  • a 6th atomic classifier is shown in Fig. 6. This classifier predicts "high risk" if TOP2A expression is below a predetermined first threshold level and GREM1 expression is below a predetermined second threshold level; it predicts "low risk" if TOP2A expression is below said predetermined first threshold level and GREM1 expression is above said predetermined second threshold level; and it predicts "high risk" if TOP2A expression is above said predetermined first threshold level. Preferred thresholds for this classifier are 8.5 for TOP2A and 9.0 for GREM1.
  • a 7th atomic classifier is shown in Fig. 7.
  • This classifier predicts "low risk" if PLAU expression is below a predetermined first threshold level and MYBL2 expression is below a predetermined second threshold level; it predicts "high risk" if PLAU expression is below said predetermined first threshold level and MYBL2 expression is above said predetermined second threshold level; it predicts "low risk" if PLAU expression is above said predetermined first threshold level and TOP2A expression is below a predetermined third threshold level; and it predicts "high risk" if PLAU expression is above said predetermined first threshold level and TOP2A expression is above said predetermined third threshold level.
  • Preferred threshold levels are 8.6 for PLAU, 7.4 for MYBL2, and 8.1 for TOP2A.
  • the 6th and the 7th atomic classifiers are referred to as the "Cluster 5".
  • a first majority voting scheme is shown in Fig. 8. Seven atomic classifiers are used, each of them providing one vote (-1 corresponding to "low risk" and +1 corresponding to "high risk").
  • the average vote of atomic classifiers 1 to 5, i.e. the "MLPH cluster", is determined and compared to a predetermined threshold. A preferred threshold is zero.
  • the average vote of atomic classifiers 6 and 7 is calculated and compared to a predetermined threshold.
  • a preferred threshold is zero.
  • the result of the two comparisons is combined by comparing the average of the two average votes to a predetermined threshold. Again, a preferred threshold is zero.
  • An average vote of the two average votes above the threshold results in a final classification in the "high risk" group.
  • a second majority voting scheme is shown in Figure 9. It uses the same 7 atomic classifiers as were used in the previous scheme.
  • In the second majority voting scheme, however, a comparison step of the average votes from the "MLPH Cluster" and from the "Cluster 5" is not performed. Instead, a weighted vote is performed, in which the weighted average is computed according to the formula: (2 * MLPH + 5 * Cluster5) / 7, wherein MLPH is the average vote of the MLPH Cluster, and Cluster5 is the average vote of the Cluster 5.
  • the weighted average so computed is compared to a predetermined threshold, preferably to a zero threshold.
  • a third majority voting scheme uses only the two atomic classifiers of the Cluster 5 for the computation of an average vote.
  • the average vote of atomic classifiers (vi) and (vii) is then compared to a suitable threshold, preferably to a zero threshold.
  • Majority voting schemes can also be designed to classify a sample into one of multiple classes, e.g. three classes. These classes can be a "high risk" class, an "intermediate risk" class and a "low risk" class. In a majority voting scheme this can be achieved if not one, but multiple, e.g. two, thresholds are used for the final classification. For example, an average vote a can be compared to two threshold values t1 and t2, with t1 < t2, and a sample can then be classified "low risk" if a < t1; it can be classified "intermediate risk" if t1 ≤ a ≤ t2; and it can be classified "high risk" if t2 < a.
  • the average vote of the MLPH cluster and the Cluster 5 can only be -1, 0 or 1.
  • "low risk" is classified if the average vote is -1, "intermediate risk" is classified if the average vote is 0, and "high risk" is classified if the average vote is 1.
  • U-plasminogen activator Contains Urokinase-type plasminogen activator long chain A Urokinase-type plasminogen activator short chain A
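
The three majority voting schemes and the three-class variant described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the applicant's implementation. All function names are hypothetical. The treatment of an average that falls exactly on a threshold, the reduction of each cluster average to a single +1/-1 decision in the first scheme, and the cut-offs -0.5 and 0.5 in the usage lines are assumptions, since the text does not fix them.

```python
from typing import Sequence

HIGH, LOW = 1, -1  # vote encoding from the text: +1 "high risk", -1 "low risk"


def average(votes: Sequence[float]) -> float:
    """Plain arithmetic mean of a list of votes."""
    return sum(votes) / len(votes)


def cluster_decision(votes: Sequence[float], threshold: float = 0.0) -> int:
    """Compare a cluster's average vote to a threshold (preferred value: zero)."""
    # Assumption: an average exactly at the threshold is treated as "low risk".
    return HIGH if average(votes) > threshold else LOW


def two_stage_average(mlph_votes: Sequence[float],
                      cluster5_votes: Sequence[float],
                      threshold: float = 0.0) -> float:
    """First scheme: decide each cluster, then average the two decisions.

    Assumption: each cluster contributes +1 or -1, so the result is -1, 0 or 1.
    """
    return average([cluster_decision(mlph_votes, threshold),
                    cluster_decision(cluster5_votes, threshold)])


def weighted_average(mlph_votes: Sequence[float],
                     cluster5_votes: Sequence[float]) -> float:
    """Second scheme: (2 * MLPH + 5 * Cluster5) / 7 on the cluster averages."""
    return (2 * average(mlph_votes) + 5 * average(cluster5_votes)) / 7


def two_class(a: float, threshold: float = 0.0) -> str:
    """Final two-class call: an average vote above the threshold is "high risk"."""
    return "high risk" if a > threshold else "low risk"


def three_class(a: float, t1: float, t2: float) -> str:
    """Three-class call with two thresholds t1 < t2."""
    if a < t1:
        return "low risk"
    if a > t2:
        return "high risk"
    return "intermediate risk"


# Usage with made-up votes: classifiers 1 to 5, then classifiers 6 and 7.
mlph = [1, 1, 1, -1, -1]
cluster5 = [-1, -1]
first = three_class(two_stage_average(mlph, cluster5), t1=-0.5, t2=0.5)
second = two_class(weighted_average(mlph, cluster5))
third = two_class(average(cluster5))  # third scheme: Cluster 5 only
```

With these made-up votes the two clusters disagree, so the first scheme lands in the intermediate class, while the second and third schemes, which give more weight to Cluster 5, follow its "low risk" votes.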

Abstract

The present invention relates to methods, kits and systems for the prognosis of cancer in untreated patients. More preferably, the present invention relates to the prognosis of breast cancer based on measurements of the expression levels of marker genes in tumor samples of breast cancer patients using a majority voting scheme. Marker genes are disclosed which allow for an accurate prognosis of cancer in cancer patients.

Description

CANCER PROGNOSIS BY MAJORITY VOTING
Technical Field
The present invention relates to methods, kits and systems for the prognosis of cancer in untreated patients, preferably breast cancer patients. More specifically, the present invention relates to the prognosis of cancer based on measurements of the expression levels of marker genes in tumor samples. Marker genes are disclosed which allow for an accurate classification of cancer patients using a majority voting scheme, comprising multiple "atomic" classifiers.
Background of the Invention
Breast cancer is one of the leading causes of cancer death in women in western countries. Breast cancer claims the lives of approximately 40,000 women and is diagnosed in approximately 200,000 women annually in the United States alone. Over the last few decades, adjuvant systemic therapy has led to markedly improved survival in early breast cancer. This clinical experience has led to consensus recommendations offering adjuvant systemic therapy for the vast majority of breast cancer patients. In breast cancer a multitude of treatment op- tions are available which can be applied in addition to the routinely performed surgical removal of the tumor and subsequent radiation of the tumor bed.
Yet most, if not all of the different drug treatments have numerous potential adverse effects which can severely impair patients' quality of life. This makes it mandatory to select the treatment strategy on the basis of a careful risk assessment for the individual patient to avoid over- as well as under treatment.
Of particular importance, in this regard, is an assessment of a possible outcome of cancer, in particular, breast cancer, in untreated patients. This is also referred to the "progno- sis" of breast cancer in a patient (as opposed to e.g. the "prediction" of the possible outcome of a cancer therapy) . Determination of the risk for an unfavorable outcome of a disease (e.g. death of disease within a predetermined period of time) is a valuable basis for deciding on the best possible treatment strategy for a cancer patient.
Expression levels of marker genes could be linked to cancer prognosis by several investigators using supervised analysis methods that are assumed to be more appropriate for class prediction studies. Van ' t Veer et al . identified a prognostic signature consisting of 70 and 231 genes in a finding cohort of 78 sporadic breast cancers of node negative women younger than 53 years of age (Van't Veer et al . , 2002. Gene expres- sion profiling predicts clinical outcome of breast cancer. Nature 415: 530-536; Van de Vijver et al . , 2002. A gene- expression signature as a predictor of survival in breast cancer. N Eng J Med 347 : 1999-2009) . They used a case versus control statistics, with development of metastasis within five years defined as case and disease free survival of more than five years as control, and found that the expression values of at least 70 genes could be used to calculate an average "good prognosis" profile. Unknown tumor samples were classified by correlation of the gene expression of these 70 genes to the good prognosis signature.
Majority voting is a known method for robust classification based on multivariate data (e.g. James, G. (1998) "Majority Vote Classifiers: Theory and Applications", Stanford University Doctoral Thesis, available online at http://www-rcf.usc.edu/~gareth/research/thesis.ps). This document does not disclose the application of majority voting schemes to the prediction of cancer from expression profiling data.
Majority voting classification has been applied to the classification of breast cancer patients in non-published Patent Applications PCT/EP2006/005717 claiming priority of GB0512299 (16 June 2005), and EP 06020209.0 of 27 September 2006. The specific classification scheme of the present invention (using multiple multivariate atomic classifiers), however, is not disclosed.
In regard to the continuing need for materials and methods useful in making clinical decisions on cancer therapy, the present invention fulfills the need for advanced methods for the prognosis of breast cancer on the basis of readily accessible experimental data.
Summary of the Invention
The present invention is based on the surprising finding that the outcome of cancer, preferably breast cancer, in patients who do not receive chemotherapy can be accurately predicted from the expression levels of a small number of marker genes, using a majority voting classification scheme. Accordingly, the present invention relates to prognostic methods for the determination of the outcome of breast cancer in non-treated breast cancer patients, using information on the expression of a small number of highly informative marker genes. Methods of the invention use simple "atomic" classifiers, each based on a small number of marker genes, which can classify a tumor sample into two or more prognostic groups, such as a "high risk" group and a "low risk" group. The results of multiple atomic classifiers are combined by majority voting into an accurate and robust prognosis.
Detailed description of the Invention
"Prognosis", within the meaning of the invention, shall be understood to be the prediction of the outcome of a disease under conditions where no systemic chemotherapy is applied in the adjuvant setting. In preferred methods of the invention, prognosis is an estimation of the likelihood of metastasis-free survival of said patient over a predetermined period of time, e.g. over a period of 5 years. In further preferred methods of the invention, said prognosis is an estimation of the likelihood of death of disease of said patient over a predetermined period of time, e.g. over a period of 5 years.
A "tumor sample" shall be understood to be any sample taken from a tumor of the respective patient. Tumor samples can be taken by biopsy, needle biopsy, by surgery or any other way of obtaining samples of a tumor of a cancer patient. Formalin-fixed paraffin embedded (FFPE) samples are preferred. Fresh-frozen samples can also be used.
A "majority vote", within the meaning of the invention, shall be understood to be any algorithm combining votes of multiple classifiers in terms of a majority decision. The term "majority vote" thus takes the ordinary meaning of this term in the art of statistics.
A "weighted majority vote" is understood to be a majority vote in which the votes of the individual classifiers are weighted by multiplication with a suitable scaling factor. In a weighted majority vote at least some of the scaling factors are different from 1.
A "consecutive majority vote", within the meaning of the invention, shall be understood to be a majority vote that combines the outcome of multiple votes, at least one of which multiple votes represents the outcome of a majority vote itself.
A "classifier", within the meaning of the invention, shall be understood to be any algorithm or scheme for classifying objects into one of multiple classes, based on properties of said object. Preferred objects of this invention are cancer patients, and preferred properties of said patients, within this invention, are expression levels of marker genes.
A "marker gene", within the meaning of the invention, shall be understood as being any gene of a patient, the expression level of which provides information useful in the prognosis of cancer in said patient. Preferred marker genes of the invention are those mentioned in the claims and examples section.
A "multivariate classifier", within the meaning of the invention, shall be understood to be a classifier which depends on more than one variable, i.e., depends on the expression level of more than one marker gene.
A "risk class", within the meaning of the invention, shall be understood to relate to a multitude of patients which are binned in a group of patients according to their risk of unfavorable disease outcome. Unfavorable disease outcome, in this regard, can be death of disease within a certain period of time, e.g. within 5 years. "Risk classes" of the current invention can be "high risk" classes, "low risk" classes and optionally "intermediate risk" classes. Any other definition of "risk classes", however, can be assumed as well.
An "expression level", within the present invention, shall be understood to be any measure for the strength of expression of a gene in a tissue sample. In preferred embodiments of the invention, the average expression of a gene in a tissue sample is measured.
A "system for the prognosis of cancer", within the meaning of the invention, shall be understood to be any collection of equipment, hardware and software that is capable of performing a prognosis of cancer, e.g. in accordance with the present invention. The system can be built from separate pieces of hardware (and software), but can also be integral, e.g. combined in a single piece of hardware, having all necessary features combined within a housing.
The present invention relates to a method of prognosis of cancer in a patient from a tumor sample of said patient comprising the steps of: determining the expression level of a first, a second and a third marker gene of multiple atomic classifiers, said atomic classifiers being multivariate classifiers; performing a classification of said sample into one of multiple risk classes, for each of said multiple atomic classifiers; performing a majority vote using the outcome of said multiple classifications.
In preferred methods of the invention, said majority vote is a weighted majority vote. This leads to improved sensitivity and specificity of the method.
In preferred methods of the invention, the method involves two consecutive majority votes. This can further improve the sensitivity and specificity of the methods of the invention.
In preferred methods of the invention, said cancer is breast cancer or ovarian cancer.
In preferred methods of the invention, said determination of expression levels is in a formalin-fixed paraffin embedded sample. Formalin-fixed paraffin embedded samples are routinely prepared when tissue samples are taken from tumors of cancer patients. Alternatively, said determination of expression levels is in a fresh-frozen sample.
In preferred methods of the invention, the prognosis is a classification of said patient into one of two distinct classes, said classes being a "high risk" class and a "low risk" class. Alternatively, said prognosis is a classification of the patient into one of three classes, said three classes corresponding to a "high risk" class, an "intermediate risk" class and a "low risk" class. In preferred methods of the invention, said risk is a risk of death of said patient within a predetermined period of time, e.g. within 5 years from the taking of the sample.
In preferred methods of the invention, said multiple atomic classifiers are distinct atomic classifiers selected from the group consisting of:
(i) an atomic classifier predicting "high risk" if PGR expression is below a predetermined first threshold level and ESR1 expression is below a predetermined second threshold level; said atomic classifier predicting "low risk" if PGR expression is below said predetermined first threshold level and ESR1 expression is above said predetermined second threshold level; said atomic classifier predicting "low risk" if PGR expression is above said predetermined first threshold level and MLPH expression is below a predetermined third threshold level; and said atomic classifier predicting "high risk" if PGR expression is above said predetermined first threshold level and MLPH expression is above said predetermined third threshold level;
(ii) an atomic classifier predicting "high risk" if PGR expression is below a predetermined first threshold level and IL1R1 expression is below a predetermined second threshold level; said atomic classifier predicting "low risk" if PGR expression is below said predetermined first threshold level and IL1R1 expression is above said predetermined second threshold level; said atomic classifier predicting "low risk" if PGR expression is above said predetermined first threshold level and MLPH expression is below a predetermined third threshold level; and said atomic classifier predicting "high risk" if PGR expression is above said predetermined first threshold level and MLPH expression is above said predetermined third threshold level;
(iii) an atomic classifier predicting "high risk" if PGR expression is below a predetermined first threshold level; said atomic classifier predicting "low risk" if PGR expression is above said predetermined first threshold level and MLPH expression is below a predetermined third threshold level; and said atomic classifier predicting "high risk" if PGR expression is above said predetermined first threshold level and MLPH expression is above said predetermined third threshold level;
(iv) an atomic classifier predicting "low risk" if PGR expression is below a predetermined first threshold level and TOP2A expression is below a predetermined second threshold level; said atomic classifier predicting "high risk" if PGR expression is below said predetermined first threshold level and TOP2A expression is above said predetermined second threshold level; said atomic classifier predicting "low risk" if PGR expression is above said predetermined first threshold level and MLPH expression is below a predetermined third threshold level; and said atomic classifier predicting "high risk" if PGR expression is above said predetermined first threshold level and MLPH expression is above said predetermined third threshold level;
(v) an atomic classifier predicting "low risk" if PGR expression is below a predetermined first threshold level and UBE2C expression is below a predetermined second threshold level; said atomic classifier predicting "high risk" if PGR expression is below said predetermined first threshold level and UBE2C expression is above said predetermined second threshold level; said atomic classifier predicting "low risk" if PGR expression is above said predetermined first threshold level and MLPH expression is below a predetermined third threshold level; and said atomic classifier predicting "high risk" if PGR expression is above said predetermined first threshold level and MLPH expression is above said predetermined third threshold level;
(vi) an atomic classifier predicting "high risk" if TOP2A expression is below a predetermined first threshold level and GREM1 expression is below a predetermined second threshold level; said atomic classifier predicting "low risk" if TOP2A expression is below said predetermined first threshold level and GREM1 expression is above said predetermined second threshold level; said atomic classifier predicting "high risk" if TOP2A expression is above said predetermined first threshold level; and
(vii) an atomic classifier predicting "low risk" if PLAU expression is below a predetermined first threshold level and MYBL2 expression is below a predetermined second threshold level; said atomic classifier predicting "high risk" if PLAU expression is below said predetermined first threshold level and MYBL2 expression is above said predetermined second threshold level; said atomic classifier predicting "low risk" if PLAU expression is above said predetermined first threshold level and TOP2A expression is below a predetermined third threshold level; and said atomic classifier predicting "high risk" if PLAU expression is above said predetermined first threshold level and TOP2A expression is above said predetermined third threshold level.
In preferred methods of the invention only atomic classifier (vi) and atomic classifier (vii), above, are used in the majority voting step. Alternatively, all atomic classifiers of said group of atomic classifiers are used for majority voting. In yet another alternative, only a single atomic classifier of said group of atomic classifiers is used for majority voting.
In preferred methods of the invention, an expression level of a marker gene is substituted with the expression level of a substitute gene, said substitute gene being co-regulated with said marker gene.
In preferred methods of the invention the threshold for PGR expression is about 3.8, the threshold for ESR1 expression is about 6.2, the threshold for MLPH expression is about 11.4, the threshold for IL1R1 expression is about 7.2, the threshold for UBE2C expression is about 10.0, the threshold for TOP2A expression is about 9.8 in atomic classifier (iv), about 8.5 in atomic classifier (vi), and about 8.1 in atomic classifier (vii), the threshold for GREM1 expression is about 9.0, the threshold for PLAU expression is about 8.6, and the threshold for MYBL2 expression is about 7.4.
The present invention also relates to a system for the prognosis of cancer in a patient from samples taken from said patient, said system comprising means for determining the expression level of a first, a second and a third marker gene of multiple atomic classifiers; means for performing multiple classifications of said sample into one of multiple risk classes, with each of said multiple atomic classifiers; means for performing a majority vote using the outcome of said multiple classifications. In preferred systems of the invention, said means for determining the expression level is a gene chip system or a real-time PCR system.
In preferred systems of the invention, said means for performing multiple classifications is a separate personal computer, or a computer integral with the remaining components of the system.
In preferred systems of the invention, the computer receives expression level measurements directly from said means for determining the expression level. This reduces labor needed to perform the methods of the invention on said system.
In preferred systems of the invention, a user can choose between at least two majority voting schemes. This allows the outcomes of two or more distinct majority voting schemes to be compared, which helps to increase confidence in the obtained results.
The present invention further relates to methods of prognosis, wherein an expression level of a marker gene is substituted with the expression level of a substitute gene, said substitute gene being co-regulated with said marker gene. A first gene is considered to be co-regulated with a second gene if the expression level of said first gene is increased when the expression level of said second gene is increased, and if the expression level of said first gene is decreased when the expression level of said second gene is decreased.
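The co-regulation criterion can be illustrated with a small concordance check over paired expression measurements. The sketch below is only one possible reading: the text does not name a statistical test, so the pairwise sign-agreement measure, the function names `concordance` and `is_coregulated`, and the 0.9 cutoff are assumptions made for illustration.

```python
from itertools import combinations


def concordance(marker_levels, substitute_levels):
    """Fraction of sample pairs in which both genes change in the same direction.

    Both arguments are expression levels measured in the same samples, in the
    same order. Pairs in which either gene does not change are ignored.
    """
    if len(marker_levels) != len(substitute_levels):
        raise ValueError("both genes must be measured in the same samples")
    agree = 0
    informative = 0
    for i, j in combinations(range(len(marker_levels)), 2):
        d_marker = marker_levels[i] - marker_levels[j]
        d_substitute = substitute_levels[i] - substitute_levels[j]
        if d_marker == 0 or d_substitute == 0:
            continue
        informative += 1
        if (d_marker > 0) == (d_substitute > 0):
            agree += 1
    return agree / informative if informative else 0.0


def is_coregulated(marker_levels, substitute_levels, min_concordance=0.9):
    # min_concordance is a hypothetical cutoff, not a value from the text.
    return concordance(marker_levels, substitute_levels) >= min_concordance
```

A gene whose level rises and falls together with the marker gene across samples gives a concordance near 1, whereas an inversely regulated gene gives a value near 0.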
Also envisioned are methods in which substitute genes are used as marker genes. The substitute genes are normally those genes which are co-regulated with the original marker genes and thus carry the same information content as the original marker genes.
Examples:
Tumor sample selection
Tissue samples were collected from a large number of test objects afflicted with breast cancer. Clinical data on all test objects was available. In particular, data on the survival of the test objects over a period of 5 years was available for all test objects. Tissue samples were formalin-fixed paraffin embedded (FFPE) samples of tumors of the test objects. These samples are routinely prepared for cancer patients.
Methods of the present invention can also be applied to fresh frozen samples, but it is envisioned that the present methods are optimized for FFPE samples.
Determination of expression levels
[[... detailed description of expression level measurements, or reference to standard method, must be the method that results in values for comparison with the thresholds cited in the claims ... ] ]
Identification of marker genes
Test data was collected by determining the gene expression of a large number of potential marker genes in multiple test objects. Test objects were divided into two groups, namely a "case" group (i.e. objects that died of cancer within 5 years after biopsy) and a "control" group (i.e. objects with >5 years survival after biopsy).
70% of the available test data was used as a "training set" and 30% of the available data was used as a "test set" for validation. Only the training set was used for the identification of suitable marker genes. Genes were selected that showed a well measurable bimodal distribution of the expression levels in the case and control group of the training set. Genes that showed a significant difference in their mean expression level, taking into account the standard deviation in the respective expression level measurements, were preferred.
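The 70%/30% partition described above can be sketched as follows. The text does not state how test objects were assigned to the two sets, so the random shuffle, the fixed seed and the function name `split_train_test` are assumptions made for illustration.

```python
import random


def split_train_test(samples, train_fraction=0.7, seed=0):
    """Return (training_set, test_set) holding about 70% / 30% of the samples.

    The assignment is a seeded random shuffle (an assumption; the source does
    not describe the assignment procedure).
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical sample identifiers, for illustration only.
training_set, test_set = split_train_test([f"sample_{i}" for i in range(10)])
```

Marker gene selection and threshold determination would then use `training_set` only, with `test_set` held back for validation.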
Identification of atomic classifiers
With a small set of marker genes, "atomic" classifiers were constructed. Atomic classifiers which showed a sensitivity greater than 95% (i.e. more than 95% of the test objects of the "case" group of the training set were classified "high risk") and a specificity of >50% (i.e. [[... add definition of specificity ...]]) were selected. Suitable cutoff values for the expression levels of the marker genes were determined from the bimodal distribution of the expression level measurement. [[... add more detailed description of determination of cutoff values here ...]].
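The selection criterion for atomic classifiers can be sketched as below. Sensitivity follows the definition given in the text. Specificity is not defined in the text, so the usual statistical definition (the fraction of "control" objects classified "low risk") is assumed here. The function names are hypothetical.

```python
def sensitivity_specificity(outcomes, predictions):
    """Compute (sensitivity, specificity) for one atomic classifier.

    outcomes:    "case" or "control" for each test object
    predictions: "high risk" or "low risk" for the same objects
    Assumes at least one case and one control are present.
    """
    case_predictions = [p for o, p in zip(outcomes, predictions) if o == "case"]
    control_predictions = [p for o, p in zip(outcomes, predictions) if o == "control"]
    sensitivity = case_predictions.count("high risk") / len(case_predictions)
    # Assumed standard definition; the source leaves specificity undefined.
    specificity = control_predictions.count("low risk") / len(control_predictions)
    return sensitivity, specificity


def passes_selection(outcomes, predictions, min_sensitivity=0.95, min_specificity=0.50):
    sensitivity, specificity = sensitivity_specificity(outcomes, predictions)
    return sensitivity > min_sensitivity and specificity > min_specificity
```

With these cutoffs a classifier may misclassify up to half of the controls but almost none of the cases, which favors avoiding undertreatment.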
Using these atomic classifiers, a majority voting scheme was constructed to obtain a global classifier combining the outcomes of the atomic classifiers, thus rendering the classification more accurate and robust. An exhaustive combinatorial approach was taken to identify the best possible combination of marker genes and classification rules.
The following atomic classifiers were identified:
A first atomic classifier is shown in Fig. 1. It predicts "high risk" if PGR expression is below a predetermined first threshold level and ESR1 expression is below a predetermined second threshold level; it predicts "low risk" if PGR expression is below said predetermined first threshold level and ESR1 expression is above said predetermined second threshold level; it predicts "low risk" if PGR expression is above said predetermined first threshold level and MLPH expression is below a predetermined third threshold level; and it predicts "high risk" if PGR expression is above said predetermined first threshold level and MLPH expression is above said predetermined third threshold level. Threshold levels are preferably 3.8 for PGR expression, 6.2 for ESR1, and 11.4 for MLPH.
A second atomic classifier is shown in Fig. 2. It predicts "high risk" if PGR expression is below a predetermined first threshold level and IL1R1 expression is below a predetermined second threshold level; it predicts "low risk" if PGR expression is below said predetermined first threshold level and IL1R1 expression is above said predetermined second threshold level; it predicts "low risk" if PGR expression is above said predetermined first threshold level and MLPH expression is below a predetermined third threshold level; and it predicts "high risk" if PGR expression is above said predetermined first threshold level and MLPH expression is above said predetermined third threshold level. Threshold levels are preferably 3.8 for PGR, 7.2 for IL1R1, and 11.4 for MLPH.
A 3rd atomic classifier is shown in Fig. 3. Said classifier predicts "high risk" if PGR expression is below a predetermined first threshold level; it predicts "low risk" if PGR expression is above said predetermined first threshold level and MLPH expression is below a predetermined third threshold level; and it predicts "high risk" if PGR expression is above said predetermined first threshold level and MLPH expression is above a predetermined third threshold level. Threshold levels are preferably 3.8 for PGR and 11.4 for MLPH.
A 4th atomic classifier is shown in Fig. 4. Said classifier predicts "low risk" if PGR expression is below a predetermined first threshold level and TOP2A expression is below a predetermined second threshold level; it predicts "high risk" if PGR expression is below said predetermined first threshold level and TOP2A expression is above said predetermined second threshold level; it predicts "low risk" if PGR expression is above said predetermined first threshold level and MLPH expression is below a predetermined third threshold level; and it predicts "high risk" if PGR expression is above said predetermined first threshold level and MLPH expression is above said predetermined third threshold level. Preferred thresholds are 3.8 for PGR expression, 9.8 for TOP2A, and 11.4 for MLPH.
A 5th atomic classifier is shown in Fig. 5. This classifier predicts "low risk" if PGR expression is below a predetermined first threshold level and UBE2C expression is below a predetermined second threshold level; it predicts "high risk" if PGR expression is below said predetermined first threshold level and UBE2C expression is above said predetermined second threshold level; it predicts "low risk" if PGR expression is above said predetermined first threshold level and MLPH expression is below a predetermined third threshold level; and it predicts "high risk" if PGR expression is above said predetermined first threshold level and MLPH expression is above said predetermined third threshold level. Preferred thresholds for this classifier are 3.8 for PGR, 10.0 for UBE2C and 11.4 for MLPH.
The 1st to 5th atomic classifiers depend on the expression of the MLPH gene. They are referred to as the classifiers of the "MLPH cluster".
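The five classifiers of the "MLPH cluster" share the PGR and MLPH thresholds and differ only in the branch taken when PGR expression is below its threshold, so they can be written as one table-driven rule. The following is a minimal sketch using the preferred thresholds stated above. The treatment of values exactly equal to a threshold (counted as "above"), the function and variable names, and the example expression values are assumptions; the text does not specify them.

```python
HIGH, LOW = "high risk", "low risk"

PGR_THRESHOLD = 3.8
MLPH_THRESHOLD = 11.4

# Branch taken when PGR is below its threshold, per classifier number:
# (second gene, threshold, outcome if below, outcome if above).
# Classifier 3 has no second gene and always predicts "high risk" here.
PGR_LOW_BRANCH = {
    1: ("ESR1", 6.2, HIGH, LOW),
    2: ("IL1R1", 7.2, HIGH, LOW),
    3: None,
    4: ("TOP2A", 9.8, LOW, HIGH),
    5: ("UBE2C", 10.0, LOW, HIGH),
}


def classify_mlph_cluster(number, expression):
    """Apply atomic classifier 1..5 to a dict of gene -> expression level."""
    if expression["PGR"] >= PGR_THRESHOLD:  # equality counted as "above" (assumption)
        return LOW if expression["MLPH"] < MLPH_THRESHOLD else HIGH
    rule = PGR_LOW_BRANCH[number]
    if rule is None:
        return HIGH
    gene, threshold, if_below, if_above = rule
    return if_below if expression[gene] < threshold else if_above


# Hypothetical expression levels, for illustration only.
sample = {"PGR": 3.0, "ESR1": 5.0, "IL1R1": 8.0, "TOP2A": 9.0, "UBE2C": 10.5, "MLPH": 12.0}
results = [classify_mlph_cluster(n, sample) for n in range(1, 6)]
```

For this hypothetical sample, classifiers 1, 3 and 5 return "high risk" and classifiers 2 and 4 return "low risk", which shows how the atomic classifiers can disagree on the same tumor sample and why their outcomes are combined by voting.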
A 6th atomic classifier is shown in Fig. 6. This classifier predicts "high risk" if TOP2A expression is below a predetermined first threshold level and GREM1 expression is below a predetermined second threshold level; it predicts "low risk" if TOP2A expression is below said predetermined first threshold level and GREM1 expression is above said predetermined second threshold level; and it predicts "high risk" if TOP2A expression is above said predetermined first threshold level. Preferred thresholds for this classifier are 8.5 for TOP2A and 9.0 for GREM1.
A 7th atomic classifier is shown in Fig. 7. This classifier predicts "low risk" if PLAU expression is below a predetermined first threshold level and MYBL2 expression is below a predetermined second threshold level; it predicts "high risk" if PLAU expression is below said predetermined first threshold level and MYBL2 expression is above said predetermined second threshold level; it predicts "low risk" if PLAU expression is above said predetermined first threshold level and TOP2A expression is below a predetermined third threshold level; and it predicts "high risk" if PLAU expression is above said predetermined first threshold level and TOP2A expression is above said predetermined third threshold level. Preferred threshold levels are 8.6 for PLAU, 7.4 for MYBL2, and 8.1 for TOP2A.
The 6th and the 7th atomic classifiers are referred to as the "Cluster 5".
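The two classifiers of "Cluster 5" can be sketched as small decision functions that return votes in the -1 ("low risk") / +1 ("high risk") encoding used for the majority voting schemes. The preferred thresholds are taken from the text. Counting a value equal to a threshold as "above", the function names and the example expression values are assumptions.

```python
HIGH, LOW = +1, -1  # vote encoding: +1 = "high risk", -1 = "low risk"


def classifier_6(expression, top2a_t=8.5, grem1_t=9.0):
    """6th atomic classifier: TOP2A first, then GREM1."""
    if expression["TOP2A"] >= top2a_t:  # equality counted as "above" (assumption)
        return HIGH
    return HIGH if expression["GREM1"] < grem1_t else LOW


def classifier_7(expression, plau_t=8.6, mybl2_t=7.4, top2a_t=8.1):
    """7th atomic classifier: PLAU first, then MYBL2 or TOP2A."""
    if expression["PLAU"] < plau_t:
        return LOW if expression["MYBL2"] < mybl2_t else HIGH
    return LOW if expression["TOP2A"] < top2a_t else HIGH


# Hypothetical expression levels, for illustration only.
sample = {"TOP2A": 8.3, "GREM1": 9.5, "PLAU": 9.0, "MYBL2": 7.0}
votes = [classifier_6(sample), classifier_7(sample)]
```

Note that TOP2A is compared against a different threshold in each classifier (8.5 and 8.1), so the same measurement can count as "below" in one and "above" in the other, as it does for the hypothetical sample here.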
It was found that the above atomic predictors can be used in a majority voting scheme to achieve a more accurate and robust prediction of cancer.
Majority voting
The following majority voting schemes gave particularly good results.
A first majority voting scheme is shown in Fig. 8. Seven atomic classifiers are used, each of them providing one vote (-1 corresponding to "low risk" and +1 corresponding to "high risk"). The average vote of atomic classifiers 1 to 5 (i.e. the "MLPH cluster") is determined and compared to a predetermined threshold. A preferred threshold is zero. Then, the average vote of atomic classifiers 6 and 7 (the "Cluster 5") is calculated and compared to a predetermined threshold. A preferred threshold is zero. Finally, the results of the two comparisons are combined by comparing the average of the two average votes to a predetermined threshold. Again, a preferred threshold is zero. An average vote of the two average votes above the threshold results in a final classification in the "high risk" group.
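The two-stage scheme can be sketched as follows. This is one reading of the description, under two assumptions that the text does not spell out: each cluster comparison is reduced to a single -1/+1 vote before the final averaging, and an average exactly equal to the threshold counts as "low risk" (only values above the threshold give "high risk"). Function names are hypothetical.

```python
from statistics import mean


def cluster_vote(votes, threshold=0.0):
    """First stage: reduce a cluster's -1/+1 votes to one -1/+1 vote."""
    return 1 if mean(votes) > threshold else -1


def consecutive_majority_vote(mlph_votes, cluster5_votes, threshold=0.0):
    """Second stage: average the two cluster votes and compare to the threshold."""
    combined = mean([
        cluster_vote(mlph_votes, threshold),
        cluster_vote(cluster5_votes, threshold),
    ])
    return "high risk" if combined > threshold else "low risk"
```

Under this reading both clusters must vote "high risk" for the final result to be "high risk", because a split decision averages to exactly zero.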
A second majority voting scheme is shown in Figure 9. It uses the same 7 atomic classifiers as were used in the previous scheme. In the second majority voting scheme, however, the step of comparing the average votes from the "MLPH Cluster" and from the "Cluster 5" to a threshold is not performed. Instead a weighted vote is performed, in which the weighted average is computed according to the formula: (2 * MLPH + 5 * Cluster5) / 7, wherein MLPH is the average vote of the MLPH Cluster, and Cluster5 is the average vote of the Cluster 5. The weighted average so computed is compared to a predetermined threshold, preferably to a zero threshold.
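The weighted scheme amounts to a one-line computation. The sketch below applies the formula (2 * MLPH + 5 * Cluster5) / 7 with a zero threshold. The function name and the handling of a weighted average exactly equal to the threshold (treated as "low risk") are assumptions.

```python
from statistics import mean


def weighted_majority_vote(mlph_votes, cluster5_votes, threshold=0.0):
    """Return (weighted average, risk class) for -1/+1 votes of the two clusters."""
    mlph = mean(mlph_votes)          # average vote of classifiers 1 to 5
    cluster5 = mean(cluster5_votes)  # average vote of classifiers 6 and 7
    weighted = (2 * mlph + 5 * cluster5) / 7
    return weighted, ("high risk" if weighted > threshold else "low risk")
```

Because Cluster 5 carries weight 5 against weight 2 for the MLPH cluster, a unanimous Cluster 5 always determines the result: even with all five MLPH classifiers voting the other way, the weighted average is 3/7 in magnitude with the sign of Cluster 5.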
A third majority voting scheme uses only the two atomic classifiers of the Cluster 5 for the computation of an average vote. The average vote of atomic classifiers (vi) and (vii) is then compared to a suitable threshold, preferably to a zero threshold.
Majority voting schemes can also be designed to classify a sample into one of multiple classes, e.g. three classes. These classes can be a "high risk" class, an "intermediate risk" class and a "low risk" class. In a majority voting scheme this can be achieved if not one, but multiple, e.g. two, thresholds are used for the final classification. For example, an average vote a can be compared to two threshold values t1 and t2, with t1 < t2, and a sample can then be classified "low risk", if a < t1; it can be classified "intermediate risk", if t1 < a < t2; and it can be classified "high risk", if t2 < a.
In a majority voting scheme of Figure 8, the average vote of the MLPH cluster and the Cluster 5 can only be -1, 0 or 1. In a preferred embodiment, "low risk" is classified if the average vote is -1, "intermediate risk" is classified if the average vote is 0, and "high risk" is classified if the average vote is 1.
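The two-threshold rule can be sketched as below. The text leaves open how an average vote exactly equal to t1 or t2 is classified; this sketch assigns such values to "intermediate risk". The example thresholds -0.5 and 0.5 are hypothetical values chosen so that the averages -1, 0 and 1 of the Figure 8 scheme map to the three classes as described.

```python
def three_class_prognosis(average_vote, t1=-0.5, t2=0.5):
    """Map an average vote to one of three risk classes using thresholds t1 < t2."""
    if not t1 < t2:
        raise ValueError("t1 must be smaller than t2")
    if average_vote < t1:
        return "low risk"
    if average_vote > t2:
        return "high risk"
    # t1 <= average_vote <= t2; boundary values are assigned here by assumption.
    return "intermediate risk"


classes = [three_class_prognosis(a) for a in (-1, 0, 1)]
```

Any pair of thresholds with -1 < t1 < 0 < t2 < 1 would give the same mapping for the three possible averages of the Figure 8 scheme.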
Marker genes
The following marker genes are used in methods of the invention. They are identified by their respective entries in the UniProtKB/Swiss-Prot database (see e.g. http://www.expasy.org/uniprot).
ESR1:
Entry name ESR1_HUMAN
Primary accession number P03372
Secondary accession numbers Q13511 Q14276 Q9NU51 Q9UDZ7 Q9UIS7
Integrated into Swiss-Prot on July 21, 1986
Sequence was last modified on June 1, 1994 (Sequence version 2)
Annotations were last modified on November 14, 2006 (Entry version 106)
Protein name Estrogen receptor
Synonyms ER
Estradiol receptor
ER-alpha
Gene name Name: ESR1
Synonyms: ESR, NR3A1
From Homo sapiens (Human) [TaxID: 9606]
GREM1:
Entry name GREM1_HUMAN
Primary accession number O60565
Secondary accession numbers Q52LV3 Q8N914 Q8N936
Integrated into Swiss-Prot on April 12, 2005
Sequence was last modified on August 1, 1998 (Sequence version 1)
Annotations were last modified on October 31, 2006 (Entry version 38)
Protein name Gremlin-1 [Precursor]
Synonyms Cysteine knot superfamily 1
BMP antagonist 1
Increased in high glucose protein 2
IHG-2
Down-regulated in Mos-transformed cells protein
Proliferation-inducing gene 2 protein
Gene name Name: GREM1
Synonyms: CKTSF1B1, DAND2, DRM, PIG2
From Homo sapiens (Human) [TaxID: 9606]
IL1R1:
Entry name IL1R1_HUMAN
Primary accession number P14778
Secondary accession numbers None
Integrated into Swiss-Prot on April 1, 1990
Sequence was last modified on April 1, 1990 (Sequence version 1)
Annotations were last modified on November 14, 2006 (Entry version 94)
Protein name Interleukin-1 receptor type I [Precursor]
Synonyms IL-1R-1
IL-1RT1
IL-1R-alpha
p80
CD121a antigen
Gene name Name: IL1R1
Synonyms: IL1R, IL1RA, IL1RT1
From Homo sapiens (Human) [TaxID: 9606]
MLPH:
Entry name MELPH_HUMAN
Primary accession number Q9BV36
Secondary accession number Q9HA71
Integrated into Swiss-Prot on June 16, 2003
Sequence was last modified on June 1, 2001 (Sequence version 1)
Annotations were last modified on November 14, 2006 (Entry version 45)
Protein name Melanophilin
Synonyms Exophilin-3
Synaptotagmin-like protein 2a
Slp homolog lacking C2 domains a
Gene name Name: MLPH
Synonyms: SLAC2A
From Homo sapiens (Human) [TaxID: 9606]
MYBL2:
Entry name MYBB_HUMAN
Primary accession number P10244
Secondary accession numbers None
Integrated into Swiss-Prot on July 1, 1989
Sequence was last modified on July 1, 1989 (Sequence version 1)
Annotations were last modified on November 14, 2006 (Entry version 79)
Protein name Myb-related protein B
Synonym B-Myb
Gene name Name: MYBL2
Synonyms: BMYB
From Homo sapiens (Human) [TaxID: 9606]
PLAU:
Entry name UROK_HUMAN
Primary accession number P00749
Secondary accession numbers Q15844 Q16618 Q969W6
Integrated into Swiss-Prot on July 21, 1986
Sequence was last modified on March 20, 1987 (Sequence version 1)
Annotations were last modified on October 31, 2006 (Entry version 101)
Protein name Urokinase-type plasminogen activator [Precursor]
Synonyms EC 3.4.21.73
uPA
U-plasminogen activator
Contains Urokinase-type plasminogen activator long chain A
Urokinase-type plasminogen activator short chain A
Urokinase-type plasminogen activator chain B
Gene name Name: PLAU
From Homo sapiens (Human) [TaxID: 9606]
PGR:
Entry name PRGR_HUMAN
Primary accession number P06401
Secondary accession number Q9UPF7
Integrated into Swiss-Prot on January 1, l!
Sequence was last modified on March 21, 2006 (Sequence version 4)
Annotations were last modified on November 14, 2006 (Entry version 101)
Protein name Progesterone receptor
Synonym PR
Gene name Name: PGR
Synonyms: NR3C3
From Homo sapiens (Human) [TaxID: 9606]
TOP2A:
Entry name TOP2A_HUMAN
Primary accession number P11388
Secondary accession numbers Q71UN1 Q71UQ5 Q9HB24 Q9HB25 Q9HB26 Q9UP44 Q9UQP9
Integrated into Swiss-Prot on July 1, 1989
Sequence was last modified on May 4, 2001 (Sequence version 3)
Annotations were last modified on November 14, 2006 (Entry version 92)
Protein name DNA topoisomerase 2-alpha
Synonyms EC 5.99.1.3
DNA topoisomerase II, alpha isozyme
Gene name Name: TOP2A
Synonyms: TOP2
From Homo sapiens (Human) [TaxID: 9606]
UBE2C:
Entry name UBE2C_HUMAN
Primary accession number O00762
Secondary accession numbers None
Integrated into Swiss-Prot on December 15, 1998
Sequence was last modified on July 1, 1997 (Sequence version 1)
Annotations were last modified on October 31, 2006 (Entry version 67)
Protein name Ubiquitin-conjugating enzyme E2 C
Synonyms EC 6.3.2.19
Ubiquitin-protein ligase C
Ubiquitin carrier protein C
UbcH10
Gene name Name: UBE2C
Synonyms: UBCH10
From Homo sapiens (Human) [TaxID: 9606]

Claims

1. Method of prognosis of cancer in a patient from a tumor sample of said patient comprising the steps of: determining the expression level of a first, a second and a third marker gene of multiple atomic classifiers, said atomic classifiers being multivariate classifiers; performing a classification of said sample into one of multiple risk classes, for each of said multiple atomic classifiers; performing a majority vote using the outcome of said multiple classifications.
2. Method of claim 1, wherein said majority vote is a weighted majority vote.
3. Method of any one of the preceding claims, wherein the method involves two consecutive majority votes.
4. Method of any one of the preceding claims, wherein said cancer is breast cancer or ovarian cancer.
5. Method of any one of the preceding claims, wherein said determination of expression levels is in a formalin-fixed paraffin embedded sample.
6. Method of any one of the preceding claims 1-4, wherein said determination of expression levels is in a fresh-frozen sample.
6. Method of any one of the preceding claims, wherein said prognosis is a classification of said patient into one of two distinct classes, said classes being a "high risk" class and a "low risk" class.
7. Method of any one of the preceding claims, wherein said prognosis is a classification of the patient into one of three classes, said three classes corresponding to a "high risk" class, an "intermediate risk" class and a "low risk" class.
8. Method of claim 6 or 7, wherein said risk is a risk of death of said patient within a predetermined period of time.
9. Method of any of the preceding claims, wherein said multiple atomic classifiers are distinct atomic classifiers selected from the group consisting of:
(i) an atomic classifier predicting "high risk" if PGR expression is below a predetermined first threshold level and ESR1 expression is below a predetermined second threshold level; said atomic classifier predicting "low risk" if PGR expression is below said predetermined first threshold level and ESR1 expression is above said predetermined second threshold level; said atomic classifier predicting "low risk" if PGR expression is above said predetermined first threshold level and MLPH expression is below a predetermined third threshold level; and said atomic classifier predicting "high risk" if PGR expression is above said predetermined first threshold level and MLPH expression is above a predetermined third threshold level;
(ii) an atomic classifier predicting "high risk" if PGR expression is below a predetermined first threshold level and IL1R1 expression is below a predetermined second threshold level; said atomic classifier predicting "low risk" if PGR expression is below said predetermined first threshold level and IL1R1 expression is above said predetermined second threshold level; said atomic classifier predicting "low risk" if PGR expression is above said predetermined first threshold level and MLPH expression is below a predetermined third threshold level; and said atomic classifier predicting "high risk" if PGR expression is above said predetermined first threshold level and MLPH expression is above a predetermined third threshold level;
(iii) an atomic classifier predicting "high risk" if PGR expression is below a predetermined first threshold level; said atomic classifier predicting "low risk" if PGR expression is above said predetermined first threshold level and MLPH expression is below a predetermined third threshold level; and said atomic classifier predicting "high risk" if PGR expression is above said predetermined first threshold level and MLPH expression is above a predetermined third threshold level;
(iv) an atomic classifier predicting "low risk" if PGR expression is below a predetermined first threshold level and TOP2A expression is below a predetermined second threshold level; said atomic classifier predicting "high risk" if PGR expression is below said predetermined first threshold level and TOP2A expression is above said predetermined second threshold level; said atomic classifier predicting "low risk" if PGR expression is above said predetermined first threshold level and MLPH expression is below a predetermined third threshold level; and said atomic classifier predicting "high risk" if PGR expression is above said predetermined first threshold level and MLPH expression is above a predetermined third threshold level;
(v) an atomic classifier predicting "low risk" if PGR expression is below a predetermined first threshold level and UBE2C expression is below a predetermined second threshold level; said atomic classifier predicting "high risk" if PGR expression is below said predetermined first threshold level and UBE2C expression is above said predetermined second threshold level; said atomic classifier predicting "low risk" if PGR expression is above said predetermined first threshold level and MLPH expression is below a predetermined third threshold level; and said atomic classifier predicting "high risk" if PGR expression is above said predetermined first threshold level and MLPH expression is above a predetermined third threshold level;
(vi) an atomic classifier predicting "high risk" if TOP2A expression is below a predetermined first threshold level and GREM1 expression is below a predetermined second threshold level; said atomic classifier predicting "low risk" if TOP2A expression is below said predetermined first threshold level and GREM1 expression is above said predetermined second threshold level; said atomic classifier predicting "high risk" if TOP2A expression is above said predetermined first threshold level; and
(vii) an atomic classifier predicting "low risk" if PLAU expression is below a predetermined first threshold level and MYBL2 expression is below a predetermined second threshold level; said atomic classifier predicting "high risk" if PLAU expression is below said predetermined first threshold level and MYBL2 expression is above said predetermined second threshold level; said atomic classifier predicting "low risk" if PLAU expression is above said predetermined first threshold level and TOP2A expression is below a predetermined third threshold level; and said atomic classifier predicting "high risk" if PLAU expression is above said predetermined first threshold level and TOP2A expression is above a predetermined third threshold level.
10. Method of claim 9, wherein only atomic classifier (vi) and atomic classifier (vii) are used in the majority voting step.
11. Method of claim 9, wherein all atomic classifiers of said group of atomic classifiers are used for majority voting.
12. Method of claim 9, wherein only a single atomic classifier of said group of atomic classifiers is used for majority voting.
13. Method of any one of the preceding claims, wherein an expression level of a marker gene is substituted with the expression level of a substitute gene, said substitute gene being co-regulated with said marker gene.
14. Method of any one of claims 9-13, wherein the threshold for PGR expression is about 3.8, wherein the threshold for ESR1 expression is about 6.2, wherein the threshold for MLPH expression is about 11.4, wherein the threshold for IL1R1 expression is about 7.2, wherein the threshold for TOP2A expression is about 9.8, wherein the threshold for UBE2C expression is about 10.0, wherein the threshold for TOP2A expression is about 9.8 in atomic classifier (iv), wherein the threshold for TOP2A expression is about 8.5 in atomic classifier (vi), wherein the threshold for TOP2A expression is about 8.1 in atomic classifier (vii), wherein the threshold for GREM1 expression is about 9.0, wherein the threshold for PLAU expression is about 8.6, and wherein the threshold for MYBL2 expression is about 7.4.
15. A system for the prognosis of cancer in a patient from samples taken from said patient, said system comprising means for determining the expression level of a first, a second and a third marker gene of multiple atomic classifiers; means for performing multiple classifications of said sample into one of multiple risk classes, with each of said multiple atomic classifiers; means for performing a majority vote using the outcome of said multiple classifications.
16. System of claim 15, wherein said means for determining the expression level is a gene chip system, or a real time PCR system.
17. System of claim 15 or 16, wherein said means for performing multiple classifications is a separate personal computer, or a computer integral with the remaining components of the system.
18. System of claim 17, wherein the computer receives expression level measurements directly from said means for determining the expression level.
19. System of any one of claims 15-18, wherein a user can choose between at least two majority voting schemes.
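
The decision rules of claim 9, the thresholds of claim 14 and the (optionally weighted) majority vote of claims 1 and 2 can be illustrated by the following minimal sketch. It is not the applicant's implementation. All function and variable names (`clf_i` to `clf_vii`, `majority_vote`, `prognosis`) are hypothetical, and the sample expression values are invented for illustration. Two points that the claims leave open are filled in by assumption: a value exactly at a threshold is treated as "below", and a tied vote resolves to "high risk".

```python
from collections import Counter

HIGH, LOW = "high risk", "low risk"

# Thresholds as recited in claim 14 ("about" values). The expression scale is
# not stated in the claims; inputs are assumed to be on that same scale.
T = {"PGR": 3.8, "ESR1": 6.2, "MLPH": 11.4, "IL1R1": 7.2, "UBE2C": 10.0,
     "GREM1": 9.0, "PLAU": 8.6, "MYBL2": 7.4}
TOP2A_T = {"iv": 9.8, "vi": 8.5, "vii": 8.1}  # classifier-specific TOP2A cut-offs


def above(s, gene, threshold=None):
    # Assumption: a value exactly at the threshold is treated as "below".
    return s[gene] > (T[gene] if threshold is None else threshold)


def mlph_branch(s):
    # Shared "PGR above threshold" branch of classifiers (i) to (v).
    return HIGH if above(s, "MLPH") else LOW


def clf_i(s):
    if above(s, "PGR"):
        return mlph_branch(s)
    return LOW if above(s, "ESR1") else HIGH


def clf_ii(s):
    if above(s, "PGR"):
        return mlph_branch(s)
    return LOW if above(s, "IL1R1") else HIGH


def clf_iii(s):
    return mlph_branch(s) if above(s, "PGR") else HIGH


def clf_iv(s):
    if above(s, "PGR"):
        return mlph_branch(s)
    return HIGH if above(s, "TOP2A", TOP2A_T["iv"]) else LOW


def clf_v(s):
    if above(s, "PGR"):
        return mlph_branch(s)
    return HIGH if above(s, "UBE2C") else LOW


def clf_vi(s):
    if above(s, "TOP2A", TOP2A_T["vi"]):
        return HIGH
    return LOW if above(s, "GREM1") else HIGH


def clf_vii(s):
    if above(s, "PLAU"):
        return HIGH if above(s, "TOP2A", TOP2A_T["vii"]) else LOW
    return HIGH if above(s, "MYBL2") else LOW


CLASSIFIERS = {"i": clf_i, "ii": clf_ii, "iii": clf_iii, "iv": clf_iv,
               "v": clf_v, "vi": clf_vi, "vii": clf_vii}


def majority_vote(votes, weights=None, tie=HIGH):
    # Plain vote when weights is None, weighted vote otherwise (claim 2).
    # Assumption: ties resolve to `tie`; the claims do not specify tie handling.
    weights = [1.0] * len(votes) if weights is None else weights
    tally = Counter()
    for vote, weight in zip(votes, weights):
        tally[vote] += weight
    if tally[HIGH] == tally[LOW]:
        return tie
    return HIGH if tally[HIGH] > tally[LOW] else LOW


def prognosis(sample, use=tuple(CLASSIFIERS), weights=None):
    # Classify the sample with each selected atomic classifier, then vote.
    votes = [CLASSIFIERS[name](sample) for name in use]
    return majority_vote(votes, weights)


# Hypothetical expression values, for illustration only.
sample = {"PGR": 5.0, "ESR1": 7.0, "MLPH": 10.0, "IL1R1": 8.0, "TOP2A": 9.0,
          "UBE2C": 9.5, "GREM1": 9.5, "PLAU": 9.0, "MYBL2": 7.0}
print(prognosis(sample))                     # all seven classifiers (claim 11)
print(prognosis(sample, use=("vi", "vii")))  # only (vi) and (vii) (claim 10)
```

For this invented sample, classifiers (i) to (v) vote "low risk" and classifiers (vi) and (vii) vote "high risk", so the vote over all seven classifiers and the vote restricted to (vi) and (vii) give different results. This is the kind of choice between voting schemes that claim 19 leaves to the user.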
PCT/EP2009/050478 2008-01-28 2009-01-16 Cancer prognosis by majority voting WO2009095319A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP08001560 2008-01-28
EP08001560.5 2008-01-28

Publications (1)

Publication Number Publication Date
WO2009095319A1 (en) 2009-08-06

Family

ID=40428342

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2009/050478 WO2009095319A1 (en) 2008-01-28 2009-01-16 Cancer prognosis by majority voting

Country Status (1)

Country Link
WO (1) WO2009095319A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010003771A1 (en) * 2008-06-16 2010-01-14 Siemens Healthcare Diagnostics Gmbh Molecular markers for cancer prognosis
WO2010003773A1 (en) * 2008-06-16 2010-01-14 Siemens Medical Solutions Diagnostics Gmbh Algorithms for outcome prediction in patients with node-positive chemotherapy-treated breast cancer
US10301685B2 (en) 2013-02-01 2019-05-28 Sividon Diagnostics Gmbh Method for predicting the benefit from inclusion of taxane in a chemotherapy regimen in patients with breast cancer
EP3556867A1 (en) * 2009-11-23 2019-10-23 Genomic Health, Inc. Methods to predict clinical outcome of cancer
US10577661B2 (en) 2010-03-31 2020-03-03 Myriad International Gmbh Method for breast cancer recurrence prediction under endocrine treatment
US11505832B2 (en) 2017-09-08 2022-11-22 Myriad Genetics, Inc. Method of using biomarkers and clinical variables for predicting chemotherapy benefit

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004079014A2 (en) * 2003-03-04 2004-09-16 Arcturus Bioscience, Inc. Signatures of er status in breast cancer
WO2006052731A2 (en) * 2004-11-05 2006-05-18 Genomic Health, Inc. Molecular indicators of breast cancer prognosis and prediction of treatment response
WO2006084272A2 (en) * 2005-02-04 2006-08-10 Rosetta Inpharmatics Llc Methods of predicting chemotherapy responsiveness in breast cancer patients
WO2006093507A2 (en) * 2005-02-25 2006-09-08 H. Lee Moffitt Cancer Center And Research Institute, Inc. Methods and systems for predicting cancer outcome
US20070099209A1 (en) * 2005-06-13 2007-05-03 The Regents Of The University Of Michigan Compositions and methods for treating and diagnosing cancer

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010003771A1 (en) * 2008-06-16 2010-01-14 Siemens Healthcare Diagnostics Gmbh Molecular markers for cancer prognosis
WO2010003773A1 (en) * 2008-06-16 2010-01-14 Siemens Medical Solutions Diagnostics Gmbh Algorithms for outcome prediction in patients with node-positive chemotherapy-treated breast cancer
EP2469440A3 (en) * 2008-06-16 2014-01-01 Sividon Diagnostics GmbH Molecular markers for cancer prognosis
EP3556867A1 (en) * 2009-11-23 2019-10-23 Genomic Health, Inc. Methods to predict clinical outcome of cancer
EP3739060A1 (en) * 2009-11-23 2020-11-18 Genomic Health, Inc. Methods to predict clinical outcome of cancer
US10577661B2 (en) 2010-03-31 2020-03-03 Myriad International Gmbh Method for breast cancer recurrence prediction under endocrine treatment
US10851427B2 (en) 2010-03-31 2020-12-01 Myriad International Gmbh Method for breast cancer recurrence prediction under endocrine treatment
US11913078B2 (en) 2010-03-31 2024-02-27 Myriad International Gmbh Method for breast cancer recurrence prediction under endocrine treatment
US10301685B2 (en) 2013-02-01 2019-05-28 Sividon Diagnostics Gmbh Method for predicting the benefit from inclusion of taxane in a chemotherapy regimen in patients with breast cancer
US11505832B2 (en) 2017-09-08 2022-11-22 Myriad Genetics, Inc. Method of using biomarkers and clinical variables for predicting chemotherapy benefit

Similar Documents

Publication Publication Date Title
Ross et al. Tissue-based genomics augments post-prostatectomy risk stratification in a natural history cohort of intermediate- and high-risk men
JP6058780B2 (en) Prognosis prediction of colorectal cancer
Wang et al. Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer
JP6351112B2 (en) Gene expression profile algorithms and tests to quantify the prognosis of prostate cancer
AU2005304824B2 (en) Predicting response to chemotherapy using gene expression markers
EP1410011B1 (en) Diagnosis and prognosis of breast cancer patients
CN103502473B (en) The prediction of gastro-entero-pancreatic tumor (GEP-NEN)
US8788444B2 (en) Data analysis method and system
US20120066163A1 (en) Time to event data analysis method and system
Mohamed et al. Pasireotide and octreotide antiproliferative effects and sst2 trafficking in human pancreatic neuroendocrine tumor cultures
Livshits et al. Pathway-based personalized analysis of breast cancer expression data
DK2158332T3 (en) PROGRAM FORECAST FOR MELANANCANCES
KR101672531B1 (en) Genetic markers for prognosing or predicting early stage breast cancer and uses thereof
CN107709636A (en) For diagnosing or detecting the method and composition of lung cancer
US20110224908A1 (en) Gene signature for diagnosis and prognosis of breast cancer and ovarian cancer
WO2009095319A1 (en) Cancer prognosis by majority voting
WO2012066451A1 (en) Prognostic and predictive gene signature for colon cancer
WO2010063121A1 (en) Methods for biomarker identification and biomarker for non-small cell lung cancer
Borup et al. Molecular signatures of thyroid follicular neoplasia
Huang et al. Molecular portrait of breast cancer in China reveals comprehensive transcriptomic likeness to Caucasian breast cancer and low prevalence of luminal A subtype
WO2009094318A2 (en) Molecular staging of stage ii and iii colon cancer and prognosis
Simon Analysis of DNA microarray expression data
Griffith et al. A robust prognostic signature for hormone-positive node-negative breast cancer
EP2406729B1 (en) A method, system and computer program product for the systematic evaluation of the prognostic properties of gene pairs for medical conditions.
US20140018253A1 (en) Gene expression panel for breast cancer prognosis

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09706633

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 09706633

Country of ref document: EP

Kind code of ref document: A1