WO2005106453A2 - Mass spectrometer - Google Patents

Mass spectrometer

Info

Publication number
WO2005106453A2
WO2005106453A2 PCT/GB2005/001679 GB2005001679W WO2005106453A2 WO 2005106453 A2 WO2005106453 A2 WO 2005106453A2 GB 2005001679 W GB2005001679 W GB 2005001679W WO 2005106453 A2 WO2005106453 A2 WO 2005106453A2
Authority
WO
WIPO (PCT)
Prior art keywords
sample
analytes
molecules
components
mass
Prior art date
Application number
PCT/GB2005/001679
Other languages
French (fr)
Other versions
WO2005106453A3 (en)
Inventor
Richard Denny
Keith Richardson
John Skilling
Original Assignee
Micromass Uk Limited
Priority date
Filing date
Publication date
Priority claimed from GB0409677A external-priority patent/GB0409677D0/en
Application filed by Micromass Uk Limited filed Critical Micromass Uk Limited
Priority to EP05740580.5A priority Critical patent/EP1745500B1/en
Priority to CA2564279A priority patent/CA2564279C/en
Priority to US11/568,408 priority patent/US8012764B2/en
Priority to JP2007510124A priority patent/JP5009784B2/en
Publication of WO2005106453A2 publication Critical patent/WO2005106453A2/en
Publication of WO2005106453A3 publication Critical patent/WO2005106453A3/en

Classifications

    • HELECTRICITY
    • H01ELECTRIC ELEMENTS
    • H01JELECTRIC DISCHARGE TUBES OR DISCHARGE LAMPS
    • H01J49/00Particle spectrometers or separator tubes
    • H01J49/0027Methods for using particle spectrometers
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N30/00Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
    • G01N30/02Column chromatography
    • G01N30/62Detectors specially adapted therefor
    • G01N30/72Mass spectrometers
    • G01N30/7233Mass spectrometers interfaced to liquid or supercritical fluid chromatograph
    • HELECTRICITY
    • H01ELECTRIC ELEMENTS
    • H01JELECTRIC DISCHARGE TUBES OR DISCHARGE LAMPS
    • H01J49/00Particle spectrometers or separator tubes
    • H01J49/02Details
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10TTECHNICAL SUBJECTS COVERED BY FORMER US CLASSIFICATION
    • Y10T436/00Chemistry: analytical and immunological testing
    • Y10T436/11Automated chemical analysis
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10TTECHNICAL SUBJECTS COVERED BY FORMER US CLASSIFICATION
    • Y10T436/00Chemistry: analytical and immunological testing
    • Y10T436/24Nuclear magnetic resonance, electron spin resonance or other spin effects or mass spectrometry
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10TTECHNICAL SUBJECTS COVERED BY FORMER US CLASSIFICATION
    • Y10T436/00Chemistry: analytical and immunological testing
    • Y10T436/25Chemistry: analytical and immunological testing including sample preparation

Definitions

  • the present invention relates to a method of mass spectrometry and a mass spectrometer.
  • the preferred embodiment relates to a method which allows relative quantitation of analyte compounds especially where incomplete and noisy measurements are made.
  • the preferred embodiment is particularly applicable to the measurement and quantitation of peptide digest products or daughter compound abundances.
  • the preferred embodiment relates to relative Bayesian quantitation of analyte/daughter groups.
  • the preferred embodiment relates to a probabilistic or Bayesian approach to determining the relative quantitation of a component, molecule or analyte present in two or more samples.
  • Bayesian probability theory handles probabilities of statements. Probabilities express how certain it is that those statements are true.
  • Bayesian reasoning: the so-called Bayes' rule defines how a rational agent changes its beliefs when it gets new information (evidence). Bayesian probabilities or certainties are always conditional. This means that probabilities are estimated in the context of some background assumptions. Conditional probabilities are usually written using the notation P(Thing | Assumption). The probabilities are numbers between zero and one that tell how certain it is that Thing is true when it is believed that the Assumption is true.
  • Conditional probabilities are often written in the form P(D
  • P (M) or P(D) are generally considered to be imprecise Bayesian notations, since all the probabilities are actually conditional. However, sometimes, when all the terms have the same background assumptions then it may not be necessary to repeat them.
  • probabilities should be written in the form P(D
  • Expert systems often calculate the probabilities of inter-dependent events by giving each parent event a weighting.
  • Bayesian Belief Networks are considered to provide a mathematically correct and therefore more accurate method of measuring the effects of events on each other. The mathematics involved enables calculations to be made in both directions. So it is possible, for example, to find out which event was the most likely cause of another.
  • p(AB) means the probability of A and B happening. This is a special case of the following Product Rule for dependent events, where p(A|B) means the probability of A given that B has already occurred: p(AB) = p(A) * p(B|A) and p(AB) = p(B) * p(A|B).
  • H_0 can be taken to be a hypothesis which may have been developed ab initio or induced from some preceding set of observations, but before the new observation or evidence E.
  • the term P(H_0) is called the prior probability of H_0.
  • the term P(E|H_0) is the conditional probability of seeing the observation E given that the hypothesis H_0 is true. As a function of H_0 given E, it is called the likelihood function.
  • P(E) is called the marginal probability of E. It is a normalizing constant and can be calculated as the sum over all mutually exclusive hypotheses H_i: P(E) = Σ_i P(E|H_i) P(H_i).
  • the term P(H_0|E) is called the posterior probability of H_0 given E.
  • the factor P(E|H_0)/P(E) gives a measure of the impact that the observation has on belief in the hypothesis. If it is unlikely that the observation will be made unless the particular hypothesis being considered is true, then this scaling factor will be large. Multiplying this scaling factor by the prior probability of the hypothesis being correct gives the posterior probability of the hypothesis being correct given the observation.
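The update described above can be illustrated with a minimal Python sketch. It assumes a finite set of mutually exclusive hypotheses; the function name `posterior` and the example numbers are hypothetical and are not taken from the patent.

```python
def posterior(priors, likelihoods):
    """Return the posterior probability of each mutually exclusive hypothesis.

    priors[i] is the prior probability of hypothesis i.
    likelihoods[i] is the probability of the observation E given hypothesis i.
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(joint)  # marginal probability P(E), the normalising constant
    if evidence == 0:
        raise ValueError("observation has zero probability under every hypothesis")
    return [j / evidence for j in joint]


# Hypothetical example: two hypotheses with equal prior probability.
priors = [0.5, 0.5]
likelihoods = [0.9, 0.3]  # the observation is three times more likely under the first
post = posterior(priors, likelihoods)
```

With these illustrative numbers the scaling factor for the first hypothesis is 0.9 / 0.6 = 1.5, so its probability rises from 0.5 to 0.75.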
  • the key to making the inference work is the assignment of the prior probabilities given to the hypothesis and possible alternatives, and the calculation of the conditional probabilities of the observation under different hypotheses. In the analysis of multiple biological samples or a complex mixture of biological samples it may be desired to compare the relative concentrations of component compounds.
  • a protein or peptide is expressed differently in two or more different samples.
  • One sample may, for example, comprise a sample taken from a healthy organism, whilst the other sample may comprise a sample taken from a patient. If a particular protein or peptide is expressed to a significantly greater or lesser extent in the patient sample relative to the sample taken from a healthy organism (i.e. control sample) then this may be indicative of a disease state.
  • Complex mixtures of biological samples can be analysed using a mass spectrometer, preferably in combination with a liquid chromatograph. It is known to use the ion intensity or ion count rate recorded by a mass spectrometer as a measure of the concentration of each peptide.
  • the data relating to each sample is, however, subject to various systematic errors such as injection volume errors as well as various non-systematic effects such as counting statistics. Due to the complexity of the samples and the sometimes low concentrations of various components, molecules or analytes in the samples, the data can often be incomplete. The data may also include interferences. As a result the assignment of data to components, molecules or analytes or the identification of components, molecules or analytes may be uncertain. According to conventional approaches these factors can cause results that appear to be anomalous and are therefore discarded. As a result, it may not always be possible to quantify some components, molecules or analytes present in two or more samples, and/or some data may be rejected out of hand when in fact it may not be anomalous.
  • a method of mass spectrometry comprising: providing a first sample comprising a first mixture of components, molecules or analytes; providing a second different sample comprising a second mixture of components, molecules or analytes; and probabilistically determining or quantifying the relative intensity, concentration or expression level of a component, molecule or analyte in the first sample relative to the intensity, concentration or expression level of a component, molecule or analyte in the second sample.
  • a plurality of further samples each comprising a mixture of components, molecules or analytes may be provided.
  • the components, molecules or analytes preferably comprise proteins, protein digest products, peptides or fragments of peptides.
  • the components, molecules or analytes in the first mixture are preferably the same species as the components, molecules or analytes in the second mixture and/or components, molecules or analytes in further mixtures.
  • the components, molecules or analytes in the first mixture may be different species to the components, molecules or analytes in the second mixture and/or to components, molecules or analytes in further mixtures.
  • the method preferably further comprises: digesting the first mixture of components, molecules or analytes; and/or digesting the second mixture of components, molecules or analytes; and/or digesting further mixtures of components, molecules or analytes.
  • the first mixture of components, molecules or analytes is digested to form a first complex mixture; and/or the second mixture of components, molecules or analytes is digested to form a second complex mixture; and/or further mixtures of components, molecules or analytes are digested to form further complex mixtures.
  • the complex mixtures preferably comprise complex mixtures of peptides or protein digest products.
  • the method further comprises: dividing the first sample into one or more first replicate samples; and/or dividing the second sample into one or more second replicate samples; and/or dividing further samples into one or more further replicate samples; and/or dividing the first complex mixture into one or more first replicate samples; and/or dividing the second complex mixture into one or more second replicate samples; and/or dividing the further complex mixtures into one or more further replicate samples.
  • the method further comprises: separating components, analytes or molecules in the first sample by means of a separation process; and/or separating components, analytes or molecules in the second sample by means of a separation process; and/or separating components, analytes or molecules in further samples by means of a separation process; and/or separating components, analytes or molecules in the first replicate samples by means of a separation process; and/or separating components, analytes or molecules in the second replicate samples by means of a separation process; and/or separating components, analytes or molecules in further replicate samples by means of a separation process.
  • the separation process preferably comprises liquid chromatography.
  • the separation process may comprise: (i) High Performance Liquid Chromatography (“HPLC”); (ii) anion exchange; (iii) anion exchange chromatography; (iv) cation exchange; (v) cation exchange chromatography; (vi) ion pair reversed-phase chromatography; (vii) chromatography; (viii) single dimensional electrophoresis; (ix) multi-dimensional electrophoresis; (x) size exclusion; (xi) affinity; (xii) reverse phase chromatography; (xiii) Capillary Electrophoresis Chromatography (“CEC”); (xiv) electrophoresis; (xv) ion mobility separation; (xvi) Field Asymmetric Ion Mobility Separation or Spectrometry (“FAIMS”); or (xvii) capillary electrophoresis.
  • the method preferably further comprises: ionising components, analytes or molecules in the first sample; and/or ionising components, analytes or molecules in the second sample; and/or ionising components, analytes or molecules in further samples; and/or ionising components, analytes or molecules in the first replicate samples; and/or ionising components, analytes or molecules in the second replicate samples; and/or ionising components, analytes or molecules in further replicate samples.
  • the method preferably further comprises: mass analysing components, analytes or molecules in the first sample; and/or mass analysing components, analytes or molecules in the second sample; and/or mass analysing components, analytes or molecules in further samples; and/or mass analysing components, analytes or molecules in the first replicate samples; and/or mass analysing components, analytes or molecules in the second replicate samples; and/or mass analysing components, analytes or molecules in further replicate samples.
  • the step of mass analysing components, analytes or molecules preferably further comprises producing mass spectral data comprising a plurality of mass peaks.
  • the method further comprises determining the mass or mass to charge ratio of one or more of the mass peaks.
  • the method further comprises determining the signal intensity, or the integrated signal, for one or more of the mass peaks. According to the preferred embodiment the method further comprises determining the retention time for one or more of the mass peaks.
  • the method further comprises clustering mass peaks from the first sample and/or the second sample and/or further samples.
  • the method comprises clustering mass peaks from the first replicate sample and/or the second replicate sample and/or further replicate samples.
  • the method further comprises: recognising or identifying components, analytes or molecules in the first sample; and/or recognising or identifying components, analytes or molecules in the second sample; and/or recognising or identifying components, analytes or molecules in further samples; and/or recognising or identifying components, analytes or molecules in the first replicate samples; and/or recognising or identifying components, analytes or molecules in the second replicate samples; and/or recognising or identifying components, analytes or molecules in further replicate samples.
  • the components, analytes or molecules are preferably recognised or identified on the basis of mass or mass to charge ratio or accurate mass or accurate mass to charge ratio.
  • the accurate mass or mass to charge ratio of the components, analytes or molecules is preferably determined to within 20 ppm, 19 ppm, 18 ppm, 17 ppm, 16 ppm, 15 ppm, 14 ppm, 13 ppm, 12 ppm, 11 ppm, 10 ppm, 9 ppm, 8 ppm, 7 ppm, 6 ppm, 5 ppm, 4 ppm, 3 ppm, 2 ppm, 1 ppm or < 1 ppm.
  • the mass or mass to charge ratio of the components, analytes or molecules is preferably determined to within 0.01 mass units, 0.009 mass units, 0.008 mass units, 0.007 mass units, 0.006 mass units, 0.005 mass units, 0.004 mass units, 0.003 mass units, 0.002 mass units, 0.001 mass units or < 0.001 mass units.
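As an illustration of matching within a ppm tolerance, the following minimal Python sketch computes the relative mass error between a measured and a theoretical value. The function names and the default tolerance of 20 ppm (the loosest value listed above) are illustrative assumptions, not part of the patent.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Signed mass error in parts per million relative to the theoretical value."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def matches(measured_mz, theoretical_mz, tol_ppm=20.0):
    """True if the measured value lies within tol_ppm of the theoretical value."""
    return abs(ppm_error(measured_mz, theoretical_mz)) <= tol_ppm
```

For example, a measured value of 1000.01 against a theoretical value of 1000.0 is an error of about 10 ppm, which passes a 20 ppm tolerance but not a 5 ppm one.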
  • Components, analytes or molecules are preferably recognised or identified on the basis of chromatographic retention time or another physico-chemical property.
  • the method further comprises fragmenting components, molecules or analytes in a collision or fragmentation cell to form, create or generate a plurality of fragment, daughter or product ions.
  • the fragment, daughter or product ions are mass analysed.
  • the method further comprises: identifying or recognising components, molecules or analytes in the first sample on the basis of fragment, daughter or product ions; and/or identifying or recognising components, molecules or analytes in the second sample on the basis of fragment, daughter or product ions; and/or identifying or recognising components, molecules or analytes in further samples on the basis of fragment, daughter or product ions.
  • the method further comprises obtaining or assigning probabilities for the correct identification of mass peaks.
  • the method further comprises determining or deriving the probabilities from a protein search procedure.
  • the method preferably further comprises assigning a constant probability of correct identification where no probability is determined or derived from a protein search procedure.
  • the method further comprises assigning the probability of correct identification as a value x% wherein preferably x is selected from the group consisting of: (i) < 5%; (ii) 5-10%; (iii) 10-15%; (iv) 15-20%; (v) 20-25%; (vi) 25-30%; (vii) 30-35%; (viii) 35-40%; (ix) 40-45%; (x) 45-50%; (xi) 50-55%; (xii) 55-60%; (xiii) 60-65%; (xiv) 65-70%; (xv) 70-75%; (xvi) 75-80%; (xvii) 80-85%; (xviii) 85-90%; (xix) 90-95%; and (xx) > 95%.
  • the method further comprises assigning a constant probability of correct identification in the event that no protein search procedure is performed.
  • the method further comprises assigning the probability of correct identification as a value x%.
  • x is selected from the group consisting of: (i) < 5%; (ii) 5-10%; (iii) 10-15%; (iv) 15-20%; (v) 20-25%; (vi)
  • the method further comprises determining, formulating or assigning a prior probability distribution function Pr(L) for the relative amount or concentration L of components, molecules or analytes present in each sample.
  • the prior probability distribution function Pr(L) is proportional to exp(-L/Λ) wherein Λ corresponds with a maximum signal intensity recorded for a mass peak.
  • Λ corresponds with a mean or average signal intensity recorded for mass peaks.
  • the prior probability distribution function Pr(L) has a gamma, Poisson, Gaussian, exponential, normal or lognormal distribution.
  • the prior probability distribution function Pr(L) has a distribution with an integral equal to one.
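A minimal Python sketch of an exponential prior normalised to integrate to one is given below. The names `exp_prior_pdf`, `sample_exp_prior` and the parameter `scale` are illustrative assumptions; `scale` stands in for the constant in the exponent, which the text above ties to a maximum or mean recorded signal intensity.

```python
import math
import random


def exp_prior_pdf(L, scale):
    """Normalised exponential density (1/scale) * exp(-L/scale) for L >= 0."""
    if L < 0:
        return 0.0
    return math.exp(-L / scale) / scale


def sample_exp_prior(scale, rng=random):
    """Draw one value from the exponential prior with the given scale (mean)."""
    # random.expovariate takes the rate, which is the reciprocal of the scale.
    return rng.expovariate(1.0 / scale)
```

The same form could serve for the priors on the other quantities described in this section by changing `scale`; whether the patent's implementation does so in this way is not stated here.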
  • the method further comprises determining, formulating or assigning a prior probability distribution function Pr(k) for the overall response factor k of each component, molecule or analyte in the sample.
  • k includes one or more of the following: (i) digestion efficiency; (ii) relative product yield; (iii) losses in delivery; (iv) ionisation efficiency; (v) transmission efficiency; and (vi) detection efficiency.
  • the prior probability distribution function Pr(k) is proportional to exp(-k/k_0), where k_0 is a constant.
  • k_0 = 1.
  • the prior probability distribution function Pr(k) has a gamma, Poisson, Gaussian, exponential, normal or lognormal distribution.
  • the prior probability distribution function Pr(k) has a distribution with an integral equal to one.
  • the method further comprises determining, formulating or assigning a prior probability distribution function Pr(h) for the relative amount of sample h of each component, molecule or analyte in each sample used in an analysis.
  • h includes one or more of the following: (i) amount of solvent added; and (ii) amount of material injected.
  • the prior probability distribution function Pr(h) is proportional to exp(-h/h_0), where h_0 is a constant.
  • h_0 = 1.
  • the prior probability distribution function Pr(h) has a gamma, Poisson, Gaussian, exponential, normal or lognormal distribution.
  • the prior probability distribution function Pr(h) has a distribution with an integral equal to one.
  • the method further comprises determining, formulating or assigning a prior probability distribution function Pr(G) for the noise contribution factor G assumed for observed signal intensities and/or applied to predicted signal intensities.
  • G includes one or more of the following: (i) ion statistical shot noise; and (ii) Electrospray ionisation droplet statistical shot noise.
  • the prior probability distribution function Pr(G) is preferably proportional to exp(-G/G_0), where G_0 is a constant.
  • G_0 = 1.
  • the prior probability distribution function Pr(G) has a gamma, Poisson, Gaussian, exponential, normal or lognormal distribution.
  • the prior probability distribution function Pr(G) has a distribution with an integral equal to one.
  • the method further comprises locating, determining, identifying or choosing one or more internal standards or references.
  • the one or more internal standards or references comprise one or more components, molecules or analytes which have substantially the same intensity, concentration or expression level in all of the samples.
  • the one or more internal standards or references may comprise one or more components, molecules or analytes added to each sample.
  • the one or more internal standards or references may be endogenous or exogenous to the first sample and/or the second sample and/or further samples.
  • the method preferably further comprises applying or using a Markov Chain Monte Carlo predictive procedure or investigating iteratively using a Markov Chain Monte Carlo algorithm to determine likely values for the relative concentrations L of each component, molecule or analyte in each of the samples.
  • the Markov Chain Monte Carlo predictive procedure or algorithm is selected from the group consisting of: (i) Metropolis Hastings algorithm; (ii) Gibbs Sampling algorithm; (iii) Hamiltonian Monte Carlo algorithm; and (iv) Slice Sampling algorithm.
  • the Markov Chain Monte Carlo predictive procedure or algorithm is used in conjunction with simulated annealing and/or nested sampling.
  • the method further comprises predicting what would be observed for each mass peak intensity given probability distribution functions Pr(L) and/or Pr(k) and/or Pr(h) and/or Pr(G) and/or given the probability p of correct identification. According to an embodiment the method further comprises comparing peak intensities that are predicted with those that are observed. According to an embodiment the method further comprises adjusting the value of L or the probability distribution function Pr(L). According to an embodiment the method further comprises adjusting the value of k or the probability distribution function Pr(k). According to an embodiment the method further comprises adjusting the value of h or the probability distribution function Pr(h). According to an embodiment the method further comprises adjusting the value of G or the probability distribution function Pr(G).
  • the method further comprises predicting what would be observed for each mass peak intensity given the adjusted probability distribution functions Pr(L) and/or Pr(k) and/or Pr(h) and/or Pr(G) and/or given the probability p of correct identification.
  • the method preferably further comprises comparing peak intensities that are predicted with those that are observed.
  • the method further comprises accepting or rejecting adjusted probability distribution functions.
  • the method further comprises repeating or terminating the cycle of adjusting probability distribution functions and/or predicting intensities and/or comparing predicted intensities with observed intensities.
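The adjust, predict, compare and accept-or-reject cycle can be sketched with a random-walk Metropolis sampler, one of the Markov Chain Monte Carlo algorithms named above. This is a simplified illustration for a single concentration L with a fixed response factor k, assuming Poisson counting noise and an exponential prior; the function names, default parameters and example counts are hypothetical and this is not the patent's implementation.

```python
import math
import random


def log_posterior(L, counts, k, prior_scale):
    """Log of (exponential prior) x (Poisson likelihood), up to an additive constant."""
    if L <= 0:
        return -math.inf
    rate = k * L  # predicted intensity for this value of L
    log_like = sum(c * math.log(rate) - rate for c in counts)
    return log_like - L / prior_scale


def metropolis(counts, k=1.0, prior_scale=1000.0, steps=5000, step_size=5.0, seed=0):
    """Return a list of samples of L drawn by random-walk Metropolis."""
    rng = random.Random(seed)
    L = max(1.0, sum(counts) / len(counts) / k)  # starting guess
    current = log_posterior(L, counts, k, prior_scale)
    samples = []
    for _ in range(steps):
        proposal = L + rng.gauss(0.0, step_size)  # adjust the value of L
        candidate = log_posterior(proposal, counts, k, prior_scale)  # predict and compare
        delta = candidate - current
        if delta >= 0 or rng.random() < math.exp(delta):  # accept or reject
            L, current = proposal, candidate
        samples.append(L)
    return samples


# Hypothetical observed ion counts for one peak across four replicates.
samples = metropolis([98, 105, 101, 96])
```

The loop terminates after a fixed number of steps here; a stopping rule based on the required accuracy of the mean values would be another option.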
  • the method further comprises determining the ratios L_ij of relative concentrations L of each component, molecule or analyte in each of the samples for every pair i,j of samples.
  • the method further comprises continuing the Markov Chain Monte Carlo predictive procedure to determine more likely values for the relative concentrations L of each component, molecule or analyte in each of the samples and the ratios L_ij of the relative concentrations L.
  • the number of determinations of the ratios L_ij of the relative concentrations L is preferably pre-defined according to required accuracy of mean values.
  • the method preferably further comprises calculating mean values for the ratios L_ij of the relative concentrations L of each component, molecule or analyte in each of the samples for every pair i,j of the samples.
  • the method further comprises calculating standard deviations and/or relative standard deviations for the ratios L_ij of the relative concentrations L of each component, molecule or analyte in each of the samples for every pair i,j of the samples.
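Given paired draws of the relative concentration of one analyte in two samples i and j (for instance successive states of a Markov chain), the mean, standard deviation and relative standard deviation of the ratio can be computed as in the following minimal Python sketch. The function name and the input layout are illustrative assumptions.

```python
import statistics


def ratio_summary(samples_i, samples_j):
    """Mean, standard deviation and relative standard deviation of L_i / L_j.

    samples_i[n] and samples_j[n] are the concentrations drawn for
    samples i and j at the same iteration n.
    """
    ratios = [a / b for a, b in zip(samples_i, samples_j)]
    mean = statistics.mean(ratios)
    sd = statistics.stdev(ratios)
    return mean, sd, sd / mean
```

Computing the ratio draw by draw, rather than dividing the two means, keeps any correlation between the two concentrations in the summary.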
  • the first sample and/or the second sample and/or further samples comprise a plurality of different biopolymers, proteins, peptides, polypeptides, oligonucleotides, oligonucleosides, amino acids, carbohydrates, sugars, lipids, fatty acids, vitamins, hormones, portions or fragments of DNA, portions or fragments of cDNA, portions or fragments of RNA, portions or fragments of mRNA, portions or fragments of tRNA, polyclonal antibodies, monoclonal antibodies, ribonucleases, enzymes, metabolites, polysaccharides, phosphorylated peptides, phosphorylated proteins, glycopeptides, glycoproteins or steroids.
  • the first sample and/or the second sample and/or further samples may comprise non-equimolar heterogeneous complex mixtures.
  • the first sample is taken from a diseased organism and the second sample is taken from a non-diseased organism;
  • the first sample is taken from a treated organism and the second sample is taken from a non-treated organism; or
  • the first sample is taken from a mutant organism and the second sample is taken from a wild type organism.
  • the method further comprises identifying components, molecules or analytes in the first sample and/or the second sample and/or further samples.
  • the components, molecules or analytes in the first sample and/or the second sample and/or further samples are preferably only identified if the intensity of the components, molecules or analytes in the first sample differs from the intensity of the components, molecules or analytes in the second sample and/or further samples by more than a predetermined amount.
  • the components, molecules or analytes in the first sample and/or the second sample and/or further samples may only be identified if the average intensity of a plurality of different components, molecules or analytes in the first sample differs from the average intensity of a plurality of different components, molecules or analytes in the second sample and/or further samples by more than a predetermined amount.
  • the predetermined amount is preferably selected from the group consisting of: (i) 1%; (ii) 2%; (iii) 5%; (iv) 10%; (v) 20%; (vi) 50%; (vii) 100%; (viii) 150%; (ix) 200%; (x) 250%; (xi) 300%; (xii) 350%; (xiii) 400%; (xiv) 450%; (xv) 500%; (xvi) 1000%; (xvii) 5000%; and (xviii) 10000%.
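The thresholding step can be illustrated with a short Python sketch. It assumes that the percentage difference is measured relative to the second sample's intensity, which the text above does not specify; the function name is hypothetical.

```python
def differs_by_more_than(intensity_a, intensity_b, threshold_percent):
    """True if intensity_a differs from intensity_b by more than threshold_percent.

    The difference is expressed as a percentage of intensity_b (an assumption).
    """
    if intensity_b <= 0:
        raise ValueError("reference intensity must be positive")
    change = abs(intensity_a - intensity_b) / intensity_b * 100.0
    return change > threshold_percent
```

With a 20% threshold, intensities of 150 and 100 would trigger identification, whereas 110 and 100 would not.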
  • the mass or mass to charge ratio of molecules, components or analytes and/or peptide digest products and/or fragment, daughter or product ions are preferably mass analysed by either: (i) a Fourier Transform ("FT") mass spectrometer; (ii) a Fourier Transform Ion Cyclotron Resonance (“FTICR”) mass spectrometer; (iii) a Time of Flight (“TOF”) mass spectrometer; (iv) an orthogonal acceleration Time of
  • the first sample and/or the second sample and/or further samples are preferably ionised by an ion source selected from the group consisting of: (i) an Electrospray ionisation (“ESI”) ion source; (ii) an Atmospheric Pressure Photo Ionisation (“APPI”) ion source; (iii) an Atmospheric Pressure Chemical Ionisation (“APCI”) ion source; (iv) a Matrix Assisted Laser Desorption Ionisation (“MALDI”) ion source; (v) a Laser Desorption Ionisation (“LDI”) ion source; (vi) an Atmospheric Pressure Ionisation (“API”) ion source; (vii) a Desorption Ionisation on Silicon (“DIOS”) ion source; (viii) an Electron Impact (“EI”) ion source; (ix) a Chemical Ionisation (“CI”) ion source; (x) a Field Ionisation (“FI”) ion source; (xi) a Field De
  • a mass spectrometer comprising means arranged to probabilistically determine or quantify the relative intensity, concentration or expression level of a component, molecule or analyte in a first sample relative to the intensity, concentration or expression level of a component, molecule or analyte in a second sample.
  • the mass spectrometer preferably further comprises a liquid chromatograph.
  • the mass spectrometer further comprises one or more mass filters and/or one or more mass analysers.
  • the one or more mass filters and the one or more mass analysers are preferably selected from the group consisting of: (i) an orthogonal acceleration Time of Flight mass analyser; (ii) an axial acceleration Time of Flight mass analyser; (iii) a Paul 3D quadrupole ion trap mass analyser; (iv) a 2D or linear quadrupole ion trap mass analyser; (v) a Fourier Transform Ion Cyclotron Resonance mass analyser; (vi) a magnetic sector mass analyser; (vii) a quadrupole mass analyser; and (viii) a Penning trap mass analyser.
  • the mass spectrometer preferably further comprises an ion source.
  • the ion source may comprise a pulsed ion source or a continuous ion source.
  • the ion source may be selected from the group consisting of: (i) an Electrospray ionisation (“ESI”) ion source; (ii) an Atmospheric Pressure Photo Ionisation (“APPI”) ion source; (iii) an Atmospheric Pressure Chemical Ionisation (“APCI”) ion source; (iv) a Matrix Assisted Laser Desorption Ionisation (“MALDI”) ion source; (v) a Laser Desorption Ionisation (“LDI”) ion source; (vi) an Atmospheric Pressure Ionisation (“API”) ion source; (vii) a Desorption Ionisation on Silicon (“DIOS”) ion source; (viii) an Electron Impact (“EI”) ion source; (ix) a Chemical Ionisation (“CI”) ion source; (x) a Field Ionisation (“FI”) ion source; (x
  • a method of relatively quantifying one or more molecular species among several samples comprising: dividing each sample into multiple replicate samples; for each of the replicate samples obtaining a signal for each of several tentatively identified digestion products of the molecular species in question, wherein the signal is proportional to the concentration of the parent species subject to random noise; obtaining or assigning probabilities that each tentative identification is correct; assigning a prior probability distribution function for the relative amount L of each molecular species in each sample; assigning a prior probability distribution function for the relative amount k of digestion product produced from each molecular species; assigning a prior probability distribution function for the relative amount h of sample for each replicate sample; assigning a prior probability distribution function for the noise level G in each sample; choosing an internal standard wherein the concentration of the internal standard is known to be the same in all of the replicate samples; updating the probability distribution for the relative amount L of each molecular species in each sample; obtaining samples according to the probability distribution for the relative amount L of each molecular species in
  • a method of mass spectrometry comprising: providing a first sample comprising a first mixture of components, molecules or analytes; providing a second different sample comprising a second mixture of components, molecules or analytes; and determining or quantifying the relative intensity, concentration or expression level of a component, molecule or analyte in said first sample relative to the intensity, concentration or expression level of a component, molecule or analyte in said second sample.
  • a mass spectrometer comprising means arranged to determine or quantify the relative intensity, concentration or expression level of a component, molecule or analyte in a first sample relative to the intensity, concentration or expression level of a component, molecule or analyte in a second sample.
  • the preferred embodiment preferably uses a forward modelling algorithm to average over the contribution of unknown ionisation and digestion efficiencies to the measured ion count.
  • the measured ion count of a peptide can be expressed as being proportional to the product of its concentration in the original sample and a factor relating to its ionisation and digestion efficiencies.
  • a further feature of the exploration according to the preferred embodiment is that assignments to data can be switched on or off such that the presence of outliers or outlying data may be investigated. Relative concentrations of proteins or peptides in each sample can then be calculated and a percentage confidence interval given using the results of the probabilistic exploration.
  • Mass spectral data and microarray data present different challenges. The preferred embodiment is particularly concerned with data which exhibits underlying Poisson noise (counting statistics).
  • the preferred algorithm as implemented in the method and apparatus according to the preferred embodiment may be considered as being directed to solving a problem where there are two unknown numbers A and B and it is desired to determine the ratio B/A. Samples of A ($A_1, A_2, \ldots, A_N$) and B ($B_1, B_2, \ldots, B_M$) are provided.
  • the samples of A and B can each be considered as either "Good" or "Bad", and each sample may be considered as coming with a probability, e.g. Pr($A_3$ is Good).
  • a "Good" sample of A will be close to A in some mathematically well defined sense.
  • "Bad" samples of A could be almost anything.
  • According to the preferred embodiment it is desired to infer the ratio B/A given only this information, and also to provide an uncertainty estimate for the ratio.
  • the numbers A and B are proportional to concentrations of peptides in solution as measured by a mass spectrometer. The following example may be considered:
  • B 100 and that B equals 500 is plausible if the last sample of A and the third sample of B are considered as being "bad" and hence are rejected as being outliers.
  • the preferred embodiment does not however immediately reject data which may initially appear to be spurious.
  • the ratio B/A as determined by the preferred embodiment and a corresponding uncertainty estimate is determined to be 5.1 ± 0.1.
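The "Good"/"Bad" sample problem described above can be sketched numerically. The following is only an illustrative toy, not the patented algorithm: the sample values, the Gaussian spread `sigma`, the flat background for "Bad" samples and the grid-based posterior are all assumptions introduced here, and the data are hypothetical rather than the example from the specification.

```python
import math

def sample_likelihood(x, true_value, p_good, sigma, scale):
    # Mixture: a "Good" sample is Gaussian around the true value (assumed form),
    # a "Bad" sample is flat on [0, scale], i.e. "almost anything".
    gaussian = math.exp(-0.5 * ((x - true_value) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
    return p_good * gaussian + (1.0 - p_good) / scale

def grid_posterior(samples, p_good, grid, sigma, scale):
    # Normalised posterior over candidate true values, flat prior on the grid.
    weights = []
    for value in grid:
        weight = 1.0
        for x, p in zip(samples, p_good):
            weight *= sample_likelihood(x, value, p, sigma, scale)
        weights.append(weight)
    total = sum(weights)
    return [w / total for w in weights]

def ratio_estimate(a_samples, a_p, b_samples, b_p, sigma=5.0):
    # Posterior mean and standard deviation of B/A, treating A and B as independent.
    scale = float(max(a_samples + b_samples))
    grid = [0.5 * i for i in range(1, int(3 * scale) + 1)]
    post_a = grid_posterior(a_samples, a_p, grid, sigma, scale)
    post_b = grid_posterior(b_samples, b_p, grid, sigma, scale)
    mean_b = sum(w * v for w, v in zip(post_b, grid))
    mean_b2 = sum(w * v * v for w, v in zip(post_b, grid))
    mean_inv_a = sum(w / v for w, v in zip(post_a, grid))
    mean_inv_a2 = sum(w / (v * v) for w, v in zip(post_a, grid))
    mean = mean_b * mean_inv_a
    variance = mean_b2 * mean_inv_a2 - mean ** 2
    return mean, math.sqrt(max(variance, 0.0))

# Hypothetical data: the last A sample and the third B sample look like outliers.
mean, sd = ratio_estimate([100, 102, 98, 101, 300], [0.9] * 5,
                          [500, 505, 50, 495], [0.9] * 4)
print(f"B/A = {mean:.2f} +/- {sd:.2f}")
```

Note that no sample is rejected outright: the outliers simply receive negligible weight because the "Bad" branch of the mixture explains them better.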
  • the preferred embodiment can be considered from a different perspective and can be considered as addressing a second related question. This problem can be considered to be that there are 2+K unknown numbers $A, B, k_1, k_2, \ldots, k_K$ and that some of the 2K possible products are provided or known:
  • the preferred embodiment relates to a method and apparatus which incorporates an algorithm designed to quantify changes in abundance of an analyte compound across several physical samples containing the analyte or its products and at least one internal standard compound. Any number of replicate measurements may be available from each sample, and the data may be noisy and generally also incomplete. It is known from the outset that there is a probability of incorrect assignment of data and that some assignments are more likely to be correct than others.
  • the preferred embodiment relates to the application of a novel mathematical model of the data and to using Markov chain Monte Carlo techniques to explore the space of model parameters in such a way that changes in abundance along with associated uncertainties can be measured and determined.
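As a rough illustration of how Markov chain Monte Carlo can explore model parameters and yield a ratio with its uncertainty, the sketch below runs a random-walk Metropolis sampler on a deliberately reduced toy model: one analyte, two samples, Poisson counts and exponential priors. The counts, step size, seed and function names are hypothetical, and the full model of the specification (with h, k, G and assignment probabilities) is not reproduced.

```python
import math
import random

def log_posterior(l1, l2, counts1, counts2, scale):
    # Poisson log-likelihood plus exponential log-priors (constant terms dropped).
    if l1 <= 0.0 or l2 <= 0.0:
        return -math.inf
    log_p = -(l1 + l2) / scale
    log_p += sum(d * math.log(l1) - l1 for d in counts1)
    log_p += sum(d * math.log(l2) - l2 for d in counts2)
    return log_p

def sample_ratio(counts1, counts2, steps=20000, burn_in=2000, step_size=5.0, seed=1):
    # Random-walk Metropolis over (L1, L2); returns post-burn-in draws of L2/L1.
    rng = random.Random(seed)
    scale = float(max(counts1 + counts2))
    l1 = sum(counts1) / len(counts1)
    l2 = sum(counts2) / len(counts2)
    current = log_posterior(l1, l2, counts1, counts2, scale)
    ratios = []
    for step in range(steps):
        new1 = l1 + rng.gauss(0.0, step_size)
        new2 = l2 + rng.gauss(0.0, step_size)
        proposed = log_posterior(new1, new2, counts1, counts2, scale)
        # Accept uphill moves always, downhill moves with probability exp(difference).
        if proposed >= current or rng.random() < math.exp(proposed - current):
            l1, l2, current = new1, new2, proposed
        if step >= burn_in:
            ratios.append(l2 / l1)
    return ratios

ratios = sample_ratio([98, 105, 101], [205, 190, 198])
print(sum(ratios) / len(ratios))
```

The retained draws approximate the posterior distribution of the ratio, so any summary statistic can be read off them directly.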
  • Standard statistical techniques such as pairwise t-tests and ANOVA cannot be applied in situations where the number of measurements in each sample is different, when measurements are missing, where assignments of data are ambiguous, where measurements are experimentally correlated or where the number of measurements is very small.
  • experimental data is often noisy and incomplete and hence it is apparent that conventional known techniques are of limited use in being able to process and analyse noisy and incomplete experimental data.
  • a particularly advantageous aspect of the preferred embodiment is that a normalisation step does not need to be performed as a separate step in order to determine the relative concentration of a particular analyte present in two or more separate samples.
  • the preferred embodiment allows for daughter compounds (e.g. peptides) which are associated probabilistically with parents (e.g. proteins). This is particularly useful when daughters are enzymatic digest products of proteins and wherein peptide identification information comes from tandem mass spectrometry.
  • the preferred embodiment also deals transparently with missing data. Conventional approaches, by contrast, are particularly problematic and prone to error when data is missing.
  • the preferred embodiment relates to a probabilistic or Bayesian method of measuring differences in the relative concentration of a particular analyte present in multiple different samples.
  • the preferred embodiment is particularly advantageous in being able accurately to quantify analytes present in samples even though the experimental data may be less than perfect.
  • the data may, for example, suffer from an unknown gain and/or there may be other global or poorly understood sources of noise.
  • the concentration of each analyte in the original samples may be represented in the data by one or more compounds. These compounds preferably comprise digestion product/fragments which shall be referred to hereinafter as daughters.
  • For each sample several replicate experiments are preferably performed i.e. the sample is divided up into a number of sub-samples and each sub-sample may be separately analysed.
  • running the preferred procedures on multiple replicate samples helps to improve the accuracy of the quantification steps according to the preferred embodiment.
  • This information may, for example, come from the analysis of fragments of peptides by tandem mass spectrometry (MSMS) wherein peptide digest products are fragmented in a collision or fragmentation cell and the resulting fragment, daughter or product ions are mass analysed.
  • MSMS tandem mass spectrometry
  • Some peptides may not have complete coverage across all experiments for reasons other than low concentration. Such reasons may be practical considerations. For example, a number of peptides with a similar mass to charge ratio may elute from the liquid chromatograph at a similar time making identification difficult.
  • the preferred embodiment enables an output to be generated which may comprise ratios of concentration for each analyte between pairs of conditions with associated uncertainties, the probability that each ratio exceeds one, a full posterior probability distribution for each ratio, or other desired statistics.
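The outputs listed above (a ratio with its uncertainty, the probability that the ratio exceeds one) can all be derived from posterior draws. The sketch below assumes such draws are already available; here they are synthetic log-normal values generated purely for illustration, and the function names are hypothetical.

```python
import math
import random
import statistics

def summarise_ratio(l1_draws, l2_draws, level=0.95):
    # Summarise posterior draws of the concentration ratio L2/L1.
    ratios = sorted(l2 / l1 for l1, l2 in zip(l1_draws, l2_draws))
    n = len(ratios)
    lower = ratios[int(n * (1.0 - level) / 2.0)]
    upper = ratios[min(n - 1, int(n * (1.0 + level) / 2.0))]
    return {
        "median": statistics.median(ratios),
        "interval": (lower, upper),
        "prob_ratio_exceeds_one": sum(r > 1.0 for r in ratios) / n,
    }

def synthetic_draws(centre, spread, count, seed):
    # Stand-in for real posterior draws: log-normal values around `centre`.
    rng = random.Random(seed)
    return [rng.lognormvariate(math.log(centre), spread) for _ in range(count)]

summary = summarise_ratio(synthetic_draws(100.0, 0.05, 5000, 1),
                          synthetic_draws(200.0, 0.05, 5000, 2))
print(summary)
```

Reporting a percentile interval rather than a symmetric error bar is one way to cope with the asymmetric ratio distributions that such models tend to produce.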
  • the preferred method assumes that the ideal measured intensity of each peptide in the mass spectral data is proportional to the concentration of the corresponding parent protein, that the measured intensities are inherently subject to at least Poisson noise (counting statistics), and that there exists at least one measured peptide which can be assumed to be at the same concentration in each experiment for each sample (this will be referred to hereinafter as an "internal standard").
  • the preferred method depends on constructing a model of the data taking into account the problems and requirements described above.
  • the underlying data $D_u$ for each peptide (before noise and gain) is assumed to be given by:
  • L is the concentration of protein present in a sample
  • h expresses how much sample (or what fraction of the sample) was used in a particular replicate experiment
  • k is a coefficient which expresses the efficiency with which a peptide is produced from the corresponding protein ion and also how efficiently the mass spectrometer observes the peptide ion.
  • the actual observed data $D_0$ is assumed to be subject to Poisson noise and an unknown gain G to allow for global scaling of the noise level.
  • the probability of observing $D_0$ given a particular set of model parameters L, k and h is:
  • $$\Pr(D_0 \mid L, k, h) = \Pr(D_0 \mid D_u)\,p + \Pr(D_0 \mid B)\,(1 - p) \qquad (2)$$
  • The quantity in Equation 2 will be referred to hereinafter as the likelihood.
  • the Gamma function $\Gamma(x)$ is a commonly used special function
  • p is the probability that the parent analyte is correctly assigned
  • $\Pr(D_0 \mid B)$ is the background probability of observing a particular datum $D_0$ given an incorrect parent assignment.
  • Equation 4 reflects the fact that data attached to an incorrect assignment could be almost anything roughly consistent with the overall scale of the data $\Lambda$.
  • $\Lambda$ is taken to be the size of the largest datum.
  • $\Lambda$ is taken to be a probability weighted average over all data. Should the result of the probability function as detailed in Equation 4 be larger and thus more significant in calculating the likelihood (Equation 2) than the result of Equation 3, then the assignment can be considered incorrect.
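The two-branch likelihood of Equation 2 can be sketched as follows. Several points are assumptions made here because the corresponding equations are not reproduced in this text: the underlying datum is taken to be the product L·h·k, the gain G is fixed at 1 so that the correct-assignment branch is a plain Poisson probability, and the background branch is taken to be an exponential with scale Λ. The function name and argument names are hypothetical.

```python
import math

def likelihood(d_obs, conc, amount, response, p_correct, scale):
    # Assumed underlying datum: D_u = L * h * k (gain G fixed at 1).
    d_u = conc * amount * response
    # Poisson branch, evaluated in log space via the log-Gamma function.
    log_poisson = d_obs * math.log(d_u) - d_u - math.lgamma(d_obs + 1.0)
    correct = math.exp(log_poisson)
    # Background branch (assumed exponential form with scale Lambda).
    background = math.exp(-d_obs / scale) / scale
    return p_correct * correct + (1.0 - p_correct) * background

# A datum that matches the model, and one that is far from it.
print(likelihood(10, 100.0, 0.5, 0.2, 0.9, 1000.0))
print(likelihood(500, 100.0, 0.5, 0.2, 0.9, 1000.0))
```

For the second datum the Poisson branch is effectively zero, so the likelihood is carried entirely by the background branch, which is how a probable mis-assignment is absorbed without being discarded.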
  • the prior probability distributions are denoted Pr(L), Pr(h) etc.
  • the prior probability distributions encapsulate what is known about the parameters before the data is examined, ensuring that unrealistic values are not investigated.
  • an exponential form for the prior probability distributions for parameters L, h, k and G is preferably used. For example:
  • $L_0$ is set as being $\Lambda$
  • $k_0$ is set as being 1
  • $G_0$ is also set as being 1.
  • $$\Pr(L, h, k, G, \text{Data}) = \Pr(L)\,\Pr(h)\,\Pr(k)\,\Pr(G) \prod_{\text{Data}} \Pr(D_0 \mid L, h, k) \qquad (6)$$
  • L, h and k are vectors on the LHS of the above expression.
  • the dimension of the vector L is the number of samples multiplied by the number of analytes.
  • the dimension of the vector h is the number of experiments.
  • the dimension of the vector k is the number of daughters (e.g. peptides).
  • the total number of model parameters (including the gain and ignoring the internal standard) equals (the number of samples times the number of analytes) plus (the number of experiments) plus (the number of daughters) plus 1.
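The parameter count stated above is simple arithmetic, shown here as a small helper. The function name and the example sizes are hypothetical.

```python
def count_model_parameters(n_samples, n_analytes, n_experiments, n_daughters):
    # (samples x analytes) concentrations L, one h per experiment,
    # one k per daughter, plus the single gain G.
    return n_samples * n_analytes + n_experiments + n_daughters + 1

# Hypothetical study: 2 samples, 3 analytes, 6 experiments, 12 daughters.
print(count_model_parameters(2, 3, 6, 12))
```

Even this small hypothetical study gives 25 parameters, which illustrates why the joint distribution is explored by sampling rather than by exhaustive evaluation.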
  • the joint probability distribution as given in Equation 6 is therefore a high dimensional function.
  • the quantity of interest is preferably the set of ratios of elements of the vector L and the corresponding set of uncertainties, relating to a single protein or peptide in multiple samples. It is preferred not to locate the single vector L which maximises the joint probability, but to obtain probability distributions for ratios of elements of L. An example would be $\Pr(L_2/L_1, \text{Data})$. Such probability distributions are often asymmetrical, making the associated uncertainties difficult to express.
  • the approximation preferably keeps only four terms per protein in the fully expanded joint probability. These four terms correspond to: (i) peptides assigned correctly in all experiments; (ii) peptides assigned correctly in all but the least probable experiment (lowest value of p); (iii) peptides assigned incorrectly in all but the strongest experiment (highest value of p); and (iv) peptides assigned incorrectly in all experiments.
  • the solution to these problems can still become very slow when a large number of analytes is involved.
  • the preferred embodiment enables the exploration to proceed more efficiently by preferably analytically reducing the dimensionality of the posterior probability distribution (Equation 6) by removing all components of the vector k thus leaving one less parameter to explore and thus saving computational power, in a procedure known as marginalisation. This is possible as it is unnecessary to record the magnitude of the vector k.
  • Marginalisation is a process wherein both sides of the joint probability function (Equation 6) are integrated with respect to one of the vectors.
  • marginalisation proceeds by the integration of the joint probability function with respect to k.
  • marginalisation may proceed by the integration of the joint probability function with respect to h.
  • a further integration may be performed, such that k and h may both be removed from the joint probability function.
  • the second integration in such a method is, however, often difficult (and sometimes impossible), as the first integral may not be a true function.
  • Fig. 2 also shows the experimentally determined relationships or ratios as reconstructed according to the preferred embodiment from the noisy and incomplete data as shown in Fig. 1. It is apparent from Fig. 1 that in the sixth experiment no data was modelled as being present or obtained for the internal standard or invariant ions. Nonetheless, as can be seen from Fig. 2, the ratio h6/h1 has still been recovered successfully by the method of the preferred embodiment despite the lack of any internal standard in this experiment. It is to be noted that all of the sample ratios were successfully recovered and are shown in Fig.
  • the Poisson distribution given in Equation 3 above may be replaced by a Gaussian approximation to a Poisson distribution.
  • the exponential prior probability distribution function as presented in Equation 4 above may be replaced by a gamma distribution for any of the parameters G,L,h or k. For example, according to an embodiment:
  • the exponential prior probability distribution function as given in Equation 3 above may be replaced by a normal distribution for any of the parameters G, L, h or k. For example:
  • The exponential prior probability distribution function as given in Equation 3 may according to another embodiment be replaced by a lognormal distribution for any of the parameters G, L, h or k. For example:
  • marginalisation may proceed by integrating over h instead of k.
  • all other values i.e. G, h, k
  • nuisance parameters i.e. parameter required for the calculation but otherwise unnecessary for the output.
  • One of these values can be removed from the joint probability function by integrating both sides with respect to this value. For instance, to remove k, it is necessary to integrate with respect to k, giving:
  • $$\Pr(L, h, G, \text{Data}) = \int \Big( \Pr(L)\,\Pr(h)\,\Pr(k)\,\Pr(G) \prod \Pr(D_0 \mid L, h, k) \Big)\, dk \qquad (10)$$
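To illustrate why marginalising over k can be done analytically, consider a deliberately simplified case that is an assumption of this note rather than the specification's derivation: a single observed count $D$, a correct assignment ($p = 1$), unit gain ($G = 1$), an underlying datum of the assumed form $D_u = L\,h\,k$, and an exponential prior on $k$ with constant $k_0$. The integral over $k$ is then a standard Gamma integral:

```latex
\begin{aligned}
\int_0^\infty \frac{1}{k_0} e^{-k/k_0}\,
      \frac{(L h k)^{D} e^{-L h k}}{D!}\, dk
  &= \frac{(L h)^{D}}{k_0\, D!} \int_0^\infty k^{D} e^{-k\,(L h + 1/k_0)}\, dk \\
  &= \frac{(L h)^{D}}{k_0\, D!} \cdot \frac{\Gamma(D+1)}{(L h + 1/k_0)^{D+1}} \\
  &= \frac{(k_0 L h)^{D}}{(1 + k_0 L h)^{D+1}} .
\end{aligned}
```

The result depends only on L and h, so k no longer needs to be explored by the sampler. In this toy case the marginal is a geometric distribution in $D$; with several data sharing the same k the same Gamma integral applies, with the counts and the products $L h$ summed.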

Abstract

A mass spectrometer and method of mass spectrometry are disclosed wherein two separate samples are mass analysed and then the relative intensity, concentration or expression level of one or more components, molecules or analytes in a first sample is quantitated relative to the intensity, concentration or expression level of one or more components, molecules or analytes in a second sample. The relative quantitation is performed probabilistically without the need to resort to using internal calibrants.

Description

MASS SPECTROMETER
The present invention relates to a method of mass spectrometry and a mass spectrometer. The preferred embodiment relates to a method which allows relative quantitation of analyte compounds, especially where incomplete and noisy measurements are made. The preferred embodiment is particularly applicable to the measurement and quantitation of peptide digest products or daughter compound abundances. The preferred embodiment relates to relative Bayesian quantitation of analyte/daughter groups. As will be discussed in more detail below, the preferred embodiment relates to a probabilistic or Bayesian approach to determining the relative quantitation of a component, molecule or analyte present in two or more samples.
By way of background, Bayesian probability theory handles probabilities of statements. Probabilities tell how certain it is that those statements are true. For example, a probability of 1 means that there is absolute certainty that the statement is true. A probability of 0 also means that there is absolute certainty, but absolute certainty that the statement is false. A probability of 0.5 means that there is maximum uncertainty whether the statement is true or false. Changing probabilities when getting new information is an important aspect of Bayesian reasoning. The so-called Bayes' rule defines how a rational agent changes its beliefs when it gets new information (evidence). Bayesian probabilities or certainties are always conditional. This means that probabilities are estimated in the context of some background assumptions. Conditional probabilities are usually written using the notation P(Thing | Assumption). The probabilities are numbers between zero and one that tell how certain it is that Thing is true when it is believed that the Assumption is true. Conditional probabilities are often written in the form P(D|M) or P(M|D), where M is a dependency model and D is data. Accordingly, P(D|M) means the probability of obtaining data D if it is believed that model M is the true model.
Likewise, P(M|D) means the probability that the model M is the true model given the data D. Sometimes probabilities are presented just as P(M) or P(D) but these are generally considered to be imprecise Bayesian notations, since all the probabilities are actually conditional. However, sometimes, when all the terms have the same background assumptions then it may not be necessary to repeat them. In theory, probabilities should be written in the form P(D|M,U) and P(M|D,U) and P(M|U) and P(D|U), where U is a set of background assumptions. Expert systems often calculate the probabilities of inter-dependent events by giving each parent event a weighting. Bayesian Belief Networks are considered to provide a mathematically correct and therefore more accurate method of measuring the effects of events on each other. The mathematics involved enables calculations to be made in both directions. So it is possible, for example, to find out which event was the most likely cause of another. The following Product Rule of probability for independent events is well known:
p(AB) = p(A) * p(B)
where p(AB) means the probability of A and B happening. This is a special case of the following Product Rule for dependent events, where p(A|B) means the probability of A given that B has already occurred:
p(AB) = p(A) * p(B|A)
p(AB) = p(B) * p(A|B)
So because:
p(A) * p(B|A) = p(B) * p(A|B)
Then:
p(A|B) = (p(A) * p(B|A)) / p(B)
The above equation is a simpler version of Bayes' Theorem. This equation gives the probability of A happening given that B has happened, calculated in terms of other probabilities which are known. Bayes' theorem can be summarised as:
$$P(H_0 \mid E) = \frac{P(E \mid H_0)\,P(H_0)}{P(E)}$$
H0 can be taken to be a hypothesis which may have been developed ab initio or induced from some preceding set of observations, but before the new observation or evidence E. The term P(H0) is called the prior probability of H0. The term P(E|H0) is the conditional probability of seeing the observation E given that the hypothesis H0 is true; as a function of H0 given E, it is called the likelihood function. The term P(E) is called the marginal probability of E. It is a normalizing constant and can be calculated as the sum over all mutually exclusive hypotheses:
$$P(E) = \sum_i P(E \mid H_i)\,P(H_i)$$
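The relationship between prior, likelihood, marginal probability and posterior can be checked with a few lines of code. The hypothesis labels and the numerical values below are hypothetical and serve only to show the arithmetic.

```python
def posterior(priors, likelihoods):
    # Marginal probability of the evidence: sum over mutually exclusive hypotheses.
    evidence = sum(priors[h] * likelihoods[h] for h in priors)
    # Bayes' theorem applied to each hypothesis in turn.
    return {h: priors[h] * likelihoods[h] / evidence for h in priors}

# Two mutually exclusive hypotheses with equal prior probability.
result = posterior({"H0": 0.5, "H1": 0.5}, {"H0": 0.8, "H1": 0.1})
print(result)
```

Because the evidence is eight times more probable under H0 than under H1, the posterior probability of H0 rises from 0.5 to 8/9.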
The term P(H0|E) is called the posterior probability of H0 given E. The scaling factor P(E|H0)/P(E) gives a measure of the impact that the observation has on belief in the hypothesis. If it is unlikely that the observation will be made unless the particular hypothesis being considered is true, then this scaling factor will be large. Multiplying this scaling factor by the prior probability of the hypothesis being correct gives a measure of the posterior probability of the hypothesis being correct given the observation. The keys to making the inference work are the assignment of prior probabilities to the hypothesis and possible alternatives, and the calculation of the conditional probabilities of the observation under different hypotheses.
In the analysis of multiple biological samples or a complex mixture of biological samples it may be desired to compare the relative concentrations of component compounds. For example, it may be desired to see whether or not a protein or peptide is expressed differently in two or more different samples. One sample may, for example, comprise a sample taken from a healthy organism, whilst the other sample may comprise a sample taken from a patient. If a particular protein or peptide is expressed to a significantly greater or lesser extent in the patient sample relative to the sample taken from a healthy organism (i.e. control sample) then this may be indicative of a disease state. Complex mixtures of biological samples can be analysed using a mass spectrometer preferably in combination with a liquid chromatograph. It is known to use the ion intensity or ion count rate recorded by a mass spectrometer as a measure of the concentration of each peptide. The data relating to each sample is, however, subject to various systematic errors such as injection volume errors as well as various non-systematic effects such as counting statistics.
Due to the complexity of the samples and the sometimes low concentrations of various components, molecules or analytes in the samples, the data can often be incomplete. The data may also include interferences. As a result the assignment of data to components, molecules or analytes or the identification of components, molecules or analytes may be uncertain. According to conventional approaches these factors can cause results that appear to be anomalous and hence are discarded. As a result, it may not always be possible to quantify some components, molecules or analytes present in two or more samples and/or some data may be rejected out of hand when in fact it may not be anomalous. It is therefore desired to provide an improved way of being able to quantify components, molecules or analytes present in two or more separate samples when noisy and incomplete measurements of the samples are made.
According to an aspect of the present invention there is provided a method of mass spectrometry comprising: providing a first sample comprising a first mixture of components, molecules or analytes; providing a second different sample comprising a second mixture of components, molecules or analytes; and probabilistically determining or quantifying the relative intensity, concentration or expression level of a component, molecule or analyte in the first sample relative to the intensity, concentration or expression level of a component, molecule or analyte in the second sample. Although the preferred embodiment may just relate to two separate samples, according to a particularly preferred embodiment a plurality of further samples each comprising a mixture of components, molecules or analytes may be provided. The components, molecules or analytes preferably comprise proteins, protein digest products, peptides or fragments of peptides.
The components, molecules or analytes in the first mixture are preferably the same species as the components, molecules or analytes in the second mixture and/or components, molecules or analytes in further mixtures. However, alternatively, the components, molecules or analytes in the first mixture may be different species to the components, molecules or analytes in the second mixture and/or to components, molecules or analytes in further mixtures. The method preferably further comprises: digesting the first mixture of components, molecules or analytes; and/or digesting the second mixture of components, molecules or analytes; and/or digesting further mixtures of components, molecules or analytes. Preferably, the first mixture of components, molecules or analytes is digested to form a first complex mixture; and/or the second mixture of components, molecules or analytes is digested to form a second complex mixture; and/or further mixtures of components, molecules or analytes are digested to form further complex mixtures. The complex mixtures preferably comprise complex mixtures of peptides or protein digest products . According to the preferred embodiment the method further comprises : dividing the first sample into one or more first replicate samples; and/or dividing the second sample into one or more second replicate samples; and/or dividing further samples into one or more further replicate samples; and/or dividing the first complex mixture into one or more first replicate samples; and/or dividing the second complex mixture into one or more second replicate samples; and/or dividing the further complex mixtures into one or more further replicate samples . 
According to an embodiment the method further comprises: separating components, analytes or molecules in the first sample by means of a separation process; and/or separating components, analytes or molecules in the second sample by means of a separation process; and/or separating components, analytes or molecules in further samples by means of a separation process; and/or separating components, analytes or molecules in the first replicate samples by means of a separation process; and/or separating components, analytes or molecules in the second replicate samples by means of a separation process; and/or separating components, analytes or molecules in further replicate samples by means of a separation process. The separation process preferably comprises liquid chromatography. According to an embodiment the separation process may comprise: (i) High Performance Liquid Chromatography ("HPLC"); (ii) anion exchange; (iii) anion exchange chromatography; (iv) cation exchange; (v) cation exchange chromatography; (vi) ion pair reversed-phase chromatography; (vii) chromatography; (viii) single dimensional electrophoresis; (ix) multi-dimensional electrophoresis; (x) size exclusion; (xi) affinity; (xii) reverse phase chromatography; (xiii) Capillary Electrophoresis Chromatography ("CEC"); (xiv) electrophoresis; (xv) ion mobility separation; (xvi) Field Asymmetric Ion Mobility Separation or Spectrometry ("FAIMS"); or (xvii) capillary electrophoresis. The method preferably further comprises: ionising components, analytes or molecules in the first sample; and/or ionising components, analytes or molecules in the second sample; and/or ionising components, analytes or molecules in further samples; and/or ionising components, analytes or molecules in the first replicate samples; and/or ionising components, analytes or molecules in the second replicate samples; and/or ionising components, analytes or molecules in further replicate samples.
The method preferably further comprises : mass analysing components, analytes or molecules in the first sample; and/or mass analysing components, analytes or molecules in the second sample; and/or mass analysing components, analytes or molecules in further samples; and/or mass analysing components, analytes or molecules in the first replicate samples; and/or mass analysing components, analytes or molecules in the second replicate samples; and/or mass analysing components, analytes or molecules in further replicate samples. The step of mass analysing components, analytes or molecules preferably further comprises producing mass spectral data comprising a plurality of mass peaks. Preferably, the method further comprises determining the mass or mass to charge ratio of one or more of the mass peaks. Preferably, the method further comprises determining the signal intensity, or the integrated signal, for one or more of the mass peaks. According to the preferred embodiment the method further comprises determining the retention time for one or more of the mass peaks . Preferably, the method further comprises clustering mass peaks from the first sample and/or the second sample and/or further samples. Preferably, the method comprises clustering mass peaks from the first replicate sample and/or the second replicate sample and/or further replicate samples . According to an embodiment the method further comprises : recognising or identifying components, analytes or molecules in the first sample; and/or recognising or identifying components, analytes or molecules in the second sample; and/or recognising or identifying components, analytes or molecules in further samples; and/or recognising or identifying components, analytes or molecules in the first replicate samples; and/or recognising or identifying components, analytes or molecules in the second replicate samples; and/or recognising or identifying components, analytes or molecules in further replicate samples. 
The components, analytes or molecules are preferably recognised or identified on the basis of mass or mass to charge ratio or accurate mass or accurate mass to charge ratio. The accurate mass or mass to charge ratio of the components, analytes or molecules is preferably determined to within 20 ppm, 19 ppm, 18 ppm, 17 ppm, 16 ppm, 15 ppm, 14 ppm, 13 ppm, 12 ppm, 11 ppm, 10 ppm, 9 ppm, 8 ppm, 7 ppm, 6 ppm, 5 ppm, 4 ppm, 3 ppm, 2 ppm, 1 ppm or < 1 ppm. The mass or mass to charge ratio of the components, analytes or molecules is preferably determined to within 0.01 mass units, 0.009 mass units, 0.008 mass units, 0.007 mass units, 0.006 mass units, 0.005 mass units, 0.004 mass units, 0.003 mass units, 0.002 mass units, 0.001 mass units or < 0.001 mass units. Components, analytes or molecules are preferably recognised or identified on the basis of chromatographic retention time or another physico-chemical property. According to an embodiment, the method further comprises fragmenting components, molecules or analytes in a collision or fragmentation cell to form, create or generate a plurality of fragment, daughter or product ions. Preferably, the fragment, daughter or product ions are mass analysed. According to an embodiment the method further comprises : identifying or recognising components, molecules or analytes in the first sample on the basis of fragment, daughter or product ions; and/or identifying or recognising components, molecules or analytes in the second sample on the basis of fragment, daughter or product ions; and/or identifying or recognising components, molecules or analytes in further samples on the basis of fragment, daughter or product ions. According to an embodiment the method further comprises obtaining or assigning probabilities for the correct identification of mass peaks. Preferably, the method further comprises determining or deriving the probabilities from a protein search procedure. 
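The parts-per-million mass tolerances mentioned above can be made concrete with a short helper. This is a generic sketch of the standard ppm calculation; the function names and the example masses are hypothetical and are not taken from the specification.

```python
def ppm_error(measured, theoretical):
    # Signed mass error in parts per million.
    return (measured - theoretical) / theoretical * 1e6

def within_tolerance(measured, theoretical, tolerance_ppm=20.0):
    # True if the measured mass or m/z matches within the given ppm window.
    return abs(ppm_error(measured, theoretical)) <= tolerance_ppm

# A 0.01 unit error at m/z 1000 corresponds to about 10 ppm.
print(ppm_error(1000.01, 1000.0))
print(within_tolerance(1000.01, 1000.0, tolerance_ppm=20.0))
print(within_tolerance(1000.01, 1000.0, tolerance_ppm=5.0))
```

The same absolute error corresponds to a larger ppm value at lower mass, which is why the text gives tolerances both in ppm and in absolute mass units.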
The method preferably further comprises assigning a constant probability of correct identification where no probability is determined or derived from a protein search procedure. Preferably, the method further comprises assigning the probability of correct identification as a value x% wherein preferably x is selected from the group consisting of: (i) < 5%; (ii) 5-10%; (iii) 10-15%; (iv) 15-20%; (v) 20-25%; (vi) 25-30%; (vii) 30-35%; (viii) 35-40%; (ix) 40-45%; (x) 45-50%; (xi) 50-55%; (xii) 55-60%; (xiii) 60-65%; (xiv) 65-70%; (xv) 70-75%; (xvi) 75-80%; (xvii) 80-85%; (xviii) 85-90%; (xix) 90-95%; and (xx) > 95%. According to an embodiment the method further comprises assigning a constant probability of correct identification in the event that no protein search procedure is performed. Preferably, the method further comprises assigning the probability of correct identification as a value x%. Preferably, x is selected from the group consisting of: (i) < 5%; (ii) 5-10%; (iii) 10-15%; (iv) 15-20%; (v) 20-25%; (vi) 25-30%; (vii) 30-35%; (viii) 35-40%; (ix) 40-45%; (x) 45-50%; (xi) 50-55%; (xii) 55-60%; (xiii) 60-65%; (xiv) 65-70%; (xv) 70-75%; (xvi) 75-80%; (xvii) 80-85%; (xviii) 85-90%; (xix) 90-95%; and (xx) > 95%.
According to an embodiment the method further comprises determining, formulating or assigning a prior probability distribution function Pr(L) for the relative amount or concentration L of components, molecules or analytes present in each sample. Preferably, the prior probability distribution function Pr(L) is proportional to exp(-L/Λ) wherein Λ corresponds with a maximum signal intensity recorded for a mass peak. Alternatively, Λ may correspond with a mean or average signal intensity recorded for mass peaks. According to an embodiment the prior probability distribution function Pr(L) has a gamma, Poisson, Gaussian, exponential, normal or lognormal distribution. Preferably, the prior probability distribution function Pr(L) has a distribution with an integral equal to one. According to an embodiment the method further comprises determining, formulating or assigning a prior probability distribution function Pr(k) for the overall response factor k of each component, molecule or analyte in the sample. Preferably, k includes one or more of the following: (i) digestion efficiency; (ii) relative product yield; (iii) losses in delivery; (iv) ionisation efficiency; (v) transmission efficiency; and (vi) detection efficiency. According to an embodiment the prior probability distribution function Pr(k) is proportional to exp(-k/k0), where k0 is a constant. Preferably, k0 = 1. According to an embodiment the prior probability distribution function Pr(k) has a gamma, Poisson, Gaussian, exponential, normal or lognormal distribution. Preferably, the prior probability distribution function Pr(k) has a distribution with an integral equal to one.
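The exponential priors described above, normalised so that their integral equals one, might be evaluated as in the following sketch. The function name and the example intensities are assumptions introduced for illustration; only the functional form exp(-x/x0) and the choice of scale (maximum recorded intensity for L, unity for k) come from the description.

```python
import math


def exponential_log_prior(x: float, scale: float) -> float:
    """Log of the normalised density (1/scale) * exp(-x/scale) for x >= 0."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    if x < 0:
        return float("-inf")  # negative amounts have zero prior probability
    return -math.log(scale) - x / scale


# Hypothetical peak intensities, used only to set the scale for Pr(L).
intensities = [100.0, 96.0, 107.0, 246.0]
lam = max(intensities)                         # scale taken as the maximum intensity
log_pr_L = exponential_log_prior(150.0, lam)   # prior for a concentration L
log_pr_k = exponential_log_prior(0.8, 1.0)     # prior for a response factor k, k0 = 1
```

Working with log-probabilities avoids numerical underflow when many prior and likelihood terms are later multiplied together.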
According to an embodiment the method further comprises determining, formulating or assigning a prior probability distribution function Pr(h) for the relative amount of sample h of each component, molecule or analyte in each sample used in an analysis. Preferably, h includes one or more of the following: (i) amount of solvent added; and (ii) amount of material injected. Preferably, the prior probability distribution function Pr(h) is proportional to exp(-h/h0), where h0 is a constant. Preferably, h0 = 1. According to an embodiment the prior probability distribution function Pr(h) has a gamma, Poisson, Gaussian, exponential, normal or lognormal distribution. Preferably, the prior probability distribution function Pr(h) has a distribution with an integral equal to one. According to an embodiment the method further comprises determining, formulating or assigning a prior probability distribution function Pr(G) for the noise contribution factor G assumed for observed signal intensities and/or applied to predicted signal intensities. Preferably, G includes one or more of the following: (i) ion statistical shot noise; and (ii) Electrospray ionisation droplet statistical shot noise. The prior probability distribution function Pr(G) is preferably proportional to exp(-G/G0), where G0 is a constant. Preferably, G0 = 1. According to an embodiment the prior probability distribution function Pr(G) has a gamma, Poisson, Gaussian, exponential, normal or lognormal distribution. Preferably, the prior probability distribution function Pr(G) has a distribution with an integral equal to one. According to an embodiment the method further comprises locating, determining, identifying or choosing one or more internal standards or references. Preferably, the one or more internal standards or references comprise one or more components, molecules or analytes which have substantially the same intensity, concentration or expression level in all of the samples. 
The one or more internal standards or references may comprise one or more components, molecules or analytes added to each sample. The one or more internal standards or references may be endogenous or exogenous to the first sample and/or the second sample and/or further samples. The method preferably further comprises applying or using a Markov Chain Monte Carlo predictive procedure or investigating iteratively using a Markov Chain Monte Carlo algorithm to determine likely values for the relative concentrations L of each component, molecule or analyte in each of the samples. Preferably, the Markov Chain Monte Carlo predictive procedure or algorithm is selected from the group consisting of: (i) Metropolis Hastings algorithm; (ii) Gibbs Sampling algorithm; (iii) Hamiltonian Monte Carlo algorithm; and (iv) Slice Sampling algorithm. According to an embodiment the Markov Chain Monte Carlo predictive procedure or algorithm is used in conjunction with simulated annealing and/or nested sampling. According to an embodiment the method further comprises predicting what would be observed for each mass peak intensity given probability distribution functions Pr(L) and/or Pr(k) and/or Pr(h) and/or Pr(G) and/or given the probability p of correct identification. According to an embodiment the method further comprises comparing peak intensities that are predicted with those that are observed. According to an embodiment the method further comprises adjusting the value of L or the probability distribution function Pr(L). According to an embodiment the method further comprises adjusting the value of k or the probability distribution function Pr (k) . According to an embodiment the method further comprises adjusting the value of h or the probability distribution function Pr (h) . According to an embodiment the method further comprises adjusting the value of G or the probability distribution function Pr(G). 
According to an embodiment the method further comprises predicting what would be observed for each mass peak intensity given the adjusted probability distribution functions Pr(L) and/or Pr(k) and/or Pr(h) and/or Pr(G) and/or given the probability p of correct identification. The method preferably further comprises comparing peak intensities that are predicted with those that are observed. According to an embodiment the method further comprises accepting or rejecting adjusted probability distribution functions. Preferably, the method further comprises repeating or terminating the cycle of adjusting probability distribution functions and/or predicting intensities and/or comparing predicted intensities with observed intensities. Preferably, the method further comprises determining the ratios Lij of relative concentrations L of each component, molecule or analyte in each of the samples for every pair i,j of samples. According to an embodiment the method further comprises continuing the Markov Chain Monte Carlo predictive procedure to determine more likely values for the relative concentrations L of each component, molecule or analyte in each of the samples and the ratios Lij of the relative concentrations L. The number of determinations of the ratios Lij of the relative concentrations L is preferably pre-defined according to the required accuracy of mean values. The method preferably further comprises calculating mean values for the ratios Lij of the relative concentrations L of each component, molecule or analyte in each of the samples for every pair i,j of the samples. According to an embodiment the method further comprises calculating standard deviations and/or relative standard deviations for the ratios Lij of the relative concentrations L of each component, molecule or analyte in each of the samples for every pair i,j of the samples.
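The final summarising step described above (mean values, standard deviations and relative standard deviations of the ratios for every pair of samples) can be sketched as follows. The function name, the convention that the ratio for a pair (i, j) is L[i]/L[j], and the example draws are all assumptions made for illustration.

```python
import statistics
from itertools import combinations


def summarise_pairwise_ratios(draws):
    """Summarise ratios of relative concentrations over a set of draws.

    draws: list of draws, each a list holding one relative concentration L
    per sample. Returns {(i, j): (mean, stdev, relative_stdev)} for every
    unordered pair i < j, where the ratio is taken as L[i] / L[j].
    """
    n_samples = len(draws[0])
    summary = {}
    for i, j in combinations(range(n_samples), 2):
        ratios = [draw[i] / draw[j] for draw in draws]
        mean = statistics.fmean(ratios)
        sd = statistics.stdev(ratios)
        summary[(i, j)] = (mean, sd, sd / mean)
    return summary


# Hypothetical draws for three samples (not taken from any real analysis).
example_draws = [[1.0, 2.0, 4.0], [1.1, 2.0, 4.4], [0.9, 2.0, 3.6]]
example_summary = summarise_pairwise_ratios(example_draws)
```

In practice the draws would come from the Markov Chain Monte Carlo procedure, and the number of draws would be chosen according to the accuracy required of the mean values.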
According to an embodiment the first sample and/or the second sample and/or further samples comprise a plurality of different biopolymers, proteins, peptides, polypeptides, oligonucleotides, oligonucleosides, amino acids, carbohydrates, sugars, lipids, fatty acids, vitamins, hormones, portions or fragments of DNA, portions or fragments of cDNA, portions or fragments of RNA, portions or fragments of mRNA, portions or fragments of tRNA, polyclonal antibodies, monoclonal antibodies, ribonucleases, enzymes, metabolites, polysaccharides, phosphorylated peptides, phosphorylated proteins, glycopeptides, glycoproteins or steroids. The first sample and/or the second sample and/or further samples may comprise at least 2, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, or 5000 components, molecules or analytes having different identities or comprising different species. The first sample and/or the second sample and/or further samples may comprise non-equimolar heterogeneous complex mixtures. Preferably, either: (i) the first sample is taken from a diseased organism and the second sample is taken from a non-diseased organism; (ii) the first sample is taken from a treated organism and the second sample is taken from a non-treated organism; or (iii) the first sample is taken from a mutant organism and the second sample is taken from a wild type organism. According to an embodiment the method further comprises identifying components, molecules or analytes in the first sample and/or the second sample and/or further samples. The components, molecules or analytes in the first sample and/or the second sample and/or further samples are preferably only identified if the intensity of the components, molecules or analytes in the first sample differs from the intensity of the components, molecules or analytes in the second sample and/or further samples by more than a predetermined amount. The components, molecules or analytes in the first sample and/or the second sample and/or further samples may only be identified if the average intensity of a plurality of different components, molecules or analytes in the first sample differs from the average intensity of a plurality of different components, molecules or analytes in the second sample and/or further samples by more than a predetermined amount. The predetermined amount is preferably selected from the group consisting of: (i) 1%; (ii) 2%; (iii) 5%; (iv) 10%; (v) 20%; (vi) 50%; (vii) 100%; (viii) 150%; (ix) 200%; (x) 250%; (xi) 300%; (xii) 350%; (xiii) 400%; (xiv) 450%; (xv) 500%; (xvi) 1000%; (xvii) 5000%; and (xviii) 10000%.
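The intensity-difference criterion above can be sketched as a simple percentage test. The description does not state which intensity serves as the reference for the percentage, so the sketch below assumes, purely for illustration, that the difference is expressed relative to the second sample; the function name and values are likewise hypothetical.

```python
def differs_by_more_than(intensity_first: float, intensity_second: float,
                         threshold_percent: float) -> bool:
    """True if the first intensity differs from the second by more than
    threshold_percent, measured relative to the second intensity (assumption)."""
    if intensity_second <= 0:
        raise ValueError("reference intensity must be positive")
    change = abs(intensity_first - intensity_second) / intensity_second * 100.0
    return change > threshold_percent


# Hypothetical intensities: a 50% change passes a 20% threshold, a 5% change does not.
flag_large = differs_by_more_than(150.0, 100.0, threshold_percent=20.0)
flag_small = differs_by_more_than(105.0, 100.0, threshold_percent=20.0)
```

The same test could be applied to average intensities over a plurality of components by passing the averages in place of the single intensities.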
The molecules, components or analytes and/or peptide digest products and/or fragment, daughter or product ions are preferably mass analysed by: (i) a Fourier Transform ("FT") mass spectrometer; (ii) a Fourier Transform Ion Cyclotron Resonance ("FTICR") mass spectrometer; (iii) a Time of Flight ("TOF") mass spectrometer; (iv) an orthogonal acceleration Time of Flight ("oaTOF") mass spectrometer; (v) a magnetic sector mass spectrometer; (vi) a quadrupole mass analyser; (vii) an ion trap mass analyser; or (viii) a Fourier Transform orbitrap, an electrostatic Ion Cyclotron Resonance mass spectrometer or an electrostatic Fourier Transform mass spectrometer. The first sample and/or the second sample and/or further samples are preferably ionised by an ion source selected from the group consisting of: (i) an Electrospray ionisation ("ESI") ion source; (ii) an Atmospheric Pressure Photo Ionisation ("APPI") ion source; (iii) an Atmospheric Pressure Chemical Ionisation ("APCI") ion source; (iv) a Matrix Assisted Laser Desorption Ionisation ("MALDI") ion source; (v) a Laser Desorption Ionisation ("LDI") ion source; (vi) an Atmospheric Pressure Ionisation ("API") ion source; (vii) a Desorption Ionisation on Silicon ("DIOS") ion source; (viii) an Electron Impact ("EI") ion source; (ix) a Chemical Ionisation ("CI") ion source; (x) a Field Ionisation ("FI") ion source; (xi) a Field Desorption ("FD") ion source; (xii) an Inductively Coupled Plasma ("ICP") ion source; (xiii) a Fast Atom Bombardment ("FAB") ion source; (xiv) a Liquid Secondary Ion Mass Spectrometry ("LSIMS") ion source; (xv) a Desorption Electrospray Ionisation ("DESI") ion source; and (xvi) a Nickel-63 radioactive ion source. According to an aspect of the present invention there is provided a mass spectrometer comprising means arranged to probabilistically determine or quantify the relative intensity, concentration or expression level of a component, molecule or analyte in a first sample relative to the intensity, concentration or expression level of a component, molecule or analyte in a second sample. The mass spectrometer preferably further comprises a liquid chromatograph. According to an embodiment the mass spectrometer further comprises one or more mass filters and/or one or more mass analysers.
The one or more mass filters and the one or more mass analysers are preferably selected from the group consisting of: (i) an orthogonal acceleration Time of Flight mass analyser; (ii) an axial acceleration Time of Flight mass analyser; (iii) a Paul 3D quadrupole ion trap mass analyser; (iv) a 2D or linear quadrupole ion trap mass analyser; (v) a Fourier Transform Ion Cyclotron Resonance mass analyser; (vi) a magnetic sector mass analyser; (vii) a quadrupole mass analyser; and (viii) a Penning trap mass analyser. The mass spectrometer preferably further comprises an ion source. The ion source may comprise a pulsed ion source or a continuous ion source. The ion source may be selected from the group consisting of: (i) an Electrospray ionisation ("ESI") ion source; (ii) an Atmospheric Pressure Photo Ionisation ("APPI") ion source; (iii) an Atmospheric Pressure Chemical Ionisation ("APCI") ion source; (iv) a Matrix Assisted Laser Desorption Ionisation ("MALDI") ion source; (v) a Laser Desorption Ionisation ("LDI") ion source; (vi) an Atmospheric Pressure Ionisation ("API") ion source; (vii) a Desorption Ionisation on Silicon ("DIOS") ion source; (viii) an Electron Impact ("EI") ion source; (ix) a Chemical Ionisation ("CI") ion source; (x) a Field Ionisation ("FI") ion source; (xi) a Field Desorption ("FD") ion source; (xii) an Inductively Coupled Plasma ("ICP") ion source; (xiii) a Fast Atom Bombardment ("FAB") ion source; (xiv) a Liquid Secondary Ion Mass Spectrometry ("LSIMS") ion source; (xv) a Desorption Electrospray Ionisation ("DESI") ion source; and (xvi) a Nickel-63 radioactive ion source.
According to an aspect of the present invention there is provided a method of relatively quantifying one or more molecular species among several samples, the method comprising: dividing each sample into multiple replicate samples; for each of the replicate samples obtaining a signal for each of several tentatively identified digestion products of the molecular species in question, wherein the signal is proportional to the concentration of the parent species subject to random noise; obtaining or assigning probabilities that each tentative identification is correct; assigning a prior probability distribution function for the relative amount L of each molecular species in each sample; assigning a prior probability distribution function for the relative amount k of digestion product produced from each molecular species; assigning a prior probability distribution function for the relative amount h of sample for each replicate sample; assigning a prior probability distribution function for the noise level G in each sample; choosing an internal standard wherein the concentration of the internal standard is known to be the same in all of the replicate samples; updating the probability distribution for the relative amount L of each molecular species in each sample; obtaining samples according to the probability distribution for the relative amount L of each molecular species in each sample of a monotonic function of the ratios L_i to L_j for every distinct pair i,j of the replicate samples; and calculating a mean value and standard deviation of the function for each of the pairs.
According to an aspect of the present invention there is provided a method of mass spectrometry comprising: providing a first sample comprising a first mixture of components, molecules or analytes; providing a second different sample comprising a second mixture of components, molecules or analytes; and determining or quantifying the relative intensity, concentration or expression level of a component, molecule or analyte in said first sample relative to the intensity, concentration or expression level of a component, molecule or analyte in said second sample. According to an aspect of the present invention there is provided a mass spectrometer comprising means arranged to determine or quantify the relative intensity, concentration or expression level of a component, molecule or analyte in a first sample relative to the intensity, concentration or expression level of a component, molecule or analyte in a second sample. The preferred embodiment preferably uses a forward modelling algorithm to average over the contribution of unknown ionisation and digestion efficiencies to the measured ion count. The measured ion count of a peptide can be expressed as being proportional to the product of its concentration in the original sample and a factor relating to its ionisation and digestion efficiencies. Values of concentration and digestion/ionisation efficiency are preferably explored for each peptide and likelihoods are preferably calculated for each result using supplied probabilities of the compounds present in the samples. Advantageously, the likelihood calculation preferably does not require missing data to be interpolated or otherwise filled in. This is in contrast to conventional approaches. A further feature of the exploration according to the preferred embodiment is that assignments to data can be switched on or off such that the presence of outliers or outlying data may be investigated.
Relative concentrations of proteins or peptides in each sample can then be calculated and a percentage confidence interval given using the results of the probabilistic exploration. Mass spectral data and microarray data present different challenges. The preferred embodiment is particularly concerned with data which exhibits underlying Poisson noise (counting statistics). This is particularly appropriate when an analytical instrument determines the abundance of daughter compounds (e.g. peptides or peptide digest products) and reports a number of events (e.g. intensity). The events may, for example, relate to the number of ion arrivals in a quadrupole Time of Flight mass spectrometer. However, this is not appropriate to continuous quantities such as the colour/brightness of a microarray spot. In its simplest form, the preferred algorithm as implemented in the method and apparatus according to the preferred embodiment may be considered as being directed to solving a problem where there are two unknown numbers A and B and it is desired to determine the ratio B/A. Samples of A (A1, A2 ... AN) and B (B1, B2 ... BM) are provided. In general, N and M are not equal although the cases N = 1 and M = 1 are permitted. The samples of A and B can either be considered as "Good" or "Bad" and each sample may be considered as coming with a probability e.g. Pr(A3 is Good). A "Good" sample of A will be close to A in some mathematically well defined sense. "Bad" samples of A could be almost anything. The same applies to B. According to the preferred embodiment it is desired to infer the ratio B/A given only this information, and also to provide an uncertainty estimate for the ratio. In the preferred embodiment, the numbers A and B are proportional to concentrations of peptides in solution as measured by a mass spectrometer. The following example may be considered:
Sample A                    Sample B
Measurement   Prob          Measurement   Prob
100           0.91          510           0.87
96            0.89          487           0.96
107           0.92          97            0.63
111           0.98          530           0.78
98            0.91
104           0.97
111           0.83
104           0.88
246           0.89

From the above data, the conclusion that A equals 100 and that B equals 500 is plausible if the last sample of A and the third sample of B are considered as being "bad" and hence are rejected as being outliers. The preferred embodiment does not however immediately reject data which may initially appear to be spurious. The ratio B/A as determined by the preferred embodiment, with a corresponding uncertainty estimate, is 5.1 ± 0.1. The preferred embodiment can be considered from a different perspective and can be considered as addressing a second related question. This problem can be considered to be that there are 2+K unknown numbers A, B, k1, k2 ... kK and that some of the 2*K possible products are provided or known:
A*k1, A*k2 ... A*kK, B*k1, B*k2 ... B*kK

It can be considered that any number of samples of any of these products are provided. Samples can either be considered to be "Good" or "Bad" in the same sense as above, and each sample again comes with an associated probability. The problem is again to estimate B/A and provide an uncertainty estimate. According to the preferred embodiment the numbers A and B are proportional to concentrations of intact proteins in solution prior to digestion, and the other unknowns ki are related to the digestion and ionisation characteristics of the proteins' tryptic peptides. The coefficients ki are not of particular interest and it is not necessary actually to calculate them. The preferred embodiment relates to a method and apparatus which incorporates an algorithm designed to quantify changes in abundance of an analyte compound across several physical samples containing the analyte or its products and at least one internal standard compound. Any number of replicate measurements may be available from each sample, and the data may be noisy and generally also incomplete. It is known from the outset that there is a probability of incorrect assignment of data and that some assignments are more likely to be correct than others. The preferred embodiment relates to the application of a novel mathematical model of the data and to using Markov chain Monte Carlo techniques to explore the space of model parameters in such a way that changes in abundance along with associated uncertainties can be measured and determined. Standard statistical techniques such as pairwise t-tests and ANOVA cannot be applied in situations where the number of measurements in each sample is different, when measurements are missing, where assignments of data are ambiguous, where measurements are experimentally correlated or where the number of measurements is very small.
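The two-number problem described earlier (Sample A and Sample B, each measurement carrying a probability of being "Good") can be explored with a small Metropolis sampler, sketched below. This is an illustrative sketch under stated assumptions, not the patented procedure: an ordinary Poisson distribution stands in for the "Good" model, an exponential distribution on the overall data scale stands in for the "Bad" model, and the step sizes, starting point, chain length and all function names are hypothetical. Because these stand-ins differ from the preferred embodiment, the output is not expected to reproduce the quoted 5.1 ± 0.1 exactly.

```python
import math
import random
import statistics

# (measurement, probability that the measurement is "Good") from the example.
SAMPLE_A = [(100, 0.91), (96, 0.89), (107, 0.92), (111, 0.98), (98, 0.91),
            (104, 0.97), (111, 0.83), (104, 0.88), (246, 0.89)]
SAMPLE_B = [(510, 0.87), (487, 0.96), (97, 0.63), (530, 0.78)]
SCALE = max(d for d, _ in SAMPLE_A + SAMPLE_B)  # overall scale of the data


def log_mixture(data, mu):
    """Sum of log[p * Poisson(d; mu) + (1 - p) * background(d)] over the data."""
    total = 0.0
    for d, p in data:
        good = math.exp(d * math.log(mu) - mu - math.lgamma(d + 1))  # assumed "Good" model
        bad = math.exp(-d / SCALE) / SCALE                           # assumed "Bad" model
        total += math.log(p * good + (1.0 - p) * bad)
    return total


def log_posterior(a, b):
    """Unnormalised log posterior for the unknown numbers A and B."""
    if a <= 0 or b <= 0:
        return float("-inf")
    log_prior = -(a + b) / SCALE  # exponential priors, constant terms dropped
    return log_prior + log_mixture(SAMPLE_A, a) + log_mixture(SAMPLE_B, b)


def sample_ratio(n_steps=20000, burn_in=2000, seed=0):
    """Random-walk Metropolis over (A, B); returns mean and stdev of B/A."""
    rng = random.Random(seed)
    a = float(statistics.median(d for d, _ in SAMPLE_A))
    b = float(statistics.median(d for d, _ in SAMPLE_B))
    current = log_posterior(a, b)
    ratios = []
    for step in range(n_steps):
        a_new = a + rng.gauss(0.0, 3.0)    # symmetric proposal for A
        b_new = b + rng.gauss(0.0, 10.0)   # symmetric proposal for B
        proposed = log_posterior(a_new, b_new)
        if rng.random() < math.exp(min(0.0, proposed - current)):
            a, b, current = a_new, b_new, proposed
        if step >= burn_in:
            ratios.append(b / a)
    return statistics.fmean(ratios), statistics.stdev(ratios)


mean_ratio, sd_ratio = sample_ratio()
print(f"B/A is approximately {mean_ratio:.2f} +/- {sd_ratio:.2f}")
```

Note that no measurement is discarded in advance: the outlying values (246 in Sample A and 97 in Sample B) remain in the data, and the mixture term simply lets the background component account for them.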
As will be appreciated by those skilled in the art, real experimental data is often noisy and incomplete, and conventional techniques are therefore of limited use in processing and analysing such data. A particularly advantageous aspect of the preferred embodiment is that a normalisation step does not need to be performed as a separate step in order to determine the relative concentration of a particular analyte present in two or more separate samples. Multiple experiments are preferably performed and an analyte for which the concentration is the same in all experiments is preferably used as an internal standard. The preferred embodiment allows for daughter compounds (e.g. peptides) which are associated probabilistically with parents (e.g. proteins). This is particularly useful when daughters are enzymatic digest products of proteins and wherein peptide identification information comes from tandem mass spectrometry. The preferred embodiment also deals transparently with missing data. Conventional approaches, by contrast, are particularly problematic and prone to error when data is missing. The preferred embodiment relates to a probabilistic or Bayesian method of measuring differences in the relative concentration of a particular analyte present in multiple different samples. The preferred embodiment is particularly advantageous in being able accurately to quantify analytes present in samples even though the experimental data may be less than perfect. The data may, for example, suffer from an unknown gain and/or there may be other global or poorly understood sources of noise. The concentration of each analyte in the original samples may be represented in the data by one or more compounds. These compounds preferably comprise digestion products/fragments which shall be referred to hereinafter as daughters. For each sample several replicate experiments are preferably performed i.e. the sample is divided up into a number of sub-samples and each sub-sample may be separately analysed. As will be appreciated, running the preferred procedures on multiple replicate samples helps to improve the accuracy of the quantification steps according to the preferred embodiment. However, it is not essential that samples be divided into a number of replicate samples and that each replicate sample be analysed separately. It is contemplated that different (and unknown) quantities of a sample may be used in each replicate experiment so that there may be significant variations in the data among the replicate experiments. The identity of each peptide may be in question, but according to the preferred embodiment a probability pij = Pr(Protein is analyte j given data associated with peptide i) is either available or is set to some uniform value. This information may, for example, come from the analysis of fragments of peptides by tandem mass spectrometry (MSMS) wherein peptide digest products are fragmented in a collision or fragmentation cell and the resulting fragment, daughter or product ions are mass analysed. Some peptides may not have complete coverage across all experiments for reasons other than low concentration. Such reasons may be practical considerations. For example, a number of peptides with a similar mass to charge ratio may elute from the liquid chromatograph at a similar time making identification difficult. The preferred embodiment enables an output to be generated which may comprise ratios of concentration for each analyte between pairs of conditions with associated uncertainties, the probability that each ratio exceeds one, a full posterior probability distribution for each ratio, or other desired statistics.
The preferred method assumes that the ideal measured intensity of each peptide in the mass spectral data is proportional to the concentration of the corresponding parent protein, that the measured intensities are inherently subject to at least Poisson noise (counting statistics), and that there exists at least one measured peptide which can be assumed to be at the same concentration in each experiment for each sample (this will be referred to hereinafter as an "internal standard"). The preferred method depends on constructing a model of the data taking into account the problems and requirements described above. The underlying data D_U for each peptide (before noise and gain) is assumed to be given by:

\[ D_U = L\,h\,k \qquad (1) \]

where L is the concentration of protein present in a sample, h expresses how much sample (or what fraction of the sample) was used in a particular replicate experiment and k is a coefficient which expresses the efficiency with which a peptide is produced from the corresponding protein ion and also how efficiently the mass spectrometer observes the peptide ion. The actual observed data D0 is assumed to be subject to Poisson noise and an unknown gain G to allow for global scaling of the noise level. For a particular peptide ion, the probability of observing D0 given a particular set of model parameters L, k and h is:

\[ \Pr(D_0 \mid L, k, h) = \Pr(D_0 \mid D_U)\,p + \Pr(D_0 \mid B)\,(1 - p) \qquad (2) \]

where:

[Equation 3, giving Pr(D0 | D_U): available only as image imgf000024_0001 in the source]

and is a modified Poisson distribution which captures the degree of agreement of the predicted theoretical data with the actual experimentally observed data. The quantity in Equation 2 will be referred to hereinafter as the likelihood. With reference to Equations 2 and 3 above, the Gamma function Γ(x) is a commonly used special function, p is the probability that the parent analyte is correctly assigned and Pr(D0|B) is the background probability of observing a particular datum D0 given an incorrect parent assignment. According to the preferred embodiment:

[Equation 4, giving Pr(D0 | B): available only as image imgf000025_0001 in the source]
Equation 4 reflects the fact that data attached to an incorrect assignment could be almost anything roughly consistent with the overall scale of the data Λ. In a preferred embodiment, Λ is taken to be the size of the largest datum. In a less preferred embodiment, Λ is taken to be a probability weighted average over all data. Should the result of the probability function as detailed in Equation 4 be larger and thus more significant in calculating the likelihood (Equation 2) than the result of Equation 3, then the assignment can be considered incorrect. In order to complete the probabilistic formulation of the problem, it is necessary to specify prior probability distributions for each of the parameters L, h, k and G. The prior probability distributions are denoted Pr(L), Pr(h) etc. The prior probability distributions encapsulate what is known about the parameters before the data is examined, ensuring that unrealistic values are not investigated. In the preferred embodiment an exponential form for the prior probability distributions for parameters L, h, k and G is preferably used. For example:

\[ \Pr(L) \propto \exp(-L/L_0) \qquad (5) \]

[Further expressions: available only as image imgf000025_0002 in the source]

There are various different possible prescriptions for choosing L0 in Equation 5. According to the preferred embodiment L0 is set as being Λ, k0 is set as being 1, h0 is set as being 1 and G0 is also set as being 1. With these parameters defined, particular choices of prior probability distributions can be linked with the calculated likelihood for given values of L, h and k to give the joint probability distribution, which can be expressed as:

\[ \Pr(L, h, k, G, \mathrm{Data}) = \Pr(L)\,\Pr(h)\,\Pr(k)\,\Pr(G) \prod_{\mathrm{Data}} \Pr(D_0 \mid L, h, k) \qquad (6) \]
where L, h and k are vectors on the LHS of the above expression. The dimension of the vector L is the number of samples multiplied by the number of analytes. The dimension of the vector h is the number of experiments. The dimension of the vector k is the number of daughters (e.g. peptides).
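The way Equations 1, 2 and 6 combine can be sketched as a log joint probability over the vectors L, h and k. The sketch makes several assumptions that are not in the source: an ordinary Poisson distribution is used in place of the modified form of Equation 3, an exponential on the data scale is used as the background term, the gain G and its prior are omitted, and the `Datum` record and function names are hypothetical.

```python
import math
from typing import NamedTuple, Sequence


class Datum(NamedTuple):
    sample: int       # index of the physical sample
    analyte: int      # index of the parent analyte (e.g. protein)
    experiment: int   # index of the replicate experiment
    daughter: int     # index of the daughter (e.g. peptide)
    observed: float   # observed intensity D0
    p: float          # probability that the parent assignment is correct


def log_exp_prior(values: Sequence[float], scale: float) -> float:
    """Sum of log[(1/scale) * exp(-v/scale)] over the values."""
    return sum(-math.log(scale) - v / scale for v in values)


def log_joint(L, h, k, data, scale):
    """Log joint probability with L[sample][analyte], h[experiment], k[daughter]."""
    flat_L = [x for row in L for x in row]
    if min(flat_L + list(h) + list(k)) <= 0:
        return float("-inf")
    total = (log_exp_prior(flat_L, scale)      # Pr(L), L0 set to the data scale
             + log_exp_prior(h, 1.0)           # Pr(h), h0 = 1
             + log_exp_prior(k, 1.0))          # Pr(k), k0 = 1
    for d in data:
        mu = L[d.sample][d.analyte] * h[d.experiment] * k[d.daughter]  # Equation 1
        good = math.exp(d.observed * math.log(mu) - mu
                        - math.lgamma(d.observed + 1))   # assumed plain Poisson
        bad = math.exp(-d.observed / scale) / scale      # assumed background form
        mix = d.p * good + (1.0 - d.p) * bad             # Equation 2
        total += math.log(mix) if mix > 0 else float("-inf")
    return total


# Hypothetical data: one analyte, two samples, one replicate each, one peptide.
example_data = [Datum(0, 0, 0, 0, 100.0, 0.9), Datum(1, 0, 1, 0, 500.0, 0.9)]
matching = log_joint([[100.0], [500.0]], [1.0, 1.0], [1.0], example_data, 500.0)
mismatched = log_joint([[100.0], [100.0]], [1.0, 1.0], [1.0], example_data, 500.0)
```

Parameter values whose predicted intensities agree with the observed data give a higher log joint probability than values that do not, which is what a Markov Chain Monte Carlo exploration exploits.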
Accordingly, the total number of model parameters (including the gain and ignoring the internal standard) equals (the number of samples times the number of analytes) plus (the number of experiments) plus (the number of daughters) plus 1. The joint probability distribution as given in Equation 6 is therefore a high dimensional function. The quantity of interest, however, is preferably the set of ratios of elements of the vector L and the corresponding set of uncertainties, relating to a single protein or peptide in multiple samples. It is preferred not to locate the single vector L which maximises the joint probability, but to obtain probability distributions for ratios of elements of L. An example would be Pr(L2/L1, Data). Such probability distributions are often asymmetrical, making the associated uncertainties difficult to express. Thus it is preferred to express the probability distributions for monotonic functions of ratios of elements of L, for instance natural logarithms of ratios of elements of L. These distributions allow estimates of the ratios to be quoted with associated uncertainties or any other desired statistics. Appropriate methods to perform this exploration are known to those skilled in the art. General tools exist for solving this kind of problem including, for example, the publicly available inference engine BayeSys (RTM). An approximation may preferably be made to the full joint probability as detailed in Equation 6 above to bring about an increase in the speed of exploration. For each peptide, there is (at most) one contribution to the product in the joint probability (Equation 6) from each experiment. These contributions have two terms each. The approximation preferably keeps only four terms per protein in the fully expanded joint probability.
These four terms correspond to: (i) peptides assigned correctly in all experiments; (ii) peptides assigned correctly in all but the least probable experiment (lowest value of p); (iii) peptides assigned incorrectly in all but the strongest experiment (highest value of p); and (iv) peptides assigned incorrectly in all experiments. In practice, however, even using powerful techniques such as applying Markov Chain Monte Carlo algorithms and simulated annealing, the solution to these problems can still become very slow when a large number of analytes is involved. The preferred embodiment enables the exploration to proceed more efficiently by preferably analytically reducing the dimensionality of the posterior probability distribution (Equation 6) by removing all components of the vector k, thus leaving fewer parameters to explore and saving computational power, in a procedure known as marginalisation. This is possible as it is unnecessary to record the magnitude of the vector k. Marginalisation is a process wherein both sides of the joint probability function (Equation 6) are integrated with respect to one of the vectors. In a preferred embodiment, marginalisation proceeds by the integration of the joint probability function with respect to k. In a less preferred embodiment, marginalisation may proceed by the integration of the joint probability function with respect to h. In a less preferred embodiment, a further integration may be performed, such that k and h may both be removed from the joint probability function. The second integration in such a method is, however, often difficult (and sometimes impossible), as the first integral may not be a true function. Various embodiments of the present invention will now be described, by way of example only, and with reference to the accompanying drawings in which: Fig. 1 shows some simulated noisy data where measurements for some analytes are not available; and Fig. 2 shows the actual relationship between sample quality and analyte expression and the relationship as determined according to the preferred embodiment. A preferred embodiment of the present invention will now be described. Fig. 1 shows simulated data with numbers produced using a random number generator. Four samples were considered and two replicate experiments were modelled for each sample. Accordingly, a total of eight experiments were performed. The actual, underlying or true relationships or ratios between the sample quantities h1-h8 and between the analyte expressions L1-L4 are shown in Fig. 2. Fig. 2 also shows the experimentally determined relationships or ratios as reconstructed according to the preferred embodiment from the noisy and incomplete data as shown in Fig. 1. It is apparent from Fig. 1 that in the sixth experiment no data was modelled as being present or obtained for the internal standard or invariant ions. Nonetheless, as can be seen from Fig. 2, the ratio h6/h1 has still been recovered successfully by the method of the preferred embodiment despite the lack of any internal standard in this experiment. It is to be noted that all of the sample ratios were successfully recovered and are shown in Fig. 2 consistent within the reported uncertainties. This would not be possible using conventional techniques. A number of further modifications to the preferred embodiment are contemplated. According to a modification the Poisson distribution given in Equation 3 above may be replaced by a Gaussian approximation to a Poisson distribution. According to another embodiment the exponential prior probability distribution function as presented in Equation 5 above may be replaced by a gamma distribution for any of the parameters G, L, h or k. For example, according to an embodiment:
$$\Pr(L) = \frac{L^{a-1}\exp(-L/\Lambda)}{\Gamma(a)\,\Lambda^{a}} \qquad (7)$$
According to a further embodiment, the exponential prior probability distribution function as given in Equation 3 above may be replaced by a normal distribution for any of the parameters G, L, h or k. For example:
[Equation 8: normal prior probability distribution; equation image not reproduced]
The exponential prior probability distribution function as given in Equation 3 may according to another embodiment be replaced by a lognormal distribution for any of the parameters G, L, h or k. For example:
[Equation 9: lognormal prior probability distribution; equation image not reproduced]
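By way of illustration only, the alternative prior distributions mentioned above may be sketched as follows. The textbook density formulas are used; the function names and the parameter names (`shape`, `scale`, `mu`, `sigma`) are assumptions introduced for this sketch and are not taken from the equations of the specification. The sketch also checks numerically that each density integrates to approximately one, and that a gamma density with shape 1 reduces to the exponential prior.

```python
import math


def exponential_pdf(x, scale):
    """Exponential density, proportional to exp(-x/scale) for x >= 0."""
    return math.exp(-x / scale) / scale if x >= 0 else 0.0


def gamma_pdf(x, shape, scale):
    """Gamma density; shape = 1 recovers the exponential density."""
    if x <= 0:
        return 0.0
    log_pdf = ((shape - 1.0) * math.log(x) - x / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)


def normal_pdf(x, mu, sigma):
    """Normal (Gaussian) density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def lognormal_pdf(x, mu, sigma):
    """Lognormal density: ln(x) is normal with mean mu and deviation sigma."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


def integrate(f, lo, hi, n=100_000):
    """Midpoint-rule integral of f over [lo, hi] (illustrative accuracy only)."""
    step = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * step) for i in range(n)) * step
```

For example, `integrate(lambda x: gamma_pdf(x, 2.0, 1.5), 0.0, 60.0)` should return a value close to one, which is the normalisation property required of each prior.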
According to an embodiment the value L0 in Equation 3 above is set to the average datum size. It is contemplated that a dimension could be removed from the model. According to such an embodiment, L may be multiplied by a constant and k could be divided by the same constant without changing the likelihood (Equation 2). A constraint could be added such as $\prod_i h_i = 1$ and the dependence on h could be recast in hyperbolic coordinates. This describes an alternative method of simplifying the probability distribution to marginalisation. Rather than integrating a value out of the equation as in the case of marginalisation, a limit could instead be imposed on its possible values, such that there is less "space" for the algorithm to explore. To understand the concept of "space", a graph with h2 plotted against h1 can be considered. If there is no limit imposed on values of h, then the algorithm must explore all positive values (zero to infinity) for h1 and likewise for h2, i.e. the entire positive region of the graph. By declaring the product h1h2 = 1, the space that the algorithm needs to explore is limited to a single hyperbolic line on this graph (h2 = 1/h1, i.e. y = 1/x). This leaves the values of h with some flexibility, so is a better approximation than simply assigning h1 = 1. This imposition can be made since the likelihood will remain the same if the value of k is altered accordingly.
According to another embodiment marginalisation may proceed by integrating over h instead of k. As discussed above, since according to the preferred embodiment the values of L and Data are the only ones of particular interest, all other values (i.e. G, h, k) in the joint probability function (see Equation 6 above) can be considered as being nuisance parameters, i.e. parameters required for the calculation but otherwise unnecessary for the output. One of these values can be removed from the joint probability function by integrating both sides with respect to this value.
For instance, to remove k, it is necessary to integrate with respect to k, giving:
$$\Pr(L, h, G, \mathrm{Data}) = \int \Pr(L)\,\Pr(h)\,\Pr(k)\,\Pr(G) \prod_{\mathrm{Data}} \Pr(D \mid L, h, k)\; dk \qquad (10)$$
thus leaving the algorithm one less parameter to explore, and saving computational time. The result of such an integral is unlikely to be a function, so further integration is unlikely to be possible. It is not usually possible to integrate the function with respect to G, the program usually doing so with respect to h or k.
The analytes could according to an embodiment be processed one at a time along with the internal standard rather than modelling the whole data set at once. According to an embodiment the preferred embodiment may tackle the problem in two parts. Firstly, h may be inferred and then L may be inferred given the inference about h. According to an embodiment there may not be any daughters (e.g. peptides) i.e. it may be possible to quantify directly on the analytes, or it may not be possible to make the associations described above and treat each daughter as a separate analyte. A further embodiment is contemplated wherein different approximations may be made to the joint probability distribution given in Equation 6 above. For example, up to six terms or eight terms may be kept, or all terms may be retained. It is also contemplated that the joint probability distribution could be explored without marginalisation.
Although the present invention has been described with reference to preferred embodiments, it will be understood by those skilled in the art that various changes in form and detail may be made without departing from the scope of the invention as set forth in the accompanying claims.
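By way of illustration only, the marginalisation over k described above may be sketched for a deliberately simplified toy model. The sketch assumes a single datum with a Poisson likelihood whose mean is the product k·L·h, and an exponential prior exp(-k) on k; neither assumption is taken from the equations of the specification, and all function names are hypothetical. Under these assumptions the integral over k has the closed form c^D / (1 + c)^(D+1), where c = L·h and D is the observed count, and the code compares this with a direct numerical integration.

```python
import math


def poisson_pmf(count, mean):
    """Poisson probability of observing `count` given a positive `mean`."""
    return math.exp(count * math.log(mean) - mean - math.lgamma(count + 1))


def joint_integrand(k, count, L, h):
    """Prior exp(-k) times the toy Poisson likelihood with mean k * L * h."""
    return math.exp(-k) * poisson_pmf(count, k * L * h)


def marginal_numeric(count, L, h, k_max=60.0, n=50_000):
    """Integrate k out numerically with the midpoint rule on (0, k_max)."""
    step = k_max / n
    total = sum(joint_integrand((i + 0.5) * step, count, L, h)
                for i in range(n))
    return total * step


def marginal_closed_form(count, L, h):
    """Analytic result of the same integral: c**D / (1 + c)**(D + 1)."""
    c = L * h
    return (c / (1.0 + c)) ** count / (1.0 + c)
```

Once k has been integrated out analytically in this way, a sampler only needs to explore the remaining parameters (here L and h), which is the saving described above.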

Claims

1. A method of mass spectrometry comprising: providing a first sample comprising a first mixture of components, molecules or analytes; providing a second different sample comprising a second mixture of components, molecules or analytes; and probabilistically determining or quantifying the relative intensity, concentration or expression level of a component, molecule or analyte in said first sample relative to the intensity, concentration or expression level of a component, molecule or analyte in said second sample.
2. A method as claimed in claim 1, further comprising providing a plurality of further samples each comprising a mixture of components, molecules or analytes.
3. A method as claimed in claim 1 or 2, wherein said components, molecules or analytes comprise: (i) proteins; (ii) protein digest products; (iii) peptides; or (iv) fragments of peptides.
4. A method as claimed in claim 1, 2 or 3, wherein said components, molecules or analytes in said first mixture are the same species as said components, molecules or analytes in said second mixture and/or components, molecules or analytes in further mixtures.
5. A method as claimed in claim 1, 2 or 3, wherein said components, molecules or analytes in said first mixture are different species to said components, molecules or analytes in said second mixture and/or to components, molecules or analytes in further mixtures.
6. A method as claimed in any preceding claim, further comprising: digesting said first mixture of components, molecules or analytes; and/or digesting said second mixture of components, molecules or analytes; and/or digesting further mixtures of components, molecules or analytes.
7. A method as claimed in claim 6, wherein: said first mixture of components, molecules or analytes is digested to form a first complex mixture; and/or said second mixture of components, molecules or analytes is digested to form a second complex mixture; and/or further mixtures of components, molecules or analytes are digested to form further complex mixtures.
8. A method as claimed in claim 7, wherein said complex mixtures comprise complex mixtures of peptides or protein digest products.
9. A method as claimed in any preceding claim, further comprising: dividing said first sample into one or more first replicate samples; and/or dividing said second sample into one or more second replicate samples; and/or dividing further samples into one or more further replicate samples; and/or dividing said first complex mixture into one or more first replicate samples; and/or dividing said second complex mixture into one or more second replicate samples; and/or dividing said further complex mixtures into one or more further replicate samples.
10. A method as claimed in any preceding claim, further comprising: separating components, analytes or molecules in said first sample by means of a separation process; and/or separating components, analytes or molecules in said second sample by means of a separation process; and/or separating components, analytes or molecules in further samples by means of a separation process; and/or separating components, analytes or molecules in said first replicate samples by means of a separation process; and/or separating components, analytes or molecules in said second replicate samples by means of a separation process; and/or separating components, analytes or molecules in further replicate samples by means of a separation process.
11. A method as claimed in claim 10, wherein said separation process comprises liquid chromatography.
12. A method as claimed in claim 10, wherein said separation process comprises: (i) High Performance Liquid Chromatography ("HPLC"); (ii) anion exchange; (iii) anion exchange chromatography; (iv) cation exchange; (v) cation exchange chromatography; (vi) ion pair reversed-phase chromatography; (vii) chromatography; (viii) single dimensional electrophoresis; (ix) multi-dimensional electrophoresis; (x) size exclusion; (xi) affinity; (xii) reverse phase chromatography; (xiii) Capillary Electrophoresis Chromatography ("CEC"); (xiv) electrophoresis; (xv) ion mobility separation; (xvi) Field Asymmetric Ion Mobility Separation or Spectrometry ("FAIMS"); or (xvii) capillary electrophoresis.
13. A method as claimed in any preceding claim, further comprising: ionising components, analytes or molecules in said first sample; and/or ionising components, analytes or molecules in said second sample; and/or ionising components, analytes or molecules in further samples; and/or ionising components, analytes or molecules in first replicate samples; and/or ionising components, analytes or molecules in second replicate samples; and/or ionising components, analytes or molecules in further replicate samples.
14. A method as claimed in any preceding claim, further comprising: mass analysing components, analytes or molecules in said first sample; and/or mass analysing components, analytes or molecules in said second sample; and/or mass analysing components, analytes or molecules in further samples; and/or mass analysing components, analytes or molecules in said first replicate samples; and/or mass analysing components, analytes or molecules in said second replicate samples; and/or mass analysing components, analytes or molecules in further replicate samples.
15. A method as claimed in claim 14, wherein said step of mass analysing components, analytes or molecules further comprises producing mass spectral data comprising a plurality of mass peaks.
16. A method as claimed in claim 15, further comprising determining the mass or mass to charge ratio of one or more of said mass peaks.
17. A method as claimed in claim 15 or 16, further comprising determining the signal intensity, or the integrated signal, for one or more of said mass peaks.
18. A method as claimed in claim 15, 16 or 17, further comprising determining the retention time for one or more of said mass peaks.
19. A method as claimed in any of claims 15-18, further comprising clustering mass peaks from said first sample and/or said second sample and/or further samples.
20. A method as claimed in any of claims 15-18, further comprising clustering mass peaks from said first replicate sample and/or said second replicate sample and/or further replicate samples.
21. A method as claimed in claim 19 or 20, further comprising: recognising or identifying components, analytes or molecules in said first sample; and/or recognising or identifying components, analytes or molecules in said second sample; and/or recognising or identifying components, analytes or molecules in further samples; and/or recognising or identifying components, analytes or molecules in said first replicate samples; and/or recognising or identifying components, analytes or molecules in said second replicate samples; and/or recognising or identifying components, analytes or molecules in further replicate samples.
22. A method as claimed in claim 21, wherein components, analytes or molecules are recognised or identified on the basis of mass or mass to charge ratio or accurate mass or accurate mass to charge ratio.
23. A method as claimed in claim 22, wherein the accurate mass or mass to charge ratio of said components, analytes or molecules is determined to within 20 ppm, 19 ppm, 18 ppm, 17 ppm, 16 ppm, 15 ppm, 14 ppm, 13 ppm, 12 ppm, 11 ppm, 10 ppm, 9 ppm, 8 ppm, 7 ppm, 6 ppm, 5 ppm, 4 ppm, 3 ppm, 2 ppm, 1 ppm or < 1 ppm.
24. A method as claimed in claim 22 or 23, wherein the mass to charge ratio of said components, analytes or molecules is determined to within 0.01 mass units, 0.009 mass units, 0.008 mass units, 0.007 mass units, 0.006 mass units, 0.005 mass units, 0.004 mass units, 0.003 mass units, 0.002 mass units, 0.001 mass units or < 0.001 mass units.
25. A method as claimed in any of claims 21-24, wherein components, analytes or molecules are recognised or identified on the basis of chromatographic retention time.
26. A method as claimed in any of claims 21-25, further comprising fragmenting components, molecules or analytes in a collision or fragmentation cell to form, create or generate a plurality of fragment, daughter or product ions.
27. A method as claimed in claim 26, further comprising mass analysing said fragment, daughter or product ions.
28. A method as claimed in any of claims 21-27, further comprising: identifying or recognising components, molecules or analytes in said first sample on the basis of fragment, daughter or product ions; and/or identifying or recognising components, molecules or analytes in said second sample on the basis of fragment, daughter or product ions; and/or identifying or recognising components, molecules or analytes in further samples on the basis of fragment, daughter or product ions.
29. A method as claimed in any of claims 15-28, further comprising obtaining or assigning probabilities for the correct identification of mass peaks.
30. A method as claimed in claim 29, further comprising determining or deriving said probabilities from a protein search procedure.
31. A method as claimed in claim 29 or 30, further comprising assigning a constant probability of correct identification where no probability is determined or derived from a protein search procedure.
32. A method as claimed in claim 31, further comprising assigning the probability of correct identification as a value x%.
33. A method as claimed in claim 32, wherein x is selected from the group consisting of: (i) < 5%; (ii) 5-10%; (iii) 10-15%; (iv) 15-20%; (v) 20-25%; (vi) 25-30%; (vii) 30-35%; (viii) 35-40%; (ix) 40-45%; (x) 45-50%; (xi) 50-55%; (xii) 55-60%; (xiii) 60-65%; (xiv) 65-70%; (xv) 70-75%; (xvi) 75-80%; (xvii) 80-85%; (xviii) 85-90%; (xix) 90-95%; and (xx) > 95%.
34. A method as claimed in any of claims 29-33, further comprising assigning a constant probability of correct identification in the event that no protein search procedure is performed.
35. A method as claimed in claim 34, further comprising assigning the probability of correct identification as a value x%.
36. A method as claimed in claim 35, wherein x is selected from the group consisting of: (i) < 5%; (ii) 5-10%; (iii) 10-15%; (iv) 15-20%; (v) 20-25%; (vi) 25-30%; (vii) 30-35%; (viii) 35-40%; (ix) 40-45%; (x) 45-50%; (xi) 50-55%; (xii) 55-60%; (xiii) 60-65%; (xiv) 65-70%; (xv) 70-75%; (xvi) 75-80%; (xvii) 80-85%; (xviii) 85-90%; (xix) 90-95%; and (xx) > 95%.
37. A method as claimed in any preceding claim, further comprising determining, formulating or assigning a prior probability distribution function Pr(L) for the relative amount or concentration L of components, molecules or analytes present in each sample.
38. A method as claimed in claim 37, wherein said prior probability distribution function Pr(L) is proportional to exp(-L/Λ) .
39. A method as claimed in claim 38, wherein Λ corresponds with a maximum signal intensity recorded for a mass peak.
40. A method as claimed in claim 38, wherein Λ corresponds with a mean or average signal intensity recorded for mass peaks.
41. A method as claimed in claim 37, wherein said prior probability distribution function Pr(L) has a gamma, Poisson, Gaussian, exponential, normal or lognormal distribution.
42. A method as claimed in any of claims 37-41, wherein said prior probability distribution function Pr(L) has a distribution with an integral equal to one.
43. A method as claimed in any preceding claim, further comprising determining, formulating or assigning a prior probability distribution function Pr(k) for the overall response factor k of each component, molecule or analyte in said sample.
44. A method as claimed in claim 43, wherein k includes one or more of the following: (i) digestion efficiency; (ii) relative product yield; (iii) losses in delivery; (iv) ionisation efficiency; (v) transmission efficiency; and (vi) detection efficiency.
45. A method as claimed in claim 43 or 44, wherein said prior probability distribution function Pr(k) is proportional to exp(-k/k0), where k0 is a constant.
46. A method as claimed in claim 45, wherein k0 = 1.
47. A method as claimed in claim 43 or 44, wherein said prior probability distribution function Pr(k) has a gamma, Poisson, Gaussian, exponential, normal or lognormal distribution.
48. A method as claimed in any of claims 43-47, wherein said prior probability distribution function Pr(k) has a distribution with an integral equal to one.
49. A method as claimed in any preceding claim, further comprising determining, formulating or assigning a prior probability distribution function Pr(h) for the relative amount of sample h of each component, molecule or analyte in each sample used in an analysis.
50. A method as claimed in claim 49, wherein h includes one or more of the following: (i) amount of solvent added; and (ii) amount of material injected.
51. A method as claimed in claim 49 or 50, wherein said prior probability distribution function Pr(h) is proportional to exp(-h/h0), where h0 is a constant.
52. A method as claimed in claim 51, wherein h0 = 1.
53. A method as claimed in claim 49 or 50, wherein said prior probability distribution function Pr(h) has a gamma, Poisson, Gaussian, exponential, normal or lognormal distribution.
54. A method as claimed in any of claims 49-53, wherein said prior probability distribution function Pr(h) has a distribution with an integral equal to one.
55. A method as claimed in any preceding claim, further comprising determining, formulating or assigning a prior probability distribution function Pr(G) for the noise contribution factor G assumed for observed signal intensities and/or applied to predicted signal intensities.
56. A method as claimed in claim 55, wherein G includes one or more of the following: (i) ion statistical shot noise; and (ii) Electrospray ionisation droplet statistical shot noise.
57. A method as claimed in claim 55 or 56, wherein said prior probability distribution function Pr(G) is proportional to exp(-G/G0), where G0 is a constant.
58. A method as claimed in claim 57, wherein G0 = 1.
59. A method as claimed in claim 55 or 56, wherein said prior probability distribution function Pr(G) has a gamma, Poisson, Gaussian, exponential, normal or lognormal distribution.
60. A method as claimed in any of claims 55-59, wherein said prior probability distribution function Pr(G) has a distribution with an integral equal to one.
61. A method as claimed in any preceding claim, further comprising locating, determining, identifying or choosing one or more internal standards or references.
62. A method as claimed in claim 61, wherein said one or more internal standards or references comprise one or more components, molecules or analytes which have substantially the same intensity, concentration or expression level in all of said samples.
63. A method as claimed in claim 61 or 62, wherein said one or more internal standards or references comprise one or more components, molecules or analytes added to each sample.
64. A method as claimed in claim 61, 62 or 63, wherein said one or more internal standards or references are endogenous or exogenous to said first sample and/or said second sample and/or further samples.
65. A method as claimed in any preceding claim, further comprising applying or using a Markov Chain Monte Carlo predictive procedure or investigating iteratively using a Markov Chain Monte Carlo algorithm to determine likely values for the relative concentrations L of each component, molecule or analyte in each of said samples.
66. A method as claimed in claim 65, wherein said Markov Chain Monte Carlo predictive procedure or algorithm is selected from the group consisting of: (i) Metropolis Hastings algorithm; (ii) Gibbs Sampling algorithm; (iii) Hamiltonian Monte Carlo algorithm; and (iv) Slice Sampling algorithm.
67. A method as claimed in claim 65 or 66, wherein said Markov Chain Monte Carlo predictive procedure or algorithm is used in conjunction with simulated annealing and/or nested sampling.
68. A method as claimed in any preceding claim, further comprising predicting what would be observed for each mass peak intensity given probability distribution functions Pr(L) and/or Pr(k) and/or Pr(h) and/or Pr(G) and/or given the probability p of correct identification.
69. A method as claimed in claim 68, further comprising comparing peak intensities that are predicted with those that are observed.
70. A method as claimed in claim 68 or 69, further comprising adjusting the value of L or the probability distribution function Pr(L).
71. A method as claimed in claim 68, 69 or 70, further comprising adjusting the value of k or the probability distribution function Pr(k).
72. A method as claimed in any of claims 68-71, further comprising adjusting the value of h or the probability distribution function Pr(h).
73. A method as claimed in any of claims 68-72, further comprising adjusting the value of G or the probability distribution function Pr(G).
74. A method as claimed in any of claims 68-73, further comprising predicting what would be observed for each mass peak intensity given said adjusted probability distribution functions Pr(L) and/or Pr(k) and/or Pr(h) and/or Pr(G) and/or given the probability p of correct identification.
75. A method as claimed in claim 74, further comprising comparing peak intensities that are predicted with those that are observed.
76. A method as claimed in claim 75, further comprising accepting or rejecting adjusted probability distribution functions.
77. A method as claimed in claim 76, further comprising repeating or terminating the cycle of adjusting probability distribution functions and/or predicting intensities and/or comparing predicted intensities with observed intensities.
78. A method as claimed in claim 77, further comprising determining the ratios Lij of relative concentrations L of each component, molecule or analyte in each of said samples for every pair i,j of samples.
79. A method as claimed in claim 78, further comprising continuing the Markov Chain Monte Carlo predictive procedure to determine more likely values for said relative concentrations L of each component, molecule or analyte in each of said samples and the ratios Lij of said relative concentrations L.
80. A method as claimed in claim 79, wherein the number of determinations of the ratios Lij of said relative concentrations L is pre-defined according to required accuracy of mean values.
81. A method as claimed in claim 80, further comprising calculating mean values for the ratios Lij of said relative concentrations L of each component, molecule or analyte in each of said samples for every pair i, j of said samples.
82. A method as claimed in claim 81, further comprising calculating standard deviations and/or relative standard deviations for the ratios Lij of said relative concentrations L of each component, molecule or analyte in each of said samples for every pair i,j of said samples.
83. A method as claimed in any preceding claim, wherein said first sample and/or said second sample and/or further samples comprise a plurality of different biopolymers, proteins, peptides, polypeptides, oligonucleotides, oligonucleosides, amino acids, carbohydrates, sugars, lipids, fatty acids, vitamins, hormones, portions or fragments of DNA, portions or fragments of cDNA, portions or fragments of RNA, portions or fragments of mRNA, portions or fragments of tRNA, polyclonal antibodies, monoclonal antibodies, ribonucleases, enzymes, metabolites, polysaccharides, phosphorylated peptides, phosphorylated proteins, glycopeptides, glycoproteins or steroids.
84. A method as claimed in any preceding claim, wherein said first sample and/or said second sample and/or further samples comprise at least 2, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, or 5000 components, molecules or analytes having different identities or comprising different species.
85. A method as claimed in any preceding claim, wherein said first sample and/or said second sample and/or further samples comprise non-equimolar heterogeneous complex mixtures.
86. A method as claimed in any preceding claim, wherein either: (i) said first sample is taken from a diseased organism and said second sample is taken from a non-diseased organism; (ii) said first sample is taken from a treated organism and said second sample is taken from a non-treated organism; or (iii) said first sample is taken from a mutant organism and said second sample is taken from a wild type organism.
87. A method as claimed in any preceding claim, further comprising identifying components, molecules or analytes in said first sample and/or said second sample and/or further samples.
88. A method as claimed in any preceding claim, wherein said components, molecules or analytes in said first sample and/or said second sample and/or further samples are only identified if the intensity of said components, molecules or analytes in said first sample differs from the intensity of said components, molecules or analytes in said second sample and/or further samples by more than a predetermined amount.
89. A method as claimed in any preceding claim, wherein said components, molecules or analytes in said first sample and/or said second sample and/or further samples are only identified if the average intensity of a plurality of different components, molecules or analytes in said first sample differs from the average intensity of a plurality of different components, molecules or analytes in said second sample and/or further samples by more than a predetermined amount.
90. A method as claimed in claim 88 or 89, wherein said predetermined amount is selected from the group consisting of: (i) 1%; (ii) 2%; (iii) 5%; (iv) 10%; (v) 20%; (vi) 50%; (vii) 100%; (viii) 150%; (ix) 200%; (x) 250%; (xi) 300%; (xii) 350%; (xiii) 400%; (xiv) 450%; (xv) 500%; (xvi) 1000%; (xvii) 5000%; and (xviii) 10000%.
91. A method as claimed in any preceding claim, wherein the mass or mass to charge ratio of components, molecules or analytes and/or peptide digest products and/or fragment, daughter or product ions are mass analysed by either: (i) a Fourier Transform ("FT") mass spectrometer; (ii) a Fourier Transform Ion Cyclotron Resonance ("FTICR") mass spectrometer; (iii) a Time of Flight ("TOF") mass spectrometer; (iv) an orthogonal acceleration Time of Flight ("oaTOF") mass spectrometer; (v) a magnetic sector mass spectrometer; (vi) a quadrupole mass analyser; (vii) an ion trap mass analyser; or (viii) a Fourier Transform orbitrap, an electrostatic Ion Cyclotron Resonance mass spectrometer or an electrostatic Fourier Transform mass spectrometer.
92. A method as claimed in any preceding claim, further comprising ionising said first sample and/or said second sample and/or further samples using an ion source selected from the group consisting of: (i) an Electrospray ionisation ("ESI") ion source; (ii) an Atmospheric Pressure Photo Ionisation ("APPI") ion source; (iii) an Atmospheric Pressure Chemical Ionisation ("APCI") ion source; (iv) a Matrix Assisted Laser Desorption Ionisation ("MALDI") ion source; (v) a Laser Desorption Ionisation ("LDI") ion source; (vi) an Atmospheric Pressure Ionisation ("API") ion source; (vii) a Desorption Ionisation on Silicon ("DIOS") ion source; (viii) an Electron Impact ("EI") ion source; (ix) a Chemical Ionisation ("CI") ion source; (x) a Field Ionisation ("FI") ion source; (xi) a Field Desorption ("FD") ion source; (xii) an Inductively Coupled Plasma ("ICP") ion source; (xiii) a Fast Atom Bombardment ("FAB") ion source; (xiv) a Liquid Secondary Ion Mass Spectrometry ("LSIMS") ion source; (xv) a Desorption Electrospray Ionisation ("DESI") ion source; and (xvi) a Nickel-63 radioactive ion source.
93. A mass spectrometer comprising means arranged to probabilistically determine or quantify the relative intensity, concentration or expression level of a component, molecule or analyte in a first sample relative to the intensity, concentration or expression level of a component, molecule or analyte in a second sample.
94. A mass spectrometer as claimed in claim 93, further comprising a liquid chromatograph.
95. A mass spectrometer as claimed in claim 93 or 94, further comprising one or more mass filters and/or one or more mass analysers.
96. A mass spectrometer as claimed in claim 95, wherein said one or more mass filters and said one or more mass analysers are selected from the group consisting of: (i) an orthogonal acceleration Time of Flight mass analyser; (ii) an axial acceleration Time of Flight mass analyser; (iii) a Paul 3D quadrupole ion trap mass analyser; (iv) a 2D or linear quadrupole ion trap mass analyser; (v) a Fourier Transform Ion Cyclotron Resonance mass analyser; (vi) a magnetic sector mass analyser; (vii) a quadrupole mass analyser; and (viii) a Penning trap mass analyser.
97. A mass spectrometer as claimed in any of claims 93-96, further comprising an ion source.
98. A mass spectrometer as claimed in claim 97, wherein said ion source comprises a pulsed ion source.
99. A mass spectrometer as claimed in claim 97, wherein said ion source comprises a continuous ion source.
100. A mass spectrometer as claimed in any of claims 93-99, further comprising an ion source selected from the group consisting of: (i) an Electrospray ionisation ("ESI") ion source; (ii) an Atmospheric Pressure Photo Ionisation ("APPI") ion source; (iii) an Atmospheric Pressure Chemical Ionisation ("APCI") ion source; (iv) a Matrix Assisted Laser Desorption Ionisation ("MALDI") ion source; (v) a Laser Desorption Ionisation ("LDI") ion source; (vi) an Atmospheric Pressure Ionisation ("API") ion source; (vii) a Desorption Ionisation on Silicon ("DIOS") ion source; (viii) an Electron Impact ("EI") ion source; (ix) a Chemical Ionisation ("CI") ion source; (x) a Field Ionisation ("FI") ion source; (xi) a Field Desorption ("FD") ion source; (xii) an Inductively Coupled Plasma ("ICP") ion source; (xiii) a Fast Atom Bombardment ("FAB") ion source; (xiv) a Liquid Secondary Ion Mass Spectrometry ("LSIMS") ion source; (xv) a Desorption Electrospray Ionisation ("DESI") ion source; and (xvi) a Nickel-63 radioactive ion source.
101. A method of relatively quantifying one or more molecular species among several samples, said method comprising: dividing each sample into multiple replicate samples; for each of said replicate samples obtaining a signal for each of several tentatively identified digestion products of said molecular species in question, wherein the signal is proportional to the concentration of the parent species subject to random noise; obtaining or assigning probabilities that each tentative identification is correct; assigning a prior probability distribution function for the relative amount L of each molecular species in each sample; assigning a prior probability distribution function for the relative amount k of digestion product produced from each molecular species; assigning a prior probability distribution function for the relative amount h of sample for each replicate sample; assigning a prior probability distribution function for the noise level G in each sample; choosing an internal standard wherein the concentration of said internal standard is known to be the same in all of said replicate samples; updating the probability distribution for the relative amount L of each molecular species in each sample; obtaining samples according to said probability distribution for the relative amount L of each molecular species in each sample of a monotonic function of the ratios L_i to L_j for every distinct pair i,j of said replicate samples; and calculating a mean value and standard deviation of the function for each of said pairs.
102. A method of mass spectrometry comprising: providing a first sample comprising a first mixture of components, molecules or analytes; providing a second different sample comprising a second mixture of components, molecules or analytes; and determining or quantifying the relative intensity, concentration or expression level of a component, molecule or analyte in said first sample relative to the intensity, concentration or expression level of a component, molecule or analyte in said second sample.
103. A mass spectrometer comprising means arranged to determine or quantify the relative intensity, concentration or expression level of a component, molecule or analyte in a first sample relative to the intensity, concentration or expression level of a component, molecule or analyte in a second sample.
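The sampling and summary steps recited in claim 101 can be illustrated with a minimal sketch. It is not the claimed method or the applicant's implementation. It assumes a log-normal noise model with a fixed, known noise level, assumes the replicate loading h has already been normalised to 1 (for example against an internal standard), treats every tentative identification as correct, and uses a plain random-walk Metropolis sampler as a stand-in for whatever sampling scheme is actually used. The function names, the data layout `data[sample][replicate][product]`, the prior width and the step size are all hypothetical. The monotonic function of the ratio is taken to be the natural logarithm.

```python
import itertools
import math
import random
import statistics


def log_posterior(log_L, log_k, data, sigma, prior_sd):
    """Unnormalised log posterior for log L (per sample) and log k (per product)."""
    # Broad Gaussian priors on log L and log k (an assumption of this sketch).
    lp = -sum(v * v for v in log_L + log_k) / (2.0 * prior_sd ** 2)
    # Assumed model: log(signal) = log L_s + log k_p + Gaussian noise of width sigma.
    for s, replicates in enumerate(data):
        for signals in replicates:
            for p, y in enumerate(signals):
                resid = math.log(y) - (log_L[s] + log_k[p])
                lp -= resid * resid / (2.0 * sigma ** 2)
    return lp


def posterior_log_ratios(data, sigma=0.1, prior_sd=5.0,
                         n_iter=20000, burn_in=5000, step=0.02, seed=0):
    """Return {(i, j): (mean, std)} of log(L_i / L_j) over posterior draws."""
    rng = random.Random(seed)
    n_samples, n_products = len(data), len(data[0][0])
    log_L, log_k = [0.0] * n_samples, [0.0] * n_products
    current = log_posterior(log_L, log_k, data, sigma, prior_sd)
    pairs = list(itertools.combinations(range(n_samples), 2))
    draws = {pair: [] for pair in pairs}
    for it in range(n_iter):
        # Random-walk proposal on every parameter at once.
        cand_L = [v + rng.gauss(0.0, step) for v in log_L]
        cand_k = [v + rng.gauss(0.0, step) for v in log_k]
        cand = log_posterior(cand_L, cand_k, data, sigma, prior_sd)
        # Metropolis acceptance test; 1 - random() lies in (0, 1], so log is defined.
        if math.log(1.0 - rng.random()) < cand - current:
            log_L, log_k, current = cand_L, cand_k, cand
        if it >= burn_in:
            for i, j in pairs:
                draws[(i, j)].append(log_L[i] - log_L[j])
    return {pair: (statistics.mean(v), statistics.stdev(v))
            for pair, v in draws.items()}


# Synthetic illustration: two samples, three replicates, three digestion products.
true_L = [1.0, 2.0]          # hypothetical relative amounts of the parent species
true_k = [0.5, 1.0, 1.5]     # hypothetical digestion-product yields
jitter = [0.97, 1.00, 1.03]  # small deterministic replicate-to-replicate variation
data = [[[L * k * j for k in true_k] for j in jitter] for L in true_L]
result = posterior_log_ratios(data)
```

Here `result[(0, 1)]` holds the mean and standard deviation of log(L_0/L_1) over the retained draws, which corresponds to the final "calculating a mean value and standard deviation" step. Only the ratio is reported because the absolute scales of L and k are not separately identifiable in this simplified model: multiplying every L by a constant and dividing every k by the same constant leaves the predicted signals unchanged.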
PCT/GB2005/001679 2004-04-30 2005-05-03 Mass spectrometer WO2005106453A2 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP05740580.5A EP1745500B1 (en) 2004-04-30 2005-05-03 Mass spectrometer
CA2564279A CA2564279C (en) 2004-04-30 2005-05-03 Mass spectrometer
US11/568,408 US8012764B2 (en) 2004-04-30 2005-05-03 Mass spectrometer
JP2007510124A JP5009784B2 (en) 2004-04-30 2005-05-03 Mass spectrometer

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
GB0409677A GB0409677D0 (en) 2004-04-30 2004-04-30 Mass spectrometer
GB0409677.2 2004-04-30
GB0411248A GB0411248D0 (en) 2004-04-30 2004-05-20 Mass spectrometer
GB0411248.8 2004-05-20

Publications (2)

Publication Number Publication Date
WO2005106453A2 (en) 2005-11-10
WO2005106453A3 (en) 2006-09-21

Family

ID=34680447

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2005/001679 WO2005106453A2 (en) 2004-04-30 2005-05-03 Mass spectrometer

Country Status (6)

Country Link
US (1) US8012764B2 (en)
EP (1) EP1745500B1 (en)
JP (1) JP5009784B2 (en)
CA (1) CA2564279C (en)
GB (1) GB2413695B (en)
WO (1) WO2005106453A2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011128702A1 (en) * 2010-04-15 2011-10-20 Micromass Uk Limited Method and system of identifying a sample by analyising a mass spectrum by the use of a bayesian inference technique
JP2019505780A (en) * 2015-12-30 2019-02-28 フィト エヌフェー Structure determination method of biopolymer based on mass spectrometry

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB0409676D0 (en) * 2004-04-30 2004-06-02 Micromass Ltd Mass spectrometer
GB0514553D0 (en) 2005-07-15 2005-08-24 Nonlinear Dynamics Ltd A method of analysing a representation of a separation pattern
GB0514555D0 (en) 2005-07-15 2005-08-24 Nonlinear Dynamics Ltd A method of analysing separation patterns
US8095345B2 (en) * 2009-01-20 2012-01-10 Chevron U.S.A. Inc Stochastic inversion of geophysical data for estimating earth model parameters
CA2819181C (en) * 2010-11-29 2020-03-10 Dako Denmark A/S Methods and systems for analyzing images of specimens processed by a programmable quantitative assay
GB201100302D0 (en) * 2011-01-10 2011-02-23 Micromass Ltd A method of correction of data impaired by hardware limitions in mass spectrometry
GB201205720D0 (en) 2012-03-30 2012-05-16 Micromass Ltd A method for the investigation of differences in analytical data and an apparatus adapted to perform such a method
US10627407B2 (en) 2015-03-12 2020-04-21 Mars, Incorporated Ultra high resolution mass spectrometry and methods of using the same
US11182688B2 (en) * 2019-01-30 2021-11-23 International Business Machines Corporation Producing a formulation based on prior distributions of a number of ingredients used in the formulation
US20210215651A1 (en) * 2020-01-15 2021-07-15 Chevron U.S.A. Inc. Estimating unknown proportions of a plurality of end-members in an unknown mixture

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5910655A (en) * 1996-01-05 1999-06-08 Maxent Solutions Ltd. Reducing interferences in elemental mass spectrometers
US20020053545A1 (en) * 2000-08-03 2002-05-09 Greef Jan Van Der Method and system for identifying and quantifying chemical components of a mixture
GB2394545A (en) * 2001-12-08 2004-04-28 Micromass Ltd Mass spectrometry

Family Cites Families (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5087815A (en) * 1989-11-08 1992-02-11 Schultz J Albert High resolution mass spectrometry of recoiled ions for isotopic and trace elemental analysis
JP3509267B2 (en) * 1995-04-03 2004-03-22 株式会社日立製作所 Ion trap mass spectrometry method and apparatus
US5545894A (en) * 1995-05-04 1996-08-13 The Regents Of The University Of California Compact hydrogen/helium isotope mass spectrometer
US6489608B1 (en) * 1999-04-06 2002-12-03 Micromass Limited Method of determining peptide sequences by mass spectrometry
US6391649B1 (en) * 1999-05-04 2002-05-21 The Rockefeller University Method for the comparative quantitative analysis of proteins and other biological material by isotopic labeling and mass spectroscopy
US6489609B1 (en) * 1999-05-21 2002-12-03 Hitachi, Ltd. Ion trap mass spectrometry and apparatus
US6446010B1 (en) * 1999-06-15 2002-09-03 The Rockefeller University Method for assessing significance of protein identification
US6393367B1 (en) * 2000-02-19 2002-05-21 Proteometrics, Llc Method for evaluating the quality of comparisons between experimental and theoretical mass data
JP3975663B2 (en) * 2000-09-05 2007-09-12 株式会社日立製作所 Gene polymorphism analysis method
KR20030031911A (en) * 2001-04-19 2003-04-23 싸이퍼젠 바이오시스템즈, 인코포레이티드 Biomolecule characterization using mass spectrometry and affinity tags
US7045296B2 (en) * 2001-05-08 2006-05-16 Applera Corporation Process for analyzing protein samples
US6835927B2 (en) * 2001-10-15 2004-12-28 Surromed, Inc. Mass spectrometric quantification of chemical mixture components
EP1319954A1 (en) * 2001-12-12 2003-06-18 Centre National de Genotypage Methods for protein analysis using protein capture arrays
US6556651B1 (en) * 2002-01-25 2003-04-29 Photoelectron Corporation Array of miniature radiation sources
WO2003098182A2 (en) * 2002-05-15 2003-11-27 Proteosys Ag Method for quantifying molecules
GB0305796D0 (en) 2002-07-24 2003-04-16 Micromass Ltd Method of mass spectrometry and a mass spectrometer
EP1530721B1 (en) * 2002-08-22 2008-12-24 Applera Corporation Method for characterizing biomolecules utilizing a result driven strategy
EP1606757A1 (en) * 2003-03-25 2005-12-21 Institut Suisse de Bioinformatique Method for comparing proteomes
US7425700B2 (en) * 2003-05-22 2008-09-16 Stults John T Systems and methods for discovery and analysis of markers
US20050048499A1 (en) * 2003-08-29 2005-03-03 Perkin Elmer Life Sciences, Inc. Tandem mass spectrometry method for the genetic screening of inborn errors of metabolism in newborns
EP1695090A1 (en) * 2003-12-08 2006-08-30 Oxford Gene Technology Ip Limited Mass spectrometry of arginine-containing peptides
JP4275545B2 (en) * 2004-02-17 2009-06-10 株式会社日立ハイテクノロジーズ Mass spectrometer

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
HELLERSTEIN M K ET AL: "Mass isotopomer distribution analysis at eight years: theoretical, analytic and experimental considerations" AMERICAN JOURNAL OF PHYSIOLOGY: ENDOCRINOLOGY AND METABOLISM, AMERICAN PHYSIOLOGICAL SOCIETY, BETHESDA, MD, US, vol. 276, no. 39, 1999, pages E1146-E1170, XP002978087 ISSN: 0193-1849 *
KANG H D ET AL: "Radical detection in a methane plasma" JOURNAL OF VACUUM SCIENCE & TECHNOLOGY A (VACUUM, SURFACES, AND FILMS) AIP FOR AMERICAN VACUUM SOC USA, vol. 21, no. 6, November 2003 (2003-11), pages 1978-1980, XP007900816 ISSN: 0734-2101 *
PREUSS R ET AL: "Quantitative analysis of multicomponent mass spectra" AIP CONFERENCE PROCEEDINGS AIP USA, no. 617, 2002, pages 155-162, XP007900817 ISSN: 0094-243X *
See also references of EP1745500A2 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011128702A1 (en) * 2010-04-15 2011-10-20 Micromass Uk Limited Method and system of identifying a sample by analyising a mass spectrum by the use of a bayesian inference technique
US8604421B2 (en) 2010-04-15 2013-12-10 Micromass Uk Limited Method and system of identifying a sample by analyising a mass spectrum by the use of a bayesian inference technique
JP2019505780A (en) * 2015-12-30 2019-02-28 フィト エヌフェー Structure determination method of biopolymer based on mass spectrometry

Also Published As

Publication number Publication date
JP2007535673A (en) 2007-12-06
US20080076186A1 (en) 2008-03-27
GB2413695A (en) 2005-11-02
JP5009784B2 (en) 2012-08-22
WO2005106453A3 (en) 2006-09-21
EP1745500A2 (en) 2007-01-24
US8012764B2 (en) 2011-09-06
CA2564279A1 (en) 2005-11-10
CA2564279C (en) 2013-09-24
GB2413695B (en) 2009-01-21
GB0508935D0 (en) 2005-06-08
EP1745500B1 (en) 2017-03-15

Similar Documents

Publication Publication Date Title
CA2564279C (en) Mass spectrometer
US8975577B2 (en) System and method for grouping precursor and fragment ions using selected ion chromatograms
US8841606B2 (en) Mass spectrometry
US8809770B2 (en) Data independent acquisition of product ion spectra and reference spectra library matching
JP4848454B2 (en) Mass spectrometer
EP2741224A1 (en) Methods for generating local mass spectral libraries for interpreting multiplexed mass spectra
JP4950029B2 (en) Mass spectrometer
US8515685B2 (en) Method of mass spectrometry, a mass spectrometer, and probabilistic method of clustering data
EP4102509A1 (en) Method and apparatus for identifying molecular species in a mass spectrum
TAECHAWATTANANANT Peak identification and quantification in proteomic mass spectrograms using non-negative matrix factorization

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 2564279

Country of ref document: CA

WWE Wipo information: entry into national phase

Ref document number: 2007510124

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE

WWW Wipo information: withdrawn in national office

Ref document number: DE

REEP Request for entry into the european phase

Ref document number: 2005740580

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2005740580

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 2005740580

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 11568408

Country of ref document: US

WWP Wipo information: published in national office

Ref document number: 11568408

Country of ref document: US