© 2000 Nature America Inc. • http://structbio.nature.com
progress
Protein NMR spectroscopy in structural genomics
Gaetano T. Montelione1, Deyou Zheng1, Yuanpeng J. Huang1, Kristin C. Gunsalus1 and Thomas Szyperski2
Protein NMR spectroscopy provides an important complement to X-ray crystallography for structural
genomics, both for determining three-dimensional protein structures and for characterizing their biochemical
and biophysical functions.
Structural genomics involves the determination, analysis, and dissemination of the
three-dimensional structures of all protein
and RNA molecules in nature, providing
new opportunities at the interface of structural biology, functional genomics, and
bioinformatics. This very ambitious goal
requires both large-scale structure determination and amplification of these data by high-throughput
modeling. It is generally recognized that X-ray crystallography
using synchrotron radiation, and multiwavelength anomalous
dispersion (MAD) methods1 for determining the phase information required for crystallographic analysis, will play a central role
in genomic-scale structural analysis (see the articles by Stevens
and colleagues, and Lamzin and Perrakis). Solution state NMR
will also have a complementary role in post-genomic analysis,
particularly considering that (i) many protein targets do not provide crystals suitable for crystallographic analysis; (ii) some
15–20% of new protein structures are determined by NMR
methods; and (iii) sequence-specific resonance assignments provide the basis for various kinds of functional characterization.
Strengths and weaknesses of NMR in structural genomics
Several features of solution-state NMR make it particularly
suitable for structure-function analysis and structural
genomics. Structural analysis by NMR does not require
protein crystals. Most (∼75%) of the NMR structures in the
Protein Data Bank (PDB) do not have corresponding crystal structures, and many of these simply do not provide diffraction quality crystals. Moreover, NMR studies can be
carried out in aqueous solution under conditions quite similar to the physiological conditions under which the protein
normally functions. This feature allows comparisons to be
made between subtly different solution conditions that may
modulate structure-function relationships. For example,
pH titration data can be used to determine pKa values of
specific ionizable groups in the protein and to characterize
the corresponding structure-function relationships. While
most crystal structures are determined under physiologically relevant conditions, in many cases somewhat exotic solution conditions are required for crystallization.
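The pH titration analysis mentioned above can be illustrated with a minimal sketch. It assumes a single ionizable group in fast exchange, so that the observed chemical shift is the population-weighted average of the protonated and deprotonated shifts (Henderson-Hasselbalch behavior). The function names, the grid-search strategy, and the synthetic data are illustrative assumptions, not a method described in this article.

```python
def fraction_deprotonated(ph, pka):
    """Henderson-Hasselbalch fraction of the deprotonated form at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def fit_pka(ph_values, shifts, pka_min=2.0, pka_max=12.0, step=0.01):
    """Estimate a pKa from chemical shift versus pH data.

    Assumed model: shift = d_prot + (d_deprot - d_prot) * fraction_deprotonated.
    For each trial pKa on a grid, the two limiting shifts follow from an
    ordinary linear least-squares fit; the pKa with the smallest residual wins.
    Returns (pKa, shift of protonated form, shift of deprotonated form).
    """
    n = len(ph_values)
    best = None
    n_steps = int(round((pka_max - pka_min) / step))
    for i in range(n_steps + 1):
        pka = pka_min + i * step
        f = [fraction_deprotonated(ph, pka) for ph in ph_values]
        mean_f = sum(f) / n
        mean_d = sum(shifts) / n
        sff = sum((x - mean_f) ** 2 for x in f)
        if sff == 0.0:
            continue  # no variation in f, the line is undetermined
        slope = sum((x - mean_f) * (d - mean_d) for x, d in zip(f, shifts)) / sff
        intercept = mean_d - slope * mean_f
        sse = sum((intercept + slope * x - d) ** 2 for x, d in zip(f, shifts))
        if best is None or sse < best[0]:
            best = (sse, pka, intercept, intercept + slope)
    _, pka, d_prot, d_deprot = best
    return pka, d_prot, d_deprot


# Synthetic, noise-free titration curve (hypothetical values, in p.p.m.).
ph_points = [4.0 + 0.5 * k for k in range(11)]
observed = [8.6 + (7.7 - 8.6) * fraction_deprotonated(ph, 6.5) for ph in ph_points]
pka_fit, shift_prot, shift_deprot = fit_pka(ph_points, observed)
```

With real data, each resolved resonance near the ionizable group would be fitted separately, and groups with coupled titrations would need a more elaborate model than this one-site sketch.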
The accuracy of protein structures determined by
NMR is very dependent on the extent and quality of data
that can be obtained. The highest quality NMR structures have
accuracies comparable to 2.0–2.5 Å X-ray crystal structures2.
Although atomic positions in high-resolution crystal structures
are more precisely determined than in the corresponding NMR
structures, the crystallization process may select for a subset of
conformers present under solution conditions. For example,
while high-quality NMR structures typically exhibit root mean
square (r.m.s.) deviations of backbone and heavy atoms (excluding those of surface side chains) of 0.3–0.6 Å and 0.5–0.8 Å,
respectively, analysis of a set of high-resolution X-ray crystal
structures of bovine pancreatic trypsin inhibitor determined in
different crystal forms3 indicates similar variations of 0.2–0.6 Å
in backbone atom positions due to preferential selection of distinct low energy conformers in the crystallization process.
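The r.m.s. deviations quoted above are typically computed relative to the mean coordinates of a bundle of conformers. The following is a minimal sketch of that calculation, assuming the conformers have already been superimposed (the optimal superposition step is deliberately omitted). The function names and the toy coordinates are hypothetical.

```python
import math


def mean_structure(ensemble):
    """Average each atom's coordinates over all conformers (assumed pre-superimposed)."""
    n_models = len(ensemble)
    n_atoms = len(ensemble[0])
    return [
        tuple(sum(model[a][k] for model in ensemble) / n_models for k in range(3))
        for a in range(n_atoms)
    ]


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally sized, pre-aligned atom lists."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same number of atoms")
    squared = sum(
        (a[k] - b[k]) ** 2 for a, b in zip(coords_a, coords_b) for k in range(3)
    )
    return math.sqrt(squared / len(coords_a))


def average_rmsd_to_mean(ensemble):
    """Average, over conformers, of the r.m.s. deviation from the mean coordinates."""
    mean = mean_structure(ensemble)
    return sum(rmsd(model, mean) for model in ensemble) / len(ensemble)


# Toy ensemble: two conformers of a two-atom fragment, coordinates in Å.
toy_ensemble = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)],
]
toy_value = average_rmsd_to_mean(toy_ensemble)
```

In practice the atom selection matters as much as the formula: restricting the lists to backbone atoms of ordered residues, as in the values quoted above, gives smaller numbers than including surface side chains.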
NMR has special value in structural genomics efforts for rapidly characterizing the ‘foldedness’ of specific protein or RNA constructs. The dispersion and lineshapes of resonances measured in
1D 1H-NMR and 2D 15N-1H or 13C-1H correlation spectra provide ‘foldedness’ criteria with which to define constructs and
solution conditions that provide folded protein samples (Fig. 1).
As the required isotopic enrichment with 15N is relatively inexpensive, and the 2D 15N-1H correlation spectra can be recorded in
Fig. 1 Comparison of 15N-1H correlation spectra for disordered and well-folded proteins. a, Spectrum of Drosophila melanogaster Par 1 C-terminal domain, a domain construct that is predominantly disordered under the conditions of these measurements (K.G. and G.T.M., unpublished results). b, Spectrum of Thermus thermophilus variant of COG272 protein, a target with well-defined three-dimensional structure in aqueous solution (B. Dixon, S. Anderson, and G.T.M., unpublished results).
1Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Rutgers University, Piscataway, New Jersey 08854-5638, USA.
2Department of Chemistry, State University of New York at Buffalo, Buffalo, New York 14260, USA. Correspondence should be addressed to G.T.M.
email: guy@cabm.rutgers.edu
nature structural biology • structural genomics supplement • november 2000
Box 1 Protein structure determination by NMR
The determination of an NMR solution structure may be dissected into six major parts. (i) At
the outset of the NMR study, a suitable sample, usually ∼500 µL of a 1 mM protein solution, is
prepared. If the molecular weight of a protein exceeds ∼10 kDa, enrichment with 13C and 15N
isotopes is required in order to resolve spectral overlap in 1H-NMR spectroscopy. Due to the
availability of high-yield over-expression systems, stable isotope labeling has become
routine. (ii) Subsequently, this sample is used to record a set of multidimensional NMR
experiments, typically at temperatures around 30 °C, which provide, after suitable data
processing, the NMR spectra. (iii) These allow determination of (nearly) complete sequential
NMR assignments (the measurement of resonance frequencies (chemical shifts) of the NMR-active spins in the protein). (iv) The resulting conformation-dependent dispersion of the
chemical shifts is a prerequisite for deriving experimental constraints from various NMR
experiments (such as NOE, scalar coupling, and dipolar coupling data) for the NMR structure
calculation. The circular arrows between steps (iv) and (v) indicate that the analysis of
structural constraints and the calculation of NMR structures are generally pursued in an iterative fashion. (v) Iterations involving structure
calculations and identification of new constraints are carried out until the overwhelming majority of experimentally derived constraints is
in agreement with a bundle of protein conformations representing the NMR solution structure. Conformational variations in the bundle of
structures reflect the precision of the NMR structure determination. (vi) Finally, the NMR structure can be refined using conformational
energy force fields, which in essence reflect our current knowledge about conformational preferences of proteins.
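As a worked illustration of the sample requirement in step (i) of Box 1, the sketch below converts a sample volume, concentration, and molecular weight into the mass of protein needed. The function name is hypothetical, and the 25 kDa case is an added example rather than a figure from the box.

```python
def protein_mass_mg(volume_ul, concentration_mm, molecular_weight_kda):
    """Mass of protein (mg) needed for a sample of the given volume and concentration."""
    moles = (volume_ul * 1e-6) * (concentration_mm * 1e-3)  # litres x mol per litre
    grams = moles * (molecular_weight_kda * 1e3)            # mol x g per mol
    return grams * 1e3                                       # g to mg


# About 5 mg for the sample in Box 1 (500 µL at 1 mM) if the protein is 10 kDa.
mass_10kda = protein_mass_mg(500, 1.0, 10.0)
# About 12.5 mg for a 25 kDa protein under the same conditions.
mass_25kda = protein_mass_mg(500, 1.0, 25.0)
```

The required mass scales linearly with molecular weight at fixed molar concentration, which is one reason expression yield matters more for larger targets.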
tens of minutes with conventional NMR systems, it is quite feasible to use such data as a ‘foldedness’ screen in a high throughput sample preparation pipeline. Moreover, there may be correlations between such ‘foldedness’ criteria and crystallizability, so that data from a high throughput NMR screen might directly support efforts to generate samples for crystallographic analysis.
Protein backbone chemical shift assignments are obtained at the initial stage of a structure determination (see Box 1), and can often be generated in a fully automated fashion4. These data provide experimental determination of locations of secondary structural elements5,6, which is more reliable than that provided by secondary structure prediction algorithms. This knowledge is tremendously enabling for fold prediction algorithms. Such fold predictions form the basis for functional predictions7 and can also be used for prioritizing targets for further experimental structure analysis.
NMR also provides a powerful tool for downstream characterization of structure-function relationships, a critical component of the process of structure-based functional genomics8. Chemical shift perturbation provides an important tool for validating proposed biochemical functions, screening for small molecule ligands, mapping ligand binding epitopes, and drug development9. Moreover, it is generally appreciated that the thermodynamics and mechanisms of molecular function depend on changes in internal dynamics, which can be characterized using nuclear relaxation measurements10.
Although significant progress has been made in determining resonance assignments and low resolution structures of larger systems11,12, standard methods for high resolution structure analysis by NMR are limited to proteins with molecular weights less than 25–30 kDa. The size distribution of ORFs in some genomes is shown in Fig. 2. Even though many of these ORFs code for oligomeric proteins, proteins that are folded only in the presence of binding partners, or integral membrane proteins, we estimate that at least 25% of yeast ORFs will be suitable for NMR structure determination with current methodologies. In higher eukaryotic genomes, this fraction of small ORFs is somewhat lower. Nonetheless, there are thousands of full-length ORF targets that will be suitable for NMR structure determination.
Fig. 2 Distribution of predicted open reading frame (ORF) lengths in the genomes of Escherichia coli (blue), Saccharomyces cerevisiae (red), Caenorhabditis elegans (yellow), and Drosophila melanogaster (green). Assuming monomeric structures, the length cut-off for routine NMR studies is ∼300 amino acids (dotted vertical line).
Large multidomain proteins are generally not suitable for NMR analysis. However, these can also exhibit interdomain flexibility, which can complicate or prevent crystallization. Fortunately, many of these larger proteins are composed of structural domains13–15, with an average size of ∼175 amino acids. Indeed, much of the structural information available for such larger proteins comes from X-ray and NMR studies of isolated domains. In this regard, both experimental and theoretical methods for parsing large multidomain proteins into autonomously folding domain segments are critical to the general aims of structural genomics.
NMR is particularly valuable in structural genomics for analyzing protein structures that are outside the scope of crystallographic studies. Included in the classes of proteins that do not form crystals suitable for crystallographic analysis are those that are partially unfolded in the absence of binding partners, as well as some membrane-associated proteins that can be studied in micelle environments using solution-state NMR. Solid state NMR methods can also provide structural information for some integral membrane proteins that may not be accessible by crystallographic methods.
NMR spectroscopy is relatively insensitive, which severely limits experimental design. Typically samples at ∼1 mM protein
concentration are required, preventing studies of proteins with
very low solubilities. Because of constraints on pulse sequence
design arising from these sensitivity limitations, several different NMR spectra recorded over a four to six week period are
necessary to obtain the information needed for a high-quality
structure determination. These long data collection periods, in
turn, put significant constraints on sample stability. Although
multiple samples can be used in the structure determination
process, each one must be stable for days to weeks with respect
to precipitation, aggregation, and other forms of degradation.
Manual analysis of these multiple NMR data sets is laborious
and requires significant expertise. Another important limitation
of NMR analysis is that the density of constraints is sometimes
inadequate for accurate structural analysis. In particular, general methods for cross validation analogous to a free R-factor, a
statistical measurement used in crystallographic studies to evaluate how well a structural model fits the diffraction data, are not
yet available.
Recent technological advances
The reduction of the data collection time required for a structure
determination is a major challenge for NMR-based structural
genomics. Technological advances enhancing sensitivity, such as
the construction of new high-field magnets, are of keen interest.
The sensitivity of the acquired NMR data depends critically on
the performance of the NMR probe, a sophisticated electronic
device used to detect NMR signals. In the near future, the introduction of cryogenic probes is expected to have a significant
impact. Radiofrequency (RF) coils constitute the heart of these
probes, and their sensitivity is limited by the thermal noise associated with the coil’s temperature. Cryogenic probes utilize
RF-coils cooled to ~25 K, and the resulting sensitivity enhancement reduces instrument time requirements by factors that
range from 4 to 16. Another key advance involves partial deuteration12, providing samples that can be studied with improved
signal-to-noise ratios that result from their sharper linewidths
and longer transverse relaxation times. The combination of partial deuteration and cryogenic probes can provide a factor of 10
or more reduction in the requisite data collection times. These
technologies provide the basis for high throughput NMR, and
are particularly valuable for samples exhibiting limited stabilities
and/or low solubilities. A novel spectroscopic concept named
TROSY (transverse relaxation optimized spectroscopy), based
on selection of slowly relaxing NMR transitions, also can provide
significant sensitivity enhancement for large proteins11,16,17 and
may become a prerequisite to extend structural genomics by
NMR into the 30–50 kDa molecular weight range.
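The link between sensitivity gains and instrument time quoted above can be made explicit with a short sketch. It assumes conventional signal averaging, in which the signal-to-noise ratio grows with the square root of the number of scans, so that a gain g in per-scan sensitivity shortens the time needed to reach a fixed signal-to-noise ratio by a factor of g squared. The function names are hypothetical.

```python
import math


def time_reduction_factor(snr_gain):
    """Factor by which acquisition time shrinks for a given per-scan S/N gain.

    Assumes S/N grows as the square root of the number of scans.
    """
    return snr_gain ** 2


def snr_gain_for_time_reduction(factor):
    """Per-scan S/N gain needed to cut acquisition time by the given factor."""
    return math.sqrt(factor)


# Under this assumption, a 4- to 16-fold time saving corresponds to a
# 2- to 4-fold gain in per-scan sensitivity.
low_gain = snr_gain_for_time_reduction(4.0)
high_gain = snr_gain_for_time_reduction(16.0)
# A 10-fold time saving corresponds to roughly a 3.2-fold sensitivity gain.
combined_gain = snr_gain_for_time_reduction(10.0)
```

The quadratic relationship holds only while an experiment is sensitivity limited. Once the minimum number of increments sets the measurement time, further sensitivity gains no longer shorten it.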
NMR structure determinations rely on the nearly complete
assignment of chemical shifts, which are obtained using multidimensional 13C,15N,1H-triple resonance NMR methods (for
recent technical reviews see refs 12, 17, and 18). However, a complete set of these experiments often requires far more instrument
time than the minimum dictated by signal-to-noise (S/N)
requirements. A particular challenge for structural genomics is
the development of NMR experiments that allow matching of
instrument time investments to the minimum time required for
measuring the chemical shift data. For many samples, most of
the instrument time is needed not to detect signal, but to ensure
appropriate resolution and/or information content of the spectra. In particular, lower bounds for the measurement time of
three- and four-dimensional experiments are often determined
by digital resolution requirements in the indirect dimensions
rather than S/N requirements. Reduced dimensionality experiments19,20, with simultaneous frequency labeling of more than
one atom type in indirect dimensions, offer an attractive solution that matches data collection times with signal-to-noise requirements and
requires only minimal sets of NMR experiments for resonance
assignment20.
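The sampling-limited lower bound described above can be sketched numerically. The model below assumes phase-sensitive quadrature detection, so that every complex point in an indirect dimension costs two recorded transients, and it assumes one scan per transient with a fixed time per scan. All point counts and timings are hypothetical values chosen for illustration, not parameters from any experiment cited here.

```python
def minimal_measurement_time_hours(complex_points, scans_per_fid=1, seconds_per_scan=1.0):
    """Lower bound on measurement time set by sampling of the indirect dimensions.

    complex_points: number of complex points in each indirect dimension.
    Each complex point is assumed to need two transients (quadrature detection).
    """
    n_fids = 1
    for points in complex_points:
        n_fids *= 2 * points
    return n_fids * scans_per_fid * seconds_per_scan / 3600.0


# Hypothetical 3D experiment: two indirect dimensions with 32 and 64 complex points.
time_3d = minimal_measurement_time_hours([32, 64])
# Hypothetical 4D experiment: a third indirect dimension with 32 complex points.
time_4d = minimal_measurement_time_hours([32, 32, 64])
# Cost of adding the extra indirect dimension.
ratio = time_4d / time_3d
```

With these numbers the 3D experiment needs a little over two hours and the 4D experiment about six days, regardless of how strong the signal is. This is the gap that reduced dimensionality experiments aim to close by encoding two chemical shifts in a single indirect dimension.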
Traditional NMR structure determination relies on measurement of nuclear Overhauser effects (NOEs; through-space dipolar interactions between protons) and scalar couplings
(through-bond interactions between nuclei mediated by
nuclear-electron interactions) for deriving distance and torsion
angle constraints, respectively. NOE constraints will continue to
be key for high-throughput structure determination, but the
arsenal of techniques that have recently been developed to
recruit additional experimental parameters for structure refinement will play a valuable role in structural genomics. First, measurement of residual dipolar 1H-15N and 1H-13C couplings in
dilute liquid crystalline media (aqueous solutions containing
suitable amounts of bicelles21 or filamentous phage22 to help constrain the orientation of the protein under study) offers qualitatively new structural information. Dipolar coupling constraints
can establish the spatial relationship of remote segments of a biological macromolecule and can complement sparse NOE networks for obtaining high-quality structures23. A current
limitation for use in structural genomics is the efficient identification of suitable orienting media in which the protein sample
remains soluble. Second, chemical shifts (the NMR resonance
Fig. 3 Results of automatic analysis of protein structures from NMR data. Comparison of backbone structures of basic fibroblast growth factor (FGF) determined by a, manual analysis of NMR data (PDB code 1bld), b, automated analysis of the same NMR data using the program AutoStructure (Y.J.H., R. Tejero and G.T.M., unpublished) or c, X-ray crystallography (PDB code 1bas). Only residues 28–152 are shown, as the N-terminal segment is not well-ordered in either the X-ray or solution NMR structure, and a few C-terminal residues are not defined in the X-ray crystal structure. The average root mean square (r.m.s.) deviation of backbone atom positions between the AutoStructure and manually determined NMR structures is 0.6 Å. d, Superposition of 10 NMR structures computed with AutoStructure. The average r.m.s. deviation of core backbone atoms relative to the mean coordinates is 0.3 Å.
frequencies) have long been recognized as a potential source for structural refinement. In particular, 13Cα and 13Cβ shifts offer a robust means to map the secondary structure and to derive backbone dihedral angle constraints at an early stage of the structure determination5,6. They are obtained during the resonance assignment process, and are thus of outstanding value for efficient high-throughput efforts. Third, detection of through-hydrogen bond scalar couplings24 affords valuable unambiguous constraints for characterizing hydrogen-bonded networks, although the small size of these couplings may restrict this to smaller proteins.
Automated data analysis
Another important area of development involves automated analysis of NMR data. It has been recognized for some time that many of the interactive tasks carried out by an expert in the process of spectral analysis could, in principle, be carried out more efficiently and rapidly by computational systems. Recent developments provide automated analysis of NMR assignments and three-dimensional structures of proteins ranging from ∼50 to 200 amino acids4,18. When good quality data are available, automated analysis of protein NMR data can be very rapid. Many of the available resonance assignment programs execute in tens of seconds4,18, and automated structure refinements are being carried out in tens of minutes using arrays of processors for coarse-grain parallel calculations (Fig. 3). However, while progress over the last few years is encouraging, more work is required, even for small proteins, before automated structural analysis is routine. In particular, general methods for automated analysis of side chain resonance assignments are not yet well developed, and there are as yet no examples of completely automated protein structure determinations. Moreover, little work has focused on the specific problems associated with nucleic acid structure determinations.
Pilot projects using NMR for structural genomics
In view of these technological advances and the unique opportunities presented by the genomic sequence data, several research groups and consortia have initiated pilot projects using NMR in structural genomics (Table 1). The scales of these efforts range from the effort at Rutgers University in the USA funded by the New Jersey Commission on Science and Technology, which focuses primarily on technology development, to the RIKEN Genome Sciences Center in Japan, which is in the process of installing some twenty high field NMR spectrometers to be used largely for high throughput structural genomics. Also particularly noteworthy is the structural genomics pilot project organized by researchers at the University of Toronto in Canada, in which isotope-enriched samples of proteins encoded by the genome of Methanobacterium thermoautotrophicum have been distributed to several NMR groups for parallel data collection and structure analysis, resulting in some dozen three-dimensional structures over the last year.
Table 1 Web sites related to the use of NMR in structural genomics
Center or Consortium                                                                        URL
BioMagResBank                                                                               www.bmrb.wisc.edu
Harvard Structural Genomics of Cancer Initiative, USA                                       sbweb.med.harvard.edu/~sgc/
New Jersey Commission on Science and Technology Initiative in Structural Genomics, USA      www-nmr.cabm.rutgers.edu/structuralgenomics
Northeast Structural Genomics Consortium, USA                                               www.nesg.org
Protein Structure Factory, Germany                                                          userpage.chemie.fu-berlin.de/~psf/
Riken Genome Sciences Center, Tokyo, Japan                                                  www.gsc.riken.go.jp
Toronto Structural Proteomics Project, Canada                                               nmr.oci.utoronto.ca/arrowsmith/proteomics
Conclusions
Protein NMR provides structural and biophysical information that is complementary to X-ray crystallography, and these two methods will play synergistic roles in postgenomic analysis and structural genomics. Indeed, NMR is already playing key roles in several of the established pilot projects. The primary challenges to NMR for high throughput applications are the necessarily long time periods for data collection and the laborious expert reasoning needed for data analysis. Recent advances in probe design, data collection strategies, and software engineering demonstrate the potential for higher throughput data collection and automated structure analysis.
Acknowledgments
We thank S. Anderson for useful discussions. The NMR data for FGF were provided by R. Powers and F. Moy (Wyeth Ayerst Research Laboratories). G.T.M. is supported by grants from the New Jersey Commission on Science and Technology, The National Science Foundation, and the Merck Genome Research Institute. K.C.G. is supported by a Postdoctoral Fellowship Award from the NIH.
Associations with structural genomics
G.T.M. is Director of the New Jersey Commission on Science and Technology Initiative in Structural Genomics and Bioinformatics.
1. Hendrickson, W. Science 254, 51–58 (1995).
2. Billeter, M. Q. Rev. Biophys. 25, 325–377 (1992).
3. Kossiakoff, A.A., Randal, M., Guenot, M. & Eigenbrot, C. Proteins Struct. Funct. Genet. 14, 65–74 (1992).
4. Moseley, H.N.B. & Montelione, G.T. Curr. Opin. Struct. Biol. 9, 635–642 (1999).
5. Wishart, D.S. & Sykes, B.D. J. Biomol. NMR 4, 171–180 (1994).
6. Cornilescu, G., Delaglio, F. & Bax, A. J. Biomol. NMR 13, 289–302 (1999).
7. Fetrow, J.S. & Skolnick, J. J. Mol. Biol. 281, 949–968 (1998).
8. Montelione, G.T. & Anderson, S. Nature Struct. Biol. 11–12 (1999).
9. Shuker, S.B., Hajduk, P.J., Meadows, R.P. & Fesik, S.W. Science 274, 1531–1534 (1996).
10. Palmer, A.G., Williams, J. & McDermott, A. J. Phys. Chem. 100, 13293–13310 (1996).
11. Wüthrich, K. Nature Struct. Biol. 5, 492–495 (1998).
12. Gardner, K.H. & Kay, L.E. Annu. Rev. Biophys. Biomol. Struct. 27, 357–406 (1998).
13. Murzin, A.G., Brenner, S.E., Hubbard, T. & Chothia, C. J. Mol. Biol. 247, 536–540 (1995).
14. Holm, L. & Sander, C. Science 273, 595–602 (1996).
15. Orengo, C.A., et al. Structure 5, 1093–1108 (1997).
16. Pervushin, K., Riek, R., Wider, G. & Wüthrich, K. Proc. Natl. Acad. Sci. USA 94, 12366–12371 (1997).
17. Wider, G. & Wüthrich, K. Curr. Opin. Struct. Biol. 9, 594–601 (1999).
18. Montelione, G.T., Rios, C.B., Swapna, G.V.T. & Zimmerman, D.E. In Biological magnetic resonance (eds Krishna, R. & Berliner, L.) 81–130 (Kluwer Academic/Plenum Publishers, New York; 1999).
19. Szyperski, T., Wider, G., Bushweller, J.H. & Wüthrich, K. J. Am. Chem. Soc. 115, 9307–9308 (1993).
20. Szyperski, T., Banecki, B., Braun, D. & Glaser, R.W. J. Biomol. NMR 11, 387–405 (1998).
21. Tjandra, N. & Bax, A. Science 278, 1111–1114 (1997).
22. Hansen, M.R., Mueller, L. & Pardi, A. Nature Struct. Biol. 5, 1065–1074 (1998).
23. Prestegard, J.H. Nature Struct. Biol. 5, 517–522 (1998).
24. Cordier, F. & Grzesiek, S. J. Am. Chem. Soc. 121, 1601–1602 (1999).