WO2000020574A9 - Fusions of scaffold proteins with random peptide libraries - Google Patents

Fusions of scaffold proteins with random peptide libraries

Info

Publication number
WO2000020574A9
Authority
WO
WIPO (PCT)
Prior art keywords
protein
library
peptide
cells
peptides
Prior art date
Application number
PCT/US1999/023715
Other languages
French (fr)
Other versions
WO2000020574A2 (en)
WO2000020574A3 (en)
Inventor
David Anderson
Jakob Maria Bogenberger
Beau Robert Peelle
Original Assignee
Rigel Pharmaceuticals Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Rigel Pharmaceuticals Inc
Priority to AU15164/00A (patent AU768126B2)
Priority to DE69936103T (patent DE69936103T2)
Priority to EP99957466A (patent EP1119617B1)
Priority to CA002345215A (patent CA2345215A1)
Priority to JP2000574670A (patent JP2002526108A)
Publication of WO2000020574A2
Publication of WO2000020574A3
Publication of WO2000020574A9

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/10: Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034: Isolating an individual clone by screening libraries
    • C12N15/1044: Preparation or screening of libraries displayed on scaffold proteins
    • C: CHEMISTRY; METALLURGY
    • C07: ORGANIC CHEMISTRY
    • C07K: PEPTIDES
    • C07K14/00: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/435: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans
    • C07K14/43504: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from invertebrates
    • C07K14/43595: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from invertebrates from coelenteratae, e.g. medusae
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/10: Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034: Isolating an individual clone by screening libraries
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/10: Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034: Isolating an individual clone by screening libraries
    • C12N15/1086: Preparation or screening of expression libraries, e.g. reporter assays
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/11: DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/62: DNA sequences coding for fusion proteins
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6897: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids involving reporter genes operably linked to promoters
    • C: CHEMISTRY; METALLURGY
    • C07: ORGANIC CHEMISTRY
    • C07K: PEPTIDES
    • C07K2319/00: Fusion polypeptide
    • C: CHEMISTRY; METALLURGY
    • C07: ORGANIC CHEMISTRY
    • C07K: PEPTIDES
    • C07K2319/00: Fusion polypeptide
    • C07K2319/40: Fusion polypeptide containing a tag for immunodetection, or an epitope for immunisation
    • C07K2319/42: Fusion polypeptide containing a tag for immunodetection, or an epitope for immunisation containing a HA(hemagglutinin)-tag
    • C: CHEMISTRY; METALLURGY
    • C07: ORGANIC CHEMISTRY
    • C07K: PEPTIDES
    • C07K2319/00: Fusion polypeptide
    • C07K2319/60: Fusion polypeptide containing spectroscopic/fluorescent detection, e.g. green fluorescent protein [GFP]
    • C: CHEMISTRY; METALLURGY
    • C07: ORGANIC CHEMISTRY
    • C07K: PEPTIDES
    • C07K2319/00: Fusion polypeptide
    • C07K2319/70: Fusion polypeptide containing domain for protein-protein interaction

Definitions

  • the invention relates to the use of scaffold proteins, particularly detectable genes such as green fluorescent protein (GFP), luciferase, β-lactamase, etc., in fusion constructs with random and defined peptides and peptide libraries, to increase the cellular expression levels, decrease the cellular catabolism, increase the conformational stability relative to linear peptides, and to increase the steady state concentrations of the random peptides and random peptide library members expressed in cells for the purpose of detecting the presence of the peptides and screening random peptide libraries.
  • N-terminal, C-terminal, dual N- and C-terminal and one or more internal fusions are all contemplated. Novel fusions utilizing self-binding peptides to create a conformationally stabilized fusion domain are also contemplated.
  • biomolecule screening for biologically and therapeutically relevant compounds is rapidly growing.
  • Relevant biomolecules that have been the focus of such screening include chemical libraries, nucleic acid libraries and peptide libraries, in search of molecules that either inhibit or augment the biological activity of identified target molecules.
  • with peptide libraries, the isolation of peptide inhibitors of targets and the identification of formal binding partners of targets has been a key focus.
  • one particular problem with peptide libraries is the difficulty assessing whether any particular peptide has been expressed, and at what level, prior to determining whether the peptide has a biological effect.
  • Green fluorescent protein is a 238 amino acid protein.
  • the crystal structure of the protein and of several point mutants has been solved (Ormo et al., Science 273, 1392-5, 1996; Yang et al., Nature Biotechnol. 14, 1246-51, 1996).
  • the fluorophore, consisting of a modified tripeptide, is buried inside a relatively rigid beta-can structure, where it is almost completely protected from solvent access.
  • the fluorescence of this protein is sensitive to a number of point mutations (Phillips, G.N., Curr. Opin. Struct. Biol. 7, 821-27, 1997).
  • the fluorescence appears to be a sensitive indication of the preservation of the native structure of the protein, since any disruption of the structure allowing solvent access to the fluorophoric tripeptide will quench the fluorescence
  • compositions of fusion constructs of peptides with scaffold proteins comprising for example detectable proteins such as GFP, and methods of using such constructs in screening of peptide libraries
  • the present invention provides fusion proteins comprising a scaffold protein and a random peptide, fused to said scaffold protein, and nucleic acids which encode such fusion proteins
  • the present invention provides libraries of a) fusion proteins, b) fusion nucleic acids, c) expression vectors comprising the fusion nucleic acids, and d) host cells comprising the fusion nucleic acids
  • the present invention further comprises methods for screening for a bioactive peptide capable of conferring a particular phenotype
  • a library of fusion proteins comprises a scaffold protein, a random peptide fused to the N-terminus of the scaffold protein and a representation structure that will present the random peptide in a conformationally restricted form
  • each of the random peptides in the library is different
  • a library of fusion proteins comprises a scaffold protein, a random peptide fused to the C-terminus of the scaffold protein and a representation structure that will present the random peptide in a conformationally restricted form
  • each of the random peptides in the library is different
  • a library of fusion proteins comprises a scaffold protein, a random peptide inserted into the scaffold protein and at least one fusion partner
  • each of the random peptides in the library is different
  • the random peptide is inserted into a loop structure of said scaffold protein
  • the scaffold protein is a green fluorescent protein (GFP)
  • the GFP is from Aequorea and the random peptide is inserted into the loop comprising amino acids 130 to 135 of said GFP
  • the GFP is from Aequorea and the random peptide is inserted into the loop comprising amino acids 154 to 159 of said GFP
  • the GFP is from Aequorea and the random peptide is inserted into the loop comprising amino acids 172 to 175 of said GFP
  • the GFP is from Aequorea and the random peptide is inserted into the loop comprising amino acids 188 to 193 of said GFP
  • the GFP is from Aequorea and the random peptide is inserted into the loop comprising amino acids 208 to 216 of said GFP
  • the GFP is from a Renilla species
  • the scaffold protein is β-lactamase
  • the scaffold protein is DHFR
  • the scaffold protein is β-galactosidase
  • the scaffold protein is luciferase
  • a library of fusion proteins comprising a linker between the random peptide and the scaffold protein
  • a library of fusion proteins comprising a second linker between the other end of the random peptide and the scaffold protein
  • a library of fusion proteins comprising a -(gly)n- linker, wherein n ≥ 2
  • a library of fusion proteins comprising a scaffold protein and a random peptide, wherein the random peptide replaces at least one amino acid of said scaffold protein
  • the amino acid of said scaffold protein which is replaced by the random peptide is located within a loop structure of said scaffold protein
  • the library of fusion proteins and the library of nucleic acids comprise at least 10^5 different members
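
For a sense of scale, the stated minimum library size can be compared with the theoretical sequence space of a fully random peptide. The Python sketch below is illustrative arithmetic only: it assumes a uniform alphabet of the 20 standard amino acids at every position, and the function names are hypothetical. This section does not itself specify a peptide length.

```python
# Illustrative arithmetic only. Assumes 20 equally allowed amino acids per
# randomized position; function names are hypothetical, not from the patent.

AMINO_ACID_ALPHABET_SIZE = 20


def theoretical_diversity(peptide_length):
    """Number of distinct sequences for a fully random peptide of this length."""
    if peptide_length < 0:
        raise ValueError("peptide_length must be non-negative")
    return AMINO_ACID_ALPHABET_SIZE ** peptide_length


def min_length_for_members(min_members):
    """Shortest random peptide whose sequence space holds min_members sequences."""
    length = 0
    while theoretical_diversity(length) < min_members:
        length += 1
    return length


# Shortest fully random peptide that can supply 10^5 distinct members.
shortest = min_length_for_members(10 ** 5)
print(shortest, theoretical_diversity(shortest))
```

Under these assumptions, a randomized region only a few residues long already exceeds the stated minimum, so the practical limit on library size is more likely to come from cloning and transfection capacity than from sequence space.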
  • the invention further provides fusion nucleic acids encoding the fusion proteins
  • the nucleic acid encoding the fusion protein comprises a nucleic acid encoding a random peptide, a nucleic acid encoding a scaffold protein and a nucleic acid encoding a fusion partner
  • the nucleic acid encoding the random peptide is inserted internally into the nucleic acid encoding the scaffold protein
  • expression vectors comprise one or more of the nucleic acids encoding the fusion proteins operably linked to regulatory sequences recognized by a host cell transformed with the nucleic acids
  • the expression vectors are retroviral vectors
  • host cells comprising the vectors and the recombinant nucleic acids provided herein
  • the invention provides methods of screening for bioactive peptides conferring a particular phenotype
  • the methods comprise providing cells containing a fusion nucleic acid comprising nucleic acid encoding a fusion protein comprising a scaffold protein and a random peptide as above
  • the cells are subjected to conditions wherein the fusion protein is expressed
  • the cells are then assayed for the phenotype
  • Figure 1 depicts the crystal structure of GFP showing the temperature factors used to pick some of the loops for internal insertion of random peptides
  • Figures 2A, 2B, 2C, 2D, 2E and 2F depict the results of the examples
  • Figure 2A schematically depicts the location of the loops
  • Figures 2B-2F show the results and the mean fluorescence
  • Figure 3 depicts a helical wheel diagram of a parallel coiled coil
  • a or a' are at the N-terminus, and the residues in sequence are abcdefg or a'b'c'd'e'f'g', which are then repeated to give individual helices abcdefg(abcdefg)n abcdefg or a'b'c'd'e'f'g'(a'b'c'd'e'f'g')n a'b'c'd'e'f'g'
  • the core of the helix would be a, a', d and d', which would be combinations of hydrophobic strong helix forming residues such as ala/leu, or val/leu. If residues e and e' are fixed as glu, and g and g' are fixed as lys, inter-helical salt bridges would further stabilize the coiled coil structure
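
The heptad pattern described for Figure 3 can be sketched in code. The Python snippet below is a minimal illustration rather than part of the disclosure: it builds one helix of repeated abcdefg heptads with hydrophobic residues at the core positions a and d, glu fixed at e and lys fixed at g. The function name, the default leucine core and the alanine filler at positions b, c and f are assumptions made only for this example.

```python
# Illustrative sketch only. Positions a/d (core), e = glu and g = lys follow the
# text; the filler residue at b, c, f and all names here are assumptions.

HEPTAD_POSITIONS = "abcdefg"


def build_heptad_helix(n_repeats, core_a="L", core_d="L", filler="A"):
    """Return a one-letter amino acid sequence of n_repeats abcdefg heptads."""
    if n_repeats < 1:
        raise ValueError("n_repeats must be at least 1")
    residue_at = {
        "a": core_a,  # hydrophobic core position
        "b": filler,
        "c": filler,
        "d": core_d,  # hydrophobic core position
        "e": "E",     # glu, fixed so it can pair with lys on the partner helix
        "f": filler,
        "g": "K",     # lys, fixed so it can pair with glu on the partner helix
    }
    heptad = "".join(residue_at[position] for position in HEPTAD_POSITIONS)
    return heptad * n_repeats


# A val/leu core, one of the combinations named in the text.
helix = build_heptad_helix(3, core_a="V", core_d="L")
print(helix)
```

In a library setting the filler positions would presumably be the randomized ones, while the fixed a, d, e and g positions carry the coiled coil bias.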
  • Figure 4 depicts the amino acid sequence of β-lactamase TEM-1 from E. coli. Amino acid residues 26-290 are shown
  • Figures 5A and 5B depict the crystal structure of E. coli β-lactamase [PDB 1BTL; Jelsch et al., Proteins Struct. Funct. Genet. 16:364 (1993)]
  • Figure 5A shows an end-on view of the two helices to which the random library may be fused
  • Figure 5B shows a side view of the two helices
  • the two helices which are to be extended with random residues in this library are shown in yellow (C-terminal helix, containing residues 271-290, see Figure 4) and white (N-terminal helix, containing residues 26-40, see Figure 4)
  • This protein has residues 1-25 removed
  • the same residues may be removed in the library scaffold as well
  • the active site ser 70 is shown in red. Both helices are remote from the active site and therefore attachment of random residues to the N- and/or C-terminus should not affect the activity of the enzyme
  • Figure 6 depicts a model of β-lactamase colored by crystallographic temperature factor, with the most immobile regions shown in red and the more mobile regions in yellow
  • the loops discussed in Legrande et al. [Nature Biotechnology 17:67-72 (1999)] are shown in blue, the active site ser 70 is shown in white, while glu 166 is shown in blue-gray
  • Figure 7 depicts the structure of CI-2, taken from the PDB file 2CI2
  • the reactive site loop is represented by residues 54-63; the residues supporting the loop structure are 51, 65, 67, 69 and 83. These residues could be randomized in different combinations. Loop-insert libraries are inserted between residues 72-73 and/or 44-45
  • Figure 8 depicts the structure of the kanamycin nucleotidyl transferase dimer (PDB 1KNY)
  • DETAILED DESCRIPTION OF THE INVENTION
  • the present invention is directed to fusions of scaffold proteins, including variants, and random peptides that are fused in such a manner that the structure of the scaffold is not significantly perturbed and the peptide is metabolically conform
  • the scaffold proteins fall into two main categories: reporter proteins and structural proteins. Reporter proteins are those that allow cells containing the reporter proteins to be distinguished from those that do not. While determining expression of a particular peptide is difficult, numerous methods are known in the art to measure expression of larger proteins or the expression of genes encoding them. Expression of a gene, e.g., can be measured by measuring the level of the RNA produced. However, this analysis, although direct, is difficult, usually not very sensitive and labor intensive. A more advantageous approach is offered by measuring the expression of reporter genes. Reporter gene expression is generally more easily monitored, since in many cases, the cellular phenotype is altered, either due to the presence of a detectable alteration, such as the presence of a fluorescent protein (which, as outlined herein, includes both the use of fusions to the detectable gene itself, or the use of detectable gene constructs that rely on the presence of the scaffold protein to be activated, e.g. when the scaffold is a transcription factor), by the addition of a substrate altered by the reporter protein (e.g. chromogenic
  • the peptides may have different structural biases, since different protein or other functional targets may require peptides of different specific structures to interact tightly with their surface or crevice binding sites.
  • different libraries each with a different structural bias, may be utilized to maximize the chances of having high affinity members for a variety of different targets
  • random peptide libraries with a helical bias or extended structure bias may be made through fusion to the N-terminus and/or C-terminus of certain scaffold proteins
  • random peptide libraries with a coiled coil bias may be made via fusion to the N- and/or C-terminus of particular scaffold proteins
  • Extended conformations of the random library may be made using insertions between dimerizing scaffold proteins
  • Preferred embodiments utilize loop formations via insertion into loops in scaffold proteins; amino acid residues within the respective loop structures may be replaced by the random peptide library or the random peptide library may be inserted in between two amino acid residues located within
  • by "fusion protein" or "fusion polypeptide" or grammatical equivalents herein is meant a protein composed of a plurality of protein components, that while typically unjoined in their native state, typically are joined by their respective amino and carboxyl termini through a peptide linkage to form a single continuous polypeptide
  • Protein in this context includes proteins, polypeptides and peptides. Plurality in this context means at least two, and preferred embodiments generally utilize two components. It will be appreciated that the protein components can be joined directly or joined through a peptide linker/spacer as outlined below. In addition, as outlined below, additional components such as fusion partners including presentation structures, targeting sequences, etc. may be used
  • the present invention provides fusion proteins of scaffold proteins and random peptides
  • by "scaffold protein", "scaffold polypeptide", "scaffold" or grammatical equivalents thereof, herein is meant a protein to which amino acid sequences, such as random peptides, can be fused
  • amino acid sequences such as random peptides
  • upon fusion, the scaffold protein usually allows the display of the random peptides in a way that they are accessible to other molecules. Scaffold proteins fall into several classes, including reporter proteins (which includes detectable proteins, survival proteins and indirectly detectable proteins), and structural proteins
  • the scaffold protein is a reporter protein
  • by "reporter protein" or grammatical equivalents herein is meant a protein that by its presence in or on a cell or when secreted in the media allows the cell to be distinguished from a cell that does not contain the reporter protein
  • the cell usually comprises a reporter gene that encodes the reporter protein
  • Reporter genes fall into several classes, as outlined above, including, but not limited to, detection genes, indirectly detectable genes, and survival genes
  • the scaffold protein is a detectable protein
  • a "detectable protein" or "detection protein" is a protein that can be used as a direct label, that is, the protein is detectable (and preferably, a cell comprising the detectable protein is detectable) without further manipulations or constructs
  • preferred embodiments of screening utilize cell sorting (for example via FACS) to detect scaffold (and thus peptide library) expression
  • the protein product of the reporter gene itself can serve to distinguish cells that are expressing the detectable gene
  • suitable detectable genes include those encoding autofluorescent proteins
  • GFP green fluorescent protein
  • BFP blue fluorescent protein
  • BFP (Quantum Biotechnologies, Inc., 1801 de Maisonneuve Blvd. West, 8th Floor, Montreal (Quebec) Canada H3H 1J9; Stauber, R. H., Biotechniques 24(3):462-471 (1998); Heim, R. and Tsien, R. Y., Curr. Biol. 6:178-182 (1996))
  • EYFP (Clontech Laboratories, Inc., 1020 East Meadow Circle, Palo Alto, CA 94303)
  • the present invention provides fusions of green fluorescent protein (GFP) and random peptides
  • by "green fluorescent protein" or "GFP" herein is meant a protein with at least 30% sequence identity to GFP that exhibits fluorescence at 490 to 600 nm
  • the wild-type GFP is 238 amino acids in length, contains a modified tripeptide fluorophore buried inside a relatively rigid β-can structure which protects the fluorophore from the solvent, and thus solvent quenching. See Prasher et al., Gene 111(2):229-233 (1992); Cody et al., Biochem. 32(5):1212-1218 (1993); Ormo et al., Science 273:1392-1395 (1996); and Yang et al., Nat. Biotech. 14:1246-1251 (1996), all of which are hereby incorporated by reference in their entirety
  • Included within the definition of GFP are derivatives of GFP, including amino acid substitutions, insertions and deletions. See for example WO 98
  • the GFP proteins are derivative or variant GFP proteins. That is, as outlined more fully below, the derivative GFP will contain at least one amino acid substitution, deletion or insertion, with amino acid substitutions being particularly preferred. The amino acid substitution, insertion or deletion may occur at any residue within the GFP protein
  • These variants ordinarily are prepared by site specific mutagenesis of nucleotides in the DNA encoding the GFP protein, using cassette or PCR mutagenesis or other techniques well known in the art, to produce DNA encoding the variant, and thereafter expressing the DNA in recombinant cell culture as outlined above
  • variant GFP protein fragments having up to about 100-150 residues may be prepared by in vitro synthesis using established techniques. Amino acid sequence variants are characterized by the predetermined nature of the variation, a feature that sets them apart from naturally occurring allelic or interspecies variation of the GFP protein amino acid sequence. The variants typically exhibit the same qualitative biological activity as the naturally occurring analogue, although variants can also be selected
  • any of the scaffold proteins or the genes encoding them may be wild type or variants thereof. These variants fall into one or more of three classes: substitutional, insertional or deletional variants. These variants ordinarily are prepared by site specific mutagenesis of nucleotides in the DNA encoding the scaffold protein, using cassette or PCR mutagenesis or other techniques well known in the art, to produce DNA encoding the variant, and thereafter expressing the DNA in recombinant cell culture as outlined herein
  • variant protein fragments having up to about 100-150 residues may be prepared by in vitro synthesis using established techniques
  • Amino acid sequence variants are characterized by the predetermined nature of the variation, a feature that sets them apart from naturally occurring allelic or interspecies variation of the scaffold protein amino acid sequence
  • the variants typically exhibit the same qualitative biological activity as the naturally occurring analogue, although variants can also be selected which have modified characteristics as will be more fully outlined below
  • the mutation per se need not be predetermined
  • random mutagenesis may be conducted at the target codon or region and the expressed scaffold variants screened for the optimal combination of desired activity
  • Techniques for making substitution mutations at predetermined sites in DNA having a known sequence are well known, for example, M13 primer mutagenesis and PCR mutagenesis. Screening of the mutants is done using assays of scaffold protein activities
  • Amino acid substitutions are typically of single residues; insertions usually will be on the order of from about 1 to 20 amino acids, although considerably larger insertions may be tolerated
  • Deletions range from about 1 to about 20 residues, although in some cases deletions may be much larger
  • Substitutions, deletions, insertions or any combination thereof may be used to arrive at a final derivative. Generally these changes are done on a few amino acids to minimize the alteration of the molecule. However, larger changes may be tolerated in certain circumstances. When small alterations in the characteristics of a scaffold protein, such as GFP, are desired, substitutions are generally made in accordance with the following chart
  • substitutions are less conservative than those shown in Chart I
  • substitutions may be made which more significantly affect the structure of the polypeptide backbone in the area of the alteration, for example the alpha-helical or beta-sheet structure, the charge or hydrophobicity of the molecule at the target site, or the bulk of the side chain
  • substitutions which in general are expected to produce the greatest changes in the polypeptide's properties are those in which (a) a hydrophilic residue, e.g. seryl or threonyl, is substituted for (or by) a hydrophobic residue, e.g. leucyl, isoleucyl, phenylalanyl, valyl or alanyl; (b) a cysteine or proline is substituted for (or by) any other residue; (c) a residue having an electropositive side chain, e.g. lysyl, arginyl, or histidyl, is substituted for (a) a hydrophilic residue, e.g.
  • scaffold proteins can be made that are longer than the wild-type, for example, by the addition of epitope or purification tags, the addition of other fusion sequences, etc., as is more fully outlined below
  • the scaffold protein is a variant GFP that has low or no fluorescence, but is expressed in mammalian cells at a concentration of at least about 10 nM, preferably at a concentration of at least about 100 nM, more preferably at a concentration of at least about 1 μM, even more preferably at a concentration of at least about 10 μM and most preferred at a concentration of at least about 100 μM
  • a random peptide is fused to a scaffold protein to form a fusion polypeptide
  • by "fused" or "operably linked" herein is meant that the random peptide, as defined below, and the scaffold protein, as exemplified by GFP herein, are linked together, in such a manner as to minimize the disruption to the stability of the scaffold structure (i.e., it can retain biological activity)
  • the scaffold preferably retains its ability to fluoresce, or maintains a Tm of at least 42°C
  • the fusion polypeptide (or fusion polynucleotide encoding the fusion polypeptide) can comprise further components as well, including multiple peptides at multiple loops, fusion partners, etc
  • the fusion polypeptide preferably includes additional components, including, but not limited to, fusion partners and linkers
  • the random peptide is fused to the N-terminus of the GFP
  • the fusion can be direct, i.e., with no additional residues between the C-terminus of the peptide and the N-terminus of the GFP, or indirect, that is, intervening amino acids are used, such as one or more fusion partners, including a linker
  • a presentation structure is used, to confer some conformational stability to the peptide
  • Particularly preferred embodiments include the use of dimerization sequences
  • N-terminal residues of the GFP are deleted, i.e., one or more amino acids of the GFP can be deleted and replaced with the peptide
  • deletions of more than 7 amino acids may render the GFP less fluorescent, and thus larger deletions are generally not preferred
  • the fusion is directly to the first amino acid of the GFP
  • the random peptide is fused to the C-terminus of the GFP
  • the fusion can be direct or indirect, and C-terminal residues may be deleted
  • peptides and fusion partners are added to both the N- and the C-terminus of the GFP
  • the N- and C-terminus of GFP are on the same "face" of the protein, in spatial proximity (within 18 Å)
  • dimerization sequences can allow a noncovalently cyclized protein; by attaching a first dimerization sequence to either the N- or C-terminus of GFP, and adding a random peptide and a second dimerization sequence to the other terminus, a large compact structure can be formed
  • the random peptide is fused to an internal position of the GFP, that is, the peptide is inserted at an internal position of the GFP. While the peptide can be inserted at virtually any position, preferred positions include insertion at the very tips of "loops" on the surface of the GFP, to minimize disruption of the GFP beta-can protein structure. In a preferred embodiment, loops are selected as having the highest temperature factors in the crystal structure as outlined in the Examples
  • the random peptide is inserted, without any deletion of GFP residues. That is, the insertion point is between two amino acids in the loop, adding the new amino acids of the peptide and fusion partners, including linkers. Generally, when linkers are used, the linkers are directly fused to the GFP, with additional fusion partners, if present, being fused to the linkers and the peptides
  • the peptide is inserted into the GFP, with one or more GFP residues being deleted, that is, the random peptide (and fusion partners, including linkers) replaces one or more residues
  • the linkers are attached directly to the GFP, thus it is linker residues which replace the GFP residues, again generally at the tip of the loop
  • when residues are replaced, from one to five residues of GFP are deleted, with deletions of one, two, three, four and five amino acids all possible. Specific preferred deletions are outlined below
  • Preferred insertion points in loops include, but are not limited to, loop 1 (amino acids 130-135), loop 2 (amino acids 154-159), loop 3 (amino acids 172-175), loop 4 (amino acids 188-193), and loop 5 (amino acids 208-216)
  • Particularly preferred embodiments include insertion of peptides and associated structures into loop 1, amino acids 130-135. In a preferred embodiment, one or more of the loop amino acids are deleted, with the deletion of asp133 being preferred
  • peptides are inserted into loop 2, amino acids 154-159
  • one or more of the loop amino acids are deleted, with the deletion of both lys156 and gln157 being preferred
  • peptides are inserted into loop 3, amino acids 172-175
  • one or more of the loop amino acids are deleted, with the deletion of asp173 being preferred
  • peptides are inserted into loop 4, amino acids 188-193
  • one or more of the loop amino acids are deleted, with the simultaneous deletion of gly189, asp190, gly191, and pro192 being preferred
  • peptides are inserted into loop 5, amino acids 208-216
  • one or more of the loop amino acids are deleted, with the simultaneous deletion of asn212, glu213 and lys214 being preferred
  • peptides can be inserted into more than one loop of the scaffold at a time
  • adding peptides to both loops 2 and 4 of GFP can increase the complexity of the library but still allow presentation of these loops on the same face of the protein
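The loop boundaries and preferred deletions listed above can be collected into a small lookup table. The following is a minimal sketch under stated assumptions, not the applicants' method: the names `GFP_LOOPS`, `deletion_is_valid` and `splice_loop` are hypothetical, and the eight-residue segment in the usage note is a toy placeholder rather than real GFP sequence.

```python
# Loop spans and preferred deletions as listed in the text (GFP residue numbering).
GFP_LOOPS = {
    1: {"span": (130, 135), "preferred_deletion": [133]},
    2: {"span": (154, 159), "preferred_deletion": [156, 157]},
    3: {"span": (172, 175), "preferred_deletion": [173]},
    4: {"span": (188, 193), "preferred_deletion": [189, 190, 191, 192]},
    5: {"span": (208, 216), "preferred_deletion": [212, 213, 214]},
}


def deletion_is_valid(loop_no):
    """Check that a preferred deletion is contiguous, lies inside its loop,
    and removes between one and five residues, as the text describes."""
    start, end = GFP_LOOPS[loop_no]["span"]
    deleted = GFP_LOOPS[loop_no]["preferred_deletion"]
    contiguous = deleted == list(range(deleted[0], deleted[-1] + 1))
    inside = all(start <= pos <= end for pos in deleted)
    return contiguous and inside and 1 <= len(deleted) <= 5


def splice_loop(segment, first_pos, deleted, insert):
    """Replace the contiguous residues numbered in `deleted` with `insert`.

    `segment` is a stretch of scaffold sequence whose first residue is
    numbered `first_pos` (hypothetical helper, for illustration only).
    """
    i = deleted[0] - first_pos          # index of first deleted residue
    j = deleted[-1] - first_pos + 1     # index just past last deleted residue
    return segment[:i] + insert + segment[j:]
```

For example, `splice_loop("AAADAAAA", 170, GFP_LOOPS[3]["preferred_deletion"], "GXXXG")` replaces the toy residue at position 173 and returns `"AAAGXXXGAAAA"`, where X marks a random position and G a linker residue.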
  • fusion polypeptides comprising GFP and random peptides are provided.
  • a preferred embodiment provides GFP proteins with a multisite cloning site inserted into at least one loop outlined above
  • the scaffold may not be GFP
  • the scaffold is a Renilla GFP
  • the scaffold is not Aequorea GFP
  • the scaffold is not any GFP
  • the scaffold protein is an indirectly detectable protein
  • as with the reporter proteins, cells that contain the indirectly detectable protein can be distinguished from those that do not; however, this is as a result of a secondary event
  • a preferred embodiment utilizes "enzymatically detectable" scaffolds that comprise enzymes that will act on chromogenic, and particularly fluorogenic, substrates, to generate fluorescence, such as luciferase, β-galactosidase, and β-lactamase
  • the indirectly detectable protein may require a recombinant construct in a cell that may be activated by the scaffold, for example, scaffold transcription factors or inducers that will bind to a promoter linked to an autofluorescent protein such that transcription of the autofluorescent protein occurs
  • the scaffold is β-lactamase. β-lactamase is generally secreted into the periplasm of bacteria and provides resistance to a variety of penicillins and cephalosporins, including the antibiotic ampicillin
  • antibiotic selection of cells comprising a fusion protein of a β-lactamase scaffold with peptide library members allows a determination of library expression. This allows examination of the effects on scaffold folding of different library insertion sites, fusion sites, or library biases by looking at the survival percentage after selection with a β-lactam antibiotic
  • eukaryotic β-lactamase libraries have the leader sequence removed to avoid their secretion from the cell. Since β-lactamase is readily assayed using colorimetric reagents [Marshall et al., Diagn. Microbiol. Infect. Dis. 22:353-5 (1995)] or fluorophoric reagents inside a live mammalian cell [Zlokarnik et al., Science 279
  • β-lactamase herein includes β-lactamases produced by a variety of microorganisms, including TEM-type extended spectrum β-lactamases (such as from E. coli, see below) and class A β-lactamases. β-lactamases within the scope of this invention thus include, but are not limited to, TEM-1 β-lactamase from E. coli, β-lactamase from Pseudomonas aeruginosa, TEM-26B β-lactamase from Klebsiella oxytoca, class A β-lactamase from Capnocytophaga ochracea, TEM-6 β-lactamase (EC 3.5.2.6) from E.
  • β-lactamases with a high sequence homology to TEM-1 from E. coli, especially in the N- and C-terminal helices or in the 84-89 loop, are also preferred.
  • fusion proteins comprising a β-lactamase scaffold and peptides as outlined below are provided.
  • as for GFP and all the scaffold proteins outlined herein, N-terminal, C-terminal, dual N- and C-terminal and one or more internal fusions, either separately or in combination, are all contemplated.
  • β-lactamases which are known; e.g.: β-lactamase from Bacillus licheniformis (see Moews et al., Proteins 7(2):156-71 (1990); Knox and Moews, J. Mol. Biol. 220(2):435-55 (1991)); β-lactamase from Staphylococcus aureus (see Herzberg, J. Mol. Biol.
  • the β-lactamase libraries are made using β-lactamase inactivated by site-specific mutations.
  • ala164 would be replaced by arg, or glu166 replaced by ala (see Bouthors et al., Biochem. J. 330:1443-9 (1998)).
  • Active mutants of β-lactamase which are more stable than the wild type enzyme are also preferred as library scaffolds for loop-insert libraries. These mutants can have the advantage that their extra stability enhances the folding of library members with particularly destabilizing random library sequences. Examples of such mutants include E104K and E240K (Raquet et al., Proteins 23:63-72 (1995)).
  • the mutation M182T, which is a global suppressor of missense mutations (Huang and Palzkill, Proc. Natl. Acad. Sci. U.S.A. 94:8801-6 (1997)), may also be included in the scaffold to suppress folding or stability defects resulting in some library members. Again, such reasoning may apply not only to β-lactamase, but to all other enzymes or proteins disclosed herein.
  • a derivative of β-lactamase is used as a scaffold protein: N-terminus-BLA-C-terminus, comprising residues 26-290 of E. coli TEM-1 β-lactamase, or similar residues of Staphylococcus aureus or other β-lactamases (e.g., see Figures 5A, 5B, and 6).
  • the main site of insertion includes insertion of random amino acids (optionally with linkers and other fusion partners as outlined below) in relative mobile loops which are not close to the active site of the enzyme.
  • Figure 6 shows a model of β-lactamase depicting the most immobile and mobile regions.
  • a preferred loop for insertion of peptide libraries is the loop including I84-D85-A86-G87-Q88-E89 (termed "β-lactamase loop 1" herein), which connects a helix at its N-terminus and an irregular region at its C-terminus.
  • This loop is different from the loops described by Legendre et al. (Nature Biotechnology 17:67-72 (1999)), who specifically selected loops near or affecting the active site to modulate enzyme activity. Here no attenuation of activity is intended or desired.
  • one or more loop residues may be replaced or alternatively the insert may be between two residues.
  • I84, D85 and E89 are fixed in the library since the side chains of each appear to interact with the rest of the ⁇ -lactamase structure, although this is not required.
  • Q88 may also optionally be fixed.
  • A86 and G87 may be replaced, for example with random residues or with random residues flanked by linker residues.
  • linker amino acids on one or both sides may comprise 2, 3, 4, or more glycines, in order to provide a flexible region between the random library and the rest of the protein.
  • the linker may not need any glycines.
  • the presence of multiple glycines at least partly conformationally decouples the library from the rest of the protein, enhancing the chances that the library members fold and create active β-lactamase
  • random residues are inserted into alternate loop sites; again, linkers and other fusion partners may optionally be used. Preferred embodiments utilize at least one glycine linker on either side of the random insert to allow a high percentage of β-lactamase-random [...] to the relative immobility of the backbone and some of the side chains of the loop
  • loop residues can be replaced or inserted into at positions D254-G255-K256 ("β-lactamase loop 2"), again with optional linkers, preferably glycine residues, and other fusion partners. In this loop, replacement of the three residues is preferred
  • loop residues can be replaced or inserted into at positions A227-G228 ("β-lactamase loop 3"), again with optional linkers, preferably glycine residues, and other fusion partners. In this loop, replacement of the two residues is preferred. In some backbones, such as the Bacillus licheniformis (PDB structure 4BLM) protein, K255-G256-D257 is the loop of choice
  • loop residues can be replaced or inserted into at positions N52-S53 ("β-lactamase loop 4"), again with optional linkers, preferably glycine residues, and other fusion partners. In this loop, replacement of the two residues is preferred. In some backbones, such as the Bacillus licheniformis (PDB structure 4BLM) protein, G52-T53-N54 is the loop of choice
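The four β-lactamase loop designs described above (fixed residues kept, preferred residues replaced by a random insert flanked by glycine linkers) can be written out as sequence templates. This is an illustrative sketch only: `BLA_LOOPS`, `loop_template` and `sample_member` are hypothetical names, the insert lengths are free parameters chosen for the example, and "X" is used here as a placeholder for a random position.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 natural L-amino acids

# Loop residues (TEM-1 numbering) and the residues the text prefers to replace.
BLA_LOOPS = {
    1: {"loop": ["I84", "D85", "A86", "G87", "Q88", "E89"], "replaced": ["A86", "G87"]},
    2: {"loop": ["D254", "G255", "K256"], "replaced": ["D254", "G255", "K256"]},
    3: {"loop": ["A227", "G228"], "replaced": ["A227", "G228"]},
    4: {"loop": ["N52", "S53"], "replaced": ["N52", "S53"]},
}


def loop_template(loop_no, n_random, n_gly=2):
    """Return the loop as a string: fixed residues kept, replaced residues
    swapped for n_gly glycines + n_random 'X' positions + n_gly glycines."""
    insert = "G" * n_gly + "X" * n_random + "G" * n_gly
    out, placed = [], False
    for res in BLA_LOOPS[loop_no]["loop"]:
        if res in BLA_LOOPS[loop_no]["replaced"]:
            if not placed:          # emit the insert once, at the first replaced residue
                out.append(insert)
                placed = True
        else:
            out.append(res[0])      # keep the fixed residue (one-letter code)
    return "".join(out)


def sample_member(template, rng):
    """Fill each 'X' in the template with a randomly chosen amino acid."""
    return "".join(rng.choice(AMINO_ACIDS) if c == "X" else c for c in template)
```

With five random positions and two glycines per side, `loop_template(1, 5)` gives `"IDGGXXXXXGGQE"`: I84, D85, Q88 and E89 stay fixed while A86-G87 are replaced. Keeping Q88 fixed reflects the option mentioned in the text, not a requirement.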
  • the random peptide library is fused to the N- or C-terminus of β-lactamase. This optimizes the chances that the scaffold folds well and independently of the sequence of the random peptide library
  • a library with an alpha-helical bias is used, e.g., for binding to proteins with binding sites preferring alpha helices, such as leucine zipper proteins, coiled coils, or helical bundles. These helices also act by displacing an existing helix in one of the above structures
  • the random peptide sequences (chosen from all 20 natural L-amino acids) are fused to the end of a helix which is already nucleated, i.e., which is stable within the native structure and has at least several turns. This can be accomplished by fusion directly to the C-terminal or N-terminal residues of the selected β-lactamases, since both of these termini are extended alpha helices
  • mutants of β-lactamase are used which include substitutions of P27 in the TEM-1 truncated sequence with any helix-forming amino acid, such as M, K, E, A, F, L, R, D, Q, I, or V
  • the random peptide library is fused to the C-terminus of β-lactamase and the resulting library has the following schematic structure: "N-terminus-BLA-C-terminus-spacer residues-random peptide library-(+/- optional C-cap residues)"
  • the random peptide library is fused to the N-terminus of β-lactamase and the resulting library has the following schematic structure: "(+/- optional N-cap residues)-random peptide library-spacer residues-N-terminus-BLA-C-terminus"
  • the first residue would be the strong helix former M
  • spacer residues may be inserted between the β-lactamase structure and the random peptide library
  • these spacers may all be strong helix formers, such as M, K, E, A, F, L, R, D, Q, I, or V, in any combination, or in particular sequences such that L and E are 3-4 residues apart, allowing a side chain salt bridge to further stabilize the helix
  • the spacers may be charged, so that they would be less likely to be inserted into the interior of the β-lactamase structure
  • the spacer sequence may be KLEALEG, which would bias the sequence to form an alpha helix and interact in a parallel coiled-coil fashion with a helix in a target protein [Monera et al., J. Biol. Chem. 268:19218 (1993)]
  • the spacer sequence for β-lactamase C-terminal helix biased libraries may be EEAAKA. Combined with the C-terminal wild type sequence -KHW290 from E. coli TEM-1 β-lactamase, this would give -KHW290-E291-E292-A293-A294-K295-A296. E291 would be in a position to form an i,i+4 salt bridge with K295, and E292 could form a similar salt bridge with K288. This would stabilize an alpha helix. A293-A294-K295-A296 would form an AXXA motif allowing insertion of a Sfi-I restriction site in the DNA encoding this region, thereby allowing the cloning of random peptide libraries onto the C-terminus of β-lactamase
  • the spacer sequence includes the sequence A292-E293-K294-A295-K296-A297-E2
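The residue bookkeeping for the EEAAKA spacer can be checked mechanically. The sketch below numbers the C-terminal tail -KHW290 followed by EEAAKA and looks for E/K pairs four residues apart and for AXXA motifs. The function names are hypothetical, and treating any E/K pair at i,i+4 as a candidate salt bridge is a simplifying assumption (it says nothing about actual helix geometry).

```python
def number_residues(sequence, first_pos):
    """Map residue number -> one-letter code for a contiguous stretch."""
    return {first_pos + i: aa for i, aa in enumerate(sequence)}


def salt_bridge_pairs(numbered, spacing=4):
    """Return (i, i+spacing) positions where one residue is E and the other K."""
    pairs = []
    for pos in sorted(numbered):
        partner = pos + spacing
        if partner in numbered and {numbered[pos], numbered[partner]} == {"E", "K"}:
            pairs.append((pos, partner))
    return pairs


def axxa_starts(numbered):
    """Return positions i where residues i and i+3 are both alanine (AXXA)."""
    return [pos for pos in sorted(numbered)
            if numbered[pos] == "A" and numbered.get(pos + 3) == "A"]


# Wild type K288-H289-W290 followed by the EEAAKA spacer (E291 ... A296).
tail = number_residues("KHW" + "EEAAKA", 288)
```

Under these assumptions, `salt_bridge_pairs(tail)` returns `[(288, 292), (291, 295)]`, matching the K288/E292 and E291/K295 pairs named in the text, and `axxa_starts(tail)` returns `[293]`, the A293-A294-K295-A296 motif.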
  • the scaffold protein is luciferase
  • the bioluminescent reaction catalyzed by luciferase requires luciferin, ATP, magnesium, and molecular O2. Mixing these components results in a rapidly decaying flash of light which is detected, e.g., by using a luminometer
  • the reporter protein is firefly luciferase [de Wet et al., Mol. Cell. Biol. 7:725-737 (1987); Yang and Thomason, supra; Bronstein et al., supra]. Firefly luciferase can also be detected in live cells when soluble luciferase substrates, capable of crossing the plasma membrane, are employed (Bronstein et al., supra)
  • the use of firefly luciferase is especially preferred because there is only minimal endogenous activity in mammalian cells. Luciferases have been cloned from various species and the nucleotide sequences are available (e.g., see GenBank accession numbers E08320, E05448, D25416, S61961, U51019, M15077, L39928, L39929, AF085332, U89490, U31240, M10961, M65067, M62917, M25666, M63501, M55977, U03687,
  • the scaffold protein is Renilla reniformis luciferase. Renilla luciferase, DNA encoding Renilla luciferase, and use of the Renilla reniformis DNA to produce recombinant luciferase, as well as DNA encoding luciferase from other coelenterates, are well known in the art and are available [see, e.g., SEQ ID No. 1, U.S. patent Nos. 5,418,155 and 5,292,658; see also Prasher et al., Biochem. Biophys. Res. Commun. 126:1259-1268 (1985); Cormier, "Renilla and Aequorea bioluminescence" in Bioluminescence and Chemiluminescence, pp. 225-233 (1981); Charbonneau et al., J. Biol. Chem. 254:769-780 (1979); Ward et al., J. Biol. Chem. 254:781-788 (1979); Lorenz
  • fusion proteins comprising luciferase and peptide libraries may be made at the N-terminus, the C-terminus, or both, or one or more internal fusions can be utilized, in combination or alone
  • the site of fusion may be determined based on the structures of firefly luciferase [Franks et al., Biophys. J. 75(5):2205-11 (1998); Conti et al., Structure 4(3):287-98 (1996)] or bacterial luciferase [Fisher et al., Biochemistry 34(20):6581-6 (1995); Fisher et al., J. Biol. Chem.
  • the scaffold protein is β-galactosidase (Alam and Cook, supra; Bronstein et al., supra)
  • β-galactosidase encoded by the lacZ gene from E. coli
  • lacZ genes have been cloned from various species and the nucleotide sequences are available (e.g., see GenBank accession numbers J01636, AB025433, AF073995, U62625, and M57579)
  • the enzyme catalyzes the hydrolysis of several β-galactosides (e.g., Young et al., supra) and is employed in colorimetric assays, e.g., using o-nitrophenyl-β-D-galactopyranoside (ONPG), and in chemiluminescent assays
  • chemiluminescent 1,2-dioxetane substrates have greatly improved the sensitivity of detection of enzyme activity
  • the assay is 50,000-fold more sensitive than a colorimetric assay
  • the assay may also be enhanced by employing assay conditions that minimize endogenous enzyme activities contributed by eukaryotic β-galactosidases (Young et al., supra)
  • β-galactosidase is used in in vivo assays
  • In vivo assays can be performed in prokaryotic and eukaryotic cells, in tissue sections and intact embryos, and include staining with the precipitating substrate X-gal (Alam and Cook, supra)
  • bioluminescence assays in live cells are employed using fluorescein di-β-D-galactopyranoside (FDG; Bronstein et al., supra)
  • the site of fusion may be determined based on the structure of β-galactosidase, which has been determined [e.g., see Pearl et al., J. Mol. Biol. 229(2):561-3 (1993); Jacobson et al., Nature 369(6483):761-6 (1994); and Jacobson and Matthews, J. Mol. Biol. 223(4):1177-82 (1992)]. Insertions of amino acids into loop structures within β-galactosidase are especially preferred.
  • the reporter protein is chloramphenicol acetyltransferase [CAT, Gorman et al., Mol. Cell. Biol. 2:1044-1051 (1982)]. This enzyme catalyzes the transfer of acetyl groups from acetyl-coenzyme A to chloramphenicol. Using CAT as a reporter has
  • the indirectly detectable protein is a DNA-binding protein which can bind to a DNA binding site and activate transcription of an operably linked reporter gene
  • the reporter gene can be any of the detectable genes, such as green fluorescent protein, or any of the survival genes, outlined herein
  • the DNA binding site(s) to which the DNA binding protein binds is (are) placed proximal to a basal promoter that contains sequences required for recognition by the basic transcription machinery (e.g., RNA polymerase II)
  • the promoter controls expression of a reporter gene. Following introduction of this chimeric reporter construct into an appropriate cell, an increase of the reporter gene product provides an indication that the DNA binding protein bound to its DNA binding site and activated transcription
  • no reporter gene product is made. Alternatively, a low basal level of reporter gene product may be tolerated in the case when a strong increase in reporter gene product is observed upon the addition of the DNA binding protein, or the DNA binding protein encoding gene. It is well known in the art
  • the DNA-binding protein is a cell type specific DNA binding protein which can bind to a nucleic acid binding site within a promoter region to which endogenous proteins do not bind at all or bind very weakly
  • These cell type specific DNA-binding proteins comprise transcriptional activators, such as Oct-2 [Mueller et al., Nature 336(6199):544-51 (1988)], which, e.g., is expressed in lymphoid cells and not in fibroblast cells. Expression of this DNA binding protein in HeLa cells, which usually do not express this protein, is sufficient for a strong transcriptional activation of B-cell specific promoters comprising a DNA binding site for Oct-2 (Mueller et al., supra)
  • the indirectly detectable protein is a DNA-binding/transcription activator fusion protein which can bind to a DNA binding site and activate transcription of an operably linked reporter gene
  • Briefly, transcription can be activated through the use of two functional domains of a transcription activator protein: a domain or sequence of amino acids that recognizes and binds to a nucleic acid sequence, i.e., a nucleic acid binding domain, and a domain or sequence of amino acids that will activate transcription when brought into proximity to the target sequence
  • the transcriptional activation domain is thought to function by contacting other proteins required in transcription, essentially bringing in the machinery of transcription. It must be localized at the target gene by the nucleic acid binding domain, which putatively functions by positioning the transcriptional activation domain at the transcriptional complex of the target gene
  • the DNA binding domain and the transcriptional activator domain can be either from the same transcriptional activator protein, or can be from different proteins (see McKnight et al., Proc. Natl. Acad. Sci. USA 89:7061 (1987); Ghosh et al., J. Mol. Biol. 234(3):610-619 (1993); and Curran et al., 55:395 (1988))
  • transcriptional activator proteins comprising an activation domain and a DNA binding domain are known in the art
  • the DNA-binding/transcription activator fusion protein is a tetracycline repressor protein (TetR)-VP16 fusion protein
  • This bipartite fusion protein consists of a DNA binding domain (TetR) and a transcription activation domain (VP16). TetR binds with high specificity to the tetracycline operator sequence (tetO)
  • the VP16 domain is capable of activating gene expression of a gene of interest, provided that it is recruited to a functional promoter
  • for a tetracycline repressor protein (TetR)-VP16 fusion protein, a suitable eukaryotic expression system which can be tightly controlled by the addition or omission of tetracycline or doxycycline has been described (Gossen and Bujard, Proc. Natl. Acad. Sci. U.S.A. 89:5547-5551; Gossen et al.
  • the site of fusion may be determined based on the structures of DNA-binding/transcription activator fusion proteins, which have been determined [e.g., TetR, see Orth et al., J. Mol. Biol. 285(2):455-61 (1999); Orth et al., J. Mol. Biol. 279(2):439-47 (1998); Hinrichs et al., Science 264(5157):418-20 (1994); and Kisker et al., J. Mol. Biol. 247(2):260-80 (1995)]. Insertions of amino acids into loop structures within DNA-binding/transcription activator fusion proteins are especially preferred
  • amino acids are inserted at or close to the fusion site of the DNA binding domain and the transcription activator domain
  • a dual scaffold protein is used to present the random peptide library
  • the random peptide library is thus flanked by a scaffold protein representing the DNA binding domain and a scaffold protein representing the transcription activation domain
  • the random peptide library thus is inserted between the C-terminus of the DNA binding domain and the N-terminus of the transcription activation domain or vice versa
  • Linker sequences separating the random peptides from the DNA binding domain and transcription activation domain are optional
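The dual scaffold arrangement (random peptide placed between a DNA binding domain and a transcription activation domain, in either order, with optional linkers) can be expressed as a simple domain-order schematic. This is only a sketch of the layout described above: `dual_scaffold` is a hypothetical helper, and the labels "TetR", "VP16", "X10" and "GG" are placeholder strings, not sequences from the application.

```python
def dual_scaffold(dbd, ad, peptide, linker="", dbd_first=True):
    """Return an N-to-C schematic of the fusion.

    dbd_first=True  -> DBD-[linker]-peptide-[linker]-AD
    dbd_first=False -> AD-[linker]-peptide-[linker]-DBD (the "vice versa" case)
    Empty linkers are omitted, reflecting that linkers are optional.
    """
    first, last = (dbd, ad) if dbd_first else (ad, dbd)
    parts = [first, linker, peptide, linker, last]
    return "-".join(p for p in parts if p)
```

For example, `dual_scaffold("TetR", "VP16", "X10", linker="GG")` yields `"TetR-GG-X10-GG-VP16"`, and passing `dbd_first=False` without a linker yields `"VP16-X10-TetR"`.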
  • DNA-binding/transcription activator fusion proteins in protein-protein interaction screening protocols (e.g., see Fields et al., Nature 340:245 (1989); Vasavada et al., Proc. Natl. Acad. Sci. U.S.A. 88:10686 (1991); Fearon et al., Proc. Natl. Acad. Sci. U.S.A. 89:7958 (1992);
  • the invention provides a composition comprising (i) a nucleic acid binding site, to which a DNA-binding/transcription activator and/or a DNA binding domain/transcription activator fusion protein can bind, said nucleic acid binding site being operably linked to a reporter gene, (ii) a reporter gene, and (iii) a DNA-binding/transcription activator and/or a DNA binding domain/transcription activator fusion protein which may be encoded by a nucleic acid
  • the scaffold protein is a survival protein
  • by "survival protein", "selection protein" or grammatical equivalents herein is meant a protein without which the cell cannot survive, such as drug resistance genes
  • the cell usually does not naturally contain an active form of the survival protein which is used as a scaffold protein
  • the cell usually comprises a survival gene that encodes the survival protein
  • the expression of a survival protein is usually not quantified in terms of protein activity, but rather recognized by conferring a characteristic phenotype onto a cell which comprises the respective survival gene or selection gene
  • survival genes may provide resistance to a selection agent (i.e., an antibiotic) to preferentially select only those cells which contain and express the respective survival gene
  • Suitable selection agents for the selection of eukaryotic cells include, but are not limited to, blasticidin [Izumi et al., Exp. Cell Res. 197(2):229-33 (1991); Kimura et al., Biochim. Biophys. Acta 1219(3):653-9 (1994); Kimura et al., Mol. Gen. Genet. 242(2):121-9 (1994)], histidinol D [Hartman and Mulligan, Proc. Natl. Acad. Sci. U.S.A. 85(21):8047-51 (1988)], hygromycin [Gritz and Davies, Gene
  • Suitable survival genes include, but are not limited to, thymidine kinase [TK, Wigler et al., Cell 11:233 (1977)], adenine phosphoribosyltransferase [APRT, Lowry et al., Cell 22:817 (1980); Murray et al., Gene 31:233 (1984); Stambrook et al., Som. Cell Mol. Genet. 4:359 (1982)], hypoxanthine-guanine phosphoribosyltransferase [HGPRT, Jolly et al., Proc. Natl. Acad. Sci. U.S.A. 80:477 (1983)], dihydrofolate reductase [DHFR, Subramani et al., Mol. Cell. Biol. 1:854 (1985); Kaufman and Sharp, J. Mol. Biol. 159:601 (1982); Simonsen and Lev
  • the survival protein is thymidine kinase [TK, Wigler et al., Cell 11:233 (1977)]
  • TK is encoded by the HSV or vaccinia virus tk genes. When transferred into a TK⁻ cell, these genes confer resistance to HAT medium, a medium supplemented with hypoxanthine, aminopterin and thymidine. TKs have been cloned from various species and the nucleotide sequences are available (e.g., see GenBank accession numbers M29943, M29942, M29941 and K02611)
  • the survival protein is adenine phosphoribosyltransferase [APRT, Lowry et al., Cell 22:817 (1980); Murray et al., Gene 31:233 (1984); Stambrook et al., Som. Cell Mol. Genet. 4:359 (1982)]
  • the gene encoding APRT confers resistance to complete medium supplemented with azaserine, adenine and alanosine. APRT genes have been cloned from various species, including human, and the nucleotide sequences are available (e.g., see GenBank accession numbers L25411, AF060886, X58640, U16781, U22442, U28961, L06280, M16446, L04970, and M11310)
  • the site of fusion may be determined based on the structure of adenine phosphoribosyltransferase from Leishmania donovani, which has been determined [Phillips et al., EMBO J. 18(13):3533-45 (1999)]. Insertions of amino acids into loop structures within adenine phosphoribosyltransferase are especially preferred
  • the survival protein is hypoxanthine-guanine phosphoribosyltransferase [HGPRT, Jolly et al., Proc. Natl. Acad. Sci. U.S.A. 80:477 (1983)]. When transferred into HGPRT⁻, APRT⁻ cells, the gene encoding HGPRT confers resistance to
  • the site of fusion may be determined based on the structures of human hypoxanthine-guanine phosphoribosyltransferase, which have been determined [Shi et al., Nat. Struct. Biol. 6(6):588-93 (1999); Eads et al., Cell 78(2):325-34 (1994)]. Insertions of amino acids into loop structures within hypoxanthine-guanine phosphoribosyltransferase are especially preferred
  • the survival protein is dihydrofolate reductase (DHFR), which is encoded by the dhfr gene [Subramani et al., Mol. Cell. Biol. 1:854 (1985); Kaufman and Sharp, J. Mol. Biol. 159:601 (1982); Simonsen and Levinson, Proc. Natl. Acad. Sci. U.S.A. 80:2495 (1983)]
  • the gene encoding DHFR confers resistance to medium containing methotrexate. DHFR genes have been cloned from various species, including human, and the nucleotide sequences are available (e.g., see GenBank accession numbers NM_000791, J01609, J00140, L26316, and M37124)
  • the survival protein is aspartate transcarbamylase. Aspartate transcarbamylase is encoded by pyrB [Ruiz and Wahl, Mol. Cell. Biol. 6:3050 (1986)]. When transferred to CHO D20 (UrdA mutant, deficient in the first three enzymatic activities of de novo uridine biosynthesis: carbamyl phosphate synthetase, aspartate transcarbamylase, and dihydroorotase), the gene encoding this protein confers resistance to Ham F-12 medium (minus uridine). Aspartate transcarbamylase genes have been cloned from various species, including human, and the nucleotide sequences are available (e.g., see GenBank accession numbers U61765, M38561, J04711, M60508, and M13128)
  • the survival protein is ornithine decarboxylase
  • Ornithine decarboxylase is encoded by the odc gene [Chiang and McConlogue, Mol. Cell. Biol. 8:764 (1988)]
  • when transferred into ODC⁻ CHO C55.7 cells, the gene encoding this protein confers resistance to medium lacking putrescine. ODC genes have been cloned from various species, including human, and the nucleotide sequences are available (e.g., see GenBank accession numbers U36394, AF016891, AF012551, U03059, J04792, and M34158)
  • the survival protein is aminoglycoside phosphotransferase, which is encoded by the aph gene [Southern and Berg, Mol. Appl. Gen. 1:327 (1982); Davies and
  • the survival protein is hygromycin-B-phosphotransferase, which is encoded by the hph gene [Gritz and Davies, supra; Sugden et al., Mol. Cell. Biol. 5:410 (1985); Palmer et al., Proc. Natl. Acad. Sci. U.S.A. 84:1055 (1987)]
  • this dominant selectable gene confers resistance to hygromycin-B
  • the hygromycin-B-phosphotransferase encoding gene has been cloned and used widely as a selectable marker on various vectors (e.g., see GenBank accession numbers AF025747, L76273, and K01193)
  • the survival protein is xanthine-guanine phosphoribosyltransferase, which is encoded by the gpt gene [Mulligan and Berg, Proc. Natl. Acad. Sci. U.S.A. 78:2072 (1981)]. When transferred into almost any cell, this dominant selectable gene confers resistance to XMAT medium, comprising xanthine, hypoxanthine, thymidine, aminopterin, mycophenolic acid and L-glutamine
  • the xanthine-guanine phosphoribosyltransferase encoding gene has been cloned and the nucleotide sequences are available (e.g., see GenBank accession numbers U28239 and M15035)
  • the survival protein is tryptophan synthetase, which is encoded by the trpB gene [Hartman and Mulligan, Proc. Natl. Acad. Sci. U.S.A. 85:8047 (1988)]. When transferred into almost any cell, this dominant selectable gene confers resistance to tryptophan-minus medium. Tryptophan synthetase encoding genes have been cloned and the nucleotide sequences are available (e.g., see GenBank accession numbers V00372, AF173835, V00365, M15826 and M32108)
  • the survival protein is histidinol dehydrogenase, which is encoded by the hisD gene [Hartman and Mulligan, Proc. Natl. Acad. Sci. U.S.A. 85:8047 (1988)]. When transferred into almost any cell, this dominant selectable gene confers resistance to media comprising histidinol. Histidinol dehydrogenase encoding genes have been cloned and the nucleotide sequences are available (e.g., see GenBank accession numbers AB013080, U82227, J01804, and M60466)
  • the survival protein is the multiple drug resistance biochemical marker, which is encoded by the mdr1 gene [Kane et al., Mol. Cell. Biol. 8:3316 (1988); Choi et al., Cell 53:519 (1988)]. When transferred into almost any cell, this dominant selectable gene confers resistance to media comprising colchicine. MDR1 genes have been cloned from various species, including human, and the nucleotide sequences are available (e.g., see GenBank accession numbers U62928, U62930, AJ227752, U62931, AF016535 and J03398)
  • the survival protein is blasticidin S deaminase, which is encoded by the bsr gene [Izumi et al., Exp. Cell Res. 197(2):229-33 (1991)]. When transferred into almost any cell, this dominant selectable gene confers resistance to media comprising the antibiotic blasticidin S. Blasticidin S deaminase encoding genes have been cloned. They are used widely as a selectable marker on various vectors and the nucleotide sequences are available (e.g., see GenBank accession numbers D83710, U75992, and U75991)
  • the survival protein is bleomycin hydrolase, which is encoded by the ble gene [Mulsant et al., supra]. When transferred into almost any cell, this dominant selectable gene confers resistance to media comprising bleomycin, phleomycin or zeocin. Bleomycin hydrolase encoding genes have been cloned. They are used widely as a selectable marker on various vectors and the nucleotide sequences are available (e.g., see GenBank accession numbers L26954, L37442, and L36849)
  • It is an object of the instant application to fuse amino acid sequences to bleomycin hydrolase. N-terminal, C-terminal, dual N- and C-terminal and one or more internal fusions are all contemplated
  • the site of fusion may be determined based on the structures of yeast (Gal6) and human bleomycin hydrolase, which have been determined [Joshua-Tor et al., Science 269(5226):945-50 (1995); O'Farrell et al., Structure Fold. Des. 7(6):619-27 (1999)]. Insertions of amino acids into loop structures within bleomycin hydrolase are especially preferred
  • the survival protein is puromycin-N-acetyl-transferase, which is encoded by the pac gene [Lacalle et al., Gene 79(2):375-80 (1989)]. When transferred into almost any cell, this dominant selectable gene confers resistance to media comprising puromycin
  • a puromycin-N-acetyltransferase encoding gene has been cloned. It is used widely as a selectable marker on various vectors and the nucleotide sequences are available (e.g., see GenBank accession numbers Z75185 and M25346)
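The survival genes and their selection conditions described above lend themselves to a lookup table. The sketch below covers a subset of the genes named in the text; the structure and the names `SURVIVAL_MARKERS` and `markers_described_as_dominant` are hypothetical. The last field records only whether this passage calls the gene a "dominant selectable gene", so `False` means "not described as dominant here", not a claim about the gene itself.

```python
# gene: (encoded protein, selection condition, described as dominant in the text)
SURVIVAL_MARKERS = {
    "tk":   ("thymidine kinase", "HAT medium", False),
    "dhfr": ("dihydrofolate reductase", "methotrexate", False),
    "hph":  ("hygromycin-B-phosphotransferase", "hygromycin-B", True),
    "gpt":  ("xanthine-guanine phosphoribosyltransferase", "XMAT medium", True),
    "trpB": ("tryptophan synthetase", "tryptophan-minus medium", True),
    "hisD": ("histidinol dehydrogenase", "histidinol", True),
    "mdr1": ("multiple drug resistance marker", "colchicine", True),
    "bsr":  ("blasticidin S deaminase", "blasticidin S", True),
    "ble":  ("bleomycin hydrolase", "bleomycin, phleomycin or zeocin", True),
    "pac":  ("puromycin-N-acetyl-transferase", "puromycin", True),
}


def selection_for(gene):
    """Return the selection condition the text associates with a survival gene."""
    return SURVIVAL_MARKERS[gene][1]


def markers_described_as_dominant():
    """List genes the text describes as dominant selectable genes."""
    return sorted(g for g, (_, _, dominant) in SURVIVAL_MARKERS.items() if dominant)
```

A table like this makes it easy to pair a scaffold choice with its selection condition, e.g. `selection_for("pac")` returns `"puromycin"`.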
  • the scaffold protein is a structural protein
  • the scaffold protein is generally not directly detectable, but is generally a small, stable, non-disulfide bond-containing protein
  • the presentation scaffold significantly constrains the presented random peptides
  • the peptides will be conformationally pre-constrained, will have a diminished number of low energy conformers, and will thus lose less entropy when bound to a target binding partner (a macromolecule such as a protein, DNA, or other functional molecule present within or on the outside of a cell)
  • Such constrained peptides may thus bind more tightly to a target molecule than unconstrained peptides
  • constrained peptides may be less subject to intracellular catabolism than unconstrained peptides, especially by proteases
  • Different scaffolds may impart different biases to peptides depending on the insertion site of the random peptide libraries
  • the scaffold comprises protease inhibitors belonging to the trypsin inhibitor I family, such as barley chymotrypsin inhibitor 2 (CI-2) and eglin C. Both of these proteins are small (83 and 64 residues, respectively), stable, and lack disulfide bonds, thus allowing their expression and folding in the cytoplasm of a mammalian cell without the complications of disulfide bond formation. Disulfide bond formation is difficult in the cytoplasm due to high levels of reduced glutathione and the presence of thioredoxin reductase.
  • the folding mechanism of CI-2 has been studied in detail, implying a two-state process with the rate limiting step for two slow phases being proline isomerization [Jackson and Fersht, Biochemistry 30:10428-35 (1991)]. It has been shown to refold when in two pieces, composed of residues 20-59 and 60-83, with the
  • CI-2 and the similar protease inhibitor eglin C are used as scaffolds for a small protein-embedded random peptide library. Since different intracellular targets demand bound peptides of different conformations, it is important to construct peptide libraries with different biases, as already outlined above.
  • the crystal structure of CI-2 [see Figure 7 and McPhalen and James, Biochemistry 26:261-269 (1987)] allows the construction of a different random peptide library with an additional bias: a broad-based 20 Å constraint, with both ends fixed at this distance by the CI-2 scaffold
  • the insertion site replaces the CI-2 inhibitor loop residues G54-R62 with 9 or more random amino acids. Inserting 9 random residues to replace the 9 existing residues in G54-R62 will bias the library to a broad-based semicircular loop, roughly 20 Å at its base. Inserting more residues will bias the library to more flexible peptides. Inserting correspondingly more residues in a slightly larger insertion site in this inhibitor loop, e.g., inserting 13 residues between 52 and 64, will create a library with a bias towards the top ca. 2/3 of a large ca. 18mer cyclic peptide. A library replacing all ~19 residues of this nearly circular loop (residues 49-67) will in effect mimic a large 19-residue cyclic peptide and thus would be different from any of the above libraries.
  • the above libraries substituting G54-R62 are made more flexible by substituting random residues for native residues at the base of this inhibitor loop which appear to support the top of the loop. Without this support, the top residues may be significantly more flexible.
  • the supporting residues appear to include F69, L51, R67, and R65. G83 could also be randomized since it is near the side of the loop in the crystal structure.
  • the random peptide library is inserted between K72-L73 of CI-2
  • random peptide libraries between residues K72-L73 or random peptide libraries replacing residues P44-E45 may be used as selectable libraries, allowing the elimination of cells not expressing a properly folded and bioactive library member, or of uninfected cells
  • analogous library insertion sites may be used with eglin C or other potato trypsin inhibitor I family members lacking disulfide bonds, which have similar structures to that of CI-2
  • the fusion protein comprising the scaffold protein and the random peptide library is bioactive, e.g., has enzymatic activity
  • the fusion protein need not display such a bioactive function
  • a preferred property of the fusion protein is, however, to present the random peptide sequences to potential binding partners
  • multiple scaffolds are used for the intracellular (and extracellular) presentation of peptide libraries with a bias to extended peptides
  • Extended conformations are important for molecular recognition in a number of peptide-protein complexes [Siligardi and Drake, Biopolymers 37(4):281-92 (1995)] including peptide substrate (and inhibitor) binding to a large variety of proteases, kinases and phosphatases, peptide binding to MHC class I and II proteins, peptide binding to chaperones, peptide binding to DNA, and B cell epitopes
  • Additional examples of extended bound peptides include a troponin inhibitory peptide binding to troponin C [Hernandez et al., Biochemistry 38:6911-17 (1999)] and a p21-derived peptide binding to PCNA [Gulbis et al., Cell 87:297-306 (1996)]. Linear peptides are a unique
  • the intracellular catabolism of peptides is one limiting factor which may prevent significant steady state levels of small peptides
  • Proteases such as aminopeptidases [Lee and Goldberg, Biopolymers 37:281-92 (1992)] as well as carboxypeptidases and the proteasome, as outlined further below, may be involved in the degradation of intracellular peptides
  • linear or extended peptides may be readily degraded after their intracellular expression
  • the library is constructed allowing the random library members, consisting of 18-30 random residues, to have linear/extended configurations without both free N-termini (allowing aminopeptidase-mediated degradation) and free C-termini (allowing carboxypeptidase-mediated degradation)
  • the scaffold presents the random peptides with a linear/extended structural bias (but not as an absolute requirement) and allows significant peptide flexibility while somewhat limiting intracellular catabolism. Fusion of proteins to both ends of the library should protect the random sequences from amino- and carboxypeptidases.
  • a dual fusion scaffold fusion protein of the following form is constructed: N-terminus-protein 1-linker 1-random peptide library-linker 2-protein 2-C-terminus
  • protein 1 and protein 2 are the same protein. Alternatively, protein 1 and protein 2 are different proteins.
  • linker 1 and linker 2 are the same linker. Alternatively, linker 1 and linker 2 are different linkers.
  • protein 1 and protein 2 are selected from a group of proteins which have low affinity for each other
  • protein 1 and protein 2 are selected from a group of proteins that are well-expressed in mammalian cells or in the cell in which the random peptide library is tested. Included in this embodiment are proteins with a long intracellular half-life, such as CAT and others known in the art.
  • protein 2 is a selection protein, such as DHFR or any other,
  • protein 1 is a selection protein
  • protein 2 is a reporter protein, such as GFP or any other fluorescent protein, β-lactamase, another highly colored protein, as either outlined above or known in the art
  • intracellular detection and tracking of full-length library members in mammalian cells or in cells in which the library is tested can be achieved. Reporter-gene product analyses were outlined above.
  • protein 1 is a reporter protein
  • protein 1 is a reporter protein and protein 2 is a selection protein, allowing both intracellular tracking and selection of full-length library members
  • Linker 1 and linker 2 should not have a high self-affinity or a noncovalent affinity for either protein 1 or protein 2
  • linker 1 and/or linker 2 consist(s) of residues with one or more glycines to decouple the structure of protein 1 and protein 2 from the random library
  • linker 1 and/or linker 2 provide(s) enough residues which, when extended, provide 0.5-1 protein diameter spacing between the random residues and proteins 1 and 2. This would correspond to approximately 15-30 Å or 5-10 residues and would minimize steric interference in peptide library member binding to potential targets.
  • linker 1 and/or linker 2 contain(s) enough hydrophilic residues so that the linkers do not adversely affect the solubility or stickiness of the entire fusion protein or of the linker region alone
  • a relatively rigid structure can be formed from the linkers to force the random residues away from the surfaces of proteins 1 and 2
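The linker spacing described above (5-10 residues corresponding to roughly 15-30 Å) can be checked with a short calculation. This is a minimal sketch, not part of the disclosure: the rise of 3.0 Å per residue is an assumption inferred from those two ranges, and the function names are hypothetical.

```python
import math

# Assumption: about 3.0 Å of span per residue for a fully extended linker,
# inferred from the text's "15-30 Å or 5-10 residues". Flexible linkers in
# solution will, on average, span less than this upper estimate.
ANGSTROM_PER_RESIDUE = 3.0

def linker_span(n_residues, rise=ANGSTROM_PER_RESIDUE):
    """Approximate end-to-end span (in Å) of a fully extended linker."""
    return n_residues * rise

def residues_for_span(span, rise=ANGSTROM_PER_RESIDUE):
    """Smallest residue count whose extended span reaches `span` Å."""
    return math.ceil(span / rise)

for n in (5, 10):
    print(n, "residues span about", linker_span(n), "Å when extended")
```

Under this assumption, a linker meant to hold the random residues about one protein diameter away from protein 1 or protein 2 would need on the order of ten residues.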
  • the cellular protein p21 is used to display a linear peptide to binding partners.
  • the tumor suppressor protein p21 binds to PCNA via its C-terminal 22 residues by effectively displaying this C-terminal peptide to PCNA in an extended conformation (Gulbis et al., supra). Therefore this scaffold may be useful for the display of random peptide libraries with an extended structural bias in the position of some or all of the C-terminal 22 residues, with the C-terminal residues now being randomized.
  • the structure of the p21 scaffold appears to be disordered and to become more ordered at its N-terminus upon binding to cyclin-dependent kinases (CDKs).
  • the overall disordered structure may suggest that this scaffold may be particularly useful for displaying extended (disordered) peptide libraries.
  • the nuclear localization sequence of p21 located between residues 141 and 156 is deleted and replaced by random residues.
  • the random peptide library is thus inserted such that it replaces the nuclear localization signal.
  • this scaffold should function as a scaffold for a cytoplasmic peptide library.
  • the p21 scaffold library members should not bind to nuclear cyclins and CDKs and thus should not perturb the cell cycle.
  • the appropriate domains can be inactivated by site-directed mutagenesis, as known in the art.
  • One such mutation, R94W, blocks the ability of p21 to inhibit cyclin-dependent kinases [Balbin et al., J. Biol. Chem. 271:15782-6 (1996)].
  • a second mutant in a p21 CDK- construct, also blocking CDK binding, has been shown to stabilize p21 to proteasomal degradation [Cayrol and Ducommun, Oncogene 17:2437-44 (1998)] and thus may be preferred as a scaffold.
  • N50S also blocks CDK inhibition by p21 [Welcker et al., Cancer Res. 58:5053-6 (1998)].
  • the cy-1 site (residues 17-24) may be deleted, blocking both cyclin- and cyclin-CDK complex binding to p21 [Chen et al., Mol. Cell. Biol. 16:4673-82 (1996)].
  • the cy-2 cyclin binding site, at residues 152-158, may also be deleted in case the random library is inserted in place of residues 141-164.
  • the scaffold protein is kanamycin nucleotidyl transferase (see Figure 8). Kanamycin nucleotidyl transferase forms tight dimers.
  • the extended-bias random peptides would be inserted between the C-terminus of the first dimer and the N-terminus of the second dimer, with spacer residues between each protein and the random residues.
  • the spacer residues on either side of the random library region would consist of at least 5-10 residues on each side of the random peptide library, including one or more glycines and no hydrophobic residues.
  • the fusion proteins of the present invention comprise a scaffold protein and a random peptide
  • the peptides (and nucleic acids encoding them) are randomized, either fully randomized or they are biased in their randomization, e.g. in nucleotide/residue frequency generally or per position
  • by "randomized" or grammatical equivalents herein is meant that each nucleic acid and peptide consists of essentially random nucleotides and amino acids, respectively
  • the nucleic acids which give rise to the peptides are chemically synthesized, and thus may incorporate any nucleotide at any position. Thus, when the nucleic acids are expressed to form peptides, any amino acid residue may be incorporated at any position.
  • the synthetic process can be designed to generate randomized nucleic acids, to allow the formation of all or most of the possible combinations over the length of the nucleic acid, thus forming a library of randomized nucleic acids
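The relationship between randomized nucleic acids and the peptides they encode can be illustrated with a small simulation. This is only a sketch under stated assumptions: every nucleotide is drawn uniformly and independently, translation uses the standard genetic code, and the helper names are hypothetical. Note that fully random codons also produce stop codons (3 of the 64), shown here as `*`.

```python
import random

BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def random_dna(n_codons, rng):
    """Draw 3 * n_codons nucleotides uniformly (any nucleotide at any position)."""
    return "".join(rng.choice(BASES) for _ in range(3 * n_codons))

def translate(dna):
    """Translate complete codons in frame; '*' marks a stop codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

rng = random.Random(0)  # seeded only so the sketch is reproducible
for _ in range(3):
    oligo = random_dna(10, rng)
    print(oligo, "->", translate(oligo))
```

A biased library, as described above, would replace the uniform draw with per-position nucleotide frequencies.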
  • the library should provide a sufficiently structurally diverse population of randomized expression products to effect a probabilistically sufficient range of cellular responses to provide one or more cells exhibiting a desired response. Accordingly, an interaction library must be large enough so that at least one of its members will have a structure that gives it affinity for some molecule, protein, or other factor whose activity is necessary for completion of the signaling pathway. Although it is difficult to gauge the required absolute size of an interaction library, nature provides a hint with the immune response: a diversity of 10^7-10^8 different antibodies provides at least one combination with sufficient affinity to interact with most potential antigens faced by an organism. Published in vitro selection techniques have also shown that a library size of 10^7 to 10^8 is sufficient to find structures with affinity for the target. A library of all combinations of a peptide 7 to 20 amino acids in length, such as proposed here for expression in retroviruses, has the potential to code for 20^7 (10^9) to 20^20. Thus, for example, with libraries of 10^7 to 10^8 per ml of retroviral particles the present methods allow a
  • a library of fusion proteins, each fusion protein comprising a scaffold protein and a random peptide, comprises at least 10^5, preferably at least 10^6, more preferably at least 10^7, still more preferably at least 10^8 and most preferably at least 10^9 different random peptides
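The diversity figures quoted above follow from simple counting: a fully random peptide of length L over 20 amino acids has 20^L possible sequences. The sketch below, with hypothetical function names, works through that arithmetic and gives an upper bound on how much of the sequence space a library of a given size could sample, assuming every library member is distinct.

```python
AMINO_ACIDS = 20  # natural L-amino acids

def peptide_diversity(length):
    """Number of distinct sequences for a fully random peptide of this length."""
    return AMINO_ACIDS ** length

def max_coverage(library_size, length):
    """Upper bound on the sampled fraction of sequence space.

    Assumes all library members are different, which a real library
    will only approximate.
    """
    return min(1.0, library_size / peptide_diversity(length))

for length in (7, 10, 20):
    print(f"length {length}: {peptide_diversity(length):.2e} sequences, "
          f"a 10^8 library covers at most {max_coverage(10**8, length):.2e}")
```

For 7-mers the space is 1.28 x 10^9, consistent with the order-of-magnitude figure of 10^9 in the text, so a library of 10^8 members samples at most about 8% of it; for 20-mers the sampled fraction is vanishingly small.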
  • an individual member of the library of fusion proteins is analyzed as outlined herein
  • more than one individual member of the library of fusion proteins may be simultaneously analyzed
  • the peptide library is fully randomized, with no sequence preferences or constants at any position
  • the library is biased. That is, some positions within the sequence are either held constant, or are selected from a limited number of possibilities.
  • the nucleotides or amino acid residues are randomized within a defined class, for example, of hydrophobic amino acids, hydrophilic residues, sterically biased (either small or large) residues, towards the creation of cysteines, for cross-linking, prolines for SH-3 domains, serines, threonines, tyrosines or histidines for phosphorylation sites, etc., or to purines, etc.
  • individual residues may be fixed in the random peptide sequence of the insert to create a structural bias, similar to the concept of presentation structures outlined below
  • a preferred embodiment utilizes inserts of a general structure -gly(2-8)-aa1-aa2-...-aan-gly(2-8)-, where the random insert sequence is aa1 to aan
  • This sequence can be constrained by fixing one or more of the n residues as prolines (which will significantly restrict the conformation space of the entire loop), as bulky amino acids such as W, R, K, L, I, V, F, or Y, or biasing the set of random amino acids to include only bulky residues such as E, F, H, I, K, L, M, Q, R, T, V, W, and Y. Due to the larger size of the side chains, these residues will have fewer ways to pack into a small space that is defined by that available to a loop, and thus there will be fewer available loop conformations.
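The glycine-flanked insert and the two constraint strategies just described (fixing individual positions, or restricting the random alphabet to bulky residues) can be sketched as follows. This is an illustrative generator under the assumption of uniform sampling; `loop_insert` and its parameters are hypothetical names, not part of the disclosure.

```python
import random

AA20 = "ACDEFGHIKLMNPQRSTVWY"   # the 20 natural amino acids, one-letter codes
BULKY = "EFHIKLMQRTVWY"          # the bulky set listed in the text

def loop_insert(n, rng, n_gly=2, alphabet=AA20, fixed=None):
    """Return gly(n_gly) + n insert residues + gly(n_gly).

    `fixed` maps 0-based insert positions to a fixed residue, e.g. {4: "P"};
    every other position is drawn uniformly from `alphabet`.
    """
    if not 2 <= n_gly <= 8:
        raise ValueError("the text describes 2 to 8 flanking glycines")
    fixed = fixed or {}
    core = "".join(
        fixed[i] if i in fixed else rng.choice(alphabet) for i in range(n)
    )
    return "G" * n_gly + core + "G" * n_gly

rng = random.Random(0)  # seeded only so the sketch is reproducible
print(loop_insert(9, rng, n_gly=3, fixed={4: "P"}))  # proline-constrained member
print(loop_insert(9, rng, alphabet=BULKY))           # bulky-biased member
```

Restricting the alphabet to the 13 bulky residues also shrinks the library: a 9-residue insert has 13^9 rather than 20^9 possible sequences.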
  • the random libraries can be biased to a particular secondary structure by including an appropriate number of residues (beyond the glycine linkers) which prefer the particular secondary structure
  • the entire loop insert might look like -gly(2-8)-helix former(4-8)-random residues-helix former(4-8)-gly(2-8)-
  • the randomized region can be devoid of strong helix breakers such as pro and gly; examples of strong helix forming residues would include M, A, K, L, D, E, R, Q, F, I and V
  • the bias is towards peptides that interact with known classes of molecules
  • a short region from the HIV-1 envelope cytoplasmic domain has been previously shown to block the action of cellular calmodulin. Regions of the Fas cytoplasmic domain, which show homology to the mastoparan toxin from wasps, can be limited to a short peptide region with death-inducing apoptotic or G protein inducing functions.
  • Magainin, a natural peptide derived from Xenopus, can have potent anti-tumour and anti-microbial activity
  • Short peptide fragments of a protein kinase C isozyme (βPKC) have been shown to block nuclear translocation of βPKC in Xenopus oocytes following stimulation
  • short SH-3 target peptides have been used
  • a number of molecules or protein domains are suitable as starting points for the generation of biased randomized peptides
  • a large number of small molecule domains are known, that confer a common function, structure or affinity
  • areas of weak ammo acid homology may have strong structural homology
  • a number of these molecules, domains, and/or corresponding consensus sequences are known, including, but not limited to, SH-2 domains, SH-3 domains, Pleckstrin, death domains, protease cleavage/recognition sites, enzyme inhibitors, enzyme substrates, Traf, etc.
  • nucleic acid binding proteins containing domains suitable for use in the invention
  • leucine zipper consensus sequences are known
  • the random library may have leucines or isoleucines fixed every 7 residues to bias it to a leucine or isoleucine zipper motif.
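Fixing a leucine or isoleucine every seventh residue, as described above, can be sketched with a simple position rule. The generator below is illustrative only: the uniform sampling of the remaining positions and the names `zipper_biased`, `period` and `offset` are assumptions.

```python
import random

AA20 = "ACDEFGHIKLMNPQRSTVWY"  # the 20 natural amino acids, one-letter codes

def zipper_biased(length, rng, fixed="L", period=7, offset=0):
    """Random peptide with `fixed` placed at every `period`-th position.

    Positions offset, offset + period, offset + 2 * period, ... carry the
    fixed residue (leucine by default); all others are drawn uniformly.
    """
    return "".join(
        fixed if (i - offset) % period == 0 else rng.choice(AA20)
        for i in range(length)
    )

rng = random.Random(0)  # seeded only so the sketch is reproducible
print(zipper_biased(22, rng))              # leucine zipper bias
print(zipper_biased(22, rng, fixed="I"))   # isoleucine zipper bias
```

With a 22-residue library this fixes four positions and leaves 18 random, so the motif bias costs relatively little diversity.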
  • the optional C- or N-cap residues in the case of a helix-biased library, may be fixed and not random and again would be strong helix formers.
  • if the C- or N-terminus forms a stable secondary structure such as an alpha helix or a poly-proline helix, it will be resistant to proteolysis, which would be an advantage for the stability of the library within the cell.
  • N- and C-cap helix stabilizing sequences or residues can be included both at the N-termini and C-termini, respectively [Betz and DeGrado, Biochem. 35:6955-62 (1996); Doig et al. Prot. Sci. 6:147-155 (1997); Doig and Baldwin, Prot. Sci. 4:1325-36 (1995); Richardson and Richardson, Science 240:1648-52 (1988). These sequences are incorporated by reference].
  • a library with a more extended structural bias is constructed, wherein weaker helix formers would be fused at each end of the random region, or one or more glycines would be included in the spacer region and C- or N-cap region.
  • a library with a more extended structural bias is constructed by omitting the helix N- or C-cap residues.
  • the random residues would be selected from all 20 natural L-amino acids.
  • a dual library may be constructed with fusion peptides at both the N- and C-terminus of β-lactamase and the resulting library has the following schematic structure: "(+/- optional N-cap residues)-random peptide library-spacer residues-N-terminus-BLA-C-terminus-spacer residues-random peptide library-(+/- optional C-cap residues)".
  • because the β-lactamase N- and C-terminal helices are adjacent and parallel (i.e. they run in the same direction), such a library could be biased to have two adjacent helices sticking out from the β-lactamase structure in a coiled-coil fashion.
  • this bias is accentuated by inclusion of the spacer sequences KLEALEG (Monera et al., supra) or VSSLESK [Graddis et al., Biochem. 32:12664-71 (1993)] between the random peptide library and that of β-lactamase.
  • the spacer sequence VSSLESE could be included between one random peptide library and β-lactamase, and the spacer sequence VSSLKSK could be included between the second random peptide library (e.g., after adjustments of the number of intervening amino acids to keep these in register) and the other terminus of β-lactamase (Graddis et al., supra). These two helix heptad repeats may help bind the two potential helices together.
  • the bias of the two adjacent random peptide libraries to a coiled coil is further increased by fixing positions in the sequence such that a number of random residues will be inserted on the surface of the two helices while the fixed residues in the sequence may reside at the interface between the two helices in a parallel coiled coil
  • the two helices composing the random peptide library may be set in register lengthwise by insertion of one or more helix forming residues as appropriate
  • the size of the library can thus be controlled by n. Residues in positions c, c', f, f', b and b' may
  • the fixed residues a, a', d, and d' are combinations of hydrophobic strong helix forming residues such as ala, val, leu; g and g' are lys; and e and e' are glu (or alternatively lys, when g and g' are glu)
  • Positions e, e', g, and g' may be fixed to further stabilize the coiled coil with salt bridges
  • Positions b, b', c, c', f and f' may be random residues
  • a library with less helical bias is generated having more random residues on the surface of the helix
  • positions g and g' and e and e' may be random residues as well
  • n would be 1, 2, 3, 4, 5 or more
  • an alternative set of fixed residues is used to generate a bias to a parallel coiled coil
  • the fixed positions include ala in a and a', leu in d and d', glu in e and e', lys in g and g', and random residues in the remaining positions
  • g and g' may also be randomized
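The heptad scheme above (ala at a, leu at d, glu at e, lys at g, random residues elsewhere, repeated n times) can be sketched for a single helix as follows. This is an illustration under assumptions: positions are taken in the order a to g, random positions are sampled uniformly, and the function names are hypothetical. The second helix (the primed positions) would be generated the same way.

```python
import random

AA20 = "ACDEFGHIKLMNPQRSTVWY"  # the 20 natural amino acids, one-letter codes
HEPTAD = "abcdefg"
# Fixed positions from the text: ala at a, leu at d, glu at e, lys at g.
FIXED = {"a": "A", "d": "L", "e": "E", "g": "K"}

def coiled_coil_member(n, rng, fixed=FIXED):
    """One helix of n heptads; positions absent from `fixed` are random."""
    return "".join(
        fixed[pos] if pos in fixed else rng.choice(AA20)
        for _ in range(n)
        for pos in HEPTAD
    )

def helix_diversity(n, fixed=FIXED):
    """Possible sequences for one helix: 20 ** (random positions per heptad * n)."""
    return len(AA20) ** ((len(HEPTAD) - len(fixed)) * n)

rng = random.Random(0)  # seeded only so the sketch is reproducible
print(coiled_coil_member(3, rng))
print(helix_diversity(3))

# Randomizing g as well, as the text allows, leaves only a, d and e fixed.
less_biased = {k: v for k, v in FIXED.items() if k != "g"}
print(helix_diversity(3, less_biased))
```

This makes explicit how n controls library size: each added heptad multiplies the per-helix diversity by 20^3 with b, c and f random, or by 20^4 when g is also randomized.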
  • SH-3 domain-binding oligonucleotides/peptides are made. SH-3 domains have been shown to recognize short target motifs (SH-3 domain-binding peptides), about ten to twelve residues in a linear sequence, that can be encoded as short peptides with high affinity for the target SH-3 domain. Consensus sequences for SH-3 domain binding proteins have been proposed.
  • oligos/peptides are made with the following biases: 1. XXXPPXPXX, wherein X is a randomized residue. 2. (within the positions of residue positions 11 to -2)
  • the N-terminus flanking region is suggested to have the greatest effects on binding affinity and is therefore entirely randomized. "Hyd" indicates a bias toward a hydrophobic residue, i.e., Val, Ala, Gly, Leu, Pro, Arg
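The first SH-3 bias above, XXXPPXPXX, is a template in which X positions are randomized and prolines are fixed. A minimal sketch of expanding such a template is shown below; uniform sampling at X positions and the function names are assumptions made for illustration.

```python
import random

AA20 = "ACDEFGHIKLMNPQRSTVWY"  # the 20 natural amino acids, one-letter codes

def expand_pattern(pattern, rng, alphabet=AA20):
    """Replace each 'X' with a random residue; keep all other letters fixed."""
    return "".join(rng.choice(alphabet) if ch == "X" else ch for ch in pattern)

def pattern_diversity(pattern, alphabet=AA20):
    """Number of distinct peptides the template can produce."""
    return len(alphabet) ** pattern.count("X")

SH3_BIAS_1 = "XXXPPXPXX"  # template taken from the text

rng = random.Random(0)  # seeded only so the sketch is reproducible
for _ in range(3):
    print(expand_pattern(SH3_BIAS_1, rng))
print(pattern_diversity(SH3_BIAS_1))
```

With six randomized positions the template spans 20^6 = 6.4 x 10^7 sequences, which falls within the library sizes discussed elsewhere in the text.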
  • the random peptides range from about 4 to about 50 residues in length, with from about 5 to about 30 being preferred, and from about 10 to about 20 being especially preferred
  • the random peptide(s) can be fused to a scaffold in a variety of positions, as is more fully outlined herein, to form fusion polypeptides
  • the fusion proteins of the present invention preferably include additional components, including, but not limited to, fusion partners, including linkers
  • By "fusion partner" herein is meant a sequence that is associated with the random peptide that confers upon all members of the library in that class a common function or ability. Fusion partners can be heterologous (i.e. not native to the host cell), or synthetic (not native to any cell). Suitable fusion partners include, but are not limited to: a) presentation structures, as defined below, which provide the peptides in a conformationally restricted or stable form; b) targeting sequences, defined below, which allow the localization of the peptide into a subcellular or extracellular compartment; c) rescue sequences as defined below, which allow the purification or isolation of either the peptides or the nucleic acids encoding them; d) stability sequences, which confer stability or protection from degradation to the peptide or the nucleic acid encoding it, for example resistance to proteolytic degradation; e) linker sequences, which conformationally decouple the random peptide elements from the scaffold itself, which keep the peptide from interfering with scaffold folding; or f), any
  • the fusion partner is a presentation structure
  • By "presentation structure" or grammatical equivalents herein is meant a sequence, which, when fused to peptides, causes the peptides to assume a conformationally restricted form. Proteins interact with each other largely through conformationally constrained domains.
  • although small peptides with freely rotating amino and carboxyl termini can have potent functions as is known in the art, the conversion of such peptide structures into pharmacologic agents is difficult due to the inability to predict side-chain positions for peptidomimetic synthesis. Therefore the presentation of peptides in conformationally constrained structures will benefit both the later generation of pharmacophore models and pharmaceuticals and will also likely lead to higher affinity interactions of the peptide with the target protein. This fact has been recognized in the combinatorial library generation systems using biologically generated short peptides in bacterial phage systems. A number of workers have constructed small domain molecules in which one might present randomized peptide structures.
  • synthetic presentation structures, i.e. artificial polypeptides, are capable of presenting a randomized peptide as a conformationally-restricted domain
  • presentation structures comprise a first portion joined to the N-terminal end of the randomized peptide, and a second portion joined to the C-terminal end of the peptide; that is, the peptide is inserted into the presentation structure, although variations may be made, as outlined below, in which elements of the presentation structure are included within the random peptide sequence
  • the presentation structures are selected or designed to have minimal biological activity when expressed in the target cell
  • suitable presentation structures maximize accessibility to the peptide by presenting it on an exterior surface such as a loop, and also cause further conformational constraints in a peptide
  • suitable presentation structures include, but are not limited to, dimerization sequences, minibody structures, loops on β-turns and coiled-coil stem structures in which residues not critical to structure are randomized, zinc-finger domains, cysteine-linked (disulfide) structures, transglutaminase linked structures, cyclic peptides, B-loop structures, helical barrels or bundles, leucine zipper motifs, etc.
  • the presentation structure is a coiled-coil structure, allowing the presentation of the randomized peptide on an exterior loop
  • coiled-coil structures allow for between 6 to 20 randomized positions
  • a preferred coiled-coil presentation structure is as follows
  • the underlined regions represent a coiled-coil leucine zipper region defined previously (see Martin et al., EMBO J. 13(22):5303-5309 (1994), incorporated by reference)
  • the bolded GRGDMP region represents the loop structure and when appropriately replaced with randomized peptides (i.e. peptides, generally depicted herein as (X)n, where X is an amino acid residue and n is an integer of at least 5 or 6) can be of variable length
  • the replacement of the bolded region is facilitated by encoding restriction endonuclease sites in the underlined regions, which allows the direct incorporation of randomized oligonucleotides at these positions
  • a preferred embodiment generates an XhoI site at the double-underlined LE site and a HindIII site at the double-underlined KL site
  • the presentation structure is a minibody structure
  • a "minibody" is essentially composed of a minimal antibody complementarity region
  • the minibody presentation structure generally provides two randomizing regions that in the folded protein are presented along a single face of the tertiary structure. See for example Bianchi et al., J Mol Biol
  • a preferred minibody presentation structure is as follows
  • the presentation structure is a sequence that contains generally two cysteine residues, such that a disulfide bond may be formed, resulting in a conformationally constrained sequence
  • This embodiment is particularly preferred ex vivo, for example when secretory targeting sequences are used.
  • the presentation sequence confers the ability to bind metal ions to confer secondary structure
  • C2H2 zinc finger sequences are used, C2H2 sequences have two cysteines and two histidines placed such that a zinc ion is chelated
  • Zinc finger domains are known to occur independently in multiple zinc-finger peptides to form structurally independent, flexibly linked domains
  • a general consensus sequence is (5 amino acids)-C-(2 to 3 amino acids)-C-(4 to 12 amino acids)-H-(3 amino acids)-H-(5 amino acids)
  • a preferred example would be -FQCEEC-random peptide of 3 to 20 amino acids-HIRSHTG-
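The C2H2 consensus spacing given above lends itself to a regular-expression check. The sketch below is illustrative only: the two patterns and the helper `zinc_finger_insert` are hypothetical, and the patterns follow the 4 to 12 residue spacing of the general consensus rather than the broader 3 to 20 residue range of the preferred example.

```python
import re

# Full consensus: (5 aa)-C-(2 to 3 aa)-C-(4 to 12 aa)-H-(3 aa)-H-(5 aa)
C2H2_FULL = re.compile(r"^.{5}C.{2,3}C.{4,12}H.{3}H.{5}$")
# Core only: the cysteine and histidine spacing, ignoring flanking lengths.
C2H2_CORE = re.compile(r"C.{2,3}C.{4,12}H.{3}H")

def zinc_finger_insert(random_peptide):
    """Embed a random peptide in the flanks of the text's preferred example."""
    return "FQCEEC" + random_peptide + "HIRSHTG"

candidate = zinc_finger_insert("ASDFGK")  # arbitrary 6-residue placeholder insert
print(candidate, bool(C2H2_CORE.search(candidate)))
```

The preferred example has shorter flanks than the five residues of the general consensus, so only the core pattern is applied to it here. Because `.` matches any residue, a random insert that itself contains C or H can shift where the pattern matches; a stricter check would exclude those residues from the spacer positions.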
  • CCHC boxes can be used (see Biochem Biophys Res Commun 242:385 (1998)), that have a consensus sequence -C-(2 amino acids)-C-(4 to 20 random peptide)-H-(4 amino acids)-C- (see Bavoso et al., Biochem Biophys Res Comm 242(2):385 (1998), hereby incorporated by reference). Preferred examples include (1) -VKCFNC-4 to 20 random amino acids-HTARNCR-, based on the nucleocapsid protein P2; (2) a sequence modified from that of the naturally occurring zinc-binding peptide of the Lasp-1 LIM domain (Hammarstrom et al., Biochem 35:12723 (1996)); and (3) -MNPNCARCG-4 to 20 random amino acids-HKACF-, based on the nmr structural ensemble 1ZFP (
  • the presentation structure is a dimerization sequence, including self-binding peptides
  • a dimerization sequence allows the non-covalent association of two peptide sequences, which can be the same or different, with sufficient affinity to remain associated under normal physiological conditions
  • These sequences may be used in several ways
  • one terminus of the random peptide is joined to a first dimerization sequence and the other terminus is joined to a second dimerization sequence, which can be the same or different from the first sequence
  • This allows the formation of a loop upon association of the dimerizing sequences
  • the use of these sequences effectively allows small libraries of random peptides (for example, 10^4) to become large libraries if two peptides per cell are generated which then dimerize, to form an effective library of 10^8 (10^4 x 10^4)
  • the dimers may be homo- or heterodimers
  • Dimerization sequences may be a single sequence that self-aggregates, or two different sequences that associate. That is, nucleic acids encoding both a first random peptide with dimerization sequence 1, and a second random peptide with dimerization sequence 2, such that upon introduction into a cell and expression of the nucleic acid, dimerization sequence 1 associates with dimerization sequence 2 to form a new random peptide structure
  • the use of dimerization sequences allows the "circularization" of the random peptides; that is, if a dimerization sequence is used at each terminus of the peptide, the resulting structure can form a "stem-loop" type of structure
  • the use of dimerizing sequences fused to both the N- and C-terminus of the scaffold such as GFP forms a noncovalently cyclized scaffold random peptide library
  • dimerization sequences will encompass a wide variety of sequences. Any number of protein-protein interaction sites are known. Dimerization sequences may also be elucidated using standard methods such as the yeast two hybrid system, traditional biochemical affinity binding studies, or even using the present methods. See U.S.S.N. 60/080,444, filed April 2, 1998, hereby incorporated by reference in its entirety. Particularly preferred dimerization peptide sequences include, but are not limited to, -EFLIVKS-, -EEFLIVKKS-, -FESIKLV-, and -VSIKFEL-
  • the fusion partner is a targeting sequence
  • the localization of proteins within a cell is a simple method for increasing effective concentration and determining function
  • RAF1 when localized to the mitochondrial membrane can inhibit the anti-apoptotic effect of BCL-2
  • membrane bound Sos induces Ras mediated signaling in T-lymphocytes
  • suitable targeting sequences include, but are not limited to, binding sequences capable of causing binding of the expression product to a predetermined molecule or class of molecules while retaining bioactivity of the expression product (for example by using enzyme inhibitor or substrate sequences to target a class of relevant enzymes), sequences signalling selective degradation, of itself or co-bound proteins, and signal sequences capable of constitutively localizing the peptides to a predetermined cellular locale, including a) subcellular locations such as the Golgi, endoplasmic reticulum, nucleus, nucleoli, nuclear membrane, mitochondria, chloroplast, secretory vesicles, lysosome, and cellular membrane, and b) extracellular locations via a secretory signal. Particularly preferred is localization to either subcellular locations or to the outside of the cell via secretion.
  • the targeting sequence is a nuclear localization signal (NLS)
  • NLSs are generally short, positively charged (basic) domains that serve to direct the entire protein in which they occur to the cell's nucleus
  • NLS amino acid sequences have been reported, including single basic NLSs such as that of the SV40 (monkey virus) large T antigen (Pro Lys Lys Lys Arg Lys Val; Kalderon et al., Cell 39:499-509 (1984)), the human retinoic acid receptor-β nuclear localization signal (ARRRRP), NFκB p50 (EEVQRKRQKL; Ghosh et al., Cell 62:1019 (1990)), NFκB p65 (EEKRKRTYE; Nolan et al., Cell 64:961 (1991)), and others (see for example Boulikas, J Cell Biochem 55(1):32-58 (1994), hereby incorporated by reference), and double basic NLSs exemplified by that of
  • the targeting sequence is a membrane anchoring signal sequence
  • a membrane anchoring region is provided at the carboxyl terminus of the peptide presentation structure
  • the randomized expression product region is expressed on the cell surface and presented to the extracellular space, such that it can bind to other surface molecules (affecting their function) or molecules present in the extracellular medium
  • the binding of such molecules could confer function on the cells expressing a peptide that binds the molecule
  • the cytoplasmic region could be neutral or could contain a domain that, when the extracellular randomized expression product region is bound, confers a function on the cells (activation of a kinase, phosphatase
  • Membrane-anchoring sequences are well known in the art and are based on the genetic geometry of mammalian transmembrane molecules. Peptides are inserted into the membrane based on a signal sequence (designated herein as ssTM) and require a hydrophobic transmembrane domain (herein TM)
  • the transmembrane proteins are inserted into the membrane such that the regions encoded 5' of the transmembrane domain are extracellular and the sequences 3' become intracellular
  • if these transmembrane domains are placed 5' of the variable region, they will serve to anchor it as an intracellular domain, which may be desirable in some embodiments. ssTMs and TMs are known for a wide variety of membrane bound proteins, and these sequences may be used accordingly, either as pairs from a particular protein or with each component being taken from a different protein; alternatively, the sequences may be synthetic, and derived entirely from consensus as artificial delivery domains
  • membrane-anchoring sequences, including both ssTM and TM, are known for a wide variety of proteins and any of these may be used. Particularly preferred membrane-anchoring sequences include, but are not limited to, those derived from CD8, ICAM-2, IL-8R, CD4 and LFA-1
  • Useful sequences include sequences from 1) class I integral membrane proteins such as IL-2 receptor β-chain (residues 1-26 are the signal sequence, 241-265 are the transmembrane residues; see Hatakeyama et al., Science 244:551 (1989) and von Heijne et al., Eur J Biochem 174:671 (1988)) and insulin receptor β-chain (residues 1-27 are the signal, 957-959 are the transmembrane domain and 960-1382 are the cytoplasmic domain; see Hatakeyama, supra, and Ebina et al., Cell 40:747 (1985)), 2) class II integral membrane proteins such as neutral endopeptidase (residues 29-51 are the transmembrane domain, 2-28 are the cytoplasmic domain; see Malfroy et al., Biochem Biophys Res Commun 144:59 (1987)), 3) type III proteins such as human cytochrome P450 NF25 (H
  • membrane anchoring sequences include the GPI anchor, which results in a covalent bond between the molecule and the lipid bilayer via a glycosyl-phosphatidylinositol bond, for example in DAF (PNKGSGTTSGTTRLLSGHTCFTLTGLLGTLVTMGLLT, with the bolded serine the site of the anchor; see Homans et al., Nature 333(6170):269-72 (1988), and Moran et al., J Biol Chem 266:1250 (1991))
  • the GPI sequence from Thy-1 can be cassetted 3' of the variable region in place of a transmembrane sequence
  • myristylation sequences can serve as membrane anchoring sequences. It is known that the myristylation of c-src recruits it to the plasma membrane. This is a simple and effective method of membrane localization, given that the first 14 amino acids of the protein are solely responsible for this function: MGSSKSKPKDPSQR (see Cross et al., Mol Cell Biol 4(9):1834 (1984); Spencer et al., Science 262:1019-1024 (1993), both of which are hereby incorporated by reference)
  • This motif has already been shown to be effective in the localization of reporter genes and can be used to anchor the zeta chain of the TCR. This motif is placed 5' of the variable region in order to localize the construct to the plasma membrane
  • Other modifications such as palmitoylation can be used to anchor constructs in the plasma membrane, for example, palmitoylation sequences from the G protein-coupled receptor kinase GRK6 sequence
  • the targeting sequence is a lysosomal targeting sequence, including, for example, a lysosomal degradation sequence such as Lamp-2 (KFERQ; Dice, Ann N Y Acad Sci 674:58 (1992)), or lysosomal membrane sequences from Lamp-1 (MLIPIAGFFALAGLVLIVLIAYLIGRKRSHAGYQTI, Uthayakumar et al.
  • Lamp-2 (LVPIAVGAALAGVLILVLLAYFIGLKHHHAGYEQF, Konecki et al., Biochem Biophys Res Comm 205:1-5 (1994)), both of which show the transmembrane domains in italics and the cytoplasmic targeting signal underlined
  • the targeting sequence may be a mitochondrial localization sequence, including mitochondrial matrix sequences (e.g., yeast alcohol dehydrogenase III,
  • the target sequences may also be endoplasmic reticulum sequences, including the sequences from calreticulin (KDEL; Pelham, Royal Society London Transactions B, 1-10 (1992)) or adenovirus E3/19K protein (LYLSRRSFIDEKKMP; Jackson et al., EMBO J 9:3153 (1990))
  • targeting sequences also include peroxisome sequences (for example, the peroxisome matrix sequence from luciferase, SKL; Keller et al., PNAS USA 84:3264 (1987)), farnesylation sequences (for example, P21 H-ras 1, LNPPDESGPGCMSCKCVLS, with the bold cysteine farnesylated; Capon, supra), geranylgeranylation sequences (for example, protein rab-5A, LTEPTQPTRNQCCSN, with the bold cysteines geranylgeranylated; Farnsworth, PNAS USA 91:11963 (1994)), or destruction sequences (cyclin B1, RTALGDIGN; Klotzbucher et al., EMBO J 15:3053 (1996))
  • the targeting sequence is a secretory signal sequence capable of effecting the secretion of the peptide
  • secretory signal sequences which are placed 5' to the variable peptide region, and are cleaved from the peptide region to effect secretion into the extracellular space
  • Secretory signal sequences and their transferability to unrelated proteins are well known, e.g., Silhavy et al. (1985) Microbiol Rev 49:398-418
  • This is particularly useful to generate a peptide capable of binding to the surface of, or affecting the physiology of, a target cell that is other than the host cell, e g , the cell infected with the retrovirus
  • a fusion product is configured to contain, in series, secretion signal peptide-presentation structure-randomized expression product region-presentation structure; see Figure 3. In this manner, target cells grown in the vicinity of cells caused to express the library of peptides are bathed in secreted peptide
  • Suitable secretory sequences are known, including signals from IL-2 (MYRMQLLSCIALSLALVTNS; Villinger et al., J Immunol 155:3946 (1995)), growth hormone (MATGSRTSLLLAFGLLCLPWLQEGSAFPI; Roskam et al., Nucleic Acids Res 7:30 (1979)), and preproinsulin (MALWMRLLPLLALLALWGPDPAAAFVN)
  • a particularly preferred secretory signal sequence is the signal leader sequence from the secreted cytokine IL-4, which comprises the first 24 amino acids of IL-4 as follows: MGLTSQLLPPLFFLLACAGNFVHG
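As a rough illustration of the in-series arrangement described above (secretion signal, presentation structure, randomized region, presentation structure), the following Python sketch concatenates the parts at the amino acid level. The IL-4 leader is taken from the text; the function names, the two flanking presentation peptides, the 10-residue random region and the fixed random seed are assumptions made only for this example, and real libraries are built at the nucleic acid level.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
IL4_SIGNAL = "MGLTSQLLPPLFFLLACAGNFVHG"  # 24-residue IL-4 leader quoted in the text


def build_secreted_fusion(signal, flank_n, random_region, flank_c):
    """Join the parts in the order signal / flank / random region / flank."""
    return signal + flank_n + random_region + flank_c


def random_peptide(length, rng):
    """Draw a uniformly random peptide of the given length (illustrative only)."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Hypothetical flanks: placeholders, not a pairing asserted by the source.
fusion = build_secreted_fusion(IL4_SIGNAL, "EFLIVKS", random_peptide(10, rng), "VSIKFEL")
print(len(fusion), fusion.startswith(IL4_SIGNAL))
```

In a cell the signal peptide would be cleaved on secretion, so only the flanked random region would reach the medium; the sketch does not model that step.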
  • the fusion partner is a rescue sequence
  • a rescue sequence is a sequence which may be used to purify or isolate either the peptide or the nucleic acid encoding it
  • peptide rescue sequences include purification sequences such as the His6 tag for use with Ni affinity columns and epitope tags for detection, immunoprecipitation or FACS (fluorescence-activated cell sorting)
  • Suitable epitope tags include myc (for use with the commercially available 9E10 antibody), the BSP biotinylation target sequence of the bacterial enzyme BirA, flu tags, lacZ, GST, and Strep tag I and II
  • the rescue sequence may be a unique oligonucleotide sequence which serves as a probe target site to allow the quick and easy isolation of the retroviral construct, via PCR, related techniques, or hybridization
  • the fusion partner is a stability sequence to confer stability to the peptide or the nucleic acid encoding it
  • peptides may be stabilized by the incorporation of glycines after the initiation methionine (MG or MGG0), for protection of the peptide from ubiquitination as per Varshavsky's N-End Rule, thus conferring long half-life in the cytoplasm
  • two prolines at the C-terminus yield peptides that are largely resistant to carboxypeptidase action
  • the presence of two glycines prior to the prolines imparts flexibility and prevents structure-initiating events in the di-proline from being propagated into the peptide structure
  • preferred stability sequences are as follows: MG(X)nGGPP, where X is any amino acid and n is an integer of at least four
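The MG(X)nGGPP pattern above can be expressed as a simple check. The following Python sketch is an illustration only: the function names are hypothetical, and "any amino acid" is assumed to mean the 20 standard residues in one-letter code.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumption: the 20 standard residues
# MG, then at least four residues, then GGPP at the C-terminus.
STABILITY_PATTERN = re.compile(rf"MG[{AMINO_ACIDS}]{{4,}}GGPP")


def add_stability_flanks(core):
    """Wrap a core peptide, the (X)n region, as MG + core + GGPP."""
    if len(core) < 4:
        raise ValueError("core must contain at least four residues")
    if any(res not in AMINO_ACIDS for res in core):
        raise ValueError("core contains a non-standard residue code")
    return "MG" + core + "GGPP"


def has_stability_flanks(peptide):
    """Return True if the whole peptide matches MG(X)nGGPP with n >= 4."""
    return STABILITY_PATTERN.fullmatch(peptide) is not None


print(add_stability_flanks("ACDE"))
print(has_stability_flanks("MGACGGPP"))
```

The second call is a deliberately short example: after MG and before GGPP only two residues remain, so it does not satisfy n of at least four.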
  • N-cap N-cap residues
  • N-cap sequence or grammatical equivalents thereof refer to
  • the fusion partners may be placed anywhere (i.e., N-terminal, C-terminal, internal) in the structure as the biology and activity permits
  • while the discussion has been directed to the fusion of fusion partners to the peptide portion of the fusion polypeptide, it is also possible to fuse one or more of these fusion partners to the scaffold portion of the fusion polypeptide
  • the scaffold may contain a targeting sequence (either N-terminally, C-terminally, or internally, as described below) at one location, and a rescue sequence in the same place or a different place on the molecule
  • the fusion partner includes a linker or tethering sequence
  • Linker sequences between various targeting sequences may be desirable to allow the peptides to interact with potential targets unhindered
  • useful linkers include glycine polymers (G)n, glycine-serine polymers (including, for example, (GS)n, (GSGGS)n and (GGGS)n, where n is an integer of at least one), glycine-alanine polymers, alanine-serine polymers, and other flexible linkers such as the tether for the shaker potassium channel, and a large variety of other flexible linkers, as will be appreciated by those in the art. Glycine and glycine-serine polymers are preferred since both of these amino acids are relatively unstructured, and therefore may be able to serve as a neutral tether between components
  • when the peptides are inserted into internal positions in the scaffold, preferred embodiments utilize linkers, and preferably (gly)n linkers, where n is 1 or more, with n being two, three, four, five or six being preferred, although linkers of 7-10 or more amino acids are also possible. Generally in this embodiment, no amino acids with β-carbons are used in the linkers
  • the linker comprises the sequence GQGGG
  • the linker comprises the sequence GQAGGGG
  • either linker may be fused to either the N-terminus or C-terminus of a peptide or scaffold protein
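The repeat-unit linker families named above, such as (G)n, (GS)n, (GSGGS)n and (GGGS)n, are simple to generate programmatically. The Python sketch below is a minimal illustration; the function names are hypothetical, and restricting units to G, S and A is an assumption that covers only the glycine, glycine-serine, glycine-alanine and alanine-serine polymers mentioned in the text.

```python
def repeat_linker(unit, n):
    """Return the repeat unit written n times, e.g. (GSGGS)n."""
    if n < 1:
        raise ValueError("n must be an integer of at least one")
    if not unit or any(res not in "GSA" for res in unit):
        raise ValueError("unit should be built from G, S or A for these linker families")
    return unit * n


def flank_with_glycines(peptide, n):
    """Place a (gly)n linker on each side of a peptide, as for an internal insertion."""
    linker = repeat_linker("G", n)
    return linker + peptide + linker


print(repeat_linker("GSGGS", 2))
print(flank_with_glycines("ACDE", 3))
```

The peptide "ACDE" is a placeholder; in a library each member would carry a different randomized sequence between the same two linkers.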
  • the fusion partners may be modified, randomized, and/or matured to alter the presentation orientation of the randomized expression product
  • determinants at the base of the loop may be modified to slightly modify the internal loop peptide tertiary structure, while maintaining the randomized amino acid sequence
  • combinations of fusion partners are used
  • any number of combinations of presentation structures, targeting sequences, rescue sequences, and stability sequences may be used, with or without linker sequences
  • the invention further provides fusion nucleic acids encoding the fusion polypeptides of the invention
  • an extremely large number of nucleic acids may be made, all of which encode the fusion proteins of the present invention
  • the expression vectors may be either self-replicating extrachromosomal vectors or vectors which integrate into a host genome. Generally, these expression vectors include transcriptional and translational regulatory nucleic acid operably linked to the nucleic acid encoding the fusion protein
  • control sequences refers to DNA sequences necessary for the expression of an operably linked coding sequence in a particular host organism
  • the control sequences that are suitable for prokaryotes include a promoter, optionally an operator sequence, and a ribosome binding site. Eukaryotic cells are known to utilize promoters, polyadenylation signals, and enhancers
  • Nucleic acid is "operably linked" when it is placed into a functional relationship with another nucleic acid sequence
  • DNA for a presequence or secretory leader is operably linked to DNA for a polypeptide if it is expressed as a preprotein that participates in the secretion of the polypeptide
  • a promoter or enhancer is operably linked to a coding sequence if it affects the transcription of the sequence
  • a ribosome binding site is operably linked to a coding sequence if it is positioned so as to facilitate translation
  • "operably linked” means that the DNA sequences being linked are contiguous, and, in the case of a secretory leader, contiguous and in reading phase
  • enhancers do not have to be contiguous. Linking is accomplished by ligation at convenient restriction sites. If such sites do not exist, synthetic oligonucleotide adaptors or linkers are used in accordance with conventional practice. The transcriptional and translational regulatory nucleic acid will generally be appropriate
  • transcriptional and translational regulatory sequences may include, but are not limited to, promoter sequences, ribosomal binding sites, transcriptional start and stop sequences, translational start and stop sequences, and enhancer or activator sequences
  • the regulatory sequences include a promoter and transcriptional start and stop sequences
  • Promoter sequences encode either constitutive or inducible promoters
  • the promoters may be either naturally occurring promoters or hybrid promoters Hybrid promoters, which combine elements of more than one promoter, are also known in the art, and are useful in the present invention
  • the promoters are strong promoters, allowing high expression in cells, particularly mammalian cells, such as the CMV promoter, particularly in combination with a Tet regulatory element
  • the expression vector may comprise additional elements
  • the expression vector may have two replication systems, thus allowing it to be maintained in two organisms, for example in mammalian or insect cells for expression and in a prokaryotic host for cloning and amplification
  • the expression vector contains at least one sequence homologous to the host cell genome, and preferably two homologous sequences which flank the expression construct
  • the integrating vector may be directed to a specific locus in the host cell by selecting the appropriate homologous sequence for inclusion in the vector. Constructs for integrating vectors are well known in the art
  • the expression vector contains a selectable marker gene to allow the selection of transformed host cells. Selection genes are well known in the art and will vary with the host cell used
  • a preferred expression vector system is a retroviral vector system such as is generally described in PCT/US97/01019 and PCT/US97/01048, both of which are hereby expressly incorporated by reference
  • the candidate nucleic acids are introduced into the cells for screening, as is more fully outlined below
  • Exemplary methods include CaPO4 precipitation, liposome fusion, lipofectin®, electroporation, viral infection, etc.
  • the candidate nucleic acids may stably integrate into the genome of the host cell (for example, with retroviral introduction, outlined below), or may exist either transiently or stably in the cytoplasm (i.e., through the use of traditional plasmids, utilizing standard regulatory sequences, selection markers, etc.)
  • retroviral vectors capable of transfecting such targets are preferred
  • the fusion proteins of the present invention are produced by culturing a host cell transformed with an expression vector containing nucleic acid encoding a fusion protein, under the appropriate conditions to induce or cause expression of the fusion protein
  • the conditions appropriate for fusion protein expression will vary with the choice of the expression vector and the host cell, and will be easily ascertained by one skilled in the art through routine experimentation
  • the use of constitutive promoters in the expression vector will require optimizing the growth and proliferation of the host cell, while the use of an inducible promoter requires the appropriate growth conditions for induction
  • the timing of the harvest is important
  • the baculoviral systems used in insect cell expression are lytic viruses, and thus harvest time selection can be crucial for product yield
  • Appropriate host cells include yeast, bacteria, archaebacteria, fungi, and insect and animal cells, including mammalian cells. Of particular interest are Drosophila melanogaster cells, Saccharomyces cerevisiae and other yeasts, E. coli, Bacillus subtilis, SF9 cells, C129 cells, 293 cells, Neurospora, BHK, CHO, COS, and HeLa cells, fibroblasts, Schwannoma cell lines, immortalized mammalian myeloid and lymphoid cell lines, Jurkat cells, mast cells and other endocrine and exocrine cells, and neuronal cells
  • the fusion proteins are expressed in mammalian cells
  • Mammalian expression systems are also known in the art, and include retroviral systems
  • a mammalian promoter is any DNA sequence capable of binding mammalian RNA polymerase and initiating the downstream (3') transcription of a coding sequence for the fusion protein into mRNA
  • a promoter will have a transcription initiating region, which is usually placed proximal to the 5' end of the coding sequence, and a TATA box, usually located 25-30 base pairs upstream of the transcription initiation site. The TATA box is thought to direct RNA polymerase II to begin RNA synthesis at the correct site
  • a mammalian promoter will also contain an upstream promoter element (enhancer element), typically located within 100 to 200 base pairs upstream of the TATA box. An upstream promoter element determines the rate at which transcription is initiated and can act in either orientation
  • mammalian cells used in the present invention can vary widely. Basically, any mammalian cells may be used, with mouse, rat, primate and human cells being particularly preferred, although as will be appreciated by those in the art, modification of the system by pseudotyping allows all eukaryotic cells to be used, preferably higher eukaryotes
  • a screen will be set up such that the cells exhibit a selectable phenotype in the presence of a bioactive peptide
  • cell types implicated in a wide variety of disease conditions are particularly useful, so long as a suitable screen may be designed to allow the selection of cells that exhibit an altered phenotype as a consequence of the presence of a peptide within the cell
  • suitable cell types include, but are not limited to, tumor cells of all types
  • other suitable cell types include mononuclear leukocytes, stem cells such as haemopoietic, neural, skin, lung, kidney, liver and myocyte stem cells (for use in screening for differentiation and de-differentiation factors), osteoclasts, chondrocytes and other connective tissue cells, keratinocytes, melanocytes, liver cells, kidney cells, and adipocytes
  • Suitable cells also include known research cells, including, but not limited to, Jurkat T cells, NIH3T3 cells, CHO, Cos, etc. See the ATCC cell line catalog, hereby expressly incorporated by reference
  • the cells may be additionally genetically engineered, that is, contain exogenous nucleic acid other than the fusion nucleic acid
  • the fusion proteins are expressed in bacterial systems. Bacterial expression systems are well known in the art
  • a suitable bacterial promoter is any nucleic acid sequence capable of binding bacterial RNA polymerase and initiating the downstream (3') transcription of the coding sequence of the fusion protein into mRNA
  • a bacterial promoter has a transcription initiation region which is usually placed proximal to the 5' end of the coding sequence. This transcription initiation region typically includes an RNA polymerase binding site and a transcription initiation site
  • Sequences encoding metabolic pathway enzymes provide particularly useful promoter sequences. Examples include promoter sequences derived from sugar metabolizing enzymes, such as galactose, lactose and maltose, and sequences derived from biosynthetic enzymes such as tryptophan. Promoters from bacteriophage may also be used and are known in the art
  • synthetic promoters and hybrid promoters are also useful, for example, the tac promoter is a hybrid of the trp and lac promoter sequences
  • a bacterial promoter can include naturally occurring promoters of non-bacterial origin that have the ability
  • the ribosome binding site is called the Shine-Dalgarno (SD) sequence and includes an initiation codon and a sequence 3-9 nucleotides in length located 3-11 nucleotides upstream of the initiation codon
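The spacing rule for the ribosome binding site can be turned into a small search. The Python sketch below is a hedged illustration rather than a validated predictor: it assumes the textbook consensus AGGAGG (not quoted in the source), accepts any fragment of that consensus at least three bases long, reads "3-11 nucleotides upstream" as the number of bases between the end of the motif and the ATG, and uses a hypothetical function name.

```python
import re

SD_CONSENSUS = "AGGAGG"  # textbook consensus; an assumption, not quoted in the source


def find_sd_candidates(seq, min_len=3, min_gap=3, max_gap=11):
    """Return (sd_start, sd_motif, atg_index) for each ATG with an upstream SD-like motif."""
    seq = seq.upper().replace("U", "T")
    # Every consensus fragment of at least min_len bases, longest first, ties alphabetical.
    motifs = sorted(
        {SD_CONSENSUS[i:j]
         for i in range(len(SD_CONSENSUS))
         for j in range(i + min_len, len(SD_CONSENSUS) + 1)},
        key=lambda m: (-len(m), m),
    )
    hits = []
    for match in re.finditer("ATG", seq):
        atg = match.start()
        best = None
        for motif in motifs:
            for k in range(max(0, atg - max_gap - len(motif)), atg):
                gap = atg - (k + len(motif))  # bases between motif end and the ATG
                if min_gap <= gap <= max_gap and seq.startswith(motif, k):
                    best = (k, motif, atg)
                    break
            if best:
                break
        if best:
            hits.append(best)
    return hits


print(find_sd_candidates("TTAGGAGGTAACATATGAAA"))
```

Only the longest matching fragment is reported per start codon, and every ATG is treated as a candidate start, which a real analysis would need to refine.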
  • the expression vector may also include a signal peptide sequence that provides for secretion of the fusion protein in bacteria
  • the signal sequence typically encodes a signal peptide comprised of hydrophobic amino acids which direct the secretion of the protein from the cell, as is well known in the art
  • the protein is either secreted into the growth media (gram-positive bacteria) or into the periplasmic space, located between the inner and outer membrane of the cell (gram-negative bacteria)
  • the bacterial expression vector may also include a selectable marker gene to allow for the selection of bacterial strains that have been transformed. Suitable selection genes include genes which render the bacteria resistant to drugs such as ampicillin, chloramphenicol, erythromycin, kanamycin, neomycin and tetracycline. Selectable markers also include biosynthetic genes, such as those in the histidine, tryptophan and leucine biosynthetic pathways. These components are assembled into expression vectors. Expression vectors for bacteria are well known in the art, and include vectors for Bacillus subtilis, E. coli, Streptococcus cremoris, and Streptomyces lividans, among others
  • the bacterial expression vectors are transformed into bacterial host cells using techniques well known in the art, such as calcium chloride treatment, electroporation, and others
  • fusion proteins are produced in insect cells
  • Expression vectors for the transformation of insect cells, and in particular, baculovirus-based expression vectors, are well known in the art
  • fusion protein is produced in yeast cells
  • yeast expression systems are well known in the art, and include expression vectors for Saccharomyces cerevisiae, Candida albicans and C. maltosa, Hansenula polymorpha, Kluyveromyces fragilis and K. lactis, Pichia guilliermondii and P. pastoris, Schizosaccharomyces pombe, and Yarrowia lipolytica
  • Preferred promoter sequences for expression in yeast include the inducible GAL1,10 promoter, the promoters from alcohol dehydrogenase, enolase, glucokinase, glucose-6-phosphate isomerase, glyceraldehyde-3-phosphate-dehydrogenase, hexokinase, phosphofructokinase, 3-phosphoglycerate mutase, pyruvate kinase, and the acid phosphatase gene. Yeast selectable markers include ADE2, H
  • fusion polypeptides of the invention may be further fused to other proteins, if desired, for example to increase expression
  • the fusion nucleic acids, proteins and antibodies of the invention are labeled with a label other than the scaffold
  • label herein is meant that a compound has at least one element, isotope or chemical compound attached to enable the detection of the compound
  • labels fall into three classes: a) isotopic labels, which may be radioactive or heavy isotopes; b) immune labels, which may be antibodies or antigens; and c) colored or fluorescent dyes
  • the labels may be incorporated into the compound at any position
  • the fusion nucleic acids are introduced into the cells to screen for peptides capable of altering the phenotype of a cell
  • a first plurality of cells is screened. That is, the cells into which the fusion nucleic acids are introduced are screened for an altered phenotype
  • the effect of the bioactive peptide is seen in the same cells in which it is made, i.e., an autocrine effect
  • a “plurality of cells” herein is meant roughly from about 10^3 cells to 10^8 or 10^9, with from 10^6 to 10^8 being preferred
  • This plurality of cells comprises a cellular library, wherein generally each cell within the library contains a member of the peptide molecular library, i.e., a different peptide (or nucleic acid encoding the peptide), although as will be appreciated by those in the art, some cells within the library may not contain a peptide, and some may contain more than one species of peptide. When methods other than retroviral infection are used to introduce the candidate nucleic acids into a plurality of cells, the distribution of candidate nucleic acids within the individual cell members of the cellular library may vary widely, as it is generally difficult to control the number of nucleic acids which enter a cell during electroporation, etc.
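The remark that some cells may receive no library member and others more than one can be made semi-quantitative with a simple model. The sketch below assumes, as is commonly done for viral infection but is not stated in the source, that the number of library members per cell follows a Poisson distribution with a chosen mean; the function name and the example means are illustrative.

```python
import math


def poisson_fractions(mean_per_cell):
    """Fractions of cells with zero, exactly one, or several library members (Poisson model)."""
    if mean_per_cell < 0:
        raise ValueError("mean_per_cell must be non-negative")
    none = math.exp(-mean_per_cell)
    one = mean_per_cell * none
    return {"none": none, "one": one, "multiple": 1.0 - none - one}


for mean in (0.1, 0.5, 1.0):
    fractions = poisson_fractions(mean)
    print(mean, {name: round(value, 3) for name, value in fractions.items()})
```

Under this assumption a low mean keeps multiply loaded cells rare at the cost of many empty cells, which is one way to reason about the trade-off when sizing a cellular library.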
  • the fusion nucleic acids are introduced into a first plurality of cells, and the effect of the peptide is screened in a second or third plurality of cells, different from the first plurality of cells, i.e., generally a different cell type. That is, the effect of the bioactive peptide is due to an extracellular effect on a second cell, i.e., an endocrine or paracrine effect. This is done using standard techniques
  • the first plurality of cells may be grown in or on one media, and the media is allowed to touch a second plurality of cells, and the effect measured. Alternatively, there may be direct contact between the cells. Thus, "contacting" is functional contact, and includes both direct and indirect
  • the first plurality of cells may or may not be screened
  • the cells are treated to conditions suitable for the expression of the peptide (for example, when inducible promoters are used)
  • the methods of the present invention comprise introducing a molecular library of fusion nucleic acids encoding randomized peptides fused to scaffold into a plurality of cells, a cellular library. Each of the nucleic acids comprises a different nucleotide sequence encoding scaffold with a random peptide
  • the plurality of cells is then screened, as is more fully outlined below, for a cell exhibiting an altered phenotype
  • the altered phenotype is due to the presence of a bioactive peptide
  • altered phenotype or “changed physiology” or other grammatical equivalents herein is meant that the phenotype of the cell is altered in some way, preferably in some detectable and/or measurable way
  • a strength of the present invention is the wide variety of cell types and potential phenotypic changes which may be tested using the present methods. Accordingly, any phenotypic change which may be observed, detected, or measured may be the basis of the screening methods herein. Suitable phenotypic changes include, but are not limited to: gross physical changes such as changes in cell morphology, cell growth, cell viability, adhesion to substrates or other cells, and cellular density; changes in the expression of one or more RNAs, proteins, lipids, hormones, cytokines, or other molecules; changes in the equilibrium state (i.e., half-life) of one or more RNAs, proteins, lipids, hormones, cytokines, or other molecules; changes in the localization of one or more RNAs, proteins
  • the altered phenotype may be detected in a wide variety of ways, as is described more fully below, and will generally depend on and correspond to the phenotype that is being changed. Generally, the changed phenotype is detected using, for example: microscopic analysis of cell morphology; standard cell viability assays, including both increased cell death and increased cell viability, for example, cells that are now resistant to cell death via virus, bacteria, or bacterial or synthetic toxins; standard labeling assays such as fluorometric indicator assays for the presence or level of a particular cell or molecule, including FACS or other dye staining techniques; biochemical detection of the expression of target compounds after killing the cells; etc.
  • the altered phenotype is detected in the cell in which the fusion nucleic acid was introduced; in other embodiments, the altered phenotype is detected in a second cell which is responding to some molecular signal from the first cell
  • an altered phenotype of a cell indicates the presence of a bioactive peptide, acting preferably in a transdominant way
  • transdominant herein is meant that the bioactive peptide indirectly causes the altered phenotype by acting on a second molecule, which leads to an altered phenotype
  • a transdominant expression product has an effect that is not in cis, i e , a trans event as defined in genetic terms or biochemical terms
  • a transdominant effect is a distinguishable effect by a molecular entity (i e , the encoded peptide or RNA) upon some separate and distinguishable target, that is, not an effect upon the encoded entity itself
  • transdominant effects include many well-known effects by pharmacologic agents upon target molecules or pathways in cells or physiologic systems; for instance, the β-lactam antibiotics have a transdominant effect upon peptidoglycan synthesis in bacterial cells by binding to penicillin binding proteins and disrupting their functions
  • a transdominant effect upon a protein or molecular pathway is clearly distinguishable from randomization, change, or mutation of a sequence within a protein or molecule of known or unknown function to enhance or diminish a biochemical ability that protein or molecule already manifests
  • a protein that enzymatically cleaves β-lactam antibiotics, a β-lactamase, could be enhanced or diminished in its activity by mutating sequences internal to its structure that enhance or diminish the ability of this enzyme to act upon and cleave β-lactam antibiotics. This would be called a cis mutation to the protein
  • the effect of this protein upon β-lactam antibiotics is an activity the protein already manifests, to a distinguishable degree
  • a mutation in the leader sequence that enhanced the export of this protein to the extracellular spaces wherein it might encounter β-lactam molecules more readily, or a mutation within the sequence that enhances the stability of the protein, would be termed cis mutations in the protein
  • the presence of the fusion protein is verified, to ensure that the peptide was expressed and thus that the altered phenotype can be due to the presence of the peptide
  • this verification of the presence of the peptide can be done either before, during or after the screening for an altered phenotype. This can be done in a variety of ways, although preferred methods utilize FACS techniques.
  • the cell with the altered phenotype is generally isolated from the plurality which do not have altered phenotypes. This may be done in any number of ways, as is known in the art, and will in some instances depend on the assay or screen. Suitable isolation techniques include, but are not limited to, FACS, lysis selection using complement, cell cloning, scanning by Fluorimager, expression of a "survival" protein, induced expression of a cell surface protein or other molecule that can be rendered fluorescent or taggable for physical isolation, expression of
  • the fusion nucleic acid and/or the bioactive peptide is isolated from the positive cell. This may be done in a number of ways.
  • primers complementary to DNA regions common to the retroviral constructs, or to specific components of the library such as a rescue sequence, defined above, are used to "rescue" the unique random sequence
  • the fusion protein is isolated using a rescue sequence
  • rescue sequences comprising epitope tags or purification sequences may be used to pull out the fusion protein using immunoprecipitation or affinity columns. In some instances, as is outlined below, this may also pull out the primary target molecule, if there is a sufficiently strong binding interaction between the bioactive peptide and the target molecule.
  • the peptide may be detected using mass spectroscopy
  • the sequence of the bioactive peptide and/or fusion nucleic acid is determined. This information can then be used in a number of ways.
  • the bioactive peptide is resynthesized and reintroduced into the target cells, to verify the effect
  • This may be done using retroviruses, or alternatively using fusions to the HIV-1 Tat protein, and analogs and related proteins, which allows very high uptake into target cells. See, for example, Fawell et al., PNAS USA 91:664 (1994); Frankel et al., Cell 55:1189 (1988); Savion et al., J. Biol. Chem. 256:1149 (1981); Derossi et al., J. Biol. Chem. 269:10444 (1994); and Baldin et al., EMBO J. 9:1511 (1990), all of which are incorporated by reference.
  • the sequence of a bioactive peptide is used to generate more candidate peptides
  • the sequence of the bioactive peptide may be the basis of a second round of (biased) randomization, to develop bioactive peptides with increased or altered activities
  • the second round of randomization may change the affinity of the bioactive peptide
  • either the bioactive peptide or the bioactive nucleic acid encoding it is used to identify target molecules, i.e., the molecules with which the bioactive peptide interacts
  • the bioactive peptide is used to pull out target molecules
  • the target molecules are proteins
  • the use of epitope tags or purification sequences can allow the purification of primary target molecules via biochemical means (co-immunoprecipitation, affinity columns, etc.)
  • the peptide when expressed in bacteria and purified, can be used as a probe against a bacterial cDNA expression library made from mRNA of the target cell type
  • peptides can be used as "bait" in either yeast or mammalian two- or three-hybrid systems
  • Such interaction cloning approaches have been very useful to isolate DNA-binding proteins and other interacting protein components
  • the peptide(s) can be combined with other pharmacologic activators to study the epistatic relationships of signal transduction pathways in question. It is also possible to synthetically prepare labeled peptide and use it to screen a cDNA library expressed in bacteriophage for those cDNAs which bind the
  • secondary target molecules may be identified in the same manner, using the primary target as the "bait". In this manner, signalling pathways may be elucidated. Similarly, bioactive peptides specific for secondary target molecules may also be discovered, to allow a number of bioactive peptides to act on a single pathway, for example for combination therapies.
  • the screening methods of the present invention may be useful to screen a large number of cell types under a wide variety of conditions. Generally, the host cells are cells that are involved in disease states, and they are tested or screened under conditions that normally result in undesirable consequences on the cells. When a suitable bioactive peptide is found, the undesirable effect may be reduced or eliminated. Alternatively, normally desirable consequences may be reduced or eliminated, with an eye towards elucidating the cellular mechanisms associated with the disease state or signalling pathway.
  • the present methods are useful in cancer applications
  • the ability to rapidly and specifically kill tumor cells is a cornerstone of cancer chemotherapy
  • random libraries can be introduced into any tumor cell (primary or cultured), and peptides identified which by themselves induce apoptosis, cell death, loss of cell division or decreased cell growth. This may be done de novo, or by biased randomization toward known peptide agents, such as angiostatin, which inhibits blood vessel wall growth.
  • the methods of the present invention can be combined with other cancer therapeutics (e.g. drugs or radiation) to sensitize the cells and thus induce rapid and specific apoptosis, cell death, loss of cell division or decreased cell growth after exposure to a secondary agent
  • the present methods may be used in conjunction with known cancer therapeutics to screen for agonists to make the therapeutic more effective or less toxic. This is particularly preferred when the chemotherapeutic is very expensive to produce, such as taxol.
  • non-transformed cells can be transfected with these oncogenes, and then random libraries introduced into these cells, to select for bioactive peptides which reverse or correct the transformed state
  • One of the signal features of oncogene transformation of cells is the loss of contact inhibition and the ability to grow in soft-agar
  • when transforming viruses are constructed containing v-Abl, v-Src, or v-Ras in IRES-puro retroviral vectors, infected into target 3T3 cells, and subjected to puromycin selection, all of the 3T3 cells hyper-transform and detach from the plate
  • the cells may be removed by washing with fresh medium. This can serve as the basis of a screen, since cells which express a bioactive
  • the growth and/or spread of certain tumor types is enhanced by stimulatory responses from growth factors and cytokines (PDGF, EGF, Heregulin, and others) which bind to receptors on the surfaces of specific tumors
  • the methods of the invention are used to inhibit or stop tumor growth and/or spread, by finding bioactive peptides capable of blocking the ability of the growth factor or cytokine to stimulate the tumor cell
  • tumor cells known to have a high metastatic potential, for example, melanoma, lung cell carcinoma, breast and ovarian carcinoma
  • Particular applications for inhibition of the metastatic phenotype, which could allow a more specific inhibition of metastasis, include the metastasis suppressor gene NM23, which codes for a dinucleoside diphosphate kinase
  • intracellular peptide activators of this gene could block metastasis, and a screen for its upregulation (by fusing it to a reporter gene) would be of interest
  • Many oncogenes also enhance metastasis. Peptides which inactivate or counteract mutated RAS oncogene
  • the random libraries of the present invention are introduced into tumor cells known to have inactivated tumor suppressor genes, and successful reversal by either reactivation or compensation of the knockout would be screened by restoration of the normal phenotype
  • a major example is the reversal of p53-inactivating mutations, which are present in 50% or more of all cancers. Since p53's actions are complex and involve its action as a transcription factor, there are probably numerous potential ways a peptide or small molecule derived from a peptide could reverse the mutation.
  • One example would be upregulation of the immediately downstream cyclin-dependent kinase inhibitor p21CIP1/WAF1. To be useful, such reversal would have to work for many of the different known p53 mutations. This is currently being approached by gene therapy; one or more small molecules which do this might be preferable.
  • Another example involves screening of bioactive peptides which restore the constitutive function of the brca-1 or brca-2 genes, and other tumor suppressor genes important in breast cancer such as the adenomatous polyposis coli gene (APC) and the Drosophila discs-large gene (Dlg), which are components of cell-cell junctions. Mutations of brca-1 are important in hereditary ovarian and breast cancers, and constitute an additional application of the present invention.
  • the methods of the present invention are used to create novel cell lines from cancers from patients
  • a retrovirally delivered short peptide which inhibits the final common pathway of programmed cell death should allow for short- and possibly long-term cell lines to be established
  • Conditions of in vitro culture and infection of human leukemia cells will be established
  • Some human cell lines have been established by the use of transforming agents such as
  • cardiomyocytes may be screened for the prevention of cell damage or death in the presence of normally injurious conditions, including, but not limited to, the presence of toxic drugs (particularly chemotherapeutic drugs), for example, to prevent heart failure following treatment with adriamycin; anoxia, for example in the setting of coronary artery occlusion; and autoimmune cellular damage by attack from activated lymphoid cells (for example as seen in post viral myocarditis and lupus)
  • Candidate bioactive peptides are inserted into cardiomyocytes, the cells are subjected to the insult, and bioactive peptides are selected that prevent any or all of apoptosis, membrane depolarization (i.e. decrease
  • the present methods are used to screen for enhanced contractile properties of cardiomyocytes and diminish heart failure potential
  • the introduction of the libraries of the invention followed by measuring the rate of change of myosin polymerization/depolymerization using fluorescent techniques can be done. Bioactive peptides which increase the rate of change of this phenomenon can result in a greater contractile response of the entire myocardium, similar to the effect seen with digitalis.
  • the present methods are useful to identify agents that will regulate the intracellular and sarcolemmal calcium cycling in cardiomyocytes in order to prevent arrhythmias
  • Bioactive peptides are selected that regulate sodium-calcium exchange, sodium proton pump function, and regulation of calcium-ATPase activity
  • the present methods are useful to identify agents that diminish embolic phenomena in arteries and arterioles leading to strokes (and other occlusive events leading to kidney failure and limb ischemia) and angina precipitating a myocardial infarct
  • bioactive peptides are selected which will diminish the adhesion of platelets and leukocytes, and thus diminish the occlusion events. Adhesion in this setting can be inhibited by the libraries of the invention being inserted into endothelial cells (quiescent cells, or activated by cytokines, i.e. IL-1, and growth factors, i.e. PDGF / EGF) and then screening for peptides that either 1) downregulate adhesion molecule expression on the surface of the endothelial cells (binding assay), 2) block adhesion molecule activation on the surface of these cells (signaling assay), or 3) release in an autocrine manner peptides that block receptor binding.
  • Embolic phenomena can also be addressed by activating proteolytic enzymes on the cell surfaces of endothelial cells, and thus releasing active enzyme which can digest blood clots
  • delivery of the libraries of the invention to endothelial cells is done, followed by standard fluorogenic assays, which will allow monitoring of proteolytic activity on the cell surface towards a known substrate. Bioactive peptides can then be selected which activate specific enzymes towards specific substrates.
  • arterial inflammation in the setting of vasculitis and post-infarction can be regulated by decreasing the chemotactic responses of leukocytes and mononuclear leukocytes. This can be accomplished by blocking chemotactic receptors and their responding pathways on these cells.
  • Candidate bioactive libraries can be inserted into these cells, and the chemotactic response to diverse chemokines (for example, to the IL-8 family of chemokines, RANTES) inhibited in cell migration assays
  • arterial restenosis following coronary angioplasty can be controlled by regulating the proliferation of vascular intimal cells and capillary and/or arterial endothelial cells
  • Candidate bioactive peptide libraries can be inserted into these cell types and their proliferation in response to specific stimuli monitored
  • One application may be intracellular peptides which block the expression or function of c-myc and other oncogenes in smooth muscle cells to stop their proliferation
  • a second application may involve the expression of libraries in vascular smooth muscle cells to selectively induce their apoptosis
  • Application of small molecules derived from these peptides may require targeted drug delivery; this is available with stents, hydrogel coatings, and infusion-based catheter systems
  • Peptides which downregulate endothelin-1A receptors or which block the release of the potent vasoconstrictor and vascular smooth muscle cell mitogen endothelin-1 may also be candidates for therapeutics. Peptides can be isolated from these libraries which inhibit growth of these cells, or which prevent
  • the present methods are useful in screening for decreases in atherosclerosis producing mechanisms to find peptides that regulate LDL and HDL metabolism
  • Candidate libraries can be inserted into the appropriate cells (including hepatocytes, mononuclear leukocytes, endothelial cells) and peptides selected which lead to a decreased release of LDL or diminished synthesis of LDL, or conversely to an increased release of HDL or enhanced synthesis of HDL
  • Bioactive peptides can also be isolated from candidate libraries which decrease the production of oxidized LDL, which has been implicated in atherosclerosis and isolated from atherosclerotic lesions. This could occur by decreasing its expression, activating reducing systems or enzymes, or blocking the activity or production of enzymes implicated in production of oxidized LDL, such as 15-lipoxygenase in macrophages.
  • the present methods are used in screens to regulate obesity via the control of food intake mechanisms or diminishing the responses of receptor signaling pathways that regulate metabolism
  • Bioactive peptides that regulate or inhibit the responses of neuropeptide Y (NPY), cholecystokinin and galanin receptors are particularly desirable
  • Candidate libraries can be inserted into cells that have these receptors cloned into them, and inhibitory peptides selected that are secreted in an autocrine manner that block the signaling responses to galanin and NPY
  • peptides can be found that regulate the leptin receptor
  • Candidate libraries may be used for screening for anti-apoptotics for preservation of neuronal function and prevention of neuronal death
  • Initial screens would be done in cell culture
  • One application would include prevention of neuronal death, by apoptosis, in cerebral ischemia resulting from stroke
  • Apoptosis is known to be blocked by neuronal apoptosis inhibitory protein (NAIP); screens for its upregulation, or effecting any coupled step, could yield peptides which selectively block neuronal apoptosis
  • Other applications include neurodegenerative diseases such as Alzheimer's disease and Huntington's disease
  • the present methods are useful in bone biology applications
  • Osteoclasts are known to play a key role in bone remodeling by breaking down "old" bone, so that osteoblasts can lay down "new" bone
  • Osteoclast overactivity can be regulated by inserting candidate libraries into these cells, and then looking for bioactive peptides that produce 1) a diminished processing of collagen by these cells, 2) decreased pit formation on bone chips, and 3) decreased release of calcium from bone fragments
  • the present methods may also be used to screen for agonists of bone morphogenic proteins, hormone mimetics to stimulate, regulate, or enhance new bone formation (in a manner similar to parathyroid hormone and calcitonin, for example). These have use in osteoporosis, for poorly healing fractures, and to accelerate the rate of healing of new fractures.
  • cell lines of connective tissue origin can be treated with candidate libraries and screened for their growth, proliferation, collagen stimulating activity, and/or proline incorporating ability on the target osteoblasts
  • candidate libraries can be expressed directly in osteoblasts or chondrocytes and screened for increased production of collagen or bone
  • the present methods are useful in skin biology applications. Keratinocyte responses to a variety of stimuli may result in psoriasis, a proliferative change in these cells.
  • Candidate libraries can be inserted into cells removed from active psoriatic plaques, and bioactive peptides isolated which decrease the rate of growth of these cells
  • the present methods are useful in the regulation or inhibition of keloid formation (i.e. excessive scarring)
  • wound healing for diabetic ulcers and other chronic "failure to heal" conditions in the skin and extremities can be regulated by providing additional growth signals to cells which populate the skin and dermal layers. Growth factor mimetics may in fact be very useful for this condition.
  • Candidate libraries can be inserted into skin connective tissue cells, and bioactive peptides isolated which promote the growth of these cells under "harsh" conditions, such as low oxygen tension, low pH, and the presence of inflammatory mediators
  • Cosmeceutical applications of the present invention include the control of melanin production in skin melanocytes
  • a naturally occurring peptide, arbutin, is an inhibitor of tyrosine hydroxylase, a key enzyme in the synthesis of melanin
  • Candidate libraries can be inserted into melanocytes and known stimuli that increase the synthesis of melanin applied to the cells
  • Bioactive peptides can be isolated that inhibit the synthesis of melanin under these conditions
  • the present methods are useful in endocrinology applications
  • the retroviral peptide library technology can be applied broadly to any endocrine, growth factor, cytokine or chemokine network which involves a signaling peptide or protein that acts in either an endocrine, paracrine or autocrine manner that binds or dimerizes a receptor and activates a signaling cascade that results in a known phenotypic or functional outcome
  • the methods are applied so as to isolate a peptide which either mimics the desired hormone (i.e., insulin, leptin, calcitonin, PDGF, EGF, EPO, GMCSF, IL1-17, mimetics) or inhibits its action by either blocking the release of the hormone, blocking its binding to a specific receptor or carrier protein (for example, CRF binding protein), or inhibiting the intracellular responses of the specific target cells to that hormone. Selection of peptides which increase the expression or release of hormones from the cells which normally produce them could have
  • the present methods are useful in infectious disease applications
  • Viral latency (herpes viruses such as CMV, EBV, HBV, and other viruses such as HIV) and their reactivation are a significant problem, particularly in immunosuppressed patients (patients with AIDS and transplant patients)
  • the ability to block the reactivation and spread of these viruses is an important goal
  • Cell lines known to harbor or be susceptible to latent viral infection can be infected with the specific virus, and then stimuli applied to these cells which have been shown to lead to reactivation and viral replication. This can be followed by measuring viral titers in the medium and scoring cells for phenotypic changes.
  • Candidate libraries can then be inserted into these cells under the above conditions, and peptides isolated which block or diminish the growth and/or release of the virus
  • these experiments can also be done with drugs which are only partially effective towards this outcome, and bioactive peptides isolated which enhance the virucidal effect of these drugs
  • CCR-5 is the required co-receptor, and there is strong evidence that a block on CCR-5 will result in resistance to HIV-1 infection.
  • the natural ligands for CCR-5, the CC chemokines RANTES, MIP1a and MIP1b, are responsible for CD8+ mediated resistance to HIV
  • individuals homozygous for a mutant allele of CCR-5 are completely resistant to HIV infection
  • an inhibitor of the CCR-5/HIV interaction would be of enormous interest to both biologists and clinicians
  • the extracellular anchored constructs offer superb tools for such a discovery. Into the transmembrane, epitope tagged, glycine-serine tethered constructs (ssTM V G20 E TM), one can place a random,
  • the present invention finds use with infectious organisms
  • Intracellular organisms such as mycobacteria, listeria, salmonella, pneumocystis, yersinia, leishmania, and T. cruzi can persist and replicate within cells, and become active in immunosuppressed patients
  • Candidate libraries can be inserted into specific cells infected with these organisms (pre- or post-infection), and bioactive peptides selected which promote the intracellular destruction of these organisms in a manner analogous to intracellular "antibiotic peptides" similar to magainins
  • peptides can be selected which enhance the cidal properties of drugs already under investigation which have insufficient potency by themselves, but when combined with a specific peptide from a candidate library, are dramatically more potent through a synergistic mechanism
  • bioactive peptides can be isolated which alter the metabolism of these intracellular organisms, in such a way as to terminate
  • Antibiotic drugs that are widely used have certain dose dependent, tissue specific toxicities. For example, renal toxicity is seen with the use of gentamicin, tobramycin, and amphotericin; hepatotoxicity is seen with the use of INH and rifampin; bone marrow toxicity is seen with chloramphenicol; and platelet toxicity is seen with ticarcillin, etc. These toxicities limit their use.
  • Candidate libraries can be introduced into the specific cell types where specific changes leading to cellular damage or apoptosis by the antibiotics are produced, and bioactive peptides can be isolated that confer protection, when these cells are treated with these specific antibiotics
  • the present invention finds use in screening for bioactive peptides that block antibiotic transport mechanisms
  • the rapid secretion from the blood stream of certain antibiotics limits their usefulness
  • penicillins are rapidly secreted by certain transport mechanisms in the kidney and choroid plexus in the brain
  • Probenecid is known to block this transport and increase serum and tissue levels
  • Candidate agents can be inserted into specific cells derived from kidney cells and cells of the choroid plexus known to have active transport mechanisms for antibiotics
  • Bioactive peptides can then be isolated which block the active transport of specific antibiotics and thus extend the serum half-life of these drugs
  • the present methods are useful in drug toxicities and drug resistance applications
  • Drug toxicity is a significant clinical problem. This may manifest itself as specific tissue or cell damage with the result that the drug's effectiveness is limited. Examples include myeloablation in high dose cancer chemotherapy, damage to epithelial cells lining the airway and gut, and hair loss. Specific examples include adriamycin induced cardiomyocyte death, cisplatin-induced kidney
  • Drug toxicity may be due to a specific metabolite produced in the liver or kidney which is highly toxic to specific cells, or due to drug interactions in the liver which block or enhance the metabolism of an administered drug
  • Candidate libraries can be introduced into liver or kidney cells following the exposure of these cells to the drug known to produce the toxic metabolite
  • Bioactive peptides can be isolated which alter how the liver or kidney cells metabolize the drug, and specific agents identified which prevent the generation of a specific toxic metabolite. The generation of the metabolite can be followed by mass spectrometry, and phenotypic changes can be assessed by microscopy.
  • Such a screen can also be done in cultured hepatocytes, cocultured with readout cells which are specifically sensitive to the toxic metabolite. Applications include reversible (to limit toxicity) inhibitors of enzymes involved in drug metabolism.
  • Candidate libraries can be introduced into tumor cell lines (primary and cultured) that have demonstrated specific or multiple drug resistance. Bioactive peptides can then be identified which confer drug sensitivity when the cells are exposed to the drug of interest, or to drugs used in combination chemotherapy. The readout can be the onset of apoptosis in these cells, membrane permeability changes, or the release of intracellular ions and fluorescent markers.
  • the cells in which multidrug resistance involves membrane transporters can be preloaded with fluorescent transporter substrates, and selection carried out for peptides which block the normal efflux of fluorescent drug from these cells
  • Candidate libraries are particularly suited to screening for peptides which reverse poorly characterized or recently discovered intracellular mechanisms of resistance or mechanisms for which few or no chemosensitizers currently exist, such as mechanisms involving LRP (lung resistance protein). This protein has been implicated in multidrug resistance in ovarian carcinoma, metastatic malignant melanoma, and acute mye
  • the present methods are useful in improving the performance of existing or developmental drugs
  • First pass metabolism of orally administered drugs limits their oral bioavailability, and can result in diminished efficacy as well as the need to administer more drug for a desired effect
  • Reversible inhibitors of enzymes involved in first pass metabolism may thus be a useful adjunct enhancing the efficacy of these drugs
  • First pass metabolism occurs in the liver, thus inhibitors of the corresponding catabolic enzymes may enhance the effect of the cognate drugs
  • Reversible inhibitors would be delivered at the same time as, or slightly before, the drug of interest
  • Screening of candidate libraries in hepatocytes for inhibitors (by any mechanism, such as protein downregulation as well as a direct inhibition of activity) of particularly problematical isozymes would be of interest
  • Other applications could include reversible inhibitors of UDP-
  • the present methods are useful in immunobiology, inflammation, and allergic response applications
  • Selective regulation of T lymphocyte responses is a desired goal in order to modulate immune-mediated diseases in a specific manner
  • Candidate libraries can be introduced into specific T cell subsets (TH1, TH2, CD4+, CD8+, and others) and the responses which characterize those subsets (cytokine generation, cytotoxicity, proliferation in response to antigen being presented by a mononuclear leukocyte, and others) modified by members of the library. Agents can be selected which increase or diminish the known T cell subset physiologic response.
  • This approach will be useful in any number of conditions, including 1) autoimmune diseases where one wants to induce a tolerant state (select a peptide that inhibits a T cell subset from recognizing a self-antigen bearing cell), 2) allergic diseases where one wants to decrease the stimulation of IgE producing cells (select a peptide which blocks release from T cell subsets of specific B-cell stimulating cytokines which induce switch to IgE production)
  • Candidate libraries can be inserted into B cells and bioactive peptides selected which inhibit the release and synthesis of a specific immunoglobulin. This may be useful in autoimmune diseases characterized by the overproduction of autoantibodies and the production of allergy causing antibodies, such as IgE. Agents can also be identified which inhibit or enhance the binding of a specific immunoglobulin subclass to a specific antigen, either foreign or self. Finally, agents can be selected which inhibit the binding of a specific immunoglobulin subclass to its receptor on specific cell types.
  • agents which affect cytokine production may be selected, generally using two cell systems. For example, cytokine production from macrophages, monocytes, etc. may be evaluated. Similarly, agents which mimic cytokines, for example erythropoietin and IL1-17, may be selected, or agents that bind cytokines such as TNF-α, before they bind their receptor.
  • Antigen processing by mononuclear leukocytes is an important early step in the immune system's ability to recognize and eliminate foreign proteins
  • Candidate agents can be inserted into ML cell lines and agents selected which alter the intracellular processing of foreign peptides and sequence of the foreign peptide that is presented to T cells by MLs on their cell surface in the context of Class II MHC
  • This agent would in fact induce immune tolerance and/or diminish immune responses to foreign proteins
  • This approach could be used in transplantation, autoimmune diseases, and allergic diseases
  • inflammatory mediators (cytokines, leukotrienes, prostaglandins, platelet activating factor, histamine, neuropeptides, and other peptide and lipid mediators)
  • Candidate libraries can be inserted into MLs, mast cells, eosinophils, and other cells participating in a specific inflammatory response, and bioactive peptides selected which inhibit the synthesis, release and binding to the cognate receptor of each of these types of mediators
  • Candidate library expression in mammalian cells can also be considered for other pharmaceutical-related applications, such as modification of protein expression, protein folding, or protein secretion
  • Candidate libraries resulting in bioactive peptides which select for an increased cell growth rate (perhaps peptides mimicking growth factors or acting as agonists of growth factor signal transduction pathways), for pathogen resistance (see previous section), for lack of sialylation or glycosylation (by blocking glycotransferases or rerouting trafficking of the protein in the cell), for allowing growth on autoclaved media, or for growth in serum free media, would all increase productivity and decrease costs in the production of protein pharmaceuticals
  • Random peptides displayed on the surface of circulating cells can be used as tools to identify organ, tissue, and cell specific peptide targeting sequences Any cell introduced into the bloodstream of an animal expressing a library targeted to the cell surface can be selected for specific organ and tissue targeting The bioactive peptide sequence identified can then be coupled to an antibody, enzyme, drug, imaging agent or substance for which organ targeting is desired
  • agents which may be selected using the present invention include 1) agents which block the activity of transcription factors, using cell lines with reporter genes, 2) agents which block the interaction of two known proteins in cells, using the absence of normal cellular functions, the mammalian two hybrid system or fluorescence resonance energy transfer mechanisms for detection, and 3) agents may be identified by tethering a random peptide to a protein binding region to allow interactions with molecules sterically close, i.e. within a signalling pathway, to localize the effects to a functional area of interest
  • agents serve to more fully describe the manner of using the above-described invention, as well as to set forth the best modes contemplated for carrying out various aspects of the invention. It is understood that these examples in no way serve to limit the true scope of this invention, but rather are presented for illustrative purposes. All references cited herein are incorporated by reference in their entirety
  • loops are selected based on having mobility in the loop or tip of the loop well above that of the most rigid parts of the beta-can structure (Yang et al , Nature Biotechnology 14, 1246-9, 1996, Ormo et al , Science 273, 1392-5, 1996)
  • the loops of most interest are those which are not rigidly coupled to the beta-can structure of the rest of GFP, this lack of rigid coupling may allow the most tolerance for sequence additions within the loops in a library construct
  • Loops can be selected as those which have the highest temperature factors in the crystal structures, and include loops 130-135, 154-159, 172-175, 188-193, and 208-216 in a GFP monomer
  • the temperature factor of the loop can be artificially increased by including flexible amino acids such as glycine in the linkers (see below)
  • Example 3 Mean fluorescence of GFP with test inserts 1 and 2 in loops 1-5, expressed in E coli
  • the GFP used is EGFP (Clontech Inc , Palo Alto, CA) and the two test sequences were inserted at the sites indicated in example 1
  • An equal number of bacteria (20000) representing clones of single colonies were analyzed by fluorescence-activated cell sorting on a MoFlo cell sorter (Cytomation Inc., Ft. Collins, CO). Intensity of FL1 was averaged
  • the relative fluorescence intensity was calculated as (WT fluorescence - fluorescence of loop insert)/(WT fluorescence - bkd) x 100%. Constructs with insert 1 in loops 1 and 5 were not expressed due to cloning difficulties
  • Equal amounts of cell lysate from each loop insert were run on a 10% SDS gel and blotted to PVDF. GFP was detected with anti-GFP antibody and the bands were observed using chemiluminescent detection
  • the intensity of individual bands was measured using a Sharp JX-330 scanning densitometer and Biol
  • the numbers for the relative fluorescence of the loop 2, 3, and 4 inserts are derived from the average value + 1 standard deviation for 1-2 independent clones with the specified insert
  • the specific fluorescence is the ratio of the relative fluorescence to the Western blot relative intensity
  • the standard deviation of the relative fluorescence was calculated as (fluorescence of insert/fluorescence of WT) x [(std dev of insert fluorescence/insert fluorescence)^2 + (std dev of WT fluorescence/WT fluorescence)^2]^0.5 (Bevington, P. 1969. Data reduction and error analysis for the physical sciences. New York: McGraw Hill, p. 61-2). Data with an asterisk * was derived from cells with a 60-70% transfection efficiency and so can only be qualitatively compared with the rest of the data
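
The error propagation and specific-fluorescence ratio described in the items above can be sketched as follows. This is a minimal illustration that assumes the standard propagation rule for the quotient of two uncorrelated measurements; the function names and all numerical inputs are hypothetical and are not data from the examples.

```python
import math


def ratio_std_dev(insert_mean, insert_sd, wt_mean, wt_sd):
    """Standard deviation of insert_mean / wt_mean for uncorrelated errors.

    Assumed form: ratio * sqrt((sd_insert/insert)^2 + (sd_wt/wt)^2).
    """
    ratio = insert_mean / wt_mean
    relative_error = math.sqrt(
        (insert_sd / insert_mean) ** 2 + (wt_sd / wt_mean) ** 2
    )
    return ratio * relative_error


def specific_fluorescence(relative_fluorescence, relative_western_intensity):
    """Ratio of relative fluorescence to Western blot relative intensity."""
    return relative_fluorescence / relative_western_intensity


if __name__ == "__main__":
    # Hypothetical mean FL1 intensities and standard deviations.
    sd = ratio_std_dev(insert_mean=50.0, insert_sd=5.0, wt_mean=100.0, wt_sd=10.0)
    print(f"ratio = {50.0 / 100.0:.2f}, std dev = {sd:.4f}")
    print(f"specific fluorescence = {specific_fluorescence(0.5, 0.25):.2f}")
```

Dividing the relative fluorescence by the relative Western blot intensity normalizes the signal to the amount of protein actually expressed, so that a weakly expressed but well-folded construct is not mistaken for a poorly fluorescent one.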

Abstract

The invention relates to the use of scaffold proteins, particularly green fluorescent protein (GFP), in fusion constructs with random and defined peptides and peptide libraries, to increase the cellular expression levels, decrease the cellular catabolism, increase the conformational stability relative to linear peptides, and to increase the steady state concentrations of the random peptides and random peptide library members expressed in cells for the purpose of detecting the presence of the peptides and screening random peptide libraries. N-terminal, C-terminal, dual N- and C- terminal and one or more internal fusions are all contemplated. Novel fusions utilizing self-binding peptides to create a conformationally stabilized fusion domain are also contemplated.

Description

FUSIONS OF SCAFFOLD PROTEINS WITH RANDOM PEPTIDE LIBRARIES
FIELD OF THE INVENTION
The invention relates to the use of scaffold proteins, particularly detectable genes such as green fluorescent protein (GFP), luciferase, β-lactamase, etc., in fusion constructs with random and defined peptides and peptide libraries, to increase the cellular expression levels, decrease the cellular catabolism, increase the conformational stability relative to linear peptides, and to increase the steady state concentrations of the random peptides and random peptide library members expressed in cells for the purpose of detecting the presence of the peptides and screening random peptide libraries. N-terminal, C-terminal, dual N- and C-terminal and one or more internal fusions are all contemplated. Novel fusions utilizing self-binding peptides to create a conformationally stabilized fusion domain are also contemplated.
BACKGROUND OF THE INVENTION
The field of biomolecule screening for biologically and therapeutically relevant compounds is rapidly growing. Relevant biomolecules that have been the focus of such screening include chemical libraries, nucleic acid libraries and peptide libraries, in search of molecules that either inhibit or augment the biological activity of identified target molecules. With particular regard to peptide libraries, the isolation of peptide inhibitors of targets and the identification of formal binding partners of targets has been a key focus. However, one particular problem with peptide libraries is the difficulty assessing whether any particular peptide has been expressed, and at what level, prior to determining whether the peptide has a biological effect.
Green fluorescent protein (GFP) is a 238 amino acid protein. The crystal structure of the protein and of several point mutants has been solved (Ormo et al., Science 273, 1392-5, 1996; Yang et al., Nature Biotechnol. 14, 1246-51, 1996). The fluorophore, consisting of a modified tripeptide, is buried inside a relatively rigid beta-can structure, where it is almost completely protected from solvent access. The fluorescence of this protein is sensitive to a number of point mutations (Phillips, G.N., Curr. Opin. Struct. Biol. 7, 821-27, 1997). The fluorescence appears to be a sensitive indication of the preservation of the native structure of the protein, since any disruption of the structure allowing solvent access to the fluorophoric tripeptide will quench the fluorescence.
Abedi et al. (Nucleic Acids Res. 26, 623-30, 1998) have inserted peptides between residues contained in several GFP loops. Inserts of the short sequence LEEFGS between adjacent residues at 10 internal insertion sites were tried. Of these, inserts at three sites, between residues 157-158, 172-173 and 194-195, gave fluorescence of at least 1% of that of wild type GFP. Only inserts between residues 157-158 and 172-173 had fluorescence of at least 10% of wild type GFP. When -SAG-random 20mer-GAS- peptide sequences were inserted at different sites internal to GFP, only two sites gave mean fluorescence intensities of 2% or more of the GFP-random peptide sequences 10-fold above background fluorescence. These sites were insertions between residues 157-158 and 172-173.
It is an object of the invention to provide compositions of fusion constructs of peptides with scaffold proteins, comprising for example detectable proteins such as GFP, and methods of using such constructs in screening of peptide libraries.
SUMMARY OF THE INVENTION
In accordance with the objects outlined above, the present invention provides fusion proteins comprising a scaffold protein and a random peptide, fused to said scaffold protein, and nucleic acids which encode such fusion proteins. In an additional aspect, the present invention provides libraries of a) fusion proteins, b) fusion nucleic acids, c) expression vectors comprising the fusion nucleic acids, and d) host cells comprising the fusion nucleic acids. The present invention further comprises methods for screening for a bioactive peptide capable of conferring a particular phenotype.
In one aspect, a library of fusion proteins comprises a scaffold protein, a random peptide fused to the N-terminus of the scaffold protein and a representation structure that will present the random peptide in a conformationally restricted form. In a preferred embodiment, each of the random peptides in the library is different.
In one aspect, a library of fusion proteins comprises a scaffold protein, a random peptide fused to the C-terminus of the scaffold protein and a representation structure that will present the random peptide in a conformationally restricted form. In a preferred embodiment, each of the random peptides in the library is different. In one aspect, a library of fusion proteins comprises a scaffold protein, a random peptide inserted into the scaffold protein and at least one fusion partner. In a preferred embodiment, each of the random peptides in the library is different. In another preferred embodiment, the random peptide is inserted into a loop structure of said scaffold protein.
In one aspect of the invention, the scaffold protein is a green fluorescent protein (GFP).
In one aspect of the invention, the GFP is from Aequorea and the random peptide is inserted into the loop comprising amino acids 130 to 135 of said GFP.
In another aspect of the invention, the GFP is from Aequorea and the random peptide is inserted into the loop comprising amino acids 154 to 159 of said GFP.
In another aspect of the invention, the GFP is from Aequorea and the random peptide is inserted into the loop comprising amino acids 172 to 175 of said GFP.
In another aspect of the invention, the GFP is from Aequorea and the random peptide is inserted into the loop comprising amino acids 188 to 193 of said GFP.
In another aspect of the invention, the GFP is from Aequorea and the random peptide is inserted into the loop comprising amino acids 208 to 216 of said GFP.
In one aspect of the invention, the GFP is from a Renilla species.
In another aspect of the invention, the scaffold protein is β-lactamase.
In another aspect of the invention, the scaffold protein is DHFR.
In another aspect of the invention, the scaffold protein is β-galactosidase.
In another aspect of the invention, the scaffold protein is luciferase.
In another aspect of the invention, a library of fusion proteins is provided, comprising a linker between the random peptide and the scaffold protein.
In another aspect of the invention, a library of fusion proteins is provided, comprising a second linker between the other end of the random peptide and the scaffold protein. In another aspect of the invention, a library of fusion proteins is provided, comprising a -(gly)n- linker, wherein n≥2.
In another aspect of the invention, a library of fusion proteins is provided, comprising a scaffold protein and a random peptide, wherein the random peptide replaces at least one amino acid of said scaffold protein. In a preferred embodiment, the amino acid of said scaffold protein which is replaced by the random peptide is located within a loop structure of said scaffold protein.
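
The two loop strategies summarized above, insertion between two residues and replacement of loop residues, each optionally flanked by glycine linkers, can be sketched at the sequence level as follows. This is a minimal illustration only: the function names are hypothetical, the scaffold used in the usage lines is a toy string rather than a GFP sequence, and residue numbering is assumed to be 1-based.

```python
# Loop boundaries (1-based, inclusive) listed in the text for Aequorea GFP.
GFP_LOOP_RANGES = [(130, 135), (154, 159), (172, 175), (188, 193), (208, 216)]


def gly_linker(n):
    """Return a -(gly)n- linker; the text specifies n >= 2."""
    if n < 2:
        raise ValueError("linker length n must be at least 2")
    return "G" * n


def insert_between(scaffold, after_residue, peptide, n_linker="", c_linker=""):
    """Insert a peptide between residue `after_residue` and the next residue."""
    if not 1 <= after_residue < len(scaffold):
        raise ValueError("insertion site must lie inside the scaffold")
    return (
        scaffold[:after_residue]
        + n_linker + peptide + c_linker
        + scaffold[after_residue:]
    )


def replace_residues(scaffold, first, last, peptide, n_linker="", c_linker=""):
    """Replace residues first..last (inclusive) of the scaffold with a peptide."""
    if not 1 <= first <= last <= len(scaffold):
        raise ValueError("replaced range must lie inside the scaffold")
    return scaffold[: first - 1] + n_linker + peptide + c_linker + scaffold[last:]


if __name__ == "__main__":
    toy_scaffold = "ABCDEFGHIJ"  # placeholder, not a real protein sequence
    print(insert_between(toy_scaffold, 3, "xyz", gly_linker(2), gly_linker(2)))
    print(replace_residues(toy_scaffold, 4, 6, "xyz"))
```

Keeping the linkers as separate arguments makes it easy to compare constructs with one linker, two linkers, or none at the same insertion site.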
In one aspect of the invention, the library of fusion proteins and the library of nucleic acids comprise at least 10^5 different members.
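
To put a library size of this order in context, the number of distinct peptide sequences available at a given random-peptide length can be estimated as follows. This is a rough sketch that assumes all 20 standard amino acids are allowed at every randomized position; the function names are hypothetical.

```python
def peptide_sequence_space(length, alphabet_size=20):
    """Number of distinct peptides of the given length."""
    return alphabet_size ** length


def min_length_for_diversity(target_members, alphabet_size=20):
    """Shortest random peptide whose sequence space reaches the target size."""
    length = 0
    while alphabet_size ** length < target_members:
        length += 1
    return length


if __name__ == "__main__":
    target = 10 ** 5
    n = min_length_for_diversity(target)
    print(f"{n} random positions give {peptide_sequence_space(n)} sequences")
```

Under this assumption, longer random peptides have a sequence space far larger than any practical library, so a library samples that space rather than covering it.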
The invention further provides fusion nucleic acids encoding the fusion proteins. In a preferred embodiment, the nucleic acid encoding the fusion protein comprises a nucleic acid encoding a random peptide, a nucleic acid encoding a scaffold protein and a nucleic acid encoding a fusion partner. In another preferred embodiment, the nucleic acid encoding the random peptide is inserted internally into the nucleic acid encoding the scaffold protein.
In another aspect of the invention, expression vectors are provided. The expression vectors comprise one or more of the nucleic acids encoding the fusion proteins operably linked to regulatory sequences recognized by a host cell transformed with the nucleic acids. In a preferred embodiment the expression vectors are retroviral vectors. Further provided herein are host cells comprising the vectors and the recombinant nucleic acids provided herein.
In a further aspect, the invention provides methods of screening for bioactive peptides conferring a particular phenotype. The methods comprise providing cells containing a fusion nucleic acid comprising nucleic acid encoding a fusion protein comprising a scaffold protein and a random peptide as above. The cells are subjected to conditions wherein the fusion protein is expressed. The cells are then assayed for the phenotype.
Other aspects of the invention will become apparent to the skilled artisan by the following description of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 depicts the crystal structure of GFP showing the temperature factors used to pick some of the loops for internal insertion of random peptides.
Figures 2A, 2B, 2C, 2D, 2E and 2F depict the results of the examples. Figure 2A schematically depicts the location of the loops. Figures 2B-2F show the results and the mean fluorescence.
Figure 3 depicts a helical wheel diagram of a parallel coiled coil. For each helix, a or a' are at the N-terminus, and the residues in sequence are abcdefg or a'b'c'd'e'f'g', which are then repeated to give individual helices abcdefg(abcdefg)nabcdefg or a'b'c'd'e'f'g'(a'b'c'd'e'f'g')na'b'c'd'e'f'g'. The core of the helix would be a, a', d and d', which would be combinations of hydrophobic strong helix forming residues such as ala/leu, or val/leu. If residues e and e' are fixed as glu, and g and g' are fixed as lys, inter-helical salt bridges would further stabilize the coiled coil structure.
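
The heptad pattern described for Figure 3 can be illustrated with a short sequence builder. This is a sketch under stated assumptions: positions a and d carry the hydrophobic core (val/leu here), e is fixed as glu and g as lys as in the text, while positions b, c and f are filled with alanine purely as a placeholder, since the text does not fix them. The function name is hypothetical.

```python
def coiled_coil_helix(n_internal_repeats, core_a="V", core_d="L", filler="A"):
    """Build one helix of the form abcdefg(abcdefg)n abcdefg.

    a, d: hydrophobic core residues; e = Glu (E); g = Lys (K);
    b, c, f: placeholder residue (an assumption, not from the text).
    """
    if n_internal_repeats < 0:
        raise ValueError("n_internal_repeats must be non-negative")
    heptad = core_a + filler + filler + core_d + "E" + filler + "K"
    # One leading heptad, n internal repeats, one trailing heptad.
    return heptad * (n_internal_repeats + 2)


if __name__ == "__main__":
    helix = coiled_coil_helix(n_internal_repeats=1)
    print(helix)
    print("e positions:", helix[4::7], "g positions:", helix[6::7])
```

Slicing every seventh residue, as in the usage lines, is a quick check that the charged e and g positions line up along the helix as the salt-bridge design requires.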
Figure 4 depicts the amino acid sequence of β-lactamase TEM-1 from E. coli. Amino acid residues 26-290 are shown.
Figures 5A and 5B depict the crystal structure of E. coli β-lactamase [PDB 1BTL; Jelsch et al., Proteins Struct. Funct. Genet. 16:364 (1993)]. Figure 5A shows an end-on view of the two helices to which the random library may be fused. Figure 5B shows a side view of the two helices. The two helices which are to be extended with random residues in this library are shown in yellow (C-terminal helix, containing residues 271-290, see Figure 4) and white (N-terminal helix, containing residues 26-40, see Figure 4). This protein has residues 1-25 removed. The same residues may be removed in the library scaffold as well. The active site ser 70 is shown in red. Both helices are remote from the active site and therefore attachment of random residues to the N- and/or C-terminus should not affect the activity of the enzyme.
Figure 6 depicts a model of β-lactamase colored by crystallographic temperature factor, with the most immobile regions shown in red and the more mobile regions in yellow. The loops discussed in Legrande et al. [Nature Biotechnology 17:67-72 (1999)] are shown in blue, the active site ser 70 is shown in white, while glu 166 is shown in blue-gray.
Figure 7 depicts the structure of CI-2, taken from the PDB file 2CI2. The reactive site loop is represented by residues 54-63; the residues supporting the loop structure are 51, 65, 67, 69 and 83. These residues could be randomized in different combinations. Loop-insert libraries are inserted between residues 72-73 and/or 44-45.
Figure 8 depicts the structure of kanamycin nucleotidyl transferase dimer 1KNY.
DETAILED DESCRIPTION OF THE INVENTION
Screening of combinatorial libraries of potential drugs on therapeutically relevant target cells is a rapidly growing and important field. Peptide libraries are an important subset of these libraries. However, to facilitate intracellular screening of these peptide libraries, a number of hurdles must be overcome. In order to express and subsequently screen functional peptides in cells, the peptides need to be expressed in sufficient quantities to overcome catabolic mechanisms such as proteolysis and transport out of the cytoplasm into endosomes. The peptides may also be conformationally stabilized relative to linear peptides to allow a higher binding affinity for their cellular targets. In addition, measuring the expression level of these peptides can be difficult; for example, it may be generally difficult to follow the expression of peptides in specific cells, to ascertain whether any particular cell is expressing a member of the library. To overcome these problems, the present invention is directed to fusions of scaffold proteins, including variants, and random peptides that are fused in such a manner that the structure of the scaffold is not significantly perturbed and the peptide is metabolically and conformationally stabilized. This allows the creation of a peptide library that is easily monitored, both for its presence within cells and its quantity. Thus, the peptides within or fused to a scaffold protein are displayed on or at the surface of the scaffold, therefore being accessible for interaction with potential functional targets.
The scaffold proteins fall into two main categories: reporter proteins and structural proteins. Reporter proteins are those that allow cells containing the reporter proteins to be distinguished from those that do not. While determining expression of a particular peptide is difficult, numerous methods are known in the art to measure expression of larger proteins or the expression of genes encoding them. Expression of a gene, e.g., can be measured by measuring the level of the RNA produced. However, this analysis, although direct, is difficult, usually not very sensitive and labor intensive. A more advantageous approach is offered by measuring the expression of reporter genes. Reporter gene expression is generally more easily monitored, since in many cases, the cellular phenotype is altered, either due to the presence of a detectable alteration, such as the presence of a fluorescent protein (which, as outlined herein, includes both the use of fusions to the detectable gene itself, or the use of detectable gene constructs that rely on the presence of the scaffold protein to be activated, e.g. when the scaffold is a transcription factor), by the addition of a substrate altered by the reporter protein (e.g. chromogenic (including fluorogenic) substrates for reporter enzymes such as luciferase, β-galactosidase, etc.), or by conferring a drug resistant phenotype, for example. Reporter proteins generally fall into one of several classes, including detection genes, indirectly detectable genes, survival genes, etc. That is, by inserting a peptide library into a gene that is detectable, for example GFP or luciferase, the expression of the peptide library may be monitored. Similarly, the insertion of a gene into a survival gene, such as an antibiotic resistance gene, allows detection of the expression of the library.
In some embodiments, it is also desirable for the peptides to have different structural biases, since different protein or other functional targets may require peptides of different specific structures to interact tightly with their surface or crevice binding sites. Thus, different libraries, each with a different structural bias, may be utilized to maximize the chances of having high affinity members for a variety of different targets. Thus, for example, as is more fully outlined below, random peptide libraries with a helical bias or extended structure bias may be made through fusion to the N-terminus and/or C-terminus of certain scaffold proteins. Similarly, random peptide libraries with a coiled coil bias may be made via fusion to the N- and/or C-terminus of particular scaffold proteins. Extended conformations of the random library may be made using insertions between dimerizing scaffold proteins. Preferred embodiments utilize loop formations via insertion into loops in scaffold proteins; amino acid residues within the respective loop structures may be replaced by the random peptide library or the random peptide library may be inserted in between two amino acid residues located within a loop structure.
Accordingly, the present invention provides fusion proteins of scaffold proteins and random peptides. By "fusion protein" or "fusion polypeptide" or grammatical equivalents herein is meant a protein composed of a plurality of protein components, that while typically unjoined in their native state, typically are joined by their respective amino and carboxyl termini through a peptide linkage to form a single continuous polypeptide. "Protein" in this context includes proteins, polypeptides and peptides. Plurality in this context means at least two, and preferred embodiments generally utilize two components. It will be appreciated that the protein components can be joined directly or joined through a peptide linker/spacer as outlined below. In addition, as outlined below, additional components such as fusion partners including presentation structures, targeting sequences, etc. may be used.
The present invention provides fusion proteins of scaffold proteins and random peptides. By "scaffold protein", "scaffold polypeptide", "scaffold" or grammatical equivalents thereof, herein is meant a protein to which amino acid sequences, such as random peptides, can be fused. The peptides are exogenous to the scaffold, that is, they are not usually present in the protein.
Upon fusion, the scaffold protein usually allows the display of the random peptides in a way that they are accessible to other molecules. Scaffold proteins fall into several classes, including reporter proteins (which includes detectable proteins, survival proteins and indirectly detectable proteins), and structural proteins.
In a preferred embodiment, the scaffold protein is a reporter protein. By "reporter protein" or grammatical equivalents herein is meant a protein that by its presence in or on a cell or when secreted in the media allows the cell to be distinguished from a cell that does not contain the reporter protein. As described herein, the cell usually comprises a reporter gene that encodes the reporter protein.
Reporter genes fall into several classes, as outlined above, including, but not limited to, detection genes, indirectly detectable genes, and survival genes.
In a preferred embodiment, the scaffold protein is a detectable protein. A "detectable protein" or "detection protein" (encoded by a detectable or detection gene) is a protein that can be used as a direct label, that is, the protein is detectable (and preferably, a cell comprising the detectable protein is detectable) without further manipulations or constructs. As outlined herein, preferred embodiments of screening utilize cell sorting (for example via FACS) to detect scaffold (and thus peptide library) expression. Thus, in this embodiment, the protein product of the reporter gene itself can serve to distinguish cells that are expressing the detectable gene. In this embodiment, suitable detectable genes include those encoding autofluorescent proteins.
As is known in the art, there are a variety of autofluorescent proteins known, these generally are based on the green fluorescent protein (GFP) from Aequorea and variants thereof, including, but not limited to, GFP, (Chalfie, et al , "Green Fluorescent Protein as a Marker for Gene Expression," Science 263(5148) 802-805 (1994)), enhanced GFP (EGFP, Clontech - Genbank Accession Number U55762 )), blue fluorescent protein (BFP, Quantum Biotechnologies, Inc 1801 de Maisonneuve Blvd West, 8th Floor, Montreal (Quebec) Canada H3H 1J9, Stauber, R H Biotechniques 24(3) 462-471 (1998), Heim, R and Tsien, R Y Curr Biol 6 178-182 (1996)), and enhanced yellow fluorescent protein (EYFP, Clontech Laboratories, Inc , 1020 East Meadow Circle, Palo Alto, CA 94303) In addition, there are recent reports of autofluorescent proteins from Renilla species See WO 92/15673, WO 95/07463, WO 98/14605, WO 98/26277, WO 99/49019, U S patent 5,292,658, U S patent 5,418,155, U S patent 5,683,888, U S patent 5,741 ,668, U S patent 5,777,079, U S patent 5,804,387, U S patent 5,874,304, U S patent 5,876,995, and U S patent 5,925,558, all of which are expressly incorporated herein by reference In a preferred embodiment, the scaffold protein is Aequorea green fluorescent protein or one of its variants, see Cody et al , Biochemistry 32 1212-1218 (1993), and Inouye and TSUJI, FEBS Lett 341 277-280 (1994), both of which are expressly incorporated by reference herein
Accordingly, the present invention provides fusions of green fluorescent protein (GFP) and random peptides. By "green fluorescent protein" or "GFP" herein is meant a protein with at least 30% sequence identity to GFP and exhibits fluorescence at 490 to 600 nm. The wild-type GFP is 238 amino acids in length, contains a modified tripeptide fluorophore buried inside a relatively rigid β-can structure which protects the fluorophore from the solvent, and thus solvent quenching. See Prasher et al., Gene 111(2):229-233 (1992); Cody et al., Biochem. 32(5):1212-1218 (1993); Ormo et al., Science 273:1392-1395 (1996); and Yang et al., Nat. Biotech. 14:1246-1251 (1996), all of which are hereby incorporated by reference in their entirety. Included within the definition of GFP are derivatives of GFP, including amino acid substitutions, insertions and deletions. See for example WO 98/06737 and U.S. Patent No. 5,777,079, both of which are hereby incorporated by reference in their entirety. Accordingly, the GFP proteins utilized in the present invention may be shorter or longer than the wild type sequence. Thus, in a preferred embodiment, included within the definition of GFP proteins are portions or fragments of the wild type sequence. For example, GFP deletion mutants can be made. At the N-terminus, it is known that only the first amino acid of the protein may be deleted without loss of fluorescence. At the C-terminus, up to 7 residues can be deleted without loss of fluorescence (see Phillips et al., Current Opin. Structural Biol. 7:821 (1997)).
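
The two criteria in this definition, sequence identity and a fluorescence wavelength window, can be expressed as a simple check. This is a sketch under stated assumptions: the text does not say how identity is computed, so the code assumes the two sequences have already been aligned to equal length (with "-" for gaps) and counts identities over ungapped columns; it also assumes the fluorescence criterion is applied to an emission peak. The function names are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over ungapped columns of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == "-" or residue_b == "-":
            continue  # skip gap columns (an assumption of this sketch)
        compared += 1
        if residue_a == residue_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def meets_gfp_definition(aligned_candidate, aligned_reference, emission_peak_nm):
    """At least 30% identity and fluorescence within 490 to 600 nm."""
    identity_ok = percent_identity(aligned_candidate, aligned_reference) >= 30.0
    wavelength_ok = 490 <= emission_peak_nm <= 600
    return identity_ok and wavelength_ok


if __name__ == "__main__":
    # Toy aligned fragments, not real GFP sequences.
    print(percent_identity("ACDE", "ACDF"))
    print(meets_gfp_definition("ACDE", "ACDF", emission_peak_nm=509))
```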
In one embodiment, the GFP proteins are derivative or variant GFP proteins. That is, as outlined more fully below, the derivative GFP will contain at least one amino acid substitution, deletion or insertion, with amino acid substitutions being particularly preferred. The amino acid substitution, insertion or deletion may occur at any residue within the GFP protein. These variants ordinarily are prepared by site specific mutagenesis of nucleotides in the DNA encoding the GFP protein, using cassette or PCR mutagenesis or other techniques well known in the art, to produce DNA encoding the variant, and thereafter expressing the DNA in recombinant cell culture as outlined above. However, variant GFP protein fragments having up to about 100-150 residues may be prepared by in vitro synthesis using established techniques. Amino acid sequence variants are characterized by the predetermined nature of the variation, a feature that sets them apart from naturally occurring allelic or interspecies variation of the GFP protein amino acid sequence. The variants typically exhibit the same qualitative biological activity as the naturally occurring analogue, although variants can also be selected which have modified characteristics as will be more fully outlined below. That is, in a preferred embodiment, when non-wild-type GFP is used, the derivative preferably has at least 1% of wild-type fluorescence, with at least about 10% being preferred, at least about 50-60% being particularly preferred and 95% to 98% to 100% being especially preferred. In general, what is important is that there is enough fluorescence to allow sorting and/or detection above background, for example using a fluorescence-activated cell sorter (FACS) machine. However, in some embodiments, it is possible to detect the fusion proteins non-fluorescently, using, for example, antibodies directed to either an epitope tag (i.e. purification sequence) or to the GFP itself. In this case the GFP scaffold does not have to be fluorescent; similarly, as outlined below, any of the scaffolds need not be biologically active, if it can be shown that the scaffold is folding correctly and/or reproducibly.
As will be appreciated by those in the art, any of the scaffold proteins or the genes encoding them may be wild type or variants thereof. These variants fall into one or more of three classes: substitutional, insertional or deletional variants. These variants ordinarily are prepared by site specific mutagenesis of nucleotides in the DNA encoding the scaffold protein, using cassette or PCR mutagenesis or other techniques well known in the art, to produce DNA encoding the variant, and thereafter expressing the DNA in recombinant cell culture as outlined herein. However, variant protein fragments having up to about 100-150 residues may be prepared by in vitro synthesis using established techniques. Amino acid sequence variants are characterized by the predetermined nature of the variation, a feature that sets them apart from naturally occurring allelic or interspecies variation of the scaffold protein amino acid sequence. The variants typically exhibit the same qualitative biological activity as the naturally occurring analogue, although variants can also be selected which have modified characteristics as will be more fully outlined below.
While the site or region for introducing an amino acid sequence variation is predetermined, the mutation per se need not be predetermined. For example, in order to optimize the performance of a mutation at a given site, random mutagenesis may be conducted at the target codon or region and the expressed scaffold variants screened for the optimal combination of desired activity. Techniques for making substitution mutations at predetermined sites in DNA having a known sequence are well known, for example, M13 primer mutagenesis and PCR mutagenesis. Screening of the mutants is done using assays of scaffold protein activities.
Amino acid substitutions are typically of single residues; insertions usually will be on the order of from about 1 to 20 amino acids, although considerably larger insertions may be tolerated. Deletions range from about 1 to about 20 residues, although in some cases deletions may be much larger. Substitutions, deletions, insertions or any combination thereof may be used to arrive at a final derivative. Generally these changes are done on a few amino acids to minimize the alteration of the molecule. However, larger changes may be tolerated in certain circumstances. When small alterations in the characteristics of a scaffold protein, such as GFP, are desired, substitutions are generally made in accordance with the following chart:
Chart I
Original Residue    Exemplary Substitutions
Ala                 Ser
Arg                 Lys
Asn                 Gln, His
Asp                 Glu
Cys                 Ser
Gln                 Asn
Glu                 Asp
Gly                 Pro
His                 Asn, Gln
Ile                 Leu, Val
Leu                 Ile, Val
Lys                 Arg, Gln, Glu
Met                 Leu, Ile
Phe                 Met, Leu, Tyr
Ser                 Thr
Thr                 Ser
Trp                 Tyr
Tyr                 Trp, Phe
Val                 Ile, Leu
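
For use in variant design, Chart I can be encoded as a lookup table. The following is a minimal sketch; the dictionary contents are taken from the chart, while the names `CHART_I` and `is_exemplary_substitution` are hypothetical.

```python
# Chart I: original residue -> exemplary (conservative) substitutions.
CHART_I = {
    "Ala": ["Ser"],
    "Arg": ["Lys"],
    "Asn": ["Gln", "His"],
    "Asp": ["Glu"],
    "Cys": ["Ser"],
    "Gln": ["Asn"],
    "Glu": ["Asp"],
    "Gly": ["Pro"],
    "His": ["Asn", "Gln"],
    "Ile": ["Leu", "Val"],
    "Leu": ["Ile", "Val"],
    "Lys": ["Arg", "Gln", "Glu"],
    "Met": ["Leu", "Ile"],
    "Phe": ["Met", "Leu", "Tyr"],
    "Ser": ["Thr"],
    "Thr": ["Ser"],
    "Trp": ["Tyr"],
    "Tyr": ["Trp", "Phe"],
    "Val": ["Ile", "Leu"],
}


def is_exemplary_substitution(original, replacement):
    """True if the replacement is listed in Chart I for the original residue."""
    return replacement in CHART_I.get(original, [])


if __name__ == "__main__":
    print(is_exemplary_substitution("Lys", "Arg"))
    print(is_exemplary_substitution("Ala", "Trp"))
```

Note that the chart is not symmetric (for example, Gly lists Pro, but Pro has no entry), so the lookup is directional by design.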
Substantial changes in function or immunological identity are made by selecting substitutions that are less conservative than those shown in Chart I For example, substitutions may be made which more significantly affect the structure of the polypeptide backbone in the area of the alteration, for example the alpha-helical or beta-sheet structure, the charge or hydrophobicity of the molecule at the target site, or the bulk of the side chain The substitutions which in general are expected to produce the greatest changes in the polypeptide's properties are those in which (a) a hydrophi c residue, e g seryl or threonyl, is substituted for (or by) a hydrophobic residue, e g leucyl, isoleucyl, phenylalanyl, valyl or alanyl, (b) a cysteme or proline is substituted for (or by) any other residue, (c) a residue having an electropositive side chain, e g lysyl, argmyl, or histidyl, is substituted for (or by) an electronegative residue, e g glutamyl or aspartyl, or (d) a residue having a bulky side chain, e g phenylalanine, is substituted for (or by) one not having a side chain, e g glycme As outlined above, the variants typically exhibit the same qualitative biological activity (i e fluorescence) although variants also are selected to modify the characteristics of the scaffold proteins as needed
In addition, scaffold proteins can be made that are longer than the wild-type, for example, by the addition of epitope or purification tags, the addition of other fusion sequences, etc., as is more fully outlined below.
In a preferred embodiment, the scaffold protein is a variant GFP that has low or no fluorescence, but is expressed in mammalian cells at a concentration of at least about 10 nM, preferably at a concentration of at least about 100 nM, more preferably at a concentration of at least about 1 μM, even more preferably at a concentration of at least about 10 μM and most preferably at a concentration of at least about 100 μM.
A random peptide is fused to a scaffold protein to form a fusion polypeptide. By "fused" or "operably linked" herein is meant that the random peptide, as defined below, and the scaffold protein, as exemplified by GFP herein, are linked together in such a manner as to minimize the disruption to the stability of the scaffold structure (i.e. it can retain biological activity). In the case of GFP, the scaffold preferably retains its ability to fluoresce, or maintains a Tm of at least 42°C. As outlined below, the fusion polypeptide (or fusion polynucleotide encoding the fusion polypeptide) can comprise further components as well, including multiple peptides at multiple loops, fusion partners, etc.
The fusion polypeptide preferably includes additional components, including, but not limited to, fusion partners and linkers.
In a preferred embodiment, the random peptide is fused to the N-terminus of the GFP. The fusion can be direct, i.e. with no additional residues between the C-terminus of the peptide and the N-terminus of the GFP, or indirect, that is, intervening amino acids are used, such as one or more fusion partners, including a linker. In this embodiment, preferably a presentation structure is used, to confer some conformational stability to the peptide. Particularly preferred embodiments include the use of dimerization sequences.
In one embodiment, N-terminal residues of the GFP are deleted, i.e. one or more amino acids of the GFP can be deleted and replaced with the peptide. However, as noted above, deletions of more than 7 amino acids may render the GFP less fluorescent, and thus larger deletions are generally not preferred. In a preferred embodiment, the fusion is directly to the first amino acid of the GFP.
In a preferred embodiment, the random peptide is fused to the C-terminus of the GFP. As above for N-terminal fusions, the fusion can be direct or indirect, and C-terminal residues may be deleted.
In a preferred embodiment, peptides and fusion partners are added to both the N- and the C-terminus of the GFP. As the N- and C-terminus of GFP are on the same "face" of the protein, in spatial proximity (within 18 Å), it is possible to make a non-covalently "circular" GFP protein using the components of the invention. Thus, for example, the use of dimerization sequences can allow a noncovalently cyclized protein; by attaching a first dimerization sequence to either the N- or C-terminus of GFP, and adding a random peptide and a second dimerization sequence to the other terminus, a large compact structure can be formed.
In a preferred embodiment, the random peptide is fused to an internal position of the GFP, that is, the peptide is inserted at an internal position of the GFP. While the peptide can be inserted at virtually any position, preferred positions include insertion at the very tips of "loops" on the surface of the GFP, to minimize disruption of the GFP beta-can protein structure. In a preferred embodiment, loops are selected as having the highest temperature factors in the crystal structure, as outlined in the Examples.
In a preferred embodiment, the random peptide is inserted without any deletion of GFP residues. That is, the insertion point is between two amino acids in the loop, adding the new amino acids of the peptide and fusion partners, including linkers. Generally, when linkers are used, the linkers are directly fused to the GFP, with additional fusion partners, if present, being fused to the linkers and the peptides.
In a preferred embodiment, the peptide is inserted into the GFP, with one or more GFP residues being deleted, that is, the random peptide (and fusion partners, including linkers) replaces one or more residues. In general, when linkers are used, the linkers are attached directly to the GFP, thus it is linker residues which replace the GFP residues, again generally at the tip of the loop. In general, when residues are replaced, from one to five residues of GFP are deleted, with deletions of one, two, three, four and five amino acids all possible. Specific preferred deletions are outlined below. For the structure of GFP, see Figures 1 and 2. Preferred insertion points in loops include, but are not limited to, loop 1 (amino acids 130-135), loop 2 (amino acids 154-159), loop 3 (amino acids 172-175), loop 4 (amino acids 188-193), and loop 5 (amino acids 208-216).
Particularly preferred embodiments include insertion of peptides and associated structures into loop 1, amino acids 130-135. In a preferred embodiment, one or more of the loop amino acids are deleted, with the deletion of asp133 being preferred.
In a preferred embodiment, peptides (and fusion partners, if present) are inserted into loop 2, amino acids 154-159. In a preferred embodiment, one or more of the loop amino acids are deleted, with the deletion of both lys156 and gln157 being preferred.
In a preferred embodiment, peptides (and fusion partners, if present) are inserted into loop 3, amino acids 172-175. In a preferred embodiment, one or more of the loop amino acids are deleted, with the deletion of asp173 being preferred.
In a preferred embodiment, peptides (and fusion partners, if present) are inserted into loop 4, amino acids 188-193. In a preferred embodiment, one or more of the loop amino acids are deleted, with the simultaneous deletion of gly189, asp190, gly191, and pro192 being preferred.
In a preferred embodiment, peptides (and fusion partners, if present) are inserted into loop 5, amino acids 208-216. In a preferred embodiment, one or more of the loop amino acids are deleted, with the simultaneous deletion of asn212, glu213 and lys214 being preferred.
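The five GFP loops and their preferred deletions can be summarized as data, and the replacement of the deleted residues by an insert can be sketched as simple string manipulation. The Python below is an illustrative sketch only: the names GFP_LOOPS and replace_loop_residues are hypothetical, residue numbers are taken from the text and assumed to be 1-based positions in the scaffold sequence, and any scaffold sequence passed in is supplied by the user.

```python
# Loop ranges and preferred deletions as listed in the text
# (GFP residue numbering, assumed 1-based).
GFP_LOOPS = {
    1: {"residues": (130, 135), "preferred_deletion": [133]},
    2: {"residues": (154, 159), "preferred_deletion": [156, 157]},
    3: {"residues": (172, 175), "preferred_deletion": [173]},
    4: {"residues": (188, 193), "preferred_deletion": [189, 190, 191, 192]},
    5: {"residues": (208, 216), "preferred_deletion": [212, 213, 214]},
}


def replace_loop_residues(scaffold, loop, insert):
    """Replace the preferred-deletion residues of `loop` with `insert`.

    `scaffold` is a one-letter amino acid string and `insert` is the
    peptide (optionally already flanked by linker residues).
    """
    deleted = GFP_LOOPS[loop]["preferred_deletion"]
    first, last = deleted[0], deleted[-1]
    if last > len(scaffold):
        raise ValueError("scaffold is shorter than the loop position")
    # 1-based residues first..last correspond to the slice [first-1:last].
    return scaffold[: first - 1] + insert + scaffold[last:]
```

For example, replacing the four preferred loop 4 residues with an eight-residue insert lengthens the scaffold by four residues.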
In a preferred embodiment, peptides (including fusion partners, if applicable) can be inserted into more than one loop of the scaffold at a time. Thus, for example, adding peptides to both loops 2 and 4 of GFP can increase the complexity of the library but still allow presentation of these loops on the same face of the protein. Similarly, it is possible to add peptides to one or more loops and add other fusion partners to other loops, such as targeting sequences, etc.
Thus, fusion polypeptides comprising GFP and random peptides are provided. In addition, to facilitate the introduction of random peptides into the GFP, a preferred embodiment provides GFP proteins with a multisite cloning site inserted into at least one loop outlined above.
In one embodiment, for example when linkers or other fusion partners are not used, the scaffold may not be GFP. In a preferred embodiment, the scaffold is a Renilla GFP.
In one embodiment, the scaffold is not Aequorea GFP.
In some embodiments, the scaffold is not any GFP.
In a preferred embodiment, the scaffold protein is an indirectly detectable protein. As for the reporter proteins, cells that contain the indirectly detectable protein can be distinguished from those that do not; however, this is as a result of a secondary event. For example, a preferred embodiment utilizes "enzymatically detectable" scaffolds that comprise enzymes that will act on chromogenic, and particularly fluorogenic, substrates to generate fluorescence, such as luciferase, β-galactosidase, and β-lactamase. Alternatively, the indirectly detectable protein may require a recombinant construct in a cell that may be activated by the scaffold, for example, scaffold transcription factors or inducers that will bind to a promoter linked to an autofluorescent protein such that transcription of the autofluorescent protein occurs.
In a preferred embodiment, the scaffold is β-lactamase. β-lactamase is generally secreted into the periplasm of bacteria and provides resistance to a variety of penicillins and cephalosporins, including the antibiotic ampicillin. Thus, antibiotic selection of cells comprising a fusion protein of a β-lactamase scaffold with peptide library members allows a determination of library expression. This allows examination of the effects on scaffold folding of different library insertion sites, fusion sites, or library biases by looking at the survival percentage after selection with a β-lactam antibiotic. Usually, eukaryotic β-lactamase libraries have the leader sequence removed to avoid their secretion from the cell. Since β-lactamase is readily assayed using colorimetric reagents [Marshall et al., Diagn. Microbiol. Infect. Dis. 22:353-5 (1995)] or fluorophoric reagents inside a live mammalian cell [Zlokarnik et al., Science 279:84-88 (1998)], the enzyme activity in cell lysates or in live cells allows a ready determination of the fraction of cells which have expressed library members, and cells expressing active β-lactamase library members can be FACS-sorted on the basis of changes in the colorimetric or fluorometric reagents. This enhances the ability to rapidly perform functional screens for peptide library members which alter cell function in a specific fashion.
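The comparison of insertion sites by survival percentage can be sketched as simple arithmetic. The following Python is a minimal illustration under the assumption that survival is scored as colony counts with and without β-lactam selection; the function name and the counting scheme are hypothetical and are not specified in the text.

```python
def survival_percentage(colonies_selective, colonies_nonselective):
    """Percentage of library-bearing cells surviving beta-lactam selection.

    Both arguments are colony counts from equal platings, one with the
    antibiotic and one without (an assumed scoring scheme).
    """
    if colonies_nonselective <= 0:
        raise ValueError("non-selective count must be positive")
    return 100.0 * colonies_selective / colonies_nonselective
```

A higher percentage for one insertion site than another would suggest that more library members at that site fold into an active scaffold.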
"β-lactamase" herein includes β-lactamases produced by a variety of microorganisms, including TEM-type extended spectrum β-lactamases (such as from E. coli, see below) and class A β-lactamases. β-lactamases within the scope of this invention thus include, but are not limited to, TEM-1 β-lactamase from E. coli, β-lactamase from Pseudomonas aeruginosa, TEM-26B β-lactamase from Klebsiella oxytoca, class A β-lactamase from Capnocytophaga ochracea, TEM-6 β-lactamase (EC 3.5.2.6) from E. coli, TEM-28 β-lactamase from E. coli, extended-spectrum β-lactamase TEM-10 from Morganella morganii, class A β-lactamase from Klebsiella pneumoniae, extended-spectrum β-lactamase CAZ-7 from Klebsiella pneumoniae, and TEM-3 β-lactamase (EC 3.5.2.6) from Klebsiella pneumoniae plasmid. β-lactamases with a high sequence homology to TEM-1 from E. coli, especially in the N- and C-terminal helices or in the 84-89 loop, are also preferred.
Accordingly, fusion proteins comprising a β-lactamase scaffold and peptides as outlined below are provided. As for GFP and all the scaffold proteins outlined herein, N-terminal, C-terminal, dual N- and C-terminal and one or more internal fusions, either separately or in combination, are all contemplated.
In a preferred embodiment, internal fusions are preferred. The site of fusion is determined based on the structures of several β-lactamases, which are known; e.g.: β-lactamase from Bacillus licheniformis (see Moews et al., Proteins 7(2):156-71 (1990); Knox and Moews, J. Mol. Biol. 220(2):435-55 (1991)); β-lactamase from Staphylococcus aureus (see Herzberg, J. Mol. Biol. 217(4):701-19 (1991); and Chen et al., Biochemistry 35(38):12251-8 (1996)); TEM-1 β-lactamases (see Swaren et al., Biochemistry 38(30):9570-6 (1999); Jelsch et al., Proteins 16(4):364-83 (1993); and Maveyraud et al., Biochemistry 37(8):2622-8 (1998)); class A β-lactamase Toho-1 (see Ibuka et al., J. Mol. Biol. 285(5):2079-87 (1999)); zinc β-lactamase (see Concha et al., Structure 4(7):823-36 (1996)), all of which are expressly incorporated by reference. Insertions of amino acids into loop structures within β-lactamase are especially preferred.
In some embodiments, for example if active β-lactamase enzymatic activity is undesirable in mammalian cells or in bacteria used to test the libraries, for instance because of toxicity to cells or interference with specific functional assays, or to provide an alternative scaffold, the β-lactamase libraries are made using β-lactamase inactivated by site-specific mutations. In the class A β-lactamase PER-1, for example, ala164 would be replaced by arg, or glu166 replaced by ala (see Bouthers et al., Biochem. J. 330:1443-9 (1998)). Likewise, in the TEM-1 β-lactamase, the active site ser70 or glu166 is replaced with ala (Adachi et al., J. Biol. Chem. 266:3186-91 (1991)). In the class A β-lactamase from B. licheniformis, glu166 could be replaced with ala (Knox et al., Protein Eng. 6:11-18 (1993)). As will be appreciated by those in the art, inactive yet folded scaffold proteins, including β-lactamase, may be used.
Active mutants of β-lactamase which are more stable than the wild type enzyme are also preferred as library scaffolds for loop-insert libraries. These mutants can have the advantage that their extra stability enhances the folding of library members with particularly destabilizing random library sequences. Examples of such mutants include E104K and E240K (Raquet et al., Proteins 23:63-72 (1995)). Alternatively, the mutation M182T, which is a global suppressor of missense mutations (Huang and Palzkill, Proc. Natl. Acad. Sci. U.S.A. 94:8801-6 (1997)), may also be included in the scaffold to suppress folding or stability defects arising in some library members. Again, such reasoning may apply not only to β-lactamase, but to all other enzymes or proteins disclosed herein.
In a preferred embodiment, a derivative of β-lactamase is used as a scaffold protein: N-terminus-BLA-C-terminus, comprising residues 26-290 of E. coli TEM-1 β-lactamase, or similar residues of Staphylococcus aureus or other β-lactamases (e.g., see Figures 5A, 5B, and 6).
In a preferred embodiment, for optimal constraint of a random peptide library, the main site of insertion includes insertion of random amino acids (optionally with linkers and other fusion partners as outlined below) in relatively mobile loops which are not close to the active site of the enzyme. Figure 6 shows a model of β-lactamase depicting the most immobile and mobile regions.
In a preferred embodiment, a preferred loop for insertion of peptide libraries is the loop including I84-D85-A86-G87-Q88-E89 (termed "β-lactamase loop 1" herein), which connects a helix at its N-terminus and an irregular region at its C-terminus. This loop is different from the loops described by Legendre et al. (Nature Biotechnology 17:67-72 (1999)), who specifically selected loops near or affecting the active site to modulate enzyme activity. Here no attenuation of activity is intended or desired.
As outlined above for GFP, one or more loop residues may be replaced or alternatively the insert may be between two residues. In one embodiment, I84, D85 and E89 are fixed in the library since the side chains of each appear to interact with the rest of the β-lactamase structure, although this is not required. Q88 may also optionally be fixed. A86 and G87 may be replaced, for example with random residues or with random residues flanked by linker residues.
As is further described below, linker amino acids on one or both sides may comprise 2, 3, 4, or more glycines, in order to provide a flexible region between the random library and the rest of the protein. However, as will be appreciated by those in the art, if the loop is mobile enough the linker may not need any glycines. The presence of multiple glycines at least partly conformationally decouples the library from the rest of the protein, enhancing the chances that the library members fold and create active β-lactamase.
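The loop 1 design described above (I84, D85, Q88 and E89 fixed; A86 and G87 replaced by random residues flanked by glycine linkers) can be sketched as a small sequence generator. This Python example is illustrative only; the function name, the default linker length, and the use of a uniform random choice over the 20 natural amino acids are assumptions, not details taken from the text.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 natural L-amino acids


def loop1_library_member(n_random, n_gly=2, rng=None):
    """Return one hypothetical beta-lactamase loop 1 segment (residues 84-89).

    I84 and D85 are kept, A86-G87 are replaced by
    Gly-linker + random residues + Gly-linker, and Q88 and E89 are kept.
    """
    rng = rng or random.Random()
    peptide = "".join(rng.choice(AMINO_ACIDS) for _ in range(n_random))
    linker = "G" * n_gly
    return "ID" + linker + peptide + linker + "QE"
```

Setting n_gly to 0 corresponds to the case where the loop is mobile enough that no glycines are needed.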
In another preferred embodiment, random residues are inserted into alternate loop sites; again, linkers and other fusion partners may optionally be used. Preferred embodiments utilize at least one glycine linker on either side of the random insert to allow a high percentage of β-lactamase-random [...] to the relative immobility of the backbone and some of the side chains of the loop.
In a preferred embodiment, loop residues can be replaced or inserted into at positions D254-G255-K256 ("β-lactamase loop 2"), again with optional linkers, preferably glycine residues, and other fusion partners. In this loop, replacement of the three residues is preferred.
In a preferred embodiment, loop residues can be replaced or inserted into at positions A227-G228 ("β-lactamase loop 3"), again with optional linkers, preferably glycine residues, and other fusion partners. In this loop, replacement of the two residues is preferred. In some backbones, such as the Bacillus licheniformis (PDB structure 4BLM) protein, K255-G256-D257 is the loop of choice.
In a preferred embodiment, loop residues can be replaced or inserted into at positions N52-S53 ("β-lactamase loop 4"), again with optional linkers, preferably glycine residues, and other fusion partners. In this loop, replacement of the two residues is preferred. In some backbones, such as the Bacillus licheniformis (PDB structure 4BLM) protein, G52-T53-N54 is the loop of choice.
In a preferred embodiment, the random peptide library is fused to the N- or C-terminus of β-lactamase. This optimizes the chances that the scaffold folds well and independently of the sequence of the random peptide library. Such a library with an alpha-helical bias is used, e.g., for binding to proteins with binding sites preferring alpha helices, such as leucine zipper proteins, coiled coils, or helical bundles. These helices also act by displacing an existing helix in one of the above structures. To create a bias for a helical structure, the random peptide sequences (chosen from all 20 natural L-amino acids) are fused to the end of a helix which is already nucleated, i.e., which is stable within the native structure and has at least several turns. This can be accomplished by fusion directly to the C-terminal or N-terminal residues of the selected β-lactamases, since both of these termini are extended alpha helices. In another preferred embodiment the library is strongly biased to an alpha helical structure. In this case the random peptide residues would be composed only of relatively strong helix formers, including M, K, E, A, F, L, R, D, Q, I, or V [e.g., see Lyu et al., Science 250(4981):669-673 (1990); O'Neil and DeGrado, Science 250(4981):646-651 (1990)].
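The difference between an unbiased library (all 20 natural amino acids) and a strongly helix-biased library (only the helix formers listed above) can be illustrated with a short generator. This Python sketch is an assumption-laden illustration: the function names are hypothetical and residues are drawn uniformly, which the text does not specify.

```python
import random

ALL_20 = "ACDEFGHIKLMNPQRSTVWY"
HELIX_FORMERS = "MKEAFLRDQIV"  # strong helix formers listed in the text


def random_peptide(length, helix_biased=False, rng=None):
    """Draw one random peptide from the full or the helix-biased alphabet."""
    rng = rng or random.Random()
    alphabet = HELIX_FORMERS if helix_biased else ALL_20
    return "".join(rng.choice(alphabet) for _ in range(length))


def is_helix_biased(peptide):
    """True if every residue is one of the listed helix formers."""
    return all(aa in HELIX_FORMERS for aa in peptide)


def library_size(length, helix_biased=False):
    """Number of distinct peptide sequences of the given length."""
    return (len(HELIX_FORMERS) if helix_biased else len(ALL_20)) ** length
```

Restricting the alphabet to 11 residues reduces the theoretical sequence space from 20 to the power n down to 11 to the power n for a peptide of n residues.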
In another preferred embodiment, mutants of β-lactamase are used which include substitutions of P27 in the TEM-1 truncated sequence with any helix-forming amino acid, such as M, K, E, A, F, L, R, D, Q, I, or V.
In a preferred embodiment, the random peptide library is fused to the C-terminus of β-lactamase and the resulting library has the following schematic structure: "N-terminus-BLA-C-terminus-spacer residues-random peptide library-(+/- optional C-cap residues)".
In another preferred embodiment, the random peptide library is fused to the N-terminus of β-lactamase and the resulting library has the following schematic structure: "(+/- optional N-cap residues)-random peptide library-spacer residues-N-terminus-BLA-C-terminus". For cellular expression the first residue would be the strong helix former M.
In a preferred embodiment, 1, 2, 3, 4, 5 or more spacer residues may be inserted between the β-lactamase structure and the random peptide library. In the case of a helix-biased library these spacers may all be strong helix formers, such as M, K, E, A, F, L, R, D, Q, I, or V, in any combination, or in particular sequences such that L and E are 3-4 residues apart, allowing a side chain salt bridge to further stabilize the helix. The spacers may be charged, so that they would be less likely to be inserted into the interior of the β-lactamase structure.
In a preferred embodiment, the spacer sequence may be KLEALEG, which would bias the sequence to form an alpha helix and interact in a parallel coiled-coil fashion with a helix in a target protein [Monera et al., J. Biol. Chem. 268:19218 (1993)].
In another preferred embodiment, the spacer sequence for β-lactamase C-terminal helix biased libraries may be EEAAKA. Combined with C-terminal wild type sequence -KHW290 from E. coli TEM-1 β-lactamase, this would give -KHW290E291E292A293A294K295A296. E291 would be in a position to form an i, i+4 salt bridge with K295, and E292 could form a similar salt bridge with K288. This would stabilize an alpha helix. A293A294K295A296 would form an AXXA motif allowing insertion of a Sfi-I restriction site in the DNA encoding this region, thereby allowing the cloning of random peptide libraries onto the C-terminus of β-lactamase. In another preferred embodiment, the spacer sequence includes the sequence A292E293K294A295K296A297E298, which would also allow two i, i+4 salt bridges.
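The i, i+4 salt bridges named for the EEAAKA spacer can be checked by scanning the sequence for oppositely charged residues four positions apart. The Python sketch below is illustrative only: the function name is hypothetical, and treating only D/E as acidic and K/R as basic is a simplifying assumption (histidine is ignored).

```python
ACIDIC = set("DE")
BASIC = set("KR")


def i_plus_4_salt_bridges(sequence, first_position):
    """List (i, i+4) residue-number pairs with opposite side-chain charges.

    `first_position` is the residue number of sequence[0].
    """
    pairs = []
    for i in range(len(sequence) - 4):
        a, b = sequence[i], sequence[i + 4]
        if (a in ACIDIC and b in BASIC) or (a in BASIC and b in ACIDIC):
            pairs.append((first_position + i, first_position + i + 4))
    return pairs


# Wild type C-terminal K288-H289-W290 followed by the EEAAKA spacer
# (residues 291-296), as given in the text.
tail = "KHW" + "EEAAKA"
bridges = i_plus_4_salt_bridges(tail, 288)
print(bridges)  # [(288, 292), (291, 295)]
```

The two pairs reported correspond to the K288/E292 and E291/K295 salt bridges described above.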
In a preferred embodiment, the scaffold protein is luciferase. The bioluminescent reaction catalyzed by luciferase requires luciferin, ATP, magnesium, and molecular O2. Mixing these components results in a rapidly decaying flash of light which is detected, e.g., by using a luminometer.
In a preferred embodiment, the reporter protein is firefly luciferase [de Wet et al., Mol. Cell. Biol. 7:725-737 (1987); Yang and Thomason, supra; Bronstein et al., supra]. Firefly luciferase can also be detected in live cells when soluble luciferase substrates capable of crossing the plasma membrane are employed (Bronstein et al., supra). The use of firefly luciferase is especially preferred because there is only minimal endogenous activity in mammalian cells. Luciferases have been cloned from various species and the nucleotide sequences are available (e.g., see GenBank accession numbers E08320, E05448, D25416, S61961, U51019, M15077, L39928, L39929, AF085332, U89490, U31240, M10961, M65067, M62917, M25666, M63501, M55977, U03687, and M26194).
In a preferred embodiment, the scaffold protein is Renilla reniformis luciferase. Renilla luciferase, DNA encoding Renilla luciferase, and use of the Renilla reniformis DNA to produce recombinant luciferase, as well as DNA encoding luciferase from other coelenterates, are well known in the art and are available [see, e.g., SEQ ID No. 1, U.S. Patent Nos. 5,418,155 and 5,292,658; see also Prasher et al., Biochem. Biophys. Res. Commun. 126:1259-1268 (1985); Cormier, "Renilla and Aequorea bioluminescence" in Bioluminescence and Chemiluminescence, pp. 225-233 (1981); Charbonneau et al., J. Biol. Chem. 254:769-780 (1979); Ward et al., J. Biol. Chem. 254:781-788 (1979); Lorenz et al., Proc. Natl. Acad. Sci. U.S.A. 88:4438-4442 (1981); Hori et al., Proc. Natl. Acad. Sci. U.S.A. 74:4285-4287 (1977); Hori et al., Biochemistry 134:2371-2376 (1975); Inouye et al., Jap. Soc. Chem. Lett. 141-144 (1975); and Matthews et al., Biochemistry 16:85-91 (1979)].
As above, fusion proteins comprising luciferase and peptide libraries may be made at the N-terminus, the C-terminus, or both, or one or more internal fusions can be utilized, in combination or alone. The site of fusion may be determined based on the structures of firefly luciferase [Franks et al., Biophys. J. 75(5):2205-11 (1998); Conti et al., Structure 4(3):287-98 (1996)] or bacterial luciferase [Fisher et al., Biochemistry 34(20):6581-6 (1995); Fisher et al., J. Biol. Chem. 271(36):21956-68 (1996); Tanner et al., Biochemistry 36(4):665-72 (1997); and Thoden et al., Protein Sci. 6(1):13-23 (1997)], which have been determined. Insertions of amino acids into loop structures within luciferase are especially preferred.
In a preferred embodiment, the scaffold protein is β-galactosidase (Alam and Cook, supra; Bronstein et al., supra). β-galactosidase, encoded by the lacZ gene from E. coli, is one of the most versatile genetic reporters and allows both in vitro and in vivo applications. In addition to the E. coli lacZ gene, lacZ genes have been cloned from various species and the nucleotide sequences are available (e.g., see GenBank accession numbers J01636, AB025433, AF073995, U62625, and M57579). The enzyme catalyzes the hydrolysis of several β-galactosides (e.g., Young et al., supra) and is employed in colorimetric assays, e.g., using o-nitrophenyl-β-D-galactopyranoside (ONPG), in chemiluminescent assays based on chemiluminescence of indole [Arakawa et al., J. Biolumin. Chemilumin. 13(6):349-54 (1998)], and in fluorometric assays using, e.g., 4-methylumbelliferyl-β-D-galactoside (MUG) and derivatives thereof, such as 6,8-difluoro-4-methylumbelliferyl-β-D-galactopyranoside [DiFMUG; Gee et al., Anal. Biochem. 273(1):41-8 (1999)]. Further, the development of chemiluminescent 1,2-dioxetane substrates has greatly improved the sensitivity of detection of enzyme activity. When a luminometer is used to detect the chemiluminescent signal, the assay is 50,000-fold more sensitive than a colorimetric assay. The assay may also be enhanced by employing assay conditions that minimize endogenous enzyme activities contributed by eukaryotic β-galactosidases (Young et al., supra).
In a preferred embodiment, as for all the scaffolds, β-galactosidase is used in in vivo assays. In vivo assays can be performed in prokaryotic and eukaryotic cells, in tissue sections and intact embryos, and include staining with the precipitating substrate X-gal (Alam and Cook, supra). Further, bioluminescence assays in live cells are employed using fluorescein di-β-D-galactopyranoside (FDG; Bronstein et al., supra). Cells expressing an enzymatically active form of β-galactosidase are detected via fluorescence from the fluorescein moiety of the metabolized substrate.
As above, N-terminal, C-terminal, dual N- and C-terminal and one or more internal fusions, either separately or in combination, are all contemplated. The site of fusion may be determined based on the structure of β-galactosidase, which has been determined [e.g., see Pearl et al., J. Mol. Biol. 229(2):561-3 (1993); Jacobson et al., Nature 369(6483):761-6 (1994); and Jacobson and Matthews, J. Mol. Biol. 223(4):1177-82 (1992)]. Insertions of amino acids into loop structures within β-galactosidase are especially preferred.
In a preferred embodiment, the reporter protein is chloramphenicol acetyltransferase [CAT; Gorman et al., Mol. Cell. Biol. 2:1044-1051 (1982)]. This enzyme catalyzes the transfer of acetyl groups from acetyl-coenzyme A to chloramphenicol. Using CAT as a reporter has the advantages of (i) minimal endogenous activity in mammalian cells, (ii) stable protein expression and (iii) the availability of various assay formats. The CAT gene has been cloned from various species and the nucleotide sequences are available (e.g., see GenBank accession numbers AF031037, S48276, X74948, X02872, and M58472).
It is an object of the instant application to fuse amino acid sequences to chloramphenicol acetyltransferase. N-terminal, C-terminal, dual N- and C-terminal and one or more internal fusions are all contemplated. The site of fusion may be determined based on the structure of chloramphenicol acetyltransferase, which has been determined [e.g., see Leslie et al., Proc. Natl. Acad. Sci. U.S.A. 85(12):4133-7 (1988); Lewendon et al., Biochemistry 27(19):7385-90 (1988); and Leslie, J. Mol. Biol. 213(1):167-86 (1990)]. Insertions of amino acids into loop structures within chloramphenicol acetyltransferase are especially preferred.
In a preferred embodiment, the indirectly detectable protein is a DNA-binding protein which can bind to a DNA binding site and activate transcription of an operably linked reporter gene. The reporter gene can be any of the detectable genes, such as green fluorescent protein, or any of the survival genes, outlined herein. The DNA binding site(s) to which the DNA binding protein binds is (are) placed proximal to a basal promoter that contains sequences required for recognition by the basic transcription machinery (e.g., RNA polymerase II). The promoter controls expression of a reporter gene. Following introduction of this chimeric reporter construct into an appropriate cell, an increase of the reporter gene product provides an indication that the DNA binding protein bound to its DNA binding site and activated transcription. Preferably, in the absence of the DNA binding protein, no reporter gene product is made. Alternatively, a low basal level of reporter gene product may be tolerated in the case when a strong increase in reporter gene product is observed upon the addition of the DNA binding protein, or the DNA binding protein encoding gene. It is well known in the art to generate vectors comprising DNA binding site(s) for a DNA binding protein to be analyzed, promoter sequences and reporter genes.
In a preferred embodiment, the DNA-binding protein is a cell type specific DNA binding protein which can bind to a nucleic acid binding site within a promoter region to which endogenous proteins do not bind at all or bind very weakly. These cell type specific DNA-binding proteins comprise transcriptional activators, such as Oct-2 [Mueller et al., Nature 336(6199):544-51 (1988)], which, e.g., is expressed in lymphoid cells and not in fibroblast cells. Expression of this DNA binding protein in HeLa cells, which usually do not express this protein, is sufficient for a strong transcriptional activation of B-cell specific promoters comprising a DNA binding site for Oct-2 (Mueller et al., supra).
In a preferred embodiment, the indirectly detectable protein is a DNA-binding/transcription activator fusion protein which can bind to a DNA binding site and activate transcription of an operably linked reporter gene. Briefly, transcription can be activated through the use of two functional domains of a transcription activator protein: a domain or sequence of amino acids that recognizes and binds to a nucleic acid sequence, i.e. a nucleic acid binding domain, and a domain or sequence of amino acids that will activate transcription when brought into proximity to the target sequence. Thus the transcriptional activation domain is thought to function by contacting other proteins required in transcription, essentially bringing in the machinery of transcription. It must be localized at the target gene by the nucleic acid binding domain, which putatively functions by positioning the transcriptional activation domain at the transcriptional complex of the target gene.
The DNA binding domain and the transcriptional activator domain can be either from the same transcriptional activator protein, or can be from different proteins (see McKnight et al., Proc. Natl. Acad. Sci. USA 89:7061 (1987); Ghosh et al., J. Mol. Biol. 234(3):610-619 (1993); and Curran et al., 55:395 (1988)). A variety of transcriptional activator proteins comprising an activation domain and a DNA binding domain are known in the art.
In a preferred embodiment the DNA-binding/transcription activator fusion protein is a tetracycline repressor protein (TetR)-VP16 fusion protein. This bipartite fusion protein consists of a DNA binding domain (TetR) and a transcription activation domain (VP16). TetR binds with high specificity to the tetracycline operator sequence (tetO). The VP16 domain is capable of activating gene expression of a gene of interest, provided that it is recruited to a functional promoter. Employing a tetracycline repressor protein (TetR)-VP16 fusion protein, a suitable eukaryotic expression system which can be tightly controlled by the addition or omission of tetracycline or doxycycline has been described [Gossen and Bujard, Proc. Natl. Acad. Sci. U.S.A. 89:5547-5551; Gossen et al., Science 268:1766-1769 (1995)].
It is an object of the instant application to fuse amino acid sequences to DNA-binding/transcription activator proteins and/or to DNA-binding/transcription activator fusion proteins. N-terminal, C-terminal, dual N- and C-terminal and one or more internal fusions are all contemplated. The site of fusion may be determined based on the structures of DNA-binding/transcription activator fusion proteins, which have been determined [e.g., TetR, see Orth et al., J. Mol. Biol. 285(2):455-61 (1999); Orth et al., J. Mol. Biol. 279(2):439-47 (1998); Hinrichs et al., Science 264(5157):418-20 (1994); and Kisker et al., J. Mol. Biol. 247(2):260-80 (1995)]. Insertions of amino acids into loop structures within DNA-binding/transcription activator fusion proteins are especially preferred.
In another preferred embodiment the am o acids (= random peptides) are inserted at or close to the fusion site of the DNA binding domain and the transcription activator domain In this embodiment, a dual scaffold protein is used to present the random peptide library The random peptide library is such flanked by a scaffold protein representing the DNA binding domain and a scaffold protein representing the transcription activation domain The random peptide library thus is inserted between the C-terminus of the DNA binding domain and the N-terminus of the transcription activation domain or vice versa Linker sequences separating the random peptides from the DNA binding domain and transcription activation domain are optional As indicated by the employment of DNA-binding/transcnption activator fusion proteins in protein protein interaction screening protocols (e g see Fields et al , Nature 340 245 (1989), Vasavada et al , Proc Natl Acad Sci U S A 88 10686 (1991 ), Fearon et al , Proc Natl Acad Sci U S A 89 7958 (1992), Dang et al , Mol Cell Biol 11 954 (1991 ), Chien et al , Proc Natl Acad Sci U S A 88 9578 (1991 ), and U S Patent Nos 5,283,173, 5,667,973, 5,468,614, 5,525,490, and 5,637,463), there is usually significant freedom of ammo acid insertion (e g , a component of a test library) to the DNA binding domain without perturbing either DNA binding or transcription activation
In a preferred embodiment, the invention provides a composition, comprising (i) a nucleic acid binding site, to which a DNA-binding/transcription activator and/or a DNA binding domain/transcription activator fusion protein can bind, said nucleic acid binding site being operably linked to a reporter gene, (ii) a reporter gene, and (iii) a DNA-binding/transcription activator and/or a DNA binding domain/transcription activator fusion protein which may be encoded by a nucleic acid.
In a preferred embodiment, the scaffold protein is a survival protein. By "survival protein", "selection protein" or grammatical equivalents herein is meant a protein without which the cell cannot survive, such as drug resistance genes. As described herein, the cell usually does not naturally contain an active form of the survival protein which is used as a scaffold protein. As further described herein, the cell usually comprises a survival gene that encodes the survival protein. The expression of a survival protein is usually not quantified in terms of protein activity, but rather recognized by conferring a characteristic phenotype onto a cell which comprises the respective survival gene or selection gene. Such survival genes may provide resistance to a selection agent (i.e., an antibiotic) to preferentially select only those cells which contain and express the respective survival gene. The variety of survival genes is quite broad and continues to grow (for review see Kriegler, Gene Transfer and Expression: A Laboratory Manual, W.H. Freeman and Company, New York, 1990). Typically, the DNA containing the resistance-conferring phenotype is transfected into a cell and subsequently the cell is treated with media containing the concentration of drug appropriate for the selective survival and expansion of the transfected and now drug-resistant cells.
Selection agents such as ampicillin, kanamycin and tetracycline have been widely used for selection procedures in prokaryotes [e.g., see Waxman and Strominger, Annu. Rev. Biochem. 52:825-69 (1983), Davies and Smith, Annu. Rev. Microbiol. 32:469-518 (1978), and Franklin, Biochem. J. 105(1):371-8 (1967)]. Suitable selection agents for the selection of eukaryotic cells include, but are not limited to, blasticidin [Izumi et al., Exp. Cell Res. 197(2):229-33 (1991), Kimura et al., Biochim. Biophys. Acta 1219(3):653-9 (1994), Kimura et al., Mol. Gen. Genet. 242(2):121-9 (1994)], histidinol D [Hartman and Mulligan, Proc. Natl. Acad. Sci. U.S.A. 85(21):8047-51 (1988)], hygromycin [Gritz and Davies, Gene 25(2-3):179-88 (1983), Sorensen et al., Gene 112(2):257-60 (1992)], neomycin [Davies and Jimenez, Am. J. Trop. Med. Hyg. 29(5 Suppl):1089-92 (1980), Southern and Berg, J. Mol. Appl. Genet. 1(4):327-41 (1982)], puromycin [de la Luna et al., Gene 62(1):121-6 (1988)] and bleomycin/phleomycin/zeocin antibiotics [Mulsant et al., Somat. Cell Mol. Genet. 14(3):243-52 (1988)].
Survival genes encoding enzymes mediating such a drug-resistant phenotype and protocols for their use are known in the art (see Kriegler, supra). Suitable survival genes include, but are not limited to, thymidine kinase [TK, Wigler et al., Cell 11:233 (1977)], adenine phosphoribosyltransferase [APRT, Lowry et al., Cell 22:817 (1980), Murray et al., Gene 31:233 (1984), Stambrook et al., Som. Cell Mol. Genet. 4:359 (1982)], hypoxanthine-guanine phosphoribosyltransferase [HGPRT, Jolly et al., Proc. Natl. Acad. Sci. U.S.A. 80:477 (1983)], dihydrofolate reductase [DHFR, Subramani et al., Mol. Cell. Biol. 1:854 (1985), Kaufman and Sharp, J. Mol. Biol. 159:601 (1982), Simonsen and Levinson, Proc. Natl. Acad. Sci. U.S.A. 80:2495 (1983)], aspartate transcarbamylase [Ruiz and Wahl, Mol. Cell. Biol. 6:3050 (1986)], ornithine decarboxylase [Chiang and McConlogue, Mol. Cell. Biol. 8:764 (1988)], aminoglycoside phosphotransferase [Southern and Berg, Mol. Appl. Gen. 1:327 (1982), Davies and Jimenez, supra], hygromycin-B-phosphotransferase [Gritz and Davies, supra, Sugden et al., Mol. Cell. Biol. 5:410 (1985), Palmer et al., Proc. Natl. Acad. Sci. U.S.A. 84:1055 (1987)], xanthine-guanine phosphoribosyltransferase [Mulligan and Berg, Proc. Natl. Acad. Sci. U.S.A. 78:2072 (1981)], tryptophan synthetase [Hartman and Mulligan, Proc. Natl. Acad. Sci. U.S.A. 85:8047 (1988)], histidinol dehydrogenase (Hartman and Mulligan, supra), multiple drug resistance biochemical marker [Kane et al., Mol. Cell. Biol. 8:3316 (1988), Choi et al., Cell 53:519 (1988)], blasticidin S deaminase [Izumi et al., Exp. Cell Res. 197(2):229-33 (1991)], bleomycin hydrolase [Mulsant et al., supra], and puromycin-N-acetyl-transferase [Lacalle et al., Gene 79(2):375-80 (1989)].
In a preferred embodiment, the survival protein is thymidine kinase [TK, Wigler et al., Cell 11:233 (1977)]. TK is encoded by the HSV or vaccinia virus tk genes. When transferred into a TK- cell, these genes confer resistance to HAT medium, a medium supplemented with hypoxanthine, aminopterin and thymidine. TKs have been cloned from various species and the nucleotide sequences are available (e.g., see GenBank accession numbers M29943, M29942, M29941 and K02611).
It is an object of the instant application to fuse amino acid sequences to thymidine kinase. N-terminal, C-terminal, dual N- and C-terminal and one or more internal fusions are all contemplated. The site of fusion may be determined based on the structures of HSV thymidine kinase, which have been determined [e.g., see Bennett et al., FEBS Lett. 443(2):121-5 (1999), Champness et al., Proteins 32(3):350-61 (1998), and Brown et al., Nat. Struct. Biol. 2(10):876-81 (1995)]. Insertions of amino acids into loop structures within thymidine kinase are especially preferred.
In another preferred embodiment, the survival protein is adenine phosphoribosyltransferase [APRT, Lowry et al., Cell 22:817 (1980), Murray et al., Gene 31:233 (1984), Stambrook et al., Som. Cell Mol. Genet. 4:359 (1982)]. When transferred into APRT- cells, the gene encoding APRT confers resistance to complete medium supplemented with azaserine, adenine and alanosine. APRT genes have been cloned from various species, including human, and the nucleotide sequences are available (e.g., see GenBank accession numbers L25411, AF060886, X58640, U16781, U22442, U28961, L06280, M16446, L04970, and M11310).
It is an object of the instant application to fuse amino acid sequences to adenine phosphoribosyltransferase. N-terminal, C-terminal, dual N- and C-terminal and one or more internal fusions are all contemplated. The site of fusion may be determined based on the structure of adenine phosphoribosyltransferase from Leishmania donovani, which has been determined [Phillips et al., EMBO J. 18(13):3533-45 (1999)]. Insertions of amino acids into loop structures within adenine phosphoribosyltransferase are especially preferred.
In a preferred embodiment, the survival protein is hypoxanthine-guanine phosphoribosyltransferase [HGPRT, Jolly et al., Proc. Natl. Acad. Sci. U.S.A. 80:477 (1983)]. When transferred into HGPRT-, APRT- cells, the gene encoding HGPRT confers resistance to HAT medium. HGPRT genes have been cloned from various species, including human, and the nucleotide sequences are available (e.g., see GenBank accession numbers AF170105, AFΘ6-1748, L07486HΘΘ423, M86443, J θθ6θ and-M2643).
It is an object of the instant application to fuse amino acid sequences to hypoxanthine-guanine phosphoribosyltransferase. N-terminal, C-terminal, dual N- and C-terminal and one or more internal fusions are all contemplated. The site of fusion may be determined based on the structures of human hypoxanthine-guanine phosphoribosyltransferase, which have been determined [Shi et al., Nat. Struct. Biol. 6(6):588-93 (1999), Eads et al., Cell 78(2):325-34 (1994)]. Insertions of amino acids into loop structures within hypoxanthine-guanine phosphoribosyltransferase are especially preferred.
In a preferred embodiment, the survival protein is dihydrofolate reductase (DHFR), which is encoded by the dhfr gene [Subramani et al., Mol. Cell. Biol. 1:854 (1985), Kaufman and Sharp, J. Mol. Biol. 159:601 (1982), Simonsen and Levinson, Proc. Natl. Acad. Sci. U.S.A. 80:2495 (1983)]. When transferred into DHFR- cells, the gene encoding DHFR confers resistance to medium containing methotrexate. DHFR genes have been cloned from various species, including human, and the nucleotide sequences are available (e.g., see GenBank accession numbers NM_000791, J01609, J00140, L26316, and M37124).
It is an object of the instant application to fuse amino acid sequences to dihydrofolate reductases. N-terminal, C-terminal, dual N- and C-terminal and one or more internal fusions are all contemplated. The site of fusion may be determined based on the structures of human and E. coli dihydrofolate reductases, which have been determined [Cody et al., Biochemistry 36(45):13897-903 (1997), Chunduru et al., J. Biol. Chem. 269(13):9547-55 (1994), Lewis et al., J. Biol. Chem. 270(10):5057-64 (1995), Sawaya et al., Biochemistry 36(3):586-603 (1997), Reyes et al., Biochemistry 34(8):2710-23 (1995)]. Insertions of amino acids into loop structures within dihydrofolate reductases are especially preferred.
In a preferred embodiment, the survival protein is aspartate transcarbamylase. Aspartate transcarbamylase is encoded by pyrB [Ruiz and Wahl, Mol. Cell. Biol. 6:3050 (1986)]. When transferred to CHO D20 (UrdA mutant, deficient in the first three enzymatic activities of de novo uridine biosynthesis: carbamyl phosphate synthetase, aspartate transcarbamylase, and dihydroorotase), the gene encoding this protein confers resistance to Ham F-12 medium (minus uridine). Aspartate transcarbamylase genes have been cloned from various species, including human, and the nucleotide sequences are available (e.g., see GenBank accession numbers U61765, M38561, J04711, M60508, and M13128).
It is an object of the instant application to fuse amino acid sequences to aspartate transcarbamylase. N-terminal, C-terminal, dual N- and C-terminal and one or more internal fusions are all contemplated. The site of fusion may be determined based on the structure of E. coli aspartate transcarbamylase, which has been determined [Kantrowitz and Lipscomb, Science 241(4866):669-74 (1988)]. Insertions of amino acids into loop structures within aspartate transcarbamylase are especially preferred.
In a preferred embodiment, the survival protein is ornithine decarboxylase. Ornithine decarboxylase is encoded by the odc gene [Chiang and McConlogue, Mol. Cell. Biol. 8:764 (1988)]. When transferred into CHO C55.7 cells (ODC-), the gene encoding this protein confers resistance to medium lacking putrescine. ODC genes have been cloned from various species, including human, and the nucleotide sequences are available (e.g., see GenBank accession numbers U36394, AF016891, AF012551, U03059, J04792, and M34158).
It is an object of the instant application to fuse amino acid sequences to ornithine decarboxylase. N-terminal, C-terminal, dual N- and C-terminal and one or more internal fusions are all contemplated.
In a preferred embodiment, the survival protein is aminoglycoside phosphotransferase, which is encoded by the aph gene [Southern and Berg, Mol. Appl. Gen. 1:327 (1982), Davies and Jimenez, supra]. When transferred into almost any cell, this dominant selectable gene confers resistance to G418 (neomycin, geneticin). Aminoglycoside phosphotransferase encoding genes have been cloned and used widely as a selectable marker on various vectors (e.g., see GenBank accession numbers Z48231, M22126, U75992, AF072538, and U04894).
It is an object of the instant application to fuse amino acid sequences to aminoglycoside phosphotransferase. N-terminal, C-terminal, dual N- and C-terminal and one or more internal fusions are all contemplated.
In a preferred embodiment, the survival protein is hygromycin-B-phosphotransferase, which is encoded by the hph gene [Gritz and Davies, supra, Sugden et al., Mol. Cell. Biol. 5:410 (1985), Palmer et al., Proc. Natl. Acad. Sci. U.S.A. 84:1055 (1987)]. When transferred into almost any cell, this dominant selectable gene confers resistance to hygromycin-B. The hygromycin-B-phosphotransferase encoding gene has been cloned and used widely as a selectable marker on various vectors (e.g., see GenBank accession numbers AF025747, L76273, and K01193).
It is an object of the instant application to fuse amino acid sequences to hygromycin-B-phosphotransferase. N-terminal, C-terminal, dual N- and C-terminal and one or more internal fusions are all contemplated.
In another preferred embodiment, the survival protein is xanthine-guanine phosphoribosyltransferase, which is encoded by the gpt gene [Mulligan and Berg, Proc. Natl. Acad. Sci. U.S.A. 78:2072 (1981)]. When transferred into almost any cell, this dominant selectable gene confers resistance to XMAT medium, comprising xanthine, hypoxanthine, thymidine, aminopterin, mycophenolic acid and L-glutamine. The xanthine-guanine phosphoribosyltransferase encoding gene has been cloned and the nucleotide sequences are available (e.g., see GenBank accession numbers U28239 and M15035).
It is an object of the instant application to fuse amino acid sequences to xanthine-guanine phosphoribosyltransferase. N-terminal, C-terminal, dual N- and C-terminal and one or more internal fusions are all contemplated.
In another preferred embodiment, the survival protein is tryptophan synthetase, which is encoded by the trpB gene [Hartman and Mulligan, Proc. Natl. Acad. Sci. U.S.A. 85:8047 (1988)]. When transferred into almost any cell, this dominant selectable gene confers resistance to tryptophan-minus medium. Tryptophan synthetase encoding genes have been cloned and the nucleotide sequences are available (e.g., see GenBank accession numbers V00372, AF173835, V00365, M15826 and M32108).
It is an object of the instant application to fuse amino acid sequences to tryptophan synthetase. N-terminal, C-terminal, dual N- and C-terminal and one or more internal fusions are all contemplated. The site of fusion may be determined based on the structure of tryptophan synthetase, which has been determined [e.g., see Rhee et al., Biochemistry 36(25):7664-80 (1997), Hyde et al., J. Biol. Chem. 263(33):17857-71 (1988)]. Insertions of amino acids into loop structures within tryptophan synthetase are especially preferred.
In a further preferred embodiment, the survival protein is histidinol dehydrogenase, which is encoded by the hisD gene [Hartman and Mulligan, Proc. Natl. Acad. Sci. U.S.A. 85:8047 (1988)]. When transferred into almost any cell, this dominant selectable gene confers resistance to media comprising histidinol. Histidinol dehydrogenase encoding genes have been cloned and the nucleotide sequences are available (e.g., see GenBank accession numbers AB013080, U82227, J01804, and M60466).
It is an object of the instant application to fuse amino acid sequences to histidinol dehydrogenase. N-terminal, C-terminal, dual N- and C-terminal and one or more internal fusions are all contemplated.
In another preferred embodiment, the survival protein is the multiple drug resistance biochemical marker, which is encoded by the mdr1 gene [Kane et al., Mol. Cell. Biol. 8:3316 (1988), Choi et al., Cell 53:519 (1988)]. When transferred into almost any cell, this dominant selectable gene confers resistance to media comprising colchicine. MDR1 genes have been cloned from various species, including human, and the nucleotide sequences are available (e.g., see GenBank accession numbers U62928, U62930, AJ227752, U62931, AF016535 and J03398).
It is an object of the instant application to fuse amino acid sequences to MDR1. N-terminal, C-terminal, dual N- and C-terminal and one or more internal fusions are all contemplated.
In another preferred embodiment, the survival protein is blasticidin S deaminase, which is encoded by the bsr gene [Izumi et al., Exp. Cell Res. 197(2):229-33 (1991)]. When transferred into almost any cell, this dominant selectable gene confers resistance to media comprising the antibiotic blasticidin S. Blasticidin S deaminase encoding genes have been cloned. They are used widely as a selectable marker on various vectors and the nucleotide sequences are available (e.g., see GenBank accession numbers D83710, U75992, and U75991).
It is an object of the instant application to fuse amino acid sequences to blasticidin S deaminase. N-terminal, C-terminal, dual N- and C-terminal and one or more internal fusions are all contemplated. The site of fusion may be determined based on the structure of Aspergillus terreus blasticidin S deaminase, which has been determined [Nakasako et al., Acta Crystallogr. D Biol. Crystallogr. 55(Pt 2):547-8 (1999)]. Insertions of amino acids into loop structures within blasticidin S deaminase are especially preferred.
In another preferred embodiment, the survival protein is bleomycin hydrolase, which is encoded by the ble gene [Mulsant et al., supra]. When transferred into almost any cell, this dominant selectable gene confers resistance to media comprising bleomycin, phleomycin or zeocin. Bleomycin hydrolase encoding genes have been cloned. They are used widely as a selectable marker on various vectors and the nucleotide sequences are available (e.g., see GenBank accession numbers L26954, L37442, and L36849).
It is an object of the instant application to fuse amino acid sequences to bleomycin hydrolase. N-terminal, C-terminal, dual N- and C-terminal and one or more internal fusions are all contemplated. The site of fusion may be determined based on the structures of yeast (Gal6) and human bleomycin hydrolase, which have been determined [Joshua-Tor et al., Science 269(5226):945-50 (1995), O'Farrell et al., Structure Fold. Des. 7(6):619-27 (1999)]. Insertions of amino acids into loop structures within bleomycin hydrolase are especially preferred.
In another preferred embodiment, the survival protein is puromycin-N-acetyl-transferase, which is encoded by the pac gene [Lacalle et al., Gene 79(2):375-80 (1989)]. When transferred into almost any cell, this dominant selectable gene confers resistance to media comprising puromycin. A puromycin-N-acetyl-transferase encoding gene has been cloned. It is used widely as a selectable marker on various vectors and the nucleotide sequences are available (e.g., see GenBank accession numbers Z75185 and M25346).
It is an object of the instant application to fuse amino acid sequences to puromycin-N-acetyl-transferase. N-terminal, C-terminal, dual N- and C-terminal and one or more internal fusions are all contemplated.
In another preferred embodiment, the scaffold protein is a structural protein. In this embodiment, the scaffold protein is generally not directly detectable, but is generally a small, stable, non-disulfide bond-containing protein.
In a preferred embodiment, the presentation scaffold significantly constrains the presented random peptides. The peptides will be conformationally pre-constrained, will have a diminished number of low energy conformers, and will thus lose less entropy when bound to a target binding partner (a macromolecule such as a protein, DNA, or other functional molecule present within or on the outside of a cell). Such constrained peptides may thus bind more tightly to a target molecule than unconstrained peptides. Likewise, constrained peptides may be less subject to intracellular catabolism than unconstrained peptides, especially by proteases. Different scaffolds may impart different biases to peptides depending on the insertion site of the random peptide libraries.
In a preferred embodiment, the scaffold comprises protease inhibitors belonging to the trypsin inhibitor I family, such as barley chymotrypsin inhibitor 2 (CI-2) and eglin C. Both of these proteins are small (83 and 64 residues, respectively), stable, and lack disulfide bonds, thus allowing their expression and folding in the cytoplasm of a mammalian cell without the complications of disulfide bond formation. Disulfide bond formation is difficult in the cytoplasm due to high levels of reduced glutathione and the presence of thioredoxin reductase. The folding mechanism of CI-2 has been studied in detail, implying a two-state process with the rate limiting step for two slow phases being proline isomerization [Jackson and Fersht, Biochemistry 30:10428-35 (1991)]. It has been shown to refold when split into two pieces, composed of residues 20-59 and 60-83, with the fragments associating to form a native-like structure with a Kd of 42 nM [de Prat Gay and Fersht, Biochemistry 33:7957-63 (1994)]. CI-2 blocks subtilisin BPN' with an inhibition constant of 2.9 pM [Longstaff et al., Biochemistry 29:7339-47 (1990)].
In a preferred embodiment, CI-2 and the similar protease inhibitor eglin-C are used as scaffolds for a small protein-embedded random peptide library. Since different intracellular targets demand bound peptides of different conformations, it is important to construct peptide libraries with different biases, as already outlined above. The crystal structure of CI-2 [see Figure 7 and McPhalen and James, Biochemistry 26:261-269 (1987)] allows the construction of a different random peptide library with an additional bias: a broad-based 20 Å constraint, with both ends fixed at this distance by the CI-2 scaffold. There are at least three random peptide library insertion sites that may result in libraries with useful properties. At each insertion site, the use of a varying number of inserted residues affects the conformational bias of the peptide library and thus creates a set of libraries.
In a preferred embodiment, the insertion site replaces the CI-2 inhibitor loop residues G54-R62 with 9 or more random amino acids. Inserting 9 random residues to replace the 9 existing residues in G54-R62 will bias the library to a broad-based semicircular loop, roughly 20 Å at its base. Inserting more residues will bias the library to more flexible peptides. Inserting correspondingly more residues in a slightly larger insertion site in this inhibitor loop, e.g., inserting 13 residues between 52 and 64, will create a library with a bias towards the top ca. 2/3 of a large ca. 18mer cyclic peptide. A library replacing all ~19 residues of this nearly circular loop (residues 49-67) will in effect mimic a large 19 residue cyclic peptide and thus would be different from any of the above libraries.
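The loop replacements described above can be sketched in a few lines of Python. This is a minimal illustration only: the scaffold string is a placeholder of 83 'X' characters and not the actual CI-2 sequence, the helper names `replace_loop` and `random_peptide` are hypothetical, and it is assumed that the residue numbers quoted above map directly onto 1-based positions in the scaffold string.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def replace_loop(scaffold, first, last, insert):
    """Replace residues first..last (1-based, inclusive) with `insert`."""
    if not (1 <= first <= last <= len(scaffold)):
        raise ValueError("residue range lies outside the scaffold")
    return scaffold[:first - 1] + insert + scaffold[last:]


def random_peptide(length, rng):
    """Draw one fully random peptide of the given length."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


# Placeholder scaffold (assumption): 83 residues, NOT the real CI-2 sequence.
scaffold = "X" * 83
rng = random.Random(0)

# 9 random residues in place of the 9 loop residues 54-62.
same_size = replace_loop(scaffold, 54, 62, random_peptide(9, rng))
# 13 random residues in place of the same 9 residues (a more flexible loop).
longer = replace_loop(scaffold, 54, 62, random_peptide(13, rng))
# 19 random residues in place of the whole loop, residues 49-67.
whole_loop = replace_loop(scaffold, 49, 67, random_peptide(19, rng))
```

Replacing a site with the same number of residues leaves the scaffold length unchanged, whereas the 13-residue insert lengthens the protein by four residues; only the flanking scaffold residues stay fixed in every case.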
In a preferred embodiment, the above libraries substituting G54-R62 are made more flexible by substituting random residues for native residues at the base of this inhibitor loop which appear to support the top of the loop. Without this support, the top residues may be significantly more flexible. The supporting residues appear to include F69, L51, R67, and R65. G83 could also be randomized since it is near the side of the loop in the crystal structure.
In another preferred embodiment, the random peptide library is inserted between K72-L73 of CI-2.
In another preferred embodiment, the random peptide library replaces residues P44-E45 of CI-2.
Insertion of a random peptide library between residues K72-L73 or replacing residues P44-E45 will lead to different libraries, roughly biased to a loop with a closed or short base, but in a much smaller protein scaffold (9 kDa) than, e.g., GFP (27 kDa) or DHFR (20 kDa). Therefore, these two libraries may be useful as small loop-biased libraries.
In a preferred embodiment, random peptide libraries between residues K72-L73 or random peptide libraries replacing residues P44-E45 may be used as selectable libraries, allowing the elimination of cells not expressing a properly folded and bioactive library member, or of uninfected cells. When a random peptide library is inserted between residues K72-L73 or replaces residues P44-E45, the still-active protease inhibitor residues in positions ca. 54-62 should retain the ability to inhibit subtilisin BPN', and thus allow selection of cells co-expressing a properly folded inhibitor library member and a cognate inhibitable protease such as subtilisin BPN', Ki = 2.9 pM (Longstaff, supra). The selection thus would be by protection against protease-induced cell death at an appropriate time point after infection or transfection of the cells with the CI-2 library.
In another preferred embodiment, analogous library insertion sites may be used with eglin-C or other potato trypsin inhibitor I family members lacking disulfide bonds, which have similar structures to that of CI-2.
In a preferred embodiment, the fusion protein comprising the scaffold protein and the random peptide library is bioactive, e.g., has enzymatic activity. However, as outlined herein, the fusion protein need not display such a bioactive function. A preferred property of the fusion protein is, however, to present the random peptide sequences to potential binding partners.
In a preferred embodiment, multiple scaffolds are used for the intracellular (and extracellular) presentation of peptide libraries with a bias to extended peptides. Extended conformations are important for molecular recognition in a number of peptide-protein complexes [Siligardi and Drake, Biopolymers 37(4):281-92 (1995)], including peptide substrate (and inhibitor) binding to a large variety of proteases, kinases and phosphatases, peptide binding to MHC class I and II proteins, peptide binding to chaperones, peptide binding to DNA, and B cell epitopes. Additional examples of extended bound peptides include a troponin inhibitory peptide binding to troponin C [Hernanderz et al., Biochemistry 38:6911-17 (1999)] and a p21-derived peptide binding to PCNA [Gulbis et al., Cell 87:297-306 (1996)]. Linear peptides are a unique
interactions
The intracellular catabolism of peptides is one limiting factor which may prevent significant steady state levels of small peptides. Proteases, such as aminopeptidases [Lee and Goldberg, Biopolymers 37:281-92 (1992)] as well as carboxypeptidases and the proteasome, as outlined further below, may be involved in the degradation of intracellular peptides. Thus, linear or extended peptides may be readily degraded after their intracellular expression.
In a preferred embodiment, the library is constructed so that the random library members, consisting of 18-30 random residues, can adopt linear/extended configurations while having neither a free N-terminus (which would allow aminopeptidase-mediated degradation) nor a free C-terminus (which would allow carboxypeptidase-mediated degradation). In this embodiment, the scaffolds present the random peptides with a linear/extended structural bias (but not as an absolute requirement) and allow significant peptide flexibility while somewhat limiting intracellular catabolism. Fusion of proteins to both ends of the library should protect the random sequences from amino- and carboxypeptidases.
Accordingly, in a preferred embodiment, a dual fusion scaffold fusion protein of the following form is constructed: N-terminus-protein 1-linker 1-random peptide library-linker 2-protein 2-C-terminus.
In a preferred embodiment, protein 1 and protein 2 are the same protein. Alternatively, protein 1 and protein 2 are different proteins.
In a preferred embodiment, linker 1 and linker 2 are the same linker. Alternatively, linker 1 and linker 2 are different linkers.
In a preferred embodiment, protein 1 and protein 2 are selected from a group of proteins which have low affinity for each other. In another preferred embodiment, protein 1 and protein 2 are selected from a group of proteins that are well-expressed in mammalian cells or in the cell in which the random peptide library is tested. Included in this embodiment are proteins with a long intracellular half-life, such as CAT and others known in the art.
In another preferred embodiment, protein 2 is a selection protein, such as DHFR or any other,
members in mammalian cells or in cells in which the library is tested can be achieved. Selection procedures were outlined above. Alternatively, protein 1 is a selection protein.
In another preferred embodiment, protein 2 is a reporter protein, such as GFP or any other fluorescent protein, β-lactamase, or another highly colored protein, as either outlined above or known in the art. In this embodiment, intracellular detection and tracking of full-length library members in mammalian cells or in cells in which the library is tested can be achieved. Reporter-gene product analyses were outlined above. Alternatively, protein 1 is a reporter protein.
In another preferred embodiment, protein 1 is a reporter protein and protein 2 is a selection protein, allowing both intracellular tracking and selection of full-length library members.
Linker 1 and linker 2 should not have a high self-affinity or a noncovalent affinity for either protein 1 or protein 2.
In a preferred embodiment, linker 1 and/or linker 2 consist(s) of residues with one or more glycines to decouple the structure of protein 1 and protein 2 from the random library.
In another preferred embodiment, linker 1 and/or linker 2 provide(s) enough residues which, when extended, provide 0.5-1 protein diameter spacing between the random residues and proteins 1 and 2. This would correspond to approximately 15-30 Å or 5-10 residues and would minimize steric interference in peptide library member binding to potential targets.
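The linker guidelines above (one or more glycines, 5-10 residues spanning roughly 15-30 Å) can be expressed as a small check. This is a sketch under stated assumptions: the rise of 3 Å per extended residue is simply the ratio implied by the figures quoted above, the set of residues treated as hydrophobic is an illustrative classification, the "no hydrophobic residues" rule is a deliberately strict stand-in for the hydrophilicity preference, and the function names and example linkers are hypothetical.

```python
import math

# Illustrative classification (assumption), not taken from the text.
HYDROPHOBIC = set("AILMFVWY")


def residues_for_spacing(distance_angstrom, rise_per_residue=3.0):
    """Smallest number of extended residues spanning the given distance."""
    return math.ceil(distance_angstrom / rise_per_residue)


def linker_ok(linker, min_len=5, max_len=10):
    """Check length, presence of glycine, and absence of hydrophobic residues."""
    return (
        min_len <= len(linker) <= max_len
        and "G" in linker
        and not (set(linker) & HYDROPHOBIC)
    )


short_end = residues_for_spacing(15)   # lower end of the 15-30 Å range
long_end = residues_for_spacing(30)    # upper end of the 15-30 Å range
candidate = "GSGSG"                    # hypothetical example linker
candidate_passes = linker_ok(candidate)
```

A different rise per residue could be supplied through `rise_per_residue` if a less fully extended linker conformation is assumed.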
In another preferred embodiment, linker 1 and/or linker 2 contain(s) enough hydrophilic residues so that the linkers do not adversely affect the solubility or stickiness of the entire fusion protein or of the linker region alone.
In another preferred embodiment, a relatively rigid structure can be formed from the linkers to force the random residues away from the surfaces of proteins 1 and 2.
In a preferred embodiment, the cellular protein p21 is used to display a linear peptide to binding partners. The tumor suppressor protein p21 binds to PCNA via its C-terminal 22 residues by effectively displaying this C-terminal peptide to PCNA in an extended conformation (Gulbis et al., supra). Therefore this scaffold may be useful for the display of random peptide libraries with an extended structural bias in the position of some or all of the C-terminal 22 residues, with the C-terminal residues now being randomized. The structure of the p21 scaffold appears to be disordered and to become more ordered at its N-terminus upon binding to cyclin-dependent kinases (CDKs). The overall disordered structure may suggest that this scaffold may be particularly useful for displaying extended (disordered) peptide libraries.
In a preferred embodiment, the nuclear localization sequence of p21, located between residues 141 and 156, is deleted and replaced by random residues. The random peptide library is thus inserted such that it replaces the nuclear localization signal. Thereby this scaffold should function as a scaffold for a cytoplasmic peptide library. By remaining in the cytoplasm, the p21 scaffold library members should not bind to nuclear cyclins and CDKs and thus should not perturb the cell cycle.
To ensure deletion of p21 functions such as inhibition of CDKs, in case low levels of the peptide library members enter the nucleus, the appropriate domains can be inactivated by site-directed mutagenesis, as known in the art. One such mutation, R94W, blocks the ability of p21 to inhibit cyclin-dependent kinases [Balbin et al., J. Biol. Chem. 271:15782-6 (1996)]. A second mutant in a p21 CDK- construct, also blocking CDK binding, has been shown to stabilize p21 against proteasomal degradation [Cayrol and Ducommun, Oncogene 17:2437-44 (1998)] and thus may be preferred as a scaffold. A third mutant, N50S, also blocks CDK inhibition by p21 [Welcker et al., Cancer Res. 58:5053-6 (1998)]. Alternatively, the cy-1 site (residues 17-24) may be deleted, blocking both cyclin- and cyclin-CDK complex binding to p21 [Chen et al., Mol. Cell. Biol. 16:4673-82 (1996)]. The cy-2 cyclin binding site, at residues 152-158, may also be deleted in case the random library is inserted in place of residues 141-164.
In another preferred embodiment, the scaffold protein is kanamycin nucleotidyl transferase (see Figure 8). Kanamycin nucleotidyl transferase forms tight dimers. In this embodiment, the extended-bias random peptides would be inserted between the C-terminus of the first dimer and the N-terminus of the second dimer, with spacer residues between each protein and the random residues. The spacer residues on either side of the random library region would consist of at least 5-10 residues on each side of the random peptide library, including one or more glycines and no hydrophobic residues.
The fusion proteins of the present invention comprise a scaffold protein and a random peptide. The peptides (and nucleic acids encoding them) are randomized, either fully randomized or biased in their randomization, e.g., in nucleotide/residue frequency generally or per position. By "randomized" or grammatical equivalents herein is meant that each nucleic acid and peptide consists of essentially random nucleotides and amino acids, respectively. As is more fully described below, the nucleic acids which give rise to the peptides are chemically synthesized, and thus may incorporate any nucleotide at any position. Thus, when the nucleic acids are expressed to form peptides, any amino acid residue may be incorporated at any position. The synthetic process can be designed to generate randomized nucleic acids, to allow the formation of all or most of the possible combinations over the length of the nucleic acid, thus forming a library of randomized nucleic acids.
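The idea of randomized nucleic acids giving rise to randomized peptides can be illustrated with a short simulation. This is only a sketch: it models synthesis as independent uniform draws at each position, the `scheme` argument uses the common degenerate-base letters N (any base) and K (G or T) as an assumed notation, and the function names are hypothetical. The codon table is the standard genetic code.

```python
import random
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

POOLS = {"N": "ACGT", "K": "GT"}  # degenerate-base letters (assumed notation)


def random_oligo(n_codons, scheme="NNN", rng=random):
    """Draw a random coding region of n_codons codons following `scheme`."""
    return "".join(rng.choice(POOLS[s]) for _ in range(n_codons) for s in scheme)


def translate(dna):
    """Translate in frame, stopping at the first stop codon."""
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


rng = random.Random(42)
library = [translate(random_oligo(10, "NNN", rng)) for _ in range(5)]
```

Members of `library` shorter than 10 residues are those in which a stop codon happened to be drawn; a position-specific bias could be modelled by replacing the uniform draw with weighted sampling.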
The library should provide a sufficiently structurally diverse population of randomized expression products to effect a probabilistically sufficient range of cellular responses to provide one or more cells exhibiting a desired response. Accordingly, an interaction library must be large enough so that at least one of its members will have a structure that gives it affinity for some molecule, protein, or other factor whose activity is necessary for completion of the signaling pathway. Although it is difficult to gauge the required absolute size of an interaction library, nature provides a hint with the immune response: a diversity of 10^7-10^8 different antibodies provides at least one combination with sufficient affinity to interact with most potential antigens faced by an organism. Published in vitro selection techniques have also shown that a library size of 10^7 to 10^8 is sufficient to find structures with affinity for the target. A library of all combinations of a peptide 7 to 20 amino acids in length, such as proposed here for expression in retroviruses, has the potential to code for 20^7 (10^9) to 20^20. Thus, for example, with libraries of 10^7 to 10^8 per ml of retroviral particles the present methods allow a "working" subset of a theoretically complete interaction library for 7 amino acids, and a subset of shapes for the 20^20 library. Thus, in a preferred embodiment, at least 10^5, preferably at least 10^6, more preferably at least 10^7, still more preferably at least 10^8 and most preferably at least 10^9 different peptides may be simultaneously analyzed as outlined herein.
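The size estimates above follow from simple counting: a fully random peptide of length n drawn from the 20 natural amino acids has 20^n possible sequences. The following is a minimal sketch of that arithmetic. The function names are illustrative and not part of the disclosure, and the coverage figure assumes every library member is unique, which overestimates real coverage.

```python
def theoretical_diversity(length, alphabet_size=20):
    """Number of distinct peptides of the given length over the alphabet."""
    return alphabet_size ** length


def max_fraction_covered(library_size, length):
    """Upper bound on the fraction of sequence space a library can sample.

    Assumes (optimistically) that every library member is distinct.
    """
    return min(1.0, library_size / theoretical_diversity(length))


if __name__ == "__main__":
    for n in (7, 20):
        print(n, theoretical_diversity(n))
    # A library of 10^8 members against the 7-mer sequence space
    print(max_fraction_covered(10**8, 7))
```

Under these assumptions, a library of 10^8 distinct members samples at most a few percent of the 7-mer space, and a vanishingly small fraction of the 20-mer space, which is why the text speaks of a "working" subset.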
Thus, a library of fusion proteins, each fusion protein comprising a scaffold protein and a random peptide, comprises at least 10^5, preferably at least 10^6, more preferably at least 10^7, still more preferably at least 10^8 and most preferably at least 10^9 different random peptides.
In another preferred embodiment, an individual member of the library of fusion proteins is analyzed as outlined herein. Alternatively, more than one individual member of the library of fusion proteins may be simultaneously analyzed.
It is important to understand that in any library system encoded by oligonucleotide synthesis one cannot have complete control over the codons that will eventually be incorporated into the peptide structure. This is especially true in the case of codons encoding stop signals (TAA, TGA, TAG). In a synthesis with NNN as the random region, there is a 3/64, or 4.69%, chance that the codon will be a stop codon. Thus, in a peptide of 10 residues, there is an unacceptably high likelihood that 46.7% of the peptides will prematurely terminate. For free peptide structures this is perhaps not a problem. But for larger structures, such as those envisioned here, such termination will lead to sterile peptide expression. To alleviate this, random residues are encoded as NNK, where K = T or G. This allows for encoding of all potential amino acids (changing their relative representation slightly), but importantly prevents the encoding of two stop codons, TAA and TGA. Thus, libraries encoding a 10 amino acid peptide will have a 15.6% chance to terminate prematurely. However, it should be noted that the present invention allows screening of libraries containing terminated peptides in a loop, since the GFP will not fluoresce and thus these peptides will not be selected.
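The stop-codon argument above can be checked by enumerating the codons each randomization scheme produces. The sketch below assumes the standard genetic code and independent, equiprobable bases at each degenerate position; the helper names are illustrative and not taken from the disclosure. Under those assumptions the per-codon stop frequency is 3/64 for NNN and 1/32 for NNK (only TAG remains), and the probability of at least one stop in k codons is 1 - (1 - p)^k. That exact calculation gives values somewhat different from the percentages quoted in the text, which appear to be linear estimates.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Degenerate symbols used in the text: N = any base, K = T or G.
DEGENERATE = {"N": "ACGT", "K": "GT"}


def expand(scheme):
    """All concrete codons generated by a degenerate scheme such as 'NNK'."""
    return ["".join(p) for p in product(*(DEGENERATE[s] for s in scheme))]


def stop_fraction(scheme):
    """Fraction of codons in the scheme that are stop codons."""
    codons = expand(scheme)
    return sum(CODON_TABLE[c] == "*" for c in codons) / len(codons)


def p_truncated(scheme, n_codons):
    """Probability that at least one of n_codons random codons is a stop."""
    return 1 - (1 - stop_fraction(scheme)) ** n_codons


def encoded_amino_acids(scheme):
    """Set of amino acids (one-letter codes) reachable with the scheme."""
    return {CODON_TABLE[c] for c in expand(scheme)} - {"*"}


if __name__ == "__main__":
    for scheme in ("NNN", "NNK"):
        print(scheme, stop_fraction(scheme), round(p_truncated(scheme, 10), 3),
              len(encoded_amino_acids(scheme)))
```

The enumeration also confirms the point made in the text that NNK still reaches all 20 amino acids while removing TAA and TGA.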
In a preferred embodiment, the peptide library is fully randomized, with no sequence preferences or constants at any position. In a preferred embodiment, the library is biased. That is, some positions within the sequence are either held constant, or are selected from a limited number of possibilities. For example, in a preferred embodiment, the nucleotides or amino acid residues are randomized within a defined class, for example, of hydrophobic amino acids, hydrophilic residues, sterically biased (either small or large) residues, towards the creation of cysteines for cross-linking, prolines for SH-3 domains, serines, threonines, tyrosines or histidines for phosphorylation sites, etc., or to purines, etc.
For example, individual residues may be fixed in the random peptide sequence of the insert to create a structural bias, similar to the concept of presentation structures outlined below. A preferred embodiment utilizes inserts of a general structure -gly(2-8)-aa1-aa2-...-aan-gly(2-8)-, where the random insert sequence is aa1 to aan. This sequence can be constrained by fixing one or more of the n residues as prolines (which will significantly restrict the conformation space of the entire loop), or as bulky amino acids such as W, R, K, L, I, V, F, or Y, or by biasing the set of random amino acids to include only bulky residues such as E, F, H, I, K, L, M, Q, R, T, V, W, and Y. Due to the larger size of the side chains, these residues will have fewer ways to pack into the small space available to a loop, and thus there will be fewer available loop conformations.
In an alternative embodiment, the random libraries can be biased to a particular secondary structure by including an appropriate number of residues (beyond the glycine linkers) which prefer the particular secondary structure. For example, to create an alpha-helical bias the entire loop insert might look like -gly(2-8)-helix former(4-8)-random residues-helix former(4-8)-gly(2-8)-, where the 4-8 helix formers at each end of the randomized region will nucleate an alpha helix and raise the probability that the random inserts will be helical. To further this bias, the randomized region can be devoid of strong helix breakers such as pro and gly. Examples of strong helix forming residues would include M, A, K, L, D, E, R, Q, F, I and V.
In a preferred embodiment, the bias is towards peptides that interact with known classes of molecules. For example, it is known that much of intracellular signaling is carried out via short regions of polypeptides interacting with other polypeptides through small peptide domains. For instance, a short region from the HIV-1 envelope cytoplasmic domain has been previously shown to block the action of cellular calmodulin. Regions of the Fas cytoplasmic domain, which shows homology to the mastoparan toxin from wasps, can be limited to a short peptide region with death-inducing apoptotic or G protein inducing functions. Magainin, a natural peptide derived from Xenopus, can have potent anti-tumour and anti-microbial activity. Short peptide fragments of a protein kinase C isozyme (βPKC) have been shown to block nuclear translocation of βPKC in Xenopus oocytes following stimulation. And short SH-3 target peptides have been used as pseudosubstrates for specific binding to SH-3 proteins. This is of course a short list of available peptides with biological activity, as the literature is dense in this area. Thus, there is much precedent for the potential of small peptides to have activity on intracellular signaling cascades. In addition, agonists and antagonists of any number of molecules may be used as the basis of biased randomization of peptides as well.
Thus, a number of molecules or protein domains are suitable as starting points for the generation of biased randomized peptides. A large number of small molecule domains are known that confer a common function, structure or affinity. In addition, as is appreciated in the art, areas of weak amino acid homology may have strong structural homology. A number of these molecules, domains, and/or corresponding consensus sequences are known, including, but not limited to, SH-2 domains, SH-3 domains, Pleckstrin, death domains, protease cleavage/recognition sites, enzyme inhibitors, enzyme substrates, Traf, etc. Similarly, there are a number of known nucleic acid binding proteins containing domains suitable for use in the invention. For example, leucine zipper consensus sequences are known.
Generally, at least 4, preferably at least 10, more preferably at least 5 amino acid positions need to be randomized; again, more are preferable if the randomization is less than perfect. In a preferred embodiment, the random library may have leucines or isoleucines fixed every 7 residues to bias it to a leucine or isoleucine zipper motif.
In a preferred embodiment, the optional C- or N-cap residues, in the case of a helix-biased library, may be fixed and not random and again would be strong helix formers. For a stronger helical bias, there could be at least 2-3 turns of capping residues, or up to 11-12 amino acids. They could also be (pro)n, to provide a poly-proline helix at the C- or N-terminus. When the C- or N-terminus forms a stable secondary structure such as an alpha helix or a poly-proline helix, it will be resistant to proteolysis, which would be an advantage for the stability of the library within the cell. Explicit N- and C-cap helix stabilizing sequences or residues can be included at the N-termini and C-termini, respectively [Betz and DeGrado, Biochem. 35:6955-62 (1996); Doig et al., Prot. Sci. 6:147-155 (1997); Doig and Baldwin, Prot. Sci. 4:1325-36 (1995); Richardson and Richardson, Science 240:1648-52 (1988)]. These sequences are incorporated by reference.
In a preferred embodiment, a library with a more extended structural bias is constructed, wherein weaker helix formers would be fused at each end of the random region, or one or more glycines would be included in the spacer region and C- or N-cap region.
In another preferred embodiment, a library with a more extended structural bias is constructed by omitting the helix N- or C-cap residues. In this embodiment, the random residues would be selected from all 20 natural L-amino acids.
In another preferred embodiment, a dual library may be constructed with fusion peptides at both the N- and C-terminus of β-lactamase, and the resulting library has the following schematic structure: "(+/- optional N-cap residues)-random peptide library-spacer residues-N-terminus-BLA-C-terminus-spacer residues-random peptide library-(+/- optional C-cap residues)". In this case, since the β-lactamase N- and C-terminal helices are adjacent and parallel (i.e. they run in the same direction), such a library could be biased to have two adjacent helices sticking out from the β-lactamase structure in a coiled-coil fashion.
In a preferred embodiment, this bias is accentuated by inclusion of the spacer sequences KLEALEG (Monera et al., supra) or VSSLESK [Graddis et al., Biochem. 32:12664-71 (1993)] between the random peptide library and that of β-lactamase. Alternatively, the spacer sequence VSSLESE could be included between one random peptide library and β-lactamase, and the spacer sequence VSSLKSK could be included between the second random peptide library (e.g., after adjustments of the number of intervening amino acids to keep these in register) and the other terminus of β-lactamase (Graddis et al., supra). These two helix heptad repeats may help bind the two potential helices together.
In a preferred embodiment, the bias of the two adjacent random peptide libraries to a coiled coil is further increased by fixing positions in the sequence such that a number of random residues will be inserted on the surface of the two helices while the fixed residues in the sequence may reside at the interface between the two helices in a parallel coiled coil. For this fusion protein, the two helices composing the random peptide library may be set in register lengthwise by insertion of one or more helix forming residues as appropriate. Figure 3 shows a helical wheel representation of a parallel coiled coil (see Graddis et al., supra). Positions a, a', d, and d' would be fixed since these are at the core of the coiled coil structure. If these were the only fixed residues and n=5 (see below), the total number of random residues in the library would be 18. The size of the library can thus be controlled by n. Residues in positions c, c', f, f', b and b' may be randomized and would present the face of the helix available for binding to targets. Thus, in each coiled coil library, the sequence could be schematically structured as "BLA-spacer residues-a-b-c-d-e-f-g-(a-b-c-d-e-f-g-)n-C-cap residues and/or N-cap residues-a'-b'-c'-d'-e'-f'-g'-(a'-b'-c'-d'-e'-f'-g'-)n-spacer residues-BLA".
In a preferred embodiment, in this scheme the fixed residues a, a', d, and d' are combinations of hydrophobic strong helix forming residues such as ala, val, leu; g and g' are lys, and e and e' are glu (or alternatively lys, when e and e' are glu). Positions e, e', g, and g' may be fixed to further stabilize the coiled coil with salt bridges. Positions b, b', c, c', f and f' may be random residues.
In another preferred embodiment, a library with less helical bias is generated having more random residues on the surface of the helix. In this embodiment, positions g and g' and e and e' may be random residues as well. In the schematically presented libraries above, n would be 1, 2, 3, 4, 5 or more.
In another preferred embodiment, an alternative set of fixed residues is used to generate a bias to a parallel coiled coil. After the two helices are aligned (i.e. the ends put in register) in the β-lactamase structure, the fixed positions include ala in a and a', leu in d and d', glu in e and e', lys in g and g', and random residues in the remaining positions. In this embodiment, g and g' may also be randomized.
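The heptad scheme described above can be written out as a template for one helix, which makes it easy to see how n controls the number of randomized positions. The sketch below is illustrative only: the function names and the use of "X" for a random position are assumptions, and the fixed residues are the ala/leu/glu/lys set named in the preceding paragraph. It models a single helix of n + 1 heptads and ignores spacer and cap residues.

```python
HEPTAD_POSITIONS = "abcdefg"

# Fixed positions from the text: ala in a, leu in d, glu in e, lys in g.
FIXED = {"a": "A", "d": "L", "e": "E", "g": "K"}


def helix_template(n, fixed=FIXED):
    """Template for one helix: a-b-c-d-e-f-g followed by n more heptads.

    Fixed positions carry their one-letter residue; 'X' marks a random one.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    heptad = "".join(fixed.get(pos, "X") for pos in HEPTAD_POSITIONS)
    return heptad * (n + 1)


def count_random(template):
    """Number of randomized ('X') positions in a template."""
    return template.count("X")


if __name__ == "__main__":
    for n in range(1, 6):
        t = helix_template(n)
        print(n, t, count_random(t))
```

With b, c and f random, each heptad contributes three random positions, so a helix with n = 5 has 18 of them in this model. Passing a smaller `fixed` mapping (for example, leaving g out) models the less strongly biased variants described in the text.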
In a preferred embodiment, biased SH-3 domain-binding oligonucleotides/peptides are made. SH-3 domains have been shown to recognize short target motifs (SH-3 domain-binding peptides), about ten to twelve residues in a linear sequence, that can be encoded as short peptides with high affinity for the target SH-3 domain. Consensus sequences for SH-3 domain binding proteins have been proposed. Thus, in a preferred embodiment, oligos/peptides are made with the following biases:
1. XXXPPXPXX, wherein X is a randomized residue.
2. (within the positions of residue positions 11 to -2):
Position:            11   10   9    8    7    6    5    4    3    2    1    0    -1   -2
Residue:   Met  Gly  aa11 aa10 aa9  aa8  aa7  Arg  Pro  Leu  Pro  Pro  hyd  Pro  hyd  hyd  Gly  Gly  Pro  Pro  STOP
Codon:     atg  ggc  nnk  nnk  nnk  nnk  nnk  aga  cct  ctg  cct  cca  sbk  ggg  sbk  sbk  gga  ggc  cca  cct  TAA
In this embodiment, the N-terminus flanking region is suggested to have the greatest effects on binding affinity and is therefore entirely randomized. "Hyd" indicates a bias toward a hydrophobic residue, i.e. Val, Ala, Gly, Leu, Pro, Arg. To encode a hydrophobically biased residue, the "sbk" codon-biased structure is used. Examination of the codons within the genetic code will ensure this encodes generally hydrophobic residues: s = g, c; b = t, g, c; v = a, g, c; m = a, c; k = t, g; n = a, t, g, c.
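The "examination of the codons" suggested above can be done mechanically by expanding a degenerate codon into its concrete codons and translating each. The sketch below assumes the standard genetic code and uses the degenerate letters exactly as defined in the text; the function names are illustrative and not from the disclosure.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Degenerate letters as listed in the text, plus the four plain bases.
DEGENERATE = {
    "a": "A", "c": "C", "g": "G", "t": "T",
    "s": "GC", "b": "TGC", "v": "AGC", "m": "AC", "k": "TG", "n": "ATGC",
}


def expand_codon(codon):
    """All concrete codons represented by a degenerate codon such as 'sbk'."""
    return ["".join(p) for p in product(*(DEGENERATE[ch] for ch in codon.lower()))]


def residues(codon):
    """Sorted one-letter amino acids ('*' = stop) a degenerate codon can encode."""
    return sorted({CODON_TABLE[c] for c in expand_codon(codon)})


if __name__ == "__main__":
    print(residues("sbk"))  # the residue set behind the "hyd" positions
    print(residues("nnk"))
```

For "sbk" the expansion yields twelve codons covering Ala, Gly, Leu, Pro, Arg and Val, which is the residue list given in the text.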
In general, the random peptides range from about 4 to about 50 residues in length, with from about 5 to about 30 being preferred, and from about 10 to about 20 being especially preferred.
The random peptide(s) can be fused to a scaffold in a variety of positions, as is more fully outlined herein, to form fusion polypeptides.
In a preferred embodiment, in addition to the scaffold protein and the peptide, the fusion proteins of the present invention preferably include additional components, including, but not limited to, fusion partners, including linkers.
By "fusion partner" herein is meant a sequence that is associated with the random peptide that confers upon all members of the library in that class a common function or ability. Fusion partners can be heterologous (i.e. not native to the host cell), or synthetic (not native to any cell). Suitable fusion partners include, but are not limited to: a) presentation structures, as defined below, which provide the peptides in a conformationally restricted or stable form; b) targeting sequences, defined below, which allow the localization of the peptide into a subcellular or extracellular compartment; c) rescue sequences, as defined below, which allow the purification or isolation of either the peptides or the nucleic acids encoding them; d) stability sequences, which confer stability or protection from degradation to the peptide or the nucleic acid encoding it, for example resistance to proteolytic degradation; e) linker sequences, which conformationally decouple the random peptide elements from the scaffold itself and keep the peptide from interfering with scaffold folding; or f) any combination of a), b), c), d) and e), as well as linker sequences as needed.
In a preferred embodiment, the fusion partner is a presentation structure. By "presentation structure" or grammatical equivalents herein is meant a sequence which, when fused to peptides, causes the peptides to assume a conformationally restricted form. Proteins interact with each other largely through conformationally constrained domains. Although small peptides with freely rotating amino and carboxyl termini can have potent functions as is known in the art, the conversion of such peptide structures into pharmacologic agents is difficult due to the inability to predict side-chain positions for peptidomimetic synthesis. Therefore the presentation of peptides in conformationally constrained structures will benefit both the later generation of pharmacophore models and pharmaceuticals and will also likely lead to higher affinity interactions of the peptide with the target protein. This fact has been recognized in the combinatorial library generation systems using biologically generated short peptides in bacterial phage systems. A number of workers have constructed small domain molecules in which one might present randomized peptide structures.
Thus, synthetic presentation structures, i.e. artificial polypeptides, are capable of presenting a randomized peptide as a conformationally restricted domain. Generally such presentation structures comprise a first portion joined to the N-terminal end of the randomized peptide, and a second portion joined to the C-terminal end of the peptide; that is, the peptide is inserted into the presentation structure, although variations may be made, as outlined below, in which elements of the presentation structure are included within the random peptide sequence. To increase the functional isolation of the randomized expression product, the presentation structures are selected or designed to have minimal biological activity when expressed in the target cell.
Preferred presentation structures maximize accessibility to the peptide by presenting it on an exterior surface such as a loop, and also cause further conformational constraints in a peptide. Accordingly, suitable presentation structures include, but are not limited to, dimerization sequences, minibody structures, loops on β-turns and coiled-coil stem structures in which residues not critical to structure are randomized, zinc-finger domains, cysteine-linked (disulfide) structures, transglutaminase linked structures, cyclic peptides, B-loop structures, helical barrels or bundles, leucine zipper motifs, etc.
In a preferred embodiment, the presentation structure is a coiled-coil structure, allowing the presentation of the randomized peptide on an exterior loop. See, for example, Myszka et al., Biochem. 33:2362-2373 (1994), hereby incorporated by reference. Using this system investigators have isolated peptides capable of high affinity interaction with the appropriate target. In general, coiled-coil structures allow for between 6 and 20 randomized positions.
A preferred coiled-coil presentation structure is as follows:
MGCAALESEVSALESEVASLESEVAALGRGDMPLAAVKSKLSAVKSKLASVKSKLAACGPP
The underlined regions represent a coiled-coil leucine zipper region defined previously (see Martin et al., EMBO J. 13(22):5303-5309 (1994), incorporated by reference). The bolded GRGDMP region represents the loop structure and, when appropriately replaced with randomized peptides (i.e. peptides, generally depicted herein as (X)n, where X is an amino acid residue and n is an integer of at least 5 or 6), can be of variable length. The replacement of the bolded region is facilitated by encoding restriction endonuclease sites in the underlined regions, which allows the direct incorporation of randomized oligonucleotides at these positions. For example, a preferred embodiment generates a XhoI site at the double-underlined LE site and a HindIII site at the double-underlined KL site.
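Replacing the GRGDMP loop with a randomized peptide (X)n can be illustrated at the protein-sequence level. The sketch below is a hypothetical in silico illustration, not the cloning procedure of the disclosure: it simply splits the presentation structure around the loop and splices in a peptide. The function names and the uniform sampling over the 20 natural amino acids are assumptions.

```python
import random

# Coiled-coil presentation structure and its loop, as given in the text.
TEMPLATE = "MGCAALESEVSALESEVASLESEVAALGRGDMPLAAVKSKLSAVKSKLASVKSKLAACGPP"
LOOP = "GRGDMP"
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def present_in_coiled_coil(peptide):
    """Return the presentation structure with the loop replaced by `peptide`."""
    n_term, loop, c_term = TEMPLATE.partition(LOOP)
    if not loop:
        raise ValueError("loop sequence not found in template")
    return n_term + peptide + c_term


def random_peptide(n, rng):
    """Uniformly random peptide of length n (an idealized (X)n insert)."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(n))


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the example repeatable
    for _ in range(3):
        print(present_in_coiled_coil(random_peptide(8, rng)))
```

In practice the diversity would come from randomized oligonucleotides cloned between the restriction sites mentioned above; uniform amino acid sampling here is a simplification that ignores codon bias.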
In a preferred embodiment, the presentation structure is a minibody structure. A "minibody" is essentially composed of a minimal antibody complementarity region. The minibody presentation structure generally provides two randomizing regions that in the folded protein are presented along a single face of the tertiary structure. See for example Bianchi et al., J. Mol. Biol. 236(2):649-59 (1994), and references cited therein, all of which are incorporated by reference. Investigators have shown this minimal domain is stable in solution and have used phage selection systems in combinatorial libraries to select minibodies with peptide regions exhibiting high affinity, Kd = 10^-7, for the pro-inflammatory cytokine IL-6.
A preferred minibody presentation structure is as follows:
MGRNSQATSGFTFSHFYMEWVRGGEYIAASRHKHNKYTTEYSASVKGRYIVSRDTSQSILYLQKKKGPP
The bold, underlined regions are the regions which may be randomized. The italicized phenylalanine must be invariant in the first randomizing region. The entire peptide is cloned in a three-oligonucleotide variation of the coiled-coil embodiment, thus allowing two different randomizing regions to be incorporated simultaneously. This embodiment utilizes non-palindromic BstXI sites on the termini.
In a preferred embodiment, the presentation structure is a sequence that contains generally two cysteine residues, such that a disulfide bond may be formed, resulting in a conformationally constrained sequence. This embodiment is particularly preferred ex vivo, for example when secretory targeting sequences are used. As will be appreciated by those in the art, any number of random sequences, with or without spacer or linking sequences, may be flanked with cysteine residues. In other embodiments, effective presentation structures may be generated by the random regions themselves. For example, the random regions may be "doped" with cysteine residues which, under the appropriate redox conditions, may result in highly crosslinked structured conformations, similar to a presentation structure. Similarly, the randomization regions may be controlled to contain a certain number of residues to confer β-sheet or α-helical structures.
In a preferred embodiment, the presentation sequence confers the ability to bind metal ions to confer secondary structure. Thus, for example, C2H2 zinc finger sequences are used. C2H2 sequences have two cysteines and two histidines placed such that a zinc ion is chelated. Zinc finger domains are known to occur independently in multiple zinc-finger peptides to form structurally independent, flexibly linked domains. See J. Mol. Biol. 228:619 (1992). A general consensus sequence is (5 amino acids)-C-(2 to 3 amino acids)-C-(4 to 12 amino acids)-H-(3 amino acids)-H-(5 amino acids). A preferred example would be -FQCEEC-random peptide of 3 to 20 amino acids-HIRSHTG-.
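The C2H2 consensus spacing above translates directly into a regular expression, which gives a quick way to check whether a candidate construct preserves the cysteine and histidine spacing. The sketch below is an illustration under stated assumptions: the pattern names and helper function are hypothetical, and the match tests spacing only, not folding or zinc binding. Note that the preferred example allows inserts of 3 to 20 residues, so some insert lengths fall outside the 4 to 12 residue spacing of the general consensus.

```python
import re

# Core spacing of the consensus: C-(2 to 3)-C-(4 to 12)-H-(3)-H.
C2H2_CORE = re.compile(r"C.{2,3}C.{4,12}H.{3}H")

# Full consensus including the five flanking residues on each side.
C2H2_FULL = re.compile(r"^.{5}C.{2,3}C.{4,12}H.{3}H.{5}$")


def zinc_finger_construct(random_peptide):
    """Build the preferred example -FQCEEC-(random peptide)-HIRSHTG-."""
    return "FQCEEC" + random_peptide + "HIRSHTG"


def has_c2h2_spacing(sequence):
    """True if the sequence contains the C2H2 core spacing somewhere."""
    return C2H2_CORE.search(sequence) is not None


if __name__ == "__main__":
    for insert in ("AAA", "ADEKLR", "A" * 13):
        construct = zinc_finger_construct(insert)
        print(len(insert), construct, has_c2h2_spacing(construct))
```

Inserts that themselves contain cysteine or histidine can create additional matches, so a check like this is only a coarse filter.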
Similarly, CCHC boxes can be used (see Bavoso et al., Biochem. Biophys. Res. Commun. 242(2):385 (1998), hereby incorporated by reference), which have the consensus sequence -C-(2 amino acids)-C-(4 to 20 random peptide)-H-(4 amino acids)-C-. Preferred examples include (1) -VKCFNC-4 to 20 random amino acids-HTARNCR-, based on the nucleocapsid protein P2; (2) a sequence modified from that of the naturally occurring zinc-binding peptide of the Lasp-1 LIM domain (Hammarstrom et al., Biochem. 35:12723 (1996)); and (3) -MNPNCARCG-4 to 20 random amino acids-HKACF-, based on the NMR structural ensemble 1ZFP (Hammarstrom et al., Biochem. 35(39):12723 (1996)).
In a preferred embodiment, the presentation structure is a dimerization sequence, including self-binding peptides. A dimerization sequence allows the non-covalent association of two peptide sequences, which can be the same or different, with sufficient affinity to remain associated under normal physiological conditions. These sequences may be used in several ways. In a preferred embodiment, one terminus of the random peptide is joined to a first dimerization sequence and the other terminus is joined to a second dimerization sequence, which can be the same or different from the first sequence. This allows the formation of a loop upon association of the dimerizing sequences. Alternatively, the use of these sequences effectively allows small libraries of random peptides (for example, 10^4) to become large libraries if two peptides per cell are generated which then dimerize, to form an effective library of 10^8 (10^4 x 10^4). It also allows the formation of longer random peptides, if needed, or more structurally complex random peptide molecules. The dimers may be homo- or heterodimers.
Dimerization sequences may be a single sequence that self-aggregates, or two different sequences that associate. That is, nucleic acids may be made encoding both a first random peptide with dimerization sequence 1 and a second random peptide with dimerization sequence 2, such that upon introduction into a cell and expression of the nucleic acid, dimerization sequence 1 associates with dimerization sequence 2 to form a new random peptide structure. The use of dimerization sequences allows the "circularization" of the random peptides; that is, if a dimerization sequence is used at each terminus of the peptide, the resulting structure can form a "stem-loop" type of structure. Furthermore, the use of dimerizing sequences fused to both the N- and C-terminus of a scaffold such as GFP forms a noncovalently cyclized scaffold random peptide library.
Suitable dimerization sequences will encompass a wide variety of sequences. Any number of protein-protein interaction sites are known. In addition, dimerization sequences may also be elucidated using standard methods such as the yeast two hybrid system, traditional biochemical affinity binding studies, or even using the present methods. See U.S.S.N. 60/080,444, filed April 2, 1998, hereby incorporated by reference in its entirety. Particularly preferred dimerization peptide sequences include, but are not limited to, -EFLIVKS-, -EEFLIVKKS-, -FESIKLV-, and -VSIKFEL-.
In a preferred embodiment, the fusion partner is a targeting sequence. As will be appreciated by those in the art, the localization of proteins within a cell is a simple method for increasing effective concentration and determining function. For example, RAF1 when localized to the mitochondrial membrane can inhibit the anti-apoptotic effect of BCL-2. Similarly, membrane bound Sos induces Ras mediated signaling in T-lymphocytes. These mechanisms are thought to rely on the principle of limiting the search space for ligands; that is to say, the localization of a protein to the plasma membrane limits the search for its ligand to that limited dimensional space near the membrane as opposed to the three dimensional space of the cytoplasm. Alternatively, the concentration of a protein can also be simply increased by nature of the localization. Shuttling the proteins into the nucleus confines them to a smaller space, thereby increasing concentration. Finally, the ligand or target may simply be localized to a specific compartment, and inhibitors must be localized appropriately.
Thus, suitable targeting sequences include, but are not limited to, binding sequences capable of causing binding of the expression product to a predetermined molecule or class of molecules while retaining bioactivity of the expression product (for example by using enzyme inhibitor or substrate sequences to target a class of relevant enzymes); sequences signalling selective degradation, of itself or co-bound proteins; and signal sequences capable of constitutively localizing the peptides to a predetermined cellular locale, including a) subcellular locations such as the Golgi, endoplasmic reticulum, nucleus, nucleoli, nuclear membrane, mitochondria, chloroplast, secretory vesicles, lysosome, and cellular membrane, and b) extracellular locations via a secretory signal. Particularly preferred is localization to either subcellular locations or to the outside of the cell via secretion.
In a preferred embodiment, the targeting sequence is a nuclear localization signal (NLS). NLSs are generally short, positively charged (basic) domains that serve to direct the entire protein in which they occur to the cell's nucleus. Numerous NLS amino acid sequences have been reported, including single basic NLSs such as that of the SV40 (monkey virus) large T antigen (Pro Lys Lys Lys Arg Lys Val) [Kalderon et al., Cell 39:499-509 (1984)], the human retinoic acid receptor-β nuclear localization signal (ARRRRP), NFκB p50 (EEVQRKRQKL) [Ghosh et al., Cell 62:1019 (1990)], NFκB p65 (EEKRKRTYE) [Nolan et al., Cell 64:961 (1991)], and others (see for example Boulikas, J. Cell. Biochem. 55(1):32-58 (1994), hereby incorporated by reference), and double basic NLSs exemplified by that of the Xenopus (African clawed toad) protein nucleoplasmin (Ala Val Lys Arg Pro Ala Ala Thr Lys Lys Ala Gly Gln Ala Lys Lys Lys Lys Leu Asp) [Dingwall et al., Cell 30:449-458 (1982) and Dingwall et al., J. Cell Biol. 107:641-849 (1988)]. Numerous localization studies have demonstrated that NLSs incorporated in synthetic peptides or grafted onto reporter proteins not normally targeted to the cell nucleus cause these peptides and reporter proteins to be concentrated in the nucleus. See, for example, Dingwall and Laskey, Ann. Rev. Cell Biol. 2:367-390 (1986); Bonnerot et al., Proc. Natl. Acad. Sci. USA 84:6795-6799 (1987); Galileo et al., Proc. Natl. Acad. Sci. USA 87:458-462 (1990).
In a preferred embodiment, the targeting sequence is a membrane anchoring signal sequence. This is particularly useful since many parasites and pathogens bind to the membrane, in addition to the fact that many intracellular events originate at the plasma membrane. Thus, membrane-bound peptide libraries are useful for both the identification of important elements in these processes as well as for the discovery of effective inhibitors. The invention provides methods for presenting the randomized expression product extracellularly or in the cytoplasmic space. For extracellular presentation, a membrane anchoring region is provided at the carboxyl terminus of the peptide presentation structure. The randomized expression product region is expressed on the cell surface and presented to the extracellular space, such that it can bind to other surface molecules (affecting their function) or molecules present in the extracellular medium. The binding of such molecules could confer function on the cells expressing a peptide that binds the molecule. The cytoplasmic region could be neutral or could contain a domain that, when the extracellular randomized expression product region is bound, confers a function on the cells (activation of a kinase, phosphatase, binding of other cellular components to effect function). Similarly, the randomized expression product-containing region could be contained within a cytoplasmic region, and the transmembrane region and extracellular region remain constant or have a defined function.
Membrane-anchoring sequences are well known in the art and are based on the genetic geometry of mammalian transmembrane molecules. Peptides are inserted into the membrane based on a signal sequence (designated herein as ssTM) and require a hydrophobic transmembrane domain (herein TM). The transmembrane proteins are inserted into the membrane such that the regions encoded 5' of the transmembrane domain are extracellular and the sequences 3' become intracellular. Of course, if these transmembrane domains are placed 5' of the variable region, they will serve to anchor it as an intracellular domain, which may be desirable in some embodiments. ssTMs and TMs are known for a wide variety of membrane bound proteins, and these sequences may be used accordingly, either as pairs from a particular protein or with each component being taken from a different protein; alternatively, the sequences may be synthetic, and derived entirely from consensus as artificial delivery domains.
As will be appreciated by those in the art, membrane-anchoring sequences, including both ssTM and TM, are known for a wide variety of proteins, and any of these may be used. Particularly preferred membrane-anchoring sequences include, but are not limited to, those derived from CD8, ICAM-2, IL-8R, CD4 and LFA-1.
Useful sequences include sequences from 1) class I integral membrane proteins such as IL-2 receptor β-chain (residues 1-26 are the signal sequence, 241-265 are the transmembrane residues; see Hatakeyama et al., Science 244:551 (1989) and von Heijne et al., Eur J Biochem 174:671 (1988)) and insulin receptor β-chain (residues 1-27 are the signal, 957-959 are the transmembrane domain and 960-1382 are the cytoplasmic domain; see Hatakeyama, supra, and Ebina et al., Cell 40:747 (1985)); 2) class II integral membrane proteins such as neutral endopeptidase (residues 29-51 are the transmembrane domain, 2-28 are the cytoplasmic domain; see Malfroy et al., Biochem Biophys Res Commun 144:59 (1987)); 3) type III proteins such as human cytochrome P450 NF25 (Hatakeyama, supra); and 4) type IV proteins such as human P-glycoprotein (Hatakeyama, supra). Particularly preferred are CD8 and ICAM-2. For example, the signal sequences from CD8 and ICAM-2 lie at the extreme 5' end of the transcript. These consist of amino acids 1-32 in the case of CD8 (MASPLTRFLSLNLLLLGESILGSGEAKPQAP, Nakauchi et al., PNAS USA 82:5126 (1985)) and 1-21 in the case of ICAM-2 (MSSFGYRTLTVALFTLICCPG, Staunton et al., Nature (London) 339:61 (1989)). These leader sequences deliver the construct to the membrane, while the hydrophobic transmembrane domains, placed 3' of the random peptide region, serve to anchor the construct in the membrane. These transmembrane domains are encompassed by amino acids 145-195 from CD8 (PQRPEDCRPRGSVKGTGLDFACDIYIWAPLAGICVALLLSLIITLICYHSR, Nakauchi, supra) and 224-256 from ICAM-2 (MVIIVTWSVLLSLFVTSVLLCFIFGQHLRQQR, Staunton, supra).
Alternatively, membrane-anchoring sequences include the GPI anchor, which results in a covalent bond between the molecule and the lipid bilayer via a glycosyl-phosphatidylinositol bond, for example in DAF (PNKGSGTTSGTTRLLSGHTCFTLTGLLGTLVTMGLLT, with the bolded serine the site of the anchor; see Homans et al., Nature 333(6170):269-72 (1988), and Moran et al., J Biol Chem 266:1250 (1991)). In order to do this, the GPI sequence from Thy-1 can be cassetted 3' of the variable region in place of a transmembrane sequence.
Similarly, myristylation sequences can serve as membrane-anchoring sequences. It is known that the myristylation of c-src recruits it to the plasma membrane. This is a simple and effective method of membrane localization, given that the first 14 amino acids of the protein are solely responsible for this function: MGSSKSKPKDPSQR (see Cross et al., Mol Cell Biol 4(9):1834 (1984); Spencer et al., Science 262:1019-1024 (1993), both of which are hereby incorporated by reference). This motif has already been shown to be effective in the localization of reporter genes and can be used to anchor the zeta chain of the TCR. This motif is placed 5' of the variable region in order to localize the construct to the plasma membrane. Other modifications such as palmitoylation can be used to anchor constructs in the plasma membrane, for example, palmitoylation sequences from the G protein-coupled receptor kinase GRK6 (LLQRLFSRQDCCGNCSDSEEELPTRL, with the bold cysteines being palmitoylated, Stoffel et al., J Biol Chem 269:27791 (1994)), from rhodopsin (KQFRNCMLTSLCCGKNPLGD, Barnstable et al., J Mol Neurosci 5(3):207 (1994)), and from the p21 H-ras 1 protein (LNPPDESGPGCMSCKCVLS, Capon et al., Nature 302:33 (1983)).
In a preferred embodiment, the targeting sequence is a lysosomal targeting sequence, including, for example, a lysosomal degradation sequence such as Lamp-2 (KFERQ, Dice, Ann N Y Acad Sci 674:58 (1992)), or lysosomal membrane sequences from Lamp-1 (MLIPIAGFFALAGLVLIVLIAYLIGRKRS AGYQ1\. Uthayakumar et al., Cell Mol Biol Res 41:405 (1995)) or Lamp-2 (LVPIAVGAALAGVLILVLLAYFIGLKHHϊiΔGYEQF, Konecki et al., Biochem Biophys Res Comm 205:1-5 (1994)), both of which show the transmembrane domains in italics and the cytoplasmic targeting signal underlined.
Alternatively, the targeting sequence may be a mitochondrial localization sequence, including mitochondrial matrix sequences (e.g., yeast alcohol dehydrogenase III, MLRTSSLFTRRVQPSLFSRNILRLQST, Schatz, Eur J Biochem 165:1-6 (1987)), mitochondrial inner membrane sequences (yeast cytochrome c oxidase subunit IV, MLSLRQSIRFFKPATRTLCSSRYLL, Schatz, supra), mitochondrial intermembrane space sequences (yeast cytochrome c1, MFSMLSKRWAQRTLSKSFYSTATGAASKSGKLTQKLVTAGVAAAGITASTLLYADSLTAEAMTA, Schatz, supra) or mitochondrial outer membrane sequences (yeast 70 kD outer membrane protein, MKSFITRNKTAILATVAATGTAIGAYYYYNQLQQQQQRGKK, Schatz, supra).
The target sequences may also be endoplasmic reticulum sequences, including the sequences from calreticulin (KDEL, Pelham, Royal Society London Transactions B, 1-10 (1992)) or adenovirus E3/19K protein (LYLSRRSFIDEKKMP, Jackson et al., EMBO J 9:3153 (1990)).
Furthermore, targeting sequences also include peroxisome sequences (for example, the peroxisome matrix sequence from Luciferase, SKL, Keller et al., PNAS USA 4:3264 (1987)), farnesylation sequences (for example, P21 H-ras 1, LNPPDESGPGCMSCKCVLS, with the bold cysteine farnesylated, Capon, supra), geranylgeranylation sequences (for example, protein rab-5A, LTEPTQPTRNQCCSN, with the bold cysteines geranylgeranylated, Farnsworth, PNAS USA 91:11963 (1994)), or destruction sequences (cyclin B1, RTALGDIGN, Klotzbucher et al., EMBO J 1:3053 (1996)).
In a preferred embodiment, the targeting sequence is a secretory signal sequence capable of effecting the secretion of the peptide. There are a large number of known secretory signal sequences, which are placed 5' to the variable peptide region and are cleaved from the peptide region to effect secretion into the extracellular space. Secretory signal sequences and their transferability to unrelated proteins are well known, e.g., Silhavy et al. (1985) Microbiol Rev 49, 398-418. This is particularly useful for generating a peptide capable of binding to the surface of, or affecting the physiology of, a target cell that is other than the host cell, e.g., the cell infected with the retrovirus. In a preferred approach, a fusion product is configured to contain, in series, secretion signal peptide-presentation structure-randomized expression product region-presentation structure; see Figure 3. In this manner, target cells grown in the vicinity of cells caused to express the library of peptides are bathed in secreted peptide. Target cells exhibiting a physiological change in response to the presence of a peptide (e.g., by the peptide binding to a surface receptor or by being internalized and binding to intracellular targets), and the secreting cells, are localized by any of a variety of selection schemes, and the peptide causing the effect is determined. Exemplary effects include variously that of a designer cytokine (i.e., a stem cell factor capable of causing hematopoietic stem cells to divide and maintain their totipotential), a factor causing cancer cells to undergo spontaneous apoptosis, a factor that binds to the cell surface of target cells and labels them specifically, etc.
Suitable secretory sequences are known, including signals from IL-2 (MYRMQLLSCIALSLALVTNS, Villinger et al., J Immunol 155:3946 (1995)), growth hormone (MATGSRTSLLLAFGLLCLPWLQEGSAFPI, Roskam et al., Nucleic Acids Res 7:30 (1979)), preproinsulin (MALWMRLLPLLALLALWGPDPAAAFVN, Bell et al., Nature 284:26 (1980)), and influenza HA protein (MKAKLLVLLYAFVAGDQI, Sekikawa et al., PNAS 80:3563), with cleavage between the non-underlined-underlined junction. A particularly preferred secretory signal sequence is the signal leader sequence from the secreted cytokine IL-4, which comprises the first 24 amino acids of IL-4 as follows: MGLTSQLLPPLFFLLACAGNFVHG.
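The in-series arrangement described above (secretion signal, presentation structure, randomized region, presentation structure) can be illustrated with a minimal Python sketch. The IL-4 leader is the sequence quoted in the text; the two presentation-structure halves (`PRESENTATION_N`, `PRESENTATION_C`), the 10-residue insert length, and the function names are hypothetical placeholders chosen only for illustration, and signal cleavage is modeled as simple removal of the leader.

```python
import random

# Signal leader from IL-4, copied from the text above (24 residues).
IL4_SIGNAL = "MGLTSQLLPPLFFLLACAGNFVHG"

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical placeholders for the two halves of a presentation structure;
# a real construct would use the actual flanking sequences.
PRESENTATION_N = "GGGS"
PRESENTATION_C = "SGGG"


def random_peptide(length: int, rng: random.Random) -> str:
    """Draw a uniformly random peptide of the given length."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def secreted_fusion(insert: str, signal: str = IL4_SIGNAL) -> str:
    """Assemble signal, presentation half, random region, presentation half (N to C)."""
    return signal + PRESENTATION_N + insert + PRESENTATION_C


def mature_product(fusion: str, signal: str = IL4_SIGNAL) -> str:
    """Model signal-peptide cleavage by removing the leader from the N-terminus."""
    if not fusion.startswith(signal):
        raise ValueError("fusion does not begin with the expected signal sequence")
    return fusion[len(signal):]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
library = [secreted_fusion(random_peptide(10, rng)) for _ in range(5)]
for member in library:
    print(member, "->", mature_product(member))
```

Each printed pair shows the full precursor and the product that would remain after the leader is removed; only the randomized 10-residue region differs between library members.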
In a preferred embodiment, the fusion partner is a rescue sequence. A rescue sequence is a sequence which may be used to purify or isolate either the peptide or the nucleic acid encoding it. Thus, for example, peptide rescue sequences include purification sequences such as the His6 tag for use with Ni affinity columns and epitope tags for detection, immunoprecipitation or FACS (fluorescence-activated cell sorting). Suitable epitope tags include myc (for use with the commercially available 9E10 antibody), the BSP biotinylation target sequence of the bacterial enzyme BirA, flu tags, lacZ, GST, and Strep tag I and II.
Alternatively, the rescue sequence may be a unique oligonucleotide sequence which serves as a probe target site to allow the quick and easy isolation of the retroviral construct via PCR, related techniques, or hybridization.
In a preferred embodiment, the fusion partner is a stability sequence to confer stability on the peptide or the nucleic acid encoding it. Thus, for example, peptides may be stabilized by the incorporation of glycines after the initiation methionine (MG or MGG0), for protection of the peptide from ubiquitination as per Varshavsky's N-End Rule, thus conferring a long half-life in the cytoplasm. Similarly, two prolines at the C-terminus yield peptides that are largely resistant to carboxypeptidase action. The presence of two glycines prior to the prolines both imparts flexibility and prevents structure-initiating events in the di-proline from being propagated into the peptide structure. Thus, preferred stability sequences are as follows: MG(X)nGGPP, where X is any amino acid and n is an integer of at least four. Thus, the terms "N-cap", "N-cap residues", "N-cap sequence" or grammatical equivalents thereof refer to a sequence conferring stability, particularly proteolytic stability, when fused to the N-terminus of a peptide, or to the N-terminus of a scaffold protein, or to the N-terminus of a presentation structure. Similarly, the terms "C-cap", "C-cap residues", "C-cap sequence" or grammatical equivalents thereof refer to a sequence conferring stability, particularly proteolytic stability, when fused to the C-terminus of a peptide, or to the C-terminus of a scaffold protein, or to the C-terminus of a presentation structure.
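The MG(X)nGGPP pattern lends itself to a short illustration. The following Python sketch, offered only as one possible reading of the pattern, wraps a core peptide in the N- and C-terminal stability residues and checks whether a given sequence conforms; the function names and the restriction of X to the 20 standard one-letter amino acid codes are assumptions made for the example.

```python
import re

# The 20 standard amino acids in one-letter code (assumed alphabet for X).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# MG, then at least four arbitrary residues, then GGPP.
STABILITY_PATTERN = re.compile(rf"MG[{AMINO_ACIDS}]{{4,}}GGPP")


def add_stability_caps(core: str) -> str:
    """Wrap a core peptide (the X_n region) as MG-core-GGPP."""
    if len(core) < 4:
        raise ValueError("core must contain at least four residues")
    if any(res not in AMINO_ACIDS for res in core):
        raise ValueError("core contains a non-standard residue")
    return f"MG{core}GGPP"


def has_stability_caps(peptide: str) -> bool:
    """Return True if the whole peptide matches MG(X)nGGPP with n >= 4."""
    return STABILITY_PATTERN.fullmatch(peptide) is not None


capped = add_stability_caps("ACDE")
print(capped, has_stability_caps(capped))
print("MGACDGGPP", has_stability_caps("MGACDGGPP"))  # only three X residues
```

Because X may itself be glycine or proline, the check relies on the regular expression engine to find any valid split of the sequence into the three parts rather than on fixed positions.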
The fusion partners may be placed anywhere (i.e., N-terminal, C-terminal, internal) in the structure as the biology and activity permit. In addition, while the discussion has been directed to the fusion of fusion partners to the peptide portion of the fusion polypeptide, it is also possible to fuse one or more of these fusion partners to the scaffold portion of the fusion polypeptide. Thus, for example, the scaffold may contain a targeting sequence (either N-terminally, C-terminally, or internally, as described below) at one location, and a rescue sequence in the same place or a different place on the molecule. Thus, any combination of fusion partners and peptides and scaffold proteins may be made.
In a preferred embodiment, the fusion partner includes a linker or tethering sequence. Linker sequences between various targeting sequences (for example, membrane targeting sequences) and the other components of the constructs (such as the randomized peptides) may be desirable to allow the peptides to interact with potential targets unhindered. For example, useful linkers include glycine polymers (G)n, glycine-serine polymers (including, for example, (GS)n, (GSGGS)n and (GGGS)n, where n is an integer of at least one), glycine-alanine polymers, alanine-serine polymers, and other flexible linkers such as the tether for the shaker potassium channel, and a large variety of other flexible linkers, as will be appreciated by those in the art. Glycine and glycine-serine polymers are preferred since both of these amino acids are relatively unstructured, and therefore may be able to serve as a neutral tether between components. Glycine polymers are the most preferred, as glycine accesses significantly more phi-psi space than even alanine, and is much less restricted than residues with longer side chains (see Scheraga, Rev Computational Chem 11173-142 (1992)). Secondly, serine is hydrophilic and therefore able to solubilize what could be a globular glycine chain. Third, similar chains have been shown to be effective in joining subunits of recombinant proteins such as single chain antibodies. In a preferred embodiment, the peptide is connected to the scaffold via linkers. That is, while one embodiment utilizes the direct linkage of the peptide to the scaffold, or of the peptide and any fusion partners to the scaffold, a preferred embodiment utilizes linkers at one or both ends of the peptide. That is, when attached either to the N- or C-terminus, one linker may be used. When the peptide is inserted in an internal position, as is generally outlined below, preferred embodiments utilize at least one linker and preferably two, one at each terminus of the peptide. Linkers are generally preferred in order to conformationally decouple any insertion sequence (i.e., the peptide) from the scaffold structure itself, to minimize local distortions in the scaffold structure that can either destabilize folding intermediates or allow access to GFP's buried tripeptide fluorophore, which decreases (or eliminates) GFP's fluorescence due to exposure to exogenous collisional fluorescence quenchers (see Phillips, Curr Opin Structural Biology 7:821 (1997), hereby incorporated by reference in its entirety).
Accordingly, as outlined below, when the peptides are inserted into internal positions in the scaffold, preferred embodiments utilize linkers, and preferably (gly)n linkers, where n is 1 or more, with n being two, three, four, five or six, although linkers of 7-10 or more amino acids are also possible. Generally in this embodiment, no amino acids with β-carbons are used in the linkers.
In another preferred embodiment, the linker comprises the sequence GQGGG. Alternatively, the linker comprises the sequence GQAGGGG. As outlined herein, either linker may be fused to either the N-terminus or C-terminus of a peptide or scaffold protein.
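The repeat-unit notation used for the linkers, such as (G)n or (GGGS)n with n of at least one, can be sketched in a few lines of Python. This is a minimal illustration rather than a construct design tool: the helper names are hypothetical, the example peptide is arbitrary, and the glycine-only check simply encodes the statement that linkers for internal insertions generally contain no residues with β-carbons.

```python
# Fixed linkers quoted in the text.
NAMED_LINKERS = ("GQGGG", "GQAGGGG")


def repeat_linker(unit: str, n: int) -> str:
    """Build a polymer linker such as (G)n, (GS)n, (GSGGS)n or (GGGS)n."""
    if n < 1:
        raise ValueError("n must be an integer of at least one")
    return unit * n


def is_glycine_only(linker: str) -> bool:
    """True if the linker has no residues with beta-carbons, i.e. only glycine."""
    return bool(linker) and set(linker) == {"G"}


def insert_with_linkers(peptide: str, n_linker: str = "", c_linker: str = "") -> str:
    """Flank a peptide with an optional linker at each terminus."""
    return n_linker + peptide + c_linker


gly3 = repeat_linker("G", 3)
print(repeat_linker("GGGS", 3))                       # (GGGS)3
print(insert_with_linkers("ACDEFGHIK", gly3, gly3))   # internal insertion, two linkers
print([is_glycine_only(x) for x in (gly3,) + NAMED_LINKERS])
```

For a terminal fusion a single linker is passed; for an internal insertion both `n_linker` and `c_linker` are supplied, matching the preference for one linker at each terminus of the peptide.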
In addition, the fusion partners, including presentation structures, may be modified, randomized, and/or matured to alter the presentation orientation of the randomized expression product. For example, determinants at the base of the loop may be modified to slightly modify the internal loop peptide tertiary structure while maintaining the randomized amino acid sequence.
In a preferred embodiment, combinations of fusion partners are used. Thus, for example, any number of combinations of presentation structures, targeting sequences, rescue sequences, and stability sequences may be used, with or without linker sequences. As will be appreciated by those in the art, using a base vector that contains a cloning site for receiving random and/or biased libraries, one can cassette in various fusion partners 5' and 3' of the library. In addition, as discussed herein, it is possible to have more than one variable region in a construct, either to together form a new surface or to bring two other molecules together. Similarly, as more fully outlined below, it is possible to have peptides inserted at two or more different loops of the scaffold, preferably but not necessarily on the same "face" of the scaffold. The invention further provides fusion nucleic acids encoding the fusion polypeptides of the invention. As will be appreciated by those in the art, due to the degeneracy of the genetic code, an extremely large number of nucleic acids may be made, all of which encode the fusion proteins of the present invention. Thus, having identified a particular amino acid sequence, those skilled in the art could make any number of different nucleic acids, by simply modifying the sequence of one or more codons in a way which does not change the amino acid sequence of the fusion protein.
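To give a sense of scale for the degeneracy argument above, the number of distinct coding sequences for a given amino acid sequence is the product of the number of synonymous codons for each residue. The Python sketch below assumes the standard genetic code (61 sense codons) and ignores stop codons; the function name is illustrative.

```python
from math import prod

# Number of synonymous codons per amino acid in the standard genetic code.
CODONS_PER_RESIDUE = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2,
    "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4,
    "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_encodings(peptide: str) -> int:
    """Count the distinct nucleotide sequences that encode the peptide."""
    return prod(CODONS_PER_RESIDUE[res] for res in peptide)


print(count_encodings("MW"))      # methionine and tryptophan have one codon each
print(count_encodings("GQGGG"))   # the five-residue linker quoted in the text
print(count_encodings("MGSSKSKPKDPSQR"))  # a 14-residue sequence quoted in the text
```

Even the five-residue linker has hundreds of possible encodings, and the count grows multiplicatively with length, which is why a full-length fusion protein can be encoded by an extremely large number of nucleic acids.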
Using the nucleic acids of the present invention which encode a fusion protein, a variety of expression vectors are made. The expression vectors may be either self-replicating extrachromosomal vectors or vectors which integrate into a host genome. Generally, these expression vectors include transcriptional and translational regulatory nucleic acid operably linked to the nucleic acid encoding the fusion protein. The term "control sequences" refers to DNA sequences necessary for the expression of an operably linked coding sequence in a particular host organism. The control sequences that are suitable for prokaryotes, for example, include a promoter, optionally an operator sequence, and a ribosome binding site. Eukaryotic cells are known to utilize promoters, polyadenylation signals, and enhancers.
Nucleic acid is "operably linked" when it is placed into a functional relationship with another nucleic acid sequence. For example, DNA for a presequence or secretory leader is operably linked to DNA for a polypeptide if it is expressed as a preprotein that participates in the secretion of the polypeptide; a promoter or enhancer is operably linked to a coding sequence if it affects the transcription of the sequence; or a ribosome binding site is operably linked to a coding sequence if it is positioned so as to facilitate translation. Generally, "operably linked" means that the DNA sequences being linked are contiguous, and, in the case of a secretory leader, contiguous and in reading phase. However, enhancers do not have to be contiguous. Linking is accomplished by ligation at convenient restriction sites. If such sites do not exist, synthetic oligonucleotide adaptors or linkers are used in accordance with conventional practice. The transcriptional and translational regulatory nucleic acid will generally be appropriate to the host cell used to express the fusion protein; for example, transcriptional and translational regulatory nucleic acid sequences from Bacillus are preferably used to express the fusion protein in Bacillus. Numerous types of appropriate expression vectors and suitable regulatory sequences are known in the art for a variety of host cells.
In general, the transcriptional and translational regulatory sequences may include, but are not limited to, promoter sequences, ribosomal binding sites, transcriptional start and stop sequences, translational start and stop sequences, and enhancer or activator sequences. In a preferred embodiment, the regulatory sequences include a promoter and transcriptional start and stop sequences.
Promoter sequences encode either constitutive or inducible promoters. The promoters may be either naturally occurring promoters or hybrid promoters. Hybrid promoters, which combine elements of more than one promoter, are also known in the art, and are useful in the present invention. In a preferred embodiment, the promoters are strong promoters, allowing high expression in cells, particularly mammalian cells, such as the CMV promoter, particularly in combination with a Tet regulatory element.
In addition, the expression vector may comprise additional elements. For example, the expression vector may have two replication systems, thus allowing it to be maintained in two organisms, for example in mammalian or insect cells for expression and in a prokaryotic host for cloning and amplification. Furthermore, for integrating expression vectors, the expression vector contains at least one sequence homologous to the host cell genome, and preferably two homologous sequences which flank the expression construct. The integrating vector may be directed to a specific locus in the host cell by selecting the appropriate homologous sequence for inclusion in the vector. Constructs for integrating vectors are well known in the art.
In addition, in a preferred embodiment, the expression vector contains a selectable marker gene to allow the selection of transformed host cells. Selection genes are well known in the art and will vary with the host cell used.
A preferred expression vector system is a retroviral vector system such as is generally described in PCT/US97/01019 and PCT/US97/01048, both of which are hereby expressly incorporated by reference.
The candidate nucleic acids are introduced into the cells for screening, as is more fully outlined below. By "introduced into" or grammatical equivalents herein is meant that the nucleic acids enter the cells in a manner suitable for subsequent expression of the nucleic acid. The method of introduction is largely dictated by the targeted cell type, discussed below. Exemplary methods include CaPO4 precipitation, liposome fusion, Lipofectin®, electroporation, viral infection, etc. The candidate nucleic acids may stably integrate into the genome of the host cell (for example, with retroviral introduction, outlined below), or may exist either transiently or stably in the cytoplasm (i.e., through the use of traditional plasmids, utilizing standard regulatory sequences, selection markers, etc.). As many pharmaceutically important screens require human or model mammalian cell targets, retroviral vectors capable of transfecting such targets are preferred.
The fusion proteins of the present invention are produced by culturing a host cell transformed with an expression vector containing nucleic acid encoding a fusion protein, under the appropriate conditions to induce or cause expression of the fusion protein. The conditions appropriate for fusion protein expression will vary with the choice of the expression vector and the host cell, and will be easily ascertained by one skilled in the art through routine experimentation. For example, the use of constitutive promoters in the expression vector will require optimizing the growth and proliferation of the host cell, while the use of an inducible promoter requires the appropriate growth conditions for induction. In addition, in some embodiments, the timing of the harvest is important. For example, the baculoviral systems used in insect cell expression are lytic viruses, and thus harvest time selection can be crucial for product yield.
Appropriate host cells include yeast, bacteria, archaebacteria, fungi, and insect and animal cells, including mammalian cells. Of particular interest are Drosophila melanogaster cells, Saccharomyces cerevisiae and other yeasts, E. coli, Bacillus subtilis, SF9 cells, C129 cells, 293 cells, Neurospora, BHK, CHO, COS, and HeLa cells, fibroblasts, Schwannoma cell lines, immortalized mammalian myeloid and lymphoid cell lines, Jurkat cells, mast cells and other endocrine and exocrine cells, and neuronal cells.
In a preferred embodiment, the fusion proteins are expressed in mammalian cells. Mammalian expression systems are also known in the art, and include retroviral systems. A mammalian promoter is any DNA sequence capable of binding mammalian RNA polymerase and initiating the downstream (3') transcription of a coding sequence for the fusion protein into mRNA. A promoter will have a transcription initiating region, which is usually placed proximal to the 5' end of the coding sequence, and a TATA box, usually located 25-30 base pairs upstream of the transcription initiation site. The TATA box is thought to direct RNA polymerase II to begin RNA synthesis at the correct site. A mammalian promoter will also contain an upstream promoter element (enhancer element), typically located within 100 to 200 base pairs upstream of the TATA box. An upstream promoter element determines the rate at which transcription is initiated and can act in either orientation. Of particular use as mammalian promoters are the promoters from mammalian viral genes, since the viral genes are often highly expressed and have a broad host range. Examples include the SV40 early promoter, mouse mammary tumor virus LTR promoter, adenovirus major late promoter, herpes simplex virus promoter, and the CMV promoter. Typically, transcription termination and polyadenylation sequences recognized by mammalian cells are regulatory regions located 3' to the translation stop codon and thus, together with the promoter elements, flank the coding sequence. The 3' terminus of the mature mRNA is formed by site-specific post-transcriptional cleavage and polyadenylation. Examples of transcription terminator and polyadenylation signals include those derived from SV40.
The methods of introducing exogenous nucleic acid into mammalian hosts, as well as other hosts, are well known in the art, and will vary with the host cell used. Techniques include dextran-mediated transfection, calcium phosphate precipitation, polybrene-mediated transfection, protoplast fusion, electroporation, viral infection, encapsulation of the polynucleotide(s) in liposomes, and direct microinjection of the DNA into nuclei. As outlined herein, a particularly preferred method utilizes retroviral infection, as outlined in PCT/US97/01019, incorporated by reference.
As will be appreciated by those in the art, the type of mammalian cells used in the present invention can vary widely. Basically, any mammalian cells may be used, with mouse, rat, primate and human cells being particularly preferred, although as will be appreciated by those in the art, modifications of the system by pseudotyping allow all eukaryotic cells to be used, preferably higher eukaryotes. As is more fully described below, a screen will be set up such that the cells exhibit a selectable phenotype in the presence of a bioactive peptide. As is more fully described below, cell types implicated in a wide variety of disease conditions are particularly useful, so long as a suitable screen may be designed to allow the selection of cells that exhibit an altered phenotype as a consequence of the presence of a peptide within the cell.
Accordingly, suitable cell types include, but are not limited to, tumor cells of all types (particularly melanoma, myeloid leukemia, carcinomas of the lung, breast, ovaries, colon, kidney, prostate, pancreas and testes), cardiomyocytes, endothelial cells, epithelial cells, lymphocytes (T-cell and B-cell), mast cells, eosinophils, vascular intimal cells, hepatocytes, leukocytes including mononuclear leukocytes, stem cells such as hematopoietic, neural, skin, lung, kidney, liver and myocyte stem cells (for use in screening for differentiation and de-differentiation factors), osteoclasts, chondrocytes and other connective tissue cells, keratinocytes, melanocytes, liver cells, kidney cells, and adipocytes. Suitable cells also include known research cells, including, but not limited to, Jurkat T cells, NIH3T3 cells, CHO, Cos, etc. See the ATCC cell line catalog, hereby expressly incorporated by reference.
In one embodiment, the cells may be additionally genetically engineered, that is, contain exogenous nucleic acid other than the fusion nucleic acid. In a preferred embodiment, the fusion proteins are expressed in bacterial systems. Bacterial expression systems are well known in the art.
A suitable bacterial promoter is any nucleic acid sequence capable of binding bacterial RNA polymerase and initiating the downstream (3') transcription of the coding sequence of the fusion protein into mRNA. A bacterial promoter has a transcription initiation region which is usually placed proximal to the 5' end of the coding sequence. This transcription initiation region typically includes an RNA polymerase binding site and a transcription initiation site. Sequences encoding metabolic pathway enzymes provide particularly useful promoter sequences. Examples include promoter sequences derived from sugar metabolizing enzymes, such as galactose, lactose and maltose, and sequences derived from biosynthetic enzymes such as tryptophan. Promoters from bacteriophage may also be used and are known in the art. In addition, synthetic promoters and hybrid promoters are also useful; for example, the tac promoter is a hybrid of the trp and lac promoter sequences. Furthermore, a bacterial promoter can include naturally occurring promoters of non-bacterial origin that have the ability to bind bacterial RNA polymerase and initiate transcription.
In addition to a functioning promoter sequence, an efficient ribosome binding site is desirable. In E. coli, the ribosome binding site is called the Shine-Dalgarno (SD) sequence and includes an initiation codon and a sequence 3-9 nucleotides in length located 3-11 nucleotides upstream of the initiation codon.
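The spacing described above can be turned into a simple check on a candidate construct. The Python sketch below is illustrative only and rests on several assumptions that are not taken from the text: it uses AGGAGG as the Shine-Dalgarno consensus, it treats any fragment of that consensus at least three nucleotides long as a candidate site, and it measures the 3-11 nucleotide distance as the gap between the last base of the motif and the first base of the ATG. The function name and the example sequence are hypothetical.

```python
from typing import Optional, Tuple

# Assumption: commonly cited E. coli Shine-Dalgarno consensus, not from the text.
SD_CONSENSUS = "AGGAGG"


def best_sd_site(seq: str, start_idx: int, min_len: int = 3,
                 min_gap: int = 3, max_gap: int = 11) -> Optional[Tuple[int, str, int]]:
    """Return (position, motif, gap) for the longest consensus fragment found
    min_gap..max_gap nucleotides upstream of the start codon, or None."""
    seq = seq.upper().replace("U", "T")
    if seq[start_idx:start_idx + 3] != "ATG":
        raise ValueError("start_idx does not point at an ATG codon")
    # Try the longest fragments of the consensus first.
    for length in range(len(SD_CONSENSUS), min_len - 1, -1):
        for offset in range(len(SD_CONSENSUS) - length + 1):
            motif = SD_CONSENSUS[offset:offset + length]
            for gap in range(min_gap, max_gap + 1):
                pos = start_idx - gap - length
                if pos >= 0 and seq[pos:pos + length] == motif:
                    return pos, motif, gap
    return None


example = "TTTAGGAGGTTTTTTATGAAA"   # hypothetical leader; ATG begins at index 15
print(best_sd_site(example, 15))
print(best_sd_site("TTTTTTTTTTTTTTTATG", 15))
```

A real design workflow would also weigh mRNA secondary structure and the strength of pairing with the 16S rRNA; the sketch only verifies that a consensus-like motif sits within the stated distance window.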
The expression vector may also include a signal peptide sequence that provides for secretion of the fusion protein in bacteria. The signal sequence typically encodes a signal peptide comprised of hydrophobic amino acids which direct the secretion of the protein from the cell, as is well known in the art. The protein is either secreted into the growth media (gram-positive bacteria) or into the periplasmic space, located between the inner and outer membrane of the cell (gram-negative bacteria).
The bacterial expression vector may also include a selectable marker gene to allow for the selection of bacterial strains that have been transformed. Suitable selection genes include genes which render the bacteria resistant to drugs such as ampicillin, chloramphenicol, erythromycin, kanamycin, neomycin and tetracycline. Selectable markers also include biosynthetic genes, such as those in the histidine, tryptophan and leucine biosynthetic pathways. These components are assembled into expression vectors. Expression vectors for bacteria are well known in the art, and include vectors for Bacillus subtilis, E. coli, Streptococcus cremoris, and Streptomyces lividans, among others.
The bacterial expression vectors are transformed into bacterial host cells using techniques well known in the art, such as calcium chloride treatment, electroporation, and others.
In one embodiment, fusion proteins are produced in insect cells. Expression vectors for the transformation of insect cells, and in particular, baculovirus-based expression vectors, are well known in the art.
In a preferred embodiment, fusion protein is produced in yeast cells. Yeast expression systems are well known in the art, and include expression vectors for Saccharomyces cerevisiae, Candida albicans and C. maltosa, Hansenula polymorpha, Kluyveromyces fragilis and K. lactis, Pichia guillerimondii and P. pastoris, Schizosaccharomyces pombe, and Yarrowia lipolytica. Preferred promoter sequences for expression in yeast include the inducible GAL1,10 promoter, the promoters from alcohol dehydrogenase, enolase, glucokinase, glucose-6-phosphate isomerase, glyceraldehyde-3-phosphate-dehydrogenase, hexokinase, phosphofructokinase, 3-phosphoglycerate mutase, pyruvate kinase, and the acid phosphatase gene. Yeast selectable markers include ADE2, HIS4, LEU2, TRP1, and ALG7, which confers resistance to tunicamycin; the neomycin phosphotransferase gene, which confers resistance to G418; and the CUP1 gene, which allows yeast to grow in the presence of copper ions.
In addition, the fusion polypeptides of the invention may be further fused to other proteins, if desired, for example to increase expression.
In one embodiment, the fusion nucleic acids, proteins and antibodies of the invention are labeled with a label other than the scaffold. By "labeled" herein is meant that a compound has at least one element, isotope or chemical compound attached to enable the detection of the compound. In general, labels fall into three classes: a) isotopic labels, which may be radioactive or heavy isotopes; b) immune labels, which may be antibodies or antigens; and c) colored or fluorescent dyes. The labels may be incorporated into the compound at any position.
The fusion nucleic acids are introduced into the cells to screen for peptides capable of altering the phenotype of a cell. In a preferred embodiment, a first plurality of cells is screened. That is, the cells into which the fusion nucleic acids are introduced are screened for an altered phenotype. Thus, in this embodiment, the effect of the bioactive peptide is seen in the same cells in which it is made, i.e., an autocrine effect.
By a "plurality of cells" herein is meant roughly from about 10^3 cells to 10^8 or 10^9, with from 10^6 to 10^8 being preferred. This plurality of cells comprises a cellular library, wherein generally each cell within the library contains a member of the peptide molecular library, i.e., a different peptide (or nucleic acid encoding the peptide), although as will be appreciated by those in the art, some cells within the library may not contain a peptide, and some may contain more than one species of peptide. When methods other than retroviral infection are used to introduce the candidate nucleic acids into a plurality of cells, the distribution of candidate nucleic acids within the individual cell members of the cellular library may vary widely, as it is generally difficult to control the number of nucleic acids which enter a cell during electroporation, etc.
In a preferred embodiment, the fusion nucleic acids are introduced into a first plurality of cells, and the effect of the peptide is screened in a second or third plurality of cells, different from the first plurality of cells, i.e. generally a different cell type. That is, the effect of the bioactive peptide is due to an extracellular effect on a second cell, i.e. an endocrine or paracrine effect. This is done using standard techniques. The first plurality of cells may be grown in or on one media, and the media is allowed to touch a second plurality of cells, and the effect measured. Alternatively, there may be direct contact between the cells. Thus, "contacting" is functional contact, and includes both direct and indirect. In this embodiment, the first plurality of cells may or may not be screened.
If necessary, the cells are subjected to conditions suitable for the expression of the peptide (for example, when inducible promoters are used).
Thus, the methods of the present invention comprise introducing a molecular library of fusion nucleic acids encoding randomized peptides fused to scaffold into a plurality of cells, a cellular library. Each of the nucleic acids comprises a different nucleotide sequence encoding scaffold with a random peptide. The plurality of cells is then screened, as is more fully outlined below, for a cell exhibiting an altered phenotype. The altered phenotype is due to the presence of a bioactive peptide.
By "altered phenotype" or "changed physiology" or other grammatical equivalents herein is meant that the phenotype of the cell is altered in some way, preferably in some detectable and/or measurable way. As will be appreciated in the art, a strength of the present invention is the wide variety of cell types and potential phenotypic changes which may be tested using the present methods. Accordingly, any phenotypic change which may be observed, detected, or measured may be the basis of the screening methods herein. Suitable phenotypic changes include, but are not limited to: gross physical changes such as changes in cell morphology, cell growth, cell viability, adhesion to substrates or other cells, and cellular density; changes in the expression of one or more RNAs, proteins, lipids, hormones, cytokines, or other molecules; changes in the equilibrium state (i.e. half-life) of one or more RNAs, proteins, lipids, hormones, cytokines, or other molecules; changes in the localization of one or more RNAs, proteins, lipids, hormones, cytokines, or other molecules; changes in the bioactivity or specific activity of one or more RNAs, proteins, lipids, hormones, cytokines, receptors, or other molecules; changes in the secretion of ions, cytokines, hormones, growth factors, or other molecules; alterations in cellular membrane potentials, polarization, integrity or transport; changes in infectivity, susceptibility, latency, adhesion, and uptake of viruses and bacterial pathogens; etc. By "capable of altering the phenotype" herein is meant that the bioactive peptide can change the phenotype of the cell in some detectable and/or measurable way.
The altered phenotype may be detected in a wide variety of ways, as is described more fully below, and will generally depend on and correspond to the phenotype that is being changed. Generally, the changed phenotype is detected using, for example: microscopic analysis of cell morphology; standard cell viability assays, including both increased cell death and increased cell viability, for example, cells that are now resistant to cell death via virus, bacteria, or bacterial or synthetic toxins; standard labeling assays such as fluorometric indicator assays for the presence or level of a particular cell or molecule, including FACS or other dye staining techniques; biochemical detection of the expression of target compounds after killing the cells; etc. In some cases, as is more fully described herein, the altered phenotype is detected in the cell in which the fusion nucleic acid was introduced; in other embodiments, the altered phenotype is detected in a second cell which is responding to some molecular signal from the first cell.
An altered phenotype of a cell indicates the presence of a bioactive peptide, acting preferably in a transdominant way. By "transdominant" herein is meant that the bioactive peptide indirectly causes the altered phenotype by acting on a second molecule, which leads to an altered phenotype. That is, a transdominant expression product has an effect that is not in cis, i.e., a trans event as defined in genetic terms or biochemical terms. A transdominant effect is a distinguishable effect by a molecular entity (i.e., the encoded peptide or RNA) upon some separate and distinguishable target, that is, not an effect upon the encoded entity itself. As such, transdominant effects include many well-known effects by pharmacologic agents upon target molecules or pathways in cells or physiologic systems; for instance, the β-lactam antibiotics have a transdominant effect upon peptidoglycan synthesis in bacterial cells by binding to penicillin binding proteins and disrupting their functions. An exemplary transdominant effect by a peptide is the ability to inhibit NF-κB signaling by binding to IκB-α at a region critical for its function, such that in the presence of sufficient amounts of the peptide (or molecular entity), the signaling pathways that normally lead to the activation of NF-κB through phosphorylation and/or degradation of IκB-α are inhibited from acting at IκB-α because of the binding of the peptide or molecular entity. In another instance, signaling pathways that are normally activated to secrete IgE are inhibited in the presence of peptide. Or, signaling pathways in adipose tissue cells, normally quiescent, are activated to metabolize fat. Or, in the presence of a peptide, intracellular mechanisms for the replication of certain viruses, such as HIV-1, or Herpesviridae family members, or Respiratory Syncytial Virus, for example, are inhibited.
A transdominant effect upon a protein or molecular pathway is clearly distinguishable from randomization, change, or mutation of a sequence within a protein or molecule of known or unknown function to enhance or diminish a biochemical ability that protein or molecule already manifests. For instance, a protein that enzymatically cleaves β-lactam antibiotics, a β-lactamase, could be enhanced or diminished in its activity by mutating sequences internal to its structure that enhance or diminish the ability of this enzyme to act upon and cleave β-lactam antibiotics. This would be called a cis mutation to the protein. The effect of this protein upon β-lactam antibiotics is an activity the protein already manifests, to a distinguishable degree. Similarly, a mutation in the leader sequence that enhanced the export of this protein to the extracellular spaces wherein it might encounter β-lactam molecules more readily, or a mutation within the sequence that enhanced the stability of the protein, would be termed cis mutations in the protein. For comparison, a transdominant effector of this protein would include an agent, independent of the β-lactamase, that bound to the β-lactamase in such a way that it enhanced or diminished the function of the β-lactamase by virtue of its binding to β-lactamase.
In a preferred embodiment, once a cell with an altered phenotype is detected, the presence of the fusion protein is verified, to ensure that the peptide was expressed and thus that the altered phenotype can be due to the presence of the peptide. As will be appreciated by those in the art, this verification of the presence of the peptide can be done either before, during or after the screening for an altered phenotype. This can be done in a variety of ways, although preferred methods utilize FACS techniques. Once the presence of the fusion protein is verified, the cell with the altered phenotype is generally isolated from the plurality which do not have altered phenotypes. This may be done in any number of ways, as is known in the art, and will in some instances depend on the assay or screen. Suitable isolation techniques include, but are not limited to, FACS, lysis selection using complement, cell cloning, scanning by Fluorimager, expression of a "survival" protein, induced expression of a cell surface protein or other molecule that can be rendered fluorescent or taggable for physical isolation, expression of an enzyme that changes a non-fluorescent molecule to a fluorescent one, overgrowth against a background of no or slow growth, death of cells and isolation of DNA or other cell vitality indicator dyes, etc.
In a preferred embodiment, the fusion nucleic acid and/or the bioactive peptide (i.e. the fusion protein) is isolated from the positive cell. This may be done in a number of ways. In a preferred embodiment, primers complementary to DNA regions common to the retroviral constructs, or to specific components of the library such as a rescue sequence, defined above, are used to "rescue" the unique random sequence. Alternatively, the fusion protein is isolated using a rescue sequence. Thus, for example, rescue sequences comprising epitope tags or purification sequences may be used to pull out the fusion protein using immunoprecipitation or affinity columns. In some instances, as is outlined below, this may also pull out the primary target molecule, if there is a sufficiently strong binding interaction between the bioactive peptide and the target molecule. Alternatively, the peptide may be detected using mass spectroscopy.
Once rescued, the sequence of the bioactive peptide and/or fusion nucleic acid is determined. This information can then be used in a number of ways.
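In computational terms, reading out a rescued clone amounts to locating the constant regions in a sequencing read and translating what lies between them. The following is a minimal sketch under stated assumptions: the flanking sequences, the read, and the function names are hypothetical and are not taken from the constructs described here, and the insert is assumed to be in frame with the standard genetic code.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def rescue_insert(read, left_flank, right_flank):
    """Return the sequence between two constant flanks, or None if either is absent."""
    start = read.find(left_flank)
    if start == -1:
        return None
    start += len(left_flank)
    end = read.find(right_flank, start)
    if end == -1:
        return None
    return read[start:end]

def translate(dna):
    """Translate an in-frame DNA string; a trailing partial codon is ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

# Hypothetical constant regions and sequencing read, for illustration only.
LEFT, RIGHT = "GGATCC", "GAATTC"
read = "TTAC" + LEFT + "ATGGCTAAATGG" + RIGHT + "CCGT"
insert = rescue_insert(read, LEFT, RIGHT)
print(insert, translate(insert))
```

Returning None when a flank is missing keeps truncated or unrelated reads from being mistaken for library members.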
In a preferred embodiment, the bioactive peptide is resynthesized and reintroduced into the target cells, to verify the effect. This may be done using retroviruses, or alternatively using fusions to the HIV-1 Tat protein, and analogs and related proteins, which allows very high uptake into target cells. See for example, Fawell et al., PNAS USA 91:664 (1994); Frankel et al., Cell 55:1189 (1988); Savion et al., J. Biol. Chem. 256:1149 (1981); Derossi et al., J. Biol. Chem. 269:10444 (1994); and Baldin et al., EMBO J. 9:1511 (1990), all of which are incorporated by reference.
In a preferred embodiment, the sequence of a bioactive peptide is used to generate more candidate peptides. For example, the sequence of the bioactive peptide may be the basis of a second round of (biased) randomization, to develop bioactive peptides with increased or altered activities. Alternatively, the second round of randomization may change the affinity of the bioactive peptide. Furthermore, it may be desirable to put the identified random region of the bioactive peptide into other presentation structures, or to alter the sequence of the constant region of the presentation structure, to alter the conformation/shape of the bioactive peptide. It may also be desirable to "walk" around a potential binding site, in a manner similar to the mutagenesis of a binding pocket, by keeping one end of the ligand region constant and randomizing the other end to shift the binding of the peptide around.
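The two strategies above, biased randomization around a hit and "walking" by holding one end constant, can be sketched at the peptide level as follows. This is an illustration only: the per-position retention probability, the uniform choice among the twenty amino acids, the parent sequence, and the function names are all assumptions, and in practice the bias would be introduced at the nucleic acid level.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def biased_variant(parent, keep_prob, rng):
    """Keep each parent residue with probability keep_prob; otherwise draw uniformly."""
    return "".join(
        aa if rng.random() < keep_prob else rng.choice(AMINO_ACIDS)
        for aa in parent
    )

def walk_variant(parent, n_constant, rng):
    """Hold the first n_constant residues fixed and fully randomize the rest."""
    tail = "".join(rng.choice(AMINO_ACIDS) for _ in parent[n_constant:])
    return parent[:n_constant] + tail

rng = random.Random(0)   # fixed seed so the sketch is reproducible
parent = "MAKWLTRE"      # hypothetical hit sequence
second_round = [biased_variant(parent, 0.7, rng) for _ in range(5)]
walked = [walk_variant(parent, 4, rng) for _ in range(5)]
print(second_round)
print(walked)
```

A higher retention probability keeps the second-round library closer to the original hit, while a lower one explores more sequence space.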
In a preferred embodiment, either the bioactive peptide or the bioactive nucleic acid encoding it is used to identify target molecules, i.e. the molecules with which the bioactive peptide interacts. As will be appreciated by those in the art, there may be primary target molecules, to which the bioactive peptide binds or acts upon directly, and there may be secondary target molecules, which are part of the signalling pathway affected by the bioactive peptide; these might be termed "validated targets".
In a preferred embodiment, the bioactive peptide is used to pull out target molecules. For example, as outlined herein, if the target molecules are proteins, the use of epitope tags or purification sequences can allow the purification of primary target molecules via biochemical means (co-immunoprecipitation, affinity columns, etc.). Alternatively, the peptide, when expressed in bacteria and purified, can be used as a probe against a bacterial cDNA expression library made from mRNA of the target cell type. Or, peptides can be used as "bait" in either yeast or mammalian two or three hybrid systems. Such interaction cloning approaches have been very useful to isolate DNA-binding proteins and other interacting protein components. The peptide(s) can be combined with other pharmacologic activators to study the epistatic relationships of signal transduction pathways in question. It is also possible to synthetically prepare labeled peptide and use it to screen a cDNA library expressed in bacteriophage for those cDNAs which bind the peptide. Furthermore, it is also possible that one could use cDNA cloning via retroviral libraries to "complement" the effect induced by the peptide. In such a strategy, the peptide would be required to be stoichiometrically titrating away some important factor for a specific signaling pathway. If this molecule or activity is replenished by over-expression of a cDNA from within a cDNA library, then one can clone the target. Similarly, cDNAs cloned by any of the above yeast or bacteriophage systems can be reintroduced to mammalian cells in this manner to confirm that they act to complement function in the system the peptide acts upon.
Once primary target molecules have been identified, secondary target molecules may be identified in the same manner, using the primary target as the "bait". In this manner, signalling pathways may be elucidated. Similarly, bioactive peptides specific for secondary target molecules may also be discovered, to allow a number of bioactive peptides to act on a single pathway, for example for combination therapies. The screening methods of the present invention may be useful to screen a large number of cell types under a wide variety of conditions. Generally, the host cells are cells that are involved in disease states, and they are tested or screened under conditions that normally result in undesirable consequences on the cells. When a suitable bioactive peptide is found, the undesirable effect may be reduced or eliminated. Alternatively, normally desirable consequences may be reduced or eliminated, with an eye towards elucidating the cellular mechanisms associated with the disease state or signalling pathway.
In a preferred embodiment, the present methods are useful in cancer applications. The ability to rapidly and specifically kill tumor cells is a cornerstone of cancer chemotherapy. In general, using the methods of the present invention, random libraries can be introduced into any tumor cell (primary or cultured), and peptides identified which by themselves induce apoptosis, cell death, loss of cell division or decreased cell growth. This may be done de novo, or by biased randomization toward known peptide agents, such as angiostatin, which inhibits blood vessel wall growth. Alternatively, the methods of the present invention can be combined with other cancer therapeutics (e.g. drugs or radiation) to sensitize the cells and thus induce rapid and specific apoptosis, cell death, loss of cell division or decreased cell growth after exposure to a secondary agent. Similarly, the present methods may be used in conjunction with known cancer therapeutics to screen for agonists to make the therapeutic more effective or less toxic. This is particularly preferred when the chemotherapeutic is very expensive to produce, such as taxol.
Known oncogenes such as v-Abl, v-Src, v-Ras, and others, induce a transformed phenotype leading to abnormal cell growth when transfected into certain cells. This is also a major problem with micro-metastases. Thus, in a preferred embodiment, non-transformed cells can be transfected with these oncogenes, and then random libraries introduced into these cells, to select for bioactive peptides which reverse or correct the transformed state. One of the signal features of oncogene transformation of cells is the loss of contact inhibition and the ability to grow in soft-agar. When transforming viruses are constructed containing v-Abl, v-Src, or v-Ras in IRES-puro retroviral vectors, infected into target 3T3 cells, and subjected to puromycin selection, all of the 3T3 cells hyper-transform and detach from the plate. The cells may be removed by washing with fresh medium. This can serve as the basis of a screen, since cells which express a bioactive peptide will remain attached to the plate and form colonies.
Similarly, the growth and/or spread of certain tumor types is enhanced by stimulatory responses from growth factors and cytokines (PDGF, EGF, Heregulin, and others) which bind to receptors on the surfaces of specific tumors. In a preferred embodiment, the methods of the invention are used to inhibit or stop tumor growth and/or spread, by finding bioactive peptides capable of blocking the ability of the growth factor or cytokine to stimulate the tumor cell. This involves the introduction of random libraries into specific tumor cells with the addition of the growth factor or cytokine, followed by selection of bioactive peptides which block the binding, signaling, phenotypic and/or functional responses of these tumor cells to the growth factor or cytokine in question.
Similarly, the spread of cancer cells (invasion and metastasis) is a significant problem limiting the success of cancer therapies. The ability to inhibit the invasion and/or migration of specific tumor cells would be a significant advance in the therapy of cancer. Tumor cells known to have a high metastatic potential (for example, melanoma, lung cell carcinoma, breast and ovarian carcinoma) can have random libraries introduced into them, and peptides selected which, in a migration or invasion assay, inhibit the migration and/or invasion of specific tumor cells. Particular applications for inhibition of the metastatic phenotype, which could allow a more specific inhibition of metastasis, include the metastasis suppressor gene NM23, which codes for a dinucleoside diphosphate kinase. Thus intracellular peptide activators of this gene could block metastasis, and a screen for its upregulation (by fusing it to a reporter gene) would be of interest. Many oncogenes also enhance metastasis. Peptides which inactivate or counteract mutated RAS oncogenes, v-MOS, v-RAF, A-RAF, v-SRC, v-FES, and v-FMS would also act as anti-metastatics. Peptides which act intracellularly to block the release of combinations of proteases required for invasion, such as the matrix metalloproteases and urokinase, could also be effective antimetastatics.
In a preferred embodiment, the random libraries of the present invention are introduced into tumor cells known to have inactivated tumor suppressor genes, and successful reversal by either reactivation or compensation of the knockout would be screened by restoration of the normal phenotype. A major example is the reversal of p53-inactivating mutations, which are present in 50% or more of all cancers. Since p53's actions are complex and involve its action as a transcription factor, there are probably numerous potential ways a peptide or small molecule derived from a peptide could reverse the mutation. One example would be upregulation of the immediately downstream cyclin-dependent kinase p21CIP1/WAF1. To be useful such reversal would have to work for many of the different known p53 mutations. This is currently being approached by gene therapy; one or more small molecules which do this might be preferable.
Another example involves screening of bioactive peptides which restore the constitutive function of the brca-1 or brca-2 genes, and other tumor suppressor genes important in breast cancer such as the adenomatous polyposis coli gene (APC) and the Drosophila discs-large gene (Dlg), which are components of cell-cell junctions. Mutations of brca-1 are important in hereditary ovarian and breast cancers, and constitute an additional application of the present invention.
In a preferred embodiment, the methods of the present invention are used to create novel cell lines from cancers from patients. A retrovirally delivered short peptide which inhibits the final common pathway of programmed cell death should allow for short- and possibly long-term cell lines to be established. Conditions of in vitro culture and infection of human leukemia cells will be established. There is a real need for methods which allow the maintenance of certain tumor cells in culture long enough to allow for physiological and pharmacological studies. Currently, some human cell lines have been established by the use of transforming agents such as Epstein-Barr virus that considerably alters the existing physiology of the cell. On occasion, cells will grow on their own in culture but this is a random event. Programmed cell death (apoptosis) occurs via complex signaling pathways within cells that ultimately activate a final common pathway producing characteristic changes in the cell leading to a non-inflammatory destruction of the cell. It is well known that tumor cells have a high apoptotic index, or propensity to enter apoptosis in vivo. When cells are placed in culture, the in vivo stimuli for malignant cell growth are removed and cells readily undergo apoptosis. The objective would be to develop the technology to establish cell lines from any number of primary tumor cells, for example primary human leukemia cells, in a reproducible manner without altering the native configuration of the signaling pathways in these cells. By introducing nucleic acids encoding peptides which inhibit apoptosis, increased cell survival in vitro, and hence the opportunity to study signal transduction pathways in primary human tumor cells, is accomplished. In addition, these methods may be used for culturing primary cells, i.e. non-tumor cells.
In a preferred embodiment, the present methods are useful in cardiovascular applications. In a preferred embodiment, cardiomyocytes may be screened for the prevention of cell damage or death in the presence of normally injurious conditions, including, but not limited to: the presence of toxic drugs (particularly chemotherapeutic drugs), for example, to prevent heart failure following treatment with adriamycin; anoxia, for example in the setting of coronary artery occlusion; and autoimmune cellular damage by attack from activated lymphoid cells (for example as seen in post viral myocarditis and lupus). Candidate bioactive peptides are inserted into cardiomyocytes, the cells are subjected to the insult, and bioactive peptides are selected that prevent any or all of apoptosis, membrane depolarization (i.e. decrease arrhythmogenic potential of insult), cell swelling, or leakage of specific intracellular ions, second messengers and activating molecules (for example, arachidonic acid and/or lysophosphatidic acid). In a preferred embodiment, the present methods are used to screen for diminished arrhythmia potential in cardiomyocytes. The screens comprise the introduction of the candidate nucleic acids encoding candidate bioactive peptides, followed by the application of arrhythmogenic insults, with screening for bioactive peptides that block specific depolarization of cell membrane. This may be detected using patch clamps, or via fluorescence techniques. Similarly, channel activity (for example, potassium and chloride channels) in cardiomyocytes could be regulated using the present methods in order to enhance contractility and prevent or diminish arrhythmias.
In a preferred embodiment, the present methods are used to screen for enhanced contractile properties of cardiomyocytes and diminish heart failure potential. The introduction of the libraries of the invention followed by measuring the rate of change of myosin polymerization/depolymerization using fluorescent techniques can be done. Bioactive peptides which increase the rate of change of this phenomenon can result in a greater contractile response of the entire myocardium, similar to the effect seen with digitalis.
In a preferred embodiment, the present methods are useful to identify agents that will regulate the intracellular and sarcolemmal calcium cycling in cardiomyocytes in order to prevent arrhythmias. Bioactive peptides are selected that regulate sodium-calcium exchange, sodium proton pump function, and regulation of calcium-ATPase activity.
In a preferred embodiment, the present methods are useful to identify agents that diminish embolic phenomena in arteries and arterioles leading to strokes (and other occlusive events leading to kidney failure and limb ischemia) and angina precipitating a myocardial infarct. For example, bioactive peptides are selected which will diminish the adhesion of platelets and leukocytes, and thus diminish the occlusion events. Adhesion in this setting can be inhibited by the libraries of the invention being inserted into endothelial cells (quiescent cells, or activated by cytokines, i.e. IL-1, and growth factors, i.e. PDGF / EGF) and then screening for peptides that either 1) downregulate adhesion molecule expression on the surface of the endothelial cells (binding assay), 2) block adhesion molecule activation on the surface of these cells (signaling assay), or 3) release in an autocrine manner peptides that block receptor binding to the cognate receptor on the adhering cell.
Embolic phenomena can also be addressed by activating proteolytic enzymes on the cell surfaces of endothelial cells, and thus releasing active enzyme which can digest blood clots. Thus, delivery of the libraries of the invention to endothelial cells is done, followed by standard fluorogenic assays, which will allow monitoring of proteolytic activity on the cell surface towards a known substrate. Bioactive peptides can then be selected which activate specific enzymes towards specific substrates.
In a preferred embodiment, arterial inflammation in the setting of vasculitis and post-infarction can be regulated by decreasing the chemotactic responses of leukocytes and mononuclear leukocytes. This can be accomplished by blocking chemotactic receptors and their responding pathways on these cells. Candidate bioactive libraries can be inserted into these cells, and the chemotactic response to diverse chemokines (for example, to the IL-8 family of chemokines, RANTES) inhibited in cell migration assays.
In a preferred embodiment, arterial restenosis following coronary angioplasty can be controlled by regulating the proliferation of vascular intimal cells and capillary and/or arterial endothelial cells. Candidate bioactive peptide libraries can be inserted into these cell types and their proliferation in response to specific stimuli monitored. One application may be intracellular peptides which block the expression or function of c-myc and other oncogenes in smooth muscle cells to stop their proliferation. A second application may involve the expression of libraries in vascular smooth muscle cells to selectively induce their apoptosis. Application of small molecules derived from these peptides may require targeted drug delivery; this is available with stents, hydrogel coatings, and infusion-based catheter systems. Peptides which downregulate endothelin-1 A receptors or which block the release of the potent vasoconstrictor and vascular smooth muscle cell mitogen endothelin-1 may also be candidates for therapeutics. Peptides can be isolated from these libraries which inhibit growth of these cells, or which prevent the adhesion of other cells in the circulation known to release autocrine growth factors, such as platelets (PDGF) and mononuclear leukocytes.
The control of capillary and blood vessel growth is an important goal in order to promote increased blood flow to ischemic areas (growth), or to cut off the blood supply (angiogenesis inhibition) of tumors. Candidate bioactive peptide libraries can be inserted into capillary endothelial cells and their growth monitored. Stimuli such as low oxygen tension and varying degrees of angiogenic factors can regulate the responses, and peptides isolated that produce the appropriate phenotype. Screening for antagonism of vascular endothelial cell growth factor, important in angiogenesis, would also be useful.
In a preferred embodiment, the present methods are useful in screening for decreases in atherosclerosis producing mechanisms to find peptides that regulate LDL and HDL metabolism. Candidate libraries can be inserted into the appropriate cells (including hepatocytes, mononuclear leukocytes, endothelial cells) and peptides selected which lead to a decreased release of LDL or diminished synthesis of LDL, or conversely to an increased release of HDL or enhanced synthesis of HDL. Bioactive peptides can also be isolated from candidate libraries which decrease the production of oxidized LDL, which has been implicated in atherosclerosis and isolated from atherosclerotic lesions. This could occur by decreasing its expression, activating reducing systems or enzymes, or blocking the activity or production of enzymes implicated in production of oxidized LDL, such as 15-lipoxygenase in macrophages.
In a preferred embodiment, the present methods are used in screens to regulate obesity via the control of food intake mechanisms or diminishing the responses of receptor signaling pathways that regulate metabolism. Bioactive peptides that regulate or inhibit the responses of neuropeptide Y (NPY), cholecystokinin and galanin receptors are particularly desirable. Candidate libraries can be inserted into cells that have these receptors cloned into them, and inhibitory peptides selected that are secreted in an autocrine manner that block the signaling responses to galanin and NPY. In a similar manner, peptides can be found that regulate the leptin receptor.
In a preferred embodiment, the present methods are useful in neurobiology applications. Candidate libraries may be used for screening for anti-apoptotics for preservation of neuronal function and prevention of neuronal death. Initial screens would be done in cell culture. One application would include prevention of neuronal death, by apoptosis, in cerebral ischemia resulting from stroke. Apoptosis is known to be blocked by neuronal apoptosis inhibitory protein (NAIP); screens for its upregulation, or for affecting any coupled step, could yield peptides which selectively block neuronal apoptosis. Other applications include neurodegenerative diseases such as Alzheimer's disease and Huntington's disease.
In a preferred embodiment, the present methods are useful in bone biology applications. Osteoclasts are known to play a key role in bone remodeling by breaking down "old" bone, so that osteoblasts can lay down "new" bone. In osteoporosis one has an imbalance of this process. Osteoclast overactivity can be regulated by inserting candidate libraries into these cells, and then looking for bioactive peptides that produce 1) a diminished processing of collagen by these cells, 2) decreased pit formation on bone chips, and 3) decreased release of calcium from bone fragments.
The present methods may also be used to screen for agonists of bone morphogenic proteins, or hormone mimetics, to stimulate, regulate, or enhance new bone formation (in a manner similar to parathyroid hormone and calcitonin, for example). These have use in osteoporosis, for poorly healing fractures, and to accelerate the rate of healing of new fractures. Furthermore, cell lines of connective tissue origin can be treated with candidate libraries and screened for their growth, proliferation, collagen stimulating activity, and/or proline incorporating ability on the target osteoblasts. Alternatively, candidate libraries can be expressed directly in osteoblasts or chondrocytes and screened for increased production of collagen or bone.
In a preferred embodiment, the present methods are useful in skin biology applications. Keratinocyte responses to a variety of stimuli may result in psoriasis, a proliferative change in these cells. Candidate libraries can be inserted into cells removed from active psoriatic plaques, and bioactive peptides isolated which decrease the rate of growth of these cells.
In a preferred embodiment, the present methods are useful in the regulation or inhibition of keloid formation (i.e., excessive scarring). Candidate libraries can be inserted into skin connective tissue cells isolated from individuals with this condition, and bioactive peptides isolated that decrease proliferation, collagen formation, or proline incorporation. Results from this work can be extended to treat the excessive scarring that also occurs in burn patients. If a common peptide motif is found in the context of the keloid work, then it can be used widely in a topical manner to diminish scarring post burn.
Similarly, wound healing for diabetic ulcers and other chronic "failure to heal" conditions in the skin and extremities can be regulated by providing additional growth signals to cells which populate the skin and dermal layers. Growth factor mimetics may in fact be very useful for this condition. Candidate libraries can be inserted into skin connective tissue cells, and bioactive peptides isolated which promote the growth of these cells under "harsh" conditions, such as low oxygen tension, low pH, and the presence of inflammatory mediators.
Cosmeceutical applications of the present invention include the control of melanin production in skin melanocytes. A naturally occurring peptide, arbutin, is an inhibitor of tyrosine hydroxylase, a key enzyme in the synthesis of melanin. Candidate libraries can be inserted into melanocytes and known stimuli that increase the synthesis of melanin applied to the cells. Bioactive peptides can be isolated that inhibit the synthesis of melanin under these conditions.
In a preferred embodiment, the present methods are useful in endocrinology applications. The retroviral peptide library technology can be applied broadly to any endocrine, growth factor, cytokine or chemokine network which involves a signaling peptide or protein that acts in an endocrine, paracrine or autocrine manner, binds or dimerizes a receptor, and activates a signaling cascade that results in a known phenotypic or functional outcome. The methods are applied so as to isolate a peptide which either mimics the desired hormone (i.e., insulin, leptin, calcitonin, PDGF, EGF, EPO, GMCSF, IL1-17 mimetics) or inhibits its action by either blocking the release of the hormone, blocking its binding to a specific receptor or carrier protein (for example, CRF binding protein), or inhibiting the intracellular responses of the specific target cells to that hormone. Selection of peptides which increase the expression or release of hormones from the cells which normally produce them could have broad applications to conditions of hormonal deficiency.
In a preferred embodiment, the present methods are useful in infectious disease applications. Viral latency (herpes viruses such as CMV, EBV, HBV, and other viruses such as HIV) and reactivation are a significant problem, particularly in immunosuppressed patients (patients with AIDS and transplant patients). The ability to block the reactivation and spread of these viruses is an important goal. Cell lines known to harbor or be susceptible to latent viral infection can be infected with the specific virus, and then stimuli applied to these cells which have been shown to lead to reactivation and viral replication. This can be followed by measuring viral titers in the medium and scoring cells for phenotypic changes. Candidate libraries can then be inserted into these cells under the above conditions, and peptides isolated which block or diminish the growth and/or release of the virus. As with chemotherapeutics, these experiments can also be done with drugs which are only partially effective towards this outcome, and bioactive peptides isolated which enhance the virucidal effect of these drugs.
One example of many is the ability to block HIV-1 infection. HIV-1 requires CD4 and a co-receptor which can be one of several seven transmembrane G-protein coupled receptors. In the case of the infection of macrophages, CCR-5 is the required co-receptor, and there is strong evidence that a block on CCR-5 will result in resistance to HIV-1 infection. There are two lines of evidence for this statement. First, it is known that the natural ligands for CCR-5, the CC chemokines RANTES, MIP1α and MIP1β, are responsible for CD8+ mediated resistance to HIV. Second, individuals homozygous for a mutant allele of CCR-5 are completely resistant to HIV infection. Thus, an inhibitor of the CCR-5/HIV interaction would be of enormous interest to both biologists and clinicians. The extracellular anchored constructs offer superb tools for such a discovery. Into the transmembrane, epitope tagged, glycine-serine tethered constructs (ssTM V G20 E TM), one can place a random, cyclized peptide library of the general sequence CNNNNNNNNNNC or C-(X)n-C. Then one infects a cell line that expresses CCR-5 with retroviruses containing this library. Using an antibody to CCR-5 one can use FACS to sort desired cells based on the binding of this antibody to the receptor. All cells which do not bind the antibody will be assumed to contain inhibitors of this antibody binding site. These inhibitors, in the retroviral construct, can be further assayed for their ability to inhibit HIV-1 entry. Viruses are known to enter cells using specific receptors to bind to cells (for example, HIV uses CD4, coronavirus uses CD13, murine leukemia virus uses a transport protein, and measles virus uses CD44) and to fuse with cells (HIV uses a chemokine receptor). Candidate libraries can be inserted into target cells known to be permissive to these viruses, and bioactive peptides isolated which block the ability of these viruses to bind and fuse with specific target cells.
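The cyclized library of general form C-(X)n-C described above can be illustrated with a short sketch. It assumes that each X is drawn uniformly from the 20 standard amino acids and that the flanking cysteines are what allow cyclization; neither detail is specified in the text, and the function name and seed are hypothetical.

```python
import random

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def random_cyclic_peptide(n, rng):
    """Return one member of a C-(X)n-C library: n random residues
    flanked by a cysteine at each end (assumed cyclization points)."""
    core = "".join(rng.choice(AMINO_ACIDS) for _ in range(n))
    return "C" + core + "C"

rng = random.Random(0)  # fixed seed, used only for reproducibility
library = [random_cyclic_peptide(10, rng) for _ in range(5)]
for peptide in library:
    print(peptide)
```

In an actual library the randomization would be carried out at the DNA level in the retroviral construct; the sketch only shows the shape of the resulting peptide sequences.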
In a preferred embodiment, the present invention finds use with infectious organisms. Intracellular organisms such as mycobacteria, listeria, salmonella, pneumocystis, yersinia, leishmania, and T. cruzi can persist and replicate within cells, and become active in immunosuppressed patients. There are currently drugs on the market and in development which are either only partially effective or ineffective against these organisms. Candidate libraries can be inserted into specific cells infected with these organisms (pre- or post-infection), and bioactive peptides selected which promote the intracellular destruction of these organisms in a manner analogous to intracellular "antibiotic peptides" similar to magainins. In addition, peptides can be selected which enhance the cidal properties of drugs already under investigation which have insufficient potency by themselves but, when combined with a specific peptide from a candidate library, are dramatically more potent through a synergistic mechanism. Finally, bioactive peptides can be isolated which alter the metabolism of these intracellular organisms in such a way as to terminate their intracellular life cycle by inhibiting a key organismal event.
Antibiotic drugs that are widely used have certain dose dependent, tissue specific toxicities. For example, renal toxicity is seen with the use of gentamicin, tobramycin, and amphotericin; hepatotoxicity is seen with the use of INH and rifampin; bone marrow toxicity is seen with chloramphenicol; and platelet toxicity is seen with ticarcillin, etc. These toxicities limit their use. Candidate libraries can be introduced into the specific cell types where specific changes leading to cellular damage or apoptosis by the antibiotics are produced, and bioactive peptides can be isolated that confer protection when these cells are treated with these specific antibiotics.
Furthermore, the present invention finds use in screening for bioactive peptides that block antibiotic transport mechanisms. The rapid secretion from the blood stream of certain antibiotics limits their usefulness. For example, penicillins are rapidly secreted by certain transport mechanisms in the kidney and choroid plexus in the brain. Probenecid is known to block this transport and increase serum and tissue levels. Candidate agents can be inserted into specific cells derived from kidney cells and cells of the choroid plexus known to have active transport mechanisms for antibiotics. Bioactive peptides can then be isolated which block the active transport of specific antibiotics and thus extend the serum half-life of these drugs.
In a preferred embodiment, the present methods are useful in drug toxicities and drug resistance applications. Drug toxicity is a significant clinical problem. This may manifest itself as specific tissue or cell damage with the result that the drug's effectiveness is limited. Examples include myeloablation in high dose cancer chemotherapy, damage to epithelial cells lining the airway and gut, and hair loss. Specific examples include adriamycin-induced cardiomyocyte death, cisplatin-induced kidney toxicity, vincristine-induced gut motility disorders, and cyclosporin-induced kidney damage. Candidate libraries can be introduced into specific cell types with characteristic drug-induced phenotypic or functional responses, in the presence of the drugs, and agents isolated which reverse or protect the specific cell type against the toxic changes when exposed to the drug. These effects may manifest as blocking the drug-induced apoptosis of the cell of interest; thus initial screens will be for survival of the cells in the presence of high levels of drugs or combinations of drugs used in combination chemotherapy.
Drug toxicity may be due to a specific metabolite produced in the liver or kidney which is highly toxic to specific cells, or due to drug interactions in the liver which block or enhance the metabolism of an administered drug. Candidate libraries can be introduced into liver or kidney cells following the exposure of these cells to the drug known to produce the toxic metabolite. Bioactive peptides can be isolated which alter how the liver or kidney cells metabolize the drug, and specific agents identified which prevent the generation of a specific toxic metabolite. The generation of the metabolite can be followed by mass spectrometry, and phenotypic changes can be assessed by microscopy. Such a screen can also be done in cultured hepatocytes, cocultured with readout cells which are specifically sensitive to the toxic metabolite. Applications include reversible (to limit toxicity) inhibitors of enzymes involved in drug metabolism.
Multiple drug resistance, and hence tumor cell selection, outgrowth, and relapse, leads to morbidity and mortality in cancer patients. Candidate libraries can be introduced into tumor cell lines (primary and cultured) that have demonstrated specific or multiple drug resistance. Bioactive peptides can then be identified which confer drug sensitivity when the cells are exposed to the drug of interest, or to drugs used in combination chemotherapy. The readout can be the onset of apoptosis in these cells, membrane permeability changes, or the release of intracellular ions and fluorescent markers. The cells in which multidrug resistance involves membrane transporters can be preloaded with fluorescent transporter substrates, and selection carried out for peptides which block the normal efflux of fluorescent drug from these cells. Candidate libraries are particularly suited to screening for peptides which reverse poorly characterized or recently discovered intracellular mechanisms of resistance, or mechanisms for which few or no chemosensitizers currently exist, such as mechanisms involving LRP (lung resistance protein). This protein has been implicated in multidrug resistance in ovarian carcinoma, metastatic malignant melanoma, and acute myeloid leukemia. Particularly interesting examples include screening for agents which reverse more than one important resistance mechanism in a single cell, which occurs in a subset of the most drug resistant cells, which are also important targets. Applications would include screening for peptide inhibitors of both MRP (multidrug resistance related protein) and LRP for treatment of resistant cells in metastatic melanoma, for inhibitors of both p-glycoprotein and LRP in acute myeloid leukemia, and for inhibition (by any mechanism) of all three proteins for treating pan-resistant cells.
In a preferred embodiment, the present methods are useful in improving the performance of existing or developmental drugs. First pass metabolism of orally administered drugs limits their oral bioavailability, and can result in diminished efficacy as well as the need to administer more drug for a desired effect. Reversible inhibitors of enzymes involved in first pass metabolism may thus be a useful adjunct enhancing the efficacy of these drugs. First pass metabolism occurs in the liver; thus inhibitors of the corresponding catabolic enzymes may enhance the effect of the cognate drugs. Reversible inhibitors would be delivered at the same time as, or slightly before, the drug of interest. Screening of candidate libraries in hepatocytes for inhibitors (by any mechanism, such as protein downregulation as well as a direct inhibition of activity) of particularly problematical isozymes would be of interest. These include the CYP3A4 isozymes of cytochrome P450, which are involved in the first pass metabolism of the anti-HIV drugs saquinavir and indinavir. Other applications could include reversible inhibitors of UDP-glucuronyltransferases, sulfotransferases, N-acetyltransferases, epoxide hydrolases, and glutathione S-transferases, depending on the drug. Screens would be done in cultured hepatocytes or liver microsomes, and could involve antibodies recognizing the specific modification performed in the liver, or cocultured readout cells, if the metabolite had a different bioactivity than the untransformed drug. The enzymes modifying the drug would not necessarily have to be known, if screening was for lack of alteration of the drug.
In a preferred embodiment, the present methods are useful in immunobiology, inflammation, and allergic response applications. Selective regulation of T lymphocyte responses is a desired goal in order to modulate immune-mediated diseases in a specific manner. Candidate libraries can be introduced into specific T cell subsets (TH1, TH2, CD4+, CD8+, and others) and the responses which characterize those subsets (cytokine generation, cytotoxicity, proliferation in response to antigen being presented by a mononuclear leukocyte, and others) modified by members of the library. Agents can be selected which increase or diminish the known T cell subset physiologic response. This approach will be useful in any number of conditions, including 1) autoimmune diseases where one wants to induce a tolerant state (select a peptide that inhibits a T cell subset from recognizing a self-antigen bearing cell), 2) allergic diseases where one wants to decrease the stimulation of IgE producing cells (select a peptide which blocks release from T cell subsets of specific B-cell stimulating cytokines which induce the switch to IgE production), 3) transplant patients where one wants to induce selective immunosuppression (select a peptide that diminishes proliferative responses of host T cells to foreign antigens), 4) lymphoproliferative states where one wants to inhibit the growth of, or sensitize, a specific T cell tumor to chemotherapy and/or radiation, 5) tumor surveillance where one wants to inhibit the killing of cytotoxic T cells by Fas ligand bearing tumor cells, and 6) T cell mediated inflammatory diseases such as rheumatoid arthritis, connective tissue diseases (SLE), multiple sclerosis, and inflammatory bowel disease, where one wants to inhibit the proliferation of disease-causing T cells (promote their selective apoptosis) and the resulting selective destruction of target tissues (cartilage, connective tissue, oligodendrocytes, gut endothelial cells, respectively).
Regulation of B cell responses will permit a more selective modulation of the type and amount of immunoglobulin made and secreted by specific B cell subsets. Candidate libraries can be inserted into B cells and bioactive peptides selected which inhibit the release and synthesis of a specific immunoglobulin. This may be useful in autoimmune diseases characterized by the overproduction of autoantibodies and the production of allergy causing antibodies, such as IgE. Agents can also be identified which inhibit or enhance the binding of a specific immunoglobulin subclass to a specific antigen, either foreign or self. Finally, agents can be selected which inhibit the binding of a specific immunoglobulin subclass to its receptor on specific cell types.
Similarly, agents which affect cytokine production may be selected, generally using two cell systems. For example, cytokine production from macrophages, monocytes, etc. may be evaluated. Similarly, agents which mimic cytokines, for example erythropoietin and IL1-17, may be selected, or agents that bind cytokines such as TNF-α before they bind their receptor.
Antigen processing by mononuclear leukocytes (ML) is an important early step in the immune system's ability to recognize and eliminate foreign proteins. Candidate agents can be inserted into ML cell lines and agents selected which alter the intracellular processing of foreign peptides and the sequence of the foreign peptide that is presented to T cells by MLs on their cell surface in the context of Class II MHC. One can look for members of the library that enhance immune responses of a particular T cell subset (for example, the peptide would in fact work as a vaccine), or look for a library member that binds more tightly to MHC, thus displacing naturally occurring peptides, but is nonetheless less immunogenic (less stimulatory to a specific T cell clone). This agent would in fact induce immune tolerance and/or diminish immune responses to foreign proteins. This approach could be used in transplantation, autoimmune diseases, and allergic diseases.
The release of inflammatory mediators (cytokines, leukotrienes, prostaglandins, platelet activating factor, histamine, neuropeptides, and other peptide and lipid mediators) is a key element in maintaining and amplifying aberrant immune responses. Candidate libraries can be inserted into MLs, mast cells, eosinophils, and other cells participating in a specific inflammatory response, and bioactive peptides selected which inhibit the synthesis, release and binding to the cognate receptor of each of these types of mediators.
In a preferred embodiment, the present methods are useful in biotechnology applications. Candidate library expression in mammalian cells can also be considered for other pharmaceutical-related applications, such as modification of protein expression, protein folding, or protein secretion. One such example would be in commercial production of protein pharmaceuticals in CHO or other cells. Candidate libraries resulting in bioactive peptides which select for an increased cell growth rate (perhaps peptides mimicking growth factors or acting as agonists of growth factor signal transduction pathways), for pathogen resistance (see previous section), for lack of sialylation or glycosylation (by blocking glycotransferases or rerouting trafficking of the protein in the cell), for allowing growth on autoclaved media, or for growth in serum free media, would all increase productivity and decrease costs in the production of protein pharmaceuticals.
Random peptides displayed on the surface of circulating cells can be used as tools to identify organ, tissue, and cell specific peptide targeting sequences. Any cell introduced into the bloodstream of an animal expressing a library targeted to the cell surface can be selected for specific organ and tissue targeting. The bioactive peptide sequence identified can then be coupled to an antibody, enzyme, drug, imaging agent or substance for which organ targeting is desired.
Other agents which may be selected using the present invention include 1) agents which block the activity of transcription factors, using cell lines with reporter genes, 2) agents which block the interaction of two known proteins in cells, using the absence of normal cellular functions, the mammalian two hybrid system or fluorescence resonance energy transfer mechanisms for detection, and 3) agents identified by tethering a random peptide to a protein binding region to allow interactions with molecules sterically close, i.e., within a signalling pathway, to localize the effects to a functional area of interest.
The following examples serve to more fully describe the manner of using the above-described invention, as well as to set forth the best modes contemplated for carrying out various aspects of the invention. It is understood that these examples in no way serve to limit the true scope of this invention, but rather are presented for illustrative purposes. All references cited herein are incorporated by reference in their entirety.
EXAMPLES
Example 1
Selection of loop insertion sites
One example concerns the insertion of sequences of the composition linker-test sequence-linker into defined sites within engineered GFP loops most likely to tolerate insertions. These loops were selected based on having mobility in the loop or tip of the loop well above that of the most rigid parts of the beta-can structure (Yang et al., Nature Biotechnology 14, 1246-9, 1996; Ormo et al., Science 273, 1392-5, 1996). The loops of most interest are those which are not rigidly coupled to the beta-can structure of the rest of GFP; this lack of rigid coupling may allow the most tolerance for sequence additions within the loops in a library construct. Loops can be selected as those which have the highest temperature factors in the crystal structures, and include loops 130-135, 154-159, 172-175, 188-193, and 208-216 in a GFP monomer. The temperature factor of the loop can be artificially increased by including flexible amino acids such as glycine in the linkers (see below).
The most promising insert sites were selected by removing residues at the termini of the loops whose side chains extended into solution and did not contact either the GFP β-can or other parts of the loops. Loop residues whose side chains bound to other parts of GFP were left unreplaced so as to minimize the likelihood of strong conformational coupling between the random sequences and GFP, which could lead to misfolded protein and/or could diminish the number of fluorescent GFP-fused random peptides by distorting the base of the loop and allowing collisional quenchers access to the fluorophore.
loop insert location
1 replace asp 133 with insert; can't remove glu 132 as its carboxylate binds to other residue side chains; this is a very short loop
2 replace gln 157 and lys 156 with entire insert; lys 156 and gln 157 side chains protrude into solution; lys 158 ion pairs with asp 155 to help close the loop, so these are generally retained; avoid removing asn 159 as it contacts the main protein body in a number of spots
3 replace asp 173 with insert, as it is at the outer end of the loop; avoid replacing glu 172 as its side chain contacts other side chains in the folded structure; could replace gly 174 too
4 replace residues 189-192 (gly-asp-gly-pro) with insert; this is not so much a loop as a strand connecting two separated chains; P192, G191, D190 and G189 all protrude into solution and don't appear to form tight contacts with the main protein body, so they appear replaceable
5 replace asn 212, glu 213 and lys 214 with insert; lys 214 side chain protrudes out into solution; glu 213 helps form the turn as its side chain binds other side chains in the loop, thus its replacement may cause problems in maintaining a native loop conformation; asn 212 side chain protrudes into solution
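Each replacement listed above amounts to swapping a run of residues for a linker-insert-linker sequence. A minimal sketch follows, using a made-up toy sequence rather than the real GFP sequence; the helper name and the 1-based, inclusive numbering convention are assumptions made for illustration.

```python
def replace_residues(sequence, first, last, insert):
    """Replace residues first..last (1-based, inclusive) with insert."""
    if not 1 <= first <= last <= len(sequence):
        raise ValueError("residue range lies outside the sequence")
    return sequence[:first - 1] + insert + sequence[last:]

# Toy 12-residue sequence, NOT GFP. Positions 5-8 (G-D-G-P) stand in
# for a gly-asp-gly-pro stretch like the one replaced in loop 4.
toy = "MKTAGDGPLVEH"
insert = "GGGG" + "YPYD" + "GGGG"  # linker + test sequence + linker
result = replace_residues(toy, 5, 8, insert)
print(result)
```

Residues flanking the replaced stretch are left untouched, which mirrors the rule of retaining loop residues that contact the rest of the protein.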
Example 2
Selection of a test insert sequence
To allow a maximal number of different loop inserts or replacements in GFP to fold properly into a fluorescent GFP construct, it may be important to carefully select the linker sequences between the native GFP structure and the inserted sequences making up the actual library inserted into the loop. One way to prevent problems in GFP folding is to conformationally decouple any insert sequence from the GFP structure itself, to minimize local distortions in GFP structure which could either destabilize folding intermediates or could allow exogenous collisional fluorescence quenchers access to GFP's buried tripeptide fluorophore (Phillips, supra). This can be done by inserting multiple highly flexible amino acid residues between GFP and the library, which impose minimal conformational constraints on the GFP. One or more glycines are ideal for this purpose, as glycine accesses significantly more phi-psi space than even alanine, and is much less restricted than residues with longer side chains (Scheraga, H. A. (1992), "Predicting three-dimensional structures of oligopeptides", in Reviews in Computational Chemistry III, p. 73-142). Thus to optimize the chances of the loop inserts not affecting GFP structure, -(gly)n- is inserted between these two sequences at each loop containing a library. Minimally n=1, but more optimally n > 2. The initial two test inserts were 1) -GGGG-YPYDVPDYASL-GGGG- and 2) -GGGG-YPYD-GGGG-. The first sequence was a 19mer insert (approximately the intended library size) with the influenza hemagglutinin (HA) epitope tag embedded, with glycines added to each end to match the epitope inserted into the dimerizer-folded scaffold, and to add flexibility to the epitope to allow a conformation which binds to polyclonal antisera. This allowed estimation by Western blotting of the expression level of the different constructs. The second insert is truncated to examine the effect on GFP fluorescence of a shorter peptide.
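The construction of the two test inserts from a core sequence and -(gly)n- linkers can be sketched as follows. The function name is hypothetical; the core sequences and the choice of four glycines per side are those given in the text.

```python
def make_insert(core, n_gly=4):
    """Flank a core sequence with n_gly glycines on each side."""
    if n_gly < 1:
        raise ValueError("at least one glycine per side is required")
    linker = "G" * n_gly
    return linker + core + linker

insert_1 = make_insert("YPYDVPDYASL")  # test insert with the HA epitope
insert_2 = make_insert("YPYD")         # truncated test insert
print(insert_1, len(insert_1))
print(insert_2, len(insert_2))
```

The lengths come out as 19 and 12 residues, matching the 19mer and 12mer referred to in the examples.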
Example 3
Mean fluorescence of GFP with test inserts 1 and 2 in loops 1-5, expressed in E. coli
The GFP used is EGFP (Clontech Inc., Palo Alto, CA) and the two test sequences were inserted at the sites indicated in Example 1. An equal number of bacteria (20000) representing clones of single colonies were analyzed by fluorescence-activated cell sorting on a MoFlo cell sorter (Cytomation Inc., Ft. Collins, CO). Intensity of FL1 was averaged. The relative fluorescence intensity was calculated as (WT fluorescence - fluorescence of loop insert)/(WT fluorescence - bkd) x 100%. Constructs with insert 1 in loops 1 and 5 were not expressed due to cloning difficulties. Equal amounts of cell lysate from each loop insert were run on a 10% SDS gel and blotted to PVDF. GFP was detected with anti-GFP antibody and the bands were observed using chemiluminescent detection. The intensity of individual bands was measured using a Sharp JX-330 scanning densitometer and BioImage software. The specific fluorescence was calculated as the ratio of the relative fluorescence to the relative intensity of the Western blot band.
Table 1 Mean fluorescence of GFP with different insertion sequences in loops 1-5
Figure imgf000083_0001
insert 1 -GGGG-YPYDVPDYASL-GGGG-
insert 2 -GGGG-YPYD-GGGG-
The results in Table 1 show that in E. coli, the defined loop 2, 3 and 4 insertion sites support GFP folding and fluorescence for both the 12mer and 19mer inserts, while inserts in sites 1 and 5 allow expression of GFP without fluorescence for the 12mer insert. Libraries in these sites may thus be useful for screening using methods other than GFP fluorescence for selecting positives. For insertion sites 2, 3 and 4 the fluorescence for a 12mer insert with multiple glycines at each end is at least 10% of that of wild type GFP. The highest fluorescence for the 12mer insert was obtained with insertion in the loop 3 site, while the lowest was obtained from loop 4. This appeared to be due to differing expression levels for each construct. For the larger 19mer insert, the highest fluorescence was again obtained with insertion in the loop 3 site, while the lowest was obtained from insertion into the loop 2 site, again due to higher apparent expression levels for the loop 3 insert GFP. Again, the highest specific fluorescence was obtained with loop 4. This suggests that libraries inserted into loop 4, combined with strong promoters to enhance expressed levels of the GFP-library members, will allow screening of these libraries as well as loop 2 and 3 libraries. For the 19mer insert sequence the loop 2, 3 and 4 inserts all give fluorescence of at least 1% of wild type, and thus should allow screening of libraries in all three loops.
The Western blot results suggest that shorter inserts in loops 1 and 5 allow GFP expression at levels as high or higher than those of loops 2 and 4, albeit without fluorescence. Thus random peptide libraries inserted into these loops can be used to screen cells for phenotypic changes, but the screen for the presence of the library member will have to rely on some property other than GFP fluorescence, such as a readout reflecting a phenotypic change in the cell itself.
Example 4
Mean fluorescence of GFP with test inserts 1 and 2 in loops 2-4, when expressed in Jurkat E cells
Insert sequences identical to those shown in Example 3 above were used with GFP when expressed in Jurkat E cells. GFP was expressed using the LTR of the retroviral expression vector, and the Jurkats were infected using Phoenix 293 helper cells. After 48 hours of infection, the Jurkats were subjected to FACS analysis using a Becton-Dickinson FACSCAN cell sorter. For each insert 10 cells were gated using forward- vs side-scatter selection to isolate live cells. Live cells were selected in a second round using propidium iodide fluorescence, and were then sorted in FL1 on the intensity of their GFP fluorescence. The infection levels of the Jurkat cells with the different constructs were in the range of 30.1%-44.9%, giving on average one peptide construct inserted per cell.
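The link between the fraction of infected cells and the number of constructs per cell can be made explicit under the assumption, not stated in the text, that integration events per cell follow a Poisson distribution. The function name below is hypothetical; the two input fractions are the ends of the reported infection range.

```python
import math

def mean_copies_per_infected_cell(fraction_infected):
    """Poisson estimate of the mean number of constructs in cells that
    carry at least one construct, given the fraction of infected cells."""
    if not 0.0 < fraction_infected < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    # P(no construct) = exp(-m), so m = -ln(1 - fraction_infected).
    m = -math.log(1.0 - fraction_infected)
    # Condition on the cell carrying at least one construct.
    return m / fraction_infected

for f in (0.301, 0.449):
    print(f, round(mean_copies_per_infected_cell(f), 2))
```

Under this assumption, infection levels in the reported range correspond to roughly 1.2 to 1.3 constructs per infected cell, which is consistent with about one construct per cell.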
Table 2 Geometric mean fluorescence of GFP with different insertion sequences in loops 2-4 Jurkat cells
Figure imgf000084_0001
insert 1 -GGGG-YPYDVPDYASL-GGGG-
insert 2 -GGGG-YPYD-GGGG-
These results show that the designed insertion sites in loops 2-4 retain a high level of GFP fluorescence when the inserts are flanked by multiple glycines in the tetrapeptide linkers. Thus an insert of 19 residues appears to retain high levels of fluorescence, suggesting that all three loops will allow insertion of random peptide libraries and their screening. Such screening should require only a level of fluorescence distinguishable from background or one decade up in FL1.
The successful observation of fluorescence of nearly 10% or more of wild type in GFP with both sequences in the loop 2 insertion site was not seen by Abedi et al. (1998) and suggests that inclusion of the glycine linkers on either side of the insert sequence, combined with excision of residues at the tip of the loop, may make this loop a unique and useful site for insertion of random library sequences. The high levels of relative fluorescence for inserts 1 and 2 in loops 2-4 suggest that the tetraglycine linkers will allow successful insertion of random peptide libraries into these particular sites; shorter libraries may be preferred.
Example 5
Mean fluorescence of GFP with test inserts 1 and 2 in loops 2-4, when expressed in Phoenix 293 cells
Insert sequences identical to those shown in Example 3 above were used with GFP when expressed in Phoenix 293 cells. GFP was expressed using the 96.7 CMV-promoter driven CRU-5 retroviral expression vector in transfected Phoenix 293 cells. The transfection efficiency was 40-45%. After 48 hours of transfection, the Phoenix 293 cells were subjected to FACS analysis using a Becton-Dickinson FACSCAN cell sorter. For each insert approximately 10^4 cells were gated using forward- vs side-scatter selection to isolate live cells. Live cells were selected in a second round using propidium iodide fluorescence, and were then sorted in FL1 on the intensity of their GFP fluorescence. The transfection efficiency for all constructs reported was in the range of 24-42%, giving on average one plasmid/cell expressing the GFP construct.
Table 3. Geometric mean fluorescence of GFP with different insertion sequences in loops 2-4, Phoenix 293 cells
Figure imgf000085_0001
Insert 1: -GGGG-YPYDVPDYASL-GGGG-
Insert 2: -GGGG-YPYD-GGGG-
The numbers for the relative fluorescence of the loop 2, 3, and 4 inserts are derived from the average value ± 1 standard deviation for 1-2 independent clones with the specified insert. The specific fluorescence is the ratio of the relative fluorescence to the Western blot relative intensity. The standard deviation of the relative fluorescence was calculated as (fluorescence of insert / fluorescence of WT) × [(std dev of insert fluorescence / insert fluorescence)^2 + (std dev of WT fluorescence / WT fluorescence)^2]^0.5 (Bevington, P. 1969. Data reduction and error analysis for the physical sciences. New York: McGraw Hill, p. 61-2). Data marked with an asterisk (*) were derived from cells with a 60-70% transfection efficiency and so can only be qualitatively compared with the rest of the data.
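The error calculation described above can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes the standard propagation-of-error formula for a ratio of two uncorrelated quantities, and the function names and all numeric inputs are hypothetical placeholders, not values from Table 3.

```python
import math


def relative_fluorescence(insert_mean, insert_sd, wt_mean, wt_sd):
    """Return (ratio, sd_of_ratio) for insert vs wild-type GFP fluorescence.

    Assumes the two measurements are uncorrelated, so the relative
    errors add in quadrature (standard error propagation for a ratio).
    """
    ratio = insert_mean / wt_mean
    relative_error = math.sqrt(
        (insert_sd / insert_mean) ** 2 + (wt_sd / wt_mean) ** 2
    )
    return ratio, ratio * relative_error


def specific_fluorescence(relative_fluor, western_relative_intensity):
    """Relative fluorescence normalized to relative expression level."""
    return relative_fluor / western_relative_intensity


# Hypothetical inputs chosen only to exercise the functions.
ratio, ratio_sd = relative_fluorescence(50.0, 5.0, 100.0, 10.0)
specific = specific_fluorescence(ratio, 0.25)
print(f"relative fluorescence = {ratio:.3f} +/- {ratio_sd:.3f}")
print(f"specific fluorescence = {specific:.3f}")
```

Normalizing by the Western blot intensity separates the brightness of each expressed molecule from how much protein is made, which is the distinction the discussion of loops 2-4 relies on.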
These results for 293 cells show that in these cells the designed insertion sites in loops 2-4 retain a very high level of GFP fluorescence when the inserts are flanked by multiple glycines in the tetrapeptide linkers, in some cases higher than wild type GFP fluorescence. Thus both inserts of 19 and 12 residues retain high levels of fluorescence, suggesting that all three loops will allow insertion of random peptide libraries and their screening, and that libraries in all three loops are roughly equivalent. The high level of relative fluorescence of loop 3 appears to be mainly due to a higher expression level than the GFP construct with inserts in loops 1 and 2, although the expression levels of all 3 loop-inserts are at least 19% of the wild type GFP levels. Since the specific fluorescence of both inserts in loops 2 and 4 is greater than that of the insert in loop 3, a higher level of expression could compensate for the overall lower level of fluorescence of these loop 2 and 4 inserts. Since expression of these constructs is with a stronger promoter than expression in E. coli or Jurkat cells, this also suggests that use of stronger promoters than the retroviral LTR or the promoter in E. coli will make more loop insertion sites usable for screens.
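As a quick check on the insert sizes quoted above, the residue counts can be recomputed from the insert sequences given with Table 3. The sketch below is illustrative; the helper names `flank_with_glycines` and `insert_length` are hypothetical and are not part of the original disclosure.

```python
def flank_with_glycines(peptide, n=4):
    """Return the peptide flanked on both sides by an n-glycine linker,
    written in the hyphen-delimited notation used for the test inserts."""
    linker = "G" * n
    return f"-{linker}-{peptide}-{linker}-"


def insert_length(insert):
    """Count residues in a hyphen-delimited insert string."""
    return sum(len(segment) for segment in insert.split("-"))


# Test inserts 1 and 2 as listed with Table 3.
insert_1 = flank_with_glycines("YPYDVPDYASL")
insert_2 = flank_with_glycines("YPYD")

print(insert_1, insert_length(insert_1))
print(insert_2, insert_length(insert_2))
```

With tetraglycine linkers on each side, the 11-residue and 4-residue test peptides give total insert lengths of 19 and 12 residues respectively.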

Claims

CLAIMS
We claim:
1. A library of fusion proteins each comprising a) a scaffold protein, b) a random peptide fused to the N-terminus of a scaffold protein, wherein each of said random peptides is different, and c) a presentation structure that will present said peptide in a conformationally restricted form.
2. A library of fusion proteins each comprising a) a scaffold protein, b) a random peptide fused to the C-terminus of a scaffold protein, wherein each of said random peptides is different, and c) a presentation structure that will present said peptide in a conformationally restricted form.
3. A library of fusion proteins each comprising a) a scaffold protein, b) a random peptide inserted into a scaffold protein, wherein each of said random peptides is different, and c) at least one fusion partner.
4. A library of fusion proteins according to claim 3 wherein said fusion partner is a linker between said random peptide and said scaffold protein.
5. A library of fusion proteins according to claim 1, 2, 3 and 4 wherein said scaffold protein is a green fluorescent protein (GFP).
6. A library of fusion proteins according to claim 4, 5 and 6 wherein said linker comprises -(gly)n-, wherein n ≥ 2.
7. A library of fusion proteins according to claim 3, 4, 5 and 6 further comprising a second linker between the other end of said random peptide and said scaffold protein.
8. A library of fusion proteins according to claim 3, 4, 5, 6 and 7 wherein said fusion partner is a presentation structure capable of presenting said peptide in a conformationally restricted form.
9. A library of fusion proteins according to claim 1, 2, 3, 4, 5, 6, 7 and 8 wherein said random peptide replaces at least one amino acid of said scaffold protein.
10. A library of fusion proteins according to claim 5, 6, 7, 8 and 9 wherein said GFP is from Aequorea and wherein said random peptide is inserted into the loop comprising amino acids 130
Figure imgf000088_0001
11. A library of fusion proteins according to claim 5, 6, 7, 8 and 9 wherein said GFP is from Aequorea and wherein said random peptide is inserted into the loop comprising amino acids 154
Figure imgf000088_0002
12. A library of fusion proteins according to claim 5, 6, 7, 8 and 9 wherein said GFP is from Aequorea and wherein said random peptide is inserted into the loop comprising amino acids 172
Figure imgf000088_0003
13. A library of fusion proteins according to claim 5, 6, 7, 8 and 9 wherein said GFP is from Aequorea and wherein said random peptide is inserted into the loop comprising amino acids 188
Figure imgf000088_0004
14. A library of fusion proteins according to claim 5, 6, 7, 8 and 9 wherein said GFP is from Aequorea and wherein said random peptide is inserted into the loop comprising amino acids 208
Figure imgf000088_0005
15. A library of fusion proteins according to claim 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13 or 14 wherein said scaffold is a β-lactamase.
16. A library of fusion proteins according to claim 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13 or 14 wherein said scaffold is a DHFR.
17. A library of fusion proteins according to claim 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13 or 14 wherein said scaffold is a luciferase.
18. A library of fusion proteins according to claim 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13 or 14 wherein said scaffold is a GFP from a Renilla species.
19. A library of fusion nucleic acids each comprising a) nucleic acid encoding a random peptide, b) nucleic acid encoding a scaffold protein, and c) nucleic acid encoding a fusion partner, wherein said nucleic acid encoding said random peptide is inserted internally into said nucleic acid encoding said scaffold protein.
20. A library of retroviral vectors comprising the fusion nucleic acid of claim 19.
21. A library of host cells comprising the fusion nucleic acids of claim 19.
22. A method of screening for bioactive peptides conferring a particular phenotype comprising a) providing cells containing a fusion nucleic acid comprising i) nucleic acid encoding a random peptide, ii) nucleic acid encoding a scaffold protein, and iii) nucleic acid encoding a fusion partner, wherein said nucleic acid encoding said random peptide is inserted internally into said nucleic acid encoding said scaffold protein
23. A method according to claim 22 wherein said providing is accomplished by transfecting said cells with a retroviral vector comprising said fusion nucleic acid.
24. A method according to claim 22 and 23 wherein said scaffold is a GFP.
PCT/US1999/023715 1998-10-08 1999-10-08 Fusions of scaffold proteins with random peptide libraries WO2000020574A2 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
AU15164/00A AU768126B2 (en) 1998-10-08 1999-10-08 Fusions of scaffold proteins with random peptide libraries
DE69936103T DE69936103T2 (en) 1998-10-08 1999-10-08 FUSION PROTEINS CONSISTING OF PROTEIN AND LIBRARIES OF RANDOM PEPTIDES
EP99957466A EP1119617B1 (en) 1998-10-08 1999-10-08 Fusions of scaffold proteins with random peptide libraries
CA002345215A CA2345215A1 (en) 1998-10-08 1999-10-08 Fusions of scaffold proteins with random peptide libraries
JP2000574670A JP2002526108A (en) 1998-10-08 1999-10-08 Fusion of scaffold protein with random peptide library

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US09/169,015 US6180343B1 (en) 1998-10-08 1998-10-08 Green fluorescent protein fusions with random peptides
US09/169,015 1998-10-08

Publications (3)

Publication Number Publication Date
WO2000020574A2 (en) 2000-04-13
WO2000020574A3 (en) 2000-09-21
WO2000020574A9 (en) 2001-06-21

Family

ID=22613927

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1999/023715 WO2000020574A2 (en) 1998-10-08 1999-10-08 Fusions of scaffold proteins with random peptide libraries

Country Status (8)

Country Link
US (5) US6180343B1 (en)
EP (1) EP1119617B1 (en)
JP (1) JP2002526108A (en)
AT (1) ATE362528T1 (en)
AU (1) AU768126B2 (en)
CA (1) CA2345215A1 (en)
DE (1) DE69936103T2 (en)
WO (1) WO2000020574A2 (en)

Also Published As

Publication number Publication date
US6548249B1 (en) 2003-04-15
WO2000020574A2 (en) 2000-04-13
CA2345215A1 (en) 2000-04-13
ATE362528T1 (en) 2007-06-15
US6562617B1 (en) 2003-05-13
EP1119617A2 (en) 2001-08-01
AU768126B2 (en) 2003-12-04
DE69936103D1 (en) 2007-06-28
AU1516400A (en) 2000-04-26
US6596485B2 (en) 2003-07-22
WO2000020574A3 (en) 2000-09-21
US6180343B1 (en) 2001-01-30
EP1119617B1 (en) 2007-05-16
US20010003650A1 (en) 2001-06-14
DE69936103T2 (en) 2008-05-15
US6548632B1 (en) 2003-04-15
JP2002526108A (en) 2002-08-20

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref country code: AU

Ref document number: 2000 15164

Kind code of ref document: A

Format of ref document f/p: F

AK Designated states

Kind code of ref document: A2

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
AK Designated states

Kind code of ref document: A3

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

ENP Entry into the national phase

Ref country code: JP

Ref document number: 2000 574670

Kind code of ref document: A

Format of ref document f/p: F

WWE Wipo information: entry into national phase

Ref document number: 15164/00

Country of ref document: AU

WWE Wipo information: entry into national phase

Ref document number: 1999957466

Country of ref document: EP

AK Designated states

Kind code of ref document: C2

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: C2

Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

COP Corrected version of pamphlet

Free format text: PAGES 1/8-8/8, DRAWINGS, ADDED

WWP Wipo information: published in national office

Ref document number: 1999957466

Country of ref document: EP

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

ENP Entry into the national phase

Ref document number: 2345215

Country of ref document: CA

Ref country code: CA

Ref document number: 2345215

Kind code of ref document: A

Format of ref document f/p: F

WWG Wipo information: grant in national office

Ref document number: 15164/00

Country of ref document: AU

WWG Wipo information: grant in national office

Ref document number: 1999957466

Country of ref document: EP