Publication number: US6617156 B1
Publication type: Grant
Application number: US 09/134,000
Publication date: Sep 9, 2003
Filing date: Aug 13, 1998
Priority date: Aug 15, 1997
Fee status: Lapsed
Also published as: US20070021600
Publication number: 09134000, 134000, US 6617156 B1, US 6617156B1, US-B1-6617156, US6617156 B1, US6617156B1
Inventors: Lynn A. Doucette-Stamm, David Bush
Original Assignee: Lynn A. Doucette-Stamm, David Bush
Nucleic acid and amino acid sequences relating to Enterococcus faecalis for diagnostics and therapeutics
US 6617156 B1
Abstract
The invention provides isolated polypeptide and nucleic acid sequences derived from Enterococcus faecalis that are useful in diagnosis and therapy of pathological conditions; antibodies against the polypeptides; and methods for the production of the polypeptides. The invention also provides methods for the detection, prevention and treatment of pathological conditions resulting from bacterial infection.
Claims(19)
What is claimed is:
1. An isolated nucleic acid comprising a nucleotide sequence encoding an E. faecalis polypeptide selected from the group consisting of SEQ ID NO:5177, SEQ ID NO:4597, SEQ ID NO:6122, and SEQ ID NO:4420.
2. A recombinant expression vector comprising the nucleic acid of claim 1 operably linked to a transcription regulatory element.
3. A cell comprising a recombinant expression vector of claim 2.
4. A method for producing an E. faecalis polypeptide comprising culturing a cell of claim 3 under conditions that permit expression of the polypeptide.
5. An isolated nucleic acid comprising a nucleotide sequence selected from the group consisting of SEQ ID NO:1772, SEQ ID NO:2248, SEQ ID NO:93, SEQ ID NO:1192, SEQ ID NO:1645, SEQ ID NO:1773, SEQ ID NO:2717, SEQ ID NO:2917, and SEQ ID NO:1015.
6. A recombinant vector comprising the nucleic acid of claim 5.
7. A cell comprising the recombinant vector of claim 6.
8. An isolated nucleic acid comprising a nucleotide sequence which can be used to detect the presence of E. faecalis in a sample, wherein said nucleic acid shares at least 90% homology to a sequence selected from the group consisting of SEQ ID NO:93, SEQ ID NO:1645, SEQ ID NO:1773, SEQ ID NO:2917, and SEQ ID NO:1015.
9. A recombinant vector comprising the nucleic acid of claim 8.
10. A cell comprising the recombinant vector of claim 9.
11. An isolated nucleic acid comprising a nucleotide sequence which can be used to detect the presence of E. faecalis in a sample, wherein said nucleic acid shares at least 95% homology to a sequence selected from the group consisting of SEQ ID NO:93, SEQ ID NO:1645, SEQ ID NO:1773, SEQ ID NO:2917, and SEQ ID NO:1015.
12. A recombinant vector comprising the nucleic acid of claim 11.
13. A cell comprising the recombinant vector of claim 12.
14. An isolated nucleic acid comprising SEQ ID NO:2594.
15. A recombinant vector comprising the nucleic acid of claim 14.
16. A cell comprising the recombinant vector of claim 15.
17. An isolated nucleic acid comprising a nucleotide sequence which can be used to detect the presence of E. faecalis in a sample, wherein said nucleic acid shares at least 95% homology to SEQ ID NO:2594.
18. A recombinant vector comprising the nucleic acid of claim 17.
19. A cell comprising the recombinant vector of claim 18.
Description

This application claims priority of U.S. Provisional application No. 60/055,778, filed Aug. 15, 1997, which is hereby incorporated herein by reference in its entirety.

FIELD OF THE INVENTION

The invention relates to isolated nucleic acids and polypeptides derived from Enterococcus faecalis that are useful as molecular targets for diagnosis, prophylaxis and treatment of pathological conditions, as well as materials and methods for the diagnosis, prevention, and amelioration of pathological conditions resulting from bacterial infection.

BRIEF DESCRIPTION OF THE SEQUENCE LISTING

Incorporated herein by reference in its entirety is a Sequence Listing, comprising SEQ ID NO: 1 to SEQ ID NO: 6812. The Sequence Listing is contained on a CD-ROM, three copies of which are filed, the Sequence Listing being in a computer-readable ASCII file named “GTC005.pto”, created on Sep. 6, 2001 and of 10.3 megabytes in size, in Windows NT 4.0, ASCII text format.

BACKGROUND OF THE INVENTION

Enterococcus faecalis (E. faecalis) is a gram-positive, facultatively anaerobic coccus that is widely distributed in nature, animals, and humans. Enterococci are part of the normal gastrointestinal and genital tract flora, and among the 17 known species, E. faecalis is dominant in humans, accounting for 80-90% of clinically isolated specimens, and it exhibits increasing levels of multidrug resistance (Kaufhold, A and Klein, R (1995) Zentralblatt fuer Bakteriologie 282 (4): 507-518; and Svec, P, Sedlacek, I, and Pakrova, E (1996) Epidemiologie Mikrobiologie Imunologie 45: 153-157). E. faecalis infections include urinary tract infections (UTI), bacteremia, endocarditis, and wound and abdominal-pelvic infections, accounting for 16% of all UTIs and 8% of all bacteremias (Ardino, R C, and Murray, B E (1990) Principles and Practice of Infectious Diseases, 3rd ed., Mandell et al, eds., Update Vol. 2, No. 4).

Vancomycin resistant enterococci (VRE) have emerged in the midst of high level resistance to penicillin and aminoglycosides (Centers for Disease Control and Prevention (1993) MMWR 42:597-599; and Handwerger, S, et al (1993) Clin Infect Dis 16:750-755). Resistance can be intrinsic (chromosomally mediated) or acquired (plasmid or transposon mediated), with higher levels of resistance in the acquired form. VRE are characterized by resistance to virtually all available antibiotics, including vancomycin, considered the “last resort” antibiotic effective against gram-positive bacteria. Treatment options for physicians are limited, with the latest strategy being combinations of antimicrobials or the use of new unproven compounds (Moellering, R C Jr. (1991) J Antimicrob Chemother 28: 1-12; and Hayden, M K et al (1994) Antimicrob Agents Chemother 38:1225-1229; and Mobarakai, N et al (1994) J Antimicrob Chemother 33: 319-321). From 1989 through 1993, the percentage of nosocomial (hospital incurred) infections by VRE increased from 0.3% to 7.9% (Centers for Disease Control and Prevention (1993) MMWR 42:597-599). There was a 34-fold increase in ICU patients, and an increasing trend among non-ICU patients (Centers for Disease Control and Prevention (1993) MMWR 42:597-599). These numbers may not be an accurate reflection of the actual total, as vancomycin resistance is not consistently detected in clinical identification, especially in the VanB phenotype, which confers moderate resistance (Tenover, F C (1993) J Clin Microbiol 31:1695-1699; and Sahm, D F (1990) Antimicrob Agents Chemother 34: 1846-1848; and Zabransky, R J (1994) Microbiol Infect Dis 20:113-116). Patients can be colonized and carry VRE without symptoms, with the chief areas of colonization being the anus, axilla, stool, perineum, umbilicus, wounds, Foley catheters, and colostomy sites.

The epidemiology of E. faecalis is not completely understood, but it is thought that most infections and colonizations are a result of the patient's endogenous flora (Murray, B E (1990) Clin Microbiol Rev 3:46-65). Recent evidence suggests that E. faecalis can be spread by direct contact with other infected patients, indirect transmission from hospital personnel (Boyce, J M et al (1994) J Clin Microbiol 32:1148-53; and Rhineheart, E et al (1990) N Engl J Med 323:1814-1818), or from contaminated hospital surfaces and equipment (Karanfil, L V et al (1992) Infect Control Hosp Epidemiol 13:195-200; and Boyce, J M et al (1994) J Clin Microbiol 32:1148-53; and Livornese, L L Jr. (1992) Ann Intern Med 117:112-116). Increased risk has been demonstrated for the critically ill, those with underlying disease or immunosuppression (e.g., ICU, oncology, and transplant patients), cardio-thoracic/intraabdominal surgical patients, and those with urinary or central venous catheters. In addition, risk for E. faecalis infection increases for patients with long hospital stays or previous multiantimicrobial or vancomycin treatments (Boyce, J M et al (1994) J Clin Microbiol 32:1148-1153; Boyle, J F et al (1993) J Clin Microbiol 31:1280-1285; Karanfil, L V et al (1992) Infect Control Hosp Epidemiol 13:195-200; Handwerger, S et al (1993) Clin Infect Dis 16:750-755; Montecalvo, M A et al (1994) Antimicrob Agents Chemother 38:1363-1367).

Additional concern stems from the ability of the E. faecalis plasmid borne VanA gene, which confers high level vancomycin resistance, to transfer in vitro to several gram positive microorganisms such as Staphylococcus aureus (Leclercq, R et al (1989) Antimicrob Agents Chemother 33:10-15; and Noble, W C, et al (1992) FEMS Microbiology Letters 72:195-198). To date, no clinical isolates of S. aureus or S. epidermidis have shown vancomycin resistance conferred by plasmid transfer, but clinically isolated strains of S. haemolyticus have (Degner, J E, et al (1994) J Clin Microbiol 32:2260-2265; and Veach, L A, et al (1990) J Clin Microbiol 28:2064-2068).

These concerns point to the need for diagnostic tools and therapeutics aimed at proper identification of strain and eradication of virulence. The design of vaccines that will limit the spread of infection and halt transfer of resistance factors is very desirable.

SUMMARY OF THE INVENTION

The present invention fulfills the need for diagnostic tools and therapeutics by providing bacterial-specific compositions and methods for detecting, treating, and preventing bacterial infection, in particular E. faecalis infection.

The present invention encompasses isolated nucleic acids and polypeptides derived from E. faecalis that are useful as reagents for diagnosis of bacterial disease, components of effective antibacterial vaccines, and/or as targets for antibacterial drugs, including anti-E. faecalis drugs. They can also be used to detect the presence of E. faecalis and other Enterococcus species in a sample, and in screening compounds for the ability to interfere with the E. faecalis life cycle or to inhibit E. faecalis infection. They also have use as biocontrol agents for plants.

The present invention also provides a genome-wide comparison by FASTA of the predicted amino acid sequences of several E. faecalis open reading frames (ORFs) with the predicted amino acid sequence of several E. faecium ORFs (Table 3). Together, E. faecalis and E. faecium account for >95% of all VRE infections. Genomic comparison of E. faecalis with E. faecium at the sequence, open reading frame (ORF), and gene level provides valuable information on shared targets, which can be exploited in designing diagnostics and therapeutics for VRE. Identifying common essential genes through sequencing and analysis of both genomes provides a much quicker route to these targets, and speeds the progress of probe design for identification of VRE infection, and vaccine compositions for protection from and treatment of these infections.

More specifically, this invention features compositions of nucleic acids corresponding to entire coding sequences of E. faecalis proteins, including surface or secreted proteins or parts thereof, nucleic acids capable of binding mRNA from E. faecalis proteins to block protein translation, and methods for producing E. faecalis proteins or parts thereof using peptide synthesis and recombinant DNA techniques. This invention also features antibodies and nucleic acids useful as probes to detect E. faecalis infection. In addition, vaccine compositions and methods for the protection or treatment of infection by E. faecalis are within the scope of this invention.

The nucleotide sequences provided in SEQ ID NO: 1-SEQ ID NO: 3405, a fragment thereof, or a nucleotide sequence at least 99.5% identical to a sequence contained within SEQ ID NO: 1-SEQ ID NO: 3405 may be “provided” in a variety of media to facilitate use thereof. As used herein, “provided” refers to a manufacture, other than an isolated nucleic acid molecule, which contains a nucleotide sequence of the present invention, i.e., the nucleotide sequence provided in SEQ ID NO: 1-SEQ ID NO: 3405, a fragment thereof, or a nucleotide sequence at least 99.5% identical to a sequence contained within SEQ ID NO: 1-SEQ ID NO: 3405. Uses for and methods for providing nucleotide sequences in a variety of media are well known in the art (see e.g., EPO Publication No. EP 0 756 006).

In one application of this embodiment, a nucleotide sequence of the present invention can be recorded on computer readable media. As used herein, “computer readable media” refers to any media which can be read and accessed directly by a computer. Such media include, but are not limited to: magnetic storage media, such as floppy discs, hard disc storage media, and magnetic tape; optical storage media such as CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these categories such as magnetic/optical storage media. A person skilled in the art can readily appreciate how any of the presently known computer readable media can be used to create a manufacture comprising computer readable media having recorded thereon a nucleotide sequence of the present invention.

As used herein, “recorded” refers to a process for storing information on computer readable media. A person skilled in the art can readily adopt any of the presently known methods for recording information on computer readable media to generate manufactures comprising the nucleotide sequence information of the present invention.

A variety of data storage structures are available to a person skilled in the art for creating a computer readable medium having recorded thereon a nucleotide sequence of the present invention. The choice of the data storage structure will generally be based on the means chosen to access the stored information. In addition, a variety of data processor programs and formats can be used to store the nucleotide sequence information of the present invention on computer readable media. The sequence information can be represented in a word processing text file, formatted in commercially-available software such as WordPerfect and Microsoft Word, or represented in the form of an ASCII file, or stored in a database application, such as DB2, Sybase, Oracle, or the like. A person skilled in the art can readily adapt any number of data processor structuring formats (e.g. text file or database) in order to obtain computer readable media having recorded thereon the nucleotide sequence information of the present invention.
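By way of illustration only, the two storage formats named above (an ASCII text file and a database) can be sketched as follows. The record names and bases are hypothetical and are not taken from the Sequence Listing, the FASTA-style layout is an assumed choice of text format, and SQLite is substituted here for the database applications named above purely because it ships with Python.

```python
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical records: names and bases are invented for illustration only.
RECORDS = {"example_seq_1": "ATGGCTAAAGGTTAA", "example_seq_2": "ATGCCCGGGTAG"}


def write_ascii(records, path):
    """Record sequences in a plain ASCII file using a FASTA-style layout."""
    with open(path, "w", encoding="ascii") as handle:
        for name, seq in records.items():
            handle.write(f">{name}\n{seq}\n")


def read_ascii(path):
    """Read the FASTA-style file back into a dict of name -> bases."""
    records, name = {}, None
    with open(path, encoding="ascii") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                name = line[1:]
                records[name] = ""
            elif name is not None:
                records[name] += line
    return records


def store_in_database(records):
    """Record the same sequences in an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sequences (name TEXT PRIMARY KEY, bases TEXT)")
    conn.executemany("INSERT INTO sequences VALUES (?, ?)", list(records.items()))
    conn.commit()
    return conn
```

Either representation holds the same information; as noted above, the choice would generally follow from how the stored sequences are to be accessed.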

By providing the nucleotide sequences of SEQ ID NO: 1-SEQ ID NO: 3405, a fragment thereof, or a nucleotide sequence at least 99.5% identical to SEQ ID NO: 1-SEQ ID NO: 3405 in computer readable form, a person skilled in the art can routinely access the coding sequence information for a variety of purposes. Computer software is publicly available which allows a person skilled in the art to access sequence information provided on computer readable media. Examples of such computer software include programs of the “Staden Package”, “DNA Star”, “MacVector”, GCG “Wisconsin Package” (Genetics Computer Group, Madison, Wis.) and “NCBI Toolbox” (National Center For Biotechnology Information).

Computer algorithms enable the identification of open reading frames (ORFs) within SEQ ID NO: 1-SEQ ID NO: 3405 which contain homology to ORFs or proteins from other organisms. Examples of such similarity-search algorithms include the BLAST [Altschul et al., J. Mol. Biol. 215:403-410 (1990)] and Smith-Waterman [Smith and Waterman (1981) Advances in Applied Mathematics, 2:482-489] search algorithms. These algorithms are utilized on computer systems as exemplified below. The ORFs so identified represent protein encoding fragments within the E. faecalis genome and are useful in producing commercially important proteins such as enzymes used in fermentation reactions and in the production of commercially useful metabolites.
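As a minimal sketch of the Smith-Waterman approach cited above, the following computes the best local alignment score between two sequences. The scoring values (match, mismatch, and a linear gap penalty) are assumptions chosen for illustration; they are not parameters disclosed herein, and production implementations typically use substitution matrices and affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Scoring values are illustrative assumptions (linear gap penalty).
    Only the previous row of the dynamic-programming matrix is kept.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

With these assumed values, two sequences sharing an identical four-residue stretch and nothing else score 8, and sequences with no residue in common score 0.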

The present invention further provides systems, particularly computer-based systems, which contain the sequence information described herein. Such systems are designed to identify commercially important fragments of the E. faecalis genome. As used herein, “a computer-based system” refers to the hardware means, software means, and data storage means used to analyze the nucleotide sequence information of the present invention. The minimum hardware means of the computer-based systems of the present invention comprises a central processing unit (CPU), input means, output means, and data storage means. A person skilled in the art can readily appreciate that any one of the currently available computer-based systems is suitable for use in the present invention. The computer-based systems of the present invention comprise a data storage means having stored therein a nucleotide sequence of the present invention and the necessary hardware means and software means for supporting and implementing a search means. As used herein, “data storage means” refers to memory which can store nucleotide sequence information of the present invention, or a memory access means which can access manufactures having recorded thereon the nucleotide sequence information of the present invention.

As used herein, “search means” refers to one or more programs which are implemented on the computer-based system to compare a target sequence or target structural motif with the sequence information stored within the data storage means. Search means are used to identify fragments or regions of the E. faecalis genome which are similar to, or “match”, a particular target sequence or target motif. A variety of algorithms are known in the art and have been disclosed publicly, and a variety of commercially available software for conducting homology-based similarity searches is available and can be used in the computer-based systems of the present invention. Examples of such software include, but are not limited to, FASTA (GCG Wisconsin Package), Bic_SW (Compugen Bioccelerator), BLASTN2, BLASTP2, BLASTX2 (NCBI) and Motifs (GCG). A person skilled in the art can readily recognize that any one of the available algorithms or implementing software packages for conducting homology searches can be adapted for use in the present computer-based systems.

As used herein, a “target sequence” can be any DNA or amino acid sequence of six or more nucleotides or two or more amino acids. A person skilled in the art can readily recognize that the longer a target sequence is, the less likely a target sequence will be present as a random occurrence in the database. The most preferred sequence length of a target sequence is from about 10 to 100 amino acids or from about 30 to 300 nucleotide residues. However, it is well recognized that many genes are longer than 500 amino acids, or 1.5 kb in length, and that commercially important fragments of the E. faecalis genome, such as sequence fragments involved in gene expression and protein processing, will often be shorter than 30 nucleotides.
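The relationship between target length and chance occurrence can be made concrete with a rough estimate. The sketch below assumes a simplified model in which every residue is independent and equally likely, which real genomes do not satisfy, and the database lengths used in the usage note are hypothetical round numbers.

```python
def expected_random_hits(target_len, database_len, alphabet_size=4):
    """Expected number of chance occurrences of one specific target.

    Assumes independent, uniformly distributed residues (a simplification).
    alphabet_size is 4 for nucleotides and 20 for amino acids.
    """
    positions = max(database_len - target_len + 1, 0)
    return positions / alphabet_size ** target_len
```

Under this model a six-nucleotide target is expected about once by chance in roughly 4,100 positions, whereas `expected_random_hits(30, 3_000_000)` is far below one, consistent with longer targets being less likely to occur at random.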

As used herein, “a target structural motif,” or “target motif,” refers to any rationally selected sequence or combination of sequences in which the sequence(s) are chosen based on a specific functional domain or three-dimensional configuration which is formed upon the folding of the target polypeptide. There are a variety of target motifs known in the art. Protein target motifs include, but are not limited to, enzymatic active sites, membrane spanning regions, and signal sequences. Nucleic acid target motifs include, but are not limited to, promoter sequences, hairpin structures and inducible expression elements (protein binding sequences).
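A search for a nucleic acid target motif can be sketched with a regular expression. The function name and the example patterns are hypothetical illustrations; they are not motifs identified in the Sequence Listing, and the packages named above use more elaborate motif descriptions than a plain pattern.

```python
import re


def find_motif(sequence, pattern):
    """Return 0-based start positions of every match, including overlaps.

    pattern is a regular expression, so degenerate positions can be
    written as character classes, e.g. "TA[AT]AAT".
    """
    lookahead = re.compile(f"(?=({pattern}))")
    return [m.start() for m in lookahead.finditer(sequence)]
```

The zero-width lookahead is used so that overlapping occurrences are all reported, which a plain `re.finditer` on the pattern would skip.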

A variety of structural formats for the input and output means can be used to input and output the information in the computer-based systems of the present invention. A preferred format for an output means ranks fragments of the E. faecalis genome possessing varying degrees of homology to the target sequence or target motif. Such presentation provides a person skilled in the art with a ranking of sequences which contain various amounts of the target sequence or target motif and identifies the degree of homology contained in the identified fragment.
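A ranked output of the kind described above might be produced as in the following sketch. The fraction of identical positions in the best ungapped window is used here as a simple stand-in for “degree of homology”; that scoring choice, the function names, and the fragment names in the usage note are all assumptions made for illustration.

```python
def best_window_identity(fragment, target):
    """Highest fraction of identical positions over all ungapped placements."""
    if not target or len(target) > len(fragment):
        return 0.0
    best = 0
    for start in range(len(fragment) - len(target) + 1):
        window = fragment[start:start + len(target)]
        matches = sum(x == y for x, y in zip(window, target))
        best = max(best, matches)
    return best / len(target)


def rank_fragments(fragments, target):
    """Return (name, identity) pairs sorted from most to least similar."""
    scored = [(name, best_window_identity(seq, target))
              for name, seq in fragments.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

For example, ranking the hypothetical fragments `{"frag_a": "TTTTACGTACGTTT", "frag_b": "GGACGAACGTGG", "frag_c": "CCCCCCCCCC"}` against the target `"ACGTACGT"` lists `frag_a` first (an exact match), then `frag_b` (one mismatch), then `frag_c`.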

A variety of comparing means can be used to compare a target sequence or target motif with the data storage means to identify sequence fragments of the E. faecalis genome. In the present examples, software implementing the BLASTP2 and bic_SW algorithms (Altschul et al., J Mol. Biol. 215:403-410 (1990); Compugen Bioccelerator) was used to identify open reading frames within the E. faecalis genome. A person skilled in the art can readily recognize that any one of the publicly available homology search programs can be used as the search means for the computer-based systems of the present invention.

The invention features E. faecalis polypeptides, preferably a substantially pure preparation of an E. faecalis polypeptide, or a recombinant E. faecalis polypeptide. In preferred embodiments: the polypeptide has biological activity; the polypeptide has an amino acid sequence at least 60%, 70%, 80%, 90%, 95%, 98%, or 99% identical to an amino acid sequence of the invention contained in the Sequence Listing, preferably it has about 65% sequence identity with an amino acid sequence of the invention contained in the Sequence Listing, and most preferably it has about 92% to about 99% sequence identity with an amino acid sequence of the invention contained in the Sequence Listing; the polypeptide has an amino acid sequence essentially the same as an amino acid sequence of the invention contained in the Sequence Listing; the polypeptide is at least 5, 10, 20, 50, 100, or 150 amino acid residues in length; the polypeptide includes at least 5, preferably at least 10, more preferably at least 20, more preferably at least 50, 100, or 150 contiguous amino acid residues of the invention contained in the Sequence Listing. In yet another preferred embodiment, the amino acid sequence which differs in sequence identity by about 7% to about 8% from the E. faecalis amino acid sequences of the invention contained in the Sequence Listing is also encompassed by the invention.

In preferred embodiments: the E. faecalis polypeptide is encoded by a nucleic acid of the invention contained in the Sequence Listing, or by a nucleic acid having at least 60%, 70%, 80%, 90%, 95%, 98%, or 99% homology with a nucleic acid of the invention contained in the Sequence Listing.

In a preferred embodiment, the subject E. faecalis polypeptide differs in amino acid sequence at 1, 2, 3, 5, 10 or more residues from a sequence of the invention contained in the Sequence Listing. The differences, however, are such that the E. faecalis polypeptide exhibits an E. faecalis biological activity, e.g., the E. faecalis polypeptide retains a biological activity of a naturally occurring E. faecalis enzyme.

In preferred embodiments, the polypeptide includes all or a fragment of an amino acid sequence of the invention contained in the Sequence Listing; fused, in reading frame, to additional amino acid residues, preferably to residues encoded by genomic DNA 5′ or 3′ to the genomic DNA which encodes a sequence of the invention contained in the Sequence Listing.

In yet other preferred embodiments, the E. faecalis polypeptide is a recombinant fusion protein having a first E. faecalis polypeptide portion and a second polypeptide portion, e.g., a second polypeptide portion having an amino acid sequence unrelated to E. faecalis. The second polypeptide portion can be, e.g., any of glutathione-S-transferase, a DNA binding domain, or a polymerase activating domain. In a preferred embodiment, the fusion protein can be used in a two-hybrid assay.

Polypeptides of the invention include those which arise as a result of alternative transcription events, alternative RNA splicing events, and alternative translational and posttranslational events.

In a preferred embodiment, the encoded E. faecalis polypeptide differs (e.g., by amino acid substitution, addition or deletion of at least one amino acid residue) in amino acid sequence at 1, 2, 3, 5, 10 or more residues, from a sequence of the invention contained in the Sequence Listing. The differences, however, are such that the encoded E. faecalis polypeptide exhibits an E. faecalis biological activity, e.g., the encoded E. faecalis enzyme retains a biological activity of a naturally occurring E. faecalis enzyme.

In preferred embodiments, the encoded polypeptide includes all or a fragment of an amino acid sequence of the invention contained in the Sequence Listing; fused, in reading frame, to additional amino acid residues, preferably to residues encoded by genomic DNA 5′ or 3′ to the genomic DNA which encodes a sequence of the invention contained in the Sequence Listing.

The E. faecalis strain, from which the nucleotide sequences have been sequenced, was deposited on Jun. 26, 1997 in the American Type Culture Collection (ATCC # 55986) as strain 14336.

Included in the invention are: allelic variations; natural mutants; induced mutants; proteins encoded by DNA that hybridizes under high or low stringency conditions to a nucleic acid which encodes a polypeptide of the invention contained in the Sequence Listing (for definitions of high and low stringency see Current Protocols in Molecular Biology, John Wiley & Sons, New York, 1989, 6.3.1-6.3.6, hereby incorporated by reference); and polypeptides specifically bound by antisera to E. faecalis polypeptides, especially by antisera to an active site or binding domain of an E. faecalis polypeptide. The invention also includes fragments, preferably biologically active fragments. These and other polypeptides are also referred to herein as E. faecalis polypeptide analogs or variants.

The invention further provides nucleic acids, e.g., RNA or DNA, encoding a polypeptide of the invention. This includes double stranded nucleic acids as well as coding and antisense single strands.

In preferred embodiments, the subject E. faecalis nucleic acid will include a transcriptional regulatory sequence, e.g. at least one of a transcriptional promoter or transcriptional enhancer sequence, operably linked to the E. faecalis gene sequence, e.g., to render the E. faecalis gene sequence suitable for expression in a recombinant host cell.

In yet a further preferred embodiment, the nucleic acid which encodes an E. faecalis polypeptide of the invention, hybridizes under stringent conditions to a nucleic acid probe corresponding to at least 8 consecutive nucleotides of the invention contained in the Sequence Listing; more preferably to at least 12 consecutive nucleotides of the invention contained in the Sequence Listing; more preferably to at least 20 consecutive nucleotides of the invention contained in the Sequence Listing; more preferably to at least 40 consecutive nucleotides of the invention contained in the Sequence Listing.

In another aspect, the invention provides a substantially pure nucleic acid having a nucleotide sequence which encodes an E. faecalis polypeptide. In preferred embodiments: the encoded polypeptide has biological activity; the encoded polypeptide has an amino acid sequence at least 60%, 70%, 80%, 90%, 95%, 98%, or 99% homologous to an amino acid sequence of the invention contained in the Sequence Listing; the encoded polypeptide has an amino acid sequence essentially the same as an amino acid sequence of the invention contained in the Sequence Listing; the encoded polypeptide is at least 5, 10, 20, 50, 100, or 150 amino acids in length; the encoded polypeptide comprises at least 5, preferably at least 10, more preferably at least 20, more preferably at least 50, 100, or 150 contiguous amino acids of the invention contained in the Sequence Listing.

In another aspect, the invention encompasses: a vector including a nucleic acid which encodes an E. faecalis polypeptide or an E. faecalis polypeptide variant as described herein; a host cell transfected with the vector; and a method of producing a recombinant E. faecalis polypeptide or E. faecalis polypeptide variant, including culturing the cell, e.g., in a cell culture medium, and isolating the E. faecalis polypeptide or E. faecalis polypeptide variant, e.g., from the cell or from the cell culture medium.

One embodiment of the invention is directed to substantially isolated nucleic acids. Nucleic acids of the invention include sequences comprising at least about 8 nucleotides in length, more preferably at least about 12 nucleotides in length, even more preferably at least about 15-20 nucleotides in length, that correspond to a subsequence of any one of SEQ ID NO: 1-SEQ ID NO: 3405 or complements thereof. Alternatively, the nucleic acids comprise sequences contained within any ORF (open reading frame), including a complete protein-coding sequence, of which any of SEQ ID NO: 1-SEQ ID NO: 3405 forms a part. The invention encompasses sequence-conservative variants and function-conservative variants of these sequences. The nucleic acids may be DNA, RNA, DNA/RNA duplexes, protein-nucleic acid (PNA), or derivatives thereof.
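Whether a short nucleic acid corresponds to a subsequence of a longer sequence or its complement can be checked as in the following sketch. It assumes that “subsequence” means a contiguous stretch, that “complement” means the reverse complement read 5′ to 3′, and that only the unambiguous bases A, C, G, and T occur; the function names and example sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def corresponds_to(probe, reference, min_len=8):
    """True if probe is at least min_len long and occurs contiguously in
    reference or in the reverse complement of reference."""
    if len(probe) < min_len:
        return False
    return probe in reference or probe in reverse_complement(reference)
```

The default `min_len` of 8 mirrors the shortest length recited above; passing 12 or 15 applies the more preferred lengths.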

In another aspect, the invention features a purified recombinant nucleic acid having at least 50%, 60%, 70%, 80%, 90%, 95%, 98%, or 99% homology with a sequence of the invention contained in the Sequence Listing. The invention also encompasses recombinant DNA (including DNA cloning and expression vectors) comprising these E. faecalis-derived sequences; host cells comprising such DNA, including fungal, bacterial, yeast, plant, insect, and mammalian host cells; and methods for producing expression products comprising RNA and polypeptides encoded by the E. faecalis sequences. These methods are carried out by incubating a host cell comprising an E. faecalis-derived nucleic acid sequence under conditions in which the sequence is expressed. The host cell may be native or recombinant. The polypeptides can be obtained by (a) harvesting the incubated cells to produce a cell fraction and a medium fraction; and (b) recovering the E. faecalis polypeptide from the cell fraction, the medium fraction, or both. The polypeptides can also be made by in vitro translation.

In another aspect, the invention features nucleic acids capable of binding mRNA of E. faecalis. Such nucleic acid is capable of acting as antisense nucleic acid to control the translation of mRNA of E. faecalis. A further aspect features a nucleic acid which is capable of binding specifically to an E. faecalis nucleic acid. These nucleic acids are also referred to herein as complements and have utility as probes and as capture reagents.

In another aspect, the invention features an expression system comprising an open reading frame corresponding to E. faecalis nucleic acid. The nucleic acid further comprises a control sequence compatible with an intended host. The expression system is useful for making polypeptides corresponding to E. faecalis nucleic acid.

In another aspect, the invention encompasses: a vector including a nucleic acid which encodes an E. faecalis polypeptide or an E. faecalis polypeptide variant as described herein; a host cell transfected with the vector; and a method of producing a recombinant E. faecalis polypeptide or E. faecalis polypeptide variant, including culturing the cell, e.g., in a cell culture medium, and isolating the E. faecalis polypeptide or E. faecalis polypeptide variant, e.g., from the cell or from the cell culture medium.

In yet another embodiment, the invention encompasses reagents for detecting bacterial infection, including E. faecalis infection, which comprise at least one E. faecalis-derived nucleic acid defined by any one of SEQ ID NO: 1-SEQ ID NO: 3405, or sequence-conservative or function-conservative variants thereof. Alternatively, the diagnostic reagents comprise polypeptide sequences that are contained within any open reading frames (ORFs), including complete protein-coding sequences, contained within any of SEQ ID NO: 1-SEQ ID NO: 3405, or polypeptide sequences contained within any of SEQ ID NO: 3406-SEQ ID NO: 6810, or polypeptides of which any of the above sequences forms a part, or antibodies directed against any of the above peptide sequences or function-conservative variants and/or fragments thereof.

The invention further provides antibodies, preferably monoclonal antibodies, which specifically bind to the polypeptides of the invention. Methods are also provided for producing antibodies in a host animal. The methods of the invention comprise immunizing an animal with at least one E. faecalis-derived immunogenic component, wherein the immunogenic component comprises one or more of the polypeptides encoded by any one of SEQ ID NO: 1-SEQ ID NO: 3405 or sequence-conservative or function-conservative variants thereof; or polypeptides that are contained within any ORFs, including complete protein-coding sequences, of which any of SEQ ID NO: 1-SEQ ID NO: 3405 forms a part; or polypeptide sequences contained within any of SEQ ID NO: 3406-SEQ ID NO: 6810; or polypeptides of which any of SEQ ID NO: 3406-SEQ ID NO: 6810 forms a part. Host animals include any warm blooded animal, including without limitation mammals and birds. Such antibodies have utility as reagents for immunoassays to evaluate the abundance and distribution of E. faecalis-specific antigens.

In yet another aspect, the invention provides diagnostic methods for detecting E. faecalis antigenic components or anti-E. faecalis antibodies in a sample. E. faecalis antigenic components are detected by a process comprising: (i) contacting a sample suspected to contain a bacterial antigenic component with a bacterial-specific antibody, under conditions in which a stable antigen-antibody complex can form between the antibody and bacterial antigenic components in the sample; and (ii) detecting any antigen-antibody complex formed in step (i), wherein detection of an antigen-antibody complex indicates the presence of at least one bacterial antigenic component in the sample. In different embodiments of this method, the antibodies used are directed against a sequence encoded by any of SEQ ID NO: 1-SEQ ID NO: 3405 or sequence-conservative or function-conservative variants thereof, or against a polypeptide sequence contained in any of SEQ ID NO: 3406-SEQ ID NO: 6810 or function-conservative variants thereof.

In yet another aspect, the invention provides a method for detecting antibacterial-specific antibodies in a sample, which comprises: (i) contacting a sample suspected to contain antibacterial-specific antibodies with an E. faecalis antigenic component, under conditions in which a stable antigen-antibody complex can form between the E. faecalis antigenic component and antibacterial antibodies in the sample; and (ii) detecting any antigen-antibody complex formed in step (i), wherein detection of an antigen-antibody complex indicates the presence of antibacterial antibodies in the sample. In different embodiments of this method, the antigenic component is encoded by a sequence contained in any of SEQ ID NO: 1-SEQ ID NO: 3405 or sequence-conservative and function-conservative variants thereof, or is a polypeptide sequence contained in any of SEQ ID NO: 3406-SEQ ID NO: 6810 or function-conservative variants thereof.

In another aspect, the invention features a method of generating vaccines for immunizing an individual against E. faecalis. The method includes: immunizing a subject with an E. faecalis polypeptide, e.g., a surface or secreted polypeptide, or a combination of such peptides or active portion(s) thereof, and a pharmaceutically acceptable carrier. Such vaccines have therapeutic and prophylactic utilities.

In another aspect, the invention features a method of evaluating a compound, e.g., a polypeptide, e.g., a fragment of a host cell polypeptide, for the ability to bind an E. faecalis polypeptide. The method includes: contacting the compound with an E. faecalis polypeptide and determining if the compound binds or otherwise interacts with the E. faecalis polypeptide. Compounds which bind E. faecalis polypeptides are candidates as activators or inhibitors of the bacterial life cycle. These assays can be performed in vitro or in vivo.

In another aspect, the invention features a method of evaluating a compound, e.g., a polypeptide, e.g., a fragment of a host cell polypeptide, for the ability to bind an E. faecalis nucleic acid, e.g., DNA or RNA. The method includes: contacting the compound with an E. faecalis nucleic acid and determining if the compound binds or otherwise interacts with the E. faecalis nucleic acid. Compounds which bind E. faecalis nucleic acids are candidates as activators or inhibitors of the bacterial life cycle. These assays can be performed in vitro or in vivo.

A particularly preferred embodiment of the invention is directed to a method of screening test compounds for anti-bacterial activity, which method comprises: selecting as a target a bacterial specific sequence, which sequence is essential to the viability of a bacterial species; contacting a test compound with said target sequence; and selecting those test compounds which bind to said target sequence as potential anti-bacterial candidates. In one embodiment, the target sequence selected is specific to a single species, or even a single strain, i.e., the E. faecalis strain 14336. In a second embodiment, the target sequence is common to at least two species of bacteria. In a third embodiment, the target sequence is common to a family of bacteria. The target sequence may be a nucleic acid sequence or a polypeptide sequence. Methods employing sequences common to more than one species of microorganism may be used to screen candidates for broad spectrum anti-bacterial activity.

The invention also provides methods for preventing or treating disease caused by certain bacteria, including E. faecalis, which are carried out by administering to an animal in need of such treatment, in particular a warm-blooded vertebrate, including but not limited to birds and mammals, a compound that specifically inhibits or interferes with the function of a bacterial polypeptide or nucleic acid. In a particularly preferred embodiment, the mammal to be treated is human.

DETAILED DESCRIPTION OF THE INVENTION

The sequences of the present invention include the specific nucleic acid and amino acid sequences set forth in the Sequence Listing that forms a part of the present specification, and which are designated SEQ ID NO: 1-SEQ ID NO: 6810. Use of the terms “SEQ ID NO: 1-SEQ ID NO: 3405”, “SEQ ID NO: 3406-SEQ ID NO: 6810”, “the sequences depicted in Table 2”, etc., is intended, for convenience, to refer to each individual SEQ ID NO individually, and is not intended to refer to the genus of these sequences. In other words, it is a shorthand for listing all of these sequences individually. The invention encompasses each sequence individually, as well as any combination thereof.

Definitions

“Nucleic acid” or “polynucleotide” as used herein refers to purine- and pyrimidine-containing polymers of any length, either polyribonucleotides or polydeoxyribonucleotides or mixed polyribo-polydeoxyribo nucleotides. This includes single- and double-stranded molecules, i.e., DNA-DNA, DNA-RNA and RNA-RNA hybrids, as well as “protein nucleic acids” (PNA) formed by conjugating bases to an amino acid backbone. This also includes nucleic acids containing modified bases.

A nucleic acid or polypeptide sequence that is “derived from” a designated sequence refers to a sequence that corresponds to a region of the designated sequence. For nucleic acid sequences, this encompasses sequences that are homologous or complementary to the sequence, as well as “sequence-conservative variants” and “function-conservative variants.” For polypeptide sequences, this encompasses “function-conservative variants.” Sequence-conservative variants are those in which a change of one or more nucleotides in a given codon position results in no alteration in the amino acid encoded at that position. Function-conservative variants are those in which a given amino acid residue in a polypeptide has been changed without altering the overall conformation and function of the native polypeptide, including, but not limited to, replacement of an amino acid with one having similar physico-chemical properties (such as, for example, acidic, basic, hydrophobic, and the like). “Function-conservative” variants also include any polypeptides that have the ability to elicit antibodies specific to a designated polypeptide.
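The definition of a sequence-conservative variant given above can be illustrated with a minimal sketch. It assumes the standard genetic code, in-frame DNA coding sequences of equal length, and hypothetical helper names (`translate`, `is_sequence_conservative`) that are not part of the specification.

```python
from itertools import product

# Standard genetic code, built in TCAG order (an assumption for illustration).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna: str) -> str:
    """Translate an in-frame DNA coding sequence, one letter per codon."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))

def is_sequence_conservative(original: str, variant: str) -> bool:
    """True if the variant differs in nucleotides but encodes the same amino acids."""
    if len(original) != len(variant):
        return False
    return original.upper() != variant.upper() and translate(original) == translate(variant)

print(is_sequence_conservative("GCTAAA", "GCCAAG"))  # True: both encode Ala-Lys
print(is_sequence_conservative("GCTAAA", "GATAAA"))  # False: Ala becomes Asp
```

Function-conservative variants are not covered by this sketch, since they depend on conformation and function rather than on the encoded sequence alone.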

An “E. faecalis-derived” nucleic acid or polypeptide sequence may or may not be present in other bacterial species, and may or may not be present in all E. faecalis strains. This term is intended to refer to the source from which the sequence was originally isolated. Thus, an E. faecalis-derived polypeptide, as used herein, may be used, e.g., as a target to screen for a broad spectrum antibacterial agent, to search for homologous proteins in other species of bacteria or in eukaryotic organisms such as fungi and humans, etc.

A purified or isolated polypeptide or a substantially pure preparation of a polypeptide are used interchangeably herein and, as used herein, mean a polypeptide that has been separated from other proteins, lipids, and nucleic acids with which it naturally occurs. Preferably, the polypeptide is also separated from substances, e.g., antibodies or gel matrix, e.g., polyacrylamide, which are used to purify it. Preferably, the polypeptide constitutes at least 10, 20, 50, 70, 80 or 95% dry weight of the purified preparation. Preferably, the preparation contains: sufficient polypeptide to allow protein sequencing; at least 1, 10, or 100 mg of the polypeptide.

A purified preparation of cells refers to, in the case of plant or animal cells, an in vitro preparation of cells and not an entire intact plant or animal. In the case of cultured cells or microbial cells, it consists of a preparation of at least 10% and more preferably 50% of the subject cells.

A purified or isolated or a substantially pure nucleic acid, e.g., a substantially pure DNA, (are terms used interchangeably herein) is a nucleic acid which is one or both of the following: not immediately contiguous with both of the coding sequences with which it is immediately contiguous (i.e., one at the 5′ end and one at the 3′ end) in the naturally-occurring genome of the organism from which the nucleic acid is derived; or which is substantially free of a nucleic acid with which it occurs in the organism from which the nucleic acid is derived. The term includes, for example, a recombinant DNA which is incorporated into a vector, e.g., into an autonomously replicating plasmid or virus, or into the genomic DNA of a prokaryote or eukaryote, or which exists as a separate molecule (e.g., a cDNA or a genomic DNA fragment produced by PCR or restriction endonuclease treatment) independent of other DNA sequences. Substantially pure DNA also includes a recombinant DNA which is part of a hybrid gene encoding additional E. faecalis DNA sequence.

A “contig” as used herein is a nucleic acid representing a continuous stretch of genomic sequence of an organism.

An “open reading frame”, also referred to herein as ORF, is a region of nucleic acid which encodes a polypeptide. This region may represent a portion of a coding sequence or a total sequence and can be determined from a stop to stop codon or from a start to stop codon.

As used herein, a “coding sequence” is a nucleic acid which is transcribed into messenger RNA and/or translated into a polypeptide when placed under the control of appropriate regulatory sequences. The boundaries of the coding sequence are determined by a translation start codon at the five prime terminus and a translation stop codon at the three prime terminus. A coding sequence can include but is not limited to messenger RNA, synthetic DNA, and recombinant nucleic acid sequences.

A “complement” of a nucleic acid as used herein refers to an anti-parallel or antisense sequence that participates in Watson-Crick base-pairing with the original sequence.
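By way of illustration only, the anti-parallel complement of a DNA strand can be computed as in the following minimal sketch. The function name `reverse_complement` is hypothetical, and only the four unmodified DNA bases are handled; RNA and modified bases are outside its scope.

```python
# Watson-Crick pairing for the four unmodified DNA bases.
_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq: str) -> str:
    """Return the anti-parallel complement of a DNA sequence, read 5' to 3'."""
    return "".join(_PAIR[base] for base in reversed(seq.upper()))

print(reverse_complement("ATTGCC"))  # GGCAAT
```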

A “gene product” is a protein or structural RNA which is specifically encoded by a gene.

As used herein, the term “probe” refers to a nucleic acid, peptide or other chemical entity which specifically binds to a molecule of interest. Probes are often associated with or capable of associating with a label. A label is a chemical moiety capable of detection. Typical labels comprise dyes, radioisotopes, luminescent and chemiluminescent moieties, fluorophores, enzymes, precipitating agents, amplification sequences, and the like. Similarly, a nucleic acid, peptide or other chemical entity which specifically binds to a molecule of interest and immobilizes such molecule is referred to herein as a “capture ligand”. Capture ligands are typically associated with or capable of associating with a support such as nitro-cellulose, glass, nylon membranes, beads, particles and the like. The specificity of hybridization is dependent on conditions such as the base pair composition of the nucleotides, and the temperature and salt concentration of the reaction. These conditions are readily discernable to one of ordinary skill in the art using routine experimentation.

“Homologous” refers to the sequence similarity or sequence identity between two polypeptides or between two nucleic acid molecules. When a position in both of the two compared sequences is occupied by the same base or amino acid monomer subunit, e.g., if a position in each of two DNA molecules is occupied by adenine, then the molecules are homologous at that position. The percent of homology between two sequences is a function of the number of matching or homologous positions shared by the two sequences divided by the number of positions compared×100. For example, if 6 of 10 of the positions in two sequences are matched or homologous then the two sequences are 60% homologous. By way of example, the DNA sequences ATTGCC and TATGGC share 50% homology. Generally, a comparison is made when two sequences are aligned to give maximum homology.
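The percent homology calculation defined above may be sketched as follows. The sketch assumes the two sequences have already been aligned to equal length without gaps; the function name `percent_homology` is hypothetical.

```python
def percent_homology(a: str, b: str) -> float:
    """Matching positions divided by positions compared, times 100."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and pre-aligned to equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)

# The example from the text: positions 3, 4 and 6 match, so 3 of 6.
print(percent_homology("ATTGCC", "TATGGC"))  # 50.0
```

Finding the alignment that gives maximum homology is a separate step that this sketch does not attempt.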

Nucleic acids are hybridizable to each other when at least one strand of a nucleic acid can anneal to the other nucleic acid under defined stringency conditions. Stringency of hybridization is determined by: (a) the temperature at which hybridization and/or washing is performed; and (b) the ionic strength and polarity of the hybridization and washing solutions. Hybridization requires that the two nucleic acids contain complementary sequences; depending on the stringency of hybridization, however, mismatches may be tolerated. Typically, hybridization of two sequences at high stringency (such as, for example, in a solution of 0.5×SSC, at 65° C.) requires that the sequences be essentially completely homologous. Conditions of intermediate stringency (such as, for example, 2×SSC at 65° C.) and low stringency (such as, for example, 2×SSC at 55° C.) require correspondingly less overall complementarity between the hybridizing sequences. (1×SSC is 0.15 M NaCl, 0.015 M Na citrate).
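As a worked illustration of the salt concentrations implied by the SSC values above, the following sketch scales the stated 1×SSC composition linearly. The names `SSC_1X` and `ssc_molarity` are hypothetical.

```python
# Composition of 1x SSC as stated in the text, in mol/L.
SSC_1X = {"NaCl": 0.15, "Na citrate": 0.015}

def ssc_molarity(fold: float) -> dict:
    """Return solute molarities for an SSC solution at the given fold strength."""
    return {solute: fold * molar for solute, molar in SSC_1X.items()}

# Example conditions named in the text.
for label, fold, temp_c in [("high", 0.5, 65), ("intermediate", 2.0, 65), ("low", 2.0, 55)]:
    print(label, temp_c, ssc_molarity(fold))
```

Under these figures, 0.5×SSC corresponds to 0.075 M NaCl and 0.0075 M Na citrate, and 2×SSC to 0.30 M NaCl and 0.030 M Na citrate.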

The terms peptides, proteins, and polypeptides are used interchangeably herein.

As used herein, the term “surface protein” refers to all surface accessible proteins, e.g. inner and outer membrane proteins, proteins adhering to the cell wall, and secreted proteins.

A polypeptide has E. faecalis biological activity if it has one, two and preferably more of the following properties: (1) if when expressed in the course of an E. faecalis infection, it can promote, or mediate the attachment of E. faecalis to a cell; (2) it has an enzymatic activity, structural or regulatory function characteristic of an E. faecalis protein; (3) or the gene which encodes it can rescue a lethal mutation in an E. faecalis gene. A polypeptide has biological activity if it is an antagonist, agonist, or super-agonist of a polypeptide having one of the above-listed properties.

A biologically active fragment or analog is one having an in vivo or in vitro activity which is characteristic of the E. faecalis polypeptides of the invention contained in the Sequence Listing, or of other naturally occurring E. faecalis polypeptides, e.g., one or more of the biological activities described herein. Especially preferred are fragments which exist in vivo, e.g., fragments which arise from post transcriptional processing or which arise from translation of alternatively spliced RNA's. Fragments include those expressed in native or endogenous cells as well as those made in expression systems, e.g., in CHO cells. Because peptides such as E. faecalis polypeptides often exhibit a range of physiological properties and because such properties may be attributable to different portions of the molecule, a useful E. faecalis fragment or E. faecalis analog is one which exhibits a biological activity in any biological assay for E. faecalis activity. Most preferably the fragment or analog possesses 10%, preferably 40%, more preferably 60%, 70%, 80% or 90% or greater of the activity of E. faecalis, in any in vivo or in vitro assay.

Analogs can differ from naturally occurring E. faecalis polypeptides in amino acid sequence or in ways that do not involve sequence, or both. Non-sequence modifications include changes in acetylation, methylation, phosphorylation, carboxylation, or glycosylation. Preferred analogs include E. faecalis polypeptides (or biologically active fragments thereof) whose sequences differ from the wild-type sequence by one or more conservative amino acid substitutions or by one or more non-conservative amino acid substitutions, deletions, or insertions which do not substantially diminish the biological activity of the E. faecalis polypeptide. Conservative substitutions typically include the substitution of one amino acid for another with similar characteristics, e.g., substitutions within the following groups: valine, glycine; glycine, alanine; valine, isoleucine, leucine; aspartic acid, glutamic acid; asparagine, glutamine; serine, threonine; lysine, arginine; and phenylalanine, tyrosine. Other conservative substitutions can be made in view of the table below.

TABLE 1
CONSERVATIVE AMINO ACID REPLACEMENTS
For Amino Acid   Code   Replace with any of
Alanine          A      D-Ala, Gly, beta-Ala, L-Cys, D-Cys
Arginine         R      D-Arg, Lys, D-Lys, homo-Arg, D-homo-Arg, Met, Ile, D-Met, D-Ile, Orn, D-Orn
Asparagine       N      D-Asn, Asp, D-Asp, Glu, D-Glu, Gln, D-Gln
Aspartic Acid    D      D-Asp, D-Asn, Asn, Glu, D-Glu, Gln, D-Gln
Cysteine         C      D-Cys, S-Me-Cys, Met, D-Met, Thr, D-Thr
Glutamine        Q      D-Gln, Asn, D-Asn, Glu, D-Glu, Asp, D-Asp
Glutamic Acid    E      D-Glu, D-Asp, Asp, Asn, D-Asn, Gln, D-Gln
Glycine          G      Ala, D-Ala, Pro, D-Pro, β-Ala, Acp
Isoleucine       I      D-Ile, Val, D-Val, Leu, D-Leu, Met, D-Met
Leucine          L      D-Leu, Val, D-Val, Leu, D-Leu, Met, D-Met
Lysine           K      D-Lys, Arg, D-Arg, homo-Arg, D-homo-Arg, Met, D-Met, Ile, D-Ile, Orn, D-Orn
Methionine       M      D-Met, S-Me-Cys, Ile, D-Ile, Leu, D-Leu, Val, D-Val
Phenylalanine    F      D-Phe, Tyr, D-Thr, L-Dopa, His, D-His, Trp, D-Trp, Trans-3,4, or 5-phenylproline, cis-3,4, or 5-phenylproline
Proline          P      D-Pro, L-1-thioazolidine-4-carboxylic acid, D- or L-1-oxazolidine-4-carboxylic acid
Serine           S      D-Ser, Thr, D-Thr, allo-Thr, Met, D-Met, Met(O), D-Met(O), L-Cys, D-Cys
Threonine        T      D-Thr, Ser, D-Ser, allo-Thr, Met, D-Met, Met(O), D-Met(O), Val, D-Val
Tyrosine         Y      D-Tyr, Phe, D-Phe, L-Dopa, His, D-His
Valine           V      D-Val, Leu, D-Leu, Ile, D-Ile, Met, D-Met
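The conservative substitution groups listed in the paragraph preceding Table 1 can be represented as a simple lookup, as in the following minimal sketch. It encodes only those eight groups, using one-letter codes; the additional replacements of Table 1 (D-amino acids and non-standard residues) are not encoded, and the name `is_conservative` is hypothetical.

```python
# Groups taken from the text: Val/Gly; Gly/Ala; Val/Ile/Leu; Asp/Glu;
# Asn/Gln; Ser/Thr; Lys/Arg; Phe/Tyr.
CONSERVATIVE_GROUPS = [
    {"V", "G"}, {"G", "A"}, {"V", "I", "L"}, {"D", "E"},
    {"N", "Q"}, {"S", "T"}, {"K", "R"}, {"F", "Y"},
]

def is_conservative(a: str, b: str) -> bool:
    """True if residues a and b differ and share at least one listed group."""
    a, b = a.upper(), b.upper()
    return a != b and any(a in group and b in group for group in CONSERVATIVE_GROUPS)

print(is_conservative("K", "R"))  # True: lysine, arginine
print(is_conservative("D", "K"))  # False: no shared group
print(is_conservative("V", "A"))  # False: groups are not treated as transitive
```

Treating the groups as non-transitive is a design choice of this sketch; the text lists valine with glycine and glycine with alanine but does not list valine with alanine.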

Other analogs within the invention are those with modifications which increase peptide stability; such analogs may contain, for example, one or more non-peptide bonds (which replace the peptide bonds) in the peptide sequence. Also included are: analogs that include residues other than naturally occurring L-amino acids, e.g., D-amino acids or non-naturally occurring or synthetic amino acids, e.g., β or γ amino acids; and cyclic analogs.

As used herein, the term “fragment”, as applied to an E. faecalis analog, will ordinarily be at least about 20 residues, more typically at least about 40 residues, preferably at least about 60 residues in length. Fragments of E. faecalis polypeptides can be generated by methods known to those skilled in the art. The ability of a candidate fragment to exhibit a biological activity of E. faecalis polypeptide can be assessed by methods known to those skilled in the art as described herein. Also included are E. faecalis polypeptides containing residues that are not required for biological activity of the peptide or that result from alternative mRNA splicing or alternative protein processing events.

An “immunogenic component” as used herein is a moiety, such as an E. faecalis polypeptide, analog or fragment thereof, that is capable of eliciting a humoral and/or cellular immune response in a host animal.

An “antigenic component” as used herein is a moiety, such as an E. faecalis polypeptide, analog or fragment thereof, that is capable of binding to a specific antibody with sufficiently high affinity to form a detectable antigen-antibody complex.

The term “antibody” as used herein is intended to include fragments thereof which are specifically reactive with E. faecalis polypeptides.

As used herein, the term “cell-specific promoter” means a DNA sequence that serves as a promoter, i.e., regulates expression of a selected DNA sequence operably linked to the promoter, and which effects expression of the selected DNA sequence in specific cells of a tissue. The term also covers so-called “leaky” promoters, which regulate expression of a selected DNA primarily in one tissue, but cause expression in other tissues as well.

Misexpression, as used herein, refers to a non-wild type pattern of gene expression. It includes: expression at non-wild type levels, i.e., over or under expression; a pattern of expression that differs from wild type in terms of the time or stage at which the gene is expressed, e.g., increased or decreased expression (as compared with wild type) at a predetermined developmental period or stage; a pattern of expression that differs from wild type in terms of increased expression (as compared with wild type) in a predetermined cell type or tissue type; a pattern of expression that differs from wild type in terms of the splicing size, amino acid sequence, post-translational modification, or biological activity of the expressed polypeptide; a pattern of expression that differs from wild type in terms of the effect of an environmental stimulus or extracellular stimulus on expression of the gene, e.g., a pattern of increased or decreased expression (as compared with wild type) in the presence of an increase or decrease in the strength of the stimulus.

As used herein, “host cells” and other such terms denoting microorganisms or higher eukaryotic cell lines cultured as unicellular entities refers to cells which can become or have been used as recipients for a recombinant vector or other transfer DNA, and include the progeny of the original cell which has been transfected. It is understood by individuals skilled in the art that the progeny of a single parental cell may not necessarily be completely identical in genomic or total DNA complement to the original parent, due to accidental or deliberate mutation.

As used herein, the term “control sequence” refers to a nucleic acid having a base sequence which is recognized by the host organism to effect the expression of encoded sequences to which they are ligated. The nature of such control sequences differs depending upon the host organism; in prokaryotes, such control sequences generally include a promoter, ribosomal binding site, terminators, and in some cases operators; in eukaryotes, generally such control sequences include promoters, terminators and in some instances, enhancers. The term control sequence is intended to include at a minimum, all components whose presence is necessary for expression, and may also include additional components whose presence is advantageous, for example, leader sequences.

As used herein, the term “operably linked” refers to sequences joined or ligated to function in their intended manner. For example, a control sequence is operably linked to coding sequence by ligation in such a way that expression of the coding sequence is achieved under conditions compatible with the control sequence and host cell.

The “metabolism” of a substance, as used herein, means any aspect of the expression, function, action, or regulation of the substance. The metabolism of a substance includes modifications, e.g., covalent or non-covalent modifications of the substance. The metabolism of a substance includes modifications, e.g., covalent or non-covalent modification, the substance induces in other substances. The metabolism of a substance also includes changes in the distribution of the substance. The metabolism of a substance includes changes the substance induces in the distribution of other substances.

A “sample” as used herein refers to a biological sample, such as, for example, tissue or fluid isolated from an individual (including without limitation plasma, serum, cerebrospinal fluid, lymph, tears, saliva and tissue sections) or from in vitro cell culture constituents, as well as samples from the environment.

Technical and scientific terms used herein have the meanings commonly understood by one of ordinary skill in the art to which the present invention pertains, unless otherwise defined. Reference is made herein to various methodologies known to those of skill in the art. Publications and other materials setting forth such known methodologies to which reference is made are incorporated herein by reference in their entireties as though set forth in full. The practice of the invention will employ, unless otherwise indicated, conventional techniques of chemistry, molecular biology, microbiology, recombinant DNA, and immunology, which are within the skill of the art. Such techniques are explained fully in the literature. See, e.g., Sambrook, Fritsch, and Maniatis, Molecular Cloning: A Laboratory Manual, 2nd ed. (1989); DNA Cloning, Volumes I and II (D. N. Glover ed. 1985); Oligonucleotide Synthesis (M. J. Gait ed. 1984); Nucleic Acid Hybridization (B. D. Hames & S. J. Higgins eds. 1984); the series, Methods in Enzymology (Academic Press, Inc.), particularly Vol. 154 and Vol. 155 (Wu and Grossman, eds.); PCR: A Practical Approach (McPherson, Quirke, and Taylor, eds., 1991); Immunology, 2d Edition, 1989, Roitt et al., C. V. Mosby Company, New York; Advanced Immunology, 2d Edition, 1991, Male et al., Gower Medical Publishing, New York; DNA Cloning: A Practical Approach, Volumes I and II, 1985 (D. N. Glover ed.); Oligonucleotide Synthesis, 1984 (M. J. Gait ed.); Transcription and Translation, 1984 (Hames and Higgins eds.); Animal Cell Culture, 1986 (R. I. Freshney ed.); Immobilized Cells and Enzymes, 1986 (IRL Press); Perbal, 1984, A Practical Guide to Molecular Cloning; and Gene Transfer Vectors for Mammalian Cells, 1987 (J. H. Miller and M. P. Calos eds., Cold Spring Harbor Laboratory).

Any suitable materials and/or methods known to those of skill can be utilized in carrying out the present invention; however, preferred materials and/or methods are described. Materials, reagents and the like to which reference is made in the following description and examples are obtainable from commercial sources, unless otherwise noted.

E. faecalis Genomic Sequence

This invention provides nucleotide sequences of the genome of E. faecalis which thus comprises a DNA sequence library of E. faecalis genomic DNA. The detailed description that follows provides nucleotide sequences of E. faecalis, and also describes how the sequences were obtained and how ORFs and protein-coding sequences were identified. Also described are methods of using the disclosed E. faecalis sequences in methods including diagnostic and therapeutic applications. Furthermore, the library can be used as a database for identification and comparison of medically important sequences in this and other strains of E. faecalis.

To determine the genomic sequence of E. faecalis, DNA from strain 14336 of E. faecalis was isolated after Zymolyase digestion, sodium dodecyl sulfate lysis, potassium acetate precipitation, phenol:chloroform extraction and ethanol precipitation (Soll, D. R., T. Srikantha and S. R. Lockhart: Characterizing Developmentally Regulated Genes in E. faecalis. In Microbial Genome Methods. K. W. Adolph, editor. CRC Press. New York. p 17-37.). DNA was sheared hydrodynamically using an HPLC (Oefner et al., 1996) to an insert size of 2000-3000 bp. After size fractionation by gel electrophoresis the fragments were blunt-ended, ligated to adapter oligonucleotides and cloned into the pGTC (Thomann) vector to construct a “shotgun” subclone library.

DNA sequencing was achieved using established ABI sequencing methods on ABI377 automated DNA sequencers. The cloning and sequencing procedures are described in more detail in the Exemplification.

Individual sequence reads were assembled using PHRAP (P. Green, Abstracts of DOE Human Genome Program Contractor-Grantee Workshop V, January 1996, p. 157). The average contig length was about 3-4 kb.

All subsequent steps were based on sequencing by ABI377 automated DNA sequencing methods.

A variety of approaches are used to order the contigs so as to obtain a continuous sequence representing the entire E. faecalis genome. Synthetic oligonucleotides are designed that are complementary to sequences at the end of each contig. These oligonucleotides may be hybridized to libraries of E. faecalis genomic DNA in, for example, lambda phage vectors or plasmid vectors to identify clones that contain sequences corresponding to the junctional regions between individual contigs. Such clones are then used to isolate template DNA and the same oligonucleotides are used as primers in polymerase chain reaction (PCR) to amplify junctional fragments, the nucleotide sequence of which is then determined.

The E. faecalis sequences were analyzed for the presence of open reading frames (ORFs) comprising at least 180 nucleotides. As a result of the analysis of ORFs based on stop-to-stop codon reads, it should be understood that these ORFs may not correspond to the ORF of a naturally-occurring E. faecalis polypeptide. These ORFs may contain start codons which indicate the initiation of protein synthesis of a naturally-occurring E. faecalis polypeptide. Such start codons within the ORFs provided herein can be identified by those of ordinary skill in the relevant art, and the resulting ORF and the encoded E. faecalis polypeptide are within the scope of this invention. For example, within the ORFs a codon such as AUG or GUG (encoding methionine or valine) which is part of the initiation signal for protein synthesis can be identified and the portion of an ORF corresponding to a naturally-occurring E. faecalis polypeptide can be recognized. The predicted coding regions were defined by evaluating the coding potential of such sequences with the program GENEMARK™ (Borodovsky and McIninch, 1993, Comp. 17:123).
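A stop-to-stop ORF scan of the kind described above may be sketched as follows. This is an illustrative sketch only, not the procedure actually used: it scans the three forward reading frames of a DNA string (the opposite strand would be scanned by applying the same function to its reverse complement), treats TAA, TAG and TGA as stop codons, and also reports a stop-free stretch that runs off the end of a contig. The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG"}  # DNA forms of AUG and GUG
MIN_ORF_NT = 180               # minimum ORF length stated in the text

def stop_to_stop_orfs(seq: str, min_len: int = MIN_ORF_NT) -> list:
    """Return (start, end, frame) for stop-free stretches of at least min_len nt.

    Coordinates are 0-based, end-exclusive, and exclude the flanking stop codons.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        orf_start = frame
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                if i - orf_start >= min_len:
                    orfs.append((orf_start, i, frame))
                orf_start = i + 3
        # Stop-free stretch reaching the last complete codon of the contig.
        end = frame + ((len(seq) - frame) // 3) * 3
        if end - orf_start >= min_len:
            orfs.append((orf_start, end, frame))
    return orfs

def first_start_codon(seq: str, orf_start: int, orf_end: int):
    """Return the position of the first in-frame ATG or GTG in the ORF, or None."""
    seq = seq.upper()
    for i in range(orf_start, orf_end - 2, 3):
        if seq[i:i + 3] in START_CODONS:
            return i
    return None
```

The first in-frame start codon is only a candidate initiation site; as the text notes, assigning the naturally-occurring start requires further analysis.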

Each predicted ORF amino acid sequence was compared with all sequences found in current GENBANK, SWISS-PROT, and PIR databases using the BLAST algorithm. BLAST identifies local alignments between the ORF sequence and the sequences in the databank and estimates the probability that each alignment would occur by chance (Altschul et al., 1990, J. Mol. Biol. 215:403-410). Homologous ORFs (probabilities less than 10^-5 by chance) and ORFs that are probably non-homologous (probabilities greater than 10^-5 by chance) but have good codon usage were identified. Both homologous sequences and non-homologous sequences with good codon usage are likely to encode proteins and are encompassed by the invention.
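The classification rule described above can be summarized in a short sketch. The probability cutoff is taken from the text; how "good codon usage" is scored is not specified here, so it is represented by a hypothetical boolean flag, and the handling of a probability exactly equal to the cutoff is an assumption.

```python
HOMOLOGY_P_CUTOFF = 1e-5  # probability of the alignment arising by chance

def classify_orf(blast_probability: float, good_codon_usage: bool) -> str:
    """Classify an ORF by its best BLAST chance probability and codon usage flag."""
    if blast_probability < HOMOLOGY_P_CUTOFF:
        return "homologous"
    if good_codon_usage:
        # Assumption: a probability equal to the cutoff falls in this branch.
        return "non-homologous, likely coding"
    return "not retained"

print(classify_orf(1e-20, False))  # homologous
print(classify_orf(0.3, True))     # non-homologous, likely coding
print(classify_orf(0.3, False))    # not retained
```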

E. faecalis Nucleic Acids

The present invention provides a library of E. faecalis-derived nucleic acid sequences. The libraries provide probes, primers, and markers which are used as markers in epidemiological studies. The present invention also provides a library of E. faecalis-derived nucleic acid sequences which comprise or encode targets for therapeutic drugs.

The nucleic acids of this invention may be obtained directly from the DNA of the above referenced E. faecalis strain by using the polymerase chain reaction (PCR). See “PCR, A Practical Approach” (McPherson, Quirke, and Taylor, eds., IRL Press, Oxford, UK, 1991) for details about the PCR. High fidelity PCR can be used to ensure a faithful DNA copy prior to expression. In addition, the authenticity of amplified products can be verified by conventional sequencing methods. Clones carrying the desired sequences described in this invention may also be obtained by screening the libraries by means of the PCR or by hybridization of synthetic oligonucleotide probes to filter lifts of the library colonies or plaques as known in the art (see, e.g., Sambrook et al., Molecular Cloning, A Laboratory Manual 2nd edition, 1989, Cold Spring Harbor Press, NY).

It is also possible to obtain nucleic acids encoding E. faecalis polypeptides from a cDNA library in accordance with protocols herein described. A cDNA encoding an E. faecalis polypeptide can be obtained by isolating total mRNA from an appropriate strain. Double stranded cDNAs can then be prepared from the total mRNA. Subsequently, the cDNAs can be inserted into a suitable plasmid or viral (e.g., bacteriophage) vector using any one of a number of known techniques. Genes encoding E. faecalis polypeptides can also be cloned using established polymerase chain reaction techniques in accordance with the nucleotide sequence information provided by the invention. The nucleic acids of the invention can be DNA or RNA. Preferred nucleic acids of the invention are contained in the Sequence Listing.

The nucleic acids of the invention can also be chemically synthesized using standard techniques. Various methods of chemically synthesizing polydeoxynucleotides are known, including solid-phase synthesis which, like peptide synthesis, has been fully automated in commercially available DNA synthesizers (See e.g., Itakura et al. U.S. Pat. No. 4,598,049; Caruthers et al. U.S. Pat. No. 4,458,066; and Itakura U.S. Pat. Nos. 4,401,796 and 4,373,071, incorporated by reference herein).

In another example, DNA can be chemically synthesized using, e.g., the phosphoramidite solid support method of Matteucci et al., 1981, J. Am. Chem. Soc. 103:3185, the method of Yoo et al., 1989, J. Biol. Chem. 264:17078, or other well known methods. This can be done by sequentially linking a series of oligonucleotide cassettes comprising pairs of synthetic oligonucleotides, as described below.

Nucleic acids isolated or synthesized in accordance with features of the present invention are useful, by way of example, without limitation, as probes, primers, capture ligands, antisense genes and for developing expression systems for the synthesis of proteins and peptides corresponding to such sequences. As probes, primers, capture ligands and antisense agents, the nucleic acid normally consists of all or part (approximately twenty or more nucleotides for specificity as well as the ability to form stable hybridization products) of the nucleic acids of the invention contained in the Sequence Listing. These uses are described in further detail below.

Probes

A nucleic acid isolated or synthesized in accordance with the sequence of the invention contained in the Sequence Listing can be used as a probe to specifically detect E. faecalis. With the sequence information set forth in the present application, sequences of twenty or more nucleotides are identified which provide the desired inclusivity and exclusivity with respect to E. faecalis, and extraneous nucleic acids likely to be encountered during hybridization conditions. More preferably, the sequence will comprise at least twenty to thirty nucleotides to convey stability to the hybridization product formed between the probe and the intended target molecules.
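The notion of inclusivity and exclusivity can be illustrated with a minimal sketch that lists subsequences of a chosen length present in every target sequence and absent from every non-target sequence. It assumes exact string matching on one strand only; actual hybridization tolerates mismatches and involves both strands, so the output is at most a starting list of candidates. The function names are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_probes(targets, non_targets, k=20):
    """Return k-mers shared by all targets and absent from all non-targets."""
    if not targets:
        return []
    shared = set.intersection(*(kmers(t, k) for t in targets))      # inclusivity
    background = set().union(*(kmers(n, k) for n in non_targets))   # exclusivity
    return sorted(shared - background)
```

The default of twenty nucleotides follows the minimum length given above; a shorter `k` is convenient only for small test inputs.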

Sequences larger than 1000 nucleotides in length are difficult to synthesize but can be generated by recombinant DNA techniques. Individuals skilled in the art will readily recognize that the nucleic acids, for use as probes, can be provided with a label to facilitate detection of a hybridization product.

Nucleic acid isolated or synthesized in accordance with the sequence of the invention contained in the Sequence Listing can also be useful as probes to detect homologous regions (especially homologous genes) of other Enterococcus species using appropriate stringency hybridization conditions as described herein.

Capture Ligand

For use as a capture ligand, the nucleic acid selected in the manner described above with respect to probes can be readily associated with a support. The manner in which nucleic acid is associated with supports is well known. Nucleic acid having twenty or more nucleotides in a sequence of the invention contained in the Sequence Listing has utility to separate E. faecalis nucleic acid of one strain from the nucleic acid of another strain as well as from other organisms. Nucleic acid having twenty or more nucleotides in a sequence of the invention contained in the Sequence Listing can also have utility to separate other Enterococcus species from each other and from other organisms. Preferably, the sequence will comprise at least twenty nucleotides to convey stability to the hybridization product formed between the probe and the intended target molecules. Sequences larger than 1000 nucleotides in length are difficult to synthesize but can be generated by recombinant DNA techniques.

Primers

Nucleic acid isolated or synthesized in accordance with the sequences described herein have utility as primers for the amplification of E. faecalis nucleic acid. These nucleic acids may also have utility as primers for the amplification of nucleic acids in other Enterococcus species. With respect to polymerase chain reaction (PCR) techniques, nucleic acid sequences of >10-15 nucleotides of the invention contained in the Sequence Listing have utility in conjunction with suitable enzymes and reagents to create copies of E. faecalis nucleic acid. More preferably, the sequence will comprise twenty or more nucleotides to convey stability to the hybridization product formed between the primer and the intended target molecules. Binding conditions of primers greater than 100 nucleotides are more difficult to control to obtain specificity. High fidelity PCR can be used to ensure a faithful DNA copy prior to expression. In addition, amplified products can be checked by conventional sequencing methods.
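How a primer pair delimits the amplified copy can be sketched in silico. The example assumes exact primer matches on a single linear template and reports only the first product found; mismatched priming, melting temperature, and enzyme behavior are ignored. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def in_silico_pcr(template, fwd_primer, rev_primer):
    """Return the predicted amplicon (top strand), or None if priming fails.

    fwd_primer must match the top strand; rev_primer must match the bottom
    strand, i.e. its reverse complement appears downstream on the top strand.
    """
    template = template.upper()
    fwd_primer = fwd_primer.upper()
    start = template.find(fwd_primer)
    if start == -1:
        return None
    rev_site = reverse_complement(rev_primer)
    end = template.find(rev_site, start + len(fwd_primer))
    if end == -1:
        return None
    return template[start:end + len(rev_site)]
```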

The copies can be used in diagnostic assays to detect specific sequences, including genes from E. faecalis and/or other Enterococcus species. The copies can also be incorporated into cloning and expression vectors to generate polypeptides corresponding to the nucleic acid synthesized by PCR, as is described in greater detail herein. The nucleic acids of the present invention find use as templates for the recombinant production of E. faecalis-derived peptides or polypeptides.

Antisense

Nucleic acid or nucleic acid-hybridizing derivatives isolated or synthesized in accordance with the sequences described herein have utility as antisense agents to prevent the expression of E. faecalis genes. These sequences also have utility as antisense agents to prevent expression of genes of other Enterococcus species.

In one embodiment, nucleic acid or derivatives corresponding to E. faecalis nucleic acids are loaded into a suitable carrier such as a liposome or bacteriophage for introduction into bacterial cells. For example, a nucleic acid having twenty or more nucleotides is capable of binding to bacterial nucleic acid or bacterial messenger RNA. Preferably, the antisense nucleic acid is comprised of 20 or more nucleotides to provide necessary stability of a hybridization product of non-naturally occurring nucleic acid and bacterial nucleic acid and/or bacterial messenger RNA. Nucleic acid having a sequence greater than 1000 nucleotides in length is difficult to synthesize but can be generated by recombinant DNA techniques. Methods for loading antisense nucleic acid in liposomes are known in the art as exemplified by U.S. Pat. No. 4,241,046 issued Dec. 23, 1980 to Papahadjopoulos et al.

The present invention encompasses isolated polypeptides and nucleic acids derived from E. faecalis that are useful as reagents for diagnosis of bacterial infection, components of effective anti-bacterial vaccines, and/or as targets for anti-bacterial drugs, including anti-E. faecalis drugs.

The present invention also provides a genome-wide comparison by FASTA of the predicted amino acid sequences of several E. faecalis open reading frames (ORFs) with the predicted amino acid sequence of several E. faecium ORFs (Table 3). Together, E. faecalis and E. faecium account for >95% of all VRE infections. Genomic comparison of E. faecalis ORFs with E. faecium ORFs at the amino acid sequence level provides valuable information on shared targets, which can be exploited in designing diagnostics and therapeutics for VRE. Identifying common essential genes through sequencing and analysis of both genomes provides a much quicker route to these targets, and speeds the progress of (1) probe design for identification of VRE infection, (2) identification of vaccine compositions for protection from and treatment of these infections, and (3) development of screening assays for inhibitors of gene products common to both organisms. In all cases, the homology relationships described in Table 3 are highly significant. The percentage identity between the ORFs of the two organisms shown in Table 3 ranges from about 18% up to 100%. Approximately 800 ORFs show complete amino acid sequence identity between the two organisms. Many ORFs do not share significant amino acid sequence identity between these two species, and they are not shown in Table 3. Therefore, Table 3 represents a useful listing of some gene targets common in the two species.

Expression of E. faecalis Nucleic Acids

Table 2, which is appended herewith and which forms part of the present specification, provides a list of open reading frames (ORFs) in both strands and a putative identification of the particular function of a polypeptide which is encoded by each ORF, based on the homology match (determined by the BLAST algorithm) of the predicted polypeptide with known proteins encoded by ORFs in other organisms. An ORF is a region of nucleic acid which encodes a polypeptide. This region may represent a portion of a coding sequence or a total sequence and was determined from stop to stop codons. The first column contains a designation for the contig from which each ORF was identified (numbered arbitrarily). Each contig represents a continuous stretch of the genomic sequence of the organism. The second column lists the ORF designation. The third and fourth columns list the SEQ ID numbers for the nucleic acid and amino acid sequences corresponding to each ORF, respectively. The fifth and sixth columns list the length of the nucleic acid ORF and the length of the amino acid ORF, respectively. The nucleotide sequence corresponding to each ORF begins at the first nucleotide immediately following a stop codon and ends at the nucleotide immediately preceding the next downstream stop codon in the same reading frame. It will be recognized by one skilled in the art that the natural translation initiation sites will correspond to ATG, GTG, or TTG codons located within the ORFs. The natural initiation sites depend not only on the sequence of a start codon but also on the context of the DNA sequence adjacent to the start codon. Usually, a recognizable ribosome binding site is found within 20 nucleotides upstream from the initiation codon. 
In some cases where genes are translationally coupled and coordinately expressed together in “operons”, ribosome binding sites are not present, but the initiation codon of a downstream gene may occur very close to, or overlap, the stop codon of an upstream gene in the same operon. The correct start codons can be generally identified without undue experimentation because only a few codons need be tested. It is recognized that the translational machinery in bacteria initiates all polypeptide chains with the amino acid methionine, regardless of the sequence of the start codon. In some cases, polypeptides are post-translationally modified, resulting in an N-terminal amino acid other than methionine in vivo. The seventh and eighth columns in Table 2 provide metrics for assessing the likelihood of the homology match (determined by the BLASTP2 algorithm), as is known in the art, to the genes indicated in the ninth column. Specifically, the seventh column represents the “score” for the match (a higher score is a better match), and the eighth column represents the “P-value” for the match (the probability that such a match could have occurred by chance; the lower the value, the more likely the match is valid). If a BLASTP2 score of less than 46 was obtained, no value is reported in the table. The ninth column provides, where available, the accession number (AC) or the Swissprot accession number (SP), the organism (OR), the gene name (GN), the product name (PN), and the description (DE) or notes (NT) for each ORF. This information allows one of ordinary skill in the art to determine a potential use for each identified coding sequence and, as a result, allows the polypeptides of the present invention to be used for commercial and industrial purposes consistent with the type of putative identification of the polypeptide.
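The search for natural initiation sites within a stop-to-stop ORF can be sketched as follows. The start codons (ATG, GTG, TTG) and the 20-nucleotide upstream window come from the text; the short list of ribosome-binding-site motifs is an assumption chosen for illustration, as are the function and variable names.

```python
START_CODONS = ("ATG", "GTG", "TTG")   # start codons named in the text
RBS_MOTIFS = ("GGAGG", "AGGAG")        # assumed Shine-Dalgarno-like motifs


def rank_start_codons(genome, orf_start, orf_end, window=20):
    """List in-frame start codons in genome[orf_start:orf_end].

    Returns (offset, codon, has_rbs) tuples, where has_rbs is True when one
    of the assumed motifs lies within `window` nucleotides upstream.
    """
    genome = genome.upper()
    hits = []
    for i in range(orf_start, orf_end - 2, 3):
        codon = genome[i:i + 3]
        if codon in START_CODONS:
            upstream = genome[max(0, i - window):i]
            has_rbs = any(motif in upstream for motif in RBS_MOTIFS)
            hits.append((i, codon, has_rbs))
    return hits
```

A candidate without a recognizable upstream motif is not excluded by this sketch, since translationally coupled genes may lack a ribosome binding site as noted above.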

Using the information provided in SEQ ID NO: 1-SEQ ID NO: 3405 and in Table 2 together with routine cloning and sequencing methods, one of ordinary skill in the art will be able to clone and sequence all the nucleic acid fragments of interest including open reading frames (ORFs) encoding a large variety of proteins of E. faecalis.

Nucleic acid isolated or synthesized in accordance with the sequences described herein have utility to generate polypeptides. The nucleic acid of the invention exemplified in SEQ ID NO: 1-SEQ ID NO: 3405 and in Table 2 or fragments of said nucleic acid encoding active portions of E. faecalis polypeptides can be cloned into suitable vectors or used to isolate nucleic acid. The isolated nucleic acid is combined with suitable DNA linkers and cloned into a suitable vector.

The function of a specific gene or operon can be ascertained by expression in a bacterial strain under conditions where the activity of the gene product(s) specified by the gene or operon in question can be specifically measured. Alternatively, a gene product may be produced in large quantities in an expressing strain for use as an antigen, an industrial reagent, for structural studies, etc. This expression can be accomplished in a mutant strain which lacks the activity of the gene to be tested, or in a strain that does not produce the same gene product(s). This includes, but is not limited to, eucaryotic species such as the yeast Saccharomyces cerevisiae, Methanobacterium strains or other Archaea, and Eubacteria such as E. coli, B. subtilis, S. aureus, S. pneumoniae or Pseudomonas putida. In some cases the expression host will utilize the natural E. faecalis promoter whereas in others, it will be necessary to drive the gene with a promoter sequence derived from the expressing organism (e.g., an E. coli beta-galactosidase promoter for expression in E. coli).

To express a gene product using the natural E. faecalis promoter, a procedure such as the following can be used. A restriction fragment containing the gene of interest, together with its associated natural promoter element and regulatory sequences (identified using the DNA sequence data) is cloned into an appropriate recombinant plasmid containing an origin of replication that functions in the host organism and an appropriate selectable marker. This can be accomplished by a number of procedures known to those skilled in the art. It is most preferably done by cutting the plasmid and the fragment to be cloned with the same restriction enzyme to produce compatible ends that can be ligated to join the two pieces together. The recombinant plasmid is introduced into the host organism by, for example, electroporation and cells containing the recombinant plasmid are identified by selection for the marker on the plasmid. Expression of the desired gene product is detected using an assay specific for that gene product.

In the case of a gene that requires a different promoter, the body of the gene (coding sequence) is specifically excised and cloned into an appropriate expression plasmid. This subcloning can be done by several methods, but is most easily accomplished by PCR amplification of a specific fragment and ligation into an expression plasmid after treating the PCR product with a restriction enzyme or exonuclease to create suitable ends for cloning.

A suitable host cell for expression of a gene can be any procaryotic or eucaryotic cell. Suitable methods for transforming host cells can be found in Sambrook et al. (Molecular Cloning: A Laboratory Manual, 2nd Edition, Cold Spring Harbor Laboratory Press (1989)), and other laboratory textbooks.

For example, a host cell transfected with a nucleic acid vector directing expression of a nucleotide sequence encoding an E. faecalis polypeptide can be cultured under appropriate conditions to allow expression of the polypeptide to occur. Suitable media for cell culture are well known in the art. Polypeptides of the invention can be isolated from cell culture medium, host cells, or both using techniques known in the art for purifying proteins including ion-exchange chromatography, gel filtration chromatography, ultrafiltration, electrophoresis, and immunoaffinity purification with antibodies specific for such polypeptides. Additionally, in many situations, polypeptides can be produced by chemical cleavage of a native protein (e.g., tryptic digestion) and the cleavage products can then be purified by standard techniques.

In the case of membrane bound proteins, these can be isolated from a host cell by contacting a membrane-associated protein fraction with a detergent forming a solubilized complex, where the membrane-associated protein is no longer entirely embedded in the membrane fraction and is solubilized at least to an extent which allows it to be chromatographically isolated from the membrane fraction. Chromatographic techniques which can be used in the final purification step are known in the art and include hydrophobic interaction, lectin affinity, ion exchange, dye affinity and immunoaffinity.

One strategy to maximize recombinant E. faecalis peptide expression in E. coli is to express the protein in a host bacterium with an impaired capacity to proteolytically cleave the recombinant protein (Gottesman, S., Gene Expression Technology: Methods in Enzymology 185, Academic Press, San Diego, Calif. (1990) 119-128). Another strategy would be to alter the nucleic acid encoding an E. faecalis peptide to be inserted into an expression vector so that the individual codons for each amino acid would be those preferentially utilized in highly expressed E. coli proteins (Wada et al., (1992) Nuc. Acids Res. 20:2111-2118). Such alteration of nucleic acids of the invention can be carried out by standard DNA synthesis techniques.
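The codon-replacement strategy can be sketched as a synonymous recoding of a coding sequence. The sketch assumes the standard genetic code; the `PREFERRED` table is an illustrative placeholder only and is not taken from Wada et al., so measured codon-usage data for highly expressed E. coli genes should be substituted before any practical use. All names are hypothetical.

```python
from itertools import product

# Standard genetic code in TCAG order ('*' marks a stop codon).
_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product("TCAG", repeat=3), _AA)}

# Illustrative placeholder: one codon per amino acid (an assumption, not
# published usage data).
PREFERRED = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC", "Q": "CAG",
    "E": "GAA", "G": "GGT", "H": "CAT", "I": "ATT", "L": "CTG", "K": "AAA",
    "M": "ATG", "F": "TTT", "P": "CCG", "S": "AGC", "T": "ACC", "W": "TGG",
    "Y": "TAT", "V": "GTG", "*": "TAA",
}


def translate(dna):
    """Translate an in-frame DNA sequence into one-letter amino acids."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def recode(dna, preferred=PREFERRED):
    """Replace each codon with the preferred synonymous codon."""
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(preferred[aa] for aa in translate(dna))
```

Because every replacement is synonymous, `translate(recode(x))` equals `translate(x)` for any in-frame input, which gives a simple check that the encoded polypeptide is unchanged.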

Nucleic acids comprising any of the sequences disclosed herein or sub-sequences thereof can be prepared by standard methods using the nucleic acid sequence information provided in SEQ ID NO: 1-SEQ ID NO: 3405. For example, DNA can be chemically synthesized using, e.g., the phosphoramidite solid support method of Matteucci et al., 1981, J. Am. Chem. Soc. 103:3185, the method of Yoo et al., 1989, J. Biol. Chem. 264:17078, or other well known methods. This can be done by sequentially linking a series of oligonucleotide cassettes comprising pairs of synthetic oligonucleotides, as described below.

Of course, due to the degeneracy of the genetic code, many different nucleotide sequences can encode polypeptides having the amino acid sequences defined by SEQ ID NO: 3406-SEQ ID NO: 6810 or sub-sequences thereof. The codons can be selected for optimal expression in prokaryotic or eukaryotic systems. Such degenerate variants are also encompassed by this invention.
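The extent of this degeneracy can be made concrete by counting how many distinct nucleotide sequences encode a given peptide. The sketch assumes the standard genetic code; the function name is hypothetical.

```python
from collections import Counter
from itertools import product
from math import prod

# Standard genetic code in TCAG order ('*' marks a stop codon).
_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product("TCAG", repeat=3), _AA)}

# Number of synonymous codons available for each amino acid.
DEGENERACY = Counter(CODON_TABLE.values())


def count_encodings(peptide):
    """Return the number of distinct DNA sequences encoding the peptide."""
    return prod(DEGENERACY[aa] for aa in peptide.upper())
```

For example, methionine and tryptophan each have one codon while leucine has six, so the tripeptide MLW has six possible encodings; the count grows multiplicatively with peptide length.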

Insertion of nucleic acids (typically DNAs) encoding the polypeptides of the invention into a vector is easily accomplished when the termini of both the DNAs and the vector comprise compatible restriction sites. If this cannot be done, it may be necessary to modify the termini of the DNAs and/or vector by digesting back single-stranded DNA overhangs generated by restriction endonuclease cleavage to produce blunt ends, or to achieve the same result by filling in the single-stranded termini with an appropriate DNA polymerase.

Alternatively, any site desired may be produced, e.g., by ligating nucleotide sequences (linkers) onto the termini. Such linkers may comprise specific oligonucleotide sequences that define desired restriction sites. Restriction sites can also be generated by the use of the polymerase chain reaction (PCR). See, e.g., Saiki et al., 1988, Science 239:48. The cleaved vector and the DNA fragments may also be modified if required by homopolymeric tailing.

The nucleic acids of the invention may be isolated directly from cells. Alternatively, the polymerase chain reaction (PCR) method can be used to produce the nucleic acids of the invention, using either chemically synthesized strands or genomic material as templates. Primers used for PCR can be synthesized using the sequence information provided herein and can further be designed to introduce appropriate new restriction sites, if desirable, to facilitate incorporation into a given vector for recombinant expression.
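Primer design that introduces new restriction sites can be sketched as follows. The default sites are the EcoRI (GAATTC) and BamHI (GGATCC) recognition sequences, chosen only as an illustration; the 20-nucleotide annealing length and the function names are likewise assumptions. Extra 5′ bases that are often added to aid cleavage near a fragment end are omitted.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primers_with_sites(insert, fwd_site="GAATTC", rev_site="GGATCC",
                       anneal_len=20):
    """Return (forward, reverse) primers carrying 5' restriction sites.

    Raises ValueError if the insert is too short or already contains either
    site. Both default sites are palindromic, so checking one strand is
    sufficient for them.
    """
    insert = insert.upper()
    if len(insert) < anneal_len:
        raise ValueError("insert is shorter than the annealing length")
    for site in (fwd_site, rev_site):
        if site in insert:
            raise ValueError("insert contains an internal " + site + " site")
    forward = fwd_site + insert[:anneal_len]
    reverse = rev_site + reverse_complement(insert[-anneal_len:])
    return forward, reverse
```

Digesting the resulting PCR product and a vector with the same two enzymes yields compatible termini and fixes the orientation of the insert.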

The nucleic acids of the present invention may be flanked by natural E. faecalis regulatory sequences, or may be associated with heterologous sequences, including promoters, enhancers, response elements, signal sequences, polyadenylation sequences, introns, 5′- and 3′-noncoding regions, and the like. The nucleic acids may also be modified by many means known in the art. Non-limiting examples of such modifications include methylation, “caps”, substitution of one or more of the naturally occurring nucleotides with an analog, internucleotide modifications such as, for example, those with uncharged linkages (e.g., methyl phosphonates, phosphotriesters, phosphoramidates, carbamates, etc.) and with charged linkages (e.g., phosphorothioates, phosphorodithioates, etc.). Nucleic acids may contain one or more additional covalently linked moieties, such as, for example, proteins (e.g., nucleases, toxins, antibodies, signal peptides, poly-L-lysine, etc.), intercalators (e.g., acridine, psoralen, etc.), chelators (e.g., metals, radioactive metals, iron, oxidative metals, etc.), and alkylators. PNAs are also included. The nucleic acid may be derivatized by formation of a methyl or ethyl phosphotriester or an alkyl phosphoramidate linkage. Furthermore, the nucleic acid sequences of the present invention may also be modified with a label capable of providing a detectable signal, either directly or indirectly. Exemplary labels include radioisotopes, fluorescent molecules, biotin, and the like.

The invention also provides nucleic acid vectors comprising the disclosed E. faecalis-derived sequences or derivatives or fragments thereof. A large number of vectors, including plasmid and bacterial vectors, have been described for replication and/or expression in a variety of eukaryotic and prokaryotic hosts, and may be used for cloning or protein expression.

The encoded E. faecalis polypeptides may be expressed by using many known vectors, such as pUC plasmids, pET plasmids (Novagen, Inc., Madison, Wis.), or pRSET or pREP (Invitrogen, San Diego, Calif.), and many appropriate host cells, using methods disclosed or cited herein or otherwise known to those skilled in the relevant art. The particular choice of vector/host is not critical to the practice of the invention.

Recombinant cloning vectors will often include one or more replication systems for cloning or expression, one or more markers for selection in the host, e.g. antibiotic resistance, and one or more expression cassettes. The inserted E. faecalis coding sequences may be synthesized by standard methods, isolated from natural sources, or prepared as hybrids, etc. Ligation of the E. faecalis coding sequences to transcriptional regulatory elements and/or to other amino acid coding sequences may be achieved by known methods. Suitable host cells may be transformed/transfected/infected as appropriate by any suitable method including electroporation, CaCl2 mediated DNA uptake, bacterial infection, microinjection, microprojectile, or other established methods.

Appropriate host cells include bacteria, archaebacteria, fungi, especially yeast, and plant and animal cells, especially mammalian cells. Of particular interest are E. faecalis, E. coli, B. subtilis, Saccharomyces cerevisiae, Saccharomyces carlsbergensis, Schizosaccharomyces pombe, SF9 cells, C129 cells, 293 cells, Neurospora, and CHO cells, COS cells, HeLa cells, and immortalized mammalian myeloid and lymphoid cell lines. Preferred replication systems include M13, ColE1, SV40, baculovirus, lambda, adenovirus, and the like. A large number of transcription initiation and termination regulatory regions have been isolated and shown to be effective in the transcription and translation of heterologous proteins in the various hosts. Examples of these regions, methods of isolation, manner of manipulation, etc. are known in the art. Under appropriate expression conditions, host cells can be used as a source of recombinantly produced E. faecalis-derived peptides and polypeptides.

Advantageously, vectors may also include a transcription regulatory element (i.e., a promoter) operably linked to the E. faecalis portion. The promoter may optionally contain operator portions and/or ribosome binding sites. Non-limiting examples of bacterial promoters compatible with E. coli include: β-lactamase (penicillinase) promoter; lactose promoter; tryptophan (trp) promoter; araBAD (arabinose) operon promoter; lambda-derived PL promoter and N gene ribosome binding site; and the hybrid tac promoter derived from sequences of the trp and lacUV5 promoters. Non-limiting examples of yeast promoters include 3-phosphoglycerate kinase promoter, glyceraldehyde-3-phosphate dehydrogenase (GAPDH) promoter, galactokinase (GAL1) promoter, galactoepimerase promoter, and alcohol dehydrogenase (ADH) promoter. Suitable promoters for mammalian cells include without limitation viral promoters such as that from Simian Virus 40 (SV40), Rous sarcoma virus (RSV), adenovirus (ADV), and bovine papilloma virus (BPV). Mammalian cells may also require terminator sequences, polyA addition sequences and enhancer sequences to increase expression. Sequences which cause amplification of the gene may also be desirable. Furthermore, sequences that facilitate secretion of the recombinant product from cells, including, but not limited to, bacteria, yeast, and animal cells, such as secretory signal sequences and/or prohormone pro region sequences, may also be included. These sequences are well described in the art.

Nucleic acids encoding wild-type or variant E. faecalis-derived polypeptides may also be introduced into cells by recombination events. For example, such a sequence can be introduced into a cell, and thereby effect homologous recombination at the site of an endogenous gene or a sequence with substantial identity to the gene. Other recombination-based methods such as nonhomologous recombinations or deletion of endogenous genes by homologous recombination may also be used.

Identification and Use of E. faecalis Nucleic Acid Sequences

The disclosed E. faecalis polypeptide and nucleic acid sequences, or other sequences that are contained within ORFs, including complete protein-coding sequences, of which any of the disclosed E. faecalis-specific sequences forms a part, are useful as target components for diagnosis and/or treatment of E. faecalis-caused infection.

It will be understood that the sequence of an entire protein-coding sequence of which each disclosed nucleic acid sequence forms a part can be isolated and identified based on each disclosed sequence. This can be achieved, for example, by using an isolated nucleic acid encoding the disclosed sequence, or fragments thereof, to prime a sequencing reaction with genomic E. faecalis DNA as template; this is followed by sequencing the amplified product. The isolated nucleic acid encoding the disclosed sequence, or fragments thereof, can also be hybridized to E. faecalis genomic libraries to identify clones containing additional complete segments of the protein-coding sequence of which the shorter sequence forms a part. Then, the entire protein-coding sequence, or fragments thereof, or nucleic acids encoding all or part of the sequence, or sequence-conservative or function-conservative variants thereof, may be employed in practicing the present invention.

Preferred sequences are those that are useful in diagnostic and/or therapeutic applications. Diagnostic applications include without limitation nucleic-acid-based and antibody-based methods for detecting bacterial infection. Therapeutic applications include without limitation vaccines, passive immunotherapy, and drug treatments directed against gene products that are both unique to bacteria and essential for growth and/or replication of bacteria.

Identification of Nucleic Acids Encoding Vaccine Components and Targets for Agents Effective against E. faecalis

The disclosed E. faecalis genome sequence includes segments that direct the synthesis of ribonucleic acids and polypeptides, as well as origins of replication, promoters, other types of regulatory sequences, and intergenic nucleic acids. The invention encompasses nucleic acids encoding immunogenic components of vaccines and targets for agents effective against E. faecalis. Identification of said immunogenic components involves the determination of the function of the disclosed sequences, which can be achieved using a variety of approaches. Non-limiting examples of these approaches are described briefly below.

Homology to Known Sequences

Computer-assisted comparison of the disclosed E. faecalis sequences with previously reported sequences present in publicly available databases is useful for identifying functional E. faecalis nucleic acid and polypeptide sequences. It will be understood that protein-coding sequences, for example, may be compared as a whole, and that a high degree of sequence homology between two proteins (such as, for example, >80-90%) at the amino acid level indicates that the two proteins also possess some degree of functional homology, such as, for example, among enzymes involved in metabolism, DNA synthesis, or cell wall synthesis, and proteins involved in transport, cell division, etc. In addition, many structural features of particular protein classes have been identified and correlate with specific consensus sequences, such as, for example, binding domains for nucleotides, DNA, metal ions, and other small molecules; sites for covalent modifications such as phosphorylation, acylation, and the like; sites of protein:protein interactions, etc. These consensus sequences may be quite short and thus may represent only a fraction of the entire protein-coding sequence. Identification of such a feature in an E. faecalis sequence is therefore useful in determining the function of the encoded protein and identifying useful targets of antibacterial drugs.
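Both kinds of comparison described above can be sketched briefly: percent identity between two pre-aligned protein sequences, and a scan for a short consensus sequence. The identity definition used here (identical columns divided by alignment length) is one of several in common use and is an assumption. The example pattern is the widely described Walker A nucleotide-binding motif, G-x(4)-G-K-[S/T], which is not named in the text and is included only for illustration.

```python
import re


def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns holding identical residues.

    Inputs must be pre-aligned strings of equal length, with '-' for gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        return 0.0
    matches = sum(
        1 for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Walker A (P-loop) nucleotide-binding consensus, used as an example only.
WALKER_A = re.compile(r"G.{4}GK[ST]")


def find_motif(protein, pattern=WALKER_A):
    """Return (offset, matched_text) for each non-overlapping motif hit."""
    return [(m.start(), m.group()) for m in pattern.finditer(protein.upper())]
```

A short motif hit covers only a fraction of the protein, so it suggests a function without establishing overall homology.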

Of particular relevance to the present invention are structural features that are common to secretory, transmembrane, and surface proteins, including secretion signal peptides and hydrophobic transmembrane domains. E. faecalis proteins identified as containing putative signal sequences and/or transmembrane domains are useful as immunogenic components of vaccines.

Targets for therapeutic drugs according to the invention include, but are not limited to, polypeptides of the invention, whether unique to E. faecalis or not, that are essential for growth and/or viability of E. faecalis under at least one growth condition. Polypeptides essential for growth and/or viability can be determined by examining the effect of deleting and/or disrupting the genes, i.e., by so-called gene “knockout”. Alternatively, genetic footprinting can be used (Smith et al., 1995, Proc. Natl. Acad. Sci. USA 92:5479-6433; Published International Application WO 94/26933; U.S. Pat. No. 5,612,180). Still other methods for assessing essentiality include the ability to isolate conditional lethal mutations in the specific gene (e.g., temperature sensitive mutations). Other useful targets for therapeutic drugs, which include polypeptides that are not essential for growth or viability per se but lead to loss of viability of the cell, can be used to target therapeutic agents to cells.

Strain-specific Sequences

Because of the evolutionary relationship between different E. faecalis strains, it is believed that the presently disclosed E. faecalis sequences are useful for identifying, and/or discriminating between, previously known and new E. faecalis strains. It is believed that other E. faecalis strains will exhibit at least 70% sequence homology with the presently disclosed sequence. Systematic and routine analyses of DNA sequences derived from samples containing E. faecalis strains, and comparison with the present sequence, allow for the identification of sequences that can be used to discriminate between strains, as well as those that are common to all E. faecalis strains. In one embodiment, the invention provides nucleic acids, including probes, and peptide and polypeptide sequences that discriminate between different strains of E. faecalis. Strain-specific components can also be identified functionally by their ability to elicit or react with antibodies that selectively recognize one or more E. faecalis strains.

In another embodiment, the invention provides nucleic acids, including probes, and peptide and polypeptide sequences that are common to all E. faecalis strains but are not found in other bacterial species.

E. faecalis Polypeptides

This invention encompasses isolated E. faecalis polypeptides encoded by the disclosed E. faecalis genomic sequences, including the polypeptides of the invention contained in the Sequence Listing. Polypeptides of the invention are preferably at least 5 amino acid residues in length. Using the DNA sequence information provided herein, the amino acid sequences of the polypeptides encompassed by the invention can be deduced using methods well-known in the art. It will be understood that the sequence of an entire nucleic acid encoding an E. faecalis polypeptide can be isolated and identified based on an ORF that encodes only a fragment of the cognate protein-coding region. This can be achieved, for example, by using the isolated nucleic acid encoding the ORF, or fragments thereof, to prime a polymerase chain reaction with genomic E. faecalis DNA as template; this is followed by sequencing the amplified product.
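
By way of non-limiting illustration, the deduction of an amino acid sequence from a DNA sequence can be sketched as follows. This is a minimal sketch that assumes the standard genetic code and a sequence that is already in frame; the function name `translate_orf` and the example sequence are hypothetical and are not taken from the Sequence Listing.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_orf(dna):
    """Translate an in-frame DNA sequence, stopping before the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical example sequence (not from the Sequence Listing).
print(translate_orf("ATGAAACGTGGTTAA"))
```

The sketch does not handle alternative bacterial start codons or ambiguous bases; a full ORF analysis would also examine all six reading frames.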

The polypeptides of the present invention, including function-conservative variants of the disclosed ORFs, may be isolated from wild-type or mutant E. faecalis cells, or from heterologous organisms or cells (including, but not limited to, bacteria, fungi, insect, plant, and mammalian cells) including E. faecalis into which an E. faecalis-derived protein-coding sequence has been introduced and expressed. Furthermore, the polypeptides may be part of recombinant fusion proteins.

E. faecalis polypeptides of the invention can be chemically synthesized using commercially available automated procedures such as those referenced herein, including, without limitation, exclusive solid phase synthesis, partial solid phase methods, fragment condensation or classical solution synthesis. The polypeptides are preferably prepared by solid phase peptide synthesis as described by Merrifield, 1963, J. Am. Chem. Soc. 85:2149. The synthesis is carried out with amino acids that are protected at the alpha-amino terminus. Trifunctional amino acids with labile side-chains are also protected with suitable groups to prevent undesired chemical reactions from occurring during the assembly of the polypeptides. The alpha-amino protecting group is selectively removed to allow subsequent reaction to take place at the amino-terminus. The conditions for the removal of the alpha-amino protecting group do not remove the side-chain protecting groups.

Methods for polypeptide purification are well-known in the art, including, without limitation, preparative disc-gel electrophoresis, isoelectric focusing, HPLC, reversed-phase HPLC, gel filtration, ion exchange and partition chromatography, and countercurrent distribution. For some purposes, it is preferable to produce the polypeptide in a recombinant system in which the E. faecalis protein contains an additional sequence tag that facilitates purification, such as, but not limited to, a polyhistidine sequence. The polypeptide can then be purified from a crude lysate of the host cell by chromatography on an appropriate solid-phase matrix. Alternatively, antibodies produced against an E. faecalis protein or against peptides derived therefrom can be used as purification reagents. Other purification methods are possible.

The present invention also encompasses derivatives and homologues of E. faecalis-encoded polypeptides. For some purposes, nucleic acid sequences encoding the peptides may be altered by substitutions, additions, or deletions that provide for functionally equivalent molecules, i.e., function-conservative variants. For example, one or more amino acid residues within the sequence can be substituted by another amino acid of similar properties, such as, for example, positively charged amino acids (arginine, lysine, and histidine); negatively charged amino acids (aspartate and glutamate); polar neutral amino acids; and non-polar amino acids.
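
As a non-limiting sketch, a proposed substitution can be checked against the four property classes named above. The text lists the members of the charged classes only; the membership of the polar neutral and non-polar classes used below follows one common convention and is an assumption, as are the names `residue_class` and `is_conservative`.

```python
# One-letter amino acid codes grouped by side-chain property.
RESIDUE_CLASSES = {
    "positive": set("RKH"),           # arginine, lysine, histidine (from the text)
    "negative": set("DE"),            # aspartate, glutamate (from the text)
    "polar_neutral": set("GSTCYNQ"),  # assumed membership
    "nonpolar": set("ALIVPFWM"),      # assumed membership
}


def residue_class(amino_acid):
    """Return the name of the property class that contains the residue."""
    for name, members in RESIDUE_CLASSES.items():
        if amino_acid.upper() in members:
            return name
    raise ValueError(f"unknown amino acid: {amino_acid!r}")


def is_conservative(original, replacement):
    """True if both residues fall in the same property class."""
    return residue_class(original) == residue_class(replacement)


print(is_conservative("K", "R"))  # both positively charged
print(is_conservative("D", "K"))  # opposite charges
```

Class membership alone does not establish that a variant is function-conservative; that remains a matter for testing.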

The isolated polypeptides may be modified by, for example, phosphorylation, sulfation, acylation, or other protein modifications. They may also be modified with a label capable of providing a detectable signal, either directly or indirectly, including, but not limited to, radioisotopes and fluorescent compounds.

To identify E. faecalis-derived polypeptides for use in the present invention, essentially the complete genomic sequence of a virulent, methicillin-resistant isolate of Enterococcus faecalis was analyzed. While, in very rare instances, a nucleic acid sequencing error may be revealed, resolving a rare sequencing error is well within the art, and such an occurrence will not prevent one skilled in the art from practicing the invention.

Also encompassed are any E. faecalis polypeptide sequences that are contained within the open reading frames (ORFs), including complete protein-coding sequences, of which any of SEQ ID NO: 1-SEQ ID NO: 3405 forms a part. Table 2, which is appended herewith and which forms part of the present specification, provides a putative identification of the particular function of a polypeptide which is encoded by each ORF, based on the homology match (determined by the BLAST algorithm) of the predicted polypeptide with known proteins encoded by ORFs in other organisms. As a result, one skilled in the art can use the polypeptides of the present invention for commercial and industrial purposes consistent with the type of putative identification of the polypeptide.

The present invention provides a library of E. faecalis-derived polypeptide sequences, and a corresponding library of nucleic acid sequences encoding the polypeptides, wherein the polypeptides themselves, or polypeptides contained within ORFs of which they form a part, comprise sequences that are contemplated for use as components of vaccines. Non-limiting examples of such sequences are listed by SEQ ID NO in Table 2, which is appended herewith and which forms part of the present specification.

The present invention also provides a library of E. faecalis-derived polypeptide sequences, and a corresponding library of nucleic acid sequences encoding the polypeptides, wherein the polypeptides themselves, or polypeptides contained within ORFs of which they form a part, comprise sequences lacking homology to any known prokaryotic or eukaryotic sequences. Such libraries provide probes, primers, and markers which can be used to diagnose E. faecalis infection, including use as markers in epidemiological studies. Non-limiting examples of such sequences are listed by SEQ ID NO in Table 2, which is appended.

The present invention also provides a library of E. faecalis-derived polypeptide sequences, and a corresponding library of nucleic acid sequences encoding the polypeptides, wherein the polypeptides themselves, or polypeptides contained within ORFs of which they form a part, comprise targets for therapeutic drugs.

Specific Example: Determination of Enterococcus Protein Antigens for Antibody and Vaccine Development

The selection of Enterococcus protein antigens for vaccine development can be derived from the nucleic acids encoding E. faecalis polypeptides. First, the ORFs can be analyzed for homology to other known exported or membrane proteins and analyzed using the discriminant analysis described by Klein, et al. (Klein, P., Kanehisa, M., and DeLisi, C. (1985) Biochimica et Biophysica Acta 815, 468-476) for predicting exported and membrane proteins.

Homology searches can be performed using the BLAST algorithm contained in the Wisconsin Sequence Analysis Package (Genetics Computer Group, University Research Park, 575 Science Drive, Madison, Wis. 53711) to compare each predicted ORF amino acid sequence with all sequences found in the current GenBank, SWISS-PROT and PIR databases. BLAST searches for local alignments between the ORF and the databank sequences and reports a probability score which indicates the probability of finding this sequence by chance in the database. ORFs with significant homology (e.g. probabilities lower than 1×10⁻⁶ that the homology is only due to random chance) to membrane or exported proteins represent protein antigens for vaccine development. Possible functions can be provided to E. faecalis genes based on sequence homology to genes cloned in other organisms.
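
The selection step described above can be sketched, without limitation, as a filter over search results. The record layout (`orf`, `description`, `probability`), the keyword matching, and the example hits are all hypothetical; they do not reflect the output format of any particular BLAST implementation.

```python
def select_candidate_antigens(hits, max_probability=1e-6,
                              keywords=("membrane", "exported")):
    """Return ORF identifiers with a significant hit to a membrane or exported protein."""
    selected = []
    for hit in hits:
        description = hit["description"].lower()
        significant = hit["probability"] < max_probability
        relevant = any(word in description for word in keywords)
        if significant and relevant and hit["orf"] not in selected:
            selected.append(hit["orf"])
    return selected


# Hypothetical search results; identifiers, descriptions, and scores are invented.
example_hits = [
    {"orf": "ORF_A", "description": "membrane lipoprotein", "probability": 3e-40},
    {"orf": "ORF_B", "description": "membrane transporter", "probability": 0.02},
    {"orf": "ORF_C", "description": "DNA polymerase III subunit", "probability": 1e-80},
]
print(select_candidate_antigens(example_hits))
```

Here ORF_B fails the significance cutoff and ORF_C matches a protein that is neither a membrane nor an exported protein, so only ORF_A is retained.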

Discriminant analysis (Klein, et al. supra) can be used to examine the ORF amino acid sequences. This algorithm uses the intrinsic information contained in the ORF amino acid sequence and compares it to information derived from the properties of known membrane and exported proteins. This comparison predicts which proteins will be exported, membrane associated or cytoplasmic. ORF amino acid sequences identified as exported or membrane associated by this algorithm are likely protein antigens for vaccine development.
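
The following is a simplified stand-in and not the discriminant function of Klein et al.: it swaps in a sliding-window average of the Kyte-Doolittle hydropathy scale to flag sequences containing a strongly hydrophobic segment, which is one kind of intrinsic sequence information such an analysis can draw on. The window of 19 residues and the threshold of 1.6 are commonly used heuristic values and are assumptions here, as are the function names.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def max_window_hydropathy(protein, window=19):
    """Highest mean hydropathy over any window; None if the sequence is too short."""
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    if len(values) < window:
        return None
    return max(
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    )


def has_hydrophobic_segment(protein, window=19, threshold=1.6):
    """Heuristic flag for a putative membrane-spanning segment."""
    score = max_window_hydropathy(protein, window)
    return score is not None and score > threshold


# Hypothetical sequences used only to exercise the functions.
print(has_hydrophobic_segment("MK" + "L" * 19 + "DE"))
print(has_hydrophobic_segment("DEKR" * 10))
```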

Production of Fragments and Analogs of E. faecalis Nucleic Acids and Polypeptides

Based on the discovery of the E. faecalis gene products of the invention provided in the Sequence Listing, one skilled in the art can alter the disclosed structure (of E. faecalis genes), e.g., by producing fragments or analogs, and test the newly produced structures for activity. Examples of techniques known to those skilled in the relevant art which allow the production and testing of fragments and analogs are discussed below. These, or analogous methods can be used to make and screen libraries of polypeptides, e.g., libraries of random peptides or libraries of fragments or analogs of cellular proteins for the ability to bind E. faecalis polypeptides. Such screens are useful for the identification of inhibitors of E. faecalis.

Generation of Fragments

Fragments of a protein can be produced in several ways, e.g., recombinantly, by proteolytic digestion, or by chemical synthesis. Internal or terminal fragments of a polypeptide can be generated by removing one or more nucleotides from one end (for a terminal fragment) or both ends (for an internal fragment) of a nucleic acid which encodes the polypeptide. Expression of the mutagenized DNA produces polypeptide fragments. Digestion with “end-nibbling” endonucleases can thus generate DNAs which encode an array of fragments. DNAs which encode fragments of a protein can also be generated by random shearing, restriction digestion or a combination of the above-discussed methods.

Fragments can also be chemically synthesized using techniques known in the art such as conventional Merrifield solid phase f-Moc or t-Boc chemistry. For example, peptides of the present invention may be arbitrarily divided into fragments of desired length with no overlap of the fragments, or divided into overlapping fragments of a desired length.
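
The division of a peptide into non-overlapping or overlapping fragments can be sketched as follows. The function name `split_fragments`, its parameters, and the example sequence are hypothetical; the final fragment may be shorter than the requested length.

```python
def split_fragments(sequence, length, overlap=0):
    """Split a sequence into fragments of `length`, each sharing `overlap` residues with the next."""
    if length <= 0 or not 0 <= overlap < length:
        raise ValueError("require length > 0 and 0 <= overlap < length")
    if not sequence:
        return []
    step = length - overlap
    # Stop once a further window would add no residue beyond the previous fragment.
    last_start = max(len(sequence) - overlap, 1)
    return [sequence[i:i + length] for i in range(0, last_start, step)]


# Hypothetical 10-residue sequence.
print(split_fragments("ABCDEFGHIJ", 4))             # no overlap
print(split_fragments("ABCDEFGHIJ", 4, overlap=2))  # overlapping by two residues
```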

Alteration of Nucleic Acids and Polypeptides: Random Methods

Amino acid sequence variants of a protein can be prepared by random mutagenesis of DNA which encodes a protein or a particular domain or region of a protein. Useful methods include PCR mutagenesis and saturation mutagenesis. A library of random amino acid sequence variants can also be generated by the synthesis of a set of degenerate oligonucleotide sequences. (Methods for screening proteins in a library of variants are described elsewhere herein).

PCR Mutagenesis

In PCR mutagenesis, reduced Taq polymerase fidelity is used to introduce random mutations into a cloned fragment of DNA (Leung et al., 1989, Technique 1:11-15). The DNA region to be mutagenized is amplified using the polymerase chain reaction (PCR) under conditions that reduce the fidelity of DNA synthesis by Taq DNA polymerase, e.g., by using a dGTP/dATP ratio of five and adding Mn²⁺ to the PCR reaction. The pool of amplified DNA fragments is inserted into appropriate cloning vectors to provide random mutant libraries.

Saturation Mutagenesis

Saturation mutagenesis allows for the rapid introduction of a large number of single base substitutions into cloned DNA fragments (Myers et al., 1985, Science 229:242). This technique includes generation of mutations, e.g., by chemical treatment or irradiation of single-stranded DNA in vitro, and synthesis of a complementary DNA strand. The mutation frequency can be modulated by modulating the severity of the treatment, and essentially all possible base substitutions can be obtained. Because this procedure does not involve a genetic selection for mutant fragments, both neutral substitutions and those that alter function are obtained. The distribution of point mutations is not biased toward conserved sequence elements.

Degenerate Oligonucleotides

A library of homologs can also be generated from a set of degenerate oligonucleotide sequences. Chemical synthesis of degenerate sequences can be carried out in an automatic DNA synthesizer, and the synthetic genes then ligated into an appropriate expression vector. The synthesis of degenerate oligonucleotides is known in the art (see, for example, Narang, S A (1983) Tetrahedron 39:3; Itakura et al. (1981) Recombinant DNA, Proc 3rd Cleveland Sympos. Macromolecules, ed. A G Walton, Amsterdam: Elsevier pp. 273-289; Itakura et al. (1984) Annu. Rev. Biochem. 53:323; Itakura et al. (1984) Science 198:1056; Ike et al. (1983) Nucleic Acid Res. 11:477). Such techniques have been employed in the directed evolution of other proteins (see, for example, Scott et al. (1990) Science 249:386-390; Roberts et al. (1992) PNAS 89:2429-2433; Devlin et al. (1990) Science 249: 404-406; Cwirla et al. (1990) PNAS 87: 6378-6382; as well as U.S. Pat. Nos. 5,223,409, 5,198,346, and 5,096,815).
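
To illustrate what a degenerate oligonucleotide encodes, the sketch below expands a sequence written in the IUPAC nucleotide ambiguity codes into every specific sequence it represents. The function names and example oligonucleotides are hypothetical.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(oligo):
    """Number of distinct sequences represented by a degenerate oligonucleotide."""
    count = 1
    for base in oligo.upper():
        count *= len(IUPAC[base])
    return count


def expand_degenerate(oligo):
    """List every specific sequence represented by a degenerate oligonucleotide."""
    choices = (IUPAC[base] for base in oligo.upper())
    return ["".join(combo) for combo in product(*choices)]


print(expand_degenerate("GAY"))  # the two aspartate codons
print(degeneracy("NNK"))         # a commonly used randomized codon
```

Full expansion is practical only for short or lightly degenerate sequences, since the count grows multiplicatively with each ambiguous position; `degeneracy` gives the size without enumerating.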

Alteration of Nucleic Acids and Polypeptides: Methods for Directed Mutagenesis

Non-random, or directed, mutagenesis techniques can be used to provide specific sequences or mutations in specific regions. These techniques can be used to create variants which include, e.g., deletions, insertions, or substitutions, of residues of the known amino acid sequence of a protein. The sites for mutation can be modified individually or in series, e.g., by (1) substituting first with conserved amino acids and then with more radical choices depending upon results achieved, (2) deleting the target residue, or (3) inserting residues of the same or a different class adjacent to the located site, or combinations of options 1-3.

Alanine Scanning Mutagenesis

Alanine scanning mutagenesis is a useful method for identification of certain residues or regions of the desired protein that are preferred locations or domains for mutagenesis (Cunningham and Wells, Science 244:1081-1085, 1989). In alanine scanning, a residue or group of target residues is identified (e.g., charged residues such as Arg, Asp, His, Lys, and Glu) and replaced by a neutral or negatively charged amino acid (most preferably alanine or polyalanine). Replacement of an amino acid can affect the interaction of the amino acids with the surrounding aqueous environment in or outside the cell. Those domains demonstrating functional sensitivity to the substitutions are then refined by introducing further or other variants at or for the sites of substitution. Thus, while the site for introducing an amino acid sequence variation is predetermined, the nature of the mutation per se need not be predetermined. For example, to optimize the performance of a mutation at a given site, alanine scanning or random mutagenesis may be conducted at the target codon or region and the expressed desired protein subunit variants are screened for the optimal combination of desired activity.
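
The enumeration of single-site variants for an alanine scan can be sketched as follows. The default target set is the charged residues named above; the function name, the variant naming scheme (original residue, 1-based position, `A`), and the example sequence are illustrative assumptions.

```python
def alanine_scan(protein, targets="RDHKE"):
    """Return (name, sequence) pairs, one per target residue replaced by alanine."""
    variants = []
    for index, residue in enumerate(protein):
        if residue in targets:
            name = f"{residue}{index + 1}A"  # e.g. K2A: Lys at position 2 to Ala
            variant = protein[:index] + "A" + protein[index + 1:]
            variants.append((name, variant))
    return variants


# Hypothetical four-residue peptide.
for name, sequence in alanine_scan("MKDL"):
    print(name, sequence)
```

Each variant would then be expressed and assayed; the sketch only lists the sequences to be made.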

Oligonucleotide-Mediated Mutagenesis

Oligonucleotide-mediated mutagenesis is a useful method for preparing substitution, deletion, and insertion variants of DNA, see, e.g., Adelman et al., (DNA 2:183, 1983). Briefly, the desired DNA is altered by hybridizing an oligonucleotide encoding a mutation to a DNA template, where the template is the single-stranded form of a plasmid or bacteriophage containing the unaltered or native DNA sequence of the desired protein. After hybridization, a DNA polymerase is used to synthesize an entire second complementary strand of the template that will thus incorporate the oligonucleotide primer, and will code for the selected alteration in the desired protein DNA. Generally, oligonucleotides of at least 25 nucleotides in length are used. An optimal oligonucleotide will have 12 to 15 nucleotides that are completely complementary to the template on either side of the nucleotide(s) coding for the mutation. This ensures that the oligonucleotide will hybridize properly to the single-stranded DNA template molecule. The oligonucleotides are readily synthesized using techniques known in the art such as that described by Crea et al. (Proc. Natl. Acad. Sci. USA, 75: 5765 [1978]).
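
The flank rule described above (12 to 15 fully complementary nucleotides on either side of the mutation) can be sketched as follows. The sketch assumes the input is the single-stranded template written 5′ to 3′, with the replacement given in the same sense; the oligonucleotide returned is the reverse complement of the mutated region, so that it anneals to the template. All names and the example template are hypothetical, and no melting-temperature or secondary-structure checks are made.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def design_mutagenic_oligo(template, start, end, replacement, flank=15):
    """Oligo that anneals to `template` and replaces template[start:end] with `replacement`."""
    if not 12 <= flank <= 15:
        raise ValueError("flank should be 12 to 15 nucleotides")
    if start - flank < 0 or end + flank > len(template):
        raise ValueError("mutation site is too close to the end of the template")
    mutated_region = (
        template[start - flank:start] + replacement + template[end:end + flank]
    )
    return reverse_complement(mutated_region)


# Hypothetical template: a single C flanked by 15 A's and 15 G's, changed to T.
template = "A" * 15 + "C" + "G" * 15
print(design_mutagenic_oligo(template, 15, 16, "T"))
```

With a single-base substitution and 12-nucleotide flanks the oligonucleotide is 25 nucleotides long, which matches the minimum length given above.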

Cassette Mutagenesis

Another method for preparing variants, cassette mutagenesis, is based on the technique described by Wells et al. (Gene, 34:315[1985]). The starting material is a plasmid (or other vector) which includes the protein subunit DNA to be mutated. The codon(s) in the protein subunit DNA to be mutated are identified. There must be a unique restriction endonuclease site on each side of the identified mutation site(s). If no such restriction sites exist, they may be generated using the above-described oligonucleotide-mediated mutagenesis method to introduce them at appropriate locations in the desired protein subunit DNA. After the restriction sites have been introduced into the plasmid, the plasmid is cut at these sites to linearize it. A double-stranded oligonucleotide encoding the sequence of the DNA between the restriction sites but containing the desired mutation(s) is synthesized using standard procedures. The two strands are synthesized separately and then hybridized together using standard techniques. This double-stranded oligonucleotide is referred to as the cassette. This cassette is designed to have 3′ and 5′ ends that are compatible with the ends of the linearized plasmid, such that it can be directly ligated to the plasmid. This plasmid now contains the mutated desired protein subunit DNA sequence.

Combinatorial Mutagenesis

Combinatorial mutagenesis can also be used to generate mutants (Ladner et al., WO 88/06630). In this method, the amino acid sequences for a group of homologs or other related proteins are aligned, preferably to promote the highest homology possible. All of the amino acids which appear at a given position of the aligned sequences can be selected to create a degenerate set of combinatorial sequences. The variegated library of variants is generated by combinatorial mutagenesis at the nucleic acid level, and is encoded by a variegated gene library. For example, a mixture of synthetic oligonucleotides can be enzymatically ligated into gene sequences such that the degenerate set of potential sequences are expressible as individual peptides, or alternatively, as a set of larger fusion proteins containing the set of degenerate sequences.
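
The step of collecting the amino acids that appear at each aligned position, and the size of the resulting combinatorial set, can be sketched as follows. The sketch assumes gap-free aligned sequences of equal length; the function names and the three-residue example alignment are hypothetical.

```python
from math import prod


def position_sets(aligned_sequences):
    """Sorted list of the residues observed at each position of the alignment."""
    if len({len(seq) for seq in aligned_sequences}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return [sorted(set(column)) for column in zip(*aligned_sequences)]


def library_size(sets):
    """Number of distinct sequences in the combinatorial set."""
    return prod(len(residues) for residues in sets)


# Hypothetical alignment of three homologs.
sets = position_sets(["MKT", "MRT", "MKS"])
print(sets)
print(library_size(sets))
```

In this example the set contains four sequences, one of which (MRS) does not occur among the aligned homologs; generating such new combinations is the purpose of the method.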

Other Modifications of E. faecalis Nucleic Acids and Polypeptides

It is possible to modify the structure of an E. faecalis polypeptide for such purposes as increasing solubility or enhancing stability (e.g., shelf life ex vivo and resistance to proteolytic degradation in vivo). A modified E. faecalis protein or peptide can be produced in which the amino acid sequence has been altered, such as by amino acid substitution, deletion, or addition as described herein.

An E. faecalis peptide can also be modified by substitution of cysteine residues preferably with alanine, serine, threonine, leucine or glutamic acid residues to minimize dimerization via disulfide linkages. In addition, amino acid side chains of fragments of the protein of the invention can be chemically modified. Another modification is cyclization of the peptide.

In order to enhance stability and/or reactivity, an E. faecalis polypeptide can be modified to incorporate one or more polymorphisms in the amino acid sequence of the protein resulting from any natural allelic variation. Additionally, D-amino acids, non-natural amino acids, or non-amino acid analogs can be substituted or added to produce a modified protein within the scope of this invention. Furthermore, an E. faecalis polypeptide can be modified using polyethylene glycol (PEG) according to the method of A. Sehon and co-workers (Wie et al., supra) to produce a protein conjugated with PEG. In addition, PEG can be added during chemical synthesis of the protein. Other modifications of E. faecalis proteins include reduction/alkylation (Tarr, Methods of Protein Microcharacterization, J. E. Silver ed., Humana Press, Clifton N.J. 155-194 (1986)); acylation (Tarr, supra); chemical coupling to an appropriate carrier (Mishell and Shiigi, eds, Selected Methods in Cellular Immunology, WH Freeman, San Francisco, Calif. (1980); U.S. Pat. No. 4,939,239); or mild formalin treatment (Marsh, (1971) Int. Arch. of Allergy and Appl. Immunol., 41: 199-215).

To facilitate purification and potentially increase solubility of an E. faecalis protein or peptide, it is possible to add an amino acid fusion moiety to the peptide backbone. For example, hexa-histidine can be added to the protein for purification by immobilized metal ion affinity chromatography (Hochuli, E. et al., (1988) Bio/Technology, 6: 1321-1325). In addition, to facilitate isolation of peptides free of irrelevant sequences, specific endoprotease cleavage sites can be introduced between the sequences of the fusion moiety and the peptide.

To potentially aid proper antigen processing of epitopes within an E. faecalis polypeptide, canonical protease sensitive sites can be engineered between regions, each comprising at least one epitope via recombinant or synthetic methods. For example, charged amino acid pairs, such as KK or RR, can be introduced between regions within a protein or fragment during recombinant construction thereof. The resulting peptide can be rendered sensitive to cleavage by cathepsin and/or other trypsin-like enzymes which would generate portions of the protein containing one or more epitopes. In addition, such charged amino acid residues can result in an increase in the solubility of the peptide.

Primary Methods for Screening Polypeptides and Analogs

Various techniques are known in the art for screening generated mutant gene products. Techniques for screening large gene libraries often include cloning the gene library into replicable expression vectors, transforming appropriate cells with the resulting library of vectors, and expressing the genes under conditions in which detection of a desired activity, e.g., in this case, binding to E. faecalis polypeptide or an interacting protein, facilitates relatively easy isolation of the vector encoding the gene whose product was detected. Each of the techniques described below is amenable to high-throughput analysis for screening large numbers of sequences created, e.g., by random mutagenesis techniques.

Two Hybrid Systems

Two hybrid assays, such as the system described below (as with the other screening methods described herein), can be used to identify polypeptides, e.g., fragments or analogs of a naturally-occurring E. faecalis polypeptide, e.g., of cellular proteins, or of randomly generated polypeptides which bind to an E. faecalis protein. (The E. faecalis domain is used as the bait protein and the library of variants is expressed as prey fusion proteins.) In an analogous fashion, a two hybrid assay (as with the other screening methods described herein) can be used to find polypeptides which bind an E. faecalis polypeptide.

Display Libraries

In one approach to screening assays, the Enterococcus peptides are displayed on the surface of a cell or viral particle, and the ability of particular cells or viral particles to bind an appropriate receptor protein via the displayed product is detected in a “panning assay”. For example, the gene library can be cloned into the gene for a surface membrane protein of a bacterial cell, and the resulting fusion protein detected by panning (Ladner et al., WO 88/06630; Fuchs et al. (1991) Bio/Technology 9:1370-1371; and Goward et al. (1992) TIBS 18:136-140). In a similar fashion, a detectably labeled ligand can be used to score for potentially functional peptide homologs. Fluorescently labeled ligands, e.g., receptors, can be used to detect homologs which retain ligand-binding activity. The use of fluorescently labeled ligands allows cells to be visually inspected and separated under a fluorescence microscope, or, where the morphology of the cell permits, to be separated by a fluorescence-activated cell sorter.

A gene library can be expressed as a fusion protein on the surface of a viral particle. For instance, in the filamentous phage system, foreign peptide sequences can be expressed on the surface of infectious phage, thereby conferring two significant benefits. First, since these phage can be applied to affinity matrices at concentrations well over 10¹³ phage per milliliter, a large number of phage can be screened at one time. Second, since each infectious phage displays a gene product on its surface, if a particular phage is recovered from an affinity matrix in low yield, the phage can be amplified by another round of infection. The group of almost identical E. coli filamentous phages M13, fd, and f1 are most often used in phage display libraries. Either of the phage gIII or gVIII coat proteins can be used to generate fusion proteins without disrupting the ultimate packaging of the viral particle. Foreign epitopes can be expressed at the NH2-terminal end of pIII and phage bearing such epitopes recovered from a large excess of phage lacking this epitope (Ladner et al. PCT publication WO 90/02909; Garrard et al., PCT publication WO 92/09690; Marks et al. (1992) J. Biol. Chem. 267:16007-16010; Griffiths et al. (1993) EMBO J 12:725-734; Clackson et al. (1991) Nature 352:624-628; and Barbas et al. (1992) PNAS 89:4457-4461).

A common approach uses the maltose receptor of E. coli (the outer membrane protein, LamB) as a peptide fusion partner (Charbit et al. (1986) EMBO 5, 3029-3037). Oligonucleotides have been inserted into plasmids encoding the LamB gene to produce peptides fused into one of the extracellular loops of the protein. These peptides are available for binding to ligands, e.g., to antibodies, and can elicit an immune response when the cells are administered to animals. Other cell surface proteins, e.g., OmpA (Schorr et al. (1991) Vaccines 91, pp. 387-392), PhoE (Agterberg, et al. (1990) Gene 88, 37-45), and PAL (Fuchs et al. (1991) Bio/Tech 9, 1369-1372), as well as large bacterial surface structures have served as vehicles for peptide display. Peptides can be fused to pilin, a protein which polymerizes to form the pilus, a conduit for interbacterial exchange of genetic information (Thiry et al. (1989) Appl. Environ. Microbiol. 55, 984-993). Because of its role in interacting with other cells, the pilus provides a useful support for the presentation of peptides to the extracellular environment. Another large surface structure used for peptide display is the bacterial motive organ, the flagellum. Fusion of peptides to the subunit protein flagellin offers a dense array of many peptide copies on the host cells (Kuwajima et al. (1988) Bio/Tech. 6, 1080-1083). Surface proteins of other bacterial species have also served as peptide fusion partners. Examples include the Staphylococcus protein A and the outer membrane IgA protease of Neisseria (Hansson et al. (1992) J. Bacteriol. 174, 4239-4245 and Klauser et al. (1990) EMBO J. 9, 1991-1999).

In the filamentous phage systems and the LamB system described above, the physical link between the peptide and its encoding DNA occurs by the containment of the DNA within a particle (cell or phage) that carries the peptide on its surface. Capturing the peptide captures the particle and the DNA within. An alternative scheme uses the DNA-binding protein LacI to form a link between peptide and DNA (Cull et al. (1992) PNAS USA 89:1865-1869). This system uses a plasmid containing the LacI gene with an oligonucleotide cloning site at its 3′-end. Under the controlled induction by arabinose, a LacI-peptide fusion protein is produced. This fusion retains the natural ability of LacI to bind to a short DNA sequence known as LacO operator (LacO). By installing two copies of LacO on the expression plasmid, the LacI-peptide fusion binds tightly to the plasmid that encoded it. Because the plasmids in each cell contain only a single oligonucleotide sequence and each cell expresses only a single peptide sequence, the peptides become specifically and stably associated with the DNA sequence that directed their synthesis. The cells of the library are gently lysed and the peptide-DNA complexes are exposed to a matrix of immobilized receptor to recover the complexes containing active peptides. The associated plasmid DNA is then reintroduced into cells for amplification and DNA sequencing to determine the identity of the peptide ligands. As a demonstration of the practical utility of the method, a large random library of dodecapeptides was made and selected on a monoclonal antibody raised against the opioid peptide dynorphin B. A cohort of peptides was recovered, all related by a consensus sequence corresponding to a six-residue portion of dynorphin B. (Cull et al. (1992) Proc. Natl. Acad. Sci. U.S.A. 89:1865-1869)

This scheme, sometimes referred to as peptides-on-plasmids, differs in two important ways from the phage display methods. First, the peptides are attached to the C-terminus of the fusion protein, resulting in the display of the library members as peptides having free carboxy termini. Both of the filamentous phage coat proteins, pIII and pVIII, are anchored to the phage through their C-termini, and the guest peptides are placed into the outward-extending N-terminal domains. In some designs, the phage-displayed peptides are presented right at the amino terminus of the fusion protein. (Cwirla, et al. (1990) Proc. Natl. Acad. Sci. U.S.A. 87, 6378-6382) A second difference is the set of biological biases affecting the population of peptides actually present in the libraries. The LacI fusion molecules are confined to the cytoplasm of the host cells. The phage coat fusions are exposed briefly to the cytoplasm during translation but are rapidly secreted through the inner membrane into the periplasmic compartment, remaining anchored in the membrane by their C-terminal hydrophobic domains, with the N-termini, containing the peptides, protruding into the periplasm while awaiting assembly into phage particles. The peptides in the LacI and phage libraries may differ significantly as a result of their exposure to different proteolytic activities. The phage coat proteins require transport across the inner membrane and signal peptidase processing as a prelude to incorporation into phage. Certain peptides exert a deleterious effect on these processes and are underrepresented in the libraries (Gallop et al. (1994) J. Med. Chem. 37(9):1233-1251). These particular biases are not a factor in the LacI display system.

The number of small peptides available in recombinant random libraries is enormous. Libraries of 10^7-10^9 independent clones are routinely prepared. Libraries as large as 10^11 recombinants have been created, but this size approaches the practical limit for clone libraries. This limitation in library size occurs at the step of transforming the DNA containing randomized segments into the host bacterial cells. To circumvent this limitation, an in vitro system based on the display of nascent peptides in polysome complexes has recently been developed. This display library method has the potential of producing libraries 3-6 orders of magnitude larger than the currently available phage/phagemid or plasmid libraries. Furthermore, the construction of the libraries, expression of the peptides, and screening are done in an entirely cell-free format.
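The scale of the sequence space can be illustrated with a short calculation. The following is a minimal sketch, assuming peptides built from the 20 standard amino acids and assuming, optimistically, that every clone in a library encodes a different peptide; the function names are illustrative only and do not come from the cited references.

```python
# Sketch: size of random peptide sequence space versus library size.
# Assumption: 20 standard amino acids, every clone encodes a distinct peptide.
AMINO_ACIDS = 20

def sequence_space(length):
    """Number of distinct peptides of the given length."""
    return AMINO_ACIDS ** length

def max_coverage(library_size, length):
    """Upper bound on the fraction of sequence space a library can sample."""
    return min(1.0, library_size / sequence_space(length))

if __name__ == "__main__":
    # 1e11 clones is the practical limit for clone libraries noted in the text.
    for n in (6, 10, 12):
        print(n, sequence_space(n), max_coverage(1e11, n))
```

Under these assumptions a library at the practical clone limit can in principle cover all hexapeptides, but only a small fraction of decapeptides or dodecapeptides, which is the motivation for larger cell-free libraries.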

In one application of this method (Gallop et al. (1994) J. Med. Chem. 37(9):1233-1251), a molecular DNA library encoding 10^12 decapeptides was constructed and the library expressed in an E. coli S30 in vitro coupled transcription/translation system. Conditions were chosen to stall the ribosomes on the mRNA, causing the accumulation of a substantial proportion of the RNA in polysomes and yielding complexes containing nascent peptides still linked to their encoding RNA. The polysomes are sufficiently robust to be affinity purified on immobilized receptors in much the same way as the more conventional recombinant peptide display libraries are screened. RNA from the bound complexes is recovered, converted to cDNA, and amplified by PCR to produce a template for the next round of synthesis and screening. The polysome display method can be coupled to the phage display system. Following several rounds of screening, cDNA from the enriched pool of polysomes was cloned into a phagemid vector. This vector serves both as a peptide expression vector, displaying peptides fused to the coat proteins, and as a DNA sequencing vector for peptide identification. By expressing the polysome-derived peptides on phage, one can either continue the affinity selection procedure in this format or assay the peptides on individual clones for binding activity in a phage ELISA, or for binding specificity in a competition phage ELISA (Barrett et al. (1992) Anal. Biochem. 204:357-364). To identify the sequences of the active peptides, one sequences the DNA produced by the phagemid host.

Secondary Screening of Polypeptides and Analogs

The high-throughput assays described above can be followed by secondary screens in order to identify further biological activities which will, e.g., allow one skilled in the art to differentiate agonists from antagonists. The type of secondary screen used will depend on the desired activity that needs to be tested. For example, an assay can be developed in which the ability to inhibit an interaction between a protein of interest and its respective ligand can be used to identify antagonists from a group of peptide fragments isolated through one of the primary screens described above.

Therefore, methods for generating fragments and analogs and testing them for activity are known in the art. Once the core sequence of interest is identified, it is routine for one skilled in the art to obtain analogs and fragments.

Peptide Mimetics of E. faecalis Polypeptides

The invention also provides for reduction of the protein binding domains of the subject E. faecalis polypeptides to generate mimetics, e.g. peptide or non-peptide agents. The peptide mimetics are able to disrupt binding of a polypeptide to its counter ligand, e.g., in the case of an E. faecalis polypeptide binding to a naturally occurring ligand. The critical residues of a subject E. faecalis polypeptide which are involved in molecular recognition of a polypeptide can be determined and used to generate E. faecalis-derived peptidomimetics which competitively or noncompetitively inhibit binding of the E. faecalis polypeptide with an interacting polypeptide (see, for example, European patent applications EP-412,762A and EP-B31,080A).

For example, scanning mutagenesis can be used to map the amino acid residues of a particular E. faecalis polypeptide involved in binding an interacting polypeptide, and peptidomimetic compounds (e.g. diazepine or isoquinoline derivatives) can be generated which mimic those residues in binding to an interacting polypeptide, and which therefore can inhibit binding of an E. faecalis polypeptide to an interacting polypeptide and thereby interfere with the function of the E. faecalis polypeptide. For instance, non-hydrolyzable peptide analogs of such residues can be generated using benzodiazepine (e.g., see Freidinger et al. in Peptides: Chemistry and Biology, G. R. Marshall ed., ESCOM Publisher: Leiden, Netherlands, 1988), azepine (e.g., see Huffman et al. in Peptides: Chemistry and Biology, G. R. Marshall ed., ESCOM Publisher: Leiden, Netherlands, 1988), substituted gamma lactam rings (Garvey et al. in Peptides: Chemistry and Biology, G. R. Marshall ed., ESCOM Publisher: Leiden, Netherlands, 1988), keto-methylene pseudopeptides (Ewenson et al. (1986) J Med Chem 29:295; and Ewenson et al. in Peptides: Structure and Function (Proceedings of the 9th American Peptide Symposium) Pierce Chemical Co. Rockland, Ill., 1985), β-turn dipeptide cores (Nagai et al. (1985) Tetrahedron Lett 26:647; and Sato et al. (1986) J Chem Soc Perkin Trans 1:1231), and β-aminoalcohols (Gordon et al. (1985) Biochem Biophys Res Commun 126:419; and et al. (1986) Biochem Biophys Res Commun 134:71).

Vaccine Formulations for E. faecalis Nucleic Acids and Polypeptides

This invention also features vaccine compositions for protection against infection by E. faecalis, a gram-positive bacterium, or for treatment of E. faecalis infection. In one embodiment, the vaccine compositions contain one or more immunogenic components such as a surface protein from E. faecalis, or portion thereof, and a pharmaceutically acceptable carrier. Nucleic acids within the scope of the invention are exemplified by the nucleic acids of the invention contained in the Sequence Listing which encode E. faecalis surface proteins. Any nucleic acid encoding an immunogenic E. faecalis protein, or portion thereof, which is capable of expression in a cell, can be used in the present invention. These vaccines have therapeutic and prophylactic utilities.

One aspect of the invention provides a vaccine composition for protection against infection by E. faecalis which contains at least one immunogenic fragment of an E. faecalis protein and a pharmaceutically acceptable carrier. Preferred fragments include peptides of at least about 10 amino acid residues in length, preferably about 10-20 amino acid residues in length, and more preferably about 12-16 amino acid residues in length.

Immunogenic components of the invention can be obtained, for example, by screening polypeptides recombinantly produced from the corresponding fragment of the nucleic acid encoding the full-length E. faecalis protein. In addition, fragments can be chemically synthesized using techniques known in the art such as conventional Merrifield solid phase f-Moc or t-Boc chemistry.
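As an illustration of how candidate fragments for such screening might be enumerated, the following is a minimal sketch that tiles a protein sequence into overlapping peptides. The default length of 15 residues (within the 12-16 residue range given above), the 5-residue step, and the function name are assumptions made for illustration and are not prescribed by the invention.

```python
# Sketch: tile a protein sequence into overlapping candidate peptides.
# Assumptions: length=15 and step=5 are illustrative defaults only.
def overlapping_fragments(protein, length=15, step=5):
    """Return (1-based start position, fragment) pairs covering the protein."""
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    last_start = len(protein) - length
    if last_start < 0:
        return []  # protein is shorter than one fragment
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # make sure the C-terminus is covered
    return [(s + 1, protein[s:s + length]) for s in starts]

if __name__ == "__main__":
    # Hypothetical 25-residue sequence, not taken from the Sequence Listing.
    for start, fragment in overlapping_fragments("ACDEFGHIKLMNPQRSTVWYACDEF"):
        print(start, fragment)
```

Each fragment produced this way could then be synthesized or expressed and tested in the assays described below.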

In one embodiment, immunogenic components are identified by the ability of the peptide to stimulate T cells. Peptides which stimulate T cells, as determined by, for example, T cell proliferation or cytokine secretion are defined herein as comprising at least one T cell epitope. T cell epitopes are believed to be involved in initiation and perpetuation of the immune response to the protein allergen which is responsible for the clinical symptoms of allergy. These T cell epitopes are thought to trigger early events at the level of the T helper cell by binding to an appropriate HLA molecule on the surface of an antigen presenting cell, thereby stimulating the T cell subpopulation with the relevant T cell receptor for the epitope. These events lead to T cell proliferation, lymphokine secretion, local inflammatory reactions, recruitment of additional immune cells to the site of antigen/T cell interaction, and activation of the B cell cascade, leading to the production of antibodies. A T cell epitope is the basic element, or smallest unit of recognition by a T cell receptor, where the epitope comprises amino acids essential to receptor recognition (e.g., approximately 6 or 7 amino acid residues). Amino acid sequences which mimic those of the T cell epitopes are within the scope of this invention.

Screening immunogenic components can be accomplished using one or more of several different assays. For example, in vitro, peptide T cell stimulatory activity is assayed by contacting a peptide known or suspected of being immunogenic with an antigen presenting cell which presents appropriate MHC molecules in a T cell culture. Presentation of an immunogenic E. faecalis peptide in association with appropriate MHC molecules to T cells in conjunction with the necessary co-stimulation has the effect of transmitting a signal to the T cell that induces the production of increased levels of cytokines, particularly of interleukin-2 and interleukin-4. The culture supernatant can be obtained and assayed for interleukin-2 or other known cytokines. For example, any one of several conventional assays for interleukin-2 can be employed, such as the assay described in Proc. Natl. Acad. Sci USA, 86: 1333 (1989) the pertinent portions of which are incorporated herein by reference. A kit for an assay for the production of interferon is also available from Genzyme Corporation (Cambridge, Mass.).

Alternatively, a common assay for T cell proliferation entails measuring tritiated thymidine incorporation. The proliferation of T cells can be measured in vitro by determining the amount of ³H-labeled thymidine incorporated into the replicating DNA of cultured cells. Therefore, the rate of DNA synthesis and, in turn, the rate of cell division can be quantified.
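Thymidine incorporation data of this kind are often summarized as a stimulation index, i.e. the ratio of mean counts per minute (cpm) in peptide-stimulated cultures to mean cpm in unstimulated control cultures. The following is a minimal sketch of that calculation; the stimulation index is a common convention rather than a requirement of the invention, and the function name and example counts are hypothetical.

```python
# Sketch: stimulation index from tritiated thymidine incorporation counts.
# Assumption: replicate cpm values for stimulated and control cultures.
from statistics import mean

def stimulation_index(stimulated_cpm, control_cpm):
    """Mean cpm with peptide divided by mean cpm without peptide."""
    control = mean(control_cpm)
    if control <= 0:
        raise ValueError("control counts must be positive")
    return mean(stimulated_cpm) / control

if __name__ == "__main__":
    # Hypothetical triplicate counts.
    print(stimulation_index([3000, 3200, 2800], [500, 450, 550]))
```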

Vaccine compositions of the invention containing immunogenic components (e.g., E. faecalis polypeptide or fragment thereof or nucleic acid encoding an E. faecalis polypeptide or fragment thereof) preferably include a pharmaceutically acceptable carrier. The term “pharmaceutically acceptable carrier” refers to a carrier that does not cause an allergic reaction or other untoward effect in patients to whom it is administered. Suitable pharmaceutically acceptable carriers include, for example, one or more of water, saline, phosphate buffered saline, dextrose, glycerol, ethanol and the like, as well as combinations thereof. Pharmaceutically acceptable carriers may further comprise minor amounts of auxiliary substances such as wetting or emulsifying agents, preservatives or buffers, which enhance the shelf life or effectiveness of the antibody. For vaccines of the invention containing E. faecalis polypeptides, the polypeptide is co-administered with a suitable adjuvant.

It will be apparent to those of skill in the art that the therapeutically effective amount of DNA or protein of this invention will depend, inter alia, upon the administration schedule, the unit dose of antibody administered, whether the protein or DNA is administered in combination with other therapeutic agents, the immune status and health of the patient, and the therapeutic activity of the particular protein or DNA.

Vaccine compositions are conventionally administered parenterally, e.g., by injection, either subcutaneously or intramuscularly. Methods for intramuscular immunization are described by Wolff et al. (1990) Science 247: 1465-1468 and by Sedegah et al. (1994) Immunology 91: 9866-9870. Other modes of administration include oral and pulmonary formulations, suppositories, and transdermal applications. Oral immunization is preferred over parenteral methods for inducing protection against infection by E. faecalis (Cain et al. (1993) Vaccine 11: 637-642). Oral formulations include such normally employed excipients as, for example, pharmaceutical grades of mannitol, lactose, starch, magnesium stearate, sodium saccharine, cellulose, magnesium carbonate, and the like.

The vaccine compositions of the invention can include an adjuvant, including, but not limited to, aluminum hydroxide; N-acetyl-muramyl-L-threonyl-D-isoglutamine (thr-MDP); N-acetyl-nor-muramyl-L-alanyl-D-isoglutamine (CGP 11637, referred to as nor-MDP); N-acetylmuramyl-L-alanyl-D-isoglutaminyl-L-alanine-2-(1′-2′-dipalmitoyl-sn-glycero-3-hydroxyphosphoryloxy)-ethylamine (CGP 19835A, referred to as MTP-PE); RIBI, which contains three components extracted from bacteria, namely monophosphoryl lipid A, trehalose dimycolate, and cell wall skeleton (MPL+TDM+CWS), in a 2% squalene/Tween 80 emulsion; and cholera toxin. Others which may be used are non-toxic derivatives of cholera toxin, including its B subunit, and/or conjugates or genetically engineered fusions of the E. faecalis polypeptide with cholera toxin or its B subunit, procholeragenoid, fungal polysaccharides, including schizophyllan, muramyl dipeptide, muramyl dipeptide derivatives, phorbol esters, labile toxin of E. coli, non-E. faecalis bacterial lysates, block polymers or saponins.

Other suitable delivery methods include biodegradable microcapsules or immuno-stimulating complexes (ISCOMs), cochleates, or liposomes, genetically engineered attenuated live vectors such as viruses or bacteria, and recombinant (chimeric) virus-like particles, e.g., bluetongue. The amount of adjuvant employed will depend on the type of adjuvant used. For example, when the mucosal adjuvant is cholera toxin, it is suitably used in an amount of 5 mg to 50 mg, for example 10 mg to 35 mg. When used in the form of microcapsules, the amount used will depend on the amount employed in the matrix of the microcapsule to achieve the desired dosage. The determination of this amount is within the skill of a person of ordinary skill in the art.

Carrier systems in humans may include enteric release capsules protecting the antigen from the acidic environment of the stomach, and including E. faecalis polypeptide in an insoluble form as fusion proteins. Suitable carriers for the vaccines of the invention are enteric coated capsules and polylactide-glycolide microspheres. Suitable diluents are 0.2 N NaHCO3 and/or saline.

Vaccines of the invention can be administered as a primary prophylactic agent in adults or in children, as a secondary prevention after successful eradication of E. faecalis in an infected host, or as a therapeutic agent with the aim of inducing an immune response in a susceptible host to prevent infection by E. faecalis. The vaccines of the invention are administered in amounts readily determined by persons of ordinary skill in the art. Thus, for adults a suitable dosage will be in the range of 10 mg to 10 g, preferably 10 mg to 100 mg. A suitable dosage for adults will also be in the range of 5 mg to 500 mg. Similar dosage ranges will be applicable for children. Those skilled in the art will recognize that the optimal dose may be more or less depending upon the patient's body weight, disease, the route of administration, and other factors. Those skilled in the art will also recognize that appropriate dosage levels can be obtained based on results with known oral vaccines such as, for example, a vaccine based on an E. coli lysate (6 mg dose daily up to total of 540 mg) and with an enterotoxigenic E. coli purified antigen (4 doses of 1 mg) (Schulman et al., J. Urol. 150:917-921 (1993); Boedecker et al., American Gastroenterological Assoc. 999:A-222 (1993)). The number of doses will depend upon the disease, the formulation, and efficacy data from clinical trials. Without intending any limitation as to the course of treatment, the treatment can be administered over 3 to 8 doses for a primary immunization schedule over 1 month (Boedeker, American Gastroenterological Assoc. 888:A-222 (1993)).

In a preferred embodiment, a vaccine composition of the invention can be based on a killed whole E. coli preparation with an immunogenic fragment of an E. faecalis protein of the invention expressed on its surface or it can be based on an E. coli lysate, wherein the killed E. coli acts as a carrier or an adjuvant.

It will be apparent to those skilled in the art that some of the vaccine compositions of the invention are useful only for preventing E. faecalis infection, some are useful only for treating E. faecalis infection, and some are useful for both preventing and treating E. faecalis infection. In a preferred embodiment, the vaccine composition of the invention provides protection against E. faecalis infection by stimulating humoral and/or cell-mediated immunity against E. faecalis. It should be understood that amelioration of any of the symptoms of E. faecalis infection is a desirable clinical goal, including a lessening of the dosage of medication used to treat E. faecalis-caused disease, or an increase in the production of antibodies in the serum or mucus of patients.

Antibodies Reactive with E. faecalis Polypeptides

The invention also includes antibodies specifically reactive with the subject E. faecalis polypeptide. Anti-protein/anti-peptide antisera or monoclonal antibodies can be made by standard protocols (See, for example, Antibodies: A Laboratory Manual ed. by Harlow and Lane (Cold Spring Harbor Press: 1988)). A mammal such as a mouse, a hamster or rabbit can be immunized with an immunogenic form of the peptide. Techniques for conferring immunogenicity on a protein or peptide include conjugation to carriers or other techniques well known in the art. An immunogenic portion of the subject E. faecalis polypeptide can be administered in the presence of adjuvant. The progress of immunization can be monitored by detection of antibody titers in plasma or serum. Standard ELISA or other immunoassays can be used with the immunogen as antigen to assess the levels of antibodies.

In a preferred embodiment, the subject antibodies are immunospecific for antigenic determinants of the E. faecalis polypeptides of the invention, e.g. antigenic determinants of a polypeptide of the invention contained in the Sequence Listing, or a closely related human or non-human mammalian homolog (e.g., 90% homologous, more preferably at least 95% homologous). In yet a further preferred embodiment of the invention, the anti-E. faecalis antibodies do not substantially cross react (i.e., react specifically) with a protein which is, for example, less than 80% homologous to a sequence of the invention contained in the Sequence Listing. By “not substantially cross react”, it is meant that the antibody has a binding affinity for a non-homologous protein which is less than 10 percent, more preferably less than 5 percent, and even more preferably less than 1 percent, of the binding affinity for a protein of the invention contained in the Sequence Listing. In a most preferred embodiment, there is no cross-reactivity between bacterial and mammalian antigens.
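The cross-reactivity thresholds above can be applied mechanically once the two binding affinities have been measured. The following is a minimal sketch, assuming both affinities are expressed in the same units with larger values meaning tighter binding (e.g., association constants); the function name and tier labels are illustrative only.

```python
# Sketch: classify an antibody by relative affinity for a non-homologous protein.
# Assumption: affinities share units and larger values mean tighter binding.
def cross_reactivity_tier(target_affinity, off_target_affinity):
    """Map the off-target/target affinity ratio onto the 10%, 5% and 1% tiers."""
    if target_affinity <= 0:
        raise ValueError("target affinity must be positive")
    percent = 100.0 * off_target_affinity / target_affinity
    if percent < 1.0:
        return "even more preferred (<1%)"
    if percent < 5.0:
        return "more preferred (<5%)"
    if percent < 10.0:
        return "not substantially cross-reactive (<10%)"
    return "cross-reactive"

if __name__ == "__main__":
    # Hypothetical affinity values.
    print(cross_reactivity_tier(100.0, 3.0))
```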

The term antibody as used herein is intended to include fragments thereof which are also specifically reactive with E. faecalis polypeptides. Antibodies can be fragmented using conventional techniques and the fragments screened for utility in the same manner as described above for whole antibodies. For example, F(ab′)2 fragments can be generated by treating antibody with pepsin. The resulting F(ab′)2 fragment can be treated to reduce disulfide bridges to produce Fab′ fragments. The antibody of the invention is further intended to include bispecific and chimeric molecules having an anti-E. faecalis portion.

Both monoclonal and polyclonal antibodies (Ab) directed against E. faecalis polypeptides or E. faecalis polypeptide variants, and antibody fragments such as Fab′ and F(ab′)2, can be used to block the action of an E. faecalis polypeptide and allow the study of the role of a particular E. faecalis polypeptide of the invention in aberrant or unwanted intracellular signaling, as well as in the normal cellular function of E. faecalis, e.g., by microinjection of anti-E. faecalis polypeptide antibodies of the present invention.

Antibodies which specifically bind E. faecalis epitopes can also be used in immunohistochemical staining of tissue samples in order to evaluate the abundance and pattern of expression of E. faecalis antigens. Anti-E. faecalis polypeptide antibodies can be used diagnostically in immuno-precipitation and immuno-blotting to detect and evaluate E. faecalis levels in tissue or bodily fluid as part of a clinical testing procedure. Likewise, the ability to monitor E. faecalis polypeptide levels in an individual can allow determination of the efficacy of a given treatment regimen for an individual afflicted with such a disorder. The level of an E. faecalis polypeptide can be measured in cells found in bodily fluid, such as in urine samples, or can be measured in tissue, such as produced by gastric biopsy. Diagnostic assays using anti-E. faecalis antibodies can include, for example, immunoassays designed to aid in early diagnosis of E. faecalis infections. The present invention can also be used as a method of detecting antibodies contained in samples from individuals infected by this bacterium using specific E. faecalis antigens.

Another application of anti-E. faecalis polypeptide antibodies of the invention is in the immunological screening of cDNA libraries constructed in expression vectors such as λgt11, λgt18-23, λZAP, and λORF8. Messenger libraries of this type, having coding sequences inserted in the correct reading frame and orientation, can produce fusion proteins. For instance, λgt11 will produce fusion proteins whose amino termini consist of β-galactosidase amino acid sequences and whose carboxy termini consist of a foreign polypeptide. Antigenic epitopes of a subject E. faecalis polypeptide can then be detected with antibodies, as, for example, reacting nitrocellulose filters lifted from infected plates with anti-E. faecalis polypeptide antibodies. Phage, scored by this assay, can then be isolated from the infected plate. Thus, the presence of E. faecalis gene homologs can be detected and cloned from other species, and alternate isoforms (including splicing variants) can be detected and cloned.

Kits Containing Nucleic Acids, Polypeptides or Antibodies of the Invention

The nucleic acids, polypeptides and antibodies of the invention can be combined with other reagents and articles to form kits. Kits for diagnostic purposes typically comprise the nucleic acids, polypeptides or antibodies in vials or other suitable vessels. Kits typically comprise other reagents for performing hybridization reactions, polymerase chain reactions (PCR), or for reconstitution of lyophilized components, such as aqueous media, salts, buffers, and the like. Kits may also comprise reagents for sample processing such as detergents, chaotropic salts and the like. Kits may also comprise immobilization means such as particles, supports, wells, dipsticks and the like. Kits may also comprise labeling means such as dyes, developing reagents, radioisotopes, fluorescent agents, luminescent or chemiluminescent agents, enzymes, intercalating agents and the like. With the nucleic acid and amino acid sequence information provided herein, individuals skilled in the art can readily assemble kits to serve their particular purpose. Kits further can include instructions for use.

Drug Screening Assays Using E. faecalis Polypeptides

By making available purified and recombinant E. faecalis polypeptides, the present invention provides assays which can be used to screen for drugs which are either agonists or antagonists of the normal cellular function, in this case, of the subject E. faecalis polypeptides, or of their role in intracellular signaling. Such inhibitors or potentiators may be useful as new therapeutic agents to combat E. faecalis infections in humans. A variety of assay formats will suffice and, in light of the present inventions, will be comprehended by the person skilled in the art.

In many drug screening programs which test libraries of compounds and natural extracts, high throughput assays are desirable in order to maximize the number of compounds surveyed in a given period of time. Assays which are performed in cell-free systems, such as may be derived with purified or semi-purified proteins, are often preferred as “primary” screens in that they can be generated to permit rapid development and relatively easy detection of an alteration in a molecular target which is mediated by a test compound. Moreover, the effects of cellular toxicity and/or bioavailability of the test compound can be generally ignored in the in vitro system, the assay instead being focused primarily on the effect of the drug on the molecular target as may be manifest in an alteration of binding affinity with other proteins or change in enzymatic properties of the molecular target. Accordingly, in an exemplary screening assay of the present invention, the compound of interest is contacted with an isolated and purified E. faecalis polypeptide.

Screening assays can be constructed in vitro with a purified E. faecalis polypeptide or fragment thereof, such as an E. faecalis polypeptide having enzymatic activity, such that the activity of the polypeptide produces a detectable reaction product. The efficacy of the compound can be assessed by generating dose response curves from data obtained using various concentrations of the test compound. Moreover, a control assay can also be performed to provide a baseline for comparison. Suitable products include those with distinctive absorption, fluorescence, or chemi-luminescence properties, for example, because detection may be easily automated. A variety of synthetic or naturally occurring compounds can be tested in the assay to identify those which inhibit or potentiate the activity of the E. faecalis polypeptide. Some of these active compounds may directly, or with chemical alterations to promote membrane permeability or solubility, also inhibit or potentiate the same activity (e.g., enzymatic activity) in whole, live E. faecalis cells.
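A dose response analysis of the kind mentioned above can be sketched as follows. This is a minimal illustration, assuming that the assay signal decreases as the compound inhibits the polypeptide, that inhibition increases with concentration, and that the concentration giving 50% inhibition (IC50) is estimated by linear interpolation on a log10 concentration scale; a fitted sigmoidal model could be substituted. All names and values are hypothetical.

```python
# Sketch: percent inhibition and an interpolated IC50 from dose response data.
# Assumptions: positive concentrations, inhibition rising with concentration.
import math

def percent_inhibition(signal, control_signal, background=0.0):
    """Inhibition relative to the no-compound control, in percent."""
    return 100.0 * (1.0 - (signal - background) / (control_signal - background))

def ic50_by_interpolation(concentrations, inhibitions):
    """Estimate the 50% inhibition point between the two bracketing doses."""
    points = sorted(zip(concentrations, inhibitions))
    for (c_lo, i_lo), (c_hi, i_hi) in zip(points, points[1:]):
        if i_lo <= 50.0 <= i_hi and i_hi != i_lo:
            frac = (50.0 - i_lo) / (i_hi - i_lo)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    return None  # 50% inhibition not bracketed by the tested doses

if __name__ == "__main__":
    # Hypothetical concentrations (arbitrary units) and percent inhibition.
    print(ic50_by_interpolation([0.1, 1.0, 10.0, 100.0], [5.0, 25.0, 75.0, 95.0]))
```

Returning `None` when no pair of doses brackets 50% inhibition avoids extrapolating beyond the tested concentration range.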

Overexpression Assays

Overexpression assays are based on the premise that overproduction of a protein would lead to a higher level of resistance to compounds that selectively interfere with the function of that protein. Overexpression assays may be used to identify compounds that interfere with the function of virtually any type of protein, including without limitation enzymes, receptors, DNA- or RNA-binding proteins, or any proteins that are directly or indirectly involved in regulating cell growth.

Typically, two bacterial strains are constructed. One contains a single copy of the gene of interest, and a second contains several copies of the same gene. Identification of useful inhibitory compounds using this type of assay is based on a comparison of the activity of a test compound in inhibiting growth and/or viability of the two strains. The method involves constructing a nucleic acid vector that directs high-level expression of a particular target nucleic acid. The vectors are then transformed into host cells in single or multiple copies to produce strains that express low to moderate and high levels of the protein encoded by the target sequence (strains A and B, respectively). Nucleic acid comprising sequences encoding the target gene can, of course, be directly integrated into the host cell.

Large numbers of compounds (or crude substances which may contain active compounds) are screened for their effect on the growth of the two strains. Agents which interfere with an unrelated target equally inhibit the growth of both strains. Agents which interfere with the function of the target at high concentration should inhibit the growth of both strains. It should be possible, however, to titrate out the inhibitory effect of the compound in the overexpressing strain. That is, if the compound is affecting the particular target that is being tested, it should be possible to inhibit the growth of strain A at a concentration of the compound that allows strain B to grow.
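The comparison between the two strains can be expressed as a simple decision rule. The following is a minimal sketch, assuming that each compound has been assigned a minimum inhibitory concentration (MIC) in strain A (low to moderate expression) and in strain B (high expression), with `None` meaning no inhibition at the highest concentration tested. The four-fold shift threshold and the function name are illustrative assumptions, not values taken from the invention.

```python
# Sketch: interpret MICs from an overexpression assay.
# Assumptions: MICs in the same units; None = no inhibition observed;
# fold_shift=4.0 is an arbitrary illustrative threshold.
def classify_compound(mic_strain_a, mic_strain_b, fold_shift=4.0):
    """Flag compounds whose inhibition is titrated out by target overexpression."""
    if mic_strain_a is None and mic_strain_b is None:
        return "inactive"
    if mic_strain_a is None:
        return "inconclusive"  # inhibits only the overexpressing strain
    if mic_strain_b is None or mic_strain_b / mic_strain_a >= fold_shift:
        return "target-specific candidate"
    return "nonspecific or unrelated target"

if __name__ == "__main__":
    # Hypothetical MIC values.
    print(classify_compound(2.0, 32.0))
    print(classify_compound(4.0, 4.0))
```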

Alternatively, a bacterial strain is constructed that contains the gene of interest under the control of an inducible promoter. Identification of useful inhibitory agents using this type of assay is based on a comparison of the activity of a test compound in inhibiting growth and/or viability of this strain under both inducing and non-inducing conditions. The method involves constructing a nucleic acid vector that directs high-level expression of a particular target nucleic acid. The vector is then transformed into host cells that are grown under both non-inducing and inducing conditions (conditions A and B, respectively).

Large numbers of compounds (or crude substances which may contain active compounds) are screened for their effect on growth under these two conditions. Agents that interfere with the function of the target should inhibit growth under both conditions. It should be possible, however, to titrate out the inhibitory effect of the compound in the overexpressing strain. That is, if the compound is affecting the particular target that is being tested, it should be possible to inhibit growth under condition A at a concentration that allows the strain to grow under condition B.

Ligand-binding Assays

Many of the targets according to the invention have functions that have not yet been identified. Ligand-binding assays are useful to identify inhibitor compounds that interfere with the function of a particular target, even when that function is unknown. These assays are designed to detect binding of test compounds to particular targets. The detection may involve direct measurement of binding. Alternatively, indirect indications of binding may involve stabilization of protein structure or disruption of a biological function. Non-limiting examples of useful ligand-binding assays are detailed below.

A useful method for the detection and isolation of binding proteins is the Biomolecular Interaction Assay (BIAcore) system developed by Pharmacia Biosensor and described in the manufacturer's protocol (LKB Pharmacia, Sweden). The BIAcore system uses an affinity purified anti-GST antibody to immobilize GST-fusion proteins onto a sensor chip. The sensor utilizes surface plasmon resonance which is an optical phenomenon that detects changes in refractive indices. In accordance with the practice of the invention, a protein of interest is coated onto a chip and test compounds are passed over the chip. Binding is detected by a change in the refractive index (surface plasmon resonance).

A different type of ligand-binding assay involves scintillation proximity assays (SPA, described in U.S. Pat. No. 4,568,649).

Another type of ligand binding assay, also undergoing development, is based on the fact that proteins containing mitochondrial targeting signals are imported into isolated mitochondria in vitro (Hurt et al., 1985, Embo J. 4:2061-2068; Eilers and Schatz, Nature, 1986, 322:228-231). In a mitochondrial import assay, expression vectors are constructed in which nucleic acids encoding particular target proteins are inserted downstream of sequences encoding mitochondrial import signals. The chimeric proteins are synthesized and tested for their ability to be imported into isolated mitochondria in the absence and presence of test compounds. A test compound that binds to the target protein should inhibit its uptake into isolated mitochondria in vitro.

Another ligand-binding assay is the yeast two-hybrid system (Fields and Song, 1989, Nature 340:245-246). The yeast two-hybrid system takes advantage of the properties of the GAL4 protein of the yeast Saccharomyces cerevisiae. The GAL4 protein is a transcriptional activator required for the expression of genes encoding enzymes of galactose utilization. This protein consists of two separable and functionally essential domains: an N-terminal domain which binds to specific DNA sequences (UASG); and a C-terminal domain containing acidic regions, which is necessary to activate transcription. The native GAL4 protein, containing both domains, is a potent activator of transcription when yeast are grown on galactose media. The N-terminal domain alone binds to DNA in a sequence-specific manner but is unable to activate transcription. The C-terminal domain alone contains the activating regions but cannot activate transcription because it fails to be localized to UASG. In the two-hybrid system, two hybrid proteins containing parts of GAL4 are constructed: (1) a GAL4 DNA-binding domain fused to a protein ‘X’ and (2) a GAL4 activation region fused to a protein ‘Y’. If X and Y can form a protein-protein complex and reconstitute proximity of the GAL4 domains, transcription of a gene regulated by UASG occurs. Creation of two hybrid proteins, each containing one of the interacting proteins X and Y, allows the activation region of GAL4 to be brought to its normal site of action.
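The logic of a two-hybrid screen can be summarized in a few lines. The following is a toy model only, assuming a known set of interacting protein pairs: the reporter gene is scored as active when the protein fused to the GAL4 DNA-binding domain (the "bait", X) interacts with the protein fused to the activation region (the "prey", Y). The function and protein names are hypothetical.

```python
# Sketch: toy model of reporter activation in a two-hybrid screen.
# Assumption: interactions are symmetric and given as a list of name pairs.
def two_hybrid_screen(bait, preys, interactions):
    """Return the preys that would switch on the UASG-regulated reporter."""
    pairs = {frozenset(pair) for pair in interactions}
    return [prey for prey in preys if frozenset((bait, prey)) in pairs]

if __name__ == "__main__":
    known = [("X", "Y"), ("X", "Z")]
    print(two_hybrid_screen("X", ["Y", "W", "Z"], known))
```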

The binding assay described in Fodor et al., 1991, Science 251:767-773, which involves testing the binding affinity of test compounds for a plurality of defined polymers synthesized on a solid substrate, may also be useful.

Compounds which bind to the polypeptides of the invention are potentially useful as antibacterial agents for use in therapeutic compositions.

Pharmaceutical formulations suitable for antibacterial therapy comprise the antibacterial agent in conjunction with one or more biologically acceptable carriers. Suitable biologically acceptable carriers include, but are not limited to, phosphate-buffered saline, saline, deionized water, or the like. Preferred biologically acceptable carriers are physiologically or pharmaceutically acceptable carriers.

The antibacterial compositions include an antibacterial effective amount of active agent. Antibacterial effective amounts are those quantities of the antibacterial agents of the present invention that afford prophylactic protection against bacterial infections or which result in amelioration or cure of an existing bacterial infection. This antibacterial effective amount will depend upon the agent, the location and nature of the infection, and the particular host. The amount can be determined by experimentation known in the art, such as by establishing a matrix of dosages and frequencies and comparing a group of experimental units or subjects to each point in the matrix.

The antibacterial active agents or compositions can be formed into dosage unit forms, such as for example, creams, ointments, lotions, powders, liquids, tablets, capsules, suppositories, sprays, aerosols or the like. If the antibacterial composition is formulated into a dosage unit form, the dosage unit form may contain an antibacterial effective amount of active agent. Alternatively, the dosage unit form may include less than such an amount if multiple dosage unit forms or multiple dosages are to be used to administer a total dosage of the active agent. Dosage unit forms can include, in addition, one or more excipient(s), diluent(s), disintegrant(s), lubricant(s), plasticizer(s), colorant(s), dosage vehicle(s), absorption enhancer(s), stabilizer(s), bactericide(s), or the like.

For general information concerning formulations, see, e.g., Gilman et al. (eds.), 1990, Goodman and Gilman's: The Pharmacological Basis of Therapeutics, 8th ed., Pergamon Press; and Remington's Pharmaceutical Sciences, 17th ed., 1990, Mack Publishing Co., Easton, Pa.; Avis et al. (eds.), 1993, Pharmaceutical Dosage Forms: Parenteral Medications, Dekker, New York; Lieberman et al (eds.), 1990, Pharmaceutical Dosage Forms: Disperse Systems, Dekker, New York.

The antibacterial agents and compositions of the present invention are useful for preventing or treating E. faecalis infections. Infection prevention methods incorporate a prophylactically effective amount of an antibacterial agent or composition. A prophylactically effective amount is an amount effective to prevent E. faecalis infection and will depend upon the specific bacterial strain, the agent, and the host. These amounts can be determined experimentally by methods known in the art and as described above.

E. faecalis infection treatment methods incorporate a therapeutically effective amount of an antibacterial agent or composition. A therapeutically effective amount is an amount sufficient to ameliorate or eliminate the infection. The prophylactically and/or therapeutically effective amounts can be administered in one administration or over repeated administrations. Therapeutic administration can be followed by prophylactic administration, once the initial bacterial infection has been resolved.

The antibacterial agents and compositions can be administered topically or systemically. Topical application is typically achieved by administration of creams, ointments, lotions, or sprays as described above. Systemic administration includes both oral and parenteral routes. Parenteral routes include, without limitation, subcutaneous, intramuscular, intraperitoneal, intravenous, transdermal, inhalation and intranasal administration.

EXEMPLIFICATION

Cloning and Sequencing E. faecalis Genomic Sequence

This invention provides nucleotide sequences of the genome of E. faecalis which thus comprises a DNA sequence library of E. faecalis genomic DNA. The detailed description that follows provides nucleotide sequences of E. faecalis, and also describes how the sequences were obtained and how ORFs (Open Reading Frames) and protein-coding sequences can be identified. Also described are methods of using the disclosed E. faecalis sequences in methods including diagnostic and therapeutic applications. Furthermore, the library can be used as a database for identification and comparison of medically important sequences in this and other strains of E. faecalis as well as other species of Enterococcus.

Chromosomal DNA from strain 14336 of E. faecalis was isolated after Zymolyase digestion, sodium dodecyl sulfate lysis, potassium acetate precipitation, phenol:chloroform extraction and ethanol precipitation (Soll, D. R., T. Srikantha and S. R. Lockhart: Characterizing Developmentally Regulated Genes in E. faecalis. In Microbial Genome Methods. K. W. Adolph, editor. CRC Press. New York. p 17-37.). Genomic E. faecalis DNA was hydrodynamically sheared in an HPLC and then separated on a standard 1% agarose gel. Fractions corresponding to 2500-3000 bp in length were excised from the gel and purified by the GeneClean procedure (Bio101, Inc.).

The purified DNA fragments were then blunt-ended using T4 DNA polymerase. The healed DNA was then ligated to unique BstXI-linker adapters (5′-GTCTTCACCACGGGG-3′ and 5′-GTGGTGAAGAC-3′ (SEQ ID NOS: 6811 and 6812) in 100-1000 fold molar excess). These linkers are complementary to the BstXI-cut pGTC vector, while the overhang is not self-complementary. Therefore, the linkers will not concatemerize, nor will the cut vector religate to itself easily. The linker-adapted inserts were separated from the unincorporated linkers on a 1% agarose gel and purified using GeneClean. The linker-adapted inserts were then ligated to BstXI-cut vector to construct “shotgun” subclone libraries.
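The pairing of the two adapter oligonucleotides can be checked with a short script. The following is a minimal sketch, not part of the described procedure: it assumes the two sequences anneal as written in the text, and the function names are hypothetical. It reports the single-stranded tail left on the longer oligonucleotide and whether that tail could pair with itself.

```python
def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

# Adapter oligonucleotides as printed in the text (written 5' to 3').
LONG_OLIGO = "GTCTTCACCACGGGG"
SHORT_OLIGO = "GTGGTGAAGAC"

def adapter_overhang(long_oligo, short_oligo):
    """Return the 3' single-stranded tail of long_oligo that remains when
    short_oligo anneals to its 5' end, or None if they do not pair that way."""
    paired = reverse_complement(short_oligo)
    if not long_oligo.startswith(paired):
        return None
    return long_oligo[len(paired):]

def is_self_complementary(seq):
    """True if seq equals its own reverse complement (a palindromic end)."""
    return seq == reverse_complement(seq)

overhang = adapter_overhang(LONG_OLIGO, SHORT_OLIGO)
print(overhang, is_self_complementary(overhang))
```

Under these assumptions the short oligonucleotide pairs with the first 11 bases of the long one, leaving a four-base 3′ tail that is not its own reverse complement, which is consistent with the statement that the adapters do not concatemerize.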

Only major modifications to the protocols are highlighted. Briefly, the library was then transformed into DH5α competent cells (Gibco/BRL, DH5α transformation protocol). It was assessed by plating onto antibiotic plates containing ampicillin and IPTG/Xgal. The plates were incubated overnight at 37° C. Transformants were then used for plating of clones and picking for sequencing. The cultures were grown overnight at 37° C. DNA was purified using a silica bead DNA preparation (Engelstein, 1996) method. In this manner, 25 μg of DNA was obtained per clone.

These purified DNA samples were then sequenced using primarily ABI dye-terminator chemistry. All subsequent steps were based on sequencing by ABI377 automated DNA sequencing methods. The ABI dye terminator sequence reads were run on ABI377 machines and the data was transferred to UNIX machines following lane tracking of the gels. Base calls and quality scores were determined using the program PHRED (Ewing et al., 1998, Genome Res. 8: 175-185; Ewing and Green, 1998, Genome Res. 8: 685-734). Reads were assembled using PHRAP (P. Green, Abstracts of DOE Human Genome Program Contractor—Grantee Workshop V, January 1996, p.157) with default program parameters and quality scores. The initial assembly was done at 2.3-fold coverage and yielded 712 contigs.
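For readers unfamiliar with the quality scores mentioned above, the standard Phred relation is Q = -10 log10(P), where P is the estimated probability that a base call is wrong. The sketch below only illustrates that relation; it does not reproduce PHRED or PHRAP, and the function names are hypothetical.

```python
import math

def phred_to_error_prob(q):
    """Convert a Phred quality score to an estimated base-call error probability."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Convert an error probability (0 < p <= 1) to a Phred quality score."""
    return -10 * math.log10(p)

def expected_errors(quals):
    """Sum of per-base error probabilities: the expected number of miscalled bases in a read."""
    return sum(phred_to_error_prob(q) for q in quals)

# A base with Q = 20 has an estimated 1 in 100 chance of being wrong.
print(phred_to_error_prob(20))
```

A score of 30 corresponds to an estimated error rate of 1 in 1000, which is why assemblers weight high-quality bases more heavily when resolving discrepancies between overlapping reads.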

Finishing can follow the initial assembly. Missing mates (sequences from clones that only gave reads from one end of the Enterococcus DNA inserted in the plasmid) can be identified and sequenced with ABI technology to allow the identification of additional overlapping contigs.

End-sequencing of randomly picked genomic lambda clones was also performed. Sequencing on both sides was done for all lambda sequences. The lambda library backbone helped to verify the integrity of the assembly and allowed closure of some of the physical gaps. Primers for walking off the ends of contigs would be selected using pick_primer (a GTC program) near the ends of the clones to facilitate gap closure. These walks can be sequenced using the selected clones and primers. These data are then reassembled with PHRAP. Additional sequencing using PCR-generated templates and screened and/or unscreened lambda templates can also be done.

To identify E. faecalis polypeptides, the complete genomic sequence of E. faecalis was analyzed essentially as follows: First, all possible stop-to-stop open reading frames (ORFs) greater than 180 nucleotides in all six reading frames were translated into amino acid sequences. Second, the identified ORFs were analyzed for homology to known (archaebacterial, prokaryotic and eukaryotic) protein sequences. Third, the coding potential of non-homologous sequences was evaluated with the program GENEMARK™ (Borodovsky and McIninch, 1993, Comp. Chem. 17:123).
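The first step, enumeration of stop-to-stop ORFs in six frames, can be sketched as follows. This is an illustrative sketch and not the program actually used: it assumes the standard genetic code, counts only segments bounded on both sides by in-frame stop codons, and uses hypothetical function names.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def stop_to_stop_orfs(seq, min_nt=180):
    """Return (strand, frame, start, end, protein) for every segment longer
    than min_nt nucleotides lying between two in-frame stop codons.
    Coordinates are 0-based on the strand being scanned."""
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            last_stop = None  # position of the most recent in-frame stop
            for i in range(frame, len(s) - 2, 3):
                if CODON_TABLE.get(s[i:i + 3]) != "*":
                    continue
                if last_stop is not None:
                    start = last_stop + 3
                    if i - start > min_nt:
                        # Codons with ambiguous bases are translated as X.
                        protein = "".join(
                            CODON_TABLE.get(s[j:j + 3], "X")
                            for j in range(start, i, 3)
                        )
                        orfs.append((strand, frame + 1, start, i, protein))
                last_stop = i
    return orfs

# Toy input: a 213-nucleotide open frame flanked by two stop codons.
demo = "TAA" + "ATG" + "GCT" * 70 + "TAG"
for strand, frame, start, end, protein in stop_to_stop_orfs(demo):
    print(strand, frame, start, end, len(protein))
```

On real contigs one might also want to keep open segments that run off either end of the sequence; that choice is left out here to keep the sketch short.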

Identification, Cloning and Expression of E. faecalis Nucleic Acids

Expression and purification of the E. faecalis polypeptides of the invention can be performed essentially as outlined below.

To facilitate the cloning, expression and purification of membrane and secreted proteins from E. faecalis, a gene expression system, such as the pET System (Novagen), for cloning and expression of recombinant proteins in E. coli, is selected. Also, a DNA sequence encoding a peptide tag, the His-Tag, is fused to the 3′ end of DNA sequences of interest in order to facilitate purification of the recombinant protein products. The 3′ end is selected for fusion in order to avoid alteration of any 5′ terminal signal sequence.

PCR Amplification and Cloning of Nucleic Acids Containing ORFs Encoding Enzymes

Nucleic acids chosen (for example, from the nucleic acids set forth in SEQ ID NO: 1-SEQ ID NO: 3405) for cloning from the 14336 strain of E. faecalis are prepared for amplification cloning by polymerase chain reaction (PCR). Synthetic oligonucleotide primers specific for the 5′ and 3′ ends of open reading frames (ORFs) are designed and purchased from GibcoBRL Life Technologies (Gaithersburg, Md., USA). All forward primers (specific for the 5′ end of the sequence) are designed to include an NcoI cloning site at the extreme 5′ terminus. These primers are designed to permit initiation of protein translation at a methionine residue followed by a valine residue and the coding sequence for the remainder of the native E. faecalis DNA sequence. All reverse primers (specific for the 3′ end of any E. faecalis ORF) include an EcoRI site at the extreme 5′ terminus to permit cloning of each E. faecalis sequence into the reading frame of the pET-28b vector. The pET-28b vector provides sequence encoding an additional 20 carboxy-terminal amino acids including six histidine residues (at the extreme C-terminus), which comprise the His-Tag.
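The primer layout described above can be illustrated with a short sketch. Several details are assumptions made only for illustration and are not stated in the text: the annealing length of 18 bases, the choice of GTT as the valine codon, the forward primer resuming at the second native codon, and the reverse primer annealing immediately upstream of the stop codon. The recognition sequences used for NcoI (CCATGG) and EcoRI (GAATTC) are the standard ones.

```python
NCOI_SITE = "CCATGG"   # contains the ATG start codon
ECORI_SITE = "GAATTC"

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def design_primers(orf, anneal_len=18, val_codon="GTT"):
    """Return (forward, reverse) primers, each written 5' to 3'.

    orf is assumed to start with a start codon and end with a stop codon.
    anneal_len and val_codon are hypothetical illustration values."""
    orf = orf.upper()
    if not val_codon.startswith("G"):
        raise ValueError("the NcoI site fixes the first base after ATG as G")
    # Forward: CC + ATG + valine codon, then native sequence from codon 2 on.
    forward = "CC" + "ATG" + val_codon + orf[3:3 + anneal_len]
    # Reverse: EcoRI site, then the reverse complement of the bases that
    # precede the stop codon, so the reading frame can continue into the tag.
    reverse = ECORI_SITE + reverse_complement(orf[-3 - anneal_len:-3])
    return forward, reverse

toy_orf = "ATG" + "GCTAAACCC" * 4 + "TAA"   # made-up sequence, not from the patent
fwd, rev = design_primers(toy_orf)
print(fwd)
print(rev)
```

Whether the resulting insert stays in frame with the vector-encoded His-Tag depends on the exact vector sequence between the cloning site and the tag, which this sketch does not model.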

Genomic DNA prepared from the 14336 strain of E. faecalis is used as the source of template DNA for PCR amplification reactions (Current Protocols in Molecular Biology, John Wiley and Sons, Inc., F. Ausubel et al., eds., 1994). To amplify a DNA sequence containing an E. faecalis ORF, genomic DNA (50 nanograms) is introduced into a reaction vial containing 2 mM MgCl2, 1 micromolar synthetic oligonucleotide primers (forward and reverse primers) complementary to and flanking a defined E. faecalis ORF, 0.2 mM of each deoxynucleotide triphosphate; dATP, dGTP, dCTP, dTTP and 2.5 units of heat stable DNA polymerase (Amplitaq, Roche Molecular Systems, Inc., Branchburg, N.J., USA) in a final volume of 100 microliters.
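The volumes needed to reach the stated final concentrations follow from the dilution relation C1·V1 = C2·V2. The sketch below assumes hypothetical stock concentrations (25 mM MgCl2, 10 mM of each dNTP, 10 micromolar of each primer); these stocks are not given in the text and would be replaced by whatever is on hand. It also assumes the 1 micromolar figure applies to each primer.

```python
def stock_volume_ul(stock_conc, final_conc, final_volume_ul):
    """Volume of stock (microliters) giving final_conc in final_volume_ul.
    stock_conc and final_conc must be expressed in the same unit."""
    if final_conc > stock_conc:
        raise ValueError("stock is too dilute for the requested final concentration")
    return final_conc * final_volume_ul / stock_conc

FINAL_VOLUME_UL = 100  # final reaction volume stated in the text

# (hypothetical stock concentration, final concentration from the text)
components = {
    "MgCl2 (mM)": (25.0, 2.0),
    "each dNTP (mM)": (10.0, 0.2),
    "forward primer (uM)": (10.0, 1.0),
    "reverse primer (uM)": (10.0, 1.0),
}

for name, (stock, final) in components.items():
    volume = stock_volume_ul(stock, final, FINAL_VOLUME_UL)
    print(f"{name}: {volume:.1f} microliters")
```

Template DNA, polymerase, buffer, and water make up the remainder of the 100 microliter reaction.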

Upon completion of thermal cycling reactions, each sample of amplified DNA is washed and purified using the Qiaquick Spin PCR purification kit (Qiagen, Gaithersburg, Md., USA). All amplified DNA samples are subjected to digestion with the restriction endonucleases, e.g., NcoI and EcoRI (New England BioLabs, Beverly, Mass., USA) (Current Protocols in Molecular Biology, John Wiley and Sons, Inc., F. Ausubel et al., eds., 1994). DNA samples are then subjected to electrophoresis on 1.0% NuSieve (FMC BioProducts, Rockland, Me., USA) agarose gels. DNA is visualized by exposure to ethidium bromide and long wave UV irradiation. DNA contained in slices isolated from the agarose gel is purified using the Bio 101 GeneClean Kit protocol (Bio 101, Vista, Calif., USA).

Cloning of E. faecalis Nucleic Acids into an Expression Vector

The pET-28b vector is prepared for cloning by digestion with endonucleases, e.g., NcoI and EcoRI (Current Protocols in Molecular Biology, John Wiley and Sons, Inc., F. Ausubel et al., eds., 1994). The pET-28a vector, which encodes a His-Tag that can be fused to the 5′ end of an inserted gene, is prepared by digestion with appropriate restriction endonucleases.

Following digestion, DNA inserts are cloned (Current Protocols in Molecular Biology, John Wiley and Sons, Inc., F. Ausubel et al., eds., 1994) into the previously digested pET-28b expression vector. Products of the ligation reaction are then used to transform the BL21 strain of E. coli (Current Protocols in Molecular Biology, John Wiley and Sons, Inc., F. Ausubel et al., eds., 1994) as described below.

Transformation of Competent Bacteria with Recombinant Plasmids

Competent bacteria, E. coli strain BL21 or E. coli strain BL21 (DE3), are transformed with recombinant pET expression plasmids carrying the cloned E. faecalis sequences according to standard methods (Current Protocols in Molecular Biology, John Wiley and Sons, Inc., F. Ausubel et al., eds., 1994). Briefly, 1 microliter of ligation reaction is mixed with 50 microliters of electrocompetent cells and subjected to a high voltage pulse, after which samples are incubated in 0.45 milliliters SOC medium (0.5% yeast extract, 2.0% tryptone, 10 mM NaCl, 2.5 mM KCl, 10 mM MgCl2, 10 mM MgSO4 and 20 mM glucose) at 37° C. with shaking for 1 hour. Samples are then spread on LB agar plates containing 25 microgram/ml kanamycin sulfate for growth overnight. Transformed colonies of BL21 are then picked and analyzed to evaluate cloned inserts as described below.

Identification of Recombinant Expression Vectors with E. faecalis Nucleic Acids

Individual BL21 clones transformed with recombinant pET-28b E. faecalis ORFs are analyzed by PCR amplification of the cloned inserts using the same forward and reverse primers, specific for each E. faecalis sequence, that were used in the original PCR amplification cloning reactions. Successful amplification verifies the integration of the E. faecalis sequences in the expression vector (Current Protocols in Molecular Biology, John Wiley and Sons, Inc., F. Ausubel et al., eds., 1994).

Isolation and Preparation of Nucleic Acids from Transformants

Individual clones of recombinant pET-28b vectors carrying properly cloned E. faecalis ORFs are picked and incubated in 5 mls of LB broth plus 25 microgram/ml kanamycin sulfate overnight. The following day plasmid DNA is isolated and purified using the Qiagen plasmid purification protocol (Qiagen Inc., Chatsworth, Calif., USA).

Expression of Recombinant E. faecalis Sequences in E. coli

The pET vector can be propagated in any E. coli K-12 strain, e.g., HMS174, HB101, JM109, DH5, etc., for the purpose of cloning or plasmid preparation. Hosts for expression include E. coli strains containing a chromosomal copy of the gene for T7 RNA polymerase. These hosts are lysogens of bacteriophage DE3, a lambda derivative that carries the lacI gene, the lacUV5 promoter and the gene for T7 RNA polymerase. T7 RNA polymerase is induced by addition of isopropyl-β-D-thiogalactoside (IPTG), and the T7 RNA polymerase transcribes any target plasmid, such as pET-28b, carrying its gene of interest. Strains used include: BL21(DE3) (Studier, F. W., Rosenberg, A. H., Dunn, J. J., and Dubendorff, J. W. (1990) Meth. Enzymol. 185, 60-89).

To express recombinant E. faecalis sequences, 50 nanograms of plasmid DNA isolated as described above is used to transform competent BL21(DE3) bacteria as described above (provided by Novagen as part of the pET expression system kit). The lacZ gene (beta-galactosidase) is expressed in the pET-System as described for the E. faecalis recombinant constructions. Transformed cells are cultured in SOC medium for 1 hour, and the culture is then plated on LB plates containing 25 micrograms/ml kanamycin sulfate. The following day, bacterial colonies are pooled and grown in LB medium containing kanamycin sulfate (25 micrograms/ml) to an optical density at 600 nm of 0.5 to 1.0 O.D. units, at which point 1 millimolar IPTG is added to the culture for 3 hours to induce gene expression of the E. faecalis recombinant DNA constructions.

After induction of gene expression with IPTG, bacteria are pelleted by centrifugation in a Sorvall RC-3B centrifuge at 3500×g for 15 minutes at 4° C. Pellets are resuspended in 50 milliliters of cold 10 mM Tris-HCl, pH 8.0, 0.1 M NaCl and 0.1 mM EDTA (STE buffer). Cells are then centrifuged at 2000×g for 20 min at 4° C. Wet pellets are weighed and frozen at −80° C. until ready for protein purification.

A variety of methodologies known in the art can be utilized to purify the isolated proteins (Current Protocols in Protein Science, John Wiley and Sons, Inc., J. E. Coligan et al., eds., 1995). For example, the frozen cells are thawed, resuspended in buffer and ruptured by several passages through a small volume microfluidizer (Model M-110S, Microfluidics International Corporation, Newton, Mass.). The resultant homogenate is centrifuged to yield a clear supernatant (crude extract) and, following filtration, the crude extract is fractionated over columns. Fractions are monitored by absorbance at 280 nm, and peak fractions may be analyzed by SDS-PAGE.

The concentrations of purified protein preparations are quantified spectrophotometrically using absorbance coefficients calculated from amino acid content (Perkins, S. J. 1986 Eur. J. Biochem. 157, 169-180). Protein concentrations are also measured by the method of Bradford, M. M. (1976) Anal. Biochem. 72, 248-254, and Lowry, O. H., Rosebrough, N., Farr, A. L. & Randall, R. J. (1951) J. Biol. Chem. 193, pages 265-275, using bovine serum albumin as a standard.
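The spectrophotometric estimate rests on the Beer-Lambert law, A = ε·c·l. The sketch below does not reproduce the cited Perkins method. As a stand-in it uses the widely quoted approximate 280 nm contributions of 5500 per tryptophan, 1490 per tyrosine and 125 per disulfide (cystine), all in M⁻¹ cm⁻¹, which is an assumption introduced here for illustration. The function names are hypothetical.

```python
# Approximate molar absorption contributions at 280 nm, in M^-1 cm^-1.
# Stand-in values for illustration; not taken from the reference cited in the text.
EPS_TRP = 5500
EPS_TYR = 1490
EPS_CYSTINE = 125

def extinction_coefficient_280(protein_seq, n_disulfides=0):
    """Estimate the molar absorption coefficient at 280 nm from composition."""
    seq = protein_seq.upper()
    return (seq.count("W") * EPS_TRP
            + seq.count("Y") * EPS_TYR
            + n_disulfides * EPS_CYSTINE)

def molar_concentration(absorbance, epsilon, path_cm=1.0):
    """Beer-Lambert law solved for concentration: c = A / (epsilon * l)."""
    if epsilon <= 0:
        raise ValueError("no Trp, Tyr or cystine: A280 cannot be used")
    return absorbance / (epsilon * path_cm)

eps = extinction_coefficient_280("MWWYK")  # made-up peptide for illustration
print(eps, molar_concentration(0.6245, eps))
```

Proteins lacking tryptophan and tyrosine absorb very little at 280 nm, which is one reason dye-binding or colorimetric assays such as those of Bradford and Lowry are used alongside the absorbance measurement.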

SDS-polyacrylamide gels of various concentrations are purchased from BioRad (Hercules, Calif., USA), and stained with Coomassie blue. Molecular weight markers may include rabbit skeletal muscle myosin (200 kDa), E. coli β-galactosidase (116 kDa), rabbit muscle phosphorylase B (97.4 kDa), bovine serum albumin (66.2 kDa), ovalbumin (45 kDa), bovine carbonic anhydrase (31 kDa), soybean trypsin inhibitor (21.5 kDa), egg white lysozyme (14.4 kDa) and bovine aprotinin (6.5 kDa).

EQUIVALENTS

Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments and methods described herein. The specific embodiments described herein are offered by way of example only, and the invention is to be limited only by the terms of the appended claims, along with the full scope of equivalents to which such claims are entitled.

TABLE 2
Columns: Contig; Orf Id; NT Seq ID; AA Seq ID; NT Orf Length; AA Orf Length; Score; Probability; Description
contig100 35409705_f3_1 1 3406 918 305 437 2.40E-41 [SP:P77327] [OR:ESCHERICHIA COLI] [GN:PROY] [DE:PROLINE
SPECIFIC PERMEASE PROY]
contig101 31647702_f1_1 2 3407 738 246 619 1.20E-60 [SP:P46338] [OR:BACILLUS SUBTILIS] [GN:YQGG]
[DE:REGION PRECURSOR (ORF108)]
contig102 32602186_c2_2 3 3408 627 208 245 5.30E-21 [AC:D90783] [OR:Escherichia coli] [PN:Spermidine/putrescine
transport ATP-binding] [NT:ORF_ID]
contig103 35160251_f1_1 4 3409 534 178 501 4.00E-48 [SP:P54548] [OR:BACILLUS SUBTILIS] [GN:YQJK]
[DE:HYPOTHETICAL 34.0 KD PROTEIN IN GLNQ-ANSR
INTERGENIC REGION]
contig104 781525_c1_3 5 3410 651 216 527 7.00E-51 [SP:P44697] [OR:HAEMOPHILUS INFLUENZAE] [GN:HI0416]
[DE:HYPOTHETICAL PROTEIN HI0416]
contig105 24430340_c3_4 6 3411 510 169 55 0.84 [AC:M30560] [OR:Mus musculus] [NT:Ig mu-chain V-D-J region]
contig105 22321062_c2_3 7 3412 555 184 83 0.14 [SP:P37467] [OR:BACILLUS SUBTILIS] [GN:XPAC] [DE:XPAC
PROTEIN]
contig106 5132942_f2_2 8 3413 255 85 160 7.10E-11 [AC:D86418] [OR:Bacillus subtilis] [PN:YfnI]
contig107 16302276_f1_1 9 3414 372 123 197 6.50E-16 [SP:P54452] [OR:BACILLUS SUBTILIS] [GN:YQEG]
[DE:HYPOTHETICAL 20.1 KD PROTEIN IN NUCB-AROD
INTERGENIC REGION]
contig107 19726707_f3_2 10 3415 366 121 265 4.10E-23 [SP:P54453] [OR:BACILLUS SUBTILIS] [GN:YQEH]
[DE:HYPOTHETICAL 41.0 KD PROTEIN IN NUCB-AROD
INTERGENIC REGION]
contig108 33690625_c3_11 11 3416 591 197 264 5.20E-23 [SP:P45691] [OR:ESCHERICHIA COLI] [GN:YHCS] [DE:(O309)]
contig108 35651090_f1_1 12 3417 213 70 53 0.69 [SP:P10713] [OR:NEUROSPORA CRASSA] [GN:CON-10]
[DE:CONIDIATION-SPECIFIC PROTEIN 10]
contig108 2551651_c2_9 13 3418 366 121 84 0.0049 [AC:U53585] [OR:Mycobacterium avium] [PN:fibronectin attachment
protein] [GN:FAP-A]
contig109 17162_f3_2 14 3419 324 107 172 6.90E-13 [SP:P75942] [OR:ESCHERICHIA COLI] [GN:FLGJ]
[DE:FLAGELLAR PROTEIN FLGJ]
contig109 14954677_f1_1 15 3420 654 218 84 0.82 [SP:P22382] [OR:SIMIAN IMMUNODEFICIENCY VIRUS]
[GN:POL] [DE:TRANSCRIPTASE,; RIBONUCLEASE H,)]
contig11 35677382_f1_1 16 3421 690 229 638 1.90E-62 [SP:P41972] [OR:STAPHYLOCOCCUS AUREUS] [GN:ILES]
[DE:(ILERS)]
contig110 33400061_c2_6 17 3422 963 321 445 3.40E-42 [AC:D50098] [OR:Bacillus subtilis] [PN:multidrug transporter]
[GN:bmr3]
contig110 19817885_c2_5 18 3423 405 134 562 1.40E-54 [SP:P54419] [OR:BACILLUS SUBTILIS] [GN:METK]
[DE:ADENOSYLTRANSFERASE) (ADOMET SYNTHETASE)]
contig111 3994212_f3_3 19 3424 192 63 89 0.00066 [AC:M32103] [OR:Staphylococcus aureus] [NT:ORF-27]
contig111 22151437_f1_1 20 3425 540 179 223 1.10E-18 [AC:M32103] [OR:Staphylococcus aureus] [NT:ORF-27]
contig112 26589762_f2_2 21 3426 396 131 337 1.40E-30 [SP:P23240] [OR:VIBRIO CHOLERAE] [GN:ALDA]
[DE:ALDEHYDE DEHYDROGENASE,]
contig112 25485887_c2_4 22 3427 369 122 378 4.30E-35 [SP:P00343] [OR:LACTOBACILLUS CASEI] [DE:L-LACTATE
DEHYDROGENASE,]
contig113 594567_f1_1 23 3428 273 90 330 5.30E-30 [SP:P07842] [OR:BACILLUS STEAROTHERMOPHILUS]
[GN:RPSI] [DE:30S RIBOSOMAL PROTEIN S9 (BS10)]
contig113 14501449_c1_5 24 3429 588 195 125 2.00E-06 [SP:P20709] [OR:BACTERIOPHAGE L54A] [GN:INT]
[DE:INTEGRASE]
contig114 5910137_f3_2 25 3430 756 251 384 1.00E-35 [AC:U36837] [OR:Lactococcus lactis] [PN:ORFU]
contig114 4486088_f1_1 26 3431 198 65 56 0.43 [AC:PS0274] [OR:Parechinus angulosus] [PN:homeotic protein box6]
[GN:box6]
contig115 22382306_c3_3 27 3432 843 281 1090 1.50E-110 [AC:U72720] [OR:Streptococcus pneumoniae] [PN:heat shock protein
70] [GN:dnaK] [NT:HSP70; partial peptide sequencing was also done]
contig116 253181_f1_1 28 3433 717 238 849 5.30E-85 [AC:M92842] [OR:Listeria monocytogenes] [GN:prs]
contig117 14660938_c3_3 29 3434 1338 446 129 1.00E-05 [SP:P55140] [OR:ESCHERICHIA COLI] [GN:YGCG]
[DE:HYPOTHETICAL 34.9 KD PROTEIN IN CYSJ-ENO
INTERGENIC REGION (O313)]
contig118 7082011_f2_1 30 3435 504 167 168 7.70E-13 [AC:Z82987] [OR:Bacillus subtilis] [PN:unknown, highly similar to E.
coli YecD] [GN:ywoC]
contig118 14625627_c3_4 31 3436 282 93 189 4.60E-15 [AC:Z75208] [OR:Bacillus subtilis] [PN:hypothetical protein]
[GN:ysbB] [NT:homology to ywbG of Bacillus subtilis; putative]
contig119 23861675_c2_3 32 3437 720 240 690 3.70E-68 [SP:P23630] [OR:BACILLUS SUBTILIS] [GN:LYSA]
[DE:DIAMINOPIMELATE DECARBOXYLASE, (DAP
DECARBOXYLASE)]
contig12 25551286_c1_1 33 3438 570 189 608 1.80E-59 [SP:P77834] [OR:BACILLUS STEAROTHERMOPHILUS]
[GN:DEOD] [DE:(PNP)]
contig120 30712532_f3_2 34 3439 501 166 250 1.60E-21 [SP:P39787] [OR:BACILLUS SUBTILIS] [GN:DNAD] [DE:DNA
REPLICATION PROTEIN DNAD]
contig120 34648427_f2_1 35 3440 555 185 423 7.30E-40 [SP:P39788] [OR:BACILLUS SUBTILIS] [GN:NTH]
[DE:APYRIMIDINIC SITE) LYASE)]
contig121 24069450_c3_4 36 3441 819 273 338 7.50E-31 [AC:U64312] [OR:Bacillus firmus] [PN:amidase]
contig122 34084442_f1_1 37 3442 621 206 447 2.10E-42 [AC:Z75208] [OR:Bacillus subtilis] [PN:trigger factor] [GN:tig]
[NT:homology to trigger factor of Haemophilus]
contig122 6148577_f3_3 38 3443 540 179
contig123 22460816_c2_3 39 3444 318 105 480 6.70E-46 [SP:P19775] [OR:STAPHYLOCOCCUS AUREUS] [GN:TNP]
[DE:TRANSPOSASE FOR INSERTION SEQUENCE ELEMENT
IS256 IN TRANSPOSON TN4001]
contig123 33203387_f3_1 40 3445 678 225 142 2.70E-07 [AC:AB001488] [OR:Bacillus subtilis] [GN:yddE] [NT:SIMILAR TO
ORF16 OF ENTEROCOCCUS FAECALIS]
contig124 976590_f1_1 41 3446 1230 410 352 2.40E-32 [AC:D90909] [OR:Synechocystis sp.] [PN:hypothetical protein]
[NT:ORF_ID]
contig125 1268806_f3_1 42 3447 408 135 63 0.999 [AC:D87601] [OR:Measles virus] [PN:fusion protein]
contig125 3945468_f3_2 43 3448 423 141 174 1.00E-12 [SP:P54389] [OR:BACILLUS SUBTILIS] [GN:YPIA]
[DE:HYPOTHETICAL 48.3 KD PROTEIN IN QCRA 5′REGION]
contig126 6929630_f1_1 44 3449 195 64 69 0.52 [AC:U59323] [OR:Homo sapiens] [PN:homolog of yeast UPF1]
[GN:HUPF1] [NT:putative Zn Knuckle; type 1 RNA helicase region;]
contig126 36142138_f2_2 45 3450 393 130 123 4.50E-08 [SP:P05706] [OR:ESCHERICHIA COLI] [GN:SRLB] [DE:II, A
COMPONENT), (EIII-GUT)]
contig127 4100463_c1_2 46 3451 471 156 382 6.30E-40 [SP:P37455] [OR:BACILLUS SUBTILIS] [GN:SSB] [DE:SINGLE-
STRAND BINDING PROTEIN (SSB) (HELIX-DESTABILIZING
PROTEIN)]
contig127 992176_c3_3 47 3452 234 77 66 0.41 [SP:P47254] [OR:MYCOPLASMA GENITALIUM] [GN:THDF]
[DE:POSSIBLE THIOPHENE AND FURAN OXIDATION PROTEIN
THDF]
contig128 35162513_c2_6 48 3453 435 145 60 0.25 [AC:U42580] [OR:Paramecium bursaria Chlorella virus 1]
[GN:a269R]
contig128 4328591_c2_5 49 3454 735 244 111 0.00041 [SP:Q58207] [OR:METHANOCOCCUS JANNASCHII]
[GN:MJ0797] [DE:HYPOTHETICAL PROTEIN MJ0797]
contig129 36207812_c1_6 50 3455 204 68 81 0.0029 [SP:P80239] [OR:BACILLUS SUBTILIS] [GN:AHPC]
[DE:PROTEIN 22)]
contig129 24645438_c1_5 51 3456 801 266 523 1.90E-50 [AC:X99710] [OR:Lactococcus lactis] [PN:methyltransferase]
[NT:homology with (D64004)]
contig129 24506253_c3_7 52 3457 207 68 55 0.52 [AC:U19586] [OR:Kluyveromyces lactis] [NT:similar to
Saccharomyces cerevisiae KIN28,]
contig13 21973437_c2_4 53 3458 369 123 132 8.20E-08 [SP:P47473] [OR:MYCOPLASMA GENITALIUM] [GN:NRDE]
[DE:(RIBONUCLEOTIDE REDUCTASE)]
contig13 4688465_c3_5 54 3459 258 85 166 1.80E-11 [SP:P50620] [OR:BACILLUS SUBTILIS] [GN:NRDE]
[DE:(RIBONUCLEOTIDE REDUCTASE)]
contig13 16597525_f2_2 55 3460 222 73 50 0.9999 [AC:L76581] [OR:Escherichia coli] [PN:unknown]
contig130 35316275_f2_2 56 3461 189 62 68 0.03 [OR:Mycoplasma hyopneumoniae] [PN:hypothetical protein]
contig130 35194512_c3_3 57 3462 489 162 654 2.40E-64 [SP:P12047] [OR:BACILLUS SUBTILIS] [GN:PURB]
[DE:ADENYLOSUCCINATE LYASE, (ADENYLOSUCCINASE)
(ASL)]
contig131 782751_f2_2 58 3463 504 168 51 0.92 [SP:P07367] [OR:RHODOBACTER CAPSULATUS] [GN:PUCA]
[DE:PIGMENT PROTEIN, ALPHA CHAIN)]
contig132 26209843_f2_2 59 3464 798 265 844 1.80E-84 [AC:U51115] [OR:Bacillus subtilis] [PN:GMP synthetase] [GN:guaA]
contig133 4801442_c3_5 60 3465 741 246 521 3.00E-50 [AC:Z94043] [OR:Bacillus subtilis] [PN:hypothetical protein]
[GN:yvdM] [NT:putative beta-phosphoglucomutase]
contig133 10573524_c1_3 61 3466 351 116 152 1.30E-09 [SP:Q10850] [OR:MYCOBACTERIUM TUBERCULOSIS]
[GN:MTCY39.11C] [DE:HYPOTHETICAL 145.8 KD PROTEIN
CY39.11C]
contig134 19823425_f3_2 62 3467 717 238 488 9.50E-47 [SP:P43440] [OR:ENTEROCOCCUS HIRAE] [GN:NTPJ]
[DE:TRANSLOCATING ATPASE SUBUNIT J)]
contig134 35183438_f2_1 63 3468 348 115 214 2.80E-17 [SP:P26235] [OR:ENTEROCOCCUS HIRAE] [GN:NAPA]
[DE:NA(+)/H(+) ANTIPORTER]
contig135 5910176_f3_2 64 3469 186 61 102 2.00E-05 [SP:P44865] [OR:HAEMOPHILUS INFLUENZAE] [GN:GPMA]
[DE:(BPG-DEPENDENT PGAM)]
contig135 34177192_f2_1 65 3470 585 194 655 1.90E-64 [SP:P44865] [OR:HAEMOPHILUS INFLUENZAE] [GN:GPMA]
[DE:(BPG-DEPENDENT PGAM)]
contig135 11931254_c1_4 66 3471 621 206 557 4.60E-54 [SP:Q04944] [OR:CLOSTRIDIUM ACETOBUTYLICUM]
[GN:BDHA] [DE:NADH-DEPENDENT BUTANOL
DEHYDROGENASE A, (BDH I)]
contig136 34179767_c2_3 67 3472 1047 348 171 2.40E-10 [AC:AE000125] [OR:Escherichia coli] [PN:hypothetical protein in
hemL-pfs intergenic] [GN:yadQ] [NT:o473; 100 pct identical to
YADQ_ECOLI SW]
contig137 35406287_f1_1 68 3473 1110 369 1036 8.10E-105 [AC:D78193] [OR:Bacillus subtilis] [GN:yydE]
contig137 2923202_c3_2 69 3474 228 75 65 0.22 [SP:Q28233] [OR:CERVUS ELAPHUS] [GN:IL12A]
[DE:MATURATION FACTOR 35 KD SUBUNIT) (CLMF P35)]
contig138 24823512_c3_3 70 3475 480 159 348 6.50E-32 [AC:AB002150] [OR:Bacillus subtilis] [PN:YbbK]
contig138 24328187_f1_1 71 3476 324 108 65 0.93 [AC:M95596] [OR:Oryctolagus cuniculus] [PN:titin]
contig139 14101442_f3_1 72 3477 1035 344 1360 3.70E-139 [SP:P39815] [OR:BACILLUS SUBTILIS] [GN:GID] [DE:GID
PROTEIN (FRAGMENT)]
contig14 12146825_f2_1 73 3478 183 60 72 0.034 [SP:P46378] [OR:RHODOCOCCUS FASCIANS] [GN:FAS6]
[DE:HYPOTHETICAL 21.1 KD PROTEIN IN FASCIATION LOCUS
(ORF6)]
contig14 11931555_c3_3 74 3479 279 92 304 3.00E-27 [SP:P54537] [OR:BACILLUS SUBTILIS] [GN:YQIZ]
[DE:INTERGENIC REGION]
contig140 34064712_c1_4 75 3480 243 81 61 0.47 [AC:U22157] [OR:Methanosarcina thermophila] [PN:beta-type
proteasome subunit] [GN:psmB]
contig140 1073426_c1_3 76 3481 810 269 66 0.24 [SP:Q01049] [OR:HERPESVIRUS SAIMIRI] [GN:53]
[DE:PUTATIVE MEMBRANE PROTEIN 53]
contig140 12579377_c2_5 77 3482 396 131 164 1.60E-11 [AC:L16534] [OR:Rhodococcus corallinus] [PN:N-ethylammeline
chlorohydrolase] [GN:trzA]
contig141 4961528_c2_5 78 3483 333 111 181 3.30E-14 [SP:P39803] [OR:BACILLUS SUBTILIS] [GN:YITT]
[DE:HYPOTHETICAL 30.5 KD PROTEIN IN IPI 5′REGION (ORF1)]
contig141 6292137_f1_1 79 3484 258 85 61 0.994 [SP:P32745] [OR:HOMO SAPIENS] [GN:SSTR3]
[DE:SOMATOSTATIN RECEPTOR TYPE 3 (SS3R) (SSR-28)]
contig141 16611010_f3_3 80 3485 531 176 521 3.00E-50 [AC:Y09476] [OR:Bacillus subtilis] [PN:YitK] [NT:putative - Some
homology with HI1034 (H.]
contig142 289688_c1_4 81 3486 414 137 175 4.30E-13 [SP:Q38653] [OR:BACTERIOPHAGE A511] [GN:PLY511]
[DE:ENDOLYSIN, (N-ACETYLMURAMOYL-L-ALANINE
AMIDASE)]
contig142 23476577_c3_5 82 3487 198 65 52 0.92 [SP:P46079] [OR:ANABAENA SP] [DE:HYPOTHETICAL 13.8 KD
PROTEIN IN FRAC 3′REGION]
contig142 25583436_f2_1 83 3488 243 81 50 0.99 [AC:Z79753] [OR:Caenorhabditis elegans] [PN:F35E12.8]
contig143 36111001_f1_1 84 3489 252 83 51 0.85 [OR:Presbytis entellus] [PN:MHC class II histocompatibility antigen
DQ alpha chain 1]
contig143 23377_f2_3 85 3490 942 313 278 1.70E-24 [AC:D37826] [OR:Photobacterium damsela subsp. piscicida] [PN:PP-
FLO]
contig144 34617817_f1_1 86 3491 993 330 228 3.40E-19 [SP:P44869] [OR:HAEMOPHILUS INFLUENZAE] [GN:HI0767]
[DE:HYPOTHETICAL PROTEIN HI0767]
contig144 6281316_f2_2 87 3492 276 92 173 2.30E-13 [SP:P23875] [OR:ESCHERICHIA COLI] [GN:KDTB]
[DE:LIPOPOLYSACCHARIDE CORE BIOSYNTHESIS PROTEIN
KDTB]
contig145 36133462_f3_1 88 3493 462 153 61 0.51 [AC:Z75208] [OR:Bacillus subtilis] [PN:hypothetical protein]
[GN:yshA] [NT:unknown function; putative]
contig145 11181916_c1_2 89 3494 201 66 66 0.11 [AC:L25385] [OR:Chloroplast Phragmites australis] [PN:RNA
polymerase beta-subunit] [GN:rpoC2]
contig147 24252326_f1_1 90 3495 303 100 67 0.9999 [AC:U64198] [OR:Homo sapiens] [PN:II-12 receptor beta2]
contig147 36126057_c1_5 91 3496 474 157 338 7.50E-31 [SP:P33385] [OR:LISTERIA MONOCYTOGENES] [DE:(ORFZ)]
contig147 26596327_c1_4 92 3497 195 64 113 5.20E-07 [SP:P54173] [OR:BACILLUS SUBTILIS] [GN:YPJQ]
[DE:HYPOTHETICAL 19.9 KD PROTEIN IN ILVD-THYB
INTERGENIC REGION]
contig148 25423317_f2_2 93 3498 1299 432 79 0.15 [OR:Rattus norvegicus] [PN:protein kinase C substrate, 80K]
contig149 16406308_f3_4 94 3499 351 116 76 0.22 [SP:P15994] [OR:PODOSPORA ANSERINA] [DE:ATP
SYNTHASE A CHAIN, (PROTEIN 6)]
contig149 2742882_f2_3 95 3500 471 157 208 4.40E-17 [AC:Y09476] [OR:Bacillus subtilis] [PN:YisV] [NT:putative]
contig15 3360312_f1_1 96 3501 459 152 516 1.00E-49 [AC:Z93937] [OR:Bacillus subtilis] [PN:unknown] [GN:yufQ]
contig150 10010957_c1_4 97 3502 408 136 289 1.20E-24 [SP:Q54089] [OR:STREPTOCOCCUS EQUISIMILIS] [GN:RELA]
[DE:PROTEIN)]
contig150 444662_c3_6 98 3503 261 86 235 6.10E-20 [SP:P54461] [OR:BACILLUS SUBTILIS] [GN:YQEU]
[DE:HYPOTHETICAL 28.8 KD PROTEIN IN DNAJ-RPSU
INTERGENIC REGION]
contig150 24042813_c3_5 99 3504 192 63 128 3.30E-08 [SP:P54461] [OR:BACILLUS SUBTILIS] [GN:YQEU]
[DE:HYPOTHETICAL 28.8 KD PROTEIN IN DNAJ-RPSU
INTERGENIC REGION]
contig151 4710963_f3_1 100 3505 603 200 59 0.63 [SP:P72851] [OR:SYNECHOCYSTIS SP] [GN:RPMB] [DE:50S
RIBOSOMAL PROTEIN L28]
contig151 24390927_c2_4 101 3506 186 61 65 0.15 [SP:P01365] [OR:SACCHAROMYCES CEREVISIAE]
[GN:MATAL1] [DE:MATING-TYPE PROTEIN ALPHA-1]
contig152 5334552_f1_1 102 3507 291 96 242 1.10E-20 [SP:P55873] [OR:BACILLUS SUBTILIS] [GN:RPLT] [DE:50S
RIBOSOMAL PROTEIN L20]
contig152 12923552_c1_4 103 3508 387 128 228 3.40E-19 [SP:P37507] [OR:BACILLUS SUBTILIS] [GN:YYAQ]
[DE:HYPOTHETICAL 13.9 KD PROTEIN IN COTF-TETB
INTERGENIC REGION]
contig152 31438465_f3_3 104 3509 204 67
contig153 24636010_f2_2 105 3510 240 79 235 6.10E-20 [SP:P37807] [OR:BACILLUS SUBTILIS] [GN:RPMB]
[DE:RIBOSOMAL PROTEIN L28]
contig153 5370716_c3_3 106 3511 510 169 50 0.9995 [AC:Y14082] [OR:Bacillus subtilis] [PN:hypothetical protein]
[GN:yhdS] [NT:Similarity to a hypothetical protein, YF18, from]
contig154 24015950_c3_5 107 3512 309 102
contig154 6928125_f3_4 108 3513 402 134 53 0.86 [OR:Entamoeba histolytica] [PN:amoebapore B]
contig155 6095133_f1_1 109 3514 513 170 139 9.10E-10 [AC:Y14081] [OR:Bacillus subtilis] [PN:hypothetical protein]
[GN:yhjE] [NT:Similarity to hypothetical protein yqeD from]
contig155 19687530_f2_3 110 3515 312 103 51 0.98 [AC:Z84823] [OR:Nicotiana tabacum] [PN:phospholipase D] [NT:5′
fragment]
contig155 21724200_f1_2 111 3516 456 151 75 0.83 [AC:Z49968] [OR:Caenorhabditis elegans] [PN:M110.5]
[NT:similarity to the Drosophila disabled protein; cDNA]
contig156 259825_c3_5 112 3517 234 78 55 0.52 [SP:P51415] [OR:MYCOPLASMA CAPRICOLUM] [GN:RPMG]
[DE:50S RIBOSOMAL PROTEIN L33]
contig156 23953562_c3_4 113 3518 546 181 325 1.80E-29 [SP:P54475] [OR:BACILLUS SUBTILIS] [GN:YQFR]
[DE:PROBABLE RNA HELICASE IN CCCA-SODA INTERGENIC
REGION]
contig157 14665802_f3_1 114 3519 804 268 910 1.80E-91 [SP:Q53727] [OR:STAPHYLOCOCCUS AUREUS] [GN:PCRA]
[DE:ATP-DEPENDENT HELICASE PCRA,]
contig158 207340_f3_2 115 3520 1059 352 343 2.20E-31 [SP:P44550] [OR:HAEMOPHILUS INFLUENZAE] [GN:HI0172]
[DE:HYPOTHETICAL PROTEIN HI0172]
contig158 25880002_c2_3 116 3521 270 89 55 0.999 [AC:Z86089] [OR:Mycobacterium tuberculosis] [PN:unknown]
[GN:MTCY0A4.04c] [NT:MTCY0A4.04c, 381 aa, some similarity to]
contig159 16611577_f3_3 117 3522 501 166 488 9.50E-47 [SP:P42923] [OR:BACILLUS SUBTILIS] [GN:RPLJ] [DE:50S
RIBOSOMAL PROTEIN L10 (BL5)]
contig16 7082035_f2_1 118 3523 966 322 853 2.00E-85 [SP:Q24803] [OR:ENTAMOEBA HISTOLYTICA] [GN:ADH2]
[DE:ALCOHOL DEHYDROGENASE 2,]
contig160 2601444_c1_1 119 3524 327 108 255 4.80E-21 [AC:L26286] [OR:Schistosoma mansoni] [PN:SMDR1]
contig161 19729692_f3_2 120 3525 414 137 407 3.60E-38 [SP:P37887] [OR:BACILLUS SUBTILIS] [GN:CYSK] [DE:(O-
ACETYLSERINE (THIOL)-LYASE) (CSASE)]
contig161 23469005_f2_1 121 3526 591 196 468 1.20E-44 [SP:P37887] [OR:BACILLUS SUBTILIS] [GN:CYSK] [DE:(O-
ACETYLSERINE (THIOL)-LYASE) (CSASE)]
contig162 14582001_f1_1 122 3527 876 292 303 1.00E-25 [SP:Q10850] [OR:MYCOBACTERIUM TUBERCULOSIS]
[GN:MTCY39.11C] [DE:HYPOTHETICAL 145.8 KD PROTEIN
CY39.11C]
contig163 24331561_c2_5 123 3528 1023 341 1046 7.00E-106 [SP:Q00752] [OR:STREPTOCOCCUS MUTANS] [GN:MSMK]
[DE:MULTIPLE SUGAR-BINDING TRANSPORT ATP-BINDING
PROTEIN MSMK]
contig164 35957051_f1_1 124 3529 495 164 298 1.30E-26 [SP:P33661] [OR:CLOSTRIDIUM ACETOBUTYLICUM]
[DE:HYPOTHETICAL 15.2 KD PROTEIN IN SIGG 3′REGION (ORF
V)]
contig164 2242937_f1_2 125 3530 309 102 129 9.30E-08 [SP:P07908] [OR:BACILLUS SUBTILIS] [GN:DNAB]
[DE:REPLICATION INITIATION AND MEMBRANE
ATTACHMENT PROTEIN]
contig165 35410177_f1_1 126 3531 489 162 53 0.97 [SP:P00840] [OR:ZEA MAYS] [GN:ATP9] [DE:PROTEIN)]
contig165 6542687_f1_2 127 3532 321 107 213 3.20E-17 [SP:P37112] [OR:BACILLUS STEAROTHERMOPHILUS]
[GN:AMA] [DE:N-ACYL-L-AMINO ACID AMIDOHYDROLASE,
(AMINOACYLASE)]
contig166 34197182_c3_3 128 3533 720 239 733 1.00E-72 [SP:P35159] [OR:BACILLUS SUBTILIS] [GN:YPUL]
[DE:HYPOTHETICAL 26.0 KD PROTEIN IN SPMB-AROC
INTERGENIC REGION (ORFX13)]
contig167 26220638_c1_4 129 3534 306 102 119 2.00E-07 [OR:Methanococcus jannaschii] [PN:hypothetical protein MJ1163]
contig167 12535400_c1_3 130 3535 501 166 120 1.00E-06 [OR:Streptomyces coelicolor] [PN:actVA-1 protein]
contig168 24628186_c2_4 131 3536 666 221 595 4.40E-58 [AC:D83026] [OR:Bacillus subtilis] [GN:cydD] [NT:homologous to
many ATP-binding transport proteins;]
contig168 36140775_c1_3 132 3537 213 70 64 0.12 [SP:P47579] [OR:MYCOPLASMA GENITALIUM] [GN:MG337]
[DE:HYPOTHETICAL PROTEIN MG337]
contig169 14114030_f3_1 133 3538 876 291 258 2.20E-22 [AC:Y09476] [OR:Bacillus subtilis] [PN:DegA]
contig169 24401533_c3_5 134 3539 384 127
contig17 5117318_f2_1 135 3540 786 261 422 9.40E-40 [AC:U51911] [OR:Bacillus subtilis] [PN:unknown] [GN:ykrA]
[NT:similar in C-terminus to partial sequence of orf1]
contig170 24425150_f3_1 136 3541 609 202 60 0.5 [OR:Homo sapiens] [PN:Sm protein G]
contig170 13148452_f3_2 137 3542 240 80 74 0.096 [AC:Y10304] [OR:Bacillus subtilis] [GN:priA]
contig171 5098212_f1_1 138 3543 570 189 128 6.40E-07 [SP:Q03158] [OR:STREPTOCOCCUS PNEUMONIAE] [GN:ENDA]
[DE:DNA-ENTRY NUCLEASE (COMPETENCE-SPECIFIC
NUCLEASE),]
contig171 23629682_f2_2 139 3544 267 89 78 0.019 [AC:U88907] [OR:Pseudomonas wisconsinensis] [PN:lipase helper
protein] [GN:lpwB] [NT:LpwB; necessary for activation of LpwA]
contig172 16800967_f1_1 140 3545 1638 545 434 5.00E-41 [AC:U67998] [OR:Sinorhizobium meliloti] [PN:cyclic beta-1,2-glucan
modification protein] [GN:cgmA] [NT:ORF2; with similarity to rkpl
gene product encoded]
contig173 7071877_f1_1 141 3546 852 283 751 1.30E-74 [SP:P43472] [OR:PEDIOCOCCUS PENTOSACEUS] [GN:SCRR]
[DE:SUCROSE (SCR) OPERON REPRESSOR]
contig174 1300800_c2_3 142 3547 738 245 566 5.10E-55 [SP:P54521] [OR:BACILLUS SUBTILIS] [GN:YQIB] [DE:VII
LARGE SUBUNIT)]
contig175 6330408_f2_2 143 3548 375 124 330 5.30E-30 [SP:P39796] [OR:BACILLUS SUBTILIS] [GN:TRER]
[DE:TREHALOSE OPERON TRANSCRIPTIONAL REPRESSOR]
contig175 35360936_f3_3 144 3549 690 229 52 0.999 [AC:M55534] [OR:Rattus norvegicus] [GN:alpha(B)-crystallin]
[NT:ORF2]
contig176 4070317_f1_1 145 3550 213 70 56 0.43 [AC:Z66494] [OR:Caenorhabditis elegans] [PN:T01B7.1]
contig176 26306587_c2_6 146 3551 495 164 66 0.997 [AC:U38915] [OR:Synechocystis sp.] [PN:hypothetical transposase]
[NT:C-terminal part of the truncated hypothetical]
contig176 16212837_c3_7 147 3552 720 239 607 2.30E-59 [SP:P31458] [OR:ESCHERICHIA COLI] [GN:YIDU]
[DE:HYPOTHETICAL 64.0 KD PROTEIN IN IBPA-GYRB
INTERGENIC REGION]
contig177 13861532_c2_4 148 3553 813 270 52 0.78 [AC:X92955] [OR:Brassica oleracea] [PN:pollen coat protein]
[NT:putative]
contig177 26839662_c3_5 149 3554 615 204 54 0.7 [AC:JH0658] [OR:Chlamydia psittaci] [PN:histone H1-like protein]
contig178 26578936_f1_1 150 3555 243 80 191 3.40E-15 [SP:P54530] [OR:BACILLUS SUBTILIS] [GN:YQIS]
[DE:(PHOSPHOTRANSBUTYRYLASE)]
contig178 22870312_f1_2 151 3556 696 232 627 1.80E-61 [OR:Clostridium acetobutylicum] [PN:butyrate kinase]
contig179 6523282_f1_1 152 3557 450 149
contig179 20078317_f3_3 153 3558 528 175 199 8.30E-15 [AC:Y14078] [OR:Bacillus subtilis] [PN:Hypothetical protein]
[GN:yhaN] [NT:similarity to orfX from Staphylococcus aureus]
contig18 25414063_f3_1 154 3559 801 267 310 8.70E-28 [AC:D90905] [OR:Synechocystis sp.] [PN:hypothetical protein]
[NT:ORF_ID]
contig180 3907966_f3_1 155 3560 579 192 267 2.50E-23 [AC:AB001488] [OR:Bacillus subtilis] [GN:ydgI] [NT:SIMILAR TO
NITROREDUCTASE.]
contig181 20507325_f1_1 156 3561 546 181 485 2.00E-46 [AC:U57759] [OR:Streptococcus gordonii] [PN:intrageneric
coaggregation-relevant adhesin]
contig181 34275682_f2_3 157 3562 780 259 78 0.997 [SP:P26027] [OR:BRADYRHIZOBIUM JAPONICUM] [GN:NODU]
[DE:NODULATION PROTEIN U,]
contig181 4875063_c2_4 158 3563 318 105 207 5.70E-17 [AC:U51115] [OR:Bacillus subtilis] [PN:unknown] [NT:yebF]
contig182 1052192_c2_5 159 3564 246 82 98 2.00E-05 [SP:P23884] [OR:ESCHERICHIA COLI] [GN:GCVH]
[DE:GLYCINE CLEAVAGE SYSTEM H PROTEIN]
contig182 24808067_c3_7 160 3565 384 127 120 9.40E-08 [SP:P54503] [OR:BACILLUS SUBTILIS] [GN:YQGZ]
[DE:HYPOTHETICAL 14.8 KD PROTEIN IN SODA-COMGA
INTERGENIC REGION]
contig182 31853510_c3_6 161 3566 687 228 196 3.20E-15 [SP:P39604] [OR:BACILLUS SUBTILIS] [GN:YWCF]
[DE:HYPOTHETICAL 43.3 KD PROTEIN IN QOXD-VPR
INTERGENIC REGION]
contig183 35290627_f2_1 162 3567 510 169 67 0.088 [AC:Z81331] [OR:Mycobacterium tuberculosis] [PN:unknown]
[GN:MTCY16B7.35c] [NT:MTCY16B7.35c, unknown, len]
contig183 25680250_f2_2 163 3568 945 315 724 9.30E-72 [SP:P54460] [OR:BACILLUS SUBTILIS] [GN:YQET]
[DE:HYPOTHETICAL 34.6 KD PROTEIN IN DNAJ-RPSU
INTERGENIC REGION]
contig184 34632787_f3_3 164 3569 372 123 53 0.84 [AC:L47974] [OR:Bos taurus] [PN:TATA-box binding protein]
contig184 34586007_c1_4 165 3570 387 128
contig184 401683_c3_5 166 3571 306 101 175 4.30E-13 [AC:U19620] [OR:Agrobacterium tumefaciens] [PN:MocD]
[GN:mocD]
contig185 11832325_f2_2 167 3572 849 282 811 5.60E-81 [AC:AF000658] [OR:Streptococcus pneumoniae] [PN:putative serine
protease] [GN:sphtra] [NT:SPHtra]
contig185 14272500_c2_4 168 3573 366 121 219 4.50E-17 [SP:P43820] [OR:HAEMOPHILUS INFLUENZAE] [GN:PHET]
[DE:TRNA LIGASE BETA CHAIN) (PHERS)]
contig186 36026883_f1_1 169 3574 372 123 164 2.00E-12 [SP:P54519] [OR:BACILLUS SUBTILIS] [GN:YQHY]
[DE:HYPOTHETICAL 14.7 KD PROTEIN IN ACCC-FOLD
INTERGENIC REGION]
contig186 14501578_f2_3 170 3575 531 177 210 2.20E-16 [SP:P47609] [OR:MYCOPLASMA GENITALIUM] [GN:MG369]
[DE:HYPOTHETICAL PROTEIN MG369]
contig187 24742937_c1_5 171 3576 885 295 647 1.30E-63 [SP:Q46171] [OR:CLOSTRIDIUM PERFRINGENS] [GN:ARCC]
[DE:CARBAMATE KINASE,]
contig187 35253525_c2_6 172 3577 654 217 179 3.00E-13 [SP:P37489] [OR:BACILLUS SUBTILIS] [GN:YYBO]
[DE:HYPOTHETICAL 48.2 KD PROTEIN IN COTF-TETB
INTERGENIC REGION]
contig188 6350281_f1_1 173 3578 1086 362 631 6.70E-62 [SP:Q53727] [OR:STAPHYLOCOCCUS AUREUS] [GN:PCRA]
[DE:ATP-DEPENDENT HELICASE PCRA,]
contig189 34252285_c2_3 174 3579 864 288 686 9.90E-68 [SP:Q46807] [OR:ESCHERICHIA COLI] [GN:YQEA]
[DE:CARBAMATE KINASE-LIKE PROTEIN 1]
contig189 973278_c3_4 175 3580 300 99 65 0.9991 [SP:P24588] [OR:HOMO SAPIENS] [DE:REGULATORY
SUBUNIT II HIGH AFFINITY BINDING PROTEIN)]
contig19 32657757_c2_1 176 3581 543 181 492 3.60E-47 [SP:P54476] [OR:BACILLUS SUBTILIS] [GN:YQFS]
[DE:PROBABLE ENDONUCLEASE IV,
(ENDODEOXYRIBONUCLEASE IV)]
contig19 4790931_c3_2 177 3582 327 108 68 0.992 [AC:AB000622] [OR:Enterobacter cloacae] [PN:MelY] [GN:melY]
contig190 29392275_f2_2 178 3583 279 92 67 0.21 [SP:P43013] [OR:HAEMOPHILUS INFLUENZAE] [GN:HI1366]
[DE:HYPOTHETICAL PROTEIN HI1366 (ORF3)]
contig190 26570830_c1_6 179 3584 690 229 418 2.50E-39 [OR:Methanococcus jannaschii] [PN:ABC transporter probable ATP-
binding subunit homolog]
contig190 14631452_c3_7 180 3585 219 72 53 0.89 [SP:P42618] [OR:ESCHERICHIA COLI] [GN:YQJE]
[DE:HYPOTHETICAL 15.1 KD PROTEIN IN EXUR-TDCC
INTERGENIC REGION]
contig191 24728386_c3_4 181 3586 732 244 538 7.90E-52 [AC:Z75208] [OR:Bacillus subtilis] [PN:hypothetical protein]
[GN:yshD] [NT:shows homology to mutS of Thermus aquaticus;]
contig191 20523312_c2_3 182 3587 543 180 356 8.40E-32 [AC:Z75208] [OR:Bacillus subtilis] [PN:hypothetical protein]
[GN:yshD] [NT:shows homology to mutS of Thermus aquaticus;]
contig192 24656655_f2_3 183 3588 291 96 169 6.00E-13 [SP:P07472] [OR:HALOPHILIC EUBACTERIUM NRCC 41227]
[GN:RPLL] [DE:50S RIBOSOMAL PROTEIN L7/L12 (‘A’ TYPE)]
contig192 33867817_f1_1 184 3589 198 65 54 0.96 [OR:Mus musculus] [PN:corticosteroid-binding globulin]
contig192 19726550_c2_4 185 3590 438 145 71 0.37 [OR:Arabidopsis thaliana] [PN:V-type proton-ATPase]
contig193 7032562_f1_1 186 3591 945 314 1203 1.60E-122 [OR:Bacillus subtilis] [PN:recE protein] [GN:recE]
contig193 630301_c3_4 187 3592 183 60
contig194 1953142_f1_1 188 3593 522 173 298 1.30E-26 [AC:U73111] [OR:Salmonella typhimurium] [PN:high-affinity
periplasmic glutamine binding]
contig194 32226087_f2_2 189 3594 852 284 275 3.50E-24 [SP:P10344] [OR:ESCHERICHIA COLI] [GN:GLNH]
[DE:GLUTAMINE-BINDING PERIPLASMIC PROTEIN
PRECURSOR (GLNBP)]
contig195 4884708_c3_3 190 3595 1245 415 861 2.80E-86 [AC:Y08559] [OR:Bacillus subtilis] [PN:Unknown] [GN:ywnE]
[NT:Product similar to Escherichia coli cardiolipin]
contig196 22270327_f3_4 191 3596 183 60 58 0.29 [SP:P13308] [OR:BACTERIOPHAGE T4] [GN:Y06C]
[DE:HYPOTHETICAL 8.5 KD PROTEIN IN TK-VS INTERGENIC
REGION]
contig196 30578406_c3_6 192 3597 216 71 83 0.0045 [OR:Bacillus amyloliquefaciens] [PN:probable phosphotransferase
system enzyme II, fructose-specific]
contig196 23906264_c1_5 193 3598 1392 463 616 2.60E-60 [SP:P23387] [OR:RHODOBACTER CAPSULATUS] [GN:FRUA]
[DE:(EC 2.7.1.69) (EII-FRU)]
contig197 21680187_f2_2 194 3599 735 244 223 1.10E-18 [AC:AB002150] [OR:Bacillus subtilis] [PN:YbbH]
contig198 3915192_c2_3 195 3600 957 319 89 0.15 [SP:P00551] [OR:Escherichia coli] [NT:phosphotransferase (AA 1-
271)]
contig199 24329703_c3_4 196 3601 603 200 219 3.00E-18 [SP:P75144] [OR:MYCOPLASMA PNEUMONIAE] [GN:MGPA]
[DE:MGPA PROTEIN]
contig2 23646932_f2_1 197 3602 285 95 140 7.10E-10 [AC:L29324] [OR:Streptococcus pneumoniae] [PN:repressor protein]
[NT:ORF3]
contig20 23632010_f3_1 198 3603 462 153 321 6.70E-28 [SP:P55465] [OR:RHIZOBIUM SP] [GN:Y4GI]
[DE:HYPOTHETICAL 102.8 KD PROTEIN Y4GI]
contig20 32429577_f3_2 199 3604 387 128
contig200 16142807_f3_2 200 3605 201 66 94 0.00063 [OR:Craterostigma plantagineum] [PN:transketolase, 3]
contig200 29579660_f2_1 201 3606 837 278 926 3.70E-93 [AC:Z73234] [OR:Bacillus subtilis] [PN:transketolase] [GN:tktA]
contig200 2460378_c3_7 202 3607 507 168 98 0.0001 [AC:AE000352] [OR:Escherichia coli] [NT:o305; This 305 aa orf is
20 pct identical (9 gaps)]
contig201 10664542_f3_3 203 3608 999 332 400 2.00E-37 [AC:U29454] [OR:Staphylococcus aureus] [PN:penicillin binding
protein 4] [GN:pbpD] [NT:PBP4; low molecular weight PBP; Method]
contig202 34084415_f3_3 204 3609 192 63 133 4.30E-09 [OR:Methanococcus jannaschii] [PN:hypothetical protein MJ1163]
contig202 3945463_f3_4 205 3610 1329 442 139 4.80E-07 [SP:Q57647] [OR:METHANOCOCCUS JANNASCHII]
[GN:MJ0188] [DE:HYPOTHETICAL PROTEIN MJ0188]
contig202 32242840_f1_1 206 3611 231 77 129 4.70E-08 [SP:P75144] [OR:MYCOPLASMA PNEUMONIAE] [GN:MGPA]
[DE:MGPA PROTEIN]
contig203 22868761_f1_1 207 3612 417 138 372 1.90E-34 [SP:P42085] [OR:BACILLUS SUBTILIS] [GN:XPT]
[DE:XANTHINE PHOSPHORIBOSYLTRANSFERASE,]
contig203 25676592_f3_2 208 3613 666 222 422 9.40E-40 [SP:P42086] [OR:BACILLUS SUBTILIS] [GN:PBUX]
[DE:XANTHINE PERMEASE]
contig204 3946941_c1_3 209 3614 399 132 110 1.10E-06 [SP:P33645] [OR:ESCHERICHIA COLI] [GN:CHPA] [DE:PEMK-
LIKE PROTEIN 1 (MAZF PROTEIN)]
contig204 7244062_c3_4 210 3615 249 82 61 0.89 [AC:U24188] [OR:Lilium longiflorum] [PN:calcium/calmodulin-
dependent protein kinase] [GN:CCaMK] [NT:serine/threonine kinase;
binds to calcium and]
contig205 5132817_f2_1 211 3616 894 297 184 4.40E-13 [SP:P39074] [OR:BACILLUS SUBTILIS] [GN:BMRU] [DE:BMRU
PROTEIN]
contig206 19728452_c2_8 212 3617 822 274 84 0.76 [AC:U97189] [OR:Caenorhabditis elegans] [GN:C48B6.7]
contig206 9870317_c2_7 213 3618 183 60 59 0.24 [AC:Z82015] [OR:Bacillus subtilis] [GN:yukI] [NT:yukI is new name
for yuxI]
contig206 167176_c3_9 214 3619 429 142
contig206 25445312_c1_6 215 3620 558 185 50 0.997 [AC:U67984] [OR:Pongo pygmaeus] [PN:Charcot-Leyden crystal
protein] [NT:contains carbohydrate recognition domain;]
contig207 36375811_f1_1 216 3621 801 266 775 3.70E-77 [AC:U28137] [OR:Lactobacillus casei] [PN:Ccpa protein] [GN:ccpA]
contig207 23957837_c2_3 217 3622 621 206 132 5.00E-09 [AC:X81089] [OR:Lactococcus lactis] [NT:ORF2]
contig208 969452_c3_3 218 3623 723 241 248 2.60E-21 [SP:Q60283] [OR:METHANOCOCCUS JANNASCHII]
[GN:MJECL24] [DE:HYPOTHETICAL PROTEIN MJECL24]
contig209 35406412_f2_1 219 3624 537 178 135 2.40E-09 [AC:D90768] [OR:Escherichia coli] [PN:Immunity repressor protein.]
[GN:ycjC] [NT:ORF_ID]
contig209 22526950_f3_2 220 3625 384 127 302 4.90E-27 [SP:P45171] [OR:HAEMOPHILUS INFLUENZAE] [GN:POTA]
[DE:SPERMIDINE/PUTRESCINE TRANSPORT ATP-BINDING
PROTEIN POTA]
contig209 24414127_f3_3 221 3626 225 74 156 5.00E-11 [SP:P44531] [OR:HAEMOPHILUS INFLUENZAE] [GN:HI0126]
[DE:HYPOTHETICAL ABC TRANSPORTER ATP-BINDING
PROTEIN HI0126]
contig209 3907127_f3_4 222 3627 216 72 176 4.60E-13 [SP:P23858] [OR:ESCHERICHIA COLI] [GN:POTA]
[DE:SPERMIDINE/PUTRESCINE TRANSPORT ATP-BINDING
PROTEIN POTA]
contig21 4328382_c3_1 223 3628 459 152 228 3.40E-19 [OR:Bacillus megaterium] [PN:hypothetical protein 2]
contig210 32221951_c3_1 224 3629 708 235 549 3.30E-53 [SP:P19210] [OR:BACILLUS FIRMUS] [GN:MUTM]
[DE:GLYCOSYLASE)]
contig211 14648452_c2_6 225 3630 687 229 70 0.74 [AC:M61022] [OR:Mus musculus] [PN:immunoglobulin heavy chain
VDJ region]
contig212 19579212_c3_1 226 3631 360 119 300 7.90E-27 [OR:Streptococcus thermophilus] [PN:transposase]
contig213 1208316_f2_1 227 3632 762 253 87 0.12 [SP:P39787] [OR:BACILLUS SUBTILIS] [GN:DNAD] [DE:DNA
REPLICATION PROTEIN DNAD]
contig213 26776567_f2_2 228 3633 333 110 78 0.0075 [SP:P32529] [OR:SACCHAROMYCES CEREVISIAE] [GN:RPA12]
[DE:(A12.2)]
contig214 26384681_f3_3 229 3634 432 143 497 1.10E-47 [AC:U39612] [OR:Streptococcus mutans] [PN:formyl-tetrahydrofolate
synthetase] [GN:fhs] [NT:formyl-tetrahydrofolate ligase; ATP-
dependent]
contig214 5164025_f1_1 230 3635 903 300 989 7.70E-100 [AC:U39612] [OR:Streptococcus mutans] [PN:formyl-tetrahydrofolate
synthetase] [GN:fhs] [NT:formyl-tetrahydrofolate ligase; ATP-
dependent]
contig215 22535885_c3_2 231 3636 708 236 347 8.30E-32 [SP:P54176] [OR:BACILLUS CEREUS] [DE:HEMOLYSIN III]
contig215 25667192_f1_1 232 3637 369 123 263 6.60E-23 [AC:Z16422] [OR:Staphylococcus aureus] [PN:unknown] [GN:ORF2]
contig216 24025151_c2_3 233 3638 918 305 586 3.90E-57 [SP:P09374] [OR:ESCHERICHIA COLI] [GN:PFLA]
[DE:PYRUVATE FORMATE-LYASE 1 ACTIVATING ENZYME,]
contig216 20488149_c1_2 234 3639 435 144 344 1.70E-31 [SP:P37836] [OR:CHLAMYDOMONAS REINHARDTII] [GN:PF1]
[DE:(FRAGMENT)]
contig217 651708_f3_3 235 3640 279 92 115 1.20E-06 [SP:P33566] [OR:NEISSERIA GONORRHOEAE] [GN:PILD]
[DE:TYPE 4 PREPILIN-LIKE PROTEIN SPECIFIC LEADER
PEPTIDASE,]
contig217 24317803_c2_5 236 3641 297 98 73 0.086 [SP:P44493] [OR:HAEMOPHILUS INFLUENZAE] [GN:AMIB]
[DE:PROBABLE N-ACETYLMURAMOYL-L-ALANINE AMIDASE
PRECURSOR,]
contig217 10743932_c3_6 237 3642 399 132 431 1.00E-40 [AC:U81957] [OR:Streptococcus gordonii] [PN:RNA polymerase beta′
subunit] [GN:rpoC]
contig218 34570202_f3_1 238 3643 924 308 582 2.80E-63 [AC:D64005] [OR:Synechocystis sp.] [PN:cadmium-transporting
ATPase] [GN:cadA] [NT:ORF_ID]
contig219 16853152_f2_2 239 3644 1305 435 277 4.80E-24 [OR:Saccharopolyspora erythraea] [PN:glutamate transport protein
homolog] [GN:hgtA]
contig22 23457281_f3_1 240 3645 867 288 905 6.10E-91 [SP:P04077] [OR:BACILLUS CALDOTENAX] [GN:TYRS]
[DE:TYROSYL-TRNA SYNTHETASE, (TYROSINE--TRNA
LIGASE) (TYRRS)]
contig220 17002213_c1_3 241 3646 417 138 257 2.90E-22 [SP:P39667] [OR:BACILLUS SUBTILIS] [GN:YRXA]
[DE:HYPOTHETICAL 19.7 KD PROTEIN IN PHEA-NIFS
INTERGENIC REGION (ORF1)]
contig220 25679500_c3_4 242 3647 645 214 318 9.80E-29 [SP:P77791] [OR:ESCHERICHIA COLI] [GN:YLAD] [DE:20.0 KD
PROTEIN IN TESB-HHA INTERGENIC REGION]
contig221 4039802_f2_1 243 3648 843 280 144 4.50E-08 [AC:JC6007] [OR:Bacillus thuringiensis] [PN:transcriptional activator
plcR] [GN:plcR]
contig221 34195262_f3_2 244 3649 510 169 142 2.70E-09 [AC:U65015] [OR:Vibrio furnissii] [PN:GlcNAc 6-P deacetylase]
[GN:manD] [NT:ManD]
contig222 12218900_f2_2 245 3650 312 103 136 8.90E-09 [SP:Q38653] [OR:BACTERIOPHAGE A511] [GN:PLY511]
[DE:ENDOLYSIN, (N-ACETYLMURAMOYL-L-ALANINE
AMIDASE)]
contig222 16822026_f3_3 246 3651 822 273 266 3.20E-23 [SP:Q38653] [OR:BACTERIOPHAGE A511] [GN:PLY511]
[DE:ENDOLYSIN, (N-ACETYLMURAMOYL-L-ALANINE
AMIDASE)]
contig222 30078177_f3_4 247 3652 567 188 328 8.60E-30 [SP:P74696] [OR:SYNECHOCYSTIS SP] [GN:TRUB] [DE:TRNA
PSEUDOURIDINE 55 SYNTHASE (PSI55 SYNTHASE)]
contig223 33400316_c2_5 248 3653 396 132 212 1.70E-17 [OR:Streptococcus salivarius] [PN:hypothetical protein]
contig223 864187_c2_4 249 3654 663 220 62 0.9998 [AC:U93364] [OR:Lactococcus lactis cremoris] [PN:EpsR] [GN:epsR]
contig224 23523557_c1_5 250 3655 252 84 65 0.64 [AC:U86345] [OR:Trypanosoma brucei rhodesiense] [PN:GP63-1
surface protease homolog] [GN:gp63-1] [NT:homolog of Leishmania
GP63 surface protease;]
contig224 20820886_c1_4 251 3656 234 77 55 0.94 [AC:AB001488] [OR:Bacillus subtilis] [GN:ydfE] [NT:FUNCTION
UNKNOWN.]
contig224 6739062_c3_7 252 3657 948 315 545 8.70E-53 [SP:P42015] [OR:BACILLUS STEAROTHERMOPHILUS]
[GN:PTSG] [DE:COMPONENT), (EII-GLC/EIII-GLC)
(FRAGMENT)]
contig225 35394531_f2_1 253 3658 684 227 293 4.40E-26 [SP:P26212] [OR:BACILLUS SUBTILIS] [GN:SACT] [DE:SACPA
OPERON ANTITERMINATOR]
contig225 33396937_f2_2 254 3659 615 204 321 8.50E-29 [AC:U65014] [OR:Vibrio furnissii] [PN:PTS permease for N-
acetylglucosamine and] [GN:nagE] [NT:PTS enzyme IINag]
contig226 7146942_c1_5 255 3660 609 202 128 5.50E-07 [SP:P45544] [OR:ESCHERICHIA COLI] [GN:YHFR] [DE:(O265)]
contig226 4808462_f3_3 256 3661 192 63 55 0.84 [SP:P45710] [OR:BACILLUS SUBTILIS] [GN:YOXI]
[DE:HYPOTHETICAL 18.1 KD PROTEIN IN CCDA 3′REGION]
contig226 5109387_f1_1 257 3662 264 87 51 0.9995 [OR:Aspergillus fumigatus] [PN:chs A protein]
contig227 20898510_f2_1 258 3663 1062 354 1820 6.70E-188 [OR:Enterococcus faecalis] [PN:cylM protein]
contig228 33672192_f2_1 259 3664 1230 409 878 4.50E-88 [AC:Y08559] [OR:Bacillus subtilis] [PN:Unknown] [GN:ywnE]
[NT:Product similar to Escherichia coli cardiolipin]
contig229 158568_f1_1 260 3665 846 281 53 0.997 [SP:P38636] [OR:SACCHAROMYCES CEREVISIAE] [GN:ATX1]
[DE:METAL HOMEOSTASIS FACTOR ATX1]
contig23 26179687_c3_5 261 3666 387 129 138 1.20E-09 [AC:AB001488] [OR:Bacillus subtilis] [GN:ydeP] [NT:FUNCTION
UNKNOWN, SIMILAR PRODUCT IN E. COLI, H.]
contig23 494557_c3_4 262 3667 387 128 172 2.90E-13 [OR:Streptococcus mutans] [PN:orf X 5′ of lacR]
contig230 33401717_c1_3 263 3668 1194 398 459 1.10E-42 [AC:D90911] [OR:Synechocystis sp.] [PN:cation-transporting ATPase
PacL] [GN:pacL] [NT:ORF_ID]
contig231 31678968_f3_2 264 3669 726 241 101 0.002 [AC:Z50854] [OR:Enterococcus hirae] [GN:orf]
contig231 24644013_f3_3 265 3670 252 83 60 0.36 [AC:U41518] [OR:Homo sapiens] [PN:channel-like integral membrane
protein] [GN:AQP-1] [NT:aquaporin-1]
contig232 14652268_c2_2 266 3671 858 285 364 4.00E-33 [SP:Q02469] [OR:SHEWANELLA PUTREFACIENS]
[DE:(FLAVOCYTOCHROME C)]
contig233 11116326_f2_1 267 3672 861 286 355 1.20E-32 [AC:D83026] [OR:Bacillus subtilis] [GN:yxkD] [NT:homologous to
jojC gene product (B. subtilis;]
contig234 22474077_c1_3 268 3673 822 274 537 3.20E-56 [SP:P39046] [OR:ENTEROCOCCUS HIRAE] [DE:(MURAMIDASE
2)]
contig234 22464077_f1_1 269 3674 348 116 154 1.30E-10 [AC:AB001488] [OR:Bacillus subtilis] [GN:ydeR] [NT:PROBABLE
INTEGRAL MEMBRANE PROTEIN, SIMILAR TO]
contig235 30642252_f3_3 270 3675 1704 567 94 0.36 [AC:D90906] [OR:Synechocystis sp.] [PN:DNA helicase II]
[GN:uvrD] [NT:ORF_ID]
contig236 9932836_c1_3 271 3676 495 164 180 4.10E-14 [OR:Listeria monocytogenes] [PN:probable transport protein arpJ]
contig236 14567268_c1_2 272 3677 309 102
contig237 25439437_c3_7 273 3678 750 250 288 1.50E-25 [SP:P45170] [OR:HAEMOPHILUS INFLUENZAE] [GN:POTB]
[DE:SPERMIDINE/PUTRESCINE TRANSPORT SYSTEM
PERMEASE PROTEIN POTB]
contig237 10632837_c3_6 274 3679 453 150 75 0.77 [SP:P50591] [OR:HOMO SAPIENS] [DE:TNF-RELATED
APOPTOSIS INDUCING LIGAND (TRAIL) (APO-2 LIGAND)]
contig238 26423437_f1_1 275 3680 759 252 258 2.20E-22 [AC:AE000310] [OR:Escherichia coli] [GN:yojL] [NT:f351; Residues
1-121 are 100 pct identical to]
contig238 33261461_f2_3 276 3681 675 224 227 4.30E-19 [SP:P31465] [OR:ESCHERICHIA COLI] [GN:YIEF]
[DE:HYPOTHETICAL 20.4 KD PROTEIN IN TNAB-BGLB
INTERGENIC REGION]
contig238 6923441_f2_4 277 3682 297 99 124 3.50E-08 [SP:P31465] [OR:ESCHERICHIA COLI] [GN:YIEF]
[DE:HYPOTHETICAL 20.4 KD PROTEIN IN TNAB-BGLB
INTERGENIC REGION]
contig239 7082001_f1_1 278 3683 201 66 214 1.00E-17 [SP:P39788] [OR:BACILLUS SUBTILIS] [GN:NTH]
[DE:APYRIMIDINIC SITE) LYASE)]
contig239 35979692_f3_2 279 3684 699 232 258 2.20E-22 [SP:P39796] [OR:BACILLUS SUBTILIS] [GN:TRER]
[DE:TREHALOSE OPERON TRANSCRIPTIONAL REPRESSOR]
contig24 15026400_c1_3 280 3685 681 226 91 0.083 [SP:Q09251] [OR:CAENORHABDITIS ELEGANS] [GN:C16C10.5]
[DE:HYPOTHETICAL 47.6 KD PROTEIN C16C10.5 IN
CHROMOSOME III]
contig240 20000332_c3_6 281 3686 1302 433 1201 2.60E-122 [SP:P22326] [OR:BACILLUS SUBTILIS] [GN:TYRS]
[DE:(TYRRS)]
contig241 16532892_f2_3 282 3687 276 91 154 4.40E-11 [SP:P28635] [OR:ESCHERICHIA COLI] [GN:YAEC]
[DE:PRECURSOR]
contig241 23992175_f1_2 283 3688 849 282 550 2.60E-53 [SP:Q55482] [OR:SYNECHOCYSTIS SP] [GN:SLL0506]
[DE:HYPOTHETICAL 28.8 KD PROTEIN SLL0506]
contig241 29306955_f3_4 284 3689 195 65
contig242 4431426_f1_1 285 3690 1044 347 97 0.0062 [AC:AB001488] [OR:Bacillus subtilis] [GN:ydcM] [NT:SIMILAR TO
IMMUNITY REGION PROTEIN IN BACTERIOPHAGE]
contig242 807688_f2_2 286 3691 399 133
contig243 1370311_f2_1 287 3692 1089 362 402 1.20E-37 [AC:AB001488] [OR:Bacillus subtilis] [GN:ydcL] [NT:PROBABLE
INTEGRASE.]
contig244 1222061_f2_2 288 3693 306 101 183 2.20E-13 [AC:L49336] [OR:Clostridium longisporum] [PN:PTS-dependent
enzyme II] [GN:abgF]
contig244 14629663_f1_1 289 3694 1479 492 1067 4.20E-108 [SP:P42403] [OR:BACILLUS SUBTILIS] [GN:YCKE] [DE:(BETA-
D-GLUCOSIDE GLUCOHYDROLASE) (AMYGDALASE)]
contig245 12895913_f2_1 290 3695 402 133 66 0.9999 [AC:U41224] [OR:Trypanosoma brucei rhodesiense] [GN:expression
site-associated gene Id]
contig245 16609682_f2_2 291 3696 228 75 79 0.0093 [OR:Trypanosoma cruzi] [PN:repetitive protein antigen 3]
contig245 16485885_f2_3 292 3697 240 79 108 7.80E-06 [AC:Z94043] [OR:Bacillus subtilis] [PN:hypothetical protein]
[GN:yvcJ] [NT:similar to hypothetical MTCY21B4]
contig246 35391578_f2_2 293 3698 627 208 545 8.70E-53 [SP:P04067] [OR:STREPTOMYCES PLICATUS]
[DE:ACETYLCHITOBIOSYL BETA-N-
ACETYLGLUCOSAMINIDASE H) (ENDO H)]
contig247 30565805_f3_2 294 3699 1209 402 75 0.89 [SP:Q06242] [OR:ENTEROCOCCUS FAECIUM] [GN:VANZ]
[DE:VANZ PROTEIN]
contig248 24501538_f1_1 295 3700 792 263 93 0.00028 [SP:P35838] [OR:CLOSTRIDIUM ACETOBUTYLICUM]
[DE:HYPOTHETICAL PROTEIN IN LYC 5′REGION
(FRAGMENT)]
contig248 507187_f1_2 296 3701 210 70 52 0.78 [SP:P00023] [OR:CROTALUS ATROX] [DE:CYTOCHROME C]
contig249 4804703_c3_8 297 3702 456 151 310 6.90E-28 [AC:D84432] [OR:Bacillus subtilis] [PN:BltD]
contig249 25680332_c3_7 298 3703 258 85 70 0.026 [AC:U53154] [OR:Caenorhabditis elegans] [GN:C33G8.3]
contig249 30272056_c2_6 299 3704 498 165 205 9.30E-17 [OR:Methanococcus jannaschii] [PN:hypothetical protein homolog
MJ0531]
contig25 6257627_f1_1 300 3705 576 191 392 1.40E-36 [SP:P39605] [OR:BACILLUS SUBTILIS] [GN:YWCG]
[DE:HYPOTHETICAL 28.3 KD PROTEIN IN QOXD-VPR
INTERGENIC REGION]
contig250 13923401_c3_5 301 3706 1170 389 983 3.30E-99 [OR:Bacillus subtilis] [PN:hisC homolog]
contig250 129032_c2_4 302 3707 243 80 57 0.36 [AC:U31743] [OR:Homo sapiens] [PN:HLA-DMB variant]
contig250 24710428_c1_3 303 3708 183 60 51 0.85 [AC:L19118] [OR:Rattus norvegicus] [PN:complement receptor type 1]
[GN:CR1]
contig251 34414724_c1_4 304 3709 1434 478 401 6.00E-37 [SP:P37710] [OR:ENTEROCOCCUS FAECALIS]
[DE:AUTOLYSIN, (N-ACETYLMURAMOYL-L-ALANINE
AMIDASE)]
contig252 10157842_c1_4 305 3710 492 163 64 0.15 [AC:U56077] [OR:Pseudomonas aeruginosa] [PN:periplasmic
glucosidase] [NT:Escherichia coli BglX homolog]
contig252 26260912_f2_1 306 3711 474 157 232 1.30E-19 [SP:Q59384] [OR:ESCHERICHIA COLI] [GN:GLOA]
[DE:(ALDOKETOMUTASE) (GLYOXALASE 1)]
contig252 6064443_f3_3 307 3712 441 147 314 2.60E-28 [SP:P22347] [OR:LACTOCOCCUS LACTIS]
[DE:HYPOTHETICAL 18.7 KD PROTEIN IN PEPX 3′REGION
(ORF3)]
contig253 34179786_c2_2 308 3713 828 276 375 8.90E-35 [SP:Q57664] [OR:METHANOCOCCUS JANNASCHII]
[GN:MJ0211] [DE:GALACTOSE 4-EPIMERASE)]
contig253 860636_c3_3 309 3714 537 178 75 0.32 [SP:P45145] [OR:HAEMOPHILUS INFLUENZAE] [GN:HI1297]
[DE:HYPOTHETICAL PROTEIN HI1297]
contig254 33243877_f3_6 310 3715 375 124 208 6.90E-17 [SP:P40410] [OR:BACILLUS SUBTILIS] [GN:FEUB] [DE:IRON-
UPTAKE SYSTEM PROTEIN FEUB]
contig254 14667842_f2_5 311 3716 873 291 372 1.90E-34 [SP:P49937] [OR:BACILLUS SUBTILIS] [GN:FHUG]
[DE:FERRICHROME TRANSPORT PERMEASE PROTEIN FHUG]
contig255 21490957_c3_4 312 3717 243 80 50 0.93 [AC:Z72843] [OR:Saccharomyces cerevisiae] [NT:ORF YGR057c]
contig255 22470386_f2_2 313 3718 426 142 52 0.96 [AC:X61517] [OR:Mycoplasma genitalium] [NT:random genomic
sequence MG08; open reading frame]
contig256 38557_f2_2 314 3719 615 204 522 2.40E-50 [AC:D90907] [OR:Synechocystis sp.] [PN:amidase] [NT:ORF_ID]
contig256 33802182_f1_1 315 3720 801 266 899 2.70E-90 [SP:Q45486] [OR:BACILLUS SUBTILIS] [GN:YZDD] [DE:PET112-
LIKE PROTEIN]
contig257 7070428_c3_3 316 3721 468 155 70 0.87 [SP:P48923] [OR:CANDIDA PARAPSILOSIS] [GN:ND6]
[DE:NADH-UBIQUINONE OXIDOREDUCTASE CHAIN 6,]
contig257 6692125_f3_1 317 3722 234 77 68 0.081 [AC:U02510] [OR:Ovine respiratory syncytial virus] [PN:M2 (22K)
protein]
contig257 24300674_c1_2 318 3723 636 211 1069 2.60E-108 [SP:P19775] [OR:STAPHYLOCOCCUS AUREUS] [GN:TNP]
[DE:TRANSPOSASE FOR INSERTION SEQUENCE ELEMENT
IS256 IN TRANSPOSON TN4001]
contig258 12219781_f3_3 319 3724 837 278 500 5.10E-48 [AC:D84648] [OR:Bacillus stearothermophilus] [PN:exo-alpha-1,4-
glucosidase]
contig258 210962_f1_1 320 3725 489 162 240 1.30E-19 [SP:P29094] [OR:BACILLUS THERMOGLUCOSIDASIUS]
[DE:DEXTRINASE) (ISOMALTASE) (DEXTRIN 6-ALPHA-D-
GLUCANOHYDROLASE)]
contig258 23604838_f2_2 321 3726 276 91 217 2.70E-17 [SP:P51184] [OR:STAPHYLOCOCCUS XYLOSUS] [GN:SCRA]
[DE:(EC 2.7.1.69) (EII-SCR)]
contig259 1172752_f2_3 322 3727 213 70 58 0.51 [SP:P19746] [OR:CAPRIPOXVIRUS] [DE:PROTEIN F7]
contig259 423125_f1_1 323 3728 768 255 276 2.80E-24 [AC:X98110] [OR:Streptococcus gordonii] [PN:response regulator]
[GN:comE2]
contig259 4814625_f2_4 324 3729 759 252 186 9.50E-15 [AC:L13334] [OR:Staphylococcus lugdunensis]
contig26 23647555_c1_3 325 3730 720 239 335 1.60E-30 [SP:P50726] [OR:BACILLUS SUBTILIS] [GN:YPAA]
[DE:HYPOTHETICAL 20.5 KD PROTEIN IN SERA-FER
INTERGENIC REGION]
contig260 14220443_f1_1 326 3731 639 212 286 2.40E-25 [AC:D90779] [OR:Escherichia coli] [PN:Acyl carrier protein
phosphodiesterase (ACP] [NT:ORF_ID]
contig260 25651702_c3_6 327 3732 1185 394 418 2.50E-39 [AC:X99400] [OR:Streptococcus pneumoniae] [PN:membrane protein]
contig261 781432_f3_1 328 3733 900 300 444 1.40E-41 [SP:P08716] [OR:ESCHERICHIA COLI] [GN:HLYB]
[DE:HEMOLYSIN SECRETION ATP-BINDING PROTEIN,
PLASMID]
contig261 33292342_c3_2 329 3734 207 68 52 0.999 [SP:P20296] [OR:PYROCOCCUS WOESEI] [DE:HYPOTHETICAL
24.7 KD PROTEIN IN GAPDH 5′REGION (ORF A)]
contig262 15710913_f3_2 330 3735 1257 418 385 7.80E-36 [SP:Q48460] [OR:KLEBSIELLA PNEUMONIAE] [DE:PROBABLE
CPS BIOSYNTHESIS GLYCOSYLTRANSFERASE, (ORF14)]
contig263 4688263_f1_1 331 3736 624 208 228 1.50E-18 [AC:U61539] [OR:Bacillus firmus] [PN:Na+/H+ antiporter] [GN:nhaC]
[NT:NahC]
contig264 7082026_f1_1 332 3737 885 295 192 9.50E-14 [SP:Q05587] [OR:SALMONELLA TYPHIMURIUM] [GN:POCR]
[DE:REGULATORY PROTEIN POCR]
contig265 29392933_f1_1 333 3738 1146 382 763 6.90E-76 [AC:D78016] [OR:Enterococcus faecalis] [PN:TRAC] [GN:traC]
[NT:ORF3; replication related gene]
contig266 26023917_f1_1 334 3739 528 175 270 1.60E-33 [AC:Z94864] [OR:Schizosaccharomyces pombe] [PN:unknown]
[GN:SPAC57A10.03] [NT:SPAC57A10.03, cyclophilin-related, len]
contig266 36147192_f3_3 335 3740 231 76 61 0.69 [AC:U00033] [OR:Caenorhabditis elegans] [GN:F37C12.2]
contig266 21679664_c2_5 336 3741 378 125 321 4.70E-29 [SP:Q08432] [OR:BACILLUS SUBTILIS] [GN:PATB]
[DE:PUTATIVE AMINOTRANSFERASE B,]
contig267 2757327_c1_3 337 3742 1005 335 340 4.60E-31 [AC:U77778] [OR:Staphylococcus epidermidis] [PN:putative
membrane protein] [GN:epiH] [NT:EpiH]
contig267 12673385_f3_2 338 3743 216 71 64 0.078 [SP:P26886] [OR:EUPLOTES RAIKOVI] [GN:MAT2,MAT9]
[DE:MATING PHEROMONE ER-2/ER-9 PRECURSOR
(EUPLOMONE R2/R9)]
contig267 4886018_f1_1 339 3744 285 95 91 0.00059 [SP:Q05624] [OR:CLOSTRIDIUM ACETOBUTYLICUM] [GN:PTB]
[DE:PHOSPHATE BUTYRYLTRANSFERASE,
(PHOSPHOTRANSBUTYRYLASE)]
contig268 14875000_f1_1 340 3745 1620 540 273 1.30E-20 [AC:M81736] [OR:Staphylococcus aureus] [PN:collagen adhesin]
[GN:cna]
contig269 3917842_c3_9 341 3746 516 171 498 8.30E-48 [SP:P32393] [OR:BACILLUS SUBTILIS] [GN:COMEB] [DE:COME
OPERON PROTEIN 2]
contig269 23697187_c1_5 342 3747 681 226 327 1.10E-29 [SP:P39694] [OR:BACILLUS SUBTILIS] [GN:COMEA] [DE:COME
OPERON PROTEIN 1]
contig269 29785134_c2_6 343 3748 1095 364 136 4.30E-06 [OR:Methanococcus jannaschii] [PN:hypothetical protein MJ1318]
contig27 36366450_f2_1 344 3749 543 181
contig270 35407675_f2_2 345 3750 1416 471 1252 1.00E-127 [AC:D88802] [OR:Bacillus subtilis] [GN:ydiF] [NT:H. influenzae
hypothetical ABC transporter; P44808]
contig271 34661566_c2_4 346 3751 336 111 111 3.70E-06 [SP:P29823] [OR:AGROBACTERIUM RADIOBACTER]
[GN:LACF] [DE:LACTOSE TRANSPORT SYSTEM PERMEASE
PROTEIN LACF]
contig271 26250012_c3_5 347 3752 1221 406 375 1.30E-59 [SP:P75264] [OR:MYCOPLASMA PNEUMONIAE]
[DE:HYPOTHETICAL ABC TRANSPORTER ATP-BINDING
PROTEIN MG187 HOMOLOG]
contig272 209811_f1_1 348 3753 732 243 201 2.50E-16 [SP:P77728] [OR:ESCHERICHIA COLI] [GN:APBA] [DE:APBA
PROTEIN]
contig272 21978881_c1_2 349 3754 1080 359 710 2.80E-70 [SP:P54459] [OR:BACILLUS SUBTILIS] [GN:YQEN]
[DE:HYPOTHETICAL 40.5 KD PROTEIN IN COMEC-RPST
INTERGENIC REGION]
contig273 2444525_f3_1 350 3755 1032 343 250 3.70E-21 [AC:U66880] [OR:Staphylococcus simulans] [PN:FemA] [GN:femA]
contig273 26760250_c1_2 351 3756 276 91 62 0.34 [AC:D90900] [OR:Synechocystis sp.] [PN:shikimate kinase]
[GN:aroK] [NT:ORF_ID]
contig274 969092_f2_1 352 3757 768 255 72 0.039 [AC:M29955] [OR:Escherichia coli] [NT:ltrA gene product (5′ end
put.); putative]
contig274 21519813_f2_2 353 3758 228 75 58 0.57 [AC:Z81030] [OR:Caenorhabditis elegans] [PN:C01G10.4]
contig275 26377340_c3_4 354 3759 747 249 176 4.20E-12 [SP:P44180] [OR:HAEMOPHILUS INFLUENZAE] [GN:HI1405]
[DE:HYPOTHETICAL PROTEIN HI1405]
contig275 7245326_c3_3 355 3760 228 75 79 0.0085 [AC:D90907] [OR:Synechocystis sp.] [PN:hypothetical protein]
[NT:ORF_ID]
contig275 4042878_c2_2 356 3761 330 109 53 0.74 [AC:U68241] [OR:Carassius auratus] [PN:twiggy-winkle hedgehog]
contig276 14954143_c2_8 357 3762 432 144 501 4.00E-48 [SP:P54689] [OR:HAEMOPHILUS INFLUENZAE] [GN:ILVE]
[DE:BRANCHED-CHAIN AMINO ACID AMINOTRANSFERASE,]
contig276 23445432_c1_5 358 3763 477 158 291 7.10E-26 [AC:AB001488] [OR:Bacillus subtilis] [GN:ydaT] [NT:FUNCTION
UNKNOWN.]
contig277 6736631_f3_2 359 3764 1092 363 707 5.90E-70 [SP:P12045] [OR:BACILLUS SUBTILIS] [GN:PURK] [DE:(AIR
CARBOXYLASE) (AIRC)]
contig277 5116543_f3_3 360 3765 393 130 502 3.10E-48 [SP:P12047] [OR:BACILLUS SUBTILIS] [GN:PURB]
[DE:ADENYLOSUCCINATE LYASE, (ADENYLOSUCCINASE)
(ASL)]
contig278 31461062_c3_2 361 3766 1089 362 353 1.90E-32 [SP:P44720] [OR:HAEMOPHILUS INFLUENZAE] [GN:HI0457]
[DE:HYPOTHETICAL PROTEIN HI0457]
contig279 26282885_f3_5 362 3767 1548 515 281 6.50E-32 [SP:P13398] [OR:PSEUDOMONAS SP] [GN:NYLA]
[DE:DEGRADING ENZYME EI)]
contig279 14663938_f1_2 363 3768 183 60 58 0.29 [OR:Bos primigenius taurus] [PN:bactenecin 5]
contig28 253886_f1_1 364 3769 438 145 248 2.60E-21 [AC:L16975] [OR:Lactococcus lactis] [NT:ORF1]
contig28 5197813_f1_2 365 3770 453 151 227 4.30E-19 [SP:P54390] [OR:BACILLUS SUBTILIS] [GN:YPIB]
[DE:HYPOTHETICAL 21.4 KD PROTEIN IN QCRA 5′REGION]
contig280 12188261_f1_1 366 3771 447 148 161 1.60E-10 [SP:P13267] [OR:BACILLUS SUBTILIS] [GN:POLC] [DE:DNA
POLYMERASE III, ALPHA CHAIN,]
contig280 24406282_f3_4 367 3772 219 72 60 0.81 [OR:Methanococcus jannaschii] [PN:acetylpolyamine aminohydrolase]
contig280 24414042_f2_3 368 3773 306 101
contig281 29882092_c3_7 369 3774 201 66 58 0.29 [AC:X53324] [OR:group G streptococcus] [PN:Protein G′] [GN:Protein
G′gene] [NT:Truncated gene]
contig281 56552_f3_1 370 3775 858 286 101 0.0077 [SP:P25551] [OR:ESCHERICHIA COLI] [GN:RBSR] [DE:RIBOSE
OPERON REPRESSOR]
contig282 245962_c3_4 371 3776 879 292 398 3.30E-37 [AC:D90905] [OR:Synechocystis sp.] [PN:hypothetical protein]
[NT:ORF_ID]
contig282 19688909_c2_3 372 3777 222 73 110 1.70E-06 [AC:D32253] [OR:Magnetospirillum sp.] [NT:putative protein highly
homologous to E. coli RNase]
contig283 25820313_f1_1 373 3778 648 215 100 0.0059 [OR:Methanococcus jannaschii] [PN:hypothetical protein MJ0798]
contig283 16443905_c2_5 374 3779 555 184 867 6.50E-87 [AC:U35659] [OR:Streptococcus bovis] [PN:malic enzyme]
[NT:Malic enzyme ((S)-malate]
contig284 902187_c3_7 375 3780 777 258 178 1.30E-12 [SP:P26833] [OR:CLOSTRIDIUM PERFRINGENS]
[DE:HYPOTHETICAL 31.2 KD PROTEIN IN NAGH 5′REGION
(ORFB)]
contig284 24643812_f2_2 376 3781 765 254 338 7.50E-31 [AC:D90816] [OR:Escherichia coli] [GN:ydjC] [NT:ORF_ID]
contig285 9767842_c3_6 377 3782 237 79 141 5.00E-09 [AC:Y13308] [OR:Yersinia enterocolitica] [PN:sulfate permease]
[NT:ORF3]
contig285 29390711_c2_4 378 3783 966 321 274 4.50E-24 [SP:P33019] [OR:ESCHERICHIA COLI] [GN:YEIH]
[DE:HYPOTHETICAL 36.9 KD PROTEIN IN LYSP-NFO
INTERGENIC REGION]
contig285 2056268_c3_5 379 3784 375 124 50 0.99 [AC:U51793] [OR:Hepatitis C virus] [PN:envelope glycoprotein E2]
[NT:hypervariable region 1]
contig285 9881557_f2_3 380 3785 240 80 62 0.69 [AC:Y08256] [OR:Sulfolobus solfataricus] [GN:orf c02005]
contig286 963342_c1_3 381 3786 1200 399 791 7.40E-79 [SP:Q01444] [OR:MYCOPLASMA MYCOIDES]
[DE:HYPOTHETICAL PROTEIN IN FFH 5′REGION (FRAGMENT)]
contig287 20745436_f3_2 382 3787 876 291 145 4.50E-08 [SP:P54721] [OR:BACILLUS SUBTILIS] [GN:YFIE]
[DE:HYPOTHETICAL 31.5 KD PROTEIN IN GLVBC 3′REGION]
contig287 93757_f3_3 383 3788 477 159 133 2.40E-08 [AC:D86380] [OR:Bacillus cereus] [PN:Alkaline D-peptidase]
[GN:adp]
contig288 25673900_f3_1 384 3789 684 227 74 0.57 [AC:U47096] [OR:Daucus carota] [NT:a LEA protein]
contig288 24242056_c3_5 385 3790 477 158 341 3.60E-31 [SP:P19219] [OR:BACILLUS SUBTILIS] [GN:ADAA]
[DE:METHYLPHOSPHOTRIESTER-DNA ALKYLTRANSFERASE]
contig289 7082027_f1_1 386 3791 1200 400 86 0.53 [AC:U33332] [OR:Human cytomegalovirus] [NT:orf UL154]
contig29 13006930_f2_1 387 3792 336 111 220 9.90E-18 [SP:P37546] [OR:BACILLUS SUBTILIS] [GN:YABE]
[DE:HYPOTHETICAL 47.7 KD PROTEIN IN METS-KSGA
INTERGENIC REGION]
contig29 13711592_f2_2 388 3793 504 168 91 0.034 [SP:Q49857] [OR:MYCOBACTERIUM LEPRAE]
[GN:B229_C1_170] [DE:HYPOTHETICAL 38.0 KD PROTEIN
B229_C1_170 PRECURSOR]
contig290 26808211_f3_3 389 3794 1347 448 667 1.00E-65 [AC:D86418] [OR:Bacillus subtilis] [PN:YfnA]
contig290 24875093_f1_1 390 3795 240 80 248 2.50E-20 [SP:P37465] [OR:BACILLUS SUBTILIS] [GN:METS]
[DE:(METRS)]
contig291 12508578_f2_1 391 3796 1452 483 132 3.30E-05 [AC:Z83107] [OR:Caenorhabditis elegans] [PN:F11C3.3] [NT:similar
to myosin heavy chain; cDNA EST CESAB82R]
contig291 26282182_f2_2 392 3797 405 134 67 0.99995 [SP:Q13620] [OR:HOMO SAPIENS] [GN:CUL4B] [DE:CULLIN
HOMOLOG 4B (CUL-4B) (FRAGMENT)]
contig292 3928318_f1_1 393 3798 603 200 746 4.30E-74 [AC:Z94043] [OR:Bacillus subtilis] [PN:hypothetical protein]
[GN:yvdN] [NT:similar to CLPP_ECOLI ATP-dependent clp protease]
contig292 34187592_c2_3 394 3799 543 180 91 0.027 [AC:Y11777] [OR:Rickettsia prowazekii] [PN:outer membrane protein]
[GN:com1]
contig293 24431562_c2_4 395 3800 585 194 439 1.50E-41 [SP:P42020] [OR:LACTOCOCCUS LACTIS] [GN:PEPT]
[DE:PEPTIDASE T, (AMINOTRIPEPTIDASE) (TRIPEPTIDASE)]
contig293 24413142_c2_3 396 3801 546 181 246 2.60E-20 [OR:Enterococcus faecalis] [PN:probable pheromone binding
protein; pheromone responsive gene Z protein] [GN:prgZ]
contig294 22400450_f3_2 397 3802 1758 585 1364 1.40E-139 [SP:P21458] [OR:BACILLUS SUBTILIS] [GN:SPOIIIE]
[DE:STAGE III SPORULATION PROTEIN E]
contig295 26569418_f2_1 398 3803 774 257 518 6.30E-50 [SP:P54470] [OR:BACILLUS SUBTILIS] [GN:YQFL]
[DE:HYPOTHETICAL 30.3 KD PROTEIN IN GLYS-DNAG/DNAE
INTERGENIC REGION]
contig295 24619703_c1_3 399 3804 816 271 679 5.50E-67 [SP:P72535] [OR:STREPTOCOCCUS PNEUMONIAE] [GN:THRB]
[DE:HOMOSERINE KINASE, (HK)]
contig295 22925961_c3_5 400 3805 351 116 113 3.10E-06 [SP:P09123] [OR:BREVIBACTERIUMBACILLUS SP] [GN:THRC]
[DE:THREONINE SYNTHASE,]
contig296 14142542_f3_3 401 3806 324 107 121 2.30E-07 [SP:P42902] [OR:ESCHERICHIA COLI] [GN:AGAR]
[DE:PUTATIVE AGA OPERON TRANSCRIPTIONAL
REPRESSOR]
contig296 34063527_f2_1 402 3807 966 321 332 3.20E-30 [SP:P23391] [OR:LACTOCOCCUS LACTIS] [GN:LACC]
[DE:TAGATOSE-6-PHOSPHATE KINASE,
(PHOSPHOTAGATOKINASE)]
contig297 5161663_f1_1 403 3808 1791 596 1557 5.00E-160 [AC:Z94043] [OR:Bacillus subtilis] [PN:hypothetical protein]
[GN:yvdK] [NT:similar to trembl]
contig298 25678562_c1_2 404 3809 321 107 415 5.20E-39 [OR:Enterococcus faecalis] [PN:probable pheromone-responsive
regulatory protein R] [GN:prgR]
contig298 22359627_f3_1 405 3810 183 60 51 0.85 [AC:U02259] [OR:Mycoplasma genitalium] [NT:Homology to
initiation factor DnaA J01602]
contig299 4798587_f3_1 406 3811 1734 578 1349 5.50E-138 [AC:L76359] [OR:Streptomyces peucetius] [PN:daunorubicin
resistance protein] [GN:drrC]
contig3 24410169_c1_1 407 3812 429 142 639 9.50E-63 [SP:P36920] [OR:ENTEROCOCCUS FAECALIS] [GN:EBSA]
[DE:PORE FORMING PROTEIN EBSA]
contig30 33861563_c1_4 408 3813 1035 345 293 4.40E-26 [SP:P39074] [OR:BACILLUS SUBTILIS] [GN:BMRU] [DE:BMRU
PROTEIN]
contig300 10353950_f2_2 409 3814 1008 335 506 1.20E-48 [SP:Q10449] [OR:SCHIZOSACCHAROMYCES POMBE]
[GN:SPAC12B10.16C] [DE:HYPOTHETICAL 57.2 KD PROTEIN
C12B10.16C IN CHROMOSOME 1]
contig300 35432011_f1_1 410 3815 390 129 243 1.40E-19 [AC:AE000176] [OR:Escherichia coli] [GN:ybgB] [NT:o877; 100 pct
identical to the first 86 residues of]
contig300 10007652_f3_3 411 3816 207 68 52 0.95 [OR:Brassica napus] [PN:3-oxoacyl-[acyl-carrier-protein]
contig301 26375006_f2_1 412 3817 1290 429 307 1.30E-26 [AC:L25426] [OR:Staphylococcus aureus] [PN:penicillin-binding
protein 2] [GN:pbp2]
contig301 2625187_f2_2 413 3818 189 62 63 0.27 [AC:JH0240] [OR:Mus musculus] [PN:aspartic proteinase,]
contig302 36223418_f3_2 414 3819 645 214 910 1.80E-91 [AC:U74322] [OR:Lactococcus lactis] [PN:6-phosphogluconate
dehydrogenase]
contig302 15117317_f1_1 415 3820 555 184 272 7.40E-24 [AC:D64002] [OR:Synechocystis sp.] [PN:regulatory components of
sensory transduction] [NT:ORF_ID]
contig303 21875903_f3_3 416 3821 486 161 130 1.00E-08 [SP:P37958] [OR:BACILLUS SUBTILIS] [GN:MECA]
[DE:NEGATIVE REGULATOR OF GENETIC COMPETENCE
MECA]
contig303 4884838_f2_2 417 3822 846 281 407 3.60E-38 [AC:X99710] [OR:Lactococcus lactis] [PN:transcription factor]
[NT:weak homology with vsf-1 gene (X73635)]
contig303 21971918_c2_4 418 3823 246 81 64 0.89 [SP:Q09630] [OR:CAENORHABDITIS ELEGANS] [GN:ZC506.4]
[DE:PROBABLE METABOTROPIC GLUTAMATE RECEPTOR
ZC506.4]
contig304 6285250_c3_3 419 3824 750 249 654 2.40E-64 [AC:D88802] [OR:Bacillus subtilis] [GN:ydhR] [NT:S. mutans
fructokinase; Q07211 (665)]
contig305 31533516_c3_6 420 3825 1227 408 1077 3.60E-109 [SP:P12039] [OR:BACILLUS SUBTILIS] [GN:PURD]
[DE:RIBONUCLEOTIDE SYNTHETASE)
(PHOSPHORIBOSYLGLYCINAMIDE SYNTHETASE)]
contig306 26383507_c3_9 421 3826 849 283 216 6.30E-18 [AC:U87792] [OR:Bacillus subtilis] [PN:unknown] [NT:ORF307;
hypothetical 34.7 kd protein]
contig306 4343752_f3_6 422 3827 279 92 66 0.68 [AC:D64005] [OR:Synechocystis sp.] [PN:hypothetical protein]
[NT:ORF_ID]
contig307 14537691_f2_2 423 3828 294 97 135 3.70E-09 [AC:AE000181] [OR:Escherichia coli] [NT:o234; This 234 aa orf is
26 pct identical (15 gaps)]
contig307 35937841_c1_4 424 3829 1149 382 182 1.00E-13 [AC:Y14082] [OR:Bacillus subtilis] [PN:hypothetical protein]
[GN:yhdW] [NT:Similarity to glycerol diester phosphodiesterase]
contig308 16506875_c2_5 425 3830 1134 377 158 2.40E-15 [SP:P29240] [OR:DISCOPYGE OMMATA] [DE:5′-
NUCLEOTIDASE PRECURSOR, (ECTO-NUCLEOTIDASE)]
contig309 13183591_f3_4 426 3831 288 95 57 0.36 [AC:U37208] [OR:Simian immunodeficiency virus] [PN:envelope
glycoprotein] [GN:env]
contig309 34100687_f2_3 427 3832 663 220 274 4.50E-24 [AC:Y09476] [OR:Bacillus subtilis] [PN:YisX] [NT:putative]
contig309 12150313_c3_6 428 3833 246 81 69 0.54 [SP:Q00099] [OR:ICTALURID HERPESVIRUS 1] [GN:56]
[DE:HYPOTHETICAL GENE 56 PROTEIN]
contig31 24687678_c3_4 429 3834 255 85 62 0.12 [SP:Q05360] [OR:LUCILIA CUPRINA] [GN:W] [DE:WHITE
PROTEIN (FRAGMENT)]
contig310 15120887_f2_1 430 3835 1335 444 1354 1.60E-138 [SP:P20964] [OR:BACILLUS SUBTILIS] [GN:OBG] [DE:SPO0B-
ASSOCIATED GTP-BINDING PROTEIN]
contig311 36329057_f1_1 431 3836 876 291 683 2.10E-67 [SP:P26946] [OR:BACILLUS FIRMUS] [DE:HYPOTHETICAL
ABC TRANSPORTER ATP-BINDING PROTEIN]
contig311 5271078_f2_2 432 3837 249 82 92 0.00077 [AC:Y14078] [OR:Bacillus subtilis] [PN:Hypothetical protein]
[GN:yhaP] [NT:aa 1-147 show similarity to putative]
contig312 1283452_f2_2 433 3838 1659 552 68 0.31 [AC:S79441] [OR:Bacillus sp.] [NT:Description]
contig313 5975890_f1_1 434 3839 192 63 64 0.97 [AC:L35928] [OR:Streptococcus salivarius] [PN:glucosyltransferase]
[GN:gtfm]
contig313 3955336_f1_2 435 3840 210 69 58 0.61 [SP:Q05966] [OR:BRASSICA NAPUS] [GN:GRP10] [DE:GLYCINE-
RICH RNA-BINDING PROTEIN 10]
contig313 6048562_c1_4 436 3841 255 84 58 0.999 [SP:P32899] [OR:SACCHAROMYCES CEREVISIAE]
[GN:YHR148W] [DE:PUTATIVE 40S RIBOSOMAL PROTEIN
YHR148W]
contig314 4720463_c1_3 437 3842 369 123 66 0.048 [AC:L47607] [OR:Picea glauca] [PN:late embryogenesis abundant
protein] [GN:EMB15] [NT:ABA-responsive and embryogenesis-
associated gene]
contig315 16839086_f2_2 438 3843 465 154 87 0.0003 [SP:P00323] [OR:DESULFOVIBRIO VULGARIS]
[DE:FLAVODOXIN]
contig315 5196062_f3_3 439 3844 438 145 120 2.60E-07 [SP:P25983] [OR:BACILLUS SUBTILIS] [GN:PYRDII]
[DE:DIHYDROOROTATE DEHYDROGENASE ELECTRON
TRANSFER SUBUNIT]
contig315 16979712_f1_1 440 3845 210 70 111 2.70E-06 [OR:Methanococcus jannaschii] [PN:cytochrome-c3 hydrogenase
gamma chain homolog]
contig316 34157952_c1_3 441 3846 723 241 201 2.50E-16 [AC:Z56283] [OR:Lactobacillus helveticus] [GN:orf2]
contig316 207811_c2_4 442 3847 801 266 239 8.30E-20 [AC:Z56283] [OR:Lactobacillus helveticus] [GN:orf1]
contig317 1220462_f2_1 443 3848 1671 556 1226 5.90E-125 [AC:U16134] [OR:Synechococcus sp.] [PN:ClpC] [GN:clpC]
contig318 11754215_f2_1 444 3849 1356 451 964 3.40E-97 [SP:P17894] [OR:BACILLUS SUBTILIS] [GN:RECN] [DE:DNA
REPAIR PROTEIN RECN (RECOMBINATION PROTEIN N)]
contig319 25442657_f3_2 445 3850 783 260 85 0.18 [OR:Saccharomyces cerevisiae] [PN:HSH49 protein] [GN:HSH49]
contig319 4876643_c2_3 446 3851 324 107 70 0.98 [AC:U30821] [OR:Cyanelle Cyanophora paradoxa] [PN:DnaK]
[GN:dnaK-A]
contig319 2932937_c3_4 447 3852 222 73 195 1.10E-15 [AC:L36907] [OR:Lactococcus lactis] [PN:ATP-dependent protease]
[GN:clpA] [NT:ORF137; putative]
contig32 31735793_f3_2 448 3853 1188 396 1291 7.70E-132 [SP:P09373] [OR:ESCHERICHIA COLI] [GN:PFLB]
[DE:FORMATE ACETYLTRANSFERASE 1, (PYRUVATE
FORMATE-LYASE 1)]
contig320 14187801_f2_4 449 3854 219 72 101 9.20E-05 [SP:Q50739] [OR:MYCOBACTERIUM TUBERCULOSIS]
[GN:MTCY9C4.09] [DE:HYPOTHETICAL 47.5 KD PROTEIN
CY9C4.09]
contig320 13001076_c1_13 450 3855 255 84 65 0.9 [AC:Z81317] [OR:Schizosaccharomyces pombe] [PN:unknown]
[GN:SPAC6G9.04] [NT:SPAC6G9.04, unknown, len]
contig320 33835942_f3_8 451 3856 213 70
contig320 4695187_f2_7 452 3857 516 171 218 3.90E-18 [OR:Methanococcus jannaschii] [PN:hypothetical protein homolog
MJ0531]
contig321 25430443_c3_4 453 3858 1002 333 428 1.40E-48 [SP:Q24803] [OR:ENTAMOEBA HISTOLYTICA] [GN:ADH2]
[DE:ALCOHOL DEHYDROGENASE 2,]
contig321 29773438_f3_3 454 3859 261 87 54 0.6 [AC:AF003525] [OR:Mus musculus] [PN:beta-defensin 1]
contig322 35839066_c2_3 455 3860 396 132 208 7.70E-17 [SP:Q47898] [OR:FLAVOBACTERIUM MENINGOSEPTICUM]
[DE:(GLYCOSYLASPARAGINASE)
(ASPARTYLGLUCOSAMINIDASE) (AGA)]
contig322 24394206_c1_2 456 3861 1278 425 358 4.00E-54 [OR:Bacillus stearothermophilus] [PN:hypothetical protein 1]
contig323 25585967_c2_5 457 3862 195 65 66 0.28 [AC:D90901] [OR:Synechocystis sp.] [PN:hypothetical protein]
[NT:ORF_ID]
contig323 23548427_c2_4 458 3863 948 315 63 0.5 [OR:Macaca mulatta] [GN:TPH] [NT:Description]
contig323 12401075_c1_3 459 3864 216 71 55 0.52 [AC:Y08631] [OR:Human astrovirus] [PN:capsid protein precursor]
[GN:ORF2]
contig324 33395317_c3_11 460 3865 840 280 843 2.30E-84 [AC:U58210] [OR:Streptococcus thermophilus] [PN:tetrahydrofolate
dehydrogenase/cyclohydrolase] [GN:folD]
contig324 9923452_c2_10 461 3866 489 162 283 5.00E-25 [SP:P54520] [OR:BACILLUS SUBTILIS] [GN:YQHZ] [DE:N
UTILIZATION SUBSTANCE PROTEIN B HOMOLOG (NUSB
PROTEIN)]
contig324 10724144_c1_8 462 3867 480 159 215 8.10E-18 [SP:P54519] [OR:BACILLUS SUBTILIS] [GN:YQHY]
[DE:HYPOTHETICAL 14.7 KD PROTEIN IN ACCC-FOLD
INTERGENIC REGION]
contig325 25977343_c3_10 463 3868 780 259 299 1.00E-26 [AC:U75471] [OR:Streptococcus mutans] [PN:high affinity branched
chain amino acid] [GN:livG]
contig325 24803800_c1_7 464 3869 912 303 127 8.20E-06 [AC:D90794] [OR:Escherichia coli] [PN:L-arabinose transport system
permease protein] [GN:araH] [NT:ORF_ID]
contig326 30562806_f3_2 465 3870 1815 604 136 2.70E-05 [SP:P54334] [OR:BACILLUS SUBTILIS] [GN:XKDO] [DE:PHAGE-
LIKE ELEMENT PBSX PROTEIN XKDO]
contig327 198302_f2_2 466 3871 1923 640 1409 2.40E-144 [AC:U21932] [OR:Bacillus subtilis] [PN:L-glutamine-D-fructose-6-
phosphate] [GN:gcaA]
contig327 34652213_c1_6 467 3872 285 94
contig327 24805317_c2_8 468 3873 261 86
contig328 6445250_f1_1 469 3874 1314 437 676 1.10E-66 [SP:P46317] [OR:BACILLUS SUBTILIS] [GN:CELB]
[DE:PERMEASE IIC COMPONENT) (PHOSPHOTRANSFERASE
ENZYME II, C COMPONENT)]
contig328 24265702_c2_8 470 3875 264 87 63 0.996 [AC:Z96800] [OR:Mycobacterium tuberculosis] [PN:hypothetical
protein MTCY63.10c] [GN:PPE-family] [NT:MTCY63.10c. len]
contig328 4098527_f2_2 471 3876 309 102 77 0.087 [AC:Z71264] [OR:Caenorhabditis elegans] [PN:K07G5.1] [NT:protein
predicted using Genefinder; Weak similarity]
contig328 4882827_f3_6 472 3877 243 80 145 1.30E-09 [SP:P13254] [OR:PSEUDOMONAS PUTIDA] [DE:METHIONINE
GAMMA-LYASE, (L-METHIONINASE)]
contig329 23636086_f2_1 473 3878 888 295 62 0.56 [OR:Homo sapiens] [PN:alpha-actinin]
contig329 1050900_c1_2 474 3879 345 114 89 0.00053 [OR:Haemophilus influenzae] [PN:hypothetical protein HI0522]
contig33 33237700_c2_2 475 3880 555 185 79 0.99 [AC:U97014] [OR:Caenorhabditis elegans] [GN:T05E8.3] [NT:strong
similarity to the ‘DEAH’ subfamily of the]
contig330 4414768_f1_1 476 3881 354 117 80 0.25 [AC:U83113] [OR:Homo sapiens] [PN:INS-1 winged-helix homolog]
[NT:similar to human putative M phase phosphoprotein 2]
contig330 25673512_f1_2 477 3882 444 147 390 2.30E-36 [SP:P44551] [OR:HAEMOPHILUS INFLUENZAE] [GN:HI0174]
[DE:HYPOTHETICAL PROTEIN HI0174]
contig330 25430391_f1_3 478 3883 600 200 470 7.70E-45 [SP:P25745] [OR:ESCHERICHIA COLI] [GN:YCFB]
[DE:HYPOTHETICAL PROTEIN IN PURB 5′REGION (ORF-15)
(FRAGMENT)]
contig331 24348577_c2_6 479 3884 1347 448 1359 4.80E-139 [OR:Corynebacterium glutamicum] [PN:glutamate dehydrogenase
(NADP+),]
contig331 19582805_f2_5 480 3885 801 267 183 9.50E-14 [AC:D64004] [OR:Synechocystis sp.] [PN:hypothetical protein]
[NT:ORF_ID]
contig332 15625013_f1_1 481 3886 216 71 52 0.95 [OR:Staphylococcus sp.] [PN:fofB protein]
contig332 94063_c3_7 482 3887 945 314 966 2.10E-97 [AC:U09352] [OR:Streptococcus pyogenes] [PN:67 kDa Myosin-
crossreactive streptococcal] [NT:ORF2]
contig333 35401562_c2_4 483 3888 375 125 163 2.60E-12 [SP:Q00053] [OR:LACTOBACILLUS HELVETICUS] [GN:GALM]
[DE:ALDOSE 1-EPIMERASE, (MUTAROTASE) (FRAGMENT)]
contig333 10801250_c1_3 484 3889 696 231 720 2.50E-71 [AC:Z70730] [OR:Lactococcus lactis] [PN:beta-phosphoglucomutase]
contig333 985383_c3_5 485 3890 387 128 372 1.40E-33 [AC:Z94043] [OR:Bacillus subtilis] [PN:hypothetical protein]
[GN:yvdK] [NT:similar to trembl]
contig334 26384567_f1_1 486 3891 540 179 80 0.54 [SP:P07908] [OR:BACILLUS SUBTILIS] [GN:DNAB]
[DE:REPLICATION INITIATION AND MEMBRANE
ATTACHMENT PROTEIN]
contig334 26603431_f3_4 487 3892 936 311 613 5.40E-60 [SP:P06567] [OR:BACILLUS SUBTILIS] [GN:DNAI]
[DE:PRIMOSOMAL PROTEIN DNAI]
contig335 9766068_f1_1 488 3893 1419 472 1695 1.20E-174 [AC:Z67739] [OR:Streptococcus pneumoniae] [PN:DNA
topoisomerase IV] [GN:parE] [NT:ParE subunit]
contig336 26572687_f2_2 489 3894 1341 446 644 2.80E-63 [OR:Bacillus subtilis] [PN:DNA-directed DNA polymerase, III chain
dnaX] [GN:dnaX]
contig336 36189717_f1_1 490 3895 327 109 178 6.70E-14 [SP:P24281] [OR:BACILLUS SUBTILIS] [GN:YAAK]
[DE:HYPOTHETICAL 11.8 KD PROTEIN IN DNAZ-RECR
INTERGENIC REGION]
contig337 4118938_f3_1 491 3896 471 156 493 2.80E-47 [SP:P80239] [OR:BACILLUS SUBTILIS] [GN:AHPC]
[DE:PROTEIN 22)]
contig337 24430287_f3_2 492 3897 987 329 727 4.50E-72 [SP:P52213] [OR:CLOSTRIDIUM LITORALE] [GN:TRXB]
[DE:THIOREDOXIN REDUCTASE,]
contig338 6539712_c1_5 493 3898 657 218 91 0.025 [SP:P17418] [OR:BACTEROIDES NODOSUS] [GN:FIMC]
[DE:POSSIBLE FIMBRIAL ASSEMBLY PROTEIN FIMC
(SEROGROUP D)]
contig338 16907312_f2_2 494 3899 207 68 66 0.62 [OR:Saccharomyces cerevisiae] [PN:probable membrane protein
YPR185w]
contig338 6929713_f2_3 495 3900 390 129
contig339 33361378_f3_6 496 3901 228 75 59 0.95 [OR:Saccharomyces cerevisiae] [PN:probable membrane protein
YLR360w]
contig339 19723451_c2_7 497 3902 1491 496 984 2.60E-99 [SP:P39883] [OR:BACTEROIDES NODOSUS] [GN:PRFC]
[DE:PEPTIDE CHAIN RELEASE FACTOR 3 (RF-3)]
contig34 4867943_c2_3 498 3903 528 175 448 1.60E-42 [SP:P07672] [OR:ESCHERICHIA COLI] [GN:APT] [DE:ADENINE
PHOSPHORIBOSYLTRANSFERASE, (APRT)]
contig340 79806_f2_1 499 3904 438 145 259 1.80E-22 [SP:P26646] [OR:ESCHERICHIA COLI] [GN:YHDH] [DE:(O324)]
contig340 52318_f2_2 500 3905 789 262 132 2.50E-06 [OR:Methanococcus jannaschii] [PN:pantothenate metabolism
flavoprotein]
contig341 31773887_c1_3 501 3906 1665 555 257 2.90E-19 [SP:P23545] [OR:BACILLUS SUBTILIS] [GN:PHOR]
[DE:ALKALINE PHOSPHATASE SYNTHESIS SENSOR PROTEIN
PHOR,]
contig341 11896090_c2_4 502 3907 393 130 331 4.10E-30 [SP:P13792] [OR:BACILLUS SUBTILIS] [GN:PHOP] [DE:PHOP]
contig342 31742010_f3_1 503 3908 1341 446 633 4.10E-62 [SP:P30336] [OR:BACILLUS FIRMUS] [GN:CADA] [DE:ATPASE)]
contig343 21769802_f3_2 504 3909 234 77 131 2.10E-08 [SP:P54721] [OR:BACILLUS SUBTILIS] [GN:YFIE]
[DE:HYPOTHETICAL 31.5 KD PROTEIN IN GLVBC 3′REGION]
contig343 1208443_f3_3 505 3910 768 255 446 2.70E-42 [AC:Z83337] [OR:Bacillus subtilis] [GN:ywpI] [NT:highly similar to
phosphotransferase system]
contig343 4730340_f3_4 506 3911 276 92 67 0.34 [SP:P39918] [OR:COXIELLA BURNETII] [DE:HYPOTHETICAL
49.9 KD PROTEIN IN SPOIIIE-SERS INTERGENIC REGION]
contig344 23494127_c1_3 507 3912 861 287 238 2.60E-19 [SP:Q09320] [OR:CAENORHABDITIS ELEGANS] [GN:F40B5.2]
[DE:HYPOTHETICAL 69.0 KD PROTEIN F40B5.2 IN
CHROMOSOME X]
contig344 36070188_c2_4 508 3913 618 205 55 0.94 [SP:P45628] [OR:LEIURUS QUINQUESTRIATUS HEBRAEUS]
[DE:CHARYBDOTOXIN 2 (CHTX-LQ2) (TOXIN 18-2) (LQH 18-2)]
contig344 23849218_c3_5 509 3914 663 220 53 0.995 [SP:P13832] [OR:Rattus norvegicus] [NT:myosin regulatory light
chain (582 is 1st base in]
contig345 34550000_c3_3 510 3915 894 297 366 8.00E-34 [OR:Methanococcus jannaschii] [PN:N-ethylammeline
chlorohydrolase homolog]
contig345 4571905_c2_1 511 3916 255 84 102 0.0001 [SP:P39761] [OR:BACILLUS SUBTILIS] [GN:ADEC]
[DE:ADENINE DEAMINASE, (ADENASE) (ADENINE AMINASE)]
contig346 7089212_c3_4 512 3917 198 65
contig346 34566502_f3_2 513 3918 273 90 63 0.93 [AC:D90910] [OR:Synechocystis sp.] [PN:sensory transduction
histidine kinase] [NT:ORF_ID]
contig346 4901587_f2_1 514 3919 342 113 73 0.51 [OR:Campylobacter jejuni] [PN:cell binding factor 2]
contig347 7126_c2_7 515 3920 465 155 98 0.00031 [SP:P10051] [OR:CITROBACTER DIVERSUS]
[DE:AMINOGLYCOSIDE N6′-ACETYLTRANSFERASE, (AAC(6′))]
contig347 6690753_c2_6 516 3921 204 67 59 0.24 [SP:P34279] [OR:CAENORHABDITIS ELEGANS] [GN:C02F5.2]
[DE:HYPOTHETICAL 9.4 KD PROTEIN C02F5.2 IN
CHROMOSOME III]
contig347 26766903_f2_2 517 3922 630 209 155 1.80E-11 [OR:Methanococcus jannaschii] [PN:mutator protein mutT]
contig347 24398507_c1_5 518 3923 402 133 687 7.80E-68 [SP:P37061] [OR:ENTEROCOCCUS FAECALIS] [GN:NOX]
[DE:NADH OXIDASE, (NOXASE)]
contig348 12303138_f1_1 519 3924 189 62 58 0.31 [SP:P06385] [OR:MARCHANTIA POLYMORPHA] [GN:RPL20]
[DE:CHLOROPLAST 50S RIBOSOMAL PROTEIN L20]
contig348 4961578_c3_6 520 3925 774 257 686 9.90E-68 [SP:P42423] [OR:BACILLUS SUBTILIS] [GN:YXDL]
[DE:HYPOTHETICAL ABC TRANSPORTER ATP-BINDING
PROTEIN IN IDH 3′REGION]
contig348 9772555_c3_5 521 3926 756 251 173 8.20E-13 [SP:P45544] [OR:ESCHERICHIA COLI] [GN:YHFR] [DE:(O265)]
contig349 24306712_f3_2 522 3927 969 322 189 7.50E-13 [AC:L37338] [OR:Streptomyces peucetius] [PN:putative repressor]
[GN:dnrO] [NT:putative]
contig349 5088212_c1_3 523 3928 294 97 201 1.00E-15 [AC:Z81451] [OR:Mycobacterium tuberculosis] [PN:unknown]
[GN:MTCY428.20] [NT:MTCY428.20, len]
contig349 29735006_c3_5 524 3929 279 92 281 8.60E-25 [AC:X92418] [OR:Streptococcus thermophilus] [PN:gamma-glutamyl
phosphate reductase] [GN:proA]
contig35 26753941_c1_3 525 3930 894 298 86 0.42 [AC:U30873] [OR:Bacillus subtilis] [PN:NatB] [GN:natB]
contig350 4111032_f2_2 526 3931 348 115 63 0.098 [AC:M63929] [OR:Human immunodeficiency virus type 1] [GN:vpu]
contig350 2767891_c2_5 527 3932 369 122 68 0.992 [OR:Coxiella burnetii] [PN:mucZ protein] [GN:mucZ]
contig350 23555443_f3_4 528 3933 684 227 442 7.10E-42 [SP:P54596] [OR:BACILLUS SUBTILIS] [GN:YHCL]
[DE:HYPOTHETICAL 49.0 KD PROTEIN IN CSPB-GLPP
INTERGENIC REGION]
contig351 23438438_c3_6 529 3934 222 73 66 0.28 [AC:D70843] [OR:Bacillus stearothermophilus] [PN:heme O
synthetase] [GN:ctaB]
contig351 33252217_c1_5 530 3935 687 228 354 1.50E-32 [SP:P54168] [OR:BACILLUS SUBTILIS] [GN:YPGQ]
[DE:HYPOTHETICAL 23.1 KD PROTEIN IN BSAA-ILVD
INTERGENIC REGION]
contig351 16932962_c1_4 531 3936 525 174 269 1.50E-23 [SP:P05100] [OR:ESCHERICHIA COLI] [GN:TAG]
[DE:GLYCOSYLASE I, CONSTITUTIVE) (TAG I)]
contig352 24727050_c2_9 532 3937 252 83 52 0.78 [AC:U42580] [OR:Paramecium bursaria Chlorella virus 1]
[GN:a371R]
contig352 9766376_f1_1 533 3938 279 92 83 0.0031 [SP:P25385] [OR:SACCHAROMYCES CEREVISIAE] [GN:BOS1]
[DE:VESICULAR TRANSPORT PROTEIN BOS1]
contig352 976582_f2_3 534 3939 348 115 67 0.89 [OR:Vibrio cholerae] [PN:hypothetical protein]
contig352 672081_f3_4 535 3940 414 137 59 0.38 [OR:Phasianus colchicus] [PN:Major Histocompatibility Complex
class IIB]
contig352 26384687_f3_5 536 3941 675 224 89 0.27 [OR:Kluyveromyces marxianus var. lactis] [PN:DNA-binding protein
RAP1 homolog]
contig353 26567068_f1_1 537 3942 363 120 152 8.70E-11 [AC:Z79580] [OR:Bacillus subtilis] [GN:putative ORF]
contig353 14898587_f2_3 538 3943 915 304 637 1.50E-62 [AC:Y11213] [OR:Streptococcus thermophilus] [PN:hypothetical
protein] [GN:ORF 1]
contig353 24782806_f3_4 539 3944 612 204 195 1.10E-15 [AC:Y11213] [OR:Streptococcus thermophilus] [PN:hypothetical
protein] [GN:ORF 2]
contig354 31735783_f3_2 540 3945 582 193 414 6.60E-39 [SP:P44634] [OR:HAEMOPHILUS INFLUENZAE] [GN:HI0315]
[DE:HYPOTHETICAL PROTEIN HI0315]
contig354 35820387_f1_1 541 3946 489 162
contig355 15079207_f1_1 542 3947 840 279 156 1.70E-09 [AC:AB001488] [OR:Bacillus subtilis] [GN:ydfL] [NT:PROBABLE
REGULATORY PROTEIN, SIMILAR TO]
contig355 22459375_f2_2 543 3948 183 60
contig355 16595327_f3_3 544 3949 603 201 83 0.23 [AC:X95984] [OR:Solanum berthaultii] [PN:glutamic acid-rich protein]
contig356 25673900_f2_2 545 3950 1311 436 379 1.70E-44 [AC:AE000219] [OR:Escherichia coli] [PN:hypothetical protein in pth-
prs intergenic] [GN:ychM] [NT:f550; 98 pct identical to fragment
YCHM_ECOLI SW]
contig357 11002_c3_6 546 3951 1275 425 797 1.70E-79 [SP:P50852] [OR:BACILLUS STEAROTHERMOPHILUS]
[GN:MTLA] [DE:(EC 2.7.1.69) (EII-MTL)]
contig358 30566880_f3_3 547 3952 642 213 278 1.70E-24 [AC:Y08498] [OR:Lactobacillus gasseri] [PN:aggregation promoting
protein] [GN:apfA]
contig358 5102312_c2_5 548 3953 387 128 218 3.90E-18 [SP:P14205] [OR:BACILLUS SUBTILIS] [GN:COMAB]
[DE:COMA OPERON PROTEIN 2]
contig358 26053762_f2_2 549 3954 381 126 296 2.10E-26 [SP:P23966] [OR:BACILLUS SUBTILIS] [GN:MENB] [DE:(DHNA
SYNTHETASE)]
contig358 35738752_f1_1 550 3955 189 62 153 4.80E-11 [SP:P23966] [OR:BACILLUS SUBTILIS] [GN:MENB] [DE:(DHNA
SYNTHETASE)]
contig359 23439707_c1_4 551 3956 654 218 103 0.0091 [SP:P32896] [OR:SACCHAROMYCES CEREVISIAE] [GN:PDC2]
[DE:PDC2 PROTEIN]
contig359 35344432_c1_3 552 3957 648 215 84 0.42 [SP:P22865] [OR:LACTOCOCCUS LACTIS] [GN:USP45]
[DE:SECRETED 45 KD PROTEIN PRECURSOR]
contig36 14648432_c1_1 553 3958 240 79 62 0.62 [OR:Saccharomyces cerevisiae] [PN:hypothetical protein YPR128c]
contig36 33242125_c2_2 554 3959 390 129 93 6.80E-05 [AC:AB001488] [OR:Bacillus subtilis] [GN:ydcN] [NT:PROBABLE
REPRESSOR PROTEIN.]
contig360 5910087_f1_1 555 3960 228 75 120 1.10E-07 [AC:AE000139] [OR:Escherichia coli] [NT:o201; This 201 aa orf is
28 pct identical (6 gaps)]
contig360 35995177_f1_2 556 3961 1236 411 500 5.10E-48 [AC:AE000139] [OR:Escherichia coli] [NT:o460; This 460 aa orf is
23 pct identical (24 gaps)]
contig361 22459375_c2_5 557 3962 1128 375 658 9.20E-65 [AC:U81166] [OR:Lactococcus lactis cremoris] [PN:histidine kinase
LlkinA] [GN:llkinA]
contig361 234812_c1_4 558 3963 249 82 56 0.45 [AC:L05017] [OR:Streptococcus pyogenes] [PN:M-like protein]
contig362 6347031_f2_3 559 3964 537 178 113 3.30E-05 [AC:U25682] [OR:Pasteurella haemolytica] [PN:Lpp38] [NT:38 kDa
lipoprotein]
contig362 10759680_f1_1 560 3965 477 158 218 3.30E-17 [SP:P39761] [OR:BACILLUS SUBTILIS] [GN:ADEC]
[DE:ADENINE DEAMINASE, (ADENASE) (ADENINE AMINASE)]
contig362 991678_f1_2 561 3966 846 281 232 9.20E-19 [OR:Methanococcus jannaschii] [PN:adenine deaminase,]
contig363 23991442_f3_2 562 3967 972 323 416 4.00E-39 [SP:P39584] [OR:BACILLUS SUBTILIS] [GN:YWBA]
[DE:HYPOTHETICAL 47.6 KD PROTEIN IN EPR-GALK
INTERGENIC REGION]
contig363 2866677_c3_4 563 3968 387 128 70 0.89 [SP:Q01104] [OR:PSEUDOMONAS PUTIDA] [GN:PCAJ] [DE:3-
OXOADIPATE COA-TRANSFERASE SUBUNIT B,]
contig364 3367202_f1_1 564 3969 186 61 55 0.52 [AC:M19541] [OR:Human adenovirus type 41] [PN:hexon protein]
contig364 24398313_f1_2 565 3970 807 268 202 1.90E-16 [SP:P54567] [OR:BACILLUS SUBTILIS] [GN:YQKD]
[DE:HYPOTHETICAL 34.6 KD PROTEIN IN GLNQ-ANSR
INTERGENIC REGION]
contig364 21928124_c1_3 566 3971 480 159
contig365 35429692_c1_4 567 3972 600 199 346 1.10E-31 [AC:AB002668] [OR:Haemophilus actinomycetemcomitans]
[NT:unnamed protein product]
contig365 14859375_c3_5 568 3973 936 311 900 2.10E-90 [AC:U09239] [OR:Streptococcus pneumoniae] [GN:cps19fO]
[NT:32.3 kDa cps19fO gene product]
contig366 80186_f3_4 569 3974 768 255 121 1.70E-07 [SP:P27246] [OR:ESCHERICHIA COLI] [GN:MARA]
[DE:MULTIPLE ANTIBIOTIC RESISTANCE PROTEIN MARA]
contig366 23712543_f1_1 570 3975 849 282 65 0.29 [AC:U11222] [OR:Cervus elaphus] [PN:MHC class II DRB]
[GN:CEEL-DRB] [NT:CEEL-DRB37 allele]
contig366 25995462_c3_11 571 3976 216 71 112 6.60E-07 [AC:X94434] [OR:Lactobacillus plantarum] [PN:PlnM] [GN:plnM]
[NT:putative]
contig366 29550659_c1_6 572 3977 813 270 710 2.80E-70 [AC:D50453] [OR:Bacillus subtilis] [PN:homologue of Di-tripeptide
transporter Dtp of L.] [GN:yclF]
contig367 11879577_c1_6 573 3978 609 202 108 0.0017 [SP:P30314] [OR:BACTERIOPHAGE SP01] [GN:31] [DE:DNA
POLYMERASE,]
contig367 25651443_c1_5 574 3979 735 244 93 0.032 [AC:D90868] [OR:Escherichia coli] [PN:PUTATIVE PEPTIDASE IN
GCVT-SPOIIIAA INTERGENIC] [GN:YQHT] [NT:similar to
[SwissProt Accession Number P54518]
contig368 25446000_c3_5 575 3980 843 281 103 0.009 [OR:Streptococcus pyogenes] [PN:rofA protein]
contig368 21516012_c2_3 576 3981 993 330 100 0.0032 [SP:P31465] [OR:ESCHERICHIA COLI] [GN:YIEF]
[DE:HYPOTHETICAL 20.4 KD PROTEIN IN TNAB-BGLB
INTERGENIC REGION]
contig369 4563832_f3_2 577 3982 1026 341 917 3.30E-92 [SP:P52985] [OR:LACTOCOCCUS LACTIS] [GN:HOM]
[DE:HOMOSERINE DEHYDROGENASE, (HDH)]
contig369 20739053_f2_1 578 3983 678 225 718 4.00E-71 [SP:P09123] [OR:BREVIBACTERIUMBACILLUS SP] [GN:THRC]
[DE:THREONINE SYNTHASE,]
contig37 201_f2_1 579 3984 756 252 171 3.70E-13 [OR:Yersinia enterocolitica] [PN:hemin binding protein]
contig370 24657761_c1_8 580 3985 369 122
contig370 6673515_c3_11 581 3986 639 212 75 0.34 [OR:Escherichia coli] [PN:hypothetical protein C-125]
contig370 16836591_f1_1 582 3987 741 246 162 4.50E-15 [OR:Haemophilus influenzae] [PN:dihydrolipoamide acetyltransferase
(acoC) homolog]
contig370 783156_c2_9 583 3988 231 76 58 0.29 [OR:Lycopersicon esculentum] [PN:gamma-thionin-like protein
precursor]
contig370 4867202_f3_6 584 3989 303 100 73 0.078 [AC:K00825] [OR:Mitochondrion Neurospora crassa] [NT:cyt
oxidase subunit 2 prepeptide (‘taa’ stop]
contig371 7214051_c2_5 585 3990 186 61 60 0.88 [OR:Phytophthora capsici] [PN:serine/threonine kinase]
contig371 26571937_f2_1 586 3991 663 220 375 8.90E-35 [AC:D90907] [OR:Synechocystis sp.] [PN:glutamine-binding
periplasmic protein] [GN:glnH] [NT:ORF_ID]
contig371 13754838_f2_2 587 3992 741 246 446 2.70E-42 [SP:P10346] [OR:ESCHERICHIA COLI] [GN:GLNQ]
[DE:GLUTAMINE TRANSPORT ATP-BINDING PROTEIN GLNQ]
contig372 31288212_f1_1 588 3993 366 121 70 0.3 [AC:Y13049] [OR:Sulfolobus acidocaldarius] [PN:orf1 hypothetical
protein]
contig372 22694012_f3_4 589 3994 876 291 202 1.30E-14 [SP:Q10384] [OR:MYCOBACTERIUM TUBERCULOSIS]
[GN:MTCY190.02] [DE:HYPOTHETICAL 69.2 KD PROTEIN
CY190.02]
contig372 26765937_f2_2 590 3995 387 128 51 0.88 [AC:L38403] [OR:Plasmid pNB2]
contig372 36428176_f3_5 591 3996 279 92 67 0.64 [SP:P46681] [OR:SACCHAROMYCES CEREVISIAE] [GN:AIP2]
[DE:ACTIN INTERACTING PROTEIN 2]
contig372 25673905_c2_6 592 3997 675 224 644 2.80E-63 [SP:P37550] [OR:BACILLUS SUBTILIS] [GN:YABH]
[DE:HYPOTHETICAL 31.7 KD PROTEIN IN SSPF-PURR
INTERGENIC REGION (ORF1)]
contig373 31272288_f3_1 593 3998 381 126 209 3.50E-17 [SP:P54464] [OR:BACILLUS SUBTILIS] [GN:YQEY]
[DE:HYPOTHETICAL 16.8 KD PROTEIN IN RPSU-PHOH
INTERGENIC REGION]
contig373 11131876_f3_2 594 3999 1068 355 957 1.90E-96 [SP:P46343] [OR:BACILLUS SUBTILIS] [GN:PHOH] [DE:PHOH
PROTEIN HOMOLOG]
contig374 14664128_f2_1 595 4000 675 224 635 2.50E-62 [SP:P49668] [OR:PEDIOCOCCUS ACIDILACTICI] [GN:RPSB]
[DE:30S RIBOSOMAL PROTEIN S2]
contig374 34650760_f2_2 596 4001 648 215 388 3.80E-36 [SP:P19216] [OR:SPIROPLASMA CITRI] [GN:TSF]
[DE:ELONGATION FACTOR TS (EF-TS)]
contig375 32131340_c2_2 597 4002 702 233 74 0.023 [OR:Methanococcus jannaschii] [PN:hypothetical protein MJ0346]
contig376 35352181_c2_10 598 4003 660 220 413 8.40E-39 [AC:Y14078] [OR:Bacillus subtilis] [PN:Hypothetical protein]
[GN:yhaM] [NT:similarity to CMP-binding-factor-1 (cbf1) from]
contig376 26742082_c3_11 599 4004 2025 674 249 6.50E-19 [AC:Y14078] [OR:Bacillus subtilis] [PN:Hypothetical protein]
[GN:yhaN] [NT:similarity to orfX from Staphylococcus aureus]
contig377 583567_f3_2 600 4005 375 124 386 6.10E-36 [AC:Y11463] [OR:Streptococcus pneumoniae] [NT:ORF3]
contig377 133562_c1_4 601 4006 552 183 61 0.36 [AC:M15420] [OR:Bacillus subtilis] [NT:N-acetyl-gamma-glutamyl-
phosphate reductase (EC]
contig377 4179177_c2_6 602 4007 501 166 82 0.62 [AC:U08920] [OR:Chloroplast Lycium cestroides] [PN:NADH
dehydrogenase subunit] [GN:ndhF]
contig377 26204712_c1_3 603 4008 423 140 68 0.98 [SP:Q57741] [OR:METHANOCOCCUS JANNASCHII]
[GN:MJ0293] [DE:PROBABLE THYMIDYLATE KINASE, (DTMP
KINASE)]
contig378 35945327_c3_5 604 4009 657 218 158 3.00E-19 [AC:U55214] [OR:Treponema pallidum] [PN:Pfs] [GN:pfs]
[NT:similar to E. coli Pfs encoded by GenBank Accession]
contig378 34665913_c2_4 605 4010 387 128 52 0.93 [AC:K03489] [OR:Human herpesvirus 4] [NT:nuclear antigen
(EBNA2)]
contig379 36227187_c2_8 606 4011 336 111 296 2.10E-26 [SP:P42904] [OR:ESCHERICHIA COLI] [GN:AGAV]
[DE:ENZYME II, B COMPONENT 2),]
contig379 2240952_c3_9 607 4012 438 145 199 4.00E-16 [AC:U65015] [OR:Vibrio furnissii] [PN:PTS permease for mannose
subunit IIIMan N] [GN:manW] [NT:ManW; IIAMan]
contig379 195342_c2_7 608 4013 855 284 83 0.85 [AC:U50300] [OR:Caenorhabditis elegans] [GN:R03H4.5] [NT:weak
similarity to exoZ gene product from Rhizobium]
contig38 26023541_f3_1 609 4014 204 67 124 3.50E-08 [SP:P21477] [OR:BACILLUS SUBTILIS] [GN:RPST] [DE:30S
RIBOSOMAL PROTEIN S20 (BS20)]
contig38 6723262_c3_2 610 4015 327 108 196 8.30E-16 [SP:P49851] [OR:BACILLUS SUBTILIS] [GN:YKHA]
[DE:HYPOTHETICAL 20.1 KD PROTEIN IN HMP 5′REGION
(ORF1)]
contig380 26828840_f3_2 611 4016 684 227 79 0.95 [AC:AB001896] [OR:Staphylococcus aureus] [GN:dnaG]
contig380 7042337_f2_1 612 4017 864 287 664 2.10E-65 [AC:D86418] [OR:Bacillus subtilis] [PN:YfmR]
contig381 5960778_f1_1 613 4018 492 163 133 2.90E-08 [SP:P37189] [OR:ESCHERICHIA COLI] [GN:GATC]
[DE:PERMEASE IIC COMPONENT) (PHOSPHOTRANSFERASE
ENZYME II, C COMPONENT)]
contig381 20585887_c3_6 614 4019 786 261 661 4.40E-65 [SP:P26422] [OR:STREPTOCOCCUS MUTANS] [GN:LACR]
[DE:LACTOSE PHOSPHOTRANSFERASE SYSTEM REPRESSOR]
contig381 24354038_c1_4 615 4020 267 88 66 0.28 [SP:Q05278] [OR:MYCOBACTERIOPHAGE L5] [GN:6]
[DE:MINOR TAIL PROTEIN GP6]
contig382 24845937_c2_3 616 4021 1896 631 619 3.30E-87 [AC:U93874] [OR:Bacillus subtilis] [PN:hypothetical protein YrhL]
[GN:yrhL] [NT:similar to Haemophilus influenzae hypothetical]
contig383 9940693_f3_5 617 4022 555 184 369 3.90E-34 [AC:U93874] [OR:Bacillus subtilis] [PN:formate dehydrogenase]
[GN:yrhG] [NT:similar to Methanobacterium formicicum formate]
contig383 22832811_f1_1 618 4023 639 212 73 0.9998 [AC:X99485] [OR:Lupinus luteus] [PN:G protein] [NT:alpha subunit]
contig383 192677_c2_13 619 4024 192 63
contig383 30508562_f3_6 620 4025 477 158 200 3.10E-16 [AC:Z77663] [OR:Caenorhabditis elegans] [PN:F53F4.10] [NT:protein
predicted using Genefinder; Similarity to]
contig383 35985342_f3_7 621 4026 651 216 430 1.30E-40 [SP:Q56222] [OR:THERMUS AQUATICUS] [GN:NQO1]
[DE:DEHYDROGENASE 1, CHAIN 1) (NDH-1, CHAIN 1)]
contig383 24619052_f3_8 622 4027 666 221 322 1.10E-28 [AC:D90911] [OR:Synechocystis sp.] [PN:hydrogenase subunit]
[GN:hoxf] [NT:ORF_ID]
contig383 2149037_f2_4 623 4028 336 112 162 3.30E-12 [AC:D90911] [OR:Synechocystis sp.] [PN:hydrogenase subunit]
[GN:hoxU] [NT:ORF_ID]
contig384 34250405_c3_6 624 4029 834 278 882 1.70E-88 [OR:Streptococcus pneumoniae] [PN:helicase recG homolog]
contig384 214818_c1_4 625 4030 1095 364 855 1.20E-85 [OR:Streptococcus pneumoniae] [PN:helicase recG homolog]
contig384 7036537_c2_5 626 4031 216 71 63 0.31 [OR:Saccharomyces cerevisiae] [PN:hypothetical protein YLR456w]
contig385 34651628_f3_3 627 4032 627 208 59 0.55 [AC:D17510] [OR:Chloroplast Pinus thunbergiana] [PN:ORF42b]
[GN:psaM]
contig385 34277265_c1_5 628 4033 291 96 59 0.57 [SP:P44624] [OR:HAEMOPHILUS INFLUENZAE] [GN:AMPD]
[DE:AMPD PROTEIN HOMOLOG]
contig385 4121044_c1_4 629 4034 1482 493 1444 4.70E-148 [SP:P23920] [OR:BACILLUS STEAROTHERMOPHILUS]
[GN:METS] [DE:(METRS)]
contig386 4565701_c1_4 630 4035 201 67 110 1.10E-05 [SP:P14218] [OR:PSEUDOMONAS FLUORESCENS] [GN:LPD]
[DE:DIHYDROLIPOAMIDE DEHYDROGENASE,]
contig386 5133558_c3_5 631 4036 1047 348 719 3.20E-71 [SP:P54533] [OR:BACILLUS SUBTILIS] [GN:BFMBC]
[DE:DEHYDROGENASE) (LPD-VAL)]
contig386 5193839_c1_2 632 4037 396 131 281 8.20E-25 [SP:Q05619] [OR:CLOSTRIDIUM ACETOBUTYLICUM]
[GN:BUK] [DE:BUTYRATE KINASE, (BK)]
contig387 35410176_f1_1 633 4038 699 232 476 1.80E-45 [SP:P39805] [OR:BACILLUS SUBTILIS] [GN:LICT]
[DE:TRANSCRIPTION ANTITERMINATOR LICT]
contig387 32620287_f2_3 634 4039 192 63 98 2.00E-05 [AC:X00754] [OR:Bacillus subtilis] [GN:open reading frame]
contig387 5078128_f2_4 635 4040 726 241 550 2.60E-53 [AC:L49336] [OR:Clostridium longisporum] [PN:PTS-dependent
enzyme II] [GN:abgF]
contig387 23831266_f2_5 636 4041 249 82 162 4.00E-11 [AC:L49336] [OR:Clostridium longisporum] [PN:PTS-dependent
enzyme II] [GN:abgF]
contig388 32242302_c1_5 637 4042 360 119 424 5.70E-40 [SP:P38424] [OR:BACILLUS SUBTILIS] [GN:YSXC]
[DE:HYPOTHETICAL 22.0 KD PROTEIN IN LON-HEMA
INTERGENIC REGION (ORFX)]
contig388 23860912_c3_7 638 4043 264 87 77 0.0034 [AC:U66708] [OR:Vibrio parahaemolyticus] [PN:ClpX-like protein]
[GN:clpX]
contig388 23626707_c2_6 639 4044 1008 335 1274 4.90E-130 [SP:P50866] [OR:BACILLUS SUBTILIS] [GN:CLPX] [DE:ATP-
DEPENDENT CLP PROTEASE ATP-BINDING SUBUNIT CLPX]
contig388 14064385_f1_1 640 4045 303 100 217 4.90E-18 [AC:Z75208] [OR:Bacillus subtilis] [PN:hypothetical protein]
[GN:ysoC] [NT:unknown function; putative]
contig389 29558587_f1_1 641 4046 783 260 385 7.80E-36 [SP:P23878] [OR:ESCHERICHIA COLI] [GN:FEPC] [DE:FERRIC
ENTEROBACTIN TRANSPORT ATP-BINDING PROTEIN FEPC]
contig389 4960842_f1_2 642 4047 552 183 215 8.10E-18 [AC:AE000332] [OR:Escherichia coli] [NT:f159 was f126; This 126
aa orf is 33 pct identical]
contig389 24649042_f3_3 643 4048 573 191 210 2.70E-17 [SP:P45515] [OR:CITROBACTER FREUNDII]
[DE:HYPOTHETICAL 19.8 KD PROTEIN IN DHAR-DHAT
INTERGENIC REGION (ORFW)]
contig39 656663_f3_1 644 4049 504 168 247 3.30E-21 [AC:AB001488] [OR:Bacillus subtilis] [GN:ydcL] [NT:PROBABLE
INTEGRASE.]
contig390 32479712_f1_1 645 4050 636 211 604 4.80E-59 [SP:P31080] [OR:BACILLUS SUBTILIS] [GN:LEXA] [DE:SOS
REGULATORY PROTEIN LEXA/DINR]
contig390 14947183_c3_7 646 4051 1173 390 959 1.20E-96 [AC:Z93102] [OR:Bacillus subtilis] [PN:hypothetical 48.5 kd protein]
[GN:ygaP]
contig390 29328156_c3_6 647 4052 189 62 77 0.0064 [OR:Lactococcus lactis] [PN:dihydrofolate reductase,]
contig391 32128186_f1_1 648 4053 204 67 66 0.048 [AC:Z33252] [OR:Mycoplasma capricolum] [PN:DNA polymerase III
(alpha)] [NT:ORF identified by homology to SwissProt entry]
contig391 24883437_f2_3 649 4054 972 323 652 4.00E-64 [AC:Z80835] [OR:Bacillus subtilis] [PN:FMN adenylyltransferase]
[GN:ribC] [NT:riboflavin kinase]
contig391 4726443_f1_2 650 4055 570 189 118 1.90E-06 [SP:P39368] [OR:ESCHERICHIA COLI] [GN:YJHQ]
[DE:HYPOTHETICAL 20.0 KD PROTEIN IN FECI-FIMB
INTERGENIC REGION (F181)]
contig391 10003183_f3_5 651 4056 276 91 51 0.85 [AC:X90990] [OR:Solanum tuberosum] [PN:sucrase] [NT:potential]
contig392 30565930_f3_2 652 4057 1122 373 122 3.10E-05 [OR:Haliotis rufescens] [PN:tropomyosin]
contig392 26443837_c2_8 653 4058 708 235 499 6.50E-48 [AC:Y08559] [OR:Bacillus subtilis] [PN:Unknown] [GN:ywnB]
contig392 16832925_c1_5 654 4059 432 143 323 2.90E-29 [AC:Y08559] [OR:Bacillus subtilis] [PN:Unknown] [GN:ywnA]
contig392 12144840_c3_9 655 4060 393 130 145 2.10E-10 [AC:AE000198] [OR:Escherichia coli] [PN:hypothetical protein in
helD 5′ region] [GN:yccF] [NT:f148; 100 pct to fragment
YCCF_ECOLI SW]
contig393 23443811_f1_1 656 4061 798 265 61 0.54 [SP:Q53866] [OR:STREPTOMYCES COELICOLOR]
[DE:HYPOTHETICAL PROTEIN IN PTPA 5′REGION (ORF1)
(FRAGMENT)]
contig393 5111088_c3_12 657 4062 429 142 146 1.70E-10 [AC:U28163] [OR:Lactobacillus curvatus] [PN:EIIA-man] [GN:manA]
[NT:mannose phosphotransferase system enzyme EII]
contig393 32235050_c3_11 658 4063 588 195 104 5.70E-08 [AC:D90872] [OR:Escherichia coli] [PN:BETA-LACTAMASE
PRECURSOR (EC 3.5.2.6)] [GN:AMPC] [NT:similar to [SwissProt
Accession Number P24735]
contig394 34064402_f1_1 659 4064 486 161 64 0.17 [AC:S81098] [OR:Shigella flexneri] [PN:TEM-type beta-lactamase]
[NT:This sequence comes from Table 2A. Author-given]
contig394 23986437_f2_3 660 4065 468 155 69 0.99 [OR:infectious pancreatic necrosis virus] [PN:genome polyprotein]
contig394 10400626_c2_5 661 4066 360 119 66 0.048 [SP:Q60373] [OR:METHANOCOCCUS JANNASCHII]
[GN:MJ0070] [DE:HYPOTHETICAL PROTEIN MJ0070]
contig394 34276576_c3_6 662 4067 654 217 734 8.10E-73 [SP:P43435] [OR:ENTEROCOCCUS HIRAE] [GN:NTPD]
[DE:TRANSLOCATING ATPASE SUBUNIT D)]
contig394 32157531_c2_4 663 4068 255 84 317 1.30E-28 [SP:Q08637] [OR:ENTEROCOCCUS HIRAE] [GN:NTPB]
[DE:TRANSLOCATING ATPASE SUBUNIT B)]
contig395 3382837_f1_1 664 4069 720 239 139 9.20E-21 [SP:P23553] [OR:CALDOCELLUM SACCHAROLYTICUM]
[GN:XYNC] [DE:ACETYL ESTERASE,]
contig395 22003187_c2_5 665 4070 576 191 140 1.10E-09 [SP:P03038] [OR:ESCHERICHIA COLI] [GN:TETR]
[DE:TETRACYCLINE REPRESSOR PROTEIN CLASS A
(TRANSPOSON 1721)]
contig396 30546912_f1_1 666 4071 1662 553 239 1.10E-36 [AC:AE000176] [OR:Escherichia coli] [GN:ybgB] [NT:o877; 100 pct
identical to the first 86 residues of]
contig397 5116502_f2_1 667 4072 1878 626 1348 7.00E-138 [AC:D86418] [OR:Bacillus subtilis] [PN:Yfnl]
contig398 29407562_f3_4 668 4073 225 74 55 0.994 [SP:Q10702] [OR:MYCOBACTERIUM TUBERCULOSIS]
[GN:MTCY49.33C] [DE:HYPOTHETICAL 33.9 KD PROTEIN
CY49.33C]
contig398 36129837_f1_1 669 4074 558 185 296 2.10E-26 [SP:P41027] [OR:BACILLUS CALDOLYTICUS] [GN:SIPC]
[DE:SIGNAL PEPTIDASE 1, (SPASE 1) (LEADER PEPTIDASE 1)]
contig399 31750775_f1_1 670 4075 399 132 460 8.80E-44 [AC:Z82044] [OR:Bacillus subtilis] [PN:hypothetical 16.4 kd protein]
[GN:ygaG] [NT:homology to ferric uptake regulation protein]
contig399 36207932_f1_2 671 4076 912 303 1257 3.10E-128 [SP:P37061] [OR:ENTEROCOCCUS FAECALIS] [GN:NOX]
[DE:NADH OXIDASE, (NOXASE)]
contig4 4886578_c3_2 672 4077 321 106 164 2.00E-12 [SP:P39345] [OR:ESCHERICHIA COLI] [GN:YJGU] [DE:(EC 1.-.-.-)
(F254)]
contig4 785388_c2_1 673 4078 363 120 301 6.20E-27 [AC:X96977] [OR:Enterococcus faecalis] [GN:orf11]
contig40 7032013_f1_1 674 4079 615 205 505 1.50E-48 [SP:P46352] [OR:BACILLUS SUBTILIS] [GN:RIPX]
[DE:PROBABLE INTEGRASE/RECOMBINASE RIPX]
contig400 30474038_f3_3 675 4080 969 322 53 0.9999 [AC:X81139] [OR:garlic latent virus] [PN:7,8K protein] [GN:coat
protein]
contig400 23652182_f2_2 676 4081 768 255 721 1.90E-71 [AC:AE000290] [OR:Escherichia coli] [NT:o238; This 238 aa orf is
40 pct identical (5 gaps)]
contig401 11218958_c2_10 677 4082 468 156 252 9.70E-22 [SP:P54458] [OR:BACILLUS SUBTILIS] [GN:YQEM]
[DE:HYPOTHETICAL 28.3 KD PROTEIN IN AROD-COMER
INTERGENIC REGION]
contig401 4024213_c1_8 678 4083 213 70 176 1.10E-13 [SP:P54457] [OR:BACILLUS SUBTILIS] [GN:YQEL]
[DE:HYPOTHETICAL 13.3 KD PROTEIN IN AROD-COMER
INTERGENIC REGION]
contig401 23834632_c3_12 679 4084 621 206 445 3.40E-42 [SP:P54456] [OR:BACILLUS SUBTILIS] [GN:YQEK]
[DE:HYPOTHETICAL 21.3 KD PROTEIN IN AROD-COMER
INTERGENIC REGION]
contig401 26462807_c2_9 680 4085 663 220 519 4.90E-50 [SP:P54455] [OR:BACILLUS SUBTILIS] [GN:YQEJ]
[DE:HYPOTHETICAL 22.2 KD PROTEIN IN AROD-COMER
INTERGENIC REGION]
contig401 23474062_c3_11 681 4086 351 116 244 6.80E-21 [SP:P54454] [OR:BACILLUS SUBTILIS] [GN:YQEI]
[DE:HYPOTHETICAL 10.8 KD PROTEIN IN AROD-COMER
INTERGENIC REGION]
contig401 4688164_c1_6 682 4087 462 153 501 4.00E-48 [SP:P54453] [OR:BACILLUS SUBTILIS] [GN:YQEH]
[DE:HYPOTHETICAL 41.0 KD PROTEIN IN NUCB-AROD
INTERGENIC REGION]
contig402 21689766_f3_2 683 4088 1752 583 229 1.60E-15 [AC:Z85982] [OR:Mycobacterium tuberculosis] [PN:unknown]
[GN:MTCY06H11.04c] [NT:MTCY06H11.04c, len]
contig402 16853836_c1_3 684 4089 186 61 117 6.50E-07 [SP:P54470] [OR:BACILLUS SUBTILIS] [GN:YQFL]
[DE:HYPOTHETICAL 30.3 KD PROTEIN IN GLYS-DNAG/DNAE
INTERGENIC REGION]
contig403 19569007_c2_5 685 4090 522 174 119 1.20E-07 [OR:Haemophilus parainfluenzae] [PN:tetracycline resistance protein]
[GN:tetR]
contig403 4491000_f3_2 686 4091 552 183 279 1.30E-24 [AC:D83026] [OR:Bacillus subtilis] [GN:yxkA] [NT:hypothetical]
contig403 4954693_f1_1 687 4092 1248 415 865 1.10E-86 [SP:P25744] [OR:ESCHERICHIA COLI] [GN:YCEE]
[DE:HYPOTHETICAL 43.9 KD PROTEIN IN MSYB-HTRB
INTERGENIC REGION (ORF1)]
contig404 36535250_f2_3 688 4093 1821 606 1475 2.40E-151 [AC:U73807] [OR:Moorella thermoacetica] [PN:formate
dehydrogenase alpha subunit] [GN:fdha] [NT:selenocysteine]
contig405 2382628_f1_2 689 4094 183 60 71 0.015 [AC:Y08502] [OR:Mitochondrion Arabidopsis thaliana] [GN:tRNA-
Ser] [NT:orf107g]
contig406 15824193_c3_11 690 4095 468 155 194 1.40E-15 [SP:P00373] [OR:ESCHERICHIA COLI] [GN:PROC]
[DE:PYRROLINE-5-CARBOXYLATE REDUCTASE, (P5CR) (P5C
REDUCTASE)]
contig406 15631937_c2_10 691 4096 1827 608 908 5.60E-149 [AC:Y14079] [OR:Bacillus subtilis] [PN:hypothetical protein]
[GN:yhxB] [NT:see EMBL M34393 and Swiss Prot P18159.; This
could]
contig406 6823763_c2_9 692 4097 186 61 59 0.24 [AC:X16625] [OR:Mus musculus] [PN:MALA-2 protein] [NT:N-
terminal fragment (AA 1-56)]
contig406 2441309_c2_8 693 4098 918 305 371 2.40E-34 [AC:AB001488] [OR:Bacillus subtilis] [GN:ydfD] [NT:SIMILAR TO
THE RHIZOPINE CATABOLISM (MOCR) GENE OF]
contig407 3367812_c3_9 694 4099 1119 373 727 4.50E-72 [AC:D83026] [OR:Bacillus subtilis] [GN:cydD] [NT:homologous to
many ATP-binding transport proteins;]
contig407 31366394_c1_7 695 4100 1317 438 955 3.10E-96 [AC:D83026] [OR:Bacillus subtilis] [GN:cydC] [NT:homologous to
many ATP-binding transport proteins]
contig408 7035176_f2_2 696 4101 807 268 336 1.20E-30 [OR:Staphylococcus aureus] [PN:llm protein] [GN:llm]
contig408 35657558_f3_4 697 4102 483 160 222 2.20E-18 [OR:Staphylococcus aureus] [PN:llm protein] [GN:llm]
contig408 34414193_f1_1 698 4103 795 264 350 4.00E-32 [AC:X81320] [OR:Acinetobacter calcoaceticus] [GN:epsX]
contig408 3415936_f3_5 699 4104 759 253 99 0.009 [SP:P37782] [OR:SHIGELLA FLEXNERI] [GN:RFBF] [DE:DTDP-
RHAMNOSYL TRANSFERASE RFBF,]
contig409 12986668_f3_3 700 4105 294 97 127 1.70E-08 [AC:D84214] [OR:Bacillus subtilis] [PN:YbbG]
contig409 35942192_c1_6 701 4106 732 243 136 1.00E-12 [OR:Escherichia coli] [PN:hypothetical protein o215b]
contig409 7033138_c3_9 702 4107 678 225 352 2.40E-32 [SP:P54501] [OR:BACILLUS SUBTILIS] [GN:YQGX]
[DE:HYPOTHETICAL 23.2 KD PROTEIN IN SODA-COMGA
INTERGENIC REGION]
contig409 171952_f2_1 703 4108 1386 461 508 7.20E-49 [OR:Saccharomyces cerevisiae] [PN:hypothetical protein YDL238c]
contig409 25665937_f2_2 704 4109 231 77 95 0.00039 [SP:P42086] [OR:BACILLUS SUBTILIS] [GN:PBUX]
[DE:XANTHINE PERMEASE]
contig41 21992790_c3_6 705 4110 453 151 310 6.90E-28 [AC:Y10304] [OR:Bacillus subtilis] [PN:polypeptide deformylase]
[GN:def]
contig41 14661542_c1_3 706 4111 282 93 80 0.023 [AC:Y10304] [OR:Bacillus subtilis] [GN:priA]
contig41 14850064_c2_4 707 4112 279 93 214 8.80E-17 [AC:Y10304] [OR:Bacillus subtilis] [GN:priA]
contig410 34267517_c3_6 708 4113 603 200 66 0.93 [AC:U64502] [OR:Homo sapiens] [PN:immunoglobulin heavy chain
variable region]
contig410 33478377_c1_4 709 4114 618 205 97 0.0086 [AC:Z73102] [OR:Caenorhabditis elegans] [PN:B0035.15] [NT:cDNA
EST CEESB05F comes from this gene]
contig410 29534809_c2_5 710 4115 276 91 118 8.80E-07 [AC:AB001488] [OR:Bacillus subtilis] [GN:ydcR] [NT:SIMILAR TO
ORF20 OF ENTEROCOCCUS FAECALIS]
contig411 6735937_f3_4 711 4116 444 147 146 2.90E-09 [OR:Pseudomonas diminuta] [PN:isoquinoline 1-oxidoreductase 80k
chain] [GN:iorB]
contig411 2444062_f1_1 712 4117 627 208 129 1.80E-07 [AC:AE000371] [OR:Escherichia coli] [NT:o192; This 192 aa orf is
22 pct identical (12 gaps)]
contig411 12698400_f2_2 713 4118 411 137 145 1.40E-09 [AC:D64004] [OR:Synechocystis sp.] [PN:NifS] [GN:nifS]
[NT:ORF_ID]
contig412 36525062_f1_1 714 4119 192 63 59 0.65 [OR:Streptococcus sanguis] [PN:hypothetical IgA1 gene 5′-region
protein]
contig412 24407202_f2_4 715 4120 1101 366 133 7.20E-06 [OR:Staphylococcus epidermidis] [PN:PepT protein]
contig412 34382752_f3_7 716 4121 501 166 289 1.00E-24 [SP:P40416] [OR:SACCHAROMYCES CEREVISIAE] [GN:ATM1]
[DE:MITOCHONDRIAL TRANSPORTER ATM1 PRECURSOR]
contig412 16296901_f3_8 717 4122 438 145 82 0.054 [OR:Staphylococcus aureus] [PN:hypothetical protein]
contig412 29322032_f1_3 718 4123 315 104 77 0.054 [AC:U51115] [OR:Bacillus subtilis] [PN:YebA] [GN:yebA]
[NT:encodes 5 transmembrane helixes]
contig412 50076_c2_13 719 4124 477 158 59 0.38 [AC:X00394] [OR:Saccharomyces cerevisiae] [NT:Ty protein]
contig412 4417817_c3_16 720 4125 393 130 83 0.0081 [AC:AB001488] [OR:Bacillus subtilis] [GN:ydgF] [NT:PROBABLE
AMINO ACID TRANSPORT PERMEASE.]
contig412 4885926_f3_9 721 4126 198 65
contig413 24415938_f2_5 722 4127 270 89 62 0.54 [AC:Z81489] [OR:Caenorhabditis elegans] [PN:C55A1.d] [NT:protein
predicted using Genefinder; preliminary]
contig413 13937768_f1_1 723 4128 432 143 54 0.78 [SP:P11339] [OR:SPIROPLASMA VIRUS 4] [GN:7] [DE:GENE 7
P