  |
The RCSB Protein Data Bank (PDB) - http://www.rcsb.org/pdb/
Archive of experimentally determined 3-D structures of biological macromolecules, originally established at Brookhaven National Laboratory. |
  |
National Space Science Data Center - http://nssdc.gsfc.nasa.gov/
Provides access to a wide variety of astrophysics, space physics, solar physics, lunar and planetary data from NASA space flight missions, in addition to selected other data and some models and software. |
  |
Penn Treebank Project - http://www.cis.upenn.edu/~treebank/
A corpus of parsed sentences, used by many researchers to train data-driven parsing algorithms. |
  |
TREC Data - http://trec.nist.gov/data.html
Text datasets used in information retrieval and learning in text domains. |
  |
Reuters-21578 Text Categorization Corpus - http://www.daviddlewis.com/resources/testcollections/reuters21578/
A classic benchmark for text categorization algorithms. |
  |
Time Series Data Library - http://www-personal.buseco.monash.edu.au/~hyndman/TSDL/
A collection of over 500 time series, maintained by Rob Hyndman. Time series are organized by subject. |
  |
DELVE - Data for Evaluating Learning in Valid Experiments - http://www.cs.utoronto.ca/~delve/
A standardized environment for evaluating the performance of methods that learn relationships based primarily on empirical data. Delve lets users compare their learning methods with other methods on many datasets. |
  |
NIST Special Database 4 - http://www.nist.gov/srd/nistsd4.htm
This NIST database contains 2000 pairs of 8-bit grayscale fingerprint images. |
  |
UCI Machine Learning Repository - http://www.ics.uci.edu/~mlearn/MLRepository.html
A repository of databases, domain theories and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms. |
  |
Web->KB dataset - http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
Web pages partitioned into classes, with hyperlink data. The dataset has been used for text categorization and learning to extract symbolic knowledge from the World Wide Web. |