15 datasets

Natural Science

  • Taxobox - Wikipedia Infoboxes with Taxonomic information on Animal Species

    Free Download — This dataset consists of a collection of Infoboxes from Wikipedia on the topic of Taxobox. Snippet: Antilles_pinktoe: name: Antilles Pink Toed Tarantula regnum: "[[Animal]]" classis: "[[Arachnid]]" phylum: "[[Arthropod]]" ordo: "[[Spider]]" imageWidth: 250px imageCaption: Female Avicularia versicolor binomial: Avicularia versicolor familia: ...
  • RCSB Protein Data Bank

    Offsite — As of August 2008, over 52 thousand structures are available for download. From the home page: "The RCSB PDB provides a variety of tools and resources for studying the structures of biological macromolecules and their relationships to sequence, function, and disease. The RCSB is a member of the wwPDB whose mission is to ensure that the PDB archive remains ..."
  • Mushroom Data Set

    Offsite — This data set includes descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota Family (pp. 500-525). Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended. This latter class was combined with the poisonous one. The Guide clearly states that there is no ...
  • Central Nervous System (CNS) Compound Library from OTAVA

    Free Download — There are many different drug targets expressed in brain tissue. The delivery of drugs to the central nervous system (CNS) is complicated by the blood-brain barrier (BBB), which controls the entry of drugs into the CNS. The BBB is one of the most important factors limiting the development of drugs that specifically target brain disorders. Nowadays the BBB remains a bottleneck in brain drug ...
  • AceDB Genome Database

    Offsite — AceDB is a genome database system developed since 1989 primarily by Jean Thierry-Mieg (CNRS, Montpellier) and Richard Durbin (Sanger Institute). It provides a custom database kernel, with a non-standard data model designed specifically for handling scientific data flexibly, and a graphical user interface with many specific displays and tools for genomic data. AceDB is ...
  • Human Genome Data Set

    Offsite — This data set contains the raw export files of the first genome sequenced by the Illumina Individual Genome Service, using Illumina’s Genome Analyzer technology with paired 75-base reads. 92,254,659,274 bases were used to generate a consensus sequence with an average coverage depth of 32x. The genome was obtained from the peripheral blood of Jay Flatley, CEO of Illumina.
  • YRI Trio Dataset

    Offsite — The YRI Trio Dataset provides complete genome sequence data for three Yoruba individuals from Ibadan, Nigeria, which represent the first human genomes sequenced using Illumina’s next-generation Sequence-by-Synthesis technology. For each genome, the dataset contains >30x average depth of paired 35-base reads. This data set can be used for the following applications: The ...
  • Ensembl - FASTA Database Files

    Offsite — FASTA database files are sequence databases of transcript and translation models predicted by the Ensembl analysis and annotation pipeline, as well as by ab initio methods. Read more about the FASTA format.
  • 3D Version of the PubChem Library

    Offsite — This data set is a 3D Version of the PubChem Library. PubChem provides information on the biological activities of small molecules. It is a component of NIH’s Molecular Libraries Roadmap Initiative.
  • PubChem Library

    Offsite — PubChem provides information on the biological activities of small molecules. It is a component of NIH’s Molecular Libraries Roadmap Initiative.
  • GenBank

    Offsite — GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences (Nucleic Acids Research, 2008 Jan;36(Database issue):D25-30). There are approximately 85,759,586,764 bases in 82,853,685 sequence records in the traditional GenBank divisions and 108,635,736,141 bases in 27,439,206 sequence records in the WGS division as of ...
  • UniGene

    Offsite — Each UniGene entry is a set of transcript sequences that appear to come from the same transcription locus (gene or expressed pseudogene), together with information on protein similarities, gene expression, cDNA clone reagents, and genomic location.
  • Ensembl Annotated Human Genome Data - for MySQL

    Offsite — This data set provides scientists with the opportunity to research and understand this important area of biology. These snapshots include all the databases that are available at, as well as the Ensembl Biomart, which is a denormalized, query-optimized database that facilitates complex queries of one or more datasets. Full installation instructions ...
  • AnthroKids - Anthropometric Data of Children

    Offsite — This data set includes the results of two studies which collected anthropometric data of children. The studies, conducted in 1975 and 1977, are available in a number of different formats. These studies were the result of a Consumer Product Safety Commission (CPSC) effort in the mid-seventies. The creation of a publicly accessible database is the result of a joint effort ...
  • Influenza Virus (including updated Swine Flu sequences)

    Offsite — This data set includes database and sequence data from the NIAID Influenza Genome Sequencing Project and GenBank. For more information on this data set, refer to the NCBI Influenza Virus Resource. Update: This data set is being updated regularly to include new sequences of swine influenza A (H1N1) submitted by the Centers for Disease Control and Prevention (CDC).
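
The Ensembl entry above refers to FASTA database files. As a rough illustration of that format, the following minimal sketch reads FASTA text in which each record starts with a `>` header line followed by one or more sequence lines. The function name `parse_fasta` and the sample records are hypothetical and are not taken from any data set listed here; real files may need more careful handling (duplicate identifiers, very large inputs).

```python
from typing import Dict, Iterable, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each record identifier to its concatenated sequence.

    Assumption: the identifier is the first whitespace-delimited
    token after ">" on the header line.
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: start a new record.
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence line: append to the current record.
            records[current_id] += line
    return records


# Made-up two-record example (hypothetical identifiers and sequences).
example = """>seq1 hypothetical transcript
ACGTAC
GTTA
>seq2 hypothetical peptide
MKV
"""
sequences = parse_fasta(example.splitlines())
```

Here `sequences` maps `seq1` and `seq2` to their joined sequence strings. Passing an open file object instead of `example.splitlines()` works the same way, since the function only iterates over lines.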