  • 1980 US Census

    Offsite — Data from the 1980 US Census from the US Census Bureau
  • 1990 US Census

    Offsite — Data from the 1990 US Census from the US Census Bureau
  • 2000 US Census

    Offsite — Data from the 2000 US Census from the US Census Bureau
  • 2003-2006 US Economic Data

    Offsite — US Economic Data for 2003-2006 from the US Census Bureau
  • 2008 TIGER/Line Shapefiles

    Offsite — This data set is a complete set of Census 2000 and Current shapefiles for American states, counties, subdivisions, districts, places, and areas. The data is available as shapefiles suitable for use in GIS, along with their associated metadata. The official source of this data is the US Census Bureau, Geography Division.
  • 3D Version of the PubChem Library

    Offsite — This data set is a 3D Version of the PubChem Library. PubChem provides information on the biological activities of small molecules. It is a component of NIH’s Molecular Libraries Roadmap Initiative.
  • AnthroKids - Anthropometric Data of Children

    Offsite — This data set includes the results of two studies which collected anthropometric data of children. The studies, conducted in 1975 and 1977, are available in a number of different formats. These studies were the result of a Consumer Product Safety Commission (CPSC) effort in the mid-seventies. The creation of a publicly accessible database is the result of a joint effort ...
  • Business and Industry Summary Data

    Offsite — Business and Industry Summary Data from the US Census Bureau
  • Daily Global Weather Measurements, 1929-2009 (NCDC, GSOD)

    Offsite — Data originally collected as part of the Global Surface Summary of Day (GSOD) by the National Climatic Data Center (NCDC). Data collected, transformed, and uploaded by Global summary of day data for 18 surface meteorological elements are derived from the synoptic/hourly observations contained in USAF DATSAV3 Surface data and Federal Climate Complex ...
  • DBpedia

    Offsite — DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. The DBpedia knowledge base currently describes more than 2.6 million things, including at least 213,000 persons, 328,000 places, 57,000 music albums, 36,000 films, 20,000 companies. The knowledge base consists of 274 million pieces of ...
  • Ensembl - FASTA Database Files

    Offsite — FASTA database files are sequence databases of transcript and translation models predicted by the Ensembl analysis and annotation pipeline, as well as by ab initio methods. Read more about the FASTA format.
  • Ensembl Annotated Human Genome Data - for MySQL

    Offsite — This data set provides scientists with the opportunity to research and understand this important area of biology. These snapshots include all the databases that are available at, as well as the Ensembl Biomart, which is a denormalized, query-optimized database that facilitates complex queries of one or more datasets. Full installation instructions ...
  • Federal Contracts from the Federal Procurement Data Center

    Offsite — This data set is a dump from the Federal Procurement Data Center (FPDC), which manages the Federal Procurement Data System (FPDS-NG). FPDS-NG collects and disseminates procurement data – or information about contracts that the federal government gives to private companies. The FPDS-NG summarizes who bought what, from whom, and where. See For a ...
  • GenBank

    Offsite — GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences (Nucleic Acids Research, 2008 Jan;36(Database issue):D25-30). There are approximately 85,759,586,764 bases in 82,853,685 sequence records in the traditional GenBank divisions and 108,635,736,141 bases in 27,439,206 sequence records in the WGS division as of ...
  • Human Genome Data Set

    Offsite — This data set contains the raw export files of the first genome sequenced by Illumina Individual Genome Service using Illumina’s Genome Analyzer technology of paired 75-base reads. 92,254,659,274 bases were used to generate a consensus sequence with coverage of 32x average depth. The genome was obtained via peripheral blood of Jay Flatley, CEO of Illumina.
  • Influenza Virus (including updated Swine Flu sequences)

    Offsite — This data set includes database and sequence data from the NIAID Influenza Genome Sequencing Project and GenBank. For more information on this data set, refer to the NCBI Influenza Virus Resource. *Update: This data set is being updated regularly to include new sequences of swine influenza A (H1N1) submitted by the Centers for Disease Control and Prevention (CDC).
  • Labor Statistics Databases

    Offsite — Statistics on Inflation & Prices, Employment, Unemployment, Pay & Benefits, Spending & Time Use, Productivity, Workplace Injuries, International Comparisons, Employment Projections, and Regional Resources
  • M-Lab dataset: Network Diagnostic Tool (NDT)

    Offsite — NDT is a network performance testing system that allows end-users to attempt to identify computer configuration and network infrastructure problems that degrade their broadband experience. By running a short test between a user’s computer and an NDT server, the tool can provide information on a user’s connection speed and attempt to diagnose what, if any, problems ...
  • M-Lab dataset: Network Path and Application Diagnosis tool (NPAD)

    Offsite — NPAD is a network performance testing system that helps end-users to diagnose some of the common problems affecting the last network mile and end-users’ systems. As NPAD transfers bulk data between a user’s computer and an NPAD server, it gathers detailed statistics about what mechanisms actually regulate performance. In doing so, the server collects test results and ...