Category

Science

  • Open Notebook Science Challenge Solubility Dataset

    Offsite — A collection of non-aqueous solubility measurements, mainly of aldehydes, carboxylic acids and amines. Each measurement is linked to the laboratory notebook page where it was obtained. The dataset is part of the Open Notebook Science Solubility Challenge, which is sponsored by Submeta, Nature and Sigma-Aldrich.
  • Taxobox - Wikipedia Infoboxes with Taxonomic information on Animal Species

    Free Download — This dataset is a collection of Wikipedia Taxobox infoboxes, which hold taxonomic information on animal species. Snippet:

        Antilles_pinktoe:
          name: Antilles Pink Toed Tarantula
          regnum: "[[Animal]]"
          classis: "[[Arachnid]]"
          phylum: "[[Arthropod]]"
          ordo: "[[Spider]]"
          imageWidth: 250px
          imageCaption: Female Avicularia versicolor
          binomial: Avicularia versicolor
          familia: ...
  • Enron Email Dataset

    Offsite — A large dataset of emails recovered from discovery documents in the Enron trials, collected and prepared by the CALO Project (A Cognitive Assistant that Learns and Organizes) at Carnegie Mellon University. It contains data from about 150 users, mostly senior management of Enron, organized into folders. The corpus contains a ...
  • Word List - 100,000+ Official Crossword Words (Excel readable)

    Free Download — A word list with over 100,000 entries that are officially permitted in crossword games like Scrabble™. The list is provided in a simple, alphabetically ordered Excel format, which makes it convenient for reference and spell-checking or, for developers, as the basis of a custom spelling dictionary. The entries include variants of words: ...
  • Word List of 64,000+ Common English Dictionary Words (most with definitions, Excel format)

    Free Download — Over 64,000 common dictionary words: a list of words that appear in two or more published dictionaries. This gives the developer of a custom spell checker a good starting pool of relatively common words.
  • The First Billion Digits of Pi

    Free Download — Why calculate so many digits of pi? According to Alexander J. Yee and Shigeru Kondo, world record holders for calculating the most digits of pi (5 trillion digits), “because it’s Pi… and because we can!” Do not fret if you lack the expensive hardware, terabytes of storage and loads of free time needed to calculate the number that never ends. The Infochimps have ...
  • Word List 80,000+ Official Crossword Words (with most definitions, Excel format)

    Free Download — A list of over 80,000 words officially permitted in crossword games like Scrabble™, with some but not all of their definitions. The words are compatible with the first edition of the Official Scrabble Players Dictionary™. Because the list includes inflected forms of words (-ing, -ed, -s, and so on), it makes a good addition when building a custom spelling dictionary. It is a reference ...
  • Oil Spills in U.S. Water -- Number and Volume: 2000 to 2004

    Free Download — Want Census data in a manageable format? Look no further – this data set of Oil Spills in U.S. Water (Number and Volume: 2000 to 2004) has been meticulously processed to deliver US Census data in a simplified, clean format. The US Census Bureau distributes Statistical Abstract files in Microsoft Excel format, presenting numerous complexities with navigating the lengthy ...
  • U.S. Water Withdrawals and Consumptive Use Per Day by End Use: 1940 to 2000

    Free Download — The Statistical Abstract files are distributed by the US Census Bureau as Microsoft Excel files. These files have data mixed with notes and references, multiple tables per sheet, and, worst of all, table headers that are not easily matched to their rows and columns. A few files had extraneous characters in the title; these were corrected for consistency. A few files ...
  • Word List - 100,000+ official crossword words (Excel readable)

    Free Download — 113,809 official crossword words: a list of words permitted in crossword games such as Scrabble™, compatible with the first edition of the Official Scrabble Players Dictionary™. Because the list includes all inflected forms of words (-ing, -ed, -s, and so on), it makes a good addition when building a custom spelling dictionary.
  • Word List - 350,000+ Simple English Words (Excel readable)

    Free Download — Over 354,000 single words, excluding proper names, acronyms, compound words and phrases. Archaic words and significant variant spellings are not excluded.
  • Project Vulcan: North American fossil fuel carbon dioxide (CO2) emissions

    Offsite — The Vulcan Project is a NASA/DOE-funded effort under the North American Carbon Program (NACP) to quantify North American fossil fuel carbon dioxide (CO2) emissions at space and time scales much finer than those achieved in the past. The purpose is to aid in quantification of the North American carbon budget, to support inverse estimation of carbon sources and sinks, ...
  • Global Daily Weather Data from the National Climate Data Center (NCDC)

    Offsite — Weather data from the National Climatic Data Center (NCDC), a division of NOAA. The downloads include Global Surface Summary of the Day (GSOD) data, covering weather, temperature and more. You can fetch your own copy with `wget -r -l3 --no-clobber --no-parent --no-verbose -a /tmp/wget_log.log ...`
  • A list of all 22,802 words in the Scribblenauts dictionary.

    Free Download — List of summonable objects from the Nintendo DS game Scribblenauts, from AARDVARK, ABOMINABLE SNOWMAN and ABSCONDER to ZOMBIE, ZUNICERATOPS and ZYGOTE. Via the Scribblenauts Wikipedia entry: Scribblenauts is an emergent puzzle action video game with the tagline “Write Anything, Solve Everything”. Its objective is to complete puzzles by summoning any object (from a ...
  • Wordnet

    Offsite — WordNet® is a large lexical database of English, developed under the direction of George A. Miller. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations. The resulting network of meaningfully related words and concepts ...
  • Email Data Sets

    Offsite — Due to privacy issues, it is very hard to get hold of large and realistic email corpora. Here you can find a few email data sets, as well as a dataset of newsgroup text, annotated with personal-name spans. The email corpora given here were extracted from the Enron corpus, made public by the Federal Energy Regulatory Commission (FERC). As a second type of informal text, ...
  • Northern Ireland Neighbourhood Information Service (NINIS) Data Catalogue

    Offsite — A list of the available datasets is provided as an XLS file: [Datasets_Available_On_Ninis.xls](http://www.ninis.nisra.gov.uk/mapxtreme/linkeddocs/Datasets_Available_On_Ninis.xls). Re-use is permitted under the Click-Use licence.
  • Major U.S. Weather Disasters by Type, Cost, and Number of Deaths

    Free Download — The Statistical Abstract files are distributed by the US Census Bureau as Microsoft Excel files. These files have data mixed with notes and references, multiple tables per sheet, and, worst of all, table headers that are not easily matched to their rows and columns. A few files had extraneous characters in the title; these were corrected for consistency. A few files ...
  • Austin Daily Weather (extracted from National Climate Data Center (NCDC) Data)

    Free Download — This is an extract of the “Global Daily Weather Data from the National Climate Data Center (NCDC)” dataset covering only Austin. The package contains several files: austin_daily_weather.tsv, daily weather from the operational weather station closest to Austin (Mueller Airport 1948-1999, Bergstrom for the last part of 1999, and Camp Mabry 2000-2009). ...
  • The European Pollutant Release and Transfer Register (E-PRTR) Data

    Offsite — The E-PRTR, the European Pollutant Release and Transfer Register, was formerly known as the EPER (European Pollutant Emission Register), the first European-wide register of industrial emissions into air and water. The E-PRTR covers the 27 EU Member States as well as Iceland, Liechtenstein, Norway, Serbia and Switzerland. The register contains annual data reported ...
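Several of the word lists above are suggested as a starting pool for a custom spelling dictionary. The Python below is a minimal sketch of that use, not a tool shipped with any of these datasets. It assumes the chosen list has been exported from Excel to a plain-text file with one word per line (the file name `words.txt` is hypothetical), loads the words into a set for fast lookup, and suggests dictionary words that are exactly one edit (deletion, transposition, replacement or insertion) away from an unknown word.

```python
# Minimal spell-check sketch using a word list as the dictionary.
# Assumption: the list was exported from Excel to plain text, one word per line.
import string
from pathlib import Path


def load_words(path: str) -> set[str]:
    """Read one word per line, lowercased, skipping blank lines."""
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def edits1(word: str) -> set[str]:
    """All strings one deletion, transposition, replacement or insertion away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in letters}
    inserts = {a + c + b for a, b in splits for c in letters}
    return deletes | transposes | replaces | inserts


def check(word: str, dictionary: set[str]) -> list[str]:
    """Return [] for a known word, else sorted dictionary words one edit away."""
    w = word.lower()
    if w in dictionary:
        return []
    return sorted(edits1(w) & dictionary)


if __name__ == "__main__":
    # With a real export you would call: words = load_words("words.txt")
    words = {"cat", "cart", "coat", "dog"}  # tiny stand-in for a real word list
    print(check("Dog", words))  # [] (known word; lookup is case-insensitive)
    print(check("dgo", words))  # ['dog']
```

A set gives constant-time average membership checks, so even with a 100,000-word list the cost is dominated by generating the roughly 54n + 25 candidates for an n-letter word. As written, candidate generation only uses lowercase ASCII letters.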