Datasets

Showing 1 - 32 out of 11438 datasets
  • Digital Element IP Intelligence Demographics

    No Data — The service for this API is available to customers of the Infochimps Cloud, the fastest way to develop and deploy Big Data applications in public, virtual private and private clouds. About Digital Element: Digital Element is the premier supplier for IP geolocation data. Their data is used by a number of large internet companies, including Ask.com, AOL, and ...
  • Wikipedia Articles

    Free Download — The service for this API has ceased. Our apologies for the inconvenience this may cause. You can find a download of the data set for this API on this page. Did you ever want to correlate Wikipedia articles with geographic locations? You know, so you can figure out whose castle that is on the hill you just drove past, know whether there’s a natural or supernatural phenomenon ...
  • Geocoding API

    No Data — The service for this API has ceased. Our apologies for the inconvenience this may cause. The Geocoding API is a powerful and useful tool that provides location information for any given address in the United States. Geocoding is a process that assigns geographic data (i.e., latitude and longitude) to an address. For example, the API would take the address “1214 W 6th St. ...
  • Foursquare Places

    No Data — The service for this API has ceased. Our apologies for the inconvenience this may cause. The Foursquare Places API delivers uniquely rich information about venues worldwide. Where many geolocation providers deliver venue categories described across broad types (bars, restaurants, gyms, colleges, grocery stores, etc.), Foursquare data is unique in the venue type depth ...
  • Digital Element IP Intelligence Geolocation

    No Data — The service for this API is available to customers of the Infochimps Cloud, the fastest way to develop and deploy Big Data applications in public, virtual private and private clouds. About Digital Element: Digital Element is the premier supplier for IP geolocation data. Their data is used by a number of large internet companies, including Ask.com, AOL, and ...
  • US Census (ACS): Income, Age, Housing and Population by Location

    Free Download — The service for this API has ceased. Our apologies for the inconvenience this may cause. You can find a download of the data set for this API on this page. The 2009 American Community Survey (ACS) Topline API provides basic demographic data based on your geographically defined query. This geo to ACS data API searches by lat/long coordinates to retrieve ACS data about a ...
  • Digital Element IP Intelligence Domains

    No Data — The service for this API is available to customers of the Infochimps Cloud, the fastest way to develop and deploy Big Data applications in public, virtual private and private clouds. About Digital Element: Digital Element is the premier supplier for IP geolocation data. Their data is used by a number of large internet companies, including Ask.com, AOL, and ...
  • EIA - Petroleum Data, Reports, Analysis, Surveys

    Offsite — Find statistics on crude oil, gasoline, diesel, propane, jet fuel, ethanol, and other liquid fuels, and information on petroleum prices, crude reserves and production, refining and processing, imports/exports, stocks, and consumption/sales.
  • Twitter Census :: Developer Tools - Mapping from Twitter User Search ID to Twitter API IDs

    Free Download — Twitter data from millions of tweets! This is a download of Twitter data from March 2006 to November 2009. The data comes from analysis on the full set of tweets during that time period, which is 35 million users, over 500 million tweets, and more than 1 billion relationships between users. This dataset maps Twitter screen names to a user’s corresponding Twitter API ID ...
  • Open Notebook Science Challenge Solubility Dataset

    Offsite — A collection of non-aqueous solubility measurements, mainly aldehydes, carboxylic acids and amines. The data are linked to the laboratory notebook pages where the measurements were obtained. This is part of the Open Notebook Science Solubility Challenge. Sponsored by Submeta, Nature and Sigma-Aldrich.
  • Richard Nixon - Presidential Recordings

    Offsite — Between February 16, 1971 and July 18, 1973 Richard Nixon secretly recorded roughly 3,700 hours of conversations and meetings in five different locations. With the exception of the manually-operated equipment in the Cabinet Room, Nixon’s recording system was sound-activated and recorded a wide range of conversations of varying audio and substantive quality. The original ...
  • Taxobox - Wikipedia Infoboxes with Taxonomic information on Animal Species

    Free Download — This dataset consists of a collection of Infoboxes from Wikipedia on the topic of Taxobox. Snippet: Antilles_pinktoe: name: Antilles Pink Toed Tarantula regnum: "[[Animal]]" classis: "[[Arachnid]]" phylum: "[[Arthropod]]" ordo: "[[Spider]]" imageWidth: 250px imageCaption: Female Avicularia versicolor binomial: Avicularia versicolor familia: ...
  • Twitter Census - Conversation Metrics: One year of URLs, Hashtags, Smileys usage (Smiley Counts)

    Free Download — Twitter smiley data from millions of tweets! This is a free download of Twitter data from March 2006 to November 2009. The smiley data comes from analysis on the full set of tweets during that time period, which is 35 million users, over 500 ...
  • Twitter Census - Conversation Metrics: One year of URLs, Hashtags, Smileys usage (monthly)

    Free Download — Twitter data from millions of tweets! This is a download of Twitter data from March 2006 to November 2009. The data set consists of “tokens,” which are hashtags (#data), URLs, or emoticons (Twitter smileys or other “faces” created using keyboard characters). The data comes from analysis on the full set of tweets during that time period, which is 35 million users, over ...
  • Enron Email Dataset

    Offsite — From the CALO Project at Carnegie Mellon University, a massive dataset of emails recovered from discovery documents in the Enron trials. About: This dataset was collected and prepared by the CALO Project (A Cognitive Assistant that Learns and Organizes). It contains data from about 150 users, mostly senior management of Enron, organized into folders. The corpus contains a ...
  • Twitter Census - Conversation Metrics: One Year of URLs, Hashtags, Smileys Usage (by Hour)

    Free Download — Twitter data from millions of tweets! This is a download of Twitter data from March 2006 to November 2009. The data set consists of “tokens,” which are hashtags (#data), URLs, or emoticons (Twitter smileys or other “faces” created using keyboard characters). The data comes from analysis on the full set of tweets during that time period, which is 40 million users, 1.6 ...
  • Crime Rates by State, 2004 and 2005, and by Type, 2005 (Cleaned up version)

    Free Download — Want Census data in a manageable format? Look no further – this data set of Crime Rates by State (2004 and 2005), and by Type (2005), has been meticulously processed to deliver US Census data in a simplified, clean format. The US Census Bureau distributes Statistical Abstract files in Microsoft Excel format, presenting numerous complexities with navigating the lengthy ...
  • Crime Rates by State, 2004 and 2005, and by Type, 2005 (Cleaned up version)

    Free Download — The Statistical Abstract files are distributed by the US Census Department as Microsoft Excel files. These files have data mixed with notes and references, multiple tables per sheet, and, worst of all, the table headers are not easily matched to their rows and columns. A few files had extraneous characters in the title. These were corrected to be consistent. A few files ...
  • Average Hours Worked Per Day by Employed Persons: 2005

    Free Download — Want Census data in a manageable format? Look no further – this data set of Average Hours Worked Per Day by Employed Persons: 2005 has been meticulously processed to deliver US Census data in a simplified, clean format. The US Census Bureau distributes Statistical Abstract files in Microsoft Excel format, presenting numerous complexities with navigating the lengthy ...
  • The Whitburn Project: 120 Years of Music Chart History

    Offsite — For the last ten years, obsessive record collectors in Usenet have been working on the Whitburn Project — a huge undertaking to preserve and share high-quality recordings of every popular song since the 1890s. To assist their efforts, they’ve created a spreadsheet of 37,000 songs and 112 columns of raw data, including each song’s duration, beats-per-minute, songwriters, ...
  • Retrosheet: Ballpark Data by Major League Baseball Franchise

    Offsite — All ballparks used for Major League Baseball that have opened since 1903 and many before that. The list for each park contains significant “firsts” to occur there. Parks used for 1 or 2 games are not included. Primary research was done by Jim Herdman and David Vincent. Please notify us of any additions or changes.
  • Word List - 100,000 + Official Crossword Words (Excel readable)

    Free Download — A word list with over 100,000 entries that are officially permitted in crossword games like Scrabble™. This word list is available in a simple, alphabetically ordered Excel format, making it convenient for reference, spell-checking, or, in more sophisticated applications, for developers looking to build a custom spelling dictionary. The entries include variants of words: ...
  • Last.fm Music Tags

    Offsite — This is a set of artist and genre tag data collected from Last.fm using the Audioscrobbler web service during the spring of 2007. The data consists of the raw tag counts for the 100 most frequently occurring tags that Last.fm listeners have applied to over 20,000 artists. Included are artist tags and genre-related tags. An undocumented (and deprecated) option of the ...
  • Word List of 64,000+ Common English Dictionary Words (most with definitions, Excel format)

    Free Download — Over 64,000 common dictionary words — A list of words in common with two or more published dictionaries. This gives the developer of a custom spelling checker a good beginning pool of relatively common words.
  • Retrosheet: Event Files (play-by-play) data for Major League Baseball Games

    Offsite — Retrosheet was founded in 1989 for the purpose of computerizing play-by-play accounts of as many pre-1984 major league games as possible. Play-by-play files (also called event files) — Data files containing literally every play in the included games. The files are designed to be processed further using your own computer. We provide some software to help and some ...
  • Retrosheet: Major League Baseball Awards and Honors

    Offsite — List of Major League Baseball (MLB) Awards and Honors: The Hall of Fame; The Chalmers Most Valuable Player Awards; The League Most Valuable Player Awards; The Baseball Writers Association of America’s Most Valuable Player Awards; The Sporting News Most Valuable Players; The Sporting News Major League Players of the Year; The Sporting News Players of the Year; The ...
  • AMEX Exchange Daily 1970-2010 Open, Close, High, Low and Volume

    Free Download — Historical AMEX stock data from 1970 – 2010, including daily open, close, low, high and trading volume figures. Data is organized alphabetically by ticker symbol. Tickers are filed in spreadsheets titled with the corresponding letter of the alphabet (for example, Daily Prices for MEA appear in the AMEX_daily_prices_M file, and Dividends for MEA appear in the ...
  • Word List - 10,000+ Common Place Names

    Free Download — U.S. place names for more than 10,000 entries. This U.S. place name list is available in a simple, alphabetically ordered .txt format, making it convenient for reference, spell-checking, or, in more sophisticated applications, for developers looking to build a custom location tool or database. The entries represent a sampling of U.S. place names: 10,196 places in total.
  • Delicious bookmarks, September 2009

    Offsite — A record of all bookmarking activity on delicious.com for a roughly 10-day period in September 2009. The data comes from Arvind Narayanan, a post-doctoral researcher in Computer Science at Stanford University. Format is JSON, one record per line. There are 1.25 million entries. Download size is 170 MB. Sample record: {"updated": "Tue, 08 Sep 2009 08:45:00 +0000", ...
  • Teenagers -- Births and Birth Rates, by Age, Race, and Hispanic Origin: 1990 to 2005

    Free Download — Want Census data in a manageable format? Look no further – this data set of Teenagers: Births and Birth Rates, by Age, Race and Hispanic Origin (1990 to 2005) has been meticulously processed to deliver US Census data in a simplified, clean format. The US Census Bureau distributes Statistical Abstract files in Microsoft Excel format, presenting numerous complexities with ...
  • Age-Adjusted Percent Distributions of Body Mass Index (BMI) Among Persons

    Free Download — Want Census data in a manageable format? Look no further – this data set of Age-Adjusted Percent Distributions of Body Mass Index (BMI) Among Persons has been meticulously processed to deliver US Census data in a simplified, clean format. The US Census Bureau distributes Statistical Abstract files in Microsoft Excel format, presenting numerous complexities with ...
  • HIV Drug Resistance Database

    Offsite — The main functions of HIVDB are: To store, analyze and make available the diverse forms of data underlying drug resistance knowledge to the broad community of researchers and clinicians studying HIV drug resistance and using HIV drug resistance tests; To provide a publicly available online resource to help those performing HIV drug resistance surveillance, interpreting ...
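
The Delicious bookmarks entry above describes its format as JSON, one record per line. A minimal Python sketch of reading such data might look like the following. The function names and the two sample records are hypothetical. Only the `updated` field and its timestamp style come from the listing's sample record; the rest of the record layout is not shown there, so nothing else is assumed.

```python
import json
from email.utils import parsedate_to_datetime


def read_json_lines(lines):
    """Yield one dict per non-empty line of line-delimited JSON input."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines rather than failing on them
            yield json.loads(line)


def updated_at(record):
    """Parse the RFC 2822 style 'updated' timestamp into a datetime."""
    return parsedate_to_datetime(record["updated"])


# Hypothetical stand-in for the real file: any iterable of text lines works,
# including an open file handle, so the 170 MB download need not fit in memory.
sample = [
    '{"updated": "Tue, 08 Sep 2009 08:45:00 +0000"}',
    "",
    '{"updated": "Wed, 09 Sep 2009 10:00:00 +0000"}',
]

records = list(read_json_lines(sample))
timestamps = [updated_at(r) for r in records]
```

Reading line by line keeps memory use flat, which matters for a file with 1.25 million entries. With the real download, the `sample` list would be replaced by an open file object.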