All Data sets

Digital Element IP Intelligence Demographics

A geolocation API for all your demographics needs. Search by IP address to return data about a geographical area, including number of households, gender, age groups and language. Looking for more dimensions of IP searchable data? Try the Geolocation API, returning up to 20 geo data points of custom query information per IP address. Or the Domains API that retrieves ...
API

Wikipedia Articles

Did you ever want to correlate Wikipedia articles with geographic locations? You know, so you can figure out whose castle that is on the hill you just drove past, know whether there’s a natural or supernatural phenomenon nearby, or find a tiny museum in your neighborhood? With the Wikipedia Articles API, you can swiftly sift through ~300k Wikipedia entries to find ...
API

Geocoding API

The Geocoding API is a powerful and useful tool that provides location information for any given address in the United States. Geocoding is a process that assigns geographic data (i.e., latitude and longitude) to an address. For example, the API would take the address “1214 W 6th St. Austin, TX” and return the latitude 30.272896 and the longitude -97.757443. The API ...
API

Foursquare Places

The Foursquare Places API delivers uniquely rich information about venues worldwide. Where many geolocation providers describe venues only by broad types (bars, restaurants, gyms, colleges, grocery stores, etc.), Foursquare data is unique in the depth of venue types it provides: for example, bars are further classified as sports, gay, dive, wine, whiskey, ...
API

Digital Element IP Intelligence Geolocation

A geolocation API with 20 fields of search results, all customized to your IP query. Search by IP address to return data about a geographical area, including country, region, city, internet connection speed, global coordinates, postal and country codes, time zone, and even daylight saving time observation status. Looking for more dimensions of IP searchable data? Try the ...
API

US Census (ACS): Income, Age, Housing and Population by Location

The 2009 American Community Survey (ACS) Topline API provides basic demographic data based on your geographically defined query. This geo to ACS data API searches by lat/long coordinates to retrieve ACS data about a geographical area, including education levels, household income, race statistics, household size, gender and age groups. For more information about the ...
API

Digital Element IP Intelligence Domains

A reverse IP lookup API with 5 fields of search results, all customized to your IP query. Search by IP address to return data about the domain, company, ISP, NAICS industry code and proxy type for an IP address. Looking for more dimensions of IP searchable data? Try the Geolocation API, returning up to 20 geo data points of custom query information per IP address. Or ...
API

EIA - Petroleum Data, Reports, Analysis, Surveys

Find statistics on crude oil, gasoline, diesel, propane, jet fuel, ethanol, and other liquid fuels, and information on petroleum prices, crude reserves and production, refining and processing, imports/exports, stocks, and consumption/sales.
Offsite

Open Notebook Science Challenge Solubility Dataset

A collection of non-aqueous solubility measurements, mainly aldehydes, carboxylic acids and amines. The data are linked to the laboratory notebook pages where the measurements were obtained. This is part of the Open Notebook Science Solubility Challenge. Sponsored by Submeta, Nature and Sigma-Aldrich.
Offsite

Richard Nixon - Presidential Recordings

Between February 16, 1971 and July 18, 1973 Richard Nixon secretly recorded roughly 3,700 hours of conversations and meetings in five different locations. With the exception of the manually-operated equipment in the Cabinet Room, Nixon’s recording system was sound-activated and recorded a wide range of conversations of varying audio and substantive quality. The original ...
Offsite

Taxobox - Wikipedia Infoboxes with Taxonomic information on Animal Species

This dataset consists of a collection of Infoboxes from Wikipedia on the topic of Taxobox. Snippet: Antilles_pinktoe: name: Antilles Pink Toed Tarantula regnum: "[[Animal]]" classis: "[[Arachnid]]" phylum: "[[Arthropod]]" ordo: "[[Spider]]" imageWidth: 250px imageCaption: Female Avicularia versicolor binomial: Avicularia versicolor familia: ...
Free

Enron Email Dataset

From the CALO Project at Carnegie Mellon University: a massive dataset of emails recovered from discovery documents in the Enron trials. This dataset was collected and prepared by the CALO Project (A Cognitive Assistant that Learns and Organizes). It contains data from about 150 users, mostly senior management of Enron, organized into folders. The corpus contains a ...
Offsite

Crime Rates by State, 2004 and 2005, and by Type, 2005 (Cleaned up version)

Want Census data in a manageable format? Look no further – this data set of Crime Rates by State (2004 and 2005), and by Type (2005), has been meticulously processed to deliver US Census data in a simplified, clean format. The US Census Bureau distributes Statistical Abstract files in Microsoft Excel format, presenting numerous complexities with navigating the lengthy ...
Free

Crime Rates by State, 2004 and 2005, and by Type, 2005 (Cleaned up version)

The Statistical Abstract files are distributed by the US Census Bureau as Microsoft Excel files. These files have data mixed with notes and references, multiple tables per sheet, and, worst of all, the table headers are not easily matched to their rows and columns. A few files had extraneous characters in the title. These were corrected to be consistent. A few files ...
Free

Average Hours Worked Per Day by Employed Persons: 2005

Want Census data in a manageable format? Look no further – this data set of Average Hours Worked Per Day by Employed Persons: 2005 has been meticulously processed to deliver US Census data in a simplified, clean format. The US Census Bureau distributes Statistical Abstract files in Microsoft Excel format, presenting numerous complexities with navigating the lengthy ...
Free

The Whitburn Project: 120 Years of Music Chart History

For the last ten years, obsessive record collectors on Usenet have been working on the Whitburn Project, a huge undertaking to preserve and share high-quality recordings of every popular song since the 1890s. To assist their efforts, they’ve created a spreadsheet of 37,000 songs and 112 columns of raw data, including each song’s duration, beats-per-minute, songwriters, ...
Offsite

Retrosheet: Ballpark Data by Major League Baseball Franchise

All ballparks used for Major League Baseball that have opened since 1903 and many before that. The list for each park contains significant “firsts” to occur there. Parks used for 1 or 2 games are not included. Primary research was done by Jim Herdman and David Vincent. Please notify us of any additions or changes.
Offsite

Word List - 100,000 + Official Crossword Words (Excel readable)

A word list with over 100,000 entries that are officially permitted in crossword games like Scrabble™. This word list is available in a simple, alphabetically ordered Excel format, making it convenient for reference, spell-checking, or, in more sophisticated applications, for developers looking to build a custom spelling dictionary. The entries include variants of words: ...
Free

Last.fm Music Tags

This is a set of artist and genre tag data collected from Last.fm using the Audioscrobbler web service during the spring of 2007. The data consists of the raw tag counts for the 100 most frequently occurring tags that Last.fm listeners have applied to over 20,000 artists. Included are artist tags and genre-related tags. An undocumented (and deprecated) option of the ...
Offsite

Retrosheet: Event Files (play-by-play) data for Major League Baseball Games

Retrosheet was founded in 1989 for the purpose of computerizing play-by-play accounts of as many pre-1984 major league games as possible. Play-by-play files (also called event files) are data files containing literally every play in the included games. The files are designed to be processed further using your own computer. We provide some software to help and some ...
Offsite

Corpus of Erotica Stories

Excellent resource for working with natural language processing and machine learning. This corpus consists of 4771 raw text erotica stories collected from www.textfiles.com/sex/EROTICA. A logical flow from the encouragement of writing on BBSes, people have been writing some form of erotica or sexual narrative for others for quite some time. With the advent of Fidonet ...
Free

Retrosheet: Major League Baseball Awards and Honors

List of Major League Baseball (MLB) Awards and Honors: The Hall of Fame; The Chalmers Most Valuable Player Awards; The League Most Valuable Player Awards; The Baseball Writers Association of America’s Most Valuable Player Awards; The Sporting News Most Valuable Players; The Sporting News Major League Players of the Year; The Sporting News Players of the Year; The ...
Offsite

AMEX Exchange Daily 1970-2010 Open, Close, High, Low and Volume

Historical AMEX stock data from 1970 – 2010, including daily open, close, low, high and trading volume figures. Data is organized alphabetically by ticker symbol. Tickers are filed in spreadsheets titled with the corresponding letter of the alphabet (for example, Daily Prices for MEA appear in the AMEX_daily_prices_M file, and Dividends for MEA appear in the ...
Free
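
The file-naming rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `daily_prices_file` is hypothetical, the listing does not state a file extension (so none is appended), and only the daily-prices naming pattern is covered because that is the only one the listing spells out in full.

```python
def daily_prices_file(ticker: str) -> str:
    """Return the name (without extension) of the file holding a ticker's daily prices.

    Assumption: files follow the AMEX_daily_prices_<LETTER> pattern from the
    listing, where <LETTER> is the first letter of the ticker symbol.
    """
    symbol = ticker.strip().upper()
    if not symbol or not symbol[0].isalpha():
        raise ValueError(f"expected a ticker starting with a letter, got {ticker!r}")
    return f"AMEX_daily_prices_{symbol[0]}"


print(daily_prices_file("MEA"))  # prints AMEX_daily_prices_M
```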

Word List - 10,000+ Common Place Names

A list of more than 10,000 U.S. place names. This list is available in a simple, alphabetically ordered .txt format, making it convenient for reference, spell-checking, or, in more sophisticated applications, for developers looking to build a custom location tool or database. The entries represent a sampling of U.S. place names: 10,196 places in total.
Free

Delicious bookmarks, September 2009

A record of all bookmarking activity on delicious.com for a roughly 10-day period in September 2009. The data comes from Arvind Narayanan, a post-doctoral researcher in Computer Science at Stanford University. Format is JSON, one record per line. There are 1.25 million entries. Download size is 170 MB. Sample record: {"updated": "Tue, 08 Sep 2009 08:45:00 +0000", ...
Offsite
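
Because the format is one JSON record per line, the data can be read as a stream rather than loaded whole. The following is a minimal sketch, not the dataset's official tooling: the function name `iter_bookmarks` is hypothetical, and it touches only the `updated` field, since that is the only field visible in the sample record. Real records carry further fields that the listing elides.

```python
import json
from email.utils import parsedate_to_datetime


def iter_bookmarks(lines):
    """Yield one dict per non-empty line, parsing 'updated' into a datetime."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        # The sample timestamp is in RFC 2822 style, which email.utils can parse.
        record["updated"] = parsedate_to_datetime(record["updated"])
        yield record


# In-memory stand-in for the download; only the field shown in the listing is used.
sample = ['{"updated": "Tue, 08 Sep 2009 08:45:00 +0000"}']
for bookmark in iter_bookmarks(sample):
    print(bookmark["updated"].isoformat())  # prints 2009-09-08T08:45:00+00:00
```

Any iterable of lines works as input, so an open file handle can be passed in place of `sample` to process the 1.25 million entries without holding them all in memory.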

Teenagers -- Births and Birth Rates, by Age, Race, and Hispanic Origin: 1990 to 2005

Want Census data in a manageable format? Look no further – this data set of Teenagers: Births and Birth Rates, by Age, Race and Hispanic Origin (1990 to 2005) has been meticulously processed to deliver US Census data in a simplified, clean format. The US Census Bureau distributes Statistical Abstract files in Microsoft Excel format, presenting numerous complexities with ...
Free