Collection
Pete Skomoroch's Bookmarks
Showing 51 - 100 out of 375 datasets
Pete Skomoroch is President and Lead Consultant at Data Wrangling in Arlington, VA, a firm which specializes in mining large datasets to solve problems in search, finance, and recommendation systems.
He maintains an ever-expanding list of datasets (nearly 400 at last count!), which has now been incorporated into the Infochimps repository.
TradingSolutions - Data Sources
Offsite — -
Announcing the New York Times Campaign Finance API - Open - Code - New York Times Blog
Offsite — -
Beautiful Data - WikiContent
Offsite — -
public domain sounds | free sound library
Offsite — -
Netflix API - Welcome to the Netflix Developer Network
Offsite — The Netflix API allows anyone to build their own Netflix-integrated applications for the web, the desktop, mobile devices or the TV. We do so by providing a range of API methods and data, which we hope will improve the way Netflix customers discover, watch, rate and discuss movies and TV shows. -
Data Catalog
Offsite — -
Voter registration data; or, HERE IS YOUR HOPE, YOU FOOLS! « The Edge of the American West
Offsite — -
Tickermine
Offsite — -
Linked Movie Data Base
Offsite — LinkedMDB publishes linked open data using the D2R Server. The project aims at publishing the first open semantic web database for movies, including a large number of interlinks to several datasets on the open data cloud and references to related webpages. -
Big Huge Thesaurus API: Access 145,000 Words and Phrases
Offsite — This site sports a very simple API for retrieving the synonyms for any word and also an actual Big Huge Thesaurus. License: You may use the service for any legal and non-slimy purpose* so long as you link to this site in your website or application credits as follows: "Thesaurus service provided by words.bighugelabs.com". THE SERVICE IS PROVIDED “AS IS” WITHOUT WARRANTY OF ... -
import/parse/fec.py at master from aaronsw's watchdog — GitHub
Offsite — -
The Watchdog Project: volunteer
Offsite — -
Dataset of the day: Where are the Obamacans? | Off the Map - Official Blog of FortiusOne
Offsite — -
Activity Recognition: Datasets, Bibliography and others
Offsite — -
Normalized Campaign Contribution Data
Offsite — -
YouTube Dataset
Offsite — -
CRAWDAD
Offsite — -
Twitter Development Talk - API Documentation
Offsite — -
Web FAQ collection | ILPS
Offsite — -
Yahoo! Music API - YDN
Offsite — -
Search Query Performance report - Google AdWords Help Center
Offsite — -
Frontal Face Databases
Offsite — -
Searchable Catalogs of Data
Offsite — -
Download Database - baseball1.com
Offsite — -
radiohead - Google Code
Offsite — -
80 Million Tiny Images
Offsite — Visual dictionary presents a visualization of all the nouns in the English language arranged by semantic meaning. Each of the tiles in the mosaic is an arithmetic average of images relating to one of 53,464 nouns. The images for each word were obtained using Google’s Image Search and other engines. A total of 7,527,697 images were used, each tile being the average of 140 ... -
Time Series Center | Harvard University
Offsite — -
OpenVisuals - Open Source Visualization Framework
Offsite — -
BGN (Board on Geographic Names): Domestic Names - State and Topical Gazetteer Download Files
Offsite — The Geographic Names Information System (GNIS) is the Federal and national standard for geographic nomenclature. The U.S. Geological Survey developed the GNIS in support of the U.S. Board on Geographic Names as the official repository of domestic ... -
NGA: Country Files
Offsite — -
Datasets for regression analysis, CVT basis calculations, K-means analysis, and so on.
Offsite — -
Isomap Datasets
Offsite — -
BOSS -- Yahoo! Open Search Ecosystem
Offsite — -
IP Address Lookup - Community Geotarget IP Project
Offsite — -
Airline Data Project
Offsite — -
reddit.com: Ask Reddit: Where to download a DB dump of Reddit?
Offsite — -
What public data is already available?
Offsite — -
Collaborative filtering dataset - dating agency
Offsite — -
About Us - Predictify [DEAD]
No Data — -
VGChartz.com | Video Games, Charts, News, Forums, Reviews, Wii, PS3, Xbox360, DS, PSP
Offsite — -
Code for querying and downloading Flickr images
Offsite — -
Image Parsing Datasets
Offsite — Indeed, the “dataset issue” is a big challenge against every researcher who takes Computer Vision seriously. There are dozens of problems that remain unanswered, such as: How to build a general image database without bias to purpose? How to create a benchmark that reflects the real-world difficulty of image understanding? How to guarantee the correctness of annotation? ... -
TAGora » Integrated IMDB and Netflix Dataset
Offsite — To support the investigation of communal data structures, such as folksonomies, in the context of recommendation, we have created a large knowledge base about movies and how users rate movies. To achieve this, a large portion of the Internet Movie Database (IMDB) was downloaded from to provide information about movies, actors and production personnel, as well as a large set ... -
OHPI - Traffic Volume Trends
Offsite — -
PigTutorial - Pig Wiki
Offsite — Apache Pig is a platform for analyzing large data sets. Pig’s language, Pig Latin, lets you specify a sequence of data transformations such as merging data sets, filtering them, and applying functions to records or groups of records. Pig comes with many built-in functions but you can also create your own user-defined functions to do special-purpose processing. Pig Latin ... -
Quality of Life Grand Challenge Dataset: Kitchen Capture
Offsite — -
Twitter API Documentation
Offsite — Twitter exposes its data via an Application Programming Interface (API). This document is the official reference for that functionality. -
2008 IEEE InfoVis Contest Dataset
Offsite — -
IMDb Pro : Scary Movie 4: Box office
Offsite — -
Spider-Man 2 (2004) - Daily Box Office Results
Offsite —