Collection
Pete Skomoroch's Bookmarks
Showing 1 - 50 out of 375 datasets
Pete Skomoroch is President and Lead Consultant at Data Wrangling in Arlington, VA, a firm that specializes in mining large datasets to solve problems in search, finance, and recommendation systems.
He maintains an ever-expanding list of datasets (nearly 400 at last count!), which has now been incorporated into the Infochimps repository.
Article Search API - NYTimes.com
Offsite — With the Article Search API, you can search New York Times articles from 1981 to today, retrieving headlines, abstracts, lead paragraphs, links to associated multimedia and other article metadata. Along with standard keyword searching, the API also offers faceted searching. The available facets include Times-specific fields such as sections, taxonomic classifiers and ... -
Information Extraction: The RISE Repository of Information Sources
Offsite — RISE is a distributed repository of online information sources that are used for the empirical analysis of learning algorithms that generate extraction patterns. The sources included in this repository are provided by people from the information extraction (IE) and wrapper generation (WG) communities. Both communities use machine learning algorithms to generate ... -
Kiva API
Offsite — -
Using the Wikipedia link dataset -- Henry Haselgrove
Offsite — -
Lookery Developer Network - Lookery Developer Resources
Offsite — -
Visualizing the Growth of Target, 1962-2008 | FlowingData
Offsite — The first Target opened in 1962 in Roseville, Minnesota, and by 1972 there were 46. The corporation focused mostly on expansion in the Central United States for the next decade, but in 1982, Target acquired 33 FedMart stores in Arizona, California, and Texas. There are now over 1,600 stores across the United States. This visualization shows the location and opening date ... -
The Economy According To Mint
Offsite — -
Digging into Data - Various Repositories
Offsite — A list of digital libraries, data archives, and data repositories that are inviting Digging into Data researchers to use their collections. For each repository, you’ll find a description of their contents, contact information, and other details. -
Subsidyscope.com
Offsite — -
Best Buy Remix - Welcome to the Best Buy Remix Developer Network
Offsite — Opening up data gives it a purpose. Feel free to build upon our data. Be our guest. BBYOpen offers a RESTful interface. It’s free, and commission can be earned. It is well documented, with lots of samples & tutorials. -
Twibs : Find the Businesses on Twitter
Offsite — Twibs was created by a small group of people with one purpose: to give Twitter users a place to find businesses on Twitter. The Twibs founders are big believers in the power of Twitter to connect customers with businesses. They are working on making it easy for consumers to find businesses, both local and national. Keep in mind, they’re just getting started, so there may be ... -
Earth Satellite Images - True Marble Imagery
Offsite — -
Massive Scrape of Twitter’s Friend Graph « blog.infochimps.org - Organizing Huge Information Sources
Offsite — -
Twitter Scrape (rough draft) - get.theinfo | Google Groups
Offsite — -
API Documentation — BackType
Offsite — -
generatedata.com
Offsite — -
Full Examples — PyMVPA Home
Offsite — -
wiki.dbpedia.org : Downloads 32
Offsite — -
CinC Challenge 2000 data sets
Offsite — -
Free Book Usage Data from the University of Huddersfield
Offsite — The University of Huddersfield released a major portion of its book circulation and recommendation data under an Open Data Commons/CC0 licence. In total, there’s data for over 80,000 titles derived from a pool of just under 3 million circulation transactions spanning a 13-year period. The released data essentially comes in two big chunks: 1) Circulation Data ... -
UC Berkeley. Sheldon Margen Public Health Library. Statistical/Data Resources
Offsite — -
BART - For Developers
Offsite — -
Sparse Matrix Collection : Sparse Matrices From a Wide Range of Applications
Offsite — These matrices cover a wide spectrum of domains, including those arising from problems with underlying 2D or 3D geometry (such as structural engineering, computational fluid dynamics, model reduction, electromagnetics, semiconductor devices, thermodynamics, materials, acoustics, computer graphics/vision, robotics/kinematics, and other discretizations) and those that ... -
Others Online - Behavioral Targeting, Analytics and Advertising Service for Publishers, Ad Networks,
Offsite — -
HumanScan : BioID : Downloads : BioID Face Database
Offsite — -
Face Detection
Offsite — Hello, I want a face detection dataset. Thanks. -
Building a (fast) Wikipedia offline reader
Offsite — -
Change.gov: The Obama-Biden Transition Team | Join the Discussion: Healthcare
Offsite — -
UN General Assembly Voting Data
Offsite — -
NORB Object Recognition Dataset, Fu Jie Huang, Yann LeCun, New York University
Offsite — This database is intended for experiments in 3D object recognition from shape. It contains images of 50 toys belonging to 5 generic categories: four-legged animals, human figures, airplanes, trucks, and cars. The objects were imaged by two cameras under 6 lighting conditions, 9 elevations (30 to 70 degrees every 5 degrees), and 18 azimuths (0 to 340 degrees every 20 degrees). ... -
Reddit’s Secret API
Offsite — -
Idealware: Mapping Blues: Where is the Data?
Offsite — -
Opinion Extraction, Opinion Mining, Sentiment Analysis, Summarization of Customer Reviews
Offsite — -
Amazon Web Services Public Datasets » Data Wrangling Blog
Offsite — -
Amazon Web Services (AWS) Hosted Public Data Sets
Offsite — -
AFL-CIO Executive PayWatch Database
Offsite — An index of company names, each linking to its CEO’s total compensation and showing how that compensation compares to your own and other workers’ earnings. -
http://www.yr-bcn.es/semanticWikipedia
Offsite — -
Research Datasets :: CID Data :: Center for International Development at Harvard University (CID)
Offsite — -
NACDA: Search Holdings
Offsite — -
LIFE photo archive hosted by Google
Offsite — -
phishingcorpus [JoseWiki]
Offsite — -
Wikipedia Datasets for the Hadoop Hack | Cloudera
Offsite — -
WSCD09: Workshop on Web Search Click Data 2009
Offsite — -
Main Task QA Data
Offsite — -
ADL Gazetteer Development
Offsite — -
The New York Times Annotated Corpus « YooName - named entity recognition
Offsite — -
downloading - flossmole - Google Code - How to get FLOSSmole data for your own use
Offsite — -
Google Flu Trends | How does this work?
Offsite — Google Flu Trends uses aggregated Google search data to estimate current flu activity around the world in near real-time. This site has a visualization of Google Flu Trends in comparison to the CDC’s data. There is also a link to a dataset of Google Flu Trends weekly influenza activity estimates for the world, from December 2002 to the present. Each week, millions of ... -
Multi-Domain Sentiment Dataset
Offsite — -
Chris Pound's Name Generation Page
Offsite —