Collection
Pete Skomoroch's Bookmarks
Showing 301 - 350 out of 375 datasets
Pete Skomoroch is President and Lead Consultant at Data Wrangling in Arlington, VA, a firm which specializes in mining large datasets to solve problems in search, finance, and recommendation systems.
He maintains an ever-expanding list of datasets (nearly 400 at last count!), which has now been incorporated into the Infochimps repository.
-
Melissa DATA - Lookups
Offsite — Free lookups index of Databases, Web Services, Desktop Software, Mailing Lists, Developer Tools, Integrated Platforms, Data Enhancement and List Hygiene. -
FactSet: Data Maven - Kiplinger.com
Offsite — -
Wharton Research Data Services (WRDS)
Offsite — Wharton Research Data Services (WRDS) is the leading, comprehensive, internet-based data research service used by academic, government, and non-profit institutions and by corporate firms. WRDS manages the data and delivers it in a seamless, unified, and consistent form. WRDS provides the user with one location to access over 200 terabytes of data across multiple disciplines ... -
Thomson Financial I/B/E/S (Institutional Broker Estimate System) Data
Offsite — The I/B/E/S quality database of historical estimates has been the proving ground for new, innovative investment strategies for 30 years. The database has a superior reputation for historical data, and is especially known for its unparalleled quality and depth among both practitioners and academicians. License: I/B/E/S places restrictions on the use of its data. You ... -
Historical Quotes - Yahoo! Finance
Offsite — Yahoo! Finance offers access to historical quote data in tabular format in several timeframes: Daily, Weekly, Monthly, and Dividends. The historical quotes feature includes notations for all splits and dividend distributions during the date range covered. Open, high, low, and close quotes are not adjusted for splits or dividends. An additional column, Adjusted Close, is ... -
Network data
Offsite — -
Bureau of Labor Statistics Home Page
Offsite — -
NAR: Research: Existing Home Sales Data
Offsite — Latest Existing-Home Sales (EHS) Information -
Chain Store Guide - Retail Locations
Offsite — -
Energy Information Administration - Official Energy Statistics from the U.S. Government
Offsite — -
Databases you can use for benchmarking
Offsite — -
UPC Database: Downloads
Offsite — -
Web Crawling / Crawl Datasets at Tobias Escher at the OII
Offsite — -
Minnesota Traffic Management Center (TMC) Data Archive Download
Offsite — The data in this archive are continuously collected by the Traffic Management Center (TMC), a division of Mn/DOT, at 30-second intervals from over 4,500 loop detectors located around the Twin Cities Metro freeways, seven days a week, all year round. The collected data are then packaged daily into a single zip file and loaded into the UMD (University of Minnesota ... -
http://www.volvis.org/
Offsite — -
Computational Vision: Archive
Offsite — -
DC Pedestrian Classification Benchmark
Offsite — -
Web as Corpus
Offsite — -
Computer hacker wordlists from packetstormsecurity.org
Offsite — -
Enron Dataset
Offsite — -
Splog Blog Dataset
Offsite — -
Home Page for 20 Newsgroups Data Set
Offsite — -
White Glove Tracking
Offsite — -
NOAA Paleoclimatology Program - Coral and Sclerosponge Data
Offsite — -
NAICS -- North American Industry Classification System
Offsite — -
Saving Democracy With Web 2.0
Offsite — -
Congresspedia - Congresspedia
Offsite — -
Population Estimates Data Sets
Offsite — -
CRAN Task View: Machine Learning & Statistical Learning
Offsite — -
Data for Data Mining
Offsite — -
PAIDA - Pure Python scientific analysis package
Offsite — -
SUBDUE - Graph Based Knowledge Discovery
Offsite — -
AOL Search Data Mirrors
Offsite — This collection consists of ~20M web queries collected from ~650k users over three months. The data is sorted by anonymous user ID and sequentially arranged. The goal of this collection is to provide real query log data that is based on real users. It could be used for personalization, query reformulation or other types of search research. From AOL’s original Read-Me ... -
Python Cheese Shop : Shakespeare 0.4
Offsite — -
AG's corpus of news articles
Offsite — -
Sampling Techniques for Massive Data - Google Video
Offsite — -
metachronistic » Mirror the Wikipedia
Offsite — -
LETOR: Benchmark Datasets for Learning to Rank
Offsite — We release two large-scale datasets for research on learning to rank: MSLR-WEB30K, with more than 30,000 queries, and MSLR-WEB10K, a random sample of it with 10,000 queries. The datasets are machine learning data, in which queries and URLs are represented by IDs. The datasets consist of feature vectors extracted from query-URL pairs along with relevance judgment labels: ... -
CN710: Comparative Analysis of Learning Systems (Spring 2006) - Class Project
Offsite — -
UrbanSim, Urban Development Software
Offsite — -
Wikipedia³ - Conversion of Wikipedia into RDF
Offsite — Wikipedia³ is a conversion of the English Wikipedia into RDF. It’s a monthly updated dataset containing around 47 million triples. The creation of the dataset is motivated by several factors, one being the desire to have more real-world RDF datasets of reasonable size. Wikipedia assembles a wealth of information created and maintained by people all over the globe – ... -
System One - Labs
Offsite — -
Face Recognition Homepage - Databases
Offsite — -
CBCL SOFTWARE Face data set
Offsite — -
Text Analytics Solutions from ClearForest
Offsite — -
23C3 - Mining Search Queries - Google Video
Offsite — -
Digital History Hacks: Keywords and Clues
Offsite — -
Digital History Hacks: Searching for History
Offsite — -
The Tom Kyte Blog: An interesting data set...
Offsite — -
KDD 2005 - KDD Cup 2005: Aug 21-24, Chicago, IL. USA
Offsite —