Collection
Pete Skomoroch's Bookmarks
Showing 151 - 200 out of 375 datasets
Pete Skomoroch is President and Lead Consultant at Data Wrangling in Arlington, VA, a firm which specializes in mining large datasets to solve problems in search, finance, and recommendation systems.
He maintains an ever-expanding list of datasets (nearly 400 at last count!), which have now been incorporated into the Infochimps repository.
-
TeradataUniversityNetwork.com -> Registration
Offsite — -
Pascal Learning Challenge Large Datasets
Offsite — -
ECIS 2007 - The 15th European Conference on Information Systems
Offsite — -
Alexa Web Search
Offsite — -
developerWorks Interviews: Massive data mining and the resurgent mainframe
Offsite — -
University of Arkansas - Daily Headlines
Offsite — -
Crime Data Bonanza!
Offsite — A New Data Set Available through Ohio State University’s Criminal Justice Research Center. So you think you know how to analyze time series! Well, how would you like to test your mettle on over 400,000 time series, each with up to 540 data points? The time series in question are monthly data from 1960-2004, for over 17,000 police departments, for seven crime types ... -
State and Federal Case Law
Offsite — -
Wikipedia:Lists of common misspellings/For machines - Wikipedia, the free encyclopedia
Offsite — -
Copyright Free and Public Domain Media
Offsite — -
Access to Web Research Collections VLC2/WT10g/WT2g
Offsite — -
Databases you can use for benchmarking
Offsite — -
Lyricsfly Lyrics REST API
Offsite — An Application Programming Interface is available to anyone who wishes to use our database for their own music project, website, or program. If you currently use the web to search out lyrics or use code tricks to access other lyrics websites to display relevant lyrics text for your content, you can now have a reliable source without the hassle. Example code for PHP: ... -
AVSS Detection and Tracking Algorithm Datasets
Offsite — -
Eigenvector Research, Inc. : Data Sets Available to Download
Offsite — -
OTCBVS
Offsite — -
99 Wikipedia Sources Aiding the Semantic Web » AI3:::Adaptive Information
Offsite — -
UNdata
Offsite — -
AudioScrobbler Data
Offsite — Audioscrobbler, which is now merged with last.fm, once published a database of what music people listened to with the audioscrobbler plugin. Last.fm no longer publishes it; however, the initial releases were in the public domain, so I can offer it for download. Here’s the file: http://www.iro.umontreal.ca/~lisa/datasets/profiledata_06-May-2005.tar.gz (135MB compressed, ... -
The Linking Open Data dataset cloud
Offsite — -
Free Economic Data | Economic, Financial, and Demographic Data
Offsite — -
MLSP (MACHINE LEARNING FOR SIGNAL PROCESSING) competition
Offsite — -
The Dataverse Network Project | The Dataverse Network Project
Offsite — -
DVN - Home
Offsite — -
Ohio voter registration data
Offsite — -
Voter List Data Files - Election Department, Clark County, Nevada
Offsite — -
Temperature data (HadCRUT3 and CRUTEM3)
Offsite — -
MNIST handwritten digit database, Yann LeCun and Corinna Cortes
Offsite — -
LFW : Labelled Faces in the Wild
Offsite — -
Making random contacts - (37signals)
Offsite — -
Compete - Compete Developer Resources
Offsite — The Compete API is a web service that acts as a middleman between your application and their humongous stores of web metrics for over 1 million web sites. The API calls are structured around the concept of a “Site”, meaning you call the API about a specific site and get back metrics or charts for that site. Compete Site Analytics – provides information for every site ... -
Machine Learning (Theory) » The Peekaboom Dataset
Offsite — -
Ocean Processes and Modeling: Ocean Data
Offsite — -
BlogoCenter Data Sets
Offsite — The following datasets are available: Real-Web dataset containing hash values of the content of 353,739 web pages collected over a period of six months (Feb. 1999 – July 1999). Same real-web dataset formatted in three columns (web_site, web_page, change_history). Change history is a sequence of bits: 1 means that the specific page has changed between the respective visits ... -
Tagged datasets for named entity recognition tasks
Offsite — -
del.icio.us stats - deli.ckoma
Offsite — -
Fisher College of Business: The Financial Data Finder
Offsite — At the Financial Data Finder, you can search the database of web sites targeted for finance researchers and scholars. From the Department of Finance at the Fisher College of Business of the Ohio State University. You can also use an old text-based version of Financial Data Finder. -
Freebase Wikipedia Extraction (WEX)
Offsite — -
The arXiv.org API
Offsite — -
England Football Results Betting Odds | Premiership Results & Betting Odds
Offsite — -
HughesData - Main - Hughes Lab
Offsite — -
Stanford MicroArray Database
Offsite — -
ArrayExpress Home
Offsite — -
Gene Expression Omnibus (GEO) Main page
Offsite — -
Public Resources: Courts
Offsite — Bulk.resource.org is a service of Public.Resource.Org. The system contains unsupported, as-is copies of selected U.S. government archives. These resources pertain to court information, with topics such as fiches and scans, cases, Courthouse News Service, Federal Judicial Center, JURIS database, request for clarification, and video proceedings. -
Openvest
Offsite — Openvest is the first site on the Financial Semantic web. This is a dynamic site where features and datasets are added and dropped based on client interest. This is not a site for actual Investment Research, but a place where Investment and IT professionals can share ideas. Openvest Finance: This is a demonstration area where one can access Company SEC EDGAR Filings, ... -
Statistical Science Web: Data Sets
Offsite — -
Data Mining: Text Mining, Visualization and Social Media: TailRank, Spinn3r, TechMeme and TechCrunch
Offsite — -
Aleix Face Database
Offsite — -
Data Repository Evaluation
Offsite —
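
The BlogoCenter Data Sets entry above describes a three-column layout (web_site, web_page, change_history) in which change_history is a sequence of bits and a 1 marks a change between visits. A minimal Python sketch of reading one such row follows. It assumes whitespace-separated columns and a bit string written as characters such as "0100110"; the real file layout may differ, and the sample row and function names are hypothetical.

```python
def parse_row(line):
    """Split one row into (web_site, web_page, change_history).

    Assumption: the three columns are separated by whitespace.
    """
    web_site, web_page, change_history = line.split()
    return web_site, web_page, change_history


def count_changes(change_history):
    """Count the visits on which the page was seen to have changed (bit == '1')."""
    return change_history.count("1")


# Hypothetical sample row, not taken from the actual dataset.
row = "example.com /index.html 0100110"
site, page, history = parse_row(row)

changes = count_changes(history)
change_rate = changes / len(history)  # fraction of visits showing a change
print(site, page, changes, round(change_rate, 3))
```

Counting the 1 bits gives a simple per-page change frequency, which is one way to compare how often different pages are updated.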