Collection
Pete Skomoroch's Bookmarks
Showing 101 - 150 out of 375 datasets
Pete Skomoroch is President and Lead Consultant at Data Wrangling in Arlington, VA, a firm which specializes in mining large datasets to solve problems in search, finance, and recommendation systems.
He maintains an ever-expanding list of datasets (nearly 400 at last count!), which have now been incorporated into the Infochimps repository.
-
Live Search : xRank™ Celebrity — check out who’s hot and who’s not!
Offsite — -
IMDbPro.com Free Trial Signup
Offsite — -
Free time-series and micro-data to download
Offsite — -
PyGTrends: Python API for Google Trends Data
Offsite — This Python module is a quasi-API that makes it easier to authenticate into Google Trends, for those who want to squeeze an extra level of functionality out of their data. The advantage of programmatic access is that the data can be automatically trended and merged. It can be snuck into a 9:00 AM daily email to the VP of Marketing so that she knows to ramp up Google AdWords ... -
Official Google Blog: A new flavor of Google Trends
Offsite — -
Last.fm Music Tags
Offsite — This is a set of artist and genre tag data collected from Last.fm using the Audioscrobbler web service during the spring of 2007. The data consists of the raw tag counts for the 100 most frequently occurring tags that Last.fm listeners have applied to over 20,000 artists. Included are artist tags and genre-related tags. An undocumented (and deprecated) option of the ... -
i2b2: Informatics for Integrating Biology & the Bedside
Offsite — -
Tiger Data Set Lecture
Offsite — -
Last.fm’s Playground
Offsite — -
ImportGenius.com : U.S. Customs Database and Competitive Intelligence Tools
Offsite — -
Directory Listing of Betfair price files
Offsite — -
Reuters Spotlight - Article and Media API
Offsite — The Reuters Spotlight service provides Reuters.com content in the form of multimedia articles, pictures, videos and text news through a set of standards-based consumer XML APIs. The Spotlight service also provides an option to receive the content automatically annotated with rich semantic metadata. -
DataSets - Scikits - Trac
Offsite — -
[Wikitech-l] page counters
Offsite — This presents a kind of ‘what pages are visited’ statistic. It is applied to a squid access-log stream, the output is redirected to a profiling agent (webstatscollector), and the hourly snapshots are then written in a very trivial format. This can be used both for noticing strange activity and for spotting trends (specific events show up really nicely), be it a movie premiere, a ... -
Wikipedia article traffic statistics
Offsite — -
Yahoo! Internet Location Platform - YDN
Offsite — -
How to find images on the internet « Random knowledge
Offsite — -
Yahoo offers geographic data to Web sites | Tech news blog - CNET News.com
Offsite — -
Instructions for Obtaining Search Engine Transaction Logs
Offsite — -
TechTC - Technion Repository of Text Categorization Data Sets
Offsite — The Technion Repository of Text Categorization Datasets provides a large number of diverse test collections for use in text categorization research. -
The TechTC-100 Test Collection for Text Categorization
Offsite — -
FEC Election Contributions: Download Detailed Files by Election Cycle
Offsite — -
Juiced Google Analytics Python API: Juice Analytics
Offsite — -
Country Name and ISO 3166 Code MySQL Import File
Offsite — -
Semantic Search the US Library of Congress
Offsite — -
geocoded Hotels « GeoNames Blog
Offsite — Over 70,000 geocoded hotels have been added to the GeoNames database. This new hotel data is provided by various hotel booking systems. As of May 2011, geonames.org is working together with three hotel booking systems: hotels.com, diytravel and laterooms. -
GeoNames webservice and data download
Offsite — -
Index of /download/worldcities
Offsite — -
ualberta dependency based thesaurus and word count data
Offsite — -
CommonCrawl - About
Offsite — -
Biomedical Text corpora and related data collection resources.
Offsite — -
Office of Defects Investigation (ODI), Flat File Downloads
Offsite — -
p2psim - kingdata : DNS server latency network distance matrices
Offsite — -
Sep Kamvar / Personalization /
Offsite — -
WikiXMLDB: Querying Wikipedia with XQuery
Offsite — -
Walmart Growth Video
Offsite — -
Open Cell Id dataset - phone geolocation from GSM cellids
Offsite — This is an open source project that aims to create a complete worldwide database of CellIDs with their locations. The project will provide free access to tools and data, not only to create this database but also to retrieve location information. A CellID is the unique number of a GSM cell for a given operator. -
The Cornell Web Lab - The Cornell Web Lab
Offsite — -
im2gps: estimating geographic information from a single image
Offsite — -
Datasets: MUSCLE WP2 Evaluation, Integration and Standards
Offsite — -
Open Economics - Store - Index
Offsite — -
welcome @ omdb
Offsite — -
Cogblog » Blog Archive » Cogmap APIs
Offsite — -
Wal-Mart : Freebase - The World's Database
Offsite — -
Cogmap: The Org Chart Wiki
Offsite — -
German English Parallel Corpus "de-news", Daily News 1996-2000
Offsite — -
Welcome to the CRCNS data sharing activity website — CRCNS
Offsite — -
Infochimps.org: Free Redistributable Rich Data Sets
Offsite — -
Frequent Itemset Mining Dataset Repository
Offsite — -
Dolores Labs Blog » Blog Archive » Our color names data set is online
Offsite —