5 datasets
  • Last.fm Music Tags

    Offsite — This is a set of artist and genre tag data collected from Last.fm using the Audioscrobbler webservice during the Spring of 2007. The data consists of the raw tag counts for the 100 most frequently occuring tags that Last.fm listeners have applied to over 20,000 artists. Included are artist tags and genre related tags. An undocumented (and deprecated) option of the ...
  • Delicious bookmarks, September 2009

    Offsite — A record of all bookmarking activity on delicious.com for a roughly 10-day period in September 2009. The data comes from Arvind Narayanan, a post-doctoral researcher in Computer Science at Stanford University. Format is JSON, one record per line. There are 1.25 million entries. Download size is 170 MB. Sample record: {"updated": “Tue, 08 Sep 2009 08:45:00 +0000”, ...
  • Document Metadata Based on a Sample of Web Documents from the Open Directory

    Offsite — DMOZ100k06 is a large research data set about document metadata based on a random sample of 100,000 web documents from the Open Directory combined with data retrieved from the social bookmarking service delicious.com, the content rating system ICRA, and the search engine Google. The data set is freely available for other research. Michael G. Noll
  • CiteULike: Available datasets

  • MovieLens Data Sets

    Offsite — Description This data set contains 10000054 ratings and 95580 tags applied to 10681 movies by 71567 users of the online movie recommender service MovieLens. Users were selected at random for inclusion. All users selected had rated at least 20 movies. Unlike previous MovieLens data sets, no demographic information is included. Each user is represented by an id, and ...