Tag

tagging

8 datasets
  • Last.fm Music Tags

    Offsite — This is a set of artist and genre tag data collected from Last.fm using the Audioscrobbler web service during the spring of 2007. The data consists of the raw tag counts for the 100 most frequently occurring tags that Last.fm listeners have applied to over 20,000 artists. Included are artist tags and genre-related tags. An undocumented (and deprecated) option of the ...
  • Delicious bookmarks, September 2009

    Offsite — A record of all bookmarking activity on delicious.com for a roughly 10-day period in September 2009. The data comes from Arvind Narayanan, a post-doctoral researcher in Computer Science at Stanford University. Format is JSON, one record per line. There are 1.25 million entries. Download size is 170 MB. Sample record: {"updated": "Tue, 08 Sep 2009 08:45:00 +0000", ...
  • Document Metadata Based on a Sample of Web Documents from the Open Directory

    Offsite — DMOZ100k06 is a large research data set about document metadata based on a random sample of 100,000 web documents from the Open Directory combined with data retrieved from the social bookmarking service delicious.com, the content rating system ICRA, and the search engine Google. The data set is freely available for other research. Michael G. Noll
  • TAGora » Integrated IMDB and Netflix Dataset

    Offsite — To support the investigation of communal data structures, such as folksonomies, in the context of recommendation, we have created a large knowledge base about movies and how users rate movies. To achieve this, a large portion of the Internet Movie Database (IMDB) was downloaded to provide information about movies, actors and production personnel, as well as a large set ...
  • OpenDover API

    Offsite — OpenDover is the leading web service that lets you tag your documents based on the sentiments and emotions found in them. The OpenDover API can handle different ways of sentiment tagging, depending on your needs or on the content you provide via the API. The OpenDover knowledge base consists of thousands of opinion words, domain-related words and ...
  • Fast Treebank Part-of-Speech Tagger for Python NLTK

    Free Download — This data download is a pre-trained model for a Bayesian classifier. If you do not have experience with Python NLTK, you may not be interested in this data set. A 99.3% accurate part-of-speech tagger trained on the treebank corpus. It is many times faster than the default NLTK tagger and is a fraction of the size (which means less loading time and lower memory ...
  • Brown Simplified Tags Part-of-Speech Tagger for Python NLTK

    Free Download — This data download is a pre-trained model for a Bayesian classifier. If you do not have experience with Python NLTK, you may not be interested in this data set. A 98.1% accurate simplified tags part-of-speech tagger trained on the Brown corpus. It requires Python and NLTK 2.0 and is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported ...
  • Chinese Part-of-Speech Tagger for Python NLTK

    Free Download — This data download is a pre-trained model for a Bayesian classifier. If you do not have experience with Python NLTK, you may not be interested in this data set. A 98.3% accurate Chinese part-of-speech tagger trained on the sinica_treebank corpus. It requires Python & NLTK 2.0 and is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike License: ...
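
The Delicious bookmarks listing above describes its format as JSON with one record per line. A minimal Python sketch of reading such a file might look like the following. The helper names iter_records and parse_updated are hypothetical, the sample lines are made up for illustration, and only the "updated" field and its timestamp format are taken from the sample record in the listing; the remaining fields of the real data are not shown there and are not assumed here.

```python
import json
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        yield json.loads(line)


def parse_updated(record: dict) -> datetime:
    """Parse the RFC 2822 style timestamp stored in the "updated" field."""
    return parsedate_to_datetime(record["updated"])


# Illustrative input only: the second timestamp is invented, and real records
# contain more fields than "updated".
sample_lines = [
    '{"updated": "Tue, 08 Sep 2009 08:45:00 +0000"}',
    "",
    '{"updated": "Tue, 08 Sep 2009 08:46:10 +0000"}',
]

records = list(iter_records(sample_lines))
timestamps = [parse_updated(r) for r in records]
```

For the actual 170 MB download, passing an open file object to iter_records instead of a list would process the records one at a time rather than loading everything into memory.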