Tag

classification

14 datasets
  • Taxobox - Wikipedia Infoboxes with Taxonomic information on Animal Species

    Free Download — This dataset consists of a collection of Infoboxes from Wikipedia on the topic of Taxobox. Snippet: Antilles_pinktoe: name: Antilles Pink Toed Tarantula; regnum: "[[Animal]]"; classis: "[[Arachnid]]"; phylum: "[[Arthropod]]"; ordo: "[[Spider]]"; imageWidth: 250px; imageCaption: Female Avicularia versicolor; binomial: Avicularia versicolor; familia: ...
  • Document Metadata Based on a Sample of Web Documents from the Open Directory

    Offsite — DMOZ100k06 is a large research data set about document metadata based on a random sample of 100,000 web documents from the Open Directory combined with data retrieved from the social bookmarking service delicious.com, the content rating system ICRA, and the search engine Google. The data set is freely available for other research. Michael G. Noll
  • Opinion Extraction, Opinion Mining, Sentiment Analysis, Summarization of Customer Reviews

    Offsite
  • The TechTC-100 Test Collection for Text Categorization

    Offsite
  • ICONCLASS - Multilingual Thematic Classification

    Offsite — From the website: This is an experimental service that makes the ICONCLASS Iconographic Classification system available as linked-data using the SKOS vocabulary. This service is inspired by the excellent Library of Congress Subject Headings linked data service. It is intentionally copied in spirit and conventions used. The idea is to enable others to make ...
  • Mushroom Data Set

    Offsite — This data set includes descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota Family (pp. 500-525). Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended. This latter class was combined with the poisonous one. The Guide clearly states that there is no ...
  • DC Pedestrian Classification Benchmark

    Offsite
  • 20 Newsgroups Dataset (De-Duped Version)

    Free Download — The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across 20 different newsgroups. It is speculated that it was originally collected by Ken Lang, probably for his paper "Newsweeder: Learning to filter netnews", though he does not explicitly mention this collection. The 20 Newsgroups collection has become a ...
  • TechTC - Technion Repository of Text Categorization Data Sets

    Offsite — The Technion Repository of Text Categorization Datasets provides a large number of diverse test collections for use in text categorization research.
  • Standard Occupational Classification

    Offsite — The Standard Occupational Classification (SOC) system is used by Federal statistical agencies to classify workers into occupational categories for the purpose of collecting, calculating, or disseminating data. All workers are classified into one of over 820 occupations according to their occupational definition.
  • National Accounts Sector Classification

    Offsite — Provides information on the classification of organisations and institutions in the National Accounts. Source agency: Office for National Statistics. Designation: National Statistics. Language: English. Alternative title: Sector Classification. Statistical classification of Northern Rock plc; Financial support for the banking industry: classification issues; Classification of ...
  • Mobile User Short Message Data of One Mobile Operator in China

    No Data — Short message (SMS) data from one mobile operator in China. The data mainly consists of formal (legitimate) short messages and spam messages: 170,229 spam records and 33,588 formal message records.
  • Victorian Dryland Salinity Assessment 2000 – Best Case Trends (VIC_MIN_TREND) from data.gov.au

    No Data — A 685.00KB dataset from data.gov.au. This information shows those areas with rising, flat or falling watertable trends based on the minimum (or best case) trend derived from the bore hydrograph analysis. No information has been compiled in defined ...
  • Victorian Dryland Salinity Assessment 2000 – Worst Case Trends (VIC_MAX_TREND) from data.gov.au

    No Data — A 1.05MB dataset from data.gov.au. This information shows those areas with rising, flat or falling watertable trends based on the maximum (or worst case) trend derived from the bore hydrograph analysis. No information has been compiled in defined ...
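
For anyone planning classification experiments on the mobile short message data listed above, the stated record counts imply a strongly skewed class balance. The following is only a minimal sketch of that arithmetic in Python; the variable names are illustrative and are not part of the dataset.

```python
# Record counts as stated in the dataset description above.
spam_records = 170229
formal_records = 33588

# Overall size and class proportions (variable names are illustrative).
total_records = spam_records + formal_records
spam_share = spam_records / total_records
formal_share = formal_records / total_records

print(f"total records: {total_records}")   # 203817
print(f"spam share:    {spam_share:.1%}")  # 83.5%
print(f"formal share:  {formal_share:.1%}")
```

With roughly five spam records for every formal one, plain accuracy can be misleading on this data; per-class precision and recall are usually more informative.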