Free Download
—
This dataset consists of a collection of Infoboxes from Wikipedia on the topic of Taxobox. Snippet: Antilles_pinktoe: name: Antilles Pink Toed Tarantula regnum: "[[Animal]]" classis: "[[Arachnid]]" phylum: "[[Arthropod]]" ordo: "[[Spider]]" imageWidth: 250px imageCaption: Female Avicularia versicolor binomial: Avicularia versicolor familia: ...
Offsite
—
DMOZ100k06 is a large research data set about document metadata based on a random sample of 100,000 web documents from the Open Directory combined with data retrieved from the social bookmarking service delicious.com, the content rating system ICRA, and the search engine Google. The data set is freely available for other research.
Michael G. Noll
Offsite
—
From the website: "This is an experimental service that makes the ICONCLASS Iconographic Classification system available as linked data using the SKOS vocabulary. This service is inspired by the excellent Library of Congress Subject Headings linked data service. It is intentionally copied in spirit and conventions used. The idea is to enable others to make ..."
Offsite
—
This data set includes descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota Family (pp. 500-525). Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended. This latter class was combined with the poisonous one. The Guide clearly states that there is no ...
Free Download
—
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across 20 different newsgroups. It is speculated that it was originally collected by Ken Lang, probably for his paper "Newsweeder: Learning to filter netnews", though he does not explicitly mention this collection. The 20 Newsgroups collection has become a ...
Offsite
—
The Technion Repository of Text Categorization Datasets provides a large number of diverse test collections for use in text categorization research.
Offsite
—
The Standard Occupational Classification (SOC) system is used by Federal statistical agencies to classify workers into occupational categories for the purpose of collecting, calculating, or disseminating data. All workers are classified into one of over 820 occupations according to their occupational definition.
Offsite
—
Provides information on the classification of organisations and institutions in the National Accounts. Source agency: Office for National Statistics. Designation: National Statistics. Language: English. Alternative title: Sector Classification. Statistical classification of Northern Rock plc; Financial support for the banking industry: classification issues; Classification of ...
No Data
—
Mobile user short message (SMS) data from one mobile operator in China. The data consists mainly of formal short messages and spam messages: 170,229 spam records and 33,588 formal message records.
No Data
—
A 685.00KB dataset from data.gov.au. This information shows those areas with rising, flat or falling watertable trends based on the minimum (or best case) trend derived from the bore hydrograph analysis. No information has been compiled in defined ...
No Data
—
A 1.05MB dataset from data.gov.au. This information shows those areas with rising, flat or falling watertable trends based on the maximum (or worst case) trend derived from the bore hydrograph analysis. No information has been compiled in defined ...