Source
DMOZ100k06 - Michael G. Noll
1 dataset
http://www.michael-noll.com/wiki/DMOZ100k06
Document Metadata Based on a Sample of Web Documents from the Open Directory
Offsite: DMOZ100k06 is a large research data set of document metadata. It is based on a random sample of 100,000 web documents from the Open Directory, combined with data retrieved from the social bookmarking service delicious.com, the content rating system ICRA, and the search engine Google. The data set is freely available for use in other research.
Michael G. Noll