Document Metadata Based on a Sample of Web Documents from the Open Directory
Overview
DMOZ100k06 is a large research data set of document metadata. It is based on a random sample of 100,000 web documents from the Open Directory, combined with data retrieved from the social bookmarking service delicious.com, the content rating system ICRA, and the search engine Google. The data set is freely available for research use.
Michael G. Noll
Stats
| Sources | |
|---|---|
| Added by | Infochimps |
| Collection | Pete Skomoroch's Bookmarks |
| Link | http://www.michael-noll.com/wiki/DMOZ100k06 |
| Created | about 3 years ago |
| Updated | 4 months ago |
