CiteULike datasets
Overview
- Description
From the [data page](http://www.citeulike.org/faq/data.adp):
Who-posted-what data
The latest data snapshot can always be downloaded at http://static.citeulike.org/data/current.bz2
Older datasets are available on a daily basis and can be found at URLs of the form http://static.citeulike.org/data/2007-05-30.bz2
Data is available from 2007-05-30 onwards.
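
The dated URL pattern quoted above can be sketched in a few lines of Python. This is only an illustration of the naming scheme: the helper name `snapshot_url` and the constants are hypothetical, and the sketch does not check whether a snapshot actually exists for a given day.

```python
from datetime import date

# Base path and first available day, both taken from the text above.
BASE_URL = "http://static.citeulike.org/data/"
FIRST_SNAPSHOT = date(2007, 5, 30)


def snapshot_url(day):
    """Build the URL for one daily snapshot (hypothetical helper)."""
    if day < FIRST_SNAPSHOT:
        raise ValueError("snapshots start on 2007-05-30")
    # date.isoformat() yields YYYY-MM-DD, matching the example URL.
    return f"{BASE_URL}{day.isoformat()}.bz2"


if __name__ == "__main__":
    print(snapshot_url(date(2007, 5, 30)))
```
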
The file constitutes an anonymous dump of who posted
what and when the posting took place. There is no
data in this file which is not already available publicly through the
web site, so there are no privacy implications for making it
available. The advantage is that it’s available in one file rather
than having to spider the entire site to get at the information
(please don’t do that!).
The file is a simple unix (“\n” line endings) text file with pipe (“|”)
delimiters. The columns are:
- The CiteULike article id which was posted
- An obfuscated representation of the username (a salted MD5 hash of
the true username). Again, it is possible to piece back together what
the true username is by scraping the site, but I’d rather you didn’t
do that. The reason I’ve gone to the trouble of obfuscation is
primarily a slightly paranoid anti-spam measure
- The date and time the article was posted to the site
- The tag the user used to post it
NB: if a user posts an article with n tags, then this will result in n rows in the file.
Article linkout data
Mapping CiteULike article_ids to resources on the web can be done
with the linkout table. The current snapshot is available at http://static.citeulike.org/data/linkouts.bz2
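
As a rough illustration of the who-posted-what format described above (four pipe-delimited columns, one row per tag), the following Python sketch reads a downloaded snapshot and regroups the rows into one entry per posting. The names `Posting`, `parse_line`, `read_postings` and `group_tags` are hypothetical, as is the assumption that the file decodes as UTF-8. The timestamp is kept as a plain string because the source does not state its format.

```python
import bz2
from collections import defaultdict, namedtuple

# Column order as listed in the description above.
Posting = namedtuple("Posting", ["article_id", "user_hash", "posted_at", "tag"])


def parse_line(line):
    """Split one row into its four columns.

    maxsplit=3 keeps any further "|" characters inside the tag column
    (a defensive assumption, not something the source specifies).
    """
    article_id, user_hash, posted_at, tag = line.rstrip("\n").split("|", 3)
    return Posting(article_id, user_hash, posted_at, tag)


def read_postings(path):
    """Yield Posting rows from a bz2-compressed snapshot file."""
    # Assumption: UTF-8 text; undecodable bytes are replaced, not fatal.
    with bz2.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            if line.strip():
                yield parse_line(line)


def group_tags(postings):
    """Collapse the n rows of an n-tag posting into one entry."""
    grouped = defaultdict(list)
    for p in postings:
        grouped[(p.article_id, p.user_hash, p.posted_at)].append(p.tag)
    return dict(grouped)


if __name__ == "__main__":
    # "current.bz2" is assumed to have been downloaded beforehand.
    for key, tags in group_tags(read_postings("current.bz2")).items():
        print(key, tags)
        break
```

Grouping on the (article id, user hash, timestamp) triple assumes that all rows of one posting share the same timestamp, which the description implies but does not guarantee.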
- Openness: OPEN (?)
- License: no license is specified, but the manner in which the data is made available suggests it is open.
- Access: good.
- Bulk: yes.
Stats
| Field | Value |
|---|---|
| Added by | Infochimps |
| Collection | The Comprehensive Knowledge Archive Network (CKAN) Collection |
| Link | http://static.citeulike.org/data/current.bz2 |
| Created | about 3 years ago |
| Updated | about 1 year ago |
