Jester Jokes and Joker Recommender System Ratings
Overview
Jester uses a collaborative filtering algorithm called Eigentaste to recommend jokes to you based on your ratings of previous jokes.
The collection comprises three datasets:
- Dataset 1 contains over 4.1 million continuous ratings (-10.00 to +10.00) of 100 jokes (that’s right, jokes) from 73,421 users, collected between April 1999 and May 2003.
- Dataset 2 contains over 1.7 million continuous ratings (-10.00 to +10.00) of 150 jokes from 63,974 users, collected between November 2006 and May 2009.
- Dataset 3 contains the jokes themselves.
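The figures quoted for the two rating datasets imply how full each user-by-joke matrix is. The sketch below works through that arithmetic; the function name `density` is illustrative. Because the rating counts are stated as "over" a round number, the results are lower bounds (roughly 56% for Dataset 1 and 18% for Dataset 2), not exact values.

```python
def density(num_ratings: int, num_users: int, num_jokes: int) -> float:
    """Fraction of user-by-joke cells that hold a rating."""
    return num_ratings / (num_users * num_jokes)


# Rounded counts taken from the overview; the true counts are somewhat higher.
d1 = density(4_100_000, 73_421, 100)   # Dataset 1
d2 = density(1_700_000, 63_974, 150)   # Dataset 2

print(f"Dataset 1: at least {d1:.1%} of cells rated")
print(f"Dataset 2: at least {d2:.1%} of cells rated")
```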
Format
Data files are distributed as .zip archives; once unzipped, they are Excel (.xls) files. Ratings are real values ranging from -10.00 to +10.00; the value “99” corresponds to “null”, meaning “not rated”. Each row holds one user. The first column gives the number of jokes rated by that user, and the next 100 columns give the ratings for jokes 01 through 100. The sub-matrix containing only columns {5, 7, 8, 13, 15, 16, 17, 18, 19, 20} is dense: almost all users have rated those jokes (see the discussion of “universal queries” in the Eigentaste paper cited under License).
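The row layout described above can be handled with a few lines of Python. This is a minimal sketch, not an official loader: it assumes each row has already been read from the .xls file into a list of numbers (reading the file itself would need a third-party Excel reader), and it assumes the dense column set refers to 1-based joke numbers. The names `parse_row`, `dense_ratings`, and the toy row are illustrative.

```python
from typing import List, Optional, Sequence

NOT_RATED = 99  # sentinel the dataset uses for "not rated"

# Assumption: the dense "columns" are 1-based joke numbers.
DENSE_JOKES = (5, 7, 8, 13, 15, 16, 17, 18, 19, 20)


def parse_row(row: Sequence[float]) -> List[Optional[float]]:
    """Return the ratings in a raw row, with 99 mapped to None.

    row[0] is the number of jokes the user rated and is skipped here;
    row[1:] holds the ratings for jokes 1, 2, 3, ...
    """
    return [None if value == NOT_RATED else float(value) for value in row[1:]]


def dense_ratings(ratings: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Pick out the ratings for the jokes that almost every user rated."""
    return [ratings[joke - 1] for joke in DENSE_JOKES]


# Made-up toy row with 20 jokes (real rows have 100): 3 jokes rated.
toy_row = [3] + [NOT_RATED] * 20
toy_row[5] = 2.5     # joke 5
toy_row[7] = -4.0    # joke 7
toy_row[13] = 9.75   # joke 13

toy_ratings = parse_row(toy_row)
rated_count = sum(r is not None for r in toy_ratings)
toy_dense = dense_ratings(toy_ratings)
```

Comparing `rated_count` with the first column of the row is a cheap consistency check on each user's data.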
License
Freely available for research use when acknowledged with the following reference: Eigentaste: A Constant Time Collaborative Filtering Algorithm. Ken Goldberg, Theresa Roeder, Dhruv Gupta, and Chris Perkins. Information Retrieval, 4(2), 133-151. July 2001.
| Sources | |
|---|---|
| Added by | Infochimps |
| Collection | Pete Skomoroch's Bookmarks |
| Link | http://eigentaste.berkeley.edu/dataset/ |
| Created | about 3 years ago |
| Updated | 12 months ago |
