Jester Jokes and Joke Recommender System Ratings

Added By Infochimps

Jester uses a collaborative filtering algorithm called Eigentaste to recommend jokes to you based on your ratings of jokes you have already seen.
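The sketch below is a simplified outline of Eigentaste, based on a reading of the Goldberg et al. paper cited at the end of this listing. Users rate a fixed "gauge set" of jokes. The ratings for each gauge joke are normalized, users are projected onto the top two principal components (the "eigenplane"), nearby users are grouped into clusters, and a user is recommended the jokes with the highest mean rating in their cluster. The function name `eigentaste_sketch`, its parameters, and the NaN convention for unrated cells are illustrative assumptions. The sketch also swaps the paper's recursive rectangular clustering for a plain uniform grid. It recomputes everything on every call, whereas the paper precomputes clusters offline.

```python
import numpy as np

def eigentaste_sketch(gauge, other, query_gauge, grid=4, top_n=3):
    """Hypothetical, simplified Eigentaste-style recommender.

    gauge:       (n_users, n_gauge) dense ratings of the gauge jokes
    other:       (n_users, n_other) ratings of the remaining jokes, NaN = not rated
    query_gauge: (n_gauge,) the new user's ratings of the gauge jokes
    Returns indices (into `other`) of the top_n recommended jokes.
    """
    # 1. Normalize each gauge joke's ratings to zero mean, unit variance.
    mu = gauge.mean(axis=0)
    sigma = gauge.std(axis=0)
    sigma[sigma == 0] = 1.0
    z = (gauge - mu) / sigma

    # 2. PCA: eigenvectors of the correlation matrix of the normalized ratings.
    corr = (z.T @ z) / len(z)
    _, eigvecs = np.linalg.eigh(corr)       # eigenvalues in ascending order
    plane = eigvecs[:, ::-1][:, :2]         # two largest principal axes

    # 3. Project existing users and the query user onto the eigenplane.
    coords = z @ plane
    q = ((query_gauge - mu) / sigma) @ plane

    # 4. Cluster with a uniform grid over the plane (assumed simplification
    #    of the paper's recursive rectangular clustering).
    lo, hi = coords.min(axis=0), coords.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)

    def cell(p):
        idx = np.floor((p - lo) / span * grid).astype(int)
        return tuple(np.clip(idx, 0, grid - 1))

    target = cell(q)
    members = [i for i in range(len(coords)) if cell(coords[i]) == target]
    if not members:                         # empty cell: fall back to everyone
        members = list(range(len(coords)))

    # 5. Rank the non-gauge jokes by mean rating inside the user's cluster.
    means = np.nanmean(other[members], axis=0)
    return list(np.argsort(-means)[:top_n])
```

For example, if two groups of users rate the gauge jokes at opposite extremes and disagree on the other jokes, a new user who rates the gauge like the first group gets that group's favorite first:

```python
gauge = np.array([[8, 8], [8, 6], [6, 8], [-8, -8], [-8, -6], [-6, -8]], dtype=float)
other = np.array([[9, -9]] * 3 + [[-9, 9]] * 3, dtype=float)
print(eigentaste_sketch(gauge, other, np.array([7.0, 7.0]), grid=2, top_n=1))
```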

There are three datasets. Dataset 1 contains over 4.1 million continuous ratings (-10.00 to +10.00) of 100 jokes from 73,421 users, collected between April 1999 and May 2003. Dataset 2 contains over 1.7 million continuous ratings (-10.00 to +10.00) of 150 jokes from 63,974 users, collected between November 2006 and May 2009. The third dataset contains the jokes themselves.
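The rating counts above are given as lower bounds ("over"). Dividing them by the number of user–joke cells therefore gives a lower bound on how full each ratings matrix is. The snippet below does that arithmetic with the quoted figures. The `datasets` dictionary and the `density` helper are illustrative names only.

```python
# Lower bounds on ratings-matrix density, from the figures quoted in this listing.
datasets = {
    "Dataset 1": {"ratings": 4_100_000, "jokes": 100, "users": 73_421},
    "Dataset 2": {"ratings": 1_700_000, "jokes": 150, "users": 63_974},
}

def density(ratings, jokes, users):
    """Fraction of (user, joke) cells that hold a rating."""
    return ratings / (jokes * users)

for name, d in datasets.items():
    print(f"{name}: at least {density(**d):.1%} of cells rated")
# Dataset 1: at least 55.8% of cells rated
# Dataset 2: at least 17.7% of cells rated
```

By this estimate, Dataset 2 is considerably sparser than Dataset 1. That matters when choosing between methods that need dense input and methods that tolerate missing ratings.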


Data files are distributed as .zip archives; when unzipped, they are Excel (.xls) files. Ratings are real values ranging from -10.00 to +10.00, and the value “99” corresponds to “null” (“not rated”). There is one row per user. The first column gives the number of jokes rated by that user, and the next 100 columns give the ratings for jokes 01–100. The sub-matrix containing only columns {5, 7, 8, 13, 15, 16, 17, 18, 19, 20} is dense: almost all users have rated those jokes (see the discussion of “universal queries” in the Eigentaste paper cited below).
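The following parsing sketch assumes a sheet has already been loaded into a 2-D numeric array. One way to load it is pandas `read_excel(path, header=None)`, which needs the `xlrd` package for legacy .xls files. The sketch separates the count column, converts the 99 sentinel to NaN, and extracts the dense sub-matrix. It assumes the listed column numbers are joke numbers, so joke k sits in 0-based column k - 1 of the ratings block. The helper names `parse_jester_rows`, `dense_submatrix`, and `counts_match` are hypothetical.

```python
import numpy as np

NULL_RATING = 99  # sentinel meaning "not rated"
# Joke numbers (1-based) of the dense columns listed above.
DENSE_JOKES = [5, 7, 8, 13, 15, 16, 17, 18, 19, 20]

def parse_jester_rows(raw):
    """Split a raw sheet into per-user counts and a ratings matrix.

    raw: 2-D array-like, one row per user; column 0 = number of jokes rated,
         columns 1..n = ratings for jokes 1..n (99 = not rated).
    Returns (counts, ratings); unrated cells in `ratings` are NaN.
    """
    raw = np.asarray(raw, dtype=float)
    counts = raw[:, 0].astype(int)
    ratings = raw[:, 1:].copy()
    ratings[ratings == NULL_RATING] = np.nan
    return counts, ratings

def dense_submatrix(ratings):
    """Keep only the dense jokes; joke k is at 0-based column k - 1."""
    return ratings[:, [k - 1 for k in DENSE_JOKES]]

def counts_match(counts, ratings):
    """Check that column 0 equals the number of non-null ratings per user."""
    return bool(np.all(counts == (~np.isnan(ratings)).sum(axis=1)))
```

Running `counts_match` after loading is a cheap sanity check that the 99 sentinel was decoded correctly.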


Freely available for research use when acknowledged with the following reference: Eigentaste: A Constant Time Collaborative Filtering Algorithm. Ken Goldberg, Theresa Roeder, Dhruv Gupta, and Chris Perkins. Information Retrieval, 4(2), 133-151. July 2001.