Dataset

Jester Jokes and Joker Recommender System Ratings

Added By Infochimps

Jester uses a collaborative filtering algorithm called Eigentaste to recommend jokes to you based on your ratings of previous jokes.

Three datasets are available. Dataset 1 contains over 4.1 million continuous ratings (-10.00 to +10.00) of 100 jokes (that’s right, jokes) from 73,421 users, collected between April 1999 and May 2003. Dataset 2 contains over 1.7 million continuous ratings (-10.00 to +10.00) of 150 jokes from 63,974 users, collected between November 2006 and May 2009. The third dataset contains the jokes themselves.

Format

Data files are distributed as .zip archives; when unzipped, they are in Excel (.xls) format. Ratings are real values ranging from -10.00 to +10.00; the value “99” corresponds to “null”, meaning “not rated”. Each row corresponds to one user. The first column gives the number of jokes rated by that user. The next 100 columns give the ratings for jokes 01 to 100. The sub-matrix including only columns {5, 7, 8, 13, 15, 16, 17, 18, 19, 20} is dense: almost all users have rated those jokes (see the discussion of “universal queries” in the Eigentaste paper cited under License).
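
The row layout described above can be sketched in a few lines of Python. This is a minimal illustration, not an official loader: it assumes the rows have already been read from the .xls file into plain lists of numbers (reading Excel files is not shown), and it interprets the dense set {5, 7, 8, ...} as joke numbers, so joke n sits in column n after the count column. The names `parse_row`, `dense_ratings`, `NOT_RATED`, and `DENSE_JOKES` are hypothetical and chosen only for this example.

```python
from typing import List, Optional, Sequence

NOT_RATED = 99  # sentinel for "not rated", per the format description
# Assumption: these are joke numbers (1-based), not raw column indices.
DENSE_JOKES = (5, 7, 8, 13, 15, 16, 17, 18, 19, 20)


def parse_row(row: Sequence[float]) -> List[Optional[float]]:
    """Turn one raw row into a list of ratings, with None for unrated jokes.

    row[0] is the number of jokes rated; row[1:] are the ratings for jokes 1..N.
    """
    ratings = [None if value == NOT_RATED else float(value) for value in row[1:]]
    rated = sum(r is not None for r in ratings)
    # Cross-check the count column against the ratings actually present.
    if rated != int(row[0]):
        raise ValueError(f"row claims {int(row[0])} ratings but has {rated}")
    return ratings


def dense_ratings(ratings: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Pick out the ratings for the dense joke subset (joke n is at index n - 1)."""
    return [ratings[joke - 1] for joke in DENSE_JOKES]


# Made-up example row: a user who rated only jokes 5 and 7 out of 20.
example_row = [2] + [99] * 20
example_row[5] = -3.25  # joke 5 is at column index 5, since column 0 is the count
example_row[7] = 8.5    # joke 7
parsed = parse_row(example_row)
print(dense_ratings(parsed)[:3])  # → [-3.25, 8.5, None]
```

Mapping the sentinel to `None` keeps “not rated” from being averaged in as a rating of 99, which would otherwise fall far outside the -10.00 to +10.00 scale.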

License

Freely available for research use when acknowledged with the following reference: Eigentaste: A Constant Time Collaborative Filtering Algorithm. Ken Goldberg, Theresa Roeder, Dhruv Gupta, and Chris Perkins. Information Retrieval, 4(2), 133-151. July 2001.