Collected by Cai-Nicolas Ziegler in a 4-week crawl from the Book-Crossing community with kind permission from Ron Hornbaker, CTO of Humankind Systems.
Feedback type: Explicit + Implicit
Rating scale: 1 to 10 for explicit ratings; implicit interactions are recorded as 0
Dataset Link
http://www2.informatik.uni-freiburg.de/~cziegler/BX/
Date Range
August 2004 - September 2004
Data Size
Book ratings data: 30.7 MB
Book metadata: 77.8 MB
User data: 12.3 MB
Basic Statistics
No. of users: 278k
No. of books: 271k
No. of interactions: 1.1 million
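Because both feedback types share one ratings file, they usually have to be separated before modeling. The following is a minimal sketch, assuming the ratings file follows the layout of the original BX-Book-Ratings.csv: semicolon-delimited, double-quoted fields with the header `"User-ID";"ISBN";"Book-Rating"` and Latin-1 encoding. The helper name `split_bx_ratings` is hypothetical, and the sample rows are made up.

```python
import csv
import io
from typing import Iterable, List, Tuple

Rating = Tuple[str, str, int]  # (user_id, isbn, rating)


def split_bx_ratings(lines: Iterable[str]) -> Tuple[List[Rating], List[Rating]]:
    """Split Book-Crossing rating rows into explicit (1-10) and implicit (0) feedback.

    Assumption: semicolon-delimited, double-quoted fields with a header row
    "User-ID";"ISBN";"Book-Rating" (the layout of the original BX-Book-Ratings.csv).
    """
    reader = csv.DictReader(lines, delimiter=";", quotechar='"')
    explicit: List[Rating] = []
    implicit: List[Rating] = []
    for row in reader:
        record = (row["User-ID"], row["ISBN"], int(row["Book-Rating"]))
        # A rating of 0 records an interaction without an explicit score.
        if record[2] == 0:
            implicit.append(record)
        else:
            explicit.append(record)
    return explicit, implicit


if __name__ == "__main__":
    # Made-up rows in the assumed file layout.
    sample = io.StringIO(
        '"User-ID";"ISBN";"Book-Rating"\n'
        '"u1";"isbn-a";"0"\n'
        '"u2";"isbn-b";"5"\n'
        '"u3";"isbn-c";"0"\n'
    )
    explicit, implicit = split_bx_ratings(sample)
    print(len(explicit), len(implicit))  # 1 2
    # For the real file (assumed Latin-1): open(path, encoding="latin-1", newline="")
```

Keeping the implicit rows rather than discarding them lets the same file serve both rating-prediction and top-N (implicit feedback) experiments.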
goodbooks-10k
The dataset contains six million ratings for the ten thousand most popular books (those with the most ratings). It also includes books marked to read by users and book metadata.
The ratings come from a site similar to goodreads.com, but with more permissive terms of use.
Feedback type: Explicit + Implicit (marked to read)
Rating scale: 1 to 5
Dataset Link
https://github.com/zygmuntz/goodbooks-10k
Date Range
NA
Data Size
Book ratings data: 69 MB
Book metadata: 3 MB
Basic Statistics
No. of users: 53.4k
No. of books: 10k
No. of interactions: 6 million
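The Basic Statistics can be recomputed from the ratings file, together with the density of the user-book matrix (interactions divided by users × books). This is a minimal sketch, assuming the ratings file has the header `user_id,book_id,rating`; `basic_stats` is a hypothetical helper, and duplicate (user, book) rows would be counted as separate interactions.

```python
import csv
import io
from typing import Dict, Iterable


def basic_stats(lines: Iterable[str]) -> Dict[str, float]:
    """Count distinct users and books plus total rows in a user_id,book_id,rating CSV.

    Density = interactions / (users * books); 0.0 for an empty file.
    Assumption: the header row is user_id,book_id,rating.
    """
    users, books = set(), set()
    interactions = 0
    for row in csv.DictReader(lines):
        users.add(row["user_id"])
        books.add(row["book_id"])
        interactions += 1  # every row counts, including repeated (user, book) pairs
    density = interactions / (len(users) * len(books)) if users else 0.0
    return {"users": len(users), "books": len(books),
            "interactions": interactions, "density": density}


if __name__ == "__main__":
    # Made-up rows in the assumed ratings layout.
    sample = io.StringIO("user_id,book_id,rating\n1,10,5\n2,20,4\n2,10,3\n")
    print(basic_stats(sample))
    # {'users': 2, 'books': 2, 'interactions': 3, 'density': 0.75}
```

Applied to the rounded figures above, density is roughly 6,000,000 / (53,400 × 10,000) ≈ 1.1% for goodbooks-10k, against roughly 1,100,000 / (278,000 × 271,000) ≈ 0.0015% for Book-Crossing.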
UCSD Book Graph (Goodreads)
The datasets were collected in late 2017 from goodreads.com by scraping users' public shelves, i.e., shelves that anyone can view on the web without logging in. User IDs and review IDs are anonymized.
We collected three groups of datasets: book metadata, user-book interactions, and users' detailed book reviews.
These datasets are collected for academic use only. Please do not redistribute them or use them for commercial purposes.
Feedback type: Implicit
Dataset Link
https://sites.google.com/eng.ucsd.edu/ucsdbookgraph/home
Date Range
Late 2017
Data Size
Book interaction data: 4.1 GB
Book reviews data: 5 GB
Book metadata: 2 GB
Basic Statistics
No. of users: 876k
No. of books: 2.36 million
No. of interactions: 229 million
No. of book reviews: 15 million
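At 4.1 GB the interaction data is best streamed rather than loaded at once. The following is a minimal sketch, assuming the interactions are distributed as a gzipped JSON-lines file (one JSON object per line) with the fields `user_id`, `book_id` and `is_read`; these field names and the helper `read_pairs` are assumptions for illustration, not a confirmed schema.

```python
import gzip
import json
from typing import Iterator, Tuple


def read_pairs(path: str, require_read: bool = False) -> Iterator[Tuple[str, str]]:
    """Yield (user_id, book_id) implicit-feedback pairs from a gzipped JSON-lines file.

    Streams one line at a time, so multi-GB files never sit in memory.
    Assumption: each record has user_id, book_id and (optionally) is_read fields.
    """
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            # Optionally keep only shelvings the user marked as read.
            if require_read and not record.get("is_read", False):
                continue
            yield record["user_id"], record["book_id"]


if __name__ == "__main__":
    import os
    import tempfile

    # Write a tiny made-up file in the assumed format, then read it back.
    rows = [
        {"user_id": "u1", "book_id": "b1", "is_read": True},
        {"user_id": "u1", "book_id": "b2", "is_read": False},
    ]
    fd, path = tempfile.mkstemp(suffix=".json.gz")
    os.close(fd)
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")
    print(list(read_pairs(path)))                     # [('u1', 'b1'), ('u1', 'b2')]
    print(list(read_pairs(path, require_read=True)))  # [('u1', 'b1')]
    os.remove(path)
```

Returning a generator keeps memory flat; callers can feed the pairs straight into a counter or a sparse-matrix builder.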