Book-Crossing Dataset

Collected by Cai-Nicolas Ziegler in a 4-week crawl from the Book-Crossing community with kind permission from Ron Hornbaker, CTO of Humankind Systems.

Feedback type: Explicit + Implicit

Rating scale: 1 to 10
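
Since this dataset mixes explicit and implicit feedback, a common first step is to separate the two. The sketch below is illustrative only: it assumes each row is a (user_id, isbn, rating) tuple and that an implicit interaction is stored as a rating of 0, which is how the public Book-Crossing ratings file is usually described. Verify both assumptions against the file you download.

```python
from typing import Iterable, List, Tuple

# Assumed row layout: (user_id, isbn, rating). Not taken from this page.
Rating = Tuple[str, str, int]


def split_feedback(rows: Iterable[Rating]) -> Tuple[List[Rating], List[Rating]]:
    """Separate explicit ratings (1 to 10) from implicit interactions (assumed 0)."""
    explicit: List[Rating] = []
    implicit: List[Rating] = []
    for user_id, isbn, rating in rows:
        if rating == 0:
            # Assumption: 0 means "interaction recorded, no score given".
            implicit.append((user_id, isbn, rating))
        elif 1 <= rating <= 10:
            explicit.append((user_id, isbn, rating))
        else:
            raise ValueError(f"rating out of range: {rating}")
    return explicit, implicit


# Made-up rows, for illustration only.
sample = [("u1", "b1", 0), ("u1", "b2", 8), ("u2", "b1", 10)]
explicit, implicit = split_feedback(sample)
print(len(explicit), len(implicit))
```

Keeping the two lists apart lets you train a rating-prediction model on the explicit part and treat the implicit part as plain interaction signals.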

Dataset Link

Date Range

August 2004 - September 2004

Data Size

Book ratings data: 30.7 MB

Book metadata: 77.8 MB

User data: 12.3 MB

Basic Statistics

No. of users: 278k

No. of books: 271k

No. of interactions: 1.1 million

Goodbooks-10k Dataset

The dataset contains six million ratings for the ten thousand most popular books (those with the most ratings). There are also:

  1. books marked to read by the users
  2. book metadata (author, year, etc.)
  3. tags/shelves/genres

The ratings come from a site similar to, but with more permissive terms of use.

Feedback type: Explicit + Implicit (marked to read)

Rating scale: 1 to 5
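
The rating scale here (1 to 5) differs from Book-Crossing's (1 to 10), so ratings from the two sources are not directly comparable. If you want to compare or pool them, one simple option is min-max rescaling to the range 0 to 1. The helper below is a hypothetical sketch of that idea, not something the datasets provide.

```python
def rescale(rating: float, low: float, high: float) -> float:
    """Min-max rescale a rating from [low, high] to [0, 1]."""
    if high <= low:
        raise ValueError("high must be greater than low")
    if not low <= rating <= high:
        raise ValueError(f"rating {rating} is outside [{low}, {high}]")
    return (rating - low) / (high - low)


print(rescale(3, 1, 5))    # midpoint of the 1 to 5 scale -> 0.5
print(rescale(10, 1, 10))  # top of the 1 to 10 scale -> 1.0
```

Linear rescaling ignores differences in how users of each site actually use their scale, so treat the result as a rough alignment only.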

Dataset Link

Date Range


Data Size

Book ratings data: 69 MB

Book metadata: 3 MB

Basic Statistics

No. of users: 53.4k

No. of books: 10k

No. of interactions: 6 million

Goodreads Dataset

The datasets were collected in late 2017 from Goodreads by scraping users' public shelves, i.e. shelves that anyone can see on the web without logging in. User IDs and review IDs are anonymized.

We collected three groups of datasets:

  1. meta-data of the books,
  2. user-book interactions (users' public shelves),
  3. users' detailed book reviews.

These datasets are collected for academic use only. Please do not redistribute them or use them for commercial purposes.

Feedback type: Implicit

Dataset Link

Date Range

Late 2017

Data Size

Book interaction data: 4.1 GB

Book reviews data: 5 GB

Book metadata: 2 GB

Basic Statistics

No. of users: 876k

No. of books: 2.36 million

No. of interactions: 229 million

No. of book reviews: 15 million
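
The basic statistics listed for the three datasets can be turned into a rough density figure: the share of all possible user-book pairs that have a recorded interaction. The sketch below uses the rounded counts from this page, so the results are approximate. The function name and dictionary layout are illustrative.

```python
def density(n_users: int, n_items: int, n_interactions: int) -> float:
    """Fraction of the user-item matrix that is filled."""
    return n_interactions / (n_users * n_items)


# Rounded counts as listed in the Basic Statistics sections.
datasets = {
    "Book-Crossing": (278_000, 271_000, 1_100_000),
    "Goodbooks-10k": (53_400, 10_000, 6_000_000),
    "Goodreads": (876_000, 2_360_000, 229_000_000),
}

for name, (users, books, interactions) in datasets.items():
    print(f"{name}: density = {density(users, books, interactions):.4%}")
```

With these figures Goodbooks-10k is by far the densest of the three, at roughly one percent, which is expected for a dataset restricted to the most-rated books.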
