What Makes Books “Goodreads”

An Analysis of Goodreads’ Book Data

Temweka Chirwa
Sep 13, 2020
Image courtesy of the New York Post

I like books and I like data. Sooo… the obvious next step was to see whether I could draw useful information from book data.

Firstly, I had to ask myself a few questions about what I hoped to get out of this investigation:

  1. What does the average book look like?
  2. Does the number of people who have read a book correlate with its rating? That is, does a book’s popularity suggest its “goodness”?
  3. Which genres perform better?

The dataset was obtained from Kaggle. It contains 52,128 rated books in various languages, described by 31 features; key features include rating_count, review_count, average_rating and number_of_pages.
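The post doesn’t show how the data was loaded, so here is a minimal sketch using only Python’s standard library. It assumes the Kaggle file is a CSV with a header row. The inline sample, its values and the title column are made up for illustration, and only the four key features named above are converted to numbers.

```python
import csv
import io

# Hypothetical stand-in for the Kaggle CSV (the real file has 52,128 rows
# and 31 columns). Values and the title column are invented for illustration.
SAMPLE_CSV = """title,rating_count,review_count,average_rating,number_of_pages
Book A,120,15,4.10,320
Book B,3,1,5.00,210
Book C,25000,900,3.95,410
"""

# The four key features from the article, with the type each is parsed as.
NUMERIC_COLUMNS = {
    "rating_count": int,
    "review_count": int,
    "average_rating": float,
    "number_of_pages": int,
}


def load_books(text):
    """Parse CSV text into a list of dicts, converting the numeric columns."""
    books = []
    for row in csv.DictReader(io.StringIO(text)):
        for column, cast in NUMERIC_COLUMNS.items():
            # Treat blank cells as missing instead of failing the whole row.
            row[column] = cast(row[column]) if row[column] else None
        books.append(row)
    return books


books = load_books(SAMPLE_CSV)
print(len(books), sorted(books[0].keys()))
```

With the real file, the same function would be called on its contents, e.g. `open(path, encoding="utf-8").read()` for whatever path it is saved under.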

The Most Average Book

TL;DR: The average book would be a stand-alone, 330-page fantasy novel that would be read by fewer than 20,000 people and wouldn’t be nominated for any awards.
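One plausible way to build an “average book” profile, sketched under assumptions: take the mean of the numeric columns and the most frequent value (the mode) of the categorical ones. The column names genre, in_series and award_nominations are hypothetical, the rows are invented, and rating_count stands in for readership.

```python
from statistics import mean, mode

# Invented rows for illustration. The genre, in_series and award_nominations
# column names are assumptions, not necessarily the dataset's own.
books = [
    {"genre": "Fantasy", "in_series": False, "number_of_pages": 300,
     "rating_count": 1200, "award_nominations": 0},
    {"genre": "Fantasy", "in_series": True, "number_of_pages": 420,
     "rating_count": 54000, "award_nominations": 2},
    {"genre": "Fiction", "in_series": False, "number_of_pages": 280,
     "rating_count": 800, "award_nominations": 0},
]


def average_book(rows):
    """Summarise numeric columns by their mean and categorical ones by their mode."""
    return {
        "genre": mode(r["genre"] for r in rows),
        "in_series": mode(r["in_series"] for r in rows),
        "number_of_pages": mean(r["number_of_pages"] for r in rows),
        # rating_count is used as a proxy for how many people read the book.
        "rating_count": mean(r["rating_count"] for r in rows),
        "nominated": mode(r["award_nominations"] > 0 for r in rows),
    }


print(average_book(books))
```

Because a few very popular books can pull the mean rating_count up sharply, the median is a reasonable alternative summary for that column.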

Highly Rated Books

Though the average book has an average rating of 4 out of 5, there are a few books out there with a perfect 5 out of 5. But is a perfect score all it’s cracked up to be?

846 books have the elusive perfect score. However, these books have fewer than 4 ratings on average, and the most-rated of them has only 375. This suggests that these books aren’t particularly popular, with a readership far below the average.
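A sketch of the perfect-score check, assuming average_rating is stored as a number so that a perfect score compares equal to 5.0. The rows are invented, and perfect_score_summary is a hypothetical helper that compares the perfect-score group against the whole collection.

```python
from statistics import mean

# Invented rows for illustration; only average_rating and rating_count matter.
books = [
    {"title": "A", "average_rating": 5.0, "rating_count": 2},
    {"title": "B", "average_rating": 5.0, "rating_count": 40},
    {"title": "C", "average_rating": 4.2, "rating_count": 30000},
    {"title": "D", "average_rating": 5.0, "rating_count": 1},
]


def perfect_score_summary(rows):
    """Count 5.0-rated books and describe how many ratings they received."""
    perfect = [r for r in rows if r["average_rating"] == 5.0]
    if not perfect:
        return None  # avoid calling mean()/max() on an empty list
    return {
        "count": len(perfect),
        "mean_rating_count": mean(r["rating_count"] for r in perfect),
        "most_rated_title": max(perfect, key=lambda r: r["rating_count"])["title"],
        # Baseline for comparison: mean number of ratings across all books.
        "overall_mean_rating_count": mean(r["rating_count"] for r in rows),
    }


print(perfect_score_summary(books))
```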

[Image: the five-star book with the most ratings]

The number of ratings and reviews has close to zero correlation with a book’s average rating. So people will rate and comment on books regardless of whether they like or hate them. I’m guessing the thinking is that if a book is absolutely amazing or particularly vile, the book world needs to know about it, pronto!
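The post doesn’t say which correlation coefficient was used. Assuming Pearson’s r, a from-scratch version in plain Python looks like this; the sample lists are invented and are not the dataset’s values.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The covariance and both spread terms share a 1/n factor,
    # so it cancels and is omitted.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Invented sample: number of ratings vs. average rating for five books.
rating_counts = [12, 450, 3000, 25000, 180000]
average_ratings = [4.5, 3.7, 4.1, 3.9, 4.2]
print(round(pearson(rating_counts, average_ratings), 3))
```

A value near 0 means no linear relationship. Because rating counts are heavily skewed, a rank-based measure such as Spearman’s correlation is a sensible cross-check; that is an alternative, not something the original analysis is known to have used.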

Genre

So far, I’ve been looking at things in terms of general averages. But books in different categories may be structured differently and fare differently as a result. To give each book a single genre, I used the genre classification with the most votes, which leaves 197 distinct genres.

The most common are Fantasy and Fiction. The highest-rated genres are “Computers” and “Biblical Fiction”, while the most popular genres, using the average number of ratings as a proxy for popularity, are “Classics” and “Young Adult”.
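A sketch of the genre step, assuming each book carries a mapping from genre name to vote count (genre_votes is a hypothetical name, and the rows are invented). Each book is assigned its top-voted genre, then books are grouped so genres can be compared by size, mean average_rating and mean rating_count.

```python
from collections import defaultdict
from statistics import mean

# Invented rows. genre_votes is an assumed structure: genre -> number of votes.
books = [
    {"genre_votes": {"Fantasy": 120, "Fiction": 40}, "average_rating": 4.1, "rating_count": 5000},
    {"genre_votes": {"Classics": 300, "Fiction": 250}, "average_rating": 3.9, "rating_count": 90000},
    {"genre_votes": {"Fiction": 80, "Fantasy": 60}, "average_rating": 3.8, "rating_count": 2000},
    {"genre_votes": {"Fantasy": 15}, "average_rating": 4.3, "rating_count": 700},
]


def top_genre(votes):
    """Return the genre with the most votes (ties go to the first one listed)."""
    return max(votes, key=votes.get)


def genre_stats(rows):
    """Group books by top-voted genre and summarise each group."""
    groups = defaultdict(list)
    for r in rows:
        if r["genre_votes"]:  # skip books that have no genre votes at all
            groups[top_genre(r["genre_votes"])].append(r)
    return {
        genre: {
            "books": len(members),
            "mean_rating": mean(m["average_rating"] for m in members),
            # Average number of ratings, used as a popularity proxy.
            "mean_rating_count": mean(m["rating_count"] for m in members),
        }
        for genre, members in groups.items()
    }


stats = genre_stats(books)
most_common = max(stats, key=lambda g: stats[g]["books"])
highest_rated = max(stats, key=lambda g: stats[g]["mean_rating"])
most_popular = max(stats, key=lambda g: stats[g]["mean_rating_count"])
print(most_common, highest_rated, most_popular)
```

One caveat when ranking by mean rating: a genre containing only a handful of books can top the list, so requiring a minimum group size is a common safeguard.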

In Conclusion…

So a slightly better book than the most average one would be a 387-page fantasy novel that is part of a series, with 26,500 readers and a 25% chance of being nominated for some prize. Average rating: 4.04!

To all aspiring writers, I wish you well. May your works be considered more than just average.

For more information, see the project on GitHub.
