Judge a NYT bestseller by its cover

2 min read · Aug 26, 2017

The cover of a book makes an important first impression. I wondered which colors tend to appear on the covers of NYT bestsellers.

As a proof of concept, I collected the cover image of every book that made any of the NYT bestseller lists in the past 2 months, and plotted the three main colors used in each cover with the rPlotter package.

The underlying concept of color extraction is strikingly simple: use K-means clustering to group the RGB values of the pixels, then take each cluster's average as a representative color. I used 3 clusters here because people tend to use a few main hues and their neighboring/similar colors rather than 5 or 6 distinct hues.
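To make the idea concrete, here is a minimal sketch in plain Python. It is not rPlotter's code (rPlotter is an R package), only an illustration of the same technique. The function name `kmeans_colors`, its parameters, and the toy pixel list are all made up for this example, and it assumes the image has already been read into a list of (R, G, B) tuples.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two RGB triples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_colors(pixels, k=3, iterations=20, seed=0):
    """Group RGB pixels into k clusters and return the cluster averages.

    Hypothetical helper for illustration; needs at least k distinct pixel values.
    """
    rng = random.Random(seed)
    # Start from k distinct pixel values picked at random.
    centers = rng.sample(sorted(set(pixels)), k)
    for _ in range(iterations):
        # Assignment step: put every pixel in the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in pixels:
            nearest = min(range(k), key=lambda i: dist2(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the average color of its cluster.
        new_centers = []
        for i, members in enumerate(clusters):
            if members:
                new_centers.append(
                    tuple(sum(channel) / len(members) for channel in zip(*members))
                )
            else:
                new_centers.append(centers[i])  # an empty cluster keeps its old center
        if new_centers == centers:
            break  # nothing moved, so the clustering has converged
        centers = new_centers
    return [tuple(round(c) for c in center) for center in centers]


# Toy "cover": 50 red pixels, 30 blue pixels, 20 near-white pixels.
toy_pixels = [(250, 10, 10)] * 50 + [(10, 10, 250)] * 30 + [(245, 245, 245)] * 20
print(kmeans_colors(toy_pixels, k=3))
```

On a real cover the pixel list would come from an image library, and the result can vary with the random starting centers, so this is best read as a sketch rather than a drop-in replacement for a tested clustering routine.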

A few possible next steps:

One can extract more book cover images and do a cross-genre (fiction vs. non-fiction, adult vs. kids, etc.) or cross-country (Amazon bestsellers by country) comparison.
The rPlotter package only returns the colors, not the proportion of each color used in the image. Other APIs can return the proportions too.
One can convert the resulting main colors into RGB and cluster them again to see which hues appear on more bestsellers, and in which genres.
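Two of these next steps can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper names `color_proportions`, `hex_to_rgb`, and `hue_degrees` are hypothetical, the palette is assumed to arrive as hex strings such as "#FA0A0A", and the pixels are assumed to be a non-empty list of (R, G, B) tuples.

```python
import colorsys
from collections import Counter


def hex_to_rgb(hex_color):
    """Convert a hex string such as '#FA0A0A' to an (R, G, B) tuple."""
    value = hex_color.lstrip("#")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


def color_proportions(pixels, palette):
    """Share of pixels whose nearest palette color is each palette entry."""
    counts = Counter()
    for p in pixels:
        nearest = min(
            palette, key=lambda c: sum((x - y) ** 2 for x, y in zip(p, c))
        )
        counts[nearest] += 1
    return {color: counts[color] / len(pixels) for color in palette}


def hue_degrees(rgb):
    """Hue angle in degrees (0 to 360) of an RGB color with 0-255 channels."""
    r, g, b = (channel / 255 for channel in rgb)
    hue, _saturation, _value = colorsys.rgb_to_hsv(r, g, b)
    return hue * 360


# Toy "cover" and a made-up three-color palette for it.
toy_pixels = [(250, 10, 10)] * 50 + [(10, 10, 250)] * 30 + [(245, 245, 245)] * 20
toy_palette = [hex_to_rgb(h) for h in ("#FA0A0A", "#0A0AFA", "#F5F5F5")]

for color, share in color_proportions(toy_pixels, toy_palette).items():
    print(color, share, round(hue_degrees(color)))
```

With proportions and hues in hand, the main colors of many covers could be pooled and clustered again, for example per genre, to compare which hues dominate.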

This is #day52 of my #100dayprojects on data science and visual storytelling. The full code is on my GitHub. Thanks for reading. Suggestions for new topics and feedback are always welcome.

Written by Hannah Yan Han