Judging a Book by Its Cover — the Deep Learning Way

Making a SOTA image classifier using FastAI

Aditya Chakraborty
The Startup
9 min read · Jul 5, 2020


They say "don't judge a book by its cover". Well, we are going to break that rule here. In this article, I walk through an end-to-end project that judges a book's genre from its cover image. For convenience, I have limited the task to five genres:

  • Children
  • Sci-fi
  • Horror
  • Romance
  • Political

The full code is available on my public GitHub.

Additionally, you can test your own book cover images with the simple web app that I created for this project.

Data collection

For this project, I have scraped images of books’ cover pages from Google Images using the following JS code:
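(The embedded gist is not reproduced here. A console snippet of the kind popularised by the fast.ai course looks roughly like the sketch below; Google changes its page markup often, so the selector is illustrative and may need updating.)

```javascript
// Collect the source URLs of the image thumbnails on the results page
// (scroll the page first so that more thumbnails get loaded).
// NOTE: the '.rg_i' selector is an assumption and may be outdated.
urls = Array.from(document.querySelectorAll('.rg_i'))
            .map(el => el.src || el.dataset.src)
            .filter(u => u);
// Trigger a download of the URLs as a CSV file, one URL per line.
window.open('data:text/csv;charset=utf-8,' + escape(urls.join('\n')));
```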

Steps to do this:

  1. Go to Google Images and search for the kind of images you want.
  2. After the page loads, right-click and choose the 'Inspect' option (available in Google Chrome).
  3. In the Console tab, paste in the above JS code.

This downloads the image URLs as a .csv file to the default download path on your system. In this way, we download five .csv files for the five categories that we are going to predict. I then uploaded all five files to my Google Drive account, because it is easier to access files from Google Drive without worrying about storage issues on the local system. Additionally, I used Google Colab as my coding environment because it provides a very convenient way to access data from Google Drive, and also gives us the invaluable GPU support that is essential for deep learning projects like this one.

Note: This way of scraping data from Google Images is not very accurate. It pulls in a lot of noisy and outright incorrect images that may not belong to the intended category at all, which means some examples in the training set itself are mislabeled. This hurts the model's performance considerably and makes data cleaning a bigger challenge.

Importing packages and setting up our data

We start by importing all the necessary packages from fastai in addition to the standard data science packages such as numpy and pandas.

Package imports
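(The import cell is not reproduced here. For a fastai v1 project, which the ImageDataBunch API used later suggests this is, it would look something like this sketch.)

```python
# Standard data science packages
import numpy as np
import pandas as pd

# fastai v1 vision API; the star-import is the convention in the fastai course
from fastai.vision import *
from fastai.metrics import accuracy
```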

Once we have imported all the necessary packages for the project, we set up our data in a way that is easy to access. I used Google Drive for this, but you can do it any way you like.

Setting up our data
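(A sketch of this step in Colab, assuming fastai v1; the folder layout and file names are illustrative, not the exact ones from the original notebook.)

```python
from google.colab import drive
from fastai.vision import *

# Mount Google Drive so the uploaded .csv files of URLs are accessible
drive.mount('/content/gdrive')

path = Path('/content/gdrive/My Drive/book_covers')  # hypothetical folder
classes = ['children', 'sci-fi', 'horror', 'romance', 'political']

# Download the images listed in each category's .csv into its own folder,
# then delete any files that cannot be opened as images
for c in classes:
    dest = path/c
    dest.mkdir(parents=True, exist_ok=True)
    download_images(path/f'{c}.csv', dest, max_pics=600)
    verify_images(dest, delete=True, max_size=500)
```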

After the images have been downloaded into their respective class folders, we perform a train-validation split across all the images in all the folders. That is done as follows:

Train-validation split (80-20%)
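(In fastai v1 the call looks roughly like this; the transform and worker settings are my assumptions based on the standard course setup.)

```python
# Label each image by its parent folder name and hold out 20% for validation
data = ImageDataBunch.from_folder(
    path,
    train='.',
    valid_pct=0.2,
    ds_tfms=get_transforms(),  # standard augmentations: flips, rotations, zooms
    size=224,                  # resize everything to 224x224
    num_workers=4,
).normalize(imagenet_stats)    # normalise with ImageNet channel statistics
```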

ImageDataBunch.from_folder takes the images from the folders and assigns their folder names as their labels. These images, with their corresponding labels, are then split into training and validation sets. In addition to the split, we also apply some standard transformations and normalisation to our image data. Another important point: we set our image size to 224x224 pixels, because the pretrained ImageNet models we will be using expect fixed-size 224x224 inputs.

We can have a look at the train and validation sets in detail by printing out ‘data’.

Data

As we can see, the training set has 1957 images and the validation set has 489.

Data Exploration

Now that we have our image data in the shape we want, let's explore it in more detail.

We'll start by looking at the data visually to get a feel for how our images look. For that, we say:

Code for displaying images
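(In fastai v1 this is a one-liner; the grid size is illustrative.)

```python
# Display a labelled grid of sample images from the DataBunch
data.show_batch(rows=3, figsize=(7, 6))
```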

Output:

image data

As we can see, the images have their corresponding labels written above them. This data includes images from the training set as well as the validation set.

Then, we can have another look at our prediction classes, and at the lengths of the train and validation sets.

more details of classes and datasets
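(Presumably something like:)

```python
# Class names, number of classes, and the sizes of the two splits
data.classes, data.c, len(data.train_ds), len(data.valid_ds)
```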

Output:

Having explored our dataset, it's time to start building our model. We'll start with ResNet34 and eventually move on to deeper architectures.

Training: ResNet34

The convolutional neural network is stored in a variable called 'learn'. The model itself is a ResNet34, a pretrained network that is 34 layers deep. To understand the model better, we can print out 'learn'.
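(Creating that learner in fastai v1 looks like this:)

```python
# Build a CNN learner from a ResNet34 pretrained on ImageNet
learn = cnn_learner(data, models.resnet34, metrics=accuracy)
learn  # printing the learner shows the layer-by-layer architecture
```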

Before starting the training process, we need to find a suitable learning rate. To do so, we call the lr_find() method.
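(In fastai v1, that is:)

```python
# Run the learning-rate range test and plot loss against learning rate
learn.lr_find()
learn.recorder.plot()
```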

Output:

lr finder
loss vs lr graph

To choose a learning rate, we usually look for the point after which the loss starts dropping steeply. Here, that point seems to lie somewhere between 1e-04 and 1e-03, so let's take 1e-03 (i.e. 0.001). With lr=0.001, we'll train our model.
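(The exact training call isn't shown; in fastai v1 it is typically the one-cycle policy:)

```python
# Train the new head for 10 epochs at the chosen learning rate
learn.fit_one_cycle(10, max_lr=1e-3)
```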

Output:

training 10 epochs

We are at an accuracy of 66%, with high training and validation losses. Now we'll unfreeze all the layers in the network and find the best learning rate for the full model.
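(In fastai v1:)

```python
# Make every layer trainable, then rerun the learning-rate range test
learn.unfreeze()
learn.lr_find()
learn.recorder.plot()
```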

Output:

lr finder
loss vs lr

In this case, we get a very different-looking curve from the one above. Here, we choose a point on the x-axis (which shows the learning rate) after which the curve starts climbing drastically, in other words, the point after which the loss starts increasing exponentially. Looking at this graph, that point seems to be 1e-03. But I am going to take a rate lower than that, 1e-04, following the proven rule of thumb of staying well below the point where the loss blows up.
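(Whether a single rate or a discriminative slice of rates was used isn't shown; with the single rate chosen above, the call would be:)

```python
# Fine-tune the whole unfrozen network at the more conservative rate
learn.fit_one_cycle(10, max_lr=1e-4)
```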

Output:

training 10 epochs

As evident, we have jumped from 66% accuracy in the last training loop to 70% here, with a significantly lower training loss but a much higher validation loss this time around. A training loss well below the validation loss is the classic sign of overfitting, a very common problem in ML and DL. To address it, we have to do some data cleaning.

Before going ahead, we save our model.
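(The checkpoint name below is illustrative:)

```python
# Save the current weights so we can come back to them after cleaning
learn.save('resnet34-stage-1')
```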

Data Cleaning

FastAI gives us a very convenient, hands-on way to clean our data through a built-in widget that runs inside the notebook. The widget lets us review the images behind the model's top losses and relabel or delete them manually.
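(The widget in question is fastai v1's ImageCleaner, used roughly like this:)

```python
from fastai.widgets import DatasetFormatter, ImageCleaner

# Collect the dataset ordered by loss, so the worst offenders come first
ds, idxs = DatasetFormatter().from_toplosses(learn)
# Launch the in-notebook widget; decisions are written to path/'cleaned.csv'
ImageCleaner(ds, idxs, path)
```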

Output:

This combination of computer and human intelligence is a great boost for our model. The cleaned data is saved at the provided path as a .csv file called 'cleaned.csv'. We then convert this file into a pandas dataframe so we can work with it easily.

Converting .csv file to pandas df
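(That conversion is just:)

```python
# Load the image list produced by ImageCleaner into a dataframe
df = pd.read_csv(path/'cleaned.csv')
df.head()
```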

Output:

First five rows of df

Now that we have our cleaned image list in a dataframe, we can easily load the images into PyTorch tensors with a fresh train-validation split.
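(A sketch of rebuilding the DataBunch from the cleaned dataframe, with the same settings as before:)

```python
# Fresh 80-20 split over the cleaned file list
data = ImageDataBunch.from_df(
    path, df,
    valid_pct=0.2,
    ds_tfms=get_transforms(),
    size=224,
).normalize(imagenet_stats)
```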

We can check how many images were removed compared to what we had before:

Output:

Before cleaning
After cleaning

Clearly, the training set went down from 1957 images to 1544, while the validation set went down from 489 to 385.

To confirm that the cleaned images load correctly, we can have a look at them visually:

Output:

sample batch

At this point, our data cleaning process is complete. Now we freeze the model again, so that only the final layers are updated during training. After that, we again look for a suitable learning rate by calling lr_find() and plotting the recorder.
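(Roughly, assuming the learner is pointed at the cleaned DataBunch first:)

```python
learn.data = data   # swap in the cleaned data
learn.freeze()      # train only the head from here on
learn.lr_find()
learn.recorder.plot()
```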

Output:

lr finder

Following the same intuition as before, let's choose lr=1e-03 (0.001) for the next training loop.
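(The call would be the same as before:)

```python
# Final training loop on the cleaned data
learn.fit_one_cycle(10, max_lr=1e-3)
```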

Output:

training 10 epochs

We have jumped drastically from 70% accuracy to almost 99% accuracy just by cleaning the data. Our training and validation losses are also under control, and the overfitting problem is gone. This is pretty much a state-of-the-art result, so we save our ResNet34 model here for the last time.

Evaluation

A good way to evaluate a classifier is to build a confusion matrix.
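(In fastai v1 this goes through ClassificationInterpretation:)

```python
# Build an interpretation object from the trained learner
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix(figsize=(6, 6), dpi=80)
```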

Output:

We can also print out the most confused class pairs separately for better insight:
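(In fastai v1:)

```python
# (actual, predicted, count) for every pair confused at least twice
interp.most_confused(min_val=2)
```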

Output:

most confused

From the above results, we can see that our model mostly confuses the sci-fi and horror classes. This is understandable: the cover images of these two genres often look very similar, and can be ambiguous even to the human eye. So we can forgive our model for those mistakes.

We can also inspect the top losses visually:
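(In fastai v1:)

```python
# Show the images with the highest loss, with predicted/actual/loss/probability
interp.plot_top_losses(9, figsize=(15, 11))
```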

Output:

top losses

Predicting on test data

It is important to check how our model performs on unseen data, so below are some predictions on unseen test images.

Output:

random testing image

Looking at the cover, we would say this is a sci-fi novel. Let's see if our model predicts the same.
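(A sketch of single-image inference in fastai v1; the file name is hypothetical:)

```python
# Load one test image and predict its genre
img = open_image(path/'test'/'scifi_example.jpg')
pred_class, pred_idx, probs = learn.predict(img)
pred_class
```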

Output:

model prediction

So our model predicted it correctly! Of course, this won't always be the case: our model is about 98% accurate, not 100%. You can experiment with various test images in a similar fashion to see where the model goes wrong.

Conclusion

  • We started by scraping our own data from Google Images with a JS code snippet.
  • Our data collection technique was far from ideal. The original data needed a lot of cleaning, and even after much of it there was still unwanted data: mislabeled images, blurry or noisy images, very ambiguous covers, and so on. With a better source for the same data, our model would probably have produced even better results.
  • Going forward, one can try other pretrained models, such as Wide ResNets or EfficientNets, to see how they perform on this dataset.
  • Although this wasn't the ideal dataset, the FastAI implementation still gave us state-of-the-art performance that is hard to achieve as quickly with other frameworks.
