Judging a Book by Its Cover — the Deep Learning Way

Making a SOTA image classifier using FastAI

Aditya Chakraborty
The Startup
9 min read · Jul 5, 2020


They say "don't judge a book by its cover". Well, we are going to break that rule here. In this article, I walk through an end-to-end project that judges a book's genre from its cover image. For convenience, I have limited the task to five genres:

  • Children
  • Sci-fi
  • Horror
  • Romance
  • Political

The full code is available on my public GitHub.

Additionally, you can test your own book cover images with the simple web app that I created for this project.

Data collection

For this project, I have scraped images of books’ cover pages from Google Images using the following JS code:
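(The embedded gist is not reproduced here. A console snippet of the kind popularised by the fast.ai course looks roughly like the sketch below; Google changes its page markup often, so the selector is illustrative and may need updating.)

```javascript
// Collect the source URLs of the image thumbnails on the results page
// (scroll the page first so that more thumbnails get loaded).
// NOTE: the '.rg_i' selector is an assumption and may be outdated.
urls = Array.from(document.querySelectorAll('.rg_i'))
            .map(el => el.src || el.dataset.src)
            .filter(u => u);
// Trigger a download of the URLs as a CSV file, one URL per line.
window.open('data:text/csv;charset=utf-8,' + escape(urls.join('\n')));
```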

Steps to do this:

  1. Go to Google Images and search for the kind of images you want.
  2. After the page loads, right-click and choose the 'Inspect' option (available in Google Chrome).
  3. In the Console tab, paste in the above JS code.

This downloads the image URLs as a .csv file to the default download path on your system. In this way, we download five .csv files for the five categories that we are going to predict. I then uploaded all five files to my Google Drive account, because it is easier to access files from Google Drive without worrying about storage issues on the local system. Additionally, I used Google Colab as my coding environment because it provides a very convenient way to access data from Google Drive, and also gives us the invaluable GPU support that is essential for deep learning projects like this one.

Note: This way of scraping data from Google Images is not very accurate. It pulls in a lot of noisy and outright incorrect images that may not belong to the intended category at all, which means some examples in the training set itself are mislabeled. This hurts the model's performance considerably and makes data cleaning a bigger challenge.

Importing packages and setting up our data

We start by importing all the necessary packages from fastai in addition to the standard data science packages such as numpy and pandas.

Package imports
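(The import cell is not reproduced here. For a fastai v1 project, which the ImageDataBunch API used later suggests this is, it would look something like this sketch.)

```python
# Standard data science packages
import numpy as np
import pandas as pd

# fastai v1 vision API; the star-import is the convention in the fastai course
from fastai.vision import *
from fastai.metrics import accuracy
```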

Once we have imported all the necessary packages for the project, we set up our data in a way that is easy to access. I used Google Drive for this, but you can do it any way you like.

Setting up our data
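(A sketch of this step in Colab, assuming fastai v1; the folder layout and file names are illustrative, not the exact ones from the original notebook.)

```python
from google.colab import drive
from fastai.vision import *

# Mount Google Drive so the uploaded .csv files of URLs are accessible
drive.mount('/content/gdrive')

path = Path('/content/gdrive/My Drive/book_covers')  # hypothetical folder
classes = ['children', 'sci-fi', 'horror', 'romance', 'political']

# Download the images listed in each category's .csv into its own folder,
# then delete any files that cannot be opened as images
for c in classes:
    dest = path/c
    dest.mkdir(parents=True, exist_ok=True)
    download_images(path/f'{c}.csv', dest, max_pics=600)
    verify_images(dest, delete=True, max_size=500)
```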

After the images have been downloaded into their respective class folders, we perform a train-validation split across all the images in all the folders. That is done as follows:

Train-validation split (80-20%)
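(In fastai v1 the call looks roughly like this; the transform and worker settings are my assumptions based on the standard course setup.)

```python
# Label each image by its parent folder name and hold out 20% for validation
data = ImageDataBunch.from_folder(
    path,
    train='.',
    valid_pct=0.2,
    ds_tfms=get_transforms(),  # standard augmentations: flips, rotations, zooms
    size=224,                  # resize everything to 224x224
    num_workers=4,
).normalize(imagenet_stats)    # normalise with ImageNet channel statistics
```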

ImageDataBunch.from_folder takes the images from the folders and assigns their folder names as their labels. These images, with their corresponding labels, are then split into training and validation sets. In addition to the split, we also apply some standard transformations and normalisation to our image data. Another important point: we set our image size to 224x224 pixels, because the pretrained ImageNet models we will be using expect fixed-size 224x224 inputs.

We can have a look at the train and validation sets in detail by printing out ‘data’.

Data

As we can see, the training set has 1957 images and the validation set has 489.

Data Exploration

Now that we have our image data in the shape we want, let's explore it in more detail.

We'll start by looking at the data visually to get a feel for how our images look. For that, we say:

Code for displaying images
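(In fastai v1 this is a one-liner; the grid size is illustrative.)

```python
# Display a labelled grid of sample images from the DataBunch
data.show_batch(rows=3, figsize=(7, 6))
```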

Output:

image data

As we can see, the images have their corresponding labels written above them. This data includes images from the training set as well as the validation set.

Then, we can have another look at our prediction classes, and at the lengths of the train and validation sets.

more details of classes and datasets
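(Presumably something like:)

```python
# Class names, number of classes, and the sizes of the two splits
data.classes, data.c, len(data.train_ds), len(data.valid_ds)
```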

Output:

Having explored our dataset, it's time to start building our model. We'll start with ResNet34 and eventually move on to deeper architectures.

Training: ResNet34

The convolutional neural network is stored in a variable called 'learn'. The model itself is a ResNet34, a pretrained network that is 34 layers deep. To understand the model better, we can print out 'learn'.
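(Creating that learner in fastai v1 looks like this:)

```python
# Build a CNN learner from a ResNet34 pretrained on ImageNet
learn = cnn_learner(data, models.resnet34, metrics=accuracy)
learn  # printing the learner shows the layer-by-layer architecture
```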

Before starting the training process, we need to find a suitable learning rate. To do so, we call the lr_find() method.
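(In fastai v1, that is:)

```python
# Run the learning-rate range test and plot loss against learning rate
learn.lr_find()
learn.recorder.plot()
```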

Output:

lr finder
loss vs lr graph

To choose a learning rate, we usually look for the point after which the loss starts dropping steeply. Here, that point seems to lie somewhere between 1e-04 and 1e-03, so let's take 1e-03 (i.e. 0.001). With lr=0.001, we'll train our model.
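(The exact training call isn't shown; in fastai v1 it is typically the one-cycle policy:)

```python
# Train the new head for 10 epochs at the chosen learning rate
learn.fit_one_cycle(10, max_lr=1e-3)
```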

Output:

training 10 epochs

We are at an accuracy of 66%, with high training and validation losses. Now we'll unfreeze all the layers in the network and find the best learning rate for the full model.
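(In fastai v1:)

```python
# Make every layer trainable, then rerun the learning-rate range test
learn.unfreeze()
learn.lr_find()
learn.recorder.plot()
```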

Output:

lr finder
loss vs lr

In this case, we get a very different-looking curve from the one above. Here, we choose a point on the x-axis (which shows the learning rate) after which the curve starts climbing drastically, in other words, the point after which the loss starts increasing exponentially. Looking at this graph, that point seems to be 1e-03. But I am going to take a rate lower than that, 1e-04, following the proven rule of thumb of staying well below the point where the loss blows up.
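(Whether a single rate or a discriminative slice of rates was used isn't shown; with the single rate chosen above, the call would be:)

```python
# Fine-tune the whole unfrozen network at the more conservative rate
learn.fit_one_cycle(10, max_lr=1e-4)
```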

Output:

training 10 epochs

As evident, we have jumped from 66% accuracy in the last training loop to 70% here, with a significantly lower training loss but a much higher validation loss this time around. A training loss well below the validation loss is the classic sign of overfitting, a very common problem in ML and DL. To address it, we have to do some data cleaning.

Before going ahead, we save our model.
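(The checkpoint name below is illustrative:)

```python
# Save the current weights so we can come back to them after cleaning
learn.save('resnet34-stage-1')
```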

Data Cleaning

FastAI gives us a very convenient, hands-on way to clean our data through a built-in widget that runs inside the notebook. The widget lets us review the images behind the model's top losses and relabel or delete them manually.
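(The widget in question is fastai v1's ImageCleaner, used roughly like this:)

```python
from fastai.widgets import DatasetFormatter, ImageCleaner

# Collect the dataset ordered by loss, so the worst offenders come first
ds, idxs = DatasetFormatter().from_toplosses(learn)
# Launch the in-notebook widget; decisions are written to path/'cleaned.csv'
ImageCleaner(ds, idxs, path)
```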

Output:

This combination of computer and human intelligence is a great boost for our model. The cleaned data is saved at the provided path as a .csv file called 'cleaned.csv'. We then convert this file into a pandas dataframe so we can work with it easily.

Converting .csv file to pandas df
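(That conversion is just:)

```python
# Load the image list produced by ImageCleaner into a dataframe
df = pd.read_csv(path/'cleaned.csv')
df.head()
```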

Output:

First five rows of df

Now that we have our cleaned image list in a dataframe, we can easily load the images into PyTorch tensors with a fresh train-validation split.
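(A sketch of rebuilding the DataBunch from the cleaned dataframe, with the same settings as before:)

```python
# Fresh 80-20 split over the cleaned file list
data = ImageDataBunch.from_df(
    path, df,
    valid_pct=0.2,
    ds_tfms=get_transforms(),
    size=224,
).normalize(imagenet_stats)
```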

We can check how many images were removed compared to what we had before:

Output:

Before cleaning
After cleaning

Clearly, the training set went down from 1957 images to 1544, while the validation set went down from 489 to 385.

To confirm that the cleaned images load correctly, we can have a look at them visually:

Output:

sample batch

At this point, our data cleaning process is complete. Now we freeze the model again, so that only the final layers are updated during training. After that, we again look for a suitable learning rate by calling lr_find() and plotting the recorder.
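(Roughly, assuming the learner is pointed at the cleaned DataBunch first:)

```python
learn.data = data   # swap in the cleaned data
learn.freeze()      # train only the head from here on
learn.lr_find()
learn.recorder.plot()
```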

Output:

lr finder

Following the same intuition as before, let's choose lr=1e-03 (0.001) for the next training loop.
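(The call would be the same as before:)

```python
# Final training loop on the cleaned data
learn.fit_one_cycle(10, max_lr=1e-3)
```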

Output:

training 10 epochs

We have jumped drastically from 70% accuracy to almost 99% accuracy just by cleaning the data. Our training and validation losses are also under control, and the overfitting problem is gone. This is pretty much a state-of-the-art result, so we save our ResNet34 model here for the last time.

Evaluation

A good way to evaluate a classifier is to build a confusion matrix.
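(In fastai v1 this goes through ClassificationInterpretation:)

```python
# Build an interpretation object from the trained learner
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix(figsize=(6, 6), dpi=80)
```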

Output:

We can also print out the most confused class pairs separately for better insight:
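(In fastai v1:)

```python
# (actual, predicted, count) for every pair confused at least twice
interp.most_confused(min_val=2)
```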

Output:

most confused

From the above results, we can see that our model mostly confuses the sci-fi and horror classes. This is understandable: the cover images of these two genres often look very similar, and can be ambiguous even to the human eye. So we can forgive our model for those mistakes.

We can also inspect the top losses visually:
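(In fastai v1:)

```python
# Show the images with the highest loss, with predicted/actual/loss/probability
interp.plot_top_losses(9, figsize=(15, 11))
```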

Output:

top losses

Predicting on test data

It is important to check how our model performs on unseen data, so below are some predictions on unseen test images.

Output:

random testing image

Looking at the cover, we would say this is a sci-fi novel. Let's see if our model predicts the same.
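(A sketch of single-image inference in fastai v1; the file name is hypothetical:)

```python
# Load one test image and predict its genre
img = open_image(path/'test'/'scifi_example.jpg')
pred_class, pred_idx, probs = learn.predict(img)
pred_class
```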

Output:

model prediction

So our model predicted it correctly! Of course, this won't always be the case: our model is about 98% accurate, not 100%. You can experiment with various test images in a similar fashion to see where the model goes wrong.

Conclusion

  • We started by scraping our own data from Google Images with a JS code snippet.
  • Our data collection technique was far from ideal. The original data needed a lot of cleaning, and even after much of it there was still unwanted data: mislabeled images, blurry or noisy images, very ambiguous covers, and so on. With a better source for the same data, our model would probably have produced even better results.
  • Going forward, one can try other pretrained models, such as Wide ResNets or EfficientNets, to see how they perform on this dataset.
  • Although this wasn't the ideal dataset, the FastAI implementation still gave us state-of-the-art performance that is hard to achieve as quickly with other frameworks.
