How I achieved a 95.5% accuracy on a Kaggle Deep Learning competition

Debayan Bhattacharya
Published in Analytics Vidhya · 8 min read · May 5, 2020


It is a very weird time to be alive. Suddenly, you have so much time on your hands that you really do not know what to do with it. I am writing this blog because I am bored of procrastinating. I never thought it would come to this, but it has.
I have two tasks on my to-do list which have been pending for quite some time now: first, participate in an online ML competition, and second, write a tech blog. At the onset, I would like to thank Shia LaBeouf for his motivational pep talk. If you are lacking motivation, I suggest you check this video out. His wise words can motivate a rock to erode faster, just saying.

To cut to the chase, this blog is intended to provide some direction to newbies who have taken the ML courses by Andrew Ng, or are familiar with the concepts of ML and deep learning, but do not know where to start or how to write good code for ML. I have written it as a series of questions I had in mind and how I answered them when I came across the Kaggle challenge.

My objective was simply to learn how to train models the right way, organise my code with good coding practices and, hopefully, achieve a good accuracy in the process. I found an interesting challenge called the Plant Pathology Challenge, hosted by Cornell University. The problem statement seemed fairly simple and a good challenge to begin with. The problem's objectives as described on Kaggle read as follows:-

TL;DR: Given an image of a leaf, you have to diagnose the health of the plant. Classify it into one of 4 classes: healthy, multiple diseases, rust or scab.

“Objectives of ‘Plant Pathology Challenge’ are to train a model using images of training dataset to 1) Accurately classify a given image from testing dataset into different diseased category or a healthy leaf; 2) Accurately distinguish between many diseases, sometimes more than one on a single leaf; 3) Deal with rare classes and novel symptoms; 4) Address depth perception — angle, light, shade, physiological age of the leaf; and 5) Incorporate expert knowledge in identification, annotation, quantification, and guiding computer vision to search for relevant features during learning.”

Four sample images from the leaf dataset

What are the first thoughts I had after reading about the challenge?

The first thought I had was that I had to read up on state-of-the-art CNN models. At the moment, EfficientNet-B7 has achieved 84.4% top-1 accuracy on the ImageNet dataset and seems a pretty solid ConvNet to tackle the challenge with.

How do I train a state-of-the-art ConvNet?
It is not advisable to train large ConvNets from scratch. The state-of-the-art ConvNets are large and have a huge number of parameters (EfficientNet-B5 has about 30 million). It is better to load weights pretrained on a large dataset and then train for your specific task, aka transfer learning.

Think of the pretrained weights as the foundations of a house. Training from pretrained weights is like constructing the rest of the house on top of that foundation. The pretrained weights are already tuned to extract important features from images, so when you train on your custom dataset, the model does not need to learn feature extraction from scratch. Features can range from something as basic as vertical lines to something as complex as a car in an image. What the model learns to extract depends solely on the dataset the network was pretrained on, so a model pretrained on a large dataset like ImageNet is pretty solid!
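For illustration, here is roughly what loading pretrained weights looks like with the open-source efficientnet_pytorch package (a sketch of the idea, not my exact code):

from efficientnet_pytorch import EfficientNet

# Transfer learning: start from weights pretrained on ImageNet...
model = EfficientNet.from_pretrained('efficientnet-b5')

# ...instead of a randomly initialised network trained from scratch
model_from_scratch = EfficientNet.from_name('efficientnet-b5')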

How do I structure my code?

I have 2+ years of professional experience as a software developer. In these two years I have picked up some good practices for writing well-structured code. Furthermore, I went through various Jupyter notebooks published on Kaggle and multiple open-source GitHub projects. Everybody has their own way of writing code, but there is a general trend amongst most developers. I follow a structure which makes sense to me, and I have used it to develop the codebase for the Plant Pathology challenge. Hopefully it makes sense to you as well. :)

Here is the structure I follow:-

1. Put all your hyperparameters and constants in the beginning and in a single place

import torch
from efficientnet_pytorch import EfficientNet

# Hyperparameters and constants, all defined in one place
batch_size = 16
epoch = 50
model_name = 'efficientnet-b5'
image_size = EfficientNet.get_image_size(model_name)  # input resolution expected by the chosen model
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

This is helpful because you can easily change your hyperparameters from one place, and keeping them together pays off because you will do a whole lot of hyperparameter tuning.

2. Create a custom class for your dataset and define functions that cater to your needs.

It is a very Pythonic way to go about doing things. Basically, the idea is that you create a custom class that imports the data (say, from a csv), applies all sorts of transformations and outputs the transformed data, which can then be fed into your model.

For example, I have defined a class Dataset that loads the train/cross validation/test data based on the parameters that I pass to it while creating its object.

Code for class Dataset

Furthermore, I have defined two other functions, __next__() and __iter__(), which make objects of class Dataset iterable.

Making class Dataset iterable
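Putting the two ideas together, a stripped-down sketch of such a class could look like the one below. It is not my exact code: I assume the competition's train.csv layout (an image_id column plus one-hot label columns) and a transform that resizes images and converts them to tensors.

import pandas as pd
import torch
from PIL import Image

LABEL_COLS = ['healthy', 'multiple_diseases', 'rust', 'scab']  # one-hot label columns in the csv

class Dataset:
    """Loads image ids and labels from a csv and yields batches of transformed tensors."""

    def __init__(self, csv_path, image_dir, transform, batch_size=16):
        self.df = pd.read_csv(csv_path)
        self.image_dir = image_dir
        self.transform = transform
        self.batch_size = batch_size

    def __iter__(self):
        # Reset the cursor so the object can be iterated over afresh every epoch
        self.idx = 0
        return self

    def __next__(self):
        if self.idx >= len(self.df):
            raise StopIteration
        rows = self.df.iloc[self.idx:self.idx + self.batch_size]
        self.idx += self.batch_size
        images, labels = [], []
        for _, row in rows.iterrows():
            image = Image.open(f"{self.image_dir}/{row['image_id']}.jpg").convert('RGB')
            images.append(self.transform(image))                     # resize/augment, then to tensor
            labels.append(int(row[LABEL_COLS].to_numpy().argmax()))  # one-hot -> class index
        return torch.stack(images), torch.tensor(labels)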

Finally, I added a parse_annotations() function which augments the images on the fly. Data augmentation is necessary if you want to generalise and prevent your model from overfitting on the training data. Check out this article for an overview of the various augmentation libraries.
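For illustration, an on-the-fly augmentation pipeline can be as simple as this torchvision version (the exact transforms here are just an example; the linked article covers other libraries as well):

import torchvision.transforms as T

# Augmentations are applied only to training images, freshly on every epoch
train_transform = T.Compose([
    T.Resize((image_size, image_size)),
    T.RandomHorizontalFlip(),
    T.RandomVerticalFlip(),
    T.RandomRotation(30),
    T.ColorJitter(brightness=0.2, contrast=0.2),
    T.ToTensor(),
])

# Cross validation / test images are only resized, never augmented
eval_transform = T.Compose([
    T.Resize((image_size, image_size)),
    T.ToTensor(),
])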

3. Initialise your datasets

Initialising dataset

Note: I have split the training set 80–20 into training and cross validation set.
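Building on the sketches above, initialising the datasets amounts to splitting the csv 80-20 and creating one Dataset object per split (again a sketch; the file and folder names are just placeholders):

import pandas as pd
from sklearn.model_selection import train_test_split

# 80-20 split of the Kaggle training csv into training and cross validation sets
full_df = pd.read_csv('train.csv')
train_df, val_df = train_test_split(full_df, test_size=0.2, random_state=42)
train_df.to_csv('train_split.csv', index=False)
val_df.to_csv('val_split.csv', index=False)

train_dataset = Dataset('train_split.csv', 'images', train_transform, batch_size=batch_size)
val_dataset = Dataset('val_split.csv', 'images', eval_transform, batch_size=batch_size)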

4. Define your loss function

The Plant Pathology challenge is a multi-class classification problem, and cross entropy loss works best for such a scenario. Luckily, PyTorch has an implementation of this loss function, so it is fairly easy to use.

Loss function
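In code it is a one-liner; the targets are integer class indices (0 to 3), as produced in the Dataset sketch above:

import torch.nn as nn

# Cross entropy loss: expects raw logits from the network and class indices as targets
criterion = nn.CrossEntropyLoss()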

5. Define your Neural Net

Normally this is where you define your custom neural network class. I have used an open-source PyTorch implementation of EfficientNet and customised the forward layers to suit my purpose.

Neural Network definition
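Roughly, the customisation looks like this sketch built on the efficientnet_pytorch package (PlantNet is just an illustrative name, not my actual class):

import torch.nn as nn
from efficientnet_pytorch import EfficientNet

class PlantNet(nn.Module):
    """EfficientNet backbone with its ImageNet classifier swapped for a 4-class head."""

    def __init__(self, model_name='efficientnet-b5', num_classes=4):
        super().__init__()
        self.backbone = EfficientNet.from_pretrained(model_name)
        in_features = self.backbone._fc.in_features
        self.backbone._fc = nn.Linear(in_features, num_classes)  # 4 leaf classes instead of 1000

    def forward(self, x):
        return self.backbone(x)

model = PlantNet(model_name, num_classes=4).to(device)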

6. Define your optimisation function

It is the de facto rule in modern neural networks to use learning rate decay. It helps both optimisation and generalisation. More on this can be found here.

You might have already guessed that I was using learning rate decay, because I did not define a learning rate in the hyperparameter section. I have used a library that implements cosine learning rate decay.

Optimisation function
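A minimal version of the setup, using PyTorch's built-in cosine scheduler in place of the external library I used:

import torch.optim as optim

optimizer = optim.Adam(model.parameters(), lr=0.001)
# The learning rate follows a cosine curve from 0.001 down towards zero over the run
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epoch)

# scheduler.step() is then called once per epoch inside the training loop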

7. Define functions that train and cross validate your model

The tasks of the training function are as follows:-
a) It feeds the training data into the network.
b) It computes the loss based on the output of the network and the target data.
c) It backpropagates the loss and tunes the network by updating the weights.
d) It records the loss, which is later used to plot the training and cross validation loss graph for understanding your model's performance.

The tasks of the cross validation function are as follows:-
a) It feeds the cross validation data (no data augmentation) into the network.
b) It computes the outputs and the loss.
c) It does NOT train the network.
d) It records the cross validation loss for the training and cross validation loss graph.

Training and Cross Validation functions
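Stripped of the logging and accuracy bookkeeping, the two functions boil down to something like this (a sketch, not my exact code):

import torch

def train_one_epoch(model, dataset, criterion, optimizer, device):
    model.train()
    total_loss, n_batches = 0.0, 0
    for images, labels in dataset:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(images)              # feed the training data into the network
        loss = criterion(outputs, labels)    # compute the loss against the targets
        loss.backward()                      # backpropagate the loss
        optimizer.step()                     # tune the network by updating the weights
        total_loss += loss.item()
        n_batches += 1
    return total_loss / n_batches            # recorded for the training loss graph


def validate(model, dataset, criterion, device):
    model.eval()
    total_loss, n_batches = 0.0, 0
    with torch.no_grad():                    # no gradients: the network is NOT trained here
        for images, labels in dataset:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            total_loss += criterion(outputs, labels).item()
            n_batches += 1
    return total_loss / n_batches            # recorded for the cross validation loss graph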

8. Train your damn model!

Now we finally come to answering the question in the title. I initially trained a neural net I made myself. It was not fancy; it was like AlexNet but with smaller depth, width and resolution. I trained my own neural net first because I wanted to see if my code worked as intended. I achieved an accuracy of 72% on the test dataset.
To further increase my accuracy, I moved on to large pretrained ConvNets. The first ConvNet I trained was DenseNet with a batch size of 16 and a learning rate of 0.001 for 30 epochs. I was able to get an accuracy of 94% on the cross validation set and 93% on the test set.
I got a bump in accuracy with the state-of-the-art EfficientNet. Below is the list of networks I trained until I got the 95.5% accuracy.

a) EfficientNet B3 - batch size 16, resolution 494, 50 epochs, cosine learning rate decay; cross validation accuracy 94.3%
b) EfficientNet B7 - batch size 2, resolution 494, 50 epochs, cosine learning rate decay; cross validation accuracy 93.44%
(The batch size is only 2 for B7 because it has 66 million parameters, and the 12 GB GPU could not handle a batch size larger than 2. :( )
c) EfficientNet B5 - batch size 4, resolution 494, 50 epochs, cosine learning rate decay; cross validation accuracy 96.74%, test accuracy 95.5%

The leading contestant as of now has an accuracy of 98.7%. I am 3.2% shy, which is still a lot! Hopefully, in the next challenge I will get better results. Not bad for my first competition, eh? :)

These are trying times and there is a lot of uncertainty everywhere. Most things are beyond our control. What is in our control is how we manage our time and what we do with it. Do something that you enjoy. For me, it's learning how to train networks better and getting a more intuitive understanding of how neural nets work. Find your thing.

Ending this blog with a beautiful quote from the prophet and hero of our age, Shia LaBeouf:

“Don’t let your dreams be dreams. JUST DO IT!”

https://giphy.com/gifs/just-do-it-24xRxrDCLxhT2

You can find my full code here.

Edit: I trained EfficientNet B5 with the whole training dataset; in other words, the cross validation set was also used in training. I trained the network for 50 epochs and got an accuracy of 97.5%.
