Image Classification Model

Using Transfer Learning — ResNet34

Nikhil R Nath
Analytics Vidhya
7 min read · Jul 1, 2020


In this blog, we will be using the transfer learning technique to build a CNN model for the Intel Image Classification dataset. This blog will be a walkthrough of the problem, and we will link related articles about transfer learning and convolutional neural networks.


This dataset contains around 25k images of size 150x150, distributed under 6 categories.

{'buildings' -> 0,
'forest' -> 1,
'glacier' -> 2,
'mountain' -> 3,
'sea' -> 4,
'street' -> 5}

There are around 14k images in Train, 3k in Test, and 7k in Prediction.
This dataset was initially published on https://datahack.analyticsvidhya.com by Intel to host an image classification challenge. Let’s start.

Importing the required modules

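The code in this post was originally shared as screenshots; as a stand-in, here is a minimal sketch of the imports the rest of this walkthrough assumes (PyTorch, torchvision, NumPy, and matplotlib):

import os
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import matplotlib.pyplot as plt
import torchvision.models as models
import torchvision.transforms as tt
from torchvision.datasets import ImageFolder
from torchvision.utils import make_grid
from torch.utils.data import DataLoader, random_split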

Preparing the Data

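A sketch of this step, assuming the dataset archive has already been downloaded and extracted locally (the directory name below is hypothetical, and the folder layout may differ in your copy):

data_dir = './intel-image-classification'   # hypothetical path to the extracted dataset

print(os.listdir(data_dir))                 # expect the train / test / prediction folders
classes = os.listdir(data_dir + '/train')
print(classes)                              # the six category folders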

Preprocessing of the Dataset

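A minimal version of this step uses ImageFolder, which assigns each image a label from its folder name. Only ToTensor is assumed here; the author may have applied additional transforms:

# Load each folder of images as a labelled dataset; ToTensor converts the
# 150x150 RGB images into 3x150x150 float tensors with values in [0, 1].
dataset = ImageFolder(data_dir + '/train', transform=tt.ToTensor())
test_dataset = ImageFolder(data_dir + '/test', transform=tt.ToTensor())
print(len(dataset), len(test_dataset))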

Let’s look at a sample element from the training dataset. Each element is a tuple containing an image tensor and a label. Since the data consists of 150x150 px color images with 3 channels (RGB), each image tensor has the shape (3, 150, 150).

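Continuing from the cells above, something like:

img, label = dataset[0]
print(img.shape, label)   # torch.Size([3, 150, 150]) and an integer class index
print(dataset.classes)    # ['buildings', 'forest', 'glacier', 'mountain', 'sea', 'street']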

We can view the image using matplotlib, but we need to change the tensor dimensions to (150,150,3). Let's create a helper function to display an image and its label.

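A sketch of such a helper, assuming the matplotlib import and the dataset defined above:

def show_example(img, label):
    # permute moves channels last: (3, 150, 150) -> (150, 150, 3) for matplotlib
    plt.imshow(img.permute(1, 2, 0))
    plt.title('Label: ' + dataset.classes[label])

show_example(*dataset[0])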

We can now create a validation dataset (2000 images) using the training set. We’ll use the random_split helper method from PyTorch to do this. To ensure that we always create the same validation set, we'll also set a seed for the random number generator.

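For example (the seed value 42 below is an arbitrary choice, not necessarily the author's):

random_seed = 42
torch.manual_seed(random_seed)  # fix the generator so the split is reproducible

val_size = 2000
train_size = len(dataset) - val_size
train_ds, val_ds = random_split(dataset, [train_size, val_size])
print(len(train_ds), len(val_ds))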

We can now create data loaders for training and validation, to load the data in batches.

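A sketch, with a batch size of 128 as an example value:

batch_size = 128  # illustrative; adjust to fit your GPU memory

train_dl = DataLoader(train_ds, batch_size, shuffle=True,
                      num_workers=4, pin_memory=True)
val_dl = DataLoader(val_ds, batch_size * 2,
                    num_workers=4, pin_memory=True)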

We can look at batches of images from the dataset using the make_grid method from torchvision. Each time the following code is run, we get a different batch, since the sampler shuffles the indices before creating batches.

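A possible version of this helper (the name show_batch is this sketch's, not necessarily the author's):

def show_batch(dl):
    for images, labels in dl:
        fig, ax = plt.subplots(figsize=(12, 6))
        ax.set_xticks([]); ax.set_yticks([])
        # make_grid tiles the whole batch into one image; permute for matplotlib
        ax.imshow(make_grid(images, nrow=16).permute(1, 2, 0))
        break  # show only the first batch

show_batch(train_dl)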

Let’s define the model by extending an ImageClassificationBase class which contains helper methods for training & validation.

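The original class was shown as a screenshot; the sketch below follows the common pattern for such a base class, and the model swaps ResNet34's final fully-connected layer for a 6-class one. The helper name accuracy and the model name IntelCnnModel are this sketch's assumptions:

def accuracy(outputs, labels):
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))

class ImageClassificationBase(nn.Module):
    def training_step(self, batch):
        images, labels = batch
        out = self(images)
        return F.cross_entropy(out, labels)        # training loss

    def validation_step(self, batch):
        images, labels = batch
        out = self(images)
        loss = F.cross_entropy(out, labels)
        acc = accuracy(out, labels)
        return {'val_loss': loss.detach(), 'val_acc': acc}

    def validation_epoch_end(self, outputs):
        batch_losses = [x['val_loss'] for x in outputs]
        batch_accs = [x['val_acc'] for x in outputs]
        return {'val_loss': torch.stack(batch_losses).mean().item(),
                'val_acc': torch.stack(batch_accs).mean().item()}

    def epoch_end(self, epoch, result):
        print("Epoch [{}], last_lr: {:.5f}, train_loss: {:.4f}, "
              "val_loss: {:.4f}, val_acc: {:.4f}".format(
                  epoch, result['lrs'][-1], result['train_loss'],
                  result['val_loss'], result['val_acc']))

class IntelCnnModel(ImageClassificationBase):
    def __init__(self, num_classes):
        super().__init__()
        # Start from ResNet34 pre-trained on ImageNet and replace the final
        # fully-connected layer with one sized for our 6 classes.
        self.network = models.resnet34(pretrained=True)
        self.network.fc = nn.Linear(self.network.fc.in_features, num_classes)

    def forward(self, xb):
        return self.network(xb)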

Using a GPU

As the sizes of our models and datasets increase, we need to use GPUs to train our models within a reasonable amount of time. GPUs contain hundreds of cores that are optimized for performing expensive matrix operations on floating point numbers in a short time, which makes them ideal for training deep neural networks with many layers.

To seamlessly use a GPU, if one is available, we define a couple of helper functions (get_default_device and to_device) and a helper class DeviceDataLoader to move our model & data to the GPU as required.

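A sketch of these helpers:

def get_default_device():
    """Pick the GPU if one is available, else the CPU."""
    return torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

def to_device(data, device):
    """Move tensor(s) to the chosen device, recursing into lists/tuples."""
    if isinstance(data, (list, tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

class DeviceDataLoader():
    """Wrap a data loader so each batch is moved to the device as it is yielded."""
    def __init__(self, dl, device):
        self.dl = dl
        self.device = device

    def __iter__(self):
        for b in self.dl:
            yield to_device(b, self.device)

    def __len__(self):
        return len(self.dl)

device = get_default_device()
print(device)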

Based on where you’re running this notebook, your default device could be a CPU (torch.device('cpu')) or a GPU (torch.device('cuda')).

We can wrap our training and validation data loaders using DeviceDataLoader for automatically transferring batches of data to the GPU (if available), and use to_device to move our model to the GPU (if available).

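For example, continuing with the sketch's names from above:

train_dl = DeviceDataLoader(train_dl, device)
val_dl = DeviceDataLoader(val_dl, device)
model = to_device(IntelCnnModel(num_classes=6), device)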

Training the model

Before we train the model, we’re going to make a bunch of small but important improvements to our fit function:

  • Learning rate scheduling: Instead of using a fixed learning rate, we will use a learning rate scheduler, which will change the learning rate after every batch of training. There are many strategies for varying the learning rate during training, and the one we’ll use is called the “One Cycle Learning Rate Policy”, which involves starting with a low learning rate, gradually increasing it batch-by-batch to a high learning rate for about 30% of epochs, then gradually decreasing it to a very low value for the remaining epochs. Learn more: https://sgugger.github.io/the-1cycle-policy.html
  • Weight decay: We also use weight decay, yet another regularization technique that prevents the weights from becoming too large by adding an additional term to the loss function. Learn more: https://towardsdatascience.com/this-thing-called-weight-decay-a7cd4bcfccab
  • Gradient clipping: Apart from the layer weights and outputs, it is also helpful to limit the values of gradients to a small range to prevent undesirable changes in parameters due to large gradient values. This simple yet effective technique is called gradient clipping. Learn more: https://towardsdatascience.com/what-is-gradient-clipping-b8e815cdfb48

Let’s define a fit_one_cycle function to incorporate these changes. We'll also record the learning rate used for each batch.

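A sketch of fit_one_cycle (plus an evaluate helper it relies on; the helper's name is this sketch's assumption), combining the one-cycle scheduler, weight decay, and gradient clipping described above:

@torch.no_grad()
def evaluate(model, val_loader):
    model.eval()
    outputs = [model.validation_step(batch) for batch in val_loader]
    return model.validation_epoch_end(outputs)

def fit_one_cycle(epochs, max_lr, model, train_loader, val_loader,
                  weight_decay=0, grad_clip=None, opt_func=torch.optim.SGD):
    torch.cuda.empty_cache()
    history = []
    optimizer = opt_func(model.parameters(), max_lr, weight_decay=weight_decay)
    # One-cycle schedule: ramp the LR up, then anneal it down, once per batch
    sched = torch.optim.lr_scheduler.OneCycleLR(
        optimizer, max_lr, epochs=epochs, steps_per_epoch=len(train_loader))

    for epoch in range(epochs):
        model.train()
        train_losses, lrs = [], []
        for batch in train_loader:
            loss = model.training_step(batch)
            train_losses.append(loss)
            loss.backward()

            if grad_clip:  # keep gradients within [-grad_clip, grad_clip]
                nn.utils.clip_grad_value_(model.parameters(), grad_clip)

            optimizer.step()
            optimizer.zero_grad()

            lrs.append(optimizer.param_groups[0]['lr'])  # record LR per batch
            sched.step()

        result = evaluate(model, val_loader)
        result['train_loss'] = torch.stack(train_losses).mean().item()
        result['lrs'] = lrs
        model.epoch_end(epoch, result)
        history.append(result)
    return history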

We’re now ready to train our model. Instead of SGD (stochastic gradient descent), we’ll use the Adam optimizer, which uses techniques like momentum and adaptive learning rates for faster training. You can learn more about optimizers here: https://ruder.io/optimizing-gradient-descent/index.html

The initial accuracy is around 17%, which is what one might expect from a randomly initialized model.

We’ll experiment with different hyper-parameters (learning rate, number of epochs, batch size, etc.) to train our model.

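A sketch of the training run; the hyper-parameter values below are illustrative, since the author's exact settings were only visible in the original screenshots:

history = [evaluate(model, val_dl)]   # accuracy before training (~1/6 for 6 classes)
print(history)

# Illustrative hyper-parameters, not the author's exact values.
epochs = 10
max_lr = 0.001
grad_clip = 0.1
weight_decay = 1e-4
opt_func = torch.optim.Adam

history += fit_one_cycle(epochs, max_lr, model, train_dl, val_dl,
                         grad_clip=grad_clip, weight_decay=weight_decay,
                         opt_func=opt_func)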

Let’s plot the validation set accuracies to study how the model improves over time.

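For example, reading the per-epoch validation accuracy out of the recorded history:

def plot_accuracies(history):
    accuracies = [x['val_acc'] for x in history]
    plt.plot(accuracies, '-x')
    plt.xlabel('epoch')
    plt.ylabel('accuracy')
    plt.title('Accuracy vs. No. of epochs')

plot_accuracies(history)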

We can also plot the training and validation losses to study the trend.

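For example (the first history entry has no training loss, so .get is used to skip it):

def plot_losses(history):
    train_losses = [x.get('train_loss') for x in history]
    val_losses = [x['val_loss'] for x in history]
    plt.plot(train_losses, '-bx')
    plt.plot(val_losses, '-rx')
    plt.xlabel('epoch')
    plt.ylabel('loss')
    plt.legend(['Training', 'Validation'])
    plt.title('Loss vs. No. of epochs')

plot_losses(history)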

It’s clear from the trend that our model isn’t overfitting to the training data just yet. Finally, let’s visualize how the learning rate changed over time, batch-by-batch over all the epochs.

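For example, concatenating the per-batch learning rates recorded by fit_one_cycle:

def plot_lrs(history):
    lrs = np.concatenate([x.get('lrs', []) for x in history])
    plt.plot(lrs)
    plt.xlabel('batch no.')
    plt.ylabel('learning rate')
    plt.title('Learning Rate vs. Batch no.')

plot_lrs(history)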

Testing with individual images

While we have been tracking the overall accuracy of the model so far, it’s also a good idea to look at the model’s results on some sample images. Let’s test out our model with some images from the predefined test dataset.

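A sketch of a predict_image helper (the name is this sketch's), which adds a batch dimension, runs the model, and picks the class with the highest score:

def predict_image(img, model):
    xb = to_device(img.unsqueeze(0), device)  # add a batch dimension
    yb = model(xb)
    _, preds = torch.max(yb, dim=1)           # index of the highest score
    return dataset.classes[preds[0].item()]

img, label = test_dataset[0]
plt.imshow(img.permute(1, 2, 0))
print('Label:', dataset.classes[label], ', Predicted:', predict_image(img, model))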

Identifying where our model performs poorly can help us improve the model, by collecting more training data, increasing or decreasing the complexity of the model, or changing the hyperparameters.

We expect these values to be similar to those for the validation set. If not, we might need a better validation set with data and a distribution similar to the test set (which often comes from real-world data).

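A sketch of evaluating on the predefined test set, reusing the helpers from above:

test_loader = DeviceDataLoader(DataLoader(test_dataset, batch_size * 2), device)
result = evaluate(model, test_loader)
print(result)   # compare val_loss / val_acc against the validation numbers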

We are getting pretty good validation accuracy, and our model is predicting the images from the prediction folder correctly. Try different transfer learning approaches and hyperparameters to get an even better model.

Thanks for reading and see you on the next one!

Leave a comment if you need the link to the notebook for this blog.
