Image Classification Model

Using Transfer Learning — ResNet34

Nikhil R Nath
Analytics Vidhya
7 min read · Jul 1, 2020


In this blog, we will be using the transfer learning technique to build a CNN model for the Intel Image Classification dataset. This blog will be a walkthrough of the problem, and we will link related articles about transfer learning and convolutional neural networks.


This dataset contains around 25k images of size 150x150, distributed under 6 categories.

{'buildings' -> 0,
'forest' -> 1,
'glacier' -> 2,
'mountain' -> 3,
'sea' -> 4,
'street' -> 5}

There are around 14k images in Train, 3k in Test, and 7k in Prediction.
This dataset was initially published on https://datahack.analyticsvidhya.com by Intel to host an image classification challenge. Let’s start.

Importing the required modules

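The code in this post was originally shared as screenshots; as a stand-in, here is a minimal sketch of the imports the rest of this walkthrough assumes (PyTorch, torchvision, NumPy, and matplotlib):

import os
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import matplotlib.pyplot as plt
import torchvision.models as models
import torchvision.transforms as tt
from torchvision.datasets import ImageFolder
from torchvision.utils import make_grid
from torch.utils.data import DataLoader, random_split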

Preparing the Data

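A sketch of this step, assuming the dataset archive has already been downloaded and extracted locally (the directory name below is hypothetical, and the folder layout may differ in your copy):

data_dir = './intel-image-classification'   # hypothetical path to the extracted dataset

print(os.listdir(data_dir))                 # expect the train / test / prediction folders
classes = os.listdir(data_dir + '/train')
print(classes)                              # the six category folders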

Preprocessing of the Dataset

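A minimal version of this step uses ImageFolder, which assigns each image a label from its folder name. Only ToTensor is assumed here; the author may have applied additional transforms:

# Load each folder of images as a labelled dataset; ToTensor converts the
# 150x150 RGB images into 3x150x150 float tensors with values in [0, 1].
dataset = ImageFolder(data_dir + '/train', transform=tt.ToTensor())
test_dataset = ImageFolder(data_dir + '/test', transform=tt.ToTensor())
print(len(dataset), len(test_dataset))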

Let’s look at a sample element from the training dataset. Each element is a tuple containing an image tensor and a label. Since the data consists of 150x150 px color images with 3 channels (RGB), each image tensor has the shape (3, 150, 150).

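Continuing from the cells above, something like:

img, label = dataset[0]
print(img.shape, label)   # torch.Size([3, 150, 150]) and an integer class index
print(dataset.classes)    # ['buildings', 'forest', 'glacier', 'mountain', 'sea', 'street']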

We can view the image using matplotlib, but we need to change the tensor dimensions to (150,150,3). Let's create a helper function to display an image and its label.

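A sketch of such a helper, assuming the matplotlib import and the dataset defined above:

def show_example(img, label):
    # permute moves channels last: (3, 150, 150) -> (150, 150, 3) for matplotlib
    plt.imshow(img.permute(1, 2, 0))
    plt.title('Label: ' + dataset.classes[label])

show_example(*dataset[0])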

We can now create a validation dataset (2000 images) using the training set. We’ll use the random_split helper method from PyTorch to do this. To ensure that we always create the same validation set, we'll also set a seed for the random number generator.

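For example (the seed value 42 below is an arbitrary choice, not necessarily the author's):

random_seed = 42
torch.manual_seed(random_seed)  # fix the generator so the split is reproducible

val_size = 2000
train_size = len(dataset) - val_size
train_ds, val_ds = random_split(dataset, [train_size, val_size])
print(len(train_ds), len(val_ds))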

We can now create data loaders for training and validation, to load the data in batches.

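A sketch, with a batch size of 128 as an example value:

batch_size = 128  # illustrative; adjust to fit your GPU memory

train_dl = DataLoader(train_ds, batch_size, shuffle=True,
                      num_workers=4, pin_memory=True)
val_dl = DataLoader(val_ds, batch_size * 2,
                    num_workers=4, pin_memory=True)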

We can look at batches of images from the dataset using the make_grid method from torchvision. Each time the following code is run, we get a different batch, since the sampler shuffles the indices before creating batches.

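A possible version of this helper (the name show_batch is this sketch's, not necessarily the author's):

def show_batch(dl):
    for images, labels in dl:
        fig, ax = plt.subplots(figsize=(12, 6))
        ax.set_xticks([]); ax.set_yticks([])
        # make_grid tiles the whole batch into one image; permute for matplotlib
        ax.imshow(make_grid(images, nrow=16).permute(1, 2, 0))
        break  # show only the first batch

show_batch(train_dl)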

Let’s define the model by extending an ImageClassificationBase class which contains helper methods for training & validation.

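The original class was shown as a screenshot; the sketch below follows the common pattern for such a base class, and the model swaps ResNet34's final fully-connected layer for a 6-class one. The helper name accuracy and the model name IntelCnnModel are this sketch's assumptions:

def accuracy(outputs, labels):
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))

class ImageClassificationBase(nn.Module):
    def training_step(self, batch):
        images, labels = batch
        out = self(images)
        return F.cross_entropy(out, labels)        # training loss

    def validation_step(self, batch):
        images, labels = batch
        out = self(images)
        loss = F.cross_entropy(out, labels)
        acc = accuracy(out, labels)
        return {'val_loss': loss.detach(), 'val_acc': acc}

    def validation_epoch_end(self, outputs):
        batch_losses = [x['val_loss'] for x in outputs]
        batch_accs = [x['val_acc'] for x in outputs]
        return {'val_loss': torch.stack(batch_losses).mean().item(),
                'val_acc': torch.stack(batch_accs).mean().item()}

    def epoch_end(self, epoch, result):
        print("Epoch [{}], last_lr: {:.5f}, train_loss: {:.4f}, "
              "val_loss: {:.4f}, val_acc: {:.4f}".format(
                  epoch, result['lrs'][-1], result['train_loss'],
                  result['val_loss'], result['val_acc']))

class IntelCnnModel(ImageClassificationBase):
    def __init__(self, num_classes):
        super().__init__()
        # Start from ResNet34 pre-trained on ImageNet and replace the final
        # fully-connected layer with one sized for our 6 classes.
        self.network = models.resnet34(pretrained=True)
        self.network.fc = nn.Linear(self.network.fc.in_features, num_classes)

    def forward(self, xb):
        return self.network(xb)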

Using a GPU

As the sizes of our models and datasets increase, we need to use GPUs to train our models within a reasonable amount of time. GPUs contain hundreds of cores that are optimized for performing expensive matrix operations on floating point numbers in a short time, which makes them ideal for training deep neural networks with many layers.

To seamlessly use a GPU, if one is available, we define a couple of helper functions (get_default_device and to_device) and a helper class DeviceDataLoader to move our model & data to the GPU as required.

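A sketch of these helpers:

def get_default_device():
    """Pick the GPU if one is available, else the CPU."""
    return torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

def to_device(data, device):
    """Move tensor(s) to the chosen device, recursing into lists/tuples."""
    if isinstance(data, (list, tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

class DeviceDataLoader():
    """Wrap a data loader so each batch is moved to the device as it is yielded."""
    def __init__(self, dl, device):
        self.dl = dl
        self.device = device

    def __iter__(self):
        for b in self.dl:
            yield to_device(b, self.device)

    def __len__(self):
        return len(self.dl)

device = get_default_device()
print(device)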

Based on where you’re running this notebook, your default device could be a CPU (torch.device('cpu')) or a GPU (torch.device('cuda')).

We can wrap our training and validation data loaders using DeviceDataLoader for automatically transferring batches of data to the GPU (if available), and use to_device to move our model to the GPU (if available).

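For example, continuing with the sketch's names from above:

train_dl = DeviceDataLoader(train_dl, device)
val_dl = DeviceDataLoader(val_dl, device)
model = to_device(IntelCnnModel(num_classes=6), device)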

Training the model

Before we train the model, we’re going to make a bunch of small but important improvements to our fit function:

  • Learning rate scheduling: Instead of using a fixed learning rate, we will use a learning rate scheduler, which will change the learning rate after every batch of training. There are many strategies for varying the learning rate during training, and the one we’ll use is called the “One Cycle Learning Rate Policy”, which involves starting with a low learning rate, gradually increasing it batch-by-batch to a high learning rate for about 30% of epochs, then gradually decreasing it to a very low value for the remaining epochs. Learn more: https://sgugger.github.io/the-1cycle-policy.html
  • Weight decay: We also use weight decay, yet another regularization technique that prevents the weights from becoming too large by adding an additional term to the loss function. Learn more: https://towardsdatascience.com/this-thing-called-weight-decay-a7cd4bcfccab
  • Gradient clipping: Apart from the layer weights and outputs, it is also helpful to limit the values of gradients to a small range to prevent undesirable changes in parameters due to large gradient values. This simple yet effective technique is called gradient clipping. Learn more: https://towardsdatascience.com/what-is-gradient-clipping-b8e815cdfb48

Let’s define a fit_one_cycle function to incorporate these changes. We'll also record the learning rate used for each batch.

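A sketch of fit_one_cycle (plus an evaluate helper it relies on; the helper's name is this sketch's assumption), combining the one-cycle scheduler, weight decay, and gradient clipping described above:

@torch.no_grad()
def evaluate(model, val_loader):
    model.eval()
    outputs = [model.validation_step(batch) for batch in val_loader]
    return model.validation_epoch_end(outputs)

def fit_one_cycle(epochs, max_lr, model, train_loader, val_loader,
                  weight_decay=0, grad_clip=None, opt_func=torch.optim.SGD):
    torch.cuda.empty_cache()
    history = []
    optimizer = opt_func(model.parameters(), max_lr, weight_decay=weight_decay)
    # One-cycle schedule: ramp the LR up, then anneal it down, once per batch
    sched = torch.optim.lr_scheduler.OneCycleLR(
        optimizer, max_lr, epochs=epochs, steps_per_epoch=len(train_loader))

    for epoch in range(epochs):
        model.train()
        train_losses, lrs = [], []
        for batch in train_loader:
            loss = model.training_step(batch)
            train_losses.append(loss)
            loss.backward()

            if grad_clip:  # keep gradients within [-grad_clip, grad_clip]
                nn.utils.clip_grad_value_(model.parameters(), grad_clip)

            optimizer.step()
            optimizer.zero_grad()

            lrs.append(optimizer.param_groups[0]['lr'])  # record LR per batch
            sched.step()

        result = evaluate(model, val_loader)
        result['train_loss'] = torch.stack(train_losses).mean().item()
        result['lrs'] = lrs
        model.epoch_end(epoch, result)
        history.append(result)
    return history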

We’re now ready to train our model. Instead of SGD (stochastic gradient descent), we’ll use the Adam optimizer, which uses techniques like momentum and adaptive learning rates for faster training. You can learn more about optimizers here: https://ruder.io/optimizing-gradient-descent/index.html

The initial accuracy is around 17%, which is what one might expect from a randomly initialized model.

We’ll experiment with different hyper-parameters (learning rate, number of epochs, batch size, etc.) to train our model.

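A sketch of the training run; the hyper-parameter values below are illustrative, since the author's exact settings were only visible in the original screenshots:

history = [evaluate(model, val_dl)]   # accuracy before training (~1/6 for 6 classes)
print(history)

# Illustrative hyper-parameters, not the author's exact values.
epochs = 10
max_lr = 0.001
grad_clip = 0.1
weight_decay = 1e-4
opt_func = torch.optim.Adam

history += fit_one_cycle(epochs, max_lr, model, train_dl, val_dl,
                         grad_clip=grad_clip, weight_decay=weight_decay,
                         opt_func=opt_func)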

Let’s plot the validation set accuracies to study how the model improves over time.

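For example, reading the per-epoch validation accuracy out of the recorded history:

def plot_accuracies(history):
    accuracies = [x['val_acc'] for x in history]
    plt.plot(accuracies, '-x')
    plt.xlabel('epoch')
    plt.ylabel('accuracy')
    plt.title('Accuracy vs. No. of epochs')

plot_accuracies(history)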

We can also plot the training and validation losses to study the trend.

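For example (the first history entry has no training loss, so .get is used to skip it):

def plot_losses(history):
    train_losses = [x.get('train_loss') for x in history]
    val_losses = [x['val_loss'] for x in history]
    plt.plot(train_losses, '-bx')
    plt.plot(val_losses, '-rx')
    plt.xlabel('epoch')
    plt.ylabel('loss')
    plt.legend(['Training', 'Validation'])
    plt.title('Loss vs. No. of epochs')

plot_losses(history)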

It’s clear from the trend that our model isn’t overfitting to the training data just yet. Finally, let’s visualize how the learning rate changed over time, batch-by-batch over all the epochs.

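For example, concatenating the per-batch learning rates recorded by fit_one_cycle:

def plot_lrs(history):
    lrs = np.concatenate([x.get('lrs', []) for x in history])
    plt.plot(lrs)
    plt.xlabel('batch no.')
    plt.ylabel('learning rate')
    plt.title('Learning Rate vs. Batch no.')

plot_lrs(history)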

Testing with individual images

While we have been tracking the overall accuracy of the model so far, it’s also a good idea to look at the model’s results on some sample images. Let’s test out our model with some images from the predefined test dataset.

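A sketch of a predict_image helper (the name is this sketch's), which adds a batch dimension, runs the model, and picks the class with the highest score:

def predict_image(img, model):
    xb = to_device(img.unsqueeze(0), device)  # add a batch dimension
    yb = model(xb)
    _, preds = torch.max(yb, dim=1)           # index of the highest score
    return dataset.classes[preds[0].item()]

img, label = test_dataset[0]
plt.imshow(img.permute(1, 2, 0))
print('Label:', dataset.classes[label], ', Predicted:', predict_image(img, model))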

Identifying where our model performs poorly can help us improve the model, by collecting more training data, increasing or decreasing the complexity of the model, or changing the hyperparameters.

We expect these values to be similar to those for the validation set. If not, we might need a better validation set with data and a distribution similar to the test set (which often comes from real-world data).

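A sketch of evaluating on the predefined test set, reusing the helpers from above:

test_loader = DeviceDataLoader(DataLoader(test_dataset, batch_size * 2), device)
result = evaluate(model, test_loader)
print(result)   # compare val_loss / val_acc against the validation numbers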

We are getting pretty good validation accuracy, and our model is predicting the images from the prediction folder correctly. Try different transfer learning approaches and hyperparameters to get an even better model.

Thanks for reading and see you on the next one!

Leave a comment if you need the link to the notebook for this blog.
