Natural Image Classification using ResNet9 Model

Published in The Startup, Jun 29, 2020

In this post, we will classify images from the intel-image-classification data set (a Kaggle data set) using a ResNet9 model built with PyTorch.

This data set has around 25k images of natural scenes from around the world. The images belong to six different classes, including building, glacier and mountain. Each image is a color image of 150 x 150 pixels. The entire data set is divided as follows:

Training data set: 14034 images. This is used to train the model, i.e. to compute the loss and adjust the weights of the model using gradient descent.
Validation set: 3000 images. This is used to evaluate the model while training, adjust hyper-parameters (learning rate etc.) and pick the best version of the model.
Test data set: 7301 images. This is used to compare different models, or different types of modeling approaches, and report the final accuracy of the model.

All images come labelled with their proper class, except those in the test data set. Let us look at the code.

A Look at intel-image-classification with PyTorch

First, import the required libraries

Let’s look at what the data set contains:

Now, we create the training, validation and test data sets before exploring the data.

Preparing Datasets and Data loaders

Data Preprocessing

Before we create our datasets, we have to set up data augmentation, a technique that artificially expands the size of a training data set by creating modified versions of its images. This improves the performance of the model and its ability to generalize.

We can do this by resizing, shifting, flipping, cropping, or zooming in or out of an image, among many other operations.
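As a rough illustration of two of these operations, here is a minimal sketch of a random horizontal flip and a random crop with padding, written with plain PyTorch tensor operations. The function names, the padding of 4 pixels and the flip probability are assumptions made for this example, not the settings used in the original notebook; torchvision.transforms offers ready-made equivalents such as RandomHorizontalFlip and RandomCrop.

```python
import torch
import torch.nn.functional as F


def random_horizontal_flip(img, p=0.5):
    """Flip a (C, H, W) image left-to-right with probability p."""
    if torch.rand(1).item() < p:
        return torch.flip(img, dims=[2])  # dim 2 is the width axis
    return img


def random_crop_with_padding(img, padding=4):
    """Reflect-pad a (C, H, W) image, then crop a random window of the original size."""
    _, h, w = img.shape
    # F.pad with mode="reflect" expects a batch dimension, so add and remove one.
    padded = F.pad(img.unsqueeze(0), (padding, padding, padding, padding), mode="reflect")
    padded = padded.squeeze(0)
    top = torch.randint(0, 2 * padding + 1, (1,)).item()
    left = torch.randint(0, 2 * padding + 1, (1,)).item()
    return padded[:, top:top + h, left:left + w]


def augment(img):
    """Apply both (hypothetical) augmentations to one training image."""
    return random_horizontal_flip(random_crop_with_padding(img))


# A stand-in for one 150 x 150 RGB image from the training set.
sample = torch.rand(3, 150, 150)
augmented = augment(sample)
```

Augmentation of this kind is normally applied only to the training images; the validation and test images are left unchanged so that the evaluation stays comparable.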

Now, let’s explore the three datasets:

Each element of the training data set is a tuple containing an image tensor and a label. Since the data consists of 150 x 150 px color images with 3 channels (RGB), each image tensor has the shape (3, 150, 150):

The list of classes is stored in the .classes property of the data set. The numeric label for each element corresponds to the index of the element’s label in the list of classes.

This data set consists of 3-channel color images (RGB). We can view an image using matplotlib, but we need to change the tensor dimensions to (150, 150, 3), because matplotlib expects channels to be the last dimension of an image tensor (whereas in PyTorch they are the first dimension). So we’ll use the .permute tensor method to shift the channels to the last dimension. Let’s create a helper function to display an image and its label.
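The dimension shuffle and the label lookup can be sketched as follows. The helper name describe_example and the shortened class list are hypothetical and only serve the illustration; the real list comes from the data set's .classes property, and the returned image is what would be handed to matplotlib's imshow.

```python
import torch


def describe_example(img, label, classes):
    """Return the class name and a channels-last copy of a (C, H, W) image tensor."""
    # permute(1, 2, 0) turns (channels, height, width) into (height, width, channels).
    return classes[label], img.permute(1, 2, 0)


# Illustrative values only: a random "image" and a shortened class list.
classes = ["building", "glacier", "mountain"]
img = torch.rand(3, 150, 150)
name, img_hwc = describe_example(img, 1, classes)
print(name, tuple(img_hwc.shape))
```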

Now, let’s look at some images from the data set:

Now, we’ll create data loaders, which split the data into batches of a predefined size while training. They also provide other utilities like shuffling and random sampling of the data.
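A small sketch of how batching behaves, using random tensors in place of the real images. The batch sizes and the toy data set of 10 images are assumptions chosen to keep the example fast; they are not the values used for training.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data: 10 random "images" with labels from 6 classes.
images = torch.randn(10, 3, 150, 150)
labels = torch.randint(0, 6, (10,))
toy_ds = TensorDataset(images, labels)

# Shuffle the training data each epoch; validation data needs no shuffling.
train_dl = DataLoader(toy_ds, batch_size=4, shuffle=True)
valid_dl = DataLoader(toy_ds, batch_size=8)

batch_shapes = [tuple(xb.shape) for xb, _ in train_dl]
print(batch_shapes)  # the last batch is smaller when the size does not divide evenly
```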

Let’s look at a batch of images from the data set using the make_grid function from torchvision:

Using a GPU

It is advisable to use a GPU instead of a CPU when working with image data sets. CPUs are designed for general-purpose work, whereas GPUs are well suited to training deep learning models because they can run many computations simultaneously. They have a large number of cores, which allows many parallel processes to be computed efficiently. Additionally, deep learning computations need to handle huge amounts of data, which makes a GPU’s high memory bandwidth especially suitable.

To seamlessly use a GPU, if one is available, we define a couple of helper functions (get_default_device & to_device) and a helper class DeviceDataLoader to move our model & data to the GPU as required.

Now, let’s check the device we are working with:

Next, we wrap our training and validation data loaders with DeviceDataLoader to automatically transfer batches of data to the GPU (if available).
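One possible way to write these helpers is sketched below. The names get_default_device, to_device and DeviceDataLoader come from the text above, but the bodies are an assumption about how they could be implemented, and the toy data loader only exists to show the wrapper in use.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def get_default_device():
    """Pick the GPU if one is available, otherwise fall back to the CPU."""
    return torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")


def to_device(data, device):
    """Move a tensor, or a list/tuple of tensors, to the chosen device."""
    if isinstance(data, (list, tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)


class DeviceDataLoader:
    """Wrap a data loader so that every batch is moved to the device when it is read."""

    def __init__(self, dl, device):
        self.dl = dl
        self.device = device

    def __iter__(self):
        for batch in self.dl:
            yield to_device(batch, self.device)

    def __len__(self):
        return len(self.dl)


device = get_default_device()
toy_dl = DataLoader(TensorDataset(torch.zeros(6, 3, 4, 4),
                                  torch.zeros(6, dtype=torch.long)), batch_size=4)
wrapped_dl = DeviceDataLoader(toy_dl, device)
```

Because batches are moved lazily, one at a time, only the batch currently being processed has to fit in GPU memory.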

Now, Define Our Model

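In a traditional neural network, each layer feeds only into the next layer. In this model we use a network with residual blocks, where each layer feeds into the next layer and also directly into the layers about 2–3 hops away. These skip connections mainly make deep networks easier to optimize, because gradients can flow through the shortcut path; combined with the regularization described later, this helps keep over-fitting (a situation where the validation loss stops decreasing at some point and then keeps increasing while the training loss still decreases) in check.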

Here is a very simple residual block:
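A minimal sketch of such a block is given below. The class name, the two 3 x 3 convolutions and the channel count are assumptions for this illustration; the essential part is that the input x is added back to the output of the convolutions.

```python
import torch
import torch.nn as nn


class SimpleResidualBlock(nn.Module):
    """Two 3x3 convolutions whose output is added to the block's own input."""

    def __init__(self, channels=3):
        super().__init__()
        # padding=1 keeps height and width unchanged, so the addition is valid.
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu1 = nn.ReLU()
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu2 = nn.ReLU()

    def forward(self, x):
        out = self.relu1(self.conv1(x))
        out = self.conv2(out)
        # Skip connection: add the input before the final activation.
        return self.relu2(out + x)


block = SimpleResidualBlock(channels=3)
x = torch.randn(2, 3, 150, 150)
y = block(x)
```

The output has the same shape as the input, which is what allows the two to be added.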

Then we define our ImageClassificationBase class, whose functions are explained below:

How do we measure how “wrong” the model is after a training or validation step? An accuracy metric alone will not do, because it is not differentiable (this means the gradient, which the model needs in order to improve during training, can’t be determined). A quick look at the PyTorch docs yields a suitable cost function: cross_entropy.
Just because an accuracy metric can’t be used to train the model doesn’t mean it shouldn’t be implemented! Accuracy here is the fraction of images for which the class with the highest predicted score matches the actual label.
We want to track the validation losses/accuracies and training losses after each epoch, and every time we do so we have to make sure gradients are not being tracked.
We also want to print the validation losses/accuracies, training losses and learning rate after each epoch, because we are using a learning rate scheduler (which changes the learning rate after every batch of training).

We also define an accuracy function which calculates the overall accuracy of the model on an entire batch of outputs, so that we can use it as a metric in fit_one_cycle.
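The points above could be put together roughly as follows. The class name ImageClassificationBase and the accuracy function are named in the text, but the method names (training_step, validation_step, validation_epoch_end, epoch_end) and the dictionary keys are assumptions for this sketch, as is the tiny TinyModel used to exercise it.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def accuracy(outputs, labels):
    """Fraction of rows whose highest-scoring class equals the label."""
    preds = outputs.argmax(dim=1)
    return (preds == labels).float().mean()


class ImageClassificationBase(nn.Module):
    def training_step(self, batch):
        images, labels = batch
        out = self(images)                   # forward pass
        return F.cross_entropy(out, labels)  # differentiable loss

    @torch.no_grad()                         # no gradient tracking while validating
    def validation_step(self, batch):
        images, labels = batch
        out = self(images)
        return {"val_loss": F.cross_entropy(out, labels),
                "val_acc": accuracy(out, labels)}

    def validation_epoch_end(self, outputs):
        # Average the per-batch results into one number per metric.
        loss = torch.stack([o["val_loss"] for o in outputs]).mean()
        acc = torch.stack([o["val_acc"] for o in outputs]).mean()
        return {"val_loss": loss.item(), "val_acc": acc.item()}

    def epoch_end(self, epoch, result):
        print("Epoch [{}], val_loss: {:.4f}, val_acc: {:.4f}".format(
            epoch, result["val_loss"], result["val_acc"]))


class TinyModel(ImageClassificationBase):
    """Hypothetical stand-in model, used only to try out the base class."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(3 * 8 * 8, 6)

    def forward(self, xb):
        return self.linear(xb.flatten(start_dim=1))


model = TinyModel()
batch = (torch.randn(4, 3, 8, 8), torch.tensor([0, 1, 2, 3]))
train_loss = model.training_step(batch)
val_result = model.validation_epoch_end([model.validation_step(batch)])
```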

We will use the ResNet9 architecture :

Also, after each convolutional layer, we’ll add a batch normalization layer, which normalizes the outputs of the previous layer.
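A sketch of a ResNet9-style network with batch normalization after every convolution is shown below. The channel widths (64, 128, 256, 512), the pooling positions and the adaptive max pooling in the classifier are assumptions based on the commonly used ResNet9 layout, not a copy of the original notebook; in the article's setup the class would extend ImageClassificationBase rather than nn.Module.

```python
import torch
import torch.nn as nn


def conv_block(in_channels, out_channels, pool=False):
    """Convolution -> batch normalization -> ReLU, optionally followed by 2x2 max pooling."""
    layers = [nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
              nn.BatchNorm2d(out_channels),
              nn.ReLU()]
    if pool:
        layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)


class ResNet9(nn.Module):
    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.conv1 = conv_block(in_channels, 64)
        self.conv2 = conv_block(64, 128, pool=True)
        self.res1 = nn.Sequential(conv_block(128, 128), conv_block(128, 128))
        self.conv3 = conv_block(128, 256, pool=True)
        self.conv4 = conv_block(256, 512, pool=True)
        self.res2 = nn.Sequential(conv_block(512, 512), conv_block(512, 512))
        # Adaptive pooling (an assumption) makes the classifier independent of image size.
        self.classifier = nn.Sequential(nn.AdaptiveMaxPool2d(1),
                                        nn.Flatten(),
                                        nn.Linear(512, num_classes))

    def forward(self, xb):
        out = self.conv1(xb)
        out = self.conv2(out)
        out = self.res1(out) + out   # first residual connection
        out = self.conv3(out)
        out = self.conv4(out)
        out = self.res2(out) + out   # second residual connection
        return self.classifier(out)


resnet = ResNet9(in_channels=3, num_classes=6)
resnet.eval()
with torch.no_grad():
    logits = resnet(torch.randn(1, 3, 150, 150))
```

The "9" counts the layers with weights: four standalone convolutions, two convolutions in each of the two residual blocks, and the final linear layer.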

Now, we define a model object and transfer it to the device we are working with:

Now, Train Our Model

Before we train the model, let’s define a utility function, evaluate, which performs the validation phase, and a fit_one_cycle function, which performs the entire training process. In fit_one_cycle, we use the following techniques:

Learning rate scheduling: Instead of using a fixed learning rate, we will use a learning rate scheduler, which will change the learning rate after every batch of training. There are many strategies for varying the learning rate during training, and the one we’ll use is called the “One Cycle Learning Rate Policy”, which involves starting with a low learning rate, gradually increasing it batch-by-batch to a high learning rate for about 30% of epochs, then gradually decreasing it to a very low value for the remaining epochs.
Weight decay: We also use weight decay, a regularization technique that prevents the weights from becoming too large by adding an additional term to the loss function.
Gradient clipping: Apart from the layer weights and outputs, it is also helpful to limit the values of gradients to a small range to prevent undesirable changes in parameters due to large gradient values. This simple yet effective technique is called gradient clipping.

We'll also record the learning rate used for each batch.
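The three techniques can be combined in a training loop along the following lines. This is a simplified sketch: it computes the loss directly with cross_entropy instead of going through ImageClassificationBase, and the default optimizer (Adam), the argument names and the layout of the history entries are assumptions. The calls OneCycleLR and clip_grad_value_ are standard PyTorch APIs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


@torch.no_grad()
def evaluate(model, val_loader):
    """Validation phase: average loss and accuracy without tracking gradients."""
    model.eval()
    losses, accs = [], []
    for images, labels in val_loader:
        out = model(images)
        losses.append(F.cross_entropy(out, labels))
        accs.append((out.argmax(dim=1) == labels).float().mean())
    return {"val_loss": torch.stack(losses).mean().item(),
            "val_acc": torch.stack(accs).mean().item()}


def fit_one_cycle(epochs, max_lr, model, train_loader, val_loader,
                  weight_decay=0.0, grad_clip=None, opt_func=torch.optim.Adam):
    history = []
    # Weight decay is handled by the optimizer.
    optimizer = opt_func(model.parameters(), max_lr, weight_decay=weight_decay)
    # One-cycle schedule: the learning rate rises for ~30% of the steps, then falls.
    sched = torch.optim.lr_scheduler.OneCycleLR(
        optimizer, max_lr, epochs=epochs, steps_per_epoch=len(train_loader))

    for epoch in range(epochs):
        model.train()
        train_losses, lrs = [], []
        for images, labels in train_loader:
            loss = F.cross_entropy(model(images), labels)
            train_losses.append(loss.detach())
            loss.backward()
            if grad_clip is not None:
                # Gradient clipping: limit every gradient value to [-grad_clip, grad_clip].
                nn.utils.clip_grad_value_(model.parameters(), grad_clip)
            optimizer.step()
            optimizer.zero_grad()
            lrs.append(optimizer.param_groups[0]["lr"])  # learning rate used for this batch
            sched.step()                                 # update the rate after every batch

        result = evaluate(model, val_loader)
        result["train_loss"] = torch.stack(train_losses).mean().item()
        result["lrs"] = lrs
        history.append(result)
    return history
```

The function returns one dictionary per epoch, so the losses, accuracies and per-batch learning rates can be plotted afterwards.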

Let’s check our validation loss and accuracy.

Since the weights are randomly initialized, the accuracy comes out near 0.16 (roughly the 1-in-6 chance of getting the right answer when the model effectively picks a class at random).

Now, let’s declare some hyper-parameters for training the model. We can change them if the result is not satisfactory.

Let’s start training our model…

Now, let’s look at the results of training using graphs:

Let’s plot the validation set accuracies to study how the model improves over time.

Let’s plot the training and validation losses to study the trend.

It’s clear from the trend that our model isn’t over-fitting to the training data just yet. Finally, let’s visualize how the learning rate changed over time, batch by batch, over all the epochs.

Predictions by our Model

Let’s predict the classes of some images. In this data set, test_ds doesn’t have labels, but the images are clear enough that we can judge by eye whether the model predicts them well or not.

We define a helper function that passes an image to the model and returns the model’s prediction.
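Such a helper might look like the sketch below. The name predict_image, its arguments and the shortened class list are assumptions for illustration, and FixedLogits is a hypothetical stand-in for the trained model so that the example runs on its own.

```python
import torch
import torch.nn as nn


@torch.no_grad()
def predict_image(img, model, classes, device=torch.device("cpu")):
    """Return the predicted class name for a single (C, H, W) image tensor."""
    model.eval()
    xb = img.unsqueeze(0).to(device)  # add a batch dimension: (1, C, H, W)
    yb = model(xb)                    # one row of class scores
    pred = yb.argmax(dim=1).item()    # index of the highest score
    return classes[pred]


class FixedLogits(nn.Module):
    """Hypothetical stand-in for a trained model: always returns the same scores."""

    def forward(self, xb):
        return torch.tensor([[0.0, 5.0, 1.0]]).repeat(xb.shape[0], 1)


classes = ["building", "glacier", "mountain"]
label = predict_image(torch.rand(3, 150, 150), FixedLogits(), classes)
print(label)
```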

Let’s see some of the predictions…

The predictions by our model are pretty good.

But here are some images for which even we would be confused about which class they belong to:

Two classes, ‘building’ and ‘street’, are both present in these images. The model can guess either of them, and so can we. I cannot say which prediction is right or wrong, and the labels for test_ds are not given either. This could be resolved by treating the task as a multi-label image classification problem, where each image can belong to several classes, or by using a data set in which each image belongs to exactly one of the given classes and the test data set also has labels.

Conclusion

We were able to build a ResNet9 model, a convolutional neural network, that can recognize images with an accuracy of 91% using PyTorch. We achieved this accuracy by pre-processing the images to help the model generalize, splitting the data set into batches, and finally building and training the model.