Using ResNet for image classification.

This project was made as part of Deep Learning with PyTorch: Zero to GANs course.

Kenny Jaimes
5 min read · Jul 1, 2020

There are many ways of tackling an image classification problem with ML. Even for simple problems, the current ML landscape presents many options, from simple linear neural networks to bigger, more complex architectures. For this project, I used the Intel Image Classification dataset hosted on Kaggle, originally created by Intel for an image classification challenge. The dataset contains about 25k images: 14k for training, 3k for testing (validation), and 7k used in the original challenge for participants to make their predictions.

Using this dataset, I'm going to present results of residual neural networks (ResNets) used for image classification and test the accuracy they achieve on these images, first building one piece by piece and then importing and adapting a pre-trained ResNet, all using PyTorch as the framework.

The problem

The idea of this project is to test the performance of ResNets on an image classification task and see why they are so famous. In this dataset, images are divided into 6 classes, each representing a “scene”. Here are some examples:

Building and forest.
Glacier and mountain.
Sea and street.

Images are 150x150 pixels (although there seem to be some exceptions), which means that for our network each image is a [3x150x150] input. Inspecting the datasets, we have:

For the training data, we apply some transformations. This lets us augment the data, which means our model will be able to generalize better and be less prone to overfitting. The transformations are:

Brightness alteration: image brightness goes up or down by a random value.
Horizontal flip: there’s a 50% chance the image will be flipped.
Rotation: the image can rotate to either side by a 10° angle.

Some examples of these transformations:

Creating the models.

As mentioned, in this project we're going to use residual networks, taking two approaches: first building a small ResNet out of small blocks, and then comparing it to a bigger, pretrained one.

ResNet9

For our implementation, the ResNet is a series of convolutional blocks, each encapsulating a convolutional layer, normalization of the data, a nonlinear activation function (ReLU), and in some steps a max pooling layer.

In this specific implementation, the architecture is as follows:

  1. Convolutional Block (CL-Norm-ReLu), output shape: 32x150x150
  2. Convolutional Block (CL-Norm-ReLu-MaxPool4), output shape: 64x37x37
  3. Residual Block (ConvB-ConvB), output shape: 64x37x37
  4. Convolutional Block (CL-Norm-ReLu-MaxPool4), output shape: 128x9x9
  5. Convolutional Block (CL-Norm-ReLu-MaxPool4), output shape: 256x2x2
  6. Residual Block (ConvB-ConvB), output shape: 256x2x2
  7. Final block (MaxPool2-Linear layer), output shape: 6

Remember that the input is a 3x150x150 element, and after passing through the network we get 6 elements that represent the model's predicted score (a probability, after softmax) for each class.

Now, because a residual block is actually a pair of convolutional blocks, this network has 9 “blocks” in total, which is why it's called ResNet9.
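A minimal PyTorch sketch of this architecture follows. The channel counts and output shapes match the block list above; the kernel size (3, padding 1) is an assumption consistent with the stated shapes, since a 3x3 convolution with padding 1 preserves spatial size and each MaxPool4 divides it by 4 (150 → 37 → 9 → 2).

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, pool=False):
    # Conv -> BatchNorm -> ReLU, optionally followed by 4x4 max pooling
    layers = [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
              nn.BatchNorm2d(out_ch),
              nn.ReLU(inplace=True)]
    if pool:
        layers.append(nn.MaxPool2d(4))
    return nn.Sequential(*layers)

class ResNet9(nn.Module):
    def __init__(self, in_channels=3, num_classes=6):
        super().__init__()
        self.conv1 = conv_block(in_channels, 32)         # 32 x 150 x 150
        self.conv2 = conv_block(32, 64, pool=True)       # 64 x 37 x 37
        self.res1 = nn.Sequential(conv_block(64, 64),
                                  conv_block(64, 64))    # 64 x 37 x 37
        self.conv3 = conv_block(64, 128, pool=True)      # 128 x 9 x 9
        self.conv4 = conv_block(128, 256, pool=True)     # 256 x 2 x 2
        self.res2 = nn.Sequential(conv_block(256, 256),
                                  conv_block(256, 256))  # 256 x 2 x 2
        self.classifier = nn.Sequential(nn.MaxPool2d(2), # 256 x 1 x 1
                                        nn.Flatten(),
                                        nn.Linear(256, num_classes))

    def forward(self, x):
        out = self.conv1(x)
        out = self.conv2(out)
        out = self.res1(out) + out   # residual (skip) connection
        out = self.conv3(out)
        out = self.conv4(out)
        out = self.res2(out) + out   # residual (skip) connection
        return self.classifier(out)
```

Feeding a `[1, 3, 150, 150]` batch through this model produces a `[1, 6]` output, one score per scene class.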

ResNet34

This architecture is basically a bigger version of the implemented ResNet, allowing a deeper and potentially more powerful model. At the same time, this architecture is already available pretrained in the PyTorch libraries, which means it already knows how to work with images (even more complex ones than those tested in this project), so we're going to take advantage of that training and adapt it to this specific problem.

To do that, we replace its final layer with a simple linear layer that outputs the number of classes required by our project. Retraining just this layer quickly adapts it. To train the last layer without touching the rest of the network, freeze and unfreeze methods are created that allow exactly that.

Training the models

To make the comparison easier, hyperparameters are kept as close as possible for both models. For this, let's define the training loop.

Parameters used for ResNet9:

  • Epochs: 10.
  • Maximum learning rate: 0.001.
  • Weight decay: 0.0001.
  • Gradient clipping: 0.1.
  • Optimization Function: Adam.

Parameters used for ResNet34:

  • Epochs: 5 for the last layer alone and 5 for the whole model.
  • Maximum learning rate: 0.00005.
  • Weight decay: 0.0001.
  • Gradient clipping: 0.1.
  • Optimization Function: Adam.
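The training loop wiring these hyperparameters together can be sketched as follows. This is an illustrative reconstruction, assuming PyTorch's built-in one-cycle learning-rate scheduler and value-based gradient clipping; the `fit` function name and signature are assumptions, not the notebook's exact code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fit(epochs, max_lr, model, train_loader,
        weight_decay=1e-4, grad_clip=0.1):
    # Adam with weight decay, plus a one-cycle learning-rate schedule
    optimizer = torch.optim.Adam(model.parameters(), max_lr,
                                 weight_decay=weight_decay)
    sched = torch.optim.lr_scheduler.OneCycleLR(
        optimizer, max_lr, epochs=epochs,
        steps_per_epoch=len(train_loader))
    history = []
    for epoch in range(epochs):
        model.train()
        epoch_losses = []
        for images, labels in train_loader:
            loss = F.cross_entropy(model(images), labels)
            loss.backward()
            # clip gradient values to keep updates stable
            nn.utils.clip_grad_value_(model.parameters(), grad_clip)
            optimizer.step()
            sched.step()
            optimizer.zero_grad()
            epoch_losses.append(loss.item())
        history.append(sum(epoch_losses) / len(epoch_losses))
    return history
```

For ResNet34, the same loop would be called twice: once with the body frozen (last layer only) and once with everything unfrozen, each for 5 epochs.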

Results

ResNet9

(Figures: training log, accuracy curve, loss curve, and an example prediction.)

ResNet34

(Figures: final-layer-only training log, full-model training log, accuracy curve, loss curve, and an example prediction.)

Conclusions

Both models achieved relatively high performance. ResNet34 clearly shows that it already knows how to classify images, scoring 87% on the test dataset after the first epoch; with more time, careful hyperparameter testing and selection would be needed to improve on that already impressive performance. But our smaller model also shows that this type of architecture is plenty powerful for this “simple” classification problem, rapidly achieving high scores.

For the complete project:
