Published in Analytics Vidhya

ResNet for Image Classification.

Prediction Digits (Source)

ResNets, or Residual Networks, were introduced by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun of the Microsoft Research team (Link to the paper). They address the degradation problem that appears when neural networks become very deep by introducing skip connections, also called shortcut connections.

The Degradation Problem: When a model gets deeper, after a certain point its accuracy starts decreasing, on the training set as well as on the test set. Very deep stacks of layers become hard to optimise: it is difficult for the layers to propagate information from the shallow layers, and that information is lost.

From the ResNet paper:

When deeper networks are able to start converging, a degradation problem has been exposed: with the network depth increasing, accuracy gets saturated (which might be unsurprising) and then degrades rapidly. Unexpectedly, such degradation is not caused by overfitting, and adding more layers to a suitably deep model leads to higher training error.

The Solution: Skip Connections, or Shortcuts. ResNet uses identity mappings to connect shallow layers directly to deeper layers, which lets data flow easily between them. If a block is hampering the model's performance, training can push its weights toward zero so that the input passes through the shortcut largely unchanged and the block is effectively skipped. Hence the name skip connections.

Residual Networks: y = f(x) + x.

Here +x is the skip connection and f(x) denotes the residual mapping to be learned. f(x) + x is obtained by element-wise addition of the block's output and the skip connection, so the two must have the same dimensions.
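To make the formula concrete, here is a minimal sketch in plain Python (lists instead of tensors, so it runs without any framework). The names residual_block, zero_residual and small_residual are made up for illustration. It shows that when f(x) is all zeros the block simply returns its input, which is what "skipping" a layer amounts to.

```python
def residual_block(x, f):
    """Return f(x) + x, adding the two vectors element by element."""
    fx = f(x)
    if len(fx) != len(x):
        raise ValueError("f(x) and x must have the same length to be added")
    return [a + b for a, b in zip(fx, x)]


def zero_residual(x):
    # A residual mapping that has learned nothing: f(x) = 0.
    return [0.0 for _ in x]


def small_residual(x):
    # A residual mapping that applies a small correction: f(x) = 0.5 * x.
    return [0.5 * v for v in x]


x = [1.0, -2.0, 3.0]
identity_out = residual_block(x, zero_residual)   # the input passes through unchanged
adjusted_out = residual_block(x, small_residual)  # the input plus a learned correction
print(identity_out, adjusted_out)
```

In a real network f would be a stack of convolution layers and x a feature map, but the addition works the same way.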

Now let us look at the ResNet architecture.

The first layer is a 7x7 convolution. It is followed by repeated blocks, each made of two 3x3 convolution layers, with a skip connection around every block. Whenever the feature map size is halved, the number of filters is doubled to preserve the time complexity per layer.
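The complexity claim can be checked with a little arithmetic. The sketch below counts the multiply-adds of a single 3x3 convolution layer; the helper name conv_multiply_adds is hypothetical, and the sizes (56x56 with 64 filters, then 28x28 with 128 filters) are taken from the ImageNet configuration in the paper.

```python
def conv_multiply_adds(height, width, in_channels, out_channels, kernel=3):
    """Multiply-adds of one convolution layer producing a height x width output."""
    return height * width * in_channels * out_channels * kernel * kernel


# Before downsampling: 56x56 feature map, 64 filters in and out.
before = conv_multiply_adds(56, 56, 64, 64)

# After downsampling: height and width halved, filters doubled.
after = conv_multiply_adds(28, 28, 128, 128)

print(before, after)
```

Halving the height and width divides the number of output positions by 4, while doubling both the input and output channels multiplies the work per position by 4, so the two counts come out equal.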

Now let us look at the part that makes ResNets different from plain networks: skip connections. If the input and output dimensions are the same, an identity shortcut can be used directly; it has no parameters of its own. If the input and output dimensions are different, we have 2 options:

  1. We can keep the identity shortcut and pad the extra channels with zeros, which adds no parameters.
  2. We can use a projection shortcut, a 1x1 convolution, to match the dimensions.

When the shortcut crosses feature maps of two different sizes, both options are applied with a stride of 2.
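The two options can be sketched in PyTorch as follows. This is an illustration rather than the code from the paper or from this article: the names zero_pad_shortcut and ProjectionShortcut are made up, the 64 to 128 channel change is just an example, and putting BatchNorm after the 1x1 convolution is an assumption borrowed from common implementations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def zero_pad_shortcut(x, out_channels, stride=2):
    """Option 1: subsample spatially, then append all-zero channels (no parameters)."""
    x = x[:, :, ::stride, ::stride]
    extra = out_channels - x.shape[1]
    # F.pad works from the last dimension backwards: (W, W, H, H, C_front, C_back).
    return F.pad(x, (0, 0, 0, 0, 0, extra))


class ProjectionShortcut(nn.Module):
    """Option 2: a learned 1x1 convolution changes both channels and resolution."""

    def __init__(self, in_channels, out_channels, stride=2):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1,
                              stride=stride, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)

    def forward(self, x):
        return self.bn(self.conv(x))


x = torch.randn(4, 64, 56, 56)
padded = zero_pad_shortcut(x, out_channels=128)
projected = ProjectionShortcut(64, 128)(x)
print(padded.shape, projected.shape)
```

Both shortcuts turn a 64x56x56 input into a 128x28x28 output, so either can be added to the output of a block that halves the feature map and doubles the filters. The first adds no parameters; the second adds a small learned layer.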

Now let us look at an example. You can find the complete code here for the MNIST dataset.


Let us create a class for our model. We will define each block separately so that it is easier to understand and debug.

The first block, our self.master block, is a 7x7 convolution layer. I have added a padding of 3 and a stride of 2: the padding offsets the border that the 7x7 kernel would otherwise lose, and the stride halves the height and width of the image. We then add a non-linearity, ReLU, and halve the dimensions again using maxpool, following the architecture in the paper.
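The effect of these settings on a 28x28 MNIST image can be worked out with the standard output-size formula for convolution and pooling layers. The helper name conv_output_size is made up, and the maxpool settings (3x3 kernel, stride 2, padding 1) are an assumption, since the text above only says that maxpool halves the image.

```python
def conv_output_size(size, kernel, stride, padding):
    """Output height/width of a conv or pooling layer: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


# 7x7 convolution with stride 2 and padding 3 on a 28x28 image.
after_conv = conv_output_size(28, kernel=7, stride=2, padding=3)

# Assumed maxpool: 3x3 kernel, stride 2, padding 1.
after_pool = conv_output_size(after_conv, kernel=3, stride=2, padding=1)

# The same convolution without padding loses extra border pixels.
without_padding = conv_output_size(28, kernel=7, stride=2, padding=0)

print(after_conv, after_pool, without_padding)
```

With padding 3 the convolution gives exactly half the input size (14), and the pooling halves it again (7). Without the padding the convolution output would shrink to 11.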

Next we have 3 blocks, each made of 2 convolution layers with a 3x3 kernel, with BatchNorm to stabilise the learning process and ReLU as the non-linearity. The self.downsample layers are our skip connections; these are 1x1 convolution layers that match the shortcut's dimensions to the block's output. We end with a linear layer. In forward, after the master layer, each block's output is added to that block's input, which has been passed through the skip connection. You can also use concatenation instead of addition to combine the layers, although the number of channels then grows with every block.
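Put together, a model along the lines described above might look like the sketch below. It is not the code from the linked repository. The class name SmallResNet, the helper conv_block, the channel widths (16, 32, 64), the maxpool settings and the average pooling before the linear layer are all assumptions chosen so that the example runs on 1x28x28 MNIST images; only self.master, self.downsample and the overall layout come from the description.

```python
import torch
import torch.nn as nn


def conv_block(in_channels, out_channels, stride):
    """Two 3x3 convolutions, each followed by BatchNorm, with a ReLU in between."""
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, 3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_channels, out_channels, 3, stride=1, padding=1, bias=False),
        nn.BatchNorm2d(out_channels),
    )


class SmallResNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # Stem: 7x7 conv (stride 2, padding 3), ReLU, maxpool. 1x28x28 -> 16x7x7.
        self.master = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=7, stride=2, padding=3),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
        )
        # (in_channels, out_channels, stride) for each of the 3 blocks.
        widths = [(16, 16, 1), (16, 32, 2), (32, 64, 2)]
        self.blocks = nn.ModuleList([conv_block(i, o, s) for i, o, s in widths])
        # 1x1 convolutions on the shortcut path, one per block.
        self.downsample = nn.ModuleList(
            [nn.Conv2d(i, o, kernel_size=1, stride=s, bias=False) for i, o, s in widths]
        )
        self.relu = nn.ReLU(inplace=True)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(64, num_classes)

    def forward(self, x):
        x = self.master(x)
        for block, shortcut in zip(self.blocks, self.downsample):
            # y = f(x) + x, with x passed through the 1x1 shortcut first.
            x = self.relu(block(x) + shortcut(x))
        x = torch.flatten(self.pool(x), 1)
        return self.fc(x)


model = SmallResNet()
logits = model(torch.randn(8, 1, 28, 28))
print(logits.shape)
```

A batch of 8 images produces 8 rows of 10 class scores. The blocks and their shortcuts share the same stride and channel settings, which is what guarantees that the two tensors being added have matching shapes.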

Now that we have our model, you can go ahead and load the images and train. I have also added tensorboard logging. You can run tensorboard with the command:

 tensorboard --logdir log_directory --reload_interval 1

And run the train file with:

python --log_dir=log_directory

Running the train file should give you an accuracy of 0.99 within 10 epochs :)





Ankita Sinha
