ResNet for Image Classification.
ResNets, or Residual Networks, were introduced by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun of the Microsoft Research team (Link to the paper). They solve the degradation problem that appears when neural networks become very deep by introducing skip connections, also called shortcut connections.
The Degradation Problem: as a model gets deeper, beyond a certain point its accuracy starts to decrease. This happens because in a very deep model it becomes difficult for the deeper layers to propagate information from the shallow layers, and that information is lost.
From the ResNet paper:
When deeper networks are able to start converging, a degradation problem has been exposed: with the network depth increasing, accuracy gets saturated (which might be unsurprising) and then degrades rapidly. Unexpectedly, such degradation is not caused by overfitting, and adding more layers to a suitably deep model leads to higher training error.
The Solution: skip connections, or shortcuts. These use identity mappings to directly connect shallow layers with deeper layers, which allows information to flow easily between them. If a layer is hampering the model's performance, the network can learn to bypass it through the shortcut, hence the name skip connections.
Residual Networks: y = f(x) + x.
Here +x is the skip connection, and f(x) denotes the residual mapping to be learned. y = f(x) + x is obtained by element-wise addition of the skip connection to the residual output.
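The formula above can be sketched directly in PyTorch. This is a minimal illustration, not the article's actual code; the class name `ResidualBlock` and the two-convolution choice for f are my own:

```python
import torch
import torch.nn as nn

# A minimal residual block: y = f(x) + x, where f is the learned
# residual mapping (two 3x3 convolutions here) and +x is the skip
# connection, applied by element-wise addition.
class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )
        self.relu = nn.ReLU()

    def forward(self, x):
        # Element-wise addition of the shortcut: requires f(x) and x
        # to have identical shapes.
        return self.relu(self.f(x) + x)

x = torch.randn(1, 16, 8, 8)
y = ResidualBlock(16)(x)
print(y.shape)  # same shape as x, so the addition is well defined
```

Because the addition is element-wise, the identity shortcut only works when f preserves the tensor shape; the dimension-changing cases are handled separately below.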
Now let us look at the ResNet architecture.
The first layer uses a 7x7 kernel, followed by repeated blocks of 3x3 kernels, with a skip connection around every block of two 3x3 convolution layers. Whenever the feature map size is halved, the number of filters is doubled to preserve the time complexity per layer.
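The "halve the feature map, double the filters" rule can be verified with a quick shape check. This is an illustrative snippet of mine, not code from the article's repo:

```python
import torch
import torch.nn as nn

# A stride-2, 3x3 convolution halves each spatial dimension while
# doubling the channel count, so the rough per-layer cost
# (channels x height x width) stays balanced across stages.
down = nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1)
x = torch.randn(1, 64, 56, 56)
y = down(x)
print(x.shape, y.shape)  # feature map halved, filters doubled
```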
Now let us look at the part which makes ResNets different from plain networks: skip connections. In their simplest form these are identity shortcuts with no extra parameters. If the input and output dimensions are the same, the identity shortcut can be used directly. If the input and output dimensions differ, we have two options:
- We can add zero padding to the extra channels to maintain the dimensions (parameter-free)
- We can use a projection shortcut (a 1x1 convolution) to match the dimensions
Both are used with a stride of 2 when the feature map is halved.
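The two options above can be sketched side by side. This is a hedged illustration under my own assumptions (channel counts 64 to 128, 28x28 input), not the article's code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 64, 28, 28)  # input to a stage that doubles channels

# Option A: parameter-free shortcut -- subsample spatially with stride 2
# (keep every other row/column), then zero-pad the 64 new channels.
# F.pad's last pair (0, 64) pads the channel dimension from 64 to 128.
padded = F.pad(x[:, :, ::2, ::2], (0, 0, 0, 0, 0, 64))

# Option B: projection shortcut -- a 1x1 convolution with stride 2
# learns the dimension match.
proj = nn.Conv2d(64, 128, kernel_size=1, stride=2)
projected = proj(x)

print(padded.shape, projected.shape)  # both match the new block's shape
```

Either result can be added element-wise to the output of the dimension-changing block; option B adds parameters but gives the network more flexibility.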
Now let us look at an example. You can find the complete code here for the MNIST dataset.
Let us create a class for our model. We will define each block separately so that it is easier to understand and debug.
The first block, our self.master block, is a 7x7 convolution layer. I have added a padding of 3 so the 7x7 kernel does not shrink the image, and a stride of 2 to downsample it. We then add a non-linearity (ReLU) and halve the feature map again using max pooling, following the architecture in the paper.
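A sketch of that first block, with a shape check on an MNIST-sized input. Layer choices here (64 filters, grayscale input) are my assumptions and may differ from the linked repo:

```python
import torch
import torch.nn as nn

# The "master" block: 7x7 convolution with padding 3 and stride 2,
# then ReLU, then a 2x2 max pool to halve the feature map again.
master = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
)

x = torch.randn(1, 1, 28, 28)  # one MNIST-sized grayscale image
out = master(x)
print(out.shape)  # 28x28 -> 14x14 after the conv -> 7x7 after the pool
```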
Next we have 3 blocks of 2 convolution layers with 3x3 kernels, each with BatchNorm and ReLU to stabilise the learning process. The self.downsample layers are our skip connections; as you can see, these are 1x1 convolution (projection) layers. We end with a linear layer. In forward, after the master layer, we take each block's output and add to it the previous layer's output passed through the shortcut. You can also use concatenation instead of addition to combine the layers.
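The forward pass described above can be sketched end to end. This is a minimal single-stage simplification of mine (the article's repo uses 3 blocks, and its channel counts may differ), but the structure, master block, residual block with BatchNorm, projection shortcut, addition, then a linear layer, is the same:

```python
import torch
import torch.nn as nn

class SimpleResNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # First block: 7x7 conv, ReLU, max pool (as described above).
        self.master = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=7, stride=2, padding=3),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # One residual stage: two 3x3 convolutions with BatchNorm.
        self.block = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
        )
        # Skip connection: 1x1 projection matching the block's output shape.
        self.downsample = nn.Conv2d(16, 32, kernel_size=1, stride=2)
        self.relu = nn.ReLU()
        self.fc = nn.Linear(32 * 4 * 4, num_classes)

    def forward(self, x):
        x = self.master(x)  # (N, 16, 7, 7) for a 28x28 MNIST input
        # Add the block's output and the shortcut element-wise.
        out = self.relu(self.block(x) + self.downsample(x))
        return self.fc(out.flatten(1))

logits = SimpleResNet()(torch.randn(2, 1, 28, 28))
print(logits.shape)  # one score per class for each image
```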
Now that we have our model, you can go ahead and load the data and train it. I have also added TensorBoard logging. You can run TensorBoard with the command:
tensorboard --logdir log_directory --reload_interval 1
And run the train file with:
python train.py --log_dir=log_directory
Running the train file should give you an accuracy of 0.99 within 10 epochs :)