DLOA (Part-18)-ResNet CNN and Implementation

Dewansh Singh
10 min read · May 16, 2023


Hey readers, hope you all are doing well, safe, and sound. I hope you have already read the previous blog, which briefly covered the implementation of GoogLeNet CNN; if you haven't, you can go through this link. In this blog, we'll discuss ResNet CNN, how it works, and its implementation.

Introduction

ResNet (Residual Network) is a deep convolutional neural network architecture that was introduced in 2015 by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. ResNet made a significant breakthrough in the field of image recognition and computer vision by allowing the training of deep neural networks with up to 152 layers while maintaining accuracy and avoiding the vanishing gradient problem.

The fundamental problem that ResNet aims to solve is the degradation of accuracy in deep neural networks as the depth of the network increases. In practice, as more layers are added to a plain network, accuracy first saturates and then degrades rapidly, and this is not caused by overfitting: the deeper network also shows higher training error. This problem is known as the "degradation problem".

ResNet CNN Architecture

The ResNet architecture uses the concept of “residual learning” to tackle this degradation problem. Residual learning is based on the observation that adding more layers to a neural network should not hurt its performance if these additional layers are identity mappings. By stacking a residual block that contains a skip connection alongside several convolutional layers, the ResNet architecture is able to effectively learn these identity mappings.

The skip connection (also called shortcut connection) provides a shortcut path for the gradient to flow through the network without passing through the intermediate layers of the block. This allows the gradient to propagate through the network more efficiently and prevents the degradation problem.

Architecture and Working of ResNet CNN

The ResNet architecture consists of a series of residual blocks, which are connected to form a deep neural network. Each residual block is made up of several convolutional layers, along with a skip connection that bypasses these layers.

A standard residual block has two convolutional layers, each followed by batch normalization, with a ReLU activation after the first. The output of the second convolutional layer is then added to the input of the residual block through the skip connection, and the sum is passed through a final ReLU activation.
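
To make this concrete, here is a minimal sketch of such a two-convolution residual block in Keras. This is a simplified illustration, not the exact code used later in this post, and it assumes the block input already has "filters" channels so the addition is shape-compatible:

import tensorflow as tf
from tensorflow.keras import layers

def basic_residual_block(x, filters):
    # Save the block input for the skip connection
    shortcut = x

    # First 3x3 convolution, batch normalization, and ReLU
    x = layers.Conv2D(filters, (3, 3), padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)

    # Second 3x3 convolution and batch normalization (no ReLU yet)
    x = layers.Conv2D(filters, (3, 3), padding='same')(x)
    x = layers.BatchNormalization()(x)

    # Skip connection: add the block input, then apply the final ReLU
    x = layers.Add()([x, shortcut])
    return layers.Activation('relu')(x)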

To increase the depth of the network economically, ResNet uses "bottleneck layers", which reduce the number of parameters in the network. A bottleneck block consists of a 1x1 convolutional layer, followed by a 3x3 convolutional layer, followed by another 1x1 convolutional layer. The first 1x1 convolution reduces the channel depth of the input feature maps, the 3x3 convolution extracts features at that reduced depth, and the final 1x1 convolution restores (expands) the channel depth.
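
As a rough illustration of this 1x1 → 3x3 → 1x1 pattern, here is a sketch of the main path only (the full identity and convolutional blocks appear in the implementation section below; the function and argument names here are illustrative):

from tensorflow.keras import layers

def bottleneck_main_path(x, reduced_filters, output_filters, f=3):
    # 1x1 convolution reduces the channel dimension (e.g. 256 -> 64)
    x = layers.Conv2D(reduced_filters, (1, 1))(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)

    # 3x3 convolution extracts features at the reduced depth
    x = layers.Conv2D(reduced_filters, (f, f), padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)

    # 1x1 convolution restores (expands) the channel dimension (e.g. 64 -> 256)
    x = layers.Conv2D(output_filters, (1, 1))(x)
    x = layers.BatchNormalization()(x)
    return x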

In addition, ResNet distinguishes between two kinds of residual blocks: "identity blocks", whose shortcut is a plain identity mapping and which are used when the input and output dimensions match, and "convolutional blocks", whose shortcut contains a 1x1 projection convolution and which are used at the start of each stage, where the spatial size and channel count change.

ResNet is typically trained using stochastic gradient descent (SGD) with mini-batches. The original models were trained from scratch; in practice, the network is often initialized with weights pre-trained on ImageNet and then fine-tuned on a target dataset.
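
For example, a hedged sketch of such a fine-tuning setup in Keras might look like the following (the target dataset, class count, and hyperparameters are placeholders, not values from the original paper):

import tensorflow as tf
from tensorflow.keras import layers

num_target_classes = 10  # placeholder for the target dataset's class count

# Start from ImageNet-pretrained weights and replace the classification head
base = tf.keras.applications.ResNet50(weights='imagenet', include_top=False,
                                      input_shape=(224, 224, 3))
x = layers.GlobalAveragePooling2D()(base.output)
outputs = layers.Dense(num_target_classes, activation='softmax')(x)
model = tf.keras.Model(base.input, outputs)

# Fine-tune with SGD and mini-batches
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(train_ds, validation_data=val_ds, epochs=10)  # train_ds/val_ds assumed to exist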

ResNet50 CNN

Let's take the example of the ResNet-50 architecture, which is commonly used in various image classification tasks.

The ResNet50 architecture has 50 layers in total, consisting of convolutional layers, batch normalization layers, activation layers, pooling layers, and fully connected layers. It also utilizes residual connections to enable easier training of deep neural networks.

Here's the architecture of ResNet-50, following Table 1 of the original paper. (The 18- and 34-layer ResNet variants use the same idea of residual mappings, but they are not discussed here for simplicity.)

For ResNet-50 and deeper variants, the block design was changed: instead of the shortcut connection skipping two layers, it skips three, and 1x1 convolution layers were added. We will look at this in detail with the ResNet-50 architecture.

So, as we can see in Table 1 of the original paper, the ResNet-50 architecture contains the following elements:

  • A convolution with a 7 × 7 kernel and 64 filters, with a stride of 2: this gives us 1 layer.
  • Next, max pooling, also with a stride of 2 (not counted as a layer).
  • Then a block of three convolutions: a 1 × 1, 64 kernel, followed by a 3 × 3, 64 kernel, and finally a 1 × 1, 256 kernel. This block is repeated 3 times, giving us 9 layers in this step.
  • Next, a 1 × 1, 128 kernel, then a 3 × 3, 128 kernel, and finally a 1 × 1, 512 kernel. This block is repeated 4 times, giving us 12 layers in this step.
  • After that, a 1 × 1, 256 kernel, followed by a 3 × 3, 256 kernel and a 1 × 1, 1024 kernel. This block is repeated 6 times, giving us 18 layers.
  • Then again a 1 × 1, 512 kernel, with a 3 × 3, 512 kernel and a 1 × 1, 2048 kernel, repeated 3 times, giving us 9 layers.
  • Finally, an average pool followed by a fully connected layer with 1000 nodes and a softmax at the end: this gives us 1 layer.

We don't count the activation functions or the max/average pooling layers.

Totaling this gives us 1 + 9 + 12 + 18 + 9 + 1 = 50 layers, i.e., a 50-layer deep convolutional network.
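
One quick way to sanity-check this count is to build the stock Keras ResNet50 and count only the layers enumerated above, i.e. the convolutions and the final fully connected layer. This is a rough check: the Keras model also contains the four 1 × 1 projection convolutions on the shortcut paths, which the 50-layer count ignores.

import tensorflow as tf

# Build the stock Keras ResNet50 (random weights) and inspect its structure
model = tf.keras.applications.ResNet50(weights=None, input_shape=(224, 224, 3))

# Count convolutional and fully connected (Dense) layers only
counted = [l for l in model.layers
           if isinstance(l, (tf.keras.layers.Conv2D, tf.keras.layers.Dense))]
print(len(counted))  # 54 = the 50 counted layers + 4 projection-shortcut convolutions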

The overall architecture of ResNet enables it to perform very well on large-scale image classification tasks, such as ImageNet, with significantly better accuracy than previous models.

In summary, ResNet is a powerful architecture that utilizes residual connections to enable easier training of deep neural networks. Its use of residual blocks allows for more efficient gradient propagation and better training of deeper networks.

Implementation

Here is a ResNet-50 implementation in TensorFlow (Keras functional API). The listing defines the overall resnet50() model and the identity block; the convolutional block it calls is sketched after the listing.

import tensorflow as tf
from tensorflow.keras import layers

# Define the ResNet-50 model
def resnet50(num_classes=1000):
    input_shape = (224, 224, 3)

    # Input layer
    inputs = tf.keras.Input(shape=input_shape)

    # Stage 1
    x = layers.Conv2D(64, (7, 7), strides=(2, 2), padding='same')(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    x = layers.MaxPooling2D(pool_size=(3, 3), strides=(2, 2), padding='same')(x)

    # Stage 2
    x = convolutional_block(x, f=3, filters=[64, 64, 256], stage=2, block='a', s=1)
    x = identity_block(x, f=3, filters=[64, 64, 256], stage=2, block='b')
    x = identity_block(x, f=3, filters=[64, 64, 256], stage=2, block='c')

    # Stage 3
    x = convolutional_block(x, f=3, filters=[128, 128, 512], stage=3, block='a', s=2)
    x = identity_block(x, f=3, filters=[128, 128, 512], stage=3, block='b')
    x = identity_block(x, f=3, filters=[128, 128, 512], stage=3, block='c')
    x = identity_block(x, f=3, filters=[128, 128, 512], stage=3, block='d')

    # Stage 4
    x = convolutional_block(x, f=3, filters=[256, 256, 1024], stage=4, block='a', s=2)
    x = identity_block(x, f=3, filters=[256, 256, 1024], stage=4, block='b')
    x = identity_block(x, f=3, filters=[256, 256, 1024], stage=4, block='c')
    x = identity_block(x, f=3, filters=[256, 256, 1024], stage=4, block='d')
    x = identity_block(x, f=3, filters=[256, 256, 1024], stage=4, block='e')
    x = identity_block(x, f=3, filters=[256, 256, 1024], stage=4, block='f')

    # Stage 5
    x = convolutional_block(x, f=3, filters=[512, 512, 2048], stage=5, block='a', s=2)
    x = identity_block(x, f=3, filters=[512, 512, 2048], stage=5, block='b')
    x = identity_block(x, f=3, filters=[512, 512, 2048], stage=5, block='c')

    # Output layer
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(num_classes, activation='softmax')(x)

    # Create the model
    model = tf.keras.Model(inputs=inputs, outputs=outputs, name='resnet50')
    return model

# Define the identity block function
def identity_block(x, f, filters, stage, block):
    """
    Implementation of the identity block function in ResNet-50

    Arguments:
    x -- input tensor of shape (batch_size, height_prev, width_prev, channels_prev)
    f -- integer, specifying the shape of the middle CONV's window for the main path
    filters -- python list of integers, defining the number of filters in the CONV layers of the main path
    stage -- integer, used to name the layers, depending on their position in the network
    block -- string/character, used to name the layers, depending on their position in the network

    Returns:
    x -- output of the identity block, tensor of shape (batch_size, height, width, channels)
    """

    # Define name basis
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    # Retrieve filters
    filters1, filters2, filters3 = filters

    # Save the input value
    x_shortcut = x

    # First component of the main path
    x = layers.Conv2D(filters1, (1, 1), strides=(1, 1), name=conv_name_base + '2a')(x)
    x = layers.BatchNormalization(axis=3, name=bn_name_base + '2a')(x)
    x = layers.Activation('relu')(x)

    # Second component of the main path
    x = layers.Conv2D(filters2, (f, f), strides=(1, 1), padding='same', name=conv_name_base + '2b')(x)
    x = layers.BatchNormalization(axis=3, name=bn_name_base + '2b')(x)
    x = layers.Activation('relu')(x)

    # Third component of the main path
    x = layers.Conv2D(filters3, (1, 1), strides=(1, 1), name=conv_name_base + '2c')(x)
    x = layers.BatchNormalization(axis=3, name=bn_name_base + '2c')(x)

    # Add shortcut value to the main path, and pass it through a ReLU activation
    x = layers.Add()([x, x_shortcut])
    x = layers.Activation('relu')(x)

    return x
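
The resnet50() function above also calls convolutional_block, which the original listing does not define. The sketch below is an assumed completion: a block whose shortcut path uses a 1x1 "projection" convolution and whose first convolution uses stride s, so the spatial size and channel count can change at the start of each stage. It follows the same naming scheme as the identity block.

# Define the convolutional (projection-shortcut) block function
# NOTE: assumed completion -- the original listing calls convolutional_block()
# but does not define it.
def convolutional_block(x, f, filters, stage, block, s=2):
    """
    Residual block whose shortcut uses a 1x1 convolution (projection),
    so the output spatial size and channel count can differ from the input.
    """
    # Define name basis
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    # Retrieve filters and save the input value
    filters1, filters2, filters3 = filters
    x_shortcut = x

    # First component of the main path (stride s downsamples when s > 1)
    x = layers.Conv2D(filters1, (1, 1), strides=(s, s), name=conv_name_base + '2a')(x)
    x = layers.BatchNormalization(axis=3, name=bn_name_base + '2a')(x)
    x = layers.Activation('relu')(x)

    # Second component of the main path
    x = layers.Conv2D(filters2, (f, f), strides=(1, 1), padding='same', name=conv_name_base + '2b')(x)
    x = layers.BatchNormalization(axis=3, name=bn_name_base + '2b')(x)
    x = layers.Activation('relu')(x)

    # Third component of the main path (no ReLU before the addition)
    x = layers.Conv2D(filters3, (1, 1), strides=(1, 1), name=conv_name_base + '2c')(x)
    x = layers.BatchNormalization(axis=3, name=bn_name_base + '2c')(x)

    # Shortcut path: 1x1 projection so the shapes match the main path
    x_shortcut = layers.Conv2D(filters3, (1, 1), strides=(s, s), name=conv_name_base + '1')(x_shortcut)
    x_shortcut = layers.BatchNormalization(axis=3, name=bn_name_base + '1')(x_shortcut)

    # Add the shortcut to the main path and apply the final ReLU
    x = layers.Add()([x, x_shortcut])
    x = layers.Activation('relu')(x)

    return x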

Explaining the ResNet50 model

The ResNet-50 model created in the code consists of 5 stages, each containing several blocks. The input to the model is an image with a shape of (224, 224, 3). The input goes through a convolutional layer with 64 filters, a kernel size of (7, 7), and a stride of 2. The output of this layer then goes through a batch normalization layer and a ReLU activation function. Finally, it goes through a max pooling layer with a pool size of (3, 3) and a stride of 2.

After this initial processing, the output is passed through a series of residual blocks. Each residual block contains multiple convolutional layers with batch normalization and ReLU activation functions. The input to each residual block is passed through a shortcut connection that bypasses the convolutional layers. This allows the network to learn the identity function and avoid the vanishing gradient problem.

The first residual stage (stage 2 in the code) contains three blocks, each using 64, 64, and 256 filters for its three convolutions.

The second residual stage contains four blocks, each with 128, 128, and 512 filters.

The third residual stage contains six blocks, each with 256, 256, and 1024 filters.

The fourth residual stage contains three blocks, each with 512, 512, and 2048 filters.

Finally, the output of the last stage is passed through a global average pooling layer and a fully connected layer with num_classes units (1000 for the ImageNet classes), followed by a softmax activation function.

The resulting model is a deep neural network that is able to classify images with high accuracy, thanks to its ability to effectively learn the relevant features and avoid the vanishing gradient problem.
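
Putting the pieces together, the model defined above can be built and sanity-checked like this (a quick usage sketch; it assumes the convolutional_block sketch from earlier is also defined):

import numpy as np

# Build the model defined above and print its layer summary
model = resnet50(num_classes=1000)
model.summary()

# Run a dummy batch through the network to confirm the output shape
dummy = np.zeros((1, 224, 224, 3), dtype='float32')
print(model(dummy).shape)  # (1, 1000)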

Explaining the Identity Block function

This function takes in five arguments: x, f, filters, stage, and block.

The x argument represents the input tensor for this block. f is an integer representing the shape of the middle convolutional layer's window for the main path. filters is a list of integers, defining the number of filters in the convolutional layers of the main path. stage and block are used to name the layers, depending on their position in the network.

This function implements the identity block as defined in ResNet-50. It performs the following steps:

  1. Saves the input tensor as x_shortcut.
  2. Defines the name basis for the layers in this block.
  3. Retrieves the number of filters for each convolutional layer in the main path.
  4. Applies the first component of the main path: a 1x1 convolutional layer, followed by batch normalization and a ReLU activation.
  5. Applies the second component of the main path: an fxf convolutional layer with padding ‘same’, followed by batch normalization and a ReLU activation.
  6. Applies the third component of the main path: another convolutional layer with filters3 filters of size (1, 1) and a stride of (1, 1), followed by a batch normalization layer (no ReLU is applied yet at this point). The filters3 parameter determines the number of filters in the third component.
  7. Adds the shortcut value (i.e., x_shortcut) to the main path, using the Add() layer.
  8. Passes the output of step 7 through a ReLU activation function.
  9. Returns the output tensor x.

The identity block is used in the ResNet-50 model to add skip connections between blocks of the same size. These skip connections help to address the vanishing gradient problem and make it easier for the model to learn and converge.

Conclusion

ResNet is a widely used CNN architecture that addresses the vanishing gradient and degradation problems by introducing skip connections between layers. This allows deeper networks to be trained without a loss of accuracy, improving performance on various computer vision tasks such as object detection and image classification.

The ResNet architecture introduced novel techniques such as bottleneck layers, which allow for efficient processing of information, and residual blocks, which greatly improve gradient flow. The ResNet-50 model is one of the most popular variants of the ResNet architecture, with 50 layers, and performs well on a variety of image classification tasks.

Overall, ResNet has greatly impacted the field of computer vision by enabling the training of deeper neural networks and achieving state-of-the-art performance on various benchmarks.

That's it for now… I hope you liked my blog and got to know about ResNet CNN, how it works, and the example I walked through while implementing the code.

In the next blog, I will be discussing the DenseNet CNN.

If you feel my blogs are helpful, please share them with others.

Till then Stay tuned for the next blog…

***Next Blog***
