Implementation of ResNet Architecture for CIFAR-10 and CIFAR-100 Datasets.

Shuvam Das · Published in deepkapha notes · 6 min read · Mar 27, 2023

Introduction:

Deep learning models have revolutionized the field of computer vision and have significantly improved the accuracy of image classification tasks. The ResNet (Residual Network) architecture is one such model that has shown remarkable performance on various benchmark datasets. In this article, we will discuss the implementation of the ResNet architecture for the CIFAR-10 and CIFAR-100 datasets.

CIFAR-10 and CIFAR-100 are widely used benchmark datasets in computer vision for image classification tasks. CIFAR-10 consists of 60,000 32x32 color images in 10 classes, with 6,000 images per class, while CIFAR-100 has the same number of images but in 100 classes.
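The post's data pipeline is not shown here, but both datasets are available directly in torchvision. A minimal loading sketch might look like the following (the normalization constants are the commonly quoted CIFAR-10 statistics, and the batch_size/num_workers values are illustrative choices, not values from the original code):

import torch
import torchvision
import torchvision.transforms as transforms

# Commonly quoted per-channel mean/std for CIFAR-10 (assumed, not from the post).
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465),
                         (0.2470, 0.2435, 0.2616)),
])

train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128,
                                           shuffle=True, num_workers=2)
# CIFAR-100 is loaded the same way through torchvision.datasets.CIFAR100.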

ResNet Architecture:

After AlexNet, the first CNN-based architecture to win the ImageNet competition (2012), every subsequent winning architecture added more layers to its deep neural network to reduce the error rate. This works up to a modest number of layers, but as depth increases further, a common deep learning problem appears: the vanishing/exploding gradient. Gradients become close to zero or excessively large as they propagate through many layers, so beyond a certain depth the training and test error rates actually increase.

To solve the vanishing/exploding gradient problem, this architecture introduced the concept of residual blocks, built on a technique called skip connections. A skip connection feeds the activations of a layer to a later layer, skipping one or more layers in between; the skipped layers together with the connection form a residual block. ResNets are made by stacking these residual blocks together.

The ResNet architecture was proposed in 2015 by Kaiming He et al. in their paper "Deep Residual Learning for Image Recognition." The main idea behind ResNet is to use skip connections, or residual connections, that allow the network to learn a residual mapping: instead of learning a desired mapping H(x) directly, each block learns the residual F(x) = H(x) − x and adds the input back, producing F(x) + x.

The residual connections are added around pairs of convolutional layers, allowing the network to learn the residual mapping between the input and output of those layers. This mitigates the vanishing gradient problem and allows the network to learn deeper representations. The ResNet architecture has shown remarkable performance on various benchmark datasets and has become a de facto standard backbone for deep learning models in computer vision.
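The full model code from the post is not reproduced here, but a minimal sketch of such a residual block in PyTorch, following the standard torchvision-style naming (BasicBlock, inplanes, downsample), might look like this:

import torch.nn as nn

class BasicBlock(nn.Module):
    """Two 3x3 convolutions wrapped by a skip connection."""
    expansion = 1  # output channels = planes * expansion

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super().__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.downsample = downsample  # matches shapes when they differ

    def forward(self, x):
        identity = x
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        if self.downsample is not None:
            identity = self.downsample(x)
        out += identity  # the skip connection: output = F(x) + x
        return self.relu(out)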

In the CIFAR-10 and CIFAR-100 datasets, the images are small in size (32x32) compared to other benchmark datasets like ImageNet (224x224). Therefore, a modified version of the ResNet architecture is used for these datasets, known as CifarResNet.

Implementation of CifarResNet:

The implementation of CifarResNet is done using PyTorch, an open-source machine learning library. The sections below walk through the code for the CifarResNet model.

The CifarResNet class is defined, which takes the block type, the number of layers, and the number of classes as input parameters. The block type refers to the type of residual block used in the network, either a basic block or a bottleneck block. The number of layers specifies how many residual blocks are used in each stage of the network.

In the __init__ method of the CifarResNet class, the initial parameters are defined. The inplanes variable represents the number of input channels and is initialized to 16. The conv1 layer is a 3x3 convolutional layer with 3 input channels and 16 output channels. The bn1 layer is a batch normalization layer over those 16 channels, followed by a ReLU activation function.
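A sketch of the constructor consistent with this description (the 16/32/64 stage widths follow the standard CIFAR ResNet convention; _make_layer and forward are sketched further below):

import torch
import torch.nn as nn

class CifarResNet(nn.Module):
    def __init__(self, block, layers, num_classes=10):
        super().__init__()
        self.inplanes = 16  # channel width after the stem
        # 3x3 stem: 3 RGB input channels -> 16 feature maps
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, stride=1,
                               padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(16)
        self.relu = nn.ReLU(inplace=True)
        # Three stages of residual blocks; layers[i] blocks per stage
        self.layer1 = self._make_layer(block, 16, layers[0])
        self.layer2 = self._make_layer(block, 32, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 64, layers[2], stride=2)
        self.avgpool = nn.AdaptiveAvgPool2d(1)  # global average pooling
        self.fc = nn.Linear(64 * block.expansion, num_classes)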

layer1, layer2, and layer3 represent the three stages of the network, each consisting of multiple residual blocks. The _make_layer method creates the sequence of residual blocks for each stage. Inside it, a downsample module is defined, which is used to downsample the input tensor whenever the stride is not equal to 1 or the number of input channels does not match the number of output channels.
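Continuing the sketch, _make_layer could be implemented roughly as follows (indented as a method of the CifarResNet class above):

    def _make_layer(self, block, planes, blocks, stride=1):
        """Stack `blocks` residual blocks; only the first may downsample."""
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            # 1x1 convolution on the shortcut to match the spatial size
            # and channel count of the main path
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes * block.expansion),
            )
        layers = [block(self.inplanes, planes, stride, downsample)]
        self.inplanes = planes * block.expansion
        for _ in range(1, blocks):
            layers.append(block(self.inplanes, planes))
        return nn.Sequential(*layers)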

Forward Method

In the forward method, the input tensor is passed through the initial convolutional layer, batch normalization layer, and ReLU activation function. It is then passed through the three stages of the network, each consisting of multiple residual blocks. The output tensor is then passed through a global average pooling layer, which computes the average of the feature maps across the spatial dimensions. The output of the global average pooling layer is then flattened and passed through a fully connected layer, which produces the final output.
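A matching sketch of the forward method (again as a method of CifarResNet; the spatial sizes in the comments assume 32x32 inputs):

    def forward(self, x):
        x = self.relu(self.bn1(self.conv1(x)))  # stem: (N, 16, 32, 32)
        x = self.layer1(x)  # (N, 16, 32, 32)
        x = self.layer2(x)  # (N, 32, 16, 16)
        x = self.layer3(x)  # (N, 64, 8, 8)
        x = self.avgpool(x)      # global average pooling -> (N, 64, 1, 1)
        x = torch.flatten(x, 1)  # -> (N, 64)
        return self.fc(x)        # class logits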

The resnet20, resnet32, resnet44, and resnet56 functions are defined, which create instances of the CifarResNet class with different numbers of layers. These functions can be used to create ResNet models with different depths depending on the application requirements.
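Sketches of these convenience constructors, following the 6n+2 depth convention of the original CIFAR ResNets (the pretrained-weight handling described later is omitted here):

def resnet20(num_classes=10):
    # 1 stem conv + 3 stages x 3 blocks x 2 convs + 1 fc = 20 weight layers
    return CifarResNet(BasicBlock, [3, 3, 3], num_classes=num_classes)

def resnet32(num_classes=10):
    return CifarResNet(BasicBlock, [5, 5, 5], num_classes=num_classes)

def resnet44(num_classes=10):
    return CifarResNet(BasicBlock, [7, 7, 7], num_classes=num_classes)

def resnet56(num_classes=10):
    return CifarResNet(BasicBlock, [9, 9, 9], num_classes=num_classes)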

ResNet models have achieved state-of-the-art results on a variety of image classification tasks, including the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) and the COCO object detection challenge. The architecture has also been adapted for other tasks, such as semantic segmentation and object tracking.

CifarResNet class

The CifarResNet class defined in the code is a variant of the ResNet architecture specifically designed for the CIFAR-10 and CIFAR-100 datasets. These datasets consist of 32x32 RGB images of various objects and animals, with 10 and 100 classes, respectively. Compared to the original ResNet architecture, which was designed for the ImageNet dataset with 224x224 images, the CifarResNet uses a smaller 3x3 stem and far fewer filters per stage to accommodate the smaller image size and reduce overfitting.

The CifarResNet architecture consists of an initial convolutional layer with 16 filters, followed by batch normalization and ReLU activation. The output of this layer is then passed through a series of residual blocks, which consist of a shortcut connection and two 3x3 convolutional layers with batch normalization and ReLU activation. The shortcut connection allows for the gradient to flow directly through the block, bypassing the convolutional layers, which helps alleviate the vanishing gradient problem.

Residual Blocks

The ResNet architecture is characterized by the use of residual blocks, which enable the training of much deeper networks without suffering from the vanishing gradient problem. The residual blocks are composed of shortcut connections and convolutional layers and allow for the direct flow of gradients through the network. This architecture has been shown to improve the accuracy of deep neural networks on a variety of image recognition tasks.

The CifarResNet architecture also includes a global average pooling layer, which computes the average of each feature map across the spatial dimensions. This summarizes every feature map into a single value, drastically reducing the parameter count compared to flattening into large fully connected layers, which helps prevent overfitting. The pooled output is then passed through a single fully connected layer; applying softmax to its outputs yields the final probability for each class (in PyTorch this softmax is usually folded into the cross-entropy loss rather than the model itself).

Make Layer Function

The CifarResNet class also includes the _make_layer function, which creates a sequence of residual blocks with the specified number of layers and filters. The function takes in the block type (BasicBlock or Bottleneck), the number of filters, the number of blocks, and the stride as inputs. It also includes a downsample shortcut connection, which is used when the input and output feature maps have different dimensions. The _make_layer function is called multiple times in the CifarResNet constructor to create the different stages of the network.

The resnet20, resnet32, resnet44, and resnet56 functions are defined as convenience functions for creating CifarResNet models with different numbers of layers. The numbers 20, 32, 44, and 56 refer to the total number of weight layers in the network: the initial convolution, the convolutions in the residual blocks, and the final fully connected layer. The functions take a pretrained argument, which is used to load pre-trained weights for the model if provided. The pre-trained settings and URLs for the pre-trained models are stored in the pretrained_settings dictionary, which is defined in another module.
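As a quick sanity check of the sketches above (the pretrained_settings machinery lives in a separate module and is not reproduced here, so the sketched constructors take only num_classes):

import torch

model = resnet20(num_classes=100)  # e.g. for CIFAR-100
dummy = torch.randn(4, 3, 32, 32)  # a batch of four 32x32 RGB images
logits = model(dummy)
print(logits.shape)                # torch.Size([4, 100])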

Overall, the CifarResNet architecture is a variant of the ResNet architecture designed for the CIFAR-10 and CIFAR-100 datasets. It includes shortcut connections and residual blocks to enable the training of deeper networks, and a global average pooling layer to capture spatial information. The resnet20, resnet32, resnet44, and resnet56 functions are convenience functions for creating CifarResNet models with different numbers of layers and can be used for various image recognition tasks.
