Implementing DenseNet-121 in PyTorch: A Step-by-Step Guide

Shuvam Das · deepkapha notes · Mar 19, 2023

Introduction

Deep Learning has revolutionized the field of artificial intelligence and machine learning, and convolutional neural networks (CNNs) have played a vital role in this revolution. CNNs are widely used for image classification, object detection, and other computer vision tasks. However, the success of CNNs is highly dependent on the architecture of the network.

DenseNet is a type of CNN architecture that is known for its efficiency and accuracy in image classification tasks. In this article, we will explain the DenseNet architecture and walk through an implementation in PyTorch, covering the bottleneck layer, the transition layer, and the full DenseNet-BC model.

DenseNet Architecture

DenseNet is a convolutional neural network (CNN) architecture introduced in 2016 by Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q. Weinberger. It is a state-of-the-art architecture that has achieved outstanding performance in image classification tasks. DenseNet stands for "Densely Connected Convolutional Networks," and it is so named because, within each dense block, every layer is connected to every subsequent layer in a feedforward fashion. Two key features make DenseNet stand out from other CNN architectures. First, its dense block structure: each layer receives the feature maps of all preceding layers as input. Second, its bottleneck layers, which reduce the number of parameters without reducing the number of features learned by the network.

Explanation of the DenseNet-121

The code in this article implements DenseNet-121, which is a variant of DenseNet with 121 layers. The code is implemented using PyTorch, which is a popular open-source machine-learning library. The code defines three classes: Bottleneck, Transition, and DenseNet. The Bottleneck class represents the bottleneck layer of the DenseNet architecture. The Transition class represents the transition layer, which is used to reduce the spatial dimensionality of the feature maps. The DenseNet class represents the entire DenseNet architecture.

Let’s dive into the code to understand how these classes are implemented and how they work together to create the DenseNet architecture.

import torch
import torch.nn as nn


class Bottleneck(nn.Module):
    """DenseNet-BC bottleneck layer: BN-ReLU-Conv(1x1) -> BN-ReLU-Conv(3x3)."""
    def __init__(self, in_channels, growth_rate):
        super().__init__()
        # The 1x1 convolution projects to 4 * growth_rate channels, which
        # keeps the following 3x3 convolution cheap.
        inner_channel = 4 * growth_rate
        self.bottle_neck = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, inner_channel, kernel_size=1, bias=False),
            nn.BatchNorm2d(inner_channel),
            nn.ReLU(inplace=True),
            nn.Conv2d(inner_channel, growth_rate, kernel_size=3, padding=1, bias=False)
        )

    def forward(self, x):
        # Concatenate the input with the new feature maps along the channel axis.
        return torch.cat([x, self.bottle_neck(x)], 1)

Bottleneck Class

The Bottleneck class represents the bottleneck layer of the DenseNet architecture. It follows the pre-activation pattern: each convolution is preceded by a batch normalization layer and a ReLU activation. The first convolution is a 1x1 convolution that projects the input to a fixed inner_channel = 4 * growth_rate feature maps, which reduces the channel count once the dense block has grown deep and keeps the cost of the following convolution low. The second convolution is a 3x3 convolution that produces growth_rate output feature maps, where growth_rate is a hyperparameter that determines how many feature maps each layer in a dense block adds. The output of the bottleneck layer is obtained by concatenating the input feature maps with the newly produced feature maps.
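To make the channel arithmetic concrete, here is a short check (the input shape is chosen arbitrarily for illustration) showing that a bottleneck layer adds exactly growth_rate channels while leaving the spatial size unchanged:

layer = Bottleneck(in_channels=64, growth_rate=32)
x = torch.randn(1, 64, 32, 32)  # arbitrary illustrative shape
out = layer(x)
print(out.shape)  # torch.Size([1, 96, 32, 32]): 64 input + 32 new channels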

Transition Class

class Transition(nn.Module):
    """Transition layer: BN -> 1x1 conv (channel reduction) -> 2x2 average pooling."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.down_sample = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.Conv2d(in_channels, out_channels, 1, bias=False),
            # Halve the spatial resolution.
            nn.AvgPool2d(2, stride=2)
        )

    def forward(self, x):
        return self.down_sample(x)

The Transition class represents the transition layer of the DenseNet architecture. The transition layer consists of a batch normalization layer, a 1x1 convolutional layer, and a 2x2 average pooling layer. The 1x1 convolution reduces the number of feature maps, and the pooling layer reduces their spatial dimensionality. The Transition class plays an important role in the DenseNet architecture by compressing and downsampling the feature maps between the dense blocks. This matters because, as the feature maps pass through a dense block, their number grows steadily, which can become computationally expensive and memory-intensive.

To address this issue, the Transition class first applies batch normalization to normalize the feature maps, helping to improve the stability and speed of training. Next, a 1x1 convolutional layer reduces the number of feature maps; because its kernel covers a single spatial position, it performs channel-wise dimensionality reduction without affecting the spatial dimensions of the feature maps.

Finally, a 2x2 average pooling layer is applied to downsample the feature maps, reducing the spatial dimensionality of the feature maps by a factor of 2. This helps to further reduce the computational cost and memory usage of the network. By using the Transition class between the dense blocks, the DenseNet architecture is able to efficiently use its parameters and memory while maintaining a high level of accuracy on a wide range of image classification tasks.
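The effect is easy to verify with a quick shape check (the input shape below is illustrative):

trans = Transition(in_channels=96, out_channels=48)
x = torch.randn(1, 96, 32, 32)
out = trans(x)
print(out.shape)  # torch.Size([1, 48, 16, 16]): channels and resolution both halved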

The DenseNet Model

The DenseNet model class defines the architecture of the DenseNet network. The __init__ function of the DenseNet class takes five arguments: block, nblocks, growth_rate, reduction, and num_class. The block argument specifies the type of block to use in the DenseNet network; in our case, we're using the Bottleneck block. The nblocks argument is a list that specifies the number of bottleneck layers in each dense block. The growth_rate argument specifies the number of output feature maps each bottleneck layer adds. The reduction argument specifies the compression factor of the transition layers. The num_class argument specifies the number of output classes.

The DenseNet class initializes a few variables, including the growth rate, the number of input channels, and the number of output channels. It then defines the first convolutional layer of the network. After that, it defines the dense blocks and transition layers of the network.
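The class body itself is not reproduced above, so the following is a minimal sketch consistent with the description; the helper name _make_dense_layers, the CIFAR-style 3x3 first convolution, and the default argument values are assumptions, not the article's verbatim code.

class DenseNet(nn.Module):
    """Minimal sketch of the DenseNet-BC model described in this article."""
    def __init__(self, block, nblocks, growth_rate=32, reduction=0.5, num_class=100):
        super().__init__()
        self.growth_rate = growth_rate

        # DenseNet-BC commonly starts with 2 * growth_rate channels.
        inner_channels = 2 * growth_rate
        self.conv1 = nn.Conv2d(3, inner_channels, kernel_size=3, padding=1, bias=False)

        self.features = nn.Sequential()
        for index in range(len(nblocks) - 1):
            self.features.add_module(
                f"dense_block_{index}",
                self._make_dense_layers(block, inner_channels, nblocks[index]))
            inner_channels += growth_rate * nblocks[index]
            # Compress the channel count by `reduction` in each transition layer.
            out_channels = int(reduction * inner_channels)
            self.features.add_module(f"transition_{index}",
                                     Transition(inner_channels, out_channels))
            inner_channels = out_channels

        # The last dense block is not followed by a transition layer.
        self.features.add_module(
            f"dense_block_{len(nblocks) - 1}",
            self._make_dense_layers(block, inner_channels, nblocks[-1]))
        inner_channels += growth_rate * nblocks[-1]
        self.features.add_module("bn", nn.BatchNorm2d(inner_channels))
        self.features.add_module("relu", nn.ReLU(inplace=True))

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.linear = nn.Linear(inner_channels, num_class)

    def _make_dense_layers(self, block, in_channels, nblocks):
        dense_block = nn.Sequential()
        for index in range(nblocks):
            dense_block.add_module(f"bottle_neck_layer_{index}",
                                   block(in_channels, self.growth_rate))
            in_channels += self.growth_rate  # each bottleneck adds growth_rate maps
        return dense_block

    def forward(self, x):
        output = self.conv1(x)
        output = self.features(output)
        output = self.avgpool(output)
        output = output.view(output.size(0), -1)
        return self.linear(output)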

Dense Blocks and Transition Layers

A dense block consists of multiple bottleneck layers connected in a feedforward fashion. Because each layer concatenates its input with its output, every bottleneck layer receives the feature maps of all preceding layers in the block and adds a fixed number (growth_rate) of new feature maps. The bottleneck layers are implemented in the Bottleneck class.

A transition layer consists of a batch normalization layer, a 1x1 convolutional layer, and a 2x2 average pooling layer. The purpose of the transition layer is to reduce the number of feature maps output by the dense block and to reduce the computational cost of the network. The transition layers are implemented in the Transition class.
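To see why the transition layers matter, it helps to trace the channel counts through DenseNet-121. The standard configuration uses nblocks = [6, 12, 24, 16] with growth_rate = 32 and reduction = 0.5; the short sketch below tallies the channels:

channels = 2 * 32                       # 64 channels after the first convolution
for i, n in enumerate([6, 12, 24, 16]):
    channels += n * 32                  # each bottleneck layer adds 32 feature maps
    if i < 3:                           # a transition follows all but the last block
        channels = int(0.5 * channels)  # compression halves the channel count
print(channels)  # 1024 channels reach the final classifier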

The Forward Function

The forward function of the DenseNet class defines the forward pass of the network. It takes the input tensor and passes it through the layers of the network in sequence. The output of the last layer is passed through a global average pooling layer and then through a fully connected layer to produce the final output. The forward function is the most important function in any neural network model because it defines how the model processes input data and produces output. In the case of DenseNet, the forward function takes an input tensor and passes it through a series of layers, or "blocks," in sequence. Each block in the DenseNet architecture is composed of several bottleneck layers, each of which concatenates its input with the new feature maps it produces, so every layer receives the features of all previous layers. This is the key innovation of DenseNet: the dense connections between layers allow information and gradients to flow more freely, reducing the risk of vanishing gradients that can occur in very deep neural networks.

After passing through all of the blocks in the network, the output is passed through a global average pooling layer. This layer averages each feature map across its spatial dimensions, producing a single value per channel. This reduces the dimensionality of the output and makes the model more robust to spatial variations in the input.

Finally, the output of the global average pooling layer is passed through a fully connected layer, which performs a linear transformation of the input and produces the final output. This fully connected layer is often referred to as the “classification layer” because it maps the features extracted by the earlier layers to the desired output classes.

Overall, the forward function is a critical component of the DenseNet architecture because it defines how the model processes input data and produces output. By leveraging dense connections between layers and a global average pooling layer, DenseNet is able to achieve state-of-the-art performance on a wide range of computer vision tasks.
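As a quick sanity check, assuming the DenseNet class sketched earlier, a dummy batch can be pushed through the full forward pass:

model = DenseNet(Bottleneck, [6, 12, 24, 16], growth_rate=32, reduction=0.5, num_class=100)
x = torch.randn(4, 3, 32, 32)  # CIFAR-sized dummy batch; ImageNet would use 224x224
logits = model(x)
print(logits.shape)  # torch.Size([4, 100])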

The densenet121 Function

The densenet121 function is a convenience function that creates a DenseNet-121 network. It takes two arguments: num_class and pretrained. The num_class argument specifies the number of output classes, and the pretrained argument specifies whether to use a pre-trained version of the network.

If pretrained is set to None, the function creates a new instance of the DenseNet-121 network and returns it. If pretrained is set to a path to a pre-trained model file, the function loads the pre-trained weights into the model before returning it.
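A minimal version of such a convenience function might look like this; the parameter name pretrained and the checkpoint format (a saved state_dict) are assumptions based on the description above.

def densenet121(num_class=100, pretrained=None):
    # Sketch of the convenience function described above.
    model = DenseNet(Bottleneck, [6, 12, 24, 16], growth_rate=32,
                     reduction=0.5, num_class=num_class)
    if pretrained is not None:
        # `pretrained` is assumed to be a path to a saved state_dict.
        model.load_state_dict(torch.load(pretrained, map_location="cpu"))
    return model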

Conclusion

In conclusion, the DenseNet architecture is a deep neural network that uses dense blocks to connect the layers of the network. Each dense block consists of multiple bottleneck layers connected to each other in a feedforward fashion. The network also includes transition layers that reduce the number of feature maps output by the dense blocks and reduce the computational cost of the network. The DenseNet architecture has been shown to achieve state-of-the-art performance on a variety of image classification tasks. The PyTorch implementation of the DenseNet architecture is available in the torchvision.models module.
