DLOA (Part-19)-DenseNet CNN and Implementation

Dewansh Singh
8 min read · May 16, 2023


Hey readers, hope you all are doing well, safe, and sound. I hope you have already read the previous blog, which briefly discussed and implemented ResNet CNN. If you haven't read it, you can go through this link. In this blog, we'll be discussing DenseNet CNN, its working, and its implementation.

Introduction

DenseNet (Dense Convolutional Network) is a type of convolutional neural network that is characterized by its dense connections. Unlike traditional CNNs, where information is only passed sequentially from one layer to the next, in DenseNet each layer receives the feature maps of all preceding layers as input and passes its own feature maps on to all subsequent layers. This leads to the efficient use of parameters, improved accuracy, and better gradient propagation.

The key idea behind DenseNet is to address the vanishing gradient problem that often occurs in very deep neural networks. As the gradient is propagated through multiple layers, it can become increasingly small, making it difficult for the network to learn and update its parameters effectively. Dense connections provide an efficient way to address this issue by allowing information to be reused across layers, making the gradient signal stronger and more consistent.

DenseNet CNN Architecture

DenseNet is composed of several densely connected blocks. Each block is made up of a set of composite layers, each consisting of batch normalization, a non-linear activation function (usually ReLU), and a convolution. Within a block, the output of every layer is concatenated with the outputs of all preceding layers before being passed to the next layer, creating dense connections among all layers in the block. The output of each block is then passed on to the next block through a transition layer, described below.

The most popular variant of DenseNet is DenseNet-121, which has 121 layers and is used in many applications such as image classification, object detection, and segmentation.

Architecture and Working of DenseNet CNN

The core building block of DenseNet is the dense block, which consists of multiple layers. In a dense block, the output of each layer is concatenated with the output of all previous layers and passed as input to the next layer. This allows the network to learn both low-level and high-level features in a single pass and helps to combat the vanishing gradient problem.
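
As a rough sketch of this idea in the Keras functional API (illustrative only; the exact DenseNet-121 configuration is built up in the implementation section below), a dense block is a loop that keeps concatenating each new layer's output onto the running stack of feature maps:

from tensorflow.keras import layers

def dense_block_sketch(x, growth_rate, num_layers):
    for _ in range(num_layers):
        # Each layer sees the concatenation of all previous feature maps
        y = layers.BatchNormalization()(x)
        y = layers.Activation('relu')(y)
        y = layers.Conv2D(growth_rate, 3, padding='same')(y)
        x = layers.Concatenate()([x, y])  # dense connection: earlier features are reused
    return x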

Another important component of DenseNet is the transition layer, which is inserted between dense blocks. The transition layer consists of a batch normalization layer, a 1x1 convolutional layer, and a 2x2 average pooling layer. The batch normalization layer normalizes the activations, the 1x1 convolutional layer reduces the number of feature maps, and the average pooling layer reduces the spatial dimensions of the feature maps.
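
A matching sketch of a transition layer, again in the Keras functional API (the 0.5 compression factor is the value commonly used in DenseNet papers and in the code below):

from tensorflow.keras import layers

def transition_sketch(x, compression=0.5):
    # BN -> ReLU -> 1x1 conv to shrink the channel count -> 2x2 average pooling
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    x = layers.Conv2D(int(x.shape[-1] * compression), 1)(x)
    x = layers.AveragePooling2D(2, strides=2)(x)
    return x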

DenseNet also uses a global average pooling layer at the end of the network, which averages the feature maps across all spatial dimensions and produces a single feature vector. This feature vector is then passed through a fully connected layer and a softmax activation function to produce class probabilities.
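
The classification head can be sketched in the same style (num_classes is a placeholder for your own dataset):

from tensorflow.keras import layers

def classification_head_sketch(x, num_classes):
    # Average each feature map down to a single value, then classify
    x = layers.GlobalAveragePooling2D()(x)
    return layers.Dense(num_classes, activation='softmax')(x)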

Overall, DenseNet is a powerful architecture that achieves state-of-the-art performance on a variety of computer vision tasks, while also being more parameter-efficient than other popular architectures like VGGNet and ResNet.

Here is a step-by-step breakdown of the working of DenseNet:

  1. Input image: The input to the network is an image of size (H, W, C), where H is the height, W is the width, and C is the number of channels (usually 3 for RGB images).
  2. Convolutional layer: The first layer in the network is a standard convolutional layer; in DenseNet-121 this is a 7x7 convolution with 64 filters and a stride of 2, followed by max pooling. This layer extracts low-level features from the input image.
  3. Dense block: The output of the first convolutional layer is passed through a dense block, which consists of multiple layers. In a dense block, the output of each layer is concatenated with the output of all previous layers and passed as input to the next layer. This allows the network to learn both low-level and high-level features in a single pass.
  4. Transition layer: After each dense block, a transition layer is inserted. The transition layer consists of a batch normalization layer, a 1x1 convolutional layer, and a 2x2 average pooling layer. The batch normalization layer normalizes the activations, the 1x1 convolutional layer reduces the number of feature maps, and the average pooling layer reduces the spatial dimensions of the feature maps.
  5. Global average pooling: At the end of the network, a global average pooling layer is applied to the output feature maps. This layer averages the feature maps across all spatial dimensions and produces a single feature vector.
  6. Fully connected layer: The output of the global average pooling layer is passed through a fully connected layer with one neuron per class, producing a score for each class.
  7. Softmax layer: These scores are passed through a softmax activation function, which produces class probabilities.

The DenseNet architecture is highly modular and can be easily customized for different tasks and datasets by adjusting the number of layers, filters, and blocks.
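
For instance, tracing the channel counts through DenseNet-121 (64 stem filters, growth rate 32, blocks of 6/12/24/16 layers, and 0.5 compression in the transition layers) makes the feature-map growth concrete:

channels = 64  # after the initial 7x7 convolution
growth_rate = 32
for num_layers in (6, 12, 24, 16):
    channels += num_layers * growth_rate      # each layer adds growth_rate feature maps
    print(f"after {num_layers}-layer dense block: {channels} channels")
    if num_layers != 16:                      # no transition after the last block
        channels //= 2                        # 1x1 conv with 0.5 compression
        print(f"after transition layer: {channels} channels")
# Prints the progression 256 -> 128 -> 512 -> 256 -> 1024 -> 512 -> 1024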

Implementation

Here's an example of how to implement a DenseNet-121 model using the Keras framework:

DenseNet121 CNN Architecture
  • We first import the necessary modules from the TensorFlow framework: tf, along with layers and models from tensorflow.keras.
  • Then we define the densenet121 function that will build and return the DenseNet-121 model.
  • The input_shape variable is defined as a tuple of 3 integers representing the dimensions of the input image.
  • We create an input layer using the Input class from keras.
  • The initial convolutional layer has 64 filters with a kernel size of 7x7, a stride of 2, and same padding to preserve the spatial dimensions of the input image.
  • Batch normalization and ReLU activation functions are applied to the output of the convolutional layer.
  • The output is then max-pooled with a pool size of 3x3, a stride of 2, and same padding.
import tensorflow as tf
from tensorflow.keras import layers, models

# Define the DenseNet-121 model
def densenet121():
    input_shape = (224, 224, 3)

    # Input layer
    inputs = tf.keras.Input(shape=input_shape)

    # Initial convolutional layer
    x = layers.Conv2D(64, 7, strides=2, padding='same')(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    x = layers.MaxPooling2D(3, strides=2, padding='same')(x)
  • We then define the first dense block consisting of 6 layers with 64 filters per layer. The dense_block function is called with x as the input, 64 as the growth rate, and 6 as the number of layers.
  • The output of the dense block and the number of filters are returned and stored in x and nb_filters, respectively.
  • We then add a transition layer to reduce the number of feature maps. The transition_layer function is called with x and nb_filters as inputs.
    # Dense block 1
    x, nb_filters = dense_block(x, 64, 6)

    # Transition layer 1
    x = transition_layer(x, nb_filters)
  • We repeat the process with the second dense block, consisting of 12 layers with 128 filters per layer.
  • The output of the dense block and the number of filters are returned and stored in x and nb_filters, respectively.
  • We then add another transition layer to further reduce the number of feature maps.
    # Dense block 2
    x, nb_filters = dense_block(x, 128, 12)

    # Transition layer 2
    x = transition_layer(x, nb_filters)
  • We repeat the process with the third dense block, consisting of 24 layers with 256 filters per layer.
  • The output of the dense block and the number of filters are returned and stored in x and nb_filters, respectively.
  • We then add another transition layer to further reduce the number of feature maps.
    # Dense block 3
    x, nb_filters = dense_block(x, 256, 24)

    # Transition layer 3
    x = transition_layer(x, nb_filters)
  • We add the final dense block, consisting of 16 layers with 512 filters per layer.
  • The output of the dense block and the number of filters are returned and stored in x and nb_filters, respectively.

    # Dense block 4
    x, nb_filters = dense_block(x, 512, 16)
  • We then apply global average pooling to the output of the final dense block.
  • The output of the global average pooling layer is fed into a fully connected layer with 1000 units and a softmax activation function.
  • Finally, we create the Model instance using the inputs and outputs layers defined earlier.
  • The model is then returned as the output of the densenet121 function.
    # Global average pooling and output
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(1000, activation='softmax')(x)

    # Instantiate the model
    model = models.Model(inputs, outputs)

    return model
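
The snippet above calls two helpers, dense_block and transition_layer, that are not defined in it. A minimal sketch of what they could look like, following the BN-ReLU-Conv composite layers and compression described earlier and reusing the layers module imported above (the complete code below defines equivalent helpers with slightly different signatures):

def dense_block(x, growth_rate, num_layers):
    for _ in range(num_layers):
        y = layers.BatchNormalization()(x)
        y = layers.Activation('relu')(y)
        y = layers.Conv2D(growth_rate, 3, padding='same')(y)
        x = layers.Concatenate()([x, y])
    # Return the block output together with its channel count
    return x, x.shape[-1]

def transition_layer(x, nb_filters, compression=0.5):
    # Halve the number of feature maps and the spatial resolution
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    x = layers.Conv2D(int(nb_filters * compression), 1)(x)
    x = layers.AveragePooling2D(2, strides=2)(x)
    return x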

Complete Code:

from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, AveragePooling2D, GlobalAveragePooling2D, Dense, BatchNormalization, Activation, concatenate
from tensorflow.keras.models import Model

def conv_block(x, growth_rate):
    # Composite layer: BN -> ReLU -> 3x3 conv, concatenated with its input
    x1 = BatchNormalization()(x)
    x1 = Activation('relu')(x1)
    x1 = Conv2D(growth_rate, (3, 3), padding='same')(x1)
    x = concatenate([x, x1], axis=-1)
    return x

def dense_block(x, num_layers, growth_rate):
    # Stack num_layers composite layers; each adds growth_rate feature maps
    for i in range(num_layers):
        x = conv_block(x, growth_rate)
    return x

def transition_block(x, compression):
    # Compress the channel count and halve the spatial dimensions
    num_filters = int(x.shape[-1] * compression)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Conv2D(num_filters, (1, 1))(x)
    x = AveragePooling2D((2, 2))(x)
    return x

def DenseNet121(input_shape, num_classes):
    inputs = Input(shape=input_shape)

    # Initial convolution layer
    x = Conv2D(64, (7, 7), strides=(2, 2), padding='same')(inputs)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = MaxPooling2D(pool_size=(3, 3), strides=(2, 2), padding='same')(x)

    # Dense blocks (6, 12, 24, 16 layers) separated by transition blocks
    x = dense_block(x, num_layers=6, growth_rate=32)
    x = transition_block(x, compression=0.5)
    x = dense_block(x, num_layers=12, growth_rate=32)
    x = transition_block(x, compression=0.5)
    x = dense_block(x, num_layers=24, growth_rate=32)
    x = transition_block(x, compression=0.5)
    x = dense_block(x, num_layers=16, growth_rate=32)

    # Final layers
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = GlobalAveragePooling2D()(x)
    x = Dense(num_classes, activation='softmax')(x)

    model = Model(inputs, x)
    return model

Explaining the complete code section-wise:

  1. First, we import the necessary layers and the Model class from tensorflow.keras.
  2. conv_block defines a single composite layer (batch normalization, ReLU activation, and a 3x3 convolution) whose output is concatenated with its input along the channel axis.
  3. dense_block stacks num_layers of these composite layers, so each layer adds growth_rate new feature maps to the running concatenation.
  4. transition_block reduces the number of feature maps with a 1x1 convolution (scaled by the compression factor) and halves the spatial dimensions with 2x2 average pooling.
  5. DenseNet121 assembles the full network: an initial 7x7 convolution and max pooling, four dense blocks of 6, 12, 24, and 16 layers with a growth rate of 32, separated by transition blocks, followed by batch normalization, ReLU, global average pooling, and a final softmax classification layer defined with the Dense layer from tensorflow.keras.layers.
  6. To train the model, we compile it with the adam optimizer, the categorical_crossentropy loss function, and the accuracy metric, and generate batches of data from the training and validation directories with ImageDataGenerator, specifying the batch size, image size, and class mode. We then fit the model for a chosen number of epochs.
  7. After training, we evaluate the model on the test set, as shown in the sketch below.
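
A minimal training sketch along those lines, assuming hypothetical train/, val/, and test/ directories that each contain one sub-folder per class (the class count of 10 is just an example):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Build and compile the model defined above
model = DenseNet121(input_shape=(224, 224, 3), num_classes=10)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Batches of labelled images from (hypothetical) directories, one sub-folder per class
datagen = ImageDataGenerator(rescale=1.0 / 255)
train_gen = datagen.flow_from_directory('train/', target_size=(224, 224), batch_size=32, class_mode='categorical')
val_gen = datagen.flow_from_directory('val/', target_size=(224, 224), batch_size=32, class_mode='categorical')
test_gen = datagen.flow_from_directory('test/', target_size=(224, 224), batch_size=32, class_mode='categorical')

# Train, then evaluate on the held-out test set
model.fit(train_gen, validation_data=val_gen, epochs=10)
model.evaluate(test_gen)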

Conclusion

In conclusion, DenseNet-121 is a powerful CNN architecture that has shown excellent performance in various computer vision tasks. It allows for feature reuse and effective gradient propagation, which helps to mitigate the vanishing gradient problem commonly found in deep networks. Dense connections between layers strengthen feature propagation and enable feature reuse, leading to more efficient and accurate model training. The use of batch normalization, ReLU activation, and dense blocks also contributes to the model's high performance.

Overall, DenseNet-121 is a highly effective architecture that has shown superior performance compared to other popular CNN architectures. Its strengths in feature reuse, gradient propagation, and efficient parameter usage make it a valuable tool for a wide range of computer vision tasks.

That's it for now… I hope you liked my blog and got to know about DenseNet CNN, its working, and the example I used while implementing the code.

In the next blog, I will be discussing the MobileNet CNN.

If you feel my blogs are helpful, please share them with others.

Till then, stay tuned for the next blog…

***Next Blog***
