DLOA (Part-20)-MobileNet CNN and Implementation

Dewansh Singh
7 min read · May 20, 2023


Hey readers, I hope you are all doing well, safe, and sound. I hope you have already read the previous blog, which briefly discussed and implemented DenseNet CNN. If you haven't read it, you can go through this link. In this blog, we'll discuss MobileNet CNN, how it works, and its implementation.

Introduction

MobileNet is a convolutional neural network (CNN) architecture designed for efficient and lightweight deep learning models. It was developed by Andrew G. Howard et al. from Google Research. The primary objective of MobileNet is to provide a solution for running deep neural networks on resource-constrained devices such as mobile phones, embedded systems, and other low-power devices.

MobileNet CNN Architecture

The key idea behind MobileNet is to use depthwise separable convolutions instead of traditional convolutions to reduce the computational complexity and model size. Depthwise separable convolutions split the standard convolution operation into two separate operations: a depthwise convolution and a pointwise convolution.

  • Depthwise Convolution:
  1. The depthwise convolution performs a separate convolution on each input channel independently, using a single filter per channel. This reduces the number of parameters and computations compared to standard convolutions.
  2. In MobileNet it is a 3x3 depthwise convolution with a stride of 1 (or 2 when the block downsamples), followed by batch normalization and a ReLU activation function.
  3. Depthwise convolution helps capture spatial information within each channel.
  • Pointwise Convolution:
  1. The pointwise convolution applies a 1x1 convolution to the output of the depthwise convolution to combine information across channels.
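  2. The number of 1x1 filters sets the number of output channels, so this step performs cross-channel feature combinations and can keep, expand, or reduce the channel dimension. In MobileNetV1 it keeps or increases it (for example, 32 to 64 channels).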
  3. Pointwise convolution helps mix and transform features from different channels.
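The two stages can be sketched in plain NumPy to make the difference concrete. This is a minimal, unoptimized illustration assuming a channels-last layout, stride 1, "same" zero padding, and cross-correlation (which is what deep learning frameworks compute as "convolution"); the function names are hypothetical, not from any library.

```python
import numpy as np

def depthwise_conv2d(x, kernels):
    """Depthwise convolution, stride 1, 'same' zero padding.

    x:       (H, W, C) input feature map
    kernels: (k, k, C) one k x k filter per input channel
    """
    h, w, c = x.shape
    k = kernels.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))  # zero-pad height and width only
    out = np.zeros((h, w, c), dtype=x.dtype)
    for i in range(h):
        for j in range(w):
            window = xp[i:i + k, j:j + k, :]  # (k, k, C)
            # Each channel is filtered only by its own kernel: no mixing across channels
            out[i, j, :] = np.sum(window * kernels, axis=(0, 1))
    return out

def pointwise_conv2d(x, weights):
    """1x1 convolution: weights (C_in, C_out) mixes channels independently at every pixel."""
    return x @ weights  # (H, W, C_in) @ (C_in, C_out) -> (H, W, C_out)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 3))   # 8x8 feature map with 3 channels
dw = rng.standard_normal((3, 3, 3))  # one 3x3 filter per channel
pw = rng.standard_normal((3, 16))    # 16 pointwise (1x1) filters

y = pointwise_conv2d(depthwise_conv2d(x, dw), pw)
print(y.shape)  # (8, 8, 16)
```

The depthwise stage keeps the channel count unchanged (3 in, 3 out); only the pointwise stage changes it (3 to 16).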

By separating the convolution operation into these two stages, MobileNet achieves a significant reduction in the number of parameters and computations while still maintaining good accuracy. The depthwise separable convolutions enable the network to learn efficient and compact representations of the input data.
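The size of that reduction can be worked out directly. Using the notation of the MobileNet paper (D_K = kernel size, M = input channels, N = output channels, D_F = output feature map width and height), a standard convolution costs D_K·D_K·M·N·D_F·D_F multiply-adds, while a depthwise separable one costs D_K·D_K·M·D_F·D_F + M·N·D_F·D_F. Their ratio is 1/N + 1/D_K², so with 3x3 kernels the separable version needs roughly 8 to 9 times less computation. The helper names below are illustrative, and the 512-channel, 14x14 layer is just an example size.

```python
def standard_conv_mult_adds(dk, m, n, df):
    """Multiply-adds of a standard dk x dk convolution (M in, N out, DF x DF output)."""
    return dk * dk * m * n * df * df

def separable_conv_mult_adds(dk, m, n, df):
    """Multiply-adds of a depthwise dk x dk convolution plus a 1x1 pointwise convolution."""
    depthwise = dk * dk * m * df * df  # one dk x dk filter per input channel
    pointwise = m * n * df * df        # 1x1 convolution mapping M -> N channels
    return depthwise + pointwise

# Example: a 3x3 layer mapping 512 -> 512 channels on a 14x14 feature map
std = standard_conv_mult_adds(3, 512, 512, 14)
sep = separable_conv_mult_adds(3, 512, 512, 14)
print(std, sep, round(std / sep, 2))
```

Ignoring biases, the parameter counts follow the same pattern: D_K·D_K·M·N weights for the standard convolution versus D_K·D_K·M + M·N for the separable pair.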

Architecture of MobileNet CNN

The MobileNet architecture consists of several building blocks:

(Figure: MobileNet CNN Architecture Example)
  • Input Layer:
  1. The input layer receives the input image data.
  • Convolutional Layers:
  1. MobileNet starts with a standard 3x3 convolutional layer with a stride of 2, which reduces the spatial dimensions of the input image.
  2. Following the initial convolutional layer, a stack of depthwise separable convolutional layers is used.
  3. Each depthwise separable convolutional layer is composed of a depthwise convolution followed by batch normalization and ReLU activation, then a pointwise (1x1) convolution, again followed by batch normalization and ReLU activation.
  • Depthwise Separable Convolution Blocks:
  1. MobileNet typically consists of multiple depthwise separable convolution blocks, which are stacked together to form the network.
  2. These blocks gradually increase the complexity and abstractness of the learned features.
  • Downsampling:
  1. To reduce the spatial dimensions of the feature maps, MobileNet uses strided convolutions (stride 2 in the first standard convolution and in some of the depthwise convolutions) rather than max pooling layers.
  2. The classification network does not upsample. Upsampling layers such as bilinear interpolation only appear when MobileNet is used as a backbone for dense prediction tasks such as segmentation.
  • Global Average Pooling:
  1. At the end of the network, a global average pooling layer is applied to reduce the spatial dimensions to a single vector.
  2. Global average pooling computes the average value of each feature map, resulting in a fixed-length feature vector.
  • Fully Connected Layer:
  1. MobileNet typically ends with a fully connected layer followed by a softmax activation function for classification tasks.
  2. The fully connected layer maps the extracted features to the number of target classes.
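To see how the strided layers shrink a 224x224 input, note that a convolution with padding='same' produces an output of size ceil(input / stride). The sketch below assumes the stride pattern of MobileNetV1's stem convolution and its 13 depthwise separable blocks, as used in the Keras `tf.keras.applications.MobileNet` implementation (the last block uses stride 1).

```python
import math

def same_padding_output(size, stride):
    """Spatial size after a convolution with padding='same': ceil(size / stride)."""
    return math.ceil(size / stride)

# Stride of the stem convolution, followed by the strides of the 13 depthwise convolutions
strides = [2] + [1, 2, 1, 2, 1, 2, 1, 1, 1, 1, 1, 2, 1]

size = 224
sizes = [size]
for s in strides:
    size = same_padding_output(size, s)
    sizes.append(size)
print(sizes)
```

Five stride-2 layers divide the resolution by 2^5 = 32, so the network ends with a 7x7 feature map, which global average pooling then reduces to a single vector.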

MobileNet has several versions, MobileNetV1, MobileNetV2, and MobileNetV3, each improving on its predecessor. MobileNetV2 introduced inverted residual blocks with linear bottlenecks, and MobileNetV3 added squeeze-and-excitation blocks and improved activation functions (h-swish) to further enhance the efficiency and accuracy of the models.
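As an illustration of the MobileNetV2 idea, here is a minimal Keras sketch of an inverted residual block with a linear bottleneck. It assumes the choices described in the MobileNetV2 paper (expansion factor 6, ReLU6 activations, no activation after the projection); the function name `inverted_residual_block` and the 56x56x24 input are illustrative, not taken from the Keras source.

```python
import tensorflow as tf
from tensorflow.keras import layers

def inverted_residual_block(x, out_channels, stride=1, expansion=6):
    """Sketch of a MobileNetV2-style inverted residual block with a linear bottleneck."""
    in_channels = x.shape[-1]
    # 1x1 expansion: widen the narrow input representation
    h = layers.Conv2D(in_channels * expansion, 1, padding='same', use_bias=False)(x)
    h = layers.BatchNormalization()(h)
    h = layers.ReLU(max_value=6.0)(h)  # ReLU6
    # 3x3 depthwise convolution in the expanded space
    h = layers.DepthwiseConv2D(3, strides=stride, padding='same', use_bias=False)(h)
    h = layers.BatchNormalization()(h)
    h = layers.ReLU(max_value=6.0)(h)
    # 1x1 linear projection back to a narrow bottleneck: no activation here
    h = layers.Conv2D(out_channels, 1, padding='same', use_bias=False)(h)
    h = layers.BatchNormalization()(h)
    # Residual connection only when input and output shapes match
    if stride == 1 and in_channels == out_channels:
        h = layers.Add()([x, h])
    return h

inputs = tf.keras.Input(shape=(56, 56, 24))
outputs = inverted_residual_block(inputs, out_channels=24)
block = tf.keras.Model(inputs, outputs)
print(tuple(outputs.shape))  # (None, 56, 56, 24)
```

Unlike a classic residual block, the skip connection joins the thin (bottleneck) ends, while the depthwise convolution works on the wide, expanded tensor.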

Implementation

Here’s an example implementation of MobileNet using the TensorFlow framework:

  • Importing Dependencies: The code imports the necessary libraries, including TensorFlow and the layers module from tensorflow.keras.
import tensorflow as tf
from tensorflow.keras import layers
  • Defining the MobileNet Model Function: The mobilenet() function is defined to create an instance of the MobileNet model. The function takes no arguments.
  • Input Shape: The input_shape variable is set to (224, 224, 3), representing the desired input shape of the model. This means the input image should have dimensions of 224x224 pixels with 3 color channels (RGB).
  • Input Layer: The inputs variable is set to tf.keras.Input(shape=input_shape), which creates the input layer of the model with the specified input shape.
# Define the MobileNet model
def mobilenet():
    input_shape = (224, 224, 3)

    # Input layer
    inputs = tf.keras.Input(shape=input_shape)
  • Convolutional Layers: The code defines a series of convolutional layers using the layers.Conv2D function. The first convolutional layer applies a 3x3 kernel with a stride of 2 to downsample the input image. Batch normalization (layers.BatchNormalization) and ReLU activation (layers.ReLU) are applied after each convolutional layer.
    # Convolutional layers
    x = layers.Conv2D(32, (3, 3), strides=(2, 2), padding='same')(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
  • Depthwise Separable Convolutions: The code uses depthwise separable convolutions to reduce the number of parameters and computations. Depthwise convolution is performed with a small kernel size (3x3) and a stride of 1. Pointwise convolution (1x1) is used to combine information across channels.
    # Depthwise separable convolutions
    x = layers.DepthwiseConv2D((3, 3), padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(64, (1, 1), padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
  • Downsampling: The next block sets strides=(2, 2) on the depthwise convolution, which halves the height and width of the feature maps. MobileNet downsamples with strided convolutions like this one rather than with pooling layers. No upsampling is needed in this classification model.
    x = layers.DepthwiseConv2D((3, 3), strides=(2, 2), padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(128, (1, 1), padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
  • Global Average Pooling: The code applies global average pooling to reduce the spatial dimensions of the feature maps to a single vector. Global average pooling computes the average value of each feature map, resulting in a fixed-length feature vector.
    x = layers.GlobalAveragePooling2D()(x)
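To check that global average pooling is simply a per-channel mean over the height and width axes, the layer can be compared with a NumPy mean. The random 2x7x7x1024 batch below is placeholder data, sized like MobileNet's final feature maps.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Placeholder batch: 2 feature maps, each 7x7 with 1024 channels
features = np.random.default_rng(0).standard_normal((2, 7, 7, 1024)).astype("float32")

pooled = layers.GlobalAveragePooling2D()(features)
manual = features.mean(axis=(1, 2))  # average over height and width

print(tuple(pooled.shape))  # (2, 1024)
print(np.allclose(np.asarray(pooled), manual, atol=1e-5))  # True
```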
  • Output Layer: The outputs variable is set to layers.Dense(1000, activation='softmax')(x), which represents the output layer of the model. This layer consists of a fully connected (dense) layer with 1000 units, followed by a softmax activation function for classification.
    # Output layer
    outputs = layers.Dense(1000, activation='softmax')(x)
  • Create the Model: The model variable is set to tf.keras.Model(inputs=inputs, outputs=outputs), which creates the model using the input and output layers.
    # Create the model
    model = tf.keras.Model(inputs=inputs, outputs=outputs)

    return model

# Create an instance of the MobileNet model
model = mobilenet()
  • Print Model Summary: model.summary() prints a summary of the model, including the layer types, shapes, and the total number of parameters.
# Print the model summary
model.summary()
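The parameter savings reported by model.summary() can be checked on a single layer. This sketch compares a standard 3x3 convolution with the depthwise separable pair used in the first block above (32 input channels, 64 output channels, biases included); the helper `total_params` and the 56x56 input size are illustrative choices.

```python
import tensorflow as tf
from tensorflow.keras import layers

def total_params(*layer_stack, input_shape=(56, 56, 32)):
    """Apply the given layers to a fixed input and return the model's total parameter count."""
    inputs = tf.keras.Input(shape=input_shape)
    x = inputs
    for layer in layer_stack:
        x = layer(x)
    return tf.keras.Model(inputs, x).count_params()

# Standard 3x3 convolution, 32 -> 64 channels: 3*3*32*64 weights + 64 biases
standard = total_params(layers.Conv2D(64, (3, 3), padding='same'))
# Depthwise 3x3 (3*3*32 + 32) followed by pointwise 1x1 (32*64 + 64)
separable = total_params(
    layers.DepthwiseConv2D((3, 3), padding='same'),
    layers.Conv2D(64, (1, 1), padding='same'),
)
print(standard, separable)  # 18496 2432
```

The separable version needs about 7.6 times fewer parameters here, and the gap grows as the number of output channels increases.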

This code defines the MobileNet model architecture using the TensorFlow framework. It starts with an input layer of shape (224, 224, 3) to match the size of the input image. Then, it builds a series of convolutional layers, depthwise separable convolutions, and batch normalization layers to capture and transform the image features. Finally, it ends with a global average pooling layer and a fully connected output layer with a softmax activation function.

To use the model, compile it with an appropriate optimizer and loss function, then train it on your dataset. You can also start from pretrained weights, such as ImageNet weights, to reuse features learned on a large-scale dataset.
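A minimal sketch of compiling and training follows. It uses the reference `tf.keras.applications.MobileNet` model rather than the hand-built one, because that is the model Keras ships ImageNet weights for (it also differs in details such as using ReLU6). Here `weights=None` builds it with random weights so nothing is downloaded; `weights='imagenet'` would load pretrained weights instead, which requires `classes=1000` when `include_top=True`. The 10-class setup, 128x128 input, and random data are placeholders.

```python
import numpy as np
import tensorflow as tf

# weights=None: random initialization, so no download is needed.
# With weights='imagenet', keep classes=1000 or set include_top=False.
model = tf.keras.applications.MobileNet(
    input_shape=(128, 128, 3), weights=None, classes=10
)
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss='sparse_categorical_crossentropy',  # integer class labels
    metrics=['accuracy'],
)

# Placeholder data: 16 random images with integer labels in [0, 10)
rng = np.random.default_rng(0)
x_train = rng.random((16, 128, 128, 3), dtype=np.float32)
y_train = rng.integers(0, 10, size=(16,))

history = model.fit(x_train, y_train, batch_size=8, epochs=1, verbose=0)
preds = model.predict(x_train[:2], verbose=0)
print(preds.shape)  # (2, 10)
```

The same `compile` and `fit` calls work for the model returned by `mobilenet()` above; only the input size and label format need to match your data.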

Note: The code above only defines the architecture of the MobileNet model. Data preparation, training, and evaluation need to be implemented separately for your specific use case and dataset.

Complete Code

import tensorflow as tf
from tensorflow.keras import layers

# Define the MobileNet model
def mobilenet():
    input_shape = (224, 224, 3)

    # Input layer
    inputs = tf.keras.Input(shape=input_shape)

    # Convolutional layers: standard 3x3 conv with stride 2 (224x224 -> 112x112)
    x = layers.Conv2D(32, (3, 3), strides=(2, 2), padding='same')(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)

    # Depthwise separable convolutions
    # Block 1: 64 filters, stride 1 (112x112)
    x = layers.DepthwiseConv2D((3, 3), padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(64, (1, 1), padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)

    # Block 2: 128 filters, stride 2 (112x112 -> 56x56)
    x = layers.DepthwiseConv2D((3, 3), strides=(2, 2), padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(128, (1, 1), padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)

    # Block 3: 128 filters, stride 1 (56x56)
    x = layers.DepthwiseConv2D((3, 3), padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(128, (1, 1), padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)

    # Block 4: 256 filters, stride 2 (56x56 -> 28x28)
    x = layers.DepthwiseConv2D((3, 3), strides=(2, 2), padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(256, (1, 1), padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)

    # Block 5: 256 filters, stride 1 (28x28)
    x = layers.DepthwiseConv2D((3, 3), padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(256, (1, 1), padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)

    # Block 6: 512 filters, stride 2 (28x28 -> 14x14)
    x = layers.DepthwiseConv2D((3, 3), strides=(2, 2), padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(512, (1, 1), padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)

    # Blocks 7-11: five 512-filter blocks, stride 1 (14x14)
    for _ in range(5):
        x = layers.DepthwiseConv2D((3, 3), padding='same')(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
        x = layers.Conv2D(512, (1, 1), padding='same')(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)

    # Block 12: 1024 filters, stride 2 (14x14 -> 7x7)
    x = layers.DepthwiseConv2D((3, 3), strides=(2, 2), padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(1024, (1, 1), padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)

    # Block 13: 1024 filters, stride 1 (7x7), the last block of MobileNetV1
    x = layers.DepthwiseConv2D((3, 3), padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(1024, (1, 1), padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)

    # Global average pooling: 7x7x1024 -> 1024-dimensional vector
    x = layers.GlobalAveragePooling2D()(x)

    # Output layer
    outputs = layers.Dense(1000, activation='softmax')(x)

    # Create the model
    model = tf.keras.Model(inputs=inputs, outputs=outputs)

    return model

# Create an instance of the MobileNet model
model = mobilenet()

# Print the model summary
model.summary()
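Since every depthwise separable block has the same structure, the network can also be described by a short configuration list. This compact sketch follows the 13-block layout of MobileNetV1 (pointwise filters and depthwise stride per block); the names `depthwise_separable_block`, `BLOCK_CONFIG`, and `mobilenet_compact` are illustrative, not part of any library.

```python
import tensorflow as tf
from tensorflow.keras import layers

def depthwise_separable_block(x, filters, stride):
    """Depthwise 3x3 conv -> BN -> ReLU, then pointwise 1x1 conv -> BN -> ReLU."""
    x = layers.DepthwiseConv2D((3, 3), strides=stride, padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(filters, (1, 1), padding='same')(x)
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)

# (pointwise filters, depthwise stride) for each of the 13 blocks
BLOCK_CONFIG = [
    (64, 1), (128, 2), (128, 1), (256, 2), (256, 1), (512, 2),
    (512, 1), (512, 1), (512, 1), (512, 1), (512, 1),
    (1024, 2), (1024, 1),
]

def mobilenet_compact(input_shape=(224, 224, 3), num_classes=1000):
    inputs = tf.keras.Input(shape=input_shape)
    # Stem: standard 3x3 convolution with stride 2
    x = layers.Conv2D(32, (3, 3), strides=(2, 2), padding='same')(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    for filters, stride in BLOCK_CONFIG:
        x = depthwise_separable_block(x, filters, stride)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(num_classes, activation='softmax')(x)
    return tf.keras.Model(inputs=inputs, outputs=outputs)

model = mobilenet_compact()
model.summary()
```

Keeping the layout in data makes it easy to try narrower or shallower variants by editing `BLOCK_CONFIG` instead of copying layer code.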

Conclusion

The main advantage of MobileNet is its ability to achieve good accuracy with a significantly smaller model size compared to traditional convolutional neural networks. It achieves this by using depthwise separable convolutions, which separate the spatial and channel-wise operations, reducing the number of parameters and computations.

However, due to its smaller model size and reduced complexity, MobileNet may not perform as well as larger and more complex models like ResNet or Inception on certain challenging datasets. It is important to consider the trade-off between model size, efficiency, and accuracy when choosing a CNN architecture.

Overall, MobileNet is a powerful tool for deploying deep learning models on resource-constrained devices without compromising too much on performance. Its efficiency and compactness make it a popular choice for applications where computational resources are limited or where real-time processing is required.

That's it for now… I hope you liked my blog and learned about MobileNet CNN, how it works, and how to implement it with the example above.

In the next blog, I will be discussing Recurrent Neural Networks (RNNs).

If you feel my blogs are helpful, please share them with others.

Till then, stay tuned for the next blog…

***Next Blog***


Dewansh Singh

Software Engineer Intern @BirdVision | ML | Azure | Data Science | AWS | Ex-Data Science Intern @Celebal Technologies