Creating MobileNetsV2 with TensorFlow from scratch

Sumeet Badgujar
Published in Analytics Vidhya
6 min read, Jul 17, 2021

MobileNet models are very small and have low latency, so they can be deployed easily on mobile and embedded edge devices.

In this blog, we will look at the improved version of MobileNet, i.e. version 2: MobileNetV2. The first version of MobileNet, and how to build it with TensorFlow, is explained in my previous post. In version 1, the authors used depthwise separable convolutions to reduce the computation. In version 2, the authors reduced the computation further by cutting the number of trainable parameters, with little or no drop in accuracy.

MobileNetV2 paper link: http://export.arxiv.org/pdf/1801.04381

GOOD, then what’s changed in V2?

V2 introduces a new convolution block built on inverted residuals and linear bottlenecks. In this block, the features from a lower-dimensional representation are first scaled up. Then a depthwise convolution is applied, and finally the features are compressed back into a lower-dimensional representation.

Here’s a pic of the block to make it easier to understand the intuition behind the structure of the block.

Figure 1: Representation of the block.

The expansion of the layer is decided by a scalar variable t, the expansion factor. This variable is multiplied by the number of input channels (the filter count) to get the expanded width. The compression is decided by the number of output channels we want, i.e. the filter count of the projection layer.
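
As a small worked example of that arithmetic, here is a sketch that tracks the channel count through one block. The helper name `bottleneck_channels` and the sample numbers are made up for illustration and are not part of the model code.

```python
def bottleneck_channels(in_channels, t, out_channels):
    """Return the channel counts seen in one block: (input, expanded, output)."""
    # The 1x1 expansion widens the tensor by the factor t (illustrative helper).
    expanded = t * in_channels
    # The 1x1 projection then compresses it to the requested output width.
    return in_channels, expanded, out_channels


print(bottleneck_channels(24, 6, 32))  # (24, 144, 32)
```

So a 24-channel input with t = 6 is processed at 144 channels inside the block and leaves with 32 channels.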

Now let’s create these layers in Python. This time we are going to use ReLU6 as the activation function, written ReLU(6) in Keras. ReLU6 is a ReLU whose output is capped at 6, and it is widely used in mobile-oriented models and papers. How and why? That’s a discussion for another time.
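
To make the activation concrete, here is a minimal pure-Python sketch of what ReLU6 does to a single value. The function `relu6` below is an illustration only; the model itself uses the Keras `ReLU` layer with a maximum value of 6.

```python
def relu6(x):
    """ReLU capped at 6: negatives become 0, values above 6 become 6."""
    return min(max(x, 0.0), 6.0)


print([relu6(v) for v in (-3.0, 0.5, 6.0, 9.0)])  # [0.0, 0.5, 6.0, 6.0]
```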

from tensorflow.keras.layers import (Input, Conv2D, DepthwiseConv2D, BatchNormalization,
                                     ReLU, GlobalAveragePooling2D, Dense, add)
from tensorflow.keras.models import Model

def expansion_block(x, t, filters, block_id):
    # 1x1 convolution that widens the tensor to t * filters channels
    prefix = 'block_{}_'.format(block_id)
    total_filters = t * filters
    x = Conv2D(total_filters, 1, padding='same', use_bias=False, name=prefix + 'expand')(x)
    x = BatchNormalization(name=prefix + 'expand_bn')(x)
    x = ReLU(6, name=prefix + 'expand_relu')(x)
    return x

def depthwise_block(x, stride, block_id):
    # 3x3 depthwise convolution: one filter per channel
    prefix = 'block_{}_'.format(block_id)
    x = DepthwiseConv2D(3, strides=(stride, stride), padding='same', use_bias=False, name=prefix + 'depthwise_conv')(x)
    x = BatchNormalization(name=prefix + 'dw_bn')(x)
    x = ReLU(6, name=prefix + 'dw_relu')(x)
    return x

def projection_block(x, out_channels, block_id):
    # 1x1 convolution that compresses to out_channels, with no activation afterwards
    prefix = 'block_{}_'.format(block_id)
    x = Conv2D(filters=out_channels, kernel_size=1, padding='same', use_bias=False, name=prefix + 'compress')(x)
    x = BatchNormalization(name=prefix + 'compress_bn')(x)
    return x

Since this is a residual block, there has to be an addition: the layers added are the input and the output of the block. The intuition behind residuals is that they create a highway between the first and last layer of the block, making the flow of data, i.e. the gradients, easier.

Here’s a pic of the residual block used in MobileNetV2.

Figure 2: Bottleneck Residual Block

You might notice that there is no activation layer after the batch normalization layer in the projection layer. That’s because the authors found that, since it compresses the data to a lower dimension, applying a non-linearity to it would destroy useful information.

Using low-dimension tensors is the key to reducing the number of computations. After all, the smaller the tensor, the fewer multiplications the convolutional layers have to do.
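
To put rough numbers on this, the sketch below counts the multiply-adds of the three stages of one block, following the h·w·d'·t·(d' + k² + d'') cost given in the MobileNetV2 paper. The function name and the 56x56x24 example are assumptions chosen for illustration, and stride 1 is assumed throughout.

```python
def bottleneck_mult_adds(h, w, c_in, c_out, t, k=3):
    """Multiply-adds of one stride-1 bottleneck block, per stage (illustrative)."""
    expanded = t * c_in
    expand = h * w * c_in * expanded       # 1x1 expansion convolution
    depthwise = h * w * expanded * k * k   # k x k depthwise convolution
    project = h * w * expanded * c_out     # 1x1 linear projection
    return expand, depthwise, project


def standard_conv_mult_adds(h, w, c_in, c_out, k=3):
    """Multiply-adds of an ordinary k x k convolution, for comparison."""
    return h * w * c_in * c_out * k * k


stages = bottleneck_mult_adds(56, 56, 24, 24, t=6)
print(stages, sum(stages))
# A regular 3x3 convolution working directly on the expanded 144 channels:
print(standard_conv_mult_adds(56, 56, 144, 144))
```

In this example the whole block costs far less than a single regular 3x3 convolution at the expanded width, because the only spatial filtering is depthwise and the tensors entering and leaving the block are narrow.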

The layer addition can only be done when the two tensors have the same shape, so the channel count must match (and the stride must be 1, otherwise the spatial size changes too). For this, we will use an if statement to check the channel count. Now the Python code for it. Let’s see.

def Bottleneck(x, t, filters, out_channels, stride, block_id):
    y = expansion_block(x, t, filters, block_id)
    y = depthwise_block(y, stride, block_id)
    y = projection_block(y, out_channels, block_id)
    # Add the skip connection only when input and output shapes match
    if stride == 1 and y.shape[-1] == x.shape[-1]:
        y = add([x, y])
    return y
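
The rule for when the skip connection applies can be checked without building any layers. The sketch below traces shapes only; `bottleneck_output` is a hypothetical helper, and it assumes `padding='same'`, where a strided convolution produces ceil(size / stride) outputs per spatial dimension.

```python
import math


def bottleneck_output(in_shape, out_channels, stride):
    """Trace one block: return (output_shape, uses_skip) for an (H, W, C) input."""
    h, w, _ = in_shape
    # With padding='same', each spatial side becomes ceil(side / stride).
    out_shape = (math.ceil(h / stride), math.ceil(w / stride), out_channels)
    # The element-wise add needs identical shapes: same resolution, same channels.
    uses_skip = out_shape == tuple(in_shape)
    return out_shape, uses_skip


print(bottleneck_output((56, 56, 24), 24, 1))  # ((56, 56, 24), True)
print(bottleneck_output((56, 56, 24), 32, 2))  # ((28, 28, 32), False)
```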

Next, we look at how all these blocks and layers fit together, and how to implement them in Python.

Architecture of MobileNetV2 :

Figure 3: The MobileNetV2 architecture (Source: Original MobileNetV2 paper)

MobileNetV2 starts with a basic 2D convolution layer. Then there is a series of Bottleneck layers attached one after another, each defined by an expansion rate (t), a stride (s), a number of output channels (c) and the number of times it is repeated (n).
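
The (t, c, n, s) rows can be unrolled into the individual blocks with a short loop. The sketch below uses the rows from the paper's architecture table; the names `BOTTLENECK_TABLE` and `expand_table` are made up for illustration.

```python
# (t, c, n, s) rows as listed in the MobileNetV2 paper's architecture table.
BOTTLENECK_TABLE = [
    (1, 16, 1, 1),
    (6, 24, 2, 2),
    (6, 32, 3, 2),
    (6, 64, 4, 2),
    (6, 96, 3, 1),
    (6, 160, 3, 2),
    (6, 320, 1, 1),
]


def expand_table(table):
    """Unroll the rows into one (block_id, t, out_channels, stride) per block."""
    blocks = []
    for t, c, n, s in table:
        for i in range(n):
            # Only the first block of a group uses stride s; repeats use stride 1.
            stride = s if i == 0 else 1
            blocks.append((len(blocks) + 1, t, c, stride))
    return blocks


blocks = expand_table(BOTTLENECK_TABLE)
print(len(blocks))  # 17
print(blocks[1])    # (2, 6, 24, 2)
```

Unrolling the table this way gives 17 blocks, and only blocks 2, 4, 7 and 14 use a stride of 2.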

Defining the convolutional block: each convolution after the input is followed by BatchNormalization and then a ReLU activation, and the result is passed to the next block.

The first convolution block has 32 filters of kernel size (3x3) and a stride of 2. As said, it is followed by a BatchNormalization layer and a ReLU activation. These layers can be represented with the following code.

input = Input (input_shape)
x = Conv2D(32,3,strides=(2,2),padding='same', use_bias=False)(input)
x = BatchNormalization(name='conv1_bn')(x)
x = ReLU(6, name='conv1_relu')(x)

Then comes the chain of 17 Bottlenecks. Each Bottleneck is called as shown below.

x = Bottleneck(x, t = 6, filters = x.shape[-1], out_channels = 24, stride = 2,block_id = 2)

At the end, there is a 1x1 convolution layer with stride 1, followed by a GlobalAveragePooling layer and then the final output layer. The output layer is a Dense layer whose size is the number of classes: if there are 3 classes, it is Dense(3). The activation function used is softmax.

x = Conv2D(filters=1280, kernel_size=1, padding='same', use_bias=False, name='last_conv')(x)
x = BatchNormalization(name='last_bn')(x)
x = ReLU(6, name='last_relu')(x)
x = GlobalAveragePooling2D(name='global_average_pool')(x)
output = Dense(n_classes, activation='softmax')(x)
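
To see what reaches this head, the sketch below traces the spatial size through the five stride-2 stages of the network (the first convolution plus four strided bottlenecks) and counts the weights of the Dense layer. Both helper names are hypothetical, and `padding='same'` is assumed.

```python
import math


def final_feature_map(input_size=224, strides=(2, 2, 2, 2, 2)):
    """Apply each stride-2 stage with 'same' padding; return the final side length."""
    size = input_size
    for s in strides:
        size = math.ceil(size / s)
    return size


def dense_params(features, n_classes):
    """Weights plus biases of a fully connected layer."""
    return features * n_classes + n_classes


print(final_feature_map())       # 7
print(dense_params(1280, 3))     # 3843
print(dense_params(1280, 1000))  # 1281000
```

For a 224x224 input the last convolution therefore sees a 7x7 map; global average pooling collapses it to a 1280-dimensional vector, which feeds the Dense layer.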

Now that we have all the blocks, let’s put them together to see the entire MobileNetV2 architecture.

Complete MobileNetV2 architecture:

def MobileNetV2(input_shape=(224, 224, 3), n_classes=1000):
    input = Input(input_shape)
    x = Conv2D(32, 3, strides=(2, 2), padding='same', use_bias=False)(input)
    x = BatchNormalization(name='conv1_bn')(x)
    x = ReLU(6, name='conv1_relu')(x)

    # 17 Bottlenecks
    # Block 1 uses t = 1, so it has no expansion layer
    x = depthwise_block(x, stride=1, block_id=1)
    x = projection_block(x, out_channels=16, block_id=1)
    x = Bottleneck(x, t=6, filters=x.shape[-1], out_channels=24, stride=2, block_id=2)
    x = Bottleneck(x, t=6, filters=x.shape[-1], out_channels=24, stride=1, block_id=3)
    x = Bottleneck(x, t=6, filters=x.shape[-1], out_channels=32, stride=2, block_id=4)
    x = Bottleneck(x, t=6, filters=x.shape[-1], out_channels=32, stride=1, block_id=5)
    x = Bottleneck(x, t=6, filters=x.shape[-1], out_channels=32, stride=1, block_id=6)
    x = Bottleneck(x, t=6, filters=x.shape[-1], out_channels=64, stride=2, block_id=7)
    x = Bottleneck(x, t=6, filters=x.shape[-1], out_channels=64, stride=1, block_id=8)
    x = Bottleneck(x, t=6, filters=x.shape[-1], out_channels=64, stride=1, block_id=9)
    x = Bottleneck(x, t=6, filters=x.shape[-1], out_channels=64, stride=1, block_id=10)
    x = Bottleneck(x, t=6, filters=x.shape[-1], out_channels=96, stride=1, block_id=11)
    x = Bottleneck(x, t=6, filters=x.shape[-1], out_channels=96, stride=1, block_id=12)
    x = Bottleneck(x, t=6, filters=x.shape[-1], out_channels=96, stride=1, block_id=13)
    x = Bottleneck(x, t=6, filters=x.shape[-1], out_channels=160, stride=2, block_id=14)
    x = Bottleneck(x, t=6, filters=x.shape[-1], out_channels=160, stride=1, block_id=15)
    x = Bottleneck(x, t=6, filters=x.shape[-1], out_channels=160, stride=1, block_id=16)
    x = Bottleneck(x, t=6, filters=x.shape[-1], out_channels=320, stride=1, block_id=17)

    # Classification head
    x = Conv2D(filters=1280, kernel_size=1, padding='same', use_bias=False, name='last_conv')(x)
    x = BatchNormalization(name='last_bn')(x)
    x = ReLU(6, name='last_relu')(x)
    x = GlobalAveragePooling2D(name='global_average_pool')(x)
    output = Dense(n_classes, activation='softmax')(x)
    model = Model(input, output)
    return model

Figure 4: Model summary last few layers

And that’s how we can implement the MobileNetV2 architecture.
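
As a sanity check that needs no TensorFlow, the parameter total can be worked out by hand from the layer definitions. The sketch below is an assumption-laden tally, not output from the model: it assumes a 3-channel input, no convolution biases (`use_bias=False`), and four values per channel for each BatchNormalization layer (two trainable, two moving statistics). All function names are made up for illustration.

```python
def conv_params(k, c_in, c_out):
    return k * k * c_in * c_out   # regular convolution, no bias


def depthwise_params(k, channels):
    return k * k * channels       # one k x k filter per channel, no bias


def bn_params(channels):
    return 4 * channels           # gamma, beta, moving mean, moving variance


def bottleneck_params(c_in, c_out, t):
    expanded = c_in * t
    total = 0
    if t != 1:  # the first block (t = 1) has no expansion layer
        total += conv_params(1, c_in, expanded) + bn_params(expanded)
    total += depthwise_params(3, expanded) + bn_params(expanded)
    total += conv_params(1, expanded, c_out) + bn_params(c_out)
    return total


def mobilenet_v2_params(n_classes=1000):
    # (t, c, n) rows: expansion factor, output channels, repeats
    table = [(1, 16, 1), (6, 24, 2), (6, 32, 3), (6, 64, 4),
             (6, 96, 3), (6, 160, 3), (6, 320, 1)]
    total = conv_params(3, 3, 32) + bn_params(32)   # first 3x3 convolution
    c_in = 32
    for t, c, n in table:
        for _ in range(n):
            total += bottleneck_params(c_in, c, t)
            c_in = c
    total += conv_params(1, c_in, 1280) + bn_params(1280)   # last 1x1 convolution
    total += 1280 * n_classes + n_classes                   # Dense weights and biases
    return total


print(mobilenet_v2_params())  # 3538984
```

Under those assumptions the tally comes to about 3.5 million parameters for 1000 classes, which is the figure to compare against the total printed by `model.summary()`.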

To see the code in a more presentable form, check out the Jupyter notebook on GitHub.

References:

@article{Sandler2018MobileNetV2IR, title={MobileNetV2: Inverted Residuals and Linear Bottlenecks}, author={M. Sandler and Andrew G. Howard and Menglong Zhu and A. Zhmoginov and Liang-Chieh Chen}, journal={2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition}, year={2018}, pages={4510–4520} }

