How to calculate the number of parameters in the CNN?


Most machine learning engineers, software developers and students interested in machine learning have worked with a Convolutional Neural Network, also called a CNN. We have a general idea of how the network is trained to classify an image. But people who are new to machine learning and neural networks often don’t quite understand how exactly the CNN learns its parameters.

We know that in each conv layer the network tries to understand basic patterns. For example, in the first layer the network tries to learn patterns and edges. In the second layer, it tries to understand shapes, colors and so on. A final layer, called the feature layer or fully connected layer, tries to classify the image.

Before we learn about parameters, we need to know some basic concepts of convolutional networks, which are very helpful when modifying or reusing source code.

There are various layers in a CNN.

Input Layer: All the input layer does is read the image, so there are no parameters to learn here.

Convolutional Layer: Consider a convolutional layer which takes “l” feature maps as input and has “k” feature maps as output. The filter size is “n*m”.

Image Source: https://i.stack.imgur.com/2r4XG.png

Here the input has l=32 feature maps, the output has k=64 feature maps, and the filter size is n=3 and m=3. It is important to understand that we don’t simply have a 3*3 filter; we actually have a 3*3*32 filter, as our input has 32 dimensions. As the output of the conv layer, we learn 64 different 3*3*32 filters, so the total number of weights is “n*m*k*l”. There is also one bias term for each output feature map. So the total number of parameters is “(n*m*l+1)*k”.
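The formula above can be checked with a few lines of Python. This is only a minimal sketch: the function name `conv_layer_params` and the `use_bias` flag are illustrative choices, not part of any library.

```python
def conv_layer_params(n: int, m: int, l: int, k: int, use_bias: bool = True) -> int:
    """Learnable parameters of a conv layer with n*m filters,
    l input feature maps and k output feature maps."""
    weights = n * m * l * k        # k filters, each of size n*m*l
    biases = k if use_bias else 0  # one bias per output feature map
    return weights + biases


# Values from the example above: 3*3 filters, 32 input maps, 64 output maps.
print(conv_layer_params(3, 3, 32, 64))  # (3*3*32 + 1) * 64 = 18496
```

Setting `use_bias=False` in this sketch gives only the weight count, “n*m*k*l”.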

Pooling Layer: There are no parameters to learn in a pooling layer. This layer is only used to reduce the image dimensions.

Fully-connected Layer: In this layer, every input unit has a separate weight to each output unit. For “n” inputs and “m” outputs, the number of weights is “n*m”. Additionally, this layer has a bias for each output node, giving “(n+1)*m” parameters.

Output Layer: This is a fully connected layer, so it has “(n+1)*m” parameters, where “n” is the number of inputs and “m” is the number of outputs.
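The fully connected and output layers share the same count, which can be sketched as below. The function name `dense_layer_params` and the sizes 128 and 10 are hypothetical values picked for illustration only.

```python
def dense_layer_params(n: int, m: int) -> int:
    """Learnable parameters of a fully connected layer
    with n inputs and m outputs."""
    weights = n * m  # one weight from every input to every output
    biases = m       # one bias per output node
    return weights + biases  # equals (n + 1) * m


# Hypothetical sizes: 128 inputs feeding 10 output classes.
print(dense_layer_params(128, 10))  # (128 + 1) * 10 = 1290
```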

The final difficulty is the first fully connected layer. We don’t know the dimensionality of its input, because that input is the output of a convolutional layer. To calculate it, we have to start with the size of the input image and calculate the output size of each convolutional layer in turn.
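As a rough illustration of this bookkeeping, the sketch below follows the shape of a 96*96*1 image through one convolution and one pooling step and then counts the inputs to a fully connected layer. The layer stack, the 500 dense units and the helper `flattened_size` are all assumptions made for the example; the convolution is assumed to keep the spatial size and the pooling to halve it.

```python
def flattened_size(height: int, width: int, channels: int) -> int:
    """Number of values fed to a fully connected layer after flattening."""
    return height * width * channels


# Hypothetical stack: 96*96*1 input -> conv (32 filters) -> 2*2 pooling -> dense.
height, width, channels = 96, 96, 1

# Assumed: the convolution keeps height and width and outputs 32 feature maps.
channels = 32

# Assumed: 2*2 pooling halves height and width.
height, width = height // 2, width // 2

n_inputs = flattened_size(height, width, channels)
dense_units = 500  # hypothetical size of the first fully connected layer
dense_params = (n_inputs + 1) * dense_units

print(n_inputs)      # 48 * 48 * 32 = 73728
print(dense_params)  # (73728 + 1) * 500 = 36864500
```

Even in this small example, the first fully connected layer holds far more parameters than the convolution in front of it.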

In the simple case, the output size of a convolutional layer is “input_size-(filter_size-1)”. For example, if the input image size is (50,50) and the filter is (3,3), then 50-(3-1) = 48. Often we do not want the output to be smaller than the input, so padding is applied.
With padding, the output size is “input_size + 2*padding_size - (filter_size-1)”. For the case above, 50+(2*1)-(3-1) = 52-2 = 50, which gives the same size as the input.
If we explicitly want to downsample the image during the convolution, we can define a stride.
Finally, the number of parameters a convolutional layer learns is “(n*m*l+1)*k”, where n*m is the filter size, l is the number of input feature maps and k is the number of filters.
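The size calculations above can be put into one small helper. Note that the stride handling is an assumption on my part: the text only says a stride can be defined, so the sketch uses the commonly used formula floor((input_size + 2*padding_size - filter_size) / stride) + 1, which reduces to the expressions above when stride=1. The function name `conv_output_size` is illustrative.

```python
def conv_output_size(input_size: int, filter_size: int,
                     padding: int = 0, stride: int = 1) -> int:
    """Output size along one dimension of a convolution.

    With stride=1 this equals input_size + 2*padding - (filter_size - 1).
    """
    return (input_size + 2 * padding - filter_size) // stride + 1


print(conv_output_size(50, 3))                       # 50 - (3 - 1) = 48
print(conv_output_size(50, 3, padding=1))            # 50 + 2 - (3 - 1) = 50
print(conv_output_size(50, 3, padding=1, stride=2))  # (52 - 3) // 2 + 1 = 25
```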

Let’s see this in a concrete model.

Convolutional Network Model Architecture

The input_1 (Input Layer) has shape (None,96,96,1) and 0 parameters. Throughout the model, the convolutional layers use stride=1, kernel_size=3*3 and padding=same.

Convolutional_1: ((kernel_height*kernel_width*input_channels)+1)*filters = (3*3*1+1)*32 = 320 parameters. The first convolutional layer has 32 filters and a single input channel.

Dropout_1: The dropout layer has no learnable parameters. During training it randomly sets a fraction of its inputs to zero to reduce overfitting.

Convolutional_2: convolutional_1 has 32 filters, so this layer receives 32 feature maps as input. The number of trainable parameters in this layer is (3*3*32+1)*32 = 9248.

Max_pooling_2d: This layer reduces the size of its input. pool_size = (2,2) is used here, so the 96*96 input is halved to 48*48. The model learns nothing in this layer.
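The halving can be sketched as follows, assuming no padding and a stride equal to the pool size when none is given (a common default, but an assumption here). The function name `pool_output_size` is illustrative.

```python
from typing import Optional


def pool_output_size(input_size: int, pool_size: int = 2,
                     stride: Optional[int] = None) -> int:
    """Output size along one dimension of a pooling layer without padding."""
    if stride is None:
        stride = pool_size  # assumed default: non-overlapping windows
    return (input_size - pool_size) // stride + 1


print(pool_output_size(96, 2))  # (96 - 2) // 2 + 1 = 48
```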

Convolutional_3: (3*3*32+1)*64 = 18496, and so on.
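The three convolutional counts above can be reproduced with a short loop. This is a sketch only: the list covers just the three layers discussed here, not the full model, and the helper name `conv_layer_params` is illustrative.

```python
def conv_layer_params(kernel_h: int, kernel_w: int,
                      in_channels: int, filters: int) -> int:
    """(kernel_h * kernel_w * in_channels + 1) * filters"""
    return (kernel_h * kernel_w * in_channels + 1) * filters


# (layer name, input channels, filters) for the three layers discussed above.
conv_layers = [
    ("Convolutional_1", 1, 32),
    ("Convolutional_2", 32, 32),
    ("Convolutional_3", 32, 64),
]

counts = {name: conv_layer_params(3, 3, in_ch, filters)
          for name, in_ch, filters in conv_layers}

for name, count in counts.items():
    print(name, count)
# Convolutional_1 320
# Convolutional_2 9248
# Convolutional_3 18496
```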

Finally, the parameters of all layers are summed.

Total parameters = 7,759,521, trainable parameters = 7,759,251, non-trainable parameters = 0.

Article referred from

Have a great day..!