DEEP LEARNING

CNN (Convolutional Neural Network)

Tanuj Shrivastava
Published in Analytics Vidhya
6 min read · Sep 9, 2020


Why CNN?

  • In a fully connected neural network, every neuron in a given layer is connected to all the neurons in the previous layer, so the network has a very large number of parameters and is therefore more prone to overfitting.
  • Because of the long chain of neurons, the model can also suffer from the vanishing gradient problem.
  • What we aim for are better optimization algorithms, better activation functions, better initialization methods and better regularization.

(Image source: https://www.deeplearning.ai/)

  • If we train DNNs on images with larger dimensions, we get a huge number of parameters, which requires a lot of computation power.
  • Ideally, we would like DNNs that are complex (many non-linearities) but have fewer parameters and are hence less prone to overfitting.
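To put rough numbers on this, here is a small back-of-the-envelope comparison; the image size and layer widths are illustrative assumptions, not figures from the article:

```python
# Illustrative comparison: one fully connected layer on a 224x224x3 image
# vs. a small convolutional layer (all sizes are assumptions for this sketch).

image_pixels = 224 * 224 * 3        # flattened input size = 150,528
fc_neurons = 1000                   # hypothetical hidden-layer width

fc_params = image_pixels * fc_neurons + fc_neurons    # weights + biases
print(f"Fully connected layer parameters: {fc_params:,}")   # ~150 million

# A conv layer with 64 filters of size 3x3x3 reuses the same small filters
# across the whole image, so its parameter count does not grow with image size.
conv_params = 64 * (3 * 3 * 3) + 64                   # weights + biases
print(f"Convolutional layer parameters:   {conv_params:,}") # 1,792
```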

How an Image Looks

(Image source: https://media.geeksforgeeks.org/wp-content/uploads/RGB-1.jpg)

The figure above shows how a colored image is represented in layers. A colored image consists of three color channels: red, green and blue. This is how an image actually looks internally; every colored image is a combination of these three channels.
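As a quick illustration of this layout, a toy image can be built directly as a height x width x 3 array. This is a minimal NumPy sketch; the pixel values are random placeholders:

```python
import numpy as np

# A toy "colored image": height x width x 3 channels (red, green, blue),
# with pixel values in 0..255. Images loaded with PIL/OpenCV have the same layout.
image = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)

print(image.shape)     # (4, 4, 3)
print(image[..., 0])   # the red channel as a 4x4 matrix
```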

Convolution operation vs Neural Network

  • In a CNN, instead of taking the weighted sum of all the inputs to produce one output neuron, we take the weighted sum of only a few inputs.
  • In the first image of the figure, the box inside the input image holds the weights, known as a filter in CNNs. It is multiplied with the set of pixels in that particular portion of the image, and the result becomes one pixel of the next layer.
  • The filter then moves step by step across the entire image, computing the output at each position.
  • We can have as many filters as we want in one layer; the depth of the output equals the number of filters applied to the input.

As you can clearly see in the image, applying weights (a filter) to the image changes it completely. In both examples an edge detection filter was applied, so only the parts of the image where edges are present are highlighted and the rest is completely black.
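As a rough sketch of how such a filter works, the snippet below convolves a toy grayscale image with a Laplacian-style edge detection kernel; the exact kernel used in the article's figure is not specified, so this choice is an assumption:

```python
import numpy as np
from scipy.signal import convolve2d

# A toy grayscale image: a bright square on a dark background.
image = np.zeros((8, 8))
image[2:6, 2:6] = 1.0

# A common Laplacian-style edge detection kernel (illustrative choice).
edge_filter = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]])

edges = convolve2d(image, edge_filter, mode="same")
print(edges)   # large values along the square's border, ~0 in flat regions
```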

(Image source: https://www.guvi.in/)

In the image you can clearly see how the filter moves through the entire image. Note that in this image the input is 3D and the filter is also 3D, but the convolution operation we perform is 2D: we slide only vertically and horizontally, not along the depth. This is because the depth of the filter is the same as the depth of the input.

Each filter applied to a 3D input will give a 2D output and combining multiple such filters will result in a 3D output.
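A quick way to see this shape behavior is to run a single convolutional layer in PyTorch; the input size and filter count below are arbitrary choices for illustration:

```python
import torch
import torch.nn as nn

# A 3-channel 32x32 input (batch size 1). Each of the 8 filters spans the full
# input depth (3), so each filter produces one 2D feature map; stacking the
# 8 maps gives a 3D output of depth 8.
x = torch.randn(1, 3, 32, 32)
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)

out = conv(x)
print(out.shape)   # torch.Size([1, 8, 30, 30])
```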

Some Terminologies

  1. Input Width (W_I), Height (H_I) and Depth (D_I)
  2. Output Width (W_O), Height (H_O) and Depth (D_O)
  3. The spatial extent of a filter (F), a single number denoting both width and height as they are equal
  4. Filter depth, which is always the same as the input depth (D_I)
  5. The number of filters (K)
  6. Padding (P) and Stride (S)

Padding :

  • As we can see in the image, if we use a 3x3 filter on a 7x7 input we get a 5x5 output, since we are not allowed to move the kernel outside the input region. So every time we perform this operation we lose some information from the image.
  • The solution is padding: we add an extra border of zeros around the input and then slide our filter over it, so the output is the same size as the input. This is how we preserve that information (see the sketch below).
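A minimal sketch of this effect, using PyTorch's built-in zero padding on the 7x7 example above (the 7x7 input and 3x3 filter come from the example; the rest is an assumption):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 7, 7)   # the 7x7 input from the example above

# Without padding, a 3x3 filter shrinks the output to 5x5.
no_pad = nn.Conv2d(1, 1, kernel_size=3, padding=0)
print(no_pad(x).shape)        # torch.Size([1, 1, 5, 5])

# With one ring of zero padding, the output stays 7x7.
with_pad = nn.Conv2d(1, 1, kernel_size=3, padding=1)
print(with_pad(x).shape)      # torch.Size([1, 1, 7, 7])
```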

Stride :

  • Stride defines the interval at which the filter is applied: the higher the stride, the smaller the output.
  • In other words, the movement of the filter over the input is defined by the stride. If the stride is 1 we move one step horizontally and vertically over the image; similarly, if the stride is 2 we move 2 steps horizontally and vertically.
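Padding and stride together determine the output size through the standard formula W_O = floor((W_I - F + 2P) / S) + 1 (and likewise for the height). A small helper makes this concrete:

```python
def conv_output_size(w_in, f, p, s):
    """Output width (or height): floor((W_I - F + 2P) / S) + 1."""
    return (w_in - f + 2 * p) // s + 1

print(conv_output_size(7, f=3, p=0, s=1))   # 5 -> 3x3 filter over a 7x7 input
print(conv_output_size(7, f=3, p=1, s=1))   # 7 -> padding keeps the size
print(conv_output_size(7, f=3, p=0, s=2))   # 3 -> stride 2 roughly halves it
```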

Max Pooling

Max Pooling operation
  • We are familiar with almost all the layers in this architecture except the max pooling layer.
  • By passing the filter over an image (with or without padding), we get a transformed matrix of values.
  • We then perform max pooling over this convoluted output, selecting the maximum value from each position of the kernel as it moves with the specified stride.
  • Here we select a stride of 2 and a 2x2 filter, so the 4x4 convoluted output is split into 4 quadrants.
  • The maximum value of each quadrant is taken, generating a 2x2 matrix.

Max pooling is done to select the most prominent or salient value within a neighborhood. It is also known as subsampling, as we are sampling just a single value from a region.

Similar to max pooling, average pooling is also used sometimes; it is carried out by taking the average value in a sampled neighborhood.

The idea behind Max Pooling is to condense the convolutional input into a smaller size, thereby making it easier to manage.
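A minimal sketch of both operations on a made-up 4x4 convoluted output, using a 2x2 window with stride 2 as in the example above (the actual values are placeholders):

```python
import torch
import torch.nn as nn

# A 4x4 "convoluted output", split into four 2x2 quadrants by a 2x2 window
# moving with stride 2. Values are made up for illustration.
x = torch.tensor([[1., 3., 2., 1.],
                  [4., 6., 5., 7.],
                  [8., 2., 0., 1.],
                  [3., 5., 4., 9.]]).reshape(1, 1, 4, 4)

max_pool = nn.MaxPool2d(kernel_size=2, stride=2)
avg_pool = nn.AvgPool2d(kernel_size=2, stride=2)

print(max_pool(x))   # [[6., 7.], [8., 9.]]       -- the max of each quadrant
print(avg_pool(x))   # [[3.5, 3.75], [4.5, 3.5]]  -- the average of each quadrant
```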

Full Convolutional Neural Network

LeNet Architecture
  • The following diagram illustrates the configuration and working of a Convolutional Neural Network. It follows the LeNet architecture, created by Yann LeCun.
  • Here the input image is 32x32x1 pixels; the depth is 1 because the images are black & white.
  • At the end it consists of 2 fully connected layers; adding fully connected layers is usually a cheap way of learning non-linear combinations of the high-level features represented by the output of the convolutional layers.
  • Fully connected layer 1: number of neurons: 120; input is h4 flattened, i.e. 5x5x16 = 400; number of parameters in h5 = 120x400 + 120 biases = 48,120.
  • Fully connected layer 2: number of neurons: 84; input is the number of neurons in h5 = 120; number of parameters in h6 = 84x120 + 84 biases = 10,164.

Note : Overall, this combination of Convolutional and fully-connected layers is much more efficient than an entirely fully connected network. It has a significantly lower number of parameters but still is able to estimate functions of very high complexity.
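For reference, here is a LeNet-style sketch in PyTorch that reproduces the layer sizes quoted above; the activation and pooling choices are common modern substitutes and an assumption on my part, not necessarily the original 1998 design:

```python
import torch
import torch.nn as nn

class LeNet(nn.Module):
    """LeNet-style sketch: 32x32x1 input, 5x5x16 = 400 flattened features,
    fully connected layers of 120 and 84 neurons, as described above."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),    # 32x32x1  -> 28x28x6
            nn.ReLU(),
            nn.MaxPool2d(2),                   # 28x28x6  -> 14x14x6
            nn.Conv2d(6, 16, kernel_size=5),   # 14x14x6  -> 10x10x16
            nn.ReLU(),
            nn.MaxPool2d(2),                   # 10x10x16 -> 5x5x16
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                      # 5x5x16 = 400
            nn.Linear(400, 120),               # 120x400 + 120 = 48,120 parameters
            nn.ReLU(),
            nn.Linear(120, 84),                # 84x120 + 84 = 10,164 parameters
            nn.ReLU(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = LeNet()
print(sum(p.numel() for p in model.parameters()))   # 61,706 parameters in total
```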

Train a Convolutional Neural Network

(Image source: https://www.guvi.in/)
  • A CNN can be implemented as a feedforward network in which only a few weights (coloured) are active.
  • The rest of the weights (grey) remain zero. Thus, we can train a CNN using backpropagation by thinking of it as a feedforward neural network with sparse connections.

Note: However, in practice we don't build this large matrix explicitly, as most of its weights end up being zero. Frameworks like PyTorch and TensorFlow don't create such large matrices and only deal with the weights that actually need to be updated.
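A minimal training-loop sketch along these lines, assuming the LeNet class from the sketch above; the data here is random and only a placeholder for a real dataset such as MNIST:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Dummy data standing in for a real dataset (placeholder, not from the article).
images = torch.randn(64, 1, 32, 32)        # 64 black & white 32x32 images
labels = torch.randint(0, 10, (64,))       # 64 class labels
train_loader = DataLoader(TensorDataset(images, labels), batch_size=16)

model = LeNet()                            # the LeNet sketch defined earlier
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

for epoch in range(5):
    for batch_images, batch_labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(batch_images), batch_labels)  # forward pass
        loss.backward()    # backpropagation through the (sparse) conv weights
        optimizer.step()   # weight update
```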
