CNN — quick learn

Harsh Borse

Published in

Analytics Vidhya

May 9, 2020

Get the idea of convolutional neural networks quickly and in short.

Everything you should know about CNNs

What is CNN?
Why CNN?
What is a Convolutional Layer?
What is a Pooling layer?
What is Padding?
What is Stride?
What is a Flattening layer?
What is the Dense Layer?

CNN

> A CNN (convolutional neural network) is a deep learning algorithm that is mostly used for image classification.
> It can take an input image, assign importance (learnable weights and biases) to various aspects of the image, and differentiate one from another.
> A ConvNet (CNN) needs much less preprocessing than other classification algorithms. In primitive methods the filters are hand-engineered. With enough training, a ConvNet can learn these filters, or characteristics of an image, by itself.

Why CNN?

You might think a normal machine learning classifier can do the same thing, so why do we need a CNN? Well, it’s all about speed, computation, and accuracy.

> CNNs have many major advantages over a classic ML classifier, but here is the real deal: on image data, a CNN puts less overhead on the processor and reaches higher accuracy. And as the world shifts toward cloud infrastructure, a model with low processing requirements is best suited to run on cloud machines.
> And if you think we could just use a simple neural network instead, hear me out. In a simple neural network, a 3x3 (pixel resolution) image matrix is turned into a 9x1 matrix, a so-called vector, so that we can feed it into a neural network with 9 input nodes. Real images are rarely 3x3. They are much bigger and have more dimensions (for example, color channels), which leads to a very big and wide neural network.
> A ConvNet fits image datasets better thanks to the reduction in the number of parameters involved, since the same small filter is reused across the whole image. And no, this reduction does not make the CNN lose the features of the image.
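To put rough numbers on the parameter argument above, here is a small sketch. It counts the learnable parameters of a fully connected layer fed with a flattened image, and compares them with a convolutional layer. The image size (32x32 with 3 color channels), the 100 dense units, and the 16 filters of size 3x3 are arbitrary assumptions chosen for illustration. The helper names are hypothetical.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights plus biases of a fully connected layer: every input connects to every unit."""
    return n_inputs * n_units + n_units


def conv_params(filter_size: int, in_channels: int, n_filters: int) -> int:
    """Weights plus biases of a conv layer: each filter is shared across all image positions."""
    return (filter_size * filter_size * in_channels + 1) * n_filters


# Assumed example: a 32x32 RGB image.
height, width, channels = 32, 32, 3
flattened = height * width * channels  # number of input nodes after flattening

print(f"flattened input size: {flattened}")                           # flattened input size: 3072
print(f"dense layer, 100 units: {dense_params(flattened, 100)}")      # dense layer, 100 units: 307300
print(f"conv layer, 16 3x3 filters: {conv_params(3, channels, 16)}")  # conv layer, 16 3x3 filters: 448
```

The conv layer's count does not depend on the image size at all, because the same filters slide over every position.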

Convolutional Layer

You can think of the convolutional layer as a technique to preprocess the image and extract feature information from it before sending it on to the regular (fully connected) part of the neural network. Its filters are still learned together with the rest of the network.

> Filter = just understand this as a normal filter that keeps only specific information about the image. For example, one filter might return all the straight lines present in the image, and another all the curved lines.
> If you put such a filter over an image, it’ll return the filtered information. But of course, the whole image is not made up of just one straight line, right? That’s why we choose a small filter and apply it to one small part of the image at a time, until the whole image is covered.
> So let’s assume an image of 5x5 px resolution and a filter of size 3x3 px, which means the image is a 5x5 matrix and the filter is a 3x3 matrix.

> First we choose a 3x3 submatrix from the image (it must have the same dimensions as our filter). Then we take the dot product of this 3x3 subpart of the image with our filter. That means we multiply each pair of overlapping values and add up all 9 products. This is element-wise, not matrix multiplication. It returns a single value, which we store in a new matrix called the convolved matrix (also called a feature map).
> We repeat this procedure with different parts of the image until we have all the values. The result for a particular part of the image has a fixed position in the convolved matrix, matching where that part sits in the image.
> For example: from a 5x5 image we can choose a 3x3 submatrix 9 different times (3 positions across times 3 positions down), hence our convolved matrix is 3x3. It only happens to match the filter size here. In general, an n x n image and an f x f filter give an (n - f + 1) x (n - f + 1) convolved matrix.
> This convolved matrix is now our new matrix representing the image, with high values wherever the filter found a straight line.
> Now you can stack multiple convolutional layers, which together extract all the required information from the image.
> The real deal is that you don’t have to choose which layer filters which feature, and you don’t have to set the filters. It’s all done by the algorithm itself: you just add a convolutional layer to the model, and its filter weights are learned during training.

The objective of the convolution operation is to extract features such as edges, lines, curves, and shadows from the image. Deeper layers combine these into higher-level features.
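The sliding-window procedure described above can be sketched in plain Python. This is a minimal illustration that assumes a square image, no padding, and a stride of 1. The function name `convolve2d`, the 5x5 image, and the vertical-line filter values are made up for the example. Like most deep learning libraries, it slides the filter without flipping it, which is strictly speaking cross-correlation.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over a square `image` (no padding, stride 1) and return the convolved matrix."""
    n, f = len(image), len(kernel)
    size = n - f + 1  # a 5x5 image with a 3x3 filter gives a 3x3 output
    result = [[0] * size for _ in range(size)]
    for i in range(size):      # row of the submatrix's top-left corner
        for j in range(size):  # column of the submatrix's top-left corner
            # Multiply the f x f patch with the filter element-wise and sum the products.
            result[i][j] = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(f)
                for dj in range(f)
            )
    return result


# A 5x5 image containing a vertical line in the middle column.
image = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
# A hand-made (hypothetical) filter that responds to vertical lines.
vertical_line = [
    [-1, 2, -1],
    [-1, 2, -1],
    [-1, 2, -1],
]

for row in convolve2d(image, vertical_line):
    print(row)
# [-3, 6, -3]
# [-3, 6, -3]
# [-3, 6, -3]
```

The largest value (6) appears exactly where the filter sits on top of the line, which is what "detecting" a feature means here. In a trained CNN these nine filter values would be learned rather than written by hand.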

Pooling layer

> The pooling layer is used to reduce the spatial size of the convolved feature.
> This reduction in dimensions decreases the computational power needed to process the data.
> It is useful for extracting dominant features, and it also reduces noise.
> In simple language, suppose the convolved matrix you got after the convolution layer is 4x4. Pooling with a 2x2 window reduces it to a 2x2 matrix.
> You would say that is data loss or feature loss, but no: it mostly reduces the dimensions while still preserving the dominant features. For this, we can use different pooling techniques.
MAX Pooling = from the 4x4 matrix we choose a 2x2 submatrix, take the maximum value out of it, and save it in our new matrix. We repeatedly take non-overlapping 2x2 submatrices (4 times in total) until our 2x2 reduced matrix is complete.
AVERAGE Pooling = from the 4x4 matrix we choose a 2x2 submatrix, take the average of its 4 values, and save it in our new matrix. We repeatedly take non-overlapping 2x2 submatrices (4 times in total) until our 2x2 reduced matrix is complete.
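Both pooling techniques can be sketched with a 2x2 window that moves 2 positions at a time, so the windows never overlap. The 4x4 values below are invented for illustration. `pool2x2` is a hypothetical helper, not a library function.

```python
def pool2x2(matrix, mode="max"):
    """Reduce a matrix by taking the max (or average) of each non-overlapping 2x2 block."""
    rows, cols = len(matrix), len(matrix[0])
    pooled = []
    for i in range(0, rows - 1, 2):      # top row of each 2x2 block
        pooled_row = []
        for j in range(0, cols - 1, 2):  # left column of each 2x2 block
            block = [matrix[i][j], matrix[i][j + 1],
                     matrix[i + 1][j], matrix[i + 1][j + 1]]
            pooled_row.append(max(block) if mode == "max" else sum(block) / 4)
        pooled.append(pooled_row)
    return pooled


# An assumed 4x4 convolved matrix.
convolved = [
    [1, 3, 2, 1],
    [4, 6, 5, 0],
    [7, 2, 9, 8],
    [1, 0, 3, 4],
]
print(pool2x2(convolved, "max"))      # [[6, 5], [7, 9]]
print(pool2x2(convolved, "average"))  # [[3.5, 2.0], [2.5, 6.0]]
```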

Now, that combination of ‘convolution with filters + pooling’ makes up one convolutional layer. We can add more of these layers, which usually gives better results.

What we get after the Conv layers

The original image we gave as input had many unnecessary features and data that we didn’t need, but we still had to input all of them as pixels in a matrix.

After the convolutional layers, much of the noise and unwanted data from the image has been filtered out, and we are left with a matrix of mostly dominant features.

Consider this example: you want to train a model to predict whether a photo is of you or not. In the sample photos you gave for training, you stand in the middle in some pictures and at the edge in others. You are sometimes far away from the camera and sometimes near it, against different types of background, etc. All these images are preprocessed in the convolutional layers, and only information about the dominant features is passed on.

> After going through the above process, we have successfully enabled the model to understand the features.
Moving on, we flatten the final output of the ConvNet and feed it to a regular neural network for classification.

Padding

> Padding is a term relevant to ConvNets. It refers to the number of pixels added around the border of an image when it is being processed by the kernel (filter) of a CNN.
> Every time we apply a filter, the image gets smaller. Sometimes we don’t want that, and we want to preserve the original size of the image so that the same low-level features can still be extracted. In that case we use padding.
> When we use a 3x3 filter on a 5x5 image, we get a 3x3 convolved matrix. If we don’t want that to happen, we can use padding.
> Before sending the 5x5 image to a Conv layer with a 3x3 filter, we pad the image with one pixel on every side, making it a 7x7 image. Now if we send this 7x7 image into the Conv layer and apply the 3x3 filter on it, the convolved matrix we get is 5x5, the same size as the original.
> This doesn’t add anything to the image’s content, because the new pixels have the lowest possible value (usually zero), so they do not act as features.
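Zero padding and its effect on the output size can be checked with a short sketch. It assumes zero-valued padding pixels and a stride of 1. `zero_pad` is a hypothetical helper name, and the 5x5 image of ones is made up.

```python
def zero_pad(image, p=1):
    """Surround `image` with `p` rows/columns of zeros on every side."""
    width = len(image[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]              # top border
    for row in image:
        padded.append([0] * p + list(row) + [0] * p)      # left and right borders
    padded.extend([0] * width for _ in range(p))          # bottom border
    return padded


image = [[1] * 5 for _ in range(5)]  # an assumed 5x5 image
padded = zero_pad(image, p=1)
f = 3                                # 3x3 filter

print(len(padded), len(padded[0]))   # 7 7
print(padded[1])                     # [0, 1, 1, 1, 1, 1, 0]
print(len(padded) - f + 1)           # 5 -> same side length as the original image
```

Padding only to 6x6 would give 6 - 3 + 1 = 4, so the padding has to be added symmetrically, one pixel per side, to get back to 5x5.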

Stride

> Stride is the number of pixels the filter shifts over the input matrix at each step.
> Remember that in the Conv layer we select a submatrix of the original image matrix (the same size as the filter) to take the dot product with the filter. The stride decides where the next submatrix is taken from, that is, how far the next submatrix should be from the current one.
> When the stride is 1, we move the submatrix 1 pixel at a time. With a stride of 2, we move it 2 pixels at a time, and so on.
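The effect of the stride on the output size follows the standard relation out = floor((n + 2p - f) / s) + 1, for an n x n input, f x f filter, padding p, and stride s. The sketch below computes this and also lists where each submatrix starts. The 7x7 image and 3x3 filter are assumed values, and the helper names are hypothetical.

```python
def output_size(n, f, stride=1, padding=0):
    """Side length of the convolved matrix: floor((n + 2*padding - f) / stride) + 1."""
    return (n + 2 * padding - f) // stride + 1


def window_starts(n, f, stride=1):
    """Starting row (or column) index of every submatrix the filter visits, without padding."""
    return list(range(0, n - f + 1, stride))


# Assumed example: a 7x7 image and a 3x3 filter.
for s in (1, 2):
    starts = window_starts(7, 3, s)
    side = output_size(7, 3, s)
    print(f"stride={s}: starts={starts}, output={side}x{side}")
# stride=1: starts=[0, 1, 2, 3, 4], output=5x5
# stride=2: starts=[0, 2, 4], output=3x3
```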

Flattening

Flattening is the conversion of data into a one-dimensional array for input to a neural network.

> After the conv layers have reduced the size of the image and turned it into an array packed with features, we feed this array to a normal neural network.
> As we know, a normal neural network takes a vector as input. In simple terms, each value needs to be fed into a node of the neural network.
> Hence we convert the convolved array into a 1-D array. This process is called flattening.
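Flattening is simple enough to sketch directly. The example below assumes the conv and pooling layers produced 2 feature maps of size 2x2 (invented values). `flatten` is a hypothetical helper that walks nested lists in row-major order.

```python
def flatten(feature_maps):
    """Turn nested lists (e.g. maps x rows x cols) into one flat list, in row-major order."""
    flat = []
    for item in feature_maps:
        if isinstance(item, list):
            flat.extend(flatten(item))  # descend into the next nesting level
        else:
            flat.append(item)
    return flat


# Assumed output of the conv/pooling layers: 2 feature maps, each 2x2.
feature_maps = [
    [[6, 5], [7, 9]],
    [[1, 0], [2, 3]],
]
vector = flatten(feature_maps)
print(vector)       # [6, 5, 7, 9, 1, 0, 2, 3]
print(len(vector))  # 8 -> the dense network needs 8 input nodes
```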

Dense Layer

It is just a regular layer of neurons in a neural network. Each neuron receives input from all the neurons in the previous layer, hence ‘densely connected’. A dense layer is a classic fully connected neural network layer: each input node is connected to each output node.

The last dense layer gives you the output of your model.

The number of nodes in this last layer is equal to the number of outputs we need, for example one node per class in a classification task.
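A dense layer’s forward pass comes down to one weighted sum per output node. The sketch below assumes a two-class problem (say, “you” vs “not you” from the earlier example), so the final layer has 2 nodes. The weights, biases, and input vector are invented numbers. The softmax step is a common way to turn scores into probabilities, not something prescribed above. Hidden dense layers would usually also apply an activation such as ReLU, which is omitted here.

```python
import math


def dense(inputs, weights, biases):
    """Fully connected layer: node k outputs sum_i inputs[i] * weights[k][i] + biases[k]."""
    return [
        sum(x * w for x, w in zip(inputs, node_weights)) + b
        for node_weights, b in zip(weights, biases)
    ]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1 (shifted by the max for stability)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Assumed flattened feature vector with 4 values, and a final layer with 2 nodes (2 classes).
inputs = [1, 0, 2, 3]
weights = [
    [1, -1, 0, 2],  # connections into node 0 ("you")
    [0, 1, -1, 1],  # connections into node 1 ("not you")
]
biases = [0, 1]

scores = dense(inputs, weights, biases)
probs = softmax(scores)
predicted = probs.index(max(probs))
print(scores)     # [7, 2]
print(predicted)  # 0 -> the model predicts class 0
```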