Introduction to Convolutional Neural Networks

Muktha Sai Ajay
Published in DataSeries · 4 min read · Jun 5, 2020

Photo by Alina Grubnyak on Unsplash

Convolutional Neural Networks (CNNs) are a class of deep learning models employed in image recognition, image classification, object detection, and more. They take an image as input, process it, and classify it under specific categories.

Every image is passed through a series of convolutional filters, pooling layers, a flattening step, and a fully connected layer, and an activation function is applied to classify the image with probabilistic values between 0 and 1.

The main reason for applying the softmax function is that the raw outputs of the last layer need not form a valid probability distribution. For example, if we are classifying cats and dogs, the network might score the image as 0.80 for dog and 0.35 for cat; these don’t sum to 1, so they can’t be read as probabilities. The softmax function takes the values from the last hidden layer, scales them between 0 and 1, and makes sure they sum to 1.
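As a minimal sketch in NumPy, using the hypothetical dog/cat scores from the example above:

```python
import numpy as np

def softmax(scores):
    """Scale raw scores into probabilities that sum to 1."""
    exps = np.exp(scores - np.max(scores))  # subtract max for numerical stability
    return exps / exps.sum()

# Hypothetical raw scores from the last layer: dog = 0.80, cat = 0.35
probs = softmax(np.array([0.80, 0.35]))
print(probs)        # two values between 0 and 1
print(probs.sum())  # 1.0
```

Note that softmax preserves the ordering of the scores, so the class with the highest raw score still gets the highest probability.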

How CNN works

1. Convolution Layer

It is the first layer, and it extracts features from the image. It preserves the relationship between pixels by learning image features over small squares of input data, extracting features such as edges and corners from the input image.

It performs a dot product between two matrices: one is the kernel (also called the filter or feature detector) and the other is the portion of the image the kernel currently covers. The kernel can be of any size, but in the visualization shown below it is a 3 × 3 matrix. It slides from left to right and top to bottom across the image, and at each position the overlapping values are multiplied element-wise and summed; that sum becomes one entry in the feature map (the resultant matrix). For a binary image and binary kernel, this sum is simply the number of positions where the kernel values match the image values: if one value matches we place 1 at the respective position, if two values match we place 2, and so on. The representation is shown below.

Credits: giphy.com

Stride: the number of pixels by which we slide the filter matrix over the input matrix. When the stride is 1, we move the filter one pixel at a time.

Padding: adding zeros around the border of the input so that the filter fits the image for the chosen stride and the output keeps the desired size.
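The convolution, stride, and padding described above can be sketched in NumPy. The 5 × 5 binary image and 3 × 3 kernel below are illustrative values, not taken from the original visualization:

```python
import numpy as np

def convolve2d(image, kernel, stride=1, padding=0):
    """Slide the kernel over the image, taking a dot product at each position."""
    if padding:
        image = np.pad(image, padding)  # add a border of zeros
    kh, kw = kernel.shape
    ih, iw = image.shape
    oh = (ih - kh) // stride + 1
    ow = (iw - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i * stride : i * stride + kh,
                          j * stride : j * stride + kw]
            out[i, j] = np.sum(patch * kernel)  # element-wise multiply, then sum
    return out

# A 5x5 binary image and a 3x3 binary kernel
image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])
print(convolve2d(image, kernel, stride=1))
# [[4. 3. 4.]
#  [2. 4. 3.]
#  [2. 3. 4.]]
```

With `padding=1`, the same call produces a 5 × 5 output, i.e. the feature map keeps the input size.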

ReLU Layer

ReLU activation function

We apply the rectifier function ReLU(x) = max(0, x) to the feature map, i.e. the convolved feature, to add non-linearity. The reason is that images themselves are highly non-linear, but convolution is a linear operation, so when we apply it to create feature maps there is a risk of producing something linear and erasing that non-linearity. Applying the rectifier function restores non-linearity by setting every negative value in the feature map to zero.
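A minimal sketch of the rectifier applied to a hypothetical feature map with negative values:

```python
import numpy as np

def relu(feature_map):
    """Rectifier: replace every negative value with zero."""
    return np.maximum(0, feature_map)

# A hypothetical feature map containing negative values after convolution
fmap = np.array([[ 2.0, -1.5],
                 [-0.5,  3.0]])
print(relu(fmap))
# [[2. 0.]
#  [0. 3.]]
```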

2. Max Pooling

In this layer, the size of the feature map is reduced while the important information is kept; with the common 2 × 2 window and stride of 2, the number of values drops by 75%. Pooling can be of different types:

  • Max Pooling
  • Average Pooling
  • Sum Pooling

Mostly we use max pooling. Pooling not only reduces the size (by 75% in the 2 × 2 case) but also helps prevent overfitting and speeds up processing considerably.
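A sketch of max pooling with a 2 × 2 window and stride 2; the feature map values are made up for illustration. The 4 × 4 input (16 values) shrinks to 2 × 2 (4 values), the 75% reduction mentioned above:

```python
import numpy as np

def max_pool(feature_map, size=2, stride=2):
    """Keep only the largest value in each window of the feature map."""
    h, w = feature_map.shape
    oh = (h - size) // stride + 1
    ow = (w - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = feature_map[i * stride : i * stride + size,
                                 j * stride : j * stride + size]
            out[i, j] = window.max()  # swap in .mean() or .sum() for the other types
    return out

fmap = np.array([[1, 3, 2, 1],
                 [4, 6, 5, 0],
                 [3, 2, 1, 0],
                 [1, 2, 2, 4]])
print(max_pool(fmap))
# [[6. 5.]
#  [3. 4.]]
```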

3. Flattening

It breaks the spatial structure of the data and transforms the two-dimensional feature maps into a one-dimensional vector. This is done to feed the output of the convolutional layers into the fully connected network (which classifies the features learned by the CNN) or into the softmax function to get probabilities.
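A one-line sketch of flattening, applied to a hypothetical pooled feature map:

```python
import numpy as np

# A hypothetical 2x2 pooled feature map
pooled = np.array([[6, 5],
                   [3, 4]])
vector = pooled.flatten()  # read the matrix row by row into one dimension
print(vector)  # [6 5 3 4]
```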

4. Fully Connected layer

It is a simple feedforward neural network and forms the last few layers of the model. The output from the final pooling layer is flattened and then fed into the fully connected layer. The role of this layer is to take the results of the flattening step and use them to classify the image into its corresponding label.

So far we have highlighted some important features of the image, reduced its size, and removed some unnecessary features to speed up processing, but we haven’t classified the image yet.

In the above diagram, the feature map matrix is converted into a vector. With fully connected layers, we combine all these features together to create a model, and an activation function such as softmax or sigmoid classifies the outputs as cat, dog, car, truck, etc.
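Putting the last steps together, here is a sketch of a fully connected layer followed by softmax. The feature vector, the randomly initialized weights, and the class labels are all hypothetical; in a trained network the weights would be learned:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical flattened feature vector and a randomly initialized
# fully connected layer for 4 classes: cat, dog, car, truck
features = np.array([6.0, 5.0, 3.0, 4.0])
weights = rng.normal(size=(4, 4))  # shape: (num_classes, num_features)
biases = np.zeros(4)

scores = weights @ features + biases  # one raw score per class
exps = np.exp(scores - scores.max())
probs = exps / exps.sum()             # softmax turns scores into probabilities

labels = ["cat", "dog", "car", "truck"]
print(labels[int(np.argmax(probs))])  # the predicted label
```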

That’s all for now; hope you enjoyed the post. In my next blog, we will discuss Recurrent Neural Networks.
