Convolutional Neural Networks (CNNs) & the Maths Behind Them

This article covers CNNs, why they matter, and the maths behind them.

Before jumping into the maths behind CNNs, let's first walk through what a CNN is and why we need it.

The problem with traditional neural networks

I assume you are already familiar with the traditional neural network, the multilayer perceptron (MLP). MLPs have several drawbacks when it comes to image processing. For large images, the number of weights quickly becomes unmanageable, which causes many difficulties. An MLP also reacts differently to an input image and a translated copy of it. On top of that, spatial information is lost when images are flattened into vectors for the feed-forward pass.
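To make the weight problem concrete, here is a back-of-envelope count for the first fully connected layer of an MLP on a modest image. The image size and hidden-layer width are illustrative assumptions, not figures from the article.

```python
# Rough parameter count for an MLP's first fully connected layer
# on a 224x224 RGB image (sizes are illustrative assumptions).
height, width, channels = 224, 224, 3
hidden_units = 1000

inputs = height * width * channels          # 150,528 values after flattening
weights = inputs * hidden_units             # one weight per input per unit
print(f"first-layer weights: {weights:,}")  # ~150 million parameters
```

One fully connected layer already needs about 150 million weights, before we add any further layers.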

In short, an MLP is a poor choice for image processing tasks.

Why do we use CNNs?

A CNN is mainly used to look for patterns in an image. It learns the right features by itself as it goes deeper, so we don't need to hand-engineer them.
This is the main reason we use CNNs for image-related problems.

Convolutional networks were inspired by biological processes, specifically the connectivity patterns of neurons. They are also called shift-invariant networks because they are based on a shared-weights architecture. Unlike an MLP, which uses fully connected layers, a CNN uses specialised layers to detect patterns in the images that are fed forward. A CNN has sparsely connected layers and accepts a matrix as input, while an MLP accepts a vector.

The applications of CNNs are vast, ranging from image recognition, natural language processing, and video analysis to health risk assessment and the discovery of biomarkers of aging.

How does a CNN work?

Consider any image; in simple terms, every image is a grid of pixels. We analyse the influence of nearby pixels in an image using something called a filter (also called weights, kernels, or features).

Filters are tensors that keep track of spatial information and learn to extract features of objects, such as edges and smooth curves, in something called a convolutional layer. A major part of this is detecting edges in the images, and filters do exactly that: they filter out unwanted information and amplify what matters. For example, high-pass filters respond where intensity changes very quickly, such as a jump from a black pixel to a white one or vice versa.

Image Credits: Udacity

Let's consider an image of size 5x5 and 3 filters of size 3x3 (one filter for each colour channel: RGB).

Image Credits: Deep Learning MachineLearning.ai

For simplicity, we use 0s and 1s for the filter values; in practice, these are continuous values.

The filter is convolved with the image to detect patterns and features.

Image Credits: Deep Learning MachineLearning.ai

Check out this website; it lets you create your own filter.

The 3 x 3 filter then slides over the 5 x 5 image matrix: at each position we take the dot product between the filter and the image patch underneath it, producing a single scalar value, and then move the filter by the stride across the entire image. The resulting matrix of scalars is called the "Feature Map".
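The sliding dot product above can be sketched in a few lines of NumPy. The 0/1 values below are a common textbook example, not the exact matrices from the article's figures; note that, like most deep learning frameworks, this actually computes cross-correlation (the filter is not flipped).

```python
import numpy as np

# Slide a 3x3 filter over a 5x5 image with stride 1, taking a dot
# product at each position to build the feature map.
image = np.array([
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
])
kernel = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
])

stride = 1
out = (image.shape[0] - kernel.shape[0]) // stride + 1  # (5 - 3)/1 + 1 = 3
feature_map = np.zeros((out, out), dtype=int)
for i in range(out):
    for j in range(out):
        patch = image[i*stride:i*stride+3, j*stride:j*stride+3]
        feature_map[i, j] = np.sum(patch * kernel)  # scalar dot product

print(feature_map)
# [[4 3 4]
#  [2 4 3]
#  [2 3 4]]
```

The output size formula (n - f)/s + 1 gives 3, so a 5x5 image convolved with a 3x3 filter at stride 1 yields a 3x3 feature map.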

Sometimes the filter does not fit the input image perfectly. In that case we pad the image with zeros, as shown below. This is called padding.
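Zero padding is a one-liner with NumPy. This is a minimal sketch: with padding p = 1, a 3x3 filter at stride 1 produces an output the same size as the 5x5 input.

```python
import numpy as np

# Pad a 5x5 image with a border of zeros (padding = 1).
image = np.ones((5, 5))
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)
print(padded.shape)  # (7, 7)
# Output size with padding: (n + 2p - f)/s + 1 = (5 + 2*1 - 3)/1 + 1 = 5
```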

Next, we need to reduce the size of the images if they are too large. A pooling layer reduces the number of parameters when the images are too large.

Image Credits: Deep Learning MachineLearning.ai

As shown in the image above, padding is applied so that the filter fits the given image perfectly. Adding a pooling layer then decreases the size of the image, and hence the complexity and computation.
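Max pooling, the most common choice, keeps only the largest value in each window. Here is a sketch of 2x2 max pooling with stride 2 on a made-up 4x4 feature map; the values are illustrative.

```python
import numpy as np

# 2x2 max pooling with stride 2: keep the largest value in each
# non-overlapping 2x2 window, halving each spatial dimension.
x = np.array([
    [1, 3, 2, 1],
    [4, 6, 5, 0],
    [2, 1, 9, 8],
    [0, 3, 4, 7],
])
# Reshape to (rows, 2, cols, 2) so each 2x2 window gets its own axes,
# then take the max over the window axes.
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[6 5]
#  [3 9]]
```

The 4x4 input becomes 2x2, so each pooling layer roughly quarters the number of values passed to the next layer.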

The next step is the non-linearity. Usually the activation function ReLU is used: ReLU stands for Rectified Linear Unit, a non-linear operation.

Its output is ƒ(x) = max(0, x).
The purpose of ReLU is to add non-linearity to the convolutional network: real-world data is non-linear, and we want our network to be able to learn non-linear mappings.

A rectified linear unit outputs 0 if the input is less than 0, and the raw input otherwise; that is, if the input is greater than 0, the output equals the input. We assume negative values occur here, since we are dealing with real-world data; if there are no negative values, ReLU leaves the input unchanged.

Image Credits: Deep Learning MachineLearning.ai
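Applied elementwise, ReLU is a single NumPy call. The sample values are made up for illustration.

```python
import numpy as np

# ReLU: negatives become 0, non-negatives pass through unchanged.
x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
relu = np.maximum(0, x)  # [0., 0., 0., 1.5, 3.]
print(relu)
```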

The final step is to flatten our matrix and feed the values into a fully connected layer.

Image Credits: Convolutional Neural Network (CNN)
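Flattening is just a reshape: a stack of 2D feature maps becomes one long 1D vector that a fully connected layer can consume. The shapes below are illustrative assumptions.

```python
import numpy as np

# 8 feature maps of size 4x4 flatten into a single 128-value vector.
feature_maps = np.random.rand(8, 4, 4)
flat = feature_maps.reshape(-1)   # equivalently: feature_maps.flatten()
print(flat.shape)  # (128,)
```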

Next, we train the model the same way we train other neural networks: run a certain number of epochs, calculate the loss, and backpropagate to update the weights.

Overall Structure of CNN

Image Credits: Convolutional Neural Network (CNN)

Using the above diagram, we know that there are:

  1. Convolutional layers, where the convolution happens.
  2. Pooling layers, where the pooling process happens.
  3. A non-linearity, usually ReLU.
  4. Fully connected layers.
Image Credits: Cs231n stanford.edu

That is all about CNNs. I hope you now have an idea of why we use CNNs, how they work, and the maths behind them. Let's recap.

Summary

  1. Pass the input image to the Convolutional Neural Network.
  2. Convolve it with filters using a stride, applying padding if necessary.
  3. Apply pooling and the ReLU activation function.
  4. Flatten the output and feed it forward to the fully connected layers.
  5. Output the class using an activation function and classify the image.
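The whole recap can be wired together in a tiny end-to-end NumPy sketch. Everything here is illustrative: the image, filter, and fully connected weights are random, the shapes are made up, and a softmax stands in for the final classification activation.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(image, kernel, stride=1):
    """Valid convolution (cross-correlation) of a 2D image with a 2D kernel."""
    k = kernel.shape[0]
    out = (image.shape[0] - k) // stride + 1
    result = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            patch = image[i*stride:i*stride+k, j*stride:j*stride+k]
            result[i, j] = np.sum(patch * kernel)
    return result

def relu(x):
    return np.maximum(0, x)

def max_pool(x, size=2):
    h = x.shape[0] // size
    return x[:h*size, :h*size].reshape(h, size, h, size).max(axis=(1, 3))

# Tiny forward pass on made-up data (shapes are illustrative assumptions).
image = rng.standard_normal((8, 8))
kernel = rng.standard_normal((3, 3))
fc_weights = rng.standard_normal((9, 2))        # flattened 3x3 -> 2 classes

x = conv2d(image, kernel)                        # 8x8 -> 6x6 feature map
x = relu(x)                                      # non-linearity
x = max_pool(x)                                  # 6x6 -> 3x3
logits = x.reshape(-1) @ fc_weights              # flatten + fully connected
probs = np.exp(logits) / np.exp(logits).sum()    # softmax over the classes
print(probs.shape)  # (2,)
```

A real network would learn `kernel` and `fc_weights` through backpropagation rather than sampling them at random, but the shapes and the order of operations are exactly the five recap steps above.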

If you liked this article or it helped you in any way, please feel free to clap 👏 and share it with others. Your claps 👏 will motivate me to keep writing and to come up with some awesome articles.


Seeratpal Jaura
Secure and Private AI Math Blogging Competition

Applied Computer Science student, strong programmer, enthusiastic about AI and deep learning, Facebook-Udacity scholar