Understanding the maths behind convolutional neural networks (the uncool way)

Abhishek Patnaik
Nov 1 · 5 min read

Convolutional neural networks (CNNs) are considered one of the go-to solutions for classification tasks, especially on images. CNNs work on the basic fundamentals of convolution. Convolution can be defined as a mathematical operation on two functions (f and g) that produces a third function expressing how the shape of one is modified by the other.

We just summarised the whole of convolution into four lines. That seems unfair to me ;) Let's break it down.

Lessons from a Dropped Ball

Consider motion in one dimension: we drop a ball from some height above the ground. Now, what's the likelihood that the ball ends up a distance c away?

Let's break it down into pieces.

At the first drop, the ball will traverse a distance a with probability f(a), where f is the probability distribution.

After the first drop, we pick up the ball and drop it again from a, i.e. the place it landed before. Let the probability of the ball covering a distance b after dropping from that point be g(b).

Now, if we fix the distance of the first drop to be a and the distance traversed after the second drop to be b, then summing both distances up we get c, i.e. a+b=c. The probability of this happening is simply the product f(a)⋅g(b).

Just to simplify let's take some numbers into consideration.

Let the total distance to be covered be c=3, and consider a=2 and b=1. We can also take other possibilities into consideration, as long as the total distance comes out to c, i.e. a+b=3. For this case, the probability of the ball reaching c is f(2)⋅g(1).

So the probability of a ball being dropped such that a=2 and b=1 is f(2)⋅g(1). Similarly, the probability of a ball being dropped such that a=1 and b=2 is f(1)⋅g(2). We consider all possible ways to reach point c. Summing up the possibilities, we get:

… f(0)⋅g(3) + f(1)⋅g(2) + f(2)⋅g(1) …

We already know that the probability for each case of a+b=c is simply f(a)⋅g(b). Putting it into an equation, we get:

∑_{a+b=c} f(a)⋅g(b)

Turns out, we're doing a convolution! In particular, the convolution of f and g, evaluated at c, is defined as:

(f∗g)(c) = ∑_{a+b=c} f(a)⋅g(b)

If we substitute b = c−a, we get:

(f∗g)(c) = ∑_a f(a)⋅g(c−a)

To make it a little more concrete, let's consider all possible places the ball can land. After the first drop, it lands at an intermediate position a with probability f(a). If it lands at a, the probability of it going on to reach c is g(c−a).
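To make this concrete in code, here's a small sketch of the sum we just derived. The distributions f and g below are made-up values purely for illustration, and the helper `conv_at` is my own name for it:

```python
# Hypothetical discrete distributions over drop distances 0..3
# (made-up values for illustration; each sums to 1).
f = [0.1, 0.2, 0.3, 0.4]
g = [0.4, 0.3, 0.2, 0.1]

def conv_at(f, g, c):
    """(f*g)(c): sum of f(a)*g(c-a) over all a where both indices are valid."""
    return sum(f[a] * g[c - a] for a in range(len(f)) if 0 <= c - a < len(g))

# Probability that the two drops total c = 3:
# f(0)g(3) + f(1)g(2) + f(2)g(1) + f(3)g(0)
print(conv_at(f, g, 3))  # ≈ 0.3
```

Each term in the sum is one way of splitting the total distance c between the two drops, exactly as in the ball example.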

OK, now this should look pretty familiar from neural networks.

That was all about convolution; now let's get a quick understanding of CNNs. For more information on the same, you can refer to this link.

CNNs, in simple words, can be defined as regularized multilayer perceptrons.

OK, to make it simpler: a CNN does the work of extracting features from data. That data can be in the form of images, text, etc. A CNN passes a kernel over the input such that it extracts typical patterns from it.


The example above shows an input matrix I and a kernel K; the result is the convolved matrix I∗K. I hope we can now relate this to the convolution we studied above. The kernel does the work of extracting features from the input image. This is how the sliding-window multiplication works.

The resultant image is a matrix of convolved features.

The size of the output matrix is given by the formula below:

O = (W − K + 2P) / S + 1

where W is the input size, K the kernel size, P the amount of padding, and S the stride.
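As a quick sanity check, the formula can be computed directly (the function name below is mine, not a standard API):

```python
def conv_output_size(W, K, P=0, S=1):
    """O = (W - K + 2P)/S + 1 for input size W, kernel K, padding P, stride S."""
    return (W - K + 2 * P) // S + 1

print(conv_output_size(5, 3))        # 3: a 3x3 kernel shrinks a 5x5 input
print(conv_output_size(5, 3, P=1))   # 5: zero-padding of 1 preserves the size
print(conv_output_size(7, 3, S=2))   # 3: stride 2 roughly halves the output
```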

Since we are learning CNNs the uncool way, it would be great to see how filters actually work. So below is the code.
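A minimal sketch along those lines, using scipy.signal.convolve2d; the input matrix and kernel values are illustrative, not taken from any real image:

```python
import numpy as np
from scipy.signal import convolve2d

# A small made-up 4x4 "image" (illustrative values only)
image = np.array([
    [1., 2., 3., 0.],
    [0., 1., 2., 3.],
    [3., 0., 1., 2.],
    [2., 3., 0., 1.],
])

# A classic 3x3 edge-detection kernel
kernel = np.array([
    [-1., -1., -1.],
    [-1.,  8., -1.],
    [-1., -1., -1.],
])

# mode="valid" keeps only positions where the kernel fully overlaps the
# input, so the 4x4 input and 3x3 kernel give a (4 - 3) + 1 = 2x2 output
result = convolve2d(image, kernel, mode="valid")
print(result.shape)  # (2, 2)
```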

convolve2d is nothing but the convolution sum we worked through above. Let's plot an image and see how filters work in a CNN.

Original Image:-

Image after applying the filter:-

We see a clear difference between the two images. The second one seems a little bluer than the first; that's because of the convolution.

The following are the must-know terms when dealing with CNN’s.

Strides

When the stride is 1, we move the filter 1 pixel at a time. When the stride is 2, we move the filter 2 pixels at a time, and so on. The figure below shows how convolution works with a stride of 2.
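A stride-2 slide can be sketched with explicit loops (the 4×4 input and summing kernel below are made up for illustration):

```python
import numpy as np

image = np.arange(16, dtype=float).reshape(4, 4)  # made-up 4x4 input
kernel = np.ones((2, 2))                          # a simple summing kernel
stride = 2

# With stride 2, the 2x2 kernel lands at (4 - 2)//2 + 1 = 2 positions per axis
out_size = (image.shape[0] - kernel.shape[0]) // stride + 1
out = np.zeros((out_size, out_size))
for i in range(out_size):
    for j in range(out_size):
        # jump `stride` pixels between windows instead of 1
        window = image[i*stride:i*stride+2, j*stride:j*stride+2]
        out[i, j] = (window * kernel).sum()

print(out)  # each entry is the sum of one non-overlapping 2x2 window
```

Notice that doubling the stride shrinks the output: the windows skip over pixels instead of visiting every position.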

Padding

Sometimes the filter does not fit the input image perfectly. We have two options:

  • Pad the picture with zeros (zero-padding) so that it fits
  • Drop the part of the image where the filter does not fit. This is called valid padding, which keeps only the valid part of the image.
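The two options above correspond to convolve2d's "same" and "valid" modes; a small sketch with an all-ones input, just to show the shapes:

```python
import numpy as np
from scipy.signal import convolve2d

image = np.ones((4, 4))
kernel = np.ones((3, 3))

# "valid": no padding; the output shrinks to (4 - 3) + 1 = 2 per axis
valid = convolve2d(image, kernel, mode="valid")
print(valid.shape)  # (2, 2)

# "same": zero-pads the input so the output matches the input size
same = convolve2d(image, kernel, mode="same")
print(same.shape)   # (4, 4)
```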

This was not everything, but a small introduction to the workings of CNNs.

There are a lot of free videos available on YouTube, but I would strongly suggest reading research papers for any topic related to machine learning. It takes time, but it's worth it in the end.

Do follow me on GitHub for more awesome content. I have a repo under the name “Go-ML tutorials” (link) that is enough for getting started.

If you like the blog do share, like and comment. Follow me on Linkedin. And let's learn together.

letsprep

We help freshers in placements with the help of AI and letsprep courses

Written by Abhishek Patnaik

Tech Lead @ Ripplex India, Data Scientist and Researcher, Full Stack Developer, Graph Database

