How Does a Convolutional Neural Network Work

satyabrata pal
Published in ML and Automation
5 min read · Feb 24, 2020

Behind The Scenes Of A Convolutional Neural Network

Photo by Markus Spiske temporausch.com from Pexels

Update: I am so excited to announce that my deep learning course is now available on Udemy.

In my earlier posts here and here I talked about building a neural network using fastai.

Over there I talked about the “code-first” approach of creating a neural network and I didn’t talk about any theory.

Sometimes it becomes necessary to get into the theory of a concept to get it.

Don’t worry! I won’t bore you with several lines of maths and deep philosophy about how a neural network works on a subatomic level. Rather, I plan to keep it as visual as possible.

How A Computer Sees An Image

A digital image is actually a matrix.

The Matrix

No! Not this matrix. Actually, this one →

Matrix Representation of Felix the cat

I have taken the matrix representation of Felix the cat from the Klein Project Blog. Here the cat image on the left is represented as a 35*35 matrix on the right. The elements of this matrix are the numbers 0 and 1, and each number represents the colour of one pixel in the image.

0 means black and 1 means white.
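If you want to see this in code, here is a minimal sketch in Python using NumPy (my choice of library for illustration). The tiny 5*5 picture is made up and is not the Felix image; it only shows that a black-and-white image is a grid of 0s and 1s.

```python
import numpy as np

# A made-up 5*5 black-and-white picture (not the Felix image).
# 0 means black and 1 means white, as described above.
image = np.array([
    [1, 1, 0, 1, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 1, 0, 0],
    [1, 0, 0, 0, 1],
    [1, 1, 0, 1, 1],
])

print(image.shape)   # (rows, columns) of the pixel grid
print(image[2, 2])   # the pixel in the middle row and middle column
```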

How Does A CNN Process This Image Data

At its core, a deep neural network is little more than a collection of matrix multiplications and additions, with simple non-linear functions applied in between.

If you cut open a CNN and look inside it then you can see the following operations being performed →

  • The image matrix goes into the network as an input.
  • The network does not act on the large image matrix all at once. Rather, it goes chunk by chunk, taking a few pixels at a time. To select these pixels we use what is known as a “filter”. A filter is another, smaller matrix, usually a 3*3 matrix.
  • Each element of this smaller matrix is multiplied with the corresponding element of the part of the image matrix that the filter currently covers.
  • Finally, we add these products together to get a single number.
  • This computation is known as convolution.
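The steps above can be sketched for a single position of the filter. This is only an illustration in Python with NumPy; the numbers in `patch` and `filt` are hypothetical and were picked to make the arithmetic easy to check by hand.

```python
import numpy as np

# Hypothetical numbers, chosen only to keep the arithmetic simple.
patch = np.array([      # the part of the image currently under the filter
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
])
filt = np.array([       # the 3*3 filter
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
])

products = patch * filt     # element-by-element multiplication, not a matrix product
result = products.sum()     # add the nine products together
print(result)               # 3
```

Note that `*` on NumPy arrays multiplies matching positions, which is exactly the pairing described in the list.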

Got Confused? Me Too!

There’s no confusion that a good visualization can’t resolve. Suppose that we have a 4*4 image and a 2*2 filter. Going by the above description a CNN would perform the following operation.

Computation Inside a CNN

Notice how we got the following equations as a result of each convolution operation →

  • aA+bB+cE+dF = x1
  • aC+bD+cG+dH = x2
  • aI+bJ+cM+dN = x3
  • aK+bL+cO+dP = x4

The result of these four equations reduces to a single matrix →

Convolution Result

This way the convolution operation reduces the 4*4 image matrix to a 2*2 matrix, so the result takes up less memory in further operations down the network. (In this example the filter lands on four patches that do not overlap.)
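Here is a rough sketch of the whole 4*4 example in Python with NumPy. The function name `convolve` and its `step` parameter are my own, and I am assuming the image letters A to P are laid out row by row. The letters are replaced with stand-in numbers: A..P become 1..16 and the filter a, b, c, d becomes 1, 0, 0, 1, so x1 = aA+bB+cE+dF turns into 1 + 6 = 7.

```python
import numpy as np

def convolve(image, filt, step):
    """Slide filt over image, moving `step` pixels at a time.

    At every position: multiply element by element, then sum.
    Both image and filt are assumed to be square.
    """
    f = filt.shape[0]
    out = (image.shape[0] - f) // step + 1   # positions per row and per column
    result = np.zeros((out, out), dtype=image.dtype)
    for i in range(out):
        for j in range(out):
            top, left = i * step, j * step
            patch = image[top:top + f, left:left + f]
            result[i, j] = (patch * filt).sum()
    return result

# Stand-ins for the letters: A..P -> 1..16, and a, b, c, d -> 1, 0, 0, 1.
image = np.arange(1, 17).reshape(4, 4)
filt = np.array([[1, 0],
                 [0, 1]])

print(convolve(image, filt, step=2))
# [[ 7 11]
#  [23 27]]
```

With `step=2` the four patches do not overlap and the result is 2*2. With `step=1` the same code would give a 3*3 result instead.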

That’s it! This is what a CNN is.

What if the input matrix (i.e. the image) and the filter are the same size? In such a case we cheat. We add zeros around the input image to make it bigger. This cheating is known as “padding”. This way it sounds more technical. Just kidding!!

Convolution of a padded image
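Padding is easy to try out. A small sketch in Python, assuming NumPy and a made-up 2*2 image with a 2*2 filter of the same size; `np.pad` adds the ring of zeros.

```python
import numpy as np

image = np.array([[1, 2],
                  [3, 4]])          # a made-up 2*2 image
filt = np.array([[1, 0],
                 [0, 1]])           # a made-up 2*2 filter of the same size

# Add one ring of zeros around the image: 2*2 becomes 4*4.
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)
print(padded)
# [[0 0 0 0]
#  [0 1 2 0]
#  [0 3 4 0]
#  [0 0 0 0]]

# The filter now has room to move. Count its positions along one side
# when it moves one pixel at a time.
positions = padded.shape[0] - filt.shape[0] + 1
print(positions)   # 3
```

Without padding the filter would fit in exactly one place. After padding it fits in 3 places per side, so 9 places in total.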

The Reality

All this was a simplified version of a CNN. In reality, a colour image is a 3D matrix: height, width and three colour channels (red, green and blue).

So, in practice we have to do element-wise multiplication of the 3*3*3 = 27 elements of the 3D filter with the corresponding 27 elements of the image matrix under the filter.
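The 27 multiplications can be seen in a short sketch. This is Python with NumPy, and both the colour patch and the filter are filled with made-up random numbers, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the made-up numbers are repeatable

# A made-up 3*3 patch of a colour image: height, width and 3 colour channels.
patch = rng.integers(0, 256, size=(3, 3, 3))
# A made-up 3*3*3 filter.
filt = rng.standard_normal((3, 3, 3))

products = patch * filt          # element-by-element multiplication
print(products.size)             # 27 products
print(products.sum())            # all 27 are added into a single number
```

Just like the 2D case, one position of the filter still produces one number; there are simply more products to add up.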

Now, think of the image matrix of “Felix the Cat”, which we saw in the first section. It’s a 35*35 matrix. Sliding a 3*3 filter over this matrix already takes a lot of computation. Think of how much computation you would need for a colour image. The short answer is: a lot more.

What do we do in such a scenario? Well! We cheat again. This time we cheat by jumping over a couple of pixels while moving the filter through the image.

This jumping is known as a “stride”. Another fancy word to sound more technical.

If we jump 2 pixels at a time then it’s known as stride 2. If we jump 3 pixels at a time then it’s known as stride 3 and so on.

The doodle below will help you visualize it. Yes! I call it a doodle because it’s nowhere close to a “visualization”.

Strides in a CNN

In the first doodle, the blue shaded region is the area which the filter covers. Here it skips two columns, i.e. it jumps 2 pixels. It is a stride of 2.

In the second doodle, the blue shaded region is the area which the filter covers. Here it skips three columns, i.e. it jumps 3 pixels. It is a stride of 3.
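To get a feel for how much work a stride saves, here is a back-of-the-envelope sketch in Python. The helper name `positions_per_side` is my own, and I am assuming no padding and that a stride of s means the filter moves s pixels per step.

```python
def positions_per_side(image_size, filter_size, stride):
    """How many places the filter can land along one side of the image."""
    return (image_size - filter_size) // stride + 1

# The 35*35 Felix image with a 3*3 filter and no padding.
for stride in (1, 2, 3):
    side = positions_per_side(35, 3, stride)
    multiplications = side * side * 3 * 3   # 9 multiplications per position
    print(stride, side, multiplications)
# 1 33 9801
# 2 17 2601
# 3 11 1089
```

Going from a stride of 1 to a stride of 3 cuts the number of multiplications from 9801 to 1089 in this example, at the cost of a smaller output matrix.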

Conclusion

Well! What did you expect? That I would write more about a CNN? I won’t. At least not in this article, because this is pretty much all of it. This is how a CNN works behind the scenes.

There’s no magic and no fancy stuff going on inside it. Sorry about that.

Announcement

My deep learning course is available on Udemy at a 95% discount until midnight on 31st May. Use this link to avail the discount.

satyabrata pal
ML and Automation

A QA engineer by profession, ML enthusiast by interest, Photography enthusiast by passion and Fitness freak by nature