Simple CNN using NumPy Part II (Convolution Operation)

Published in Analytics Vidhya, Jun 20, 2021

In the previous post, I gave a brief introduction to convolutional neural networks, together with code for converting CSV data of flattened images back to their actual shapes. In this post, I will try to explain the following:

Convolution operation
Why is convolution needed?
Implementing it using NumPy

Convolution Operation

In the context of ConvNets, the convolution operation involves calculating the dot product between a fixed matrix and each of several regions of an image. The fixed matrix is known as the convolutional filter, and each region has the same shape as the filter. Here, the dot product means multiplying the filter and the region element by element and summing the results. Which regions are used is decided primarily by three parameters: the stride, the width of the filter, and the height of the filter.

The stride parameter decides how many pixels the filter shifts between consecutive dot product calculations.
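
To make the role of the stride concrete, here is a minimal sketch that lists where each region starts along one axis and computes a single dot product. The helper name `window_positions` and the toy 5X5 image are assumptions made for illustration only.

```python
import numpy as np

def window_positions(image_size, filter_size, stride):
    """Start index of every region along one axis (hypothetical helper)."""
    return list(range(0, image_size - filter_size + 1, stride))

# A 5-pixel axis scanned by a 3-pixel filter.
starts_stride_1 = window_positions(5, 3, stride=1)  # [0, 1, 2]
starts_stride_2 = window_positions(5, 3, stride=2)  # [0, 2]

# Toy image and filter, chosen only for illustration.
image = np.arange(25).reshape(5, 5)
kernel = np.ones((3, 3))

# Dot product for the top-left region: multiply element by element, then sum.
top_left = np.sum(image[0:3, 0:3] * kernel)  # 54.0
```

A larger stride visits fewer regions, so the output shrinks accordingly.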

The examples below showcase the convolution operation.

For stride = 1 and number of channels = 1

Convolution between the filter and example image

In the above example, the filter is moved across the image in steps of 1, and for each step, the dot product is calculated.

For stride = 1 and number of channels = 2

Convolution Operation for the multi-channel image. The number of image channels should be equal to the number of filter channels.

For multi-channel convolution operation, the number of image channels and the number of filter channels should be equal.
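
As a rough sketch of the multi-channel case, the dot product for one region simply sums over the channels as well. The 2-channel image and filter values below are made up for illustration and are not the ones from the figure.

```python
import numpy as np

# Hypothetical 2-channel 3x3 image, laid out as (channels, height, width).
image = np.array([[[1, 0, 0], [0, 1, 0], [0, 0, 1]],
                  [[0, 0, 1], [0, 1, 0], [1, 0, 0]]])

# Hypothetical 2-channel 2x2 filter, same layout.
kernel = np.array([[[1, 0], [0, 1]],
                   [[0, 1], [1, 0]]])

# The channel counts must agree, otherwise the product below is undefined.
assert image.shape[0] == kernel.shape[0]

# Top-left region across all channels, then one dot product over everything.
region = image[:, 0:2, 0:2]
value = np.sum(region * kernel)  # 2
```

Note that the two channels collapse into a single number per region, so one filter always produces one output channel.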

The resulting dimensions of the convolution operation can be calculated using the following equation:

Resultant Height or Width = ((Image Height or Width-Filter Height or Width)/stride) + 1
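
The equation can be written as a small helper. This is only a sketch: the function name `conv_output_size` is hypothetical, and using floor division when the stride does not divide evenly is an assumption that the text does not state.

```python
def conv_output_size(image_size, filter_size, stride=1):
    """Resultant height or width of a convolution without padding."""
    # Floor division is an assumption for strides that do not divide evenly.
    return (image_size - filter_size) // stride + 1

# A 28-pixel side with a 5-pixel filter and stride 1.
side = conv_output_size(28, 5, stride=1)  # 24
```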

Why is convolution needed?

The convolution operation helps uncover useful features of an image by selectively increasing and decreasing the pixel intensities. These useful features help distinguish one image from another, thus making the task of image recognition much more efficient.

For example, in the first example above, the convolution operation detects left-leaning diagonal lines. The second example detects left- and right-leaning diagonal lines. The following examples help detect vertical and horizontal lines in the Kannada digit 9 (“ombatu”).

Convolution Operation that helps detect vertical lines

Convolution Operation that helps detect horizontal lines
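
A small sketch of the same idea on a toy image: a filter with large weights in its middle column responds strongly to a vertical line, while its transpose does not respond at all. The filter values, the 5X5 image, and the helper `convolve2d` are assumptions for illustration, not the ones used in the figures.

```python
import numpy as np

def convolve2d(img, kernel):
    """Single-channel convolution with stride 1 (hypothetical helper)."""
    kh, kw = kernel.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

# Toy image: a single vertical line in the middle column.
image = np.zeros((5, 5))
image[:, 2] = 1.0

# Assumed line-detecting filters.
vertical_filter = np.array([[-1., 2., -1.],
                            [-1., 2., -1.],
                            [-1., 2., -1.]])
horizontal_filter = vertical_filter.T

vertical_response = convolve2d(image, vertical_filter)      # every row is [-3, 6, -3]
horizontal_response = convolve2d(image, horizontal_filter)  # all zeros
```

The peak of 6 sits exactly where the filter is centred on the line, which is the kind of selective increase in intensity described above.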

Implementing it using NumPy

Before implementing the convolution operation, I would like to initialize my convolutional filters. The convolution operation will take place between a (1X1X28X28) image and 2 filters of dimensions (1X5X5) each. The result of this operation is an output of dimensions (1X2X24X24), since ((28-5)/1) + 1 = 24.

import numpy as np
conv1 = np.random.randn(2,1,5,5) * np.sqrt(1. / 5)

The following is the pseudo-code for the naive implementation of the convolution operation.

Let N be the number of images

1. Create resultant matrix (R) of zeros of dimensions (NX2X24X24)
2. Choose a given image
3. Choose a filter. Let this be the ith filter. This will have the dimensions (1X5X5)
4. From the chosen image, select a rectangular portion of size (1X5X28)
5. From this rectangular portion, consecutively select portions of size (1X5X5), horizontally, shifting by the stride amount each time.
6. Take the dot product of the chosen filter and each (1X5X5) image portion and store the result in the matching position of R.
7. Repeat step 4 with the next (1X5X28) portion, derived by shifting the filter down by the stride amount.
8. Repeat steps 5 & 6 until the whole image is covered.
9. Repeat steps 3 to 8 for each remaining filter, then repeat steps 2 to 8 for all images

The code is as follows
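
A minimal sketch of the naive approach, assuming images shaped (N, channels, height, width) and filters shaped (number of filters, channels, height, width). The function name `conv_naive` is hypothetical, and the sketch writes each result directly into its position in R rather than appending.

```python
import numpy as np

def conv_naive(X, filters, stride=1):
    """Naive convolution without padding (illustrative sketch)."""
    n, channels, h, w = X.shape
    f, f_channels, fh, fw = filters.shape
    assert channels == f_channels, "image and filter channels must match"

    out_h = (h - fh) // stride + 1
    out_w = (w - fw) // stride + 1
    R = np.zeros((n, f, out_h, out_w))

    for img in range(n):                 # every image
        for i in range(f):               # every filter
            for row in range(out_h):     # move down by the stride
                top = row * stride
                for col in range(out_w):  # move right by the stride
                    left = col * stride
                    patch = X[img, :, top:top + fh, left:left + fw]
                    # Dot product: elementwise multiply, then sum.
                    R[img, i, row, col] = np.sum(patch * filters[i])
    return R

# Random stand-in for one 28x28 single-channel image.
X = np.random.randn(1, 1, 28, 28)
conv1 = np.random.randn(2, 1, 5, 5) * np.sqrt(1. / 5)
result = conv_naive(X, conv1)  # shape (1, 2, 24, 24)
```

The four nested loops make this easy to follow but slow in pure Python, which is the motivation for the matrix-based approach.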

Implementing convolution operation using im2col

Another way to implement convolution is to convert each region visited by the convolutional filter as it strides over an image into a column of a matrix.

A three-channeled image, with a (3X2X2) convolution filter with stride =1, has the following im2col representation

Let the im2col matrix be known as X_im2col. Then the calculation is as follows:

Flatten each convolutional filter into a row, giving one row per filter. Let this be known as conv1_flatten
Calculate C = conv1_flatten@X_im2col, where @ is matrix multiplication.
Reshape C to the shape of the convolution result (number of images X number of filters X resultant height X resultant width)

im2col implementation together with the convolution result can be coded the following way.
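
One possible sketch, under the same shape assumptions as before: images are (N, channels, height, width) and filters are (number of filters, channels, height, width). The function names `im2col` and `conv_im2col` are hypothetical, and the column matrix is built with plain loops for readability rather than speed.

```python
import numpy as np

def im2col(X, fh, fw, stride=1):
    """Turn every filter-sized region of X into one column."""
    n, channels, h, w = X.shape
    out_h = (h - fh) // stride + 1
    out_w = (w - fw) // stride + 1
    cols = np.zeros((channels * fh * fw, n * out_h * out_w))
    idx = 0
    for img in range(n):
        for row in range(out_h):
            for col in range(out_w):
                top, left = row * stride, col * stride
                patch = X[img, :, top:top + fh, left:left + fw]
                cols[:, idx] = patch.reshape(-1)
                idx += 1
    return cols, out_h, out_w

def conv_im2col(X, filters, stride=1):
    """Convolution as a single matrix multiplication."""
    n = X.shape[0]
    f, _, fh, fw = filters.shape
    X_im2col, out_h, out_w = im2col(X, fh, fw, stride)
    conv1_flatten = filters.reshape(f, -1)  # one row per filter
    C = conv1_flatten @ X_im2col            # (filters, N * out_h * out_w)
    # Columns are ordered image by image, so split that axis, then
    # move the image axis to the front.
    return C.reshape(f, n, out_h, out_w).transpose(1, 0, 2, 3)

X = np.array([[1, 0, 0], [1, 2, 3], [3, 4, 5]]).reshape(1, 1, 3, 3)
conv1 = np.array([[1, 0], [0, 1]]).reshape(1, 1, 2, 2)
result = conv_im2col(X, conv1)  # shape (1, 1, 2, 2)
```

The flattening order of each patch matches the flattening order of each filter, which is what makes the matrix product equal to the per-region dot products.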

Sanity Check

Convolution between

X = np.array([[1,0,0],[1,2,3],[3,4,5]])
X = X.reshape(1,1,3,3)

and

conv1 = np.array([[1,0],[0,1]])
conv1 = conv1.reshape(1,1,2,2)

should result in a matrix that has the shape (1,1,2,2) and the following entries

(1*1)+(0*0)+(1*0)+(2*1) = 3

(0*1)+(0*0)+(2*0)+(3*1) = 3

(1*1)+(2*0)+(3*0)+(4*1) = 5

(2*1)+(3*0)+(4*0)+(5*1) = 7

The output of the normal convolution is

array([[[[3., 3.],
         [5., 7.]]]])

and the output of the im2col convolution is

array([[[[3. 3.]
   [5. 7.]]]])
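
As an independent cross-check of the expected entries, the same result can be reproduced with NumPy's own sliding-window view (available from NumPy 1.20) and `einsum`. This is only a sketch for stride 1 and is not one of the implementations described in this post.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

X = np.array([[1, 0, 0], [1, 2, 3], [3, 4, 5]]).reshape(1, 1, 3, 3)
conv1 = np.array([[1, 0], [0, 1]]).reshape(1, 1, 2, 2)

# Every 2x2 region over the height and width axes:
# shape (images, channels, out_h, out_w, 2, 2).
windows = sliding_window_view(X, (2, 2), axis=(2, 3))

# Sum over channels (c) and filter positions (i, j) for each filter (f).
expected = np.einsum("nchwij,fcij->nfhw", windows, conv1)
# expected[0, 0] is [[3, 3], [5, 7]]
```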

Note:

The entries of the convolutional filters are randomized at the start. For every batch of data, the entries are gradually adjusted via backpropagation in order to minimize the loss function. The loss function used here is the cross-entropy loss, which is common for classification problems.