Implementing Convolution without for loops in NumPy!!!

Shashank Shekhar · Published in Analytics Vidhya · Mar 7, 2020 · 5 min read

INTRODUCTION

Convolutions with different kernels (3x3, 5x5) are used to apply effects to an image, such as sharpening, blurring, outlining or embossing.

Images are a bunch of numbers, represented as an array of some width by height pixels, where each pixel is associated with three values ranging from 0 to 255. These three numbers represent the red-ness, green-ness and blue-ness of a given pixel; the combination of the three captures its colour.

If the image is grayscale, a single value can be used per pixel, with 0 meaning black and 255 meaning white. The convolution is obtained by placing the kernel over every area of the picture like a sliding window, taking the element-wise product of the values in the kernel with the ones in the picture it overlaps, and summing them up, as shown below:
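For instance, a minimal numeric sketch of one sliding-window step (made-up values):

import numpy as np

# A 3x3 image patch and a 3x3 sharpening-style kernel (made-up values)
patch = np.array([[1., 2., 0.],
                  [4., 5., 6.],
                  [7., 8., 9.]])
kernel = np.array([[ 0., -1.,  0.],
                   [-1.,  5., -1.],
                   [ 0., -1.,  0.]])

# One output pixel: element-wise product of kernel and patch, summed
print(np.sum(patch * kernel))  # 5*5 - (2 + 4 + 6 + 8) = 5.0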

If we have a regular colour image with 3 channels, then our kernel should also have 3 channels, as shown below.

For this blog I will mostly be using grayscale images of dimension [1,1,10,10] and a kernel of dimension [1,1,3,3].

Approach

We won’t code the convolution as a loop since it would be very inefficient. Instead, we will vectorize our image so that the convolution operation just becomes a matrix product. This means taking each receptive field window and writing its numbers in a column, as shown below.

*Note: only 10 receptive field columns are displayed in the above image.
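To see this vectorization in action, here is an illustrative sketch using NumPy’s sliding_window_view (available in NumPy >= 1.20) rather than the index arrays we will build later; both produce the same receptive-field columns:

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

img = np.random.rand(1, 1, 10, 10)
pad_img = np.pad(img, ((0, 0), (0, 0), (1, 1), (1, 1)))  # zero 'same' padding

# Every 3x3 receptive field of the padded image: shape (10, 10, 3, 3)
windows = sliding_window_view(pad_img[0, 0], (3, 3))

# One flattened receptive field per column
cols = windows.reshape(100, 9).T
print(cols.shape)  # (9, 100) -- 100 columns, one per output pixel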

Similar vectorization can be done for colour images having 3 channels as shown below.

Vectorization for colour images.

Let’s code this!

So, let’s try implementing the convolution layer from scratch using NumPy!

Firstly, we will write a class Conv_Module that contains the basic code flow: its forward pass is not implemented, and its backward pass requires a bwd method to be defined.
Secondly, we will write a class Convolution which inherits from Conv_Module, overrides the forward method, and also contains the bwd method required by the backward pass.

import numpy as np
import matplotlib.pyplot as plt

img = np.random.rand(1, 1, 10, 10)  # input image: [B, D, H, W]
ker = np.random.rand(1, 1, 3, 3)    # kernel: [filters, D, H, W]
Input image and kernel visualisation

The img and ker above represent the image and kernel we will be using for our implementation. The naming convention is [B,D,H,W], where B is the batch size, D is the number of channels (depth), H is the height and W is the width.
The default padding and stride are 1, and b is the bias, which we will initialise as 0.

Conv_Module
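The Conv_Module gist is embedded on Medium and is not reproduced here; a minimal sketch of what such a base class could look like, assuming a fastai-style __call__/forward/backward flow (the exact details are an assumption):

class Conv_Module():
    # Base class: forward() is left unimplemented, and backward()
    # relies on a bwd() method that the subclass must define.
    def __call__(self, *args):
        self.args = args
        self.out = self.forward(*args)
        return self.out

    def forward(self):
        raise NotImplementedError('forward pass not implemented')

    def backward(self):
        self.bwd(self.out, *self.args)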

The major NumPy functionality we will be using is listed here (a quick demo of each follows the list):
1) np.repeat()
2) np.tile()
3) np.add.at()
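A quick demo of what each of these does:

import numpy as np

a = np.array([0, 1, 2])
print(np.repeat(a, 3))  # [0 0 0 1 1 1 2 2 2] -- each element repeated
print(np.tile(a, 3))    # [0 1 2 0 1 2 0 1 2] -- the whole array repeated

# np.add.at performs an unbuffered, in-place add, so repeated indices
# accumulate (plain fancy-index assignment would not)
b = np.zeros(3)
np.add.at(b, [0, 0, 1], 1)
print(b)                # [2. 1. 0.]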

Forward Pass

Most places in the code have been commented for better understanding, and the full code snippet is attached at the end of this blog. I will walk through the code, focusing mainly on the index calculation and the backward pass, which I feel are the key aspects.

The output height (self.out_h) and width (self.out_w) will be the same as the input height and width, i.e. 10, 10, as we are using same padding. Using pad_img, padding is created around the original image.
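In code, the output size and the padded image might be computed like this (a sketch using the standard convolution output-size formula; img and ker are from the snippet above):

H, W = img.shape[2], img.shape[3]    # 10, 10
kh, kw = ker.shape[2], ker.shape[3]  # 3, 3
pad, stride = 1, 1

out_h = (H + 2 * pad - kh) // stride + 1  # 10 -> 'same' padding
out_w = (W + 2 * pad - kw) // stride + 1  # 10

# Zero-pad only the two spatial dimensions
pad_img = np.pad(img, ((0, 0), (0, 0), (pad, pad), (pad, pad)))
print(out_h, out_w, pad_img.shape)        # 10 10 (1, 1, 12, 12)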

Index calculation

To store the receptive fields as vectors, indices are required. Here i relates to the first (row) index of the receptive field, j relates to the second (column) index, and k is the channel dimension. self.i is calculated as the sum of i0 and i1; similarly, self.j is calculated as the sum of j0 and j1.

Index corresponding to each receptive field.
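A sketch of this index construction, following the CS231n-style im2col recipe and continuing from the snippet above (D = 1 channel here):

D = 1  # number of channels

# i0/j0: row/col offsets *within* one receptive field, repeated per channel
i0 = np.tile(np.repeat(np.arange(kh), kw), D)     # (9,)
j0 = np.tile(np.arange(kw), kh * D)               # (9,)

# i1/j1: row/col position of each receptive field's top-left corner
i1 = stride * np.repeat(np.arange(out_h), out_w)  # (100,)
j1 = stride * np.tile(np.arange(out_w), out_h)    # (100,)

# Broadcasting the sums gives (9, 100) index matrices: entry [r, c]
# addresses pixel r of receptive field c in the padded image
i = i0.reshape(-1, 1) + i1.reshape(1, -1)
j = j0.reshape(-1, 1) + j1.reshape(1, -1)
k = np.repeat(np.arange(D), kh * kw).reshape(-1, 1)  # (9, 1) channel index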
Forward pass formula

Using the i, j, k indices, the image is stored in vectorized form; this is then multiplied by the weights to get the output of the forward pass (the convolution), as sketched below. A visualisation of the forward-pass output follows.
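Continuing from the snippets above, the forward pass might look like this sketch (B = 1 keeps the reshapes simple):

# Gather every receptive field with a single fancy-indexing operation
cols = pad_img[:, k, i, j]                               # (1, 9, 100)
cols = cols.transpose(1, 2, 0).reshape(D * kh * kw, -1)  # (9, 100)

w = ker.reshape(1, -1)  # one filter -> one row of 9 weights
b = 0.                  # bias initialised to 0
out = w @ cols + b      # (1, 100): the convolution as a matrix product
out = out.reshape(1, 1, out_h, out_w)  # back to image form (1, 1, 10, 10)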

Backward Pass

During the backward pass, three gradients need to be calculated:
a) the gradient with respect to the bias.
b) the gradient with respect to the weights.
c) the gradient with respect to the input image.

Formula for calculating the gradients w.r.t. the different parameters.

db is the gradient calculated w.r.t. the bias, dw is the gradient calculated w.r.t. the weights, and dX is the gradient calculated w.r.t. the input.

An empty pad of zeros with dimension [1,1,12,12] (the padded input shape) is created and, using the index values, filled with the gradient values.
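A sketch of the backward pass, continuing from the snippets above (grad_out, the upstream gradient of shape (1, 1, 10, 10), is an assumed input):

grad = grad_out.reshape(1, -1)           # (1, 100)

db = np.sum(grad_out)                    # gradient w.r.t. the bias (scalar here)
dw = (grad @ cols.T).reshape(ker.shape)  # gradient w.r.t. the weights, (1, 1, 3, 3)

# Gradient w.r.t. the input: route each column's gradient back to the pixels
# of its receptive field. Overlapping fields must accumulate, which is
# exactly what the unbuffered np.add.at provides.
dcols = (w.T @ grad).reshape(1, 9, 100)
dpad = np.zeros_like(pad_img)            # the [1,1,12,12] empty pad of zeros
np.add.at(dpad, (slice(None), k, i, j), dcols)
dX = dpad[:, :, 1:-1, 1:-1]              # strip the padding -> (1, 1, 10, 10)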

Visualisation of the gradient w.r.t. the input image.
Time taken for forward and backward calculation.

As we can see, the forward pass takes a mean of 376 µs and the backward pass a mean of 113 µs. The entire convolution code snippet is below…

Convolution Class
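The original gist is embedded on Medium; a reconstruction along the lines described in this post might look like the sketch below (storing the upstream gradient in self.g, and the Conv_Module base sketched earlier, are assumptions):

class Convolution(Conv_Module):
    def __init__(self, ker, pad=1, stride=1):
        self.w, self.b = ker, 0.            # kernel weights and bias
        self.pad, self.stride = pad, stride

    def forward(self, x):
        B, D, H, W = x.shape
        nf, _, kh, kw = self.w.shape
        self.out_h = (H + 2 * self.pad - kh) // self.stride + 1
        self.out_w = (W + 2 * self.pad - kw) // self.stride + 1
        p = self.pad
        self.pad_img = np.pad(x, ((0, 0), (0, 0), (p, p), (p, p)))

        # Index calculation (see above): i = i0 + i1, j = j0 + j1
        i0 = np.tile(np.repeat(np.arange(kh), kw), D)
        i1 = self.stride * np.repeat(np.arange(self.out_h), self.out_w)
        j0 = np.tile(np.arange(kw), kh * D)
        j1 = self.stride * np.tile(np.arange(self.out_w), self.out_h)
        self.i = i0.reshape(-1, 1) + i1.reshape(1, -1)
        self.j = j0.reshape(-1, 1) + j1.reshape(1, -1)
        self.k = np.repeat(np.arange(D), kh * kw).reshape(-1, 1)

        # Vectorize the receptive fields and convolve as a matrix product
        cols = self.pad_img[:, self.k, self.i, self.j]     # (B, D*kh*kw, L)
        self.cols = cols.transpose(1, 2, 0).reshape(D * kh * kw, -1)
        out = self.w.reshape(nf, -1) @ self.cols + self.b  # (nf, L*B)
        return out.reshape(nf, self.out_h, self.out_w, B).transpose(3, 0, 1, 2)

    def bwd(self, out, x):
        B, D, H, W = x.shape
        nf, _, kh, kw = self.w.shape
        grad = self.g.transpose(1, 2, 3, 0).reshape(nf, -1)   # upstream gradient

        self.db = self.g.sum(axis=(0, 2, 3))                  # gradient w.r.t. bias
        self.dw = (grad @ self.cols.T).reshape(self.w.shape)  # gradient w.r.t. weights

        # Gradient w.r.t. the input: scatter-add into a zero-padded image
        dcols = self.w.reshape(nf, -1).T @ grad
        dcols = dcols.reshape(D * kh * kw, -1, B).transpose(2, 0, 1)
        dpad = np.zeros_like(self.pad_img)
        np.add.at(dpad, (slice(None), self.k, self.i, self.j), dcols)
        p = self.pad
        self.dx = dpad[:, :, p:-p, p:-p] if p else dpad

A quick smoke test, reusing img and ker from earlier: conv = Convolution(ker); out = conv(img) should give out.shape == (1, 1, 10, 10).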

Conclusion

As convolution is a basic building block of most architectures, implementing it without any for loops saves a lot of computation time. Please share, and leave a comment if you liked it.
