Implementing Convolution without for loops in Numpy!!!
INTRODUCTION
Convolution with different kernels (3x3, 5x5) is used to apply effects to an image, such as sharpening, blurring, outlining or embossing.
An image is a bunch of numbers, represented as an array of some width by height pixels, where each pixel is associated with three values ranging from 0 to 255. These three numbers represent the red-ness, green-ness and blue-ness of a given pixel, and the combination of the three captures its colour.
If the image is grayscale, a single value per pixel suffices, with 0 meaning black and 255 meaning white. The convolution is obtained by sliding the kernel over every area of the picture like a window, taking the element-wise product of the kernel values with the picture values it overlaps, and summing them up as shown below:
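For reference, this sliding-window computation on a single-channel image can be written with explicit loops — the slow version this post sets out to avoid. This is just an illustrative sketch; the function name `conv2d_naive` is my own, not from the post:

```python
import numpy as np

def conv2d_naive(img, ker):
    """Valid convolution (no padding, stride 1) via explicit loops."""
    H, W = img.shape
    kh, kw = ker.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            # element-wise product of the kernel with the window it overlaps
            out[y, x] = np.sum(img[y:y + kh, x:x + kw] * ker)
    return out

img = np.arange(16, dtype=float).reshape(4, 4)
ker = np.ones((3, 3))
res = conv2d_naive(img, ker)   # each output value is the sum of one 3x3 window
```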
If we have a regular colour image with 3 channels, then our kernel should also have 3 channels, as shown below.
For this blog I will mostly be using grayscale images of dimension [1,1,10,10] and a kernel of dimension [1,1,3,3].
Approach
We won’t code the convolution as a loop since it would be very inefficient. Instead, we will vectorize our image so that the convolution operation becomes just a matrix product. This means taking each receptive field window and writing its numbers into a column, as shown below.
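A small illustration of this idea (not the post's exact code): `sliding_window_view`, available in NumPy ≥ 1.20, gathers every receptive field, and flattening each window into a column turns the convolution into one matrix product:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

img = np.arange(16, dtype=float).reshape(4, 4)
ker = np.arange(9, dtype=float).reshape(3, 3)

# every 3x3 window of the image -> shape (2, 2, 3, 3)
windows = sliding_window_view(img, (3, 3))

# flatten each window into a column -> shape (9, 4)
cols = windows.reshape(-1, 9).T

# the convolution is now a single matrix product
out = (ker.reshape(1, 9) @ cols).reshape(2, 2)
```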
A similar vectorization can be done for colour images with 3 channels, as shown below.
Let’s code this!
So, let’s try implementing the convolution layer from scratch using Numpy!
First, we will write a class Conv_Module that contains the basic code flow: its forward pass is not implemented, and its backward pass requires a bwd method to be defined.
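A minimal sketch of what such a base class could look like — the exact method bodies are my assumption, modeled on the description above:

```python
class Conv_Module():
    """Base class: calling the module runs the forward pass and
    caches the inputs so the backward pass can reuse them."""
    def __call__(self, *args):
        self.args = args
        self.out = self.forward(*args)
        return self.out

    def forward(self, *args):
        # forward pass is deliberately not implemented here
        raise NotImplementedError("subclasses must implement forward")

    def backward(self):
        # delegates to a bwd() method that subclasses must define
        self.bwd(self.out, *self.args)
```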
Second, we will use a class Convolution that inherits from Conv_Module, overrides the forward method, and also contains the bwd method required by the backward pass.
import numpy as np
import matplotlib.pyplot as plt

img = np.random.rand(1,1,10,10)
ker = np.random.rand(1,1,3,3)
Above, img and ker represent the image and kernel we will be using for our implementation. The naming convention is [B,D,H,W], where B is the batch size, D is the number of channels or depth, H is the height and W is the width.
Default padding and stride are both 1, and b is the bias, which we will initialise to 0.
The major NumPy functions we will be using are:
1) np.repeat()
2) np.tile()
3) np.add.at()
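A quick illustration of these three, since the index construction later leans on the difference between `repeat` (each element repeated consecutively) and `tile` (the whole array repeated), and on `add.at` for unbuffered scatter-adds:

```python
import numpy as np

rep = np.repeat(np.arange(3), 2)   # each element repeated: [0 0 1 1 2 2]
til = np.tile(np.arange(3), 2)     # whole array repeated:  [0 1 2 0 1 2]

# np.add.at accumulates even when indices repeat,
# whereas a fancy-index += would apply each index only once
acc = np.zeros(3)
np.add.at(acc, [0, 0, 1], 1.0)     # index 0 is hit twice
```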
Forward Pass
Most places in the code have been commented for better understanding. The full code snippet is attached at the end of this blog; here I will mainly go through the index calculation and the backward pass, which I feel are the key aspects.
The output height (self.out_h) and width (self.out_w) will be the same as the input height and width, i.e. 10,10, as we are using same padding. Using pad_img, padding is created around the original image.
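With a 10×10 input, 3×3 kernel, stride 1 and padding 1, the usual formula `out = (in + 2*pad - kernel) // stride + 1` gives 10 again, and the padded image can be built with `np.pad`. A sketch of that step (variable names assumed from the post):

```python
import numpy as np

img = np.random.rand(1, 1, 10, 10)   # [B, D, H, W]
pad, stride, kh, kw = 1, 1, 3, 3

B, D, H, W = img.shape
out_h = (H + 2 * pad - kh) // stride + 1   # (10 + 2 - 3)//1 + 1 = 10
out_w = (W + 2 * pad - kw) // stride + 1

# zero-pad only the two spatial dimensions -> [1, 1, 12, 12]
pad_img = np.pad(img, ((0, 0), (0, 0), (pad, pad), (pad, pad)))
```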
Index calculation
To store the receptive fields as vectors, indices are required. Here i relates to the first index of the receptive field, j relates to the second index, and k is the channel dimension. self.i is calculated as the sum of i0 and i1; similarly, self.j is calculated as the sum of j0 and j1.
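That construction can be sketched as follows (this follows the im2col indexing popularized by the CS231n assignments; the names i0/i1/j0/j1 match the post, the rest are my assumptions):

```python
import numpy as np

D, kh, kw = 1, 3, 3        # channels, kernel height/width
out_h, out_w = 10, 10      # output size with same padding, stride 1

# offsets of each element *within* a receptive field
i0 = np.repeat(np.arange(kh), kw)          # [0 0 0 1 1 1 2 2 2]
i0 = np.tile(i0, D)
j0 = np.tile(np.arange(kw), kh * D)        # [0 1 2 0 1 2 0 1 2]

# offsets of each receptive field over the padded image
i1 = np.repeat(np.arange(out_h), out_w)
j1 = np.tile(np.arange(out_w), out_h)

# broadcast-sum: one row per kernel element, one column per output pixel
i = i0.reshape(-1, 1) + i1.reshape(1, -1)  # shape (D*kh*kw, out_h*out_w)
j = j0.reshape(-1, 1) + j1.reshape(1, -1)
k = np.repeat(np.arange(D), kh * kw).reshape(-1, 1)  # channel of each row
```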
Using the i, j, k indices, the image is stored in vectorized form, and this is then multiplied by the weights to get the output of the forward pass (convolve). Below is a visualisation of the output from the forward pass.
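Putting it together, the vectorized forward pass gathers all receptive fields with one fancy-index and convolves via a single matrix product. A self-contained sketch (the post's class stores these pieces as attributes; names like `X_col` and `wgt` are mine):

```python
import numpy as np

B, D, H, W = 1, 1, 10, 10
kh, kw, pad = 3, 3, 1
out_h, out_w = H, W                      # same padding, stride 1

rng = np.random.default_rng(0)
img = rng.random((B, D, H, W))
wgt = rng.random((1, D, kh, kw))         # one output filter
b = 0.0

# im2col indices (see the index-calculation discussion)
i0 = np.tile(np.repeat(np.arange(kh), kw), D)
j0 = np.tile(np.arange(kw), kh * D)
i = i0.reshape(-1, 1) + np.repeat(np.arange(out_h), out_w).reshape(1, -1)
j = j0.reshape(-1, 1) + np.tile(np.arange(out_w), out_h).reshape(1, -1)
k = np.repeat(np.arange(D), kh * kw).reshape(-1, 1)

pad_img = np.pad(img, ((0, 0), (0, 0), (pad, pad), (pad, pad)))
X_col = pad_img[:, k, i, j]              # shape (B, D*kh*kw, out_h*out_w)

out = wgt.reshape(1, -1) @ X_col + b     # matrix product = convolution
out = out.reshape(B, 1, out_h, out_w)
```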
Backward Pass
During the backward pass, three gradients need to be calculated:
a) gradient with respect to bias.
b) gradient with respect to weight.
c) gradient with respect to input image.
db is the gradient calculated with respect to the bias, dw is the gradient calculated with respect to the weights, and X is the gradient calculated with respect to the input.
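Since the forward pass is out = W·X_col + b, the chain rule gives all three gradients as matrix products with the upstream gradient. A sketch under that convention, with random stand-ins for the cached tensors (`dout`, `dX_col` and the shapes are my assumptions):

```python
import numpy as np

# shapes as in the post: one 3x3 filter, 10x10 output, 9-row columns
rng = np.random.default_rng(1)
X_col = rng.random((1, 9, 100))        # vectorized receptive fields
wgt = rng.random((1, 1, 3, 3))
dout = rng.random((1, 1, 10, 10))      # upstream gradient

dout_flat = dout.reshape(1, 1, -1)     # (B, 1, out_h*out_w)

db = dout.sum()                                   # bias: sum of all grads
dw = (dout_flat @ X_col.transpose(0, 2, 1)).reshape(wgt.shape)
dX_col = wgt.reshape(1, -1).T @ dout_flat         # (B, 9, 100)
```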
An empty pad of zeros with dimension [1,1,12,12] is created and, using the index values, filled with the gradient values.
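This scatter step is where np.add.at matters: neighbouring receptive fields overlap, so the same padded pixel receives contributions from several columns, and a plain fancy-index += would silently keep only one of them. A sketch of the scatter (using dummy all-ones column gradients so the overlap counts are visible):

```python
import numpy as np

kh = kw = 3
out_h = out_w = 10
D, pad = 1, 1

# rebuild the im2col indices used in the forward pass
i0 = np.tile(np.repeat(np.arange(kh), kw), D)
j0 = np.tile(np.arange(kw), kh * D)
i = i0.reshape(-1, 1) + np.repeat(np.arange(out_h), out_w).reshape(1, -1)
j = j0.reshape(-1, 1) + np.tile(np.arange(out_w), out_h).reshape(1, -1)
k = np.repeat(np.arange(D), kh * kw).reshape(-1, 1)

dX_col = np.ones((1, D * kh * kw, out_h * out_w))  # dummy column grads

dpad = np.zeros((1, D, 12, 12))          # empty pad of zeros, [1,1,12,12]
np.add.at(dpad, (slice(None), k, i, j), dX_col)    # accumulate overlaps
dX = dpad[:, :, pad:-pad, pad:-pad]      # strip the padding -> (1,1,10,10)
```

With all-ones gradients, an interior pixel is covered by all 9 kernel positions while the corner pixel is covered by only 4, which is exactly the overlap count add.at must accumulate.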
As we can see, the forward pass takes 376 µs on average and the backward pass 113 µs. The entire convolution code snippet is below…
Conclusion
As convolution is the basic building block of many architectures, implementing it without any for loops saves a lot of computation time. Please share, and leave a comment if you liked it.