Simple CNN using NumPy Part II (Convolution Operation)
In the previous post, I gave a brief introduction to convolutional neural networks, together with code for converting CSV data of flattened images back to their actual shapes. In this post, I will try to explain the following:
- Convolution operation
- Why is convolution needed?
- Implementing it using NumPy
Convolution Operation
In the context of ConvNets, the convolution operation computes the dot product between a fixed matrix and each of several regions of an image. The fixed matrix is known as the convolutional filter, and every region has the same shape as the filter. Which regions are used is determined primarily by three parameters: the stride, the width of the filter, and the height of the filter.
The stride is the number of pixels the filter moves between consecutive dot product calculations.
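To make the role of the stride concrete, here is a minimal sketch that lists the regions a filter visits on a toy image. The helper name `iter_regions` and the 4X4 image are assumptions made purely for illustration.

```python
import numpy as np

def iter_regions(image, filter_h, filter_w, stride=1):
    """Yield (row, col, region) for every filter-sized region of a 2D image."""
    img_h, img_w = image.shape
    # Only start positions where the filter fits entirely inside the image
    for row in range(0, img_h - filter_h + 1, stride):
        for col in range(0, img_w - filter_w + 1, stride):
            yield row, col, image[row:row + filter_h, col:col + filter_w]

image = np.arange(16).reshape(4, 4)  # toy 4x4 image with values 0..15

# Top-left corners of the regions visited by a 2x2 filter
corners_stride1 = [(r, c) for r, c, _ in iter_regions(image, 2, 2, stride=1)]
corners_stride2 = [(r, c) for r, c, _ in iter_regions(image, 2, 2, stride=2)]

print(len(corners_stride1))  # 9
print(corners_stride2)       # [(0, 0), (0, 2), (2, 0), (2, 2)]
```

A larger stride skips positions, so fewer regions are visited and the output becomes smaller.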
The examples below showcase the convolution operation.
For stride = 1 and number of channels = 1
In the above example, the filter is moved across the image in steps of 1, and for each step, the dot product is calculated.
For stride = 1 and number of channels = 2
For a multi-channel convolution, the number of image channels and the number of filter channels must be equal.
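The channel requirement follows from how a single output value is computed: the filter is multiplied element-wise with a region that spans every channel, and everything is summed. The sketch below assumes a random 2-channel 4X4 image and a (2X3X3) filter; both are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.standard_normal((2, 4, 4))   # (channels, height, width)
kernel = rng.standard_normal((2, 3, 3))  # channel count matches the image

# The element-wise product below only makes sense if the channels match
assert image.shape[0] == kernel.shape[0]

region = image[:, 0:3, 0:3]      # top-left region, taken across all channels
value = np.sum(region * kernel)  # one output pixel

# The same number, computed as the sum of the per-channel dot products
per_channel = sum(np.sum(region[c] * kernel[c]) for c in range(image.shape[0]))
print(np.isclose(value, per_channel))  # True
```

However many channels the image has, one filter still produces a single-channel output.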
The resulting dimensions of the convolution operation can be calculated using the following equation:
Resultant height (or width) = ((image height (or width) - filter height (or width)) / stride) + 1
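The equation above translates directly into a small helper. The function name `conv_output_size` is an assumption, and so is the use of floor division for strides that do not divide evenly, which corresponds to dropping positions where the filter would run past the edge of the image.

```python
def conv_output_size(image_size, filter_size, stride=1):
    """Output height or width of a convolution without padding."""
    # Floor division drops positions where the filter would not fit
    return (image_size - filter_size) // stride + 1

print(conv_output_size(28, 5, stride=1))  # 24
print(conv_output_size(28, 5, stride=2))  # 12
```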
Why is convolution needed?
The convolution operation helps uncover useful features of an image by selectively increasing and decreasing pixel intensities. These features help distinguish one image from another, which makes the task of image recognition easier.
In the first example above, the convolution operation detects left-leaning diagonal lines, and the second example detects both left- and right-leaning diagonal lines. The following examples help detect vertical and horizontal lines in the Kannada digit 9 (“ombatu”).
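As a rough illustration of line detection, the sketch below applies two hand-picked filters to a toy image containing a single vertical line. The filter values and the 5X5 image are assumptions chosen for clarity; they are not the filters from the examples in this post.

```python
import numpy as np

def convolve2d(image, kernel):
    """Stride-1, no-padding convolution of a 2D image with a 2D kernel."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.zeros((5, 5))
image[:, 2] = 1.0  # a vertical line down the middle column

vertical_kernel = np.array([[-1., 2., -1.]] * 3)  # responds to vertical lines
horizontal_kernel = vertical_kernel.T             # responds to horizontal lines

vertical_response = convolve2d(image, vertical_kernel)
horizontal_response = convolve2d(image, horizontal_kernel)

print(vertical_response.max())            # 6.0
print(np.abs(horizontal_response).max())  # 0.0
```

The vertical filter responds strongly where it is centred on the line, while the horizontal filter gives no response at all.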
Implementing it using NumPy
Before implementing the convolution operation, I initialize my convolutional filters. The convolution takes place between a (1X1X28X28) image and 2 filters of dimensions (1X5X5) each. With a stride of 1, each output side is (28 - 5)/1 + 1 = 24, so the result of this operation is an image of dimensions (1X2X24X24).
import numpy as np
conv1 = np.random.randn(2,1,5,5) * np.sqrt(1. / 5)
The following is the pseudo-code for the naive implementation of the convolution operation.
Let N be the number of images.
1. Create a result matrix (R) of zeros with dimensions (NX2X24X24).
2. Choose an image.
3. Choose a filter. Let this be the ith filter. It has the dimensions (1X5X5).
4. From the chosen image, select a horizontal strip of size (1X5X28).
5. From this strip, select consecutive portions of size (1X5X5), moving horizontally by the stride amount.
6. Take the dot product of the chosen filter and each (1X5X5) image portion, and store the result in the corresponding entry of R.
7. Select the next (1X5X28) strip by shifting down by the stride amount, as in step 4.
8. Repeat steps 5 to 7 until the whole image is covered.
9. Repeat steps 3 to 8 for every filter.
10. Repeat steps 2 to 9 for all images.
The code is as follows
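One way the steps above might look in NumPy is sketched below. The function name `convolve_naive`, its argument layout, and the random test batch are assumptions for illustration, so treat this as a sketch of the pseudo-code rather than a definitive implementation.

```python
import numpy as np

def convolve_naive(images, filters, stride=1):
    """images: (N, C, H, W); filters: (F, C, FH, FW); returns (N, F, OH, OW)."""
    n_images, _, img_h, img_w = images.shape
    n_filters, _, f_h, f_w = filters.shape
    out_h = (img_h - f_h) // stride + 1
    out_w = (img_w - f_w) // stride + 1
    result = np.zeros((n_images, n_filters, out_h, out_w))

    for n in range(n_images):          # choose an image
        for f in range(n_filters):     # choose a filter
            for i in range(out_h):
                top = i * stride
                # Horizontal strip as tall as the filter: (C, FH, W)
                strip = images[n, :, top:top + f_h, :]
                for j in range(out_w):
                    left = j * stride
                    patch = strip[:, :, left:left + f_w]  # (C, FH, FW)
                    # Dot product of the patch and the filter
                    result[n, f, i, j] = np.sum(patch * filters[f])
    return result

rng = np.random.default_rng(0)
X_batch = rng.standard_normal((3, 1, 28, 28))  # 3 made-up images
conv1 = rng.standard_normal((2, 1, 5, 5)) * np.sqrt(1.0 / 5)
print(convolve_naive(X_batch, conv1).shape)  # (3, 2, 24, 24)
```

The four nested Python loops make this easy to follow but slow for large batches, which is the motivation for the im2col approach.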
Implementing convolution operation using im2col
Another way to implement convolution is to convert every image region that the convolutional filter visits into a column of a matrix.
For a three-channel image and a (3X2X2) convolutional filter with stride = 1, the im2col representation is as follows
Let the im2col matrix be known as X_im2col. Then the calculation is as follows:
- Flatten each convolutional filter into a row, giving a matrix with one row per filter. Let this be known as conv1_flatten.
- Calculate C = conv1_flatten @ X_im2col, where @ is matrix multiplication.
- Reshape C into the expected output shape.
The im2col implementation, together with the resulting convolution, can be coded in the following way.
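A minimal sketch of this idea is given below. The function names `im2col` and `convolve_im2col` and the column ordering (image first, then row, then column) are assumptions, and the patches are gathered with plain loops for readability rather than speed.

```python
import numpy as np

def im2col(images, f_h, f_w, stride=1):
    """images: (N, C, H, W) -> matrix of shape (C*FH*FW, N*OH*OW)."""
    n, c, h, w = images.shape
    out_h = (h - f_h) // stride + 1
    out_w = (w - f_w) // stride + 1
    cols = np.zeros((c * f_h * f_w, n * out_h * out_w))
    col = 0
    for img in images:
        for i in range(0, h - f_h + 1, stride):
            for j in range(0, w - f_w + 1, stride):
                # Each visited patch becomes one flattened column
                cols[:, col] = img[:, i:i + f_h, j:j + f_w].ravel()
                col += 1
    return cols

def convolve_im2col(images, filters, stride=1):
    """images: (N, C, H, W); filters: (F, C, FH, FW); returns (N, F, OH, OW)."""
    n, _, h, w = images.shape
    n_filters, _, f_h, f_w = filters.shape
    out_h = (h - f_h) // stride + 1
    out_w = (w - f_w) // stride + 1
    X_im2col = im2col(images, f_h, f_w, stride)
    conv1_flatten = filters.reshape(n_filters, -1)  # one row per filter
    C = conv1_flatten @ X_im2col                    # (F, N*OH*OW)
    # Columns are ordered image, then row, then column, so unpack in that order
    return C.reshape(n_filters, n, out_h, out_w).transpose(1, 0, 2, 3)

rng = np.random.default_rng(0)
X_batch = rng.standard_normal((3, 1, 28, 28))  # 3 made-up images
conv1 = rng.standard_normal((2, 1, 5, 5)) * np.sqrt(1.0 / 5)
print(im2col(X_batch, 5, 5).shape)            # (25, 1728)
print(convolve_im2col(X_batch, conv1).shape)  # (3, 2, 24, 24)
```

All of the multiply-and-sum work now happens in a single matrix multiplication, at the cost of the extra memory needed to store the overlapping patches.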
Sanity Check
Convolution between
X = np.array([[1,0,0],[1,2,3],[3,4,5]])
X = X.reshape(1,1,3,3)
and
conv1 = np.array([[1,0],[0,1]])
conv1 = conv1.reshape(1,1,2,2)
should result in a matrix that has the shape (1,1,2,2) and the following entries
(1*1)+(0*0)+(1*0)+(2*1) = 3
(0*1)+(0*0)+(2*0)+(3*1) = 3
(1*1)+(2*0)+(3*0)+(4*1) = 5
(2*1)+(3*0)+(4*0)+(5*1) = 7
The output of the normal convolution is
array([[[[3., 3.],
         [5., 7.]]]])
and the output of the im2col convolution is
array([[[[3., 3.],
         [5., 7.]]]])
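As an independent cross-check, the same numbers can be reproduced with NumPy's built-in `sliding_window_view` (available from NumPy 1.20) and `einsum`. This is a different technique from the two implementations discussed above and is included only as a sketch for verification.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

X = np.array([[1, 0, 0], [1, 2, 3], [3, 4, 5]]).reshape(1, 1, 3, 3)
conv1 = np.array([[1, 0], [0, 1]]).reshape(1, 1, 2, 2)

# windows has shape (N, C, OH, OW, FH, FW): every 2x2 patch of every channel
windows = sliding_window_view(X, (2, 2), axis=(2, 3))

# Multiply each patch by each filter, then sum over channels and filter entries
result = np.einsum("ncijhw,fchw->nfij", windows, conv1)

expected = np.array([[[[3, 3], [5, 7]]]])
print(result.shape)                      # (1, 1, 2, 2)
print(np.array_equal(result, expected))  # True
```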
Note:
The entries of the convolutional filters are randomized at the start. For every batch of data, the entries are gradually adjusted via backpropagation in order to minimize the loss function. The loss function used here is the cross entropy loss, which is common for classification problems.
Feedback
Thanks for reading! If you have any feedback or suggestions, please email me at padhokshaja@gmail.com. I will try my best to get back to you.
Next Post
ReLU, Max Pool, Softmax