Simple CNN using NumPy Part V (Back Propagation Through Max Pool Layer & Convolutional Filter)

Pradeep Adhokshaja
Analytics Vidhya
Jun 20, 2021


In the previous blog posts, I explained the earlier stages of building this CNN, including the forward pass and the first steps of back propagation.

In this post, I will try to cover back propagation through the max pooling and convolutional layers. We have worked our way back to the gradients at the first fully connected layer. Let's revisit the architecture once more through the following hand-drawn image:

Architecture of CNN

We have calculated the gradients up to the flattened layer so far. We now need to calculate the gradients for the rest of the neural network.

Calculating the Gradient at the Max Pooling Layer

Given that the first fully connected layer is just a reshaped version of the max pooling output, we only need to reshape the gradient matrix at the first fully connected layer back to the shape of the max pooling output. The code snippet for this is as follows:

delta_maxpool = delta_0.reshape(X_maxpool.shape)
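For concreteness, here is a minimal sketch with hypothetical shapes: a batch of 10 images, 2 channels, and 12×12 pooled maps (assuming a 2×2 max pool with stride 2 over the 24×24 conv output, consistent with the shapes used later in this post):

import numpy as np

# hypothetical shapes for illustration only
delta_0 = np.random.rand(10, 2 * 12 * 12)          # gradient at the flattened layer
X_maxpool = np.random.rand(10, 2, 12, 12)          # max pool output from the forward pass
delta_maxpool = delta_0.reshape(X_maxpool.shape)   # back to shape (10, 2, 12, 12)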

Calculating the Gradient at the Convolutional Layer

The max pooling operation uses a filter of fixed size to extract the maximum pixel value from each region of the image that the filter covers. The filter is moved across the image according to two user-defined parameters: stride and filter size.

To calculate the gradients at the convolutional layer, we need to move each gradient element back to the position in the convolutional layer from which the maximum pixel value was extracted; every other position receives a gradient of zero.

delta_conv = np.zeros(X_conv.shape)
for image in range(len(max_indices)):
    # max_indices (saved during the forward pass in an earlier post) maps each
    # pooled position back to the conv-layer position that held the max value
    indices = max_indices[image]
    for p in indices:
        # route the gradient back to the position of the max value
        delta_conv[image:image+1, p[1], p[2], p[3]] = delta_maxpool[image:image+1, p[5], p[6], p[7]]
# gradients flow only where the ReLU pre-activation was positive
delta_conv = np.multiply(delta_conv, dReLU(X_conv))
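The dReLU helper above was defined earlier in this series; here is a minimal sketch of what it is assumed to compute:

import numpy as np

def dReLU(x):
    # derivative of ReLU: 1 where the pre-activation is positive, 0 elsewhere
    return (x > 0).astype(float)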

Calculating the Gradient of the Convolutional Filter

Now that we have found the gradient at the convolutional layer, we need to calculate the gradient of the convolutional filter. This will then be used to update the filter in each learning step.

Simple Pseudo-Code to Calculate the Filter Gradient

Let G be the gradient matrix at the convolutional layer; it has shape (1, 2, 24, 24). Let I be the input image, of shape (1, 1, 28, 28).

  1. For a given channel C in G, pick a gradient element.
  2. Slide a 5×5 window with stride = 1 over the input image I; each gradient element corresponds to exactly one 5×5 chunk of I.
  3. Multiply each gradient element in channel C by its corresponding chunk, and add the resulting 5×5 matrices up.
  4. The 5×5 matrix produced by this accumulation is the gradient for channel C of the convolutional filter.
  5. Repeat steps 1 to 4 for the rest of the channels in G.

The above pseudo-code handles a single image; for the entire batch, we repeat it for every image and sum the per-image gradients, as in the sketch below.
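Here is a minimal, runnable sketch of these steps for a single image. The helper name conv_filter_grad_naive and the explicit loops are my own illustration; the 5×5 filter size and the shapes follow the description above.

import numpy as np

def conv_filter_grad_naive(I, G, k=5):
    # I: input image, shape (1, 1, 28, 28)
    # G: gradient at the conv layer, shape (1, C, 24, 24)
    # returns the filter gradient, shape (C, 1, k, k)
    n_channels, out_h, out_w = G.shape[1], G.shape[2], G.shape[3]
    grad = np.zeros((n_channels, 1, k, k))
    for c in range(n_channels):              # step 5: every channel in G
        for i in range(out_h):
            for j in range(out_w):           # step 1: pick a gradient element
                chunk = I[0, 0, i:i+k, j:j+k]         # step 2: the matching 5x5 chunk
                grad[c, 0] += G[0, c, i, j] * chunk   # steps 3-4: multiply and accumulate
    return grad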

The following GIF from HackMD illustrates the process:

Diagrammatic explanation of back propagation through convolutional filter

Given that we have to accumulate the gradient with respect to the convolutional filter element by element, this process takes a lot of time. One way to speed it up is to use the im2col() function.

Pseudo-Code for Faster Computation of the Filter Gradient

Let G be the gradient matrix at the layer after the convolution operation; it has shape (1, 2, 24, 24). Let I be the input to the convolutional filter; it has shape (1, 1, 28, 28).

  1. Convert the input image to im2col format; the im2col matrix is a 2D matrix in which each column is a flattened vector of the elements covered by a single stride of the convolutional filter.
  2. Reshape the gradient matrix G to a 2D matrix in which each row is the flattened gradient of one channel.
  3. Multiply these two matrices and reshape the result.
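For reference, here is a minimal im2col sketch under these assumptions (single-channel input, no padding, stride 1). The actual im2col used below was defined in an earlier post of this series and takes the filter tensor conv1 to infer the patch size, so treat this as an illustration rather than the author's exact implementation.

import numpy as np

def im2col_simple(X, k=5, stride=1):
    # X: (N, 1, H, W) -> (k*k, N*out_h*out_w); each column is one flattened patch
    N, _, H, W = X.shape
    out_h = (H - k) // stride + 1
    out_w = (W - k) // stride + 1
    cols = np.zeros((k * k, N * out_h * out_w))
    col = 0
    for n in range(N):
        for i in range(0, out_h * stride, stride):
            for j in range(0, out_w * stride, stride):
                cols[:, col] = X[n, 0, i:i+k, j:j+k].ravel()
                col += 1
    return cols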

The code snippet below presents the function that reshapes the gradient matrix:

def error_layer_reshape(error_layer):
    # reshape (N, C, H, W) into (C, N*H*W): each row holds the flattened
    # gradient of one channel across the whole batch
    test_array = error_layer
    test_array_new = np.zeros((test_array.shape[1], test_array.shape[0] * test_array.shape[2] * test_array.shape[3]))
    for i in range(test_array_new.shape[0]):
        test_array_new[i:i+1, :] = test_array[:, i:i+1, :, :].ravel()
    return test_array_new

The code snippet to find the gradient with respect to the convolutional filter is as follows:

# im2col is defined in an earlier post of this series
X_batch_im2col = im2col(X=X_batch, conv1=conv1, stride=1, pad=0)
# random values stand in for the delta_conv computed above, for illustration
delta_conv = np.random.rand(10, 2, 24, 24)
delta_conv_reshape = error_layer_reshape(delta_conv)
# the matmul sums over the batch and all output positions, giving shape (2, 1, 5, 5)
conv1_delta = (delta_conv_reshape @ X_batch_im2col.T).reshape(2, 1, 5, 5)
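As a sanity check under the assumptions above (the hypothetical naive helper and the patch ordering from the im2col sketch), the vectorized result should match the naive per-image gradients summed over the batch. The im2col formulation simply replaces the explicit Python loops with a single matrix multiply that NumPy dispatches to optimized routines.

# hypothetical check: the im2col gradient equals the naive loop summed over the batch
naive_total = sum(conv_filter_grad_naive(X_batch[n:n+1], delta_conv[n:n+1])
                  for n in range(X_batch.shape[0]))
assert np.allclose(conv1_delta, naive_total)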


Feedback

Thanks for reading! If you have any feedback or suggestions, please feel free to comment below or email me at padhokshaja@gmail.com.

Next Post

Putting it all together
