Simple CNN using NumPy Part V (Back Propagation Through Max pool Layer & Convolutional Filter)
In the previous blog posts, I tried to explain the following
- Introduction of CNNs and Data Processing
- The Convolution Operation
- ReLU, Maxpooling and Softmax
- Backpropagation through fully connected layers
In this post, I will try to cover backpropagation through the max pooling and convolutional layers. So far, we have worked our way through calculating the gradients up to the first fully connected layer. Let's revisit the architecture once more through the following hand-drawn image
We have calculated the gradients at the flattened layer so far.
We need to now calculate the gradients at the rest of the neural network.
Calculating gradients at Max pooling Layer
Given that the first fully connected layer is just a reshaped version of the max pooling layer, we only need to reshape the gradient matrix at the first fully connected layer back to the shape of the max pooling layer output. The code snippet for this is as follows
delta_maxpool = delta_0.reshape(X_maxpool.shape)
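Since reshaping does not mix values, a toy-shaped sketch confirms that the gradient simply flows back unchanged (the shapes here are illustrative, not the network's actual ones):

```python
import numpy as np

# Illustrative shapes: a batch of 10 images whose max-pool output has
# 2 channels of 12x12, flattened to (10, 288) for the dense layers.
X_maxpool = np.zeros((10, 2, 12, 12))
delta_0 = np.random.rand(10, 2 * 12 * 12)  # gradient at the flattened layer

# Flattening is a pure re-indexing, so backpropagating through it is just
# the inverse reshape -- every gradient value keeps its position.
delta_maxpool = delta_0.reshape(X_maxpool.shape)
print(delta_maxpool.shape)  # (10, 2, 12, 12)
```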
Calculating gradient at Convolutional Layer
The max pooling operation uses a filter of fixed size to extract the maximum pixel value from each region of the image that the filter covers. The filter is moved across the image using two user-defined parameters: stride and filter size.
To calculate the gradients at the convolutional layer, we need to route each gradient element back to the position in the convolutional layer from which the maximum pixel value was extracted.
delta_conv = np.zeros(X_conv.shape)
for image in range(len(max_indices)):
    # max_indices records, for each pooled value, the position in the
    # conv output where the maximum was found during the forward pass
    indices = max_indices[image]
    for p in indices:
        # route the gradient back to the position of the maximum;
        # every other position receives zero gradient
        delta_conv[image:image+1, p[1], p[2], p[3]] = delta_maxpool[image:image+1, p[5], p[6], p[7]]
# apply the ReLU derivative, since ReLU sits between the convolution and the pooling
delta_conv = np.multiply(delta_conv, dReLU(X_conv))
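The snippet above relies on max_indices and dReLU defined in the earlier posts. As a self-contained illustration (toy values, a single image and channel, 2x2 pooling with stride 2; all names here are mine), the routing works like this:

```python
import numpy as np

# Toy conv output (4x4) and the gradient at its 2x2 pooled output.
X_conv = np.array([[1., 3., 2., 0.],
                   [4., 2., 1., 5.],
                   [0., 1., 2., 2.],
                   [3., 1., 0., 4.]])
delta_maxpool = np.array([[10., 20.],
                          [30., 40.]])

delta_conv = np.zeros_like(X_conv)
for i in range(2):
    for j in range(2):
        window = X_conv[2*i:2*i+2, 2*j:2*j+2]
        # position of the maximum inside this 2x2 pooling window
        r, c = np.unravel_index(np.argmax(window), window.shape)
        # only the max position receives the gradient; the rest stay 0
        delta_conv[2*i + r, 2*j + c] = delta_maxpool[i, j]
print(delta_conv)
```

In the real layer, the result is additionally multiplied element-wise by dReLU(X_conv), because ReLU was applied to the convolution output before pooling.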
Calculating gradient of convolutional filter
Now that we have found the gradient of the convolutional layer, we need to calculate the gradient of the convolutional filter. This will then be used to optimize the filter in each learning step.
Simple Pseudo-Code to calculate the filter gradient
Let G be the gradient matrix for the convolutional layer. This has a dimension of (1,2,24,24). Let I be the input image of shape (1,1,28,28).
- For a given channel C in G, pick a gradient element.
- Use a 5x5 filter with stride=1 to extract the 5x5 chunk of the input image I that produced that element.
- Multiply the chosen gradient element by its chunk, and add up these products over all elements of channel C.
- The resultant matrix after this accumulation is the gradient associated with channel C of the convolutional filter.
- Repeat steps 1 to 4 for the rest of the channels in G.
The above pseudo-code handles a single image; we need to repeat it for the entire batch.
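With the shapes from the pseudo-code, the single-image case can be sketched as a plain triple loop (a sketch, assuming stride 1 and no padding):

```python
import numpy as np

# Shapes follow the post: G is (1, 2, 24, 24), I is (1, 1, 28, 28),
# filters are 5x5 with stride 1 and no padding.
G = np.random.rand(1, 2, 24, 24)   # gradient at the conv layer
I = np.random.rand(1, 1, 28, 28)   # input image
filter_grad = np.zeros((2, 1, 5, 5))

for c in range(G.shape[1]):            # repeat for every channel of G
    for y in range(G.shape[2]):        # pick each gradient element in turn
        for x in range(G.shape[3]):
            # the 5x5 chunk of I that produced G[0, c, y, x]
            chunk = I[0, 0, y:y+5, x:x+5]
            # scale the chunk by the gradient element and accumulate
            filter_grad[c, 0] += G[0, c, y, x] * chunk
```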
The following GIF from HackMD explains the process better
Given that we have to find the gradient with respect to the convolutional filter pixel by pixel, this process takes a lot of time. One way to speed it up is to use the im2col() function.
Pseudo Code for faster computation of gradient of convolutional filter
Let G be the gradient matrix at the layer after the convolution operation. This has a shape of (1,2,24,24). Let the input to the convolutional filter be I, with a shape of (1,1,28,28).
- Convert the input image to im2col format; the im2col matrix is a 2D matrix where each column is a flattened vector of the elements covered in a single stride of the convolutional filter.
- Reshape the gradient matrix G to a 2D matrix; each row is a flattened vector of the gradients in one channel.
- Multiply these two matrices and reshape the result.
The code snippet below presents the function that reshapes the gradient matrix
def error_layer_reshape(error_layer):
    # (N, C, H, W) -> (C, N*H*W): one row per channel
    test_array = error_layer
    test_array_new = np.zeros((test_array.shape[1], test_array.shape[0]*test_array.shape[2]*test_array.shape[3]))
    for i in range(test_array_new.shape[0]):
        # flatten all images' gradients for channel i into row i
        test_array_new[i:i+1, :] = test_array[:, i:i+1, :, :].ravel()
    return test_array_new
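Incidentally, the loop above is equivalent to a transpose followed by a reshape, which may read more cleanly (a sketch; the name error_layer_reshape_v2 is mine):

```python
import numpy as np

def error_layer_reshape_v2(error_layer):
    # Bring the channel axis first, then flatten the rest per channel:
    # (N, C, H, W) -> (C, N, H, W) -> (C, N*H*W)
    return error_layer.transpose(1, 0, 2, 3).reshape(error_layer.shape[1], -1)
```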
The code snippet to find the gradient with respect to the convolutional filter is as follows

X_batch_im2col = im2col(X=X_batch, conv1=conv1, stride=1, pad=0)
# placeholder gradient at the convolutional layer, for demonstration
delta_conv = np.random.rand(10,2,24,24)
delta_conv_reshape = error_layer_reshape(delta_conv)
# (2, 10*24*24) @ (10*24*24, 25) -> (2, 25), reshaped to (2, 1, 5, 5)
conv1_delta = (delta_conv_reshape@X_batch_im2col.T).reshape(2,1,5,5)
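The im2col() helper comes from an earlier post in the series and is not reproduced here. A minimal stand-in (my own simplified version, stride 1, no padding, hypothetical name im2col_simple) lets the whole computation run end to end:

```python
import numpy as np

def im2col_simple(X, fh, fw, stride=1):
    # X: (N, C, H, W). Returns a 2D matrix whose columns are flattened
    # fh x fw patches, with patch positions ordered image by image.
    N, C, H, W = X.shape
    out_h = (H - fh) // stride + 1
    out_w = (W - fw) // stride + 1
    cols = np.zeros((C * fh * fw, N * out_h * out_w))
    col = 0
    for n in range(N):
        for y in range(out_h):
            for x in range(out_w):
                patch = X[n, :, y*stride:y*stride+fh, x*stride:x*stride+fw]
                cols[:, col] = patch.ravel()
                col += 1
    return cols

# Toy batch: 10 single-channel 28x28 images, conv gradient (10, 2, 24, 24).
X_batch = np.random.rand(10, 1, 28, 28)
delta_conv = np.random.rand(10, 2, 24, 24)

X_batch_im2col = im2col_simple(X_batch, 5, 5, stride=1)        # (25, 10*576)
# same row-per-channel layout as error_layer_reshape in the post
delta_conv_reshape = delta_conv.transpose(1, 0, 2, 3).reshape(2, -1)
conv1_delta = (delta_conv_reshape @ X_batch_im2col.T).reshape(2, 1, 5, 5)
```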
Feedback
Thanks for reading! If you have any feedback or suggestions, please feel free to comment below or email me at padhokshaja@gmail.com