This post demonstrates how the convolution operation can be used to carry out backpropagation in a CNN.
Difference between Convolution Operation and Correlation.
Let’s consider the input matrix and the filter that will be used to carry out the convolution, as given above.
The correlation of the filter matrix with the input matrix is shown in the figure below.
The convolution of the filter matrix with the input matrix is equivalent to rotating the filter by 180 degrees and then computing the correlation of the rotated filter with the input matrix.
As the image above shows, convolution is the same operation as correlation, except that the filter is rotated first.
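A minimal NumPy sketch of the two operations, assuming stride 1 and no padding. The function names `correlate2d_valid` and `convolve2d_valid`, and the 3x3 input and 2x2 filter values, are illustrative choices and are not taken from the figures. Convolution is implemented exactly as described above: rotate the filter by 180 degrees (`np.rot90` applied twice), then correlate.

```python
import numpy as np


def correlate2d_valid(x, f):
    """Correlation: slide f over x without flipping it (stride 1, no padding)."""
    fh, fw = f.shape
    oh, ow = x.shape[0] - fh + 1, x.shape[1] - fw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Element-wise product of the filter with the current patch, summed.
            out[i, j] = np.sum(x[i:i + fh, j:j + fw] * f)
    return out


def convolve2d_valid(x, f):
    """Convolution: correlation with the filter rotated by 180 degrees."""
    return correlate2d_valid(x, np.rot90(f, 2))


X = np.arange(1, 10, dtype=float).reshape(3, 3)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
F = np.array([[1.0, 2.0],
              [3.0, 4.0]])

print(correlate2d_valid(X, F).tolist())  # [[37.0, 47.0], [67.0, 77.0]]
print(convolve2d_valid(X, F).tolist())   # [[23.0, 33.0], [53.0, 63.0]]
```

For example, the top-left correlation value is 1·1 + 2·2 + 4·3 + 5·4 = 37, while the convolution pairs the same patch with the rotated filter [[4, 3], [2, 1]]: 1·4 + 2·3 + 4·2 + 5·1 = 23.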
Forward and Backward Propagation using the Convolution Operation.
Note: To derive the gradient equations for the filter values and the input matrix values, we will treat the convolution operation as a correlation operation, just for simplicity.
Therefore, the convolution operation can be written as shown in the figure below.
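As a sketch in index form, assuming stride 1, no padding, 0-based indices and a filter of size k_h by k_w (the exact sizes come from the figures), each output element is a sliding dot product of the filter with a patch of the input. The symbol O for the output and the sizes are introduced here for illustration; the second line is the case i = j = 0 for a 2x2 filter.

```latex
\begin{aligned}
O_{ij} &= \sum_{m=0}^{k_h-1}\sum_{n=0}^{k_w-1} X_{i+m,\,j+n}\,F_{mn} \\
O_{00} &= X_{00}F_{00} + X_{01}F_{01} + X_{10}F_{10} + X_{11}F_{11}
\end{aligned}
```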
It can be visualized in the figure below.
Now, to calculate the gradient of the error ‘E’ with respect to the filter ‘F’, the following equations need to be solved.
which evaluates to
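Written out in index form (a sketch assuming the forward pass O_{ij} = Σ_m Σ_n X_{i+m, j+n} F_{mn} with stride 1 and no padding, where O denotes the layer output), the chain rule sums over every output element that the filter entry F_{mn} contributes to. Because ∂O_{ij}/∂F_{mn} = X_{i+m, j+n}, this gives the first line below; the second line is the illustrative case of a 3x3 input and a 2x2 filter, where the output is 2x2.

```latex
\begin{aligned}
\frac{\partial E}{\partial F_{mn}}
  &= \sum_{i}\sum_{j} \frac{\partial E}{\partial O_{ij}}\,\frac{\partial O_{ij}}{\partial F_{mn}}
   = \sum_{i}\sum_{j} \frac{\partial E}{\partial O_{ij}}\,X_{i+m,\,j+n} \\
\frac{\partial E}{\partial F_{00}}
  &= \frac{\partial E}{\partial O_{00}}X_{00} + \frac{\partial E}{\partial O_{01}}X_{01}
   + \frac{\partial E}{\partial O_{10}}X_{10} + \frac{\partial E}{\partial O_{11}}X_{11}
\end{aligned}
```

This has the same shape as the forward equation, with ∂E/∂O playing the role of the filter as it slides over X.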
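Looking closely, the above equation can be written in the form of our convolution operation: the gradient of the error with respect to the filter is the convolution of the input matrix ‘X’ with the gradient of the error with respect to the output.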
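The filter-gradient rule can be checked numerically. This sketch assumes stride 1, no padding, and, purely for testing, a hypothetical scalar loss E = sum(d_out * O), chosen so that dE/dO equals the random array `d_out` exactly. The helper names (`correlate2d_valid`, `filter_grad`) and the array sizes are illustrative. `filter_grad` correlates the input X with dE/dO, and the result is compared against central finite differences.

```python
import numpy as np


def correlate2d_valid(x, f):
    """Slide f over x without flipping it (stride 1, no padding)."""
    fh, fw = f.shape
    oh, ow = x.shape[0] - fh + 1, x.shape[1] - fw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + fh, j:j + fw] * f)
    return out


def filter_grad(x, d_out):
    """dE/dF: correlate the layer input with the upstream gradient dE/dO."""
    return correlate2d_valid(x, d_out)


rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4))      # illustrative input
F = rng.standard_normal((3, 3))      # illustrative filter -> 2x2 output
d_out = rng.standard_normal((2, 2))  # stands in for dE/dO


def loss(f):
    # Hypothetical test loss whose gradient w.r.t. the output is d_out.
    return np.sum(d_out * correlate2d_valid(X, f))


# Central finite differences, one filter entry at a time.
eps = 1e-6
numeric = np.zeros_like(F)
for m in range(F.shape[0]):
    for n in range(F.shape[1]):
        step = np.zeros_like(F)
        step[m, n] = eps
        numeric[m, n] = (loss(F + step) - loss(F - step)) / (2 * eps)

print(np.allclose(filter_grad(X, d_out), numeric))  # True
```

Note that correlating a 4x4 input with a 2x2 output gradient yields a 3x3 result, the same shape as the filter, as it must.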
Similarly, we can find the gradient of the error ‘E’ with respect to the input matrix ‘X’.
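In index form (again a sketch assuming O_{ij} = Σ_m Σ_n X_{i+m, j+n} F_{mn}, stride 1, no padding, 0-based indices), an input element X_{pq} appears in O_{ij} exactly when p = i + m and q = j + n for some valid filter index (m, n). So ∂O_{ij}/∂X_{pq} = F_{p-i, q-j}, taken as 0 when p - i or q - j falls outside the filter. The last two lines are the illustrative 3x3-input, 2x2-filter case.

```latex
\begin{aligned}
\frac{\partial E}{\partial X_{pq}}
  &= \sum_{i}\sum_{j} \frac{\partial E}{\partial O_{ij}}\,F_{p-i,\,q-j} \\
\frac{\partial E}{\partial X_{00}} &= \frac{\partial E}{\partial O_{00}}F_{00} \\
\frac{\partial E}{\partial X_{11}}
  &= \frac{\partial E}{\partial O_{00}}F_{11} + \frac{\partial E}{\partial O_{01}}F_{10}
   + \frac{\partial E}{\partial O_{10}}F_{01} + \frac{\partial E}{\partial O_{11}}F_{00}
\end{aligned}
```

The filter indices run backwards relative to the output indices (F_{11} pairs with O_{00}, F_{00} with O_{11}), which is where the 180-degree rotation comes from; corner inputs such as X_{00} receive only one term, which is why the output gradient has to be zero-padded.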
The above computation can be expressed as a different type of convolution operation known as full convolution. To obtain the gradient with respect to the input matrix, we rotate the filter by 180 degrees and compute the full convolution of the rotated filter with the gradient of the error with respect to the output, as represented in the image below.
In a full convolution, the filter slides over every position where it overlaps the output gradient by at least one element, which is equivalent to zero-padding the output gradient by (filter size - 1) on every side before sliding. This procedure is represented in the figure below.
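A sketch of the input gradient via full convolution, assuming stride 1 and no padding in the forward pass. Following the note above, "convolution" here means correlation, so the full convolution is implemented by zero-padding dE/dO with (filter size - 1) rows and columns on every side and then sliding the 180-degree-rotated filter over it. The function names, sizes, and the test loss E = sum(d_out * O) are hypothetical, used only to check the result against finite differences.

```python
import numpy as np


def correlate2d_valid(x, f):
    """Slide f over x without flipping it (stride 1, no padding)."""
    fh, fw = f.shape
    oh, ow = x.shape[0] - fh + 1, x.shape[1] - fw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + fh, j:j + fw] * f)
    return out


def full_correlate2d(d_out, f):
    """'Full' mode: zero-pad by (filter size - 1) on every side, then slide f."""
    fh, fw = f.shape
    padded = np.pad(d_out, ((fh - 1, fh - 1), (fw - 1, fw - 1)))
    return correlate2d_valid(padded, f)


def input_grad(f, d_out):
    """dE/dX: full convolution of the 180-degree-rotated filter with dE/dO."""
    return full_correlate2d(d_out, np.rot90(f, 2))


rng = np.random.default_rng(1)
X = rng.standard_normal((4, 4))      # illustrative input
F = rng.standard_normal((3, 3))      # illustrative filter
d_out = rng.standard_normal((2, 2))  # stands in for dE/dO


def loss(x):
    # Hypothetical test loss whose gradient w.r.t. the output is d_out.
    return np.sum(d_out * correlate2d_valid(x, F))


eps = 1e-6
numeric = np.zeros_like(X)
for p in range(X.shape[0]):
    for q in range(X.shape[1]):
        step = np.zeros_like(X)
        step[p, q] = eps
        numeric[p, q] = (loss(X + step) - loss(X - step)) / (2 * eps)

grad = input_grad(F, d_out)
print(grad.shape)                  # (4, 4), same as X
print(np.allclose(grad, numeric))  # True
```

The padding is what lets corner inputs, such as the top-left element, receive a gradient from the single output element they contribute to.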
Hence both the forward and backward propagation can be performed using the convolution operation.
The gradients of the pooling and ReLU layers can be calculated in the same way, by applying the chain rule of derivatives.
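A minimal sketch of these two backward rules, assuming 2x2 max pooling with stride 2 (a common choice, not specified in the text) and an input with even height and width. The function names and example values are illustrative. For ReLU, the local derivative is 1 where the pre-activation is positive and 0 elsewhere; for max pooling, each upstream gradient flows only to the element that was the maximum of its window (ties go to the first maximum found by `np.argmax`).

```python
import numpy as np


def relu_forward(z):
    return np.maximum(z, 0.0)


def relu_backward(z, d_a):
    """dE/dz = dE/da * da/dz, where da/dz is 1 for z > 0 and 0 otherwise."""
    return d_a * (z > 0)


def maxpool2x2_forward(a):
    """2x2 max pooling with stride 2 (height and width must be even)."""
    h, w = a.shape
    return a.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


def maxpool2x2_backward(a, d_pool):
    """Send each pooled gradient back to the argmax position of its window."""
    h, w = a.shape
    d_a = np.zeros_like(a)
    for i in range(h // 2):
        for j in range(w // 2):
            window = a[2 * i:2 * i + 2, 2 * j:2 * j + 2]
            r, c = np.unravel_index(np.argmax(window), window.shape)
            d_a[2 * i + r, 2 * j + c] = d_pool[i, j]
    return d_a


z = np.array([[1.0, -2.0, 3.0, 0.5],
              [-1.0, 4.0, -3.0, 2.0],
              [0.5, -0.5, 1.0, -1.0],
              [2.0, 1.0, -2.0, 3.0]])
a = relu_forward(z)
pooled = maxpool2x2_forward(a)        # [[4.0, 3.0], [2.0, 3.0]]

d_pool = np.array([[0.1, 0.2],
                   [0.3, 0.4]])        # stands in for the upstream gradient
d_a = maxpool2x2_backward(a, d_pool)  # non-zero only at the four window maxima
d_z = relu_backward(z, d_a)           # chain rule through ReLU
print(d_z)
```

Here every window's maximum is positive, so the ReLU mask keeps all four routed gradients: d_z is 0.1 at (1, 1), 0.2 at (0, 2), 0.3 at (3, 0), 0.4 at (3, 3), and 0 elsewhere.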