An Illustrated Explanation of Performing 2D Convolutions Using Matrix Multiplications
Introduction
In this article, I will explain how 2D convolutions are implemented as matrix multiplications. This explanation is based on the notes from CS231n: Convolutional Neural Networks for Visual Recognition (Module 2). I assume the reader is familiar with the concept of a convolution operation in the context of a deep neural network. If not, this repo has a report and excellent animations explaining what convolutions are. The code to reproduce the computations in this article can be downloaded here.
Explanation
Small Example
Suppose we have a single-channel 4 x 4 image, X, and its pixel values are as follows:
Further suppose we define a 2D convolution with the following properties:
This means that there will be nine 2 x 2 image patches that will be element-wise multiplied with the matrix W, like so:
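As a concrete sketch of the patch extraction, the snippet below slides a 2 x 2 window over a 4 x 4 image with stride 1 and no padding, which yields the nine patches described above. The pixel values of X here are placeholders, not the values from the figures:

```python
import numpy as np

# Placeholder pixel values for the 4 x 4 image X (not the figure's values).
X = np.arange(16).reshape(4, 4)

# Slide a 2 x 2 window over X with stride 1 and no padding:
# (4 - 2 + 1) * (4 - 2 + 1) = 9 patches in total.
patches = [X[i:i + 2, j:j + 2] for i in range(3) for j in range(3)]

print(len(patches))  # 9
print(patches[0])    # the top-left 2 x 2 patch of X
```

Each of these patches is what gets element-wise multiplied with W (and summed) to produce one entry of the convolution output.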
These image patches can be represented as 4-dimensional column vectors and concatenated to form a single 4 x 9 matrix, P, like so:
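This flattening-and-stacking step is commonly called im2col. A minimal sketch, again using placeholder pixel values for X rather than the figure's values:

```python
import numpy as np

X = np.arange(16).reshape(4, 4)  # placeholder pixel values

# Flatten each 2 x 2 patch into a 4-dimensional column vector and
# stack the nine columns to form the 4 x 9 matrix P ("im2col").
cols = [X[i:i + 2, j:j + 2].reshape(4) for i in range(3) for j in range(3)]
P = np.stack(cols, axis=1)

print(P.shape)  # (4, 9)
```

Each column of P holds one image patch, so a matrix product of the flattened kernel with P computes all nine patch-wise multiply-and-sum operations at once.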