4 Pictures of Matrix Multiplication

Joshua Pickard
Published in Geek Culture · 8 min read · Feb 1, 2022

Computer graphics, deep learning, and whatever device you are reading this on all rely on matrix multiplication. With the amount of data generated and stored in the digital world, matrix multiplication is required to efficiently process and store information. This article describes 4 simple and powerful ways to think about matrix multiplication.


Understanding different ways to think about matrix multiplication is helpful for anyone who is interested in data science or linear algebra, wants to optimize how they use computers, or wants to think about the world differently.

Why Matrices

Beyond being interesting, matrices are incredibly useful to work with. On the one hand, they are a convenient tool for storing, organizing, and representing data: a matrix is essentially a spreadsheet of numbers. On the other hand, matrices are used as functions that can relate information across multiple domains. The basis of deep learning is using matrix multiplication to construct functions of interest. This second use of matrices as functions is a little more abstract, but it will become clearer throughout the different representations of matrix multiplication.

If matrices represent data or functions, then matrix multiplication represents information that can be found by relating multiple data sources or functions. Understanding how matrix multiplication manipulates whatever data or function you are working with can help you think of new ways to work with and solve your problems.

A Simple Mathematical Background

The amount of math required to read this article is relatively minimal. It can be summarized as follows.

A matrix is a grid of numbers that is used to store all sorts of information. An m×n matrix has m rows and n columns. In order for 2 matrices to be multiplied, the first matrix must have as many columns as the second matrix has rows. For example, an m×n matrix can be multiplied with an n×p matrix, giving a product of size m×p.

Image by Khan Academy
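This shape rule is easy to check with NumPy (a quick sketch of my own; NumPy is also used in all the code later in this article):

import numpy as np

A = np.ones((2, 3))  # a 2x3 matrix of ones
B = np.ones((3, 4))  # a 3x4 matrix of ones

# A has 3 columns and B has 3 rows, so the product is defined
print(A.dot(B).shape)  # (2, 4), i.e. m x p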

Linear Transformations

A linear transformation or linear map is a special type of function that only uses addition and multiplication. These transformations are nice to work with because almost anyone can do addition and multiplication, but they are also a very powerful tool. Matrices and linear transformations are synonymous, since every matrix is a linear transformation and every linear transformation can be written as a matrix.
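Here is a small sketch of my own showing what linearity means in practice: applying a matrix to a weighted sum of vectors gives the same answer as weighting and summing the individual results.

import numpy as np

A = np.array([[2, 0],
              [1, 3]])  # a matrix, i.e. a linear transformation
x = np.array([1, 2])
y = np.array([3, 4])

# Linearity: A(3x + 5y) equals 3(Ax) + 5(Ay)
print(A.dot(3 * x + 5 * y))         # [36 96]
print(3 * A.dot(x) + 5 * A.dot(y))  # [36 96]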

Since a matrix is a linear transformation, the product of matrix multiplication, a new matrix, is also a linear transformation. The properties of the product can be studied by considering the properties of each matrix and thinking about the multiplication in the following 4 ways.

4 Pictures of Matrix Multiplication

The remainder of the article explains these 4 methods. From now on, consider the multiplication problem A×B=C, where A and B are the matrices being multiplied and C is the product. The methods can be summarized as follows:

1. Elementwise: calculate each element in C by doing a series of element by element multiplications with the values from A and B. This is how most people learn to do matrix multiplication.

2. Column-wise: calculate each column of C by taking a linear combination of columns in A.

3. Row-wise: calculate each row of C as a linear combination of rows in B.

4. Matrix Sum: calculate C by summing outer products of columns of A with rows of B.

Elementwise Multiplication

This is typically the first approach taught for matrix multiplication. To compute C, we need to calculate each element C(i,j), where i is the row and j is the column of the element. C(i,j) is calculated by taking the dot product of the ith row of A with the jth column of B.

Dot Product

The dot product is a very simple operation that can be applied to 2 vectors of the same length, such as matrix columns or rows. For vectors u and v, both of which have n elements, the dot product u·v is calculated by multiplying the ith element of u by the ith element of v, for all values of i, and summing up the products.

When doing this method by hand, we typically circle a row of the first matrix and a column of the second to compute an element in the product.

Image by mathisfun.com

In the figure above, the dot product is computed: 1×7+2×9+3×11 = 58.

Fortunately, NumPy has a built-in function, np.dot, to calculate the dot product between a row and a column.
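For example, np.dot reproduces the computation from the figure above:

import numpy as np

u = np.array([1, 2, 3])   # row of the first matrix
v = np.array([7, 9, 11])  # column of the second matrix
print(np.dot(u, v))       # 58

Using this for every row and column pair gives a full matrix multiplication routine: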

import numpy as np

def elementwise_multiplication(A, B):
    # C has as many rows as A and as many columns as B
    C = np.zeros((len(A), len(B[0])))
    for i in range(len(C)):         # loop over the rows of C
        for j in range(len(C[0])):  # loop over the columns of C
            # C(i,j) is the dot product of row i of A and column j of B
            C[i,j] = A[i,:].dot(B[:,j])
    return C

In the code above, A[i,:] is the ith row of A and B[:,j] is the jth column of B. The double for loop iterates over each value in the range of i and j, which are set by the size of the output matrix C.

This function has some properties that will be similar in the following sections. C is initialized to be a matrix of all zeros. Its size is set to have the number of rows in A and the number of columns in B. All code in this article can be found and run in this Google Colab notebook.
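As a quick usage check (the example values are my own, chosen so the first entry matches the dot product figure above):

A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[ 7,  8],
              [ 9, 10],
              [11, 12]])

print(elementwise_multiplication(A, B))
# [[ 58.  64.]
#  [139. 154.]]
print(A.dot(B))  # NumPy's built-in product gives the same values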

While this method is easy to learn, it doesn’t provide much insight into how the rows of A relate to the columns of B or how either of these relate to the structure of C. The next 2 methods give more insight into this.

Column Picture

Column-wise multiplication relies on thinking about matrix-vector multiplication and is almost identical to the row picture framework. I recently wrote an article on matrix-vector multiplication that explains 2 different ways to think about it in detail, but I will also give a short description of what is needed here.

Consider the problem of multiplying a matrix by a column vector on the right. From the perspective of a matrix, the product of the multiplication will be a linear combination of the columns of the matrix. This is seen in the figure below, where the columns are multiplied by coefficients from the column vector and then added together to give the final product.

Image by Eli Bendersky on thegreenplace

From the perspective of linear transformations, an m×n matrix is a function that maps vectors with n elements to vectors with m elements. In such a matrix, there are n column vectors, each with m terms, which span the column space. The product of applying the matrix function is a vector with m elements, and since a matrix is a linear transformation, the product will be a linear combination of the n columns. The vector being multiplied contains the coefficients for the linear combination of the n columns of the matrix.
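A small sketch of my own makes this concrete: the matrix-vector product equals a weighted sum of the matrix's columns.

import numpy as np

A = np.array([[1, 2],
              [3, 4],
              [5, 6]])
x = np.array([10, 20])

print(A.dot(x))                   # [ 50 110 170]
# The same result as a linear combination of A's columns
print(10 * A[:,0] + 20 * A[:,1])  # [ 50 110 170]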

From both the matrix and linear transformation perspective, the product is a linear combination of the columns of the matrix. The code below performs matrix matrix multiplication based on this way of thinking.

def columnwise_multiplication(A, B):
    C = np.zeros((len(A), len(B[0])))
    for column in range(len(B[0])):  # loop over the columns of C
        # each column of C is A times the corresponding column of B
        C[:,column] = A.dot(B[:,column])
    return C

The above code loops over the columns of C, referenced with the term C[:,column], calculating them one at a time. Each column is calculated by multiplying the entire matrix A by a column of B, B[:,column], which takes a linear combination of the columns of A.

Row Picture

This process is identical to the column picture with the exception that it considers left multiplication rather than right multiplication. Using this perspective, the product C will have rows that are linear combinations of the rows in matrix B. The rows of A contain the coefficients of the linear combinations. The figure below is a great example of what this looks like.

Image by Eli Bendersky on thegreenplace

This diagram shows how the product C will have rows that are linear combinations of the rows in B. This diagram and the one above are nearly identical; the only difference is working in the row space versus the column space.
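The same idea in code (example values are my own): a row vector times a matrix is a linear combination of the matrix's rows.

import numpy as np

B = np.array([[1, 2, 3],
              [4, 5, 6]])
y = np.array([10, 20])  # a single row of A

print(y.dot(B))                   # [ 90 120 150]
# The same result as a linear combination of B's rows
print(10 * B[0,:] + 20 * B[1,:])  # [ 90 120 150]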

The code for this looks very similar to the code for the column picture.

def rowwise_multiplication(A, B):
    C = np.zeros((len(A), len(B[0])))
    for row in range(len(A)):  # loop over the rows of C
        # each row of C is the corresponding row of A times B
        C[row,:] = B.T.dot(A[row,:])
    return C

This code is set up and run very similarly to the code above it. One thing to note is that in order to apply left multiplication of a row of A, A[row,:], with the matrix B, the transpose of B is right multiplied with the row of A. These operations are equivalent.
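To see that equivalence concretely (a minimal sketch with made-up values):

import numpy as np

B = np.array([[1, 0],
              [2, 1],
              [0, 3]])
r = np.array([1, 2, 3])  # a row of A, with one element per row of B

print(r.dot(B))    # [ 5 11], left multiplication of the row with B
print(B.T.dot(r))  # [ 5 11], the transposed form used in the function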

Matrix Sum Picture

This is perhaps the least intuitive representation of matrix multiplication, but it gives a great intuition for how to think about concepts such as the Singular Value Decomposition or Principal Component Analysis, 2 very powerful techniques in data science and linear algebra.

In the first method, elementwise multiplication, a row of A was multiplied by a column of B to give a single element of C. In this method, a column of A is multiplied by a row of B to give a matrix the size of C. This operation is sometimes called the outer product. Since the shapes of the vectors being multiplied have changed, the shape of the product changes as well. Adding up these matrices, one for each column of A paired with the corresponding row of B, gives the final product C.
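A minimal sketch of a single outer product (values are my own):

import numpy as np

a = np.array([1, 2, 3])  # a column of A (m = 3)
b = np.array([4, 5])     # a row of B (p = 2)

print(np.outer(a, b))  # a 3x2 matrix, the same shape as C
# [[ 4  5]
#  [ 8 10]
#  [12 15]]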

This process can be tedious to perform by hand, but the code for it is relatively simple.

def matrix_sum_multiplication(A, B):
    C = np.zeros((len(A), len(B[0])))
    for n in range(len(A[0])):  # loop over the columns of A / rows of B
        # outer product of column n of A with row n of B, same size as C
        C += np.dot(np.atleast_2d(A[:,n]).T, np.atleast_2d(B[n,:]))
    return C

In the above code, A[:,n] and B[n,:] represent a column of A and a row of B respectively. The function np.atleast_2d is used to make the row and column vectors appear as 2D arrays in NumPy rather than 1D so that np.dot will work as expected. The output of np.dot in this case is a matrix the same size as C, so it is added to C with +=.
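Finally, a quick sanity check of my own (in the spirit of the Colab notebook linked above) confirms that all 4 pictures compute the same product:

import numpy as np

A = np.random.rand(3, 4)
B = np.random.rand(4, 5)

for multiply in (elementwise_multiplication, columnwise_multiplication,
                 rowwise_multiplication, matrix_sum_multiplication):
    assert np.allclose(multiply(A, B), A.dot(B))
print("All 4 pictures agree with np.dot")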

If you made it this far, hit the clap button or give me a follow. I’m new to Medium, and trying to crank out some content about how I think about math, data science, and computers. Follow me if that’s your sort of thing.


Joshua Pickard

Computer Science and Bioinformatics @ University of Michigan. Website: https://jpickard1.github.io/ Twitter: @JoshuaPickard_