Math for AI — Matrix Operations (part 2)

Enrique Lopez
9 min read · Nov 24, 2023


Matrix operations are at the core of modern deep learning architectures, and a strong grasp of them will help you greatly in understanding what goes on “under the hood” of these architectures.

This is part 2 of a multi-post blog series covering matrix operations from theory to practice. The other posts in this series are referenced below:

If you haven’t already, I highly recommend looking into the previous posts (linked above)!

Okay, let’s start then; today we will go head first into matrix operations.
The structure will be as follows:
1. 2-dimensional, 3-dimensional, …, 1000-dimensional (??) matrices
2. Matrix (addition and subtraction)
3. Matrix (multiplication and division)
4. Matrix and beyond!

1. 2-dimensional, 3-dimensional, …, 1000-dimensional (??) matrices

In the last section we covered scalars and vectors, which are instances of 0-dimensional and 1-dimensional matrices. Then what are 2-dimensional (2D) and 3D matrices? Can we have more than 3 dimensions? What are they useful for?
2D matrices can look as follows:

Example of a 2D matrix with 3 rows and 2 columns

and 3D matrices as follows:

Example of a 3D matrix with depth of 3, 3 rows and 3 columns

To answer whether we can have matrices with more than 3 dimensions: YES, we can! Below is an example of a 4D matrix:

Example of a 4D matrix with 2 instances of the 4th dimension, a depth of 3, 3 rows and 3 columns.

Ok, it’s getting quite hard to visualize them, but in principle you can have as many dimensions as you need (… or want)! Also, each dimension can represent whatever additional information you need!

For 2D matrices, we could for example store what people have in their pantries: we can treat the rows as people and the columns as pantry items (switching the rows and columns is also possible). Let’s say we only consider 3 people: John, Alex and Sam. Also, we live in a simple world with only 3 possible pantry items: eggs, cookies and baguettes. This gives a 3x3 matrix, as follows:

We can now see what each person has in their pantry. How many cookies does each person have?
John has 1 cookie.
Alex has 7 cookies.
Sam has 6 cookies.
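If you’d like to follow along in code, here’s a minimal NumPy sketch of this pantry matrix. The cookie counts match the text above, but note that the egg and baguette counts are made up for illustration, since the original figure isn’t reproduced here:

```python
import numpy as np

# Rows: John, Alex, Sam; columns: eggs, cookies, baguettes.
# Cookie counts (middle column) match the text; the egg and
# baguette counts are made-up placeholder values.
pantry = np.array([
    [2, 1, 0],  # John
    [5, 7, 1],  # Alex
    [0, 6, 3],  # Sam
])

# The cookies column (NumPy indexes from 0, so column 1):
print(pantry[:, 1])  # [1 7 6]
```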

Now let’s take the 3-dimensional case, for instance. A 3D matrix can be used to represent an image. The number of rows represents the height of an image, the number of columns represents the width of an image, and the depth represents the color channels of an image (e.g. RGB = 3 channels). Each number in this matrix represents a pixel value. Conceptually, an image (a 3D matrix) can be split into its 3 RGB channels:

RGB image (left) & R, G and B channels (aka 3 2D matrices) (right). The order of the dimensions can be defined in any way. In this case we can write the dimensions as 3x500x500 (color channels, image height, image width). 500x500x3 would also be valid (image height, image width, color channels).

A 4D matrix could, for example, store a movie. If you imagine a movie as a collection of RGB images (3D matrices), the 4th dimension is time (i.e., each consecutive image in the movie).
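Here’s a quick NumPy sketch of these higher-dimensional matrices. The 500x500 image size matches the figure above; the 24 frames for the movie are just an assumed number for illustration:

```python
import numpy as np

# A 3D "image": 3 color channels, 500 rows, 500 columns.
image = np.zeros((3, 500, 500))
print(image.ndim, image.shape)   # 3 (3, 500, 500)

# A 4D "movie": 24 frames of such images (frame count assumed).
movie = np.zeros((24, 3, 500, 500))
print(movie.ndim, movie.shape)   # 4 (24, 3, 500, 500)
```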

For the calculations we will only consider 2D matrices for now, but we will go into options for operations with 3+ dimensional matrices in the next post!

Say we have a matrix, B (capital letters denote matrices), with size 3x2 (3 rows and 2 columns). An element in B is denoted as b_i,j. Now what are i and j?
These are indexing elements: i indexes the rows and j indexes the columns. If we want to get the number in the 2nd row and 1st column of matrix B, the position of this number in the matrix is referred to as b_2,1. Note that B only has 3 rows and 2 columns (6 numbers in total).
Which element represents the 3rd row and 2nd column of B?

Elements of matrix B
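Here’s how this indexing looks in NumPy. One caveat: math notation is 1-based, but NumPy indexing is 0-based. The values of B below are made up, since the figure’s values aren’t reproduced here:

```python
import numpy as np

# A hypothetical 3x2 matrix B (made-up values).
B = np.array([
    [4, 8],
    [2, 5],
    [9, 1],
])

# Math notation is 1-based, NumPy is 0-based:
# b_2,1 (2nd row, 1st column) is B[1, 0].
print(B[1, 0])  # 2
# b_3,2 (3rd row, 2nd column) is B[2, 1].
print(B[2, 1])  # 1
```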

2. Matrix (addition and subtraction)

If you remember from our last lesson, addition and subtraction are only defined for vectors with the same dimensions; this rule applies to matrices as well. Say we have 3 matrices, B, C and D. Matrices B and C are 2x4 matrices. Matrix D is a 4x2 matrix.

Matrices B (2x4), C (2x4) and D (4x2)

Since we know B and C have the same number of rows and columns we can directly sum them. That is, we can sum the elements at the same row and column position from each matrix. The general equation for any 2 matrices which have the same number of rows (n) and columns (m) is shown below:

Adding (or subtracting) 2 matrices with the same sizes to each other. The addition/subtraction of 2 matrices preserves the number of rows (n) and the number of columns (m).

The calculation for B + C is as follows:

Addition of B and C
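In NumPy, this element-wise addition is just the + operator. The values below are hypothetical stand-ins for B and C (the figure’s values aren’t reproduced here), but the shape rule is the same:

```python
import numpy as np

# Hypothetical 2x4 matrices standing in for B and C.
B = np.array([[1, 0, 2, 4],
              [3, 5, 1, 0]])
C = np.array([[2, 2, 1, 1],
              [0, 4, 3, 2]])

# Elements at the same row/column position are summed,
# and the 2x4 shape is preserved.
print(B + C)
# [[3 2 3 5]
#  [3 9 4 2]]
```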

Now let’s find out how we can get D (a 4x2 matrix) into the same shape as B and C (2x4 matrices). We see that if we swap the rows and columns of D, we can change the matrix size from 4x2 to 2x4. This is the transpose operation for matrices. For vectors this was easy, but how do we do it for matrices?

Let’s take a step back and consider the vector transpose example from the previous lesson. Our vector j is a row vector with 5 elements: 0, 4, 3, 1, 9. We know that the vector j is also a matrix of size 1x5, so let’s rephrase the vector j as a matrix, A. If we want to index A we specify the row and column (e.g. a_1,2 = 4).
Now, if we take the transpose of A (the result is the same as taking the transpose of j), we convert the 1x5 matrix into a 5x1 matrix. What we have done here is that for each a_i,j in A we have swapped the values of i and j with each other (a_i,j → a_j,i). Let’s verify that this makes sense: a_1,2 = 4 is an element of A (a row vector); when we transpose A (into a column vector), the value 4 is now in the new position a_2,1.

Here is a general expression for the transpose operation of any matrix:

Transpose operation (note that e_1,1 remains in the same position!).

This might be quite hard to grasp, so here’s a shortcut. Imagine you place a mirror at an angle on the top right corner of your matrix. The transpose of the matrix is the reflection of the original matrix in the mirror. Here’s a picture to visualize this:

Original matrix (bottom left), mirror (black line) and transposed (reflected) matrix (top right). Now it’s clear to see that if we transpose the transposed matrix we get the original matrix again!

Okay, that was a lot to take in! Take a deep breath in,… and now out…, alright let’s continue.

Let’s transpose D:

Matrix D (left) and transpose of D (right)

Now let’s subtract the transpose of D from C (both are 2x4 matrices now).

Transpose of D subtracted from C
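In NumPy, the transpose is the .T attribute. Again, the values below are hypothetical stand-ins for C and D, but the mechanics are the same:

```python
import numpy as np

# Hypothetical C (2x4) and D (4x2).
C = np.array([[2, 2, 1, 1],
              [0, 4, 3, 2]])
D = np.array([[1, 0],
              [2, 3],
              [0, 1],
              [5, 2]])

# .T swaps rows and columns: d_i,j -> d_j,i.
print(D.T.shape)  # (2, 4)
print(C - D.T)
# [[ 1  0  1 -4]
#  [ 0  1  2  0]]

# Transposing twice gives back the original matrix.
print(np.array_equal(D.T.T, D))  # True
```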

3. Matrix (multiplication and division)

For matrices, we can either scale a matrix by a scalar (through multiplication or division) or perform the equivalent of a “dot product” between 2 matrices. The cross product operation is not defined for matrices.

Let’s start with scaling matrices. This is very easy: if you multiply or divide a matrix by a scalar, the resulting matrix is the same size as the original, and each element of the original matrix is multiplied or divided by the scalar. An example of a scalar multiplication of a matrix:

Scalar c multiplied by matrix A
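A quick NumPy sketch of scalar multiplication and division, with a made-up matrix A and scalar c:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
c = 2  # a scalar

# Every element is multiplied (or divided) by the scalar;
# the shape stays the same.
print(c * A)
# [[2 4]
#  [6 8]]
print(A / c)
# [[0.5 1. ]
#  [1.5 2. ]]
```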

Now let’s move to matrix-matrix multiplication. This is the fundamental operation of the neural networks found in modern deep learning methods. If you have a solid understanding of this operation, you will be able to easily design meaningful neural networks which do what you want them to do! Ok, maybe not that easy, but easier, so let’s dive in!

There are 3 characteristics of matrix multiplication:
1. Much like the dot product of 2 vectors, 2 matrices can only be multiplied together if the first matrix’s # of columns matches the # of rows of the second matrix.
2. The resulting matrix of multiplying 2 matrices, has the # of rows of the first matrix and the # of columns of the second matrix.
3. The elements of the resulting matrix (e.g. e_i,j) are the vector dot products of the i-th row of the first matrix and the j-th column of the second matrix.

Characteristic 1 tells us that we can only multiply matrices A and B if the # of columns of A is the same as the # of rows of B.
Characteristic 2 tells us that the size of the resulting matrix, C, of the matrix multiplication of A and B is (# of rows in A) x (# of columns in B).
Characteristic 3 may be hard to grasp at first, but let’s go through it slowly.

Let’s say that A * B = C (A, B and C are matrices). A is a 3x2 matrix, B is a 2x4 matrix and C is then a 3x4 matrix.
a_i are the row vectors of A and b_j are the column vectors of B. To calculate c_i,j (a scalar value), we take the dot product of a_i (a row vector) and b_j (a column vector). Let’s express this as a general equation:

Matrix A consists of row vectors a_i; matrix B consists of column vectors b_j. (The # of columns of a_i matches the # of rows of b_j.) The multiplication of A and B results in a matrix C with entries c_i,j. c_i,j is the result of the dot product of a_i and b_j.

Let’s take the matrices C and D which were defined in the matrix addition section. We want to perform the following matrix multiplication: C * D

Matrices C (2x4) and D (4x2)

Let’s decompose C into row vectors and D into column vectors:

Matrix C and D represented as row and column vectors respectively

Now let’s perform the calculation:

Matrix multiplication of C and D

All clear? If it’s still confusing, I would like to ask you to think about which part is confusing. If it’s the dot product of the vectors, I suggest you go back to the previous lesson and get a better feel for it. If the confusion comes from the matrix multiplication itself, lucky for you, there is another way of viewing this operation:

Let’s start by defining the resulting matrix of C * D:

Matrix multiplication of C and D results in a new matrix whose entries are the dot products of the row vectors of C with the column vectors of D. The new matrix will have a size of 2x2, due to the # of rows in C and the # of columns in D.

Let’s visualize each calculation:

The position corresponding to the first row and first column of the new matrix is the dot product of the first row vector of C and the first column vector of D.
The position corresponding to the first row and second column of the new matrix is the dot product of the first row vector of C and the second column vector of D.
The position corresponding to the second row and first column of the new matrix is the dot product of the second row vector of C and the first column vector of D.
The position corresponding to the second row and second column of the new matrix is the dot product of the second row vector of C and the second column vector of D.
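Here’s the same idea in NumPy, using the @ operator and the same hypothetical stand-ins for C and D as before. We can also verify a single entry by hand with a vector dot product:

```python
import numpy as np

# The same hypothetical C (2x4) and D (4x2) as before.
C = np.array([[2, 2, 1, 1],
              [0, 4, 3, 2]])
D = np.array([[1, 0],
              [2, 3],
              [0, 1],
              [5, 2]])

# (2x4) @ (4x2) -> (2x2): the inner dimensions must match,
# and the outer dimensions give the result's shape.
result = C @ D
print(result)
# [[11  9]
#  [18 19]]

# Each entry is the dot product of a row of C with a column
# of D, e.g. the first row and first column:
print(np.dot(C[0], D[:, 0]))  # 11 == result[0, 0]
```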

That concludes our second lesson in Math for AI — Matrix Operations. I suggest you practice these operations by hand with matrices you define yourself. Remember that in matrix multiplication the shapes of the matrices matter a lot. Getting comfortable with this will help you immensely when you start to create your own deep learning networks, because deep learning networks consist of hundreds or thousands of matrix multiplications, and a very common mistake which developers of deep learning networks run into is matrix shape mismatches!
Don’t worry, you will not need to calculate all 1,000 matrices by hand, but knowing which matrices can be multiplied together will save you some headache!
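As a final sketch, here’s what such a shape mismatch looks like in NumPy; the library raises an error when the inner dimensions don’t match:

```python
import numpy as np

A = np.zeros((3, 2))
B = np.zeros((4, 2))

# 3x2 times 4x2 is undefined: 2 (columns of A) != 4 (rows of B).
try:
    A @ B
except ValueError as e:
    print("shape mismatch:", e)

# Transposing B makes the inner dimensions match: (3x2) @ (2x4).
print((A @ B.T).shape)  # (3, 4)
```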

4. Matrix and beyond!

There are a lot of other properties and operations which can be performed on matrices, and it’s quite easy to fall into a rabbit hole of wanting to know all the theory before jumping into practice! Don’t worry though: in these last 2 lessons we have learned very important concepts which will allow us to start creating our own deep learning networks! Easy, right?
Nevertheless, this lesson series will continue and we will build up our knowledge of what matrices can do for us and what they can tell us about our data.

Until the next lesson then!
