Math for AI — Matrix Operations (part 1)

7 min read · Nov 20, 2023

Matrix operations are at the core of modern deep learning architectures, and a strong grasp of them will help you greatly in understanding what goes on “under the hood” of these architectures.

This post is part of a multi-part series that takes matrix operations from theory to practice. The other posts in the series are listed below:

Math for AI — Matrix Operations (part 2)

So what’s a matrix in the first place? What is it useful for?

A matrix is a rectangular table used for storing information, such as numbers. All matrices (the plural of matrix) consist of a number of rows and a number of columns. Below are 3 examples of matrices:

The first matrix consists of 2 rows and 2 columns, so we call it a 2x2 matrix. The order is always (# of rows)x(# of columns).
Let’s check our understanding: the second matrix consists of 2 rows and 3 columns, therefore it is a 2x3 matrix.
The last example is an interesting case, where the matrix consists of only 1 row and 1 column. This means that this matrix is a 1x1 matrix, or equivalently a scalar. A scalar is any single number (e.g., 1, 2, 34, 42, etc.).
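To make the (# of rows)x(# of columns) convention concrete, here is a minimal sketch in plain Python. It assumes a matrix is stored as a list of rows; the `shape` helper and the values inside the matrices are made up for illustration and are not the ones from the examples above.

```python
def shape(matrix):
    """Return (rows, columns) for a matrix stored as a list of rows."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])  # assumes every row has the same length
    return (n_rows, n_cols)

# Hypothetical values; only the shapes mirror the three examples in the text.
a = [[1, 2],
     [3, 4]]       # 2 rows, 2 columns
b = [[1, 2, 3],
     [4, 5, 6]]    # 2 rows, 3 columns
c = [[42]]         # 1 row, 1 column: effectively a scalar

print(shape(a))  # (2, 2)
print(shape(b))  # (2, 3)
print(shape(c))  # (1, 1)
```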

Another special instance of a matrix is a vector which can either consist of multiple rows and 1 column (column vector) or 1 row and multiple columns (row vector). Below are examples of a column and a row vector:

Column vector (left) & Row vector (right)

We can imagine the column vector from above as x, y and z coordinates of a point in 3D space (x=1, y=5, z=7). The vector on the right could also represent items in a fridge, for example 4 apples, 3 eggs, 1 banana and 2 milk cartons.

Let’s have a look at how we can reference the elements of a vector (also called indexing). The column vector above has 3 elements: 1, 5 and 7. Let’s call this column vector x (the bolded x represents a vector). The first element, x_1, has the value 1, the second element, x_2, has the value 5 and the third element, x_3, has the value 7. (Note that when we write out math the starting index is 1, as opposed to most programming languages, where the index starts at 0!)
We can do the same for the row vector (y) above with 4 elements. y_1=4, y_2=3, y_3=1 and y_4=2.
(For matrices, there is a slightly more advanced method of indexing, but we will cover this in the next post!)
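The difference between math-style and programming-style indexing can be sketched in Python as follows. The vectors use the values from the text and are stored as flat lists; the `element` helper is a hypothetical convenience function, not part of any library.

```python
x = [1, 5, 7]      # the column vector from the text, stored as a flat list
y = [4, 3, 1, 2]   # the row vector from the text

# Math writes x_1, x_2, x_3; Python writes x[0], x[1], x[2].
print(x[0], x[1], x[2])  # 1 5 7

def element(vector, i):
    """Return the i-th element using 1-based (math-style) indexing."""
    return vector[i - 1]

print(element(x, 1))  # 1, i.e. x_1
print(element(y, 4))  # 2, i.e. y_4
```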

Today we will start with scalar and vector operations which will give us a foundation for the next lesson, which will be matrix operations.
We will cover:
1. scalar and scalar operations (very quickly)
2. scalar and vector operations
3. vector and vector operations (addition and subtraction)
4. vector and vector operations (multiplication)

1. Scalar and scalar operations:

This is just a fancy way of saying operations with numbers (e.g. 4+5, 3/9, 1*1, 10-6, etc.). We will take the +, -, * and / operators and look at how we can apply them to the other types of operations.

2. Scalar and vector operations:

The operations allowed between a scalar and a vector are multiplication and division (in programming, addition and subtraction are also supported, but we will cover this in another post). Let’s imagine a column vector x consisting of the values 1, 0 and 3, and a scalar b = 4. The product of x and b is obtained by multiplying each entry of x by b. The result is a column vector with the same shape (3 rows and 1 column) whose entries are 1*4, 0*4 and 3*4, that is 4, 0 and 12. Check out the picture below to visualize the steps.
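The same multiplication can be sketched in plain Python. The `scale` function is a hypothetical helper written for this illustration; the vector is stored as a flat list.

```python
def scale(vector, scalar):
    """Multiply every entry of the vector by the scalar."""
    return [entry * scalar for entry in vector]

x = [1, 0, 3]
b = 4

result = scale(x, b)
print(result)       # [4, 0, 12]
print(len(result))  # 3, the same number of entries as x
```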

This works exactly the same way for division. Let’s define y with entries 6, 2 and 8, and c = 2. Dividing y by c divides each entry by 2, giving 3, 1 and 4:
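As a rough sketch, the division looks like this in plain Python (`divide` is a hypothetical helper). Note that Python's `/` operator always returns floats.

```python
def divide(vector, scalar):
    """Divide every entry of the vector by the scalar."""
    return [entry / scalar for entry in vector]

y = [6, 2, 8]
c = 2

print(divide(y, c))  # [3.0, 1.0, 4.0]
```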

3. Vector and vector operations (addition and subtraction):

For vector and vector operations we need to be a bit more careful about how we perform the operations.
Let’s first look at addition and subtraction. For these operators we need to ensure that the shapes of the two vectors match. For example, if vector x has a shape of 5x1 and vector y has a shape of 5x1, then x and y can be added or subtracted. Now let’s say vector z has a shape of 4x1 and vector j has a shape of 1x5. Then it is not possible to add or subtract x with z, x with j, y with z, y with j, or z with j. Let’s see why:

We see that we can add x and y entry by entry, because both vectors have 5 entries. For z, with only 4 entries, there is no 5th entry to add to or subtract from the 5th entry of x or y. For the vector j (1x5), at first glance we would say that it doesn’t have the same shape as x or y (5x1), which is correct! Fortunately we have a solution for this, and it’s called the transpose operator. Transposing a vector (denoted with a superscript T) swaps its rows and columns, so a 1x5 row vector becomes a 5x1 column vector. For example, let’s take the j vector:

Now the transpose of j has the same shape as x and y (5x1), so we can add and subtract these vectors freely. For example, let’s subtract y from the transpose of j:
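The shape rule and the transpose can be sketched together in plain Python. This is only an illustration under a few assumptions: vectors are stored as lists of rows so that a 5x1 column vector and a 1x5 row vector are distinguishable, all helper names are hypothetical, and the values of x, y, z and j are made up because only their shapes are given in the text.

```python
def shape(m):
    """Return (rows, columns) for a matrix stored as a list of rows."""
    return (len(m), len(m[0]))

def transpose(m):
    """Swap rows and columns."""
    return [list(column) for column in zip(*m)]

def add(a, b):
    """Entry-wise addition; the shapes must match exactly."""
    if shape(a) != shape(b):
        raise ValueError(f"shape mismatch: {shape(a)} vs {shape(b)}")
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def subtract(a, b):
    """Entry-wise subtraction; the shapes must match exactly."""
    if shape(a) != shape(b):
        raise ValueError(f"shape mismatch: {shape(a)} vs {shape(b)}")
    return [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

# Hypothetical values; only the shapes come from the text.
x = [[1], [2], [3], [4], [5]]   # 5x1
y = [[5], [4], [3], [2], [1]]   # 5x1
z = [[1], [1], [1], [1]]        # 4x1
j = [[10, 20, 30, 40, 50]]      # 1x5

print(add(x, y))                  # [[6], [6], [6], [6], [6]]

for other in (z, j):
    try:
        add(x, other)
    except ValueError as err:
        print(err)                # shape mismatch: (5, 1) vs (4, 1), then (5, 1) vs (1, 5)

j_t = transpose(j)
print(shape(j_t))                 # (5, 1)
print(subtract(j_t, y))           # [[5], [16], [27], [38], [49]]
```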

4. Vector and vector operations (multiplication):

The same rules for vector shapes apply to multiplication too, but with a little twist! For vector and vector operations we have 2 general concepts: the dot product (also called the inner product) and the outer product. AI practitioners typically say dot product, while mathematicians often say inner product, and the two names mean the same thing here. The outer product is sometimes mixed up with the cross product, but they are different operations: the cross product takes two 3D vectors and returns another 3D vector, and we will not use it in this post.

Let’s start with the dot product. The result of a dot product between 2 vectors is a scalar (aka a single number). For a dot product between 2 vectors, x and y, to work, the number of columns in x needs to be the same as the number of rows in y. Take x to be a 1x3 row vector and y to be a 3x1 column vector. The dot product, <x, y> (this is how dot products are written), will be a 1x1 matrix, in other words a scalar value. Notice that the shape of the result is determined by the # of rows in the first vector and the # of columns in the second vector.
First, let’s look at why the dot product is a scalar conceptually. The dot product is defined as the sum of the products of each i-th element of both vectors. The mathematical expression, for vectors with n elements each:

<x, y> = x_1*y_1 + x_2*y_2 + ... + x_n*y_n

In words: The dot product of the vectors x and y equals the sum of the product x_i and y_i over each i-th element in x and y. (This is why the # of columns in x must be the same as # of rows in y.)

Now an example with the following x and y vectors:
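As a sketch of the same computation in code, here is a plain Python version. The `dot` function is a hypothetical helper, both vectors are stored as flat lists, and the values are chosen for illustration only.

```python
def dot(x, y):
    """Sum of the element-wise products x_i * y_i."""
    if len(x) != len(y):
        raise ValueError("both vectors need the same number of elements")
    return sum(x_i * y_i for x_i, y_i in zip(x, y))

x = [1, 2, 3]   # 1x3 row vector (hypothetical values)
y = [4, 5, 6]   # 3x1 column vector (hypothetical values)

print(dot(x, y))  # 1*4 + 2*5 + 3*6 = 32
```

The result is a single number, which matches the 1x1 shape described above.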

What would happen if we multiplied in the opposite order, y times x instead of x times y? The result is no longer a dot product but an outer product. Let me explain: taking the same x and y vectors, we now multiply a 3x1 column vector by a 1x3 row vector. The result of this outer product is a 3x3 matrix (# of rows from the first vector and # of columns from the second vector). Note that the first vector has 1 column and the second vector has 1 row, so the inner dimensions always match and the other dimensions can be any number! The mathematical expression in general is:

(y x)_ij = y_i * x_j, for i = 1, ..., n and j = 1, ..., m

Here y is a column vector with n rows and x is a row vector with m columns, so the result is an n x m matrix. n and m do not have to be the same number!

Now using the same example as above:
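A minimal sketch of the outer product in plain Python follows. The `outer` function is a hypothetical helper and the values are made up for illustration; entry (i, j) of the result is the i-th element of the column vector times the j-th element of the row vector.

```python
def outer(column, row):
    """Return the matrix whose (i, j) entry is column[i] * row[j]."""
    return [[c_i * r_j for r_j in row] for c_i in column]

y = [4, 5, 6]   # 3x1 column vector (hypothetical values)
x = [1, 2, 3]   # 1x3 row vector (hypothetical values)

for result_row in outer(y, x):
    print(result_row)
# [4, 8, 12]
# [5, 10, 15]
# [6, 12, 18]

# The two vectors do not need the same number of elements:
print(outer([1, 2], [1, 2, 3]))  # [[1, 2, 3], [2, 4, 6]], a 2x3 matrix
```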

Recap:

Today we learnt what matrices are, and that vectors and scalars are types of matrices with special conditions. Additionally, we have built a solid foundation of scalar and vector operations.

In the next lesson we will use this knowledge and expand it to matrix operations.