Linear algebra for machine learning — part 1

Patrick Stewart · Patrick’s notes · Jan 8, 2022

You need to learn linear algebra to truly understand the wonderful world of machine learning. This is part 1 of a series of articles exploring the linear algebra concepts that are most relevant to machine learning and data science. If you want to support this content, then please subscribe to me and Patrick’s notes.

Figure 1: Linear algebra is essential to understand machine learning algorithms

Unfortunately, or not so unfortunately, a good understanding of linear algebra is essential to master most machine learning and deep learning algorithms (some argue it’s less important if you are just beginning your journey). If you have already studied the subject in depth, then feel free to skip these articles; but if you need a refresher or are keen to study machine learning in depth, then you’re in the right place. Please note that these articles will not be an exhaustive study of the subject, but will explain the key things to know.

1.1 — What are scalars, vectors, matrices and tensors?

Linear algebra involves studying several types of mathematical objects.

Scalars — A scalar is just a single number, in contrast to the other objects below, which are arrays of numbers. In machine learning, scalars are usually real numbers.

Vectors — A vector is an array of numbers, arranged in order. We can identify each individual number by its index in that ordering.

For a vector x with n elements, this can be written as follows:
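$$
\boldsymbol{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
$$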

We can think of vectors as identifying points in space, with each element giving the coordinate along some different axis.

In geometry, vectors represent a movement from a point. Looking at figure 2, the vector from A to B would be:
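$$
\vec{AB} = \begin{bmatrix} 6 \\ 4 \end{bmatrix}
$$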

That is, a movement of 6 units in the x direction and 4 units in the y direction.

Figure 2: Example geometry vector (source — https://mathbitsnotebook.com/Geometry/Transformations/TRVectors.html)

Matrices — A matrix is a 2D array of numbers, so each element is identified by two indices rather than one. We usually give matrices uppercase variable names with bold typeface, such as A. If a real-valued matrix A has a height of m and a width of n, then we say that:
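$$
\boldsymbol{A} \in \mathbb{R}^{m \times n}
$$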

A full matrix can be shown as follows:
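$$
\boldsymbol{A} = \begin{bmatrix} A_{1,1} & A_{1,2} & \cdots & A_{1,n} \\ A_{2,1} & A_{2,2} & \cdots & A_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ A_{m,1} & A_{m,2} & \cdots & A_{m,n} \end{bmatrix}
$$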

Tensors — There are instances where an array with more than two axes is required. In the general case, an array of numbers arranged on a regular grid with a variable number of axes is known as a tensor.

We denote a tensor named “A” with a special typeface, and identify its element at coordinates (i, j, k) like so:
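$$
\mathsf{A}_{i,j,k}
$$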

1.2 — Basic operations

Transpose

Transposing a matrix produces its mirror image along the main diagonal, running down and to the right, so that an m×n matrix becomes an n×m matrix. An example of a matrix transpose is shown below:
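$$
(\boldsymbol{A}^\top)_{i,j} = A_{j,i}, \qquad \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}^\top = \begin{bmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{bmatrix}
$$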

Adding matrices to each other

We can add matrices to each other as long as they are the same shape. This is demonstrated like so:
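$$
C_{i,j} = A_{i,j} + B_{i,j}, \qquad \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} + \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} = \begin{bmatrix} 6 & 8 \\ 10 & 12 \end{bmatrix}
$$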

Scalar multiplication

We can multiply a matrix by a scalar like so:
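$$
(c\boldsymbol{A})_{i,j} = c \, A_{i,j}, \qquad 2 \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 2 & 4 \\ 6 & 8 \end{bmatrix}
$$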

Scalar addition

We can add a scalar to a matrix like so:
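$$
(\boldsymbol{A} + c)_{i,j} = A_{i,j} + c, \qquad \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} + 2 = \begin{bmatrix} 3 & 4 \\ 5 & 6 \end{bmatrix}
$$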

1.3 — Multiplication

In machine learning, understanding how matrices and vectors are multiplied is paramount; it is one of the most important operations in the subject.

Vector multiplication

There are two types of vector multiplication: dot product and Hadamard product.

Starting with the Hadamard product: this is elementwise multiplication, and the output is another vector. This can be shown like so:
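$$
\boldsymbol{x} \odot \boldsymbol{y} = \begin{bmatrix} x_1 y_1 \\ x_2 y_2 \\ \vdots \\ x_n y_n \end{bmatrix}
$$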

Meanwhile, the output from a dot product is simply a scalar value (i.e. just one value).
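For two n-element vectors, the dot product can be written as:

$$
\boldsymbol{x} \cdot \boldsymbol{y} = \boldsymbol{x}^\top \boldsymbol{y} = \sum_{i=1}^{n} x_i y_i
$$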

Matrix multiplication

To multiply two matrices, the first matrix must have the same number of columns as the second has rows. The output is a matrix with the number of rows of the first matrix and the number of columns of the second: if A is m×n and B is n×p, then AB is m×p.

Each element of the product is the dot product of the corresponding row of the first matrix and column of the second (note that this is not the Hadamard product, which multiplies elementwise):
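$$
C_{i,j} = \sum_{k=1}^{n} A_{i,k} \, B_{k,j}
$$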

Perhaps this is confusing to look at? Well, there is a simple way of thinking about this multiplication procedure: split the second matrix into its column vectors and multiply the first matrix by each of these vectors in turn. This can be shown algebraically below:
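$$
\boldsymbol{A}\boldsymbol{B} = \boldsymbol{A}\begin{bmatrix} \boldsymbol{b}_1 & \boldsymbol{b}_2 & \cdots & \boldsymbol{b}_p \end{bmatrix} = \begin{bmatrix} \boldsymbol{A}\boldsymbol{b}_1 & \boldsymbol{A}\boldsymbol{b}_2 & \cdots & \boldsymbol{A}\boldsymbol{b}_p \end{bmatrix}
$$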

1.4 — Properties of matrix multiplication

Not commutative

Matrix multiplication is not commutative. What do we mean by this? Well, if we multiply two matrices A and B together, the result is not, in general, the same as multiplying B by A:
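$$
\boldsymbol{A}\boldsymbol{B} \neq \boldsymbol{B}\boldsymbol{A} \quad \text{in general}
$$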

Distributive

Matrix multiplication is distributive. What do we mean by this? Well algebraically this can be shown as follows through three separate matrices A, B and C:
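$$
\boldsymbol{A}(\boldsymbol{B} + \boldsymbol{C}) = \boldsymbol{A}\boldsymbol{B} + \boldsymbol{A}\boldsymbol{C}
$$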

Associative

Matrix multiplication is associative. What do we mean by this? Again using our three separate matrices, this can be shown algebraically below:
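$$
\boldsymbol{A}(\boldsymbol{B}\boldsymbol{C}) = (\boldsymbol{A}\boldsymbol{B})\boldsymbol{C}
$$

All three properties are easy to check numerically. Below is a minimal NumPy sketch; the matrices are arbitrary examples chosen purely for illustration:

```python
import numpy as np

# Arbitrary example matrices, chosen only to illustrate the properties above.
A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])
C = np.array([[2, 0], [0, 2]])

print(np.array_equal(A @ B, B @ A))             # False: not commutative
print(np.allclose(A @ (B + C), A @ B + A @ C))  # True: distributive
print(np.allclose(A @ (B @ C), (A @ B) @ C))    # True: associative
```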

Identity matrix and matrix inversion

This one takes a little longer to explain. Linear algebra offers a powerful tool known as matrix inversion.

So how can we define an identity matrix? An identity matrix is a matrix that does not change any vector when we multiply that vector by that matrix. For example:
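$$
\boldsymbol{I}_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad \boldsymbol{I}_n \boldsymbol{x} = \boldsymbol{x} \;\; \text{for all } \boldsymbol{x} \in \mathbb{R}^n
$$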

The concept of an identity matrix is important because we can’t divide matrices, but we can multiply a matrix by what is known as its inverse to obtain the identity matrix.

This can be shown algebraically like so:
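$$
\boldsymbol{A}^{-1}\boldsymbol{A} = \boldsymbol{I}_n
$$

This is what makes matrix inversion such a powerful tool: if the inverse exists, a system of linear equations $\boldsymbol{A}\boldsymbol{x} = \boldsymbol{b}$ can be solved as $\boldsymbol{x} = \boldsymbol{A}^{-1}\boldsymbol{b}$.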

1.5 — Summary

In this article, we have discussed the basics of linear algebra, covering scalars, vectors and matrices, and touching on the idea of tensors. In the next part, we will move on to some of the more complex areas of linear algebra, such as eigendecomposition, singular value decomposition and norms.

If you want to support the continuation of this content then please give this article a clap and subscribe to me and Patrick’s notes.

References

https://www.deeplearningbook.org/contents/linear_algebra.html

https://www.khanacademy.org/math/linear-algebra

https://mathbitsnotebook.com/Geometry/Transformations/TRVectors.html
