Published in

Phi Skills

# Mathematics for Deep Learning: Matrix, Matrices Addition and Matrices Multiplication

What you will learn:

The main focus is on building an intuitive understanding and being able to put it into practice. If you are interested in a deep dive into Linear Algebra, here are a few awesome resources:

Robert Messer, Linear Algebra: Gateway to Mathematics

Originally, I planned to write a concise article presenting the most important mathematical concepts for Deep Learning. However, as I started, I realized that, for someone unfamiliar with these concepts, it is quicker and more effective to focus on a single notion at a time.
Therefore, I am going to write exclusively about matrices and matrix multiplication, focusing on intuition and practical advice. In future articles, I am going to address other fundamental concepts such as derivatives of functions and the law of large numbers.

## What is a matrix?

I love the Matrix movies, but we are not going to talk about them here. Instead, I want to show you that matrices are not some sort of esoteric spell hiding dark secrets that only the geekiest of us can grasp.

Have you ever played one of these card games where you need a scoreboard? If so, I have a piece of good news: you already know what a matrix is and you can remember how it works at any time by reverse engineering a scoreboard.

As you can see in the image above, there is one column for each player and a row for each turn. The matrix is filled with values representing a specific player score at a precise turn. For instance, on the third turn, Daniel had 4 points.

According to Wikipedia:
“A matrix is a rectangular array of numbers or other mathematical objects for which operations such as addition and multiplication are defined”.
This definition of a matrix gravitates around three concepts:

1. Dimension: a fixed number of columns and a fixed number of rows.
2. Values: the values held by the matrix must be consistent. If some of the entries hold information about oranges and others about nicotine, your matrix will be of no use.
3. Operations: a tool-set of mathematical operations such as addition and multiplication.

As with everything in Mathematics, a matrix is an idea translated into a definition and represented with a notation.
The latter is straightforward: we just wrap square brackets or parentheses around numbers:

If you are not used to mathematical notation, the image above might seem daunting at first glance. Let’s break it down together.

The characters with green and red underscores represent the entries of the matrix:

That notation comes in super handy when we are manipulating huge matrices or when we want to stay general and express something that holds for every matrix with m rows and n columns. In this case, we would say that the matrix has dimensions (m, n).
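In symbols, a general matrix with dimensions (m, n) is commonly written as below. The names A and a_ij are a widespread convention assumed here, not necessarily the exact symbols used in the figures; a_ij stands for the entry in row i and column j.

$$
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}
$$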

The following matrix represents the scoreboard we have seen in the previous section:

If another player were to join the game, we would have to add a column for that player to the scoreboard. The same would happen with the matrix: we would need to have 5 columns to represent the whole game.
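To make the scoreboard idea tangible, here is a minimal Python sketch that stores a scoreboard as a list of rows. Only the fact that Daniel had 4 points on the third turn comes from the text above; the other player names, the scores, and the helper names `dimensions` and `add_player` are made up for illustration.

```python
# Hypothetical scoreboard: one row per turn, one column per player.
# Only "Daniel had 4 points on the third turn" comes from the article;
# the other names and scores are invented for this example.
players = ["Anna", "Bruno", "Carla", "Daniel"]
scoreboard = [
    [2, 5, 1, 3],  # turn 1
    [4, 0, 6, 2],  # turn 2
    [1, 3, 2, 4],  # turn 3
]


def dimensions(matrix):
    """Return (number of rows, number of columns)."""
    return (len(matrix), len(matrix[0]))


def add_player(matrix, scores):
    """Return a new matrix with one extra column holding `scores`."""
    return [row + [score] for row, score in zip(matrix, scores)]


daniel = players.index("Daniel")
print(scoreboard[2][daniel])   # 4: third turn, Daniel's column
print(dimensions(scoreboard))  # (3, 4): 3 turns, 4 players
bigger = add_player(scoreboard, [0, 7, 5])
print(dimensions(bigger))      # (3, 5): 3 turns, 5 players
```

Note that Python counts from zero, so the third turn is row index 2.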

To add two matrices together you need to perform the usual addition entry by entry as in the image below.
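As a small worked example with made-up numbers (not the ones from the figure), adding two matrices of dimensions (2, 2) looks like this:

$$
\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}
+
\begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix}
=
\begin{pmatrix} 1+5 & 2+6 \\ 3+7 & 4+8 \end{pmatrix}
=
\begin{pmatrix} 6 & 8 \\ 10 & 12 \end{pmatrix}
$$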

You might have noticed that we are adding two matrices with the same dimensions. This is extremely important with matrices. In Mathematics, an addition (or any other operation) is defined by the values you are adding. You can think of it as the word “bear”. In Finance, when someone says “bear” it usually refers to the stock market going down, whereas in a forest it usually means someone spotted a bear and you’d better get the hell out of there as quickly as you can. What we need to remember here is that the meaning of a mathematical operation depends on its context.

Let’s look at a simple example to make sure we are all on the same page.

Suppose that we want to define addition for emojis. You clearly cannot use the addition you learned at school, but you can always create a rule for it.

For instance, we could say the following:

`😃 + 🙁 = 😐`

Here our addition represents an average emotion, i.e. a happy face plus a sad face gives a neutral face. But we could be interested in counting the number of faces instead.

`😃 + 🙁 = 2`

Here adding two faces is equivalent to counting the number of faces. As you can see, it’s a totally different thing.

If you find this interesting, I invite you to check out Giuseppe Peano’s work on the axiomatization of arithmetic.

To sum up, matrix addition is a two-step process:

1. Check that dimensions match i.e. that both matrices are of dimension (m,n)
2. Sum all the entries respecting their position in the matrix
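The two steps above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: matrices are plain lists of rows, and the function name `add_matrices` and the sample numbers are hypothetical.

```python
def add_matrices(a, b):
    """Add two matrices given as lists of rows."""
    # Step 1: check that the dimensions match.
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same dimensions")
    # Step 2: add the entries position by position.
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


a = [[1, 2, 3], [4, 5, 6]]        # dimensions (2, 3)
b = [[10, 20, 30], [40, 50, 60]]  # dimensions (2, 3)
print(add_matrices(a, b))  # [[11, 22, 33], [44, 55, 66]]
```

Trying to add matrices whose dimensions differ, for example (1, 2) and (2, 1), raises an error instead of silently producing a wrong result.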

What happens with multiplication?

## Matrices Multiplication

Matrix multiplication is a more involved operation. You might expect it to work like addition, but it doesn’t.

If you ever did some high school Mathematics, you might remember when you used to solve equations with more than one variable like the following:

To solve that, you could isolate one variable, substitute it into the other equations, and then repeat the process.
It turns out mathematicians are very clever: they realized that matrix multiplication can be defined in a way that automates the method for solving the above system of equations.

Matrix multiplication’s rules ensure that the following equation is equivalent to the system of equations above:

As you can see, we multiply the matrix by the vector of unknown variables x, y, and z. Matrices can be seen as dictionaries that let you translate one language into another. In our example, this means we are translating the unknown variables to get the right-hand side of the equation. The columns of the matrix represent the language to be translated and the rows represent the language to be translated into.
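For a concrete instance, take the following system. Its coefficients are made up for illustration and are not the ones from the figures:

$$
\begin{cases}
x + 2y + z = 8 \\
2x - y + 3z = 9 \\
3x + y - z = 2
\end{cases}
$$

Written with a matrix, the same system becomes:

$$
\begin{pmatrix} 1 & 2 & 1 \\ 2 & -1 & 3 \\ 3 & 1 & -1 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
=
\begin{pmatrix} 8 \\ 9 \\ 2 \end{pmatrix}
$$

Each row of the matrix holds the coefficients of one equation. Multiplying the first row by the vector of unknowns gives 1·x + 2·y + 1·z, which is exactly the left-hand side of the first equation. You can check that x = 1, y = 2, z = 3 satisfies all three.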

Consequently, matrix multiplication is not commutative: you cannot switch the order of the factors and expect to end up with the same result. This means that you usually cannot say that a*b = b*a. You cannot do that because on the left you are translating from English to Chinese whereas on the right from Chinese to English. I am pretty sure that English and Chinese are not the same though.
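A small check with made-up matrices makes this concrete. The names A and B and their entries are assumptions chosen for illustration, and the products are computed with the row-by-column recipe described next:

$$
A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix},
\quad
B = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
$$

$$
AB = \begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix}
\quad \text{but} \quad
BA = \begin{pmatrix} 3 & 4 \\ 1 & 2 \end{pmatrix}
$$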

Now that you have the intuition, I will give you the recipe to multiply matrices. Suppose A and B are two matrices with dimensions (ma, na) and (mb, nb) respectively, and that you are computing A * B.

1. You can perform the matrix multiplication if and only if na = mb. In other words, you can only perform it if the number of columns of the left matrix is equal to the number of rows of the right matrix.
2. The result of the multiplication is a matrix with dimensions (ma, nb).
3. The entry in row i and column j of the new matrix is obtained by multiplying each entry of row i of A by the corresponding entry of column j of B and summing these products.
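The recipe above can be sketched directly in Python. This is a minimal illustration, assuming matrices are plain lists of rows; the function name `multiply` and the sample numbers are hypothetical, and real projects would normally rely on a numerical library instead.

```python
def multiply(a, b):
    """Multiply matrix a (ma x na) by matrix b (mb x nb), both lists of rows."""
    ma, na = len(a), len(a[0])
    mb, nb = len(b), len(b[0])
    # Step 1: columns of the left matrix must equal rows of the right one.
    if na != mb:
        raise ValueError(f"cannot multiply ({ma}, {na}) by ({mb}, {nb})")
    # Step 2: the result has dimensions (ma, nb).
    result = [[0] * nb for _ in range(ma)]
    # Step 3: entry (i, j) is the sum of the products of row i of a
    # with column j of b.
    for i in range(ma):
        for j in range(nb):
            result[i][j] = sum(a[i][k] * b[k][j] for k in range(na))
    return result


left = [[1, 2], [3, 4], [5, 6]]  # dimensions (3, 2)
right = [[10], [20]]             # dimensions (2, 1)
print(multiply(left, right))     # [[50], [110], [170]]
```

Swapping the arguments, `multiply(right, left)`, fails the check in step 1 because a (2, 1) matrix has one column while a (3, 2) matrix has three rows.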

These considerations might still seem obscure to you but after a quick example, everything will seem clearer.

Say we are multiplying the following matrices:

The matrix on the left has dimension (3, 2): three rows and two columns.
The matrix on the right has dimension (2,1): two rows and one column.

Therefore, the number of columns of the first matrix is equal to the number of rows of the second matrix, and we can perform the multiplication. The result will have dimension (3, 1).

Now the easiest way by far to compute the multiplication is the following (use pen and paper):

The green and brown dotted lines show you which numbers you need to multiply at each step of the process. Then you repeat it for each row of the left matrix:

Finally, we end up with the desired result:
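As a worked illustration with made-up numbers (not the ones from the figures), here is a product of a (3, 2) matrix and a (2, 1) matrix, with every row of the left matrix paired with the single column of the right one:

$$
\begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix}
\begin{pmatrix} 7 \\ 8 \end{pmatrix}
=
\begin{pmatrix} 1 \cdot 7 + 2 \cdot 8 \\ 3 \cdot 7 + 4 \cdot 8 \\ 5 \cdot 7 + 6 \cdot 8 \end{pmatrix}
=
\begin{pmatrix} 23 \\ 53 \\ 83 \end{pmatrix}
$$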

This example was very easy but what happens when the right matrix has more than one column?

As you see, we repeat the same process for every row of the left matrix and every column of the right one.

We end up with the following:
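The same process can be checked in Python with hypothetical numbers. In this sketch the helper name `multiply` and both matrices are assumptions for illustration; the right matrix now has two columns, so every row of the left matrix is paired with each of them in turn.

```python
def multiply(a, b):
    """Row-by-column product of two matrices given as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("columns of the left matrix must equal rows of the right")
    columns = list(zip(*b))  # the columns of the right matrix
    return [[sum(x * y for x, y in zip(row, col)) for col in columns]
            for row in a]


left = [[1, 2], [3, 4], [5, 6]]  # dimensions (3, 2)
right = [[7, 9], [8, 10]]        # dimensions (2, 2)
product = multiply(left, right)  # dimensions (3, 2)
print(product)  # [[23, 29], [53, 67], [83, 105]]
```

The first column of the result only involves the first column of the right matrix, and the second column only the second one.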

## Conclusion

In this article, we gave an informal introduction to matrices and their two most important operations. I tried my best to write an article that doesn’t assume any previous mathematical knowledge.
In a few minutes, you have learned:

1. What a matrix is
2. How to perform matrix addition
3. How to perform matrix multiplication

Just a few minutes to learn the single most important mathematical concept to understand how neural networks work.

Matrices can be used to parallelize computations. Remember the matrix multiplication above. Each entry of the result is computed independently of the others, meaning that different people or computers can compute them at the same time. If the matrix is small like the one we saw, the benefits are not huge but, in a neural network, you might find yourself handling matrices with millions of rows. Suppose that each computation takes one second. Performing one million computations sequentially would take one million seconds, which is about 278 hours or more than 11 days. If we instead ran them all in parallel, with enough workers, it would still take about 1 second. I bet everyone would choose the second.
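The independence of the steps can be sketched with Python's standard `concurrent.futures` module. The function names and matrices below are hypothetical, and this is only an illustration of the idea: threads in standard Python will not actually speed up this kind of arithmetic, which in practice is handed to GPUs or optimized numerical libraries.

```python
from concurrent.futures import ThreadPoolExecutor


def row_times_matrix(row, b):
    """Compute one row of the product; it depends only on `row` and `b`."""
    return [sum(x * y for x, y in zip(row, col)) for col in zip(*b)]


def multiply_parallel(a, b, workers=4):
    """Compute every output row as an independent task."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns the rows in their original order,
        # whichever task happens to finish first.
        return list(pool.map(lambda row: row_times_matrix(row, b), a))


a = [[1, 2], [3, 4], [5, 6]]
b = [[7, 9], [8, 10]]
sequential = [row_times_matrix(row, b) for row in a]
parallel = multiply_parallel(a, b)
print(parallel == sequential)  # True
```

Because no row needs the result of another row, the tasks can be handed out in any order without changing the answer.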

Linear Algebra is an amazing field of study with applications in basically any industry. You can use matrices to link movie ratings with users if you work at Netflix, or you can use them in Physics and Finance. Every computer scientist has come across them at some point. You can even use them as scoreboards when playing cards with your friends!

To anyone willing to learn more, I strongly recommend Robert Messer’s book Linear Algebra: Gateway to Mathematics.

Ciao,

Michele

## Michele Rexha

Software Developer & Founder @ Fidia Tech