
An Introduction to Linear Algebra for Deep Learning

Deep learning is all about data, and we need to represent data and perform operations on them to train our deep networks.

6 min read · Aug 6, 2021


Linear algebra constitutes the foundations of deep learning. A better grasp of the basics of this field will help you develop better intuitions regarding the way data gets manipulated in deep learning algorithms.

https://en.wikipedia.org/wiki/Linear_algebra

Deep learning is all about data, and we need to represent that data and perform operations on it to train our deep networks. Data has to be represented in matrix form. A better understanding of matrix operations and matrix algebra will help you develop better intuition about how deep learning algorithms work. That is why linear algebra is probably the most important branch of mathematics for deep learning. In this post, I'll try to clarify the essential topics in this field.

What do we mean by Data?

Let's consider a simple example in which you have the attributes of each house, and your goal is to predict the price of a given house. These attributes are also known as explanatory variables, and we will make use of them to train our model. For the sake of simplicity, we will only consider three attributes: number of bedrooms, house size, and location. Each house can then be represented as a vector of three values:

[X_numberOfBedrooms, X_size, X_location]

But wait, here we are only considering a single house. We usually have data sets comprised of thousands of houses, and each house would be called a data point. At this point, all we have to do is stack the house vectors to form a matrix. Each row will represent a single house, and each column will represent a single explanatory variable. Great, we now have our design matrix at hand!

x₁₁: The number of bedrooms of the first house
x₁₂: Size of the first house
x₁₃: Location of the first house
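
To make this concrete, here is a minimal sketch (the numbers, and encoding location as a numeric code, are made up purely for illustration) of stacking a few house vectors into a design matrix with PyTorch:

import torch

# Each row is one house: [number_of_bedrooms, size, location_code]
# Values are made up purely for illustration.
X = torch.tensor([
    [3.0, 120.0, 2.0],
    [2.0,  85.0, 1.0],
    [4.0, 200.0, 3.0],
])

print(X.shape)  # torch.Size([3, 3]): 3 houses (rows) x 3 explanatory variables (columns)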

Simple Linear Regression

Here, we will try to build a simple model in order to predict the price of a given house. Let’s take the linear combinations of the three explanatory variables. I mean, that’s probably the simplest model you can get; a simple linear regression. Now let’s look at this formally:

Y = Xβ + ϵ

As you can see, we have three weights, one multiplying each explanatory variable (EV). You can think of them as the importance of each variable in determining the price. In simple terms: if the house is big and in a good location, the price should be high, so all the EVs are positively correlated with the price. By looking at the largest weight (assuming the variables are on comparable scales), we can identify the most influential variable, which gives us a good sense of the model's sensitivity to each variable. Now, let's rewrite everything in matrix notation.

Image by Author
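
To make the prediction step concrete, here is a minimal sketch with made-up weights (the noise term ϵ is left out):

import torch

# Design matrix: 3 houses x 3 explanatory variables (values made up for illustration)
X = torch.tensor([
    [3.0, 120.0, 2.0],
    [2.0,  85.0, 1.0],
    [4.0, 200.0, 3.0],
])

# Hypothetical weights: one per explanatory variable
beta = torch.tensor([10_000.0, 1_500.0, 20_000.0])

# Y = X @ beta gives one predicted price per house
Y_hat = X @ beta
print(Y_hat)  # tensor([250000., 167500., 400000.])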

As you can see, writing everything in matrix form allows for a much more concise description of what is going on. But how do we multiply matrices? Don't worry. It's shockingly easy and intuitive.

Multiplying Matrices

First, let's think about it intuitively: we simply want to multiply each EV by its corresponding weight. We have n houses/examples, so logically, we should multiply each row of the design matrix by the column vector of weights W. For the sake of brevity, we will consider a simple example with two examples and three explanatory variables:

Image by Author

The multiplication of a matrix and a column vector will result in another column vector.

Now let's consider multiplying two matrices. Don't forget that to multiply matrices, the number of columns in the first matrix should be the same as the number of rows in the second matrix. The size of the resultant matrix can easily be calculated: if A=[aij] is an m×n matrix and B=[bij] is an n×k matrix, the product AB is an m×k matrix. I have some good news: you already know how to multiply two matrices. The procedure is the same as multiplying a matrix with a vector, but this time, imagine that you have more than one column vector. You then stack the resultant column vectors side by side into a matrix.
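
Here is a small sketch (with made-up 2×3 and 3×2 matrices) showing that multiplying by a matrix is the same as multiplying by each of its columns and stacking the results:

import torch

A = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]])   # 2 x 3
B = torch.tensor([[7., 10.],
                  [8., 11.],
                  [9., 12.]])      # 3 x 2

# Matrix-matrix product: (2 x 3) @ (3 x 2) -> 2 x 2
C = A @ B

# Same result, built column by column: multiply A by each column of B, then stack
col0 = A @ B[:, 0]
col1 = A @ B[:, 1]
C_stacked = torch.stack([col0, col1], dim=1)

print(torch.equal(C, C_stacked))  # True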

PyTorch and Tensors

In this section, we will look at PyTorch tensors and use them for matrix multiplication. PyTorch is a well-known deep learning library in which tensors play a crucial role. You can think of tensors as higher-dimensional matrices, and PyTorch allows us to perform numerical operations on them efficiently. As you can probably guess by now, matrices and tensors constitute the foundations of deep learning.
Let’s look at a simple example in which we initialize two matrices and perform a matrix operation on them:

import torch

A = torch.tensor([[1, 2, 3],
                  [2, 3, 4]])   # 2 x 3
B = torch.tensor([[3, 1],
                  [4, 2],
                  [2, 3]])      # 3 x 2

print(torch.matmul(A, B))
# Output of the matrix multiplication:
# tensor([[17, 14],
#         [26, 20]])
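
Since tensors generalize matrices to more dimensions, the same call also handles whole batches of matrices at once; a small sketch (shapes made up for illustration):

import torch

# A batch of 5 matrices (5 x 2 x 3) times a batch of 5 matrices (5 x 3 x 4)
batch_A = torch.randn(5, 2, 3)
batch_B = torch.randn(5, 3, 4)

out = torch.matmul(batch_A, batch_B)
print(out.shape)  # torch.Size([5, 2, 4]): matmul is applied to each pair in the batch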

Neural Networks

Taking everything we have learned up until now, we can start applying matrix operations to represent neural networks. Here, I will assume that you know the basics of neural nets. So, first, let's look at what our model architecture would look like with a single hidden layer.

http://alexlenail.me/NN-SVG/index.html

As seen in the image above, we have input neurons that are represented the same way as our house data. Next, we have a hidden layer with four neurons. Each neuron is a linear combination of the input neurons passed through a non-linearity. In this example, we will use a widely used and simple-to-understand activation function: the Rectified Linear Unit (ReLU), which outputs zero if the input is negative and outputs the input unchanged otherwise. Mathematically, f(x) = max(0, x). To represent the four neurons in the hidden layer, we multiply our design matrix by a weight matrix with three rows and four columns; the number of rows should equal the dimensionality of the inputs, and the number of columns should equal the number of neurons in the subsequent layer.

Image by Author
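
A minimal sketch of that hidden-layer computation (random weights, with the hypothetical names X and W1 standing in for the design matrix and the weight matrix):

import torch

# Design matrix: n houses x 3 explanatory variables (made-up values)
X = torch.tensor([
    [3.0, 120.0, 2.0],
    [2.0,  85.0, 1.0],
])

# Weight matrix: 3 rows (input dimensionality) x 4 columns (hidden neurons)
W1 = torch.randn(3, 4)

# Each hidden neuron is a linear combination of the inputs, passed through ReLU
hidden = torch.relu(X @ W1)
print(hidden.shape)  # torch.Size([2, 4]): one 4-dimensional hidden vector per house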

All that is left is the final output layer. The output neuron is again a linear combination of the neurons in the previous layer. Since we are dealing with a regression problem in which we want an unbounded value, we do not need an activation function for the final layer. The matrix multiplication for this layer is even easier, as we only take a linear combination of the hidden layer. This should resemble a linear regression, and in fact, it is exactly a linear regression. The entire model can be represented as follows:

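Putting the two layers together (a sketch with hypothetical weight names W1 and W2, and no bias terms), the whole forward pass is just two matrix multiplications with a ReLU in between:

import torch

X = torch.tensor([
    [3.0, 120.0, 2.0],
    [2.0,  85.0, 1.0],
])                        # n houses x 3 inputs (made-up values)

W1 = torch.randn(3, 4)    # input layer -> hidden layer
W2 = torch.randn(4, 1)    # hidden layer -> output neuron

# y_hat = ReLU(X @ W1) @ W2, i.e. a linear regression on top of the hidden layer
y_hat = torch.relu(X @ W1) @ W2
print(y_hat.shape)  # torch.Size([2, 1]): one predicted price per house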

Summary

All deep learning operations are represented using matrix computations. Learning the basics of how data is represented in matrices and tensors will help you develop a better intuition of what is going on under the hood. To anybody who wants to learn more about linear algebra, I would recommend 3Blue1Brown's Essence of Linear Algebra series. Of course, one can never forget the wonderful lectures by Prof. Gilbert Strang. Happy learning, and may the matrix be with you!

