Linear algebra in Data Science

Gaurav Chauhan
8 min read · Aug 30, 2018


This is part of The ULTIMATE Curriculum in Data Science, which you can refer to for more topics related to Data Science.

As long as algebra is taught in school, there will be prayer in school.

Linear Algebra is a continuous form of mathematics and is applied throughout science and engineering because it allows you to model natural phenomena and compute with them efficiently. Because it is continuous rather than discrete mathematics, many computer scientists have little experience with it. Linear Algebra is also central to almost all areas of mathematics, such as geometry and functional analysis, and its concepts are a crucial prerequisite for understanding the theory behind Data Science. You don't need to understand Linear Algebra before getting started in Data Science, but at some point you may want a better understanding of how the different Machine Learning algorithms really work under the hood. So if you really want to be a professional in this field, you will have to master the parts of Linear Algebra that are important for Machine Learning.

For this section there are many resources that teach these topics in an easy and excellent way. I will share references to them and add some personal touches to enhance this chapter.

First, let's get our minds ready for some linear algebra.

To learn the basics of Linear Algebra, see this YouTube video playlist.

In fact, I would highly recommend following this YouTube channel, 3Blue1Brown, and learning the beauty of mathematics from it.

Additional topics

Tensors

Now that you know what a basic matrix looks like and how to use matrices to calculate mathematical equations, meet the tensor. A tensor extends the concept of a matrix by adding further dimensions to it. Loosely speaking, a tensor is a matrix within a matrix.

As you can see, an 8×8 matrix (64 entries) can be represented as a 4×4×4 tensor.
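
As a quick sanity check, here's a minimal NumPy sketch (the values are arbitrary) showing that an 8×8 matrix holds exactly the same 64 entries as a 4×4×4 tensor:

```python
import numpy as np

# An 8x8 matrix has 8 * 8 = 64 entries -- exactly as many as a
# 4x4x4 tensor (4 * 4 * 4 = 64), so one can be reshaped into the other.
matrix = np.arange(64).reshape(8, 8)
tensor = matrix.reshape(4, 4, 4)

print(matrix.shape)  # (8, 8)
print(tensor.shape)  # (4, 4, 4)
```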

A tensor is a mathematical entity that lives in a structure and interacts with other mathematical entities. If one transforms the other entities in the structure in a regular way, then the tensor must obey a related transformation rule.

Tensors are not essential right now; in fact, you will not use them in data science and machine learning outside of deep learning, where they play a crucial part. The most popular deep learning framework, TensorFlow, is even named after tensors. If you still have doubts about the concept, here's a very simple and clear explanation of it.

Norm

A norm is a very simple way to measure the length of a vector. It goes by several names, such as vector norm or vector magnitude.

Vector norms can be divided into three common types:

  • Vector L1 norm
  • Vector L2 norm
  • Vector Max norm

Vector L1 norm

The length of a vector can be calculated using the L¹ norm, written ||v||₁. This length is sometimes called the taxicab norm or the Manhattan norm, because it uses the Manhattan distance to measure the length of a vector.

A Manhattan distance example:

X = (1, 2, 2) and Y = (2, 5, 3)

|1 − 2| + |2 − 5| + |2 − 3|

= 1 + 3 + 1

= 5

In general, ||v||₁ = |a₁| + |a₂| + |a₃|.

Note: NumPy's predefined function numpy.linalg.norm calculates norms, but for the L¹ norm you have to pass the order as a parameter:

norm(a, 1)
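
Putting the example above into NumPy (using the same vectors X and Y):

```python
import numpy as np
from numpy.linalg import norm

x = np.array([1, 2, 2])
y = np.array([2, 5, 3])

# Manhattan distance between x and y is the L1 norm of their difference:
# |1 - 2| + |2 - 5| + |2 - 3| = 1 + 3 + 1 = 5
d = norm(x - y, 1)
print(d)  # 5.0
```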

Vector L2 norm

This norm satisfies the same properties as the L¹ norm, but it uses the Euclidean distance instead of the Manhattan distance to measure the length of a vector. It is denoted ||v||₂.

A Euclidean distance example:

X = (1, 2, 2) and Y = (2, 5, 3)

= sqrt((1 − 2)² + (2 − 5)² + (2 − 3)²)
= sqrt(1 + 9 + 1)
= sqrt(11)

≈ 3.317

In general, ||v||₂ = sqrt(a₁² + a₂² + a₃²).

Note: the L² norm is NumPy's default, so you don't have to specify any order parameter:

norm(a)
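
The same example with the L² (Euclidean) norm, which is NumPy's default:

```python
import numpy as np
from numpy.linalg import norm

x = np.array([1, 2, 2])
y = np.array([2, 5, 3])

# Euclidean distance is the L2 norm of the difference: sqrt(11) ~ 3.317
d = norm(x - y)
print(round(d, 3))  # 3.317
```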

In Data Science, and especially in machine learning, the L¹ and L² norms are often used as regularization methods when fitting models, i.e. as a way to keep the coefficients of the model small and, in turn, the model less complex. Of the two, the L² norm is the one we will use most often in our models.

L¹ is the sum of the absolute values of the vector's components, and L² is the square root of the sum of the squared components.

Vector Max norm

This norm measures a vector by its largest absolute component. The max norm is referred to as the L∞ norm, represented with the infinity symbol, and written ||x||∞.

A max norm example:

X = (1, 2, 5)

= max(1,2,5)
= 5

Therefore ||v||∞ = max(|a₁|, |a₂|, |a₃|).

Note: to calculate the L∞ norm in NumPy, you have to pass infinity as the order parameter:

norm(a, np.inf)
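
And the max norm example in NumPy, passing np.inf as the order:

```python
import numpy as np
from numpy.linalg import norm

v = np.array([1, 2, 5])

# Max norm: the largest absolute component, max(|1|, |2|, |5|) = 5
m = norm(v, np.inf)
print(m)  # 5.0
```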

Max norm is also used as a regularization method in machine learning, for example on neural network weights, where it is called max norm regularization.

Singular value decomposition

The singular value decomposition (SVD) provides another way to factorize a matrix, into singular vectors and singular values. The SVD allows us to discover some of the same kind of information as the eigendecomposition. However, the SVD is more generally applicable.

Singular value decomposition or SVD is one of the most popular dimensionality reduction techniques that helps us to model, understand and visualize the data effectively.

eli5: Here's a simple story you may have heard in your childhood:

“Fed up with his sons' lack of unity, a father made a plan to teach them a lesson. He brought a bundle of wooden sticks, tied them together, and told each son to break it with his hands. At first the sons were confident of breaking the bundle, but when they tried, they realised it was much harder to break than expected. In the end, nobody broke the bundle of sticks. Then the father gave one stick from the bundle to each son and told them to break it. Now it was very easy for them to break the sticks. The end.”

This story teaches us the importance of unity, which is great, but now think of it the Data Scientist's way: suppose you have an image.

Now you have to analyse the image, but it has 500 rows, i.e. a rank of up to 500, which is computationally expensive for any computer, and it may not be feasible to load the whole image into your system. With the help of SVD we can break the image down into lower-rank approximations and work on those instead. Breaking the image down means reducing its dimensionality, and SVD is the perfect tool for this.
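
To make the rank-reduction idea concrete, here's a minimal NumPy sketch (using a small random matrix as a stand-in for an image) that keeps only the top-k singular values:

```python
import numpy as np

# A small random matrix standing in for an image.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))

# Factor A into singular vectors (U, Vt) and singular values (s).
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the k largest singular values: a rank-k approximation of A
# with the same shape but far less information to store and process.
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(np.linalg.matrix_rank(A_k))  # 2
```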

Now let’s think of SVD in terms of Vectors and Matrices

This video from Stanford University clearly explains the basics of SVD with the help of an example.

After that, learn from this article, which teaches it in a more mathematical way.

Finally, if you want to code it yourself in Python, refer to this.

PS: You will need some knowledge of Python and NumPy to understand this.

PS: If you really want to master Data Science, clear up all your doubts related to linear algebra, and can spend the time on it, then follow this tutorial by MIT, which covers all the remaining topics not covered by me or the other tutorials I have referred to.

Now that you have understood the basic concepts of Linear Algebra, it's time to use them to your benefit. There's an awesome post by Analytics Vidhya which covers linear algebra from a Data Scientist's perspective.

For the latest updates and tips, or if you have any questions, just post in the comments.

Till then….

Happy coding :)

And Don’t forget to clap clap clap…
