Linear algebra in Data Science

Gaurav Chauhan
8 min read · Aug 30, 2018


This is part of The ULTIMATE Curriculum in Data Science, which you can refer to for more topics related to Data Science.

As long as algebra is taught in school, there will be prayer in school.

Linear Algebra is a continuous form of mathematics and is applied throughout science and engineering because it allows you to model natural phenomena and compute with them efficiently. Because it is continuous rather than discrete mathematics, many computer scientists have little experience with it. Linear Algebra is also central to almost all areas of mathematics, such as geometry and functional analysis, and its concepts are a crucial prerequisite for understanding the theory behind Data Science. You don't need to understand Linear Algebra before getting started in Data Science, but at some point you may want a better understanding of how the different Machine Learning algorithms really work under the hood. So if you really want to be a professional in this field, you will have to master the parts of Linear Algebra that are important for Machine Learning.

For this section there are many resources that teach these topics in an easy and excellent way. I will share references to them and add some personal touches to enhance this chapter.

First, let's get our minds ready for some linear algebra.

To learn the basics of Linear Algebra, see this YouTube video playlist.

In fact, I would highly recommend following this YouTube channel, 3Blue1Brown, and learning the beauty of mathematics from it.

Additional topics

Tensors

Now that you know what a basic matrix looks like and how to use matrices to calculate mathematical equations, meet the tensor. A tensor extends the concept of a matrix by adding further dimensions to it. Loosely speaking, a tensor is a matrix within a matrix.

As you can see, an 8×8 matrix (64 entries) can be represented as a 4×4×4 tensor.
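
As a quick sanity check, here's a minimal NumPy sketch (the values are arbitrary) showing that an 8×8 matrix holds exactly the same 64 entries as a 4×4×4 tensor:

```python
import numpy as np

# An 8x8 matrix has 8 * 8 = 64 entries -- exactly as many as a
# 4x4x4 tensor (4 * 4 * 4 = 64), so one can be reshaped into the other.
matrix = np.arange(64).reshape(8, 8)
tensor = matrix.reshape(4, 4, 4)

print(matrix.shape)  # (8, 8)
print(tensor.shape)  # (4, 4, 4)
```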

A tensor is a mathematical entity that lives in a structure and interacts with other mathematical entities. If one transforms the other entities in the structure in a regular way, then the tensor must obey a related transformation rule.

Tensors are not essential right now; in fact, you will not use them in data science and machine learning outside of deep learning, where they play a crucial part. The most popular deep learning framework, TensorFlow, is even named after tensors. If you still have doubts about the concept, here's a very simple and clear explanation of it.

Norm

A norm is a very simple way to measure the length of a vector. It goes by several names, such as vector norm or vector magnitude.

Vector norms can be divided into three common types:

  • Vector L1 norm
  • Vector L2 norm
  • Vector Max norm

Vector L1 norm

The length of a vector can be calculated using the L¹ norm, written ||v||₁. This length is sometimes called the taxicab norm or the Manhattan norm, because it uses the Manhattan distance to measure the length of a vector.

A Manhattan distance example:

X = (1, 2, 2) and Y = (2, 5, 3)

|1 − 2| + |2 − 5| + |2 − 3|

= 1 + 3 + 1

= 5

In general, ||v||₁ = |a₁| + |a₂| + |a₃|.

Note: NumPy's predefined function numpy.linalg.norm calculates norms, but for the L¹ norm you have to pass the order as a parameter:

norm(a, 1)
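
Putting the example above into NumPy (using the same vectors X and Y):

```python
import numpy as np
from numpy.linalg import norm

x = np.array([1, 2, 2])
y = np.array([2, 5, 3])

# Manhattan distance between x and y is the L1 norm of their difference:
# |1 - 2| + |2 - 5| + |2 - 3| = 1 + 3 + 1 = 5
d = norm(x - y, 1)
print(d)  # 5.0
```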

Vector L2 norm

This norm satisfies the same properties as the L¹ norm, but it uses the Euclidean distance instead of the Manhattan distance to measure the length of a vector. It is denoted ||v||₂.

A Euclidean distance example:

X = (1, 2, 2) and Y = (2, 5, 3)

= sqrt((1 − 2)² + (2 − 5)² + (2 − 3)²)
= sqrt(1 + 9 + 1)
= sqrt(11)

≈ 3.317

In general, ||v||₂ = sqrt(a₁² + a₂² + a₃²).

Note: the L² norm is NumPy's default, so you don't have to specify any order parameter:

norm(a)
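
The same example with the L² (Euclidean) norm, which is NumPy's default:

```python
import numpy as np
from numpy.linalg import norm

x = np.array([1, 2, 2])
y = np.array([2, 5, 3])

# Euclidean distance is the L2 norm of the difference: sqrt(11) ~ 3.317
d = norm(x - y)
print(round(d, 3))  # 3.317
```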

In Data Science, and especially in machine learning, the L¹ and L² norms are often used as regularization methods when fitting models, i.e. as a way to keep the coefficients of the model small and, in turn, the model less complex. Of the two, the L² norm is the one we will use most often in our models.

L¹ is the sum of the absolute values of the vector's components, and L² is the square root of the sum of the squared components.

Vector Max norm

This norm measures a vector by its largest absolute component. The max norm is referred to as the L∞ norm, represented with the infinity symbol, and written ||x||∞.

A max norm example:

X = (1, 2, 5)

= max(1,2,5)
= 5

Therefore ||v||∞ = max(|a₁|, |a₂|, |a₃|).

Note: to calculate the L∞ norm in NumPy, you have to pass infinity as the order parameter:

norm(a, np.inf)
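
And the max norm example in NumPy, passing np.inf as the order:

```python
import numpy as np
from numpy.linalg import norm

v = np.array([1, 2, 5])

# Max norm: the largest absolute component, max(|1|, |2|, |5|) = 5
m = norm(v, np.inf)
print(m)  # 5.0
```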

Max norm is also used as a regularization method in machine learning, for example on neural network weights, where it is called max norm regularization.

Singular value decomposition

The singular value decomposition (SVD) provides another way to factorize a matrix, into singular vectors and singular values. The SVD allows us to discover some of the same kind of information as the eigendecomposition. However, the SVD is more generally applicable.

Singular value decomposition or SVD is one of the most popular dimensionality reduction techniques that helps us to model, understand and visualize the data effectively.

eli5: Here's a simple story you may have heard in your childhood:

“Fed up with his sons' lack of unity, a father made a plan to teach them a lesson. He brought a bundle of wooden sticks, tied them together, and told each son to break it with his hands. At first the sons were confident of breaking the bundle, but when they tried, they realised it was much harder to break than expected. In the end, nobody broke the bundle of sticks. Then the father gave one stick from the bundle to each son and told them to break it. Now it was very easy for them to break the sticks. The end.”

This story teaches us the importance of unity, which is great, but now think of it the Data Scientist's way: suppose you have an image.

Now you have to analyse the image, but it has 500 rows, i.e. a rank of up to 500, which is computationally expensive for any computer, and it may not be feasible to load the whole image into your system. With the help of SVD we can break the image down into lower-rank approximations and work on those instead. Breaking the image down means reducing its dimensionality, and SVD is the perfect tool for this.
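
To make the rank-reduction idea concrete, here's a minimal NumPy sketch (using a small random matrix as a stand-in for an image) that keeps only the top-k singular values:

```python
import numpy as np

# A small random matrix standing in for an image.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))

# Factor A into singular vectors (U, Vt) and singular values (s).
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the k largest singular values: a rank-k approximation of A
# with the same shape but far less information to store and process.
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(np.linalg.matrix_rank(A_k))  # 2
```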

Now let’s think of SVD in terms of Vectors and Matrices

This video from Stanford University clearly explains the basics of SVD with the help of an example.

After that, learn from this article, which teaches it in a more mathematical way.

Finally, if you want to code it yourself in Python, refer to this.

PS: You will need some knowledge of Python and NumPy to understand this.

PS: If you really want to master Data Science, clear up all your doubts related to linear algebra, and can spend the time on it, then follow this tutorial by MIT, which covers all the remaining topics not covered by me or the other tutorials I have referred to.

Now that you have understood the basic concepts of Linear Algebra, it's time to use them to your benefit. There's an awesome post by Analytics Vidhya which covers linear algebra from a Data Scientist's perspective.

For the latest updates and tips, or if you have any questions, just post in the comments.

Till then….

Happy coding :)

And Don’t forget to clap clap clap…
