Linear Algebra

After observing humanity’s thirst for intelligence, God commanded “Let there be Mathematics”, and then the bulbs got lit.

I’ve come across a lot of conversations in which people claim, “I feel like Machine Learning is totally abstract.” I would like to clear this up: the field is not abstract at all once you understand its origins. With the basics in place it is as approachable as solving simple arithmetic problems, and the basics begin with “PURE MATHEMATICS!!”

So let’s start with a formal introduction to today’s topic and what it requires.

So what is Machine Learning, and how does maths come into play in this field? Machine Learning is nothing but a fancy name for “brute force learning”: the model trains itself by adjusting its parameters to approximate real-world values. And where does maths come in? At the very core, every calculation we do in Machine Learning is vector/matrix manipulation, or in the computer’s terminology, tensor arithmetic.

So in today’s post I will shine a light on vector arithmetic. Let’s begin…

1. Definition of Vectors, Matrices and Tensors

  • Vectors: The formal definition of a vector is a quantity having direction as well as magnitude, especially as determining the position of one point in space relative to another. From the perspective of computer applications, though, it is an array of numbers, either continuous or discrete, and the space that consists of vectors is called a vector space. In machine learning we deal with multi-dimensional data, so vectors become a very crucial part of it. For example, say we are trying to predict the housing price from the input features: number of bathrooms, number of bedrooms, population density of the location, and floor number. These 4 features form the input feature vector for the housing-price problem.
Vectors and their vector space (the white space)
  • Matrices: A matrix is a two-dimensional array of numbers arranged in rows and columns. The size of a matrix is given by its number of rows and number of columns. So if we have a matrix A with m rows and n columns, it can be represented as a rectangular array of m × n elements.
Representation of matrix
  • Tensors: A tensor is a multidimensional array of numbers; in fact, vectors and matrices are simply 1-D and 2-D tensors. In machine learning and deep learning, tensors are mostly used for storing and processing data.
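To make these three definitions concrete, here is a minimal NumPy sketch; the feature values below are made up purely for illustration:

```python
import numpy as np

# A 4-dimensional feature vector for the housing-price example:
# [bathrooms, bedrooms, population density, floor number]
x = np.array([2, 3, 1200.0, 5])              # vector = 1-D tensor, shape (4,)

# A matrix with m = 3 rows and n = 4 columns (3 houses, 4 features each)
A = np.array([[2, 3, 1200.0, 5],
              [1, 2,  800.0, 2],
              [3, 4, 1500.0, 8]])             # matrix = 2-D tensor, shape (3, 4)

# A higher-order tensor: e.g. 10 such matrices stacked together
T = np.stack([A] * 10)                        # 3-D tensor, shape (10, 3, 4)

print(x.shape, A.shape, T.shape)              # (4,) (3, 4) (10, 3, 4)
```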

Now, I assume all of us here are pretty familiar with matrix operations such as addition and subtraction, so I’m moving on to the topics of utmost importance.

2. Dot Product of Two Vectors

The dot product of two vectors is the sum of the products of their corresponding components, i.e. the components along the same dimension, and can be expressed as:

v₁ · v₂ = v₁ᵗv₂ = v₁₁v₂₁ + v₁₂v₂₂ + … + v₁ₙv₂ₙ

where the corresponding vectors are v₁ = [v₁₁, v₁₂, …, v₁ₙ]ᵗ and v₂ = [v₂₁, v₂₂, …, v₂ₙ]ᵗ.

Now a question might come up: why is this dot-product concept so important, and where do we see it used? Well, we use it everywhere at the very root level of Machine Learning, as illustrated in the image below.

A snippet view from a basic Linear Regression Formula

The multiplication between θ and x is actually the dot product.
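Here is a small NumPy sketch of the same idea; the values of θ and x are hypothetical, chosen only to show that the linear-regression prediction reduces to a dot product:

```python
import numpy as np

v1 = np.array([1.0, 2.0, 3.0])
v2 = np.array([4.0, 5.0, 6.0])

# Sum of the products of corresponding components
dot_manual = np.sum(v1 * v2)     # 1*4 + 2*5 + 3*6 = 32.0
dot_numpy = np.dot(v1, v2)       # same result

# In linear regression the prediction θᵗx is exactly this dot product
theta = np.array([0.5, -1.0, 2.0, 0.1])   # hypothetical parameter vector
x = np.array([1.0, 2.0, 3.0, 4.0])        # hypothetical feature vector
prediction = np.dot(theta, x)
```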

3. Linear Independence of Vectors

A vector is said to be linearly dependent on other vectors if it can be expressed as a linear combination of those vectors. For example, if v₃ = 7v₁ + 5v₂, then v₁, v₂, v₃ are not linearly independent.

A set of vectors v₁, v₂, v₃, …, vₙ is said to be linearly independent if a₁v₁ + a₂v₂ + a₃v₃ + … + aₙvₙ = 0 ⇒ aᵢ = 0 ∀ i ∈ {1, 2, …, n}.

If a₁v₁ + a₂v₂ + … + aₙvₙ = 0 for some choice of coefficients aᵢ that are not all zero, then the vectors are not linearly independent.

Linear Independence

Okay, now so much for the concept of Linear Independence. But where do we use it in Machine Learning??

When n vectors are linearly independent, their linear combinations can reach any direction in the n-dimensional vector space they span. This concept of linear independence shows up in linear regression, where the output vector points in one direction, the columns of the input matrix point in others, and the error vector is what separates the prediction built from the input matrix from the output, as follows.

Illustrating Linear Independence in Machine Learning. This is an example from Linear Regression
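A practical way to check linear independence is to stack the vectors as columns of a matrix and look at its rank: if the rank equals the number of vectors, they are independent. A rough NumPy sketch, with made-up vectors and v₃ deliberately chosen as 7v₁ + 5v₂:

```python
import numpy as np

v1 = np.array([1.0, 0.0, 2.0])
v2 = np.array([0.0, 1.0, 1.0])
v3 = 7 * v1 + 5 * v2                  # v3 is a linear combination of v1 and v2

V = np.column_stack([v1, v2, v3])

# Rank equal to the number of columns   => linearly independent
# Rank smaller than the number of columns => linearly dependent
rank = np.linalg.matrix_rank(V)
print(rank, rank == V.shape[1])       # 2, False -> the set is linearly dependent
```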

4. Norm of a Vector

The norm of a vector is a measure of its magnitude. There are several kinds of norms. The most familiar is the Euclidean norm, also called the ℓ² norm, which is defined as follows:

||x||₂ = √(x₁² + x₂² + … + xₙ²) = √(x·x) = √(xᵗx)

Similarly, the ℓ¹ norm is the sum of the absolute values of the components:

||x||₁ = |x₁| + |x₂| + … + |xₙ|

Now where do we apply this concept in Machine Learning? Let’s hop into linear regression for a while; the cost function of this model is as follows:

C(θ) = ||e||₂² = ||Xθ − Y||₂²

From this expression we can see that we compute the cost function of the linear regression model by taking the norm of the error vector. Generally, in machine learning we use both the ℓ¹ and ℓ² norms for several purposes. For instance, we use the ℓ² norm to calculate the least-squares cost function. Similarly, we very often have to regularise our model so that it does not fit the training data too closely and then fail to generalise to new data. To achieve regularisation, we generally add either the square of the ℓ² norm or the ℓ¹ norm of the parameter vector as a penalty in the cost function for the model.

When the ℓ² norm of the parameter vector is used for regularisation it is generally known as Ridge regularisation, whereas when the ℓ¹ norm is used instead it is known as Lasso regularisation.
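The following sketch shows how the two norms and the two penalties look in NumPy; X, Y, θ and the regularisation strength λ are made-up values for illustration only:

```python
import numpy as np

x = np.array([3.0, -4.0])
l2 = np.linalg.norm(x, 2)        # √(3² + 4²) = 5.0
l1 = np.linalg.norm(x, 1)        # |3| + |-4| = 7.0

# Least-squares cost and its regularised versions (hypothetical data)
X = np.array([[1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
Y = np.array([2.0, 2.5, 3.5])
theta = np.array([0.5, 0.7])
lam = 0.1                                               # regularisation strength

cost = np.linalg.norm(X @ theta - Y) ** 2               # ||Xθ − Y||₂²
ridge = cost + lam * np.linalg.norm(theta, 2) ** 2      # + λ||θ||₂²  (Ridge)
lasso = cost + lam * np.linalg.norm(theta, 1)           # + λ||θ||₁   (Lasso)
```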

5. Pseudo Inverse of a Matrix

Suppose we have an equation Ax = b, where A ∈ ℝⁿˣⁿ, x ∈ ℝⁿˣ¹ and b ∈ ℝⁿˣ¹. To solve for x we can rearrange the equation as x = A⁻¹b; since A is square, A⁻¹ exists, provided A is not a singular matrix.

Now, what if A is a rectangular matrix of size m × n with m > n? Then A⁻¹ does not exist, and the previous solution is no longer possible. To solve the equation we need a workaround, which is given by

x = (AᵗA)⁻¹Aᵗb

The quantity (AᵗA)⁻¹Aᵗ is called the pseudo inverse of the matrix A, since it acts as an inverse of A to provide an optimal solution to the equation.

Now where do we find its use in Machine Learning? If we look at the least-squares solution in linear regression, the equation for the parameter vector θ is θ = (XᵗX)⁻¹XᵗY, so the term (XᵗX)⁻¹Xᵗ acts as the pseudo inverse of X, providing the optimal solution to the equation.
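A quick NumPy sketch of this workaround, using a made-up over-determined system (5 equations, 2 unknowns); NumPy’s built-in np.linalg.pinv should agree with the explicit formula whenever XᵗX is invertible:

```python
import numpy as np

# Over-determined system: 5 rows (observations), 2 columns (parameters)
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0],
              [1.0, 5.0]])
Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

# Pseudo inverse from the formula (XᵗX)⁻¹Xᵗ
pinv_manual = np.linalg.inv(X.T @ X) @ X.T
theta = pinv_manual @ Y                  # least-squares solution

# NumPy's built-in pseudo inverse gives the same answer
theta_np = np.linalg.pinv(X) @ Y
print(np.allclose(theta, theta_np))      # True
```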

6. Projection of a Vector

The projection of a vector v₁ in the direction of v₂ is the dot product of v₁ with the unit vector in the direction of v₂. Thus, the equation can be written as

||v₁₂|| = v₁ · u₂ = v₁ᵗu₂ = v₁ᵗ[v₂ / √(v₂ᵗv₂)]

where ||v₁₂|| is the projection of v₁ onto v₂.

The projection of a vector can be seen in the linear regression term (Xθ − Y): the prediction Xθ and the output Y point in different directions, and minimising the error vector (Xθ − Y) amounts to projecting Y onto the space of predicted values, i.e. bringing Xθ as close as possible to Y.

Projection of Vector
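In NumPy the projection formula above looks like this; v₁ and v₂ are arbitrary example vectors:

```python
import numpy as np

v1 = np.array([3.0, 4.0])
v2 = np.array([1.0, 0.0])

u2 = v2 / np.sqrt(v2 @ v2)        # unit vector in the direction of v2
proj_length = v1 @ u2             # ||v₁₂|| = v₁ᵗu₂  -> 3.0
proj_vector = proj_length * u2    # the projection of v1 onto v2 as a vector
```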

7. Eigenvectors and Eigenvalues

This is the most fundamental part of Machine Learning. Eigenvalues and eigenvectors come up in various areas of the field:

  • For example, the principal components in principal component analysis are the eigenvectors of the covariance matrix, while the eigenvalues are the variances along the principal components.
  • Similarly, in Google’s page-ranking algorithm, the vector of page-rank scores is nothing but an eigenvector of the page transition probability matrix, corresponding to the eigenvalue 1.

When a matrix A ∈ ℝⁿˣⁿ acts on a vector x ∈ ℝⁿˣ¹, the result is another vector Ax ∈ ℝⁿˣ¹. Generally, both the magnitude and the direction of the new vector differ from those of the original vector. If, however, the newly generated vector has the same direction (or exactly the opposite direction) as the original vector, then any vector in that direction is called an eigenvector, and the factor by which the vector is stretched is called the eigenvalue.

Eigenvectors and eigenvalues

The equation for an eigenvector is as follows:

Ax = λx

[where x is the eigenvector and λ is the eigenvalue]
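A short NumPy sketch that verifies Ax = λx for a small symmetric matrix and hints at the PCA connection mentioned above; the data is randomly generated, purely for illustration:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Columns of `vectors` are the eigenvectors of A
values, vectors = np.linalg.eig(A)
x, lam = vectors[:, 0], values[0]
print(np.allclose(A @ x, lam * x))             # True: Ax = λx

# PCA connection: the principal components of a dataset are the eigenvectors
# of its covariance matrix (eigh is used because the covariance matrix is symmetric)
data = np.random.randn(100, 3)                 # made-up data
cov = np.cov(data, rowvar=False)
variances, principal_components = np.linalg.eigh(cov)
```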

So did you all find it interesting? Cause these are the simple linear algebra concepts required for Machine Learning. They’re the key to unlocking just one door in this journey. And with these concepts I’m signing off for now and will catch up with you all in my next story. See ya!!! Bye!!!

Thanks for viewing my post. If you like my post please press the clap button to the left.
