Mathematics for Machine Learning

Part — 2 (Linear Algebra and Matrix)

Aditya Vikram Singh
Analytics Vidhya
13 min read · Nov 10, 2019


Figure — 1

In Part — 1, I talked about matrices and matrix operations. In this blog, I'll discuss the concepts of linear algebra that are important for Machine Learning (ML), along with a few more matrix concepts such as rank, the adjoint of a matrix and the inverse of a matrix. If you are comfortable with matrices, go ahead and read on; otherwise I recommend going through the Part — 1 blog first.

Introduction

Linear algebra is the branch of mathematics that deals with linear equations. A linear equation has mainly three kinds of terms: variables (X = [x_1, x_2, x_3, …]), coefficients (A = [a_1, a_2, a_3, …]) and a constant term (B = [b]). An equation in which every variable has degree 1 or 0 is called a linear equation. Each such equation can be written as the product of a row vector and a column vector, i.e. AX = B.

Here A is a row vector of coefficients and X is a column vector of variables.

Note: Each variable in a linear equation has degree 1 or 0, but there must be at least one variable whose degree is 1.
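
As a quick illustration, here is a minimal numpy sketch of writing one linear equation as a row vector of coefficients times a column vector of variables; the equation 2x + 3y = 6 and the point used below are made-up examples.

```python
import numpy as np

# Hypothetical linear equation: 2x + 3y = 6
A = np.array([[2, 3]])        # row vector of coefficients
X = np.array([[1.5], [1.0]])  # column vector of variables, here the point (1.5, 1.0)
B = np.array([[6]])           # constant term

# A @ X computes 2*1.5 + 3*1.0 = 6, so the point (1.5, 1.0) satisfies AX = B
print(A @ X, np.allclose(A @ X, B))
```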

Hyperplanes

A hyperplane is a geometric entity whose dimension is one less than the dimension of its ambient space.

Examples:

  • A line is one-dimensional (1) and its ambient space is two-dimensional (1+1).
  • A plane is two-dimensional (2) and its ambient space is three-dimensional (2+1).
  • In general, a hyperplane is k-dimensional and its ambient space is (k+1)-dimensional (k > 2).

Note: The equation of a hyperplane is always a linear equation.

Line

The concept of a line (or straight line) was introduced by mathematicians to represent straight objects with no curvature. A line has no width or depth. ~ Wikipedia

The equation of a line has only two variables. There are various ways of representing a linear equation, but I'll discuss only three forms: slope form, intercept form and general form. Converting one form to another is easy.

General Form: This is one of the simplest forms, and it looks very similar to the figure above, so it can be written in matrix form easily. In the equation below, 'a' and 'b' are the coefficients of the variables 'x' and 'y' respectively, and 'c' is a constant term.

A and X are a row vector and a column vector respectively.

The slope of the line can be found from this form using the coefficients: slope = -a/b.

Mathematically, the slope is the tangent of the angle formed by the line with the x-axis, where the angle is measured anticlockwise from the x-axis.

Intuitively, the slope measures how fast 'y' changes when 'x' changes.
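
A tiny sketch (with made-up coefficients) tying the slope and the angle together:

```python
import math

# General form a*x + b*y + c = 0, with made-up coefficients: 2x + 3y - 6 = 0
a, b, c = 2, 3, -6
slope = -a / b                           # slope = -a/b, assuming b != 0

# The slope is the tangent of the angle the line makes with the x-axis,
# measured anticlockwise, so the angle itself is atan(slope).
angle_deg = math.degrees(math.atan(slope))
print(slope, angle_deg)                  # -0.666...  about -33.69 degrees
```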

Figure — 2 (General Form)

If you want to see how the line changes as 'a', 'b' & 'c' change, click here and play with it.

Slope Form: This is the most popular and simplest form. It has two main terms: the y-intercept ('c') and the slope ('m').

Figure — 3 (Slope Form)

If you want to see how the line changes when the slope and intercept terms change, click here.

Intercept Form: This form of the equation is used when we have the x- and y-intercepts, or when we need to find the axis intercepts. In the equation below, 'a' & 'b' are the x- and y-intercepts respectively.

Figure — 4 (Intercept Form)

If you want to see the changes in the line when 'a' & 'b' change, click here.

The figures below show how the general form of a line can be converted into slope and intercept form. Other conversions can be done similarly.
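
Since the conversion figures are images, here is a minimal Python sketch of the same conversions, with made-up coefficients:

```python
# General form (made-up numbers): 2x + 3y - 6 = 0
a, b, c = 2, 3, -6

# Slope form: y = m*x + k  (divide through by b, assuming b != 0)
m = -a / b
k = -c / b
print(f"y = {m:.3f}x + {k:.3f}")      # y = -0.667x + 2.000

# Intercept form: x/p + y/q = 1  (assuming a, b and c are all non-zero)
p = -c / a   # x-intercept
q = -c / b   # y-intercept
print(f"x/{p} + y/{q} = 1")           # x/3.0 + y/2.0 = 1
```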

Angle between two lines: Let the slopes of the first and second lines be 'm_1' & 'm_2' respectively. Then the angle between the two lines can be obtained using the formula below.

Figure — 5 (Angle between two lines)

If you want to see how the angle between the lines changes when 'm_1' & 'm_2' change, click here.

To find the smaller angle we use this formula.

By using the above formula —

  • Lines are parallel or identical if the angle (theta) between them is 0; parallel lines that are not the same line never intersect.
  • Lines are perpendicular if the angle (theta) between them is 90 degrees.
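
Below is a small Python sketch of the standard formula tan(theta) = |(m_1 - m_2) / (1 + m_1 m_2)| (assuming that is what Figure — 5 shows), with made-up slopes:

```python
import math

def angle_between_lines(m1, m2):
    """Smaller angle (in degrees) between lines with slopes m1 and m2."""
    if math.isclose(1 + m1 * m2, 0):
        return 90.0                          # perpendicular lines
    tan_theta = abs((m1 - m2) / (1 + m1 * m2))
    return math.degrees(math.atan(tan_theta))

print(angle_between_lines(1.0, 1.0))    # 0.0  -> parallel (or the same line)
print(angle_between_lines(2.0, -0.5))   # 90.0 -> perpendicular
print(angle_between_lines(1.0, 0.0))    # 45.0
```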

Intersection of two lines: There are various methods to find the intersection of two lines, but I'll discuss only one, the matrix method, in a later section of this blog.

Plane

A plane is a flat two-dimensional surface that extends infinitely far and it has zero thickness. — Wikipedia

The equation of a plane has three variables [x, y, z], their corresponding coefficients [a, b, c], which are called direction ratios (or direction cosines if they form a unit vector), and a constant term ('d'). In general, direction cosines are represented by [l, m, n]. Direction cosines always form a unit vector (a vector whose magnitude is 1). I'll discuss vectors in detail in my next blog.

Figure — 6

If the coefficients of the variables are direction cosines (i.e. the normal is a unit vector), then the constant term |d| is the shortest distance of the plane from the origin. A plane passing through the origin has a constant term equal to 0. The direction ratios/direction cosines are related to the normal vector (n) of the plane.

Note: The direction cosines of a line/vector are defined as the cosines of the angles made by the line/vector with the x, y & z axes. For example, the direction cosines of the z-axis are <cos(90), cos(90), cos(0)> = <0, 0, 1>. The z-axis is normal to the xy-plane, so the equation of the xy-plane is z = 0.

The angle between two vectors can be found using their dot product. Let n_1 & n_2 be two vectors; then the angle phi between them satisfies cos(phi) = (n_1 · n_2) / (|n_1| |n_2|).

Angle between two planes: The angle between the planes (theta) and the angle between the normal vectors (phi) of the planes are supplementary angles.

Figure — 7 (Angle between two planes)

The angle between the normal vectors (n_1 & n_2) of two planes can be found using the above concept; call that angle phi. Then the angle between the two planes whose normals are given is theta = 180 degrees - phi.
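
Here is a short numpy sketch of the dot-product formula above; the two planes used as examples are made up:

```python
import numpy as np

def angle_between(n1, n2):
    """Angle in degrees between two vectors, via cos(phi) = n1.n2 / (|n1||n2|)."""
    cos_phi = np.dot(n1, n2) / (np.linalg.norm(n1) * np.linalg.norm(n2))
    return np.degrees(np.arccos(np.clip(cos_phi, -1.0, 1.0)))

n1 = np.array([1.0, 0.0, 0.0])   # normal of the plane x = 0 (the yz-plane)
n2 = np.array([0.0, 0.0, 1.0])   # normal of the plane z = 0 (the xy-plane)

phi = angle_between(n1, n2)
print(phi)                        # 90.0, so these planes are perpendicular
```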

Intersection of planes: Planes that have parallel normal vectors are either parallel or overlapping. The intersection of two planes is always a straight line if their normal vectors are not parallel. The solution of three planes will be discussed in a later section of this blog.

NOTE: I'll be using the terms linear equation and hyperplane interchangeably, so don't be confused.

Inverse of a Matrix

If you know how to find the inverse of a matrix, you can skip this section of the blog and jump to the next one. To find the inverse of a matrix, we must know how to find the determinant and the adjoint of a matrix.

Determinant of a Matrix

The determinant of a matrix exists if and only if the matrix is square. Let A be a square matrix.

Calculating the determinant of a square matrix A of order 2 is easy. The first element of the first row, 'a', is multiplied by the element that lies in neither the row nor the column of 'a', so 'a' is multiplied by 'd' (d is the cofactor of a). Similarly, 'b' is multiplied by -c (-c is the cofactor of b). Then add the products, i.e. det(A) = ad + (-bc) = ad - bc.

Now we can find the determinant of a matrix of order 2. Using the same method, we can find the determinant of a matrix of order 3. Figure — 8 shows how to find the cofactor of each element of the first row; these cofactors are then used to find the determinant. The sign term (-1)^(i+j) appears in each cofactor, where 'i' is the row index and 'j' is the column index. Here i, j = {1, 2, 3}.

Figure — 8 (Cofactors of each element {a,b,c} of the first row respectively)
Figure — 9 (Determinant of a square matrix of order 3)

In the above example, i = 1 and j = {1, 2, 3} are used.

In the same way, we can calculate the determinant of a matrix of order 4 or higher.

Figure — 10 (Determinant of a square matrix of order 4)

The determinant of a square matrix can be found using any row or column: pick one, multiply each of its elements by the corresponding cofactor, and then add the products.
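
As a sketch of cofactor expansion along the first row (using a made-up 3x3 matrix; in practice you would call numpy.linalg.det directly):

```python
import numpy as np

def det(A):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in A[1:]]   # delete row 0 and column j
        total += (-1) ** j * A[0][j] * det(minor)          # (-1)^(i+j) sign with i = 0
    return total

A = [[1, 2, 3],
     [0, 1, 4],
     [5, 6, 0]]
print(det(A), np.linalg.det(np.array(A)))   # both give 1 (up to floating point)
```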

Adjoint of a Matrix

The adjoint of a matrix A is the transpose of the cofactor matrix of A. In Figure — 8, I've shown how to find the cofactor of each element of the first row of a matrix. In the same way, we can find the cofactors of the other elements.

Figure — 11 (Cofactors of the above matrix A)
Figure — 12 (Cofactor Matrix)
Figure — 13 (Adjoint of matrix A)
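
As a cross-check on the cofactor and adjoint figures, here is a small numpy sketch; the matrix A is a made-up example and the helper name adjoint is mine:

```python
import numpy as np

def adjoint(A):
    """Adjoint (adjugate): transpose of the cofactor matrix of a square matrix A."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    cof = np.zeros_like(A)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)  # remove row i, column j
            cof[i, j] = (-1) ** (i + j) * np.linalg.det(minor)     # cofactor of A[i, j]
    return cof.T                                                   # transpose -> adjoint

A = np.array([[1., 2., 3.],
              [0., 1., 4.],
              [5., 6., 0.]])
print(adjoint(A))
```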

There are 3 steps to find the inverse of a square matrix. Let A be a square matrix.

Step 1: Calculate the determinant of matrix A. The determinant of a matrix (it must be a square matrix) is denoted by det(A) or |A|. If |A| = 0, then A is not invertible.

Step 2: Calculate the adjoint of A.

Step 3: Calculate the inverse of A by using the below formula.

Figure — 14 (Flowchart to find the inverse of a matrix)
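
Putting the three steps together, a minimal sketch (the matrix is made up; in practice numpy.linalg.inv does this directly and more stably):

```python
import numpy as np

def inverse(A):
    """Inverse via the classical formula inv(A) = adj(A) / det(A)."""
    A = np.asarray(A, dtype=float)
    d = np.linalg.det(A)                       # Step 1: determinant
    if np.isclose(d, 0.0):
        raise ValueError("|A| = 0, so A is not invertible")
    n = A.shape[0]
    cof = np.zeros_like(A)
    for i in range(n):                         # Step 2: adjoint (transpose of cofactors)
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            cof[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return cof.T / d                           # Step 3: adj(A) / |A|

A = np.array([[1., 2., 3.],
              [0., 1., 4.],
              [5., 6., 0.]])
print(np.allclose(inverse(A), np.linalg.inv(A)))   # True
```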

Rank of a Matrix

Before going to systems of linear equations and their solutions, it is better to know what the rank of a matrix is.

The rank is the maximum number of linearly independent rows, or equivalently the maximum number of linearly independent columns. I'm going to call the number of linearly independent rows the row-rank and the number of linearly independent columns the column-rank of a matrix. The row-rank and column-rank are always equal, so the rank of a matrix is unique and greater than or equal to 0. I'll discuss this concept in more detail later in this blog.

Note: A matrix whose rows and columns are all linearly independent, so that its row-rank equals its column-rank, is called a full rank matrix.

The rank of a matrix is zero only if the matrix is a null matrix (all of its elements are zero); otherwise the rank is greater than or equal to 1.
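
With numpy, the rank can be checked directly; the matrices below are made-up examples:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])           # rows are linearly independent -> full rank
B = np.array([[1, 2],
              [2, 4]])           # second row = 2 * first row -> rank 1
Z = np.zeros((2, 2))             # null matrix -> rank 0

print(np.linalg.matrix_rank(A))  # 2
print(np.linalg.matrix_rank(B))  # 1
print(np.linalg.matrix_rank(Z))  # 0
```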

System of Linear Equations and their Solution

A system of linear equations is a collection of one or more linear equations. It can be represented in matrix form. A system can have an equal number of equations and variables, or different numbers of each.

Figure — 15 (a)
Figure — 15 (b) System of linear equations

Figure — 15 shows how a system of linear equations can be written in matrix form (AX = B).

Let the orders of the matrices A, X & B be (m, n), (n, 1) & (m, 1) respectively. Different cases are possible depending on the values of m & n. Here m is the number of equations (or rows) and n is the number of variables (or columns).

Case — 1: m = n

m = n: equal number of equations and variables.

  • If A is a full rank matrix, all of its rows and columns are linearly independent and its row-rank equals its column-rank. The determinant of such a matrix is never 0 (i.e. |A| is not 0), so the solution of this system is X = inverse(A) B and it is unique.
  • When A is not a full rank matrix, which means |A| = 0, there are two possibilities: the consistent case and the inconsistent case. In such cases, some of the hyperplanes corresponding to this system of linear equations are parallel or overlapping.

Consistent Case: In this case, the hyperplanes overlap, so there are infinitely many solutions to the system of linear equations.

The above two equations are the same, because the second equation can be derived by multiplying the first equation by 2, and the first can be derived by multiplying the second by 1/2. When we plot both lines, they overlap. So this system of linear equations has infinitely many solutions.

Inconsistent Case: In this case, the hyperplanes are parallel to each other, or there is at least one pair of parallel hyperplanes, meaning their coefficients are proportional but their constant terms are not. This type of system of linear equations has no solution.

In the above example, the first two equations can be derived from one another, so they are the same. The third equation/hyperplane is parallel to the first two. The fourth equation/hyperplane intersects all of the above. But as a whole, the system has no solution.

Note: In the consistent case all equations can be derived from one another.

In the inconsistent case, not all equations can be derived from one another; some equations may be derivable from others, but NOT ALL of them.
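
Here is a short sketch of Case — 1 with made-up 2x2 systems: a full rank A gives a unique solution, while a singular A corresponds to the consistent or inconsistent cases above.

```python
import numpy as np

# Full rank coefficient matrix: unique solution X = inverse(A) B
A = np.array([[2., 1.],
              [1., 3.]])
B = np.array([4., 5.])
print(np.linalg.det(A))          # 5.0, non-zero
print(np.linalg.solve(A, B))     # unique solution [1.4, 1.2]

# |A| = 0: the lines x + y = 1 and 2x + 2y = 2 overlap (consistent, infinite solutions),
# while x + y = 1 and 2x + 2y = 5 are parallel (inconsistent, no solution).
A_sing = np.array([[1., 1.],
                   [2., 2.]])
print(np.linalg.det(A_sing))     # 0.0, so np.linalg.solve would raise LinAlgError
```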

Case — 2: m > n

In this case, the number of equations is greater than the number of variables, and in general this type of system of linear equations does not have an exact solution.

So, to find an approximate solution to this type of system, we'll find an X such that (AX - B) is as small as possible, which means each element of the error E (= AX - B) is very close to 0.

Now our objective is to minimize f(X) = transpose(E) E.

To minimize f(X), we differentiate f(X) w.r.t. X, set the derivative equal to 0 and then solve for X. This gives the least-squares solution X = inverse(transpose(A) A) transpose(A) B.

Note: The above solution is possible if and only if |transpose(A) A| is not 0, i.e. the matrix (transpose(A) A) is invertible.
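
A sketch of the m > n case using the least-squares formula above; the system is made up, and numpy's lstsq gives the same answer:

```python
import numpy as np

# 3 equations, 2 variables (made-up system with no exact solution)
A = np.array([[1., 1.],
              [1., 2.],
              [1., 3.]])
B = np.array([1., 2., 2.])

# Normal equations: X = (A^T A)^{-1} A^T B, valid because A^T A is invertible here
X = np.linalg.inv(A.T @ A) @ A.T @ B
print(X)

# Same result from numpy's least-squares routine
X_lstsq, *_ = np.linalg.lstsq(A, B, rcond=None)
print(np.allclose(X, X_lstsq))   # True
```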

Case — 3: m < n

In this case, the number of equations is less than the number of variables, and this type of system of linear equations has infinitely many solutions. To get the solution that is closest to the origin, we can solve a constrained optimization problem using the Lagrangian function. I'm not going to explain how this function works here because this blog is already lengthy; I'll explain it in detail in a later blog.

To get the solution closest to the origin, we need to minimize (transpose(X) X) / 2 subject to AX = B. Using the Lagrangian function, the problem looks like this —

To solve the above problem, we differentiate it w.r.t. X and set the derivative equal to 0.

The above equation has two unknowns (X and lambda); let's premultiply both sides by A —

Now we have the value of lambda, and it can be used to calculate the X that is closest to the origin for this type of system of linear equations. This solution is known as the minimum norm solution.
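
A sketch of the m < n case: eliminating lambda as above leads to the minimum norm solution X = transpose(A) inverse(A transpose(A)) B, assuming A transpose(A) is invertible. The system below is made up, and numpy's pinv gives the same result:

```python
import numpy as np

# 1 equation, 3 variables (made-up): x + y + z = 3 has infinitely many solutions
A = np.array([[1., 1., 1.]])
B = np.array([3.])

# Minimum norm solution: X = A^T (A A^T)^{-1} B
X = A.T @ np.linalg.inv(A @ A.T) @ B
print(X)                                       # [1. 1. 1.], the solution closest to the origin

# The Moore-Penrose pseudoinverse gives the same minimum norm solution
print(np.allclose(X, np.linalg.pinv(A) @ B))   # True
```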

Figure — 16 (Flowchart for the solution of the system of linear equations)

Conclusion

Variables in a linear equation have degree 1. Every hyperplane is described by a linear equation. A set of linear equations forms a system of linear equations, and its solution can be calculated using the inverse of a matrix. The inverse of a matrix can be found using the determinant and the adjoint of the matrix. If the coefficient matrix A of a system of linear equations is a full rank matrix, there is always a unique solution; otherwise the system can have infinitely many solutions or no solution, depending on whether it is a consistent or an inconsistent case respectively.
