The Linear Algebraic Love for ML: Or How Change of Basis, EigenDecomposition & PCA all come together beautifully!

Chella Priyadharshini · Published in Analytics Vidhya · Jun 3, 2020 · 6 min read
Image Source: Mathconceptions

When we deal with Vector Spaces, more often than not we use the concepts of Vectors and the Co-ordinates of vectors interchangeably. But they are two very different ideas. The co-ordinates of a vector are always given with respect to a basis.

And if we omit the basis, we naturally assume it to be the standard basis {(1,0), (0,1)}. For the purpose of this article, let us call the standard basis co-ordinate system CS-1. A vector a in CS-1 is usually written as:
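With a₁ and a₂ denoting its co-ordinates (names chosen here for illustration), this is the linear combination

a = a_1\,\hat{i} + a_2\,\hat{j}

of the standard basis vectors \hat{i} = (1, 0) and \hat{j} = (0, 1).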

It can also be written in the matrix form as:
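Keeping the same assumed names, the matrix form is

a = \begin{bmatrix} \hat{i} & \hat{j} \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \end{bmatrix}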

So the above actually represents the co-ordinates of the vector a, and can be written as:
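That is, with a subscript used here to make the basis explicit,

[a]_{CS\text{-}1} = \begin{bmatrix} a_1 \\ a_2 \end{bmatrix}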

Change of Basis:

Now, let us consider a second co-ordinate system CS-2 whose basis vectors are given as follows, w.r.t. that co-ordinate system:
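Calling them b₁ and b₂ (the names they go by later in this article), w.r.t. CS-2 itself they are simply

[b_1]_{CS\text{-}2} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \qquad [b_2]_{CS\text{-}2} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}

since any basis, written in its own co-ordinate system, looks like the standard basis.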

An alternate co-ordinate system can be thought of as a Vector Space spanned by its own basis vectors. If you have your default vector space, represented by the standard co-ordinate system CS-1, as your anchor point, you can think of this 2nd vector space as something that resulted from applying one or more transformations (rotation, scaling, shearing, etc.) to the default vector space.

Now consider a vector in CS-2, whose co-ordinates w.r.t. CS-2 are given. Written as a linear combination of CS-2's basis vectors, it will look like:
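Writing its CS-2 co-ordinates as x_{m1} and x_{m2} (names chosen here to match the [xm] notation quoted later), the vector x is

x = x_{m1}\, b_1 + x_{m2}\, b_2, \qquad [x]_{CS\text{-}2} = \begin{bmatrix} x_{m1} \\ x_{m2} \end{bmatrix}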

But the last equation is how the vector is represented in CS-2. We understand CS-1 better, right? So in order to express this vector in CS-1, we first have to represent CS-2's basis vectors in CS-1. From a transformations perspective, this is like asking how the standard basis vectors i and j got transformed into b1 and b2. Let us say that:
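Writing the CS-1 entries of b₁ and b₂ with names assumed here,

[b_1]_{CS\text{-}1} = \begin{bmatrix} b_{11} \\ b_{21} \end{bmatrix}, \qquad [b_2]_{CS\text{-}1} = \begin{bmatrix} b_{12} \\ b_{22} \end{bmatrix}

the same vector, expressed in CS-1, becomes

[x]_{CS\text{-}1} = x_{m1}\,[b_1]_{CS\text{-}1} + x_{m2}\,[b_2]_{CS\text{-}1} = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} \begin{bmatrix} x_{m1} \\ x_{m2} \end{bmatrix}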

The matrix in the last equation above is what is called the Change of Basis matrix. Multiplying by it gives us the vector represented in CS-1.

In a generic form, we can say:
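Denoting the change of basis matrix by M (the symbol used for it later in this article), with [b_1]_{CS-1} and [b_2]_{CS-1} as its columns:

[x]_{CS\text{-}1} = M\,[x]_{CS\text{-}2}, \qquad [x]_{CS\text{-}2} = M^{-1}\,[x]_{CS\text{-}1}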

In plain English, the columns of the transformation matrix (the change of basis matrix) are the new coordinate system's (CS-2's) basis vectors, represented in the original coordinates (CS-1).

Decomposing a Transformation:

Before moving on to eigendecomposition, we need to understand that a transformation can be broken down into multiple transformations, applied one after the other, that together produce the same effect as the original transformation.

That is, instead of applying the original complex transformation A directly, we apply an equivalent but simpler transformation B in a different vector space (one represented by another basis, collected in the matrix M): we first take the vector's co-ordinates into that basis (via M^-1), apply B there, and then bring the transformed vector back to our original vector space (via M).
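In symbols (one common convention, assumed here so that it matches the eigendecomposition below):

A = M\, B\, M^{-1}

so that A\,x = M\big(B\,(M^{-1} x)\big): convert to the alternate co-ordinates, apply the simpler B, convert back.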

Where:

M : change of basis matrix (its columns are the alternate basis vectors, written in the original co-ordinates)

B : the simpler transformation, expressed in the alternate basis

M^-1 : inverse change of basis matrix (it converts original co-ordinates into the alternate basis's co-ordinates)

EigenDecomposition:

An interesting property of a real symmetric transformation matrix has to be quoted here: an n×n real symmetric matrix has n linearly independent, mutually orthogonal eigenvectors (that is, the eigenvectors are perpendicular to each other), and it has n real eigenvalues corresponding to those eigenvectors. So the eigenvectors span the whole n-dimensional space. In other words, the eigenvectors can be used as the basis vectors for the same n-dim space.

Also, a real symmetric matrix is said to be orthogonally diagonalizable. This is the so-called “Spectral Theorem”. Fun Fact: the name comes from the eigenvalues of a matrix being called its “spectrum”; explaining why this is so requires touching upon Banach spaces, functional analysis, eigenstates in Quantum mechanics, etc., all of which are out of the scope of this author!

So coming back to where we were, by orthogonal diagonalization, the matrix A can be decomposed as:
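Assuming M is the matrix whose columns are the orthonormal eigenvectors of A, this reads

A = M\, D\, M^{-1} = M\, D\, M^{T}

where the second equality holds because M is orthogonal (M^{-1} = M^{T}).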

Where D is a diagonal matrix, whose diagonal elements are nothing but the eigenvalues of A. And this equation is called the “EigenDecomposition” of A.

To quote directly from [Ref.]: “That is to say, A acts like a diagonal matrix, when we change coordinates: more precisely, the mapping x -> Ax in standard basis is the same as [xm] -> D[xm] when written in M’s coordinates.”

Combining the above ideas, therefore, we can say that when a transformation matrix is real symmetric, the corresponding transformation can be decomposed into 3 simpler transformations: rotation, scaling, reverse-rotation. That is, it transforms a vector by: taking it to the eigenbasis coordinates, stretching or shrinking the vector along its eigenvectors (the stretching or shrinking factor along each eigenvector is the corresponding eigenvalue), and then bringing it back to our standard coordinates.
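To make this concrete, here is a minimal numpy sketch (the matrix and vector values are illustrative, not from the article):

import numpy as np

# A real symmetric transformation matrix (values chosen purely for illustration)
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])

# eigh is for symmetric/Hermitian matrices: it returns real eigenvalues
# and orthonormal eigenvectors (stacked as the columns of M)
eigvals, M = np.linalg.eigh(A)
D = np.diag(eigvals)

# EigenDecomposition: A = M D M^T (M is orthogonal, so M^-1 = M^T)
print(np.allclose(A, M @ D @ M.T))      # True

# Transforming a vector the "decomposed" way:
x = np.array([2.0, -1.0])
x_eigen  = M.T @ x        # 1. rotate into the eigenbasis
x_scaled = D @ x_eigen    # 2. stretch/shrink along the eigenvectors
x_back   = M @ x_scaled   # 3. rotate back to the standard basis

print(np.allclose(A @ x, x_back))       # True: same effect as applying A directly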

Principal Component Analysis (PCA):

PCA can be explained from multiple frames of reference — as maximizing the variance or minimizing the reconstruction errors. Here we will see PCA explained as the EigenDecomposition of the Covariance matrix. Let the Data Matrix and the Transformation Matrix be defined as follows:
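One common setup, with conventions assumed here so that the covariance below comes out as WVW^T: stack n mean-centred data points x_i ∈ R^d as the rows of the Data Matrix X, and let the Transformation Matrix W map R^d to R^k:

X \in \mathbb{R}^{n \times d}, \qquad W \in \mathbb{R}^{k \times d}, \quad k \le d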

The transformation is given as:
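Under the assumed conventions, each data point is mapped as

y_i = W\, x_i \qquad \text{(equivalently, } Y = X\, W^{T}\text{)}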

Find the Covariance matrix of this transformed matrix:
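With the data mean-centred, the covariance of the transformed points works out to

\mathrm{Cov}(Y) = \frac{1}{n} \sum_{i=1}^{n} (W x_i)(W x_i)^{T} = W \left( \frac{1}{n} \sum_{i=1}^{n} x_i x_i^{T} \right) W^{T} = W\, V\, W^{T}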

V is the Covariance Matrix of the original Data Matrix before transformation.

The best transformation W is one that reduces the covariance after transformation to a diagonal matrix (meaning there is no correlation between the variables after the transformation).

For WVW^T to be a diagonal matrix,

W : the Eigenbasis of V, i.e. the rows of W are (orthonormal) eigenvectors of V. Such a basis exists because V is real symmetric (orthogonal diagonalization again).
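Concretely, if u_1, …, u_k are orthonormal eigenvectors of V with eigenvalues λ_1, …, λ_k (symbols chosen here), and the rows of W are those eigenvectors, then

W\, V\, W^{T} = \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_k \end{bmatrix}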

So each data point is transformed from R^d to R^k: its new co-ordinates are taken w.r.t. k eigenvectors of the original Covariance matrix V (in practice, the k with the largest eigenvalues), which span the subspace the data is projected onto.
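Putting the whole flow together, a small numpy sketch (the data is synthetic and the variable names are mine, not the article's):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))     # n = 200 data points in R^d, d = 5 (illustrative)
X = X - X.mean(axis=0)            # mean-centre the columns

V = (X.T @ X) / len(X)            # covariance matrix of the original data (d x d)

# Eigendecomposition of the (real symmetric) covariance matrix
eigvals, eigvecs = np.linalg.eigh(V)       # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]          # re-order them descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2
W = eigvecs[:, :k].T              # rows of W = top-k eigenvectors of V

Y = X @ W.T                       # transform the data: R^d -> R^k

# Covariance after transformation is diagonal (up to floating point),
# with the top-k eigenvalues of V on the diagonal
cov_Y = (Y.T @ Y) / len(Y)
print(np.allclose(cov_Y, np.diag(eigvals[:k])))   # True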

From the above flow of ideas, you can see how several seemingly unrelated concepts all amalgamate to help understand a specific Machine Learning construct.

References:

https://www.math.wustl.edu/~freiwald/309orthogdiag.pdf
