What is PCA (Principal Component Analysis)?

Jun 22, 2022

What exactly is PCA?

Many data scientists rely on PCA regularly. Most know that PCA is used for dimensionality reduction, so that a model is less affected by the curse of dimensionality. Yet only a few understand it on an intuitive level or know the mathematics behind it. This blog will help you understand the core idea behind PCA, and some of the basic mathematics behind it, with a very simple explanation.

The core idea of PCA

You can think of PCA as a cameraman at a football game. The cameraman has to broadcast 2D video of a 3D environment, because most live television in 2022 uses a 2D format.

So the cameraman tries to capture 2D footage that carries as much information about the 3D scene as possible. In other words, he tries to capture the maximum variance in the football game.

He tends to shoot from the angle where the most players can be seen (maximum variance). Now you can build the analogy: the players are the data points, the camera angle is PC1 (the first principal component), and the cameraman is PCA.

What is variance?

In simple words, variance is a measure of variability. It is calculated by taking the average of the squared deviations from the mean. The larger the variance, the more spread out the data. As you can see in the figure, the variance is larger along the x-axis than along the y-axis. This means that the data points are more spread out along the x-axis, which is therefore considered to be the more important feature. It's like the camera angle from which the maximum number of players can be captured.
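To make the definition concrete, here is a minimal sketch that computes the variance exactly as described (the average of squared deviations from the mean). The toy values in `xs` and `ys` are made up for illustration and are not taken from the figure.

```python
def variance(values):
    """Population variance: the average squared deviation from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

# Hypothetical toy data: x is widely spread, y is tightly clustered.
xs = [1.0, 3.0, 5.0, 7.0, 9.0]
ys = [4.0, 5.0, 5.0, 5.0, 6.0]

var_x = variance(xs)
var_y = variance(ys)

# x has the larger variance, so it carries more of the spread in the data.
print(var_x, var_y)
```

Note that this divides by n (the population variance). Many libraries divide by n - 1 by default, which gives slightly larger numbers on small samples.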

Intuition behind PCA

PCA converts the features of a dataset into a small number of principal components (usually far fewer than the number of features). Principal components are nothing but axes (directions) in the feature space, chosen so that they capture the maximum variance.

Mathematical intuition behind PCA

PCA is built on the fundamentals of matrices and eigenvectors. Its objective is to project the data points onto every possible unit vector (axis) in the space and find the unit vector along which the projected data has the maximum variance (spread). That unit vector is a principal component (PC). In other words, PCA maximizes the variance formula over all unit vectors.
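The "try every direction" idea can be sketched by brute force in 2D. This is only an illustration of the objective, not how PCA is computed in practice. The data in `points` and the one-degree step are assumptions made for the example.

```python
import math

def projected_variance(points, angle):
    """Variance of the points after projecting them onto the unit vector at `angle`."""
    ux, uy = math.cos(angle), math.sin(angle)
    proj = [x * ux + y * uy for x, y in points]
    mean = sum(proj) / len(proj)
    return sum((p - mean) ** 2 for p in proj) / len(proj)

# Hypothetical 2D data lying roughly along the line y = x.
points = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.1), (4.0, 3.8), (5.0, 5.0)]

# Try unit vectors at 0, 1, ..., 179 degrees and keep the one with the most spread.
best_deg = max(range(180), key=lambda d: projected_variance(points, math.radians(d)))
best_var = projected_variance(points, math.radians(best_deg))

print(best_deg, best_var)
```

Only 180 degrees are scanned because a direction and its opposite give the same projected variance. For this data the winning angle lands close to 45 degrees, which matches the diagonal trend of the points.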

The solution to this maximization comes from a standard result about the Rayleigh quotient.
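The argument can be sketched as follows. The notation is an assumption of this sketch and not from the original post: x_1, ..., x_n are mean-centered data points, u is a unit vector, and Σ is the covariance matrix.

```latex
\[
\operatorname{Var}(u^\top x)
  = \frac{1}{n} \sum_{i=1}^{n} \left( u^\top x_i \right)^2
  = u^\top \Sigma \, u,
\qquad
\Sigma = \frac{1}{n} \sum_{i=1}^{n} x_i x_i^\top
\]
\[
\max_{u \neq 0} \frac{u^\top \Sigma \, u}{u^\top u} = \lambda_{\max},
\quad \text{attained when } \Sigma u = \lambda_{\max} u
\]
```

So the variance along any direction depends on the data only through Σ, and the best direction is an eigenvector of Σ belonging to its largest eigenvalue.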

It can be shown that finding this unit vector requires the covariance matrix of all the features. Covariance extends variance to a pair of features: its sign also captures whether the relationship between the two features is positive or negative. Covariance is very similar to correlation, except that it is not limited to the range [-1, 1]. A covariance matrix contains the covariances of every pair of features in the input.
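Here is a small sketch of the difference between covariance and correlation. The function names and the toy data are made up for illustration.

```python
import math

def covariance(xs, ys):
    """Population covariance: the average product of deviations from the means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    """Covariance rescaled by both standard deviations, so it lies in [-1, 1]."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical toy data: `up` rises with `hours`, `down` falls with it.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
up = [10.0, 20.0, 30.0, 40.0, 50.0]
down = [50.0, 40.0, 30.0, 20.0, 10.0]

print(covariance(hours, up), correlation(hours, up))      # 20.0 1.0
print(covariance(hours, down), correlation(hours, down))  # -20.0 -1.0
```

The sign tells you the direction of the relationship in both cases, but only the correlation is bounded. The covariance scales with the units of the features. Note also that `covariance(xs, xs)` is just the variance of `xs`.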

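The covariance matrix of a dataset with 2 features is a 2×2 matrix, and with 3 features it is a 3×3 matrix. The variances of the features sit on the diagonal, and the covariances of each pair of features sit off the diagonal.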
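Written out in LaTeX, with features named x, y and z (names chosen here for illustration; `pmatrix` assumes the amsmath package), the two matrices are:

```latex
\[
\Sigma_{2} =
\begin{pmatrix}
\operatorname{Var}(x) & \operatorname{Cov}(x, y) \\
\operatorname{Cov}(y, x) & \operatorname{Var}(y)
\end{pmatrix},
\qquad
\Sigma_{3} =
\begin{pmatrix}
\operatorname{Var}(x) & \operatorname{Cov}(x, y) & \operatorname{Cov}(x, z) \\
\operatorname{Cov}(y, x) & \operatorname{Var}(y) & \operatorname{Cov}(y, z) \\
\operatorname{Cov}(z, x) & \operatorname{Cov}(z, y) & \operatorname{Var}(z)
\end{pmatrix}
\]
```

Since Cov(x, y) = Cov(y, x), both matrices are symmetric.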

Now the second step is to find the eigenvectors of the covariance matrix, in particular the one with the largest eigenvalue. (An eigenvector of a matrix A is a nonzero vector that changes only by a scalar factor, called its eigenvalue, when it is multiplied by A.)

The eigenvector with the largest eigenvalue is your PC1 (principal component 1), the eigenvector with the second-largest eigenvalue is your PC2, and so on.
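One simple way to find the eigenvector with the largest eigenvalue is power iteration: keep multiplying a vector by the matrix and rescaling it. The sketch below uses that technique for illustration. It is not necessarily what PCA libraries do internally, and the matrix `cov` is a made-up example.

```python
import math

def top_eigenvector(matrix, iterations=200):
    """Approximate the eigenvector with the largest eigenvalue by power iteration.

    Assumes a nonzero symmetric matrix whose largest eigenvalue is not tied,
    and a starting vector that is not orthogonal to the answer.
    """
    n = len(matrix)
    vec = [1.0] * n
    for _ in range(iterations):
        # Multiply by the matrix, then rescale back to unit length.
        vec = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    # The Rayleigh quotient of the unit vector gives the matching eigenvalue.
    mv = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v * w for v, w in zip(vec, mv))
    return eigenvalue, vec

# Hypothetical 2x2 covariance matrix (symmetric, variances on the diagonal).
cov = [[5.0, 2.0],
       [2.0, 2.0]]

eigenvalue, pc1 = top_eigenvector(cov)
print(eigenvalue, pc1)
```

For this matrix the eigenvalues are 6 and 1, so PC1 is the direction (2, 1) scaled to unit length, and the eigenvalue 6 is the variance of the data along PC1.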

Now you can convert data of any dimensionality to a 1D (PC1), 2D (PC1, PC2), or 3D (PC1, PC2, PC3) space by projecting the data points onto the chosen PCs.
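Putting the steps together, here is a minimal end-to-end sketch for 2D data: center the data, build the covariance matrix, find the principal axes, and project. It relies on the closed-form angle of the principal axis of a symmetric 2×2 matrix, which only works in two dimensions. The data in `points` is hypothetical.

```python
import math

# Hypothetical 2D data set: each tuple is one data point (feature_1, feature_2).
points = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.1), (4.0, 3.8), (5.0, 5.0)]
n = len(points)

# Step 1: center the data so every feature has mean zero.
mean_x = sum(x for x, _ in points) / n
mean_y = sum(y for _, y in points) / n
centered = [(x - mean_x, y - mean_y) for x, y in points]

# Step 2: entries of the 2x2 covariance matrix [[a, b], [b, c]].
a = sum(x * x for x, _ in centered) / n
b = sum(x * y for x, y in centered) / n
c = sum(y * y for _, y in centered) / n

# Step 3: for a symmetric 2x2 matrix the principal axis has a closed-form angle.
theta = 0.5 * math.atan2(2.0 * b, a - c)
pc1 = (math.cos(theta), math.sin(theta))
pc2 = (-math.sin(theta), math.cos(theta))  # perpendicular to PC1

# Step 4: project each centered point onto the components.
scores_1d = [x * pc1[0] + y * pc1[1] for x, y in centered]
scores_2d = [(x * pc1[0] + y * pc1[1], x * pc2[0] + y * pc2[1]) for x, y in centered]

# Variance captured by each component.
var_pc1 = sum(s * s for s in scores_1d) / n
var_pc2 = sum(t * t for _, t in scores_2d) / n
print(var_pc1, var_pc2)
```

`scores_1d` is the data reduced to 1D along PC1. Almost all of the variance ends up on PC1 for this data, which is why dropping PC2 loses very little information. For more than two features you would normally use an eigenvalue solver from a numerical library instead of the closed-form angle.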