Principal Component Analysis, But Why?

Bhanu Kiran
5 min read · Jan 10, 2023


If you are reading this blog, I'm sure you are aware of Exploratory Data Analysis, or EDA. If not, then in simple words, it is an approach for maximizing insight into the data before formal modeling. There are several reasons why we perform EDA, including:

  1. to guide hypothesis testing
  2. assess our assumptions about data
  3. identify essential features of data
  4. uncover hidden structures
  5. graphical analysis

And the list goes on and on, but one place EDA applies is in unsupervised learning. What is unsupervised learning? It is when your data does not have a target or output variable. In other words, your data is filled with features and nothing to map these features to. Unsupervised learning includes famous methods such as cluster analysis and, of course, PCA, or Principal Component Analysis.

Principal Component Analysis

Why did I mention it with EDA and not as a model itself? Because it can be part of EDA, and it can be an unsupervised learning method as well; it just depends on how you use it.

To get a better understanding of what PCA is, let us break down the term principal component. A principal component is a linear combination of the predictor variables. The idea in PCA is to combine multiple numeric predictor variables, also known as features, into a smaller set of variables, which are weighted linear combinations of the original set.

In simpler words, if I have 6 feature columns, I can combine them into 2 or 3 new columns that explain most of the variability of all 6 columns, reducing the dimension of the data.

Variability here means the extent to which data points in a statistical distribution or data set diverge, or vary, from the average value, as well as the extent to which these data points differ from each other. This matters because variables will often vary together, and a variation in one variable is duplicated by a variation in another.

Let's take an example of two variables/features, X and Y. For these two variables there are two principal components, PCi (i = 1 or 2):

PCi = wi,1 · X + wi,2 · Y

Fig .1 Principal components

The weights wi,1 and wi,2 are known as the component loadings, and these transform the original variables into PCs (principal components).

The first principal component, PC1, is the linear combination that best explains the total variation, and the second principal component, PC2, is orthogonal to PC1 and explains as much of the remaining variation as it can.

If there were additional components, each additional one would be orthogonal to all the others.
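The loadings and orthogonality described above can be computed directly. Here is a minimal numpy sketch on a hypothetical two-feature data set (the variable names and synthetic data are illustrative, not from the article): the eigenvectors of the covariance matrix give the component loadings wi,1 and wi,2, and PC1 and PC2 come out orthogonal.

```python
import numpy as np

# Hypothetical two-feature data set (X and Y from the example above)
rng = np.random.default_rng(0)
X = rng.normal(size=200)
Y = 0.8 * X + rng.normal(scale=0.4, size=200)  # Y varies together with X
data = np.column_stack([X, Y])

# Center the data, then eigendecompose the covariance matrix
centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
order = np.argsort(eigvals)[::-1]        # re-sort by variance, descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Columns of eigvecs are the loadings (wi,1, wi,2) for PC1 and PC2
w1, w2 = eigvecs[:, 0], eigvecs[:, 1]
print("PC1 loadings:", w1)
print("PC1 orthogonal to PC2:", np.isclose(w1 @ w2, 0.0))
```

Note that PC1's loadings point along the direction in which X and Y vary together, which is exactly the "duplicated variation" the previous section described.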

In general, PCA is about generating new variables that are better at explaining our data, and PCs have three properties:

  1. unique
  2. orthogonal
  3. linear combination of the original attributes

Let's take an example. Consider that our data has 4 attributes/variables/features. Hence, we derive 4 PCs.

The PCs are ranked according to variance: the first PC has the highest variance, the second has less, and so on.

Fig 2. Scree Plot

Now, instead of the 4 columns that originally existed, our data has 4 new columns: PC1, PC2, PC3, and PC4. Each row holds the scores, that is, the transformed attribute values, for one data point.

Visual Data Exploration by PCA

The most common use case for PCA is feature extraction; the other common use case is dimension reduction.

Since the PCs are ranked by variance, as seen above in Fig 2, the top PCs carry the signal, while the bottom PCs, which capture little variation, are mostly background noise.

From the plot in Fig 2. and the transformed dataset, how do you select the number of PCs to keep?

Well, one trick is to look at the plot of variance, known as the scree plot, and apply the elbow method: the point where the curve bends into an "elbow" tells us how many PCs to keep.

Fig 3. Elbow method

This method gives us a reasonable number of principal components for our analysis.
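A closely related rule of thumb can be computed rather than eyeballed: keep enough PCs to cover some fraction of the total variance. The eigenvalues and the 90% threshold below are illustrative assumptions, not values from the article's figure.

```python
import numpy as np

# Hypothetical per-PC variances, like the bars of a scree plot
eigvals = np.array([5.0, 1.5, 0.3, 0.2])

explained_ratio = eigvals / eigvals.sum()
cumulative = np.cumsum(explained_ratio)

# Keep enough PCs to cover ~90% of the total variance
n_keep = int(np.searchsorted(cumulative, 0.90) + 1)
print("explained ratio:", explained_ratio.round(3))
print("PCs to keep:", n_keep)  # prints 2: the first two PCs cover over 90%
```

Either way, the goal is the same as the elbow method: separate the high-variance signal PCs from the low-variance noise PCs.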

As far as dimension reduction is concerned, you can interpret the data as follows. You have 4 columns and no target column, and you would like to find patterns and analyze the data for hidden structures. The first step is to plot the data, but how are you going to plot 4 columns on a 2-axis plot? PCA is your answer: just as X and Y reduce to PC1 and PC2 in Fig 1, you can project your data onto the top two PCs, and then go ahead with your cluster analysis.
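As a final sketch, here is that projection in numpy on a hypothetical 4-column, no-target data set: every data point collapses to a (PC1, PC2) pair that fits on a 2-axis scatter plot, ready for cluster analysis.

```python
import numpy as np

# Hypothetical 4-column data set with no target column
rng = np.random.default_rng(2)
data = rng.normal(size=(150, 4))

centered = data - data.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
order = np.argsort(eigvals)[::-1]

# Keep only the top two components: each point becomes a (PC1, PC2) pair
coords_2d = centered @ eigvecs[:, order[:2]]
print(coords_2d.shape)  # (150, 2): ready for a 2-axis scatter plot or clustering
```

From here, a scatter of the two columns of `coords_2d` (e.g. with matplotlib) is the plot the paragraph above asks for, and any clustering algorithm can run on it directly.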
