Principal Component Analysis

Karteek Menda
5 min read · Dec 18, 2023

Hello Aliens…..

In continuation of my commitment to regular blogging, as indicated in the previous post, I am pleased to present another blog that I believe will be beneficial for the community.

This blog aims to give you an intuitive, step-by-step understanding of Principal Component Analysis (PCA).

What does PCA do?

PCA seeks the direction along which the data varies the most, so that this single direction can explain a significant amount of the information in the dataset.

For example: During a survey conducted at five different houses, we discovered the following:

We have “Number of Noses” on the X-axis and “Number of Necks” on the Y-axis. To encompass all the information, we need both coordinates of every data point: (1, 1), (2, 2), (3, 3), (4, 4), and (5, 5). However, by examining these points from a different perspective and introducing a new axis, which we refer to as the “number of people”, a more insightful representation is achieved.

This tells us that House 1 has one person, House 2 has two, and so on.

We effortlessly transformed this two-dimensional data into a one-dimensional format.
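As a rough sketch of this idea (using NumPy and scikit-learn, which the original example does not show), the snippet below collapses the five (noses, necks) points onto a single new axis:

```python
import numpy as np
from sklearn.decomposition import PCA

# The five survey points: (number of noses, number of necks) for each house
X = np.array([[1, 1], [2, 2], [3, 3], [4, 4], [5, 5]], dtype=float)

pca = PCA(n_components=1)             # keep a single principal component
scores = pca.fit_transform(X)         # each house's coordinate on the new axis

print(pca.components_)                # direction of the new axis, roughly [0.71, 0.71]
print(pca.explained_variance_ratio_)  # [1.0]: one axis captures all of the variation
print(scores.ravel())                 # positions along the "number of people" style axis
```

Because these points lie exactly on a line, a single component captures all of the variation, which is the essence of going from two dimensions to one.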

How does PCA actually work?

Let’s say we have a dataset of two features.

Step — 1:

Find the centroid of this dataset, which has two features.
Centroid = (mean value of all X coordinates, mean value of all Y coordinates)
The green-colored dot indicated above is the centroid.
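A minimal sketch of Step 1 in Python, assuming the two features are stored as the columns of a NumPy array (the numbers here are made up purely for illustration):

```python
import numpy as np

# Hypothetical two-feature dataset (columns: X, Y)
data = np.array([[2.5, 2.4],
                 [0.5, 0.7],
                 [2.2, 2.9],
                 [1.9, 2.2],
                 [3.1, 3.0]])

# Centroid = (mean of all X coordinates, mean of all Y coordinates)
centroid = data.mean(axis=0)
print(centroid)
```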

Step — 2:

We move the entire dataset in a way that positions the centroid precisely at the origin (0, 0), all while maintaining the relative direction of each data point.

The dataset is now effectively relocated so that its centroid sits at the origin.
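Continuing the same hypothetical dataset from Step 1, Step 2 is just a subtraction: every point is shifted by the centroid so the cloud is centered at the origin.

```python
import numpy as np

# Same hypothetical two-feature dataset as in Step 1
data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])

# Shift the whole cloud so the centroid lands on the origin (0, 0);
# the relative positions of the points do not change.
centered = data - data.mean(axis=0)
print(centered.mean(axis=0))   # approximately [0, 0]
```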

Step — 3:

We draw multiple candidate lines passing through the origin and identify the one that captures the most information.

Now arises the question: How do we determine the line that maximizes the amount of captured information?

a. Take one of the contender lines, as shown below

b. Project all the points onto this specific line in a perpendicular manner
c. Compute the Euclidean distance between each projected point and the origin for all the data points
d. Next, square each of these distances and sum them up

We repeat this for each and every candidate line (Step 3) and calculate its sum of squared distances. The line that gives the maximum value is Principal Component 1 (PC-1), and its corresponding eigenvalue is:

Eigenvalue (PC-1) = (d₁² + d₂² + … + dₙ²) / (n − 1), where n is the number of data points and dᵢ is the distance of the i-th projected point from the origin.

The eigenvalue of PC-1 is 8.59.
Eigenvalues provide a measure of the extent of variation around the line.
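Here is a brute-force sketch of Step 3 (an illustration with made-up data, not the exact figures from the original plots): sweep candidate lines through the origin, project the centered points onto each one, and keep the line with the largest sum of squared distances.

```python
import numpy as np

# Hypothetical two-feature data, centered so the centroid is at the origin
data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])
centered = data - data.mean(axis=0)

best_ss, best_direction = -np.inf, None
for angle in np.linspace(0.0, np.pi, 1800, endpoint=False):  # candidate lines through the origin
    direction = np.array([np.cos(angle), np.sin(angle)])     # unit vector along the candidate line
    distances = centered @ direction      # distance of each projected point from the origin
    ss = np.sum(distances ** 2)           # sum of squared distances for this line
    if ss > best_ss:
        best_ss, best_direction = ss, direction

n = len(centered)
print("PC-1 direction:", best_direction)
print("Eigenvalue of PC-1 (SS / (n - 1)):", best_ss / (n - 1))
```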

Step — 4:

How do we find the eigenvectors?
Now that we have found the line that maximizes the variation around the centroid, we also have the equation of that line.

By employing the Pythagorean Theorem, we determine the hypotenuse of the right-angled triangle to be 1.47. Subsequently, to ensure that this specific vector becomes a unit vector, we normalize it by dividing all three sides by 1.47. Following this adjustment, the new hypotenuse becomes 1, with the sides measuring 0.68 and 0.73, denoted as the loading scores of PC-1. These loadings indicate the significance of a specific feature for a given principal component, reflecting the importance of that feature in contributing to the principal component’s variation.
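A small sketch of that normalization; the two legs of the triangle below are assumed values, chosen only so that they reproduce the hypotenuse of 1.47 and the loading scores quoted above:

```python
import numpy as np

# Assumed legs of the right-angled triangle along PC-1 (illustrative values only)
v = np.array([1.00, 1.08])

hypotenuse = np.linalg.norm(v)   # Pythagorean theorem: sqrt(1.00**2 + 1.08**2) ~ 1.47
unit = v / hypotenuse            # divide all sides so the vector has length 1

print(round(hypotenuse, 2))      # 1.47
print(np.round(unit, 2))         # [0.68 0.73] -> the loading scores of PC-1
```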

The principal components are oriented orthogonally to each other and intersect at the origin. Consequently, when deriving PC-2 after obtaining PC-1 in a two-dimensional space with features X and Y, PC-2 represents the perpendicular line to PC-1, ensuring that both components pass through the origin.

So, the equation of the line (PC-2) is

The eigenvalue of PC-2 is 0.46.

The principal components are essentially linear combinations of the original features. For instance, if there are five distinct features, there would be five principal components, each perpendicular to the others. This orthogonal arrangement resolves the issue of multicollinearity, ensuring that the principal components are linearly independent.
Recall that the eigenvalue of PC-1 is 8.59, signifying that a larger proportion of the information is encapsulated in PC-1, whereas PC-2, with its smaller eigenvalue of 0.46, captures a relatively smaller amount of information.
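To tie these pieces together, here is a compact sketch (not from the original post) that obtains all principal components at once by eigendecomposing the covariance matrix of centered data; the eigenvectors come out mutually orthogonal, and the eigenvalues play the same role as the 8.59 and 0.46 above.

```python
import numpy as np

# Hypothetical centered two-feature dataset
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[0, 0], cov=[[3, 2], [2, 2]], size=200)
X = X - X.mean(axis=0)

cov = np.cov(X, rowvar=False)             # 2x2 covariance matrix of the features
eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues (ascending) and eigenvectors (columns)

order = np.argsort(eigvals)[::-1]         # largest eigenvalue first -> PC-1
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print("Eigenvalues (variation along PC-1, PC-2):", eigvals)
print("PC-1 . PC-2 =", eigvecs[:, 0] @ eigvecs[:, 1])   # ~0, the components are orthogonal
```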

To examine the individual contribution of each eigenvalue, divide it by the sum of all eigenvalues:

For PC-1: 8.59 / (8.59 + 0.46) ≈ 0.95, i.e., about 95% of the variation

For PC-2: 0.46 / (8.59 + 0.46) ≈ 0.05, i.e., about 5% of the variation

Let's plot this to see the variation captured by each of the two PCs.

Based on the depicted plot, we can infer that utilizing only PC1 allows us to capture 95 percent of the total variation in the data.
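As a sketch of that calculation, each component's share is its eigenvalue divided by the sum of all eigenvalues; plotting those shares gives the kind of bar plot described above:

```python
import numpy as np
import matplotlib.pyplot as plt

eigenvalues = np.array([8.59, 0.46])            # eigenvalues of PC-1 and PC-2 from above
explained = eigenvalues / eigenvalues.sum()     # proportion of variation per component

print(np.round(explained * 100, 1))             # [94.9  5.1] -> roughly 95% and 5%

plt.bar(["PC-1", "PC-2"], explained * 100)
plt.ylabel("Variation captured (%)")
plt.title("Variation captured by each principal component")
plt.show()
```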

Advantages of PCA:
1. Improves algorithmic performance
2. Reduces the dimensionality of the data
3. Removes correlated features
4. Reduces high variance (overfitting)
5. Improves visualization

Disadvantages of PCA:
1. Less interpretable
2. Loss of information
3. PCA is a linear dimensionality reduction technique, but not all real-world datasets are linear

Thanks for reading the article! If you liked it, do give it a 👏. If you want to connect with me on LinkedIn, please click here.

I plan to share additional blog posts covering topics such as robotics, drive-by-wire vehicles, machine learning, and deep learning.

Stay tuned.

This is Karteek Menda.

Signing Off
