Principal Component Analysis — 18 Questions Answered

One-stop place for most of your questions regarding PCA

Rukshan Pramoditha
Published in Data Science 365
9 min read · Mar 18, 2022


Photo by Parrish Freeman on Unsplash

Principal Component Analysis (PCA) is one of the most essential topics in data science and machine learning. It has so many uses that it is a trending topic in search engines.

I’ve already published many articles on this topic. From theory to practical implementation, I’ve covered most parts of it.

Today, I’m going to answer the questions you might have about PCA, so this article is in Q&A format. It is meant to be a one-stop place for most of your questions regarding PCA, and it also serves as a summary of the articles I’ve already published on PCA.

So, I invite you to read this article from beginning to end! Without further delay, let’s begin the Q&A session on PCA.

Question 1

What is PCA?

PCA is a linear dimensionality reduction technique (algorithm) that transforms a set of p correlated variables into a smaller number k (k < p) of uncorrelated variables called principal components, while retaining as much of the variance in the original dataset as possible. Source: 11 Dimensionality reduction techniques you should know in 2021 (my own article).
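To make the definition concrete, here is a minimal from-scratch sketch using only NumPy, assuming the classic recipe: center the data, compute the covariance matrix of the p variables, take its eigenvectors sorted by eigenvalue, and project onto the top k of them. The function name `pca` and the synthetic five-variable dataset are hypothetical, chosen only for illustration; this is one standard way to compute PCA, not the only one.

```python
import numpy as np

def pca(X, k):
    """Hypothetical helper: project an (n, p) matrix X onto its top-k principal components.

    Returns the (n, k) component scores and the fraction of total variance
    explained by each retained component.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    if not 0 < k < p:
        raise ValueError("k must satisfy 0 < k < p")
    # Center each variable so the covariance is computed about the mean.
    Xc = X - X.mean(axis=0)
    # Sample covariance matrix of the p variables (p x p).
    cov = np.cov(Xc, rowvar=False)
    # eigh is meant for symmetric matrices; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]            # re-sort by variance, largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    W = eigvecs[:, :k]                           # p x k projection matrix
    scores = Xc @ W                              # n x k transformed data
    explained = eigvals[:k] / eigvals.sum()      # variance ratio per kept component
    return scores, explained

# Hypothetical data: five variables built from two underlying signals,
# so the variables are strongly correlated with each other.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
X = np.column_stack([
    base[:, 0],
    2 * base[:, 0] + 0.1 * rng.normal(size=200),
    base[:, 1],
    base[:, 0] - base[:, 1],
    0.5 * base[:, 1],
])

Z, ratio = pca(X, k=2)
print(Z.shape)       # (200, 2)
print(ratio.sum())   # share of the original variance kept by 2 components
```

Because the five variables are driven by only two signals, two principal components retain almost all of the variance here; on real data you would typically choose k by looking at the explained variance ratios.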

Question 2

What will happen to the dataset after applying PCA?

According to the above definition of PCA, the following things happen to the dataset after applying PCA.

  • PCA transforms original data into a completely new set of values.
  • PCA reduces the number of features (variables) in the original dataset. If the original dataset has p variables, the transformed dataset has k variables after applying PCA, where k < p (and k is often much smaller than p).
  • The original dataset has correlated variables. In other words, some variables are highly correlated with one or more of the other variables in the dataset. This is known as multicollinearity. PCA eliminates multicollinearity, so the transformed dataset has uncorrelated variables after applying PCA. You can easily verify this by visualizing the correlation matrices of the…
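The three effects listed above can be checked numerically. The sketch below is a hypothetical example, assuming NumPy and a made-up dataset of four variables driven by two latent signals; it computes PCA through the singular value decomposition (SVD) of the centered data, an equivalent route to the covariance-eigenvector approach, and then compares the shapes and correlation matrices before and after the transformation.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
# Hypothetical dataset: 4 variables driven by 2 latent signals,
# so several pairs are highly correlated (multicollinearity).
s1, s2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([
    s1 + 0.1 * rng.normal(size=n),
    0.8 * s1 + 0.2 * rng.normal(size=n),
    s2 + 0.1 * rng.normal(size=n),
    s1 + s2 + 0.1 * rng.normal(size=n),
])

# PCA via SVD of the centered data: the rows of Vt are the principal axes,
# ordered by decreasing singular value.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
Z = Xc @ Vt[:k].T   # n x k principal component scores (completely new values)

# Fewer variables after PCA.
print("original shape:", X.shape, "-> transformed shape:", Z.shape)  # (500, 4) -> (500, 2)

# Correlation matrices before and after PCA.
corr_before = np.corrcoef(X, rowvar=False)
corr_after = np.corrcoef(Z, rowvar=False)
print(np.round(corr_before, 2))  # several large off-diagonal entries
print(np.round(corr_after, 2))   # off-diagonal entries are ~0
```

The components come out uncorrelated because the principal axes are orthogonal and the data were centered first, so the covariance between any two component scores is zero up to floating-point error.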


2,000,000+ Views | BSc in Stats | Top 50 Data Science, AI/ML Technical Writer on Medium | Data Science Masterclass: https://datasciencemasterclass.substack.com/