PCA Clearly Explained: When, Why, How to Use It, and Feature Importance (A Guide in Python)

In this post, I explain what PCA is, when and why to use it, and how to implement it in Python using scikit-learn. I also explain how to extract the feature importance after a PCA analysis.

Serafeim Loukas, PhD
Geek Culture


Handmade sketch made by the author.

1. Introduction & Background

Principal Component Analysis (PCA) is a well-known unsupervised dimensionality reduction technique that constructs relevant features/variables through linear (linear PCA) or non-linear (kernel PCA) combinations of the original variables (features). In this post, we focus only on the widely used linear PCA method.

The construction of relevant features is achieved by linearly transforming correlated variables into a smaller number of uncorrelated variables. This is done by projecting (via the dot product) the original data onto the reduced PCA space using the eigenvectors of the covariance/correlation matrix, also known as the principal components (PCs).
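The projection step described above can be sketched from scratch with NumPy. This is a minimal illustration, not a production implementation; the random dataset and the variable names (`X`, `n_components`) are assumptions for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))  # toy data: 100 samples, 5 features

# Center the data (PCA operates on mean-centered variables)
Xc = X - X.mean(axis=0)

# Covariance matrix of the features
cov = np.cov(Xc, rowvar=False)

# Eigen-decomposition; eigh is appropriate because cov is symmetric
eigvals, eigvecs = np.linalg.eigh(cov)

# Sort eigenvectors by descending eigenvalue (variance explained)
order = np.argsort(eigvals)[::-1]
eigvecs = eigvecs[:, order]

# Project (dot product) onto the first two principal components
n_components = 2
X_pca = Xc @ eigvecs[:, :n_components]
print(X_pca.shape)  # (100, 2)
```

Because the eigenvectors are orthogonal, the projected columns are uncorrelated, which is exactly the decorrelation property the text describes.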

The resulting projected data are essentially linear combinations of the original data that capture most of the variance in the data (Jolliffe, 2002).

