Data Science 365

Bring data into actionable insights.

2 Plots That Help Me to Choose the Right Number of Principal Components

6 min read · Mar 12, 2023

Image by Carlos / Saigon — Vietnam from Pixabay

Choosing the right number of principal components is one of the most challenging parts of PCA.

There are several ways to make that choice.

When creating an instance of Scikit-learn’s PCA class, the number of components to keep is specified through the n_components hyperparameter.

from sklearn.decomposition import PCA
# 2 is only a placeholder; how to choose this value is the topic of this article
pca = PCA(n_components=2)

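The n_components hyperparameter usually takes an integer that is no greater than the number of input features in the dataset (more precisely, no greater than the smaller of the number of samples and the number of features). It can also take a float between 0 and 1, in which case Scikit-learn keeps just enough components to explain that fraction of the variance.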
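As a minimal sketch of these options, the snippet below fits PCA with an integer, a float, and the default None. The synthetic dataset, the variable names, and the 0.90 threshold are assumptions made for illustration and do not come from the article.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for illustration): 200 samples, 6 features,
# two of which are built from the others so that some variance is shared.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=200)
X[:, 4] = X[:, 1] - X[:, 2] + 0.1 * rng.normal(size=200)

# PCA is sensitive to feature scale, so standardize first.
X_scaled = StandardScaler().fit_transform(X)

# Integer: keep exactly this many components.
pca_int = PCA(n_components=3).fit(X_scaled)

# Float in (0, 1): keep enough components to explain that share of variance.
# The "full" solver is set explicitly because the float form relies on it.
pca_frac = PCA(n_components=0.90, svd_solver="full").fit(X_scaled)

# None (the default): keep min(n_samples, n_features) components.
pca_all = PCA(n_components=None).fit(X_scaled)

print("integer  ->", pca_int.n_components_, "components")
print("float    ->", pca_frac.n_components_, "components")
print("default  ->", pca_all.n_components_, "components")
```

After fitting, the n_components_ attribute reports how many components were actually kept, which is handy when a float was passed.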

How can we determine a suitable value of n_components for a given dataset?

Today, we’ll answer this question by creating two visualizations: the cumulative explained variance plot and the scree plot.

Let’s start with the scree plot.

Scree Plot

What is the scree plot in PCA?

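PCA can be performed by computing the eigenvalues and eigenvectors of the covariance matrix of the standardized data[ref¹]. The eigenvectors give the directions of the principal components, and each eigenvalue measures the variance captured along its direction.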

ref¹: An In-depth Guide to PCA with NumPy
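The eigenvalue computation can be sketched with NumPy as follows. This is an illustration under stated assumptions: the synthetic data, the variable names, and the plot_scree helper are hypothetical and are not taken from the article or the referenced guide. The helper uses the common definition of a scree plot, which shows the eigenvalues in descending order against the component number.

```python
import numpy as np

# Synthetic data (an assumption for illustration): 5 features driven by
# 2 latent factors plus one pure-noise feature.
rng = np.random.default_rng(42)
latent = rng.normal(size=(300, 2))
X = np.column_stack([
    latent[:, 0],
    latent[:, 0] + 0.2 * rng.normal(size=300),
    latent[:, 1],
    latent[:, 1] + 0.2 * rng.normal(size=300),
    rng.normal(size=300),
])

# Standardize each feature to zero mean and unit (sample) variance.
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Covariance matrix of the standardized data (features are columns).
cov = np.cov(X_std, rowvar=False)

# eigvalsh suits symmetric matrices and returns eigenvalues in ascending
# order, so reverse them to get the largest first.
eigenvalues = np.linalg.eigvalsh(cov)[::-1]

# Share of the total variance captured by each component.
explained_ratio = eigenvalues / eigenvalues.sum()

print("eigenvalues:    ", np.round(eigenvalues, 3))
print("explained ratio:", np.round(explained_ratio, 3))


def plot_scree(eigenvalues):
    """Hypothetical helper: draw eigenvalues against component number."""
    import matplotlib.pyplot as plt  # imported here so the rest runs without it

    component = np.arange(1, len(eigenvalues) + 1)
    fig, ax = plt.subplots()
    ax.plot(component, eigenvalues, marker="o")
    ax.set_xticks(component)
    ax.set_xlabel("Principal component")
    ax.set_ylabel("Eigenvalue")
    ax.set_title("Scree plot")
    return fig
```

Calling plot_scree(eigenvalues) and then matplotlib.pyplot.show() displays the curve. Because the data are standardized, the eigenvalues sum to the number of features.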


Written by Rukshan Pramoditha

3,000,000+ Views | BSc in Stats (University of Colombo, Sri Lanka) | Top 50 Data Science, AI/ML Technical Writer on Medium