2 Plots That Help Me to Choose the Right Number of Principal Components
Creating the cumulative explained variance plot and the scree plot in PCA
Choosing the right number of principal components is often the most challenging part of PCA.
There are several methods for doing so.
When creating a Scikit-learn PCA object, the number of components to keep is specified as a model hyperparameter.
from sklearn.decomposition import PCA
pca = PCA(n_components=?)
The n_components hyperparameter usually takes an integer. To reduce dimensionality, that integer must be less than the number of input features in the dataset.
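As a rough illustration of what n_components accepts, the sketch below fits PCA three ways. The Iris dataset, the standardization step and the variable names are assumptions chosen for the example, not part of the original walkthrough. Besides an integer, Scikit-learn's PCA also accepts a float between 0 and 1 (a target fraction of explained variance) and None (keep all components).

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Illustrative data (an assumption, not from the article): 150 samples, 4 features.
X = StandardScaler().fit_transform(load_iris().data)

# Integer: keep exactly this many components.
pca_int = PCA(n_components=2).fit(X)

# Float in (0, 1): keep the fewest components whose cumulative
# explained variance exceeds this fraction.
pca_frac = PCA(n_components=0.95).fit(X)

# None (the default): keep min(n_samples, n_features) components.
pca_all = PCA(n_components=None).fit(X)

# n_components_ holds the number of components actually kept after fitting.
print(pca_int.n_components_, pca_frac.n_components_, pca_all.n_components_)
```

The float form picks a number automatically, but it still needs a variance threshold from you, so it does not replace looking at how the variance is distributed across components.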
How can we determine the exact value for n_components for a given dataset?
Today, we’ll answer this question by creating two types of machine learning visualizations: the cumulative explained variance plot and the scree plot.
Let’s start with the scree plot.
Scree Plot
What is the scree plot in PCA?
PCA is performed by computing the eigenvalues and eigenvectors of the covariance matrix of the standardized data[ref¹].
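To make the link between eigenvalues and PCA concrete, here is a minimal sketch that computes the eigenvalues of the covariance matrix by hand and compares them with what Scikit-learn reports in explained_variance_. The Iris dataset and the variable names are assumptions used only for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Illustrative data (an assumption, not from the article), standardized first.
X = StandardScaler().fit_transform(load_iris().data)

# Covariance matrix of the standardized features (columns are variables).
cov = np.cov(X, rowvar=False)

# eigh is meant for symmetric matrices and returns eigenvalues in
# ascending order, so reverse them to get largest-first.
eigenvalues = np.linalg.eigh(cov)[0][::-1]

# Fit PCA with all components kept and compare.
pca = PCA().fit(X)
print(np.allclose(eigenvalues, pca.explained_variance_))
```

Both np.cov and PCA divide by n - 1, which is why the two sets of values can be compared directly.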