Dimension Reduction: A Dive into PCA
A common adversary in data analysis is the curse of dimensionality: as the number of variables grows, computations become more complex, resource demands rise, and models become harder to fit and interpret. To combat this, we delve into dimensionality reduction, a powerful technique that transforms complex data into a more manageable form while retaining essential information.
The Duo: Feature Selection and Feature Extraction
- Feature Selection: A strategic process that sieves through the feature set, discarding redundant or irrelevant variables. Techniques like stepwise regression, Elastic Net, and LASSO Regularization come to the fore, simplifying models without significant information loss.
- Feature Extraction: This method crafts new, informative attributes from the original set. Feature extraction, exemplified by PCA (Principal Component Analysis), unravels patterns in high-dimensional data, offering a reduced set of features while preserving maximum information.
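To make the distinction concrete, here is a minimal sketch of feature selection via LASSO, one of the techniques named above. The synthetic dataset and the `alpha` value are illustrative assumptions (scikit-learn is assumed available); the point is that the L1 penalty drives the coefficients of uninformative features to exactly zero.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic data: 10 features, only 3 of which actually drive y.
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

# LASSO's L1 penalty shrinks coefficients of irrelevant features
# to exactly zero, so the surviving features are the "selected" ones.
lasso = Lasso(alpha=1.0)
lasso.fit(StandardScaler().fit_transform(X), y)

selected = np.flatnonzero(lasso.coef_)
print("Selected feature indices:", selected)
```

Feature extraction (PCA), by contrast, would build new composite variables from all ten columns rather than discarding any of them.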
Advantages of Dimension Reduction
- Efficiency: Accelerates model training with reduced computational burden.
- Interpretability: Simplifies models, making them more accessible for interpretation.
- Curse of Dimensionality: Mitigates issues associated with high-dimensional datasets.
- Noise Reduction and Visualization: Enhances clarity in data analysis and visualization.
- Facilitates Other Analyses: Acts as an intermediate step for subsequent analyses.
Note: Techniques like PCR, Ridge Regression, and Lasso Regression are applied post-standardization, ensuring each variable has a mean of zero and a standard deviation of one.
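The standardization mentioned in the note is a one-line transformation; a minimal sketch with made-up numbers (income vs. age, chosen here purely for illustration):

```python
import numpy as np

# Two columns on very different scales: income and age (hypothetical values).
X = np.array([[50_000.0, 25.0],
              [82_000.0, 47.0],
              [61_000.0, 33.0],
              [95_000.0, 52.0]])

# Standardize: subtract each column's mean and divide by its
# standard deviation, giving every variable mean 0 and std 1.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_std.mean(axis=0))  # ~[0, 0]
print(X_std.std(axis=0))   # [1, 1]
```

Without this step, the large-scale variable (income) would dominate any variance-based method such as PCA or a penalized regression.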
Principal Component Analysis (PCA)
Principal Component Analysis (PCA) emerges as a stalwart in the realm of unsupervised learning. Operating on datasets with a multitude of correlated features, PCA illuminates the path to a lower-dimensional space, distilling the essence of the original data.
How PCA Works: A Glimpse
- Information Preservation: PCA strives to retain the maximum data variation while minimizing dimensionality.
- Visualization Power: Enables visualization of multidimensional data through exploratory ‘biplots.’
- Principal Component Regression (PCR): Forms the basis for supervised predictive models.
Principal Components
- PC1: The first principal component, a derived variable, captures the most variance in the data. It defines the dimension along which observations exhibit the most variability.
- PC2: The second principal component elucidates the remaining variance once the influence of the first component is removed. PC1 and PC2 are orthogonal, ensuring zero correlation.
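The properties above can be verified numerically. The sketch below computes principal components from scratch via the SVD of the centered data matrix (one standard way to implement PCA); the two-feature synthetic dataset is an assumption for illustration. PC1 captures most of the variance, and the PC1/PC2 scores are uncorrelated, as claimed.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two strongly correlated features driven by one latent factor z.
z = rng.normal(size=(300, 1))
X = np.hstack([z + 0.1 * rng.normal(size=(300, 1)),
               2 * z + 0.1 * rng.normal(size=(300, 1))])

Xc = X - X.mean(axis=0)                 # center the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

scores = Xc @ Vt.T                      # principal component scores
pc1, pc2 = scores[:, 0], scores[:, 1]

# PC1 explains most of the variance; PC1 and PC2 are uncorrelated.
var = scores.var(axis=0)
print(var / var.sum())
print(np.corrcoef(pc1, pc2)[0, 1])      # ≈ 0
```

The rows of `Vt` are the loading vectors: unit-length directions in feature space, ordered so that each successive component captures the most remaining variance while staying orthogonal to the previous ones.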
Advantages of PCA
- Efficient Information Summarization: Summarizes correlated variables into a compact set of principal components.
- Dimensionality Reduction: Transforms data into a lower-dimensional space, easing the computational load.
- Multivariate Data Visualization: Enables insightful visualization of multivariate data through exploratory ‘biplots.’
The Challenge of Multitudinous Features
The task of scrutinizing scatterplots for every conceivable pair of features becomes daunting when the number of features p is large: p features yield p(p − 1)/2 pairwise scatterplots, and each captures only a fraction of the total dataset information. Here, Principal Component Analysis (PCA) emerges as a beacon of efficiency, uncovering a low-dimensional representation that encapsulates maximal information. Instead of drowning in an avalanche of scatterplots, PCA computes principal components that serve as insightful summaries of the entire dataset. These components form the foundation for constructing low-dimensional views of the data in the form of a ‘biplot.’
Decoding the Biplot: A Visual Odyssey
A biplot is no ordinary graph; it overlays a ‘score plot’ on a ‘loading plot,’ providing a snapshot of both samples and variables in a single graphic. In the classic rendering, orange arrows represent the first two principal component loading vectors, drawn against the top and right axes, and reveal the directions in feature space along which the data exhibits the most variability. Meanwhile, blue labels mark each observation’s scores on the first two principal components, succinctly summarizing the combined variables.
In many scenarios, the first two principal components take center stage, becoming protagonists in the creation of a captivating two-dimensional biplot. This visual narrative facilitates the identification of clusters, revealing closely related data points and steering us away from the complexities of the higher-dimensional space.
Key Insights from Biplot:
- Score Plot: Offers concise single-number summaries (principal component scores) for each observation.
- Loading Plot: Unveils the influence of each feature on a specific principal component, gauged by the magnitude of the loading vectors.
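The two ingredients listed above come straight out of a fitted PCA. A minimal sketch, assuming scikit-learn and a small synthetic dataset of four correlated features: the scores supply the points of the score plot, and the loadings supply the arrows of the loading plot (actually drawing the overlay, e.g. with matplotlib, is omitted here).

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# 50 observations, 4 features; the last two echo the first two.
base = rng.normal(size=(50, 2))
X = np.hstack([base, base + 0.3 * rng.normal(size=(50, 2))])

pca = PCA(n_components=2)
scores = pca.fit_transform(StandardScaler().fit_transform(X))

# Score plot ingredients: one (PC1, PC2) pair per observation.
print(scores[:3])

# Loading plot ingredients: each row of loadings gives one feature's
# weight on PC1 and PC2; a biplot draws an arrow from the origin
# to (loading_PC1, loading_PC2) for every feature.
loadings = pca.components_.T        # shape: (n_features, 2)
print(loadings)
```

Features whose arrows point in similar directions are correlated, and observations lying far along an arrow score highly on that feature, which is what makes the overlay informative.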
The Scree Plot: Guiding the Way
While the biplot is a visual symphony, the scree plot serves as a strategic guide. Displaying the proportion of variance explained (PVE) by each principal component, the scree plot aids in pinpointing the sweet spot — the optimal number of principal components needed to capture substantial variation in the data. The ‘elbow point’ in the scree plot becomes the compass, directing us toward an optimal balance between information retention and dimensionality reduction.
The Proportion of Variance Explained is the yardstick for assessing the effectiveness of each principal component. It quantifies the amount or percentage of the total variance encapsulated by an individual principal component, offering insights into the degree of information retained.
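In terms of the singular values of the centered data matrix, the PVE of component m is S_m² / Σ_k S_k², and the scree plot is simply this quantity (or its cumulative sum) against the component index. A minimal sketch, using an assumed synthetic dataset dominated by one latent factor:

```python
import numpy as np

rng = np.random.default_rng(2)
z = rng.normal(size=(200, 1))
# 5 features, all driven by one latent factor plus small noise.
X = z @ rng.normal(size=(1, 5)) + 0.2 * rng.normal(size=(200, 5))

Xc = X - X.mean(axis=0)
_, S, _ = np.linalg.svd(Xc, full_matrices=False)

# PVE of component m = S_m**2 / sum of all squared singular values.
pve = S**2 / np.sum(S**2)
print(np.round(pve, 3))             # per-component PVE, decreasing
print(np.round(np.cumsum(pve), 3))  # cumulative PVE

# A scree plot graphs pve against the component index; the bend
# (the "elbow") suggests how many components are worth keeping.
```

Because this data has a single dominant factor, the first PVE towers over the rest and the elbow appears immediately after the first component.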
Conclusion
In the journey to unravel complex data, PCA stands as a beacon, illuminating the path to efficiency, interpretability, and clarity. The fusion of biplots, scree plots, and the elegance of PCA provides a visual journey through the data labyrinth, transforming complexity into clarity. The challenges posed by feature explosion are met with strategic insights and efficient computations. The journey from high-dimensional complexity to clarity becomes a reality, unlocking the true potential of the data at hand.