Unveiling the Hidden Gem: The Power of Principal Component Analysis

Vamsi K · Published in Nybles · 5 min read · Sep 4, 2023
Image of a 3-D dataset by Casey Cheng

I like to compare PCA with writing a book summary.

Finding the time to read a 1,000-page book is a luxury that few can afford. Wouldn’t it be nice if we could summarize the most important points in just 2 or 3 pages so that the information is easily digestible even by the busiest person? We may lose some information in the process, but hey, at least we get the big picture.

-Casey Cheng

In the world of data analysis, there are several unsung heroes — algorithms and techniques that quietly revolutionize the way we make sense of complex data. One such unsung hero is Principal Component Analysis (PCA). Despite its incredible utility, PCA often lurks in the shadows, overshadowed by more glamorous data science techniques. In this blog, we’ll shed light on PCA, explore its various uses, and dive into its inner workings.

The Basics of Principal Component Analysis

Before we dive into the underrated prowess of PCA, let’s start with the fundamentals. At its core, PCA is a dimensionality reduction technique. It takes high-dimensional data and transforms it into a lower-dimensional representation while retaining as much of the original variance as possible. But what does that mean?

Imagine you’re in a room filled with people, and you want to capture the essence of the group in a single photograph. PCA is like finding the best angle to take that picture so that you capture the most important aspects while minimizing background noise. In other words, it simplifies complex data without losing the essence of what makes it interesting.

The Versatility of PCA

  1. Data Compression: PCA is a powerful tool for reducing the dimensionality of datasets without losing too much information. This is incredibly useful for tasks like image compression and feature selection in machine learning (a short sketch follows this list).
  2. Noise Reduction: In signal processing, PCA can help filter out noise from data, leaving you with a cleaner and more accurate signal.
  3. Visualization: PCA is often used to visualize high-dimensional data in a two- or three-dimensional space. It’s like creating a map that helps you explore complex datasets visually.
  4. Feature Engineering: In machine learning, PCA can be applied to create new features that capture the most important information in the data, potentially improving model performance.
  5. Anomaly Detection: By capturing the most significant sources of variance, PCA can help identify outliers or anomalies in datasets.
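To make the data-compression idea concrete, here is a minimal sketch using scikit-learn's built-in handwritten-digits dataset; scikit-learn is assumed to be installed, and the dataset and the 95% variance threshold are just illustrative choices.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# 8x8 handwritten-digit images flattened into 64-dimensional vectors
X, y = load_digits(return_X_y=True)

# Keep just enough principal components to explain 95% of the variance
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)

print(X.shape, "->", X_reduced.shape)        # (1797, 64) -> (1797, ~29)
print(pca.explained_variance_ratio_.sum())   # roughly 0.95
```

Passing a fraction to n_components tells scikit-learn to keep however many components are needed to reach that share of the total variance.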

It’s all cool! But how does it actually work?

Imagine you’re handed a basket of various fruits — apples, bananas, and oranges — all jumbled together. Your task? Sort them by similarity without ever tasting or peeling a single fruit. Sounds impossible, right? Well, that’s precisely the kind of magic Principal Component Analysis performs in the realm of data analysis. Let’s peel back the layers and reveal its enchanting inner workings, step by step.

Step 1: Standardization — Preparing the Canvas

PCA begins with standardization. Just as an artist prepares a canvas before painting, we must ensure that all variables (or features) in our dataset are on the same scale. This means giving each variable a mean of 0 and a standard deviation of 1. Why? Because we don’t want one variable dominating the analysis just because it has larger values.
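As a quick illustration, here is a minimal standardization sketch in NumPy; the tiny dataset is made up purely for demonstration, and in practice scikit-learn's StandardScaler does the same thing in one call.

```python
import numpy as np

# Hypothetical data: 5 samples, 3 features measured on very different scales
X = np.array([[170.0, 65.0, 3.1],
              [160.0, 58.0, 2.5],
              [182.0, 81.0, 3.8],
              [175.0, 72.0, 2.9],
              [168.0, 60.0, 2.2]])

# Give every column a mean of 0 and a standard deviation of 1
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_std.mean(axis=0))   # ~0 for each feature
print(X_std.std(axis=0))    # 1 for each feature
```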

Step 2: Covariance Matrix — Unveiling Relationships

Next, PCA calculates the covariance matrix. This matrix reveals how variables relate to each other and how much they vary together. Imagine it as a map showing which fruits tend to cluster together in our basket. The diagonal entries of this matrix are the variances of individual variables, while the off-diagonal entries are the covariances between pairs of variables.
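A hedged sketch of this step, again in NumPy and on synthetic standardized data:

```python
import numpy as np

rng = np.random.default_rng(0)
X_std = rng.standard_normal((100, 3))   # stand-in for standardized data

# rowvar=False tells NumPy that rows are samples and columns are variables
cov = np.cov(X_std, rowvar=False)       # shape (3, 3)

# Diagonal entries are variances; off-diagonal entries are covariances
print(cov)
```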

Step 3: Eigenvalues and Eigenvectors — The Magical Transformation

Here’s where the real magic happens. PCA computes the eigenvalues and eigenvectors of the covariance matrix. Let’s break this down:

  • Eigenvalues: These are like magical coefficients that tell us how much each eigenvector (a direction in our high-dimensional space) matters. High eigenvalues indicate that the corresponding eigenvectors capture a significant amount of variation in the data.
  • Eigenvectors: Think of these as the directions along which our data varies the most. Each eigenvector points to a different way of arranging the fruits in our basket. The first eigenvector captures the most variation, the second captures the second most, and so on (see the sketch after this list).
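Here is a minimal sketch of the decomposition in NumPy, reusing the synthetic covariance matrix from the previous snippet:

```python
import numpy as np

rng = np.random.default_rng(0)
X_std = rng.standard_normal((100, 3))
cov = np.cov(X_std, rowvar=False)

# eigh is the right tool because a covariance matrix is symmetric
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# eigh returns eigenvalues in ascending order, so reorder largest-first
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]   # column i is the i-th principal direction
```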

Step 4: Choosing Components — Selecting the Best Arrangement

Now comes the artistic decision. We have our eigenvectors and their corresponding eigenvalues, but we can’t keep them all. PCA ranks these components based on the magnitude of their eigenvalues. The components with the highest eigenvalues are the most significant, capturing the most essential information in our data. The number of principal components you choose to keep depends on your specific goals. Typically, you aim to retain enough components to capture a significant portion (e.g., 95%) of the total variance.
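One common way to pick that number is to look at the cumulative explained variance, as in this sketch (the eigenvalues are made up for illustration):

```python
import numpy as np

# Hypothetical eigenvalues, already sorted largest-first
eigenvalues = np.array([5.0, 2.5, 1.5, 0.6, 0.4])

explained_ratio = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(explained_ratio)

# Smallest number of components whose cumulative variance reaches 95%
k = int(np.argmax(cumulative >= 0.95)) + 1

print(cumulative)   # [0.5  0.75 0.9  0.96 1.  ]
print(k)            # 4
```

Projecting the standardized data onto the first k eigenvectors (X_std @ eigenvectors[:, :k]) then yields the reduced dataset.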

If you’re still not convinced...

Let us understand with an example!

Original Swiss Roll Data (3D): The original Swiss Roll dataset is inherently three-dimensional, with data points distributed along a spiral-like structure in 3D space. Each data point has three attributes or dimensions, represented by the X, Y, and Z coordinates in the 3D plot. The color of each point indicates its position along the spiral.

PCA of Swiss Roll Data (2D): The PCA transformation applied to the Swiss Roll dataset reduces its dimensionality from 3D to 2D. Each data point in the transformed dataset now has only two attributes or dimensions, represented by the X and Y coordinates in the 2D plot.

PCA identifies the directions (principal components) along which the data varies the most and projects the original data points onto these new dimensions. The first principal component (X-axis) captures the most variance in the data, and the second principal component (Y-axis) captures the second most variance while being orthogonal (uncorrelated) to the first one. Therefore, the transformed 2D data retains as much of the original data’s structure as possible while reducing dimensionality.

You can see that the Swiss Roll has been flattened onto the plane spanned by the first two principal components. Because PCA is a linear projection, it cannot truly unroll the spiral, but points that were close to each other in 3D space generally remain close in the 2D plot, and the color mapping is preserved, indicating each point’s original position along the spiral.
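If you’d like to reproduce something like this yourself, here is a hedged sketch using scikit-learn and Matplotlib (both assumed installed); the sample size and random seed are arbitrary choices.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA

# Generate the 3D Swiss Roll; `color` encodes each point's position along the spiral
X, color = make_swiss_roll(n_samples=1500, random_state=0)

# Project the 3 original dimensions onto the 2 directions of largest variance
X_2d = PCA(n_components=2).fit_transform(X)

plt.scatter(X_2d[:, 0], X_2d[:, 1], c=color, cmap="viridis", s=10)
plt.xlabel("First principal component")
plt.ylabel("Second principal component")
plt.title("PCA projection of the Swiss Roll (3D to 2D)")
plt.show()
```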

Conclusion

Principal Component Analysis is a quiet hero in the world of data analysis, overshadowed by flashier techniques but packing a punch of its own. Its ability to simplify complex data, reduce dimensionality, and reveal hidden patterns makes it an invaluable tool in various domains. So, the next time you find yourself drowning in high-dimensional data, remember the unsung power of PCA.

About Me

I’m Vamsi K, an undergraduate student at the Indian Institute of Information Technology, Allahabad. I love exploring the fast-moving field of Machine Learning. When I’m not doing that, you can find me watching movies!

Connect with me on LinkedIn.
