Understanding the Role of Eigenvectors and Eigenvalues in PCA Dimensionality Reduction.

Joseph Adewumi
6 min read · Mar 26, 2019


In my freshman year of college, Linear Algebra was one of the first topics covered in Engineering Mathematics. I always skipped the section on Eigenvectors and Eigenvalues because I understood it poorly and didn't see much use for it. In my recent research, I've come to see their practical application.

My reason for writing this article is to break down the whole concept of Eigenvectors and Eigenvalues, both pictorially and theoretically, and to explain their application to real-world data when used in dimensionality reduction, specifically in the Principal Component Analysis model.

Before we begin the first part, you should have prerequisite knowledge of the following:

  • Linear Transformation
  • Determinants
  • Linear Systems
  • Change of basis

QUICK START

PART 1:

What are Eigenvectors?

We know that vectors have both magnitude and direction when plotted on an XY (2-dim) plane. For the purposes of this article, a linear transformation of a vector is the multiplication of that vector by a matrix, which generally changes both the vector's magnitude and its direction.

When a vector is plotted, its direction is along its span. Now, there are some special vectors which, when transformed linearly, don't change direction; that is, they don't get knocked off their span (the line passing through the origin and the vector's tip). Instead, they're only squished or stretched.

Example of a vector plotted on the XY plane showing its span

This leads us to Eigenvalues.

What are Eigenvalues?

They're simply the scalar factors by which the Eigenvectors are stretched or squished along their span when the linear transformation is applied.

Think of Eigenvectors and Eigenvalues as a summary of a large matrix.

To understand the mathematical principle behind Eigenvectors and Eigenvalues, click here
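
To make the definition concrete, here is a minimal sketch in Python with NumPy (the article itself contains no code, and the matrix A below is made up purely for illustration). It computes the Eigenvalues and Eigenvectors of a small matrix and checks that a transformed Eigenvector stays on its span, only scaled by its Eigenvalue.

    import numpy as np

    # A small symmetric 2x2 matrix, chosen only for illustration.
    A = np.array([[2.0, 1.0],
                  [1.0, 2.0]])

    # np.linalg.eig returns the eigenvalues and the eigenvectors (as columns).
    eigenvalues, eigenvectors = np.linalg.eig(A)

    # Defining property: A @ v = lambda * v, so the transformed vector stays
    # on its span and is only stretched (or squished) by the eigenvalue.
    v = eigenvectors[:, 0]
    lam = eigenvalues[0]
    print(np.allclose(A @ v, lam * v))  # True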

The core of Principal Component Analysis (PCA) is built on the concept of Eigenvectors and Eigenvalues.

PART 2:

How Eigenvectors and Eigenvalues come into play in PCA.

  1. What is Principal Component Analysis?

Principal Component Analysis (PCA) is the general name for a technique which uses sophisticated underlying mathematical principles to transform a number of possibly correlated variables into a smaller number of variables called principal components.

Principal component analysis is a technique for feature extraction: it combines our input variables in a specific way so that we can drop the “least important” variables while still retaining the most valuable parts of all of the variables.

The principal components represent the directions in which the data has maximum variance, that is, the directions in which the data is most spread out. Given a large dataset with many features, where it would be difficult to select which of the variables (features) are the most important in determining the target, PCA plays a huge role.

  2. How does PCA work?

Note that this article is not written to explain the in-depth operation of PCA, but the role of Eigenvectors and Eigenvalues in it. See the steps here if you want to learn every step in detail.

To understand how PCA works, we should have knowledge of the following:

  • Mean
  • Variance
  • Covariance
  • Standard deviation
  • Matrix transposition

Step 1 — Standardize:

By standardizing, we give each column zero mean: we subtract the column mean from every value in that column of our dataset. Next, we divide each column by its standard deviation so that all features fall within a comparable range.
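
A minimal sketch of this step, assuming a small made-up dataset X with observations in rows and features in columns (NumPy is my choice here, not the article's):

    import numpy as np

    # Made-up data: 100 observations of 3 features, purely for illustration.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))

    # Step 1: subtract each column's mean, then divide by its standard deviation.
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)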

Step 2 — Calculating the Covariance:

Find the covariance matrix of the dataset by multiplying the transpose of the standardized feature matrix by the matrix itself (and dividing by the number of observations minus one). It is a measure of how much each of the dimensions varies from the mean with respect to the others.

The covariance is measured between 2 dimensions to see if there is a relationship between them, e.g., the relationship between the height and weight of students. A positive covariance indicates that the two dimensions move together: when one increases, the other tends to increase as well.

A negative covariance indicates that the two dimensions move in opposite directions: when one increases, the other tends to decrease.

If the covariance is zero, then the two dimensions have no linear relationship (they are uncorrelated).
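
Continuing the same made-up example from the Step 1 sketch, here is how the covariance matrix might be computed (the setup is repeated so the snippet runs on its own):

    import numpy as np

    # Same toy setup as in the Step 1 sketch.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)

    # Step 2: covariance matrix of the standardized data.
    # With n observations this is (X_std.T @ X_std) / (n - 1);
    # np.cov(X_std, rowvar=False) computes the same thing.
    n = X_std.shape[0]
    cov_matrix = (X_std.T @ X_std) / (n - 1)
    print(cov_matrix.shape)  # (3, 3): one row and column per feature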

Step 3 — Deduce the Eigens:

Suppose we have plotted a scatter plot of two random variables, and a line of best fit is drawn through the points. This line of best fit shows the direction of maximum variance in the dataset. The Eigenvector is the direction of that line, while the Eigenvalue is a number that tells us how spread out the data is along that line.

Line of best fit drawn representing the direction of the first eigenvector, which is the first PCA component

The main principal component, depicted by the black line, is the first Eigenvector. The second Eigenvector will be perpendicular, or orthogonal, to the first one. The two Eigenvectors are orthogonal to each other so that together they can span the whole x-y plane. Naturally, a line perpendicular to the black line will be our new Y axis, the other principal component.

Another line, perpendicular to the first Eigenvector, is the second PCA component

We are going to rotate our data to fit these new axes. But what will the coordinates of the rotated data be?

To convert the data into the new axes, we will multiply the original X, Y data by the Eigenvectors, which indicate the direction of the new axes (principal components).

But first, we need to deduce the Eigenvectors (there are two — one per axis). Each Eigenvector will correspond to an Eigenvalue, whose magnitude indicates how much of the data’s variability is explained by its Eigenvector.

From the definition of Eigenvalue and Eigenvector:

[Covariance matrix].[Eigenvector] = [Eigenvalue].[Eigenvector]
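
A sketch of this step on the same toy data: np.linalg.eigh is used because a covariance matrix is symmetric, and the Eigen-pairs are sorted from largest to smallest Eigenvalue so that the first Eigenvector is the direction of maximum variance.

    import numpy as np

    # Same toy setup as before: standardized data and its covariance matrix.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    cov_matrix = np.cov(X_std, rowvar=False)

    # Step 3: eigendecomposition of the (symmetric) covariance matrix.
    eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

    # Sort from largest to smallest eigenvalue; the first eigenvector is then
    # the first principal component (the direction of maximum variance).
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]

    # Check the defining relation:
    # [Covariance matrix] . [Eigenvector] = [Eigenvalue] . [Eigenvector]
    print(np.allclose(cov_matrix @ eigenvectors[:, 0],
                      eigenvalues[0] * eigenvectors[:, 0]))  # True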

Step 4 — Reorient the data:

Since the Eigenvectors indicate the direction of the principal components (new axes), we will multiply the original data by the eigenvectors to re-orient our data onto the new axes. This re-oriented data is called a score.
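
A sketch of the projection, again on the same made-up data (the variable names are mine, not the article's):

    import numpy as np

    # Same toy setup: standardized data and eigenvectors sorted by eigenvalue.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X_std, rowvar=False))
    order = np.argsort(eigenvalues)[::-1]
    eigenvectors = eigenvectors[:, order]

    # Step 4: re-orient (project) the data onto the new axes.
    # Each column of 'scores' is one principal component of the data.
    scores = X_std @ eigenvectors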

Step 5 — Plot re-oriented data:

We can now plot the rotated data, or scores.

New axes of the dataset when re-plotted with the PCA components.
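
A minimal plotting sketch (matplotlib, not used in the original article) that scatters the first two scores against each other:

    import numpy as np
    import matplotlib.pyplot as plt

    # Same toy setup: compute the scores as in the Step 4 sketch.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X_std, rowvar=False))
    order = np.argsort(eigenvalues)[::-1]
    eigenvectors = eigenvectors[:, order]
    scores = X_std @ eigenvectors

    # Step 5: plot the first two principal components against each other.
    plt.scatter(scores[:, 0], scores[:, 1], s=10)
    plt.axhline(0, color="grey", linewidth=0.5)
    plt.axvline(0, color="grey", linewidth=0.5)
    plt.xlabel("PC1")
    plt.ylabel("PC2")
    plt.show()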

Step 6 — Bi-plot:

A PCA would not be complete without a bi-plot. This is basically the plot above, except the axes are standardized on the same scale, and arrows are added to depict the original variables, lest we forget.

  • Axes: In this bi-plot, the X prime and Y prime axes are the principal components.
  • Points: These are the X and Y points, re-oriented to the new axes.
  • Arrows: The arrows point in the direction of increasing values for each original variable.
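
To tie the description above together, here is a rough bi-plot sketch on the same toy data. The feature names are made up, and real bi-plots often scale the points and arrows somewhat differently.

    import numpy as np
    import matplotlib.pyplot as plt

    # Same toy setup; the feature names are invented for illustration.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    feature_names = ["height", "weight", "age"]
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X_std, rowvar=False))
    order = np.argsort(eigenvalues)[::-1]
    eigenvectors = eigenvectors[:, order]
    scores = X_std @ eigenvectors

    # Step 6: bi-plot. Points are scores rescaled to a common range; arrows
    # show each original variable's direction in the new axes (its loadings).
    pc1, pc2 = scores[:, 0], scores[:, 1]
    plt.scatter(pc1 / np.abs(pc1).max(), pc2 / np.abs(pc2).max(), s=10)
    for i, name in enumerate(feature_names):
        plt.arrow(0, 0, eigenvectors[i, 0], eigenvectors[i, 1],
                  color="red", head_width=0.03)
        plt.text(eigenvectors[i, 0] * 1.15, eigenvectors[i, 1] * 1.15,
                 name, color="red")
    plt.xlabel("PC1 (X prime)")
    plt.ylabel("PC2 (Y prime)")
    plt.show()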

But then, why does PCA work?

While PCA is a very technical method relying on in-depth linear algebra algorithms, it’s a relatively intuitive method when you think about it.

  • First, the covariance matrix ZᵀZ is a matrix that contains estimates of how every variable in Z relates to every other variable in Z. Understanding how one variable is associated with another is quite powerful.
  • Second, Eigenvalues and Eigenvectors are important. Eigenvectors represent directions. Think of plotting your data on a multidimensional scatterplot. Then one can think of an individual Eigenvector as a particular “direction” in your scatterplot of data. Eigenvalues represent magnitude, or importance. Bigger Eigenvalues correlate with more important directions.
  • Finally, we make an assumption that more variability in a particular direction correlates with explaining the behavior of the dependent variable. Lots of variability usually indicates signal, whereas little variability usually indicates noise. Thus, more variability in a particular direction is, theoretically, indicative of something important we want to detect. (The setosa.io PCA applet is a great way to play around with data and convince yourself why it makes sense.)

Thus, PCA is a method that brings together:

  1. A measure of how each variable is associated with one another. (Covariance matrix.)
  2. The directions in which our data are dispersed. (Eigenvectors.)
  3. The relative importance of these different directions. (Eigenvalues.)

PCA combines our predictors and allows us to drop the Eigenvectors that are relatively unimportant.
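
As a final sketch on the toy data, each Eigenvalue's share of the total shows how much of the variance its Eigenvector explains, which is the usual basis for deciding how many components to keep:

    import numpy as np

    # Same toy setup as in the earlier sketches.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X_std, rowvar=False))
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

    # Fraction of the total variance explained by each principal component.
    explained_variance_ratio = eigenvalues / eigenvalues.sum()
    print(explained_variance_ratio)

    # Keep only the first k components and drop the rest.
    k = 2
    reduced = X_std @ eigenvectors[:, :k]   # shape (100, 2)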

Hope it helps. Please let me know if you have any questions or feedback.

Tangential Resources

Essence of Linear Algebra YouTube Series (including one video on Eigenvectors and Eigenvalues that is especially relevant to PCA).

Principal Component Analysis, Second Edition, by I.T. Jolliffe.

Principal Component Analysis, by Mark Richardson, May 2009.
