
The power of Eigenvectors and Eigenvalues in dimensionality reduction techniques such as PCA

Pranavi Duvva · Published in WiCDS · Feb 2, 2021

A no-code deep dive into the core concepts of Eigenvectors, Eigenvalues, and Principal Component Analysis

Here goes my first article! In it, I will give an overview of Eigenvectors and how they play a powerful role in feature extraction to achieve dimensionality reduction of the data.

I will then explain the complete working principle of the most commonly used unsupervised dimensionality reduction technique, Principal Component Analysis (PCA).

What is the curse of dimensionality?

With the unprecedented growth of data collection and the Big Data revolution, the datasets we deal with can have extremely high dimensions. The number of dimensions can run into the thousands or more, which makes it challenging to visualize and analyze each of the features.

Having such an enormous number of features can often lead to problems like overfitting. It also makes it harder to build regression models, as the weights tend to be highly unstable due to the multicollinearity that arises in high dimensions. This problem is termed the ‘curse of dimensionality’, where model error tends to increase as the number of features grows.

This is where dimensionality reduction techniques such as PCA or LDA come into play. They help transform a high-dimensional dataset into a lower-dimensional one without losing the essence of the data.

It looks like a magic wand transforming the data, right?

Before we dive deeper into the magic…

Let’s get our basics right about the magic wand: the Eigenvectors.

All measurable quantities fall into two categories: scalars and vectors.

Source: Image by Author

Any quantity which has only magnitude and no direction is referred to as a scalar. Scalar quantities are those which can be represented just by a single number, such as volume, length, speed, temperature, etc.

For example, consider the measurement of temperature: the value obtained tells us the magnitude, and it has no direction. The speed at which a person is driving a car is essentially a number that gives the magnitude but no sense of direction. The same goes for time. Hence these are referred to as scalars.

Conversely, quantities which have both magnitude and direction are called vectors.

A car moving north at a speed of 80 km/hr is described by a vector, since it has both magnitude and direction.

Data Sets and Vectors

Our data is essentially a matrix containing rows and columns (features). The number of dimensions of the dataset corresponds to the number of features.

Each value taken individually is a scalar, while an entire row, holding the values of every feature for one observation, forms a vector.

Similarly, each column can be considered a vector.

Since the data can be represented in the form of vectors, all the transformations that apply to vectors also hold for the data set.

Eigenvectors and Eigenvalues

Eigenvectors are those characteristic vectors that do not change their direction when a linear transformation is applied to them; they only change by a scalar factor.

The corresponding scalar by which the Eigenvector is stretched or compressed under the linear transformation is called the Eigenvalue, denoted by λ.

For better understanding let’s identify the Eigenvectors with the help of the following image,

Mona Lisa eigenvector grid, Source: Creative Commons by Wikimedia

In the above image, which has undergone a shear mapping, we observe that the red arrow changes direction, but the blue arrow does not. Hence, the blue arrow is an eigenvector of this shear mapping.

One of the key concepts of Eigenvectors is the following:

The linear transformation of a non-zero vector x by an n×n matrix A is equal to the vector x multiplied by a scalar. This vector x is nothing but the Eigenvector, and the scalar is nothing but the Eigenvalue.

Eigenvalue equation, Source: Creative Commons by Wikimedia

This means the action of the matrix on such a vector reduces to simple scaling. For an n×n matrix, we can expect at most n linearly independent Eigenvectors.

This concept becomes extremely powerful when dealing with high-dimensional data.

In simple words, when you transform a vector by a matrix and the transformed vector is just a scaled version of the original one, with no change in direction, then this vector is called an Eigenvector.

Eigenvectors and Eigenvalues play a crucial role in representing a matrix in a simplified form.

Mathematical expression to calculate Eigenvalues and Eigenvectors,

Let’s now take the help of Wikipedia for the formula

Consider a square matrix A and a non-zero vector v:

Av= λv

For the equation to hold good and λ to be an eigenvalue, the vector v has to be non-zero.

To calculate the eigenvalue, we rewrite the right-hand side using the identity matrix I, since λv = λIv.

Quick note: Identity matrix is a matrix where all the diagonal elements are 1 and all the other elements are zero.

Av= λIv

Which can be written as,

Av- λIv=0

( A- λI)v=0

When the vector v is non-zero, the matrix (A − λI) must be singular, so we can calculate λ with the help of the determinant:

|A − λI| = 0

Solving this determinant gives the characteristic polynomial (a quadratic expression for a 2×2 matrix), whose roots are the Eigenvalues.

With the help of the Eigenvalues, we can now calculate the Eigenvectors by solving, for each λ, the expression

(A − λI)v = 0.
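For instance, take the 2×2 matrix A with first row (4, 1) and second row (2, 3), chosen purely for illustration. Then

|A − λI| = (4 − λ)(3 − λ) − (1)(2) = λ² − 7λ + 10 = (λ − 5)(λ − 2) = 0

so the Eigenvalues are λ = 5 and λ = 2. Substituting each back into (A − λI)v = 0 gives the Eigenvectors (1, 1) for λ = 5 and (1, −2) for λ = 2; you can check that Av = 5v and Av = 2v respectively.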

Now that we have understood how to calculate the Eigenvalues and Eigenvectors, let’s compute them for a 2×2 and a 3×3 matrix using Python.

To do so, we first import NumPy and its linear algebra module, numpy.linalg.
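The original code listing is not reproduced here, but a minimal sketch using numpy.linalg.eig (the example matrices are illustrative, the first one being the matrix worked out by hand above) might look like this:

```python
import numpy as np

# 2x2 and 3x3 example matrices (values chosen only for illustration)
A2 = np.array([[4.0, 1.0],
               [2.0, 3.0]])
A3 = np.array([[2.0, 0.0, 0.0],
               [1.0, 3.0, 0.0],
               [4.0, 5.0, 6.0]])

for A in (A2, A3):
    # np.linalg.eig returns the eigenvalues and a matrix whose
    # columns are the corresponding eigenvectors (scaled to unit length)
    values, vectors = np.linalg.eig(A)
    print("Matrix:\n", A)
    print("Eigenvalues:", values)
    print("Eigenvectors (as columns):\n", vectors)
    # Sanity check: A v = lambda v for every eigenpair
    for lam, v in zip(values, vectors.T):
        assert np.allclose(A @ v, lam * v)
```

For the 2×2 matrix, this prints the Eigenvalues 5 and 2, matching the hand calculation; the Eigenvectors appear as unit-length columns, so (1, 1) shows up as roughly (0.71, 0.71).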

With the knowledge of the magic wand in hand, let’s dive deeper.

Principal Component Analysis

Introduction

Principal Component Analysis is an unsupervised dimension reduction technique that focuses on capturing maximum variation of the data.

Quick note: Unsupervised learning algorithms are trained without a target variable, while supervised learning algorithms are trained on a data set that has a target or labeled column.

How does PCA achieve dimension reduction?

1. Principal Component Analysis reduces high-dimensional data to a lower-dimensional subspace by creating a new set of components that carry most of the essential information of all the features.

2. These new components are linear combinations of all the features, and the components thus formed are nothing but eigenvectors, which are now called the Principal Components (as sketched in the code below).

3. The eigenvalue corresponding to each of these eigenvectors tells us how much of the variation in the data has been captured by that particular Principal Component.

4. Principal Components are orthogonal to each other and therefore uncorrelated; each successive component captures the maximum remaining variance. Because they are uncorrelated, they also solve the problem of multicollinearity.
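To make points 2 and 3 concrete, here is a minimal sketch, with made-up data, of one common way to compute the principal components: as the eigenvectors of the covariance matrix of the standardized data, with the eigenvalues measuring the variance captured by each component.

```python
import numpy as np

# Made-up data: 6 observations, 2 features
X = np.array([[90., 60.],
              [85., 65.],
              [70., 50.],
              [60., 40.],
              [95., 70.],
              [80., 55.]])

# 1. Standardize each feature (zero mean, unit variance)
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Covariance matrix of the standardized data
cov = np.cov(X_std, rowvar=False)

# 3. Eigen-decomposition: the eigenvectors are the principal components,
#    the eigenvalues are the variances captured by each component
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Sort the components by decreasing eigenvalue
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

print("Fraction of variance per component:", eigenvalues / eigenvalues.sum())

# 4. Project the data onto the top component: 2-D -> 1-D
X_reduced = X_std @ eigenvectors[:, :1]
```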

Let us now understand the working of Principal Component Analysis with an example

The working of PCA has been divided into 6 sub-sections.

1. Preprocessing the Data

1. For ease of understanding, consider a data set with 2 features.

Source: Image by Author

2. The data set contains two features, Maths and Physics, with the marks of 6 students.

3. After we scale the data using a standard scaler, we obtain the following values, where the data is centered around the origin.

Source: Image by Author

Quick note:

To bring all the features onto a single scale, we preprocess the data using whichever scaling technique suits the data set best.

After scaling the data set, the data points will now be centered around the origin.

4. Let us now plot the data on a graph.

Data points plotted on the graph,Source: Image by Author

Note: Scaling does not change the relative positions of the points along each feature; it only shifts and rescales the axes.
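The actual marks behind the figures are not reproduced here; a sketch of this preprocessing step with hypothetical marks, using scikit-learn’s StandardScaler, could look like this:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical marks of 6 students in Maths and Physics
marks = np.array([[90., 60.],
                  [85., 65.],
                  [70., 50.],
                  [60., 40.],
                  [95., 70.],
                  [80., 55.]])

# Standardize: each feature gets zero mean and unit variance,
# so the points become centered around the origin
scaled = StandardScaler().fit_transform(marks)

print(scaled.mean(axis=0))  # approximately [0, 0]
print(scaled.std(axis=0))   # approximately [1, 1]
```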

2. Identifying the Principal Components

To identify the principal components, PCA finds the best-fit line that explains most of the variation in the data.

Step 1:

It starts with a random line through the origin and keeps rotating it until the line best fits the data.

Source: Image by Author

Step 2:

Ultimately, it finds the best-fit line, which explains the maximum variation.

Source: Image by Author

3. So, how does PCA arrive at the best-fit line?

To understand this, let’s go back to the first random line.

Working:

1. PCA projects the data points onto the line and calculates a distance for each point.

2. The distance can be measured either as the distance between the point and the line, or as the distance from the origin to the projected point.

3. Based on these distances, it identifies the best-fit line.

4. The best-fit line is the one with the minimum distances from the points to the line or, equivalently, the maximum distances from the origin to the projected points (a small numeric sketch follows after the figure).

Source: Image by Author
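Here is a small numeric sketch, with a made-up point and a made-up candidate line through the origin, of the two distances described above:

```python
import numpy as np

point = np.array([4.0, 3.0])       # a made-up data point
direction = np.array([1.0, 0.5])   # a made-up candidate line through the origin
u = direction / np.linalg.norm(direction)  # unit vector along the line

proj_length = point @ u            # distance from the origin to the projected point
projected = proj_length * u        # the projected point itself
dist_to_line = np.linalg.norm(point - projected)  # distance from the point to the line

# The fixed distance from the origin to the point relates the two:
# ||point||^2 = proj_length^2 + dist_to_line^2 (Pythagoras, used just below)
assert np.isclose(point @ point, proj_length**2 + dist_to_line**2)
```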

Mathematical calculation:

  1. Let’s now consider a single point on the graph
Source: Image by Author

2. The distance a from the origin to this point is fixed and cannot be changed. We also observe a right angle between sides b and c.

From the Pythagorean theorem, we know that

c²+b²=a²

Since a is fixed, b and c are inversely related.

That is, if b (the distance between the point and the random line) gets bigger, c (the distance from the origin to the projected point) gets smaller.

So, by maximizing c we minimize the distance between the random line and the point.

PCA therefore finds the best-fit line by maximizing the distances from the origin to the projected points.

3. Let the distances from the origin to each of the projected points be d1, d2, d3, d4, d5, d6.

These distances are squared so that values on opposite sides of the origin do not cancel out.

4. PCA calculates the sum of squared distances from the origin to all the points projected onto this random line.

5. It keeps rotating the line until it finds the maximum sum of squared distances.

This line is called the best-fit line, and it is the first Principal Component.

Hurray! We identified our first principal component
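To see numerically that the direction maximizing the sum of squared projection distances matches the leading eigenvector of the data’s covariance matrix, one can scan a grid of candidate lines over made-up data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up 2-D data with more spread along the x-axis, centered at the origin
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [0.0, 1.0]])
X = X - X.mean(axis=0)

def sum_squared_projections(X, angle):
    """Sum of squared distances from the origin to the projections
    of all points onto the line through the origin at this angle."""
    u = np.array([np.cos(angle), np.sin(angle)])
    return np.sum((X @ u) ** 2)

# Rotate a candidate line through the origin and keep the best one
angles = np.linspace(0.0, np.pi, 360)
best_angle = max(angles, key=lambda a: sum_squared_projections(X, a))
best_direction = np.array([np.cos(best_angle), np.sin(best_angle)])

# Compare with the eigenvector of the covariance matrix that has
# the largest eigenvalue (np.linalg.eigh sorts eigenvalues ascending)
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X, rowvar=False))
pc1 = eigenvectors[:, -1]

# The two directions describe the same line (up to sign and grid coarseness)
print(best_direction, pc1)
```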

4. Finding the Singular Vector for Principal Component 1

1. Let the slope of the first principal component be 0.25, which is 1/4 (slope = y/x). This means that for every 4 units along x, y changes by 1 unit, so the data along principal component 1 is mostly spread out on the X-axis.

Blue arrow refers to the unit vector of PC1, Source: Image by Author

From the Pythagorean theorem we have

c² + b² = a²

(1 × 1) + (4 × 4) = a²

a = √17 ≈ 4.12 (the length of the blue line)

To scale a down to unit length, we divide all the sides by 4.12.

Therefore we get

a = 1,

b = 4/4.12 = 0.97,

c = 1/4.12 = 0.24.

Now we have achieved our singular vector.

Eigenvector of PC1, Source: Image by Author

2. This unit-length vector, consisting of 0.97 parts of Maths and 0.24 parts of Physics, is called the singular vector or the eigenvector of Principal Component 1.

3. These proportions are called the loading scores.

4. The sum of squared distances from the origin to the projected points is the Eigenvalue of PC1 (a quick numeric check of the loadings follows below).
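As a quick check of the normalization, dividing the vector (4, 1), which points along the slope-1/4 line, by its length reproduces the loading scores:

```python
import numpy as np

pc1 = np.array([4.0, 1.0])            # 4 units of Maths for every 1 unit of Physics
unit_pc1 = pc1 / np.linalg.norm(pc1)  # divide by sqrt(4^2 + 1^2) ~= 4.12

print(np.linalg.norm(pc1))  # ~4.12
print(unit_pc1)             # ~[0.97, 0.24] -> the loading scores of PC1
```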

5. Identifying the Principal Component 2

1. Principal Component 2 is simply the line through the origin that is orthogonal to Principal Component 1, with no further optimization required.

Since this is a 2-D graph, we can easily visualize PC2.

Source: Image by Author

2. Since the slope of PC1 was 0.25 (that is, 1/4), the slope of PC2, being perpendicular to PC1, is the negative reciprocal, -4.

3. When we scale this to a unit vector, PC2 contains -0.24 parts of Maths and 0.97 parts of Physics; this is the singular vector or the eigenvector of Principal Component 2.

4. The values -0.24 and 0.97 are the loading scores of PC2.

5. The Eigenvalue of PC2 is the sum of squared distances from the origin to the points projected onto PC2 (a quick orthogonality check follows below).
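A quick check that the two singular vectors are indeed orthogonal unit vectors:

```python
import numpy as np

pc1 = np.array([0.97, 0.24])   # loading scores of PC1 (rounded)
pc2 = np.array([-0.24, 0.97])  # loading scores of PC2 (rounded)

print(pc1 @ pc2)             # 0: the components are orthogonal (hence uncorrelated)
print(np.linalg.norm(pc1))   # ~1
print(np.linalg.norm(pc2))   # ~1
```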

6. Calculating the variation captured by the Principal Components

1. Variation for PC1 = (sum of squared distances, i.e. the Eigenvalue of PC1) / (n − 1)

2. Variation for PC2 = (sum of squared distances, i.e. the Eigenvalue of PC2) / (n − 1)

where n is the number of data points.

For Example,

Consider the variation captured by PC1 is 20 and the variation captured by PC2 is 3

Total Variation=20+3=23

Percentage of variation captured by PC1=20/23=0.869=86.9%

Percentage of variation captured by PC2=3/23=0.130=13%

Principal Component 1 has captured a maximum variation of 86.9%.

With the help of just the first principal component, we were able to describe 86.9% of the variation in the data for the above dataset.

Principal Component Analysis extends the same principle to any number of dimensions. The number of principal components we can obtain for a particular dataset is at most the number of features.

In order to identify the optimum number of Principal Components, we can use the scree plot or check the variance explained by each of the principal components.
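Both checks can be sketched with scikit-learn: PCA’s explained_variance_ratio_ attribute gives the fraction of variation captured by each component, and plotting it gives the scree plot (the data below is random and purely illustrative, so the resulting plot will not show a dramatic elbow):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Hypothetical data set with 5 features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

X_std = StandardScaler().fit_transform(X)

pca = PCA()   # keep all components so the full scree plot can be drawn
pca.fit(X_std)

print(pca.explained_variance_ratio_)  # fraction of variance per component

# Scree plot: variance explained by each principal component
components = range(1, len(pca.explained_variance_ratio_) + 1)
plt.plot(components, pca.explained_variance_ratio_, marker="o")
plt.xlabel("Principal component")
plt.ylabel("Proportion of variance explained")
plt.title("Scree plot")
plt.show()
```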

Conclusion

This article explained what dimensionality reduction is and the problems associated with high-dimensional data. We covered the core concepts of Eigenvalues and Eigenvectors and the critical role they play in dimensionality reduction. We then walked through the complete working of Principal Component Analysis, an unsupervised dimensionality reduction technique, with an example.

If you still find the concept of PCA difficult to grasp, I recommend watching the StatQuest video on PCA (linked in the references below).

I hope you found this article useful.

Thanks for reading!

References

1. https://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors

2. https://youtu.be/FgakZw6K1QQ
