Understanding Principal Component Analysis

Nir Arbel
Published in Analytics Vidhya · Oct 31, 2019

Principal Component Analysis (PCA) is widely used in machine learning and data science. PCA finds a representation of the model’s data in a lower-dimensional space without losing a large amount of the information. This data compression process can be used for data visualisation and analysis, and for speeding up machine learning algorithms and pipelines.

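As a quick first look at this compression step, here is a minimal sketch using scikit-learn's PCA (the random data and the choice of 2 components are assumptions made just for the example; note that scikit-learn expects rows to be samples):

```python
import numpy as np
from sklearn.decomposition import PCA

# 200 samples with 10 features each (rows are samples, as scikit-learn expects)
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 10))

# compress the 10-dimensional data down to 2 dimensions
pca = PCA(n_components=2)
reduced = pca.fit_transform(data)

print(reduced.shape)                   # (200, 2)
print(pca.explained_variance_ratio_)   # share of the variance kept by each component
```

The rest of the article builds this transformation from scratch.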
Let’s understand how PCA works. We have our model’s data:

$$x_1, x_2, \ldots, x_k$$

where

$$x_i \in \mathbb{R}^l, \quad i = 1, 2, \ldots, k$$

The model has k data points, each one an l-dimensional vector, and we want to find a representation of the data in a lower-dimensional space of dimension m < l.

We start by normalising the data across the l dimensions. For every dimension j we compute the mean across the data points:

$$\mu_j = \frac{1}{k} \sum_{i=1}^{k} x_{i,j}$$

And for every dimension j and data point i, instead of taking:

$$x_{i,j}$$

we take the normalised value:

$$x_{i,j} - \mu_j$$

For simplicity we continue using the original notation $x_{i,j}$ for the normalised data points.

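In code, this normalisation is just a per-dimension mean subtraction. A minimal NumPy sketch, keeping the article's convention that columns are data points and rows are dimensions (the random data is an assumption for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
l, k = 5, 100                 # l dimensions, k data points
X = rng.normal(size=(l, k))   # columns are data points, rows are dimensions

# mean of every dimension j, computed across the k data points
mu = X.mean(axis=1, keepdims=True)   # shape (l, 1)

# normalised (centred) data: subtract each dimension's mean
X_centred = X - mu
print(np.allclose(X_centred.mean(axis=1), 0))   # True
```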
We define the data matrix X, whose columns are our data points and whose rows hold the l dimensions of the data:

$$X = \begin{pmatrix} x_1 & x_2 & \cdots & x_k \end{pmatrix} \in \mathbb{R}^{l \times k}$$

For every pair of dimensions $i, j = 1, 2, \ldots, l$ we compute the covariance:

$$\text{cov}(i, j) = \frac{1}{k} \sum_{n=1}^{k} (x_{n,i} - \mu_i)(x_{n,j} - \mu_j)$$

and define the data covariance matrix COV(X):

$$\text{COV}(X) = \begin{pmatrix} \text{cov}(1,1) & \text{cov}(1,2) & \cdots & \text{cov}(1,l) \\ \text{cov}(2,1) & \text{cov}(2,2) & \cdots & \text{cov}(2,l) \\ \vdots & \vdots & \ddots & \vdots \\ \text{cov}(l,1) & \text{cov}(l,2) & \cdots & \text{cov}(l,l) \end{pmatrix}$$

For example, entry (1, 2) is computed by:

$$\text{cov}(1, 2) = \frac{1}{k} \sum_{n=1}^{k} (x_{n,1} - \mu_1)(x_{n,2} - \mu_2)$$

using the mean values of the first and second dimensions:

$$\mu_1 = \frac{1}{k} \sum_{n=1}^{k} x_{n,1}, \quad \mu_2 = \frac{1}{k} \sum_{n=1}^{k} x_{n,2}$$

Notice that COV(X) is symmetric, since for every $i, j = 1, 2, \ldots, l$ we have:

$$\text{cov}(i, j) = \text{cov}(j, i)$$

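Continuing the NumPy sketch above (X_centred is the centred data matrix from the previous snippet), the covariance matrix can be computed in one line; the checks confirm it matches np.cov with the 1/k normalisation and that it is indeed symmetric:

```python
import numpy as np

# X_centred from the previous snippet: shape (l, k), rows are dimensions
cov = (X_centred @ X_centred.T) / X_centred.shape[1]   # entry (i, j) = cov(i, j)

# np.cov treats each row as a variable, matching our layout;
# bias=True uses the 1/k normalisation written above
print(np.allclose(cov, np.cov(X_centred, bias=True)))  # True
print(np.allclose(cov, cov.T))                         # True -- COV(X) is symmetric
```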
Next we compute the eigenvalues:

$$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_l$$

and eigenvectors of the covariance matrix:

$$e_1, e_2, \ldots, e_l$$

where

$$\text{COV}(X)\, e_i = \lambda_i e_i, \quad i = 1, 2, \ldots, l$$

These eigenvectors are called the principal components of the data and represent the directions of significant variance in the data: principal component e1 is in the direction of the largest amount of variance, principal component e2 is in the direction of the second largest, and so on.

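A sketch of this step with NumPy, continuing from the covariance matrix cov above. np.linalg.eigh is the natural choice here because the covariance matrix is symmetric; it returns eigenvalues in ascending order, so we reorder them so that e1 corresponds to the largest eigenvalue:

```python
import numpy as np

# cov is the l x l covariance matrix from the previous snippet
eigenvalues, eigenvectors = np.linalg.eigh(cov)   # ascending eigenvalues; eigenvectors in columns

# reorder so that the first principal component has the largest eigenvalue
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]   # column i is principal component e_(i+1)

print(eigenvalues)   # variance along each principal component, largest first
```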
For example, say our data is two-dimensional. Then the principal components e1, e2 are two orthogonal directions: e1 captures the direction of the largest amount of variance in the data and e2 captures the direction of the second largest.

The eigenvalues of the covariance matrix also hold valuable information about the data: they represent the amount of variance in the data along the directions of the principal components.

Picture a pair of vectors v1, v2 in the directions of principal components e1, e2 respectively: the spread of the data is captured by the magnitudes of v1 and v2.

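To make this concrete, the self-contained sketch below generates correlated two-dimensional data (the particular distribution is an arbitrary choice for illustration), computes e1 and e2, and checks that each eigenvalue equals the variance of the data along the corresponding principal component:

```python
import numpy as np

rng = np.random.default_rng(0)

# correlated 2-dimensional data; columns are data points
k = 1000
X = np.array([[2.0, 0.0], [1.0, 0.5]]) @ rng.normal(size=(2, k))
X = X - X.mean(axis=1, keepdims=True)           # normalise

cov = (X @ X.T) / k
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

e1, e2 = eigenvectors[:, 0], eigenvectors[:, 1]
print(e1, eigenvalues[0])   # direction and amount of the largest variance
print(e2, eigenvalues[1])   # direction and amount of the second largest

# the eigenvalue equals the variance of the data projected onto the component
print(np.allclose(eigenvalues[0], np.var(e1 @ X)))   # True
```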
Now that we have the covariance matrix’s eigenvalues and eigenvectors, we can choose a lower dimension m < l and define the eigenvectors matrix E. The rows of E are the m eigenvectors capturing the largest amount of variance in the data; they are chosen by taking the eigenvectors corresponding to the m largest eigenvalues.

We multiply X by E on the left and project our data from the l-dimensional space to the lower m-dimensional space spanned by the m principal components.

For ease of notation we write

$$EX = \begin{pmatrix} e_1 \cdot x_1 & e_1 \cdot x_2 & \cdots & e_1 \cdot x_k \\ e_2 \cdot x_1 & e_2 \cdot x_2 & \cdots & e_2 \cdot x_k \\ \vdots & \vdots & & \vdots \\ e_m \cdot x_1 & e_m \cdot x_2 & \cdots & e_m \cdot x_k \end{pmatrix} \qquad (1)$$

and by staying true to our original notation we can write EX as

$$EX = \begin{pmatrix} \hat{x}_{1,1} & \hat{x}_{2,1} & \cdots & \hat{x}_{k,1} \\ \vdots & \vdots & & \vdots \\ \hat{x}_{1,m} & \hat{x}_{2,m} & \cdots & \hat{x}_{k,m} \end{pmatrix} = \begin{pmatrix} \hat{x}_1 & \hat{x}_2 & \cdots & \hat{x}_k \end{pmatrix} \qquad (2)$$

And our m-dimensional data set is

$$\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_k$$

where

$$\hat{x}_i \in \mathbb{R}^m, \quad i = 1, 2, \ldots, k$$

and we’re done!

Notice how for every i = 1, 2, …, k, column i of the matrix EX in (1) contains the projections of data point xi onto the directions of the m principal components. Our notation for the matrix EX in (2) captures this: the k m-dimensional data points are the columns, and the m principal components are represented in the rows.

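Putting every step together, here is a sketch of the whole procedure as one function (the function name and the demo shapes are my own choices for the example), staying with the columns-are-data-points convention:

```python
import numpy as np

def pca_project(X, m):
    """Project the l-dimensional data points (columns of X) onto the top m principal components."""
    # 1. normalise: subtract each dimension's mean
    X = X - X.mean(axis=1, keepdims=True)

    # 2. covariance matrix of the l dimensions
    cov = (X @ X.T) / X.shape[1]

    # 3. eigenvalues and eigenvectors, ordered by decreasing eigenvalue
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]
    eigenvectors = eigenvectors[:, order]

    # 4. E holds the top m eigenvectors as its rows; EX projects the data
    E = eigenvectors[:, :m].T    # shape (m, l)
    return E @ X                 # shape (m, k): columns are the projected data points

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 500))   # l = 10 dimensions, k = 500 data points
X_hat = pca_project(X, m=2)
print(X_hat.shape)               # (2, 500)
```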