VC: PCA (Principal Components Analysis)

Jeheonpark · Published in The Startup
4 min read · Sep 4, 2020

In a previous post, I explained how to visualize high-dimensional data in a 2D graph; that was not about dimensionality reduction. This post and the following ones will explain how to embed high-dimensional data into a low-dimensional space, covering PCA, Kernel PCA, MDS, ISOMAP, and t-SNE. This post will show you all about PCA and Kernel PCA.

PCA

Image from Wikipedia

Basically, PCA tries to find the best basis: a set of mutually orthogonal directions that best represent the data in terms of its variance. The first basis vector is chosen so that the distance from the original data points to their projections onto it is minimized, and each subsequent basis vector ignores the variance already captured by the previous ones; this is what makes the basis orthogonal. That is the basic intuition of PCA.
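This intuition can be sketched with numpy on a small synthetic dataset (the data and variable names here are hypothetical, not from the original post): the principal directions come from the eigendecomposition of the covariance matrix, they are mutually orthogonal, and they are ordered by how much variance each captures.

```python
import numpy as np

# Hypothetical example: 200 correlated 2-D points.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 1.0], [1.0, 0.5]])

# Center the data, then eigendecompose the covariance matrix.
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (len(Xc) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending order

# Sort descending: the first basis vector captures the most variance.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# The basis vectors are orthogonal (and unit length), so this is the identity.
print(np.round(eigvecs.T @ eigvecs, 6))
```

Note that the second eigenvector is orthogonal to the first by construction, which matches the idea that each component only explains variance the previous components left over.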

Goals of Principal Component Analysis

First of all, we need to standardize the original data, because otherwise the absolute scale of each feature affects the result of PCA. In the language of linear algebra, changing the basis is just a linear mapping. The matrix U in the equation is what we are trying to find in order to change the basis, and the condition on U is that it minimizes the approximation error. Equivalently, the variance of the projected data is maximal, because projecting x onto this basis preserves as much variance as possible. However, PCA has a limitation: it only considers linear relationships. I will elaborate on these steps below.
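The steps above can be sketched as follows; this is a minimal illustration under assumed synthetic data, using the SVD of the standardized data to obtain the basis U:

```python
import numpy as np

# Hypothetical data with very different feature scales.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3)) * np.array([10.0, 1.0, 0.1])

# Step 1: standardize so absolute feature scale does not dominate.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: obtain the basis U from the SVD of the standardized data;
# the columns of U are the principal directions.
_, S, Vt = np.linalg.svd(Z, full_matrices=False)
U = Vt.T

# Step 3: project onto the first k directions. This choice of U minimizes
# the approximation error and, equivalently, keeps maximal variance.
k = 2
Z_proj = Z @ U[:, :k]

# Fraction of total variance retained by the first k components.
explained = (S[:k] ** 2).sum() / (S ** 2).sum()
print(f"variance retained: {explained:.2f}")
```

Because the projection is a linear map, this captures only linear structure in the data, which is the limitation that Kernel PCA addresses later in the post.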

Jeheon Park, Software Engineer at Kakao in South Korea