The Startup

Get smarter at building your thing. Follow to join The Startup’s +8 million monthly readers & +772K followers.

Principal Component Analysis From Scratch in Python

5 min read · May 12, 2020

Photo by Markus Spiske on Unsplash

Principal Component Analysis (PCA), although invented more than a century ago, has proven itself to be one of the most important and widely used algorithms in modern data science. Its applications span the visualization of high-dimensional data, unsupervised learning and dimensionality reduction, and this broad appeal has made it a mainstay of numerical computing and AI software libraries alike. While these implementations are ubiquitous, free and efficient, their ease of use means that the intuition behind how the algorithm works is often skipped over in favour of instant gratification. In this article, I aim to revisit that intuition with a simple yet informative implementation in Python and provide an example of how it can be employed in a data science pipeline.

What is it?

PCA is an orthogonal linear transformation of the data into a set of uncorrelated variables, the principal components, ordered so that the first component explains the most variance in the data and each subsequent component explains less.
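
To make the definition concrete, here is a minimal from-scratch sketch in NumPy. It assumes the data sit in an array `X` of shape `(n_samples, n_features)` and follows the classic covariance route: centre each feature, form the sample covariance matrix, eigendecompose it, and project the centred data onto the eigenvectors with the largest eigenvalues. The function name `pca` and its return values are illustrative choices, not necessarily those of the author's own implementation.

```python
import numpy as np


def pca(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components.

    Illustrative sketch: centre, eigendecompose the covariance matrix,
    keep the eigenvectors with the largest eigenvalues.
    """
    X = np.asarray(X, dtype=float)
    # Centre each feature so variance is measured about the mean.
    X_centred = X - X.mean(axis=0)
    # Sample covariance matrix of the features (n_features x n_features).
    cov = np.cov(X_centred, rowvar=False)
    # eigh is meant for symmetric matrices; it returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # Reorder so the first component explains the most variance.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # Columns of `components` are orthonormal principal axes.
    components = eigenvectors[:, :n_components]
    # Orthogonal projection of the centred data onto the retained axes.
    scores = X_centred @ components
    # Share of total variance captured by each retained component.
    explained_variance_ratio = eigenvalues[:n_components] / eigenvalues.sum()
    return scores, components, explained_variance_ratio


# Usage on hypothetical random data.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
scores, components, ratio = pca(X_demo, n_components=2)
print(scores.shape, ratio)
```

Because the eigenvectors of a symmetric covariance matrix are orthonormal, the projection is an orthogonal transformation, and the covariance of the returned scores is diagonal: the components are uncorrelated, with variances equal to the retained eigenvalues in descending order.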

PCA is particularly useful for processing data in which features exhibit multicollinearity, or in which the feature space is high-dimensional.
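
The multicollinearity case can be illustrated with a small synthetic example; the data below are made up purely for illustration. Two features are near-copies of each other and a third is independent. This sketch computes PCA through a singular value decomposition of the centred data, an equivalent alternative to eigendecomposing the covariance matrix, and checks how the variance is distributed across components.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
# Hypothetical data: x2 is almost a copy of x1 (strong multicollinearity),
# x3 is independent of both.
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

# Correlation between the two collinear features is close to 1.
print("corr(x1, x2):", np.corrcoef(X, rowvar=False)[0, 1])

# PCA via SVD of the centred data: rows of Vt are the principal axes,
# and squared singular values are proportional to the variance explained.
X_centred = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centred, full_matrices=False)
explained_variance_ratio = S**2 / np.sum(S**2)
scores = X_centred @ Vt.T

print("explained variance ratio:", explained_variance_ratio)
# Off-diagonal correlations between component scores are ~0.
print(np.round(np.corrcoef(scores, rowvar=False), 6))
```

Here the last component carries almost no variance, so dropping it loses very little information; that is the sense in which PCA compresses redundant, collinear features into fewer uncorrelated ones.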

Published in The Startup

Written by Mikhail Mew

Researcher | Investor | Data Scientist | Curious Observer. Thoughts and insights from the confluence of investing and machine learning.
