Understand Principal Component Analysis And Implement It From Scratch

Published in The Startup

4 min read · Aug 25, 2020

Principal component analysis (PCA) is a technique used for dimensionality reduction. It’s widely used for data visualization. It extracts information from a dataset with n features (i.e. n dimensions) and represents that dataset in a lower-dimensional space (m dimensions, with m < n). In short, PCA lets us reduce the number of dimensions for better visualization while preserving as much information as possible.
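To make the idea concrete, here is a minimal NumPy sketch of the usual eigendecomposition approach. It centers the data, computes the feature covariance matrix, keeps the eigenvectors with the largest eigenvalues, and projects the data onto them. The function name `pca` and the synthetic data are illustrative assumptions, not the author's implementation.

```python
import numpy as np

def pca(X, m):
    """Project X (n_samples x n_features) onto its first m principal components.

    Hypothetical helper for illustration: returns the projected data and the
    fraction of total variance retained by each kept component.
    """
    # Center each feature so the covariance is measured around the mean
    X_centered = X - X.mean(axis=0)
    # Covariance matrix of the features (n_features x n_features)
    cov = np.cov(X_centered, rowvar=False)
    # eigh suits symmetric matrices; it returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Reorder so the directions of largest variance come first
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Keep the top-m eigenvectors (principal axes) and project onto them
    components = eigvecs[:, :m]
    projected = X_centered @ components
    # Share of the total variance captured by each retained component
    explained_ratio = eigvals[:m] / eigvals.sum()
    return projected, explained_ratio

# Synthetic data: 5 features, where feature 1 is almost a multiple of feature 0
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 1] = 3 * X[:, 0] + 0.1 * rng.normal(size=200)

Z, ratio = pca(X, 2)
print(Z.shape, ratio)
```

The explained-variance ratios give one way to measure how much information is preserved. Each ratio is the share of the total variance that a retained component keeps. Here the first component absorbs most of the variance shared by the two correlated features.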

Many people find this technique hard to understand and interpret. For this reason, I’ve decided to break it down with a practical implementation through the following steps:

Dataset description:

For this task we chose the classic breast cancer dataset. Its simplicity makes it easy to show how PCA helps with visualization. The dataset has 30 features describing characteristics of the cell nuclei present in each image. Each sample also has a label classifying the diagnosed tumor as malignant (0) or benign (1).
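Here is a minimal sketch for inspecting the data and its label encoding. It assumes the article uses the copy of the dataset bundled with scikit-learn (`sklearn.datasets.load_breast_cancer`), so scikit-learn must be installed.

```python
from sklearn.datasets import load_breast_cancer

# Assumes scikit-learn is installed; this copy of the dataset ships with the library
data = load_breast_cancer()
X, y = data.data, data.target

print(X.shape)                  # (n_samples, n_features)
print(list(data.target_names))  # label 0 is 'malignant', label 1 is 'benign'
print(data.feature_names[:3])   # first few nucleus characteristics
```

The feature scales differ widely. For instance, `mean area` is in the hundreds while `mean smoothness` is around 0.1. For this reason the features are commonly standardized before PCA is applied.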

Samples labeled 0 (malignant) make up about 37% of the dataset, while samples labeled 1 (benign) make up more than 62%.
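The class balance is each label's count divided by the total number of samples. The small NumPy helper below computes it. The name `class_proportions` and the toy labels are hypothetical, used only for illustration.

```python
import numpy as np

def class_proportions(labels):
    """Return the fraction of samples belonging to each non-negative integer label."""
    counts = np.bincount(labels)  # counts[k] = number of samples with label k
    return counts / counts.sum()

# Toy labels: three samples of class 0 and five of class 1
toy = np.array([0, 1, 1, 0, 1, 1, 0, 1])
print(class_proportions(toy))  # [0.375 0.625]
```

Applied to the scikit-learn copy of the dataset's labels (`load_breast_cancer().target`), the helper returns roughly 0.373 and 0.627. That copy contains 212 malignant and 357 benign samples out of 569.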


Written by Omar Boufeloussen