A High-Level Introduction to Dimensionality Reduction and Principal Component Analysis (PCA)

Sean Gahagan
2 min read · Oct 21, 2022


My last note shared how to use ceiling analysis to prioritize which parts of a machine learning system are most important to work on. This note gives a quick, high-level overview of two related topics: dimensionality reduction and principal component analysis (PCA).

Dimensionality Reduction

Dimensionality reduction is a way of compressing data and making it easier to visualize by “flattening” it into fewer dimensions. For example, if you have a 3D cloud of data points, you could project it onto a 2D plane. In a more extreme case, you could take a 50-dimensional data set and reduce it to 3D, both to help you visualize it and to let an algorithm use it more efficiently.
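
To make that concrete, here is a minimal sketch (using NumPy, which isn’t part of the original note) of the simplest possible flattening: dropping a coordinate to project a 3D cloud onto a 2D plane. PCA, covered next, chooses the projection more intelligently.

```python
import numpy as np

# A small 3D "cloud" of points (rows are examples, columns are dimensions).
rng = np.random.default_rng(0)
points_3d = rng.normal(size=(100, 3))

# Project onto the x-y plane by simply dropping the third coordinate.
# This is the crudest possible "flattening"; PCA instead picks the
# plane that best fits the data.
points_2d = points_3d[:, :2]

print(points_3d.shape)  # (100, 3)
print(points_2d.shape)  # (100, 2)
```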

Principal Component Analysis (PCA)

PCA is a specific type of dimensionality reduction in which you first apply feature scaling and mean normalization, then solve for the “best fit” lower-dimensional object (e.g., a line or plane) to project the features onto, chosen so that the projection error is as small as possible. With PCA, you can also approximately reconstruct the original data from the compressed representation.
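
As an illustration of those steps, here is a rough sketch using scikit-learn (my choice of library, not something from the original note): scale the features, project them onto a best-fit 2D plane, and then reconstruct an approximation of the original data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Toy data: 200 examples with 5 correlated features.
rng = np.random.default_rng(1)
base = rng.normal(size=(200, 2))
X = base @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(200, 5))

# 1) Feature scaling and normalization (zero mean, unit variance).
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# 2) Solve for the best-fit 2D plane and project the features onto it.
pca = PCA(n_components=2)
Z = pca.fit_transform(X_scaled)   # the compressed representation

# 3) Reconstruct an approximation of the original data from Z.
X_approx = scaler.inverse_transform(pca.inverse_transform(Z))

print(Z.shape)         # (200, 2)
print(X_approx.shape)  # (200, 5)
```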

PCA can be used to speed up supervised learning by producing a smaller, cheaper-to-train-on version of the training set, and it also helps you visualize multi-dimensional data sets as a 2D, 3D, or maybe even 4D plot. A sketch of both uses follows.
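
The example below is only a sketch of those two uses, with scikit-learn, matplotlib, and the built-in digits data set as my own assumptions rather than anything from the course recap: it compresses 64 features down to 20 before training a classifier, then projects the same data to 2D for a plot.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)  # 64-dimensional inputs

# Speed-up: compress 64 features down to 20 before training a classifier.
model = make_pipeline(StandardScaler(), PCA(n_components=20),
                      LogisticRegression(max_iter=2000))
model.fit(X, y)
print("training accuracy:", model.score(X, y))

# Visualization: compress the same data to 2D for a scatter plot.
X_2d = make_pipeline(StandardScaler(), PCA(n_components=2)).fit_transform(X)
plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y, s=5, cmap="tab10")
plt.title("Digits data projected onto its first two principal components")
plt.show()
```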

Up Next

This note concludes the topics from my recap of Andrew Ng’s Machine Learning course. My next note will share a final consolidated recap of the topics in the series.

Past Notes in this Series

  1. Towards a High-Level Understanding of Machine Learning
  2. Building Intuition around Supervised Machine Learning with Gradient Descent
  3. Helping Supervised Learning Models Learn Better & Faster
  4. The Sigmoid function as a conceptual introduction to activation and hypothesis functions
  5. An Introduction to Classification Models
  6. Overfitting, and avoiding it with regularization
  7. An Introduction to Neural Networks
  8. Classification Models using Neural Networks
  9. An Introduction to K-Means Clustering
  10. Anomaly detection with supervised learning
  11. An Introduction to Machine Learning for Content-Based Recommendations
  12. Machine Learning with Real-Time Data via Online Learning
  13. An Introduction to Optical Character Recognition (OCR)
  14. Ceiling Analysis for Prioritizing Work on ML Systems
