A High-Level Introduction to Dimensionality Reduction and Principal Component Analysis (PCA)

Sean Gahagan
2 min read · Oct 21, 2022


My last note shared how to use ceiling analysis to prioritize which parts of a machine learning system are most important to work on. This note gives a quick, high-level overview of two related topics: dimensionality reduction and principal component analysis (PCA).

Dimensionality Reduction

Dimensionality reduction is a way of compressing data and making it easier to visualize by “flattening” it into fewer dimensions. For example, if you have a 3D cloud of data points, you could project it onto a 2D plane. In a more extreme case, you could take a 50-dimensional data set and reduce it to 3D, both to help you visualize it and to let an algorithm use it more efficiently.
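
To make that concrete, here is a minimal sketch (using NumPy, which isn’t part of the original note) of the simplest possible flattening: dropping a coordinate to project a 3D cloud onto a 2D plane. PCA, covered next, chooses the projection more intelligently.

```python
import numpy as np

# A small 3D "cloud" of points (rows are examples, columns are dimensions).
rng = np.random.default_rng(0)
points_3d = rng.normal(size=(100, 3))

# Project onto the x-y plane by simply dropping the third coordinate.
# This is the crudest possible "flattening"; PCA instead picks the
# plane that best fits the data.
points_2d = points_3d[:, :2]

print(points_3d.shape)  # (100, 3)
print(points_2d.shape)  # (100, 2)
```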

Principal Component Analysis (PCA)

PCA is a specific type of dimensionality reduction in which you first apply feature scaling and mean normalization, then solve for the “best fit” lower-dimensional object (e.g., a line or plane) to project the features onto, chosen so that the projection error is as small as possible. With PCA, you can also approximately reconstruct the original data from the compressed representation.
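
As an illustration of those steps, here is a rough sketch using scikit-learn (my choice of library, not something from the original note): scale the features, project them onto a best-fit 2D plane, and then reconstruct an approximation of the original data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Toy data: 200 examples with 5 correlated features.
rng = np.random.default_rng(1)
base = rng.normal(size=(200, 2))
X = base @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(200, 5))

# 1) Feature scaling and normalization (zero mean, unit variance).
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# 2) Solve for the best-fit 2D plane and project the features onto it.
pca = PCA(n_components=2)
Z = pca.fit_transform(X_scaled)   # the compressed representation

# 3) Reconstruct an approximation of the original data from Z.
X_approx = scaler.inverse_transform(pca.inverse_transform(Z))

print(Z.shape)         # (200, 2)
print(X_approx.shape)  # (200, 5)
```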

PCA can be used to speed up supervised learning by producing a smaller, cheaper-to-train-on version of the training set, and it also helps you visualize multi-dimensional data sets as a 2D, 3D, or maybe even 4D plot. A sketch of both uses follows.
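
The example below is only a sketch of those two uses, with scikit-learn, matplotlib, and the built-in digits data set as my own assumptions rather than anything from the course recap: it compresses 64 features down to 20 before training a classifier, then projects the same data to 2D for a plot.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)  # 64-dimensional inputs

# Speed-up: compress 64 features down to 20 before training a classifier.
model = make_pipeline(StandardScaler(), PCA(n_components=20),
                      LogisticRegression(max_iter=2000))
model.fit(X, y)
print("training accuracy:", model.score(X, y))

# Visualization: compress the same data to 2D for a scatter plot.
X_2d = make_pipeline(StandardScaler(), PCA(n_components=2)).fit_transform(X)
plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y, s=5, cmap="tab10")
plt.title("Digits data projected onto its first two principal components")
plt.show()
```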

Up Next

This note concludes the topics from my recap of Andrew Ng’s Machine Learning course. My next note will share a final consolidated recap of the topics in the series.

Past Notes in this Series

  1. Towards a High-Level Understanding of Machine Learning
  2. Building Intuition around Supervised Machine Learning with Gradient Descent
  3. Helping Supervised Learning Models Learn Better & Faster
  4. The Sigmoid function as a conceptual introduction to activation and hypothesis functions
  5. An Introduction to Classification Models
  6. Overfitting, and avoiding it with regularization
  7. An Introduction to Neural Networks
  8. Classification Models using Neural Networks
  9. An Introduction to K-Means Clustering
  10. Anomaly detection with supervised learning
  11. An Introduction to Machine Learning for Content-Based Recommendations
  12. Machine Learning with Real-Time Data via Online Learning
  13. An Introduction to Optical Character Recognition (OCR)
  14. Ceiling Analysis for Prioritizing Work on ML Systems
