Principal Component Analysis (PCA) Simplified

Problem Statement


Benefits of Dimension Reduction

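  • Lower consumption of computational resources.
  • Faster-running models.
  • Potentially better model performance, since redundant or noisy features are dropped.
  • Better data visualization, since data reduced to two or three dimensions can be plotted.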

Applying PCA

  1. Manually calculating the principal components: PCA rests on a handful of linear algebra operations, so we will first generate the principal components by hand in order to fully understand the concept.
  2. Using the scikit-learn library: scikit-learn computes the principal components for us. This is what you would normally use when building a machine learning model, but it helps to understand the concept through method 1 first.

Steps to perform PCA

  1. Standardization
  2. Covariance Matrix
  3. Eigendecomposition
  4. Sort by Eigenvalues
  5. Choose your Principal Components
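
Step 1 of the list above, standardization, can be sketched in a few lines of NumPy. This is a minimal illustration, not a definitive implementation: the `standardize` helper and the toy array are hypothetical, and dividing by the sample standard deviation (`ddof=1`) is an assumed choice (scikit-learn's `StandardScaler` uses `ddof=0` instead).

```python
import numpy as np

def standardize(X):
    """Rescale each column to zero mean and unit standard deviation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)  # assumed: sample standard deviation
    return (X - mean) / std

# Hypothetical toy data: 4 samples, 2 features on very different scales.
X_toy = np.array([[1.0, 100.0],
                  [2.0, 300.0],
                  [3.0, 200.0],
                  [4.0, 400.0]])

Z_toy = standardize(X_toy)
print(Z_toy.mean(axis=0))         # approximately 0 for each column
print(Z_toy.std(axis=0, ddof=1))  # approximately 1 for each column
```

Without this step, the feature with the largest numeric range would tend to dominate the principal components simply because of its units.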


Covariance Matrix
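
For standardized (and therefore centered) data Z with n rows, the covariance matrix is Z^T Z / (n - 1). The sketch below computes it from that definition and compares it with `np.cov`. The random data and all variable names are hypothetical and serve only as an illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 50 samples, 3 features, with feature 2 tied to feature 0.
X_demo = rng.normal(size=(50, 3))
X_demo[:, 2] = 0.8 * X_demo[:, 0] + 0.2 * X_demo[:, 2]

# Standardize first so every feature contributes on the same scale.
Z_demo = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0, ddof=1)

# Covariance matrix from its definition: Z^T Z / (n - 1) for centered Z.
n_samples = Z_demo.shape[0]
cov_manual = Z_demo.T @ Z_demo / (n_samples - 1)

# np.cov treats rows as variables by default, hence rowvar=False.
cov_numpy = np.cov(Z_demo, rowvar=False)

print(cov_manual)
print(np.allclose(cov_manual, cov_numpy))
```

The diagonal entries are the variances of each feature (1 after standardization), and the off-diagonal entries show how strongly each pair of features varies together.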

Sorting by Eigenvalues and Choosing Principal Components
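
A minimal sketch of this step, assuming the covariance matrix is already available. The 3 x 3 matrix below is made up for illustration, and the 80% cumulative-variance threshold is an assumed rule of thumb rather than a fixed part of PCA.

```python
import numpy as np

# Hypothetical 3 x 3 covariance matrix (symmetric, positive definite).
cov = np.array([[1.0, 0.6, 0.2],
                [0.6, 1.0, 0.1],
                [0.2, 0.1, 1.0]])

# eigh is meant for symmetric matrices; it returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Reorder so the largest eigenvalue (most variance) comes first.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]  # columns are the eigenvectors

# Share of the total variance captured by each component.
explained_ratio = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(explained_ratio)

# Assumed rule: keep the fewest components whose cumulative share reaches 80%.
threshold = 0.80
k = int(np.searchsorted(cumulative, threshold) + 1)
components = eigenvectors[:, :k]

print("explained variance ratio:", explained_ratio)
print("components kept:", k)
```

Each eigenvalue measures how much variance lies along its eigenvector, which is why sorting by eigenvalue ranks the candidate components from most to least informative.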

Manually calculating and generating the principal components

  1. Load your data.
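     import numpy as np

     # Rows are samples and columns are features; the name X is an assumed label.
     X = np.array([[6., 3., 2.],
                   [3., 2., 7.],
                   [5., 4., 2.],
                   [1., 4., 3.],
                   [7., 3., 1.],
                   [5., 1., 8.],
                   [4., 2., 2.],
                   [8., 6., 6.],
                   [6., 3., 2.],
                   [7., 1., 1.]])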
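
With the data loaded, the five steps could be carried out as follows. This is a sketch under stated assumptions: the array holds only the ten rows listed above, the names `X`, `Z`, `W` and `scores` are assumed, and keeping two components is an illustrative choice.

```python
import numpy as np

# The ten rows listed above; the name X is an assumed label.
X = np.array([[6., 3., 2.],
              [3., 2., 7.],
              [5., 4., 2.],
              [1., 4., 3.],
              [7., 3., 1.],
              [5., 1., 8.],
              [4., 2., 2.],
              [8., 6., 6.],
              [6., 3., 2.],
              [7., 1., 1.]])

# 1. Standardization: zero mean and unit variance per column.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# 2. Covariance matrix of the standardized features (3 x 3).
cov = np.cov(Z, rowvar=False)

# 3. Eigendecomposition (eigh, because cov is symmetric).
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# 4. Sort the eigenpairs from largest to smallest eigenvalue.
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# 5. Keep the first two components (illustrative choice) and project the data.
W = eigenvectors[:, :2]
scores = Z @ W

print("explained variance ratio:", eigenvalues / eigenvalues.sum())
print("projected shape:", scores.shape)
```

The `scores` array is the reduced dataset: ten samples described by two principal components instead of three original features.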

PCA using scikit-learn

  1. Load your data: we will use the wine dataset that ships with scikit-learn.
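
A minimal end-to-end sketch with scikit-learn, assuming the wine data is loaded through `sklearn.datasets.load_wine`. The variable names and the choice of two components are illustrative assumptions.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# 1. Load the wine data: 178 samples, 13 numeric features.
X_wine, y_wine = load_wine(return_X_y=True)

# 2. Standardize, since the features are measured on very different scales.
X_scaled = StandardScaler().fit_transform(X_wine)

# 3. Fit PCA and project onto two components (illustrative choice).
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

print(X_pca.shape)
print(pca.explained_variance_ratio_)
```

`explained_variance_ratio_` reports the share of the total variance captured by each retained component, which plays the same role as the sorted eigenvalues in the manual approach.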

When to use PCA

  1. When you want to reduce the number of variables but cannot clearly identify which ones to remove.
  2. When you want to make sure your variables are uncorrelated with each other.
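
The second point can be checked directly: the principal components that PCA returns are linearly uncorrelated with one another, even when the input features are not. The sketch below uses made-up data in which one feature is nearly a copy of the other. All names are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
# Hypothetical data: the second feature is mostly a copy of the first.
a = rng.normal(size=200)
b = 0.9 * a + 0.1 * rng.normal(size=200)
X_corr = np.column_stack([a, b])

# Correlation between the two original features.
corr_before = np.corrcoef(X_corr, rowvar=False)[0, 1]

# Correlation between the two principal components.
pcs = PCA(n_components=2).fit_transform(X_corr)
corr_after = np.corrcoef(pcs, rowvar=False)[0, 1]

print(f"correlation before PCA: {corr_before:.3f}")
print(f"correlation after PCA:  {corr_after:.3f}")
```

Note that uncorrelated is a weaker property than statistically independent: PCA removes linear relationships between the components, not every possible dependence.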



