Principal Component Analysis (PCA) in Feature Engineering

Published in Geek Culture

7 min read · Dec 5, 2022

This article explains the concepts and uses of Principal Component Analysis (PCA), along with a code implementation.

Principal Component Analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of correlated variables into a set of linearly uncorrelated variables. Take the abalone dataset as an example (an abalone is a sea creature somewhat like an oyster). It contains features such as Height and Diameter, which tend to rise and fall together. The idea is to transform these two variables into a new set of variables known as the axes of variation.

The longer axis can be called the “Size” component: it contrasts small height and small diameter (lower left) with large height and large diameter (upper right). The shorter axis can be called the “Shape” component: it contrasts small height and large diameter (a flat shape) with large height and small diameter (a round shape).

Axes of variation defined (Image from Kaggle course)
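
The two axes can be made concrete with a short worked derivation. This is only a sketch under stated assumptions: both features are standardized (zero mean, unit variance), and the symbols r (the correlation between Height and Diameter), h and d (the standardized Height and Diameter) are introduced here for illustration and do not come from the original article.

```latex
\Sigma = \begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix}, \qquad
\Sigma \begin{pmatrix} 1 \\ 1 \end{pmatrix} = (1 + r) \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad
\Sigma \begin{pmatrix} 1 \\ -1 \end{pmatrix} = (1 - r) \begin{pmatrix} 1 \\ -1 \end{pmatrix}

\text{Size} = \frac{h + d}{\sqrt{2}} \quad (\text{variance } 1 + r), \qquad
\text{Shape} = \frac{h - d}{\sqrt{2}} \quad (\text{variance } 1 - r)
```

When r is positive, the Size axis carries the larger variance 1 + r, which is why it appears as the longer axis, while Shape carries the remaining 1 - r. The sign of each axis is arbitrary, so Shape could equally be written as (d - h)/√2.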

So instead of describing abalones by their Height and Diameter, we could describe them by their Size and Shape. This, in fact, is the whole idea of PCA: instead of describing the data with the original features, we describe it with its axes of variation. The axes of variation become the new features.
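
The idea can be sketched in a few lines of NumPy. This is a minimal illustration rather than a definitive implementation, and it rests on some assumptions: the `height` and `diameter` values are synthetic numbers generated here, not the real abalone dataset, and the variable names (`size_factor`, `X_pca`, and so on) are hypothetical. The sketch standardizes the two features, takes the eigenvectors of their covariance matrix as the axes of variation, and projects the data onto them.

```python
import numpy as np

# ASSUMPTION: synthetic stand-ins for Height and Diameter, not the real abalone data.
rng = np.random.default_rng(0)
n = 500
size_factor = rng.normal(size=n)  # hidden factor that makes the two features correlated
height = 0.14 + 0.04 * size_factor + 0.01 * rng.normal(size=n)
diameter = 0.41 + 0.10 * size_factor + 0.03 * rng.normal(size=n)
X = np.column_stack([height, diameter])

# Standardize so both features are on the same scale.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

# The eigenvectors of the covariance matrix are the axes of variation.
cov = np.cov(X_scaled, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues

# Reorder so the axis with the largest variance comes first.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
components = eigenvectors[:, order]  # each column is one axis of variation

# Project the data onto the axes: these columns are the new features.
X_pca = X_scaled @ components

explained_ratio = eigenvalues / eigenvalues.sum()
corr_before = np.corrcoef(X_scaled, rowvar=False)[0, 1]
corr_after = np.corrcoef(X_pca, rowvar=False)[0, 1]

print("Loadings (rows: Height, Diameter; columns: axes):")
print(components)
print("Share of variance per axis:", explained_ratio)
print("Correlation before PCA:", corr_before)
print("Correlation after PCA:", corr_after)
```

In the first column of `components`, Height and Diameter carry the same sign, which matches the “Size” axis; in the second column they carry opposite signs, which matches “Shape”. The correlation between the new features is zero up to floating-point error, which is the "uncorrelated variables" property PCA promises.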


Written by Abhinaba Banerjee