Principal Component Analysis (PCA) in Feature Engineering

Abhinaba Banerjee
Published in Geek Culture
7 min read · Dec 5, 2022


Photo by Giorgio Tomassetti on Unsplash

This article explains the concepts and uses of Principal Component Analysis (PCA), along with a code implementation.

Principal Component Analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of correlated variables into a set of linearly uncorrelated variables, called principal components. Take the abalone dataset, which includes features such as Height and Diameter (an abalone is a sea creature, somewhat like an oyster). The idea is to transform these features into a new set of variables known as the axes of variation.
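To make the "correlated to uncorrelated" claim concrete, here is a minimal sketch using NumPy. It assumes synthetic data: the `height` and `diameter` arrays below are made-up stand-ins, not the real abalone measurements, and the eigendecomposition of the covariance matrix is just one common way to compute the transformation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for two correlated measurements (NOT the real abalone data).
diameter = rng.normal(loc=0.40, scale=0.10, size=500)
height = 0.35 * diameter + rng.normal(loc=0.0, scale=0.01, size=500)
X = np.column_stack([height, diameter])

# Center the data, then eigendecompose its covariance matrix.
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues

# Reorder so the axis with the largest variance comes first.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
components = eigenvectors[:, order]  # each column is one axis of variation

# Project the centered data onto the new axes.
X_pca = X_centered @ components

corr_before = np.corrcoef(X, rowvar=False)[0, 1]
corr_after = np.corrcoef(X_pca, rowvar=False)[0, 1]
print(f"correlation between original features: {corr_before:.3f}")
print(f"correlation between new features:      {corr_after:.3e}")
```

The original two columns are strongly correlated, while the projected columns have a correlation that is zero up to floating-point error.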

The longer axis can be called the “Size” component: it contrasts small height and small diameter (lower left) with large height and large diameter (upper right). The shorter axis can be called the “Shape” component: it contrasts small height and large diameter (flat shape) with large height and small diameter (round shape).

Axes of variation defined (Image from Kaggle course)

So instead of describing abalones by their Height and Diameter, they could be described by their Size and Shape. This, in fact, is the whole idea of PCA: instead of describing the data with the original features, we describe it with its axes of variation. The axes of variation become the new features.
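The sketch below shows how the Size and Shape reading might fall out of the numbers. It is an illustration under assumptions, not the article's own implementation: the data is synthetic, the features are standardized first, and the names `size`, `shape`, and `loadings` are chosen here for readability.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-in data; the real abalone dataset is not loaded here.
diameter = rng.normal(0.40, 0.10, size=300)
height = 0.35 * diameter + rng.normal(0.0, 0.02, size=300)
X = np.column_stack([height, diameter])
feature_names = ["Height", "Diameter"]

# Standardize so both features contribute on the same scale.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

# SVD of the standardized data: the rows of Vt are the axes of variation.
_, singular_values, Vt = np.linalg.svd(X_scaled, full_matrices=False)
loadings = Vt

# The new features are the coordinates of each sample along those axes.
scores = X_scaled @ Vt.T
size, shape = scores[:, 0], scores[:, 1]

# Share of the total variance carried by each axis.
explained = singular_values**2 / np.sum(singular_values**2)

for name, row, ratio in zip(["Size", "Shape"], loadings, explained):
    weights = ", ".join(f"{f}: {w:+.2f}" for f, w in zip(feature_names, row))
    print(f"{name:5s} ({ratio:.1%} of variance) -> {weights}")
```

On the first axis, Height and Diameter carry weights of the same sign, so they rise and fall together (Size). On the second axis the weights have opposite signs, contrasting flat with round (Shape). The overall sign of each axis is arbitrary, so a large abalone may end up with either a large positive or a large negative Size score.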
