How Would Feature Engineering Work?

Harsha Vardhan S · Analytics Vidhya · 4 min read · Jul 19, 2020

‘Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms,’ says Wikipedia.

Feature engineering has been an important practice in data science for improving the performance of machine learning algorithms. It requires in-depth knowledge of the domain and the data. There are multiple ways of creating a new feature, but I would like to give a visual intuition of how feature engineering helps us get better results. Before getting into the intuition, I would like to thank Mr. Mahesh Anand, a great tutor who inspired me to start writing on Medium.

This intuition is explained using a basic linear classification model. We have a dataset with two independent features, X1 and X2, and a dependent feature Y with two classes (1 and 0).

Let’s plot the instances of the dependent variable Y with respect to the independent variables X1 and X2.
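The article doesn’t list the raw values, but the geometry it describes matches the classic XOR layout, so the sketches in this post assume that dataset. A minimal setup in Python (NumPy and Matplotlib assumed):

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed XOR-style dataset: Y = 1 exactly when X1 and X2 differ
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
Y = np.array([0, 1, 1, 0])

# Scatter the instances: class 0 in red, class 1 in black
plt.scatter(X[Y == 0, 0], X[Y == 0, 1], c="red", label="Y = 0")
plt.scatter(X[Y == 1, 0], X[Y == 1, 1], c="black", label="Y = 1")
plt.xlabel("X1")
plt.ylabel("X2")
plt.legend()
plt.show()
```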

After plotting the instances, we can see that they cannot be separated using a linear method. Whatever decision line we draw, at least one instance will be wrongly classified, a limitation that would normally push us towards non-linear models.
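To see that failure concretely, here is a quick check using scikit-learn’s LogisticRegression as a stand-in for the article’s unnamed “basic linear classification model”:

```python
from sklearn.linear_model import LogisticRegression

# No linear decision boundary can classify all four XOR points,
# so training accuracy can never reach 1.0
linear_model = LogisticRegression().fit(X, Y)
print(linear_model.score(X, Y))  # at best 0.75
```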

Let’s create a new feature X3, which is the absolute difference between X1 and X2.
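In code, the new feature is a one-liner; stacking it onto the original matrix gives a three-column dataset (continuing the assumed XOR setup above):

```python
# Engineered feature: X3 = |X1 - X2|
X3 = np.abs(X[:, 0] - X[:, 1])
X_eng = np.column_stack([X, X3])
print(X_eng)
# X3 works out to 0 for every class-0 instance and 1 for every
# class-1 instance, which is what makes the classes separable
```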

Now with three independent features X1, X2 and X3, we can plot the dependent feature Y in 3D.

Both instances of 0 are now at the bottom of the 3D plot, whereas the instances of 1 are at the top. By passing a 2D plane ‘abcd’ between them, the linear model can now predict all the dependent variables correctly, giving us the required result.
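A rough sketch of that 3D plot and the now-successful linear fit, still under the same assumed dataset:

```python
# 3D scatter: class 0 sits at X3 = 0 and class 1 at X3 = 1,
# so a horizontal plane such as X3 = 0.5 separates them cleanly
ax = plt.figure().add_subplot(projection="3d")
ax.scatter(X_eng[Y == 0, 0], X_eng[Y == 0, 1], X_eng[Y == 0, 2],
           c="red", label="Y = 0")
ax.scatter(X_eng[Y == 1, 0], X_eng[Y == 1, 1], X_eng[Y == 1, 2],
           c="black", label="Y = 1")
ax.set_xlabel("X1")
ax.set_ylabel("X2")
ax.set_zlabel("X3")
ax.legend()
plt.show()

# The same linear model now classifies every instance correctly
print(LogisticRegression().fit(X_eng, Y).score(X_eng, Y))  # 1.0
```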

Now let’s replace the feature X3 with another new feature, X4, which is the signed difference between X1 and X2.

With the features X1, X2 and X4, let’s again plot the instances of the dependent feature Y in 3D.

The plot formed is a cube with extreme values of 1 and -1. Here we can see that the instances of class 0 (red) lie in the middle of the cube, whereas the instances of class 1 (black) lie at the top and bottom, still linearly inseparable. No single 2D plane can separate them: X4 = X1 - X2 is itself a linear combination of the existing features, so it adds no information a linear model can exploit. Even after adding the new feature X4, the linear classification model will not give a better result, pushing us towards a non-linear model.
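Repeating the earlier check with the signed difference shows this directly, again under the assumed XOR setup:

```python
# Replace X3 with the signed difference X4 = X1 - X2
X4 = X[:, 0] - X[:, 1]
X_alt = np.column_stack([X, X4])
print(X4)  # [0, -1, 1, 0]: class 1 at the extremes, class 0 in the middle

# Since X4 is a linear combination of X1 and X2, the classes remain
# linearly inseparable and accuracy stays below 1.0
print(LogisticRegression().fit(X_alt, Y).score(X_alt, Y))
```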

From this, we can infer that the performance of a model can be improved by extracting a new feature from the raw data, but it should be done with proper domain knowledge. Randomly creating a new feature will not help improve the model. If you lack the domain knowledge, consult an expert; otherwise you may have to resort to trial and error across multiple feature engineering techniques, which may go in vain.
