What Is Feature Engineering for Machine Learning?

Amit Shekhar
Published in MindOrks
4 min read · Feb 14, 2018

I am Amit Shekhar, the author of this blog. This blog will help you understand Feature Engineering for Machine Learning.

Feature engineering is the process of using domain knowledge of the data to create features that make machine learning algorithms work. Done correctly, it increases the predictive power of those algorithms by deriving features from raw data that make the learning task easier. Feature engineering is an art.

The steps involved in solving a typical machine learning problem are as follows:

  • Gathering data.
  • Cleaning data.
  • Feature engineering.
  • Defining the model.
  • Training, testing the model, and predicting the output.

Feature engineering is arguably the most important art in machine learning, and it often makes the difference between a good model and a bad one. Let’s see what feature engineering covers.

Suppose we are given “flight date time vs status” data. Given the date-time of a flight, we have to predict its status.

Flight Date Time Data

In this example, the status of the flight depends on the hour of the day rather than on the full date-time. So we create a new feature, “Hour_Of_Day”. With this feature the model will learn better, because it is directly related to the status of the flight.

Flight Hour Of Day Data

Here, creating the new feature “Hour_Of_Day” is feature engineering.
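
As a minimal sketch of this step, the snippet below derives “Hour_Of_Day” from a date-time string using only the Python standard library. The sample rows, the timestamp format and the helper name hour_of_day are assumptions made for illustration; they are not taken from the data shown above.

```python
from datetime import datetime

# Toy rows invented for illustration: (flight date-time, status).
flights = [
    ("2018-02-14 06:15:00", "On Time"),
    ("2018-02-14 18:40:00", "Delayed"),
    ("2018-02-15 06:05:00", "On Time"),
]


def hour_of_day(timestamp: str) -> int:
    """Parse a 'YYYY-MM-DD HH:MM:SS' string and return its hour (0-23)."""
    return datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").hour


# Replace the raw date-time with the engineered Hour_Of_Day feature.
engineered = [
    {"Hour_Of_Day": hour_of_day(ts), "Status": status}
    for ts, status in flights
]

for row in engineered:
    print(row)
```

If the data lives in a pandas DataFrame, the same idea is usually written with the `.dt.hour` accessor on a datetime column.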

Let’s see another example. Suppose we are given the latitude, longitude, and other data, with the label “Price_Of_House”. We need to predict the price of a house in that area. Taken separately, the latitude and the longitude are of little use, because a location is defined only by the two together. So here we use crossed column feature engineering: we combine the latitude and the longitude into one feature, which helps the model learn better.

Here, combining two features to create one useful feature is feature engineering.
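
One possible way to cross the two columns is to bucket each coordinate into fixed-width bands and join the two bucket numbers into a single grid-cell id. The sketch below assumes a cell size of 0.01 degrees and uses made-up coordinates; the names bucket, lat_lon_cross and Lat_Lon_Cell are hypothetical.

```python
import math


def bucket(value: float, cell_size: float) -> int:
    """Index of the fixed-width bucket that contains value."""
    return math.floor(value / cell_size)


def lat_lon_cross(lat: float, lon: float, cell_size: float = 0.01) -> str:
    """Cross the latitude bucket with the longitude bucket into one grid-cell id."""
    return f"{bucket(lat, cell_size)}_{bucket(lon, cell_size)}"


# Toy coordinates invented for illustration.
houses = [
    {"lat": 28.6139, "lon": 77.2090},
    {"lat": 28.6141, "lon": 77.2095},  # very close to the first house
    {"lat": 19.0760, "lon": 72.8777},  # a different city
]

for house in houses:
    house["Lat_Lon_Cell"] = lat_lon_cross(house["lat"], house["lon"])
    print(house)
```

Houses that are close together end up with the same “Lat_Lon_Cell” value, so the model can learn a price level per area instead of trying to interpret two unrelated numbers.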

Sometimes we use bucketized column feature engineering. Suppose we are given data in which one column is the age and the output is a classification (X, Y, Z). Looking at the data, we realize that the output depends on the age range: ages 11–20 map to X, 21–40 to Y, and 41–70 to Z. Here we create 3 buckets for the age ranges 11–20, 21–40 and 41–70, and a new bucketized column “Age_Range” with the numerical values 1, 2 and 3, where each value identifies the corresponding bucket.

Here, creating the “Age_Range” bucket is feature engineering.
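
The three buckets can be sketched with a sorted list of lower bounds and a binary search. Rejecting ages outside 11–70 is an assumption of this sketch, since the example does not say how such ages should be handled; the function name age_range is also hypothetical.

```python
from bisect import bisect_right

# Lower bounds of the three buckets from the text: 11-20, 21-40, 41-70.
BUCKET_STARTS = [11, 21, 41]
MAX_AGE = 70


def age_range(age: int) -> int:
    """Return the bucket number 1, 2 or 3 for an age between 11 and 70."""
    if not 11 <= age <= MAX_AGE:
        # Assumption: ages outside the three buckets are treated as errors.
        raise ValueError(f"age {age} is outside the 11-70 range covered by the buckets")
    # Count how many bucket lower bounds are <= age.
    return bisect_right(BUCKET_STARTS, age)


ages = [15, 20, 21, 40, 41, 70]
print([age_range(a) for a in ages])  # [1, 1, 2, 2, 3, 3]
```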

Sometimes, removing unwanted features is also feature engineering, because a feature that is not related to the output degrades the performance of the model.
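
One simple heuristic for spotting an unrelated feature is to measure its correlation with the label and drop it when the correlation is weak. The sketch below uses Pearson correlation, which only captures linear relationships, so treat it as one option among many. The column names, the toy values and the 0.3 threshold are all assumptions made for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Toy data invented for illustration.
features = {
    "area_sqft": [800, 1000, 1200, 1500, 2000],
    "listing_id": [4, 1, 5, 2, 3],  # an arbitrary identifier, unrelated to price
}
price = [40, 52, 61, 75, 100]

THRESHOLD = 0.3  # assumed cut-off; tune it for real data

# Keep only the features whose absolute correlation with the label is high enough.
kept = {
    name: column
    for name, column in features.items()
    if abs(pearson(column, price)) >= THRESHOLD
}
print(list(kept))
```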

Now, the steps to do feature engineering are as follows:

  • Brainstorm features.
  • Create features.
  • Check how the features work with the model.
  • Start again from the first step until the features work well.
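
The loop above can be sketched on the flight example. To keep the snippet self-contained, a majority-vote lookup table stands in for the model; that stand-in, the toy rows and the names candidates and accuracy are assumptions of this sketch, not a recommended modelling approach.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Toy data invented for illustration: (flight date-time, status).
train = [
    ("2018-02-10 06:10:00", "On Time"),
    ("2018-02-10 18:30:00", "Delayed"),
    ("2018-02-11 06:45:00", "On Time"),
    ("2018-02-11 18:05:00", "Delayed"),
    ("2018-02-12 06:20:00", "On Time"),
]
test = [
    ("2018-02-13 06:30:00", "On Time"),
    ("2018-02-13 18:15:00", "Delayed"),
]

# Steps 1 and 2: brainstorm candidate features and create them as functions.
candidates = {
    "raw_date_time": lambda ts: ts,
    "Hour_Of_Day": lambda ts: datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").hour,
}


def accuracy(feature, train_rows, test_rows):
    """Step 3: fit a majority-vote lookup on the feature and score it on held-out rows."""
    votes = defaultdict(Counter)
    for ts, status in train_rows:
        votes[feature(ts)][status] += 1
    # Feature values never seen in training fall back to the most common status.
    fallback = Counter(status for _, status in train_rows).most_common(1)[0][0]
    correct = 0
    for ts, status in test_rows:
        seen = votes.get(feature(ts))
        predicted = seen.most_common(1)[0][0] if seen else fallback
        correct += predicted == status
    return correct / len(test_rows)


# Step 4: compare the candidates, keep the best, and repeat with new ideas if needed.
scores = {name: accuracy(fn, train, test) for name, fn in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

On this toy data the raw date-time never repeats, so the lookup can only guess the most common status, while “Hour_Of_Day” generalizes to the held-out flights.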

This is what we do in feature engineering.

Some words on feature engineering by the experts

Feature engineering is another topic which doesn’t seem to merit any review papers or books, or even chapters in books, but it is absolutely vital to ML success. Much of the success of machine learning is actually success in engineering features that a learner can understand.

Actually the success of all Machine Learning algorithms depends on how you present the data.

The algorithms we used are very standard for Kagglers. We spent most of our efforts in feature engineering.

Feature engineering is the process of transforming raw data into features that better represent the underlying problem to the predictive models, resulting in improved model accuracy on unseen data.

Feature engineering turns your inputs into things the algorithm can understand.

Last but not least, automated feature engineering is a current hot topic, but it requires a lot of resources. A few companies have already started working on it.

That’s it for now.

Happy Learning AI :)

