What Is Feature Engineering for Machine Learning?

Amit Shekhar
Feb 14, 2018 · 4 min read
LetsLearnAI: What Is Feature Engineering for Machine Learning?

Feature engineering is the process of using domain knowledge of the data to create features that make machine learning algorithms work. Done correctly, it increases the predictive power of machine learning algorithms by creating features from raw data that make the learning task easier. Feature engineering is an art.


The steps involved in solving a machine learning problem are typically as follows:

  • Gathering data.
  • Cleaning data.
  • Feature engineering.
  • Defining model.
  • Training and testing the model, then predicting the output.

Feature engineering is arguably the most important step in machine learning, and it often makes the difference between a good model and a bad one. Let’s see what feature engineering covers.

Suppose we are given a dataset of flight date-times and their statuses. Given the date-time, we have to predict the status of the flight.

Flight Date Time Data

In this example, the status of the flight depends on the hour of the day, not on the full date-time, so we create a new feature, “Hour_Of_Day”. With this feature the machine will learn better, because it is directly related to the status of the flight.

Flight Hour Of Day Data

Here, creating the new feature “Hour_Of_Day” is feature engineering.
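The extraction of the hour can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the sample rows, the timestamp format and the helper name hour_of_day are assumptions made for the example, since the original data appears only as an image.

```python
from datetime import datetime

# Hypothetical sample rows: (flight date-time, status). The values and the
# timestamp format are assumptions made for illustration.
flights = [
    ("2018-02-14 06:15:00", "On Time"),
    ("2018-02-14 18:40:00", "Delayed"),
    ("2018-02-15 06:05:00", "On Time"),
]


def hour_of_day(timestamp: str) -> int:
    """Return the hour (0-23) of a 'YYYY-MM-DD HH:MM:SS' timestamp."""
    return datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").hour


# Replace the raw date-time with the engineered Hour_Of_Day feature.
engineered = [
    {"Hour_Of_Day": hour_of_day(ts), "Status": status}
    for ts, status in flights
]
```

Both early-morning flights now share the value 6, so a model can treat them as the same kind of input even though their dates differ.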

Let’s see another example. Suppose we are given the latitude, the longitude and other data, with the label “Price_Of_House”, and we need to predict the price of a house in that area. Taken separately, the latitude and the longitude tell the model little, because a location is defined only by the two together. So here we will use crossed-column feature engineering: we combine the latitude and the longitude into one feature, which helps the model learn better.

Here, combining two features to create one useful feature is feature engineering.
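One common way to build such a combined feature is to split each coordinate into grid cells and join the two cell indices into a single categorical value. The sketch below assumes a grid of 0.1 degree per cell; the cell size, the sample coordinates and the function names are hypothetical choices for illustration, not something the example above prescribes.

```python
import math


def bucket(value: float, cells_per_degree: int) -> int:
    """Map a coordinate to the index of the grid cell that contains it."""
    return math.floor(value * cells_per_degree)


def lat_lon_cross(lat: float, lon: float, cells_per_degree: int = 10) -> str:
    """Combine bucketized latitude and longitude into one categorical feature."""
    return f"{bucket(lat, cells_per_degree)}_{bucket(lon, cells_per_degree)}"


# Hypothetical houses: the first two are close together, the third is far away.
houses = [(28.61, 77.23), (28.65, 77.21), (19.07, 72.87)]
cells = [lat_lon_cross(lat, lon) for lat, lon in houses]
```

The two nearby houses fall into the same cell and therefore get the same feature value, while the distant house gets a different one. A model can then learn one price level per cell instead of trying to interpret latitude and longitude independently.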

Sometimes we use bucketized-column feature engineering. Suppose we are given data in which one column is the age and the output is a classification (X, Y or Z). Looking at the data, we notice that the output depends on the age range: ages 11–20 map to X, 21–40 to Y and 41–70 to Z. Here we create three buckets for the age ranges 11–20, 21–40 and 41–70, and add a new bucketized column, “Age_Range”, with the numerical values 1, 2 and 3, where each value identifies the corresponding bucket.

Here, creating the “Age_Range” bucket is feature engineering.
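The bucketing can be sketched with the standard library's bisect module. The bucket boundaries come from the example; returning None for ages outside 11–70 is an assumption of this sketch, since the example does not say how such ages should be handled.

```python
from bisect import bisect_right
from typing import Optional

# Lower bounds of the three buckets from the example: 11-20, 21-40, 41-70.
BUCKET_STARTS = [11, 21, 41]
MAX_AGE = 70


def age_range(age: int) -> Optional[int]:
    """Return 1, 2 or 3 for the bucket containing `age`, or None outside 11-70."""
    if age < BUCKET_STARTS[0] or age > MAX_AGE:
        return None  # assumption: ages outside the known ranges get no bucket
    # bisect_right counts how many bucket starts are <= age, which is
    # exactly the 1-based bucket number.
    return bisect_right(BUCKET_STARTS, age)


ages = [15, 20, 21, 40, 41, 70, 5]
age_ranges = [age_range(a) for a in ages]
```

Checking the boundary ages (20 and 21, 40 and 41) is worthwhile whenever buckets are defined by hand, because off-by-one mistakes at the edges are easy to make.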

Sometimes, removing an unwanted feature is also feature engineering, because a feature that is unrelated to the output can degrade the performance of the model.
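Deciding which features are unrelated usually takes domain knowledge or experiments with the model. As one narrow, easy-to-check case, a column that holds the same value in every row cannot help distinguish the outputs. The sketch below drops such constant columns; the column names and rows are hypothetical.

```python
# Hypothetical rows; "Country" is identical everywhere and carries no signal.
rows = [
    {"Hour_Of_Day": 6, "Airport_Code": "DEL", "Country": "IN"},
    {"Hour_Of_Day": 18, "Airport_Code": "BOM", "Country": "IN"},
    {"Hour_Of_Day": 6, "Airport_Code": "BLR", "Country": "IN"},
]


def drop_constant_features(rows):
    """Remove every column that holds the same value in all rows."""
    if not rows:
        return []
    constant = {
        key for key in rows[0]
        if len({row[key] for row in rows}) == 1
    }
    return [
        {key: value for key, value in row.items() if key not in constant}
        for row in rows
    ]


cleaned = drop_constant_features(rows)
```

This check only catches the most obvious unwanted features. A feature can vary a lot and still be unrelated to the output, so it does not replace testing the features against the model.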

Now, the steps to do feature engineering are as follows:

  • Brainstorm features.
  • Create features.
  • Check how the features work with the model.
  • Repeat from the first step until the features work well.

This is what we do in feature engineering.
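The "check how the features work with the model" step can be sketched by scoring each candidate feature on held-out data. The example below uses a deliberately tiny stand-in for a model (it remembers the most common label per feature value) and made-up flight rows; both are assumptions for illustration, and a real project would use an actual learning algorithm and far more data.

```python
from collections import Counter, defaultdict
from datetime import datetime


def fit_lookup(train, feature):
    """'Train' a toy model: remember the most common label per feature value."""
    by_value = defaultdict(Counter)
    for x, label in train:
        by_value[feature(x)][label] += 1
    # Fall back to the overall most common label for unseen feature values.
    default = Counter(label for _, label in train).most_common(1)[0][0]
    table = {value: counts.most_common(1)[0][0]
             for value, counts in by_value.items()}
    return lambda x: table.get(feature(x), default)


def accuracy(model, test):
    """Fraction of held-out rows the model labels correctly."""
    return sum(model(x) == label for x, label in test) / len(test)


# Hypothetical data: morning flights are on time, evening flights are delayed.
train = [
    ("2018-02-14 06:15:00", "On Time"),
    ("2018-02-14 18:40:00", "Delayed"),
    ("2018-02-15 06:05:00", "On Time"),
    ("2018-02-15 18:10:00", "Delayed"),
    ("2018-02-16 06:30:00", "On Time"),
]
test = [
    ("2018-02-17 06:20:00", "On Time"),
    ("2018-02-17 18:55:00", "Delayed"),
]

# Candidate features to compare: the raw string versus the engineered hour.
candidates = {
    "raw_date_time": lambda ts: ts,
    "Hour_Of_Day": lambda ts: datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").hour,
}
scores = {name: accuracy(fit_lookup(train, feature), test)
          for name, feature in candidates.items()}
```

On this toy data the raw date-time never repeats, so the model can only guess the most common label for new flights, while the hour generalizes to unseen dates. Comparing scores like this for each new idea is one way to run the loop described in the steps above.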

Feature engineering is another topic which doesn’t seem to merit any review papers or books, or even chapters in books, but it is absolutely vital to ML success. Much of the success of machine learning is actually success in engineering features that a learner can understand.

Actually the success of all Machine Learning algorithms depends on how you present the data.

The algorithms we used are very standard for Kagglers. We spent most of our efforts in feature engineering.

Feature engineering is the process of transforming raw data into features that better represent the underlying problem to the predictive models, resulting in improved model accuracy on unseen data.

Feature engineering turns your inputs into things the algorithm can understand.

Last but not least, automated feature engineering is a current hot topic, but it requires a lot of resources. A few companies have already started working on it.

That’s it for now.

Originally published on AfterAcademy.com


Happy Learning AI :)
