Feature Engineering: Is it really that simple?

Malvin Khoe
tiket.com
Apr 27, 2023 · 4 min read

So, you are an up-and-coming Data Scientist at the start of your journey. You start taking online courses on how to be a Data Scientist, and in those courses you encounter a section called “Feature Engineering”. You read through it and think to yourself, “Hmmm… is it really that simple?”

Yes! It is that simple, in theory, but it is a different story in practice.

Simply put, Feature Engineering aims to make our raw data more suitable for our machine learning objective. This is an important step in building a machine learning model because our model depends on the data we put in. If we feed in the right data, the model will do great, but if we shove in wrong or unnecessary data, the model won’t do any good. The data we put in affects the model’s accuracy, prediction speed, scalability, and much more.

The hardest part of feature engineering is knowing which data and techniques to apply and when to utilize them. For example, in some cases we need to normalize our data, but in other cases we don’t. There are times when we can merge two or more features, and there are times when we cannot. Just because one feature engineering technique works wonders in one case doesn’t mean it will have the same effect in another. So here are 3 tips that hopefully can help you with feature engineering.

Know Your Model

Different models require different kinds of features to work optimally. Some models are sensitive to the scale of numeric features, Linear Regression for example. If the raw data has a wide range of numerical values, say one feature ranges between 0.001 and 1 while another ranges between 1,000 and 100,000, normalizing the data beforehand may lead to a better model than using the raw values. A technique as simple as normalization can have a big impact on a Linear Regression model, but if you are using a tree-based model like a Decision Tree, it won’t have nearly as much effect. Conversely, having a lot of binary features, like the results of One-Hot Encoding, may suit a Decision Tree model well, while for a Linear Regression model it won’t be as good and could even make the model worse.
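To make the scale issue concrete, here is a minimal sketch with synthetic data on the two ranges mentioned above. One caveat worth knowing: for a plain ordinary-least-squares fit, scaling does not change the predictions, only the coefficients; it is regularized, gradient-based, or distance-based models where scaling changes the result. But even here, standardizing puts the coefficients on a comparable footing so you can read off each feature’s contribution.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Two features on wildly different scales, as in the example above
rng = np.random.default_rng(42)
X = np.column_stack([
    rng.uniform(0.001, 1, 500),        # small-scale feature
    rng.uniform(1_000, 100_000, 500),  # large-scale feature
])
# Synthetic target: both features genuinely matter
y = 3 * X[:, 0] + 0.0001 * X[:, 1] + rng.normal(0, 0.1, 500)

raw_model = LinearRegression().fit(X, y)

# Standardize each feature to zero mean and unit variance
X_scaled = StandardScaler().fit_transform(X)
scaled_model = LinearRegression().fit(X_scaled, y)

# The raw coefficient on the large feature looks tiny (~0.0001)
# even though the feature matters; the scaled ones are comparable.
print("raw coefficients:   ", raw_model.coef_)
print("scaled coefficients:", scaled_model.coef_)
```

A tree-based model, by contrast, would split on the same thresholds whether or not you scaled these columns, which is why normalization matters far less there.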

Knowing the limits of your model is also important. How much data are you working with? How many features will you use? If you are using a simple model, you may need to simplify your features, for example by merging a couple of related features or deleting unnecessary ones.
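A small pandas sketch of that simplification step; the column names here are purely illustrative, not from any real dataset:

```python
import pandas as pd

# Hypothetical booking data; column names are assumptions for illustration
df = pd.DataFrame({
    "flight_price": [120.0, 340.0, 80.0],
    "hotel_price": [200.0, 150.0, 0.0],
    "user_id": [1, 2, 3],            # identifier, carries no signal
    "trip_length_days": [3, 7, 2],
})

# Merge two related features into one, then drop the raw columns
# and the unnecessary identifier
df["total_price"] = df["flight_price"] + df["hotel_price"]
df = df.drop(columns=["flight_price", "hotel_price", "user_id"])

print(df.columns.tolist())  # ['trip_length_days', 'total_price']
```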

You need to know your model first so that you can shape the data in a way that optimizes your model’s performance.

Remember Your Objective

Sometimes when we are conducting feature engineering, we get so into it, creating more and more features, that we forget the objective of our model. We need to remind ourselves of the objective and of the expected results of our Exploratory Data Analysis (EDA).

Remembering the objective helps us generate ideas for features that are relevant to it. For example, suppose we have two kinds of data, the average price of items the user usually buys and the actual price of a product, and our objective is to find which product the user is most likely to buy. With the objective in mind, we can create a feature that shows whether or not the product fits the user’s budget: a comparison feature, the actual price divided by the average price of items the user usually buys. The closer the result is to 1, the better the product’s price fits the user. Once we have created such a feature, we can probably remove the other two, depending on the situation.
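The budget-fit feature described above can be sketched in a few lines of pandas; the column names are assumptions for the sake of the example:

```python
import pandas as pd

# Illustrative data: the user's historical average purchase price
# and each candidate product's actual price
df = pd.DataFrame({
    "user_avg_price": [100.0, 100.0, 100.0],
    "product_price": [95.0, 250.0, 30.0],
})

# A ratio close to 1 means the product fits the user's usual budget
df["price_fit"] = df["product_price"] / df["user_avg_price"]

# Once the ratio exists, the two raw columns may be redundant
df = df.drop(columns=["user_avg_price", "product_price"])

print(df["price_fit"].tolist())  # [0.95, 2.5, 0.3]
```

Here the first product (ratio 0.95) fits the user’s budget best, which is exactly the signal the objective calls for.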

The results of our EDA also determine what kind of features we need to create or modify. Remember, EDA is the process where we analyze which features affect the outcome, so from EDA we already know which features matter the most. If we have an idea for a new feature but the EDA results say it won’t matter, then don’t build it; it would only waste our time.
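One quick EDA-style relevance check of the kind described above is to look at each feature’s correlation with the target. This is a minimal sketch on synthetic data (correlation only catches linear relationships, so treat it as a first filter, not a verdict):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "relevant": rng.normal(size=200),  # genuinely drives the target
    "noise": rng.normal(size=200),     # unrelated to the target
})
df["target"] = 2 * df["relevant"] + rng.normal(0, 0.5, 200)

# Correlation with the target: "relevant" should score high,
# "noise" should hover near zero
corr = df.corr()["target"].drop("target")
print(corr)
```

If a candidate feature lands near zero here (and EDA gives no other reason to keep it), that is the signal to skip building it.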

Don’t Stop Experimenting

Even though there are a lot of do’s and don’ts when we are conducting feature engineering, just remember: we are scientists! Don’t stop experimenting and don’t be afraid of trying new ideas! Sometimes when we are not sure about something, it’s better to throw everything at the wall and see what sticks than to over-analyze and end up doing nothing. The more we try things out, the more experience we gain. The more experience we gain, the better scientists we become.

Conclusion

Feature Engineering is a simple concept, but conducting it properly requires a lot of practice and experience. So don’t be afraid to try things out. Sure, we might fail, but failure is also a great opportunity to investigate further and learn why we failed. Even the tips I just gave you are the results of learning from all of my past mistakes. Just remember: we are scientists, and to keep on learning is what scientists do.
