Feature Engineering for Machine Learning: A Step-by-Step Guide (Part 3 - The Finale)

Elakiya Sekar · Published in kgxperience · 7 min read · Aug 14, 2023

“No data is clean, but most is useful” — Dean Abbott

👩‍🔬Data scientists, assemble! 💻

We’ve finally reached the final part of our feature engineering series.

In this final blog post📋, we will discuss feature selection✔️. In the previous two blogs, we covered feature construction🏗️, feature transformation🔄, and the dimensionality reduction technique of feature extraction🔍.

Feature selection is one of the solutions to the curse of dimensionality.

Feature Selection✔️

Feature selection is like going on a blind date 💓: you have to find the features(your partner) that are most compatible with your model, and then you have to hope that they hit it off🚀.

“Feature selection is the process of identifying and selecting the most important features in a dataset that are relevant to predicting the target variable.”

There are mainly three techniques used for feature selection:

  • Filter methods🎯
  • Wrapper methods🔄
  • Embedded methods🛠️

Filter methods🎯

Filter methods are like girls on a shopping spree💃. They try on every dress in the store, comparing them to each other and trying to decide which one is the best🤔.

Just like shopping takes time⏳, filter methods can be indecisive, and may take a while to select the best features for the task. 🛍️

“Filter methods select features based on a statistical measure, scoring one feature at a time against the target and comparing those scores across features. The selection is not based on a learning algorithm.”

Before we proceed any further, note that normalizing the data can make these statistical measures more reliable.

Filter methods can be done in the following ways (a small code sketch follows this list):

  • Correlation: This method is like a couple on a blind date. If they have a high correlation, they’re a match made in heaven🥰. But if they have a low correlation, they’re better off going their separate ways💔.

In this method, we calculate the correlation (using corr()) between each feature and the target variable. If the absolute correlation is below a certain value, we remove that feature from consideration, as it doesn’t seem to have a significant impact on the target variable.

  • Variance threshold: This method is like a bouncer at a nightclub🎉. If a feature has low variance, it’s not very interesting, so it gets kicked out🚫.
  • Chi-squared test: This method is like a detective🕵️‍♂️. It looks for associations between two categorical variables and tries to figure out if they’re guilty of being important features🔍.
  • ANOVA: This method is like a judge in a courtroom⚖️. It looks at the means of multiple groups and decides if there’s enough evidence to convict them of being important features🏛️.
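Here is a rough sketch of these four filter methods with pandas and scikit-learn. The breast-cancer dataset is used purely as a stand-in so the snippet runs end to end, and every threshold (0.3, 0.1, k=5) is just an example value, not a recommendation from this post.

```python
# A rough sketch of the four filter methods above.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import VarianceThreshold, SelectKBest, chi2, f_classif

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# 1) Correlation: keep features whose absolute correlation with the target
#    is above a chosen cut-off (0.3 is just an example value).
correlations = X.corrwith(y).abs()
corr_selected = correlations[correlations > 0.3].index.tolist()

# 2) Variance threshold: the bouncer kicks out features with variance below 0.1.
vt = VarianceThreshold(threshold=0.1).fit(X)
vt_selected = X.columns[vt.get_support()].tolist()

# 3) Chi-squared test: needs non-negative features and a categorical target;
#    keeps the 5 best-scoring features.
chi2_selected = X.columns[SelectKBest(chi2, k=5).fit(X, y).get_support()].tolist()

# 4) ANOVA (f_classif): compares each feature's means across the target classes.
anova_selected = X.columns[SelectKBest(f_classif, k=5).fit(X, y).get_support()].tolist()

print("Correlation:", corr_selected)
print("Variance threshold:", vt_selected)
print("Chi-squared:", chi2_selected)
print("ANOVA:", anova_selected)
```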

Wrapper methods🔄

Why do we use wrapper methods?🤔

Filter methods are a simple way to select features, but they do not consider the relationship between features🔍. This can lead to the selection of features that are not actually relevant to the target variable❌.

The wrapper method is like a dating app for features💞. It takes a bunch of features out on dates with a machine learning algorithm, and then sees which ones the algorithm likes best💁‍♂️. The subset that hits it off with the algorithm is the one that gets selected🤭.

“The wrapper method considers the relationships between features by training a machine learning algorithm on a subset of features and then evaluating the algorithm’s performance.”

This process is repeated for different subsets of features, and the subset that results in the best performance is selected✔️.

There are several common techniques of wrapper methods, including:

  • Exhaustive Feature Selection/ Best Feature Selection🕵️‍♂️
  • Sequential Forward Feature Selection ➡
  • Sequential Backward Feature Selection ⬅️

Exhaustive Feature Selection/ Best Feature Selection🕵️‍♂️

Exhaustive feature selection is like a kid in a candy store who wants to try every single flavor of lollipop🍭, but their mom only lets them buy one🥺.

“Exhaustive feature selection evaluates the performance of a machine learning algorithm on every possible subset of features, and then selects the subset that results in the best performance.”

When we have n features, there are 2^n possible subsets of features. With just 3 features, that’s already 8 different lollipop combos to taste!😲

As a result of this computational expense, other feature selection methods have been developed. Since the candy budget is limited, we’ve come up with quicker ways to pick the tastiest lollipops 🍬 using other methods.
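For the curious, here is a rough sketch of exhaustive search using the third-party mlxtend library (an assumption on my part: it isn’t mentioned in this series and needs pip install mlxtend). X and y come from the filter-methods sketch above, and the 3-feature cap is only there to keep the candy budget small.

```python
# A rough sketch of exhaustive feature selection with mlxtend.
from mlxtend.feature_selection import ExhaustiveFeatureSelector
from sklearn.linear_model import LogisticRegression

efs = ExhaustiveFeatureSelector(
    LogisticRegression(max_iter=5000),
    min_features=1,
    max_features=3,     # only subsets of size 1 to 3; all 2^30 would take forever
    scoring="accuracy",
    cv=5,
).fit(X, y)

print("Best subset:", efs.best_feature_names_)
print("Best CV accuracy:", round(efs.best_score_, 3))
```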

Forward Feature Selection➡

Forward feature selection is like a chef🍳 who adds ingredients to a dish one by one, tasting the dish after each addition to see if it improves the flavor🥰.

The chef continues adding ingredients until the dish is no longer improving🫢, or until the chef reaches the desired number of ingredients😴.

“Forward Feature Selection starts with an empty set of features. It then adds the feature that most improves the accuracy of a machine learning algorithm.”

The algorithm continues adding features until the accuracy of the algorithm no longer improves, or until a maximum number of features is reached🚀.

Backward Feature Selection⬅

Backward feature selection is like the same chef🍳who starts with a full pantry and then throws out the ingredients that make the dish taste the worst🗑️.

“Backward Feature Selection starts with all features included in the model. It then removes the feature whose removal hurts the accuracy of the machine learning algorithm the least.”

The algorithm continues removing features until the accuracy starts to drop🥱, or until a minimum number of features is reached🤓.

In scikit-learn’s SequentialFeatureSelector, just change the parameter → direction{‘forward’, ‘backward’} (a code sketch follows):

forward → Forward Feature Selection➡

backward → Backward Feature Selection⬅
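Here is a rough sketch of both directions with scikit-learn’s SequentialFeatureSelector, reusing the placeholder X and y from the filter-methods sketch; keeping 5 features is an arbitrary example choice.

```python
# Forward and backward selection differ only in the `direction` parameter.
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(max_iter=5000)

forward = SequentialFeatureSelector(
    model, n_features_to_select=5, direction="forward", cv=5
).fit(X, y)

backward = SequentialFeatureSelector(
    model, n_features_to_select=5, direction="backward", cv=5
).fit(X, y)

print("Forward picks: ", X.columns[forward.get_support()].tolist())
print("Backward picks:", X.columns[backward.get_support()].tolist())
```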

Embedded methods🛠️

Embedded methods are like feature selection ninjas🥷. They sneak into the machine learning algorithm and select the best features without anyone noticing🫣.

“Embedded methods integrate the feature selection process into the machine learning algorithm itself.”

The most common types of embedded methods are:

  • Regularization🦸‍♂️
  • Tree Based algorithms🌲

Regularization🦸‍♂️

“Regularization in embedded methods can be done by adding a regularization term to the loss function of the machine learning algorithm.”

The regularization term penalizes large coefficients and shrinks them toward zero; with an L1 penalty, the coefficients of the least useful features are driven exactly to zero, which effectively excludes those features from the model.

The following regularization techniques are commonly used💁‍♂️ (a code sketch follows the list):

  • Lasso Regularization (L1): can shrink some coefficients exactly to zero, so it drops features outright.
  • Ridge Regularization (L2): shrinks coefficients toward zero but rarely to exactly zero, so it tames features rather than removing them.
  • Elastic-Net Regularization (L1+L2): blends both penalties to get a bit of each behavior.
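As a rough sketch (reusing the placeholder X and y, which have a classification target), here is L1-based selection with scikit-learn; for a regression target you would swap in Lasso, Ridge, or ElasticNet, and C=0.1 is just an example penalty strength.

```python
# L1 (Lasso-style) embedded selection for a classification target.
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import SelectFromModel

# Scaling matters here: the penalty compares coefficient sizes across features.
X_scaled = StandardScaler().fit_transform(X)

# Smaller C = stronger L1 penalty = more coefficients pushed exactly to zero.
selector = SelectFromModel(
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
).fit(X_scaled, y)

print("Features kept by L1:", X.columns[selector.get_support()].tolist())
```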

Tree Based algorithms🌲

“Tree-based algorithms build a tree-like structure of decisions, where each split is made on the feature that best separates the data; a feature’s importance comes from how much its splits improve the tree.”

The features that are most important for splitting the data are the ones that are most likely to be included in the model.

Tree-based algorithms that can be used in embedded methods are (a code sketch follows the list):

  • Decision trees: a single tree whose split-based importance scores can be read off directly.
  • Random forests: an ensemble of trees whose importance scores are averaged, which makes the ranking more stable.
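And a rough sketch of tree-based selection with a random forest, again reusing the placeholder X and y; the "median" threshold is just one reasonable example cut-off.

```python
# Tree-based embedded selection: keep features above the median importance.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

selector = SelectFromModel(
    RandomForestClassifier(n_estimators=200, random_state=42),
    threshold="median",
).fit(X, y)

# The fitted forest exposes impurity-based importance scores for every feature.
importances = pd.Series(selector.estimator_.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head())
print("Features kept by the forest:", X.columns[selector.get_support()].tolist())
```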

“You may ask, ‘When to use what?🤨’ ”

THE SECRET🤫

There is no single winner. A rough rule of thumb: reach for filter methods🎯 when you have many features and need something quick and model-agnostic, use wrapper methods🔄 when the feature set is small enough to afford training the model many times, and lean on embedded methods🛠️ when your model of choice (like Lasso or a random forest) already does the selecting for you.🥳🥳

Conclusion

In this finale, we dove into the world of feature selection ✔️. Feature selection is like a blind date💗: you have to find the features (your partner) that are most compatible with your model, and then you have to hope that they hit it off.😎 But unlike a blind date, feature selection can be a lot of work. You have to try out different methods and different combinations of features to find the best ones. And sometimes, even the best features don’t work out😞.

But don’t worry, you’re not alone💯. Even the most experienced data scientists have to experiment and try different things to find the best features for their models👩‍🔬.

When to use what? 🤨 The secret, revealed above: filter for speed, wrapper for thoroughness, embedded for convenience. 🥳🥳

In this grand feature engineering adventure, we’ve explored the power of feature selection, extraction, transformation, and construction. It can be a challenging but rewarding process. So don’t be afraid to get creative and let your imagination run wild.🦸‍♂️🌟

PART 1

PART 2

Feel free to Connect🎯

➡️https://www.linkedin.com/in/elakiya-sekar-28465b220/

➡️https://www.instagram.com/elakiya__sekar/

➡️meelakiya24@gmail.com
