Feature Engineering for Machine Learning: A Step-by-Step Guide (Part 3 - The Finale)

Elakiya Sekar · Published in kgxperience · 7 min read · Aug 14, 2023

“No data is clean, but most is useful” — Dean Abbott

👩‍🔬Data scientists, assemble! 💻

We’ve finally reached the final part of our feature engineering series.

In this final blog post📋, we will discuss feature selection✔️. In the previous two blogs, we covered feature construction🏗️, feature transformation🔄, and the dimensionality reduction technique of feature extraction🔍.

Feature selection is one of the solutions to the curse of dimensionality.

Feature Selection✔️

Feature selection is like going on a blind date 💓: you have to find the features(your partner) that are most compatible with your model, and then you have to hope that they hit it off🚀.

“Feature selection is the process of identifying and selecting the most important features in a dataset that are relevant to predicting the target variable.”

There are mainly three techniques used for feature selection:

  • Filter methods🎯
  • Wrapper methods🔄
  • Embedded methods🛠️

Filter methods🎯

Filter methods are like girls on a shopping spree💃. They try on every dress in the store, comparing them to each other and trying to decide which one is the best🤔.

Just like shopping takes time⏳, filter methods can be indecisive, and may take a while to select the best features for the task. 🛍️

“Filter methods select features based on a statistical measure, scoring one feature at a time against the target and comparing those scores across features. The selection is not based on a learning algorithm.”

Before we proceed any further, note that normalizing the data can make these statistical measures more reliable.

Filter methods can be done in the following ways (a small code sketch follows this list):

  • Correlation: This method is like a couple on a blind date. If they have a high correlation, they’re a match made in heaven🥰. But if they have a low correlation, they’re better off going their separate ways💔.

In this method, we calculate the correlation (using corr()) between each feature and the target variable. If the absolute correlation is below a certain value, we remove that feature from consideration, as it doesn’t seem to have a significant impact on the target variable.

  • Variance threshold: This method is like a bouncer at a nightclub🎉. If a feature has low variance, it’s not very interesting, so it gets kicked out🚫.
  • Chi-squared test: This method is like a detective🕵️‍♂️. It looks for associations between two categorical variables and tries to figure out if they’re guilty of being important features🔍.
  • ANOVA: This method is like a judge in a courtroom⚖️. It looks at the means of multiple groups and decides if there’s enough evidence to convict them of being important features🏛️.
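Here is a rough sketch of these four filter methods with pandas and scikit-learn. The breast-cancer dataset is used purely as a stand-in so the snippet runs end to end, and every threshold (0.3, 0.1, k=5) is just an example value, not a recommendation from this post.

```python
# A rough sketch of the four filter methods above.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import VarianceThreshold, SelectKBest, chi2, f_classif

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# 1) Correlation: keep features whose absolute correlation with the target
#    is above a chosen cut-off (0.3 is just an example value).
correlations = X.corrwith(y).abs()
corr_selected = correlations[correlations > 0.3].index.tolist()

# 2) Variance threshold: the bouncer kicks out features with variance below 0.1.
vt = VarianceThreshold(threshold=0.1).fit(X)
vt_selected = X.columns[vt.get_support()].tolist()

# 3) Chi-squared test: needs non-negative features and a categorical target;
#    keeps the 5 best-scoring features.
chi2_selected = X.columns[SelectKBest(chi2, k=5).fit(X, y).get_support()].tolist()

# 4) ANOVA (f_classif): compares each feature's means across the target classes.
anova_selected = X.columns[SelectKBest(f_classif, k=5).fit(X, y).get_support()].tolist()

print("Correlation:", corr_selected)
print("Variance threshold:", vt_selected)
print("Chi-squared:", chi2_selected)
print("ANOVA:", anova_selected)
```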

Wrapper methods🔄

Why do we use wrapper methods?🤔

Filter methods are a simple way to select features, but they do not consider the relationship between features🔍. This can lead to the selection of features that are not actually relevant to the target variable❌.

The wrapper method is like a dating app for features💞. It takes a bunch of features out on dates with a machine learning algorithm, and then sees which ones the algorithm likes best💁‍♂️. The subset that hits it off with the algorithm is the one that gets selected🤭.

“The wrapper method considers the relationships between features by training a machine learning algorithm on a subset of features and then evaluating the algorithm’s performance.”

This process is repeated for different subsets of features, and the subset that results in the best performance is selected✔️.

There are several common techniques of wrapper methods, including:

  • Exhaustive Feature Selection/ Best Feature Selection🕵️‍♂️
  • Sequential Forward Feature Selection ➡
  • Sequential Backward Feature Selection ⬅️

Exhaustive Feature Selection/ Best Feature Selection🕵️‍♂️

Exhaustive feature selection is like a kid in a candy store who wants to try every single flavor of lollipop🍭, but their mom only lets them buy one🥺.

“Exhaustive feature selection evaluates the performance of a machine learning algorithm on every possible subset of features, and then selects the subset that results in the best performance.”

When we have n features, there are 2^n possible subsets of features. With just 3 features, that’s already 8 different lollipop combos to taste!😲

As a result of this computational expense, other feature selection methods have been developed. Since the candy budget is limited, we’ve come up with quicker ways to pick the tastiest lollipops 🍬 using other methods.
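For the curious, here is a rough sketch of exhaustive search using the third-party mlxtend library (an assumption on my part: it isn’t mentioned in this series and needs pip install mlxtend). X and y come from the filter-methods sketch above, and the 3-feature cap is only there to keep the candy budget small.

```python
# A rough sketch of exhaustive feature selection with mlxtend.
from mlxtend.feature_selection import ExhaustiveFeatureSelector
from sklearn.linear_model import LogisticRegression

efs = ExhaustiveFeatureSelector(
    LogisticRegression(max_iter=5000),
    min_features=1,
    max_features=3,     # only subsets of size 1 to 3; all 2^30 would take forever
    scoring="accuracy",
    cv=5,
).fit(X, y)

print("Best subset:", efs.best_feature_names_)
print("Best CV accuracy:", round(efs.best_score_, 3))
```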

Forward Feature Selection➡

Forward feature selection is like a chef🍳 who adds ingredients to a dish one by one, tasting the dish after each addition to see if it improves the flavor🥰.

The chef continues adding ingredients until the dish is no longer improving🫢, or until the chef reaches the desired number of ingredients😴.

“Forward Feature Selection starts with an empty set of features. It then adds the feature that most improves the accuracy of a machine learning algorithm.”

The algorithm continues adding features until the accuracy of the algorithm no longer improves, or until a maximum number of features is reached🚀.

Backward Feature Selection⬅

Backward feature selection is like the same chef🍳who starts with a full pantry and then throws out the ingredients that make the dish taste the worst🗑️.

“Backward Feature Selection starts with all features included in the model. It then removes the feature whose removal hurts the accuracy of the machine learning algorithm the least.”

The algorithm continues removing features until the accuracy starts to drop🥱, or until a minimum number of features is reached🤓.

In scikit-learn’s SequentialFeatureSelector, just change the parameter → direction{‘forward’, ‘backward’} (a code sketch follows):

forward → Forward Feature Selection➡

backward → Backward Feature Selection⬅
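Here is a rough sketch of both directions with scikit-learn’s SequentialFeatureSelector, reusing the placeholder X and y from the filter-methods sketch; keeping 5 features is an arbitrary example choice.

```python
# Forward and backward selection differ only in the `direction` parameter.
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(max_iter=5000)

forward = SequentialFeatureSelector(
    model, n_features_to_select=5, direction="forward", cv=5
).fit(X, y)

backward = SequentialFeatureSelector(
    model, n_features_to_select=5, direction="backward", cv=5
).fit(X, y)

print("Forward picks: ", X.columns[forward.get_support()].tolist())
print("Backward picks:", X.columns[backward.get_support()].tolist())
```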

Embedded methods🛠️

Embedded methods are like feature selection ninjas🥷. They sneak into the machine learning algorithm and select the best features without anyone noticing🫣.

“Embedded methods integrate the feature selection process into the machine learning algorithm itself.”

The most common types of embedded methods are:

  • Regularization🦸‍♂️
  • Tree Based algorithms🌲

Regularization🦸‍♂️

“Regularization in embedded methods can be done by adding a regularization term to the loss function of the machine learning algorithm.”

The regularization term penalizes large coefficients and shrinks them toward zero; with an L1 penalty, the coefficients of the least useful features are driven exactly to zero, which effectively excludes those features from the model.

The following regularization techniques are commonly used💁‍♂️ (a code sketch follows the list):

  • Lasso Regularization (L1): can shrink some coefficients exactly to zero, so it drops features outright.
  • Ridge Regularization (L2): shrinks coefficients toward zero but rarely to exactly zero, so it tames features rather than removing them.
  • Elastic-Net Regularization (L1+L2): blends both penalties to get a bit of each behavior.
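As a rough sketch (reusing the placeholder X and y, which have a classification target), here is L1-based selection with scikit-learn; for a regression target you would swap in Lasso, Ridge, or ElasticNet, and C=0.1 is just an example penalty strength.

```python
# L1 (Lasso-style) embedded selection for a classification target.
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import SelectFromModel

# Scaling matters here: the penalty compares coefficient sizes across features.
X_scaled = StandardScaler().fit_transform(X)

# Smaller C = stronger L1 penalty = more coefficients pushed exactly to zero.
selector = SelectFromModel(
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
).fit(X_scaled, y)

print("Features kept by L1:", X.columns[selector.get_support()].tolist())
```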

Tree Based algorithms🌲

“Tree-based algorithms build a tree-like structure of decisions, where each split is made on the feature that best separates the data; a feature’s importance comes from how much its splits improve the tree.”

The features that are most important for splitting the data are the ones that are most likely to be included in the model.

Tree-based algorithms that can be used in embedded methods are (a code sketch follows the list):

  • Decision trees: a single tree whose split-based importance scores can be read off directly.
  • Random forests: an ensemble of trees whose importance scores are averaged, which makes the ranking more stable.
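And a rough sketch of tree-based selection with a random forest, again reusing the placeholder X and y; the "median" threshold is just one reasonable example cut-off.

```python
# Tree-based embedded selection: keep features above the median importance.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

selector = SelectFromModel(
    RandomForestClassifier(n_estimators=200, random_state=42),
    threshold="median",
).fit(X, y)

# The fitted forest exposes impurity-based importance scores for every feature.
importances = pd.Series(selector.estimator_.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head())
print("Features kept by the forest:", X.columns[selector.get_support()].tolist())
```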

“You may ask, ‘When to use what?🤨’ ”

THE SECRET🤫

There is no single winner. A rough rule of thumb: reach for filter methods🎯 when you have many features and need something quick and model-agnostic, use wrapper methods🔄 when the feature set is small enough to afford training the model many times, and lean on embedded methods🛠️ when your model of choice (like Lasso or a random forest) already does the selecting for you.🥳🥳

Conclusion

In this finale, we dove into the world of feature selection ✔️. Feature selection is like a blind date💗: you have to find the features (your partner) that are most compatible with your model, and then you have to hope that they hit it off.😎 But unlike a blind date, feature selection can be a lot of work. You have to try out different methods and different combinations of features to find the best ones. And sometimes, even the best features don’t work out😞.

But don’t worry, you’re not alone💯. Even the most experienced data scientists have to experiment and try different things to find the best features for their models👩‍🔬.

When to use what? 🤨 The secret, revealed above: filter for speed, wrapper for thoroughness, embedded for convenience. 🥳🥳

In this grand feature engineering adventure, we’ve explored the power of feature selection, extraction, transformation, and construction. It can be a challenging but rewarding process. So don’t be afraid to get creative and let your imagination run wild.🦸‍♂️🌟

PART 1

PART 2

Feel free to Connect🎯

➡️https://www.linkedin.com/in/elakiya-sekar-28465b220/

➡️https://www.instagram.com/elakiya__sekar/

➡️meelakiya24@gmail.com
