Feature Selection Techniques

Sangita Yemulwar
Published in Analytics Vidhya · 5 min read · Sep 27, 2019


Feature selection is one of the core concepts in machine learning, and it hugely impacts the performance of your model: the data features you use to train your machine learning models largely determine the performance you can achieve.

The problem of identifying the relevant features

We have all faced the problem of identifying the relevant features in a dataset and removing the irrelevant or less important ones, which do not contribute much to our target variable, in order to achieve better accuracy for our model.

Fewer attributes are desirable because they reduce the complexity of the model, and a simpler model is easier to understand and explain.

Feature selection can be done in multiple ways, but the techniques fall broadly into three categories:
1. Filter Method
2. Wrapper Method
3. Embedded Method

1. Filter Method

In this method you filter the features and keep only a relevant subset; the model is built after the features are selected. The filtering is done using a correlation matrix, most commonly with the Pearson correlation, and with the variance inflation factor (VIF).

A] Pearson Correlation

A Pearson correlation is a number between -1 and 1 that indicates the extent to which two variables are linearly related. The Pearson correlation is also known as the “product moment correlation coefficient” (PMCC) or simply the “correlation”.

Pearson correlations are suitable only for metric (numeric) variables.

The correlation coefficient takes values between -1 and 1:

  • A value closer to 0 implies a weaker correlation (exactly 0 implies no correlation)
  • A value closer to 1 implies a stronger positive correlation
  • A value closer to -1 implies a stronger negative correlation

Here our target (dependent) variable is mpg, from the classic mtcars dataset. Looking at the correlation matrix, we find which independent variables are strongly or weakly correlated with the target and set a threshold, as in the sketch below.
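
The article's original code and correlation heatmap are not reproduced here; a minimal sketch of the same idea, assuming mtcars is available as a local CSV file (the path and the 0.5 threshold are illustrative assumptions), might look like this:

```python
import pandas as pd

# Load the mtcars data (hypothetical path; any copy of the dataset works).
df = pd.read_csv("mtcars.csv")

# Pairwise Pearson correlations between all numeric columns.
cor = df.corr(method="pearson", numeric_only=True)

# Absolute correlation of every independent variable with the target, mpg.
cor_target = cor["mpg"].abs().drop("mpg")

# Keep only the features whose correlation with mpg exceeds the threshold.
threshold = 0.5
print(cor_target[cor_target > threshold])
```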

From the correlation matrix it can be seen that the variables cyl and disp are highly correlated with each other (0.902033). We therefore compare both with the target: mpg is more highly correlated with cyl, so we would keep cyl and drop disp. The same process is followed with every other pair of correlated variables until the last one, leaving us with four features: wt, qsec, gear, and carb. These are the final features given by the Pearson correlation.

B] Variance Inflation Factor (VIF)

Collinearity is the state where two variables are highly correlated and contain similar information about the variance within a given dataset. To detect collinearity between pairs of variables, simply create a correlation matrix and look for variables with large absolute values; the VIF goes a step further and also catches multicollinearity that involves more than two variables at once.

Steps for Implementing VIF

  • Calculate the VIF factors.
  • Inspect the factor for each predictor variable; if the VIF is between 5 and 10, multicollinearity is likely present and you should consider dropping the variable (a sketch follows this list).
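
A minimal sketch using statsmodels, again assuming mtcars is loaded from a hypothetical local CSV with only numeric columns:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("mtcars.csv")  # hypothetical path

# Predictors only; add a constant so VIFs are computed against an intercept.
X = sm.add_constant(df.drop(columns=["mpg"]))

# One VIF per column (the 'const' row can be ignored).
vif = pd.DataFrame({
    "feature": X.columns,
    "VIF": [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
})
print(vif)
```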

The VIF method selected three features: disp, vs, and am. These are the final features given by VIF.

2. Wrapper Method

  • A wrapper method needs one machine learning algorithm and uses its performance as the evaluation criterion.
  • You feed the features to the selected machine learning algorithm and, based on the model's performance, add or remove features.
  • It is an iterative and computationally expensive process, but it is more accurate than the filter method.

A] Step Forward Selection

Forward selection is an iterative method in which we start with no features in the model. In each iteration, we add the feature that best improves the model, until adding a new variable no longer improves its performance.
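
The original snippet is not shown here; one way to sketch step-forward selection is with scikit-learn's SequentialFeatureSelector (the choice of a linear model and of three features to select are assumptions for illustration):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import SequentialFeatureSelector

df = pd.read_csv("mtcars.csv")  # hypothetical path
X, y = df.drop(columns=["mpg"]), df["mpg"]

# Start from zero features and greedily add the one that most improves
# cross-validated performance, stopping at n_features_to_select.
sfs = SequentialFeatureSelector(
    LinearRegression(),
    n_features_to_select=3,  # assumed; tune for your problem
    direction="forward",
    cv=5,
)
sfs.fit(X, y)
print(X.columns[sfs.get_support()])
```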

B] Backward Elimination

In backward elimination, we start with all the features and remove the least significant feature at each iteration, which improves the performance of the model. We repeat this until no improvement is observed on removing a feature.
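
A common way to implement this, sketched here as an assumption about the article's approach, is to repeatedly drop the predictor with the highest p-value in an OLS fit until every remaining predictor is significant (the 0.05 cut-off is illustrative):

```python
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("mtcars.csv")  # hypothetical path
X, y = df.drop(columns=["mpg"]), df["mpg"]

features = list(X.columns)
while features:
    model = sm.OLS(y, sm.add_constant(X[features])).fit()
    pvalues = model.pvalues.drop("const")
    worst = pvalues.idxmax()          # least significant remaining feature
    if pvalues[worst] <= 0.05:        # everything left is significant: stop
        break
    features.remove(worst)
print(features)
```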

The backward elimination method selected three features: wt, qsec, and am. These are the final features given by backward elimination.

C] Recursive Feature Elimination

It is a greedy optimization algorithm which aims to find the best-performing feature subset. It repeatedly creates models and sets aside the best- or worst-performing feature at each iteration, then constructs the next model with the remaining features until all the features are exhausted. Finally, it ranks the features based on the order of their elimination.
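
scikit-learn ships this as RFE; a minimal sketch, with the linear estimator and the number of features to keep both assumed:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import RFE

df = pd.read_csv("mtcars.csv")  # hypothetical path
X, y = df.drop(columns=["mpg"]), df["mpg"]

# Fit, drop the weakest feature, refit on the rest, and repeat.
rfe = RFE(estimator=LinearRegression(), n_features_to_select=3)  # 3 is assumed
rfe.fit(X, y)
print(X.columns[rfe.support_])  # selected features
print(rfe.ranking_)             # 1 = selected; larger = eliminated earlier
```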

The Recursive Feature Elimination method selected three features: wt, qsec, and am. These are the final features given by Recursive Feature Elimination.

3. Embedded Method

Embedded methods are iterative in the sense that they take care of each iteration of the model training process and extract the features which contribute the most to the training for that iteration. Regularization methods are the most commonly used embedded methods; they penalize a feature's coefficient, shrinking it toward zero. Here we will do feature selection using Lasso (L1) regularization: if a feature is irrelevant, the Lasso penalizes its coefficient and drives it to 0. The features with coefficient = 0 are therefore removed and the rest are kept.
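
A minimal sketch with scikit-learn, where the cross-validated choice of the penalty strength is an assumption rather than the article's exact setup:

```python
import pandas as pd
from sklearn.linear_model import LassoCV
from sklearn.feature_selection import SelectFromModel

df = pd.read_csv("mtcars.csv")  # hypothetical path
X, y = df.drop(columns=["mpg"]), df["mpg"]

# Pick the L1 penalty strength by cross-validation, then keep only the
# features whose coefficients the Lasso left (effectively) non-zero.
lasso = LassoCV(cv=5).fit(X, y)
selector = SelectFromModel(lasso, prefit=True)
print(X.columns[selector.get_support()])
```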

Conclusion:

Filter methods do not incorporate a machine learning model to determine whether a feature is good or bad, whereas wrapper methods use a machine learning model and train it on the features to decide whether they are essential.

Filter methods are much faster than wrapper methods as they do not involve training models. On the other hand, wrapper methods are computationally costly, and in the case of massive datasets they are not the most effective feature selection methods to consider.
