Feature Selection Techniques in Machine Learning

Charu Makhijani · Published in Analytics Vidhya · Jul 22, 2020 · 5 min read

A technique that makes a real difference in model performance

Photo by KDnuggets

Many people still think that Data Science/Machine Learning is just about running algorithms to get results. Yes, of course you will get results, but are they good enough? Have you ever noticed why only a few people win Data Science/Machine Learning competitions? Everyone is using the same handful of algorithms, so where is the difference?

The answer is simple: it is the DATA. The data you feed your machine learning algorithms is what makes the difference.

There is a simple ML rule: “Garbage in, garbage out.” That is why you need to pay close attention to the data being fed to the ML model, and that is where Feature Selection comes into the picture.

With the recent growth of big data, we have access to more data than ever, but much of it is noisy and contains entirely irrelevant, insignificant, and unimportant features. This not only hurts the model’s accuracy but also demands a lot of computational resources.

Feature selection is therefore critical in any machine learning pipeline: it removes irrelevant, redundant, and noisy features and preserves only the relevant ones. This improves model accuracy, reduces computational cost, and increases model interpretability.

If this brief introduction gives you enough reasons to learn this important technique, let's dig deeper and explore Feature Selection in more detail.

In this post, we will cover:

1. What is Feature Selection?

2. Importance of Feature Selection

3. Feature Selection vs. Feature Engineering vs. Dimensionality Reduction

4. Feature Selection Methods

5. Stability of Feature Selection Techniques

What is Feature Selection?

Feature selection (also known as Variable Selection or Attribute Selection) is a pre-processing technique that selects the significant features from a data set by removing the irrelevant and redundant ones, improving the performance of machine learning algorithms in terms of both accuracy and the time it takes to build the model.

Irrelevant or partially relevant features can negatively impact the model's performance. Hence, the focus of feature selection is to choose a subset of input variables that efficiently describes the data, reduces the effect of noise and irrelevant variables, and still provides good prediction results.
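To make this concrete, here is a minimal sketch of what feature selection looks like in code, assuming scikit-learn and its built-in breast cancer dataset (neither of which the article prescribes); it keeps only the 10 columns that score highest on mutual information with the target:

```python
# Minimal illustration: keep only the k most informative columns of a dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)           # 569 samples, 30 features
selector = SelectKBest(score_func=mutual_info_classif, k=10)  # k=10 is an arbitrary choice
X_selected = selector.fit_transform(X, y)             # same rows, only 10 columns

print("Original shape:", X.shape)                     # (569, 30)
print("Reduced shape: ", X_selected.shape)            # (569, 10)
print("Kept columns:  ", selector.get_support(indices=True))
```

The rows are unchanged; only a subset of the original columns survives, which is exactly the "select a subset of variables" idea described above.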

Importance of Feature Selection

1. Improves Accuracy

Once you remove the irrelevant and redundant data and feed only the most important features into the ML algorithm, accuracy improves.

2. Reduces Overfitting

With Feature Selection, the data is less redundant, which means there is less opportunity for the model to make predictions based on noise. This reduces overfitting and helps the model generalize well.

3. Reduces Training Time

When we remove irrelevant and noisy features, fewer features are fed into the ML model, which reduces the overall training time.

4. Reduces Model Complexity

Too many variables often add noise rather than predictive value. They also make the model bulky, slow, and harder to deploy in production. Eliminating noisy and irrelevant features makes the model less complex and helps it generalize better to new, unseen data.

5. Better Interpretability

With fewer features, it is easier to understand their effect on the model's output.

Feature Selection vs. Feature Engineering vs. Dimensionality Reduction

These terms are often confused. Although all three techniques are ways to obtain relevant features for the model and improve accuracy, they are different.

Feature Engineering is a technique for creating new features from existing ones, for example by aggregating or combining multiple features. Once the new features are created, the original features can be dropped or retained depending on the model's requirements.

Feature Selection, on the other hand, selects a subset of important features (without changing them) from the original feature list and drops the irrelevant and redundant ones.

In Dimensionality Reduction, features are transformed into a lower-dimensional space. It creates an entirely new, smaller feature space, and the transformation is generally not reversible because some information is lost when the dimension is reduced.
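To make the distinction concrete, the sketch below applies all three ideas to the same toy data, assuming pandas and scikit-learn; the engineered ratio feature and the choice of five features are illustrative assumptions, not part of the article:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

data = load_breast_cancer(as_frame=True)
df, y = data.data, data.target

# Feature Engineering: create a new feature from existing ones (hypothetical ratio).
df["area_per_perimeter"] = df["mean area"] / df["mean perimeter"]

# Feature Selection: keep a subset of the original columns, unchanged.
selected = SelectKBest(f_classif, k=5).fit(df, y)
print("Selected columns:", list(df.columns[selected.get_support()]))

# Dimensionality Reduction: transform into a new, smaller feature space.
X_pca = PCA(n_components=5).fit_transform(df)
print("PCA output shape:", X_pca.shape)   # new components, not original columns
```

Note that the selected columns are original, unchanged features, while the PCA output is a brand-new set of components that no longer maps one-to-one to the original columns.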

Feature Selection Methods

Feature selection methods are commonly classified into four categories: filter, wrapper, embedded, and hybrid methods.

1. Filter Methods

The filter method selects features without involving any supervised learning algorithm; it relies only on statistical properties of the data. Hence, it works with any classification algorithm and achieves more generality with less computational complexity than the wrapper and embedded methods, which makes it suitable for high-dimensional data.
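As an illustration, a filter can be as simple as ranking features by a statistic computed from the data alone. The sketch below is a hedged example assuming pandas, scikit-learn's breast cancer data, and an arbitrary cut-off of 10 features; it ranks columns by their absolute Pearson correlation with the target, with no learning algorithm involved:

```python
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer(as_frame=True)
df, y = data.data, data.target

# Filter method: score each feature with a statistic computed from the data
# alone (here, absolute Pearson correlation with the target).
scores = df.apply(lambda col: col.corr(y)).abs().sort_values(ascending=False)

top_k = 10                                   # arbitrary cut-off for illustration
selected_features = scores.head(top_k).index.tolist()
print(scores.head(top_k))
print("Selected:", selected_features)
```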

2. Wrapper Methods

The wrapper method uses a supervised learning algorithm to evaluate candidate feature subsets generated by a search strategy. It yields high classification accuracy, but only for the particular learning algorithm adopted. Hence, it is less general, and its computational cost is higher than that of the embedded and filter methods.
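A common wrapper-style example is Recursive Feature Elimination, which repeatedly trains an estimator and discards the weakest features. The sketch below is one possible setup, assuming scikit-learn, a logistic-regression estimator, standardized inputs, and a target of 10 features; all of these are illustrative choices:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)   # scaling helps the estimator converge

# Wrapper method: RFE repeatedly fits the chosen estimator and drops the
# weakest feature until only the requested number remains.
rfe = RFE(estimator=LogisticRegression(max_iter=1000),
          n_features_to_select=10, step=1)
rfe.fit(X_scaled, y)

print("Selected feature indices:", list(rfe.get_support(indices=True)))
print("Feature ranking (1 = kept):", rfe.ranking_)
```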

3. Embedded Methods

The embedded method performs feature selection as part of training a supervised learning algorithm, so it produces better accuracy mainly for the learning algorithm used in the selection process. Hence, it is less general; it is more computationally expensive than the filter method but cheaper than the wrapper method.
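A classic embedded example is L1 (lasso-style) regularization, where the penalty drives some coefficients to exactly zero during training, so selection happens inside the learner itself. The sketch below assumes scikit-learn, standardized breast cancer data, and an arbitrary regularization strength C=0.1:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# Embedded method: the L1 penalty zeroes out weak coefficients while the model
# trains, so the surviving features are read straight off the fitted model.
l1_model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
selector = SelectFromModel(l1_model).fit(X_scaled, y)

print("Kept feature indices:", list(selector.get_support(indices=True)))
print("Number of features kept:", selector.get_support().sum())
```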

4. Hybrid Methods

The combination of the filter and wrapper approaches is known as the hybrid method: a cheap filter first narrows down the candidate features, and a wrapper then searches within that smaller set.
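A minimal sketch of that two-stage idea, assuming scikit-learn and arbitrary cut-offs (15 features kept by the filter stage, 8 by the wrapper stage), could look like this:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# Step 1 (filter): cheaply narrow 30 features down to the 15 best by ANOVA F-score.
filter_step = SelectKBest(f_classif, k=15).fit(X_scaled, y)
X_filtered = filter_step.transform(X_scaled)
kept_after_filter = filter_step.get_support(indices=True)

# Step 2 (wrapper): run the more expensive RFE search only on the reduced set.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=8).fit(X_filtered, y)
final_features = kept_after_filter[rfe.get_support()]

print("Features kept after filter step:", list(kept_after_filter))
print("Final features after wrapper step:", list(final_features))
```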

Stability of Feature Selection Techniques

The stability of feature selection algorithms is an often overlooked problem. Some feature selection strategies perform very well when first built but break down when tested later, which means the selected features are unstable and may perform badly on new data.

According to “Stability of feature selection algorithm: A review”, feature stability indicates the reproducibility power of the feature selection method.

The stability of a feature selection algorithm can be viewed as its ability to produce a consistent feature subset when new training samples are added or when some training samples are removed. If the algorithm produces a different subset for every perturbation of the training data, it becomes unreliable for feature selection.

Hence, high stability of the feature selection algorithm is just as important as high classification accuracy when evaluating feature selection performance. Adding stability measurements over time provides a better feedback loop for improving feature selection in the next iteration.
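One common way to quantify this (an approach chosen here for illustration, not prescribed by the article) is to rerun the selector on bootstrap resamples of the training data and measure how similar the selected subsets are, for example with the average pairwise Jaccard similarity:

```python
from itertools import combinations

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(42)

def selected_set(X_sample, y_sample, k=10):
    """Return the set of feature indices chosen by a simple filter selector."""
    skb = SelectKBest(f_classif, k=k).fit(X_sample, y_sample)
    return set(skb.get_support(indices=True))

# Re-run the selector on bootstrap resamples of the training data.
subsets = []
for _ in range(20):
    idx = rng.integers(0, len(X), size=len(X))   # sample rows with replacement
    subsets.append(selected_set(X[idx], y[idx]))

# Stability score: mean pairwise Jaccard similarity of the selected subsets
# (1.0 = identical subset every time, values near 0 = unstable selection).
jaccard = [len(a & b) / len(a | b) for a, b in combinations(subsets, 2)]
print("Mean pairwise Jaccard similarity:", round(np.mean(jaccard), 3))
```

A score close to 1.0 means the same features are picked regardless of which samples happen to be in the training set; a low score signals the instability described above.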

Final Thoughts

We often strive for better accuracy in our models, and it is hard to get there without doing Feature Selection. I hope this article has given you a good idea of how feature selection helps you get the best out of your machine learning models. In my next post, I will cover the feature selection methods in detail.

Thanks for reading. I hope you liked the article! As always, please reach out with any questions, comments, or feedback.


Charu Makhijani

ML Engineering Leader | Writing about Data Science, Machine Learning, Product Engineering & Leadership | https://github.com/charumakhijani