
All about Feature Selection

1. What is Feature Selection?

Feature selection is the process of selecting a subset of features/attributes (such as columns in tabular data) that are most relevant to the modeling and business objective of the problem. In other words, it helps find the most meaningful inputs in the data.

2. What is the importance of Feature Selection?

There is a saying in ML: “Garbage Data In, Garbage Results Out”. In real-world problems it is not always clear which features of the data will help you model the problem you are attempting to solve and get the best results. Hence, we need to choose features that give better accuracy while requiring less data, and remove unwanted, irrelevant, and redundant attributes that contribute little to the accuracy of a predictive model or may, in fact, decrease it. Feature selection also becomes important when the number of features is very large, since we do not need to use every feature at our disposal.

3. What are some other names of Feature Selection?

Feature selection is also called variable selection, variable subset selection, and attribute selection.

4. Why should I use a subset of the data instead of whole data?

Sometimes less data is better, because unnecessary data can affect your model negatively. Fewer features can also reduce training and testing times drastically, and in some cases even improve accuracy.

5. How is Feature Selection different from Dimensionality Reduction?

Both strive to reduce the feature space of the data, but Dimensionality Reduction does so by creating new combinations of the existing attributes, whereas Feature Selection methods include or exclude features already present in the dataset without altering them.
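To make the contrast concrete, here is a minimal sketch assuming scikit-learn and its bundled iris dataset (both arbitrary illustrative choices): PCA builds new columns that are combinations of all the original features, while a univariate selector simply keeps some of the original columns unaltered.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)            # 4 original features

# Dimensionality reduction: 2 NEW features, each a combination of all 4
X_pca = PCA(n_components=2).fit_transform(X)

# Feature selection: 2 of the ORIGINAL features, kept as-is
selector = SelectKBest(score_func=f_classif, k=2)
X_sel = selector.fit_transform(X, y)

print(X_pca.shape, X_sel.shape)              # both (150, 2)
print(selector.get_support())                # which original columns were kept
```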

6. What are the top reasons for Feature Selection?
  • It helps the machine learning model train and test faster
  • It simplifies the model and makes it easier to interpret
  • It improves the accuracy of the model if the right subset of features is chosen
  • It enhances generalization by reducing overfitting
  • It helps avoid the curse of dimensionality
7. What are some of the methods of Feature Selection?
  1. Exhaustive Search
  2. Filter Methods
  3. Wrapper Methods
  4. Embedded Methods
8. What is Exhaustive Search in Feature Selection?

Exhaustive search tests every combination of features and returns the one that leads to the lowest loss for the model. For N features it requires evaluating 2^N − 1 non-empty subsets. It works well when there are few features and is guaranteed to find the best possible combination, but it takes a huge amount of time on datasets with many features.
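A minimal sketch of exhaustive search, assuming scikit-learn and a small toy dataset; the model (logistic regression) and the 5-fold cross-validated accuracy score are illustrative choices, not a prescription.

```python
from itertools import combinations
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
n_features = X.shape[1]

best_score, best_subset = -float("inf"), None
# Evaluate all 2^N - 1 non-empty feature subsets
for k in range(1, n_features + 1):
    for subset in combinations(range(n_features), k):
        score = cross_val_score(
            LogisticRegression(max_iter=1000), X[:, list(subset)], y, cv=5
        ).mean()
        if score > best_score:
            best_score, best_subset = score, subset

print(best_subset, best_score)
```

With only 4 features this loops over 15 subsets; with 30 features it would already be over a billion, which is why the approach does not scale.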

9. What are Filter Methods?

Filter methods of feature selection apply a statistical measure to assign a score to each attribute/feature, and selection is done according to the resulting ranking. These methods are often univariate and consider each feature independently. They are computationally efficient and robust to overfitting, but they tend to select redundant variables because they do not consider the relationships between features.
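A minimal sketch of the filter workflow, assuming scikit-learn: each feature is scored independently (here with a chi-squared test, an illustrative choice that requires non-negative feature values) and only the top-k are kept.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)

selector = SelectKBest(score_func=chi2, k=2)   # rank features, keep the 2 best
X_new = selector.fit_transform(X, y)

print(selector.scores_)        # univariate score for every feature
print(selector.get_support())  # boolean mask of the selected columns
```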

10. What are some of the examples of Filter methods?
  • Chi-Squared Test
  • Information Gain
  • F Test
  • ANOVA
  • Correlation Coefficient Scores (e.g. Pearson’s Correlation Coefficient)
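As a quick illustration of the last score in the list, here is a sketch of correlation-based filtering with pandas; the 0.3 threshold is an arbitrary assumption for the example, not a recommended value.

```python
from sklearn.datasets import load_iris

data = load_iris(as_frame=True)
df, target = data.data, data.target

# Absolute Pearson correlation of each feature with the target
corr = df.apply(lambda col: col.corr(target)).abs().sort_values(ascending=False)
print(corr)

selected = corr[corr > 0.3].index.tolist()   # keep features above the threshold
print(selected)
```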
11. What are Wrapper Methods?

A family of methods that treat the selection of a set of features as a search problem: various combinations are prepared, evaluated, and compared with other combinations using a predictive model, and scored based on model accuracy. Based on the inferences from the previous model, we decide to add or remove features from the subset. These methods are usually computationally expensive and carry a higher risk of overfitting.
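A minimal sketch of one wrapper method, Recursive Feature Elimination, assuming scikit-learn; the logistic regression estimator and the target of 2 features are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Repeatedly fit the model and drop the weakest feature until 2 remain
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=2)
rfe.fit(X, y)

print(rfe.support_)   # mask of the selected features
print(rfe.ranking_)   # 1 = selected; larger numbers were eliminated earlier
```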

12. What are some of the examples of Wrapper methods?
  • Methodical: Best First Search, DFS
  • Stochastic: Random Hill Climbing, Simulated Annealing
  • Heuristics: Forward Selection, Backward Elimination, Recursive Feature Elimination
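Forward selection from the list above can be sketched with scikit-learn’s SequentialFeatureSelector (available in scikit-learn 0.24 and later); the k-nearest-neighbors estimator and the target of 2 features are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Start from an empty set and greedily add the feature that improves
# cross-validated accuracy the most, until 2 features are selected.
sfs = SequentialFeatureSelector(
    KNeighborsClassifier(), n_features_to_select=2, direction="forward", cv=5
)
sfs.fit(X, y)
print(sfs.get_support())
```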
13. What are Embedded Methods?

Embedded methods combine the advantages of both filter and wrapper methods. They are implemented by learning algorithms that have their own built-in feature selection mechanism. The most common type of embedded feature selection is regularization.
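A minimal sketch of regularization-based embedded selection, assuming scikit-learn: L1 regularization (Lasso) drives some coefficients to exactly zero, so selection falls out of training; the diabetes dataset and alpha value are illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

X, y = load_diabetes(return_X_y=True)

lasso = Lasso(alpha=1.0).fit(X, y)
print(np.sum(lasso.coef_ != 0), "of", X.shape[1], "coefficients are non-zero")

# Keep only the features whose Lasso coefficients survived regularization
X_sel = SelectFromModel(lasso, prefit=True).transform(X)
print(X_sel.shape)
```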

14. What are some of the examples of Embedded methods?
  • Lasso Regression
  • Ridge Regression
  • Elastic Nets
  • Decision Trees
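Decision trees from the list above also select features as a by-product of training: impurity-based feature importances come for free once the tree is fit. A minimal sketch, assuming scikit-learn; the default threshold (mean importance) is simply what SelectFromModel uses here.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

tree = DecisionTreeClassifier(random_state=0).fit(X, y)
print(tree.feature_importances_)             # importance score per feature

# Keep features whose importance exceeds the mean importance (the default)
X_sel = SelectFromModel(tree, prefit=True).transform(X)
print(X_sel.shape)
```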
15. What are some of the Key Differences between Filter methods (FM) and Wrapper methods (WM)?
  • FM measures the relevance of features by their correlation with the dependent variable, while WM measures the usefulness of a subset of features, taking the relationships between features into account
  • FM mainly uses statistical methods for evaluation, whereas WM uses cross-validation
  • FM is much faster to compute, whereas WM is computationally expensive
  • FM may fail to find the best subset of features in many instances, whereas WM can usually find a better subset
  • The subsets of features chosen by WM are more prone to overfitting than those chosen by FM
16. Do you have good domain knowledge about the data?

If yes, then construct an initial ad-hoc set of features based on that knowledge; you can then narrow it down further using the other feature selection methods described above.