A Second Step into Feature Engineering: Feature Selection

Simone Azeglio
MLJC
Aug 21, 2020

We are ready to start with the second part of Feature Engineering (if you’ve missed the previous article, you can find it here). In this short article we’ll go through a few simple techniques in Feature Selection and Extraction.

Not all features are created equal

Zhe Chen

Feature Selection

There will always be some features that are less relevant to a specific problem, and those irrelevant features need to be removed. Feature selection addresses this by automatically selecting the subset of features that is most useful to the problem.

Most of the time, reducing the number of input variables shrinks the computational cost of modeling, and sometimes it also improves the performance of the model.

Among the large number of feature selection methods, we’ll focus mainly on statistics-based ones. They evaluate the relationship between each input variable and the target variable using statistical measures. These methods are usually fast and effective; the only caveat is that the choice of statistical measure depends on the data type of both the input and output variables.

The classes in the sklearn.feature_selection module can be used for feature selection/dimensionality reduction on sample sets.

Whenever you want to go for a simple approach, there’s always a threshold involved. VarianceThreshold is a simple baseline approach to feature selection: it removes all features whose variance doesn’t reach a certain threshold.
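Here’s a minimal sketch of how it’s used (the toy data below is made up for illustration; the first column is nearly constant, so it gets dropped):

from sklearn.feature_selection import VarianceThreshold

# Toy dataset: the first feature barely varies, the other two do.
X = [[0, 2.0, 3.0],
     [0, 1.0, 0.5],
     [0, 4.0, 2.0],
     [1, 3.0, 1.5]]

# Remove every feature whose variance is below the threshold.
selector = VarianceThreshold(threshold=0.2)
X_reduced = selector.fit_transform(X)

print(selector.variances_)  # per-feature variances
print(X_reduced.shape)      # (4, 2): the near-constant column is gone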

Univariate Feature Selection

Univariate feature selection examines each feature individually to determine the strength of the relationship of the feature with the response variable.

There are a few different options for univariate selection:
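In sklearn.feature_selection, for instance: SelectKBest keeps the k highest-scoring features; SelectPercentile keeps a user-specified top percentage of features; SelectFpr, SelectFdr and SelectFwe select features according to a statistical test (false positive rate, false discovery rate and family-wise error, respectively); and GenericUnivariateSelect lets you switch between these strategies with a configurable parameter.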

For example, we can apply the chi-squared (𝝌²) test to the samples to retrieve only the two best features:
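A minimal sketch using the iris dataset (the same setup as in scikit-learn’s documentation):

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

# Small classification dataset with 4 features.
X, y = load_iris(return_X_y=True)
print(X.shape)  # (150, 4)

# Keep the 2 features with the highest chi-squared score w.r.t. the target.
X_new = SelectKBest(chi2, k=2).fit_transform(X, y)
print(X_new.shape)  # (150, 2)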

We have different scoring functions for regression and classification, some of them are listed here:
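For regression: f_regression, mutual_info_regression. For classification: chi2, f_classif, mutual_info_classif. All of them live in sklearn.feature_selection; be careful not to use a regression scoring function on a classification problem (or vice versa), or the results will be meaningless.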

Recursive Feature Elimination

Recursive Feature Elimination (RFE), as its name suggests, recursively removes features, builds a model using the remaining attributes and calculates model accuracy. RFE is able to work out the combination of attributes that contributes to the prediction of the target variable.

Given an external estimator that assigns weights to features (e.g., the coefficients of a linear model), the goal of recursive feature elimination (RFE) is to select features by recursively considering smaller and smaller sets of features.
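A sketch of typical usage, on a synthetic dataset and with logistic regression as the external estimator (an arbitrary choice for illustration; any estimator exposing coef_ or feature_importances_ works):

from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic dataset: 10 features, only 3 of which are informative.
X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=3, random_state=0)

# The estimator assigns weights (coefficients) to features;
# RFE drops the weakest feature at each step and refits until 3 remain.
estimator = LogisticRegression(max_iter=1000)
selector = RFE(estimator, n_features_to_select=3, step=1)
selector.fit(X, y)

print(selector.support_)   # boolean mask of the selected features
print(selector.ranking_)   # rank 1 = selected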

Feature Extraction (Bonus)

Feature extraction is very different from feature selection: the former consists of transforming arbitrary data, such as text or images, into numerical features usable for machine learning; the latter is a machine learning technique applied to these features.

We’ve decided to show you a couple of standard techniques from sklearn.

Loading Features from Dicts

The class DictVectorizer transforms lists of feature-value mappings to vectors.

In particular, it turns lists of mappings (dict-like objects) of feature names to feature values into Numpy arrays or scipy.sparse matrices for use with scikit-learn estimators.

While not particularly fast to process, Python’s dict has the advantages of being convenient to use, being sparse (absent features need not be stored) and storing feature names in addition to values.
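A minimal sketch (the city/temperature toy data is just for illustration, mirroring the example in scikit-learn’s documentation):

from sklearn.feature_extraction import DictVectorizer

# Each sample is a dict of feature name -> value; categorical values are
# one-hot encoded, numerical values are passed through unchanged.
measurements = [
    {"city": "Dubai", "temperature": 33.0},
    {"city": "London", "temperature": 12.0},
    {"city": "San Francisco", "temperature": 18.0},
]

vec = DictVectorizer(sparse=False)
X = vec.fit_transform(measurements)

print(vec.get_feature_names_out())  # older sklearn: get_feature_names()
# ['city=Dubai' 'city=London' 'city=San Francisco' 'temperature']
print(X)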

DictVectorizer is also a useful representation transformation for training sequence classifiers in Natural Language Processing (NLP).

Feature Hashing

Named as one of the best hacks in machine learning, feature hashing is a fast and space-efficient way of vectorizing features, i.e. turning arbitrary features into indices in a vector or matrix. For this topic, sklearn’s documentation on FeatureHasher is exhaustive.
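A small sketch (the feature names and the 8-column output size are arbitrary choices for illustration):

from sklearn.feature_extraction import FeatureHasher

# Hash arbitrary feature names into a fixed-size vector (here 8 columns),
# so the full vocabulary never has to be stored in memory.
hasher = FeatureHasher(n_features=8, input_type="dict")

samples = [
    {"dog": 1, "cat": 2, "elephant": 4},
    {"dog": 2, "run": 5},
]

X = hasher.transform(samples)  # scipy.sparse matrix of shape (2, 8)
print(X.toarray())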

Feature Construction

There’s no strict recipe for Feature Construction; I personally consider it 99% creativity. We’re going to take a look at some use cases in the next lectures, though.

For now, you should take a look at the Feature Extraction part of this marvellous notebook from Beluga, one of the best Competitions Grandmasters on Kaggle.

In the last two articles we’ve introduced Feature Engineering as a step that follows Feature Processing. As you can see, we’re building a data processing pipeline; the next step will be finding a way to deal with missing values. Stay tuned for the next article, and don’t forget to take a look at our GitHub page, where you’ll find the code related to this series of articles.

Physics of Complex Systems Master Student | University of Turin — Visiting Research Student | University of Ottawa.