#04 Feature Engineering: Principles for choosing the right features
The fundamental principles of feature engineering
Hola! Welcome to the “Short-Cut Machine Learning Series”.
This article is for anyone who wants to know …
- Reason: why is feature engineering so important?
- Big Picture: must-know skills for Feature Selection and Extraction
- Code: the simplest Python code possible
— — —
Why should you read this?
As a machine learning engineer, you will be expected to demonstrate your skill particularly in the following three steps:
- Feature Engineering: select and extract features to feed the model
- Model Selection: compare and choose the best ML model
- Generalization: tune your hyperparameters
So today’s topic is one of the most important ones: “Feature Engineering”. Let’s get started!
— — —
Menu
- Why is Feature Engineering always necessary for ML?
- The overall picture for Feature Engineering
- Type A: Feature Selection
- Type B: Feature Extraction (Dimensionality Reduction)
- References
1. Why is Feature Engineering necessary for ML?
Basically, a machine learning algorithm treats every feature it is given equally. As a result, it can pick up correlations even with features that have no logical relationship to the labels.
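Here is a tiny sketch of that problem with made-up data: a plain linear regression still assigns a non-zero weight to a column of pure noise, simply because it evaluates every column it is given.
import numpy as np
from sklearn.linear_model import LinearRegression
rng = np.random.default_rng(0)
useful = rng.normal(size=200)   # genuinely related to the target
noise = rng.normal(size=200)    # logically unrelated to the target
y = 3 * useful + rng.normal(scale=0.1, size=200)
X = np.column_stack([useful, noise])
model = LinearRegression().fit(X, y)
print(model.coef_)  # the noise column still receives a small non-zero weight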
Here are four vital benefits of feature engineering:
- Improving the accuracy of the ML model
- Reducing overfitting
- Speeding up computation
- Making the ML process easier to interpret
— — —
2. The overall picture for Feature Engineering
In order to obtain the benefits listed above, we basically try to reduce the number (dimensionality) of the features. In machine learning, there are two ways to do this:
- Type A: Feature Selection
- Type B: Feature Extraction (Dimensionality Reduction)
I’m assuming you have already heard mysterious words like PCA or LDA. Those techniques fall under this concept.
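To make the difference concrete before diving in, here is a minimal sketch on the iris data (the same dataset used later in this article): feature selection keeps a subset of the original columns, while feature extraction builds new columns out of combinations of the originals.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.decomposition import PCA
X, y = load_iris(return_X_y=True)
# Type A: keep 2 of the 4 original columns
X_sel = SelectKBest(score_func=f_classif, k=2).fit_transform(X, y)
# Type B: create 2 new columns that mix all 4 originals
X_ext = PCA(n_components=2).fit_transform(X)
print(X_sel.shape, X_ext.shape)
>>> (150, 2) (150, 2)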
5 Keywords: prerequisite concepts
Before going deeper, you should google these terms and keep at least their names in mind; they will help you follow the rest of this feature engineering topic.
- Correlation
- Dimensionality Reduction
- PCA
- LDA
- SVD
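As one example, the “Correlation” keyword can be checked in a single NumPy call; the arrays below are made up purely for illustration.
import numpy as np
rng = np.random.default_rng(0)
y = np.arange(100, dtype=float)
x = 2 * y + rng.normal(scale=5, size=100)   # strongly related to y
z = rng.normal(size=100)                    # unrelated noise
print(np.corrcoef(x, y)[0, 1])              # close to 1
print(np.corrcoef(z, y)[0, 1])              # close to 0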
Code: Scikit-learn modules for Feature Engineering
Well, are you ready to start? Don’t worry, the coding part of feature engineering is already well simplified by the scikit-learn library. (Merci, scikit-learn developers!)
# Feature Selection
sklearn.feature_selection
# Feature Extraction (Dimensionality Reduction)
sklearn.decomposition
3. Type A: Feature Selection
In the machine learning process, there are two points at which we can select efficient features for our model: before training the model and while training the model. For each, I will explain two must-know feature selection methods.
3–1. before training model
- Statistical method: Removing features with low variance
- Filter method: Univariate feature selection
3–2. while training model
- Wrapper method: Recursive feature elimination
- Embedded method: L1-based feature selection
Now I will briefly explain each feature selection technique with a few lines of code. For a deeper understanding, I also recommend reading the official documentation of each method (that is beyond the scope of this article).
3–1. before training model
I will use the Boston housing dataset for this part. So let’s get started!
# note: load_boston was removed in scikit-learn 1.2,
# so this part needs an older scikit-learn version (or another regression dataset)
from sklearn.datasets import load_boston
boston = load_boston()
# X has 13 features
X, y = boston.data, boston.target
Statistical method: Removing features with low variance
The idea is to drop features whose variance is very low: if a feature barely changes across samples, it carries almost no information for the prediction model.
# we can decide the criterion for low variance with "threshold"
from sklearn.feature_selection import VarianceThreshold
sel = VarianceThreshold(threshold=(0.8 * (1 - 0.8)))
sel_X = sel.fit_transform(X)
print(X.shape, sel_X.shape)
>>> (506, 13) (506, 11) # 13 features before, 11 after
Now you can see that the features have been reduced from 13 to 11!
Filter method: Univariate feature selection
This means scoring the relationship between each explanatory variable (X) and the target variable (y) with a univariate statistical test, and then keeping only the k most relevant features.
# we can choose the number of features with "k"
from sklearn.feature_selection import SelectKBest, f_regression
sel = SelectKBest(score_func=f_regression, k=7)
sel_X = sel.fit_transform(X, y)
print(X.shape, sel_X.shape)
>>> (506, 13) (506, 7) # 13 features before, 7 after
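If you also want to know which columns survived and how they scored, the fitted selector exposes that information (output not shown here):
# indices of the selected columns and their univariate scores
print(sel.get_support(indices=True))
print(sel.scores_)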
3–2. while training model
I will use the Iris dataset for this part. So let’s get started!
from sklearn.datasets import load_iris
iris = load_iris()
# X has 4 features
X, y = iris.data, iris.target
Wrapper method: Recursive feature elimination
While training, the feature with the smallest parameter weight is removed at each step, and the loop continues until only the specified number of features (n) remains.
# we can choose the number of features with "n_features_to_select"
from sklearn.feature_selection import RFE
from sklearn.svm import SVR
estimator = SVR(kernel="linear")
sel = RFE(estimator, n_features_to_select=2, step=1).fit(X, y)
sel_X = sel.transform(X)
# sel.ranking_ assigns rank 1 to the selected features
print(X.shape, sel_X.shape)
>>> (150, 4) (150, 2) # 4 features before, 2 after
Now you can see features are reduced from 4 to 2!
Embedded method: L1-based feature selection
The L1 penalty drives some of the model’s coefficients to exactly zero, and SelectFromModel keeps only the features whose coefficients remain non-zero.
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import LinearSVC
lsvc = LinearSVC(C=0.01, penalty="l1", dual=False).fit(X, y)
sel = SelectFromModel(lsvc, prefit=True)
sel_X = sel.transform(X)
print(X.shape, sel_X.shape)
>>> (150, 4) (150, 3) # 4 features before, 3 after (the exact count can vary by scikit-learn version)
4. Type B: Feature Extraction
Here is the list of Feature Extraction Methods:
- PCA (Principal Component Analysis)
- LDA (Linear Discriminant Analysis)
- SVD (Singular Value Decomposition)
- TSNE (t-Distributed Stochastic Neighbor Embedding)
- Word2Vec (Natural language processing)
So in this article, I will compare PCA and SVD with simple code.
Fortunately, scikit-learn provides both of them in its sklearn.decomposition module. I will use the iris dataset here.
# load scikit-learn built in dataset "iris"
from sklearn import datasets
iris = datasets.load_iris()
X_iris = iris.data
y_iris = iris.target
print(X_iris)
>>> output
[[5.1 3.5 1.4 0.2]
[4.9 3. 1.4 0.2]
[4.7 3.2 1.3 0.2]
[4.6 3.1 1.5 0.2]
.... # total 150 rows.
PCA (Principal Component Analysis)
- Subtract the mean from the data (center every feature)
- Scale each dimension by its variance
- Compute the covariance matrix of the centered data
- Compute the K largest eigenvectors of that covariance matrix
Finally, these eigenvectors are the principal components of the features.
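Here is a minimal NumPy sketch of those four steps, purely for illustration (scikit-learn’s PCA centers the data but does not scale it, so its numbers will differ slightly):
import numpy as np
def pca_by_hand(X, k):
    # 1. subtract the mean from each feature
    Xc = X - X.mean(axis=0)
    # 2. scale each dimension (standardize by its standard deviation)
    Xc = Xc / Xc.std(axis=0)
    # 3. compute the covariance matrix of the centered data
    cov = np.cov(Xc, rowvar=False)
    # 4. take the eigenvectors belonging to the k largest eigenvalues
    eigvals, eigvecs = np.linalg.eigh(cov)
    top = np.argsort(eigvals)[::-1][:k]
    # project the data onto those principal components
    return Xc @ eigvecs[:, top]
X_reduced = pca_by_hand(X_iris, 2)
print(X_reduced.shape)
>>> (150, 2)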
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_iris)
The cumulative contribution rate (explained variance ratio) indicates how much of the original information the reduced dimensions can describe. The closer its sum is to 1, the more successful the dimensionality reduction of the features is.
# contribution rate of each new dimension
pca.explained_variance_ratio_
>>> [0.92461872 0.05306648] # the 2 new dimensions for the features
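To get the cumulative value mentioned above, just accumulate the ratios, for example with NumPy:
import numpy as np
# cumulative contribution rate of the kept dimensions
print(np.cumsum(pca.explained_variance_ratio_))
>>> [0.92461872 0.9776852 ]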
SVD (Singular Value Decomposition)
from sklearn.decomposition import TruncatedSVD
svd = TruncatedSVD(n_components=2)
X_svd = svd.fit_transform(X_iris)
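TruncatedSVD also exposes explained_variance_ratio_, so you can evaluate it the same way as PCA (output omitted here):
# how much variance the 2 SVD components keep
print(svd.explained_variance_ratio_)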
How should you use PCA and LDA differently?
1. PCA is for …
- data that is not uniformly distributed (for example, roughly normally distributed data)
- Unsupervised: it can be applied to both classification and clustering problems
2. LDA is for …
- Uniformly distributed data
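LDA has not appeared in code yet, so here is a minimal sketch with scikit-learn’s LinearDiscriminantAnalysis on the same iris data; unlike PCA and SVD, it is supervised and needs the labels (y_iris):
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
# supervised: fit_transform takes the class labels as well
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X_iris, y_iris)
print(X_lda.shape)
>>> (150, 2)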
— — —