TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

Feature Selection and EDA in Machine Learning

11 min read · May 24, 2021

Feature Selection and EDA Cheatsheet (image by author, from website)

In the machine learning lifecycle, feature selection is a critical step that picks the subset of input features relevant to the prediction. Including irrelevant variables, especially those with poor data quality, can contaminate the model output.

Additionally, feature selection has the following advantages:

1) avoids the curse of dimensionality, since some algorithms, e.g. general linear models and decision trees, perform poorly on high-dimensional data

2) reduces computational cost and the complexity that comes with a large amount of data

3) reduces overfitting, so the model is more likely to generalize to new data

4) increases the explainability of models

In this article, we will discuss two main feature selection techniques: Filter Methods and Wrapper Methods, as well as how to take advantage of data visualization to guide decision making.
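To make the distinction concrete, here is a minimal sketch of one filter method and one wrapper method side by side. It is not taken from the original article: the synthetic dataset, the feature names `f0` to `f7`, the choice of scikit-learn's `SelectKBest` with an ANOVA F-test as the filter, and `RFE` around a logistic regression as the wrapper are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

# Synthetic data (an assumption, not the article's dataset):
# 8 features, of which the first 3 carry signal.
X, y = make_classification(
    n_samples=300,
    n_features=8,
    n_informative=3,
    n_redundant=0,
    shuffle=False,
    random_state=0,
)
feature_names = np.array([f"f{i}" for i in range(X.shape[1])])

# Filter method: score each feature independently of any model
# (here with an ANOVA F-test) and keep the top k.
filter_selector = SelectKBest(score_func=f_classif, k=3).fit(X, y)
filter_selected = feature_names[filter_selector.get_support()].tolist()

# Wrapper method: repeatedly fit a model and drop the weakest feature
# (recursive feature elimination) until k features remain.
wrapper_selector = RFE(
    estimator=LogisticRegression(max_iter=1000),
    n_features_to_select=3,
).fit(X, y)
wrapper_selected = feature_names[wrapper_selector.support_].tolist()

print("Filter method keeps: ", filter_selected)
print("Wrapper method keeps:", wrapper_selected)
```

The filter scores features without training a predictive model, so it is cheap; the wrapper retrains the model on each candidate subset, which costs more but accounts for how features behave together.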

Data Preprocessing

Before jumping into feature selection, we should always load the dataset and perform data preprocessing and data transformation.
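The article's own preprocessing code is not available here, so the following is only a rough sketch of what such a step can look like with pandas. The inline DataFrame, the column names (`age`, `income`, `segment`, `churned`) and the specific choices (median and mode imputation, one-hot encoding, standardization) are hypothetical stand-ins; in practice the data would typically be loaded from a file, for example with `pd.read_csv`.

```python
import pandas as pd

# Hypothetical toy dataset standing in for the real one.
raw = pd.DataFrame(
    {
        "age": [25, 32, None, 41, 38],
        "income": [40000, 52000, 61000, None, 58000],
        "segment": ["a", "b", "a", None, "b"],
        "churned": [0, 1, 0, 1, 0],
    }
)

df = raw.copy()
numeric_cols = ["age", "income"]
categorical_cols = ["segment"]

# Preprocessing: fill missing numeric values with the column median
# and missing categorical values with the most frequent category.
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
for col in categorical_cols:
    df[col] = df[col].fillna(df[col].mode().iloc[0])

# Transformation: one-hot encode categoricals and standardize numerics.
df = pd.get_dummies(df, columns=categorical_cols, drop_first=True)
df[numeric_cols] = (df[numeric_cols] - df[numeric_cols].mean()) / df[numeric_cols].std()

# Separate the input features from the prediction target.
X = df.drop(columns="churned")
y = df["churned"]
print(X.head())
```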

Written by Destin Gong

On my way to become a data storyteller | Website: www.visual-design.net
