Feature Preprocessing for Numerical Data — The Most Important Step

A model is only as good as its data.

Sabina Pokhrel
Analytics Vidhya


Turquoise wooden wall that needs a retouch (Photo by Maarten Deckers on Unsplash)

Feature preprocessing is the most important step in data mining. In this post, I will introduce the concept of feature preprocessing, explain why it matters, describe the different types of machine learning models, and walk through several preprocessing techniques for numerical features.

The quality of a model largely depends on the data fed into it. Data collected through data mining processes is often incomplete: some entries are missing (we refer to these as missing values). It is also highly susceptible to noise. The result is poor-quality data, and, as you might have heard before, a model is only as good as the data it is trained on.

This is where feature preprocessing comes in. Feature preprocessing turns raw data into a form that a machine learning model can use.
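As a small illustration of turning raw data into something usable, here is a minimal sketch that fills missing values in one numerical feature column. It assumes missing entries appear as `None` or `NaN`, and it uses mean imputation only because it is one of the simplest options; the function name `impute_missing_with_mean` and the sample `ages` column are hypothetical, not part of the original post.

```python
import math
from statistics import mean


def impute_missing_with_mean(values):
    """Replace missing entries (None or NaN) in a numeric column with the
    mean of the observed entries. Mean imputation is just one simple
    strategy, assumed here for illustration."""

    def is_missing(v):
        # Treat both None and float NaN as missing values.
        return v is None or (isinstance(v, float) and math.isnan(v))

    observed = [v for v in values if not is_missing(v)]
    if not observed:
        raise ValueError("column has no observed values to compute a mean from")

    fill_value = mean(observed)
    # Keep observed values unchanged; substitute the mean for missing ones.
    return [fill_value if is_missing(v) else v for v in values]


# Hypothetical raw feature column with two missing values.
ages = [25, None, 31, float("nan"), 40]
print(impute_missing_with_mean(ages))  # → [25, 32, 31, 32, 40]
```

The mean of the observed values (25, 31, 40) is 32, so both gaps are filled with 32. Other strategies, such as the median or a model-based estimate, may suit skewed or noisy features better.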

Different types of machine learning models

First, let us look at the different categories of machine learning models. Here, we divide models into two types:

  1. Tree-based models: A tree-based model is a type of supervised learning model that provides high


AI Specialist | Machine Learning Engineer | Writer and former Editorial Associate at Towards Data Science