Detailed Data Pre-processing for Machine Learning with Python (+Notebook)

Leonardo Anello
25 min read · Feb 5, 2023

This first Machine Learning tutorial covers, in detail, the complete data pre-processing workflow for building Machine Learning models.

Throughout this tutorial, we’ll cover four areas of pre-processing for machine learning: data transformation, selection, dimensionality reduction, and sampling. In a later tutorial, we will apply this process with various algorithms to help you understand what Machine Learning is and how to use it with Python.

Jupyter Notebook

See the Jupyter Notebook for the concepts we’ll cover on building machine learning models, and my Medium profile for other Data Science articles and tutorials.

First of all, we need to define the business problem. After all, we don’t create a machine learning model at random, even though Machine Learning is fun in itself. We apply Machine Learning as a tool for solving business problems: our work begins with the issue or pain point we want to address.

In other words, capturing the dataset, transforming the variables, splitting the data into training and test subsets, and training and validating the model all depend on the definition of the business problem. If we do not define and specify the problem, we will simply be working at random…
