How to Handle Missing Data in Machine Learning (Part 2)

Federico Viscioletti
3 min read · May 29, 2024
I am still trying to create my avatar using DALL-E3, but I am not there yet. He looks way older than I am :)

Introduction

In this part, I’ll walk you through a practical example of handling missing data, using a dataset that contains missing values. I will show different imputation techniques and discuss their impact.

Example: Handle Missing Data in the Titanic dataset

I’ll use the Titanic dataset, which has missing values in columns such as age and embarked (and, in the seaborn version loaded below, deck).

import pandas as pd
import seaborn as sns
from sklearn.impute import SimpleImputer, KNNImputer
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer
from sklearn.model_selection import train_test_split

# Load the Titanic dataset
df = sns.load_dataset('titanic')

Now let’s have a look at the first 5 rows of the dataframe:
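Before imputing anything, it helps to see how much is actually missing. Here is a minimal sketch of that inspection step. To keep it runnable without a download, it uses a tiny hand-made frame called `toy` (a hypothetical stand-in: the column names follow the seaborn dataset, but the values are made up). The same calls should work unchanged on the real `df`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Titanic frame: column names follow the
# seaborn dataset, but the values below are invented for illustration.
toy = pd.DataFrame({
    "survived": [0, 1, 1, 0, 1, 0],
    "pclass": [3, 1, 3, 2, 1, 3],
    "age": [22.0, 38.0, np.nan, 35.0, np.nan, 54.0],
    "embarked": ["S", "C", "S", None, "Q", "S"],
})

# First five rows, the same call you would make on the real `df`
print(toy.head())

# Number of missing values per column
missing_counts = toy.isna().sum()
print(missing_counts)

# Share of missing values per column, as a percentage
missing_pct = toy.isna().mean() * 100
print(missing_pct.round(1))
```

On the real data, replacing `toy` with `df` gives the per-column counts that tell you which columns need imputation at all and which are so sparse that dropping them may be the better option.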
