Handling missing data with SciKit SimpleImputer

Youness ECHCHADI
2 min read · Jun 24, 2019

When working on data science projects, you will very likely encounter missing data in your columns. Dropping every row that contains a missing value is rarely ideal: the other columns in that row may hold information that is critical at the data preparation stage. It is usually wiser to infer the missing values, or find another way to fill them in, for a better outcome.

There are many ways to fill in the ‘null’, ‘nan’ or ‘na’ values in a dataset. scikit-learn offers one simple solution with SimpleImputer (formerly Imputer, which was deprecated in version 0.20 and removed in version 0.22 of scikit-learn).

Let’s get to the code part:

Let’s consider an array named X that contains some missing values.

Original array X
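The original screenshot of X is not available here, so the snippet below is only a stand-in. The values are made up for illustration; all that matters is that some entries are np.nan. The missing_per_column name is also just an illustrative helper, not something from the original post.

```python
import numpy as np

# Hypothetical data: these values are NOT from the original post,
# they only illustrate an array with a few missing entries.
X = np.array([
    [1.0, 2.0, np.nan],
    [4.0, np.nan, 6.0],
    [7.0, 8.0, 9.0],
])

# Count how many values are missing in each column.
missing_per_column = np.isnan(X).sum(axis=0)

print(X)
print(missing_per_column)
```

Checking the number of missing values per column first is a quick way to see how much of the data the imputer will have to fill in.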

Here is the array X after replacing each missing value with the mean of the non-missing values in the same column.

New array X with replaced ‘nan’ values
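Since the screenshot is missing, here is a minimal sketch of how that replacement could be done with SimpleImputer. The array values are hypothetical (not the ones from the original post); the imputer settings assume missing entries are encoded as np.nan and that the column mean is the desired fill value.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical data: values chosen for illustration only.
X = np.array([
    [1.0, 2.0, np.nan],
    [4.0, np.nan, 6.0],
    [7.0, 8.0, 9.0],
])

# Replace every np.nan with the mean of the non-missing values
# in the same column.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X_imputed = imputer.fit_transform(X)

print(X_imputed)
# Column means learned during fit, one per column.
print(imputer.statistics_)
```

With this sample data, the missing value in the second column becomes 5.0 (the mean of 2 and 8) and the one in the third column becomes 7.5 (the mean of 6 and 9). The strategy parameter also accepts "median", "most_frequent" and "constant" (the last one together with fill_value) if the mean is not a good fit for your data.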

Originally posted on youness.dev
