Deal with Missing Data

Solomon Xie
Machine Learning Study Notes
1 min read · Jan 8, 2019

Most libraries (including scikit-learn) will give you an error if you try to build a model using data with missing values.
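Before fitting a model, it can help to check which columns actually contain missing values. The snippet below is a minimal sketch using pandas; the DataFrame and its column names (`rooms`, `area`) are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; np.nan marks a missing value.
data = pd.DataFrame({
    "rooms": [3, 4, 2],
    "area": [70.0, np.nan, np.nan],
})

# Count the missing values in each column.
missing_counts = data.isnull().sum()

# Show only the columns that have at least one missing value.
print(missing_counts[missing_counts > 0])
```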

Refer to Kaggle: Handling Missing Values

Solution 1: Drop Columns with Missing Values

In many cases, you’ll have both a training dataset and a test dataset. You will want to drop the same columns in both DataFrames.

Because dropping a column also throws away any useful information it contained, this is usually not the best solution. However, it can be useful when most values in a column are missing.
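One way to drop the same columns from both DataFrames is sketched below with pandas. The `train` and `test` DataFrames and their column names are hypothetical, and deciding which columns to drop from the training data alone is an assumption of this sketch.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are made up for illustration.
train = pd.DataFrame({
    "rooms": [3, 4, 2],
    "area": [70.0, np.nan, 55.0],
    "price": [200, 310, 150],
})
test = pd.DataFrame({
    "rooms": [5, 3],
    "area": [90.0, 65.0],
})

# Find the columns with at least one missing value in the training data.
cols_with_missing = [col for col in train.columns if train[col].isnull().any()]

# Drop those same columns from both DataFrames so the features stay aligned.
reduced_train = train.drop(columns=cols_with_missing)
reduced_test = test.drop(columns=cols_with_missing)
```

Here `area` is dropped from `test` as well, even though the test data itself has no missing values in that column.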

Solution 2: Imputation

Imputation fills in each missing value with some number, for example the mean of that column. Imputation is the standard approach, and it usually works well.
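As a minimal sketch, imputation can be done with scikit-learn's `SimpleImputer`. The mean strategy and the toy `train` and `test` DataFrames are assumptions for illustration, not something prescribed by these notes.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data; the column names are made up for illustration.
train = pd.DataFrame({"rooms": [3.0, 4.0, np.nan], "area": [70.0, np.nan, 50.0]})
test = pd.DataFrame({"rooms": [np.nan, 2.0], "area": [80.0, np.nan]})

# Replace each missing value with the mean of its column.
imputer = SimpleImputer(strategy="mean")

# Learn the column means from the training data and fill its gaps.
imputed_train = pd.DataFrame(imputer.fit_transform(train), columns=train.columns)

# Reuse the training means to fill the gaps in the test data.
imputed_test = pd.DataFrame(imputer.transform(test), columns=test.columns)
```

Fitting the imputer on the training data only, then applying it to the test data, keeps both datasets filled with the same values.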
