Deal with Missing Data

Solomon Xie
Machine Learning Study Notes
Jan 8, 2019

Most libraries (including scikit-learn) will give you an error if you try to build a model using data with missing values.
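Before choosing a fix, it helps to see which columns actually contain missing values. The snippet below is a minimal sketch using pandas. The DataFrame and its column names (rooms, area, price) are made up for illustration and are not from the original notes.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are assumptions for illustration.
df = pd.DataFrame({
    "rooms": [3, 2, np.nan, 4],
    "area": [70.0, np.nan, 55.0, 90.0],
    "price": [300, 250, 200, 410],
})

# isnull() marks each missing cell as True; sum() counts them per column.
missing_per_column = df.isnull().sum()

# Show only the columns that have at least one missing value.
print(missing_per_column[missing_per_column > 0])
```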

Refer to Kaggle: Handling Missing Values

Solution 1: Drop Columns with Missing Values

Dropping every column that contains a missing value is the simplest fix. In many cases, you'll have both a training dataset and a test dataset, and you need to drop the same columns from both DataFrames.

This is usually not the best solution, because the dropped columns may hold useful information. However, it can be useful when most values in a column are missing.
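A minimal sketch of this approach with pandas might look like the following. The train and test DataFrames and their columns are hypothetical. The point is that the list of columns to drop is computed once, from the training data, and then applied to both DataFrames.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; names and values are assumptions for illustration.
train = pd.DataFrame({"rooms": [3, np.nan, 4], "area": [70.0, 55.0, 90.0]})
test = pd.DataFrame({"rooms": [2, 5], "area": [60.0, np.nan]})

# Find the columns that have at least one missing value in the training data.
cols_with_missing = [col for col in train.columns if train[col].isnull().any()]

# Drop the same columns from both DataFrames so their layouts stay identical.
reduced_train = train.drop(columns=cols_with_missing)
reduced_test = test.drop(columns=cols_with_missing)

print(cols_with_missing)
print(reduced_train.columns.tolist())
```

Note that in this toy example the test data still has a missing value in area, because that column is complete in the training data. Columns that are only incomplete in the test set need separate handling.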

Solution 2: Imputation

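Imputation fills in each missing value with some number, such as the mean of that column. The imputed value will rarely be exactly right, but keeping the column usually gives a better model than dropping it. Imputation is the standard approach, and it usually works well.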
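As a rough sketch, imputation can be done with scikit-learn's SimpleImputer. The data below is hypothetical, and the choice of strategy="mean" is an assumption for illustration; other strategies such as "median" or "most_frequent" are also available. The imputer is fitted on the training data only, and the same learned values are then reused for the test data.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data; names and values are assumptions for illustration.
train = pd.DataFrame({"rooms": [3, np.nan, 5], "area": [70.0, 50.0, 90.0]})
test = pd.DataFrame({"rooms": [2, np.nan], "area": [60.0, np.nan]})

# Replace each missing value with the mean of its column.
imputer = SimpleImputer(strategy="mean")

# Learn the column means from the training data and fill its gaps.
imputed_train = pd.DataFrame(imputer.fit_transform(train), columns=train.columns)

# Reuse the training means to fill gaps in the test data.
imputed_test = pd.DataFrame(imputer.transform(test), columns=test.columns)

print(imputed_train)
print(imputed_test)
```

Because transform returns a NumPy array, the column names are restored by wrapping the result in a new DataFrame.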


