TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

Imputing Missing Data with Simple and Advanced Techniques

Idil Ismiguzel
Published in TDS Archive
8 min read · May 12, 2022

Photo by Alessio Roversi on Unsplash

Missing data occurs when there is no data stored for a variable of interest in a dataset. Depending on its volume, missing data can harm the findings of any data analysis or the robustness of machine learning models.

When dealing with missing data in Python, the dropna() function from pandas comes in handy. We use it to remove rows or columns that contain null values. It has several parameters: axis defines whether rows (axis=0) or columns (axis=1) are dropped, how determines whether a row or column is dropped when any or all of its values are missing, subset selects the labels along the other axis to check (for example, the columns to inspect when dropping rows), and inplace controls whether the DataFrame is modified directly instead of a new one being returned.

df.dropna(axis=0, how='any', subset=None, inplace=False)
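To make these parameters concrete, here is a minimal sketch on a small DataFrame. The column names and values are hypothetical and chosen only so that each parameter has a visible effect.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "score" is entirely missing, "age" and "city" have one gap each
df = pd.DataFrame({
    "age": [25, np.nan, 40, 33],
    "city": ["Berlin", "Paris", None, "Rome"],
    "score": [np.nan, np.nan, np.nan, np.nan],
})

# Defaults: drop every row containing at least one missing value.
# Every row has a NaN in "score", so nothing survives.
rows_any = df.dropna(axis=0, how="any")

# Drop a column only when all of its values are missing ("score" goes)
cols_all = df.dropna(axis=1, how="all")

# Only consider "age" when deciding which rows to drop (row 1 goes)
rows_age = df.dropna(subset=["age"])

print(rows_any.shape)             # (0, 3)
print(cols_all.columns.tolist())  # ['age', 'city']
print(rows_age.index.tolist())    # [0, 2, 3]
```

Because inplace=False is the default, each call returns a new DataFrame and df itself keeps all four rows and three columns. The first call also shows why dropping can be costly: one fully empty column is enough to wipe out every row.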

However, dropping rows or columns also throws away the observed values they contain, and there are other, often better, ways of dealing with missing data. In this article, we will see how to impute (replace) missing data with simple and advanced techniques. We will first cover simple univariate techniques such as mean and mode imputation. Then, we will look at forward and backward filling for time series data and explore interpolation methods such as linear, polynomial, or quadratic interpolation for filling missing values. Later, we will explore advanced multivariate techniques and learn how to…
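As a rough illustration of the univariate techniques mentioned above, the sketch below fills a numeric column with its mean and a categorical column with its mode using plain pandas. The data and column names are hypothetical, and this is only one way to do it.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one numeric and one categorical column with gaps
df = pd.DataFrame({
    "income": [3200.0, np.nan, 4100.0, 2900.0, np.nan],
    "segment": ["A", "B", np.nan, "B", "B"],
})

# Mean imputation: Series.mean() skips NaN by default,
# so the fill value is the mean of the observed incomes (3400.0)
df["income"] = df["income"].fillna(df["income"].mean())

# Mode imputation: mode() returns a Series because ties are possible,
# so take the first (most frequent) value, here "B"
df["segment"] = df["segment"].fillna(df["segment"].mode()[0])

print(df)
```

Mean imputation suits roughly symmetric numeric data, while the mode also works for categorical columns. Both ignore relationships with other columns, which is what the multivariate techniques later in the article address.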
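For time series, the filling and interpolation options mentioned above can be sketched on a small daily series. The dates and values are hypothetical; the sketch uses pandas' Series.ffill(), Series.bfill(), and Series.interpolate().

```python
import numpy as np
import pandas as pd

# Hypothetical daily series with one single gap and one two-day gap
idx = pd.date_range("2022-05-01", periods=6, freq="D")
s = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan, 9.0], index=idx)

# Forward fill: carry the last observed value forward
forward = s.ffill()                        # 1, 1, 3, 3, 3, 9

# Backward fill: pull the next observed value backward
backward = s.bfill()                       # 1, 3, 3, 9, 9, 9

# Linear interpolation: straight line between the neighbouring
# observations, treating the points as equally spaced
linear = s.interpolate(method="linear")   # 1, 2, 3, 5, 7, 9

print(pd.DataFrame({"raw": s, "ffill": forward,
                    "bfill": backward, "linear": linear}))
```

Forward filling assumes a value stays valid until a new one is observed, while interpolation assumes a smooth change between observations. Interpolate() also accepts methods such as "quadratic" or "polynomial" (the latter needs an order argument). These methods are delegated to SciPy, so they require SciPy to be installed.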

Written by Idil Ismiguzel

Data Scientist | Writing articles on Data Science & Machine Learning | MSc, MBA | https://de.linkedin.com/in/idilismiguzel