How to Address Missing Data

Solutions to Three Types of Missing Data

Destin Gong
Analytics Vidhya


Missing data is one of the three most common data quality issues, alongside duplicated values and inconsistent values.

  1. Missing values are the easiest to identify. They can take various forms, e.g. null values, blank spaces or placeholders such as “unknown”. Applying a filter to the data makes missing values easier to spot.
  2. Duplicated values occur when several rows of data appear to be identical, most likely because they were mistakenly recorded multiple times.
  3. Inconsistent values usually occur when the string values of the same attribute do not follow the same naming convention, e.g. both “LA” and “Los Angeles” are present in the “City” data field (see this article for more on how to address inconsistent data).
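As a rough illustration of how these three issues might be spotted in practice, here is a minimal sketch using pandas. The sample table, its column names and the list of placeholder strings are all made up for this example; a real dataset will need its own list of placeholders.

```python
import pandas as pd

# Hypothetical sample data, invented purely for illustration.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Bob", "Cara", "Dan"],
    "city": ["LA", "Los Angeles", "Los Angeles", "unknown", None],
    "age": [34, 41, 41, None, 29],
})

# 1. Missing values: nulls plus placeholder strings (assumed list).
placeholders = ["unknown", "", " "]
cleaned = df.mask(df.isin(placeholders))  # placeholders become NaN
missing_per_column = cleaned.isna().sum()

# 2. Duplicated values: rows that exactly repeat an earlier row.
duplicate_rows = df[df.duplicated()]

# 3. Inconsistent values: list the distinct spellings so that
#    variants such as "LA" and "Los Angeles" can be reviewed by eye.
city_spellings = cleaned["city"].dropna().unique()

print(missing_per_column)
print(duplicate_rows)
print(city_spellings)
```

Treating placeholders as missing before counting is a design choice: otherwise a value such as “unknown” would be counted as a valid city.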

This article focuses mainly on why data is missing and how to address the issue.

Why is the data missing?

1. Missing completely at random (MCAR)

MCAR may be the result of data not being recorded in the first place, so the reason the data is missing is unrelated to the attribute itself. Therefore, we cannot predict which subsets of data are missing; as a result, the missing…
