Different types of missing data.

Abhigyan
Published in Analytics Vidhya
May 3, 2020

Missing data is one of the most common problems that anyone working in data science or analytics will run into.

Statistically, there are three distinct types of missing data (namely MCAR, MAR and MNAR), but in real-world data a fourth type can be found, which is called structured missing data.

Let’s understand each of them individually:

1. Structured Missing Data:-

Data that is absent from the dataset for a valid reason is called structured missing data. The value is missing because it should not exist given the values of the other variables. For example, a question about the age of a respondent’s car has no answer for someone who does not own a car.
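As a rough illustration, the sketch below separates structurally missing values from unexplained ones. The records, the field names `owns_car` and `car_age_years`, and the helper `split_missing` are all hypothetical and exist only for this example.

```python
# Hypothetical survey records; None marks a missing value.
records = [
    {"owns_car": True,  "car_age_years": 4},
    {"owns_car": False, "car_age_years": None},
    {"owns_car": True,  "car_age_years": 11},
    {"owns_car": False, "car_age_years": None},
    {"owns_car": True,  "car_age_years": None},  # owner, but no answer given
]


def split_missing(rows):
    """Return row indices of structurally missing and unexplained gaps."""
    structural, unexplained = [], []
    for i, row in enumerate(rows):
        if row["car_age_years"] is None:
            if row["owns_car"]:
                # An owner should have a car age, so this gap needs handling.
                unexplained.append(i)
            else:
                # A non-owner cannot have a car age, so the gap is expected.
                structural.append(i)
    return structural, unexplained


structural, unexplained = split_missing(records)
print("structurally missing rows:", structural)
print("unexplained missing rows:", unexplained)
```

Only the unexplained gap would be a candidate for imputation; the structural gaps are better left as they are or encoded as "not applicable".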

2. Missing at Random (MAR):-

Missing at random means that the tendency for a data point to be missing is not related to the missing value itself, but is related to some of the observed data in the dataset.

The takeaway for MAR is that the missing values can, to some extent, be predicted from other variables in the dataset.

When data is missing at random, we need to use either an advanced imputation method, such as multiple imputation, or an analysis method specifically designed for missing at random data.
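To make the idea concrete, here is a small simulation, offered as a sketch under stated assumptions rather than a recipe. The data, the variable names and the missingness probabilities are invented for illustration. It uses single regression imputation, which is a simpler stand-in for the multiple imputation mentioned above, and only the Python standard library.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Simulated data (an assumption): age is always observed, income depends on age.
ages = [rng.uniform(20, 60) for _ in range(5000)]
incomes = [1000 + 50 * a + rng.gauss(0, 200) for a in ages]

# MAR mechanism: the chance that income is missing depends only on age
# (an observed variable), never on the income value itself.
observed = [
    inc if rng.random() > (0.8 if a > 40 else 0.1) else None
    for a, inc in zip(ages, incomes)
]


def mean(xs):
    return sum(xs) / len(xs)


# Complete-case analysis: drop the rows where income is missing.
pairs = [(a, inc) for a, inc in zip(ages, observed) if inc is not None]
complete_case_mean = mean([inc for _, inc in pairs])

# Least-squares fit of income on age, using the complete cases only.
a_bar = mean([a for a, _ in pairs])
i_bar = complete_case_mean
slope = sum((a - a_bar) * (inc - i_bar) for a, inc in pairs) / sum(
    (a - a_bar) ** 2 for a, _ in pairs
)
intercept = i_bar - slope * a_bar

# Fill each missing income with its prediction from age.
filled = [
    inc if inc is not None else intercept + slope * a
    for a, inc in zip(ages, observed)
]

true_mean = mean(incomes)
imputed_mean = mean(filled)
print(f"true mean:          {true_mean:.0f}")
print(f"complete-case mean: {complete_case_mean:.0f}")
print(f"imputed mean:       {imputed_mean:.0f}")
```

In this setup the complete-case mean comes out too low, because older people (who earn more here) are dropped more often, while the estimate that uses age to fill the gaps lands close to the true mean. Single imputation like this understates uncertainty, which is one reason multiple imputation is preferred in real analyses.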

3. Missing Completely at Random (MCAR):-

The fact that a certain value is missing has nothing to do with its hypothetical value or with the values of other variables.

MCAR data rarely occurs in practice. One case where the condition is met is an experiment in which you deliberately remove a small percentage (say 5–10%) of the data elements at random.

When data is missing completely at random, we can run the analysis using only the complete part of the data (complete features, or complete rows), provided enough of it remains.
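The random-removal experiment described above can be sketched in a few lines. The simulated values and the 10% removal rate are assumptions chosen for illustration, and the complete-case step here drops incomplete rows.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Simulated measurements with no missing values yet.
values = [rng.gauss(50, 10) for _ in range(10000)]

# MCAR mechanism: every value has the same 10% chance of being removed,
# regardless of its own size or of anything else in the data.
with_gaps = [v if rng.random() >= 0.10 else None for v in values]

# Complete-case analysis: keep only the values that survived.
complete = [v for v in with_gaps if v is not None]

full_mean = sum(values) / len(values)
complete_mean = sum(complete) / len(complete)
missing_share = 1 - len(complete) / len(values)

print(f"share missing:      {missing_share:.3f}")
print(f"mean of full data:  {full_mean:.2f}")
print(f"mean after removal: {complete_mean:.2f}")
```

Because the removal ignores the data entirely, the mean of the remaining values stays close to the mean of the full data; the only cost is a smaller sample.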

MCAR is rarely a safe assumption. It is only likely to hold when the data is missing due to some truly random phenomenon (e.g., if people were randomly asked 10 of 15 questions in a questionnaire).

Missing at random (MAR) is always a safer assumption than missing completely at random (MCAR), because MCAR is a special case of MAR: any analysis that is valid under the MAR assumption is also valid when the data is in fact MCAR, but the opposite is not the case.

4. Missing Not at Random (MNAR):-

Data that is neither MCAR nor MAR is called missing not at random (MNAR), sometimes written not missing at random (NMAR). MNAR data is the most complicated type, both to detect and to deal with. The fact that the data is missing is related to the unobserved data, i.e. the values we don’t have, or to factors that we didn’t account for.

It is common to assume that data is MAR unless there is good reason to believe otherwise, and most procedures for handling missing data depend on the MAR assumption. When data is missing not at random (MNAR), the standard methods for dealing with missing data (e.g., imputation, or algorithms specifically designed for missing values) cannot be relied on, and standard calculations are likely to give biased answers.
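A short simulation can show why standard fixes struggle with MNAR. Everything in it is an assumption made for illustration: the income distribution, the rule that people above 3000 withhold their income 70% of the time, and the choice of mean imputation as the "standard" method.

```python
import random

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Simulated incomes; in this sketch nothing else is recorded about each person.
incomes = [rng.gauss(3000, 500) for _ in range(10000)]

# MNAR mechanism: the chance of a value being missing depends on the value
# itself. High earners withhold their income far more often.
reported = [
    inc if rng.random() > (0.7 if inc > 3000 else 0.1) else None
    for inc in incomes
]

observed = [inc for inc in reported if inc is not None]
observed_mean = sum(observed) / len(observed)

# Mean imputation: fill every gap with the mean of the observed values.
filled = [inc if inc is not None else observed_mean for inc in reported]

true_mean = sum(incomes) / len(incomes)
filled_mean = sum(filled) / len(filled)

print(f"true mean:              {true_mean:.0f}")
print(f"mean of observed:       {observed_mean:.0f}")
print(f"mean after mean-filling: {filled_mean:.0f}")
```

The observed mean is too low, and filling the gaps with that same mean cannot move it. The information needed to correct the bias lives in the values that were never recorded, so it has to come from outside the dataset, for example from domain knowledge about why people do not respond.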

[Image: a small cheat sheet for the types of missing data]
