It's all about Outliers

Published in

Analytics Vidhya

6 min readAug 31, 2020

An outlier is a data point in a data set that is distant from all other observations. A data point that lies outside the overall distribution of the dataset. Or in a layman term, we can say, an outlier is something that behaves differently from the combination/collection of the data.

Outliers can be very informative about the subject-area and data collection process. It’s essential to understand how outliers occur and whether they might happen again as a normal part of the process or study area. To understand outliers, we need to go through these points:

what causes the outliers?
Impact of the outlier
Methods to Identify outliers

What causes the outliers?

Before dealing with the outliers, one should know what causes them. There are three causes for outliers — data entry/An experiment measurement errors, sampling problems, and natural variation.

Data entry /An experimental measurement error

An error can occur while experimenting/entering data. During data entry, a typo can type the wrong value by mistake. Let us consider a dataset of age, where we found a person age is 356, which is impossible. So this is a Data entry error.

It's all about Outliers

What causes the outliers?

Written by Ritika Singh