It's all about Outliers

Ritika Singh
Published in Analytics Vidhya
6 min read · Aug 31, 2020


An outlier is a data point that lies far from the other observations in a data set, that is, a point outside the overall distribution of the data. In layman's terms, an outlier is something that behaves differently from the rest of the data.

Outliers can be very informative about the subject area and the data collection process. It’s essential to understand how outliers occur and whether they might happen again as a normal part of the process or study area. To understand outliers, we need to go through these points:

  • What causes the outliers?
  • Impact of the outliers
  • Methods to identify outliers

What causes the outliers?

Before dealing with outliers, one should know what causes them. There are three causes of outliers: data entry or experimental measurement errors, sampling problems, and natural variation.

  1. Data entry or experimental measurement error

An error can occur while running an experiment or entering data. During data entry, a typo can record the wrong value by mistake. Consider a dataset of ages in which one person's age is recorded as 356, which is impossible. This is a data entry error.
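The age example can be turned into a simple validity check. The snippet below is only a minimal sketch: the sample values, the function name find_impossible_ages, and the plausible range of 0 to 120 years are all assumptions made for illustration, not part of any particular dataset or library.

```python
# Hypothetical age column; 356 mirrors the data entry error described above.
ages = [23, 31, 45, 356, 52, 38]

# Assumed plausible range for a human age in years (an illustrative choice).
MIN_AGE, MAX_AGE = 0, 120


def find_impossible_ages(values, low=MIN_AGE, high=MAX_AGE):
    """Return (index, value) pairs for values outside the range [low, high]."""
    return [(i, v) for i, v in enumerate(values) if not low <= v <= high]


suspect = find_impossible_ages(ages)
print(suspect)  # [(3, 356)]
```

A range check like this only catches values that are impossible by definition. An age typed as 53 instead of 35 would pass it, so it is a first filter rather than a complete test for data entry errors.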


Ritika Singh has over 4 years of experience solving data-driven problems. For opportunities, reach out here: https://www.linkedin.com/in/ritikasingh17