Member-only story
Practical implementation of outlier detection in python
IQR, Hampel and DBSCAN method
Outliers, one of the buzzwords in the manufacturing industry, has driven engineers and scientists to develop newer algorithms as well as robust techniques for continuous quality improvement. If the data include even if one outlier, it has the potential to dramatically skew the calculated parameters. Therefore, it is of utmost importance to analyze the data without those deviant points. It is also important to understand which of the data points are considered as outliers. Extreme data points do not always necessarily mean those are outliers.
In this article, I will discuss the algorithm and the python implementation for three different outlier detection techniques. Those are Interquartile (IQR) method, Hampel method and DBSCAN clustering method.
Inter quartile range (IQR) method
Each dataset can be divided into quartiles. The first quartile point indicates that 25% of the data points are below that value whereas second quartile is considered as median point of the dataset. The inter quartile method finds the outliers on numerical datasets by following the procedure below
- Find the first quartile, Q1.
- Find the third quartile, Q3.
- Calculate the IQR…