Outlier Detection with Simple and Advanced Techniques
A tutorial on how to detect outliers using standard deviation, interquartile range, isolation forest, DBSCAN, and local outlier factor
Outliers are data points that lie far away from the majority of the observations in a dataset. They can appear for many reasons, such as natural deviations in population behavior, fraudulent activities, and human or system errors. Whatever their cause, detecting and identifying outliers is essential before running any statistical analysis or preparing data for training machine learning models, because a few extreme values can distort summary statistics and model fits.
In this article, we will cover univariate and multivariate outliers, how they differ, and how they can be identified using statistical methods and automated anomaly detection techniques. We will use the interquartile range and standard deviation methods to detect univariate outliers, and isolation forest, DBSCAN (Density-Based Spatial Clustering of Applications with Noise), and LOF (Local Outlier Factor) to detect multivariate outliers.
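To make the two univariate methods concrete before going into detail, here is a minimal sketch using only Python's standard library. The function names `iqr_outliers` and `std_outliers`, the toy data, and the thresholds (1.5 × IQR and 3 standard deviations) are my own assumptions for illustration; they are common conventions, not necessarily the exact settings used in the notebook.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (k=1.5 is an assumed convention)."""
    # method="inclusive" interpolates linearly between the sample's data points
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


def std_outliers(values, n_std=3.0):
    """Return values more than n_std standard deviations from the mean (3 is assumed)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return [v for v in values if abs(v - mean) > n_std * std]


# Hypothetical toy data: 20 ordinary values plus one extreme value
data = [10, 11, 12, 13] * 5 + [95]

print(iqr_outliers(data))
print(std_outliers(data))
```

On this toy sample, both functions flag only the value 95. Note that the standard deviation rule is itself sensitive to outliers, since extreme values inflate both the mean and the standard deviation, so on very small samples it can miss points that the interquartile range rule catches.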
As you follow along, I encourage you to check out the Jupyter Notebook on my GitHub for the full analysis and code.
We have a lot to cover, so let’s get started! 🚀