Anomaly Detection in Python: Best Practices and Techniques
Comparison of several common anomaly detection methods applied to the Weight and Height dataset
When writing previous articles on detailed data analytics, I often receive questions about detecting and removing outliers.
For a quick-look analysis, excluding the top and bottom percentile of a certain variable (similar to what was done in one of those earlier articles) may be considered a valid approach. However, a more detailed analysis calls for other methods of anomaly detection. Here, I discuss a few commonly used methods and demonstrate how they work on a specific example: the Weight and Height dataset available on Kaggle. Full details of the analysis can be found in this public Kaggle notebook.
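To make the quick-look approach concrete, here is a minimal sketch of percentile trimming with pandas. It runs on synthetic data rather than the Kaggle file, and the column name `height_cm`, the helper `trim_percentiles`, and the 1% / 99% cutoffs are assumptions chosen for illustration, not values taken from the notebook.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; "height_cm" is a hypothetical column name,
# not necessarily the one used in the Kaggle dataset.
rng = np.random.default_rng(0)
df = pd.DataFrame({"height_cm": rng.normal(170, 10, size=1000)})


def trim_percentiles(frame, column, lower=0.01, upper=0.99):
    """Keep rows whose `column` value lies within the [lower, upper] quantiles."""
    low_cut, high_cut = frame[column].quantile([lower, upper])
    # Series.between is inclusive on both ends by default.
    return frame[frame[column].between(low_cut, high_cut)]


trimmed = trim_percentiles(df, "height_cm")
print(f"rows before: {len(df)}, rows after: {len(trimmed)}")
```

Note that this always discards a fixed share of the rows, whether or not they are truly anomalous, which is one reason it suits only a quick first pass.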
Interquartile range (IQR) method
This is probably the most common method of outlier detection. For our dataset, it gives the following results (note for metric…
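As a rough illustration of how the IQR rule flags points, the sketch below marks values lying more than `k` interquartile ranges below the first quartile or above the third. The multiplier `k=1.5` is the conventional choice and is assumed here; the helper `iqr_outlier_mask`, the series name `height_cm`, and the synthetic data with two planted extremes are likewise hypothetical and do not reproduce the article's results.

```python
import numpy as np
import pandas as pd


def iqr_outlier_mask(values, k=1.5):
    """Return a boolean Series that is True where a value falls outside the IQR fences."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)


# Synthetic heights with two planted extreme values appended at the end.
rng = np.random.default_rng(42)
heights = pd.Series(
    np.append(rng.normal(170, 10, size=500), [90.0, 260.0]),
    name="height_cm",
)

mask = iqr_outlier_mask(heights)
print(heights[mask])  # flagged values, including the two planted extremes
```

Unlike percentile trimming, the number of flagged points here depends on the spread of the data, so a clean sample may produce few or no outliers.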