Anomaly Detection in Python: Best Practices and Techniques

Comparison of several common anomaly detection methods applied to the Weight and Height dataset

Dmytro Iakubovskyi
Published in Data And Beyond
4 min read · Jul 7, 2023

Photo by Elisa Ventur on Unsplash

After publishing earlier articles on detailed data analytics, I often receive questions about detecting and removing outliers.

For a quick-look analysis, excluding the top and bottom percentile of a given label (as was done in one of those earlier articles) can be a valid approach. For a more detailed analysis, however, dedicated anomaly detection methods should be used instead. Here, I describe a few commonly used methods and demonstrate how they work on a specific example: the Weight and Height dataset available on Kaggle. Full details of the analysis can be found in the accompanying public Kaggle notebook.
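To make the quick-look approach concrete, here is a minimal sketch of percentile trimming with pandas. It is not taken from the notebook: the helper name `trim_percentiles`, the column names `Height` and `Weight`, and the synthetic data standing in for the Kaggle dataset are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def trim_percentiles(df, column, lower=0.01, upper=0.99):
    """Keep rows whose `column` value lies within the [lower, upper] quantiles.

    Hypothetical helper: the 1% / 99% cut-offs are an assumed default,
    not necessarily the thresholds used in the original analysis.
    """
    low, high = df[column].quantile([lower, upper])
    return df[df[column].between(low, high)]


# Synthetic stand-in data (assumed column names, made-up distribution).
rng = np.random.default_rng(42)
demo = pd.DataFrame({
    "Height": rng.normal(66.0, 4.0, 1000),
    "Weight": rng.normal(160.0, 30.0, 1000),
})

trimmed = trim_percentiles(demo, "Weight")
print(f"rows before: {len(demo)}, rows after: {len(trimmed)}")
```

Note that this kind of trimming always discards roughly the same fraction of rows, whether or not those rows are truly anomalous, which is one reason it suits only a quick look.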

Interquartile range (IQR) method

This is probably the most common method of outlier detection. For our dataset, it gives the following results (note for metric…
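As a rough illustration of the IQR rule itself, the sketch below flags values that fall outside the usual Tukey fences, Q1 - 1.5 * IQR and Q3 + 1.5 * IQR. The function name `iqr_outlier_mask`, the multiplier `k=1.5`, and the sample values are assumptions for demonstration, not the settings or data used in the notebook.

```python
import pandas as pd


def iqr_outlier_mask(values, k=1.5):
    """Return a boolean Series that is True where a value lies outside the IQR fences.

    Hypothetical helper: k=1.5 is the conventional multiplier, assumed here.
    """
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)


# Made-up sample with one obviously extreme value at the end.
heights = pd.Series([63.0, 64.5, 65.0, 66.0, 66.5, 67.0, 68.0, 69.5, 90.0])
mask = iqr_outlier_mask(heights)

print(heights[mask])   # values flagged as outliers
print(heights[~mask])  # values kept
```

Unlike percentile trimming, the fences depend on the spread of the central half of the data, so a sample with no extreme values may have nothing flagged at all.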

Dmytro Iakubovskyi: Top writer in AI, Movies | Senior data scientist | Editor in Data And Beyond | https://www.linkedin.com/in/dima806/