Anomaly Detection in Python: Best Practices and Techniques

Comparison of several common anomaly detection methods applied to the Weight and Height dataset

Dmytro Iakubovskyi

Published in Data And Beyond, July 7, 2023 (4 min read)

After publishing previous articles on detailed data analytics, such as "Newest salaries in Data Science and AI explained by SHAP values", I often received questions about detecting and removing outliers.

For a quick-look analysis, excluding the top and bottom percentiles of a given variable (similar to what was done in the article above) may be a reasonable approach. For a more detailed analysis, however, dedicated anomaly detection methods should be used instead. Here, I discuss a few commonly used methods of anomaly detection and demonstrate how they work on a specific example: the Weight and Height dataset available on Kaggle. Full details of the analysis can be found in this public Kaggle notebook.
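The quick-look percentile trimming mentioned above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code from the Kaggle notebook: the function name `trim_percentiles` and the 1st/99th percentile cutoffs are assumptions chosen for the example, and the toy data is not taken from the Weight and Height dataset.

```python
import numpy as np

def trim_percentiles(values, lower_pct=1.0, upper_pct=99.0):
    """Hypothetical helper: keep values between two percentiles (inclusive).

    Returns the kept values and the boolean mask used to select them.
    """
    values = np.asarray(values, dtype=float)
    # np.percentile uses linear interpolation between data points by default
    low, high = np.percentile(values, [lower_pct, upper_pct])
    mask = (values >= low) & (values <= high)
    return values[mask], mask

# Toy data: the integers 1..100
values = np.arange(1, 101)
kept, mask = trim_percentiles(values)
print(kept.min(), kept.max(), kept.size)  # → 2.0 99.0 98
```

Note that this approach always removes a fixed share of the data from each tail, whether or not those points are actually anomalous, which is why it suits only a quick look.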

Interquartile range (IQR) method

This is probably the most common method of outlier detection. For our dataset, it gives the following results (note for metric…
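As a reference point, the IQR method is usually implemented with Tukey's fences: a value is flagged as an outlier if it lies below Q1 − k·IQR or above Q3 + k·IQR, where IQR = Q3 − Q1 and k = 1.5 is the conventional choice. The sketch below assumes that convention; the helper name `iqr_outlier_mask` and the toy height values are hypothetical and not taken from the article's notebook or dataset.

```python
import numpy as np

def iqr_outlier_mask(values, k=1.5):
    """Hypothetical helper: flag values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])  # first and third quartiles
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # True marks an outlier
    return (values < lower) | (values > upper)

# Toy heights in cm (illustrative only)
heights_cm = np.array([160, 162, 165, 167, 170, 172, 175, 250])
mask = iqr_outlier_mask(heights_cm)
print(heights_cm[mask])  # → [250]
```

Here Q1 = 164.25 and Q3 = 172.75, so the fences are 151.5 and 185.5 and only 250 falls outside. Increasing `k` makes the method more conservative (fewer points flagged).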
