How to evaluate unsupervised anomaly detection models?

Luana Gonçalves
4 min read · Oct 19, 2023


Anomaly detection is the practice of highlighting unusual patterns, rare events, and inconsistencies. Fields such as accounting, banking services, and digital security often employ anomaly detection to identify atypical data that represent potential risks to business models, thereby enhancing data platform security and safeguarding operational integrity.

Selecting the right model to automate this task is crucial, but equally important is choosing the appropriate metric to assess the model’s performance. However, in the field of anomaly detection, despite mathematical functions aiding in performance evaluation, the precise definition of unusual data is contingent on the specific problem and framework. This article aims to introduce evaluation metrics for unsupervised anomaly detection models.

Dataset

First, we will generate a two-dimensional synthetic dataset with isotropic Gaussian blobs, simulating a densely concentrated group. Then, we will contaminate it with random data from a uniform distribution, which can be adjacent to the initial set or represent anomalous points.
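A minimal sketch of such a dataset, assuming illustrative parameter values (sample counts, cluster spread, contamination range, and seed are not specified in the original):

```python
import numpy as np
from sklearn.datasets import make_blobs

rng = np.random.RandomState(42)

# Dense, isotropic Gaussian cluster (the "normal" samples)
X_inliers, _ = make_blobs(
    n_samples=300, centers=[[0, 0]], cluster_std=0.5, random_state=42
)

# Uniform contamination: points scattered over a wider square,
# some adjacent to the blob, some clearly anomalous
X_outliers = rng.uniform(low=-4, high=4, size=(30, 2))

# Final two-dimensional dataset
X = np.vstack([X_inliers, X_outliers])
print(X.shape)  # (330, 2)
```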

Scatter plot of the synthetic dataset.

Modeling

For comparison, we implemented anomaly detection models suggested by scikit-learn alongside a Z-score statistical baseline. The chosen models were:

  • Robust Covariance
  • One-Class SVM
  • Local Outlier Factor
  • Z-score
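As a hedged sketch, the detectors compared in this article (Robust Covariance, One-Class SVM, Local Outlier Factor, and a Z-score baseline) can be fitted as below; the dataset and all hyperparameters here are illustrative assumptions, not the article's exact settings:

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.svm import OneClassSVM
from sklearn.neighbors import LocalOutlierFactor

# Toy stand-in for the synthetic dataset described earlier
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 0.5, size=(300, 2)),
               rng.uniform(-4, 4, size=(30, 2))])

# scikit-learn detectors: predictions are +1 (inlier) or -1 (anomaly)
preds = {
    "Robust Covariance": EllipticEnvelope(contamination=0.1).fit_predict(X),
    "One-Class SVM": OneClassSVM(nu=0.1, gamma=0.5).fit(X).predict(X),
    "Local Outlier Factor": LocalOutlierFactor(contamination=0.1).fit_predict(X),
}

# Z-score baseline: flag samples more than 3 standard deviations
# from the mean on any variable
z = np.abs((X - X.mean(axis=0)) / X.std(axis=0))
preds["Z-score"] = np.where((z > 3).any(axis=1), -1, 1)

for name, p in preds.items():
    print(name, "anomalies:", int((p == -1).sum()))
```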

In the figure below, we display results from each of the previously mentioned anomaly detection models. In this representation, orange points denote the samples categorized as isotropic, while blue points represent the instances identified as anomalies.

Scatter plot of anomaly detection models.

Metrics

It is expected that the statistical characteristics of the isotropic samples differ from those of the samples flagged as anomalies; in other words, the distributions of the two sets should be clearly distinct. In this context, the Kolmogorov-Smirnov test evaluates the null hypothesis that the two sets are drawn from a common distribution, which we can either reject or fail to reject.

The scipy.stats.kstest function returns both the p-value and the statistic. The p-value indicates the significance of the test, while the statistic measures the maximum distance between the empirical distribution functions of the two samples.

  • p-value: a lower p-value gives stronger statistical evidence that the sets are distinct. The figure below shows that all models reject the null hypothesis that both sets come from the same distribution, since every p-value is below 0.05. Nevertheless, it is worth highlighting that Local Outlier Factor and One-Class SVM exhibit higher p-values, indicating weaker statistical evidence.
  • statistic: a higher statistic provides stronger evidence that the distributions of the sets are different. In the figure below, we can see that for variable v1, Robust Covariance exhibits the greatest distinction between the distributions of the anomalous and isotropic classes. As for variable v2, the algorithms that performed the best were Robust Covariance and Z-score, with nearly identical values.
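The per-variable comparison above can be sketched as follows. The article mentions scipy.stats.kstest; here we use the equivalent two-sample helper scipy.stats.ks_2samp, with toy data and hypothetical model labels standing in for a real detector's output:

```python
import numpy as np
from scipy import stats

# Toy stand-in for the dataset and one model's inlier/anomaly labels
rng = np.random.RandomState(1)
X = np.vstack([rng.normal(0, 0.5, size=(300, 2)),
               rng.uniform(-4, 4, size=(30, 2))])
labels = np.r_[np.ones(300), -np.ones(30)]  # +1 inlier, -1 anomaly

# Two-sample KS test per variable: inliers vs. anomalies
for j, name in enumerate(["v1", "v2"]):
    stat, pvalue = stats.ks_2samp(X[labels == 1, j], X[labels == -1, j])
    print(f"{name}: statistic={stat:.3f}, p-value={pvalue:.3g}")
```

A large statistic with a small p-value is the pattern we expect when a model has cleanly separated the two groups.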

The Silhouette coefficient measures, for each sample, its cohesion with its assigned group and its separation from the other groups. We can therefore use the average Silhouette coefficient of the isotropic samples as a metric for the consistency and internal similarity of that group.

  • Silhouette coefficient: a higher coefficient provides stronger evidence that the group designated as isotropic is cohesive. The figure below shows that the most cohesive groups are produced by Robust Covariance and Z-score, whereas the One-Class SVM algorithm exhibited the poorest performance.
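A minimal sketch of this metric, using sklearn.metrics.silhouette_samples with the detector's inlier/anomaly labels as the grouping (the data and labels here are illustrative stand-ins):

```python
import numpy as np
from sklearn.metrics import silhouette_samples

# Toy stand-in for the dataset and one model's labels
rng = np.random.RandomState(2)
X = np.vstack([rng.normal(0, 0.5, size=(300, 2)),
               rng.uniform(-4, 4, size=(30, 2))])
labels = np.r_[np.ones(300), -np.ones(30)]  # +1 inlier, -1 anomaly

# Per-sample silhouette, then average only over the isotropic group
sil = silhouette_samples(X, labels)
score = sil[labels == 1].mean()
print(f"mean silhouette of isotropic samples: {score:.3f}")
```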

Conclusion

For this particular dataset, the top-performing models are Robust Covariance and Z-score. It is important to emphasize that context influences a model’s performance; therefore, in addition to the metrics presented here, it is highly advisable to use metrics tailored to the specific problem.

The complete project can be found and downloaded on GitHub:
