Modern recipes for anomaly detection

JC Testud
Mar 29 · 10 min read

Anomaly Detection Challenges

It is classification without labels

Anomaly detection models aim to produce a classifier that can tell whether a data point is normal or abnormal despite being trained entirely on the normal class. Deciding whether something is normal or abnormal is a two-class classification problem, typically solved by supervised learning on a large, balanced mix of labelled points from both classes. That doesn’t work when you have few or no positive samples (anomalies) and many negative samples (normal). In those cases, anomaly detection is necessary.

It needs a threshold

In general, training an anomaly detection algorithm yields a function that assigns an anomaly score to any observation. Most data points will get low scores, and anomalies will hopefully stand out with higher ones. To make a final decision, anomaly detection needs a score threshold that separates the usual from the anomalous: you must decide how high a score has to be for a data point to count as an anomaly. Let’s take a manufacturing example and say we want an algorithm to visually inspect the parts being produced. The threshold translates here to: how imperfect does a part have to be in order to be discarded?
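One common way to set such a threshold, sketched here under assumptions that are not from the article, is to score a held-out set of normal data and pick the quantile that matches the false positive rate you can tolerate. The function names, the synthetic scores, and the 1% target below are all illustrative.

```python
import numpy as np

def pick_threshold(normal_scores, target_false_positive_rate=0.01):
    """Return the score above which a point is flagged as an anomaly.

    The threshold is the quantile of the scores observed on normal data such
    that roughly `target_false_positive_rate` of normal points get flagged.
    """
    return float(np.quantile(normal_scores, 1.0 - target_false_positive_rate))

def is_anomaly(scores, threshold):
    """Flag every score strictly above the threshold."""
    return np.asarray(scores) > threshold

# Hypothetical anomaly scores for a held-out set of normal parts (synthetic).
rng = np.random.default_rng(0)
normal_scores = rng.normal(loc=1.0, scale=0.2, size=1000)

threshold = pick_threshold(normal_scores, target_false_positive_rate=0.01)

# Two parts with ordinary scores and one with a clearly higher score.
flags = is_anomaly([0.9, 1.1, 3.0], threshold)
```

Lowering the target rate raises the threshold, which discards fewer good parts but lets more imperfect ones through. The right trade-off depends on the cost of each kind of mistake.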

It will produce false positives

One of the biggest issues with anomaly detection is that it often produces a large number of false positives. This problem comes primarily from the fact that the model is not optimized to detect specific samples. An anomaly detection algorithm will, by design, detect any unusual data, including benign examples. Two options can help:

  • Integrating human feedback: label the detections as true or false positives. This feedback can be used to start training a more typical supervised classifier, which makes it no longer pure anomaly detection.
  • Or just forgetting about anomaly detection. If what you are looking for in your data is very specific, and creating a balanced and representative labeled dataset is an option, you should probably take the time to create that dataset and go the supervised way.

Existing approaches to anomaly detection

The following image is a good input for anomaly detection: one of these things is not like the other.

Fitting a Gaussian

If linear regression is the simplest supervised learning model, then the anomaly detection equivalent is Gaussian fitting. Let’s say we have a dataset where each observation is a person. Each person is described by only one variable: their height, in centimeters. The dataset contains some anomalies, such as two dogs inserted by mistake.
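A minimal sketch of Gaussian fitting on this kind of data, assuming synthetic heights (the article's dataset is not available) and an illustrative cutoff of 3 standard deviations. The z-score serves as the anomaly score here. It ranks points in the same order as the negative log-density of the fitted Gaussian.

```python
import numpy as np

# Synthetic data: 500 people around 170 cm, plus two dogs inserted by mistake.
rng = np.random.default_rng(42)
people_cm = rng.normal(loc=170.0, scale=10.0, size=500)
dogs_cm = np.array([50.0, 60.0])
heights = np.concatenate([people_cm, dogs_cm])

# "Fitting a Gaussian" in one dimension is just estimating a mean and a
# standard deviation (here on the contaminated data, dogs included).
mu = heights.mean()
sigma = heights.std()

def anomaly_score(x):
    """Distance from the mean in units of standard deviation (absolute z-score)."""
    return np.abs(np.asarray(x) - mu) / sigma

scores = anomaly_score(heights)

# Illustrative cutoff: flag anything more than 3 standard deviations away.
flagged = heights[scores > 3.0]
```

Because the mean and standard deviation are estimated on contaminated data, the dogs inflate `sigma` slightly. With only two of them among 500 people, they still end up far in the tail.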

  • Isolation Forest: Build decision trees to see how many random splits on random features it takes to isolate each data point. Easy-to-isolate points (lone/low-density ones) are considered anomalies. Tightly packed ones are inliers.
  • One-class SVM: Find a hyperplane (the generalization of a plane to higher dimensions) with the bulk of the data on one side and few to no outliers on the other side. The hyperplane acts as a decision boundary for new data points.
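Both approaches are available in scikit-learn. The sketch below runs them on synthetic 2D data, and the data and hyperparameters (`n_estimators`, `nu`, the RBF kernel) are illustrative assumptions rather than recommendations. With an RBF kernel, the separating hyperplane lives in the kernel's feature space, not in the original 2D space.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM

# Synthetic data: a tight cluster of inliers and one far-away point.
rng = np.random.default_rng(0)
inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
outlier = np.array([[10.0, 10.0]])

# Isolation Forest, fitted on the contaminated data (inliers + outlier).
X = np.vstack([inliers, outlier])
forest = IsolationForest(n_estimators=100, random_state=0).fit(X)
# score_samples: the lower the value, the easier the point is to isolate,
# i.e. the more anomalous it is.
forest_scores = forest.score_samples(X)

# One-class SVM, fitted on the inliers only.
svm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(inliers)
# predict returns +1 for points on the inlier side of the boundary, -1 otherwise.
svm_label = svm.predict(outlier)
```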

Where Are My Neural Networks?

While these approaches are popular and useful, neural networks have some key advantages when dealing with complexity, such as the ability to work on high-dimensional data, at scale, with flexible architectures.

  1. Define a task that a model can learn from the data
  2. Train a model to do the task on normal data
  3. Monitor how well it does on new unseen data
  4. Consider poor performance on new data as a sign of anomaly
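The recipe above can be sketched end to end with a deliberately tiny model. Everything below is an illustrative assumption: the task (predict `y` from `x`), the least-squares line standing in for a neural network, the synthetic data, and the 99% quantile used as the cutoff.

```python
import numpy as np

rng = np.random.default_rng(0)

# The task (illustrative): predict y from x. Normal data follows y = 2x + 1
# plus a little noise.
x_train = rng.uniform(0.0, 10.0, size=500)
y_train = 2.0 * x_train + 1.0 + rng.normal(scale=0.1, size=500)

# Train a model on normal data. A least-squares line stands in for a neural
# network; np.polyfit returns the highest-degree coefficient first.
slope, intercept = np.polyfit(x_train, y_train, deg=1)

def task_error(x, y):
    """Absolute prediction error on the task, used as the anomaly score."""
    return np.abs(slope * np.asarray(x) + intercept - np.asarray(y))

# Calibrate what "poor performance" means using errors on normal data.
threshold = np.quantile(task_error(x_train, y_train), 0.99)

# Monitor performance on new data: the first point follows the normal
# relationship, the second does not.
x_new = np.array([5.0, 5.0])
y_new = np.array([11.0, 20.0])
flags = task_error(x_new, y_new) > threshold
```

The model never sees an anomaly during training. It only learns what normal data looks like, and its error on a new point is what signals that the point is unusual.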

The Art of Task Engineering

This 4-step anomaly detection recipe is very generic. The first step, defining the task, is critical. It determines what kind of model architecture you will be able to use. And, as we mentioned, it could also make the output (the anomalies) more relevant to your use case. Here are two families of tasks to kick-start your imagination:

The Recipe in Practice

Let’s follow the recipe and perform anomaly detection on a face dataset. Specifically, we are going to use CelebA-HQ, a dataset containing 30,000 celebrity photos. To learn interesting concepts about these faces, let’s think about a self-supervised task (step 1).

Edges to Faces, Black & White to Color, Inpainting randomly-placed patches, and Right from Left
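Each of these tasks comes down to deriving an (input, target) pair from a single unlabeled image. The sketch below shows one possible way to build such pairs for three of the tasks, assuming images are float arrays of shape (height, width, 3). The function names, the channel-mean grayscale conversion, and the zero-filled patch are illustrative choices, not the article's implementation. Edges to Faces is left out because it needs an edge detector.

```python
import numpy as np

def black_and_white_to_color(image):
    """Input: grayscale version (channel mean). Target: the original color image."""
    gray = image.mean(axis=2, keepdims=True)
    return gray, image

def inpainting(image, patch_size, rng):
    """Input: image with a randomly placed square patch zeroed out. Target: the original."""
    height, width, _ = image.shape
    top = rng.integers(0, height - patch_size + 1)
    left = rng.integers(0, width - patch_size + 1)
    masked = image.copy()
    masked[top:top + patch_size, left:left + patch_size, :] = 0.0
    return masked, image

def right_from_left(image):
    """Input: left half of the image. Target: right half."""
    half = image.shape[1] // 2
    return image[:, :half, :], image[:, half:, :]

# Random pixels stand in for a face photo here.
rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))

gray_input, color_target = black_and_white_to_color(image)
masked_input, full_target = inpainting(image, patch_size=16, rng=rng)
left_input, right_target = right_from_left(image)
```

No labels are needed: the target always comes from the image itself, which is what makes these tasks self-supervised.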

GANs, the final frontier

A vanilla GAN’s job is to learn the probability distribution of the training data. This means that if you show a GAN enough faces, it can in theory learn what makes a face and will ultimately be able to generate all the faces that could exist.


[1] About the distinction between outlier and novelty detection, not everybody agrees on the terminology but, generally:

  • Novelty detection means finding anomalies in new data not seen at training time. Some people even consider that each training data point should be normal/uncontaminated. The algorithm would then circle the entire training data and give the same prediction (inlier) for each of the training data points.

Element AI Lab

Scientists and developers at Element AI discuss the state of the art in artificial intelligence research and deployment.
