Supervised, unsupervised, and blind approaches for detecting anomalies in industry

Thibaut Le Magueresse
Amiral Technologies
Oct 10, 2023

When addressing anomaly detection in industrial time series, data scientists have two classical options: a supervised binary classification approach or an unsupervised method. The first requires a substantial amount of data for both classes, while the second can only identify abnormal points based on the proportion of outliers in the dataset.
In industry, it is uncommon to have a comprehensive description of all potential faults, and the level of contamination in the dataset is typically unknown. Based on this observation, Amiral Technologies uses the blind approach, known in the literature as the “one-class classification” task [1]. To explain it, let us take a simple example.

The figure below illustrates synthetic data generated by a sensor installed on a hypothetical monitored equipment. The red zones on the plot indicate two instances of faulty behavior that have been labeled by the operator.

Synthetic time series with two unhealthy zones highlighted in red.
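For readers who want to reproduce the experiment, here is a minimal sketch of how such a signal could be generated. The noise level, fault positions, and amplitudes below are arbitrary illustrative choices, not the actual data behind the figures.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical healthy signal: zero-mean noise from a single "sensor".
n_samples = 2000
signal = rng.normal(loc=0.0, scale=1.0, size=n_samples)

# Two hypothetical faulty zones (positions and amplitudes chosen arbitrarily):
# the signal drifts upward and becomes noisier, mimicking the red zones.
faulty_zones = [(600, 700), (1400, 1500)]
labels = np.zeros(n_samples, dtype=int)  # 0 = healthy sample, 1 = unhealthy sample
for start, stop in faulty_zones:
    signal[start:stop] += rng.normal(loc=3.0, scale=2.0, size=stop - start)
    labels[start:stop] = 1
```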

The second figure presents a 2-D plot illustrating the relationship between the maximum value and the mean value computed over non-overlapping windows of size 100. The machine learning model is designed to learn the distinction between the healthy class (in blue) and the unhealthy class (in red) so that it can automatically predict the class of a new data point. These two simple features were chosen for pedagogical reasons.

2-D feature plot derived from the original time series data over the windows
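Continuing the sketch above, the two features can be computed per non-overlapping window of 100 samples. Marking a window as unhealthy when it contains any faulty sample is one possible convention, assumed here for illustration.

```python
# Non-overlapping windows of size 100: one (max, mean) feature pair per window.
window = 100
n_windows = n_samples // window

X = np.empty((n_windows, 2))
y = np.empty(n_windows, dtype=int)
for i in range(n_windows):
    chunk = signal[i * window:(i + 1) * window]
    X[i] = [chunk.max(), chunk.mean()]
    # Assumed labeling convention: a window is unhealthy if it
    # contains at least one faulty sample.
    y[i] = int(labels[i * window:(i + 1) * window].any())
```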

During the learning phase, the unsupervised approach ignores the labels and focuses on identifying abnormal values. Meanwhile, the blind approach establishes a boundary around the healthy data while ignoring the unhealthy data. Lastly, the supervised approach creates a boundary between these two classes.

2-D plot after training the three models.
The points corresponding to the unsupervised model are colored in black, as labels are not used by that model.
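The post does not say which algorithms are used for each approach, so the sketch below picks common scikit-learn stand-ins: an isolation forest for the unsupervised model, a one-class SVM for the blind model, and an SVM classifier for the supervised model. Any other choices with the same supervision regimes would illustrate the point equally well.

```python
from sklearn.ensemble import IsolationForest
from sklearn.svm import SVC, OneClassSVM

# Unsupervised: fit on all windows, labels ignored (assumed contamination rate).
unsupervised = IsolationForest(contamination=0.1, random_state=0).fit(X)

# Blind / one-class: fit on the healthy windows only.
blind = OneClassSVM(nu=0.05, gamma="scale").fit(X[y == 0])

# Supervised: fit on all windows together with their labels.
supervised = SVC(kernel="rbf", gamma="scale").fit(X, y)
```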

Once trained, the models are evaluated on the same dataset used for training (model evaluation should normally be conducted on a separate, dedicated dataset; this is skipped here for brevity). The unsupervised model detects only two unhealthy points, the blind approach identifies four, and the supervised approach achieves perfect prediction.

2D plot after evaluating the models on the entire training dataset. The false negatives, i.e., unhealthy points predicted as healthy by a model, are outlined in red.
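Continuing the sketch, the false negatives on the training windows can be counted as follows. The exact counts depend on the synthetic data and hyperparameters, so they will not necessarily match the figures above.

```python
# Map every model's output to 1 = unhealthy, 0 = healthy.
pred_unsup = (unsupervised.predict(X) == -1).astype(int)  # -1 means outlier
pred_blind = (blind.predict(X) == -1).astype(int)         # -1 means outside the healthy boundary
pred_sup = supervised.predict(X)

for name, pred in [("unsupervised", pred_unsup),
                   ("blind", pred_blind),
                   ("supervised", pred_sup)]:
    false_negatives = int(((pred == 0) & (y == 1)).sum())
    print(f"{name}: {false_negatives} false negative(s) on the training windows")
```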

However, if these three models are tested on a new data point with a high maximum value and a low mean value, only the supervised method fails. This behavior is typical of such algorithms: they often require exposure to the complete range of fault typologies, which is rare in an industrial context. Consequently, the blind approach is the less risky choice, as it can learn complex feature behaviors without any prior knowledge of unhealthy data.

2D plot after testing the models on a previously unseen fault (placed at the top left corner).
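The unseen-fault scenario can be simulated by feeding the three models a hand-crafted point with a high maximum and a low mean; the values below are purely illustrative.

```python
# A hand-crafted, previously unseen fault: high maximum but low mean.
new_point = np.array([[8.0, -0.5]])  # [max, mean], illustrative values only

print("unsupervised:", unsupervised.predict(new_point))  # -1 = abnormal
print("blind:", blind.predict(new_point))                # -1 = outside the healthy boundary
print("supervised:", supervised.predict(new_point))      # 1 = unhealthy, 0 = healthy
```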

Moreover, an essential advantage of the blind approach is its reduced data requirement compared to the traditional supervised approach. To illustrate this, consider a training and a testing dataset recorded on the same synthetic equipment. The idea is to train the model on an expanding portion of healthy data: for each model trained with one additional window of data, we assess its performance using the F1-score.
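One way to run this experiment, continuing the same sketch, is to retrain the blind model on a growing number of healthy windows and compute the F1-score on the full labeled set each time (the supervised counterpart would additionally need faulty windows in its training slice).

```python
from sklearn.metrics import f1_score

# Expanding-window experiment: retrain the blind model on a growing number
# of healthy windows and score it on the full labeled set each time.
healthy_X = X[y == 0]
scores = []
for n in range(2, len(healthy_X) + 1):
    model = OneClassSVM(nu=0.05, gamma="scale").fit(healthy_X[:n])
    pred = (model.predict(X) == -1).astype(int)
    scores.append((n, f1_score(y, pred)))

for n, score in scores:
    print(f"{n} healthy windows -> F1 = {score:.2f}")
```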

In this simple example, the blind approach gives satisfactory results with just two data points, whereas the supervised approach requires knowledge of all the faults to perform well. It is logical that the supervised approach performs well when all faults are seen during the training phase, but such a scenario is exceedingly rare in industry.

Note that in real-world applications, with operational variability and high noise levels, it becomes necessary to collect significantly more healthy data than in this simple example in order to obtain a satisfactory blind model.

To conclude, in order to benefit from the advantages of both approaches (excluding the unsupervised approach for obvious reasons), two potential solutions can be considered:
1. Updating the blind boundary using knowledge of the identified fault;
2. Voting between the supervised model and the blind model (one possible voting rule is sketched below).
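As an illustration of the second option, here is one plausible voting rule, continuing the earlier sketch. The post does not describe the combination actually implemented in DiagFit, so this is only an example.

```python
def vote_unhealthy(points):
    """Flag points as unhealthy if either the blind or the supervised model does.

    This is only one plausible voting rule; the combination actually used in
    DiagFit is not described in this post.
    """
    blind_abnormal = blind.predict(points) == -1
    supervised_abnormal = supervised.predict(points) == 1
    return (blind_abnormal | supervised_abnormal).astype(int)

print(vote_unhealthy(new_point))  # reuse the unseen fault from above
```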

The best of these two solutions is available in DiagFit, our failure prediction software which monitors the health of equipment throughout its life.

References

[1] M. M. Moya, M. W. Koch, and L. D. Hostetler, “One-class classifier networks for target recognition applications,” [Online]. Available: https://www.osti.gov/biblio/6755553.
