Stratio-AI Cortex: a self-supervised deep learning framework for anomaly detection in time-series

Miguel Simão
Stratio
Jun 30, 2020

Context

Author background

My name’s Miguel Simão, and I started working at Stratio Automotive over a year ago as a Data Scientist in the Research Team. From an early stage, my main goal at the company has been to design the custom deep learning framework at the core of our anomaly detection and prediction offerings, which I will describe in this article. Beyond that, my day-to-day work revolves mostly around data cleaning, model definition, and optimization. When necessary, I also assist our data engineers in productizing model pipelines.

Our company

In a sentence, Stratio’s scientific core is the development of prognostics models for vehicular systems and components. On the business side, we provide the outputs of these models to our customers so that they can take actions that reduce the downtime of their vehicles.

Our data science problems

Most of the research team’s projects share the following premise: given a vehicle component or system X, develop a general anomaly detection model, or a fault classification and prediction model. All models are developed from the historical data we have obtained from a group of connected vehicles.

Some of the challenges we face are the following:

  • The same variable is likely to be measured by different sensors in different vehicle brands and models;
  • A huge amount of historical data;
  • Very high acquisition rates compared to other players in the industry;
  • Lack of labels or noisy labels.

For context, we get between 200 000 and 1 million unique data points per vehicle every single day.

Typical development process

Like most data science projects, our development process at Stratio typically follows these steps:

  1. Data selection;
  2. Data preparation (cleaning, outlier detection);
  3. Feature extraction (including research);
  4. Model development (regression/classification);
  5. Model validation.

The process is followed for each of the vehicle’s components and sometimes even for distinct vehicle types or usage profiles. This is a repetitive process that can benefit from automation. Furthermore, given that we do not have a significant amount of labeled data and labeling in our domain requires extensive knowledge that is challenging to outsource, we have turned to self-supervised deep learning as a tool to jumpstart our anomaly models. Enter Cortex.

Cortex

Cortex is a self-supervised, deep learning-based framework for automated model development that serves as a blueprint for building new models. Tuned to operate in the absence of labeled data, it is particularly well suited to data mining and general anomaly detection. Like other deep learning approaches, its main advantage is that it removes the need for expert or domain knowledge from feature extraction and model development. As a result, development time and costs are substantially reduced, and a new model can be developed in a few days instead of weeks.

Generally speaking, Cortex learns the typical dynamic behavior of a time series from the historical data of a fleet of vehicles. When deployed, it compares what it learned with the actual measured values and reports back what it considers to be anomalous events.
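In essence, the detection logic boils down to comparing the model’s prediction with the measured value at each time step and flagging large deviations. A minimal sketch of that idea, with NumPy arrays standing in for the two signals and a made-up threshold:

```python
import numpy as np

def detect_anomalies(measured, predicted, threshold):
    """Flag time steps where the prediction deviates too much from the measurement."""
    residual = measured - predicted          # prediction error per time step
    return np.abs(residual) > threshold      # boolean anomaly mask

# Toy example: a well-predicted signal with one large deviation.
measured = np.array([90.0, 91.0, 90.5, 104.0, 90.8])
predicted = np.array([90.2, 90.8, 90.6, 91.0, 90.7])
print(detect_anomalies(measured, predicted, threshold=5.0))
# -> [False False False  True False]
```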

The role of Cortex in our development cycle

Let’s dive into the specifics of Cortex’s role in model development. Although the process is flexible, the typical development cycle has the stages shown in the diagram below.

A data lake and data selection process feeds an endless loop that envelops the stages of model development and deployment.
  1. We start from a data lake containing the vehicles’ historical data, which can be queried using Drill.
  2. For each specific problem, the Cortex user is responsible for selecting the variables to be tracked and for guaranteeing that the data measure what we expect them to (data cleaning); a sketch of this step follows the list.
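As an illustration of that selection and cleaning step, here is a minimal sketch assuming the query results have already been loaded into a pandas DataFrame; the query, table, and column names are hypothetical:

```python
import pandas as pd

# Hypothetical Drill query used to pull one signal for a set of vehicles.
QUERY = """
SELECT vehicle_id, ts, coolant_temp_c, engine_load_pct
FROM   dfs.telemetry.`coolant`
WHERE  ts BETWEEN '2019-01-01' AND '2019-12-31'
"""

def clean_signal(df: pd.DataFrame) -> pd.DataFrame:
    """Basic sanity checks: drop duplicate readings, sort by time, and
    remove physically implausible coolant temperatures."""
    df = df.drop_duplicates(subset=["vehicle_id", "ts"]).sort_values("ts")
    valid = df["coolant_temp_c"].between(-40, 150)   # plausible range in °C
    return df[valid].reset_index(drop=True)

# df = run_drill_query(QUERY)   # however the data lake is queried in practice
# df = clean_signal(df)
```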

The scientist at Stratio then has two options:

  1. start the iterative process of model design as usual (feature extraction, model definition, training, validation). Given how iterative and uncertain this process is, the first model developed is unlikely to meet the predefined requirements, so the scientist has to propose changes (to the data, the labels, or the model itself), implement them, and test again.
  2. use Cortex. The scientist may want to redefine the validation strategy and the model hyperparameters; the model is then trained and optimized using stochastic grid search (sketched right after this list). However, the model search space is much more constrained: features are learned automatically and the model definition is static beyond a few tunable hyperparameters. Old models can also be retrained in order to reduce training time.
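For the second option, the search over that constrained space can be as simple as sampling random combinations from a fixed hyperparameter grid and keeping the configuration with the best validation score. A minimal sketch of that idea, in which the grid values and the train_and_validate routine are hypothetical placeholders:

```python
import itertools
import random

# Hypothetical, heavily constrained search space.
GRID = {
    "hidden_units": [32, 64, 128],
    "window_length": [60, 120, 240],
    "learning_rate": [1e-3, 1e-4],
}

def stochastic_grid_search(train_and_validate, n_trials=10, seed=0):
    """Sample random points from the grid and keep the configuration
    with the lowest validation loss."""
    rng = random.Random(seed)
    combos = list(itertools.product(*GRID.values()))
    best_loss, best_params = float("inf"), None
    for values in rng.sample(combos, k=min(n_trials, len(combos))):
        params = dict(zip(GRID.keys(), values))
        loss = train_and_validate(**params)   # user-supplied training routine
        if loss < best_loss:
            best_loss, best_params = loss, params
    return best_params, best_loss
```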

Therefore, the goal of Cortex is to replace this development cycle at the expense of a larger, highly parameterized artificial neural network model. This also means that the network can quickly learn the behavior of any physical system given a good corpus of training data, without expert knowledge. Furthermore, a model trained within Cortex is also capable of monitoring itself (through the residual of its output signal) and can be retrained on demand.

Use-case: detecting anomalies in a vehicle’s cooling system

An internal combustion engine produces heat when it is running and must be continuously thermally controlled in order to prevent damage to its components and maintain peak efficiency. The temperature is typically controlled by balancing the internal heat generation with the heat dissipation provided by a cooling system. Additionally, engine efficiency is lower when the engine is cold, so there is an ideal working temperature range: between 85 and 95 degrees Celsius. As shown in the diagram below, a vehicle has an initial period where it heats up to the working range and then stabilizes around that temperature.

Typical coolant temperature profile of a vehicle in operation.
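For intuition, this warm-up-then-stabilize profile is roughly what a first-order thermal model produces: the coolant temperature relaxes exponentially from ambient towards a regulated setpoint. The toy simulation below is purely illustrative (the time constant and temperatures are made-up values), not part of Cortex:

```python
import numpy as np

def simulate_warmup(t_ambient_c=15.0, t_setpoint_c=90.0, tau_s=600.0,
                    duration_s=3600, dt_s=10):
    """First-order approximation: the coolant temperature relaxes
    exponentially from ambient temperature to the regulated setpoint."""
    t = np.arange(0, duration_s, dt_s)
    temp = t_setpoint_c + (t_ambient_c - t_setpoint_c) * np.exp(-t / tau_s)
    return t, temp

t, temp = simulate_warmup()
print(f"coolant temperature after 30 min: {temp[len(temp) // 2]:.1f} °C")
# -> coolant temperature after 30 min: 86.3 °C
```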

There are a number of components involved in keeping the engine at its working temperature. A water-cooled engine block and cylinder head have interconnected coolant channels running through them. A pump, driven by the engine, pulls hot coolant from the engine and pushes it through a radiator, which dissipates heat to the atmosphere. Additionally, a fan forces air through the radiator in order to improve cooling efficiency at low speeds. The coolant flow is also regulated by a mechanical thermostat. A failure in any of these components is very likely to result in engine overheating and a forced stop.

Diagram of a vehicle’s cooling system, showing all its components.

So, how can we find anomalies in the components of the cooling system, from readings of the coolant temperature?

While it is trivial to find temperature anomalies (the system operating outside its working specifications), finding faults in individual components that do not have direct sensing is a significantly harder problem. Our hypothesis is that some of these anomalies can be found in the dynamics of the temperature signal.

We do not want to follow a model-based approach here, since we would have to build one model for each variation of the system’s parts, which is infeasible. Instead, we opt for a data-based approach with a deep learning model: using Cortex’s self-supervised deep neural network, we can model the dynamic behavior of the coolant temperature and find its relationships with usage metrics.

Anomaly detection model

In this case, we are modeling the temperature signal as a function of itself and usage variables, such as engine load. This is generally represented in the diagram below.

We query our data lake for data from a subset of our fleet that should represent normal behavior and train the model with it. Given that it is a regression model, we can train it with a loss function such as the mean squared error. During training, the model learns the expected temperature behavior in the dataset. The plot below shows the predicted temperature signal in red and the measured temperature in blue. The second plot shows the temperature residual, i.e., the difference between the predicted and measured temperatures. The sample shows that the model captures the dynamic behavior of the temperature in significant detail, as intended.

On the first plot, the measured and predicted temperature signals in blue and red, respectively. On the second plot, the difference between them.
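To make this setup concrete, the sketch below shows the kind of self-supervised model described above: a small recurrent network that regresses the next temperature sample from a window of past temperature and usage readings, trained with a mean-squared-error loss. The architecture, window length, and synthetic stand-in data are illustrative assumptions, not Cortex’s actual network:

```python
import numpy as np
import tensorflow as tf

WINDOW, N_FEATURES = 60, 3   # e.g. coolant temperature, engine load, vehicle speed

# Synthetic stand-in for windows extracted from "normal" fleet data:
# X holds windows of past signals, y the temperature at the next time step.
rng = np.random.default_rng(0)
X = rng.random((1024, WINDOW, N_FEATURES)).astype("float32")
y = rng.random((1024, 1)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64),                  # summarizes the input window
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),                  # predicted next temperature sample
])
model.compile(optimizer="adam", loss="mse")    # regression => mean squared error
model.fit(X, y, epochs=5, batch_size=64, validation_split=0.2, verbose=0)

# The residual (measured minus predicted) is what drives anomaly detection.
residual = y - model.predict(X)
```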

Now that we have seen that the model can learn the normal dynamic behavior of a cooling system, we can use the prediction residual for anomaly detection. For each vehicle, there is a residual time series, which is used to find anomalous situations in that vehicle. For a single vehicle over a two-year timespan, we get a residual signal like the one shown in the figure below.

Residual signal plotted in blue; anomaly threshold in yellow; ground-truth and predicted anomalies in green and red, respectively.

In this figure, the blue line represents the prediction residual series, the yellow line the anomaly threshold, the red regions the predicted anomaly intervals, and the green regions the manually annotated anomalies. The annotated anomalies are typically overheating events, but some are also component failures (e.g., water pumps). For our purposes, we consider this example to have perfect recall but 50% precision. These cases are reviewed by our analysts in order to improve the precision of our predictions. In this specific use-case, Cortex reduced development time by 80% compared to the empirical model we have in production, while improving precision by about 10% on the same validation dataset.
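To make these last steps concrete, here is a minimal sketch of how a residual series can be turned into anomaly intervals, and how interval-level precision and recall can then be computed. The smoothing window, the threshold, and the overlap-based scoring convention are illustrative assumptions, not our production settings:

```python
import pandas as pd

def residual_to_intervals(residual: pd.Series, threshold: float,
                          smooth_window: int = 50) -> list:
    """Smooth the absolute residual, threshold it, and merge consecutive
    flagged samples into (start, end) anomaly intervals."""
    smoothed = residual.abs().rolling(smooth_window, min_periods=1).mean()
    flagged = smoothed > threshold
    intervals, start = [], None
    for ts, is_anomalous in flagged.items():
        if is_anomalous and start is None:
            start = ts
        elif not is_anomalous and start is not None:
            intervals.append((start, ts))
            start = None
    if start is not None:
        intervals.append((start, flagged.index[-1]))
    return intervals

def overlaps(a, b):
    """True if intervals a and b, given as (start, end) pairs, overlap."""
    return a[0] <= b[1] and b[0] <= a[1]

def interval_precision_recall(predicted, annotated):
    """A prediction counts as correct if it overlaps any annotation;
    an annotation counts as detected if any prediction overlaps it."""
    tp = sum(any(overlaps(p, a) for a in annotated) for p in predicted)
    detected = sum(any(overlaps(a, p) for p in predicted) for a in annotated)
    precision = tp / len(predicted) if predicted else 0.0
    recall = detected / len(annotated) if annotated else 0.0
    return precision, recall

# Mirrors the situation described above: every annotated anomaly is caught
# (recall = 1.0), but only half of the predicted intervals overlap one
# (precision = 0.5).
predicted = [(10, 20), (40, 45), (70, 75), (90, 95)]
annotated = [(12, 18), (68, 80)]
print(interval_precision_recall(predicted, annotated))  # -> (0.5, 1.0)
```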

This use-case showcases how we can quickly train and deploy new anomaly detection models without specifying features (apart from cleaning and scaling the signals), taking advantage of the large amounts of unlabeled data we can access. These models also have the advantage that they can be retrained in production for new vehicles without new labels.

Final remarks

Cortex was designed to be a versatile tool that allows the quick development of anomaly detection models while minimizing engineering time. I believe we have achieved that goal for a few reasons: the neural network-based model is a universal function approximator; it is flexible enough to model multiple inputs and outputs; we can use a high sampling rate that is unparalleled in our industry; and it is self-supervised, meaning we are not limited by the amount of labeled data we have.

In our path towards zero downtime, predicting when every single component of a vehicle will fail appears to be an insurmountable task. In response, our main goal has to be finding possible issues within the vehicle (whether or not they lead to a fault) by comparing the measured data with the expected data, and providing the information the user needs to do a cost-benefit analysis of a preventive repair. Over time, Cortex allows our analysts and users to flag more and more potential faults, giving us more reliable fault labels, which will in turn significantly improve our prognostics models.

This is how we are driving a zero downtime future.
