# Time series anomaly detection — in the era of deep learning

## Part 2 of 3

*by Sarah Alnegheimish*

In the previous post, we looked at time series data and anomalies. (If you haven’t done so already, you can read the article here.) In **part 2**, we will discuss time series reconstruction using generative adversarial networks (GAN)¹ and how reconstructing time series can be used for anomaly detection².

**Time Series Anomaly Detection using Generative Adversarial Networks**

Before we introduce our approach for anomaly detection (AD), let’s discuss one of today’s most interesting and popular deep learning models: the generative adversarial network (GAN). The idea behind a GAN is that a generator (G), usually a *neural network*, attempts to construct a fake image from random noise in order to fool a discriminator (D) — also a *neural network* — whose job is to distinguish “fake” examples from “real” ones. The two compete with each other to be best at their jobs. How powerful is this approach? Well, the figure below depicts some fake images generated by a GAN.

In this project, we leverage the same approach for time series. We adopt a GAN structure to learn the patterns of signals from an observed set of data and train the generator “G”. We then use “G” to reconstruct time series data, and calculate the error by finding the discrepancies between the real and reconstructed signal. We then use this error to identify anomalies. You can read more about time series anomaly detection using GAN in our paper.

Enough talking — let’s look at some data.

# Tutorial

In this tutorial, we will use a Python library called Orion to perform anomaly detection. After following the installation instructions available on GitHub, we can get started and run the notebook. Alternatively, you can launch Binder to directly access the notebook.

*Load Data*

In this tutorial, we continue examining the NYC taxi data maintained by Numenta. Their repository, available here, is full of AD approaches and labeled data, organized as a series of timestamps and corresponding values. Each timestamp corresponds to the time of observation in Unix Time Format.

To load the data, simply pass the signal name into the `load_signal` function. (If you are loading your own data, pass the file path.)

Though tables are powerful data structures, it’s hard to visualize time series through numerical values alone. So, let’s go ahead and plot the data using `plot(df, known_anomalies)`.
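
For context, the loaded frame is just two columns, a Unix timestamp and a value; here is a minimal pandas sketch of the same shape (the values below are illustrative, not the actual dataset):

```python
import pandas as pd

# Illustrative frame in the same shape as the loaded signal:
# Unix timestamps spaced 30 minutes (1800 seconds) apart, one value each.
df = pd.DataFrame({
    'timestamp': [1404165600 + 1800 * i for i in range(4)],
    'value': [10844.0, 8127.0, 6210.0, 4656.0],
})
print(df)
```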

As we saw in the previous post, this data spans almost 7 months between 2014 and 2015. It contains five anomalies: NYC Marathon, Thanksgiving, Christmas, New Year’s Eve, and a major snow storm.

The central question of this post is: *Can GANs be used to detect these anomalies?* To answer this question, we have developed a time series anomaly detection pipeline using *TadGAN*, which is readily available in Orion. To use the model, pass the pipeline `json` name or path to the Orion API.

The Orion API is a simple interface that allows you to interact with anomaly detection pipelines. To train the model on the data, we simply use the `fit` method; to do anomaly detection, we use the `detect` method. In our case, we wanted to fit the data and then perform detection; therefore we used the `fit_detect` method. This might take some time to run. Once it’s done, we can visualize the results using `plot(df, [anomalies, known_anomalies])`.
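
Putting those calls together, here is a hedged sketch of the API usage (assuming the package exposes an `Orion` class and a pipeline named `'tadgan'`; names may vary across versions):

```python
from orion import Orion
from orion.data import load_signal

# Load the demo signal and run the TadGAN pipeline end to end.
df = load_signal('nyc_taxi')
orion = Orion(pipeline='tadgan')
anomalies = orion.fit_detect(df)  # fit on df, then detect anomalies in it
print(anomalies)                  # a table of detected start/end timestamps
```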

The red intervals depict detected anomalies, with green intervals showing ground truth. The model was able to detect 4 out of 5 anomalies. We also see that it detected some other intervals that were not included in the ground truth labels.

Although we jumped straight to the results, let’s backtrack and look at what the pipeline actually did.

# Under the hood

The pipeline performs a series of transformations on the data, including *preprocessing*, *model training*, and *post-processing*, to obtain the result you have just seen. These functions, which we refer to as primitives, are specified within the model’s `json` file. More specifically, if we look at the *TadGAN* model, we find these primitives applied sequentially to the data:

Each primitive is responsible for a single task; each is described over the course of this tutorial.

## Preprocessing

Before we can use the data, we need to preprocess it. Preprocessing relies on four primitives:

- `time_segments_aggregate` divides the signal into intervals and applies an aggregation function — producing an equally spaced, aggregated version of the time series.
- `SimpleImputer` imputes missing values with a specified value.
- `MinMaxScaler` scales the values into a specified range.
- `rolling_window_sequences` divides the original time series into signal segments.

*Prepare Data —* First, we make the signal equally spaced. Second, we impute missing values using the mean. Third, we scale the data to [-1, 1].

If we go back to the source of the NYC taxi data, we find that it records a value every 30 minutes. Since timestamps are defined in seconds, we set the interval to `1800`. We also opt for the default aggregation method, which in this case takes the `mean` value of each interval, and we impute missing data with the mean value. In this specific example, we could safely remove the `time_segments_aggregate` and `impute` primitives, since the data is already equally spaced and does not contain missing values (of course, not all data is this pristine). Next, we scale the data to [-1, 1] so that it’s properly normalized for modeling.
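
These three steps can be sketched with plain NumPy on a toy series (a hedged stand-in for the pipeline’s primitives, which do the equivalent work):

```python
import numpy as np

# Toy series: already equally spaced (1800-second steps), one missing value.
timestamps = np.array([0, 1800, 3600, 5400, 7200])
values = np.array([10.0, np.nan, 30.0, 20.0, 40.0])

# 1. aggregation is a no-op here, since the series is already equally spaced
# 2. impute missing entries with the mean of the observed values
values = np.where(np.isnan(values), np.nanmean(values), values)
# 3. min-max scale into [-1, 1]
lo, hi = values.min(), values.max()
scaled = 2 * (values - lo) / (hi - lo) - 1
print(scaled)  # every entry now lies in [-1, 1]
```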

After this, we need to prepare the input for training the *TadGAN* model. To obtain the training samples, we introduce a sliding window to divide the original time series into signal segments. The following illustration depicts this idea.

Here, `X` represents the input used to train the model. It is an `np.array` of size: number of training examples by `window_size`. In our case, we see that `X` has `10222` training examples, and `100` represents the `window_size`. Using `plot_rws(X, k=4)` we can visualize `X`.
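
The sliding window can be sketched as follows (a hedged stand-in for the `rolling_window_sequences` primitive, assuming a step size of 1):

```python
import numpy as np

def rolling_window_sequences(y, window_size, step_size=1):
    """Slice a 1-D series into overlapping windows (hedged sketch,
    not the library's actual implementation)."""
    starts = range(0, len(y) - window_size + 1, step_size)
    return np.array([y[i:i + window_size] for i in starts])

X = rolling_window_sequences(np.arange(10), window_size=4)
print(X.shape)  # (7, 4): 7 training examples, each of length 4
```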

This makes the input ready for our machine learning model.

## Modeling

Orion provides a suite of ML models that can be used for anomaly detection, such as ARIMA, LSTM, GAN, and more.

In this tutorial, we will focus on using a GAN. In case you are not familiar with GANs, there are many tutorials to help you implement one using different packages, such as TensorFlow or PyTorch.

To select a model of interest, we specify its primitive within the pipeline. To use the GAN model, we use the following primitive:

- `TadGAN` trains a custom time series GAN model.

*Training —* The core idea of a reconstruction-based anomaly detection method is to learn a model that can generate (construct) a signal with patterns similar to what it has seen previously.

The general training procedure of GANs is based on the idea that we want to reconstruct the signal as well as possible. To do this, we learn two mapping functions: an encoder (E) that maps the signal to its latent representation, “z”, and a generator (G) that recovers the signal from the latent variable. The discriminator (D*x*) measures the realness of the signal. Additionally, we introduce a second discriminator (D*z*) to distinguish between random latent samples “z” and encoded samples E(x). The intention behind D*z* is to force E to encode features into a representation that is as close to white noise as possible. This acts as a way to regularize the encoder E and avoid overfitting. The intuition behind using GANs for time series anomaly detection is that an effective model should not be able to reconstruct anomalies as well as “normal” instances.

To use the `TadGAN` model, we specify a number of parameters, including the model layers (the structure of the previously mentioned neural networks), the input dimensions, the number of epochs, the learning rate, etc. All the parameters are listed below.
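
For illustration, such a parameter dictionary might look like the sketch below; the primitive path and the specific values are assumptions, so check the pipeline `json` for the real names and defaults:

```python
# Hypothetical hyperparameter dictionary for the TadGAN primitive;
# the key path and values here are illustrative only.
hyperparameters = {
    'orion.primitives.tadgan.TadGAN#1': {
        'input_shape': (100, 1),  # window_size x number of channels
        'latent_dim': 20,         # size of the latent representation z
        'epochs': 5,
        'batch_size': 64,
    }
}
```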

It might take a bit of time for the model to train.

*Reconstruction —* After the GAN finishes training, we next attempt to reconstruct the signal, using the trained encoder (E) and generator (G).

We pass a segment of the signal (same as the window) to the encoder and transform it into its latent representation, which then gets passed into the generator for reconstruction. We call the output of this process the reconstructed signal. We can summarize it for a segment *s* as: *s → E(s) → G(E(s)) ≈ ŝ*. When *s* is normal, *s* and *ŝ* should be close. On the other hand, if *s* is abnormal, then *s* and *ŝ* should deviate.

The process above reconstructs one segment (window). We can get all the reconstructed segments by using the `predict` method in our API — `X_hat, critic = tgan.predict(X)`. We can use `plot_rws(X_hat, k=4)` to view the result.

Per the figure above, we notice that a reconstructed datapoint may appear in multiple windows, based on the `step_size` and `window_size` that we chose in the preprocessing step. To get the final value for a particular time point, we aggregate its multiple reconstructed values. This produces a single value for each timestamp, and thus a fully reconstructed version of the original signal in `df`.

To reassemble or “unroll” the signal, we can choose different aggregation methods. In our implementation, we chose the median value.

We can then use `y_hat = unroll_ts(X_hat)` to flatten the reconstructed samples `X_hat`, and `plot([y, y_hat], labels=['original', 'reconstructed'])` for visualization.
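
A minimal sketch of the unrolling step (hedged — not the library’s actual `unroll_ts`), aggregating the overlapping reconstructed values with the median:

```python
import numpy as np

def unroll_ts(X_hat, step_size=1):
    """Aggregate overlapping windowed reconstructions back into one
    series, taking the median per timestamp (hedged sketch)."""
    num_windows, window_size = X_hat.shape
    length = (num_windows - 1) * step_size + window_size
    buckets = [[] for _ in range(length)]
    for i, window in enumerate(X_hat):
        for j, value in enumerate(window):
            buckets[i * step_size + j].append(value)
    return np.array([np.median(b) for b in buckets])

# Three overlapping windows over a length-5 series
X_hat = np.array([[1., 2., 3.], [2., 3., 4.], [3., 4., 5.]])
y_hat = unroll_ts(X_hat)
print(y_hat)  # [1. 2. 3. 4. 5.]
```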

We can see that the GAN model did well in reconstructing the signal. We also see what the model expected the signal to look like, compared to what it actually is.

## Post-processing

The next step in the pipeline is post-processing, which includes calculating an error and then using it to locate the anomalies. The primitives we will use are:

- `score_anomalies` calculates the error between the real and reconstructed signal; this is specific to the GAN model.
- `find_anomalies` identifies anomalous intervals based on the error obtained.

*Error Scores —* We use the discrepancies between the original signal and the reconstructed signal as the reconstruction error score. There are many methods to calculate this error, such as point and area difference.

Analyzing the data, we noticed a large deviation between the two signals, present in some regions more than others. For a more robust measure, we use dynamic time warping (DTW) to account for signal delays and noise. This is the default approach for error calculation in the `score_anomaly` method, but it can be overridden using the `rec_error_type` parameter.
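
To make the error computation concrete, here is a minimal sketch of the point-difference option mentioned above (toy arrays, not the taxi data):

```python
import numpy as np

# Hedged sketch of the simplest option, the absolute point-wise
# difference; the DTW-based default additionally tolerates small
# time shifts between the two signals.
y = np.array([0.0, 0.5, 1.0, 0.5, 0.0])      # original (illustrative)
y_hat = np.array([0.0, 0.4, 0.2, 0.5, 0.1])  # reconstruction (illustrative)

errors = np.abs(y - y_hat)
print(errors.argmax())  # index 2 deviates the most
```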

During the training process, the discriminator has to distinguish between real input sequences and constructed ones; thus, we refer to its output as the critic score. This score is also relevant for distinguishing anomalous sequences from normal ones, since we assume that anomalies will not be reconstructed well. `score_anomaly` leverages the critic score by first smoothing it through kernel density estimation (KDE) on the collection of critic values and then taking the maximum value as the smoothed score. The final error score combines the reconstruction error and the critic score.

Now we can visually see where the error reaches a substantially high value. But how should we decide whether an error value indicates a potential anomaly? We could use a fixed threshold that says: if `error > 10`, then the datapoint should be classified as anomalous.
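
Fixed thresholding is easy to sketch (toy error values, with the illustrative cutoff of 10 from above):

```python
import numpy as np

# Flag every point whose error exceeds a constant cutoff.
errors = np.array([2.0, 3.0, 15.0, 4.0, 12.0, 1.0])
anomalous_idx = np.flatnonzero(errors > 10)
print(anomalous_idx.tolist())  # [2, 4]
```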

While the fixed threshold flagged two correct anomalies, it missed the other three. If we look back at the error plot, we notice that some deviations are only abnormal within their local region. So, how can we incorporate this information into our thresholding technique? We can use window-based methods to detect anomalies in context.

We first define the window of errors that we want to analyze. We then find the anomalous sequences in that window by looking at the mean and standard deviation of the errors. For an error that falls far from the mean (such as four standard deviations away), we classify its index as anomalous. We store the start/stop index pairs that correspond to each **anomalous** sequence, along with its score. We then move the window and repeat the procedure.
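
The procedure above can be sketched as follows (a hedged simplification using non-overlapping windows, not the library’s exact implementation):

```python
import numpy as np

def window_anomalies(errors, window_size, n_std=4):
    """Within each window, flag indices whose error lies more than
    n_std standard deviations above the window mean (hedged sketch)."""
    flagged = []
    for start in range(0, len(errors), window_size):
        window = errors[start:start + window_size]
        mu, sigma = window.mean(), window.std()
        for j, e in enumerate(window):
            if e > mu + n_std * sigma:
                flagged.append(start + j)
    return flagged

# A flat error curve with one sharp local deviation at index 50
errors = np.concatenate([np.ones(50), [30.0], np.ones(49)])
print(window_anomalies(errors, window_size=100))  # [50]
```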

We now have results similar to what we saw previously. The red intervals depict the detected anomalies, while the green intervals show the ground truth. 4 out of 5 anomalies were detected, along with some other intervals that were not included in the ground truth labels.

**Orion API**

Using the Orion API and pipelines, we simplified this process yet allowed flexibility for pipeline configuration.

*How to configure a pipeline?*

Once primitives are stitched together, we can identify anomalous intervals in a seamless manner. This serial process is easy to configure in Orion.

To configure a pipeline, we adjust the parameters of the primitive of interest within the `pipeline.json` file, or directly by passing a dictionary to the API.

In the following example, I changed the aggregation level as well as the number of `epochs` for training. These changes will override the parameters specified in the `json` file. To learn more about the API usage and primitive designs, please refer to the documentation. How we set the model and change the values of the hyperparameters is explained in the `mlprimitives` library; you can refer to its documentation here.
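
Such an override might look like the sketch below; the primitive paths and values are assumptions based on the primitives named earlier, so check the pipeline `json` for the exact keys:

```python
# Hypothetical override dictionary: coarser aggregation and fewer epochs.
hyperparameters = {
    'mlprimitives.custom.timeseries_preprocessing.time_segments_aggregate#1': {
        'interval': 3600,  # aggregate hourly instead of every 30 minutes
    },
    'orion.primitives.tadgan.TadGAN#1': {
        'epochs': 5,
    },
}
```

The dictionary can then be passed to the API, e.g. `Orion(pipeline='tadgan', hyperparameters=hyperparameters)` (constructor signature assumed).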

Now `anomalies` holds the detected anomalies.

In this tutorial, we looked at using time series reconstruction to detect anomalies. In the next post (part 3), we will explore more about evaluating pipelines and how we measure the performance of a pipeline against the ground truth. We will also look at comparing multiple anomaly detection pipelines from an end-to-end perspective.

1. In addition to the vanilla GAN, we also introduce other neural networks, including an encoding network to reduce the feature space as well as a secondary discriminator.
2. This tutorial walks through the different steps taken to perform anomaly detection using the *TadGAN* model. The particulars of TadGAN and how it was architected will be detailed in another post.