Time series anomaly detection — in the era of deep learning

Part 1 of 3

Data to AI Lab | MIT
Aug 19, 2020

by Sarah Alnegheimish

Time series data is generated across a wide range of domains, including energy, finance, and healthcare, and it is one of the most common data structures. As this type of data has proliferated over the years, the need for anomaly detection has increased as well. In this series of posts, we revisit the problem and present how we are using deep learning to address it.

In part 1 of the series, we introduce anomaly detection in time series data. We also provide an accompanying Python notebook that uses Orion, a Python package that makes this workflow possible and easy for anyone to use. You can run the notebook directly by launching Binder.

What is time series anomaly detection?

Before we get to anomaly detection, let’s define a time series. A time series is a collection of data points that are indexed by time.

[Figure: Univariate time series signal. Each time point is associated with exactly one value.]
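
In code, such a signal maps naturally onto a pandas Series with a datetime index. A minimal sketch (the timestamps and values here are invented for illustration):

```python
import pandas as pd

# A univariate time series: exactly one value per timestamp.
index = pd.date_range("2014-07-01", periods=6, freq="30min")
series = pd.Series([10, 12, 11, 95, 13, 12], index=index, name="value")
print(series)
```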

Time series appear regularly in daily life — if you are looking at yesterday’s hour-by-hour temperature, analyzing your own past purchases on Amazon, or reading a stock market chart, you’re working with time series data.

Although time series data is valuable, sometimes values appear within a time series that do not belong. These are known as anomalies.

Let’s illustrate this idea by looking at an everyday problem: How are taxi or ride-share drivers spatially distributed over a city? When you leave your house or work and call for a taxi or ride-share, it usually doesn’t take long for one to show up. But how does a ride-sharing platform know that someone in your area will require its service around that time? By analyzing time series. By monitoring the number of passengers who need rides, and recording the time and location of those rides, we can get a good estimate of how many drivers each area will require at a given time.

In order to perform such an analysis, though, we must first remove any anomalies. In other applications, such as system monitoring, we must identify anomalies — which could be indications of suspicious activity — in order to prevent catastrophic outcomes.

[Figure: NYC Taxi Demand in 2014 and 2015]

Above is a graph made from a dataset that records taxi rides in New York City. The x-axis shows the timestamp of the ride, and the y-axis shows the number of passengers that used the service. You can find this dataset, which is maintained by the Numenta community, here. The full raw dataset is available from the NYC Taxi and Limousine Commission (TLC).

You can view this data directly from the notebook. Simply follow the installation steps in Orion. Alternatively, you can launch Binder to access the notebook directly.

[Figure: Loading the NYC Taxi data using Orion]
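
A minimal sketch of that loading step, assuming the demo signal is published under the name 'nyc_taxi' in Orion's bundled data (check the Orion documentation for current signal names):

```python
from orion.data import load_signal

# Load the demo NYC taxi signal that ships with Orion
# ('nyc_taxi' is assumed here as the published signal name).
data = load_signal('nyc_taxi')

# Two columns: a Unix timestamp and the observed value, i.e. the
# number of passengers recorded in that 30-minute window.
print(data.head())
```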

The dataset covers taxi demand for approximately 7 months in 2014 and early 2015. A data point is recorded every 30 minutes. Just by looking, we can see some anomalies, like the spike that occurs at the beginning of November.

[Figure: NYC Taxi demand with highlighted anomalies]

After zooming in, we can see additional anomalies (shown in red) more easily. Clear anomalies occur in five different places in this data, due to five different events: the New York City Marathon, Thanksgiving, Christmas, New Year’s Day, and a snow storm. If we wanted to analyze New York City taxi demand on a normal day, we would have to remove those five abnormal days first.

Although in this case the anomalies revealed themselves after zooming in, it doesn’t always turn out this way. In fact, most of the time we don’t know when an anomaly exists, making the search space huge and detection difficult. In addition, the number of time series generated far surpasses humans’ ability to monitor them for anomalies in real time. But in order to analyze time series properly, to build models and find patterns, we have to find anomalies. This often tedious task, searching for a needle in a haystack, is what we refer to as time series anomaly detection (AD).

Because there’s simply too much going on to monitor and manage time series manually, we need automated anomaly detection.

What types of anomalies are there?

Depending on the source of the data and its domain, there can be many varieties of anomalies within a dataset. It is useful to consider these two broad categories:

  • Point anomalies are single values that fall within low-density value regions. We identify a point anomaly by a single timestamp. A run of consecutive point anomalies is referred to as a collective anomaly.
  • Contextual anomalies are values that do not fall within low-density regions, yet are anomalous with regard to nearby values. We identify such anomalies by an interval with a start and end timestamp.

What we’re searching for can be summarized visually:

[Figure: Point anomalies (left), identified by single timestamps. Contextual anomalies (right), identified by an interval with a start and end timestamp.]
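
To make the two categories concrete, here is a small synthetic example that plants one of each into an otherwise well-behaved signal (all numbers are invented for this sketch):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
index = pd.date_range("2014-07-01", periods=200, freq="30min")

# "Normal" behavior: a smooth daily cycle plus a little noise.
values = 50 + 10 * np.sin(np.arange(200) * 2 * np.pi / 48) + rng.normal(0, 1, 200)

# Point anomaly: a single value far outside the usual range.
values[60] = 120

# Contextual anomaly: values inside the normal range, but flat
# during an interval where the series should be oscillating.
values[120:140] = 50

series = pd.Series(values, index=index, name="value")
```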

Our objective is to find and locate these anomalies within a time series, in order to help the people using the data make informed decisions. How can we do this?

What are some traditional ways to approach anomaly detection (AD)?

People have long come up with ways to systematically identify anomalous sequences within a time series. Static thresholding is one of the simplest techniques: an alert is raised whenever a data point falls outside a fixed expected range. However, this approach often fails to detect contextual anomalies.
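
A minimal sketch of static thresholding, reusing the synthetic `series` from the example above (the three-sigma bounds are one common but arbitrary choice):

```python
# Static thresholding: flag any value outside a fixed expected range.
mean, std = series.mean(), series.std()
lower, upper = mean - 3 * std, mean + 3 * std

alerts = series[(series < lower) | (series > upper)]

# Catches the spike (point anomaly), but the flat stretch stays
# inside the bounds, so the contextual anomaly goes unnoticed.
print(alerts)
```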

More recently, experts have developed a collection of Deep Learning (DL)-based approaches to anomaly detection. One of the most interesting ideas involves using recurrent neural networks (RNNs) to recognize a pattern sequence and use an estimator to “forecast” the expected value. From there we can locate any anomalies by pinpointing discrepancies between the forecasted signal and the real one. (More about RNNs for AD can be found here.)

Deep learning-based methods make judicious use of the available data to learn the underlying structure of a time series, enabling them to perform complicated tasks such as anomaly detection. We can explain the general principle behind machine learning models (of which deep learning models are a subset) as:

  1. Use machine learning to learn the pattern of the data.
  2. Use the learned model to generate another time series.
  3. Compare what the model expects with the actual time series value.
  4. Use this discrepancy to extract anomalies.

Ideally, this will result in a sequence of “errors”, one for each time point, measuring the likelihood of that time point being an anomaly.
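
A minimal sketch of those four steps, using a short centered rolling mean as a stand-in for a learned model (a real pipeline would train an RNN or similar) and reusing the synthetic `series` from above:

```python
# Steps 1-2: "learn" the pattern and generate an expected series.
# A short rolling mean stands in for a trained deep learning model.
expected = series.rolling(window=5, center=True).mean()

# Step 3: compare the expectation with the actual values.
errors = (series - expected).abs()

# Step 4: turn large discrepancies into anomaly candidates.
threshold = errors.mean() + 3 * errors.std()
print(errors[errors > threshold])
```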

[Figure: How deep learning-based anomaly detection works]

There are multiple ways to obtain the “what it expects” signal. For example, the RNN method previously mentioned predicts the signal’s values, as though forecasting the weather. Another exciting and unique approach involves trying to reconstruct the signal rather than predict it. In upcoming posts, we continue the discussion about time series anomaly detection:

  • We will talk more about time series reconstruction, time series generative adversarial networks (GANs), and how we used GANs to create our own time series anomaly detector in part 2.
  • We will also showcase how to evaluate an anomaly detection pipeline using Orion in part 3.
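
In the meantime, if you want to try a deep learning detector end-to-end, Orion wraps these ideas into ready-made pipelines. A minimal sketch, assuming the 'lstm_dynamic_threshold' pipeline that ships with Orion and the 'nyc_taxi' demo signal name used above (consult the Orion documentation for current names and defaults):

```python
from orion import Orion
from orion.data import load_signal

# Demo signal name assumed to be 'nyc_taxi'.
data = load_signal('nyc_taxi')

# An LSTM forecasting pipeline with a dynamic error threshold.
orion = Orion(pipeline='lstm_dynamic_threshold')
orion.fit(data)

# Each detected anomaly is an interval: a start and end timestamp,
# plus a severity score.
anomalies = orion.detect(data)
print(anomalies)
```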
