Time Series Anomaly Detection With LSTM AutoEncoder

Max Melichov
4 min read · Sep 19, 2022


What is a time series?

Let’s start by understanding what a time series is. A time series is a series of data points indexed (or listed, or graphed) in time order. Most commonly, a time series is a sequence taken at successive, equally spaced points in time; thus it is a sequence of discrete-time data.

What is an anomaly?

An anomaly is any deviation from the norm, in any field, that is difficult to explain with existing rules and theories.

A common way to find anomalies in non-time-series data

By understanding what we are searching for, and under what conditions, we can move forward and try to find a solution.

The next question is: why won’t simply finding outliers work?
The classic approach calculates the 25th percentile (Q1) and the 75th percentile (Q3) and derives the interquartile range (IQR) with the following formula:

IQR = Q3 - Q1

Data points that fall below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR are outliers.
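As a quick sketch of that rule in NumPy (the toy data below is mine, for illustration only):

```python
import numpy as np

# Toy data with one obvious outlier (100).
data = np.array([10, 12, 11, 13, 12, 11, 14, 100, 12, 13])

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

# Points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] are flagged as outliers.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = data[(data < lower) | (data > upper)]
print(outliers)  # → [100]
```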

This quick and easy method doesn’t work with time series, as you can see in the example:

Anomaly with jump down

The anomaly here is a clear deviation from the norm, yet as you can see, that does not make it an outlier.

Ok, let’s try something else…
What about deep learning? LSTM, to be exact.
First, what is an LSTM? Have you heard of RNNs?

Recurrent neural network

A recurrent neural network (RNN) is a class of artificial neural networks where connections between nodes can create a cycle, allowing output from some nodes to affect subsequent input to the same nodes.
In simple terms, RNNs are very good with sequential data.
However, RNNs have a major flaw: they are unable to memorize data over long spans and begin to forget their earlier inputs. This problem is known as vanishing and exploding gradients, and the LSTM was introduced to overcome it.

Long Short-Term Memory

Long short-term memory (LSTM) networks were invented by Hochreiter and Schmidhuber in 1997.
The significant change that makes an LSTM better than a plain RNN is the so-called ‘forget gate’, which controls which data is important to keep for the long term and which only for the short term. This is why LSTMs play a key role in deep-learning-based anomaly detection.

Furthermore, when you’re solving a problem with deep learning, you need an architecture designed for that kind of problem.

The architecture I used is called an autoencoder.

Autoencoders

What is an autoencoder?
An autoencoder is a type of artificial neural network used to learn efficient codings of unlabeled data (unsupervised learning).

To get a good intuition for what an autoencoder is, answer this simple question:
Which series of numbers is easier to remember?

2,4,6,8,10,12,14

4,9,1934,1

At a glance, you can see that the first series has a pattern, making it easier to remember, and that, in general terms, is how an autoencoder works: it learns a compact representation of patterned data.

An autoencoder looks similar to this:

General Autoencoder

The input and the output have 8 features, and each layer has the same neuron count as its counterpart layer, making the network look mirrored about its center.

The half from the input layer to the center is called the encoder, and it works much like Principal Component Analysis (PCA).

The half from the center to the output layer is called the decoder.

Together, these two halves make an autoencoder. The autoencoder learns the crucial patterns in the data, and with LSTM layers it can learn patterns in series data, making it a superior solution to common outlier detection.

But wait! How does this solve our problem? It’s just learning the series data.

Finally

After running the learning stage on the training data, the test data will reveal the anomaly, if there is any.

But how?
Think of it as trying to learn the pattern of this series:

1,5,1,5,1,5,1,5,2,4,1,5,1….

The abnormal data in the middle confuses the pattern and makes you predict it wrong, and that, my friend, is an anomaly.

Let me explain: when you try to predict a series using what you know about its pattern and something unusual happens, the prediction is wrong and the loss goes up. That loss is your indicator of an anomaly in the data.

To sum it up, let’s look at the code.
I will use the series you saw in the outlier example.

Standard scaling the data:
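A minimal sketch of this step with scikit-learn’s StandardScaler (the synthetic sine series below is my own stand-in for the article’s data):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Stand-in series: one value every 5 minutes for 14 days (4032 points).
values = np.sin(np.linspace(0, 40 * np.pi, 4032)).reshape(-1, 1)

scaler = StandardScaler()
scaled = scaler.fit_transform(values)  # zero mean, unit variance
```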

Creating a time series:
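One common way to build the training windows looks like this (`create_sequences` is my own name for the helper, not necessarily the author’s):

```python
import numpy as np

def create_sequences(values, time_steps=30):
    """Slice an array into overlapping windows of length time_steps."""
    return np.stack([values[i : i + time_steps]
                     for i in range(len(values) - time_steps + 1)])

x = create_sequences(np.arange(100).reshape(-1, 1))
print(x.shape)  # (71, 30, 1) — samples, time steps, features
```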

Splitting the data:
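For a time series, the split should be chronological rather than shuffled, so the model never trains on the future. A sketch (the 80/20 ratio is an assumption on my part):

```python
import numpy as np

series = np.arange(4032)          # stand-in for the scaled values

split = int(len(series) * 0.8)    # assumed 80/20 train/test split
train, test = series[:split], series[split:]
print(len(train), len(test))      # 3225 807
```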

The model:
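A hedged sketch of an LSTM autoencoder in Keras — the layer sizes here are illustrative, not the article’s exact values:

```python
from tensorflow import keras
from tensorflow.keras import layers

TIME_STEPS, N_FEATURES = 30, 1

model = keras.Sequential([
    keras.Input(shape=(TIME_STEPS, N_FEATURES)),
    # Encoder: compress each window into a single hidden vector.
    layers.LSTM(64),
    # Repeat that vector so the decoder can unroll it back through time.
    layers.RepeatVector(TIME_STEPS),
    # Decoder: reconstruct the original window step by step.
    layers.LSTM(64, return_sequences=True),
    layers.TimeDistributed(layers.Dense(N_FEATURES)),
])
model.compile(optimizer="adam", loss="mae")
```

The RepeatVector bridge is what turns the sequence-to-vector encoder into a vector-to-sequence decoder, giving the network its mirrored shape.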

The main:
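Putting the pieces together end to end — everything below is a self-contained stand-in with synthetic data, my own helper names, and illustrative hyperparameters:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

TIME_STEPS = 30

def create_sequences(values, time_steps=TIME_STEPS):
    return np.stack([values[i : i + time_steps]
                     for i in range(len(values) - time_steps + 1)])

# Synthetic series: a clean sine wave with an injected "jump down" anomaly.
t = np.linspace(0, 40 * np.pi, 2000)
series = np.sin(t)
series[1500:1520] -= 3.0

train, test = series[:1200], series[1200:]
x_train = create_sequences(train.reshape(-1, 1))
x_test = create_sequences(test.reshape(-1, 1))

model = keras.Sequential([
    keras.Input(shape=(TIME_STEPS, 1)),
    layers.LSTM(32),
    layers.RepeatVector(TIME_STEPS),
    layers.LSTM(32, return_sequences=True),
    layers.TimeDistributed(layers.Dense(1)),
])
model.compile(optimizer="adam", loss="mae")
model.fit(x_train, x_train, epochs=3, batch_size=64, verbose=0)

# Per-window reconstruction error on the test set.
x_test_pred = model.predict(x_test, verbose=0)
test_mae_loss = np.mean(np.abs(x_test_pred - x_test), axis=(1, 2))
```

Windows that overlap the injected jump should show a visibly higher reconstruction error than the rest — exactly the signal the threshold below picks up.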

Time step calculation:

We take the values from the training time series data file and normalize them. We have one value every 5 minutes for 14 days.

24 * 60 / 5 = 288 timesteps per day

288 * 14 = 4032 data points in total
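The same arithmetic in code:

```python
samples_per_day = 24 * 60 // 5        # one reading every 5 minutes
total_samples = samples_per_day * 14  # 14 days of data
print(samples_per_day, total_samples)  # 288 4032
```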

Setting the anomaly threshold at the 95th percentile of the per-window reconstruction loss with this line:
THRESHOLD = np.percentile(test_mae_loss, 95)

With a bit of visualization, here are the final results:

On the left, the train loss; in the middle, the test loss; on the right, the predicted anomaly.

Please consider following me on Medium.
Full code:

You can find me on LinkedIn:

https://www.linkedin.com/in/max-melichov/
