What is an anomaly? It’s something that deviates from what is standard, normal, or expected. In some cases, you can find anomalies by setting thresholds or bounds. You can also use supervised machine learning algorithms, but those require training on a dataset that already contains labeled anomalies. What if the anomalies are hidden in a time series dataset? You can often spot them by plotting the trend, because they usually look out of place in the plot. For example:
The above picture shows a time series dataset at GumGum. The red point is an anomaly: an unusual drop. How can we catch such anomalies as soon as possible while raising fewer false alarms than a rule-based system? Our answer is to apply time series models. This post illustrates how we implemented an anomaly detection system with the Prophet library to catch anomalies such as the one above.
Time series models are often used to predict the future. But if you compare the data against the fitted historical trend, outliers or anomalies become easy to detect. This means we can turn a difficult problem into a standard statistical one and solve it in two steps:
- Find a time series model to fit the data
- Detect outliers as anomalies
There are many time series models, such as ARIMA, moving average, and exponential smoothing. How do you select a proper model for your dataset? The general idea is to avoid an overfitting model, which follows the data so closely that it reports few or no anomalies, and to avoid an underfitting model, which fits so loosely that it reports false anomalies.
In our implementation, we chose the Prophet library, for two main reasons:
- The library is well supported and maintained by Facebook.
- Our experiments showed that, for our time series dataset, the library is reliable and provides both fitted values and uncertainty intervals.
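To make the second point concrete, here is a minimal sketch of how fitted values and uncertainty intervals can be obtained from Prophet. It is not our production code: the helper names `hourly_rows` and `fit_with_intervals` are made up for illustration, and the sketch assumes the `pandas` and `prophet` packages are installed (older releases of the library were published as `fbprophet`).

```python
from datetime import datetime, timedelta


def hourly_rows(start, values):
    """Pair each value with an hourly timestamp in Prophet's ds/y layout."""
    return [{"ds": start + timedelta(hours=i), "y": v} for i, v in enumerate(values)]


def fit_with_intervals(rows, interval_width=0.8):
    """Fit Prophet on the rows and return fitted values with their bounds."""
    # Imported lazily so hourly_rows stays usable without these packages.
    import pandas as pd
    from prophet import Prophet  # older releases: from fbprophet import Prophet

    history = pd.DataFrame(rows)  # Prophet expects columns named ds and y
    model = Prophet(interval_width=interval_width)  # 0.8 is Prophet's default
    model.fit(history)
    # Predicting on the historical timestamps gives in-sample fitted values.
    fitted = model.predict(history[["ds"]])
    return fitted[["ds", "yhat", "yhat_lower", "yhat_upper"]]
```

In the returned frame, `yhat` is the fitted value and `yhat_lower` and `yhat_upper` are the bounds of the uncertainty interval. A wider `interval_width` gives wider bounds.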
The uncertainty interval is important to us because it adds a buffer around the fitted values. In our test data with known anomalies, the actual anomalies often fall outside the uncertainty interval. As a result, instead of comparing actual values to fitted values with one fixed threshold for all data points, we can use the upper and lower bounds of the uncertainty interval as dynamic thresholds, which helps reduce false alarms. The anomaly detection rules we implemented are therefore:
- if actual_value > (1 + upper_alpha) * upper_bound, we detect it as a positive anomaly
- if actual_value < (1 - lower_alpha) * lower_bound, we detect it as a negative anomaly
Here upper_alpha and lower_alpha are percentages you can use to adjust the threshold.
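The two rules translate almost directly into code. The sketch below is illustrative only: the function name `classify_point` and the string labels are assumptions, and the multiplicative buffer only widens the band as intended when the bounds are non-negative.

```python
def classify_point(actual_value, lower_bound, upper_bound,
                   lower_alpha=0.0, upper_alpha=0.0):
    """Return "positive", "negative", or None for a single data point.

    lower_bound and upper_bound are the uncertainty interval for this point;
    the alphas are fractions (0.1 means 10%) that widen the band.
    """
    if actual_value > (1 + upper_alpha) * upper_bound:
        return "positive"
    if actual_value < (1 - lower_alpha) * lower_bound:
        return "negative"
    return None
```

For example, with bounds of 80 and 120 and both alphas set to 0.1, the effective thresholds become 72 and 132, so a value of 75 is not flagged even though it sits below the lower bound.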
Although Prophet is reliable enough to provide a well-fitting model with default hyperparameter settings most of the time, we still want to measure how well the model fits the data. If it is underfitting or overfitting, we want to tune the hyperparameters automatically. To measure model accuracy, we chose mean absolute scaled error (MASE). In our experiments, MASE > 1.5 meant the model was probably underfitting, and MASE < 0.3 meant it was probably overfitting. We then tune the hyperparameters with the corresponding actions in Prophet.
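As a rough sketch of the fit check, MASE can be computed as the model's mean absolute error divided by the mean absolute error of a naive one-step forecast (each point predicted by the previous one). The non-seasonal naive baseline and the names `mase` and `diagnose_fit` are assumptions for illustration; only the 1.5 and 0.3 cutoffs come from our experiments.

```python
def mase(actual, fitted):
    """Mean absolute scaled error against a one-step naive forecast."""
    if len(actual) != len(fitted) or len(actual) < 2:
        raise ValueError("need two equally sized series with at least 2 points")
    model_mae = sum(abs(a - f) for a, f in zip(actual, fitted)) / len(actual)
    naive_mae = sum(
        abs(actual[i] - actual[i - 1]) for i in range(1, len(actual))
    ) / (len(actual) - 1)
    if naive_mae == 0:
        raise ValueError("MASE is undefined for a constant series")
    return model_mae / naive_mae


def diagnose_fit(score, overfit_below=0.3, underfit_above=1.5):
    """Map a MASE score to "underfitting", "overfitting", or "ok"."""
    if score > underfit_above:
        return "underfitting"
    if score < overfit_below:
        return "overfitting"
    return "ok"
```

A score near 1 means the model is about as accurate as the naive forecast, which is why values far above or far below it are treated as a signal to retune.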
If a brute-force search fails to find a good model, we also flag the current data point as an anomaly. Besides the brute-force search, we implemented a two-stage fitting mechanism for cases where the current model detects anomalies at data points other than the latest one. The purpose of the system is to detect whether the latest data point is an anomaly; we do not care about anomalies that happened a while ago. Those older anomalies may still distort the model, however. Therefore, after the first fit, we reset those data points to null, which Prophet treats as missing values, and then fit the model again.
After finding a satisfactory fit, we tried the model on our production data. Soon enough, it caught a problem. Here is the email alert our anomaly detection system sent when an anomaly happened at 2020-02-05 05:00:00 am. The detection application is scheduled to run every hour, and it triggers an email alert if an anomaly occurred in the latest hour.
Based on our implementation running on production time series data, the anomaly detection system provides more accurate alerts and fewer false alerts than our old rule-based system.
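The two-stage fitting idea can be sketched independently of any particular model. In the example below, `two_stage_check` and its callable parameters are hypothetical names, and `mean_band_fit` is a deliberately crude stand-in for a real model such as Prophet, included only so the example runs on its own.

```python
def two_stage_check(values, fit_bounds, is_anomaly):
    """Return True if the latest point is anomalous after two-stage fitting.

    fit_bounds(values) -> list of (lower, upper), one per point; it must
    tolerate None entries. is_anomaly(actual, lower, upper) -> bool.
    """
    bounds = fit_bounds(values)  # stage 1: fit on the raw history
    masked = list(values)
    found_older = False
    for i in range(len(values) - 1):  # every point except the latest
        lower, upper = bounds[i]
        if values[i] is not None and is_anomaly(values[i], lower, upper):
            masked[i] = None  # treat the old anomaly as a missing value
            found_older = True
    if found_older:
        bounds = fit_bounds(masked)  # stage 2: refit without old anomalies
    lower, upper = bounds[-1]
    return is_anomaly(values[-1], lower, upper)


def mean_band_fit(values, half_width=30.0):
    """Toy stand-in model: a flat band around the mean of the known points."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [(mean - half_width, mean + half_width)] * len(values)


def outside(actual, lower, upper):
    """Plain interval check with no alpha buffer."""
    return actual < lower or actual > upper


print(two_stage_check([12, 12, 100, 12, 55], mean_band_fit, outside))
```

In this toy run, the old spike of 100 inflates the first fit enough that the latest value of 55 looks normal. Once the spike is masked and the model refit, 55 falls outside the band and is flagged.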