Data Quality: Challenges of Real-World Anomaly Detection

Photo by Sean Benesh on Unsplash. (Resized)

Building an anomaly detection system is like walking on a tightrope.
Too many false positives, and nobody checks the alerts. Too many false negatives, and the time saved by useful alerts is not worth the time spent maintaining the system. In both cases, you fall short of expectations.
In a previous article, we presented how we use ThirdEye to detect and understand data quality incidents on thousands of timeseries. We shamelessly skipped one of the most important parts of such a system: how to actually build detection rules that work. Let's remedy that.

AB Tasty provides solutions to create online experiments and optimize user experiences. Our products are used across a broad range of industries, by clients of all sizes, so the timeseries we monitor have extremely diverse patterns. This makes it hard to build anomaly detection rules that work well for all our clients.

This blog post describes the challenges of real-life anomaly detection and how we tackle them in ThirdEye.

Use Case

Let’s look at a typical timeseries we want to monitor: the number of visitors.

One week of the timeseries.
One month of the same timeseries.

You can observe some patterns:
- daily seasonality: users visit around specific hours, and sleep at night
- weekly seasonality: users visit more on specific days of the week
- trend: on average, the number of visitors is growing
- noise: there are some random variations
If you are not familiar with timeseries decomposition, check the basics here.

We have 4000 clients, so we have 4000 signals like this one. We want to detect data loss, and because we want to detect incidents quickly, we check values every hour. We define a potential data loss as a drop of 80% or more compared to the baseline.
For instance:
- the baseline is 200, the observed value is 140 (a 30% drop) → OK
- the baseline is 200, the observed value is 20 (a 90% drop) → Anomaly!

The baseline is the value we expect to observe, based on historical data. Computing a baseline can be simple or rely on complex ML models.
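The business rule itself boils down to a relative-drop check. Here is a minimal Python sketch (illustrative, not ThirdEye's actual implementation; the function name is ours, and the 0.80 threshold comes from the requirement above):

def is_data_loss(baseline: float, observed: float, max_drop: float = 0.80) -> bool:
    # Flag a potential data loss when the observed value drops by
    # `max_drop` (here 80%) or more compared to the baseline.
    if baseline <= 0:
        return False  # no meaningful baseline to compare against
    drop = (baseline - observed) / baseline
    return drop >= max_drop

is_data_loss(200, 140)  # 30% drop -> False (OK)
is_data_loss(200, 20)   # 90% drop -> True  (Anomaly!)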

The same timeseries with an 85% drop on September 15: we want to detect this!

Looks easy, right? Let's see how we implement this in ThirdEye.

Implementation

First, we need to choose how to compute the baseline. We start simple.
Consider the seasonalities: it does not make sense to compare a Monday to a Sunday, nor to compare 6 PM to 6 AM. Hence, to compute our expected value, we use values with the same day of the week and the same hour.
To account for noise, instead of taking only the previous value (one week earlier), we take the mean of the last two values observed at the same hour and day of the week (one and two weeks earlier):

Baseline computation logic.
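As an illustration of this logic, here is a rough pandas sketch (not the actual ThirdEye code; it assumes an hourly series indexed by timestamp):

import pandas as pd

def seasonal_baseline(series: pd.Series, ts: pd.Timestamp) -> float:
    # Baseline = mean of the values observed at the same hour and day of
    # the week, one and two weeks before `ts`.
    offsets = [pd.Timedelta(weeks=1), pd.Timedelta(weeks=2)]
    past = [series.get(ts - offset) for offset in offsets]
    past = [v for v in past if v is not None]
    return sum(past) / len(past) if past else float("nan")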

We implement this rule in ThirdEye:

Detection rule with average of previous values as baseline

Right after writing the detection config, we can preview its behavior on historical data. This helps check the performance:

ThirdEye UI: write the detection rule, preview, iterate.

The big anomalous drop on September 15 is detected! Great. But there is another anomaly (the red point on the left) that looks like a false positive.
Meet the first challenge of real-life timeseries anomaly detection:

Challenge 1: Noise in small values

Consider a client that has a noise of around ±100 visitors.
During the day, the client has around 2.5k visitors/hour. ±100 visitors is only a 4% change, so the detection rule does not trigger.
But what happens at night, when the traffic is very low? If the average value at night is 100, in some unfortunate cases, you can have a baseline of 180 visitors, and a current value of 30 visitors: an 83% drop!
This is exactly what happens above. At first, this looks OK: we get this false positive only once in the preview, so we could estimate that this error happens only once a month. But remember: we have 4000 timeseries to monitor. One false positive per month for each timeseries means around 130 false positives per day! This is not acceptable.
We could try to average out the noise by using more historical data. This could help, but it would not solve the real problem: the alert rule does not handle seasonalities properly.
We will fix this with an ML model, but let’s look at another challenge first.

Challenge 2: Strong Trends

Consider a timeseries with a strong trend: the historical data quickly becomes irrelevant.

Strong downward trend: many false positives

We can see the baseline (the dotted orange line) is way too high. The average value goes down quickly, so the values from previous weeks are not relevant. As humans, we see the downtrend day after day, but the simple rule only uses data from 7 and 14 days earlier. We end up with many false positives.
Here, the problem is that the alert rule does not handle strong trends.

Introducing ML models

To address the two problems described above, we use the Holt-Winters model. This ML model learns the trend and seasonalities of the timeseries.
We implement the rule in ThirdEye:

Detection rule with ML model
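ThirdEye provides the Holt-Winters detector; as a rough illustration of the idea (not ThirdEye's internals), here is how such a baseline could be computed with the Holt-Winters implementation from statsmodels, assuming hourly data with a weekly seasonality of 168 hours:

import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

def holt_winters_baseline(history: pd.Series, steps: int = 1) -> pd.Series:
    # Fit Holt-Winters on several weeks of hourly history and forecast
    # the next `steps` hours as the expected baseline.
    model = ExponentialSmoothing(
        history,                  # hourly series with a regular DatetimeIndex
        trend="add",              # learn the (additive) trend
        seasonal="add",           # learn the weekly seasonality
        seasonal_periods=24 * 7,  # 168 hours = one week
    )
    return model.fit().forecast(steps)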

We preview the rule on the two challenging timeseries:

Outage timeseries. ML detection rule. Better handling of noise in small values, but new false positives.
Strong downward trend timeseries. ML detection rule. Better, but still one false positive.

This looks better: the ML model does not get trapped by seasonalities and trends. But we get new problems: there are new false positives, and they are difficult to explain.
Notice the sensitivity parameter in the config. When it is small, the model tolerates values that are far from the baseline. When it is big, the model is less tolerant and can raise a lot of false positives. The problem is that fine-tuning sensitivity is not straightforward. Most importantly, sensitivity does not help us implement our main objective: detecting drops of 80% or more.

What we would like is the best of both detection rules:
→ an ML model to manage complex patterns in the timeseries
→ a simple rule to enforce our business requirements.

Combining rules is easy in ThirdEye: we use the ML model as a base detector, and we filter out anomalies that don't match our business rule:

Combining ML and simple heuristic rules.

Notice we set a higher sensitivity for the ML model. That's fine: its false positives will be filtered out by the business rule.
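Here is a sketch of how such a combination could look (illustrative only: `ml_baseline` would be the Holt-Winters forecast from the sketch above, and the residual-based condition is a rough stand-in for the sensitivity-controlled detector, not ThirdEye's exact logic):

import pandas as pd

def combined_detection(observed: pd.Series,
                       ml_baseline: pd.Series,
                       n_std: float = 3.0,
                       max_drop: float = 0.80) -> pd.Series:
    # 1) ML detector: the observed value falls far below the learned baseline
    #    (more than `n_std` standard deviations of the residuals; `n_std` is a
    #    rough stand-in for ThirdEye's sensitivity, lower n_std = more sensitive).
    # 2) Business filter: the relative drop vs the baseline is at least `max_drop`.
    # An hour is flagged only if both conditions hold.
    residuals = observed - ml_baseline
    ml_anomaly = residuals < -n_std * residuals.std()
    relative_drop = (ml_baseline - observed) / ml_baseline
    business_anomaly = relative_drop >= max_drop
    return ml_anomaly & business_anomaly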

Outage timeseries. ML model and business rule combined. No false positives.
Strong trend timeseries. ML model and business rule combined. No false positives.

We now have a robust rule! Or so it seems.

Challenge 3: False Trends

Think of sales, event subscription deadlines, product release promotions… all of these spread over multiple days but don't last forever.
What happens then?

“False trend” timeseries: seems to grow for 2 weeks, then goes back to the previous trend.

Something looks like a trend, but it's only temporary. The model learns it and makes irrelevant predictions: we get a false positive. With our many clients, this kind of problem happens every week.
It's extremely difficult to handle such a pattern with an ML model, so we add a new business rule: we compare the current value with the median of the last 5 weeks. The idea is that a median computed over a long history is not impacted much by false trends. We don't raise an anomaly if the difference with the median is not significant.

We implement the filter:

Filtering anomalies that are too close to the median.
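A sketch of this filter (illustrative, not the actual ThirdEye rule; reusing the 80% threshold as the "significant" difference is our own assumption):

import pandas as pd

def far_below_median(series: pd.Series, ts: pd.Timestamp,
                     weeks: int = 5, max_drop: float = 0.80) -> bool:
    # Keep an anomaly at `ts` only if the value is also a large drop compared
    # to the median of the same hour/weekday over the last `weeks` weeks.
    # A temporary "false trend" barely moves this long-history median.
    past = [series.get(ts - pd.Timedelta(weeks=w)) for w in range(1, weeks + 1)]
    past = [v for v in past if v is not None]
    if not past:
        return True  # no history: don't filter the anomaly out
    median = pd.Series(past).median()
    if median <= 0:
        return True
    return (median - series[ts]) / median >= max_drop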

We get a preview:

“False trend” timeseries. ML model, business rule, and median rule combined. No false positives.

No false positives! In ThirdEye, it is easy to stack heuristics on top of ML models. This is key to achieving robust detection rules, and we believe it's a must-have feature in any industrial anomaly detection system.

Challenge 4: Transient errors

A problem we see a lot is transient data drops:
- clients perform maintenance: their traffic goes down for a few minutes.
- our pipeline has more delay than usual: values are small at detection time.
These are not false positives, but such anomalies can generate a lot of alert noise. When the SLAs allow it, we want to ignore them.

Timeseries with a transient error. True positive, but too short to be relevant.

We stack another rule: ignore anomalies that last less than 2 hours:

Filtering short anomalies.
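The equivalent logic, as a small sketch (illustrative; in ThirdEye this is just another filter in the detection config):

from datetime import timedelta

def keep_long_anomalies(anomalies, min_duration=timedelta(hours=2)):
    # `anomalies` is a list of (start, end) datetime pairs.
    # Drop the ones shorter than `min_duration` (here, 2 hours).
    return [(start, end) for start, end in anomalies
            if end - start >= min_duration]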

We get a preview:

Timeseries with a transient error. The error is ignored because it’s too short.

The transient error is ignored.

We now have a robust anomaly detection rule! Here is the complete YAML configuration:

Example of a robust rule for the simple requirement: detect drops of 80%.

Other challenges

We hope these visuals helped you understand the challenges of real-life anomaly detection. Below are some other challenges and how we manage them in ThirdEye.

  • holidays: a classic topic in timeseries. It's easy to manage in ThirdEye: you can ignore anomalies on specific timeframes. Actually, our main challenge at AB Tasty is to choose the correct calendars: holidays differ completely from one country to another! For each client, we identify the major countries and use the corresponding calendars.
Ignoring anomalies on a timeframe. Here, ignoring on Patriot Day.
Ignoring anomalies if there is not enough historical data.
  • alert spam: detection jobs run every hour, so if an anomaly is not resolved, ThirdEye will keep sending alerts. To avoid this, it's possible to merge anomalies: we merge anomalies that are consecutive or spaced by a single hour, and we end the merge after 3 days, so that we receive a new alert if the anomaly still exists (see the merge sketch after this list).
Merging anomalies
  • conditions on other metrics: In some cases, we want to skip the detection rule on a metric if another metric is too small. For instance, we ignore the transaction revenue detection (very noisy) if the number of visitors is too small.
Check the value of another metric before running a detection
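For the alert spam point above, here is a sketch of the merge logic (illustrative; in ThirdEye this is configured, not hand-coded): anomalies that are consecutive or separated by at most one hour are merged, and a merged anomaly is capped at 3 days so a long-lived incident triggers a fresh alert:

from datetime import timedelta

def merge_anomalies(anomalies,
                    max_gap=timedelta(hours=1),
                    max_duration=timedelta(days=3)):
    # `anomalies` is a list of (start, end) datetime pairs.
    merged = []
    for start, end in sorted(anomalies):
        if merged:
            last_start, last_end = merged[-1]
            within_gap = start - last_end <= max_gap
            within_duration = end - last_start <= max_duration
            if within_gap and within_duration:
                merged[-1] = (last_start, max(last_end, end))
                continue
        merged.append((start, end))
    return merged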

Performance at scale

In this article, we focused on fine-tuning a single detection rule. This does not scale at AB Tasty, because we need to create thousands of detection rules. So we built a tool that generates the rules from business info such as SLA, product package, and industry: we call it the detection rule builder.
Instead of measuring the performance of a single rule, we measure the performance of the detection rule builder itself and incrementally fine-tune it.

We do this in production.

We follow this iterative loop:
1. Generate new rules
2. Run the detection jobs on production data, for all our clients, without sending the alerts to the on-call system.
3. Analyze the anomalies, label them as useful or not, and identify the limits of the current generated rules and opportunities for fine-tuning.
4. Add the fine-tunings that generalize best to the detection rule builder.

Once the false positive rate is good enough, we plug the alerts into our on-call system. Throughout this process, we have manually analyzed more than 400 anomalies.

There is no free lunch for anomaly detection at scale.

Conclusion

Even a seemingly simple detection rule can be challenging to implement in real life. We identified common problematic patterns in timeseries and explained how we manage them in ThirdEye.
We hope this article will help the data community in their next anomaly detection projects.

Resources:

- Xiaohui Sun, Smart alerts in ThirdEye, 2019, LinkedIn
- Cyril de Catheu, Data Quality: Timeseries Anomaly Detection at Scale with Thirdeye, 2021, AB Tasty
- Tobias Macey, Gleb Mezhanskiy, Strategies For Proactive Data Quality Management, 2021, Data Engineering Podcast
- Perform research close to production, in Spector et al., Google's Hybrid Approach to Research, 2012
- Time series components, in Hyndman, R.J., & Athanasopoulos, G. (2018), Forecasting: Principles and Practice, 2nd edition, OTexts: Melbourne, Australia. OTexts.com/fpp2. Accessed on September 15, 2021.

Unless otherwise specified, images are by the author.
