Data Quality: Challenges of Real-World Anomaly Detection

Photo by Sean Benesh on Unsplash. (Resized)

Building an anomaly detection system is like walking a tightrope.
Too many false positives: nobody checks the alerts. Too many false negatives: the time saved with useful alerts is not worth the time spent maintaining the system. In both cases, you fall short of expectations.
In a previous article, we presented how we use ThirdEye to detect and understand data quality incidents on thousands of timeseries. We shamelessly skipped one of the most important parts of such a system: how to actually build detection rules that work. Let’s remedy that.

AB Tasty provides solutions to create online experiments and optimize user experiences. Our products are used by a broad range of industries, with clients of all sizes. Hence, the timeseries we monitor have extremely diverse patterns. This makes it hard to build effective anomaly detection rules for all our clients.

This blog describes the challenges of real-life anomaly detection, and how we tackle them in ThirdEye.

Use Case

Let’s look at a typical timeseries we want to monitor: the number of visitors.

One week of the timeseries.
One month of the same timeseries.

You can observe some patterns:
- daily seasonality: users visit around specific hours, and sleep at night
- weekly seasonality: users visit more on specific days of the week
- trend: on average, the number of visitors is growing
- noise: there are some random variations
If you are not savvy with timeseries decomposition, check the basics here.

We have 4000 clients, so we will have 4000 signals like this one. We want to detect data loss. Because we want to detect incidents quickly, we check values every hour. We define a potential data loss as a drop of 80% compared to the baseline.
For instance:
- the baseline is 200. The observed value is 140 → OK
- the baseline is 200. The observed value is 20 → Anomaly!

The baseline is the value we expect to observe, based on historical data. Computing a baseline can be simple or rely on complex ML models.
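Concretely, the rule reduces to a relative-drop check. A minimal Python sketch (the function name is ours, not ThirdEye's):

```python
def is_data_loss(value: float, baseline: float, max_drop: float = 0.8) -> bool:
    """Flag a potential data loss when the observed value drops more
    than `max_drop` (80% by default) below the expected baseline."""
    if baseline <= 0:
        return False  # no meaningful baseline to compare against
    drop = (baseline - value) / baseline
    return drop > max_drop

# The two examples above:
# (200 - 140) / 200 = 30% drop -> OK
# (200 - 20) / 200  = 90% drop -> anomaly!
```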

The same timeseries with an 85% drop on September 15: we want to detect this!

Looks easy, right? Let’s see how we implement this in ThirdEye.


First, we need to choose how to compute the baseline. We start simple.
Consider the seasonalities: it does not make sense to compare a Monday to a Sunday, nor to compare 6 PM to 6 AM. Hence, to compute our expected value, we use values with the same day of the week and the same hour.
In fact, to smooth out noise, instead of taking a single past value, we take the mean of the two previous matching values: those observed 7 and 14 days earlier.

Baseline computation logic.
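In Python terms, the baseline logic looks roughly like this (an illustrative sketch assuming hourly values keyed by timestamp, not ThirdEye's actual code):

```python
from datetime import datetime, timedelta

def baseline(series: dict, t: datetime) -> float:
    """Expected value at time t: the mean of the values observed at the
    same hour and weekday, 7 and 14 days earlier."""
    past = [series[t - timedelta(days=7 * k)] for k in (1, 2)]
    return sum(past) / len(past)
```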

We implement this rule in ThirdEye:

Detection rule with average of previous values as baseline

Right after writing the detection config, we can preview its behavior on historical data to check how it performs:

Thirdeye UI. Write detection rule — Preview — Iterate

The big anomalous drop on September 15 is detected! Great. But there is another anomaly, the red point on the left, that looks like a false positive.
Meet the first challenge of real-life timeseries anomaly detection:

Challenge 1: Noise in small values

Consider a client that has a noise of around ±100 visitors.
During the day, the client has around 2.5k visitors/hour. ±100 visitors is only a 4% change, so the detection rule does not trigger.
But what happens at night, when the traffic is very low? If the average value at night is 100, in some unfortunate cases, you can have a baseline of 180 visitors, and a current value of 30 visitors: an 83% drop!
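Plugging the numbers into the drop formula makes the problem obvious (illustrative values):

```python
def relative_drop(value: float, baseline: float) -> float:
    """Drop of the observed value relative to the baseline, in [0, 1]."""
    return (baseline - value) / baseline

# Daytime: +/-100 visitors of noise around 2500 visitors/hour is negligible.
relative_drop(2400, 2500)  # 0.04 -> far below the 80% threshold

# Night: the same noise around ~100 visitors/hour can mimic an outage.
relative_drop(30, 180)  # ~0.83 -> above the threshold: false positive!
```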
This is exactly what happens above. At first, this looks OK: the preview shows only one false positive, so we could estimate we have this error only once a month. But remember: we have 4000 timeseries to monitor. One false positive per month per timeseries means more than 130 false positives per day! This is not acceptable.
We could try averaging out the effect of noise by using more historical data. This could help, but it would not solve the real problem: the alert rule does not manage seasonalities properly.
We will fix this with an ML model, but let’s look at another challenge first.

Challenge 2: Strong Trends

Consider a timeseries with a strong trend: the historical data quickly becomes irrelevant.

Strong downward trend: many false positives

We can see the baseline (the dotted orange line) is far too high. The average value goes down quickly, so the values from previous weeks are no longer relevant. A human sees the downtrend day after day, but the simple rule only uses data from t-7 days and t-14 days. We end up with many false positives.
Here, the problem is that the alert rule does not manage strong trends.

Introducing ML models

To manage the two problems described above, we use the Holt-Winters model. This ML model learns the trend and seasonalities of the timeseries.
We implement the rule in ThirdEye:

Detection rule with ML model
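To make the model concrete, here is a bare-bones additive Holt-Winters (triple exponential smoothing) one-step forecaster. This is a sketch of the technique with illustrative smoothing parameters, not ThirdEye's implementation:

```python
def holt_winters_forecast(xs, period, alpha=0.5, beta=0.1, gamma=0.3):
    """One-step-ahead forecast with additive Holt-Winters:
    forecast = level + trend + seasonal component."""
    # Initialize level, trend and seasonals from the first two seasons.
    level = sum(xs[:period]) / period
    trend = sum((xs[period + i] - xs[i]) / period for i in range(period)) / period
    season = [x - level for x in xs[:period]]

    for t in range(period, len(xs)):
        s = t % period
        prev_level = level
        level = alpha * (xs[t] - season[s]) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[s] = gamma * (xs[t] - level) + (1 - gamma) * season[s]

    # Forecast the next point: current level + trend + matching seasonal term.
    return level + trend + season[len(xs) % period]
```

The detector then compares the observed value to this forecast, with a tolerance controlled by the sensitivity parameter.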

We preview the rule on the two challenging timeseries:

Outage timeseries. ML detection rule. Better management of noise in small values, but new false positives.
Strong downward trend timeseries. ML detection rule. Better, but still 1 false positive.

This looks better: the ML model does not get trapped by seasonalities and trends. But we get new problems: there are new false positives, and they are difficult to explain.
Notice the sensitivity parameter in the config. When it is small, the model tolerates values that are far from the baseline. When it is big, the model is less tolerant and can raise a lot of false positives. The problem is that fine-tuning the sensitivity is not straightforward. Most importantly, sensitivity does not help us implement our main objective: detecting drops of 80%.

What we would like is the best of the 2 detection rules:
→ an ML model to manage complex patterns in the timeseries
→ a simple rule to enforce our business requirements.

Combining rules is easy in ThirdEye: we use the ML model as a base detector, and we filter anomalies that don’t match our business rule:

Combining ML and simple heuristic rules.

Notice we set a higher sensitivity for the ML model. That’s OK: its extra false positives will be filtered out by the business rule.
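The combination logic amounts to intersecting the two detectors. A sketch (hypothetical helper, not ThirdEye's API; ThirdEye expresses this declaratively as a filter in the config):

```python
def filter_with_business_rule(ml_anomalies, values, baselines, max_drop=0.8):
    """Keep an ML-detected anomaly (a timestamp) only if it also violates
    the business rule: a drop of more than `max_drop` below the baseline."""
    return [
        t for t in ml_anomalies
        if (baselines[t] - values[t]) / baselines[t] > max_drop
    ]
```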

Outage timeseries. ML model and business rule combined. No false positive.
Strong trend timeseries. ML model and business rules combined. No false positive.

We now have a robust rule! Or so it seems.

Challenge 3: False Trends

Think sales, event subscription deadlines, product release promotions… all of these spread over multiple days, but don’t last forever.
What happens then?

“False trend” timeseries: seems to grow for 2 weeks, then goes back to the previous trend.

Something looks like a trend, but it’s only temporary. The model learns it and makes irrelevant predictions: we get a false positive. With our many clients, this kind of problem happens every week.
It’s extremely difficult to manage such a pattern with an ML model, so we add a new business rule: we compare the current value with the median of the last 5 weeks. The idea is that a median computed over a long history is barely impacted by false trends. We don’t raise an anomaly if the difference with the median is not significant.

We implement the filter:

Filtering anomalies that are too close to the median.
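The idea behind the filter, sketched in Python (the threshold is illustrative; ThirdEye configures this declaratively):

```python
from statistics import median

def confirmed_by_median(value, same_hour_history, min_drop=0.8):
    """Confirm an anomaly only if the value is also a significant drop
    below the median of the same hour over the last 5 weeks; a median
    over 5 weeks is barely moved by a 2-week false trend."""
    m = median(same_hour_history)
    return (m - value) / m > min_drop
```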

We get a preview:

“False trend” timeseries. ML model, business rule, median rule combined. No false positive.

No false positive! In ThirdEye, it is easy to stack heuristics on top of ML models. This is key to achieving robust detection rules. We believe it’s a must-have feature in any industrial anomaly detection system.

Challenge 4: Transient errors

A problem we see a lot is transient data drops:
- clients perform maintenance: their traffic goes down for a few minutes.
- our pipeline runs with more delay than usual: values are incomplete at detection time
These are not false positives, but such anomalies can generate a lot of alert spam. When our SLAs allow it, we want to ignore them.

Timeseries with a transient error. True positive, but too short to be relevant.

We stack another rule: ignore anomalies that last less than 2 hours:

Filtering short anomalies.
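The duration filter itself is trivial (a sketch; the 2-hour threshold comes from our SLAs):

```python
from datetime import datetime, timedelta

def long_enough(start: datetime, end: datetime,
                min_duration: timedelta = timedelta(hours=2)) -> bool:
    """Keep an anomaly only if it lasted at least `min_duration`;
    shorter ones (maintenance windows, pipeline delay) are ignored."""
    return end - start >= min_duration
```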

We get a preview:

Timeseries with a transient error. The error is ignored because it’s too short.

The transient error is ignored.

We now have a robust anomaly detection rule! Here is the complete yaml configuration:

Example of a robust rule for the simple requirement: detect drops of 80%.

Other challenges

We hope these visuals helped you understand the challenges of real-life anomaly detection. Below are some other challenges and the way we manage them in ThirdEye.

  • holidays: a classic topic in timeseries. It’s easy to manage in ThirdEye: you can ignore anomalies on specific timeframes. Actually, our main challenge at AB Tasty is to choose the correct calendars: depending on the country, holidays are totally different! For each client, we identify the major countries and take the corresponding calendars.
Ignoring anomalies on a timeframe. Here, ignoring on Patriot Day.
Ignoring anomalies when there is not enough historical data.
  • alert spam: detection jobs run every hour. If an anomaly is not resolved, ThirdEye will repeatedly send alerts. To avoid this, it’s possible to merge anomalies. We merge anomalies that are consecutive or spaced by a single hour. We end the merge after 3 days. This way, we receive a new alert if the anomaly still exists.
Merging anomalies
  • conditions on other metrics: In some cases, we want to skip the detection rule on a metric if another metric is too small. For instance, we ignore the transaction revenue detection (very noisy) if the number of visitors is too small.
Check the value of another metric before running a detection
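The merge policy from the alert-spam point can be sketched as follows (a simplified illustration; ThirdEye's merger is configured declaratively):

```python
from datetime import datetime, timedelta

def merge_anomalies(anomalies,
                    max_gap=timedelta(hours=1),
                    max_length=timedelta(days=3)):
    """Merge (start, end) anomalies that are consecutive or separated by
    at most `max_gap`. A merged anomaly never grows past `max_length`,
    so a long-lived incident still triggers a fresh alert."""
    merged = []
    for start, end in sorted(anomalies):
        if merged:
            last_start, last_end = merged[-1]
            if start - last_end <= max_gap and end - last_start <= max_length:
                merged[-1] = (last_start, max(last_end, end))
                continue
        merged.append((start, end))
    return merged

# Three hourly anomalies: the first two merge, the third stays separate.
a = [(datetime(2021, 9, 15, 0), datetime(2021, 9, 15, 1)),
     (datetime(2021, 9, 15, 2), datetime(2021, 9, 15, 3)),
     (datetime(2021, 9, 16, 10), datetime(2021, 9, 16, 11))]
merged = merge_anomalies(a)
```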

Performance at scale

In this article, we focused on finetuning a single detection rule. This does not scale at AB Tasty because we need to create thousands of detection rules. We built a tool to generate the rules based on business info such as SLA, product package, and industry. We call it the detection rule builder.
Instead of measuring the performance of a single rule, we measure the performance of the detection rule builder itself and incrementally finetune it.

We do this in production.

We follow this iterative loop:
1. Generate new rules
2. Run the detection jobs on production data for all our clients. Don’t send the alerts to the on-call system.
3. Analyze the anomalies. Label them as useful or not. Identify the limits of the currently generated rules. Identify fine-tuning opportunities.
4. Add the fine-tunings that generalize best to the detection rule builder.

Once the false positive rate is good enough, we plug the alerts into our on-call system. Throughout this process, we manually analyzed more than 400 anomalies.

There is no free lunch for anomaly detection at scale.


Even a seemingly simple detection rule can get challenging to implement in real life. We identified common problematic patterns in timeseries and explained how we manage them in ThirdEye.
We hope this article will help the data community in their next anomaly detection projects.


- Xiaohui Sun, Smart alerts in ThirdEye, 2019, Linkedin
- Cyril de Catheu, Data Quality: Timeseries Anomaly Detection at Scale with Thirdeye, 2021, AB Tasty
- Tobias Macey, Gleb Mezhanskiy, Strategies For Proactive Data Quality Management, 2021, Data Engineering Podcast
- Spector et al., Google’s Hybrid Approach to Research, 2012 (on performing research close to production)
- Time series components in Hyndman, R.J., & Athanasopoulos, G. (2018) Forecasting: principles and practice, 2nd edition, OTexts: Melbourne, Australia. Accessed on September 15, 2021.

When not specified, image by the author.


