How to Design Data-Driven Alerts

Colin Jemmott
Seismic Innovation Labs

--

Alerts are an underappreciated part of data science. Designing good alerts was much harder than I expected it to be — my first several attempts at alerting systems were dismal. This article describes an approach that ended up working well. Because my experience is with B2B SaaS companies, this discussion is geared towards alerts sent to people at work.

What are “data-driven alerts”?

First, by “data-driven alerts” we mean notifying users about meaningful data changes. Understanding exactly what that means will help us make good architecture and design choices, so let’s step through it slowly.

Notifying: This could be email, a text message, or an alert inside an app. But it isn’t simply logging or analytics; it is an active process designed to, ummm, alert!

Users: Alerts are for humans! They should be designed for humans and sent to actual people.

Meaningful: This isn’t simply a weasel word. Meaningful is usually a business question. A good test is whether the alert involves both money (making or losing it) and control (the user can do something about it). A lot of either one can be enough, but if neither is involved, it is probably a bad alert. Meaningful is also a trust question — false alerts erode trust and cause future notifications to be ignored.

Data changes: These alerts are not simply about state (e.g. the server is down), but are about streams of data. Usually the goal is to figure out what is “normal” and what is “surprising”.

So taken all together, “notify users about meaningful data changes” makes our goal clear: helping users make data-driven decisions. Notifications should cause humans to change their behavior. If your alert doesn’t cause someone to do something different, it isn’t a good alert.

So how do we do that? Here is the approach that worked for me:

Data

Before you even get started with data-driven alerts, you need access to clean, near real-time data that you understand. Solid data is critical for alerts. Next, you should share that data, preferably with the person who will be getting the alerts. This sharing might take the form of plots, an API, a BI tool, whatever. By sharing the data with them you will get feedback about quality, facilitate discussions about what is important, and allow end users to self-serve when they have questions about alerts.

Metrics

Choosing the right metric is a business question and a math question. First, identify common scenarios (not rare cases) that have significant business impact. Next, try to write down a metric that captures that scenario and is robust, explainable, and aligned with intuition. I find it best to describe exactly what the metric means in words, trying to be as crisp as possible.

It is worth spending some time on this, because it is the hardest step. For example, we may have an email marketer who wants to know when their click rate has significantly dropped. So our metric might be “number of clicks today divided by number of emails sent”. As it turns out, this is a terrible metric.

On the business side, sometimes you need to dig deeper with the customer to understand what they are really asking. Our marketer may say they want to know “when the click rate is low” but what they are really concerned about is “Did I just send a mass email that people aren’t clicking on?”. These questions may have different proxy measures.

Translating that to math means calculating the percentage of emails that are clicked on at least once (the unique click rate), rather than the total click rate (which can be inflated by a few bots clicking thousands of times).

It turns out the unique click rate doesn’t exactly address the marketer’s main concern, because emails are often clicked on long after they are sent — sometimes days later. Our alert needs to be timely, so we can’t wait for all the clicks to come in. One solution is to predict the expected final unique click rate with only the first few hours of data.

Marketing emails are often delivered (blue) in large batches, but the user clicks (green) take much longer to happen. This means the click rate increases over time, every time!

So our revised metric might be “What percentage of emails sent in the last 24 hours do we expect to be clicked on at least once?” It is imperative to find a precise wording of the question that addresses the business need and can also be calculated efficiently.
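
As a rough sketch of what that calculation might look like in code: the sends and clicks DataFrames and their email_id / timestamp columns are hypothetical, and the click-lag correction is simplified to a single scaling factor estimated from history.

```python
import pandas as pd

def expected_unique_click_rate(sends: pd.DataFrame, clicks: pd.DataFrame,
                               now: pd.Timestamp,
                               expected_fraction_observed: float = 0.6) -> float:
    """Estimate what fraction of emails sent in the last 24 hours will
    eventually be clicked at least once.

    Assumes hypothetical `sends` and `clicks` DataFrames, each with
    `email_id` and `timestamp` columns. Counting each email at most once
    keeps a few bot-clicked emails from inflating the rate; dividing by
    `expected_fraction_observed` (the share of eventual clicks that
    typically arrive this early) roughly corrects for clicks that
    haven't happened yet.
    """
    window_start = now - pd.Timedelta(hours=24)
    recent = sends[sends["timestamp"].between(window_start, now)]
    if recent.empty:
        return float("nan")  # nothing sent, metric undefined

    clicked_ids = set(clicks["email_id"])
    observed_rate = recent["email_id"].isin(clicked_ids).mean()
    return min(1.0, observed_rate / expected_fraction_observed)
```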

Scores

While it is possible to notify directly from a metric, I prefer to translate the metric into a score. This score is designed to normalize the metric, add robustness, and detect anomalies. A good score will answer the question “how surprising is this metric?”

My personal preference is to normalize the metric so that 1 is “typical” and higher is “bad”. So in my email example, a good score might be “30 day average metric / 24 hour average metric”.

As it turns out, this does an abysmal job as a score in practice. The problem is that marketing email volume is highly variable, so this score would alert on days when very few emails were sent. This fails our “meaningful” test! One way to fix this is with a volume penalty that reduces the score when the volume is lower than on a typical day.
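
A minimal sketch of that score with a volume penalty might look like the following; the exact form of the penalty here (a simple ratio capped at 1) is an assumption for illustration, not a prescribed formula.

```python
def alert_score(metric_24h: float, metric_30d: float,
                volume_24h: float, volume_30d_daily_avg: float) -> float:
    """Score where roughly 1 is 'typical' and larger is 'worse'.

    The ratio flags drops in the metric; the volume penalty (an assumed
    form, not a prescribed one) shrinks the score on low-volume days so
    a handful of unclicked emails doesn't trigger an alert.
    """
    raw_score = metric_30d / max(metric_24h, 1e-9)  # avoid division by zero

    # Penalty in (0, 1]: 1.0 on a typical-volume day, proportionally
    # smaller when today's volume is below the 30-day daily average.
    volume_penalty = min(1.0, volume_24h / max(volume_30d_daily_avg, 1.0))
    return raw_score * volume_penalty
```

With this form, a click rate at half its 30-day norm scores 2.0 on a typical-volume day, but only 0.2 on a day with a tenth of the usual volume, so the quiet day never crosses the threshold.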

Notifications

Don’t do this to my inbox!

Now we have a score that tells us when there is a meaningful data change! The last step is notifying a user. It turns out it is critical to design a notification manager to avoid sending tons of meaningless alerts or failing to send critical updates.

Imagine we run our alerting code as an hourly batch job. After a bad send, the user would receive an email every hour until the condition clears! Here is a general recipe for a notification manager to build from (a rough code sketch follows the list):

Send an initial notification when score > threshold for N hours.
Send another notification when:
- Score doubles (getting worse)
- Score < threshold for M hours (cleared)
- The alert has been ongoing for 24 hours
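
Translated into code, a bare-bones version of that recipe might look like this; the hour counts and message strings are placeholders, and a real system would also need to persist state between batch runs.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class NotificationManager:
    """Bare-bones sketch of the recipe above, called once per hourly batch."""
    threshold: float
    n_hours: int = 2   # hours above threshold before the first alert
    m_hours: int = 3   # hours below threshold before we call it cleared
    hours_above: int = 0
    hours_below: int = 0
    hours_active: int = 0
    active: bool = False
    score_at_last_alert: float = 0.0

    def update(self, score: float) -> Optional[str]:
        """Return a message to send for this hour, or None to stay quiet."""
        if score > self.threshold:
            self.hours_above += 1
            self.hours_below = 0
        else:
            self.hours_below += 1
            self.hours_above = 0

        if not self.active:
            if self.hours_above >= self.n_hours:
                self.active = True
                self.hours_active = 0
                self.score_at_last_alert = score
                return f"ALERT: score {score:.1f} above threshold for {self.n_hours} hours"
            return None

        self.hours_active += 1
        if self.hours_below >= self.m_hours:       # cleared
            self.active = False
            self.hours_above = self.hours_below = self.hours_active = 0
            return "CLEARED: score back below threshold"
        if score >= 2 * self.score_at_last_alert:  # getting worse
            self.score_at_last_alert = score
            return f"WORSE: score doubled to {score:.1f}"
        if self.hours_active == 24:                # still ongoing
            return "ONGOING: alert has been active for 24 hours"
        return None
```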

Other special cases to watch for in designing a notification manager are on-boarding (what do you do when you don’t have the history to make your score?) and correlated or nested alerts (often multiple things go bad at once, resulting in a cascade of simultaneous alerts that should be bundled or suppressed).

Alerts for Humans

Now we have clean, real-time data we understand, metrics that accurately capture important business scenarios, scores that robustly tell us how surprising or bad the current metric is, and a notification manager to ensure we only bother someone when it counts. But our alerts are useless if they don’t provide enough information for the recipient to take action.

An example email alert that describes the situation, provides context, and provides resources for the user to begin diagnosing and fixing the problem.

Notifications need to:

  • Describe the data. What is the source? How is it filtered? When was the last sample taken?
  • Provide context. Show or describe trends, recent minimum or maximum, the scope of the problem. Help users understand what “normal” is.
  • If possible, identify causes and solutions. If it is not possible to automatically diagnose the issue, offer help or provide links to resources.
  • Include additional information. The notification is a high-level jumping-off point, and users will be left with questions. Ideally give them access to a BI tool or plots where they can answer questions, but even a link to a raw log of recent activity can be helpful.

Notification UI/UX matters. Give users control over how and when they are notified. It is unforgivable not to allow users to opt out, and best practice is to also let them change the threshold and frequency of notifications. Finally, strive for clarity in design, paying careful attention to placement, ordering, and colors. Clearly indicating the level of the notification (informational / alert / critical) helps users prioritize. And always remember that the user has other tasks and may not understand jargon.
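
As a concrete (and entirely hypothetical) example of the checklist above, an alert body might be assembled like this; every field name, value, and URL is a placeholder rather than part of any real system.

```python
def compose_alert_body(level: str, metric_name: str, current: float,
                       typical: float, dashboard_url: str, log_url: str) -> str:
    """Assemble an alert message that describes the data, gives context
    for what "normal" looks like, and links to resources for digging in."""
    return "\n".join([
        f"[{level.upper()}] {metric_name} looks unusual",
        "",
        # Describe the data: source, filtering, freshness.
        "Source: email engagement events, last 24 hours, refreshed hourly.",
        f"Current value: {current:.1%}",
        # Provide context so the reader knows what "normal" is.
        f"Typical value (30-day average): {typical:.1%}",
        "",
        # Give the reader somewhere to go with their follow-up questions.
        f"Dashboard: {dashboard_url}",
        f"Recent activity log: {log_url}",
    ])

# Example:
# print(compose_alert_body("alert", "Unique click rate", 0.012, 0.034,
#                          "https://bi.example.com/clicks",
#                          "https://logs.example.com/email-events"))
```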

Development Plan

Development of data-driven alerts is inherently iterative — even with clearly defined situations there are often edge cases that aren’t immediately obvious. To make sure you only send good alerts to users, I recommend an approach like this:

  1. Historical Testing. Use a BI tool or code to develop the alerting pipeline, then play it back on historical data to see when alerts would have fired (see the sketch after this list). I find it helpful to also have some synthetic data covering both typical and unusual cases. The goal here is to get a feel for when alerts would have fired if the system had been active.
  2. Internal Testing / Human-in-the-loop Testing. Once the system is up, start by sending the alerts just to yourself. Get a feel for them — do you care about the alerts you are getting? Next, start reviewing and manually approving alerts to be sent to the end user. It is better to be conservative here.
  3. Customer alerts with follow-up calls. Once alerts start flowing to customers directly, it is good to call them directly and offer to help them resolve the issue. This will help you know if the customers understand the alerts, care about them, and can help you improve your notifications. While this is time consuming, there is no substitute for direct end-user feedback.
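
For the historical-testing step, the playback can be as simple as the following sketch, reusing the NotificationManager sketched in the Notifications section and a hypothetical series of hourly scores indexed by timestamp.

```python
import pandas as pd

def replay_alerts(hourly_scores: pd.Series, threshold: float) -> pd.DataFrame:
    """Run the notification logic over historical hourly scores and
    record every message that would have been sent, so you can eyeball
    whether the alerts line up with events you already know about."""
    manager = NotificationManager(threshold=threshold)
    fired = []
    for timestamp, score in hourly_scores.items():
        message = manager.update(score)
        if message is not None:
            fired.append({"timestamp": timestamp, "score": score,
                          "message": message})
    return pd.DataFrame(fired)
```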

Conclusion

This is only one way to build alerts — there are many. At the end of the day, as long as the user being notified changes their behavior, then you have helped someone make a data-driven decision. It doesn’t get better than that!

--

Colin Jemmott
Seismic Innovation Labs

I am a data scientist at Seismic Software and lecturer in the Halıcıoğlu Data Science Institute at UC San Diego. http://www.cjemmott.com/