Dan Frankowski | Pinterest data scientist, Business Analytics
Are your graphs getting you down? Do you feel like you’re drowning in data? The best use of data is to prompt useful action. Sit down and grab a drink while I tell a tale of trying to turn our own data frown upside-down. Or, less cute: learn how looking for anomalies in your daily business metrics can help you more quickly find, learn about and respond to changes in your business.
The year is 2013. We’re always trying to better understand our service, so we find ourselves asking, are existing Pinners happy? Are more people signing up? Are the new users happy? We had built plenty of graphs around our top-line goal of monthly active users, and if that number moved up or down, we tried to explain it, usually by digging into other graphs and data about individual features, platforms or any one of dozens of factors. Sometimes we discovered something broken or greatly improved, but wouldn’t notice it for a few days (or occasionally a few weeks!). Maybe we improved the experience on one platform, or broke it in one browser, or improved the flow of Pins in category feeds, or broke category feeds. Once we understood our data, we could fix what was broken, and learn from and celebrate successes.
The problem: too many graphs. There were hundreds of graphs (thousands of lines!), moving around every day. In that crowd of lines, just a few of the changes were the most important to our business that day, but we didn’t have people to evaluate every line.
At a hackathon later that year, we tried a quick hack to find the most informative, most surprising, most anomalous lines on the graphs.
We had an asset: we stored most of our daily business metrics in one place, an internal system called Pinalytics. Anything built on Pinalytics data could be general and widely used. It had one point per day, per line. That wasn’t highly detailed, but each point often had a lot of data behind it, so it should be less noisy. Detecting something after a minute might be better than waiting a whole day, but that’s a harder problem, and we figured detection after a day was better than not at all.
We needed to do two things: find the anomalous data, and send that information to the right people. We took a quick hack at each, as follows.
Finding the anomalous data
To find an anomaly, predict what is going to happen and highlight big deviations from the prediction. We tried a simple model: predict a steady rate of daily change. That is, the difference between day three and day two is the same as the difference between day two and day one. Example: maybe the number of users on your site on days one, two and three is 1,000, 1,010 and 1,020, meaning you add 10 users per day. If you add 10 users per day for a month, then one day you don’t add any users, that’s an anomaly.
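As a minimal sketch of the steady-change idea, assuming the daily values are held in a plain Python list (the function names `daily_changes` and `predict_next` are made up for illustration and are not from Pinalytics):

```python
def daily_changes(values):
    """Return the day-over-day differences of a list of daily values."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


def predict_next(values):
    """Predict the next value by assuming the latest daily change repeats."""
    return values[-1] + (values[-1] - values[-2])


users = [1000, 1010, 1020]      # the example numbers from the text
print(daily_changes(users))     # [10, 10]
print(predict_next(users))      # 1030
```

If day four came in at 1,020 instead of the predicted 1,030, the daily change would be 0 rather than 10, which is the kind of deviation we want to highlight.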
However, the daily changes aren’t all going to be EXACTLY the same. If you add nine users instead of 10, maybe it was just a random fluctuation. To address this, we use the variation in the past to predict today’s variation.
To get detailed, we took the daily changes from the last four to six weeks, assumed they were normally distributed (i.e. a bell curve), then put today’s change on that normal curve. From that, we computed a p-value, which, roughly speaking, is the chance of seeing a deviation at least that large if nothing but ordinary random variation were at work. So, changes with a very low p-value were unlikely to be random fluctuation. There are plenty of subtleties we are ignoring for our quick hack, such as multiple comparisons (when you test thousands of lines every day, some will cross any threshold by chance). We set the p-value threshold to 0.001, one out of a thousand. In practice, a real anomaly is often far outside the bounds of normal. We didn’t want people to learn to ignore the email, so we preferred to report only anomalies we were fairly certain were real.
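The test described above might be sketched roughly as follows. This is an illustration under stated assumptions, not the code we ran: it uses only the Python standard library, a two-sided p-value, and hypothetical names (`change_p_value`, `is_anomaly`). The caller is assumed to pass in only the last four to six weeks of values.

```python
import math
import statistics


def change_p_value(history, today_change):
    """Two-sided p-value of today's change under a normal fit to past changes."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        # Degenerate history with no variation (an assumption of this sketch):
        # any different change counts as maximally surprising.
        return 1.0 if today_change == mean else 0.0
    z = (today_change - mean) / stdev
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def is_anomaly(values, threshold=0.001):
    """Flag the last point of `values` if its daily change is too unlikely."""
    changes = [b - a for a, b in zip(values, values[1:])]
    history, today_change = changes[:-1], changes[-1]
    return change_p_value(history, today_change) < threshold


# Invented data: four weeks of adding 9 to 11 users per day.
values = [1000]
for change in [9, 10, 11] * 9 + [10]:
    values.append(values[-1] + change)

print(is_anomaly(values + [values[-1]]))      # no users added today: True
print(is_anomaly(values + [values[-1] + 9]))  # nine users added today: False
```

Adding nine users instead of 10 sits well inside the past variation, while adding none is many standard deviations away, so only the second case crosses the 0.001 threshold.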
Sending to the right people
Our goal was to quickly find and understand anomalies, and take any necessary action.
Not everyone is interested in every graph. We thought about allowing people to subscribe to anomalies in a graph, but we were concerned no one would do it. Life is busy, why go searching for more work when the benefit is unknown? (Remember, this was a quick hack.)
We figured there was likely someone at the company who knew the most about why a line moved, or would take action based on its movement, or both.
So instead, we did these things:
- We put each graph into a report group (for example, there might be 20 graphs in the “spam” group). We gave each report group a single owner who we thought might care about the data for the group. We disallowed teams as owners because it’s easier to talk to a person than a team. Rather than asking people up front, we assigned owners ourselves and reassigned a graph if feedback told us it had the wrong report group or owner. (Spoiler: this was only mildly useful, see below.)
- We sent all the anomalies, sorted by owner, group and graph, to a single email list with our “metrics avengers,” people across the company interested in how the business was doing. To make it easy to get more detail, each anomaly was linked to the graph that showed the anomaly. One user requested sparklines (small inline graphs) in the email, so he could see the trend on his mobile phone. That would be great, but not in our quick hack.
- Every day one of us would look at all the anomalies (usually around a dozen), and reply to the email with some short commentary saying, “Sarah, this number changed a lot yesterday. Do you know why?”
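To make the email step concrete, here is a small sketch of sorting anomalies by owner, group and graph and turning them into text lines with a link each. Everything in it is hypothetical: the record fields, the names, the p-values and the `example.com` URLs are invented for illustration.

```python
from operator import itemgetter

# Hypothetical anomaly records: all names, p-values and URLs are invented.
anomalies = [
    {"owner": "sarah", "group": "spam", "graph": "spam_reports",
     "p_value": 0.0002, "url": "https://example.com/graphs/spam_reports"},
    {"owner": "alex", "group": "signups", "graph": "web_signups",
     "p_value": 0.00001, "url": "https://example.com/graphs/web_signups"},
    {"owner": "alex", "group": "signups", "graph": "mobile_signups",
     "p_value": 0.0005, "url": "https://example.com/graphs/mobile_signups"},
]


def format_digest(records):
    """Return one text line per anomaly, sorted by owner, group and graph."""
    ordered = sorted(records, key=itemgetter("owner", "group", "graph"))
    return "\n".join(
        f'{r["owner"]} / {r["group"]} / {r["graph"]}: '
        f'p={r["p_value"]:.0e} {r["url"]}'
        for r in ordered
    )


print(format_digest(anomalies))
```

Sorting on the (owner, group, graph) tuple keeps each owner’s anomalies together, so a reply like “Sarah, do you know why?” can point at one contiguous chunk of the email.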
How did we do?
We did pretty well.
We found plenty of anomalies of interest to people across the company. In some cases, we found broken things to investigate. In other cases, we found positive changes no one expected. We also found changes that a few people expected (say, they were running an experiment) but many people didn’t know about. In those cases, the right person could educate a great audience (our metrics avengers) and spread the word about the change, with very specific data about how it affected our daily metrics.
Examples of anomalous changes in the last few months:
- Interest follows went up after we started emailing recommended interests to follow
- Push notifications about board follows broke
- Signups from Google+ changed as we ran experiments
- Our tracking broke when we released a new repin experience
- Our tracking of mobile web signups changed
These examples show the breadth of changes: user-facing and tracking changes, expected and unexpected.
How to get better
Our one-day hack (well, it took a couple of days) is still running 18 months (15 Internet years) later, although in different forms on two different data sets (Pinalytics version 1 and 2). Better, it’s still providing value. However, there are many ways to improve it.
First, the anomaly detection is too chatty. It finds false anomalies. The model is simplistic. Our data has strong day-of-week effects (e.g., a Saturday is different from a Tuesday), which our model ignores, and some strong day-of-year effects (e.g., Christmas is different from other days). Also, there’s no notion of which graphs are more important to our business, so it alerts on real changes that are less important, which is distracting.
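To see why day-of-week effects hurt the simple model, consider this sketch on invented data with a weekend bump. It compares day-over-day changes with week-over-week changes (today versus the same weekday last week). The week-over-week comparison is just one possible refinement, named here as an assumption; it is not something our hack did.

```python
import statistics


def lagged_changes(values, lag=1):
    """Differences between each value and the value `lag` days earlier."""
    return [values[i] - values[i - lag] for i in range(lag, len(values))]


# Invented series: steady growth of 10 per day plus a bump of 200 on two
# days of every week (standing in for a weekend effect).
values = [1000 + 10 * day + (200 if day % 7 in (5, 6) else 0) for day in range(42)]

daily = lagged_changes(values, lag=1)
weekly = lagged_changes(values, lag=7)

print(statistics.stdev(daily) > statistics.stdev(weekly))  # True
```

In this toy series the weekly pattern inflates the spread of the day-over-day changes, while the week-over-week changes are perfectly steady. A wide spread means the normal curve is wide, which makes the model both jumpy around weekends and blind to smaller real changes.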
If you do one thing, make sure there are few enough alerts that people continue to pay attention. It’s better to skip some real events than show a bunch of fake ones, which may train everyone to ignore the alerts. We took one important step (possibly deserving of its own blog post) to reduce false alerting: we started producing spam-corrected business metrics. We took the most recently discovered spammer accounts, added them to a list and adjusted business metrics discounting the spammer activity. Before spam correction, many important graphs were snapping up and down like a flag in the breeze from spammer activity (that affected metrics, but not necessarily users!).
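The idea of spam correction can be sketched in a few lines. This is a simplified illustration, assuming events arrive as (account id, action) pairs and the spammer list is a set of account ids; the function name and the data are hypothetical, and the real pipeline was more involved.

```python
def spam_corrected_count(events, spammer_ids):
    """Count events, skipping any whose account is a known spammer."""
    return sum(1 for account_id, _action in events if account_id not in spammer_ids)


# Invented events: (account id, action).
events = [
    ("u1", "repin"),
    ("u2", "repin"),
    ("bot7", "repin"),
    ("bot7", "repin"),
    ("u1", "follow"),
]
spammers = {"bot7"}  # most recently discovered spammer accounts

print(len(events))                             # raw metric: 5
print(spam_corrected_count(events, spammers))  # spam-corrected metric: 3
```

Because the spammer list grows as new accounts are discovered, a corrected metric for a past day can change when it is recomputed with a newer list.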
Second, the anomaly detection misses things. Our data has strong day-of-week effects, which not only explain large changes, but also hide small ones. If something normally goes up on Saturday, but it doesn’t this Saturday, that’s anomalous. It will also miss gradual changes, say over a week. These gradual changes would be harder to find, and might be confused with seasonality, but nevertheless may be missed. Finally, even if it finds something so important that we should continue to alert until we make the fix, it adjusts to the new level and stops alerting after a single day, as long as the next day is a change of a size we’ve seen before. This is both a feature (fewer alerts) and a bug (a change is only alerted once no matter how important).
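The “alerts only once” behavior falls out of modeling daily changes rather than levels. The sketch below replays a rolling version of the normal-curve test over invented data with one sudden drop; the 28-day window, the two-sided p-value and the name `alert_days` are assumptions made for illustration.

```python
import math
import statistics


def alert_days(values, window=28, threshold=0.001):
    """Return indexes into `values` of days whose daily change is anomalous."""
    changes = [b - a for a, b in zip(values, values[1:])]
    alerts = []
    for i in range(window, len(changes)):
        history = changes[i - window:i]
        mean = statistics.mean(history)
        stdev = statistics.stdev(history)
        if stdev == 0:
            continue  # this sketch skips windows with no variation at all
        z = (changes[i] - mean) / stdev
        if math.erfc(abs(z) / math.sqrt(2)) < threshold:
            alerts.append(i + 1)  # changes[i] is values[i + 1] - values[i]
    return alerts


# Invented data: 30 days of +9 to +11, one day of -500, then 9 normal days.
changes = [9, 10, 11] * 10 + [-500] + [9, 10, 11] * 3
values = [1000]
for change in changes:
    values.append(values[-1] + change)

print(alert_days(values))  # [31]
```

Only the day of the drop alerts. The metric stays roughly 500 lower afterwards, but the following daily changes look ordinary again, and the drop itself now sits in the history and widens the normal curve.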
Third, the anomaly detection is too slow. If something bad happens at the end of a reporting day, it’s another day of user actions, and then part of a second day aggregating the data to see the anomaly, resulting in a delay of nearly two days. This hack is valuable, but getting the data faster would be even better. Note however that doing real-time anomaly detection is harder, both because there is less data, and because the real-time measurement systems to instrument are more complex and may have more limited capabilities to work around. Graphite, for example, has time-series averaging and lagging, but not a general programming language.
Fourth, we could improve who sees what. Assigning reports to groups with owners was fine for visual grouping, but the owners had varied reactions to being nominated to look at these reports. Okay, many don’t look. There’s been far more action started by an interested party seeing an anomaly that may indicate something bad and emailing all metrics avengers to ask if anyone knows why. There’s still one global email list of metrics avengers, but our company continues to add people and graphs. With more to look at and more people to look, there may be more value in allowing teams to subscribe to their own alerts. We could also try to more systematically teach people why and how to look at these anomalies. Finally, we’ve also done some work to provide anomaly detection as a library that other teams can use for their own data, but there’s more to do getting that library in use.
A simple model based on the assumption that daily changes are normally distributed can be implemented in a few hours. Revise that model until it has only a few alerts, mostly real and important. Hook it up to a daily email to your metrics avengers, and you’re on your way!
Dan Frankowski is a data scientist on the Business Analytics team.
Acknowledgements: Andrea Burbank, Chunyan Wang and I developed this together. We also used the data and email infrastructure built by many, and relied on the tolerant vigilance of our metrics avengers to respond to many and continuing questions. We all remain open to the possibility that data, well-used, can help us be better.