Using Machine Learning to Automate Anomaly Detection on a 100,000 Device Network

Darrin Lim
Published in Inside Outcome · May 14, 2019
How Outcome Health improves the clinical workflow

Outcome Health is a healthcare innovation company that showcases relevant content to patients, caregivers and healthcare professionals at the point of care. We have a fleet of over 100,000 devices in providers’ offices across the country that improve the patient experience by delivering high-quality educational and entertainment content when patients most need it.

Our Wallboard product, for example, features interactive anatomy models that can help physicians easily explain a patient’s condition.

You can find more information about the Wallboard and our other products on our website.

Managing such a large fleet of devices requires significant operational maintenance. In particular, it can be difficult to understand how new software updates and releases affect the business value of our network. Device uptime, successful content downloads and plays, and network connectivity are just a few of the metrics we need to track, and in-house testing can only catch so many bugs.

In this post, I will describe how Data Scientists and Data Engineers on our Product Analytics team built an automated Anomaly Detection system for finding issues on our network caused by software updates and alerting end users with actionable information.

1) What are the Product Analytics Team’s KPIs?

Before diving into the details, it will help to provide some context on our Product Analytics Team’s KPIs. There are 3 metrics that our Product Analytics team cares about above all others:

  • The number of devices in the field
  • The number of devices heartbeating (online)
  • The number of sponsored content displays/plays on a device

Anything that affects these 3 critical KPIs requires immediate attention, and is therefore a prime candidate for automated anomaly detection.

2) Choosing the Right Model for Our KPIs with Jupyter Notebooks

Before trying to detect anomalous KPIs, we needed to set expectations for our KPIs in the first place. This was the perfect use case for some basic modeling.

How We Prototype

We prototype models (and most analytics related functionality) in Jupyter Notebooks. They’re perfect for data exploration and testing, which was exactly what we were looking for.

We’ve built a custom, installable package we’ve creatively dubbed “data-pipeline-utils” that allows a user to pull data from any of our many data sources (Salesforce, Treasure Data, S3, on-premise databases) with the same syntax. This has sped up our development process considerably, as it’s less of a pain to draw data from different sources, which we do all the time.

from data_pipeline_utils import Database

# Initialize database objects
db1 = Database('db1')
db2 = Database('db2')
treasure_data = Database('treasure_data', 'event_metrics')

# Results to dataframe
df = db1.to_dataframe(query='SELECT * FROM heartbeat_logs WHERE created_at > NOW() - INTERVAL 1 HOUR')
df = db1.to_dataframe(table='statuses_daily', timestamp_field='created_at', start_timestamp='2018-04-01', end_timestamp='2018-04-02')
df = db2.to_dataframe(schema='master_inventory', table='devices')
df = treasure_data.to_dataframe(table='event_metrics_daily')

# Results to file
db1.to_file(query='SELECT * FROM heartbeat_logs WHERE created_at > NOW() - INTERVAL 1 HOUR', filename='db1.heartbeat_logs.csv')
db1.to_file(table='devices', compression='gz', filename='db1.devices.csv.gz')

# Results to S3
db1.to_s3(query='SELECT * FROM heartbeat_logs WHERE created_at > NOW() - INTERVAL 1 HOUR', filename='db1.heartbeat_logs.csv', bucket='oh-data-lake', key='db1/2018/04/01/db1.heartbeat_logs.csv')
db1.to_s3(table='devices', compression='gz', filename='db1.devices.csv.gz', bucket='oh-data-lake', key='db1/dt=2018-04-01/db1.devices.csv.gz')

# Execute SQL
results = db1.execute(query='SELECT * FROM devices')
results = treasure_data.execute(query='SELECT * FROM campaign_delivery.deployed_wallboard_daily')
db1.execute(query='UPDATE devices SET last_seen_at = NOW()', is_dml=True)

How We Choose Models

When it comes to actually choosing a model, we focus first on understanding the relationship we’re trying to model as best we can. While it may be more fun to just try out lots of complex models, that is almost always less efficient (and usually much less effective) than spending the time to deeply understand the relationship you’re modeling.

We focus first on understanding the relationship we’re trying to model as best we can.

Our Use Case

In our case, modeling the number of sponsored content displays on a device required understanding how we determined what played on a given device. A breakdown shows a relatively clear, straightforward relationship with only a few key variables:

  • Sponsored content plays are limited based on when clinics are open.
  • Sponsored content plays are correlated with how many sponsored content items are on a device — the more sponsored content, the more plays, generally speaking.

By setting up a simple piecewise linear function based on (1) whether a day was a weekday or weekend and (2) the number of sponsored content items on our devices, we were able to explain the majority of the variance in our content plays.

Not only was this model effective enough for our purposes, it was easily explainable and intuitive to business users.
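
To make that concrete, here is a minimal sketch of how such a piecewise linear model could be fit with scikit-learn. The column names (play_date, content_item_count, plays) are illustrative placeholders, not our actual schema.

# A minimal sketch of the piecewise linear model described above.
# Column names are illustrative placeholders, not our actual schema.
import pandas as pd
from sklearn.linear_model import LinearRegression

def fit_expected_plays(history: pd.DataFrame) -> LinearRegression:
    features = history.copy()
    # (1) Weekday vs. weekend indicator
    features['is_weekday'] = (pd.to_datetime(features['play_date']).dt.dayofweek < 5).astype(int)
    # (2) Number of sponsored content items, plus an interaction term so that
    # weekdays and weekends get their own slope and intercept
    features['weekday_x_items'] = features['is_weekday'] * features['content_item_count']
    X = features[['is_weekday', 'content_item_count', 'weekday_x_items']]
    return LinearRegression().fit(X, features['plays'])

# The fitted model's predictions become the "expected" KPI values,
# e.g. expected_plays = fit_expected_plays(history_df).predict(todays_features)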

Adding Memory to Our Models

One added complexity we faced was getting our model to “learn” from previous phases of a release. We release our software updates in phases; Phase 1 includes 1% of the network, Phase 2 includes 5%, Phase 3 includes 25%, and so on. In the event that a software update meaningfully changed our KPIs for a product-improvement-related reason (playing sponsored content more frequently, for example), we would not want to flag that change during every subsequent phase of the release.

Instead, we wanted our model to “learn” from previous phases, which entailed some kind of memory. We implemented this by assuming that the pre-to-post-update percentage change to a KPI from a previous phase would propagate through later phases. We then simply applied that percentage change to subsequent expectations.
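
A rough sketch of that adjustment, with illustrative names and numbers, looks something like this:

# A minimal sketch of the "memory" adjustment: the pre-to-post-update percentage
# change observed in an earlier phase is applied to later phases' expectations.
# All names and numbers here are illustrative.
def adjusted_expectation(baseline_expectation, pre_update_kpi, post_update_kpi):
    pct_change = (post_update_kpi - pre_update_kpi) / pre_update_kpi
    return baseline_expectation * (1 + pct_change)

# e.g. if per-device plays rose 10% after the Phase 2 update,
# the Phase 3 expectation is scaled up by 10% as well
expected_phase_3_plays = adjusted_expectation(50000, pre_update_kpi=1000, post_update_kpi=1100)  # -> 55000.0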

This simple methodology has actually worked surprisingly well, as you can see below.

An example of memory being propagated from Phase 2 to Phase 3 of a release.

While this solution obviously does not cover all cases, and can be thrown off by the small sample sizes of earlier phases, it has been more than good enough for our purposes.
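
The flagging step itself ultimately comes down to comparing observed KPIs against these (memory-adjusted) expectations. A minimal illustration is below; the 10% tolerance is an arbitrary placeholder, not our production rule.

# Illustrative only: flag a KPI as anomalous when it deviates from its
# (memory-adjusted) expectation by more than some tolerance.
def is_anomalous(observed, expected, tolerance=0.10):
    return abs(observed - expected) / expected > tolerance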

3) Choosing the Right Infrastructure — Airflow & Docker

How We Develop Airflow Workflows

Deciding where to run and save all of this data was made much easier by our newly built Airflow environment. Airflow is a workflow management tool that allows you to easily build and schedule ETL jobs. It is based on the concept of Directed Acyclic Graphs (DAGs), which can be thought of as a graphed “to-do list”. We ingest all data from all our devices into our data lake via Airflow workflows.

A basic DAG

Development of Airflow DAGs has been made much easier with Docker — a containerization service that allows you to run isolated applications on a machine. By pulling down the Docker image of our production Airflow instance into our local environments, then running a copy of Airflow on our local machines, we’re able to rapidly build and test new Airflow workflows. This was a crucial part of building Anomaly Detection within Airflow — our iteration cycles would have been much longer without Docker.

Running Airflow on your local host is easy with Docker!

Our Use Case

Although the functionality described here isn’t what would typically be considered an ETL job, it was an output we wanted to provide on a regular, daily basis. This daily cadence would give us time to gather enough content play data to determine whether our KPIs had been meaningfully affected.

Given those parameters, Airflow seemed like a good fit. So we extended our Airflow instance beyond ETL processes, building a workflow to run the data collection, modeling, visualization, and alerting functionality that Anomaly Detection needed.
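
To give a feel for the shape of that workflow, here is a simplified sketch of what such a DAG can look like. The task names and callables are illustrative placeholders rather than our production code.

# A simplified sketch of a daily anomaly detection DAG (Airflow 1.x style).
# Task names and callables are illustrative placeholders.
from datetime import datetime
from airflow import DAG
from airflow.operators.python_operator import PythonOperator

def collect_kpi_data():
    ...

def model_and_detect_anomalies():
    ...

def render_kpi_charts():
    ...

def send_slack_alerts():
    ...

dag = DAG(
    dag_id='anomaly_detection',
    start_date=datetime(2019, 1, 1),
    schedule_interval='@daily',  # the daily cadence described above
)

collect = PythonOperator(task_id='collect_kpi_data', python_callable=collect_kpi_data, dag=dag)
detect = PythonOperator(task_id='model_and_detect_anomalies', python_callable=model_and_detect_anomalies, dag=dag)
charts = PythonOperator(task_id='render_kpi_charts', python_callable=render_kpi_charts, dag=dag)
alert = PythonOperator(task_id='send_slack_alerts', python_callable=send_slack_alerts, dag=dag)

# The "to-do list": collect, then model and detect, then visualize, then alert
collect >> detect >> charts >> alert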

Our next iteration will likely be to turn this functionality into a service that updates in real-time. However, as an initial concept, Airflow has been great for releasing fully-functional data products quickly.

4) Creating the Correct, Actionable Output

A Data Scientist’s Chief Responsibility is Instigating Action

While modeling is often spoken of as the primary responsibility of a data scientist, at Outcome Health, we believe the primary responsibility of our data scientists is to translate data into immediately actionable insights that generate business value.

Most of the time, when an insight is delivered and no action is taken, the cause has more to do with communication than with a lack of urgency. Ambiguous alert messaging, unclear next steps, and friction between message and action all work to prevent insights from being immediately actionable.

It is a data scientist’s responsibility to either build around or remove those obstacles. To that end, figuring out how to get end users to take action on the insights we deliver is probably the most important piece of any project. It is where we spend the bulk of our time.

Figuring out how to get end users to take action on the insights we deliver is probably the most important piece of any project.

Our Use Case

Our solution for Anomaly Detection was:

  1. To go to where the users are: Slack.
  2. To plug everything required for action directly into the Slack message.

This removed 2 points of friction right off the bat — users were alerted via a pushed message, rather than manually checking a dashboard, and they did not need to go digging for the data related to the alert.

What is required for action depends on the end user. In our case, our end users wanted to see the KPI data for new software releases graphed, so we provided that to them in Slack. Their response times to issues immediately improved, which fulfilled the “immediately actionable” criterion we were aiming for.
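
Mechanically, pushing such an alert can be as simple as posting to a Slack incoming webhook. The sketch below is illustrative only; the webhook URL, message format, and chart link are placeholders, and the real alerts include the rendered KPI graphs.

# Illustrative only: post an anomaly alert to a Slack channel via an incoming
# webhook. The URL, message text, and chart link are placeholders.
import requests

def post_slack_alert(webhook_url, kpi_name, observed, expected, chart_url):
    message = (
        f":rotating_light: *{kpi_name}* looks anomalous for the latest release phase.\n"
        f"Observed: {observed:,} vs. expected: {expected:,}\n"
        f"KPI graph: {chart_url}"
    )
    requests.post(webhook_url, json={'text': message})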

An example of a software issue being caught. Want more details? Feel free to reach out!

5) Outcomes & Results

Since implementing Anomaly Detection, we have saved hundreds of man-hours of manual tracking, significantly reduced operational risk, and greatly improved response time to issues that do arise. All that with nothing more than a simple model, some business logic, and an Airflow workflow.

Perhaps the most interesting, unexpected outgrowth of this work is the community buy-in and interest in other use cases. Once it became clear what was possible with this simple Anomaly Detection “framework”, the floodgates opened to all kinds of requests. Everything from Salesforce data entry QA to increased visibility into ad campaign status changes has been proposed and tackled.

We have, to date, implemented 10 separate use cases under this ever-broadening umbrella of “Anomaly Detection”.

6) Lessons Learned & Looking Forward

We’ve learned a great deal in the process of building Anomaly Detection over the past year. However, of all our fresh insights, the following were probably the most beneficial:

  • First, Airflow is good for more than just regular ETL: we’ve expanded our use of Airflow to nearly anything that can be scheduled. It has been highly effective for rapidly building high-fidelity prototypes of data products.
  • Second, highly specific instructions are much more valuable than most dashboards: our initial methods for identifying operational issues centered on dashboards. While dashboards are useful in many situations, they leave much to be desired when it comes to generating action. There’s simply too much friction between visualization and action; a well-placed instruction reduces that friction to nearly zero.
  • Third, as a result, we are able to rapidly iterate on products that generate real business value through action: this framework has been instrumental in improving how we build systems at Outcome Health.

Looking forward, we’re beginning to build increasingly technical, interesting systems. The newest use case in development is the most exciting yet — we’ll be using image recognition to ensure that screenshots taken on our devices in the field are close enough to “correct”, validated screenshots. This will catch errors like black screens, blurry videos, and more — all while using some nifty new tech.

We’re excited to see where the future takes us as we strive to improve the patient point-of-care experience.

Want to join a growing analytics team working to change healthcare? We’re always looking for talented people to join our Product Analytics team! We’re currently actively hiring for Data Analysts and Data Engineers, but regardless of role, if you feel you have something to contribute, we want to hear from you! Feel free to reach out at darrin.lim@outcomehealth.com, or check out our careers page here.

Special thanks to Jenny Beightol, Michael Gunn, and Shashin Chokshi for reviewing this blog post.

