Explainable and Accessible AI: Using Push Notifications to Broaden the Reach of ML at Headspace

Headspace
Headspace-engineering
11 min read · Oct 18, 2021

Author: Matt Linder / Co-Author: Koyuki Nakamori

TL;DR

  • Headspace wanted to serve personalized content to members who had not recently opened the app
  • We built out infrastructure that allowed us to leverage our Cloud services, plus the 3rd party Braze Customer Engagement platform, to serve Machine Learning-recommended content to members via push notifications
  • Our first experiment doing so got GREAT results
  • We implemented that experiment as an evergreen model in production
  • You should, too

Introduction and Problem Statement

Headspace’s core products are iOS, Android, and web-based apps that focus on improving the health and happiness of their users through mindfulness, meditation, sleep, exercise, and focus content. Machine learning models are core to our user experiences by offering recommendations that engage our users with relevant, personalized content that builds consistent habits in their lifelong journey. Headspace also has an amazing Lifecycle marketing team, who do incredible work in providing regular communications — through email, push notifications, in-app modals, and other surfaces — that further our users’ engagement, helping them on their journey from prospects to members to advocates.

Until recently, those two aspects — Machine Learning and the out-of-app communication channels served by Lifecycle — were totally separate. Traditionally, users consume our ML models’ personalized recommendations by

  1. Navigating to one of the app’s tabs / views that contains content (for example, our Today tab).
  2. The app client sending a request to our backend content services to load content for the user.
  3. Our content services forwarding the request to our Prediction Service, which supplies a content recommendation for that user (or, if none is available, falls back to default content).
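As a sketch, the fallback logic in step 3 might look like the following. The function and content names are illustrative stand-ins, not Headspace's actual service code:

```python
# Hypothetical sketch of step 3: the content service forwards the request to
# the Prediction Service and falls back to default content when no
# personalized recommendation is available. All names are illustrative.

DEFAULT_CONTENT = {"content_id": "basics-session-1", "source": "fallback"}

def fetch_recommendation(user_id, get_prediction):
    """get_prediction stands in for the call to the Prediction Service;
    it returns a recommendation dict, or None when none is available."""
    prediction = get_prediction(user_id)
    return prediction if prediction is not None else DEFAULT_CONTENT
```

The key property is that the caller always receives content to serve, whether or not a personalized prediction exists for that user.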

The Problem

This approach has three primary limitations:

  1. It isn’t able to respond directly to the user’s immediate context. For instance, if a user has recently searched for “trouble sleeping”, then when their local nighttime hits, we should ideally send them our best Nighttime SOS Meditation and Wind-Down content.
  2. Inactive or dormant user segments (those users who have not opened the app recently) are unreachable. Before we can influence and engage users at all, they first have to open the app. That leaves a huge opportunity to engage dormant user segments, which contribute significantly to subscriber churn.
  3. Users don’t internalize / understand when content is actually being personalized, or why they are receiving the content recommendations they are receiving. That “meta-awareness” — when a user understands not only that something is being recommended, but why — can be hugely beneficial in boosting engagement. We all like it when something is made especially for us, so a lack of this type of Explainable AI is a serious missed opportunity.

Before we dive too deep into the solution to our threefold problem, let’s expand upon this idea of Explainable AI.

According to IBM, “Explainable AI (XAI) is a set of processes and methods that allows human users to comprehend and trust the results and output created by machine learning algorithms.” This is obviously a HUGE topic, with relevance in every area of AI and ML, but for the present moment, let’s scope it to the aforementioned use-case of content recommendation.

To me, Explainable AI in content recommendation can be effectively summarized in one phrase: “Because you liked ___, you might also like ___”. This phrase, combined with an effective ML model, achieves so much:

  1. It uses kind and empathetic language to show that the service is paying attention. This makes it feel individualized.
  2. It ties the ML model’s recommended content to previously-experienced content, with which the user has established a relationship. Personally, I find that this helps prime me to go into the new content with a positive attitude and an open mind.
  3. It ties into an existing UX pattern with which the user is likely to be familiar and have positive associations. You can probably think of a couple of services you’ve used that use Explainable AI language like that above.

All of these things combined make Explainable AI a very useful paradigm for content recommendation, and they’re clear reasons why Headspace was eager to break into the Explainable AI space.
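As a concrete sketch, filling in that phrase from a recommendation reduces to simple templating. The field names here are assumptions for illustration, not our actual schema:

```python
# Illustrative sketch of generating "Because you liked ___, you might also
# like ___" copy from a recommendation. Field names are assumptions.

def xai_copy(previous_content_name, predicted_content_name):
    """Render the Explainable AI phrase for one recommendation."""
    return (f"Because you liked {previous_content_name}, "
            f"you might also like {predicted_content_name}.")
```

In practice the templating happens downstream (in Braze, as discussed later), but the idea is the same: the explanation travels with the prediction.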

Explainable AI is also a core part of Headspace’s constant work in Responsible AI. Responsible AI is a huge and hugely important topic, more than worthy of its own blog post, but the short version here is that making our ML models explainable and interpretable allows us to better understand them from a human perspective. Content Recommendation isn’t an abstract subject: every single one of our “users” is a Headspace member — a very real person who wants to use our platform to help with their mental health and wellness journey. Our content often speaks to very sensitive topics, and we know our members’ relationship with the product is a very private one. Our recommendations need to be as trustworthy as our amazing meditation teachers, and we take this responsibility very seriously. Being able to explain to our members exactly why and how they were shown a recommendation is important. Headspace is very proud to have created a Responsible AI committee to keep us on track in this line of thinking, and that will guide all of our future work with ML and AI.

Our Solution: Push Notifications via Braze Canvases

The Headspace Machine Learning team tapped into our Engineering organization’s existing event bus and service infrastructure to create a scalable, maintainable pipeline for pushing ML-powered, XAI recommendations to our end users.

Using this infrastructure framework, we have been able to deliver content recommendations to user segments that were previously inaccessible. An example of an in-app modal content recommendation for sleep content triggered by users’ recent search queries is below:

We can also leverage existing ML content recommendations triggered by a user’s content completion to recommend the next appropriate piece of content to consume (for example, recommending the next logical step in a course series that the user is currently in the middle of, or using a sequence-based ML model to predict the content they are most likely to complete next):

In this case, the Sequential Recommender model we started with is a Markov Chain-based model. If you are not familiar with Markov Models, the basic gist is that they use the history of past “state changes” (in this case just think of a state as being a single piece of content that a user consumes, and a state change as a user moving between different pieces of content) to calculate the probability of future changes, allowing us to make predictions based on those probabilities. Markov Models are some of the simplest possible tools for Sequential Recommendation, which is why we wanted to use one as our baseline for: a.) gauging the effectiveness of Sequential Models in this use-case, and b.) benchmarking future, more advanced Sequential Models (more on that later).
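A toy sketch of a first-order Markov Chain recommender like the one described above: count observed content-to-content transitions, then predict the most probable next piece of content. (In production, training runs at scale on Spark; this is just the core idea.)

```python
from collections import Counter, defaultdict

def fit_transitions(sequences):
    """sequences: lists of content ids, in the order each user completed them.
    Returns counts of observed transitions between pieces of content."""
    transitions = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            transitions[current][nxt] += 1
    return transitions

def predict_next(transitions, last_content):
    """Return the most likely next content after last_content, or None if
    we have never observed a transition out of it."""
    counts = transitions.get(last_content)
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

The None return is exactly the case where the serving layer falls back to default content.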

High Level Architecture

The high-level architecture diagram for the ML-powered push notification infrastructure is shown below. Implementation details and design considerations will follow.

The Event Bus Architecture

The Headspace Machine Learning team leverages Databricks Spark as its default compute runtime — we use Scheduled Jobs to compute and refresh predictions.

On top of that “typical” architecture, there were a couple of key decisions we made in constructing the architecture for this solution:

  1. Using Protobuf messages for scalability (the ability to push millions of messages very quickly).
  2. Using an SQS queue to decouple the producer (Databricks jobs / ML models) from the consumers (our Braze Service, etc.) and keep the design consistent with the rest of the app’s microservice-oriented, cloud-based architecture.

With this architecture, the execution from the ML side is actually quite simple. Model data pull, training, and post-processing all happen via scheduled jobs on Databricks. We then use another scheduled job to send our prediction payloads — user id, previously-viewed content name, predicted content name, and a deep link to predicted content, just packaged as a Python dictionary — to our Push Requests SQS Queue.
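A minimal sketch of that final step, with the actual queue send injected so the packaging logic stands on its own. In production this is a scheduled Databricks job; the field names and the boto3 wiring shown in the comment are illustrative assumptions:

```python
import json

def build_payload(user_id, previous_content_name, predicted_content_name, deep_link):
    """Package one prediction as the dictionary described above."""
    return {
        "user_id": user_id,
        "previous_content_name": previous_content_name,
        "predicted_content_name": predicted_content_name,
        "deep_link": deep_link,
    }

def send_payloads(payloads, send):
    """Serialize and enqueue each payload. `send` wraps the real SQS call,
    e.g. lambda body: sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=body)."""
    for payload in payloads:
        send(json.dumps(payload))
    return len(payloads)
```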

This wakes up our Push Notifications Lambda, which — processing the payloads in batches — repackages each payload as a Protobuf and sends them on to our Event Bus SNS Topic, which puts them on the Headspace Event Bus. Our Braze Service, listening for relevant, Braze-related events, takes the ML prediction payloads and forwards them to Braze, a third-party Customer Engagement Platform that we will discuss… now!
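Sketched in Python, the Lambda's batch handling might look like the following. The Protobuf serializer and the SNS publish call are injected so the batching logic is self-contained; the generated Protobuf class and topic wiring are assumptions, not our exact implementation:

```python
import base64
import json

def handle_batch(sqs_records, serialize, publish):
    """Repackage each SQS record's JSON payload (e.g. via a generated
    Protobuf message class) and publish it to the Event Bus SNS topic.
    Protobuf bytes are base64-encoded because SNS message bodies are text."""
    published = 0
    for record in sqs_records:
        payload = json.loads(record["body"])
        message_bytes = serialize(payload)
        publish(base64.b64encode(message_bytes).decode("ascii"))
        published += 1
    return published
```

Because the Lambda processes payloads in batches, a single invocation can fan a chunk of predictions onto the Event Bus at once.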

Braze

We don’t have time to get into everything that Braze does, or even everything that Headspace uses Braze for, so we’ll stick to what’s relevant: push notifications.

In this use-case, Braze functions as a platform from which Lifecycle Marketing teams can group app users into audience segments and create Canvases and Campaigns to interact with those segments via push notifications. These push notifications are often triggered by in-app actions, which is what the Braze Service is typically listening for, but we’ve set it up so that we trigger our Canvas with a custom event containing the very same ML prediction payloads we discussed in the last section!

Treatment/Control Bucketing and Canvases

As a mature Customer Engagement Platform, Braze has robust features for cleanly bucketing users into control/treatment groups for A/B testing and experimentation. The exact details of implementation are outside the scope of this article, but the basic outline was:

  • (with help from Lifecycle) We set up an appropriate Segment of users for our experiment
  • Then built our Canvas — a user flow that has the capability to: 1.) Bucket our user Segment into Treatment and Control populations; and 2.) Send different Push notifications to each of those groups
  • We used Braze Liquid variables to generate customized and personalized templates using the properties from our custom Event (remember: that payload from earlier)
  • We followed Lifecycle’s guidelines for setting up the many, MANY guardrails necessary for this type of direct communication with Headspace members: observing Quiet Hours, rate-limiting, global limits on the number of push notifications per day, and so on.
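As one illustration, a quiet-hours guardrail reduces to a time-window check that wraps past midnight. The window here is an assumption for the sketch, not our actual policy (and in practice Braze enforces this for us):

```python
from datetime import time

QUIET_START = time(21, 0)  # assumed start of quiet hours, 9pm local
QUIET_END = time(9, 0)     # assumed end of quiet hours, 9am local

def in_quiet_hours(local_time):
    """True if a send at the user's local time falls inside the quiet
    window, which wraps past midnight."""
    return local_time >= QUIET_START or local_time < QUIET_END
```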

We also implemented a Braze feature called Intelligent Timing for our Canvas. In essence, this feature allows you “to deliver your message to each user at the time which Braze determines that an individual is most likely to engage. Braze calculates the optimal send time based on a statistical analysis of your user’s past interactions with your messaging (on a per channel basis) and app.” There’s a lot to like about this feature, but we’re especially big fans of the fact that it allows us to batch send our Events to Braze without having to worry about when our users are going to receive the push notifications, in terms of their local timezone.

Impact

So far, we’ve implemented the above setup in one highly successful experiment that’s turned into an evergreen model. The experiment, which used a Sequential Recommender model to predict relevant content for our members, achieved great results across many metrics, but — as mentioned in the introduction — we were most interested in its potential impact on users who weren’t already in the app. To that end, we were thrilled to see a 78.65% lift in Completes (vs. Control) among our Dormant segment of members, defined as people who had not completed a piece of content in the last 30 days.

Other significant metrics included:

[Open Rate]:

  • 4.49% statistically significant lift in Direct Open Rate over Control

[Content Start and Complete]:

  • 54.68% statistically significant lift in Content Start per Send
  • 68.49% statistically significant lift in Content Complete per Send among all member cohorts
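For clarity, the lift figures above are relative improvements of the Treatment group's rate over Control's (the significance testing itself is a separate step, not shown):

```python
def lift_pct(treatment_rate, control_rate):
    """Relative lift of treatment over control, as a percentage."""
    return (treatment_rate - control_rate) / control_rate * 100
```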

And this is just the beginning. Now that the infrastructure for serving ML predictions via push notifications is built out and this model is in production, the world is our oyster. There are so many possibilities with this new surface, not just for designing new experiments in content recommendation, but for any type of ML intervention we can think of. We’ve got big plans, and we hope to share them with you soon.

Next Steps

There’s WAY too much to share here, but to scope out a small preview, we can discuss just the ML side of things. The experiment above was performed with a baseline Sequential Recommender model, a Markov Decision Process model of our own design. As the ML-heads out there know, this is far from state of the art. Essentially, we achieved great results with a baseline model, but for the future, we plan on iterating toward more and more state-of-the-art/complex models, including BERT4Rec (already achieving great results for us on in-app predictions), other Attention-based models like TiSASRec or Switch Transformers, all the way to full-on Reinforcement Learning, one of the current Holy Grails of ML.

Relatedly, we’re also interested in doing our own optimization on push timing, to replace Braze Intelligent Timing. Intelligent Timing is really great, but we want even more flexibility. If we’re going to recommend a user a piece of Sleep content, isn’t it better to do so in the evening, even if Braze knows that they normally open their Headspace app at 9am (and would thus send the notification around then)?
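One hypothetical direction: bias the send hour by content category, falling back to the engagement-based hour (what Intelligent Timing learns) when the category has no natural time of day. The mapping below is invented purely for illustration:

```python
# Invented category-to-hour mapping for illustration only.
CATEGORY_SEND_HOUR = {"sleep": 21, "wind-down": 20}

def preferred_send_hour(content_category, engagement_hour):
    """Prefer a category-appropriate local hour; otherwise fall back to
    the user's historically most-engaged hour."""
    return CATEGORY_SEND_HOUR.get(content_category, engagement_hour)
```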

Key Takeaways

In the hope that this post has inspired you to take the plunge and start serving ML via push, we thought we’d share some lessons we learned along the way:

Best Practices:

  • Dogfood: Test Braze sends on a (VERY CAREFULLY CREATED IN BRAZE) Segment of in-house users, and make sure to have any delays/Intelligent Timing turned off
  • Listen to the experts: As mentioned a million times above, at Headspace, Braze is owned by the Lifecycle Marketing team, and they know what they’re doing. We always deferred to them on matters of Segment and Canvas creation, especially creative matters like notification copy.

Be Patient:

  • As you can tell from the Architecture section, a lot of work went into provisioning the Cloud resources necessary to set this up. If your organization has any type of (Dev)Ops and/or CI/CD infrastructure, expect to work closely with them in setting up the necessary connections (Queues, Topics, Services) to make it all work. Hope you like Terraform!

Lessons Learned:

  • Copy matters! Sure, we all like the concept of “Because you liked ___, we think you’ll like ___”, but take time to work with Marketing and Creative to sculpt your copy. What does it really mean to like a piece of content? Did the user like it, just because they watched/listened to it, or is there a better way of talking about it?

We hope this has been helpful. Serving Machine Learning predictions via Push opened up a whole new surface for us to interact with our members, and it’s been a really interesting system to develop.
