Segmentation at Scale to Enable Better Health Outcomes

Henrik Kowalkowski
Published in Inside League
Mar 13, 2023
[Figure: An illustrative example of the segmentation of a member's journey]

Motivation

At League, it is our mission to empower people to live happier, healthier lives. To best serve our members, we need to be able to predict each member's level of engagement. For example, if we notice that a segment of members is unlikely to return to the app in the next week, we can send them notifications nudging them to return. In doing so, we help promote healthy habits like being active and eating well. We call our engagement prediction and segmentation service "Retention Engine".

So how does all of this work?

TL;DR: the Model and Segmentation Logic

We use a Recency, Frequency, Monetary Value (RFM) style model in conjunction with a Logistic Regression classifier to calculate the probability that a member will return in a given week. (For our purposes we are only interested in the engagement components of the model, so we omit the monetary value feature.) We break these probabilities into segments, applying business-logic cutoffs to determine the segment a member is predicted to belong to in a given week. After these probability cutoffs are applied, members fall into four model-based segments: unengaged, at-risk, engaged, and loyal. Unengaged members are very unlikely to return to the platform in the next week, whereas at-risk members require immediate action to retain. Engaged and loyal members have a high predicted probability of returning. We segment the remaining unclassified members using simple rules-based logic.
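
As a minimal sketch of the probability-to-segment step (synthetic data, and illustrative cutoff values rather than our actual business logic), the flow looks roughly like this in Python:

# Sketch only: toy data and placeholder cutoffs, not production code.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Hypothetical RFM-style engagement features, one row per member:
# recency (weeks since last login), frequency (active weeks), t (weeks in window).
X = rng.integers(0, 12, size=(1000, 3))
# Toy label: members with low recency tend to return next week (plus noise).
y = ((X[:, 0] + rng.normal(0, 2, 1000)) < 2).astype(int)

model = LogisticRegression().fit(X, y)
p_return = model.predict_proba(X)[:, 1]  # P(member returns next week)

# Illustrative cutoffs; the real thresholds are set with stakeholders.
def segment(p):
    if p < 0.10:
        return "unengaged"
    elif p < 0.40:
        return "at-risk"
    elif p < 0.75:
        return "engaged"
    return "loyal"

segments = [segment(p) for p in p_return]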

Technical Considerations

Tech Stack

League is a strategic partner of Google Cloud (GCP). For this project we wanted to leverage a variety of services that GCP provides:

  • Structured data is managed with BigQuery, which provides low-latency big data processing at scale.
  • Python functions are executed using Cloud Run.
  • Models are saved and loaded from Cloud Storage on a weekly basis.
  • Artifact Registry is used to manage image versions.
  • Processes are scheduled using Cloud Composer.
  • Looker is used to visualize segment counts day to day.
  • Salesforce Marketing Cloud is linked to BigQuery to customize messaging to the individual member.
  • Finally, as League is a FHIR-native healthcare company, we write the results to the Cloud Healthcare API as FHIR Observations (a sketch of this step follows the list).
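
As a rough illustration of that last step, here is a minimal sketch of writing one segment result as a FHIR Observation through the Cloud Healthcare API's FHIR REST interface. The project, dataset, store, member reference, and coding system are placeholder assumptions, not our actual configuration.

# Illustrative sketch: write a segment result as a FHIR Observation via the
# Cloud Healthcare API REST interface. All resource names below are placeholders.
# Requires Application Default Credentials with access to the FHIR store.
import google.auth
from google.auth.transport.requests import AuthorizedSession

credentials, project = google.auth.default()
session = AuthorizedSession(credentials)

fhir_store = (
    "https://healthcare.googleapis.com/v1/projects/example-project/"
    "locations/us-central1/datasets/example-dataset/fhirStores/example-store/fhir"
)

observation = {
    "resourceType": "Observation",
    "status": "final",
    # Placeholder coding system for the predicted segment:
    "code": {"coding": [{"system": "https://example.com/segments", "code": "at-risk"}]},
    "subject": {"reference": "Patient/example-member-id"},
    "effectiveDateTime": "2023-03-13T00:00:00Z",
}

response = session.post(
    f"{fhir_store}/Observation",
    headers={"Content-Type": "application/fhir+json"},
    json=observation,
)
response.raise_for_status()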

Data Granularity and Period

The granularity of the data plays a large role in the performance and complexity of a model. To simplify the inputs for those consuming the results downstream, we use weekly engagement data: if the member logged in within a given week they receive a value of 1, and 0 otherwise. The outputs of the RFM model (recency, frequency, and t, the time since the member entered the data window) are also of interest for further segmentation.
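
As an illustration, the weekly binary signal and the RFM-style summaries could be derived from raw login events along these lines (the table, column names, and the exact definition of recency here are assumptions):

# Sketch of deriving weekly engagement and RFM-style features from raw logins.
# The login_events table and its columns are hypothetical.
import pandas as pd

login_events = pd.DataFrame({
    "member_id": ["a", "a", "b"],
    "login_at": pd.to_datetime(["2023-01-02", "2023-02-20", "2023-01-10"]),
})

# Collapse logins to one binary value per member-week: 1 if any login occurred.
weekly = (
    login_events
    .assign(week=login_events["login_at"].dt.to_period("W"))
    .groupby(["member_id", "week"])
    .size()
    .clip(upper=1)
)

# RFM-style summaries (monetary value omitted, as in the model above).
window_end = pd.Timestamp("2023-03-13").to_period("W")
features = weekly.reset_index().groupby("member_id").agg(
    frequency=("week", "size"),   # number of active weeks
    first_week=("week", "min"),
    last_week=("week", "max"),
)
# Here recency = weeks since the last active week; definitions vary by model.
features["recency"] = window_end.ordinal - features["last_week"].map(lambda p: p.ordinal)
features["t"] = window_end.ordinal - features["first_week"].map(lambda p: p.ordinal)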

Another consideration is the length of data to make predictions with, also known as the data window. This period is set using business context and performance requirements. From a performance perspective, the data window can be treated as a hyperparameter and is relatively easy to tune (a sketch follows). From the business perspective, it is the stakeholders, the downstream users of the segments and associated outputs, who need to be consulted to ensure that the window makes sense. For example, we may work with a client whose member base experiences high churn, so a longer data window may not be suitable for messaging.
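
On the performance side, selecting the window can be as simple as scoring candidate lengths on held-out data. A hedged sketch, with a hypothetical stand-in for the real feature pipeline:

# Sketch of treating the data window length as a hyperparameter.
# build_features is a toy stand-in for the real feature pipeline.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

def build_features(window_weeks, n_members=1000):
    """Hypothetical stand-in: RFM features computed over the given window."""
    X = rng.integers(0, window_weeks, size=(n_members, 3))
    y = ((X[:, 0] + rng.normal(0, 2, n_members)) < 2).astype(int)
    return X, y

scores = {}
for window_weeks in (8, 13, 26, 52):  # candidate window lengths, in weeks
    X, y = build_features(window_weeks)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    probs = LogisticRegression().fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    scores[window_weeks] = roc_auc_score(y_te, probs)

best_window = max(scores, key=scores.get)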

Performance Evaluation

Transparent performance evaluation is key to getting stakeholders to buy into the results of a model. This can be accomplished by considering the baseline performance context in which the downstream consumers of the model previously had to operate. If the retention model can outperform comparator baselines, it becomes clear to stakeholders that they can trust the results and apply them on a machine-to-machine basis at scale.

For our segmentation model we compare its performance to that of two separate baselines. The first baseline, the Majority Classifier, is the most naive: it assumes that the majority action of members from the last week will be the action all members take in the next week. So if most members logged in within the last week, it predicts that all members will log in within the next week. The second baseline, the Last Label Classifier, is more tailored: it predicts, on a per-member basis, that last week's action is the action the member will take in the next week.
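
Concretely, both baselines reduce to a few lines, given a binary vector of last week's engagement:

# Sketch of the two baseline classifiers described above. last_week is a
# binary array: 1 if the member logged in last week, 0 otherwise.
import numpy as np

def majority_classifier(last_week):
    """Predict the majority of last week's actions for every member."""
    majority = int(np.mean(last_week) >= 0.5)
    return np.full_like(last_week, majority)

def last_label_classifier(last_week):
    """Predict that each member repeats last week's action."""
    return last_week.copy()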

For the area under the receiver operating characteristic curve (ROC AUC), a higher score is better and the score ranges over [0, 1]. Since the Majority Classifier is a no-skill classifier that predicts the same value for all members, it receives a score of 0.5 each week. The Last Label Classifier performs better, as it is personalized to the member level. The Retention Engine, using the RFM and Logistic Regression combination, performs best. A member's previous engagement is quite indicative of their subsequent engagement, as the performance curves for the Retention Engine and the Last Label Classifier mirror each other.
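
To make the weekly comparison concrete, here is an illustrative evaluation on synthetic data (the outcome vector and model probabilities are toy stand-ins, not real member data):

# Illustrative weekly evaluation: ROC AUC for each approach against the
# observed next-week logins. All vectors below are synthetic.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
last_week = rng.integers(0, 2, size=1000)  # 1 = logged in last week
# Toy ground truth, correlated with last week's behaviour:
next_week = np.where(rng.random(1000) < 0.8, last_week, 1 - last_week)
# A stand-in for the Retention Engine's predicted probabilities:
model_probs = np.clip(0.8 * last_week + rng.normal(0.1, 0.1, 1000), 0, 1)

majority = np.full_like(last_week, int(last_week.mean() >= 0.5))
print(roc_auc_score(next_week, majority))     # 0.5: the no-skill Majority Classifier
print(roc_auc_score(next_week, last_week))    # Last Label Classifier
print(roc_auc_score(next_week, model_probs))  # probabilistic model analogue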

Obstacles

During development we encountered a couple of challenges that ultimately improved the results of our model. In previous iterations we had tightly constrained the data window. This had the benefit of restricting our predictions to the members that our capability teams thought would be most actionable. However, the constrained window resulted in convergence errors, as our model had too few distinct data points to fit on.

Increasing the length of the data window ameliorated this issue but resulted in the model predicting on more members than we might actually wish to take action on. Working with our marketing team, we created a fourth segment, unengaged, that could be classified but excluded from messaging. This also reduced the amount of data that we needed to write to the Cloud Healthcare API, improving our write times, which had been considerable. With these tweaks we were able to improve both the model's performance and the efficiency of the process without compromising its business application. In the figure below we provide an example of what a member journey might look like through Retention Engine:

The member begins their journey on day 1 and is active for the first 2 days (black stars). They are labeled as new for the first 2 weeks. After 2 weeks the model has enough data to make a prediction and classifies them as at-risk, since the member only logged in during the first week of the period (the first 2 days). In the week they are classified as at-risk, the member does not log in, so in the following week they are classified as unengaged. After 3 weeks as unengaged and without logins, the member falls out of the model and is marked dormant. After 90 days without a login, the member moves from dormant to lapsed. On the 99th day the member returns and is marked reactivated. In the following week the member is picked back up by the model and classified as at-risk. Over the next several weeks the member continues to log in and moves from engaged to loyal.
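
The rules-based side of this journey (new, dormant, lapsed, reactivated) can be sketched as a small function. The thresholds are the ones described above, but the function's shape and inputs are illustrative, not our production implementation:

# Sketch of the rules-based lifecycle labels from the journey above. The
# model-based segments (unengaged, at-risk, engaged, loyal) come from the
# classifier; this covers the remaining members. Inputs are illustrative.
def rules_based_segment(previous_segment, weeks_of_history, weeks_as_unengaged,
                        days_since_login, logged_in_this_week):
    """Return a rules-based label, or None to defer to the model-based segments."""
    if weeks_of_history < 2:
        return "new"          # model needs 2 weeks of data before predicting
    if logged_in_this_week and previous_segment in ("dormant", "lapsed"):
        return "reactivated"  # member came back after dropping out
    if days_since_login > 90:
        return "lapsed"       # 90+ days without a login
    if weeks_as_unengaged >= 3 and not logged_in_this_week:
        return "dormant"      # falls out of the model's scope
    return None               # model-based segmentation applies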

Conclusion

Building this service took input from Data Scientists, Analysts, Engineers, Marketing, and Clients. This holistic approach allowed us to overcome several obstacles and produce a performant model to better understand our members. Our Retention Engine is just one of the exciting ways we are working to drive healthier outcomes as our platform grows!
