Incremental Modeling of At-Risk Health Conditions

Aayush Jain
apree health (Castlight) Engineering
7 min read · Aug 21, 2020

Internship and Project Background

This summer, I had the opportunity to intern with the Castlight Health Data Science team that is responsible for developing and maintaining Castlight’s Genius personalization engine. The engine uses existing patient data, taken from a variety of sources, to classify users into clinically informed, rule-based segments and machine-learning-driven at-risk segments. Users are placed into at-risk segments for various health conditions, based on their likelihood of developing the condition in the future. These Genius segments are ultimately used to provide Castlight users with personalized recommendations. These recommendations can take the form of email campaigns that promote relevant, employer-sponsored health programs, as well as in-app cards that link to educational content and wellness programs.

My internship project aimed to explore an incremental approach to modeling the ML-based Genius segments. The current modeling approach trains a classifier on an aggregated set of features that come from several different sources, including medical claims history, Rx claims history, biographic/demographic data, and in-app behavioral activity. We refer to the set of features coming from a particular data source as a feature modality. The Incremental Modeling Project aimed to clarify the value that each specific modality is contributing to the overall model performance by answering the following questions:

  1. What is the relative importance of each of the feature modalities for the Genius at-risk segments?
  2. Is an incremental modeling process, where we separately train classifiers using a subset of feature modalities for which patients have filled data, more powerful than modeling using an aggregate of features?
  3. Can we develop a process to improve model performance for patients that have filled biographic and demographic data, but are missing data from the medical and Rx claims modalities?
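The notion of a feature modality can be made concrete with a small sketch. The modality names and feature names below are hypothetical stand-ins (the real Genius feature set is much larger); the sketch only illustrates grouping a patient's features by source and checking which modalities are filled:

```python
# Hypothetical feature-modality mapping; real Genius feature names differ.
MODALITIES = {
    "medical_claims": ["num_claims", "ccs_hypertension", "ccs_prediabetes"],
    "rx_claims": ["fdb_analgesic", "fdb_cns_agents"],
    "bio_demo": ["age", "sex", "family_size"],
}

def modality_features(record, modalities):
    """Return only the features of `record` that belong to the given modalities."""
    keep = {f for m in modalities for f in MODALITIES[m]}
    return {k: v for k, v in record.items() if k in keep}

def filled_modalities(record):
    """A modality counts as 'filled' if all of its features are present and non-null."""
    return [m for m, feats in MODALITIES.items()
            if all(record.get(f) is not None for f in feats)]
```

For example, a Wellbeing-only user with no claims history would come back from `filled_modalities` with only `"bio_demo"`, which is exactly the population question three is about.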

How Useful is Each Modality?

To address this question, separate models were trained on only the subset of features belonging to a particular modality. For this project, we were primarily interested in the diabetes, lower back pain, and pregnancy personalization segments. Additionally, the behavioral activity modality was not emphasized, as the segments of interest had lower fill rates for this modality. Here are some key insights that helped us better understand the relative importance of each modality:

  • For the diabetes segment, most of the predictive power came from the medical claims modality.
  • For the lower back pain segment, both medical claims and Rx claims provided high predictive power. However, including both of these modalities was not significantly more useful than including just one of them. These results are visualized in Figure 1.
  • Interestingly, for the pregnancy segment, biographic data seemed to be more useful than medical data. Information about the user and the user’s family, especially related to age, was slightly more useful than specific medical claims features for predicting pregnancy.
Figure 1: Performance of models trained on different subsets of feature modalities.
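The comparison behind these results can be framed as a loop over modality subsets, training and scoring one classifier per subset. A minimal sketch, assuming a generic `train_and_score` callback standing in for the actual Genius training-and-evaluation pipeline (not shown here):

```python
from itertools import combinations

def modality_subsets(modalities):
    """Yield every non-empty subset, e.g. ('medical',), ('medical', 'rx'), ..."""
    for r in range(1, len(modalities) + 1):
        yield from combinations(modalities, r)

def compare_modalities(modalities, train_and_score):
    """train_and_score(subset) -> precision. Returns {subset: precision}, best first."""
    scores = {s: train_and_score(s) for s in modality_subsets(modalities)}
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

With three modalities this trains seven models; ranking the resulting precisions is what surfaces findings like "medical claims dominate for diabetes" or "adding Rx on top of medical barely helps for lower back pain."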

How Does An Incremental Modeling Process Compare to Aggregate Modeling?

From this set of results, we can ask a follow-up question related to how we engineer the model. If we train separate models for patients only on feature modalities that are filled for that set of patients, can we achieve better performance than if we train a model using all patients and all modalities together? In other words, would we see improvements if we train separate models on patients who have filled medical claims data and those who are missing medical claims data?

To answer this question, a simple experiment was run on both the diabetes and lower back pain segments. This experiment essentially compared the performance of classifiers trained separately on populations that had fully filled data for a given subset of modalities to a classifier that was trained on an aggregate of all patients and feature modalities.

Evaluation on consistent test sets found that modeling incrementally based on filled modality subsets led to slightly, but not significantly, higher precision values than modeling on the aggregate. These results held for both the diabetes and lower back pain segments. This confirmed our initial intuition: the machine learning algorithms currently used in Genius segmentation are adaptable enough to handle populations that have low fill rates for certain modalities.
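At inference time, the incremental setup routes each patient to the model trained on the modalities that patient actually has filled. A sketch of that routing logic, assuming `models` maps frozensets of modality names to already-trained classifiers (the keys here are illustrative):

```python
def pick_model(patient_modalities, models):
    """Pick the model trained on the largest subset of the patient's filled modalities.

    `models` maps frozensets of modality names to trained classifiers. A model is
    applicable only if every modality it was trained on is filled for this patient.
    """
    filled = frozenset(patient_modalities)
    applicable = [key for key in models if key <= filled]
    if not applicable:
        raise KeyError("no model covers this patient's filled modalities")
    # Prefer the model that uses the most of the patient's available data.
    return models[max(applicable, key=len)]
```

A baseline biographic/demographic model acts as the universal fallback, since that modality is filled for essentially everyone.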

Can we Improve Model Performance for Wellbeing-only Customers?

Perhaps the most important question relevant to this project had to do with improving model performance for patients that are missing large chunks of data from a particular modality. Specifically, Castlight does not receive medical or Rx claims data for most Wellbeing-only users, and thus they miss out on some of the value of Castlight’s personalization engine.

The first step in exploring this task was to better understand the role that specific medical and Rx features played in contributing to model performance. Was including every medical or Rx feature necessary to get improvements over the baseline biographic and demographic model? Could we reduce these down to just a handful of useful features?

To narrow down the number of medical claims features necessary, the most informative features from the medical-only and Rx-only models were “stacked,” one at a time, onto a baseline model that consisted of only biographic and demographic features. This analysis was repeated for each of the three segments to get a sense of the minimum number of medical features required to achieve significant boosts in predictive power over the baseline model. For each segment, the analysis revealed that precision converged after adding 15–20 medical features to the baseline model, with meaningful boosts to precision after including just a handful of medical or Rx features. This is a significant reduction from the hundreds of medical and Rx features currently used in the aggregated model. Figure 2 shows the relative improvement in precision after incrementally adding the most informative medical and Rx features for the lower back pain segment. Similar patterns were found when this analysis was applied to the other segments.

Figure 2: Improvements in precision after incrementally stacking the most important medical and Rx features.
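The stacking analysis amounts to a greedy forward-selection loop: start from the baseline feature set, repeatedly add the candidate claims feature that most improves precision, and stop once the gain becomes negligible. A sketch, where `score` is a stand-in for training a model on the given features and measuring its precision on a held-out test set:

```python
def stack_features(baseline, candidates, score, min_gain=1e-3):
    """Greedily add candidate features to `baseline` while precision keeps improving.

    score(features) -> precision of a model trained on `features` (stand-in for
    the real train/evaluate step). Returns the added features in the order chosen.
    """
    selected = list(baseline)
    current = score(selected)
    remaining = list(candidates)
    added = []
    while remaining:
        best = max(remaining, key=lambda f: score(selected + [f]))
        gain = score(selected + [best]) - current
        if gain < min_gain:
            break  # precision has converged; stop stacking
        selected.append(best)
        added.append(best)
        current += gain
        remaining.remove(best)
    return added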

For the diabetes personalization segment, this analysis allowed us to identify just a handful of important medical features. Significant improvements in precision were found when these features were included in a baseline model consisting of only biographic and demographic features. The most important insights from the diabetes segment are discussed below.

  • In the diabetes at-risk model, the most useful medical claims predictors were related to hypertension and pre-diabetes. Adding just these two features to the baseline biographic and demographic model led to significant improvements in predictive power. This is consistent with what we know about diabetes, as people with diabetes are much more likely to have hypertension than people without diabetes. Similarly, being treated for pre-diabetes is an explicit sign that one is likely to develop diabetes in the near future.
  • Other important medical features were claims for non-traumatic joint disorders, heart disease, and lipid metabolism disorders. These issues are all complications that are known to be associated with having diabetes.

The low back pain personalization segment was slightly different from the diabetes segment in that there were significant amounts of Rx claims data included in the aggregate model.

Because fill rates were similar between Rx and medical claims features, we decided to stack features from both modalities onto the baseline biographic and demographic model. We found that the most useful features added to the baseline model are a mixture of medical claims features and Rx features.

  • Two of the most useful medical claims features that led to a significant boost in precision were not related to a specific medical condition but instead conveyed the number of distinct dates on which a patient received treatment and the number of distinct providers who treated the patient. This suggests that people with back pain-related issues are more likely to receive many treatments from several different medical providers.
  • Rx claims features were also predictive for this segment. Classes of drugs (FDB Parent Groups) related to analgesics and anti-inflammatories (painkillers), central nervous system agents, the locomotor system, respiratory therapy agents, and gastrointestinal therapy were all quite useful in improving the precision of the model. As expected, drugs belonging to these groupings are all used either to treat back pain directly or to treat conditions associated with back pain. Including the most important medical claims features led to further, gradual improvements in precision, and tended to provide information similar to that of the most important Rx claims features.

Future Directions

Knowing just a handful of medical or Rx-based features can greatly improve Wellbeing users’ personalized experience with the Castlight app by allowing for more relevant content recommendations. Of course, obtaining this sort of specific medical information can be a challenge, especially when we don’t have access to the full claims history. Asking users directly about what treatments they are receiving or what drugs they are taking may be seen as intrusive. Additionally, patients may have trouble relating their own healthcare experiences to the level of specificity of CCS2 categories for medical claims or FDB Parent Groups for Rx claims. A follow-up to this project could look at using this subset of highly predictive medical and Rx claims features to develop a more advanced health questionnaire targeted at Wellbeing-only users. Ideally, responses from such a questionnaire would improve the predictability of Genius at-risk segments for any patients for whom Castlight has limited or no access to Rx or medical claims.

I am incredibly grateful that I had the opportunity to work with some amazing engineers and data scientists during my time at Castlight. I’d especially like to thank Vinay Yadappanavar for introducing me to this project and providing me with his constant guidance throughout the internship. Also, special shoutouts to Sunil Rath and Ankur Jain for their expertise in all things data and ML-related.


Undergraduate student at Carnegie Mellon University. Passionate about using data science tools to improve human decision-making.