Engage: Predicting patient disengagement from a weight loss program

Feb 14, 2019

INTRODUCTION

Obesity has become a serious health problem in the United States, and recent studies show that it is a much more complex problem than the simple calorie-intake issue it was initially thought to be. In medical terms, it is defined as a chronic, relapsing, multi-factorial neurobehavioral disease in which an increase in body fat promotes tissue dysfunction, resulting in adverse metabolic, biomechanical and psychosocial health consequences such as cardiovascular and hormonal problems. In extreme cases, difficulty moving also lowers overall quality of life. Currently, 38% of Americans have obesity and 32% are classified as overweight. As these numbers rise, so do the associated health care costs, which are expected to reach around $1T within 10 years.

Weight control and fighting obesity require both professional help from medical experts and general changes in lifestyle and eating habits. According to Dr. Bailony, co-founder of Enara Health, “It’s critical to approach patients not only scientifically, but also personally.” Traditional clinical methods provide the professional help, but personal involvement is limited to the duration of the clinic visit. Enara Health’s “digital clinic” approach supplements the clinical visits. Its online platform tries to maximize patient engagement and provides professional support and awareness through online interactions. These include daily interactive readings and tasks about healthy lifestyle, food photos that patients snap and receive professional reviews on, and online appointments with medical providers, all within one app. In addition to the regular clinical visits, patients’ weight loss progress is tracked continuously with smart scales.

PROBLEM DEFINITION

Enara users lose about 15% of their weight as they progress through the program and keep their weight under control for over 2 years. This is an exceptional achievement compared to other non-invasive, non-surgical programs. However, as in any business model, characterizing user churn, which here means patient drop-off or disengagement, is a priority for increasing the effectiveness of the program and keeping users engaged. Furthermore, retaining existing users is a cost-effective approach from the business side of the program.

In this work, I aim to determine the critical features that lead to patient disengagement and create an alert system for people at risk of dropping off. This work will give the company actionable insights into the critical areas where it can improve its quality of service and keep its patients engaged.

DATA SET

The mobile app is a great data source for measuring patient engagement and activity level.

To characterize patient drop-off effectively, behavioral user data from the online platform is the critical piece. Individual user logins, interactions with the tasks and readings, and the count and frequency of uploaded food photos give clear insight into user engagement. Secondly, Enara has a clear advantage over traditional services in evaluating user feedback directly: the online appointment and messaging history between patients and healthcare providers. Finally, patient demographics data complements the data set. In total, 15 critical features were used to model this system.

METHODOLOGY

The main approach to this problem was to classify patients into those who completed the program and those who dropped out, since the groups were clearly labeled. To supplement this, I used survival analysis to predict user drop-out times and assign a risk level to each user.

Classifying users into two groups is a great way to determine the critical features of engagement activity. A first look at the averages of the user appointment and engagement data shows clear signals that this approach is feasible: the two groups differ clearly even in their average behavior. However, this is a time-dependent problem in which user engagement fluctuates over time. Specifically, the time lapse between user login events, appointments and food photo uploads may vary greatly. This becomes a bigger issue considering that lower patient engagement towards the drop-off point, or very high engagement right after enrollment, may significantly skew the averages. To solve this issue and account for this memory effect, I calculated an exponentially weighted average of the time lapse between online engagement events and used it as a feature. The weight function was tuned iteratively to best capture the memory effect, using recall as the readout.
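
As a rough illustration of the exponentially weighted feature described above, here is a minimal sketch in plain Python. The function name `ewm_gap_days`, the smoothing weight `alpha` and the login dates are all hypothetical; the actual weight function used in the project is not shown here.

```python
from datetime import datetime


def ewm_gap_days(timestamps, alpha=0.3):
    """Exponentially weighted average of the gaps (in days) between events.

    alpha is an assumed smoothing weight in (0, 1]; larger values make
    the most recent gaps count more.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    ts = sorted(timestamps)
    # Time lapse between consecutive events, in days.
    gaps = [(b - a).total_seconds() / 86400 for a, b in zip(ts, ts[1:])]
    if not gaps:
        return None  # fewer than two events: no gap to average
    avg = gaps[0]
    for gap in gaps[1:]:
        avg = alpha * gap + (1 - alpha) * avg
    return avg


# Made-up login history: daily at first, then a week of silence.
logins = [datetime(2019, 1, d) for d in (1, 2, 3, 5, 12)]
plain_mean = 11 / 4                       # gaps are 1, 1, 2, 7 days
weighted = ewm_gap_days(logins, alpha=0.5)
print(plain_mean, weighted)
```

With these made-up logins the weighted value is pulled towards the recent 7-day gap, while the plain mean is diluted by the early daily logins.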

User statistics differ between drop-outs and the active user base.

Another advantage of Enara’s online system is direct access to patient-doctor interactions. I ran a sentiment analysis using the TextBlob library on conversations between patients and doctors to evaluate patient sentiment over time. This is a direct indicator of user satisfaction, so it was included in model building, and it gives the company an actionable insight for re-evaluating its responses to patients. Messages were categorized as sent or received by patients to evaluate both sides of the conversations.
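
The split into sent and received sentiment can be sketched roughly as follows. To keep the example dependency-free, it uses a toy word-list scorer as a stand-in; with TextBlob, the scorer would be `lambda t: TextBlob(t).sentiment.polarity`, which returns a value between -1 and 1. The message tuples and function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_sentiment(messages, score):
    """Average polarity per (patient_id, direction).

    messages: iterable of (patient_id, direction, text), where direction
    is "sent" or "received" from the patient's point of view.
    score: callable mapping a text to a polarity in [-1, 1].
    """
    buckets = defaultdict(list)
    for patient_id, direction, text in messages:
        buckets[(patient_id, direction)].append(score(text))
    return {key: mean(values) for key, values in buckets.items()}


def toy_score(text):
    """Toy stand-in for a real sentiment scorer (assumption, not TextBlob)."""
    words = text.lower().split()
    pos = sum(w in {"great", "happy", "good"} for w in words)
    neg = sum(w in {"frustrated", "bad", "tired"} for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


messages = [
    (1, "sent", "feeling great this week"),
    (1, "sent", "tired and frustrated"),
    (1, "received", "good progress"),
]
result = mean_sentiment(messages, toy_score)
print(result)
```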

Apart from user engagement and message sentiment, patient demographics such as age, gender and initial BMI were used to check for possible bias between specific groups.

The next step is model selection. A Random Forest classifier was used because it is considered highly robust, thanks to the number of decision trees participating in the process, and resistant to overfitting. The algorithm performs well (90% recall) and provides relative feature importances, which steers the analysis towards actionable insights. Although it is considered a slow algorithm because it builds multiple decision trees, the data set is relatively small and the number of features is modest, so run time was not a problem.
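
A minimal sketch of this step with scikit-learn is shown below. It runs on synthetic data, not the Enara data set: the feature names, the way the labels are generated and the hyperparameters are all assumptions made for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; feature names are hypothetical.
rng = np.random.default_rng(0)
n = 500
feature_names = ["weight_loss_pct", "ewm_login_gap", "sent_sentiment", "age"]
X = rng.normal(size=(n, len(feature_names)))
# Assumed labeling rule: drop-out (1) is more likely with low weight
# loss, long gaps between logins and negative sent-message sentiment.
signal = -1.5 * X[:, 0] + 1.0 * X[:, 1] - 0.5 * X[:, 2]
y = (signal + rng.normal(scale=0.5, size=n) > 0).astype(int)

# 80%-20% train-test split, stratified on the label.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
pred = model.predict(X_test)

accuracy = accuracy_score(y_test, pred)
recall = recall_score(y_test, pred)  # share of true drop-outs caught

# Rank features by their impurity-based importance.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
print(accuracy, recall)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

Recall is the natural readout for an alert system like this one, since a missed drop-out is usually more costly than a false alarm.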

Classification models help us determine the critical features and act on them; however, because of the way the data set is handled, they lack temporal resolution in their predictions. To supplement this, survival analysis was implemented. It is a method for analyzing data where the outcome variable is the time until the event of interest occurs, which is patient drop-out in this problem.

Initially, the survival function is estimated using the Kaplan-Meier estimator to look at the population dynamics. Next, Cox’s proportional hazards regression model is implemented, based on the idea that the log-hazard of an individual is a linear function of static covariates plus a population-level baseline hazard that changes over time. During this analysis, the proportional hazards assumption was checked, and the data set was stratified to adjust for the covariates that failed the test.
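
For reference, the Cox model described above can be written in its standard textbook form. The symbols are notation introduced here, not taken from the project: $h_0(t)$ is the baseline hazard, $x_1, \dots, x_p$ are a patient's static covariates, $\beta_1, \dots, \beta_p$ are the fitted coefficients, and $s$ indexes the strata.

```latex
% Hazard of an individual with covariates x at time t
h(t \mid x) = h_0(t)\,\exp\!\left(\beta_1 x_1 + \dots + \beta_p x_p\right)

% Equivalently, the log-hazard is linear in the covariates
\log h(t \mid x) = \log h_0(t) + \sum_{j=1}^{p} \beta_j x_j

% Stratified version: each stratum s gets its own baseline hazard,
% while the coefficients are shared across strata
h_s(t \mid x) = h_{0,s}(t)\,\exp\!\left(\sum_{j=1}^{p} \beta_j x_j\right)
```

Stratifying on a covariate removes it from the linear term, so the model no longer estimates a coefficient for it.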

RESULTS

Classification

The random forest classification model achieved 80% accuracy and 90% recall with an 80%-20% train-test split on 500 patients. The classification algorithm enables us to determine which features were most critical in separating the groups. Among these, the most important feature for user engagement is the overall weight loss of a patient during enrollment in the program. This correlates directly with the basic assumption about consumer products: if people enjoy the product or its results, they will continue to use it. One may argue that people leave the program once they hit a certain weight loss, and question why this is the most critical feature. However, Enara Health offers continuous support even after weight loss targets are hit, and for many patients keeping a steady weight is the hardest part of their journey.

Online engagement levels are another great indicator of user churn. In particular, churning patients made little use of the readings and assignments. A critical actionable insight is that making the online interface more accessible and easier to navigate may help patients stay in the program longer.

Finally, the overall sentiment of the messages sent by patients says a lot about their likelihood of dropping out of the program. Doctors may not be able to catch the overall shift in a user’s sentiment across their interactions over time; the model, however, puts great importance on it. Even though individual messages may vary with short-term events, the general sentiment accumulates as a memory effect in patients and may lead to bad feedback and drop-out over time.

On the other hand, demographic information plays little role in determining the classes. Gender, age, and initial BMI have very little effect on model accuracy. One surprising finding is that the sentiment of messages received by patients also has little effect. That is probably because the doctors and staff that patients engage with always respond in a positive manner, so it does not differentiate the groups.

Locally running Bokeh server for a searchable patient dashboard

The classification analysis was also used to assign a risk score to each patient. The log-probabilities of belonging to each group are used to associate a risk level with each patient. If the model is certain (> 75%) that a person belongs to the active group, the person is labeled low risk. The region where the model does not discriminate well (between 40% and 75%) is set to medium risk, and if a person is assigned to the drop-out class (< 25%), he or she is labeled high risk and the doctor board is alerted based on that information.
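
The tiering logic can be sketched as a small function. The thresholds quoted above leave the range between 25% and 40% unassigned, so this sketch makes the explicit assumption that everything between the high-risk and low-risk cutoffs is medium risk. The function name and parameter names are hypothetical.

```python
def risk_level(p_active, low_cutoff=0.75, high_cutoff=0.25):
    """Map the predicted probability of the active class to a risk tier.

    Assumption: any probability between high_cutoff and low_cutoff
    (inclusive) is treated as medium risk.
    """
    if not 0.0 <= p_active <= 1.0:
        raise ValueError("p_active must be a probability in [0, 1]")
    if p_active > low_cutoff:
        return "low"
    if p_active < high_cutoff:
        return "high"
    return "medium"


# Made-up probabilities, e.g. one column of a classifier's predict_proba.
for p in (0.9, 0.5, 0.3, 0.1):
    print(p, risk_level(p))
```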

Survival Analysis

To estimate patient drop-out times, survival analysis was run, starting with a Kaplan-Meier fit. The population dynamics revealed that the drop-out rate slows down after a year. A year is a reasonable time to create sustainable habits and a critical benchmark for noticing the change; in a sense, it is a psychological barrier. If Enara can use incentives to keep patients engaged for up to a year, it can maximize user enrollment afterwards.
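
To make the Kaplan-Meier step concrete, here is a minimal pure-Python sketch of the estimator on made-up data. It is not the code used in the project, where a survival analysis library would be the more practical choice; patients who are still active are treated as censored.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each time where at least one drop-out occurred.

    durations: days each patient has spent in the program.
    observed: True if the patient dropped out at that time,
              False if still active (censored).
    """
    events = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)  # drop-outs and censored patients alike
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk patients who stay past t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical cohort of six patients.
durations = [30, 60, 60, 120, 200, 365]
observed = [True, True, False, True, False, False]
curve = kaplan_meier(durations, observed)
for t, s in curve:
    print(t, round(s, 3))
```

A flattening of this curve after a given time is what "the drop-out rate slows down" looks like in a Kaplan-Meier plot.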

Population statistics are useful for understanding the differences between the groups, but I was interested in obtaining risk scores and drop-off times for individuals. That can be achieved with Cox’s proportional hazards model. An example plot shows the different characteristics of 3 users evaluated up to 1000 days. User 13 is expected to drop off faster than the others based on his or her user engagement and appointment characteristics.

This model was evaluated with 3-fold cross-validation and achieved 80% accuracy. It does not perform as well as the classification model, which can be attributed to the fact that the appointment data violated the proportional hazards assumption and needed to be stratified. Stratification limits the usefulness of the analysis as well as the number of people in each class available to train the model on.

FINAL REMARKS

The 2 models created for this project can be used to assess user engagement and alert healthcare specialists to reach out to patients who are at risk of leaving the program. For a better assessment, it’s critical to have a data set that represents user behavior reliably. As more data is collected over a longer period of time, the model will become more reliable at assigning the correct label to each person.

I also suggested to Enara’s founders that a pre-survey on patient characteristics and expectations, together with well-organized mid-program questionnaires, could shed more light on patient drop-outs, since direct evaluation from the user’s perspective often tells more, especially in programs that are heavily personalized like Enara’s weight loss program.

Written by Sinan Can