Don’t Miss a Step: Predicting Late Consumer Behavior

For my Insight project, I partnered with a company seeking to understand user churn. Because the company's data is sensitive and proprietary, I use general, comparable examples in this post. To narrow their broad question, I ask: which consumers are likely to miss a behavior, and when?

Why focus on who misses a behavior? Behavior history and behavior patterns are major factors affecting key outcomes like average lifetime consumer value, individual health outcomes, and individual earning potential, among others. So while the example I use here is generic, this problem and the methodology I employ apply in many domains. Specific use cases include predicting medication adherence, class attendance (time until a student drops out of school), and monthly subscription renewal (time until a consumer cancels her subscription).

A brief look at the data the client provided shows the difference in average customer value between users with zero and one missed behaviors. For this example, let us assume that we are examining daily trips to Starbucks for a morning latte. The figure below shows that the difference in average customer value between consumers who never miss a morning coffee and those who miss one morning latte is about 100 points. The usual “correlation is not causation” caveat applies, as an underlying factor could be affecting both behavior history and average consumer value (for example, maybe those who miss their daily latte are of a different “type” than those who always make a point of getting their morning caffeine fix: they frequently run late and occasionally miss their morning Starbucks).

My project focused on predicting which consumers will miss a recurring behavior and when they miss that behavior. Given user and account features as input, my algorithm functions as an “early warning system” for the client, allowing them to answer:

  • Who is likely to benefit most from a recommendation to maintain on-time behavior?
  • When should the company make a recommendation?

And given these answers, the company can better decide:

  • What recommendation to make.

The deliverables for my project were: (i) an algorithm examining the impact of missed behavior on consumer outcomes; and (ii) reproducible modeling code to implement the final machine learning models.

Data Wrangling and Cleaning

The data the client provided included information on consumers (approximately 250,000 users) and their records (approximately 1.5 million records). While I cannot discuss the specific features in detail, the data includes various features on the users and their records. The key outcome is their behavior history (using the running example of Starbucks, this would be their daily morning trips to get a cup of joe).

When considering their latte history, I made a simplifying assumption: I calculated only the time until the first missed Starbucks trip. I did not consider multiple missed behaviors or time between missed behaviors. A simplified user’s behavior history would look something like this:

where ‘1’s indicate on-time behavior and ‘X’s indicate late or missed behavior. In this example, this user’s survival time (“time until missed Starbucks trip”) would be coded as 7, and he would receive a ‘1’ for a status of “failure” (missing a trip). Another user who made all on-time visits for these 10 periods would receive a survival time of 10, but a status of ‘0’, indicating that she is right-censored (did not miss any trips to the coffee shop).
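As a rough illustration of this coding scheme (not the project's actual code), the sketch below turns a behavior history into a survival time and a status. The function name `encode_history`, the two sample histories, and the convention that survival time is the 1-based period of the first miss are all assumptions made for the example.

```python
from typing import Sequence, Tuple

def encode_history(history: Sequence[str]) -> Tuple[int, int]:
    """Return (survival_time, status) for one user's behavior history.

    history holds one symbol per period: '1' = on time, 'X' = late or missed.
    Assumption: survival time is the 1-based period of the first miss;
    users with no miss are right-censored at the last observed period.
    """
    for period, symbol in enumerate(history, start=1):
        if symbol == "X":
            return period, 1   # status 1: failure (missed trip)
    return len(history), 0     # status 0: right-censored

missed_user = list("111111X111")     # hypothetical history
on_time_user = list("1111111111")    # hypothetical history
print(encode_history(missed_user))   # (7, 1)
print(encode_history(on_time_user))  # (10, 0)
```

Only the first 'X' matters here, which mirrors the simplifying assumption of ignoring repeated misses and the time between them.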

Given the short time-frame for this project (3 weeks), I made some decisions to make the analysis more efficient:

  1. I dropped observations with missing values instead of trying various algorithms to impute them.
  2. I filtered out weekend coffee trips; I am only interested in workday/weekday trips to Starbucks (this is most analogous to my actual case).
  3. I created a binary indicator to distinguish between two main types of loyalty accounts.
  4. I created another binary measure to indicate whether the consumer had an individual ‘loyalty’ account or a joint or family ‘loyalty’ account.
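The four preparation steps above could be sketched roughly as follows. This is a minimal illustration, not the project's code: the field names (`visit_date`, `account_type`, `holders`), the category label `"A"`, and the sample records are hypothetical stand-ins for the proprietary features.

```python
from datetime import date

def clean_records(records):
    """Apply the four preparation steps to a list of record dicts.

    All field names and category labels are hypothetical.
    """
    cleaned = []
    for rec in records:
        # 1. Drop observations with any missing value.
        if any(value is None for value in rec.values()):
            continue
        # 2. Keep weekday trips only (Monday=0 ... Friday=4).
        if rec["visit_date"].weekday() >= 5:
            continue
        row = dict(rec)
        # 3. Binary indicator for the two main account types.
        row["is_type_a"] = int(rec["account_type"] == "A")
        # 4. Binary indicator: individual vs. joint/family account.
        row["is_individual"] = int(rec["holders"] == 1)
        cleaned.append(row)
    return cleaned

records = [
    {"visit_date": date(2018, 3, 5), "account_type": "A", "holders": 1},   # Monday
    {"visit_date": date(2018, 3, 10), "account_type": "B", "holders": 2},  # Saturday
    {"visit_date": date(2018, 3, 6), "account_type": None, "holders": 1},  # missing value
    {"visit_date": date(2018, 3, 7), "account_type": "B", "holders": 3},   # Wednesday
]
cleaned = clean_records(records)
print(len(cleaned))  # 2
```

Dropping rows with missing values is the fastest option but discards data; with more time, imputation would be worth comparing.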

Random Survival Forests

Given my focus on who misses a behavior and when, my outcome of interest is the time until a missed behavior. Therefore, survival analysis is the approach best suited to this data. However, many survival models impose restrictive functional forms or unrealistic assumptions on the data (for example, the proportional hazards assumption). A general-purpose machine learning model might predict better, but it would most likely discard the survival-time information in the data or sacrifice the interpretability of a survival analysis.

To get the best of both worlds, I employ a Random Survival Forest model. Random Survival Forest (RSF) is an extension of the original random forest (RF) model, allowing for non-parametric analysis of time to event data, including data that are right-censored. The tree-based nature of RSF also allows for automatic and easy estimation of interactions between variables, which are difficult to identify under traditional survival models. Moreover, visually exploring RSF models (improving interpretability) is possible with the ggRandomForests package in R.

Unlike conventional RF models, RSF models take both survival time and censoring status into account when constructing trees. At each node, a random subset of features is considered, and the node is split on the feature and value that maximize the difference in survival between the ‘daughter’ nodes. After each tree is grown, a cumulative hazard function (CHF) is calculated for it, and the per-tree CHFs are then averaged to create an ensemble CHF.
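To make the CHF step concrete, here is a minimal sketch assuming the standard RSF formulation, in which the CHF in a terminal node is the Nelson-Aalen estimate and the ensemble is a plain average over trees. The function names and the two toy "terminal nodes" are invented for illustration; a real forest would first route each consumer to one terminal node per tree.

```python
def nelson_aalen(times, statuses):
    """Nelson-Aalen cumulative hazard for the cases in one terminal node.

    Returns a list of (event_time, cumulative_hazard) steps.
    """
    steps, cumulative = [], 0.0
    event_times = sorted({t for t, s in zip(times, statuses) if s == 1})
    for t in event_times:
        events = sum(1 for ti, si in zip(times, statuses) if ti == t and si == 1)
        at_risk = sum(1 for ti in times if ti >= t)  # still under observation at t
        cumulative += events / at_risk
        steps.append((t, cumulative))
    return steps

def chf_at(steps, t):
    """Evaluate a step-function CHF at time t."""
    value = 0.0
    for event_time, cumulative in steps:
        if event_time <= t:
            value = cumulative
    return value

def ensemble_chf(per_tree_steps, t):
    """Average the per-tree CHFs at time t."""
    return sum(chf_at(steps, t) for steps in per_tree_steps) / len(per_tree_steps)

# Toy terminal nodes from two hypothetical trees (status 0 = right-censored).
tree_a = nelson_aalen([2, 3, 3, 5], [1, 1, 0, 0])
tree_b = nelson_aalen([1, 4], [1, 1])
print(round(ensemble_chf([tree_a, tree_b], 3), 4))  # 0.5417
```

Censored cases never add to the hazard, but they stay in the at-risk count until their censoring time, which is how right-censoring enters the estimate.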

Model Performance

From a predictive standpoint, the model performs reasonably well, with an Out-of-Bag (OOB) prediction error of 29.7 percent. Another measure of model quality is Harrell’s concordance index (C-index). Similar to the area under the ROC curve in classification problems, the C-index estimates how well the model ranks consumers by risk: for a pair of consumers, it checks whether the one predicted to be at higher risk is in fact the one who misses a visit sooner. A value of 50 percent corresponds to random guessing. With a C-index of 69.0 percent (a C-index error rate of 31.0 percent), the results are similar to the OOB error.
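For readers unfamiliar with the C-index, a small pure-Python sketch of Harrell's definition follows. The function name and the four toy consumers are assumptions for illustration; the numbers bear no relation to the client data.

```python
def harrell_c_index(times, statuses, risk_scores):
    """Share of comparable pairs that the risk scores order correctly.

    A pair (i, j) is comparable when i has an observed miss (status 1)
    strictly before j's observed time. Ties in risk count as half.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if statuses[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable

# Toy data: four consumers, one right-censored (status 0).
times = [5, 10, 7, 12]
statuses = [1, 0, 1, 1]
risk_scores = [0.9, 0.2, 0.4, 0.6]   # higher = predicted to miss sooner

c_index = harrell_c_index(times, statuses, risk_scores)
print(c_index)                  # 0.8
print(round(1 - c_index, 3))    # 0.2 (the corresponding error rate)
```

Four of the five comparable pairs are ordered correctly; the one discordant pair is the consumer who missed at period 7 but was scored as lower risk than the one who missed at period 12.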

Variable Importance

Variable Importance (VIMP) measures how much the predictive accuracy of an RF model degrades when a feature is misspecified; larger values indicate that those features contribute more to the predictive accuracy of the model (misspecification of these features would reduce predictive accuracy of the forest). The following figure shows that Feature 1 (let’s say this is the income of customers over the age of 18) and Feature 2 (let’s assume this is the number of ‘loyalty activities’ that a consumer engages in) contribute most to the predictive accuracy of the model. While the number of loyalty activities (for example, using a promo code) is an important feature, the type of loyalty account (Feature 9) and whether the customer has an individual or family loyalty account (Feature 8) have very little predictive power.
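One common way to compute this kind of importance is by permutation: shuffle one feature's values, re-predict, and record how much the error grows. The sketch below illustrates that idea only. It is an assumption that it resembles the VIMP variant used in the project, and the toy model, the mean-absolute-error metric, and all names are hypothetical (a survival forest would use OOB cases and a C-index-based error instead).

```python
import random

def permutation_vimp(rows, outcomes, predict, error, feature, seed=0):
    """Increase in prediction error after shuffling one feature's values."""
    baseline = error(outcomes, [predict(r) for r in rows])
    shuffled = [r[feature] for r in rows]
    random.Random(seed).shuffle(shuffled)  # break the feature-outcome link
    permuted_rows = [{**r, feature: v} for r, v in zip(rows, shuffled)]
    permuted = error(outcomes, [predict(r) for r in permuted_rows])
    return permuted - baseline

def mean_abs_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Toy stand-in for a fitted model: it only looks at 'activities'.
def toy_predict(row):
    return 5 * row["activities"]

rows = [{"activities": k, "family": k % 2} for k in range(20)]
outcomes = [5 * r["activities"] for r in rows]

vimp_activities = permutation_vimp(rows, outcomes, toy_predict, mean_abs_error, "activities")
vimp_family = permutation_vimp(rows, outcomes, toy_predict, mean_abs_error, "family")
print(vimp_activities > vimp_family)  # the feature the model relies on matters more
```

A feature the model never uses, like `family` in this toy setup, gets an importance of exactly zero because shuffling it leaves every prediction unchanged.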

Interpretability

The primary benefit of an RSF model over a conventional RF model is the added interpretability offered. Given that I am interested in identifying which consumers would be best served by a recommendation to improve on-time or recurrent behavior (who) and at what point those recommendations should be made (when), interpretability of the model is key. Features 1 and 2 are the two most important, but how can that drive potential actions? What range of income? Users with how many loyalty activities?

In order to get an initial understanding of how income plays a role in a consumer’s history of Starbucks visits, I recoded the continuous income measure into quartiles and re-ran the RSF model.
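Recoding a continuous measure into quartiles might look like the following sketch. The function name and the sample incomes are made up, and the cut points come from Python's `statistics.quantiles` with its default method, which may differ slightly from the quantile rule used in the original analysis.

```python
import bisect
import statistics

def income_quartiles(incomes):
    """Label each income with its quartile (1 = lowest, 4 = highest)."""
    cuts = statistics.quantiles(incomes, n=4)  # three cut points
    return [bisect.bisect_left(cuts, x) + 1 for x in incomes]

incomes = [20, 25, 30, 35, 40, 45, 50, 55]   # hypothetical, in thousands
print(income_quartiles(incomes))  # [1, 1, 2, 2, 3, 3, 4, 4]
```

Binning trades some information for readability: four survival curves, one per quartile, are much easier to compare than a continuous surface.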

The figure shows that, among the lowest quartile of income, the median survival probability (the probability a consumer “survives” and continues on-time behavior) drops to 0.5 at around 40 periods, with a probability of just under 0.6 at 24 periods after the start of observation. The median consumer in the second quartile of income has a coin flip’s chance of missing a visit at 80 periods. Both the first and second quartile median survival curves are below a probability of 0.75 at 24 periods, so this might be an optimal time to intervene and recommend actions that could increase the likelihood of maintaining regular visits.
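Reading an intervention point off a survival curve can be expressed as a simple lookup: find the first period at which the curve falls to or below a chosen threshold. The helper name and the curve values below are invented for illustration and are not the client's estimates.

```python
def first_period_at_or_below(curve, threshold):
    """Return the first period whose survival probability is <= threshold.

    curve is a list of (period, survival_probability) pairs sorted by period.
    Returns None if the curve never reaches the threshold.
    """
    for period, probability in curve:
        if probability <= threshold:
            return period
    return None

# Made-up median survival curve for one income quartile.
curve = [(0, 1.0), (12, 0.8), (24, 0.58), (40, 0.5), (60, 0.42)]
print(first_period_at_or_below(curve, 0.75))  # 24
print(first_period_at_or_below(curve, 0.5))   # 40
print(first_period_at_or_below(curve, 0.1))   # None
```

The threshold is a business choice: a higher value such as 0.75 triggers earlier outreach, while 0.5 waits until a miss is as likely as not.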

I examine the 24th time period in more depth, identifying how a consumer’s income and their number of loyalty activities interact. The figure below is a conditional dependence plot, showing the survival probabilities of individuals by income and the number of loyalty activities at 24 periods after the start of observation (I filtered out the users with more than 15 loyalty activities to make the graph more readable; the results are robust to this change).

In the top left panel, the probability of not missing a daily latte (surviving) increases with the number of loyalty activities. In both the first and second income quartiles (top left and top right panels), individuals with 0–1 loyalty activities are most at risk of missing a trip (consumers who missed are displayed in orange), and the trend across all income groups is that survival probability increases as a consumer’s number of loyalty activities increases.

Recommendations

With those results in hand, what recommendations could this company make? One way to read this graph is that encouraging users to engage in more loyalty activities will increase their average lifetime value. However, it is unlikely that the number of Starbucks loyalty activities in which you engage causally improves your likelihood of making recurrent weekday visits. It is more likely that users with more loyalty activities are of a different “type”. They could take on more activities because they have good habits (or they are obsessive about their coffee), which would explain their continuing behavior. Users with 0–1 loyalty activities are an interesting sub-group, however, as they are still making daily trips but are not engaged in something like a rewards system. They could actively be avoiding the reward or loyalty system. Both groups would require further investigation before drawing conclusions about these consumers.

With that said, an intervention at 20–24 periods could still be effective for these users. Recommendations such as setting up reminders (emails or a push notification) to make a visit could increase the probability of regular visits.

In general, when to make a recommendation is key. The window 20–24 periods after the start of a program or observation period is an opportune time to intervene with some consumers. Other recommendations can be determined by the particular subgroup and survival probabilities estimated by the RSF model.

Main Takeaways

  1. Make specific recommendations to consumers in the first and second income quartiles
  2. Further investigate users engaging in 0–1 loyalty activities
  3. Make recommendations at 20–24 periods (or sooner)
  4. Determine if user qualifies for special promotions or discounted pricing

About Me

I am currently a fellow at Insight Data Science in Boston. Insight is a fellowship for research scientists transitioning into data science. Previously, I was an Assistant Professor of Public Administration at West Virginia University, and I received my Ph.D. in Political Science from Princeton University. I conducted research on energy and environmental policy (with a focus on fracking) and public opinion. I also taught applied statistics to Master of Public Administration students, with an emphasis on arriving at actionable insights that could be conveyed to decision makers. You can find me on Twitter, GitHub, and LinkedIn.