Estimating and Visualizing Business Tradeoffs in Uplift Models

Sam Weiss
Nov 15 · 7 min read

Uplift modeling is a powerful tool we use at Ibotta to assign treatments to optimize for a particular response variable. However, we often face tradeoffs in optimization. This post will go over how to estimate and visualize tradeoffs with more than one response variable.


Ibotta is a consumer-facing app that helps shoppers save both time and money. I’ve previously written about how we use uplift modeling here and here. Uplift modeling is a subset of machine learning that assigns a treatment to an individual with the goal of optimizing a particular response variable.

However, the treatment may affect multiple response variables at the same time. It is often unclear what the tradeoff between these metrics should be, and therefore what one should optimize.

For instance, suppose you have a marketing campaign where you can give an individual a treatment with the goal of increasing the individual’s activity, but that treatment also comes with a cost. What are the tradeoffs you are willing to accept between increasing activity and increasing costs?

This post will explore how to develop a system of models to visualize and estimate tradeoffs in uplift models with competing response variables of interest. First it will go over the general uplift model setup, and then it will generalize that setup to several responses. Finally, it will work through a simulated example.

Uplift Problem Setup

The general setup for a lift model is:

y: Response variable of interest you’d like to maximize. Could be something like user retention.

X: User level covariates. Includes things like previous in-app behavior per user.

T: The randomly assigned treatment. In this case it is binary: whether or not to give a bonus to a particular user. Assume that treatments are assigned uniformly at random.

With the data (y, X, T) the goal is to build a treatment assignment policy 𝜋(x) that will use X to assign T that maximizes the value of y. Or in this case we want to use user history to assign whether to give a bonus to a user in order to maximize retention.

A frequent practice is to model the expected outcome y_i under different treatments and choose the T that maximizes y_i for each user.

Eqn 1: $\pi(x_i) = \arg\max_{t} \, E[y_i \mid X = x_i, T = t]$

There are several ways to do this, and it can be done with a run-of-the-mill ML algorithm that captures interactions, such as a random forest. To get the counterfactual for each treatment, predict with each value of t in turn, then select for each user the treatment with the highest expected value of y.
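The predict-under-each-treatment step can be sketched as follows. This is a minimal illustration, not the code behind this post: to stay dependency-light it swaps the random forest for a least-squares model with explicit treatment interactions, and the simulated data and all function names are assumptions made for the example.

```python
import numpy as np

def design_matrix(X, t):
    """Features with explicit treatment interactions: [1, X, t, X * t]."""
    t = np.asarray(t, dtype=float).reshape(-1, 1)
    return np.hstack([np.ones((len(X), 1)), X, t, X * t])

def fit_response_model(X, t, y):
    """Least-squares fit on the interaction features (stand-in for a random forest)."""
    coef, *_ = np.linalg.lstsq(design_matrix(X, t), y, rcond=None)
    return coef

def predict_counterfactuals(coef, X, treatments=(0, 1)):
    """Predicted y for every user under each treatment: shape (n_users, n_treatments)."""
    return np.column_stack(
        [design_matrix(X, np.full(len(X), k)) @ coef for k in treatments]
    )

def assign_treatment(coef, X, treatments=(0, 1)):
    """For each user, pick the treatment with the highest predicted response."""
    preds = predict_counterfactuals(coef, X, treatments)
    return np.asarray(treatments)[preds.argmax(axis=1)]

# Hypothetical simulated data: the treatment helps only when the first covariate is positive.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 2))
t = rng.integers(0, 2, size=5000)          # uniform random assignment
y = X[:, 0] * t + rng.normal(scale=0.1, size=5000)

coef = fit_response_model(X, t, y)
policy = assign_treatment(coef, X)
print("share treated:", policy.mean())
```

Any model with a comparable fit and predict step could be dropped in, as long as it can represent interactions between the covariates and the treatment.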

Extending Uplift to Optimize Multiple Responses

A treatment often affects several response variables at once. For instance, we may have two response variables that we are interested in minimizing or maximizing: y_1 and y_2.

A simple approach would be to create a new variable y_new = w_1*y_1 + w_2*y_2 where w_1, w_2 are weights that correspond to how important the objectives are. These weights can be negative as well.

Businesses are generally interested in some function of revenue and costs. Specifically, they may be interested in maximizing profit, in which case you can create a new variable: profit = revenue - costs. In this case w_revenue is 1 and w_cost is -1.

Often, though, a business may care more about growth than about profit. In that case we’d want to set the weight w_revenue higher than 1.
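As a small worked example of how the weights change the answer, the sketch below scores one made-up treatment effect under profit weights and under growth-leaning weights. The numbers and the helper name `combined_response` are assumptions for illustration.

```python
import numpy as np

def combined_response(y, weights):
    """Collapse several responses into one score: sum_j w_j * y_j."""
    return np.asarray(y, dtype=float) @ np.asarray(weights, dtype=float)

# Hypothetical effect of a treatment on one user: [revenue, cost].
treatment_effect = [2.0, 3.0]

profit_score = combined_response(treatment_effect, [1.0, -1.0])  # 2 - 3
growth_score = combined_response(treatment_effect, [2.0, -1.0])  # 4 - 3

print(profit_score, growth_score)
```

With profit weights the treatment scores below zero and would be withheld; with a revenue weight of 2 the same treatment scores above zero and would be given.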

However, finding the weights is not obvious. It’s not clear what the trade-off should be (i.e., how much more in costs is acceptable for increases in individual activity or revenue).

This post advocates for estimating the expected responses under different weights and visualizing these as a set of curves. Observing these tradeoffs can be used as a tool to decide what tradeoffs are appropriate for businesses in general and for Ibotta in particular.

To do so I first extend the policy function to include multiple responses and weights:

Eqn 2: $\pi(x_i, W) = \arg\max_{t} \sum_{j} w_j \, E[y_{ij} \mid X = x_i, T = t]$

where W is a set of weights. Varying these weights shifts the optimization toward a particular response variable. To get a tradeoff curve for the example above, I might set the cost weight to -1 and vary the revenue weight between 0 and a relatively large number.
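A weighted policy of this kind might look like the sketch below. It assumes the per-response predictions are already stacked into an array of shape (users, treatments, responses); the array layout, the function name `weighted_policy`, and the toy numbers are all illustrative assumptions.

```python
import numpy as np

def weighted_policy(preds, weights, treatments=(0, 1)):
    """Choose, per user, the treatment maximizing the weighted sum of predicted responses.

    preds: array of shape (n_users, n_treatments, n_responses).
    weights: one weight per response.
    """
    scores = np.asarray(preds, dtype=float) @ np.asarray(weights, dtype=float)
    return np.asarray(treatments)[scores.argmax(axis=1)]

# Hypothetical predictions for two users; responses are [revenue, cost].
preds = np.array([
    [[0.0, 0.0], [2.0, 1.0]],  # user 0: treatment adds 2 revenue and 1 cost
    [[0.0, 0.0], [0.5, 1.0]],  # user 1: treatment adds 0.5 revenue and 1 cost
])

# Hold the cost weight at -1 and vary the revenue weight.
for w_revenue in (0.0, 1.0, 3.0):
    print(w_revenue, weighted_policy(preds, [w_revenue, -1.0]))
```

As the revenue weight grows, first the high-revenue user and then the low-revenue user is assigned the treatment.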

For each set of weights I can calculate the model’s optimal decision 𝜋(x_i, W) for each user and then estimate the counterfactuals under that set of weights for multiple responses using the ERUPT metric. I wrote about this metric before here; it is a way to obtain unbiased estimates of response variables under the proposed treatments. Repeating this for several sets of weights gives a tradeoff curve.
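As I understand the metric, and assuming treatments were assigned uniformly at random, ERUPT can be sketched as the average observed response among users whose random assignment happens to match the policy's proposal. The function below is a minimal version under that assumption, with made-up data; it is not the implementation used at Ibotta.

```python
import numpy as np

def erupt(y, assigned, proposed):
    """Mean observed response over users whose assigned treatment equals the proposed one.

    y may be one response (n_users,) or several (n_users, n_responses).
    Assumes uniform random assignment; other designs would need reweighting.
    """
    y = np.asarray(y, dtype=float)
    match = np.asarray(assigned) == np.asarray(proposed)
    if not match.any():
        raise ValueError("no observations match the proposed treatments")
    return y[match].mean(axis=0)

# Hypothetical observations; columns are [activity, cost].
y = np.array([[1.0, 1.0], [0.0, 0.0], [3.0, 1.0], [0.0, 0.0]])
assigned = np.array([1, 0, 1, 0])   # what each user actually received
proposed = np.array([1, 0, 0, 0])   # what the policy would give

print(erupt(y, assigned, proposed))
```

Only the first, second, and fourth users match, so the estimate averages their rows and ignores the third.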

Simulated Example

Here I’ll generate 20,000 observations of two response variables, two explanatory variables, a binary treatment variable and an error term of the form:

Let’s call y_1 user activity, which we want to increase, and y_2 the associated cost, which we want to minimize. Without uplift models we have only two possible decisions: give everyone the treatment or give no one the treatment. Treating everyone is expected to increase costs by 1.5 units and activity by 1.5 units; treating no one changes neither.

Uplift models give us more options between these two extremes: we can target people whose activity responds strongly to the treatment but whose costs do not. Conversely, there are individuals in this population who, if assigned the treatment, would not be expected to increase activity but would incur substantially higher costs. Unless we care only about activity, we can presumably do better by withholding the treatment from some of these low-activity, high-cost individuals.

To estimate these marginal tradeoffs I built a model for each response using a random forest, with a 50-50 train-test split of the data. The model should capture the interactions that uplift modeling requires, so that we can obtain the counterfactuals for individual i, treatment t, and response j by predicting:
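The split-fit-predict workflow could be sketched as below. Two things are assumptions made for the example: the data-generating process is hypothetical (it is not the one used in this post), and a least-squares model with treatment interactions stands in for the random forest.

```python
import numpy as np

def features(X, t):
    """Interaction features [1, X, t, X * t]; t may be a scalar or a per-user array."""
    t = np.broadcast_to(np.asarray(t, dtype=float).reshape(-1, 1), (len(X), 1))
    return np.hstack([np.ones((len(X), 1)), X, t, X * t])

rng = np.random.default_rng(1)
n = 20_000
X = rng.normal(size=(n, 2))
t = rng.integers(0, 2, size=n)

# Hypothetical responses: each covariate drives one treatment effect.
activity = t * (1.0 + X[:, 0]) + rng.normal(size=n)
cost = t * (1.0 + X[:, 1]) + rng.normal(size=n)
Y = np.column_stack([activity, cost])

# 50-50 train-test split.
idx = rng.permutation(n)
train, test = idx[: n // 2], idx[n // 2:]

# One column of coefficients per response, fit on the training half.
coef, *_ = np.linalg.lstsq(features(X[train], t[train]), Y[train], rcond=None)

# Counterfactual predictions on the test half:
# shape (n_test, n_treatments, n_responses).
preds = np.stack([features(X[test], k) @ coef for k in (0, 1)], axis=1)
mean_effect = (preds[:, 1, :] - preds[:, 0, :]).mean(axis=0)
print(preds.shape, mean_effect)
```

Stacking the predictions by treatment and response keeps everything needed for the weighted policy in a single array.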

Eqn 3: $\hat{y}_{ijt} = E[y_{ij} \mid X = x_i, T = t]$

Given a set of weights w_1 and w_2, the predicted values from Eqn 3 are plugged into Eqn 2 to produce the treatment decision for each user.

The weights are set to (q, -(1-q)), where q takes 100 values between 0 and 1. As q goes from 0 to 1, the ratio of the weights, -q/(1-q), implies that the model moves from focusing entirely on minimizing costs to focusing entirely on maximizing activity. At q = 0 no one is assigned the treatment; as q approaches 1 we expect everyone to be assigned the treatment. Below is the distribution of people receiving treatment in the test data. It matches the intuition that more people receive treatment as q increases.
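The sweep over q can be sketched as follows. The predicted treatment effects here are random positive numbers invented for the example, so the curve only illustrates the mechanics, not the results in this post.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users = 1_000

# Hypothetical predicted treatment effects, all positive.
activity_effect = rng.uniform(0.5, 2.5, size=n_users)
cost_effect = rng.uniform(0.5, 2.5, size=n_users)

# Weights (q, -(1 - q)) at 100 points between 0 and 1.
qs = np.linspace(0.0, 1.0, 100)

# A user is treated when the weighted effect of treating is positive.
share_treated = np.array([
    np.mean(q * activity_effect - (1.0 - q) * cost_effect > 0)
    for q in qs
])

print(share_treated[[0, 49, 99]])
```

Because every user's weighted score rises with q, the share treated can only go up as q increases, from no one at q = 0 to everyone at q = 1.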

With this assignment policy we calculate the expected responses in two ways using test data: 1) using the means of expected values the model outputs and 2) the ERUPT metric. We compare these two curves with the known true response values below.

The chart above shows that the ERUPT estimate of the tradeoff is much closer to the true values than the estimate based on the model's expected outputs. This is because the model's estimates of the treatment effects are biased downwards. A different model might produce less biased predictions, but this highlights the risk of relying on model predictions alone.
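The gap between the two estimates can be reproduced in a toy setting. The sketch below assumes a single response, a made-up treatment effect, and a hypothetical model whose predicted effects are exactly half the true ones; none of this is the simulation from this post.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
assigned = rng.integers(0, 2, size=n)   # uniform random treatment

# Hypothetical truth: treatment raises the response by (1 + x); control response is 0.
true_effect = 1.0 + x
y = assigned * true_effect + rng.normal(scale=0.5, size=n)

# Hypothetical model whose predicted effects are shrunk toward zero.
predicted_effect = 0.5 * true_effect
proposed = (predicted_effect > 0).astype(int)   # treat when the predicted effect is positive

# 1) Model-based estimate: average predicted response under the policy.
model_estimate = np.mean(proposed * predicted_effect)

# 2) ERUPT: average observed response where the assignment matches the policy.
erupt_estimate = y[assigned == proposed].mean()

# Known only because the data are simulated.
true_value = np.mean(proposed * true_effect)

print(model_estimate, erupt_estimate, true_value)
```

The shrunken model still picks a sensible policy, but its own predictions understate the payoff, while ERUPT uses observed responses and lands near the true value.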

One can view this graph as the Production Possibility Frontier available to us through this program using uplift modeling. As we increase the weight on activity relative to costs, we get both more activity and more cost. We also see diminishing marginal returns to the program. Fun fact: the slope of this graph corresponds to the ratio q/(1-q).
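One way to see where the q/(1-q) slope comes from, as a sketch: write \tau_{i1} and \tau_{i2} for the predicted effect of the treatment on activity and on cost for user i. This notation is introduced here for illustration, and the argument assumes both effects are positive.

```latex
\begin{align*}
\text{treat user } i
  &\iff q\,\tau_{i1} - (1-q)\,\tau_{i2} > 0 \\
  &\iff \frac{\tau_{i2}}{\tau_{i1}} < \frac{q}{1-q},
  \qquad \tau_{i1}, \tau_{i2} > 0,\; 0 < q < 1.
\end{align*}
```

Under those assumptions, the last user added at a given q costs about q/(1-q) units per extra unit of activity, which is the slope of the curve if cost is plotted against activity.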

Conclusions and Thoughts

Ibotta currently uses this methodology and updates it on a monthly basis to decide where we’d like to be in the next month. The benefit of this approach is that it allows us to adjust marketing expenditures to Ibotta’s business needs as necessary; when we need more user activity, we have the flexibility to spend more to meet our objectives.

From an engineering perspective this formulation is very flexible. We can use the same model, and all we need to do on the backend is change the weights. It has been a powerful tool that gives upper management the information they need to control the ‘dials’ of the business.

gist code

IbottaML is hiring Machine Learning Engineers, so if you’re interested in working on challenging machine learning problems like the one described in this article then give us a shout. Ibotta’s career page here.

Building Ibotta

Thoughts and experiences from Ibotta's engineering, analytics and product teams

Thanks to matt johnson and Evan Harris
