Using Expected Value for Classifier Use in Business Problems
I’ve been reading Data Science for Business, by Provost and Fawcett, a very useful book that explains some of the most important principles and topics in data science. The authors’ language and structure help a lot in developing an intuitive understanding of key data science concepts like model tuning and model evaluation, as well as of models themselves, like decision trees, linear models, and k nearest neighbors. I highly recommend the book if you’re someone who works with data scientists, if you’re a beginner data scientist, or even if you’re a data science expert who’s looking for a good resource to refresh your fundamentals with.
I found one chapter particularly interesting because it talks about a framework, or way of thinking, that I haven’t really heard about elsewhere. While specific tactics, such as how different kinds of models work, are definitely important and a large part of what a Data Scientist needs to know and be able to do, I think higher level strategy is also important. Anyways, the framework is highly practical, which fits the authors’ theme for the book: that data science isn’t just about analyzing data, but also about understanding the business problem in an analytical way. I wished there were something tangible and interactive to go along with their explanations in this chapter (and others), so I decided to create a guide of sorts: this blog post plus an interactive Jupyter Notebook you can download and play with. The blog post provides context in case you haven’t read the corresponding chapter in the book yet, so the Jupyter Notebook comes near the end.
If you have the book already, this blog post corresponds to the latter “half” of Chapter 7, “Decision Analytic Thinking I: What Makes a Good Model?”. This guide, and especially the Jupyter Notebook, assume that the reader already has some familiarity with the basic ideas of machine learning, such as supervised learning (specifically classification), data pre-processing, holdout set testing, and model evaluation.
When applying data science to solve business problems: what is the real goal?
As with any sort of problem, you have to uncover what the real goal of a data analytic project is. It can be tempting to get caught up in the surface level question or to jump straight into solutions.
For example, questions about customers come up a lot in business: which customers are most likely to churn? Which customers are most receptive to upselling? The idea is that once we can predict which customers are most likely to be upsold, we can call them, try to get them to buy more items like an add-on for the thingamajig they just bought, and generate more revenue for the business. Let’s run with this “upselling” case as an example.
The real business goal behind answering “which customers are most receptive to upselling?” is not just to generate more revenue from upselling customers, but to maximize the profit generated from our efforts. Not all customers will be equally likely to be upsold (some are curmudgeons, others might have a real need for the other products we’re selling), those who we do upsell could purchase different amounts of stuff, and the act of upselling costs us time and money (which can also be variable). So how do we even structure a problem like this, and then decide what to do?
Introduction to the expected value framework, and how it helps break down problems
Let’s introduce the expected value framework, and weave it into how we’d structure and break down our business objective for this “upselling” project.
As a quick refresher:
expected value (of a variable) — a predicted value of a variable, calculated as the sum of all possible values, each multiplied by the probability of its occurrence
In other words: what do we anticipate, or expect, the value of some variable to be, given that we are uncertain about which of the possible outcomes will happen?
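As a minimal sketch of that definition, the function below sums each possible value weighted by its probability. The function name and the example bet are made up for illustration; they are not from the book.

```python
def expected_value(outcomes):
    """Return the expected value of a list of (value, probability) pairs.

    The probabilities are assumed to cover every possible outcome,
    so they must sum to 1.
    """
    outcomes = list(outcomes)
    total_probability = sum(probability for _, probability in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Sum of each possible value times the probability of its occurrence.
    return sum(value * probability for value, probability in outcomes)


# Hypothetical bet: win $10 with probability 0.25, lose $2 otherwise.
bet = [(10.0, 0.25), (-2.0, 0.75)]
print(expected_value(bet))  # 1.0
```

On any single bet we either win $10 or lose $2, but averaged over many such bets we would expect to come out $1 ahead per bet.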
Frame the question in terms of expected value
Back to our upselling question. Each customer has their own probability of being upsold, and a likely amount that they will be upsold for; there’s also a cost to upselling, which we may have to eat if we call a customer who doesn’t want to buy anything else from us. So, thinking in terms of expected value, each customer will have an expected profit, given that we reach out to that customer to try and upsell them. More specifically:
$latex E(Profit) = p_u * Profit_{upsold} + (1 - p_u) * Profit_{not\ upsold}$
Which means that, assuming we reach out to a customer, the expected value of profit ($latex E(Profit)$) equals the probability of upselling the customer ($latex p_u$) times the profit we’d get from upselling the customer, plus the probability of failing to upsell the customer (1 minus the probability of upselling the customer) times the profit we’d get from failing to upsell the customer.
Breaking out profit in each potential outcome:
$latex E(Profit) = p_u * (v_u - c) + (1 - p_u) * (-c)$
Where $latex v_u$ is the value, or revenue generated, from upselling the customer, and $latex c$ is the cost of trying to upsell the customer (we assume the cost is constant across customers for simplicity). Notice in the second half of the equation that if we fail to upsell the customer, the outcome is that we get $0 in revenue and eat the cost ($latex -c$) of trying.
Now, the path to achieving our original business goal, to maximize total profits, is clear: try to upsell every customer whose expected profit from an upsell attempt is greater than 0 (assuming we don’t have any budget or other constraint on how many customers we can upsell to).
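One consequence worth spelling out, using only the definitions above: the cost $latex c$ is paid in both outcomes, so the expected profit simplifies, and the “greater than 0” rule turns into a threshold on the predicted probability (assuming $latex v_u > 0$):

$latex E(Profit) = p_u * (v_u - c) + (1 - p_u) * (-c) = p_u * v_u - c$

$latex E(Profit) > 0 \iff p_u > c / v_u$

In other words, a customer is worth contacting only when their probability of being upsold exceeds the ratio of the cost of trying to the value of a success.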
Expected value breaks the problem down for us
Thinking in terms of expected value has also broken up the problem nicely for us: to figure out the expected profit of trying to upsell a customer, figure out (1) the probability that upselling will work, $latex p_u$, (2) the value of a successful upsell, $latex v_u$, and (3) the cost $latex c$ of trying to upsell a customer.
Now, we can go more low level and think about how we might address each piece analytically. To address (1), we can build a machine learning model, a classifier, on historical customer data of which kinds of customers were successfully upsold and which kinds weren’t, and use it to generate a predicted $latex p_u$, or probability that upselling will work, for each customer. For simplicity, we’ll assume that both (2) and (3) are constant across all customers, but technically, you could build another model to predict (2), the value of a successful upsell for a given customer.
More specifically, for (1), our historical customer data is a snapshot of all customers that we’ve previously tried to upsell to, at time t. One column in the data is whether or not (e.g. a 1 or -1, or 1 or 0) we were able to successfully upsell each customer by some future date t+1, say 3 months later; this is the target variable. The other columns, or features, contain data on each customer before time t, such as number of previous purchases, number of times customer has been back to our online store, shipping zip code (which we can estimate income level with), etc.
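To make this concrete, here is a rough sketch of how such a classifier could produce a $latex p_u$ for each customer. Everything specific in it is an assumption for illustration: the data is synthetic, the two features and their coefficients are invented, and logistic regression from scikit-learn is just one reasonable choice of model, not necessarily the one used in the notebook.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_customers = 1000

# Hypothetical features, all measured before time t.
previous_purchases = rng.poisson(lam=2.0, size=n_customers)
store_visits = rng.poisson(lam=5.0, size=n_customers)
X = np.column_stack([previous_purchases, store_visits])

# Synthetic target: 1 if the customer was upsold by t+1, else 0.
# The coefficients below are made up so that the data has some signal.
logits = -3.0 + 0.6 * previous_purchases + 0.2 * store_visits
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))

# Hold out a test set so the probabilities are for customers the model has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LogisticRegression()
model.fit(X_train, y_train)

# predict_proba returns one column per class, ordered as in model.classes_.
upsold_column = list(model.classes_).index(1)
p_u = model.predict_proba(X_test)[:, upsold_column]

print(p_u[:5])  # predicted probability of a successful upsell, per customer
```

The array `p_u` is what feeds into the expected profit calculation. How trustworthy those probabilities are depends on the model and the data, which is worth checking before acting on them.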
Now we have a structure, thanks to EV (expected value), for evaluating whether we should try to upsell any individual customer in order to maximize company profits.
Let’s plug in some numbers to see how we might use our structure to make decisions on whether we should try to upsell a customer or not.
Take Customer A. Based on what we know about other customers that are similar to him, our machine learning model predicts that he has a 91% chance of being upsold if we call him.
Let’s assume that if we upsell a customer, they will spend $100 to buy an add-on to the thingamajig they already bought. Let’s also assume that on average, it takes a 30 minute phone call at a salesperson’s hourly wage of $30 / hour, to try to upsell someone, so the cost of upselling is $15.
Therefore, the expected profit for trying to upsell Customer A will be:
$latex E(Profit_A) = 0.91 * (\$100 - \$15) + 0.09 * (-\$15) = \$76$
And since the expected profit is positive, it is worth it to try and upsell him, because on average (if we keep trying to upsell people like him), we will generate $76 in profits each time for the company.
Now let’s look at Customer B. Based on what we know about other customers that are similar to her, our machine learning model predicts that she has a 4% chance of being upsold if we call her.
So, the expected profit for trying to upsell Customer B will be:
$latex E(Profit_B) = 0.04 * (\$100 - \$15) + 0.96 * (-\$15) = -\$11$
We should not try to upsell customers like Customer B, because on average, we will lose $11 each time.
If we do this expected value calculation for each customer we’re thinking about upselling to, we can arrive at a subset of customers where the expected profit of upselling each one is positive, and thus if we try to upsell all of them, our expected total profit will be maximized.
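The selection step can be sketched in a few lines of Python. Customers A and B and the $100 / $15 figures come from the example above; customers C and D, along with the function and variable names, are hypothetical additions so that the list has more than two entries.

```python
VALUE_IF_UPSOLD = 100.0   # revenue from a successful upsell (from the example)
COST_PER_ATTEMPT = 15.0   # cost of one upsell attempt (from the example)


def expected_profit(p_upsell, value=VALUE_IF_UPSOLD, cost=COST_PER_ATTEMPT):
    """Expected profit of trying to upsell one customer."""
    return p_upsell * (value - cost) + (1.0 - p_upsell) * (-cost)


# Predicted probability of a successful upsell per customer.
# A and B are from the text; C and D are hypothetical.
predicted = {"A": 0.91, "B": 0.04, "C": 0.30, "D": 0.10}

profits = {name: expected_profit(p) for name, p in predicted.items()}

# Only contact customers whose expected profit is positive.
targets = [name for name, profit in profits.items() if profit > 0]
expected_total = sum(profits[name] for name in targets)

for name, profit in profits.items():
    print(f"Customer {name}: expected profit ${profit:.2f}")
print("Customers to contact:", targets)
print(f"Expected total profit: ${expected_total:.2f}")
```

With these inputs only customers A and C clear the bar, for an expected total of about $91. Adding B or D to the call list would lower that total, since each has a negative expected profit.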
See this Jupyter Notebook for a full example of training a machine learning model on historical customer data to predict whether or not a customer will be upsold, along with the associated probability of each outcome. These probabilities, together with the expected value framework, are then used to show which customers we should try to upsell to maximize our company’s profit.
Note that using the expected value framework to calculate something like expected profit depends entirely on two things: the probabilities of different outcomes (e.g. a customer successfully being upsold or not) and the benefit or cost of each outcome. Both can be estimated with models and comprehensive data, but not always very well, or it may be impossible in the first place. This is where both business and data understanding come into play: a data scientist has to understand what data is available and what it can be used for, and also understand how the business works so that accurate cost/benefit numbers can be gathered. This also means that the results of using expected value are sensitive to changes in either type of variable, probabilities or cost/benefit numbers. Though the expected value framework can be a practical and structured way to break down a business analytic problem, the data scientist may have to use other methods to inform action if he/she doesn’t have enough confidence in the probability or cost/benefit estimates. Like all things in life, there is no one size fits all approach: the EV framework is a tool in a data scientist’s big toolbox.
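A quick way to see this sensitivity is to recompute the decision under a few different cost estimates. In the sketch below, Customer A and the $100 upsell value come from the example above; the “marginal” customer with an 18% chance of being upsold, the range of candidate costs, and the function names are assumptions for illustration.

```python
def expected_profit(p_upsell, value, cost):
    """Expected profit of trying to upsell one customer."""
    return p_upsell * (value - cost) + (1.0 - p_upsell) * (-cost)


def call_decisions(p_upsell, value, costs):
    """For each candidate cost, is the expected profit of calling positive?"""
    return {cost: expected_profit(p_upsell, value, cost) > 0 for cost in costs}


candidate_costs = [10.0, 15.0, 20.0, 25.0]  # hypothetical range of cost estimates
customers = {"A": 0.91, "marginal": 0.18}   # "marginal" is a hypothetical customer

for name, p_upsell in customers.items():
    print(name, call_decisions(p_upsell, 100.0, candidate_costs))
```

The decision for Customer A is the same at every cost in this range, but the decision for the marginal customer flips between $15 and $20. For customers like that, a small error in the cost estimate (or in the predicted probability) changes what we should do, which is exactly where confidence in the inputs matters most.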
Thanks for reading! I’m always open to questions, suggestions, or other kinds of feedback.
Originally published at making my own luck.