How to Score Records You Used as Examples: Predicting Upsell and LTV in Einstein Prediction Builder

by Anastasiya Zdzitavetskaya, Director of Product Management, Salesforce

When building a prediction with Einstein Prediction Builder, sometimes the records you want to use as examples (your historical data) are the same records you want to create predictions for. Think of this as the example set/prediction set overlap problem (or training set/scoring set overlap). Typical predictions that fall into this category include customer attrition, lifetime value, high-value customers, and upsell.

Here are two ways to address the overlap problem: (1) the time horizon approach, and (2) the randomization (two-segment) approach to create your predictions. In the Summer ’20 release, you’ll be able to use filters to explicitly define your prediction set. This feature makes the randomization approach unnecessary because you can choose to use the same records in your example set and prediction set. Read more about this in the release notes!

Read on to understand when to choose the time horizon approach versus the randomization or prediction set filters approach. Use the Prediction Definition Framework described in this blog to think through the upsell and high-potential customer use cases, and see which approach works best in each case and why.

Upsell use case with the time horizon approach

Every company wants to know whether its customers are likely to buy another product or service. This type of prediction is an upsell or cross-sell problem. The goal isn’t to recommend the right product (which is a different class of machine learning problem), but rather to predict how likely a customer is to buy additional products.

Defining your example set can be tricky. Positive examples are easy to identify: any customers who bought two or more products. But what about negative examples? For each customer who has bought only one product so far, you have to determine whether this customer has not bought an additional product yet (and thus belongs to the prediction set) or will never buy any more products (and thus becomes a negative example). In this particular example set/prediction set overlap problem, you need to differentiate between records to score and negative examples.

To determine the right time horizon, create a report and identify how long it normally takes customers to buy a second product after the initial purchase. In this example, it’s six months.
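If you export the purchase history behind that report, the same check can be sketched in a few lines of Python. This is only an illustration: the account names and dates are made up, and the choice of statistic (median versus longest gap) is an assumption you would adjust to your own data.

```python
from datetime import date
from statistics import median


def days_to_second_purchase(purchases_by_account):
    """Return the days between first and second purchase for each repeat buyer."""
    gaps = []
    for dates in purchases_by_account.values():
        ordered = sorted(dates)
        if len(ordered) >= 2:  # single-purchase accounts have no gap to measure
            gaps.append((ordered[1] - ordered[0]).days)
    return gaps


# Hypothetical purchase history, keyed by account name.
history = {
    "Acme": [date(2019, 1, 10), date(2019, 4, 20)],
    "Globex": [date(2019, 2, 1), date(2019, 7, 1)],
    "Initech": [date(2019, 3, 5)],
    "Umbrella": [date(2019, 1, 15), date(2019, 3, 1), date(2019, 9, 9)],
}

gaps = days_to_second_purchase(history)
typical_gap = median(gaps)  # how long a second purchase normally takes
longest_gap = max(gaps)     # a more conservative candidate for the horizon
print(f"typical: {typical_gap} days, longest: {longest_gap} days")
```

A horizon close to the longest typical gap keeps late repeat buyers out of your negative examples; a shorter horizon gives you more negative examples sooner.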

The Avocado Framework for this problem might look like this:

  1. Dataset: All standard customer accounts.
  2. Positive Examples: Accounts with two or more products.
  3. Negative Examples: Accounts with only one product and more than six months since the first purchase date. For the sake of this prediction, assume they will never buy again. You’ve given up on them because historically, other customers made their second purchase well within six months of the first purchase.
  4. Records to Predict/Score (Prediction set): Accounts with only one product and less than six months since the first purchase date. Since they’re still in the green zone (they can potentially make their next purchase soon), create a prediction for these customers.
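The four buckets above can be expressed as a small rule. The Python sketch below is not how Prediction Builder implements filters; it only mirrors the logic. It assumes a 180-day horizon, assumes every account has at least one product, and treats an account exactly at the 180-day mark as still open to scoring (the framework does not say which side that boundary falls on).

```python
from datetime import date, timedelta

HORIZON = timedelta(days=180)  # assumed six-month time horizon


def classify_account(product_count, first_purchase_date, today):
    """Label an account as a positive example, a negative example, or a record to score."""
    if product_count >= 2:
        return "positive"   # already bought an additional product
    if first_purchase_date < today - HORIZON:
        return "negative"   # one product and past the horizon: a write-off
    return "score"          # one product and still inside the horizon


today = date(2020, 6, 1)
print(classify_account(3, date(2018, 1, 1), today))   # positive
print(classify_account(1, date(2019, 10, 1), today))  # negative
print(classify_account(1, date(2020, 3, 1), today))   # score
```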

Here’s how to set it up with filters:

To set up the upsell problem in Prediction Builder, select the No Field option and use the Yes and No example filters to define the logic:

Negative examples:

Tip: One of the conditions involves dates with the comparison: Purchase Date Less than Comparison Now () Minus 180 days. In this context, Less than means Earlier than. Thus, Purchase Date has to be anytime before 6 months ago (for example, 7 months ago or 1 year ago).

In summary, use the time horizon approach if you can confidently define a time horizon that separates negative examples (“write-offs”) from records to score (“undecided” but with good potential).

Salesforce recommends using the time horizon approach for attrition use cases. Instead of predicting whether the customer will ever attrit, you can predict if the customer is likely to attrit within the first year of becoming a subscriber or buying your product, for example.

What if there is no well-defined period? Then you can try the second approach: create two or more predictions on randomized segments of your data. Or, in the Summer ’20 release, use filters to define your prediction set.

High-value customers use case with randomization or prediction filters

Let’s say we want to predict high-potential customers: those who are likely to spend more than $X during their lifetime. The actual value of $X depends on your business. You can create a report and see how much the top-spending 10% of your customers have spent. This is your high-value customer threshold. In our example, it is $500.
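As a rough illustration of that report, here is a Python sketch that finds the lowest lifetime spend among the top-spending 10% of customers. The spend figures are invented so that the result matches the $500 used in this post, and "top 10%" is interpreted as the top tenth of customers by count, which is an assumption.

```python
def top_decile_threshold(lifetime_spend):
    """Return the smallest spend among the top-spending 10% of customers."""
    if not lifetime_spend:
        raise ValueError("need at least one customer")
    ranked = sorted(lifetime_spend, reverse=True)
    top_count = max(1, len(ranked) // 10)  # always keep at least one customer
    return ranked[top_count - 1]


# Hypothetical lifetime spend, in dollars, for 20 customers.
spend = [20, 35, 50, 80, 120, 150, 200, 260, 340, 500,
         15, 40, 60, 90, 110, 170, 210, 280, 390, 720]

threshold = top_decile_threshold(spend)
print(threshold)
```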

The challenge with this prediction is that it’s very difficult to differentiate between negative examples and records to score. If they haven’t spent $500 yet, does it mean they never will (thus becoming your negative examples)? Or that they haven’t yet reached this $500 threshold, but eventually they will? This is another instance of the negative examples and prediction set overlap problem.

In general, it’s better to use a time horizon approach and identify which customers are likely to spend more than $500 within some specific timeframe, as described above. For example, you can create a report and identify how long it takes for the majority of your customers to reach this $500 threshold. Let’s assume it is 6 months. Then our negative example set will include customers who have not reached this $500 threshold within 6 months. This will probably provide better predictions than the randomized approach described below because your negative examples are much more aligned with the negative behavior — customers not spending $500 within their lifetime.

Alternatively, you can create a numeric prediction that estimates Future Lifetime Value (LTV) for each customer. In this case, you want to use all records as examples (learning from all existing customers’ spending history) and predict for all records (estimate Future LTV for all customers). This is another example set/prediction set overlap problem.

Two-segment randomized approach

Until example set/prediction set overlap is supported, use the following approach as a workaround.

Use two randomly created segments to differentiate between the example (training) set and the prediction (scoring) set. Build two predictions: one trained on Segment 1 and predicting for Segment 2, and the other one trained on Segment 2 and predicting for Segment 1.

Prediction Definition Framework — high-potential customers

For this use case, the Avocado Framework looks like this:

  1. Dataset: All customer accounts.
  2. Positive Examples: Customers who spent more than $500.
  3. Negative Examples: Customers who spent less than $500 and are in Segment 1.
  4. Records to Predict/Score: Customers who spent less than $500 and are in Segment 2.

Using the formula below, we assign each customer to Segment 1 or Segment 2. In effect, we create an even segment and an odd segment in our data based on a number field:

IF(MOD(INDEX__c, 2) == 0, "Segment1", "Segment2")
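To see how the two predictions cover every customer between them, here is a Python sketch of the same odd/even split. The customer records, the `index` and `spend` keys, and the helper names are hypothetical; customers at exactly $500 are treated as not yet positive, which the framework leaves unspecified. Note that an odd/even split only behaves like a random split if the number field is unrelated to spending.

```python
THRESHOLD = 500  # high-value customer threshold from the example


def segment_for(index):
    """Mirror the formula: even index values go to Segment1, odd to Segment2."""
    return "Segment1" if index % 2 == 0 else "Segment2"


def build_prediction_sets(customers, train_segment):
    """Split customers into positive examples, negative examples, and records to score."""
    positives, negatives, to_score = [], [], []
    for customer in customers:
        if customer["spend"] > THRESHOLD:
            positives.append(customer["index"])
        elif segment_for(customer["index"]) == train_segment:
            negatives.append(customer["index"])
        else:
            to_score.append(customer["index"])
    return positives, negatives, to_score


# Hypothetical customers with a number field and lifetime spend in dollars.
customers = [
    {"index": 1, "spend": 600}, {"index": 2, "spend": 100},
    {"index": 3, "spend": 200}, {"index": 4, "spend": 700},
    {"index": 5, "spend": 50},  {"index": 6, "spend": 300},
]

prediction_a = build_prediction_sets(customers, "Segment1")  # scores Segment 2
prediction_b = build_prediction_sets(customers, "Segment2")  # scores Segment 1
print(prediction_a)
print(prediction_b)
```

Each customer below the threshold is a negative example in one prediction and a scored record in the other, so no record is scored by a model that was trained on it.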

Now for the final setup using filters:

Then create a second prediction, with customers in Segment 2 as the negative examples, and predicting for customers in Segment 1.

Using Prediction set filters

When the new prediction set feature is available, you can use prediction set filters to explicitly define records to score. This allows the example set and prediction set to overlap.

To predict LTV, select “Score only records that are in the example set,” which creates predictions for all records that you used as examples (the example set and the prediction set fully overlap).

For the customer attrition and high-potential customer use cases (negative examples and prediction set overlap), select “Score specific records” and use filter logic to exclude positive examples. In these scenarios, positive examples are easily identifiable (the customer attrited or reached the high-value threshold) and don’t need to be scored. Basically, you score everyone except the customers who have already attrited or reached the high-value customer threshold, and the same records are used as both negative examples and records to score.

How to use your predictions to improve business outcomes

Incorporate these predictions in your business process to streamline resource management and automation:

  • Double your efforts on customers who are more likely to buy another product or achieve high LTV. To identify these customers, create a new list view and sort by scores (either the likelihood to buy or to achieve high LTV), so sales reps can prioritize customers with the highest potential. You can also review customers in the middle range (those are your borderline opportunities) and identify steps to get them back on track. Perhaps you can add them to the appropriate marketing campaign.
  • Automate the creation and allocation of tasks to focus the Sales and Service teams on high-potential customers. You can use Process Builder to automate task creation for prioritized customers.
  • Show the top predictors for each customer, so your business users can see the reasons behind these predictions. Just add the Einstein Predictions Lightning component to the Account layout page and select the name of your prediction.
  • Use Einstein Next Best Action to provide the right recommendations to the sales reps for each customer based on the predictions and business rules. You can get an idea of what to recommend for each customer based on the top predictive factors in the scorecard.

For more information about the prediction lifecycle, please review this blog.

Summary

If you encounter a situation where you want to score records that you used as examples, first evaluate whether you can use the time horizon approach to separate negative examples (“write-offs”) from records to score (“undecided,” but with good potential). If this is not possible, use the randomization approach or the prediction set filters approach (available in the Summer ’20 release).

For the latest guidance on planning your prediction, check out the official Salesforce documentation.

Related Blog Posts

  1. How to turn your Idea into a Prediction
  2. How to Use Einstein Prediction Builder for Opportunity Scoring
  3. How to Use Einstein Prediction Builder to Predict Opportunity Amounts
  4. Which fields should I include or exclude from my model?
  5. Understanding your Scorecard Metrics
  6. Understanding the Quality of Your Prediction
  7. A Model That’s Too Good to be True
  8. How do I know if my prediction is working?
  9. Custom Logic on Predictions from Einstein Prediction Builder
