For important decisions, listen to your AI, but retain responsibility

Michael Chu
GAMMA — Part of BCG X
6 min read · Mar 4, 2020

As artificial or augmented intelligence (AI) masters more and more amazing applications, the optimism about its future continues to grow. With new packages such as AutoML, there is a growing belief that there is no need to be “picky” about the model to run. Just try them all, tune the hyperparameters, and see what works best, with limited hand-tuning and little reliance on domain knowledge. This is particularly true in the case of deep learning architectures, which are becoming ubiquitous.
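
To make that belief concrete, here is a minimal sketch of the “try them all” workflow in Python with scikit-learn. The candidate models and the synthetic dataset are purely illustrative, not a recommendation:

```python
# A minimal sketch of the "just try them all" workflow: score several
# candidate models on the same data and keep whichever wins, with no
# domain knowledge applied to the choice. Models and data are illustrative.
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

candidates = {
    "ridge": Ridge(alpha=1.0),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}

# Cross-validated R^2 for every candidate; pick the highest scorer.
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "-> picked:", best)
```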

Yet this belief clashes with reality. Keeping the “art” of data science out of data science works reliably only when four specific criteria are met:

· you are making many small, low-risk decisions, so that the risk of getting any individual decision wrong is not catastrophic

· you have a very extensive and rich set of data (e.g. millions of data points)

· you have only simple, high-level expectations for the model’s output

· you are modeling within a limited range of previously observed outcomes

At the moment, only a small minority of business situations fulfills those criteria. In most situations, choosing the right algorithm isn’t just about maximizing accuracy; it’s about choosing the right evaluation criteria and making sure they are met as fully as possible. Delivering the right set of models and algorithms requires an experienced data scientist, coupled with an analytics-savvy business owner. Together they need to make sure decisions are better than today’s, without the risk of significant outlier outcomes. They need to be able to combine model outputs and business judgment so they can act when trends change and contexts shift.

Fortunately, this is a solvable problem. Below we discuss how business judgment can be used to inform model and algorithm design, and also allow for the ideal set-up of “humans in the loop.” We’ll use retail promotions as an illustrative example.

Keeping the art of data science in play: a retail example

Sophisticated retailers use a series of models to forecast the performance of a promotion before it is run, so that they can hit their desired targets for traffic, sales, revenue, or margins. These models usually draw on extensive data from familiar, easily understood sources — historical sales, price, seasonality, industry trends, and competitive actions. Now imagine that one major retailer’s management team has religiously followed the script for implementing machine learning. Inspired by the potential opportunities that can arise if they “let the data speak for itself,” they have unleashed the full power of automated predictive modeling on the planning process for their upcoming promotional calendar.

The problem is that many of these models do not meet the criteria that would allow them to function reliably without the “art” part of data science. First and foremost, holiday season promotions are high-stakes events with significant business implications. Getting them wrong can cost millions of dollars. You can’t experiment your way out of this situation. This means that human judgment is needed to interpret and adjust the results, especially for second- and third-order effects. For example, it may be “economically optimal” to discount to win volume, but what about the impact of discounts on future price perceptions? What about the risk of starting a price war? Yet many forecasting models are such black boxes that it can be difficult for a business user to understand the underlying logic and thus choose when to overrule the model.

Second, while retailers have many data points, they come from a small number of holiday season promotions whose time periods or contexts may be incompatible with each other. As a result, a model may be making a prediction using data from a completely unrelated time period.

Third, trying to stretch a model beyond its limitations can lead to unintended and sometimes embarrassing consequences. The output of the models, while technically accurate, does not have real-world relevance. These models are very capable of producing forecasts that score well on accuracy metrics yet are counterintuitive or misleading. Their spurious results tend to fall into two categories: predictions that seem to violate basic rules of classical economics, and predictions that exceed even the team’s wildest expectations.

Let’s start with the first group. Outcomes that seemingly violate classical economics include:

· Price decreases that lead to volume decreases

· A 20% discount that leads to 2x sales, while a 21% discount leads to 1.5x sales volume

· A 20% discount that leads to 2x sales, but a 21% discount leads to 10x sales volume

What causes these outcomes? The first one can occur when the dataset contains business contexts that do not apply to the one being modeled. It may, for example, include markdown or clearance sales which may have seen low prices and low volumes, but do not correspond to the promotions currently being forecast. The latter two cases often indicate differences in merchandising, placement, and advertising. Unless there is a way to control for shelf placement, appearance on an end cap, or exposure in a weekly flyer, the model may draw misleading relationships between price and volume or identify misleading behaviors at key thresholds.
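
As an illustration of how such controls might be added, the following sketch fits a log-log volume model with placement and advertising flags and then checks the sign of the estimated elasticity. The column names (clearance_flag, end_cap, in_flyer, season) and the file name are assumptions for the sake of the example, not a real schema:

```python
# Hedged sketch: a log-log volume model with promotion-context controls.
# Column names and the input file are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("promo_history.csv")   # hypothetical promotion history
df = df[df["clearance_flag"] == 0]      # drop markdown/clearance rows up front

# Controlling for placement and advertising keeps those effects from being
# absorbed into the price coefficient (the elasticity).
model = smf.ols(
    "np.log(units) ~ np.log(price) + end_cap + in_flyer + C(season)",
    data=df,
).fit()

elasticity = model.params["np.log(price)"]
if elasticity >= 0:  # demand elasticity should be negative
    print(f"Warning: elasticity is {elasticity:.2f}; check for contaminated data")
```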

Outcomes that defy even the highest expectations include:

· A mere 5% discount gets a 10x increase in sales on Black Friday

· A year-round discount on seasonal items (e.g. turkey for Thanksgiving) would drive significant sales

These outcomes result primarily from incongruities across time periods. In the first case, discounts on Black Friday have historically been very deep, so the model would need to rely on sales events from other periods to look for responses to a 5% discount. In the second case, the reverse happens: the model projects responses from one period into another where no discounts have previously existed. These outcomes can dramatically undermine the confidence users have in their models, forcing users to expend extra mental energy on all these “watch-outs”.
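
One simple guard against this kind of extrapolation is to check whether a proposed discount falls inside the range actually observed for comparable events before trusting the forecast. A hedged sketch, with hypothetical column names, event labels, and file name:

```python
# Hedged sketch: flag predictions that would require extrapolating beyond
# the discount depths seen for comparable events. Names are hypothetical.
import pandas as pd

history = pd.read_csv("promo_history.csv")  # hypothetical promotion history

def in_observed_range(event_type: str, discount: float, margin: float = 0.05) -> bool:
    """Return True if this discount depth is close to what has been observed
    for this event type; otherwise the model would be extrapolating and its
    forecast deserves far less confidence."""
    past = history.loc[history["event_type"] == event_type, "discount"]
    if past.empty:
        return False
    return (past.min() - margin) <= discount <= (past.max() + margin)

if not in_observed_range("black_friday", 0.05):
    print("5% Black Friday discount is outside historical support; flag for review")
```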

So what is the solution? It lies in understanding, appreciating, and correcting the inherent limitations of predictive modeling.

The “art of data science” approach

Broadly speaking, the team must pair any model with a clearly defined and demarcated business problem that it can capably answer. The team must also make a reasonable trade-off between time and robustness. A good, fast recommendation is clearly better than a perfect late one. This means that the team must have a full understanding of what the model will be used for — and what it will not be used for — as well as what attributes the model should have.

To make this work, we recommend that the team choose a specific set of models that avoid the issues above by applying the changes below. Again, we use retail promotions as the illustrative example.

· Make exclusions ex-ante: The team should exclude markdowns and clearances (i.e. non-promotion price changes) from the training data and use a model form that makes it easy to spot when such data have crept in. With a linear model, for example, it is easy to detect when elasticities have the wrong sign.

· Enforce common-sense controls on parameter estimates: For most business phenomena, a small increase in one variable should drive a small increase in another; erratic effects in business are rare. The model must ensure that the impacts of specific variables are monotonic and smooth (see the sketch after this list for one way to enforce this).

· Properly define the model’s reach: The team should put constraints on when and where the model is allowed to provide a prediction. It should also communicate to users the degree of confidence that they should have in the model’s predictions.

· Remain conservative when needed: The team needs to communicate the uncertainty around the estimates, and, for major decisions, consider recommending an outcome with a marginally lower average profit impact, but higher certainty.
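
As one concrete way to enforce the monotonicity guardrail above, scikit-learn’s histogram-based gradient boosting accepts per-feature monotonic constraints. The features and synthetic data below are purely illustrative; this is one possible set-up under those assumptions, not the only one:

```python
# Hedged sketch: enforce that predicted volume never decreases as the
# discount deepens. Features and synthetic data are illustrative.
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.default_rng(0)
n = 2_000
discount = rng.uniform(0.0, 0.4, n)        # promotion depth, 0 to 40%
seasonality = rng.uniform(0.5, 1.5, n)     # illustrative seasonal index
volume = 100 * seasonality * (1 + 2.5 * discount) + rng.normal(0, 10, n)

X = np.column_stack([discount, seasonality])

# monotonic_cst=[1, 0]: predicted volume may never fall as the discount
# deepens, while seasonality is left unconstrained. This hard-codes the
# "small increase in one variable drives a small increase in another" rule.
model = HistGradientBoostingRegressor(monotonic_cst=[1, 0], random_state=0)
model.fit(X, volume)

# Sanity check: for identical seasonality, a deeper discount should never
# predict less volume than a shallower one.
season_grid = np.linspace(0.5, 1.5, 100)
shallow = model.predict(np.column_stack([np.full(100, 0.10), season_grid]))
deep = model.predict(np.column_stack([np.full(100, 0.30), season_grid]))
assert (deep >= shallow - 1e-6).all(), "response to discount is non-monotonic"
```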

The art of data science will always play a role as machine learning makes more inroads into key business decisions. Machine learning has the best chance of reducing or removing a team’s cognitive load when the team learns with the model, so that the team can inform and instruct the model on an ongoing basis.
