GAMMA — Part of BCG X - Medium

Serena Zou — Mon, 19 Sep 2022 14:45:33 GMT

Zero-Based Demand: An Innovative Model Framework for Forecasting Travel Demand during COVID… and Beyond

Written by: Serena Zou, Alex Fernandez, Aaron Arnoldsen, Mike Beyer, Andrew Simon and Arun Ravindran

COVID-19 dealt the airline industry a stunning blow from which it is still recovering. In 2020, airlines experienced a drop of more than 75% in international demand and of nearly 50% in domestic demand¹, making it the worst year in the history of air travel. Today, a full two years later, emerging COVID variants are still forcing workers to take sick leave, while a shifting patchwork of quarantine policies makes travel from one country to another a veritable obstacle course of changing rules and regulations. And as if this weren’t enough, airlines and other travel companies have discovered that years of historical data and relatively predictable demand patterns are now much less useful for predicting future demand. No longer able to rely on extrapolations of past customer behavior or on “gut feel,” many airline planners are searching for ways to predict travel demand amid what can seem like an endlessly volatile market.

As BCG has demonstrated with Lighthouse by BCG, it is possible to forecast the future without relying solely on the past. Developed specifically for the travel demand market, BCG’s zero-based demand (ZBD) forecasting model essentially sits atop Lighthouse by BCG, utilizing high-frequency data and a dynamically weighted unobserved component model (“DW-UCM”) engine to provide maximum flexibility to the project of understanding future demand. ZBD consists of three components: An initial module configured to predict if a travel segment is open or closed (e.g., due to government restrictions), a second module that predicts near-term recovery, and a component that forecasts long-term recovery using a probabilistic model.

These elements of the DW-UCM framework enable the forecasting engine to compute both near-term movements in demand and long-term scenario planning. Based on numerous real-world client engagements, ZBD has shown a superior ability to forecast airline demand amid market volatility, and when historical data has lost its predictive power. Looking ahead, companies will be able to leverage this multi-horizon, high-granularity, modular dynamic-modeling framework to respond more quickly when the next great shock comes.

ZBD Use Cases: The Airlines and Beyond

Demand forecasting is a fundamental building block for nearly every business organization. Whether used for capacity planning, supplier selection in supply chain optimization, or campaign design in marketing, companies rely on these predictions for critical decision-making. Getting those decisions right can make or break a company.

Never has demand forecasting been more critical than since the outbreak of COVID-19. But this need comes not just during pandemics. In recent years, markets have been rocked by everything from hurricanes, floods, and droughts to earthquakes and volcanic eruptions (such as the one in 2010 that grounded the European airline industry). So too, other non-natural shocks such as 9/11 or the Great Recession of 2007–08 have suddenly upended standard ways of predicting demand.

The ability of ZBD to accurately forecast demand amid market turmoil can help the airline industry continue its post-pandemic recovery. But the benefits of such forecasting can go well beyond the airlines themselves and extend to tangential sectors, which include (but are not limited to):

Hotel Revenue and Labor Management: Demand forecasting plays a key role in the ways that hotels manage both their revenue and their staffing. By leveraging outputs from the ZBD forecast model and airline-travel patterns, hotel owners can improve their existing forecasting algorithms to create a more accurate and time-sensitive foundation for overall revenue- and staff-management systems.
Network Planning: For industry sectors with strong competition, such as the airline industry, slight improvements in network planning can lead to significant cost reductions and/or revenue increases. Demand forecasting can help airlines optimize their fleets, increasing capacity on routes with greater demand while decreasing it on those with lesser demand.
Air Cargo: This sector has also been hit heavily by COVID-19. Forecasting demand can improve air-cargo capacity planning by estimating the belly space that will be available in commercial flights during the upcoming months.
Tourism Marketing: Government tourism departments can look to ZBD predictions to improve their revenue management.
Car Rental: Due to its high correlation with the airline industry market, the car rental industry can benefit from more accurate forecasting of passenger demand.
Oil Price Prediction: With the airline industry being one of the largest consumers of oil, changes in demand among airline and other travel industries can be used to forecast changes in oil prices.

The Perennial Benefits of Granularity and Understandability

The benefits of using ZBD come not only in periods of uncertainty. Regardless of market conditions, ZBD’s ability to predict demand at high granularity and with high frequency — and to do so quickly and without the need for a team of highly skilled (and highly paid) experts — can always help companies optimize resource allocation.

BCG believes that the combination of “Human plus AI” is the best way to bring the benefits of AI to the world. Rather than creating black box technologies that are impenetrable to business users, we use tools such as SHAP analysis and BCG’s FACET to help clients understand the drivers of model outputs. By incorporating ZBD into our proprietary visualization tool DAIS, we provide business users with a global view of how market demand is evolving. This transparency enables users to add their human insight and experience to further improve the value of the model outputs.

The Prime Ingredient: Forward-looking, High-Frequency Data

ZBD is an airline-specific forecasting model that sits atop the highly acclaimed Lighthouse by BCG, introduced in 2020. Traditionally, demand-forecasting models have relied on historical data to capture consistent, reoccurring patterns such as trend and seasonality. Since this approach to forecasting essentially became obsolete with the appearance of COVID-19, BCG turned to close-to-real-time (high-frequency) data sources. These sources include consumer-mobility trends, economic activity tracking, web traffic and search trends, and government and publicly available spending and unemployment figures. These data sources are at the core of Lighthouse by BCG, which also utilizes both historical and forward-looking data generated by its own internal models. These models make it possible to track everything from disease spread to trends in unemployment, consumer mobility, and consumer spending.

The advantage of using Lighthouse by BCG data is three-fold:

Many of these data sources are travel-demand drivers. Some, such as total number of cases and deaths obtained from the epidemiological forecasts, played a more important role at the height of the pandemic. Others, such as consumer mobility trends and the Consumer Activity Index, can have more impact during normal times. Lighthouse by BCG’s forward-looking orientation can capture market signals early and thus, provide more reliable forecasts — even with uncertainties.
Lighthouse by BCG allows us to use 2020 and 2021 data in training, instead of excluding them. This, in turn, enables ZBD to maintain its relevancy during both normal and abnormal periods to provide short-term, medium-term, and long-term forecasts.
The broad geographic data coverage on which Lighthouse by BCG is based helps the AI in ZBD learn from different countries worldwide. The resulting global “collaborative learning” environment helps the model understand the travel patterns of countries that are recovering faster than the others.

The Recipe: A Flexible Modular Modeling Approach

To handle rapid changes due to events such as secondary infection waves, local outbreaks, and COVID lockdowns, or to any other natural or economic influences that can affect the airline industry, ZBD adopts a dynamically weighted, unobserved component model (DW-UCM) structure that addresses three key questions:

When will domestic and international air travel be possible?
How will domestic and international air travel demand recover?
What will be the long-term, new-normal demand levels for domestic and international air travel?

For each of the questions, we built several ML models, each of which features a variety of engineering techniques.

The first component in the DW-UCM structure, the restriction model, tracks both virus-evolution forecasts and changing government restrictions. It uses this information to forecast when a specific country pair will open their borders to each other and resume mutual international flights. The virus-evolution forecast is one of the multiple datasets within Lighthouse by BCG. The government-restriction data is extracted from the Oxford Tracker dataset, which combines different containment and closure policies such as restrictions on internal movement or stay-at-home requirements. We built the restriction model using a binary classification structure, the output of which includes the probability that a country pair will be open on a given date. This framework also allows users to construct various scenarios using different probability thresholds.
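As a rough illustration of how probability thresholds can translate into scenarios, the sketch below assumes the restriction model has already produced an opening probability per forecast date for one country pair. The dates, probabilities, threshold values, and the function name `first_open_date` are hypothetical and are not taken from ZBD itself.

```python
from datetime import date
from typing import Optional

# Hypothetical model output: probability that a country pair is open on each date.
open_probability = {
    date(2022, 10, 1): 0.35,
    date(2022, 11, 1): 0.55,
    date(2022, 12, 1): 0.72,
    date(2023, 1, 1): 0.90,
}

def first_open_date(probabilities: dict, threshold: float) -> Optional[date]:
    """Return the earliest date whose opening probability meets the threshold."""
    for day in sorted(probabilities):
        if probabilities[day] >= threshold:
            return day
    return None  # the pair is never predicted open at this threshold

# Illustrative thresholds: a lower threshold yields an earlier, more optimistic opening.
scenarios = {"optimistic": 0.5, "base": 0.7, "pessimistic": 0.85}
opening_dates = {
    name: first_open_date(open_probability, threshold)
    for name, threshold in scenarios.items()
}
```

Each threshold yields one scenario, so a planner can compare an optimistic and a pessimistic reopening date for the same model output.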

To address the issue of demand recovery, we used an XGBoost regression model to create short-term demand forecasts, and a Random Forest model for medium-term forecasts. The short-term model leverages airline booking, searching data and various external Lighthouse by BCG data to capture recent changes in the market and provide strong predictive power amid volatile demand.

The medium-term model uses Lighthouse by BCG’s CAI forecast suite to capture potential downturns linked to pandemic progression and consumer activity. The CAI forecasts are generated from a nonstationary Markov chain model that conforms to index movements over time. In this way, the near-term demand model uses a UCM in which unobserved components such as seasonality are dynamically weighted to improve the overall accuracy of the forecasts.

To prepare the long-term forecasts, the model looks to long-term CAI forecasts. Studying the correlation between the CAI values and airline demand (together with more external signals from Lighthouse by BCG such as Point-of-Interest data), we can use ZBD to predict how the market will evolve directionally.

In keeping with BCG’s Human + AI orientation, the model also allows users to put in their estimates for optimistic, normal, and pessimistic states. As a result, ZBD can accommodate the human uncertainties of various business situations as it conducts scenario planning.

Using a flexible model structure such as this provides a few obvious advantages:

A dynamic demand-forecast model can incrementally rebalance each component’s weight upon the addition of new information, thus improving overall accuracy.
Travel demand patterns during uncertain times, such as a pandemic, can vary significantly across forecasting horizons, so a one-size-fits-all model is a poor fit. Using different features and algorithms for each horizon enables each model to perform at its best for its given period of time. By learning across multiple model components and times, the demand-forecasting engine can balance weights along different time granularities and capture changes in market behavior.
ZBD’s modular model structure makes it easier for end-users to track performance and tune models. This is particularly important during uncertain times, when conditions can change quickly and more detailed attention is required to assure constant, reliable forecasts. Since each module is relatively independent, users can track down problems more easily: Even if one model breaks, other models won’t be directly impacted.
When the world enters a period of post-pandemic normality, end-users can easily update the data-input features of selected models. Lighthouse by BCG data sources such as the epidemiological forecasts, for example, could be replaced by forward-looking booking data and CAI forecasts.
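The article does not spell out how the weights are rebalanced. One simple scheme, assumed here purely for illustration, is to weight each component by the inverse of its recent forecast error and renormalize whenever new actuals arrive. The component names, error values, and forecasts below are hypothetical.

```python
def rebalance_weights(recent_errors: dict) -> dict:
    """Weight each component by the inverse of its recent mean absolute error."""
    inverse = {name: 1.0 / max(error, 1e-9) for name, error in recent_errors.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}

def blend(forecasts: dict, weights: dict) -> float:
    """Combine the component forecasts using the current weights."""
    return sum(weights[name] * forecasts[name] for name in forecasts)

# Hypothetical recent errors (in passengers) for three components.
recent_errors = {"short_term": 10.0, "medium_term": 20.0, "long_term": 40.0}
weights = rebalance_weights(recent_errors)

# Hypothetical forecasts for the same target date.
forecasts = {"short_term": 1000.0, "medium_term": 1100.0, "long_term": 1300.0}
blended = blend(forecasts, weights)
```

Because the weights are recomputed from recent errors only, a component that starts to drift loses influence without the other components needing to be retrained.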

A Brief Caveat or Two

It should be noted that we have made some assumptions when constructing the ZBD model. Furthermore, we note that there is some degree of bias in the data, such as the data the model uses to predict border-restriction openings. We have observed that when the R0 value was declining, governments in general tended to open borders. But there are exceptions. In Brazil and the U.S., the airline market has been kept alive despite high R0. In countries such as Australia where the R0 has remained low, borders remained closed.

Initially, the opening of a border was a one-time event: once the border opened, it would remain open for the foreseeable future. But as new COVID variants continue to appear, borders that were once open may close again, only to reopen at some future time. Because of ZBD’s modular structure, these inconsistencies can be addressed without affecting short-term recoveries. For the medium-term forecasts, using a stochastic model enabled us to predict an increasing probability that new vaccines would be developed to treat emerging variants. This has proved to be the case.

For data scientists interested in using this zero-based demand-forecasting framework, we should note that, unlike traditional ML models, ZBD does not always perform better with more data. It can sometimes be the case that more data points — those that do not represent current status — may simply introduce extra noise. Some degree of customization in the training data may be required, together with giving more weight to recent observations.
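One common way to give more weight to recent observations is an exponential decay on the sample weights. The sketch below is an assumption about how this could be done, not a description of ZBD's training setup; the `half_life` parameter and the function name are hypothetical.

```python
def recency_weights(n_observations: int, half_life: float) -> list:
    """Exponentially decaying weights; the most recent observation gets weight 1."""
    decay = 0.5 ** (1.0 / half_life)
    return [decay ** (n_observations - 1 - i) for i in range(n_observations)]

# Five observations ordered oldest to newest, with weight halving every 2 periods.
weights = recency_weights(n_observations=5, half_life=2.0)
```

Many libraries accept per-sample weights at fit time (for example, the `sample_weight` argument of scikit-learn-style estimators), so a list like this can be passed straight to training.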

ZBD can provide more reliable forecasts than traditional ML models during uncertain times by capturing multiple waves and providing more accurate directional forecasts. But ZBD will not always provide 100% accuracy and will require careful monitoring and, when needed, equally careful adjustments.

An added benefit of BCG’s “Human + AI” orientation is that ZBD was developed to be “explainable AI.” As such, business users are more able to conduct monitoring and make adjustments without input from domain experts or data scientists.

The Path Forward

Travel demand is typically driven by business needs, vacation plans, and activities such as visiting family and friends. Forecasting the trends and seasonality of these demands could once be based on historical data. But, alas, unforeseen events (such as global pandemics) can create volatile markets in which it is fruitless to try to plan ahead by looking back. ZBD addresses this reality by combining high-frequency, forward-looking data with a flexible, modular model structure to create outputs that help companies act faster than their competitors the moment a new disruption unfolds. But even during periods of relative normalcy, ZBD’s dynamically weighted modular framework allows companies to consider these new data sources to gain a much more accurate picture of the market. A company’s ability to correctly anticipate demand is, perhaps more than any other factor, fundamental to its success.

Reference:

1. https://www.iata.org/en/pressroom/pr/2021-02-03-02/

Zero-Based Demand: An Innovative Model Framework for Forecasting Travel Demand during COVID… and… was originally published in GAMMA — Part of BCG X on Medium, where people are continuing the conversation by highlighting and responding to this story.

How to Quickly Optimize a Non-Linear Optimization Problem by Exploiting Local Properties

Herrmann Charles — Mon, 12 Sep 2022 14:34:27 GMT

Algorithmically Set Prices and Adjust Hundreds of Thousands of Individual SKUs to Meet Pricing Goals

In 2021, a large specialty retailer sought out BCG to help it improve its pricing methodology. With thousands of stores across the U.S. and hundreds of thousands of individual Stock Keeping Units (SKUs) — and operating in an increasingly volatile market — the company needed a more agile, scalable way to determine optimal pricing than its current, largely manual system. In response, and taking global constraints into account, we developed an algorithm to find the optimal price for each SKU. Pricing can have a dramatic impact on profit and customer perception, as is evidenced by the wealth of literature on pricing strategies.

During the first phase of our engagement with the client, our team researched the history of the company’s pricing. Using AI and machine learning, we isolated the impact on demand of pricing increases and decreases, and computed price elasticities based on such factors as seasonality and product type. With this information in hand, we then moved into a second phase, during which we built a pricing tool that would give the company a more data-driven way to set prices. As we will explain below, the pricing tool uses machine learning — and human intuition — to calculate how much the company should increase, decrease, or leave untouched individual SKU prices to reach any number of strategic pricing goals.

For the AI system we had in mind to work, we had to be aware of its potential pitfalls and design the system so that it took into consideration what humans, such as pricing analysts, might know. For example, a pricing analyst will manage the overall price perception at a category level because he or she is aware of the long-term effect of perception in retail. But with a few years of data, and since the company has carefully managed its perception, we were not hampered by the lack of specific historical data we would otherwise need to train the model on price perception. We also considered the fact that the pricing analyst knows the importance of avoiding price wars, and how price gouging can anger customers and potentially damage the brand.

Solving the Problem for all SKUs

Our approach to building the right solution, therefore, was to leverage price elasticities at a tactical level to find the best price for each SKU, while maximizing overall revenue or profit. We did this while still meeting a global strategic constraint represented by a pricing goal the client management team had set at a category level. We took into consideration that each SKU has a minimum price (often manufacturer imposed) and maximum price (also called insult price).

For the pricing tool to be the most useful to management, it had to be able to very quickly compute optimal SKU prices and their impact on overall profitability. In this way, management could use the tool to construct various scenarios: If the company decided to raise average prices by 5%, these would be the best SKU prices; if the goal is to raise average prices by 10%, these prices would be best.

Choosing the Best Approach

As we initially studied the client’s problem, we considered using publicly available solvers. The objective function, however, is nonlinear, which immediately excluded a mixed-integer programming solver. Off-the-shelf gradient descent is not suited to our constraints. Finally, public nonlinear constrained optimization solvers offer no guarantee of finding the optimal solution and, like black-box optimizers, have excessively long run times.

We also considered a simple heuristic, optimizing each SKU independently and applying a scaling factor to meet the strategic constraint. But the heuristic actually yielded a negative impact on profit (since the profit curve is flatter around the optimal point, going all the way to the optimal point leads to price increases that, compared to the impact on the average weighted price, do not add significantly in terms of profit). We also brainstormed other heuristics (such as increasing SKUs to their independent optimal prices in order of inelasticity), but they suffered from similar problems, and could require more iterations to meet the global constraint. When optimizing prices that have a decent baseline, precision matters.

Our goal was to find a solution that would be as accurate as possible and could still be computed fast enough to enable real-time tests of various pricing scenarios. The solution we have developed exploits a property of our optimization problem and quickly finds the optimal solution for all values of the strategic constraint parameter. The strategy can be summarized as transforming the problem so that greedy solutions (which are usually easy to compute) are optimal. In the final analysis, the AI/ML-based pricing tool met all the client’s criteria.

Before we go into more technical detail about our solution, let’s state the full problem more formally to set the technical context:

Objective: maximize ∑_i profit_i (i.e., the sum of the profit of all SKUs)

Decision variable: P_i (price for SKU i)

Constraints:
(1) ∑_i (W_i * P_i) / ∑_i (W_i) = Constant
(2) m_i <= P_i <= M_i, for all i (minimum and maximum prices)
(3) The quantity follows a negative exponential demand curve:
Q_i = Q0_i * exp(E_i * ((P_i / P0_i) - 1)), where the elasticity E_i is negative by convention.
(4) profit_i = Q_i * (P_i - C_i) (i.e., profit is simply quantity * (price - cost))

Parameters:
W_i: weight of SKU i
m_i: minimum price for SKU i
M_i: maximum price for SKU i
E_i: price elasticity of SKU i
Q0_i: baseline quantity of SKU i (the quantity sold at the base price)
P0_i: baseline (base) price of SKU i
C_i: cost of SKU i
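To make the formulation concrete, here is a minimal sketch of constraints (1), (3), and (4) as plain functions. The function names and the sample SKU values are hypothetical and chosen only for illustration.

```python
import math

def quantity(price, base_price, base_quantity, elasticity):
    """Constraint (3): negative exponential demand around the baseline."""
    return base_quantity * math.exp(elasticity * (price / base_price - 1.0))

def profit(price, base_price, base_quantity, elasticity, cost):
    """Constraint (4): quantity sold times unit margin."""
    return quantity(price, base_price, base_quantity, elasticity) * (price - cost)

def weighted_average_price(prices, weights):
    """Left-hand side of constraint (1)."""
    return sum(w * p for w, p in zip(weights, prices)) / sum(weights)

# Hypothetical SKU: $10 base price, 100 units sold, elasticity -1.5, $6 unit cost.
sku = dict(base_price=10.0, base_quantity=100.0, elasticity=-1.5, cost=6.0)
baseline_profit = profit(10.0, **sku)  # no price change, so quantity stays at 100
raised_profit = profit(11.0, **sku)    # 10% increase: fewer units, higher margin
```

At the base price the exponent is zero, so the quantity equals the baseline quantity and the profit is simply 100 units times the $4 margin.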

The Unlock

We can precompute the solutions for any feasible constant (in constraint 1) by leveraging the fact that the exponential function derivative is strictly monotonic (within some bounds), with the computational complexity of a simple sort.

As a reminder, any SKU has a profit curve that looks something like this:

Let’s set an example to build up the intuition, and then go through the full optimal heuristic:

Without loss of generality, in our example we optimize 3 SKUs, with elasticities -0.6, -0.7, and -1.1, base prices of $1.00, base quantities of 1, and costs of 0. We also assume that we will optimize in the range of price of [0.5; 1.8] due to business constraints on the prices.

For context, we can plot the three profit curves along price changes. (Notice that with a price of $1.00, we do not change any prices and, therefore, we have the baseline quantity of 1 for all SKUs):

To optimize, let’s start the prices of all SKUs at their minimums (i.e., 0.5). If the average weighted price needs to increase by epsilon, we also need to determine which SKU we would increase the price for. Clearly, it would be the SKU that can give us the highest profit increase — the one with the highest derivative at the current price point.
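The choice of which SKU to raise first can be checked numerically. Differentiating profit = Q0 * exp(E * (P / P0 - 1)) * (P - C) with respect to P gives Q0 * exp(E * (P / P0 - 1)) * (1 + (E / P0) * (P - C)). The sketch below evaluates this at the minimum price for the three example SKUs; the labels "b" and "c" for the second and third SKU are assumed, since the text only names SKU "a" explicitly.

```python
import math

def profit_derivative(price, base_price, base_quantity, elasticity, cost):
    """d(profit)/d(price) for profit = Q0 * exp(E * (P / P0 - 1)) * (P - C)."""
    demand = base_quantity * math.exp(elasticity * (price / base_price - 1.0))
    return demand * (1.0 + elasticity / base_price * (price - cost))

# Elasticities from the example; base price 1, base quantity 1, cost 0.
elasticities = {"a": -0.6, "b": -0.7, "c": -1.1}
min_price = 0.5
derivatives = {
    name: profit_derivative(min_price, 1.0, 1.0, elasticity, 0.0)
    for name, elasticity in elasticities.items()
}
first_to_increase = max(derivatives, key=derivatives.get)
```

At a price of 0.5 the derivatives are roughly 0.94, 0.92, and 0.78, so the least elastic SKU is the first to receive a price increase.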

Let’s plot the derivative of the profit curve:

We notice that at the minimum price, SKU “a” has the highest derivative, meaning it gives the highest increase in profit for an epsilon increase in price (which makes sense since this SKU is the least elastic). We also notice that after the price increase, the derivative of SKU “a” will be lower. If we repeat this process to reach any feasible average weighted price, the SKUs will take turns getting a price increase, and we will arrive at an optimal path of price increase:

At first, it is best to increase the price of SKU “a.” But this has the impact of decreasing the derivative along the profit curve, making it more optimal to start increasing SKU “b.” At this point, we begin alternately increasing different SKUs based on the derivative value. By allowing ourselves to make price changes for one SKU then another, we can build a combined numerical derivative plot that shows which SKU is getting a price increase in which order. This, in turn, leads to the optimal path of price increase:

Notice on the graph that the SKUs take turns getting price increases. This is what makes the problem difficult to solve. If there were simply three colored lines, one after the other, then we could solve this problem using a simpler algorithm — and still get an optimal solution.

For each point on this optimal path, we can compute a corresponding profit and average weighted price. And for a given average weighted price, we can find the optimal price path that gets us there.

Corresponding (head) table

The Algorithm:

A step-by-step code walkthrough is available here:

Based on this approach, we can therefore use the following algorithm, which is optimal within the level of the numerical precision desired:

We break each profit curve into mini-steps at the desired level of precision for the solution.
We compute the associated numerical derivative of each mini-step of each curve.
We sort the values of the derivatives from greatest to lowest.
Each mini-step along the sorted sequence has an associated weighted average price (the Constant defined earlier in constraint (1)), so we find the desired weighted average price, within the chosen precision, by looking for the closest match.
Once a match is found, we apply the optimal path of price increase that yields that weighted average price.
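The steps above can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the production tool: the dictionary layout, function names, and step count are hypothetical, and all SKUs share one price range for brevity. One detail is our own assumption: each mini-step's slope is divided by the SKU's weight so that the ranking reflects profit gained per unit of weighted average price, which reduces to the plain derivative ordering when all weights are equal.

```python
import math

def sku_profit(sku, price):
    """Profit under the negative exponential demand curve."""
    demand = sku["base_quantity"] * math.exp(
        sku["elasticity"] * (price / sku["base_price"] - 1.0))
    return demand * (price - sku["cost"])

def build_price_path(skus, min_price, max_price, n_steps):
    """Greedy path: list of (weighted_avg_price, total_profit, prices)."""
    step = (max_price - min_price) / n_steps
    total_weight = sum(s["weight"] for s in skus.values())
    mini_steps = []
    for name, s in skus.items():
        for k in range(n_steps):
            low = min_price + k * step
            gain = sku_profit(s, low + step) - sku_profit(s, low)
            # Numerical derivative, scaled by weight (assumption, see lead-in).
            mini_steps.append((gain / (s["weight"] * step), name, gain))
    # Sort from greatest to lowest derivative; within the explored range each
    # SKU's derivative decreases, so its own steps stay in price order.
    mini_steps.sort(key=lambda m: m[0], reverse=True)

    prices = {name: min_price for name in skus}
    total_profit = sum(sku_profit(s, min_price) for s in skus.values())
    average = min_price
    path = [(average, total_profit, dict(prices))]
    for _, name, gain in mini_steps:
        prices[name] += step
        total_profit += gain
        average += skus[name]["weight"] * step / total_weight
        path.append((average, total_profit, dict(prices)))
    return path

def closest_match(path, target_average_price):
    """Pick the path point whose weighted average price is nearest the target."""
    return min(path, key=lambda point: abs(point[0] - target_average_price))

# The three example SKUs (labels "b" and "c" assumed), with equal weights.
skus = {
    name: dict(base_price=1.0, base_quantity=1.0, elasticity=e, cost=0.0, weight=1.0)
    for name, e in {"a": -0.6, "b": -0.7, "c": -1.1}.items()
}
path = build_price_path(skus, min_price=0.5, max_price=1.8, n_steps=130)
average, best_profit, best_prices = closest_match(path, target_average_price=1.0)
```

Once the path is built, answering a new scenario (a different target average price) is only a lookup, which is what makes real-time scenario testing practical. In this example, holding the average price at 1.0 still improves on the baseline profit of 3.0 by raising the least elastic SKU and lowering the most elastic one.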

In other words, we have transformed a “hard” problem to solve in such a way that a greedy solution is optimal as well — and provides an optimal answer quickly. Our transformed problem is equivalent to the original one and yields the same optimal solution — but is much easier to compute.

We should note that in sorting the derivatives as in the above example, we have assumed that the derivative of each profit curve is always decreasing. Past a certain price, however, each SKU’s profit curve has an inflection point beyond which the derivative starts increasing again. For our purposes, we can simply choose not to explore price increases past that point.
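For the demand curve in constraint (3), the inflection point can be located in closed form. The second derivative of the profit is Q0 * (E / P0) * exp(E * (P / P0 - 1)) * (2 + (E / P0) * (P - C)), which changes sign at P = C - 2 * P0 / E. The sketch below, with a hypothetical function name, computes that price for the three example SKUs so that the explored range can be capped below it.

```python
def inflection_price(base_price, elasticity, cost):
    """Price at which the profit derivative stops decreasing (elasticity < 0)."""
    return cost - 2.0 * base_price / elasticity

# Example SKUs: base price 1, cost 0 (labels "b" and "c" assumed).
elasticities = {"a": -0.6, "b": -0.7, "c": -1.1}
inflection = {
    name: inflection_price(1.0, elasticity, 0.0)
    for name, elasticity in elasticities.items()
}
max_explored_price = 1.8
safe = all(price > max_explored_price for price in inflection.values())
```

For the example, the inflection prices are about 3.33, 2.86, and 1.82, all above the upper bound of 1.8, so the sort-based approach is valid over the whole explored range.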

Using ML Pricing in a Bionic Company

Clearly, there is a tremendous competitive advantage in management being able to instantly see how it can precisely adjust the pricing of thousands of individual products to reach a strategic pricing goal. The pricing tool’s ability to very quickly determine price changes needed to reach these various goals gives management the ability to fine tune its pricing strategy to meet larger business objectives.

As we note in our description of the bionic company, technology reaches its fullest potential when it is combined with the flexibility, adaptability, and comprehensive experience of humans. The pricing tool itself is built on mathematical rules and so, accordingly, the prices it calculates can be viewed as suggestions, not dictates. For the company to reach optimal performance, it must add human insight to the equation, as relevant product and pricing managers review pricing options and apply additional, non-mathematical considerations that might affect the final SKU pricing. These human insights themselves can be informed, for example, by applying an analytical thought process such as using various approaches based on observed price variances when setting minimum and maximum prices. Approaches to solving these kinds of problems can vary, but it is this overall fusing of artificial intelligence with human intelligence that enables companies to compete effectively in a complex, rapidly changing world.

How to Quickly Optimize a Non-Linear Optimization Problem by Exploiting Local Properties was originally published in GAMMA — Part of BCG X on Medium, where people are continuing the conversation by highlighting and responding to this story.

How good is your forecasting? Unpacking metrics to evaluate true business impact of your models

Emilio Lapiello — Thu, 16 Jun 2022 12:30:28 GMT

Assessing machine learning models with granular metrics can boost business impact.

by Emilio Lapiello, Mikel Arizaleta, Andra Fehmiu, Alessandro Scaglia, Wenting (Tina) Hou

There are multiple machine learning metrics data scientists use to assess the accuracy of forecasting models. No single metric is considered ideal. Most, including MAPE (Mean Absolute Percentage Error), WMAPE (weighted Mean Absolute Percentage Error), and R², share a common approach: using the notion of an “average” to provide feedback on model performance. MAPE, for example, describes the average absolute percentage error a model makes when comparing its predictions to actual values in a back-testing effort.

Aggregated “on-average” metrics such as MAPE help data scientists compare model performance against a benchmark. But these metrics often lack the detail needed to understand whether the model is providing predictions that are useful from a business perspective. Model predictions are much more valuable when they enhance business knowledge and help users understand how to solve the problem at hand.

The value of model predictions depends strictly on the nature of the sought-after outcome. Say you are a restaurant chain manager and want to understand hourly customer demand to optimize staffing at each of your restaurants. To have business value, your model must be accurate during demand peaks and troughs throughout the day, rather than on average. Knowing that your model’s MAPE is low, for instance, would be meaningless if you can’t determine whether the error is concentrated at high or low demand hours.

Similarly, say you sell canned soups in different flavors and want to forecast the number of boxes of each product your sales representative should be able to sell to any given store in a month. The challenge is that a typical store buys very few of some soup flavors (perhaps one or two boxes), but dozens of boxes of other ones. In this situation, an aggregated metric might be highly skewed by the less-popular products, even though these sales would not have a large impact on your business’s overall performance. If, for example, the model predicted that a store would buy two boxes of the product when, in fact, it bought only one, the metric would record a 100% error on this item.
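The skew is easy to reproduce. The sketch below uses the common definitions of MAPE (mean of per-item absolute percentage errors) and WMAPE (total absolute error divided by total actual volume); the two-item data set is hypothetical and mirrors the soup example.

```python
def mape(actuals, predictions):
    """Mean of |actual - predicted| / |actual| across items."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actuals, predictions)]
    return sum(errors) / len(errors)

def wmape(actuals, predictions):
    """Total absolute error divided by total actual volume."""
    total_error = sum(abs(a - p) for a, p in zip(actuals, predictions))
    return total_error / sum(abs(a) for a in actuals)

# One slow-moving flavor (1 box sold, 2 predicted) and one popular flavor
# (50 boxes sold, 45 predicted).
actuals = [1, 50]
predictions = [2, 45]
mape_value = mape(actuals, predictions)    # (100% + 10%) / 2
wmape_value = wmape(actuals, predictions)  # 6 boxes of error over 51 boxes sold
```

The single-box miss on the slow-moving flavor pulls MAPE up to 55%, while WMAPE stays near 12%. Neither number says where the errors sit, which is the gap the next section addresses.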

Clearly there is a business need to evaluate forecasting models beyond aggregated mathematical metrics and consider the real business impact forecasting errors make.

Introducing F-MAPE, a new way to evaluate the business impact of forecasting-model errors

To overcome the limitations of the MAPE approach, we have developed a new way to assess model forecasting accuracy — one that we believe helps solve some of the issues that surround aggregated metrics. We call our approach Factorized-MAPE (or F-MAPE) because it unpacks or “factorizes” the summation components of MAPE. In doing so, it provides feedback on model performance that is actionable from both a business and a data science perspective and presents a clearer view of any error that might be induced from using the forecasting model.

To demonstrate the F-MAPE approach, we will use the canned soup sales forecast example referred to above. Assume we have built a model that predicts monthly sales volume for each flavor at each store. Our goal, using a back-testing exercise, is to compare the model’s predicted sales volume to the actual sales volume. In doing so, we will focus on two numbers:

1. The error (with sign) of our sales volume prediction for each product-store combination (i.e. actual volume minus predicted volume) in a specific month

2. The actual volume sold for the product-store combination in the same month

For the sake of argument, we will stipulate that the error spans from -147 to 147, and that actual sales span from 3 to 740. We can bin both dimensions and use those bins to create a table. In each cell of this table, we represent the percentage of model predictions falling into the relative range of both error and actual volume sold.

Figure 1: Percentage of predictions by volume prediction error and actual volume sold

For example, the highlighted cell shows that 1.2% of all our predictions are for products that sold between 10 and 20 boxes and have a prediction error between 1 and 2 boxes.
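As a rough sketch of how such a table could be built, the snippet below bins a toy back-test in plain Python. The bin edges, the helper names (`bin_of`, `fmape_table`), and the sample records are all assumptions chosen for illustration; the bins are treated as half-open intervals.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical bin edges (assumptions for illustration).
# Each bin is half-open: [edges[i], edges[i + 1]).
ERROR_EDGES = [-147, -20, -10, -5, -2, -1, 0, 1, 2, 5, 10, 20, 148]
VOLUME_EDGES = [3, 5, 10, 20, 50, 100, 741]


def bin_of(value, edges):
    """Return the (low, high) bin containing value, or None if out of range."""
    i = bisect_right(edges, value) - 1
    if i < 0 or i >= len(edges) - 1:
        return None
    return edges[i], edges[i + 1]


def fmape_table(records):
    """Map (volume_bin, error_bin) -> share of all binned predictions."""
    counts = Counter()
    for actual, predicted in records:
        error = actual - predicted  # signed error: actual minus predicted
        cell = (bin_of(actual, VOLUME_EDGES), bin_of(error, ERROR_EDGES))
        if None not in cell:
            counts[cell] += 1
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}


# Toy back-test: (actual boxes, predicted boxes) per product-store combination.
backtest = [(12, 11), (15, 14), (4, 6), (60, 52), (200, 230), (18, 18)]
table = fmape_table(backtest)
for (volume_bin, error_bin), share in sorted(table.items()):
    print(f"volume {volume_bin}, error {error_bin}: {share:.1%}")
```

Here two of the six toy predictions land in the cell for 10 to 20 boxes sold with an error of 1 to 2 boxes, so that cell holds one third of the predictions.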

From this table we can also derive a maximum percentage error for each cell by dividing the largest absolute error in the cell’s error range by the smallest value in its actual-volume range. For example, the highlighted cell contains only predictions with an error of at most 20% (i.e., 2/10).

Figure 2: Maximum error prediction by cell
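The per-cell calculation can be sketched in a few lines. The function name `max_pct_error` and the tuple representation of bins are assumptions made for this illustration.

```python
def max_pct_error(volume_bin, error_bin):
    """Worst-case percentage error for a cell: the largest absolute error
    in the error bin divided by the smallest actual volume in the volume bin."""
    volume_low, _ = volume_bin
    error_low, error_high = error_bin
    return max(abs(error_low), abs(error_high)) / volume_low


# The highlighted cell: 10 to 20 boxes sold, error of 1 to 2 boxes.
print(max_pct_error((10, 20), (1, 2)))    # 0.2
# The mirror-image cell, where predicted volume exceeds actual volume.
print(max_pct_error((10, 20), (-2, -1)))  # 0.2
```

Taking the absolute value of both error bounds lets the same function serve cells on either side of zero error.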

How is our model performing?

If we set a model error tolerance threshold of, say, +/- 20%, then we can tell whether the prediction errors in a cell are tolerable or not: whether the cell is, in effect, a set of “good” or “bad” predictions. In the following table we have highlighted in green those cells whose prediction errors are good (i.e., tolerable):

Figure 3: Percentage of predictions with tolerable error (green) vs not (white)

Moreover, by summing all such “good” cells, we get a measure of model performance ‘accuracy’ strictly defined by business-set error tolerance, which, in this case, is 73.2%.
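A minimal sketch of this summation follows, assuming the table is stored as a mapping from (volume bin, error bin) to the share of predictions in that cell. The table values and function names are hypothetical.

```python
def is_tolerable(volume_bin, error_bin, tolerance=0.20):
    """A cell is "good" if even its worst-case percentage error is within tolerance."""
    worst_abs_error = max(abs(error_bin[0]), abs(error_bin[1]))
    return worst_abs_error / volume_bin[0] <= tolerance


def binned_accuracy(table, tolerance=0.20):
    """Sum the shares of all cells whose worst-case error is tolerable."""
    return sum(
        share
        for (volume_bin, error_bin), share in table.items()
        if is_tolerable(volume_bin, error_bin, tolerance)
    )


# Hypothetical table: (volume_bin, error_bin) -> share of predictions.
table = {
    ((10, 20), (1, 2)): 0.30,    # worst case 2/10 = 20%: tolerable
    ((10, 20), (-2, -1)): 0.25,  # worst case 2/10 = 20%: tolerable
    ((10, 20), (2, 5)): 0.15,    # worst case 5/10 = 50%: not tolerable
    ((50, 100), (5, 10)): 0.20,  # worst case 10/50 = 20%: tolerable
    ((3, 5), (1, 2)): 0.10,      # worst case 2/3 = 67%: not tolerable
}
print(f"{binned_accuracy(table):.1%}")  # 75.0%
```

Tightening `tolerance` to 0.10 in this toy table leaves no tolerable cells, which shows how sensitive the summary figure is to the business-set threshold.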

Notice that:

· Businesses can set different tolerance thresholds according to their needs
· Tolerance thresholds need not be symmetric for over- and under-prediction errors
· Bin sizes impact the model performance accuracy summary metric described above, because we are conservatively assuming all predictions in a specific cell have the maximum possible error
· Bin ranges can be chosen according to the required analysis resolution and detail needed
· It is also possible to calculate the accuracy summary metric directly from each prediction’s individual percentage error, by counting how many predictions fall within our tolerance thresholds
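The last point, computing the summary metric directly from individual predictions rather than from binned cells, might look like the sketch below. The function name and sample records are assumptions, and actual volumes are assumed to be greater than zero.

```python
def direct_accuracy(records, tolerance=0.20):
    """Share of predictions whose own percentage error is within tolerance."""
    within = sum(
        1
        for actual, predicted in records
        if abs(actual - predicted) / actual <= tolerance
    )
    return within / len(records)


# Toy back-test: (actual boxes, predicted boxes) per product-store combination.
backtest = [(12, 11), (15, 14), (4, 6), (60, 52), (200, 230), (18, 18)]
print(f"{direct_accuracy(backtest):.1%}")
```

Because this version does not assume the worst-case error for every prediction in a cell, it will never report a lower accuracy than the binned calculation on the same data.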

In the product-store volume example, we decided to exclude predictions for products that had only recently been introduced at specific stores, which we identified as those selling fewer than 3 boxes. We could also decide that our error tolerance is 20% for over-prediction but only 10% for under-prediction, so as not to encourage a loss in sales volume. With this business input, we can determine that our model predictions fall within the business-set margin of error 68.2% of the time.

Figure 4: Percentage of predictions with tolerable error, asymmetric tolerance thresholds
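The asymmetric thresholds and the minimum-volume filter can be sketched as follows. This version works on individual predictions rather than binned cells, and the function names, default values, and sample data are assumptions for illustration. Error is actual minus predicted, so a positive error is an under-prediction.

```python
def within_tolerance(actual, predicted, over_tol=0.20, under_tol=0.10):
    """Apply a different tolerance to over- and under-predictions."""
    error = actual - predicted  # positive error means we under-predicted
    pct_error = abs(error) / actual
    return pct_error <= (under_tol if error > 0 else over_tol)


def asymmetric_accuracy(records, min_volume=3, over_tol=0.20, under_tol=0.10):
    """Share of tolerable predictions among items selling at least min_volume."""
    kept = [(a, p) for a, p in records if a >= min_volume]
    good = sum(within_tolerance(a, p, over_tol, under_tol) for a, p in kept)
    return good / len(kept)


# Toy back-test: (actual boxes, predicted boxes). The last item sells fewer
# than 3 boxes and is therefore excluded from the calculation.
backtest = [(12, 11), (15, 14), (4, 6), (60, 52), (200, 230), (18, 18), (2, 5)]
print(f"{asymmetric_accuracy(backtest):.1%}")
```

In this toy data the prediction of 52 boxes against 60 actual (a 13.3% under-prediction) passes a symmetric 20% threshold but fails the stricter 10% under-prediction threshold, so the accuracy figure drops.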

The Considerable Benefits of F-MAPE

F-MAPE unpacks the “model accuracy black-box”, which leads to specific actions that both business and data science stakeholders can take. We identified several such benefits while designing and implementing the F-MAPE approach.

First, F-MAPE gives business stakeholders the ability to effectively compare the percentage of good-versus-bad predictions based on those factors that matter most to them. If, for example, the business is willing to accept overpredicting but not underpredicting, then F-MAPE tolerance thresholds can be set to account for that.

Stakeholders can also use F-MAPE to ascertain for which products and store locations the predictions are good or bad. It is possible, for instance, to use the F-MAPE matrix to extract the bad predictions and determine whether they tend to happen for specific products, stores, or any other relevant feature.
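One way to profile the bad predictions is simply to count them by feature. The sketch below does this with hypothetical products, stores, and volumes, and a symmetric 20% tolerance chosen as an assumption.

```python
from collections import Counter

# Hypothetical back-test rows: (product, store, actual boxes, predicted boxes).
rows = [
    ("tomato", "store_a", 40, 42),
    ("tomato", "store_b", 35, 33),
    ("oxtail", "store_a", 6, 10),
    ("oxtail", "store_b", 5, 9),
    ("minestrone", "store_a", 20, 27),
    ("minestrone", "store_b", 22, 21),
]
TOLERANCE = 0.20  # assumed symmetric tolerance

# Keep only the predictions whose percentage error exceeds the tolerance.
bad = [
    (product, store, actual, predicted)
    for product, store, actual, predicted in rows
    if abs(actual - predicted) / actual > TOLERANCE
]

bad_by_product = Counter(product for product, _, _, _ in bad)
bad_by_store = Counter(store for _, store, _, _ in bad)

print(bad_by_product.most_common())
print(bad_by_store.most_common())
```

In this toy data the misses concentrate on one low-volume flavor rather than on a particular store, which is the kind of pattern the matrix is meant to surface.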

Stakeholders can exclude from the model accuracy calculation predictions that, from their business perspective, are not important. If, for example, a model prediction errs on products that sell fewer than 5 boxes and low-sales products are not relevant, the business can simply exclude those predictions from its model accuracy assessment and get a better view of the actual value of model predictions.

From a data science perspective, F-MAPE makes it possible to identify potential model bias in predictions. For example, the metric enables data scientists to quickly observe and calculate if the right-hand side (with respect to 0 error) of the F-MAPE table includes more predictions than the left-hand side. Data scientists can further quantify this skewness by row and investigate potential trends in model bias.
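A simple way to quantify this skew by row is to compare the counts of positive and negative errors within each volume bin. The sketch below is one possible measure, not necessarily the one used in practice; the row boundaries, function name, and data are assumptions.

```python
from bisect import bisect_right
from collections import defaultdict

VOLUME_EDGES = [3, 10, 50, 741]  # hypothetical row boundaries, half-open bins


def skew_by_row(records, edges=VOLUME_EDGES):
    """Per volume row, return (positive - negative) / (positive + negative)
    error counts. +1 means every miss is an under-prediction, -1 means every
    miss is an over-prediction, and 0 means the misses are balanced."""
    tallies = defaultdict(lambda: [0, 0])  # row -> [positive, negative]
    for actual, predicted in records:
        i = bisect_right(edges, actual) - 1
        if not 0 <= i < len(edges) - 1:
            continue  # actual volume falls outside the table
        error = actual - predicted
        if error > 0:
            tallies[(edges[i], edges[i + 1])][0] += 1
        elif error < 0:
            tallies[(edges[i], edges[i + 1])][1] += 1
    return {row: (pos - neg) / (pos + neg) for row, (pos, neg) in tallies.items()}


# Toy back-test: (actual boxes, predicted boxes).
backtest = [(4, 6), (5, 7), (8, 9), (20, 18), (30, 33),
            (120, 100), (200, 170), (300, 310)]
skew = skew_by_row(backtest)
for row, value in sorted(skew.items()):
    print(f"volume {row}: skew {value:+.2f}")
```

In this made-up example the model over-predicts every low-volume item and tends to under-predict high-volume items, a trend that a single aggregated metric would hide.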

F-MAPE also enables data scientists to focus further research and model development on improving the model in those areas where it is not performing well, rather than focusing on improving its performance on average. If the data science team detects that the model is failing in specific areas of the matrix, they can then flag those areas and build a descriptive model to automatically profile instances in which the model fails, and focus improvements on those by, for instance, using more or better data for specific products or stores.

Finally, F-MAPE allows data scientists to isolate predictions for products or stores they know are not correctly predicted by the current model, and then build separate models for those. If the matrix shows significant errors for products with large sales volumes, then the data scientist can, for instance, create a separate model for just those products.

Getting to True Business Value

F-MAPE provides a novel and effective way to unpack aggregated forecasting metrics and improve forecasting model performance using business goals as its compass. We believe this approach of separating and binning dimensions in accuracy metrics such as MAPE can, in general, lead to a better understanding of the business benefits of forecasting predictive models and how to improve these models to effectively drive business value.

How good is your forecasting? Unpacking metrics to evaluate true business impact of your models was originally published in GAMMA — Part of BCG X on Medium, where people are continuing the conversation by highlighting and responding to this story.

Smart Integration: Four levels of AI maturity, and why it’s OK to be at Level 3

Silvio Palumbo — Mon, 25 Apr 2022 15:21:07 GMT

Don’t copy Google

Business executives around the world are increasingly being exhorted to “embrace AI.” But what exactly does that mean? Researching, improving, and customizing AI solutions represent distinct approaches to embracing AI, a domain commonly seen as the prerogative of highly technical organizations and academia. These institutions are often in a league of their own as they lead efforts to research and improve the AI landscape.

That’s Google’s playbook, for example.

But even organizations that are not in the business of innovating technical solutions can reap outsized returns through the careful application of data and advanced analytics, including AI. These companies have learned to act as “Smart Integrators,” building competitive advantage by orchestrating tools and AI applications that have been developed by other specialized organizations, and then directing the tools to fit their specific data, technology, and talent context.

For the vast majority of organizations, there is no point in pursuing Google’s playbook: it would not work. The sweet spot for many of them is not to win as AI developers, but to win as AI Integrators, or more precisely as Smart Integrators.

Who really “Does” AI?

Just as banks don’t feel the urge to reinvent Microsoft Excel to manage their financial modeling, the typical data team does not invent and build the new solutions, algorithms, and automation paradigms they need to stay competitive. That is the domain of highly specialized organizations, and the select data scientists, mathematicians, statisticians, physicists, biologists, and other subject matter experts they employ to push the envelope of AI and ML applications. Their role is to build newer, faster and more accurate solutions — some of which are surfaced to the open-source community, while others remain proprietary. If your company does not employ these experts on your data team, that’s perfectly alright. In fact, it’s probably the best strategy.

Broadly speaking, organizations fall into 4 levels of AI creation. Some span across more than one level but, in general, most organizations fall into one of these categories:

Level 1 — Innovators: Conducting Primary Research

Level 1 organizations develop new solutions, perform primary research, and push true innovation. These include the tech giants (Amazon, Google, Microsoft, et al.) and academia (such as MIT and UC Berkeley), along with other large entities that focus either directly (academia) or indirectly (military, tech) on research, or that build their competitive advantage through technology and advanced analytics.

Illustrative Level 1 organizations and their solutions

Level 2 — Scalers: Pushing Solutions at Scale

Level 2 organizations, such as Uber and Meta, develop on the shoulders of IP created by Level 1 organizations, often focusing on automation and/or operation at massive scale. While not the creators of AI, Scalers are often the first movers in a field or the first to apply an existing AI solution at an unprecedentedly large scale.

Illustrative Level 2 organizations and their solutions

Level 3 — Integrators

Level 3 organizations are, well, just about everyone else: those that integrate tools, solutions, technology, and approaches developed by Level 1 and Level 2 organizations, combining these elements to fit their specific need in the most economical way. Most companies that are not in the business of building or researching technology or analytics fit this profile, and they display a remarkable degree of variation in their individual success at Level 3 integration. The “Smart Integrators” among them are those that truly excel in the fine art of developing competitive advantage among their peers.

Level 4 — The Also-Rans

And then there are the Level 4 organizations that cling to backward-looking reports and dashboards that don’t fit the definition of advanced analytics, and that are reluctant to embrace at any meaningful scale the predictive and prescriptive power of data.

Building or Integrating Analytics

Just as banks don’t stop to reinvent the spreadsheet to address their specific needs, Level 3 organizations do not as a rule create new tools and algorithms. Rather, they adapt someone else’s AI solution and drive their business decisions around scaling existing data-driven applications and fully leveraging their data science teams.

By way of example, consider a retailer that seeks to reduce customer attrition. Customer-retention tactics, typically the realm of marketers and strategists, are most powerful when they can be precisely targeted and pre-emptive. That is the role of the Level 3 data science team: to leverage existing AI applications that use data to help marketers understand which customers are at risk of departing, and why. The most appropriate application for this task is, in my opinion, machine learning, used in the following way:

DATA WORK:

· First, aggregate past transaction and marketing-engagement data to create a consolidated customer view, often across different Lines of Business (LOB).

· Identify the subset of customers that have churned in the past to understand the leading indicators that inform a churn event.

ALGO WORK:

· Perform basic data transformations to feed into a “classifier.”

· Run the classifier, and then optimize model performance toward the highest achievable accuracy.

EXECUTION WORK:

· Drive retention campaigns based on model output.

· Measure results across different KPIs and propose corrections.

· Create a feedback loop to let the algorithm learn over time.

Typical end-to-end modeling pipeline
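As a small illustration of the “data work” stage only, the sketch below consolidates transactions from two lines of business into one row per customer and attaches a churn label. Everything in it is an assumption made for illustration: the field names, the sample transactions, and in particular the definition of churn as 90 days without a purchase. In a real pipeline the features would be computed from a window that ends before the period used to observe churn, and the resulting rows would then be fed to an off-the-shelf classifier.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions across two lines of business:
# (customer_id, line_of_business, purchase date, amount).
transactions = [
    ("c1", "grocery", date(2022, 1, 5), 40.0),
    ("c1", "pharmacy", date(2022, 3, 20), 15.0),
    ("c2", "grocery", date(2021, 11, 2), 60.0),
    ("c3", "pharmacy", date(2022, 3, 1), 22.0),
    ("c3", "grocery", date(2022, 3, 28), 35.0),
]
AS_OF = date(2022, 4, 1)
CHURN_AFTER_DAYS = 90  # assumed definition of churn: no purchase in 90 days


def customer_view(rows, as_of=AS_OF):
    """Consolidate transactions into one summary row per customer."""
    grouped = defaultdict(list)
    for customer_id, lob, day, amount in rows:
        grouped[customer_id].append((lob, day, amount))
    view = {}
    for customer_id, items in grouped.items():
        recency = (as_of - max(day for _, day, _ in items)).days
        view[customer_id] = {
            "n_purchases": len(items),
            "total_spend": sum(amount for _, _, amount in items),
            "n_lobs": len({lob for lob, _, _ in items}),
            "recency_days": recency,
            "churned": recency > CHURN_AFTER_DAYS,  # label derived from recency
        }
    return view


view = customer_view(transactions)
for customer_id, summary in sorted(view.items()):
    print(customer_id, summary)
```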

To perform these operations, the data team would have to write a certain amount of code. Most of the code would cover data transformation (having data in one place, cleaning fields, de-normalizing tables, etc.) and automation. But the team would not “build” new algorithms for this effort. Instead, it would leverage a classifier (such as XGBoost, a machine learning ensemble model) and other open-source tools (such as Airflow), along with libraries for accuracy reports, automation, optimization, tuning, and other operations. The team would need a skilled data scientist to deliver the best machine learning solution, in the form of a churn-prediction model. But to do so, the data scientist would not create a new algo. Thanks to open-source contributions such as the XGBoost library, Airbnb’s Airflow, and Meta’s Prophet forecasting library, there are countless solutions available to power Level 3 data science applications.

Smart Integration: Winning at Level 3 of AI adoption

Winning at Level 3 is not better or worse than winning at Level 1: It is simply a different path to creating business value, which must be tackled using different strategic approaches.

Level 3 organizations have access to a wealth of solutions developed by Innovators and Scalers that can help them unleash value from data. Developing a compelling recommender engine for an e-commerce retailer, a churn predictor for a telco company, or a next-best-action platform for a financial institution all share the same mechanics.

In these scenarios, it is critical to understand that the highly vaunted “algorithm” plays a critical, but not overwhelming, role in an analytics solution. There are obvious nuances depending on the application. For example, there will be greater complexity when heavily relying on computer vision for automation, and lower complexity when relying on predictive models for marketing outreach. Nevertheless, the conclusion holds: Success does not depend solely on the quality of the algo.

Illustrative capabilities supporting modeling pipelines

The fact that powerful algos (again such as XGBoost) are readily available means that the source of value must rest elsewhere. The data science team’s responsibility is to select, tailor, and optimize the algo, then embed it into a broader workflow that leverages the tools and solutions that have been developed by Level 1 and Level 2 organizations (compute management, automation, parallelization, etc.) and are widely available via the open-source community. This activity represents a complex orchestration in its own right.

How do you create competitive advantage when every Level 3 organization and its competitors have unfettered access to the exact same tools, algos, and solutions? The winners are those whose end-to-end orchestration represents the best cohesive blend of solutions for that specific context, data, and business application. It all boils down to how well the team integrates existing AI solutions.

Creating Differentiated Competitive Advantage

Smart Integrators need to focus on three core battlefields: cross-disciplinary talent, data, and technology design.

1. Develop cross-disciplinary talent

Level 3 organizations do not perform primary research. Instead, they tune and adapt existing algos and plug them into very specific workflows. This poses a different challenge than creating a new algo, one that requires business acumen as well as analytical prowess. One way to meet this challenge is to build a team of unicorn-skilled, business-savvy, polyhedric data scientists who:

· Understand that adding such data as Yelp reviews, Google traffic, or weather data will increase the accuracy of a prediction, and add those feeds to the first-party data

· Can analyze past patterns to identify leading or lagging correlations (such as, for example, that rent prices take six months to reflect interest rate changes), and can account for that in the data transformations

· Have the intuition that change in trend is a better predictor than the simple sales figures, and can create or overweight that variable

· Can, by filtering the upstream data for low-price transactions, intentionally skew the model result, such as toward “low-price recommendations” as part of a marketing campaign
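The second skill in the list above, spotting a lagged relationship and accounting for it in the data transformations, can be sketched with a simple lag scan. The two series below are made-up index values constructed so that the second follows the first by six periods; the names `rates` and `rents` and both helper functions are hypothetical.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def best_lag(driver, target, max_lag):
    """Return the lag at which driver correlates most strongly with later
    values of target, together with the correlation at every lag tried."""
    scores = {
        lag: pearson(driver[: len(driver) - lag], target[lag:])
        for lag in range(max_lag + 1)
    }
    return max(scores, key=lambda lag: abs(scores[lag])), scores


# Made-up monthly index values; rents are built to follow rates by 6 months.
rates = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3, 2, 3, 8, 4, 6, 2, 6, 4]
rents = [150] * 6 + [100 + 10 * r for r in rates[:-6]]

lag, scores = best_lag(rates, rents, max_lag=9)
print(f"strongest relationship at a lag of {lag} months")

# Account for the lag in the data transformation: pair each rent observation
# with the rate observed `lag` months earlier.
lagged_pairs = list(zip(rates[: len(rates) - lag], rents[lag:]))
```

Correlation at a lag is only a screening device; deciding whether the relationship is meaningful still calls for the business judgment this section describes.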

When those data science unicorns are hard to find and retain, Smart Integrators can achieve this same outcome at an organizational level by creating cross-disciplinary teams. The collective brain trust would typically comprise marketers, sales associates, and pricing and logistics experts, all of whom share their ideas and intuitions with perhaps “less-polyhedric” data scientists who can direct the team’s collective thinking toward the right algo selection and technology stack. Given the paucity and the expense of the most highly skilled, most business-savvy data scientists, this collaborative solution makes sense for most Level 3 organizations.

For this solution to succeed and scale, however, the data scientist(s) on your team must:

· Have streamlined access to data

· Have an operational analytical environment suited for data science (a data warehouse is not an analytical environment for advanced applications)

· Leave scaling and productization problems to the team dedicated to supporting these issues, and…

· Have access to business experts who can help them understand and embed the organization’s context into their work

Data Engineers versus Data Scientists

Analytics excellence for Level 3 organizations is about understanding the talent map required for an integration strategy, not for an innovation or scaling strategy. Successful integrators build competitive advantage by orchestrating and customizing different components and libraries developed by others. This priority suggests a team composition that is skewed more towards data engineering than data science.

The specific team composition and team size vary from organization to organization, but there is a clear trend tied to maturity. New teams are built around data scientists and focus on creating Proofs of Concept (POCs) such as recommender engines, propensity models, churn predictors, markdown algorithms, and basic forecasting models. As organizations pivot from POCs to hardened solutions (algo + in-market initiatives that exploit it, at scale), the algo side remains unchanged as the work pivots towards scaling, automation, acceleration of cycles, reduction of compute costs, reduction in latency and connection issues, bug resolution, and integration of all data.

Talent map evolution

In this context, AI tuning is a data science effort, while AI development at scale is a data engineering effort. Value accrues with scale.

2. Data

Data Volume & Signal Presence

If your organization does not build new algorithms — and with all machine learning algorithms having essentially been democratized — data volume and signal presence become undisputed differentiators and sources of competitive advantage. This is true for Innovators and Scalers alike, as demonstrated by deliberate strategies to accumulate information, extract signals, and create closed eco-systems.

Tesla, for example, is pushing the envelope on computer vision and autonomous driving, but it is also hoarding unprecedented volumes of camera feeds (from beta programs and internal tests) to analyze as many traffic situations as is technically possible. A competitor with the same algo would not achieve the same results without a comparable volume of data.

Other strategies involve researching novel ways to collect data. Netflix has researched approaches to automatically extracting (rather than manually tagging) the minutiae of videos that could help predict the next big success. In the near future, this kind of data-driven strategy may funnel investments to those productions with the highest number of such elements and, therefore, the highest probability of success.

After “hoarding” and “extracting” data, the next best approach is “merging” or finding signals from a broader ecosystem. Amazon leverages a wealth of (anonymized) purchase behavior from its ubiquitous e-comm eco-system to enrich its DMP strategy of targeting and allocating bids for media spend. Meta goes a step further by blending Instagram behavioral data with their social platform data.

For Level 3 Integrators, creating new analytical paradigms is not an option, but data strategy offers a compelling arena to create lasting competitive advantage. Access to first-party data is almost a level playing field, but Integrators can differentiate themselves through experimentation that enables them to enrich that data in unique ways.

Experimentation

Level 3 organizations don’t invent new AI solutions. Instead, they rely on what’s available from democratized research. And most AI applications available today, given sufficient historical information, can inform future actions only by looking at the past. These analytical approaches are usually based on correlation instead of causation, and are just as “enlightened” as the history they analyze. This means that even a trove of first-party data fails to inform new scenarios, given there’s no history. For example:

· An organization has always priced with seasonal discounts between 10% and 25% and wants to know whether a discount of 5% or 35% would work better

· A marketer wants to introduce a buy-one-get-one offer for the first time, but can’t tell upfront if it will perform better than a simple discount

· The pricing team wants to test localized pricing strategies and wants to pick the most likely region to perform well in the test

· The operations team wants to accurately forecast demand during an unprecedented pandemic

First-party data informs tactical decisions, while experimental data informs strategic decisions such as how to react, adjust, or course correct. Most Level 3 organizations run, at best, simple A/B tests, look at historical price elasticities, or conduct scenario analyses based on demand forecasts. But to succeed in an AI-permeated world, these organizations must embrace a differentiated strategy based on ongoing and scientific experimentation, coupled with advanced causal-inference paradigms.

Experimentation requires organizational alignment and orchestration and is rarely the responsibility of data scientists who have been hired to build algos, not to solve business problems. Organizations fail to act on this imperative when they misunderstand attendant organizational complexity and view experimentation as a “cost.” Put simply, performance measurement through experimentation such as localized lift, attribution, sequence optimization, and competitive response is one of the most overlooked challenges of data-driven strategies. It is, in fact, a fundamental revenue-generating element of a successful winner-take-all scenario.

Starbucks understands this and optimizes its AI-powered marketing through ceaseless experimentation. Its approach includes a dedicated team of data scientists, engineers, and marketers who share a platform with a highly optimized and automated experimentation framework, along with a marketing budget specifically focused on updating the knowledge around Starbucks customers’ preferences and reactions. This arrangement represents first-party data on steroids and can create a true first-mover advantage… when done right. Personally, if I had to choose, I’d rather have Starbucks’ experimental data than a new algo.

3. Technology and Platforms

AI applications require technology and platforms to collect data from POS, sensors, and fleets; to integrate with delivery channels; and to perform data transformations, run models, and measure results. Unlike a decade ago, organizations now have unhindered access to all the compute and technology they need to perform seamlessly at scale. Data size is no longer an issue — even for Fortune 100 companies capable of generating incalculable volumes of data. Similarly, all Level 3 companies have access to the same democratized and consumption-based-priced choices of stack components, both in consumer analytics and operations. This includes cloud providers, forecasting and planning platforms, customer data platforms, workflow managers, automation engines, and other components. The question is how well each organization can engineer, harmonize, and manage the same exact data “stacks” and stack components in a way that enables them to outperform their competitors — and do so in a way that cannot be replicated.

One way to answer this question is by looking at the many niche platforms available in the market. The available “proprietary” customer data platforms, AI recommender engines, and forecasting solutions are typically built around the same open stack available to every other Integrator. What makes them “proprietary” is a combination of topic depth (achieved by the data owner having worked on and accumulated data about a specific problem for years) and customized integration (as a result of the data owner having scaled and harmonized the components and algos in a business context). This combination of depth and specialization creates a compelling value proposition that, for most Level 3 companies, is worth paying for. Smart Integrators don’t need to reinvent the wheel. Instead, they can follow the same tactics to abstract and reinforce what might qualify their platform as “proprietary”: their own experimental data, their own business context, and the knowledge pool in their talent base. Translated into a technology stack conversation, the strategic decision is not whether to integrate, but how to do so in a way that builds and retains competitive advantage.

Building the Integration “Glue”

But what exactly does it mean to “build”? In this case, the tactical answer is to build the “glue” that makes the integration scale. The organizational answer is to build what allows you to retain talent. As discussed earlier, AI solutions rely on signal richness (historical and experimental data) and customization to the specific business context (feature engineering, model tuning, rule layer). If you are indeed building advanced analytical solutions, the investment should start with foundational areas — those areas where “owning the intelligence” will lead to better results than plugging in an off-the-shelf solution.

Consider starting with the basics:

· Stand up a true analytical environment — not a data warehouse, but a distributed computing environment flexible enough to accommodate the (high) peaks and valleys of model training.

· Develop robust data plumbing that includes POS integration, data exchange platforms, real-time architecture, pervasive tagging for digital assets, and properly sized querying capabilities to filter such tagging data. Getting data is of paramount importance.

· Resource and/or provision talent around API and other integration layers. Your vendors are all API-ready: You are better off engineering your own side of the handshake.

· Invest in rigorous automation capabilities. The tools are out there, such as Airbnb’s Airflow or the open-source Jenkins: tools that Level 1 and 2 companies and the open-source community have generously developed for Level 3 companies. Once Smart Integrators acquire these tools, their first priority should be to set them up in a modular, scalable fashion.

· Templatize and modularize what you can. Aim for repeatability and ease of approval, such as by building modular templates for personalized campaigns. These templates are easier to QA and pass legal review. Break down forecasting processes into sequential steps for ease of intervention and explainability.

(Note that the foundations for AI we point to have no AI component. Remember, you are in the Smart Integration business, not the business of developing technical solutions.)

When it is time to pick your advanced analytics battles, choose wisely. Let’s say that you are building a consumer analytics capability for such tasks as personalized marketing, hyper-targeting your media spend, or setting prices. You might decide to own the experimentation layer. If you do not own the platform, you should definitely own the data and the strategy behind the way you put your experimentation dollars to work. You might also choose to own a part of the intelligence, which typically consists of a subset of recommenders, such as your churn-prevention logic. You most certainly would want to own the business rule and orchestration layer, which are your brand guardrails, and the logic that optimizes all your marketing outreach. (Note that these are illustrative, not definitive, examples.)

Specificity aside, owning sometimes might mean building and integrating one or more of those illustrative elements. More often, it means hyper-customizing your vendor solution by using your own add-ons. A broader, well-integrated system will outperform a scattered stack (even one with peaks of excellence such as, for example, the best neural network available) that contains glaring holes (such as the lack of a scalable experimentation platform). But you might find another, more compelling reason for owning a piece of the stack.

Focus on Projects that help Retain Talent

Integration requires analytics and engineering talent, but pure integration in and of itself might not sound as interesting or appealing to that same talent. Data scientists, for example, enjoy “leveraging algorithms.” Machine Learning engineers like to grapple with “scaling algorithms,” while Full Stack Developers find professional satisfaction in “building products” — and they don’t like to mix and match. (As a rule, for example, data scientists don’t particularly enjoy data engineering.) This creates one of the conundrums of being an Integrator: You need the skills of talent who aspire to work on much more than integration.

The solution is to find projects that are interesting enough to entice analytics practitioners to stick around. Smart Integrators have learned to prioritize and resource a subset of applications and in-house development specifically to address the retention imperative. This is not just a matter of ginning up a skunkworks: It is a deliberate investment around the core data asset that can both create a competitive advantage and retain talent. These projects might include the development of an incremental product recommender, a customized CRM integration, a localized forecasting engine, or a complex pricing tool. All these will be appealing to data scientists.

Projects that appeal to engineers and developers are those that “productize” solutions. These might include packaging critical processes into a software layer, building a tablet-friendly UI for the sales force, or developing an advanced workflow manager, hyper-dynamic templates and wireframes for web navigation, or an innovative check-out process. More advanced Level 3 organizations have also invested in modular Analytical Workbenches (combinations of internal tech and vendors such as DataRobot) to streamline data engineering and empower external vendors to plug in the company’s solutions more effectively. Or they invest in data scientists who will focus on algo optimization.

It would be reductive to frame such efforts as retention gimmicks. These initiatives are consistent with a Level 3 paradigm: working within the very specific context, data, and industry to focus on refining, scaling and adapting algos and solutions developed by Level 1 or 2 organizations. The important takeaway is that internal initiative priorities should also include a talent overlay. When you focus on projects your analytical talent is already engaged with and up to speed on (in terms of the latest technological advancements in that area), they will be more inclined to stay — and might even deliver some innovative new output in the bargain.

Conclusions

Data scientists, software developers, and data engineers are often removed from strategic decision-makers within large organizations. Their deep knowledge of what actually drives AI execution does not percolate through the ranks of management, thus creating cognitive dissonance between the hype of “developing AI” and the reality of “extracting the value of AI through smart integration.” There is no lack of a “cool factor” in Smart Integration. But since it is rarely understood as the catalyst of AI adoption, it is not strategically supported when decisions are being made about staffing, architecture design, data strategy, and operating model.

There are two positive angles to consider when thinking about embracing AI. First is the ability of AI to unlock value, such as the way Level 3 organizations that adopt smart integration continue to realize outsize returns. In this angle, AI is a means to an end, as integration can indeed become proprietary and differentiated. The second angle is that embracing smart integration is not limited to a binary outcome: It can be embraced in waves of adoption.

Regardless of which angle seems more appropriate, the first step to embracing AI is really a matter of “education” — of bringing data practitioners closer to the decision-making process and fostering a genuine cross-pollination between analytics know-how and business objectives. It is the factoring in of this added intelligence that puts the “smart” in Smart Integration.

Smart Integration: Four levels of AI maturity, and why it’s OK to be at Level 3 was originally published in GAMMA — Part of BCG X on Medium, where people are continuing the conversation by highlighting and responding to this story.

Software Developers Align to Reduce Escalating Computing Emission

BCG GAMMA editor — Tue, 29 Mar 2022 12:33:30 GMT

Written by Matt Kropp and Niels Freier

For those not intimately involved in software development, it’s hard to grasp just how much greenhouse gas is emitted when software is built, AI models are trained, and data centers are cooled — or to grasp that these emissions are increasing at an astronomical, climate-wrecking rate. As part of our commitment to dramatically reduce the software development industry’s environmental impact, BCG GAMMA has joined the Green Software Foundation. The foundation’s goal is to build a trusted ecosystem of people, standards, tooling, and best practices to make our industry more sustainable.

“We have the opportunity to make the software industry net zero: This is not a hard-to-abate sector,” says Matt Kropp, BCG Managing Director and Partner. “But it will take measurement, awareness, and empowerment for technologists to get there. As one of the areas of greatest innovation, technology will lead the way to addressing climate change.”

Emissions are Often Out of Sight, but Should Never Be Out of Mind

BCG estimates that data centers now consume 1–2% (and rising) of all the energy generated each year around the world. This is due, in part, to the rapid increase in compute-intensive tasks like AI model training. For example, the number of floating-point operations per second (FLOPS) has increased by a factor of 150, from 100 GigaFLOPS in 2004 to 15 TeraFLOPS in 2020. And according to a 2019 University of Massachusetts study, an inefficiently trained NLP model using Neural Architecture Search can emit more than 626,000 pounds of CO2 equivalent, about five times the lifetime emissions of the average American car. Clearly, the emissions produced by software development, while invisible to most end users, have a significant impact on our climate. Data engineers and data scientists must work together to reduce these emissions.

We have a strong commitment to protect our planet through our work with clients and partners, and by improving our own operations. In fact, we have committed our organization to reach Net Zero by 2030. Our CO2 AI solution is already helping our clients both measure and reduce emissions generated by a broad range of activities, including computing. Our CodeCarbon library enables our clients to track the amount of CO2 produced by the cloud or local computing resources used to train machine learning algorithms. Our clients can then use CodeCarbon to identify and implement more efficient software and ML training code.
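The estimate behind this kind of tracking comes down to energy consumed multiplied by the carbon intensity of the electricity used. The sketch below illustrates only that arithmetic. It is not CodeCarbon's API, and the function name, power draw, duration, overhead factor (PUE), and grid intensity are all hypothetical values chosen for illustration.

```python
def estimate_emissions_kg(avg_power_watts, hours, grid_intensity_kg_per_kwh, pue=1.0):
    """Estimate the CO2-equivalent emissions (kg) of a compute job.

    energy (kWh) = average power (kW) * duration (h) * data-center overhead (PUE)
    emissions    = energy (kWh) * grid carbon intensity (kg CO2e per kWh)
    """
    energy_kwh = (avg_power_watts / 1000.0) * hours * pue
    return energy_kwh * grid_intensity_kg_per_kwh


# Hypothetical training job: 8 GPUs drawing 300 W each for 24 hours,
# a PUE of 1.5, and a grid intensity of 0.4 kg CO2e per kWh (assumed values).
job_kg = estimate_emissions_kg(
    avg_power_watts=8 * 300,
    hours=24,
    grid_intensity_kg_per_kwh=0.4,
    pue=1.5,
)
print(f"Estimated emissions: {job_kg:.1f} kg CO2e")
```

The same job run on a lower-carbon grid or for fewer hours scales the result linearly, which is why both more efficient code and cleaner energy reduce the total.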

Sustainability Requires a Change in Mindset

But the climate problem needs more than a technology. As BCG GAMMA Associate Director of AI Software Engineering Niels Freier notes, “[w]e need standards, tools, and best practices to change our mindsets as software engineers. We can start by asking ourselves questions like whether we need to deploy a large cluster to run simple mathematical operations, when maybe we could accomplish the same thing by optimizing a little bit of code.”

GAMMA comprises more than 500 software developers who work with clients to develop repeatable products and components. Because we work with so many Fortune 500 companies, we are in a strong position to immediately help the Green Software Foundation shape, create, evolve, and promote adoption of emission-reducing standards and tools, and develop software-development best practices.

The Solution is Both Industry-Wide and Personal

One of the major obstacles we as an industry face is lack of awareness of both the true environmental impact of our work — and of ways to mitigate this impact. We as technologists must work together to make it easier for companies to accurately measure the climate impact of the code they write, and then quickly reduce that impact. Data centers can purchase more renewable energy. Coders can write more efficient code. Tech vendors can develop more power-efficient hardware. We must clearly and quantitatively understand the full extent of our industry’s climate impact. Our planet cannot afford for us to invent the next venture, the next algorithm, the next AI use case without making the reduction of environmental harm a central tenet of the work we do.

We at BCG envision a future in which sustainability is a cornerstone of software development. By joining the Green Software Foundation, we offer our expertise to help make sustainable standards and best practices commonplace, easily accessible, and the driving force behind innovation and adoption. As part of our commitment to driving change, BCG will host two events this coming spring at our Boston and Paris offices to increase awareness of and membership in the GSF. We must work together to convince more organizations around the world that solving the climate crisis is no longer a matter of discussion. We know what needs to be done. Now we must act.

Software Developers Align to Reduce Escalating Computing Emission was originally published in GAMMA — Part of BCG X on Medium, where people are continuing the conversation by highlighting and responding to this story.

Demand Forecasting Evaluation

Slava Bazaliy — Fri, 25 Mar 2022 15:33:09 GMT

Demand Forecasting Evaluation: A Single Metric for Optimal Planning

Demand forecasting always comes with uncertainty. When someone reports that a forecast is 90% accurate, what does it mean? What if another metric says that it is only 60% accurate? Do any of these accuracy numbers translate to actual business value and, if so, how? The answers to these questions exist, but they are not simple!

By Viacheslav Bazaliy, Slobodan Milovanovic, Antti Niskanen, Daniel Sack & Jan Beitner

For any business, demand forecasting is a crucial component of an end-to-end (E2E) planning process. It enables optimal decision making amid times of uncertainty, promotes efficient supply chain management, and acts as a real-time indicator of relevant market trends. Whether used for planning sales of mature products in well-known channels or of entirely new products in a pioneering market, demand forecasting adds significant depth to the decision-making process. However, as with all things that concern future events, forecasting brings uncertainty to the planning process. Planning can be optimally performed only if these uncertainties are correctly quantified — shifting the challenge towards obtaining the best prediction accuracy.

Over the past three years alone, BCG GAMMA has completed more than 40 large-scale transformations built on a foundation of improved forecasting. Cumulatively, these transformations have generated more than $10B uplift in revenue. GAMMA’s approach builds value and competitive advantage at the intersection of data science, technology, people, business expertise, processes, and ways of working. We have observed that, with recent trends toward strong digitalization and data democratization, demand forecasting accuracy has improved tremendously — and has done so across numerous industries. This is a welcome development given that the speed of market entry for products accelerates year over year. This improvement in accuracy is largely the result of advancing machine learning (ML) methods. Cutting-edge ML models can now incorporate thousands of factors, learn patterns from past data, and provide a market overview that enables improved business decision making.

When employing ML methods, data scientists tend to report their achievements and progress using out-of-sample metrics such as mean-squared error (MSE). Those with technical backgrounds can usually understand the meaning of metrics like MSE because these concepts are studied in universities and have clear probabilistic interpretations. But it can be nearly impossible for many business leaders, including those with STEM degrees, to understand model performance without a clear context of the problem scale, understandable performance benchmarks, and, most importantly, a direct connection to business processes. Company executives usually have solid domain expertise and a great understanding of the business context. However, for many business leaders, assessing the technical context and making informed decisions quickly remain daunting challenges.

So, is there a single evaluation metric to optimally plan a business?

Forecasting from the Executive Perspective

Imagine that we are running a small kiosk that sells coffee, pastries, and a variety of confectionery goods. We have a friend who is a data scientist, and who has graciously agreed to help us with our inventory buying decisions. She has trained three models on our past data and has now shown us the backtesting result from the past week for one of the products. From now on, we will use nicknames No-sales, Average, and ML for corresponding forecasts displayed on the graph below.

Actual sales and backtesting predictions from three pre-trained models

With this result in hand, we can now choose which of these three forecasting models to use. But which model should we, the executives of this small business, choose? The answer depends heavily on the product nature and business constraints of our coffee kiosk. We will go deeper into the business context of our example in the following sections. But first, we will draw on our extensive BCG GAMMA experience to help our friend the data scientist establish the technical framework for the demand forecasting assessment.

A 3-step Approach to Choosing a Forecast Model

Let’s take a step back and examine how to approach demand forecasting evaluation from a business perspective. This type of evaluation fits into the general ML model assessment framework. The framework’s goal is to construct a procedure that results in an unbiased, out-of-sample accuracy estimate. However, a few aspects complicate the demand forecasting evaluation:

1. A time dimension that imposes additional assumptions on the generating process and restricts us from randomized data splits for out-of-sample assessments of errors

2. The difference between observed demand, which is limited by factors such as stock level and sales, and actual unobserved (unconstrained) demand

3. Zero-inflated data at low granularity levels, and violation of normality assumptions for model residuals

This is an abbreviated list of complications that can differentiate demand forecasting from traditional regression problems. Such complications are always present to some degree. To address them systematically, we propose a three-step approach that allows us to evaluate a predictive model from a business perspective. Note that this differs from modelling or training assessment, which might require entirely different aggregation levels and loss metrics. Let’s dive into the details of each step.

Step 1: Select aggregation level

We suggest the aggregation level selection as a first step because this choice will influence your options both for the validation procedure and applicable metrics. As stated above, we look at this purely in business terms. From a modelling perspective, this question can be irrelevant, e.g., hierarchical machine learning models can utilize all levels and benefit from reconciliation techniques.

Looking from this angle, the most appropriate aggregation level is naturally defined by the inference we want to draw from the demand projections. For instance, if we run a stock allocation across stores, the best option is to look at store-level forecasting errors; errors at the level of the whole chain would be insufficient. One could stop the discussion at this point. Still, it turns out that the statistical properties of less aggregated levels might limit the scope of suitable metrics to sophisticated and unintuitive options. Those can be difficult to communicate to business stakeholders and reduce the transparency of the evaluation process.

For example, on granular levels, such as daily sales of a specific product in a particular store, we often observe many zero sales and only a tiny fraction of actual positive sales. These distributions are named zero-inflated and require certain statistical assumptions for the underlying mixture of data-generating processes. An overdispersed Poisson distribution, such as the Negative Binomial distribution, in particular, is a good default distribution option for modelling such data.

Example of daily sales quantities histogram for a fashion product in one store

However, the most popular evaluation metrics like mean squared error or R2 assume normally distributed errors. Thus, model selection with these metrics on granular levels can be suboptimal and biased. Luckily, when we aggregate data, the central limit theorem starts to play in our favor. Aleatory uncertainty that is large at granular levels vanishes when we model a significant volume, and the distribution of aggregated demand converges to normal.
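The effect of aggregation can be checked with a small simulation. The sketch below uses a toy zero-inflated demand process (an assumption, simpler than a Negative Binomial model) and compares the skewness of store-day sales with the skewness of chain-level daily totals. The store count, number of days, and probabilities are hypothetical.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: third standardized moment (0 for a symmetric distribution)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)


def daily_sales(rng, p_zero=0.8, max_units=5):
    """Toy zero-inflated demand: no sale with probability p_zero, else 1..max_units."""
    return 0 if rng.random() < p_zero else rng.randint(1, max_units)


rng = random.Random(42)
n_stores, n_days = 50, 400

# Granular level: one observation per store and day.
store_day = [[daily_sales(rng) for _ in range(n_days)] for _ in range(n_stores)]
granular = [units for store in store_day for units in store]

# Aggregated level: total daily sales across all stores.
chain_daily = [sum(store[d] for store in store_day) for d in range(n_days)]

zero_share = granular.count(0) / len(granular)
print(f"share of zeros at store-day level: {zero_share:.2f}")
print(f"skewness at store-day level: {skewness(granular):.2f}")
print(f"skewness at chain-day level: {skewness(chain_daily):.2f}")
```

The store-day series is dominated by zeros and strongly right-skewed, while the chain-level totals are far closer to symmetric, which is what makes normal-theory metrics more defensible at aggregated levels.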

Although technically the best evaluation granularity can be derived from the target business decision, in practice we face another trade-off between complexity and a clear understanding of the evaluation process. What is best depends on the exact context. Thus, we should always carefully examine possible aggregation options for both hierarchy (such as product versus product category) and time (such as days versus weeks).

Step 2: Set up validation procedure

It is important to differentiate between actual (observed) sales limited by the stock levels and unconstrained (unobserved) demand that could be realized under perfect conditions. Stock level is the typical limiting factor for sales, but other events such as in-store operation failures or holidays can also distort the sales picture. We highly recommend that you account for this difference in your modelling and set up an unconstraining procedure for your demand forecast target.

Demand forecasting operates within a time series. We recommend that standard practices such as rolling cross-validation procedures always be applied so that you can construct unbiased, out-of-sample accuracy estimates and prevent data leakage in the evaluation. In order to obtain unbiased validation, the train-test splits have to be representative. In particular, they should account for seasonality, special days, and other relevant systematic differences between periods of time.
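A rolling validation scheme can be sketched as a generator of train and test index ranges in which the test window always follows the training window. This is a minimal illustration with an expanding training window; the function name and the window sizes are assumptions, not a prescribed setup.

```python
def rolling_origin_splits(n_obs, initial_train, horizon, step=None):
    """Yield (train_indices, test_indices) for expanding-window validation.

    Every test window lies strictly after its training window, so no
    future observation leaks into training.
    """
    step = step or horizon
    train_end = initial_train
    while train_end + horizon <= n_obs:
        yield list(range(train_end)), list(range(train_end, train_end + horizon))
        train_end += step


# 10 weeks of history, at least 4 weeks of training, 2-week test windows.
splits = list(rolling_origin_splits(n_obs=10, initial_train=4, horizon=2))
for train_idx, test_idx in splits:
    print(f"train 0..{train_idx[-1]} -> test {test_idx}")
```

To keep the splits representative, the test windows would in practice be chosen to cover the relevant seasons and special days rather than a single stretch of the calendar.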

Step 3: Select evaluation metric

In general, demand forecasting is formulated as a regression problem. Evaluation metrics in regression problems can be split into bias and variation (accuracy) classes, where bias indicates signed deviation from actual values (location), and accuracy evaluates unsigned average deviation (variance of data). Note that this split of metrics is not based on the bias-variance trade-off concept.

In business applications, the selection of evaluation metrics also involves a trade-off between interpretability and statistical rigor. Percentages might be more intuitive to interpret, but actual business KPIs will depend on absolute variation. KPIs chosen for intuitiveness rather than statistical fit might lead to suboptimal hyperparameter selection, but they might also create transparency that can accelerate business adoption of a new machine learning-based forecasting tool. As such, it is essential to have a clear understanding of the underlying probabilistic assumptions for different KPIs.

Commonly used metrics

Let’s now examine the list of commonly used metrics for demand forecasting evaluation, focusing on point estimates. Some metrics, like MSE or MAE, originate from log-likelihoods of corresponding probabilistic models (Gaussian and Laplace errors, respectively), while others, like R2 or MAPE, are preferred due to their standardized scale and more intuitive interpretation.

Example: Model selection with common metrics

Now we have all the knowledge we need to select the best forecast from available options. We want to have a complete view when backtesting, so we suggest to our friend that she should calculate bias and three other common accuracy metrics to help us select the best forecast.

Revisited: actual sales and backtesting predictions from three pre-trained models

Let’s look at a few popular evaluation metrics for our coffee-kiosk business problem.

Bias: a metric that, in simple terms, tells us how far off the model predictions are, as a percentage of the average target value.
SMAPE: a symmetric version of the mean absolute percentage error that compares the absolute error to the average of forecast and target. This property guarantees that the value always lies in the 0–200% interval.
wMAPE: a weighted version of MAPE, where individual absolute errors are weighted with target values. Unlike SMAPE, it does not have an upper bound.
R2 (coefficient of determination): estimates the share of variation in the data explained by the model predictions. This coefficient originates from classic OLS (ordinary least squares) methods and is at most 100%. Out of sample it has no lower bound and turns negative whenever the model performs worse than simply predicting the mean.
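The four metrics listed above can be written down in a few lines. The sketch below uses common textbook definitions, which may differ in detail from the ones used for the article's figures, and the week of sales and the three forecasts are hypothetical numbers rather than the article's data.

```python
def bias(actual, forecast):
    """Signed total error relative to total actuals (positive = over-forecast)."""
    return sum(f - a for a, f in zip(actual, forecast)) / sum(actual)


def smape(actual, forecast):
    """Symmetric MAPE in [0, 2]; a term where both values are zero counts as 0."""
    terms = []
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2
        terms.append(abs(f - a) / denom if denom else 0.0)
    return sum(terms) / len(terms)


def wmape(actual, forecast):
    """Sum of absolute errors divided by the sum of absolute actuals."""
    abs_errors = sum(abs(f - a) for a, f in zip(actual, forecast))
    return abs_errors / sum(abs(a) for a in actual)


def r2(actual, forecast):
    """1 - SS_res / SS_tot; negative when worse than predicting the mean."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical week of sales and three forecasts (not the article's data).
actual = [0, 4, 0, 6, 0, 5, 0]
forecasts = {
    "No-sales": [0] * 7,
    "Average": [15 / 7] * 7,
    "ML": [1, 3, 0, 4, 1, 4, 0],
}
for name, fc in forecasts.items():
    print(f"{name:9s} bias={bias(actual, fc):+.0%} SMAPE={smape(actual, fc):.0%} "
          f"wMAPE={wmape(actual, fc):.0%} R2={r2(actual, fc):.0%}")
```

Even on this toy data the metrics disagree: bias favors the Average forecast, while SMAPE and wMAPE both rate the No-sales forecast ahead of it.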

It seems that this approach makes things even more confusing. A common combination of SMAPE and bias would select the average forecast in this case, while a machine learning forecast is preferred by wMAPE. On the other hand, wMAPE values for average forecast and no-sales forecast are almost identical, so this metric alone can also be misleading.

How can we resolve this disagreement between different metrics? Let’s return from the world of math to the world of business.

Example: Adding Business Context

Now we will need more context about the product we are modelling. Let’s consider two different scenarios where sales and forecasts correspond to:

Ice cream
Donuts

We will assume that the price point and, thus, the average yearly revenue for both products are quite similar.

Product 1: Ice cream

From a business perspective, if our coffee kiosk has a freezer with sufficient space, we can store unsold ice cream there and won’t have to account for daily sales fluctuations. The main goal of modelling for ice cream, therefore, would be to keep overall bias close to zero.

Let’s assume that we do daily allocations according to next-day forecasts, and that we can keep unsold products in our on-site freezer.

Ice cream scenario sales and stock allocation for Average and ML forecast

Note: We intentionally used unconstrained demand, which is different from actual sales. Sales on the last day in our example were zero, but in our scenario, this was the result of a stockout.

Under the allocation strategy based on the average forecast, total sales are 15, while total sales under the machine learning-based strategy are only 12. Assuming a $3.00 gross margin per item, that is $45 versus $36, so our kiosk business would get a 25% uplift from using the average forecast compared to the machine learning forecast.

Product 2: Donut

Unlike ice cream, donuts should be sold while fresh, with unsold products thrown away at each day’s end. In our kiosk, we do not make donuts ourselves. Therefore, we would pay more attention to daily dynamics in this scenario since overstocking donuts would significantly decrease our profits given the high cost of goods sold (COGS).

Let’s assume that we do the same daily allocation according to the forecast, with the caveat that we must scrap the unsold products at the end of the day.

Donut scenario sales and stock allocation for Average and ML forecast

In this scenario, we sell the same number of items under both the average and the ML forecast allocations. However, for the allocation based on the average forecast, we bought more items, and the unsold ones had to be thrown away. For this product, the machine learning allocation was more accurate, with an overall gross margin 24% higher than that resulting from the average forecast allocation.
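The two scenarios can be reproduced in spirit with a short simulation. The sketch below assumes that each morning's delivery equals the forecast, that storable items carry over to the next day, and that perishable leftovers are scrapped. The demand, forecasts, price, and unit cost are hypothetical and do not reproduce the article's figures.

```python
def simulate_allocation(demand, forecast, perishable, unit_price, unit_cost):
    """Deliver the forecast quantity each morning and sell up to the stock on hand.

    Storable products carry unsold units to the next day; perishable
    products scrap them at the end of each day.
    """
    stock = sold_total = scrapped_total = 0
    for d, f in zip(demand, forecast):
        stock += f                    # morning delivery equals the forecast
        sold = min(stock, d)          # sales are capped by available stock
        stock -= sold
        sold_total += sold
        if perishable:
            scrapped_total += stock   # leftovers are thrown away
            stock = 0
    profit = sold_total * (unit_price - unit_cost) - scrapped_total * unit_cost
    return {"sold": sold_total, "scrapped": scrapped_total, "profit": profit}


# Hypothetical unconstrained demand and forecasts for one week.
demand = [1, 2, 1, 3, 2, 4, 2]
average_fc = [2, 2, 2, 2, 2, 2, 2]   # flat forecast with low bias
ml_fc = [1, 1, 1, 2, 2, 3, 2]        # tracks daily dynamics but under-forecasts

results = {}
for product, perishable in [("ice cream", False), ("donut", True)]:
    for name, fc in [("Average", average_fc), ("ML", ml_fc)]:
        results[(product, name)] = simulate_allocation(
            demand, fc, perishable, unit_price=5, unit_cost=2)
        print(f"{product:9s} {name:7s} {results[(product, name)]}")
```

With these assumed numbers the low-bias forecast sells more of the storable product, while for the perishable product both forecasts sell the same quantity and the flat forecast loses margin to scrapped items.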

Financial results from backtesting

Combining metrics: aggregate and conquer

Generalized, clear and simple KPIs are crucial ingredients for making educated decisions. The discussion above, in contrast, illustrates that demand forecasting requires tedious business case analysis and selection of tailored metrics. However, we can (and sometimes have to) remove one dimension of complexity for practical reasons. Namely, for granular demand forecasts, we cannot evaluate all the individual category metrics, so we have to combine them into one or several KPIs that we can keep track of. In our toy kiosk example, this would correspond to combining evaluation metrics across the products we are selling: ice cream and donuts.

The simplest option for aggregating metrics, and the one that dominates in practice, is to take the average. This robust and easy-to-explain aggregation provides good insight into performance, but it can often be misleading about the actual incremental value of the model. For example, consider forecasting for two products where one of them has zero sales for the evaluation period. If the model predicts zero sales for both products, the resulting mean metrics might look sensible while the underlying forecast is practically useless.

The previous example illustrates that treating different metrics equally in aggregation can be deceptive. In reality, demand forecasts for some categories are more important than others for various reasons. This can be driven purely by business objectives or by physical constraints like storage volume for large items. Clearly, storing additional bubble gum on the shelf is easier than finding space for another 5L bottle of milk. To tackle this aspect, it is common to use weighted metrics. They are still easy to grasp and have proven to provide a fair estimate of business performance in practice. As an example, the weighted root mean squared scaled error was used for evaluation in the well-known Kaggle M5 demand forecasting competition by Walmart. The weighting by recent sales is chosen to select “the best performing forecasting methods to drive lower forecasting errors for the series that are more valuable to the company”. Based on that, using a weighted combination of metrics is probably the best universal approach for aggregation in general. But can we do anything better considering the business context, as we did in our previous examples?
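The difference between a simple and a weighted average of per-product metrics is easy to see on a toy assortment. The sketch below extends the zero-forecast example to nine products without sales and one bestseller. It weights by unit sales in the evaluation window, which is a simplification and not the M5 competition's metric, and all names and numbers are hypothetical.

```python
def smape(actual, forecast):
    """Symmetric MAPE in [0, 2]; a term where both values are zero counts as 0."""
    terms = []
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2
        terms.append(abs(f - a) / denom if denom else 0.0)
    return sum(terms) / len(terms)


def simple_average(metric_by_product):
    """Unweighted mean of the per-product metric."""
    return sum(metric_by_product.values()) / len(metric_by_product)


def sales_weighted_average(metric_by_product, sales_by_product):
    """Mean of the per-product metric weighted by each product's sales volume."""
    total_sales = sum(sales_by_product.values())
    return sum(metric_by_product[p] * sales_by_product[p]
               for p in metric_by_product) / total_sales


# Nine products with no sales in the evaluation window plus one bestseller.
actuals = {f"slow_{i}": [0, 0, 0, 0, 0] for i in range(1, 10)}
actuals["bestseller"] = [3, 4, 5, 4, 4]

# A useless model that predicts zero sales for every product.
zero_forecasts = {product: [0] * 5 for product in actuals}

per_product = {p: smape(actuals[p], zero_forecasts[p]) for p in actuals}
sales = {p: sum(actuals[p]) for p in actuals}

unweighted = simple_average(per_product)
weighted = sales_weighted_average(per_product, sales)
print(f"simple average SMAPE:  {unweighted:.0%}")
print(f"sales-weighted SMAPE:  {weighted:.0%}")
```

The unweighted average rewards the model for being trivially right on products that never sell, while the weighted version exposes that it misses all of the actual demand.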

Let us again consider the problem of daily stock allocation. Once we have the trained model in hand, we face the following decision for each product: how many items should be delivered tomorrow given the remaining stock level and the forecast for tomorrow?

Despite forecast uncertainty, the optimal business decision remains the same within a certain range of outcomes

Next-day realized sales, as well as the end-of-day (EOD) stock level, can vary a lot due to imperfect forecasts and the aleatory uncertainty of the sales process. Understocking would mean unfulfilled demand and lost sales, while overstocking might cause problems with storing the remaining items. However, we have a range of outcomes for which our decided delivery amount remains optimal, despite the forecasting error. Therefore, for a given product, we do not need the perfect demand forecast but one that provides accurate enough predictions to keep stock within a certain predefined limit. For stationary processes, this requirement translates to a threshold for the appropriate accuracy metric, e.g., an MSE limit for the normal likelihood case. In this scenario, the best model for us would be the one that provides the desired accuracy for all products, and the corresponding KPI is the percentage of products for which the accuracy metric is below the predefined threshold.
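The KPI described above, the share of products whose error stays within a product-specific tolerance, can be sketched as follows. The products, backtest values, and MSE limits are hypothetical; in practice each limit would be derived from the stock range within which the delivery decision stays optimal.

```python
def mse(actual, forecast):
    """Mean squared error between actual and forecast values."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def share_within_tolerance(actuals, forecasts, mse_limits):
    """Return the share of products whose MSE is within its limit, plus their names."""
    passing = [product for product in actuals
               if mse(actuals[product], forecasts[product]) <= mse_limits[product]]
    return len(passing) / len(actuals), passing


# Hypothetical backtest for three products.
actuals = {
    "ice cream": [2, 3, 1, 4],
    "donut": [5, 6, 4, 7],
    "coffee beans": [10, 12, 9, 11],
}
forecasts = {
    "ice cream": [2, 2, 2, 3],
    "donut": [4, 6, 6, 7],
    "coffee beans": [11, 11, 10, 11],
}
# Looser limit where storage absorbs errors, tighter where leftovers are scrapped.
mse_limits = {"ice cream": 4.0, "donut": 1.0, "coffee beans": 2.0}

kpi, passing = share_within_tolerance(actuals, forecasts, mse_limits)
print(f"products within tolerance: {kpi:.0%} ({', '.join(passing)})")
```

Two models can then be compared on this single number, and the list of failing products points directly at where the forecast is not yet good enough for the decision it supports.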

Once again, we have demonstrated that tailored business decision analysis can lead to a better model performance evaluation. Note that while this method is specifically designed for our context, the possibilities for tailoring are limitless. Careful examination of the business context will always be rewarded by the value gained from correct decisions.

At BCG GAMMA, we use this value-centric approach in PLAN AI, an end-to-end planning solution. PLAN AI focuses on key planning decisions, brings internal and external data sources together to enable better decision making, and orchestrates not only different accuracy metrics but also different forecasts into a single source of truth.

Learn more about PLAN AI, and feel welcome to reach out to the team by e-mail at PlanAI@bcg.com!

Conclusion

As demonstrated by the examples, demand forecasting evaluation is rarely a straightforward matter, even for simple businesses. Like many other data science applications, such evaluations require a combination of strong modelling skills and sound business acumen. Furthermore, real-world applications come with a variety of products and business constraints which make it extremely hard, if not impossible, to arrive at a perfect metric with a closed-form expression. The aggregation of metrics across groups in the hierarchy is also a challenging problem that lacks a perfect generic solution. Hence, the absence of a well-defined target function for AI engines leaves no room for silver-bullet solutions that would solve the generic demand forecasting problem for all kinds of businesses at once. It is only the combination of domain expertise and data science methods, integrated into the business processes, that can enable businesses to unlock the full value of machine learning-driven demand forecasting.

Demand Forecasting Evaluation was originally published in GAMMA — Part of BCG X on Medium, where people are continuing the conversation by highlighting and responding to this story.

Hybrid Demand Planning for Petrochemical Suppliers

Mishtu Sharma — Mon, 17 Jan 2022 08:32:31 GMT

Written by Mishtu Sharma & Mudit Mishra

Because the petrochemical industry plays a vital role in providing raw materials for a wide array of products such as plastics, resins, lubricants, and fibers, it is imperative that the industry find a way to improve its predictions of future demand.

Market demand for petrochemicals is largely driven by rising demand for downstream specialty chemicals and plastics manufacturing that are central to the automotive, construction, and manufacturing industries. The basic chemicals and plastics derived from petrochemicals act as building blocks for numerous non-durable and durable consumer goods. And this demand is growing, with the global petrochemical industry size projected to reach $651.1 billion by 2027, expanding at a CAGR of 5.0%.

Since accurate demand planning helps manufacturers create an optimal production plan that saves enormous amounts of both time and cost, a reliable demand-planning solution is a powerful input to the production-planning optimizer. In this article, we will examine typical challenges faced by petrochemical suppliers as they plan future demand. We will then describe our demand-forecasting solution and its impact.

1. Client Context and Challenges

One of our clients, a major petrochemical-business player in Europe, wanted to improve its demand planning to further improve inputs to its production-planning system. The client’s current production planning factored in expected customer demand and price information to prepare an optimal short-term plan. The client’s goal was to secure an analytics toolkit that enables it to forecast monthly demand for each customer, across all products, in each region, and for both spot sales and contracts.

Some of the more significant challenges facing the client include:

· Low Accuracy: The client currently relies on sales managers’ intuition to forecast demand for each customer across all products. The current accuracy is approximately 55%.

· Lack of advanced ML solution: Historically, the client has relied solely on these intuition-based forecasts, with neither a stable nor highly functional model to support their predictions.

· Siloed working environment: Sales and production-planning teams work in silos, leading to information loss and lack of structure.

· Lack of Market Intelligence: Because departments have been working in silos, gathering and streamlining data for exogenous variables takes much longer. Furthermore, the client’s existing model did not capture the impact of external variables.

2. Our Hybrid Forecasting Approach

Looking at the challenges faced by the client, we concluded that one single ML technique might not work for every time series, as each series is quite granular. Because an increase in granularity of the analysis can also lead to increased data inconsistency and noise, we decided to create multiple ML models, then pick the forecast that was provided by the most accurate ML model.

While creating the hybrid approach, we placed a great deal of emphasis on solution modularity and scalability, whether for version controlling or automated parameter tuning. As a result of this hybrid approach, we typically saw an uplift in overall accuracy of 5–40% for different time series over and above the singular approach baseline.

As part of our solution, we created a framework in which a fully automated pipeline performs data cleaning and preparation, and runs five different ML algorithms using powerful parameter tuning to forecast future demand.

There are three major steps in the framework:

1. Data Preparation: The first step includes the gathering, cleaning, and preparation of the analytical dataset that will get fed into the ML algorithms.

2. ML Engine: The same analytical dataset is then used to run multiple ML algorithms along with continuous cross validation and automated parameter tuning to select the best set of parameters. These parameters are in turn used for testing the accuracy on out-of-sample data.

3. Forecasting: Finally, the best ML algorithm is chosen based on out-of-sample accuracy and is used for forecasting future time periods.

Data Preparation

Next, we shift gears and do a deep dive into our approach. First, BCG helped our client create a powerful demand-forecasting framework to manage its 5,000 products and customers across multiple regions in Russia, Europe, Asia, and the Americas. The demand forecasting framework predicts monthly demand using historical transaction data as well as more than 200 external variables across 15 data sources.

BCG gathered and streamlined multiple internal and external data sources by creating systematic matching keys for data manipulation. The streamlining of these data sources resulted in a more efficient way to work across departments, enabling us to create the external variables that were fed into the model. Along with this data, we also used business and modelling filters to remove data inconsistencies and redundancies.

Machine Learning Algorithms

Next, we created five fully automated machine learning algorithms, passed the same master data to each of them, and then captured their predictions.

Multiple algorithms were used for hybrid demand forecasting

Each algorithm is based on a different statistical technique that overcomes different types of issues faced while forecasting demand. Each uses its own set of advanced parameter-tuning and cross-validation techniques to generate its predictions. Our data science counterparts on the client side have complete control over the parameters, which can be further tuned based on market scenarios. The idea behind this arrangement is to enable client staff to run these models with only minimal supervision on our part.

Note: PyCaret is an auto-ML library in Python that runs multiple regression models and uses the one that optimizes a user-defined metric such as R2, MAPE, or AIC. More information on the other techniques can be found at Prophet, SARIMA and SARIMAX, and GBM.

3. Key takeaways from our experience

Through the use of streamlining and automation, we were able to resolve a number of challenges facing our client. In doing so, we learned a number of important lessons:

a. Hybrid Forecasting wins over Singular Approach

During the entire model building and implementation journey, we observed multiple ways to efficiently and dynamically increase accuracy without relying on static parameters.

We created a hybrid model because each time series is different in nature, so a single model will not fit all scenarios. For each time series, we picked a separate algorithm based on out-of-sample MAPE. Using this approach, we found that the uplift in accuracy ranged from 5–40%, resulting in a weighted-accuracy uplift of 15% overall.

Hybrid model clearly works better than individual models
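The per-series selection described above can be sketched in a few lines of Python. This is a minimal illustration, not the client implementation: the three forecasters (`naive_last`, `moving_average`, `seasonal_naive`) are hypothetical stand-ins for the five production algorithms, and the six-month holdout is an assumed setting.

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error; assumes `actual` contains no zeros."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs((actual - forecast) / actual)))

# Hypothetical stand-in forecasters: each takes a training history and a horizon.
def naive_last(train, horizon):
    return np.repeat(train[-1], horizon)

def moving_average(train, horizon, window=3):
    return np.repeat(np.mean(train[-window:]), horizon)

def seasonal_naive(train, horizon, season=12):
    reps = int(np.ceil(horizon / season))
    return np.tile(train[-season:], reps)[:horizon]

CANDIDATES = {
    "naive_last": naive_last,
    "moving_average": moving_average,
    "seasonal_naive": seasonal_naive,
}

def pick_best_model(series, holdout=6):
    """Score every candidate on the last `holdout` points and return the winner."""
    series = np.asarray(series, dtype=float)
    train, test = series[:-holdout], series[-holdout:]
    scores = {name: mape(test, fn(train, holdout)) for name, fn in CANDIDATES.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Three years of a perfectly repeating monthly pattern (made-up numbers).
seasonal_series = [50, 60, 80, 120, 160, 200, 220, 200, 150, 100, 70, 55] * 3
best_name, all_scores = pick_best_model(seasonal_series)
print(best_name, all_scores)
```

Running `pick_best_model` once per product-customer series yields a different winner for each series, which is the essence of the hybrid approach.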

b. Tree-based algorithms show superiority over conventional approaches

In our experience, historical transaction data in Industrial Goods/Energy sectors is typically not as consistent as transaction data from other domains. Hence, it becomes difficult to decompose seasonality and trend from the data. For this reason, conventional time series approaches such as Exponential Smoothing, Holt-Winters, or SARIMA provide less accurate forecasts.

Recursive nature of tree-based algorithms provides better results

We noted that recursive, tree-based algorithms are not only more dynamic but also provide better results. In our work with the client, the introduction of tree-based algorithms increased accuracy by up to 10%.

c. Shifting cross validation stabilizes forecasts

Since time-series data is time-ordered, random K-fold validation does not work: shuffled folds would let the model train on observations that come after the ones it is tested on. Instead, we introduced cross validation with shifting time periods to all our algorithms to create more robust parameters and forecasts. The dates for cross validation were automated in the user config. The introduction of dynamic cross validation helped stabilize the model results and reduced variation in accuracy from 25% to less than 5%.
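One way to generate validation windows that shift forward in time is sketched below. The function name `shifting_time_splits` and its defaults (three folds, a six-month test window, at least twelve months of training data) are illustrative assumptions, not the settings used in the project. scikit-learn's `TimeSeriesSplit` offers a comparable ready-made splitter.

```python
def shifting_time_splits(n_obs, n_folds=3, horizon=6, min_train=12):
    """Return (train_idx, test_idx) pairs whose test window moves forward in time.

    Every training set ends right before its test window, so the model is
    never validated on data that precedes what it was trained on.
    """
    splits = []
    for fold in range(n_folds):
        # The last fold ends at the final observation; earlier folds end sooner.
        test_end = n_obs - (n_folds - 1 - fold) * horizon
        test_start = test_end - horizon
        if test_start < min_train:
            continue  # not enough history to train on for this fold
        train_idx = list(range(0, test_start))
        test_idx = list(range(test_start, test_end))
        splits.append((train_idx, test_idx))
    return splits

# Example: 36 monthly observations.
splits = shifting_time_splits(36)
for train_idx, test_idx in splits:
    print(f"train 0..{train_idx[-1]}  test {test_idx[0]}..{test_idx[-1]}")
```

Averaging a candidate parameter set's error across all folds, rather than trusting a single holdout window, is what makes the selected parameters less sensitive to any one period.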

d. Aggregation, Recency and Guardrails keep the forecasts within bounds

Time-Series Aggregation: For some product-customer combinations, we did not have enough historical volume transacted. These combinations were aggregated to a higher level of product hierarchy and later de-aggregated using historical patterns.

Emphasis on Recency: While forecasting demand for future periods, we used the latest 6 months for finding best parameters, and used the complete historical data for model building. We did this to put more emphasis on recent demand patterns and trends.

Guardrails: We restricted the raw forecasts from each algorithm with guardrails to avoid outliers. Guardrails can be created from historical patterns or from business rules. An example of a guardrail based on historical patterns is a band built from the mean and standard deviation. An example of a business rule is the fixed volume in a customer contract, together with upper and lower tolerance limits.
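The two kinds of guardrails can be sketched as simple clipping functions. The width of the statistical band (`k=2.0` standard deviations) and the contract tolerance (`0.10`) are assumed values chosen for illustration; the article does not state the thresholds actually used.

```python
import numpy as np

def statistical_guardrail(forecast, history, k=2.0):
    """Clip forecasts to mean +/- k standard deviations of historical demand."""
    history = np.asarray(history, dtype=float)
    mean, std = history.mean(), history.std()
    lower = max(mean - k * std, 0.0)  # demand cannot be negative
    upper = mean + k * std
    return np.clip(np.asarray(forecast, dtype=float), lower, upper)

def contract_guardrail(forecast, contract_volume, tolerance=0.10):
    """Clip forecasts to the contracted volume plus or minus a tolerance."""
    lower = contract_volume * (1 - tolerance)
    upper = contract_volume * (1 + tolerance)
    return np.clip(np.asarray(forecast, dtype=float), lower, upper)

# Made-up numbers: history averages 100 with a standard deviation of 10.
history = [90, 110, 90, 110]
raw_forecast = [50, 100, 300]

stat_bounded = statistical_guardrail(raw_forecast, history)
contract_bounded = contract_guardrail(raw_forecast, contract_volume=100)
print(stat_bounded, contract_bounded)
```

Forecasts that already fall inside the band pass through unchanged, so the guardrail only affects outliers.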

4. Multi-pronged impact of Hybrid Forecasting

In less data-centric industries such as energy and petrochemicals, BCG has helped clients uncover the potential impact of this solution in several ways, including:

Dollar Savings:

Our new, advanced, fully automated algorithm delivered an estimated savings of $2/ton for various petrochemical products (polyethylene, polypropylene, polystyrene, etc.) by improving:

· Stock Management: Reducing costs of storage due to over-production of grades saved an estimated $1.4/ton.

· Production Efficiency: Reducing production losses due to transitioning products from one grade to another saved an estimated $0.5/ton.

· Logistics Optimization: Reducing additional costs due to mismanaged transportation saved an additional $0.1/ton.

Accuracy Uplift:

After we compared the model results with sales forecasts, we saw an uplift of 5–10% in accuracy at the granular level and 10–20% at the aggregated level. This increased accuracy resulted in better inventory and production planning.

Forecast Efficiency:

Since the fully automated algorithm is so well suited to the client’s legacy system, we saw a significant reduction in time to forecast. The model runs in approximately 15 hours including data preparation and all five algorithms. Previously, when these forecasts were produced using experience and intuition, it took sales managers weeks to collate all the forecasts across various products.

Client Enablement:

One of the major changes BCG observed was that the client was better equipped to forecast now that it had access to:

· Better, more streamlined internal and external datasets with seamless connectivity

· The factors and/or reasons that could impact demand forecasts

· Improved dynamics of demand across multiple clients and regions based on varying market conditions

Begin the work of future forecasting now

Despite being very advanced in terms of its oil-producing and refining capabilities, the petrochemical industry has lagged in its ability to base demand forecasting on more than experience and intuition. Only through the use of advanced analytics and new external data sources will the industry be able to plan for what promise to be increasingly tumultuous world markets.

This is a far-from-straightforward undertaking. Given its history of limited use of historical data, along with its heavy reliance on intuition, the petrochemical industry must up its game and implement more sophisticated approaches to leverage existing data to create the most accurate demand forecasts. These kinds of capabilities are rarely found in-house, so it is in their best interest for companies in this industry to seek out those that have the necessary experience and data science skills. Getting demand forecasting right is important for all industries. But for commodity-based industries like petrochemicals that are heavily impacted by external factors and unprecedented changes in supply and demand, accurate demand forecasting is a necessity.

Hybrid Demand Planning for Petrochemical Suppliers was originally published in GAMMA — Part of BCG X on Medium, where people are continuing the conversation by highlighting and responding to this story.

Canadian Open Banking Initiative Presents Sterling Opportunity for Banks

BCG GAMMA editor — Thu, 16 Dec 2021 15:43:30 GMT

By: Stiene Riemer, Joerg Erlebach, Kiran Konanur, Mark Schofield, Barric Reed and Jakob Liss

Covid has accelerated digital transformation across every industry, and banking is no exception. The pandemic-driven need to operate remotely has driven consumers to expect from their banks the same type of data-driven, innovative services offered by Big Tech. This presents a significant challenge for traditionally conservative Banks: if banks themselves don’t provide these kinds of services, FinTechs have shown that they are more than ready to step in and meet consumer expectations, though often through the use of suboptimal methods such as screen scraping.

Millions of Canadians currently utilize services offered by unregulated third-party providers that require them to share their access credentials, which the providers then use to access customers’ private banking information. Needless to say, the unregulated sharing of access credentials poses an immense security risk for consumers, especially given the number of data breaches that occur each year. While most breaches expose lower-value items such as usernames and email addresses, data breaches of the future might include consumers’ private banking credentials. The havoc this could wreak on consumer accounts is unimaginable.

To address this imminent threat and others, Canada created an Advisory Committee on Open Banking tasked with crafting a long-term vision and plan for the secure sharing of consumer data between providers within Canada. The committee reached a critical point in the design of this vision and plan with the August 2021 release of its final report.

What is Open Banking?

Open Banking is a set of initiatives under which consumers and small businesses direct the sharing of their data between financial institutions and third-party providers. Open Banking enables numerous use cases, including:

Customer finance management: Provides consumers with better visibility into their finances across their providers
Payments: Enables streamlined management of bills, payroll, peer-to-peer transactions, and more
Lending: Enables faster, more straightforward, and more automated adjudication of credit applications, allowing a significantly quicker time-to-yes, better price differentiation, and ongoing monitoring of risk
Customer engagement and retention: Enables fully personalized customer journeys with highly tailored offerings to improve client engagement and reduce churn through a comprehensive and timely understanding of customer needs
Audit: Provides intelligent, approachable tooling to support taxes, audit processes, and more
Portability: Reduces lock-in via streamlined transferring of accounts between providers, further crystallizing the need for impactful, personalized treatment to drive enduring customer satisfaction
Authentication and fraud prevention: Enables authentication services as well as fraud and money-laundering prevention through cross-institution communication and a full view of customer transaction activity

According to the August 2021 report, there are three foundational elements to an Open Banking system. The first is a set of common rules for Open Banking industry participants. These rules ensure that consumers are protected, and that liability rests with the party at fault. Second is an accreditation framework and process that allows third-party service providers to enter an Open Banking system. Lastly are the technical specifications that allow for safe and efficient data transfer and serve the established Open Banking policy objectives.

The big upside to Open Banking

While Banks must exert significant effort to participate in Open Banking, there is a compelling value proposition at stake for Banks who get it right. Leaders stand to realize improved margins, reduced costs, and new revenue streams. By utilizing a broader, fuller set of customer information, Open Banking makes it possible for Banks to benefit through:

Supercharging of internal processes: Make banking operations easier, faster, better; e.g., improving credit-risk assessment through automated risk monitoring and data-enhanced credit adjudication processes to enable quicker time-to-yes and optimal pricing differentiation, or more efficient and effective fraud and compliance processes through comprehensive transaction monitoring.
Fortifying and enhancing existing offerings: Maintain and improve competitiveness; e.g., improving offer take rate by moving from segment-based offers to truly personalized offers, boosting customer engagement and retention through right-time, -place, and -message communications, and improving prospecting through better targeting.
Developing new B2B or B2C offerings: Monetize new data, new services, or internal competence; e.g., productizing internal competencies, such as fraud-as-a-service, for sale to other businesses as unique offerings.

Planning for the future, Banks can begin to leverage their assets in the present, specifically transactional data. Advanced banks are already setting up analytically-driven processes for client outreach through preferred channels, and have enhanced internal processes, such as credit risk and compliance management, with customer data. In doing so, Banks will be prepared for a rapidly changing digital-banking environment as new, broader data sets become available.

Capitalizing on the complete customer picture

Today, Banks have a limited view of their customers, with each Bank having only pieces of the puzzle. It is not uncommon for a customer to have checking accounts with multiple Banks, credit cards with other Banks, and loans with other various organizations. As such, Banks can often only infer activity and customer preferences based on a limited, albeit powerful, view of the customer.

Open Banking breaks down these informational barriers, enabling a single, consolidated view of customer data — especially high-powered transactional data — to be shared across all Banks and providers. Higher quality, more comprehensive customer data turbocharges Banks’ analytic engines, enabling them to do things they could not do in the past. For instance, Banks can utilize all of a customer’s transaction streams to deliver a truly personalized customer experience.

With the release of Open Banking, coverage of key customer data elements will increase.

While retail customers are typically the focus of Open Banking discussions, it is essential to note that these changes also apply to SME customers within Business Banking divisions. Data leveraged for these customers has historically been limited to financial statements and indicators, with high-powered data, such as transactional information, often being underutilized. Open Banking presents the impetus to change this, enabling Banks to capitalize on improvements both now and in the future as their SME customers’ data profiles expand.

However, the expertise to fully utilize key customer and behavioral data needs to be developed over time, which is something BCG GAMMA has been helping clients with around the world. Specifically, Banks need to learn how to leverage all available customer data, including, and perhaps most importantly, behavioral and transactional data, to produce new, advanced solutions. This experience has led to SmartBanking.AI, a data and analytics platform specifically built to easily execute best-in-class use cases on top of the full suite of banking data. (Read here for a more detailed description of SmartBanking.AI and how it works.)

To illustrate how Banks can capitalize on a fuller customer picture, imagine these three scenarios:

A customer who loves to travel has a checking account with Bank A. The customer has a credit card from Bank B that is specifically designed to reward frequent travelers through a high travel-spend multiplier. Bank A is unaware that the customer uses this card and may not think to offer the customer a special promotional offer to sign up for its own travel-rewards card. With Open Banking, Bank A can choose to offer its own optimal card based on the customer’s transaction history, significantly improving the value proposition for the customer.
A small restaurant chain needs a loan to finance a new location. It submits a simple loan request consisting of the amount and term to its Bank and, with Open Banking, finds that its loan is instantly approved. Behind the scenes, the Bank leverages an up-to-date, full view of the chain’s finances. Funds are immediately disbursed to any account of the restaurant chain’s choosing, which are connected without the need to input account and routing numbers. This rapid approval process would apply just as readily to individual customers.
A customer is behind where he needs to be on his path to retirement. Today, Banks rarely have complete views into customer investments and can only infer how well they are positioned for retirement. With Open Banking, this customer’s Bank — or his new provider — could review his complete profile and proactively help him meet his goals. This might include assisting him in reducing excessive spending or moving investments into lower-fee instruments. Or the Bank could offer him a loan to reduce his high-interest debt.

Needless to say, the number of opportunities is large and will continue to grow as new companies innovate on top of the Open Banking infrastructure. BCG can help harness this opportunity by working with clients to identify areas of opportunity, analyze the potential value, prioritize opportunities, and drive efforts to prepare for Open Banking.

Capitalizing on the complete customer picture: Client examples

While not an absolute, the more data fed into a model, the better the model performs. This is the common thread for all the data and analytics opportunities related to Open Banking: more data enables better outcomes. The following examples help to illustrate how BCG and BCG’s clients are actively developing the capabilities to capitalize on Open Banking.

Example 1: Improved product recommendations for retail customers

Data improves modeling outcomes by allowing better prediction of customer behavior, such as whether a retail customer would take an offered product or not. Improvements can be driven by more comprehensive, high-powered modeling datasets (à la Open Banking), as well as by more advanced modeling techniques.

The illustration below shows how more data and advanced models led to the improved delineation between customers who are likely to take a product offering and those who are not. This ability to delineate enables Banks to perform more targeted and, hence, more successful marketing. It also enables Banks to target customers with ads and/or offers for only those products each customer is interested in, thus helping to reduce marketing noise.

Advanced modeling on comprehensive data sets enables highly precise targeting of products to customers — Open Banking will deliver even better performance

Example 2: Earlier, better detection of credit risk events for SME customers

Another way data improves modeling is by making it possible to detect key customer signals both earlier and with fewer false alarms. Similar to the previous example but using data from SME customers, this improvement can be driven by advances in both data and modeling techniques.

Capitalizing on the combination of a full, internal-only customer data profile and advanced modeling techniques, Banks can detect adverse client credit events up to 18 months earlier than the existing, expert-based modeling approach. As can be seen in the illustration below, predicted creditworthiness deteriorates slowly over time rather than dropping precipitously over a month. This gives Banks significantly more time to treat customers effectively, allowing the Banks to shift from reactive to proactive measures while also ensuring cost-effective servicing by relationship managers and maximizing the effectiveness of credit risk reviews.

Data and analytics provide clients with significant lead time to treat customers based on relevant signals

Example 3: Transaction and account data key, but not the entire picture

When evaluating the predictive power of each data category, account and transaction data have the highest predictive power, even when data coverage for customers is not 100% complete. Thus, extending the coverage and the completeness through Open Banking will provide a solid boost to predictive power. Beyond this, other data categories synergistically drive a performance boost above and beyond that of account and transaction data alone. Expanding completeness and coverage of these additional categories will also contribute to a boost in predictive power.

Expanded data coverage through Open Banking will drive more powerful modeling performance

For BCG’s clients, every ounce of predictive power is essential. Better models mean, among other things, more streamlined credit processes. This, in turn, drives operational efficiencies, faster time-to-yes for credit decisioning, and more precise risk assessments for more attractive offer pricing for customers: a win for both Banks and customers. Banks should look for ways to leverage Open Banking to drive a significant modeling boost through expanded completeness and better coverage of shared data categories.

Countries actively pursue Open Banking standards

Banks in the UK are currently operating under one of the world’s most advanced Open Banking initiatives. UK regulations require Banks to cooperate with third-party providers, follow strict accreditation processes for certifying new providers, and implement technical infrastructure to facilitate sharing among all involved parties.

This initiative was not implemented overnight. The roll-out of these regulations was beset by numerous delays. In fact, six of nine UK account providers failed to meet the initial launch deadline and had to receive government extensions to qualify. This did little to inspire confidence among a public already wary of online privacy. The UK is not alone when it comes to roll-out delays. Australia and Brazil experienced similar delays when attempting to roll out the first wave of their Open Banking initiatives. But three years later, the UK has a growing marketplace of Open Banking-enabled apps and services (over 109 at the time of writing) and has experienced continued growth in user adoption and confidence. Clearly, there is much to learn from the UK’s Open Banking initiative.

Within the EU, Open Banking initiatives are currently regulatory-led. The EU’s Payment Services Directive (PSD2) mandated that, beginning in 2019, all EU Banks had to make it possible for customers to securely share their account information with other financial service providers. To date, Mastercard’s Open Banking Tracker shows that nearly 500 third-party providers have, in accordance with PSD2, registered to provide national regulators in the EU with consumer account information or payment services. Meanwhile, EU regulators continue to move ahead, with a review of PSD2 having begun earlier this year (2021). The final review will have significant implications for the current requirements, as well as for PSD3. Participants should expect a broader feature set from PSD3, such as more customer accounts (credit cards, loans, and more), non-financial industry data, customer authentication, fraud detection, and/or new contactless payment methods.

Within the US, Open Banking is industry-led. With no current regulations requiring Open Banking, many financial institutions are stepping forward on their own to make data available via API. Some also allow third parties to facilitate various actions from customers’ accounts, such as transferring money. However, unlike the UK, there is no one single US data-sharing specification, with the closest being the Financial Data Exchange (FDX). FDX is an industry-led, non-profit standards body operating within the US and Canada whose goal is to accomplish many of the same objectives as Open Banking.

Lack of data-sharing specifications has opened the door to many innovative FinTechs, such as Plaid, that are seizing the opportunity to act as a data-exchange layer, facilitating easy, secure access to multiple providers. Other FinTechs provide interesting new products for consumers and small businesses. Mint, for example, offers products for financial monitoring and planning, while Venmo enables effortless P2P payments.

Canadian Banks: Act quickly to shape Open Banking’s future

Canada is currently experimenting with a mix of regulatory-led and industry-led activities. It was regulatory-driven activities that produced the above-referenced final report on Open Banking, which stipulates that consumer data sharing infrastructure must be in place by January 2023.

At the same time, multiple industry-led initiatives are active within the Canadian market. Symcor hopes to build upon the success of COR.IQ, the fraud product it has successfully deployed in partnership with TD, RBC, and BMO. Meanwhile, numerous Canadian Banks, including the top five Canadian Banks (RBC, TD, Scotiabank via Tangerine, CIBC, and BMO) have joined the FDX.

Given the significant overlap of objectives between Canada’s industry- and regulatory-led initiatives, Canadian Banks must act quickly to develop an action plan or risk wasting substantial time and effort and potentially losing market share to upstart FinTechs.

It is imperative that Canadian banks come together to develop and present a refreshed Open Banking strategy that addresses the final report and ongoing activities. Canadian Banks must clarify how the report and industry-led initiatives will dovetail to enable the smooth and timely roll-out of Open Banking services within Canada. While taking a cooperative approach to rationalizing their efforts, these Banks must specify a clear regulatory response and compliance plan — and assume a prominent role in shaping Canadian banking’s future. Furthermore, these Banks must provide a clear answer to the question of coopetition between Banks, service providers, and others for the proposed solutions.

Canadian Banks must propose their own version of Open Banking, including a clear set of standards, a governance model, an accreditation process, an organization to manage and coordinate coopetition, and the technical specifications for secure data exchange. This version of Open Banking will likely result from blending existing initiatives and the August 2021 final report.

Ultimately, the path Canada chooses will determine the range and type of opportunities available as a result of Open Banking.

Significant work ahead to prepare for Open Banking

Consumer expectations for more digitally enhanced and personalized banking services are proliferating. As such, Canadian Banks must quickly assess their existing internal capabilities to make sure they are ready for Open Banking. They must ensure that several key capabilities are in place if they are to effectively capitalize on this emerging trend. These include:

An operating model, organization, and capabilities to deliver on the final plan at scale and pace
Solid technology infrastructure and capabilities to support scalable, data-intensive APIs — along with the software development capabilities to iterate and improve over time
A rapid innovation and commercialization capability to develop and implement data-driven Open Banking use cases such as personalized product offerings, customer retention management, and enhanced credit risk tailored for retail and SME customers
A fully analytics-empowered organization within the retail and business Bank with business processes targeted and optimized for the use of analytics to drive customer satisfaction and Bank operating effectiveness
A bolder, faster approach to developing partnerships, ecosystems for aggregation, and other use cases as required

A timeline for Open Banking winners

With 2023 set as the target date for launching Canada’s initial Open Banking capabilities, Canadian Banks have little more than a year to prepare. Given this abbreviated timeline, the next three months will be critical. Within this challenging time frame, Banks must accelerate their own actions to develop necessary capabilities and assert their role in shaping the country’s future Open Banking model. We propose the following 18-month high-level timeline to get started:

First 3 Months: Accelerate and expand Open Banking enablers

Set up core business processes and technology infrastructure necessary to ensure success, drive decisions fast, and deliver on no-regret actions.
Build out core data and analytics infrastructure required to enable key use cases.
Develop an initial set of key customer-focused use cases, covering both retail and SME, such as cross-sell/upsell and retention.

Following 6 Months: Innovate and incubate

Build out the regulatory, data-exchange utility, aggregation, and technology capabilities required at pace.
Continue developing key analytics use cases such as credit-risk monitoring.
Continue building the critical analytics capabilities, such as omnichannel orchestration and experimentation tracking, necessary to experiment at scale.

Final 6-plus Months: Commercialize and Implement

Prepare for launch, complete with commercially viable and ready-to-deploy solutions.

It is not hyperbole to state that the outcome of the next 18 months will shape the structure of the Canadian banking industry for years to come. Structure drives conduct, which ultimately drives performance, which, in turn, will shape the industry’s future. Banks that have not come up to speed quickly enough stand to lose market share to Banks (and FinTechs) that have. The winners will be those Banks that move quickly, invest aggressively, and focus on regulatory-compliant commercial outcomes.

Canadian Open Banking Initiative Presents Sterling Opportunity for Banks was originally published in GAMMA — Part of BCG X on Medium, where people are continuing the conversation by highlighting and responding to this story.

E2E Planning: What comes after the Bottleneck Economy?

Daniel Sack — Wed, 24 Nov 2021 15:50:43 GMT

Think demand planning during a period of constrained supply is difficult? Just wait for the period that follows. When supply once again exceeds demand, as it surely will, statistical forecasts trained on constrained-demand data will no longer work. How then will companies accurately forecast the future?

By: Daniel Sack, Olivier Bouffault, Marcel Sieke, Rajesh Shetty, and Slava Bazaliy

The holidays are upon us, and we can expect a continuation of markets in which demand exceeds supply. But what happens when supply chain constraints are relaxed? CPG, Fashion and Luxury, and Retail companies need to start upping their game so they can accurately plan ahead for when supply exceeds demand, for that day will surely come. And when it does, those companies that have not prepared themselves will lose sales and customers. For example, CPG players may find themselves deprioritized by e-tailers due to poor service levels or lose customers who switch brands and/or stores when the products they want are not on the shelves. This is even more relevant today, given large channel shifts in many categories from brick-and-mortar to e-commerce. Accurate forecasting is essential to manage the high volatility in demand in the years to come. BCG’s PLAN AI is designed to give companies the advanced tools they need to look ahead with confidence.

Planning when “true” demand is not easily measured

Modern end-to-end (E2E) planning frequently relies on decision-support and planning-collaboration tools offered in off-the-shelf software suites. These tools use sophisticated statistical forecasts and, often, machine learning to construct demand forecasts. The forecasts are then used to generate recommendations on quantities to produce or buy and where to allocate them, and to support other key decisions.

Prior to COVID, consumer brands had a long-standing bias toward overproduction and overbuying. This bias was driven by a myriad of internal factors such as growth aspirations, incentive structures, and process design. While this often resulted in unsold inventory and lower full-price realization, it did provide an easy training ground for statistical forecasting tools built into the off-the-shelf planning suites.

As supply is increasingly constrained, actual sales diverge from the “unconstrained” demand signal that demand forecasts attempt to predict.

Decisions to overbuy and overproduce result in more supply than needed to meet demand, which means that “true” demand opportunity in the market is more easily observable, and stockouts are relatively limited. Statistical forecasts are capable of “uncensoring” or “unconstraining” historical-demand signals, but that becomes increasingly difficult as the proportion of stockouts gets closer to 100%. Our current supply-constrained period means that statistical forecasts will have insufficient training data to generate these unconstrained demand forecasts.

Statistical forecasting models included in most off-the-shelf planning software suites underestimate actual sales when trained on supply-constrained historical sales data — the more constrained, the greater the difference between predicted and actual.

At present, the inability to create accurate forecasts is less important since most companies are opting to produce/buy as much as possible to make up for pandemic-induced shortages. But going forward, once supply chains are again able to fulfill ordered quantities, produce/buy decisions will become far more important. The statistical forecasts underpinning these recommendations will be derived from planning tools that have been trained on systemically constrained demand data. They will, therefore, underestimate demand, which can lead to underbuying, underproduction, and tremendous amounts of uncaptured value. This, in turn, will result in out-of-stocks and lost sales, which will only slow the process of generating the “observations” upon which these statistical models depend — and further impede the recovery.

Building accurate demand models

In periods when the “unconstrained” data typically available during periods of overbuying and overproduction are absent, companies must draw on a more diverse and timely data set to generate recommendations. Adding data from external sources that may have correlations with demand, but are not supply constrained, can give valuable insight into what the “true” demand (the “counterfactual”) might have been. Complementing those external data sources with more real-time internal data, such as signals from short-term demand sensing models, can also bring additional value and speed to decision making.

One of the most powerful sources of demand planning information is embedded institutional knowledge — the human intelligence that resides among people whose job it is to make important decisions such as buying quantification, assortment localization, allocation, or replenishment. This intelligence, which companies have not typically been able to leverage, can be extremely useful when systematically captured and fed into machine learning models. In our experience, in fact, data generated from institutional knowledge are among the most, if not the most, predictive features in statistical forecasts.

Data scientists must do several things to leverage human intelligence fully:

Revisit the machine learning underpinning the statistical forecasts, putting in more explicit unconstraining mechanisms and allowing for explicitly injecting additional assumptions by a human planner.
Incorporate more granularity (particularly on the time dimension) to allow for more accurate extrapolation after stockouts, including the possibility of considering different future-demand scenarios.
Integrate the richer set of data described in the previous two paragraphs (external sources and institutional knowledge).

Alternatively, data scientists can embrace Bayesian methods to build bespoke models enabling easier adjustments of macro assumptions.
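As a rough illustration of how a Bayesian model can combine an explicit unconstraining mechanism with a planner's assumption, the sketch below computes a grid posterior for a weekly demand rate. Everything in it is an assumption made for illustration rather than a description of PLAN AI: demand is taken to be Poisson, a stockout week is treated as "demand was at least the stocked quantity", the planner's view enters as an exponential prior with mean `planner_guess`, and the weekly figures are made up.

```python
import math

def log_pmf(k, lam):
    """Log of the Poisson probability of exactly k units of demand."""
    return -lam + k * math.log(lam) - math.lgamma(k + 1)

def log_tail(k, lam, extra=200):
    """Log of P(demand >= k), summed directly over the upper tail."""
    return math.log(sum(math.exp(log_pmf(j, lam)) for j in range(k, k + extra)))

def posterior_mean_demand(observations, planner_guess, grid_max=60.0, step=0.5):
    """Posterior mean of the demand rate on a grid of candidate values.

    observations: (units_sold, units_stocked) pairs, hypothetical data.
    planner_guess: planner's prior mean for demand (exponential prior).
    """
    grid = [step * i for i in range(1, int(grid_max / step) + 1)]
    log_post = []
    for lam in grid:
        total = -lam / planner_guess              # log prior, up to a constant
        for sold, stocked in observations:
            if sold < stocked:                    # demand fully observed
                total += log_pmf(sold, lam)
            else:                                 # stockout: demand was censored
                total += log_tail(stocked, lam)
        log_post.append(total)
    peak = max(log_post)                          # subtract the peak for stability
    weights = [math.exp(v - peak) for v in log_post]
    return sum(l * w for l, w in zip(grid, weights)) / sum(weights)

# Eight hypothetical weeks with 10 units stocked; six of them sold out.
weekly = [(10, 10), (10, 10), (7, 10), (10, 10), (10, 10), (9, 10), (10, 10), (10, 10)]
naive = sum(sold for sold, _ in weekly) / len(weekly)
estimate = posterior_mean_demand(weekly, planner_guess=15.0)
print(f"average sales: {naive:.1f}, unconstrained estimate: {estimate:.1f}")
```

Changing `planner_guess` is the kind of macro-assumption adjustment the paragraph above refers to: the planner's belief shifts the estimate without any retraining on new sales history.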

PLAN-ing for the future

And they can turn to BCG’s PLAN AI. The PLAN AI framework addresses these three points, uses Bayesian methods to transform the end-to-end planning processes, and automates two critical tasks. First, it integrates with BCG Lighthouse, a high-frequency data and analytics platform that captures real-time consumer and market-level signals to anticipate future demand. Lighthouse can introduce into the planning process any number of diverse, external datasets to help forecast uncensored or unconstrained demand. Second, it leverages custom-made BCG products such as our patent-pending Ranking Application to integrate human intelligence into planning software and the algorithms on which it is built. With these two resources built into PLAN AI, BCG can help companies conduct highly accurate demand forecasting regardless of the supply/demand balance.

In a perfect world with perfect data, this forecasting would be automated by the out-of-the-box features of off-the-shelf planning tools. But, alas, the world is more complicated than that. Especially in times of great market volatility, the algorithms that drive demand forecasting must be tailored to the specifics of your business and must incorporate the most relevant external data and as much human intelligence as is available.

Here in the real world, where business challenges always seem to outpace the technical solutions we devise to solve them, PLAN AI, supported by BCG Gamma’s proven ability to customize forecasting algorithms, can provide the high level of confidence and flexibility companies need to successfully plan for the future.

In the meantime, and until supply chains return to their former strength, BCG can assist in “de-bottlenecking” the supply side by:

Simplifying product portfolios using our value-driven algorithms to optimize product mix and help companies move to the next step and create optimal produce/buy plans
Working closer with company suppliers and taking the integration of supply constraints to the next level in supply plan scenarios
Strengthening demand shaping by matching price and price/promo information with the supply and inventory situation to grow market share

Companies that begin now to strengthen their demand-planning capabilities will be able to accurately match supply to demand once supply returns — regardless of biased input data. In doing so, they can look forward to both reducing waste and maximizing value.

E2E Planning: What comes after the Bottleneck Economy? was originally published in GAMMA — Part of BCG X on Medium, where people are continuing the conversation by highlighting and responding to this story.

Solving Complex Supply Chains with Adversarial Optimisation

Kelvin Hsu — Wed, 17 Nov 2021 08:02:36 GMT

Decision making in complex systems often involves search spaces of an explosive number of possibilities and outcomes. By “letting trade-offs fight to equilibrium”, insights that reveal ways to optimise these systems start to emerge.

By Kelvin Hsu, Thomas Sandeman, and Artem Vladimirov

Optimising large supply chain networks is an essential part of reducing inefficiencies and capturing missed opportunities, and it can potentially generate or salvage tremendous value across the entire system at scale. Companies may cut costs by shoehorning off-the-shelf optimisers into this critical task. But these readymade products often fail to grasp the underlying network dynamics and, instead, treat the system as a black box to be optimised using a top-down objective. Solutions obtained this way are rarely insightful, are often inefficient, and can take longer to complete as they struggle to navigate large, complex search spaces.

For more than 20 years, BCG has been a pioneer in the field of simulation, including the development of digital supply chain twins. In this article, we share how we developed an “adversarial” approach to optimise supply chains for a steel producer by driving trade-offs inherent within the network dynamics to equilibrium.

Fighting to equilibrium

In complex systems, decision making often calls for the careful balancing of trade-offs. It requires searching through a staggering number of possibilities and outcomes, and navigating through countless intricate dynamics that determine how, subject to various uncertainties, individual components may interact over time. The process of balancing trade-offs can be particularly challenging when optimising supply chains in operations-centric industries such as steel manufacturing.

Given their complexity, finite systems cannot avoid creating trade-offs. One such trade-off might be physical, as in the need to balance steel mass conservation constraints throughout a supply chain. At other times, it might involve the need to balance priorities from competing production demands. Insights into how these trade-offs interact can be uncovered using a bottom-up approach that simulates the larger system to highlight the interplay between constituent parts. By taking an adversarial approach that involves “letting trade-offs fight to equilibrium”, these insights can reveal ways to optimise the system. Solutions obtained in this manner are often easier to interpret and faster to reach because of the way they explicitly incorporate the dynamics of the problem. This can make the difference between being able to arrive at an approximate solution or, given practical time and resource constraints, never arriving at any solution at all.

A case for optimising inventories for contract exports

Typically, our clients prioritise supplying the domestic market, exporting excess volumes only “as available”. We refer to these exports as spot-exports. These spot-exports serve as a release valve for any remaining excess capacity at various points of the supply chain. Given the uncertainty from domestic demands of each steel product and capacity levels for each production asset, these excesses can vary from week to week.

Spot-market prices and margins are typically much lower than those achievable through long-term contracts. This presents an opportunity to lock in higher-margin contract exports, then utilise inventories throughout the supply chain to buffer volatility.

The challenge is to determine the right monthly contract export volume for each available steel product and the month-by-month inventory targets across each point of the supply chain. This is made more complex when a supply chain is:

· Constrained: The supply chain network is distributed across the nation with point-of-production constraints.

· Dynamic: Events and decisions in previous months affect those in future months.

· Stochastic: There is uncertainty from volatilities in production capacities and domestic demands.

Abstracting supply chain dynamics for simulation

A simulation approach allows us to capture the stochastic nature of demand and supply. Determining the right level of abstraction is key to balancing two primary concerns that often pull in opposite directions because of the need to:

· Capture enough intricacies so that solutions can be truly useful

· Make assumptions and simplifications so that problems remain feasible

To structure the simulation, we broke the supply chain into generic components and created rules that defined how each component should interact with the others. This allowed us to construct a highly configurable supply chain across multiple echelons.

We used two core objects, production assets and demand handlers (green boxes), to describe the supply chain. The objects are characterised by physical or priority rules that govern the monthly flow of steel mass through the supply chain.

To abstract one level further, production assets need not correspond to physical machines. Instead, they are characterised by the type of output they create. This means that each asset’s available capacity is the sum of the available capacities from the part of each and every physical machine that produces its specific characterising output. This one-output-per-production-asset simplification allowed us to further map inventory points one-to-one to these abstracted production assets. We then placed these points at the output of each asset (white boxes).

Capacity uncertainties are still allocated to physical machines so that statistical independence remains across physical machines instead of across the abstracted production assets. As such, the production of each asset at each point in time depends on the combined capacity and working time available to each machine, which also factors in scheduling and planned-maintenance impacts.

Each of these assets is responsible for letting its upstream know how much feed it requires. Each asset must consider any feed requirements from its own downstream, along with its own demands (domestic and export) and inventory targets.

Demand handlers are characterised by the type and priority of products whose demand they handle, as well as by the production assets they point to as suppliers. The handlers are responsible for distributing demands as production requests to each supplier at the start of the period, and for fulfilling demands at the end of the period with the resulting products, in the right priority order. To capture both long-term forecast errors and short-term demand variability, we encoded uncertainty separately between annual demand projections and monthly demand variability.

The result was a supply chain represented by a highly configurable and flexible directed acyclic graph. The graph was formed by composing production assets and associated inventory points throughout and ended with demand handlers at its leaves.
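A minimal sketch of how feed requirements might propagate upstream through such a graph is shown below. It is not the authors' implementation: the `Asset` class, the restriction to a single upstream supplier per asset, the one-to-one mass conversion (no yield loss), and the product names are all simplifying assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Asset:
    """Abstracted production asset with one inventory point at its output."""
    name: str
    upstream: Optional[str]        # name of the asset supplying feed, if any
    own_demand: float              # domestic plus export demand for this output
    inventory: float = 0.0
    inventory_target: float = 0.0

def feed_requests(assets: List[Asset]) -> Dict[str, float]:
    """Required production per asset; `assets` must be listed upstream-first."""
    required = {a.name: 0.0 for a in assets}
    for asset in reversed(assets):  # downstream first, so their needs are known
        restock = max(0.0, asset.inventory_target - asset.inventory)
        required[asset.name] += asset.own_demand + restock
        if asset.upstream is not None:
            # Mass conservation: every unit produced needs a unit of feed.
            required[asset.upstream] += required[asset.name]
    return required

# Hypothetical three-echelon chain.
chain = [
    Asset("slab", upstream=None, own_demand=10.0),
    Asset("hot_rolled", upstream="slab", own_demand=30.0),
    Asset("coated", upstream="hot_rolled", own_demand=20.0, inventory_target=5.0),
]
required = feed_requests(chain)
print(required)
```

Each asset's requirement combines its own demand, its inventory restocking, and whatever its downstream asked for, which mirrors the rule that every asset tells its upstream how much feed it needs.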

With the network structure of the supply chain defined, we could then focus our attention on structuring the execution of the optimisation, as well as on the temporal stages of each simulation. In general, we find it helpful to define abstraction layers early on to help create structure and break down the overall optimisation. The layers include:

1. Optimisation: A single optimisation run is comprised of multiple iterations, with each iteration being a single simulation.

2. Simulation: A single simulation is comprised of multiple playouts under the same initial conditions, with the difference between playouts arising from stochasticity in domestic demand volumes and production capacities. (We typically use 100 playouts.)

3. Playout: A single playout is comprised of multiple time periods. The time periods can be either months or weeks. (In practice, we found months to be a more appropriate period.)

4. Time Period: A single time period is comprised of multiple stages.

5. Stages: A single stage is comprised of multiple unit steps, each of which is an instance where a single production asset or a single demand handler performs some action.
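The five layers above nest naturally as loops. The skeleton below is only a structural sketch under assumed names (`run_time_period`, `run_playout`, `run_simulation`, `run_optimisation`); the unit steps are placeholders that do no real work, and a real optimisation would adjust decision variables between iterations.

```python
import random
from typing import Callable, List, Sequence

UnitStep = Callable[[random.Random], None]   # one asset or handler acting once
Stage = Sequence[UnitStep]                   # ordered unit steps

def run_time_period(stages: Sequence[Stage], rng: random.Random) -> int:
    """Layers 4 and 5: run every stage in order and count the unit steps."""
    steps = 0
    for stage in stages:
        for unit_step in stage:
            unit_step(rng)
            steps += 1
    return steps

def run_playout(stages: Sequence[Stage], n_periods: int, rng: random.Random) -> int:
    """Layer 3: one playout is a sequence of time periods (months here)."""
    return sum(run_time_period(stages, rng) for _ in range(n_periods))

def run_simulation(stages: Sequence[Stage], n_periods: int = 12,
                   n_playouts: int = 100, seed: int = 0) -> int:
    """Layer 2: many playouts that differ only through random draws."""
    rng = random.Random(seed)
    return sum(run_playout(stages, n_periods, rng) for _ in range(n_playouts))

def run_optimisation(stages: Sequence[Stage], n_iterations: int = 3) -> List[int]:
    """Layer 1: each iteration is one full simulation."""
    return [run_simulation(stages, seed=i) for i in range(n_iterations)]

def placeholder_step(rng: random.Random) -> None:
    rng.random()   # stands in for a stochastic action by an asset or handler

stages = [[placeholder_step, placeholder_step], [placeholder_step]]
steps_per_iteration = run_optimisation(stages, n_iterations=2)
print(steps_per_iteration)
```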

In the 4th layer (Time Period), we broke down the supply chain dynamics within each period into distinct stages. In doing so, we paid particular attention to how operational priorities were factored in and how the states that emerged were interpreted. This approach had the added advantage of being easy to pause so that we could stop and interpret each step at any point. The five action stages are:

Stage 1: Pull signal creation: Domestic and contract export demands ask their suppliers (assets) for production.

Stage 2: Pull signal execution: Assets execute planned productions.

Stage 3: Pull signal resolution: Demands take supplies from suppliers.

Stage 4: Push execution: Assets execute with remaining capacity.

Stage 5: Push resolution: Assets remove excess via spot exports.

In the 5th layer (Stages), we paid particular attention to executing unit steps in a way that encoded priorities. Some of these priorities were plain physical requirements. For example, upstream assets had to execute productions first so that downstream assets would have feed to execute theirs. At other times, these priorities encoded business requirements. For example, demand handlers for domestic demands generally should have a higher priority than those for contract exports, or producing feed for downstream products should come first before meeting demands for the midstream product in question.
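To illustrate how the five stages and the priority ordering might play out, here is a toy single-asset period. It is a sketch under strong simplifying assumptions, not the model described in this article: there is one asset with no upstream feed constraint, domestic demand always outranks contract exports, and the function and field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PeriodResult:
    domestic_met: float
    contract_met: float
    spot_export: float
    inventory: float

def run_period(capacity, inventory, inventory_target, domestic, contract):
    """One time period for a single asset, following the five stages."""
    # Stage 1, pull signal creation: demands ask the asset for production.
    requested = domestic + contract
    # Stage 2, pull signal execution: produce what inventory cannot cover.
    planned = min(capacity, max(0.0, requested - inventory))
    available = inventory + planned
    # Stage 3, pull signal resolution: domestic demand is served first.
    domestic_met = min(domestic, available)
    available -= domestic_met
    contract_met = min(contract, available)
    available -= contract_met
    # Stage 4, push execution: use whatever capacity is left over.
    available += capacity - planned
    # Stage 5, push resolution: anything above the target leaves as spot export.
    spot_export = max(0.0, available - inventory_target)
    return PeriodResult(domestic_met, contract_met, spot_export,
                        available - spot_export)

surplus = run_period(capacity=120.0, inventory=10.0, inventory_target=20.0,
                     domestic=60.0, contract=30.0)
shortfall = run_period(capacity=70.0, inventory=0.0, inventory_target=20.0,
                       domestic=60.0, contract=30.0)
print(surplus)
print(shortfall)
```

In the shortfall case the contract export absorbs the whole gap because domestic demand is resolved first, which is the business priority described above.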

An adversarial approach to inventory-export optimisation

A simulation on its own can give you only a range of likely results for a given scenario. We needed to determine the optimal levels at which the steel producer’s contract exports would lock in, along with the corresponding inventory targets required to ensure that domestic demands continue to be met. The large number of products and inventory points created a huge combination of options from which to choose, which in turn required a sophisticated approach to finding the optimal configuration.

It was at this point that we used the adversarial approach that alternates between an “attack” stage and multiple “defense” stages. Each attack stage increases the levels of contract exports to lock in through spot-to-contract conversions for each product. Each defense stage increases inventory targets across the supply chain to buffer volatility in demands and capacities. Both stages are tunable in terms of how aggressive or conservative the steps are.

The two types of stages are adversarial in the sense that they work against each other: Raising commitments for contract exports presents a higher risk of not meeting demands, while raising inventory targets leaves less volume for exports. This represents a trade-off. The first stage is an “attack” because spot-to-contract conversions are the direct source of value we are attempting to capture. The latter stages are “defensive” because, by protecting existing demand commitments, they serve to mitigate the side effects of capturing this value.

Before allowing an attack stage to execute, we repeated the defense stages until the constraints were met. Consequently, each attack stage was followed by multiple iterations of defense stages. In essence, we struck at opportunities as they came, but only if they did not jeopardise our bottom line.

What resulted was an adversarial optimisation procedure that rapidly converged to a stable solution, with each optimisation stage or step being highly interpretable in terms of its purpose. By identifying the trade-offs involved in directly driving the source of value, we were able to focus the optimisation on these trade-offs directly and avoid the need to estimate value in dollar margins.

The choice of step size is important. As contract exports can come only from executing spot-to-contract conversions, a reasonable choice of the magnitude of increase in each step of the “attack stage” should depend on the amount of spot exports currently being done. Specifically, we make these choices dependent on the quantiles of the spot-export distribution. The corresponding percentile is expressed as a hyperparameter and endowed with a learning rate of less than one so that convergence will be smoother and without oscillations. Since this move serves to increase the objective, there is no need to take aggressive steps. In practice, we choose the 0th quantile, which is simply the minimum, making these the most conservative steps in spot-to-contract conversions.

Increasing export commitments, however, can raise the risk of higher unmet demands for existing commitments. We know that having higher inventory levels would provide better buffers to meet demand in cases of above-average domestic demands and below-average production capacities. As such, increasing export commitments can serve as a driving force for increasing inventory targets, while also hinting that the magnitude of the increase should depend on the amount of currently experienced missed demands. Again, we make the increase in inventory targets dependent on the quantiles of the missed-demands distribution, with the corresponding percentile expressed as a hyperparameter. Since this move serves to meet constraints that cannot be compromised, we want to do this aggressively. In practice, we choose the 100th quantile, which is the maximum, making these the most aggressive steps to remove unmet demands.
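The two step-size rules can be sketched as follows. The quantile levels (0 for attack, 1 for defense) come from the text; the learning rate of 0.5, the function names, and the sample values are assumptions chosen for illustration, and the text does not say whether the defense step also carries a learning rate, so none is applied here.

```python
def quantile(values, q):
    """Linear-interpolation quantile of `values` for q between 0 and 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

def attack_step(spot_exports, q=0.0, learning_rate=0.5):
    """Spot volume to convert to contract: a damped low quantile of spot exports."""
    return learning_rate * quantile(spot_exports, q)

def defense_step(missed_demands, q=1.0):
    """Inventory-target increase: a high quantile of missed demand."""
    return quantile(missed_demands, q)

# Hypothetical per-playout results from one simulation.
spot_exports = [12.0, 30.0, 18.0, 25.0]
missed_demands = [0.0, 0.0, 4.0, 1.5]
print(attack_step(spot_exports), defense_step(missed_demands))
```

With q at 0 the attack only converts volume that was spot-exported in every playout, and with q at 1 the defense covers the worst shortfall seen, which matches the conservative and aggressive choices described above.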

After the two opposing stages interact for a period of time, they eventually reach a state of equilibrium, via scheduling of monthly inventory targets and export commitments. This is a state where as much contract export volume as possible has been squeezed from spot exports for each product, without jeopardising existing demands and commitments.
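To show the attack and defense stages reaching equilibrium, the sketch below runs the loop on a toy single-product model. It is an illustration only, not the procedure used for the steel producer: capacity and domestic demand are drawn from made-up uniform ranges, every playout starts with inventory at the target, the same random seed is reused on every iteration so that simulations stay comparable, and a single constant contract volume and inventory target stand in for the month-by-month schedules.

```python
import random

def simulate(contract, target, n_playouts=100, n_periods=12, seed=0):
    """Per-playout (average monthly spot export, total missed demand)."""
    rng = random.Random(seed)              # same draws on every iteration
    spot_per_playout, missed_per_playout = [], []
    for _ in range(n_playouts):
        inventory, spot_total, missed_total = target, 0.0, 0.0
        for _ in range(n_periods):
            capacity = rng.uniform(80.0, 120.0)    # assumed range
            domestic = rng.uniform(70.0, 100.0)    # assumed range
            available = inventory + capacity
            met = min(domestic + contract, available)
            missed_total += domestic + contract - met
            available -= met
            inventory = min(available, target)     # hold stock up to the target
            spot_total += available - inventory    # the rest is spot-exported
        spot_per_playout.append(spot_total / n_periods)
        missed_per_playout.append(missed_total)
    return spot_per_playout, missed_per_playout

def defend(contract, target, tol):
    """Defense: raise the inventory target until no playout misses demand."""
    spot, missed = simulate(contract, target)
    while max(missed) > tol:
        target += max(missed)              # 100th quantile: most aggressive
        spot, missed = simulate(contract, target)
    return target, spot

def adversarial_optimise(learning_rate=0.5, tol=0.01, max_attacks=1000):
    contract = 0.0
    target, spot = defend(contract, 0.0, tol)
    for _ in range(max_attacks):
        step = learning_rate * min(spot)   # 0th quantile: most conservative
        if step < tol:                     # nothing left worth converting
            break
        contract += step                   # attack: spot-to-contract conversion
        target, spot = defend(contract, target, tol)
    return contract, target

contract, target = adversarial_optimise()
print(f"contract volume per month: {contract:.1f}, inventory target: {target:.1f}")
```

Because each attack is followed by defense until the constraint holds, the returned pair never leaves demand unmet in any simulated playout, and the attack steps shrink as spot exports are used up.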

In fact, because we encode the priorities to ensure that downstream products receive higher priorities during the simulation, we are able to capture another type of opportunity in addition to spot-to-contract conversion: Product upgrades. We were able to move more products further downstream in the supply chain than previously possible by supplementing feed bottlenecks with additional inventories. Steel products produced further downstream also fetch higher premiums.

Lessons in effective abstractions and heuristics

Effective abstractions enable useful representations of dynamical systems for simulation and scenario generation. Heuristics then exploit known system characteristics and dynamics to quickly reach sensible solutions. Together, by forming an in-depth, bottom-up understanding of system properties, abstractions and heuristics make solving a large and complex problem both feasible and practical.

For physical systems such as a wide network of supply chains, there are often highly interpretable heuristics that can be leveraged to push the search in the right direction and promptly arrive at a solution. Additional effort may be needed at first to understand the operational dynamics and driving sources of value in the system. However, by uncovering these heuristics, this early investment pays off in the practicality and interpretability of the resulting solution.

We learned numerous lessons from this experience:

1. When the system dynamics are complex, it is key to balance the level of granularity and detail for the simulation. On one hand, it can be useful to identify the natural, foundational “atoms” of the complex system and view all subsequent dynamics as derivations from a minimal set of logical rules that describe how these atoms interact with each other. On the other hand, modeling at this level can be prohibitively expensive or impractical. It is often worthwhile to formulate the right level of abstraction rather than treating the system as a black box into which an existing technique must be shoehorned.

In our case, we were able to simplify and make feasible the problem space by first identifying that the type and volume of mass flowing throughout the system each month was enough to provide a practical recommendation for inventory scheduling.

2. If a decision can be made “without regret”, the approach can be simplified by incorporating the decision directly within the simulation. We can often make these kinds of decisions with the help of a few mild assumptions. If there are no complicated trade-offs involved in a decision, it may not be necessary to include them as part of the optimisation. We can encode some of the necessary conditions for the optimal solution in the way we choose to simulate the system.

In our case, these conditions were encoded via the ordering of our five action stages. For example, we assumed that, all things being equal, downstream products fetch higher margins than upstream products. Under this assumption, the system should have been pushing steel as far downstream through the supply chain as physically possible, but only after existing demands were met. Importantly, this was one of those decisions we were able to make “with no regrets” in that, once existing demands were met, there was no other reason (except physical feed constraints) to prevent us from making this decision. This is unlike performing spot-to-contract conversions or increasing inventory buffers where we know there are secondary consequences that can be calculated only with an optimisation procedure. Therefore, as long as we accept the assumption, it does not matter how much higher the margin is: We will always be better off making this decision.

3. When the search space is large, top-down specifications of a final objective can be inefficient or difficult to solve for. While each individual component and step of the simulated dynamic supply chain is relatively simple, the combined system can be quite complex. If we were to treat the system as a black box and apply an off-the-shelf generalised optimisation algorithm, it would very likely take many iterations to converge and be difficult to debug or interpret. This is because the algorithm would have limited problem-specific guidance for searching over the solution space.

In our case, if we were to specify a top-down objective over which to optimise, such as the expected profit margin, we would not have been able to easily obtain gradients of the objective with respect to the decision variables. These variables included inventory targets across the supply chain and the contract export volume to commit for each product. Importantly, forming the financial objective itself requires another layer of estimation regarding the margin from both domestic and contract export sales. Furthermore, it requires that stochasticity caused by other external factors be captured throughout the year.

4. Thinking about effective abstractions and heuristics when facing a new problem can reveal useful properties to reason about the system. This not only helps produce results efficiently but can produce useful insights from the way the solution is reached.

In our case, both the attack and defense stages corresponded respectively to what a decision maker would do directly to pursue higher margins and separately protect existing commitments. We can let the optimisation determine the exact trade-off between the two stages. We make the important directional decisions and let the system balance itself out. Furthermore, because the drivers of each stage and step of the optimisation are clear, it becomes an intuitive process to tune exactly how aggressive or conservative they should be, or to spot where any imbalances, bottlenecks, or even opportunities may lie within the system.

5. Coming up with tailored optimisation strategies can be effective, insightful and — last but not least — fun! Simulating a system from the ground up raises all sorts of curious questions about the core properties and scenarios one must capture. Optimising it further requires iterating (with human discussions) on the properties of a good solution before boiling it down to smaller steps (the computer iterations) to get there.

The problem-formulating process is as dynamic as the problem itself and brings great satisfaction and joy when it eventually leads to an effective and insightful solution.

Balancing abstraction levels as a critical problem-solving trade-off

In this article, we have shared how we formulated an adversarial approach to balance difficult trade-offs. Once formulated, algorithms would be able to figure out what the correct balance should be. However, there remains another critical trade-off that algorithms cannot necessarily help us with but that is, perhaps, even more pressing to balance correctly: Finding the right abstraction level across a wide spectrum ranging from formulating a generic abstract framework to developing a domain-specific solution.

Decision making in complex systems often involves search spaces of an explosive number of possibilities and outcomes. On the top-down extreme, we can invest efforts in constructing a value objective to be optimised under a general-purpose optimisation algorithm. These solutions may benefit from the simplicity and clarity in what the optimal solution should look like, but may lack the in-depth knowledge of the problem at hand needed to solve it efficiently. On the bottom-up extreme, we can choose to use highly granular and detailed agent-based or even physics-based simulations to model the problem. These solutions may benefit from a forward model that most clearly replicates reality, but may also require so much detail that the truly important aspects of the problem are obfuscated.

Good solutions require a careful balance within this spectrum. Throughout this effort to formulate a domain-specific solution for a large supply chain network, we worked with domain experts in the steel manufacturing industry to understand where the major levers are, what the important drivers are, how critical factors influence the system, and which level of detail the solution should capture. It is this collaboration between industry and functional experts that is critical in determining the abstraction level at which the problem can best be solved.

Solving Complex Supply Chains with Adversarial Optimisation was originally published in GAMMA — Part of BCG X on Medium, where people are continuing the conversation by highlighting and responding to this story.