The complex challenge of demand forecasting for business

Authors: Yann Hendel & Olivier Kahn

Demand forecasting is critical to businesses across almost all industries. It can seem easy, because there are easy ways to build simple models. But in practice, building a demand forecasting model that is accurate and useful is a complex challenge.

As an illustration, below are four types of models, each with more complexity than the one before it.

Exhibit 1: Four demand forecasting models, each of which is more complex than the one before it

Most companies are doing (1) or (2), and the most advanced ones have implemented (3). But very few are actually working on (4), which is where the true value can come from. Indeed, each use case calls for its own specific tools, so an off-the-shelf solution is unlikely to yield the level of accuracy desired.

Choosing the right algorithms

When demand forecasting is not critical to the business, simple models will suffice. But when the lion’s share of operational decisions is based on it, having a robust and accurate model is key. Among the industries where demand forecasting is business critical are:

· Retail: to determine the amount of a particular product that should be available and at what price, via which channels, etc.

· Industrial manufacturing: to decide how much of each product should be produced, the amount of stock that should be available at various points in time, when maintenance should be performed, etc.

· Travel and tourism: to assess, in light of available capacity, what price should be assigned (for hotels, flights), which destinations should be highlighted, what types of packages should be advertised, etc.

Traditionally, demand forecasting has largely been done using time-series algorithms. Such techniques rely on signal extrapolation, whereby trends, seasonality, and cycles that occurred in the past are used to forecast demand in the future. This method works well when the forecast horizon is short, when the demand has some stationarity properties (i.e., trends and cycles can be identified), and when the number of variables that influence demand is limited.
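To make signal extrapolation concrete, here is a minimal sketch that fits a linear trend to past demand, averages the leftover seasonal pattern, and projects both forward. It is a deliberately simple stand-in for real time-series methods, and the function name, the additive seasonality assumption, and the synthetic quarterly data are all illustrative choices rather than anything prescribed above.

```python
from statistics import mean


def seasonal_trend_forecast(history, season_length, horizon):
    """Extrapolate a series as a linear trend plus additive seasonal offsets."""
    n = len(history)
    if n < 2 * season_length:
        raise ValueError("need at least two full seasons of history")

    # Ordinary least-squares line through (t, demand) for t = 0..n-1.
    t_mean = (n - 1) / 2
    y_mean = mean(history)
    slope = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history)) / sum(
        (t - t_mean) ** 2 for t in range(n)
    )
    intercept = y_mean - slope * t_mean

    # Seasonal offset = average detrended value at each position in the cycle.
    residuals = [y - (intercept + slope * t) for t, y in enumerate(history)]
    seasonal = [mean(residuals[p::season_length]) for p in range(season_length)]

    # Project the trend forward and add back the matching seasonal offset.
    return [
        intercept + slope * t + seasonal[t % season_length]
        for t in range(n, n + horizon)
    ]


# Hypothetical quarterly demand: steady growth plus a repeating seasonal pattern.
pattern = [5, -5, -5, 5]
history = [100 + 2 * t + pattern[t % 4] for t in range(12)]
forecast = seasonal_trend_forecast(history, season_length=4, horizon=4)
```

On this synthetic series the sketch simply continues the pattern for four more quarters. The further out the horizon, and the more outside drivers matter, the less such an extrapolation can be trusted.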

However, if those conditions are not met (for example, if the forecast must be made far in advance), and the demand is driven by product characteristics (quality, price, etc.) or by location and people (store, occasion), then feature-based algorithms, which rely on an abundance of data, are preferable.

The use of feature-based algorithms has been made possible, in large part, by four factors:

· The development of feature-based algorithms such as random forest, gradient boosting, and deep learning, which take advantage of data quantity to increase predictive power

· The creation of advanced sequential and “signal extrapolation” models, such as recurrent neural networks (RNNs) and long short-term memory (LSTM) networks, which can take in more features and perform more granular analysis than simpler models, improving the accuracy of outcomes

· The emergence of new data lake technologies such as Hadoop and Spark, which allow for efficient storing and processing of enormous amounts of data at a reasonable cost

· The availability of better tools, such as Python, as well as access to libraries (most of which are open source) and the online community that builds and supports them

A demand forecast model is not something one can build blindly. Before diving into the algorithms, the data scientist should first determine what the forecast will be used for (e.g., buying optimization, store allocation, pricing optimization, etc.). This rigorous definition of the business case at stake is needed to make sure the right granularity, cost function, and expected accuracy are applied.

Moreover, there is no single demand forecast model that will work across use cases. Let’s look at two very different use case examples, examine where and how they differ, and which demand forecast model should be used for each.

Example 1: Specialized retailer

Let’s first consider a fashion accessories retailer. It plans to launch a number of new products — among them a new line of watches and a new line of handbags — and so needs to order the right amount of goods from its suppliers some three to six months in advance, the lead time needed to manufacture them and ship them out. Each product has a planned lifetime of between three and 12 months (see Exhibit 2).

To get an accurate demand forecast, the retailer can rely on past point-of-sale or online transactions of similar products. It has multiple data points for each of those similar products, among them the time and date when each purchase was made, the specific products that were purchased, the channels through which the purchases were made, whether or not a discount was applied, and any related advertising campaigns. The retailer also has data pertaining to each product’s characteristics (e.g., price, color, material, look and feel, etc.). So, the retailer has a lot of buying context from which it can extract information to potentially create a very accurate demand forecast.

Exhibit 2: Timeline for a fashion accessories retailer’s product launch

Data science translation

Because the retailer must order up to six months in advance and products rotate quickly, it can place only one order per product. As such, the model needs to output one single value that encapsulates the total demand over each product’s lifespan. It doesn’t need to be precise on a more granular level, such as weekly or daily.

Because the retailer knows some of the feature values at the time the orders are made — such as placement and product characteristics — they can be used in the forecast to predict demand. For those feature values that are unknown at the time of ordering, such as advertising campaigns, it can either build individual scenarios or use averages, for example by assuming a product will be, on average, advertised during a certain period of time.

The retailer expects high substitution among the products; in other words, the demand for one product will depend on the number of other, “similar” products available at the same time. It could subsequently decide to create such a feature (i.e., number of similar products available at the same time) by building a similarity measure among the products in order to learn how it has impacted demand in the past.
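As a rough sketch of how such a feature could be computed, the example below counts, for a given product, how many other products are both similar and on sale during an overlapping period. The similarity measure here is Jaccard overlap on descriptive attributes, which is only one possible choice; the `Product` fields, the 0.5 threshold, and the catalog entries are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    name: str
    attributes: frozenset  # descriptive tags, e.g. {"watch", "leather"}
    start_week: int        # first week on sale (inclusive)
    end_week: int          # last week on sale (inclusive)


def jaccard(a, b):
    """Share of attributes two products have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_products_available(target, catalog, threshold=0.5):
    """Count other products that are similar and on sale at the same time."""
    count = 0
    for other in catalog:
        if other.name == target.name:
            continue  # do not count the product against itself
        overlaps = (
            other.start_week <= target.end_week
            and target.start_week <= other.end_week
        )
        if overlaps and jaccard(target.attributes, other.attributes) >= threshold:
            count += 1
    return count


# Hypothetical catalog for illustration only.
watch_a = Product("watch_a", frozenset({"watch", "leather", "mid-price"}), 1, 20)
watch_b = Product("watch_b", frozenset({"watch", "leather", "premium"}), 10, 30)
watch_c = Product("watch_c", frozenset({"watch", "leather", "mid-price"}), 25, 40)
bag = Product("bag", frozenset({"handbag", "canvas", "mid-price"}), 1, 20)
catalog = [watch_a, watch_b, watch_c, bag]

substitutes_for_a = similar_products_available(watch_a, catalog)
```

The resulting count can then be fed to the demand model as one more feature, alongside price, channel, and the other product characteristics.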

Elements of the solution

If the retailer wants to predict one single value per product, the use of regression analysis is appropriate. Depending on the type of data, (e.g., numeric or text), volume, collinearity, etc., it can leverage generalized linear models, random forest, gradient boosting, or neural networks.

For its similarity measure, the retailer can either use the existing product categories or build a clustering model. Or it can try both and see which one better explains past demand.

The retailer expects some non-stationarity, as certain products can be trending. Hence it can build an overall trend prediction model based on external data — for example by learning the relative sales of a certain type of product based on the number of tweets mentioning it — and use it as another feature in the final model. Or it can build a separate model and stack the two together.

For the optimization part (deciding how much of each product should be ordered), the retailer knows that the demand will not be perfectly forecasted. So, it needs to either adopt a specific strategy (systematically buying a bit more, or a bit less, than the forecast) or build a stochastic optimization model to decide how much to purchase based on the cost of under- and over-buying. (See Exhibit 3.)
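One classic way to frame this trade-off is the single-period "newsvendor" model, sketched below under simplifying assumptions: demand uncertainty is represented by a list of equally likely scenarios, and the only costs are the margin lost on unmet demand and the loss on unsold units. The function names, prices, and scenario values are illustrative and not taken from the retailer's case.

```python
import math


def newsvendor_order_quantity(demand_scenarios, unit_cost, unit_price, salvage_value=0.0):
    """Pick the order quantity that balances under- and over-buying costs.

    Assumes every demand scenario is equally likely.
    """
    underage = unit_price - unit_cost    # margin lost per unit of unmet demand
    overage = unit_cost - salvage_value  # loss per unit left unsold
    if underage <= 0 or overage <= 0:
        raise ValueError("expected salvage_value < unit_cost < unit_price")
    critical_ratio = underage / (underage + overage)

    # Smallest scenario value q such that P(demand <= q) >= critical ratio.
    ordered = sorted(demand_scenarios)
    index = math.ceil(critical_ratio * len(ordered)) - 1
    return ordered[max(index, 0)]


def expected_profit(quantity, demand_scenarios, unit_cost, unit_price, salvage_value=0.0):
    """Average profit of ordering `quantity` across the demand scenarios."""
    total = 0.0
    for demand in demand_scenarios:
        sold = min(quantity, demand)
        total += unit_price * sold + salvage_value * (quantity - sold) - unit_cost * quantity
    return total / len(demand_scenarios)


# Hypothetical lifetime-demand scenarios for one product (units).
scenarios = [80, 90, 100, 110, 120, 130, 140, 150, 160, 170]
best_quantity = newsvendor_order_quantity(scenarios, unit_cost=40, unit_price=100, salvage_value=10)
```

With these illustrative numbers, a lost sale costs twice as much as an unsold unit, so the sketch orders above the median scenario. In practice the scenarios would come from the forecast's error distribution rather than a hand-written list.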

Exhibit 3: Data science solution workflow

Example 2: Steel manufacturer

Let’s now consider the case of a steel manufacturer that produces several types of products for large clients in the automotive, construction, and packaging industries.

Each of the manufacturer’s products has its own specifications (e.g., grade, thickness, purity), but they share the same raw material input and some similarities in the production chain. It has several assets along that production chain tasked with various processes, such as hot rolling and galvanizing, but each of the assets has a constrained capacity.

The manufacturer’s most profitable clients expect a quick turnaround when they place orders, which is much faster than the lead time needed to actually produce those orders. As such, the orders have to be anticipated and scheduled in a smart way. (See Exhibit 4)

Exhibit 4: Three stages at which demand forecasting is needed to optimize a steel manufacturer’s supply chain

Data science translation

Because the manufacturer has clients from different industries, whose demand for steel is largely driven by their own commercial activity, it makes sense to use a time-series model that includes their specific business drivers, such as new car production, new construction projects, and seasonality.

Also, because the production lead time is long while clients expect short delivery times, the demand forecast needs a long time horizon, at least equal to the lead time.

While the products are different, some share similarities regarding production, hence the demand has to be forecasted at different stages: both demand for final products as well as demand for each of the intermediary products.

And finally, because the production capacity is constrained, demand cannot be forecasted by directly linking to actual past sales. Rather, past demand has to be extracted from the order book.
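The difference between sales and demand can be illustrated with a small sketch. It assumes a hypothetical order book in which every request is recorded with the quantity asked for and the quantity actually shipped, including orders that were refused or only partly filled; the field names and figures are invented for the example.

```python
from collections import defaultdict


def weekly_demand_and_sales(order_book):
    """Aggregate requested tonnes (demand) and shipped tonnes (sales) per week."""
    demand = defaultdict(float)
    sales = defaultdict(float)
    for order in order_book:
        week = order["week"]
        demand[week] += order["requested_tonnes"]  # what clients wanted
        sales[week] += order["shipped_tonnes"]     # what capacity allowed
    return dict(demand), dict(sales)


# Hypothetical order book: the second order was partly filled, the third refused.
order_book = [
    {"week": 1, "requested_tonnes": 400, "shipped_tonnes": 400},
    {"week": 1, "requested_tonnes": 300, "shipped_tonnes": 100},
    {"week": 2, "requested_tonnes": 250, "shipped_tonnes": 0},
]
demand, sales = weekly_demand_and_sales(order_book)
```

A model trained on the sales figures would learn the plant's capacity limits rather than what clients actually wanted, which is why the requested quantities are the better training target.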

Elements of the solution

For each client and each product, the manufacturer will want to forecast a time series of weekly demand that will cover an entire year, with less accuracy needed for the weeks that are further out.

The time series’ main drivers will be signal features (e.g., trend, seasonality) and external drivers; for example, when it comes to automotive clients, external drivers would include the growth of the automotive industry as well as those clients’ own financials. The steel manufacturer could use models such as ARIMA or LSTM, possibly stacked with a feature-based regression model for other drivers.

Since the different products share some similarities, the manufacturer would also aggregate the demand for each product at each step of the production chain and compare it with the capacity of each asset in order to understand what order volume would be feasible.
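A minimal sketch of that aggregation follows. It assumes each product has a known routing (the list of assets it passes through) and that load can be measured in tonnes on every asset, which is a simplification; the product names, routings, and capacities are hypothetical.

```python
def asset_utilization(forecast_tonnes, routings, capacity_tonnes):
    """Sum forecast demand on each asset and compare it with that asset's capacity."""
    load = {asset: 0.0 for asset in capacity_tonnes}
    for product, tonnes in forecast_tonnes.items():
        for asset in routings[product]:
            load[asset] += tonnes  # every asset on the routing handles the full volume
    return {
        asset: {
            "load": load[asset],
            "capacity": capacity,
            "utilization": load[asset] / capacity,
        }
        for asset, capacity in capacity_tonnes.items()
    }


# Hypothetical weekly figures, in tonnes.
forecast_tonnes = {"hot_rolled_coil": 500, "galvanized_sheet": 300}
routings = {
    "hot_rolled_coil": ["hot_rolling"],
    "galvanized_sheet": ["hot_rolling", "galvanizing"],
}
capacity_tonnes = {"hot_rolling": 700, "galvanizing": 400}

report = asset_utilization(forecast_tonnes, routings, capacity_tonnes)
overloaded = [asset for asset, row in report.items() if row["utilization"] > 1]
```

In this illustration the shared hot-rolling step is the bottleneck even though neither product exceeds capacity on its own, which is exactly what forecasting only at the final-product level would miss.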

Then, given the demand probability for each product and each client, the corresponding price, the current stock, the current maintenance plan, and other potential business constraints, it could perform production optimization to decide which material should be produced (including intermediary products), what levels of stock will be needed, which orders can be filled and which orders have to be delayed or refused, etc. (See Exhibit 5)
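To give a feel for this last step, the sketch below picks which orders to accept so that total margin is maximized without exceeding any asset's capacity. It uses exhaustive search over a handful of orders purely for illustration; a real production plan would typically rely on a proper optimization solver and would also handle stock, maintenance, and timing, none of which are modeled here. All order data and field names are hypothetical.

```python
from itertools import combinations


def best_order_subset(orders, capacity_tonnes):
    """Brute-force the subset of orders with the highest margin that fits capacity."""
    best_margin, best_ids = 0.0, []
    for size in range(1, len(orders) + 1):
        for subset in combinations(orders, size):
            # Load placed on each asset if this subset of orders is accepted.
            load = {asset: 0.0 for asset in capacity_tonnes}
            for order in subset:
                for asset in order["assets"]:
                    load[asset] += order["tonnes"]
            feasible = all(load[a] <= capacity_tonnes[a] for a in capacity_tonnes)
            margin = sum(order["margin"] for order in subset)
            if feasible and margin > best_margin:
                best_margin, best_ids = margin, [order["id"] for order in subset]
    return best_margin, best_ids


# Hypothetical candidate orders for one week.
orders = [
    {"id": "A", "tonnes": 500, "assets": ["hot_rolling"], "margin": 50000},
    {"id": "B", "tonnes": 300, "assets": ["hot_rolling", "galvanizing"], "margin": 45000},
    {"id": "C", "tonnes": 200, "assets": ["hot_rolling", "galvanizing"], "margin": 24000},
    {"id": "D", "tonnes": 150, "assets": ["hot_rolling"], "margin": 12000},
]
capacity_tonnes = {"hot_rolling": 700, "galvanizing": 400}

margin, accepted = best_order_subset(orders, capacity_tonnes)
```

Exhaustive search checks every combination, so it only works for a few orders, but it makes the structure of the decision visible: orders compete for shared, constrained assets, and the most profitable single order is not always part of the best plan.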

Exhibit 5: Demand forecasting for a steel manufacturer


As we’ve seen, demand forecasting is a large class of complicated analytical problems, where multiple models can be used depending on the business context and needs. Here, the data scientist’s ability to translate an organization’s business needs into data science, pick the right algorithms, and implement them correctly is key.

That’s because demand forecasting is a complex challenge, one that requires significant knowledge in order to determine which of the various models can and should be used for the situation in question. Among them:

· Sequential models (e.g., ARIMA, LSTM)

· Supervised regression and classification “point-matching” models (e.g., generalized linear models, random forest, gradient boosting, neural networks, SVM)

· Potentially NLP or image processing, if the input data is in a non-structured format

· Optimization models to translate the demand forecast into actual decisions (e.g., linear programming, genetic algorithms, stochastic optimization)

Indeed, there is no single demand forecast model that will work across use cases. That is why, when demand forecasting is business critical, so is the role of the data scientist.