Best Practices for Demand Planning: Forecasting Models Review

Nicolas Vandeput
Sep 8, 2020 · 7 min read

The article below is a summary of one of my LinkedIn posts. If you are interested in such debates, let’s connect! I would like to thank the following people for their insightful remarks in the original discussion: Timothy Brennan, Chris Davies, Valery Manokhin, Leonardo Cabrera, Charlie Kantz, Karl-Eric Devaux, Spyros Makridakis, and Dyci Manns Sfregola.

Supply chain demand planners often ask themselves: What is the best model to forecast my demand? What are the best practices I should follow to improve my forecast?

It is impossible to give a definitive, absolute answer to those questions. There is no silver bullet model that is best for every single company and every single product. But in two articles (this is the first one; you can find the second one, about the forecasting process, here), I will review most of the existing forecasting models and then share tips, tricks, and best practices for tweaking and selecting them. This first article reviews forecasting models and how they perform at forecasting supply chain demand. Forecasting demand is nothing like forecasting electricity consumption, airplane passengers, or online connections. Supply chain demand datasets are often short — 5 years of history is already a lot — and especially volatile. This means that not all forecasting models will be appropriate.

For each model, I will briefly explain how it works (see ⚙️), list its pros and cons (✅/❌), and recommend further reading material (either books 📖 or academic papers 📄), with a link to the document if available.

Source: Data Science for Supply Chain Forecasting 2nd Edition. You can pre-order it here.

📈 Statistical Methods

Exponential Smoothing

In supply chain demand forecasting, exponential smoothing (ETS) models are king. In my experience, most current supply chain forecasting tools rely on some variation of exponential smoothing to forecast demand.

⚙️Those models forecast demand components (level, trend, and seasonality) by updating them slightly after each demand observation.

  • Pro: easy to understand, implement, and interpret. Flexibility with additive and multiplicative seasonalities.
  • Con: Difficult to add external features. Not able to forecast new products.
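To make the update mechanism concrete, here is a minimal sketch of simple exponential smoothing (level only; Holt and Holt-Winters add trend and seasonality on top of the same idea). The demand numbers and the alpha value are illustrative, not a recommendation:

```python
def simple_exp_smoothing(demand, alpha=0.4):
    """One-step-ahead forecasts from level-only exponential smoothing.

    After each observation, the level is nudged toward the new demand
    by a fraction alpha (0 < alpha <= 1); the updated level is the
    forecast for the next period.
    """
    level = demand[0]              # initialize the level on the first observation
    forecasts = [level]
    for d in demand[1:]:
        level = alpha * d + (1 - alpha) * level
        forecasts.append(level)
    return forecasts

# Illustrative demand history (units per period)
history = [100, 120, 90, 110, 105]
print(simple_exp_smoothing(history))
```

A higher alpha makes the model react faster to recent demand but also makes it more nervous on noisy series; tuning alpha against a validation period is the usual practice.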

Exponential smoothing models are often called “Holt-Winters,” after the researchers who proposed them. An early form of exponential smoothing forecasting was proposed by R.G. Brown in 1956. His equations were refined in 1957 by Charles C. Holt, a US engineer from MIT and the University of Chicago, and improved again three years later by Peter Winters. Holt’s and Winters’s names stuck, and the different exponential smoothing techniques are now often called “Holt-Winters.”

This is the model that we use at SKU Science (an online platform for demand planning — free to try), and the one I generally recommend to anyone starting with demand planning.

📄 Brown, R. (1956). Exponential smoothing for predicting demand

📄 Holt, C. C. (1957). Forecasting seasonals and trends by exponentially weighted moving averages.

📄 Winters, P. R. (1960). Forecasting sales by exponentially weighted moving averages

📖 Rob J Hyndman and George Athanasopoulos Forecasting: Principles and Practice

📖 Nicolas Vandeput Data Science for Supply Chain Forecast


ARIMA

ARIMA models are often used by academics and forecasters to forecast time series with plenty of historical data. In my experience, I have never seen ARIMA give accurate results (within a reasonable computation time) on supply chain demand datasets.

⚙️ARIMA models use multiple linear regressions of historical values and errors to predict the future. SARIMA is the seasonal version of ARIMA, and ARIMAX can use external features to make predictions.

  • Pro: Can use external variables.
  • Con: Long history needed, computation time is usually (very) long. Not able to forecast new products.
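To illustrate the regression idea, here is a sketch of the simplest case: an AR(1) model fitted by ordinary least squares. Full ARIMA adds differencing and moving-average error terms, and in practice you would use a library such as statsmodels rather than this hand-rolled version:

```python
import numpy as np

def fit_ar1(series):
    """Fit y_t = c + phi * y_{t-1} by ordinary least squares."""
    y = np.asarray(series, dtype=float)
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])  # intercept + lag-1 value
    c, phi = np.linalg.lstsq(X, y[1:], rcond=None)[0]
    return c, phi

def forecast_ar1(last_value, horizon, c, phi):
    """Iterate the fitted recursion to forecast `horizon` steps ahead."""
    preds = []
    for _ in range(horizon):
        last_value = c + phi * last_value
        preds.append(last_value)
    return preds
```

Note how much history the regression consumes: each extra lag or error term costs degrees of freedom, which is one reason ARIMA struggles on short, volatile demand series.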

📖 Rob J Hyndman and George Athanasopoulos Forecasting: Principles and Practice


Croston

Croston models (and their later variations, SBA and TSB) were created to forecast intermittent demand. I wrote an article about Croston models (available here) and discuss how to deal with intermittent demand in another article. In short, as they do not add much (if any) accuracy compared to simple exponential smoothing, I would not recommend using Croston models to forecast intermittent demand.

⚙️Croston models are close to exponential smoothing. They learn both the probability of having some demand and the demand level, if any.

  • Pro: Interesting concept of demand probability and level.
  • Con: Does not deliver good accuracy compared to simpler models (such as simple exponential smoothing). Not able to forecast new products. Not able to take external features into account.
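A compact sketch of the Croston logic follows. Initialization choices vary in the literature; this version starts both components on the first nonzero observation:

```python
def croston(demand, alpha=0.3):
    """Croston forecast for intermittent demand.

    Two exponential smoothings run in parallel: one on the size of
    nonzero demands, one on the interval between them. The per-period
    forecast is the ratio size / interval.
    """
    first = next(i for i, d in enumerate(demand) if d > 0)
    size = float(demand[first])    # smoothed size of nonzero demands
    interval = 1.0                 # smoothed interval between demands
    periods_since = 1
    for d in demand[first + 1:]:
        if d > 0:
            size = alpha * d + (1 - alpha) * size
            interval = alpha * periods_since + (1 - alpha) * interval
            periods_since = 1
        else:
            periods_since += 1
    return size / interval
```

For a product selling 10 units every other period, the ratio converges toward 10/2 = 5 units per period, which is the intuition behind the size/interval decomposition.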

📄 J. D. Croston Forecasting and Stock Control for Intermittent Demands

📄 Ruud H. Teunter, Aris A. Syntetos, M. Zied Babai Intermittent demand: Linking forecasting to inventory obsolescence

📄 Nicolas Vandeput Forecasting Intermittent Demand with the Croston Model


Multiple Aggregation Prediction Algorithm (MAPA)

In 2014, Nikolaos Kourentzes proposed a new forecasting technique: the Multiple Aggregation Prediction Algorithm (MAPA).

⚙️His idea can be summarized as:
1. Aggregate the time series with various temporal hierarchies (monthly, quarterly, half-year, etc.)
2. Generate a forecast for each temporal aggregation
3. Use a disaggregation technique to transform all those high-level temporal forecasts into a single unified temporality (weekly or monthly).

  • Pro: usually more accurate than exponential smoothing, especially for intermittent products.
  • Con: difficult to interpret, longer running time, challenging (and time-intensive) optimization, damped seasonality, usually (slightly) negatively biased. Not able to forecast new products. Not able to take external features into account.
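The three steps can be sketched on a monthly series as below. This is a toy illustration that uses a plain average as the per-level forecast and even splitting as the disaggregation; real MAPA fits an ETS model at each aggregation level and recombines the estimated components:

```python
import numpy as np

def aggregate(series, bucket):
    """Step 1: sum a monthly series into buckets of `bucket` months."""
    n = len(series) // bucket * bucket            # drop any incomplete bucket
    return np.asarray(series[:n], dtype=float).reshape(-1, bucket).sum(axis=1)

def mapa_forecast(series, buckets=(1, 3, 6)):
    """Forecast each temporal aggregation, bring each back to the
    monthly level, and combine the views into one number."""
    monthly_views = []
    for b in buckets:
        agg = aggregate(series, b)
        level_forecast = agg.mean()               # step 2: forecast at this level
        monthly_views.append(level_forecast / b)  # step 3a: disaggregate to monthly
    return float(np.mean(monthly_views))          # step 3b: combine the views
```

The coarser aggregations smooth out noise and intermittency (which is why MAPA helps on intermittent products), at the cost of damping seasonality, as noted above.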

📄 Nikolaos Kourentzes Improving your forecasts using multiple temporal aggregation

📄 Nikolaos Kourentzes, Fotios Petropoulos, Juan Trapero, 2014, Improving forecasting by estimating time series structural components across multiple frequencies

🖥️ Machine Learning

At the core of many machine learning models lie decision trees.

⚙️Decision trees make predictions based on input features by asking consecutive yes/no questions. I explain how they work in my article Machine Learning for Supply Chain Forecasting (if you are not familiar with decision trees, I advise you to read that article first before going further).
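For intuition, here is a tiny hand-built tree. The features ("month", "on_promo") and leaf values are hypothetical; a trained tree learns both the questions and the leaf averages from data:

```python
def tree_predict(features):
    """A tiny hand-built regression tree.

    Each internal node asks a yes/no question about one feature and
    routes the sample left or right; each leaf returns the average
    demand of the training samples that landed there.
    """
    if features["month"] in (11, 12):      # question 1: holiday season?
        if features["on_promo"]:           # question 2: promotion running?
            return 150.0
        return 120.0
    if features["on_promo"]:
        return 110.0
    return 90.0
```

Reading a prediction is just following the questions down to a leaf, which is why single trees are so easy to interpret.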

Many models were then created based on ensembles of trees. Ensemble models make a prediction by combining multiple sub-models.

Ensemble #1: Bagging Models

The most famous ensemble model is the random forest.

⚙️A forest grows hundreds of trees and averages their respective predictions (a technique called bagging). The trick is to grow trees that are both diverse and accurate. Diversity is obtained by restricting the features available at each decision node, randomly resampling each tree’s training dataset, and — for the extremely randomized trees model — only allowing random feature splits.

  • Pro: straightforward method.
  • Con: usually less accurate than boosting.
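A sketch of bagging with depth-1 trees (stumps) on a single feature follows. A real random forest grows deep trees on many features and also randomizes the features considered at each split; this toy version keeps only the bootstrap-and-average core:

```python
import numpy as np

rng = np.random.default_rng(42)

def fit_stump(x, y):
    """Fit a depth-1 regression tree: try every split, keep the one
    with the lowest squared error, predict the mean on each side."""
    values = np.unique(x)
    if len(values) < 2:                        # degenerate resample: predict the mean
        m = y.mean()
        return lambda q: m
    best = None
    for s in values[:-1]:
        left, right = y[x <= s], y[x > s]
        err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, s, left.mean(), right.mean())
    _, split, left_mean, right_mean = best
    return lambda q: left_mean if q <= split else right_mean

def bagged_predict(x, y, query, n_trees=100):
    """Bagging: train each stump on a bootstrap resample, then average."""
    preds = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(x), len(x))  # sample rows with replacement
        preds.append(fit_stump(x[idx], y[idx])(query))
    return float(np.mean(preds))
```

Averaging many diverse trees cancels out much of each individual tree's variance, which is the whole point of bagging.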

📄 Tin Kam Ho, 1995, Random decision forests

📄 Pierre Geurts, Damien Ernst, Louis Wehenkel, 2006, Extremely randomized trees

📖 Nicolas Vandeput Data Science for Supply Chain Forecast

Ensemble #2: Boosting Models

⚙️Boosting models do not average multiple sub-models. Instead, they train sub-models one after another, each fitted to the current errors of the overall model, so that every new sub-model (usually a tree) specializes in correcting the remaining mistakes.

Those boosting models are currently considered best in class and used by many data scientists.

  • Pro: (very) good accuracy.
  • Con: more challenging to optimize than bagging methods. Needs lots of data.
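Here is a toy gradient-boosting loop for squared error, using depth-1 splits as the sub-models: each round fits the current residuals and adds a damped correction. Real libraries such as XGBoost and LightGBM add regularization, second-order terms, and many engineering tricks on top of this core idea:

```python
import numpy as np

def best_split(x, residual):
    """Single best split on the residuals (a depth-1 tree)."""
    best = None
    for s in np.unique(x)[:-1]:
        left, right = residual[x <= s], residual[x > s]
        err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, s, left.mean(), right.mean())
    return best[1], best[2], best[3]

def boosted_fit(x, y, n_rounds=50, learning_rate=0.3):
    """Each round, a new stump is fitted to the residuals of the
    running prediction, so it specializes in the remaining errors."""
    pred = np.full(len(y), y.mean())           # start from the global mean
    for _ in range(n_rounds):
        split, left_corr, right_corr = best_split(x, y - pred)
        pred = pred + learning_rate * np.where(x <= split, left_corr, right_corr)
    return pred
```

The learning rate damps each correction; a smaller rate needs more rounds but reduces the risk of overfitting, which is one of the knobs that makes boosting harder to tune than bagging.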

They were initially created in the late 1990s with AdaBoost, then refined with Gradient Boost (2001), Extreme Gradient Boosting (2016), and Light Gradient Boosting (2017).

📄 Yoav Freund, Robert E Schapire, 1997, A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting

📄 Jerome H. Friedman, 2001, Greedy Function Approximation: A Gradient Boosting Machine

📄 Tianqi Chen, Carlos Guestrin, 2016, XGBoost: A Scalable Tree Boosting System

📄 Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, Tie-Yan Liu, 2017, LightGBM: A Highly Efficient Gradient Boosting Decision Tree

📖 Nicolas Vandeput Data Science for Supply Chain Forecast

Deep Learning

Since the mid-2010s — thanks to new optimization models and more computation power — we saw the rise of deep learning models. The first artificial neurons were imagined in the 1940s, and the first neural network was physically implemented in the late 1950s.

⚙️Neural networks are created by stacking multiple layers of artificial neurons. Neurons are simple units: each sums its inputs (usually the outputs of the previous layer’s neurons), passes the sum through an activation function, and outputs the result.

  • Pro: (very) good accuracy.
  • Con: Needs lots of data.
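The stacking of neurons can be sketched as a forward pass with numpy. The weights below are hand-set for illustration only; training would adjust them by backpropagation:

```python
import numpy as np

def relu(z):
    """A common activation function: negative sums become zero."""
    return np.maximum(z, 0.0)

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Two stacked layers: each hidden neuron sums its weighted inputs,
    applies the activation, and feeds the output neuron."""
    hidden = relu(x @ w_hidden + b_hidden)   # hidden layer of neurons
    return hidden @ w_out + b_out            # linear output neuron

# Hypothetical network: 2 input features, 2 hidden neurons, 1 output
x = np.array([1.0, 2.0])
w_hidden = np.ones((2, 2))
b_hidden = np.zeros(2)
w_out = np.ones((2, 1))
b_out = np.zeros(1)
print(forward(x, w_hidden, b_hidden, w_out, b_out))
```

A feed-forward network for demand forecasting would take lagged demand and external features as inputs and produce the forecast at the output neuron.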

Different kinds of neural networks have been developed. Notably, Long Short-Term Memory networks (LSTM) specialize in natural language processing (NLP), and convolutional neural networks (CNN) specialize in image recognition.

LSTM for demand forecasting? The international M4 forecasting competition was won by a hybrid model using LSTM and exponential smoothing. But the M4 dataset was not your usual supply chain demand dataset: it contained long, stable time series. In her master’s thesis, my student Lynda Dhaeyer showed that LSTM couldn’t beat ‘simple’ feed-forward neural networks on an actual supply chain demand dataset.

📖 Nicolas Vandeput Data Science for Supply Chain Forecasting 2nd edition.

About the Author

Nicolas Vandeput is a supply chain data scientist specialized in demand forecasting and inventory optimization. He founded his consultancy company SupChains in 2016 and co-founded SKU Science — a fast, simple, and affordable demand forecasting platform — in 2018. He enjoys discussing new quantitative models and how to apply them to business reality. Passionate about education, Nicolas is both an avid learner and enjoys teaching at universities: he has taught forecasting and inventory optimization to master students since 2014 in Brussels, Belgium. He published Data Science for Supply Chain Forecasting in 2018 (2nd edition in 2021) and Inventory Optimization: Models and Simulations in 2020.

Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data Science professionals.
