Forecasting amidst Covid19: Improving forecasts with Product Segmentation and Pandemic Model

Arpit Jain
Jun 5, 2020


by Arpit Jain (Arpit.Jain@sas.com) & Johannes Trummer (Johannes.Trummer@sas.com)

In our previous post, we discussed the major forecasting challenge that retailers all over the world face due to the Covid19 pandemic. To tackle it, we first analyzed the forecasts obtained by applying general, commonly used time series (ARIMA) and machine learning models to all five products. The resulting forecasts were implausible and far from reality. In this post, we present our findings on how segment-specific forecasting and the inclusion of a pandemic model can handle Covid19-like events and thus significantly improve the forecasts. We also address the limitations of our approach and the assumptions we made while setting up the forecasts.

Applying segment-specific modeling strategies to every product

Illustration 1: Applying Machine Learning to Destatis Data, with Segmentation. Forecasting: SAS Visual Forecasting. Visualization: SAS Visual Analytics. Actual values are shown as a grey line, ARIMA models in orange, the panel series neural network in green and the stacked model (ARIMA + neural network) in blue.

For each of the five products, we applied a segment-specific forecasting model. The generated forecasts reflect meaningful new-normal levels in the short term and a return to pre-crisis (normal) levels in the long term. Notably, the forecasts for disinfectants and flour drop to zero demand. The saw-tooth pattern for disinfectants repeats itself and ends in a sellout of this product. For flour, demand briefly reaches zero and then settles back to the pre-crisis level, whereas the demand for toilet paper and soap remains far above the pre-crisis level.

The demand behavior for flour can be explained by Short Term Pantry Loading, which has also been observed in many other crises (https://www.bain.com/insights/chinas-retailers-and-the-coronavirus-outbreak-lessons-from-the-past/). When the crisis occurs, private households stock up on unusually large quantities of flour to cover their needs for an unusually long time. As a result, demand for flour decreases sharply in the following period, because the need is already covered. Only after households have used up these stocks do they begin to buy the product again in normal quantities, and demand settles at pre-crisis levels. This effect cannot be seen in the forecasts for soap and toilet paper, although their actual sales data show similar developments. To take such effects into account more effectively, we additionally included a pandemic progression model.
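The pantry-loading effect described above can be illustrated with a toy simulation. This is only a sketch under simplified assumptions, not the model used for the forecasts: the function `simulate_pantry_loading`, its parameters and all numbers are hypothetical, and the household is assumed to buy only what it needs once its stock is used up.

```python
def simulate_pantry_loading(periods, normal_use, panic_period, panic_purchase):
    """Purchases per period under a simple stockpiling rule (hypothetical toy model).

    The household consumes `normal_use` units per period, buys
    `panic_purchase` units in `panic_period`, and otherwise buys only
    the shortfall between its consumption and its current stock.
    """
    stock = 0.0
    purchases = []
    for period in range(periods):
        if period == panic_period:
            bought = panic_purchase
        else:
            # Buy only what the pantry cannot cover this period.
            bought = max(0.0, normal_use - stock)
        stock += bought - normal_use
        purchases.append(bought)
    return purchases


# Made-up example: normal use of 1 unit, a panic purchase of 4 units in period 2.
purchases = simulate_pantry_loading(
    periods=8, normal_use=1.0, panic_period=2, panic_purchase=4.0
)
print(purchases)
```

In this toy setting, purchases spike once, fall to zero while the stock is consumed, and then return to the normal level, which mirrors the shape described for flour.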

Pandemic model as an indicator for future demand

In our forecasting models, we included “pandemic” as an indicator for future demand. Here, “pandemic” means the number of active coronavirus cases in Germany: the observed values up to the time of the forecast, supplemented for the future by a pandemic model that predicts the number of coronavirus infected cases. In addition, we added the relative change in active coronavirus cases over a time period as another independent variable. This helped the forecasting models distinguish the pandemic intensity observed during the different phases of the pandemic.
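As a rough sketch of how these two independent variables could be prepared, the snippet below joins observed case counts with projected ones and derives the relative change over a window. The helper `relative_change`, the window length and all case numbers are assumptions for illustration; the actual feature engineering was done in SAS Visual Forecasting and may differ.

```python
def relative_change(active_cases, window=1):
    """Relative change in active cases versus `window` periods earlier.

    Returns None where no earlier value exists or the earlier value is zero.
    """
    changes = []
    for i, current in enumerate(active_cases):
        if i < window or active_cases[i - window] == 0:
            changes.append(None)
        else:
            previous = active_cases[i - window]
            changes.append((current - previous) / previous)
    return changes


# Made-up active-case counts per period: observed history, followed by
# values that a pandemic model would supply for the forecast horizon.
observed = [100, 200, 500, 800]
projected = [600, 300]
active = observed + projected

# One (active cases, relative change) pair per period, usable as regressors.
features = list(zip(active, relative_change(active)))
for cases, change in features:
    print(cases, change)
```

The sign of the relative change separates a growing phase from a declining one even when the absolute number of active cases is the same.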

Illustration 2: Applying Machine Learning to Destatis Data, including Pandemic Model in addition to Product Segmentation. Forecasting: SAS Visual Forecasting. Visualization: SAS Visual Analytics. Actual values are shown as a grey line, ARIMA models in orange, the panel series neural network in green and the stacked model (ARIMA + neural network) in blue.

Our forecasts suggest that adding a pandemic indicator on top of product segmentation adds considerable forecast value and produces realistic, meaningful forecasts. The predicted demand for the various products returns to pre-crisis levels, in line with the assumed end of the pandemic. There are still further peaks in demand for flour or disinfectants, but demand normalizes to the pre-crisis level over the remainder of the year.

Based on the MAPE score, we again observe that the machine learning models, especially the stacked model (machine learning + ARIMA), outperform conventional ARIMA models in all cases.
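For readers unfamiliar with the metric, MAPE (mean absolute percentage error) averages the absolute forecast errors relative to the actual values. The sketch below shows the standard definition and how candidate models could be ranked by it. The model names `model_a` and `model_b` and all numbers are made up for illustration and are not the results of this study.

```python
def mape(actual, forecast):
    """Mean absolute percentage error in percent.

    Actual values must be non-zero, otherwise the ratio is undefined.
    """
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


# Made-up holdout values and forecasts from two hypothetical models.
holdout_actual = [120.0, 150.0]
forecasts = {
    "model_a": [100.0, 120.0],
    "model_b": [115.0, 140.0],
}

scores = {name: mape(holdout_actual, values) for name, values in forecasts.items()}
best = min(scores, key=scores.get)  # lower MAPE is better
print(scores, best)
```

Because MAPE divides by the actual value, it cannot be computed for periods in which actual sales are zero, which matters for products that sell out.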

Limitations of our approach

All forecasts are based on product sales data from Destatis.de. The data covers sales between 05.08.2019 and 30.03.2020, with 05.08.2019 set as the reference point at 100% sales volume. Sales data from a longer period would be necessary to increase forecast accuracy. Because the sales history is so short, we defined a holdout sample of only the last 2 data points, so adequate validation of each forecasting model is lacking. The MAPE score, a widely used statistic for selecting the best forecasting model, was very close for all models; the best model was therefore chosen based on business understanding and, in certain cases, gut feeling. In addition, the data contains no inventory information, and we strongly believe that the demand curve is significantly influenced by the actual stock level: no stock, no sales! Including inventory information would therefore be another starting point for increasing the accuracy of the demand forecasts.
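The two data-preparation steps mentioned above, indexing sales to a reference point of 100% and holding out the last 2 data points, can be sketched as follows. The helpers `to_index` and `split_holdout` and the sales figures are hypothetical; this is an illustration of the idea, not the code behind the forecasts.

```python
def to_index(sales, reference_position=0):
    """Express each value as a percentage of the value at the reference point."""
    reference = sales[reference_position]
    return [100.0 * value / reference for value in sales]


def split_holdout(series, holdout_size=2):
    """Split a series into a training part and its last `holdout_size` points.

    `holdout_size` must be at least 1.
    """
    return series[:-holdout_size], series[-holdout_size:]


# Made-up sales volumes; the first entry plays the role of the reference point.
sales = [400.0, 420.0, 380.0, 900.0, 600.0]
indexed = to_index(sales)
train, holdout = split_holdout(indexed, holdout_size=2)
print(indexed)
print(train, holdout)
```

With only 2 holdout points, each point contributes half of any averaged error score, so a single unusual period can decide which model looks best.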

Conclusion

All findings and forecasts were generated based on existing information. Despite the limitations imposed by the available information, we can already report some meaningful findings:

- Segmenting products based on their current sales and demand patterns improves the forecast. Without segmentation, the effects of the pandemic on the respective products are not considered.

- Machine Learning methods outperformed conventional Time Series models in handling short-term demand changes. Creating pandemic-specific machine learning models further improves the forecasts.

- External factors like restricted living, lock-down, the number of coronavirus cases or the general trend can further increase the quality of the forecasts.

We are researching further improvements and strategies to add value to the forecasts. We invite you to discuss our methodology and to share input on creating more powerful forecasts in the context of the Covid-19 pandemic.


Arpit Jain

Analytics customer advisor for retail & cpg. Forecasting expert. Passionate data scientist. Chess master.