How to choose the right visualizations to better debug your forecasting models

Brian LOZACH

Published in

Artefact Engineering and Data Science

8 min read · Sep 9, 2021

The path to developing a high-performance demand forecasting model — Part 3

TL;DR

When dealing with forecasting models, the best approach is often to iterate continuously: adding data sources, improving feature engineering, tweaking model parameters… Most of the time, Data Scientists tend to focus on a single KPI (e.g. RMSE, Forecast Accuracy…). There is often a lot more information behind those KPIs that needs to be analyzed to improve predictions. Building an appropriate visualization tool is a great way to deep-dive into the model's behavior, quickly spot its pain points, and thus gain accuracy efficiently.

This article develops the key questions you should ask yourself when evaluating a forecasting model, then presents must-have visualizations to answer these questions, and finally proposes a quick implementation that gathers all these visualizations in a unified tool built with Streamlit.

Context

This article sums up what we learned from building a unified visualization tool to help Data Scientists, Software Engineers, Product Owners & Demand Planners (business experts) develop sell-in forecasting models for 10+ business units in a food & beverage company. Our models made forecasts at a Daily x Warehouse x Product level for the coming 14 weeks. They were developed using boosting methods and took into account product characteristics, historical sales, events and promotional data.

Key questions to ask when evaluating a Forecasting model

1. Is the model good compared to the baseline?

Having access to the current predictions (for instance demand planner forecasts) on the same scope is very helpful: it gives you a baseline to compare your model against. It also provides a good understanding of the business behaviour over a specific period, product or location. The more you interview the business, the more insights you gain, and the better you can implement the right features.
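As an illustration, here is a minimal sketch of such a comparison. The article does not state how Forecast Accuracy is computed, so the formula below (1 minus the sum of absolute errors divided by total sales) is an assumption, and the weekly volumes are made-up numbers rather than project data.

```python
from typing import Sequence


def forecast_accuracy(actuals: Sequence[float], forecasts: Sequence[float]) -> float:
    """Return 1 - sum(|actual - forecast|) / sum(actual) (assumed definition)."""
    if len(actuals) != len(forecasts):
        raise ValueError("actuals and forecasts must have the same length")
    total_actual = sum(actuals)
    if total_actual == 0:
        raise ValueError("forecast accuracy is undefined when total sales are zero")
    absolute_error = sum(abs(a - f) for a, f in zip(actuals, forecasts))
    return 1 - absolute_error / total_actual


# Illustrative weekly volumes on the same scope (made-up numbers).
actual_sales = [100, 120, 80, 100]
model_forecast = [90, 130, 85, 100]
planner_forecast = [110, 100, 100, 90]  # the baseline

model_fa = forecast_accuracy(actual_sales, model_forecast)
baseline_fa = forecast_accuracy(actual_sales, planner_forecast)
uplift = model_fa - baseline_fa  # positive when the model beats the baseline
print(f"model: {model_fa:.1%} | baseline: {baseline_fa:.1%} | uplift: {uplift:+.1%}")
```

Computing both figures with the same function and on the same scope is what makes the comparison fair.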

2. Chasing over or under-predictions and drops in Forecast Accuracy

Must-have questions are: Do I catch the global trend? Does the model catch known recurring events like holidays, warehouse closures or school holidays? Do I have accuracy drops on particular periods?

Why is it important?

If the model over-predicts, stock levels increase, and so do storage costs.
If the model under-predicts, it causes out-of-stock periods, which means missed sales opportunities and lower customer satisfaction.
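One way to chase both situations is to track a signed bias per week. The sketch below is only an illustration: the relative-bias formula, the 5% tolerance and the sample rows are assumptions, not values from our project.

```python
from typing import Dict, Iterable, Tuple


def weekly_bias(
    rows: Iterable[Tuple[str, float, float]], tolerance: float = 0.05
) -> Dict[str, Tuple[float, str]]:
    """Aggregate (week, actual, forecast) rows into week -> (relative bias, label)."""
    totals: Dict[str, Tuple[float, float]] = {}
    for week, actual, forecast in rows:
        total_actual, total_forecast = totals.get(week, (0.0, 0.0))
        totals[week] = (total_actual + actual, total_forecast + forecast)

    report: Dict[str, Tuple[float, str]] = {}
    for week, (total_actual, total_forecast) in totals.items():
        if total_actual == 0:
            continue  # relative bias is undefined for a week without sales
        bias = (total_forecast - total_actual) / total_actual
        if bias > tolerance:
            label = "over-prediction"
        elif bias < -tolerance:
            label = "under-prediction"
        else:
            label = "ok"
        report[week] = (bias, label)
    return report


# Made-up rows: (week, actual volume, forecast volume).
rows = [
    ("2021-W01", 100, 120),
    ("2021-W01", 50, 60),
    ("2021-W02", 200, 150),
    ("2021-W03", 100, 102),
]
report = weekly_bias(rows)
for week, (bias, label) in report.items():
    print(f"{week}: {bias:+.0%} ({label})")
```

Unlike an accuracy based on absolute errors, the sign of the bias tells you whether a bad week costs you stock or sales.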

Spotting such events is a great way to reach more accurate predictions efficiently. They are often well known by demand planners and quite simple to implement in your model once you have the information. For example, in many of our business units, some products were sold for school lunches. Preparing and introducing a feature representing school holidays led to a large increase in our accuracy on these particular periods.

3. Dealing with product specificities

Is my performance homogeneous across product brands / families? Are there any other distinctions between my products (products sold only during promotion periods, best-sellers vs. low-volume products)?

Why is it important?

To validate that the model behaves well on the whole scope. Depending on the business need, a minimum accuracy can be required on the whole scope.
To identify critical products that can be analyzed more closely, and thus gain accuracy.

These questions help you understand the business more precisely. For example, splitting models based on the importance of the products in terms of volume often leads to better performance. Indeed, the demand for regular products is very different from the demand for promotional or less common products, which can be highly correlated with promotion periods or have a very sparse sales profile. In most of our cases, we trained distinct models to address those different types of products.
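A simple way to start such a split is to rank products by volume. The sketch below is a hypothetical segmentation, not the rule we used: the 80% cumulative-volume threshold, the segment names and the volumes are all assumptions for illustration.

```python
from typing import Dict


def segment_by_volume(
    volume_by_product: Dict[str, float], top_share: float = 0.8
) -> Dict[str, str]:
    """Label the biggest products 'high_volume' until they cover top_share of sales."""
    total = sum(volume_by_product.values())
    if total <= 0:
        raise ValueError("total volume must be positive")
    segments: Dict[str, str] = {}
    cumulative = 0.0
    ranked = sorted(volume_by_product.items(), key=lambda item: item[1], reverse=True)
    for product, volume in ranked:
        # High volume as long as the products ranked before it do not cover top_share yet.
        segments[product] = "high_volume" if cumulative < top_share * total else "low_volume"
        cumulative += volume
    return segments


# Made-up yearly volumes per product.
volumes = {"A": 500, "B": 350, "C": 100, "D": 30, "E": 20}
segments = segment_by_volume(volumes)
print(segments)
```

Each segment can then get its own model, or at least its own line in the accuracy report.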

4. Are the underlying effects correctly taken into account?

Is the model correctly capturing promotion effects? Are there cannibalization effects? Does the model adapt well to exogenous phenomena (e.g. strikes)?

Why is it important?

In most cases, promotions are an important driver of demand and can lead to large spikes in sales data.
Missing these effects can lead to significant waste or out-of-stocks.
A more realistic model that takes sophisticated effects into account is more likely to be adopted by your future users.

Please feel free to refer to the previous article of our Forecasting Series to tackle promotion data: 5 tips to better take promotional data into account

How to analyze your forecasting model: from macro-KPIs to the evaluation on a specific scope

What are the must-have visualizations?

To build your evaluation tool, you must combine two elements:

A filter part to evaluate the performance on a specific scope
A set of visualizations to quickly spot areas for improvement

The filter part must let you filter on several axes: period of analysis, locations (retailer, warehouse…), products (a set of products), and finally product categories.
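The filter logic itself can stay independent from the dashboard. Here is a minimal sketch, assuming predictions are stored as one row per day x warehouse x product; the `ForecastRow` fields, the `filter_scope` name and the sample rows are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Collection, Iterable, List, Optional


@dataclass(frozen=True)
class ForecastRow:
    day: date
    warehouse: str
    product: str
    category: str
    actual: float
    forecast: float


def filter_scope(
    rows: Iterable[ForecastRow],
    start: Optional[date] = None,
    end: Optional[date] = None,
    warehouses: Optional[Collection[str]] = None,
    products: Optional[Collection[str]] = None,
    categories: Optional[Collection[str]] = None,
) -> List[ForecastRow]:
    """Keep rows matching every provided filter; a filter left to None is ignored."""
    selected = []
    for row in rows:
        if start is not None and row.day < start:
            continue
        if end is not None and row.day > end:
            continue
        if warehouses is not None and row.warehouse not in warehouses:
            continue
        if products is not None and row.product not in products:
            continue
        if categories is not None and row.category not in categories:
            continue
        selected.append(row)
    return selected


# Made-up rows for illustration.
rows = [
    ForecastRow(date(2021, 1, 4), "Lyon", "yogurt_x4", "Dairy", 100, 90),
    ForecastRow(date(2021, 1, 5), "Paris", "yogurt_x4", "Dairy", 80, 100),
    ForecastRow(date(2021, 2, 1), "Lyon", "juice_1l", "Drinks", 50, 50),
]
scope = filter_scope(rows, end=date(2021, 1, 31), warehouses={"Lyon"})
print(scope)
```

In the dashboard, the arguments would typically come from sidebar widgets such as `st.date_input` and `st.multiselect`, and every figure would be computed from the filtered rows only.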

We recommend at least the 4 following visualizations:

First, the evolution of the predicted volume, along with the real sales. This is the easiest one to understand, and the first to look at. It helps you understand your model: is my model catching the global trend? Is it over or under-predicting? Do I catch the spikes and drops?
Second, the evolution of the Forecast Accuracy. Depending on the way the forecast accuracy is calculated, this figure may be needed to complement the first one. It helps you quickly spot pain-point periods in your forecasts, and thus tells you which periods you should deep-dive into.
Third, well-presented KPIs that compute your Forecast Accuracy on different scopes. We recommend splitting your accuracy by level: warehouses, product category levels, products, and even crossing those analysis axes (for instance, a heatmap of the forecast accuracy for each warehouse x product category). Again, it helps you find critical locations or products.
Finally, the contribution evaluation. To understand precisely how your model works, you need to evaluate the most used features. Classical time-series models (ARIMA & Co., Prophet…) offer feature decomposition natively. For boosting methods (XGBoost, CatBoost, LightGBM…), frameworks like SHAP are very useful to explain precisely the model's behavior with respect to each feature. Plotting those contributions over time helps you evaluate which phenomenon is driving your forecast at what moment.
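To make the third visualization more concrete, here is a sketch of the aggregation behind a warehouse x product category heatmap. It reuses an assumed accuracy definition (1 minus the sum of absolute errors divided by total sales), and the warehouse names, categories and volumes are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_matrix(
    rows: Iterable[Tuple[str, str, float, float]]
) -> Dict[str, Dict[str, float]]:
    """Turn (warehouse, category, actual, forecast) rows into warehouse -> category -> accuracy."""
    # (warehouse, category) -> [sum of actuals, sum of absolute errors]
    sums = defaultdict(lambda: [0.0, 0.0])
    for warehouse, category, actual, forecast in rows:
        cell = sums[(warehouse, category)]
        cell[0] += actual
        cell[1] += abs(actual - forecast)

    matrix: Dict[str, Dict[str, float]] = {}
    for (warehouse, category), (total_actual, total_error) in sums.items():
        if total_actual == 0:
            continue  # accuracy is undefined for a cell without sales
        matrix.setdefault(warehouse, {})[category] = 1 - total_error / total_actual
    return matrix


# Made-up rows for illustration.
rows = [
    ("Lyon", "Dairy", 100, 90),
    ("Lyon", "Dairy", 100, 120),
    ("Lyon", "Drinks", 50, 50),
    ("Paris", "Dairy", 80, 100),
    ("Paris", "Drinks", 40, 10),
]
matrix = accuracy_matrix(rows)
for warehouse, by_category in matrix.items():
    print(warehouse, {category: f"{value:.0%}" for category, value in by_category.items()})
```

The resulting values can then feed the `z` argument of a heatmap trace, with warehouses and categories on the two axes. In this toy data, the Paris x Drinks cell immediately stands out as the scope to investigate.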

How to start building your forecasting studio dashboard using Streamlit?

Streamlit is an open-source Python library to create shareable web apps in minutes, and it is still gaining popularity across the Data Science community. In this article, we will not present the tool, as many articles are already available on this topic, but we will focus on the easy implementation of one visualization.

The choice of Streamlit for this type of project was motivated by several advantages:

Streamlit makes an MVP dashboard very easy to set up.
It integrates fully into a Data Science project. Since your Streamlit pages are written in Python, you can use your core project's functions. For example, if you have developed a lib to access your data preparations, reference tables, and predictions, you will be able to use it directly in your dashboard code.
It is shareable with many users, by exposing your dashboard on a port of your remote machine or deploying it on a solution like App Engine, Cloud Run…

1. Global structure of your forecasting dashboard

Before diving into the implementation, the prerequisite in building a dashboard is to draw the parts of your application.

To keep your implementation clean, you can divide your code into several parts:

The core dashboard that you can launch with streamlit run forecasting_studio_app.py
The distinct pages (simple EDA on your training dataset, forecast analysis, feature contribution,…)
Lib scripts to gather your data-preparation / figures for each page

Hierarchy of your repository

The main page sets up the global structure and layout of your app: in our case, a wide layout and a sidebar that displays the name of your app and the available pages.

Root page of your application, where you can add some pages
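As a rough idea of what such a root page can look like, here is a minimal sketch. The app title, the page names and the placeholder page functions are assumptions; in a real project, each page would live in its own module and import Streamlit itself.

```python
"""forecasting_studio_app.py: root page of the dashboard (illustrative sketch)."""
from typing import Callable, Dict

APP_TITLE = "Forecasting Studio"  # hypothetical app name


# Placeholder pages, each receiving the streamlit module to draw on.
def eda_page(st) -> None:
    st.header("EDA on the training dataset")


def forecast_analysis_page(st) -> None:
    st.header("Forecast analysis")


def feature_contribution_page(st) -> None:
    st.header("Feature contribution")


# Adding a page to the app means adding one entry here.
PAGES: Dict[str, Callable] = {
    "EDA": eda_page,
    "Forecast analysis": forecast_analysis_page,
    "Feature contribution": feature_contribution_page,
}


def main() -> None:
    import streamlit as st  # imported here so PAGES can be reused without launching the UI

    # set_page_config has to be the first Streamlit command of the script.
    st.set_page_config(page_title=APP_TITLE, layout="wide")
    st.sidebar.title(APP_TITLE)
    selected = st.sidebar.radio("Pages", list(PAGES))
    PAGES[selected](st)
```

Ending the file with a call to `main()` lets `streamlit run forecasting_studio_app.py` render the sidebar and the selected page.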

The configuration file gathers the colors and the different settings shared by all your Plotly figures.

Configuration file used to normalize all your visualizations
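Such a configuration file can be as simple as a few Python constants. The sketch below is an illustration only: the color codes, the layout values and the `make_layout` helper are assumptions, written as plain Plotly layout keys.

```python
"""Shared styling for the dashboard figures (illustrative values)."""

# One color per series, reused by every figure.
COLORS = {
    "model": "#1f77b4",
    "demand_planner": "#ff7f0e",
    "actual_sales": "#2ca02c",
    "event": "#7f7f7f",
}

# Layout options applied to every figure.
BASE_LAYOUT = {
    "template": "plotly_white",
    "font": {"family": "Arial", "size": 12},
    "legend": {"orientation": "h", "y": -0.2},
    "margin": {"l": 40, "r": 20, "t": 60, "b": 40},
    "hovermode": "x unified",
}


def make_layout(title: str, **overrides) -> dict:
    """Return a copy of BASE_LAYOUT with a title and optional per-figure overrides."""
    layout = {**BASE_LAYOUT, "title": {"text": title}}
    layout.update(overrides)
    return layout


layout = make_layout("Evolution of Forecast Accuracy", hovermode="closest")
print(layout["title"], layout["hovermode"])
```

Each figure can then apply the result with `fig.update_layout(**layout)`, so a change of color or font is made in one place only.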

2. Building the “Evolution of Forecast Accuracy” visualization

First, we will gather our Plotly figures in one script:

Your figure used to track your Forecast Accuracy
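Here is a sketch of what this figure could look like. To keep it dependency-free, the function returns a Plotly figure specification as a plain dictionary, which can be turned into a figure with `plotly.graph_objects.Figure(figure_spec)`. The function name, the styling and the accuracy values are assumptions; only the `events` parameter (a list of pairs of dates) comes from our actual figure.

```python
from typing import Dict, Sequence, Tuple


def build_accuracy_figure(
    weeks: Sequence[str],
    accuracy_by_series: Dict[str, Sequence[float]],
    events: Sequence[Tuple[str, str]] = (),
) -> dict:
    """Return a Plotly figure specification (a dict with 'data' and 'layout')."""
    for name, values in accuracy_by_series.items():
        if len(values) != len(weeks):
            raise ValueError(f"series {name!r} must have one value per week")

    # One line per series, e.g. the model and the demand planners.
    traces = [
        {"type": "scatter", "mode": "lines+markers", "name": name,
         "x": list(weeks), "y": list(values)}
        for name, values in accuracy_by_series.items()
    ]
    # One shaded vertical band per event, spanning the full plot height.
    shapes = [
        {"type": "rect", "xref": "x", "yref": "paper",
         "x0": start, "x1": end, "y0": 0, "y1": 1,
         "fillcolor": "grey", "opacity": 0.2, "line": {"width": 0}, "layer": "below"}
        for start, end in events
    ]
    layout = {
        "title": {"text": "Evolution of Forecast Accuracy"},
        "xaxis": {"title": {"text": "Week"}},
        "yaxis": {"title": {"text": "Forecast accuracy"}, "tickformat": ".0%"},
        "shapes": shapes,
    }
    return {"data": traces, "layout": layout}


# Made-up accuracies for illustration.
figure_spec = build_accuracy_figure(
    weeks=["2020-12-07", "2020-12-14", "2020-12-21", "2020-12-28"],
    accuracy_by_series={
        "Model": [0.82, 0.80, 0.55, 0.70],
        "Demand planners": [0.78, 0.79, 0.60, 0.72],
    },
    events=[("2020-12-19", "2021-01-03")],  # e.g. a school-holiday period
)
print(len(figure_spec["data"]), "series,", len(figure_spec["layout"]["shapes"]), "event band")
```

Drawing events as background bands keeps them visible without hiding the accuracy curves.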

Finally, we will implement our forecast analysis page, which loads our dataset, distributes our figures across different columns, …

The page that will gather all your figures

Tada! Here is our app:

Let’s take a look at the presented figure. In light of the points discussed above, here are the key takeaways:

First, it allows us to quickly spot the weeks for which our model is not accurate: December seems to be a complicated month to forecast. The end of our backtest period is also a problem for our model. We need to discuss with the business to understand which effects are driving those weeks, and adapt the model accordingly.
The figure also has a parameter called “events”, a list of pairs of dates representing global events, which allows us to highlight events such as school holidays. We can quickly spot that our worst week is a holiday week. Perhaps some warehouses closed during this period, or customers ordered more than usual to prepare for the beginning of the year... Again, business owners can surely bring an outside perspective on those weeks.
As you can see, two series are displayed. In our case, they represent the accuracy of our model and that of the Demand Planners over the same period. Looking at the figure, we can see that the worst weeks are also complicated weeks for Demand Planners. The greatest drop in April has been well forecasted by our model. The end of the backtest period seems more complicated for our model, while demand planners’ accuracy seems stable.

Conclusion:

EDA is key when evaluating your forecasting model
Grouping all your figures in a single dashboard allows you to focus on iterations, and thus gain efficiency.
The dashboard allows you to focus on key scopes and quickly spot areas for improvement.
You have just implemented the core structure of your dashboard, so go ahead and add new figures!

Once you have built your visualization tool, it is time to deploy it. Here is a great resource to share your app: How to deploy and secure your Streamlit App in GCP?

Thanks a lot for reading this far. Do not hesitate to reach out if you have any questions. You can find more about our projects by visiting our blog.