Forecasting: Unlocking Value In The Supply Chain

Egor Howell
Gousto Engineering & Data
7 min read · Jun 26, 2023

How forecasting can aid in resource planning

Meme by author.

Background

How many data science buzzwords do you know? Let me list my personal favourites: AI, deep learning, machine learning, large language models, and neural networks. One thing you don’t hear mentioned a lot is forecasting, which is arguably one of the more powerful fields as it enables you to predict the future.

Forecasting = Predicting a Future Event or Trend

At Gousto, we have a tribe/squad structure, and I am lucky enough to be part of the forecasting squad! And in this post, I want to discuss the purpose and the questions we answer in forecasting.

I am also actually part of the optimisation squad :)

Why Gousto is Unique

The key difference between Gousto and other organisations is that our product, the set of recipes available, changes every menu week. On some menu weeks, there may be a katsu chicken curry, and on other weeks it may not be available (sad face).

A single menu week runs from Saturday to Friday and customers have until three days before their specified delivery date to choose an order.

As you can imagine, this weekly changing menu creates a challenge in determining the required number of stock-keeping units (SKUs), ingredients, and materials we need for each week to fulfil every order whilst minimising waste to reduce our carbon footprint.

Fortunately, forecasting comes in and saves the day! Through the use of forecasting, data science, and machine learning (sorry for the buzzwords), we can generate predictions that provide the operational, supply, and demand planning teams with the necessary information to allocate resources accurately.

Order Demand

Question

The first question we answer in the forecasting squad is:

How many customers will order a box?

If we know how many customers will order a Gousto box for a given menu week, we can effectively plan our factories, labour, and other resources.

We also break this total order volume forecast into:

  • Daily level forecast
  • How many boxes with 2, 3, 4, and 5 recipes
  • How many 2 and 4-person boxes

Answer

So, how do we predict the order volumes for future menu weeks? We use the traditional time series forecasting model of ARIMA. ARIMA stands for AutoRegressive Integrated Moving Average and is a combination of three components:

  • AutoRegressive (AR): Forecasting future order volumes through a linear combination of previous order volumes.
  • Integrated (I): The number of times the time series is differenced to make it stationary. The AR and MA components assume a stationary series, which is why this step is needed.
  • Moving Average (MA): Instead of using the actual observed values, this component utilises past forecast errors to aid in predictions.

These components are then combined linearly to form the ARIMA model:

Equation generated by author in LaTeX.
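
In standard textbook notation, and using the symbols defined below, an ARIMA(p, d, q) model can be written as follows. The constant c is part of the usual form but does not appear in the list below, so treat it as an assumption:

```latex
y'_t = c + \phi_1 y'_{t-1} + \dots + \phi_p y'_{t-p}
         + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q}
         + \varepsilon_t
```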

Where:

  • y’: The differenced time series, which is obtained by taking the difference between adjacent observations in the original time series.
  • ϕ: The coefficients of the autoregressive components (lags), which correspond to the lags of the previous menu weeks’ order volumes. They determine the scale of impact of certain past order volumes on the forecast.
  • p: The number of historical weekly order volumes we use in the model, which is the number of autoregressive components.
  • ε: The forecast error terms, typically assumed to be normally distributed.
  • θ: The coefficients of the lagged forecast errors. They determine the influence of past forecast errors on the current prediction.
  • q: The number of lagged error components we include in the model, which is the number of moving-average components.

To be honest, don’t worry too much about all these mathematical details (I certainly don’t); they are just here for completeness. The key takeaway is that we forecast future order volumes from some combination of past observed order volumes and past forecast errors.
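
To make that takeaway concrete, here is a minimal sketch in plain Python of the AR and I parts only. It differences a weekly series once, fits a single autoregressive coefficient by least squares, and forecasts the next week. The function names and the order volumes are made up for illustration, the MA terms are left out, and this is not our production model; in practice a forecasting library would do the fitting.

```python
def difference(series):
    """First-order differencing: y'_t = y_t - y_{t-1}."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


def fit_ar1(diffed):
    """Least-squares estimate of c and phi in y'_t = c + phi * y'_{t-1}."""
    x = diffed[:-1]  # lagged values y'_{t-1}
    y = diffed[1:]   # current values y'_t
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        # Lagged values do not vary, so fall back to a constant-only model.
        return mean_y, 0.0
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    return c, phi


def forecast_next(series):
    """One-step-ahead forecast: predict the next difference, then undo the differencing."""
    diffed = difference(series)
    c, phi = fit_ar1(diffed)
    next_diff = c + phi * diffed[-1]
    return series[-1] + next_diff


# Hypothetical weekly order volumes (not real Gousto data).
weekly_orders = [100, 108, 112, 114, 115]
print(round(forecast_next(weekly_orders), 2))  # 115.5
```

The differences here are 8, 4, 2, 1, so each one is half the previous one. The fitted coefficient comes out at 0.5 and the forecast adds 0.5 to the last observed value.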

If you want a more detailed explanation of how ARIMA works, you can check out my previous blog on this model here.

In reality, we also incorporate additional exogenous variables into the forecast to improve the prediction. These variables include seasonal effects, bank holidays, and wider Gousto events. By considering these factors, we further refine our forecasting model to take into account any odd occurrences that happen throughout the year.
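
As one illustration of an exogenous variable, a bank-holiday indicator can be built per menu week. The sketch below assumes the Saturday-to-Friday menu week described earlier. The function names are hypothetical, the dates are just example input, and the way such a flag is fed into the real model is not covered here.

```python
from datetime import date, timedelta


def menu_week_start(day):
    """Return the Saturday that starts the menu week containing `day`."""
    # date.weekday(): Monday = 0, ..., Saturday = 5, Sunday = 6
    days_since_saturday = (day.weekday() - 5) % 7
    return day - timedelta(days=days_since_saturday)


def bank_holiday_flags(week_starts, bank_holidays):
    """Return 1 for each menu week that contains a bank holiday, else 0."""
    holiday_weeks = {menu_week_start(h) for h in bank_holidays}
    return [1 if start in holiday_weeks else 0 for start in week_starts]


# Three consecutive menu weeks (each starting on a Saturday) and one example holiday.
weeks = [date(2023, 8, 19), date(2023, 8, 26), date(2023, 9, 2)]
holidays = [date(2023, 8, 28)]
print(bank_holiday_flags(weeks, holidays))  # [0, 1, 0]
```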

Oh and one last thing, we call this model the Order Volume Forecast (OVAF)!

Recipe Popularity

Question

The second question we answer is:

What recipes will customers pick?

If we know how popular each recipe will be for a menu week, we can estimate how much of each ingredient we will need. For example, if a tofu dish is the most popular choice among customers while a lamb dish is the least popular, we should order more tofu than lamb. This way, we can meet the demands of our customers and fulfil every order!

Answer

To determine the popularity of a recipe, we approach it as a regular supervised machine learning problem. The target variable of this model is the historical popularity of a recipe (for a given menu week), denoted as P_i:

Equation generated by author in LaTeX.
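
Read together with the definitions that follow, the popularity of recipe i is its share of all recipes sold in the menu week. A sketch of that formula, assuming the sum runs over all n recipes on the menu:

```latex
P_i = \frac{R_i}{\sum_{j=1}^{n} R_j}
```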

Here R_i refers to the quantity of recipe i sold (in a given menu week). So what we have is a discrete probability distribution, P(i), with n recipes.
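
As a small worked example, the target can be computed by dividing each recipe's sales by the total for the menu week. The function name and the sales figures below are made up for illustration:

```python
def popularity_shares(units_sold):
    """Turn per-recipe sales R_i into shares P_i that sum to 1."""
    total = sum(units_sold.values())
    if total == 0:
        raise ValueError("no recipes were sold, so shares are undefined")
    return {recipe: units / total for recipe, units in units_sold.items()}


# Hypothetical sales for one menu week with n = 3 recipes.
sales = {"katsu chicken curry": 600, "tofu stir-fry": 300, "lamb tagine": 100}
print(popularity_shares(sales))
```

With these figures the shares are 0.6, 0.3 and 0.1.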

Our features for this problem are anything that can inform us of the popularity ahead of time. These include:

  • Star rating: Users can rate each recipe on the app
  • Seasonality: Some dishes sell more in winter than in summer, and vice versa
  • Diet type: Vegetarian, vegan, pescatarian, etc.
  • Surcharge: Our “Save & Savour” range costs 50p less per portion!
  • Description word embeddings: Employing a word2vec-like approach for the recipe descriptions
  • Nearest neighbour embeddings: Using word2vec for the nearest recipe
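
One simple way to turn features like these into a numeric row for a tabular model is sketched below. The category lists (including "meat"), the argument names, and the tiny embedding are all assumptions for illustration, and one-hot encoding is just one option; this is not a description of our actual feature pipeline.

```python
# Hypothetical category lists; the real ones are not given in this post.
SEASONS = ["winter", "spring", "summer", "autumn"]
DIET_TYPES = ["vegetarian", "vegan", "pescatarian", "meat"]


def one_hot(value, categories):
    """Encode a categorical value as a list of 0.0/1.0 indicators."""
    return [1.0 if value == category else 0.0 for category in categories]


def recipe_features(star_rating, season, diet_type, surcharge_pence, description_embedding):
    """Build one flat feature row for a recipe in a given menu week."""
    return (
        [float(star_rating)]
        + one_hot(season, SEASONS)
        + one_hot(diet_type, DIET_TYPES)
        + [float(surcharge_pence)]
        + [float(x) for x in description_embedding]
    )


# A made-up recipe: 4.5 stars, winter, vegan, 50p cheaper per portion, 3-dim embedding.
row = recipe_features(4.5, "winter", "vegan", -50, [0.12, -0.40, 0.33])
print(len(row))  # 13
```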

With these features and this target variable, we have a tabular dataset on our hands. As most data scientists know, the go-to algorithm for tabular data is gradient-boosted trees. Therefore, we train a LightGBM model on our data. Now, we can input a recipe along with its features, and the model will predict its popularity for that menu week!

I have omitted some other technical aspects, such as making the recipes’ popularity menu week independent and how the final popularity distribution is calculated. While these steps are important to the internal process, they are not necessary for gaining an intuition of how the model is working.

Plot showing the distribution of recipe popularity as a function of the number of recipes on the menu. Plot generated by author in Python.

To learn more about our recipe popularity model (RPD), which we call the Uptaker, you can find our talk about it at last year’s data science festival on YouTube here.

Order Simulator

Question

Finally, we arrive at our last question:

What will be in a customer’s box?

Answer

Create synthetic orders!

We can sample recipes from the recipe popularity distribution to make ‘simulated boxes’, and the number of boxes we need to simulate comes from the order volume forecast! This allows us to effectively ‘predict’ (or forecast) what the customers will order for a given menu week, so we can plan our resources effectively.
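
The idea can be sketched in a few lines of plain Python. The function names, the popularity values, and the box counts are all made up, and the sketch assumes a box never contains the same recipe twice; the real simulator involves steps not covered in this post.

```python
import random


def sample_box(popularity, n_recipes, rng):
    """Draw n_recipes distinct recipes, weighted by their popularity."""
    remaining = dict(popularity)
    box = []
    for _ in range(n_recipes):
        names = list(remaining)
        weights = [remaining[name] for name in names]
        pick = rng.choices(names, weights=weights, k=1)[0]
        box.append(pick)
        del remaining[pick]  # assumption: no duplicate recipes within a box
    return box


def simulate_orders(popularity, boxes_by_recipe_count, seed=0):
    """Simulate boxes; the counts per box size come from the order volume forecast."""
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    orders = []
    for n_recipes, n_boxes in boxes_by_recipe_count.items():
        for _ in range(n_boxes):
            orders.append(sample_box(popularity, n_recipes, rng))
    return orders


# Hypothetical popularity distribution and forecast: three 2-recipe boxes, two 3-recipe boxes.
popularity = {
    "katsu chicken curry": 0.4,
    "tofu stir-fry": 0.3,
    "veggie chilli": 0.2,
    "lamb tagine": 0.1,
}
for box in simulate_orders(popularity, {2: 3, 3: 2}):
    print(box)
```

Summing the simulated boxes per recipe then gives an estimate of how many portions of each recipe to plan for.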

The general flow and connection of these forecasting models then look like this:

Plot by author.

You might be wondering:

Why is it necessary for us to predict the customer boxes?

The reason is that most customers tend to select their recipes at the last minute, but we need to plan our resources well in advance of that timeframe. Consequently, these ‘simulated orders’ provide us with a guide on what the orders will potentially look like for a particular menu week.

Furthermore, we need to know the contents of a box (the recipes) to send them to the right factory to ensure the order is fulfilled. A factory may only host half of the recipes listed inside a box, so it’s no good sending that order there! This is why we need to forecast at the box level so we make the right choice (optimisation) about where the order will go.
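
As a rough sketch of that check, a box can only go to a factory that hosts every recipe in it. The factory names, recipe sets, and function name below are hypothetical, and the actual routing optimisation takes more into account than this simple filter:

```python
def eligible_factories(box, factory_recipes):
    """Return the factories that host every recipe in the box."""
    needed = set(box)
    return [name for name, hosted in factory_recipes.items() if needed <= hosted]


# Hypothetical factories, each hosting only part of the menu.
factory_recipes = {
    "factory_a": {"katsu chicken curry", "tofu stir-fry"},
    "factory_b": {"tofu stir-fry", "veggie chilli", "lamb tagine"},
}

print(eligible_factories(["tofu stir-fry", "lamb tagine"], factory_recipes))
# ['factory_b']
print(eligible_factories(["katsu chicken curry", "lamb tagine"], factory_recipes))
# []
```

An empty result means no single factory can fulfil the box on its own.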

Summary & Further Thoughts

Forecasting is still a niche and relatively underexplored area of data science, but it is of high importance to us at Gousto. Because our recipes change every week, we need to accurately predict what our customers’ boxes will look like to ensure we fulfil every order.

References

  • If this post has inspired you to learn some forecasting then this book is a fantastic starting point on your journey: https://otexts.com/fpp3/

(All emojis designed by OpenMoji — the open-source emoji and icon project. License: CC BY-SA 4.0)
