Reducing Food Waste Through Data Science
A few days ago we had the chance to host a Dataiku meetup at our Gousto offices, during which we discussed three different use cases for forecasting. Giving a talk is always a good opportunity to gather one’s thoughts, so I thought I would also write them down in blog post form! Below you can find a summary of how we do forecasting at Gousto and why we think it’s such a big deal.
Why is forecasting so important to us?
Part of why Gousto is so convenient as a service is the fact that we help our customers minimise their level of food waste through the use of exactly portioned ingredients. While helping reduce food waste in our customers’ homes is already a big step forward, it is also crucial that we consider food waste as a business.
At Gousto, we strongly believe in the need to build a sustainable company. We have many different initiatives, including our pledge to reduce plastic by 50% by the end of this year. Alongside these efforts, we work hard on minimising the amount of food waste we create as a business. We buy fresh ingredients and want to make sure these don’t go to waste in our factory. Our forecasting process is a crucial part of keeping waste as low as possible, but as you can imagine, predicting the future is not an easy task!
With a short lead time of 3 days between orders being finalised and delivery, it is important that we provide our suppliers with accurate predictions of how many ingredients we will need on any given day. In our particular case, challenges come from several places.
On the one hand, our menus are different every week. This means that it is hard to predict how a recipe will perform, as this depends on the other recipes on the menu. We like to describe this in terms of a footballer joining a new team: their performance might be very different from what it was in their old team because of the abilities of the players around them. A recipe’s popularity will also depend on the season: salads will perform better in summer than in winter, whereas stews are likely to be ordered more during the cold season.
On the other hand, we need to find the sweet spot between avoiding a large amount of food waste and not disappointing our customers with sold-out recipes. We could easily make sure that all our recipes were always fully stocked by purposefully overforecasting, but that would produce a lot of waste. We are therefore working on finding the right balance between these two opposing objectives.
So how do we actually forecast?
We need to break the forecast down into three main levels, which are used by different parts of the business.
Box level: the first step is to forecast the number of orders on any given day.
Recipe level: we then need to break this down into how many portions of each recipe we think we will sell on a certain day.
Ingredient level: finally, in order to contact our suppliers, we need to break the recipe forecast down to an ingredient level.
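To make the three levels concrete, here is a minimal sketch of how a box forecast could be broken down into recipe portions and then into ingredient quantities. Every name and number in it (the recipe shares, the recipes per box, the quantities per portion) is an invented assumption for illustration, not Gousto data or Gousto's actual pipeline.

```python
# Illustrative sketch only: all names and numbers below are assumptions.
RECIPES_PER_BOX = 3       # assumed average number of recipes in a box
PORTIONS_PER_RECIPE = 2   # assumed number of portions per recipe

# Recipe level input: assumed share of all recipe choices going to each recipe.
recipe_share = {"chicken_tray_bake": 0.6, "mushroom_risotto": 0.4}

# Ingredient level input: assumed grams of each ingredient per portion.
grams_per_portion = {
    "chicken_tray_bake": {"chicken": 150, "honey": 10},
    "mushroom_risotto": {"mushroom": 100, "rice": 80},
}


def recipe_forecast(boxes, shares):
    """Split the total number of portions across recipes by their share."""
    total_portions = boxes * RECIPES_PER_BOX * PORTIONS_PER_RECIPE
    return {recipe: round(total_portions * share) for recipe, share in shares.items()}


def ingredient_forecast(portions, quantities):
    """Sum the grams needed of each ingredient over all recipes."""
    totals = {}
    for recipe, n_portions in portions.items():
        for ingredient, grams in quantities[recipe].items():
            totals[ingredient] = totals.get(ingredient, 0) + n_portions * grams
    return totals


box_forecast = 1000  # box level: assumed number of orders for the day
portions = recipe_forecast(box_forecast, recipe_share)
ingredients = ingredient_forecast(portions, grams_per_portion)
```

Each level feeds the next, which is why an error in the box forecast carries through to every ingredient order.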
We also separate our forecasts into short term and medium term. This refers to whether we are forecasting for a menu that customers can currently order from on the website or for a future menu. The difference is that we have more live data available for menus which are currently on the website.
Short Term Forecasting
Order number: to calculate the number of orders in a given active week, we use a technique called cohort analysis. This is based on splitting our customers into groups depending on when they signed up to our service and predicting how many of those will still be ordering in future weeks. We model this behaviour based on what marketing channels those customers were acquired from and which discount strategy they landed on. As the weeks pass, the behaviour of older cohorts (who have been with us for longer) becomes more predictable and our confidence in the prediction increases.
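As a rough illustration of the cohort idea, the sketch below sums, over all sign-up cohorts, the cohort size multiplied by the share of that cohort expected to still be ordering. The retention curves, cohort sizes and channel names are made up, and keying the curve on marketing channel alone is a simplifying assumption (the discount strategy is left out).

```python
# Illustrative sketch only: cohorts and retention curves are invented.
cohorts = [
    {"signup_week": 1, "size": 1000, "channel": "referral"},
    {"signup_week": 3, "size": 500, "channel": "paid_social"},
]

# Assumed share of a cohort still ordering N weeks after signing up (index = N).
retention = {
    "referral": [1.0, 0.6, 0.5, 0.4],
    "paid_social": [1.0, 0.5, 0.3, 0.2],
}


def expected_orders(cohorts, retention, target_week):
    """Expected number of orders in target_week, summed over all cohorts."""
    total = 0.0
    for cohort in cohorts:
        age = target_week - cohort["signup_week"]
        if age < 0:
            continue  # this cohort has not signed up yet
        curve = retention[cohort["channel"]]
        # Beyond the end of the curve, hold the last known rate flat.
        rate = curve[min(age, len(curve) - 1)]
        total += cohort["size"] * rate
    return total


orders_week_4 = expected_orders(cohorts, retention, target_week=4)
```

In practice the retention curves would be estimated from historical behaviour rather than typed in by hand.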
Recipe popularity: for menus that are already available, we predict recipe popularity using live data on what customers who have already placed orders for that menu are choosing. For example, we might see that the ‘Honey Mustard Chicken Tray Bake’ is in almost 8% of our customers’ baskets, whereas the ‘Creamy Two-Mushroom Risotto’ is in under 2% of baskets. This gives us a pretty good view of how our different recipes are performing and how many of each we should purchase.
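A simple way to picture this is to compute each recipe's share of the baskets placed so far and scale it up to the forecast number of boxes. The sketch below does exactly that with invented baskets. It assumes that customers who order later choose in the same proportions as those who have ordered already, which may not hold in practice.

```python
from collections import Counter


def basket_shares(baskets):
    """Share of the baskets placed so far that contain each recipe."""
    if not baskets:
        return {}
    counts = Counter(recipe for basket in baskets for recipe in set(basket))
    return {recipe: n / len(baskets) for recipe, n in counts.items()}


def project_recipe_demand(baskets, forecast_boxes):
    """Scale the live basket shares up to the forecast number of boxes."""
    shares = basket_shares(baskets)
    return {recipe: round(share * forecast_boxes) for recipe, share in shares.items()}


# Invented live data: each inner list is one customer's basket.
live_baskets = [
    ["tray_bake", "risotto"],
    ["tray_bake", "curry"],
    ["curry", "stew"],
    ["tray_bake", "stew"],
]
demand = project_recipe_demand(live_baskets, forecast_boxes=1000)
```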
Medium Term Forecasting
This is where things really start to get interesting! For medium term forecasting, menus are not yet available on the website, which means that we do not have live data about how many orders we currently have for the week or which recipes people are picking from.
Order number: to estimate the number of boxes, we use Prophet, an open-source package from Facebook. It is simple to use and allows us to take different inputs into account. These are:
- General business trend: are we getting more boxes than we were getting a year or two ago? (Yes, we are!)
- Yearly contribution: as you can imagine, the use of Gousto is highly correlated with the time of year. In January, we see a big spike as people come back from Christmas full of New Year’s resolutions, including eating more healthily and having more home-cooked meals. In summer, however, kids are off school, which means less routine, and it becomes harder to fit Gousto in.
- Holidays: finally, we can also take holidays into account. One of the biggest challenges when forecasting is predicting what is going to happen around big events such as Christmas and Easter. People go on holiday and our numbers drop significantly from one week to the next. Prophet allows us to supply the dates on which holidays fall, and it will predict their effect based on what happened in past years.
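Prophet's default model adds these components together: forecast = trend + yearly seasonality + holiday effects. The sketch below is not Prophet's API. It is a hand-written toy version of that additive structure, with every parameter (growth rate, seasonal amplitude, holiday effect) invented and hard-coded, whereas Prophet would estimate them from historical data.

```python
import math
from datetime import date

# Illustrative sketch only: all parameters below are made-up assumptions.
START = date(2019, 1, 1)
HOLIDAY_EFFECTS = {date(2019, 12, 25): -300.0}  # assumed drop in boxes on Christmas Day


def trend(day, base=1000.0, growth_per_day=1.0):
    """General business trend: steady linear growth from the start date."""
    return base + growth_per_day * (day - START).days


def yearly(day, amplitude=100.0):
    """Yearly contribution: a cosine that peaks on 1 January and dips mid-year."""
    day_of_year = day.timetuple().tm_yday
    return amplitude * math.cos(2 * math.pi * (day_of_year - 1) / 365.25)


def holiday(day):
    """Holiday effect: a fixed adjustment on the listed dates, zero otherwise."""
    return HOLIDAY_EFFECTS.get(day, 0.0)


def forecast(day):
    """Additive forecast: trend + yearly seasonality + holiday effect."""
    return trend(day) + yearly(day) + holiday(day)


new_year_boxes = forecast(date(2019, 1, 1))
christmas_boxes = forecast(date(2019, 12, 25))
```

Keeping the components separate is what makes the forecast easy to inspect: each one can be examined on its own to see what is driving a given week's number.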
Recipe popularity: the prediction of how recipes are going to perform is based on the properties of those recipes. For existing recipes (ones which have previously appeared on the menu), we use a simple random forest model, with certain recipe properties, such as dish type and cuisine, as features.
New recipes (ones which haven’t appeared on a menu before) are more challenging. However, we know that similar recipes tend to perform in a similar way, so it’s important for us to be able to work out how similar any two given recipes are.
To do this, we have uploaded all our recipe and ingredient data to the graph database Neo4j. This allows us to represent the relations between all the recipes and ingredients and to capture many of their attributes. It is a curated, original data set that we have worked on with teams across the business, including our Food and Digital Product teams. Using this information, we can calculate a similarity metric between any two given recipes, based on how many connections they share. When we have a new recipe, we can find the existing recipes that are most similar to it and look at how they performed in previous menus. We have found that this method outperforms the random forest model for newer recipes.
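As an illustration of similarity based on shared connections, the sketch below uses plain Python sets in place of the graph database and Jaccard similarity (shared connections divided by total distinct connections) as the metric. Both are assumptions made for the example: the exact metric used at Gousto is not specified here, and the recipes, connections and past basket shares are invented.

```python
# Illustrative sketch only: recipes, connections and shares are invented.
# Each existing recipe maps to the nodes it is connected to (ingredients, cuisine, dish type).
recipe_links = {
    "chicken_curry": {"chicken", "rice", "curry", "indian"},
    "veg_curry": {"rice", "curry", "indian", "chickpeas"},
    "beef_stew": {"beef", "potato", "stew", "british"},
}

# Assumed share of baskets each existing recipe achieved on past menus.
past_share = {"chicken_curry": 0.08, "veg_curry": 0.04, "beef_stew": 0.03}


def similarity(links_a, links_b):
    """Jaccard similarity: shared connections over all distinct connections."""
    union = links_a | links_b
    return len(links_a & links_b) / len(union) if union else 0.0


def predict_share(new_links, k=2):
    """Average the past share of the k existing recipes most similar to the new one."""
    ranked = sorted(
        recipe_links,
        key=lambda name: similarity(new_links, recipe_links[name]),
        reverse=True,
    )
    nearest = ranked[:k]
    return sum(past_share[name] for name in nearest) / len(nearest)


new_recipe_links = {"prawns", "rice", "curry", "indian"}
predicted = predict_share(new_recipe_links)
```

Here the new prawn curry shares three of its four connections with each existing curry and none with the stew, so its predicted share is the average of the two curries.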
I hope this gives you a flavour of forecasting at Gousto and the type of things we need to take into account when predicting demand. This was a summary of the method, but we think that the way we have integrated the forecasting algorithm into the wider business is also interesting to discuss. I will be writing about this in another post coming soon!
Until then, I leave you with a fun quote from Niels Bohr:
Prediction is very difficult, especially if it’s about the future!