Gousto R-series Vol 2: Tackling the Cold-Start Problem in Recipe Recommendation Engine

Published in Gousto Engineering & Data · 10 min read · Nov 22, 2022

As mentioned in the previous post in Gousto’s R-series, customers at different stages of their journey with Gousto have different needs and hence are served differently by our recommendation engine. It could be a prospective customer who is still deciding whether or not to try out Gousto, a customer who has just purchased their first box and is still exploring what we have to offer them, or one of our most loyal customers who likes to open the app every Tuesday at noon to see what surprises Gousto is bringing that week.

In this volume of the R-series, we’ll go a bit further into the details of how we tried, sometimes failed, and eventually solved the cold-start problem for customers who have just ordered their first Gousto box using Rouxcommender, our recipe recommendation engine.

*Acknowledgement: I’d like to thank many Goustonians who helped in the battles against the cold-start problem, notably: Sheng Chai, Ross Warren, Jonny Baker, Agnes Garoux, Dana Fonareva, Barry Pace, Kelly Batchelor, Nick Koronka and other former/current members of Turnips.

But firstly, what is the cold-start problem?

Simply put, a cold-start for a recommender system happens when the model doesn’t have enough data about either new customers or new products to make a decision: to recommend or not?

For Gousto, as we are a subscription-based business, our customer base grows very quickly, and as a result, our recommender systems suffer from a user cold start, i.e. we don’t know much about customers’ tastes in their early stages at Gousto.

Another challenge is the product cold start, as we update our menus and recipes very often. Each week’s menu is unique and massively different from the one last week. However, in this post, we will cover the user cold-start problem first.

Why is it important?

Retention: at Gousto, we all believe that business value can only come from great customer experience. Ensuring our customers have the best experience in choosing recipes can help to keep them using our products longer, not to mention that this early-life customer segment makes up a big portion of our customer base.

Improve trust in our recommendations: for customers who have made their first order, Gousto has a bespoke section called “Chosen For You” that displays the top 15 recipes recommended by our Recommender System. In our user research interviews, we repeatedly found that customers don’t choose from this section, even though the recipes they finally choose do come from the top 15 recommended recipes (which are displayed in multiple sections). A lack of trust due to early bad experiences with the Chosen For You section could be one reason for this paradox.

Non-choosing customers: around 8–10% of our customers each week don’t browse the menu and simply trust our recommender system to deliver the best boxes of recipes for them. The issue is that the majority of customers in this group are in their early stages, therefore without solving the customer cold start problem, this group of customers could receive a sub-optimal selection of recipes (I did once receive a box of 2 burgers while forgetting to choose in my early boxes, but let’s leave this for a future blog post).

What have we tried?

Recipe Battles

We tried to collect customers’ taste preferences right at the beginning: the idea was to ask customers, after they ordered their first box, to join a small game called recipe battles. They would be asked 5 questions (battles), each a choice between 2 contrasting recipes (say, an American Burger vs a Steamed Fish). We collected customer feedback from both Web and iOS platforms and did some offline evaluations (a demo of how it works is below).

Recipe Battle Demo [credits to Rianne McCartney for this]

The data from recipe battles did help our recommender system to know more about the customers, but interestingly, it didn’t contain as much information as the first order itself. This led us to the second iteration, Rouxcommender Jnr.

Rouxcommender Jnr

We already described Rouxcommender Jnr in the previous blog post, but here is a recap. We built an item-based recommender system that recommends recipes relevant to the first box of our customers. By relevance here we mean using the wisdom of the crowd, i.e. “customers who like this also like that”. The visualisation below shows how Roux Jnr works. With this approach, we could generate personalised recommendations for customers just over 1 second after they made their first order.

This approach is essentially classic item-based collaborative filtering and hence suffers from the exact same problem: product cold start. As Gousto menus change week on week, new recipes didn’t have interaction data, i.e. they had no past orders, and hence weren’t ranked highly for our customers.
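To make the product cold start concrete, here is a minimal sketch of item-based collaborative filtering built on simple co-occurrence counts. It is an illustration only, not the Rouxcommender Jnr implementation: the function names, recipe names and order history are all made up, and a real system would normalise the counts (for example with cosine similarity).

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(boxes):
    """Count how often each pair of recipes appears in the same box."""
    counts = defaultdict(lambda: defaultdict(int))
    for box in boxes:
        for a, b in combinations(sorted(set(box)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(first_box, menu, counts, top_n=3):
    """Rank menu recipes by co-occurrence with the recipes in the first box."""
    scores = {}
    for candidate in menu:
        if candidate in first_box:
            continue
        scores[candidate] = sum(counts[r][candidate] for r in first_box)
    # Sort by score (descending), then by name so ties have a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Hypothetical historical boxes (recipe names are invented for illustration).
history = [
    ["burger", "pad_thai", "fish_curry"],
    ["burger", "pad_thai"],
    ["fish_curry", "veggie_chilli"],
    ["burger", "veggie_chilli"],
]
counts = build_cooccurrence(history)

# "new_ramen" is on this week's menu but has never been ordered before.
menu = ["pad_thai", "fish_curry", "veggie_chilli", "new_ramen"]
ranking = recommend(["burger"], menu, counts, top_n=4)
for recipe, score in ranking:
    print(recipe, score)
```

In this toy data the never-ordered recipe scores zero and lands at the bottom of the ranking, however well it might suit the customer. That is the product cold start in miniature.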

This explains the bump in the middle of the chart below (the bump appears for new recipes). The X-axis shows the recommended rank of the recipes and the Y-axis shows the corresponding uptake of the recipes at that rank for a sample of our customers.

Recipe Uptake by Personalised Recommendation Rank — Rouxcommender Jnr.

The solution

Okay, we tried a few things and had a certain amount of success, but still did not quite solve the problem. However, we collected quite a few important insights from these attempts.

Implicit data is more useful than explicit data, not just because its coverage is higher (all customers have at least one order, but fewer than 25% of them are willing to fill in a survey), but also because of the nature of this type of data.
A box is enough: previously we represented our customers by their ids, and the job of the recommender system was to position these ids correctly so that customers would be recommended relevant recipes based on where they sit in our customer map. We also know that a big proportion of our weekly boxes are unique (60+%), and this is excellent news for a recommender system, as it means we can represent a customer by the boxes they ordered.

From this we made 2 changes to our recommender system, moving from Rouxcommender-V1 to Rouxcommender-V2.

1. Changing customer representation

We define a customer by their data-self instead of their id. Previously, Rouxcommender V1 tried to learn the customer id embeddings (a numeric representation of our customer, like coordinates in a map) based on what they ordered in the past. So every time a customer buys something (or doesn’t buy something), we change their id position on the map so that a group of similar customers will stay close together.

However, to be able to do this, we need to maintain a big list of all customer ids (as we don’t know who will order next week) and this list grows very quickly, in line with Gousto’s customer base. We did some analysis and reduced this list to a stable yet manageable number of customers who we are highly confident will order next week (from 1.6 million to around 500K weekly). However, this approach also suffers another, bigger, challenge: how do we know the new customer’s id to generate recommendations for them?

The solution is to represent a customer by their behaviours. The behaviours could be purchasing (i.e. what they have ordered in the past), ratings, likes/dislikes or viewing. For now, we only focus on ordering patterns but will explore other types of customer interactions in future.

We also rebuilt the model architecture to enable such customer representation from recipes using the bi-encoder pattern, similar to what was proposed in AWS Sagemaker’s Object2Vec. Our left encoder is a bi-RNN model that captures customers’ previous behaviours as sequences of recipe embeddings, and the right encoder is the combination of content-based (green) and behaviour-based recipe embeddings (yellow). The reason why we used content-based recipe embeddings in the right encoder is to make sure that it doesn’t suffer from a product cold start. All recipe embeddings were pre-trained and frozen to take advantage of transfer learning. A high-level model architecture is captured in the diagram below.

Model Architecture for Roux-V2 (illustration purpose only)

This model architecture has helped solve the problem of “unknown id” customers. Any customer with at least one order can now be served very relevant recommendations. It also reduces our training time massively: because we no longer need to store and fine-tune a massive list of customers, training time has been cut by a factor of 5.
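The two-encoder idea can be sketched in a few lines of plain Python. This is a deliberately simplified stand-in, not the Roux-V2 model: mean pooling replaces the bi-RNN customer encoder, the right encoder is reduced to concatenating content-based and behaviour-based vectors, and every embedding value and recipe name below is invented rather than pre-trained.

```python
def recipe_vector(recipe_id, content_emb, behaviour_emb):
    """Right-encoder stand-in: concatenate content and behaviour embeddings.

    A brand-new recipe has no behaviour embedding yet, so that half is zeros.
    """
    behaviour_dim = len(next(iter(behaviour_emb.values())))
    behaviour = behaviour_emb.get(recipe_id, [0.0] * behaviour_dim)
    return content_emb[recipe_id] + behaviour  # list concatenation


def customer_vector(ordered_recipes, content_emb, behaviour_emb):
    """Left-encoder stand-in: mean-pool the vectors of past recipes.

    (The article describes a bi-RNN over the sequence; mean pooling is an
    assumption made here to keep the sketch short.)
    """
    vectors = [recipe_vector(r, content_emb, behaviour_emb) for r in ordered_recipes]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def score(customer, recipe):
    """Dot product between the customer and recipe representations."""
    return sum(c * r for c, r in zip(customer, recipe))


# Made-up 2-d embeddings, purely for illustration.
content_emb = {
    "veggie_chilli": [1.0, 0.8],
    "fish_curry": [0.0, 0.9],
    "burger": [0.0, 0.1],
    "new_veggie_ramen": [1.0, 0.6],  # new recipe: content features only
}
behaviour_emb = {
    "veggie_chilli": [0.5, 0.1],
    "fish_curry": [0.2, 0.7],
    "burger": [-0.6, 0.3],
}

# A customer with a single order and no id embedding at all.
customer = customer_vector(["veggie_chilli"], content_emb, behaviour_emb)
candidates = ["fish_curry", "burger", "new_veggie_ramen"]
ranked = sorted(
    candidates,
    key=lambda r: score(customer, recipe_vector(r, content_emb, behaviour_emb)),
    reverse=True,
)
print(ranked)
```

Two properties of the design show up even in this toy version: the customer is built entirely from what they ordered, so no table of customer ids is needed, and a recipe with no order history can still be scored through its content features.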

However, the next question is: how quickly could we update the recommendations for the customers?

2. Synchronous recommendations

Ever since Rouxcommender-V1, we had been running in batch mode, meaning that each week we generated recommendations for all the customers we knew about that week. This has some disadvantages: firstly, customers who have just joined this week might not get the most up-to-date recommendations until a week later. Similarly, existing customers who made some orders this week wouldn’t have their recommendations updated. For example, if it is January and I’d like to try a month of Veggie recipes, the system wouldn’t be able to update its recommendations and I would still see lots of Fish Curries or Chicken Pad Thai.

I had a quick chat with Ross, our lead software engineer: “Ross, the model is ready but we can’t get the best of it without going real-time” and then that was the first time we used the terms “Synchronous recs”.

So indeed we went real-time! There were some concerns at first, as it might be too risky for our menu experience, but we agreed on 2 principles:

Low latency: model latency needs to be as low as possible, e.g. should not be more than 400 ms. We now have a latency of around 120+ ms.
Robustness: we need a good backup in case the endpoint fails or times out. In this case, we can fall back to the previous recommendations or the default recommendations (Gousto Recommends).
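The two principles can be combined in a small wrapper: call the model with a latency budget, and serve a fallback if the call fails or overruns. The sketch below is one possible way to do this with Python's standard library, not a description of Gousto's actual service. The function names, the stand-in models and the fallback list are all hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

LATENCY_BUDGET_S = 0.4  # the 400 ms ceiling mentioned above

_executor = ThreadPoolExecutor(max_workers=4)


def recommend_with_fallback(call_model, customer_id, fallback,
                            budget_s=LATENCY_BUDGET_S):
    """Return (recommendations, source); source is "live" or "fallback"."""
    future = _executor.submit(call_model, customer_id)
    try:
        return future.result(timeout=budget_s), "live"
    except FutureTimeout:
        # cancel() only helps if the call has not started yet; a call that
        # is already running is simply abandoned.
        future.cancel()
        return fallback, "fallback"
    except Exception:
        # Any error raised by the model call also triggers the fallback.
        return fallback, "fallback"


# Hypothetical stand-ins for the real-time endpoint.
def fast_model(customer_id):
    return ["recipe_a", "recipe_b"]


def slow_model(customer_id):
    time.sleep(0.5)
    return ["recipe_a", "recipe_b"]


def broken_model(customer_id):
    raise RuntimeError("endpoint down")


DEFAULT_RECS = ["default_1", "default_2"]  # e.g. last week's or default recs

fast = recommend_with_fallback(fast_model, 42, DEFAULT_RECS)
slow = recommend_with_fallback(slow_model, 42, DEFAULT_RECS, budget_s=0.05)
broken = recommend_with_fallback(broken_model, 42, DEFAULT_RECS)
print(fast, slow, broken)
```

Returning the source alongside the recommendations makes it easy to monitor how often the fallback path is taken.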

It is easier said than done, and the team spent quite a lot of time getting it to work (and work well), but we are very glad that we can now update our recommendations for the next few menu weeks in less than a second after the customer makes an order.

The winter is over!

In June 2022, we implemented the model, named Roux-V2, and ran an A/B test against Roux-V1 (the deep collaborative neural net model).

The result was a big surprise to us: not only did the new Roux-V2 model help solve the cold-start problem for early-box customers, it also improved the relevancy of recommendations for our existing customers.

The graph below shows how the customers across the board benefited from the new model. The Y-axis represents mean basket match, i.e. % of recipes in a customer basket coming from the top 15 recommendations. This is the metric that captures recommendation relevancy. The X-axis shows customer tenure, represented by the number of boxes they have purchased.

Basket Match at top 15 by Box Number and Model
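The basket match metric follows directly from its definition: the share of recipes in a customer's basket that also appear in their top 15 recommendations, averaged over customers. Here is a minimal sketch; the function names and example data are made up, and details such as how empty baskets are treated are assumptions.

```python
def basket_match(basket, ranked_recommendations, k=15):
    """Fraction of basket recipes found in the top-k recommendations."""
    if not basket:
        return 0.0  # assumption: an empty basket counts as no match
    top_k = set(ranked_recommendations[:k])
    return sum(1 for recipe in basket if recipe in top_k) / len(basket)


def mean_basket_match(orders, k=15):
    """Average basket match over (basket, ranked_recommendations) pairs."""
    if not orders:
        return 0.0
    return sum(basket_match(b, recs, k) for b, recs in orders) / len(orders)


# Hypothetical ranking of 30 menu recipes, best first: r1, r2, ..., r30.
recs = ["r%d" % i for i in range(1, 31)]

orders = [
    (["r2", "r7", "r15", "r20"], recs),  # 3 of 4 recipes are in the top 15
    (["r1", "r3"], recs),                # both recipes are in the top 15
]
print(basket_match(orders[0][0], recs))  # 0.75
print(mean_basket_match(orders))         # 0.875
```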

Even better, besides better customer experience, the A/B test on Roux-V2 has also become one of the most successful tests commercially in Discover, the Gousto tribe focusing on digital products.

In addition to solving the cold-start problem, Rouxcommender-V2 also shows some nice bonus benefits:

It shifts the whole basket instead of just the first recipe, meaning that the recommendations are now more diverse and better meet customer needs. The graph below shows the distribution of recommendation ranks by add-to-basket order for both models. As you can see, Roux-V2 (blue) shifts all 4 recipes in the basket to the left (lower rank) compared to its predecessor.

Distribution of recommendation rankings by model and add-to-basket order

Training is 5 times faster (and cheaper) thanks to simpler customer embeddings (customers are now represented by sequences of recipe embeddings instead of customer features such as ids).
Most importantly, Roux-V2 has opened a new window of opportunities: we can now extend the model and adopt many SOTA deep learning models, as well as experiment with new sources of data (ratings/likes), much more easily.

What next?

Things go very fast at Gousto. By the time I had a chance to write up this blog post about Roux-V2, we’d already built and tested the next iteration of Rouxcommender, which gave us an even better performance (yes, I know: the speed of writing is slower than the speed of coding ;)).

In the next part of our series, we’ll continue the journey of Rouxcommender by helping it learn customers’ tastes in a much more efficient and effective way.

Did someone say something about the transformer?

Yes, let’s talk to Optimus next!