Gousto Data Team: Best of 2021
2021 was not quite the year we expected back in January. However, along with the challenges we also had some great highs, like seeing our colleagues again (and meeting the many new faces!) at our first data team off-site since the pandemic, and finally getting back into our office, if only on a part-time basis.
As we approach Christmas we look back proudly at everything we’ve achieved this year, and in this post we will share a few examples.
One of the things we’re most proud of is all the brilliant new data professionals that we’ve hired into Gousto this year. At the start of 2021 we were a team of 31, and now in December we’re made up of 53 data professionals across our three functions.
This post is split into achievements from each of those three data functions:
- Analytics
- Data Science
- Data Engineering
Feel free to jump to the section that interests you the most or read the full article for the complete lowdown.
And if, after that, you agree with us that the Gousto data team is the place to be, then check out our current vacancies, and keep an eye out for many new roles in the next year!
Analytics
Within Analytics we support the entire business with data-driven insights and recommendations to drive Gousto forward. Here we reflect on some of these, from our customers’ first moments with Gousto, through to choosing their recipes, fulfilling the box in our factories and even how we give great customer care on the rare occasions there is an issue with their box.
Using Behavioural Science to experiment with our sign up journey
Understanding how prospective customers act when they arrive on the site is key to our success in effectively communicating the benefits of our product. This, however, can be very tricky in practice: we already know the product inside out and are most likely biased towards the service that we spend our working lives trying to improve.
Creating experiments to test new features allows us to validate the effectiveness of any updates we choose to make. On the downside, coming up with these features can prove difficult when there’s such an obvious difference between us and our customers.
Behavioural Science bridges the gap between us and our customers by providing a framework of universal thinking patterns (in the Western world at least) that we can place our trust in. The way that people instinctively react to external stimuli is surprisingly predictable, and so by using these principles we aim to come up with new iterations of the product that will pull the right inherent levers in our customers’ brains.
For more info, check out this blog post.
Measuring success of recipe collections
Limited edition recipe collections have always proved to be a big success at Gousto, but with the many different shapes and sizes of collections we haven’t always been able to quantify how well they’ve performed against each other. That changed when the newly created Recipe Success metrics came into play. These metrics were applied to every collection, which gave us a 360-degree view of how well each has performed across a number of key business areas (Brand Activity, Customer interest, Customer uptake etc.).
We have used these metrics to create consistent post-collection decks, which allow us to compare collections side by side regardless of their length of time on the menu or the number of recipes they contain. Using these metrics, we can pull actionable insights & recommendations from each collection, which help shape future collections at Gousto.
These metrics have already played a vital role in unlocking the value of our collections through clear recommendations & learnings. They have also prompted additional work on how we provision for collections going to market, and on whether certain collections perform better at certain times of the calendar year.
Running physical experimentation on our factory picklines
While we have a lot of experience in running A/B experiments on our digital product, in 2021 we made a push to replicate this effort in physical experimentation as well, giving us a new avenue for getting insights from our data. Unfortunately, this is a lot more difficult to set up and get reliable results from, as there are so many factors that affect every part of the pickline.
Some examples of this are where we have been swapping round some of our pick station configurations and trying out some new box designs to try and further optimise our packing process and enhance the customer experience.
We’ve now got a standardised guide that anyone can use to run a physical experiment while keeping it as close to an A/B test with an experimental and control group as possible. On top of this, we have a slew of experiments that we have successfully run and many more planned for 2022! This has become a great tool for us to get new insights that we wouldn’t be able to obtain from other types of analysis.
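One ingredient that helps keep a physical test close to an A/B test is random assignment of units to groups. The snippet below is only a minimal sketch of that idea, not the contents of the guide itself: the station names, the `assign_groups` helper and the fixed seed are all assumptions made for illustration.

```python
import random


def assign_groups(units, seed=0):
    """Randomly split units into a control and an experimental group.

    A fixed seed keeps the split reproducible so it can be audited later.
    """
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "control": sorted(shuffled[:half]),
        "experimental": sorted(shuffled[half:]),
    }


# Hypothetical pick stations, for illustration only.
stations = [f"station_{i}" for i in range(1, 9)]
groups = assign_groups(stations, seed=42)
print(groups)
```

Randomising which stations receive the new configuration helps stop other factors, such as shift pattern or position on the line, from lining up with the change being tested.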
Proving the value of customer care
For a very long time, customer care was viewed as a cost generator, where the main optimisation would revolve around cutting those costs to a minimum and automating as much of the journey as possible. Although this strategy has proved fruitful, we believe there is another side to the story: customer care as a value generator through improved customer retention.
Probably all of us have stories about an issue that we had with a product or service and how the way it got resolved made us feel about the company that provided it. Some of those experiences reinforce the trust we have in the company and make us stay around for longer; some of them frustrate us and we switch to a competitor that will hopefully resolve our problems in a better way. But how bad must the bad experience be? How often are users doing this, and how does it actually impact our retention? Can we quantify it in a way that will give us a sense of the opportunity in this space?
This piece of work aims to answer all of those questions. It will steer the direction of our customer care efforts and, hopefully, generate stories for our users about how great Gousto is at solving their issues.
Data Science
In data science at Gousto, we build algorithms that help automate our decision making using maths and statistics. We are involved in all areas of the business, from predicting what our customers will do next, understanding what recipes they might like each week, to routing boxes around our factories in the most efficient way possible. We currently have 19 data scientists but we’re always looking to hire great new recruits!
Promotions are an important part of Gousto’s marketing strategy and a big contributor to our spend. The promotion optimisation algorithm delights our customers by personalising the offers they receive, and at the same time maximises the return on investment of our marketing efforts.
The promotion optimisation algorithm looks at customers’ past interactions with our product in order to predict which offer or promotion would appeal most to each individual, and where we should invest our marketing budget. This algorithm is currently powering campaigns that aim to bring back customers who have cancelled their subscriptions. It has helped to achieve a significant increase in campaign conversion rate and campaign profitability. Next year, we aim to roll out the algorithm to other marketing campaigns, in order to delight more customers with personalised offers and promotions.
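To make the trade-off between appeal and spend concrete, here is a minimal sketch of one way an offer could be chosen once per-customer conversion predictions exist. It is an illustration under stated assumptions, not the production algorithm: the offer names, probabilities, costs and the simple expected-profit rule are all hypothetical.

```python
def best_offer(conversion_probability, offer_cost, order_value):
    """Pick the offer with the highest expected profit for one customer.

    Expected profit = P(customer converts) * (order value - cost of the offer).
    """
    def expected_profit(offer):
        return conversion_probability[offer] * (order_value - offer_cost[offer])

    return max(conversion_probability, key=expected_profit)


# Hypothetical model outputs and costs for a single lapsed customer.
predicted = {"10_percent_off": 0.05, "30_percent_off": 0.12, "50_percent_off": 0.15}
cost = {"10_percent_off": 4.0, "30_percent_off": 12.0, "50_percent_off": 20.0}

choice = best_offer(predicted, cost, order_value=40.0)
print(choice)
```

In this toy example the deepest discount converts best but is not the most profitable, which is the kind of tension a personalised approach has to balance.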
In late 2020, we formed Turnips, a cross-functional squad focusing on personalisation, with an embedded data science unit. This year, we have introduced a new, deep-learning-based family of recommender models: Rouxcommender Senior and Rouxcommender Junior (inspired by Michel Roux Snr and Jnr; see our blog post for details). Our aim is to continuously improve customer experience while scaling the number of choices and the size of the menu. Both models have been A/B tested and delivered a massive improvement in customer experience and basket match (the % of purchased recipes recommended by our recommendation engine in the “Chosen For You” collection).
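Basket match, as defined above, is simple to compute for a single order. The snippet below is a minimal sketch of that definition; the recipe IDs and the `basket_match` helper are made up for illustration, and how the metric is aggregated across customers is not covered here.

```python
def basket_match(purchased, recommended):
    """Share of purchased recipes that were also in the recommended set."""
    purchased = set(purchased)
    if not purchased:
        return 0.0
    return len(purchased & set(recommended)) / len(purchased)


# Hypothetical recipe IDs, for illustration only.
order = ["r101", "r205", "r330", "r412"]
chosen_for_you = ["r101", "r330", "r777", "r412", "r900"]

score = basket_match(order, chosen_for_you)
print(f"{score:.0%}")  # 75%: three of the four purchased recipes were recommended
```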
Looking forward to 2022, we have an exciting plan to bring a deeper and wider personalisation experience to more areas of the business, be it on the increasing number of swaps/variants being added week-by-week into our menu, or on how to personalise our boxes for customers with very limited information (non-chooser customers).
Machines only understand numbers. To help computers understand recipes better, we’ve released Gousto’s first generation of Recipe Embeddings (numerical representations of recipes), which are now being used in many areas across the business, e.g. in our supply forecasting algorithm and to predict the performance of new recipes our food developers are working on.
There are also plans to introduce recipe embeddings into other areas such as our recommender system and our menu planning algorithm. We’re now also working on our second iteration of recipe embeddings using the latest natural language processing techniques!
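Because an embedding is just a vector, recipes can be compared with ordinary vector maths. The sketch below uses cosine similarity, a common choice for embeddings, to find the nearest neighbour of a recipe. The three-dimensional vectors and recipe names are invented for illustration; real embeddings would have many more dimensions and be learned from data.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(name, embeddings):
    """Return the other recipe whose embedding is closest to `name`."""
    target = embeddings[name]
    others = {k: v for k, v in embeddings.items() if k != name}
    return max(others, key=lambda k: cosine_similarity(target, others[k]))


# Made-up embeddings, for illustration only.
embeddings = {
    "chicken_curry": [0.9, 0.1, 0.3],
    "prawn_curry": [0.8, 0.2, 0.4],
    "lemon_tart": [0.1, 0.9, 0.0],
}

nearest = most_similar("chicken_curry", embeddings)
print(nearest)
```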
2021 was a year full of changes for Gousto: large menu expansions, multiple new factories and ever-increasing customer demand. It is easy to see that each of these factors introduces changes to our operation. Understanding these changes and their effects, however, is far less trivial. For example, how much slower would our production lines run if our menu had an extra 10 recipes?
To better understand these changes, we developed the Network Simulator, which simulates our supply chain based on decisions made by our ensemble of data science products. First, we generate a menu and create a forecast of orders. Next, we decide which recipes are hosted at each site. We then decide where to put our ingredients within each factory and the best route for each order to take through the factories. Finally, we predict throughput.
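The chained structure of such a simulation can be sketched as a series of functions, each feeding the next. Everything below is a toy illustration rather than the Network Simulator itself: the function names, site names and formulas are assumptions, and the ingredient placement and order routing stages are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class SimulationResult:
    menu_size: int
    orders: int
    boxes_per_hour: float


def generate_menu(menu_size):
    return [f"recipe_{i}" for i in range(menu_size)]


def forecast_orders(menu):
    # Toy assumption: demand grows slightly with menu size.
    return 1000 + 5 * len(menu)


def allocate_recipes_to_sites(menu, sites):
    # Toy assumption: recipes are spread round-robin across sites.
    return {site: menu[i::len(sites)] for i, site in enumerate(sites)}


def predict_throughput(allocation):
    # Toy assumption: every recipe hosted at the busiest site slows its line by 1%.
    busiest = max(len(recipes) for recipes in allocation.values())
    return 500.0 / (1 + 0.01 * busiest)


def simulate(menu_size, sites=("site_a", "site_b")):
    """Run each stage in order, passing decisions downstream."""
    menu = generate_menu(menu_size)
    orders = forecast_orders(menu)
    allocation = allocate_recipes_to_sites(menu, sites)
    return SimulationResult(menu_size, orders, predict_throughput(allocation))


baseline = simulate(50)
expanded = simulate(60)  # what if the menu had an extra 10 recipes?
slowdown = 1 - expanded.boxes_per_hour / baseline.boxes_per_hour
print(f"Predicted slowdown: {slowdown:.1%}")
```

Running the same pipeline with two different menu sizes and comparing the outputs is the basic pattern for answering "what if" questions like the one about an extra 10 recipes.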
Not only was the Network Simulator able to give the go-ahead on menu expansion, but we could also use it to spot upcoming stresses to our supply chain, giving us plenty of time to plan accordingly and optimally configure our network of factories.
Data Engineering
This year we continued the focus on improving our data platforms while also delivering new and improved capabilities to our users. We ingested billions of rows, modelled hundreds of KPIs, and more than tripled the size of our Data Engineering Team!
In April, with the growing demand for Data Engineering, we split into two squads: Starfruits (focused on operational data) and Dragonfruits (focused on our core data platform). Scaling the squads meant not only adding a new group of Data Engineers and Analytics Engineers, but many other new roles: hiring some absolutely amazing Business Analysts (one of whom may or may not be writing this), a Lead Data Engineer for Starfruits, BI developers, Delivery Managers and Agile Coaches. We learnt that building a squad is much more than just throwing people at a problem: it requires strong engineering leadership and experience, which we now have even more of!
Building on our Foundations
We’ve continued to leverage Spark Streaming and Delta Lake, and our early investment in building for flexibility and scale really paid off. We went all-in on the lakehouse architecture and migrated some of our oldest (and most brittle!) pipelines to our new Data Lake.
The biggest challenge was migrating our transactional database, where we store all data from Gousto’s websites and applications (orders, subscriptions, recipes, users…). It contains 300+ tables! We successfully migrated the 100 most-used tables first, and scaled our pipelines so we can quickly ingest the rest should we ever need to. Replacing our pipelines meant we needed to ensure the numbers still matched, so QA was vitally important. We built tooling that lets us run QA repeatedly and quickly identify any issue with any row of data, all from a single script. This intense QA allowed us to spot and fix multiple issues and made our pipelines stronger and more reliable than ever! Having these tables in our lake now provides a single source of truth and much faster queries for our users!
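The core of row-level QA is comparing the same table from the old and new pipelines, keyed on a primary key. The snippet below is a minimal in-memory sketch of that comparison, not the tooling described above: the `diff_tables` helper, the column names and the sample rows are assumptions, and a real check would run against the warehouse and the lake rather than Python lists.

```python
def diff_tables(legacy_rows, lake_rows, key):
    """Compare two versions of a table row by row, matched on a primary key."""
    legacy = {row[key]: row for row in legacy_rows}
    lake = {row[key]: row for row in lake_rows}
    shared = legacy.keys() & lake.keys()
    return {
        "missing_in_lake": sorted(legacy.keys() - lake.keys()),
        "unexpected_in_lake": sorted(lake.keys() - legacy.keys()),
        "mismatched": sorted(k for k in shared if legacy[k] != lake[k]),
    }


# Hypothetical rows, for illustration only.
legacy_orders = [
    {"order_id": 1, "status": "delivered"},
    {"order_id": 2, "status": "cancelled"},
    {"order_id": 3, "status": "pending"},
]
lake_orders = [
    {"order_id": 1, "status": "delivered"},
    {"order_id": 2, "status": "canceled"},  # value differs from legacy
    {"order_id": 4, "status": "pending"},   # row not present in legacy
]

report = diff_tables(legacy_orders, lake_orders, key="order_id")
print(report)
```

Reporting the offending keys, rather than just a pass or fail on row counts, is what makes it possible to trace an issue back to a specific row.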
Automated Order Allocation (AOA)
AOA is our new data product which helps us make smart routing decisions to our factories. In order to evaluate the performance of AOA, the Starfruits built an ETL pipeline for AOA data exhaust into the data lake so it can be queried by any of our analytical endpoints (e.g. we can make a product dashboard so users can self-serve the data!). We had to rethink how we processed relational event data so our Spark pipelines didn’t explode when dealing with large dimensional arrays of atomic types.
We were able to leverage the same pipelines as our other products, which proved that investing early in flexibility really pays off!
Our analytics engineers love DBT, and have spent lots of time this year solidifying its place as the analytics engineering tool of choice. Most of our data models and reporting now use DBT and we have built a great developer experience around it. We recently asked ourselves, “why limit DBT to just analytics engineers?” and are now working on the processes which will allow all data professionals (data engineers, analysts, data scientists and more) to self-serve their own high-quality, productionised data models using DBT. Opening the door to others will have multiple lasting benefits, enabling productivity for data users while also giving analytics engineers a window into the types of data problems the wider data community is trying to solve!
It’s clear 2021 was a really exciting year in Gousto data and we’re even more excited about what’s to come in 2022!
Thanks for taking the time to read this far. We’d love to hear your thoughts, so drop a comment below, and make sure to follow the Gousto Engineering & Data blog to hear more about what we’re up to in the coming year.
But for now, we look forward to a well-earned Christmas break and coming back together in the new year. Merry Christmas!