Gousto Data Team — Best of 2020
Most people are pleased to see the back of 2020 and very few will want to relive it. The Data team here at Gousto are no different. Although 2020 was a challenging year for us, as we did our best to support the ever-changing landscape of Gousto, we’re able to look back proudly at everything we’ve achieved. In this post, the team share a small handful of the many projects we delivered in 2020. Whilst some days felt like we made zero progress, this post serves as a reminder that “the best view comes after the hardest climb”.
As we’re constantly hiring, I’m also hoping that this post gives potential candidates a view into the world of Data at Gousto to help with their decision-making.
How to read this post
This is quite a long post so my advice would be to scroll down until you get to one of the three areas you’re most interested in and start there. Of course, if you want to read from start to finish, don’t let me discourage you.
There are three teams in Data here at Gousto:
- Data Analytics
- Data Engineering
- Data Science
Although we’re a team of 35 data professionals now, we weren’t even half that size at the start of 2020. Thank you to our amazing talent team for helping us get here. That growth is worth mentioning in itself, but growing a team and delivering value are two different things, and I want to focus on the latter.
Note: A different member of the team has contributed to each section of the rest of this post.
Choosing a new BI tool
In 2020, we went through the process of choosing a new BI tool. Our main aim was to encourage the use of self-serve analytics, alongside several other criteria. To achieve this, we followed this process:
- Listed our key requirements
- Shortlisted BI tools that met all of these requirements (using the Gartner Magic Quadrant)
- Issued a request for proposal (RFP)
- Ran a 2-week hands-on trial
We gathered feedback from all areas of the business, including leadership, technical and non-technical stakeholders and analysts, to arrive at a decision. We know that choosing the tool is only half the battle, and now we’re on a journey to roll this out across Gousto.
Another piece of analysis aimed to improve our understanding of throughput, one of the measures of operational efficiency we use in our Supply tribe. We modelled how both the menu customers can choose from and our operational variables affect throughput. With this new throughput model, we can now predict operational efficiency more accurately. In 2021 this will, amongst other things, help us be operationally ready for proposition changes.
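The post doesn’t say what form the throughput model takes. As a minimal sketch, assuming a plain linear regression, throughput (boxes per hour) can be regressed on a menu variable and an operational variable. The feature names `menu_size` and `pickers_on_line`, the coefficients used to generate the data and the data itself are all hypothetical, so the fitted numbers are illustrations, not Gousto results.

```python
import random

def fit_linear_model(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    rows: one feature list per observation; an intercept column is added.
    Returns [intercept, coef_1, coef_2, ...].
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]
    k = len(X[0])
    # Build the augmented system [X^T X | X^T y]
    a = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

# Synthetic data: hypothetical menu and operational variables
rng = random.Random(0)
rows, targets = [], []
for _ in range(500):
    menu_size = rng.randint(20, 60)        # recipes on the menu (hypothetical)
    pickers_on_line = rng.randint(5, 15)   # staff on the line (hypothetical)
    # Assumed "true" relationship plus noise, used only to generate data
    boxes_per_hour = 100 - 0.8 * menu_size + 12 * pickers_on_line + rng.gauss(0, 5)
    rows.append([menu_size, pickers_on_line])
    targets.append(boxes_per_hour)

intercept, coef_menu, coef_pickers = fit_linear_model(rows, targets)
forecast = intercept + coef_menu * 40 + coef_pickers * 10
print(round(coef_menu, 2), round(coef_pickers, 2), round(forecast, 1))
```

The fitted coefficients (boxes per hour lost per extra recipe, gained per extra picker) are what make a model like this useful for “what if” questions ahead of proposition changes; a real throughput model would likely need more variables and possibly a non-linear form.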
Improving our data visibility in Supply
In 2019 we realised there was a large opportunity to improve how we collect and understand data in Supply. At the start of 2020, we began an extensive collaboration with both our Systems and Data Engineering teams. We pulled data from various systems within Supply, modelled it and visualised it in our BI tool. As a result, we now have improved visibility of what happens on the production lines (e.g. picking rates), what happens in the warehouse (e.g. transfer orders) and who was involved (e.g. attendance rates). These metrics are now available to everyone at Gousto and have given us new insights into our operations. We now use this data for deep-dive analyses to improve our understanding of our operations and supply chain performance.
Creation of the Food KPI Dashboard
The end of 2020 saw the creation of the Food KPI Dashboard. The dashboard collates all the key metrics used across the Food team in one place, with a view to creating “one source of truth”. This helps the team make data-driven decisions and increases transparency across the team. In 2021, the dashboard will evolve as we assess the viability of certain metrics and add new ones that will help us better understand recipe and menu performance within the Menu tribe.
Combining quant exploration with user research to drive conversion rate uplifts
In H1, all Beetroots experiments were solely based on user research, and we saw lots of flat results (what users say isn’t always what they do). By using quantitative analysis to find opportunities and create hypotheses, we saw 6 winning experiments in Q4. To make this possible, we built a Product Analytics team to support Digital Product. Each tribe now has an embedded product analyst and we’re looking to grow this team further in 2021.
Improving the power of our A/B tests with CUPED
Online experiments contain a lot of noise, and we wanted a way to reduce it to improve our experiment analysis. So we adopted CUPED (Controlled-experiment Using Pre-Experiment Data), a statistical technique developed by a team at Microsoft to reduce variance in experiment metrics. CUPED uses each user’s data from before the experiment started to account for part of the variation in the experiment metric. This increases the power of our tests, so we can detect smaller differences between the groups being A/B tested. Until recently, we were doing this “by hand”, but we’ve now created a simple CUPED calculator to automate the process.
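CUPED adjusts each user’s experiment metric Y using a covariate X measured before the experiment started (often the same metric in a pre-period): Y_cuped = Y - θ(X - mean(X)), with θ = cov(X, Y) / var(X). The sketch below is a minimal illustration of that adjustment using those standard definitions; it is not the team’s CUPED calculator, and the synthetic order counts are hypothetical.

```python
import random
import statistics

def cuped_adjust(y, x):
    """Return CUPED-adjusted values: y - theta * (x - mean(x)).

    y: per-user metric measured during the experiment
    x: the same users' pre-experiment covariate (e.g. the metric before the test)
    theta = cov(x, y) / var(x) is the coefficient that minimises the variance
    of the adjusted metric.
    """
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (len(x) - 1)
    theta = cov_xy / statistics.variance(x)
    return [yi - theta * (xi - mean_x) for xi, yi in zip(x, y)]

# Synthetic users: hypothetical pre-period and in-experiment order counts
rng = random.Random(42)
pre = [rng.gauss(4.0, 1.5) for _ in range(2000)]
during = [0.8 * p + rng.gauss(0.5, 0.5) for p in pre]

adjusted = cuped_adjust(during, pre)
print(statistics.variance(during), statistics.variance(adjusted))
```

In a real test, the adjusted values for control and treatment would then go into the usual comparison of means. The adjustment term averages to zero, so the metric’s mean is unchanged, while its variance falls by roughly a factor of 1 - ρ², where ρ is the correlation between the pre-experiment covariate and the metric. θ is commonly estimated on both groups pooled, so the same adjustment applies to everyone.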
Moving away from Conversion Rate
Changing the primary metric for a large part of the company is no easy task, but it was one we had to undertake when we realised that our historical experiments were statistically compromised. By running simulations on our historical data, we were able to show that using Conversion Rate for existing customers led to a higher-than-acceptable false positive rate, so we needed a better metric. We moved to Average Order per User, which required writing an entire paper plus a guidebook to help the teams understand the strange new world we were moving into.
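The post doesn’t detail those simulations, but a standard way to measure a metric’s false positive rate is to run many A/A tests on historical data: repeatedly split users at random into two groups that receive no treatment, test the difference, and count how often it comes out significant. With a sound metric and test, that share should sit close to the significance level (5% at α = 0.05). This sketch assumes a normal-approximation test on a per-user metric and uses synthetic data as a hypothetical stand-in for historical orders.

```python
import math
import random

def mean_and_variance(xs):
    """Sample mean and (n - 1) variance."""
    m = math.fsum(xs) / len(xs)
    return m, math.fsum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def two_sample_p_value(a, b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    mean_a, var_a = mean_and_variance(a)
    mean_b, var_b = mean_and_variance(b)
    se = math.sqrt(var_a / len(a) + var_b / len(b))
    z = (mean_a - mean_b) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

def aa_false_positive_rate(user_metric, n_sims=1000, alpha=0.05, seed=0):
    """Share of random A/A splits that come out 'significant' at level alpha.

    No group receives a treatment, so every significant result is a false
    positive; a sound metric and test should land close to alpha.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        shuffled = list(user_metric)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        if two_sample_p_value(shuffled[:half], shuffled[half:]) < alpha:
            hits += 1
    return hits / n_sims

# Hypothetical per-user metric standing in for historical data
data_rng = random.Random(1)
orders_per_user = [data_rng.choice([0, 1, 1, 2, 3, 4]) for _ in range(2000)]
print(aa_false_positive_rate(orders_per_user))
```

Pointing the same loop at historical data for a candidate metric, computed exactly as it would be in a live experiment, and checking whether the rate stays near alpha is one way to vet a metric before adopting it.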
Managing a multi-site operation through advanced algorithms
Knowing how many orders customers will make next week and beyond is key to our operation. It helps us plan how many colleagues we need to pick those orders and the quantity of each ingredient we will need to fulfil those orders.
Needing a good forecasting capability is not unique to Gousto; many businesses do. But we like to think our forecasting problems are special, in that we need forecasts at order level, recipe level and ingredient level. We also need to make these forecasts for multiple sites, as we recently opened a second factory in which to pick customer orders.
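One simple way to connect the three levels is top-down disaggregation: split an order-level forecast into recipe-level picks using each recipe’s expected share, then expand those into ingredient quantities using each recipe’s ingredient list. This is an illustrative assumption, not a description of how Gousto forecasts; the recipe names, shares, quantities and the `recipes_per_order` value are hypothetical.

```python
from collections import defaultdict

def recipe_forecast(order_forecast, recipes_per_order, recipe_share):
    """Split an order-level forecast into expected picks per recipe."""
    total_picks = order_forecast * recipes_per_order
    return {recipe: total_picks * share for recipe, share in recipe_share.items()}

def ingredient_forecast(recipe_picks, bill_of_materials):
    """Expand recipe-level picks into ingredient quantities."""
    totals = defaultdict(float)
    for recipe, picks in recipe_picks.items():
        for ingredient, qty in bill_of_materials[recipe].items():
            totals[ingredient] += picks * qty
    return dict(totals)

# Hypothetical inputs for one site and one menu week
recipe_share = {"chicken_curry": 0.5, "veg_chilli": 0.3, "salmon_pasta": 0.2}
bill_of_materials = {
    "chicken_curry": {"chicken_breast": 2, "rice_pack": 1},
    "veg_chilli": {"kidney_beans": 1, "rice_pack": 1},
    "salmon_pasta": {"salmon_fillet": 2, "pasta_pack": 1},
}
picks = recipe_forecast(order_forecast=1000, recipes_per_order=3, recipe_share=recipe_share)
ingredients = ingredient_forecast(picks, bill_of_materials)
print(ingredients)
```

For multiple sites, the same calculation could be run per factory with site-specific order forecasts and recipe shares.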
We’ve developed some incredible algorithms over the past year to help us manage a multi-site operation. They perform tasks such as determining which recipes to host at which factory and which factory is best to fulfil an individual order.
As we continue to open new factories in 2021 and 2022, these algorithms will become even more critical to our supply operation.
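The post doesn’t say how the order-to-factory decision is made. As a hypothetical sketch, a greedy rule could send each order to the cheapest factory that hosts every recipe in the order and still has capacity. The `hosted_recipes`, `remaining_capacity` and `cost_per_box` fields and all values are made up for illustration.

```python
def choose_factory(order_recipes, factories):
    """Pick a factory for one order under simple, hypothetical rules.

    A factory is eligible only if it hosts every recipe in the order and
    still has spare capacity; among eligible sites, the lowest cost wins.
    Returns the factory name, or None if no site can fulfil the order.
    """
    eligible = [
        f for f in factories
        if set(order_recipes) <= f["hosted_recipes"] and f["remaining_capacity"] > 0
    ]
    if not eligible:
        return None
    best = min(eligible, key=lambda f: f["cost_per_box"])
    best["remaining_capacity"] -= 1  # reserve one box slot at the chosen site
    return best["name"]

factories = [
    {"name": "site_a", "hosted_recipes": {"r1", "r2", "r3"}, "remaining_capacity": 1, "cost_per_box": 2.0},
    {"name": "site_b", "hosted_recipes": {"r1", "r2"}, "remaining_capacity": 5, "cost_per_box": 2.5},
]
print(choose_factory(["r1", "r2"], factories))  # site_a (cheapest eligible)
print(choose_factory(["r1", "r2"], factories))  # site_b (site_a is now full)
print(choose_factory(["r3"], factories))        # None (only site_a hosts r3)
```

A per-order greedy rule like this is easy to reason about but myopic: an early order can use up capacity that a later order needed more, which is one reason a production system might optimise across all orders at once.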
Factory setup improvements
Our Pick-Face Optimisation algorithm (PFO) tells us how to set up our picking lines, and it is one of the most important algorithms in our Supply tribe (there’s more context in this previous blog post). In 2020 we released the first major iteration of the algorithm since it was originally developed several years ago. The new iteration brought several improvements over the old solution:
- Better line throughput — using the Throughput model mentioned above, we estimate the new algorithm gives us an extra 20–25 boxes per hour in line throughput.
- Significantly faster runtime — this gives our operation more flexibility and makes for quicker scenario testing for our data scientists/analysts.
- Ability to scale — the complexity of our previous algorithm would scale exponentially with the size of the problem, whereas the complexity of our new algorithm scales linearly, allowing us to use it in our newer and larger factories.
You might assume that the throughput benefit was the most important outcome of this work, but as a business we have gained equal value from being able to iterate and replan quickly, thanks to the faster runtime.
Data for Dinner
In 2020, we released our new menu planning algorithm, Goustav! Goustav brought a more data-driven approach to menu planning by using a genetic algorithm, which chooses the best combination of recipes from our library by optimising for several objectives, including variety and cost. Goustav has been a huge success, and we spent much of 2020 iterating on it to add new functionality, such as OTIF (On Time In Full) optimisations that help reduce the number of complaints in a menu week.
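Goustav’s internals aren’t described here, but the core idea of a genetic algorithm for menu selection can be sketched: each candidate menu is a set of recipe IDs, a fitness function scores it, and the population evolves through selection, crossover and mutation. The fitness below (reward cuisine variety, penalise average cost), the recipe library, the menu size and every parameter are hypothetical stand-ins for Goustav’s real objectives, which the post only describes as including variety and cost.

```python
import random

# Hypothetical recipe library: id -> (cuisine, cost per portion)
CUISINES = ["italian", "indian", "mexican", "thai", "british"]
LIBRARY = {i: (CUISINES[i % 5], 1.5 + (i * 7 % 10) * 0.25) for i in range(60)}
MENU_SIZE = 10

def fitness(menu):
    """Reward cuisine variety, penalise average cost (hypothetical objective)."""
    variety = len({LIBRARY[r][0] for r in menu})
    avg_cost = sum(LIBRARY[r][1] for r in menu) / len(menu)
    return variety - avg_cost

def tournament(population, rng, k=3):
    """Selection: the fittest of k randomly chosen menus."""
    return max(rng.sample(population, k), key=fitness)

def crossover(parent_a, parent_b, rng):
    """Child menu drawn from the union of both parents' recipes (no duplicates)."""
    return rng.sample(sorted(set(parent_a) | set(parent_b)), MENU_SIZE)

def mutate(menu, rng, rate=0.2):
    """With probability `rate`, replace one recipe with one not on the menu."""
    menu = list(menu)
    if rng.random() < rate:
        outside = [r for r in LIBRARY if r not in menu]
        menu[rng.randrange(len(menu))] = rng.choice(outside)
    return menu

def evolve(generations=100, pop_size=40, seed=0):
    rng = random.Random(seed)
    population = [rng.sample(list(LIBRARY), MENU_SIZE) for _ in range(pop_size)]
    for _ in range(generations):
        # Elitism: carry the best menu over unchanged
        next_gen = [max(population, key=fitness)]
        while len(next_gen) < pop_size:
            child = crossover(tournament(population, rng), tournament(population, rng), rng)
            next_gen.append(mutate(child, rng))
        population = next_gen
    return max(population, key=fitness)

best_menu = evolve()
print(sorted(best_menu), round(fitness(best_menu), 2))
```

Elitism (copying the best menu into the next generation) guarantees the best score never gets worse between generations, while tournament size and mutation rate trade off exploration against speed of convergence.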
Our other major release this year was Rica (yes, we are very fond of naming our algorithms). Rica is our recipe development tool, which we built to guide our amazing chefs towards the most impactful types of recipes to develop, based on their attributes and how popular they are with our customers. Rica was only released in September and has already seen 30 new recipes created and planned into the menu.
Data Science is Growing
Retention plays a big role in the Growth Tribe, especially as Gousto grows to become the UK’s most loved way to eat dinner. Retention is not a new topic for the tribe, but everything in place was reactive and rule-based.
Inspired by the success stories of solving problems with Data Science in Menu and Supply, the Growth Tribe got FOMO, and 2020 marked the year a new Data Science function was set up to take things to the next level. The first challenge was to build a Churn Prediction model that proactively identifies customers who are about to churn.
However, Churn Prediction won’t be the whole story, and amazing things are bound to happen in 2021, such as Discount and Compensation Optimisation, so stay tuned to find out what our data scientists achieve in this area!
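The post doesn’t describe the churn model itself. Below is a minimal sketch assuming a logistic regression classifier trained with gradient descent, a common churn baseline and not necessarily what the team built. The two features (`weeks_since_last_order`, `orders_last_quarter`), their scaling and the synthetic customers are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(X, y, lr=1.0, epochs=500):
    """Fit weights and bias with batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def churn_probability(w, b, features):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, features)) + b)

# Synthetic customers with two hypothetical features scaled to [0, 1]:
# weeks_since_last_order / 8 and orders_last_quarter / 12
rng = random.Random(7)
X, y = [], []
for _ in range(500):
    weeks_since_last_order = rng.randint(0, 8) / 8
    orders_last_quarter = rng.randint(0, 12) / 12
    # Assumed "true" churn probability, used only to generate labels
    true_p = sigmoid(-1 + 4 * weeks_since_last_order - 3 * orders_last_quarter)
    X.append([weeks_since_last_order, orders_last_quarter])
    y.append(1 if rng.random() < true_p else 0)

w, b = train_logistic_regression(X, y)
at_risk = churn_probability(w, b, [1.0, 0.1])   # long gap, few recent orders
loyal = churn_probability(w, b, [0.0, 1.0])     # recent order, many orders
print(round(at_risk, 2), round(loyal, 2))
```

Scores like these could be used to rank customers so that proactive retention actions reach the most at-risk customers first, rather than reacting after they have already churned.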
Data Engineering had a busy 2020 as well. We started the year by improving our Supply Chain data. Making this data available to everyone at Gousto was crucial for measuring and improving key business metrics, such as the number of orders delivered on time and in full. With more data ingested, we can now measure how stock flows across our factories and take action to reduce picking errors (wrong or missing ingredients in boxes).
Ingesting supply data was a huge engineering challenge. We needed to process millions of events every day into our data lake in near real time. The processing also included some computationally expensive operations, such as merging new rows into existing tables to track the latest status of our data. We used a big chunk of 2020 to improve our engineering foundations, including leveraging Databricks to bring Spark Streaming and Delta Lake into our toolset.
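Merging new rows into existing tables so that each record shows its latest status is an upsert. On Delta Lake this is usually expressed as a `MERGE INTO` statement; the pure-Python sketch below only illustrates the semantics and is not the team’s pipeline code. The `Event` fields (`order_id`, `status`, `event_time`) are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    order_id: str     # hypothetical record key
    status: str
    event_time: int   # e.g. epoch seconds

def merge_latest(table, batch):
    """Upsert a micro-batch of events into a table keyed by order_id.

    Unseen keys are inserted; existing keys are updated only when the
    incoming event is newer, so late-arriving (older) events don't
    overwrite fresher state.
    """
    for event in batch:
        current = table.get(event.order_id)
        if current is None or event.event_time > current.event_time:
            table[event.order_id] = event
    return table

table = {}
merge_latest(table, [Event("o1", "picked", 100), Event("o2", "received", 105)])
merge_latest(table, [Event("o1", "dispatched", 130), Event("o2", "created", 90)])
print({k: v.status for k, v in table.items()})  # {'o1': 'dispatched', 'o2': 'received'}
```

The timestamp check matters in streaming, where events can arrive late or out of order. It also hints at why merges are expensive: every incoming batch has to be matched against existing rows by key.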
But we know that just ingesting data and delivering it to the lake doesn’t add business value.
Creating Data Models
In 2020 we invested a lot of time in creating data models that reflect our business, and we made accessing metrics easier for everyone. Our Analytics Engineers use DBT to write SQL code following software engineering best practices. We delivered multiple core data models that brought considerable improvements in ease of use and performance, thanks to having a single source of truth and excellent data documentation.
Growing the team and improving processes
All of this happened while we focused on growing our team, which went from three people in January to seven by the end of December. Having a bigger team also allowed us to focus on improving our internal Agile processes. Proper planning let us know what was coming next and reduced the work wasted by switching focus to handle last-minute requests. The Data Engineering team also worked hard to improve ticket quality in refinement and design sessions, making us more confident about the work we needed to do and the effort required.
Thank you for taking the time to make it this far. We love a good discussion so please do feel free to leave a comment here or on our Twitter account @goustotech and we’ll get back to you. If you’re looking for a new role, please check out our careers page.