A first data science project with Gousto — Part 2: Panos takes on throughput modelling
In our Gousto data team, we love collaborating with students studying data-related degrees, and this year we set two challenges for students to get stuck into. Today we caught up with Panos from the University of Manchester to hear how he found working on a real-life Gousto data challenge.
Please introduce yourself and tell us about the course you are studying
I’m Panos Krintiras, 27 years old, and I come from beautiful Greece. Unfortunately, I haven’t visited it since September 2020, when I arrived in the UK for my Master’s studies. Before COVID, I decided that I wanted to pursue a Master’s degree related to data. After in-depth research, I decided that the MSc in Business Analytics offered by the Alliance Manchester Business School suited my interests. The courses of this Master’s program created the ideal curriculum to enhance my knowledge and skills to meet the requirements of the modern business environment.
Because it’s the Gousto blog, we must ask: what is your favourite food?
Having left home at 18 for my undergraduate studies, I miss my mother’s food A LOT! Last year especially, I didn’t have the time to cook (I wish I had known about Gousto many months ago ;)), so I really missed good, tasty food! In particular, I missed her spécialité, Yemista (tomatoes or peppers filled with rice and chopped vegetables, baked in a tomato-based sauce). If anyone from Gousto’s Food Team reads this article, please include Yemista on the menu! 😊
What first got you interested in data science?
I had my first genuine experience of business analytics during the years I worked at CERN, where I realised the crucial impact that data analytics has on organisations. Part of my job was to provide procurement reports for some of our stakeholders. I extracted the necessary information from a data cube using a reporting tool, which was easy to use but not time efficient. I would have saved a lot of time if I had had the technical knowledge to conduct my own analysis and so produce better and more accurate visualisations of the results.
What was the problem your project with Gousto was aiming to solve and what different techniques did you get to try?
The meal-kit industry has been growing steadily, with the COVID-19 pandemic accelerating its growth even further. In particular, people’s attention to healthy eating increased during the pandemic, with 71% of people who prepared their meals at home intending to continue doing so after it ends.
Therefore, increasing factory efficiency is crucial for a recipe-kit company like Gousto to meet this pandemic-driven demand. An important metric for evaluating a production line’s performance is its throughput. The average throughput (or mean production rate) is the average number of boxes released from the last workstation of the production line per unit of time. It is one of the most important Key Performance Indicators (KPIs) of a production or manufacturing line.
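As a rough illustration of the throughput definition above, the sketch below counts the boxes released from the last workstation during an observation window and divides by the window's length. It is not Gousto's actual calculation: the function name `average_throughput`, its parameters, and the example timestamps are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def average_throughput(release_times, window_start, window_end,
                       unit=timedelta(hours=1)):
    """Average number of boxes released per `unit` of time.

    `release_times` is a hypothetical list of datetimes, one per box leaving
    the last workstation. Only boxes released inside the half-open window
    [window_start, window_end) are counted.
    """
    if window_end <= window_start:
        raise ValueError("window_end must be later than window_start")
    released = sum(1 for t in release_times if window_start <= t < window_end)
    # Dividing one timedelta by another gives the window length in `unit`s.
    return released / ((window_end - window_start) / unit)


# Made-up example: one box leaves the line every 30 seconds for two hours.
shift_start = datetime(2021, 6, 1, 8, 0)
shift_end = shift_start + timedelta(hours=2)
releases = [shift_start + timedelta(seconds=30 * i) for i in range(240)]

boxes_per_hour = average_throughput(releases, shift_start, shift_end)
boxes_per_minute = average_throughput(releases, shift_start, shift_end,
                                      unit=timedelta(minutes=1))
print(boxes_per_hour, boxes_per_minute)
```

Passing the window explicitly, rather than inferring it from the first and last release, keeps idle time at the start or end of a shift in the denominator, which is usually what a production-rate KPI is meant to capture.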
The project aimed to understand the factors that influence Gousto’s throughput and provide Gousto with practical advice on increasing its performance to satisfy the growing demand. Gousto provided primary data from its factories, and the main objectives of the research are listed as follows:
- To identify the effect of different factors on the throughput of a production line and compare the results with the current literature
- To provide insights into the importance of spreading workload across Gousto’s production lines
- To relate the increase in the number of recipes available to customers to its impact on throughput
- To explore the effect of COVID-19 on demand in the meal-kit industry
Exploratory data analysis and data visualisation were used to understand the business objectives and extract as much information as possible from the production data collected by Gousto. Furthermore, data mining techniques and supervised machine learning models, such as Multivariate Linear Regression, Decision Tree, Neural Network, Random Forest and XGBoost, were used to analyse and predict the factory’s throughput, perform feature selection and explore feature importance.
Data Mining and Analytic methods applied in the project
The results demonstrate that a wide range of factors influence throughput.
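To give a feel for how a regression model can be used to compare the influence of different factors on throughput, here is a minimal sketch using only the Python standard library. It is not the analysis from the project: the data are synthetic, the feature names `recipes_on_menu` and `staff_on_line` are hypothetical, and the helper `fit_linear_regression` is written for this example (in practice a library such as scikit-learn would normally be used).

```python
import random
import statistics


def fit_linear_regression(rows, targets):
    """Ordinary least squares via the normal equations.

    Returns [intercept, coef_1, ..., coef_k] for k features.
    """
    design = [[1.0] + list(row) for row in rows]  # prepend intercept column
    k = len(design[0])
    # Augmented matrix [X^T X | X^T y].
    aug = [
        [sum(d[i] * d[j] for d in design) for j in range(k)]
        + [sum(d[i] * y for d, y in zip(design, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; cannot solve")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# Synthetic data (an assumption for illustration, not Gousto data):
# throughput = 400 - 2 * recipes_on_menu + 15 * staff_on_line + noise.
rng = random.Random(0)
feature_names = ["recipes_on_menu", "staff_on_line"]  # hypothetical factors
rows, throughput = [], []
for _ in range(200):
    recipes = rng.uniform(30, 60)
    staff = rng.uniform(5, 15)
    rows.append((recipes, staff))
    throughput.append(400 - 2 * recipes + 15 * staff + rng.gauss(0, 2))

intercept, *slopes = fit_linear_regression(rows, throughput)

# Scale each slope by its feature's spread so the effects are comparable:
# "change in throughput per one standard deviation of the factor".
effects = {}
for index, name in enumerate(feature_names):
    spread = statistics.pstdev([row[index] for row in rows])
    effects[name] = slopes[index] * spread

for name, effect in sorted(effects.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {effect:+.1f} boxes per standard deviation")
```

Because the data are generated from known coefficients, the fitted slopes should land close to -2 and 15. Tree-based models such as Random Forest or XGBoost report feature importance differently, but the goal of ranking factors by their influence on throughput is the same.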
How did you work with the team at Gousto?
I am grateful for the collaboration with Florine, a Senior Data Analyst in Gousto’s Supply tribe. In our weekly meetings on Google Meet, I bombarded her with A LOT of questions, and we discussed my findings and the progress of the project. I also had the chance to meet Florine in person at Gousto’s office (of course, we passed by the kitchen 😊) for one of our meetings. Florine was there for me every time I needed her help and answered all my questions. Thank you, Florine!
Furthermore, Florine arranged a visit for me to Gousto’s factory in Spalding. During this visit, I had the chance to meet other Gousto analysts and some of the factory’s personnel. This gave me the unique opportunity to better understand how Gousto’s factory operates and the challenges that the personnel face in their daily work.
What were some of the challenges you faced along the way, and are there any tips you would give to other data scientists starting their first project with a company?
The first, and one of the biggest, challenges of this project was defining its scope, as I tend to be a curious person who wants to explore and understand every aspect of the topic I am studying. One of the most exciting challenges was understanding how Gousto and its factories operate and how this is represented in the dataset. I wanted to make sure that I understood Gousto’s needs and how valuable insights could be derived from the dataset provided to me.
Overall, data mining and data analysis techniques aim to extract meaningful information from raw data and convert it into useful knowledge to better understand processes and aid decision-making. Therefore, the most crucial part of a project with a company is gaining a business understanding of the project objectives and needs and transforming that information into a data mining problem. Identifying the data mining goal is a critical part of this process. Similarly, Data Understanding is the second most vital step. This phase involves a series of activities: exploring and describing the data, ensuring data quality, gaining early insights into the data, and identifying intriguing subsets to build hypotheses about hidden information. Thus, Business Understanding and Data Understanding are closely related. To formulate a data mining problem and develop a project strategy, it is necessary to fully comprehend the data.
Overview of the research (project) design
What was your favourite thing about working on this project?
The most exciting thing about this project was that I could apply the theoretical knowledge I gained during my Master’s studies to a real-world problem. Last September, I had zero knowledge of Python, but I managed to conduct all the analysis exclusively in Python. Furthermore, I realised the importance of data visualisation. Plots are an effective way to explore data, and they are necessary for presenting results as they provide overviews and valuable insights. I enjoyed creating various plots and formatting them using the Matplotlib and Seaborn libraries.
And lastly, what sort of role are you looking for after graduating?
Business Analytics is about asking different questions to understand the current situation (What happened?), looking for the right answers in the data (Why is this happening?) to take evidence-based decisions, and trying to predict the future (What will happen next?). Data analysis is an essential part of business analytics, which I really enjoy and want to pursue in my career. Therefore, I am interested in a role that supports advanced analytics in an organisation for problem-solving and decision-making, such as the role of a Data Analyst.