Data Science MSc projects with Gousto — Part II

Steven George
Gousto Engineering & Data
7 min read · Nov 15, 2022
L: Fengyi Zhu R: Aakash Arora

This summer Gousto’s Data Scientists once again collaborated with Masters students undertaking courses in Data Science and Advanced Analytics to work on company-specific projects. This programme has grown from 7 students last year to 17 students this year and we plan to continue this trend going forward!

In this series of blog posts we chat to some of the students we worked with in recent months to find out about their background and the exciting projects they worked on with Gousto. If you missed Part I you can find it here.

Please introduce yourself and tell us about the course you are studying

Fengyi: My name is Fengyi Zhu and my undergraduate major was Risk Management and Insurance. During my studies, I realised that I was obsessed with data and passionate about extracting hidden rules or values from it, so I chose Data Science as my graduate major. The first semester of Data Science (Business Management) at the University of Manchester covers basic data governance, database principles and applications, Python fundamentals, and the basic statistics behind machine learning. The second semester moves on to common machine learning and deep learning algorithms and their applications.

Aakash: My name is Aakash Arora and I have over 3 years of experience working in the field of data and computer science. I completed my MSc in Data and Decision Analytics at the University of Southampton in September 2022. I love to build innovative solutions using code and after my daily 9 to 5 you’ll most likely find me training in the gym.

Because it’s the Gousto blog, we must ask: what is your favourite food?

Fengyi: As I am Chinese, my favourite food is Chinese food! For example, fried eggs with tomatoes and rice for one person, or hot pot for a party.

Aakash: As I come from India I love spicy food. I’m also vegetarian so my choices might be a little different. In India there are multiple cuisines. So cuisine-wise my favourite dishes would be Paneer tikka (north), Dosa (south), vada pav (west), Rasagulla (east).

What first got you interested in data science?

Fengyi: As an undergraduate, I explored financial data in R, predicting future stock prices and studying the relationships between them. At the time, I was hooked on the fun of coding, on creating novel algorithms from mathematical and statistical knowledge to uncover the information and value behind the data, and on constantly learning new things. That was the first time I felt that I liked and was suited to this field, and I then got more and more drawn into online data mining projects in different fields.

Aakash: During my time at my previous company, I developed an interest in the field of data as I worked with the data science team. This made me realise the importance of data and how it drives the decisions made in the world today. It was a time when almost every organisation was taking a data-driven approach to tackle challenges and build innovative solutions. I was fortunate enough to build an AI-based chatbot solution which helped save operational time and made a real impact in the organisation. Experiencing the success of this project, along with learning about the different techniques and methods in the field, built my interest in data science, and so I decided to pursue my Masters in Data and Decision Analytics.

What was the problem your project with Gousto was aiming to solve and what different techniques did you get to try?

Fengyi: My project with Gousto focuses on solving the user cold-start problem in Gousto’s recommendation system by publishing a pairwise comparison game and learning user preferences from it. I chose the Bradley-Terry model to calculate each user’s preference for each product and combined it with the product’s features to obtain the user’s preferred features. Once we have the user’s preferences for the product features, we use an MCDA (multi-criteria decision analysis) model and the LightGBM Ranker algorithm to rank recipes for each user and select the top-N ranked recipes to recommend. Metrics are constructed to evaluate which algorithm performs better in terms of accuracy and diversity.
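To make the Bradley-Terry step more concrete, here is a minimal sketch of how item strengths can be estimated from pairwise game results. It is an illustration only, not the code used in the project: the function names, the recipe labels, the smoothing `prior` and the choice of the classic iterative (minorisation-maximisation) update are all assumptions made for this example.

```python
from collections import defaultdict


def fit_bradley_terry(comparisons, n_iter=200, prior=0.1):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Returns a dict mapping each item to a positive strength; strengths sum to 1.
    `prior` adds a small number of pseudo-wins in both directions for every
    pair of items (an assumption of this sketch) so that an item with no
    wins does not collapse to a strength of exactly zero.
    """
    items = sorted({item for pair in comparisons for item in pair})
    wins = defaultdict(float)    # total wins per item
    games = defaultdict(float)   # number of comparisons per ordered pair

    for winner, loser in comparisons:
        wins[winner] += 1
        games[(winner, loser)] += 1
        games[(loser, winner)] += 1

    # Smoothing: every pair "plays" 2 * prior extra games, split evenly.
    for a in items:
        for b in items:
            if a != b:
                wins[a] += prior
                games[(a, b)] += 2 * prior

    strength = {item: 1.0 / len(items) for item in items}
    for _ in range(n_iter):
        updated = {}
        for i in items:
            denom = sum(
                games[(i, j)] / (strength[i] + strength[j])
                for j in items
                if j != i
            )
            updated[i] = wins[i] / denom
        total = sum(updated.values())
        strength = {item: value / total for item, value in updated.items()}
    return strength


def win_probability(strength, a, b):
    """Bradley-Terry probability that item `a` is preferred over item `b`."""
    return strength[a] / (strength[a] + strength[b])


# Hypothetical game results for one user: (preferred recipe, other recipe).
demo_results = [
    ("curry", "pasta"),
    ("curry", "salad"),
    ("pasta", "salad"),
    ("curry", "pasta"),
]
demo_strengths = fit_bradley_terry(demo_results)
demo_ranking = sorted(demo_strengths, key=demo_strengths.get, reverse=True)
```

The resulting strengths give a per-user ordering of the items that were compared. Turning them into preferences over product features, as described above, would be a separate step.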

Aakash: As Gousto provides new recipes to their customers, a key part of the growth and sustainability of the company is being able to generate new recipes that may perform well in the market. Although Gousto has a dedicated recipe development team, it can be tough to come up with new recipe ideas frequently. To solve this issue, this project aimed to build a human-in-the-loop AI system that is capable of generating new recipes from a set of ingredients passed to it. The AI-generated recipe can serve the recipe development team as a starting point, or even as the final recipe, depending on how realistic (and edible) it is. Along with the recipe generator, we also built a performance predictor to predict how well a recipe might do in the market.

To build the recipe generator, I used T5 (Text-to-Text Transfer Transformer), a multi-purpose transformer used to solve sequence-to-sequence tasks in NLP applications. To build the performance predictor, I used a set of regression-based models such as random forest, XGBoost and LightGBM, and compared them to find the champion model.
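The "champion model" comparison can be sketched roughly as follows. This is an illustrative example under several assumptions, not the project's code: it uses synthetic data in place of recipe features, scikit-learn's `GradientBoostingRegressor` as a stand-in for XGBoost and LightGBM, and cross-validated RMSE as the selection metric.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for recipe features (X) and a performance score (y).
X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)

# Candidate regressors; the gradient boosting model is a stand-in for
# XGBoost / LightGBM, which expose a similar fit/predict interface.
candidates = {
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}


def mean_cv_rmse(model, features, target, folds=5):
    """Average root-mean-squared error across cross-validation folds."""
    scores = cross_val_score(
        model, features, target, cv=folds, scoring="neg_root_mean_squared_error"
    )
    return -scores.mean()


results = {name: mean_cv_rmse(model, X, y) for name, model in candidates.items()}

# The champion is the candidate with the lowest cross-validated error.
champion = min(results, key=results.get)
```

Scoring every candidate on the same folds keeps the comparison fair. In practice the metric and validation scheme would be chosen to match how recipe performance is actually measured.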

How did you work with the team at Gousto?

Fengyi: We met from time to time to share the results of my work and exchange ideas with each other, and Hai Nguyen gave me advice on how to solve my problems and thus move my project forward.

Aakash: It was a pleasure working with the team. My guide was Steven George. Steven helped me through every step and guided me whenever I got stuck. To provide quick updates and stay in constant touch, we communicated through Slack. We also had weekly meetings of about an hour to go through all the updates in detail.

Overall it was a very good experience and the communication between us was very effective.

What were some of the challenges you faced along the way, and are there any tips you would give to other data scientists starting their first project with a company?

Fengyi: The biggest difficulty I encountered during the project was the selection of the algorithm. Because we want more of a ranking order of all the items for each user, I chose a ranking algorithm to solve the recommendation problem. I had no prior exposure to algorithms in the field of learning to rank, so it took a lot of time and a lot of bumps in the road from getting started to writing the code and getting the final result.

My advice would be to communicate with the company in a timely manner to identify their needs, and to use your learning skills to learn to find references and code quickly and be able to apply them to your own projects while ensuring that your research is going in the right direction.

Aakash: Working on this project there were a lot of occasions where I got stuck. This was mostly due to the formatting of data. There were also situations which involved the libraries being incompatible. Of course if we Google enough there will always be a solution, but my advice to anyone pursuing such a project would be that during times like this:

1. Communicate with your guide

2. Try taking a break. Maybe you’ve burnt yourself out and your mind’s not working as effectively. So a break would definitely help (of course only if you have the time)

What was your favourite thing about working on this project?

Fengyi: I found the project challenging in two ways: firstly, mining the user’s preferences from the results of the game to build a user profile was a relatively novel and interesting angle, and secondly, I had to use a search algorithm to solve the recommendation problem. I feel a sense of achievement that it can ultimately improve the performance of the recommendations.

Aakash: My favourite thing about this project was that it was unique and innovative. It made an impact on the audience and turned a few heads, and this is something that motivates me.

What skills did you gain during the project and how will these help with your future endeavours?

Fengyi: Through this project, I feel that my learning and research skills have been enhanced. Whether you are working or studying for a PhD, it is very important to have your own learning methods and to quickly grasp new knowledge, which reflects your value in the workplace and in academia.

Aakash: I’ve gained a lot of knowledge while working on this project. Some of the key skills that really helped me were using transformers, transfer learning, fine-tuning parameters, and formatting data for fine-tuning on NLP tasks. The presentation at the end of the project also helped me gain more confidence in talking about pure data science projects.

Thank you for reading and if you missed Part I of this series you can find it here. To stay up-to-date with Gousto blog posts please follow our Medium page. If you’re a frequent user of LightGBM you won’t want to miss one of our most popular posts on The Problem with Gradient Boosting (Gradient Boosted Gremlins).
