A first data science project with Gousto — Part 1: Andreas takes on recommendations
In our Gousto data team, we love collaborating with students studying data-related degrees, and this year we set two challenges for students to get stuck into. Today we caught up with Andreas from the University of Manchester to hear how he found working on a real-life Gousto data challenge.
Please introduce yourself and tell us about the course you are studying
My name is Andreas Hadjigeorgiou and I have two years of work experience in Data Science. After realizing my passion for Data Science, I went on to undertake an MSc in Data Science (Business & Management) at the University of Manchester. It has been a great course: it covers the fundamentals of Data Science thoroughly, and some modules introduced more advanced topics that I had not come across in my two years of work. The course also focused on online businesses and on how data can be used in the e-commerce sector.
Because it’s the Gousto blog, we must ask: what is your favourite food?
Looking at Gousto’s recipes I realised how many different cuisines there are, and how many different tastes I need to try!
Even so, I would still go for the most classic choice: a simple but juicy beef burger! For me, nothing can beat a good burger at the right time!
What first got you interested in data science?
After graduating in Mathematics from the University of Essex, I created an online shop and started selling dog clothes. To succeed, I took courses on Digital Marketing and learned more about platforms such as Google Analytics, FB Ads and more. With my mathematics background, it didn’t take long to dive into analysing the website’s visitor behaviour and trying to identify opportunities using data.
That is when I fell in love with data science, after realizing the hidden potential in the data. Making good use of data sources can change the trajectory of a business, with applications such as defining your target audience, making predictions to identify future obstacles, automation and many more.
What was the problem your project with Gousto was aiming to solve and what different techniques did you get to try?
The project I was working on was creating a recommendation engine using Gousto’s data. The main objective was to maximize the hit-ratio and build a recommendation engine that can successfully recommend relevant recipes to customers depending on their taste preferences.
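The interview does not spell out how the hit-ratio was computed, so the following is only an illustrative sketch under one common definition: the share of users for whom at least one recommended recipe appears among the recipes they actually ordered that week. The function name `hit_ratio` and the toy user and recipe IDs are hypothetical, not taken from the project.

```python
def hit_ratio(recommended, ordered):
    """Share of users with at least one recommended recipe among their orders.

    Both arguments map a user ID to a list of recipe IDs (assumed structure).
    """
    # Only score users we have both recommendations and orders for.
    users = [u for u in recommended if u in ordered]
    if not users:
        return 0.0
    hits = sum(1 for u in users if set(recommended[u]) & set(ordered[u]))
    return hits / len(users)


# Toy data for one week (made up for illustration).
recommended = {
    "u1": ["r1", "r2", "r3"],
    "u2": ["r4", "r5", "r6"],
    "u3": ["r7", "r8", "r9"],
}
ordered = {
    "u1": ["r2", "r10"],  # r2 was recommended: a hit
    "u2": ["r11"],        # nothing recommended was ordered: a miss
    "u3": ["r9"],         # r9 was recommended: a hit
}

score = hit_ratio(recommended, ordered)
print(f"weekly hit-ratio: {score:.2f}")
```

Under this definition two of the three toy users count as hits. Other definitions (for example, counting hits per recommended slot) would give different numbers.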
However, while building this and experimenting with different methods such as Collaborative Filtering, and specifically the k-nearest neighbours (k-NN) algorithm, I found an issue that affected the hit-ratio: the item cold-start problem. As Gousto introduces new recipes every week, there is a constant stream of recipes that no customer has yet seen or rated. Collaborative Filtering on its own cannot recommend these recipes, because it has no interactions to learn from.
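To make the item cold-start problem concrete, here is a minimal sketch of neighbourhood-style item-based collaborative filtering on a toy order matrix. It is an assumption-laden illustration rather than the project's code: the matrix, the cosine similarity and the use of all neighbours instead of a fixed k are choices made for brevity. The last column is a recipe nobody has ordered yet.

```python
import numpy as np

# Rows are users, columns are recipes; 1 means the user ordered the recipe.
# The last column is a brand-new recipe with no orders (toy data).
interactions = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 0],
], dtype=float)


def item_cosine_similarity(matrix):
    """Cosine similarity between recipe columns, with self-similarity zeroed."""
    norms = np.linalg.norm(matrix, axis=0)
    safe_norms = np.where(norms == 0, 1.0, norms)  # avoid division by zero
    normalised = matrix / safe_norms
    similarity = normalised.T @ normalised
    np.fill_diagonal(similarity, 0.0)
    return similarity


similarity = item_cosine_similarity(interactions)

# Score each recipe for each user by summing its similarity to the
# recipes that user has already ordered.
scores = interactions @ similarity
new_recipe_scores = scores[:, -1]
print(new_recipe_scores)
```

Because the new recipe has no orders, its similarity to every other recipe is zero, so it scores zero for every user and can never be ranked above recipes with history.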
To address that, I used a Hybrid approach combining Collaborative Filtering and Content-Based Filtering. This allowed me to capture both the users’ behaviour and their preferences. When a new recipe was introduced, the algorithm could understand the recipe’s characteristics and, based on a user’s previous preferences, identify whether the recipe was relevant to that user. This was done with a Matrix Factorization technique that uses two matrices: one holding how customers rated recipes and another holding the recipes’ features. With this, the recommendation engine achieved a weekly hit-ratio of approximately 30%. Moreover, the recommendations showed a level of personalization, as the algorithm was able to recommend unseen recipes to users who were interested in such recipes.
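One way to picture a hybrid of this kind is a factorization in which each recipe's latent vector is built from its features, so a recipe with no ratings can still be scored. The sketch below is an assumption, not the model used in the project: the toy ratings, the three feature tags, the plain gradient descent loop and all variable names are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ratings matrix: rows are users, columns are known recipes (1 = liked).
ratings = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

# Toy recipe features, columns: [vegetarian, beef, spicy].
recipe_features = np.array([
    [1, 0, 0],
    [1, 0, 1],
    [0, 1, 0],
    [0, 1, 1],
], dtype=float)

n_users = ratings.shape[0]
n_features = recipe_features.shape[1]
dim = 2  # number of latent factors (arbitrary for this toy example)

user_factors = rng.normal(scale=0.1, size=(n_users, dim))
feature_factors = rng.normal(scale=0.1, size=(n_features, dim))

learning_rate = 0.02
for _ in range(3000):
    # A recipe's latent vector is the sum of its features' latent vectors.
    recipe_factors = recipe_features @ feature_factors
    error = ratings - user_factors @ recipe_factors.T
    # Gradient descent on half the squared reconstruction error.
    user_grad = error @ recipe_factors
    feature_grad = recipe_features.T @ (error.T @ user_factors)
    user_factors += learning_rate * user_grad
    feature_factors += learning_rate * feature_grad

# A new recipe with no ratings at all, described only by its tags.
new_recipe = np.array([1.0, 0.0, 1.0])  # vegetarian and spicy
new_scores = user_factors @ (new_recipe @ feature_factors)
print(np.round(new_scores, 2))
```

In this toy setup the first two users, who liked the vegetarian recipes, should score the new vegetarian recipe higher than the last two users do, even though nobody has rated it. A production model would also need regularization, a proper train and test split, and handling of missing ratings, none of which are shown here.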
How did you work with the team at Gousto?
We mainly had pre-arranged meetings every two weeks, where we discussed the approach and where we were heading. The team exchanged ideas with me, explained how Gousto works and described all of the data.
This helped me understand how Gousto operates and make use of the given data in my approach.
What were some of the challenges you faced along the way, and are there any tips you would give to other data scientists starting their first project with a company?
One of the biggest challenges that I faced along the way was the computational and processing time needed.
When I had applied machine learning models to benchmark datasets such as MovieLens, computational time had never been an issue. However, when experimenting with Gousto’s data, I found that computational time was an issue and prevented me from running the algorithms that I had already built.
Specifically, this was a challenge when I created the Hybrid approach to solve the item cold-start issue. The Hybrid model could address the item cold-start issue, but it took too long to run.
This was troublesome when I was trying to optimize the hyper-parameters of the model in the hope of achieving better results. Unfortunately, that was a wall I could not pass due to the limited time given to complete the project, and thus only the default parameters were used. I strongly believe the model could achieve a higher accuracy and be utilized by the company if it was properly tuned.
With that said, and based on my experience developing the recommendation algorithm, I have two tips for any data scientist starting their first project with Gousto.
Firstly, and most importantly, take the time to familiarize yourself with the data. It is also crucial to understand the way Gousto operates and introduces new recipes each week. Depending on the project, this can introduce bias into some models, so having a clear picture will be helpful.
Lastly, and this relates to the mistake I described earlier, familiarize yourself with a cloud service such as AWS. This will allow you to run computationally demanding models faster and more efficiently, and avoid hitting the wall that I faced, which can also affect the model’s quality, as in the hyperparameter tuning example.
What was your favourite thing about working on this project?
The fact that I was working with real data and trying to tackle a real-life data science project made every part of it exciting.
I loved the process of increasing the model’s accuracy and trying to recommend recipes to customers based on their preferences. I was particularly excited when I finally managed to recommend recipes to users and confirmed that, in a deployment scenario, the model could successfully recommend recipes to them.
However, the most fascinating part was putting myself in the customer’s shoes and revisiting Gousto’s website to find out what customers see at first glance, so that I could use this information in my model. This is something that can only be done on real-life projects like this one.
Moreover, to achieve the above I had to learn a lot of new things that I had not even realized I did not know, and that made the project super fun! Reading about different people’s ideas and approaches to recommendation engines was very exciting.
And lastly, what sort of role are you looking for after graduating?
As I am currently running a Digital Design Studio that I founded during my MSc studies, I am keen to grow it. It allows me to build e-commerce sites for businesses, consult on their digital presence and marketing to improve traffic and sales, integrate tools such as Google Analytics and offer them reports on their site’s visitor behaviour.
However, I would also like to follow my passion for Data Science, and I am really interested in Data Science consultancy, or any part-time or contract Data Science projects in the e-commerce or trading industries, where I believe I have in-depth knowledge. I am spending a fair amount of time on improving my Data Science skills, as I still have a lot to learn.
Thus, the perfect scenario would be to combine the two.