Food Recommendation System by ALS Method in PySpark and Diet Food Recommender by Cosine Similarity Method

Merve Cengiz
5 min read · Jun 11, 2022


Hi Everyone!

In this post I present my latest project from the Istanbul Data Science Bootcamp. Preparing it and making it to the final was exciting, exhausting and enjoyable.

Everyone wants to choose their favourite food quickly and eat it. In the first stage of this two-stage project, the application recommends meals you are likely to love. It contains two recommendation systems. The first, the Food Recommendation System, speeds up the decision about what to eat: you rate 5 foods from the list, and the top 5 foods you are predicted to like most are recommended to you. This is a collaborative-based recommendation algorithm. Once you have decided what to eat, if you are on a diet, the second algorithm, which I call FoodMagic 😊, recommends the top 3 foods suitable for your diet. This is a content-based recommendation algorithm.

Below is a screenshot of the application's user interface on Heroku:

About 2,000 recipes were analysed. For the collaborative recommendation system, I created a userid and a foodid to identify each user and each food.
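Spark's ALS expects integer user and item IDs, so each username and recipe name needs a numeric ID. Below is a minimal plain-Python sketch of that mapping. It assumes the scraped reviews are `(username, recipe_name, rating)` tuples; this layout and the helper name `add_ids` are hypothetical. On a Spark DataFrame, `pyspark.ml.feature.StringIndexer` does the same job.

```python
def add_ids(reviews):
    """Map each username and recipe name to a stable integer ID.

    `reviews` is a list of (username, recipe_name, rating) tuples (a hypothetical
    layout for the scraped data). IDs are assigned in order of first appearance.
    """
    user_ids = {}
    food_ids = {}
    rows = []
    for user, food, rating in reviews:
        # setdefault returns the existing ID, or stores the next free one
        uid = user_ids.setdefault(user, len(user_ids))
        fid = food_ids.setdefault(food, len(food_ids))
        rows.append((uid, fid, float(rating)))
    return rows, user_ids, food_ids


# Toy reviews, made up for illustration
reviews = [("alice", "Banana Bread", 5), ("bob", "Banana Bread", 4), ("alice", "Chocolate Cake", 3)]
rows, users, foods = add_ids(reviews)
print(rows)  # → [(0, 0, 5.0), (1, 0, 4.0), (0, 1, 3.0)]
```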

Recipes were categorized into 7 diet types for content-based recommendation: Diabetic, Gluten Free, Ketogenic, Low Sodium, Low Cholesterol, Vegetarian and Vegan. The Diet Food Recommendation System is built on these diet types.

Methodology:

1. Extract recipes and related features from Allrecipes.com by web scraping,

2. Pre-process the data,

3. Exploratory Data Analysis (using Tableau),

4. Store the dataset in MongoDB Atlas,

5. Build an ALS model (for collaborative-based recommendation),

6. Build a cosine similarity matrix (for content-based recommendation),

7. Deploy the model with Flask and Heroku.

You can visit my GitHub page for the full code of every step in the methodology.

1. Web Scraping:

Web scraping means collecting data from a website. I parsed the pages with BeautifulSoup to learn their structure and build a dataset.
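Here is a minimal sketch of the parsing step with BeautifulSoup. It runs on an inline HTML snippet instead of a live page. The tag and class names (`recipe-title`, `ingredient`, `review`, `user`, `data-rating`) are placeholders, not Allrecipes' real markup, which changes over time. Fetching the page (for example with `requests.get`) is left out.

```python
from bs4 import BeautifulSoup

# Placeholder markup for illustration; the real Allrecipes page structure differs.
SAMPLE_HTML = """
<html><body>
  <h1 class="recipe-title">Banana Bread</h1>
  <ul>
    <li class="ingredient">3 ripe bananas</li>
    <li class="ingredient">1 cup sugar</li>
  </ul>
  <div class="review" data-rating="5"><span class="user">alice</span></div>
  <div class="review" data-rating="4"><span class="user">bob</span></div>
</body></html>
"""


def parse_recipe(html):
    """Extract the title, ingredient lines and (user, rating) reviews from one page."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.select_one("h1.recipe-title").get_text(strip=True)
    ingredients = [li.get_text(strip=True) for li in soup.select("li.ingredient")]
    reviews = [
        # the rating is stored as a string attribute, so convert it to int
        (div.select_one("span.user").get_text(strip=True), int(div["data-rating"]))
        for div in soup.select("div.review")
    ]
    return {"title": title, "ingredients": ingredients, "reviews": reviews}


recipe = parse_recipe(SAMPLE_HTML)
print(recipe["reviews"])  # → [('alice', 5), ('bob', 4)]
```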

I wrote code to scrape the following features from Allrecipes.com for the recommendation systems:

2. Preprocess the data:

In this stage I applied tokenization, removed punctuation and numbers, and applied lemmatization. The resulting dataset:
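The first two preprocessing steps can be sketched with the standard library alone. The function name `preprocess` is hypothetical, and the sketch does not claim to match the project's exact rules. Lemmatization is left out because NLTK's `WordNetLemmatizer` needs the WordNet corpus to be downloaded first.

```python
import re


def preprocess(text):
    """Lowercase, tokenize, and drop punctuation and numbers."""
    # [^\W\d_]+ matches runs of letters: word characters minus digits and underscores,
    # so numbers and punctuation never become tokens
    return re.findall(r"[^\W\d_]+", text.lower())


print(preprocess("2 cups flour, sifted; 3 ripe bananas!"))
# → ['cups', 'flour', 'sifted', 'ripe', 'bananas']
```

With NLTK and the WordNet data installed, `WordNetLemmatizer().lemmatize("bananas")` returns `"banana"`, and it can be applied to each token as a final step.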

3. Exploratory Analysis with Tableau:

I performed EDA with Tableau, one of the great visualization tools for data science. It is easy to use and very effective for creating dynamic plots.

My findings while exploring the data, and the cleaning steps they led to:

When a non-member comments on a recipe, Allrecipes automatically sets the username to ‘Allrecipes Member’. As a result, many different people appear as one user who gives different ratings to the same food, so I removed all ‘Allrecipes Member’ rows from the dataset.

A food categorized into more than one diet type appears in several rows, which creates duplicate data, so I removed all duplicates from the dataset.
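The two cleaning rules above can be sketched in pandas. The sketch assumes a reviews table with `username`, `recipe`, `diet_type` and `rating` columns; these column names and the helper `clean_reviews` are hypothetical. Dropping the diet-type column before de-duplicating is one way to collapse the repeated rows a multi-diet recipe produces. The original code may have done this differently.

```python
import pandas as pd


def clean_reviews(df):
    """Drop anonymous 'Allrecipes Member' reviews and diet-type duplicates."""
    # Anonymous reviewers all share one username, so their ratings can't be attributed
    df = df[df["username"] != "Allrecipes Member"]
    # A recipe listed under several diet types repeats each review once per type;
    # keep one row per (username, recipe, rating)
    return (
        df.drop(columns="diet_type")
        .drop_duplicates(subset=["username", "recipe", "rating"])
        .reset_index(drop=True)
    )


# Toy data for illustration
raw = pd.DataFrame({
    "username": ["alice", "alice", "Allrecipes Member", "bob"],
    "recipe": ["Banana Bread", "Banana Bread", "Banana Bread", "Vegan Chili"],
    "diet_type": ["Vegetarian", "Low Sodium", "Vegetarian", "Vegan"],
    "rating": [5, 5, 2, 4],
})
clean = clean_reviews(raw)
print(clean)
```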

You can view the EDA dashboard on my Tableau profile.

4. Load Food Database into MongoDB Atlas:

First, I created a database named Food Database and loaded the recipes into MongoDB Atlas. My goal was to connect MongoDB to Databricks and retrieve the dataset there for the recommendation system. Because my Databricks account was limited to the Community Edition, I used MongoDB only to store the dataset.
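Below is a small sketch of the loading step with pymongo's `insert_many`. The helper `load_recipes` is hypothetical. So that it can run without a database, it accepts any object that behaves like a pymongo `Collection`. In the commented usage, the connection URI and the database and collection names are placeholders. MongoDB database names cannot contain spaces, so a name like `food_database` is used there.

```python
def load_recipes(collection, recipes):
    """Insert recipe documents into a MongoDB collection and return how many were inserted.

    `collection` is expected to behave like a pymongo Collection; `recipes` is a
    list of dicts (for a pandas DataFrame, df.to_dict("records") gives this shape).
    """
    if not recipes:
        return 0  # pymongo's insert_many raises TypeError on an empty list
    result = collection.insert_many(recipes)
    return len(result.inserted_ids)


# Usage against Atlas (not run here; URI and names are placeholders):
# from pymongo import MongoClient
# client = MongoClient("mongodb+srv://<user>:<password>@<cluster>/")
# load_recipes(client["food_database"]["recipes"], df.to_dict("records"))
```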

5. Food Recommendation Analysis with the Alternating Least Squares (ALS) Technique in PySpark on Databricks Community Edition:

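I implemented the ALS technique to build the collaborative-based recommendation system. ALS works on the user-by-food ratings matrix, which is mostly empty because each user rates only a few foods. It factorizes this matrix into a user-factor matrix and a food-factor matrix. It alternately solves a least-squares problem for one of them while the other is held fixed. Users with similar ratings end up with similar factor vectors, and the dot product of a user's vector and a food's vector predicts the missing rating.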
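The numpy sketch below shows explicit-feedback ALS on a toy 4×4 matrix, where 0 means "not rated". It is a simplified stand-in, not Spark's implementation. Spark's ALS scales `regParam` by the number of ratings each user or food has (the ALS-WR approach) and runs distributed. This sketch uses a constant λ. The function `als` and its arguments are hypothetical.

```python
import numpy as np


def als(R, mask, rank=2, reg=0.1, iters=50, seed=0):
    """Factorize ratings R (users x foods) as U @ V.T using only observed entries.

    mask[u, i] is True where user u rated food i. Each half-step is a
    regularized least-squares solve with the other factor matrix held fixed.
    """
    rng = np.random.default_rng(seed)
    n_users, n_foods = R.shape
    U = rng.normal(scale=0.1, size=(n_users, rank))
    V = rng.normal(scale=0.1, size=(n_foods, rank))
    reg_eye = reg * np.eye(rank)
    for _ in range(iters):
        # Fix food factors V and solve for each user's factors
        for u in range(n_users):
            Vu = V[mask[u]]  # factors of the foods this user rated
            U[u] = np.linalg.solve(Vu.T @ Vu + reg_eye, Vu.T @ R[u, mask[u]])
        # Fix user factors U and solve for each food's factors
        for i in range(n_foods):
            Ui = U[mask[:, i]]  # factors of the users who rated this food
            V[i] = np.linalg.solve(Ui.T @ Ui + reg_eye, Ui.T @ R[mask[:, i], i])
    return U, V


# Toy ratings (made up); 0 marks "not rated"
R = np.array([[5, 4, 0, 1],
              [4, 0, 1, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
mask = R > 0
U, V = als(R, mask)
pred = U @ V.T  # filled-in matrix; unobserved cells are the predictions
train_rmse = np.sqrt(np.mean((pred[mask] - R[mask]) ** 2))
print(round(pred[0, 2], 2), round(train_rmse, 3))
```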

I used the randomSplit() function to split the dataset into training and test sets with weights 0.7 and 0.3, respectively.

I implemented 5-fold cross-validation by setting numFolds to 5.

To find the best hyperparameters, I used ParamGridBuilder() to build a grid over rank, maxIter, regParam and alpha.

I used RMSE (root mean squared error) as the evaluation metric.

The model achieved an RMSE of 0.65, which is fairly low for ratings on a 5-star scale.
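RMSE is the square root of the mean squared gap between predicted and actual ratings, so it is measured in stars. The standard-library sketch below uses made-up numbers to show what a value near 0.65 means: typical errors of roughly half a star to a star. Because the gaps are squared, large misses weigh more.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length rating lists."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty lists of the same length")
    squared = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up example: predictions off by 0.5, 0.5, 1.0 and 0.0 stars
print(round(rmse([5, 4, 3, 2], [4.5, 4.5, 4.0, 2.0]), 3))  # → 0.612
```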

6. Grouping Data into Diet Types for the Diet Food Recommender:

I used the cosine_similarity() function to build a foods-by-foods matrix from the ratings space. Each entry is the cosine similarity between two foods (1 minus it gives the cosine distance), so for any food the matrix shows which other foods are closest to it.
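The numpy sketch below builds a foods-by-foods cosine similarity matrix and looks up the top 3 foods for a given one. It assumes each food in one diet group is already a numeric feature vector. The toy vectors and the helpers `cosine_matrix` and `top_k_similar` are hypothetical. scikit-learn's `sklearn.metrics.pairwise.cosine_similarity` computes the same matrix.

```python
import numpy as np


def cosine_matrix(X):
    """Pairwise cosine similarity between the rows of X (foods x features)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    Xn = X / norms
    return Xn @ Xn.T


def top_k_similar(sim, names, food, k=3):
    """Return the k foods most similar to `food`, excluding the food itself."""
    i = names.index(food)
    order = np.argsort(-sim[i], kind="stable")  # most similar first
    return [names[j] for j in order if j != i][:k]


names = ["Banana Bread", "Banana Muffins", "Banana Pancakes", "Chocolate Cake", "Vegan Chili"]
# Toy feature vectors, e.g. counts of [banana, flour, chocolate, beans]
X = np.array([[2, 1, 0, 0],
              [2, 1, 1, 0],
              [1, 1, 0, 0],
              [0, 1, 2, 0],
              [0, 0, 0, 2]], dtype=float)
sim = cosine_matrix(X)
print(top_k_similar(sim, names, "Banana Bread"))
# → ['Banana Pancakes', 'Banana Muffins', 'Chocolate Cake']
```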

7. Model Deployment:

I used Flask to develop the web application and deployed it on Heroku. I created app.py, requirements.txt and a Procfile, which Heroku needs to deploy a Flask app. I then created a new repository in my GitHub account, added all the files of my Flask project to it via Git Bash, and pushed them to the new repository.

Since my project involves big data, one of the datasets is large. GitHub rejects files larger than 100 MB, so I couldn't upload the large dataset used for the ALS model. Until the full model is deployed, I will share a video of both recommendation systems in action, so follow me and don't miss it 😊 For now, only the Diet Food Recommendation System (FoodMagic) is live at https://food-magic.herokuapp.com/

This web application works with a single word only: if you type ‘chocolate’ or ‘banana’ into the text area it works, otherwise it returns an error.
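Below is a minimal sketch of a Flask `app.py` that enforces the one-word rule. The route `/recommend`, the form field `ingredient` and the in-memory `RECOMMENDATIONS` lookup are hypothetical stand-ins for the real FoodMagic logic and data.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical stand-in for the cosine-similarity recommender: keyword -> top-3 foods
RECOMMENDATIONS = {
    "banana": ["Banana Bread", "Banana Muffins", "Banana Pancakes"],
    "chocolate": ["Chocolate Cake", "Chocolate Chip Cookies", "Brownies"],
}


@app.route("/recommend", methods=["POST"])
def recommend():
    query = request.form.get("ingredient", "").strip().lower()
    # Accept exactly one word, e.g. "chocolate" or "banana"
    if not query or len(query.split()) != 1:
        return jsonify(error="Please enter exactly one word."), 400
    foods = RECOMMENDATIONS.get(query)
    if foods is None:
        return jsonify(error=f"No recipes found for '{query}'."), 404
    return jsonify(ingredient=query, recommendations=foods[:3])
```

On Heroku, a Python app like this is usually started by a Procfile line such as `web: gunicorn app:app`, with gunicorn listed in requirements.txt.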

Many thanks for reading! Please visit my GitHub account and Tableau profile. I also posted this article on my LinkedIn profile. Don't hesitate to connect with me if you have questions.
