Building a Content-Based Food Recommendation Engine
Istanbul Data Science Academy — Final Project
When you want to cook something new, you probably search for a dish that is different from what you usually make, and when you are cooking for a special day, you take the time to ask friends and family before you decide. What if a system could give you ideas about what to cook? A recommender engine can do that for you!
Content-based filtering and collaborative filtering are the two most common approaches to recommendation. The aim of this post is to build a simple content-based recommendation system. We will see how to scrape data, build a recommender engine, and design a basic interface to deploy on Heroku.
Collecting the Dataset
In this case, the data was pulled from a website that offers cooking ideas. Python has great tools for web scraping, such as “BeautifulSoup” and “Selenium”. With “BeautifulSoup”, you fetch a page with the “requests” library and then parse the returned HTML. In this project, I extracted the data into a data frame using the “Selenium” library. “Selenium” drives a real browser to load each page, so you need a webdriver to control that browser.
The dataset contains different types of recipes, such as desserts, Thanksgiving, and Christmas dishes. In total, details of more than 2,000 recipes are available in our dataset. Each recipe includes the features below:
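The parsing half of that workflow can be illustrated without any third-party package. The sketch below is not the Selenium code used in this project: it swaps in Python's built-in html.parser as a stand-in for “BeautifulSoup”, and the HTML snippet, class names, and field names are all hypothetical. A real recipe site will use different markup.

```python
from html.parser import HTMLParser

# Hypothetical markup; in practice this string would be the page source
# returned by requests or by a Selenium-driven browser.
SAMPLE_HTML = """
<div class="recipe"><h2 class="title">Apple Pie</h2><span class="rating">3.5</span></div>
<div class="recipe"><h2 class="title">Pumpkin Soup</h2><span class="rating">4</span></div>
"""


class RecipeParser(HTMLParser):
    """Collects the text of elements whose class is 'title' or 'rating'."""

    def __init__(self):
        super().__init__()
        self.current_field = None  # field currently being read, if any
        self.recipes = []          # one dict per recipe block

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class == "recipe":
            self.recipes.append({})          # a new recipe block starts
        elif css_class in ("title", "rating"):
            self.current_field = css_class   # next text belongs to this field

    def handle_data(self, data):
        if self.current_field and self.recipes:
            self.recipes[-1][self.current_field] = data.strip()

    def handle_endtag(self, tag):
        self.current_field = None            # stop capturing after any close tag


parser = RecipeParser()
parser.feed(SAMPLE_HTML)
rows = parser.recipes  # list of dicts, ready to be turned into a data frame
```

A list of dictionaries like rows can be passed straight to pandas.DataFrame to get one row per recipe.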
- Preparation Steps
- Recipe Rate
- “Make It Again” Ratio
I also scraped more information about the recipes. The data above contains the most reviewed recipes, so each item has at least one review from a user. I collected these reviews and applied sentiment analysis to them to understand how users generally feel about a recipe. I measured the polarity of each review using “TextBlob”. Polarity ranges from -1 to 1: a value below 0 indicates a negative review, and a value above 0 indicates a positive one.
Before building the engine, you need to clean the data by removing the characters and words that we do not need. Scraping brought along some special characters, such as ‘\n’, mixed in among the words, so I started by removing those. I also removed the stop words. I then applied classic text-cleaning techniques for natural language processing, in the following steps:
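The thresholding rule can be sketched in a few lines. This is a minimal illustration rather than the project's code: the scores are made up, and labeling a score of exactly 0 as "neutral" is an assumption, since the rule above only covers values below and above 0. With TextBlob, each score would come from TextBlob(review_text).sentiment.polarity.

```python
def label_polarity(score):
    """Map a polarity score in [-1, 1] to a sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # assumption: exactly 0 is treated as neutral


# Hypothetical polarity scores for three reviews.
scores = [0.8, -0.25, 0.0]
labels = [label_polarity(s) for s in scores]
```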
- Removing punctuation
- Obtaining the words’ stems
- Removing numerical characters
- Converting letters to lowercase
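Most of these cleaning steps can be sketched with the standard library alone. The example below is an assumption-laden illustration, not the project's code: the stop-word list is a tiny hypothetical one, and stemming is left out because it needs an external stemmer such as NLTK's PorterStemmer.

```python
import re
import string

# Tiny illustrative stop-word list (hypothetical); a real project would use
# a fuller list, for example the one that ships with NLTK.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "with"}


def clean_text(text):
    """Lowercase the text and strip newlines, punctuation, digits and stop words."""
    text = text.lower().replace("\n", " ")
    # Remove every ASCII punctuation character.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Replace runs of digits with a space.
    text = re.sub(r"\d+", " ", text)
    words = [w for w in text.split() if w not in STOP_WORDS]
    return " ".join(words)


cleaned = clean_text("Bake the pie\nfor 45 minutes, then cool!")
```

Here cleaned is "bake pie minutes then cool": the newline, the number, the punctuation, and the stop words "the" and "for" are gone.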
In this section, I created some interactive charts using ‘Plotly’. Let’s have a look at some brief information and graphics.
As mentioned above, I looked at the overall sentiment of the reviews. In general, people gave positive feedback on the recipes:
# Creating a pie chart of review polarity
import plotly.express as px

the_colors = ['rgb(33, 75, 99)', 'rgb(79, 129, 102)',
              'rgb(151, 179, 100)']
fig = px.pie(d, names='Polarity', color_discrete_sequence=the_colors)
If we look at the “make it again” rates, the histogram is skewed toward high values: most recipes are made again 90%-95% of the time.
# Create a histogram chart
fig = px.histogram(recipe_data, x="Make_It_Again")
fig.update_traces(marker_line_width=2, opacity=0.6)
fig.update_xaxes(showline=True, linewidth=1, linecolor='black')
fig.update_yaxes(showline=True, linewidth=1, linecolor='black')
fig.update_layout(showlegend=False,
                  title="Make It Again Distribution",
                  xaxis_title="Make It Again",
                  xaxis_showgrid=False, yaxis_showgrid=False)
We are going to build a recommender engine based on the ingredients, description, and preparation steps. The engine will make recommendations to users according to their positive reviews.
To create a recommendation engine, we first need to turn each recipe's text into a numerical vector (in this case with the “TF-IDF Vectorizer”) so that we can measure the similarity between recipes. Afterward, we use the “linear_kernel” function from “sklearn” to calculate cosine similarities. Cosine similarity measures the cosine of the angle between two vectors; because the TF-IDF vectors are normalized to unit length by default, their dot product, which is what “linear_kernel” computes, equals the cosine similarity. The vectors are built with TF-IDF from the recipe information. As mentioned, the engine makes recommendations based on user reviews. The following steps explain how the system works.
- Determining the user's positive sentiments with ‘TextBlob’ and filtering the reviews accordingly
- Choosing one of the positive reviews at random to find a recipe that the user likes
- Recommending the top 10 recipes with the highest cosine scores relative to the chosen recipe
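The steps above can be sketched end to end in plain Python. This is a hand-rolled illustration of the idea, not the project's code and not scikit-learn itself: tfidf_vectors and dot_product are stand-ins for the “TF-IDF Vectorizer” and “linear_kernel”, and the recipe names, texts, and polarity scores are hypothetical toy data.

```python
import math
import random
from collections import Counter


def tfidf_vectors(documents):
    """Return one unit-length {term: weight} dict per document."""
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    # Document frequency: the number of documents each term appears in.
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        # Term count times a smoothed inverse document frequency.
        weights = {t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
                   for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors


def dot_product(u, v):
    """Dot product of two sparse vectors; for unit vectors this is the cosine."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def recommend(liked_recipes, names, vectors, top_n=10, rng=random):
    """Pick one liked recipe at random and rank the others by cosine score."""
    seed = rng.choice(liked_recipes)
    seed_vec = vectors[names.index(seed)]
    scored = [(dot_product(seed_vec, vec), name)
              for name, vec in zip(names, vectors) if name != seed]
    scored.sort(reverse=True)  # highest score first
    return seed, [name for _, name in scored[:top_n]]


# Hypothetical toy data standing in for the scraped recipes and reviews.
names = ["apple pie", "apple cake", "tomato pasta"]
texts = ["apple cinnamon butter crust",
         "apple cinnamon butter flour",
         "tomato garlic basil pasta"]
reviews = [("apple pie", 0.6), ("tomato pasta", -0.4)]  # (recipe, polarity)

# Step 1: keep only positively reviewed recipes.
liked = [recipe for recipe, polarity in reviews if polarity > 0]
# Steps 2 and 3: choose one at random and rank the rest.
seed, recommendations = recommend(liked, names, tfidf_vectors(texts), top_n=2)
```

With this toy data the only liked recipe is "apple pie", so it is always the seed, and "apple cake" ranks first because it shares three ingredients with it while "tomato pasta" shares none.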
Building a Simple Dashboard with Streamlit
Streamlit is an awesome tool that allows us to build interactive web applications without HTML knowledge. Designing an interface for your machine learning model is quite easy with Streamlit. It is also compatible with major libraries such as pandas, plotly, and matplotlib. All you need is a few lines of code!
- Installation & Running
For a standard installation, type the following command in your terminal:
$ pip install streamlit
To run your app on localhost, first change the directory to the folder that contains your app. Then run the command below. It will open a new tab in your browser, where you can see your app.
$ streamlit run file_name.py
- Title & Table
You can use st.dataframe() to display a data frame as an interactive table; you can sort the values by clicking the column names.
st.table() displays a static table: all the values are shown at once, without scrolling.
st.title("Recommended for you!")
The code above renders the page title.
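A minimal sketch of how these three calls might sit together is shown below. The function name, the rows data, and the choice to pass the Streamlit module in as the st parameter are all assumptions of this sketch; passing st in simply lets the function be exercised without a running app. A pandas data frame is the usual input, but a plain list of dictionaries is used here to keep the example small.

```python
def render_recommendations(st, rows):
    """Draw a title, an interactive table and a static table."""
    st.title("Recommended for you!")
    st.dataframe(rows)  # interactive: columns can be sorted by clicking
    st.table(rows)      # static: every row is shown at once


# Hypothetical recommendation results.
rows = [
    {"Name": "Apple Pie", "Rating": 3.5},
    {"Name": "Pumpkin Soup", "Rating": 4.0},
]

# In the app file (started with `streamlit run file_name.py`) you would write:
#   import streamlit as st
#   render_recommendations(st, rows)
```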
- Creating a Side Menu
Streamlit lets you build a side menu out of different kinds of widgets. It is pretty easy with
st.sidebar: call the element you want on it, such as st.sidebar.text_input(), and it is placed in the sidebar.
st.sidebar.text_input('''Enter your user name''')
As you can see, we inserted a text box that you can type anything into. We also added the recipes that the user likes as a table in the sidebar.
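The sidebar described above could be put together roughly as follows. This is a sketch under stated assumptions: the function name, the liked_by_user lookup, and the user name are hypothetical, and the Streamlit module is passed in as st only so the function can be tried without a running app.

```python
def render_sidebar(st, liked_by_user):
    """Ask for a user name in the sidebar and list that user's liked recipes."""
    user_name = st.sidebar.text_input("Enter your user name")
    liked = liked_by_user.get(user_name, [])
    if liked:
        # Anything called on st.sidebar is drawn in the sidebar.
        st.sidebar.table({"Recipes you like": liked})
    return user_name


# Hypothetical lookup of liked recipes per user name.
liked_by_user = {"user_1": ["Apple Pie", "Pumpkin Soup"]}

# In the app file you would write:
#   import streamlit as st
#   render_sidebar(st, liked_by_user)
```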
To graph the ratings, I chose to draw an interactive bar chart with the Plotly library. To show the chart in the app, call st.plotly_chart() at the end of the chart code.
Note: Do not call fig.show() at the end of the chart code; it may open a new browser tab to show the chart instead of drawing it in the app. If you do call fig.show() elsewhere, you can set pio.renderers.default = 'chrome' to be explicit about which browser is used.
fig = px.bar(recom, x='Name', y='Rating', color='Name')
fig.update_traces(marker_line_width=2, opacity=0.6)
fig.update_layout(showlegend=False, title="Rating",
                  yaxis_title="Rate")
fig.update_xaxes(showline=True, linewidth=1, linecolor='black')
fig.update_yaxes(showline=True, linewidth=1, linecolor='black')
st.plotly_chart(fig)
Now that we have our simple interface, we can host it online. We are going to do this by deploying it to Heroku. Heroku is a platform as a service (PaaS) on which you can run your applications in the cloud and show them to the world. First of all, you need a GitHub account and a Heroku account. We will upload the files to our GitHub repository and connect the repository to Heroku.
In the first step, create a repository and make sure that you upload the following files to it.
- file_name.py contains your application code.
- requirements.txt lists the libraries that you import in your app. You should write the libraries' names with their versions (e.g. pandas==1.0.1). You can check a version by typing pip show [library's name] in the terminal.
- setup.sh is a shell script that sets up the app's configuration. Environment variables and other configuration steps are handled by running this file.
- Procfile tells Heroku what command to run to launch your web server.
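To make the roles of these files concrete, here is a small Python sketch that writes a possible version of each. The file contents follow a pattern commonly used for Streamlit apps on Heroku and are an assumption, not the actual files from this project; file_name.py is the placeholder name used in this post, and the helper names are hypothetical.

```python
from pathlib import Path
import tempfile

APP_FILE = "file_name.py"  # placeholder app name used in this post


def requirement_line(name, version):
    """Format one pinned dependency, e.g. 'pandas==1.0.1'."""
    return f"{name}=={version}"


# Assumed contents: a common Streamlit-on-Heroku pattern, not the
# author's actual files.
PROCFILE = f"web: sh setup.sh && streamlit run {APP_FILE}\n"
SETUP_SH = """mkdir -p ~/.streamlit/
cat > ~/.streamlit/config.toml <<EOF
[server]
headless = true
port = $PORT
enableCORS = false
EOF
"""


def write_deploy_files(target_dir, requirements):
    """Write requirements.txt, setup.sh and Procfile into target_dir."""
    target = Path(target_dir)
    (target / "requirements.txt").write_text("\n".join(requirements) + "\n")
    (target / "setup.sh").write_text(SETUP_SH)
    (target / "Procfile").write_text(PROCFILE)


# Write into a temporary folder here; in practice, use your repository folder.
with tempfile.TemporaryDirectory() as tmp:
    write_deploy_files(tmp, [requirement_line("pandas", "1.0.1")])
    written = sorted(p.name for p in Path(tmp).iterdir())
```

In this pattern, setup.sh writes a Streamlit config file that binds the server to the port Heroku provides in the PORT environment variable, and the Procfile runs that script before starting the app. A real requirements.txt would also list streamlit and every other library the app imports.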
The second step is creating a new app on the Heroku page. Click the “New” button and choose “Create new app”.
- Enter an application name on the page that opens, then click the “create app” button.
- You will see an overview of the app. Click the “Deploy” section, then select the GitHub button under “Deployment method”.
- Connect your GitHub account, type your repository name in the box, then click the “Connect” button.
- Click the “Deploy” button at the bottom of the page.
After these steps, the deployment is complete.
You can find my Heroku page and GitHub repo here:
MerveHoroz/food-recommender-herokuapp on GitHub
Thanks for reading!