Making a Movie Recommendation App using Streamlit

Subhradeep Rang
7 min read · Apr 17, 2022

Source: ActiveState

Introduction:

I have been interested in recommender systems for a long time. Today it is hard to find a platform that does not use one: Amazon, YouTube, Spotify, and Flipkart all rely on recommender systems. So in this article, I am going to build a simple movie recommendation app using Streamlit. Before I start, here is a short introduction to recommender systems.

Recommender System - Introduction:

In simple words, a recommender system suggests new items based on our past preferences. Recommender systems can be divided into two types:

  1. Content-Based Filtering
  2. Collaborative Filtering

In Content-Based Filtering, the recommender system compares items by their content and suggests similar ones to the user. For example, if a user likes a book, the system recommends another book that the user has not read yet but that is similar to the one they liked.

In Collaborative Filtering, if two or more users have similar preferences, the system suggests items that the other users like but the current user has not tried yet. If this definition is not clear to you, don’t worry. The image below explains it.

In this image, there are two users, User 1 and User 2. Both of them like sweet fruits: User 1 likes Apple and Mango, and User 2 likes Mango and Strawberry. So the recommender system suggests Strawberry to User 1 and Apple to User 2, as they have not tasted those yet. I hope this is clear now.
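The fruit example can be sketched in a few lines of plain Python. This is only a toy illustration of the idea, not a real collaborative filtering algorithm: the function name `suggest` and the rule "two users are similar if they share at least one liked item" are assumptions made for this sketch.

```python
# Items each user likes, taken from the fruit example above.
likes = {
    "User 1": {"Apple", "Mango"},
    "User 2": {"Mango", "Strawberry"},
}


def suggest(target, likes):
    """Suggest items liked by users who share at least one item with `target`."""
    own = likes[target]
    suggestions = set()
    for user, items in likes.items():
        # Overlapping taste is treated as "similar user" (toy assumption).
        if user != target and own & items:
            suggestions |= items - own  # only items the target has not tried
    return sorted(suggestions)


print(suggest("User 1", likes))  # ['Strawberry']
print(suggest("User 2", likes))  # ['Apple']
```

Real systems replace the "shares one item" rule with a similarity score computed over many users and ratings.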

Now, Collaborative filtering can be divided into two parts:

  1. Item-based collaborative filtering:- In this type of collaborative filtering, items are suggested according to their similarity to one another, calculated from users’ interactions with those items.
  2. User-based collaborative filtering:- In this type of collaborative filtering, items are suggested according to the similarity between users, which I illustrated above.

Now there is a question: Content-based filtering and item-based collaborative filtering seem nearly the same, so what is the difference? Content-based filtering does not need any user information to recommend an item. Item-based collaborative filtering, on the other hand, needs users’ interactions with the items, i.e. ratings. It is all about the data. If the collected data contains item features rather than user interactions, we go for content-based filtering. But if the data contains user ratings, userId, and movieId, as the MovieLens dataset does, we go for collaborative filtering.

Let’s dive into the main part:

That is all for the introduction to recommender systems. Now let’s move on to the main part: making one!

For this, I am using the Netflix Movies and TV Shows dataset from Kaggle. You can download the dataset from here.

Let’s import the necessary libraries.

If we print the first 5 rows of the Netflix dataset, the data looks something like this.

This data has 12 columns which are listed below:-

  1. show_id:- Unique ID for every Movie / TV Show.
  2. type:- Identifier - A Movie or TV Show.
  3. title:- Title of the Movie / TV Show.
  4. director:- Director of the Movie.
  5. cast:- Actors involved in the movie/show.
  6. country:- Country where the movie/show was produced.
  7. date_added:- Date it was added on Netflix.
  8. release_year:- Actual Release year of the movie/show.
  9. rating:- TV Rating of the movie/show.
  10. duration:- Total Duration - in minutes or number of seasons.
  11. listed_in:- Genre
  12. description:- The summary description.

From the column descriptions above, we can see that no user information is available in this dataset. There is a column called rating, but it holds the TV rating rather than a numerical value, so we cannot compute any similarity from it. So we have to build this recommendation system using text-based similarity, i.e. from the description column. Here we are only interested in the title and description columns.
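A minimal sketch of loading the data and keeping the two columns of interest. The two rows below are made up so the example runs on its own; with the real dataset you would pass the downloaded CSV file to `pd.read_csv` instead (the file name `netflix_titles.csv` in the comment is an assumption).

```python
import io

import pandas as pd

# Made-up rows that mimic a few of the dataset's columns.
# With the real file: df = pd.read_csv("netflix_titles.csv")  # assumed name
sample_csv = io.StringIO(
    "show_id,type,title,rating,description\n"
    "s1,Movie,Movie A,PG-13,A detective hunts a killer.\n"
    "s2,TV Show,Show B,TV-MA,A crew explores a distant planet.\n"
)
df = pd.read_csv(sample_csv)
print(df.head())  # head() shows up to the first 5 rows

# Keep only the columns the text-based recommender needs.
data = df[["title", "description"]]
print(data.columns.tolist())  # ['title', 'description']
```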

Since we are working with text data, we have to clean the text before processing it further. First, we convert all letters to lowercase. Then we remove all punctuation. After that, we tokenize the words, remove the stopwords, and lemmatize them.
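The cleaning steps above can be sketched with the standard library alone. This is a simplified stand-in, not the article's original code: the stopword list is a tiny hand-written sample, tokenization is a plain whitespace split, and lemmatization is left out (a library such as NLTK would normally handle those parts). The name `clean_text` is hypothetical.

```python
import string

# Tiny illustrative stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "the", "and", "of", "in", "to", "is", "his", "her", "with"}


def clean_text(text):
    """Lowercase, strip ASCII punctuation, tokenize, and drop stopwords."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = text.split()  # simple whitespace tokenization
    tokens = [t for t in tokens if t not in STOPWORDS]
    # Lemmatization (e.g. "detectives" -> "detective") would go here;
    # it is omitted to keep this sketch dependency-free.
    return " ".join(tokens)


print(clean_text("A detective hunts the killer, in the City!"))
# detective hunts killer city
```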

Now that the cleaning is done, let’s convert the words into vectors. I am using TfidfVectorizer from Scikit-learn for this. TF-IDF stands for Term Frequency - Inverse Document Frequency. This vectorization not only converts the words into vectors but also accounts for each word’s importance. The main mathematical formula for TF-IDF is given below.
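The image with the formula is not reproduced here. A common textbook form of TF-IDF, which may differ in detail from the one originally shown, is:

```latex
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \times \mathrm{idf}(t),
\qquad
\mathrm{idf}(t) = \log \frac{N}{\mathrm{df}(t)}
```

Here tf(t, d) is how often term t occurs in document d, N is the total number of documents, and df(t) is the number of documents that contain t. With its default settings, Scikit-learn's TfidfVectorizer uses a smoothed variant, idf(t) = ln((1 + N) / (1 + df(t))) + 1, and then scales each document vector to unit length.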

I won’t go deeper into TF-IDF, as that is not the purpose of this article, so I am only giving a brief definition. If you want to learn more, there are plenty of articles on the topic.

Notice that the vectorizer is set up to ignore words that appear in only one document and words that appear in more than 70% of the documents. Words that appear in a single document cannot help in finding similarities between documents, and words that appear in more than 70% of the documents are too common to tell one document from another.
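A minimal sketch of how such thresholds can be passed to TfidfVectorizer. The values `min_df=2` and `max_df=0.7` are inferred from the description above rather than taken from the original snippet, and the four documents are made up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up, already cleaned descriptions.
docs = [
    "film detective hunts killer",
    "film detective solves case",
    "film crew explores planet",
    "film crew fights aliens",
]

# min_df=2 drops words found in only one document;
# max_df=0.7 drops words found in more than 70% of the documents.
vectorizer = TfidfVectorizer(min_df=2, max_df=0.7)
tfidf_matrix = vectorizer.fit_transform(docs)

print(sorted(vectorizer.vocabulary_))  # ['crew', 'detective']
print(tfidf_matrix.shape)  # (4, 2)
```

"film" occurs in every document, so it is dropped as too common, while words such as "killer" or "planet" occur only once and are dropped as too rare.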

After vectorization, we build a data frame from the vector array, with the movie titles as the index and the words as the columns. Then we save that data frame so we can use it in the Streamlit app.

Now there is only one thing left: calculating the similarity. I am using cosine similarity to measure how similar the movies are.

The function recommend_table takes the list of movies an individual has enjoyed and the TF-IDF data frame we saved previously. It then calculates the cosine similarity between the user’s movie list and the other movies in the TF-IDF data frame, and returns a pandas data frame of movie names and similarity scores sorted in descending order.
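A sketch of a function matching this description is shown below; it is not the article's original implementation. How the user's movies are combined is an assumption: here their TF-IDF rows are averaged into one profile vector. The `top_n` parameter and the toy data frame are also made up for illustration.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity


def recommend_table(liked_titles, tfidf_df, top_n=10):
    """Rank the remaining movies by cosine similarity to the liked ones."""
    # Average the liked movies' TF-IDF rows into one profile (assumption).
    profile = tfidf_df.loc[liked_titles].mean(axis=0).to_numpy().reshape(1, -1)
    # Do not recommend movies the user has already picked.
    candidates = tfidf_df.drop(index=liked_titles)
    scores = cosine_similarity(profile, candidates.to_numpy())[0]
    table = pd.DataFrame({"title": candidates.index, "similarity": scores})
    table = table.sort_values("similarity", ascending=False)
    return table.head(top_n).reset_index(drop=True)


# Toy TF-IDF data frame: rows are movies, columns are words.
tfidf_df = pd.DataFrame(
    [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    index=["Movie A", "Movie B", "Movie C", "Movie D"],
    columns=["space", "crime", "love"],
)

result = recommend_table(["Movie A"], tfidf_df)
print(result)  # Movie B comes first, as it shares the word "space" with Movie A
```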

Now it is time to make the Streamlit app.

First, we import the necessary libraries for the Streamlit app.

We are using the recommend_table function which we made previously. One thing I missed: we also have to save our movie titles so we can build a dropdown list in the Streamlit app. Now we have to load two files from our local storage:-

  1. TF-IDF data frame which we saved previously.
  2. movie title list saved as a pickle file.

I used a decorator, st.cache, above the data-loading function. This decorator caches the TF-IDF data frame after it is loaded, because the data frame takes up a lot of memory. As far as I remember, it has a size of 337 MB, and that is no joke! Loading it again and again can slow down the app, so we have to cache it.

Now we make the main structure of the app.

Here I am using st.text('') to add space between widgets. If you want to know about session_state in Streamlit, this is what the Streamlit documentation says:-

Session State is a way to share variables between reruns, for each user session. In addition to the ability to store and persist state, Streamlit also exposes the ability to manipulate state using Callbacks.

Every time a widget value changes, Streamlit reruns the whole script to update the app, so ordinary variables are reset on each rerun. Using session_state, we can keep values such as the selected movies and the computed recommendations across those reruns. The app does not lose its results or recompute them on every interaction, which also helps performance.

If I run that code, the UI works something like this.

Seems awesome, right? Now you have the power to build your own recommendation engine. This is an example of a Content-based Recommendation System. You can use any dataset and try it for yourself.

This article is already quite long, so we will deploy this app using Docker in a future article. Until then, stay tuned.

If you like this article, please appreciate it by clapping, and if I missed something or wrote something wrong, don’t hesitate to tell me in the comments.

Subhradeep Rang

Passionate about learning Data Science, Machine Learning, and AI. I like to share my knowledge through articles to make someone's life easier.