Movie Recommender System (Data Science Project)

Case study of Movie Recommender System using scikit-learn

Bhupesh Singh Rathore | Cruio
4 min read · Apr 1, 2022
Photo by Samuel Regan-Asante on Unsplash

There has been a substantial increase in data over the past decade, and using that data for the benefit of users and brands is a must. One example of using data for a better user experience is the recommender system. Recommending what a user wants to buy, watch, and so on is one of the key factors in business growth. Recommending movies is one such use case, and it makes things easier for the user as well. So, I’m going to implement a movie recommender using machine learning techniques, so that users can find movies similar to the ones they are interested in.

Basic Idea of Movie Recommender System

While working with data, it is important to know how to use it. In this case, what I intend to do is collect the keywords of each movie: title, genres, overview, main keywords, cast, crew, and more if needed. Once those keywords are in a single array, they can be converted into vectors that the machine can understand. We then find the distance between movies: the smaller the distance, the greater the similarity between the movies.
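As a rough illustration of gathering the keywords into a single array, here is a minimal sketch in plain Python. The record, its field names, and the helper `build_tags` are hypothetical and only mirror the fields listed above; the project's actual preprocessing may differ.

```python
# Hypothetical movie record; the field names mirror the keywords listed above.
movie = {
    "title": "Avatar",
    "genres": ["Action", "Adventure"],
    "overview": "A marine is sent to the moon Pandora",
    "keywords": ["space", "alien"],
    "cast": ["Sam Worthington", "Zoe Saldana"],
    "crew": ["James Cameron"],
}


def build_tags(record):
    """Collect every descriptive field of a movie into one lowercase word list."""
    words = record["title"].split() + record["overview"].split()
    for field in ("genres", "keywords", "cast", "crew"):
        for entry in record[field]:
            words.extend(entry.split())
    return [word.lower() for word in words]


tags = build_tags(movie)
print(tags)
```

Lowercasing everything means that "Alien" and "alien" later count as the same word when the tags are turned into vectors.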

Data Source

This project uses the TMDB 5000 Movie Dataset from Kaggle.

Source Link: https://www.kaggle.com/datasets/tmdb/tmdb-movie-metadata

Working on Code

The Kaggle data set contains two tables: the Movies data set and the Credits data set. To make the data useful, the two are merged and null values are dropped.
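The merge-and-drop step could look roughly like the sketch below. The two small tables are made-up stand-ins for the Kaggle files, and both the column names and the choice of `title` as the join key are assumptions made for illustration.

```python
import pandas as pd

# Tiny stand-ins for the two Kaggle tables (made-up rows, assumed column names).
movies = pd.DataFrame({
    "movie_id": [1, 2, 3],
    "title": ["Movie A", "Movie B", "Movie C"],
    "overview": ["A space adventure", None, "A love story"],
})
credits = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "cast": ["Actor One", "Actor Two", "Actor Three"],
})

# Merge on the shared "title" column, then drop rows that contain any null value.
merged = movies.merge(credits, on="title")
clean = merged.dropna().reset_index(drop=True)
print(clean)
```

Here "Movie B" has no overview, so it is removed by `dropna()` and two rows remain.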
  • Processing the data: [Figure: Data Processing]
  • Final data on which we need to perform the task: [Figure: new_df, the final data]
  • Modelling the data for the project:

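The data is converted into vectors using the CountVectorizer tool from the scikit-learn library. The words are also reduced to their root form by stripping suffixes, using the PorterStemmer tool (available in the NLTK library), so that variants of the same word are counted together.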
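A minimal sketch of stemming and vectorizing is shown here. The tag strings are made up, the helper `stem_text` is hypothetical, and the `max_features` and `stop_words` settings are assumptions rather than the values used in the project.

```python
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer

# Made-up tag strings standing in for the combined keywords of three movies.
tags = [
    "space aliens fighting soldiers",
    "soldier fights alien in space",
    "romantic love story",
]

stemmer = PorterStemmer()


def stem_text(text):
    """Replace every word in the text with its Porter stem."""
    return " ".join(stemmer.stem(word) for word in text.split())


stemmed = [stem_text(text) for text in tags]

# Each movie becomes a row of word counts; each column is one vocabulary word.
vectorizer = CountVectorizer(max_features=5000, stop_words="english")
vectors = vectorizer.fit_transform(stemmed).toarray()

print(stemmed)
print(vectors.shape)
```

After stemming, "fighting" and "fights" both become "fight", so the first two movies share that column in the count matrix.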

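Now, the most important function of the movie recommender system is finding the distance between each pair of vectors. There are multiple options here. Euclidean distance can be used, but it tends to give less accurate results for this kind of data, because it is sensitive to the length of each vector (how many words a movie's tags contain) rather than to which words the movies share. So, cosine similarity is used instead, since it compares only the direction of the vectors.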

[Figure: Difference between Euclidean and Cosine Similarity]
[Figure: Data Modelling]
[Figure: Model is working perfectly]
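To make the difference between the two measures concrete, here is a small sketch in plain Python with made-up word-count vectors. It is an illustration only, not the project's code; in practice scikit-learn provides `cosine_similarity` in `sklearn.metrics.pairwise`.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up word counts over the vocabulary [space, alien, love].
short_scifi = [1, 1, 0]  # brief description of a sci-fi movie
long_scifi = [4, 4, 0]   # same words, but a much longer description
romance = [0, 0, 1]

print(euclidean_distance(short_scifi, long_scifi))
print(euclidean_distance(short_scifi, romance))
print(cosine_similarity(short_scifi, long_scifi))
print(cosine_similarity(short_scifi, romance))
```

With these vectors, Euclidean distance places the short sci-fi movie closer to the romance than to the long sci-fi movie, purely because of description length, while cosine similarity rates the two sci-fi movies as pointing in the same direction and the romance as unrelated.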

Implementing the Code on a Website

Now our model is done, and it’s time to deploy it as a website on Heroku using the Streamlit library.
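The app code loads two pickle files, movies_dict.pkl and similarity.pkl. How they were exported is not shown in this article, so the following is only a sketch under assumptions: `new_df` and `similarity` here are small made-up stand-ins, and a temporary folder is used instead of the project folder.

```python
import os
import pickle
import tempfile

import pandas as pd

# Stand-ins for the notebook's final objects (made-up values).
new_df = pd.DataFrame({"movie_id": [10, 20], "title": ["Movie A", "Movie B"]})
similarity = [[1.0, 0.3], [0.3, 1.0]]

out_dir = tempfile.mkdtemp()  # a real project would write next to the app script
movies_path = os.path.join(out_dir, "movies_dict.pkl")
similarity_path = os.path.join(out_dir, "similarity.pkl")

# Save the DataFrame as a plain dictionary, plus the similarity matrix.
with open(movies_path, "wb") as f:
    pickle.dump(new_df.to_dict(), f)
with open(similarity_path, "wb") as f:
    pickle.dump(similarity, f)

# Reload the movies the same way the app does: dictionary first, then DataFrame.
with open(movies_path, "rb") as f:
    movies = pd.DataFrame(pickle.load(f))
print(movies)
```

Saving a plain dictionary rather than the DataFrame itself keeps the pickle less dependent on the pandas version installed on the server.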

import pandas as pd
import streamlit as st
import pickle
import requests

st.set_page_config(layout="wide") # Website width = wide

# Fetch the poster URL for a movie from the TMDB API
def fetch_poster(movie_id):
    response = requests.get("https://api.themoviedb.org/3/movie/{}?api_key=1208641ad3f3af4b74bec18fd5720146&language=en-US".format(movie_id))
    data = response.json()
    return "https://image.tmdb.org/t/p/w185/" + data["poster_path"]

# Recommend the five movies most similar to the selected one
def recommend(movie):
    movie_index = movies[movies['title'] == movie].index[0]
    distances = similarity[movie_index]  # similarity scores against every movie
    # Sort by score, highest first; skip position 0, which is the movie itself
    movies_list = sorted(list(enumerate(distances)), reverse=True, key=lambda x: x[1])[1:6]
    recommendedMoviesPoster = []
    recommend_movies = []
    for i in movies_list:
        movie_id = movies.iloc[i[0]].movie_id
        recommend_movies.append(movies.iloc[i[0]].title)

        # Fetch the poster from the API
        recommendedMoviesPoster.append(fetch_poster(movie_id))

    return recommend_movies, recommendedMoviesPoster

# Load the data
with open("movies_dict.pkl", "rb") as f:
    movies = pd.DataFrame(pickle.load(f))  # dictionary -> DataFrame
with open("similarity.pkl", "rb") as f:
    similarity = pickle.load(f)

# Website title
st.title("Movie Recommender System")

# Create a dropdown for movie selection
selected_movie_name = st.selectbox("Movies", movies['title'].values)

# Create the Recommend button and show the five results side by side
if st.button('Recommend'):
    names, posters = recommend(selected_movie_name)

    columns = st.columns(5)
    for col, name, poster in zip(columns, names, posters):
        with col:
            st.text(name)
            st.image(poster)
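The ranking step inside recommend can be tried on its own. The sketch below uses a made-up row of similarity scores, and `top_matches` is a hypothetical helper name used only for this illustration.

```python
def top_matches(scores, count=5):
    """Return (index, score) pairs for the highest scores, skipping the best one."""
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    # Position 0 is assumed to be the selected movie itself (similarity 1.0).
    return ranked[1:count + 1]


# Made-up similarity row for the movie at index 2.
row = [0.10, 0.45, 1.00, 0.30, 0.80, 0.05, 0.60, 0.20]
print(top_matches(row))
```

Skipping the first position assumes the selected movie always ranks highest against itself; if another movie had identical tags, the two would tie and the order between them would depend on their positions in the data.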

Result

Here is how the Movie Recommender System looks.

Link for the app: https://bhupesh-mrs.herokuapp.com/

Conclusion

I learned a lot through the process, and it has made me even more curious to work on more projects. I went through steps such as data cleaning, data analysis, data modelling, and deployment.

Bhupesh Singh Rathore — Portfolio

Follow me on — LinkedIn | YouTube

Enjoy Data Science ’n’ Coding 😎🐍.
