Movie Recommender System (Data Science Project)
Case study of Movie Recommender System using scikit-learn
There has been a substantial increase in data over the past decade, and using it for the benefit of users and brands is a must. One example of using data for a better user experience is the recommender system. Recommending what a user wants to buy, watch, and so on is a key factor in business growth. Recommending movies is another such use case, and it makes things easier for the user as well. So I’m going to implement one using machine learning techniques so that users can find movies similar to the ones they like.
Basic Idea of Movie Recommender System
While working with data, it is important to know how to use it. In this case, what I intend to do is collect the keywords of each movie: title, genres, overview, main keywords, cast, crew, and more if needed. Once those keywords are combined into a single array, they can be converted into vectors that a machine can work with. We then measure the distance between the movie vectors: the smaller the distance, the greater the similarity between the movies.
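As a rough illustration of the "keywords in a single array" step, here is a minimal sketch. It assumes the list-valued columns are stored as strings holding a list of records with a "name" field; the toy rows, the column choice, and the helper names `extract_names` and `build_tags` are illustrative assumptions, not the project's actual code.

```python
import ast

import pandas as pd

# Toy rows (made up for illustration): list-valued columns are stored as
# strings holding a list of {"id": ..., "name": ...} records.
movies = pd.DataFrame({
    "title": ["Avatar", "Spectre"],
    "overview": [
        "A marine is sent to the moon Pandora",
        "A cryptic message sends Bond on a mission",
    ],
    "genres": [
        '[{"id": 28, "name": "Action"}, {"id": 878, "name": "Science Fiction"}]',
        '[{"id": 28, "name": "Action"}, {"id": 80, "name": "Crime"}]',
    ],
})


def extract_names(raw):
    """Parse a stringified list of records and return the 'name' values."""
    # Spaces are removed so "Science Fiction" becomes one token, "ScienceFiction".
    return [item["name"].replace(" ", "") for item in ast.literal_eval(raw)]


def build_tags(row):
    """Join overview words and genre names into one lowercase string."""
    words = row["overview"].split() + extract_names(row["genres"])
    return " ".join(words).lower()


# One "tags" string per movie; this is the text that later gets vectorized.
movies["tags"] = movies.apply(build_tags, axis=1)
print(movies[["title", "tags"]])
```

Cast, crew, and keyword columns could be folded into the same string in the same way if they follow the same layout.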
Data Source
This project uses the TMDB 5000 Movie Dataset from Kaggle.
Source Link: https://www.kaggle.com/datasets/tmdb/tmdb-movie-metadata
Working on Code
- Data we got from the Kaggle data set:
- Processing the data:
- Final Data which we need to perform the task on:
- Modelling the data for the project:
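The text is converted into vectors using the CountVectorizer tool from the scikit-learn library. The words are also reduced to their root form by stripping suffixes with the PorterStemmer tool (available in the NLTK library), so that variants such as "loved" and "loving" count as the same word.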
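The stemming and vectorizing steps can be sketched as follows. This is a minimal example on two made-up tag strings, assuming `PorterStemmer` is imported from NLTK and `CountVectorizer` is used with its default settings; the helper `stem_text` is a hypothetical name, and the project's real parameters may differ.

```python
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer

stemmer = PorterStemmer()


def stem_text(text):
    """Replace every word in the text with its Porter stem."""
    return " ".join(stemmer.stem(word) for word in text.split())


# Made-up tag strings standing in for the per-movie tags.
tags = ["loved loving dancing", "dance lovers love dances"]
stemmed = [stem_text(t) for t in tags]

# Each row of `vectors` counts how often each vocabulary word occurs in a tag string.
vectorizer = CountVectorizer()
vectors = vectorizer.fit_transform(stemmed).toarray()

print(stemmed)
print(sorted(vectorizer.vocabulary_))
print(vectors)
```

Stemming before counting means "dancing", "dance", and "dances" all land in the same column, so the two strings overlap more than their raw words would suggest.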
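Now, the most important function of the movie recommender system is finding the distance between each pair of vectors. There are multiple ways to do this. The Euclidean method can be used, but it is less accurate here because it is sensitive to vector length: a movie with a longer description gets larger word counts and can look far away from a similar movie with a shorter one. So I use cosine similarity, which compares the angle between vectors instead, to get more accurate results.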
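To make the difference concrete, here is a small sketch comparing the two measures with scikit-learn's `cosine_similarity` and `euclidean_distances`. The count vectors are made up, and `most_similar` is a hypothetical helper, not the function used in the app.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity, euclidean_distances

# Made-up count vectors: row 1 is row 0 scaled by three (same word mix,
# longer text); row 2 uses a different word mix.
vectors = np.array([
    [1, 2, 0],
    [3, 6, 0],
    [0, 1, 4],
])

similarity = cosine_similarity(vectors)   # higher means more similar
distance = euclidean_distances(vectors)   # lower means more similar


def most_similar(index, top_n=2):
    """Return the indices of the top_n rows most similar to `index`."""
    ranked = sorted(enumerate(similarity[index]), key=lambda pair: pair[1], reverse=True)
    # Skip the row itself by index rather than by its position in the ranking.
    return [i for i, _ in ranked if i != index][:top_n]


print(similarity.round(3))
print(distance.round(3))
print(most_similar(0))
```

Cosine similarity treats rows 0 and 1 as pointing in the same direction, while Euclidean distance places row 2 closer to row 0 than row 1 is, purely because of vector length.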
Implementing the Code on a Website
Now our model is done, and it’s time to deploy it on Heroku as a website built with the Streamlit library.
import pandas as pd
import streamlit as st
import pickle
import requests
st.set_page_config(layout="wide") # Website width = wide
# Creating poster fetching function
def fetch_poster(movie_id):
    response = requests.get("https://api.themoviedb.org/3/movie/{}?api_key=1208641ad3f3af4b74bec18fd5720146&language=en-US".format(movie_id))
    data = response.json()
    print(data)
    return "https://image.tmdb.org/t/p/w185/" + data["poster_path"]
# Creating recommendation function
def recommend(movie):
    movie_index = movies[movies['title'] == movie].index[0]
    distances = similarity[movie_index]
    movies_list = sorted(list(enumerate(distances)), reverse=True, key=lambda x: x[1])[1:6]
    recommendedMoviesPoster = []
    recommend_movies = []
    for i in movies_list:
        movie_id = movies.iloc[i[0]].movie_id
        recommend_movies.append(movies.iloc[i[0]].title)
        # fetch poster from API
        recommendedMoviesPoster.append(fetch_poster(movie_id))
    return recommend_movies, recommendedMoviesPoster
#Accessing the Data
movies = pickle.load(open("movies_dict.pkl", "rb")) #importing data in dictionary
movies = pd.DataFrame(movies) #creating data frame of dictionary
similarity = pickle.load(open("similarity.pkl", "rb"))
# Website title
st.title("Movie Recommender System")
#creating dropdown for movies selection
selected_movie_name= st.selectbox("Movies",movies['title'].values)
# creating recommend button
if st.button('Recommend'):
    name, poster = recommend(selected_movie_name)
    col1, col2, col3, col4, col5 = st.columns(5)
    with col1:
        st.text(name[0])
        st.image(poster[0])
    with col2:
        st.text(name[1])
        st.image(poster[1])
    with col3:
        st.text(name[2])
        st.image(poster[2])
    with col4:
        st.text(name[3])
        st.image(poster[3])
    with col5:
        st.text(name[4])
        st.image(poster[4])
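The app code above loads movies_dict.pkl and similarity.pkl, so the modelling step has to save them first. A minimal sketch of how that could look is below. The frame `new_df`, its two rows, the tiny similarity matrix, and the temporary output folder are assumptions made for illustration only.

```python
import os
import pickle
import tempfile

import pandas as pd

# Hypothetical stand-ins for the processed movie frame and the similarity matrix.
new_df = pd.DataFrame({"movie_id": [19995, 206647], "title": ["Avatar", "Spectre"]})
similarity = [[1.0, 0.2], [0.2, 1.0]]

# A temporary folder keeps the sketch self-contained; the real app reads
# the files from its working directory.
out_dir = tempfile.mkdtemp()
movies_path = os.path.join(out_dir, "movies_dict.pkl")
similarity_path = os.path.join(out_dir, "similarity.pkl")

# Save the frame as a plain dictionary, matching the dictionary the app loads.
with open(movies_path, "wb") as f:
    pickle.dump(new_df.to_dict(), f)
with open(similarity_path, "wb") as f:
    pickle.dump(similarity, f)

# Load the data back the same way the app does and rebuild the frame.
with open(movies_path, "rb") as f:
    movies = pd.DataFrame(pickle.load(f))
with open(similarity_path, "rb") as f:
    loaded_similarity = pickle.load(f)

print(movies)
```

Saving a dictionary instead of the DataFrame itself means the file holds only built-in Python objects, which is likely to be less sensitive to pandas version differences between the notebook and the deployed app.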
Result
Here is the look of the Movie Recommender System.
Link for the app: https://bhupesh-mrs.herokuapp.com/
Conclusion
I learned a lot through this process, and it has made me even more curious about working on more projects. I went through steps such as data cleaning, data analysis, data modelling, and deployment.