Movie Recommendation System — Content Filtering

Anchit Jain
Published in Data Science 101
Jan 8, 2019
Netflix Personalized Movie Recommendations

For a long time, I have wondered how shopping websites like Flipkart or Amazon, movie platforms like Netflix, or even Medium suggest items based on a user's interests.

But the idea is actually quite simple. Unlike my other blogs, this one will be short, but it should be enough to introduce you to recommendation systems, and of course it comes with working code.

In this blog post, I will build a movie recommendation system using The Movies Dataset and deploy it using Flask.

Here is what Google has to say about it:

A recommender system or a recommendation system (sometimes replacing “system” with a synonym such as platform or engine) is a subclass of information filtering system that seeks to predict the “rating” or “preference” a user would give to an item.

Put in simple language: a recommendation system suggests relevant items based on the user's interests.

Recommendation systems are broadly classified into two types: content-based and collaborative. Let’s try to understand each one in turn.

The idea behind a content-based (cognitive filtering) recommendation system is to recommend an item based on a comparison between the content of the items and a user profile. In simple words, I may get a recommendation for a movie because its description resembles the descriptions of other movies I liked.

Collaborative filtering, by contrast, works from the interactions between users and items, identified by user ID and movie ID, rather than from the content of the items. For example, take two users A and B: user A likes movies P, Q, R, and S, and user B likes movies Q, R, S, and T. Since both users like movies Q, R, and S, their tastes are similar, so movie P will be recommended to user B and movie T will be recommended to user A.
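To make the A and B example concrete, here is a minimal sketch in plain Python. The `likes` dictionary and the `recommend_from_neighbour` helper are illustrative names, not part of any library, and real collaborative filtering works on large rating matrices, so treat this as a toy.

```python
# Toy data from the example: which movies each user likes.
likes = {
    "A": {"P", "Q", "R", "S"},
    "B": {"Q", "R", "S", "T"},
}


def recommend_from_neighbour(user, likes):
    """Suggest movies liked by the most similar other user but not yet by `user`."""

    def overlap(other):
        # Similarity here is simply the number of movies both users like.
        return len(likes[user] & likes[other])

    others = [u for u in likes if u != user]
    neighbour = max(others, key=overlap)
    return sorted(likes[neighbour] - likes[user])


print(recommend_from_neighbour("A", likes))  # ['T']
print(recommend_from_neighbour("B", likes))  # ['P']
```

With only two users the "most similar" user is trivially the other one; with more users, the same overlap count would pick the closest neighbour.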

Let's start by understanding the data. I have used The Movies Dataset, which has metadata on over 45,000 movies and 26 million ratings from over 270,000 users. For our purpose, we will use movies_metadata.csv and links_small.csv. The dataset describes a one-to-many relationship between users and ratings: each user can rate many movies. Before we dive into the code, let's figure out our approach to the solution. For ease of understanding, I have tried both filtering approaches (content-based and collaborative) on the same dataset.

For content-based filtering, the approach is relatively simple: we convert the text describing each movie into vector form, then use cosine similarity to find the movies closest to the given input title.

Let’s begin with the code.

  1. Read the dataset from Google Drive into a data frame. Remove the malformed rows and keep only those movie IDs that are actually present in the links_small dataset (which serves as a lookup for the movie metadata). An important part is to merge all the text metadata, which in our case means “overview” and “tagline”, into a single field.
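The filtering and merging in this step might look roughly like the sketch below. It uses only the standard `csv` module on small in-memory samples rather than the real files, and it assumes the column names `id`, `title`, `overview`, `tagline` (movies metadata) and `tmdbId` (links). Check those against your copy of the dataset. `build_descriptions` is a hypothetical helper name.

```python
import csv
import io


def build_descriptions(metadata_rows, links_rows):
    """Return {title: description} for movies whose id appears in the links file."""
    # Collect the ids we want to keep; parse defensively because the id
    # column may be empty or formatted like "2.0".
    wanted = set()
    for row in links_rows:
        try:
            wanted.add(int(float(row["tmdbId"])))
        except (ValueError, KeyError, TypeError):
            continue

    descriptions = {}
    for row in metadata_rows:
        try:
            movie_id = int(row["id"])
        except (ValueError, KeyError, TypeError):
            continue  # skip malformed rows whose id is not a number
        if movie_id not in wanted:
            continue
        # Merge overview and tagline, treating missing values as empty.
        parts = [(row.get("overview") or "").strip(), (row.get("tagline") or "").strip()]
        descriptions[row["title"]] = " ".join(p for p in parts if p)
    return descriptions


# Made-up sample rows standing in for the two CSV files.
metadata_csv = io.StringIO(
    "id,title,overview,tagline\n"
    "1,Movie One,A boy meets a robot.,Friendship rebooted\n"
    "1997-08-20,Broken Row,,\n"
    "2,Movie Two,A heist goes wrong.,\n"
    "3,Movie Three,Not in the links file.,\n"
)
links_csv = io.StringIO("movieId,imdbId,tmdbId\n10,0000001,1\n11,0000002,2.0\n")

descriptions = build_descriptions(csv.DictReader(metadata_csv), csv.DictReader(links_csv))
print(descriptions)
```

The same logic carries over to a data frame: filter on the id column, fill missing text with empty strings, and concatenate the two text columns.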

  2. With the data gathered, we use TF-IDF to vectorize the words. The reason for choosing this algorithm is that it gives less weight to words that occur frequently across documents (for example “the”, “is”, “and”). When calculating the term frequency, we divide by the total number of words in the document so that longer documents do not have a greater influence than shorter ones.
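To see why frequent words end up with little weight, here is a bare-bones TF-IDF in plain Python. It uses the textbook form, term frequency times log(N / document frequency), on three made-up sentences. Library implementations such as scikit-learn's `TfidfVectorizer` apply smoothing and normalisation by default, so their numbers will differ from this sketch.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenised for term in set(tokens))

    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        total = len(tokens)  # dividing by this keeps long documents from dominating
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


docs = [
    "the robot saves the city",
    "the thief robs the bank",
    "the robot robs the bank",
]
vectors = tfidf(docs)
for term, weight in sorted(vectors[0].items()):
    print(f"{term:6s} {weight:.3f}")
```

In the first document, “the” appears in every sentence and gets a weight of zero, while “city”, which appears in only one, gets the highest weight.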

The implementation of TF-IDF is very simple, but it needs a few improvements. Phrases like “hate” and “don’t hate” have very different meanings, yet they look almost the same to TF-IDF because it treats each word independently. So how can we get around this?

One answer is to use bigrams or trigrams: the TF-IDF vectorizer can build features from sequences of two or three consecutive words, which lets it distinguish meanings that only emerge when words appear together.
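A quick sketch of what adding bigrams does to the features. The `ngrams` helper below is a hypothetical, simplified tokenizer (lowercase, split on whitespace), not the one a vectorizer library would use.

```python
def ngrams(text, n_min=1, n_max=2):
    """Return all word n-grams of length n_min..n_max, shortest first."""
    tokens = text.lower().split()
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


print(ngrams("I hate it"))
# ['i', 'hate', 'it', 'i hate', 'hate it']
print(ngrams("I don't hate it"))
# ['i', "don't", 'hate', 'it', "i don't", "don't hate", 'hate it']
```

With single words only, both sentences share the feature “hate”. With bigrams, the second sentence also gets “don't hate”, which the first one lacks, so the two vectors move apart.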

Once we have a vector for every movie, we are ready for the algorithm, cosine similarity, which will tell us which vectors are similar to each other.
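Cosine similarity is the dot product of two vectors divided by the product of their lengths. Here is a small sketch over sparse vectors stored as dictionaries. The movie names and weights are invented for illustration, and `similarity_matrix` is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as {term: weight}."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty description is similar to nothing
    return dot / (norm_a * norm_b)


def similarity_matrix(vectors):
    """Compute the similarity of every title with every other title."""
    titles = list(vectors)
    return {
        a: {b: cosine_similarity(vectors[a], vectors[b]) for b in titles}
        for a in titles
    }


# Invented weights; in practice these would be the TF-IDF vectors.
vectors = {
    "Movie One": {"robot": 0.8, "city": 0.6},
    "Movie Two": {"robot": 0.6, "bank": 0.8},
    "Movie Three": {"wedding": 1.0},
}
sims = similarity_matrix(vectors)
print(round(sims["Movie One"]["Movie Two"], 2))    # 0.48
print(round(sims["Movie One"]["Movie Three"], 2))  # 0.0
```

Movies that share weighted terms score closer to 1, and movies with no terms in common score 0.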

Finally, we will create a function that returns the top recommended movies for a given input title. For this task, I have built a small web service using Flask, a micro-framework for web services in Python.
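A minimal sketch of such a service is shown below. The `/recommend` route, the `title` and `n` query parameters, and the hard-coded `SIMILARITY` table are all assumptions for illustration; the actual app would look scores up from the similarity computed over the full dataset.

```python
from flask import Flask, jsonify, request

# Hypothetical precomputed similarity scores; a real app would derive
# these from the TF-IDF vectors of the movie descriptions.
SIMILARITY = {
    "Movie One": {"Movie Two": 0.48, "Movie Three": 0.0, "Movie Four": 0.31},
    "Movie Two": {"Movie One": 0.48, "Movie Three": 0.1, "Movie Four": 0.05},
}


def recommend(title, top_n=10):
    """Return the top_n titles most similar to `title`, best first."""
    scores = SIMILARITY.get(title)
    if scores is None:
        return None  # unknown title
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:top_n]]


app = Flask(__name__)


@app.route("/recommend")
def recommend_route():
    # Read the query string, e.g. /recommend?title=Movie%20One&n=2
    title = request.args.get("title", "")
    top_n = request.args.get("n", default=10, type=int)
    results = recommend(title, top_n=top_n)
    if results is None:
        return jsonify(error=f"unknown title: {title}"), 404
    return jsonify(title=title, recommendations=results)


# To serve it, call app.run() at the bottom of the file
# or use the `flask run` command.
```

Keeping `recommend` separate from the route makes it easy to test the ranking logic without starting a server.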

Running the application.

  1. Navigate to the folder of your code.
  2. Once you’re in your project directory, run the Flask application with the command python predict.py
  3. If all went well, you will see the following line in your terminal:

  4. Copy and paste this URL into your web browser and you are all set to see the output.

Movies similar to “3 Idiots”

Closing Note: I hope this blog will help you build your own recommendation system. In my next blog, I’ll try to build a generic recommendation system using various embedding techniques and neural networks.

Thank you for your patience…..Claps(Echoing)

Anchit Jain
Data Science 101

Machine learning engineer. Loves to work on Deep learning based Image Recognition and NLP. Writing to share because I was inspired when others did.