Metadata-based Recommender Systems in Python

This post illustrates a metadata-based recommender system in Python.

Saket Garodia
Analytics Vidhya
5 min read · Jan 2, 2020


Before starting on the implementation of metadata-based recommender systems in Python, I recommend the short four-minute read below, which defines a recommender system and its types in layman's terms.

https://medium.com/@saketgarodia/the-world-of-recommender-systems-e4ea504341ac?source=friends_link&sk=508a980d8391daa93530a32e9c927a87

In this post, I will show how to implement a metadata-based recommender system in Python on the MovieLens 100k dataset from Kaggle.

Let us start implementing it.

Problem formulation

The goal is to build a recommender system that recommends movies based on the genres, cast, crew, and keywords of a previously watched movie.

Implementation

First, let us import all the necessary libraries that we will be using to make a content-based recommendation system. Let us also import the necessary data files.

The independent data files credits.csv, keywords.csv and movies_metadata.csv are merged into a single movies data frame.
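A minimal sketch of such a merge, assuming the three files share an `id` column and that some ids in the metadata file are stored as strings or are malformed. The tiny frames below are made-up stand-ins for the real CSV files, which would normally be loaded with `pd.read_csv`.

```python
import pandas as pd

# Made-up stand-ins for movies_metadata.csv, credits.csv and keywords.csv.
metadata = pd.DataFrame({
    "id": ["1", "2", "not-an-id"],            # ids as strings, one malformed
    "title": ["Movie A", "Movie B", "Broken row"],
    "genres": ["[{'id': 18, 'name': 'Drama'}]", "[]", "[]"],
})
credits = pd.DataFrame({
    "id": [1, 2],
    "cast": ["[{'name': 'Tom Hanks'}]", "[{'name': 'Tom Cruise'}]"],
    "crew": ["[{'job': 'Director', 'name': 'Jane Doe'}]", "[]"],
})
keywords = pd.DataFrame({
    "id": [1, 2],
    "keywords": ["[{'name': 'amnesia'}]", "[]"],
})

# Coerce ids to integers so the join key has a matching dtype in every frame;
# rows whose id cannot be parsed become NaN and are dropped.
metadata["id"] = pd.to_numeric(metadata["id"], errors="coerce")
metadata = metadata.dropna(subset=["id"]).astype({"id": int})

# Inner joins keep only the movies present in all three files.
movies = metadata.merge(credits, on="id").merge(keywords, on="id")
print(movies.columns.tolist())
```

An inner join is only one option here; a left join would keep movies that lack credits or keywords.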

Now, let us wrangle the data, because it is in a messy format and contains a lot of information we won’t be using in our analysis.

The genres, cast, crew, and keywords columns are all stored with the object (string) datatype. Let us extract the words we need from these columns by first using literal_eval to convert the strings into Python objects (here, lists of dictionaries) and then using pandas and NumPy to wrangle them.
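The parsing step can be sketched as follows. The helper name `parse_names` and the assumption that every dictionary carries a `name` key are illustrative and not taken from the original notebook.

```python
from ast import literal_eval

def parse_names(raw):
    """Turn a stringified list of dicts into a list of their 'name' values."""
    try:
        items = literal_eval(raw)
    except (ValueError, SyntaxError, TypeError):
        return []                      # malformed or missing cell
    if not isinstance(items, list):
        return []
    return [item["name"] for item in items
            if isinstance(item, dict) and "name" in item]

raw_genres = "[{'id': 18, 'name': 'Drama'}, {'id': 53, 'name': 'Thriller'}]"
print(parse_names(raw_genres))         # ['Drama', 'Thriller']
```

With pandas, a helper like this could be applied column by column, for example `movies["genres"].apply(parse_names)`.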

After performing the necessary cleaning, each movie keeps just three genres, three keywords, and the cast names, which is enough for illustration purposes.
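One way to trim the parsed lists is sketched below. The limit of three for every column and the use of a `job` field to pick the director out of the crew are assumptions made for the example; the helper names are hypothetical.

```python
def first_names(items, limit=3):
    """Return the 'name' of the first `limit` entries in a list of dicts."""
    return [item["name"] for item in items[:limit] if "name" in item]

def get_director(crew):
    """Return the first crew member whose job is 'Director', else None."""
    for member in crew:
        if member.get("job") == "Director":
            return member.get("name")
    return None

# Made-up parsed values for a single movie.
genres = [{"name": "Drama"}, {"name": "Thriller"},
          {"name": "Mystery"}, {"name": "Crime"}]
crew = [{"job": "Producer", "name": "John Roe"},
        {"job": "Director", "name": "Jane Doe"}]

print(first_names(genres))   # ['Drama', 'Thriller', 'Mystery']
print(get_director(crew))    # Jane Doe
```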

Now we have the clean data required for building a metadata-based recommender system. We just need to remove the spaces between first names and surnames. If we leave the spaces in, each name is split into separate words, so movies starring Tom Cruise and movies starring Tom Hanks would look alike to the machine simply because they share the first name ‘Tom’. Let us remove the spaces so that Tom Hanks becomes TomHanks and Tom Cruise becomes TomCruise.

Now none of the names in the cast contain spaces, so each name is a single, distinct token.
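A small sketch of this step, using a hypothetical helper that handles both list-valued columns (cast, genres, keywords) and single strings such as a director name:

```python
def remove_spaces(value):
    """Strip spaces inside names so each name becomes a single token."""
    if isinstance(value, list):
        return [name.replace(" ", "") for name in value]
    if isinstance(value, str):
        return value.replace(" ", "")
    return ""                          # missing values become an empty string

print(remove_spaces(["Tom Hanks", "Tom Cruise"]))   # ['TomHanks', 'TomCruise']
print(remove_spaces("Jane Doe"))                    # JaneDoe
```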

Now, let us combine all the metadata into a single column by concatenating the values in the genres, cast, crew, and keywords columns.
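The combined column could be built along these lines. The key names (`keywords`, `cast`, `director`, `genres`) and the order in which they are joined are assumptions for the sketch.

```python
def build_metadata(row):
    """Join keywords, cast, director and genres into one space-separated string."""
    director = [row["director"]] if row.get("director") else []
    parts = row["keywords"] + row["cast"] + director + row["genres"]
    return " ".join(parts)

# A made-up, already cleaned movie record.
movie = {
    "keywords": ["amnesia", "memoryloss"],
    "cast": ["TomHanks"],
    "director": "JaneDoe",
    "genres": ["drama", "thriller"],
}
print(build_metadata(movie))   # amnesia memoryloss TomHanks JaneDoe drama thriller
```

On a data frame the same function could be applied row by row, for example `movies.apply(build_metadata, axis=1)`.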

Due to memory limits in Google Colab, I will use only the first 10,000 movies to build the recommender system. The same code can be used on a larger set.

We will use a CountVectorizer to build numeric features from our metadata. We won’t use TF-IDF here because TF-IDF down-weights words that appear in many documents. Many movies may share the same director, and we definitely don’t want to penalize that director, since a user may well want to be recommended more movies by that director. Most of the words we have are names and genres, whose counts are actually useful for recommending movies.
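The difference can be seen on a toy example. The two metadata strings below are made up, and TfidfVectorizer appears only for contrast; both vectorizers are used with scikit-learn's default settings, which may differ from the original notebook.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Two made-up metadata strings that share one director token (JaneDoe).
docs = [
    "drama TomHanks JaneDoe",
    "thriller TomCruise JaneDoe",
]

count_vectorizer = CountVectorizer()           # lowercases tokens by default
count_matrix = count_vectorizer.fit_transform(docs)
vocab = count_vectorizer.vocabulary_           # token -> column index

# With raw counts, the shared director weighs as much as any other token.
director_count = count_matrix[0, vocab["janedoe"]]
actor_count = count_matrix[0, vocab["tomhanks"]]

# TF-IDF down-weights the token that appears in both rows.
tfidf_vectorizer = TfidfVectorizer()
tfidf_matrix = tfidf_vectorizer.fit_transform(docs)
tfidf_vocab = tfidf_vectorizer.vocabulary_
director_tfidf = tfidf_matrix[0, tfidf_vocab["janedoe"]]
actor_tfidf = tfidf_matrix[0, tfidf_vocab["tomhanks"]]

print(director_count == actor_count)    # True: counts treat both tokens alike
print(director_tfidf < actor_tfidf)     # True: TF-IDF shrinks the shared director
```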

I will use cosine similarity to measure how similar any two movies are. Now let's build a cosine similarity matrix from the CountVectorizer output and then write a recommender function.
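A minimal sketch of both steps on a made-up four-movie catalogue. The function below reuses the name `recommend_movies_based_on_metadata`, but its body and the `top_n` parameter are assumptions about how such a function might look, not the original implementation.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up titles and their combined metadata strings.
titles = ["Movie A", "Movie B", "Movie C", "Movie D"]
metadata = [
    "amnesia TomHanks JaneDoe drama thriller",
    "amnesia TomHanks JaneDoe thriller",
    "heist TomCruise JohnRoe action",
    "wedding AnnLee MaryMajor comedy romance",
]

count_matrix = CountVectorizer().fit_transform(metadata)
cosine_sim = cosine_similarity(count_matrix)        # shape: (n_movies, n_movies)
title_to_index = {title: i for i, title in enumerate(titles)}

def recommend_movies_based_on_metadata(title, top_n=15):
    """Return up to `top_n` titles ranked by cosine similarity to `title`."""
    idx = title_to_index[title]
    ranked = sorted(enumerate(cosine_sim[idx]),
                    key=lambda pair: pair[1], reverse=True)
    # Skip the query movie itself, then keep the best matches.
    return [titles[i] for i, _ in ranked if i != idx][:top_n]

print(recommend_movies_based_on_metadata("Movie A", top_n=1))   # ['Movie B']
```

A plain dictionary keeps only one index per title, so a real catalogue with duplicate titles would need a more careful lookup.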

Now that we have built our recommender function, let's see how it works. Let’s try to get movies similar to ‘Blind Horizon’ using the metadata-based recommender system.

recommend_movies_based_on_metadata('Blind Horizon')

For the input movie ‘Blind Horizon’, the recommender returns 15 movies chosen by comparing its metadata with that of the other movies. Isn’t it amazing?

To learn about the content-based and collaborative-filtering approaches, see my other posts:

  1. Content-based Recommender Systems: https://medium.com/@saketgarodia/content-based-recommender-systems-in-python-2b330e01eb80?
  2. Recommender Systems using Collaborative Filtering: https://medium.com/@saketgarodia/recommendation-system-using-collaborative-filtering-cc310e641fde

Thank you

Keep learning

Saket Garodia
Analytics Vidhya

Senior Data Scientist at 84.51 (Kroger), AI/Data Science, Psychology, economics, books; LinkedIn: https://www.linkedin.com/in/saket-garodia/