Graph Neural Network based Movie Recommender System

Tamir
Stanford CS224W GraphML Tutorials
6 min read · Feb 9, 2022

Movie recommendation is an important problem: services such as Netflix and Amazon Prime Video rely on recommender systems to surface content to their users. There have been numerous efforts to build recommender systems using Collaborative Filtering, embedding-based Neural Networks, Neural Collaborative Filtering, etc. This post introduces a Graph Neural Network (GNN) based recommender system. Specifically, we will focus on Inductive Matrix Completion Based on GNNs.

The full code for this post can be found in Google Colab.

1. Exploratory Data Analysis

I’ve chosen the MovieLens dataset for this task, as it contains a rich amount of user-movie rating data. The goal is to predict the next movie recommendation using past movie-watching and rating information. We will work with the MovieLens dataset containing 100k ratings, as it has a manageable size to train on a single GPU. The dataset contains ratings generated by 610 users for 9,724 movies. The ratings range from 0.5 to 5 stars, in 0.5-star increments.

Let’s start by downloading the data from the MovieLens website, loading data into Python, and merging the rating data with movie metadata.
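A sketch of the loading-and-merging step is shown below. The tiny inline frames stand in for `ratings.csv` and `movies.csv` from the `ml-latest-small` archive, which would normally be read with `pd.read_csv`:

```python
import pandas as pd

# Tiny stand-ins for ratings.csv and movies.csv; with the real dataset
# these would be: pd.read_csv("ml-latest-small/ratings.csv"), etc.
ratings = pd.DataFrame({
    "userId": [1, 1, 2],
    "movieId": [1, 3, 1],
    "rating": [4.0, 5.0, 3.5],
})
movies = pd.DataFrame({
    "movieId": [1, 3],
    "title": ["Toy Story (1995)", "Grumpier Old Men (1995)"],
})

# Merge rating data with movie metadata on movieId.
df = ratings.merge(movies, on="movieId", how="left")
print(df.shape)  # (3, 4)
```

With `how="left"`, every rating row is kept even if a movie is missing from the metadata file.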

Movielens Dataset

Next, let’s look at the distribution of the rating data:

Movielens 100k Dataset Rating Distribution

The most popular rating is 4 stars, followed by 3 stars. Surprisingly, not many people assign ratings below 2 stars.

First, we need to convert our data to a rating matrix R, where each row represents a user u and each column represents a movie v. An edge between u and v represents the rating r = Rᵤᵥ that user u gave to movie v. This is a bipartite graph, meaning there are no edges between two users or between two items.

The transformation could be done conveniently with one line of code using Pandas:
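For instance, with `pivot_table` (again on a tiny stand-in for the merged frame):

```python
import pandas as pd

# Small stand-in for the merged ratings frame.
df = pd.DataFrame({
    "userId": [1, 1, 2],
    "movieId": [1, 3, 1],
    "rating": [4.0, 5.0, 3.5],
})

# Rows = users, columns = movies, values = ratings; NaN where unrated.
R = df.pivot_table(index="userId", columns="movieId", values="rating")
```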

I’ve selected a few active users and their ratings for the popular movies. NaN is shown if a user never rated a given movie.

Bipartite graph

The matrix R only contains past events, i.e., users who have already watched and rated particular movies. The goal is to complete the matrix R by predicting the ratings users would have given to the movies had they watched them. The highest-rated predicted movies could then be recommended to users.

2. Data Preprocessing

To predict the user’s movie preferences, we will use a novel Inductive Graph Based Matrix Completion (IGMC) framework introduced at the ICLR 2020 conference.

h-hop neighborhood

In the first part of the IGMC framework, we extract an enclosing subgraph from the given graph G. For a given rating between user u and item v, we extract an h-hop enclosing subgraph around these two nodes. For example, 1-hop neighbors are the immediate neighbors of a given node, while 2-hop includes neighbors of neighbors. In general, the h-hop neighborhood contains the nodes that can be reached in at most h steps from the target node.

Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, http://cs224w.stanford.edu

At training time, we remove the edge (u, v) because it is what we need to predict; we then feed the obtained subgraph to a GNN model and regress on the rating value Rᵤᵥ. At test time, we use the obtained subgraphs to predict the value Rᵤᵥ directly. I won’t go into the basics of GNNs, however, you can read about them here.
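To make the extraction step concrete, here is a minimal pure-Python sketch of a 1-hop enclosing subgraph around a target (u, v) pair, with the target edge removed. The function name and the dict-of-dicts rating format are illustrative, not the authors' implementation:

```python
def enclosing_subgraph_1hop(ratings, u, v):
    """ratings: {user: {item: rating}} adjacency of the bipartite graph."""
    # 1-hop item side: the target item plus every item rated by u.
    items = {v} | set(ratings.get(u, {}))
    # 1-hop user side: the target user plus every user who rated v.
    users = {u} | {w for w, rated in ratings.items() if v in rated}
    # All rating edges inside the subgraph, excluding the target edge (u, v).
    edges = [(w, i, ratings[w][i])
             for w in users for i in items
             if i in ratings.get(w, {}) and (w, i) != (u, v)]
    return users, items, edges
```

For h > 1, the same expansion would be repeated h times, alternating between the user and item sides.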

The second part of the IGMC framework is node labeling. The algorithm needs to store some information to understand which nodes are the target user and target item, and to differentiate between users and items in the enclosing subgraphs. The paper proposes assigning 0 to the target user and 1 to the target item. Users first encountered at the iᵗʰ hop then get the value 2i, whereas items get 2i + 1. This approach differentiates users (even values) from items (odd values), as well as users and items encountered at different hop distances (a smaller value means closer to the target user-item pair).

Then we one-hot encode these node labels and feed them to the model as node features. Since we assign labels at the level of the local enclosing subgraph, this approach generalizes well to new nodes, unlike a global node-id approach.
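The labeling scheme above can be sketched in a few lines of PyTorch; `igmc_node_labels` is an illustrative helper, not the paper's code:

```python
import torch
import torch.nn.functional as F

def igmc_node_labels(user_hops, item_hops, max_hop=1):
    # IGMC labeling: target user (hop 0) -> 0, target item (hop 0) -> 1,
    # users at hop i -> 2i, items at hop i -> 2i + 1.
    labels = [2 * h for h in user_hops] + [2 * h + 1 for h in item_hops]
    # One-hot encode the labels to use as initial node features.
    return F.one_hot(torch.tensor(labels), num_classes=2 * (max_hop + 1))
```

For a 1-hop subgraph with one user and one item at each hop, this yields a 4-dimensional one-hot feature per node.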

The code for the preprocessing part is borrowed from the IGMC paper authors’ official GitHub code: specifically, the data loading and preprocessing and the enclosing subgraph extraction.

3. Model Architecture

The IGMC architecture consists of the message passing layer and pooling steps.

First, we define an optional graph-level dropout layer. It randomly drops edges from the graph, helping avoid overfitting and making the model more robust.
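PyTorch Geometric ships a utility for this; a plain-PyTorch sketch of the idea (the function name is illustrative) looks like:

```python
import torch

def dropout_edges(edge_index, p=0.2, training=True):
    # edge_index: 2 x E tensor of (source, target) node index pairs.
    # During training, keep each edge independently with probability 1 - p.
    if not training or p == 0.0:
        return edge_index
    keep = torch.rand(edge_index.size(1)) >= p
    return edge_index[:, keep]
```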

Next, the message passing layers extract node information for each node in the subgraph. This can be implemented using R-GCN to handle the different edge types (one per rating level). We stack L message passing layers with tanh non-linearities between them, then concatenate the node representations from each layer into the final node representation h. Next, we pool a graph-level feature vector g by concatenating the target user and target item representations. Finally, we add two linear layers, with ReLU and Dropout in between. All the model parameters were chosen following the IGMC paper.
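The overall shape of the model can be sketched in plain PyTorch. Note the simplifications: a dense adjacency matmul stands in for R-GCN's relation-specific message passing, and all names and sizes here are illustrative rather than the paper's exact configuration:

```python
import torch
import torch.nn as nn

class IGMCSketch(nn.Module):
    """Simplified IGMC-style model: L message-passing layers with tanh,
    layer-wise concatenation, target-pair pooling, then a 2-layer MLP."""

    def __init__(self, in_dim=4, hid=32, num_layers=4):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Linear(in_dim if i == 0 else hid, hid) for i in range(num_layers)]
        )
        self.lin1 = nn.Linear(2 * num_layers * hid, 128)
        self.drop = nn.Dropout(0.5)
        self.lin2 = nn.Linear(128, 1)

    def forward(self, x, adj, u_idx, v_idx):
        hs = []
        for conv in self.convs:
            x = torch.tanh(conv(adj @ x))    # aggregate neighbors, transform
            hs.append(x)
        h = torch.cat(hs, dim=1)             # concat states from every layer
        g = torch.cat([h[u_idx], h[v_idx]])  # pool: target user ++ target item
        return self.lin2(self.drop(torch.relu(self.lin1(g))))  # rating score
```

A real implementation would replace the `adj @ x` aggregation with `RGCNConv` layers over sparse edge indices, one relation per rating level.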

4. Training

The next part is to train IGMC.

First, we load the training and test data using DataLoader in PyTorch with a batch size of 50.

We use the Adam optimizer for this task, as it is an excellent default choice. We start from a learning rate of 0.001 and decrease it by a factor of 10 every 20 epochs.
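The optimizer and schedule can be set up as follows (the `Linear` layer is just a placeholder for the IGMC model):

```python
import torch

model = torch.nn.Linear(4, 1)  # placeholder for the IGMC model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Multiply the learning rate by 0.1 every 20 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)

for epoch in range(40):
    # ... training steps for this epoch would go here ...
    optimizer.step()
    scheduler.step()  # advance the learning-rate schedule once per epoch
```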

Here we use a Mean Squared Error (MSE) loss between the predicted and actual ratings and train the model for 80 epochs. Then we evaluate the model’s performance on the test set.
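A sketch of one training epoch follows. For brevity it iterates over generic `(x, y)` batches rather than PyTorch Geometric graph batches, and the helper name is illustrative:

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

def train_epoch(model, loader, optimizer):
    # One pass over the data, regressing on the rating with MSE loss.
    model.train()
    total, n = 0.0, 0
    for x, y in loader:
        optimizer.zero_grad()
        loss = F.mse_loss(model(x).squeeze(-1), y)
        loss.backward()
        optimizer.step()
        total += loss.item() * y.size(0)  # accumulate per-sample loss
        n += y.size(0)
    return total / n                      # mean loss over the epoch
```

Repeating this for 80 epochs (stepping the learning-rate scheduler once per epoch) and taking the square root of the test MSE gives the RMSE reported below.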

I’ve reached an RMSE of 0.943, which is a good result and in the ballpark of the RMSE of 0.905 quoted in the IGMC paper.

5. Model Evaluation

Now the fun part!

I’ve decided to rate several movies, input my data to the model and see the top recommendations. Here is my input data:

  • Toy Story - 4.5
  • Seven (a.k.a. Se7en) - 3.5
  • Logan - 5.0
  • John Wick: Chapter Two - 4.0
  • Split - 5.0

The model’s top recommendations came as:

  • The Lord of the Rings: The Return of the King
  • The Lord of the Rings: The Fellowship of the Ring
  • Léon: The Professional (a.k.a. The Professional)
  • Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark)
  • Star Wars: Episode IV — A New Hope
  • The Dark Knight
  • The Matrix
  • The Lord of the Rings: The Two Towers
  • Star Wars: Episode V — The Empire Strikes Back
  • Inception

In general, I like the recommendations! The Lord of the Rings, Star Wars, and Matrix movies are among my favorite movies as a kid. I like Inception and Dark Knight too. I haven’t watched the Indiana Jones and Leon movies, so I guess those would be on my watchlist over the weekend.

6. Conclusion

I’ve chosen this model because it has shown good performance on recommender system tasks. The benefit of this approach over previously used methods is that it does not rely on any side information; instead, it uses the graph structure alone to make predictions. The IGMC model is inductive, hence it generalizes well to new data points, unlike transductive methods, which do not generalize to movies unseen in the training data. The model needs to be inductive for movie prediction, as many new movies are released each year!
