LightGCN with PyTorch Geometric

Ada Zhou
Stanford CS224W GraphML Tutorials
Jan 18, 2022

By Hikaru Hotta and Ada Zhou as part of the Stanford CS224W course project

Behind digital services like Netflix, Twitter, and Spotify are recommender systems that predict your interests and influence what you might buy, watch, and read. In this post, we walk through LightGCN, a graph-based collaborative filtering recommender system model, implemented in PyTorch and PyG.

How did you find new movies you might like before the digital age? Maybe you asked friends with similar tastes for recommendations. This is the crux of collaborative filtering, a common approach used by many recommender systems. The underlying assumption of collaborative filtering is that if user A has the same opinion about an item as user B, they are more likely to share the same opinion about a different item as well [4].

For example, say we were to use a collaborative filtering approach to recommend movies to User A.

We know User A likes the movies [X] and [Y]. We also know that three other users, B, C and D, like movies [X] and [Y] and movie [Z]. From this, we might recommend movie [Z] for User A. In collaborative filtering, we use User A’s previous preferences and the preferences of other users for new recommendations, making the recommendation “collaborative” in nature.
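The intuition above can be sketched in a few lines of Python. The movie labels and user preferences here are invented for illustration, and the counting heuristic is a deliberately simplistic stand-in for real collaborative filtering:

```python
# Toy illustration of the collaborative-filtering intuition: recommend to a
# user the movies liked by other users who share tastes with them.
likes = {
    "A": {"X", "Y"},
    "B": {"X", "Y", "Z"},
    "C": {"X", "Y", "Z"},
    "D": {"X", "Y", "Z", "W"},
}

def recommend(user, likes):
    """Rank unseen movies by how many overlapping users liked them."""
    candidates = {}
    for other, movies in likes.items():
        # Skip the query user and users with no shared movies.
        if other == user or not (likes[user] & movies):
            continue
        for movie in movies - likes[user]:
            candidates[movie] = candidates.get(movie, 0) + 1
    return sorted(candidates, key=candidates.get, reverse=True)

print(recommend("A", likes))  # ['Z', 'W'] — Z is liked by all three similar users
```

Movie Z tops the list because all three users who agree with User A liked it.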

Now let’s walk through how we could implement a recommender system for movies using a graph-based collaborative filtering approach. You can also follow along on our Colab Notebook.

Overview Diagram of LightGCN Model

Brief Overview of LightGCN

Standard collaborative filtering methods, such as user-based collaborative filtering with the Nearest Neighbor algorithm, try to find the user whose rating vector is most similar to the query user's [4]. This rating vector is simply the row of the user-item adjacency matrix corresponding to the user. However, this method is not expressive enough to capture the complex interactions between users and items. Learning vector representations of users and items is therefore crucial for modern recommender systems.
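To make the nearest-neighbor baseline concrete, here is a small sketch using cosine similarity between rating vectors. The 4-user by 5-movie rating matrix is invented for this example (0 means unrated):

```python
import torch
import torch.nn.functional as F

# Hypothetical user-item rating matrix; row 0 is the query user.
ratings = torch.tensor([
    [5., 4., 0., 0., 1.],   # query user
    [5., 5., 4., 0., 0.],
    [0., 1., 0., 5., 4.],
    [4., 4., 5., 0., 2.],
])

# Cosine similarity between the query user's rating vector and every other row.
sims = F.cosine_similarity(ratings[0:1], ratings[1:], dim=1)
nearest = int(torch.argmax(sims)) + 1  # +1 to index back into `ratings`
print(nearest)  # 1: user 1's rating vector is most similar to the query user's
```

User 1 is the nearest neighbor, so their highly rated but unseen movies (here, movie 2) would be recommended. Note that this baseline only compares raw rating rows; it learns no representation at all, which is exactly the limitation LightGCN addresses.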

Thus we use LightGCN, a collaborative filtering model introduced by He et al. [1], which not only learns robust representations of user-item interactions but also reduces the computational overhead of other recommendation models by removing unneeded parameters and non-linearities.

LightGCN is a simple yet powerful model derived from Graph Convolutional Networks (GCNs). GCNs are a generalization of CNNs — in an image, each pixel corresponds to a node and adjacent pixels correspond to edges — to graph structures beyond the Euclidean grid structure of images. GCNs are a popular choice for collaborative filtering because of their ability to capture the structural information of a graph. LightGCN tailors GCNs for recommendation by simplifying their design and computational complexity while still capturing the salient structural information of the graph.

In this article, we provide a breakdown of an implementation of LightGCN. First, we show how to read in the dataset and discuss the nuances of splitting graph-based datasets. We then walk through the components of a LightGCN model (i.e., initial embeddings, message and aggregation functions, and the loss function). Finally, we discuss evaluation and metrics for our model.

Building A Recommender System

Setup: Reading In Data

We will utilize the MovieLens Small Dataset, which is used as a benchmark in many recommender system papers [3]. It contains users, movies, and the ratings that users have given to movies.

Our objective is to suggest movies to users that they haven't yet rated. We represent users and movies as nodes, and ratings as edges between them. Since we only want to recommend movies that users might like, we preprocess the data by keeping only ratings (and thus edges) of 4 or higher on the original 0.5–5 scale. Our training objective then boils down to an edge prediction task, in which we predict whether a valid edge exists between two nodes. We set this up as a homogeneous graph, with no distinction between user and movie nodes in the graph structure. We can treat users and movies as the same type of node because the graph is bipartite, with edges existing only between users and movies. The adjacency matrix of this homogeneous graph representation is sparse, as shown in the figure.
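The preprocessing described above can be sketched as follows. The toy rows stand in for MovieLens' `ratings.csv`, and the user/movie counts are invented for the example; the key steps are the rating threshold and offsetting movie IDs into a single homogeneous node ID space:

```python
import torch

# Each row: (user_id, movie_id, rating). Toy data standing in for ratings.csv.
raw = [(0, 0, 5.0), (0, 1, 3.5), (1, 0, 4.0), (1, 2, 2.0), (2, 2, 4.5)]

# Keep only edges with rating >= 4, i.e. "the user liked this movie".
edges = [(u, m) for u, m, r in raw if r >= 4.0]

num_users = 3
# Offset movie ids by num_users so users and movies share one node id space.
edge_index = torch.tensor([[u for u, m in edges],
                           [num_users + m for u, m in edges]])
print(edge_index)
# tensor([[0, 1, 2],
#         [3, 3, 5]])
```

Rows of `edge_index` are source (user) and destination (movie) node IDs; the two ratings below 4 were dropped.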

Representations of our input data: (a) dataset; (b) chosen graph structure; (c) matrix representation

We represent our adjacency matrix as a PyG SparseTensor, since the vast majority of elements in our adjacency matrix are zero. This representation vastly reduces our memory overhead.
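PyG's `SparseTensor` comes from the `torch_sparse` package; the sketch below uses PyTorch's built-in sparse COO format instead to illustrate the same memory-saving idea, reusing the toy `edge_index` from the preprocessing step:

```python
import torch

# Toy bipartite graph: 3 users (nodes 0-2) and 3 movies (nodes 3-5).
edge_index = torch.tensor([[0, 1, 2], [3, 3, 5]])
num_nodes = 6

# A dense num_nodes x num_nodes adjacency matrix would store 36 entries,
# almost all of them zero. The sparse format stores only the nonzeros.
values = torch.ones(edge_index.size(1))
adj = torch.sparse_coo_tensor(edge_index, values, (num_nodes, num_nodes))

nnz = adj.coalesce().indices().size(1)
print(nnz, num_nodes * num_nodes)  # 3 36
```

On the real MovieLens graph, with roughly ten thousand nodes and only ~50k liked edges, the gap between stored nonzeros and dense entries is several orders of magnitude.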

Setup: Splitting Data

With our graph representation determined, we need to split our data into training, validation, and test sets. Since we are working on an edge prediction task, we choose to split our input graph transductively by edges. Each split therefore sees only a subset of the edges, while the underlying set of nodes remains the same across all splits.
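A minimal sketch of such a transductive edge split, assuming illustrative 80/10/10 ratios (the helper and its arguments are our own, not a PyG API):

```python
import torch

def split_edges(edge_index, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle edges and partition them into train/val/test subsets.

    All nodes appear in every split; only edge visibility differs.
    """
    num_edges = edge_index.size(1)
    gen = torch.Generator().manual_seed(seed)
    perm = torch.randperm(num_edges, generator=gen)
    n_train = int(train_frac * num_edges)
    n_val = int(val_frac * num_edges)
    return (edge_index[:, perm[:n_train]],
            edge_index[:, perm[n_train:n_train + n_val]],
            edge_index[:, perm[n_train + n_val:]])

edge_index = torch.arange(20).view(2, 10)  # 10 dummy edges
train, val, test = split_edges(edge_index)
print(train.size(1), val.size(1), test.size(1))  # 8 1 1
```

Because the split is over edges rather than nodes, the model can still propagate messages over the full node set during training while being evaluated only on held-out edges.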

Input Graph Split for Training Set and Validation/Evaluation Set

LightGCN: Initialization

Now that we have our graph processed, we will build our LightGCN model. The only trainable parameters in LightGCN are the initial node embeddings (E⁰), i.e., the 0-th-layer embedding vectors for the users and movies. We use a shallow embedding representation in which the encoder is simply an embedding lookup.

Diagram of how our input graph nodes get encoded into the embedding space with a shallow embedding

The PyTorch module nn.Embedding, as seen below, assigns each node an embedding vector, giving us an embedding matrix Z in which each row is the d-dimensional embedding vector for a node in our graph.
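A sketch of this shallow-embedding setup. The node counts match the MovieLens Small dataset approximately, and the embedding dimension and initialization scale are illustrative hyperparameters:

```python
import torch
from torch import nn

num_users, num_movies = 610, 9742   # approximate MovieLens Small sizes
embedding_dim = 64                  # d, a hyperparameter

# One embedding table over all nodes: rows 0..num_users-1 are users, the
# remaining rows are movies. These are LightGCN's only trainable weights.
embedding = nn.Embedding(num_users + num_movies, embedding_dim)
nn.init.normal_(embedding.weight, std=0.1)

node_ids = torch.tensor([0, 1, num_users])  # two users and the first movie
z = embedding(node_ids)                     # the "encoder" is just a lookup
print(z.shape)  # torch.Size([3, 64])
```

There is no encoder network at all: looking up a row of the table is the entire encoding step, which is what makes the representation "shallow".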

LightGCN: Message and Aggregation

The message and aggregation functions of a GNN propagate information at each layer. Neural Graph Collaborative Filtering (NGCF), the predecessor of LightGCN, follows an embedding propagation function similar to that of Graph Convolutional Networks, defined below.

LightGCN simplifies the above propagation rule by removing the non-linearity, the feature transformation matrices, and the skip connection, as shown below. The intuition is that since the inputs to collaborative filtering are merely node IDs (without rich features), performing multiple non-linear transformations does not contribute to better performance [1].

In matrix form, this propagation rule is mathematically equivalent to E⁽ᵏ⁺¹⁾ = (D^(−1/2) A D^(−1/2)) E⁽ᵏ⁾, where A is the adjacency matrix and D is its diagonal degree matrix.

The removal of the nonlinearities significantly simplifies the model and allows embeddings to be diffused along the graph.

LightGCN layer propagation

After K layers of this simple diffusion propagation, the final embedding of each node is a weighted average of its embeddings at every layer. This layer combination approximates the effect of self-loops, since the final embedding is informed by the node's own embedding as well as its neighbors', and it also helps prevent over-smoothing.
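The full propagation can be sketched in a few lines. The helper below is our own minimal version, not the PyG `LightGCN` class; it uses a uniform layer weight αₖ = 1/(K+1) as in the paper, and a hand-normalized 2-node toy graph:

```python
import torch

def lightgcn_propagate(adj_norm, emb0, K=3):
    """K rounds of sparse diffusion, then a uniform average over all layers."""
    embs = [emb0]
    for _ in range(K):
        # Each layer is just sparse matrix multiplication: E^(k+1) = A_norm E^(k)
        embs.append(torch.sparse.mm(adj_norm, embs[-1]))
    return torch.stack(embs).mean(dim=0)  # final E = (1/(K+1)) * sum_k E^(k)

# Tiny graph: two nodes joined by one edge. Both degrees are 1, so the
# symmetrically normalized adjacency equals the raw adjacency here.
edge_index = torch.tensor([[0, 1], [1, 0]])
adj_norm = torch.sparse_coo_tensor(edge_index, torch.ones(2), (2, 2))
emb0 = torch.eye(2)  # identity start makes the diffusion easy to trace

final = lightgcn_propagate(adj_norm, emb0, K=2)
print(final)  # tensor([[0.6667, 0.3333], [0.3333, 0.6667]])
```

Tracing it by hand: E⁰ = I, E¹ = A, E² = A² = I, so the average is ⅔ self plus ⅓ neighbor for each node, showing how the layer average blends a node's own signal with its neighborhood.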

We utilize PyG’s gcn_norm function to generate the symmetrically normalized adjacency matrix in our implementation of LightGCN. The GCN normalization of the adjacency matrix scales the “importance” of each neighbor to a node based on how niche the neighbor is.

Weighted average of embeddings at every layer for final embedding

LightGCN: Loss Function & Negative Sampling

Decoding final node embedding to a score for each pair of user and movie

The score function for edge prediction takes the dot product of the final embeddings of users and movies. We want to encourage the model to predict a high score between users and movies that they have rated highly and to predict a low score between users and movies that they have not rated highly.

To do this we use a Bayesian Personalized Ranking (BPR) loss:

For each user, the BPR loss encourages the model to predict a higher score for a positive sample than for a negative sample. Read more about BPR in the original paper [2].

We use PyG’s structured_negative_sampling function to conduct negative sampling which we use to compute BPR loss along with the positive samples:

Our model’s BPR Loss plot for train and validation sets

Evaluation Metrics

To evaluate our recommender system, we take the dot product of our final user and movie embeddings to obtain the scores of every user-movie pair. We take the top K scoring movies for each user that were not in the training set and quantify the following metrics: (1) recall; (2) precision; and (3) Normalized Discounted Cumulative Gain (NDCG). NDCG@K is calculated as DCG@K divided by IDCG@K, where DCG@K = Σᵢ relᵢ / log₂(i + 1) sums over the top K ranks and IDCG@K is the DCG of the ideal ordering.

NDCG is a common metric used to evaluate recommendation systems. While precision and recall are order invariant, NDCG takes into account the ordered ranking of the recommendations. NDCG takes the ratio of the Discounted Cumulative Gain (DCG) of the predicted recommended order over the ideal order. DCG essentially calculates the sum of all the scores by weighing the scores by the ranking of the movie for the user. Read more about NDCG here.
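The DCG/NDCG computation can be sketched directly from that definition. The binary relevance lists below are invented examples (1 = the recommended movie was actually relevant):

```python
import math

def dcg_at_k(relevance, k):
    """Sum of relevance scores discounted by log2 of the (1-indexed) rank."""
    return sum(rel / math.log2(rank + 2)          # ranks are 0-indexed here
               for rank, rel in enumerate(relevance[:k]))

def ndcg_at_k(relevance, k):
    """DCG of the predicted order divided by DCG of the ideal order."""
    ideal_dcg = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Relevant items ranked first: perfect score.
print(ndcg_at_k([1, 1, 0, 0], k=4))  # 1.0
# Same two hits, but pushed down the list: NDCG drops (~0.65).
print(ndcg_at_k([0, 1, 0, 1], k=4))
```

The second call has identical precision and recall to the first, but a lower NDCG, which is precisely the order sensitivity described above.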

Our model metrics:
+------------+-----------+--------------+---------+
| Split | Recall@20 | Precision@20 | NDCG@20 |
+------------+-----------+--------------+---------+
| Train | 0.12159 | 0.24097 | 0.28055 |
| Validation | 0.15031 | 0.04602 | 0.10599 |
| Test | 0.12543 | 0.04629 | 0.10125 |
+------------+-----------+--------------+---------+

Conclusion

Congratulations! You have now built a LightGCN recommender system for movies. Use our Colab to see what your model recommends for random users in the graph:

Here are some movies that user 1 rated highly:
title: Forrest Gump (1994), genres: Comedy|Drama|Romance|War
title: Matrix, The (1999), genres: Action|Sci-Fi|Thriller
title: Silence of the Lambs, The (1991), genres: Crime|Horror|Thriller
title: Star Wars: Episode IV - A New Hope (1977), genres: Action|Adventure|Sci-Fi
title: Fight Club (1999), genres: Action|Crime|Drama|Thriller
title: Schindler's List (1993), genres: Drama|War
title: Star Wars: Episode V - The Empire Strikes Back (1980), genres: Action|Adventure|Sci-Fi
title: Braveheart (1995), genres: Action|Drama|War
title: American Beauty (1999), genres: Drama|Romance
title: Usual Suspects, The (1995), genres: Crime|Mystery|Thriller

Here are some suggested movies for user 1:
title: Shawshank Redemption, The (1994), genres: Crime|Drama
title: Pulp Fiction (1994), genres: Comedy|Crime|Drama|Thriller
title: Godfather, The (1972), genres: Crime|Drama
title: Terminator 2: Judgment Day (1991), genres: Action|Sci-Fi
title: Lord of the Rings: The Return of the King, The (2003), genres: Action|Adventure|Drama|Fantasy
title: Lord of the Rings: The Fellowship of the Ring, The (2001), genres: Adventure|Fantasy
title: Lord of the Rings: The Two Towers, The (2002), genres: Adventure|Fantasy
title: Apollo 13 (1995), genres: Adventure|Drama|IMAX
title: Aladdin (1992), genres: Adventure|Animation|Children|Comedy|Musical
title: Sixth Sense, The (1999), genres: Drama|Horror|Mystery

The above may have highlighted an issue with the collaborative filtering approach: how would you recommend movies to new users, or recommend newly added movies to existing users? This is referred to as the cold start problem. It is something to look into and explore as you continue to learn about recommender systems and models beyond LightGCN!

References

[1] X. He, K. Deng, X. Wang, Y. Li, Y. Zhang, M. Wang, LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation (2020), arXiv:2002.02126.

[2] S. Rendle et al., BPR: Bayesian Personalized Ranking from Implicit Feedback (2009), arXiv:1205.2618.

[3] F. M. Harper and J. A. Konstan, The MovieLens Datasets: History and Context (2015), ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4: 19:1–19:19.

[4] S. Luo, Introduction to Recommender System (2018), Towards Data Science.

[5] J. Leskovec, GNNs for Recommender Systems (2021), CS224W: Machine Learning with Graphs.

[6] X. Wang, X. He, M. Wang, F. Feng, T. Chua, Neural Graph Collaborative Filtering (2019), arXiv:1905.08108.

[7] P. Chandekar, Evaluate your Recommendation Engine using NDCG (2020), Towards Data Science.
