Building a Recommender System Using Graph Neural Networks

Jérémi DeBlois-Beaucage
Decathlon Digital
Mar 31, 2021

This post covers a research project conducted with Decathlon Canada on recommendation using Graph Neural Networks. The Python code is available on GitHub, and the subject was also covered in a 40-minute presentation with Q&A, available on YouTube.

Image source: decathlon.ca

Graph Neural Networks (GNNs) have soared in popularity in recent years. From academic papers to concrete implementations, researchers have steadily pushed forward the understanding of GNNs.

One of the popular tasks tackled with this methodology is recommendation. Recommender systems are tools for finding relevant information among ever-increasing options, and they have become widespread in the digital world.

This article covers the whole process of building a recommender system using GNNs, from getting the data to tuning the hyperparameters. We will be following the case of Decathlon Canada, a general sporting goods store with e-commerce platforms.

Table of contents

  1. Defining the task
  2. Getting the data
  3. Building the graph
  4. Designing the model
  5. Training the model
  6. Seasonality
  7. Hyperparameter tuning
  8. Results, Next avenues and Conclusion

Defining the task

Recommend items of interest to users

Recommendation has gathered lots of attention in the last few years, notably through the efforts of giants such as Amazon and Netflix. Users are suggested items, movies, or other content considered likely to interest them. Those recommendations are usually based on user or item characteristics, or on the users’ previous clicks, purchases and interactions.

Example of recommendation: Movies considered as “Top Picks” for the user

Recommendation is a complex task; consumers are multi-faceted and preferences can be hard to predict. At Decathlon Canada, recommendations are served to members at multiple locations, notably on the website and in emails.

Example of item recommendations for a user interested in weightlifting. (Reconstructed, not actual screenshot of website)

Previous models tried

To tackle the task, the artificial intelligence team at Decathlon Canada tried numerous models, including matrix factorization, RNNs and nearest-neighbour models, as presented in two previous articles (here and here).

After experimenting with these common models, the AI team became interested in Graph Neural Networks, notably for their capacity to model complex relationships.

Why GNNs

The basic intuition is the following: the available data might be better represented in a graph. GNNs can leverage both content information (user, item features) as well as graph structure (user-item interaction, relations with sports) [1], whereas typically, traditional models can leverage only one of the two.

Our approach

In simple terms, we propose an encoder-decoder, or representational learning, approach. It can be divided into two steps:

  • Generate high quality embeddings for all users and items (more info on embeddings here)
  • For all users, predict item preferences using the embeddings

Generating embeddings is done through information propagation, also called neural message passing. Predicting preferences is done through simple cosine similarity. Using this approach, we reached promising results and propose further adjustments to enhance the model.

To start off, let’s look at the available data.

Getting the data

Three main datasets are available and relevant for the task.

User-item interactions

A user-item interaction can be either a user clicking on an item or a user buying an item.

User-item interactions. (Data was modified to protect confidentiality)

Item features

Item features include whether the item is intended for men, women, or juniors, the item’s label and the various groups related to it, and the sports it is linked to.

Item features. (Data was modified to protect confidentiality)

User features

User features only include sports declared as favorites. In the case of Decathlon, user features are limited. However, features such as age, gender and geolocation are often available in typical e-commerce settings.

User features. (Data was modified to protect confidentiality)

Building the graph

A graph can be defined as a set of nodes (or vertices) and edges (or links) that connect the nodes.

Graphs are common in the real world (e.g. social networks, molecules).

Examples of graphs.

We use the available data to build the Decathlon graph. In our case, the nodes are users, items and sports. The edges are click, purchase, favorite, related and belonging, as shown in the example graph below.

Simplified Decathlon graph: 3 types of nodes, with 5 types of edges.

For example, a user will be linked to items they purchase, to items they click on and to their favorite sports.
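As an illustration, a graph like this can be assembled with the Deep Graph Library (DGL), the framework used for modelling later in this post. The ID tensors below are toy placeholders, and the exact node pairs chosen for the “related” and “belonging” edge types are assumptions:

import dgl
import torch

# Toy edge lists; real IDs would come from the interaction and feature tables
graph_data = {
    ('user', 'clicks', 'item'):      (torch.tensor([0, 1]), torch.tensor([0, 1])),
    ('user', 'purchases', 'item'):   (torch.tensor([0]),    torch.tensor([1])),
    ('user', 'favorite', 'sport'):   (torch.tensor([1]),    torch.tensor([0])),
    ('item', 'related-to', 'sport'): (torch.tensor([0, 1]), torch.tensor([0, 0])),
    ('item', 'belongs-to', 'sport'): (torch.tensor([0]),    torch.tensor([1])),
}
g = dgl.heterograph(graph_data)
print(g)  # 3 node types (user, item, sport), 5 edge types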

Designing the model: embedding generation

In simple terms, the embedding generation model consists of a chosen number of stacked GNN layers. Each layer performs an exchange of information between immediate neighbors in the graph. The number of layers defines “how far” information is propagated: in a 2-layer GNN, for example, a node receives information from its immediate neighbors and from its neighbors’ neighbors.

Here is the pseudo-code, for a given node:

  1. Fetch incoming messages from all neighbors
  2. Reduce all those messages into a single message by mean aggregation
  3. Multiply the neighborhood message by a learnable weight matrix
  4. Multiply the node’s own message by another learnable weight matrix
  5. Sum the results of steps 3 and 4
  6. Pass the sum through a ReLU activation function
  7. Repeat for the chosen number of layers. The result is the output of the last layer.

Here is a visual overview of the process. For simplicity’s sake, the graph shown only includes users, items and 2 edge types.

Visual overview of message passing. Inspired by Graph neural networks: Variations and applications

Mathematically, the process can be defined as the following:
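$$h_v^{(k)} = \mathrm{ReLU}\Big(W_{self}^{(k)}\, h_v^{(k-1)} + W_{neigh}^{(k)} \sum_{u \in \mathcal{N}(v)} h_u^{(k-1)}\Big)$$

Here $h_v^{(k)}$ is the embedding of node $v$ after layer $k$, $\mathcal{N}(v)$ is the set of $v$’s neighbors, and $W_{self}^{(k)}$, $W_{neigh}^{(k)}$ are the learnable weight matrices of layer $k$.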

(Note that aggregation here is presented as a sum, but other options are available, notably mean, max and min.)

We use Deep Graph Library to build the model, with PyTorch as the backend framework. The code for a single layer of message passing can be simplified to this:

import torch.nn as nn
import torch.nn.functional as F
import dgl.function as fn

class ConvLayer(nn.Module):
    def __init__(self, in_feats, in_neigh_feats, out_feats):
        super().__init__()
        # Learnable weight matrices for the node itself and for its neighborhood
        self.fc_self = nn.Linear(in_feats, out_feats)
        self.fc_neigh = nn.Linear(in_neigh_feats, out_feats)

    def forward(self, graph, x):
        h_neigh, h_self = x
        graph.srcdata['h'] = h_neigh
        # Message passing: copy each neighbor's features, then mean-aggregate per node
        graph.update_all(fn.copy_src('h', 'm'), fn.mean('m', 'neigh'))
        h_neigh = graph.dstdata['neigh']
        # Combine the self and neighborhood representations, then apply ReLU
        z = self.fc_self(h_self) + self.fc_neigh(h_neigh)
        return F.relu(z)

(Reminder: the project code is available on GitHub.)
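Since the Decathlon graph is heterogeneous, a layer like this is typically instantiated once per edge type and combined. One plausible way to wire it (an illustration, not necessarily the project’s exact code) is DGL’s HeteroGraphConv wrapper, which calls each per-edge-type module with the (neighbor, self) feature pair that ConvLayer expects:

import dgl.nn.pytorch as dglnn

# Reusing the heterograph g from the graph-building sketch; feature sizes are assumptions
in_feats, in_neigh_feats, out_feats = 128, 128, 128
conv = dglnn.HeteroGraphConv(
    {etype: ConvLayer(in_feats, in_neigh_feats, out_feats) for etype in g.etypes},
    aggregate='sum')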

Designing the model: scoring preferences

The generated embeddings are used to predict the probability that a connection between two nodes exists. The predicted probability of interaction between a user u and an item v is given by the following equation, where f( · ) is a cosine similarity function:
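$$\hat{y}_{uv} = f(h_u, h_v) = \frac{h_u \cdot h_v}{\lVert h_u \rVert\, \lVert h_v \rVert}$$

where $h_u$ and $h_v$ are the final embeddings of user $u$ and item $v$.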

For each user, we compute scores for all items that they did not interact with. The highest scored items are recommended.
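In code, this scoring step might look like the following sketch (function and variable names are illustrative, not taken from the project):

import torch
import torch.nn.functional as F

def recommend_top_k(user_emb: torch.Tensor, item_embs: torch.Tensor, k: int = 10):
    # Cosine similarity between one user embedding and every item embedding
    scores = F.cosine_similarity(user_emb.unsqueeze(0), item_embs, dim=1)
    # Items the user already interacted with would be masked out beforehand
    return torch.topk(scores, k).indices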

Training the model

The parameters optimized during training are the learnable weight matrices, i.e. the W matrices in the embedding generation equation. To train the model, a max-margin loss function is used, with negative sampling. Only click and purchase edges are included in the training set.

In simple terms, the model has to predict a higher score for a “positive edge” than for random “negative edges”.

Positive edges are the actual edges between users and items they interacted with, whereas negative edges are edges between a user and random items that they did not interact with.
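A standard form of this max-margin loss with negative sampling, following [1] (a sketch, not necessarily the exact variant implemented), is:

$$\mathcal{L} = \sum_{(u,v) \in E^{+}} \; \mathbb{E}_{v_n \sim P_n(u)} \, \max\!\big(0,\; f(h_u, h_{v_n}) - f(h_u, h_v) + \Delta\big)$$

where $E^{+}$ is the set of positive edges, $P_n(u)$ the negative-sampling distribution for user $u$, and $\Delta$ the margin hyperparameter discussed in the tuning section. A minimal PyTorch sketch (shapes and names are illustrative):

import torch

def max_margin_loss(pos_scores, neg_scores, delta):
    # pos_scores: (B,) cosine scores of true user-item edges
    # neg_scores: (B, n_neg) scores of sampled negative edges for the same users
    return torch.clamp(neg_scores - pos_scores.unsqueeze(1) + delta, min=0).mean()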

Main concern: Seasonality

A particularity of the Decathlon case concerns seasonality. E-commerce platforms like Decathlon’s thrive on seasonality: chances are that no one will buy beach tennis rackets when snow gloves are popular, and vice versa. Seasonality accounts for much of the variation in customers’ interests.

The leveraging of seasonality represents one of the contributions of this research. Two main adjustments are put forward.

First, the model is trained on pairs of users and items, yet not all pairs are used: the model is only trained on recent transactions. The model thus learns to give high scores only to products that were bought recently, which we can safely assume are adequate for the season. The other pairs are still kept in the graph, allowing the model to extract as much information as possible from previous interactions while only being optimized for seasonal products.

Second, we give greater importance to recent interactions when computing the loss. The loss function presented in the previous section is adjusted so that the loss for a given interaction is inversely proportional to the number of days that have passed since that interaction occurred.
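A minimal sketch of such a weighting, assuming a simple inverse relation to the interaction’s age in days (the exact formula may differ from the project’s):

import torch

def recency_weight(days_since_interaction: torch.Tensor, eps: float = 1.0):
    # Older interactions contribute less to the loss; eps avoids division by zero
    return 1.0 / (days_since_interaction + eps)

# Usage: loss = (recency_weight(days) * per_edge_losses).mean()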

Hyperparameter tuning

We conducted tests for multiple design choices. In our specific case, the model’s performance proved sensitive to some of them and much less sensitive to others.

Adding Sports as Nodes

Typically, GNN recommender systems use bipartite graphs, with only user and item nodes. We added sports as nodes and multiple edge types. Compared to a simple bipartite graph, this complex graph significantly enhances the model’s performance; the model has a richer source of information to learn from.

Number of Layers

The embedding generation process can be repeated as many times as wished, i.e. for a configurable number of layers. The optimal number of layers is between 3 and 5; fewer or more layers lead to poorer results. A greater number of layers seems to yield unspecific embeddings: many presumably different nodes end up with similar embeddings.

Loss Function Delta

As presented in the previous section, the loss function includes a fixed hyperparameter, ∆. The optimal delta is found to be around 0.266. Too low a delta often leads to poor learning, where the model does not give low scores to negative pairs, whereas too high a delta often leads to no learning at all, the task being too hard for the model.

Embedding Dimensions

When generating embeddings, we have to define the size of the hidden latent space and of the output space. The optimal hidden space has around 256 dimensions, and the optimal output space around 128. Spaces with lower dimensionality lead to embeddings of lesser quality, whereas spaces with greater dimensionality slow down training without improving the model’s performance.

Negative Sample Size

In the loss function, the number of negative samples has to be fixed. The optimal number is between 700 and 3000; too few samples lead to poor learning, and too many slow down training without improving performance.

Other Considerations

Different aggregation functions were tried, notably maximum and minimum. Other learnable weight matrices were inserted in the equation. Aggregating information from the different neighborhoods was also tried with mean, sum and maximum, as was normalization of the embeddings after each layer. In the end, these changes had limited impact on performance.

Removing false negative samples and replacing the scoring function with a simple multi-layer perceptron also had negligible impact on the model’s performance.

Results

We split the data into train and test sets using temporal indicators. For all the available users, 50 weeks of data are used for the training, and 2 weeks are used for testing.
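For instance, assuming the interactions sit in a pandas DataFrame with a date column (names are hypothetical), the temporal split might look like:

import pandas as pd

def temporal_split(interactions: pd.DataFrame, test_weeks: int = 2):
    # Hold out the most recent weeks for testing; train on all earlier history
    cutoff = interactions['date'].max() - pd.Timedelta(weeks=test_weeks)
    return (interactions[interactions['date'] <= cutoff],
            interactions[interactions['date'] > cutoff])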

The GNN model’s performance is compared to a simple baseline model, in which all users are recommended the most popular items of the past 2 weeks.

Metrics used for comparison are recall, precision, and coverage, for 10-item recommendations (a small evaluation sketch follows the definitions below).

  • Recall is the fraction of items users interacted with in the test set that appear among the recommended items.
  • Precision is the fraction of recommended items that the user has indeed interacted with.
  • Coverage is the percentage of items in our catalog that have been recommended to at least one user.
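A minimal sketch of how these metrics might be computed (names are illustrative, not the project’s evaluation code):

import numpy as np

def evaluate_top_k(recommended, relevant, catalog_size, k=10):
    # recommended: dict user -> list of k recommended item ids
    # relevant: dict user -> set of item ids the user interacted with in the test set
    recalls, precisions, all_recs = [], [], set()
    for user, recs in recommended.items():
        hits = len(set(recs) & relevant[user])
        recalls.append(hits / max(len(relevant[user]), 1))
        precisions.append(hits / k)
        all_recs.update(recs)
    coverage = len(all_recs) / catalog_size
    return np.mean(recalls), np.mean(precisions), coverage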

The GNN model largely outperforms the most popular items baseline. According to subject-matter experts at Decathlon, the GNN model achieves results similar to those of the current models in place. However, the GNN model uses representational learning: the learned embeddings might be useful for other tasks, such as different types of recommendations, e.g. sports activities.

Also, multiple additional improvements are within reach for the GNN model, whereas the current models in place appear to have reached their peak performance.

Next avenues

More data

A major improvement concerns the input data: it could be interesting to add more knowledge into the graph. For Decathlon, this means adding nodes of item families, departments, and universes, accordingly linked to subgroups and items. This richer structure could allow the extraction of greater information from the graph.

In parallel to changing the graph structure with new nodes and edges, it could be interesting to attribute other features to the nodes. Currently, nodes are initialized with basic features. More complex features could be considered, especially for the items: nodes could be initialized with the embedded vectors of their images or of their text label.

Exploring other GNN methodologies

The message passing could be enhanced by neighborhood sampling as presented in [1], and different functions could be tried for embedding generation.

Other loss functions such as Weighted Approximately Ranked Pairwise (paper here) might also prove to be relevant. Enhancing the negative sampling methodology to generate harder negative samples could also be considered. Finally, attention has gathered keen interest in multiple subfields of machine learning. Some research explored its relevance for GNNs, such as Knowledge Graph Attention Network [2].

Conclusion

We explored GNNs for e-commerce recommendation, detailing the embedding generation with message passing and the scoring function with cosine similarity. We proposed adjustments to tackle seasonality and ways to build a more complex graph with richer information. The results are encouraging and as such, GNNs are a promising new methodology.

GNNs as R&D for Experienced Teams

In my opinion, GNNs are not a low-hanging fruit. The methodology is still somewhat experimental: industrial implementation papers and established libraries are limited, especially compared to CNNs and RNNs. However, the subject gathers keen interest; it should become increasingly well documented and accessible in the coming years.

Changing the Data Paradigm: Major Advantages

GNNs learn on graph data instead of tabular data. This makes it possible to integrate virtually all available information, which makes GNNs a very flexible framework. Both content information (user and item features) and graph structure can be leveraged. Multiple different tasks can also take place in the same graph, e.g. sports recommendations in the case of Decathlon.

Graph Neural Networks are very interesting; at Decathlon Canada, we are looking forward to more implementations of this nascent methodology. Questions, comments, ideas? Feel free to reach out to me via LinkedIn!

Thanks to the members of the AI team at Decathlon Canada for the comments and review, in particular Samuel Mercier, Cheng Zhang and Yan Gobeil. Also thanks to Professor Laurent Charlin from HEC Montreal for helpful discussions, feedback, and guidance. This project was supported by the Mitacs Accelerate program.

Decathlon Canada is hiring!

Are you interested in computer vision and the application of AI to improve sport accessibility? Follow https://joinus.decathlon.ca/en/annonces to see the different exciting opportunities, notably AI Backend Developer roles.

References

[1] Ying, Rex, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L. Hamilton, and Jure Leskovec. “Graph Convolutional Neural Networks for Web-Scale Recommender Systems.” Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, July 19, 2018, 974–83. https://doi.org/10.1145/3219819.3219890.

[2] Wang, Xiang, Xiangnan He, Yixin Cao, Meng Liu, and Tat-Seng Chua. “KGAT: Knowledge Graph Attention Network for Recommendation.” Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, July 25, 2019, 950–58. https://doi.org/10.1145/3292500.3330989.
