Friend Recommendation using GraphSAGE

Yan Wang
Stanford CS224W GraphML Tutorials
Feb 9, 2022

By Canwen Jiao, Yan Wang as part of the Stanford CS224W course project in Autumn 2021.

1. Domain Introduction: Friend Recommendation

Social media has become one of the primary means for people to build social relations with those who share similar backgrounds, values, and real-world connections. Many people rely on social media to extend their contacts and social circles.

As one of the most important components of a social media app, the friend recommendation system plays a large part in whether the platform can attract and retain users.

Friend recommendation has traditionally been done with graph-based heuristics, such as counting common neighbors. Recent advances in Graph Neural Networks (GNNs) have provided new methods for this task, since this family of models can capture similarities between users far more expressively.

In this blog post, we will explain how a GNN-based model works, and build an end-to-end friend prediction model step-by-step.

2. Dataset Description

We use the Facebook social circles data in this project [1]. The data was collected from survey participants using a Facebook app. The dataset includes node features (profiles), circles, and ego networks.

Fig 1: Example of an ego network [2]

“Ego” means the owner of the network. The ego user may form circles based on common bonds and attributes between themselves and the users whom they follow [2]. In the figure above, the central user (the ego) is friends with all other users in the network. There are 5 circles annotated on the graph: friends under the same advisor, CS department friends, college friends, family members, and high school friends. The circles may overlap.

For simplicity, we will only use one of the ego networks in the dataset, namely the ego network for User 0. This graph is composed of 347 nodes and 5,038 edges, and each node is represented by a 224-dimensional feature vector. The code snippet below loads the data and handles preprocessing.

import numpy as np
import torch
import dgl

# Assumes the ego-network files have already been downloaded and extracted,
# e.g. from the SNAP ego-Facebook dataset (http://snap.stanford.edu/data/facebook.tar.gz).
USER = 0  # user id, which corresponds to the file name prefix
file_edges = f'facebook/{USER}.edges'
file_feat = f'facebook/{USER}.feat'

# Load the edge list; node ids in the file start at 1, so shift them to 0-based.
edges_u, edges_v = [], []
with open(file_edges) as f:
    for line in f:
        e1, e2 = tuple(int(x) - 1 for x in line.split())
        edges_u.append(e1)
        edges_v.append(e2)
edges_u, edges_v = np.array(edges_u), np.array(edges_v)

# Load node features; assumes node ids are sequential, one node per line.
num_nodes = 0
feats = []
with open(file_feat) as f:
    for line in f:
        num_nodes += 1
        a = [int(x) for x in line.split()[1:]]  # first column is the node id
        feats.append(torch.tensor(a, dtype=torch.float))
feats = torch.stack(feats)

# Construct the graph and attach the node features.
g = dgl.graph((edges_u, edges_v), num_nodes=num_nodes)
g.ndata['feat'] = feats
g  # print graph metadata
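If the files loaded correctly, the printed metadata should show a graph with 347 nodes and 5,038 edges, carrying a 224-dimensional 'feat' field on the nodes.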

3. Introduction to GNN

A Graph Neural Network (GNN) is a type of neural network that operates directly on graph-structured data. One of the core concepts of GNNs is the node embedding: the goal is to embed each node through multiple layers of non-linear transformations that follow the graph structure. With GNNs, we can solve a variety of tasks: node classification, link prediction, community detection, and network similarity. Since this tutorial is about friend recommendation, we will focus on the link prediction task.

The key idea of a GNN is to generate node embeddings based on local network neighborhoods. The model can be of arbitrary depth: every node has an embedding at each layer. The layer-0 embedding of a node is its input feature vector, and the layer-k embedding incorporates information from nodes that are k hops away. Each such layer is called a message-passing layer. During each message-passing iteration, the embeddings are updated in two steps: message computation and aggregation.

In its general form, the layer-$k$ embedding of node $v$ is

$$h_v^{(k)} = \mathrm{AGG}^{(k)}\Big(\big\{\, \mathrm{MSG}^{(k)}\big(h_u^{(k-1)}\big) : u \in N(v) \,\big\},\ h_v^{(k-1)}\Big), \qquad h_v^{(0)} = x_v,$$

where $\mathrm{MSG}$ computes a message from each neighbor's previous-layer embedding and $\mathrm{AGG}$ aggregates the messages over the neighborhood $N(v)$.
Fig 2: Message computation and aggregation

For example, in Figure 2, Node A is our target node. A’s neighbors are Nodes B, C, and D, whose own neighbors are (A, C), (A, B, E, F), and (A), respectively. In this message-passing paradigm, the first layer computes messages from Nodes A and C, aggregates them, and passes the result to Node B; Nodes C and D are updated from their neighbors in the same way. The second layer then computes messages from the updated Nodes B, C, and D, aggregates them, and passes the result to Node A. This is a two-layer message-passing GNN, and the same computation is carried out for every node in the graph.

Fig 3: http://web.stanford.edu/class/cs224w/slides/06-GNN1.pdf

This message-passing operation is permutation equivariant (the target node has the same computation graph regardless of how the nodes are ordered) and enables every node to accumulate information from its local neighborhood. Through models like GCN and GraphSAGE, we can generate node embeddings for every node in the network, and then apply link prediction to those embeddings.

4. GraphSAGE

GraphSAGE is a framework for inductive representation learning on large graphs, and is now one of the most popular GNN models. It generates low-dimensional vector representations for nodes and is especially useful for graphs with rich node attribute information [3]. Figure 4 shows the details of the algorithm.

Fig 4: https://www-cs-faculty.stanford.edu/people/jure/pubs/graphsage-nips17.pdf

The GraphSAGE algorithm is inductive, in contrast to earlier transductive graph neural networks like GCN (Graph Convolutional Networks) [4]. More specifically, GraphSAGE generalizes to unseen nodes, while GCN can only generate embeddings for the nodes present in the fixed graph at training time. Since one’s friend circles change over time, GraphSAGE is the more natural algorithm for our friend recommendation task.
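Concretely, with the mean aggregator (the variant we use in the code below), each GraphSAGE layer averages the neighbors’ embeddings, combines the result with the node’s own embedding, and applies a learned linear transformation followed by a non-linearity [3]:

$$h_v^{(k)} = \sigma\Big(W^{(k)} \cdot \mathrm{CONCAT}\big(h_v^{(k-1)},\ \mathrm{MEAN}\big\{h_u^{(k-1)} : u \in N(v)\big\}\big)\Big)$$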

However, since the dataset is fixed, we can still use GCN in our project for comparison with GraphSAGE. We implement both models with the DGL library [5].

import torch.nn as nn
import torch.nn.functional as F
from dgl.nn import GraphConv, SAGEConv

class GCN(nn.Module):
    """Two-layer GCN encoder."""
    def __init__(self, in_feats, h_feats):
        super(GCN, self).__init__()
        self.conv1 = GraphConv(in_feats, h_feats)
        self.conv2 = GraphConv(h_feats, h_feats)

    def forward(self, g, in_feat):
        h = self.conv1(g, in_feat)
        h = F.relu(h)
        h = self.conv2(g, h)
        return h

class GraphSAGE(nn.Module):
    """Two-layer GraphSAGE encoder with mean aggregation."""
    def __init__(self, in_feats, h_feats):
        super(GraphSAGE, self).__init__()
        self.conv1 = SAGEConv(in_feats, h_feats, 'mean')
        self.conv2 = SAGEConv(h_feats, h_feats, 'mean')

    def forward(self, g, in_feat):
        h = self.conv1(g, in_feat)
        h = F.relu(h)
        h = self.conv2(g, h)
        return h

5. Link Prediction

After we obtain node embeddings from the GNN model, we are still one step away from friend suggestion. Friendship is all about connections in the graph, so we need to transform what we know about the nodes (users) into information about the edges (friendships); specifically, whether an edge exists in the graph or not.

We can model this link prediction task as assigning a “likelihood” score to every potential edge. One simple way to obtain this score is to take the dot product of the learned embeddings of the two endpoint nodes. Note that this is not a true probability estimate, since the dot product can fall outside the range [0, 1] (although we could apply a sigmoid to squish the score into this range).

Fig 5: Friend likelihood
import dgl.function as fn

class DotPredictor(nn.Module):
    def forward(self, g, h):
        # local_scope keeps the temporary 'h' node data off the original graph
        with g.local_scope():
            g.ndata['h'] = h
            # dot product between source and destination embeddings of each edge
            g.apply_edges(fn.u_dot_v('h', 'h', 'score'))
            return g.edata['score'][:, 0]

We can also score an edge with a more complex function of the two node embeddings, such as a small neural network or another non-linear function.
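As a sketch of that alternative, following the pattern in the DGL link-prediction tutorial, the predictor below concatenates the two endpoint embeddings and passes them through a two-layer MLP (the class name MLPPredictor and the hidden sizes are illustrative choices, not part of the original project):

class MLPPredictor(nn.Module):
    """Scores an edge with a small MLP over the concatenated endpoint embeddings."""
    def __init__(self, h_feats):
        super().__init__()
        self.W1 = nn.Linear(h_feats * 2, h_feats)
        self.W2 = nn.Linear(h_feats, 1)

    def edge_score(self, edges):
        # edges.src['h'] / edges.dst['h'] hold the embeddings of each edge's endpoints
        h = torch.cat([edges.src['h'], edges.dst['h']], dim=1)
        return {'score': self.W2(F.relu(self.W1(h))).squeeze(1)}

    def forward(self, g, h):
        with g.local_scope():
            g.ndata['h'] = h
            g.apply_edges(self.edge_score)
            return g.edata['score']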

6. Ranking Potential Friends

Now we can obtain the suggested list of friends for a given user. We calculate scores between this user and every other user they are not currently friends with, sort the candidates by score, and take the top K as potential friends to recommend, as sketched below.
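Here is a minimal sketch of this ranking step using the dot-product scorer; recommend_friends is a hypothetical helper name, and h is assumed to be the embedding matrix produced by the trained GNN:

def recommend_friends(g, h, user, k=10):
    # Candidate set: everyone except the user and their current friends.
    candidates = torch.ones(g.num_nodes(), dtype=torch.bool)
    candidates[user] = False
    candidates[g.successors(user)] = False
    cand_ids = candidates.nonzero(as_tuple=True)[0]
    # Dot-product "likelihood" score between the user and each candidate.
    scores = (h[cand_ids] * h[user]).sum(dim=1)
    topk = scores.topk(min(k, len(cand_ids))).indices
    return cand_ids[topk]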

In practice, the K recommended users are probably not literally the K highest-scoring users, but rather a random sample of K drawn from, say, the 10 × K highest-scoring users, since a social media platform may want to show a different set of suggestions every day, or even every time the user refreshes the page.

To sum up, the entire pipeline is described in the following figure:

Fig 6: Pipeline

7. Experiments

Performance measurement is an important part of any machine learning project, and it consists of two questions: how to split off a test set, and how to score the model.

7.1 Train-test split

Normally in a recommender systems setting, the testing set is split based on time. For example, the train set contains all friendships formed up to 2020, and the test set contains all new friendships in 2021, which the model will try to predict.

But since we don’t have access to information about time for the Facebook dataset, we simply split the edges randomly for simplicity. Splitting by time also introduces the cold start problem (how to make suggestions to new users), which is hard to deal with and outside of our project scope.

7.2 Evaluation metrics

For a recommender system, there are numerous ways to evaluate a model, and they usually fall under two categories: ranking-based and classification-based.

7.2.1 Ranking-based metric: HitRate@K

HitRate@K is an example of a ranking-based metric. It measures the fraction of users for whom a correct answer is included in the recommendation list of length K. In our scenario, a hit means the user is actually friends with one of the K friend suggestions.

Fig 7: Recommend friends

As illustrated above, we fix K at 3, meaning the recommendation list consists of 3 users; the recommender model does produce a successful “hit” for User A. By computing the proportion of “hits” over all users, we obtain the HitRate@K score. Obviously, the higher the score, the better.
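A minimal sketch of the computation (the helper name and the dict-based inputs are our own choices):

def hit_rate_at_k(recommended, held_out):
    """recommended: dict mapping user -> list of K recommended user ids.
    held_out: dict mapping user -> set of that user's held-out friends."""
    hits = sum(any(r in held_out[u] for r in recs)
               for u, recs in recommended.items())
    return hits / len(recommended)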

7.2.2 Classification-based metric: AUC

On the other hand, Area Under ROC Curve (AUC) is an example of a classification-based metric, and it is what we use for our project. To explain AUC, we first need to discuss the ROC curve.

On the ROC (Receiver Operating Characteristic) curve, the False Positive Rate (FPR) is plotted on the x-axis, while the True Positive Rate (TPR) is plotted on the y-axis. When the classification threshold is 1, FPR and TPR are both 0, since everything is labeled “negative”. As we lower the threshold toward 0, the model classifies more and more data points as “positive”, increasing both TPR and FPR until both reach 1.

As shown in the graph below, a random classifier (which randomly labels data as positive or negative) is represented by the red dashed line. Better models will have more true positives and fewer false positives.

Fig 8: ROC curve

Now that we know how the ROC curve works, the AUC is simply the area under it. Its value ranges from 0 to 1; long story short, it measures the probability that the model assigns a higher score to a random positive example than to a random negative example.
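Given the predictor’s scores on the held-out positive edges and the sampled negative edges, the AUC can be computed with scikit-learn; compute_auc is our own helper name:

from sklearn.metrics import roc_auc_score

def compute_auc(pos_score, neg_score):
    # Positive edges get label 1, sampled negative edges get label 0.
    scores = torch.cat([pos_score, neg_score]).detach().numpy()
    labels = torch.cat([torch.ones(pos_score.shape[0]),
                        torch.zeros(neg_score.shape[0])]).numpy()
    return roc_auc_score(labels, scores)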

7.3 Performance of our models

So how do our models perform? As described above, we build the test set by randomly selecting 30% of the friendship edges from the original dataset. GCN achieves an AUC of ~0.93 on this test set, while GraphSAGE performs slightly worse at ~0.89.
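For reference, here is a minimal training sketch under the split above (it assumes the train_g, train_pos_g, and train_neg_g objects from Section 7.1; the hidden size, learning rate, and epoch count are illustrative):

import itertools

def compute_loss(pos_score, neg_score):
    # Binary cross-entropy: real edges are positives, sampled non-edges negatives.
    scores = torch.cat([pos_score, neg_score])
    labels = torch.cat([torch.ones(pos_score.shape[0]),
                        torch.zeros(neg_score.shape[0])])
    return F.binary_cross_entropy_with_logits(scores, labels)

model = GraphSAGE(train_g.ndata['feat'].shape[1], 16)
pred = DotPredictor()
optimizer = torch.optim.Adam(
    itertools.chain(model.parameters(), pred.parameters()), lr=0.01)

for epoch in range(100):
    h = model(train_g, train_g.ndata['feat'])
    pos_score = pred(train_pos_g, h)
    neg_score = pred(train_neg_g, h)
    loss = compute_loss(pos_score, neg_score)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()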

8. Summary

In this project, we walked through all the components in our machine learning pipeline for friend recommendation, from the input features, to the GNN model, to the friend ranking algorithm, and finally the evaluation metrics. Hope you learned something new!

Colab Link https://colab.research.google.com/drive/1Nd9alFlfNvlFrTRGFYFhmLGkP7Iy1kle?usp=sharing

References:

[1] McAuley, J. J., & Leskovec, J. (2012). Learning to discover social circles in ego networks. Advances in neural information processing systems, 25, 548–556.

[2] Mcauley, J., & Leskovec, J. (2014). Discovering social circles in ego networks. ACM Transactions on Knowledge Discovery from Data (TKDD), 8(1), 1–28.

[3] Hamilton, W., Ying, Z., & Leskovec, J. (2017). Inductive representation learning on large graphs. Advances in neural information processing systems, 30.

[4] Kipf, T. N., & Welling, M. (2016). Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.

[5] Wang, M., Zheng, D., Ye, Z., Gan, Q., Li, M., Song, X., … & Zhang, Z. (2019). Deep graph library: A graph-centric, highly-performant package for graph neural networks. arXiv preprint arXiv:1909.01315.
