# OhMyGraphs: GraphSAGE in PyG

In a (much) earlier post, I described the intuition and some of the math behind a basic graph neural network (GNN) algorithm, GraphSAGE. How can we implement GraphSAGE for an actual task?

I’m a PyTorch person and PyG is my go-to for GNN experiments. For much larger graphs, DGL is probably the better option and the good news is they have a PyTorch backend!

If you’ve used PyTorch before, most of this will be intuitive so let’s jump in!

**Installation**

PyG is rapidly being developed and new releases are frequent, so I always seem to hit some sort of conflict among the required packages whenever there is a new release. Below are the versions I’m using in this notebook. You can install with `pip` or `conda`, but beware to select the right device version, i.e., `cuda10`, `cuda9` or `cpu`. Installation instructions in the docs are here.

```
torch              1.8.0
torch-cluster      1.5.9
torch-geometric    1.7.0
torch-scatter      2.0.6
torch-sparse       0.6.9
torch-spline-conv  1.2.1
```

**The convolution layer**

The goal of graph convolution is to change the feature space of every node in the graph. It’s important to realize the graph *structure* doesn’t change, i.e., the same nodes remain connected to each other before and after the convolution. The magic behind graph convolution is in *how* that new feature is computed for each node.

PyG has various types of convolution layers; in this post, we’ll simply use the `SAGEConv` layer, which performs one iteration of the aggregate-and-update step (see previous post!). You can instantiate one layer of graph convolution by simply specifying the expected input and output feature shapes — very similar to normal convolution in PyTorch.

```python
from torch_geometric.nn import SAGEConv

conv = SAGEConv(input_dim, output_dim)
```

A forward pass through the convolution layer requires two things: the node feature matrix **X** and the **adjacency matrix**.

```python
x = conv(data.x, data.adj_t)
```

Recall, the **X** matrix is an **(n x D)** matrix where **D** is the feature dimensionality of every node in the graph. Alternatively, if you cannot create an adjacency matrix (since they can explode in size with a large number of nodes!), you can use an edge list. The **edge list** is expected to be a **(2 x E)** matrix, where **E** is the number of edges; the first row holds source nodes and the second row holds the corresponding target nodes.
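As a toy illustration of that layout (plain Python lists here, not PyG-specific), this is the edge list for the directed path graph 0 → 1 → 2:

```python
# Edge list for the directed path graph 0 -> 1 -> 2.
# Row 0 holds source nodes, row 1 holds the corresponding target nodes,
# so column j describes the j-th edge.
edge_index = [
    [0, 1],  # sources
    [1, 2],  # targets
]

num_edges = len(edge_index[0])
edges = list(zip(edge_index[0], edge_index[1]))
print(edges)  # [(0, 1), (1, 2)]
```

In PyG this would be a `torch.long` tensor of shape `(2, num_edges)`.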

## What’s happening under the hood?

The default aggregation function for `SAGEConv` is mean aggregation, which just means: take my neighbours’ node features and average them (that’s the second term). The update step is simply a linear combination of the aggregated neighbour representation and the newly transformed current-node representation (the first term). PyG handles the message passing and figuring out the neighbours of every node *i*, etc.
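Concretely, the mean-aggregator update is `x_i' = W1·x_i + W2·mean_{j∈N(i)} x_j`. Here's a dependency-free sketch of those two terms (plain Python, with made-up toy features and scalar stand-in "weights" rather than learned matrices):

```python
# Mean-aggregate-and-update for one node, sketched without any libraries.
# w1 and w2 are stand-in scalar "weights" purely to show the two terms.

def mean_aggregate(neighbour_feats):
    """Average the neighbours' feature vectors elementwise."""
    n = len(neighbour_feats)
    dim = len(neighbour_feats[0])
    return [sum(f[d] for f in neighbour_feats) / n for d in range(dim)]

x_i = [1.0, 2.0]                       # current node's features
neighbours = [[3.0, 4.0], [5.0, 6.0]]  # features of node i's neighbours

agg = mean_aggregate(neighbours)       # second term -> [4.0, 5.0]
w1, w2 = 1.0, 1.0                      # toy scalar "weights"
x_i_new = [w1 * a + w2 * b for a, b in zip(x_i, agg)]
print(x_i_new)  # [5.0, 7.0]
```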

# Creating a model

The GraphSAGE model is simply a bunch of `SAGEConv` layers stacked on top of each other. The model below has 3 layers of convolutions. In the forward method, you’ll notice we can add activation layers and dropout (you could even throw in some batch norm!).

The model below does **node classification**: the last layer has the same number of neurons as there are classes in the dataset, and a log-softmax at the end lets the model output the most likely class for each node.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import SAGEConv


class GraphSAGE(torch.nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim, dropout=0.2):
        super().__init__()
        self.dropout = dropout
        self.conv1 = SAGEConv(in_dim, hidden_dim)
        self.conv2 = SAGEConv(hidden_dim, hidden_dim)
        self.conv3 = SAGEConv(hidden_dim, out_dim)

    def forward(self, data):
        x = self.conv1(data.x, data.adj_t)
        x = F.elu(x)
        x = F.dropout(x, p=self.dropout, training=self.training)
        x = self.conv2(x, data.adj_t)
        x = F.elu(x)
        x = F.dropout(x, p=self.dropout, training=self.training)
        x = self.conv3(x, data.adj_t)
        x = F.elu(x)
        x = F.dropout(x, p=self.dropout, training=self.training)
        return torch.log_softmax(x, dim=-1)
```
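Since the model outputs per-class log-probabilities, prediction is just an argmax over the last dimension. A tiny dependency-free illustration of log-softmax → argmax (with hypothetical 3-class scores for one node, not from a real model run):

```python
import math

def log_softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]

scores = [2.0, 0.5, 1.0]  # made-up 3-class scores for a single node
log_probs = log_softmax(scores)
pred = max(range(len(log_probs)), key=lambda c: log_probs[c])
print(pred)  # 0 -- log-softmax is monotone, so argmax matches the raw scores

# The log-probabilities exponentiate back to a valid distribution.
total = sum(math.exp(v) for v in log_probs)
```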

Beware, I’m calling this model `GraphSAGE`, but the original paper’s setup of conv layers, activations, etc. is described here. The only “SAGE” thing about this model is the `SAGEConv` layers.

# Datasets

I haven’t talked too much about datasets because much of the research in GNNs uses standard datasets available in PyG. There’s definitely nothing stopping you from creating a custom dataset, but that’s another post for another day (especially when you have a large graph!). In this example, I’ll use a dataset that comes packaged in OGB. To load this into your notebook, make sure to `pip install ogb`.

The snippet below loads an Amazon products dataset from `ogb`.

```python
import torch
import torch_geometric.transforms as T
from ogb.nodeproppred import PygNodePropPredDataset

device = 'cuda' if torch.cuda.is_available() else 'cpu'
device = torch.device(device)

dataset = PygNodePropPredDataset(name='ogbn-products',
                                 transform=T.ToSparseTensor())
data = dataset[0]

# this dataset comes with train-val-test splits predefined for benchmarking
split_idx = dataset.get_idx_split()
train_idx = split_idx['train'].to(device)
```

Some basic information about the dataset is packaged in the `data` object:

```python
print(f'dataset has {data.num_nodes} nodes where each node has a {data.num_node_features} dim feature vector')
print(f'dataset has {data.num_edges} edges where each edge has a {data.num_edge_features} dim feature vector')
print(f'dataset has {dataset.num_classes} classes')
```

This particular dataset comes with the train, val and test indexes already split out for us.

```python
print(split_idx['train'].shape)
print(split_idx['valid'].shape)
print(split_idx['test'].shape)
```

The adjacency matrix is pre-populated in `data.adj_t` as a `SparseTensor`, since it is `n x n` in shape and that’s huge given there are ~2.4M nodes!

```
SparseTensor(row=tensor([      0,       0,       0,  ..., 2449028, 2449028, 2449028]),
             col=tensor([    384,    2412,    7554,  ..., 1787657, 1864057, 2430488]),
             size=(2449029, 2449029), nnz=123718280, density=0.00%)
```
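To see why the sparse format matters, here's a back-of-the-envelope calculation using the `size` and `nnz` values printed above (my own arithmetic, assuming float32 values and int64 indices in a COO layout):

```python
n = 2_449_029        # number of nodes (from size above)
nnz = 123_718_280    # number of stored non-zeros (from nnz above)

# Dense n x n float32 adjacency matrix: 4 bytes per entry.
dense_bytes = n * n * 4

# COO sparse layout: int64 row + int64 col + float32 value per non-zero.
sparse_bytes = nnz * (8 + 8 + 4)

print(f'dense : {dense_bytes / 1e12:.1f} TB')   # tens of terabytes
print(f'sparse: {sparse_bytes / 1e9:.1f} GB')   # a couple of gigabytes
```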

# Training

When I first started playing with GNNs, I thought it was weird that in the train loop, we always pass the entire graph — we have to, because we need the full structure available to compute the aggregate-and-update steps. But since we need to *train* on a certain set of nodes and validate/test on another set, we simply index the output (and the loss) with the indexes of the nodes in our train/val/test set!

P.S. the code below is stolen from Matthias Fey’s ogb submission!

```python
# compute activations for train subset
out = model(data)[train_idx]

# get gradients for train subset
loss = F.nll_loss(out, data.y.squeeze(1)[train_idx])

# evaluate model on test set
out = model(data)[test_idx]
```

For this `ogb` dataset, the `train` and `test` functions can be packaged like so:

```python
def train(model, data, train_idx, optimizer):
    model.train()
    optimizer.zero_grad()
    out = model(data)[train_idx]
    loss = F.nll_loss(out, data.y.squeeze(1)[train_idx])
    loss.backward()
    optimizer.step()
    return loss.item()


@torch.no_grad()
def test(model, data, split_idx, evaluator):
    model.eval()
    out = model(data)
    y_pred = out.argmax(dim=-1, keepdim=True)

    train_acc = evaluator.eval({
        'y_true': data.y[split_idx['train']],
        'y_pred': y_pred[split_idx['train']],
    })['acc']
    valid_acc = evaluator.eval({
        'y_true': data.y[split_idx['valid']],
        'y_pred': y_pred[split_idx['valid']],
    })['acc']
    test_acc = evaluator.eval({
        'y_true': data.y[split_idx['test']],
        'y_pred': y_pred[split_idx['test']],
    })['acc']

    return train_acc, valid_acc, test_acc
```

`ogb` comes packaged with an `Evaluator` to help score output predictions.

```python
from ogb.nodeproppred import Evaluator

lr = 1e-4
epochs = 50
hidden_dim = 75

evaluator = Evaluator(name='ogbn-products')
model = GraphSAGE(in_dim=data.num_node_features,
                  hidden_dim=hidden_dim,
                  out_dim=dataset.num_classes)
optimizer = torch.optim.Adam(model.parameters(), lr=lr)

for epoch in range(1, 1 + epochs):
    loss = train(model, data, train_idx, optimizer)
    result = test(model, data, split_idx, evaluator)
    if epoch % 10 == 0:
        train_acc, valid_acc, test_acc = result
        print(f'Epoch: {epoch}/{epochs}, '
              f'Loss: {loss:.4f}, '
              f'Train: {100 * train_acc:.2f}%, '
              f'Valid: {100 * valid_acc:.2f}% '
              f'Test: {100 * test_acc:.2f}%')
```

# TL;DR: rapidly building GNNs in PyG is ez!

Also, if you want to experiment with `GAT` or other types of convolution layers, it would (for the most part) be a simple swap-in-swap-out scenario. Check out the other available layers in the docs here.

The full notebook script is available here although it is mostly a broken down version of Matthias’ code.