From graph topology to Node Features with Neural Networks.

Apr 10, 2024

Our previous article on Node2Vec, which builds on Word2Vec with the skip-gram architecture, considered only the network topology: the nodes and edges of the graph. However, nodes and edges in a network also carry features, and these features are essential for producing informative embeddings. Node and edge features can be represented in a tabular format, which makes them suitable for machine learning models like neural networks.

In this article, we focus on using node features in a tabular format to implement a Vanilla Neural Network on two popular graph datasets, Citeseer and Facebook Page-Page. We first implement a Vanilla Neural Network without topological information, which serves as our base model for both datasets. We then add the topological information to the network, which leads to our first GNN architecture combining node features with the network topology.

Review of Citeseer and Facebook Page-Page Dataset

Citeseer Dataset

The Citeseer dataset is a citation network dataset that is commonly used as a benchmark for evaluating node classification in network analysis. Its key characteristics are:
- The dataset contains 3,327 scientific publications categorized into 6 classes[1][2][4][5].
- Each publication (node) is represented by a 3,703-dimensional word vector feature[1][2].
- The dataset contains 4,732 citation relationships (edges) between the publications, forming a directed graph[2].

The Citeseer dataset has been used to assess the performance of GNN models like GCN, GCNII, EGAT, and CEN-DGCNN in tasks such as node classification[1][2]. Compared to Cora, Citeseer has more nodes but fewer citation links, so its node neighborhoods are sparser, which makes it more challenging for GNN models to learn effective node representations.

The Citeseer and Cora datasets are representative of real-world citation networks and have been widely used as benchmarks for evaluating the performance of various GNN models[1][4][5].
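To make these characteristics concrete, here is a minimal sketch of how the statistics of a graph stored as tensors could be summarized. The helper name summarize_graph and the toy graph are our own illustrations, not part of any library. Note that the edge count reported this way can differ from the figure quoted above, since libraries often store each undirected edge once per direction.

```python
import torch

def summarize_graph(x, edge_index, y):
    """Return basic statistics for a graph stored as tensors (hypothetical helper)."""
    num_nodes, num_features = x.shape
    # Each column of edge_index is one (source, target) pair
    edges = set(map(tuple, edge_index.t().tolist()))
    # The graph is stored symmetrically if every edge also appears reversed
    is_symmetric = all((t, s) in edges for s, t in edges)
    return {
        "nodes": num_nodes,
        "features": num_features,
        "classes": int(y.max()) + 1,
        "edges": edge_index.shape[1],
        "symmetric": is_symmetric,
    }

# Toy graph: 3 nodes, 2 features, edges 0-1 and 1-2 stored in both directions
x = torch.tensor([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
edge_index = torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]])
y = torch.tensor([0, 1, 0])

stats = summarize_graph(x, edge_index, y)
print(stats)
```

Once the Citeseer data is loaded later in this article, the same helper could be called as summarize_graph(data_cs.x, data_cs.edge_index, data_cs.y).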

Facebook Page-Page Dataset

The Facebook Page-Page dataset is a graph dataset that represents the connections between verified Facebook pages. Its key characteristics are:
- It is an undirected graph, where the nodes represent official Facebook pages and the edges represent mutual likes between the pages.[1][2][3]
- The dataset was collected through the Facebook Graph API in November 2017 and includes pages from 4 categories: politicians, governmental organizations, television shows, and companies.[2][3]
- The dataset has been used for tasks such as multi-class node classification to predict the category of a Facebook page based on its connections and features.[2]
- The related GEMSEC Facebook collection reports node and edge counts per page category, such as 7,057 nodes and 89,455 edges for government pages, and 50,515 nodes and 819,306 edges for artist pages.[3]
- The dataset has been used in research on graph embedding and self-clustering techniques for social network analysis.[3]

Thus, the Facebook Page-Page dataset is a valuable resource for graph enthusiasts and researchers studying the structure and dynamics of graphs built from verified Facebook pages and their interconnections.

Implementing Vanilla Neural Networks for Node Classification

As we observed earlier, these datasets have node features that offer more information than the Zachary’s Karate Club dataset. This lets us represent the features in tabular format and use the Multilayer Perceptron (MLP), a basic neural network architecture that nevertheless opens up many possibilities.

For now, we stick with a basic MLP and represent our node features as a tabular dataset. This suffices for our classification tasks and serves as a base model, given that it does not take the topological structure of the graph into consideration. Subsequently, we add the topological structure and benchmark the results against the ones obtained without it in this section.

A brief detour to explain MLP architecture
The multilayer perceptron (MLP) is a feedforward neural network, meaning information flows from the input layer through the hidden layers to the output layer. It learns complex non-linear relationships in data through backpropagation training.
- It consists of an input layer, one or more hidden layers, and an output layer. Each layer is composed of interconnected nodes (neurons) that use a nonlinear activation function.
- The hidden layers allow the MLP to learn complex, non-linear relationships in the data. The more hidden layers, the more complex patterns the MLP can model.
- The connections between nodes have numeric weights that are tuned during the training process using a backpropagation algorithm. Backpropagation allows the MLP to learn by adjusting the weights to minimize the error between the predicted and actual outputs.
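As a quick illustration of the points above, the following sketch trains a tiny MLP on made-up tabular data and records the loss at each step. The data, layer sizes, seed, and number of steps are arbitrary assumptions chosen only for the demonstration.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only to make the run repeatable

# Toy tabular data: 8 samples, 4 features, 2 classes (all values are made up)
X = torch.randn(8, 4)
y = torch.tensor([0, 1, 0, 1, 0, 1, 0, 1])

# Input layer -> hidden layer with ReLU activation -> output layer
mlp = torch.nn.Sequential(
    torch.nn.Linear(4, 16),
    torch.nn.ReLU(),
    torch.nn.Linear(16, 2),
)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(mlp.parameters(), lr=0.01)

losses = []
for step in range(100):
    optimizer.zero_grad()
    loss = criterion(mlp(X), y)  # forward pass and error
    loss.backward()              # backpropagation computes the gradients
    optimizer.step()             # weights are adjusted to reduce the error
    losses.append(loss.item())

print(f"first loss: {losses[0]:.3f}, last loss: {losses[-1]:.3f}")
```

The loss after the last step should be lower than after the first, which is backpropagation adjusting the weights to reduce the error.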

Step-by-step implementation

1. Install and import the necessary libraries and datasets, and set the seed for reproducibility.

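pip install torch torch_geometric pandas

import torch
from torch.nn import Linear
import torch.nn.functional as F
import pandas as pd

from torch_geometric.datasets import Planetoid

# Set the seed for reproducibility (0 is an arbitrary choice)
torch.manual_seed(0)

# Download (on first run) and load the Citeseer citation graph
dataset_cs = Planetoid(root="./", name="CiteSeer")
data_cs = dataset_cs[0]

# View the node features as a table, one row per node
df_cs = pd.DataFrame(data_cs.x.numpy())
df_cs['label'] = data_cs.y.numpy()
df_cs.head(5)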

2. Define our MLP architecture with input, hidden, and output dimensions, and initialize our model.

# Define MLP class
class MLP(torch.nn.Module):
    """Multilayer Perceptron"""
    def __init__(self, dim_in, dim_h, dim_out):
        super().__init__()
        self.linear1 = Linear(dim_in, dim_h)
        self.linear2 = Linear(dim_h, dim_out)

    def forward(self, x):
        x = self.linear1(x)
        x = torch.relu(x)
        x = self.linear2(x)
        return F.log_softmax(x, dim=1)

# Initialize MLP with dimensions taken from the dataset
# (dim_h=16 is an assumed hidden size, not a value given in this article)
model = MLP(dim_in=dataset_cs.num_features, dim_h=16, dim_out=dataset_cs.num_classes)

3. Define the loss function and optimizer. Because we are going to train our model on two datasets, this is just for illustration: we overwrite both inside the training function so that they are reset on each run and state from a previous run on another dataset does not transfer to a new run.

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
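One detail worth checking: the MLP above already returns log-probabilities (log_softmax), while CrossEntropyLoss applies a log-softmax internally. The sketch below, on made-up scores, suggests that this pairing is harmless because applying log-softmax a second time leaves log-probabilities unchanged. NLLLoss is shown only as the conventional partner for log_softmax outputs, not as a change to the article's code.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed
logits = torch.randn(5, 3)               # raw scores: 5 samples, 3 classes
targets = torch.tensor([0, 2, 1, 1, 0])  # made-up labels

log_probs = F.log_softmax(logits, dim=1)  # what the MLP's forward returns

ce_on_logits = torch.nn.CrossEntropyLoss()(logits, targets)
ce_on_log_probs = torch.nn.CrossEntropyLoss()(log_probs, targets)
nll_on_log_probs = torch.nn.NLLLoss()(log_probs, targets)

# All three values should agree up to floating point error
print(ce_on_logits.item(), ce_on_log_probs.item(), nll_on_log_probs.item())
```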

4. Define the training loop in the train function, which takes the model, data, and number of epochs as inputs.

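# Fraction of predictions that match the true labels
def accuracy(y_pred, y_true):
    return torch.sum(y_pred == y_true) / len(y_true)

# Define training function
def train(model, data, epochs):

    # define loss function and optimizer (reset on every call)
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)

    model.train()
    for epoch in range(epochs + 1):
        optimizer.zero_grad()
        out = model(data.x)
        # compute loss and accuracy on the training nodes only
        loss = criterion(out[data.train_mask], data.y[data.train_mask])
        acc = accuracy(out[data.train_mask].argmax(dim=1), data.y[data.train_mask])
        loss.backward()
        optimizer.step()

        # report progress every 20 epochs
        if epoch % 20 == 0:
            print(f'Epoch {epoch:>3} | Train Loss: {loss:.3f} | Train Acc: {acc*100:.2f}%')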

5. Define the testing function, test, which is used to evaluate the model's performance on the test set.

@torch.no_grad()  # no gradients are needed for evaluation
def test(model, data):
    model.eval()
    out = model(data.x)
    acc = accuracy(out.argmax(dim=1)[data.test_mask], data.y[data.test_mask])
    return acc

6. Train the model using your data.

train(model, data_cs, epochs=100)

7. Test the trained model on the test set and print the test accuracy.

# Test the model
test_accuracy = test(model, data_cs)
print(f'Test Accuracy: {test_accuracy*100:.2f}%')

After training our model on the Citeseer data and running our test, we get an accuracy of 55.40%. It is worth noting that our model does not include the network topology, just the node features.

Using the same model, we then train and test on the Facebook Page-Page dataset and obtain an accuracy of 62.10%.
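The train and test functions rely on train_mask and test_mask attributes. As far as we are aware, the FacebookPagePage dataset in PyTorch Geometric does not ship with such masks, so they would need to be created by hand. Below is a minimal sketch of one way to do this; the helper name make_split_masks and the split boundaries are assumptions for illustration, not the split used to obtain the results in this article.

```python
import torch

def make_split_masks(num_nodes, train_end, val_end):
    """Boolean masks: [0, train_end) train, [train_end, val_end) val, rest test."""
    idx = torch.arange(num_nodes)
    train_mask = idx < train_end
    val_mask = (idx >= train_end) & (idx < val_end)
    test_mask = idx >= val_end
    return train_mask, val_mask, test_mask

# Toy example with 10 nodes
train_mask, val_mask, test_mask = make_split_masks(10, train_end=6, val_end=8)
print(train_mask.sum().item(), val_mask.sum().item(), test_mask.sum().item())

# Hypothetical usage with PyTorch Geometric (split sizes are assumptions):
# from torch_geometric.datasets import FacebookPagePage
# data_fb = FacebookPagePage(root="./FacebookPagePage")[0]
# data_fb.train_mask, data_fb.val_mask, data_fb.test_mask = make_split_masks(
#     data_fb.num_nodes, train_end=18000, val_end=20000)
```

With the masks attached, the same train and test functions can be reused unchanged, as long as a fresh MLP is created with the input and output dimensions of the Facebook data.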

Vanilla Neural network results on node features without network topology
i. Citeseer Accuracy: 55.3%
ii. Facebook Page-Page Accuracy: 62.10%

We can observe that the results on the Citeseer and Facebook Page-Page datasets differ, even though the same model was used for both. We can now proceed to include both node features and network topology.

Implementing a simplified Graph Neural network for Node Classification Task

It is worth understanding the key difference between GNNs and other deep learning architectures: those built from plain linear layers, as in our previous implementation, CNNs (Convolutional Neural Networks), used in computer vision to recognize patterns in images, and RNNs (Recurrent Neural Networks), used to recognize patterns in sequences. As noted in the first article in this series, graphs are peculiar due to their connectivity and sparse nature: a node is described not only by its features but also by its relations with other nodes. Our previous implementation of the Vanilla Neural Network considers only the node features, without their connectivity to other nodes in the network, and so does not fully represent the graph.

Thus, at the core of a GNN is the idea that the representation of a node is influenced not only by its own features but also by the features of its neighboring nodes. Here’s why this matters:

- Capturing Relationships: Graphs are all about relationships between entities. Understanding a node requires understanding how it’s connected to others.
- Shared Information: Nodes in a neighborhood often share similarities or influence each other (think of social networks, citation networks, etc.).

Demystifying the maths in GNN
1. The Basic Linear Layer: Recall that the basic linear layer in a neural network transforms an input vector x into a new representation h using a weight matrix W:

$$h = W x$$

With graphs, this is a limitation: each node has its own input vector (its features), and a standard linear layer doesn’t consider how nodes relate to each other.
2. Graph Neural Network Layer: In a graph neural network, the input vectors are node features, but unlike in traditional neural networks, the nodes are interconnected.
To capture the context of a node, we need to consider its neighbors. Let’s denote the set of neighbors of node i as N_i.
The graph linear layer sums the features of neighboring nodes, each transformed by a shared weight matrix W. We also include the central node’s own features by adding self-loops to the adjacency matrix, so that N_i contains node i itself.
The updated equation for the graph linear layer is:

$$h_i = \sum_{j \in \mathcal{N}_i} W x_j$$

Stacking the node features x_j as the rows of a matrix X gives the matrix form:

$$H = \tilde{A} X W^T, \quad \tilde{A} = A + I$$

Here, A is the adjacency matrix representing connections between nodes, I is the identity matrix that adds the self-loops, and x_j are the features of neighboring nodes.
By performing matrix multiplication, this equation efficiently aggregates features from neighboring nodes, including the central node.
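The aggregation described above can be checked by hand on a tiny graph. The sketch below uses a made-up 3-node graph and made-up weights, and compares a node-by-node sum of W x_j over each node's neighbors (and itself) with a single matrix product using the adjacency matrix with self-loops.

```python
import torch

# Toy graph with 3 nodes and undirected edges 0-1 and 1-2
A = torch.tensor([[0., 1., 0.],
                  [1., 0., 1.],
                  [0., 1., 0.]])
A_tilde = A + torch.eye(3)  # add self-loops

X = torch.tensor([[1., 0.],
                  [0., 1.],
                  [2., 2.]])  # one row of features per node
W = torch.tensor([[1., 1.],
                  [0., 2.]])  # shared weight matrix (made-up values)

# Node by node: sum W x_j over each node's neighbors and the node itself
H_loop = torch.zeros(3, 2)
for i in range(3):
    for j in range(3):
        if A_tilde[i, j] > 0:
            H_loop[i] += W @ X[j]

# Matrix form: all nodes at once
H_matrix = A_tilde @ X @ W.T

print(H_loop)
print(H_matrix)
```

Both computations give the same result. Node 1, which is connected to both other nodes, collects the transformed features of all three nodes.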

We implement our Graph Neural Network with PyTorch in the following steps:

1. Vanilla Graph Neural Network layer, which aggregates information from a node’s neighbors to compute the node’s output
2. Adjacency matrix creation
3. VanillaGNN class

1. Vanilla Graph Neural Network Layer

class VanillaGNNLayer(torch.nn.Module):
    def __init__(self, dim_in, dim_out):
        super().__init__()
        self.linear = Linear(dim_in, dim_out, bias=False)

    def forward(self, x, adjacency):
        x = self.linear(x)
        x = torch.sparse.mm(adjacency, x)
        return x

__init__(self, dim_in, dim_out): Initializes the layer, taking input and output feature dimensions. For simplicity, it uses a basic linear transformation without bias (Linear(dim_in, dim_out, bias=False)), as seen previously in the Vanilla Neural Network class.
forward(self, x, adjacency): Computes the layer output in two steps.

Linear Transformation: x = self.linear(x) applies the linear layer to the input features to learn node embeddings.

Neighborhood Aggregation: torch.sparse.mm(adjacency, x) is the key operation. Multiplying by the adjacency matrix sums the transformed features of each node and its neighbors to update the node representations.

2. Adjacency Matrix Creation

from torch_geometric.utils import to_dense_adj

adjacency = to_dense_adj(data_cs.edge_index)[0]
adjacency += torch.eye(len(adjacency))
adjacency

to_dense_adj(data_cs.edge_index)[0]: Converts the edge index representation (the list of source and target node pairs) into a dense adjacency matrix, which the matrix multiplication in the layer operates on.
adjacency += torch.eye(len(adjacency)): Adding the identity matrix creates a self-loop for each node, so that each node also incorporates its own features.
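To see what this step produces, here is a hand-rolled sketch that builds the same kind of matrix with plain PyTorch for a toy graph without duplicate edges. It is not PyTorch Geometric's implementation of to_dense_adj, and the function name is our own.

```python
import torch

def dense_adjacency_with_self_loops(edge_index, num_nodes):
    """Build a dense (num_nodes x num_nodes) adjacency matrix plus the identity."""
    adjacency = torch.zeros(num_nodes, num_nodes)
    src, dst = edge_index
    adjacency[src, dst] = 1.0          # one entry per listed edge
    adjacency += torch.eye(num_nodes)  # self-loops on the diagonal
    return adjacency

# Edges 0-1 and 1-2, stored once per direction
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]])
adjacency = dense_adjacency_with_self_loops(edge_index, num_nodes=3)
print(adjacency)
```

A dense matrix stores every pair of nodes, so its size grows quadratically: for Citeseer's 3,327 nodes it holds about 11 million entries, roughly 44 MB in 32-bit floats. This is manageable here but becomes costly for much larger graphs.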

3. VanillaGNN Class

class VanillaGNN(torch.nn.Module):
    """Vanilla Graph Neural Network"""
    def __init__(self, dim_in, dim_h, dim_out):
        super().__init__()
        self.gnn1 = VanillaGNNLayer(dim_in, dim_h)
        self.gnn2 = VanillaGNNLayer(dim_h, dim_out)

    def forward(self, x, adjacency):
        h = self.gnn1(x, adjacency)
        h = torch.relu(h)
        h = self.gnn2(h, adjacency)
        return F.log_softmax(h, dim=1)

Let’s proceed to train and test our model after initializing it. The key difference from the training and test loops of the Vanilla Neural Network implemented earlier is that the adjacency matrix is passed to the model in both the train and test functions.
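The adapted functions are not listed in this article, so the following is only a sketch of how they might look, run on a made-up 6-node graph. The names train_gnn and evaluate_gnn, the explicit tensor arguments, the toy data, and the hidden size are all assumptions. The layer here uses a plain dense product (adjacency @ x) in place of torch.sparse.mm.

```python
import torch
import torch.nn.functional as F
from torch.nn import Linear

class VanillaGNNLayer(torch.nn.Module):
    def __init__(self, dim_in, dim_out):
        super().__init__()
        self.linear = Linear(dim_in, dim_out, bias=False)

    def forward(self, x, adjacency):
        # Linear transformation followed by neighborhood aggregation
        return adjacency @ self.linear(x)

class VanillaGNN(torch.nn.Module):
    def __init__(self, dim_in, dim_h, dim_out):
        super().__init__()
        self.gnn1 = VanillaGNNLayer(dim_in, dim_h)
        self.gnn2 = VanillaGNNLayer(dim_h, dim_out)

    def forward(self, x, adjacency):
        h = torch.relu(self.gnn1(x, adjacency))
        return F.log_softmax(self.gnn2(h, adjacency), dim=1)

def accuracy(y_pred, y_true):
    return (y_pred == y_true).float().mean().item()

def train_gnn(model, x, y, adjacency, train_mask, epochs):
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
    model.train()
    losses = []
    for _ in range(epochs):
        optimizer.zero_grad()
        out = model(x, adjacency)  # the adjacency matrix is the extra input
        loss = criterion(out[train_mask], y[train_mask])
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses

@torch.no_grad()
def evaluate_gnn(model, x, y, adjacency, test_mask):
    model.eval()
    out = model(x, adjacency)
    return accuracy(out.argmax(dim=1)[test_mask], y[test_mask])

# Toy data: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3
torch.manual_seed(0)  # arbitrary seed
x = torch.randn(6, 4)
y = torch.tensor([0, 0, 0, 1, 1, 1])
adjacency = torch.eye(6)  # start from the self-loops
for s, t in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    adjacency[s, t] = 1.0
    adjacency[t, s] = 1.0
train_mask = torch.tensor([True, True, False, True, True, False])
test_mask = ~train_mask

model = VanillaGNN(dim_in=4, dim_h=8, dim_out=2)
losses = train_gnn(model, x, y, adjacency, train_mask, epochs=100)
test_acc = evaluate_gnn(model, x, y, adjacency, test_mask)
print(f"final train loss: {losses[-1]:.3f}, test accuracy: {test_acc:.2f}")
```

For the real datasets, x, y, and the masks would come from the loaded data object (for example data_cs.x and data_cs.train_mask), and adjacency from the adjacency matrix creation step above.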

Vanilla Graph Neural network results with node features and network topology
i. Citeseer Accuracy: 62.60%
ii. Facebook Page-Page Accuracy: 84.53%

We can now compare the results of the simple MLP, which used only node features, with those of the Vanilla GNN, which used both node features and the topology of the network.

Using a GNN architecture clearly boosts our accuracy on both datasets. Although our GNN is a simple implementation based on a little linear algebra, the gains are substantial and illustrate the importance of combining node features with the topology of the network.

I hope you enjoyed reading this article and now have clarity on the GNN architecture. We will enhance this architecture in the next article by exploring Graph Convolutional Neural Networks, a variant of GNN.


References:

1. Facebook Page-Page Dataset
https://github.com/Andreaierardi/SocialNetworkAnalysis-project
https://paperswithcode.com/dataset/facebook-page-page
https://snap.stanford.edu/data/gemsec-Facebook.html
https://networkx.org/nx-guides/content/exploratory_notebooks/facebook_notebook.html
https://towardsdatascience.com/multilayer-perceptron-explained-with-a-real-life-example-and-python-code-sentiment-analysis-cb408ee93141

2. Citeseer dataset
https://journalofbigdata.springeropen.com/articles/10.1186/s40537-023-
https://arxiv.org/html/2401.15444v1
https://www.datacamp.com/tutorial/comprehensive-introduction-graph-neural-networks-gnns-tutorial

https://networkrepository.com/citeseer.php

3. Pytorch workflow fundamentals: https://www.learnpytorch.io/01_pytorch_workflow/

Colab notebook: https://colab.research.google.com/drive/14gTL2MdWcnZVa2B5ToCX6AshDmGeC0BU?usp=sharing