Graph Neural Networks Series | Part 1 | An Introduction.

Let’s talk about Graphs:

Published in

The Modern Scientist

24 min readNov 8, 2022

Graphs are one of the most important tools that we use in our lives. They allow us to visualize data and see relationships between different variables. Graphs also allow us to make predictions about future events. Social media platforms such as Facebook and Twitter use graphs to show us our social networks. We can see who our friends are, who our friends are friends with, and what groups we belong to. Medical professionals also use graphs to visualize data and see relationships between different variables. For example, they may use a graph to track the spread of a disease.

What is a Graph ?

A graph is a collection of points, called vertices, and the lines connecting them, called edges. To put it in a formal description A graph G = (V, E) consists of a set V of vertices and a set E of edges, where each edge `e` in E is a pair of vertices `v` and `w` in V.

Take this example below : The Karate Club network. The Karate Club network is an example of a graph. The vertices in this graph represent members of a karate club, and the edges represent relationships between the members.

In order to plot this you can use the NetworkX package one of the famous libraries for GNNs such as (PytorchGeometric, NetworkX,Deep Graph Library (DGL)…etc)

"""===========Karate Club===========Zachary's Karate Club graphData file from:http://vlado.fmf.uni-lj.si/pub/networks/data/Ucinet/UciData.htmZachary W. (1977).An information flow model for conflict and fission in small groups.Journal of Anthropological Research, 33, 452-473."""import matplotlib.pyplot as pltimport networkx as nxG = nx.karate_club_graph()print("Node Degree")for v in G:print(f"{v:4} {G.degree(v):6}")nx.draw_circular(G, with_labels=True)plt.show()

Adjacency matrix

One way to represent a graph is through the adjacency matrix `A` of `G`. Adjacency matrix is a matrix that is used to represent a graph. The matrix is made up of 0s and 1s, where the 0s represent no connection and the 1s represent a connection.. The `(i, j)` entry of `A` is equals to 1 if there is an edge between `v_i` and `v_j`, and 0 otherwise. A[u,v] = 1 if (u,v) is an edge in the graph otherwise A[u,v] = 0. where u and v are two vertices of the graph. Before we go any further we need to explain what a symmetric matrix is. A matrix B is symmetric if B[i,j] = B[j,i] for all i and j. So, for our adjacency symmetric matrix is to be symmetric we need to make sure that

A[u,v] = A[v,u] for all u and v.

Now If the graph contains only undirected edges then A will be a symmetric matrix, but if the graph is directed ( direction matters) then A will not be symmetric. Take a moment to think about this. Below is an example of directed graph vs undirected graph.

Weighted Adjacency Matrix

A graph is said to be weighted if it has weighted edges which means that there are some cost associated with each edge. A weighted adjacency matrix is a matrix where the `(i, j)` entry is equal to the weight of the edge between `v_i` and `v_j`. If there is no edge between `v_i` and `v_j`, then the `(i, j)` entry is equal to 0. The values could be anything like distance between the two vertices, time or any other metric. An example of a weighted adjacency matrix is given below:

Here is a challenge. Try to figure the graph out assuming the values on the edges are purely duration values. Name it however you like, this is just a fun exercise.

What is the difference between homogenous graphs and heterogenous graphs ?

A homogenous graph is a graph where all the vertices have the same degree, and a heterogenous graph is a graph where the vertices have different degrees Or A graph is homogenous if all of its vertices are the same type, and it is heterogenous if there are different types of vertices.

In the context of graph theory, the degree of a vertex in a graph is the number of edges incident to that vertex. In other words, it is the number of other vertices that the vertex is connected to by an edge. The degree of a vertex is denoted by d(v), where v is the vertex.

For example, consider the following graph:

A -- B -- C

The degree of vertex A is 1 because it is connected to only one other vertex (B). The degree of vertex B is 2 because it is connected to vertices A and C. The degree of vertex C is also 1.

Here’s the code to create the graph using NetworkX

import networkx as nx

# Create an empty graph
G = nx.Graph()

# Add nodes A, B, and C
G.add_nodes_from(['A', 'B', 'C'])

# Add edges between the nodes
G.add_edges_from([('A', 'B'), ('B', 'C')])

# Print the adjacency matrix
print(nx.adjacency_matrix(G).todense())

This will output the following adjacency matrix:

[[0 1 0]
 [1 0 1]
 [0 1 0]]

And here’s the code to create the same graph using PyTorch Geometric:

import torch
from torch_geometric.data import Data

# Define the edge indices
edge_index = torch.tensor([[0, 1], [1, 2]], dtype=torch.long)

# Define the node features (in this case, we’ll just use one-hot encoding)
x = torch.eye(3, dtype=torch.float)

# Create the PyTorch Geometric data object
data = Data(x=x, edge_index=edge_index.t().contiguous())

# Print the edge index tensor
print(data.edge_index)the code for creating the graph A — B — C using the Python libraries NetworkX and PyTorch Geometric.

Here’s the code to create the graph using NetworkX:

import networkx as nx
# Create an empty graph
G = nx.Graph()
# Add nodes A, B, and C
G.add_nodes_from(['A', 'B', 'C'])
# Add edges between the nodes
G.add_edges_from([('A', 'B'), ('B', 'C')])
# Print the adjacency matrix
print(nx.adjacency_matrix(G).todense())

This will output the following adjacency matrix:

[[0 1 0]
 [1 0 1]
 [0 1 0]]

And here’s the code to create the same graph using PyTorch Geometric

import torch
from torch_geometric.data import Data

# Define the edge indices
edge_index = torch.tensor([[0, 1], [1, 2]], dtype=torch.long)
# Define the node features (in this case, we'll just use one-hot encoding)
x = torch.eye(3, dtype=torch.float)
# Create the PyTorch Geometric data object
data = Data(x=x, edge_index=edge_index.t().contiguous())
# Print the edge index tensor
print(data.edge_index)

# This will output the following adjacency matrix:
tensor([[0, 1, 1, 2]])

Note that in the PyTorch Geometric code, we use a tensor to represent the edge indices instead of an adjacency matrix. The edge_index tensor has two rows: the first row contains the source nodes of each edge, and the second row contains the destination nodes of each edge. So in this case, the first edge goes from node 0 (A) to node 1 (B), and the second edge goes from node 1 (B) to node 2 (C).

Multi-relational Graphs

A multi-relational graph is a graph that has more than one type of relationship between the vertices. For example, in a social network you may have friends, family, and co-workers all represented by vertices in a graph. The edges in the graph would represent the relationships between the vertices. Here the edge notation type would be something like (v1, v2, type), where type is either “friend”, “family”, or “co-worker”. Formally, a multi-relational graph is a graph G = (V, T, E), where V is a set of vertices, E is a set of edges, and T is a set of edge types. The Adjacency matrix here would be a 3-dimensional matrix A or a `tensor`, where A[v1,v2,type] = 1 if there is an edge of type “type” between v1 and v2.

here’s an example of how you might implement a multi-relational graph using a 3-dimensional adjacency matrix in Python, using the PyTorch library:

import torch

# Define the vertices in the graph
V = ['Alice', 'Bob', 'Charlie', 'Dave']

# Define the types of edges in the graph
T = ['friend', 'family', 'co-worker']

# Define the edges between the vertices, along with their types
E = [('Alice', 'Bob', 'friend'),
     ('Alice', 'Charlie', 'friend'),
     ('Bob', 'Charlie', 'family'),
     ('Charlie', 'Dave', 'co-worker'),
     ('Dave', 'Alice', 'co-worker')]

# Initialize the 3-dimensional adjacency matrix
A = torch.zeros(len(V), len(V), len(T))

# Populate the adjacency matrix based on the edges
for v1, v2, t in E:
    i = V.index(v1)
    j = V.index(v2)
    k = T.index(t)
    A[i, j, k] = 1

# Print the adjacency matrix
print(A)

This will output the following adjacency matrix:

tensor([[[0., 0., 0.],
         [1., 0., 1.],
         [1., 1., 0.],
         [0., 1., 0.]],

        [[1., 0., 1.],
         [0., 0., 1.],
         [1., 0., 0.],
         [0., 0., 0.]],

        [[1., 1., 0.],
         [1., 0., 0.],
         [0., 0., 0.],
         [1., 0., 1.]],

        [[0., 0., 0.],
         [0., 0., 0.],
         [1., 0., 1.],
         [0., 1., 0.]]])

In this example, the 3-dimensional adjacency matrix A has dimensions len(V) x len(V) x len(T), where len(V) is the number of vertices in the graph, and len(T) is the number of types of edges. The value A[i,j,k] is equal to 1 if there is an edge of type T[k] between vertices V[i] and V[j], and 0 otherwise. So in this example, for instance, A[0,1,0] is 1 because there is an edge of type "friend" between Alice (vertex 0) and Bob (vertex 1), while A[0,2,2] is 0 because there is no edge of type "co-worker" between Alice and Charlie.

What is a multiplex graph ?

If you are a beginner I don’t suspect you will be using this type early on in your career but Nonetheless this type of graph is very powerful and you should know about it. A multiplex graph is a graph in which there are multiple types of edges between the same pair of vertices. Furthermore, each type of edge has its own weight. But it introduces Intra-layer and inter-layers. so that different types of edges can be distinguished. Lets take the transportation network as a in example, so for instance in a multiplex transportation network each node might represent a city and each layer might represent a different mode of transportation such as air travel, train, car …etc). Intra-layer edges would then represent cities that are connected by different modes of transportation, while inter-layer edges represents the possibility of switching modes of transportation within a particular city. Simply A multiplex graph is a graph that has multiple layers, with each layer representing a different type of relationship.

One last time,
Intralayer refers to relationships within a layer of a multiplex graph. In other words, it refers to edges that connect nodes within the same layer. For example, in a transportation network multiplex graph, intralayer edges might represent connections between different cities using the same mode of transportation.

Interlayer refers to relationships between layers of a multiplex graph. It refers to edges that connect nodes across different layers. For example, in the transportation network multiplex graph, interlayer edges might represent the ability to switch between different modes of transportation within a city.

To provide an example, consider the following transportation network multiplex graph:

Note that colors show if a city is accessible through mode of transportation (to and from certain cities).

Layer 1: Air Travel
New York ----- Chicago
  |               |
  |               |
  |               |
Los Angeles ----- Miami

Layer 2: Train
New York ----- Chicago
  |               |
  |               |
  |               |
Los Angeles ----- Miami

Layer 3: Car
New York ----- Chicago
  |               |
  |               |
  |               |
Los Angeles ----- Miami

In this graph, there are three layers representing different modes of transportation: air travel, train, and car. Within each layer, there are edges representing connections between cities that can be reached using the same mode of transportation (e.g., New York and Chicago are connected by air travel, train, and car). Between layers, there are edges representing the ability to switch between modes of transportation within a particular city (e.g., New York is connected to Chicago and Los Angeles by both air travel and train).

So in this example, intralayer edges would be edges within each layer (e.g., New York to Chicago by air travel), while interlayer edges would be edges between layers (e.g., New York to Chicago by both air travel and train).

here’s an example of how you might implement the transportation network multiplex graph using PyTorch Geometric:

Note

In PyTorch Geometric, the Data object represents a graph as a collection of nodes and edges, along with associated attributes such as node features and edge weights. The edge_index tensor is a required attribute of the Data object, and it specifies the edges in the graph.

import torch
from torch_geometric.data import Data

# Define the vertices in the graph
V = ['New York', 'Chicago', 'Los Angeles', 'Miami']

# Define the types of edges in the graph
T = ['air travel', 'train', 'car']

# Define the edges between the vertices for each layer, along with their types
E1 = [(0, 1), (2, 3)]
E2 = [(0, 1), (2, 3)]
E3 = [(0, 1), (2, 3)]

# Define the edge index tensor for each layer
edge_index_1 = torch.tensor(list(zip(*E1)), dtype=torch.long)
edge_index_2 = torch.tensor(list(zip(*E2)), dtype=torch.long)
edge_index_3 = torch.tensor(list(zip(*E3)), dtype=torch.long)

# Define the node features (in this case, we'll just use one-hot encoding)
x = torch.eye(len(V), dtype=torch.float)

# Create the PyTorch Geometric data objects for each layer
data_1 = Data(x=x, edge_index=edge_index_1.t().contiguous())
data_2 = Data(x=x, edge_index=edge_index_2.t().contiguous())
data_3 = Data(x=x, edge_index=edge_index_3.t().contiguous())

# Print the edge index tensor for each layer
print(f'Layer 1 ({T[0]}):')
print(data_1.edge_index)
print(f'Layer 2 ({T[1]}):')
print(data_2.edge_index)
print(f'Layer 3 ({T[2]}):')
print(data_3.edge_index)

This will output the following edge index tensors for each layer:

Layer 1 (air travel):
tensor([[0, 1, 2, 3]])
Layer 2 (train):
tensor([[0, 1, 2, 3]])
Layer 3 (car):
tensor([[0, 1, 2, 3]])

In this example, we first define the vertices in the graph and the types of edges in the graph, just like before. Then we define the edges between the vertices for each layer (in this case, just pairs of nodes since the graph is symmetric). We then create a separate PyTorch Geometric Data object for each layer of the multiplex graph, with the same one-hot encoded node features in each layer. The edge_index tensor in each Data object is constructed from the corresponding edge list. Finally, we print the edge_index tensor for each layer. This can get confusing so allow me to explain what is edge_index

edge_index = torch.tensor([[0, 1, 2],
                           [1, 2, 0]])

here are a few examples of edge_index tensors and their corresponding graphs:

In this example, there are three nodes and three edges in the graph. The edge_index tensor has two rows and three columns, which means there are three edges in the graph. The first row contains the source nodes of each edge (0, 1, and 2), and the second row contains the destination nodes of each edge (1, 2, and 0). This edge_index tensor corresponds to the following graph:

0 --- 1
 \   /
  \ /
   2

Example 2:

edge_index = torch.tensor(
[ [0, 1, 2, 2, 3], ← Source nodes
 [1, 2, 0, 3, 2]]) ← Targest/ Destination nodes

In this example, there are four nodes and five edges in the graph. The edge_index tensor has two rows and five columns, which means there are five edges in the graph. The first row contains the source nodes of each edge (0, 1, 2, 2, and 3), and the second row contains the destination nodes of each edge (1, 2, 0, 3, and 2). This edge_index tensor corresponds to the following graph:

0 --- 1
|     |
|     |
2 --- 3
  /   
  2

Optional section

here’s an example of how you might connect the layers using PyTorch Geometric:

import torch
from torch_geometric.data import Data

# Define the vertices in the graph
V = ['New York', 'Chicago', 'Los Angeles', 'Miami']

# Define the types of edges in the graph
T = ['air travel', 'train', 'car']

# Define the edges between the vertices for each layer, along with their types
E1 = [('New York', 'Chicago', 'air travel'),
      ('Los Angeles', 'Miami', 'air travel')]

E2 = [('New York', 'Chicago', 'train'),
      ('Los Angeles', 'Miami', 'train')]

E3 = [('New York', 'Chicago', 'car'),
      ('Los Angeles', 'Miami', 'car')]

# Initialize the lists to hold the edge indices and node features for each layer
edge_indices = []
node_features = []

# Define a dictionary to map each vertex name to a unique ID
vertex_id = {v: i for i, v in enumerate(V)}

# Define the edge indices and node features for each layer
for k, t in enumerate(T):
    layer_edges = []
    for e in eval(f'E{k+1}'):
        i = vertex_id[e[0]]
        j = vertex_id[e[1]]
        layer_edges.append([i, j])
    edge_indices.append(torch.tensor(layer_edges, dtype=torch.long).t().contiguous())
    node_features.append(torch.eye(len(V), dtype=torch.float))

# Create the PyTorch Geometric data object
data = Data(x=torch.cat(node_features, dim=1), edge_index=torch.cat(edge_indices, dim=1))

# Print the edge index tensor
print(data.edge_index)

here’s an example of how you might connect the layers using PyTorch Geometric:

import torch
from torch_geometric.data import Data

# Define the vertices in the graph
V = ['New York', 'Chicago', 'Los Angeles', 'Miami']# Define the types of edges in the graph
T = ['air travel', 'train', 'car']# Define the edges between the vertices for each layer, along with their types
E1 = [('New York', 'Chicago', 'air travel'),
      ('Los Angeles', 'Miami', 'air travel')]E2 = [('New York', 'Chicago', 'train'),
      ('Los Angeles', 'Miami', 'train')]E3 = [('New York', 'Chicago', 'car'),
      ('Los Angeles', 'Miami', 'car')]# Initialize the lists to hold the edge indices and node features for each layer
edge_indices = []
node_features = []# Define a dictionary to map each vertex name to a unique ID
vertex_id = {v: i for i, v in enumerate(V)}# Define the edge indices and node features for each layer
for k, t in enumerate(T):
    layer_edges = []
    for e in eval(f'E{k+1}'):
        i = vertex_id[e[0]]
        j = vertex_id[e[1]]
        layer_edges.append([i, j])
    edge_indices.append(torch.tensor(layer_edges, dtype=torch.long).t().contiguous())
    node_features.append(torch.eye(len(V), dtype=torch.float))# Create the PyTorch Geometric data object
data = Data(x=torch.cat(node_features, dim=1), edge_index=torch.cat(edge_indices, dim=1))# Print the edge index tensor
print(data.edge_index)

This will output the following edge index tensor:

tensor([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 4, 6, 9, 0, 3, 5, 8, 1, 3, 6, 8],
        [1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 4, 1, 9, 6, 3, 0, 8, 5, 3, 1, 8, 6]])

Machine Learning on Graphs.

Deep learning on graphs is a relatively new field of research that is still being actively developed. There are many different ways to perform deep learning on graphs, and the best approach for a particular problem depends on the structure of the data and the desired output. The question is do we still bucket them into supervised learning, unsupervised learning, RL …. etc ?. It is not that easy sometimes as sometimes the definition is blurred. In the next section you will tell me what you think.

Node classification

Node classification is a process of classifying the nodes in a graph into different categories. Given a small number of manually labelled nodes, the goal is to automatically label the remaining nodes. This can be useful, for instance, to automatically label the nodes in a social network according to their roles.

At first glance node classification appears to be a straightforward supervised learning (Right ?), but there is a difference. The most important point is that they are not independent and identically distrusted. Node classification completely destroys this independence and identically distributed assumption (i.i.d). Infact in GNNs in general we mainly leverage these connections. One idea is to exploit homophily, which is the tendency for nodes to share attributes with their neigbors in the graph. For example, two nodes that are connected in a social network are likely to share the same attributes, such as age, demographic, or interests. Beyond homophily there is structural equivalence which is the tendency for nodes to have similar roles in the graph. For example, two nodes that have similar numbers of connections, or that are connected to similar nodes, are likely to have the same label. On the other end there is heterophily (Like Gender for instance, most people in the world are straight ergo we can leverage this to our advantage)which is what you might expect opposites attract. Whether it is the first or the latter, it is still pattern based on connections.

So did you guess the type yet ? don’t feel too bad. it is often referred to as Semi-supervised network. Tada !!

There are a few different ways to classify nodes in a graph. One approach is to use the degree of the node, which is the number of edges incident to the node. Another approach is to use the betweenness centrality of the node, which is a measure of how often the node lies on shortest paths between other nodes. Finally, the clustering coefficient of a node is a measure of how tightly connected the node’s neighbors are.

Here is an example in PyG:

https://github.com/OMS1996/pytorch-geometric/blob/main/2_Node_Classification.ipynb

Transductive Vs inductive

There are two ways to define the node classification problem:

The first is the transductive setting, in which the test set is known when the model is trained. This is a more realistic scenario, since in many applications the test set is not known in advance. For example, when classifying the nodes in a social network, the test set would be the nodes that are added to the network after the model is trained.

The second is the inductive setting, in which the test set is not known when the model is trained. This is a less realistic scenario, but it is easier to train models in this setting. For example, when classifying the nodes in a social network, the test set would be the nodes that are not in the network when the model is trained. So the previous semi-supervised learning example is in fact inductive.

if you are still confused. Transduction is reasoning from observed, specific (training) cases to specific (test) cases. In contrast, induction is reasoning from observed training cases to general rules, which are then applied to the test cases. Like so:

Transductive: Let’s say we have a social network with 10 nodes, and we want to classify the nodes into two classes: “friend” and “acquaintance”. We know that 5 of the nodes will be in the test set, and we want to train a model to classify these nodes. In this case, we have specific information about the test set when we train the model, so this is a transductive setting.

Inductive: Let’s say we have a social network with 10 nodes, and we want to classify the nodes into two classes: “friend” and “acquaintance”. We don’t know which nodes will be in the test set, but we have a large dataset of social networks with similar characteristics. We can use this dataset to train a model to classify nodes in any social network, even ones that we haven’t seen before. In this case, we don’t have specific information about the test set when we train the model, so this is an inductive setting.

In both cases, the goal is to train a model to accurately classify nodes into different classes based on their characteristics and relationships with other nodes in the graph. The difference is whether we have specific information about the test set when we train the model or not.

The choice between transductive and inductive node classification depends on the specific problem and the available data. Here are some factors to consider when deciding which approach to use:

Transductive:

Better when the test set is already known, and it is representative of the overall population of nodes.
Better when the graph is small and it is feasible to label all of the nodes.
Better when the model needs to be evaluated on a specific set of nodes with known labels.

Inductive:

Better when the test set is not known when the model is trained, or when it is not representative of the overall population of nodes.
Better when the graph is large and it is not feasible to label all of the nodes.
Better when the model needs to be evaluated on a variety of different graphs.

In general, transductive classification is more appropriate when we have specific knowledge about the test set and we want to accurately classify those specific nodes. Inductive classification is more appropriate when we want to generalize the model to classify nodes in new and unseen graphs. However, the choice between these approaches ultimately depends on the specific problem and the available data.

Questions to ask yourself to figure out the type of problem.

Here is a set of questions you can ask yourself to determine whether a learning problem is transductive or inductive:

Do I have specific knowledge about the test set when training the model?

Transductive: If you have specific knowledge about the test set, such as knowing which instances will be in the test set, it suggests a transductive setting.
Example: You are classifying nodes in a social network, and you already know the specific set of nodes that will be in the test set.

2. Can the model be trained using a subset of the available data?

Inductive: If the model can be trained using a subset of the available data without needing specific knowledge about the test set, it indicates an inductive setting.
Example: You are predicting stock prices, and you train the model using historical data but don’t know which stocks will be in the test set.

3. Is the performance evaluation focused on specific, known instances?

Transductive: If the performance evaluation is primarily based on how well the model predicts specific, known instances, it aligns with a transductive setting.
Example: You are classifying images, and the performance of the model is evaluated on a pre-defined set of images.

Real-world examples

The classification problem of predicting drug properties or activities can be either transductive or inductive, depending on the specific scenario and available data. Let’s explore both possibilities:

Transductive Drug Prediction: If you have a specific set of drugs for which you want to predict properties or activities, and you already know the identities of those drugs during the model training phase, then it falls into a transductive setting. In this case, the model can learn from the known drug instances and their corresponding properties or activities to make predictions specifically for those drugs in the test set. The goal is to accurately predict properties or activities for the known drugs.

Inductive Drug Prediction: If the goal is to generalize and make predictions on new, unseen drugs, where the identities of the drugs in the test set are unknown during training, then it falls into an inductive setting. In this case, the model learns from a training set of drugs with known properties or activities, and it aims to capture patterns and relationships from that training data to predict properties or activities for unseen drugs. The focus is on the ability of the model to generalize its predictions to new drug instances.

Which one is picked most of the time ?

In practice, the choice between transductive and inductive learning for node classification depends on various factors such as the specific problem, available data, and the goals of the analysis. There is no one-size-fits-all answer, and both approaches are used depending on the context. However, it is worth noting that inductive learning is more commonly used in many real-world scenarios. Here are a few reasons for the prevalence of inductive learning:

Generalization: Inductive learning focuses on generalizing patterns learned from the training data to unseen test data. This is often desired as it allows the model to make predictions on new, previously unseen nodes in the graph.

Scalability: Inductive learning is more scalable when dealing with large graphs, as it does not require knowledge of the entire graph or specific test set during training. It can handle cases where new nodes are continuously added to the graph without retraining the model.

Realistic scenario: In many applications, the test set is not known in advance or it evolves over time. Inductive learning aligns better with such scenarios, making it more practical in real-world situations.

That said, transductive learning still has its merits and can be useful in specific cases where the test set is known or fixed. Ultimately, the choice between transductive and inductive learning should be based on a careful analysis of the problem, available data, and the specific requirements of the application at hand.

Where do they fall short ?

Transductive learning has the advantage of having access to information about the test set during training, which can potentially lead to higher accuracy on the specific test set. However, it may suffer from overfitting if the test set is not representative of the overall population or if there is a significant difference between the training and test data.

Inductive learning, on the other hand, aims to generalize patterns learned from the training data to unseen test data. It focuses on the ability to classify nodes in new, previously unseen graphs. This can be advantageous when dealing with diverse and evolving test sets. However, the generalization may come at the cost of lower accuracy on the specific test set.

Link prediction

Link prediction Also known as graph completion and relational inference i.e. relational prediction. It is one of the most popular methods. The standard setup is that we are given a set of nodes V and an incomplete set of edges between these nodes E for training that are a subset of E Where is E is edges. Our goal is to use this partial information to infer the missing edges. The complexity of this task is dependent on the type of graph and data. Like node classification , link prediction also blurs the boundaries of traditional of machine learning categories. Often being referred to as both supervised and unsupervised and it requires inductive biases that are specific to the graph domain. In addition, like node classification, there are many variants of relational prediction. There are two main types of link prediction methods: local methods and global methods.

Local methods make predictions based on the immediate neighborhood of a node, while global methods make predictions based on the entire graph. There are also two main types of link prediction tasks:

Predicting the existence of an edge: this is a binary classification task, where the goal is to predict whether an edge exists between two nodes.

2. Predicting the weight of an edge: this is a regression task, where the goal is to predict the weight of an edge between two nodes.

link prediction is typically considered an inductive learning problem rather than a transductive one.

In link prediction, the goal is to predict the existence or likelihood of connections (links) between nodes in a network based on the information available from the observed connections. The task involves learning patterns and features from the existing network structure and using them to make predictions about unobserved or future connections.

Inductive learning is suitable for link prediction because it focuses on generalizing patterns learned from the training data to make predictions on unseen data. The model is trained on a subset of the network’s edges and then evaluated on the remaining edges or on a separate test set. It aims to capture underlying relationships and structural properties of the network to infer missing or future connections.

In contrast, transductive learning would require specific knowledge about the test set, including the actual edges to be predicted. This is not typically available in link prediction tasks, as the objective is to predict new links in the network rather than predict the presence or absence of existing edges.

Therefore, link prediction is considered an inductive learning problem, leveraging patterns and information learned from the training network to infer missing connections in a generalizable manner.

Clustering and community detection

Both node classification and relational prediction (Link Prediction) infer missing information about the graph data this could be supervised learning A.K.A community detection or on the other hand, the graph analog of unsupervised learning. The goal is to infer latent community structures like who worked with who on what and possibly when ?.

Graph Classification

Graph Classification This involves regression and classification . For instance how toxic is this molecule Give me a number ?! . Or it could simply be cluster the toxic from the non toxic. Of all ML tasks classification and regression are the easiest.

Here is an example

https://github.com/OMS1996/pytorch-geometric/blob/main/3_Graph_Classification.ipynb

INTRODUCTION BY EXAMPLE

We shortly introduce the fundamental concepts of PyG through self-contained examples. At its core, PyG provides the following main features:

Data Handling of Graphs 

A graph is used to model pairwise relations (edges) between objects (nodes). A single graph in PyG is described by an instance of torch_geometric.data.Data, which holds the following attributes by default:

data.x: Node feature matrix with shape [num_nodes, num_node_features]
data.edge_index: Graph connectivity in COO format with shape [2, num_edges] and type torch.long
data.edge_attr: Edge feature matrix with shape [num_edges, num_edge_features]
data.y: Target to train against (may have arbitrary shape), e.g., node-level targets of shape [num_nodes, *] or graph-level targets of shape [1, *]
data.pos: Node position matrix with shape [num_nodes, num_dimensions]

None of these attributes are required. In fact, the Data object is not even restricted to these attributes. We can, e.g., extend it by data.face to save the connectivity of triangles from a 3D mesh in a tensor with shape [3, num_faces] and type torch.long.

Note

PyTorch and torchvision define an example as a tuple of an image and a target. We omit this notation in PyG to allow for various data structures in a clean and understandable way.

We show a simple example of an unweighted and undirected graph with three nodes and four edges. Each node contains exactly one feature:

import torch
from torch_geometric.data import Data

edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]], dtype=torch.long)
x = torch.tensor([[-1], [0], [1]], dtype=torch.float)data = Data(x=x, edge_index=edge_index)
>>> Data(edge_index=[2, 4], x=[3, 1])

Note that edge_index, i.e. the tensor defining the source and target nodes of all edges, is not a list of index tuples. If you want to write your indices this way, you should transpose and call contiguous on it before passing them to the data constructor:

import torch
from torch_geometric.data import Data

edge_index = torch.tensor([[0, 1],
                           [1, 0],
                           [1, 2],
                           [2, 1]], dtype=torch.long)
x = torch.tensor([[-1], [0], [1]], dtype=torch.float)data = Data(x=x, edge_index=edge_index.t().contiguous())
>>> Data(edge_index=[2, 4], x=[3, 1])

Although the graph has only two edges, we need to define four index tuples to account for both directions of a edge.

Note

You can print out your data object anytime and receive a short information about its attributes and their shapes.

Besides holding a number of node-level, edge-level or graph-level attributes, Data provides a number of useful utility functions, e.g.:

print(data.keys)
>>> ['x', 'edge_index']

print(data['x'])
>>> tensor([[-1.0],
            [0.0],
            [1.0]])for key, item in data:
    print(f'{key} found in data')
>>> x found in data
>>> edge_index found in data'edge_attr' in data
>>> Falsedata.num_nodes
>>> 3data.num_edges
>>> 4data.num_node_features
>>> 1data.has_isolated_nodes()
>>> Falsedata.has_self_loops()
>>> Falsedata.is_directed()
>>> False# Transfer data object to GPU.
device = torch.device('cuda')
data = data.to(device)

References:

[1] Graph Representation Learning.

[2] https://www.merriam-webster.com/dictionary/transduction

[3] https://theaisummer.com/gnn-architectures/

[4] Pytorch Geometric documentation.

Images used:

Huge thanks to the PyG Team !

Graph Neural Networks Series | Part 1 | An Introduction.

Let’s talk about Graphs:

What is a Graph ?

Adjacency matrix

Weighted Adjacency Matrix

What is the difference between homogenous graphs and heterogenous graphs ?

Multi-relational Graphs

What is a multiplex graph ?

Example 2:

Machine Learning on Graphs.

Node classification

Transductive Vs inductive

Questions to ask yourself to figure out the type of problem.

Real-world examples

Which one is picked most of the time ?

Where do they fall short ?

Link prediction

Clustering and community detection

Graph Classification

INTRODUCTION BY EXAMPLE

Data Handling of Graphs

References:

Written by Omar Hussein

Data Handling of Graphs 