Graph Neural Networks Series | Part 2 | Graph Statistics

Omar Hussein
The Modern Scientist
28 min read · Nov 19, 2022

NOTE: This article explains the foundations of graph methods before using any deep learning. If you do not seek this level of depth, read Part 1, then from Part 2 read only the "Node level features / stats" section and skip to Part 3. Having said that, I would advise you to go through all of it and try to understand as much as possible, because it really helps your intuition and how you reason about graphs in general.

Before deep learning, we used to hand-craft or extract our features, then bundle them all together into a collective set of statistics to be fed into a machine learning model.

Case study graph: Florentine Families

The Medici were an aristocratic Italian family of powerful merchants and bankers who ruled Florence in the 15th century. The Florentine families network is often used to benchmark new centrality indices. The premise is that the Medici should always emerge as one of the most central families (if not the most central).

Questions for later:

  • Are the Medici universally the most central family?
  • Can centrality serve as an explanatory variable? (Can centrality explain the wealth attribute?)
  • Is the Pazzi family important?

Node level features / stats

Let’s introduce some node level features that we may be interested in and we can use them in our future ML models.

This abridged table provides a quick overview of the definitions and main focuses of each node-level statistic. It can serve as a handy reference for comparing and understanding the different metrics in a concise format.

Node degree

Node degree is defined as the number of edges connected to that node.

Degree centrality

Degree centrality is defined as the number of edges connected to that node divided by the maximum number of edges that could possibly be connected to that node.

Check out the NetworkX documentation. NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.

Here is a simple example

Code

import networkx as nx

# Create the Florentine Families graph
G = nx.Graph()

# Add nodes representing the families
families = [
    "Medici",
    "Albizzi",
    "Strozzi",
    "Guadagni",
    "Salviati",
    "Pazzi",
    "Acciaiuoli",
    "Barbadori",
    "Bischeri",
    "Castellani",
    "Ginori",
    "Lamberteschi",
    "Peruzzi",
    "Ridolfi",
    "Tornabuoni",
]
G.add_nodes_from(families)

# Add edges representing connections between families
connections = [
    ("Medici", "Salviati"),
    ("Medici", "Tornabuoni"),
    ("Medici", "Guadagni"),
    ("Medici", "Bischeri"),
    ("Medici", "Acciaiuoli"),
    ("Medici", "Barbadori"),
    ("Medici", "Ridolfi"),
    ("Medici", "Albizzi"),
    ("Albizzi", "Ginori"),
    ("Strozzi", "Ridolfi"),
    ("Strozzi", "Peruzzi"),
    ("Strozzi", "Bischeri"),
    ("Guadagni", "Lamberteschi"),
    ("Guadagni", "Tornabuoni"),
    ("Salviati", "Pazzi"),
    ("Pazzi", "Barbadori"),
    ("Pazzi", "Ridolfi"),
]
G.add_edges_from(connections)

# Calculate degree centrality for each node
degree_centrality = nx.degree_centrality(G)

# Print node degrees and degree centrality
for family in families:
    degree = G.degree[family]
    centrality = degree_centrality[family]
    print(f"Family: {family}, Degree: {degree}, Degree Centrality: {centrality}")

Output

Family: Medici, Degree: 8, Degree Centrality: 0.5714285714285714
Family: Albizzi, Degree: 2, Degree Centrality: 0.14285714285714285
Family: Strozzi, Degree: 3, Degree Centrality: 0.21428571428571427
Family: Guadagni, Degree: 3, Degree Centrality: 0.21428571428571427
Family: Salviati, Degree: 2, Degree Centrality: 0.14285714285714285
Family: Pazzi, Degree: 3, Degree Centrality: 0.21428571428571427
Family: Acciaiuoli, Degree: 1, Degree Centrality: 0.07142857142857142
Family: Barbadori, Degree: 2, Degree Centrality: 0.14285714285714285
Family: Bischeri, Degree: 2, Degree Centrality: 0.14285714285714285
Family: Castellani, Degree: 0, Degree Centrality: 0.0
Family: Ginori, Degree: 1, Degree Centrality: 0.07142857142857142
Family: Lamberteschi, Degree: 1, Degree Centrality: 0.07142857142857142
Family: Peruzzi, Degree: 1, Degree Centrality: 0.07142857142857142
Family: Ridolfi, Degree: 3, Degree Centrality: 0.21428571428571427
Family: Tornabuoni, Degree: 2, Degree Centrality: 0.14285714285714285

In this example, we first create a NetworkX graph G and add nodes representing the Florentine families using the add_nodes_from() method. We then add edges between the families using the add_edges_from() method.

To calculate the degree centrality for each node, we use the degree_centrality() function from NetworkX, which returns a dictionary where the keys are the node names and the values are the degree centrality scores.

Finally, we iterate over the families and print the degree and degree centrality for each family.

Here’s a plain and simple explanation of the two concepts:

Node Degree:

  • Node degree refers to the number of edges connected to a particular node in a graph.
  • It measures the immediate connectivity or the number of direct connections a node has.
  • In other words, node degree tells us how many other nodes are directly connected to a specific node.
  • Node degree can provide insights into the local importance or prominence of a node within a network.

Degree Centrality:

  • Degree centrality is a measure of the relative importance or centrality of a node within a network based on its degree.
  • It is calculated by dividing the node’s degree (number of edges connected to the node) by the maximum possible degree in the network.
  • Degree centrality provides a normalized measure that takes into account the size of the network and allows for comparisons across different networks.

Suppose we have a social network represented by the following graph:

  A
 / \
B---C
 \ /
  D

In this network, we have four nodes: A, B, C, and D.

To calculate the degree centrality of each node, we follow these steps:

Calculate the degree of each node, which is the number of edges connected to the node:

  • Node A: Degree = 2 (connected to nodes B and C)
  • Node B: Degree = 3 (connected to nodes A, C, and D)
  • Node C: Degree = 3 (connected to nodes A, B, and D)
  • Node D: Degree = 2 (connected to nodes B and C)

Then determine the maximum possible degree in the network. With four nodes, the maximum possible degree is n − 1 = 3, which nodes B and C actually attain.

Finally, calculate the degree centrality for each node by dividing its degree by the maximum possible degree:

  • Node A: Degree Centrality = 2 / 3 ≈ 0.67
  • Node B: Degree Centrality = 3 / 3 = 1.0
  • Node C: Degree Centrality = 3 / 3 = 1.0
  • Node D: Degree Centrality = 2 / 3 ≈ 0.67

It is often used to identify the most connected or influential nodes in a network. Nodes with higher degree centrality are considered more central or influential within the network.
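The arithmetic above can be double-checked in a few lines of plain Python (no graph library needed; the edge list encodes the A–B–C–D sketch):

```python
# Degree centrality by hand for the A-B-C-D example: degree / (n - 1)
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D")]
nodes = {"A", "B", "C", "D"}

# degree = number of edges touching the node
degree = {v: sum(v in e for e in edges) for v in nodes}

# maximum possible degree is n - 1 = 3, so divide by it
centrality = {v: degree[v] / (len(nodes) - 1) for v in nodes}

print(centrality)  # B and C reach 1.0; A and D sit at about 0.67
```

This matches what NetworkX's degree_centrality would return for the same graph, since it uses the same n − 1 normalization.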

What are eigenvectors and eigenvalues, and why are they important for graphs?

Eigenvectors and eigenvalues are mathematical concepts that are important for understanding how certain types of graphs behave. An eigenvector is a vector whose direction is unchanged when a linear transformation is applied to it; it is only scaled. The corresponding eigenvalue is the scalar factor by which that eigenvector is scaled. These concepts matter for graphs because they describe how quantities defined on the graph change when certain operations, such as repeated multiplication by the adjacency matrix, are applied. Furthermore, eigenvectors can be used to find the equilibrium states of processes on a graph: states that are left unchanged (up to scale) by further applications of the operation.

Let’s understand eigenvectors and eigenvalues in the context of real-world graphs with a simplified example.

Imagine we have a social network where each person is represented as a node, and an edge exists between two nodes if those individuals are friends. Now, let’s consider a scenario where we want to identify the most influential individuals in the network.

Eigenvectors and eigenvalues come into play when we analyze the graph’s adjacency matrix. The adjacency matrix represents the connections between nodes, where each element indicates whether there is an edge between two nodes.

Eigenvectors represent special vectors that remain in the same direction, up to a scalar multiple, when multiplied by the adjacency matrix. In our social network example, an eigenvector could represent a set of individuals who are highly connected to each other.

Eigenvalues are associated with eigenvectors and represent the scalar values by which the eigenvectors are scaled. In the context of graphs, eigenvalues provide insights into the importance or centrality of certain groups of nodes.

For instance, if we compute the eigenvectors and eigenvalues of the adjacency matrix, we can identify groups of individuals who have strong connections within themselves. The corresponding eigenvalue indicates the importance of that group in the network. Nodes with higher eigenvalues are considered more influential or central in the graph.

By analyzing eigenvectors and eigenvalues, we can identify key groups or clusters in the network, understand the flow of influence, and determine the most influential individuals.

Example Graph

Suppose we have a directed graph representing a social network with 4 individuals (nodes) and their connections (edges) as follows:

   A ----> B
   ^      / \
   |     v   v
   |     C <-> D
   +-----------+

(edges: A → B, B → C, B → D, C → D, D → C, D → A)

To calculate the eigenvectors and eigenvalues, we need to compute the adjacency matrix of the graph.

Adjacency Matrix: The adjacency matrix represents the connections between nodes. In this case, the adjacency matrix would be:

    A  B  C  D
A   0  1  0  0
B   0  0  1  1
C   0  0  0  1
D   1  0  1  0

Now, we can use numerical methods or linear algebra libraries to calculate the eigenvectors and eigenvalues of the adjacency matrix.

Python Example: Here’s how you can calculate the eigenvectors and eigenvalues using the NumPy library in Python:

import numpy as np

adjacency_matrix = np.array([[0, 1, 0, 0],
                             [0, 0, 1, 1],
                             [0, 0, 0, 1],
                             [1, 0, 1, 0]])

eigenvalues, eigenvectors = np.linalg.eig(adjacency_matrix)

print("Eigenvalues:")
print(eigenvalues)
print()
print("Eigenvectors:")
print(eigenvectors)

Output

Eigenvalues (rounded to four decimals; the exact ordering and formatting from np.linalg.eig may differ):

[ 1.4656+0.j  -1.0000+0.j  -0.2328+0.7926j  -0.2328-0.7926j]

Eigenvectors: the column paired with the dominant eigenvalue 1.4656 is approximately (0.425, 0.623, 0.370, 0.543) for nodes A, B, C, D, and the column paired with −1 is approximately (0, 0, 0.707, −0.707).

In this example, we obtained the eigenvalues and eigenvectors of the adjacency matrix. Each eigenvalue corresponds to an eigenvector, and together they provide insights into the graph’s structure and centrality.

Note that eigenvalues can be complex numbers when the graph is directed, because the adjacency matrix is then not symmetric; for undirected graphs the matrix is symmetric and all eigenvalues are real. The eigenvector associated with the largest real eigenvalue encodes the centrality or influence of the nodes.

By analyzing the eigenvectors and eigenvalues, we can gain insights into the most influential nodes, detect clusters or communities, and understand the network’s overall structure and dynamics.
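As a sanity check on the example above, every eigenpair returned by np.linalg.eig should satisfy A·v = λ·v up to floating-point error, and the dominant eigenvalue of this particular matrix is about 1.4656:

```python
import numpy as np

adjacency_matrix = np.array([[0, 1, 0, 0],
                             [0, 0, 1, 1],
                             [0, 0, 0, 1],
                             [1, 0, 1, 0]], dtype=float)

eigenvalues, eigenvectors = np.linalg.eig(adjacency_matrix)

# each column of `eigenvectors` is paired with one entry of `eigenvalues`
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    assert np.allclose(adjacency_matrix @ v, eigenvalues[i] * v)

# the dominant (largest real part) eigenvalue of this matrix
print(round(max(eigenvalues.real), 4))
```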

What insights can they provide me that the node degree and degree centrality cannot provide me ?

Eigenvalues and eigenvectors provide additional insights beyond node degree and degree centrality. While node degree and degree centrality focus on the local connectivity of individual nodes, eigenvalues and eigenvectors capture global properties of the graph.

Here are a few insights that eigenvalues and eigenvectors can provide:

  1. Graph Connectivity: The eigenvalues of the graph Laplacian indicate connectivity: the number of zero eigenvalues equals the number of connected components, so more than one zero eigenvalue means the graph is disconnected into separate components.
  2. Graph Structure: Eigenvectors corresponding to the largest eigenvalues can reveal the overall structure or patterns within the graph. These eigenvectors can highlight influential nodes or communities that are not easily identified by node degree or degree centrality alone.
  3. Centrality Measures: Eigenvector centrality is a measure of node importance that considers both the node’s direct connections and the importance of its neighboring nodes. It assigns higher scores to nodes that are connected to other highly central nodes. This centrality measure can provide insights into nodes that have indirect influence or bridge different parts of the graph.
  4. Network Dynamics: Eigenvectors can be used to understand the stability or equilibrium of network dynamics. For example, in networks with diffusion or information spreading, the dominant eigenvector can represent the steady-state distribution or the long-term behavior of the system.
  5. Community Detection: Eigenvectors can be used in community detection algorithms to identify groups of nodes that have similar connectivity patterns. Community structure analysis based on eigenvalues and eigenvectors can uncover clusters or subgraphs within the larger network.

THIS IS HARD TO UNDERSTAND, SO LET ME GIVE YOU AN EXAMPLE
Here are some examples that illustrate the insights provided by eigenvalues and eigenvectors:

  1. PageRank Algorithm: The PageRank algorithm, used by search engines like Google, assigns importance scores to web pages based on their connectivity. It utilizes the concept of eigenvectors to determine the importance of each page in a large web graph. The eigenvector corresponding to the largest eigenvalue helps identify influential web pages that have many incoming links from other important pages.
  2. Community Detection: Spectral clustering, a popular community detection algorithm, leverages the eigenvalues and eigenvectors of a graph’s Laplacian matrix. The eigenvectors provide insights into the connectivity patterns within communities, allowing the algorithm to partition the graph into cohesive groups of nodes.
  3. Structural Analysis: Eigenvalues and eigenvectors can help identify important structural properties of a graph. For instance, the second smallest eigenvalue of the Laplacian matrix, known as the algebraic connectivity, is associated with graph connectivity and can help identify graph cuts or bridges.
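To make point 3 concrete, here is a small NumPy sketch (assumption: a hypothetical 4-node path graph A–B–C–D) that computes the algebraic connectivity and the associated Fiedler vector from the Laplacian:

```python
import numpy as np

# Hypothetical 4-node path graph A - B - C - D
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

L = np.diag(A.sum(axis=1)) - A        # Laplacian: degree matrix minus adjacency
eigvals, eigvecs = np.linalg.eigh(L)  # eigh: L is symmetric, eigenvalues ascend

algebraic_connectivity = eigvals[1]   # second-smallest eigenvalue (2 - sqrt(2) here)
fiedler = eigvecs[:, 1]               # its sign change marks where the path splits
print(round(algebraic_connectivity, 4))
```

The smallest eigenvalue is always 0 for a connected graph; the signs of the Fiedler vector's entries put A, B on one side and C, D on the other, which is exactly the cut through the middle edge.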

By analyzing the eigenvector associated with the second smallest eigenvalue (the Fiedler vector), we can gain insights into the graph's connectivity patterns. The signs of its entries split the nodes into two weakly connected groups, so it highlights the nodes and edges that bridge different parts of the graph or act as connectors between clusters. Removing those bridging nodes or edges would disconnect the graph or separate its components.

This information can be valuable in various applications, such as identifying critical nodes or edges in transportation networks, finding vulnerabilities in communication networks, or understanding the resilience of social networks. By leveraging the algebraic connectivity and associated eigenvectors, we can gain a deeper understanding of the graph’s structural properties and make informed decisions about network design, optimization, or security.

Interpreting the numbers associated with eigenvalues and eigenvectors depends on the specific context and application. Here are some general guidelines for interpretation:

Eigenvalues:

  • Magnitude: The magnitude of an eigenvalue represents its importance or influence in the graph. Larger eigenvalues indicate greater significance or dominance.
  • Sign: The sign of an eigenvalue may have different interpretations depending on the context. In some cases, it may indicate positive or negative correlations between nodes or attributes.
  • Relative Comparison: Comparing eigenvalues within the same graph can provide insights into the relative importance or centrality of different nodes or components.

Eigenvectors:

  • Magnitude: The magnitude of an eigenvector entry reflects the relative importance or influence of a node or attribute in the graph. Larger magnitudes indicate stronger associations or contributions.
  • Positive and Negative Values: Positive and negative values of eigenvector entries indicate different characteristics or roles of nodes or attributes. Positive values may represent attractors, influencers, or clusters, while negative values may indicate repellers or outliers.
  • Comparison and Ranking: Comparing eigenvector entries within the same eigenvector or across different eigenvectors can help rank or prioritize nodes or attributes based on their importance or centrality.

It’s important to note that the interpretation of eigenvalues and eigenvectors is highly context-dependent and requires domain knowledge and understanding of the specific graph and its properties. The interpretation may vary based on the type of graph, the application domain, and the specific analysis or problem at hand. Therefore, it’s crucial to consider the specific context and consult domain experts to derive meaningful interpretations from eigenvalues and eigenvectors in your particular use case.

Here are some rules of thumb, I will apply them to the transportation network (You have to figure it out for your context):

  1. Dominant Eigenvalue: The dominant eigenvalue (the largest in magnitude) indicates the overall importance or influence of the transportation network. A larger dominant eigenvalue suggests a more interconnected and influential network.
  2. Dominant Eigenvector: The dominant eigenvector associated with the dominant eigenvalue can provide insights into the most significant transportation hubs or nodes. Nodes with higher magnitudes in the dominant eigenvector are likely to be major transportation hubs, while nodes with lower magnitudes may have less impact on the overall network.
  3. Community Detection: Eigenvectors associated with smaller eigenvalues can be used for community detection or identifying clusters within the transportation network. Nodes with similar signs or patterns in these eigenvectors may belong to the same transportation community or share similar characteristics.
  4. Connectivity and Bridges: The algebraic connectivity (second smallest eigenvalue) of the graph’s Laplacian matrix can indicate the connectivity or presence of bridges in the transportation network. A smaller algebraic connectivity suggests more disconnected components or potential bridges that connect different parts of the network.
  5. Ranking Nodes: Comparing the magnitudes of eigenvector entries can help rank or prioritize nodes in terms of their importance or centrality within the transportation network. Nodes with larger magnitudes are likely to be more critical or central in terms of transportation flows or accessibility.

If you are an applied machine learning engineer, you need to understand how this works to a decent level; if you are a machine learning scientist or researcher, you may need to delve deeper.

Eigenvector centrality

Eigenvector centrality scores a node by the importance of its neighbors: a node's score is proportional to the sum of the centrality scores of its neighbors, so a node connected to many high-centrality nodes will itself have a high eigenvector centrality. For example, in the Florentine Families network we can ask: how important is the Pazzi family relative to the other families? The answer is: not very. The Pazzi family has only a few ties, and the families it is tied to are mostly not central within the network. The Medici family, on the other hand, is influential precisely on account of how influential its connections are. Let that sink in for a minute.

Eigenvector centrality of node v = (1/λ) × Σ (centrality scores of v's neighbors)

we could also write it in vector format as

A * e = λ * e

where:

λ: is the largest eigenvalue of the adjacency matrix,

e: is the vector of centrality scores, one entry per node,

A: is the adjacency matrix for the network.

In the context of the Florentine Families, the Pazzi family would have a low Eigenvector centrality score because they are not connected to many other important families.
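Solving A * e = λ * e can be done by power iteration, which is essentially what nx.eigenvector_centrality does internally. Here is a minimal NumPy sketch on a hypothetical 4-node graph (a triangle with a tail; node names are just indices):

```python
import numpy as np

# Hypothetical undirected graph: triangle 0-1-2 plus a tail node 3 hanging off 2
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

e = np.ones(len(A))                # start from a uniform score vector
for _ in range(100):
    e = A @ e                      # every node sums its neighbors' scores
    e = e / np.linalg.norm(e)      # renormalize to keep the iteration stable

lam = e @ A @ e                    # Rayleigh quotient ~ the largest eigenvalue
print(int(e.argmax()))             # node 2 (triangle member holding the tail) wins
```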

import networkx as nx

# Create the Florentine Families graph
G = nx.Graph()
G.add_edges_from([
    ("Medici", "Barbadori"),
    ("Medici", "Albizzi"),
    ("Medici", "Salviati"),
    ("Medici", "Pazzi"),
    ("Medici", "Ridolfi"),
    ("Medici", "Strozzi"),
    ("Medici", "Bischeri"),
    ("Medici", "Guadagni"),
    ("Medici", "Ginori"),
    ("Medici", "Lamberteschi"),
    ("Medici", "Tornabuoni"),
    ("Medici", "Peruzzi"),
    ("Albizzi", "Ginori"),
    ("Barbadori", "Ridolfi"),
    ("Bischeri", "Guadagni"),
    ("Guadagni", "Lamberteschi"),
    ("Guadagni", "Tornabuoni"),
    ("Guadagni", "Peruzzi"),
    ("Strozzi", "Ridolfi"),
    ("Strozzi", "Bischeri"),
])

# Calculate eigenvector centrality
eigenvector_centrality = nx.eigenvector_centrality(G)

# Print eigenvector centrality scores
for node, centrality in eigenvector_centrality.items():
    print(f"{node}: {centrality:.4f}")

Output

Medici: 0.5153
Albizzi: 0.1701
Barbadori: 0.1559
Ridolfi: 0.2389
Salviati: 0.1559
Pazzi: 0.0267
Strozzi: 0.3303
Bischeri: 0.2492
Guadagni: 0.3525
Ginori: 0.0676
Lamberteschi: 0.0977
Tornabuoni: 0.1559
Peruzzi: 0.1559

Closeness centrality


Closeness centrality is defined as the reciprocal of the sum (equivalently, the average) of the shortest-path lengths from a node to all other nodes in the graph. For example, in a graph of locations across different US states, the node with the highest closeness centrality is the one closest, on average, to all other nodes — the center of the graph. Closeness centrality can be used to find the best location for a new store or other business: the node with the highest closeness centrality is the most central, most accessible location.

Closeness centrality differs from other statistics we discussed (such as degree centrality and eigenvector centrality) in terms of the specific information it captures and the insights it provides about a node’s position in a graph.

Here are the key differences:

  1. Focus on Reachability: Closeness centrality is primarily concerned with the reachability or accessibility of a node in the graph. It measures how easily a node can reach other nodes in terms of shortest path lengths. This is in contrast to degree centrality, which solely considers the number of edges connected to a node, and eigenvector centrality, which takes into account the importance of a node’s neighbors.
  2. Consideration of Global Structure: Closeness centrality takes into account the entire graph’s structure, considering the shortest path lengths from a node to all other nodes. It provides a global perspective on a node’s centrality by accounting for the overall connectivity and distances within the graph. In contrast, degree centrality focuses on the direct connections of a node, while eigenvector centrality emphasizes the importance of a node’s neighbors.
  3. Emphasis on Efficiency: Closeness centrality measures the efficiency of a node in terms of reaching other nodes. Nodes with higher closeness centrality tend to have shorter average path lengths to other nodes, implying that they can reach other nodes more quickly or with less effort. This aspect is particularly relevant for transportation networks, where efficient travel and connectivity are crucial considerations.
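A minimal sketch of closeness centrality in plain Python, on a hypothetical 4-node graph, following the NetworkX convention closeness = (n − 1) / (sum of shortest-path distances):

```python
from collections import deque

# Hypothetical 4-node graph: edges A-B, A-C, B-C, B-D, C-D
adj = {"A": ["B", "C"], "B": ["A", "C", "D"], "C": ["A", "B", "D"], "D": ["B", "C"]}

def closeness(node):
    # BFS shortest-path lengths from `node`, then (n - 1) / sum(distances)
    dist = {node: 0}
    q = deque([node])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return (len(adj) - 1) / sum(dist.values())

print(closeness("B"), closeness("A"))  # B reaches everyone in one hop
```

Node B is one hop from everything, so its closeness is 1.0; node A needs two hops to reach D, so its score drops to 0.75.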

Betweenness centrality

Betweenness centrality is defined as the fraction of shortest paths between all pairs of nodes that pass through a given node. In short: how often a node lies on the shortest path between two other nodes. Think of tolls and tunnels in a road network — this measure captures exactly that kind of choke-point importance.

Closeness centrality and betweenness centrality are both measures of node importance in a network, but they capture different aspects of centrality.

Closeness centrality focuses on the proximity of a node to all other nodes in the network. It is calculated as the inverse of the average shortest path length from a node to all other nodes. In the context of a transportation system, closeness centrality measures how quickly a node can reach all other nodes in terms of transportation time or distance. Nodes with high closeness centrality are considered central in terms of accessibility and efficiency of travel, as they are close to many other locations in the network. In the context of a crowded transportation system, a node with high closeness centrality would represent a hub that is well-connected to many other locations and can be easily reached from various parts of the network.

On the other hand, betweenness centrality focuses on the extent to which a node lies on the shortest paths between other pairs of nodes in the network. It quantifies the influence or control a node has over the flow of information or resources between other nodes. In the context of a transportation system, betweenness centrality captures the importance of a node in terms of facilitating the movement of traffic or passengers between different locations. Nodes with high betweenness centrality act as bridges or intermediaries in the network, connecting different parts and facilitating the flow of transportation. In the case of identifying crowded nodes, high betweenness centrality would indicate nodes that serve as critical junctions or transit points where a significant amount of traffic passes through.
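To make the contrast concrete, here is a deliberately naive betweenness sketch in plain Python on a hypothetical path graph A–B–C–D. It enumerates all shortest paths, which is only feasible for tiny graphs; production libraries use Brandes' algorithm instead:

```python
from itertools import permutations

# Hypothetical path graph A - B - C - D
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}

def shortest_paths(s, t):
    # level-synchronous BFS that keeps every shortest path from s to t
    frontier = [[s]]
    while frontier:
        hits = [p for p in frontier if p[-1] == t]
        if hits:
            return hits
        frontier = [p + [v] for p in frontier for v in adj[p[-1]] if v not in p]
    return []

def betweenness(v):
    # fraction of shortest s-t paths through v, summed over all pairs
    score = 0.0
    for s, t in permutations(adj, 2):
        if v in (s, t):
            continue
        paths = shortest_paths(s, t)
        score += sum(v in p for p in paths) / len(paths)
    return score / 2  # undirected graph: each pair was counted twice

print(betweenness("B"))  # B lies on the A-C and A-D shortest paths
```

Endpoints A and D score 0: nothing has to route through them.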

Clustering coefficient

We cannot all be born into the Medici family; some of us are born into other families, so how do we rank their influence? We cannot rely on degree or centrality alone. For example, the Peruzzi and Guadagni nodes in the graph have very similar degrees (3 vs 4) and similar eigenvector centralities (0.28 vs 0.29). Yet there is a clear difference between the two families: the Guadagni family plays a star-like role, connecting families that are not connected to each other. With that in mind, let us formally define the clustering coefficient: it measures the proportion of closed triangles in a node's local neighborhood.

This means that

Clustering coefficient = (# of triangles in a node’s local neighborhood)/(# of possible triangles in a node’s local neighborhood)

Transitivity is the fraction of all possible triangles present in the graph. In our example the Guadagni family has a low clustering coefficient: its star-like neighborhood means its neighbors are rarely connected to each other. The Peruzzi family has a high clustering coefficient: its neighbors are densely connected among themselves. The clustering coefficient can be used to find communities in a network. A community is a group of nodes that are more densely connected to each other than to the rest of the network; the nodes in a community are said to be clustered together. A node with a high clustering coefficient is more likely to sit inside a community than a node with a low clustering coefficient.
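The star-versus-triangle contrast can be computed directly. In this plain-Python sketch (a hypothetical mini-graph, not the full Florentine data), "G" is a star-like hub whose neighbors are not linked, and "P" sits inside a closed triangle:

```python
from itertools import combinations

# Hypothetical mini-graph: hub "G" with mutually unconnected neighbors,
# and node "P" whose two neighbors are linked to each other
adj = {
    "G": {"a", "b", "c"}, "a": {"G"}, "b": {"G"}, "c": {"G"},
    "P": {"x", "y"}, "x": {"P", "y"}, "y": {"P", "x"},
}

def clustering(v):
    nbrs, k = adj[v], len(adj[v])
    if k < 2:
        return 0.0
    # count neighbor pairs that are themselves connected (closed triangles)
    closed = sum(1 for a, b in combinations(sorted(nbrs), 2) if b in adj[a])
    return closed / (k * (k - 1) / 2)

print(clustering("G"), clustering("P"))  # star-like hub -> 0.0, triangle member -> 1.0
```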

Closed triangles, Ego Graphs, and Motifs

Closed triangles are sets of three nodes where each node is connected to the other two nodes. Ego graphs are subgraphs consisting of a focal node and all the nodes that are connected to that focal node. Motifs are patterns of node connectivity that occur more often than would be expected by chance.

So what does all of that have to do with our example?

The Peruzzi family has a high clustering coefficient because it is involved in many closed triangles: its neighbors are also connected to one another. The Guadagni family has a low clustering coefficient because it is involved in few closed triangles: it connects families that are not otherwise connected to each other.

Ego graphs tell a similar story: the Guadagni ego graph is star-shaped, while the Peruzzi ego graph contains several closed triangles. Motifs generalize this idea — they are patterns of connectivity that occur more often than chance would predict, and the two families participate in different ones. Notice that by examining a node's ego graph we essentially transform the task of computing node-level statistics and features into a graph-level task.


Graph-level features

So far we have been talking about node-level statistics, but node-level statistics serve node-level tasks. What if we want to do graph-level tasks, such as classifying molecule types based on graph structure? We would need graph-level statistics. So let's look at how to extract these features using graph kernel methods.

What are graph kernel methods ?

Graph kernel methods are a class of machine learning algorithms that learn from data represented as graphs by defining a similarity (kernel) function between pairs of graphs. They are a powerful tool for learning from graph-structured data.

What are the benefits of using graph kernel methods?

There are many benefits of using graph kernel methods. Graph kernel methods can be used to learn from data that is structured as a graph. This is important because many real-world datasets are structured as graphs. For example, social networks, biological networks, and chemical compounds can all be represented as graphs. Graph kernel methods are also a powerful tool for learning from data that is not well-structured. For example, natural language data is often unstructured and can be represented as a graph. Graph kernel methods can be used to learn from this type of data.

What are the disadvantages of using graph kernel methods?

There are some disadvantages of using graph kernel methods. Graph kernel methods can be computationally expensive. This is because they require the computation of kernel matrices, which can be time-consuming. Graph kernel methods can also be difficult to interpret. This is because they are based on the computation of kernel matrices, which can be difficult to understand.

What are some applications of graph kernel methods?

There are many applications of graph kernel methods. Graph kernel methods can be used for social network analysis, drug discovery, and text mining.

What is bag of nodes?

Bag of nodes is a method for representing graphs as vectors, and it is used by many graph kernel methods. The idea is simple: treat the graph as a set (a "bag") of nodes, compute a feature vector of node-level statistics for each node, and then aggregate those vectors — for example, by summing them or building a histogram. The aggregated vector describes the entire graph.
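A minimal sketch of the idea, assuming we aggregate one node-level statistic — the degree — into a fixed-length histogram that describes the whole graph:

```python
from collections import Counter

def degree_histogram(edges, max_degree=5):
    # bag of nodes: aggregate a node-level statistic (degree) into one graph vector
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    hist = [0] * (max_degree + 1)    # hist[d] = number of nodes with degree d
    for d in deg.values():
        hist[min(d, max_degree)] += 1
    return hist

# a triangle: every node has degree 2
print(degree_histogram([(0, 1), (1, 2), (0, 2)]))
```

Because the histogram has a fixed length regardless of graph size, two graphs can be compared (or fed to a kernel) even when they have different numbers of nodes.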

What is the Weisfeiler-Lehman graph kernel?

The Weisfeiler-Lehman (WL) graph kernel is a graph kernel method based on iterative neighborhood relabeling. The basic idea behind WL adheres to the following steps.

Step 1: Assign an initial label to each node (for example, its degree or a given node attribute).

Step 2: For K rounds, relabel each node by hashing its current label together with the sorted multiset of its neighbors' labels.

Step 3: Summarize each graph as a vector counting how often each label occurs, and compare two graphs via the dot product of their count vectors.

The Weisfeiler-Lehman graph kernel is a powerful tool for learning from graph-structured data, and it is relatively cheap to compute compared to many other graph kernels. In other words, WL is computed by

K(G1, G2) = <v1, v2>

where: G1 and G2 are the graphs to be compared,

v1 and v2 are the label-count (histogram) vectors for G1 and G2.

One popular way to use WL is as an approximate graph isomorphism test: check whether or not two graphs have the same label set after K rounds of the WL algorithm, where K is a hyperparameter. If they have the same label set, they are considered (possibly) isomorphic and assigned a score of 1.0. If they have different label sets, they are certainly non-isomorphic and assigned a score of 0.0.
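The K-round test described above can be sketched with NetworkX's built-in WL hash (available in recent NetworkX versions); two graphs receive score 1.0 if their hashes match and 0.0 otherwise:

```python
import networkx as nx

# A 4-cycle and a 4-path: same node count, different structure.
G1 = nx.cycle_graph(4)
G2 = nx.path_graph(4)

h1 = nx.weisfeiler_lehman_graph_hash(G1, iterations=3)
h2 = nx.weisfeiler_lehman_graph_hash(G2, iterations=3)

# Identical label sets after K rounds -> 1.0, otherwise 0.0.
score = 1.0 if h1 == h2 else 0.0
print(score)  # 0.0 -- the cycle and the path are not isomorphic
```

With no node attribute given, the hash initializes labels from node degrees, so the cycle (all degree 2) and the path (two degree-1 endpoints) separate immediately.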

Example: Let’s consider two transportation networks represented as graphs: Network A and Network B. Each node in the graph represents a transportation hub, and the edges represent the connections between the hubs.

  • In Network A, we have hubs labeled as A, B, C, and D.
  • In Network B, we have hubs labeled as A, B, C, D, and E.

We can apply the Weisfeiler-Lehman graph kernel to compare the similarity of these networks.

Explanation: The Weisfeiler-Lehman (WL) graph kernel is a method to compare and represent graphs. In the context of the transportation networks, it works as follows:

  1. Step 1: Each node is assigned a label and represented as a one-hot vector over the label set. For example, node A in Network A and node A in Network B will have the same label “A,” and so on.
  2. Step 2: The node vectors within each graph are summed into a single graph-level vector that counts how often each label occurs.
  3. Step 3: The dot product between the two graphs’ count vectors represents their similarity or dissimilarity. If the dot product is high, the graphs have similar label sets and are more likely to be isomorphic (structurally similar). If the dot product is low, the graphs have different label sets and are less likely to be isomorphic.

Here are the steps of the Weisfeiler-Lehman (WL) graph kernel applied to the transportation networks algebraically:
Step 1: Assign Labels to Nodes: Network A:

  • Node A: Label 1
  • Node B: Label 2
  • Node C: Label 3
  • Node D: Label 4

Network B:

  • Node A: Label 1
  • Node B: Label 2
  • Node C: Label 3
  • Node D: Label 4
  • Node E: Label 5

Step 2: Form One-Hot Node Vectors (one entry per label in the graph’s own label set): Network A:

  • Node Vector A: [1, 0, 0, 0]
  • Node Vector B: [0, 1, 0, 0]
  • Node Vector C: [0, 0, 1, 0]
  • Node Vector D: [0, 0, 0, 1]

Network B:

  • Node Vector A: [1, 0, 0, 0, 0]
  • Node Vector B: [0, 1, 0, 0, 0]
  • Node Vector C: [0, 0, 1, 0, 0]
  • Node Vector D: [0, 0, 0, 1, 0]
  • Node Vector E: [0, 0, 0, 0, 1]

Step 3: Compare the Graph-Level Label Vectors: Summing the one-hot node vectors over the union label set {1, 2, 3, 4, 5} (Network A’s vector gets a trailing zero for the unseen label 5) gives: Network A: [1, 1, 1, 1, 0], Network B: [1, 1, 1, 1, 1]. The kernel value is the dot product between the two graph vectors: K(A,B) = 1 + 1 + 1 + 1 + 0 = 4

Step 4: Interpretation: Since K(A,A) = 4 and K(B,B) = 5, the normalized similarity is 4 / Sqrt(4 * 5) ≈ 0.89: the two networks share most of their labels but are not identical. Because Network B contains a label (for hub E) that never appears in Network A, the label sets differ, so the graphs are considered non-isomorphic according to the Weisfeiler-Lehman test.

Please note that this is a simplified example for illustration purposes. In practice, the label assignment and dot product calculations can involve more complex representations and computations, depending on the specific graph and its characteristics.
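The label-count comparison between the two networks can be checked directly (a minimal sketch; the vectors are counts over the union label set {A, B, C, D, E}):

```python
import numpy as np

# Each entry counts occurrences of one label; Network A lacks label E.
v_a = np.array([1, 1, 1, 1, 0])  # Network A: hubs A, B, C, D
v_b = np.array([1, 1, 1, 1, 1])  # Network B: hubs A, B, C, D, E

k_ab = int(v_a @ v_b)   # cross-graph kernel value
k_aa = int(v_a @ v_a)   # self-similarity of Network A
k_bb = int(v_b @ v_b)   # self-similarity of Network B
similarity = k_ab / np.sqrt(k_aa * k_bb)

print(k_ab)                  # 4
print(round(float(similarity), 3))  # 0.894
```

Normalizing by the self-similarities keeps the score in [0, 1], so graphs of different sizes remain comparable.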

Graphlets and Path-based methods

Path-based methods are a class of graph kernel methods. They are based on the idea of representing a graph as a set of paths (for example, shortest paths between node pairs). Each path is summarized as a vector of features such as its length and endpoint labels, and the path-based graph kernel compares two graphs via the dot product of their path vectors. Path-based methods are a powerful tool for learning from data that is structured as a graph, but they are computationally expensive, typically more so than WL, since the number of paths to enumerate grows quickly with graph size.

Neighborhood overlap detection

Neighborhood overlap detection answers the question: which pairs of nodes are related, based on the similarity of their neighborhoods? The simplest neighborhood overlap measure just counts the number of neighbors two nodes share; based on that count you might predict a link between them, though you need to set a threshold to decide when to predict the existence of an edge. There are many ways to do this, but let’s examine the local overlap statistics, which come in three basic kinds: Sørensen, Salton, and Jaccard. The Sørensen index normalizes the count of common neighbors by the sum of the node degrees. Normalization of some kind is usually very important; otherwise, the overlap measure would be highly biased towards predicting edges for nodes with large degrees. The formula would be

Sørensen index(A,B) = 2 * common neighbors(A,B) / (degree(A) + degree(B))

The rest are simply variations of this. For the Salton index (also known as cosine similarity), the factor of 2 is dropped and the denominator becomes the geometric mean of the two degrees:

Salton index(A,B) = common neighbors(A,B) / Sqrt(degree(A) * degree(B))

Here is an example problem on how to use the Salton index to predict edges in a graph. Given a graph where the node degrees are deg(A) = 3, deg(B) = 4, deg(C) = 5, deg(D) = 4, and the common-neighbor counts are as shown below, use the Salton index to predict which pairs of nodes are most likely to be connected. The Salton index for each pair of nodes is:

(A,B) = 1 / Sqrt(3 * 4) ≈ 0.29

(A,C) = 2 / Sqrt(3 * 5) ≈ 0.52

(A,D) = 2 / Sqrt(3 * 4) ≈ 0.58

(B,C) = 2 / Sqrt(4 * 5) ≈ 0.45

(B,D) = 3 / Sqrt(4 * 4) = 0.75

(C,D) = 2 / Sqrt(5 * 4) ≈ 0.45

Based on the Salton index, the pair most likely to be connected is (B,D), followed by (A,D) and (A,C). You should be able to answer this kind of question.
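A minimal NetworkX sketch of the Salton index (the edge list here is hypothetical and unrelated to the degree numbers above):

```python
import math
import networkx as nx

def salton_index(G, u, v):
    """Common neighbors normalized by the geometric mean of the degrees."""
    common = len(set(G[u]) & set(G[v]))
    return common / math.sqrt(G.degree(u) * G.degree(v))

# Hypothetical graph: A and D are not linked but share neighbors B and C.
G = nx.Graph([("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D")])
print(salton_index(G, "A", "D"))  # 1.0 -> a strong candidate edge
```

Swapping the denominator for `G.degree(u) + G.degree(v)` (with a factor of 2 in the numerator) would give the Sørensen index instead.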

Global Overlap measures

Global Overlap measures are different: while local overlap measures are very good for link prediction and often achieve competitive results, the local approaches are limited. For instance, two nodes could have no local overlap yet still be members of the same community. Enter the Katz index. The Katz index is a global overlap measure based on counting paths of every length between two nodes, with longer paths discounted geometrically. It is defined as: Katz index(A,B) = β * paths of length 1(A,B) + β² * paths of length 2(A,B) + β³ * paths of length 3(A,B) + … where 0 < β < 1 controls how strongly longer paths are discounted (for the infinite sum to converge, β must be smaller than the reciprocal of the largest eigenvalue of the adjacency matrix). The Katz index can be used to predict which pairs of nodes are most likely to be connected. Given the following graph (using β = 0.5), use the Katz index to predict which pairs of nodes are most likely to be connected.

The Katz index for each pair of nodes is:

(A,B) = 0.5 * 1 + 0.25 * 2 + 0.125 * 3 + … = 1.875

(A,C) = 0.5 * 2 + 0.25 * 3 + 0.125 * 4 + … = 2.875

(A,D) = 0.5 * 2 + 0.25 * 3 + 0.125 * 4 + … = 2.875

(B,C) = 0.5 * 2 + 0.25 * 3 + 0.125 * 4 + … = 2.875

(B,D) = 0.5 * 3 + 0.25 * 4 + 0.125 * 5 + … = 3.875

(C,D) = 0.5 * 2 + 0.25 * 3 + 0.125 * 4 + … = 2.875

Based on the Katz index, the pair most likely to be connected is (B,D), which has the highest score (3.875); the remaining pairs all score lower.
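In matrix form, the Katz scores for all node pairs can be computed at once via (I - βA)^(-1) - I; a sketch on a small hypothetical graph, with β chosen below 1/λ_max so the series converges:

```python
import numpy as np
import networkx as nx

# Hypothetical 4-node graph.
G = nx.Graph([(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)])
A = nx.to_numpy_array(G)

beta = 0.1
assert beta < 1 / max(np.linalg.eigvalsh(A))  # convergence condition

# Katz similarity matrix: S = sum_{l>=1} beta^l A^l = (I - beta*A)^{-1} - I
I = np.eye(len(A))
S = np.linalg.inv(I - beta * A) - I
print(np.round(S, 3))
```

Entry S[u, v] sums β^l times the number of length-l paths between u and v; directly connected pairs always score at least β.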

But the main issue with the Katz index is that it is biased by node degree. To fix this, a paper by Leicht et al. [2006] proposed an improved metric (often called the LHN index) that considers the ratio between the actual number of observed paths and the expected number of paths between two nodes. In its simplest, first-order form,

LHN index(A,B) = common neighbors(A,B) / expected common neighbors(A,B)

The expected number of common neighbors is a function of the two node degrees: in a random graph with the same degrees, high-degree nodes share many neighbors purely by chance, so dividing by this expectation removes the degree bias. There is another global overlap measure called SimRank. SimRank is similar in spirit to the Katz index, but instead of counting discounted paths it uses a recursive definition of similarity: two nodes are similar if their neighbors are themselves similar.

Random walk methods

Random walk methods are based on the idea that two nodes are related if they can easily reach each other by a random walk. The transition probabilities of a random walker on a graph are defined by the degree-normalized adjacency matrix (P = D⁻¹A): at each step the walker moves to a uniformly chosen neighbor of its current node. If two nodes are closely related, the random walker is likely to travel from one to the other, and vice versa. This idea can be used to define a similarity measure between two nodes, based on the transition (or visit) probabilities of a random walker on the graph, which can be estimated from the adjacency matrix or from the node degrees.

An example of its application: PageRank, popularized by Google.

https://medium.com/@gbrnc28/random-walk-method-page-rank-algorithm-using-networkx-7da8696ecc38

TL;DR The similarity between a pair of nodes is based on how likely they are to reach each other.
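As a quick illustration, NetworkX's PageRank (a random-walk stationary-distribution score, as in the linked article) on a tiny hypothetical link graph:

```python
import networkx as nx

# Hypothetical web graph: pages a, b, c all link to a hub page.
G = nx.DiGraph([("a", "hub"), ("b", "hub"), ("c", "hub"), ("hub", "a")])
scores = nx.pagerank(G, alpha=0.85)

# The hub accumulates the most random-walk probability mass.
best = max(scores, key=scores.get)
print(best)  # hub
```

The `alpha` parameter is the damping factor: the probability that the walker follows a link rather than teleporting to a random node.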

Graph cuts and clustering

In order to motivate the Laplacian spectral clustering approach, we first must define the notion of a graph cut. Given a graph G = (V,E), a cut is defined as a partition of the nodes V into two disjoint subsets V1 and V2. The cost of a cut is typically defined as the number of edges connecting nodes in V1 with nodes in V2, which we will denote by |E(V1,V2)|.

The goal of spectral clustering is to find a graph cut that minimizes the cost function |E(V1,V2)| while keeping the partition balanced. This optimization problem can be relaxed into one solvable with the graph Laplacian matrix. The Laplacian matrix is a matrix that encodes the structure of a graph. It is defined as L = D - A, where D is the degree matrix and A is the adjacency matrix. The relaxed problem is: minimize trace(X'LX) subject to X'X = I, where X is a matrix whose columns are (relaxed) indicators of the V1 and V2 sets. Its solution is given by the eigenvectors of L with the smallest eigenvalues; since the very smallest eigenvalue (zero) corresponds to the trivial constant eigenvector, the partition is read off the eigenvector of the second-smallest eigenvalue (the Fiedler vector), for example by thresholding its entries at zero. The Laplacian spectral clustering approach can be used to cluster data points in any metric space. Given a set of data points, we can construct a graph where the nodes are the data points and the edges are defined by some similarity metric. The Laplacian matrix can then be used to find the optimal clustering of the data points.

Approximating the RatioCut with the Laplacian spectrum

The ratio cut is a graph partitioning criterion that balances the number of edges cut by the partition against the sizes of the resulting partitions: RatioCut(V1,V2) = |E(V1,V2)| * (1/|V1| + 1/|V2|). After relaxing the discrete indicators to real values, the problem can be formulated as: minimize trace(X'LX) subject to X'X = I, where X is a matrix whose columns are scaled indicators of the V1 and V2 sets. The solution to this relaxed problem is the eigenvector corresponding to the second-smallest eigenvalue of the Laplacian matrix, and this eigenvector can be used to partition the nodes into the two sets V1 and V2.
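The spectral bipartitioning described above can be sketched in a few lines: build the Laplacian, take the eigenvector of the second-smallest eigenvalue (the Fiedler vector), and split the nodes by its sign (the graph below is an illustrative two-cluster example):

```python
import numpy as np
import networkx as nx

# Two triangles {0,1,2} and {3,4,5} joined by a single bridge edge 2-3.
G = nx.Graph([(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)])

L = nx.laplacian_matrix(G).toarray().astype(float)  # L = D - A
eigvals, eigvecs = np.linalg.eigh(L)                # ascending eigenvalues

fiedler = eigvecs[:, 1]                             # second-smallest eigenvalue
part1 = {n for n, x in zip(G.nodes(), fiedler) if x < 0}
part2 = set(G.nodes()) - part1
print(sorted(part1), sorted(part2))  # recovers the two triangles
```

The sign split cuts only the single bridge edge, exactly the minimum-ratio cut for this graph.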
