Traditional ML for Graphs

Dhaval Taunk
7 min read · Jun 8, 2024


In the last blog post, I briefly discussed different topics in the context of Graph Neural Networks. In this blog post, I will discuss the traditional feature extraction approaches that were popular in the past and were used along with classical ML algorithms to solve different graph-based problems. So let’s get started…

Hand Crafted Features

Traditional ML pipelines used hand-crafted features to perform the modelling. To achieve good performance, effective feature creation is a crucial step. Overall, we can divide the feature extraction process into three categories:

  1. Node Level
  2. Edge/Link Level
  3. Graph Level

We will be discussing different approaches for all three categories one by one. So tighten your seat belt and let’s begin…

Node Level Features

1. Node Degree

Node degree simply counts the number of edges incident to each node and treats that count as a feature.

Stanford CS224W: Machine Learning for Graphs https://cs224w.stanford.edu

Although this approach seems pretty straightforward, it has a drawback: it does not take the importance of a node into account. Therefore, we need a better strategy that also considers node importance.
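As a quick illustration, here is a minimal sketch of turning node degrees into features; the networkx library and the toy karate-club graph are my own choices, not something used in the original post:

```python
import networkx as nx
import numpy as np

# Toy graph for illustration (any undirected graph would work the same way)
G = nx.karate_club_graph()

# Node degree as a one-dimensional feature per node
degree_features = np.array([G.degree(v) for v in G.nodes()], dtype=float).reshape(-1, 1)

print(degree_features[:5])  # degrees of the first five nodes
```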

2. Node Centrality

Node centrality measures how important a node is within the graph. There are different ways to calculate node centrality; let’s discuss them one by one.

  1. Eigenvector Centrality: The idea behind this is that a node is important if it is surrounded by important neighboring nodes. It is calculated as the normalized sum of the centralities of the neighboring nodes: cᵥ = (1/λ) · Σ cᵤ, summing over all neighbors u of v, where λ is a normalization constant.
Stanford CS224W: Machine Learning for Graphs https://cs224w.stanford.edu

If we rewrite the above equation in matrix form, λc = Ac, we can observe that the centrality vector c is an eigenvector of the adjacency matrix A. The leading eigenvector (the one associated with the largest eigenvalue) is used as the centrality scores.

Stanford CS224W: Machine Learning for Graphs https://cs224w.stanford.edu

2. Betweenness Centrality: According to betweenness centrality, a node is important if it lies on many shortest paths between other nodes. It is computed from the fraction of shortest paths between every other pair of nodes that pass through the node.

Stanford CS224W: Machine Learning for Graphs https://cs224w.stanford.edu

3. Closeness Centrality: A node is important if it has small shortest-path distances to all other nodes. It is calculated as the reciprocal of the sum of the shortest-path distances from the node to every other node.

Stanford CS224W: Machine Learning for Graphs https://cs224w.stanford.edu
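For reference, all three centralities discussed above are available as off-the-shelf networkx functions; this is just a sketch on a toy graph, not part of the original post:

```python
import networkx as nx

G = nx.karate_club_graph()  # toy graph for illustration

# Eigenvector centrality: importance propagated from important neighbors
eig = nx.eigenvector_centrality(G)

# Betweenness centrality: based on shortest paths passing through a node
btw = nx.betweenness_centrality(G)

# Closeness centrality: reciprocal of the sum of shortest-path distances
clo = nx.closeness_centrality(G)

print(eig[0], btw[0], clo[0])  # centrality scores of node 0
```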

3. Clustering Coefficient

The clustering coefficient measures how well a node’s neighbors are connected to each other. It is computed as the number of edges among the node’s neighbors divided by the number of possible pairs of neighbors:

Stanford CS224W: Machine Learning for Graphs https://cs224w.stanford.edu

The network formed by a central node together with the nodes surrounding it is called an ego network.
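A minimal sketch of computing the clustering coefficient, again with networkx on a toy graph of my own choosing:

```python
import networkx as nx

G = nx.karate_club_graph()

# Local clustering coefficient of each node:
# (# edges among the node's neighbors) / (# possible pairs of neighbors)
clustering = nx.clustering(G)

print(clustering[0])  # clustering coefficient of node 0
```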

4. Graphlet Degree Vector (GDV)

In GDV, we count the number of graphlets rooted at a given node. So what is a graphlet? It is a rooted, connected, non-isomorphic subgraph of a given graph. The image below shows all possible graphlets with up to 5 nodes.

Stanford CS224W: Machine Learning for Graphs https://cs224w.stanford.edu

So how do we calculate the GDV? The image below shows how to calculate it, followed by a description:

Stanford CS224W: Machine Learning for Graphs https://cs224w.stanford.edu

In the above graph, you can see how the GDV is calculated for the node v, using the graphlets of 2 and 3 nodes rooted at v.

Important observation

Node degree counts the number of edges that a node touches.

The clustering coefficient counts the number of triangles that a node touches.

The GDV counts the number of graphlets that a node touches.

Overall, the above methods can be divided into two categories:

1. Importance-based Features
- Node Degree
- Node Centrality

2. Structure-based Features
- Node Degree
- Clustering Coefficient
- Graphlet Degree Vector

Edge/Link Level Features

1. Distance based features

  1. Shortest-path distance between two nodes: In this approach, we simply use the shortest-path distance between the two nodes as the feature. However, this does not capture the degree of neighborhood overlap, so it is not a very good metric on its own, although it is a good starting point.
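A minimal sketch of this distance-based feature (networkx, the toy graph, and the node pair are my own illustrative choices):

```python
import networkx as nx

G = nx.karate_club_graph()

# Shortest-path distance between two nodes as a simple link-level feature
dist = nx.shortest_path_length(G, source=0, target=33)
print(dist)
```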

2. Local Neighborhood Overlap

  1. Common Neighbors: Counts the number of neighbors that the two nodes share.

2. Jaccard’s Coefficient: The size of the intersection of the two nodes’ neighbor sets divided by the size of their union.

3. Adamic-Adar Index: The sum, over the common neighbors of the two nodes, of one over the log of each common neighbor’s degree, so low-degree common neighbors contribute more.

The main issue with local neighborhood overlap is that the value will be zero if the two nodes have no common neighbors.
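All three local-overlap scores are available in networkx; here is a small sketch on a toy graph with an arbitrarily chosen node pair:

```python
import networkx as nx

G = nx.karate_club_graph()
u, v = 0, 1  # arbitrary node pair for illustration

# Common neighbors: |N(u) ∩ N(v)|
common = len(list(nx.common_neighbors(G, u, v)))

# Jaccard coefficient: |N(u) ∩ N(v)| / |N(u) ∪ N(v)|
jaccard = next(nx.jaccard_coefficient(G, [(u, v)]))[2]

# Adamic-Adar index: sum over common neighbors w of 1 / log(degree(w))
adamic_adar = next(nx.adamic_adar_index(G, [(u, v)]))[2]

print(common, jaccard, adamic_adar)
```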

3. Global Neighborhood Overlap

To overcome the limitation of local neighborhood overlap, global neighborhood overlap comes into the picture.

  1. Katz Index: It counts the number of paths of all lengths between a pair of nodes. To do so, it utilizes the adjacency matrix of the input graph. Let’s now discuss how to calculate it:

Let Pᵤᵥᵏ = the number of paths of length k between u and v.

Then Pᵤᵥ¹ = the number of paths of length 1 between u and v, which is nothing but the adjacency matrix A of the input graph, i.e. Pᵤᵥ¹ = Aᵤᵥ.

Now, how do we compute the number of paths of length 2?

Step 1: Compute the number of paths of length 1 between each of node u’s neighbors and v.

Step 2: Sum up these path counts across u’s neighbors:

Pᵤᵥ² = Σᵢ Aᵤᵢ · Pᵢᵥ¹ = Σᵢ Aᵤᵢ · Aᵢᵥ = (A²)ᵤᵥ

You can see that the number of paths of length 2 is nothing but the adjacency matrix multiplied by itself. In the same way, (Aˡ)ᵤᵥ counts the paths of any arbitrary length l.

To finally calculate the Katz index between two nodes v₁ and v₂, sum the path counts over all lengths, discounting longer paths by a factor 0 < β < 1:

Sᵥ₁ᵥ₂ = Σ βˡ · (Aˡ)ᵥ₁ᵥ₂, summing over all path lengths l = 1, 2, …

Finally, this geometric series has a closed form, so the entire Katz matrix can be computed as:

S = Σ βⁱ Aⁱ = (I - βA)⁻¹ - I
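The closed form above translates directly into a few lines of numpy; the toy graph and the value of β below are my own illustrative choices (β just has to be smaller than one over the largest eigenvalue of A for the series to converge):

```python
import networkx as nx
import numpy as np

G = nx.karate_club_graph()
A = nx.to_numpy_array(G)  # adjacency matrix
beta = 0.05               # discount factor (assumed small enough for convergence)

# Katz index matrix in closed form: S = (I - beta * A)^(-1) - I
n = A.shape[0]
S = np.linalg.inv(np.eye(n) - beta * A) - np.eye(n)

print(S[0, 33])  # Katz score between nodes 0 and 33
```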

Summary

1. Distance-based features
- Use the shortest path length between two nodes but do not capture neighborhood overlap.

2. Local Neighborhood Overlap
- Captures the number of common neighbors but becomes zero if there are no common neighbors.

3. Global Neighborhood Overlap
- Uses the entire graph structure to calculate the Katz index.

Graph Level Features

When creating graph-level features, kernel methods are a popular approach. They are inspired by classical machine learning, where kernel methods are widely used. We will discuss two popular kernels: the Graphlet Kernel and the Weisfeiler-Lehman (WL) Kernel.

1. Graphlet Kernels

The idea of the graphlet kernel is to count the number of graphlets in a graph, as we did for the Graphlet Degree Vector. However, the definition of a graphlet here is slightly different: the graphlets need not be connected, and they are not rooted. The example below shows how to count graphlets of 3 nodes:

So, how do we create a graphlet kernel with this approach? Given a graph G and the list of graphlets g₁, g₂, …, gₙₖ of size k, the graphlet count vector f_G is defined as:

(f_G)ᵢ = #(gᵢ ⊆ G) for i = 1, 2, …, nₖ

The image below shows how to calculate the graphlet count vector for k = 3 for an input graph G.
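For k = 3, the graphlet count vector can be computed by brute force: look at every triple of nodes and classify it by how many edges it contains. This is just a sketch, and the ordering of the four graphlets in the vector is my own convention:

```python
import networkx as nx
from itertools import combinations

G = nx.karate_club_graph()

# The four non-isomorphic size-3 graphlets (connectedness not required):
# 3 edges (triangle), 2 edges (path), 1 edge (edge + isolated node), 0 edges
counts = [0, 0, 0, 0]
for trio in combinations(G.nodes(), 3):
    num_edges = sum(1 for a, b in combinations(trio, 2) if G.has_edge(a, b))
    counts[3 - num_edges] += 1  # index 0 -> triangle, ..., index 3 -> no edges

f_G = counts  # graphlet count vector f_G for k = 3
print(f_G)
```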

Then, if you have two graphs G and G′, you can calculate the graphlet kernel as the dot product of their graphlet count vectors: K(G, G′) = f_Gᵀ f_G′.

One thing to keep in mind is that G and G′ can have different sizes, which skews the raw counts. In that case, you can normalize each count vector by its sum, h_G = f_G / Σᵢ (f_G)ᵢ, and compute K(G, G′) = h_Gᵀ h_G′ instead.
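A small helper capturing both the plain and the normalized kernel (the function name is mine, not from the original post):

```python
import numpy as np

def graphlet_kernel(f_G, f_Gp, normalize=True):
    """Graphlet kernel between two graphs, given their graphlet count vectors."""
    f_G = np.asarray(f_G, dtype=float)
    f_Gp = np.asarray(f_Gp, dtype=float)
    if normalize:  # rescale counts so graphs of different sizes are comparable
        f_G, f_Gp = f_G / f_G.sum(), f_Gp / f_Gp.sum()
    return float(f_G @ f_Gp)
```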

There is one issue with this approach: it is expensive. Counting size-k graphlets in a graph with n nodes takes time on the order of nᵏ.

2. Weisfeiler-Lehman Kernel

To overcome the complexity of the graphlet kernel, the WL kernel was introduced. It is based on the color refinement algorithm, defined below:

Given: a graph G with a set of nodes V.
1. Assign an initial color c⁽⁰⁾(v) to each node v.
2. Iteratively refine the node colors by

c⁽ᵏ⁺¹⁾(v) = HASH( c⁽ᵏ⁾(v), { c⁽ᵏ⁾(u) : u ∈ N(v) } ),

where HASH maps different inputs to different colors.
3. After K steps of color refinement, c⁽ᴷ⁾(v) summarizes the structure of the K-hop neighborhood around v.

The set of images below shows an example of how the WL kernel works:

Stanford CS224W: Machine Learning for Graphs https://cs224w.stanford.edu

Finally, the WL kernel value is calculated by counting the number of nodes of each color in both graphs and taking the dot product of the resulting color-count vectors.

Stanford CS224W: Machine Learning for Graphs https://cs224w.stanford.edu
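Putting the pieces together, here is a simplified sketch of color refinement and the resulting kernel value; a full WL kernel usually also aggregates the colors from every intermediate refinement step, whereas this version only uses the final one:

```python
import networkx as nx
from collections import Counter

def wl_color_histogram(G, num_iters=3):
    """Run color refinement and return the histogram (bag) of final node colors."""
    colors = {v: 1 for v in G.nodes()}  # initial color c^(0)(v) is the same for every node
    for _ in range(num_iters):
        new_colors = {}
        for v in G.nodes():
            # The new color depends on the node's color and the multiset of neighbor colors
            signature = (colors[v], tuple(sorted(colors[u] for u in G.neighbors(v))))
            new_colors[v] = hash(signature)  # stands in for the injective HASH function
        colors = new_colors
    return Counter(colors.values())

def wl_kernel(G1, G2, num_iters=3):
    """WL kernel value: dot product of the two graphs' color-count vectors."""
    h1, h2 = wl_color_histogram(G1, num_iters), wl_color_histogram(G2, num_iters)
    return sum(h1[c] * h2[c] for c in h1.keys() & h2.keys())

print(wl_kernel(nx.karate_club_graph(), nx.cycle_graph(10)))
```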

That’s all for traditional ML on graphs. In the next post, I will discuss different embedding algorithms. So stay tuned…
