Message Passing in Graphs
In this blog post, I will discuss the message passing algorithm, which serves as the backbone for graph neural network algorithms. This algorithm facilitates information processing through the nodes of the graph, enabling the network to learn various graph attributes. Consequently, we can perform tasks such as node classification and link prediction. Let’s dive in!
Message Passing
Let’s explore message passing through the lens of a node classification task. So, what exactly is node classification?
Node Classification: Given a network with labels on some nodes, how do we assign labels to all other nodes in the network?
Example: In a network where some nodes are identified as fraudsters and others as fully trusted, the challenge is to identify additional fraudsters and trustworthy nodes based on their interactions and behaviors within the network.
Node classification typically falls under the category of semi-supervised learning. It leverages both labeled and unlabeled nodes within a graph to predict the labels of unlabeled nodes.
First, let’s discuss the intuition behind the message passing framework.
Message passing in graphs entails nodes exchanging information (messages) with their neighbors iteratively. This process updates each node’s state based on aggregated information, enabling nodes to integrate knowledge from their local neighborhoods. Exploiting these correlations across the graph supports tasks such as node classification and link prediction.
Simply put, nodes that are similar are connected in some manner. Node classification is addressed using a technique called collective classification, where labels are assigned to all nodes in a network simultaneously.
Now let’s discuss the different message passing techniques. We will focus on three of them:
- Relational classification
- Iterative classification
- Belief propagation
We will be discussing them in detail one by one. But before that, let’s discuss some important things that we will be using throughout this blog.
- Individual behaviors are correlated in the network
- Correlation: nearby nodes tend to have the same color (i.e., belong to the same class)
There are mainly two types of dependencies that lead to correlation:
- Homophily
- Influence
Let’s discuss both of them.
1. Homophily: Homophily refers to the tendency of individuals to associate and bond with others who are similar to themselves in characteristics such as beliefs, interests, or demographics.
- Example 1: Researchers who focus on the same research area are more likely to establish a connection (meeting at conferences, interacting in academic talks, etc.)
- Example 2: Online social network
— Nodes = people
— Edges = friendship
— Node color = interests (sports, arts, etc.)
2. Influence: Social connections can influence the individual characteristics of a person.
- Example: I recommend my favorite music to my friends, and eventually some of them grow to like the same genres!
How do we leverage this correlation observed in networks to help predict node labels?
Intuition: Similar nodes are typically close together or directly connected in the network
Approach:
— Guilt-by-association: If I am connected to a node with label 𝑋, then I am likely to have label 𝑋 as well.
Example:
— Malicious/benign web page: Malicious web pages link to one another to increase visibility, look credible, and rank higher in search engines.
Classification label of a node 𝑣 in network may depend on:
- Features of 𝑣
- Labels of the nodes in 𝑣’s neighborhood
- Features of the nodes in 𝑣’s neighborhood
So, how do we find the labels? Let’s understand the task first.
Given:
— Graph
— Few labeled nodes
Find: class (red/green) of remaining nodes
Main assumption: There is homophily in the network
Let’s try to solve it using a semi-supervised learning approach.
Example task from the above graph:
- Let 𝑨 be an 𝑛×𝑛 adjacency matrix over 𝑛 nodes
- Let 𝒀 ∈ {0, 1}ⁿ be a vector of labels:
- 𝑌ᵥ = 1 means node 𝑣 belongs to Class 1
- 𝑌ᵥ = 0 means node 𝑣 belongs to Class 0
- There are unlabeled nodes that need to be classified
- Goal: Predict which unlabeled nodes are likely Class 1, and which are likely Class 0
As mentioned earlier, this problem is solved by collective classification. Let’s learn how to perform collective classification.
Collective Classification
- Intuition: Simultaneous classification of interlinked nodes using correlations.
- Probabilistic framework
- Markov Assumption: The label 𝑌ᵥ of one node v depends on the labels of its neighbors 𝑁ᵥ
P(Yᵥ) = P(Yᵥ | Nᵥ)
Collective classification involves 3 steps:
- Local Classifier: Used for initial label assignment
- Predicts label based on node attributes/features
- Standard classification task
- Does not use network information
- Relational Classifier: Captures correlations
- Learns a classifier to label one node based on the labels and/or attributes of its neighbors
- This is where network information is used
- Collective Inference: Propagates the correlation
- Apply the relational classifier to each node iteratively
- Iterate until the inconsistency between neighboring labels is minimized
- Network structure affects the final prediction
So far I have covered all the basic concepts required to understand the message passing algorithm. Before going back to the three message passing algorithms mentioned above, let’s first define a problem statement that we will use as a running example.
Problem Statement
- How do we predict the labels 𝑌ᵥ for the unlabeled nodes 𝑣 (in grey)?
- Each node v has a feature vector 𝑓ᵥ
- Labels for some nodes are given (1 for green, 0 for red)
- Task: Find 𝑃(𝑌ᵥ) given all features and the network
- We focus on semi-supervised node classification.
- Intuition is based on homophily: Similar nodes are typically close together or directly connected.
Relational Classification
Basic idea: The class probability 𝑃(𝑌ᵥ) of node 𝑣 is a weighted average of the class probabilities of its neighbors.
Steps:
- For labeled nodes 𝑣, initialize label 𝑌ᵥ with ground-truth label 𝑌ᵥ*
- For unlabeled nodes, initialize 𝑌ᵥ = 0.5
- Update all nodes in a random order until convergence or until maximum number of iterations is reached
- Update for each node 𝑣 and label 𝑐 (e.g., 0 or 1):
𝑃(𝑌ᵥ = 𝑐) = ( Σ_{𝑢 ∈ 𝑁ᵥ} 𝐴ᵥ,ᵤ · 𝑃(𝑌ᵤ = 𝑐) ) / ( Σ_{𝑢 ∈ 𝑁ᵥ} 𝐴ᵥ,ᵤ )
- If edges have strength/weight information, 𝐴ᵥ,ᵤ is the edge weight between 𝑣 and 𝑢; otherwise 𝐴ᵥ,ᵤ comes from the binary adjacency matrix
- 𝑃(𝑌ᵥ = 𝑐) is the probability of node 𝑣 having label 𝑐
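The update rule above can be sketched in a few lines of Python (a minimal sketch; the function name and setup are my own, not from any particular library):

```python
import numpy as np

def relational_classification(A, labels, max_iters=100, tol=1e-4):
    """Relational classification: iteratively average neighbor probabilities.

    A      : (n, n) adjacency (or edge-weight) matrix
    labels : dict {node_index: 0 or 1} for the labeled nodes
    Returns P, where P[v] = P(Y_v = 1).
    """
    n = A.shape[0]
    P = np.full(n, 0.5)                  # unlabeled nodes start at 0.5
    for v, y in labels.items():          # labeled nodes keep ground truth
        P[v] = float(y)

    for _ in range(max_iters):
        P_old = P.copy()
        for v in np.random.permutation(n):   # random update order
            if v in labels:                  # labeled nodes stay fixed
                continue
            w = A[v]                         # weights A[v, u] to neighbors
            if w.sum() > 0:
                P[v] = (w @ P) / w.sum()     # weighted average of neighbors
        if np.abs(P - P_old).max() < tol:    # stop when scores stabilize
            break
    return P
```

With edge weights in 𝑨, this performs the weighted-average update; with a binary adjacency matrix it reduces to a plain average over the neighborhood.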
Challenges:
- Convergence is not guaranteed
- Model cannot use node feature information
Example:
Initialization:
- All labeled nodes are initialized with their ground-truth labels
- All unlabeled nodes are initialized to 0.5 (belonging to class 1 with probability 0.5)
- Let 𝑃ᵥ denote 𝑃(𝑌ᵥ = 1)
Update for the 1st Iteration:
- For node 3, 𝑁₃ = {1, 2, 4}
- For node 4, 𝑁₄ = {1, 3, 5, 6}
- For node 5, 𝑁₅ = {4, 6, 7, 8}
- After Iteration 1 (a round of updates for all unlabeled nodes)
- After Iteration 2
- After Iteration 3
- After Iteration 4
- All scores stabilize after 4 iterations. We therefore predict:
- Nodes 4, 5, 8, and 9 belong to class 1 (𝑃ᵥ > 0.5)
- Node 3 belongs to class 0 (𝑃ᵥ < 0.5)
Iterative classification
Relational classifiers do not use node attributes. How can one leverage them? Main idea of iterative classification: Classify node v based on its attributes 𝒇ᵥ as well as labels 𝒛ᵥ of neighbor set 𝑵ᵥ.
- Input: Graph
- fᵥ : feature vector for node v
- Some nodes 𝑣 are labeled with 𝑌ᵥ
- Task: Predict the labels of the unlabeled nodes
- Approach: Train two classifiers:
- 𝜙₁(𝑓ᵥ) = Predict node label based on node feature vector 𝑓ᵥ
- 𝜙₂(𝑓ᵥ, 𝑧ᵥ) = Predict label based on node feature vector 𝑓ᵥ and summary 𝑧ᵥ of the labels of 𝑣’s neighbors
Now the question is how to compute this summary. How do we compute the summary 𝒛ᵥ of the labels of 𝑣’s neighbors 𝑵ᵥ?
- Idea: 𝒛ᵥ is a vector summarizing the neighborhood labels, e.g.:
- Histogram of the number (or fraction) of each label in 𝑁ᵥ
- Most common label in 𝑁ᵥ
- Number of different labels in 𝑁ᵥ
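These summary choices could be computed with a small helper like the following (an illustrative sketch; the function name and return layout are my own):

```python
from collections import Counter

def neighbor_label_summary(neighbors, labels, classes=(0, 1)):
    """Summarize the labels of v's neighbors N_v as a feature vector z_v.

    neighbors : iterable of v's neighbor ids
    labels    : dict {node: predicted or known label}
    Returns [fraction of each class..., most common label, #distinct labels].
    """
    counts = Counter(labels[u] for u in neighbors if u in labels)
    total = sum(counts.values()) or 1
    fractions = [counts[c] / total for c in classes]        # label histogram
    mode = counts.most_common(1)[0][0] if counts else -1    # most common label
    distinct = len(counts)                                  # distinct labels
    return fractions + [mode, distinct]
```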
Now, let’s discuss the overall process of training iterative classifiers.
- Phase 1: Classify based on node attributes alone
- On a training set, train classifier (e.g., linear classifier, neural networks, …):
- 𝜙₁(𝑓ᵥ) to predict 𝑌ᵥ based on 𝑓ᵥ
- 𝜙₂(𝑓ᵥ, 𝑧ᵥ) to predict 𝑌ᵥ based on 𝑓ᵥ and summary 𝑧ᵥ of the labels of 𝑣’s neighbors
- Phase 2: Iterate till convergence
- On the test set, set labels 𝑌ᵥ based on the classifier 𝜙₁, compute 𝑧ᵥ, and predict the labels with 𝜙₂
- Repeat for each node 𝑣:
- Update 𝑧ᵥ based on 𝑌ᵤ for all 𝑢 ∈ 𝑁ᵥ
- Update 𝑌ᵥ based on the new 𝑧ᵥ (using 𝜙₂)
- Iterate until class labels stabilize or max number of iterations is reached
- Note: Convergence is not guaranteed
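The two-phase loop above might look like this in Python (a sketch under assumptions: 𝜙₁ and 𝜙₂ are already-trained classifiers passed in as plain functions, and all names are illustrative):

```python
def iterative_classification(graph, feats, labels, phi1, phi2,
                             summarize, max_iters=10):
    """Iterative classification sketch.

    graph     : dict {node: set of neighbors}
    feats     : dict {node: feature vector f_v}
    labels    : dict {node: ground-truth label}, labeled nodes only
    phi1(f)   : trained classifier on node features alone
    phi2(f,z) : trained classifier on features + neighbor-label summary z_v
    summarize : function (neighbors, current_labels) -> z_v
    """
    # Bootstrap: assign labels to unlabeled nodes with phi1
    Y = dict(labels)
    for v in graph:
        if v not in Y:
            Y[v] = phi1(feats[v])

    # Iterate until labels stabilize (convergence is not guaranteed)
    for _ in range(max_iters):
        changed = False
        for v in graph:
            if v in labels:                  # ground-truth labels stay fixed
                continue
            z = summarize(graph[v], Y)       # update z_v from current labels
            new_label = phi2(feats[v], z)    # re-classify with phi2
            if new_label != Y[v]:
                Y[v], changed = new_label, True
        if not changed:
            break
    return Y
```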
Let’s understand the above process with an example: web page classification.
- Input: Graph of web pages
- Node: Web page
- Edge: Hyper-link between web pages
- Directed edge: a page points to another page
- Node features: Webpage description
- For simplicity, we only consider two binary features
- Task: Predict the topic of the webpage
Steps:
- Baseline: train a classifier (e.g., linear classifier) to classify pages based on binary node attributes
- Each node maintains vectors 𝒛𝒗 of neighborhood labels:
- 𝐼 = Incoming neighbor label information vector
- 𝑂 = Outgoing neighbor label information vector
- E.g., 𝐼₀ = 1 if at least one of the incoming pages is labeled 0; similar definitions hold for 𝐼₁, 𝑂₀, and 𝑂₁
- On a different training set, train two classifiers:
- Node attribute vector only (green circles): 𝜙₁
- Node attribute and link vectors (red circles): 𝜙₂
- On the test set:
- Use trained node feature vector classifier 𝜙₁ to set 𝑌ᵥ
- Update 𝑧ᵥ for all nodes
- Re-classify all nodes with 𝜙₂
- Continue until convergence
- Update 𝑧ᵥ
- Update 𝑌ᵥ = 𝜙₂(𝑓ᵥ, 𝑧ᵥ)
- Stop iteration
- After convergence or when maximum iterations are reached
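The incoming/outgoing link vectors 𝐼 and 𝑂 described above could be computed with a helper like this (a sketch; the function name and argument layout are assumptions):

```python
def link_vectors(in_nbrs, out_nbrs, labels):
    """Compute the neighborhood link vector z_v = [I0, I1, O0, O1] for a page.

    in_nbrs  : ids of pages linking TO this page
    out_nbrs : ids of pages this page links to
    labels   : dict {node: current label}
    I[c] = 1 if at least one incoming page is labeled c; O likewise
    for outgoing pages.
    """
    I = [int(any(labels.get(u) == c for u in in_nbrs)) for c in (0, 1)]
    O = [int(any(labels.get(u) == c for u in out_nbrs)) for c in (0, 1)]
    return I + O
```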
And that’s how the iterative classification approach works. Now let’s discuss the last approach, known as belief propagation.
Belief Propagation
Belief propagation is a dynamic programming approach to answering probability queries in a graph (e.g., the probability of node 𝑣 belonging to class 1). It is an iterative process in which neighboring nodes “talk” to each other by passing messages. When consensus is reached, each node calculates its final belief.
Introduction:
- Task: Count the number of nodes in a graph
- Condition: Each node can only interact (pass message) with its neighbors
Example: Path Graph
Algorithm:
- Define an ordering of nodes (one that results in a path)
- Edge directions follow the order of nodes
- The edge direction defines the order of message passing
- For node 𝑖 from 1 to 6:
- Compute the message from node 𝑖 to 𝑖 + 1 (the number of nodes counted so far)
- Pass the message from node 𝑖 to 𝑖 + 1
- Condition: Each node can only interact (pass messages) with its neighbors
- Solution: Each node listens to the message 𝑚 from its neighbor, updates it, and passes it forward
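The path-graph counting algorithm is tiny when written out (a sketch; the function name is my own):

```python
def count_nodes_on_path(n):
    """Count nodes on a path graph 1-2-...-n by forward message passing.

    Each node i receives message m from node i-1 (the number of nodes
    counted so far), adds itself, and passes m + 1 on to node i + 1.
    """
    m = 0                        # no nodes counted before node 1
    for i in range(1, n + 1):    # visit nodes in path order
        m = m + 1                # node i listens, updates, passes forward
    return m                     # message arriving after the last node
```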
Generalizing to a tree
- We can perform message passing not only on a path graph, but also on a tree-structured graph.
- Define order of message passing from leaves to root.
- Update beliefs in tree structure
Question: What message will 𝑖 send to 𝑗?
- It depends on what 𝑖 hears from its neighbors
- Each neighbor passes a message to 𝑖 stating its belief about the state of 𝑖
Let’s first define some notations before going into the algorithm
- Label-label potential matrix 𝝍 : Dependency between a node and its neighbor. 𝝍(𝑌ᵢ, 𝑌ⱼ) is proportional to the probability of a node 𝑗 being in class 𝑌ⱼ given that it has neighbor 𝑖 in class 𝑌ᵢ.
- Prior belief 𝝓: 𝜙(𝑌ᵢ) is proportional to the probability of node 𝑖 being in class 𝑌ᵢ.
- 𝑚ᵢ→ⱼ(𝑌ⱼ) is 𝑖’s message / estimate of 𝑗 being in class 𝑌ⱼ.
- ℒ is the set of all classes/labels
Steps:
- Initialize all messages to 1
- Repeat for each node, sending a message to each neighbor:
𝑚ᵢ→ⱼ(𝑌ⱼ) = Σ_{𝑌ᵢ ∈ ℒ} 𝜓(𝑌ᵢ, 𝑌ⱼ) 𝜙(𝑌ᵢ) Π_{𝑘 ∈ 𝑁ᵢ \ 𝑗} 𝑚ₖ→ᵢ(𝑌ᵢ)
After convergence:
- 𝑏ᵢ(𝑌ᵢ) = node 𝑖’s belief of being in class 𝑌ᵢ:
𝑏ᵢ(𝑌ᵢ) ∝ 𝜙(𝑌ᵢ) Π_{𝑗 ∈ 𝑁ᵢ} 𝑚ⱼ→ᵢ(𝑌ᵢ)
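The full loop (message initialization, repeated updates, final beliefs) can be sketched as follows (an illustrative implementation; messages are normalized for numerical stability, and all names are my own):

```python
import numpy as np

def loopy_bp(edges, phi, psi, n_classes=2, n_iters=20):
    """(Loopy) belief propagation sketch over an undirected graph.

    edges : list of undirected edges (i, j)
    phi   : dict {i: prior belief vector over classes}, i.e. phi_i(Y_i)
    psi   : (n_classes, n_classes) label-label potential matrix
    Returns normalized beliefs b[i] for every node.
    """
    nbrs = {}
    for i, j in edges:
        nbrs.setdefault(i, set()).add(j)
        nbrs.setdefault(j, set()).add(i)

    # Step 1: initialize all messages m_{i->j}(Y_j) to 1
    msg = {(i, j): np.ones(n_classes) for i in nbrs for j in nbrs[i]}

    # Step 2: repeatedly recompute every message
    for _ in range(n_iters):
        new = {}
        for (i, j) in msg:
            # product of prior and messages into i from neighbors except j
            prod = phi[i].copy()
            for k in nbrs[i] - {j}:
                prod *= msg[(k, i)]
            # m_{i->j}(Y_j) = sum over Y_i of psi(Y_i, Y_j) * prod(Y_i)
            m = psi.T @ prod
            new[(i, j)] = m / m.sum()        # normalize for stability
        msg = new

    # After convergence: b_i(Y_i) is proportional to phi_i * incoming messages
    beliefs = {}
    for i in nbrs:
        b = phi[i].copy()
        for j in nbrs[i]:
            b *= msg[(j, i)]
        beliefs[i] = b / b.sum()
    return beliefs
```

On a tree this converges to the exact marginals; on a graph with cycles it is the loopy variant discussed below.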
Let’s understand this with an example that includes a graph with cycles. Running belief propagation on a graph with cycles is also called Loopy Belief Propagation.
Messages from different subgraphs are no longer independent!
We can still run BP, but it will pass messages around the loops.
- Beliefs may not converge
- Message 𝑚ᵤ→ᵢ(𝑌ᵢ) is based on the initial belief of 𝑖, not on separate evidence for 𝑖
- The initial belief of 𝑖 (which could be incorrect) is reinforced by the cycle 𝑖 → 𝑗 → 𝑘 → 𝑢 → 𝑖
- However, in practice, Loopy BP is still a good heuristic for complex graphs that contain many branches
Challenges:
- Messages loop around and around: 2, 4, 8, 16, 32, … The nodes become more and more convinced that these variables are T (true)!
- BP incorrectly treats a message arriving via a cycle as separate evidence that the variable is T
- It multiplies the two messages as if they were independent
- But they don’t actually come from independent parts of the graph
- One influenced the other (via a cycle)
Advantages:
- Easy to program and parallelize
- General: can be applied to any graph model with any form of potentials
- Potentials can be higher order, e.g. 𝝍(𝑌ᵢ, 𝑌ⱼ, 𝑌ₖ, 𝑌ᵥ, …)
Issues:
- Convergence is not guaranteed (when to stop?), especially if there are many closed loops
- Potential functions (parameters) require training to estimate
That’s all for the message passing algorithm. Next, we will discuss graph neural networks in an upcoming blog. Stay tuned!