Message Passing in Graphs

Dhaval Taunk · Published in Analytics Vidhya
11 min read · Jul 4, 2024

In this blog post, I will discuss the message passing algorithm, which serves as the backbone for graph neural network algorithms. This algorithm facilitates information processing through the nodes of the graph, enabling the network to learn various graph attributes. Consequently, we can perform tasks such as node classification and link prediction. Let’s dive in!

Message Passing

Let’s explore message passing through the lens of a node classification task. So, what exactly is node classification?

Node Classification: Given a network with labels on some nodes, how do we assign labels to all other nodes in the network?

(All figures in this post are from Stanford CS224W: Machine Learning for Graphs, https://cs224w.stanford.edu.)

Example: In a network where some nodes are known to be fraudsters and others are known to be fully trusted, how do we identify other fraudulent and trustworthy nodes based on their interactions within the network?

Node classification typically falls under the category of semi-supervised learning. It leverages both labeled and unlabeled nodes within a graph to predict the labels of unlabeled nodes.

First, let’s discuss the intuition behind the message passing framework.

Message passing in graphs entails nodes iteratively exchanging information (messages) with their neighbors. Each node updates its state based on the aggregated incoming messages, so it gradually incorporates knowledge from its local neighborhood. This is what lets a graph network exploit the correlations between neighboring nodes for tasks such as node classification or link prediction.

Simply put, nodes that are similar tend to be connected. Node classification exploits this through a technique called collective classification, where labels are assigned to all nodes in a network simultaneously.

Now let’s discuss different techniques of message passing algorithms.

We will mainly discuss three message passing techniques:

  1. Relational classification
  2. Iterative classification
  3. Belief propagation

We will discuss each of them in detail, but before that, let’s go over a few concepts that we will use throughout this blog.

  • Individual behaviors are correlated in the network
  • Correlation: nearby nodes have the same color (belonging to the same class)

There are mainly two types of dependencies that lead to correlation:

  1. Homophily
  2. Influence

Let’s discuss both of them.

  1. Homophily: Homophily refers to the tendency of individuals to associate and bond with others who are similar to themselves in characteristics such as beliefs, interests, or demographics.
    - Example 1: Researchers who focus on the same research area are more likely to establish a connection (meeting at conferences, interacting in academic talks, etc.)
    - Example 2: Online social network
    — Nodes = people
    — Edges = friendship
    — Node color = interests (sports, arts, etc.)
People with the same interests are more closely connected due to homophily.

  2. Influence: Social connections can influence the individual characteristics of a person.
    - Example: I recommend my musical preferences to my friends, until one of them ends up liking my favorite genres too!

How do we leverage this correlation observed in networks to help predict node labels?


Intuition: Similar nodes are typically close together or directly connected in the network

Approach:
Guilt-by-association: If I am connected to a node with label 𝑋, then I am likely to have label 𝑋 as well.

Example:
Malicious/benign web page: Malicious web pages link to one another to increase visibility, look credible, and rank higher in search engines.

The classification label of a node 𝑣 in the network may depend on:

  • Features of 𝑣
  • Labels of the nodes in 𝑣’s neighborhood
  • Features of the nodes in 𝑣’s neighborhood

So, how do we find the labels? Let’s understand the task first.

Given:
— Graph
— Few labeled nodes


Find: class (red/green) of remaining nodes

Main assumption: There is homophily in the network

Let’s try to solve it using semi-supervised learning approach.

Example task for such a graph:

  1. Let 𝑨 be an 𝑛×𝑛 adjacency matrix over the 𝑛 nodes
  2. Let 𝒀 ∈ {0, 1}ⁿ be a vector of labels:
    - 𝑌ᵥ = 1 means node 𝑣 belongs to Class 1
    - 𝑌ᵥ = 0 means node 𝑣 belongs to Class 0
    - The unlabeled nodes need to be classified
  3. Goal: Predict which unlabeled nodes are likely Class 1, and which are likely Class 0
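To make this setup concrete, here is a minimal toy construction of 𝑨 and 𝒀 in NumPy. The 6-node graph and its labels are hypothetical, not the graph from the course slides.

```python
import numpy as np

# Hypothetical toy graph with n = 6 nodes (not the graph from the course slides)
n = 6
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5)]

# n x n symmetric adjacency matrix A for an unweighted, undirected graph
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1

# Partial labels: 1 = Class 1, 0 = Class 0; nodes 2 and 3 are unlabeled
Y = {0: 1, 1: 1, 4: 0, 5: 0}
unlabeled = [v for v in range(n) if v not in Y]
```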

As mentioned earlier, this problem is solved by collective classification. Let’s learn how to perform collective classification.

Collective Classification

  1. Intuition: Simultaneous classification of interlinked nodes using correlations.
  2. Probabilistic framework
  3. Markov Assumption: The label 𝑌ᵥ of a node 𝑣 depends on the labels of its neighbors 𝑁ᵥ:

P(Yᵥ) = P(Yᵥ | Nᵥ)

Collective classification involves 3 steps:

  1. Local Classifier: Used for initial label assignment
    - Predicts label based on node attributes/features
    - Standard classification task
    - Does not use network information
  2. Relational Classifier: Capture correlations
    - Learns a classifier to label one node based on the labels and/or attributes of its neighbors
    - This is where network information is used
  3. Collective Inference: Propagate the correlation
    - Apply relational classifier to each node iteratively
    - Iterate until the inconsistency between neighboring labels is minimized
    - Network structure affects the final prediction

So far I have covered the basic concepts required to understand the message passing algorithm. Before coming back to the three message passing algorithms mentioned above, let’s define a problem statement that we will use as a running example.

Problem Statement

  1. How do we predict the labels 𝑌ᵥ for the unlabeled nodes 𝑣 (shown in grey)?
  2. Each node 𝑣 has a feature vector 𝑓ᵥ
  3. Labels for some nodes are given (1 for green, 0 for red)
  4. Task: Find 𝑃(𝑌ᵥ) given all the features and the network

We focus on semi-supervised node classification, and the intuition is based on homophily: similar nodes are typically close together or directly connected.

Relational Classification

Basic idea: The class probability 𝑌ᵥ of node 𝑣 is a weighted average of the class probabilities of its neighbors.

Steps:

  • For labeled nodes 𝑣, initialize the label 𝑌ᵥ with the ground-truth label 𝑌ᵥ*
  • For unlabeled nodes, initialize 𝑌ᵥ = 0.5
  • Update all nodes in a random order until convergence or until maximum number of iterations is reached
  • Update each node 𝑣 and label 𝑐 (e.g. 0 or 1) using the rule below
  • If edges have strength/weight information, 𝐴ᵥ,ᵤ can be the edge weight between 𝑣 and 𝑢
  • 𝑃(𝑌ᵥ = 𝑐) is the probability of node 𝑣 having label 𝑐
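The update rule applied at each step is a weighted average of the neighbors’ class probabilities:

P(𝑌ᵥ = 𝑐) = (1 / Σᵤ 𝐴ᵥ,ᵤ) · Σᵤ 𝐴ᵥ,ᵤ · P(𝑌ᵤ = 𝑐)

where both sums run over all edges (𝑣, 𝑢) ∈ 𝐸, i.e. over the neighbors 𝑢 of 𝑣.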

Challenges:

  • Convergence is not guaranteed
  • Model cannot use node feature information
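Before walking through the example, here is a minimal sketch of the procedure in NumPy. It reuses the hypothetical A, Y, and unlabeled objects from the earlier snippet; all names are illustrative, not part of the original lecture material.

```python
import numpy as np

def relational_classification(A, Y, unlabeled, num_iter=20, tol=1e-4):
    """Probabilistic relational classifier: repeatedly average neighbor probabilities."""
    n = A.shape[0]
    P = np.full(n, 0.5)              # P[v] = current estimate of P(Y_v = 1)
    for v, label in Y.items():
        P[v] = float(label)          # labeled nodes keep their ground-truth label

    for _ in range(num_iter):
        max_change = 0.0
        for v in np.random.permutation(unlabeled):      # random update order
            neighbors = np.nonzero(A[v])[0]
            new_p = A[v, neighbors] @ P[neighbors] / A[v, neighbors].sum()
            max_change = max(max_change, abs(new_p - P[v]))
            P[v] = new_p
        if max_change < tol:         # stop once the scores stabilize
            break
    return P

P = relational_classification(A, Y, unlabeled)
print((P > 0.5).astype(int))         # thresholded class predictions
```

Note that, exactly as listed under the challenges, nothing in this loop looks at node features, and there is no guarantee it exits before num_iter.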

Example:

Initialization:

  • All labeled nodes start with their ground-truth labels
  • All unlabeled nodes start at 0.5 (belonging to class 1 with probability 0.5)
  • Let 𝑃ᵧᵥ denote 𝑃(𝑌ᵥ = 1)

Update for the 1st iteration:

  • For node 3, 𝑁₃ = {1, 2, 4}
  • For node 4, 𝑁₄ = {1, 3, 5, 6}
  • For node 5, 𝑁₅ = {4, 6, 7, 8}
  • The updates are then repeated for iterations 2, 3, and 4 (each iteration is a round of updates for all unlabeled nodes)
  • All scores stabilize after 4 iterations. We therefore predict:
    - Nodes 4, 5, 8, 9 belong to class 1 (𝑃ᵧᵥ > 0.5)
    - Node 3 belongs to class 0 (𝑃ᵧᵥ < 0.5)

Iterative Classification

Relational classifiers do not use node attributes. How can we leverage them? The main idea of iterative classification is to classify a node 𝑣 based on its attributes 𝒇ᵥ as well as a summary 𝒛ᵥ of the labels of its neighbor set 𝑵ᵥ.

  • Input: Graph
    - 𝑓ᵥ: feature vector of node 𝑣
    - Some nodes 𝑣 are labeled with 𝑌ᵥ
  • Task: Predict the labels of the unlabeled nodes
  • Approach: Train two classifiers:
    - 𝜙₁(𝑓ᵥ) = predict the node label based on the node feature vector 𝑓ᵥ
    - 𝜙₂(𝑓ᵥ, 𝑧ᵥ) = predict the label based on the node feature vector 𝑓ᵥ and a summary 𝑧ᵥ of the labels of 𝑣’s neighbors

This raises the question of how to compute the summary 𝒛ᵥ of the labels of 𝑣’s neighbors 𝑵ᵥ.

  • Idea: 𝒛ᵥ is a vector that can contain, for example:
    - A histogram of the number (or fraction) of each label in 𝑁ᵥ
    - The most common label in 𝑁ᵥ
    - The number of different labels in 𝑁ᵥ
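As an illustration, a summary vector along these lines could be computed as follows. The helper is hypothetical: it assumes integer class labels, a dict neighbors_of mapping each node to its neighbor list, and a dict labels holding the current label of each node (None if unknown).

```python
from collections import Counter

def summarize_neighbor_labels(v, neighbors_of, labels, num_classes=2):
    """Summary vector z_v built from the current labels of v's neighbors."""
    neighbor_labels = [labels.get(u) for u in neighbors_of[v]]
    neighbor_labels = [y for y in neighbor_labels if y is not None]  # ignore unlabeled neighbors
    counts = Counter(neighbor_labels)
    histogram = [counts.get(c, 0) for c in range(num_classes)]       # count of each label in N_v
    most_common = counts.most_common(1)[0][0] if counts else 0       # most common label in N_v
    num_distinct = len(counts)                                       # number of different labels in N_v
    return histogram + [most_common, num_distinct]
```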

Now, let’s discuss the overall process of training iterative classifiers.

  1. Phase 1: Classify based on node attributes alone
    - On a training set, train two classifiers (e.g., linear classifiers, neural networks, …):
    - 𝜙₁(𝑓ᵥ) to predict 𝑌ᵥ based on 𝑓ᵥ
    - 𝜙₂(𝑓ᵥ, 𝑧ᵥ) to predict 𝑌ᵥ based on 𝑓ᵥ and the summary 𝑧ᵥ of the labels of 𝑣’s neighbors
  2. Phase 2: Iterate till convergence
    - On the test set, set the labels 𝑌ᵥ based on the classifier 𝜙₁, compute 𝑧ᵥ, and predict the labels with 𝜙₂
    - Repeat for each node 𝑣:
    - Update 𝑧ᵥ based on 𝑌ᵤ for all 𝑢 ∈ 𝑁ᵥ
    - Update 𝑌ᵥ based on the new 𝑧ᵥ (using 𝜙₂)
    - Iterate until the class labels stabilize or the maximum number of iterations is reached
    - Note: Convergence is not guaranteed
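Here is a compact sketch of the two phases, assuming scikit-learn-style classifiers for 𝜙₁ and 𝜙₂ and the summarize_neighbor_labels helper sketched above. All names and data structures are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_classifiers(features, neighbors_of, train_labels):
    """Phase 1: train phi1 on node features and phi2 on features + neighbor-label summary."""
    nodes = list(train_labels)
    X = np.array([features[v] for v in nodes])
    y = np.array([train_labels[v] for v in nodes])
    phi1 = LogisticRegression().fit(X, y)

    Z = np.array([summarize_neighbor_labels(v, neighbors_of, train_labels) for v in nodes])
    phi2 = LogisticRegression().fit(np.hstack([X, Z]), y)
    return phi1, phi2

def iterative_classification(phi1, phi2, features, neighbors_of, labels, num_iter=10):
    """Phase 2: bootstrap labels with phi1, then re-classify with phi2 until labels stabilize."""
    labels = dict(labels)                                # {node: 0/1, or None if unlabeled}
    unlabeled = [v for v, y in labels.items() if y is None]
    for v in unlabeled:                                  # initial assignment from phi1
        labels[v] = int(phi1.predict([features[v]])[0])

    for _ in range(num_iter):
        changed = False
        for v in unlabeled:
            z = summarize_neighbor_labels(v, neighbors_of, labels)     # update z_v
            new_label = int(phi2.predict([list(features[v]) + z])[0])  # update Y_v with phi2
            changed = changed or (new_label != labels[v])
            labels[v] = new_label
        if not changed:                                  # class labels have stabilized
            break
    return labels
```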

Let’s understand the above process with an example. We will use web-page classification.

  • Input: Graph of web pages
  • Node: Web page
  • Edge: Hyper-link between web pages
  • Directed edge: a page points to another page
  • Node features: Webpage description
    - For simplicity, we only consider 2 binary features
  • Task: Predict the topic of the webpage

Steps:

  • Baseline: train a classifier (e.g., a linear classifier) to classify pages based on the binary node attributes alone
  • Each node maintains a vector 𝒛ᵥ of neighborhood labels:
    - 𝐼 = incoming neighbor label information vector
    - 𝑂 = outgoing neighbor label information vector
    - 𝐼₀ = 1 if at least one of the incoming pages is labelled 0; 𝐼₁, 𝑂₀, and 𝑂₁ are defined analogously (see the sketch after these steps)
  • On a different training set, train two classifiers:
    - Node attribute vector only: 𝜙₁
    - Node attribute and link vectors: 𝜙₂
  • On the test set:
    - Use the trained node-feature classifier 𝜙₁ to set the labels 𝑌ᵥ
  • Update 𝑧ᵥ for all nodes
  • Re-classify all nodes with 𝜙₂
  • Continue until convergence:
    - Update 𝑧ᵥ
    - Update 𝑌ᵥ = 𝜙₂(𝑓ᵥ, 𝑧ᵥ)
  • Stop iterating after convergence or when the maximum number of iterations is reached
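For this web-page example, the link summary 𝒛ᵥ = [𝐼₀, 𝐼₁, 𝑂₀, 𝑂₁] for a page could be built along these lines. The helper and its inputs (in_neighbors, out_neighbors, labels) are hypothetical.

```python
def link_vectors(v, in_neighbors, out_neighbors, labels):
    """z_v = [I_0, I_1, O_0, O_1]: which labels appear among v's in/out neighbors."""
    I = [0, 0]
    O = [0, 0]
    for u in in_neighbors[v]:
        if labels.get(u) is not None:
            I[labels[u]] = 1          # I_c = 1 if at least one incoming page has label c
    for u in out_neighbors[v]:
        if labels.get(u) is not None:
            O[labels[u]] = 1          # O_c = 1 if at least one outgoing page has label c
    return I + O
```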

And that’s how the iterative classification approach works. Now let’s discuss the last approach, known as belief propagation.

Belief Propagation

Belief propagation is a dynamic programming approach to answering probability queries in a graph (e.g., the probability of node 𝑣 belonging to class 1). It is an iterative process in which neighboring nodes “talk” to each other by passing messages. Once a consensus is reached, the final beliefs are calculated.


Introduction:

  • Task: Count the number of nodes in a graph
  • Condition: Each node can only interact (pass message) with its neighbors

Example: Path Graph


Algorithm:

  • Define an ordering of nodes (that results in a path)
  • Edge directions are according to order of nodes
    - Edge direction defines the order of message passing
  • For node 𝑖 from 1 to 6
    - Compute the message from node 𝑖 to 𝑖 + 1 (number of nodes counted so far)
    - Pass the message from node 𝑖 to 𝑖 + 1
  • Condition: Each node can only interact (pass messages) with its neighbors
  • Solution: Each node listens to the message from its neighbor, updates it, and passes it forward (𝑚 denotes the message)
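For the path-graph counting task, the message is simply the running count; a minimal sketch:

```python
# Count the nodes of the path graph 1 - 2 - ... - 6 by passing a running count forward
nodes = [1, 2, 3, 4, 5, 6]
message = 0                  # m: the number of nodes counted so far
for i in nodes:
    message += 1             # node i adds itself to the count it received
    # node i then passes `message` to node i + 1; the last node ends up with the total
print(message)               # 6
```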

Generalizing to a tree

  • We can perform message passing not only on a path graph, but also on a tree-structured graph.
  • Define order of message passing from leaves to root.
  • Update beliefs in tree structure

Question: What message will 𝑖 send to 𝑗?

  • It depends on what 𝑖 hears from its neighbors
  • Each neighbor passes to 𝑖 a message containing its belief about the state of 𝑖

Let’s first define some notation before going into the algorithm:

  • Label-label potential matrix 𝝍 : Dependency between a node and its neighbor. 𝝍(𝑌ᵢ, 𝑌ⱼ) is proportional to the probability of a node 𝑗 being in class 𝑌ⱼ given that it has neighbor 𝑖 in class 𝑌ᵢ.
  • Prior belief 𝝓: 𝜙(𝑌ᵢ) is proportional to the probability of node 𝑖 being in class 𝑌ᵢ.
  • 𝑚ᵢ →ⱼ(𝑌ⱼ) is 𝑖’s message / estimate of 𝑗 being in class 𝑌ⱼ.
  • ℒ is the set of all classes/labels

Steps:

  • Initialize all messages to 1
  • Repeat for each node: recompute its outgoing messages from the messages it has received, using the update rule below
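The update rule has the standard sum-product form: node 𝑖 sums over its own possible states, weighting the label-label potential by its prior and by all messages it has received from neighbors other than 𝑗:

𝑚ᵢ→ⱼ(𝑌ⱼ) = Σ_{𝑌ᵢ ∈ ℒ} 𝝍(𝑌ᵢ, 𝑌ⱼ) · 𝝓ᵢ(𝑌ᵢ) · ∏_{𝑘 ∈ 𝑁ᵢ \ 𝑗} 𝑚ₖ→ᵢ(𝑌ᵢ),   for every class 𝑌ⱼ ∈ ℒ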

After convergence:

  • 𝑏ᵢ(𝑌ᵢ) = node 𝑖’s belief of being in class 𝑌ᵢ
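Concretely, the belief combines node 𝑖’s prior with all of its incoming messages:

𝑏ᵢ(𝑌ᵢ) = 𝝓ᵢ(𝑌ᵢ) · ∏_{𝑗 ∈ 𝑁ᵢ} 𝑚ⱼ→ᵢ(𝑌ᵢ),   for every class 𝑌ᵢ ∈ ℒ

As an illustration, here is a minimal (loopy) belief propagation sketch in Python. The inputs are hypothetical: neighbors_of maps each node to its neighbors, psi[(i, j)] is the label-label potential matrix (a NumPy array) for the ordered pair (i, j), and phi[i] is node i’s prior vector.

```python
import numpy as np

def loopy_bp(neighbors_of, psi, phi, num_classes=2, num_iter=20):
    """Sum-product (loopy) belief propagation with discrete labels."""
    # messages[(i, j)][c] = m_{i -> j}(Y_j = c), initialized to 1 for every directed edge
    messages = {(i, j): np.ones(num_classes)
                for i in neighbors_of for j in neighbors_of[i]}

    for _ in range(num_iter):
        new_messages = {}
        for (i, j) in messages:
            m = np.zeros(num_classes)
            for yj in range(num_classes):
                for yi in range(num_classes):
                    # product of the messages i has received from all neighbors except j
                    incoming = np.prod([messages[(k, i)][yi]
                                        for k in neighbors_of[i] if k != j])
                    m[yj] += psi[(i, j)][yi, yj] * phi[i][yi] * incoming
            new_messages[(i, j)] = m / m.sum()     # normalize for numerical stability
        messages = new_messages

    # final beliefs: prior times the product of all incoming messages
    beliefs = {}
    for i in neighbors_of:
        b = np.array(phi[i]) * np.prod([messages[(j, i)] for j in neighbors_of[i]], axis=0)
        beliefs[i] = b / b.sum()
    return beliefs
```

On a tree this converges exactly; on graphs with cycles it is only a heuristic, which is exactly the caveat discussed below.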

Let’s understand this with an example, this time on a graph that contains cycles. Running belief propagation on a graph with cycles is also called loopy belief propagation.


Messages from different subgraphs are no longer independent!

We can still run BP, but it will pass messages around the loops.

  • Beliefs may not converge
    - Message 𝑚ᵤ→ᵢ(𝑌ᵢ) is based on the initial belief of 𝑖, not on independent evidence about 𝑖
    - The initial belief of 𝑖 (which could be incorrect) is reinforced by the cycle i → 𝑗 → 𝑘 → 𝑢 → 𝑖
  • However, in practice, Loopy BP is still a good heuristic for complex graphs which contain many branches.

Challenges:

  • Messages loop around and around: 2, 4, 8, 16, 32, … The nodes become more and more convinced that these variables are T (true)!
  • BP incorrectly treats such a message as separate evidence that the variable is T (true).
  • It multiplies the two messages as if they were independent.
    - But they do not actually come from independent parts of the graph.
    - One influenced the other (via a cycle).
  • Advantages:
    - Easy to program and parallelize
    - General: can be applied to any graph model with any form of potentials
    - Potentials can be higher order, e.g. 𝝍(𝑌ᵢ, 𝑌ⱼ, 𝑌ₖ, 𝑌ᵥ, …)
  • Issues:
    - Convergence is not guaranteed (when to stop?), especially if there are many closed loops
    - The potential functions (parameters) require training to estimate
That’s all for the message passing algorithm. Next, we will discuss graph neural networks in an upcoming blog. Stay tuned!

Dhaval Taunk
Analytics Vidhya

MS by Research @IIITH, Ex Data Scientist @ Yes Bank | Former Intern @ Haptik, IIT Guwahati | Machine Learning | Deep Learning | NLP