A Gentle Introduction To Graph Spectral Filtering

Nishant Rajadhyaksha
10 min read · Jun 7, 2022


Cover image: An Illustrated Guide to Graph Neural Networks – DAIR.AI

This blog post is intended to give readers a general idea of graph spectral filtering. I aim to make this topic more approachable for readers who are just getting into Graph Neural Networks (GNNs) by walking through the requisite equations in an intuitive manner.

This blog post borrows heavily from the amazing works of Aleksa Gordić and Boris Knyazev, which I recommend checking out to get a firmer grasp on graph spectral filtering and other graph machine learning topics.

Finally, the book Deep Learning on Graphs by Yao Ma and Jiliang Tang has been very influential in the writing of this blog post.

So without further ado let's get started.

GNNs have taken the world by storm. GNN systems are being deployed in fields ranging from particle physics, biology, and chemistry to social media applications and recommender systems. Most notably, GNNs were used in DeepMind's groundbreaking AlphaFold 2 for the protein folding problem.

Graph neural networks can be viewed as a process of representation learning on graphs. For node-focused tasks, GNNs target learning good features for each node such that node-focused tasks can be facilitated. For graph-focused tasks, they aim to learn representative features for the entire graph, where learning node features is typically an intermediate step. GNN architectures generally consist of two operations:-

  • Graph filtering operations:- generally focussed on node-related tasks.
  • Graph pooling operations:- generally focussed on graph-related tasks.

Fig 1.1: The image indicates the graph filtering operation, where node representations are refined by a graph filter. A represents the adjacency matrix of the graph, whilst F(if) and F(of) represent the input and output feature matrices of dimensions d(if) and d(of) respectively.
Fig 1.2: The image indicates the graph pooling operation, where graph representations are generated by pooling node representations. A(ip) and A(op) represent the adjacency matrices of the input and output graphs with N(ip) and N(op) nodes respectively, whilst F(ip) and F(op) represent the input and output feature matrices of dimensions d(ip) and d(op) respectively.

There are various perspectives on designing graph filters, which can be roughly split into two categories: (1) spatial-based graph filters and (2) spectral-based graph filters. Spatial-based filters work directly with the structure of the graph; a discussion of spatial-based filters is out of scope for this blog post. To talk about spectral-based filters, we will first require some knowledge of graph spectral theory.

Graph Spectral Theory

Spectral graph theory studies the properties of a graph by analyzing the eigenvalues and eigenvectors of its Laplacian matrix. Here I introduce an arbitrary sample graph G = {V, E}, with A as its adjacency matrix and V, E representing the vertices and edges of the graph respectively. The sample graph is shown below. The given graph has 5 nodes, with the undirected edges represented by the adjacency matrix shown in Fig 2.1.

Fig 2: An undirected sample graph with 5 nodes.
Fig 2.1: The 5×5 adjacency matrix for the sample graph.
Fig 2.2: The 5×5 Laplacian matrix for the sample graph.

The term spectral, whilst sounding intricate, generally implies breaking down signals into simple linear components. When we talk about the term “spectral” in the signal/image processing domain, we generally mean transforming the signal to the frequency domain by means of a Fourier transform, where the signal can be represented as a combination of simple sine and cosine waves and hence can be filtered appropriately. “Spectral” in the graph domain, however, denotes the eigendecomposition of the graph Laplacian matrix into simpler orthonormal basis components. The Laplacian matrix of a graph plays a central role in graph spectral theory.

The Laplacian matrix is generally represented as:

L = D − A

where D is the diagonal degree matrix D = diag(d(v₁), …, d(vₙ)), with d(vᵢ) describing the degree of node vᵢ for i = 1, 2, …, n for n nodes in the graph, and A is the n × n graph adjacency matrix.

A normalised version of the Laplacian matrix is generally used in practice to keep computations from blowing up after successive neural network layers. A normalised Laplacian matrix can be constructed by the following formula:-

L(norm) = D^(−1/2) · (D − A) · D^(−1/2) = I − D^(−1/2) · A · D^(−1/2)
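To make this concrete, here is a minimal numpy sketch that builds the Laplacian and its normalised version. Since the exact edge set of the sample graph in Fig 2 lives in the original image, the adjacency matrix below is an assumed stand-in (a 5-node cycle):

```python
import numpy as np

# Assumed 5-node undirected graph (a cycle); the true edge set of Fig 2
# is in the original figure, so treat this adjacency matrix as illustrative.
A = np.array([[0, 1, 0, 0, 1],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [1, 0, 0, 1, 0]], dtype=float)

D = np.diag(A.sum(axis=1))                 # diagonal degree matrix
L = D - A                                  # graph Laplacian

# Normalised Laplacian: D^(-1/2) · (D - A) · D^(-1/2)
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D)))
L_norm = D_inv_sqrt @ L @ D_inv_sqrt
```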

The Laplacian matrix is symmetric as both A and D are symmetric matrices. The Laplacian matrix has several useful properties like the following:-

Let f denote a vector where its ith element f[i] is associated with node vᵢ. Multiplying L by f, we get a new vector h:-

h = L · f

which results in

h[i] = d(vᵢ) · f[i] − Σ_{vⱼ ∈ N(vᵢ)} f[j]

Similarly,

f^T · L · f = ½ · Σᵢ Σ_{vⱼ ∈ N(vᵢ)} (f[i] − f[j])²

where N(vᵢ) indicates the set of neighbours of node vᵢ. Intuitively, these equations can be interpreted as calculating the difference of values between adjacent nodes, hence giving us an idea of how smooth a signal is over the graph. Lower differences between the node values result in a smoother signal, whilst higher differences between node values generally result in a coarser one.
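Continuing the numpy sketch from above, we can check this smoothness interpretation numerically; the feature values here are arbitrary:

```python
f = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # one (arbitrary) value per node

h = L @ f   # h[i] = d(vᵢ)·f[i] − Σ_{vⱼ ∈ N(vᵢ)} f[j]

# The quadratic form f^T·L·f equals ½·Σᵢ Σⱼ (f[i] − f[j])² over edges:
# a small value means a smooth signal, a large value a coarse one.
print(f @ L @ f)   # 20.0 for the assumed cycle graph above
```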

Graph Fourier Transform

In many real-world graphs, there are often features or attributes associated with the nodes. This kind of graph-structured data can be viewed as graph signals, which capture both the structure information (or connectivity between nodes) and the data (or attributes at nodes). A graph signal consists of a graph G = {V, E} and a mapping function f defined on the node domain, which maps the nodes to real values. Mathematically, the mapping function can be represented as:-

f : V → ℝ^d

where d is the dimension of the value (vector) associated with each node. Without loss of generality, I set d = 1 and denote the mapped values for all nodes as the vector f, with f[i] corresponding to node vᵢ, for the remaining sections.

The graph signal can be represented in two domains, i.e., the spatial domain and the spectral domain (or frequency domain). The spectral domain of graph signals is based on the graph Fourier transform. We can convert representations from the spatial domain to the spectral domain using the graph Fourier transform and vice versa by applying the inverse graph Fourier transform.

The graph Fourier transform of a graph signal f on graph G can be represented as

f̂[l] = ⟨f, uₗ⟩ = Σᵢ f[i] · uₗ[i]

where uₗ is the lth eigenvector of the Laplacian matrix L of the graph. The corresponding eigenvalue λₗ represents the frequency, or the smoothness, of the eigenvector uₗ. In matrix form, the graph Fourier transform can be denoted as:-

f̂ = U^T · f

where the lth column of the matrix U is uₗ. The eigenvalue λₗ measures the smoothness of the corresponding eigenvector uₗ. More specifically, the eigenvectors associated with small eigenvalues vary slowly across the graph; i.e., the values of the eigenvector at connected nodes are similar. However, the eigenvectors corresponding to large eigenvalues may have very different values on two nodes even if they are connected. The eigenvectors uₗ represent the Fourier basis, and the coefficients obtained by applying the transform tell us how much each Fourier basis component contributes to the input signal.

For our graph in Fig 2, let us consider an example feature vector f = [1, 2, 3, 4, 5] for nodes 1 to 5 respectively. The following figures present the results of the graph Fourier transform.

Fig 3.1: Plot of eigenvalues (frequencies) vs the Fourier basis produced by the eigendecomposition of the Laplacian matrix for the sample graph of Fig 2, with feature vector [1, 2, 3, 4, 5] whose elements correspond to each node value respectively.
Fig 3.2: Plot of the Fourier transform coefficients vs the Fourier basis after applying the transform to the sample graph of Fig 2, with feature vector [1, 2, 3, 4, 5] whose elements describe each node value respectively.
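The plots above come from an eigendecomposition of the Laplacian. A small numpy sketch of the same computation, on the assumed adjacency matrix from earlier, looks like this:

```python
# eigh handles symmetric matrices: eigenvalues come back in ascending
# order, and the columns of U are orthonormal eigenvectors (the Fourier basis).
lam, U = np.linalg.eigh(L)

f = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

f_hat = U.T @ f      # graph Fourier transform: one coefficient per basis vector
f_back = U @ f_hat   # inverse transform recovers the original signal

print(np.allclose(f_back, f))   # True
```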

Spectral-Based Graph Filters

In this section, we finally discuss spectral-based graph filtering methods. As in the signal/image processing domain, the basic idea remains unchanged: transform the features to a different space (the frequency domain), perform the relevant filtering operations, and transform the features back to the original domain. The whole workflow of graph spectral filtering proceeds in the stages described below.

Let us walk through the given stages step by step.

Firstly, we decompose the original features by applying a graph Fourier transform, utilising the equation

f̂ = U^T · f

To modulate/filter the frequencies of the signal f, the graph Fourier coefficients are filtered as follows:

f̂’[i] = γ(λᵢ) · f̂[i]

where γ(λᵢ) is a parameterised filtering function which can be learned. The whole operation can be expressed by the equation

f’ = U · γ(Λ) · U^T · f

where Λ is the diagonal matrix consisting of the frequencies (eigenvalues of the Laplacian matrix) and γ(Λ) = diag(γ(λ₁), …, γ(λₙ)).

After applying the required filtering operation to get the desired features, we can finally transform the new features back into our original domain by performing an inverse graph Fourier transform (IGFT).

f’ = U · f̂’ (the inverse graph Fourier transform equation)

where f’ represents the final filtered features. The entire filtering procedure can therefore be modelled by applying the operator U · γ(Λ) · U^T to the original graph features f.
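Putting the three stages together, here is a sketch of the full pipeline, continuing the running numpy example; the low-pass choice of γ below (exponentially damping high frequencies) is just an illustrative stand-in, not a learned filter:

```python
def spectral_filter(L, f, gamma):
    """Filter a graph signal f by modulating the spectrum of the Laplacian L."""
    lam, U = np.linalg.eigh(L)
    f_hat = U.T @ f                       # 1. graph Fourier transform
    f_hat_filtered = gamma(lam) * f_hat   # 2. scale each frequency by γ(λᵢ)
    return U @ f_hat_filtered             # 3. inverse graph Fourier transform

# Example: a low-pass filter that keeps the smooth components of the signal.
f_smooth = spectral_filter(L, f, lambda lam: np.exp(-2.0 * lam))
```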

The choice of the filter function γ is critical in determining the properties of the overall graph filter and the extracted features. Just like classical neural networks, graph filters can be learned in a data-driven way. Specifically, we can model γ with certain functions and then learn their parameters with supervision from data.

Some choices for the γ function are explored as follows:-

One filter we can consider places learnable parameters θᵢ on the diagonal of the matrix in place of the eigenvalues λᵢ, which finally looks like the matrix below:

γ(Λ) = diag(θ₁, θ₂, …, θₙ)

However, there are drawbacks to this filter:

  • The number of parameters to be learned scales with the number of nodes in the graph, which becomes a problem when we have a lot of nodes in our graph.
  • The matrix for the operator U · γ(Λ) · U^T becomes a dense matrix, which means that calculating the ith element of the output vector f’ could involve all the nodes in the graph.

Poly-Filter

To avoid the drawbacks mentioned above, we may use a polynomial filter operator, which we denote as the Poly-Filter. In matrix form, the filter can be written as

γ(Λ) = Σₖ₌₀ᴷ θₖ · Λᵏ

A nice property of Poly-Filters is that the number of parameters utilised is K + 1, which is independent of the number of nodes.

Furthermore, U · γ(Λ) · U^T can be simplified to a polynomial of the Laplacian matrix, which means that no eigendecomposition is required. Since U · Λᵏ · U^T = Lᵏ, this can be represented as

U · γ(Λ) · U^T · f = Σₖ₌₀ᴷ θₖ · Lᵏ · f

Here K determines the K-hop neighbourhood of nodes that is included when determining the new feature vector. The Poly-Filter is localised in the spatial domain because it only involves K-hop neighbourhoods when calculating the output signal value for a specific node. One major drawback of Poly-Filters is that the coefficients are dependent on each other, making them unstable under perturbation during the learning process. In other words, an update in one coefficient may lead to changes in other coefficients. To address this drawback we utilise the Cheby-Filter, based on the Chebyshev polynomials.
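A minimal sketch of the Poly-Filter, exploiting the fact that Σₖ θₖ · Lᵏ · f can be computed with repeated matrix-vector products and no eigendecomposition; the θ values below are arbitrary placeholders rather than learned parameters:

```python
def poly_filter(L, f, theta):
    """Apply γ(L)·f = Σ_k θ_k · L^k · f without any eigendecomposition."""
    out = np.zeros_like(f)
    Lk_f = f.copy()          # starts as L^0 · f = f
    for th in theta:
        out += th * Lk_f
        Lk_f = L @ Lk_f      # step up to the next power of L
    return out

# K = 2, so each output value depends only on the 2-hop neighbourhood.
f_poly = poly_filter(L, f, theta=[0.5, -0.3, 0.1])
```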

Cheby-Filter

Chebyshev polynomials follow the recursive relation:-

Tₖ(y) = 2y · Tₖ₋₁(y) − Tₖ₋₂(y)

with T₀(y) = 1 and T₁(y) = y, for y ∈ [−1, 1]. Because the domain of the Chebyshev polynomials is [−1, 1], to approximate the filter with the Chebyshev polynomials we rescale and shift the eigenvalues of the Laplacian matrix as follows:

λ̃ₗ = (2 · λₗ / λ(max)) − 1

which in matrix form corresponds to the rescaled Laplacian L̃ = (2 / λ(max)) · L − I, where λ(max) denotes the largest eigenvalue of L.

The parameterised Chebyshev filter can be applied to the original graph signal as follows:

f’ = Σₖ₌₀ᴷ θₖ · Tₖ(L̃) · f
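A sketch of the Cheby-Filter using the recursion directly on vectors, so again no eigendecomposition of L is needed at filtering time (only λ(max), computed here exactly for clarity); the θ values are again placeholders:

```python
def cheby_filter(L, f, theta):
    """Apply Σ_k θ_k · T_k(L̃) · f with L̃ = 2L/λ_max − I via the recursion."""
    lam_max = np.linalg.eigvalsh(L).max()
    L_tilde = (2.0 / lam_max) * L - np.eye(L.shape[0])
    T_prev, T_curr = f.copy(), L_tilde @ f        # T₀(L̃)·f and T₁(L̃)·f
    out = theta[0] * T_prev
    for th in theta[1:]:
        out += th * T_curr
        T_prev, T_curr = T_curr, 2.0 * (L_tilde @ T_curr) - T_prev  # recursion
    return out

f_cheby = cheby_filter(L_norm, f, theta=[0.5, -0.3, 0.1])
```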

The Cheby-Filter can be reduced to the Graph Convolutional Network (GCN) filter. We can produce a GCN filter by setting K = 1 and approximating λ(max) as 2, using the normalised Laplacian so that L = I − D^(−1/2) · A · D^(−1/2). Under this simplification, the Cheby-Filter equation reduces to

f’ = θ₀ · f − θ₁ · D^(−1/2) · A · D^(−1/2) · f = θ · (I + D^(−1/2) · A · D^(−1/2)) · f

where θ₀ = −θ₁ = θ. We can train these filters for further downstream tasks by optimising the parameter values so that they produce the best node representations for the downstream task.
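As a final sketch, the reduced GCN filter needs only the adjacency and degree matrices; θ here is a single scalar placeholder (for multidimensional features it would become a learned weight matrix):

```python
def gcn_filter(A, f, theta):
    """Simplified Cheby-Filter (K = 1, λ_max ≈ 2): θ·(I + D^(-1/2)·A·D^(-1/2))·f."""
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
    return theta * (f + D_inv_sqrt @ A @ D_inv_sqrt @ f)

f_gcn = gcn_filter(A, f, theta=0.8)
```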

Conclusion

In this blog post, we have discussed several terminologies that help us understand a graph, whilst also trying to build up a general understanding of the spectral domain as it relates to graphs. We went through the basics of graph spectral theory and grasped elementary graph filtering concepts. Note that we have been talking about graph features in a single dimension to facilitate an easier understanding of the given topics. Similar methods exist for the multidimensional graph features we encounter in many real-life cases.

I hope this was a helpful introduction to the arduous world of graph spectral filtering. If I have missed anything or gone wrong anywhere, please do let me know. Thank you for reading 😊.
