Spectral-Biased Random Walks on Graphs

Charu Sharma
Published in The Startup · Jul 18, 2020

Overview

Graph embedding methods have gained prominence in a wide variety of tasks, including pattern recognition, low-dimensional embedding, clustering, anomaly detection, node classification, and link prediction. A graph can be mapped to a low-dimensional space that encodes the graph's information and yields a representation for each node. Given an input graph G=(V,E), embedding the graph means mapping its node feature matrix to a d-dimensional continuous vector space. Such continuous inputs are what machine learning and deep learning methods require. In machine learning, the task of producing graph embeddings therefore entails capturing local and global graph statistics and encoding them as vectors that best preserve these statistics in a computationally efficient manner.

We present our work from our IJCNN 2020 paper, “Learning Representations using Spectral-Biased Random Walks on Graphs”, which focuses on the link prediction problem: given a graph, predict whether two nodes are likely to have a link between them. Link prediction has several applications in social networks, biological networks, knowledge graphs, and more.


Challenges

There are quite a few heuristic methods that perform well, but they rest on strong assumptions about the likelihood of links. These assumptions can be beneficial in the case of social networks, but they do not generalize well to arbitrary networks. Some of the latest GNN-based models outperform heuristic methods, but they require either (a) an enclosing subgraph (i.e., a fixed-size neighborhood) around each edge, which is prohibitively expensive and restricts them to small, sparse graphs, or (b) random walks. The appeal of random walks is that they concisely capture the underlying graph structure surrounding a vertex. However, they cannot guarantee that vertices with similar k-hop neighborhoods are grouped together, so vertices with structurally similar neighborhoods can easily be missed. In our work, we incorporate such a grouping of structurally similar nodes directly into our random walks.

Our Method

Spectra of Vertex Neighborhoods

Given a vertex v and a fixed integer k > 0, the graph neighborhood G(v,k) of v is the subgraph induced by the k closest vertices (i.e., in terms of shortest paths on G) that are reachable from v.
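As a rough illustration of this construction (a sketch assuming an unweighted graph and using networkx; the helper name and the tie-breaking among equidistant vertices are ours, not the paper's):

import networkx as nx

def k_closest_neighborhood(G, v, k):
    """Induce the subgraph G(v, k) on the k vertices closest to v in
    shortest-path distance (illustrative sketch; ties are broken arbitrarily)."""
    # shortest-path lengths from v to every vertex reachable from it
    dist = nx.single_source_shortest_path_length(G, v)
    dist.pop(v, None)  # exclude v itself
    # keep the k reachable vertices nearest to v
    closest = sorted(dist, key=dist.get)[:k]
    return G.subgraph(closest)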

Now, the graph neighborhood G(v,k) of a vertex v is represented in matrix form by its normalized Laplacian matrix L^(v) = (l_ij), where i, j = 1, …, k, so that L^(v) is a k × k matrix. Given L^(v), its sequence of k real eigenvalues ( λ_1(L^(v)) ≥ … ≥ λ_k(L^(v)) ) is known as the spectrum of the neighborhood L^(v) and is denoted by σ(L^(v)). We also know that all the eigenvalues in σ(L^(v)) lie in an interval Ω ⊂ R.
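Continuing the sketch above, the spectrum of a node's neighborhood can be computed, for illustration, as follows (numpy for the eigenvalues; the exact neighborhood construction follows the paper):

import numpy as np
import networkx as nx

def neighborhood_spectrum(G, v, k):
    """Eigenvalues of the normalized Laplacian of G(v, k), sorted in
    descending order, i.e. sigma(L^(v)) (illustrative sketch)."""
    Gv = k_closest_neighborhood(G, v, k)       # helper from the previous sketch
    L = nx.normalized_laplacian_matrix(Gv).toarray()
    eigvals = np.linalg.eigvalsh(L)            # real eigenvalues, ascending
    return eigvals[::-1]                       # descending, as written above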

Let μ denote the probability measure on Ω associated with the spectrum σ(L^(v)), defined by placing a Dirac mass at each eigenvalue in the spectrum. Furthermore, let P(Ω) denote the set of probability measures on Ω. We now define the p-th Wasserstein distance between measures, which will be used later to define our distance between node neighborhoods.
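For two measures μ, ν ∈ P(Ω), the p-th Wasserstein distance takes the standard form

W_p(μ, ν) = ( inf_{π ∈ Π(μ, ν)} ∫_{Ω×Ω} |x − y|^p dπ(x, y) )^{1/p},

where Π(μ, ν) is the set of couplings (joint measures) with marginals μ and ν. For one-dimensional measures such as our spectral measures, this infimum reduces to comparing the sorted eigenvalue sequences.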

We now define the spectral distance between two vertices u and v, with respective neighborhood Laplacians L^(u) and L^(v), as
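d(u, v) = W_p( μ^(u), μ^(v) ),

i.e., presumably the p-th Wasserstein distance between the spectral measures μ^(u) and μ^(v) of the two neighborhoods. A minimal sketch of this distance, assuming SciPy and equal-length spectra (the exact choice of p and any normalization follow the paper):

import numpy as np
from scipy.stats import wasserstein_distance

def spectral_distance(spec_u, spec_v, p=2):
    """p-Wasserstein distance between two neighborhood spectra of equal
    length k (illustrative sketch). For 1-D empirical measures with the
    same number of atoms this reduces to a p-norm between sorted values."""
    lu, lv = np.sort(spec_u), np.sort(spec_v)
    if p == 1:
        # SciPy computes the 1-Wasserstein (earth mover's) distance directly
        return wasserstein_distance(lu, lv)
    return float(np.mean(np.abs(lu - lv) ** p) ** (1.0 / p))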

Spectral-Biased Random Walk on Vertex Neighborhoods

We introduce a bias based on the spectral distance between vertices (as shown in the above Equation) in our random walks. When moving from a vertex v to an adjacent vertex v’ in the 1-hop neighborhood N(v) of vertex v, vertices in N(v) which are most structurally similar to v are favored. The most structurally similar vertex to v is given by
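In terms of the spectral distance defined above, this is presumably the neighbor whose spectrum is closest to that of v:

v* = argmin_{v′ ∈ N(v)} W_p( μ^(v), μ^(v′) ).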

Then, our spectral-biased walk is a random walk in which each step is preceded by the following decision. Standing at vertex i, the walk transitions with probability 1-ε to an adjacent vertex j in N(i) chosen uniformly at random, and with probability ε it transitions to the next vertex with probability w_ij given by the bias matrix. Our algorithm decides the next vertex to move to based on the transition matrix passed to it as input.
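A minimal sketch of one such step, assuming the bias probabilities w_ij over N(i) have already been computed from the spectral distances (the row-normalization of the bias matrix W and the full walk loop follow the paper's algorithm):

import numpy as np

def spectral_biased_step(neighbors, bias_row, eps, rng=None):
    """One step of the spectral-biased walk from the current vertex
    (illustrative sketch). `neighbors` is the list N(i); `bias_row`
    holds the probabilities w_ij over those neighbors."""
    rng = rng or np.random.default_rng()
    if rng.random() < eps:
        # with probability eps: biased move towards structurally similar vertices
        return rng.choice(neighbors, p=bias_row)
    # with probability 1 - eps: uniform move over the 1-hop neighborhood
    return rng.choice(neighbors)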

Thus, our new spectral-biased transition matrix can be written more succinctly as
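(1-ε) P + ε W,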

where P is the original transition matrix for the simple random walk and W contains the biased transition probabilities we introduce to move towards a structurally similar vertex.

Example

Given a graph with initial vertex v_3, our spectral-biased random walk transitions with probability 1-ε to an adjacent vertex chosen uniformly at random according to the transition matrix P, and with probability ε it transitions to the next vertex with probability w_ij from the bias matrix W. In this example, we set the number of top structurally similar nodes kept in each neighborhood to 2.

Packing Density of Spectrally Similar Nodes

We studied the packing density of spectrally similar nodes in fixed-length walks generated by both the spectral-biased and the simple random walk methods. The figures below clearly show that our spectral-biased walk packs in a higher number of spectrally similar nodes.

Percentage of spectrally-similar nodes packed in walks of varying length for Random Walk (RW) versus Spectral-biased Walk (SW).

Given a start vertex v, a user-defined fixed constant c, and its surrounding Wasserstein ball B_w(v;c), we also found that our spectral-biased walk covers all spectrally similar vertices in the ball with much shorter walks than simple random walks, as shown in the following figure.

Walk lengths needed to cover the entire ball of vertices on Celegans, USAir, Infect-hyper, and PPI for our spectral-biased walk (SW) and the simple random walk (RW).

Our Neural Language Model with Wasserstein Regularization

Our approach to learning node embeddings is to use a shallow neural network. This network takes spectral-biased walks as input and predicts either the node labels for node classification or the likelihood of an edge / link between a pair of nodes for the link prediction task.

We consider a vertex as a word, a walk as a paragraph / sentence, and the entire graph as a document. Two walks are said to co-occur when they originate from the same node. Originating from each node v ∈ V, we generate K co-occurring spectral-biased random walks W^(v) = ( W_1^(v), …, W_K^(v) ), each of fixed length T. A family of all W^(v) for all v ∈ V is analogous to a collection of paragraphs in a document.

In our framework, each vertex v is mapped to a unique word vector w, represented as a column in a matrix W. Similarly, each biased walk w is mapped to a unique paragraph vector p stored as a column in a matrix P.
Given a spectral-biased walk as a sequence of words w_1, w_2, …, w_T, our objective is to minimize the following cross-entropy loss
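In the standard paragraph-vector (doc2vec) formulation that our model builds on, with a context window of size c, this loss reads roughly

− (1/T) Σ_{t=c}^{T−c} log p( w_t | w_{t−c}, …, w_{t+c} ),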

where the probability is typically given by the softmax function
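p( w_t | w_{t−c}, …, w_{t+c} ) = exp( y_{w_t} ) / Σ_i exp( y_i ).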

Each y is the unnormalized log probability for w_i, given as
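y = b + U h( w_{t−c}, …, w_{t+c} ; W, P ),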

where U,b are softmax parameters, and h is constructed from W and P. A paragraph vector can be imagined as a word vector that is cognizant of the context information encoded in its surrounding paragraph, while a normal word vector averages this information across all paragraphs in the document. For each node v ∈ V, we apply 1d-convolution to all the paragraphs / walks in W^(v), to get a unique vector x_v.

We construct the following loss function with a Wasserstein regularization term to learn node embeddings
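Gathering the terms described below, the overall objective presumably takes the form

L_ov = Σ_{v ∈ V} [ L_class + γ · W_2( σ^(s)(x_v), σ^(s)(y_v) ) ].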

Here, x_v is the node embedding learned from the paragraph vector model and y_v is the 1d-convolution of node v’s 1-hop neighbor embeddings. L_class represents a task-dependent classifier loss which is set to mean-square error (MSE) for link prediction and cross-entropy loss for node classification. We convert the node embedding x_v and its combined 1-hop neighborhood embedding y_v into probability distributions via the softmax function, denoted by σ^(s) in the above Equation.

Our regularization term is the 2-Wasserstein distance between the two probability distributions, where γ is the regularization parameter. This regularizer penalizes neighboring nodes whose neighborhoods do not bear structural similarity with the neighborhood of the node in question. Finally, the overall loss L_ov is minimized across all nodes in G to arrive at final node embeddings.

Training

The training setup of our method is as follows. First, we construct a 2-hop neighborhood around each node for the spectra computation. The probability p is set to 0.6, the walk length to W=100, and we generate 50 walks per node in step one of the spectral-biased walk generation. Second, the context window size is C=10 and the regularization parameter γ ranges from 1e-6 to 1e-8 for all the link prediction results. The link prediction model used to compute the final AUC is trained for 100 to 200 epochs, depending on the dataset. The dimension of the node embeddings is set to 128 in all cases, and a single-layer neural network is used as the classifier.
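Collected in one place, a minimal sketch of this setup (the key names below are illustrative and are not the repository's actual configuration keys):

# Illustrative hyperparameter summary -- key names are ours, not the repo's
config = {
    "neighborhood_hops": 2,       # k-hop neighborhood used for spectra computation
    "walk_bias_p": 0.6,           # probability p in the walk generation step
    "walk_length": 100,           # W
    "walks_per_node": 50,
    "context_window": 10,         # C
    "wasserstein_gamma": 1e-7,    # gamma, swept between 1e-6 and 1e-8
    "epochs": (100, 200),         # trained for 100-200 epochs depending on dataset
    "embedding_dim": 128,
    "classifier": "single-layer neural network",
}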

Reproducing the results

The code for our paper, with the datasets needed to reproduce the results, is provided in this GitHub repository. Following are the steps to reproduce the results:

1. Clone the repository and set up the environment described in our GitHub repository.
git clone https://github.com/charusharma1991/LinkPred.git

2. The End2End directory contains the end-to-end pipeline to run the code. Run the following command to reproduce the results on the Celegans dataset.

bash end2end.sh

The bash file currently has code for the Celegans dataset, provided as an example. You can change the dataset name and the corresponding hyperparameters from the table given in the paper to reproduce the other results. This directory has the complete pipeline: generating the spectral-biased walks, running the par2vec model to learn the primary node embeddings, and finally running the link prediction model.

3. The src directory contains the code along with precomputed spectral-biased random walks and the paragraph vector model output. Run the following command to run the link prediction part of the entire pipeline.

python LP.py

LP.py is the Python script that runs the link prediction model on all datasets and stores the results in a text file inside the result_logs/LP directory.

Citation

Please cite the paper if you use this code.

@article{sharma2020learning,
  title={Learning Representations using Spectral-Biased Random Walks on Graphs},
  author={Sharma, Charu and Chauhan, Jatin and Kaul, Manohar},
  journal={arXiv preprint arXiv:2005.09752},
  year={2020}
}

Thanks for reading this article. For any query or suggestions, please write to charusharma1991@gmail.com.

Charu Sharma

I’m a fourth-year Ph.D. student at IIT Hyderabad. My interests include Machine Learning, Topological Data Analysis, and Deep Learning.