Generating Global Graph Structure Using Local 1D Scalar Embeddings

Ricky Fok
Ricky Fok
Jun 11, 2019 · 4 min read

Random walks have been used to capture topological features (i.e. patterns of edges) of graphs since DeepWalk. It is done to obtain the graph embeddings from the samples taken along the random walks. The embeddings are a set of multivariate function on each of the nodes, S = {f(1),…, f(i),…,f(N)}. The typical number of dimensions of an embedding is in the range of [100,1000].

We managed to theoretically derive a set of 1-dimensional node embeddings that reflects the topological features of graphs, as opposed to the O(100) dimensional embeddings learned by machine learning algorithms. Under mild assumptions, the set of such 1-dimensional embeddings exists and is unique for each graph.

Based on this theoretically derived embedding, we propose a method to generate smaller topological equivalent graphs for machine learning models to train from, rather than feeding subgraphs sampled from random walks on a much larger graph as usually done in the literature. Our generated graphs are topologically similar, but is not necessarily a subgraph in the larger graph provided by the data.

We compared our approach against a state-of-the-art graph generation algorithm, GraphGAN, and observed that our approach requires 3 to 4 times shorter training time, while significantly outperforming GraphGAN and other state-of-the-art methods in predictive accuracy. The results indicates that the theoretical model captures the graph topology better than GraphGAN.

Different graph topologies

Link Prediction

Because the generated node pairs in a batch does not contain topological information by themselves, the model must rely on the subgraph generation method to provide such information.

Experiments

To obtain our model, we simply replace all subgraph generation procedures in GraphGAN with our approach using the scalar embedding.

Datasets

A visualization of our private dataset.

To form the training and validation data set, we hide 10% of the edges in the graph as positive samples and generate unlinked pairs as negative samples for the validation set. The remaining 90% of the edges is used for training.

Results

Our method versus GraphGAN on the social network.

On the social network, our approach achieves the same validation accuracy as the GraphGAN model 3 to 4 times faster due to the fact that our subgraph generation method is fully vectorizable. We used the code provided by the authors for GraphGAN.

Below is a table recording the link prediction results with state-of-the-art algorithms on the astrophysics citation network. Not only our approach is much faster than GraphGAN, we outperform all state-of-the-art results shown by a significant margin.

Validation link prediction accuracies of state-of-the-art methods

Conclusion

Crater Labs

Learnings from AI/ML Moonshots