Synopsis of GraphGAN: Graph Representation Learning with Generative Adversarial Nets
GraphGANs are constituted of a Generator that is learning the underlying connectivity distribution between vertices as P(V|v_c) and a Discriminator that is learning to predict the probability of edge existence between a pair of vertices as P(edge|(v_i,v_j)). GraphGANs exploit Graph Softmax that satisfies desired properties such as normalization, graph structure awareness, and computational efficiency. GraphGANs borrow the notion of Value Function and Policy Gradients from Reinforcement Learning. GraphGANs can be applied to a variety of applications including link prediction, node classification, social network analysis, knowledge graph representation, text embedding, and recommendations.
The Generator is designed as a Softmax Activation of the Inner Product of each vertex and a vertex c. G(V|v_c) = Softmax(h_v DOT h_v_c) where h denotes the Latent Representation of the vertices learned in the bottleneck layers of G.
The classic Softmax tends to ignore rich structural information encoded in the Latent Representation that can be used to model the Proximity of Vertices. Both of Hierarchical Softmax and Negative Sampling proposed for Word Embeddings by Mikolov et al. are favored in terms of Computational Efficiency, but still are not very well suited for Graph Representation Learning.
The authors of GraphGANs proposed a novel Graph Softmax that is normalized, efficient, and graph-structure-aware. Graph Softmax is normalized similar to classic Softmax by dividing over the sum of exponents of inner products of vertices. That is the output is always a probability distribution over neighboring vertices summing to one. The normalized assumption can be proven by induction on sub-trees of the BFS-tree. It is efficient since it makes use of only the parent and the children of a given vertex compared to performing such calculation for all the vertices of the graph. The parent and the children of a vertex are found using Breadth-First Search. The total number of involved vertices is O(d log V) where d is average degree of vertices and V is the number of vertices in graph G. Hence, log V is the depth of the BFS-tree.
The Generator performs a Random Walk over the BFS-tree in order to calculate the conditional probabilities P(V|v_c).
The Discriminator is designed as a Sigmoid Activation of the Inner Product of both v_i and v_j. D(v_i, v_j) = Sigmoid(h_v_i DOT h_v_j) where h denotes the Latent Representation of the vertices learned in the bottleneck layers of D.
Since GraphGANs are operating on Discrete Vertices, in contrast to continuous pixel values, Policy Gradient from Reinforcement Learning is adapted to calculate the gradient of Value Function V(G, D) with respect to Generator’s Parameters Theta. That is encouraging the Generator to move away from finding neighbor vertices that are likely to be classified by the Discriminator as fake.
- GraphGAN: Graph Representation Learning with Generative Adversarial Nets, Hongwei Wang et al., 2017