Graph embedding techniques

Estelle Scifo
5 min read · Feb 23, 2020


Image credits: https://www.yworks.com/pages/visualizing-graph-databases

Embedding is a well-known machine learning technique that consists in representing complex objects such as texts, images, or graphs as vectors with a reduced number of features (~100) compared to the dimension of the dataset (several billion nodes in a graph, for instance), while preserving the most important information about them.

It is quite easy to understand when thinking about words, where the meaning of the word has to be preserved by the encoding. This is often represented by the following equation:

Queen = King - Man + Woman

which basically says that, in the embedding space, the vector representation of the word Queen must be equal to the representation of King, minus the representation of Man, plus the representation of Woman. This is also illustrated in the figure below.

Sutor, Peter & Aloimonos, Yiannis & Fermüller, Cornelia & Summers Stay, Douglas. (2019). Metaconcepts: Isolating Context in Word Embeddings. 544–549. 10.1109/MIPR.2019.00110.
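This relation can even be checked with pretrained word vectors, for instance using gensim. A quick sketch (the model name below is one of the small pretrained sets shipped with gensim's downloader; the exact neighbors depend on the vectors you load):

```python
import gensim.downloader as api

# Load small pretrained GloVe vectors (downloaded on first use)
wv = api.load("glove-wiki-gigaword-100")

# "King - Man + Woman" should land close to "Queen"
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# e.g. [('queen', 0.76...)]
```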

Regarding graphs, embeddings can be classified into several types, depending on which elements of the graph are preserved, and into several categories, depending on the underlying algorithm.

Graph embedding classification

Within a graph, one may want to extract different kinds of information. For instance:

  • Whole graph embedding: used when studying several graphs at once, such as molecules in chemistry.
  • Node embedding: actually the most famous of all graph embedding techniques, often simply called “graph embedding”.

Similarly to the example with words, node embedding must preserve the graph structure, meaning that nodes close to each other in the graph must be close to each other in the embedding space. The following figure illustrates this concept with the DeepWalk algorithm run on Zachary’s karate club graph.

Hamilton, William & Ying, Rex & Leskovec, Jure. (2017). Representation Learning on Graphs: Methods and Applications.

In the following, we will focus on node embedding only.

Transductive or inductive embedding

When talking about embedding techniques, it is important to be aware of another distinction between them.

  • Transductive embedding: a low-dimensional vector representation is derived for each node in the graph, but it is not possible to find the vector representation of a new node. In data science terms, the algorithm cannot make predictions on unseen data.
  • Inductive embedding, on the other hand, is more consistent with the training/prediction model we are used to seeing in machine learning.

Even if the latter sounds more interesting, there are many things to be learned from a transductive embedding, such as communities, and this post will only deal with transductive techniques.

The paper Graph Embedding Techniques, Applications, and Performance: A Survey by Palash Goyal and Emilio Ferrara (2017) provides a very good overview of the field and proposes another classification of static embedding algorithms. Here I will present two of them: factorization methods and graph traversal methods.

Matrix factorization

Locally Linear Embedding

A first and easy way to transform a graph into a vector space is to use its adjacency matrix. For a graph of n nodes, this is an n by n square matrix whose element Aij corresponds to the number of edges between node i and node j.
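With networkx, for instance, the adjacency matrix is one call away. A small sketch on Zachary’s karate club graph mentioned above:

```python
import networkx as nx

G = nx.karate_club_graph()   # 34 nodes, 78 edges
A = nx.adjacency_matrix(G)   # 34 x 34 SciPy sparse matrix
print(A.shape)               # (34, 34)
print(A[0, 1])               # 1: nodes 0 and 1 are connected
```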

The adjacency matrix can still be huge and may not fit into memory. One way to reduce its size is to use Locally Linear Embedding (LLE), which assumes that every node is a linear combination of its direct neighbors. This means the embedding Ei of the ith node can be expressed as in equation (1) below, where Ni stands for the set of neighbors of node i.
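$$E_i = \sum_{j \in N_i} W_{ij} E_j \qquad (1)$$

where Wij is the weight of the edge between nodes i and j (an entry of the adjacency matrix).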

In this notation, Ei is a vector with dimension d << n.

The embedding is then obtained by minimizing equation (2), i.e. the distance between the vector representing a given node in the embedding space and the weighted sum of its neighbors’ embeddings.
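$$\min_E \sum_i \Big| E_i - \sum_{j \in N_i} W_{ij} E_j \Big|^2 \qquad (2)$$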

This technique tends to preserve the community structure of the graph.

Several other algorithms use the same idea, changing only the function being minimized.

HOPE

For instance, HOPE minimizes the following objective function:
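$$\min_{E_s, E_t} \left\| S - E_s E_t^\top \right\|_F^2$$

Here Es and Et are the source and target embedding matrices (HOPE learns two embeddings per node to handle directed graphs) and ‖·‖F denotes the Frobenius norm.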

where Sij denotes the similarity between nodes i and j. It can be computed using, for instance, the Adamic/Adar similarity.
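As a sketch, a similarity matrix S based on the Adamic/Adar index can be built with networkx (fine for a small graph like the karate club; for large graphs the dense n by n matrix would not fit into memory):

```python
import networkx as nx
import numpy as np

G = nx.karate_club_graph()
n = G.number_of_nodes()

# Adamic/Adar index: sum of 1/log(degree) over the common neighbors of each pair
pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
S = np.zeros((n, n))
for u, v, score in nx.adamic_adar_index(G, pairs):
    S[u, v] = S[v, u] = score
```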

Graph traversal & Random Walk

Random walk techniques are closer to the ones developed to find word embeddings, such as word2vec (and the skip-gram model).

Image credit: http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/

This algorithm is well explained in the tutorial the image comes from, or in another Medium story: A simple Word2vec tutorial.

DeepWalk

Similarly to the above image, DeepWalk creates “sentences” by performing random walks through the graph.

The following image shows different possible walks on a simple graph.

Contrary to the other techniques described earlier, DeepWalk is not only aware of the direct neighbors of a given node, but also of the higher-order structure of the graph (neighbors of neighbors…).
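As an illustration, here is a minimal DeepWalk-like sketch with networkx and gensim. The walk length, number of walks, and word2vec parameters below are arbitrary choices for the example, not the values from the original paper:

```python
import random
import networkx as nx
from gensim.models import Word2Vec

G = nx.karate_club_graph()

def random_walk(graph, start, length):
    """One truncated random walk starting from `start`."""
    walk = [start]
    while len(walk) < length:
        neighbors = list(graph.neighbors(walk[-1]))
        if not neighbors:
            break
        walk.append(random.choice(neighbors))
    return [str(node) for node in walk]  # word2vec expects "words", i.e. strings

# Build the "corpus": 10 walks of length 20 starting from each node
walks = [random_walk(G, node, 20) for node in G.nodes() for _ in range(10)]

# Train a skip-gram model (sg=1) on the walks, as DeepWalk does (gensim >= 4 API)
model = Word2Vec(walks, vector_size=64, window=5, sg=1, min_count=0)
vector = model.wv["0"]  # the 64-dimensional embedding of node 0
```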

Going further

There are many more algorithms, using either matrix factorization (Laplacian Eigenmaps, Graph Factorization…) or random walks (node2vec…). On top of the kind of graph structure they tend to preserve (locality, community…), the algorithms have different time complexities that should be taken into account for large graphs. For instance, LLE runs in O(Ed²) (E being the number of edges and d the dimension of the embedding space), while DeepWalk runs in O(Vd) (V being the number of nodes in the graph).

If you want to try the different algorithms described here, you can use GEM, an open source Python package developed by the authors of the paper Graph Embedding Techniques, Applications, and Performance: A Survey.

They propose an implementation of up to six different graph embedding techniques, based on networkx for graph representation and scikit-learn and keras for machine learning.
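For instance, learning a HOPE embedding with GEM looks roughly like the snippet below. This is adapted from the package README from memory, so check the repository for the exact, version-dependent API:

```python
import networkx as nx
from gem.embedding.hope import HOPE

# HOPE works on directed graphs
G = nx.karate_club_graph().to_directed()

embedding = HOPE(d=4, beta=0.01)  # d: embedding dimension, beta: decay factor
embedding.learn_embedding(graph=G, is_weighted=True, no_python=True)
Y = embedding.get_embedding()     # one d-dimensional vector per node
```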

DeepWalk also has an implementation released by the authors of the original paper themselves (Bryan Perozzi et al.), available here: https://github.com/phanein/deepwalk

It’s time to practice on your own graph!


Estelle Scifo

PhD, Data Scientist, Python developer — author @PacktPub — now ML engineer Graphs + GenAI https://www.linkedin.com/in/estellescifo/