DeepWalk: Online Learning of Social Representations

An in-depth explanation

Tornike Onoprishvili
6 min read · Aug 30, 2021

Networks are ubiquitous. Modeling important networks, such as the internet, human relationships and product similarity, is a major reason behind the success of tech giants like Facebook, Amazon and Google. Graph mining, or structure mining, is the process of extracting useful information from unstructured data and, in particular, from networks. DeepWalk is a scalable graph mining algorithm that converts unstructured network relationship information into a more efficient vector embedding. The vector space generated by DeepWalk is not only more storage-efficient, but also encodes the relationships between network nodes: nodes that are closely connected are mapped to vectors that are close to each other, and vice versa. DeepWalk does this by generating truncated random walks from every node and then applying the Word2Vec algorithm (Mikolov, 2013) to those walks to create vector embeddings (Perozzi et al., 2014). The resulting continuous vector space is much more easily exploited by various statistical models than the sparse adjacency matrix. DeepWalk is a trivially parallelizable algorithm that can be used for efficient network analysis and community detection. Its code is publicly available on GitHub.

The Algorithm, Step By Step

Suppose we were to analyse this graph:

A typical “barbell” graph. Two tightly knit communities, joined by a single member

As we can clearly see, the graph consists of two fully connected regions, or communities:

  • First region nodes: 6,7,8,9 and 10.
  • Second region: 0,1,2,3 and 4.

The goal of DeepWalk is to translate every node into a vector so that vectors from nodes of one community are close together, while nodes from different communities are far from each other. In addition to this, the resulting vectors must have much lower dimensionality than the vectors from the adjacency matrix.
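To make the example concrete, here is a minimal sketch that builds this barbell graph as a plain-Python adjacency dictionary (assuming, per the figure, that node 5 is the single member joining the two communities; the function name is my own):

```python
def barbell_graph(k=5):
    """Two complete communities of size k, joined through one bridge node."""
    left = list(range(k))                   # nodes 0..4
    bridge = k                              # node 5, the single joining member
    right = list(range(k + 1, 2 * k + 1))   # nodes 6..10
    adj = {n: set() for n in range(2 * k + 1)}
    # fully connect each community
    for group in (left, right):
        for a in group:
            for b in group:
                if a != b:
                    adj[a].add(b)
    # the bridge node links one member of each community
    adj[left[-1]].add(bridge); adj[bridge].add(left[-1])
    adj[right[0]].add(bridge); adj[bridge].add(right[0])
    return adj
```

Representing the graph as a neighbour dictionary keeps the random-walk step in the next section trivial: choosing the next node is just a choice from `adj[current]`.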

Step 1 — generating random walks

DeepWalk begins by performing a random walk of a predefined length from every single node. A random walk is a simple graph traversal procedure: it starts at a given node and then, with equal probability, randomly moves on to a neighbouring node, repeating this process. A pure random walk could continue indefinitely; in DeepWalk, the walk is truncated at a maximum length, which is a tunable hyperparameter.

Performing one random walk of length 20 from every node produces a matrix with one row per node. Each row represents the random walk started from that node, so every row has 20 entries. For example, a random walk from node 2 might begin 2, 0, 2, …: the first step is, obviously, the starting node itself, the next randomly selected neighbour is 0, then the walk moves back to 2, and so on. In practice, due to the stochastic nature of this process, generating multiple random walks from each node tends to yield better results, so the number of walks per node is another tweakable hyperparameter. More random walks per node may produce better embeddings, at the cost of a greater computational burden.
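Step 1 can be sketched in a few lines of plain Python; `adj` is an adjacency dictionary mapping each node to its set of neighbours, and the helper names are my own:

```python
import random

def random_walk(adj, start, length=20, rng=random):
    """A truncated random walk: start at `start`, take uniform steps."""
    walk = [start]
    for _ in range(length - 1):
        # move to a uniformly chosen neighbour of the current node
        walk.append(rng.choice(sorted(adj[walk[-1]])))
    return walk

def build_corpus(adj, walk_length=20, walks_per_node=10, seed=42):
    """Generate several walks per node; nodes become strings for Word2Vec."""
    rng = random.Random(seed)
    corpus = []
    for _ in range(walks_per_node):
        for node in adj:
            corpus.append([str(n) for n in random_walk(adj, node, walk_length, rng)])
    return corpus
```

Each walk is converted to a list of node-ID strings because Word2Vec implementations treat tokens as words; the walks then play the role of sentences in Step 2.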

Step 2 — apply Word2Vec

Word2Vec is an algorithm for creating efficient word embeddings for each word in a text corpus (Mikolov, 2013). The fundamental assumption of Word2Vec is that words with similar meanings tend to occur in similar contexts. As an example, suppose we have a simple text corpus like this:

  1. I like to eat hamburgers.
  2. Hamburgers are delicious.
  3. I like eating fries.
  4. Fries and hamburgers are tasty.
  5. Fries are delicious.

In this corpus, the words “Fries” and “Hamburgers” share most of their surrounding words, i.e. their context. Word2Vec is designed to exploit this similarity of context and would generate a vector embedding where “Fries” and “Hamburgers” map to vectors that are very close to each other. The Word2Vec model has an important hyperparameter, the window length, which sets the maximum allowable “context radius” around a given word. For example, if the window length were 1, only the immediate neighbours of “Hamburgers” would be considered.
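A quick way to see this context overlap is a minimal pure-Python sketch: no Word2Vec training, just collecting the context words within a window of 1 and comparing the sets (the helper names are my own):

```python
sentences = [
    "i like to eat hamburgers",
    "hamburgers are delicious",
    "i like eating fries",
    "fries and hamburgers are tasty",
    "fries are delicious",
]

def contexts(corpus, target, window=1):
    """Collect every word appearing within `window` positions of `target`."""
    ctx = set()
    for s in corpus:
        toks = s.split()
        for i, t in enumerate(toks):
            if t == target:
                ctx.update(toks[max(0, i - window):i] + toks[i + 1:i + 1 + window])
    return ctx

def jaccard(a, b):
    """Overlap of two context sets, between 0 (disjoint) and 1 (identical)."""
    return len(a & b) / len(a | b)

h = contexts(sentences, "hamburgers")   # {'eat', 'are', 'and'}
f = contexts(sentences, "fries")        # {'eating', 'are', 'and'}
d = contexts(sentences, "delicious")    # {'are'}
print(jaccard(h, f))  # 0.5
print(jaccard(h, d))  # ~0.33
```

“Hamburgers” overlaps more with “Fries” than with “delicious”, which is exactly the signal Word2Vec turns into nearby vectors.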

In the case of DeepWalk, however, we do not have a text corpus, but rather a corpus generated by truncated random walks. Surprisingly, Word2Vec also performs very well on random walks! Using Word2Vec, nodes that share most of their surrounding nodes are assigned similar vector embeddings. This way, Word2Vec creates vector embeddings that group adjacent nodes together.

Using the Gensim implementation of the Word2Vec algorithm, we can transform the input random walks into a vector space with 2 dimensions for visualisation.

In the resulting plot, one can clearly see the two fully connected groups in the graph. The code to reproduce these plots is available as a Kaggle notebook.

Advantages and limitations of DeepWalk

One of the biggest advantages of DeepWalk is its easily parallelizable nature. As we have seen previously, the random walks on every node do not depend on each other at all. They can be massively parallelized on a big server cluster in order to analyze huge networks, like Facebook friendship networks or Amazon product relationship graphs.

DeepWalk is extremely simple to understand and implement. If a Word2Vec implementation is available, DeepWalk almost feels like a natural extension of it. As shown in the notebook, it can be implemented in only a few lines of Python code.

However, this simplicity also limits DeepWalk’s ability to capture both global and local structure in the network. For example, the Facebook network could be analyzed to answer the following two questions:

  1. How densely are two different cities connected to each other?
  2. How densely are two different countries connected to each other?

While DeepWalk can be applied to vectorize relationships between individual users, it has no way of emphasizing the local versus global structure of the network. Even the window size parameter from Word2Vec cannot handle this problem, as cities and countries have vastly different numbers of network nodes (e.g. Vatican and Beijing, or Luxembourg and the United States).

Improvement Ideas and Conclusion

The principal disadvantage of DeepWalk, the inability to emphasize local or global structure of the network, can be addressed with ideas from a more sophisticated graph embedding algorithm called Node2Vec (Grover & Leskovec, 2016). The Node2Vec algorithm is almost identical to DeepWalk, with one crucial difference: Node2Vec defines two important hyperparameters, P and Q, which control how the random walk chooses the next node to explore. In DeepWalk, a random walk selects each neighbour, even the node it just visited, with equal probability. In Node2Vec, the walk is biased: stepping back to the previous node is weighted in proportion to 1/P, moving to a neighbour of the previous node is weighted by 1, and exploring a node farther from the previous node is weighted by 1/Q.

P and Q control the probability of exploring new nodes vs. returning to the origin

Therefore, P and Q can be used in order to influence how the algorithm processes the network: If one desires to emphasize the local structure, Q can be increased and vice versa.
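A single biased Node2Vec step can be sketched using the 1/P and 1/Q weights described above (the function name is my own; `adj` is an adjacency dictionary as in Step 1):

```python
import random

def node2vec_step(adj, prev, cur, p, q, rng=random):
    """Pick the next node after moving prev -> cur, with node2vec biases."""
    neighbours = sorted(adj[cur])
    weights = []
    for nxt in neighbours:
        if nxt == prev:
            weights.append(1.0 / p)   # step back to the previous node
        elif nxt in adj[prev]:
            weights.append(1.0)       # stay close to the previous node
        else:
            weights.append(1.0 / q)   # explore farther from the previous node
    # weighted (unnormalised) sampling of the next node
    return rng.choices(neighbours, weights=weights, k=1)[0]
```

Setting Q large keeps the walk near its starting neighbourhood (emphasizing local structure), while a small Q pushes the walk outward; P plays the analogous role for backtracking.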

Conclusion

In conclusion, DeepWalk is a scalable way of embedding the unstructured network connection data into something more tangible for statistical learning methods. It can be used to analyze networks at scale and gain useful insights and visualisations. With a little improvement it can also be used to analyze global and local network structures.


Tornike Onoprishvili

I'm a data scientist with a strong academic and mathematical background. I specialize in TensorFlow.