Graph Layouts with Neo4j Graph Algorithms and Machine Learning

İrfan Nuri Karaca
Neo4j Developer Blog
Jan 11, 2019

A couple of months ago I was chatting with my colleague Mark Needham about their amazing work on Graph Algorithms in Neo4j, and we somehow ended up discussing their use in graph layouts, which has been a recurring topic in graph visualisation. We sat down together to see what we could make of them and managed to generate a useful layout for communities within a dataset.

Weeks later I had time to wrap up our work and present what we did here. In this post we will see how graph algorithms and machine learning practices can help data visualisation, specifically the layout of connected data.

In three simple steps we will:

  1. Detect the communities within a dataset
  2. Calculate a layout that makes the clusters identifiable at a glance
  3. Visualise communities with d3.js

First some motivation

Force-directed layout algorithms are known to be the most useful for general-purpose visualisation of networks of data, aka graphs. Briefly, these algorithms treat the graph as a physical system whose energy is calculated from repulsive forces between nodes and attractive forces along relationships. The goal is to minimise that energy by simulating the physics of the system, which is expected to result in a better positioning of nodes.
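To make the idea concrete, here is a minimal sketch of a single simulation step in JavaScript. It is not the algorithm used by Neo4j Browser or d3.js. The force formulas (inverse-square repulsion, linear spring attraction) and the names `forceStep`, `repulsion`, `attraction` and `step` are assumptions chosen purely for illustration.

```javascript
// One step of a toy force-directed layout (illustrative only).
// nodes: [{x, y}], links: [{source: index, target: index}]
function forceStep(nodes, links, opts = {}) {
  const { repulsion = 1, attraction = 0.1, step = 0.1 } = opts;
  // Accumulate a net displacement for every node.
  const disp = nodes.map(() => ({ x: 0, y: 0 }));

  // Repulsive force between every pair of nodes, weaker with distance.
  for (let i = 0; i < nodes.length; i++) {
    for (let j = i + 1; j < nodes.length; j++) {
      const dx = nodes[i].x - nodes[j].x;
      const dy = nodes[i].y - nodes[j].y;
      const dist = Math.max(Math.hypot(dx, dy), 1e-6);
      const f = repulsion / (dist * dist);
      disp[i].x += (dx / dist) * f;
      disp[i].y += (dy / dist) * f;
      disp[j].x -= (dx / dist) * f;
      disp[j].y -= (dy / dist) * f;
    }
  }

  // Attractive, spring-like force along every relationship.
  for (const { source, target } of links) {
    const dx = nodes[target].x - nodes[source].x;
    const dy = nodes[target].y - nodes[source].y;
    disp[source].x += dx * attraction;
    disp[source].y += dy * attraction;
    disp[target].x -= dx * attraction;
    disp[target].y -= dy * attraction;
  }

  // Move each node a small step along its net force.
  return nodes.map((n, i) => ({
    x: n.x + step * disp[i].x,
    y: n.y + step * disp[i].y,
  }));
}
```

Calling `forceStep` repeatedly pulls connected nodes together and pushes unconnected ones apart. A real implementation would also add cooling and a stopping criterion.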

But for more specific cases, different approaches can make it easier for the user to interpret the nature of the data under examination. The specific case we will focus on today is the detection and visualisation of communities within a graph.

To illustrate, below is a force-directed layout of communities (people who acted in the same movies) in the Movies dataset.

Force directed layout of communities in Movies dataset (Neo4j Browser)

As you can see, one cannot easily spot the communities and their members. We need a different approach to calculate a layout that better serves our purpose. In the following steps, we will use graph algorithms and ML practices to produce a better visualisation of these communities.

Below is the process flow we are going to follow.

Community Detection

First of all, let's get some data and find communities within it. We are going to use Neo4j, the most popular graph database, to store the graph data and detect communities (you can get started quickly by downloading Neo4j Desktop). Our sample dataset is the well-known Movies database, which includes some popular movies and actors. If you are not familiar with it, you can create this data by typing :play movies into the Cypher command bar at the top of Neo4j Browser and following the guide that is presented. The objective is to detect communities of actors based on the movies they co-acted in.

We do not have to implement a community detection algorithm ourselves, since our brilliant colleagues at Neo4j Labs have developed several of them as a Neo4j plugin, the Graph Algorithms library. To use them you will need to install the APOC and Graph Algorithms plugins in Neo4j Desktop. If you haven’t installed plugins before, our colleague Jennifer Reif has an excellent post explaining how to do this.

We will use the Louvain algorithm to detect and store the community of each person in the database. This is as easy as running the Cypher below in Neo4j Browser.

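// Node statement: people who acted in at least one movie.
// Relationship statement: co-actor pairs, weighted by the number of shared movies.
CALL algo.louvain(
  "MATCH (a:Person) WHERE (a)-[:ACTED_IN]->() RETURN id(a) AS id",
  "MATCH (a)-[:ACTED_IN]->()<-[:ACTED_IN]-(b) RETURN id(a) AS source, id(b) AS target, count(*) AS weight",
  {graph: "cypher", iterations: 1, includeIntermediateCommunities: true})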
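Louvain searches for a community assignment that maximises a score called modularity. As a rough illustration of what is being optimised, the JavaScript sketch below computes modularity for a given assignment. It is not the plugin's implementation, and the function name `modularity` and the input shapes are assumptions made for this example.

```javascript
// Modularity Q of a weighted, undirected graph for a given community assignment.
// edges: [{source, target, weight}] with each undirected edge listed once.
// community: object mapping node id -> community id.
function modularity(edges, community) {
  let total = 0;       // m: sum of all edge weights
  const internal = {}; // weight of edges that stay inside each community
  const degree = {};   // summed weighted degree of each community's nodes

  for (const { source, target, weight = 1 } of edges) {
    total += weight;
    const cs = community[source];
    const ct = community[target];
    degree[cs] = (degree[cs] || 0) + weight;
    degree[ct] = (degree[ct] || 0) + weight;
    if (cs === ct) internal[cs] = (internal[cs] || 0) + weight;
  }

  // Q = sum over communities of (internal / m) - (degree / 2m)^2
  let q = 0;
  for (const c of Object.keys(degree)) {
    q += (internal[c] || 0) / total - Math.pow(degree[c] / (2 * total), 2);
  }
  return q;
}
```

For two triangles joined by a single edge, giving each triangle its own community scores higher than putting all six nodes into one community, which matches the intuition of densely connected groups with few links between them.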

The Louvain call stores each node's community as a property on the node itself. To check that it worked, we can look at the community values of some people who acted in the same movie, for instance the great movie Unforgiven:

MATCH (p:Person) WHERE (p)-[:ACTED_IN]->(:Movie {title: "Unforgiven"}) RETURN p.name, p.community

To double check, let's look at another movie, The Devil’s Advocate this time:

MATCH (p:Person) WHERE (p)-[:ACTED_IN]->(:Movie {title: "The Devil's Advocate"}) RETURN p.name, p.community

As you can see, Keanu Reeves did not end up in the same community as the other two, since he is more strongly connected to the people of The Matrix trilogy.
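"More strongly connected" here means a higher co-acting weight, that is, more shared movies. The JavaScript sketch below mirrors the counting done by the relationship statement passed to Louvain, on a small hand-written subset of roles. The function name `coActingWeights` and the input shape are assumptions for illustration, and it assumes one role per person per movie.

```javascript
// roles: [{person, movie}] pairs, one per ACTED_IN relationship (hypothetical input shape).
// Returns a Map keyed by "source|target" holding the number of shared movies,
// in both directions, like count(*) AS weight in the relationship statement.
function coActingWeights(roles) {
  // Group the cast of each movie.
  const cast = new Map();
  for (const { person, movie } of roles) {
    if (!cast.has(movie)) cast.set(movie, new Set());
    cast.get(movie).add(person);
  }
  // Count shared movies for every ordered pair of distinct co-actors.
  const weights = new Map();
  for (const people of cast.values()) {
    for (const a of people) {
      for (const b of people) {
        if (a === b) continue;
        const key = `${a}|${b}`;
        weights.set(key, (weights.get(key) || 0) + 1);
      }
    }
  }
  return weights;
}

// A hand-picked subset of the Movies dataset, not the full cast lists.
const roles = [
  { person: "Keanu Reeves", movie: "The Matrix" },
  { person: "Carrie-Anne Moss", movie: "The Matrix" },
  { person: "Keanu Reeves", movie: "The Matrix Reloaded" },
  { person: "Carrie-Anne Moss", movie: "The Matrix Reloaded" },
  { person: "Keanu Reeves", movie: "The Matrix Revolutions" },
  { person: "Carrie-Anne Moss", movie: "The Matrix Revolutions" },
  { person: "Keanu Reeves", movie: "The Devil's Advocate" },
  { person: "Al Pacino", movie: "The Devil's Advocate" },
];
const weights = coActingWeights(roles);
console.log(weights.get("Keanu Reeves|Carrie-Anne Moss")); // 3
console.log(weights.get("Keanu Reeves|Al Pacino"));        // 1
```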

Layout Calculation

For those familiar with ML, I believe the community of each node looks like an appealing feature candidate at first sight. But how can a single feature help us come up with a layout, that is, x and y coordinates for the nodes? If you have dealt with neural networks before, you will recall that such categorical information is best encoded as binary values, as in one-hot encoding. A node in community ‘2’ is then represented as [0, 0, 1, 0, 0, 0, …], where the length of the array equals the number of communities detected. We will again use the Neo4j Graph Algorithms plugin, this time its one-hot encoding function, for this purpose:

MATCH (n) WHERE exists(n.community)
WITH collect(DISTINCT n.community) AS communities
MATCH (n) WHERE exists(n.community)
SET n.oneHotEmbedding = algo.ml.oneHotEncoding(communities, [n.community])

Let's check the output:
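As a rough illustration of what these vectors look like, here is a minimal JavaScript sketch of one-hot encoding. It mimics the behaviour described above rather than reproducing the plugin's source, and the community ids are hypothetical.

```javascript
// One-hot encode `selected` against the list of all `available` values.
// Position i is 1 when available[i] is among the selected values, else 0.
function oneHotEncoding(available, selected) {
  return available.map((value) => (selected.includes(value) ? 1 : 0));
}

const communities = [0, 1, 2, 3, 4, 5]; // hypothetical community ids
console.log(oneHotEncoding(communities, [2])); // [ 0, 0, 1, 0, 0, 0 ]
```

Every node gets a vector of the same length, with a single 1 marking its community.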

Now the problem has been reduced to assigning x and y values to nodes using these binary vectors. This is where t-SNE (t-Distributed Stochastic Neighbor Embedding), a machine learning algorithm for dimensionality reduction, comes in. Oh wait. What does that mean, and what does it have to do with our problem? Well, it means that if you have n-dimensional points, like the binary vectors above, then t-SNE will reduce them to a lower dimension of your choice, which is 2D for us. The magic is that proximity is preserved during the reduction: points that are close in the original space stay close in 2D, so nodes in the same community end up positioned close together.
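To see why one-hot vectors suit this, consider the pairwise distances that t-SNE starts from. The JavaScript sketch below is not t-SNE itself. It only computes Euclidean distances between hypothetical one-hot vectors, and the node names are made up for illustration.

```javascript
// Euclidean distance between two equal-length numeric vectors.
function euclidean(a, b) {
  return Math.sqrt(a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0));
}

// Hypothetical one-hot vectors for three nodes in two communities.
const alice = [1, 0, 0];
const bob = [1, 0, 0];   // same community as alice
const carol = [0, 1, 0]; // a different community

console.log(euclidean(alice, bob));   // 0
console.log(euclidean(alice, carol)); // 1.4142135623730951
```

Members of the same community are at distance zero from each other, and every other node is equally far away, so a method that preserves neighbourhoods has a very clear grouping to work with.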

Visualise data with d3.js

We will use a JavaScript implementation of t-SNE to calculate the layout and draw it with d3.js, as seen below.

The community members are positioned in the vicinity of each other which enables users to spot communities at first sight, without any effort.
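t-SNE coordinates come in arbitrary units, so they have to be mapped to pixels before drawing. d3.js offers `d3.scaleLinear` for this kind of mapping. The sketch below does the same thing in plain JavaScript so that it runs without d3. It is not taken from the linked sandbox, and the function name `fitToViewport` and its parameters are assumptions for illustration.

```javascript
// Linearly rescale layout points into a width x height viewport with padding.
// points: [{x, y}] in arbitrary layout units (for example a t-SNE solution).
function fitToViewport(points, width, height, padding = 20) {
  const xs = points.map((p) => p.x);
  const ys = points.map((p) => p.y);
  const [minX, maxX] = [Math.min(...xs), Math.max(...xs)];
  const [minY, maxY] = [Math.min(...ys), Math.max(...ys)];
  // Map [min, max] onto [lo, hi]; a zero-width range maps to the midpoint.
  const scale = (v, min, max, lo, hi) =>
    max === min ? (lo + hi) / 2 : lo + ((v - min) / (max - min)) * (hi - lo);
  return points.map((p) => ({
    x: scale(p.x, minX, maxX, padding, width - padding),
    y: scale(p.y, minY, maxY, padding, height - padding),
  }));
}
```

The returned points can be used directly as the `cx` and `cy` attributes of SVG circles, one per node.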

You can find the code for layout and drawing in Codesandbox, here: https://codesandbox.io/s/q9z09nnpv9

An important advantage of using ML practices in data visualisation is that we can reuse the outputs of a layout in other parts of the dataset, in other datasets, or in different sessions. We will explore that in following posts on this subject.
