Community detection in social networks with Neo4j and NetSCAN

Vitor Horta
Neo4j Developer Blog
Aug 29, 2018

If graphs and social networks are everywhere, communities are too. As people have a great tendency to form groups, investigating these groups can be very useful.

For example, if you are a developer, you are probably using several open source packages, and you’re almost certainly getting a lot of help from Q&A forums and websites like this one.

Perhaps you are the one making a great difference to someone else around the world and… aren’t you curious to know who they are, or what they are creating with your kind contributions?

Community detection methods can help us find those people, and many applications can benefit from this. Some good examples are recommendation systems, question routing and viral marketing.

But this is not a trivial task, especially in big and dense networks. To show this, let’s take a look at a network modeled from StackOverflow data and built in Neo4j.

StackOverflow network example in Neo4j. The blue nodes represent users and the relationships represent answers given by the source node to the target node.

In a network like this, nodes represent users and the edges are answers given by the source node to the target node. It’s reasonable to say that this example contains two communities, one node acting as a bridge between them, and two influential nodes that hold most of the connections.

So you might ask me: “What’s the big deal? One could detect this instantly by looking at the figure!” And you’d be right, so let’s make things more complicated.
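To make the model concrete, here is a minimal Python sketch of a toy network shaped like the one in the figure. The user names and edges are made up for illustration; only the idea of directed "answered" edges between users comes from the example. Counting each user's distinct neighbours is enough to surface the two hubs in a graph this small.

```python
from collections import defaultdict

# Hypothetical toy data: (source, target) means "source answered target".
answers = [
    ("ana", "bob"), ("ana", "cid"), ("ana", "dee"), ("bob", "cid"),
    ("eve", "fay"), ("eve", "gus"), ("eve", "hal"), ("fay", "gus"),
    ("dee", "zed"), ("zed", "hal"),  # "zed" bridges the two groups
]

def neighbours(edges):
    """Map each user to the set of users they are connected to, ignoring direction."""
    adj = defaultdict(set)
    for source, target in edges:
        adj[source].add(target)
        adj[target].add(source)
    return dict(adj)

adj = neighbours(answers)
# Users sorted by number of distinct neighbours, most connected first.
ranking = sorted(adj, key=lambda user: (-len(adj[user]), user))
print(ranking[:2])  # the two most connected users
```

This kind of eyeballing by degree works only because the example is tiny, which is exactly the limitation the next figure illustrates.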

Gephi visualization of the entire StackOverflow network.

How about now?

As you can see, it’s not a trivial task. If we want to detect communities and find influential people in such a huge network, we clearly need automation. To tackle this problem, we recently developed the NetSCAN algorithm. NetSCAN is a density-based method for detecting communities in social networks and finding influential people. It can also detect the semantic meaning of groups which, as you’ll see, is great!

The algorithm was implemented for Neo4j and running it is very simple. All you have to do is call the procedure in a Cypher query!
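The details of NetSCAN are in the paper and are not reproduced here. As a rough intuition for what "density-based" means, the sketch below follows the general DBSCAN-style idea rather than NetSCAN itself: a node counts as a dense "core" when it has at least min_pts neighbours whose edge weight reaches eps, and cores that touch each other are merged into one group together with their strong neighbours. The function name, the toy weights and the threshold values are all assumptions for illustration.

```python
from collections import defaultdict

def density_groups(weighted_edges, eps, min_pts):
    """Toy DBSCAN-style grouping on an undirected weighted graph (not NetSCAN)."""
    # Keep only "strong" neighbours: edges whose weight is at least eps.
    strong = defaultdict(set)
    for u, v, w in weighted_edges:
        if w >= eps:
            strong[u].add(v)
            strong[v].add(u)
    # A core node has at least min_pts strong neighbours.
    cores = {n for n, nbrs in strong.items() if len(nbrs) >= min_pts}

    groups, seen = [], set()
    for core in sorted(cores):
        if core in seen:
            continue
        # Expand from this core through other cores reachable via strong edges.
        group, stack = set(), [core]
        seen.add(core)
        while stack:
            node = stack.pop()
            group.add(node)
            for nbr in strong[node]:
                group.add(nbr)  # border nodes join the group but do not expand it
                if nbr in cores and nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        groups.append(group)
    return cores, groups

# Hypothetical weights, e.g. how many answers two users exchanged.
edges = [
    ("ana", "bob", 5), ("ana", "cid", 4), ("ana", "dee", 3),
    ("eve", "fay", 6), ("eve", "gus", 4), ("eve", "hal", 3),
    ("dee", "zed", 1), ("zed", "hal", 1),  # weak ties through a bridge user
]
cores, groups = density_groups(edges, eps=3, min_pts=3)
print(sorted(cores))
print([sorted(g) for g in groups])
```

In this toy run the two hubs come out as cores, each with its own group, and the weakly connected bridge user is left outside both. How NetSCAN itself treats such nodes is defined in the paper, not here.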

CALL netscan.find_communities('User','ANSWERED','userId','weight', 'HIGHER_BETTER', eps, minPts, radius);

The installation instructions and parameter definitions are available on GitHub, and all the details about the algorithm can be found in the published paper. For now, let’s just take a look at the algorithm in action and at some results.

Two communities found by NetSCAN. The first is a Python community (red) with one influential node (green) and the second is a C++ community (red) with two influential nodes (green).

As we can see, even for huge networks NetSCAN can detect communities, find influential people and identify their topics of interest. And there are many more interesting things to explore, such as finding developers who participate in multiple communities on different subjects, which characterizes people with multidisciplinary skills.

So again, if you want to use this on your own networks and Neo4j graphs, I’m sure you’ll find many interesting possibilities. Give it a try; it’s very simple to install and use. Also feel free to give feedback on this post, on GitHub or by email. I’ll be happy to help you discover communities. Maybe we’ll find out that we are part of one!
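As a small illustration of that last point, suppose the detected communities have been exported as a mapping from a community label to its members. That format, the labels and the user names are all hypothetical, since this post does not describe the procedure's output. The sketch simply inverts the mapping to list users who appear in more than one community.

```python
from collections import defaultdict

def multi_community_users(communities):
    """Return {user: sorted community labels} for users in two or more communities."""
    memberships = defaultdict(set)
    for label, members in communities.items():
        for user in members:
            memberships[user].add(label)
    return {u: sorted(labels) for u, labels in memberships.items() if len(labels) > 1}

# Hypothetical export of detected communities.
communities = {
    "python": {"ana", "bob", "dee"},
    "c++": {"eve", "fay", "dee"},
}
print(multi_community_users(communities))
```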
