Social Media Analysis with Cloud Graph Database — Part 2
Community detection and visualization
In Part 1, we discussed how to create a graph with Neo4j. If you haven’t read it yet, please follow the Part 1 tutorial on creating the graph and calculating centrality before continuing. This part continues the tutorial on analyzing social media data from a network perspective.
Community detection is one of the most common tasks when analyzing social media data. Many algorithms exist for it, and one of the best known is Louvain.
The Louvain method is an algorithm for detecting communities in large networks. It maximizes a modularity score for each community. Modularity quantifies how good an assignment of nodes to communities is by measuring how much more densely connected the nodes within a community are, compared with how connected they would be in a random network.
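To make the modularity score concrete, here is a minimal Python sketch that computes it for a small undirected, unweighted graph. It is an illustration under simplifying assumptions, not how GDS computes modularity internally (GDS also supports relationship weights, and our retweet graph is directed). The function name `modularity` and the toy graph of two triangles joined by one edge are made up for this example.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a node-to-community assignment.

    Assumes an undirected, unweighted graph given as (u, v) pairs and uses
    the per-community form
        Q = sum over c of [ L_c / m - (d_c / (2m)) ** 2 ],
    where m is the number of edges, L_c the number of edges inside
    community c, and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph with no edges")
    internal = defaultdict(int)    # L_c: edges with both endpoints in c
    degree_sum = defaultdict(int)  # d_c: sum of degrees of nodes in c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
merged = {n: "A" for n in range(6)}

print(round(modularity(edges, split), 3))   # 0.357
print(round(modularity(edges, merged), 3))  # 0.0
```

Splitting the graph along the bridge scores higher than putting every node in one community, which is exactly the kind of improvement Louvain searches for.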
Louvain is a hierarchical clustering algorithm: it repeatedly merges each community into a single node and runs modularity optimization again on the condensed graph.
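The aggregation step can be sketched in a few lines of Python. The hypothetical `aggregate` helper below covers only the phase in which each community is collapsed into one super-node; the local-moving phase that decides which nodes join which community is omitted, and this is not the GDS implementation. It assumes an undirected edge list whose entries are (u, v) or (u, v, weight).

```python
from collections import defaultdict


def aggregate(edges, community):
    """Collapse each community into a single super-node.

    Returns a dict mapping a sorted (community, community) pair to the
    summed edge weight. Edges inside one community become a self-loop on
    that community's super-node; edges between communities are merged.
    """
    super_edges = defaultdict(float)
    for edge in edges:
        u, v = edge[0], edge[1]
        w = edge[2] if len(edge) > 2 else 1.0  # unweighted edges count as 1
        key = tuple(sorted((community[u], community[v])))
        super_edges[key] += w
    return dict(super_edges)


# Toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}

print(aggregate(edges, split))
# {('A', 'A'): 3.0, ('B', 'B'): 3.0, ('A', 'B'): 1.0}
```

Louvain would then repeat the local-moving phase on this smaller weighted graph; the self-loop weights keep track of the edges already inside each community.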
This tutorial demonstrates how to run Louvain community detection on the retweet network created in Part 1.
The requirements for this section are:
- A blank Graph Data Science Neo4j Sandbox;
- Part 1 completed, with the retweet graph (rtGraph) projected and ready to use.
Let’s start with community detection.
Check whether the graph is already projected
First, check whether the retweet graph has been projected by calling the GDS procedure that lists the available graph projections:
CALL gds.graph.list()
YIELD graphName, nodeCount, relationshipCount
The result is the list of available graphs, as shown below.
If no graph appears in the result, you haven’t created a graph projection yet and need to do so (see Part 1).
Estimate the resources needed for community detection
We can estimate the memory required to run community detection with this Cypher query:
CALL gds.louvain.write.estimate('rtGraph', { writeProperty: 'community' })
YIELD nodeCount, relationshipCount, bytesMin, bytesMax, requiredMemory
In our case, the estimate shows that running Louvain on the retweet graph requires only a small amount of memory, so it is not a heavy task.
Run the Louvain community detection
The Graph Data Science (GDS) library provides procedures for community detection. The stream mode below returns each node’s community ID without modifying the database:
CALL gds.louvain.stream('rtGraph')
YIELD nodeId, communityId, intermediateCommunityIds
RETURN gds.util.asNode(nodeId).name AS name, communityId
ORDER BY name ASC
Running this query returns each node’s name together with its community ID. The result is shown below.
To find out how many communities were detected, run this Cypher query in Neo4j Browser:
CALL gds.louvain.stats('rtGraph')
YIELD communityCount
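If you export the rows from the stream query (for example, as CSV from Neo4j Browser), a few lines of Python are enough to see how large each community is. The helper `community_sizes` and the sample rows below are hypothetical; the rows are made up and are not real output from rtGraph. The number of distinct community IDs in one run is the value that communityCount summarizes.

```python
from collections import Counter


def community_sizes(rows):
    """Count members per community from (name, communityId) rows.

    Assumes rows shaped like the result of the gds.louvain.stream query;
    how they are fetched (CSV export, a driver, ...) is left open.
    """
    return Counter(community_id for _name, community_id in rows)


# Made-up rows for illustration only.
rows = [
    ("alice", 7), ("bob", 7), ("carol", 12),
    ("dave", 7), ("erin", 12), ("frank", 30),
]
sizes = community_sizes(rows)

print(len(sizes))            # 3 distinct community IDs
print(sizes.most_common(2))  # [(7, 3), (12, 2)]
```

Sorting by size this way is a quick check on whether a few large communities dominate the network or the users are spread across many small groups.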
Graph visualization is an intuitive way to present the result of community detection: we simply color each node by its community ID. For that, the community ID has to be written back to each node as a new property, which the write mode does:
CALL gds.louvain.write('rtGraph', { writeProperty: 'community' })
YIELD communityCount, modularity, modularities
Once the query finishes, a new property key, community, appears in the database’s list of property keys.
Visualizing user communities with Bloom
Bloom is a beautiful and expressive data visualization tool to quickly explore and freely interact with Neo4j’s graph data platform with no coding required. We can open Bloom from the project list by choosing Open with Bloom.
After logging in to Bloom, we see a blank page. We define what to display by choosing the node and the relationship. We can configure visual aspects of the nodes, such as color and size, by clicking the node tab.
Rule-based styling is used to configure the graph visualization. We color the nodes according to their community ID and size them by their in-degree centrality value.
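In Bloom these rules are configured in the UI, without code. Purely as an illustration of the mapping they express, the hypothetical Python function below picks a color from a fixed palette by community ID and scales node size linearly with in-degree (the value computed in Part 1). The palette, the size range, and the linear scaling are assumptions, not Bloom’s internal behavior.

```python
def node_style(community_id, in_degree, min_deg, max_deg,
               palette=("#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd"),
               min_size=1.0, max_size=4.0):
    """Return (color, size) for one node.

    Color: one palette entry per community ID; IDs wrap around the palette,
    so different communities can share a color when there are more
    communities than colors.
    Size: in-degree mapped linearly from [min_deg, max_deg] onto
    [min_size, max_size].
    """
    color = palette[community_id % len(palette)]
    if max_deg == min_deg:
        size = min_size  # all nodes have the same in-degree
    else:
        t = (in_degree - min_deg) / (max_deg - min_deg)
        size = min_size + t * (max_size - min_size)
    return color, size


print(node_style(6, 5, 0, 10))   # ('#ff7f0e', 2.5)
print(node_style(7, 10, 0, 10))  # ('#2ca02c', 4.0)
```

Color carries the categorical community label while size carries the numeric centrality, so both results can be read from one picture.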
The graph visualization after applying the rule-based style configuration is shown below.
Recap
This section has demonstrated the steps of community detection and how to visualize the graph with additional information about community and in-degree centrality. These newly computed node attributes are represented by the color and size of the nodes. I hope this article is useful and easy to follow. Thanks for your attention.