Neo4j Graph Data Modeling Based on a Reverse Engineering Approach

May 17: World Telecom Day, One Month Graph Challenge

Vlad Batushkov
May 17

Welcome word

In this series of short posts I build one simple graph every day. The domain model of each graph is somehow related to the history of the day: a historical event, a celebration, or a person. I am doing this challenge to learn Neo4j data modeling and Cypher. Every day. One month. Follow me. Maybe you will be inspired, and next month will be your own One Month Graph Challenge. #OMGChallenge

Domain model

Today is World Telecommunication and Information Society Day. Information technology has certainly changed the face of civilization, and since the IT boom, changes and discoveries in this area have kept coming very fast. Today I want to talk about days gone by. Days when the internet was something special. Days that are gone. Days when my neighbors and I used local area networks (LANs).

The domain model I plan to build is very simple. Every node of the network is an actor with a unique IP address inside the LAN. By actor I mean a role in the LAN: the PC of a real user, a modem, a switch, a router, or a repeater. I want to try something like reverse graph engineering: set up a random network first, and only then apply specific rules to determine which role each node plays in this network.

Each node is a Spot. An end Spot (one with a single connection) is the PC of some person. A PC is connected to the network only via a Switch. A Switch is connected to another Switch only via a Repeater. All together, this is a local area network (LAN). It would also be fun to identify Subnetworks as groups of Spots.

Graph

Random network generation is possible with the APOC library in one simple shot:

MATCH (n) DETACH DELETE n;
CALL apoc.generate.er(100, 75, 'Spot', 'CONNECT');
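For readers who want to see the idea outside Neo4j, here is a minimal Python sketch of a random graph with a fixed number of nodes and edges (the G(n, m) flavour of the Erdős–Rényi model). The function name `generate_er`, the `seed` parameter, and the rule of no self-loops and no duplicate edges are assumptions made for illustration; whether APOC's generator behaves exactly this way is not confirmed here.

```python
import random
from itertools import combinations


def generate_er(num_nodes, num_edges, seed=None):
    """Return a random undirected graph as a set of (a, b) pairs with a < b.

    Exactly num_edges distinct edges are drawn uniformly from all
    possible node pairs (no self-loops, no duplicates: an assumption).
    """
    rng = random.Random(seed)
    all_pairs = list(combinations(range(num_nodes), 2))
    if num_edges > len(all_pairs):
        raise ValueError("more edges requested than node pairs available")
    return set(rng.sample(all_pairs, num_edges))


# Same sizes as the Cypher call: 100 spots, 75 connections.
edges = generate_er(100, 75, seed=17)
connected = {node for edge in edges for node in edge}
isolated = set(range(100)) - connected
print(len(edges), "edges,", len(isolated), "isolated spots")
```

With fewer edges than nodes, such a graph usually contains isolated nodes and several small disconnected groups, so some cleanup is to be expected.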

Good, in one click I have a bunch of Spots CONNECTed to each other. It is not a LAN model yet, but the first step is done. Second step: remove all isolated nodes:

MATCH (s:Spot)
WHERE NOT (s)--()
DELETE s

Not ideal. It seems some isolated groups of nodes are still here. I will try another way: remove every connected component except the largest one.

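// For every Spot, count the Spots in its connected component.
// The *0.. lower bound makes s1 count itself, so size is the exact component size.
MATCH (s1:Spot)-[:CONNECT*0..]-(s2:Spot)
WITH s1, count(DISTINCT s2) AS size
SET s1.size = size
// Keep only the Spots that belong to the largest component.
WITH max(size) AS maxSize
MATCH (s:Spot)
WHERE s.size < maxSize
DETACH DELETE s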
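The query above finds the largest group by expanding variable-length paths from every Spot, which is fine for a hundred nodes but grows quickly on bigger graphs. The same "keep only the largest connected component" step can be sketched with a breadth-first search. This is a minimal Python illustration outside Neo4j; the edge-list input and the function name `largest_component` are assumptions, and on a tie it keeps the first component found.

```python
from collections import deque


def largest_component(edges):
    """Return the set of nodes in the largest connected component."""
    # Build an undirected adjacency map from the edge list.
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    seen = set()
    best = set()
    for start in neighbours:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from start.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt not in component:
                    component.add(nxt)
                    queue.append(nxt)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


edges = [(1, 2), (2, 3), (3, 1), (3, 4), (5, 6), (7, 8), (8, 9)]
keep = largest_component(edges)
print(sorted(keep))  # → [1, 2, 3, 4]
```

Every node outside `keep` is what the DETACH DELETE step would remove.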

Only the largest connected component is left now. Great, I can continue. Even now I can try to identify Subnetworks. For this task I can use one of the community detection algorithms provided by the algo library.

CALL algo.louvain.stream('Spot', 'CONNECT', {})
YIELD nodeId, community
CALL apoc.create.setProperty([nodeId], 'subnetwork', community + 1) YIELD node as n1
CALL apoc.create.addLabels([n1], ['Subnetwork' + toString(community + 1)]) YIELD node as n2
WITH collect(n2.uuid) as spots, community as subnetwork
RETURN subnetwork, size(spots) as numberOfSpots

8 subnetworks were identified. Interesting, but far from reality. I mean, we can imagine 8 standalone subnetworks, but for this picture and the rules I set, it is definitely not the correct answer. It looks beautiful anyway.

Time to detect PCs. A PC is an end Spot, one with exactly one connection:

MATCH (s:Spot)
WHERE apoc.node.degree(s, '') = 1
SET s:PC

Time to mark every Spot next to a PC as a Switch.

MATCH (s1:PC)-[:CONNECT]-(s2:Spot)
SET s2:Switch

All the rest are Repeaters. One more idea came into my head: a Repeater with more than 2 connections is a Router.

MATCH (s:Spot)
WHERE apoc.node.degree(s, '') > 2 AND NOT (s:PC OR s:Switch)
SET s:Router

OK, now all the rest are Repeaters.

MATCH (s:Spot)
WHERE NOT (s:PC OR s:Switch OR s:Router)
SET s:Repeater

Figure: PC (orange), Switch (yellow), Router (red), Repeater (blue)
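The four labeling rules can be read as one pass over node degrees. Here is a minimal Python sketch of that logic outside Neo4j. The function name `classify` and the sample edge list are hypothetical, and two simplifications are assumed: each node gets exactly one role (PC takes priority over Switch), and degree means the number of distinct neighbours.

```python
def classify(edges):
    """Map each node to 'PC', 'Switch', 'Router' or 'Repeater'."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    # Rule 1: an end Spot (exactly one connection) is a PC.
    pcs = {n for n, adj in neighbours.items() if len(adj) == 1}
    # Rule 2: every Spot next to a PC is a Switch.
    switches = {m for n in pcs for m in neighbours[n]} - pcs

    roles = {}
    for node, adj in neighbours.items():
        if node in pcs:
            roles[node] = "PC"
        elif node in switches:
            roles[node] = "Switch"
        elif len(adj) > 2:
            # Rule 3: more than 2 connections and not yet labeled.
            roles[node] = "Router"
        else:
            # Rule 4: everything else.
            roles[node] = "Repeater"
    return roles


# Hypothetical mini network: three PCs, two switches, one router.
edges = [
    ("a", "s1"), ("b", "s1"), ("s1", "r1"), ("r1", "x"),
    ("x", "r2"), ("x", "r3"), ("r2", "s2"), ("r3", "s2"), ("s2", "c"),
]
roles = classify(edges)
for node in sorted(roles):
    print(node, roles[node])
```

The order of the checks matters in the same way as the order of the Cypher statements: a Spot with many connections that sits next to a PC stays a Switch and never becomes a Router.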

Summary

Today my plans for algorithms did not work out. My initial idea for the LAN graph was to apply several algorithms to solve my needs, but unfortunately I ended up handling all the identification manually.

It is worth mentioning that automatic graph generation can be really helpful in some scenarios. But be ready to fix some things in your raw graph. It was an interesting experience; maybe I will try auto-generation again.

What to improve here? I have an idea: we can define rules to assign an IP address to each Spot. Wanna try? It must be fun!
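As one possible starting point, here is a minimal Python sketch that hands out unique addresses from a single private network using the standard `ipaddress` module. The function name `assign_addresses`, the default network `192.168.0.0/24`, and the rule "next free host address in sorted order" are all assumptions chosen for illustration, not rules from this post.

```python
import ipaddress


def assign_addresses(spots, network="192.168.0.0/24"):
    """Give each spot the next free host address of the network."""
    hosts = iter(ipaddress.ip_network(network).hosts())
    assigned = {}
    # Sorting makes the assignment reproducible between runs.
    for spot in sorted(spots):
        try:
            assigned[spot] = str(next(hosts))
        except StopIteration:
            raise ValueError("network too small for all spots") from None
    return assigned


addresses = assign_addresses(["pc-2", "pc-1", "switch-1"])
for spot, ip in addresses.items():
    print(spot, ip)
```

A richer rule set could give every detected subnetwork its own address range, which would tie the addresses back to the graph structure.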

