“Duplicate” nodes in graph databases

Dispelling my confusion regarding uniqueness of vertexes in AgensGraph and Neo4j

Attila Gulyas
Scientific breakthrough of the afternoon
3 min read · Aug 24, 2017

After two weeks of trying (and failing) to model our organization's relationships as a graph in PostgreSQL, following a couple of rounds of purely relational failures, I realized that this is probably why graph databases exist.

The terms “vertex” and “node” are used interchangeably here: Neo4j's documentation says “node” and AgensGraph's says “vertex” for the same notion.

The conundrum

A couple of extensive late-night Google search sessions led me to David Kitchen’s comment, which perfectly summed up my situation:

Storing the data isn’t an issue. It’s fairly trivial to come up with a number of very good solutions to describing a graph.

The issue is querying the data, specifically when that involves walking the graph.

Being pedantic (and having slight OCD), I previously had an Entities and an Addresses table with at least 5 different implementation plans drawn up to model our business (or in our case, non-profit) logic.

Addresses should be static entities representing fixed locations that are related to other entities (people, organizations, etc.) in various ways. For example, a client could own a house but live at another address, or a homeless person could use an address where mail can be sent. These relationships matter!

My options boiled down to

  1. denormalizing the schema (and tackling the redundancy),
  2. reinventing the wheel by building the infrastructure for a graph database myself (and I am not that smart),
  3. or looking into a proper graph database and starting to use it.

So I started looking at Neo4j (which I had heard of before but put aside at the time) and AgensGraph (which I found while reading through countless forums and Stack Overflow questions). AgensGraph made me particularly excited because it is built on a fork of PostgreSQL, so (theoretically) I could keep using Phoenix with its PostgreSQL adapter.

My desk was already full of graph sketches, and while reading about the property graph model, which is the basis of both databases above, I patted myself on the back for having (partially) reinvented the wheel. The Graph Databases book has been (and still is) an awesome resource, and it is free to download.

The idea below was simpler (and is easier to implement now), but I kept thinking in relational terms: most of the entities below may represent different locations, yet their names could be the same, so I expected nodes with the same label and the same properties to cause confusion.

The case below is simple: Logan and Kilgore have a relationship with the exact same address.

A digraph representing a location that is used by multiple people.

I wasn’t sure how I would input this one:

Example of vert(exe|ice)s that hold the exact same information but represent distinct entities.

As it turns out, this is a non-issue in both Neo4j and AgensGraph: each generates an ID for every new vertex, so two vertices with identical labels and properties are still distinct, unless a uniqueness constraint is specified on a node property.
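To make the idea concrete, here is a minimal Python sketch of the behavior described above. It is a toy model, not how Neo4j or AgensGraph work internally: the `TinyGraph` class, its method names, and the "Location" label with a `name` property are all hypothetical and exist only for illustration.

```python
import itertools


class UniquenessError(Exception):
    """Raised when a new node would violate a uniqueness constraint."""


class TinyGraph:
    """Toy property graph: every node gets its own generated ID."""

    def __init__(self):
        self._ids = itertools.count(1)  # stand-in for database-generated IDs
        self.nodes = {}                 # node ID -> (label, properties)
        self.unique = set()             # (label, property key) pairs

    def add_unique_constraint(self, label, key):
        self.unique.add((label, key))

    def create_node(self, label, **props):
        # Only constrained (label, key) pairs are checked for duplicates.
        for constrained_label, key in self.unique:
            if constrained_label != label or key not in props:
                continue
            for other_label, other_props in self.nodes.values():
                if other_label == label and other_props.get(key) == props[key]:
                    raise UniquenessError(f"{label}.{key}={props[key]!r} already exists")
        node_id = next(self._ids)
        self.nodes[node_id] = (label, dict(props))
        return node_id


# Without a constraint, "identical" nodes are still two distinct nodes.
graph = TinyGraph()
first = graph.create_node("Location", name="Home")   # hypothetical label and property
second = graph.create_node("Location", name="Home")
print(first != second, len(graph.nodes))

# With a constraint on Location.name, the second creation is rejected.
strict = TinyGraph()
strict.add_unique_constraint("Location", "name")
strict.create_node("Location", name="Home")
try:
    strict.create_node("Location", name="Home")
except UniquenessError as err:
    print("rejected:", err)
```

The point of the sketch is that identity lives in the generated ID rather than in the label or the properties, which is why the relational instinct to look for a natural key was misleading me here.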

Here’s the Neo4j gist and the queries in both graph databases as well:

Testing the creation of “identical” Neo4j and AgensGraph nodes

The diff between the 2 queries:

See Using GraphViz to visualize property graphs for the GraphViz code used to create the graphs above, along with extra resources.
