Keeping track of graph changes using temporal versioning

Ljubica Lazarevic
Dec 16, 2019

In this post we’re going to cover why versioning is important and how to do time-based versioning in Neo4j, including managing retrospective (bi-temporal) changes.

Introduction

Why versioning?

Another reason for versioning is what-if analysis. Whereas keeping track of change shows us historical changes, proposing a change that hasn’t yet occurred allows us to examine what might happen under certain scenarios and project into the future.

There are many use-cases where you might see versioning in use, for example:

  • Identity and access management: we can keep track of who is accessing what and when, and start to analyse any interesting behaviours we should be investigating
  • Monitoring changes in networks: much like the above example, we can keep track of what’s been accessed to watch out for unusual behaviour, and we can also predict what might happen, and what the impacts might be, when we make changes
  • Collaborative working: think GitHub and other collaborative working tools. We can track changes being made, see when they occurred, and reduce the likelihood of losing contributions.

Versioning in Neo4j

One thing to bear in mind: the community has created a versioning plugin called Versioner-core, which provides some degree of automated versioning. You can check it out on the author’s GitHub repository.

Introduction to time-based versioning

Time-based versioning offers several benefits:

  • You can use it to track changes and reverse any errors
  • You can make updates to your data without deleting anything
  • You too can become a time-traveller and move through time to understand state change.

The principles behind time-based versioning are pretty straightforward:

  • Separate the object from its state; the two are linked by a relationship
  • Capture the times of change as properties on the relationship linking these two entities.

And so, it is time to introduce our example (no pun intended!):

Our original data item, Product with some associated properties

Scenario 1

There is a company that produces a product called Widget. On the 4th May 2016, a couple of decisions were made:

  • Rename Widget to MiniWidget, because a new product similar to Widget is coming out
  • Reduce the price of this product to 3.99

As you can see, we’ve split out our Product node, which now contains just the property that uniquely identifies it (id), and we capture other information, such as name and price, within ProductState nodes. We then connect Product to ProductState using the HAS_PRODUCT_STATE relationship. Lastly, we capture when each state was valid with from and to properties on the relationship itself.
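The refactoring also implies a write pattern: a change never edits a ProductState in place. Below is a minimal in-memory Python sketch, not Neo4j code, in which hypothetical `VersionedObject` and `StateLink` types stand in for the Product node and its HAS_PRODUCT_STATE relationships (dates are yyyymmdd integers, as in the queries in this post). An update closes the current link by setting its end date and then attaches a new state:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StateLink:
    """Stands in for a HAS_PRODUCT_STATE-style relationship plus its state node."""
    state: dict
    from_date: int                 # yyyymmdd: when this state became valid
    to_date: Optional[int] = None  # None: the state is still current


@dataclass
class VersionedObject:
    """Stands in for the identity node (e.g. Product), which only holds an id."""
    id: int
    links: list = field(default_factory=list)

    def current(self) -> Optional[StateLink]:
        # The current state is the one whose link has no end date.
        return next((link for link in self.links if link.to_date is None), None)

    def update(self, date: int, **changes) -> StateLink:
        # Close the current link instead of deleting or editing its state...
        old = self.current()
        values = {}
        if old is not None:
            old.to_date = date
            values = dict(old.state)
        # ...then attach a new state: the previous values plus the changes.
        values.update(changes)
        new = StateLink(state=values, from_date=date)
        self.links.append(new)
        return new


# Usage: the initial date and price are hypothetical; the rename and the
# 3.99 price on 4th May 2016 come from Scenario 1.
widget = VersionedObject(id=123)
widget.update(20150101, name="Widget", price=4.99)
widget.update(20160504, name="MiniWidget", price=3.99)
print(widget.current().state)  # {'name': 'MiniWidget', 'price': 3.99}
```

Each new state in this sketch is a full copy of the previous values plus the changes, so any single state can answer a question on its own; storing only the changed properties would save space but make reads more complex.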

With our newly refactored graph, we can start to ask some questions of it.

Query 1: What is the current name of product with id:123?

MATCH (:Product {id:123})-[r:HAS_PRODUCT_STATE]->(ps:ProductState)
WHERE r.to IS NULL
RETURN ps.name;

The WHERE clause brings back the most recent state, as that will be the one whose relationship has no to property.
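The same "no end date means current" rule is easy to try outside the database. A minimal Python sketch, assuming the HAS_PRODUCT_STATE relationships for one product have been loaded into plain dicts (the keys `from`, `to` and `state` are hypothetical, as are the pre-rename start date and price):

```python
from typing import Optional

# Each dict mirrors one HAS_PRODUCT_STATE relationship and its ProductState.
# The 20150101 start date and the 4.99 price are hypothetical.
links = [
    {"from": 20150101, "to": 20160504, "state": {"name": "Widget", "price": 4.99}},
    {"from": 20160504, "to": None, "state": {"name": "MiniWidget", "price": 3.99}},
]


def current_state(links: list) -> Optional[dict]:
    """Return the state whose relationship has no end date, or None."""
    for link in links:
        if link["to"] is None:
            return link["state"]
    return None


print(current_state(links)["name"])  # MiniWidget
```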

Query 2: How much did the product with id:123 cost 3 years ago?

MATCH (:Product {id:123})-[r:HAS_PRODUCT_STATE]->(ps:ProductState)
WHERE r.from <= 20161010 AND (r.to >= 20161010 OR r.to IS NULL)
// If two states share a boundary date, keep the newer one
WITH ps, r ORDER BY r.from DESC LIMIT 1
RETURN ps.price;

Here we look for the ProductState that was valid 3 years ago, bearing in mind that this state may still be current, in which case its relationship has no to property.
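The point-in-time lookup can be sketched the same way. This Python version uses hypothetical data, treats dates as yyyymmdd integers like the query does, and keeps the newer state when two relationships share a boundary date (one ending and the next starting on the same day):

```python
from typing import Optional

# Hypothetical relationship data; dates are yyyymmdd integers.
links = [
    {"from": 20150101, "to": 20160504, "state": {"name": "Widget", "price": 4.99}},
    {"from": 20160504, "to": None, "state": {"name": "MiniWidget", "price": 3.99}},
]


def state_at(links: list, date: int) -> Optional[dict]:
    """Return the state valid on `date`, or None if none was valid.

    A link matches if it started on or before `date` and either ended on or
    after it or has not ended yet. On a shared boundary date both links
    match, so the one with the latest start date wins.
    """
    matches = [
        link for link in links
        if link["from"] <= date and (link["to"] is None or link["to"] >= date)
    ]
    if not matches:
        return None
    return max(matches, key=lambda link: link["from"])["state"]


print(state_at(links, 20161010)["price"])  # 3.99
print(state_at(links, 20160504)["name"])   # MiniWidget (boundary date)
```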

Query 3: What is the SKU (Stock Keeping Unit) for MiniWidget?

MATCH (:ProductState {name:"MiniWidget"})<-[:HAS_PRODUCT_STATE]-(p:Product)
RETURN p.id;

As you can see, just by separating the object from its state, we can capture a lot of information about changes and retrieve information according to a time filter.

Versioning relationships

The principles behind versioning relationships are pretty much the same as those for versioning object states:

  • Connect the two nodes involved in the relationship
  • Provide a time (or time range) for when that relationship was live, if relevant
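A minimal Python sketch of the same idea for relationships, assuming each relationship carries a start date and an optional end date (the `Rel` type, the CONNECTED_TO type and the node ids are all hypothetical). Asking which relationships were live on a date is the same interval check used for states:

```python
from typing import NamedTuple, Optional


class Rel(NamedTuple):
    """A relationship between two node ids, live from `start` until `end`."""
    source: int
    rel_type: str
    target: int
    start: int                  # yyyymmdd: when the relationship became live
    end: Optional[int] = None   # None: the relationship is still live


def live_on(rels: list, date: int) -> list:
    """Return the relationships that were live on `date`."""
    return [r for r in rels if r.start <= date and (r.end is None or r.end >= date)]


# Hypothetical data: node 1 was connected to node 2 for a while, then to node 3.
rels = [
    Rel(1, "CONNECTED_TO", 2, start=20160101, end=20170101),
    Rel(1, "CONNECTED_TO", 3, start=20170101),
]
print([r.target for r in live_on(rels, 20160601)])  # [2]
```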

Time to have a look at our next scenario!

BUYS would be our versioned relationship in this image

In the next iteration of our data model, you can see how we’ve extended versioning to relationships by joining Customer and Product with the BUYS relationship. By adding a date as a property, we show when that relationship occurred, hence versioning it. Some of you may have spotted that Customer is not versioned; more on that in the next model.

Scenario 2

A customer buys a product on 18th September 2016. Some time after the purchase, the customer moves home and updates their address.

Here Jane Doe purchases a product, and then later on updates her address, as reflected in CustomerDetails

As before, let’s ask some questions!

Query 4: Which customer last purchased a product with id:123?

MATCH (:Product {id:123})<-[r1:BUYS]-(c:Customer)
WITH c, r1 ORDER BY r1.date DESC LIMIT 1
MATCH (c)-[r2:HAS_CUSTOMER_DETAILS]->(cd:CustomerDetails)
WHERE r2.to IS NULL
RETURN cd.name;

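We use ORDER BY r1.date DESC LIMIT 1 to get the newest BUYS relationship, and then pick that customer’s current details: the HAS_CUSTOMER_DETAILS relationship with no to property.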
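To show the two-step logic of Query 4 outside Cypher, here is a Python sketch over hypothetical in-memory data: `buys` mirrors BUYS relationships as (customer id, product id, date) tuples, and `details` mirrors each customer’s HAS_CUSTOMER_DETAILS relationships. Only the 18th September 2016 purchase, the name Jane Doe and her id 456 come from the post; everything else is made up:

```python
from typing import Optional

# Hypothetical BUYS relationships as (customer_id, product_id, date) tuples.
buys = [
    (456, 123, 20160918),   # Jane Doe's purchase from Scenario 2
    (789, 123, 20150302),   # hypothetical earlier purchase by another customer
]

# Hypothetical HAS_CUSTOMER_DETAILS relationships per customer id.
details = {
    456: [
        {"from": 20100101, "to": 20170601, "name": "Jane Doe", "address": "Old Street"},
        {"from": 20170601, "to": None, "name": "Jane Doe", "address": "New Road"},
    ],
    789: [
        {"from": 20120101, "to": None, "name": "John Smith", "address": "High Street"},
    ],
}


def last_buyer_name(product_id: int) -> Optional[str]:
    # Step 1: the most recent purchase of this product (latest date wins).
    purchases = [b for b in buys if b[1] == product_id]
    if not purchases:
        return None
    customer_id = max(purchases, key=lambda b: b[2])[0]
    # Step 2: that customer's current details, i.e. the link with no end date.
    current = [d for d in details.get(customer_id, []) if d["to"] is None]
    return current[0]["name"] if current else None


print(last_buyer_name(123))  # Jane Doe
```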

Query 5: Where has Jane lived and when did she move?

MATCH (:Customer {id:456})-[r:HAS_CUSTOMER_DETAILS]->(cd:CustomerDetails)
RETURN DISTINCT cd.address AS Address, r.from AS From, r.to AS To
ORDER BY From;
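Because each address is bounded by its relationship’s from and to dates, the move dates fall out of the ordered history, assuming (as in our model) that a new state starts on the date the previous one ends. A Python sketch with hypothetical addresses and dates:

```python
# Hypothetical HAS_CUSTOMER_DETAILS data for customer 456 (Jane Doe).
links = [
    {"from": 20170601, "to": None, "address": "New Road"},
    {"from": 20100101, "to": 20170601, "address": "Old Street"},
]


def address_history(links: list) -> list:
    """Distinct (address, from, to) rows ordered by start date, as in Query 5."""
    rows = {(link["address"], link["from"], link["to"]) for link in links}
    return sorted(rows, key=lambda row: row[1])


def moves(links: list) -> list:
    """(old address, new address, move date) for each consecutive pair of rows."""
    rows = address_history(links)
    return [(old[0], new[0], new[1]) for old, new in zip(rows, rows[1:])]


for address, start, end in address_history(links):
    print(address, start, end if end is not None else "(current)")
print(moves(links))  # [('Old Street', 'New Road', 20170601)]
```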

Managing retrospective changes

There are several reasons we may need to capture changes retrospectively:

  • Reconciling newly discovered information: for example, you deposited some money into your savings account, but the bank missed it
  • Providing an audit trail for regulatory purposes: a company needs to demonstrate to a governing body what went wrong and how it was corrected
  • Supporting what-if and other analysis based on events happening at different potential points in time: recording when a process should execute in the future alongside the date on which it was authorised, e.g. a price increase in a monthly subscription, or a change to how much power flows through a network.

This type of versioning is also commonly known as bi-temporal versioning.

The principles are fairly simple. We now use two dates (or timestamps) instead of one:

  • one to represent when something should have happened; in the following scenario we refer to this as the business date, or bizDate for short
  • one to represent when something was actually processed, for example when a correction took place; in the scenario we refer to this as the process date, or procDate for short

So, on to our final scenario!

Scenario 3

A customer buys a product from the company. Unfortunately, something goes wrong during the process and the transaction to capture the sale fails. During the bi-annual stock-take, the company finds it has one fewer item of that product than its records show. After some investigation, the missing transaction is identified and rectified.

Customer with id:836 bought a product, but the transaction didn’t register. Later on in the year the situation is rectified

In our latest iteration we added bi-temporal versioning elements to the BUYS relationships:

  • bizDate captures the business date of when a transaction took place (or should have taken place), i.e. the date on which we’d show the sale as having gone through
  • procDate captures the date of when the transaction or any corrections actually took place. If everything is working as expected, bizDate and procDate will be identical. If, per our scenario, we miss and then later identify a transaction, we would set procDate as the current date when the correction is applied, and bizDate would be the retrospective date of when the transaction should have taken place.
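A minimal Python sketch of these two rules, with a hypothetical `Buys` record and `record_sale` helper (not part of any Neo4j API): a sale recorded as it happens gets identical dates, while a correction passes the date on which the missing sale was finally captured. Customer ids 456 and 836 come from the post; the product id for the missed sale and its dates are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Buys:
    """A BUYS relationship carrying both bi-temporal dates (yyyymmdd ints)."""
    customer_id: int
    product_id: int
    biz_date: int   # when the sale took place (or should have)
    proc_date: int  # when the sale was actually recorded

    @property
    def retrospective(self) -> bool:
        # A correction is recorded after the business date it refers to.
        return self.biz_date < self.proc_date


def record_sale(customer_id: int, product_id: int, biz_date: int,
                proc_date: Optional[int] = None) -> Buys:
    # Normal case: recorded as it happens, so both dates are identical.
    # Correction: proc_date is the date the missing sale is finally captured.
    return Buys(customer_id, product_id, biz_date,
                biz_date if proc_date is None else proc_date)


on_time = record_sale(456, 123, 20160918)
late = record_sale(836, 123, 20190315, proc_date=20190920)  # hypothetical dates
print(on_time.retrospective, late.retrospective)  # False True
```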

Query 6: How many transactions were missed that we retrospectively captured?

MATCH (:Customer)-[r:BUYS]->(:Product)
WHERE r.bizDate < r.procDate
RETURN count(*);

Query 7: What were the captured transactions for the past year?

MATCH (:Customer)-[r:BUYS]->(p:Product)
WHERE r.bizDate = r.procDate AND r.bizDate >= 20181010
RETURN p.id AS SKU, r.bizDate AS `Sale Date`
ORDER BY `Sale Date`;

Query 8: What are all of the actual transactions for the past year?

MATCH (:Customer)-[r:BUYS]->(p:Product)
WHERE r.bizDate >= 20181010
RETURN p.id AS SKU, r.bizDate AS `Sale Date`, r.procDate AS `Transaction Date`
ORDER BY `Sale Date`;
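Queries 6 to 8 can be checked side by side in Python over hypothetical sales (a `Sale` tuple stands in for a BUYS relationship, and the cutoff matches the queries). In this data, the difference between the Query 7 and Query 8 results is exactly the retrospectively captured sale counted by Query 6:

```python
from typing import NamedTuple


class Sale(NamedTuple):
    """One BUYS relationship reduced to the fields Queries 6 to 8 use."""
    sku: int
    biz_date: int   # yyyymmdd: when the sale happened
    proc_date: int  # yyyymmdd: when it was recorded


# Hypothetical sales; the last one was missed and captured at stock-take.
sales = [
    Sale(123, 20181115, 20181115),
    Sale(123, 20190220, 20190220),
    Sale(123, 20190315, 20190920),
]
cutoff = 20181010

# Query 6: transactions captured retrospectively.
missed = sum(1 for s in sales if s.biz_date < s.proc_date)

# Query 7: transactions captured on time since the cutoff, by sale date.
captured = sorted(((s.sku, s.biz_date) for s in sales
                   if s.biz_date == s.proc_date and s.biz_date >= cutoff),
                  key=lambda row: row[1])

# Query 8: every transaction that actually happened since the cutoff.
actual = sorted(((s.sku, s.biz_date, s.proc_date) for s in sales
                 if s.biz_date >= cutoff), key=lambda row: row[1])

print(missed)         # 1
print(len(captured))  # 2
print(len(actual))    # 3
```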

Advantages and Disadvantages of these approaches for capturing change

Advantages:

  • All changes to the data are captured, including relationship changes
  • Able to step backwards and forwards in time according to the questions we are looking to answer.

Disadvantages:

  • Need to do additional work to model changes in relationships
  • No indexing on relationship properties at the time of writing (Neo4j 4.3 later added relationship property indexes), so further model iteration is required if there are many state changes
  • Querying is a bit more complex.

What has been shown here is the formalised view of time-based versioning. Of course, the data you are working with and the questions you want to ask give you opportunities to be pragmatic, using only the components you need to capture the level of versioning you require.

Summary

We looked at time-based versioning, and how this can be extended for versioning relationships, as well as capturing retrospective changes.

Last but not least, we discussed some of the advantages and disadvantages of this method, and the importance of pragmatism in your versioning approach.

Neo4j Developer Blog

Developer Content around Graph Databases, Neo4j, Cypher…
