Exploring Graph Database Based Apps Using a Dynamic Model

Mathias Tiberghien
Published in The Startup
24 min read · Jul 24, 2020

Introduction

Creating an application often leads to the following data workflow:

Serialized data → memory data → view data → memory data → serialized data

The application deserializes data, optionally processes it in memory, and then displays it in a view.

A user can modify data through a view; the change is optionally processed in memory and then stored in a file or a database.

Because the serialized objects, the memory objects, and the view objects are different entities, the application will contain code like this:

myMemoryObject.memoryPropertyName = mySerializedObject.serializedPropertyName;

myViewObject.viewPropertyName = myMemoryObject.memoryPropertyName;

This means that the synchronization between the different entities is based on a static key/value system.

Unfortunately, the system will change a lot during development, for reasons such as:

  • The serialization system changes
  • The view components are changed or upgraded
  • The structure of the objects changes to handle new features or improvements

When a change occurs, the developer must update the bindings in the code using find-and-replace operations and recompile the application.

Many developer tools handle these tasks efficiently, but graph databases seem to offer great possibilities for dealing with binding.

This experiment tries to leverage a graph database to build a dynamic binding system.

The system also puts primitive values at the center of relationships, showing how subsets of related data that are useful for queries can be built automatically.

The document will explore the creation of a simple Angular web app, displaying data from a Neo4j Graph database.

The final code can be found at: https://github.com/mathiastiberghien/graph_based_coding.git

To make it work without any changes, you’ll need to install Neo4j Desktop and create a database with ‘admin’ as the password. The database must have the setting ‘cypher.lenient_create_relationship’ set to true in its neo4j.conf file.

This document targets graph database users and application developers. The samples provided are very simple and focus more on the concept than on the technology. We will describe the relevant pieces of code so that developers can follow the development logic step by step, but this document is not really a tutorial.

Note: I’m very new to graph databases, and through these experiments I tried to understand the reasons for starting to use a graph database in production applications.

Database dynamic model

Before we begin, we need to define how we will store data in the database. We decided to use an abstract model representing data from a developer’s perspective.

Here’s the schema of our graph database:

Dynamic model schema

It contains 5 labels:

  • Model (properties: name)
  • Key (properties: name, isArray)
  • Instance (properties: id)
  • KeyValuePair
  • Type (properties: name)

And 8 relationships:

  • EXTENDS
  • HAS_TYPE
  • HAS_KEY
  • HAS_MODELTYPE
  • IS_EQUIVALENT
  • HAS_VALUE
  • IS_INSTANCE_OF
  • HAS_KEYVALUE_PAIR

This describes the following:

  • A Model defines keys
  • Instances are related to a Model
  • Instances define a collection of KeyValuePair
  • A KeyValuePair maps a specific Key to a specific Instance
  • A Key has a Type, and can be related to a Model (if the type is an object)
  • A Model can extend other models
  • A Model can be equivalent to another Model
  • A Key of a model can be equivalent to the key of another Model
  • Retrieving the Instances of a Model should return all the instances of that model, but also the instances of the models extending it and the instances of its equivalent models
  • The Instances of a Model should expose the keys of the model but also the keys of the models it extends. Key equivalency relationships should be used to provide the appropriate key identifier

The database model is then expressed in TypeScript as follows:

DBModel classes

In this document, at least, we call it a dynamic model because the node labels in the graph database represent abstractions, and the same model can be used whatever objects it stores.

By contrast, a static model is a model whose nodes and relationships are based on the nature of the stored objects. As an example, here is the schema of the Movie sample database in Neo4j:

Movie sample schema

The database stores ‘Person’ and ‘Movie’ objects, whose relationships are based on the relationships that exist between persons and movies. If we change some concept in our structure, we will probably have to add, remove, or modify nodes or relationships to fit the new logic.

In a dynamic model, the structure shouldn’t need to change, yet it should still be able to represent many different concepts.

Database Interaction

We first used Neo4j Desktop to create an empty database that runs locally and is accessible using the Bolt protocol:

Neo4j Desktop

IMPORTANT: To be able to run the update query, the cypher.lenient_create_relationship setting of the database must be set to true (through the neo4j.conf file):

Updating neo4j.conf
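For reference, a minimal sketch of the relevant line in neo4j.conf. The setting name is the one quoted above; other Neo4j releases may name it differently, so it is worth checking the configuration reference for the version you run.

```
# neo4j.conf (fragment)
cypher.lenient_create_relationship=true
```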

The application presented in the sample is an Angular application that communicates directly with the graph database through a service named ModelService:

ModelService class

The service is able to:

  • Update the database according to a collection of DBModel objects (defined above)
  • Retrieve the instances of a specific model, optionally filtered by id, as a collection of Instance<T> objects, where T is a JavaScript object containing the key properties of the model

An Instance<T> is defined as follows:

Instance class

Where:

  • The ‘instance’ property represents the database metadata of the instance (in the sample we will only use the id, but we can imagine adding other information such as a creationDate, a lastUpdate, etc.)
  • The ‘value’ property is a JavaScript object that will be used by the application
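As a rough sketch, such a type could look like the following in TypeScript. The interface and field names mirror the description above, but they are assumptions (as is the `describeInstance` helper); the definitions in the repository may differ.

```typescript
// Hypothetical shapes based on the description above; not the repository's code.
interface InstanceMeta {
  id: string;            // the only metadata used in the sample
  creationDate?: string; // example of metadata that could be added later
}

interface Instance<T> {
  instance: InstanceMeta; // database metadata of the instance
  value: T;               // plain object consumed by the application
}

// A hypothetical value type with a single key.
interface Titled {
  title: string;
}

const example: Instance<Titled> = {
  instance: { id: "m1" },
  value: { title: "The Matrix" },
};

// Components can read both the application value and the database id.
function describeInstance(item: Instance<Titled>): string {
  return `${item.value.title} (${item.instance.id})`;
}

console.log(describeInstance(example));
```

Keeping the metadata next to the value, instead of merging the two, means the application object never has to reserve property names for database concerns.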

The ModelService class has 2 public methods:

  • buildSample updates the database model using the private methods createOrUpdateDBModels and clearUnusedObjects, providing a sample that declares the data structure and the instances contained in the database
  • getInstances takes a model name and retrieves the instances of that model

It also has 2 private methods:

  • createOrUpdateDBModels is used to update the data structure of our project using a DBModel collection as input
  • clearUnusedObjects removes the objects that are no longer used in the application (orphan nodes)

The queries were defined once and won’t be updated during the experiment. They will be detailed at the end of the document.

NOTE: In a real-life project, the client should of course not communicate directly with the database, and the CRUD operations should be handled in a client/server architecture.

Building the sample

The AppComponent is the main component of the app and is implemented as follows:

AppComponent

The application will run the buildSample method once, when launching.

The purpose of the method is to adapt the database using a DBModel collection (referred to as the sample) representing the Models with their instances. The changes made to the database are differential: the method compares the existing database with the sample and creates or removes only the nodes and relationships needed to make the database match the sample. In a real project we wouldn’t use such a method, but it was created to check that a single generic query could modify the database dynamically.
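The differential idea can be illustrated outside the database with a few lines of TypeScript. This is only a sketch under simplifying assumptions: elements are represented as plain strings, and `diffDeclared` is a hypothetical helper, not the Cypher query used by the ModelService.

```typescript
// What has to change so that `existing` ends up matching `declared`.
interface ChangeSet {
  toCreate: string[];
  toDelete: string[];
}

function diffDeclared(existing: string[], declared: string[]): ChangeSet {
  return {
    // Declared but not yet present: must be created.
    toCreate: declared.filter((item) => existing.indexOf(item) === -1),
    // Present but no longer declared: must be removed.
    toDelete: existing.filter((item) => declared.indexOf(item) === -1),
  };
}

// First launch: the database is empty, so everything declared is created.
const firstRun = diffDeclared([], ["Model:Movie", "Key:title"]);

// Second launch with the same declaration: nothing is created or deleted.
const secondRun = diffDeclared(
  ["Model:Movie", "Key:title"],
  ["Model:Movie", "Key:title"],
);

console.log(firstRun, secondRun);
```

Because only the difference is applied, running the same declaration twice leaves the database untouched.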

The buildSample code is detailed below:

buildSample method

It calls the createOrUpdateDBModels method, which runs a Cypher query against the database with the models as a parameter, then it calls clearUnusedObjects, which cleans the database.

The changes made to the database by the 2 queries (nodes, labels, and relationships created or deleted) are aggregated and logged so we can track them.

At the beginning of the project the sample constant was defined like this:

sample constant

This means that the database contains nothing and should remain empty when launching the application (we started with an empty database).

As the project evolves, we will modify the sample to declare what we want to find in the database.

In short, the buildSample method applies a declarative definition of the database content to the graph database and logs the effective changes.

Let’s build a simple Movie application

We want to display a list of movies, so we create a simple Movie class that represents a movie defined by its title.

Let’s first define a Movie model:

Movie class

We then create a MovieListComponent:

MovieListComponent class

And its template:

MovieListComponent template

The movies are read from the database using the getInstances method of the ModelService class.

The MovieListComponent template displays each movie’s title and id.

Let’s update the sample like this:

sample constant

This tells the system that we want a Model named ‘Movie’, containing a key named ‘title’ and having 2 instances, ‘m1’ and ‘m2’, with ‘The Matrix’ and ‘Harry Potter’ as their respective titles.
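To give an idea of what such a declaration carries, here is a hypothetical TypeScript shape for it. The type names (`ModelDecl`, `KeyDecl`, `InstanceDecl`) and the `undeclaredKeys` check are illustrative assumptions; the DBModel classes in the repository may be structured differently.

```typescript
// Hypothetical declaration types; not the repository's DBModel classes.
interface KeyDecl {
  name: string;
  type: string;     // e.g. "string" or "object"
  isArray: boolean;
}

interface InstanceDecl {
  id: string;
  // key name -> primitive value(s), or id(s) of other instances
  values: { [key: string]: string | string[] };
}

interface ModelDecl {
  name: string;
  keys: KeyDecl[];
  instances: InstanceDecl[];
}

const sample: ModelDecl[] = [
  {
    name: "Movie",
    keys: [{ name: "title", type: "string", isArray: false }],
    instances: [
      { id: "m1", values: { title: "The Matrix" } },
      { id: "m2", values: { title: "Harry Potter" } },
    ],
  },
];

// Sanity check: list the keys used by instances but not declared by the model.
function undeclaredKeys(model: ModelDecl): string[] {
  const declared = model.keys.map((key) => key.name);
  const unknown: string[] = [];
  for (const instance of model.instances) {
    for (const used of Object.keys(instance.values)) {
      if (declared.indexOf(used) === -1 && unknown.indexOf(used) === -1) {
        unknown.push(used);
      }
    }
  }
  return unknown;
}

console.log(sample.map((model) => undeclaredKeys(model)));
```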

Looking at the logs the first time the application is launched after updating the sample, we will see:

update logs

If we refresh the page or relaunch the application the logs will be:

update logs

This is because the createOrUpdateDBModels method is not incremental but differential (the query will be detailed later).

Note: In real life we would probably never declare DBModel objects manually, but this is convenient for a simple experiment.

When running the application, it will display the movie list:

Running the application

Let’s now have a look at the database:

Neo4j db nodes

We can see exactly what we defined in the sample:

  • A model, named ‘Movie’ which has a key named ‘title’
  • 2 instances, ‘m1’ and ‘m2’, each having a key/value pair binding the title to a value; the values are instances that have no model, which means they are primitives
  • The key has a type which is a string

Our model is considered dynamic because ‘Movie’ is just a node labelled ‘Model’ and ‘title’ a node labelled ‘Key’. The abstraction model is defined statically, but whether we work with movies, fruits, or products, the labels and relationships remain the same. This means that a generic method based on the model name can dynamically create the desired objects according to the specifications defined in the sample.

This approach creates more nodes than a static model, and adds complexity to the graph (when displaying the nodes) and some verbosity to the queries (because the KeyValuePair acts as a hyperedge). However, because graph databases deal efficiently with relationships, it should also bring benefits, which we will explore while improving the application.

Important: In this model, primitives are not stored as properties of nodes but as nodes, the id of the node being the primitive. This makes it possible to create relationships between object instances, keys, and primitive values. These relationships create physical subsets of correlated data that seem very promising.

Improving the application

A movie should not be defined by a title alone. It also has actors, and we want to display the actors of a movie in our app.

We first create a Person class as follows:

Person class

And a PersonComponent:

PersonComponent class

With its template:

PersonComponent template

The Movie model should now expose an ‘actors’ property which is a Person collection:

Movie class

We can then improve the MovieListComponent template:

MovieListComponent template

We will update the sample as follows:

sample constant

We have added the model ‘Person’ and some instances of it and added the ‘actors’ key to the ‘Movie’ model.

Because of the separation between models, keys, and instances, it is quite easy to update the model in a differential way, adding keys and linking the nodes together.

The application will now display the following:

Running the application

In the database we have now:

Neo4j db nodes

On the right are all the primitives representing the ‘firstName’ and ‘surname’ values, which are related to the instances of Person. The movies now have new key/value pairs that store the actors collection.

One relationship was added: HAS_MODELTYPE, which tells the system that actors are Person instances.

Note that we didn’t have to change the ModelService queries or the MovieListComponent. The changes in the sample have updated the database which has now all the information to return the updated Movie data structure.

What will happen now, if we change the data structure of the sample?

Changing the data structure

In our application, having the actors of a movie is nice, but we should also have the character played in each movie. We need to update our model to reflect that need, creating a ‘Role’ model containing the ‘actor’ and ‘character’ properties, and changing the ‘actors’ property to ‘roles’ in the ‘Movie’ model.

This is the Role class definition:

Role class

And this is a RoleListComponent:

RoleListComponent class

With its template:

RoleListComponent template

The Movie class is now like this:

Movie class

And the MovieListComponent template will be:

MovieListComponent template

We then have to update the sample as follows:

sample constant

The sample reflects the changes in the model. We added the ‘Role’ model and updated the ‘Movie’ model. We have also added a new movie into the database.

The application will now display:

Running the application

And the database now looks like this:

Neo4j db nodes

As viewing the entire graph is becoming a little confusing, let’s focus on some nodes of interest:

Neo4j nodes without Instance or KeyValuePair

Excluding the instances and the key/value pairs we can focus on the data structure and we can see that:

  • The property ‘actors’ has disappeared from the model ‘Movie’ which has now the property ‘roles’ with ‘Role’ as model type
  • The model ‘Role’ was added with 2 keys ‘character’ and ‘actor’ which is a ‘Person’
  • The ‘roles’ and ‘actor’ keys have an ‘object’ type
  • The other keys define primitives of type ‘string’

Let’s have a look at Keanu Reeves now, stored by the instance ‘p1’:

Neo4j db — Watching Keanu Reeves

Because Roles ‘r1’ and ‘r5’ both have ‘p1’ (Keanu Reeves) as an actor, they share the same key/value pair.

Important: In the system, a KeyValuePair binding a specific key to a specific instance will always be unique.

We can also see that there is a physical connection between ‘p1’ and ‘r1’ and ‘r5’. If a user wants to know which roles are played by Keanu Reeves, we just need to search for a KeyValuePair mapping Person ‘p1’ to the key ‘actor’ and all the instances having that KeyValuePair should be the answer to the question.
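The uniqueness rule and the lookup it enables can be mimicked in memory. The sketch below is an illustration only: the `KeyValuePair` shape with an `owners` list and the `attach` and `ownersOf` helpers are assumptions, whereas in the database the sharing is expressed through relationships.

```typescript
// A pair is identified by (key, value id); instances that hold the same pair
// all point at the same entry.
interface KeyValuePair {
  key: string;
  valueId: string;
  owners: string[]; // ids of the instances holding this pair
}

const pairs = new Map<string, KeyValuePair>();

function attach(ownerId: string, key: string, valueId: string): KeyValuePair {
  const pairId = JSON.stringify([key, valueId]);
  let pair = pairs.get(pairId);
  if (pair === undefined) {
    // First time this (key, value) combination is seen: create the pair once.
    pair = { key, valueId, owners: [] };
    pairs.set(pairId, pair);
  }
  if (pair.owners.indexOf(ownerId) === -1) {
    pair.owners.push(ownerId);
  }
  return pair;
}

function ownersOf(key: string, valueId: string): string[] {
  const pair = pairs.get(JSON.stringify([key, valueId]));
  return pair === undefined ? [] : pair.owners.slice();
}

const pairOfR1 = attach("r1", "actor", "p1");
const pairOfR5 = attach("r5", "actor", "p1");

console.log(pairOfR1 === pairOfR5);   // both roles share one pair
console.log(ownersOf("actor", "p1")); // the roles played by 'p1'
```

Answering "which roles does ‘p1’ play?" is then a single lookup on the pair, with no scan over all the roles.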

Let’s now look at the ‘Harry Potter’ primitive:

Neo4j db — Watching ‘Harry Potter’

Because ‘r3’ has ‘Harry Potter’ as character and ‘m2’ has ‘Harry Potter’ as title, they both have a key/value pair sharing the same primitive instance.

Important: Because the id of a primitive instance is the value of the primitive, there is only one primitive Instance for a given value.

This has a very interesting consequence: it creates relationships (through KeyValuePair nodes) between instances that have the same primitive. The system will therefore generate a lot of value-based connections between instances, which should be valuable when running analytic queries on the database. Starting from a primitive instance node or a key node, you will find all the instances related to that key or value without having to anticipate what you want to find.

For example, as an application user, you might be interested in ‘Harry Potter’. When you reach the ‘Harry Potter’ node, you can find instances of different models that have a key/value pair directly related to it. In our sample, you can learn that the value ‘Harry Potter’ is associated with the key ‘title’ of the model ‘Movie’ and the key ‘character’ of the model ‘Role’. You can also see which instances of ‘Movie’ and ‘Role’ are related to ‘Harry Potter’. The relationship exists physically in the database: you don’t need to iterate through all the objects to find those sharing the ‘Harry Potter’ value, nor do you need to know every concept (Key or Model) associated with that value. This will probably be very useful when you don’t know how the users of the application will consume its data.
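A small in-memory analogy may help picture this. In the sketch below, the `Fact` shape and the `indexByValue` helper are hypothetical; the index plays the role of the primitive node that every matching key/value pair points to.

```typescript
// Each fact says: instance `instanceId` of model `model` has `key` = `value`.
interface Fact {
  model: string;
  instanceId: string;
  key: string;
  value: string;
}

const facts: Fact[] = [
  { model: "Movie", instanceId: "m1", key: "title", value: "The Matrix" },
  { model: "Movie", instanceId: "m2", key: "title", value: "Harry Potter" },
  { model: "Role", instanceId: "r3", key: "character", value: "Harry Potter" },
];

// Group the facts by value once; each bucket stands for one primitive node.
function indexByValue(all: Fact[]): Map<string, Fact[]> {
  const index = new Map<string, Fact[]>();
  for (const fact of all) {
    const bucket = index.get(fact.value);
    if (bucket === undefined) {
      index.set(fact.value, [fact]);
    } else {
      bucket.push(fact);
    }
  }
  return index;
}

// Starting from the value alone, we learn which models, keys, and instances
// are attached to it, without knowing any of them in advance.
const related = indexByValue(facts).get("Harry Potter") || [];
const summary = related.map((f) => `${f.model}.${f.key} -> ${f.instanceId}`);

console.log(summary);
```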

The Model, Key, KeyValuePair, Instance system also allows us to modify the entire structure of the data easily, without having to destroy and recreate everything. We can destroy some relationships, add new ones, and then easily remove the orphan nodes.

Let’s now explore how we can extend our model.

Extending the model

We have what we want in our movie list, but there is a new requirement for the application: display movies but also books.

We will create a Media class that will store the title. The Book class will have an ‘author’ property which is a Person. The Movie class will be updated to remove the ‘title’ property and both Book and Movie will extend the Media class:

Media, Movie and Book classes

This will allow us to create a BookListComponent:

BookListComponent class

With its template:

BookListComponent template

And a MediaListComponent:

MediaListComponent template

With its template:

We will then add some buttons in the TopBarComponent template with different routes:

TopBarComponent template

Our app has a routing module describing the following routes:

AppRoutingModule class

We will now have to update the sample:

sample constant

We added ‘Media’ and ‘Book’, and added a ‘Person’ instance and a ‘Book’ instance to the database. The ‘Movie’ model no longer has the key ‘title’ and now extends ‘Media’, as does the ‘Book’ model.

The app will now display the following:

Running the application — Media
Running the application — Movies
Running the application — Books

Note that the objects have been retrieved with the getInstances method, and the only change in the query was the name of the model. Because the relationships between models are described in the database, the method can retrieve the appropriate objects for each model knowing only the model name.

The database model structure now looks like this:

Neo4j db nodes without Instance or KeyValuePair

We can see that ‘Book’ and ‘Movie’ are extending Media which has the ‘title’ key.

The system is extensible, but what happens when we introduce a new feature that has a different data structure but is conceptually similar to other models?

Merging new concepts

Let’s say that we have another app, displaying some ‘Items’ that are defined by their ‘name’.

We want to add these items to the application and be able to consider the ‘Media’ instances as ‘Items’.

First, we create the Item class:

Item class

With an ItemListComponent:

ItemListComponent class

And its template:

ItemListComponent template

Then we add a navigation button to the TopBarComponent template:

TopBarComponent template

With the corresponding route in the routing module:

AppRoutingModule class

As there is no change in the other models, we just need to add the ‘Item’ concept in our DBModel collection:

sample constant

The application now has an Items panel displaying all the items and media objects:

Running the application — Items

In the database, looking at the model structure:

Neo4j db nodes without Instance or KeyValuePair

We can see the equivalency relationships between the ‘Item’ and ‘Media’ models and between the ‘name’ and ‘title’ keys.

This equivalency system can be useful for representing different structures of the same model for the same objects (for example, the memory objects on the server side and the view objects on the client side).

Considerations about primitives

Primitives are instances without a defined model. They store numbers or strings. Having these primitives as nodes can be useful for retrieving transversal relationships by value between different instances.

This can also be interesting with dates.

Let’s create a Date model containing the keys ‘year’, ‘month’, and ‘day’. We will first create a DBDate model:

DBDate class

And add a ‘dob’ property to Person model and ‘releaseDate’ to Media model:

Person class
Media class

We will then create a MediaComponent:

MediaComponent class

With its template:

MediaComponent template

We can then use the MediaComponent template in the MediaListComponent, BookListComponent and MovieListComponent templates:

BookListComponent template
MediaListComponent template
MovieListComponent template

We will also update the PersonComponent so that it calculates the age of a Person from their date of birth:

PersonComponent class
PersonComponent template
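The age calculation itself can be sketched as follows. This is a minimal illustration, assuming that a date is an object with numeric ‘year’, ‘month’ (1 to 12), and ‘day’ keys; the `DBDateLike` interface and the `ageOn` function are hypothetical names, not the component's actual code.

```typescript
// Assumed shape of a date value built from the 'year', 'month', 'day' keys.
interface DBDateLike {
  year: number;
  month: number; // 1 to 12 (assumption)
  day: number;
}

// Age in full years on a given day: the year difference, minus one if the
// birthday has not yet occurred in the current year.
function ageOn(dob: DBDateLike, today: DBDateLike): number {
  const birthdayPassed =
    today.month > dob.month ||
    (today.month === dob.month && today.day >= dob.day);
  return today.year - dob.year - (birthdayPassed ? 0 : 1);
}

// Keanu Reeves was born on September 2, 1964.
const keanuDob: DBDateLike = { year: 1964, month: 9, day: 2 };
console.log(ageOn(keanuDob, { year: 2020, month: 7, day: 24 }));
```

Passing the reference day explicitly, rather than reading the system clock inside the function, keeps the calculation easy to test.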

The sample is updated with some ‘Date’ instances:

sample constant

The application now displays the actor’s age (for Keanu Reeves) or the media release date (for Mary Poppins):

Running the application — Movies

In the database, if we now focus on the ‘1964’ primitive:

Neo4j db — Watching ‘1964’

We can find a correlation between a date of birth and the release date of a movie. Once again, if a user wants to know which movies were released in a specific year, the query won’t have to scan all the instances for a match: they are linked to the specific year primitive through Date instances. By looking for the instances of ‘Movie’ that have a key/value pair mapping the key ‘releaseDate’ to a ‘Date’ instance whose year primitive is the desired year, we can answer the user without exploring the entire ‘Movie’ database.

The benefit of having primitives as nodes instead of properties is that the objects in the database define relationships by value, which will be very useful when you want to analyze or search your database. Whether you start from a Key, a Model, or an Instance, you’ll always find all the objects related to that node within a few hops (the number of hops depends on your data structure, which is also defined in the database).

Note: Because Instance objects represent unique values, the id of the instance being the value, there are open questions that will need to be tested once the system holds a lot of values.

Now that we have explored some typical application development use cases, let’s have a closer look at the Cypher queries behind the ModelService.

Queries in detail: Instance retrieval

The ModelService has a method getInstances that will:

  • Query all the instances of a specific model using a Cypher query that returns JSON objects. The query can optionally be filtered by instance id
  • Transform the JSON objects into Instance objects that contain the instance metadata and its object representation

Here’s the code of the Cypher query:

Get Instances cypher query

The query first tries to reach the model by its name (the parameter of the query).

It looks then for the keys defined by the model and the keys defined by the parent models (if they exist) and uses those keys as a unique collection named ‘key’.

Example: A ‘Book’ model will have ‘author’ key from book and ‘title’ key from the ‘Media’ model because ‘Book’ extends ‘Media’.
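The key-gathering step can be pictured with a small in-memory version. This is a sketch only: the `ModelNode` shape and the `collectKeys` function are hypothetical, and the real work is done by the Cypher query against the graph.

```typescript
// Each model lists its own keys and the models it extends.
interface ModelNode {
  keys: string[];
  parents: string[];
}

const models = new Map<string, ModelNode>();
models.set("Media", { keys: ["title"], parents: [] });
models.set("Book", { keys: ["author"], parents: ["Media"] });
models.set("Movie", { keys: ["roles"], parents: ["Media"] });

// Collect the keys of a model plus those of every model it extends,
// directly or indirectly, without duplicates.
function collectKeys(name: string, visited: string[] = []): string[] {
  const model = models.get(name);
  if (model === undefined || visited.indexOf(name) !== -1) {
    return [];
  }
  visited.push(name); // guards against cycles in the EXTENDS chain
  const keys = model.keys.slice();
  for (const parent of model.parents) {
    for (const key of collectKeys(parent, visited)) {
      if (keys.indexOf(key) === -1) {
        keys.push(key);
      }
    }
  }
  return keys;
}

console.log(collectKeys("Book")); // own key first, then inherited keys
```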

Here’s the code of that part of the query:

Getting properties from model and parents

With the model and all its keys, the query searches for possible equivalent models, then for all the models extending them. The model, its equivalents, and their children are gathered in a single collection aliased as ‘model’.

Example: an ‘Item’ model is equivalent to ‘Media’, and ‘Media’ is extended by ‘Book’ and ‘Movie’. These 4 models are now referred to as ‘model’.
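Here is how that resolution could look on an in-memory description of the models. The `ModelLinks` shape and the `resolveModels` function are assumptions made for illustration, and equivalency is treated as symmetric in this sketch.

```typescript
interface ModelLinks {
  parents: string[];     // models this model extends
  equivalents: string[]; // models declared equivalent to this one
}

const links = new Map<string, ModelLinks>();
links.set("Media", { parents: [], equivalents: [] });
links.set("Book", { parents: ["Media"], equivalents: [] });
links.set("Movie", { parents: ["Media"], equivalents: [] });
links.set("Item", { parents: [], equivalents: ["Media"] });

function resolveModels(name: string): string[] {
  const result: string[] = [name];
  const start = links.get(name);

  // Step 1: add the equivalent models, whichever side declared the link.
  links.forEach((info, other) => {
    const linked =
      info.equivalents.indexOf(name) !== -1 ||
      (start !== undefined && start.equivalents.indexOf(other) !== -1);
    if (linked && result.indexOf(other) === -1) {
      result.push(other);
    }
  });

  // Step 2: add every model extending one already collected, until stable.
  let grew = true;
  while (grew) {
    grew = false;
    links.forEach((info, other) => {
      const extendsCollected = info.parents.some((p) => result.indexOf(p) !== -1);
      if (extendsCollected && result.indexOf(other) === -1) {
        result.push(other);
        grew = true;
      }
    });
  }
  return result;
}

console.log(resolveModels("Item"));
```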

The query can then get all the instances of the models.

Here’s the code:

Getting instances from model, equivalents and their children

The query then searches for the KeyValuePair nodes of each instance that are related to a model key. It also searches for the KeyValuePair nodes related to a key that is equivalent to one of the model’s keys.

Example: the ‘Item’ model will have instances from ‘Book’, ‘Movie’, ‘Media’ and ‘Item’. ‘Item’ has the key ‘name’, but since ‘name’ is equivalent to ‘title’, the query will gather all the KeyValuePair nodes from the instances of ‘Book’, ‘Movie’, ‘Media’, or ‘Item’ that are related to ‘name’ or ‘title’.
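The key mapping can be sketched like this. The `StoredPair` and `ExposedPair` shapes and the `exposeAs` function are hypothetical; they only illustrate how a pair stored under ‘title’ can be presented as ‘name’ while remembering the original key.

```typescript
// Pairs of equivalent keys, treated as symmetric in this sketch.
const equivalentKeys: Array<[string, string]> = [["name", "title"]];

interface StoredPair {
  key: string;   // key the pair is stored under
  value: string;
}

interface ExposedPair {
  key: string;         // key as seen by the requested model
  originalKey: string; // key the pair is actually stored under
  value: string;
}

// Present a stored pair through the keys of the requested model, or return
// undefined when the pair is not relevant for that model.
function exposeAs(modelKeys: string[], stored: StoredPair): ExposedPair | undefined {
  if (modelKeys.indexOf(stored.key) !== -1) {
    return { key: stored.key, originalKey: stored.key, value: stored.value };
  }
  for (const [a, b] of equivalentKeys) {
    const other = a === stored.key ? b : b === stored.key ? a : undefined;
    if (other !== undefined && modelKeys.indexOf(other) !== -1) {
      return { key: other, originalKey: stored.key, value: stored.value };
    }
  }
  return undefined;
}

// A pair stored under 'title' is seen as 'name' when requested as an 'Item'.
console.log(exposeAs(["name"], { key: "title", value: "Moby Dick" }));
```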

The KeyValuePair will be gathered as ‘kvps’.

Here’s the code:

Getting KeyValue pairs

Now the query can find the values associated with each KeyValuePair. It will also try to find the model associated with each value. If there is a model, the instance should be considered as a reference, otherwise as a primitive. An instance reference will be an object defining the instance id. A primitive will return its id property as a value.

Example: A ‘Book’ has a ‘title’ and an ‘author’. The instance ‘b1’ has a value for ‘title’ which is not related to any model, so the query returns the id of that value instance, which is ‘Moby Dick’. It also has a value for ‘author’ which is related to the model ‘Person’, so the query returns {id: ‘p1’}, which is a reference to the instance ‘p1’.
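Expressed in TypeScript, the rule is a one-line decision. The `ValueNode` shape and the `resolveValue` function below are assumptions used to illustrate it, with ids taken from the example above.

```typescript
interface InstanceRef {
  id: string;
}

// A value node as seen by the query: its id, plus its model when it has one.
interface ValueNode {
  id: string;
  model?: string;
}

// With a model the value is a reference to another instance;
// without one it is a primitive whose id is the value itself.
function resolveValue(node: ValueNode): string | InstanceRef {
  return node.model !== undefined ? { id: node.id } : node.id;
}

console.log(resolveValue({ id: "Moby Dick" }));           // primitive
console.log(resolveValue({ id: "p1", model: "Person" })); // reference
```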

Here’s the code:

Getting the values

Finally, the instances are returned as an array of JSON objects containing:

  • The instance metadata (containing the id and the model name of the instance)
  • The instance properties: an array of key/value pair objects

The key/value pair object will be defined with:

  • The key identifier
  • The original key identifier in case of key mapping between equivalent models
  • The model in case the values are instance references
  • The values
  • The information if the key is an array or a single value
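As an illustration of this result shape, and of how it can be turned into an application object, here is a sketch in TypeScript. All the field names (`key`, `originalKey`, `model`, `values`, `isArray`) and the `toInstance` function are assumptions derived from the list above, not the names used in the repository.

```typescript
interface InstanceRef {
  id: string;
}

interface PropertyRecord {
  key: string;          // key identifier exposed to the caller
  originalKey: string;  // stored key, when mapped from an equivalent model
  model: string | null; // model name when the values are instance references
  values: Array<string | number | InstanceRef>;
  isArray: boolean;     // array key or single value
}

interface InstanceRecord {
  instance: { id: string; model: string };
  properties: PropertyRecord[];
}

// Build the application object by setting its properties dynamically.
function toInstance(record: InstanceRecord) {
  const value: { [key: string]: unknown } = {};
  for (const property of record.properties) {
    value[property.key] = property.isArray ? property.values : property.values[0];
  }
  return { instance: { id: record.instance.id }, value };
}

const bookRecord: InstanceRecord = {
  instance: { id: "b1", model: "Book" },
  properties: [
    { key: "title", originalKey: "title", model: null, values: ["Moby Dick"], isArray: false },
    { key: "author", originalKey: "author", model: "Person", values: [{ id: "p1" }], isArray: false },
  ],
};

console.log(JSON.stringify(toInstance(bookRecord)));
```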

Here’s the code:

formatting the result

In the application, the result of the query is defined as a collection of Record objects, described as follows:

Query result classes

In the ModelService, the getInstances method is defined as follows:

getInstances method

The method takes 4 arguments:

  • The model’s name
  • An optional InstanceRef collection that can be used as a filter
  • An optional recursive depth (recursiveDepth) that defines how many levels of instances should be extracted (0 means all). Example: a Movie is composed of a ‘title’ and ‘roles’. The ‘roles’ property holds instances of ‘Role’, which hold an instance of ‘Person’ in their ‘actor’ property, which in turn holds an instance of ‘Date’ in its ‘dob’ property. If recursiveDepth is 0, the method recursively calls itself until it reaches only primitives. If it is 1, it returns only the instance references for ‘roles’, not the Role objects. If it is 2, it returns the Movie with its Role objects, but each actor is an InstanceRef
  • The current recursive level
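The depth semantics can be reproduced on a tiny in-memory store. This sketch is not the getInstances implementation: the `store`, the `load` function, and the sample data (including the link between ‘m1’ and ‘r1’ and the character name) are assumptions chosen to illustrate the three cases described above.

```typescript
interface InstanceRef {
  id: string;
}
type StoredValue = string | number | InstanceRef;
type StoredInstance = { [key: string]: StoredValue | StoredValue[] };

// Illustrative data: instance id -> raw values, with references left unresolved.
const store = new Map<string, StoredInstance>();
store.set("m1", { title: "The Matrix", roles: [{ id: "r1" }] });
store.set("r1", { character: "Neo", actor: { id: "p1" } });
store.set("p1", { firstName: "Keanu", surname: "Reeves" });

function isRef(value: StoredValue): value is InstanceRef {
  return typeof value === "object";
}

// recursiveDepth 0 expands everything; otherwise references found at
// level >= recursiveDepth are returned as they are.
function load(id: string, recursiveDepth = 0, level = 1): { [key: string]: unknown } {
  const stored = store.get(id);
  if (stored === undefined) {
    throw new Error(`Unknown instance: ${id}`);
  }
  const goDeeper = recursiveDepth === 0 || level < recursiveDepth;
  const resolve = (value: StoredValue): unknown =>
    isRef(value) && goDeeper ? load(value.id, recursiveDepth, level + 1) : value;

  const result: { [key: string]: unknown } = {};
  for (const key of Object.keys(stored)) {
    const raw = stored[key];
    if (raw === undefined) {
      continue;
    }
    result[key] = Array.isArray(raw) ? raw.map(resolve) : resolve(raw);
  }
  return result;
}

console.log(JSON.stringify(load("m1", 1))); // roles stay references
console.log(JSON.stringify(load("m1", 2))); // roles expanded, actor stays a reference
console.log(JSON.stringify(load("m1")));    // everything expanded
```

Note that with a depth of 0 this sketch would loop forever on circular references; a real implementation would need to track the instances already visited.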

It retrieves the instance Record objects using the query described above, optionally adding a filter on the instances with this line:

An optional filter

Then for each Record, it creates an instance object setting the properties of the object dynamically using the key and values properties of the KeyValuePair collection of the Record.properties.

It uses recursion to optionally retrieve the complete objects.

It returns the Instance collection that can then be used directly by the application components.

Query in detail: Updating the database

The Cypher query takes a DBModel array as a parameter and updates the database accordingly.

It removes the Model nodes and the relationships that exist in the database but not in the DBModel objects, and creates the nodes and relationships that don’t yet exist in the database.

The query is quite long because it does a lot of things. It was created to verify how the structure can be updated using queries. I discovered Cypher while creating this project, and I have to admit that I was amazed by how intuitive the language is and how it allows complex tasks to be performed in a single query.

Note: In real life, we would probably never use such a query, but it’s convenient for a POC.

The query first looks for the Model objects that exist in the database but not in the DBModel collection and removes them. Then it merges the Model objects using the name property of the node:

Updating the models

The query then looks for the relationships that are not defined in the specification (the DBModel collection) and removes them from the database. It then creates the [:IS_EQUIVALENT] relationships between the Model objects according to the specification:

Updating model equivalency

The query then creates the Key nodes related to a model. It merges the Type objects for each key and removes or merges the [:IS_EQUIVALENT] relationships according to the specification. It then creates the [:HAS_KEY] and [:HAS_TYPE] relationships that link models to keys and keys to types:

Updating keys and types

The query then deletes the relationships that are not defined in the specification and, when a key refers to another object, creates the [:HAS_MODELTYPE] relationship between the key and its model. It then removes all the [:HAS_KEY] relationships between models and keys that are not defined in the specification:

Updating relationships between models, keys and types

The query then removes the [:IS_INSTANCE_OF] relationships that are not defined in the specification. Then it merges the Instance nodes defined for each model and creates the relationships between them:

Updating instances and relationships with models

The query continues by removing unspecified relationships between the Instance and KeyValuePair nodes, then updates the [:EXTENDS] relationships between Model objects according to the specification:

Updating relationships between instances and key/value pairs

Finally, the KeyValuePair / Value relationships are updated, merging the Instance by id:

Updating relationships between key/value pairs and values

In the ModelService, the createOrUpdateDBModels method simply calls the Cypher query described above and returns the update information from the server (the number of nodes, labels, and relationships that were created or deleted):

createOrUpdateDBModels method

Query in detail: Removing orphan nodes

The previous query might have removed some relationships, and the database might contain nodes that no longer have any meaning. This query ensures that every unwanted object is removed from the database.

The query removes:

  • The KeyValuePair nodes that are not associated with any instance
  • The Key nodes that are not referenced
  • The Type nodes that are not referenced
  • The Instance nodes that are not referenced

Here’s the code:

Removing orphan nodes

In the ModelService, the clearUnusedObjects method simply calls the Cypher query described above and returns the counters:

clearUnusedObjects method

Editing and Versioning considerations

You probably noticed that there is a ‘Type’ node that was never used during the experiments.

The type will be important for handling generic editors, which are not covered in this document.

We can easily imagine adding editing information to types or keys, such as editors or validation rules (a title should contain at most 100 characters, etc.). This should allow us to specify the editing rules and components directly in the database, giving some flexibility with editing requirements.

Another concept that might be interesting is Version. Adding the label ‘Version’ with the relationships [:FROM_VERSION] and [:TO_VERSION] would allow us to track the different versions of our models in the database. A client query could then retrieve a specific data structure for a specific version. Developers would also be able to track the history of the application, which can be very interesting.

Conclusion

During this “little” experiment we discovered that a graph database can store primitives as nodes in order to create physical relationships between different objects and concepts without having to anticipate them. The system should generate many subsets of objects linked by value, which should allow queries to avoid scanning the entire database.

The Model, Key, KeyValuePair, Instance approach also makes it possible to dynamically change the way these primitives are returned without having to change the query.

This will of course have a cost in size, and the system will need improvements for production projects, but I’m really excited by the opportunities that graph databases can offer to application development.

As I’m new to graph databases, I’m not sure whether this approach is a good one, so I’ll be very happy to receive any feedback from the graph database developer community.
