Graph value based modeling

Discover the benefits of making primitives the core of a graph database

Mathias Tiberghien
Sep 29, 2020 · 9 min read

Introduction

This summer, I was on holiday chatting with a new neighbor. He asked me about my job and, as usual when I said that I work in computing, he politely asked what I do while looking worried that I might really answer the question. To keep him entertained, I focused on his interests (the guy loves fishing) and suggested that when he goes fishing, he probably tries to understand which settings lead to a better catch. He suddenly became very interested and told me that after 30 years of fishing, he knows how to find the fish but still cannot be sure whether they will bite. For example, he has noticed that for 2 or 3 days after a storm the fish won’t bite, but that on some days, at a certain hour that is not always the same, the fish are hungry and bite within a second.

I could then answer that, as a developer, I could help him by creating an app in which he describes a fishing experience, with its satisfaction level and the different settings that he thinks can influence the quality of the experience. The app would then order the parameters by importance to show him which settings deserve attention. He asked to be able to modify the list of parameters, and for a very simple user interface, because he is not into computers at all, nor into data science.

This seemed to be an excellent use case for my Graph modeling experiments. This document presents a POC of such an app, relying on a Neo4j Graph database and using primitive values as the center of the modeling.

The application is divided into 3 features:

  • Define the models
  • Store the data
  • Define goals and watch analytics
App — Top Bar

Defining the models

The Models tab should help the user create, edit, or delete models. A model is composed of parameter keys, each having a type and a default value. The type defines the field’s user interface (Yes/No buttons, rating stars, list, text, number, or date).
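As a rough illustration of that description, a model can be pictured as a named list of typed keys with defaults. This is only a sketch in Python, not the app's actual Angular code: the class names `KeyType`, `Key` and `Model`, the `new_record` helper, and the types and default values given to the keys are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class KeyType(Enum):
    # One member per field type listed above; the member names are illustrative.
    YES_NO = "yes_no"
    RATING = "rating"
    LIST = "list"
    TEXT = "text"
    NUMBER = "number"
    DATE = "date"


@dataclass
class Key:
    name: str
    type: KeyType        # drives which input widget the UI would show
    default: Any = None  # value pre-filled when a record is created


@dataclass
class Model:
    name: str
    keys: list[Key] = field(default_factory=list)

    def new_record(self) -> dict[str, Any]:
        """Return a record pre-filled with each key's default value."""
        return {key.name: key.default for key in self.keys}


# Hypothetical 'Fishing' model; the types and defaults are assumed for the example.
fishing = Model(
    name="Fishing",
    keys=[
        Key("Satisfied", KeyType.YES_NO, default="No"),
        Key("Rating", KeyType.RATING, default=3),
        Key("Wind Strength", KeyType.NUMBER, default=0),
    ],
)
```

Calling `fishing.new_record()` gives a dictionary with one entry per key, which the user would then edit before saving.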

App — Creating a model
App — Selecting a model
App — Updating model

Updating History

The History tab should allow the user to add, edit, or delete a record of a specific model.

App — Creating a record — Editing values
App — Creating a record — Reviewing values
App — Reviewing History

Defining Goals and Reviewing analytics

The Goals tab should allow the user to define goals using some parameters and to watch analytics for the other parameters. The “analytics” are very simple here. For each parameter, the tool counts its values across all the records, and again across the records matching the goals. It then orders the parameters using the ratio between the two totals. This means that if a parameter’s value doesn’t change for a specific goal, it will be at the top of the list and should be considered a parameter that might influence the goal. Conversely, a parameter that takes all its possible values when matching a goal will be at the bottom of the list. The view should also show a sparkline for each parameter, showing whether the parameter seems to be cyclic, random, or stable.
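One possible reading of that ranking, sketched in Python: for each parameter, divide the number of distinct values seen in the records matching the goal by the number of distinct values seen in all records, and sort from the lowest ratio to the highest. The function name `rank_parameters`, the record layout (plain dictionaries) and the ‘Hour’ key are assumptions for the example; the app may compute its ratio differently.

```python
def rank_parameters(records, goal):
    """Order parameter keys by how little they vary when the goal is met.

    records: list of dicts mapping key name -> value
    goal: dict mapping key name -> required value (a simple filter)
    Returns a list of (key, ratio) pairs, lowest ratio first.
    """
    matching = [r for r in records if all(r.get(k) == v for k, v in goal.items())]
    if not matching:
        return []  # nothing matches the goal, so there is nothing to rank

    ranking = []
    # Only rank the parameters that are not part of the goal itself.
    keys = sorted({k for r in records for k in r} - set(goal))
    for key in keys:
        all_values = {r[key] for r in records if key in r}
        goal_values = {r[key] for r in matching if key in r}
        ranking.append((key, len(goal_values) / len(all_values)))
    return sorted(ranking, key=lambda pair: pair[1])


# Hypothetical records; 'Hour' is an invented key for the example.
records = [
    {"Satisfied": "Yes", "Wind Strength": 1, "Hour": 6},
    {"Satisfied": "Yes", "Wind Strength": 1, "Hour": 18},
    {"Satisfied": "No", "Wind Strength": 3, "Hour": 6},
    {"Satisfied": "No", "Wind Strength": 2, "Hour": 18},
]
ranking = rank_parameters(records, {"Satisfied": "Yes"})
```

In this sample, ‘Wind Strength’ keeps a single value whenever the goal is met and therefore ranks first, while ‘Hour’ takes all of its values and ranks last.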

App-Reviewing goals and analytics

The app is really simple and has only a few features, but it only required a couple of days of work, using the Angular “Getting started” app as a base and Neo4j as storage. It also helped my neighbor understand, in an amusing way, how computing can help people (he’s now very interested in having the app when he goes fishing). The app has another benefit: it’s called ‘Fisherman’s app’, but there are a lot of use cases where you might be interested in determining causalities, so it can be renamed ‘Causality finder’ without changing any code other than the title.

For us developers, the interesting part is the database structure, and how promising value based modeling seems to be with a Graph database.

Exploring the database

Let’s first have a look at the schema of our database:

Database — Reviewing schema

First we have the User node, because we anticipate that many different users might interact with the application in the future. Users can create Models, Records and Goals.

  • A Model is a collection of Keys, each Key having a Type (used for the user interface).
  • A Record is a collection of KeyValuePairs (a KeyValuePair being a hyperedge linking Keys and Primitives to a Record) and is also an instance of a specific Model.
  • A Goal is related to a specific Model and a specific KeyValuePair: it acts as a simple filter on records.

Exploring a Model

Let’s have a closer look at the model structure using the following query:

MATCH (m:Model{name:'Fishing'})-[:HAS_KEY]->(k:Key)-[:HAS_TYPE]->(t:Type)
RETURN *
Database — Reviewing Model ‘Fishing’

The Model object has a name property and is defined by its Keys. The HAS_KEY relationship contains the index of the key for editing purposes. Each Key is linked to a Type through the HAS_TYPE relationship, which the application uses to select the user interface when creating records. The names, types and number of keys for a model are defined by the user.

Exploring a Record

Let’s now have a look at a Record of the ‘Fishing’ model with the query:

MATCH (r:Record)-[:IS_INSTANCE_OF]->(m:Model{name:'Fishing'})
WITH m, r LIMIT 1
MATCH (r)-[:HAS_KEYVALUEPAIR]->(kvp:KeyValuePair)
MATCH (k:Key)<-[:HAS_KEY]-(kvp)-[:HAS_VALUE]->(v:Primitive)
RETURN *

And the following result:

Database — Reviewing a Record of Model ‘Fishing’

A Record is related to a Model through the IS_INSTANCE_OF relationship. The values of a record are not stored as properties but as a collection of KeyValuePairs, using the HAS_KEYVALUEPAIR relationship. Since a relationship can only bind one node to another, the KeyValuePair node is the link between a specific Record, a specific Key and a specific Primitive. This system is the reason for the title of the document: value based modeling. Because the primitives are nodes, and the value of each primitive is the identifier of its node, they gather the records sharing the same values and create a lot of subsets of records. This has an interesting consequence: we won’t have to iterate through the entire database when trying to find records having specific values, or keys or models sharing the same values.
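To make the idea concrete outside Neo4j, here is a small in-memory sketch in Python. It is not the app's code: the `ValueGraph` class, its attribute names and the sample record ids are hypothetical, and the two dictionaries only imitate the Primitive and KeyValuePair nodes. The point it illustrates is that a lookup starts from the value and never scans the list of records.

```python
from collections import defaultdict


class ValueGraph:
    """In-memory stand-in for the Primitive / KeyValuePair / Record nodes."""

    def __init__(self):
        # Primitive value -> keys that have been paired with it.
        self.keys_by_value = defaultdict(set)
        # (key, value) pair -> ids of the records attached to that pair.
        self.records_by_pair = defaultdict(set)

    def add_record(self, record_id, values):
        """Attach a record to the (key, value) pair of each of its values."""
        for key, value in values.items():
            self.keys_by_value[value].add(key)
            self.records_by_pair[(key, value)].add(record_id)

    def records_with_value(self, value):
        """Return {key: record ids} for every key paired with `value`."""
        return {
            key: set(self.records_by_pair[(key, value)])
            for key in self.keys_by_value.get(value, ())
        }


# Hypothetical sample data, loosely inspired by the two models of this article.
graph = ValueGraph()
graph.add_record("fishing-1", {"Satisfied": "Yes", "Rating": 3})
graph.add_record("fishing-2", {"Satisfied": "No", "Rating": 3})
graph.add_record("mood-1", {"Satisfied": "Yes", "Score": 3})
```

With this data, `graph.records_with_value(3)` groups the records under both ‘Rating’ and ‘Score’, the kind of cross-model subset that the primitive nodes provide in the database.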

Adding a Model to the database

Let’s use our sample app to add a new model to the database. The tool allows reusing existing keys from other models.

App — Creating new model

We will also add some records for that new model:

App — Adding records to history

We can now return to the database and explore the benefits of that value based modeling.

Exploring values

Let’s build different queries on the values. Let’s start from the value angle with the question: I want to see the records having a value equal to ‘Yes’.

MATCH (p:Primitive{id:'Yes'})<-[:HAS_VALUE]-(kvp:KeyValuePair)<-[:HAS_KEYVALUEPAIR]-(r:Record)-[:IS_INSTANCE_OF]->(m:Model)
MATCH (kvp)-[:HAS_KEY]->(k:Key)
RETURN *

We will have the following nodes:

Database — Querying Primitive ‘Yes’

Both models define the same key ‘Satisfied’, and when the value is ‘Yes’ we have 4 records for the Model ‘Fishing’ and 2 records for the Model ‘State of mind’. The KeyValuePair node linking the Key ‘Satisfied’ to the Primitive ‘Yes’ has a natural subset of Records matching the condition → We don’t have to search for the records anymore: they are attached to the KeyValuePair node.

If we focus on the Primitive with the value 3, using almost the same query (we just need to change the id of the primitive):

Database — Querying Primitive 3

The ‘Fishing’ model has a ‘Rating’ key with 2 records having 3 as value, and the ‘State of mind’ model has a ‘Score’ key with one record having 3 as value. With our value based modeling, we now have physical subsets of concepts sharing the same value. This can be very interesting when trying to find correlations by value between different models.

Starting now from a Key perspective: we might want to know all the different values for a specific key. Here’s the query for the key ‘Rating’:

MATCH (k:Key{name:'Rating'})<-[:HAS_KEY]-(kvp:KeyValuePair)-[:HAS_VALUE]->(v:Primitive)
RETURN *

With the result:

Database — Querying Key ‘Rating’

Once again, the list of all the different values for a Key exists physically, no matter the number of records. There is no need to iterate through the entire list of records to find those values. Even better, if a value is not attached to a key, it means that there are no records having that value → We don’t need to explore records (whose number will increase indefinitely), but can focus on values, which are often a limited set (‘Rating’ will only have values from 1 to 5, for example).

The instances of a specific Model are also retrieved using a physical relationship:

Database — Querying records of Model ‘Fishing’

And if we need to get all the key/value pairs for each record:

MATCH (kvp)<-[:HAS_KEYVALUEPAIR]-(r:Record)-[:IS_INSTANCE_OF]->(m:Model{name:'Fishing'})
MATCH (v:Primitive)<-[:HAS_VALUE]-(kvp)-[:HAS_KEY]->(k:Key)<-[rk:HAS_KEY]-(m)
WITH r, k, v, rk ORDER BY r.created, rk.index
RETURN r, collect({key:k.name, value:v.id}) as keyValuePairs

With the result:

Database — Querying records key/value pairs of Model ‘Fishing’

We can see that even if the query is a little more verbose, it is still easy to get all the values from a record.

Let’s finish our exploration with something more complex: querying all the different values for the key ‘Wind Strength’ among the records matching the condition ‘Rating’ = 3. Here’s the query:

MATCH (k:Key{name:'Rating'})<-[:HAS_KEY]-(kvp:KeyValuePair)-[:HAS_VALUE]->(v:Primitive{id:3})
MATCH (m:Model{name:'Fishing'})<-[:IS_INSTANCE_OF]-(r:Record)-[:HAS_KEYVALUEPAIR]->(kvp)
WITH m, r, k as k1, kvp as kvp1, v as v1
MATCH (k:Key{name:'Wind Strength'})<-[:HAS_KEY]-(kvp:KeyValuePair)-[:HAS_VALUE]->(v:Primitive) WHERE (r)-[:HAS_KEYVALUEPAIR]->(kvp)
RETURN *

And the result:

Database — Querying values of key ‘Wind Strength’ when ‘Rating’ is 3

Once again, we can see that all the subsets of data already exist in the database.
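The same question can be pictured as a set intersection. The sketch below is in Python rather than Cypher and rests on assumed data: the function name `values_when`, the record ids and the dictionary of (key, value) pairs are all invented for the example. Each pair plays the role of a KeyValuePair node with its attached records.

```python
def values_when(records_by_pair, target_key, condition_key, condition_value):
    """Values of `target_key` among records where `condition_key` equals `condition_value`."""
    # Records attached to the condition pair, e.g. ('Rating', 3).
    matching = records_by_pair.get((condition_key, condition_value), set())
    # Keep each value of the target key that shares at least one record with them.
    return {
        value
        for (key, value), record_ids in records_by_pair.items()
        if key == target_key and record_ids & matching
    }


# Hypothetical data: (key, value) pair -> ids of the records attached to it.
records_by_pair = {
    ("Rating", 1): {"r4"},
    ("Rating", 3): {"r1", "r2"},
    ("Rating", 5): {"r3"},
    ("Wind Strength", 1): {"r1", "r3"},
    ("Wind Strength", 2): {"r2"},
    ("Wind Strength", 4): {"r4"},
}
wind_when_rating_3 = values_when(records_by_pair, "Wind Strength", "Rating", 3)
```

Note that the loop runs over the pairs, not over the records, so its cost follows the number of distinct values rather than the size of the history.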

Conclusion

Leveraging a Graph database to remove the properties of objects and store them as nodes seems to open new opportunities for analytics. This way of modeling appears to generate subsets of data that allow us to question the data without being impacted by the number of records. This can be very useful when you don’t know how the user will want to use the data he is storing. The primitive identifier property should probably be indexed to support a huge number of primitives, but the system should suit industry well, where the sets of values are finite (temperatures, engine speeds and container measurements have limited scales) while the number of records increases indefinitely over time. The next step should be to test the model at a larger scale and compare its performance with a property based model (optimized with indexing).

SMARTER FACTORY

Thoughts about Data visualisation, Graph databases, Dashboarding, Industrial automation

Smarter factories are the new ways of creating new forms of efficiency and flexibility by connecting different processes, information streams and stakeholders in a streamlined manner

Written by Mathias Tiberghien. Currently studying Big Data at https://www.univ-paris8.fr/
