Data Vault & Knowledge Graphs, a Love Story…
Pardon the expressive title. The intent of this article is to marry the two concepts (data vault and knowledge graphs) along their similarities and disparities, and to present the motivations for using them alongside one another.
To begin with, we will define each concept and its motivations. Next, we will explore the underlying technology where each concept performs at its best. Then we will look at the similarities and disparities between the two: how data vault can influence and enable knowledge graphs, and how knowledge graphs can influence and enable data vault. Finally, we will discuss how and where the two concepts should be used in tandem.
First up… some definitions!
- Data vault modelling maps business entities and their relationships to other business entities, and historically tracks the information state of those entities and relationships. It is designed to be flexible and non-destructive to change, and it acts as the integration layer between an enterprise’s software data landscape and analytical value. A data vault model integrates by business key with a view to supporting the business architecture.
- A knowledge graph is a structured representation of entities, their relationships and their properties, with the goal of studying the relationships between business entities. A knowledge graph is based on the enterprise ontology, where relationships can be both observed and inferred.
There is a large amount of overlap between the two techniques, and structurally they look almost synonymous. Let’s take a closer look at the elements that make up a data vault and a knowledge graph.
As you can see, structurally data vault is based on categorising the knowledge domain according to business architecture, whereas knowledge graphs take instances of the knowledge domain to draw relationships between instances of the elements of business architecture.
An example
Let’s borrow an example from Juan Sequeda’s book.
Represented here as a knowledge graph are customers and orders, with the relationships between the nodes represented as edges. To be precise, we see:
- Two nodes of the business object type ‘customer’
- Two nodes of the business object type ‘order’
- Two edges of the relationship type ‘placedby’
- One edge of the relationship type ‘knows’
The knowledge graph does not infer nodes, but it can infer relationships. The knowledge graph would have observed existing real-world nodes for customers and orders, and observed that the customers made independent orders; from that it could have inferred that one customer knows the other. Where did we get this information from? How do we know they know each other? Was this sourced from a social media app, is it a fact recorded in a transaction, or is it based on some householding rule?
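The graph described above can be sketched in a few lines of Python. This is only an illustration: the keys (C1, C2, O1, O2), the `inferred` flag and the `provenance` values are hypothetical and do not come from the book. The point is that an inferred edge should carry a record of where it came from, which is exactly the question raised above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    key: str        # business key (hypothetical values below)
    node_type: str  # business object type


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    edge_type: str
    inferred: bool = False       # observed edges default to False
    provenance: str = "unknown"  # where the relationship came from


# Two customers and two orders: nodes are observed, never inferred.
nodes = [
    Node("C1", "customer"), Node("C2", "customer"),
    Node("O1", "order"), Node("O2", "order"),
]

# Two observed 'placedby' edges and one inferred 'knows' edge.
edges = [
    Edge("O1", "C1", "placedby", provenance="order system (assumed)"),
    Edge("O2", "C2", "placedby", provenance="order system (assumed)"),
    Edge("C1", "C2", "knows", inferred=True,
         provenance="householding rule (assumed)"),
]


def count_by(items, attr):
    """Count items grouped by the value of one attribute."""
    counts = {}
    for item in items:
        value = getattr(item, attr)
        counts[value] = counts.get(value, 0) + 1
    return counts


node_counts = count_by(nodes, "node_type")
edge_counts = count_by(edges, "edge_type")
inferred_edges = [e for e in edges if e.inferred]
```

Filtering on `inferred` separates what was observed from what was deduced, and `provenance` answers the "how do we know?" question for each edge.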
Where a Data Vault fits in
If we were to deduce the data vault model on which the knowledge graph is based, it could look like this.
For the purposes of demonstrating where data vault shows its importance, let’s extend the model problem space with the operational reality we face in the information technology world:
- Some source applications will use their own durable business key to represent a business object uniquely. In an ideal world the same business key representing the same business object would be shared passively across applications; however, this is not always the case.
- When integrating by business key, resolving the unique business key for a business entity is modelled into a data vault using a same-as link table. Every integration of a business key adds a row to the same-as link table, so the same-as link table is unbounded as a solution: your analytical query can go in with any source business key and return with the single business key for your business object. Your business, your customers and your partners will not want to work with multiple keys for the same business object. It is up to the data vault modeller to design the integration, and up to the business to decide which key is the correct one (a business rule).
A node in the knowledge graph must be based on the same business key. The knowledge graph does not resolve source-system complexities; the data vault does. Let’s extend the data vault model with a same-as link that resolves the source application complexity. This relationship should not propagate to the knowledge graph.
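To make the same-as resolution concrete, here is a minimal sketch. It assumes the same-as link can be simplified to rows of (source system, source business key, master business key), that source keys do not collide across systems, and that an unknown key is returned unchanged. The system names and key values are hypothetical, and a real same-as link would normally relate hub keys rather than raw business keys.

```python
# Hypothetical same-as link rows:
# (source system, source business key, master business key)
same_as_link = [
    ("crm", "CRM-001", "CUST-1"),
    ("billing", "B-9001", "CUST-1"),
    ("crm", "CRM-002", "CUST-2"),
]


def build_lookup(rows):
    """Map every known key (source or master) to its master key."""
    lookup = {}
    for _source, source_key, master_key in rows:
        lookup[source_key] = master_key
        # The master key resolves to itself.
        lookup.setdefault(master_key, master_key)
    return lookup


def resolve(lookup, business_key):
    """Go in with any key, come back with the single master key.

    Unknown keys are returned unchanged (an assumption of this sketch).
    """
    return lookup.get(business_key, business_key)


lookup = build_lookup(same_as_link)
crm_resolved = resolve(lookup, "CRM-001")
billing_resolved = resolve(lookup, "B-9001")
```

Both source keys resolve to the same master key, so the knowledge graph only ever needs one node for that customer.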
Where a Knowledge Graph fits in
Earlier we showed that the knowledge graph deduced the edge called ‘knows’. This knowledge can be inferred through a mix of business rules that return, with a high degree of confidence, the relationship between customers. Will this be useful analysis for the wider analytics community in your enterprise? It can be, so let’s make it available back in the data vault by extending the data vault model with a business vault link table. The unit of work that will be useful includes the congregation of orders as well; therefore, we include the orders relationship with the inference we received from the knowledge graph model. This is also one of the rare areas in a data vault where a dependent-child key in the link table is useful.
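One possible shape for such a business vault link row is sketched below. This is one reading of the paragraph above, not a definitive design: it assumes the order identifier is carried as the dependent-child key, that hash keys are SHA-256 over upper-cased business keys, and that the dependent-child key takes part in the link hash. All column names, the rule name and the key values are hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def hash_key(*parts):
    """Hash one or more business keys into a surrogate key (assumed convention)."""
    joined = "||".join(p.strip().upper() for p in parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def knows_link_rows(customer_a, customer_b, order_ids, rule_name, load_ts):
    """Build one link row per order that supports the inferred 'knows' edge."""
    rows = []
    for order_id in sorted(order_ids):
        rows.append({
            "link_hash_key": hash_key(customer_a, customer_b, order_id),
            "customer_hash_key": hash_key(customer_a),
            "known_customer_hash_key": hash_key(customer_b),
            "order_id": order_id,       # dependent-child key (assumed)
            "record_source": rule_name,  # the rule that produced the inference
            "load_ts": load_ts,
        })
    return rows


load_ts = datetime(2024, 1, 1, tzinfo=timezone.utc)  # hypothetical load time
rows = knows_link_rows("C1", "C2", ["O2", "O1"], "householding_rule", load_ts)
```

Recording the rule name as the record source keeps the inference traceable to the business rule that produced it.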
Remember, we have always promoted separating business rule implementation from the storage of business outcomes. All your software applications are automation engines for your business rules, which you have either bought off the shelf or built on your own. Raw vault loads those business rule outcomes, and business vault, being based on raw vault hub, link and satellite tables, records the inference from the knowledge graph relationships. This implementation makes the data vault infinitely scalable and infinitely extensible.
Wrap it up
Makes sense, but what else can we do with the relationship between data vaults and knowledge graphs? Earlier we used the knowledge graph model as an instance of business entities and relationships. By using a data vault as a base, you could also instantiate these elements across time; after all, data vault is an enterprise’s corporate memory.
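Instantiating the graph across time could look like the following sketch. It assumes each relationship is stored with a start date and an optional end date, roughly what an effectivity record would give you. The dates and keys are hypothetical.

```python
from datetime import date

# Hypothetical relationship history:
# (source, relationship, target, start date, end date or None if still open)
history = [
    ("O1", "placedby", "C1", date(2023, 1, 5), None),
    ("O2", "placedby", "C2", date(2023, 2, 10), None),
    ("C1", "knows", "C2", date(2023, 3, 1), None),
]


def graph_as_of(rows, as_of):
    """Return the edges that were in effect on the given date."""
    return [
        (src, rel, tgt)
        for src, rel, tgt, start, end in rows
        if start <= as_of and (end is None or as_of < end)
    ]


january_graph = graph_as_of(history, date(2023, 1, 31))
year_end_graph = graph_as_of(history, date(2023, 12, 31))
```

The January snapshot holds only the first order, while the year-end snapshot holds all three edges, so the same history can rebuild the graph as it stood on any date.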
Can either discipline exist without the other?
Yes, of course. The point of this article is not only to show the similarities between the disciplines but also to highlight that much of the work behind each can be centralised, because essentially they work on the same three things every business wants accurately represented in its knowledge domain:
- Uniquely identified business objects.
- Recorded interactions, transactions and events between business objects.
- Descriptive detail behind these interactions and business objects.
As Juan Sequeda put it bluntly, “do you want to do it right or do you want to do it twice?”
References
- Juan Sequeda and Ora Lassila, Designing and Building Enterprise Knowledge Graphs, https://www.amazon.com.au/Designing-Building-Enterprise-Knowledge-Graphs/dp/1636391745
The views expressed in this article are my own. You should test implementation performance before committing to this implementation. The author provides no guarantees in this regard.