A graph database allows a more natural modelling of data. Its data structure provides the highest level of expressivity and is thus perfect for capturing highly interconnected data in complex domains. It also enables the natural querying of data with high degrees of separation. In the current age of interconnected data, graph databases and graph query languages are pivotal to the evolution of databases.
In recent years, we’ve seen a growing number of graph databases. Some of the key technologies in the industry are Neo4j, TitanDB, Blazegraph, HyperGraphDB, OrientDB and IBM Graph, and the list goes on. Recently, DataStax, a prominent technology provider in the NoSQL space, launched its DataStax Enterprise Graph database. The development of graph databases in the commercial industry brings a lot of confidence to the future of graph computing and how the field will help revolutionise the database industry.
However, neither graph computing as a field nor graph databases as a technology is new; both have been around for at least 30 years. So why is it that they are still not the most adopted database?
We believe the lag in uptake hinges on the fact that developing on a graph database requires you to understand too many low-level implementation details of graph computing. As a graph database user, you need to overcome plenty of prerequisite challenges and go through a steep learning curve before you can use the technology optimally.
In this article, we’ll talk about the top three challenges of working with current graph databases and how we can overcome them.
1. Modelling Highly Interconnected Data
Due to the high level of expressivity that graph databases provide, modelling a domain on a graph is not easy, and is equivalent to modelling knowledge, i.e. ontology engineering.
Specialist graph data engineers are needed to model a graph structure. This approach, however, is not scalable for widespread adoption. Instead, what is needed is a system which would allow any engineer to easily model their domain on a graph, without having to be proficient in ontology engineering or be an expert in the underlying graph data structure.
2. Maintaining Consistency of Data
The job is not done even when an ontology is derived to model the domain and govern the structure of the graph database. It is essential that data loaded into the graph database complies with the ontology. The data model defined for the domain, as described above, does not act as a “schema” to which the graph database adheres. Graph databases, like other NoSQL databases, delegate adherence to a schema to the application system.
For example, if you have modelled:
- a “Company” that can be related to a “Person” through an “Employment” relationship
- a “Person” that can be related to a “Cat” through a “Has-Pet” relationship
You really don’t want the graph database to store data saying that a cat named “Kitty” is “employed” by the company “Apple Inc.”.
Although developing such a system for one particular domain is feasible, it takes significant effort. It is yet more challenging to deliver a system that is generic enough to guarantee consistency of data with regard to the model while maintaining the highest level of expressivity possible.
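To make the example concrete, here is a minimal sketch in Python of the kind of domain-specific check an application would have to implement itself. It is not taken from any particular graph database: the names SCHEMA, Node, CheckedGraph and SchemaViolation, as well as the person “Alice”, are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass

# Hypothetical schema for the example domain: each relationship type maps to
# the (source type, target type) pair it is allowed to connect.
SCHEMA = {
    "Employment": ("Company", "Person"),
    "Has-Pet": ("Person", "Cat"),
}


@dataclass(frozen=True)
class Node:
    type: str
    name: str


class SchemaViolation(Exception):
    """Raised when an edge does not comply with the schema."""


class CheckedGraph:
    """A toy in-memory graph that validates every edge before storing it."""

    def __init__(self, schema):
        self.schema = schema
        self.edges = []

    def add_edge(self, source, relation, target):
        if relation not in self.schema:
            raise SchemaViolation(f"unknown relationship: {relation}")
        expected = self.schema[relation]
        actual = (source.type, target.type)
        if actual != expected:
            raise SchemaViolation(
                f"{relation} must connect {expected}, got {actual}"
            )
        self.edges.append((source, relation, target))


def try_add(graph, source, relation, target):
    """Return True if the edge was stored, False if the schema rejected it."""
    try:
        graph.add_edge(source, relation, target)
        return True
    except SchemaViolation:
        return False


apple = Node("Company", "Apple Inc.")
alice = Node("Person", "Alice")  # hypothetical employee, not in the article
kitty = Node("Cat", "Kitty")

graph = CheckedGraph(SCHEMA)
accepted = [
    try_add(graph, apple, "Employment", alice),  # valid
    try_add(graph, alice, "Has-Pet", kitty),     # valid
    try_add(graph, apple, "Employment", kitty),  # a cat cannot be employed
]
```

Even this small check is tied to one hand-written schema and handles only binary relationships between typed nodes. Generalising it without giving up expressivity is the harder part.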
Many would argue that there is no such thing as data without a schema, at least if you want to extract significant value from it. A schema is either explicitly defined (such as with relational databases), or it is implicit at the database user level. Given the degree of complexity of highly interconnected data captured by graphs, the lack of data consistency becomes another hurdle in getting the world to adopt graph databases confidently.
3. Writing the Graph Queries Themselves
Let’s say that, at this point, you have managed to model a domain, as well as develop a system that will govern the consistency of data. The next step is to develop graph queries that will interrogate the graph database — this task also has its challenges. When developing graph queries, although you are provided with the power of querying high degrees of separation between data, you need to be explicit in defining the path to traverse between data instances. Given that your data model governs the paths between your data instances, you now have to design your queries specific to the way that you defined your model.
What makes this sort of querying challenging is that you may not have modelled your data in the most generic, consistent and conceptually correct way (e.g. sometimes you defined a relationship as a node, other times as an edge). Consequently, the graph queries that you write are also not generic, and thus not reusable between problems, let alone across domains. Every question that you would like to ask the graph requires a custom graph query, written based on your custom domain model, which may or may not provide the most optimal path for querying your data. Therefore, your graph data engineer is not able to abstract the graph query into functions that would take your user’s input as an argument, and reuse those functions across multiple problem use cases.
Writing graph queries is already a challenging task, as you need to understand graph algorithms and data structures. With the additional challenge of not being able to abstract and reuse your graph queries to the point where you can just focus on your problem domain, adoption of graph databases becomes much slower than it should be.
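The sketch below illustrates the point with plain Python rather than a real graph query language. It stores the same facts twice, once with employment as an edge and once with employment as its own node, and shows that the question “which pets belong to employees of this company?” needs a different traversal in each case. The edge labels, the helper functions and the person “Alice” are all assumptions made up for the illustration.

```python
# Model A: employment is a single edge from company to person.
edges_a = [
    ("Apple Inc.", "employs", "Alice"),
    ("Alice", "has-pet", "Kitty"),
]

# Model B: employment is its own node, linked to the company and the person.
edges_b = [
    ("employment-1", "employer", "Apple Inc."),
    ("employment-1", "employee", "Alice"),
    ("Alice", "has-pet", "Kitty"),
]


def targets(edges, source, label):
    """Follow edges with the given label outward from source."""
    return [t for s, rel, t in edges if s == source and rel == label]


def sources(edges, label, target):
    """Follow edges with the given label backward from target."""
    return [s for s, rel, t in edges if rel == label and t == target]


def pets_of_employees_a(edges, company):
    """Traversal for model A: company -> person -> pet."""
    pets = []
    for person in targets(edges, company, "employs"):
        pets.extend(targets(edges, person, "has-pet"))
    return pets


def pets_of_employees_b(edges, company):
    """Traversal for model B: company <- employment -> person -> pet."""
    pets = []
    for employment in sources(edges, "employer", company):
        for person in targets(edges, employment, "employee"):
            pets.extend(targets(edges, person, "has-pet"))
    return pets


pets_a = pets_of_employees_a(edges_a, "Apple Inc.")
pets_b = pets_of_employees_b(edges_b, "Apple Inc.")

# The query written for model A silently finds nothing on model B.
mismatch = pets_of_employees_a(edges_b, "Apple Inc.")
```

Both traversals answer the same question, but neither can be reused on the other model. The mismatched call does not fail loudly either; it simply returns an empty result.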
How can we overcome these challenges?
Technology is invented for the sole purpose of exploiting its strengths to help solve difficult problems in this world. The shape or form it is delivered in is irrelevant; thus, it should not get in the way of its potential purpose.
As we have described, adopting a graph database requires you to understand the notion of vertices and edges, how they are used to capture data, and the details of the graph algorithms needed to interact with them efficiently. This is a huge burden that hinders uptake.
To solve this challenge, we have abstracted away the nitty-gritty of a graph database by developing a database system oriented around Knowledge Representation and Reasoning, transforming the graph database into a knowledge graph.
A knowledge graph is a knowledge base in the form of a graph. The advantage of using a knowledge graph for data storage and analysis is that it lets you focus on modelling your domain expressively and writing queries without having to learn graph computing or graph algorithms, while trusting that the database will guarantee data consistency with respect to your model.
A knowledge graph abstracts the low-level implementation details of graph computing and lets you adopt the technology without a steep learning curve.
So the next questions are:
1. How exactly does a knowledge graph alleviate the pain points of working with a graph database?
2. What are the advantages of a knowledge graph other than better delivering the strengths of graph databases to the world?
We’ll talk about these questions in detail in separate articles. Stay tuned.