In-database Graph analytics — Executing SQL & Gremlin queries on the same copy of data

Shruthi Machimada
4 min readJun 7, 2020

--

As companies grow, the volume of data generated and used to run critical operations also grows. Users have realized that this vast network of connected data can do more than what they were initially designed to do, leading to the rapid adoption of Graph databases, since they’re built around relationships and represent data in a way that is more intuitive to read and gain insights.

Image by Gerd Altmann from Pixabay

There are a number of graph databases that handle graph only use cases well, however enterprises require analytic systems that can perform data transformations, aggregations and other operations in addition to graph analytics. While graph databases perform exceptionally well for certain types of analyses, they are not suitable for workloads that involve aggregation over a large amount of data, they are not the best at transactional processing and don’t scale well. Most existing graph databases are stand-alone and cannot easily integrate with these other analytics systems.

Relational databases may be rigid, but they are resilient, scalable and performant and have been used for almost 50 years now across various industries, with ~80% of enterprises today using a relational database. Enterprises run mission critical applications that have been running on top of relational databases, and introducing another database for graph-only uses would require them to interrupt these applications and import their data to the graph database, adding to an already complex data and infrastructure pipeline. The time required to export and load data between different systems and data sources could be spent analyzing data to gain insights and make critical decisions.

What if there was a solution that combined the flexibility and intuitiveness of graph data models with the scalability and transactional performance of relational models - while keeping existing applications uninterrupted?

Db2 Graph

Db2 Graph will enable Graph analytics on top of Db2, allowing you to perform Graph analytics and SQL (for transactional processing, transformations, BI and other analytics use cases) on the same copy of data, without data duplication or requiring any changes to the underlying database structure.

Db2 Graph will create a virtual graph view of the underlying data using the existing relationships already defined in Db2. Alternatively, you can create your own graph model, by defining how the tables and views already defined in Db2, map into nodes and edges in your graph.

Db2 Graph then exposes the graph model so you can execute Gremlin queries. Db2 Graph fetches only the necessary data from Db2 at the time of query execution, so any updates made to data in Db2 will be reflected.

The benefits

  • Save costs

With Graph database capabilities on Db2, you no longer have to buy or set up another data store for Graph use cases. Db2 Graph will save licensing, maintenance costs and complexity associated with introducing another data store in your IA.

Db2’s powerful compression capabilities, combined with the simple nature of representing relational data (compared to ~7x more disk space required to represent in graph format in some of the standalone graph databases) , will save storage costs.

  • No data transfer overhead, Ensure data consistency and security

Since the graph queries are running on the data stored in Db2, updates made to the data in Db2 is immediately reflected in Db2 Graph, enabling real time graph analytics on transactional data.

Since the data does not need to be moved or transformed from Db2, you can leave existing applications that connect to Db2 and practices for data security and auditing untouched

  • Multiple graph models without the need to load data multiple times

Since the graph model is defined by the mapping of relational tables and there is no other details being captured by Db2 Graph, you can define as many graph models as you need, and simply open the graph that you want to work with at any point

How to use Graph with Db2

Db2 Graph will be available as a stand-alone container on IBM Cloud Registry. You can deploy the container and connect to a Db2 database using a JDBC connection

  1. Start the Db2 Graph container
  2. Use a JDBC connection to connect to your Db2 database from the Db2 Graph container
  3. Define the graph model
  4. Db2 Graph will auto-generate a graph model, using the relationships defined in your relational model

OR

You can create a graph schema by defining the mapping between relational tables/views and nodes and edges in your graph model

5. Query your graph data using Gremlin and continue to run SQL queries on your relational tables

Summary

Db2’s multi modal architecture enables you to run mixed workloads on row and column organized tables, perform in-database machine learning, support JSON and XML formats and will now support Graph analytics. These capabilities are available with all editions of Db2- including the free Community edition.

Db2 Graph will be available as Tech Preview from June 30th 2020.

--

--