Configure JanusGraph with Astra DB Storage Backend for Dynamic Workloads
Authors: Kathiresan Selvaraj and Saurabh Verma
JanusGraph is a popular open-source graph database that can store and query graphs with billions of vertices and edges distributed across a multi-machine cluster. One of its many advantages is that it can be configured with any of several supported storage backends. Within that list, you’ll notice a few familiar CQL-based storage backends, like the open-source Apache Cassandra®.
To run Cassandra without the pain, you can use DataStax Enterprise either on-premises or in a cloud infrastructure. Better yet, if you’d rather pay as you go and still save yourself from all the operational burden, you can go with DataStax Astra DB.
In this post, we’ll dive into how you can configure JanusGraph to work with Astra DB by creating an Identity Resolution (IDR) application. We’ll even provide the complete source code of this application.
Installation and configuration
Prepare Astra DB
First of all, create a free account on Astra DB and follow the steps in the Astra DB documentation to spin up a new database instance. Once the database is started and the instance status turns Active, get all the connection details listed below for configuring JanusGraph to connect to Astra DB.
Secure Connect Bundle (SCB)
For this, navigate to the Connect tab and then click on any topic under the Connect using a driver section. There you’ll see a button to download the SCB in a .zip file.
Client id and client secret
To get a client id and client secret, generate a token for the “Administrator User” role: navigate to the Organization Settings, click on Token Management, select the role, and generate the token. Store both values in a secure location. You can find more on this step in the Astra DB FAQ.
Finally, make a note of the keyspace name provided when you created the database.
That’s all we need from Astra DB. Now let’s move on to the JanusGraph configuration.
Installing and configuring JanusGraph
Download the latest version of JanusGraph and unzip it, then copy the packaged CQL Gremlin server YAML file as the default YAML file.
Edit the conf/janusgraph-cql.properties file as shown below.
Add a new file called astra.conf under the conf directory and add the following:
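As a rough illustration of what these two files can contain, here is a minimal Python sketch that renders them from the connection details gathered earlier. It assumes the JanusGraph option storage.cql.internal.file-configuration is used to point at astra.conf, and that astra.conf uses the DataStax Java driver options basic.cloud.secure-connect-bundle and advanced.auth-provider. The class and function names, the conf/astra.conf path, the request timeout value, and all placeholder values are hypothetical, and your setup may need further settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AstraConnection:
    """Connection details collected from the Astra DB console (placeholders here)."""
    keyspace: str
    client_id: str
    client_secret: str
    bundle_path: str  # path to the downloaded Secure Connect Bundle .zip


def render_janusgraph_properties(conn: AstraConnection,
                                 driver_conf: str = "conf/astra.conf") -> str:
    """Render a minimal conf/janusgraph-cql.properties (assumed subset of keys)."""
    lines = [
        "gremlin.graph=org.janusgraph.core.JanusGraphFactory",
        "storage.backend=cql",
        f"storage.cql.keyspace={conn.keyspace}",
        # Assumption: delegate connection settings to a driver config file.
        f"storage.cql.internal.file-configuration={driver_conf}",
    ]
    return "\n".join(lines) + "\n"


def render_astra_conf(conn: AstraConnection,
                      request_timeout: str = "10 seconds") -> str:
    """Render a minimal astra.conf in the driver's HOCON format.

    Values are inserted verbatim; escape them first if they contain quotes.
    """
    return (
        "datastax-java-driver {\n"
        f'  basic.cloud.secure-connect-bundle = "{conn.bundle_path}"\n'
        f"  basic.request.timeout = {request_timeout}\n"
        "  advanced.auth-provider {\n"
        "    class = PlainTextAuthProvider\n"
        f'    username = "{conn.client_id}"\n'
        f'    password = "{conn.client_secret}"\n'
        "  }\n"
        "}\n"
    )


# Example with placeholder values; replace them with your own details.
conn = AstraConnection(
    keyspace="<keyspace>",
    client_id="<client-id>",
    client_secret="<client-secret>",
    bundle_path="/path/to/secure-connect-bundle.zip",
)
print(render_janusgraph_properties(conn))
print(render_astra_conf(conn))
```

Since astra.conf ends up holding the client secret, it is worth keeping the file out of version control and restricting its permissions.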
You’re now ready to get JanusGraph up and running.
Starting up JanusGraph
You can start JanusGraph either embedded within an application or as a remote server for clients to connect. Here, we’ll test the setup via Gremlin Console.
In the Gremlin Console REPL, you can configure and test basic queries as mentioned below.
Demo application: Identity Resolution
Identity resolution (IDR) is the process of matching identifiers across devices and touchpoints to a single profile. This helps build a cohesive, omnichannel view of a consumer, enabling brands to deliver relevant messaging throughout the customer journey. You can find a demo IDR application with a complete code base in our DataStax Labs GitHub repo.
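To make the idea concrete, here is a minimal plain-Python sketch of identity resolution as a graph problem: identifiers are vertices, linkages are edges, and each connected group of identifiers forms one profile. This is not the demo application's code. The function name, the (type, value) tuples, and the sample data are hypothetical, and the sketch assumes that every link endpoint appears in the identifier list.

```python
from collections import defaultdict


def resolve_identities(identifiers, links):
    """Group identifiers into profiles using a union-find structure.

    identifiers: iterable of hashable ids, e.g. (id_type, id_value) tuples
    links: iterable of (id_a, id_b) pairs that belong to the same person
    """
    parent = {i: i for i in identifiers}

    def find(x):
        # Walk up to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    profiles = defaultdict(set)
    for i in parent:
        profiles[find(i)].add(i)
    return list(profiles.values())


# Hypothetical sample data: three linked identifiers and one unlinked one.
email_a = ("email", "a@example.com")
cookie = ("cookie", "c-123")
device = ("device", "d-9")
email_b = ("email", "b@example.com")

profiles = resolve_identities(
    [email_a, cookie, device, email_b],
    [(email_a, cookie), (cookie, device)],
)
for profile in profiles:
    print(sorted(profile))
```

In a graph database the same grouping is obtained by traversing linkage edges from a starting identifier, so no separate bookkeeping structure is needed.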
Tune the configurations below to avoid timeouts on the client side, as per your requirements.
To build the demo app and load data into JanusGraph with Astra DB as your backend, follow the steps in this GitHub section. This setup loads test data consisting of about 100 identities and 300 linkages between them into JanusGraph backed by Astra DB.
The demo schema showcases the following graph elements:
- PropertyKeys: Attributes of the Vertex and Edge elements. For example, createDate in this demo.
- VertexLabels: Labels that attach a domain name to a vertex, which is useful for distinguishing different types of vertices, like user vertices and product vertices. For example, node in this demo.
- EdgeLabels: Each edge connecting two vertices has a label that defines the semantics of the relationship. For instance, an edge labeled friend between vertices A and B encodes a friendship between the two individuals. For example, the link relationship in this demo.
- GraphIndexes: Graph indexes make global retrieval and traversal operations efficient on large graphs. For example, the idValue_idType_comp composite index in this demo, which optimizes retrieval via the <idValue, idType> properties of a vertex.
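To illustrate why a composite index helps, the following plain-Python sketch contrasts a full scan with an exact-match lookup keyed on both properties, which is the kind of equality lookup a JanusGraph composite index serves. It is a toy in-memory model, not JanusGraph's implementation. The TinyGraph class, its method names, and the sample values are hypothetical.

```python
class TinyGraph:
    """Toy vertex store with a composite index on (idValue, idType)."""

    def __init__(self):
        self.vertices = []  # every vertex, in insertion order
        self.index = {}     # (idValue, idType) -> vertex

    def add_node(self, id_value, id_type):
        vertex = {"label": "node", "idValue": id_value, "idType": id_type}
        self.vertices.append(vertex)
        self.index[(id_value, id_type)] = vertex
        return vertex

    def scan(self, id_value, id_type):
        """Without an index: check every vertex, O(n) in the graph size."""
        for vertex in self.vertices:
            if vertex["idValue"] == id_value and vertex["idType"] == id_type:
                return vertex
        return None

    def lookup(self, id_value, id_type):
        """With the index: one hash lookup; both keys must be given."""
        return self.index.get((id_value, id_type))


graph = TinyGraph()
graph.add_node("a@example.com", "email")
graph.add_node("c-123", "cookie")

# Both calls find the same vertex; only the amount of work differs.
print(graph.scan("c-123", "cookie") is graph.lookup("c-123", "cookie"))
```

As in this sketch, a composite index only answers queries that constrain all of its keys by equality, which is why the demo index covers both idValue and idType together.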
For more detailed information on these elements, check the official documentation on JanusGraph schema and data modeling.
Congratulations! You now know how to set up the latest version of JanusGraph with DataStax Astra DB. You also know how to create an Identity Resolution application and test it using Gremlin Console.
As a side note, when it comes to accessing JanusGraph through API-based clients, it’s really no different from what we’ve shown here. Since it’s just a different storage backend, the way you communicate with JanusGraph stays the same. If you’d like to explore this setup a bit more, check the list of resources at the end of this post for some interesting topics, including OLAP traversals, performance testing, cost efficiency with Astra DB and more.
- GitHub: Astra DB and JanusGraph demo app
- Configure JanusGraph with Astra DB as storage backend
- View health and metrics | DataStax Astra Documentation
- Price Reduction in Astra DB Write Operations
- The Mechanics of Gremlin OLAP
- DataStax Astra DB | Powered by Apache Cassandra
- Official Astra DB documentation
- Find courses at DataStax Academy
- Free workshops | DataStax Devs YouTube Channel