Serving multiple Titan graphs over Gremlin Server (TinkerPop 3)
A frequently asked question is how to properly serve data from multiple graphs using TinkerPop 3 and Gremlin Server. Following a discussion on this topic on the TinkerPop mailing list, this blog post gives a detailed walkthrough on how to configure Gremlin Server to expose multiple graphs backed by the Titan v1.0.0 graph database.
Define each Titan graph storage and indexing backends
Within the context of the Titan graph database, there are two important things to keep in mind when configuring storage backends and external indexing to work with multiple graphs.
Assuming a single storage backend cluster, you’ll be required to define distinct Cassandra keyspaces or HBase table names for each graph. When using BerkeleyDB, simply supply distinct folders in which to store the data.
Assuming a single indexing backend such as an Elasticsearch cluster, make sure you configure each graph with a distinct index name.
Let’s define two graphs, “primus” and “secundus” stored in the same Cassandra cluster within distinct keyspaces and indexed in the same Elasticsearch cluster in distinct indexes.
Define the first graph, primus.properties
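Assuming a Cassandra node and an Elasticsearch node both reachable locally (the hostnames below are placeholders for your own setup), primus.properties could look like this:

```properties
# Storage: shared Cassandra cluster, dedicated keyspace for this graph
storage.backend=cassandrathrift
storage.hostname=127.0.0.1
storage.cassandra.keyspace=primus

# Indexing: shared Elasticsearch cluster, dedicated index for this graph
index.search.backend=elasticsearch
index.search.hostname=127.0.0.1
index.search.index-name=primus
```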
And the second graph file, secundus.properties
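Only the keyspace and index name change; the backend and hostname settings stay the same, since both graphs share the same clusters (again, hostnames here are placeholders):

```properties
storage.backend=cassandrathrift
storage.hostname=127.0.0.1
storage.cassandra.keyspace=secundus

index.search.backend=elasticsearch
index.search.hostname=127.0.0.1
index.search.index-name=secundus
```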
You’re not required to give keyspaces and indexes the exact same names as your graphs, though doing so makes things easier to keep track of.
Depending on your needs, you’re obviously free to store each graph on distinct storage and/or index clusters. Supplying distinct keyspaces/table names and index names may then become optional.
Please refer to the Chapter 12 — Configuration reference in the Titan documentation for further information on how to configure storage and indexing backends.
Configure Gremlin Server to initialize the graphs at launch
The next step consists of editing the Gremlin Server configuration file, located at conf/gremlin-server/gremlin-server.yaml, to point to each of the graph .properties files. This configuration file defines a graphs property as a map of graph names to their corresponding .properties files. An example configuration for two graphs could be:
graphs: {
  primus: conf/gremlin-server/primus.properties,
  secundus: conf/gremlin-server/secundus.properties}
This will expose two graphs, referenced by the primus and secundus variables within the Gremlin script execution context. Again, the variable names are not required to match the names of the graphs defined in the .properties files, but we’ll do so for simplicity.
Reference each graph Traversal object in the Gremlin Server .groovy bootstrap script
After exposing your graphs as the primus and secundus variables, you’re almost done. You must now update the Gremlin Server bootstrap script located in scripts/empty-sample.groovy in order to define references to each graph’s Traversal object (the path to this script is also defined in the gremlin-server.yaml file and can be edited). Because we no longer expose a single graph variable but the primus and secundus graph variables, the empty-sample.groovy file should now look like this:
pg = primus.traversal()
sg = secundus.traversal()
Since TinkerPop3, graph traversals are no longer issued via a Graph instance. The default empty-sample.groovy script mimics the old TinkerPop 2.x behavior where a graph traversal would typically start with g. Because we now have two graphs, we must bind each graph’s Traversal object to distinct variables. Let’s call these pg and sg. The above initialization script will allow you to execute graph traversals such as pg.V() for the primus graph or sg.V() for the secundus graph, as defined in the gremlin-server.yaml file.
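Once Gremlin Server is restarted with this configuration, you can check both traversal sources from the Gremlin Console, for instance (assuming a remote.yaml pointing at your server, as shipped with the default Gremlin Console distribution):

```
gremlin> :remote connect tinkerpop.server conf/remote.yaml
gremlin> :> pg.V().count()
gremlin> :> sg.V().count()
```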
Putting this into practice: interacting with multiple graphs within the same Gremlin query
A nice side effect of this approach is that you can now query multiple graphs within the same Gremlin query. This makes it easy to set up simple scripts for migrating moderately sized graphs from one database implementation to another.
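As an illustration, a naive copy of all vertices from primus into secundus could be scripted along these lines. This is only a sketch: it ignores edges, assumes properties are single-valued, and does no batching, which you would want for anything beyond a small graph:

```groovy
import org.apache.tinkerpop.gremlin.structure.T

// Copy every vertex from primus into secundus, keeping labels and properties.
// Edges would need a second pass mapping old vertex ids to new ones.
pg.V().sideEffect { traverser ->
    def v = traverser.get()
    def copy = secundus.addVertex(T.label, v.label())
    v.properties().each { p -> copy.property(p.key(), p.value()) }
}.iterate()

secundus.tx().commit()
```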
This example is not limited to the Titan graph database and can be tweaked to serve multiple graphs from any combination of graph databases implementing the TinkerPop framework, such as ArangoDB, OrientDB or Neo4j.
Thanks to Stephen Mallette for reviewing this post.
Questions? Feedback? Twitter @jbmusso.