JanusGraph & Python

Benoît Guigal
5 min read · Jul 20, 2018


JanusGraph is a scalable graph database optimized for storing and querying graphs containing hundreds of billions of vertices and edges distributed across a multi-machine cluster.

The easiest way to connect to JanusGraph is through the Gremlin console. It is a great tool for experimenting, but what if you want to develop a Python application that executes queries against JanusGraph on behalf of its front-end?

Example Python application architecture

Python is not a JVM-based language, so we cannot embed JanusGraph calls directly in a Python program. Instead, JanusGraph packages a long-running server process that, once started, lets a remote client or logic running in a separate program make JanusGraph calls. This long-running server process is called JanusGraph Server. Moreover, TinkerPop 3 provides gremlin-python, a Gremlin language variant that lets developers express Gremlin graph traversals natively in Python and send them to the server.

JanusGraph itself is just a set of jar files with no thread of execution of its own. The primary pattern for using JanusGraph is to embed JanusGraph calls in a client program that provides its own thread of execution. This is what happens when you start a Gremlin console. It could also be done from client programs written in any JVM-based language, such as Java, Scala (see gremlin-scala) or Clojure (see ogre). This is a little confusing at first, because most of the databases we know are packaged as a server (PostgreSQL, MySQL, MongoDB, Elasticsearch, etc.).

JanusGraph server

JanusGraph uses the Gremlin Server of the Apache TinkerPop stack to service client requests. You can start a JanusGraph server with the janusgraph.sh script:

Usage: bin/janusgraph.sh [options] {start|stop|status|clean}

The janusgraph.sh script forks Cassandra and Elasticsearch for you before calling bin/gremlin-server.sh conf/gremlin-server.yaml. If you need a different storage or index backend, you can adapt this script to fit your needs.

The server is configured by the YAML file conf/gremlin-server/gremlin-server.yaml. This file tells the Gremlin server many things, such as:

  • the host and port to serve on
  • the script engines enabled
  • the serializers to make available
  • the different Graph instances to expose
gremlin-server.yaml

A JanusGraph instance maintains a set of vertices and edges, as well as access to the underlying pluggable storage. In the configuration file above, the line graph: conf/gremlin-server/janusgraph-cassandra-es-server.properties tells the Gremlin server which storage backend to use.

Connecting via the Gremlin console

Once the server is running, you can try it out in the Gremlin console.

./bin/gremlin.sh
gremlin> :remote connect tinkerpop.server conf/remote.yaml session
==>Configured localhost/127.0.0.1:8181
gremlin> :remote console
==>All scripts will now be sent to Gremlin Server
gremlin> graph
==>standardjanusgraph[cassandrathrift:[127.0.0.1]]

The graph instance defined in the configuration file is injected as a global variable into the console! Note that we could have defined several graphs with different configurations, and they would all have been made available.

The script scripts/empty-sample.groovy (also referenced in the configuration file) defines a default traversal source named g.

gremlin> g
==>graphtraversalsource[standardjanusgraph[cassandrathrift:[127.0.0.1]], standard]

Let’s load the Graph of the Gods data into our graph instance and perform some traversals.

gremlin> GraphOfTheGodsFactory.load(graph)
==>null
gremlin> g.V().count()
==>12
gremlin> saturn = g.V().has('name', 'saturn').next()
==>v[4240]
gremlin> g.V(saturn).in('father').in('father').values('name')
==>hercules

Connecting with Python

In order to connect to the server from Python, we need to configure Gremlin Server for Python and install the gremlin-python package.

# This will download a set of jar files in ./ext/gremlin-python
./bin/gremlin-server.sh -i org.apache.tinkerpop gremlin-python 3.2.6
# Install gremlin-python from PyPI
pip install gremlinpython==3.2.6

The version of gremlin-python should match the TinkerPop version that your JanusGraph release is compatible with. See the JanusGraph releases page to find out which TinkerPop version your release uses.
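Because a version mismatch usually surfaces later as confusing serialization errors, a quick check can help. The sketch below compares the installed gremlinpython release with the TinkerPop version you expect; the expected value 3.2.6 is taken from the install command above and is an assumption for your setup, the helper names are hypothetical, and `importlib.metadata` requires Python 3.8 or later.

```python
# Sketch: check that the installed gremlinpython release matches the TinkerPop
# version of your JanusGraph release ("3.2.6" below is an assumption).
from importlib.metadata import PackageNotFoundError, version


def release_triple(version_string):
    """'3.2.6' -> (3, 2, 6); trailing parts such as '.post1' are ignored."""
    return tuple(int(part) for part in version_string.split(".")[:3])


def installed_gremlinpython():
    """Version string of the installed gremlinpython package, or None."""
    try:
        return version("gremlinpython")
    except PackageNotFoundError:
        return None


def matches_server(client_version, server_tinkerpop_version):
    """True when both sides report the same major.minor.patch release."""
    return release_triple(client_version) == release_triple(server_tinkerpop_version)


if __name__ == "__main__":
    expected = "3.2.6"  # TinkerPop version of your JanusGraph release
    installed = installed_gremlinpython()
    if installed is None:
        print("gremlinpython is not installed")
    elif not matches_server(installed, expected):
        print("gremlinpython {} does not match TinkerPop {}".format(installed, expected))
    else:
        print("gremlinpython {} matches TinkerPop {}".format(installed, expected))
```

The comparison is deliberately strict (full major.minor.patch), which mirrors the advice above to keep both sides on the same version.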

Let’s connect to the Gremlin server from Python using the Gremlin Python driver:
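A minimal sketch of such a connection, assuming gremlinpython 3.2.6, a server on localhost at Gremlin Server's default port 8182 (use whatever port your own configuration specifies) and a traversal source named `g`. The driver's `Client` sends Gremlin as a string that the server evaluates as a script; `gremlin_url` and `run_script` are hypothetical helper names.

```python
# Sketch: send Gremlin *script strings* to JanusGraph Server with the
# gremlin-python driver. Host, port and the traversal source name "g" are
# assumptions; match them to your gremlin-server.yaml.


def gremlin_url(host="localhost", port=8182):
    """Build the WebSocket endpoint that Gremlin Server listens on."""
    return "ws://{}:{}/gremlin".format(host, port)


def run_script(script, bindings=None, url=None):
    """Submit a Gremlin script and block until every result has arrived."""
    # Imported inside the function so this module loads even where
    # gremlinpython is not installed.
    from gremlin_python.driver import client

    gremlin_client = client.Client(url or gremlin_url(), "g")
    try:
        # submit() returns a ResultSet; all() returns a future of the full list.
        return gremlin_client.submit(script, bindings).all().result()
    finally:
        gremlin_client.close()


if __name__ == "__main__":
    print(run_script("g.V().count()"))
    # Bindings keep values out of the script text itself.
    print(run_script("g.V().has('name', n).values('age')", {"n": "saturn"}))
```

Passing values as bindings rather than formatting them into the string avoids quoting bugs and lets Gremlin Server reuse the compiled script.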

Using the Python driver is okay, but we can do better: thanks to the Gremlin-Python language variant, it is possible to express Gremlin traversals in plain Python.

Gremlin is a graph traversal language that makes use of two fundamental programming constructs: function composition and function nesting. Given this generality, it is possible to embed Gremlin in any modern programming language. It elevates Gremlin to a top-level citizen in the language of choice.

Example usage of Gremlin Python language variant
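A minimal sketch of the language variant, replaying the console traversals from earlier. It assumes gremlinpython 3.2.x, the Graph of the Gods already loaded, a server at Gremlin Server's default address `ws://localhost:8182/gremlin` (adjust to your configuration) and a traversal source named `g`; `open_traversal_source` and `saturn_grandchildren` are hypothetical helpers. Here no Gremlin string is built by hand:

```python
# Sketch: the console session, written with the Gremlin-Python language variant.
# Assumes gremlinpython 3.2.x and a server at ws://localhost:8182/gremlin exposing "g".


def open_traversal_source(url="ws://localhost:8182/gremlin"):
    """Return (g, connection); close the connection when you are done."""
    # Imported here so this module can be loaded without gremlinpython.
    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
    from gremlin_python.structure.graph import Graph

    connection = DriverRemoteConnection(url, "g")
    # Every traversal spawned from g is executed remotely on the server.
    g = Graph().traversal().withRemote(connection)
    return g, connection


def saturn_grandchildren(g):
    """Python version of the console query on Saturn's grandchildren."""
    # "in" is a Python keyword, hence in_(); toList() triggers execution.
    return (
        g.V().has("name", "saturn")
        .in_("father").in_("father")
        .values("name")
        .toList()
    )


if __name__ == "__main__":
    g, connection = open_traversal_source()
    try:
        print(g.V().count().next())
        print(saturn_grandchildren(g))
    finally:
        connection.close()
```

Chaining from `has("name", "saturn")` avoids fetching the Saturn vertex first; against the data loaded above it should print the same results as the console (12 and `['hercules']`).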

There are some slight language variations compared to using Gremlin in the Gremlin console.

  • Traversals must be explicitly terminated by calling a terminal step such as next(), nextTraverser(), toList(), toSet() or iterate().
  • as, in, and, or, is, not, from and global are reserved keywords in Python, so the corresponding steps are written with a trailing underscore. For instance: g.V().as_('a').in_().as_('b').select('a','b').
  • You will need to explicitly import the statics into the current scope to use anonymous traversals like out() without a prefix.
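The three variations above can be sketched against the Graph of the Gods. This assumes `g` is a remote traversal source like the one in the previous example; `language_variation_examples` is a hypothetical name, and the anonymous traversal is spawned explicitly from `__` rather than through imported statics.

```python
# Sketch of the Gremlin-Python language variations, assuming g is a remote
# traversal source over the Graph of the Gods.


def language_variation_examples(g):
    """Return results illustrating terminal steps, keyword steps and __."""
    # Imported lazily so the function can be defined without gremlinpython.
    from gremlin_python.process.graph_traversal import __

    # 1. Nothing is sent to the server until a terminal step such as
    #    next(), toList(), toSet() or iterate() is called.
    god_names = g.V().hasLabel("god").values("name").toSet()

    # 2. Steps named after Python keywords take a trailing underscore.
    hercules_and_father = (
        g.V().has("name", "hercules").as_("h")
        .out("father").as_("f")
        .select("h", "f").by("name")
        .next()
    )

    # 3. Anonymous (nested) traversals such as out() are spawned from __.
    grandfather = (
        g.V().has("name", "hercules")
        .repeat(__.out("father")).times(2)
        .values("name")
        .next()
    )
    return god_names, hercules_and_father, grandfather
```

With the Graph of the Gods loaded, this should return the gods jupiter, neptune and pluto, the pair hercules/jupiter, and saturn. To write bare `out()` instead of `__.out()`, gremlin-python also offers `statics.load_statics(globals())` (from `gremlin_python import statics`, after importing `gremlin_python.process.graph_traversal`).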

How does this magic happen? Under the hood, a traversal written in native Python is translated to Gremlin bytecode, sent over the network and ultimately compiled to a Traversal by JanusGraph.
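The recording half of that process can be illustrated with a toy class: each chained method call appends a (step name, arguments) instruction instead of doing any work. `ToyTraversal` is a hypothetical, simplified model written for illustration only; it is not gremlin-python's implementation and it does not produce real GraphSON or talk to a server.

```python
class ToyTraversal:
    """Toy model of a language variant: each chained step records an instruction.

    NOT gremlin-python's implementation; it only mimics the idea that Python
    method calls are captured as [step name, *arguments] lists.
    """

    def __init__(self):
        self.step_instructions = []

    def _add_step(self, name, *args):
        self.step_instructions.append([name, *args])
        return self  # returning self is what makes the steps chainable

    def V(self, *ids):
        return self._add_step("V", *ids)

    def has(self, *args):
        return self._add_step("has", *args)

    def in_(self, *labels):
        # The Python method needs a trailing underscore, but the recorded
        # Gremlin step name is still "in".
        return self._add_step("in", *labels)

    def values(self, *keys):
        return self._add_step("values", *keys)


traversal = ToyTraversal().V().has("name", "saturn").in_("father").in_("father").values("name")
print(traversal.step_instructions)
# [['V'], ['has', 'name', 'saturn'], ['in', 'father'], ['in', 'father'], ['values', 'name']]
```

In gremlin-python itself, the recorded instructions live on a traversal's `bytecode` attribute, so something like `Graph().traversal().V().in_('father').bytecode` can be inspected without a server: nothing is sent until a terminal step runs.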

Gremlin variant architecture

For details about Gremlin language variants, please follow the tutorial in the TinkerPop documentation.

JanusGraph specific features

JanusGraph implements many useful features that are not part of the TinkerPop specification. For example, you can use an index backend such as Elasticsearch together with a text predicate to run a full-text fuzzy search against a property of the graph:

john = g.V().has('name', textContainsFuzzy('John Doe'))

Unfortunately, you won’t be able to perform this traversal with the Gremlin-Python language variant, because the predicate is JanusGraph-specific. Instead, you need to build a string representation of the query and submit it through the Python Gremlin driver.
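A minimal sketch of that workaround, assuming a `gremlin_python.driver.client.Client` connected as in the driver section and a `name` property as in the example above. The binding name `query` and the helpers `build_fuzzy_request` and `fuzzy_name_search` are hypothetical; the user's search text travels as a binding rather than being pasted into the script.

```python
# Sketch: run a JanusGraph-only text predicate by sending a script string,
# since textContainsFuzzy has no Gremlin-Python equivalent. The binding name
# "query" and the property key "name" are illustrative assumptions.

FUZZY_SCRIPT = "g.V().has('name', textContainsFuzzy(query)).values('name')"


def build_fuzzy_request(query):
    """Return the script and its bindings; user input is never formatted in."""
    return FUZZY_SCRIPT, {"query": query}


def fuzzy_name_search(gremlin_client, query):
    """Submit the request through a gremlin_python.driver.client.Client."""
    script, bindings = build_fuzzy_request(query)
    return gremlin_client.submit(script, bindings).all().result()


if __name__ == "__main__":
    # Assumed endpoint; adjust host and port to your server configuration.
    from gremlin_python.driver import client

    gremlin_client = client.Client("ws://localhost:8182/gremlin", "g")
    try:
        print(fuzzy_name_search(gremlin_client, "John Doe"))
    finally:
        gremlin_client.close()
```

Keeping the script constant and varying only the bindings means a name containing quotes cannot break the query, and the server can reuse the compiled script across searches.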
