A Depth-First Look at Watson Conversation + Gremlin + JanusGraph

How I updated my sample chatbot to use the latest in graph databases

Published in Center for Open Source Data and AI Technologies · 8 min read · Jul 27, 2017

In a previous blog post I talked about how you could use a graph database to store conversational data, such as chatbot interactions. I covered using the Apache TinkerPop framework and the Gremlin graph traversal language to store vertices and edges in a graph by tracing a typical chatbot conversation.

Have you had “The Talk” with your chatbot about graph data structures?

A coming-of-age story for your database queries

medium.freecodecamp.org

Recently, Compose announced support for JanusGraph, an open source graph database that supports the Apache TinkerPop framework. You can learn more about JanusGraph at janusgraph.org.

In this blog post, I’ll show you how I took the concepts from my previous chatbot articles and added support for JanusGraph to the Recipe Chatbot example app.

A quick refresher

This article is the third in a series of posts [1, 2] about a chatbot called the Recipe Bot. The Recipe Bot is a Slack Bot that lets people request recipes based on specified ingredients or cuisines. Although not required, it may be useful to read the previous blogs that describe the application’s higher-level architecture.

At a minimum, it’s important to understand some of the basic relationships used in Recipe Bot’s graph database. As a conversation with the bot progresses, the application creates vertices and edges and stores them in the database. Here are the basics:

A person starts a conversation with the chatbot, and the code creates a person vertex.
The person requests an ingredient and the application creates an ingredient vertex and an edge from the person to that ingredient.
The chatbot recommends a list of recipes, the person selects a recipe, and the code creates a recipe vertex and an edge from the ingredient to that recipe.
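To make these three steps concrete, here's a minimal sketch that keeps the vertices and edges in plain JavaScript arrays instead of a database. The helper names (addVertex, addEdge), the sample ingredient and recipe names, and the edge labels "requests" and "leadsTo" are all hypothetical, chosen only for this illustration. Incrementing a count when an edge already exists is likewise an assumption about how repeat requests might be tracked.

```javascript
// Minimal in-memory stand-in for the graph, for illustration only.
const graph = { vertices: [], edges: [] };

// Return the existing vertex with this label and name, or create one.
function addVertex(label, name) {
  let vertex = graph.vertices.find(
    (v) => v.label === label && v.properties.name === name
  );
  if (!vertex) {
    vertex = {
      id: graph.vertices.length + 1,
      label,
      type: "vertex",
      properties: { name },
    };
    graph.vertices.push(vertex);
  }
  return vertex;
}

// Create a directed edge, or increment its count if it already exists.
function addEdge(label, outV, inV) {
  let edge = graph.edges.find(
    (e) => e.label === label && e.outV === outV.id && e.inV === inV.id
  );
  if (edge) {
    edge.properties.count += 1;
  } else {
    edge = {
      label,
      type: "edge",
      outV: outV.id,
      inV: inV.id,
      properties: { count: 1 },
    };
    graph.edges.push(edge);
  }
  return edge;
}

// 1. A person starts a conversation.
const person = addVertex("person", "U2JBLUPL2");
// 2. The person requests an ingredient (sample value).
const ingredient = addVertex("ingredient", "chicken");
addEdge("requests", person, ingredient); // hypothetical edge label
// 3. The person selects a recipe recommended for that ingredient.
const recipe = addVertex("recipe", "Chicken Soup");
addEdge("leadsTo", ingredient, recipe); // hypothetical edge label

console.log(graph.vertices.length, graph.edges.length);
```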

Here’s a simplified version of the graph (you can see the real one with directed edges in the blog post cited above):

It’s more complicated than the diagram suggests. For example, the bot also creates an edge between the person and the recipe to make it easier to find a person’s favorite recipes. However, these simplified mechanics should give you an idea of how the app relates entities in the conversation to vertices & edges in the persisted graph.

Before jumping into the new chatbot implementation with JanusGraph, let’s talk a bit about Gremlin.

Gremlin at a glance

As I mentioned, JanusGraph supports the Apache TinkerPop framework, and with it Gremlin, TinkerPop's graph traversal language.

“Gremlin is a functional, data-flow language that enables users to succinctly express complex traversals on (or queries of) their application’s property graph.” — https://tinkerpop.apache.org/gremlin.html

In the previous article I touched on the structure of the vertices and edges associated with our conversational data, but I didn’t go into detail on how to create, update, and query those vertices and edges using Gremlin. So here’s a quick look at how that would work.

Creating a vertex

The following describes the structure of a person vertex used in the example app. The label identifies this vertex as a person, and this person has a name of U2JBLUPL2 (which is the user’s Slack ID).

{
  "label": "person",
  "type": "vertex",
  "properties": {
    "name": "U2JBLUPL2"
  }
}

To create this person in a graph database using Gremlin, run the following:

graph.addVertex(T.label, "person", "name", "U2JBLUPL2");

The response would look something like this:

{
  "id": 4224,
  "label": "person",
  "type": "vertex",
  "properties": {
    "name": [
      {
        "id": "17b-3aw-1l1",
        "value": "U2JBLUPL2"
      }
    ]
  }
}
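Note that the response stores each property as an array of entries with their own id and value, whereas the vertex you sent used a plain string. Here's a small sketch of reading such a property back in JavaScript. The helper name getPropertyValue is hypothetical, and it assumes you only care about the first entry of each property.

```javascript
// The vertex returned in the sample response above.
const personVertex = {
  id: 4224,
  label: "person",
  type: "vertex",
  properties: {
    name: [{ id: "17b-3aw-1l1", value: "U2JBLUPL2" }],
  },
};

// Return the first value stored under `key`, or undefined if there is none.
function getPropertyValue(vertex, key) {
  const entries = vertex.properties ? vertex.properties[key] : undefined;
  return Array.isArray(entries) && entries.length > 0
    ? entries[0].value
    : undefined;
}

console.log(getPropertyValue(personVertex, "name"));
```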

Creating an edge

The following describes the structure of an edge. In this case it represents the edge between a person (with ID 4224) and a recipe that the person has selected (with ID 4320).

{
  "label": "selects",
  "type": "edge",
  "inV": 4320,
  "outV": 4224,
  "properties": {
    "count": 1
  }
}

To create this edge in a graph database using Gremlin, run the following:

def g = graph.traversal();
def outV = g.V(4224).next();
def inV = g.V(4320).next();
outV.addEdge("selects", inV, "count", 1);

Here, you first need to look up the two vertices, the person (ID 4224) and the recipe (ID 4320), and then add the edge from the person to the recipe. The response looks something like this:

{
  "id": "2dl-oe05k-3yt-9ns",
  "label": "selects",
  "type": "edge",
  "inVLabel": "recipe",
  "outVLabel": "person",
  "inV": 4320,
  "outV": 4224,
  "properties": {
    "count": 1
  }
}

Traversing a graph

In the previous post I showed you the following Gremlin query for getting a user’s top-five favorite recipes, sorted by count:

g.V().hasLabel("person").has("name","U2JBLUPL2")
.outE().order().by("count", decr)
.inV().hasLabel("recipe").limit(5)

This query uses the edge between a person and the recipes the person has selected, and ultimately returns the recipes themselves (inV). The response is an array of recipes:

[
  {
    "id": 4320,
    "label": "recipe",
    "type": "vertex",
    "properties": {...}
  },
  {
    "id": 4450,
    "label": "recipe",
    "type": "vertex",
    "properties": {...}
  },
  ...
]
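If the chain of Gremlin steps is hard to follow, it can help to see the same logic written against plain arrays. The sketch below is not how JanusGraph executes the query. It's an in-memory approximation with made-up sample data (including a hypothetical "requests" edge to an ingredient), where each array operation is annotated with the Gremlin step it mirrors.

```javascript
// Sample data, invented for illustration.
const sampleGraph = {
  vertices: [
    { id: 4224, label: "person", properties: { name: "U2JBLUPL2" } },
    { id: 4100, label: "ingredient", properties: { name: "chicken" } },
    { id: 4320, label: "recipe", properties: { name: "Recipe A" } },
    { id: 4450, label: "recipe", properties: { name: "Recipe B" } },
  ],
  edges: [
    { label: "selects", outV: 4224, inV: 4320, properties: { count: 1 } },
    { label: "selects", outV: 4224, inV: 4450, properties: { count: 3 } },
    { label: "requests", outV: 4224, inV: 4100, properties: { count: 2 } },
  ],
};

function topRecipes(graph, personName, limit = 5) {
  const byId = new Map(graph.vertices.map((v) => [v.id, v]));
  // g.V().hasLabel("person").has("name", personName)
  const personIds = new Set(
    graph.vertices
      .filter((v) => v.label === "person" && v.properties.name === personName)
      .map((v) => v.id)
  );
  return graph.edges
    .filter((e) => personIds.has(e.outV)) // .outE()
    .sort((a, b) => b.properties.count - a.properties.count) // .order().by("count", decr)
    .map((e) => byId.get(e.inV)) // .inV()
    .filter((v) => v !== undefined && v.label === "recipe") // .hasLabel("recipe")
    .slice(0, limit); // .limit(5)
}

console.log(topRecipes(sampleGraph, "U2JBLUPL2").map((v) => v.id));
```

The edge to the ingredient is sorted along with the others, and only dropped afterward by the label filter, which matches the order of the steps in the Gremlin query.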

How about the full graph for the query?

g.V().hasLabel("person").has("name","U2JBLUPL2")
.outE().order().by("count", decr)
.inV().hasLabel("recipe").limit(5).path()

In the query above, I simply added .path() to the end. The response includes an array of paths, where each path has an array of labels and objects. Each array of objects includes the vertices and edges traversed in the path. For example, a path would have an array of objects that includes the person vertex, the edge between the person and the recipe, and the recipe vertex. Here is a sample response:

[
  {
    "labels": [
      [],
      [],
      []
    ],
    "objects": [
      {
        "id": 4224,
        "label": "person",
        "type": "vertex"
        ...
      },
      {
        "id": "2dl-oe05k-3yt-9ns",
        "label": "selects",
        "type": "edge"
        ...
      },
      {
        "id": 4320,
        "label": "recipe",
        "type": "vertex"
        ...
      }
    ]
  },
  {
    "labels": [...],
    "objects": [...]
  }
  ...
]
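Since every path for this query has the same shape (person, edge, recipe), you can unpack the objects array by position. Here's a rough sketch of doing that in JavaScript. The function name summarizePaths and the shape of its return value are hypothetical, and the sample input is trimmed to the fields shown above.

```javascript
// One path, trimmed to the fields shown in the sample response.
const samplePaths = [
  {
    labels: [[], [], []],
    objects: [
      { id: 4224, label: "person", type: "vertex" },
      { id: "2dl-oe05k-3yt-9ns", label: "selects", type: "edge" },
      { id: 4320, label: "recipe", type: "vertex" },
    ],
  },
];

// Flatten each path into a small record describing the hop it represents.
function summarizePaths(paths) {
  return paths.map(({ objects }) => {
    // Position in the array follows the order of traversal.
    const [person, edge, recipe] = objects;
    return {
      personId: person.id,
      edgeId: edge.id,
      edgeLabel: edge.label,
      recipeId: recipe.id,
    };
  });
}

console.log(summarizePaths(samplePaths));
```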

Now that you have a few Gremlin queries under your belt, here’s how to run them on JanusGraph.

JanusGraph HTTP API

You can execute Gremlin queries on JanusGraph in a couple of ways:

You can connect to JanusGraph via WebSockets. This method allows you to have long-lived conversations between your application and JanusGraph, where state can be saved across Gremlin queries.
You can connect to JanusGraph via the HTTP API. In this method, each API call is its own unit of work, and state is not saved across API calls. Sometimes you may need to provide more information to execute an HTTP API call, but connection management is greatly simplified.

I’ll focus on the HTTP API for the rest of this post. JanusGraph exposes a single HTTP POST endpoint to execute Gremlin queries. The endpoint expects a JSON-formatted document with a single key (gremlin) that has the value of your Gremlin query:

{
 "gremlin": "YOUR_GREMLIN_QUERY_HERE"
}

Every Gremlin query discussed above can be submitted to this endpoint, with one minor update — you must prefix every query with the following:

def graph=ConfiguredGraphFactory.open("YOUR_GRAPH_ID");

Replace YOUR_GRAPH_ID with the ID you specified when creating your graph. And how do you create a graph? With the HTTP API, of course! To create a graph, send the following Gremlin query. This example creates a graph called "recipebot":

{
 "gremlin": "def graph=ConfiguredGraphFactory.create(\"recipebot\");0;"
}

Now, every other Gremlin query you want to run against the recipebot graph simply needs to include the prefix I mentioned above. For example, here is what the HTTP POST body looks like for creating a person vertex:

{
  "gremlin": "def graph=ConfiguredGraphFactory.open(\"recipebot\"); graph.addVertex(T.label, \"person\", \"name\", \"U2JBLUPL2\");"
}
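Escaping all those quotes by hand gets tedious, so it's easier to let JSON.stringify build the body. Here's a minimal sketch. The function name buildRequestBody is hypothetical, and the check on the graph ID is an assumption I added so the ID can be dropped into the Groovy string without escaping.

```javascript
// Build the JSON body for the Gremlin endpoint, adding the graph prefix.
function buildRequestBody(graphId, query) {
  // Hypothetical safety check: only allow simple IDs so the value can be
  // embedded in the Groovy string literal as-is.
  if (!/^[A-Za-z0-9_]+$/.test(graphId)) {
    throw new Error(`Unexpected graph ID: ${graphId}`);
  }
  const prefix = `def graph=ConfiguredGraphFactory.open("${graphId}");`;
  // JSON.stringify takes care of escaping the quotes inside the query.
  return JSON.stringify({ gremlin: `${prefix} ${query}` });
}

const body = buildRequestBody(
  "recipebot",
  'graph.addVertex(T.label, "person", "name", "U2JBLUPL2");'
);
console.log(body);
```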

The HTTP response to the Gremlin endpoint looks like this:

{
  "requestId": "6f0a533e-76ab-412e-a4f9-d73842ea12c2",
  "status": {
    "message": "",
    "code": 200,
    "attributes": {}
  },
  "result": {
    "data": [
      {
        "id": 4224,
        "label": "person",
        "type": "vertex"
        ...
      }
    ],
    "meta": {}
  }
}
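The part you usually care about is result.data, so a small helper can check the status and unwrap it. This is only a sketch: the function name unwrapGremlinResponse is hypothetical, and treating any status code outside the 2xx range as a failure is my assumption, not documented behavior.

```javascript
// Return result.data from a Gremlin endpoint response, or throw on failure.
function unwrapGremlinResponse(response) {
  const status = response.status || {};
  // Assumption: codes in the 2xx range indicate success.
  if (typeof status.code !== "number" || status.code < 200 || status.code >= 300) {
    throw new Error(`Gremlin query failed (${status.code}): ${status.message}`);
  }
  return (response.result && response.result.data) || [];
}

// The sample response above, trimmed to the fields shown.
const sampleResponse = {
  requestId: "6f0a533e-76ab-412e-a4f9-d73842ea12c2",
  status: { message: "", code: 200, attributes: {} },
  result: {
    data: [{ id: 4224, label: "person", type: "vertex" }],
    meta: {},
  },
};

console.log(unwrapGremlinResponse(sampleResponse));
```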

Check out the JanusGraph documentation for sample queries, curl commands, and more information on how to query JanusGraph with Gremlin.

Tying it all together

The project uses a Node.js application that communicates with the Recipe Bot and JanusGraph. The architecture looks like this:

The Node.js application (labeled in the diagram as “Application”) manages the integration between Slack (for relaying messages to and from users), Watson Conversation (for running the chatbot), and JanusGraph (for storing and querying vertices and edges).

You can run the sample application by following the instructions in the GitHub repo at https://github.com/ibm-watson-data-lab/watson-recipe-bot-nodejs-janusgraph. You’ll find the graph-related code in the following files:

JanusGraphRecipeStore.js contains the high-level functions used by the chatbot to save and retrieve entities in JanusGraph. It includes functions like:

addUser — Adds a user vertex to the graph
addIngredient — Adds an ingredient vertex to the graph
recordIngredientRequestForUser — Adds or updates the edge between a user vertex and an ingredient vertex

These functions, in turn, call the lower-level functions defined in JanusGraphClient.js. That file contains generic functions for working with JanusGraph, including:

getOrCreateGraph — Creates a graph with the specified ID if it does not already exist
runGremlinQuery — Executes a Gremlin query on the specified graph
createVertex — Creates a vertex on the specified graph
createEdge — Creates an edge on the specified graph

Remember that the HTTP API for JanusGraph contains a single endpoint — an HTTP POST for executing Gremlin queries. The runGremlinQuery function takes a Gremlin query and executes the HTTP POST. It’s the only function really required to communicate with JanusGraph and is ultimately called by all of the other functions in JanusGraphClient.js.
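To give a feel for how little is involved, here's a rough sketch of what such a function could look like. It is not the code from JanusGraphClient.js. The signature, the endpointUrl parameter, and the injectable fetchImpl argument are all assumptions of this example, authentication is left out entirely, and the default relies on the global fetch available in recent Node.js versions.

```javascript
// Sketch only: POST a Gremlin query to the endpoint and return result.data.
// `endpointUrl` is whatever URL your deployment exposes for Gremlin queries.
// `fetchImpl` defaults to the global fetch and can be swapped out in tests.
async function runGremlinQuery(endpointUrl, graphId, query, fetchImpl = fetch) {
  // JSON.stringify yields a double-quoted literal for simple graph IDs.
  const prefix = `def graph=ConfiguredGraphFactory.open(${JSON.stringify(graphId)});`;
  const response = await fetchImpl(endpointUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ gremlin: `${prefix} ${query}` }),
  });
  if (!response.ok) {
    throw new Error(`HTTP ${response.status} from Gremlin endpoint`);
  }
  const payload = await response.json();
  return payload.result.data;
}

// Example call (not executed here; the URL is a placeholder):
// runGremlinQuery("https://YOUR_JANUSGRAPH_ENDPOINT", "recipebot",
//   'graph.addVertex(T.label, "person", "name", "U2JBLUPL2");')
//   .then((data) => console.log(data));
```

Passing the fetch function as a parameter keeps the sketch easy to exercise without a live JanusGraph deployment.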

The other functions, like createVertex and createEdge, are convenience functions that provide a higher level of abstraction for the developer. For example, createVertex allows you to pass in a vertex object with the following structure:

{
  "label": "person",
  "type": "vertex",
  "properties": {
    "name": "U2JBLUPL2"
  }
}

This object is then converted into a Gremlin query, like so, and sent to the runGremlinQuery function:

graph.addVertex(T.label, "person", "name", "U2JBLUPL2");
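Here's a minimal sketch of how that conversion might work, for both vertices and edges. It is not the code from JanusGraphClient.js. The function names vertexToGremlin, edgeToGremlin, and toGroovyLiteral are hypothetical, and the sketch assumes property values and IDs are only strings or finite numbers. Dollar signs are escaped because Groovy interpolates them inside double-quoted strings.

```javascript
// Render a JavaScript string or number as a Groovy literal.
function toGroovyLiteral(value) {
  if (typeof value === "number" && Number.isFinite(value)) {
    return String(value);
  }
  if (typeof value === "string") {
    // JSON escaping covers quotes and backslashes; also escape "$".
    return JSON.stringify(value).replace(/\$/g, () => "\\$");
  }
  throw new TypeError(`Unsupported property value: ${String(value)}`);
}

// Convert a vertex object into a graph.addVertex(...) statement.
function vertexToGremlin(vertex) {
  const args = ["T.label", toGroovyLiteral(vertex.label)];
  for (const [key, value] of Object.entries(vertex.properties || {})) {
    args.push(toGroovyLiteral(key), toGroovyLiteral(value));
  }
  return `graph.addVertex(${args.join(", ")});`;
}

// Convert an edge object into the lookup-then-addEdge statements.
function edgeToGremlin(edge) {
  const args = [toGroovyLiteral(edge.label), "inV"];
  for (const [key, value] of Object.entries(edge.properties || {})) {
    args.push(toGroovyLiteral(key), toGroovyLiteral(value));
  }
  return [
    "def g = graph.traversal();",
    `def outV = g.V(${toGroovyLiteral(edge.outV)}).next();`,
    `def inV = g.V(${toGroovyLiteral(edge.inV)}).next();`,
    `outV.addEdge(${args.join(", ")});`,
  ].join(" ");
}

console.log(
  vertexToGremlin({
    label: "person",
    type: "vertex",
    properties: { name: "U2JBLUPL2" },
  })
);
```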

Feel free to use JanusGraphClient.js in your own project to help you get started with JanusGraph.

Start chatting

Conversations are a natural fit for graph data structures. Persisting chat metadata to a graph database gives you a more pleasant model to develop code against, without the many special relationship tables a relational database would require. Hopefully, this article gives you an idea of the mechanics of using Gremlin to persist data to JanusGraph.

You can try out the Recipe Bot with JanusGraph support by following the instructions in the GitHub repo. The README has everything you’ll need to get up and running. You’ll walk through the process of configuring Slack and Spoonacular, creating a Watson Conversation service instance, and provisioning your first JanusGraph deployment.

Tell your Recipe Bot I said hello!
