Graph DB — diving into JanusGraph part 1

Marcelo Coraça de Freitas
Published in FiNC Tech Blog
Apr 25, 2017

This is a two-part article meant to clear the way for newcomers to graph database technologies. In particular, we will focus on JanusGraph. I'll try to explain, to the best of my ability, what JanusGraph is and what the related servers are, so that readers can continue from the official documentation with ease.

Also, this article was written on the verge of the first non-preview JanusGraph release, but it is highly probable that it has been released by the time you are reading this.

For those who are new to this world, JanusGraph is a fork of Titan DB, an open source graph DB server (a term this author finds a bit confusing and misleading; more on this later). Titan was maintained by essentially one company, which decided to discontinue it in favor of its new, closed source product.

But before we continue, it is important to note a couple of things. It is assumed the reader has at least some level of familiarity with:

  1. A SQL server (any flavor: MS SQL Server, MySQL, PostgreSQL, Oracle, …).
  2. A C-like language with OOP support (here we use Groovy).
  3. The fluent interface pattern (https://en.wikipedia.org/wiki/Fluent_interface).
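As a quick refresher on the third point, here is a minimal sketch of a fluent interface. It is written in Python for brevity rather than Groovy, and the `Query` class, its methods and the sample data are all hypothetical names made up for this illustration; they are not part of JanusGraph or any other library.

```python
class Query:
    """Hypothetical query builder used only to illustrate method chaining."""

    def __init__(self, rows):
        self._rows = list(rows)

    def where(self, key, value):
        # Keep only the rows whose field equals the given value.
        self._rows = [row for row in self._rows if row.get(key) == value]
        return self  # returning the builder is what makes the interface fluent

    def order_by(self, key):
        self._rows = sorted(self._rows, key=lambda row: row[key])
        return self

    def limit(self, count):
        self._rows = self._rows[:count]
        return self

    def values(self, key):
        # Terminal step: leaves the chain and returns plain data.
        return [row[key] for row in self._rows]


# Made-up sample data.
people = [
    {"name": "ana", "city": "Tokyo", "age": 31},
    {"name": "bob", "city": "Osaka", "age": 25},
    {"name": "chie", "city": "Tokyo", "age": 28},
]

names = Query(people).where("city", "Tokyo").order_by("age").limit(2).values("name")
print(names)  # ['chie', 'ana']
```

Each intermediate method returns the builder itself, so a query reads as one chained expression from left to right.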

A tool for each problem

For ages we have had SQL wars between vendors. MySQL users usually think SQL Server users are dumb, and vice versa. Both camps have benchmarks to support their point of view, and both are right and wrong at the same time.

See, there is no such thing as one solution for every problem of mankind, and the same certainly holds for handling data. At some point, the usual tools are simply not enough. And if you have found yourself trying to figure out which is the best graph DB server or the best data structure store, chances are you have already reached the limit of relational DB servers for your application.

These new solutions are here to stay, and each of them solves a very specific problem: how to handle large data sets in a specific situation. If you are looking for a solution that completely replaces your relational database, sorry; you won't find one. You might find some vendors that support more than one kind of database in a single product, but that is essentially the same as using multiple servers, each one where it excels.

Key-value stores are usually great when you need fast access to complex structures based on some sort of ID. But if you then need to run complex queries against those stores, you are better off using a search engine such as Elasticsearch. You are then left with the issue of synchronizing your data across all those servers, and it is usually a good idea to rely on something like a relational database, which enforces data constraints, as your main data store.

Graph databases are another of these services, built to solve one specific problem: relations between data.

Graph Databases

When dealing with billions of entries in your database, you often find yourself in situations where a simple join that returns a couple of results takes too long for a web application that must respond within a couple of milliseconds. Graphs are a good solution in such cases, though not only in those.

The good thing about graph databases is that there is no need to go through your entire index to find what you are looking for. Instead, the relationships are stored alongside the vertices, not exactly in a centralized index (more on that in the next article). This inherently leads to very efficient queries and, thanks to a feature called vertex-centric indexes, you will be surprised to see how fast they are.
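To make the idea of relationships stored alongside the vertices more concrete, here is a toy, in-memory sketch in Python. It is not how JanusGraph is implemented; the `follows_table` data, the `Vertex` class and the helper functions are hypothetical names invented for this illustration. The table scan is also deliberately naive: a real relational server would use an index, but the cost of that lookup still depends on the size of the whole table, while following a vertex's own edge list depends only on how many edges that vertex has.

```python
# Relational style: one shared table of (follower, followee) rows.
follows_table = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "carol"),
    ("carol", "dave"),
]


def followees_by_scan(table, person):
    """Find one person's followees by inspecting every row of the table."""
    inspected = 0
    result = []
    for source, target in table:
        inspected += 1
        if source == person:
            result.append(target)
    return result, inspected


class Vertex:
    """Graph style: each vertex keeps its own list of outgoing edges."""

    def __init__(self, name):
        self.name = name
        self.out = []  # vertices this vertex points to


def build_graph(table):
    """Turn the edge rows into vertices that hold their own adjacency lists."""
    vertices = {}
    for source, target in table:
        for name in (source, target):
            if name not in vertices:
                vertices[name] = Vertex(name)
        vertices[source].out.append(vertices[target])
    return vertices


def followees_by_traversal(vertex):
    """Find followees by walking only the edges stored on this vertex."""
    touched = 0
    result = []
    for neighbour in vertex.out:
        touched += 1
        result.append(neighbour.name)
    return result, touched


graph = build_graph(follows_table)
print(followees_by_scan(follows_table, "alice"))  # (['bob', 'carol'], 4)
print(followees_by_traversal(graph["alice"]))     # (['bob', 'carol'], 2)
```

Both approaches return the same answer, but the scan inspects all four rows while the traversal touches only the two edges that belong to the starting vertex. With billions of rows in the table, that second number stays the same.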

At FiNC, using a graph database lets us serve in real time what previously, during our closed beta, took 14 hours a day of batch processing and was growing exponentially. Now our graph is updated in near real time, and we are exploring new possibilities such as Hadoop and DynamoDB integration.

In the next article I will cover the basics of the JanusGraph server. Until then.
