Why aren’t we all using graph DBs?

Falcor. GraphQL. The benefits of these systems are clear: efficient data-fetching requests that get precisely what’s needed, and no more. They rest on the very reasonable idea that each view in your view hierarchy should be able to declare the data it needs. Those declarations can be collected at the view root into one efficient query (or batched, in Falcor’s case) and bulk loaded in a single network fetch. In other words, the view graph generates a query graph.
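To make “the view graph generates a query graph” concrete, here is a minimal Python sketch. The view names (`POST_HEADER`, `POST_BODY`, `AUTHOR_BADGE`), the nested-dict representation of a query, and the `merge`/`render` helpers are all hypothetical. The rendered string only imitates GraphQL’s selection syntax; this is not how Relay or Falcor actually implement it.

```python
from typing import Dict

# A selection tree: field name -> sub-selection ({} marks a leaf field).
Query = Dict[str, "Query"]


def merge(*queries: Query) -> Query:
    """Deep-merge selection trees so a field several views need is fetched once."""
    out: Query = {}
    for query in queries:
        for field, sub in query.items():
            out[field] = merge(out[field], sub) if field in out else sub
    return out


# Hypothetical leaf views, each declaring only the data it renders.
POST_HEADER = {"title": {}, "author": {"name": {}}}
POST_BODY = {"body": {}}
AUTHOR_BADGE = {"author": {"name": {}, "avatarUrl": {}}}

# A parent view composes its children's declarations...
POST = merge(POST_HEADER, POST_BODY, AUTHOR_BADGE)
# ...and the root view wraps everything into one query for the whole screen.
FEED = {"feed": POST}


def render(query: Query, depth: int = 1) -> str:
    """Print a selection tree in a GraphQL-like syntax."""
    pad = "  " * depth
    parts = []
    for field, sub in query.items():
        if sub:
            parts.append(f"{pad}{field} {{\n{render(sub, depth + 1)}\n{pad}}}")
        else:
            parts.append(f"{pad}{field}")
    return "\n".join(parts)


def to_query_string(query: Query) -> str:
    return "{\n" + render(query) + "\n}"


print(to_query_string(FEED))
```

Note that `author { name }` is requested once even though two views asked for it. The root ends up with a single query tree it can send in one fetch.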

The problem is on the other end, when the server receives this query. Most popular databases are relational, key-value, document, or wide-column (tabular) stores. If graph databases were the norm, simple adapters could be written to transform these query graphs into proper queries for the database. Instead, because graph databases are so rare, you have to map each part of the query graph onto your database by hand. Every example of this I’ve seen so far is tedious and verbose, likely because the impedance mismatch is even greater than the one between object models and relational databases that ORMs exist to bridge. Sometimes there may not even be an efficient way to map a query, so consumers of the API may be surprised when queries that seem straightforward turn out to be slow.
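Here is a minimal sketch of that mapping, assuming a hypothetical three-table schema (`users`, `friendships`, `posts`) in an in-memory SQLite database. The `RESOLVERS` table and `resolve` function are illustrative, written in the per-edge resolver style many GraphQL servers use, and are not any library’s API. Every edge needs its own hand-written SQL, and a query that looks like one request fans out into one statement per friend (the well-known N+1 pattern).

```python
import sqlite3

# Hypothetical relational schema standing in for "your database".
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE friendships (user_id INTEGER, friend_id INTEGER);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO users VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol'), (4, 'Dave');
    INSERT INTO friendships VALUES (1, 2), (1, 3), (1, 4);
    INSERT INTO posts VALUES (10, 2, 'Bob on graphs'), (11, 3, 'Carol on SQL'),
                             (12, 3, 'Carol again'), (13, 4, 'Dave says hi');
""")

sql_log = []  # every statement the resolvers issue, to make the fan-out visible


def run(sql, *params):
    sql_log.append(sql)
    return conn.execute(sql, params).fetchall()


def user_friends(user):
    rows = run("SELECT friend_id FROM friendships WHERE user_id = ? ORDER BY friend_id", user["id"])
    return [{"__type": "User", "id": friend_id} for (friend_id,) in rows]


def user_posts(user):
    rows = run("SELECT title FROM posts WHERE author_id = ? ORDER BY id", user["id"])
    return [{"__type": "Post", "title": title} for (title,) in rows]


# Hypothetical hand-written mapping from (node type, edge name) to SQL;
# every new edge in the query graph needs another entry here.
RESOLVERS = {
    ("User", "friends"): user_friends,
    ("User", "posts"): user_posts,
}


def resolve(node, selection):
    """Walk a nested-dict selection, calling a resolver for each edge."""
    result = {}
    for field, sub in selection.items():
        resolver = RESOLVERS.get((node["__type"], field))
        if resolver is None:
            result[field] = node[field]  # scalar already loaded on the node
        else:
            result[field] = [resolve(child, sub) for child in resolver(node)]
    return result


# "Posts by all of Alice's friends" expressed as a query graph.
QUERY = {"friends": {"posts": {"title": {}}}}
result = resolve({"__type": "User", "id": 1}, QUERY)
print(len(sql_log))  # 1 friends lookup + 1 posts lookup per friend
```

With three friends that is four statements; with a thousand friends it would be 1,001, even though the client sent a single request. Batching or a hand-tuned join can collapse this particular shape, but only by writing yet more mapping code.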

When you get down to it, arranging data into a graph is a generally useful way of working. It mirrors how things are connected in the real world, it comes with a well-studied mathematical toolkit, and it matches how we often think about our data. Consider querying for “a list of posts by all of Alice’s friends” in a graph vs. in SQL. In a graph you’d start at Alice, traverse to her friends, and traverse further to get each friend’s posts. In SQL you’d query the friend table, join it with the user and post tables, and filter the result down to only Alice’s friends. The former is much more understandable to me. In fact, I think a layperson could understand that sentence.
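Here is that same query both ways, as a minimal Python sketch with made-up data. The graph side uses two plain dictionaries as a stand-in for a graph store (an assumption; no real graph database API is involved). The SQL side loads the same data into a hypothetical `users`/`friendships`/`posts` schema using the standard library’s `sqlite3`.

```python
import sqlite3

# Hypothetical social graph: "friend" edges and "authored" edges as adjacency lists.
FRIENDS = {"Alice": ["Bob", "Carol"], "Bob": ["Alice"], "Carol": ["Alice", "Dave"], "Dave": ["Carol"]}
POSTS_BY = {"Bob": ["Bob on graphs"], "Carol": ["Carol on SQL", "Carol again"], "Dave": ["Dave says hi"]}


def friends_posts_graph(person):
    # Start at the person, hop to each friend, then hop to each friend's posts.
    return sorted(post for friend in FRIENDS.get(person, []) for post in POSTS_BY.get(friend, []))


def build_db():
    # Load the same (hypothetical) data into relational tables.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE friendships (user_id INTEGER, friend_id INTEGER);
        CREATE TABLE posts (author_id INTEGER, title TEXT);
    """)
    ids = {name: conn.execute("INSERT INTO users (name) VALUES (?)", (name,)).lastrowid for name in FRIENDS}
    for name, friends in FRIENDS.items():
        conn.executemany("INSERT INTO friendships VALUES (?, ?)", [(ids[name], ids[f]) for f in friends])
    for name, titles in POSTS_BY.items():
        conn.executemany("INSERT INTO posts VALUES (?, ?)", [(ids[name], t) for t in titles])
    return conn


def friends_posts_sql(person):
    conn = build_db()
    # Query the friend table, join it with users and posts, and filter to this person's friends.
    rows = conn.execute("""
        SELECT p.title
        FROM users AS me
        JOIN friendships AS f ON f.user_id = me.id
        JOIN users AS friend ON friend.id = f.friend_id
        JOIN posts AS p ON p.author_id = friend.id
        WHERE me.name = ?
        ORDER BY p.title
    """, (person,)).fetchall()
    conn.close()
    return [title for (title,) in rows]


print(friends_posts_graph("Alice"))
print(friends_posts_sql("Alice"))
```

Both return the same three posts (Dave’s post is excluded because Dave isn’t Alice’s friend). The graph version reads almost like the English sentence, while the SQL version spells out three joins and a filter.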

So what graph databases are out there? There’s the venerable Neo4j, which exists. But their business model requires them to cripple the open source version with slow multicore performance, no clustering support, and more. There’s Cayley, a promising new entry by a Googler. But it’s still such a young project, and the scalability, performance, and correctness of the database and each of its backends are still unproven. There’s also OrientDB, ArangoDB, AllegroGraph, Blazegraph, and Sparksee, some of which are very early and others of which have been around for a while. What I’m getting at here is that there are a lot of graph databases in existence, at varying levels of stability and usefulness, but none has really risen to mainstream success.

Indeed, despite the variety of graph databases that exist, you simply don’t hear about them as much as the standard fare: MySQL, PostgreSQL, MongoDB, Cassandra, and HBase. You especially don’t hear of many companies using a graph database as their source of truth. Why is this? Maybe there just hasn’t been a good enough graph database yet. Maybe it’s because there’s no standard(ish) way of querying all of these graph DBs, like there is with SQL. Maybe it’s just because we didn’t have Falcor and GraphQL yet. To be completely honest, I don’t know why we don’t use them more. I think we should.