How and when to use OrientDB: A study by HelloClass

There has been a lot of talk recently about NoSQL and graph databases. The need for NoSQL databases is not new, and NoSQL opens up a whole range of data stores: Cassandra, Redis, MongoDB, HBase, Elasticsearch, CouchDB, DynamoDB, Neo4j, OrientDB and more. In general, NoSQL offers simplicity of design, better horizontal scaling and finer control over availability. In this article we will mainly talk about our use case for a graph DB.

At HelloClass, the need for a NoSQL/graph database arose when we started planning the launch of our home-curated Study Materials. Yes, you read that right. Let me take you through the scenarios we actually faced, one by one, while implementing our Study Materials.

Neo4j or OrientDB?

Neo4j, being one of the oldest graph DBs, has emerged as the go-to choice for companies looking for a graph database. It was the same for us. But when we spent some time figuring out whether Neo4j was the right choice or whether better alternatives were available, we learnt about OrientDB. We learnt that OrientDB Community Edition offers many features out of the box that Neo4j doesn’t. I won’t go into the details, but here are a couple of winning points:

  • On the topic of clustered deployment, Neo4j supports only master-slave replication. It is generally suited only to single-digit node deployments, and the entire graph must reside on one machine. OrientDB, in contrast, offers full multi-master replication (every node can accept reads and writes), can shard data, distributes data intelligently using clusters, and automates distributed queries and transactions.
  • OrientDB’s document-graph capabilities are just awesome. OrientDB is a multi-model database that supports not only documents and graphs but also key-value and object models. That makes it a full database rather than a supplement to another datastore, which of course is something Neo4j doesn’t offer.

So the choice was clear.

DB Schema/Architecture

OrientDB, an ACID-compliant DBMS, supports schema-less, schema-full and schema-hybrid (or mixed-schema) solutions. In mixed-schema mode, it sets constraints on certain fields and still allows the user to add custom fields to a record. We opted for the hybrid-schema mode. For the Study Materials, we wanted to roll out Question-Answer and Discussion-style content to start with. While designing this content DB, we wanted a few properties to always be defined for every Question-Answer/Discussion, hence this approach. We could also use OrientDB to store the document data of our chat sessions, processing them into the DB in real time.
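To make the hybrid-schema idea concrete, here is a minimal sketch in plain Python. It is not OrientDB's API, only a model of the behaviour described above: a few declared fields are enforced, and anything else is accepted as a custom field. The field names (`title`, `body`, `tags`) and the helper `validate_hybrid` are made up for illustration.

```python
from typing import Any, Dict

# Hypothetical mandatory properties for a Question record.
# These names are assumptions, not the actual HelloClass schema.
MANDATORY_FIELDS = {"title": str, "body": str}


def validate_hybrid(record: Dict[str, Any]) -> Dict[str, Any]:
    """Enforce the declared fields; pass any extra fields through untouched."""
    for name, expected_type in MANDATORY_FIELDS.items():
        if name not in record:
            raise ValueError(f"missing mandatory field: {name}")
        if not isinstance(record[name], expected_type):
            raise TypeError(f"field {name!r} must be {expected_type.__name__}")
    return record


# "tags" is not declared anywhere, yet the record is accepted (custom field).
question = validate_hybrid(
    {"title": "What is a graph DB?", "body": "Explain briefly.", "tags": ["nosql"]}
)
```

A record without `title` or `body` would be rejected, which is the guarantee we were after for Question-Answer and Discussion content.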

Fig. — A basic diagram of our OrientDB setup

We could have kept both DBs on one server. But as this was our first interaction with OrientDB and we didn’t know how it would perform, we intentionally kept our chat sessions on a separate DB server. It is also a better approach for maintaining separation of concerns, and it gives you flexibility when you need to scale them.

OrientDB directly supports most of the cloud providers including AWS, Microsoft Azure and others. Ours is an AWS setup of OrientDB Community Edition 2.1.2.

OrientDB provides both vertical and horizontal security, meaning you can manage security at the schema level and at the per-record level. Securing data at the record level may not sound like a very common use case, but it lets you completely separate database records into sandboxes, where only authorized users can access restricted records.
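The sandbox idea can be pictured with a small plain-Python model. This is only a sketch of record-level filtering in general, not OrientDB's security implementation; the `Record` class, its `allowed_users` field and the user names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Record:
    content: str
    # Hypothetical per-record access list: users allowed to read this record.
    allowed_users: Set[str] = field(default_factory=set)


def visible_records(records: List[Record], user: str) -> List[Record]:
    """Return only the records whose access list includes the given user."""
    return [r for r in records if user in r.allowed_users]


records = [
    Record("draft answer", allowed_users={"alice"}),
    Record("shared discussion", allowed_users={"alice", "bob"}),
]
bob_view = visible_records(records, "bob")
```

Here `bob_view` contains only the shared discussion; the draft stays inside alice's sandbox even though both records live in the same database.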

OrientDB adheres to the TinkerPop Blueprints standard and uses it as its default graph Java API. Code written on top of Blueprints works with all Blueprints-enabled graph databases. So if you stick to the Blueprints interfaces, you are free to switch your underlying graph database to any other Blueprints-enabled graph DB in the future.
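The benefit of coding against a common interface can be sketched as follows. Blueprints itself is a Java API; the Python classes below (`Graph`, `InMemoryGraph`, `link_answer`) are hypothetical stand-ins that only illustrate the pattern, not the Blueprints interfaces themselves.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class Graph(ABC):
    """Hypothetical vendor-neutral graph interface (stand-in for Blueprints)."""

    @abstractmethod
    def add_edge(self, source: str, target: str, label: str) -> None: ...

    @abstractmethod
    def neighbours(self, vertex_id: str) -> List[str]: ...


class InMemoryGraph(Graph):
    """Toy backend; a real one would wrap a specific graph database."""

    def __init__(self) -> None:
        self._edges: Dict[str, List[Tuple[str, str]]] = {}

    def add_edge(self, source: str, target: str, label: str) -> None:
        # Register both endpoints, then store the labelled outgoing edge.
        self._edges.setdefault(target, [])
        self._edges.setdefault(source, []).append((label, target))

    def neighbours(self, vertex_id: str) -> List[str]:
        return [target for _, target in self._edges.get(vertex_id, [])]


def link_answer(graph: Graph, question_id: str, answer_id: str) -> None:
    """Application code: depends only on the interface, never on a backend."""
    graph.add_edge(question_id, answer_id, "HAS_ANSWER")


graph = InMemoryGraph()
link_answer(graph, "q1", "a1")
```

Because `link_answer` only sees `Graph`, swapping `InMemoryGraph` for another implementation would not require touching the application code, which is the portability argument made above.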

Let us have a look at some key points on where OrientDB performs well and where it doesn’t:

  • OrientDB server’s shutdown and startup are really fast, taking 2–3 seconds.
  • Queries where it excels:
      • Node lookup via the internal ID function
      • Full-text Lucene lookup, when querying indexed fields
  • Graph explorer, Result view and Node editor (in OrientDB Studio) work really well.
  • Support for function integration is good. Functions in OrientDB are just like the stored procedures of an RDBMS.
  • Although it supports SQL-like queries, which look quite exciting at the beginning, things get a little hard when you have to write complex queries.
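On the internal ID lookup mentioned in the list above: assuming this refers to OrientDB's record ID (RID), which is written as `#<cluster>:<position>` (for example `#12:0`), the toy model below hints at why such a lookup can be fast. The ID already says where the record sits, so no index search is needed. The storage layout here (a dict of lists) is purely illustrative and not how OrientDB stores data; `parse_rid` and `fetch` are hypothetical helpers, and only non-negative IDs are handled.

```python
import re
from typing import Dict, List, NamedTuple


class RecordId(NamedTuple):
    cluster: int
    position: int


_RID_PATTERN = re.compile(r"#(\d+):(\d+)")


def parse_rid(text: str) -> RecordId:
    """Parse a record ID such as '#12:0' into its cluster and position."""
    match = _RID_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid record ID: {text!r}")
    return RecordId(int(match.group(1)), int(match.group(2)))


def fetch(clusters: Dict[int, List[str]], rid_text: str) -> str:
    """Direct lookup: pick the cluster, then jump straight to the position."""
    rid = parse_rid(rid_text)
    return clusters[rid.cluster][rid.position]


# Toy storage: cluster 12 holds two records.
clusters = {12: ["first question", "second question"]}
record = fetch(clusters, "#12:1")
```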

Conclusion

Overall, OrientDB has been working great since the day we took it to production. Since it is a new DB compared to other data stores, there are definitely a lot of areas for improvement. That said, a highly active community is working on it to make it better on a daily basis.