Database Reinvented: Datomic
Let me start with the what: what is Datomic?
Datomic is a new kind of database. Did I hear "a database"? What, another database? There are tons of databases out there, so why choose Datomic?
Good question. Because Datomic is different. Datomic balances the capabilities of a traditional RDBMS with the elastic scalability of distributed storage systems. It is a pretty amazing technology with a lot of interesting ideas that could be a game-changer in the field of database technologies.
In traditional client-server systems, the database resides on a big server where all the processing is done. When this architecture was developed (way back in the 70s), memory was very expensive, so the designers chose an architecture where everything happened on the server. But things have changed, and memory is far cheaper now. So the Datomic team decided to break up this client-server structure and build a set of services that interact with each other.
So Datomic essentially provides a set of services that help persist information in a robust and scalable manner. Often described as the ‘git’ of databases, Datomic has an architecture that revolves around four models:
- Data Model
- Transaction Model
- Query Model
- Storage Model
The Data Model of Datomic defines the storage structure. It has two components: datoms and EDN.
Datoms are the simplest things that can be encoded as facts. A datom is essentially an entity, an attribute, its value and a time. How about going through a real example?
entity     attribute   value        tx
devendra   :phone      8552365147   50
devendra   :address    New Delhi    100
Datoms are immutable. Why, you ask? That’s because Datomic works much like human memory. Say my phone number changed: what would you do with a SQL database? You would overwrite my existing number with the new one. Yes, that’s what we do, but is that what our brain does? Does it replace the old phone number with the new one? No. It keeps both, with an additional notion of time, which is quite similar to what Datomic does. Datomic saves each “datom” with its time. Hence immutability: once inserted, a datom can’t be changed. So you can actually travel back in time and retrieve previous information without needing backups, logs or manually saved historical data.
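To make the "both numbers plus time" idea concrete, here is a minimal Python sketch of an append-only fact log with an "as of" lookup. It is only an illustration of the concept, not Datomic's API: the names `Fact`, `FactLog` and `value_as_of` are made up for this example, and the second phone number and its tx value are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Fact(NamedTuple):
    entity: str
    attribute: str
    value: str
    tx: int  # logical time of the transaction that recorded the fact


class FactLog:
    """Append-only list of facts; nothing is ever updated in place."""

    def __init__(self) -> None:
        self._facts: List[Fact] = []

    def add(self, entity: str, attribute: str, value: str, tx: int) -> None:
        self._facts.append(Fact(entity, attribute, value, tx))

    def value_as_of(self, entity: str, attribute: str, tx: int) -> Optional[str]:
        """Return the most recent value recorded at or before `tx`."""
        candidates = [
            f for f in self._facts
            if f.entity == entity and f.attribute == attribute and f.tx <= tx
        ]
        if not candidates:
            return None
        return max(candidates, key=lambda f: f.tx).value


log = FactLog()
log.add("devendra", ":phone", "8552365147", tx=50)
log.add("devendra", ":phone", "9000000000", tx=120)  # hypothetical new number

current_phone = log.value_as_of("devendra", ":phone", tx=200)
earlier_phone = log.value_as_of("devendra", ":phone", tx=60)
```

Here `current_phone` is the new number while `earlier_phone` is still the old one: the old fact was never overwritten, so asking about an earlier point in time just means ignoring facts recorded after it.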
EDN (Extensible Data Notation)
EDN, or Extensible Data Notation, is the format in which data is exchanged between the business logic layers and Datomic. Instead of a rigidly defined table-column structure or an object-based structure, which requires modifying the ‘result-set’ in a traditional RDBMS, EDN provides a flexible JSON-like structure with additional advantages: it is more compact, easier to pretty-print, includes an integer type, allows non-string map keys, and has a nice built-in extension mechanism (which is much more elegant than any ad-hoc thing that JSON can support). It also supports the basic Datomic types: lists, vectors, sets, maps, etc. If you think I am praising ‘edn’ too much, then read this blog.
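The point about non-string map keys is easy to see from the JSON side. The short Python sketch below uses only the standard `json` module; it does not parse EDN (Python has no built-in EDN reader), it just shows the JSON limitation that EDN avoids. The sample records are made up for illustration.

```python
import json

# A map keyed by an integer entity id.
record = {42: "devendra"}

# JSON object keys must be strings, so the integer key is silently
# converted on the way out and comes back as the string "42".
round_tripped = json.loads(json.dumps(record))
keys_preserved = (round_tripped == record)

# A composite key (here a tuple) cannot be serialized as a JSON key at all.
try:
    json.dumps({(1, 2): "pair"})
    composite_key_ok = True
except TypeError:
    composite_key_ok = False
```

After the round trip the key is no longer an integer, and the composite key is rejected outright. In EDN, a map key can be any value, so neither workaround is needed.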
Datom and EDN together
This is how a simple “datom” (a 5-tuple) looks in EDN:
entity   attribute   value          transaction      added
[42      :phone      "8552365147"   13194139534315   true]
The first three elements are self-explanatory. The transaction element references an entity that records the time at which the transaction was added to the system (and possibly other facts about the transaction). The fifth element, ‘added’, is a boolean flag that indicates whether the datom was inserted (true) or retracted (false).
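As a rough sketch of how the ‘added’ flag can be used, the Python below replays a list of 5-tuples in transaction order to work out which values are currently asserted. The `Datom` class and `current_values` function are names invented for this illustration, and the second transaction id and the new phone number are hypothetical; this is not how Datomic's indexes are actually implemented.

```python
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Datom(NamedTuple):
    entity: int
    attribute: str
    value: str
    tx: int
    added: bool  # True = inserted, False = retracted


def current_values(datoms: Iterable[Datom]) -> Dict[Tuple[int, str], Set[str]]:
    """Replay datoms in transaction order and return, for each
    (entity, attribute) pair, the set of values still asserted."""
    state: Dict[Tuple[int, str], Set[str]] = {}
    for d in sorted(datoms, key=lambda d: d.tx):
        values = state.setdefault((d.entity, d.attribute), set())
        if d.added:
            values.add(d.value)
        else:
            values.discard(d.value)
    return state


datoms = [
    Datom(42, ":phone", "8552365147", 13194139534315, True),
    # Hypothetical later transaction: retract the old number, insert a new one.
    Datom(42, ":phone", "8552365147", 13194139534320, False),
    Datom(42, ":phone", "9000000000", 13194139534320, True),
]

state = current_values(datoms)
```

Note that the retraction is itself a new datom. The original insertion stays in the list, which is what keeps the history intact.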
Among the many brilliant design decisions in Datomic is the decoupling of reads from writes, since operations that change the value of a piece of information are very different from, and independent of, those that read it.
The Transactor is the part of Datomic that performs all data modification operations, which means it takes care of all inserts, updates and retractions. It is the single point for all transactions, so all writes happen synchronously to redundant storage. This leaves no scope for the race conditions that arise in distributed clustered storage systems, and it makes the transactions ACID.
The Transactor is moved out of the database server (another stroke of genius in the design) and runs as a separate component, making the actual storage server lean. Everything that goes into a Datomic database goes through this transactor, one transaction at a time, serially.
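The "one at a time, serially" idea can be sketched with a single writer thread fed by a queue. This is a toy model under my own assumptions, not the real Transactor or its API: the `Transactor` class, its `submit` method and the client threads are all hypothetical names for this illustration.

```python
import queue
import threading
from typing import List, Tuple


class Transactor:
    """Toy single-writer: many clients may submit, one thread applies."""

    def __init__(self) -> None:
        self._inbox: queue.Queue = queue.Queue()
        self.log: List[Tuple[str, str, str, int]] = []
        self._next_tx = 0
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def submit(self, facts: List[Tuple[str, str, str]]) -> int:
        """Queue a write and block until it is applied; returns its tx id."""
        done: queue.Queue = queue.Queue(maxsize=1)
        self._inbox.put((facts, done))
        return done.get()

    def _run(self) -> None:
        # Only this thread touches the log, so writes cannot interleave.
        while True:
            facts, done = self._inbox.get()
            self._next_tx += 1
            tx = self._next_tx
            for entity, attribute, value in facts:
                self.log.append((entity, attribute, value, tx))
            done.put(tx)


transactor = Transactor()
tx_ids: List[int] = []
tx_ids_lock = threading.Lock()


def client(i: int) -> None:
    tx = transactor.submit([(f"user-{i}", ":phone", str(i))])
    with tx_ids_lock:
        tx_ids.append(tx)


threads = [threading.Thread(target=client, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Eight clients write concurrently, yet every write gets its own consecutive transaction id because only the worker thread ever assigns ids or appends to the log. The trade-off is that all writes funnel through one place.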
The query model provides the query engine for read operations. This engine actually lives inside the application, which is termed the Peer. Think of the peer as a gateway to the rest of the database. It has all the components needed to communicate with the storage services and the transactor. It also provides caching.
Datomic uses a query language called Datalog to process information. Datalog is a declarative logic programming language that is syntactically a subset of Prolog. It has rules, which are similar to views in SQL. The best part is that it is extensible with functions and predicates. Because the query engine runs inside the application, these functions can be local to the application, which is much easier than submitting them to a centralized database server as we do in the client-server model.
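To give a feel for the declarative style, here is a tiny Python pattern matcher in the spirit of a Datalog query. It is a sketch under my own assumptions and far simpler than Datomic's engine (no rules, no predicates, no indexes). The `query`, `match_clause` and `is_var` functions and the second person in the sample data are hypothetical. Strings starting with `?` are treated as variables.

```python
from typing import Dict, List, Optional, Set, Tuple

Fact = Tuple[str, str, str]      # (entity, attribute, value)
Bindings = Dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def match_clause(clause: Fact, fact: Fact, bindings: Bindings) -> Optional[Bindings]:
    """Unify one pattern with one fact; return extended bindings or None."""
    extended = dict(bindings)
    for term, actual in zip(clause, fact):
        if is_var(term):
            if term in extended and extended[term] != actual:
                return None  # variable already bound to something else
            extended[term] = actual
        elif term != actual:
            return None      # constant does not match
    return extended


def query(find: List[str], where: List[Fact], facts: List[Fact]) -> Set[Tuple[str, ...]]:
    """Join all `where` clauses and project the `find` variables."""
    results: List[Bindings] = [{}]
    for clause in where:
        next_results: List[Bindings] = []
        for bindings in results:
            for fact in facts:
                extended = match_clause(clause, fact, bindings)
                if extended is not None:
                    next_results.append(extended)
        results = next_results
    return {tuple(b[var] for var in find) for b in results}


facts = [
    ("devendra", ":phone", "8552365147"),
    ("devendra", ":address", "New Delhi"),
    ("asha", ":phone", "9000000000"),   # hypothetical second person
    ("asha", ":address", "Pune"),
]

# "Find the phone numbers of everyone whose address is New Delhi."
phones = query(
    find=["?phone"],
    where=[("?e", ":address", "New Delhi"), ("?e", ":phone", "?phone")],
    facts=facts,
)
```

The query only states what should match. The shared variable `?e` is what joins the two clauses, much as a join condition would in SQL.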
The storage model of Datomic provides, guess what, the storage services! Surprising as it may sound, Datomic actually treats storage as a service. This means that Datomic doesn’t provide the actual “storage” as part of its package. It only provides the ways and means to access the underlying storage, while providing the computing power via Datalog. That being said, you can keep data in memory for quick testing or on local disk, which is great for development, or you can scale up to SQL storage (MySQL or PostgreSQL), NoSQL stores (Riak, Couchbase) or storage services like Amazon DynamoDB.
This was just a quick and simple introduction to the different pieces that make up Datomic, enough to get you started. There is a lot of ground to cover, and I highly recommend the official documentation and asking the community if you want to become proficient.
Most of this is the good stuff: data immutability, datoms, the query system, scalability. And like any other technology, Datomic has its share of controversial topics, such as the transactor being a potential bottleneck. I would love to answer any questions about these; just post them as a comment below!
Originally published at blogs.quovantis.com on October 13, 2015.