What Datomic brings to businesses

Val Waeselynck
Jun 19, 2017

A few days ago in the Clojurians Slack, someone asked how to express the value proposition of the Datomic database system to a non-technical stakeholder, in business terms. I was in a good position to lay out a basic answer, because I had done that exercise 18 months ago, when I convinced my partner on BandSquare that we needed a technology shift.

This post is an attempt to express the value of Datomic to a business in a more detailed and articulated form. I’ll try to express the advantages and drawbacks of Datomic as much as I can in non-technical terms, by describing the tangible consequences of using Datomic. I won’t try to support my arguments with technical details, but I’ll provide some references for technical experts to dig deeper.

As we’ll see, the business value does not lie in the ‘cool features’ — it’s about the problems you don’t have.

Expressive querying

Datomic provides high query power, which means it’s straightforward to translate a question about data to code.

Consequences: developers ship new features faster, and achieve more code reuse.

Symptoms:

- I know it sounds simple, but it will take a lot of time to write the query for it, as the data is not in a shape that accommodates it.

- I can write this query, but it will likely be slow.

Example: when BandSquare used MongoDB, it was tremendously hard to write a query for “Give me the number of customers who have booked tickets for concerts of that organisation”: it required dozens of lines of code, with poor readability and terrible performance. In Datomic Datalog, the same question is easy to express.

Technical reasons*:

  • Datomic supports multiple paradigms for querying: logical/relational-like (via Datalog, which is as expressive as SQL), navigational/graph-like (via the Entity API), GraphQL-like (via the Pull API).
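To give a feel for the logical/relational style of querying, here is a minimal sketch in plain Python. It is not Datomic's API and not the query we actually run: it is a toy model in which the data is a set of small facts, and every attribute name and identifier is made up for the illustration.

```python
# Toy model: each fact is an (entity, attribute, value) triple.
# All attribute names and ids are hypothetical, chosen for this illustration.
facts = {
    ("org1", "org/name", "Acme Live"),
    ("concert1", "concert/organisation", "org1"),
    ("concert2", "concert/organisation", "org1"),
    ("concert3", "concert/organisation", "org2"),
    ("booking1", "booking/concert", "concert1"),
    ("booking1", "booking/customer", "alice"),
    ("booking2", "booking/concert", "concert2"),
    ("booking2", "booking/customer", "alice"),
    ("booking3", "booking/concert", "concert2"),
    ("booking3", "booking/customer", "bob"),
    ("booking4", "booking/concert", "concert3"),
    ("booking4", "booking/customer", "carol"),
}


def customers_of(org, facts):
    """Distinct customers who booked a ticket for any concert of `org`."""
    concerts = {e for (e, a, v) in facts
                if a == "concert/organisation" and v == org}
    bookings = {e for (e, a, v) in facts
                if a == "booking/concert" and v in concerts}
    return {v for (e, a, v) in facts
            if a == "booking/customer" and e in bookings}


customer_count = len(customers_of("org1", facts))
print(customer_count)
```

Each step follows one link of the question (organisation, then concerts, then bookings, then customers). A Datalog query states those same links declaratively and leaves the joining to the query engine.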

No data loss

With Datomic, it’s practically impossible to lose data by accident. You can always go back to every version of your data, and know when and how it evolved.

Consequences: it’s easy to debug and recover from human or programming errors.

  • If some of your data becomes lost or corrupted, you can always go back to the version where it was present and clean (without restoring a dozen backup files). You can know when (and usually how) the data went bad.

Technical reasons*: the database is a growing set of datoms, which only ever get added, not deleted. Database values provide an asOf() operation which yields a view of the database at any point in the past. Each write to the database is annotated with the time at which it happened, and optionally additional metadata (what user is at the origin of the transaction, etc. — see reified transactions).
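The idea can be sketched in a few lines of Python. This is a toy model under simplifying assumptions, not Datomic's implementation: the `Datom` fields mirror the general shape of a datom, while the `as_of` function and the sample data are hypothetical.

```python
from typing import NamedTuple


class Datom(NamedTuple):
    entity: str
    attribute: str
    value: object
    tx: int        # transaction number, increasing over time
    added: bool    # True = fact asserted, False = fact retracted


def as_of(log, tx):
    """Return the facts that held right after transaction `tx`."""
    current = set()
    for d in log:
        if d.tx > tx:
            break  # the log is ordered by tx, so later entries are ignored
        fact = (d.entity, d.attribute, d.value)
        if d.added:
            current.add(fact)
        else:
            current.discard(fact)
    return current


# The log only grows: a "change" is a retraction plus a new assertion.
log = [
    Datom("user1", "user/email", "ann@example.com", 1, True),
    Datom("user1", "user/email", "ann@example.com", 2, False),
    Datom("user1", "user/email", "oops@example.com", 2, True),
]

before_mistake = as_of(log, 1)
after_mistake = as_of(log, 2)
```

Because the erroneous write in transaction 2 did not overwrite anything, the earlier value is still there to be read, and the transaction number tells you when things went wrong.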

Straightforward, flexible data modeling

Data modeling with Datomic is straightforward, which means you don’t need to spend time wondering in what shape you should store your data.

Consequences: your system is easy to evolve. You need little anticipation of future needs to store data.

Technical reasons*:

  • The Universal Schema (see Appendix C) relieves the developer of many early-on architectural decisions they would have to make when using table or document-oriented storage (‘in what shape should I store this to query efficiently later?’, ‘should this column go in this table or should I make another table’, etc.).

Testing is cheap

Automated testing is an important, well-established best practice in software development. If you’re not testing your software, it’s most likely costing you dearly in fixing bugs, manual QA, and difficulties developing new features.

Database code has been traditionally difficult to test, because databases don’t lend themselves well to simulation. However, Datomic pretty uniquely supports speculative writes, which among other things makes testing Datomic-related code easy and efficient.

Consequences: testing your code is cheap, to the point that it’s always worth it, even in the short term. This results in a significant increase in quality and productivity, at a very low cost.

Technical reasons*:

  • Datomic has an in-memory implementation, which means your tests need little environmental setup.
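Here is a rough Python sketch of what a speculative write looks like from a test's point of view. It is a toy model, not Datomic's API: the database is an immutable set of facts, and `with_tx`, `register_user` and the attribute names are all hypothetical.

```python
def with_tx(db, new_facts):
    """Return a new database value that includes `new_facts`.

    `db` is a frozenset, so the original value is never modified.
    """
    return db | frozenset(new_facts)


def emails(db):
    """All email addresses known to the database value `db`."""
    return {v for (e, a, v) in db if a == "user/email"}


def register_user(db, user_id, email):
    """Code under test: computes the facts to write for a new user."""
    if email in emails(db):
        raise ValueError("email already taken")
    return [(user_id, "user/email", email)]


db = frozenset({("user1", "user/email", "ann@example.com")})

# "What would the database look like if we registered Bob?"
speculative = with_tx(db, register_user(db, "user2", "bob@example.com"))
```

The test can inspect `speculative` as if the write had happened, while `db` stays exactly as it was, so there is nothing to clean up between tests.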

Reproducibility

A key factor for solving bugs rapidly is your ability to reproduce the circumstances of a bug in your local environment. Datomic has 2 features that make it very easy:

  • you can obtain any past version of the database instantly (without needing a backup!)

Consequences: you can instantly reproduce your production environment locally, which makes for faster debugging. In addition, if you need to apply manual corrections to your data, you can safely dry-run patches before applying them in production.

Technical reasons*:

  • the ability to fork a database (see above)

Integrating other data systems

As a system grows, it usually requires adding new types of databases to satisfy diverse querying needs. For instance:

  • Maybe you need some business insights and want to add a data warehouse like BigQuery.

You usually have a ‘source of truth’ database, and several other ‘derived’ databases, at which point data synchronization between all those databases becomes an important issue. The key to data synchronization is the ability to answer ‘What changed?’ questions.

This is hard to do with most traditional databases (which are designed only to answer ‘What’s there?’ questions), but it’s trivial to do with Datomic thanks to its ‘log’ structure: if you want to know what changed since last Monday, you just read the log since last Monday!

Consequences: if you start a project with Datomic, it will be easy to send your data to other, complementary data systems in the future.

Technical reasons*:

  • The log-like structure of Datomic. The Log API literally gives you the exact changes, at the finest granularity, between 2 points in time.
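As an illustration, answering ‘What changed since last Monday?’ against a log boils down to a filter. The Python sketch below is a toy model with made-up data and a hypothetical `changes_since` function. It does not use the actual Log API.

```python
from datetime import datetime

# Each log entry: (transaction time, list of (entity, attribute, value, added)).
# The data and attribute names are hypothetical.
log = [
    (datetime(2017, 6, 9, 10, 0), [("order1", "order/status", "paid", True)]),
    (datetime(2017, 6, 13, 9, 30), [("order2", "order/status", "paid", True)]),
    (datetime(2017, 6, 14, 16, 45), [("order1", "order/status", "paid", False),
                                     ("order1", "order/status", "refunded", True)]),
]


def changes_since(log, since):
    """Flatten every change recorded at or after `since`, in log order."""
    return [change for (t, changes) in log if t >= since for change in changes]


last_monday = datetime(2017, 6, 12)
to_sync = changes_since(log, last_monday)
```

A synchronization job would then forward `to_sync` to the derived store (a search index, a data warehouse, etc.) and remember the last transaction it processed.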

Great performance characteristics for the most common use cases

It’s hard to describe the performance aspects of a database system in business terms, but they do become a business concern when the engineers spend their time coping with the database load instead of building new features, or when they have to compromise on the user experience of the product for database performance reasons.

Symptoms:

  • the whole website slows down or crashes when there’s a traffic spike

For the most common use cases, Datomic exhibits very interesting performance characteristics. By “the most common use cases”, I mean the ones for which 95% of systems are built (e-commerce, enterprise systems, content management, user management, etc.), most of which have similar database needs:

  • the users read (e.g. browse content) a lot more than they write (e.g. buy something, subscribe to a website, change their preferences in an app, etc.)

It turns out the architecture of Datomic makes it a great fit for those needs:

  • the reads can take a virtually infinite load (it’s only a matter of spinning up more machines)

Consequences: engineers don’t have to think much about the performance or operational aspects when building the system, and can focus on business features instead.

Technical reasons*:

  • Traditional databases store data in mutable cells, which forces them to use locking as a technique for coordinating reads and writes. In contrast, Datomic stores data in a single immutable persistent data structure, and this immutability has interesting implications for performance and scalability.

Community support

The Datomic community is small, but thanks to the enthusiasm of its members, it’s very responsive and welcoming. If you need help, you can just go to the Clojurians Slack, the Datomic mailing list, or StackOverflow, and you’ll get answers from experienced users, some of whom are very smart.

Consequences: your team gets expert advice for free.

Some of the hardest problems with databases are gone

This more philosophical section attempts to take a step back, reflecting on deeper database-related issues which undermine everyday software engineering.

Even though databases have existed for decades, data management is not a solved problem at all, as shown by the fact that mature and popular database systems like MySQL or PostgreSQL continue to issue releases with major new features.

There’s a handful of well-known, hard database-related problems that bite developers even in the most common use cases:

There’s no immediate solution for these problems, and I don’t believe there will ever be: they’re just inherent to the fundamental choices of traditional database systems. But these problems simply don’t arise with Datomic, precisely because Datomic makes different fundamental choices (see Appendix C).

Consequences: developers can focus on business logic, instead of fighting the incidental complexity of their database.

Technical reasons*: essentially, this is enabled by 3 fundamental choices of Datomic — immutable storage, non-remote querying, and the Universal Schema. These are capabilities with far-reaching consequences for databases, just like Garbage Collection has far-reaching consequences for programming languages. Backing up these claims with technical evidence would take us too far here, but here are some insightful references:

Drawbacks and limitations

I’ve painted a very pretty picture of Datomic so far, right? But I wouldn’t be intellectually honest with you if I told you to use it for any project. Knowing the limitations of a tool is just as important as appreciating the benefits you get from using it, so I’ll try to depict these too.

Human Resources: although Datomic is relatively easy to learn, it will still require a shift in mindset for people used to relational databases. Therefore, your software team should be willing to learn it, and use it for projects which require intensive development, not just occasional maintenance (in which case Datomic is probably overkill).

Big Data: Datomic is not a good fit for storing huge amounts of data. This does not mean a Big Data information system cannot use Datomic, only that it cannot use Datomic for everything. You’ll typically store critical data which is most often involved in business logic in Datomic, and the rest of it in complementary stores (S3, HDFS, Kafka, etc.). At BandSquare, we store at least 10x more data than what we use in Datomic!

Commercial database: any serious, business-critical production deployment of Datomic will need to buy the license. This is completely worth it for most companies (especially since you can start for free), but for some individual side-projects it may be more problematic. And of course, a commercial license is more constraining than open-source.

Infrastructure footprint: a minimal deployment of Datomic requires at least a few gigabytes of RAM. Hosted on the cloud, it will cost from a few tens to a few hundreds of dollars per month, which can be problematic for small side-projects.

Experience report: Datomic at BandSquare

So far, I’ve only described abstract general properties of Datomic; here’s how these have translated to building BandSquare’s platform, in concrete facts and numbers.

We started building a first version of our product in early 2014 using a stack that had a lot of hype among startups at the time: NodeJs and MongoDB. Over the course of 18 months, our business focus shifted from B2C and user-centric to B2B and data-centric, and our technical requirements evolved from small website prototypes to a whole data management platform providing advanced analytics and data visualization in addition to a richer set of consumer-facing features, with much more sophisticated business logic. All of this was done by a two-developer team.

At the end of 2015, our technical situation was very difficult:

  • we spent 30 to 50% of our time fixing bugs, mostly because of the absence of tests and the difficulty of reproducing them

So we decided to migrate our server-side code from NodeJs and MongoDB to Clojure and Datomic. In 4 weeks, 13k lines of hacky, untested JavaScript code turned into 9k lines of well-tested Clojure code (plus 3k lines of tests).

After migrating, our situation drastically improved, mostly because of the above-mentioned properties of Datomic (although Clojure’s interactive development story also played a significant role in increasing productivity and quality):

  • the time we spend fixing bugs has gone down below 5%

Some productivity numbers:

  • Implementing a simplified Google Forms-style surveys system, with visualization and exports of results: 7 days, 600 LoC

Summary

Datomic has a combination of special features which offer a lot of leverage to developer teams: productivity, ease of debugging, ease of testing, auditability, extensibility… but my favorite feature of Datomic is that I don’t need to think a lot about it when I program: it just lets me focus without getting in the way:

  • When I write data, I don’t need to anticipate how I’ll query it or how its schema will change;

Once you get used to them, these capabilities seem like a given, because that’s what databases should feel like. I only realize how spoiled we are when going back to old-school databases, or seeing other teams struggle with them. If you wonder how impactful this is, there’s a similar historical precedent: ask experienced developers what it was like to move from code in files to version control systems.

Appendix A: What do we expect of a database system?

A database system provides 2 primary operations:

  • storing data, which means not only ensuring that what you store is saved durably, but also that the data you save is correct according to some business rules. Also called writing or persisting.

Appendix B: What’s the use case for Datomic?

The short answer is: the most common use case of databases in IT, the one for which people use MySQL, PostgreSQL or Oracle. Don’t use it for applications that only require very ‘dumb’ storage or for quick prototyping: that would be overkill!

Appendix C: What makes Datomic different?

Datomic is one of the very few pieces of technology I’d call revolutionary — trouble is, I have never met a database vendor who doesn’t call her product revolutionary. Here are some tangible elements that make Datomic stand out among database systems.

Datomic has 2 fundamental differences compared to mainstream database systems.

The first difference lies in the way Datomic stores information. Most databases work essentially like a slate, where a new piece of information is added by finding a place to write it, oftentimes by erasing an older piece of information. In contrast, Datomic works like a log, in which every new piece of information is appended without touching the information that was previously written. (By log, I don’t mean a text file written by a web server; I mean the kind of log sailors write during a voyage to record what happens every day.) Developers will refer to this as Datomic being ‘immutable’, ‘accumulate-only’, or having the ‘Database as a value’ property.

The second difference lies in the way Datomic represents information. Whereas a mainstream database stores its information in tables or documents of various shapes, Datomic only represents information in the form of small units of data of similar shapes, called datoms, which represent facts. This uniformity of data representation is called the Universal Schema.
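For technical readers, here is a small Python sketch of what “small units of data of similar shapes” means in practice. It is a simplified illustration, not Datomic code: real datoms also carry transaction information, and the `to_datoms` function and attribute names are hypothetical.

```python
def to_datoms(entity_id, record):
    """Flatten a record into (entity, attribute, value) facts."""
    datoms = []
    for attribute, value in record.items():
        # A multi-valued attribute becomes one fact per value.
        values = value if isinstance(value, list) else [value]
        for v in values:
            datoms.append((entity_id, attribute, v))
    return datoms


# A document-shaped record, as a document store might hold it.
concert = {
    "concert/title": "Summer Night",
    "concert/city": "Paris",
    "concert/artists": ["artist1", "artist2"],
}

datoms = to_datoms("concert1", concert)

# Adding a new kind of information later is just more facts of the same shape.
datoms.append(("concert1", "concert/sold-out?", False))
```

Whatever the original shape of the information (a row, a document, a link between two things), it ends up as facts of the same form, which is why the storage shape does not have to be decided upfront.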

It’s not obvious why these two fundamental characteristics of Datomic are useful, but they’re actually enablers for other, more desirable properties (listed above). This means that other database systems that haven’t made these fundamental choices cannot achieve these desirable properties, no matter how many millions of engineering hours have been spent on them.

Appendix D: Who’s telling you this?

When it comes to choosing technologies, you should only ever listen to comparisons drawn by people who have given a fair try to each alternative. Hopefully this section will convince you I’m one of those :)

The bulk of my experience in IT comes from my job as CTO of BandSquare, in which I’ve had the chance to tackle a relatively wide spectrum of technical problems (from UI to web application backend to data analysis) using a variety of software stacks through several versions of the product: Scala/Play, NodeJs, Clojure, MongoDB, ElasticSearch, PostgreSQL, and of course Datomic. Prior to that, I’d been programming software for a few years through a variety of IT internships, side projects and school projects, during which I had the chance to use Java, JavaEE, and bits of Ruby on Rails and Python, backed by SQL Server, MySQL, and PostgreSQL. You may find me on the web on GitHub, StackOverflow, and LinkedIn.

Note that I’m not affiliated with Cognitect (the company stewarding Datomic) in any way, except by being a Datomic customer (still on the Datomic Starter free plan, I’m a bit embarrassed to admit).

Thanks to Chloé Julien, Baptiste Dupuch, Pauline Vialatte, Nathan Skrzypczak, and Benoit Cotte for helping me on drafts of this post.