Eventsourcing + PostgreSQL = ♥

Yurii Rashkovskii
Eventsourcing Publications
4 min read · Jun 29, 2016


The upcoming 0.4 release of Eventsourcing for Java features a new, well-known storage backend: PostgreSQL. Until now, the only options were in-memory and H2/MVStore (embedded file-based storage). While those are great for smaller projects, they are hard to get right for bigger, production-level applications: it’s harder to debug the contents of such storage (no tooling), harder to back it up and replicate it, and harder to build a multi-node setup with it.

So, a new storage was necessary, and it’s no wonder that PostgreSQL was chosen: it offers some really nice features that I will highlight in this article. Without them, the storage layout I ended up with wouldn’t be nearly as simple.

Until now, all storages used the binary serialization layout described in RFC 2/BES, but sticking that into PostgreSQL isn’t particularly exciting. It would be easy enough to do, and it would work, but debugging would still be hard. I know this for a fact: an earlier incarnation of Eventsourcing did exactly that.

So, with a hard constraint of making debugging the data layer fun again, the following approach was developed.

A table per entity type

Even though the journal is an append-only structure, it doesn’t have to be a single list of journalled entities. Getting rid of a single “entities” table brings a number of interesting benefits. We can instead see the journal as a set of lists that we append to. In fact, we’ll give every entity type (every command and every event) its own table.

The first benefit is that iterating over an entity type is now a trivial scan through its table. Even though end-users rarely use this directly, it’s heavily used by the indexing subsystem in order to index data.

The second benefit is that it enables a fixed table layout: every entity layout has a fixed set of properties, so if we map Eventsourcing data types to PostgreSQL types, querying entities becomes a trivial job.

A table per version of the entity type

Eventsourcing uses a strict versioning policy for entities: whatever changes the essentials of an entity’s layout (name, property set, property type, property name), changes its version (represented by a SHA1 hash, or a fingerprint, see RFC 1/ELF).

This is a great fit for the mapping we’ve chosen. Since the properties of different layout versions naturally differ, we have to use a different table structure to represent each version, and that’s exactly what eventsourcing-postgresql does:

These table names use the layout hash, which is not a particularly user-friendly name. However, if you scroll the Gist to the right, you will see that we stick a human-readable COMMENT onto each table, so mapping them back to entity names is not difficult.
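The original Gist isn’t reproduced here, but such a generated table might look roughly like this. Note that the layout hash, entity name, and column types below are made up for illustration; the real DDL is generated by eventsourcing-postgresql:

```sql
-- Hypothetical table for one version of a "UserCreated" event layout.
-- The table name embeds the layout's SHA-1 fingerprint (RFC 1/ELF).
CREATE TABLE layout_v1_b6a9f1f34d4e2f8f9c0a1d2e3f4a5b6c7d8e9f00 (
    uuid        UUID PRIMARY KEY,
    email       TEXT,
    "timestamp" BIGINT
);

-- The COMMENT maps the opaque hash back to a human-readable entity name:
COMMENT ON TABLE layout_v1_b6a9f1f34d4e2f8f9c0a1d2e3f4a5b6c7d8e9f00
    IS 'com.example.events.UserCreated';
```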

Using composite types

Mapping basic types to PostgreSQL is rather straightforward and not of great interest. There’s a more interesting case, though: what should we do with entities that contain “object” properties? Even one of Eventsourcing’s basic structures, Timestamp, is an object, and it’s present in every single entity.

Luckily, PostgreSQL has support for what it calls a composite type: something resembling a record, or a table row as a value.

Eventsourcing creates a composite type for every “object” type used by an entity in the system and sticks a human-readable COMMENT onto it. With that in place, we can easily query every record in the table:
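As a sketch of what such a generated type could look like (the type and field names here are assumptions, not the library’s actual generated names):

```sql
-- Hypothetical composite type for the hybrid logical clock Timestamp.
CREATE TYPE hlc_timestamp AS (
    logical_time    BIGINT,
    logical_counter BIGINT
);

-- The COMMENT maps it back to the Java type it represents:
COMMENT ON TYPE hlc_timestamp IS 'com.eventsourcing.hlc.HybridTimestamp';
```

A column of this type can then appear in every entity table and be selected like any other column.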

If you were inquisitive enough to peek at the right side of this Gist, you might have noticed that timestamps look like (0,-2657660719434565965): the composite type for the hybrid logical clock Timestamp. That’s not particularly readable (or even comparable), right? So there’s a helper function that makes them human-readable, too:
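The shipped helper isn’t reproduced here, but as a sketch, such a function only needs to unpack the composite’s fields (the type and field names are assumptions, and the actual decoding of the logical time in the real helper will be more involved):

```sql
-- Illustrative only: render the composite timestamp as text.
CREATE OR REPLACE FUNCTION readable_timestamp(t hlc_timestamp) RETURNS TEXT AS $$
    -- Field access on a composite value uses the (value).field syntax.
    SELECT (t).logical_time::TEXT || ':' || (t).logical_counter::TEXT;
$$ LANGUAGE SQL IMMUTABLE;

SELECT readable_timestamp("timestamp") FROM some_event_table;
```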

Using array types

Another interesting feature of PostgreSQL is the array type. Let’s inspect the CREATE TYPE statement behind the test column’s type in the example above:
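That statement isn’t embedded here, but its relevant part would look something like this (the type and field names are illustrative):

```sql
-- A composite type whose field is a two-dimensional TEXT array.
-- Note that PostgreSQL does not enforce array dimensionality in
-- declarations, so TEXT[][] and TEXT[] name the same underlying type.
CREATE TYPE test_value AS (
    list TEXT[][]
);
```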

We can see that the list column is actually an array of arrays of TEXT, so we can easily use it in our queries as well:
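For example (the table and column names here are assumptions), elements can be addressed with PostgreSQL’s 1-based subscript syntax, or the array can be flattened with unnest:

```sql
-- First element of the first inner array:
SELECT (test).list[1][1] FROM some_event_table;

-- Flatten all elements into a set of rows:
SELECT unnest((test).list) FROM some_event_table;
```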

While PostgreSQL itself places no limit on the number of elements in an array (besides the limit on the field size), the limitations of JDBC and PostgreSQL’s JDBC driver cap how many elements we can insert. I still haven’t figured out how to create a non-primitive-type array (think composite types) through JDBC, so instead, matching INSERT statements are generated to accommodate lists, which puts the upper limit at 32K elements.

Not that this is a small number, and most entities should probably never come close to it, but it would be interesting to improve this part and lift the limitation.

Performance

At the time of this writing, only very minimal profiling has been done and the performance isn’t yet very exciting: up to 1,200–1,500 ops/sec on my MacBook Pro Retina. But it’s good enough as a start.

One unexplored angle is enabling concurrent journalling (nothing wrong with that, since we only ever insert and never update); another is simply more profiling time: the very minimal profiling performed so far roughly doubled the performance.

Future

It’d be really great to get a PostgreSQL index engine developed for Eventsourcing to supplement the in-memory and MVStore/H2 ones we have. But even without it, the delivery of this storage backend is definitely great news for application developers; the introspection capabilities alone are worth it.

Pending a few more features and some fixes (especially in third-party libraries), I will be releasing 0.4. But even before that, feel free to grab the latest snapshot (they are published on every commit) and play with it. We also have a small group on IRC (#eventsourcing on FreeNode) and Gitter, where many of your questions can be addressed.
