Promising Further Work in CRDTs

Gordon Guthrie
Aug 14, 2017


This is a continuation of Part 1, Field notes on modelling relational data with CRDTs, part of my gies-a-job series post-Basho.

Relationships to other work

Let us look at how the Big Sets of Afrika are co-located on disk. As we walk out from the left-hand key and project a new table join, we build a rolling composite key that grows across all the joins.

The importance of the composite key is that all Primary Keys for all tables contain the left-hand key — and therefore any SQL actions on the tables that reference the composite key can be hashed to the ring and directed to the right place where the data lies:

Projected joins and key order of rows on physical disk
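To make that concrete, here is a wee sketch (in TypeScript, and emphatically not Afrika's actual implementation) of how a rolling composite key might carry the left-hand key through each projected join, and how hashing only on that left-hand key keeps all the joined rows on the same partition. The ring size and key names are invented for illustration:

```typescript
// A minimal sketch of composite keys hashed to a ring by their left-hand key.
import { createHash } from "crypto";

const RING_SIZE = 64; // hypothetical number of partitions

// Build a composite key by appending each join's local key to the prefix.
function compositeKey(leftHandKey: string, ...joinKeys: string[]): string[] {
  return [leftHandKey, ...joinKeys];
}

// Hash only the left-hand key (the first element) onto the ring.
function partitionFor(key: string[]): number {
  const digest = createHash("sha1").update(key[0]).digest();
  return digest.readUInt32BE(0) % RING_SIZE;
}

// All rows that share a left-hand key land on the same partition,
// so a join rooted at that key never has to leave the vnode.
const customer = compositeKey("customer:123");
const account  = compositeKey("customer:123", "account:9");
const txn      = compositeKey("customer:123", "account:9", "txn:2017-08-14");

console.log(partitionFor(customer), partitionFor(account), partitionFor(txn));
// => the same partition number three times
```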

What is potentially interesting about this from a clustered perspective is that we can reverse the keys-on-disk — which preserves the logical meaning but gives different physical characteristics and projects the tables out:

Key order after composite key inversion
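Again as a hypothetical sketch: inverting the composite key does not change which row it identifies, only where that row sorts on disk, so neighbouring rows stop being "everything about one customer" and start being "everything in one table":

```typescript
// Hypothetical sketch: inverting a composite key keeps its logical identity
// but changes its physical sort position on disk.
function invert(key: string[]): string[] {
  return [...key].reverse();
}

const keys = [
  ["customer:123"],
  ["customer:123", "account:9"],
  ["customer:456"],
  ["customer:456", "account:7"],
];

// Normal order clusters everything about one customer together (joined reads)...
const joinedReadOrder = [...keys].sort((a, b) => a.join("/").localeCompare(b.join("/")));

// ...inverted order clusters rows of the same table together (table scans).
const tableScanOrder = keys.map(invert).sort((a, b) => a.join("/").localeCompare(b.join("/")));

console.log(joinedReadOrder);
console.log(tableScanOrder);
```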

The attraction of key inversion is that a high-availability cluster can be built where one side of the cluster (here shown as a multi-data-centre cluster, but it could be the same site) is optimised for joined reads and the other for table scans:

Key inversion and cluster layout

Fruitful areas for research

There are a number of other fruitful areas of research that I think CRDT boffins could look at.

These are:

  • eventually consistent indices
  • eventually consistent joins
  • client-side CRDTs
  • ledger-based data models

Eventually consistent indices

The first is whether or not deltas to Big Set or Big Map CRDTs can be repurposed. It seems a reasonable conjecture that a delta representing the addition or deletion of an entry in a set, or the updating of a field in a map, could also be used to update the state of a different CRDT that represents an index back to the Big Map/Big Set CRDT.

To understand why this matters, let's look at how 2i (secondary) indexes work in Riak right now.

Atomic write of value and key

The value is written in the normal way, and a 2i entry is written in a separate keyspace on disk. This approach means that there is a local index on each vnode in the ring, and doing an index read then requires a query that covers the whole ring, asking each local index in turn.

It would be nice to have a single index for a record hashed on the ring by grabbing deltas from CRDTs and repurposing them:

Conjecture about re-using deltas to build and maintain eventually consistent indices

These indices could then be read by simply going to the vnode they are stored on and doing a read.
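As a sketch of the conjecture (the types and function names here are invented, not Riak's API): the same delta that mutates the Big Map could be re-keyed by the field value it carries and applied to a second CRDT that maps that value back to the primary keys. It is the re-keying by value that lets the index live on one vnode:

```typescript
// Conjecture sketch: repurpose Big Map deltas to maintain an index CRDT.
type Delta =
  | { op: "upsert"; pk: string; field: string; value: string }
  | { op: "remove"; pk: string; field: string; value: string };

// The index: field/value pair -> set of primary keys.
// In a real design each entry would itself need to be a CRDT (e.g. an OR-Set)
// so that concurrent adds and removes converge.
const index = new Map<string, Set<string>>();

function applyToIndex(d: Delta): void {
  const indexKey = `${d.field}=${d.value}`; // this is what gets hashed to the ring
  if (!index.has(indexKey)) index.set(indexKey, new Set());
  const entry = index.get(indexKey)!;
  if (d.op === "upsert") entry.add(d.pk);
  else entry.delete(d.pk);
}

// The delta produced by writing the object is also applied to the index,
// possibly asynchronously, which is what makes the index *eventually* consistent.
applyToIndex({ op: "upsert", pk: "person:42", field: "surname", value: "Guthrie" });
console.log(index.get("surname=Guthrie")); // Set { "person:42" }
```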

Proving that such a beast exists is, of course, another thing. There are also operational issues with such indices. Take the degenerate case: a boolean field which can be set to true or false.

If you had a million entries across a ring of 50 nodes, each node would have 20,000 keys. But the index would be on two nodes (the partition key would be based on the value written, not the key), each with 500,000 entries.

Clearly, as the cardinality of the indexed values increases above the number of nodes, the ring will eventually become balanced (assuming an even distribution of index values across the set). Preventing clusters from hot-spotting with eventually consistent indexes will be non-trivial.
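A back-of-an-envelope sketch of why cardinality is the problem: because index entries are hashed by the value written, the number of distinct values caps how many partitions the index can spread over (assuming evenly distributed values):

```typescript
// Rough model of index hot-spotting: the index can occupy at most
// min(cardinality, nodes) nodes, each holding roughly this many entries.
function indexEntriesPerHotNode(totalKeys: number, nodes: number, cardinality: number): number {
  return Math.ceil(totalKeys / Math.min(cardinality, nodes));
}

console.log(indexEntriesPerHotNode(1_000_000, 50, 2));   // boolean field: 500,000 per node
console.log(indexEntriesPerHotNode(1_000_000, 50, 50));  // cardinality matches node count: 20,000
console.log(indexEntriesPerHotNode(1_000_000, 50, 500)); // high cardinality: still 20,000 per node
```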

There are other issues associated with building indices that would also need to be solved, such as maintaining histograms for them, to provide the heuristics a query rewriter needs to decide on query rewriting paths.

Eventually consistent joins

The second fruitful area of research would be in establishing whether eventually consistent joins could be modelled.

The proposed projected joins of Afrika are very useful but have their limitations. It turns out that it is not possible to reduce all useful data models to a single left-hand key (even with the most generous application of enumerated data types to represent static data). You can build quite complex trees, but consider a retail bank (as discussed earlier with respect to spreadsheet models). There are two major left-hand keys:

Major left-hand key ‘families’ in banking data structures

In this world it would be great to be able to represent eventually consistent joins. The best starting place would probably be a pseudo pig's-ear joined pair, like this representation of a social network:

Self-referential joins — very important in a lot of common application architectures
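A toy sketch of why the pig's-ear case resists a single left-hand key: both sides of the Follows relationship are Person rows, so answering "who does Ada follow?" needs data that hashes to two different left-hand keys (all names here are invented):

```typescript
// Self-referential ("pig's ear") join: Person joined to itself through Follows.
interface Person { id: string; name: string }
interface Follows { follower: string; followee: string } // both are Person ids

const people: Person[] = [
  { id: "p:1", name: "Ada" },
  { id: "p:2", name: "Grace" },
];
const follows: Follows[] = [{ follower: "p:1", followee: "p:2" }];

// "Who does Ada follow?" touches rows keyed under p:1 *and* rows keyed under p:2,
// so the join cannot be co-located under one left-hand key.
const adaFollows = follows
  .filter(f => f.follower === "p:1")
  .map(f => people.find(p => p.id === f.followee)?.name);
console.log(adaFollows); // [ "Grace" ]
```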

Client-side CRDTs

CRDTs solve the problems of partition, that is, of the network breaking. It stands to reason to deploy them at the point where the network breaks the most: the client-to-server network. The current focus has been on server-to-server partitions, which are orders of magnitude rarer.

With delta-based CRDTs that are mapped to SQL, a client-side implementation that manages its data via SQLite tables seems obvious. It would need to be a SQL-CRDT-Map-SQL sandwich, using the existing storage mechanism to stash the Big Maps/Big Sets and their vector clocks.

The problem with client-side CRDTs has always been one of actor explosion, but there is promising research which might be able to solve that.

The obvious thing would be a JavaScript implementation of Big Sets/Big Maps delta CRDTs.
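To give a flavour of what a client-side delta CRDT looks like, here is a minimal grow-only set in TypeScript. It is far simpler than Big Sets (no removes, no vector clocks, no actor management) but it shows the shape: mutations return deltas, and joining deltas is commutative, associative and idempotent, so they can be stashed locally and shipped whenever the network comes back:

```typescript
// A minimal delta-state CRDT: a grow-only set whose mutations return deltas.
type GSetDelta<T> = Set<T>;

class GSet<T> {
  private elements = new Set<T>();

  // Mutate locally and return the delta to be sent or stored.
  add(value: T): GSetDelta<T> {
    this.elements.add(value);
    return new Set([value]);
  }

  // Join is set union: deltas can arrive late, duplicated or out of order
  // and every replica still converges to the same state.
  join(delta: GSetDelta<T>): void {
    for (const v of delta) this.elements.add(v);
  }

  values(): T[] {
    return [...this.elements];
  }
}

// Client and server replicas converge whatever order the deltas arrive in.
const client = new GSet<string>();
const server = new GSet<string>();
const d1 = client.add("draft saved offline");
const d2 = client.add("second edit");
server.join(d2);
server.join(d1);
server.join(d1); // duplicate delivery is harmless
console.log(server.values());
```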

There is a strong commercial reason for doing this. Mega-corporations like Google can avoid the problems of partition and eventual consistency via super-expensive hardware solutions like Spanner, which uses atomic clocks in data centres and expensive purpose-built networks to distribute synchronised time.

This is not a solution available to users of open-source databases (like, but not confined to, Riak). Moreover, the Spanner approach does not extend to the major source of network partition: the client/server link.

Building solutions that solve that makes considerable sense from a market perspective.

Ledger-based data models

It seems to me that ledger-based data modelling (on top of CRDTs) is the way forward.

Ledger-based data models eschew standard CRUD (Create, Read, Update, Delete) in favour of CAR (Create, Append, Read). To ‘create’ a person, an entry is made in the Person table and an associated entry is made in the joined Status table saying ‘created’. To ‘delete’ the person, a second entry is appended to the Status table saying ‘deleted’. Data is always immutable.

Let us see how this works in principle. Creation is a 2-table operation:

Create a new person

And then we can do pseudo-deletes by append:

Pseudo-delete by append

This process is repeated for all tables: each record is represented by timestamped ledger lines, the current state is the most recent of them, and all actions get audit and rollback automatically.
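Here is a minimal sketch of CAR over a pair of ledger tables (the table and field names are invented for illustration): create and delete are both appends, and a read just resolves the latest status entry:

```typescript
// Create/Append/Read over ledger tables: rows are never mutated, only appended.
interface PersonRow { id: string; name: string }
interface StatusRow { personId: string; status: "created" | "deleted"; at: number }

const personLedger: PersonRow[] = [];
const statusLedger: StatusRow[] = [];

// "Create" is a two-table append.
function createPerson(id: string, name: string, at: number): void {
  personLedger.push({ id, name });
  statusLedger.push({ personId: id, status: "created", at });
}

// "Delete" is another append, never a mutation: audit and rollback come free.
function deletePerson(id: string, at: number): void {
  statusLedger.push({ personId: id, status: "deleted", at });
}

// "Read" resolves the most recent status entry for the key.
function currentStatus(id: string): StatusRow | undefined {
  return statusLedger
    .filter(s => s.personId === id)
    .sort((a, b) => b.at - a.at)[0];
}

createPerson("person:1", "Gordon", 1);
deletePerson("person:1", 2);
console.log(currentStatus("person:1")?.status); // "deleted"
```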

Essentially the core requirement of CRDTs for ledger-based systems is that appends of records to sets during partition can be resolved deterministically. All actions on ledger-based data are idempotent and reversible.

If we are to migrate CRDTs up from the base-level implementation into the mechanisms of our databases, we should develop data modelling practices at a higher level of abstraction, where the application developer lives, that make maximum use of the powers of CRDTs.

It is worth remembering that the point of CRDTs is to make developers' lives easier; good data modelling practices are part of our world of concern as well.

In conclusion (and gies-a-job)

For those of you struggling with the pain of CRDTs and causality I leave you with poor Professor Lamport, of the causal front-ear, and his bad case of the CRDTs…

Gies a job. Ping me on twitter @gordonguthrie

