How DeltaCrdt can help you write distributed Elixir applications

Earlier this year I published my new library, Horde. Horde provides two things: a distributed supervisor, and a distributed registry, and you can read about it in the introductory blog post I wrote at the time. Today I am going to write about the library that Horde is built on: DeltaCrdt (official documentation).


What is a CRDT?

CRDT stands for Conflict-free Replicated Data Type. Working from the inside out, we can dissect the meaning of this acronym. “Data Type”; so it’s got something to do with data. “Replicated”; so it’s about replicating data, aka, distributing data. So we can see that it’s a distributed data structure. “Conflict-free”; there won’t be any conflicts to resolve (in other words, the CRDT will resolve these conflicts for us).

A short example is in order (AWLWWMap stands for “Add-wins last-write-wins map”):

iex> {:ok, crdt1} = DeltaCrdt.start_link(DeltaCrdt.AWLWWMap)
iex> {:ok, crdt2} = DeltaCrdt.start_link(DeltaCrdt.AWLWWMap)
iex> DeltaCrdt.add_neighbours(crdt1, [crdt2])
iex> DeltaCrdt.read(crdt1)
%{}
iex> DeltaCrdt.mutate(crdt1, :add, ["CRDT", "is magic!"])
iex> DeltaCrdt.mutate(crdt2, :add, ["CRDT", "is very magical!"])
iex> DeltaCrdt.read(crdt1)
%{"CRDT" => "is very magical!"}
iex> DeltaCrdt.read(crdt2)
%{"CRDT" => "is very magical!"}

In this example, we have started two instances of DeltaCrdt and made them aware of each other. The initial value of the crdt is simply an empty map. Then we generated a conflict (writing both “is magic!” and “is very magical!” to the key “CRDT”), which was resolved automatically by the CRDT. In this case, the CRDT being used was an “Add-wins last-write-wins map”, so the last write, in this case “is very magical!” has won.

A CRDT is just a data-type, however. The other half of the puzzle is an anti-entropy algorithm, which is responsible for ensuring convergence. DeltaCRDT also implements anti-entropy (you can read about the details in the documentation).

Conflict-resolution is the central feature of CRDTs. CRDTs (in combination with an anti-entropy algorithm) are eventually guaranteed to converge globally, meaning that in the absence of network partitions, the data will eventually be the same at every replica.

What is a delta CRDT then?

To understand what a delta CRDT is, first we have to talk about the history of CRDTs. The first CRDTs were what we now call “state CRDTs”. In order to sync their state, it was necessary to send the entire state of the CRDT to its neighbours. The downside is of course that this becomes more and more expensive as the size of the CRDT grows. This was a huge limitation on their usefulness. That’s where delta CRDTs come in. A delta CRDT only has to send the delta of the state (aka, just the mutations) to its neighbours. This is incredibly efficient compared to state CRDTs, and it’s the technique that’s used in the DeltaCrdt library.

What can it be used for?

Hopefully at this point you can see the utility of having a distributed data structure that is guaranteed to converge globally. If not, here are a few examples:

Distributed registry

If you want to address processes globally (aka, there should be one process per key in your whole cluster, not just on a single node), then you could build a distributed process registry. This could be useful for IOT, for example, if you want to have just one process in charge of a particular hardware device in the field. This is what Horde.Registry does.

Distributed supervisor

A distributed supervisor could be handy for spreading load over multiple nodes in a cluster, and also for automatic failover. Say for example a node in your cluster dies, then other nodes could automatically take over the failed node’s processes. It can also be used to make your application geo-redundant, to the extent that your application can still use Erlang clustering. This is what Horde.Supervisor does.

Distributed circuit breaker

If you consume an API at scale, it might be useful to have a distributed circuit breaker at your disposal. If the API goes down, you can short circuit new requests until the API comes back up (and not spam it with requests which might slow down service recovery).

Distributed rate limiter

If you consume an API at scale, you probably also have a rate limit associated with that API. You could leverage DeltaCrdt to build a distributed rate limiter to control your API consumption on a cluster-wide basis.

Distributed database

You could build your own distributed version of Redis with DeltaCrdt. In fact, if you read the papers that DeltaCrdt is based on (they are linked in the documentation) then you’ll find references especially to Riak, where a lot of these ideas were pioneered.

Distributed cache

Now I’m starting to just name variations of previous ideas. The point is, there is a lot that you could conceivably build with DeltaCrdt.

Conclusion

DeltaCrdt is available, and I’ve spent some time recently bringing the API up to a more usable level, and improving the documentation. I would like to see DeltaCrdt being used in more libraries. It’s got a relatively modest 1100+ downloads on hex.pm, so it’s being used, and therefore continually being tested, somewhere (in addition to in my own projects).

As a final note, as a library author, we have very little information to go on when it comes to gauging the impact of our libraries. If you are using DeltaCrdt, please drop me a line, I’d be very interested to hear how you’re using it!