Conflict-free replicated data types
How to implement eventual consistency with strong guarantees
A key architectural consideration of cloud services is a performant approach to data replication as online services rely on geographic caches to maintain low latency, and must define a standard approach to consistency between nodes. Cloud services are typically specified to operate with high availability, fault tolerance, and low latency. Multiple copies of the same object are stored on different machines, and a backup replica must be available to deliver the same content in case of a primary dying. This means that replicas across a wide area network must be either programmed or mathematically bound to maintain consistency over time.
The most widely utilized approach to eventual consistency is based on the partial quorum approach as implemented in Amazon Dynamo, Basho Riak, LinkedIn Voldemort, and Apache Cassandra. However, probabilistic synchronization may introduce data inconsistencies which require programmers to write distributed applications to handle error cases in replica sets. As an alternative, conflict-free replicated data types have been proposed as an approach to guarantee consistency without additional synchronization mechanisms, while maintaining the latency and replication advantages of the probabalistic eventual consistency model. CRDTs are based on commutativity of operations, allowing replicas to apply operations in different orders with a strong guarantee replicas will converge
There are a variety of approaches in implementing replicated data types, with performance tradeoffs dependent on system access patterns(1). We first introduce basic considerations of the eventual consistency model before discussing concerns specific to CRDT.
Peer-to-peer vs Client-Server replication
In peer-to-peer systems every node shares the same interface and can be accessed to read or update values by any client. Alternatively, client-server replication has a primary replica which is responsible for defining order of operations on the replicas. There are two distinct design tradeoffs to these approaches to replication:
- Pessimistic (b) vs. Optimistic (a) replication: optimistic approaches are appropriate where stale data is less of a concern due to inconsistency latency in peer-to-peer writes, whereas pessimistic approaches may choose to implement a fully transactional client-server approach
- Active (b) vs. Passive (a) replication: passive approaches are appropriate where write time is a consideration as peer-to-peer writes will not lock, whereas active approaches may introduce bottlenecks in transactional writes
Operation Based vs. Stated Based replication
There are two fundamental methods to propagate updates amongst replicas. In state-based replication, all updates contain the full object state as in the above diagram, where each horizontal line represents an independent replica set, and arrows represent when updates between replica sets occur as time evolves towards the right of the axis.
Alternatively, in operation based replication, updates contain functional operations that modify the object. This approach is appropriate where transmitting the full object state introduces a large overhead in message size.
We now move on to properties specific to CRDTs, which require specified bounds for operations to maintain consistency. In order for a set to converge on the same value in an environment where replicas only communicate occasionally, the operations need to be order-independent and insensitive to message duplication and redelivery(2). Operations need to be:
- Associative (a+(b+c)=(a+b)+c), so that grouping doesn’t matter
- Commutative (a+b=b+a), so that order of application doesn’t matter
- Idempotent (a+a=a), so that duplication does not matter
In mathematics, these structures are known as join or meet semilattices. Conflict-free data types have been derived from the CALM conjecture, which states that logically monotonic programs are guaranteed to be eventually consistent. These are in contrast to non-monotonic logics, in which assertions made utilizing partial information may be invalidated by new knowledge.
Counters intuitively follow the bounds of semilattice logic, and for registers and lattices we introduce simple timelines to illustrate the consistency model.
- Grow-only counter (merge = max(values); payload = single integer)
- Positive-negative counter (consists of two grow counters, one for increments and another for decrements)
- Last Write Wins -register (timestamps or version numbers; merge = max(ts); payload = blob)
- Multi-valued -register (vector clocks; merge = take both)
- Grow-only set (merge = union(items); payload = set; no removal)
- Two-phase set (consists of two sets, one for adding, and another for removing; elements can be added once and removed once)
- Unique set (an optimized version of the two-phase set)
- Last write wins set (merge = max(ts); payload = set)
- Positive-negative set (consists of one PN-counter per set item)
- Observed-remove set
The intuition behind the OR-set is to tag each element added uniquely, without exposing the tags in the interface. This allows for the identification and removal of the elements specific to a given call. The OR-set is appropriate for the implementation of most set operations of the quorum pattern of eventual consistency, as well as additional use cases that may require convergence with bounded security constraints as provided by CRDTs, including shopping carts and e-commerce.
Conflict-free replicated data types have a variety of applications including performing computation in delay-tolerant networks, introducing latency tolerance while remaining consistency in wide-area networks, implementing distributed data aggregation, and architecting partition-tolerant cloud computing. Here we consider CRDTs on a directed acyclic graph, which is a common design pattern utilized in many implementations of distributed computation.
In the general case, CRDTs cannot support utilizing a DAG as a set with edges, a pair of sets (V,E) such that E ⊆ V ×V, due to the possibility of introducing edges that create cycles in a given replication set. However, we can define an add-only monotonic DAG defined with left and right endpoints ⊢ and ⊣, such that edges can only be added between vertices proceeding right along the horizontal axis. We know the monotonic DAG is a CRDT because adding edges concerns either different vertices, in which case execution is independent, or the same vertex, in which case execution is idempotent.
Extending monotonic DAG operators to edge removal is difficult due to resolving inconsistencies across replication sets. Giving precedence to edge removal allows merging replica sets through the use of representing removed verticies as tombstones, and resolving edges added to dead vertices as null operation. Below we see an example of a set of edge additions and removals that can never be satisfied, due to pre-conditions of edge removal never able to become satisfied between replica sets.
In summary, conflict-free replicated data types are a specific class of the probabilistic eventual consistency that provide stricter conditions on convergence, and eliminate the possibility of anomalies due to inconsistent operation. While utilizing CRDTs may require additional time upfront in defining application usage patterns, for use cases where data consistency and security are of paramount importance they may provide an interface that is more transparent to extend and maintain than more general consistency patterns.
- Key-CRDT Stores by Nuno Manual Ribeiro Preguiça
- Distributed systems for fun and profit by Mikito Takada
- A comprehensive study of Convergent and Commutative Data Types by Marc Shapiro
- CRDT in production at Soundcloud by Peter Bourgon
- Meangirls CRDT Examples in Ruby by Kyle Kingsbury
- Readings in CRDTs by Christopher Meiklejohn
- Photographic Reference New Old Stock by Cole Townsend