Consistency & Consensus for System Design Interview (4): implementing causality

SystemDesign
Tech Wrench
Published in
8 min readSep 3, 2023

PREV | HOME | NEXT | System Design Resource List

Don’t forget to get your copy of Designing Data Intensive Applications, the single most important book to read for system design interview prep!

Check out ByteByteGo’s popular System Design Interview Course

Introduction

Causal consistency requires that a database maintains knowledge of which operations occur before a particular operation. In case of concurrent operations, all replicas of a database must process operations in the same order. If a replica finds a missing operation then the latter operations must wait until the missing operation has been processed. So the next question is how does a database determine if an operation by a client is causally related to a preceding operation? This can be answered by knowing what version of the database the client reads, since a client can only build-upon or depend upon what it has already read. However, tracking what each client reads across the entire database quickly becomes impractical. Also, it can’t be differentiated what part of the overall data that a client reads causally precedes a later write.

Grokking Modern System Design for Software Engineers and Managers

An attempt at ordering events/operations can start at generating sequence numbers or using timestamps from logical clocks. Timestamps from time of day clocks suffer from skew issues and can’t be used. A sequence number is assigned to each operation/event and thus enforcing a total order across all the events/operations. By definition total order also imposes partial order and we can say that for any two events/operations A and B, if A occurs before B then A will have a sequence number smaller than that of B. Note that concurrent operations/events may be processed in arbitrary order and causal consistency shall still hold. In case of single leader replication, the leader can generate a monotonically increasing sequence number to each operation/event and as long as each replica applies each operation in the order of its assigned sequence a replica will correctly update to the state of the leader.

Land a higher salary with Grokking Comp Negotiation in Tech.

In case of multi-leader or leaderless replication systems, there are a few options to generate sequence numbers, however, such numbers aren’t causally consistent:

  • Every node in the system generates sequence numbers independently, however, these numbers will not be unique across the system. One solution is to prefix or suffix the generated sequence number with the unique node ID making every sequence number in the system unique. However, this scheme doesn’t capture causal relationships between operations that get sequence numbers generated at different nodes assigned to them.
  • Timestamps with high resolution from time of day clocks can be used as sequence numbers but then again due to clock-skew the timestamps can’t be used to determine causal relationship between operations. Systems that use last write wins (LWW) strategy can totally order operations using high resolution timestamps.
  • Another strategy could be to allocate ranges for each node to pull sequence numbers from. For instance, node#1 may assign sequence numbers from 0 to 50000 and node#2 may assign sequence numbers from 50001 to 100,000. Once a node uses up all the sequence numbers from its assigned range, a new range is assigned. Again this scheme suffers from lack of establishing causal consistency among operations that get assigned sequence numbers by different nodes.
Ace the machine learning engineer interview with Grokking the Machine Learning Interview.

Next, we’ll delve into how sequence numbers consistent with causality can be generated in a distributed system as described by Leslie Lamport in 1978.

Deep dive into mastering dynamic programming interview questions

Lamport Timestamps

The idea behind Lamport timestamps is as follows:

  1. Each node in the system generates a pair of values, which consists of a counter and a node ID. The node ID remains the same for each node, but the counter is incremented.
  2. When comparing two pairs, the one with the greater counter value is considered the greater timestamp and if the counter values are the same then the tie is broken on the node ID. With equal counter values the sequence number with the greater node ID is considered the greater sequence number.
  3. Each client interacting with the system retains the maximum counter value that it has seen so far and sends that maximum seen counter value along with every request the client makes to the system.
  4. A node upon receiving a client request examines the counter value it received from the client against the value it holds. If the client sent counter value is higher than what the node holds, then the node sets its own counter value to one plus the value received. Else, the node simply increments its own counter value and responds with the updated counter value to the client.

Check out the course Coderust: Hacking the Coding Interview for Facebook and Google coding interviews.

Note that Lamport timestamps as originally described define a partial ordering on events in a distributed system but by using an appropriate tie-breaker, as we did with the node ID, a total ordering can also be defined.

Let’s consider an example with two clients and two nodes A and B as shown below:

1. The two clients initiate a write request to the nodes A and B respectively. Both clients are just starting out and report a maximum counter value of 0. The writes are made by the two nodes and Lamport timestamp for the two writes is returned to the clients.

Notice that the two operations are concurrent and Lamport timestamps enforce a total order on them. Since neither operation depends on the other there’s no causal relationship between the two and it doesn’t matter whether one occurs before the other in the total order.

2. Next the first client makes two more writes to node A. The counter at node A increments with each write

3. Next, client#1 writes to node B, and along with its request sends the counter maximum value of 3 that it received from node A in the previous write.

Notice how node B increments its counter value to 4 from 1. The causal relationship among writes by client#1 is captured by lamport timestamps as shown below:

Get a leg up on your competition with the Grokking the Advanced System Design Interview course and land that dream job!

The first write by client#2 isn’t causally related to the first write by client#1 and both operations are considered concurrent. However, note that writes 3 and 4 in the above depiction have a higher sequence number than write-2, but that doesn’t mean that writes 3 and 4 happened after write-2. Infact, the first three writes by client#1 i.e. writes 1, 3 and 4 are all concurrent to write-2. Occurring earlier in a total ordering doesn’t imply that the write operation occurred earlier on the actual timeline too. It is possible that write-2 takes a long time to process or experiences network delay and finishes much later than writes 1, 3 and 4 or that it is processed very quick before any of the writes by client#1 is processed. However, write-2 causally occurs before write-5, because write-5 can only be processed by node B after write-2 has been processed. Moreover, writes 1, 3 and 4 all happen before write-5.

Grokking the Coding Interview: Patterns for Coding Questions

Looking at Lamport timestamps we can’t determine if two events are actually causally related or just concurrent, for instance the first three writes by client#1 are concurrent with the first write of client#2 even though two of the writes by client#1 have a higher sequence number. The only guarantee is that if two events are indeed causally related then the one occurring earlier than the other will necessarily have a smaller sequence number than the later occurring event. In contrast, version vectors (discussed briefly in previous lessons) allow us to distinguish between concurrent and causally related events. The advantage of Lamport timestamps is that they have a compact representation compared to version vectors.

4. Now client#2 writes to node A and node A updates its counter value to 4.

In the total ordering of all these events, we can see that for every write that is causally related to an earlier write, the later occurring write on the actual timeline also has a higher Lamport timestamp. Below is the total order for all the writes.

Writes 1, 2, 3, and 4 causally precede write-5. Similarly, write-6 must happen after writes 1, 2, 3, and 4 but we can’t establish a causal relationship between write-5 and write-6. They are concurrent and we can’t say which one occurs before the other.

Interviewing? consider buying our number#1 course for Java Multithreading Interviews.

Limitations of Lamport Timestamps

Though Lamport timestamps can produce a total ordering of events in a distributed system, that total ordering isn’t available as events are occurring. The total ordering can be constructed after the events have executed and this may not happen immediately or soon enough as nodes can experience network delays causing operations executed by them to be available for computing the total order.

If you are interviewing, consider buying our number#1 course for Python Multithreading Interviews.

This may pose a problem as two nodes may each receive a request such as registering a username by two different clients. Only after both the nodes perform the operation can there be an opportunity to compute the total order of events and resolve conflicts accordingly. For some use cases such as registering unique usernames (uniqueness constraint), finding the total order after execution of operations may be too late. In such scenarios total order isn’t sufficient to ensure the uniqueness constraint.

Your Comprehensive Interview Kit for Big Tech Jobs

0. Grokking the Machine Learning Interview
This course helps you build that skill, and goes over some of the most popularly asked interview problems at big tech companies.

1. Grokking the System Design Interview
Learn how to prepare for system design interviews and practice common system design interview questions.

2. Grokking Dynamic Programming Patterns for Coding Interviews
Faster preparation for coding interviews.

3. Grokking the Advanced System Design Interview
Learn system design through architectural review of real systems.

4. Grokking the Coding Interview: Patterns for Coding Questions
Faster preparation for coding interviews.

5. Grokking the Object Oriented Design Interview
Learn how to prepare for object oriented design interviews and practice common object oriented design interview questions

6. Machine Learning System Design

7. System Design Course Bundle

8. Coding Interviews Bundle

9. Tech Design Bundle

10. All Courses Bundle

--

--

SystemDesign
SystemDesign

Written by SystemDesign

The ultimate Poor man’s system design interview prep guide -- https://systemdesign.medium.com/membership

No responses yet