Time and clocks and ordering of events in a distributed system

Introduction


In this post, we will cover the notion of time in distributed systems. We rely heavily on time in our daily lives for ordering: "one thing happened before another" is a relation that we codify with time, using physical clocks as the mechanism that delivers it. So in the real world, events are ordered using physical clocks. In a distributed system, this notion of time needs to be examined more carefully. Since physical clocks can be inaccurate, the central question becomes: without relying on them, how do we define an ordering of events across a collection of nodes? Each node can establish its own local order of events, but how can a coherent order be established across multiple nodes? Such an order matters for problems like distributed locking, where a shared resource can be used by only one node at a time, and a node requesting the lock should be granted it as long as no other node asked for it first. In such cases a local order is not sufficient and some global order needs to be established across the nodes.

Partial ordering of events in a distributed system

One way to define an order of events in a distributed system would be to use a physical clock, so that "a happened before b" simply means t1 < t2. But clocks are not perfectly accurate; they drift and are not necessarily synchronized in lock-step. So the paper takes another approach to define the "happened before" relation. (The meaning of "partial order" will become clear later.)

A distributed system can be defined as a collection of processes. A process essentially consists of a sequence of events (an event can be anything meaningful: an instruction, a subprogram, and so on) with an a priori order. In this system processes communicate with each other, and sending or receiving a message is itself an event. Let’s establish a more precise definition of "happens before" (⇢) in such a system.

  1. In the same process, if event a comes before event b, then a ⇢ b
  2. When process i sends a message at event a and process j receives that message at event b, then a ⇢ b
  3. Happens before is transitive: if a ⇢ b and b ⇢ c, then a ⇢ c.
  4. Two events a and b are concurrent when neither a ⇢ b nor b ⇢ a. This generally reflects the fact that process i may not have knowledge of all the events that have happened in process j, so no authoritative order can be established between them, which also makes it clear that we have only a partial order on the system. There can be many events in the system for which no meaningful order can be established without falling back on physical time.

Walking through some examples will be useful.

Figure: Partial ordering of events in the system. Vertical bars are processes; wavy arrows are messages; a dot is an event. Time elapses bottom up.

As you can see in the left-hand figure above, p1 ⇢ p2, since those events occur in the same process. q1 ⇢ p2, because of the message sent from process Q to process P. Since time elapses bottom up, q3 seems to happen before p3 if we go by physical time, but in this system we call these events concurrent: neither process knows about the other's event, so there is no causal relationship between them. What we can say is that q3 ⇢ p4, because q3 ⇢ q5 and q5 ⇢ p4. Similarly, p1 ⇢ r3. Having gone through these examples, another way to read a ⇢ b becomes clear: event a can causally affect event b. By the same token, q3 and p3 are not causally related.
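To make the figure concrete, here is a minimal Python sketch (my own illustration, not from the paper) that treats ⇢ as the transitive closure of two kinds of edges: consecutive events within a process and message send/receive pairs. The events mirror the figure; the message edges beyond q1 → p2 and q5 → p4, which the text names explicitly, are assumptions chosen so that p1 ⇢ r3 holds as described.

    from itertools import product

    # Events within each process, in local order. The names mirror the figure;
    # the exact set of events is an illustrative assumption.
    process_events = {
        "P": ["p1", "p2", "p3", "p4"],
        "Q": ["q1", "q2", "q3", "q4", "q5"],
        "R": ["r1", "r2", "r3", "r4"],
    }

    # (send_event, receive_event) message pairs. q1 -> p2 and q5 -> p4 come from
    # the text; p1 -> q2 and q4 -> r3 are assumed so that p1 "happens before" r3.
    messages = [("q1", "p2"), ("q5", "p4"), ("p1", "q2"), ("q4", "r3")]

    def happens_before(process_events, messages):
        """Return the 'happens before' relation as the transitive closure of
        local order within each process plus the message send -> receive edges."""
        edges = set(messages)
        for events in process_events.values():
            edges.update(zip(events, events[1:]))  # consecutive events in one process

        closure = set(edges)
        changed = True
        while changed:  # naive transitive closure; fine for a handful of events
            changed = False
            for (a, b), (c, d) in product(list(closure), list(closure)):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
        return closure

    hb = happens_before(process_events, messages)
    print(("q3", "p4") in hb)                          # True: q3 -> q5 -> p4
    print(("p1", "r3") in hb)                          # True: p1 -> q2 -> q4 -> r3
    print(("q3", "p3") in hb or ("p3", "q3") in hb)    # False: q3 and p3 are concurrent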

Logical clocks

Lamport’s paper introduces a new function, essentially a counter, in every process that assigns a number to each event. Call it Ci(a), the counter value in process i at event a. There are no physical clocks in the system. The clocks must satisfy the invariant that if event a happened before event b, then C(a) < C(b). A partial ordering of events consistent with this invariant can be established using the following two conditions:

C1: Ci(a) < Ci(b) if a happens before b in the same process i. This can be implemented using a simple counter in the given process.

C2: When process i sends a message at event a and process j receives that message at event b, then Ci(a) < Cj(b)

These clock functions can be thought of as ticks that occur regularly in each process, with at least one tick between any two events; each tick increments the number assigned to the previous one. This is illustrated in the right-hand figure above, where ticks with the same number are connected across process boundaries. Within a process, moving to the next event means touching or crossing a tick boundary; across processes, a message must touch or cross a tick boundary between its send and receive events for "happens before" to hold.

This can be implemented in the following way:

IR1: This implements C1. Each process increments Ci between any two successive events in that process.

IR2: This implements C2. Process i sends Ci(a) as a timestamp along with the message to process j. When process j receives this message at event b, it sets Cj(b) to a value greater than or equal to its current Cj and greater than the timestamp Ci(a).
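IR1 and IR2 fit in a few lines of code. The following Python sketch is my own illustration of the two rules; the class and method names are assumptions, not from the paper.

    class LamportClock:
        """Per-process logical clock implementing IR1/IR2."""

        def __init__(self):
            self.time = 0

        def tick(self):
            # IR1: increment the counter between any two successive local events.
            self.time += 1
            return self.time

        def send(self):
            # A send is a local event; its timestamp travels with the message.
            return self.tick()

        def receive(self, message_timestamp):
            # IR2: advance the clock past both its current value and the
            # timestamp carried by the message, so that C_j(b) > C_i(a).
            self.time = max(self.time, message_timestamp) + 1
            return self.time

    # Usage: process i sends to process j.
    ci, cj = LamportClock(), LamportClock()
    ts = ci.send()            # event a in process i, timestamp 1
    cj.tick()                 # an unrelated local event in process j
    print(cj.receive(ts))     # event b in process j; prints 2 (> ts), so C_i(a) < C_j(b)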

Total ordering of events in the system

So far the system of logical clocks has established a partial order of events. There are still concurrent events, and it would be useful to break ties, especially for the locking problem described in the introduction: someone has to get the lock. Ties can be broken by assigning an arbitrary priority to each process; when two events have the same clock value, the event on the lower-priority process is taken to have happened first. More formally, a total order a ⇥ b (notice the new type of arrow) can be defined as:

  1. Ci(a) < Cj(b), or
  2. Ci(a) = Cj(b) and Pi < Pj.

These two conditions mean that ⇥ completes the partial order ⇢: if a ⇢ b, then a ⇥ b. While the partial ordering is unique for a given system of events, the total ordering is not; it depends on the choice of clocks and process priorities.
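Since the tie-break simply falls back on process priority, the total order amounts to a lexicographic comparison of (timestamp, priority) pairs. A minimal sketch, assuming an arbitrary but fixed priority per process:

    # Total order as lexicographic comparison of (logical timestamp, process priority).
    # The priority values are an arbitrary but fixed assignment, assumed for illustration.
    def totally_before(event_a, event_b):
        (ts_a, prio_a), (ts_b, prio_b) = event_a, event_b
        return (ts_a, prio_a) < (ts_b, prio_b)

    a = (5, 2)   # C_i(a) = 5, on the process with priority 2
    b = (5, 7)   # C_j(b) = 5, on the process with priority 7
    print(totally_before(a, b))   # True: timestamps tie, so the lower-priority process comes first
    print(totally_before(b, a))   # False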

Distributed locks using total ordering

Consider the following problem, which is quite common in distributed systems. The central idea is to access a shared resource that only one process can use at any time. More formal conditions can be specified as:

  1. A process which has been granted a resource, must release it before any other process can acquire it.
  2. Resource access requests must be granted in the order in which they were made.
  3. If every process that is granted the resource eventually releases it, then every request is eventually granted.

One possible solution would be to introduce a centralized scheduler. Apart from being completely centralized, it can also violate ordering condition 2. Consider this scenario: process i sends a request for the resource to the scheduler and then informs process j about it. Process j now requests the same resource, but its message reaches the scheduler before process i's does, so the request order is not obeyed. To address this, we can use the total ordering built from IR1 and IR2, which orders every event in the system. As long as all processes know about the requests made by the other processes, the correct ordering can be enforced. A decentralized solution can be designed in which each process keeps a queue of lock and unlock requests, following the rules below (a code sketch follows the list).

  1. To request the lock, process Pi takes its current timestamp T, puts lock(T, Pi) in its own queue, and sends the same message to all other processes.
  2. Every other process puts the message in its queue and sends an acknowledgement back with a new, later timestamp Tn.
  3. To release the resource, process Pi removes lock(T, Pi) from its own queue and sends unlock(T, Pi) to all other processes.
  4. Upon receiving the unlock message, process Pj removes lock(T, Pi) from its queue.
  5. Process Pi is granted the lock when lock(T, Pi) is ordered, by the total order, before every other request in its queue, and Pi has received a message from every other process with a timestamp later than T.
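The five rules above can be condensed into a small single-address-space sketch. This is my own illustration, not the paper's pseudocode; reliable FIFO delivery is assumed and the "network" is just an in-memory dictionary of processes.

    class Process:
        """One participant in the decentralized locking scheme sketched above.
        Simplifications: in-memory 'network', reliable FIFO delivery, no failures."""

        def __init__(self, pid, peers):
            self.pid = pid
            self.peers = peers                                   # all process ids, including self
            self.clock = 0                                       # Lamport clock (IR1/IR2)
            self.queue = set()                                   # pending requests as (timestamp, pid)
            self.last_seen = {p: 0 for p in peers if p != pid}   # latest timestamp heard from each peer

        def _tick(self, received=0):
            self.clock = max(self.clock, received) + 1
            return self.clock

        # Rule 1: timestamp the request, queue it locally, broadcast it.
        def request_lock(self, network):
            request = (self._tick(), self.pid)
            self.queue.add(request)
            for p in self.peers:
                if p != self.pid:
                    network[p].on_request(request[0], self.pid, network)
            return request

        # Rule 2: a peer queues the request and acknowledges with a later timestamp.
        def on_request(self, t, sender, network):
            self._tick(received=t)
            self.queue.add((t, sender))
            network[sender].on_ack(self._tick(), self.pid)

        def on_ack(self, t, sender):
            self._tick(received=t)
            self.last_seen[sender] = max(self.last_seen[sender], t)

        # Rules 3 and 4: remove the request locally and tell everyone else to do the same.
        def release_lock(self, request, network):
            self.queue.discard(request)
            t = self._tick()
            for p in self.peers:
                if p != self.pid:
                    network[p].on_release(t, self.pid, request)

        def on_release(self, t, sender, request):
            self._tick(received=t)
            self.last_seen[sender] = max(self.last_seen[sender], t)
            self.queue.discard(request)

        # Rule 5: granted when our request is first in the total order and every peer
        # has been heard from with a timestamp later than the request's.
        def lock_granted(self, request):
            return request == min(self.queue) and all(t > request[0] for t in self.last_seen.values())

    # Toy run with two processes.
    network = {}
    network[1] = Process(1, peers=[1, 2])
    network[2] = Process(2, peers=[1, 2])
    req = network[1].request_lock(network)
    print(network[1].lock_granted(req))   # True: process 2's acknowledgement already arrived
    network[1].release_lock(req, network)
    print(network[2].queue)               # set(): the unlock removed the request everywhere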

Conclusion

The notion of time and of the order of events is quite subtle in a distributed system. The idea of logical clocks, which ensure causal ordering without relying on physical clocks, is very useful. Lamport’s logical clocks guarantee that if a ⇢ b, then C(a) < C(b). It is also important to understand that C(a) < C(b) does not necessarily mean that a ⇢ b: if we only looked at C(j) and C(k) and C(j) happened to be less than C(k), the two events could still be concurrent, with the processes never having communicated. This latter property, determining causality from the timestamps alone, is what vector clocks provide.
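For contrast, a minimal vector-clock comparison (my own sketch; vector clocks are only mentioned, not defined, in this post) shows how causality or concurrency can be read directly from the timestamps:

    def compare(vc_a, vc_b):
        """Compare two vector clocks (equal-length lists of per-process counters)."""
        a_le_b = all(x <= y for x, y in zip(vc_a, vc_b))
        b_le_a = all(y <= x for x, y in zip(vc_a, vc_b))
        if a_le_b and not b_le_a:
            return "a happened before b"
        if b_le_a and not a_le_b:
            return "b happened before a"
        return "a and b are equal" if a_le_b else "a and b are concurrent"

    print(compare([2, 0, 1], [3, 1, 1]))  # a happened before b
    print(compare([2, 0, 1], [1, 2, 0]))  # a and b are concurrent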
