Time Synchronization in Distributed Systems

Published in

Distributed Knowledge

10 min readApr 8, 2020

Figure 1: Can you spot the anomaly? Source: HopkinsCinemAddicts

If you find yourself familiar with websites such as facebook.com or google.com, maybe not less than once have you wondered how these websites can handle millions or even tens of millions requests per second. Will there be any single commercial server that can withstand this huge amount of requests, in the fashion that each request can be served timely for end-users?

Diving deeper into this matter, we arrive at the concept of distributed systems, a collection of independent computers (or nodes) that appear to its users as a single coherent system. This notion of coherency for users is clear, as they believe they are dealing with a single humongous machine that takes the form of multiple processes running in parallel to handle batches of requests at once. However, for people who understand system infrastructure, things do not turn out as easy as such.

Figure 2: A data center, the heart of Distributed Computing. Source: Business Insider.

For thousands of independent machines running concurrently that may span multiple time zones and continents, some types of synchronization or coordination must be given for these machines to collaborate efficiently with each other (so that they can appear as one). In this blog, we will discuss the synchronization of time and how it achieves consistency among machines on ordering and causality of events.

The content of this blog will be organized as follows:

Clock synchronization
Logical clocks
Take away
What’s next?
References

1. Clock Synchronization

Physical Clocks

In a centralized system, time is unambiguous. Nearly all computers have a “real-time clock” to keep track of time, usually in sync with a precisely machined quartz crystal. Based on the well-defined frequency oscillation of this quartz, a computer’s OS can precisely monitor the internal time. Even when the computer is shut off or goes out of charge, the quartz continues to tick due to the CMOS battery, a small powerhouse integrated in the motherboard. When the computer connects to the network (Internet), the OS then contacts a timer server, which is equipped with a UTC receiver or an accurate clock, to accurately reset the local timer using Network Time Protocol, also called NTP.

Although this frequency of the crystal quartz is reasonably stable, it is impossible to guarantee that all the crystals of different computers will operate at exactly the same frequency. Imagine a system with n computers, whose every crystal running at slightly different rates, causing the clocks gradually to get out of sync and give different values when readout. The difference in the time values caused by this is called clock skew.

Under this circumstance, things got messy for having multiple spatially separated machines. If the usage of multiple internal physical clocks is desirable, how do we synchronize them with real-world clocks? How do we synchronize them with each other?

Network Time Protocol

A common approach for pair-wise synchronization between computers is through the use of the client-server model i.e let clients contact a time server. Let’s consider the following example with 2 machines A and B:

First, A sends a request timestamped with value T₀ to B. B, as the message arrives, records the time of receipt T₁ from its own local clock and responses with a message timestamped with T₂, piggybacking the previously recorded value T₁. Lastly, A, upon receiving the response from B, records the arrival time T₃. To account for the time delay in delivering messages, we calculate Δ₁ = T₁-T₀, Δ₂ = T₃-T₂. Now, an estimated offset of A relative to B is:

Based on θ, we can either slow down the clock of A or fasten it so that the two machines can be synchronized with each other.

Since NTP is pair-wise applicable, B’s clock can be adjusted to that of A as well. However, it is unwise to adjust the clock in B if it is known to be more accurate. To solve this problem, NTP divides servers into strata, i.e ranking servers so that the clocks from less accurate servers with smaller ranks will be synchronized to those of the more accurate ones with higher ranks.

Berkeley Algorithm

In contrast to clients that periodically contact a time server for accurate time sync, Gusella and Zatti, in 1989 Berkeley Unix paper [6], proposed the client-server model in which the time server (daemon) polls every machine periodically to ask what time it is there. Based on the responses, it computes the round-trip time of the messages, averages the current time with ignorance of any outliers in time values, and “tells all other machines to advance their clocks to the new time or slow their clocks down until some specified reduction haw been achieved” [3].

Figure 5: The time daemon asks all other machines to adjust their clocks. Source: [3].

2. Logical Clocks

Let’s take a step back and reconsider our base for synchronized timestamps. In the previous section, we assign concrete values to all participating machines’ clocks, henceforth they can agree on a global timeline at which events happen during the execution of the system. However, what really matters throughout the timeline is the order in which related events occur. For example, if somehow we only need to know event A happens before event B, it does not matter if Tₐ = 2, Tᵦ = 6 or Tₐ = 4, Tᵦ = 10, as long as Tₐ < Tᵦ. This shifts our attention to the realm of logical clocks, where synchronization is influenced by Leslie Lamport's definition of “happens-before” relation [1].

Lamport’s Logical Clocks

In his phenomenal 1978 paper Time, Clocks, and the Ordering of Events in a Distributed System [1], Lamport defined the “happens-before” relationship “→” as follows:

1. If a and b are events in the same process, and a comes before b, then a→b.
2. If a is the sending of a message by one process and b is the receipt of the same message by another process, then a→b.
3. If a→b and b→c, then a→c.

If events x, y occur in different processes that do not exchange messages, neither x → y nor y → x is true, x and y are said to be concurrent. Given a C function that assigns a time value C(a) for an event on which all processes agree, if a and b are events within the same process and a occurs before b, C(a) < C(b)*. Similarly, if a is the sending of a message by one process and b is the reception of that message by another process, C(a) < C(b) **.

Now let us look at the algorithm that Lamport proposed for assigning times to events. Consider the following example with a cluster of 3 processes A, B, C:

Figure 6: Three processes running out of sync (left). Lamport’s algorithm then corrects their clock values (right).

These 3 processes’ clocks operate on their own timing and are initially out of sync. Each clock can be implemented with a simple software counter, incremented by a specific value every T time units. However, the value by which a clock is incremented differs per process: 1 for A, 5 for B, and 2 for C.

At time 1, A sends message m1 to B. At the time of arrival, the clock at B reads 10 in its internal clock. Since the message has been sent was timestamped with time 1, the value 10 in process B is certainly possible (we can infer that it took 9 ticks to arrive at B from A).

Now consider message m2. It leaves B at 15 and arrives at C at 8. This clock value at C is clearly impossible since time can not go backward. From ** and the fact that m2 leaves at 15, it must arrive at 16 or later. Therefore, we have to update the current time value at C to be larger than 15 (we add +1 to the time for simplicity). In other words, when a message arrives and the receiver’s clock shows a value that precedes the timestamp that pinpoints the message departure, the receiver fast forwards its clock to be one unit of time more than the departure time. In figure 6, we see that m2 now corrects the clock value at C to 16. Similarly, m3 and m4 arrive at 30 and 36 respectively.

From the above example, Maarten van Steen et al [3] formulates Lamport’s algorithm as follows:

1. Before executing an event (i.e sending a message over the network, …), Pᵢ increments Cᵢ: Cᵢ <- Cᵢ + 1.
2. When process Pᵢ sends message m to process Pj, it sets m’s timestamp ts(m) equals to Cᵢ after having executed the previous step.
3. Upon the receipt of a message m, process Pj adjusts its own local counter as Cj <- max{Cj, ts(m)} after which it then executes the first step and delivers the message to the application.

Vector Clocks

Remember back to Lamport’s definition of “happens-before” relationship, if there are two events a and b such that a occurs before b, then a is also positioned in that ordering before b, that is C(a) < C(b). However, this does not imply reverse causality, since we can not deduce that a goes before b merely by comparing the values of C(a) and C(b) (the proof is left as an exercise to the reader).

In order to derive more information about the ordering of events based on their timestamp, Vector Clocks, a more advanced version of Lamport’s Logical Clock, is proposed. For each process Pᵢ of the system, the algorithm maintains a vector VCᵢ with the following attributes:

1. VCᵢ[j]: local logical clock at Pᵢ, or the number of events that have occurred before the current timestamp.

2. If VCᵢ[j] = k, Pᵢ knows that k events have occurred at Pj.

The algorithm of Vector Clocks then goes as follows:

1. Before executing an event, Pᵢ records a new event happens at itself by executing VCᵢ[i] <- VCᵢ[i] + 1.
2. When Pᵢ sends a message m to Pj, it sets timestamp ts(m) equal to VCᵢ after having executed the previous step.
3. When message m is received, process Pj update each k of its own vector clock: VCj [k] ← max { VCj [k], ts(m)[k]}. Then it continues executing the first step and delivers the message to the application.

By these steps, when a process Pj receives a message m from process Pᵢ with timestamp ts(m), it knows the number of events that have occurred at Pᵢ casually preceded the sending of m. Furthermore, Pj also knows about the events that have been known to Pᵢ about other processes before sending m. (Yes that’s true, if you’re relating this algorithm to the Gossip Protocol 🙂).

Hopefully, the explanation does not leave you scratching your head for too long. Let’s dive into an example to master the concept:

Figure 7: Vector Clocks explains causality when exchanging messages.

In this example, we have three processes P₁, P₂, and P₃ alternatively exchanging messages to synchronize their clocks together. P₁ sends message m1 at logical time VC₁ = (1, 0, 0) (step 1) to P₂. P₂, upon receiving m1 with timestamp ts(m1) = (1, 0, 0), adjusts its logical time VC₂ to (1, 1, 0) (step 3) and sends message m2 back to P₁ with timestamp (1, 2, 0) (step 1). Similarly, process P₁ accepts the message m2 from P₂, alters its logical clock to (2, 2, 0) (step 3), after which it sends out message m3 to P₃ at (3, 2, 0). P₃ adjusts its clock to (3, 2, 1) (step 1) after it receives message m3 from P₁. Later, it takes in message m4, sent by P₂ at (1, 4, 0), and thereby adjusting its clock to (3, 4, 2) (step 3).

Based on how we fast forward the internal clocks for synchronization, event a precedes event b if for all k, ts(a)[k] ≤ ts(b)[k], and there is at least one index k’ for which ts(a)[k’] < ts(b)[k’]. It’s not hard to see that (2, 2, 0) precedes (3, 2, 1), but (1, 4, 0) and (3, 2, 0) may conflict with each other. Therefore, with vector clocks, we can detect whether or not there is a causal dependency between any two events.

3. Take away

When it comes to the concept of time in distributed system, the primary goal is to achieve the correct order of the events. Events can be positioned either in chronological order with Physical Clocks or in logical order with Lamtport’s Logical Clocks and Vector Clocks along the execution timeline.

4. What’s next?

Next up in the series, we will go over how time synchronization can provide surprisingly simple answers to some classical problems in distributed systems: at-most-one messages [4], cache consistency [5], and mutual exclusion [3].

5. References

[1] Leslie Lamport: Time, Clocks, and the Ordering of Events in a Distributed System. 1978.

[2] Barbara Liskov: Practical uses of synchronized clocks in distributed systems. 1993.

[3] Maarten van Steen, Andrew S. Tanenbaum: Distributed Systems. 2017.

[4] Barbara Liskov, Liuba Shrira, John Wroclawski: Efficient At-Most-Once Messages Based on Synchronized Clocks. 1991.

[5] Cary G. Gray, David R. Cheriton: Leases: An Efficient Fault-Tolerant Mechanism for Distributed File Cache Consistency. 1989.

[6] Ricardo Gusella, Stefano Zatti: The Accuracy of the Clock Synchronization Achieved by TEMPO in Berkeley UNIX 4.3BSD. 1989.