All Things Clock, Time and Order in Distributed Systems: Physical Time in Depth

Published in Geek Culture · Mar 9, 2021 · 18 min read

Introduction

As human beings, we are born (year 0), start school (~5th year), go to university for an undergraduate degree (~18th year), possibly pursue post-graduation (~22nd year), then start our careers (~22nd to 24th year or later), and life moves on. The whole thing happens in a certain order: you probably can’t go to university without completing school. You may if you are a prodigy, but that’s a one-off case. On average it takes 20+ years to complete this journey. When we look back on it, we say we have grown up so much with time. Wow!

Notice how the concept of time comes into the picture here, both to define a period and, most importantly, to define the order of events in our lives.

Similarly, business systems use time to define periods and the order of events so that events can be synchronized. Let’s look at some important use cases of ordering and time across industries:

Mobile and data networks use precise time to stay synchronized so that mobile handsets can share a limited radio spectrum more efficiently.
The aviation industry needs accurate weather information, so precise time is used to synchronize weather station data spread across locations.
Seismic monitoring networks help researchers to quickly locate the epicentres of earthquakes and other seismic events by using precise time across the network.
In order to detect electrical anomalies in a power grid, equipment and utilities need to use precise time. It helps engineers to analyze power blackouts and identify the exact point where the anomaly started.
Industries like Hollywood are adopting precise time to control audio, video data and multi-camera sequencing to deliver the best possible user experience.
Application and network monitoring systems are time-sensitive to help engineers efficiently detect performance issues.
Stock trading platforms, online gaming and instant messaging have their own ordering and timing requirements.

These are just a few examples to show how important precise time is in general. We are more interested in understanding time in the context of low-level distributed systems, and the rest of the article focuses on that.

Q. So, to start with, where do we need ordering in distributed systems?
A. There are many use cases that require some notion of ordering in computing:
Example 1: You are in an online meeting with your team for a daily stand-up, but you receive a large chunk of video frames completely out of order. Would that be a good user experience?
Example 2: You are debugging a critical issue at an e-commerce company where a request passes through many consecutive service calls. When you trace the calls, you see that a customer order was confirmed even before the payment took place. How would you feel about that?
Example 3: Similarly, in a leaderless database system, multiple machines can accept requests for the same object for better availability. They need to sync the object in such a way that the sequence of events that happened on that object remains acceptable to the system and the business logic.
Example 4: Transactions need their operations to be ordered, and in some cases transactions themselves need to be ordered among each other. Whether it’s a financial institution, e-commerce, or a social media platform, transactions are everywhere.

The truth is we can’t avoid the notion of ordering in many systems. But not all systems require ordering: it all depends on the business use case at hand.

Now the question is how do we measure ordering?

To order a sample of data in a process running on a computer, the easiest approach is to assign monotonically increasing numbers to the items and compare who comes before whom. We can manage a counter variable inside that process and keep incrementing it, simple!

What if our sample data is affected by multiple processes, or worse, even multiple threads running on the same computer? We can still manage a global counter (possibly guarded by locks) and still achieve ordering, but obviously at the cost of synchronization between threads or processes.
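To make the lock-guarded counter idea concrete, here is a minimal Python sketch. The names `SequenceCounter` and `stamp_events` are made up for illustration, and it only covers threads inside a single process, not multiple processes or machines.

```python
import threading


class SequenceCounter:
    """Hands out strictly increasing sequence numbers to concurrent callers."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def next(self) -> int:
        # The lock makes "read, increment, return" one atomic step across threads.
        with self._lock:
            self._value += 1
            return self._value


def stamp_events(counter: SequenceCounter, n: int, out: list) -> None:
    """Simulate a thread that produces n events and stamps each one."""
    for _ in range(n):
        out.append(counter.next())


counter = SequenceCounter()
stamps: list = []
threads = [
    threading.Thread(target=stamp_events, args=(counter, 1000, stamps))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every event got a unique number, so all events can be totally ordered.
print(len(stamps), len(set(stamps)))
```

The price is visible in `next()`: every thread has to queue up on the same lock, which is exactly the synchronization cost mentioned above.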

Is there any alternative? Can’t we simply use our computer clock, since it’s generally monotonically increasing?

Both counters and clocks look intuitive at first, since our basic assumption is that they increase monotonically. But things are not as simple as they look.

Let’s first discuss if clocks can rescue us here.

Computer Clock And Physical Time

Every computer ships with a hardware clock on its motherboard, built around a crystal that oscillates mechanically. Quartz clocks are the most common because they are light and cheap, and nowadays the crystals are produced synthetically. A quartz crystal oscillates at a precise frequency when a particular voltage is applied to it, and the clock counts these oscillations. A specified number of oscillations is called a tick, and every tick represents a unit of time. The clock internally maintains a 64-bit counter and increments it on every tick.

The operating system maintains a software clock which in turn uses the hardware clock to calculate the current time.

Note: Not just quartz clocks, any clock has to measure time based on some sort of oscillation. In a quartz clock, it is the quartz crystal that is made to oscillate.

Using a computer clock to calculate order looks very intuitive at first, but for a distributed system containing multiple nodes, this gets tricky.

The Problem With Clock

There is no single global clock in a distributed system. Every computer (any computing device, but we are mostly concerned with the server side) has its own clock, and the materials, physical properties and clock rates all differ. The environment where the servers are placed also matters: temperature variation can affect the oscillations of the clocks. So no two clocks will ever measure time in exactly the same way. There could be milliseconds or even seconds of difference between two clocks.

Clock Skew ( offset ): The difference between the time on two clocks is called clock skew.

Clock Drift: As mentioned, no two clocks oscillate at exactly the same rate. The difference in clock rate is called clock drift. Ordinary quartz clocks drift by ~1 second in 11–12 days. The drift rate varies from clock to clock.
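To get a feel for these numbers, the sketch below converts the "~1 second in 11–12 days" figure into a drift rate and then into accumulated skew. The function names are made up for illustration, and 11.5 days is just the midpoint of the range quoted above.

```python
SECONDS_PER_DAY = 86_400


def drift_ppm(drift_seconds: float, over_days: float) -> float:
    """Drift rate in parts per million (microseconds gained or lost per second)."""
    return drift_seconds / (over_days * SECONDS_PER_DAY) * 1_000_000


def skew_after(ppm: float, elapsed_seconds: float) -> float:
    """Skew in seconds accumulated against a perfect clock after elapsed_seconds."""
    return ppm / 1_000_000 * elapsed_seconds


rate = drift_ppm(1.0, 11.5)  # "~1 second in 11-12 days"
print(f"drift rate: {rate:.2f} ppm")
print(f"skew after one day: {skew_after(rate, SECONDS_PER_DAY) * 1000:.1f} ms")
```

So a clock that looks almost perfect (about one part per million) is still tens of milliseconds off after a single day without correction.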

Q. Are server ( computer ) clocks always monotonically increasing?
A. A Linux based system generally has two very different kinds of clocks (an exhaustive list is here):

Real Time Clock or Wall Clock: This is the clock we see ticking on our computer. It can synchronize to NTP (we will discuss NTP later) and jump forward or backward to correct its current time. On Linux this is CLOCK_REALTIME, which can be read with clock_gettime() and set with clock_settime(). System.currentTimeMillis() is Java’s real time clock implementation. Because this clock can jump, it is not suitable for ordering events.

Monotonic Clock: A monotonic clock represents the absolute elapsed time since some arbitrary fixed point in the past. This type of clock does not jump. Instead, if after synchronizing to NTP it finds that the local clock is lagging or leading, it adjusts the clock rate. It takes the server startup time or some other reference time (epoch) as its base and only ever moves forward from there. CLOCK_MONOTONIC is the Linux implementation; it takes server startup time as the base, so if the server restarts, the clock also resets. It cannot be set through clock_settime(). System.nanoTime() is Java’s monotonic clock implementation. If you need to calculate the elapsed time between two points in a process, use the monotonic clock. However, monotonic clocks on different machines are not synchronized with each other, so they cannot be used to order events in a distributed system either.
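The same split exists in most languages. As a small illustration in Python (the article itself only names the Linux and Java APIs), `time.time()` reads the wall clock and `time.monotonic()` reads a monotonic clock. The helper `measure` is made up for this sketch.

```python
import time


def measure(work) -> tuple[float, float]:
    """Run work() and return (wall_elapsed, monotonic_elapsed) in seconds."""
    wall_start = time.time()       # wall clock: can be stepped by NTP or an admin
    mono_start = time.monotonic()  # monotonic clock: never goes backwards
    work()
    wall_elapsed = time.time() - wall_start
    mono_elapsed = time.monotonic() - mono_start
    return wall_elapsed, mono_elapsed


wall, mono = measure(lambda: time.sleep(0.05))
print(f"wall: {wall:.4f}s  monotonic: {mono:.4f}s")
```

Most of the time the two numbers agree. They differ only if the wall clock is stepped while `work()` runs, and in that case the wall-clock figure can even come out negative, which is why durations should be measured with the monotonic clock.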

Accuracy of quartz clock

According to this article by NASA, even the best-performing quartz oscillators can be off by 1 nanosecond after just an hour and by 1 millisecond after six weeks. The drift accumulates quickly, which makes quartz clocks unreliable for use cases that need very high precision.

More Accurate Clock: Atomic Clock

In atomic clocks, energy in some form (like a laser) is applied to the atoms of a particular element (like caesium). This causes electrons in the atoms to move from one energy level to a slightly higher one (the two are called hyperfine levels of the ground state). Shortly afterwards they fall back to the previous energy level, releasing the extra energy as microwave radiation at 9,192,631,770 Hz.

‘Hz’, hertz, is the SI unit of frequency, defined as ‘per second’. The SI defines the second as “the duration of 9,192,631,770 periods of the radiation corresponding to the transition between two hyperfine levels of the ground state of the cesium-133 atom.”

So when 9,192,631,770 waves of the microwave emission coming from the caesium atoms have been detected, one second has passed. This measurement is so precise that atomic clocks are the most accurate clocks known to date.

Any clock has three general high level components:

Energy Source: In atomic clocks, this is, say, a laser source (it varies by design) applied to the atoms to make them move between the designed energy states. In quartz clocks, the battery (the CMOS battery in a computer) is the source.

Resonator: In atomic clocks, the radiation that the atoms release acts as the resonator. In quartz clocks, the quartz crystal, which vibrates periodically, acts as the resonator.

The energy source and the resonator are together called the oscillator.

Counter: In atomic clocks, the detector that counts the cycles of the microwave radiation is the counter. In quartz clocks, the circuit that counts the crystal’s oscillations and drives the display is the counter.

Accuracy of atomic clocks

Generally atomic clocks are accurate to about a billionth of a second per day.

The National Institute of Standards and Technology (NIST) in Boulder, Colorado, USA maintains a caesium atomic clock called NIST-F1, which serves as the primary time and frequency standard for the United States.

The NIST-F1 cesium atomic clock can produce a frequency so precise that its time error per day is about 0.03 nanoseconds, which means that the clock would lose one second in 100 million years.
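The step from "0.03 nanoseconds per day" to "one second in 100 million years" is plain arithmetic. A quick sketch, with a made-up function name and a 365.25-day year assumed:

```python
NS_PER_SECOND = 1_000_000_000
DAYS_PER_YEAR = 365.25  # assumption: average Julian year


def years_to_lose_one_second(error_ns_per_day: float) -> float:
    """Years needed for a constant daily error to add up to one full second."""
    days = NS_PER_SECOND / error_ns_per_day
    return days / DAYS_PER_YEAR


print(f"{years_to_lose_one_second(0.03):,.0f} years")
```

The result lands at roughly 91 million years, which is where the rounded "100 million years" figure comes from.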

More details on atomic clocks can be found here, here and here.

However, atomic clocks are not suitable for commodity servers and computers. They are bigger than a refrigerator, extremely expensive, and require special maintenance. Look at the following NIST-F1 cesium atomic clock to get an idea:

GPS Clocks

The GPS clocks on board satellites are smaller atomic clock installations. They are very precise, but not as accurate as the giant ground atomic clocks described above. Naturally, their energy sources and underlying technologies differ from those of ground atomic clocks. The following GPS clock is very interesting:

NASA’s Jet Propulsion Laboratory developed the Deep Space Atomic Clock, a very advanced GPS clock that helps a spacecraft navigate in deep space with minimal interference from Earth. Long-distance missions to Mars or other planets are powered by such technologies.

This atomic clock is up to 50 times more stable (stability means how consistently the clock measures a unit of time) than any other GPS clock ever flown in space.

For more details on Deep Space Atomic Clock, see here.

Accuracy of GPS clocks

NASA’s Deep Space Atomic Clock will be off by less than a nanosecond after four days and less than a microsecond (one millionth of a second) after 10 years. This is equivalent to being off by only one second every 10 million years.

Figure: 2, Courtesy: Deep Space Atomic Clock Images

Let’s now see how the time sources above are used to calculate the actual time that servers and other devices use on a daily basis.

UTC ( Coordinated Universal Time ) Time

I am sure everyone is aware of UTC. It is the global time standard from which local time is calculated in different time zones. UTC has the following two components:

Universal Time (UT)

Universal Time is an astronomical way of calculating time, measured by the International Earth Rotation and Reference Systems Service (IERS). It’s different from the usual clock time.

Universal Time is a solar time standard that reflects the average speed of the Earth’s rotation. Using the prime meridian at 0° longitude as a reference point, it shows the actual length of an average solar day on Earth, which is the time from one solar noon to the next. During a solar day, our planet completes a full rotation around its axis in relation to the Sun.

Because of the celestial bodies around the Earth, the shape of the Earth changes. For example, the Moon’s gravitational pull causes tides on Earth, which in turn change its shape. This slightly alters the total time the Earth takes to complete a day. So an Earth day is typically a few milliseconds longer than 24 hours, and in some rare cases it can be a few milliseconds shorter.

As of the date of writing, the following list shows the average solar day length for the last few years. If you want to know today’s solar day length, check out this page.

Figure 3, Average solar day length as per March 7, 2021

Universal Time comes in different flavours: UT0, UT1 and UT2. UT1 is very popular; the others are rarely used.

UT1 is computed by determining the positions of distant quasars using the Very Long Baseline Interferometry (VLBI) technique, laser ranging of the Moon and artificial satellites, and the determination of GPS satellite orbits. UT1 is the same everywhere on Earth and is proportional to the rotation angle of the Earth with respect to distant quasars, specifically the International Celestial Reference Frame (ICRF), neglecting some small adjustments.

In Very Long Baseline Interferometry, multiple radio telescopes placed thousands of miles apart observe quasars. Combining data from all of them effectively gives one very large telescope, thousands of miles in size, with a much higher resolution and the ability to determine the planet’s rotation rate to an accuracy of less than a thousandth of a second. Watch the video below to get a better idea:

International Atomic Time (TAI)

TAI is a time scale that uses the combined output of around 400 highly precise atomic clocks in 69 national laboratories worldwide.

It provides the exact rate at which our clocks tick. The time scale is weighted, giving priority to the time signals from institutions that maintain the highest-quality primary cesium standards.

International Atomic Time does not take the Earth’s slowing rotation into account, but Universal Time does. So there is always a time gap between the two.

Leap Second

Not only the tidal effect of the Moon’s gravitational pull but also phenomena such as earthquakes, changes in mass distribution in the Earth’s molten outer core, movement of large masses of ice near the poles, and density and angular momentum variations in the Earth’s atmosphere affect the Earth’s rotation. The overall observation is that the Earth is slowing down, currently by about 0.002 seconds per day per century. This rate is not constant, it is just an average figure, and it also grows slowly over time.

Q. So what? This figure is very small, why should we care?
A. Yes, it’s small, but the daily difference piles up over time. If each day is off by about 0.002 seconds, the gap reaches 1 second after about one and a half years and 1 hour after about 5,000 years. This means that if we ignore the deviation, our clocks will at some point be out of sync with astronomical time, that is, with the Earth’s actual rotation. The leap second was introduced to compensate for this deviation.
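The figures in this answer can be checked with a few lines of arithmetic. The sketch below assumes a constant excess of 0.002 seconds per day (the text notes the real rate is only an average) and a 365.25-day year; the function name is made up for illustration.

```python
EXCESS_SECONDS_PER_DAY = 0.002  # figure quoted in the text, treated as constant here
DAYS_PER_YEAR = 365.25          # assumption: average Julian year


def years_until_gap(gap_seconds: float,
                    excess_per_day: float = EXCESS_SECONDS_PER_DAY) -> float:
    """Years until the daily excess adds up to gap_seconds."""
    return gap_seconds / excess_per_day / DAYS_PER_YEAR


print(f"1 second gap after {years_until_gap(1):.2f} years")
print(f"1 hour gap after {years_until_gap(3600):.0f} years")
```

This gives about 1.4 years for one second and a little under 5,000 years for one hour, matching the rounded numbers above.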

A leap second is an extra second that is generally added on the last day of June or December (inserted as 23:59:60 UTC, right after 23:59:59) so that the difference between UTC, which ticks at the atomic rate, and Universal Time never exceeds 0.9 seconds.

There is no fixed schedule for leap seconds. Depending on how fast or slow the Earth rotates, the IERS announces the next leap second in advance. The last leap second was added on December 31st, 2016 (just before 1st January 2017 began). The full list of leap seconds till now can be found here.

Adding a leap second is tricky: read how the web crashed in 2012 just because of a bug in Linux that could not handle the leap second.

Q. Can a leap second be negative?
A. If the Earth keeps rotating faster (in 2020, it rotated faster than usual on 28 days :O), an average day becomes some milliseconds (maybe a fraction of a millisecond) shorter than 24 hours. In that case a leap second can be negative. But since leap seconds were introduced in 1972, this has never happened.

Leap Second Smearing

Although servers and systems can still run without adding the leap second, the time difference would become significant as discussed above, creating more confusion in the future. Adding a leap second to a clock seems simple, but it’s a cumbersome process. There are a couple of workarounds: stop or pause the clocks for that second (technically taking the clocks backwards) to pretend there is no extra second, or literally add an extra second to the clocks. These tricks are very error-prone.

A better approach, called leap smearing, was introduced by Google: instead of adding the extra second all at once, break it into many tiny adjustments and apply them to the clocks over a long period before and after the leap. This is a clever solution but not the norm yet. Google publishes such smeared time through its NTP servers.
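As a rough illustration of smearing, here is a sketch that spreads one leap second linearly over a window. The article only says "a long period of time", so the 24-hour window, the linear shape and the function name `smear_offset` are assumptions made for this example, not a description of any particular NTP server.

```python
def smear_offset(seconds_since_start: float,
                 window_seconds: float = 86_400.0,
                 leap_seconds: float = 1.0) -> float:
    """Portion of the leap second already absorbed at a point in the smear window."""
    if seconds_since_start <= 0:
        return 0.0  # smear has not started yet
    if seconds_since_start >= window_seconds:
        return leap_seconds  # smear is complete
    # Linear smear: absorb the same tiny slice in every second of the window.
    return leap_seconds * seconds_since_start / window_seconds


for hours in (0, 6, 12, 18, 24):
    absorbed_ms = smear_offset(hours * 3600) * 1000
    print(f"{hours:2d}h into window: {absorbed_ms:.1f} ms absorbed")
```

With these assumptions each second of the window is stretched by roughly 11.6 microseconds, so no clock ever has to repeat or skip a second.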

Fact: GPS clocks don’t apply leap seconds, so GPS time is not in sync with the Earth’s real rotation. Instead, GPS satellites transmit extra offset information in the GPS signal so that receivers know how much they need to adjust.

UTC time is derived using UT and Atomic Time.

The rate of UTC is based on atomic frequency standards but the epoch of UTC is synchronized to remain close to astronomical UT.

Now that we know ordinary device clocks are unreliable and can easily drift apart, and that atomic clocks are so expensive that even some of the biggest companies don’t spend on them willingly, the question is: how do we synchronize clocks in everyday devices and commodity servers?

Network Time Protocol (NTP)

NTP comes to the rescue here :) Yayyy!!!!

NTP is a protocol, usually run over UDP, designed to synchronize clocks over a variable-latency packet-switched network by choosing suitable time servers while taking network latency into account. NTP synchronizes clocks to within a few milliseconds of UTC.

NTP has a concept called stratum, which is simply a hierarchy of time servers. There are 16 such layers, from 0 to 15. Stratum n + 1 synchronizes with stratum n. Stratum 0 is the most accurate level, stratum 1 is less accurate than stratum 0, and so on. Stratum 15 is the upper limit, and stratum 16 represents completely unsynchronized devices.

Any computer can synchronize its clock to internet reference clocks (see stratum 0 clocks below) via this hierarchy of NTP time servers.

Figure: 4, Courtesy: https://www.ntp-zeit.de/index-en.htm

Stratum 0 represents the most accurate (high-precision) clocks, such as GPS clocks, atomic clocks and radio clocks. Stratum 0 clocks don’t synchronize among themselves. They are called reference clocks. It’s a very special category: no other clock in NTP can advertise itself as stratum 0.

Stratum 1 devices maintain a direct connection to stratum 0 clocks, as the yellow arrows in the picture above show. They might be a few microseconds behind stratum 0 clocks. They can also sync with each other as a fallback in case stratum 0 clocks are unavailable. Stratum 1 devices are called primary clocks or primary time servers.

Stratum 2 devices synchronize to stratum 1 devices over the network (as shown by the blue arrows in the picture above). Their accuracy is within a few milliseconds of stratum 1. Stratum 2 devices can query multiple stratum 1 devices and can even synchronize with other stratum 2 devices for better accuracy.

Stratum 3 devices work the same way as stratum 2, and so do the remaining strata.
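The "stratum n + 1 synchronizes with stratum n" rule can be sketched in a few lines. This is a simplification for illustration only: the function name `own_stratum` is made up, and picking the lowest-stratum upstream is an assumption, since real NTP daemons weigh delay and jitter as well when choosing a source.

```python
from typing import Iterable

UNSYNCHRONIZED = 16  # stratum 16 marks a device with no usable time source


def own_stratum(upstream_strata: Iterable[int]) -> int:
    """Stratum a device ends up at, given the strata of its candidate sources."""
    usable = [s for s in upstream_strata if 0 <= s < UNSYNCHRONIZED]
    if not usable:
        return UNSYNCHRONIZED
    # One level below the best source, but never beyond "unsynchronized".
    return min(min(usable) + 1, UNSYNCHRONIZED)


print(own_stratum([0]))        # attached directly to a reference clock
print(own_stratum([1, 1, 2]))  # syncing to primary time servers
print(own_stratum([]))         # no source reachable
```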

Q. Why is NTP levelled as a hierarchy of time servers?
A. To handle scale. There are millions of devices that connect to NTP servers asynchronously to adjust their time. It’s not feasible to let all of them connect to the same NTP stratum, hence the hierarchy. Usually a device, say our computer, queries multiple time servers at different levels and chooses the best one to adjust its time.

There are thousands of stratum 2 servers already. Most big companies manage their own NTP time server to let thousands of their devices sync time: one of their computers connects to an NTP stratum 2 server, and all their other servers or devices connect to this internal server (thus forming stratum 3) to sync their own time.

Q. How does a computer synchronize to NTP?
A. Computers run daemons such as ntpd (mostly on *nix machines) or chrony, which synchronize to NTP servers: basically, they poll NTP time servers at the interval defined in their configuration file. Both implement the NTP protocol, but chrony is generally considered more accurate than ntpd since it uses an extended NTP protocol.
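Each poll boils down to four timestamps: when the client sent the request, when the server received it, when the server replied, and when the client got the reply. From these, NTP estimates the clock offset and the round-trip delay. Here is a minimal sketch of that calculation; the function and variable names are made up, and real daemons additionally filter many such samples before touching the clock.

```python
def ntp_offset_and_delay(t0: float, t1: float, t2: float, t3: float) -> tuple[float, float]:
    """Estimate clock offset and round-trip delay from one request/response.

    t0: client send time     (client clock)
    t1: server receive time  (server clock)
    t2: server send time     (server clock)
    t3: client receive time  (client clock)
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2  # how far the client is behind the server
    delay = (t3 - t0) - (t2 - t1)         # time spent on the network, both ways
    return offset, delay


# Hypothetical exchange: the client clock is 0.250 s behind the server,
# each network leg takes 0.020 s and the server needs 0.001 s to reply.
offset, delay = ntp_offset_and_delay(t0=100.000, t1=100.270, t2=100.271, t3=100.041)
print(f"offset: {offset:.3f} s, delay: {delay:.3f} s")
```

The offset estimate is exact only when both network legs take the same time. Asymmetric paths are one reason NTP accuracy depends so much on the network.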

Q. How does NTP handle leap second?
A. NTP servers can be configured to smear the leap second and propagate the adjusted offset to the downstream devices. Google NTP servers do that.

NTP Pool

NTP pool is a virtual cluster of time servers hosted under subdomains of pool.ntp.org. It is not a real cluster managed by any one company: you can register a server running in your home if you have a reasonable internet connection and the server has a static IP. Currently there are 4300+ time servers in the NTP pool.

Accuracy of NTP

Accuracy depends on many factors: stratum level (the higher the level, the farther your clock is from the reference clocks, and hence possibly the lower the accuracy), network latency due to congestion or equipment, variability in network delay, the network path the request takes, network speed and quality, and so on. The ntpd and chrony daemons also only estimate the time based on these parameters.

Measuring NTP latency on your own is not easy, a special hardware setup is required. Facebook has its own NTP server (time.facebook.com servers), you can go through this article to understand how they measured NTP offsets for their experiments.

It’s very hard to quote a single figure, but depending on multiple factors, NTP accuracy can range from about 0.1 ms on a fast LAN to 10 ms or a lot more across intercontinental networks.

A good discussion on NTP accuracy could be found here.

Public NTP servers

If interested, here is a list of free public NTP servers.

Q. Coming back to the original problem of ordering in distributed systems, is our problem solved if we use NTP-synchronized clocks to order events?
A. An accuracy of around 10 ms (and it can be much worse) looks small at first glance. But for a massively scalable system, every microsecond, even every nanosecond, counts. So this accuracy is not good enough for most large consumer-facing tech companies.

Physical timestamps tend to impose a strict order on events. Example: let’s say there are two nodes whose physical clocks differ by 1 millisecond. The first node runs transaction A, and the second node concurrently runs another transaction B. A happens at timestamp 1614308336728, whereas B happens at timestamp 1614308336727. Notice the gap of 1 ms between A and B. Even though the transactions are actually concurrent, when you sort them by ascending timestamp, B appears before A. If there are thousands of such transactions happening at this very moment, imagine how messy the ordering would be. So by using physical time, we lose the notion of concurrency in this particular example. Similarly, two transactions that are not supposed to be concurrent may appear concurrent because of the timestamp difference. Depending on the system and business requirements, this kind of scenario might cause hard-to-trace bugs.
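The example can be replayed in a few lines of Python. The `Event` class and the skew values below are made up for illustration; `true_time_ms` stands for when something really happened, which no node can actually observe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    name: str
    node: str
    true_time_ms: int   # when it really happened (unknowable in practice)
    clock_skew_ms: int  # how far this node's clock is ahead of true time

    @property
    def timestamp_ms(self) -> int:
        """The timestamp the node actually records."""
        return self.true_time_ms + self.clock_skew_ms


# Case 1: A and B are truly concurrent, but node-1's clock is 1 ms ahead.
concurrent = [
    Event("A", "node-1", true_time_ms=1614308336727, clock_skew_ms=1),
    Event("B", "node-2", true_time_ms=1614308336727, clock_skew_ms=0),
]
print([e.name for e in sorted(concurrent, key=lambda e: e.timestamp_ms)])

# Case 2: payment really happens 3 ms before confirmation,
# but the paying node's clock is 5 ms ahead, so the order flips.
causal = [
    Event("payment", "node-1", true_time_ms=1000, clock_skew_ms=5),
    Event("confirmation", "node-2", true_time_ms=1003, clock_skew_ms=0),
]
print([e.name for e in sorted(causal, key=lambda e: e.timestamp_ms)])
```

In the first case the sort invents an order (B, then A) that never existed. In the second it reverses a real one, which is exactly the "order confirmed before payment" trace from the earlier debugging example.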

This is the general problem with physical time, even when clocks appear to be highly synchronized.

Q. How do we solve the problem now?
A. We have two ways to solve this at a high level:
Option 1: Forget that real clocks exist and use a simpler mechanism to define order. Numbers, yes!!! Use plain integer counters that increase monotonically. This mechanism is called a Logical Clock or Logical Timestamp.

Option 2: Use real clocks, but define a mechanism that keeps clock accuracy as high as possible and bounds the maximum time deviation that can be observed across nodes at any given instant as tightly as possible. This is tough to achieve. We’ll review this kind of clock in a later article.
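To give a first taste of Option 1, here is a minimal sketch of a counter-based logical clock. The article does not specify how counters on different nodes are combined, so the "take the maximum, then add one" rule on receive (a Lamport-style rule) and all names below are assumptions for illustration.

```python
class LogicalClock:
    """A plain integer counter used instead of physical time."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """A local event happened: advance the counter."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Sending a message is an event; the new counter travels with the message."""
        return self.tick()

    def receive(self, remote_time: int) -> int:
        """Jump past the sender's counter so the receive is ordered after the send."""
        self.time = max(self.time, remote_time) + 1
        return self.time


payment_node = LogicalClock()
order_node = LogicalClock()

paid = payment_node.send()            # payment service emits "paid"
confirmed = order_node.receive(paid)  # order service confirms after seeing "paid"
print(paid, confirmed)
```

No physical clock is involved, yet the confirmation is guaranteed to carry a larger number than the payment that caused it.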

Conclusion

I hope physical time sources and time synchronization are now quite clear to all of us. We have seen how important physical time is in our lives and how it relates even to space science. We now know that it’s not always possible to use physical time to order events in a high-scale distributed system.

In the next article of the series, we’ll discuss logical clocks in depth. Till then, digest this article and appreciate the scientists and engineers who have done so much awesome research on time and clocks and created such beautiful systems for the good of mankind.
