We often spend a lot of our lives blissfully unaware of how something works. Most of the time, this ends up being an okay thing, since we don’t really need to know how everything around us works. But there are some times when we realize just how much complexity has been hidden from us, abstracted out, tucked away neatly so that we never have to think about it.
I had one of these realizations recently when I discovered that I had never thought twice about a simple thing that I work with on an hourly basis: time! I’ve been using computers for the better part of my life, first as a consumer of technology and, later, as a creator of it. It wasn’t until I began learning about distributed systems (for this series!) that I understood that I didn’t know as much as I thought about telling time, much less how the computers that we use every day end up deciding what time it is.
When it comes to distributed computing, time is a completely different beast. But before we can really get into time in distributed systems, we’ll first need to understand how individual machines track time. So let’s start learning about what exactly makes time so darn tricky.
Most devices these days have some concept of what day it is and what time it is. With the exception of some “internet of things” devices (like a Raspberry Pi), all machines have some notion of time. But how do they figure out the day and time?
The answer is: with a clock! Now, this might seem obvious at first, but it gets a little trickier as we go on. A computer that needs to be able to tell the time will often have an internal clock that is built right into its hardware in the form of an integrated circuit. This circuit is often built right onto the motherboard. This small piece of hardware is known as a real-time clock, or RTC for short.
RTCs are particularly interesting because they come with an alternate power source (like a battery), which allows them to continue working even if a machine is powered off! This might seem obvious to us today, but the use of RTCs was a pretty significant milestone in the history of computing; early personal computers didn’t actually come with RTCs built-in, and they were added in later on. Now, we’ll find refrigerators and microwaves that have these little clocks built right in! That’s pretty wild if you think about it.
The RTC is responsible for keeping track of the current time, and we can think of it as one individual machine’s system clock. It’s important to note that this clock is specific to the “system” of the machine; in other words, any process or tasks or work that the machine does that relies on the time will inherently rely upon whatever time the system clock says it is.
So, how does this physical clock work, exactly? As it turns out, deep inside the integrated circuit is a crystal, which vibrates or oscillates; it’s known as a crystal oscillator.
Without getting into the nitty-gritty (read: physics) of how this works under the hood, all we really need to know for our purposes is that the vibrations of the crystal are captured and counted by the clock. As the crystal vibrates, the clock keeps count of each vibration in the form of ticks, and as it counts one tick after another, it keeps track of the time. Of course, this raises the question of where it keeps count of each tick that it records. The physical clock uses a binary counter circuit, a simple circuit that does nothing more than count in binary, in order to store these ticks. Conveniently, the binary counter circuit (the system clock) is where a machine derives its system time from. Indeed, just as a machine has a system clock, it also has its own notion of time that is based on that clock!
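To make the tick-counting idea a little more concrete, here is a minimal sketch of a clock as nothing more than a counter. The `TickCounter` class is hypothetical, and the 32,768 ticks-per-second figure is an assumption on my part (it is a common frequency for watch crystals, not something every clock uses).

```python
class TickCounter:
    """Toy model of a clock: count oscillator ticks, derive elapsed seconds."""

    def __init__(self, ticks_per_second: int = 32_768):
        # Assumed crystal frequency: 32,768 Hz, which is 2**15.
        self.ticks_per_second = ticks_per_second
        # Stand-in for the binary counter circuit: an integer that only goes up.
        self.ticks = 0

    def tick(self, n: int = 1) -> None:
        """Record n vibrations of the crystal."""
        self.ticks += n

    def elapsed_seconds(self) -> float:
        """Convert the raw tick count into seconds since counting began."""
        return self.ticks / self.ticks_per_second


clock = TickCounter()
clock.tick(32_768 * 90)         # simulate 90 seconds' worth of vibrations
print(clock.elapsed_seconds())  # 90.0
print(bin(clock.ticks))         # the same count, written in binary
```

Notice that the counter only knows how many ticks have gone by. It has no idea what the date is; that has to come from somewhere else.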
An interesting thing about system time is that it is always calculated from the moment the system clock began counting. This means that if we set the system clock on a new computer to be five minutes slow or two days fast, then the starting point for our system time, the “zero” of where we start counting, will end up being either five minutes behind or two days ahead. In other words, we’ll be measuring “ticks” based on whatever date or time we set, not on what the actual time is.
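Here is a small sketch of why a wrong starting point never fixes itself. The `system_time` helper and the specific date are made up for illustration; the only idea being shown is "system time = starting point + elapsed time".

```python
from datetime import datetime, timedelta, timezone


def system_time(start: datetime, elapsed_seconds: float) -> datetime:
    """System time is just the configured starting point plus counted time."""
    return start + timedelta(seconds=elapsed_seconds)


# Hypothetical moment at which two machines are switched on.
actual_start = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
# One machine's clock was set five minutes slow.
slow_start = actual_start - timedelta(minutes=5)

elapsed = 3600  # both machines count exactly one hour of ticks

correct = system_time(actual_start, elapsed)
behind = system_time(slow_start, elapsed)

print(correct)           # 2024-01-01 13:00:00+00:00
print(behind)            # 2024-01-01 12:55:00+00:00
print(correct - behind)  # 0:05:00
```

Even with perfect tick counting, the second machine stays five minutes behind until someone corrects its starting point.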
To help avoid some of the confusion of this problem, most machines follow some kind of convention when it comes to deciding what that starting point should be. For example, machines that have Unix operating systems have standardized around Unix time, which is a way of deciding what the “zero” of where we start counting time should be. In the case of Unix time, the starting point or “zero” is the start of the Unix epoch, or January 1st, 1970, at 00:00:00 UTC (Coordinated Universal Time). An epoch is meant to be arbitrary; it’s nothing more than an agreed-upon starting date and time for when we should start measuring time. My personal favorite epoch is the one for Microsoft Excel programs, which is 0 January, 1900! (For the epochly-curious, check out this extensive list.)
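We can peek at the Unix epoch ourselves. This is a quick sketch using Python's standard `time` and `datetime` modules; the variable names are my own.

```python
import time
from datetime import datetime, timezone

# Timestamp 0 is the "zero" of Unix time.
epoch = datetime.fromtimestamp(0, tz=timezone.utc)
print(epoch.isoformat())  # 1970-01-01T00:00:00+00:00

# The current system time, expressed as seconds counted since that zero.
now_seconds = time.time()
print(now_seconds)

# The same number, converted back into a human-readable date and time.
print(datetime.fromtimestamp(now_seconds, tz=timezone.utc).isoformat())
```

The big number that `time.time()` returns is the system's count of seconds since the epoch, as far as this particular machine's clock is concerned.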
Out of sync, out of control
Now that we know that a clock can start “counting” at arbitrary times, it’s time to add another factor to the mix. Namely, what happens if we have more than one clock? This is when things start to get, well, a little out of sync.
Since every machine can have its own notion of time, we can assume that two different machines will each have their own concepts of what time it is. But this is where things start to get a bit complicated: if two different machines have two different ideas of what time it is, how can we be sure that they are the same?
Unfortunately for us, computer clocks are not consistent.
We already know that every clock has a different idea of when it started counting and its “zero”, and that one clock could be inconsistent with another. However, it’s also worth mentioning that not all clocks are precise, and some are more precise than others. Over time, the precision of each “tick” of a clock really starts to have an impact on how that clock determines its time.
For example, a typical quartz clock will drift, losing or gaining approximately one second over the course of 11 or 12 days. This is due to a tiny imprecision that occurs as the clock measures a single second as it ticks. A single second may not seem like much, but over time, slight imprecisions really start to add up! The precision of a clock could be affected by temperature, location, the clock’s source of power, and even just how well it was constructed.
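To get a feel for how small that imprecision is, and how it adds up, here is a bit of back-of-the-envelope arithmetic. I'm assuming 11.5 days as the midpoint of "11 or 12 days"; the `drift_rate` helper is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def drift_rate(seconds_off: float, over_days: float) -> float:
    """Fraction of a second gained or lost per second of real time."""
    return seconds_off / (over_days * SECONDS_PER_DAY)


# One second off over 11.5 days (assumed midpoint of 11 to 12 days).
rate = drift_rate(1.0, 11.5)

# Drift is often quoted in parts per million (ppm).
print(f"{rate * 1e6:.2f} ppm")                             # 1.01 ppm
print(f"{rate * 365 * SECONDS_PER_DAY:.1f} seconds/year")  # 31.7 seconds/year
```

So an error of roughly one part in a million per tick turns into about half a minute over a year.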
This phenomenon of limited clock precision causing two clocks to count time differently is known as clock drift. Unfortunately, it is just a reality of any machine that needs to keep track of the time on its own. Because clock drift is so common, we often also find ourselves comparing two clocks that display two different times. This is known as clock skew, and it is the difference in time between two clocks.
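The relationship between the two terms is easier to see with numbers. Below is a rough sketch in which two clocks start in agreement but drift in opposite directions; the `clock_reading` helper and the drift values of plus and minus 1 ppm are assumptions chosen purely for illustration.

```python
def clock_reading(true_seconds: float, drift_ppm: float, offset: float = 0.0) -> float:
    """What a clock shows after true_seconds, given its drift in parts per million."""
    return offset + true_seconds * (1 + drift_ppm / 1_000_000)


thirty_days = 30 * 86_400  # seconds of "real" time

fast_clock = clock_reading(thirty_days, drift_ppm=+1.0)  # runs slightly fast
slow_clock = clock_reading(thirty_days, drift_ppm=-1.0)  # runs slightly slow

# Skew is simply the difference between the two readings at the same instant.
skew = fast_clock - slow_clock
print(f"skew after 30 days: {skew:.3f} seconds")  # skew after 30 days: 5.184 seconds
```

Drift is the rate at which each clock wanders; skew is the gap we observe when we compare the two at a given moment.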
In a perfect world where two clocks agreed on the time, we wouldn’t ever run into either of these concepts! In such a utopia, both clock drift and skew would be zero.
Alas, we do not live in such a world, and so we must think of both of these things when comparing two different machines and their times. Instead, we live in a world where things are messy and…distributed.
No one clock to rule them all
We’ve talked a lot about clocks and how they work and how they disagree, but what does that have to do with distributed systems, exactly? As it turns out, all of this clock talk is leading us to one of the core, foundational principles of distributed systems that, somehow (!), we haven’t covered yet. Now, it’s finally time to talk about it.
As we already know, in a distributed system, all the individual components are called nodes, and they are each autonomous, capable of performing their own work. We also learned that each node has its own notion of time, and keeps track of its own time internally. When we combine these two facts together, we arrive at one singular conclusion:
There is no one, true global clock in a distributed system.
Every node in the system has its own concept of time, and there is no centralized place for the nodes in a distributed system to figure out what time it actually is. And perhaps this wouldn’t matter if we didn’t care about time so much! We use time so frequently in computing, particularly in order to figure out when an event occurred, and which event happened before another.
So what happens when we don’t have a global clock in a distributed system? Well, for starters, we can’t actually know the real time that any two events occurred, or the order that they occurred in. This makes it very hard to figure out how two events might be scheduled in the future. It also makes a distributed system really difficult to debug, since we don’t know for sure whether one event occurred before the other!
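Here is a tiny sketch of how this goes wrong. Two hypothetical nodes each stamp an event with their own local clock, and we then try to order the events by those timestamps. The `Event` class, the node names, and the offsets are all invented for the example; in a real system the `true_time` field is exactly the thing we cannot observe.

```python
from dataclasses import dataclass


@dataclass
class Event:
    name: str
    true_time: float    # seconds; unknowable in a real distributed system
    node_offset: float  # how far this node's clock is from the true time

    @property
    def local_timestamp(self) -> float:
        """The timestamp the node would actually record."""
        return self.true_time + self.node_offset


# Node A's clock runs 250 ms ahead; node B's runs 50 ms behind (assumed values).
write = Event("node A writes x=1", true_time=10.000, node_offset=+0.250)
read = Event("node B reads x", true_time=10.100, node_offset=-0.050)

by_truth = sorted([write, read], key=lambda e: e.true_time)
by_local = sorted([write, read], key=lambda e: e.local_timestamp)

print([e.name for e in by_truth])  # ['node A writes x=1', 'node B reads x']
print([e.name for e in by_local])  # ['node B reads x', 'node A writes x=1']
```

The write really happened first, but the recorded timestamps say the read did. With only local timestamps to go on, we would reconstruct the wrong order.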
We’ll be talking more about time and the ordering of events in upcoming posts, and hopefully we’ll find some tried and true solutions for this problem. Until then, try not to look at your watch or think too much about what time it is.
Clocks and time in distributed systems can be one of the most unintuitive concepts to wrap your head around — especially if you’re new to distributed systems! Thankfully, there is a ton of course material out there that covers these topics in great detail. Below are some of my favorites, which I relied upon myself when I was learning about clocks and time!