Vector Clocks

Event keeping in distributed systems

Aditya Shete
9 min readMay 21, 2022

Introduction

Time is weird, little understood, and for most, a simple tool to measure if their favourite show is on. The notion of time and clocks is distinct. A distributed system person would give a lot for entirely accurate timekeeping clocks. Most problems in the domain are related to skews in clocks, unreliable timings of messages, etc. Synchronization of such systems, which have no other recourse than physical clocks, is a demanding ask. Even after this is achieved, any solution based on a particular system configuration is brittle and hard to scale. A general algorithm that takes minimal assumptions would be an ideal solution. In this article, we will not be using physical time measuring clocks.

Instead, we turn our attention to logical clocks.

Logical Clocks

Logical clocks are event counters, and a system of logical clocks can keep virtual time if they follow these steps:

  • Each event increments its local logical clock before execution. We can say that e1 was executed at that incremented time.
  • Each message send event will contain its local logical clock’s value.
  • Each message receive event will increment its logical clock to the max (local time, message time) + d.
A system of logical clocks in operation. Each process updates its logical clock to max of local, message time plus some increment.

This set of conditions allows us to have virtual time. What is virtual time, you may ask? It is physical time with some caveats. We can model time in the following manner:

1. Transitive

2. Irreflexivity

3. Linearity: For any x, y, C (x+y) = C(x) + C(y)

4. Eternity: For any x there are y, z, such that y < x and x < z.

5. Density: For any x < y, there is z such that x < z< y.

Any set of objects satisfying these properties could model time, say Q the rational numbers, or R the real numbers.

Does our system of logical clocks satisfy these axioms?

No, we have the following caveat:

5a. Discreteness

Our virtual time is not dense; it is discrete and modelled by Z, the integers. The crux of the model of time is this:

The future cannot affect the past

This determines the virtual time a logical clock can assign an event. An event of sending a message must always have a lower virtual time than its associated receive event. We thus have messages acting as carriers of force, i.e. they can cause events in the future. Events in different processes which never talk cannot be compared using virtual time as there is no reference between such processes. Such events where comparison is untenable can be said to be concurrent. The idea of concurrency is based on causality; independent events cannot affect each other.

This is the exact shortcoming of virtual time.

What is this shortcoming you ask?

The answer lies in the way our clocks work:

If e < e’ then C (e) < C (e’). But not vice-versa.
For if C (e) < C (e’), the events could have happened independently and yet satisfy this condition.

Vector Clocks

Therein lies the idea of vector clocks; these assign virtual time to events such that concurrent events get timestamps which force us to conclude they are independent.

So, what are the implementation level details of vector clocks?

Unsurprisingly, it contains vectors, or for the even more pedantic, an array of logical clocks. Each process is numbered [0, n], which is the length of the vector clock. We modify the requirement of logical clocks to this:

  • Pi increments the ith logical clock before execution for each event in the process.
  • Each message sent event will contain the processes vector clock’s value.
  • Each message receive event will increment its vector clock to sup (local time, message time).

The sup is the supremum operation, which takes the max value between the two vectors for each logical clock in the vector. Example sup ([1, 0, 3], [5, 0, 0]) = [5, 0, 3]
We now have the following properties in a system of vector clocks:

  • At any time instant, for any i, j we have Ci[i] ≥ Cj[i]. More verbosely, the local ith logical clock, corresponding to process Pi, is the most accurate component in the vector clock of Pi.
  • Any two vectors clocks can be partially ordered by <. Where u < v is defined as for all i, u[i] < v[i]. We define u and v as concurrent if neither u < v nor v < u.
  • If e occurs at Pi, for any event e’ such that e < e’, iff C(e)[i] =< C(e’)[i].
An example of a system of vector clocks, where each message contains the vector timestamp and each process updates its local vector clocks upon receiving a message.

Now that we have covered vector clocks let’s look at some applications for vector clocks over virtual clocks.

  • Debugging: Events which can cause errors need to be determined. This is to say that concurrent and later events must be determinable.
  • Deadlock, Termination: We need to determine if the global state of the system has terminated and hence can be restarted safely.

Global Snapshots

A global snapshot of a distributed system is the merging of all the local states and message channels to form a global patchwork that can exist at a certain time.

Before we move on to the problem of getting such a global snapshot, we need some definitions.

  • A cut is a proper subset of all the events in our system, with the additional property that each process has at least one event in cut C.
  • The maximal elements of each process Pi in C are known as cut events.
  • A cut will contain all the local events that occurred previous to Pi’s maximal element in C.
Image showing multiple possible cuts in a system of distributed processes.

Informally, a cut partitions the event set into PAST and FUTURE, with the maximal events of each process Pi in C forming the “PRESENT”. We can use the set-theoretic manner of comparing cuts to measure time. Each cut C1 can be later than C2 if C2 is a proper subset of C1.
Consider the more important requirement: We require the cut events to represent the system’s PRESENT and not an arbitrary collection. We have to first ask:

What forms a PRESENT of the system? Any set of events occurring in parallel with no dependencies on other events, such as a message receives.

  • We define a consistent cut as a cut such that for all events e in C, if e’ < e in E, e’ belongs to C.
An example of inconsistent cut. The inconsistent cut C has an event in the cut which is casually effected by an event not in the cut.

Using this definition, we have strengthened our cuts to include all possible events which could have affected each event present in the cut. We thus have ensured that the cut event will consist of events occurring concurrently, or else it is not a consistent cut.

Why all this drama? Our snapshot problem is now clarified to be determining consistent cuts of the system.

What is the time of a cut? We define the time of a cut event as sup (all clocks in a cut event). Sup is as we expressed for a set of vectors.

An example showcasing the global vector time at various points of time.

For consistent cuts, we have the simpler definition that the time of the cut is simply for all in [0, number of processes]: (…… C(ei)[i],…..). This can be understood from our previous discussion:

All events in a consistent cut event being concurrent and the fact that each process’ logical clock element is the most up-to-date. Thus we have a vector clock based method to determine whether a cut is consistent.

Computing global snapshots

We have a relatively straightforward algorithm to compute the global snapshot for systems with physical time:

  • All processes agree on some future time s.
  • A process takes a local snapshot at time s.
  • After time s, the local snapshots are merged to form a global state.

How to extend this algorithm to work with vector clocks?
The first requirement is that the time s corresponds to a consistent cut of events. This can be achieved at any point of a process Pi if we use messages to relay a request for a snapshot. This can be seen by the fact that the local logical clocks at i for Pi, at any moment for any process, are at their most accurate value. Thus the condition of consistency of events is trivially satisfied.

We have the following algorithm for a system of vector clocks:

  • Pi ticks and sets the time s = Current time + (0… 1(ith position), 0…0) as the common snapshot time.
  • Pi broadcasts this timestamp to all processes.
  • Pi awaits responses confirming the acceptance of the common snapshot time.
  • Pi tick to the common snapshot time and hence takes a local snapshot and broadcasts a dummy message to all processes.
  • The dummy message, in turn, forces the local time to be sup (local, snapshot time), which is strictly greater than snapshot time. Thus, for all processes, a local snapshot event is triggered.
  • All processes then respond with their local snapshots to Pi.

Variations and improvements

We see that the algorithm makes minimal assumptions regarding the message channels used; a system of logical clocks requires FIFO channels.

A snapshot triggered by any local event is guaranteed to be consistent; consistent is merely possibly happening together and not actually concurrent if measured by physical clocks.

The process Pi is kept waiting in the algorithm, not an ideal state of affairs. Can we improve upon this situation? If we notice that Pi has already taken its snapshot, we can do away with the entire first broadcast. Instead, we create a local Boolean flag to show that a message is after a snapshot has been taken. Each process will take a snapshot and further set its local Boolean flag to true upon receiving such a message. If we can guarantee that all messages are eventually received, we do not have to care about the message order. Any snapshot that will be taken will be of consistent cut.

Another glaring flaw in our algorithm, the algorithm does not capture messages in transit. The global state of our system should also consist of information regarding in-transit messages. We could augment the procedure such that processes sent to Pi all the messages received after taking the snapshot. The obvious problem with this schema is this: Pi cannot know when it has gathered all the messages in transit. It must wait idly indefinitely, or we must provide it with some message transmission upper bound; both solutions are inelegant.

A better way we can go about this is to have deficiency counters for messages at all processes; these are counters which count (messages sent — messages received). If all processes pass this information to Pi, it can calculate the messages in transit at the time of the snapshot. Any process that sends messages after it has taken its snapshot will change its flag; all further messages cause a receiving process to take snapshots.

Conclusion

While vector clocks solve many of the problems a system of logical clocks are not crafted for, there are certain drawbacks to using a system of vector clocks.

  • Message loads: Each message contains the full vector representing the local vector clock in its messages. Thus the bandwidth of a large system can suffer.
  • Poor scalability: A scalable system can dynamically add new processes. A system of processes with vector clocks would not be able to dynamically change the number of processes.

The paper, reference[1], has some interesting and lucid theoretical derivation of all the properties I have given, plus a discussion on the nature of vector time.

References:

[1] Virtual Time and Global States of Distributed Systems: *Virtual Time and Global States of Distributed Systems (ethz.ch)

All images are available in the paper.

--

--