Message Ordering in Distributed Systems

Sriram R
4 min readFeb 22, 2023

--

The order in which messages are processed determines the final outcome of the actions in any distributed system. This is actually more difficult than it appears to be.

Why is the order of messages important?

Consider the following scenario: A user initially dislikes a post but quickly realizes they want to like it and does so almost immediately.

Since network delays can vary, let's say the "like" request comes before the "dislike" request. We would dislike the post if we processed the messages in the order they were received, but the user’s intent was to like the post.

Thus, determining the order of two messages is critical for our applications.

Types of Ordering

There are two ways to order.

  1. Total Order
  2. Partial Order

Total Order

Given a set of events "A, B,C,D,’ there exists a single order of all messages in total order.

For instance, if we have a set of numbers 7,4,5,9,1 and want to order them using the < relation, there is only one order1,4,5,7,9.

Total order is used in single-machine systems where it is easy to figure out the order of events because they happen one after the other. For any events that happen at the same time, the machine's clock can be used since it’s a single global clock for all processes.

Total ordering, on the other hand, is not possible in distributed systems because different machines have different clocks and concurrent events cannot be ordered.

Partial Order

Only some events can be ordered in a partial order, while others cannot. As a result, multiple orders are generated for a given set of values.

For example, if we have a list of events "A, B,C,D’ and we know that events A and B are ordered, but Event C occurred at the exact same millisecond as Event A and Event D occurred at the exact same millisecond as Event B, there is no way to order these four events in a single order because they occurred concurrently.

Partial order is commonly used in distributed systems because some events occur in order while others do not, which is consistent with the theme of partial ordering.

Happens Before Relation

When we have two events A and B, we say that A happens before B (A -> B) if

1. Event A occurs before Event B in the Node Execution Order
2. Event A is the transmission of Message M, and Event B is the reception of the same message, because a message cannot be received before it is transmitted.
3. There is a Message C such that A occurs before C and C occurs before B.

If we can't establish a happens-before relationship between two messages, it means they happened at the same time. ( a || b ).

In this diagram

  1. Events A and B occurred in the same node, and A occurred before B in the execution order of the node.
  2. Event A is the transmission of Message M, and Event B is the arrival of the same message, because a message cannot be received before it is transmitted.
  3. There is a Message C such that A occurs before C and C occurs before B.

Causality

Take, for example, social media. Do we really care about the order in which we see two unrelated posts? Most likely not.
As a result, the system could use Partial Ordering here, where posts that cannot be ordered are displayed in a random order.

However, there are some events that must be shown in chronological order. For example, if User A responds to Comment C1 by User B, User A’s comment must appear after User B’s comment. The conversation would be difficult to follow otherwise.

What we described in the preceding example is the concept of Causality, which states that one event contributes to the occurrence of another, i.e. Event B occurred solely because Event A occurred.

Based on what we’ve seen so far, we can also conclude that if Event A happened before Event B, then Event A may have caused Event B. It is critical to emphasise that this is a possibility, not a guarantee.
However, if Events A and B occur concurrently, we can be certain that Event A did not cause Event B and vice versa.

Establishing Causal Ordering

Assume we want to determine the causal ordering of two events. Is it accurate to rely on Physical Time, especially since two nodes can drift at any point in time?

This gives rise to the concept of Logical Time, in which we don’t need the date and time to order events causally. In the following article, we’ll go over the specifics of Logical Time.

--

--

Sriram R

A software engineer on the way to a jack-of-all-trades.