Message Ordering in Kafka

Yashraj
Published in TechVerito
5 min read · Apr 25, 2024


In a modern microservice architecture, a powerful messaging system is the backbone of the whole setup. Kafka is one such messaging system, offering high-throughput event stream processing. Compared to other messaging systems like RabbitMQ, Kafka shines in the volume of messages it can handle, which can reach millions of messages per second. How does it achieve this? Well, it uses two superpowers:

  1. Kafka stores messages in append-only log files on disk and reads and writes them using sequential disk I/O. Sequential I/O accesses data stored in adjacent locations on disk, which is much faster than random disk access.
  2. A Kafka broker, which is a Kafka server, hosts topics and their partitions. A partition is a smaller unit of storage within a topic that consumers read from. A topic can have one or more partitions.
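To make the idea of a log-backed partition concrete, here is a minimal sketch in Python. It is a toy model and not Kafka's actual storage code: the class name `PartitionLog` and its methods are hypothetical, and an in-memory list stands in for the log file on disk.

```python
class PartitionLog:
    """Toy model of one partition: an append-only sequence of records."""

    def __init__(self):
        self._records = []

    def append(self, message):
        # New records only ever go at the end; the offset is the position.
        self._records.append(message)
        return len(self._records) - 1

    def read_from(self, offset):
        # Consumers scan forward from an offset instead of jumping around.
        return self._records[offset:]


log = PartitionLog()
first = log.append("M1")
second = log.append("M2")
print(first, second, log.read_from(0))
```

Because writes always go to the end and reads always move forward, the access pattern stays sequential, which is the property the first point above relies on.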

But why Partition?

Now, the use of sequential disk I/O is self-explanatory, but why partition? Consider this: if a topic were constrained to live on one node in the cluster, that would limit the size of the topic and make it difficult to scale, i.e. a topic could never get bigger than the biggest node.

Therefore, a topic can be divided into several partitions, and each partition can live on a separate node in the Kafka cluster. This is what makes Kafka an efficient distributed system.

Still, if one chooses to have only one partition per topic, things are simple: messages meant for that topic are stored there and consumers read from it.

But how does it work for multiple partitions?

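When a message has no key, Kafka spreads incoming messages across the partitions, classically using a round-robin algorithm (since Kafka 2.4 the default partitioner fills a batch for one partition before moving to the next, but the messages still end up spread over all partitions). Let’s look at the scenario below, where one producer generates messages M1, M2, M3, and M4.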
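As a rough illustration of round-robin assignment, the sketch below distributes the four messages over two partitions. The function name `assign_round_robin` and the choice of two partitions are assumptions made for this example; this is not the Kafka client's partitioner code.

```python
from itertools import cycle


def assign_round_robin(messages, num_partitions):
    """Hand out messages to partitions 0, 1, ..., n-1, 0, 1, ... in turn."""
    partitions = {p: [] for p in range(num_partitions)}
    next_partition = cycle(range(num_partitions))
    for message in messages:
        partitions[next(next_partition)].append(message)
    return partitions


layout = assign_round_robin(["M1", "M2", "M3", "M4"], num_partitions=2)
print(layout)  # {0: ['M1', 'M3'], 1: ['M2', 'M4']}
```

Each partition keeps its own messages in arrival order, but nothing ties the two partitions together.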

This looks good! But wait, what if the consumer application is running multiple instances that all read from Topic A? In that case, all these instances are added to something called a consumer group.

Each consumer in a group is allocated one or more partitions to read from.
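The allocation can be pictured with the simplified sketch below. It only illustrates the rule that each partition goes to exactly one consumer in the group; Kafka's real assignment strategies are more involved, and the function and consumer names here are hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Give every partition to exactly one consumer, cycling through the group."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


two = assign_partitions([0, 1], ["instance-1", "instance-2"])
three = assign_partitions([0, 1], ["instance-1", "instance-2", "instance-3"])
print(two)    # {'instance-1': [0], 'instance-2': [1]}
print(three)  # {'instance-1': [0], 'instance-2': [1], 'instance-3': []}
```

Note the second case: with more consumers than partitions, the extra consumer has nothing to read.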

Let’s say we have two consumer instances running for an application.

Observe the output messages M3, M2, M1, and M4, which are not in the order sent by the producer. This may happen because instance 2 was faster and read message M3 before instance 1 read M1.

This is not desired behaviour. Imagine an order placed event being processed before the payment received event in an e-commerce app.

Why does this happen?

Kafka does not guarantee message ordering across the partitions of a topic.

The Solution

Luckily, Kafka doesn't leave us without options.

We can assign a key to a message when publishing it. When the partitioner sees a key, it hashes the key to pick a partition, so all messages with the same key go to the same partition.
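The key-to-partition mapping can be sketched as a hash of the key modulo the number of partitions. In the example below, CRC32 is only a stand-in chosen for illustration (the Kafka Java client uses a different hash function), so the partition numbers it prints will not match a real cluster. The function name `partition_for` is hypothetical.

```python
import zlib


def partition_for(key, num_partitions):
    # Stand-in hash (CRC32), NOT the hash Kafka uses. The point is only
    # that the mapping is deterministic: same key, same partition.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


for key in ["K1", "K2", "K1"]:
    print(key, "->", partition_for(key, num_partitions=2))
```

Whatever hash is used, the mapping also depends on the partition count, so adding partitions to a topic later can move a key to a different partition.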

Let’s assume messages M1 and M3 belong to a particular order, where

M1 = Payment received and M3 = Order placed

Therefore, the producer sends M1 and M3 with the same key, K1.

Kafka ensures that, within a consumer group, only one consumer reads a given partition at a time.

Due to this constraint, messages M1 and M3 are always read in the order in which the producer sent them.

But note that the overall output is still not in the same order as sent by the producer, since ordering is only guaranteed per key. The output may be M1, M2, M4, M3 or some other combination, but the relative order of M1 and M3 is always preserved, i.e. M1 is always received before M3. That means,

Message order is only guaranteed within a partition
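A small simulation makes this guarantee visible. The sketch assumes a hypothetical layout in which M1 and M3 share one partition and M2 and M4 sit in another. It merges the two partitions in a random order, the way two independent consumers might, and checks that M1 always comes out before M3.

```python
import random


def interleave(partitions, rng):
    """Merge partitions in random order while keeping each partition's own order."""
    queues = [list(partition) for partition in partitions]
    output = []
    while any(queues):
        queue = rng.choice([q for q in queues if q])
        output.append(queue.pop(0))  # always take the oldest unread record
    return output


# Hypothetical layout: M1 and M3 carry the same key, so they share a partition.
partitions = [["M1", "M3"], ["M2", "M4"]]
rng = random.Random(7)
for _ in range(5):
    seen = interleave(partitions, rng)
    assert seen.index("M1") < seen.index("M3")
    print(seen)
```

The overall sequence changes from run to run, but the assertion never fails, because records within one partition are only ever consumed front to back.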

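Now, how does this solve the e-commerce problem mentioned earlier? If every event for an order is published with the same key (for example, the order's ID), the payment received and order placed events land in the same partition and are consumed in the order they were published.

Of course, in real-world applications more than one key will be assigned to a partition.

Let’s get our hands dirty and try out this concept. We will set up a Kafka server, publish messages with keys, and observe how it behaves.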

Run this docker-compose file using the following command:

docker-compose up

Now we have Docker containers running a Kafka broker and a ZooKeeper server.

Now let’s go inside the Kafka container’s bash shell:

docker exec -it kafka1 bash

Once we are inside, let’s create a topic

kafka-topics --bootstrap-server kafka1:19092 \
--create \
--topic test-topic \
--replication-factor 1 --partitions 2

As is evident, we are creating a topic named test-topic on our local Kafka server, which is listening on port 19092. We are also specifying that the topic will have two partitions.

Now that the topic exists, let's set up a console producer that will publish messages to it. The command below uses docker exec, so run it from a terminal on the host rather than from inside the container.

docker exec --interactive --tty kafka1  \
kafka-console-producer --bootstrap-server kafka1:19092 \
--topic test-topic \
--property "key.separator=-" --property "parse.key=true"

We will be publishing messages to test-topic. The parse.key property tells the producer to read a key from each input line, and key.separator specifies the token that separates the key from the message. Here that token is a hyphen (-).

Now we can publish messages like keyText-messageText
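To show what this input format implies, here is a small sketch of splitting such a line into key and message. It assumes the line is split at the first occurrence of the separator, which is how the console producer is expected to behave. The helper name `split_keyed_line` is hypothetical.

```python
def split_keyed_line(line, separator="-"):
    """Split 'key<separator>message' at the FIRST separator (assumed behaviour)."""
    key, found, message = line.partition(separator)
    if not found:
        raise ValueError(f"no key separator {separator!r} in line: {line!r}")
    return key, message


print(split_keyed_line("K1-Payment received"))  # ('K1', 'Payment received')
print(split_keyed_line("K1-Order-placed"))      # ('K1', 'Order-placed')
```

Under that assumption, a hyphen inside the message is harmless, but a hyphen inside the key would cut the key short, so pick a separator that cannot appear in your keys.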

Similarly, on the consumer side, we set up two console consumers.

Run the following command in two different terminal windows representing two consumers listening to test-topic.

docker exec --interactive --tty kafka1  \
kafka-console-consumer --bootstrap-server kafka1:19092 \
--topic test-topic --group console-consumer-1 \
--property "key.separator=-" --property "print.key=true"

One interesting thing to note here is the group flag. By giving both consumers the same consumer group, we specify that they share the work of reading the topic, with each one assigned a different partition.

As we can see in the above image, the producer on the left-hand side publishes messages with certain keys, and messages with the same key are always read by the same consumer.

Therefore, messages with the same key are always read in the order in which the producer published them.

We can also verify how the partitions of the topic are assigned by using the describe flag on our consumer group:

docker exec --interactive --tty kafka1  \
kafka-consumer-groups --bootstrap-server kafka1:19092 \
--describe --group console-consumer-1

The output shows the topic's two partitions assigned to the group’s consumers.

In conclusion, if message ordering matters for a particular use case, we must publish the messages that need to stay in order with the same key.

All the commands can be found here.

Thanks for reading, hope this helped. Cheers!!