Kafka Demo

Sahan Vithanage
6 min read · Jul 19, 2022


In Kafka we can have a topic with multiple partitions. Let’s take an example where we have an Employee-topic with 3 partitions and a Project-consumer group with a single consumer that listens to all 3 partitions.

Use-case 1 (Source: www.krishantha.com)

Now we add one more consumer and scale out. Consumer 1 now listens to partitions 0 and 1, and consumer 2 listens to partition 2. The assignment is effectively arbitrary; any consumer can end up with any partition.

Use-case 2

If we add one more consumer, each consumer will listen to exactly one partition.

Use-case 3

If we add even more consumers to speed up the process, Kafka won’t send them any messages, and all those extra consumers will be idle, because in a consumer group only as many consumers as there are partitions can receive messages. But there are use cases where we want additional consumers as standbys in case one of our consumers crashes. In that case we already have running consumers that can take over immediately and continue processing the messages.

Now the financial-consumer-group also listens to the employee-topic, with two consumers.

Demo

First let’s write a Node program.
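Here’s a minimal sketch of what consumer.js could look like with the kafkajs client (the broker address, client ID, and group ID here are assumptions):

```js
// consumer.js — a minimal kafkajs consumer (sketch)
const { Kafka } = require('kafkajs')

const kafka = new Kafka({
  clientId: 'kafka-demo',      // assumed client id
  brokers: ['localhost:9092'], // assumed local broker
})

const consumer = kafka.consumer({ groupId: 'project-consumer-group' })

const run = async () => {
  await consumer.connect()
  // kafkajs v2 subscribe syntax; v1 uses { topic: 'employee-topic' }
  await consumer.subscribe({ topics: ['employee-topic'] })
  await consumer.run({
    eachMessage: async ({ partition, message }) => {
      console.log(`partition ${partition}: ${message.value.toString()}`)
    },
  })
}

run().catch(console.error)
```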

Next let’s create employee-topic with three partitions.
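We can create the topic with the kafkajs admin client (or equivalently with the kafka-topics CLI); a sketch:

```js
// create-topic.js — create employee-topic with 3 partitions (sketch)
const { Kafka } = require('kafkajs')

const kafka = new Kafka({ clientId: 'kafka-demo', brokers: ['localhost:9092'] })
const admin = kafka.admin()

const run = async () => {
  await admin.connect()
  await admin.createTopics({
    topics: [{ topic: 'employee-topic', numPartitions: 3 }],
  })
  await admin.disconnect()
}

run().catch(console.error)
```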

Use-case 1

After that, let’s run the consumer with the node consumer.js command. We can see that all three partitions are now assigned to this single consumer.

Use-case 2

Now let’s open another terminal and start another consumer. We can see that our new consumer listens to partitions 0 and 1.

If we look at the other consumer, it’s now only listening to partition 2.

This way Kafka automatically rebalances the partitions.

Now let’s run the producer: node producer.js.
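A sketch of what producer.js might contain — sending a numbered stream of test messages with no key, so they spread across the partitions (the message contents are assumptions):

```js
// producer.js — send 100 keyless test messages (sketch)
const { Kafka } = require('kafkajs')

const kafka = new Kafka({ clientId: 'kafka-demo', brokers: ['localhost:9092'] })
const producer = kafka.producer()

const run = async () => {
  await producer.connect()
  for (let i = 1; i <= 100; i++) {
    // no key: messages are distributed across the partitions
    await producer.send({
      topic: 'employee-topic',
      messages: [{ value: `message ${i}` }],
    })
  }
  await producer.disconnect()
}

run().catch(console.error)
```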

consumers receiving messages

It can be seen that Kafka balances the load between the consumers.

Now if we add another consumer, one consumer will be idle; it won’t listen to any partitions. But if we kill one of the three working consumers, the idle consumer will start listening to a partition again.

In our previous article we discussed that Kafka only guarantees ordering within a partition. Based on the key, each message is sent to its relevant partition. So in our application, if we use the customer ID as the key, it’s guaranteed that all messages related to that customer will go to the same partition and be processed in order.

Let’s demonstrate this scenario by hard-coding our key and then sending the messages again (run producer.js). A sketch of the change appears after the note below.

Note: If a key is not provided at all, Kafka uses a round-robin assignment that distributes messages evenly among the partitions. In that case, the ordering of related messages is not guaranteed, because they can end up in different partitions.
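Only the send call in the producer sketch above needs to change; the key value here is arbitrary:

```js
// producer.js — hard-coded key: every message now hashes to the same partition
await producer.send({
  topic: 'employee-topic',
  messages: [{ key: 'customer-1', value: `message ${i}` }],
})
```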

hard coding the key in producer.js
One consumer gets all the messages

Now let’s create another consumer with a different groupId. This new consumer will listen to all three partitions, and since we set fromBeginning to true, it’ll receive all the messages from the start as well.

finance group
fromBeginning: true
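A sketch of the finance-group consumer — the only differences from consumer.js are the groupId and fromBeginning:

```js
// finance-consumer.js — a second consumer group reading from the start (sketch)
const { Kafka } = require('kafkajs')

const kafka = new Kafka({ clientId: 'kafka-demo', brokers: ['localhost:9092'] })
const consumer = kafka.consumer({ groupId: 'financial-consumer-group' })

const run = async () => {
  await consumer.connect()
  await consumer.subscribe({ topics: ['employee-topic'], fromBeginning: true })
  await consumer.run({
    eachMessage: async ({ partition, message }) => {
      console.log(`partition ${partition}: ${message.value.toString()}`)
    },
  })
}

run().catch(console.error)
```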

In Kafka, messages are sent to consumers in batches. As the consumer processes messages, it commits the consumer offset back to Kafka. The offset is like a record that stores the last message delivered to a consumer. So let’s assume Kafka sends 100 messages and the consumer crashes on the 85th message; does Kafka have to send all 100 messages again? The kafkajs client gives us two properties to control how the offset is committed.

consumer.js

In this code, autoCommitThreshold: 10 means the consumer offset is committed back to the server after every 10 messages. Alternatively, autoCommitInterval lets us commit on a timer, e.g. every minute.
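A sketch of the relevant part of consumer.js — these are kafkajs run options, and only the run call changes:

```js
// inside the async run() from the consumer sketch above
await consumer.run({
  autoCommitThreshold: 10,      // commit the offset after every 10 messages
  // autoCommitInterval: 60000, // ...or commit on a timer, e.g. every minute
  eachMessage: async ({ partition, message }) => {
    console.log(`partition ${partition}: ${message.value.toString()}`)
  },
})
```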

So in our problem, everything up to the 80th message will have been committed, and from there onwards Kafka will resend the rest of the messages. Of course, we can even commit message by message, but that hurts throughput, since Kafka has to write and then read an offset commit for every single message. To do that, we have to set autoCommit: false.

Now let’s modify our consumer.js file and run it.

In the message handler, we set a timeout to mimic a time-consuming process, and we throw an error at the 66th message.
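A sketch of that modification (the delay length and the message counter are assumptions):

```js
// inside the async run() from the consumer sketch above
let count = 0
await consumer.run({
  eachMessage: async ({ message }) => {
    count += 1
    // mimic a time-consuming process
    await new Promise((resolve) => setTimeout(resolve, 100))
    if (count === 66) {
      throw new Error('simulated crash at message 66')
    }
    console.log(`processed: ${message.value.toString()}`)
  },
})
```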

Now it breaks at the 66th message, but after that it keeps retrying from the 61st. We discussed above that Kafka processes these messages as a batch; that’s why it starts again. Now, if we stop this and run the consumer again, it starts from the 65th message, which has already been processed.

Now let’s make a few changes to our program.

consumer.js

In the consumer configuration we set retries to 0, so Kafka won’t retry and process the same batch.
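A sketch of that change — kafkajs accepts retry options per consumer:

```js
// consumer.js — disable retries so the failed batch isn't re-run (sketch)
const consumer = kafka.consumer({
  groupId: 'project-consumer-group',
  retry: { retries: 0 },
})
```

Let’s run the program again.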

Now we’ve dealt with the retrying, but Kafka still processes the 95th message, which has already been processed. What happens here is that the 95th message succeeds and commits its offset (the last committed message), but the 96th message (which crashed) never commits an offset. Because of that, when Kafka restarts, it sees that the last committed offset is 95 and sends that message again. To fix this, we add one to the offset.
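With autoCommit: false we commit manually, and we commit message.offset + 1 so that after a restart Kafka resumes at the first unprocessed message; a sketch:

```js
// inside the async run() from the consumer sketch above
await consumer.run({
  autoCommit: false,
  eachMessage: async ({ topic, partition, message }) => {
    // ...process the message...
    // commit offset + 1: the committed offset is the NEXT message to read
    await consumer.commitOffsets([
      { topic, partition, offset: (Number(message.offset) + 1).toString() },
    ])
  },
})
```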

In Kafka there’s another property called sessionTimeout. If processing takes longer than this sessionTimeout, the consumer may be detached from the cluster, since Kafka won’t receive any heartbeats from it in time. But if we increase the sessionTimeout and a consumer really is dead, performance suffers, because rebalancing won’t happen within that interval. So by manually sending heartbeats we can keep the sessionTimeout low and still prevent consumers from being detached unexpectedly.
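A sketch of that idea — kafkajs passes a heartbeat function into the message handler, and sessionTimeout is a consumer option (the values here are assumptions):

```js
// keep sessionTimeout low and send heartbeats manually during slow work
const consumer = kafka.consumer({
  groupId: 'project-consumer-group',
  sessionTimeout: 10000, // must be larger than the heartbeat interval
})

// inside an async function:
await consumer.run({
  eachMessage: async ({ message, heartbeat }) => {
    // ...long-running processing...
    await heartbeat() // tell the broker this consumer is still alive
  },
})
```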
