In the Land of Streams — Kafka Part 2: The rise of the Consumers

A Kafka Streaming Ledger

Giannis Polyzos
7 min read · Dec 13, 2022
Cover image: https://www.vecteezy.com/free-vector/squirrel-cartoon

This blog post series consists of the following parts:
- Part 1: A Producer’s Message
- Part 2: The Rise of the Consumers (this blog post)
- Part 3: Offsets and how to handle them
- Part 4: My Cluster is Lost!! — Embracing Failure

In the previous post, we discussed how the producing side works when we send messages. With the data now stored inside the topic, let’s zoom in on the consuming side.

This part aims to cover the following:

  1. How the consuming side works
  2. How scaling consumer groups works
  3. How scaling with the parallel consumer works
  4. Tuning to avoid slow consumers

Switching to the other side of the wall

You can find the relevant code samples on GitHub here.

A typical Kafka consumer loop should look similar to the following snippet:

Consumer poll() Loop
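The embedded snippet isn’t reproduced in this excerpt, so here is a minimal Kotlin sketch of such a loop. The bootstrap servers, group id, and topic name are assumptions, and a plain println stands in for the show() helper described in the note below.

```kotlin
import org.apache.kafka.clients.consumer.KafkaConsumer
import java.time.Duration
import java.util.Properties

fun main() {
    // Assumed configuration; adjust bootstrap servers, group id, and topic to your setup.
    val props = Properties().apply {
        put("bootstrap.servers", "localhost:9092")
        put("group.id", "ecommerce.events.group")
        put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
        put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    }

    KafkaConsumer<String, String>(props).use { consumer ->
        consumer.subscribe(listOf("ecommerce.events"))
        while (true) {
            // Fetch whatever records are currently available (cached or newly requested).
            val records = consumer.poll(Duration.ofMillis(100))
            records.forEach { record ->
                Thread.sleep(20) // simulate a small amount of work per record
                // The post prints the records with a show() helper; a plain println is used here.
                println("partition=${record.partition()} offset=${record.offset()} value=${record.value()}")
            }
        }
    }
}
```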

We trigger the poll() method on the consumer, simulate a small amount of work, and finally show the records it processed.

Note: The show() method on the records comes from a helper extension function for printing the records in a nice and structured way:
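The helper itself isn’t reproduced in this excerpt; a plausible reconstruction as a Kotlin extension function might look like this (the exact implementation in the post’s repository may differ):

```kotlin
import org.apache.kafka.clients.consumer.ConsumerRecords

// Plausible reconstruction of the show() helper: prints each record in a structured way.
fun <K, V> ConsumerRecords<K, V>.show() = forEach { record ->
    println(
        "Topic: ${record.topic()} | Partition: ${record.partition()} | " +
        "Offset: ${record.offset()} | Key: ${record.key()} | Value: ${record.value()}"
    )
}
```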

So let’s try to better understand what happens here. The following diagram provides a more detailed explanation.

Kafka uses a pull-based model for data fetching. At the “heart of the consumer” sits the poll loop. The poll loop is important for two reasons:

  1. It is responsible for fetching data (providing ConsumerRecords) for the consumer to process, and
  2. It sends heartbeats and coordinates the consumers, so the consumer group knows which consumers are available and whether a rebalance needs to take place.

Consuming applications maintain TCP connections with the brokers and send fetch requests to fetch data. The data is cached and periodically returned from the poll() method. When data is returned from poll(), the actual processing takes place; once it’s finished, more data is requested, and so on.

What’s important to note here (and we will dive deeper into it in the next part) is committing message offsets. This is Kafka’s way of knowing that a message has been fetched and processed successfully. By default, offsets are committed automatically at regular intervals.
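For reference, the two settings that control this behavior can be set on the consumer like any other property; the values shown below are the Kafka client defaults.

```kotlin
import org.apache.kafka.clients.consumer.ConsumerConfig
import java.util.Properties

// Kafka's default behavior: commit offsets automatically every 5 seconds.
val autoCommitDefaults = Properties().apply {
    put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true")
    put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "5000")
}
```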

How much data is fetched, when more data needs to be requested, and so on, is dictated by configuration options like fetch.min.bytes, max.partition.fetch.bytes, fetch.max.bytes, and fetch.max.wait.ms. You might think the defaults will be fine for you, but it’s important to test them and think through your use case carefully.
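As a rough illustration, here is how those fetch-related options can be set on a consumer; the values shown are the client defaults, and the right numbers depend on your message sizes and latency requirements.

```kotlin
import org.apache.kafka.clients.consumer.ConsumerConfig
import java.util.Properties

// Fetch-related options (values shown are the client defaults).
val fetchTuning = Properties().apply {
    put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, "1")                  // respond as soon as any data is available...
    put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, "500")              // ...or after waiting at most 500 ms
    put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, "1048576")  // 1 MB per partition per fetch
    put(ConsumerConfig.FETCH_MAX_BYTES_CONFIG, "52428800")           // 50 MB per fetch request overall
}
```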

To make this clearer, let’s assume you fetch 500 records from the poll() loop to process, but for some reason the processing takes too long for each message. max.poll.interval.ms dictates the maximum time a consumer can go without calling poll() to fetch more records; if this threshold is reached, the consumer is considered lost and a rebalance is triggered, even though our application was just slow at processing.

So decreasing the number of records poll() should return (max.poll.records) and/or tuning configurations like heartbeat.interval.ms and session.timeout.ms, which are used for consumer group coordination, might be reasonable in this case.
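A hedged example of such tuning for a slow consumer might look like the following; the exact values are illustrative and should be validated against your own processing times.

```kotlin
import org.apache.kafka.clients.consumer.ConsumerConfig
import java.util.Properties

// Illustrative tuning for slow consumers: fewer records per poll, more time allowed between polls,
// plus the coordination settings mentioned above.
val slowConsumerTuning = Properties().apply {
    put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "100")         // down from the default of 500
    put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "600000")  // allow up to 10 minutes between poll() calls
    put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, "3000")
    put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "45000")
}
```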

Running the Consumer

At this point, I will start one consuming instance on my ecommerce.events topic. Remember from Part 1 that this topic consists of 5 partitions. I will run against my Aiven for Kafka cluster, using the default consumer configuration options; my goal is to see how long it takes a single consumer to read 10,000 messages from the topic, assuming 20 ms of processing time per message. You can find the code here.
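The gist of the experiment is a loop that counts processed records and measures elapsed time; the full version lives in the linked repository, and the sketch below only illustrates the idea.

```kotlin
import org.apache.kafka.clients.consumer.Consumer
import java.time.Duration

// Sketch of the experiment: consume until `target` records are processed,
// simulating 20 ms of work per record, and report the elapsed time.
fun timeConsumption(consumer: Consumer<String, String>, target: Int = 10_000) {
    var processed = 0
    val start = System.currentTimeMillis()
    while (processed < target) {
        consumer.poll(Duration.ofMillis(100)).forEach { _ ->
            Thread.sleep(20) // simulated per-record processing
            processed++
        }
    }
    println("Consumed $processed records in ${(System.currentTimeMillis() - start) / 1000} s")
}
```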

We can see that it takes a single consumer around 4 minutes for this kind of processing. So how can we do better?

Scaling the Consuming Side

Consumer Groups and the Parallel Consumer Pattern

Consumer groups are Kafka’s way of sharing work between different consumers, and they also define the level of parallelism. The highest level of parallelism you can achieve with Kafka is one consumer consuming from each partition of a topic.

Scenario 1: #Partitions > #Consumers

In this scenario, the available partitions are shared as equally as possible among the available consumers of the group, and each consumer takes ownership of its assigned partitions.

Partitions are shared among the available consumers

Scenario 2: #Partitions = #Consumers

When the number of partitions equals the number of available consumers, each consumer reads from exactly one partition. In this scenario, we also reach the maximum parallelism we can achieve on a particular topic.

1:1 consumer-partition mapping

Scenario 3: #Partitions < #Consumers

This scenario is similar to the previous one, except that now there is one extra consumer running that stays idle. On the one hand, this means we waste resources; on the other, we can use this consumer as a failover in case another one in the group goes down.

#Consumers > #Partitions: the extra consumers stay idle

When a consumer goes down, or similarly when a new one joins the group, Kafka has to trigger a rebalance. This means that partitions need to be revoked and reassigned to the available consumers in the group.
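You can observe this happening from inside your application by passing a ConsumerRebalanceListener when subscribing. The small helper below is only an illustrative sketch, not code from the post’s repository.

```kotlin
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition

// Illustrative helper: subscribe to a topic and log partitions as they are revoked and reassigned.
fun <K, V> KafkaConsumer<K, V>.subscribeWithLogging(topic: String) =
    subscribe(listOf(topic), object : ConsumerRebalanceListener {
        override fun onPartitionsRevoked(partitions: MutableCollection<TopicPartition>) {
            println("Partitions revoked: $partitions")   // a good place to commit offsets before losing ownership
        }

        override fun onPartitionsAssigned(partitions: MutableCollection<TopicPartition>) {
            println("Partitions assigned: $partitions")
        }
    })
```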

Let’s run our previous example again — consuming 10k messages — but this time with 5 consumers in our consumer group. I will create the 5 consuming instances within a single JVM (using Kotlin coroutines), but you can easily adjust the code (found here) and start multiple JVMs instead.
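A rough sketch of what that could look like is shown below; the actual code in the repository may differ, and the bootstrap servers, group id, and topic name are assumptions.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import org.apache.kafka.clients.consumer.KafkaConsumer
import java.time.Duration
import java.util.Properties

// Launch 5 consumers in one JVM; because they share the same group.id,
// Kafka assigns each of the 5 partitions to exactly one of them.
fun main() = runBlocking {
    repeat(5) { id ->
        launch(Dispatchers.IO) {
            val props = Properties().apply {
                put("bootstrap.servers", "localhost:9092")
                put("group.id", "ecommerce.events.group")
                put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
                put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
            }
            KafkaConsumer<String, String>(props).use { consumer ->
                consumer.subscribe(listOf("ecommerce.events"))
                while (true) {
                    consumer.poll(Duration.ofMillis(100)).forEach { record ->
                        Thread.sleep(20) // simulated processing
                        println("consumer-$id processed offset ${record.offset()} from partition ${record.partition()}")
                    }
                }
            }
        }
    }
}
```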

As expected, we can see that the consumption time dropped to less than a minute.

But if Kafka’s maximum level of parallelism is one consumer per partition, does this mean we hit the scaling limit? Let’s see how to tackle this next.

What about the parallel consumer pattern?

Up to this point, we might have two questions in mind:

  1. If #partitions = #consumers in the consumer group, how can I scale even further if needed? It’s not always easy to calculate the number of partitions beforehand and/or I might need to account for sudden spikes.
  2. How can I minimize rebalancing time?

One solution to this can be the parallel consumer pattern. You can have consumers in your group consuming from one or more partitions of the topic, but they hand off the actual processing to other threads.

One such implementation can be found here.

It provides three ordering guarantees — Unordered, Key, and Partition.

  1. Unordered — provides no guarantees.
  2. Key — guarantees ordering per key, BUT with the caveat that the keyspace needs to be quite large; otherwise you might not observe much performance improvement.
  3. Partition — only one message will be processed per partition at any time.

Along with that, it also provides different ways of committing offsets. This is a pretty nice library you might want to look at.
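To make this concrete, here is a sketch along the lines of the library’s README: a regular KafkaConsumer is wired into a ParallelStreamProcessor with key ordering and a concurrency of 100. Method names may vary slightly between library versions, and the connection settings are assumptions.

```kotlin
import io.confluent.parallelconsumer.ParallelConsumerOptions
import io.confluent.parallelconsumer.ParallelConsumerOptions.ProcessingOrder
import io.confluent.parallelconsumer.ParallelStreamProcessor
import org.apache.kafka.clients.consumer.KafkaConsumer
import java.util.Properties

fun main() {
    val props = Properties().apply {
        put("bootstrap.servers", "localhost:9092")   // assumption: adjust to your cluster
        put("group.id", "ecommerce.events.parallel")
        put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
        put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
        put("enable.auto.commit", "false")           // the library manages offset commits itself
    }

    val options = ParallelConsumerOptions.builder<String, String>()
        .ordering(ProcessingOrder.KEY)               // per-key ordering
        .maxConcurrency(100)                         // up to 100 processing threads
        .consumer(KafkaConsumer<String, String>(props))
        .build()

    val processor = ParallelStreamProcessor.createEosStreamProcessor(options)
    processor.subscribe(listOf("ecommerce.events"))
    processor.poll { context ->
        println("Processing $context on ${Thread.currentThread().name}")
    }
}
```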

The Parallel Consumer Pattern

Going back to our example once more to answer the question — how can we break the scaling limit? We will use the parallel consumer pattern; you can find the code here.

Using one parallel consumer instance on our 5-partition topic, specifying key ordering and a parallelism of 100 threads, brings the consuming and processing time for 10k messages down to around 6 seconds.

1 parallel consuming instance

Notice in the screenshot how different batches are processed on different threads of the same consumer instance.

If we use 5 parallel consumer instances instead, we get that down to around 3 seconds.

5 parallel consuming instances

Notice in the screenshot how different batches are processed on different threads across different consumer instances.

Wrapping Up

In this part, we saw how the consuming side of Kafka works. Takeaways when creating consuming applications:

  • Take into account the number of partitions each topic has.
  • Think through your processing requirements and try to account for slow consumers.
  • Consider how you can scale, both with consumer groups and with the parallel consumer pattern.
  • Take message ordering, the size of the keyspace, and per-partition guarantees into account, and see which approach (or a combination of both) works best.

Check out next: Part 3, Offsets and how to handle them.


Giannis Polyzos

Staff Streaming Product Architect @ Ververica ~ Stateful Stream Processing and Streaming Lakehouse https://www.linkedin.com/in/polyzos/