Kafka Consumer Overview

Sylvester John
6 min read · Jun 30, 2019


This article is a continuation of part 1 (Kafka technical overview), part 2 (Kafka producer overview) and part 3 (Kafka producer delivery semantics). Let's look at the Kafka consumer group, the consumer, and the protocol they use in detail.

Consumer Role

Just as the Kafka producer optimizes writes to Kafka, the consumer is used for optimal consumption of Kafka data. The primary role of a Kafka consumer is to take Kafka connection and consumer properties and read records from the appropriate Kafka broker. Complexities such as concurrent consumption by multiple applications, offset management, delivery semantics, and a lot more are taken care of by the Consumer APIs.

Properties

Some of the consumer properties are bootstrap.servers, fetch.min.bytes, max.partition.fetch.bytes, fetch.max.bytes, enable.auto.commit, and many more. We will discuss some of these properties in the next part of the article series.

Role of Kafka consumer
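To make this concrete, here is a minimal sketch of a consumer configured with these properties, assuming the standard Java client. The broker address, group id, and topic name are placeholders, and the fetch values are illustrative rather than recommendations.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SimpleConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Connection and group identity (placeholders).
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "demo-group");
        // Fetch tuning properties mentioned above (illustrative values).
        props.put("fetch.min.bytes", "1");
        props.put("max.partition.fetch.bytes", "1048576");
        props.put("fetch.max.bytes", "52428800");
        // Offset handling.
        props.put("enable.auto.commit", "true");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("demo-topic"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("partition=%d offset=%d value=%s%n",
                        record.partition(), record.offset(), record.value());
            }
        }
    }
}
```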

Multi-app Consumption

Multiple applications can consume records from the same Kafka topic, as shown in the diagram below. Each application that consumes data from Kafka gets its own copy and can read at its own speed. In other words, the offset consumed by one application could be different from that of another application. Kafka keeps track of the offsets consumed by each application in an internal "__consumer_offsets" topic.

Kafka multi-app consumption
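Concretely, the independent offsets fall out of each application using its own group.id. A hedged fragment of the configuration (the application names are hypothetical):

```java
// Two independent applications reading the same topic keep separate offsets,
// because each identifies itself with its own group.id (names are hypothetical).
Properties billingProps = new Properties();
billingProps.put("bootstrap.servers", "localhost:9092");
billingProps.put("group.id", "billing-app");      // offsets tracked under this group in __consumer_offsets

Properties analyticsProps = new Properties();
analyticsProps.put("bootstrap.servers", "localhost:9092");
analyticsProps.put("group.id", "analytics-app");  // independent offsets; can lag or lead billing-app
```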

Consumer Group and Consumer

Each application consuming data from Kafka is treated as a consumer group. For example, if two applications consume the same topic from Kafka, then internally Kafka creates two consumer groups. Each consumer group can have one or more consumers. If a topic has three partitions and an application consumes it, a consumer group is created and a consumer in that group consumes all partitions of the topic. The diagram below depicts a consumer group with a single consumer.

Kafka multi-partition single consumer

When an application wants to increase processing speed and process partitions in parallel, it can add more consumers to the consumer group. Kafka takes care of keeping track of offsets consumed per consumer in a consumer group, rebalancing consumers in the consumer group when a consumer is added or removed, and a lot more.

Kafka multi-partition multi-consumer
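A sketch of what "adding more consumers" can look like with the Java client: one KafkaConsumer per thread, all sharing the same group.id so the topic's partitions are divided among them. Broker address, group id, topic, and thread count are placeholders.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ParallelConsumers {
    public static void main(String[] args) {
        int consumerCount = 3; // ideally no more than the topic's partition count
        ExecutorService pool = Executors.newFixedThreadPool(consumerCount);
        for (int i = 0; i < consumerCount; i++) {
            pool.submit(() -> {
                // KafkaConsumer is not thread-safe, so each thread gets its own instance.
                Properties props = new Properties();
                props.put("bootstrap.servers", "localhost:9092");
                props.put("group.id", "demo-group"); // same group id => partitions are split across the threads
                props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
                props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
                try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                    consumer.subscribe(Collections.singletonList("demo-topic"));
                    while (!Thread.currentThread().isInterrupted()) {
                        consumer.poll(Duration.ofMillis(500)).forEach(record ->
                                System.out.printf("%s got partition=%d offset=%d%n",
                                        Thread.currentThread().getName(),
                                        record.partition(), record.offset()));
                    }
                }
            });
        }
    }
}
```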

When there are multiple consumers in a consumer group, each consumer in the group is assigned one or more partitions. Each consumer in the group processes records in parallel from the leader partitions it is assigned on the brokers. A consumer can read from more than one partition.

Kafka multi-consumer and multi-partition consumption
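If it helps to see the assignment from inside the application, the Java client exposes it via assignment(). A small sketch, assuming a consumer configured as in the first example and a placeholder topic name:

```java
import java.time.Duration;
import java.util.Collections;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class AssignmentInspector {
    // Prints the partitions this consumer instance currently owns.
    // Assumes the consumer was configured as in the earlier sketch.
    static void printAssignment(KafkaConsumer<String, String> consumer) {
        consumer.subscribe(Collections.singletonList("demo-topic"));
        consumer.poll(Duration.ofSeconds(1)); // first poll joins the group and receives an assignment
        for (TopicPartition tp : consumer.assignment()) {
            System.out.printf("assigned: %s-%d%n", tp.topic(), tp.partition());
        }
    }
}
```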

It's very important to understand that no single partition will be assigned to two consumers in the same consumer group; in other words, the same partition will never be processed by two consumers, as shown in the diagram below.

Kafka same partition multiple-consumer

When there are more consumers in a consumer group than partitions in a topic, the extra consumers in the group sit idle and go unused.

Kafka unused consumer

When you have multiple topics and multiple applications consuming the data, the consumer groups and consumers in Kafka will look similar to the diagram shown below.

Multiple application and multiple Kafka topic

Coordinator and leader discovery

To manage the handshake between Kafka and the application that forms the consumer group and its consumers, a coordinator on the Kafka side and a leader (one of the consumers in the consumer group) are elected. The first consumer that initiates the process is automatically elected as the leader of the consumer group. As explained in the diagram below, for a consumer to join a consumer group the following handshake steps take place:

  1. Find coordinator
  2. Join group
  3. Sync group
  4. Heartbeat
  5. Leave group
Kafka consumer and coordinator protocol

Coordinator

In order to create or join a group, a consumer first has to find the coordinator on the Kafka side that manages the consumer group. The consumer makes a "find coordinator" request to one of the bootstrap servers. The coordinator is identified based on a hashing formula (the group id is hashed to one of the partitions of the internal "__consumer_offsets" topic, and the broker leading that partition acts as the coordinator) and returned in the response to the "find coordinator" request.
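For intuition, a simplified, illustrative sketch of that lookup (the real logic lives on the broker side; 50 is only the default value of offsets.topic.num.partitions):

```java
// Simplified illustration of coordinator selection: hash the group id onto a
// partition of __consumer_offsets; the broker leading that partition is the coordinator.
static int coordinatorPartitionFor(String groupId, int offsetsTopicPartitions) {
    return (groupId.hashCode() & 0x7fffffff) % offsetsTopicPartitions;
}

// Example: coordinatorPartitionFor("demo-group", 50) picks one of the 50 default
// __consumer_offsets partitions; its leader broker coordinates "demo-group".
```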

Join Group

Once the coordinator is identified, the consumer makes a "join group" request to the coordinator. The coordinator returns the consumer group leader and metadata details. If a leader doesn't already exist, the first consumer of the group is elected as the leader. The consuming application can also influence how the elected leader assigns partitions, as shown in the configuration fragment below.

Kafka consumer join group
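In practice, what the application configures here is the assignment strategy the elected leader runs when computing partition assignments; a hedged fragment extending the earlier Properties setup:

```java
// Configure the assignor the elected leader uses when computing partition assignments.
// RangeAssignor is the long-standing default; RoundRobinAssignor spreads partitions more evenly.
props.put("partition.assignment.strategy",
        "org.apache.kafka.clients.consumer.RoundRobinAssignor");
```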

Sync Group

After the leader details are received from the "join group" request, the consumer makes a "sync group" request to the coordinator. This request triggers the rebalancing process across consumers in the consumer group, as the partitions assigned to the consumers may change after the "sync group" request.

Kafka consumer sync group

Rebalance

All consumers in the consumer group receive updated partition assignments that they need to consume whenever a consumer is added or removed or a "sync group" request is processed. Data consumption by all consumers in the consumer group is halted until the rebalance process completes.

Kafka consumer rebalance group
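Applications can observe these assignment changes through the Java client's ConsumerRebalanceListener; a sketch, assuming a consumer configured as in the earlier examples and a placeholder topic name:

```java
import java.util.Collection;
import java.util.Collections;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class RebalanceAware {
    // Registers callbacks that fire around every rebalance for this consumer.
    static void subscribeWithListener(KafkaConsumer<String, String> consumer) {
        consumer.subscribe(Collections.singletonList("demo-topic"), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                // Called before partitions are taken away: a good place to commit offsets.
                System.out.println("revoked: " + partitions);
            }

            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                // Called once the new assignment has been received after the rebalance.
                System.out.println("assigned: " + partitions);
            }
        });
    }
}
```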

Heartbeat

Each consumer in the consumer group periodically sends a heartbeat signal to its group coordinator. In the case of heartbeat timeout, the consumer is considered lost and rebalancing is initiated by the coordinator.

Kafka consumer heartbeat
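In the Java client, heartbeats are sent by a background thread and tuned through a few properties; a fragment extending the earlier Properties setup, with illustrative values rather than recommendations:

```java
// Illustrative values only: heartbeat.interval.ms should be well below session.timeout.ms
// (commonly no more than a third of it).
props.put("heartbeat.interval.ms", "3000");   // how often the background thread sends heartbeats
props.put("session.timeout.ms", "10000");     // no heartbeat within this window => consumer is considered dead
props.put("max.poll.interval.ms", "300000");  // max gap between poll() calls before the consumer is evicted
```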

Leave Group

A consumer can choose to leave the group at any time by sending a "leave group" request. The coordinator acknowledges the request and initiates a rebalance. If the leader leaves the group, a new leader is elected from the remaining consumers and a rebalance is initiated.

Kafka consumer leave group
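With the Java client, a clean "leave group" typically happens when the consumer is closed; a hedged shutdown sketch, assuming the consumer from the earlier examples and a hypothetical process() handler:

```java
// Typical shutdown pattern: wake the consumer out of poll() from another thread,
// then close it so a "leave group" is sent and the coordinator can rebalance promptly.
Runtime.getRuntime().addShutdownHook(new Thread(consumer::wakeup));

try {
    while (true) {
        consumer.poll(Duration.ofMillis(500)).forEach(record -> process(record)); // process() is a hypothetical handler
    }
} catch (WakeupException e) {
    // Expected during shutdown; safe to ignore.
} finally {
    consumer.close(); // leaves the group cleanly
}
```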

Summary

As explained in part 1 of the article series, partitions are the unit of parallelism. As consumers in a consumer group are limited by the number of partitions in a topic, it's very important to decide on your partition count based on the SLA and scale your consumers accordingly. Consumer offsets are managed and stored by Kafka in an internal "__consumer_offsets" topic. Each consumer in a consumer group follows the find coordinator, join group, sync group, heartbeat, and leave group protocol. We'll look at Kafka consumer properties and delivery semantics in the next part of the article series.
