Apache Kafka Guide #39 Consumer Replica Fetch and Rack Awareness Setup

Paul Ravvich

Published in

Apache Kafka At the Gates of Mastery

4 min read, Apr 11, 2024

Hi, this is Paul, and welcome to part #39 of my Apache Kafka guide. Today we will discuss Consumer Replica Fetch.

Apache Kafka At the Gates of Mastery: Kafka Core

Amazon.com: Apache Kafka At the Gates of Mastery: Kafka Core eBook : Ravvich, Paul: Kindle Store

read.amazon.com

Consumer Replica Fetch

By default, a Kafka consumer fetches data from the partition leader. That works well inside a single data center, but when a cluster spans multiple data centers it can lead to significant latency and higher network fees.

This is particularly true if the consumer is located in a different data center from the leader's broker. For instance, on AWS, traffic within the same Availability Zone (AZ) carries no additional cost, while data transferred between different AZs is billed.
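To make the cost argument concrete, here is a minimal Python sketch that counts how much consumer traffic crosses AZ boundaries when every consumer reads from the leader. The workload figures, the AZ names, and the per-GB price are hypothetical placeholders, not real AWS pricing.

```python
def cross_az_gb(consumers, leader_az):
    """Sum the GB read by consumers whose AZ differs from the leader's AZ."""
    return sum(gb for az, gb in consumers if az != leader_az)


def transfer_cost(gb, price_per_gb):
    """Cost of moving `gb` gigabytes at a given per-GB price."""
    return gb * price_per_gb


# Hypothetical workload: (consumer AZ, GB read per day).
consumers = [("az1", 100.0), ("az2", 100.0), ("az3", 50.0)]

# Only the consumers in az2 and az3 cross an AZ boundary to reach the leader.
billable = cross_az_gb(consumers, leader_az="az1")
print(billable)  # 150.0

# price_per_gb is a made-up placeholder; look up the real rate for your provider.
print(transfer_cost(billable, price_per_gb=0.5))  # 75.0
```

If each consumer could read from a replica in its own AZ instead, the billable consumer traffic in this model would drop to zero.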

Kafka Consumer Replica Fetching (Apache Kafka v2.4+)

Starting with version 2.4, Kafka lets you configure consumers to fetch data from the closest replica rather than exclusively from the leader replica. This can reduce latency and network costs, which is particularly useful when running on cloud services.

In Apache Kafka 2.4+ you can configure consumers to read from the closest replica.
This decreases network cost and latency.

Example

Here's an example. Imagine we have three data centers connected by a network, and a partition with a replication factor of three, so each data center holds one replica. Replication between brokers is essential for data integrity and cannot be avoided, so some cost is always incurred for copying data from the partition leader to the in-sync replicas (ISRs) in the other data centers. Reads are a different matter. When a producer sends data to the partition leader, a consumer does not have to retrieve it from the leader, which would mean paying again to pull the data across data centers. Instead, a consumer located in Data Center 2 can read from the ISR within the same data center, which eliminates that cost.

This avoids the cross-data-center charge for reads and also reduces latency, because the consumer and the broker sit in the same data center. The same applies to consumers in Data Centers 1 and 3: each reads from the replica in its own data center, with the same savings in network cost and latency.
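The read path in this example can be sketched in Python. This is a simplified model of rack-aware replica selection, not Kafka's actual implementation: the `Replica` class, the `select_read_replica` function, and the rack names are all hypothetical, and the sketch assumes every listed replica is in sync.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Replica:
    broker_id: int
    rack: str
    is_leader: bool = False
    log_end_offset: int = 0


def select_read_replica(replicas: List[Replica], client_rack: Optional[str]) -> Replica:
    """Pick the replica a consumer should read from (simplified model)."""
    leader = next(r for r in replicas if r.is_leader)
    if not client_rack:
        # No rack configured on the consumer: default behavior, read from the leader.
        return leader
    same_rack = [r for r in replicas if r.rack == client_rack]
    if not same_rack or leader in same_rack:
        # No local replica, or the leader itself is local: read from the leader.
        return leader
    # Otherwise prefer the most caught-up replica in the consumer's rack.
    return max(same_rack, key=lambda r: r.log_end_offset)


# One replica per data center; the leader lives in dc1.
replicas = [
    Replica(broker_id=1, rack="dc1", is_leader=True, log_end_offset=500),
    Replica(broker_id=2, rack="dc2", log_end_offset=500),
    Replica(broker_id=3, rack="dc3", log_end_offset=498),
]

print(select_read_replica(replicas, "dc2").broker_id)  # 2
print(select_read_replica(replicas, "dc1").broker_id)  # 1
print(select_read_replica(replicas, None).broker_id)   # 1
```

A consumer in dc2 is served by broker 2 in its own data center, while a consumer with no rack configured falls back to the leader.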

Apache Kafka Rack Awareness v2.4+ Setup

To configure consumer rack awareness, first make sure your brokers run at least version 2.4 and that each one has a rack ID assigned through the broker.rack setting. The rack ID identifies the data center the broker is located in, so all brokers in the same data center share it. For instance, in an AWS environment the rack ID would correspond to the Availability Zone ID, such as broker.rack=usw2-az1. Next, set the broker's replica.selector.class to RackAwareReplicaSelector. Finally, your consumers must have client.rack set to the ID of the data center where they are deployed. With that in place, when a consumer fetches data, the replica selector directs it to a replica whose rack matches the consumer's client.rack, which reduces latency and cross-data-center traffic. If no replica matches, the consumer reads from the leader as usual.

Broker settings

Kafka version 2.4+
broker.rack set to the data center ID
Set the broker replica selector class:

replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector

Consumer client settings

Set client.rack to the ID of the data center where the consumer is running.
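The broker and consumer settings above can be generated per data center with a small Python sketch. The property names are Kafka's own (broker.rack is the broker-side property that carries the rack ID, client.rack the consumer-side one). The helper functions, the bootstrap address, and the group name are hypothetical, chosen only for illustration.

```python
RACK_AWARE_SELECTOR = "org.apache.kafka.common.replica.RackAwareReplicaSelector"


def broker_properties(rack_id):
    """Rack-awareness settings for a broker located in `rack_id`."""
    return {
        "broker.rack": rack_id,
        "replica.selector.class": RACK_AWARE_SELECTOR,
    }


def consumer_properties(rack_id, bootstrap_servers, group_id):
    """Settings for a consumer deployed in `rack_id`."""
    return {
        "bootstrap.servers": bootstrap_servers,
        "group.id": group_id,
        "client.rack": rack_id,  # must match the broker.rack of the local brokers
    }


def to_properties_text(props):
    """Render a dict in key=value form, one setting per line."""
    return "\n".join(f"{key}={value}" for key, value in props.items())


# Hypothetical deployment in the usw2-az1 Availability Zone.
print(to_properties_text(broker_properties("usw2-az1")))
print(to_properties_text(consumer_properties("usw2-az1", "broker1:9092", "demo-group")))
```

The first block of output belongs in the broker's configuration file, and the second is passed to the consumer client. Keeping both behind one rack ID argument makes it harder for broker.rack and client.rack to drift apart.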
