Apache Kafka Guide #39 Consumer Replica Fetch and Rack Awareness Setup

Paul Ravvich
Apache Kafka At the Gates of Mastery
4 min read · Apr 11, 2024

Hi, this is Paul, and welcome to part #39 of my Apache Kafka guide. Today we will discuss Consumer Replica Fetch.

Consumer Replica Fetch

Let's start with the default behavior: consumers fetch data from the partition leader. When a cluster spans multiple data centers, this can lead to significant latency and higher network fees.

This is particularly true if the consumer is located in a different data center from the leader's broker. For instance, on AWS, traffic within the same Availability Zone (AZ) carries no additional data transfer cost, but data moved between different AZs is charged.

Kafka Consumer Replica Fetching (Apache Kafka v2.4+)

Since version 2.4, Kafka lets consumers fetch data from the closest replica rather than exclusively from the leader. This can lower read latency and reduce network costs, which is particularly useful when running on cloud services.

  • For Apache Kafka 2.4+ you can configure Consumers to read from the closest replica.
  • Decrease network cost and latency.

Example

Here’s an example. Imagine we have three data centers connected by a network, and a partition with a replication factor of three, with one replica in each data center. Replication between brokers is unavoidable: the in-sync replicas (ISRs) must copy data from the partition leader, so some cross-data-center cost is always incurred.

Reads are a different story. A producer sends data to the partition leader, but consumers no longer have to read it from there. Instead of pulling data from the leader across data centers, which would add transfer costs, a consumer located in Data Center 2 can read from the ISR in the same data center and avoid the cross-data-center transfer cost.

Reduced latency thanks to proximity

This not only avoids the cross-data-center transfer charge but also reduces latency, because the consumer and the broker are in the same data center. The same applies to consumers in Data Centers 1 and 3: each reads from the replica in its own data center.
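The comparison above can be made concrete with a small simulation. This is an illustrative sketch, not Kafka code: the data center names, the placement of the leader in dc1, and the 1 GiB read volume per consumer are all assumptions chosen for the example.

```python
# Toy model: one partition, replication factor 3, one replica per data center.
# All names and numbers are hypothetical, chosen only for illustration.
REPLICAS = {"dc1": "leader", "dc2": "follower", "dc3": "follower"}
LEADER_DC = "dc1"


def cross_dc_bytes(consumer_dcs, bytes_per_consumer, fetch_from_closest):
    """Return how many bytes consumers pull across data center boundaries."""
    total = 0
    for dc in consumer_dcs:
        # With replica fetching, a consumer reads from the replica in its own
        # data center if one exists; otherwise it reads from the leader.
        if fetch_from_closest and dc in REPLICAS:
            source_dc = dc
        else:
            source_dc = LEADER_DC
        if source_dc != dc:
            total += bytes_per_consumer
    return total


consumers = ["dc1", "dc2", "dc3"]  # one consumer per data center
one_gib = 1024 ** 3

leader_only = cross_dc_bytes(consumers, one_gib, fetch_from_closest=False)
closest = cross_dc_bytes(consumers, one_gib, fetch_from_closest=True)

print(leader_only // one_gib, "GiB cross-DC when reading from the leader")
print(closest // one_gib, "GiB cross-DC when reading from the closest replica")
```

The model only counts consumer reads. Replication traffic between brokers crosses data centers in both cases and is not affected by this setting.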

Apache Kafka Rack Awareness v2.4+ Setup

To configure Consumer Rack Awareness, first make sure your brokers run at least version 2.4 and that each one has a rack ID assigned through the broker.rack setting. The rack ID identifies the data center the broker is located in. For instance, in an AWS environment, it would correspond to the Availability Zone ID, such as broker.rack=usw2-az1. Next, set replica.selector.class on the brokers to RackAwareReplicaSelector. Finally, your consumers must have client.rack set to the ID of the data center where they are deployed. As a result, when consumers fetch from Kafka, the replica selector directs them to a replica whose rack matches their client.rack, which reduces latency and cross-data-center traffic. If no replica matches, consumers read from the leader as usual.
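The selection step can be sketched as follows. This is a simplified model of the behavior as I understand it, not the actual RackAwareReplicaSelector implementation (which is Java and runs on the broker). The Replica type, its fields, and the select_replica function are hypothetical names introduced for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Replica:
    """Hypothetical view of one in-sync replica of a partition."""
    broker_id: int
    rack: Optional[str]
    log_end_offset: int
    is_leader: bool = False


def select_replica(client_rack: Optional[str], in_sync: List[Replica]) -> Replica:
    """Pick the replica a consumer should read from (simplified model)."""
    leader = next(r for r in in_sync if r.is_leader)
    if not client_rack:
        # Consumer did not set client.rack: default behavior, read the leader.
        return leader
    same_rack = [r for r in in_sync if r.rack == client_rack]
    if not same_rack or leader in same_rack:
        # No replica in the consumer's rack, or the leader is already local.
        return leader
    # Assumption: among local replicas, prefer the most caught-up one.
    return max(same_rack, key=lambda r: r.log_end_offset)


replicas = [
    Replica(broker_id=1, rack="usw2-az1", log_end_offset=100, is_leader=True),
    Replica(broker_id=2, rack="usw2-az2", log_end_offset=100),
    Replica(broker_id=3, rack="usw2-az3", log_end_offset=98),
]

print(select_replica("usw2-az2", replicas).broker_id)  # local follower
print(select_replica("usw2-az4", replicas).broker_id)  # no match, leader
```

In this model, a consumer in usw2-az2 is served by broker 2, while a consumer in an unknown rack, or one without client.rack set, falls back to the leader on broker 1.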

Broker settings

  • Kafka version 2.4+
  • broker.rack set to the data center ID (for example, broker.rack=usw2-az1)
  • Set the broker replica selector class:
replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector

Consumer client settings

  • Set client.rack to the ID of the data center where the consumer is running.
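On the client side, the setting is a single extra key in the consumer configuration. The sketch below builds such a configuration as a plain dictionary, assuming a client that accepts Kafka-style property names (for example, the confluent-kafka Python package). The helper name, broker address, and group ID are hypothetical.

```python
def build_consumer_config(bootstrap_servers, group_id, client_rack):
    """Return a consumer configuration with rack awareness enabled.

    The keys are standard Kafka consumer property names; the function
    itself is a hypothetical helper for this example.
    """
    return {
        "bootstrap.servers": bootstrap_servers,
        "group.id": group_id,
        # Must match the broker.rack value of the brokers in the same
        # data center, e.g. the AWS Availability Zone ID.
        "client.rack": client_rack,
    }


# Hypothetical values for a consumer deployed in usw2-az1.
config = build_consumer_config("broker1:9092", "orders-readers", "usw2-az1")
print(config["client.rack"])

# With confluent-kafka installed, the dictionary could be passed to
# confluent_kafka.Consumer(config).
```

Deriving client.rack from the deployment environment, rather than hard-coding it, keeps the value in step with the broker.rack of the local brokers when consumers move between data centers.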
