Apache Kafka Guide #7 Topic Replication

Paul Ravvich
Apache Kafka At the Gates of Mastery
4 min read · Dec 28, 2023

Hi, this is Paul, and welcome to the 7th part of my Apache Kafka guide. Today we’re going to talk about how topic replication works.

In Kafka, a replication factor of 1 is fine when you are working on a personal machine. In a production environment with a real Kafka cluster, however, the replication factor should be greater than one: typically 2 or 3, and most often 3. A broker (a Kafka server) may go down for maintenance or because of a technical issue. When that happens, another broker still holds a copy of the data, so operations can continue. For instance, consider Topic-A with two partitions and a replication factor of 2, running on three Kafka brokers. Partition 0 of Topic-A is placed on Broker 1, and Partition 1 on Broker 2. Because the replication factor is 2, there is also a replicated copy of Partition 0 on Broker 2 and a copy of Partition 1 on Broker 3. Replication keeps each copy up to date.

In this setup there are four copies of partition data in total: two partitions, each stored twice because of the replication factor of 2. In other words, brokers are replicating data from other brokers.
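To make the layout concrete, here is a minimal Python sketch that places replicas round-robin across brokers and reproduces the Topic-A example. The placement rule and the `assign_replicas` name are hypothetical and chosen for illustration. Kafka’s real assignment logic is different: among other things, it randomizes where the assignment starts and can take broker racks into account.

```python
# Simplified, hypothetical replica placement (NOT Kafka's actual algorithm).
# Each partition's replicas go on consecutive brokers; the first one listed
# plays the role of the partition leader in this article's example.

def assign_replicas(num_partitions, replication_factor, brokers):
    """Return {partition: [broker ids holding a replica]}."""
    if replication_factor > len(brokers):
        # Replicas of one partition must live on distinct brokers.
        raise ValueError("replication factor cannot exceed the number of brokers")
    return {
        p: [brokers[(p + r) % len(brokers)] for r in range(replication_factor)]
        for p in range(num_partitions)
    }


brokers = [1, 2, 3]
topic_a = assign_replicas(num_partitions=2, replication_factor=2, brokers=brokers)
for partition, replicas in topic_a.items():
    print(f"Topic-A partition {partition}: replicas on brokers {replicas}")
# Topic-A partition 0: replicas on brokers [1, 2]
# Topic-A partition 1: replicas on brokers [2, 3]

# Total copies = partitions x replication factor.
total_copies = sum(len(replicas) for replicas in topic_a.values())
print("total partition copies:", total_copies)  # total partition copies: 4
```

The count at the end is simply partitions × replication factor (2 × 2 = 4). That product is the amount of partition data the cluster stores for this topic.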

What happens if Broker 2 goes down?

Alright, looking at the situation, Brokers 1 and 3 are still active, and together they can continue to serve the data: Broker 1 holds Partition 0, and Broker 3 holds Partition 1.

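This means Partitions 0 and 1 remain accessible within our cluster, and this is exactly why we use a replication factor. Put simply, with a replication factor of 2, we can lose one broker without any issues. More generally, because the N replicas of a partition live on N different brokers, a replication factor of N lets the cluster lose up to N-1 brokers and still keep a copy of every partition.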
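The failure scenario can be checked with a small sketch. It is a hypothetical helper, not part of any Kafka API. It removes the failed brokers from the example layout and checks whether every partition still has at least one copy. It only checks that a copy exists, not whether that copy is in sync with the leader.

```python
# Hypothetical availability check for the Topic-A example layout.
topic_a = {0: [1, 2], 1: [2, 3]}  # partition -> brokers holding a replica


def available_partitions(assignment, down_brokers):
    """For each partition, keep only the replicas on brokers that are still up."""
    return {
        p: [b for b in replicas if b not in down_brokers]
        for p, replicas in assignment.items()
    }


survivors = available_partitions(topic_a, down_brokers={2})
print(survivors)  # {0: [1], 1: [3]}

# A partition is still available if at least one replica survived.
all_available = all(survivors.values())
print("every partition still has a copy:", all_available)
```

With a replication factor of 2, losing Broker 2 alone is fine. Losing Brokers 1 and 2 together would leave Partition 0 with no copy, which matches the N-1 rule.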

Partition Leader

Each partition has a leader. At any given time, only a single broker can be the leader for a specific partition, and producers can send data for that partition only to the broker that holds its leadership.

If we revisit the diagram from earlier, you’ll notice I’ve marked the leader of each partition. Broker 1 is the leader for Partition 0, while Broker 2 leads Partition 1. Broker 2 also serves as a replica for Partition 0, and Broker 3 as a replica for Partition 1, so these brokers copy data from the leaders. When a replica keeps up with its leader quickly enough, it is called an ISR.

An ISR, or in-sync replica, is a replica that is caught up with the leader’s data. A replica that falls too far behind becomes out of sync. As long as replication keeps succeeding, the replicas stay in sync. This is crucial for leadership, because normally only an in-sync replica can take over as the new leader when the current leader fails.
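Here is a simplified sketch of how an ISR can be derived. Real Kafka removes a follower from the ISR when the follower has not caught up with the leader for longer than the broker setting `replica.lag.time.max.ms`. In this sketch, `Replica`, `in_sync_replicas`, the timestamps, and the 30-second threshold are all made-up illustrative values, not Kafka internals.

```python
# Simplified ISR sketch: a follower counts as in sync if it caught up with the
# leader recently enough. Names and numbers are hypothetical; real Kafka uses
# the broker setting replica.lag.time.max.ms for this threshold.
from dataclasses import dataclass


@dataclass
class Replica:
    broker_id: int
    last_caught_up_ms: int  # last time this replica had all of the leader's data


def in_sync_replicas(leader_id, replicas, now_ms, max_lag_ms):
    """The leader is always in sync; followers must have caught up within max_lag_ms."""
    return [
        r.broker_id
        for r in replicas
        if r.broker_id == leader_id or now_ms - r.last_caught_up_ms <= max_lag_ms
    ]


# Partition 0: leader on Broker 1, follower on Broker 2.
partition_0 = [
    Replica(broker_id=1, last_caught_up_ms=100_000),
    Replica(broker_id=2, last_caught_up_ms=95_000),
]
print(in_sync_replicas(1, partition_0, now_ms=100_000, max_lag_ms=30_000))  # [1, 2]
print(in_sync_replicas(1, partition_0, now_ms=140_000, max_lag_ms=30_000))  # [1]
```

In the second call, Broker 2 has not caught up for 45 seconds, so it drops out of the ISR. It is then no longer a safe candidate for leadership.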

By default, producers write only to the leader broker of a partition. Suppose a producer wants to send data to Partition 0, which has a leader and an ISR (in-sync replica). The producer knows it should send data exclusively to the broker that leads that partition. This is a critical feature of Kafka. Kafka consumers, likewise, read only from the partition’s leader by default.

This means the consumer requests data for Partition 0 only from the leader, Broker 1. In our example, then, Broker 2 acts only as a replica for Partition 0. However, if Broker 1 fails, Broker 2 can step up as the new leader of Partition 0 and continue serving data to both the producer and the consumer.
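The routing and failover described above can be sketched in a few lines. The `metadata` dict is a hypothetical stand-in for the cluster metadata that real Kafka clients fetch from brokers, and `broker_for` and `handle_broker_failure` are illustrative names, not a real client or controller API. The new leader is simply the first remaining in-sync replica, a simplification of what Kafka actually does.

```python
# Hypothetical metadata for Topic-A: who leads each partition and who is in sync.
metadata = {
    ("Topic-A", 0): {"leader": 1, "isr": [1, 2]},
    ("Topic-A", 1): {"leader": 2, "isr": [2, 3]},
}


def broker_for(topic, partition):
    """By default, producers and consumers both talk to the partition leader."""
    return metadata[(topic, partition)]["leader"]


def handle_broker_failure(failed_broker):
    """Remove the failed broker from every ISR and replace any leader it held."""
    for state in metadata.values():
        state["isr"] = [b for b in state["isr"] if b != failed_broker]
        if state["leader"] == failed_broker:
            # Promote a surviving in-sync replica; with none left, the
            # partition has no leader (it goes offline in this sketch).
            state["leader"] = state["isr"][0] if state["isr"] else None


print(broker_for("Topic-A", 0))  # 1
handle_broker_failure(1)
print(broker_for("Topic-A", 0))  # 2
```

After Broker 1 fails, requests for Partition 0 are routed to Broker 2. Partition 1 is unaffected because Broker 2 was already its leader.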

Difference Since Kafka 2.4

Kafka 2.4 introduced a feature called “Kafka Consumer Replica Fetching” (KIP-392), which allows consumers to read from the closest replica instead of always going to the leader. For instance, Broker 1, the leader of Partition 0, receives data from a producer, and that data is replicated to the in-sync replica of Partition 0 on Broker 2. Consumers can now read directly from this replica.

This can reduce latency, particularly if the consumer is geographically closer to Broker 2. It can also lower network costs, especially in cloud environments: traffic within a single data center is usually cheap, while traffic between data centers or availability zones is often billed. We’ll cover this in more detail in a later article, but for now it’s important to know that this capability exists.
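The selection logic can be sketched roughly as follows. In real Kafka, the choice happens on the broker side. Brokers declare their location with `broker.rack`, consumers declare theirs with `client.rack`, and the broker setting `replica.selector.class` is set to `org.apache.kafka.common.replica.RackAwareReplicaSelector`. The Python function and the rack names below are hypothetical, and the rule is simplified: prefer an in-sync replica in the consumer’s rack, otherwise fall back to the leader.

```python
# Simplified sketch of "fetch from the closest replica" (KIP-392, Kafka 2.4+).
# Rack names and this function are hypothetical; real Kafka implements this
# broker-side via replica.selector.class, broker.rack, and client.rack.

broker_rack = {1: "zone-a", 2: "zone-b", 3: "zone-c"}


def choose_fetch_broker(leader, isr, client_rack):
    """Prefer an in-sync replica in the consumer's rack; otherwise use the leader."""
    for broker in isr:
        if broker_rack.get(broker) == client_rack:
            return broker
    return leader


# Partition 0: leader on Broker 1, in-sync copy on Broker 2.
print(choose_fetch_broker(leader=1, isr=[1, 2], client_rack="zone-b"))  # 2
print(choose_fetch_broker(leader=1, isr=[1, 2], client_rack="zone-c"))  # 1
```

A consumer in `zone-b` reads from Broker 2 and avoids cross-zone traffic. A consumer in a zone with no local copy keeps reading from the leader, as in the default behavior.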

See you in the next part of the guide!

Paul Ravvich

Thank you for reading until the end.

Software Engineer with over 10 years of XP. Join me for tips on Programming, System Design, and productivity in tech! New articles every Tuesday and Thursday!