Apache Kafka Guide #6: Brokers & Topics

Paul Ravvich
Apache Kafka At the Gates of Mastery
4 min read · Dec 26, 2023


Hi, this is Paul, and welcome to part #6 of my Apache Kafka guide. Today we’re going to talk about how brokers and topics work.

So, a Kafka cluster consists of several Kafka brokers. A broker is essentially a server, but in Kafka terminology it is called a broker because it handles the data, receiving it from producers and serving it to consumers. Each Kafka broker is identified by a unique numerical ID. For instance, in our cluster we might have Broker 101, Broker 102, and Broker 103.

Each broker stores specific topic partitions, which means the data is spread across all brokers in the cluster, a concept we’ll explore further below. Kafka has a useful property here: after connecting to any one broker, also referred to as a bootstrap broker, a client (whether a producer or a consumer) already knows how to connect to the entire Kafka cluster, because Kafka clients have this discovery logic built in.

Consequently, it’s not necessary to know all the brokers in your cluster in advance. You just need to connect to one broker, and your client will automatically establish connections with the others. This flexibility means your Kafka cluster can have as many brokers as you like; a good starting point is three brokers, but larger clusters may contain over 100. In this example the brokers are numbered from 101 simply because it’s more convenient for discussion, distinguishing them from the topic partitions numbered 0, 1, and 2.
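To make this concrete, here is a minimal sketch using the Java producer client. The broker address localhost:9092 and the topic name topic-a are placeholders for illustration; the point is that bootstrap.servers only has to list one reachable broker for the client to discover the whole cluster.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class BootstrapExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        // One reachable broker is enough: the client fetches cluster metadata
        // from it and then connects to the other brokers on its own.
        // Listing two or three brokers is still wise in case this one is down.
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // "topic-a" is just an example topic name
            producer.send(new ProducerRecord<>("topic-a", "key", "hello"));
        }
    }
}
```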

Brokers and Topics

Alright, let’s look at how brokers, Kafka topics, and partitions fit together. Imagine we have Topic-A with three partitions and Topic-B with two partitions, along with three Kafka brokers: Broker 101, Broker 102, and Broker 103. Broker 101 holds Topic-A, Partition 0; Broker 102 holds Topic-A, Partition 2 (the out-of-order assignment is intentional); and Broker 103 holds Topic-A, Partition 1.

This is how the data is distributed:

  • Topic-A with 3 partitions
  • Topic-B with 2 partitions
  • Broker 103 holds no Topic-B data, because Topic-B has only 2 partitions and both are already placed on other brokers.

This setup demonstrates that topic partitions are spread across all brokers, without any particular order. For Topic-B, Broker 101 holds Topic-B, Partition 1, and Broker 102 holds Topic-B, Partition 0. It is perfectly normal for Broker 103 to hold no Topic-B data, since both of its partitions are already assigned to other brokers. This is where Kafka’s strength shows: data and partitions are spread across all brokers, which lets Kafka scale horizontally. As more partitions and brokers are added, data is distributed more widely throughout the cluster. Note that each broker stores only the partitions it is responsible for, not a copy of every topic.
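As a rough illustration of how such a layout comes about, here is a sketch using Kafka’s Java AdminClient to create the two topics from the example. The broker address, the lowercase topic names, and the replication factor of 1 are assumptions for a simple local setup; Kafka itself decides which brokers end up hosting which partitions.

```java
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicsExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address

        try (AdminClient admin = AdminClient.create(props)) {
            // Topic-A with 3 partitions and Topic-B with 2 partitions,
            // replication factor 1 to keep the example simple.
            // Kafka assigns each partition to a broker for us, which is how
            // the spread across brokers 101/102/103 comes about.
            admin.createTopics(Arrays.asList(
                    new NewTopic("topic-a", 3, (short) 1),
                    new NewTopic("topic-b", 2, (short) 1)
            )).all().get();
        }
    }
}
```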

Broker Discovery

Let’s discuss the broker discovery mechanism in Kafka. In a Kafka cluster, every broker can act as a bootstrap server. Consider a cluster with six brokers: even if only Broker 101 is listed as the bootstrap server, in reality any broker can serve this role. You’ll run into the term bootstrap server whenever you use the command-line tools or the Java clients, so it’s worth understanding the connection process. A client only needs to connect to one broker to access the entire cluster: the client connects to Broker 101 and requests metadata, and if the request succeeds, Broker 101 returns a list of all brokers in the cluster, along with additional data such as how the partitions are distributed among the brokers, which we’ll delve into later.

Thanks to this list, the Kafka client can connect to whichever broker it needs, whether to produce or consume data. Each broker in the Kafka cluster knows about every other broker, topic, and partition; in other words, every broker holds the complete metadata of the cluster, which is what makes this simple client connection process possible.
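Here is a small sketch of that discovery step using the Java AdminClient: we point it at a single bootstrap broker (localhost:9092 is again a placeholder) and ask for cluster metadata, which comes back with every broker’s ID, host, and port. Producers and consumers perform the same metadata fetch internally, so you never have to do this by hand just to send or read data.

```java
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.common.Node;

public class DescribeClusterExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Connect to a single bootstrap broker (placeholder address)...
        props.put("bootstrap.servers", "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // ...and request cluster metadata: the response lists every
            // broker in the cluster, not just the one we connected to.
            for (Node node : admin.describeCluster().nodes().get()) {
                System.out.printf("Broker id=%d host=%s port=%d%n",
                        node.id(), node.host(), node.port());
            }
        }
    }
}
```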

Conclusion

  1. Each Kafka broker is also called a “bootstrap server”.
  2. A client only needs to connect to one broker; the metadata response tells it how to connect to every broker in the cluster.
  3. Every broker knows about all the other brokers in the cluster, including their addresses, topics, and partitions.

See you in the next part of the guide!

Paul Ravvich

Thank you for reading until the end.
