Starting out with Kafka clusters: topics, partitions and brokers

If you are looking at using Kafka, you may be trying to work out how clusters, brokers, partitions, topics, producers, consumers and consumer groups fit together. You wouldn’t be the first. There are many articles about this, some better than others. I thought I would have a go at describing it in my own way, with the aim that you can include Kafka in your own solution designs.

Martin Hodges
14 min read · Apr 1, 2024

Kafka

First, a quick introduction to Kafka. When LinkedIn started, its platform was written as a single, monolithic application. As it grew in popularity, this architecture could not cope and a change was made to microservices. This introduced another problem: the synchronous nature of the new architecture could not cope with the load either.

The result was the development of a highly scalable, highly performant messaging queue called Kafka.

Luckily for us, LinkedIn made the solution open-source for us all to use.

If you are wondering whether it will scale to your requirements, just consider that LinkedIn has hundreds of millions of users and handles trillions of messages a day.

With massive scalability comes configuration complexity

Before you implement Kafka, you need to understand that the scalability architecture is complex and requires a lot of design decisions to be made.

However, you can start small and scale as your use case grows.

Kafka Architecture

The Kafka architecture can be hard to understand. I find it easiest to look at the architecture in two directions — from the perspective of a message and from the perspective of the deployment. You can then put them together and see how the system works.

Message Architecture

Kafka is a publish/subscribe queue (pub-sub for short).

Publish: This is where a system sends a message to Kafka. Such a system is called a producer.

Subscribe: This is where a system asks to receive messages from Kafka. Such a system is called a consumer.

Kafka message architecture

So a producer publishes or sends messages to Kafka. Kafka stores these messages as commit log files in its filesystem. Any number of consumers can then subscribe to receive these messages. Each consumer will then receive the messages sent by the producer.

Unlike other queuing technologies, messages are persistent and are not deleted once read. This allows consumers to be added and for those consumers to read all the previous messages. This helps with adding new functionality or recovering from failures.

Note that storing messages indefinitely does come at a resource cost. By default, Kafka deletes (purges) messages after 1 week, but you can extend this retention to any length of time you require.

Topics

Having every consumer process every message produced by every producer is not efficient, as many messages would be irrelevant and have to be ignored. Instead, when a producer sends a message, it is given a topic.

Kafka topics

Topics are handled by Kafka as independent queues. This means that a consumer can subscribe to a specific topic and only receive messages marked with that topic.

Partitions

Topics provide a degree of scalability but Kafka goes one stage further with partitions.

A topic is subdivided into partitions. Kafka does this so that messages in one partition can be processed in parallel with those in other partitions, increasing throughput.

Kafka partitions

Now, when the producer creates the message, it adds the topic and can, optionally, add the partition. The message is then consumed and processed by a consumer that has subscribed to the given topic and partition.

This now means that you can have multiple instances of your producer and/or consumer.

I mentioned that the producer can specify the partition, but there are in fact three options:

  1. The producer adds the partition number as well as the topic to the message
  2. The producer adds a key, which is used to select a partition (the same key always results in the same partition being selected)
  3. The producer does neither and Kafka selects a partition on a round-robin basis

There is a refinement to the round-robin behaviour known as the sticky partitioner. If multiple messages without a key or partition are sent as a batch, all messages in that batch will go to the same partition. The round-robin allocation then moves on with the next batch.
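To make the difference between these options concrete, here is a minimal sketch using the official Java client (the "orders" topic, partition number, key and payload are all illustrative):

```java
import org.apache.kafka.clients.producer.ProducerRecord;

// Option 1: name the partition explicitly (topic, partition, key, value)
ProducerRecord<String, String> explicit =
        new ProducerRecord<>("orders", 2, "order-123", "{\"total\": 42}");

// Option 2: supply a key only; Kafka hashes it, so the same key always
// lands in the same partition
ProducerRecord<String, String> keyed =
        new ProducerRecord<>("orders", "order-123", "{\"total\": 42}");

// Option 3: no key and no partition; the partitioner spreads batches
// across partitions (sticky/round-robin behaviour)
ProducerRecord<String, String> unkeyed =
        new ProducerRecord<>("orders", "{\"total\": 42}");
```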

Earlier, I mentioned that Kafka stores its messages as log files. Each topic/partition combination gets its own folder to store its logs in.

Note that you may see recommendations that you do not exceed 10 partitions per topic and 10,000 partitions per Kafka deployment. There have been advances in performance, however, and an Apache Kafka blog post suggests up to 4,000 partitions per broker with a maximum of 200,000 per cluster.

Message sequence

It is important that you understand message sequences.

When a message is received by a partition, it is given a sequential index or offset number for that topic/partition combination. This means that, if you have two partitions, each will store the messages it receives in order, with offsets 0, 1, 2 and so on.

When a consumer reads from a partition, it reads the messages in offset order.

In fact, a consumer has control over the offset it reads from and can re-read a set of messages by setting its read offset back to an earlier value.
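As a sketch of this with the Java client (broker address, topic, partition and offset are all illustrative), a consumer can assign itself a partition and seek back to an earlier offset before polling:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("group.id", "replay-example");

try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    TopicPartition partition = new TopicPartition("orders", 0);
    consumer.assign(Collections.singletonList(partition));
    consumer.seek(partition, 100); // rewind to offset 100 and re-read from there

    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
    records.forEach(r -> System.out.printf("offset=%d value=%s%n", r.offset(), r.value()));
}
```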

Now, when a consumer reads from multiple partitions, it might read them out of time-ordered sequence. The sequence is only guaranteed within a partition and only then in the sequence the messages were received, not sent.

If sequence is important to your application, you will need to make sure that it can handle out of sequence messages.

Deployment Architecture

Before we can go any further, we need to understand the deployment architecture and its components.

A Kafka deployment forms a Kafka cluster. The cluster includes a number of brokers and a ZooKeeper deployment. For now, we will just consider ZooKeeper as an enabler for managing the brokers.

Kafka deployment architecture

Brokers

There can be any number of brokers, but 3 is typically shown as a minimum. This allows one to be out for maintenance and another to fail at the same time.

The brokers are independent systems (but may be virtual) with their own commit log file storage. They receive messages and store them. They also respond to requests from consumers.

Whilst they are independent, they do act as a ‘team’. At any one time, one (and only one) of them manages the others. It is the active controller. The controller talks to ZooKeeper and stays up to date with what is happening and what producers and consumers are doing. As required, it then updates the other brokers.

If the active controller fails, then the other brokers will vote and one of the remaining brokers will become the active controller.

ZooKeeper

ZooKeeper is a highly-available, very fast, specialised file system. It was created to provide reliable metadata management across a network of systems. I say metadata (ie: data about data) because it is specifically designed to only store minimal information to ensure it is fast and responsive.

In Kafka, it is used to track the status of the brokers, producers and consumers. As part of managing the cluster it helps with the following:

  • Tracking which brokers are in the cluster (and which are not)
  • Broker registration with the cluster
  • Active controller elections
  • Reconfiguration based on topics and partitions (more on this later)
  • Partition leadership (more on this later too!)
  • Consumer group management (yep, more on this too)
  • Security and quotas
  • Message offset tracking
  • Tracking and metrics

As ZooKeeper is such an important part of the architecture, it is suggested that it is deployed in a highly-available configuration itself.

KRaft

Whilst I did not show this in the diagram above, the Apache Kafka project has introduced an alternative to ZooKeeper. The use of KRaft instead of ZooKeeper became production ready in version 3.3.

KRaft is an event-based variant on the Raft protocol and has been embedded directly into Kafka. This removes the need to maintain the ZooKeeper deployment in parallel to Kafka.

KRaft can also manage larger numbers of partitions per cluster and fails over brokers more quickly.

Bringing it together

In the first part of this article we looked at producers, consumers, topics and partitions. All related to messages.

In the second part, we looked at brokers, controllers and ZooKeeper. All related to the deployment of Kafka.

We now bring them together. Hopefully you have followed along so far. This is the tricky bit. We have to overlay the topics and partitions on the brokers.

Partitions per broker

It would be tempting to say that all partitions (and hence all messages) exist on all brokers. That way, any failure of any broker would not matter.

However, when it comes to massive messaging volumes, this is not efficient. It means that every message has to be copied to every broker. Each broker may then spend more time replicating messages than actually processing them.

Kafka solves this by allowing you to configure the number of copies there are for each topic, with each copy being on a different broker. This is known as the replication factor. The replication factor determines how many copies there are of each message in the topic across brokers.

It is important to remember that replication is about being fault tolerant not about capacity.

Replication factors are typically set to 3, which means you can continue if one broker fails whilst another is out for maintenance. However, it can be any number greater than 0.

If the replication factor is greater than the number of available brokers, the topic will not be created until the number of brokers increases to the point where there are enough to hold all of the partition’s replicas.
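As an illustration with the Java AdminClient (broker address, topic name and settings are assumptions, not recommendations), partition count, replication factor and retention are all set when the topic is created:

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

Properties props = new Properties();
props.put("bootstrap.servers", "broker1:9092"); // assumed broker address

try (AdminClient admin = AdminClient.create(props)) {
    // 4 partitions, each replicated across 3 brokers, kept for 2 weeks
    // instead of the 1-week default
    NewTopic orders = new NewTopic("orders", 4, (short) 3)
            .configs(Map.of("retention.ms", "1209600000"));

    admin.createTopics(Collections.singletonList(orders)).all().get(); // waits for the cluster to confirm
}
```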

A worked example of how these replicas are distributed helps.

Partition distribution

Let’s say you have two topics, 1 and 2, each with a single partition. Let’s also say you set a replication factor of 2 for topic 1 and 3 for topic 2.

This means that a message sent to topic 1 will have two copies in the cluster whilst a message sent to topic 2 will have three copies.

Ok, so where are these copies held?

Kafka uses an internal algorithm to distribute partitions across the available brokers. An example is shown in the diagram above.

When the partition copies (or replicas) are allocated to the brokers, one replica will be made the leader. Now, when a producer sends a message, it will be received by the leader, which will then distribute it to the other replicas on the other brokers.

If the leader were to fail, an election would be held to determine which partition replica will become the new leader.

Bootstrapping

You may be wondering how a client knows where to send messages to, or receive them from. If the leader for a given partition can be on any broker and can change, how does a producer know where to send its message, or a consumer know where to read its messages from?

We now introduce the concept of a bootstrap server. Any client (producer or consumer) can contact any broker and ask it for information about the cluster. In the response, the client finds out about the list of brokers and which of them hold the leaders for each partition. This enables the client to work out which broker it needs to connect to.

Usually, a client is given a list of 2 or 3 brokers. This way, if the first is down, then the client can use the next broker in the list to bootstrap from.
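In the client configuration, this is just a comma-separated list, for example (hostnames are illustrative):

```java
import java.util.Properties;

Properties props = new Properties();
// Any one of these brokers can answer the initial metadata request;
// the client then discovers the rest of the cluster from the response
props.put("bootstrap.servers", "broker1:9092,broker2:9092,broker3:9092");
```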

Sending Messages

Producing messages and sending them to the cluster is actually quite simple.

The main concerns for the producer are:

  • Create the payload of the message (this is application specific and can be any format, but is normally serialised as a string)
  • Optionally provide a key, which is hashed to determine which partition to use
  • Use the bootstrap information to determine which broker to send the message to

There are many options for the producer that enable retries, security, batching of requests, encryption, compression, idempotency etc. However, there is one that is very important to understand: acks.

The acks option determines when the producer considers the message to have been sent.

acks = 0

This means that the producer does not wait to hear whether the message was received; it assumes it was.

This option does mean that the retries option is ignored as the producer does not wait to hear. It also means that the offset of the message is unknown (generally it is set to -1).

Whilst this option is fast, it is also unreliable.

acks = 1

In this case, the producer waits for confirmation that the message has been received and persisted in the partition leader’s commit log.

The producer will be informed of the offset for the message in its partition.

As the replication to the other replica partitions has not yet happened, it is possible that the leader could fail and the message be lost.

acks = all

For absolute acknowledgement that the message has been safely delivered, this option makes the producer wait until all replicas (or a configured minimum number) have confirmed that they have persisted the message.

Whilst this option is the most secure, it is the slowest.

There is another important option: idempotence. This ensures that, for a given message, only one copy is ever written to the queue. It requires retries to be set to a non-zero value and acks=all.

If this is not set to true, then it is possible that multiple copies of a message may be placed on to the queue.
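Putting these options together, a producer configured for safe delivery might look like the following sketch with the official Java client (broker addresses, topic and payload are illustrative):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "broker1:9092,broker2:9092"); // assumed addresses
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("acks", "all");                // wait for all in-sync replicas to confirm
props.put("enable.idempotence", "true"); // requires acks=all and non-zero retries
props.put("retries", "3");

try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
    ProducerRecord<String, String> record =
            new ProducerRecord<>("orders", "order-123", "{\"total\": 42}");

    producer.send(record, (metadata, exception) -> {
        if (exception != null) {
            exception.printStackTrace(); // delivery failed after retries
        } else {
            // with acks=1 or acks=all the broker reports the assigned offset
            System.out.printf("partition=%d offset=%d%n", metadata.partition(), metadata.offset());
        }
    });
}
```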

Reading Messages

Like the producer, consumers use the bootstrap servers to create a connection to the broker that has the partition leader it is interested in.

Also like the producer, there are many options covering retries, security, batching of requests, encryption, compression, etc. It must also determine how it will deserialise the message (normally as a string).

Each consumer must have a unique ID, which allows Kafka to track what is happening, particularly when we talk about consumer groups.

One important option is auto.offset.reset. This determines what happens if a consumer asks for a message offset that does not exist. The options are: give me the oldest message (earliest), give me the newest (latest), or give me an error (none). This also determines what happens when a consumer connects to the cluster for the first time.
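A minimal consumer configuration with the Java client might look like this (broker addresses, group and topic names are illustrative):

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "broker1:9092,broker2:9092"); // assumed addresses
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("group.id", "order-processors");  // the consumer group (see the next section)
props.put("auto.offset.reset", "earliest"); // start from the oldest message if no offset exists

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList("orders"));
```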

Consumer groups

So far, I have not mentioned consumer groups. These are a very important concept in Kafka and relate to the processing of messages.

Every consumer is part of a consumer group.

Kafka consumer groups

In the diagram above, you can see how we have defined two consumer groups. The first has three consumers and the second only one. Both groups are reading messages from a set of 4 partitions.

You can see a few things happening here.

  1. Messages from any given partition are delivered, in order, to only one consumer within a consumer group (but could be delivered to multiple consumer groups)
  2. Consumer group 1 has fewer consumers than partitions and one consumer will receive messages from two partitions
  3. Consumer group 2 only has one consumer so all messages from all partitions are sent to it

If there are more consumers than partitions, then the excess consumers will not receive any messages.

Kafka allows the consumers in a group to be dynamically scaled up and down. It will reallocate the partitions as required to ensure that the messages can still be consumed. This process is called rebalancing and, whilst a rebalance is in progress, no messages can be consumed. It effectively causes a pause in processing.

The use of consumer groups allows for the scalability you require.

Receiving messages in batches

When a consumer asks for messages from the partition, it receives all the messages that have arrived since the last message committed by the consumer.

This is important.

If anything happens and the messages that have been received are not committed, the consumer will be given the same messages again and they may be processed two or more times.
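The basic consumption loop with the Java client looks something like this (continuing with the consumer configured earlier):

```java
import java.time.Duration;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;

while (true) {
    // Each poll returns the next batch of messages from the consumer's current
    // position; after a restart, that position is the last committed offset
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
    for (ConsumerRecord<String, String> record : records) {
        System.out.printf("partition=%d offset=%d value=%s%n",
                record.partition(), record.offset(), record.value());
    }
}
```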

Committing your messages

Ok, so we have our consumers consuming messages and everything appears scalable.

As mentioned, one of the responsibilities of a consumer is to commit the messages that it reads to ensure they are not sent again. The commit records the last offset reported to the consumer group for a given partition and ensures messages are only delivered once. When a commit occurs, it is effectively saying that the consumer group has processed all prior messages for that partition.

Commits happen in two ways:

  1. Automatically based on time
  2. Programmatically (also called manually)

In the first case, messages are periodically committed for the consumer group (actually the time is only checked on the next poll). Should a failure occur, then some messages may be reprocessed.

In the second case, the application decides when to send the commit message back to the partition. There are a number of strategies for sending the commit message:

  1. At least once
  2. At most once
  3. Exactly once

The strategy you choose will depend on your requirements.

At least once

The consumer reads the messages, processes them and then commits the last.

If the consumer crashes before the commit, then the consumer will be sent all the messages again, resulting in them possibly being processed again.

With this strategy, your application needs to be able to recognise processed messages and not to process them again, ie: it needs to be idempotent.
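A sketch of the at-least-once pattern with the Java client (assuming enable.auto.commit has been set to false in the consumer configuration; process() is a hypothetical, idempotent application handler):

```java
import java.time.Duration;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
    for (ConsumerRecord<String, String> record : records) {
        process(record); // hypothetical application-specific handler
    }
    if (!records.isEmpty()) {
        // A crash before this line means the whole batch is redelivered,
        // which is why process() must cope with seeing a message twice
        consumer.commitSync();
    }
}
```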

At most once

The consumer reads the messages, commits the last and then processes each one individually.

If the consumer crashes during processing, then it will only be sent the messages that arrived after the batch that failed. This ensures each message is processed at most once, but it could mean that the failed message, and others in its batch, are never processed.

Exactly once

The consumer reads the messages, processes each one and commits each individually.

If the consumer crashes during processing, then it will only be sent the messages it has not yet processed (including the one that failed).

Commit performance

The strategy you choose will depend on your application but the more commits you make, the more the performance of your application will be affected.

To help with the performance impact, Kafka provides the ability to commit synchronously (ie: the consumer waits for the commit to complete) or asynchronously (ie: the consumer does not wait for the commit to complete).

If you choose to use the asynchronous option, you need to consider what happens if the commit fails.
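With the Java client, an asynchronous commit takes a callback, so a failed commit can at least be detected and logged:

```java
consumer.commitAsync((offsets, exception) -> {
    if (exception != null) {
        // A missed commit means these messages may be redelivered
        // after a restart or rebalance
        System.err.println("Commit failed for " + offsets + ": " + exception);
    }
});
```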

Summary

In this introduction to Kafka we covered a lot of the essentials, including:

  • History of Kafka
  • Messaging publish/subscribe architecture
  • Producers and consumers
  • Topics and their partitions
  • Message sequence
  • Deployment architecture
  • Brokers, ZooKeeper and KRaft
  • Partitions per broker
  • Bootstrapping
  • Sending messages
  • Reading messages and commit strategies

This should have given you a good foundation on how Kafka works so that you can add it to your solution design.

If you found this article of interest, please give me a clap as that helps me identify what people find useful and what future articles I should write. If you have any suggestions, please add them in the comments section.
