Basic concepts of Kafka

Jhansi Karee
3 min read · Jun 16, 2019


Kafka is a fault-tolerant, scalable, distributed, asynchronous messaging system. If you are not familiar with Kafka, please read my introductory blog on Kafka before this one.

Here are the basic concepts of Kafka:

Topic:

So far, it looks like all the information goes into Kafka and consumers receive it from there. All good? But does that mean every type of message ends up in one undifferentiated stream? What if there are multiple consumers, each interested in a different type of data? Should they have to read every message regardless of what they are looking for? To address this, Kafka has the concept of a Topic.

A topic is a stream of messages; you may think of it as a table in a database. Multiple topics can be created in Kafka, and producers and consumers can be specific about which topics they send messages to and receive messages from. Consumers receive messages only from the topics they have subscribed to. This categorisation of messages into topics gives consumers the flexibility to subscribe only to the topics they are interested in.
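
The routing role of topics can be sketched with a toy in-memory model (this is not the Kafka API, just an illustration): producers write to a named topic, and a consumer sees only the topics it subscribed to.

```python
from collections import defaultdict

class ToyBroker:
    """Toy model of topic-based routing, not a real Kafka client."""

    def __init__(self):
        self.topics = defaultdict(list)  # topic name -> list of messages

    def produce(self, topic, message):
        self.topics[topic].append(message)

    def consume(self, subscribed_topics):
        # A consumer receives messages only from the topics it subscribed to.
        return {t: list(self.topics[t]) for t in subscribed_topics}

broker = ToyBroker()
broker.produce("orders", "order-1")
broker.produce("payments", "pay-1")
broker.produce("orders", "order-2")

# A consumer subscribed only to "orders" never sees "payments" messages.
print(broker.consume(["orders"]))  # {'orders': ['order-1', 'order-2']}
```

The topic names `orders` and `payments` are made up for the example; the point is simply that subscription filters what each consumer receives.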

Partitions:

Each topic is further divided into multiple partitions, which means the messages of a single topic are spread across multiple partitions. Each partition can have replicas, which are identical copies of it; this helps avoid a single point of failure at the partition level. We will talk more about this under Brokers.
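
How does a message end up in a particular partition? When a message has a key, the producer typically hashes the key to pick the partition, so messages with the same key stay in order on the same partition. A minimal sketch (real Kafka clients use murmur2 hashing; `crc32` here is just a dependency-free stand-in):

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    # Hash the message key and map it onto one of the partitions.
    return zlib.crc32(key) % num_partitions

NUM_PARTITIONS = 3
# The same key always lands on the same partition, preserving per-key ordering.
p1 = partition_for(b"user-42", NUM_PARTITIONS)
p2 = partition_for(b"user-42", NUM_PARTITIONS)
assert p1 == p2
print(f"key 'user-42' -> partition {p1}")
```

Messages without a key are instead spread across partitions (round-robin or sticky batching, depending on the client).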

Brokers:

What if the Kafka system goes down? Our consumers and producers would have to pause until Kafka is back. To avoid this single point of failure, Kafka is architected to have multiple brokers.

A broker is a Kafka server that hosts partitions of the topics. All the brokers together are called a Kafka cluster. When there is more than one broker, the data on broker 1 is replicated to the other brokers. Yes, this is the failure resiliency built into Kafka.

How does the Kafka cluster look when there are 3 brokers hosting two topics, each with 3 partitions A, B and C?

In this layout, each broker has the data of all partitions of all the topics, so each broker is self-sufficient and holds a complete copy. With multiple brokers, the data has multiple replicas: here, as many replicas as there are brokers. (This full duplication happens when the replication factor equals the number of brokers; in general, Kafka lets you choose a smaller replication factor.)
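
The placement described above can be sketched as a small assignment function. This is an illustrative simplification, not Kafka's actual placement algorithm: each partition gets an ordered list of replicas spread over the brokers, and with a replication factor equal to the broker count, every broker ends up with a copy of every partition.

```python
def assign_replicas(partitions, brokers, replication_factor):
    """Toy replica placement: round-robin the first replica, then
    put the remaining replicas on the next brokers in order."""
    assignment = {}
    for i, partition in enumerate(partitions):
        assignment[partition] = [
            brokers[(i + r) % len(brokers)] for r in range(replication_factor)
        ]
    return assignment

brokers = ["broker-1", "broker-2", "broker-3"]
layout = assign_replicas(["A", "B", "C"], brokers, replication_factor=3)
print(layout)
# With replication factor 3 and 3 brokers, every partition appears on
# all three brokers; the first broker in each list will act as the leader.
```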

Replicas:

When there are multiple replicas of the data, how do production and consumption happen? For each partition, whose replicas may live on multiple brokers, Kafka designates one replica as the leader (the partitions indicated in green in the above diagram are leader partitions). When data is produced, it is first sent only to the leader partition; the data is then replicated to the follower replicas. Similarly, consumers always read data from the leader partition. What happens when broker 1, which leads partitions A and C of Topic-1, goes down? New leaders are elected from the replicas on the remaining brokers.
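
The failover described above can be sketched as follows (a simplification: real Kafka elects leaders only from in-sync replicas, via the cluster controller). Each partition keeps an ordered replica list, and the leader is the first replica whose broker is still alive:

```python
# Ordered replica lists for two partitions; the broker names and
# partition names mirror the Topic-1 example above.
replicas = {
    "Topic-1/A": ["broker-1", "broker-2", "broker-3"],
    "Topic-1/C": ["broker-1", "broker-3", "broker-2"],
}
live_brokers = {"broker-1", "broker-2", "broker-3"}

def leader(partition):
    # The leader is the first replica hosted on a live broker.
    for b in replicas[partition]:
        if b in live_brokers:
            return b
    raise RuntimeError("no live replica for " + partition)

assert leader("Topic-1/A") == "broker-1"
live_brokers.discard("broker-1")  # broker 1 goes down
print(leader("Topic-1/A"))  # broker-2 takes over as leader
print(leader("Topic-1/C"))  # broker-3 takes over as leader
```

Producers and consumers then transparently switch to the new leaders, which is why a single broker failure does not stop the cluster.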

Zookeeper:

Kafka uses Zookeeper to manage the cluster. Zookeeper performs leader election for Kafka broker topic partitions and notifies Kafka of topology changes, so each node in the cluster knows when a new broker joins, a broker dies, or a topic is added or removed.
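
Concretely, each broker is pointed at Zookeeper through its configuration. A minimal `server.properties` fragment for one broker might look like this (hostnames and ports are placeholders for your own environment):

```properties
# server.properties (one file per broker; broker.id must be unique)
broker.id=1
listeners=PLAINTEXT://localhost:9092
zookeeper.connect=localhost:2181
```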

Consumer group:

A consumer group is a mechanism for grouping multiple consumers: consumers within the same group share the same group id, and the partitions of a subscribed topic are divided among them so that each message is processed by only one consumer in the group.
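
The division of partitions among a group's members can be sketched as a simple assignment (round-robin here for illustration; real Kafka supports range, round-robin and sticky assignors):

```python
def assign(partitions, consumers):
    """Toy round-robin assignment: each partition goes to exactly
    one consumer in the group."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

partitions = ["A", "B", "C"]
print(assign(partitions, ["consumer-1", "consumer-2"]))
# {'consumer-1': ['A', 'C'], 'consumer-2': ['B']}

# With more consumers than partitions, the extra consumers sit idle:
print(assign(partitions, ["c1", "c2", "c3", "c4"]))
```

This is why the number of partitions caps the useful parallelism of a consumer group.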

Offset:

Each message in a partition has a unique, sequentially increasing index that is specific to that partition.

When a consumer reads messages from a topic, it usually does not want to re-read the same messages from the beginning; instead, it wants to resume from where it left off. To support this, an index is maintained in the __consumer_offsets topic for each combination of group id, topic name and partition number. This index is called the offset. It denotes how far the consumer group has read into that partition of the topic, so consumers can receive only the messages after that index.
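
The bookkeeping can be sketched like this (an in-memory stand-in for the `__consumer_offsets` topic, keyed by group id, topic and partition):

```python
# (group id, topic, partition) -> next offset to read
committed = {}

def poll(messages, group, topic, partition, batch_size=2):
    """Toy consumer poll: read a batch starting from the committed
    offset, then commit the new position."""
    start = committed.get((group, topic, partition), 0)
    batch = messages[start:start + batch_size]
    committed[(group, topic, partition)] = start + len(batch)  # commit
    return batch

log = ["m0", "m1", "m2", "m3", "m4"]  # one partition's message log
print(poll(log, "group-1", "orders", 0))  # ['m0', 'm1']
print(poll(log, "group-1", "orders", 0))  # ['m2', 'm3'] resumes where it left off
# A different group has its own offset and starts from the beginning:
print(poll(log, "group-2", "orders", 0))  # ['m0', 'm1']
```

Because offsets are tracked per group, two independent groups can consume the same topic at their own pace without interfering with each other.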
