Kafka: A Big Picture, Part 1

Ryan Huang
Mastering the System Design Interview
7 min read · Feb 21, 2021

This blog explains the “big picture” of Kafka with big pictures. The goal is to build a high-level mental model of how Kafka works under the hood. After reading this article, you will feel more comfortable using Kafka in system design interviews.

Why use Kafka?

Kafka is built to solve complex communication problems in a microservice system. Without Kafka, microservices talk to each other directly, and communication becomes an N×N problem.

Figure 1. Kafka as a messaging hub in microservice system

With Kafka, one service produces events to Kafka, and multiple services can consume them from Kafka. This greatly simplifies call patterns in microservice systems: communication is reduced to a 1×N problem.

Kafka Architecture

Conceptually, Kafka is a commit log. Just like the commit log in a DBMS, events are written in an (approximately) sequential order. Each event has a monotonically increasing log id, called an offset. Consumers read events in (approximately) sequential order.

Figure 2. Kafka topic and partition

Kafka runs as a cluster of hosts. The sequential log is stored as a “distributed commit log” on multiple hosts in the Kafka cluster. Each log is called a “topic”, and each host is called a “broker”. A topic is sharded into multiple partitions, and the producer can choose which partition to send an event to. Splitting a topic into multiple partitions increases the write bandwidth and allows producers to scale (multiple producers can send events to one topic). Kafka is also highly available: each partition is replicated (asynchronously) onto multiple replicas, typically 3.
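
To make this concrete, here is a minimal sketch (in Java, using Kafka’s AdminClient API) of creating a topic with 4 partitions and a replication factor of 3. The topic name and broker address are made-up values for illustration:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import java.util.List;
import java.util.Properties;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // 4 partitions for write bandwidth, replication factor 3 for availability
            NewTopic orders = new NewTopic("orders", 4, (short) 3);
            admin.createTopics(List.of(orders)).all().get(); // block until created
        }
    }
}
```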

Producer

Applications use a Kafka producer client to send events to Kafka brokers. The producer serializes events to byte[] (the serializer) and assigns a partition to each event (the partitioner). Each topic is sharded into partitions, and partitions are stored on different brokers. A producer has three ways to assign a partition to an event:

  • Round-robin: spread events evenly by cycling through the partitions in turn.
  • Hash by key: each event can be assigned an optional key. The producer hashes the key to choose a partition, so all events with the same key land on the same partition.
  • Custom method: you can implement a custom partitioner to assign partitions programmatically (a sketch follows Figure 3 below), but you need to know the broker topology and configure partitions carefully.
Figure 3. The producer sends events to Kafka
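
To make the third option concrete, here is a minimal sketch of a custom partitioner using the Java client’s Partitioner interface. The routing rule (pinning a hypothetical “vip” key to partition 0) is invented for illustration, and the sketch assumes keyed events and a topic with at least 2 partitions:

```java
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.utils.Utils;
import java.util.Map;

// Hypothetical rule: route "vip" events to partition 0, hash everything else.
public class VipPartitioner implements Partitioner {
    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        if ("vip".equals(key)) {
            return 0; // reserve partition 0 for VIP traffic
        }
        // Hash all other keys across the remaining partitions.
        return 1 + Utils.toPositive(Utils.murmur2(keyBytes)) % (numPartitions - 1);
    }

    @Override public void configure(Map<String, ?> configs) {}
    @Override public void close() {}
}
```

The producer is pointed at this class through the partitioner.class config property.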

Once a producer is ready to send an event, the sending thread enqueues it in a buffer. A networking thread picks up events from the buffer in batches and sends them to brokers. The sending thread has three ways to handle the response from the networking thread (a sketch follows this list):

  • Fire-and-forget: the sending thread sends an event and doesn’t care whether it arrives at the brokers successfully. Most of the time it will arrive, since Kafka is highly available and the producer retries sending events automatically. However, in case of non-retriable errors or a timeout, messages are lost and the application gets no information or exception about it. This option gives the best throughput but occasionally loses events.
  • Synchronous send: the sending thread sends an event and synchronously waits for ACKs from brokers. The send() method returns a Future object, and the sending thread calls Future.get() to wait for a successful response. The parameter acks controls how many partition replicas must receive the event before the producer considers the send() successful; it has a significant impact on the durability of events. By default, send() succeeds once the event has been written to the leader.
  • Asynchronous send: the sending thread calls send() with a callback function, which is triggered when a response arrives from the Kafka broker.
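
The three styles map directly onto the producer’s send() API. A minimal sketch, where the topic, key, and payload are made up:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import java.util.Properties;

public class SendStyles {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record =
                new ProducerRecord<>("orders", "user-42", "order-created");

            // 1. Fire-and-forget: ignore the returned Future; errors stay invisible.
            producer.send(record);

            // 2. Synchronous send: block until the broker ACKs, or throw on failure.
            RecordMetadata meta = producer.send(record).get();
            System.out.printf("written to partition %d at offset %d%n",
                              meta.partition(), meta.offset());

            // 3. Asynchronous send: register a callback invoked on the response.
            producer.send(record, (metadata, exception) -> {
                if (exception != null) {
                    exception.printStackTrace(); // e.g. non-retriable error or timeout
                }
            });
        }
    }
}
```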

Consumer

Consumers are highly scalable because clients form consumer groups. Each topic can have multiple consumer groups, and each consumer group can have multiple clients. Consumer groups are independent of each other; e.g., order events produced by the Ordering Service can be consumed by the Order History Service and the Payment Service independently. Within a consumer group, a consuming application scales up by adding clients, which split ownership of the partitions. One partition can only be pulled by one client, so the number of active clients ≤ the number of partitions. In Figure 4, consumer group 1 has only one client, so it owns all partitions; consumer group 2 has three clients, so the partitions are split among them: client 1 owns partitions 1 and 2, client 2 owns partition 3, and client 3 owns partition 4.

Figure 4. Kafka consumer groups
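
Joining a consumer group is just a matter of setting group.id; Kafka assigns each client its subset of partitions automatically. A minimal sketch, where the topic name, group name, and broker address are invented:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class OrderHistoryConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "order-history-service");   // clients sharing this id split partitions
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders")); // broker assigns this client a subset of partitions
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> r : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                                      r.partition(), r.offset(), r.value());
                }
            }
        }
    }
}
```

Starting a second copy of this program with the same group.id triggers a rebalance, after which the two clients split the partitions between them.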

When the number of live clients in a consumer group changes (a client leaves or joins), partition ownership is rebalanced. Each client sends heartbeats to the Kafka broker to indicate liveness. When the broker doesn’t hear from a client for a certain period, it treats the client as dead and rebalances its partitions among the remaining clients. When a new client joins the consumer group, it starts sending heartbeats; Kafka detects the new client and rebalances partitions across the expanded set of clients.

After rebalancing, a partition could be pulled by a different client. How does the new owner know where to start? After a client pulls and processes events from a partition, it commits an offset to the Kafka broker as a marker of the last processed event. Figure 5 zooms into partition 1. Consumer group 1 has only one client (client 1). After client 1 crashes (detected after several missed heartbeats), partition 1 is reassigned to client 2. Say client 1 committed offset 11 before it crashed; client 2 will resume pulling from offset 11.

Figure 5. Partition rebalance
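
Offsets can be committed automatically (the client default) or explicitly. A minimal sketch of explicit commits; the process() helper is a placeholder for application logic, and the consumer is assumed to be created with enable.auto.commit=false:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;

public class ManualCommitLoop {
    // Assumes the consumer was configured with enable.auto.commit=false.
    static void run(KafkaConsumer<String, String> consumer) {
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            for (ConsumerRecord<String, String> r : records) {
                process(r); // hypothetical application logic
            }
            // Commit after processing: if this client crashes before this line,
            // the partition's next owner re-reads from the last committed offset.
            consumer.commitSync();
        }
    }

    static void process(ConsumerRecord<String, String> r) {
        System.out.printf("offset %d: %s%n", r.offset(), r.value());
    }
}
```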

End-to-end Message Delivery

This section discusses common issues with end-to-end message delivery. In design interviews, you need to be aware of these design trade-offs when using Kafka.

Message Ordering

Figure 6. Out-of-order message delivery

A producer sends events to all partitions in a round-robin fashion. For example, in Figure 6, the producer sends events to the two partitions in turn. When the consumer reads from the two partitions, it could read one event from partition 1, then another three events from partition 2, and so on. As a result, the consumer reads events in a different order than they were produced. A Kafka topic does not guarantee event ordering across partitions. However, within a partition, events are ordered by offset. For example, in Figure 6, events 0, 2, 4, 6 in partition 1 are guaranteed to be read by the consumer in that order.

What if an application requires in-order message delivery? One design technique is to partition by key. As discussed in the producer section, the partitioner can assign a partition by hashing the event key, so all events with the same key are sent to the same partition, and a consumer client reads those events in order. For example, by using user-id as the event key, all events from one user go to one partition, and a consumer can read one user’s actions in order, e.g. the sequence of logging in, placing an order, confirming the order, making a payment, and logging out. However, hot partitions are a common problem in partitioned systems. As system designers, we need to be aware of this and choose the partition key carefully so that the partitions stay well balanced.
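
Partitioning by key only requires setting the key on each record. A minimal sketch, where the topic name, user id, and event payloads are invented:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class KeyedSend {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Same key => same partition => events are consumed in this order.
            String userId = "user-42";
            producer.send(new ProducerRecord<>("user-events", userId, "login"));
            producer.send(new ProducerRecord<>("user-events", userId, "place-order"));
            producer.send(new ProducerRecord<>("user-events", userId, "payment"));
        }
    }
}
```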

At-least-once Delivery

Some systems, e.g. web analytics, can tolerate occasional event loss, but many systems require reliable message delivery. Guaranteeing that events are delivered at least once requires special configuration on both the producer and the consumer.

Figure 7. At-least-once message delivery

On the producer side, after calling send(), a client synchronously waits for brokers to reply. This makes sure each event is persisted on the brokers. If a partition has 3 replicas, the client should wait until a majority of replicas (at least 2) have received the event. If any broker fails to respond or the send() request times out, the client retries until it succeeds.
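
In the Java client this behavior is expressed through producer configs; in Kafka’s terms, “wait for 2 of 3 replicas” is typically achieved with acks=all on the producer plus min.insync.replicas=2 on the topic. A minimal sketch with invented topic and payload:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class ReliableProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all");                 // wait for all in-sync replicas to ACK
        props.put("retries", Integer.MAX_VALUE);  // retry retriable errors automatically
        props.put("delivery.timeout.ms", 120000); // surface an error if not delivered in 2 min

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Synchronous send: get() blocks until the brokers ACK, or throws.
            producer.send(new ProducerRecord<>("payments", "user-42", "charge-$10")).get();
        }
    }
}
```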

On the consumer side, the client should first finish processing an event. If the consumer needs to record the processing result in an external system, it must synchronously wait for an ACK from that system, retrying on failure or timeout. Only then does the client commit the offset for the processed event. If the client crashes before committing the offset, a new client will pick up the same event and process it again. Both producer-side retries and consumer-side reprocessing lead to duplicates, but this configuration guarantees events are delivered at least once.

This is a trade-off between latency/throughput and the delivery guarantee. In some applications, e.g. payment services, it is important to make sure each event is delivered and processed.

Exactly-once Delivery

Exactly-once is very difficult to implement. Kafka does offer exactly-once semantics to a limited extent (for details, see this blog from Confluent). In Kafka, exactly-once delivery consists of two parts: the idempotent producer and transactions. However, the idempotent producer does not support deduplication beyond a short window, and Kafka transactions do not guarantee atomicity when the consumer updates an external system. In most real-world systems, events contain unique identifiers that external systems can use to deduplicate them.
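
For reference, both pieces are enabled through producer configs. A minimal sketch of a transactional producer; the topic names and transactional id are invented, and this only covers the Kafka-internal part of the guarantee:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class TransactionalProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("enable.idempotence", "true");        // broker drops duplicate sends
        props.put("transactional.id", "payments-tx-1"); // stable id across producer restarts

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            producer.beginTransaction();
            try {
                producer.send(new ProducerRecord<>("payments", "user-42", "charge-$10"));
                producer.send(new ProducerRecord<>("ledger", "user-42", "debit-$10"));
                producer.commitTransaction(); // both events become visible atomically
            } catch (Exception e) {
                producer.abortTransaction(); // neither event is exposed to consumers
                throw e;
            }
        }
    }
}
```

Note that consumers only honor the transaction boundary when configured with isolation.level=read_committed.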

Conclusion

In this article, we discussed how Kafka stores a “distributed commit log” and how producers and consumers interact with Kafka brokers. We also looked at several end-to-end message delivery semantics. In Part 2 of the “Kafka: A Big Picture” series, we will review issues with using Kafka in real production systems and introduce capabilities that Kafka does not offer out of the box but that are critical for real-world applications.
