Introduction to Apache Kafka

Erkan Demirel
4 min read · Feb 5, 2023


Apache Kafka was originally developed at LinkedIn to collect metrics and logs from applications and to facilitate activity tracking. Today, Kafka is an open-source, distributed event store and streaming platform, written in Java and Scala and maintained by the Apache Software Foundation.

Apache Kafka allows developers to build real-time, event-driven, mission-critical applications that support high-performing data pipelines, data integrations, and streaming analytics.

What is Apache Kafka Used For?

  • Messaging
  • Website Activity Tracking
  • Metrics
  • Log Aggregation
  • Stream Processing
  • Event Sourcing
  • Commit Log

Apache Kafka Cluster Architecture

A cluster is a distributed computing concept in which a group of devices works together to achieve a specific purpose. Multiple Kafka brokers form a Kafka cluster. The main goal of a Kafka cluster is to spread workloads evenly across replicas and partitions. Kafka clusters can scale without interruption. They also manage the persistence and replication of data messages. If one broker fails, other Kafka brokers step in to offer similar services without data loss or degraded latency.

The core Kafka cluster architecture comprises several components:

  • ZooKeeper

ZooKeeper stores details about consumer clients and information about the Kafka cluster itself. It acts as a management node, in charge of maintaining the brokers, topics, and partitions of the Kafka cluster.

The Zookeeper keeps track of the Brokers of the Kafka Clusters. It determines which Brokers have crashed and which Brokers have just been added to the Kafka Clusters, as well as their lifetime.

It then notifies Kafka producers and consumers about the state of the cluster, which helps both coordinate their work with the active brokers.

ZooKeeper also keeps track of which broker is the leader of a given partition and provides that information to producers and consumers so they can write and read messages.

  • Topics

A topic is a Collection of Messages that belong to a given category or feed name. Topics are used to arrange all of Kafka’s records. Consumer apps read data from Topics, whereas Producer applications write data to them.

A topic can have zero or more producers and zero or more consumers.

  • Partitions

In Kafka, topics are divided into a configurable number of sections called partitions. Partitions allow several consumers to read data from the same topic in parallel.
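When a message has a key, the producer maps that key to a partition, so messages with the same key always land in the same partition and keep their order. The real Java client hashes keys with murmur2; the sketch below uses a simple CRC32 hash purely for illustration.

```python
# Sketch of key-based partition selection. Assumption: Kafka's default
# partitioner uses murmur2; CRC32 stands in here for illustration only.
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition index in a deterministic way."""
    return zlib.crc32(key) % num_partitions

# The same key always maps to the same partition, preserving per-key order.
assert partition_for(b"user-42", 6) == partition_for(b"user-42", 6)
```

Keyless messages are instead spread across partitions (round-robin or sticky batching in recent clients), trading per-key ordering for balance.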

  • Topic Replication

Topic replication improves a topic's ability to survive failures. The replication factor defines how many copies of each partition the Kafka cluster keeps, and it can be set per topic. A replication factor of two or three is typical.

Essentially, you can ensure that Kafka data is highly available and fault-tolerant by replicating topics across data centers or geographies so that you can do broker maintenance in the event of problems.
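The idea behind replica placement can be sketched as follows: each partition gets one leader broker plus replication-factor-minus-one followers, spread round-robin over distinct brokers so no broker holds two copies of the same partition. This is a simplified model of Kafka's default assignment, not its exact algorithm.

```python
# Simplified round-robin replica placement (assumption: real Kafka's
# assignment also randomizes the starting broker and rack-awareness).
def assign_replicas(num_partitions: int, brokers: list, replication_factor: int) -> dict:
    """For each partition, choose a leader and followers on distinct brokers."""
    n = len(brokers)
    assignment = {}
    for p in range(num_partitions):
        replicas = [brokers[(p + i) % n] for i in range(replication_factor)]
        assignment[p] = {"leader": replicas[0], "followers": replicas[1:]}
    return assignment

layout = assign_replicas(num_partitions=3, brokers=[101, 102, 103], replication_factor=2)
# Every partition ends up with one leader and one follower on another broker.
assert all(a["leader"] not in a["followers"] for a in layout.values())
```

If a leader's broker fails, one of that partition's followers is promoted, which is why a replication factor of two or three keeps the topic available through single-broker outages.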

  • Offsets

The offset is the immutable, incremental identifier given to every message in a partition. It works much like a unique ID in a database table, but an offset is only meaningful within its own partition. It is one of three coordinates that identify and locate a message: first the topic, then the partition, and finally the message's position within that partition (the offset).
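The triple (topic, partition, offset) uniquely addresses a message because each partition is an append-only log and a message's offset is simply its position in that log. A toy in-memory model (an illustration, not how a broker stores data on disk):

```python
# Toy model of broker storage: each (topic, partition) pair is an
# append-only list, and a message's offset is its index in that list.
from collections import defaultdict

class Log:
    def __init__(self):
        self.partitions = defaultdict(list)

    def append(self, topic: str, partition: int, message: str) -> int:
        """Append a message and return the offset it was assigned."""
        log = self.partitions[(topic, partition)]
        log.append(message)
        return len(log) - 1

    def read(self, topic: str, partition: int, offset: int) -> str:
        """Fetch the message addressed by (topic, partition, offset)."""
        return self.partitions[(topic, partition)][offset]

log = Log()
off = log.append("clicks", 0, "page=/home")
assert off == 0
assert log.read("clicks", 0, off) == "page=/home"
```

Note that offsets restart from 0 in every partition, which is why an offset alone, without the topic and partition, identifies nothing.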

  • Producers

As in other messaging systems, Kafka producers create and send messages to topics. A producer writes, or publishes, data to a topic's partitions and acts as a source of data in the cluster: it decides which topic, and which partition within it, a given message should be published to.

  • Consumers

Consumers are applications or machines that subscribe to topics and process published message feeds. Sometimes called “subscribers” or “readers,” consumers read the messages in the order in which they were generated.

A consumer uses the offset to track which messages it has already consumed. A consumer stores the offset of the last consumed message for each partition so that it can stop and restart without losing its place.
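This stop-and-resume behavior can be sketched with a toy consumer that commits, per partition, the offset of the next message to read (a simplified model; a real Kafka consumer commits offsets to the broker, not locally):

```python
# Toy consumer illustrating offset commits. Assumption: real consumers
# commit offsets to Kafka itself; here they are plain Python state.
class Consumer:
    def __init__(self, committed=None):
        self.committed = dict(committed or {})   # partition -> next offset to read

    def poll(self, partition_log: list, partition: int):
        """Read the next uncommitted message, then advance the offset."""
        offset = self.committed.get(partition, 0)
        if offset >= len(partition_log):
            return None                          # nothing new to consume
        message = partition_log[offset]
        self.committed[partition] = offset + 1   # commit progress
        return message

log = ["m0", "m1", "m2"]
c = Consumer()
c.poll(log, 0)                                   # consumes m0
c.poll(log, 0)                                   # consumes m1
restarted = Consumer(c.committed)                # "restart" from committed state
assert restarted.poll(log, 0) == "m2"
```

Because the position lives in the committed offsets rather than in the broker, a restarted consumer picks up exactly where the previous run stopped, with nothing re-read and nothing skipped.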

Consumers interact with a topic as a group (a group can consist of a single consumer), which enables scalable processing. The group ensures that each partition is consumed by exactly one member; if a consumer fails, the remaining members take over its partitions. Groups let Kafka consume high-volume topics in parallel, scaling horizontally.
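The group's partition ownership can be sketched as a simple assignment function: spread the partitions over the members, and on a membership change, recompute the assignment with whoever remains. This is a simplified stand-in for Kafka's range/round-robin assignors, not their exact logic.

```python
# Simplified consumer-group partition assignment and rebalance.
# Assumption: real Kafka uses pluggable assignors (range, round-robin,
# sticky); this round-robin split only illustrates the ownership rule.
def assign(partitions: list, members: list) -> dict:
    """Give every partition exactly one owner from the group."""
    members = sorted(members)
    return {p: members[i % len(members)] for i, p in enumerate(partitions)}

parts = [0, 1, 2, 3]
before = assign(parts, ["c1", "c2"])   # work split across two consumers
after = assign(parts, ["c1"])          # c2 fails -> group rebalances
assert set(before.values()) == {"c1", "c2"}
assert set(after.values()) == {"c1"}   # c1 now owns every partition
```

One consequence worth noting: because each partition has exactly one owner, a group with more consumers than partitions leaves the extra consumers idle.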

  • Brokers

Brokers are the servers that make up a Kafka cluster. A broker hosts several topics with their partitions and is identified by an integer id. It lets consumers fetch messages by topic, partition, and offset. Brokers form a cluster by sharing information with each other directly or through ZooKeeper, and one broker in the cluster acts as the controller.

Conclusion:

In this Kafka tutorial, we covered Kafka's core components, common use cases, and cluster architecture.

Thanks for reading.

