Apache Kafka Fundamentals: How Is It Different from RabbitMQ?

Muhammad Khoirudin
4 min read · Aug 1, 2024


Today, many distributed applications are built using microservices. Most organizations want their applications to be reliable, scalable, and able to distribute data smoothly. Assuming we already understand the microservice concept, choosing the right way to distribute data between services is one of the most important concerns.

So, what is Apache Kafka? Simply put, it is an open-source stream-processing platform originally developed at LinkedIn and later open-sourced under the Apache License. Kafka is built for developing real-time data pipelines and streaming applications.

Many developers might be familiar with messaging services, also called message brokers, such as “RabbitMQ”: we push our messages into the broker, which then delivers them to subscribers. It is natural to compare Apache Kafka with RabbitMQ; however, the two have different characteristics and purposes.

Before breaking down the differences between the two, let’s explain Apache Kafka first. Its core concepts are:

  1. Producer / Publisher
  2. Consumer / Subscriber
  3. Topic
  4. Partition
  5. Broker
  6. ZooKeeper

Let’s go through them one by one, using a library as an analogy.

1. Producer / Publisher

Simply put, a producer is an application that sends data (messages) to a Kafka topic. Think of producers as book authors: they write books and submit them to the library.
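To make the producer role concrete, here is a toy, in-memory Python sketch of a producer appending messages to a topic. It is not the Kafka client API: `ToyTopic` and `ToyProducer` are hypothetical names, and the key-to-partition hashing uses `zlib.crc32` as a stand-in for Kafka’s default partitioner (which uses a murmur2 hash).

```python
import zlib


class ToyTopic:
    """Hypothetical in-memory topic: a list of append-only partition logs."""

    def __init__(self, name: str, num_partitions: int):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]


class ToyProducer:
    """Hypothetical producer that writes (key, value) messages to a topic."""

    def send(self, topic: ToyTopic, key: str, value: str) -> tuple:
        # The same key always maps to the same partition, which keeps
        # per-key ordering. crc32 stands in for Kafka's murmur2 hash.
        partition = zlib.crc32(key.encode("utf-8")) % len(topic.partitions)
        log = topic.partitions[partition]
        log.append((key, value))
        offset = len(log) - 1  # position of the new message in that partition
        return partition, offset


topic = ToyTopic("UserTopic", num_partitions=2)
producer = ToyProducer()
print(producer.send(topic, "user-42", "signed up"))
print(producer.send(topic, "user-42", "updated profile"))
```

Both messages for `user-42` land in the same partition with consecutive offsets, which is why keyed messages keep their relative order.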

2. Consumer / Subscriber

An application that reads messages from a Kafka topic. It’s like a librarian, or someone who visits the library to read books.
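A consumer pulls messages and keeps track of how far it has read. The toy Python class below is a hypothetical illustration, not the Kafka client API: `ToyConsumer`, `poll`, and `max_records` are names chosen for this sketch, although the real Kafka consumer is built around a similar poll-and-offset idea. It reads a single partition, represented as a plain list.

```python
class ToyConsumer:
    """Hypothetical consumer that pulls messages from one partition log,
    remembering the offset of the next message it has not read yet."""

    def __init__(self, log: list):
        self.log = log
        self.offset = 0

    def poll(self, max_records: int = 10) -> list:
        # Pull up to max_records new messages starting at the saved offset.
        batch = self.log[self.offset:self.offset + max_records]
        self.offset += len(batch)
        return batch


partition_log = ["book-1", "book-2", "book-3"]
reader = ToyConsumer(partition_log)
print(reader.poll(max_records=2))
print(reader.poll())
partition_log.append("book-4")  # a producer adds a new message later
print(reader.poll())
```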

3. Topic

This is the category or feed name to which messages are sent. In Kafka, topics are partitioned and replicated across brokers. For example, if a topic holds 2 GB of data split into 2 partitions (say, 1 GB per partition), and we also keep one extra copy of each partition (a replication factor of 2), then the cluster needs 4 GB of storage in total.

Replicas of a partition are stored on different servers, so that if the worst happens and one server fails, we can still use another one.

Analogy: a topic is like a genre or book category, for instance novels, education, technology, etc. As we can see in a library, each genre is placed in a different section. Applications work similarly: if we send user data, it might go to “UserTopic”; if we send product data, it might go to “ProductTopic”.
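The storage arithmetic and replica placement from the topic description can be sketched in a few lines of Python. This is a simplified illustration, not Kafka’s actual assignment algorithm: `assign_replicas` and `total_storage_gb` are hypothetical helpers, and replicas are placed round-robin so that no broker holds two copies of the same partition. In this sketch, the first broker in each list plays the role of the partition’s leader.

```python
def assign_replicas(num_partitions: int, replication_factor: int, num_brokers: int) -> dict:
    """Place each partition's replicas on distinct brokers (simplified round-robin)."""
    if replication_factor > num_brokers:
        raise ValueError("each replica of a partition needs its own broker")
    return {
        p: [(p + r) % num_brokers for r in range(replication_factor)]
        for p in range(num_partitions)
    }


def total_storage_gb(topic_size_gb: float, replication_factor: int) -> float:
    # Every byte of the topic is stored once per replica.
    return topic_size_gb * replication_factor


print(assign_replicas(num_partitions=2, replication_factor=2, num_brokers=2))
print(total_storage_gb(2, replication_factor=2))
```

With 2 partitions, a replication factor of 2, and 2 brokers, each broker ends up holding one copy of every partition, so losing either broker still leaves a full copy of the topic.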

4. Partition

As mentioned above, a topic is split into partitions to allow parallel processing. For instance, the novel section might be placed in the southeast corner of the library, and the staff then arrange shelves and rows for the related books, so visitors can find a book more easily than if the novels were stored without shelves.
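Parallel processing happens when several consumers in the same consumer group split a topic’s partitions between them. The hypothetical `range_assign` helper below sketches one simple split, similar in spirit to Kafka’s range assignment strategy but not its actual implementation: each partition goes to exactly one consumer, and if there are more consumers than partitions, the extras receive nothing.

```python
def range_assign(partitions: list, consumers: list) -> dict:
    """Split a topic's partitions among the consumers of one group.

    Each partition goes to exactly one consumer, so consumers can work in parallel.
    """
    partitions = sorted(partitions)
    consumers = sorted(consumers)
    per_consumer, extra = divmod(len(partitions), len(consumers))
    assignment, start = {}, 0
    for i, consumer in enumerate(consumers):
        # The first `extra` consumers take one additional partition.
        count = per_consumer + (1 if i < extra else 0)
        assignment[consumer] = partitions[start:start + count]
        start += count
    return assignment


print(range_assign(list(range(5)), ["consumer-a", "consumer-b"]))
```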

5. Broker

A Kafka server that stores data (messages). In simple terms, this is the library itself.

6. ZooKeeper

A service used by Kafka to manage and coordinate distributed brokers. Think of it as the library’s staff and management, who organize the library, its shelves, and its books. (Newer Kafka versions can also run without ZooKeeper, using the built-in KRaft mode instead.)

RabbitMQ vs Apache Kafka (Comparison)

There are several aspects worth explaining in this section. Let’s dive in.

A. Overview and Purpose

Apache Kafka

A distributed streaming platform: the system delivers data continuously and in real time. The data to be delivered is usually large when aggregated. For example, an organization running an e-commerce site may want to know how many times users access a particular page, or how many times users click a particular button. LinkedIn’s impression data is another example; it might be collected using a streaming data platform like this.
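The page-view example can be sketched as a running count that updates as each event arrives, instead of waiting for a batch job. In a real pipeline the events would be consumed from a Kafka topic; here a plain Python list stands in for that stream, and the event fields (`user`, `page`) and the `running_page_views` function are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def running_page_views(events: Iterable[dict]) -> Iterator[Tuple[str, int]]:
    """Update a per-page counter as each event arrives and emit the new total."""
    counts = Counter()
    for event in events:
        page = event["page"]
        counts[page] += 1
        yield page, counts[page]


# A plain list stands in for events consumed from a Kafka topic.
clickstream = [
    {"user": "u1", "page": "/product/1"},
    {"user": "u2", "page": "/product/1"},
    {"user": "u1", "page": "/cart"},
]
for page, total in running_page_views(clickstream):
    print(page, total)
```

Because the function is a generator, it works the same way on an endless stream as on this short list: each new event immediately produces an updated total.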

RabbitMQ

It’s a message broker that uses queues to perform traditional messaging, based on the Advanced Message Queuing Protocol (AMQP). As a real example, imagine we are building an e-commerce order processing system. Once a user clicks the “Order Now” button, the system may go through order validation, payment processing, inventory updates, and shipment tracking. If the system performed all of these through HTTP calls within a single request, the request could time out.

The solution is to send the order data to a queue (the broker); the applications that handle payment, inventory, and shipment then consume the message in background processes.
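The order flow can be sketched with Python’s standard-library `queue.Queue` standing in for the broker. This is an in-process queue, not RabbitMQ itself; a real system would use a RabbitMQ client library such as pika. The “Order Now” handler only enqueues the order and returns, while a background worker performs the slow steps. The step names and the `handle_order_now` and `order_worker` functions are hypothetical, and for brevity a single worker handles all steps, whereas the scenario above has separate payment, inventory, and shipment services.

```python
import queue
import threading

order_queue = queue.Queue()
processed = []  # (order_id, step) pairs recorded by the worker


def order_worker() -> None:
    """Background worker: takes orders off the queue and runs the slow steps."""
    while True:
        order = order_queue.get()
        if order is None:  # sentinel value tells the worker to stop
            order_queue.task_done()
            break
        # Hypothetical steps that would otherwise block the HTTP request.
        for step in ("validate", "charge_payment", "update_inventory", "create_shipment"):
            processed.append((order["id"], step))
        order_queue.task_done()


def handle_order_now(order_id: int) -> str:
    """'Order Now' handler: enqueue the order and respond immediately."""
    order_queue.put({"id": order_id})
    return "accepted"


worker = threading.Thread(target=order_worker)
worker.start()
status = handle_order_now(1001)
order_queue.put(None)  # shut the worker down after it drains the queue
worker.join()
print(status, processed)
```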

In fact, Kafka can also provide a messaging mechanism similar to RabbitMQ’s.

B. Architecture and Data Flow

Apache Kafka

In Apache Kafka we have the “Topic”: messages are stored in a topic, which is divided into partitions, and consumers have to pull messages from the topic.

Messages are retained for a configurable period or until a configured size limit is reached, regardless of whether they have been consumed.
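The retention behavior can be sketched with a toy partition log in Python. This is a hypothetical model, not Kafka’s storage engine: reading never deletes records, each consumer group simply passes its own offset, and only the retention check removes old data. Kafka exposes comparable topic settings such as `retention.ms` (time-based) and `retention.bytes` (size-based); this sketch models only the time-based case, with timestamps passed explicitly in seconds.

```python
class ToyPartitionLog:
    """Hypothetical append-only partition log with time-based retention.

    Reading never deletes anything; only enforce_retention() does.
    """

    def __init__(self, retention_seconds: float):
        self.retention_seconds = retention_seconds
        self.records = []  # (offset, timestamp, value) tuples
        self.next_offset = 0

    def append(self, value: str, now: float) -> int:
        self.records.append((self.next_offset, now, value))
        self.next_offset += 1
        return self.next_offset - 1

    def read_from(self, offset: int) -> list:
        # Each consumer group passes its own offset; nothing is removed.
        return [value for off, _, value in self.records if off >= offset]

    def enforce_retention(self, now: float) -> None:
        # Drop records older than the retention window, consumed or not.
        cutoff = now - self.retention_seconds
        self.records = [rec for rec in self.records if rec[1] >= cutoff]


log = ToyPartitionLog(retention_seconds=60)
log.append("click-1", now=0)
log.append("click-2", now=30)

analytics_batch = log.read_from(0)  # group "analytics" reads both records
billing_batch = log.read_from(0)    # group "billing" still sees the same records
log.enforce_retention(now=75)       # click-1 (t=0) is now older than 60 s
remaining = log.read_from(0)
print(analytics_batch, billing_batch, remaining)
```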

RabbitMQ

In RabbitMQ we have the “Exchange”: producers send messages to an exchange, which routes them according to its bindings, and the messages are then stored in the matching queues until consumers receive them.

By default, a message is removed from its queue once a subscriber has consumed and acknowledged it. RabbitMQ focuses more on short-term message routing than on long-term storage.
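For contrast, a RabbitMQ direct exchange can be sketched in the same toy style. `ToyDirectExchange` is a hypothetical in-memory model, not the AMQP API: a message is copied into every queue bound with a matching routing key, and consuming it removes it from that queue. The queue names and the `order.created` routing key are made up for the example.

```python
from collections import defaultdict, deque
from typing import Optional


class ToyDirectExchange:
    """Hypothetical direct exchange: routes each message to every queue
    bound with exactly the message's routing key."""

    def __init__(self):
        self.bindings = defaultdict(list)  # routing key -> queue names
        self.queues = {}                   # queue name -> deque of messages

    def bind(self, queue_name: str, routing_key: str) -> None:
        self.queues.setdefault(queue_name, deque())
        self.bindings[routing_key].append(queue_name)

    def publish(self, routing_key: str, message: str) -> None:
        # Messages with no matching binding are simply dropped here.
        for queue_name in self.bindings.get(routing_key, []):
            self.queues[queue_name].append(message)

    def consume(self, queue_name: str) -> Optional[str]:
        # Taking a message off the queue removes it (as if acknowledged).
        q = self.queues[queue_name]
        return q.popleft() if q else None


exchange = ToyDirectExchange()
exchange.bind("payment_queue", "order.created")
exchange.bind("inventory_queue", "order.created")
exchange.publish("order.created", "order-1001")
print(exchange.consume("payment_queue"))    # order-1001
print(exchange.consume("payment_queue"))    # None: it was removed on consumption
print(exchange.consume("inventory_queue"))  # order-1001: each bound queue got a copy
```

Compared with the Kafka log sketch, the key difference is where the "read position" lives: Kafka keeps the data and lets each consumer track an offset, while the queue itself forgets a message once it has been consumed.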

I think that is enough for now to explain Apache Kafka and how it differs from RabbitMQ. At a glance, Apache Kafka may look superior to RabbitMQ, but both are powerful tools; the right choice depends on the use case and scenario.

Thanks for reading!
