A comparison between RabbitMQ and Apache Kafka

Published in The MavenHive Blog, Jun 4, 2018 (8 min read)


RabbitMQ, Kafka and many other messaging systems are implementations of the publish-subscribe pattern.

Publish/subscribe is a distributed interaction paradigm well suited to deploying scalable and loosely coupled systems. In this post we will look at the core functionality of a pub/sub system, introduce Kafka and RabbitMQ, compare them on several parameters, and finally go through the best use cases for each system.

Core Functionalities Of Pub/Sub Systems.

I. Decoupling publishers and subscribers is the most fundamental functionality of pub/sub systems and can be decomposed into three different dimensions.

Entity Decoupling: Publishers and consumers do not need to be aware of each other, i.e. the pub/sub infrastructure terminates the interaction in the middle.
Time Decoupling: The interacting parties do not need to be actively participating in the interaction, or even switched on, at the same time.
Synchronization Decoupling: The interaction between a producer or consumer and the pub/sub infrastructure does not need to block the producer's or consumer's execution threads, allowing maximum usage of processor resources at producers and consumers alike.

II. Routing Logic (or Subscription Model) decides if and where a message coming from a producer will end up at a consumer. The two types of routing logic are:

Topic Based: The publisher tags each message with a set of topics, which can be used very efficiently in the filtering operations that decide which message goes to which consumer.
Content Based: All data and meta fields of the message can be used in filtering conditions.

With content-based routing, consumers subscribe to selected events by specifying filters in a subscription language. Evaluating these complex filters comes at a high processing cost.
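To make the difference concrete, here is a minimal sketch in plain Python. It is not tied to any broker: the `Message` class, the `topic_match` and `content_match` helpers and the sample data are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Message:
    topics: frozenset                          # tags attached by the publisher
    body: dict = field(default_factory=dict)   # payload and meta fields

def topic_match(subscribed, msg):
    """Topic-based: a cheap set intersection on the publisher's tags."""
    return not subscribed.isdisjoint(msg.topics)

def content_match(predicate, msg):
    """Content-based: an arbitrary predicate evaluated against the body."""
    return predicate(msg.body)

messages = [
    Message(frozenset({"orders"}), {"amount": 250, "country": "IN"}),
    Message(frozenset({"orders"}), {"amount": 20, "country": "DE"}),
    Message(frozenset({"payments"}), {"amount": 900, "country": "IN"}),
]

# Topic-based subscription: everything tagged "orders".
orders = [m for m in messages if topic_match({"orders"}, m)]

# Content-based subscription: large amounts from India, whatever the topic.
def is_large_indian(body):
    return body["amount"] > 100 and body["country"] == "IN"

large_indian = [m for m in messages if content_match(is_large_indian, m)]
```

The topic filter only ever looks at the tags, while the content filter has to inspect every message body, which is where the higher processing cost comes from.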

Quality Of Service Guarantees For A Pub/Sub System.

Correctness -> Correct behavior can be defined by three primitives: no-loss, no-duplication and no-disorder.
Availability -> The capacity of a system to maximize its uptime.
Scalability -> The ability of a system to keep evolving in order to support a growing number of tasks. The dimensions along which it can scale are consumers/producers, topics and messages.
Transactions -> In messaging systems, transactions are used to group messages into atomic units: either a complete sequence of messages is sent (received), or none of them is.
Efficiency -> Two common measures of efficiency are the latency (or response time) and the throughput (or bandwidth).
* Latency. In any transport architecture, the latency of a packet/message is determined by the serial pipeline (i.e., the sequence of processing steps) that it passes through.
* Throughput. The throughput of a transport architecture is the number of packets (or alternatively, bytes) per time unit that can be transported between producers and consumers.

Important: Efficiency and Scalability often conflict with other desirable guarantees.

For example, highly expressive and selective subscriptions require complex and expensive filtering and routing algorithms, and thus limit scalability.

Apache Kafka

Kafka was originally built at LinkedIn as its centralized event pipelining platform, replacing a disparate set of point-to-point integration systems.
Kafka is designed to handle high throughput (billions of messages).

In its design, particular attention has been paid to the efficient handling of multiple consumers of the same stream that read at different speeds (e.g., streaming vs batch).

The resulting system is a scalable publish-subscribe messaging system designed around a distributed commit log.

RabbitMQ

RabbitMQ is primarily known and used as an efficient and scalable implementation of the Advanced Message Queuing Protocol (AMQP).

What is now known as AMQP originated in 2003 at JPMorgan Chase. From the beginning AMQP was conceived as a co-operative open effort. JPMorgan Chase partnered with Red Hat to create Apache Qpid. Independently, RabbitMQ was developed in Erlang by Rabbit Technologies.

The design of AMQP has been driven by stringent performance, scalability and reliability requirements from the finance community.

Comparison between both the systems on different parameters

Time Decoupling: Both systems can be used to buffer a large batch of messages that needs to be consumed at a later time or at a much lower rate than it is produced.
RabbitMQ will store the messages in DRAM as long as possible, but once the available DRAM is completely consumed, RabbitMQ will start storing messages on disk without having a copy available in DRAM, which will severely impact performance.
Kafka, on the other hand, was specifically designed with varying consumption rates in mind and hence is much better positioned to handle a wider range of time decoupling.
Routing Logic:
RabbitMQ inherits the routing logic of AMQP and hence can be very sophisticated. Another relevant and useful feature in RabbitMQ is Alternate Exchange which allows clients to handle messages that an exchange was unable to route (i.e. either because there were no bound queues or no matching bindings).
With Kafka, the choice is more limited: it supports a basic form of topic-based routing. More specifically, the producer controls which partition it publishes messages to.
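The contrast can be sketched in a few lines of plain Python. This is an illustration under assumptions, not either system's implementation: `matches`, `route` and `choose_partition` are hypothetical helpers, the alternate exchange is reduced to a single fallback queue name, and CRC32 stands in for whatever hash a real Kafka partitioner uses.

```python
import zlib

def matches(pattern, key):
    """AMQP-style topic match on word lists: '*' is one word, '#' is zero or more."""
    if not pattern:
        return not key
    head, rest = pattern[0], pattern[1:]
    if head == "#":
        return any(matches(rest, key[i:]) for i in range(len(key) + 1))
    if not key:
        return False
    return head in ("*", key[0]) and matches(rest, key[1:])

def route(bindings, routing_key, alternate_queue):
    """Broker-side routing: return every bound queue whose pattern matches.
    If nothing matches, fall back to the alternate queue (assumed stand-in
    for an alternate exchange)."""
    words = routing_key.split(".")
    targets = [queue for pattern, queue in bindings
               if matches(pattern.split("."), words)]
    return targets or [alternate_queue]

def choose_partition(key, num_partitions):
    """Producer-side routing: a stable hash of the key picks the partition."""
    return zlib.crc32(key.encode()) % num_partitions

bindings = [("orders.*.created", "new_orders"), ("orders.#", "audit")]
routed = route(bindings, "orders.eu.created", "unrouted")
fallback = route(bindings, "payments.eu.failed", "unrouted")

partitions = [[] for _ in range(3)]
events = [("alice", "login"), ("bob", "login"), ("alice", "click"), ("alice", "logout")]
for user, event in events:
    partitions[choose_partition(user, 3)].append((user, event))
```

In the first half the broker decides where a message goes based on its bindings; in the second half the producer decides, and the only guarantee is that messages with the same key end up in the same partition.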
Delivery Guarantees: RabbitMQ and Kafka differ in their notion of at least once semantics. Since individual packets from a batch can fail, recovery from failures can have an impact on the order of packets. Depending on the application, order might be important, so it makes sense to split this up into:
* at least once without order conservation: Kafka cannot preserve order when sending to multiple partitions.
* at least once with order conservation: RabbitMQ sorts messages when writing them to queue structures, meaning that lost messages can be correctly delivered in order without the need to resend the full batch that lost one or more messages.
Note: The only way to guarantee that a message is not lost is by using transactions which are unnecessarily heavyweight and decrease throughput.
ORDERING GUARANTEES:
RabbitMQ will conserve order for flows using a single AMQP channel. It also reorders retransmitted packets inside its queue logic so that a consumer does not need to resequence buffers. This implies that if a load balancer is used in front of RabbitMQ (e.g. to reach the scalability that can be accomplished inside Kafka with partitions), packets that leave the load balancer on different channels will no longer have any ordering relation.
Kafka will conserve order only inside a partition. Furthermore, within a partition, Kafka guarantees that a batch of messages either all pass or all fail together. However, to conserve inter-batch order, the producer needs to guarantee that at most one produce request is outstanding, which will impact maximum performance.
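The effect of outstanding produce requests on inter-batch order can be seen in a small, deterministic simulation. It is a toy model, not Kafka client code: `send_batches` and its parameters are hypothetical, and a batch simply fails once and is retried in the next round. In the real producer the corresponding knob is, to my understanding, `max.in.flight.requests.per.connection`.

```python
def send_batches(batches, max_in_flight, fails_once):
    """Simulate a producer writing batches to one partition log.

    Up to `max_in_flight` batches are outstanding at the same time.
    A batch listed in `fails_once` fails on its first attempt and is
    retried at the front of the next round.
    """
    log = []
    pending = list(batches)
    already_failed = set()
    while pending:
        window = pending[:max_in_flight]
        retry = []
        for batch in window:
            if batch in fails_once and batch not in already_failed:
                already_failed.add(batch)
                retry.append(batch)      # first attempt lost, retry later
            else:
                log.append(batch)        # batch is appended as a whole
        pending = retry + pending[len(window):]
    return log

reordered = send_batches([1, 2, 3], max_in_flight=2, fails_once={1})
in_order = send_batches([1, 2, 3], max_in_flight=1, fails_once={1})
```

With two requests in flight, batch 2 is written while batch 1 is still waiting for its retry, so the log ends up as 2, 1, 3. With a single outstanding request the log stays in order, at the cost of the producer waiting for each batch before sending the next.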
AVAILABILITY:
RabbitMQ Clusters can be configured to replicate all the exchange and binding information. However, it will not automatically create mirrored queues (RabbitMQ’s terminology for replicated queues) and will require explicit setting during queue creation.
For Kafka, availability requires running the system with a suitably high replication factor.
TRANSACTIONS:
AMQP guarantees atomicity only when transactions involve a single queue. RabbitMQ provides no atomicity guarantees even in the case of transactions involving just a single queue. Take, for example, a producer publishing a batch. If any of the messages fails, the producer gets the chance to republish these messages, and RabbitMQ will insert them in the queue in order. After that, the publisher is notified that the failing messages did make it and can consider the transaction complete.
Kafka recently added support for transactions, aimed at applications which exhibit a “read-process-write” pattern where the reads and writes are from and to asynchronous data streams such as Kafka topics.
MULTICAST:
RabbitMQ supports multicast by providing a dedicated queue per individual consumer. As a result, the only impact on the system is that there is an increased number of bindings to support these individual queues.
In Kafka, only one copy of messages within a topic is maintained in the brokers (in non-replicated settings); however, the multicast logic is handled completely at the consumer side.
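The two multicast models can be sketched side by side. This is a simplified in-memory model and not either broker's API: `FanoutExchange` and `Log` are hypothetical classes, and a Kafka consumer group is reduced to a single read position per group name.

```python
from collections import deque

class FanoutExchange:
    """RabbitMQ-style multicast: one queue per consumer, one copy per queue."""
    def __init__(self):
        self.queues = {}

    def bind(self, consumer):
        self.queues[consumer] = deque()

    def publish(self, message):
        for queue in self.queues.values():
            queue.append(message)

    def consume(self, consumer):
        queue = self.queues[consumer]
        return queue.popleft() if queue else None

class Log:
    """Kafka-style multicast: one stored copy, each group tracks its own offset."""
    def __init__(self):
        self.records = []
        self.offsets = {}

    def append(self, message):
        self.records.append(message)

    def poll(self, group):
        position = self.offsets.get(group, 0)
        if position >= len(self.records):
            return None
        self.offsets[group] = position + 1
        return self.records[position]

exchange = FanoutExchange()
exchange.bind("billing")
exchange.bind("search")
log = Log()
for message in ["a", "b"]:
    exchange.publish(message)
    log.append(message)

copies_in_queues = sum(len(q) for q in exchange.queues.values())
copies_in_log = len(log.records)
```

After publishing two messages to two consumers, the exchange holds four queued copies while the log holds two records; in the log model, reading does not remove anything, and each group just advances its own position.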
DYNAMIC SCALING:
For RabbitMQ, adding nodes to a running cluster or removing a node from a cluster is well supported. Adding nodes to a RabbitMQ cluster is transparent for consumers.
Adding nodes to a Kafka cluster is not transparent for consumers, since there needs to be a mapping from partitions to consumers in a consumer group.
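To see why a partition-to-consumer mapping makes scaling visible to consumers, consider a simple round-robin assignment. This is only a sketch under assumptions: `assign` is a hypothetical helper and is not meant to reproduce any of Kafka's actual assignment strategies.

```python
def assign(partitions, consumers):
    """Spread partitions over the members of one consumer group, round-robin."""
    members = sorted(consumers)
    assignment = {member: [] for member in members}
    for index, partition in enumerate(partitions):
        assignment[members[index % len(members)]].append(partition)
    return assignment

partitions = list(range(6))
before = assign(partitions, ["c1", "c2"])
after = assign(partitions, ["c1", "c2", "c3"])

# Partitions whose owner changed when c3 joined the group.
owner_before = {p: c for c, ps in before.items() for p in ps}
owner_after = {p: c for c, ps in after.items() for p in ps}
moved = sorted(p for p in partitions if owner_before[p] != owner_after[p])
```

In this toy model a single new group member changes the owner of four out of six partitions, so consumers have to take part in the change instead of being shielded from it.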
LATENCY RESULTS
In the case of RabbitMQ, up to a medium level of load, the latency for both at most once and at least once modes is below 10 ms.
In the case of Kafka, on the other hand, if it can read from the OS cache, its latency for at most once mode is below 10 ms, and about twice as large for the at least once mode. However, when it needs to read from disk, its latency can grow by up to an order of magnitude, to around 100 ms.
THROUGHPUT
RabbitMQ is mainly constrained by routing complexity (up to frame sizes of a few thousand bytes, at which point packet copying becomes non-negligible).
Finally, the error level in Kafka's results is not as low as in RabbitMQ's. Two potential causes for these variations are: (i) Kafka relies on OS-level caching of disk access, which is a complex hidden subsystem that cannot be accurately modeled or even controlled and is shared across everything that runs on the machine; (ii) Kafka runs on the JVM, which has much more variability than an Erlang VM due to unsophisticated locking mechanisms and the garbage collection process.

Best Use Cases For Kafka

Pub/Sub Messaging. Kafka can be a good match for the pub/sub use cases that exhibit the following properties: (i) if the routing logic is simple, so that a Kafka “topic” concept can handle the requirements, (ii) if throughput per topic is beyond what RabbitMQ can handle (e.g. event firehose).

Scalable Ingestion System. Many of the leading Big Data processing platforms enable high throughput processing of data once it has been loaded into the system. However, in many cases, loading the data into such platforms is the main bottleneck. Kafka offers a scalable solution for such scenarios and has already been integrated into many such platforms, including Apache Spark and Apache Flink.

Data-Layer Infrastructure. Due to its durability and efficient client multicast, Kafka can serve as an underlying data infrastructure that connects various batch and streaming services and applications within an enterprise.

Capturing Change Feeds. Change feeds are sequences of update events that capture all the changes applied to an initial state (e.g. a table in a database, or a particular row within that table). Traditionally, change feeds have been used internally by DBMSs to synchronize replicas. More recently, however, some modern data stores have exposed their change feeds externally, so they can be used to synchronize state in distributed environments. Kafka’s log-centric design makes it an excellent backend for an application built in this style.
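The idea of a change feed can be illustrated by replaying a list of update events into a dictionary. This is a minimal sketch with made-up data: the event format `(operation, key, value)` and the helpers `apply_event` and `replay` are assumptions for illustration, not a format used by Kafka or any particular database.

```python
def apply_event(state, event):
    """Apply one change event to the state in place."""
    operation, key, value = event
    if operation == "delete":
        state.pop(key, None)
    else:                       # "insert" and "update" both set the value
        state[key] = value

def replay(feed, upto=None):
    """Rebuild state from an empty start by replaying the first `upto` events."""
    state = {}
    for event in feed[:upto]:
        apply_event(state, event)
    return state

feed = [
    ("insert", "user:1", {"name": "Asha"}),
    ("update", "user:1", {"name": "Asha K"}),
    ("insert", "user:2", {"name": "Ben"}),
    ("delete", "user:2", None),
]

current = replay(feed)             # state after the whole feed
earlier = replay(feed, upto=3)     # state as of the third event
```

Because the feed is an ordered, append-only sequence, any replica that replays it from the start arrives at the same state, and replaying a prefix gives the state at an earlier point.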

Stream Processing. Starting in Kafka version 0.10.0.0, a light-weight stream processing library called Kafka Streams is available in Apache Kafka to perform stateful and fault-tolerant data processing. Furthermore, Apache Samza, an open-source stream processing platform, is based on Kafka.

Best Use Cases For RabbitMQ

Pub/Sub Messaging. Since this is exactly why RabbitMQ was created, it will satisfy most of the requirements. This is even more so in an edge/core message routing scenario where brokers are running in a particular interconnect topology.

Request-Response Messaging. RabbitMQ offers a lot of support for RPC-style communication by means of the correlation ID and the direct reply-to feature, which allows RPC clients to receive replies directly from their RPC server, without going through a dedicated reply queue that needs to be set up.
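The role of the correlation ID can be shown with a small in-process model. This is only a sketch under assumptions: the broker is replaced by two standard-library queues, the "server" squares a number, and `client_send`, `server_step` and `client_collect` are hypothetical names rather than calls from a RabbitMQ client library.

```python
import uuid
from queue import Queue

request_queue = Queue()   # stands in for the queue the RPC server consumes from
reply_queue = Queue()     # stands in for the client's reply channel

def client_send(payload):
    """Publish a request tagged with a fresh correlation ID and a reply address."""
    correlation_id = str(uuid.uuid4())
    request_queue.put({
        "correlation_id": correlation_id,
        "reply_to": reply_queue,
        "body": payload,
    })
    return correlation_id

def server_step():
    """Handle one request and send the result to the address it named."""
    request = request_queue.get()
    request["reply_to"].put({
        "correlation_id": request["correlation_id"],
        "body": request["body"] ** 2,
    })

def client_collect(expected_ids):
    """Match incoming replies to outstanding requests by correlation ID."""
    results = {}
    while len(results) < len(expected_ids):
        reply = reply_queue.get()
        if reply["correlation_id"] in expected_ids:
            results[reply["correlation_id"]] = reply["body"]
    return results

first_id = client_send(3)
second_id = client_send(4)
server_step()
server_step()
results = client_collect({first_id, second_id})
```

The client can have several requests outstanding on the same reply channel, because each reply carries the ID of the request it answers.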

Operational Metrics Tracking. RabbitMQ would be a good choice for realtime processing, based on the complex filtering the broker could provide.

Underlying Layer for IoT Applications Platform. RabbitMQ can be used to connect individual operator nodes in a dataflow graph, regardless of where the operators are instantiated.

References:

Kafka versus RabbitMQ: A comparative study.