Proven: Starlight for RabbitMQ and Apache Pulsar Offers 10x Better Throughput

RabbitMQ has long been popular as an open-source message queuing platform, but it struggles to keep up with the scale and performance requirements of today’s technology solutions. Now, with our recently launched Starlight-for-RabbitMQ, a thin proxy that adapts Apache Pulsar to the RabbitMQ/AMQP 0.9.1 protocol, enterprises can take advantage of Pulsar’s incredible performance and horizontal scalability without having to rewrite their RabbitMQ applications.

This adapter opens a new world that pairs Apache Pulsar’s cloud-native performance with existing RabbitMQ applications. Using Pulsar as a storage backend with an AMQP implementation has its advantages, such as the ability to replay previously consumed messages.
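
For instance, replaying data that has already been consumed only requires rewinding a subscription on the underlying Pulsar topic. The following minimal sketch uses the Pulsar Java client directly; the service URL, topic, and subscription names are placeholders for illustration, not anything prescribed by Starlight-for-RabbitMQ.

```java
import org.apache.pulsar.client.api.*;

public class ReplayExample {
    public static void main(String[] args) throws Exception {
        // Assumptions: a Pulsar broker reachable at this URL; the topic and
        // subscription names below are placeholders for illustration only.
        try (PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();
             Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/orders")
                .subscriptionName("replay-demo")
                .subscribe()) {

            // Rewind the subscription to the start of the retained log:
            // messages that were already consumed and acknowledged become
            // readable again.
            consumer.seek(MessageId.earliest);

            Message<byte[]> msg = consumer.receive();
            System.out.println("Replayed: " + new String(msg.getData()));
            consumer.acknowledge(msg);
        }
    }
}
```

With a classic AMQP broker, an acknowledged message is gone; with a log-backed topic, it remains available until the retention policy expires it.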

Compared with RabbitMQ, Pulsar’s log-based approach provides over 10 times the throughput. And whether or not a consumer is currently attached to the other end of your topic, at-least-once delivery keeps working reliably, making Pulsar the go-to solution.

But don’t take my word for it. Keep reading to see the benchmark proof of Starlight-for-RabbitMQ.

A brief history of RabbitMQ delivery guarantees

RabbitMQ was primarily designed for fire-and-forget applications where messages lived in memory and losing messages was acceptable. For many applications today, this is no longer acceptable and at-least-once delivery is required.

To support these use-cases, a number of features were added to RabbitMQ:

  • Replication: The data needs to be replicated on multiple servers so that if one of the servers goes down, even permanently, the data can still be accessed on another server. For this, RabbitMQ introduced clustering with mirrored queues and, more recently, quorum queues.
  • Persistent messages: The AMQP protocol has a per-message “persistent” flag so messages will not be lost in case of a server restart. In RabbitMQ, this concept is orthogonal to whether the message is actually written to disk.
    Under memory pressure, RabbitMQ will write even transient messages to disk. In normal memory conditions, persistent messages are kept both on disk and in memory. If a persistent message can be removed from the queue fast enough, because a tailing consumer fetches and acknowledges it, the message never goes to disk; it is simply removed from the queue right away.
  • Producer confirms: To ensure that messages are not lost between the producer and the broker, producer confirms must be activated on the channel. When this is activated, for a persistent message routed to a durable queue, an acknowledgement message will be sent by the broker after persisting the message to disk (or receiving an acknowledgement from a tailing consumer to which it was sent).
    When the queue is mirrored, the acknowledgement is sent only once all relevant nodes have persisted the message. For performance reasons, messages are batched and fsync’d to disk every 200ms.
  • Consumer acknowledgements: The client application needs to send explicit acknowledgements (AMQP parameter no-ack set to false) to ensure that messages are not lost between the broker and a consumer.
    When RabbitMQ receives the acknowledgement, it removes the message from the queue (and from the disk if the message was flushed), or re-queues the message if the acknowledgement is negative. (A minimal code sketch of these settings follows this list.)
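
To make these guarantees concrete, here is a minimal sketch of what enabling them looks like with the RabbitMQ Java client. The broker host, queue name, and payload are placeholders, and a real application would also handle negative acknowledgements and returns; treat this as an illustration of the settings described above, not the exact code used in the benchmark.

```java
import com.rabbitmq.client.*;
import java.util.Map;

public class ReliablePublishConsume {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // placeholder broker address

        try (Connection connection = factory.newConnection()) {
            Channel channel = connection.createChannel();

            // Durable, replicated queue. "x-queue-type: quorum" requests a
            // quorum queue (a classic mirrored queue would instead be
            // configured via a policy). The queue name is a placeholder.
            Map<String, Object> quorumArgs = Map.of("x-queue-type", "quorum");
            channel.queueDeclare("orders", true, false, false, quorumArgs);

            // Producer confirms: the broker acks only after the persistent
            // message has been written to disk (or handed to a tailing
            // consumer that acknowledged it).
            channel.confirmSelect();
            channel.basicPublish("", "orders",
                    MessageProperties.PERSISTENT_TEXT_PLAIN,
                    "order-123".getBytes());
            channel.waitForConfirmsOrDie(5_000);

            // Consumer acknowledgements: autoAck=false (no-ack unset), so the
            // broker removes the message only once basicAck is received.
            channel.basicConsume("orders", false,
                    (consumerTag, delivery) ->
                            channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false),
                    consumerTag -> { });
        }
    }
}
```

Each of these settings trades latency and disk I/O for safety, which is exactly what the benchmark below stresses.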

OpenMessaging Benchmark

To perform the benchmark, we chose the OpenMessaging Benchmark suite. It has native support for RabbitMQ, which keeps things simple. The framework comes with Terraform and Ansible scripts, providing a simple way to quickly deploy a reference setup on a cloud provider (we used AWS).

OpenMessaging Benchmark enabled us to spawn multiple client workers (which served as producers and consumers to the broker) and ensured the client side wasn’t the bottleneck. We focused on the following configuration parameters:

  • Producer rate: This can be set to either a very high rate to detect the maximum throughput, or a lower rate to measure the latency at various throughputs.
  • Number of topics: In the RabbitMQ world, the “topic” concept is mapped to a fanout exchange. A sufficient number of topics was set to ensure all nodes and CPUs of the cluster were used.
  • Number of producers per topic: We set a sufficient number of producers (which are spread over the client workers) to ensure the producing clients weren’t a bottleneck.
  • Number of subscriptions per topic: Similarly, in the RabbitMQ world the “subscription” concept maps to a queue. We used 1 subscription per topic in our tests.
  • Number of consumers per subscription: We set a sufficient number of consumers (which are spread over the client workers) to ensure the consuming clients were not a bottleneck.
  • Size of the messages: We used various message sizes that represent realistic application messages.
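
For reference, an OpenMessaging Benchmark workload is described by a small YAML file whose fields map directly to the parameters above. The sketch below is illustrative only; the field names follow the OpenMessaging Benchmark workload format, but the values are examples rather than the exact workloads we ran.

```yaml
# Illustrative OpenMessaging Benchmark workload (example values only)
name: fanout-backlog-1kb

topics: 24                  # mapped to fanout exchanges by the RabbitMQ driver
partitionsPerTopic: 1
subscriptionsPerTopic: 1    # mapped to one queue bound to each exchange
producersPerTopic: 1
consumerPerSubscription: 1
messageSize: 1024           # 1 kB messages
payloadFile: "payload/payload-1Kb.data"
producerRate: 30000         # target publish rate in messages per second
consumerBacklogSizeGB: 10   # consumers are paused until this backlog builds up
testDurationMinutes: 15
```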

With a configured, ready-to-go OpenMessaging Benchmark instance, we can shift focus to the driver for each broker.

Setting up the RabbitMQ driver for tests

To benchmark RabbitMQ, we contributed several enhancements to the OpenMessaging Benchmark RabbitMQ driver (see the Resources section below). These additions kept the benchmark fair and the testing realistic.

Creating an integration for the Starlight-for-RabbitMQ driver

We developed an OpenMessaging Benchmark integration for Starlight-for-RabbitMQ. It is based on the RabbitMQ integration and adds code to set up the Pulsar cluster during the initialization phase.

The test scenarios

Benchmarking RabbitMQ presented a few challenges. For example, RabbitMQ uses multiple factors (transient/persistent messages, memory pressure, whether consumers are tailing, to name a few) to decide whether messages stay in memory or go to disk.

We decided on the typical scenario where RabbitMQ is used as a decoupling buffer between producers and consumers. In this scenario, a common requirement is to maintain the producer rate even when the consumer(s) fail.

We used the backlog-building feature of OpenMessaging Benchmark, which stops the consumers until a given backlog size is reached and then resumes them to drain the backlog. We configured replication on both Starlight-for-RabbitMQ and RabbitMQ, and used persistent messages and publisher confirms. Since there are no tailing consumers while the backlog builds, we are sure that messages go to disk; this can be verified by looking at the disk usage.

Note: all the results below are for messages of 1kB size.

Hardware setup

We installed RabbitMQ and Starlight-for-RabbitMQ on two distinct clusters of three AWS i3en.6xlarge instances, each with two SSDs.

For the Starlight-for-RabbitMQ cluster, we installed a Pulsar broker and a BookKeeper instance on each node, with Starlight-for-RabbitMQ deployed as a protocol handler inside the brokers. We used one SSD for the BookKeeper journal and the other for the ledgers. ZooKeeper was deployed on a separate t2.small instance to hold Pulsar’s metadata.

The RabbitMQ nodes were configured to use one of the SSDs for the message store.

Test results for RabbitMQ only

RabbitMQ uses one process, and thus at most one CPU, per queue. Since our cluster has three machines with eight CPUs each, we configured the workload with 24 topics and 1 subscription per topic. This creates 24 queues that use all the CPU resources available on the cluster.

(1 queue/CPU) * (24 CPUs) = 24 queues

We tested several producer rates and looked at the latency. When the consumers match the rate of the producers (and the queue size stays low), RabbitMQ works in-memory only and the publishing latency is very good (sub-millisecond average for 30k messages per second).

When trying to find the max throughput (but still keeping the messages in-memory), we found it could reach about 50k messages per second before a significant increase in latency.

As expected, when the backlog builds, the throughput drops, due to RabbitMQ redirecting the messages to disk. We found it could only maintain a throughput of 30k messages per second.

Here are the results for the full test:

Figure 1. RabbitMQ publish and consumption rates.
Figure 2. RabbitMQ publish latencies.

As you can see, when the backlog rises, the producer rate of 30k messages per second can’t be maintained and the latency spikes to over 10 seconds.

When the consumers resume and the backlog drains, we see that consumer throughput goes up to 140k messages per second and the producer rate drops severely. It seems RabbitMQ favors the consumers over the producers.

With lower producer rates (e.g., 10k messages per second), it’s easier to maintain the rate while the backlog builds. However, we still see regular latency spikes of several seconds, and we still get the producer rate drop when the consumers catch up.

Starlight-for-RabbitMQ test results: 10x throughput

Since Starlight-for-RabbitMQ is built on Pulsar, a high-performance streaming platform, we achieve much higher throughput than with RabbitMQ alone. For the backlog-building test, we used a conservative producer rate of 350k messages per second.

Here are the results:

Figure 3. Starlight-for-RabbitMQ publish and consumption rates.
Figure 4. Starlight-for-RabbitMQ publish latencies.

Notice that the producer rate stays stable during the whole test, with very reasonable publishing latency (< 10 ms on average, < 100 ms at P99).

Thanks to the power of Apache Pulsar and its log-based architecture, we can get more than 10 times the throughput using Starlight-for-RabbitMQ versus using RabbitMQ alone. In addition, the publisher rate and latencies are much more predictable when things deteriorate on the consumer side.

Conclusion

We hope we’ve presented a fair comparison between RabbitMQ and Starlight-for-RabbitMQ. The performance we measured was on par with case studies RabbitMQ has published, so we believe we did it justice.

On top of that, we benefit from all the other features Pulsar brings to the table (message retention, the ability to replay data, geo-replication, and lightweight computing).

Apache Pulsar is one of the most popular cloud-native messaging and streaming platforms with phenomenal usage by enterprises like Tencent, Comcast, and Verizon Media. If you are not too familiar with it, read more about Apache Pulsar here and discover its powerful features.

We also encourage you to run your own benchmarks with our suggested testing environment. You can find the code and configurations used in this fork of the OpenMessaging Benchmark.

Unlike most traditional message brokers, Pulsar at its heart is a streaming platform designed for horizontal scalability. We are just scratching the surface of what can be achieved with Starlight-for-RabbitMQ.

Follow the DataStax Tech Blog for more developer stories. Check out our YouTube channel for tutorials and DataStax Developers on Twitter for the latest news about our developer community.

Resources

  1. Starlight-for-RabbitMQ
  2. Starlight for JMS
  3. Apache Pulsar
  4. RabbitMQ
  5. RabbitMQ/AMQP 0.9.1 Explained
  6. The OpenMessaging Benchmark Framework
  7. GitHub: Update RabbitMQ TF for latest terraform
  8. GitHub: OpenMessaging integration for Starlight-for-RabbitMQ
  9. GitHub: DataStax OpenMessaging Benchmark
  10. Cluster Sizing Case Study Part 1

