BigData & Web3. RabbitMQ High Availability and High Load. Part 1

Dmytro Nasyrov
Pharos Production
5 min read, Jul 31, 2024

This time we will discuss how RabbitMQ behaves under high load and how to make it highly available. We will look at ways to increase performance and scale horizontally, and we will analyze and configure RabbitMQ's built-in tools. As we dig deeper, we will also examine the main pitfalls of each approach.

Shovel

Shovel is a message forwarder that ships with RabbitMQ as a plugin (a consumer and a publisher in one). It takes messages from a queue or exchange on one RabbitMQ and forwards them to an exchange or queue on another RabbitMQ. The destination does not have to be a different broker; you can also forward messages within the same one. Shovel works entirely over AMQP and can run either on the receiving side or on the sending side. It is usually better to configure it on the side where the number of RabbitMQ instances can change. For example, when a new RabbitMQ instance comes up, it starts its shovel and joins the common flow; when you delete that RabbitMQ, its shovel goes away with it.

You can also run a shovel on a third RabbitMQ instance: for example, one that hosts no queues and processes no messages at all, only shovels. RabbitMQ is lightweight enough for this, and it keeps message forwarding separate from message processing. A single RabbitMQ instance can run several shovels.

A dynamic shovel is configured at runtime via the CLI, the management UI, or the REST API. A static shovel is configured in the configuration files. Each approach has its pros and cons depending on the use case. Dynamic shovels can be exported along with the other definitions (for example, with rabbitmqctl export_definitions). The syntax of dynamic and static shovels differs radically, although their feature sets are similar (a static shovel can additionally declare resources and have multiple sources).
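To make the dynamic option concrete, here is a minimal Python sketch that builds, but does not send, the REST API request declaring a dynamic shovel, using only the standard library. It assumes the rabbitmq_shovel and rabbitmq_management plugins are enabled and the management API listens on its default port 15672; the queue, exchange, shovel, and host names are hypothetical, and authentication (the API requires credentials) is left out.

```python
import json
import urllib.parse
import urllib.request


def dynamic_shovel_request(api_base: str, vhost: str, name: str,
                           src_uri: str, src_queue: str,
                           dest_uri: str, dest_exchange: str) -> urllib.request.Request:
    """Build (but do not send) a PUT request that declares a dynamic shovel."""
    value = {
        "src-protocol": "amqp091",
        "src-uri": src_uri,
        "src-queue": src_queue,
        "dest-protocol": "amqp091",
        "dest-uri": dest_uri,
        "dest-exchange": dest_exchange,
        # Ack on the source only after the destination confirms the publish.
        "ack-mode": "on-confirm",
    }
    # The vhost and shovel name are path segments, so "/" must become %2F.
    path = "/api/parameters/shovel/{}/{}".format(
        urllib.parse.quote(vhost, safe=""), urllib.parse.quote(name, safe=""))
    return urllib.request.Request(
        api_base.rstrip("/") + path,
        data=json.dumps({"value": value}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical names: forward the local "orders" queue to a central exchange.
req = dynamic_shovel_request("http://localhost:15672", "/", "orders-to-central",
                             "amqp://", "orders",
                             "amqp://central.example.com", "orders")
print(req.get_method(), req.full_url)
```

Sending this request with urllib.request.urlopen, after adding basic-auth credentials, would create or update the shovel; the same JSON body can also be sent with curl.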

If you leave the “Add forwarding headers” option disabled, shovel is quite efficient. To speed it up further, I recommend running several instances of the same shovel, even within a single RabbitMQ: three or four instances increase throughput noticeably, and beyond that the gain is insignificant because RabbitMQ itself becomes the limit. Each shovel appears to work in a single thread. RabbitMQ also logs very little about shovels, so it is often hard to understand why one is not working, although once configured, a shovel needs no further maintenance.
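Running several instances of the same shovel amounts to declaring several dynamic shovels with identical source and destination under different names, so they act as competing consumers on the source queue. A minimal Python sketch, assuming the dynamic-shovel parameter format; the dest-add-forward-headers key matches recent RabbitMQ releases (older ones used add-forward-headers), so check it against your version. All names are hypothetical.

```python
def parallel_shovels(base_name: str, template: dict, copies: int = 4) -> dict:
    """Return {shovel_name: parameter_value} for `copies` identical shovels."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    value = dict(template)
    # Forwarding headers add per-message overhead, so keep them off.
    value["dest-add-forward-headers"] = False
    # Same source and destination, different names: the copies compete for
    # messages on the source queue, which is what spreads the work.
    return {f"{base_name}-{i}": dict(value) for i in range(1, copies + 1)}


template = {
    "src-protocol": "amqp091", "src-uri": "amqp://", "src-queue": "orders",
    "dest-protocol": "amqp091", "dest-uri": "amqp://central.example.com",
    "dest-exchange": "orders",
}
shovels = parallel_shovels("orders-to-central", template, copies=4)
print(sorted(shovels))
```

Because the copies consume the same queue in parallel, message order across them is not preserved; if ordering matters, this trick does not apply.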

Shovel cannot perform any logic, not even the most basic. If you need sharding or simple filtering or deduplication, shovel cannot do it, and you will have to write your own solution. On the other hand, shovel works in a cluster and gets all the clustering capabilities at once: the failure of one cluster node does not cause an outage, because the shovel restarts on another node.
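Since shovel cannot filter, shard, or deduplicate, that logic has to live in your own forwarder, that is, a consumer that republishes. Below is a broker-agnostic Python sketch of just the decision part, assuming every message carries a unique message ID and that you shard by routing key; wiring it to a real AMQP client is left out, and the ForwardFilter class is hypothetical.

```python
import hashlib
from collections import OrderedDict
from typing import Optional


class ForwardFilter:
    """Decide whether, and to which shard, a consumed message is republished."""

    def __init__(self, shard_count: int, window: int = 10_000) -> None:
        if shard_count < 1 or window < 1:
            raise ValueError("shard_count and window must be positive")
        self.shard_count = shard_count
        self.window = window
        # Recently seen message IDs, oldest first (a bounded dedup window).
        self._seen: OrderedDict = OrderedDict()

    def is_duplicate(self, message_id: str) -> bool:
        if message_id in self._seen:
            self._seen.move_to_end(message_id)  # refresh its position
            return True
        self._seen[message_id] = None
        if len(self._seen) > self.window:
            self._seen.popitem(last=False)  # evict the oldest ID
        return False

    def shard_for(self, routing_key: str) -> int:
        # Stable across processes, unlike the salted built-in hash() for str.
        digest = hashlib.sha1(routing_key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.shard_count

    def route(self, message_id: str, routing_key: str) -> Optional[int]:
        """Return the target shard index, or None to drop a duplicate."""
        if self.is_duplicate(message_id):
            return None
        return self.shard_for(routing_key)


f = ForwardFilter(shard_count=3)
first = f.route("msg-1", "user.42")
again = f.route("msg-1", "user.42")  # same ID again: dropped
print(first is not None, again)
```

The dedup window lives in memory, so it is lost on restart and is not shared between several forwarder processes; a shared store would be needed for that.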

Federation

This plugin has two separate concepts: federated exchanges and federated queues. Of the two, the federated exchange is the more practical, actually usable one. It links an exchange on the downstream server to an exchange on the upstream. Each downstream receives a copy of every message while it is connected to the upstream. Nothing accumulates for a downstream: while it is disconnected, it loses messages. Setting Expires and Message TTL does not change this behavior. In terms of performance, troubleshooting, and flexibility, federation has all the disadvantages of shovel, plus it is far less versatile. That is probably why there is so little information about this tool on the Internet.
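For reference, a federated exchange is set up in two steps: define an upstream, then apply a policy to the exchanges that should be federated. The Python sketch below only builds the REST API calls (method, path, JSON body) to run on the downstream node and sends nothing; it assumes the rabbitmq_federation and rabbitmq_management plugins are enabled, and the upstream name, URI, policy name, and exchange pattern are hypothetical.

```python
import json
import urllib.parse


def federation_calls(vhost: str, upstream_name: str, upstream_uri: str,
                     exchange_pattern: str, policy_name: str = "federate-exchanges"):
    """Return (method, path, json_body) tuples to run against the downstream."""
    v = urllib.parse.quote(vhost, safe="")
    u = urllib.parse.quote(upstream_name, safe="")
    p = urllib.parse.quote(policy_name, safe="")
    # 1. The upstream: where the federated exchange pulls messages from.
    upstream = (
        "PUT",
        f"/api/parameters/federation-upstream/{v}/{u}",
        json.dumps({"value": {"uri": upstream_uri}}),
    )
    # 2. The policy: which local exchanges get federated, and from where.
    policy = (
        "PUT",
        f"/api/policies/{v}/{p}",
        json.dumps({
            "pattern": exchange_pattern,  # regex matched against exchange names
            "definition": {"federation-upstream": upstream_name},
            "apply-to": "exchanges",      # federate exchanges, not queues
        }),
    )
    return [upstream, policy]


calls = federation_calls("/", "central", "amqp://central.example.com", "^events\\.")
for method, path, body in calls:
    print(method, path, body)
```

The upstream is declared first so that the policy refers to something that already exists.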

High Load

First, you need to understand that high load can take different forms. In essence, high load is the point at which your server can no longer keep up with its workload. This can happen even with a stream of one message per second, for example if processing a single message takes longer than a second. You then need ways to increase throughput, either vertically (more capacity: more RAM, more CPUs) or horizontally (more instances).
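The one-message-per-second example is simple arithmetic: a consumer that needs T seconds per message handles at most 1/T messages per second, so at an arrival rate of λ messages per second you need at least λ·T consumers working in parallel. A minimal Python sketch of that steady-state lower bound; it ignores bursts and variance in processing time, so real deployments need headroom on top of it.

```python
import math


def min_consumers(arrival_rate: float, seconds_per_message: float) -> int:
    """Lower bound on parallel consumers needed to keep up (steady state)."""
    if arrival_rate < 0 or seconds_per_message <= 0:
        raise ValueError("need arrival_rate >= 0 and seconds_per_message > 0")
    # Offered load: average number of messages being processed at once.
    return max(1, math.ceil(arrival_rate * seconds_per_message))


def backlog_growth(arrival_rate: float, seconds_per_message: float,
                   consumers: int) -> float:
    """Messages per second added to the queue; negative means it drains."""
    return arrival_rate - consumers / seconds_per_message


# One message per second, 1.5 s to process each:
print(min_consumers(1, 1.5))                   # 2
print(round(backlog_growth(1, 1.5, 1), 2))     # 0.33 msg/s: the queue keeps growing
```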

The types of high load we can face:

  • A large number of connections. Problems begin at more than 1000 publishers, so connections need to be balanced.
  • A large message flow. The flow needs to be split across several RabbitMQ instances, either by balancing connections or by adding a dedicated balancing RabbitMQ.
  • Connections/channels being opened and closed at a high rate. It is best to avoid this, but if there is no choice, put AMQProxy between the publisher and RabbitMQ.
  • Messages are not processed fast enough. This is beyond the scope of this article: if you cannot simply scale out the consumers, you need to optimize processing speed, and RabbitMQ has nothing to do with it.

A large number of connections from publishers

When there are more than 1000 publishers, problems begin. You can raise the connection limit, but in practice this leads to new problems in completely unexpected places. A better approach is to install a connection balancer and run several RabbitMQ instances to receive messages.

As a connection balancer, I recommend HAProxy. I experimented with different balancers, and only HAProxy gave stable results. In practice, nginx handled long-lived AMQP connections very poorly, although there may be some obscure option that solves these problems.
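A minimal sketch of such an HAProxy setup, generated from Python so it can be templated per environment. It uses plain TCP mode (HAProxy does not need to understand AMQP), least-connections balancing, and health checks; the node names, addresses, and timeout values are assumptions to adapt. The client and server timeouts are set well above the AMQP heartbeat timeout (60 seconds by default in current RabbitMQ); if they were shorter, HAProxy could cut idle but healthy connections.

```python
def haproxy_amqp_config(nodes, listen_port=5672, idle_timeout="3h"):
    """Render an HAProxy config that spreads AMQP connections across RabbitMQ nodes."""
    lines = [
        "defaults",
        "    mode tcp",                          # proxy raw TCP; no AMQP parsing
        "    timeout connect 5s",
        f"    timeout client {idle_timeout}",   # must exceed the AMQP heartbeat
        f"    timeout server {idle_timeout}",
        "",
        "frontend amqp_in",
        f"    bind *:{listen_port}",
        "    default_backend rabbitmq_nodes",
        "",
        "backend rabbitmq_nodes",
        "    balance leastconn",                 # suits long-lived connections
    ]
    for name, host, port in nodes:
        # "check" enables health checks; failed nodes receive no new connections.
        lines.append(f"    server {name} {host}:{port} check inter 5s rise 2 fall 3")
    return "\n".join(lines) + "\n"


# Hypothetical receiving nodes.
nodes = [("rabbit-1", "10.0.0.11", 5672),
         ("rabbit-2", "10.0.0.12", 5672),
         ("rabbit-3", "10.0.0.13", 5672)]
cfg = haproxy_amqp_config(nodes)
print(cfg)
```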

This scheme also adds a layer of fault tolerance: if one of the RabbitMQ instances fails, HAProxy balances the connections across the remaining ones.
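To see where the fault tolerance comes from, here is a toy Python model of least-connections balancing, not HAProxy's actual implementation (which also honors server weights and other settings): each new connection goes to the healthy node with the fewest connections, and a node that fails its health check is skipped. A TCP proxy cannot move an existing connection, so connections to a failed node are dropped and publishers must reconnect, at which point they land on the surviving nodes.

```python
def assign_connections(n_connections, nodes, down=frozenset()):
    """Place connections one at a time on the least-loaded healthy node."""
    healthy = [n for n in nodes if n not in down]
    if not healthy:
        raise RuntimeError("no healthy RabbitMQ nodes left")
    counts = {n: 0 for n in healthy}
    for _ in range(n_connections):
        # min() returns the first node among equals, so ties follow list order.
        target = min(healthy, key=lambda n: counts[n])
        counts[target] += 1
    return counts


nodes = ["rabbit-1", "rabbit-2", "rabbit-3"]  # hypothetical names
print(assign_connections(1200, nodes))                     # 400 on each node
print(assign_connections(1200, nodes, down={"rabbit-2"}))  # 600 on each survivor
```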

Now messages from all publishers are collected on different RabbitMQ instances, and all that remains is to combine them into one common RabbitMQ. RabbitMQ has the shovel and federation mechanisms for this, and either will work here, but I recommend shovel, running on each of the outer RabbitMQ instances.
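A minimal Python sketch of that topology: for every outer (edge) RabbitMQ, plan the dynamic shovels that should be declared on that edge node itself, each consuming the edge's local queue and publishing to an exchange on the central RabbitMQ. It assumes the dynamic-shovel parameter format, in which an amqp:// URI with no host refers to the local node; the host names, queue, and exchange are hypothetical. Each plan entry would then be sent to that edge's own management API.

```python
def edge_to_central_plan(edges, central_uri, queue, exchange, copies=1):
    """Map each edge node to the dynamic shovels it should declare locally."""
    plan = {}
    for edge in edges:
        shovels = {}
        for i in range(1, copies + 1):
            shovels[f"{queue}-to-central-{i}"] = {
                "src-protocol": "amqp091",
                "src-uri": "amqp://",        # no host: this edge's own broker
                "src-queue": queue,
                "dest-protocol": "amqp091",
                "dest-uri": central_uri,
                "dest-exchange": exchange,
                "ack-mode": "on-confirm",
            }
        plan[edge] = shovels
    return plan


plan = edge_to_central_plan(["edge-1.example.com", "edge-2.example.com"],
                            "amqp://central.example.com", "orders", "orders",
                            copies=2)
for edge, shovels in plan.items():
    print(edge, sorted(shovels))
```

Because each edge owns its shovels, adding an edge means declaring its shovels there, and deleting the edge removes them too, which matches the advice in the Shovel section.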

We will continue in the next article. Stay tuned!

You can say Hi to us at Pharos Production — a software development company

https://pharosproduction.com

Follow our product Ludo — the reputational system of the Web3 world

https://ludo.com

We build high-load software. Pharos Production founder and CTO.