The Hare and the Tortoise Messages — Part 1

Hamza · The Binary Bin · Apr 30, 2021

RabbitMQ is a lean message broker that is quite handy for asynchronous task processing — producers simply dump their tasks into a queue and consumers pick them up and process the tasks. If a consumer dies for any reason (maybe because of too much work), no acknowledgment of the task completion is sent and RabbitMQ lets another consumer pick up and handle the task instead — quite a slave driver indeed.

With all these tasks the consumers are performing, it shouldn’t come as a surprise that not all of them succeed all the time. If there was only a minor glitch, the consumer could simply retry the task. But if there is a deep-rooted problem that needs fixing before the tasks can complete, there is no straightforward action the consumer can take besides discarding the task entirely (woo hoo!).

While consumers might find immense joy in not having to deal with the drudgery, we certainly aren’t pleased with our all-important tasks being thrown in the trash heap. What if there was a way consumers could notify the queue that the task had failed and could only proceed once a fix was ready?

Tortoises beating the hare — not fair but who cares?

Enter TortoiseMQ — the bane of consumers. TortoiseMQ is a light wrapper around RabbitMQ that lets consumers notify it when a task has failed. TortoiseMQ then puts the task in a store where it can be retrieved and re-dumped into the queue once the fix is in.

To handle the storage and flexible retrieval of failed tasks, TortoiseMQ uses ElasticSearch (anything that does the job could be used instead). When you’ve fixed the problem that might be causing a certain set of tasks to fail, you can query for those tasks (based on the error message perhaps?) and TortoiseMQ will dump those tasks right back into the queue for the poor consumers to slog through again.
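To give a flavour of what that retrieval could look like, here is a hypothetical sketch using the elasticsearch Python client. The index name failed_tasks, the error and body fields, and the re-queueing step are all assumptions about how TortoiseMQ could be wired up, not something we’ve built yet:

# Hypothetical retrieval of failed tasks; index and field names are made up
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Find every task that failed with a matching error message
hits = es.search(
    index="failed_tasks",
    query={"match": {"error": "nightmare"}},
)["hits"]["hits"]

for hit in hits:
    task = hit["_source"]
    # TortoiseMQ would publish this body back onto the RabbitMQ queue
    print("Would re-queue:", task["body"])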

Here’s a scenario where TortoiseMQ can help us steadily complete our tasks: We begin by running a RabbitMQ server — Docker can help us out here.

docker run -it --rm --name rabbitmq -p 5672:5672 -p 15672:15672 rabbitmq:3-management

With our server running, we can set up a producer and a consumer. We use the pika Python client library for connecting to our RabbitMQ server:

pip install pika

We have a Python script slave-driver.py that doles out tasks to the queue every time it is run. Well, technically it can only send messages to an exchange, but we provide the routing_key as our queue name so the default exchange can put it in the desired queue. Oh, and we’re also making the queue and messages durable so the tasks aren’t lost even if RabbitMQ dies — we love punishing those consumers after all.
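Here’s roughly what slave-driver.py could look like. The queue name task_queue is an assumption; the rest follows the standard pika publishing pattern:

# slave-driver.py: a minimal sketch of the producer (queue name assumed)
import sys
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()

# Durable queue so tasks survive a RabbitMQ restart
channel.queue_declare(queue="task_queue", durable=True)

message = " ".join(sys.argv[1:]) or "hello"
channel.basic_publish(
    exchange="",                # default exchange
    routing_key="task_queue",   # routed straight to our queue
    body=message,
    properties=pika.BasicProperties(delivery_mode=2),  # persistent message
)
print(f" [x] Sent {message!r}")
connection.close()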

We also need a consumer script worker.py that starts up a consumer and gets him listening for tasks. Normally, tasks are handed to the consumers in a round-robin manner, but to be a bit fairer to the consumers, we use channel.basic_qos to avoid routing a task to an already busy consumer. (We don’t really care about them — we just want our tasks completed as quickly as possible.)

So what is the task? The consumers have to sleep for the number of seconds specified by the dots in the message — five dots means sleep for five seconds. So their “task” is really to stop listening for a moment and go to sleep. And you thought we were cruel.
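A rough sketch of worker.py, built on the same assumed task_queue name and the standard pika consuming pattern:

# worker.py: a rough sketch of the consumer (queue name assumed)
import time
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
channel.queue_declare(queue="task_queue", durable=True)

def callback(ch, method, properties, body):
    print(f" [x] Received {body.decode()!r}")
    time.sleep(body.count(b"."))          # one second of "sleep" per dot
    print(" [x] Done")
    ch.basic_ack(delivery_tag=method.delivery_tag)  # tell RabbitMQ the task finished

channel.basic_qos(prefetch_count=1)       # don't hand a new task to a busy consumer
channel.basic_consume(queue="task_queue", on_message_callback=callback)
print(" [*] Waiting for tasks. To exit press CTRL+C")
channel.start_consuming()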

We can start up consumers in different terminal windows: python worker.py. And we add tasks by running python slave-driver.py <message>, where the message can be any string with or without dots: hello, ...first, .......naps and so on.

You’ve successfully managed to put workers to sleep and everything seems hunky-dory. But some of these workers have started having nightmares that disrupt their (well-deserved?) sleep. Specifically, if the message contains “work” as a substring, the workers get nightmares because of bad memories and are unable to sleep properly.
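The post doesn’t show the failing worker’s code, but one illustrative way to act out the nightmare inside worker.py’s callback might look like this. Rejecting the message without requeueing it is an assumption; the point is simply that the task gets discarded:

# Inside worker.py: an illustrative failing callback (behaviour assumed)
def callback(ch, method, properties, body):
    text = body.decode()
    if "work" in text:
        print(f" [!] Nightmare! Could not sleep through {text!r}")
        # Discard the task; without something like TortoiseMQ it is simply lost
        ch.basic_nack(delivery_tag=method.delivery_tag, requeue=False)
        return
    time.sleep(text.count("."))
    print(" [x] Done")
    ch.basic_ack(delivery_tag=method.delivery_tag)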

Fortunately, we can give the workers happy memories by sending them a message that contains “food” in it: python slave-driver.py food. Now they’ll have happy memories and can proceed to have sweet dreams. The trouble, though, is that all the tasks we’d already sent that contained “work” in them have been lost.

Sometimes, failures require fixes to different parts of the system. But by the time we notice the problems and get around to fixing them, the damage is already done and much work (sleep?) has been lost. TortoiseMQ aims to minimize this loss by keeping track of failures and allowing you to retrigger the tasks once the underlying problems have been resolved. The only catch is that work might get executed when it’s not supposed to (a really old failure that was forgotten, for example). But as long as we’re careful around this, the ability to never “lose” tasks, combined with the flexibility in choosing which tasks to retrigger, makes TortoiseMQ a compelling choice for fragile environments.

We’ll build out TortoiseMQ in the second part and see how it helps preserve failed tasks. In the meantime, please don’t give “work” work to the workers.
