The Ack problem — Part 3

The acks you get from the message broker

Philippe Detournay
Xendit Engineering
3 min read · Jun 6, 2023


Messaging is just another request-response call to a broker

Last time, we discussed the risk of duplicate database records due to the request-response nature of the DB protocol.

Messaging, at first glance, is a different beast: you don’t wait to get a reply before moving on. As a matter of fact, you typically don’t expect a reply at all: you just broadcast some events, and you are not really interested in knowing which other services will consume them.

Except that messaging is just a delegation, made through an API call, to a third-party service. What you really do with messaging is ask another service, typically called a message broker, to take responsibility for later delivery to potentially interested downstream services.

And while you have delegated the responsibility for the final delivery, you still need to ensure that this delegation has been set up successfully: once your package is in the hands of UPS, it’s not really your problem anymore, but if you leave it for UPS collection in the middle of the street without supervision and it gets lost before the UPS driver comes, it is your responsibility alone!

Hello? Anyone here?

I’ve seen messaging described as “fire and forget”. Nothing could be further from the truth! You still need to ensure that your broker received the message request before moving on.

The broker Ack problem

As usual, when you call your message broker to submit a new message for further distribution, you can’t be sure whether this delegation has been set up successfully until you receive the Ack from the broker. If you don’t receive the Ack, the message may or may not have been submitted, which means that the consumers may or may not receive your message.

We can make the same argument for the broker as we did in the introduction:

Sending a message is a synchronous call to the broker

If you do not check and wait for the Ack, you can never be sure your message has been safely delivered to the broker, and this may mean the message is lost. But if you wait for the Ack and you never receive it (or you receive some technical or network error), then you are facing the Ack problem.
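To make the ambiguity concrete, here is a minimal sketch in Python. It does not use a real broker client: `SimulatedBroker`, `AckLost`, `send` and the `payment.created` event name are all hypothetical and exist only for this illustration. It simulates two failures, one where the request never reaches the broker and one where the broker stores the message but the Ack gets lost, and shows that the sender cannot tell them apart.

```python
class AckLost(Exception):
    """The sender gave up waiting; the broker may or may not have the message."""


class SimulatedBroker:
    """Hypothetical in-memory broker, not a real client library."""

    def __init__(self, fail_before_store=False, drop_ack=False):
        self.stored = []
        self.fail_before_store = fail_before_store
        self.drop_ack = drop_ack

    def publish(self, message):
        if self.fail_before_store:
            # The request is lost on the way to the broker.
            raise AckLost("request never reached the broker")
        self.stored.append(message)
        if self.drop_ack:
            # The broker did its job, but the Ack is lost on the way back.
            raise AckLost("broker stored the message, but the Ack was lost")
        return "ack"


def send(broker, message):
    """Return what the sender is able to conclude about the delivery."""
    try:
        broker.publish(message)
        return "delivered"
    except AckLost:
        return "unknown"


lost_request = SimulatedBroker(fail_before_store=True)
lost_ack = SimulatedBroker(drop_ack=True)

# The sender observes the same outcome in both cases...
outcomes = (send(lost_request, "payment.created"), send(lost_ack, "payment.created"))
# ...although the broker ends up in a different state each time.
states = (len(lost_request.stored), len(lost_ack.stored))
print(outcomes, states)  # ('unknown', 'unknown') (0, 1)
```

From the sender's side, both runs end with the same "unknown" result, even though only one of the two brokers actually holds the message.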

From there, you have several options:

  • If the broker offers a way to query it, you can check whether the message was actually sent or not;
  • You just report an error to your own caller, rolling back any ongoing transaction. But this means that you may have no record, in your database, of a message that was actually sent;
  • You move forward and ignore the error, but this means you may, in the end, fail to send a message that is required by downstream services;
  • You retry sending the message, but this means that you may be sending duplicate requests. And what if the retries all fail (if the network is down, this is actually quite likely to happen)?
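The retry option in particular is easy to sketch. The snippet below is an illustration under assumptions, not a recommended pattern: `SimulatedBroker`, `AckLost`, `send_with_retry`, the `max_attempts` parameter and the `payment.created` event name are all made up for this example. The simulated broker accepts every message but loses a configurable number of Acks, which is enough to show both failure modes of a naive retry loop.

```python
class AckLost(Exception):
    """No Ack came back; the sender cannot tell whether the message was stored."""


class SimulatedBroker:
    """Hypothetical in-memory broker whose first `acks_to_drop` Acks get lost."""

    def __init__(self, acks_to_drop):
        self.stored = []
        self.acks_to_drop = acks_to_drop

    def publish(self, message):
        self.stored.append(message)  # the broker accepts the message every time
        if self.acks_to_drop > 0:
            self.acks_to_drop -= 1
            raise AckLost("Ack lost on the way back")
        return "ack"


def send_with_retry(broker, message, max_attempts=3):
    """Naive retry: resend until an Ack arrives or the attempts run out."""
    for attempt in range(1, max_attempts + 1):
        try:
            broker.publish(message)
            return attempt
        except AckLost:
            continue
    raise AckLost(f"no Ack after {max_attempts} attempts")


# Case 1: a single lost Ack is enough to create a duplicate.
flaky = SimulatedBroker(acks_to_drop=1)
attempts = send_with_retry(flaky, "payment.created")
print(attempts, flaky.stored)  # 2 ['payment.created', 'payment.created']

# Case 2: every retry fails, and the sender is back to square one.
down = SimulatedBroker(acks_to_drop=10)
try:
    send_with_retry(down, "payment.created")
    gave_up = False
except AckLost:
    gave_up = True
print(gave_up, len(down.stored))  # True 3
```

In the second case this particular simulation ends up with three stored copies, whereas a network that is really down would leave the broker with none. The sender gets the same exception either way, so it still cannot tell which of the two happened.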

It seems that introducing this messaging approach didn’t solve any problem at all. Actually, it may have introduced many more problems.

In a future post, we’ll discuss typical messaging patterns and how to handle these situations.

Yes please…?

In the previous article, we saw that the Ack problem has unexpected consequences for how we interact with our database. In the present article, we extended the conversation to the message broker. In the next post, we’ll see that it applies to any kind of API.
