The Ack problem — Part 8

Common patterns in a message consumer

Philippe Detournay
Xendit Engineering
4 min readJul 3, 2023

--

Last post was quite in-depth and covered typical patterns to achieve idempotency in an API server. This post will focus on message consumers.

Oh no! This Belgian guy is now making Belgian memes!

It’s all the same thing again

Conceptually, a message consumer is only a downstream API service to a message broker, whereas a message broker is only a downstream API service to a message producer. The “async” nature of messaging only comes from the fact that the broker may send the Ack to the producer before trying to call the consumer, but this doesn’t change anything to the fact that each of these calls will face the Ack problem.

Messaging is just one intermediary for an API call

There is no such thing as “fire and forget” in messaging: sending the message to the broker is a synchronous API call like any other, so the producer must wait for the Ack, and should perform retries in case of errors like for any other API call.

As quickly discussed previously, message brokers typically won’t provide an idempotent “send message” API. This is typically not a useful feature, as:

  • Having an idempotent call requires distributing the message (or at least the idempotency token) to all brokers, which can have a performance impact;
  • The idempotency token must remain stored even after the message has been delivered, which typically goes against “typical” broker logic that messages are to be deleted or archived after delivery;
  • The consumer must also be idempotent, and should therefore accept an idempotency token from the broker. This idempotency token would be derived from the producer’s token anyway (as we discussed before, downstream tokens should be derived from upstream tokens), therefore it is simpler to rely on the producer to add an idempotency token in the message and let the consumer perform idempotency based on the message content.

In summary:

Idempotency in messaging is typically done between message producer and message consumer, without relying on idempotency at broker-level

Note that some message brokers will provide some level of idempotency feature (such as Kafka), but these are only useful in very specific scenarios and features (Kafka Streams) and are typically not adding much value outside these scenarios.

More specifically: using idempotency between the producer and the broker will not eliminate the need to implement idempotency between the broker and the consumer, and as we’ve seen before this cannot be implemented in a library but must be done from within the core processing logic.

Consumer idempotency

Because the broker faces the Ack problem, it will have to perform retries in case of technical/network errors from the consumer. The consumer must therefore:

  • Be ready to accept duplicate messages caused by retries from the broker to the consumer;
  • Be ready to accept duplicate messages caused by retries from the producer to the broker;
  • In short: be made idempotent based on the message content.

In other words, the consumer is just another API server. Therefore, a typical message consumer will implement all the patterns already discussed in the previous blog:

  • The message will have an idempotency condition such as a transaction reference or any suitable idempotency token;
  • This idempotency token has been derived from upstream;
  • If a database is used, it will contain this idempotency token one way or another (using either lock or lock-free strategies);
  • If downstream services are called, the message idempotency token will be used to build these additional calls’ idempotency tokens.
I knew it!

A final word about Kafka

When Kafka sends a message to a consumer, the only possible reply it expects is “ack” (specifically, the message must be committed before further messages can also be committed). Any other situation is treated as an error and will trigger a message re-emission. There is an explicit assumption that the consumer is idempotent and retrying is always a safe option.

Since there is no feature to request Kafka to “requeue” or “try again later” or anything like this, if the consumer finds itself incapable of processing the message (i.e. the database is currently not available or something), we’ve seen before that halting the consumer is a valid idempotent behaviour. “Do or die” is a typical and totally valid Kafka consumer behaviour.

Conclusion

We are now reaching the end of this series of articles on the Ack problem. In the next and final article we will first have a very, very small sneak peak at a much larger topic that is disaster recovery and backup handling to see how it is also driven by the same root causes.

--

--