Consume idempotently

I’ve been using RabbitMQ in production for three years, and until recently I hadn’t considered the fact that you may sometimes consume the same message twice.

Let’s illustrate the idea.


Somewhere in our web app, we have code that runs on every HTTP request. Among other things, it logs each request by publishing a message to RabbitMQ.

    public class HitMessage
    {
        public string UserIp { get; set; }
        public DateTime Timestamp { get; set; }
    }

Somewhere else, we have a handler that writes each message to MongoDB.

    // Consumes the HitMessage published by the web app and stores it as a document.
    public static void LogHitRequest(HitMessage message)
    {
        requestCollection.InsertOne(new RequestDocument
        {
            UserIp = message.UserIp,
            Timestamp = message.Timestamp
        });
    }

However, if you think about it more (or if somebody tells you), you will realize that this handler is not as good as it could be. If somebody unplugs the server’s power cable right after MongoDB accepts the insert but before the handler acknowledges the message, RabbitMQ never receives the ACK and keeps the message in the queue. The message will be redelivered, and we’ll log the same request twice.

This might or might not be the end of the world, by the way. But you should be aware of it.

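This particular case is easy to fix if we assume that no two different requests can share the same timestamp, down to the millisecond. The timestamp then identifies the request, so the handler can use it as a unique key, and a redelivered message is rejected or overwrites the existing record instead of being logged twice. And again, in this case the stakes are low.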
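To make that idea concrete, here is a minimal sketch of an idempotent version of the handler. It is not the real MongoDB code: `RequestLog` is a hypothetical in-memory stand-in for a collection with a unique key, and it keys on the pair (UserIp, Timestamp), which is my assumption about what identifies a request.

```csharp
using System;
using System.Collections.Generic;

public class HitMessage
{
    public string UserIp { get; set; }
    public DateTime Timestamp { get; set; }
}

// Hypothetical in-memory stand-in for a collection with a unique key
// on (UserIp, Timestamp). Assumption: that pair identifies a request.
public class RequestLog
{
    private readonly HashSet<(string UserIp, DateTime Timestamp)> stored =
        new HashSet<(string UserIp, DateTime Timestamp)>();

    public int Count => stored.Count;

    // Returns true if the hit was stored, false if it was already there.
    public bool LogHit(HitMessage message)
    {
        // HashSet.Add does nothing for a key that is already present,
        // so handling the same message twice leaves the log unchanged.
        return stored.Add((message.UserIp, message.Timestamp));
    }
}
```

Calling `LogHit` twice with the same message stores one entry, which is all that idempotency means here. Against a real MongoDB collection, a unique index on those fields or an upsert keyed on them would probably play the same role.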

There are cases that just can’t be fixed. For example, if you’re sending emails, you have no way to be sure whether a specific email was already sent (you may log sent emails, but the power could go out right after you send the email and before you log it). So we have two options here, neither of them perfect:

  1. at-least-once: if we ACK the message (that is, tell the broker “That message you gave me is handled, please don’t send it to anybody else”) only after we actually send the email, we may sometimes send the same email multiple times.
  2. at-most-once: if we ACK right after we receive the message from the broker, we may sometimes lose the email.
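The two options differ only in where the ACK sits relative to the work. Here is a rough simulation of that, with everything in it made up for illustration: `FakeQueue` is a hypothetical stand-in for the broker (a message stays queued until it is acked), and the `crash...` flags imitate the power going out between the two steps. In a real consumer the `Ack()` call would be the client library’s acknowledgement call instead.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a broker queue: a message stays at the head
// of the queue until the consumer acknowledges it.
public class FakeQueue
{
    private readonly Queue<string> pending = new Queue<string>();

    public int Count => pending.Count;
    public void Publish(string body) => pending.Enqueue(body);
    public string Peek() => pending.Peek();
    public void Ack() => pending.Dequeue();
}

public static class EmailConsumer
{
    // At-least-once: send first, ACK afterwards.
    // A crash between the two steps leaves the message queued,
    // so it is delivered again and the email goes out twice.
    public static void HandleAtLeastOnce(
        FakeQueue queue, Action<string> sendEmail, bool crashBeforeAck)
    {
        if (queue.Count == 0) return;
        string body = queue.Peek();
        sendEmail(body);
        if (crashBeforeAck) return; // simulated power loss: no ACK sent
        queue.Ack();
    }

    // At-most-once: ACK first, send afterwards.
    // A crash between the two steps removes the message from the queue
    // without the email ever being sent.
    public static void HandleAtMostOnce(
        FakeQueue queue, Action<string> sendEmail, bool crashBeforeSend)
    {
        if (queue.Count == 0) return;
        string body = queue.Peek();
        queue.Ack();
        if (crashBeforeSend) return; // simulated power loss: email is lost
        sendEmail(body);
    }
}
```

Running `HandleAtLeastOnce` once with `crashBeforeAck: true` and then again without the crash sends the same email twice. Running `HandleAtMostOnce` with `crashBeforeSend: true` empties the queue and sends nothing.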

As a rule of thumb, I’d strive to eliminate the cases where we send the same email twice. So I would choose at-most-once for all but the most critical emails.

So, to summarize:

  1. Write idempotent handlers if you can.
  2. If you can’t, mind the at-least-once/at-most-once trade-off.

PS: I should have read about the Two Generals’ Problem many lines of code ago.