Exactly-once with Distributed Systems

Sheng Hau

Published in

Government Digital Services, Singapore

3 min readAug 7, 2018

Standalone systems are easy. But once you have two systems working together, things get complicated.

Photo by Charles Deluvio 🇵🇭🇨🇦 on Unsplash

The story of A and B

Consider a relatively simple scenario, where [A] tells [B] to do something.

[A] —> [B]

It is still quite manageable if:

[B] can finish the task before the connection timeout, and returns a HTTP response code indicating success or failure
[B] takes quite a while to finish the task, but knows how to call a callback provided by [A] when it’s done
[B] doesn’t know how to call a callback, but provides a way for [A] to check on the status of the task

What can go wrong?

As you might have guessed, this is not the end of the story. A few things could go wrong here:

[B] could finish the task, but fails to send back the HTTP response
[B] could finish the task, but fails to call the callback
[B] could finish the task, but tells [A] that it’s not done when asked

All the above scenarios would mislead [A] into resubmitting a duplicate request. This is the fun part of distributed systems. We can only choose between “at least once” or “at most once” semantics. If we wanted the task to be done at least once, [A] should keep re-sending it until we get a confirmation from [B]. And if we wanted the task to be done at most once, just send it once and hope for the best (or don’t send it at all, but that’ll be cheating). So there’s no way to do something exactly once?

There’s still hope

There is something that [B] can do to ensure “exactly once” semantics.

De-duplication — if [B] is able to identify duplicate requests, then it’s ok for [A] to follow the “at least once” strategy
Idempotency — if the requests are idempotent, then it’s ok for [B] and do the same task more than once
Transactions — if the processing and acknowledgment of tasks can be wrapped in a transaction, then we wouldn’t have processed tasks without acknowledgments

De-duplication and idempotency is dependant on the business logic, so i’ll leave that to you to implement.

As for transactions, the general idea is to make the processing and acknowledgment steps atomic. For example, using database transactions, we could do the following on [B]:

Receive the request from [A]
Compute some results that needs to be persisted
Use a database transaction to:
a. Record the results
b. Record that the task has been processed

This way, the results of the task would only take effect if and only if it is marked as done.

Other than databases, Apache Kafka also supports transactions (https://www.confluent.io/blog/transactions-apache-kafka/). If you’re the adventurous type, you can also roll your own using a two-phase commit protocol (https://en.wikipedia.org/wiki/Two-phase_commit_protocol).

Thanks for reading, and i hope this was useful to you. Leave a comment below if you know of other ways to achieve exactly-once semantics, or if you have a scenario which the above doesn’t address. I would love to hear from you. 😁

Exactly-once with Distributed Systems

The story of A and B

What can go wrong?

There’s still hope

Written by Sheng Hau