Hey Mike Gershunovsky, nice article on distributed deadlocks and sharding!. I had a quick question.
Sachin Malhotra
2

Hey Sachin Malhotra great question. First of all we never rollback. If the first transaction completes we will make sure we complete the second one eventually or we abandon it.

In some cases we are fine with only one side committing, for example if I leave an emoji reaction on a payment, we can update the view for the recipient first, then the sender. This way if we fail to update for the sender, the recipient will still see it but the sender won’t. The sender can try again and it will be updated for both. At least we’ll never have a case where the sender thinks they reacted but the recipient doesn’t see it!

In other cases where we must have consistency, we will always enqueue a retry job during the first transaction to ensure the second transaction is executed eventually. For this we use a job queue and make sure the transactions are idempotent, so we can retry it safely until both transactions succeed. Jesse’s post on movements (linked in article) talks more about how we synchronize two customers with eventual consistency.

There’s also the possibility that the second transaction can never succeed due to a bug. We try to write resilient code so this doesn’t happen, but in rare cases we do have to make manual fixes.

It’s also worth noting we have an abstraction around our job queue that allows transactional enqueues - that is the job will only run if the transaction succeeds. This uses a simple table with tokens to know which transactions committed and the job also knows about this token.