Distributed Transactions

Managing transactions across services

Published in

Applaudo Tech Blog

7 min read · Mar 10, 2022

Before we dig deeper into our topic, let’s analyze what a Distributed Transaction is and how it differs from a “regular” transaction.

We can define a transaction as a series of actions that modify our data repository and that must all succeed together. Whenever one action fails, we go back to the state where it all started.

Here we have a request made by the client to place an order. Depending on the result of the actions, the data in the repository is either updated or left unchanged.
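To make the “all or nothing” idea concrete, here is a minimal sketch of a regular (local) transaction using Python’s built-in sqlite3 module. The table names, the `place_order` function, and the “insufficient funds” rule are assumptions made up for illustration; they are not part of the original example.

```python
import sqlite3


def place_order(conn, customer_id, amount):
    """Insert an order and debit the balance as one atomic unit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
                (customer_id, amount),
            )
            cur = conn.execute(
                "UPDATE accounts SET balance = balance - ? "
                "WHERE customer_id = ? AND balance >= ?",
                (amount, customer_id, amount),
            )
            if cur.rowcount == 0:
                # Hypothetical business rule: no row updated means no funds.
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def demo_local_transaction():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders "
        "(id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER)"
    )
    conn.execute(
        "CREATE TABLE accounts (customer_id INTEGER PRIMARY KEY, balance INTEGER)"
    )
    conn.execute("INSERT INTO accounts VALUES (1, 100)")
    conn.commit()

    first = place_order(conn, 1, 60)   # enough balance: both writes commit
    second = place_order(conn, 1, 60)  # not enough: the order insert is undone
    orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    balance = conn.execute(
        "SELECT balance FROM accounts WHERE customer_id = 1"
    ).fetchone()[0]
    return first, second, orders, balance
```

Because both writes live in the same database connection, the database itself takes care of the rollback. That is exactly the guarantee we lose once the order and the balance live in different services.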

Now, the same idea should apply to Distributed Transactions. Let’s take a look at an example:

OK, what’s going on here? An action failed in the Payment service, and only the balance was reverted; the order information was not.

Now that we have different services in the picture, returning to where everything started becomes more challenging. Imagine we had additional services such as inventory, delivery, or notification: this would become unmanageable very fast.

From the picture, we can think of a process that reverts the order and items information, but which component should be responsible for calling it? Which one will keep track of what should be saved or reverted? Thankfully, we have some patterns we can explore.

2-Phase Commit

This pattern aims to solve the challenges raised by distributed systems by appointing a coordinator. The coordinator is responsible for tracking the operations required by each action and for deciding when the information should be saved (commit) and when it should be restored (rollback).

Phase 1 — Voting

The coordinator starts the process by asking each participant to get ready. Each participant replies whether or not it was able to do so.

In this phase, the coordinator waits for every request to be answered. If a participant cannot satisfy the request, the coordinator will detect it.

This phase is known as the voting phase because the coordinator only continues to the next phase when all votes are successful; otherwise, it aborts the whole process.

Phase 2 — Commit - Abort

When all voters have replied, the coordinator will initiate the second phase known as the commit-abort phase.

From this point, one of the following two outcomes occurs:

All voters replied success, meaning they are ready to commit, in which case the coordinator will initiate a commit request to all participants.
At least one voter replied with an error, in which case the coordinator will ask each remaining participant to abort.
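The two phases can be sketched in a few lines of Python. This is a simplified, in-memory illustration under stated assumptions: the `Participant` class, its `can_commit` flag, and the state names are hypothetical, and a real participant would hold locks on its database and talk to the coordinator over the network.

```python
class Participant:
    """Hypothetical in-memory participant used only for illustration."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote. A "yes" vote is a promise to be able to commit later.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1 (voting): ask everyone to get ready and collect every vote.
    votes = [p.prepare() for p in participants]

    # Phase 2 (commit-abort): commit only on a unanimous "yes".
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"
```

Calling `two_phase_commit` with an order, a payment, and an inventory participant commits all three; flipping `can_commit` to `False` on any one of them leaves every participant in the `aborted` state.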

Now, let’s examine what we have achieved:

We can identify the coordinator as the component responsible for tracking the actions and rolling them back in case of failure.
Atomicity: either all actions are committed or the repository is restored to where it all started.
We can involve more services without adding complexity to the protocol itself; each one is just another participant.
The pattern complies with the ACID properties of a transaction.

Keep in mind

The coordinator and participants wait until all responses (votes, acknowledgements) are resolved. This means participants are blocked until the coordinator tells them they can move forward and commit or abort.
Transactions could be left incomplete if the coordinator crashes.
The execution time is determined by the slowest process involved in the distributed transaction.

SAGA

Saga is a coordination pattern that introduces new concepts to manage the problems involved in distributed transactions.

A Saga is composed of multiple steps, and each step manages its own transaction.
Each step has an activity to perform and a compensation operation to revert any change it made. Importantly, compensations must be idempotent and retryable.
The pattern relies on an event-driven architecture, in which components interact with each other by emitting, consuming, and reacting to events.
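The requirement that a compensation be idempotent and retryable can be illustrated with a small sketch. The `PaymentService` class, its fields, and the `saga_id` key are hypothetical names chosen for this example; the point is only that calling the compensation twice has the same effect as calling it once.

```python
class PaymentService:
    """Hypothetical payment step; names and fields are illustrative only."""

    def __init__(self, balance):
        self.balance = balance
        self.charges = {}      # saga_id -> amount charged by the activity
        self.refunded = set()  # saga_ids whose compensation already ran

    def charge(self, saga_id, amount):
        # The step's activity: committed right away in its own transaction.
        self.balance -= amount
        self.charges[saga_id] = amount

    def refund(self, saga_id):
        # The step's compensation. Safe to call any number of times:
        # it does nothing if there is no charge or it was already refunded.
        if saga_id not in self.charges or saga_id in self.refunded:
            return False
        self.balance += self.charges[saga_id]
        self.refunded.add(saga_id)
        return True
```

If the message asking for the refund is delivered twice, or the caller retries after a timeout, the balance is still restored exactly once.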

Saga Flow — Happy and Compensation paths

Here we have a Saga represented as a series of steps, in which each step executes its operations within its own transaction and then hands over to the next one until the Saga completes. Each step comes with a compensation operation, which may or may not be executed.

In contrast with 2-Phase Commit, the steps (participants) of a Saga execute and commit as soon as they receive a request.

Compensations

But, why are compensations necessary? Let’s first examine what would happen when a step fails.

By design, rollback is not possible in the Saga pattern because each step has already committed its data. So, when a step fails, it informs the others so that they can undo their modifications.

The Saga pattern states that calling the compensation operation of each successfully completed step will undo the committed modifications.

Notice that compensation does not necessarily mean deleting data. For example:

An already processed payment might require cancellation or even a refund.
Emailing a customer indicating a placed order might need another notification reporting an issue.
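Putting the happy path and the compensation path together, a Saga can be sketched as a list of (name, action, compensation) entries. This is a minimal sketch with assumed names: `run_saga`, the step names, and the trace messages are hypothetical, and in a real system each action would be a committed local transaction in a separate service.

```python
def run_saga(steps):
    """steps: list of (name, action, compensation); callables take no args."""
    completed = []
    for name, action, compensation in steps:
        try:
            action()  # each step commits its own work immediately
        except Exception:
            # Undo the already committed steps, most recent first.
            for _done_name, done_compensation in reversed(completed):
                done_compensation()
            return "compensated"
        completed.append((name, compensation))
    return "completed"


def demo_saga():
    trace = []

    def allocate_stock():
        raise RuntimeError("out of stock")

    steps = [
        ("order", lambda: trace.append("create order"),
         lambda: trace.append("cancel order")),
        ("payment", lambda: trace.append("charge payment"),
         lambda: trace.append("refund payment")),
        ("notification", lambda: trace.append("send confirmation"),
         lambda: trace.append("send issue notice")),
        ("inventory", allocate_stock,
         lambda: trace.append("release stock")),
    ]
    return run_saga(steps), trace
```

In `demo_saga` the last step fails, so the trace ends with the issue notice, the refund, and the order cancellation, in that order. None of those compensations deletes anything; each one is a new operation that semantically undoes an earlier one.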

So far we’ve explored how steps execute their operations and their compensations (when needed). Now let’s see who is responsible for keeping track of the whole process.

For this, let’s examine two ways we can implement this pattern.

Choreography Pattern

The Choreography pattern introduces a few new components into the picture, such as the Saga Execution Coordinator (SEC) and the Saga Log.

The Saga Execution Coordinator is responsible for:

Tracing the operations performed by the steps: which ones ended successfully and which ones did not.
Logging each operation’s result in the Saga Log.
Knowing, for each step, which compensation operation must be executed in case of failure.

The Saga Log stores each outcome from the different steps and enables us to:

In case of failure, walk back through the completed steps within the saga to compensate them.
Compensate incomplete sagas when the SEC is recovering from a failure.
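Both uses of the Saga Log can be sketched with an append-only list of entries. The `LogEntry` fields, the status values, and the convention of using the step name "saga" for saga-level entries are assumptions made for this illustration, not a prescribed log format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    saga_id: str
    step: str    # e.g. "order", "payment"; "saga" marks saga-level entries
    status: str  # "started", "ended", "failed" or "compensated"


def steps_to_compensate(log, saga_id):
    """Completed, not yet compensated steps of one saga, most recent first."""
    ended, compensated = [], set()
    for entry in log:
        if entry.saga_id != saga_id or entry.step == "saga":
            continue
        if entry.status == "ended":
            ended.append(entry.step)
        elif entry.status == "compensated":
            compensated.add(entry.step)
    return [step for step in reversed(ended) if step not in compensated]


def incomplete_sagas(log):
    """Sagas that started but never logged an end: candidates for recovery."""
    started = [e.saga_id for e in log
               if e.step == "saga" and e.status == "started"]
    finished = {e.saga_id for e in log
                if e.step == "saga" and e.status == "ended"}
    return [saga_id for saga_id in started if saga_id not in finished]
```

After a crash, an SEC built along these lines could call `incomplete_sagas` on the stored log and then `steps_to_compensate` for each result to find out what still has to be undone.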

Before breaking down the above example, let’s examine these essential points about steps and how they are interacting with the SEC and each other.

Steps interact with the SEC by reporting when a Saga starts and ends, and when a step transaction has begun, ended, or failed. Steps interact with each other by triggering the transaction on the next step of the Saga.

Looking at our example, placing an order relies on three services: first the order is created, then the payment service is called, and last the products are allocated. The SEC receives an event when each of those services starts and ends a transaction, and each service receives a command to begin its transaction.

And last, the SEC is tracing actions by adding elements to the Saga Log.

Let’s review another example:

What do we have here? The Payment Service published an event reporting that the payment did not go through. With this notification, the SEC starts the compensation process. By reading the Saga Log, the SEC knows which transactions need to be compensated.
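A choreographed flow like this one can be sketched with a tiny in-memory event bus. Everything here is an assumption for illustration: the `EventBus` class stands in for a real message broker, the event names are made up, and the bus history plays the role of a very simplified Saga Log.

```python
from collections import defaultdict


class EventBus:
    """Hypothetical synchronous in-memory bus standing in for a broker."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.history = []  # simplified stand-in for the Saga Log

    def subscribe(self, event, handler):
        self.handlers[event].append(handler)

    def publish(self, event, payload):
        self.history.append(event)
        for handler in list(self.handlers[event]):
            handler(payload)


def wire_services(bus, payment_succeeds):
    state = {"order": None, "payment": None, "stock": None}

    def create_order(payload):
        state["order"] = "created"
        bus.publish("OrderCreated", payload)

    def take_payment(payload):
        if payment_succeeds:
            state["payment"] = "charged"
            bus.publish("PaymentCompleted", payload)
        else:
            state["payment"] = "failed"
            bus.publish("PaymentFailed", payload)

    def allocate_stock(payload):
        state["stock"] = "allocated"
        bus.publish("StockAllocated", payload)

    def cancel_order(payload):
        # Compensation of the order step, triggered by the failure event.
        state["order"] = "cancelled"
        bus.publish("OrderCancelled", payload)

    bus.subscribe("OrderRequested", create_order)
    bus.subscribe("OrderCreated", take_payment)
    bus.subscribe("PaymentCompleted", allocate_stock)
    bus.subscribe("PaymentFailed", cancel_order)
    return state
```

Publishing `OrderRequested` with `payment_succeeds=False` produces the event sequence OrderRequested, OrderCreated, PaymentFailed, OrderCancelled. No service calls another one directly; each one only reacts to events.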

Takeaways from Choreography pattern

By emitting events, participants know when to start their transactions.
Participants know about each other, or at least know when to react to an event produced by another participant.
The SEC could be embedded in each participant, since participant A could start its compensation by reacting to an event produced by participant B.
The Saga is successful when all participants complete their transactions.
The Saga is compensated when one of the participants emits an error.

Orchestration Pattern

One key difference between the Choreography and Orchestration implementations lies in the SEC. The SEC is now fully independent and responsible for managing the Saga from beginning to end.

As we can see now, steps receive a command message from the SEC to start a transaction, and in return, the step replies with the operation result.

The SEC decides whether to continue with the Saga based on the reply message:

A message indicating success will mean that we can issue a command message for the next step transaction.
A message indicating failure will start a compensation process issuing the corresponding commands.
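This decision logic can be sketched as a small state machine that turns each reply into the next commands to send. The `OrderSagaOrchestrator` class, the fixed step list, and the command tuples are hypothetical; a real SEC would also persist its progress in the Saga Log and send the commands through a message broker.

```python
class OrderSagaOrchestrator:
    """Sketch of an SEC that maps each reply to the next commands to send."""

    STEPS = ["order", "payment", "inventory"]  # assumed step order

    def __init__(self):
        self.completed = []
        self.status = "running"

    def start(self):
        # First command of the Saga.
        return [("execute", self.STEPS[0])]

    def on_reply(self, step, success):
        if success:
            self.completed.append(step)
            next_index = len(self.completed)
            if next_index == len(self.STEPS):
                self.status = "completed"
                return []
            return [("execute", self.STEPS[next_index])]
        # Failure: compensate the completed steps, most recent first.
        self.status = "compensating"
        return [("compensate", done) for done in reversed(self.completed)]
```

For example, after `start()`, a successful reply from the order step yields the command to execute the payment step, and a failed reply from the payment step yields a single command to compensate the order.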

Takeaways from Orchestration pattern

The SEC emits commands so participants can start their transactions.
The SEC reacts based on the events emitted by participants.
The SEC will move on to the next transaction when successful events are received or start compensating when an error is caught.
The Saga is successful when all participants complete their transactions.
The Saga is compensated when one of the participants emits an error.

Keep in mind

Atomicity is accomplished by either completing all transactions within the Saga or compensating all completed operations.
As we pointed out before, compensation has to be well designed and does not always mean deleting records in a DB.
By executing compensations, we restore the consistency of the information.
Isolation cannot be accomplished, since information becomes available before the whole Saga is completed.

Written by Luis Chong
