Saga pattern — Transactions and Data Consistency in distributed systems
The Saga pattern is a way of managing transactions and data consistency in a distributed system. It breaks a complex workflow into a series of smaller, independent local transactions, each paired with a compensating action that undoes its effect. Strictly speaking, the whole sequence is the saga, but in this article I also use “saga” loosely for the individual steps. If a step fails, the compensating actions for the steps that already completed are run, so the workflow either completes or is brought back to a consistent state.
Imagine that you have an existing feature that performs cross-domain updates in a relational database. In my case, I worked as a consultant on a fintech project where I helped design and implement a feature for merging checking accounts. In a traditional monolithic system, this type of cross-domain update could be handled within a single SQL transaction. However, in a distributed architecture with microservices, no single transaction spans all of the services involved, and it can be difficult to ensure data consistency. This is where the Saga pattern can be useful: it splits the work into smaller, independent transactions, or sagas, that can be executed and compensated for independently, so that the system is left in a consistent state even if one of the microservices fails.
Orchestration vs Choreography
Orchestration and choreography are two approaches for implementing the saga pattern and managing transactions in distributed systems.
Orchestration refers to a centralized approach, where a single entity, such as a saga coordinator, is responsible for managing the execution and compensation of the individual sagas. In this approach, the sagas communicate with the coordinator, which coordinates the execution and compensation of the sagas as needed.
Choreography, on the other hand, refers to a decentralized approach, where the individual sagas are responsible for communicating and coordinating with one another directly. In this approach, there is no central coordinator, and the sagas communicate with each other using a publish/subscribe model.
Both approaches have their own benefits and drawbacks, and which one is more suitable will depend on the specific needs of the system. Saga orchestration can be simpler to implement, as it relies on a central coordinator to manage the execution and compensation of the sagas. However, it can be less resilient, as the coordinator is a single point of failure. Choreography, on the other hand, is more resilient, as it does not rely on a central coordinator. However, it can be more complex to implement, as the sagas must be able to communicate and coordinate with one another directly.
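To make the choreographed variant more concrete, here is a minimal sketch using an in-memory publish/subscribe bus. The `EventBus` class, the event names and the `wire_services` helper are all hypothetical names I made up for illustration; a real system would use a message broker and separate services. Each handler only knows which events it reacts to and which events it emits, and nobody owns the overall flow.

```python
from collections import defaultdict
from typing import Callable


class EventBus:
    """Tiny in-memory publish/subscribe bus (stand-in for a real message broker)."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)
        self.log: list[str] = []  # every event published, in order

    def subscribe(self, event: str, handler: Callable[[dict], None]) -> None:
        self._handlers[event].append(handler)

    def publish(self, event: str, payload: dict) -> None:
        self.log.append(event)
        for handler in list(self._handlers[event]):
            handler(payload)


def wire_services(bus: EventBus, fail_references: bool = False) -> None:
    """Subscribe two hypothetical services; there is no central coordinator."""

    # Data service: transfers data when a merge is requested.
    def transfer_data(payload: dict) -> None:
        bus.publish("DataTransferred", payload)

    # Data service: compensates when a later step reports a failure.
    def undo_transfer(payload: dict) -> None:
        bus.publish("DataTransferReverted", payload)

    # Reference service: reacts to the data service's event.
    def update_references(payload: dict) -> None:
        if fail_references:
            bus.publish("ReferenceUpdateFailed", payload)
        else:
            bus.publish("ReferencesUpdated", payload)

    bus.subscribe("MergeRequested", transfer_data)
    bus.subscribe("DataTransferred", update_references)
    bus.subscribe("ReferenceUpdateFailed", undo_transfer)


bus = EventBus()
wire_services(bus, fail_references=True)
bus.publish("MergeRequested", {"source": "acc-1", "destination": "acc-2"})
print(bus.log)
# ['MergeRequested', 'DataTransferred', 'ReferenceUpdateFailed', 'DataTransferReverted']
```

Notice that the compensation path is spread across the subscriptions. That is the complexity trade-off described above: to understand the whole workflow you have to read every service's handlers.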
How?
To illustrate this, let’s consider the merge account example. To keep things simple, I have reduced the process to the following steps:
- Check that the accounts are eligible to be merged.
- Transfer the user information, statements, expenses, balances, and other data from the source account to the destination account.
- Update any references to the source account to point to the destination account.
- Close the source account.
- Notify the user.
Each of these steps can be represented as a separate saga, and they can be executed and compensated for independently. For example, with the “transfer data” step broken down into its parts, the workflow might consist of the following steps:
- Transfer the user information.
- Transfer the statements.
- Transfer the expenses.
- Transfer the balances.
- Update references from the source account to the destination account.
- Close the source account.
- Send the merge notification (push notification, email, SMS, etc.).
If any of these steps fails, the saga compensates by reversing the changes made in the previous steps, typically in reverse order. For example, it might roll back the transfer of the user information and restore the balances and expenses to their original accounts.
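The compensation behaviour described above can be sketched in a few lines. This is only an illustration under my own assumptions, not the code from the project: `Step`, `run_saga`, `record` and `fail` are hypothetical names, and the "services" are plain functions that write to a log. The idea is that every step carries its own undo action, and on failure the completed steps are undone in reverse order.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    action: Callable[[dict], None]      # does the work
    compensate: Callable[[dict], None]  # undoes the work


def run_saga(steps: list[Step], ctx: dict) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: list[Step] = []
    for step in steps:
        try:
            step.action(ctx)
            completed.append(step)
        except Exception as exc:
            ctx["log"].append(f"{step.name} failed: {exc}")
            for done in reversed(completed):
                done.compensate(ctx)
            return False
    return True


def record(message: str) -> Callable[[dict], None]:
    """Build a fake service call that just records what it did."""
    return lambda ctx: ctx["log"].append(message)


def fail(ctx: dict) -> None:
    """Simulate a microservice that is down."""
    raise RuntimeError("balance service unavailable")


steps = [
    Step("transfer_user_info", record("user info transferred"), record("user info restored")),
    Step("transfer_statements", record("statements transferred"), record("statements restored")),
    Step("transfer_balances", fail, record("balances restored")),
]
ctx = {"log": []}
ok = run_saga(steps, ctx)
print(ok)          # False
print(ctx["log"])
# ['user info transferred', 'statements transferred',
#  'transfer_balances failed: balance service unavailable',
#  'statements restored', 'user info restored']
```

The failed step itself is not compensated, because it never completed. This sketch also assumes that compensations always succeed; in a real system they can fail too, so they usually need retries of their own.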
In this example, I am implementing the Saga orchestration pattern in a distributed system with multiple microservices that are loosely coupled and do not have direct knowledge of one another. In this pattern, there is a centralized workflow orchestrator that manages the execution and compensation of the individual sagas. Each box in the workflow diagram represents an API, which may be implemented as part of different microservices. The orchestrator coordinates the execution and compensation of the sagas as needed, ensuring data consistency and integrity in the system.
The Saga pattern is particularly useful for managing complex, multi-step processes like account merging, as it allows for the individual steps to be executed and compensated for independently. This can help ensure that the overall process is completed successfully, even if one of the microservices fails.
Benefits, drawbacks and challenges
There are several benefits to using the Saga pattern in a distributed system. First and foremost, it helps to maintain data consistency and integrity, as each step has a compensating action that can undo it if the overall workflow cannot be completed. It also adds resiliency: if one microservice fails, the saga should roll back the previous steps so that the failure does not leave the workflow in a half-finished state. In addition, the Saga pattern can help to simplify the design and implementation of complex, multi-step workflows. By breaking the process down into smaller, independent transactions, it becomes easier to understand and manage the different steps of the process, and to identify and fix any issues that may arise. It also allows for greater flexibility and extensibility, as changing the workflow or adding extra steps is relatively easy.
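It’s important to note that the Saga pattern does not offer the same guarantees as traditional transaction management techniques, such as two-phase commit or XA transactions. A saga provides eventual consistency rather than atomicity and isolation across services, so intermediate states are visible to other requests while the saga is running. Each step still relies on the local transaction support of its own service’s database, and the saga coordinates those local transactions across the microservices.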
When implementing the Saga pattern in a distributed system, it’s important to carefully consider the design and execution of the individual sagas, in order to ensure that they are able to execute and compensate for one another as needed. Poorly designed sagas can lead to a complex workflow that is difficult to maintain and change, with hidden bugs that may cause data inconsistencies. It is also important to implement robust error handling and monitoring, so that failures are detected early and stuck or half-compensated workflows can be found and repaired. Without this care, the Saga pattern may give a false sense of having a transactional system without actually delivering the desired transactional effect.
One common challenge when implementing the Saga pattern is ensuring that the individual sagas are executed in the correct order, and that they are able to compensate for one another as needed. This may involve designing the sagas to be idempotent, so that they can be safely retried if necessary, and using techniques such as event sourcing to ensure that the state of the system is accurately reflected at all times. Another challenge is ensuring that the sagas are able to compensate for one another in a way that is consistent with the overall workflow. This may involve designing the sagas to be reversible, so that they can undo the changes made by previous sagas in the event of a failure.
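As an illustration of the idempotency point, here is a minimal sketch of a step that can be retried safely. It assumes the caller sends a unique idempotency key with each logical request. `BalanceService`, its `transfer` method and the key format are hypothetical, and the set of processed keys is kept in memory here, whereas a real service would store it durably alongside the data it protects.

```python
class BalanceService:
    """Hypothetical service whose transfer step is safe to retry."""

    def __init__(self, balances: dict[str, int]) -> None:
        self.balances = balances           # amounts in cents
        self._processed: set[str] = set()  # idempotency keys already applied

    def transfer(self, key: str, source: str, destination: str, amount: int) -> bool:
        """Apply the transfer once per key; return False for a duplicate request."""
        if key in self._processed:
            return False  # retry of a request that was already applied
        self.balances[source] -= amount
        self.balances[destination] += amount
        self._processed.add(key)
        return True


service = BalanceService({"acc-1": 100, "acc-2": 50})
first = service.transfer("merge-42:transfer-balance", "acc-1", "acc-2", 100)
retry = service.transfer("merge-42:transfer-balance", "acc-1", "acc-2", 100)
print(first, retry, service.balances)
# True False {'acc-1': 0, 'acc-2': 150}
```

If the orchestrator times out and sends the same request again, the balance is moved only once. The same idea applies to compensations, which may also be delivered more than once.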
Overall, the Saga pattern is a powerful tool for managing transactions and data consistency in a distributed system. By carefully designing and implementing the individual sagas, it is possible to ensure that the overall workflow is completed successfully, and to maintain data consistency and integrity in the system.