Transactions in software systems
Transaction
A simple question to start with : What is a transaction?
Simply put, a transaction is an exchange of good or service. Now when applied to a software system, a transaction would be define as an operation to be done on an entity.
An entity?
An entity is a fundamental unit of a real world thing in a software system. Ex — financial entity like money, asset entity like a property, etc.
When an operation needs to be done on an entity, it needs to be done with a set of criterion that needs to be met with, which would ensure that the entity state before, during and after the operation is deterministic.
We generally talk about transactions in software systems in the context of the transactions provided by persistence layer i.e. databases. This is not the only definition though. This statement is more of the time when relational DB and SQL DB were the only solution and software services were confined to a single monolithic services. The well known properties of relational DB are the ACID properties. The overview of ACID is as follows :
- Atomicity — Transactions either happens successfully or fails. There is no other states in between.
- Consistency — This refers to the integrity of the data so that database is consistent before and after a transaction.
- Isolation — Each transaction executes independently. Multiple transactions occurring do not interfere with each other.
- Durability — Once a transaction is successfully completed, it stays in the system even if there is a system failure.
Some well known criterion about a transaction I find interesting :
- For a transaction to be completed and database changes to made permanent, a transaction has to be completed in its entirety.
- When a transaction completes successfully, database changes are said to be committed; when a transaction does not complete, changes are rolled back.
- There are different levels of ACID properties that can be configured mainly for consistenct and isolation.
- Transactions can have multistep process for effectively managing them.
Monolithic system
When your service is a monolithic service using a SQL database, you can pretty much assume the ACID guarantees from the DB. The transaction management can be handled by your code using the DB transaction management and hence your entity states are managed.
The Biggest advantage for Monolithic application in transaction management is a single and common database server. Transactions can be initiated at the database level and can be committed or rolled back based on the final outcome of the transaction.
Before we discuss about transactions across distributed boundaries, lets first understand how transactions happen theoretically in SQL servers. SQL Server can operate 3 different transactions modes and these are:
- Autocommit Transaction mode is the default transaction for the SQL Server. In this mode, each T-SQL statement is evaluated as a transaction and they are committed or rolled back according to their results. The successful statements are committed and the failed statements are rolled back immediately
- Implicit transaction mode enables to SQL Server to start an implicit transaction for every DML statement but we need to use the commit or rolled back commands explicitly at the end of the statements
- Explicit transaction mode provides to define a transaction exactly with the starting and ending points of the transaction
Distributed Systems
But when you work in a distributed setup with multiple microservices having their own set of domain and bounded contexts. Microservices guidelines strongly recommend you to use the Single Repository Principle(SRP), which means each microservice maintains its own database and no other service should access the other service’s database directly. The entity and its shared or related data now spans over multiple microservices.
Distributed Transaction Approaches in microservices
- Two-Phase Commit Protocol
- Eventual Consistency and Compensation
2PC — 2 phase commit
A two-phase commit is a standardized protocol that ensures that a atomic transaction commit across multiple nodes is implementing in the situation where a commit operation must be broken into two separate parts. Saving data changes is known as a commit and undoing changes is known as a rollback. There are two phases in the algorithm — prepare and commit. The algorithm comprises of a coordinating node which initiates the transaction :
- Prepare : All the participants of a transaction in this phase will be prepared for the commit and inform the transaction coordinator/message broker that they are ready for completing the transaction
- Commit or Rollback : In this phase, transaction coordinator will issue one of the commands they are a commit or a rollback to all the participants.
Both the phases can be achieved comparatively easily using transaction logging when a single server is involved, but when the data is spread across geographically-diverse servers in distributed computing (i.e., each server being an independent entity with separate log records), the process can become more tricky.
One really interesting use case is to implement exactly-once message processing using 2PC. Please read this article: “An Overview of End-to-End Exactly-Once Processing in Apache Flink” if you are interested why two phase commit algorithm must be used to achieve the exactly once semantic.
Eventual Consistency and Compensation
This model of distributed transaction arises from the concept of relaxing C in CAP theorem. Compromising on consistency and trusting the dependent domains to take the right steps based on the transaction nature. The impact of this is that the data across nodes or domains would be updated but with some delay essentially eliminating the guarantee of the freshness of data. We need to make sure that system should be eventually consistent at some point of time in the future. This model doesn’t force to use ACID transactions across microservice but forces some mechanism for ensuring for consistency.
Each service which involved in the transaction should be responsible to update the user with the proper status of the transaction even if next consecutive services are failed to respond and should handle them whenever services are up and make sure all the scheduled transactions are completed and data in the system is consistent.
Saga Pattern
Coming to microservices, 2PC is not an option. We employ something called Saga pattern. Saga pattern defines how interaction between different microservices needs to be carried out for transaction propagation and transactional reversal. The two types of Saga pattern are choreography and orchestration pattern.
In a saga pattern. we implement each business transaction that spans multiple services. A saga is a sequence of local transactions. Each local transaction updates the database and publishes a message or event to trigger the next local transaction in the saga. If a local transaction fails because it violates a business rule then the saga executes a series of compensating transactions that undo the changes that were made by the preceding local transactions.
For understanding Saga pattern better, we will assume the case of an
1. Order service
2. Customer service
3. Fulfillment service.
Choreography
Choreography pattern says that each pair of interacting microservices is responsible for its transaction commit or rollback and contracts.
Orchestration
Orchestration pattern has a orchestrator or coordinating service which tells the involved services what is to be done and the order in which it is to be done.
Benefits
- It enables an application to maintain data consistency across multiple services without using distributed transactions
Drawbacks
- The programming model is more complex.There needs to be compensating transactions that explicitly undo changes made earlier in a saga.
The post tries to help people gain better understanding of transactions in software systems addressing both the architectural styles. The complexity is very different in both the architectures. The approaches to manage Transaction propagation and reversal with details about the pitfalls and sample examples. The following are the non functional aspects of the microservices architecture which make distributed transactions way more complex :
1. The multiple point of failures in the transaction path.
2. Multiple modes of inter-service communication.
3. Availability of systems in the transaction path.
4. Reliability and Resilience of the transaction path.
5. Scalability across all points in the transaction path.
In the next post, I will be discussing about consistency management in distributed systems. As part of this, I will be discussing about some famous consensus protocols like — paxos and gossip, which are used in some famous distributed systems like kafka, cassandra db.