Problem Solving Strategies for Microservice Architecture Part I

Distributed Transaction Management Problem

Metin Barkın Narin

7 min read · Aug 4, 2022

What is the problem?

Distributed transaction management problem.

How to handle a transaction across multiple services?

Context

A microservices-based application is a distributed system. A single transaction is spread across multiple services that are called sequentially or in parallel to complete the entire transaction. With a microservices architecture, the most common pattern is database per microservice, so transactions also need to span different databases.

With the advent of microservices architecture, there are two key problems with respect to distributed transaction management:

Sub Problems

How to maintain a transaction’s atomicity. Atomicity implies that all of the steps in the transaction must be successful or if a step fails, then all of the previously completed steps should be rolled back. However, in a microservices architecture, a transaction can consist of multiple local transactions handled by different microservices. Therefore, if one of the local transactions fails, how can you roll back the successful transactions that were previously completed?
How to manage the transaction isolation level for concurrent requests. The transaction isolation level specifies how much data is visible to a statement in a transaction, specifically when the same data source is accessed by multiple service calls simultaneously. If an object from one of the microservices is being persisted in the database while another request reads the same object at the same time, should the service return the old data or the new data?
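To make the atomicity problem concrete, here is a minimal Python sketch. It assumes two hypothetical in-memory services, OrderService and PaymentService, that each own their data; none of these names come from a real framework. The first local transaction commits, the second fails, and nothing undoes the first.

```python
class OrderService:
    """Hypothetical service with its own local 'database' (a dict)."""

    def __init__(self):
        self.orders = {}

    def create_order(self, order_id, amount):
        # Local transaction: effectively committed once this method returns.
        self.orders[order_id] = {"amount": amount, "status": "CREATED"}


class PaymentService:
    """Hypothetical service with a separate local 'database' (a balance)."""

    def __init__(self, balance):
        self.balance = balance

    def charge(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount


def place_order(orders, payments, order_id, amount):
    orders.create_order(order_id, amount)  # step 1 commits locally
    payments.charge(amount)                # step 2 may fail afterwards


orders = OrderService()
payments = PaymentService(balance=50)
try:
    place_order(orders, payments, "o-1", 100)
except ValueError:
    pass

# Step 2 failed, but step 1 is still committed: the order exists unpaid.
print(orders.orders)
print(payments.balance)
```

The order record survives even though the payment never happened, which is exactly the inconsistency that the approaches in this article try to prevent.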

How to solve this?

It is crucial to address these two problems while designing microservices-based applications. Below are the two approaches to address these problems:

Two-phase commit (2PC)
Saga

Two-phase commit (2PC)

Two-phase commit is a well-known pattern in database systems. This pattern can also be used for microservices to implement distributed transactions. In a two-phase commit, there is a controlling node that houses most of the logic and participating nodes (microservices) on which the actions are performed. It works in two phases:

Prepare phase (Phase 1): The controlling node asks all of the participating nodes if they are ready to commit. The participating nodes respond with yes or no.
Commit phase (Phase 2): If all of the nodes replied in the affirmative, then the controlling node asks them to commit. If even one node replies in the negative, the controlling node asks all of them to roll back.
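The two phases can be sketched in a few lines of Python. This is only an in-memory illustration under simplifying assumptions: the Participant class and its can_commit flag are hypothetical, and there is no networking, timeout handling, or durable log as a real coordinator would need.

```python
class Participant:
    """Hypothetical participant; 'can_commit' simulates its vote."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "INIT"

    def prepare(self):
        # Phase 1: vote yes or no. A real node would also lock resources here.
        self.state = "PREPARED" if self.can_commit else "ABORTED"
        return self.can_commit

    def commit(self):
        self.state = "COMMITTED"

    def rollback(self):
        self.state = "ROLLED_BACK"


def two_phase_commit(participants):
    """Return True if the transaction committed, False if it was rolled back."""
    # Phase 1 (prepare): collect a vote from every participant.
    votes = [p.prepare() for p in participants]

    # Phase 2: commit only if every vote was yes, otherwise roll everyone back.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False


happy = [Participant("order"), Participant("payment")]
unhappy = [Participant("order"), Participant("payment", can_commit=False)]

print(two_phase_commit(happy), [p.state for p in happy])
print(two_phase_commit(unhappy), [p.state for p in unhappy])
```

Note that the coordinator function waits for every vote before moving on, so a single slow or negative participant decides the outcome for all of them.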

Even though 2PC can provide transaction management in a distributed system, the coordinator becomes a single point of failure because the responsibility for the whole transaction falls on it. The extra round of messages also hurts overall performance. Because every participant has to talk to the coordinator, the whole system is bound by its slowest resource: any node that is ready has to wait for confirmation from a slower node. Typical coordinator implementations are also synchronous, which reduces throughput as the load grows. In summary, 2PC has the following shortcomings:

If one microservice becomes unavailable during the commit phase, there is no mechanism to roll back the transactions that the other services have already committed.
Other services must wait until the slowest service finishes its confirmation. The resources used by the services are locked until the whole transaction is complete.
Due to their dependence on the transaction coordinator, two-phase commits are slow by design. This can cause scalability issues, particularly in a microservices-based application and in a roll-back scenario involving many services.

Saga

A Saga, as described by Hector Garcia-Molina and Kenneth Salem in their 1987 Association for Computing Machinery article, is a sequence of operations that together perform a specific unit of work and that can generally be interleaved with the operations of other Sagas. Every operation that is part of the Saga can be rolled back by a compensating action. The Saga guarantees that either all operations complete successfully or the corresponding compensating actions are run for all executed operations to roll back any work previously done.

A compensating action must be idempotent and must be retryable until it executes successfully. In effect, it is an action that cannot fail, so no manual intervention is needed to resolve its failure. The Saga Execution Coordinator (SEC) provides that guarantee and capability to the overall flow, making it a transaction that is either successful or aborted successfully with the necessary rollbacks.
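The two requirements on a compensating action, idempotency and retryability, can be illustrated with a short Python sketch. RefundService, its fail_times parameter, and run_until_success are hypothetical names invented for this example. The retry loop is bounded here only to keep the example finite; a real coordinator would typically also back off between attempts and persist its progress.

```python
class RefundService:
    """Hypothetical payment service exposing an idempotent compensation."""

    def __init__(self, fail_times=0):
        self.refunded = set()         # payment ids that were already refunded
        self.balance = 0
        self.fail_times = fail_times  # number of simulated transient failures

    def refund(self, payment_id, amount):
        if self.fail_times > 0:
            self.fail_times -= 1
            raise ConnectionError("transient failure")
        if payment_id in self.refunded:
            return  # already applied: repeating the call has no effect
        self.refunded.add(payment_id)
        self.balance += amount


def run_until_success(action, max_attempts=5):
    """Retry the action on transient errors; return the attempts used."""
    for attempt in range(1, max_attempts + 1):
        try:
            action()
            return attempt
        except ConnectionError:
            continue
    raise RuntimeError("compensation did not succeed")


service = RefundService(fail_times=2)
attempts = run_until_success(lambda: service.refund("p-1", 100))
run_until_success(lambda: service.refund("p-1", 100))  # duplicate delivery

print(attempts, service.balance)
```

Because the refund is keyed by the payment id, the duplicate call leaves the balance unchanged, which is what makes blind retries safe.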

How does the Saga pattern help in a distributed transaction scenario?

Microservices introduced another set of problems for managing transactions, as each of the domain-driven services is deployed individually and runs in isolation. With a microservices architecture, a single business process brings multiple microservices together to provide an overall solution. It is very difficult to implement ACID (Atomicity, Consistency, Isolation, Durability) transactions using a microservices architecture, and in some cases it is impossible. For example, in an e-commerce application, a microservice with the coupon functionality can’t acquire a lock on the payment database, since payment is an external service in most cases. But some form of transaction management is still required, so these transactions are referred to as BASE transactions: Basically Available, Soft state, and Eventually consistent. Compensating actions must be taken to revert anything that occurred as part of the transaction.

This is where the Saga pattern fits perfectly, as it helps to:

Maintain data consistency across multiple microservices without tight coupling.
Perform better compared to 2PC.
Offer no single point of failure.
Keep the overall state of the transaction eventually consistent.

Different ways to implement the Saga pattern

There are two logical ways to implement the Saga pattern: choreography and orchestration.

Choreography

In the Saga choreography pattern, each individual microservice that is part of a process publishes an event that is picked up by the next microservice in the flow. You must decide early in the microservice development lifecycle whether a service will be part of a Saga, since you must choose an appropriate framework to help implement this pattern. To adopt a particular framework, the microservice's code must be changed with annotations, class initializations, or other configuration changes. In the Saga choreography pattern, the SEC can be embedded within the microservice, but in most scenarios it is a standalone component.

Whenever a service comes up, it registers itself with the SEC, which makes it available to take part in a transaction that may span various microservices. The SEC maintains the sequence of events in its log, which helps it decide which compensating services to call in case of failure, and in what order.
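The event flow of a choreographed Saga can be sketched as follows. This is a simplified illustration, assuming a hypothetical in-memory EventBus in place of a real message broker and two made-up services; the event names (OrderCreated, PaymentCompleted, PaymentFailed) are also assumptions. The bus keeps an ordered log of events, loosely mirroring the log described above.

```python
from collections import defaultdict


class EventBus:
    """Hypothetical in-memory stand-in for a message broker."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.log = []  # ordered record of published event types

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        self.log.append(event_type)
        for handler in self.handlers[event_type]:
            handler(payload)


class OrderService:
    def __init__(self, bus):
        self.bus = bus
        self.orders = {}
        bus.subscribe("PaymentCompleted", self.on_payment_completed)
        bus.subscribe("PaymentFailed", self.on_payment_failed)

    def create_order(self, order_id, amount):
        self.orders[order_id] = "PENDING"
        self.bus.publish("OrderCreated", {"order_id": order_id, "amount": amount})

    def on_payment_completed(self, event):
        self.orders[event["order_id"]] = "CONFIRMED"

    def on_payment_failed(self, event):
        # Compensating action: undo the order that was created earlier.
        self.orders[event["order_id"]] = "CANCELLED"


class PaymentService:
    def __init__(self, bus, balance):
        self.bus = bus
        self.balance = balance
        bus.subscribe("OrderCreated", self.on_order_created)

    def on_order_created(self, event):
        if event["amount"] <= self.balance:
            self.balance -= event["amount"]
            self.bus.publish("PaymentCompleted", event)
        else:
            self.bus.publish("PaymentFailed", event)


bus = EventBus()
orders = OrderService(bus)
payments = PaymentService(bus, balance=100)
orders.create_order("o-1", 80)  # payment succeeds
orders.create_order("o-2", 50)  # only 20 left, so payment fails

print(orders.orders)
print(bus.log)
```

No component here knows the whole flow: each service only reacts to the events it subscribed to, which is the defining trait of choreography.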

Choreography implementation is preferred when the number of microservices that will participate in the distributed transaction is between 2 and 4. In the case of more than 4 services, it is more appropriate to apply the Orchestration implementation, which we will examine in the rest of our article.
The Saga choreography pattern is ideal when you start your microservices journey (essentially a greenfield development) and understand that it is necessary to introduce process microservices in due course.

DISADVANTAGES OF CHOREOGRAPHY

It becomes difficult to keep track of which service is listening on which queue. Adding new services can be difficult and confusing.
There is a risk of circular dependency between services as they consume each other’s queues.
Integration testing is difficult as all services must be running to simulate a process.

Orchestration

As the name of the Saga orchestration pattern suggests, there is a single orchestrator component that is responsible for managing the overall process flow. If the process encounters an error while calling any individual microservice, then it is responsible for calling the compensating service too. The orchestrator helps model the Saga flow but also relies on the underlying framework to call the services in a sequence and make compensating calls if any of the services fail.
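A minimal Python sketch of this idea is shown below. The SagaOrchestrator class, its add_step method, and the step functions are hypothetical and run in a single process; a real orchestrator would call remote services and persist its state between steps.

```python
class SagaOrchestrator:
    """Hypothetical orchestrator: runs steps in order, compensates in reverse."""

    def __init__(self):
        self.steps = []  # list of (name, action, compensation)
        self.log = []

    def add_step(self, name, action, compensation):
        self.steps.append((name, action, compensation))
        return self

    def execute(self):
        completed = []
        for name, action, compensation in self.steps:
            try:
                action()
                self.log.append(f"{name}:done")
                completed.append((name, compensation))
            except Exception:
                self.log.append(f"{name}:failed")
                # Undo every completed step, most recent first.
                for done_name, undo in reversed(completed):
                    undo()
                    self.log.append(f"{done_name}:compensated")
                return False
        return True


state = {"order": None, "stock": 5}

def create_order(): state["order"] = "CREATED"
def cancel_order(): state["order"] = "CANCELLED"
def reserve_stock(): state["stock"] -= 1
def release_stock(): state["stock"] += 1
def charge_card(): raise RuntimeError("card declined")
def refund_card(): pass

saga = (SagaOrchestrator()
        .add_step("order", create_order, cancel_order)
        .add_step("stock", reserve_stock, release_stock)
        .add_step("payment", charge_card, refund_card))
ok = saga.execute()

print(ok, state)
print(saga.log)
```

The services themselves stay unaware of each other; only the orchestrator knows the order of the steps and which compensation belongs to which step.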

ADVANTAGES OF ORCHESTRATION

It is ideal for complex workflows where many services are present and services are added over time. As a rule of thumb, the orchestration implementation is preferred when more than 4 services take part.
It provides central control over each service and its activities.
Since the orchestrator depends on the Saga participants (services) but the participants do not depend on the orchestrator or on each other, there are no circular dependencies.
A service does not need to know anything about the other services, which gives a clean separation of concerns.
It is easier to implement and test than choreography implementation.
Since the work done will remain linear, undo or compensation management is easier.

The main disadvantage of the orchestration implementation is that the logic of the entire workflow is concentrated in the Saga orchestrator.

As a Result

As can be seen, the complexity increases as the number of services increases, regardless of which implementation is used in Saga, be it choreography or orchestration.
Debugging a Saga is quite difficult.
Changes committed to the local databases of the services in a Saga cannot be rolled back automatically; they can only be reversed by compensating actions.
Lost updates can occur if one Saga overwrites data without reading the changes made by another Saga.
Dirty reads can occur when one Saga reads updates made by another Saga that has not yet completed.
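The last two anomalies are easy to reproduce. The following Python sketch simulates the interleavings by hand on plain dictionaries; the records, the Saga names, and the amounts are all made up for illustration.

```python
# Lost update: two Sagas read the same value before either one writes.
inventory = {"item-1": 10}  # hypothetical record owned by one service

read_by_a = inventory["item-1"]
read_by_b = inventory["item-1"]

inventory["item-1"] = read_by_a - 3  # Saga A reserves 3 units
inventory["item-1"] = read_by_b - 2  # Saga B overwrites A's change

print(inventory["item-1"])  # 8, although only 5 units should remain

# Dirty read: Saga D sees a change that Saga C later compensates.
account = {"balance": 100}

account["balance"] -= 40        # Saga C debits; its local transaction commits
seen_by_d = account["balance"]  # Saga D reads 60 while C is still running
account["balance"] += 40        # a later step of C fails, so C compensates

print(seen_by_d, account["balance"])  # D acted on 60, the final value is 100
```

In both cases every local transaction was valid on its own; the anomaly comes only from the lack of isolation between Sagas.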

In short, the Saga pattern solves the distributed transaction problem, but it must be implemented carefully.

You can review the open source sample Saga project that I prepared with my friend.

Thanks for Reading…