Saga Pattern in Microservices

An Architectural Pattern for Managing Microservice Transactions and Data Consistency

Published in

Ralali Tech Stories

7 min read · Jun 2, 2021

Data is the foundation of all software success stories. Whether software succeeds or fails depends on how dependably and effectively it maintains data consistency. In a monolithic architecture, data integrity is comparatively easy to maintain. The effort ranges from simple to difficult, but even in the most complex monolithic systems we can still handle and maintain data consistency without much trouble.

Two-phase commit (2PC) is a standard protocol that splits a commit operation into two parts: a prepare phase, in which every participant confirms that it is able to commit, and a commit phase, in which all participants either commit or roll back. In a monolith, data consistency is commonly handled with database transactions, and with 2PC when more than one resource is involved. The requirement is that all operations that touch two or more tables, such as insert, update, delete, or any combination of the three, commit or roll back together. In essence, the data must be processed in its entirety or not at all. Transaction support of this kind is available in most major databases and in the transaction APIs of most major programming languages.
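To make the two phases concrete, here is a minimal in-memory sketch. It is not how any particular database implements 2PC; the `Participant` class, its `can_commit` flag, and the `two_phase_commit` function are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical resource (for example one table) taking part in a commit."""
    name: str
    can_commit: bool = True
    state: str = "idle"

    def prepare(self) -> bool:
        # Phase 1: vote yes only if the local change can be committed.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "rolled_back"


def two_phase_commit(participants: list[Participant]) -> bool:
    """Commit everywhere or nowhere; returns True when the commit succeeded."""
    # Phase 1 (prepare): collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    if all(votes):
        # Phase 2 (commit): everyone voted yes, so all of them commit.
        for p in participants:
            p.commit()
        return True
    # Phase 2 (abort): at least one "no" vote, so all of them roll back.
    for p in participants:
        p.rollback()
    return False
```

A single participant that cannot commit is enough to roll back every other participant, which is the "in its entirety or not at all" property described above.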

The story is different when we face a system built as microservices or as a distributed system. Maintaining data consistency is very difficult in this situation, because the data can be spread across several services that are separate from each other.

Because the tables are in different databases and each database is managed by only one service, two-phase commit is usually not a practical option. The Saga Pattern is a concept that addresses data consistency in microservices.

The Saga Pattern itself is not new: it was published in 1987 by Hector Garcia-Molina and Kenneth Salem, and it is now used in many microservice architectures. The Saga pattern supports a long-running transaction by breaking it up into a collection of sub-transactions that can be interleaved in any way with other transactions.

Suppose the architecture of the e-commerce system is as shown below.

At least four services are involved in an order process. Every service has its own database, and all data that is related across services must be kept consistent.

Another example is the order process in a hotel or food market management system. The order process that occurs can be described as follows.

In the process referred to above, the order process looks up data from the consumer table through the consumer service and, at the same time, writes data to the kitchen and accounting tables through their respective services. Whether the system is built as microservices or as a monolith, handling data consistency in this scenario is very difficult, because the tables are in separate databases.

Distributed transactions, better known as global transactions, do not work well when we use modern NoSQL databases at scale that offer limited or no transaction support, such as Cassandra. Note that some NoSQL databases have added transaction support of their own; MongoDB is one example.

Meanwhile, when the databases are transactional (RDBMS) and the tables live in separate databases, data consistency can still be achieved through global transactions. This is possible because these databases support XA transactions.

Another problem with distributed transactions is that they are carried out synchronously, so if the application is hit by a large number of users, the synchronous processing reduces the availability and performance of the application. Thus, distributed transactions are still not the best solution.

Transactional Data with the Saga Pattern

A saga is a sequence of local transactions, where each transaction runs on the local system of one service. The first transaction is initiated by an external request that corresponds to an operation of the system, and each subsequent step is then triggered by the completion of the previous one.
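As a rough illustration of "a sequence of local transactions", the sketch below runs steps one after another and, if one fails, undoes the completed steps in reverse order with compensating actions. The `run_saga` function, the `Step` tuple layout, and the log strings are assumptions made for this example, not part of any library.

```python
from typing import Callable

# A step is (name, local transaction, compensating action).
Step = tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: list[Step], log: list[str]) -> bool:
    """Run each local transaction in order; undo completed ones on failure."""
    completed: list[Step] = []
    for step in steps:
        name, action, _compensate = step
        try:
            action()  # the local transaction of one service
            log.append(f"done:{name}")
            completed.append(step)
        except Exception:
            log.append(f"failed:{name}")
            # Compensate in reverse order of completion.
            for done_name, _action, compensate in reversed(completed):
                compensate()
                log.append(f"undone:{done_name}")
            return False
    return True
```

Unlike a database rollback, nothing here is undone automatically: each step has to supply its own compensating action.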

If we apply it to the e-commerce case above, the order process that has applied the saga pattern will be like the following picture.

The saga pattern can be implemented in a variety of ways, but event choreography and command orchestration are the most commonly used. For now, we'll focus on the simpler of the two: choreography.

Events or Choreography

The main feature of this implementation of the saga pattern is that the process has no central coordinator. Each service publishes and listens for events, and decides for itself whether or not to take action. The saga ends when the last service involved completes its local transaction and publishes no further event.

If we describe this choreography process we can see it as shown below.

Order Service stores the new order, sets its status to pending, and at the same time publishes an event named, for example, ORDER_CREATED.
Payment Service listens for the ORDER_CREATED event, debits the client, and publishes the BILLED_ORDER event. This step can also be triggered by a payment made by the client.
Stock Service listens for the BILLED_ORDER event, updates the stock, prepares the purchased product for delivery, and publishes the ORDER_PREPARED event.
Delivery Service listens for the ORDER_PREPARED event, then picks up and delivers the product. When finished, it publishes ORDER_DELIVERED.
Finally, Order Service listens for the ORDER_DELIVERED event and sets the state of the order to FINISH, meaning the entire order has been completed.
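The steps above can be sketched with a tiny in-memory event bus. This is only an illustration: the `EventBus` class, the `wire_services` and `create_order` helpers, and order fields such as `paid` are hypothetical, and a real system would use a message broker with asynchronous delivery instead of direct function calls.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class EventBus:
    """In-memory stand-in for a message broker (an assumption of this sketch)."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)
        self.history: list[str] = []  # events in publication order

    def subscribe(self, event: str, handler: Handler) -> None:
        self._handlers[event].append(handler)

    def publish(self, event: str, order: dict) -> None:
        self.history.append(event)
        for handler in list(self._handlers[event]):
            handler(order)


def wire_services(bus: EventBus) -> None:
    """Each service reacts to one event and publishes the next one."""

    def payment_service(order: dict) -> None:
        order["paid"] = True  # debit the client
        bus.publish("BILLED_ORDER", order)

    def stock_service(order: dict) -> None:
        order["stock_reserved"] = True  # update stock, prepare delivery
        bus.publish("ORDER_PREPARED", order)

    def delivery_service(order: dict) -> None:
        order["delivered"] = True  # pick up and deliver the product
        bus.publish("ORDER_DELIVERED", order)

    def order_service_on_delivered(order: dict) -> None:
        order["status"] = "FINISH"  # publishes nothing, so the saga ends

    bus.subscribe("ORDER_CREATED", payment_service)
    bus.subscribe("BILLED_ORDER", stock_service)
    bus.subscribe("ORDER_PREPARED", delivery_service)
    bus.subscribe("ORDER_DELIVERED", order_service_on_delivered)


def create_order(bus: EventBus, order_id: int) -> dict:
    """Order Service: store the order as pending, then publish ORDER_CREATED."""
    order = {"id": order_id, "status": "PENDING"}
    bus.publish("ORDER_CREATED", order)
    return order
```

Calling `wire_services(bus)` and then `create_order(bus, 1)` walks through the whole chain. Note that no function knows the full workflow; the order of steps emerges only from who subscribes to which event.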

In the case above, tracking everything that has happened is easy: the Order Service can listen to all events and update the state or status of the order as each one occurs.

Rollback System on Choreography Implementation

With choreography, rollback in the saga pattern is accomplished by publishing an event whose purpose is to undo the work done by the previous steps. As a result, rollback is not free: we must write additional code to handle it.

For example, if we try to update the stock data and it turns out that the product is out of stock, the rollback proceeds as follows.

Stock Service publishes the PRODUCT_OUT_OF_STOCK event.
The two previous services, Order and Payment, listen for the PRODUCT_OUT_OF_STOCK event and each carry out a compensating step:

Payment Service refunds the client.
Order Service sets the order status to ORDER_FAILED.
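The rollback flow above could look roughly like the following sketch. The module-level `subscribe` and `publish` helpers stand in for a real broker, and the `quantity`, `available`, and `refunded` fields are invented for this example.

```python
from collections import defaultdict

handlers = defaultdict(list)  # event name -> list of callbacks
published = []                # events in publication order


def subscribe(event: str, handler) -> None:
    handlers[event].append(handler)


def publish(event: str, order: dict) -> None:
    published.append(event)
    for handler in list(handlers[event]):
        handler(order)


def stock_service(order: dict) -> None:
    # Hypothetical stock check: "available" is a field made up for this sketch.
    if order["quantity"] > order["available"]:
        publish("PRODUCT_OUT_OF_STOCK", order)
    else:
        publish("ORDER_PREPARED", order)


def payment_refund(order: dict) -> None:
    order["refunded"] = True  # compensates the earlier debit


def order_mark_failed(order: dict) -> None:
    order["status"] = "ORDER_FAILED"  # compensates the pending order


subscribe("BILLED_ORDER", stock_service)
subscribe("PRODUCT_OUT_OF_STOCK", payment_refund)
subscribe("PRODUCT_OUT_OF_STOCK", order_mark_failed)
```

Publishing `BILLED_ORDER` for an order whose `quantity` exceeds `available` triggers both compensating handlers. Each of them is extra code that exists only for the failure path, which is the cost mentioned above.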

Advantages and Disadvantages of the Event or Choreography

Event choreography is a natural approach to implementing the saga pattern. The method is simple and easy to understand, and the services involved in a transaction remain largely independent of one another. If our transactions have only two to five steps, this is likely the most suitable approach.

However, if our transactions have many steps and the process is likely to change often in the future, the transaction flow becomes difficult to monitor. Testing also becomes very difficult, because all of the services must be kept running for the tests to work properly.

Under those conditions, the saga pattern with a Command or Orchestration implementation is an approach that can be used instead.

Saga Pattern on Big Agent Microservice Architecture

To date, Big Agent has about eight backend services built on the microservice architecture. All services communicate with each other asynchronously and in a non-blocking way via Kafka, as well as through direct REST API calls.

Because transactions are distributed across services, we implemented the Saga Pattern to maintain data consistency between them. With it, data stays consistent across services, and each local transaction keeps its atomicity, consistency, isolation, and durability (ACID) guarantees.

Another advantage of implementing the Saga Pattern is the significantly improved response time of all of our critical APIs. Initially, when Big Agent ran as a monolith, or as microservices without the saga pattern, the average response time of a service was greater than 250 ms, which lowered the availability of the system.

After we implemented the saga pattern together with asynchronous, non-blocking processing, the average response time of each API dropped below 50 ms. For example, in our new digital goods integrator service, the digital goods purchase process takes only 20 ms on average.

Finally, this performance increase improves the user experience for Big Agent users.

Closing

We are currently looking for Senior BE, FE, ME, and QE candidates. If you want to learn more about this and work in a good cultural environment with a global technology stack, please contact me at deni.rizal@ralali.com or careers@ralali.com.