Saga Pattern for Orchestrating Distributed Transactions Using AWS Step Functions
In this article, we are going to implement the Saga pattern for orchestrating distributed transactions using AWS Step Functions.
By the end of the article, we will have worked through a hands-on lab on the Saga pattern with AWS Step Functions and seen how to coordinate microservices with the orchestration style of the Saga pattern.
I have just published a new course — AWS Lambda & Serverless — Developer Guide with Hands-on Labs.
Introduction: Saga Pattern for Orchestrating Distributed Transactions Using AWS Step Functions
Let's start with the fundamentals of this topic by recalling what microservices are and how they communicate. Microservices are organized around business capabilities, with a separate database per microservice. They are loosely coupled and use asynchronous messaging to communicate with each other. They are also independently deployable.
Let's think about microservices in the e-commerce domain and analyze the order fulfillment process.
- The customer places an order through a frontend channel, API Gateway invokes the Order microservice, and the order fulfillment process starts
- The Order microservice checks with the Inventory microservice whether the item is available
- If the item is available, it is deducted from the inventory and the flow moves on to the Payment microservice
- If the payment succeeds, the Delivery microservice takes over; when the delivery is done, the order status is set to completed
- If any of these steps fails, the previous steps should be rolled back and the data returned to its previous state
How can we set up this workflow in a microservices architecture? By using the Saga pattern. So now it is a good time to understand what the Saga pattern is.
Saga Pattern for Distributed Transactions
The Saga pattern is used to manage data consistency across microservices in distributed transaction cases.
Basically, the Saga pattern creates a set of transactions that update microservices sequentially and publish events to trigger the next transaction in the next microservice. If one of the steps fails, the saga triggers rollback transactions, which perform the reverse operations by publishing rollback events to the previous microservices.
This is how it manages distributed transactions across microservices. The Saga pattern builds on other principles, such as the publish/subscribe pattern with brokers or the API composition pattern. It provides transaction management by using a sequence of local transactions. Every microservice has its own database and can manage its local transaction atomically with strict consistency.
So the Saga pattern groups these local transactions and invokes them sequentially, one by one. Each local transaction updates the database and publishes an event to trigger the next local transaction. If one of the steps fails, the saga triggers a set of compensating transactions that roll back the changes made in the previous microservices and restore data consistency.
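The sequence described above can be sketched in a few lines of Python. This is a minimal, in-memory illustration, not the implementation used in the lab: `SagaStep`, `run_saga`, and the three example services are hypothetical names, and a plain dict stands in for the separate database of each microservice.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, object]


@dataclass
class SagaStep:
    """One local transaction plus the compensating transaction that undoes it."""
    name: str
    action: Callable[[State], None]
    compensate: Callable[[State], None]


def run_saga(steps: List[SagaStep], state: State) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[SagaStep] = []
    for step in steps:
        try:
            step.action(state)
            completed.append(step)
        except Exception as exc:
            state["error"] = f"{step.name}: {exc}"
            for done in reversed(completed):
                done.compensate(state)
            return False
    return True


# Hypothetical local transactions; the dict stands in for each service's database.
def create_order(state: State) -> None:
    state["order_status"] = "ORDERED"


def cancel_order(state: State) -> None:
    state["order_status"] = "CANCELLED"


def reserve_stock(state: State) -> None:
    if state["stock"] < 1:
        raise RuntimeError("out of stock")
    state["stock"] -= 1


def release_stock(state: State) -> None:
    state["stock"] += 1


def charge_payment(state: State) -> None:
    raise RuntimeError("payment gateway timeout")  # simulated failure


def refund_payment(state: State) -> None:
    state["paid"] = False


steps = [
    SagaStep("order", create_order, cancel_order),
    SagaStep("inventory", reserve_stock, release_stock),
    SagaStep("payment", charge_payment, refund_payment),
]
state: State = {"stock": 5}
succeeded = run_saga(steps, state)
print(succeeded, state)
```

In this run the payment step fails, so the inventory and order steps are compensated in reverse order: the stock returns to 5 and the order ends up cancelled. The payment compensation is not called because the payment step never completed.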
Transaction management is really hard when it comes to microservices architectures. To implement transactions that span several microservices and still maintain data consistency, we should follow the Saga pattern. The Saga pattern has two different approaches:
- Choreography: services exchange events without a central point of control
- Orchestration: a centralized controller coordinates the steps
Saga Pattern — Choreography and Orchestration
There are two ways to implement a saga: choreography and orchestration. Let me explain the choreography way first.
Choreography Saga Pattern
Choreography coordinates sagas by applying publish/subscribe principles. With choreography, each microservice runs its own local transaction and publishes events to a message broker, and those events trigger local transactions in other microservices.
This approach is good for simple workflows that do not require too many transaction steps. But as the number of steps in the saga workflow grows, it can become confusing and hard to manage the transactions between the participating microservices. On the other hand, choreography removes direct dependencies between microservices when managing transactions.
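To make the choreography idea concrete, here is a small sketch with an in-memory publish/subscribe bus. Everything in it is an assumption for illustration: the `EventBus` class, the event names (`OrderCreated`, `InventoryReserved`, `InventoryFailed`), and the two handler functions that play the role of the Inventory and Order services. A real system would use a message broker instead.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[Dict], None]


class EventBus:
    """Tiny synchronous publish/subscribe broker, for illustration only."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self.log: List[str] = []

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: Dict) -> None:
        self.log.append(event_type)
        for handler in list(self._handlers[event_type]):
            handler(payload)


bus = EventBus()
stock = {"item-1": 0}  # out of stock, so the saga will roll back
orders: Dict[str, str] = {}


def place_order(order: Dict) -> None:
    # Order service: local transaction, then publish an event.
    orders[order["order_id"]] = "ORDERED"
    bus.publish("OrderCreated", order)


def on_order_created(order: Dict) -> None:
    # Inventory service: local transaction, then publish the outcome.
    if stock[order["item"]] > 0:
        stock[order["item"]] -= 1
        bus.publish("InventoryReserved", order)
    else:
        bus.publish("InventoryFailed", order)


def on_inventory_failed(order: Dict) -> None:
    # Order service: compensating transaction triggered by the rollback event.
    orders[order["order_id"]] = "CANCELLED"


bus.subscribe("OrderCreated", on_order_created)
bus.subscribe("InventoryFailed", on_inventory_failed)

place_order({"order_id": "o-1", "item": "item-1"})
print(orders, bus.log)
```

Note that no component knows the whole workflow: the flow only emerges from who subscribes to which event, which is exactly why longer sagas become hard to follow with this style.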
Orchestration Saga Pattern
The other way is orchestration. Orchestration coordinates sagas with a centralized controller microservice. This controller orchestrates the saga workflow and invokes the local transactions of the microservices sequentially.
The orchestrator executes the saga transactions and manages them centrally. If one of the steps fails, it executes the rollback steps with compensating transactions.
Orchestration is good for complex workflows that include many steps. But the centralized controller becomes a single point of failure, and it requires implementing more complex logic.
SAGA Fail Scenario
The image shows a failed transaction with the Saga pattern.
The Update Inventory operation has failed in the Inventory microservice. When one step fails, the saga invokes a set of compensating transactions to roll back the inventory operations, cancel the payment and the order, and return the data of each microservice to a consistent state.
We should be careful when using the Saga pattern in a distributed microservices architecture. If our use case requires data consistency across several microservices, and requires a rollback when one of the steps fails, then we should use the Saga pattern.
AWS Step Functions — Orchestrate Distributed Transactions with Saga Pattern
We are going to learn how AWS Step Functions can orchestrate distributed transactions with the Saga pattern. When we place an order, the order fulfillment process starts. Suppose we have two microservices: Order and Fulfillment.
When an order is placed, the Order microservice notifies the Fulfillment microservice to start the fulfillment process. We call this kind of communication the Event Notification pattern.
But the order fulfillment process has several steps. For example, if the payment status is still pending when the order arrives at fulfillment, there is a problem, even though the cause may be only a short delay in the payment status update in the database.
This problem can be solved with the Wait state in AWS Step Functions. Event Notification is a fire-and-forget pattern: it emits an event and the process runs, but it does not let us control the timing.
So how can we manage delays in event-driven microservices architectures?
This is easy to solve with AWS Step Functions. When the source service emits an event, a Wait state pauses the workflow, for 10 seconds in this case, and then the fulfillment processing steps start.
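As a rough sketch, a state machine with such a 10-second Wait state could be defined as follows. The definition is written as a Python dict and serialized to JSON, which is the format Step Functions expects. The state names and the Lambda ARN are placeholders I made up for the example.

```python
import json

# Hypothetical ARN; replace it with the real fulfillment Lambda function.
FULFILLMENT_LAMBDA_ARN = (
    "arn:aws:lambda:us-east-1:123456789012:function:start-fulfillment"
)

definition = {
    "Comment": "Wait for the payment status update before starting fulfillment",
    "StartAt": "WaitForPaymentUpdate",
    "States": {
        # Pause the workflow for 10 seconds, then continue.
        "WaitForPaymentUpdate": {
            "Type": "Wait",
            "Seconds": 10,
            "Next": "StartFulfillment",
        },
        # Invoke the Lambda function that runs the fulfillment steps.
        "StartFulfillment": {
            "Type": "Task",
            "Resource": FULFILLMENT_LAMBDA_ARN,
            "End": True,
        },
    },
}

definition_json = json.dumps(definition, indent=2)
print(definition_json)
```

The printed JSON can be pasted into the code editor of the Step Functions console when creating a state machine.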
As you can see, with Step Functions we get more control over service interactions by creating state machines and workflows.
Event-driven Microservices Architecture on E-commerce application
Microservices usually communicate asynchronously through events that indicate changes in the process. So in our e-commerce case:
- The customer places an order in our e-commerce application
- API Gateway forwards the request to the Order microservice
- The Order microservice processes the order and forwards it to the Inventory service and then to the Payment service to carry out the fulfillment process
This is the happy path of the place_order use case in the e-commerce domain. There are no failures in the workflow and the order is placed successfully.
This is an example of a distributed transaction with polyglot persistence. The order transaction data is stored across different databases, and each service writes to its own database.
What if we face a network failure and the payment gateway times out?
In that case the last step of the place_order workflow has failed. At the time of the failure, the order database and the inventory database have already been updated. The workflow reports a FAIL status to signal the failure to the downstream services, but if we look at our order data, it is now inconsistent and the updates need to be rolled back: the order status is still ORDERED and the inventory has already been decremented. Without further handling, there is no way to correct the data and no option to retry the step that failed.
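The inconsistency is easy to reproduce in a toy example. In the sketch below, two dicts stand in for the order and inventory databases, and `PaymentTimeout` and `place_order_without_saga` are hypothetical names used only to simulate the failure.

```python
class PaymentTimeout(Exception):
    """Simulated payment gateway time-out."""


order_db = {}
inventory_db = {"item-1": 3}


def place_order_without_saga(order_id: str, item: str) -> None:
    order_db[order_id] = "ORDERED"  # write 1: order database
    inventory_db[item] -= 1  # write 2: inventory database
    raise PaymentTimeout("payment gateway timed out")  # step 3 fails


try:
    place_order_without_saga("o-1", "item-1")
except PaymentTimeout:
    pass  # the caller only learns that the workflow failed

print(order_db, inventory_db)
```

After the failure the order is still marked as ORDERED and one unit of stock is missing, although no payment was taken. Nothing undoes the first two writes.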
So how can we fix this problem in our workflow?
If we design a monolithic application with a relational database, we get transactional atomicity, and consistency is enforced with foreign keys. If a transaction fails in a relational database, it is rolled back.
But when it comes to distributed transactions in microservices architectures, we cannot rely on database commits and transactions, because the transaction data is distributed across various databases.
In this case, the solution is to use the Saga pattern. With the Saga pattern, if any transaction in the workflow fails, the saga executes a series of compensating steps that roll back the changes made by the preceding transactions.
Let's think about our main use case, place_order, in the e-commerce application:
- A customer places an order when the item is low in stock
- Another customer's order is fulfilled in the meantime, so the item goes out of stock after the first order has been placed
This can happen whenever multiple customers try to purchase the same item at the same time.
In this case:
- The update inventory step fails in the workflow
- The orchestrator executes the compensatory steps
- It runs the rollback inventory and remove order steps and returns the status as failed
- These compensatory steps ensure that data integrity and consistency are maintained: the inventory reverts to its original level and the order is reverted
As you can see, for every action that changes a database there is an opposite action that compensates for the change in case of a failure. So a failure in the last step, the Payment service in this case, triggers all the compensatory transactions before the workflow returns a failed state.
We can implement these Saga flows with AWS Step Functions workflows. The individual steps invoke Lambda functions to perform their tasks, and the step transitions are defined in the Amazon States Language. This makes it easy to set up and configure the compensatory transactions for failure cases.
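Here is one possible shape of such a state machine, sketched as a Python dict that serializes to Amazon States Language JSON. The state names, the Lambda function names, and the ARN prefix are all assumptions made for this example, and so is the design choice that each Task catches every error (`States.ALL`) and jumps to its own compensation step. That choice only works if the compensating functions are safe to run when the original step did not complete.

```python
import json

# Hypothetical ARN prefix; the function names below are also assumptions.
ARN = "arn:aws:lambda:us-east-1:123456789012:function:"


def task(function_name, next_state, catch_to=None):
    """Build a Task state; on any error, jump to the given compensation state."""
    state = {"Type": "Task", "Resource": ARN + function_name, "Next": next_state}
    if catch_to:
        state["Catch"] = [
            {"ErrorEquals": ["States.ALL"], "ResultPath": "$.error", "Next": catch_to}
        ]
    return state


definition = {
    "Comment": "Place order saga with compensating transactions",
    "StartAt": "PlaceOrder",
    "States": {
        # Happy path: each step catches failures and starts the rollback chain.
        "PlaceOrder": task("place-order", "UpdateInventory", "RemoveOrder"),
        "UpdateInventory": task("update-inventory", "MakePayment", "RevertInventory"),
        "MakePayment": task("make-payment", "OrderSucceeded", "RevertPayment"),
        "OrderSucceeded": {"Type": "Succeed"},
        # Compensation chain: runs in reverse order and ends in a Fail state.
        "RevertPayment": task("revert-payment", "RevertInventory"),
        "RevertInventory": task("revert-inventory", "RemoveOrder"),
        "RemoveOrder": task("remove-order", "OrderFailed"),
        "OrderFailed": {"Type": "Fail", "Error": "PlaceOrderFailed"},
    },
}


def undefined_targets(defn):
    """Return transition targets that do not name a state (should be empty)."""
    names = set(defn["States"])
    targets = {defn["StartAt"]}
    for state in defn["States"].values():
        if "Next" in state:
            targets.add(state["Next"])
        for catcher in state.get("Catch", []):
            targets.add(catcher["Next"])
    return sorted(targets - names)


print(undefined_targets(definition))
print(json.dumps(definition, indent=2))
```

With this layout, a failure in MakePayment walks through RevertPayment, RevertInventory and RemoveOrder before reaching OrderFailed, while a failure in UpdateInventory starts the chain at RevertInventory. The small `undefined_targets` helper is only a local sanity check and does not replace validation by Step Functions itself.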
We will use Saga orchestration to implement the e-commerce place order use case for both the happy path and the failure path.
- Create an API Gateway that triggers AWS Step Functions when a place-order request comes from a customer
- Implement the Saga orchestration pattern with AWS Step Functions
- Handle success and failure paths in a distributed transaction
- Restore data consistency in the Amazon DynamoDB databases
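For the first item in the list above, one way to start a state machine execution from code (for example from a thin Lambda function behind API Gateway) is the `start_execution` call of the boto3 Step Functions client. The sketch below takes the client as a parameter so that it can run without AWS credentials. `start_place_order`, `FakeStepFunctionsClient`, and the state machine ARN are hypothetical. In a real deployment you would pass `boto3.client("stepfunctions")` instead of the fake.

```python
import json
from typing import Any, Dict


def start_place_order(sfn_client: Any, state_machine_arn: str, order: Dict) -> str:
    """Start one saga execution and return its execution ARN."""
    response = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(order),  # the execution input must be a JSON string
    )
    return response["executionArn"]


class FakeStepFunctionsClient:
    """Local stand-in that records calls instead of contacting AWS."""

    def __init__(self) -> None:
        self.calls = []

    def start_execution(self, **kwargs):
        self.calls.append(kwargs)
        return {"executionArn": kwargs["stateMachineArn"] + ":fake-execution"}


# Hypothetical state machine ARN, used only for this local demo.
STATE_MACHINE_ARN = "arn:aws:states:us-east-1:123456789012:stateMachine:place-order"

client = FakeStepFunctionsClient()
execution_arn = start_place_order(
    client, STATE_MACHINE_ARN, {"order_id": "o-1", "item": "item-1", "quantity": 1}
)
print(execution_arn)
```

API Gateway can also be integrated with Step Functions directly, without a Lambda function in between. Which option fits better depends on how much request validation you want to do before the saga starts.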
Design AWS Step Functions State Machine for Place Order Use Case
We are going to design an AWS Step Functions state machine for the place order use case.
- Open the console, go to Step Functions, choose Create State Machine, then Design visually
- Design the workflow
As you can see, I have designed the workflow to implement the Saga pattern, but we have not developed the Order, Inventory and Payment microservices yet. You can take this as an assignment: connect them to their DynamoDB databases and perform the required actions to complete this flow.
For now I am not going to go ahead and create the state machine from this flow. Before that, we should develop the Lambda functions for the Order, Inventory and Payment microservices together with their rollback methods. To see the full development of this hands-on lab, you can check the course below on Udemy.
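As a starting point for that assignment, a Lambda function and its rollback counterpart could look roughly like the sketch below. It is not the code from the course: the handler names, the event fields (`item`, `quantity`, `inventory_updated`), and the `OutOfStockError` class are assumptions, and an in-memory dict replaces the DynamoDB table so that the example runs locally.

```python
class OutOfStockError(Exception):
    """Raised when stock is insufficient for the requested quantity."""


# In-memory stand-in for the Inventory table (an assumption; the lab uses DynamoDB).
INVENTORY = {"item-1": 1}


def update_inventory_handler(event, context):
    """Local transaction: decrement stock for the ordered item."""
    item, quantity = event["item"], event["quantity"]
    if INVENTORY.get(item, 0) < quantity:
        raise OutOfStockError(f"{item} is out of stock")
    INVENTORY[item] -= quantity
    # Mark the change so the compensation knows there is something to undo.
    return {**event, "inventory_updated": True}


def revert_inventory_handler(event, context):
    """Compensating transaction: restore stock only if it was decremented."""
    if event.get("inventory_updated"):
        INVENTORY[event["item"]] += event["quantity"]
    return {**event, "inventory_updated": False}


order = {"order_id": "o-1", "item": "item-1", "quantity": 1}
after_update = update_inventory_handler(order, None)  # stock goes from 1 to 0
after_revert = revert_inventory_handler(after_update, None)  # stock back to 1
print(INVENTORY, after_revert)
```

The `inventory_updated` flag keeps the rollback safe to call even when the update never happened, which matters if the workflow routes a failed update step straight to its own compensation. For Python Lambda functions, the exception class name is what a Catch clause in the state machine can match on, so a dedicated exception type such as `OutOfStockError` lets the workflow distinguish business failures from technical ones.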
Step by Step Design AWS Architectures w/ Course
I have just published a new course — AWS Lambda & Serverless — Developer Guide with Hands-on Labs.
In this course, we cover almost all of the AWS serverless services in depth. We are going to build serverless applications using AWS Lambda, Amazon API Gateway, Amazon DynamoDB, Amazon Cognito, Amazon S3, Amazon SNS, Amazon SQS, Amazon EventBridge, AWS Step Functions, DynamoDB and Kinesis Streams. This course is 100% hands-on, and you will develop a real-world application step by step through hands-on labs.
Get the source code from the Serverless Microservices GitHub repository. Clone or fork it, and if you like it, don't forget to give it a star. If you find a problem or have a question, you can open an issue directly on the repository.