Distributed Transaction

Jolly srivastava
System Design Concepts
6 min readJan 3, 2021

In the previous article, we have studied transaction and locks.

In this article, we will study more about transactions specifically distributed transactions and locks. We will look at various use cases to see how is transaction is being taken care of in a monolith and in a microservice architecture?

Transaction in a monolith architecture

Let’s consider the “Amazon” use case. As depicted in the above sequence diagram, when a user tries to place an order, amazon's service triggers the transaction on DB. If none of the steps failed i.e if the user has sufficient balance to place the order, the quantity of the ordered item in inventory > ordered value, then finally the order gets committed into the database, else if any of these steps in the transaction get failed then the rollback occurs. This is all about the monolith.

Let’s talk about microservices now.

Transaction in a microservice architecture

Microservice architecture behaves differently from monolith architecture. Here when a customer tries to buy a product, an event is trigged to the orchestrator. Orchestrator then orchestrates the request to the customer wallet service which checks whether a customer has sufficient balance to place the order or not, post that or even parallelly orchestrator also triggers the request to the order service which checks whether the inventory has sufficient product or not. Here transaction is local, not global i.e we don't have one transaction for both of the calls. we can clearly see there is no isolation here.

What will happen if the balance is deducted from the wallet but then the order creation gets failed? Who is responsible for rollback? How to resolve this issue in microservice?

This is where distributed transaction comes in.

There are various techniques to implement distributed transactions.

one way to resolve this is to let the “customer” and “order” services to share the common DB. This will definitely resolve our issue of transaction however here we are going anti-pattern and giving birth to an architecture that is not really reliable and scalable.

Another way is to replicate the “order” and “customer” DB. But here consistency goes for the toss. if consistency is not the important factor of one’s use case, they can implement transactions in this way.

Another approach is 2- phase commit

2-phase commit

Here commit takes place in 2 phases i.e prepare and commit. As we can see in the above image, we have introduced a new component ‘coordinator’ (that can be a separate service or can be present in any microservice). This takes care of 2 phase commit.

When a user places an order, the coordinator first creates a transaction id which is given as a reference to all the other microservices that the coordinator is going to talk with. The first thing the coordinator does is prepare the phase. In this phase, the coordinator asks customer services and order microservice to prepare their states i.e check for balance if the wallet has sufficient balance to place the order, coordinator locks that particular row of customer wallet DB and same happens with the order service as well and then both service replies back to the coordinator. After receiving successful responses from both the services, the coordinator begins the 2nd phase where it asks the customer and order services to proceed with commit. Both the microservice commits and respond to the coordinator with the respective message. This is an ideal scenario.

Let’s consider an example where Alice is trying to place an order. Here customer id is 4. However, she doesn't have sufficient balance to place an order. What will happen now?

Orchestrator generators a transaction id and sends a request to customer service and order service to prepare the wallet and the order respectively. At this point in time, prepare wallet will throw an error as the customer doesn't have enough balance to place the order but the order returns a successful response. Hence, here a rollback happens and the whole transaction is aborted.

If Bob is also trying to place an order from Alice’s account i.e trying to modify the same id=4 row, then Bob’s request will wait until Alice’s transaction is completed as soon as the transaction started the respective row (i.e id=4) is being acquired at the customer and order table. Hence providing isolation property.

Also, we need a timeout here as to what if the prepare call’s response never came back to the coordinator. Timeout tells the coordinator that something is wrong with the service and it’s time for the rollback of this transaction.

Advantages:

  1. strong consistent model.

Disadvantage:

  1. It’s the coordinator who is handling all the transactions.
  2. All the resources are locked for a long time (from the prepare phase to the response of microservice from the commit phase) which increases the latency.

Problem with this approach:

  1. What happens if the co-ordinator fails?
  2. what will happen if any of the microservices fails to reply during phase -1, co-ordinator doesn't even know the state of failed microservice?
  3. what happens if a microservice fails during the commit phase?

In all these cases resources are being locked and no other can use that resource.

Due to all these reasons, 2 phase commit is not the recommended solution.

The modified version of this is 3 phase commit.

This is the extension of the two-phase commit where there is one extra step which is known as the pre-commit step. It helps us in the recovery of the coordinator or participant’s failure.

There can be one or more participants. By-election, any microservice can act as a coordinator.

First step: “Can commit?” → which checks how many participants are there and whether they can commit or not? All the participants reply with either yes or no.

Second step: Pre-commit →Place the lock.

Third step: Commit → commit and gets the ack.

How does it help in recovery?

Let’s say the coordinator dies during a transaction. Of all the participants one of the participants becomes the coordinator by-election. when this new coordinator talks with participants in the first step and if any of these participants reply with the value of “do commit” state, then this new co-ordinator begins from there and triggers the do commit call.

Drawback:

  1. synchronous calls.
  2. Increased latency.

SAGA:

This is also a widely used pattern called SAGA and this works asynchronously unlike previous patterns. We can implement this kind of pattern in the case of web sockets and long polling. How this works??

When a user places an order it comes to “order ms” which checks the inventory and adds the messages to the event bus, which then gets picked by “customer ms”. If all goes well, the whole cycle finishes. If the customer doesn't have sufficient balance then the message enters into the rollback queue. This message then gets picked up by “rollback ms” which takes care of all the rollbacks. It is faster compared to 2 phases and 3 phases. Since all the order is going through the event bus, at no point of time there are multiple requests, hence isolation is provided.

This is all about the distributed transaction. I hope you guys would have enjoyed this blog as well.

Stay tuned for more!!!

--

--