The Problem of Distributed Transactions in the Context of Microservice Architecture

Dmytro Nasyrov
Pharos Production
Aug 23, 2024

The transition from a monolith to a microservice architecture is known to cause difficulties, both on the technical side of the project and on the human side. One of the most challenging technical problems is ensuring consistency in a distributed system.

Consistency

A rather subtle point is that consistency in the context of distributed systems differs from consistency in the context of databases. In what follows, consistency means the former: an unfinished (erroneous) operation has no effect and does not change the data; under concurrent access, every operation appears atomic (it is impossible to observe its intermediate result); and if the data has several copies (replication), operations are applied to all copies in the same order. In short, we want an ACID transaction, only a distributed one.
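To make the goal concrete, here is a deliberately naive sketch. The service names and the plain dictionaries are hypothetical stand-ins for two independent services; the comment marks the crash window that a distributed transaction is supposed to eliminate.

```python
# Hypothetical example: a debit in one service and a ledger entry in
# another must happen together or not at all. Plain dicts stand in for
# two independent databases.

class InsufficientFunds(Exception):
    pass

accounts_db = {"alice": 100}   # owned by the accounts service
ledger_db = []                 # owned by the ledger service

def transfer(user: str, amount: int) -> None:
    if accounts_db[user] < amount:
        raise InsufficientFunds(user)
    accounts_db[user] -= amount
    # A crash right here leaves a debit with no ledger entry: exactly
    # the intermediate state a distributed ACID transaction must hide.
    ledger_db.append({"user": user, "amount": amount})
```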

The cause of the problem

Why is it challenging to ensure consistency in a microservice architecture? Because this architectural style usually involves the database-per-service pattern. Let me remind you what it means: each microservice has its own independent database or databases (plural, because in addition to the primary data source it may use, say, a cache). This approach, on the one hand, avoids implicit coupling between microservices through a shared data format (microservices interact only explicitly, through their APIs) and, on the other hand, lets us exploit a key advantage of microservice architecture, technology agnosticism (we can choose the storage technology best suited to the load on each particular microservice). But we still need to guarantee data consistency. Judge for yourself: the monolith talked to one large database, which made ACID transactions possible. Now there are many databases, and instead of one large ACID transaction we have many small ones. Our task is to combine all of them into a single distributed transaction.
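To see the gap, consider this minimal sketch: two in-memory sqlite3 databases stand in for two services' independent stores (the service and table names are made up). Each local commit is a perfectly good ACID transaction, but nothing makes the pair atomic.

```python
import sqlite3

# Each service owns its database; in-memory sqlite3 stands in for both.
orders_db = sqlite3.connect(":memory:")    # order service's store
billing_db = sqlite3.connect(":memory:")   # billing service's store
orders_db.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
billing_db.execute("CREATE TABLE invoices (order_id INTEGER, amount INTEGER)")

def place_order(order_id: int, amount: int) -> None:
    with orders_db:   # local ACID transaction no. 1
        orders_db.execute("INSERT INTO orders VALUES (?, ?)",
                          (order_id, "created"))
    # A crash or network failure here leaves an order without an invoice:
    # two small ACID transactions do not add up to one distributed one.
    with billing_db:  # local ACID transaction no. 2
        billing_db.execute("INSERT INTO invoices VALUES (?, ?)",
                           (order_id, amount))
```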

Optimistic consistency

The first thing that might come to mind is the concept of optimistic consistency: we run as many transactions as we want on as many storage engines as we need, expect everything to be okay, and if something does go wrong, we say: “Yes, this happens, but with extremely low probability.” Seriously, neglecting consistency when it is non-critical for the business is a perfectly good decision, especially considering how much effort it costs to ensure it (as I hope you will see a little later).
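In code, the optimistic approach is almost embarrassingly short. A minimal sketch, reusing the hypothetical order/billing domain with plain dicts as stores: perform each write, and if the second one fails, merely log the divergence instead of repairing it.

```python
import logging

orders = {}    # stand-in for the order service's store
invoices = {}  # stand-in for the billing service's store

def remote_invoice_write(order_id: int, amount: int) -> None:
    # Imagine a network call to the billing service that may fail.
    invoices[order_id] = amount

def place_order(order_id: int, amount: int) -> None:
    orders[order_id] = "created"
    try:
        remote_invoice_write(order_id, amount)
    except Exception:
        # The order is already committed; we deliberately do nothing
        # about it and accept the (hopefully rare) inconsistency.
        logging.exception("invoice write failed for order %s", order_id)
```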

Consistency options

If consistency is critical for the business, it can be ensured in several ways. If the data is updated by a single service (for example, database replication), standard consensus algorithms such as Paxos or Raft can be used. Such transactions are called homogeneous. If the data is updated by several services (that is, the transaction is heterogeneous), then the difficulties we discussed above begin.
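Paxos and Raft are far too large to reproduce here, but the quorum idea they rest on fits in a few lines. A simplified illustration, with a hypothetical Replica class: a write to one service's replicated database counts as committed only once a majority of replicas acknowledges it. Real consensus additionally agrees on the order of writes through a leader and a replicated log.

```python
class Replica:
    """Hypothetical replica of a single service's database."""
    def __init__(self) -> None:
        self.log = []

    def apply(self, write) -> None:
        self.log.append(write)

def replicate(write, replicas) -> bool:
    """Commit iff a majority of replicas acknowledged the write.
    This shows only the quorum; Paxos/Raft also fix the write order."""
    acks = 0
    for replica in replicas:
        try:
            replica.apply(write)
            acks += 1
        except ConnectionError:
            pass  # a minority of failed replicas is tolerated
    return acks > len(replicas) // 2
```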

On the one hand, we can still avoid the need for a distributed transaction by moving toward a service-based architecture (combining services so that every transaction becomes homogeneous). This solution is less canonical in terms of microservice architecture principles, but it is technically much simpler, which is why it is often used in practice. On the other hand, we can keep canonical microservices and use one of the mechanisms for ensuring distributed transactions: two-phase commit or saga. In this article, we will study the first option and discuss the second one next time.

Two-phase commit

The mechanism is extremely simple: a transaction manager orchestrates the transaction. In the first stage (prepare), the transaction manager issues a command to the resource managers, telling them to write the data to be committed to their logs. Having received confirmation from all resource managers that the first stage completed successfully, the transaction manager starts the second stage and issues the next command (commit), on which the resource managers apply the previously accepted changes.

Despite its apparent simplicity, this approach has several disadvantages. First, if even one resource manager fails during the prepare stage, the entire transaction must be canceled on every participant. This violates one of the principles of microservice architecture, fault tolerance: when we moved to a distributed system, we accepted that failure in it is the norm, not an exceptional situation. And since there will be many failures, canceling transactions has to be automated (including writing compensating transactions that roll back other transactions). Second, the transaction manager itself is a single point of failure, and it must also be able to issue transaction identifiers. Third, since the storage receives special commands, it must be able to understand them, that is, comply with the XA standard, and not all modern technologies do (message brokers such as Kafka and RabbitMQ and NoSQL solutions such as MongoDB and Cassandra do not support XA two-phase commit).
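Here is a compressed sketch of the protocol just described. All class and method names are hypothetical; real resource managers would speak XA. Note how the transaction manager both issues transaction IDs and sits in the middle of every decision, which is exactly why it is a single point of failure.

```python
class ResourceManager:
    """Hypothetical participant, e.g. one service's database."""
    def __init__(self, name: str) -> None:
        self.name = name
        self.data = {}       # committed state
        self.prepared = {}   # changes promised in phase 1

    def prepare(self, tx_id: str, changes: dict) -> bool:
        # Phase 1: durably record what we promise to commit.
        self.prepared[tx_id] = dict(changes)
        return True

    def commit(self, tx_id: str) -> None:
        # Phase 2: apply exactly what was promised.
        self.data.update(self.prepared.pop(tx_id))

    def rollback(self, tx_id: str) -> None:
        self.prepared.pop(tx_id, None)

class TransactionManager:
    def __init__(self) -> None:
        self.counter = 0   # the TM is the source of transaction IDs

    def run(self, participants: list, changes: dict) -> bool:
        self.counter += 1
        tx_id = f"tx-{self.counter}"
        # Phase 1: everyone must vote yes, otherwise abort everywhere.
        if not all(rm.prepare(tx_id, changes) for rm in participants):
            for rm in participants:
                rm.rollback(tx_id)
            return False
        # Phase 2: commit everywhere. If the TM dies right here, the
        # participants are left blocked with prepared, uncommitted data.
        for rm in participants:
            rm.commit(tx_id)
        return True

tm = TransactionManager()
ok = tm.run([ResourceManager("orders"), ResourceManager("billing")],
            {"order_42": "paid"})
assert ok
```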

The conclusion from all these factors was perfectly formulated by Chris Richardson: “2PC is not an option.”

Feel free to drop a “Hi” at Pharos Production, where we bring software to life! 👋✨

https://pharosproduction.com

Join our exciting journey with Ludo — the reputation system of the Web3 world! 🌍✨

https://ludo.com
