Dismantling the distributed monolith — Our microservices journey from orchestration to choreography
A service is a software functionality with a purpose that different clients can use ~ Wikipedia
Background
Years ago, as we set out to build a Billing & Finance system for the company’s foray into Eurasia, we chose tools and technologies that the team was already familiar with.
At the time, we had already built several shared microservices using Spring Boot and the Netflix OSS stack: services to send out notifications, securely store documents, and so on. These services were consumed by different client applications, eliminating duplicated code and effort. They were a huge hit within the company.
We also had several services that partners and clients outside the company accessed, and we used WSO2 as the API gateway for these externally available services. WSO2 also provides, in its own words, “a low code approach to microservices integration” via its enterprise integration capability.
These existing tools, together with the requirement to build the system in under ten months, shaped our views on how the new Billing & Finance system should be architected.
Version 1 — Orchestration
Let’s say one of the clients drops an event onto the queue indicating that an order has been placed. The Billing & Finance system should then record the charges, generate an invoice, and make entries in the general ledger. We used WSO2 to call a different service for each of these tasks (i.e., to orchestrate them). If any of the tasks failed, the WSO2 orchestrator was responsible for a “rollback”.
Each microservice was a Spring Boot web application exposing REST endpoints. Each one was invoked with an HTTP request and returned its response synchronously over the same HTTP connection.
Pros and Cons
The primary advantage of this architecture was that it used tools and technologies the team already knew very well.
Another advantage was that the orchestrator made it possible to maintain data consistency in the face of runtime faults. Each transaction in a microservice had a corresponding reverse (a.k.a. compensating) transaction. So if the WSO2 orchestrator encountered an error when calling the nth microservice, it could perform a “rollback” by calling the compensating transactions on the first n-1 microservices.
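To make the “rollback” concrete, here is a minimal Python sketch of that compensation logic. It assumes each step is a plain function rather than a REST call routed through WSO2, and the step names (`record_charges`, `generate_invoice`, `post_ledger_entries`) and helpers (`Step`, `run_orchestration`) are hypothetical illustrations, not our actual code.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One task in the orchestrated flow, paired with its compensating transaction."""
    name: str
    action: Callable[[dict], None]      # forward transaction (a synchronous REST call in v1)
    compensate: Callable[[dict], None]  # reverse transaction that undoes `action`


class StepFailed(Exception):
    pass


def run_orchestration(steps: List[Step], order: dict, log: List[str]) -> bool:
    """Call each step in order; if step n fails, compensate steps 1..n-1, newest first."""
    completed: List[Step] = []
    for step in steps:
        try:
            step.action(order)
            log.append(f"done:{step.name}")
            completed.append(step)
        except StepFailed:
            log.append(f"failed:{step.name}")
            for done in reversed(completed):
                done.compensate(order)
                log.append(f"compensated:{done.name}")
            return False
    return True


# Hypothetical stand-ins for the service calls.
def ok(order: dict) -> None:
    pass  # a call that succeeds


def fail(order: dict) -> None:
    raise StepFailed("ledger service unavailable")  # a call that errors out


def undo(order: dict) -> None:
    pass  # a compensating call


steps = [
    Step("record_charges", ok, undo),
    Step("generate_invoice", ok, undo),
    Step("post_ledger_entries", fail, undo),
]
log: List[str] = []
succeeded = run_orchestration(steps, {"order_id": "A-1"}, log)
# succeeded is False; log is:
# ['done:record_charges', 'done:generate_invoice', 'failed:post_ledger_entries',
#  'compensated:generate_invoice', 'compensated:record_charges']
```

Undoing the completed steps newest-first is a common design choice, since later steps may depend on the effects of earlier ones. Note that all of this knowledge (the order of steps and how to undo each one) lives in the orchestrator, which is precisely where the coupling comes from.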
The primary disadvantage of this architecture was inter-team dependency caused by tight coupling. One of the goals of a distributed architecture is to allow rapid parallel development. However, orchestration-based microservices architectures tend to become a distributed monolith: in many ways, they suffer from the same interdependencies as a monolith. Practically every change required multiple teams to coordinate their releases.
Fragility was the next major disadvantage of this setup. If a single service went down, or even performed poorly for a short time, the entire end-to-end process failed.
Moreover, performance issues had to be tackled as top priority, because this synchronous request-response setup had no tolerance for slow-running transactions: the orchestrator waited on each call in turn, so one slow service delayed the whole flow.
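A back-of-the-envelope way to see both problems: when every step must succeed synchronously, availabilities multiply (assuming independent failures) and latencies add. The helper names, availability figures, latencies, and the 1000 ms timeout below are illustrative assumptions, not measurements from our system.

```python
from math import prod


def chain_availability(availabilities):
    """End-to-end availability of a synchronous chain: every service must be up,
    so (assuming independent failures) the individual availabilities multiply."""
    return prod(availabilities)


def chain_latency_ms(latencies_ms):
    """Sequential synchronous calls: the caller waits for each response in turn,
    so the individual latencies add up."""
    return sum(latencies_ms)


# Illustrative numbers only: three services at 99.9% availability each.
availability = chain_availability([0.999, 0.999, 0.999])  # about 0.997

# Illustrative latencies with one slow service (2000 ms),
# checked against a hypothetical 1000 ms end-to-end timeout.
total_ms = chain_latency_ms([50, 80, 2000])  # 2130
timeout_ms = 1000
within_timeout = total_ms <= timeout_ms  # False: the whole flow fails
```

Three services that are each unavailable about 0.1% of the time leave the end-to-end flow unavailable about 0.3% of the time, and a single slow dependency is enough to push the entire request past its timeout.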
Version 2 — Choreography
Rice doesn't grow on infertile soil ~ Old Chinese proverb
Fortunately for us, we got another shot at building the same system, this time to replace the company’s legacy Billing & Finance platform. In the meantime, we had moved away from the Netflix OSS stack and embraced Kubernetes for running microservices. Kafka, already used within the company for data pipelines, had become part of the company’s DNA.
Events as first-class citizens
In version 1, our microservices were not event-aware. They were simple REST web services invoked synchronously via the WSO2 orchestrator. This resulted in very tight coupling between the services. In the new architecture, it was important to loosen the coupling so teams could be more independent and development could move faster.
We had to rebuild our microservices to listen to Kafka topics for events and publish some of their own. We had to redraw the service boundaries to ensure that transactions were not split across microservices. And we had to dismantle the orchestrator.
In short, in version 1 our microservices were request-response REST web services; in version 2, they were event-driven components.
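The shift from orchestration to choreography can be sketched in a few lines of Python. Here an in-memory bus stands in for Kafka topics, and the topic names (`order-placed`, `charges-recorded`, `invoice-generated`, `ledger-posted`), the service functions, and the exact flow between them are hypothetical assumptions for illustration. The point is that no central component knows the whole sequence: each service only knows which events it consumes and which it publishes.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Tuple

Handler = Callable[[dict], None]


class InMemoryBus:
    """Stand-in for Kafka topics: publish queues an event, drain delivers it to subscribers."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)
        self._pending: Deque[Tuple[str, dict]] = deque()
        self.published: List[str] = []  # record of topics, for inspection only

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        self.published.append(topic)
        self._pending.append((topic, event))

    def drain(self) -> None:
        """Deliver queued events until none remain (a crude substitute for consumer polling)."""
        while self._pending:
            topic, event = self._pending.popleft()
            for handler in self._subscribers[topic]:
                handler(event)


bus = InMemoryBus()


# Hypothetical services: each reacts to one topic and publishes its own event.
def billing_service(event: dict) -> None:
    bus.publish("charges-recorded", {"order_id": event["order_id"], "amount": event["amount"]})


def invoicing_service(event: dict) -> None:
    bus.publish("invoice-generated", {"order_id": event["order_id"]})


def ledger_service(event: dict) -> None:
    bus.publish("ledger-posted", {"order_id": event["order_id"]})


bus.subscribe("order-placed", billing_service)
bus.subscribe("charges-recorded", invoicing_service)
bus.subscribe("charges-recorded", ledger_service)

bus.publish("order-placed", {"order_id": "A-1", "amount": 100})
bus.drain()
# bus.published == ['order-placed', 'charges-recorded', 'invoice-generated', 'ledger-posted']
```

Adding another consumer of `charges-recorded` requires no change to the billing service, which is the kind of independence between teams we were after. The sketch deliberately leaves out failure handling, delivery guarantees, and consumer groups, all of which a real Kafka deployment has to address.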