Dismantling the Monolith — How Microservices Work

Kevin Lanthier · The Startup · Nov 27, 2019

We’ve all heard it at some point.

Microservices

For some of you, that means Docker. For others, it means containers. Some developers might think they are deploying microservices because they have multiple images/containers/instances of their software running.

The reality is that deploying your system as multiple images does not make it a service-oriented architecture. Yes, you might have taken your old software from a one-instance beast to a couple of scalable instances (we’ll come back to this one), but if your architecture is still the same, you have not turned it into microservices.

The goal of this article is to explain the core concepts of the microservices architecture with the help of images that compare how a microservices architecture functions versus how the same system would be built in the more conventional monolithic approach.

The Goals of Microservices

Before we address what microservices try to achieve and how they do it, let’s take a step back and look at what you most likely have right now: a monolithic application.

Usually, in a monolithic architecture, the system relies heavily on one final datastore.

It is usually done that way to ensure strong consistency. Our processing servers (application servers) can potentially be scaled, with some adjustments, to process more data, all the while ensuring that whichever server processes that data ALWAYS returns the same response.

Monolithic applications will normally use database joins and/or make several requests/transactions to the datastore to get the requested information.

The datastore is what we call the single source of truth of the application, so we can rely on the fact that the database will always give us the correct information whenever we request data from it.

How are Microservices any different from that?

Microservices operate in a very similar way, but at a different scale. The idea is to create single-responsibility, autonomous, and reactive micro-monoliths.

You will notice from the comparison image above that all we did is split what was initially our whole application (containing tables for Users, Orders, Shipping, etc.) into smaller self-contained micro-applications.

Splitting our Monolith into several microservices has a couple of interesting effects:

➕ Contention in the database is reduced (if one microservice’s database is locked to perform a transaction, only that part of the application is locked, not the whole application)

➕ Availability is increased

➕ Failures are isolated

➖ It is now more complicated to perform queries that would normally be done with multiple joins inside a single SQL query (you need to ask each microservice for the information you want)

➖ It is more difficult to monitor our infrastructure (multiple watchers are needed)

➖ A more complex deployment process is required to effectively update each individual microservice

There are many great benefits to be gained from implementing a microservice architecture; however, it comes with certain trade-offs. Microservices add an extra layer of complexity, and that has its own cost. If you work on a smaller project, it may still be a good idea to stick with the simpler monolithic approach.

This is a brief introduction to a very broad topic. In this article, I will explain how microservices can help you gain Availability, better User Experience, Performance, theoretically infinite Scalability, and Cost efficiency.

1. Availability and User experience

When you choose to go down the path of microservices, you have to accept something immediately:

Your backend will no longer work with synchronous requests. The very way microservices work forces you to treat requests asynchronously, since you might never know who processed your request, when, or how.

That forces us to build our applications differently. Any request sent to the backend must asynchronously wait for an answer… or might never get one (depending on the chosen implementation). You must design your user interface to properly handle this kind of system, which we call a Reactive system.

Now, speaking of Reactive, your backend microservices should also be built following that pattern. When they call each other to gather the information needed to perform the desired query, they, just like your UI, must be able to react to the fact that some queries may take time, some might never complete, etc.

You should always build and treat all requests as asynchronous by default. If you ABSOLUTELY must perform a synchronous task (which will block your whole system), know that it is a potential spot for contention and system slowdown.
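
To make this concrete, here is a minimal sketch of an asynchronous, timeout-bounded service call. The `fetch_order` function and its latency are hypothetical stand-ins for a real HTTP/gRPC call to an Orders microservice:

```python
import asyncio

async def fetch_order(order_id: str) -> dict:
    # Hypothetical stand-in for a network call to the Orders microservice.
    await asyncio.sleep(0.1)  # simulated network latency
    return {"id": order_id, "status": "shipped"}

async def main() -> None:
    try:
        # Never block indefinitely: bound the wait and react to a timeout,
        # just like a reactive UI would show a spinner or a retry option.
        order = await asyncio.wait_for(fetch_order("42"), timeout=2.0)
        print("Order received:", order)
    except asyncio.TimeoutError:
        print("Orders service did not answer in time; degrade gracefully.")

asyncio.run(main())
```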

The impact of availability

Below you will find a graphic that displays the impact of a failure on one of our databases. Say, as an example, a network partition occurs and our database has to go down.

Monolith vs Microservices: Availability

The Monolith usually depends on one final source of truth, which helps with consistency but comes at a cost on the availability spectrum. In this case, since we can no longer query the database to perform our requests, the whole system becomes unavailable.

Microservices, on the other hand, each deal with requests on their own terms. In this example, the Orders microservice is down (for the same reasons as the Monolith), yet the system is still fully capable of answering User and Shipping requests — so the end user loses some capabilities, but not all of them.

The microservice architecture gives you the ability not only to keep your system alive in times of failure but also to isolate that failure, so only a small subsection of your system stops functioning.

This is especially relevant today, when user expectations for system availability are at their highest. Users are frustrated by unresponsive systems. A system that is down disrupts their work and prevents them from delivering to their customers. If, instead of going down entirely, you only prevent them from using a certain function for a short period, they might find it acceptable, since they were able to continue with your software (with limited capabilities).

2. Autonomous, Scalable and Isolated services

How does scaling differ between microservices and the Monolith?

In general, assuming you have a standard Monolith configuration, you will have one endpoint for your datastore. Your datastore might have its own scaling mechanism, because eventually you will hit a point where space becomes an issue, but the endpoint remains the same.

To scale their software, companies will eventually end up creating multiple endpoints.

They will start to:

  • Create specific endpoints for regions (eg. na.myapp.com, eu.myapp.com)
  • Create specific endpoints for customers (eg. customer1.myapp.com, customer2.myapp.com)

They do this because, to scale the application, they need to replicate the infrastructure as a whole in its own new sandbox. The Monolith comes as a whole; therefore, it needs to scale as a whole. You usually cannot scale just a small part of it, and if you did, it might not change much, as the contention is found at the datastore level most of the time. You are always at the mercy of your datastore.

Scaling a full infrastructure comes with a lot of inefficiencies. You need many machines that are barely used, just because that’s how the Monolith is designed: it needs all of them to operate.

What’s great in a microservices world is that you can scale by simply duplicating the microservice you need. Say we have a lot of orders in our system and it’s starting to be slow when retrieving information and processing requests about orders; we can then simply add another Orders microservice.

Here are a couple of other benefits obtained with scaling microservices:

  • Each Orders microservice has its own database, so they don’t share contention; they split it by having data stored separately. (Suddenly, your system has become more responsive, as each instance handles its locks independently.)
  • Each Orders microservice is isolated from the others. When a request for information is made, our Gateways (machines that redirect traffic to the proper instance of a microservice) dispatch the request to the right one (see the sketch after this list).
  • A failing Orders microservice might now impact only half of your customers, since your customers are split between two microservices.
  • You are no longer limited by the size of a single machine. If you have reached the actual memory/hard-drive limit of a machine, you can simply add more instances without headaches. (What would you do if you had the Monolith in this case?)
  • Scaling only one microservice instead of scaling the whole Monolith is a lot more cost-efficient.
  • Usually, scaling vertically (meaning increasing the hardware of your machine) costs more than running multiple smaller machines. You can save costs by scaling horizontally instead.
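
To illustrate what a Gateway does, here is a rough sketch (the service names and instance addresses are invented for the example) of a registry that round-robins requests across the instances of each microservice:

```python
import itertools

# Hypothetical registry: each service name maps to its running instances.
REGISTRY = {
    "orders": ["orders-1:8000", "orders-2:8000"],  # scaled to two instances
    "users": ["users-1:8000"],
    "shipping": ["shipping-1:8000"],
}

# One round-robin cursor per service, so load spreads across instances.
_cursors = {name: itertools.cycle(hosts) for name, hosts in REGISTRY.items()}

def route(service: str) -> str:
    """Return the next instance that should handle a request for `service`."""
    return next(_cursors[service])

# Requests about orders alternate between the two Orders instances.
print(route("orders"))  # orders-1:8000
print(route("orders"))  # orders-2:8000
```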

A microservices architecture allows you to scale more efficiently and increases availability.

However, it comes with certain trade-offs:

  • Added complexity in deploying your application updates (how many containers do we update, and how is the deployment process done?)
  • How do we monitor all those containers? We initially only had to keep track of a single instance…
  • How do we keep backups of all those different databases?

3. How performance is obtained

Simply putting microservices in place is not going to magically yield a huge performance boost. Just like the Monolith, we are still bottlenecked by how we retrieve and save data.

A good and efficient way to boost performance in a regular system is to have a caching layer. That layer allows you to fetch data directly from an in-memory cache, which is very fast and efficient.

However, the Monolith is constrained to a single cache entity. It is hard to scale the caching system while keeping only a single endpoint: the cache becomes a cache for everything. It eventually grows large, and then you hit the same issue as with the database, since how much data a single cache instance can hold is bounded by its hardware.
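
For reference, a caching layer in front of the datastore typically follows the cache-aside pattern, roughly like this (the `db_load` function is a hypothetical stand-in for the real database call):

```python
cache: dict = {}  # stands in for Redis/Memcached in a real system

def db_load(user_id: str) -> dict:
    # Hypothetical slow database read.
    return {"id": user_id, "name": "Alice"}

def get_user(user_id: str) -> dict:
    # 1. Try the fast in-memory cache first.
    if user_id in cache:
        return cache[user_id]
    # 2. On a miss, fall back to the database and populate the cache.
    user = db_load(user_id)
    cache[user_id] = user
    return user
```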

The image above integrates a couple of key concepts of microservices. First off, we have scaled our User microservices using a technique known as Sharding (we’ll get to it soon).

You will notice that the Monolith has scaled to meet performance needs. In the usual case, that means we moved the database to a better machine. It’s a pretty simple model.

You upgrade the database, and suddenly you find your queries running twice as fast. That’s great.

However, you still suffer from many issues:

  • Big transactions lock the database for a couple of seconds, and everyone using the system is blocked during that time.
  • Very high usage during a short period (bursts) slows everyone down. The machine does not adapt based on usage, so you have to scale it further just to meet demand during peak times (increasing overall costs to cover peaks).
  • There is a limit to scaling up the machine, and once you hit it, you will have to think about how to scale the database next.
  • It can be costly to scale the machine; you will likely find yourself with unused resources for a while, until it starts bottlenecking again (not cost-efficient).

So let’s see how we can gain performance and solve some of those scaling issues with microservices!

Shards

By Sharding our entities, we can get better performance. What exactly does that mean?

A shard is a self-contained instance of our app that knows how to process the inquiries made to it. We can scale by adding a theoretically infinite number of shards to suit our business needs.

The idea is that each shard contains a small section of the items (e.g., Shard #1 may contain data for customers 1 to 10, Shard #2 for customers 11 to 20, etc.).

Our sharding system uses a technique to distribute customer information across the shards so that the load is spread evenly, or in whatever manner makes the most sense, ensuring the load does not all go to one shard (we want to reduce contention as much as possible!).

There are established techniques out there for this, so you don’t need to worry about it too much.
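
One common technique, sketched below under the assumption of a fixed shard count, is hash-based routing: hash the customer key and take the result modulo the number of shards, so the load spreads evenly without a lookup table. (Real systems often use consistent hashing so shards can be added without remapping everything.)

```python
import hashlib

NUM_SHARDS = 4

def shard_for(customer_id: str) -> int:
    """Deterministically map a customer to one of NUM_SHARDS shards."""
    digest = hashlib.sha256(customer_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

print(shard_for("customer-1"))  # always lands on the same shard
print(shard_for("customer-2"))  # most likely a different shard
```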

Sharding has many benefits, such as:

  • It allows us to isolate contention by shard, so if one shard is hit by a demanding task, customers on the other shards are not bottlenecked by it.
  • Heavy users who put a lot of stress on the system are contained to their shard and do not impact everyone using the application (that single shard might be under heavy load, but not the others).
  • The system stays fast and snappy for users who are not using it intensively; they do not feel the impact of the heavy consumers.
  • We add small shards running on cheap instances, so costs grow as our customer base grows (instead of running bigger instances from day one to cover future needs).
  • The gateway microservices know how to distribute messages to the proper shard. Shards do not need to know how to communicate with the outside, so you can add more of them as needed without any hassle.
  • Risk is reduced, as each shard has its own failure, backup, and monitoring strategy. If you lose data (you never want that…), you only lose it for a very small part of your system.

Overall, you find yourself with increased Availability, a more Reactive system, reduced Contention, Performance gains, and a strategy to Scale as you need.

Command Query Responsibility Segregation (CQRS)

Another part of the performance gained in the image shown above comes from a technique called Command Query Responsibility Segregation.

Since we love to use microservices, then we can add more to the equation, right?

The idea is that instead of having a small monolithic instance of a shard that processes requests in the traditional way, the shard itself has a way to scale what we will call Commands and Queries.

Commands refer to any task that involves state mutation: anything that has the intention of modifying, updating, or creating new entries in your microservice.

Queries are the requests sent to your microservice to retrieve data. Usually, you will query your database or cache to get that data.

The goal of CQRS is to split those two tasks into two different paths.

  • CQRS wants Commands to be write-only (Update database & cache).
  • CQRS wants Queries to only fetch from the cache, and never from the database. (*With the exception that at the start of the service, the cache will rebuild itself by querying the database)

What you gain from this is a way to scale whatever your shard needs most. If you have some IoT device that writes a lot, you can scale your Commands. If you have a SaaS where 90% of the queries just display information, you can scale your Queries accordingly to meet user demand.

This allows your system to be very fast, extremely efficient, and contention-free (no potential database locks!). However, you now have to deal with eventual consistency.

Because no locks are involved, a user can see an ‘in-between’ state where the write has been done but the cache has not yet been updated. For most business models this short window is acceptable; keep in mind it usually only takes an instant for everything to propagate.
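
Here is a minimal sketch of the two paths, using plain dictionaries as stand-ins for the real database and cache. The command path writes to both stores, while the query path only ever reads the cache:

```python
from typing import Optional

database: dict = {}  # durable store (stand-in for the real database)
cache: dict = {}     # read model that all queries are served from

def handle_command(user_id: str, changes: dict) -> None:
    """Command path: write-only. Update the database, then refresh the cache."""
    record = {**database.get(user_id, {"id": user_id}), **changes}
    database[user_id] = record
    # In a real system this cache refresh is often asynchronous,
    # which is exactly where eventual consistency comes from.
    cache[user_id] = record

def handle_query(user_id: str) -> Optional[dict]:
    """Query path: read from the cache only, never from the database."""
    return cache.get(user_id)

handle_command("42", {"email": "new@example.com"})
print(handle_query("42"))  # {'id': '42', 'email': 'new@example.com'}
```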

How consistency is obtained in a distributed system

Consistency is an issue you do not have to think about when using a Monolith, since you query a single endpoint.

Microservices introduce this issue because we are now distributing our system across multiple clusters of servers with small shards.

Up to now, our microservice infrastructure has always required some middleman that we called a Gateway. Gateways were responsible for knowing who is where and how to deliver messages to the right microservices. However, scaling and adding more microservices in that setup requires additional configuration to inform them of the newcomers.

Introducing another useful concept: Event Sourcing (strictly speaking, what follows is event-driven messaging over a bus, the foundation that event sourcing builds on).

With event sourcing, every microservice we add to our system subscribes to a single entity known as a Message Bus. The goal of the bus is to decouple microservices and remove direct dependencies between them.

The Message Bus can only do two things:

  • Subscribe: we listen for a specific type of message and process it if it is relevant to us (e.g., an AddNewCustomer message will be handled by our User microservice, which might then delegate it to one of its shards for processing).
  • Publish: we send a new message on the Message Bus. Someone might be interested in the message and do something about it.

To give a better example, let’s say we wanted to update a customer’s information. Because our services are all decoupled, we cannot do that with a single update transaction, but we still want to make sure all the microservices are made aware of the change (a minimal sketch of this flow follows the numbered steps below).

  1. We initially receive, from our API, a request to modify a specified user. An initial ChangeCustomerInformationById message is sent on the Bus.
  2. The User microservice is listening for those messages on the Bus; it catches the message and processes the request.
  3. After successfully updating its record, the User microservice publishes a new message called CustomerInformationUpdatedById, which is then picked up by two other microservices interested in that type of message.
  4. The Orders and Shipping microservices were listening for the CustomerInformationUpdatedById message. They will change some information on the affected orders and shipments. They do not need to publish another message afterward, but they could, if we wanted other microservices to listen for those kinds of updates.
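
The flow above maps directly onto the two bus operations. Here is a minimal in-process sketch (a real deployment would use a broker such as Kafka or RabbitMQ, but the subscribe/publish contract is the same):

```python
from collections import defaultdict
from typing import Callable

# A minimal in-process message bus: a topic name maps to its handlers.
subscribers: dict = defaultdict(list)

def subscribe(topic: str, handler: Callable[[dict], None]) -> None:
    subscribers[topic].append(handler)

def publish(topic: str, payload: dict) -> None:
    for handler in subscribers[topic]:
        handler(payload)

# User microservice: handles the change, then announces it.
def on_change_customer(msg: dict) -> None:
    print(f"User service: updating customer {msg['id']}")
    publish("CustomerInformationUpdatedById", msg)

subscribe("ChangeCustomerInformationById", on_change_customer)

# Orders and Shipping microservices: react to the announcement.
subscribe("CustomerInformationUpdatedById",
          lambda msg: print(f"Orders service: refreshing orders for {msg['id']}"))
subscribe("CustomerInformationUpdatedById",
          lambda msg: print(f"Shipping service: refreshing shipments for {msg['id']}"))

# The API sends the initial message; everything else cascades from the bus.
publish("ChangeCustomerInformationById", {"id": "42", "email": "new@example.com"})
```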

By using Event Sourcing, we can easily decouple our services. It becomes very easy to add new services without having to wire up anything (other than the message bus).

However, now that you are listening for messages, it can get a lot more complicated, code-wise, to keep track of who is listening for what, so make sure you have a good strategy for managing and handling those messages!

Aggregators

You will notice in the picture above the addition of an Aggregator microservice.

Does this mean I need to query EVERY microservice each time I need to retrieve an object structure composed of results from multiple microservices?

Fear not: that is why you have Aggregators. You can build a simple microservice whose sole purpose is to query other microservices when it starts and populate a cache, which will contain the data in a different structure, ready to be served (most likely as objects aggregated from the different microservices).

An example would be the list of orders by customer. The aggregator could store all orders per customer by querying both microservices and gathering what is needed, so it can deliver the information immediately when asked.

Your aggregator would also be subscribed to the Message Bus, listening for messages that change the data. It would therefore immediately update itself whenever there is a mutation, keeping its view up to date (e.g., when a new order is added to the Orders microservice, the Aggregator also adds that order to its list).
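
As a sketch (the message fields and rebuild source are assumptions for the example), an aggregator is just a service that builds a denormalized view at startup and keeps it fresh from bus messages:

```python
from collections import defaultdict

class OrdersByCustomerAggregator:
    """Keeps a ready-to-serve view of orders grouped by customer."""

    def __init__(self) -> None:
        self.view = defaultdict(list)

    def rebuild(self, all_orders: list) -> None:
        # On startup: query the Orders microservice once and build the view.
        self.view.clear()
        for order in all_orders:
            self.view[order["customer_id"]].append(order)

    def on_order_created(self, msg: dict) -> None:
        # Bus handler: keep the view fresh as mutations are announced.
        self.view[msg["customer_id"]].append(msg)

    def orders_for(self, customer_id: str) -> list:
        # Serve the aggregated structure directly, no joins required.
        return self.view[customer_id]

agg = OrdersByCustomerAggregator()
agg.rebuild([{"customer_id": "42", "order_id": "A1"}])
agg.on_order_created({"customer_id": "42", "order_id": "A2"})
print(agg.orders_for("42"))  # both orders, served from the local view
```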

The only issue with the Message Bus implementation is consistency. There is a slight delay between the update performed on the microservice and the cache update on the aggregator. So you have to deal with eventual consistency again, meaning that for a very short period, your Aggregator will return stale data that has already been updated on the other microservice.

If you absolutely need strong consistency, then you should not use an aggregator microservice; instead, make sure any request for that particular case is sent directly to the microservice responsible for that query.

It will be less efficient in terms of performance, but you get strong consistency, which is the trade-off in this scenario.

Closure Notes

It is more important now than ever to rethink your architecture choices.

Users are the ones driving usage of a product. If your product is slow, unresponsive, or in maintenance mode every other day, and a new competitor shows up without these issues, you might find your customer base leaving your product for theirs.

It may seem trivial to you. “It’s just a couple of minutes down, no one will bother”, but what do you think your users are potentially doing while it’s down?

➡️ Yes, if they are frustrated (which they might be), they could be shopping around for alternatives while it’s down.

Remember that the way you want to store data is usually not the way you want to consume it. The beauty of microservices is that you can focus on the right way to store data, and then have a proper data view or aggregator that serves the data in the way you want to consume it.

  • Your front-end devs are happy with a clear, easy-to-consume API
  • Your back-end devs are happy with a proper datastore that can evolve easily over time for each individual microservice

I hope this will serve as your first step into microservices and have you reconsider, or discuss with your colleagues, how your current infrastructure is designed. Who knows, maybe it’s due for a little refresher?

Co-Founder at Ikarus, I am passionate about new technologies and I love to share my knowledge with other developers.