Reactive Microservices in Practice

Lucian Oprea
METRO SYSTEMS Romania
15 min read · Dec 2, 2019

Building services that tolerate failure and thrive under varying load conditions.

Why should I care about Microservices?

First, I want to recap why we would care about microservices at all. What do we gain?

Isolation is the most important feature of microservices, and it is the foundation for many of their benefits.

We gain ease of deployment

In monolithic applications we have thousands of lines of code, so there is a higher risk in deploying the app, even if you have made just a few modifications. In practice, these deployments happen very seldom. Consequently, changes accumulate, and the bigger this delta of changes, the higher the risk of something bad happening. We get into a vicious circle.

But isolation helps you adopt continuous delivery, because you can deploy incrementally, service by service.

We gain organizational alignment

Isolation also has an impact on architectural design, and will therefore affect how we split teams and their responsibilities.

Most of us have probably experienced the problems that come with big teams and a big codebase. And we all know that smaller teams responsible for smaller codebases are more productive.

Microservices offer the promise of better alignment between the organization and the architecture. Therefore, it becomes easier to find the sweet spot of team size and productivity.

Microservices are Technology Agnostic

Meaning that we can use different tools for different jobs. Actually, we can use the best tool for each use case. If I want to develop a social media application, then I can choose to model the relationships between friends in a graph database such as Neo4j, and the posts and statuses in a document database such as MongoDB.

Composability

Another key promise of distributed systems is the reuse of functionality. The same functionality can be used in many ways and by different consumers. However, to achieve this we should apply the Single Responsibility Principle, meaning that a microservice should have a single reason to exist. This way, the system will be easy to understand, maintain and extend.

Own its state

If we have to work with stateful logic, then we should keep the scope of that logic as small as possible, with clear boundaries.

Stateful logic, or mutability, is necessary, since software that doesn't produce changes is useless.

But mutability is also complex. For a mutable variable, we should know what it means, where it was first assigned, and where it can be changed. And if I have situations where variable A must change in some arbitrary part of the code so that B happens in some other part of the system, then I have remote complexity.
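
To make this "remote complexity" concrete, here is a small, contrived Java sketch (the class and field names are hypothetical). The behavior of CheckoutService silently depends on a mutation that may happen anywhere else in the codebase:

// Shared mutable state: every reader must know about every writer
// to be able to reason about the program.
class PricingConfig {
    static boolean discountsEnabled = false; // who sets this, and when?
}

class CheckoutService {
    double priceOf(double base) {
        // The result changes depending on a flag mutated in some
        // arbitrary, distant part of the system.
        return PricingConfig.discountsEnabled ? base * 0.9 : base;
    }
}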

Microservices will help to minimize the scope of this mutability.

Stay Mobile, But Addressable

With the rise of cloud computing, virtual machines and Docker, we have a lot of power at our disposal to manage hardware resources. But none of this matters if we are locked into a specific topology or deployment scenario.

And microservices are agnostic in this matter: we can move them at runtime, while the system is being used. We have different strategies for achieving this, for example service discovery, blue-green deployments and so on.

There are of course other benefits, such as scalability and resilience. But we want to keep this short.

The Trouble with Distributed Systems


Now, let’s not forget that microservices are distributed systems that are well known for being problematic. Here are a few things that can happen in distributed systems.

Partial Failure

When you write an application for a single computer, it normally behaves in a predictable way: it either works or it doesn't. Software with bugs may give the appearance that the application is "having a bad day" (a problem usually resolved by a reboot), but that's just a consequence of badly written software.

Given the same set of operations, I should always receive the same set of results. There is no reason why software on a single computer should behave erratically.

Even for hardware issues, the people who design computers prefer to crash the computer rather than return wrong results, because wrong results are difficult and confusing to deal with.

In a distributed system, at any point in time, some parts may be working perfectly fine while other parts are failing. Hence the name partial failure.

The difficulty appears when you make a request that depends on multiple nodes. Say the request depends on services A, B and C. If node A does not respond I get one type of response, if node C does not work I get another type of response, and if both A and C are down I get yet another type of response. Basically, the response is non-deterministic.

There is also the case where I don't even know which ones worked and which ones didn't, because these nodes are connected via a network.

Unreliable networks

The internet and the local networks inside datacenters send data using asynchronous packet transfer. Therefore, there is no guarantee that packets will arrive at their destination, or that we will receive a response back.

I’m going to give just some examples of what might happen when you send a request to a server.

  1. Your request may have been lost (perhaps someone unplugged a network cable)
  2. Maybe your request is waiting in a queue
  3. Your request has reached the server, but the server has crashed or has been powered down
  4. Perhaps the server has stopped responding because of a long garbage-collection pause, and it will respond later
  5. Or maybe the remote server has processed your request, but the response was lost on the way back because a switch was misconfigured or a firewall rule dropped it
  6. Or the request has been processed, but the response is delayed because the network on your machine is overloaded

The conclusion is that if you send a request over the network and don't get a response back, it's impossible to tell why.
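
To see why this matters in practice, here is a minimal sketch using Java's built-in HttpClient (the endpoint is hypothetical). All a timeout tells us is that no response arrived in time; it cannot tell us which of the situations above actually happened:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpTimeoutException;
import java.time.Duration;

public class TimeoutDemo {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://example.com/orders")) // hypothetical endpoint
                .timeout(Duration.ofSeconds(2))                // give up after two seconds
                .build();
        try {
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode());
        } catch (HttpTimeoutException e) {
            // We only know that no response arrived in time. The request may
            // have been lost, queued, or fully processed on the server side,
            // so retrying blindly risks executing the action twice.
            System.out.println("timed out; outcome unknown");
        }
    }
}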

Unreliable clocks

Time is used everywhere in apps, whether we want to trigger a process, invalidate a cache or set a timeout. But there is always latency, so you can't determine the order in which things happened.

Even if you used a timestamp: each machine has its own device to measure time, usually a quartz clock, and these are not very reliable. So every machine has its own notion of time.
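
A small Java illustration of the consequence: the wall clock can jump when NTP corrects the quartz clock, so it is unsafe even for measuring durations on one machine, let alone for ordering events across machines. A minimal sketch:

public class ClockDemo {
    public static void main(String[] args) throws InterruptedException {
        // Wall-clock time: may jump forwards or backwards when NTP corrects
        // the machine's quartz clock, so the difference below is unreliable.
        long wallStart = System.currentTimeMillis();
        Thread.sleep(100);
        long wallElapsed = System.currentTimeMillis() - wallStart;

        // Monotonic clock: only moves forward, suitable for elapsed time
        // on a single machine (but meaningless when compared across machines).
        long start = System.nanoTime();
        Thread.sleep(100);
        long monotonicElapsed = (System.nanoTime() - start) / 1_000_000;

        System.out.println(wallElapsed + " ms vs " + monotonicElapsed + " ms");
    }
}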

Strong consistency

Strong consistency requires coordination, which is very expensive in distributed systems, and it puts an upper limit on scalability, availability, latency, and throughput.

The need to coordinate between services means that the application will not make progress; it will spend most of its time trying to reach consensus.

I will have higher latency because I wait for more nodes to respond. And if latency is higher, throughput is lower, because I can process fewer requests per second.

Scalability is also affected, because if I add more nodes to the system, I have more nodes to synchronize.

Availability suffers as well, because to achieve it we usually add more nodes, and every extra node is one more participant in the coordination.

The idea is that strong consistency is difficult to achieve. In this type of system we should rely on eventual consistency, and we should abandon the idea of transactions as the default option. Transactions still have their place, but we should not reach for them first.

We should also not forget that in these systems there is more than one database. Data is no longer normalized across the system, and you can't do joins between services.

Tools to manage the complexity of distributed systems

Event-Driven Services

Event Buses

Here I have an event bus, which could be a queue of events such as Kafka or RabbitMQ.

The basic idea of this design is the inversion of control.

I will not have a system where I command service A to do this and service B to do that. Instead, I have a system where I publish events to a queue, and whoever is interested can listen and react to those events.

As a result, my dependencies are decoupled and each service is autonomous.

This gives me great flexibility: I can move, scale or remove my service without worrying that others will be affected, basically without risk.

On one side of the queue I have the producers, one or many, and on the other side I have the consumers, one or many. These producers and consumers can be anything: big-data analysis tools, other databases, services, or even other queues.
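
As a minimal sketch of this shape, here is what publishing and consuming an event might look like with Kafka's Java client. The broker address, topic name, consumer group and payloads are all hypothetical:

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class OrderEventsDemo {
    public static void main(String[] args) {
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092"); // hypothetical broker
        producerProps.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        // The producer publishes the event and moves on; it neither knows
        // nor cares who will consume it.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.send(new ProducerRecord<>("orders", "order-42", "ORDER_PLACED"));
        }

        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "billing"); // each interested service uses its own group
        consumerProps.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        // Any service interested in order events subscribes and reacts;
        // the producer is never blocked waiting for it.
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            consumer.subscribe(List.of("orders"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(record.key() + " -> " + record.value());
            }
        }
    }
}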

Keep in mind that all of this processing is eventually consistent and the communication is asynchronous; I won't have transactions and strong consistency. We still need point-to-point communication and transactions as well, but keep in mind that synchronous communication is overused: it's the default, even though there are actually few cases where the business requires it.

And we should also consider event-based communication, because it brings a lot of advantages.

From commands to events

Commands represent an intention: I ask someone to execute some action, and I expect some result. However, depending on the state of the receiver, I may get different, unexpected results.

With events, on the other hand, I have no expectations: I just publish a piece of information, and whoever is interested can react to it.

And events have some interesting properties: they are immutable, which means they can't be modified or retracted. We can't change the past, even if we would like to.

Events accumulate, just like knowledge, and we can only invalidate them by adding new events that override the old ones.
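
In code, this suggests modeling an event as an immutable value: all fields final, no setters. A minimal sketch, with hypothetical names:

import java.time.Instant;

// An event is an immutable fact. To "undo" it we don't mutate it; we
// publish a new, compensating event that overrides it.
public final class PriceChanged {
    private final String productId;
    private final double newPrice;
    private final Instant occurredAt;

    public PriceChanged(String productId, double newPrice, Instant occurredAt) {
        this.productId = productId;
        this.newPrice = newPrice;
        this.occurredAt = occurredAt;
    }

    public String productId() { return productId; }
    public double newPrice() { return newPrice; }
    public Instant occurredAt() { return occurredAt; }
}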

Reactive programming

It’s about programming with streams of events where I can choose to react on them asynchronously. The stream of events could be anything such as clicks from UI, GPS signals, packets over the network, new orders in online shopping, results from a database and so on. To these events, I can register a callback function that will execute when a certain event appears. Only that these callbacks are designed in a way that is easy to combine and compose so that I don’t arrive in the situation named callback hell with which Javascript programmers are probably very familiar. Additionally, I also have plenty of operators to map, filter and all sorts of out of the box functionalities.

Now, because reactive programming is a declarative, functional programming model, it's very readable and expressive, as you can see in this example.

userService.getFavorites(userId)                       // stream of the user's favorite IDs
    .flatMap(favoriteService::getDetails)              // fetch details for each ID, asynchronously
    .switchIfEmpty(suggestionService.getSuggestions()) // fall back to suggestions if there are no favorites
    .take(5)                                           // keep at most five results
    .publishOn(UiUtils.uiThreadScheduler())            // deliver results on the UI thread
    .subscribe(uiList::show, UiUtils::errorPopup);     // show each item; errors go to the popup handler

Here, I take a bunch of favorites' details for a user, and what's interesting is that I can declare where and how I want these requests to run: on a single thread, on a thread pool, or wherever, using the publishOn operator.

Another cool fact is that errors are not treated as exceptions or as something unusual, as in core Java; they are part of the normal flow, as you can see in the last function, subscribe.

Key benefit. Now, this is all nice and cool, but the key benefit of reactive programming is actually the ability to scale by several orders of magnitude with a small, fixed number of threads and a small amount of memory.

This is simply because we don't block a thread to wait for a response; we release that thread and pick the work up again only when the response is available.
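
A minimal illustration of this idea with Java's built-in HttpClient (the endpoint is hypothetical): the calling thread registers callbacks and moves on instead of blocking until the response arrives.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.CompletableFuture;

public class NonBlockingDemo {
    public static void main(String[] args) {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://example.com/data")) // hypothetical endpoint
                .build();

        // sendAsync returns immediately: no thread is parked waiting on
        // the network. The callbacks run once the response is available.
        CompletableFuture<Void> pending =
                client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
                      .thenApply(HttpResponse::body)
                      .thenAccept(body -> System.out.println("got " + body.length() + " chars"));

        System.out.println("request in flight; this thread is free to do other work");
        pending.join(); // demo only: keep main alive until the response lands
    }
}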

Applying Reactive to our project

So, we have some good tools to deal with distributed systems: event-driven design and reactive programming.

Applying Reactive Systems Traits

And we decided to apply them to our project. Our goal was to build a reactive system: one that is elastic, in order to cope with peaks and spikes of load, and resilient, so that failure is contained to just some services and does not propagate to the whole system.

In other words, we wanted to create a responsive system under all conditions, be it load or failure.

Just for context, our application is a platform that offers services to multiple applications and projects of a big commerce group. So you can imagine that we get millions of requests every day.

Since it’s a platform, the client’s applications can come and go any time, and they could come with an unpredictable load. Therefore, we needed our system to have the characteristics of a reactive system.

Now, we already had some events and queues, so the next natural step was to apply reactive programming.

As I said, the key benefit of reactive programming is the ability to scale a lot with a small number of threads.

As an example, an HTTP container implemented with reactive programming can manage 100k connections per second without any problem, on an ordinary machine such as your laptop. Meanwhile, an HTTP container based on thread-per-connection or thread pools, such as Tomcat or Jetty, can deal with about 1k connections per second, and that only when the server is well configured and tuned.

This means I can have a more robust application, because now I can scale in a more predictable way: I can manage as many requests as needed and I can deal with peak loads.

The only condition for observing these benefits is that the work is not purely computational; there must be some processing that involves latency, such as a request to an external resource.

We said: great, we only have requests to external resources and databases.

But at some point we realized that in order to process 100k requests per second, we must not make any blocking request. Now, if your project is like most Java applications, chances are that you are using blocking code everywhere: a query to a relational database or a request to a REST API is a blocking call.

Our stack consisted of an HTTP container that accepts incoming HTTP requests on one side and JDBC or REST APIs on the other side. In between, we have some layers that handle the business logic.

And so we couldn’t make use of the true power of reactive programming.

Why such a big difference in scalability between blocking and non-blocking requests?

Continuing with the HTTP container example, a classic implementation with thread-per-connection or thread pools has big problems when it comes to handling 10k concurrent connections.

A thread-per-connection container might scale well up to 10k, but considering that one thread in a Java process consumes about 1 MB of memory to hold its stack, 10k connections already mean roughly 10 GB of stack memory, so you'll probably get OutOfMemory errors. Not to mention the time spent context-switching between 10k threads on a computer with 4 CPU cores.

With thread pools, threads are recycled: I don't spawn a new thread every time a connection comes in. This is how the most popular HTTP containers, such as Tomcat or Jetty, are implemented.

However, the number of concurrent connections is limited to the number of threads in the pool, which is usually around 100. Extra connection requests are added to a queue and processed later.
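
A rough sketch of this model using Java's ThreadPoolExecutor (the numbers are illustrative, not any container's actual defaults): requests beyond the pool size pile up in the queue.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolDemo {
    public static void main(String[] args) {
        // A fixed pool of 100 workers with an unbounded queue in front,
        // similar in spirit to a servlet container's worker pool.
        ExecutorService pool = new ThreadPoolExecutor(
                100, 100, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>());

        for (int i = 0; i < 10_000; i++) {
            final int id = i;
            // Request number 101 and beyond sit in the queue, whether or
            // not the client is still waiting for the answer.
            pool.submit(() -> handleConnection(id));
        }
        pool.shutdown();
    }

    static void handleConnection(int id) {
        try {
            Thread.sleep(50); // simulate blocking I/O towards a backend
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}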

What is even worse, these queued requests will be processed even when the client is no longer interested in the response because it has given up; my server still processes them.

The fundamental problem with these threads is that they are blocked while idle, doing nothing; they just wait for a response from an external resource. When Java programmers have to deal with I/O operations, they spawn new threads, in order to absorb the time spent on the network when they have to make a lot of calls to an external resource.

To avoid blocking the thread that makes a request, we should change the programming model. Instead of creating new threads or thread pools to absorb this waiting, we register a callback and release the thread. If you have worked with Node.js, you are probably familiar with this concept, called the event loop.

But if you are in a Spring context, as a Java programmer, then you'll probably need different, non-blocking libraries to achieve these goals.

For our usual stack we should have:

  1. A non-blocking HTTP container such as Netty or Undertow
  2. A non-blocking database driver; NoSQL databases such as Cassandra or MongoDB offer this out of the box, while SQL databases are still catching up
  3. And if you make HTTP requests of your own, a non-blocking client such as WebClient from Spring WebFlux

And when someone makes a request to your app, we can go ahead, make the request to the database, and register a callback. When the result is ready, I pass the response to the requester without blocking, as in the sketch below.
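
Here is a minimal sketch of such a handler, assuming Spring WebFlux; the service URL and return type are hypothetical. The method returns immediately with a Mono, and the container writes the response once the downstream call completes:

import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;

public class UserHandler {
    // Hypothetical downstream service; WebClient is Spring's non-blocking HTTP client.
    private final WebClient webClient = WebClient.create("http://user-service");

    public Mono<String> getUser(String id) {
        return webClient.get()
                .uri("/users/{id}", id)    // non-blocking call: a callback is
                .retrieve()                // registered, no thread is parked
                .bodyToMono(String.class); // waiting for the response
    }
}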

The trick is to not block anywhere because if you do, you just move the problem somewhere else.

Conclusion for reactive programming. We saw that it's a powerful tool to have. But before using it, please check whether you have any JDBC or JPA dependencies. You can still use reactive programming with them, but not to its full capacity.

And also keep in mind that the learning curve is pretty steep, so we might not want to apply it everywhere.

Resilient system

I said in the beginning that microservices provide isolation at the business logic level. Each microservice offers some kind of functionality.

But one service can have one or more consumers. Since our application is a platform, we had many applications of the commerce group consuming our services; we could have tens of consumer applications for the same service.

And if there is a noisy client, it can impact all the other clients: a large number of requests coming from one client can exhaust the resources of a service, so other consumers can't use it anymore, or they experience long waiting times.

Furthermore, in a distributed system, chances are that someone is a client of my client. If my client processes requests slowly, so will all the clients downstream, and we can end up with a cascading failure.

Ships use bulkheads for resilience

This is the concept of bulkheads, well known in the shipbuilding industry: the technique of dividing a ship into watertight, isolated compartments. If something happens and water gets into some of the compartments, the water is contained and does not leak into the rest of the ship, which can continue to function.

The same principle can be applied in software development, but the isolation must be complete.

Probably you’re thinking about Titanic right now as a counterexample. Actually Titanic did use bulkheads to split the ship in compartments, but the walls that were supposed to isolate water leaking did not reach all the way to the ceiling, because they made some calculation and said that even if you rip some walls water can’t reach the top. But when Titanic hit the iceberg, and 6 of 16 walls were ripped, it actually tilted and the water spilled over the bulkheads.

That's just an example of what might happen when you don't have proper isolation.

There are many strategies out there to reach better isolation.

One is to map clients to individual instances of a service. That way, we isolate customers from one another, and we prevent a failure from affecting the whole service. But we didn't really like this, because it locks clients to specific machine instances.

Another solution would be to simply use autoscaling: when the load is high, you spawn new instances. But autoscaling is a slow process. It takes time to scale, since autoscaling involves not only the instances themselves but also the resources those instances depend on. And when there is just a short spike of load, we have scaled for nothing.

An alternative solution is to allow client applications to use resources only up to a certain threshold. The system monitors the current load and rejects extra requests, so that we can continue to respect our SLA. There are multiple strategies to implement this throttling mechanism; one of them is to simply reject clients that make more than N calls per second, for a certain period of time, as in the sketch below.
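
Here is a naive sketch of such a throttle in Java: a fixed one-second window with a per-client counter. The limit and the names are illustrative; a production implementation would rather use a token bucket or a sliding window, and share its state across instances.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class Throttle {
    private static final int MAX_PER_SECOND = 100; // illustrative limit
    private final Map<String, AtomicInteger> counts = new ConcurrentHashMap<>();
    private volatile long windowStart = System.nanoTime();

    // Returns false when the client has exceeded its budget; the caller
    // would then reject the request (e.g. with HTTP 429 Too Many Requests).
    public boolean allow(String clientId) {
        long now = System.nanoTime();
        if (now - windowStart >= 1_000_000_000L) { // start a new one-second window
            counts.clear();                        // (racy, but fine for a sketch)
            windowStart = now;
        }
        int calls = counts.computeIfAbsent(clientId, k -> new AtomicInteger())
                          .incrementAndGet();
        return calls <= MAX_PER_SECOND;
    }
}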

Autoscaling + Throttling

But an even better solution, the one we applied, is to combine scaling with throttling: keep throttling as a temporary measure, and only if you observe that the load persists do you scale up.

Conclusion

In conclusion, microservices, just like humans, need to communicate and collaborate to reach consensus. And just as with humans, it's in the collaboration that most difficulties appear.

So what’s difficult in designing microservices is not individual microservices, but the space between them.

Bibliography

  1. Martin Kleppmann, Designing Data-Intensive Applications, O’Reilly Media, March 2017
  2. Sam Newman, Building Microservices, O’Reilly Media, February 2015
  3. Jonas Bonér, Reactive Microsystems, O’Reilly Media, August 2017
