A Failure to Communicate
Building microservices means that you will call other microservices; that is, after all, the point. People expect it to be easy: one service calls another and gets a result. In practice it rarely is. Often you need to call multiple services and then combine their results, and you run straight into the problem of fan-out and composition.
In a monolithic codebase, especially a blocking one, this is easy. You call a method or function that returns a list, iterate over the items, and if you want to do something with each item, you call another method.
This simple task is not so simple in a microservice environment. No matter how you slice it, your application must deal with the distributed nature of microservices. It doesn't matter that your code is blocking: it is now running on a different CPU than it was before. That function call may take thousands or even millions of times longer to return, and it can fail in new and unpredictable ways, if it returns at all.
What makes fan-out and composition such a challenge for developers to deal with? Does it need to be so hard? I would contend that how we handle communication between services is at the root of the increased complexity.
To demonstrate this point, let's look at communication in an office. If you work in an office with 4 people, it's easy to communicate: there are only 24 (that is, 4!) possible orderings in which people can talk to one another. But what happens when the number of people in the office grows? Going from 4 to 40 people takes you to 40!, roughly 8×10⁴⁷ ways to communicate, more possible conversations than could ever take place!
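The figures above follow the factorial of the head count (counting orderings of conversations); a quick sanity check:

```python
import math

ways_4 = math.factorial(4)    # 4! = 24 ways for a 4-person office
ways_40 = math.factorial(40)  # 40! is roughly 8.16 x 10^47
```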
Obviously, a communication strategy that works for a 4-person office would be a disaster and lead to chaos if used for a 40-person office. The same is true with microservices.
How often do you hear "the network is unreliable" as an explanation for unreliable microservices? Is the network really to blame? Or is it engineers applying the same techniques they used in monoliths to networked systems? Communication between microservices is not without its challenges. But, like communication in an office, it is a problem that is surmountable given the right approach.
Let’s consider some attributes that would make microservice communication easy, and would result in simple fan-out and composition…
1) Async Abstraction
First, microservices are asynchronous; there's no getting around it. Even if your application blocks, the system as a whole is still asynchronous once you cross the binary boundary. If you don't address this properly and try to block anyway, you will have to use a circuit breaker like Hystrix to bulkhead requests, which creates an additional burden on the developer. What you really want is an abstraction that makes modeling async code easy.
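As a minimal sketch of such an abstraction, here is the idea in Python's asyncio (the service and function names are invented, and a real implementation would make a network call rather than sleep): the remote call is modeled as an awaitable, so the call site reads like blocking code without ever tying up a thread.

```python
import asyncio

async def fetch_user(user_id: str) -> dict:
    # Stand-in for a network call to another microservice; the await
    # point is where the runtime is free to do other work.
    await asyncio.sleep(0.01)  # simulated network latency
    return {"id": user_id, "name": "user-" + user_id}

async def main() -> dict:
    # Reads like a blocking call, but never blocks a thread.
    return await fetch_user("42")

result = asyncio.run(main())
```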
2) Easy Composition
The second thing you want is to easily compose requests. Microservices constantly require the developer to take a result from one service and call another with it. When your code is synchronous and on-box, that isn't a big deal, but now a result can come from anywhere, and you don't control when it arrives. As previously mentioned, this means dealing with asynchronous code. But how? One way is to model it with callbacks and listeners. That is manageable when you only have one or two calls to deal with, but that rarely happens in the microservice world. You need to call several services at once, asynchronously, and then combine the results. So not only do you need something that makes dealing with asynchronous code easy, you need something that makes composing asynchronous calls easy as well.
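A sketch of that kind of composition using asyncio.gather (the two service functions and their data are invented for illustration): fan out to both services concurrently, then compose the results once both arrive.

```python
import asyncio

async def get_profile(user_id: str) -> dict:
    await asyncio.sleep(0.01)  # simulated call to a profile service
    return {"id": user_id, "name": "Ada"}

async def get_orders(user_id: str) -> list:
    await asyncio.sleep(0.01)  # simulated call to an order service
    return [{"order": 1}, {"order": 2}]

async def get_dashboard(user_id: str) -> dict:
    # Both calls run concurrently; gather resumes when both complete.
    profile, orders = await asyncio.gather(get_profile(user_id),
                                           get_orders(user_id))
    return {**profile, "orders": orders}

dashboard = asyncio.run(get_dashboard("42"))
```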
3) Streams
The third thing you need to deal with is collections of data. In a monolith, it is very easy to return a list of data, or an iterator. This gets trickier once you move off-box. There are various techniques for dealing with it, but many of them are leaky abstractions that expose details of the underlying implementation that should ideally be hidden away. Developers end up keeping track of paging or stitching together blocks of data by hand. What you really want is an abstraction like a stream.
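A stream abstraction can be sketched with a Python async generator (the paged data here is invented): the consumer simply iterates items, while paging stays hidden inside the stream.

```python
import asyncio

async def list_items():
    # A real implementation would fetch each page from a remote
    # service; the consumer never sees page boundaries.
    pages = [[1, 2], [3, 4], [5]]
    for page in pages:
        await asyncio.sleep(0)  # simulated per-page network fetch
        for item in page:
            yield item

async def consume() -> list:
    # The caller sees one flat stream of items.
    return [item async for item in list_items()]

items = asyncio.run(consume())
```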
4) Application-level Flow Control
The fourth thing you're going to need is application-level flow control, i.e. back-pressure. There are lots of examples of byte-level flow control, and they work very well for the problem they are intended to solve. The problem is, they don't necessarily protect applications.
There are many cases where the application can't process data as fast as it arrives. Without application-level flow control, the developer is left to deal with this themselves, constantly tuning queue sizes, circuit breakers, and rate limiters. What you really want is for your microservices to negotiate, transparently to the developer, how much traffic each can handle.
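A minimal illustration of the idea using a bounded asyncio queue (the queue size and data are invented): because the queue is bounded, the producer is suspended whenever the consumer falls behind, instead of buffering work unboundedly.

```python
import asyncio

async def producer(queue: asyncio.Queue, n: int) -> None:
    for i in range(n):
        # put() suspends when the queue is full, so the producer is
        # slowed to the consumer's pace: back-pressure in miniature.
        await queue.put(i)
    await queue.put(None)  # sentinel: no more data

async def consumer(queue: asyncio.Queue) -> list:
    out = []
    while (item := await queue.get()) is not None:
        await asyncio.sleep(0.001)  # simulated slow processing
        out.append(item)
    return out

async def main() -> list:
    queue: asyncio.Queue = asyncio.Queue(maxsize=2)  # bounded queue
    _, received = await asyncio.gather(producer(queue, 10),
                                       consumer(queue))
    return received

received = asyncio.run(main())
```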
5) Load-balancer designed for Microservices
The fifth thing you need is a load-balancer designed for microservices. Simple round-robin or least-loaded strategies can cause thundering herd problems and cascading failures that can bring down your entire application.
Consider a service with several healthy instances and just one unhealthy instance. Under normal conditions, the median response time is 50ms with a 55ms p90. The unhealthy instance has a 5ms p50 but a 550ms p90. Although the unhealthy instance has a better median response time, one out of every ten calls to it will be ten times slower than the healthy p90.
This represents a big problem when you think about how microservices work. In scenarios with lots of fan-out and composition, the tail latency of the slowest service tends to dominate the overall response time. For example, take two services A and B, where A makes 5 parallel calls to B each time A is called. Service A's own processing time is only 1ms, and B's processing time follows one of the latency profiles above. If the load balancer uses a simple round-robin strategy and one of four instances of B is unhealthy, there is a 12% chance that A's response time will exceed 550ms, making A's latency profile even worse than that of the unhealthy instance of B! Tail latency thus has an out-sized impact on upstream callers in a microservice system. This can be mitigated with predictive load balancing, which tracks each instance's performance over time and penalizes unhealthy instances as soon as they are detected.
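The 12% figure can be checked directly, assuming round-robin gives each call a 1-in-4 chance of hitting the unhealthy instance, and that instance's p90 means 1 call in 10 exceeds 550ms:

```python
# Each of A's 5 parallel calls to B: 1/4 chance of the unhealthy
# instance (round-robin over 4 instances) times 1/10 chance of
# landing in its slow tail (beyond the p90, i.e. over 550ms).
p_slow_call = (1 / 4) * (1 / 10)        # 0.025 per call

# A is slow if at least one of its 5 parallel calls is slow.
p_a_slow = 1 - (1 - p_slow_call) ** 5   # ~0.119, about 12%
```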
6) Cancellation
The sixth thing you need is the ability to cancel work. Since microservices constantly fan out calls to multiple services, it's not uncommon for one of the services you call to return an error. Without cancellation, you waste time and resources on servers processing requests whose results are no longer needed. This can get bad enough that it leads to outages.
Another scenario for cancellation involves streaming. In a monolith, you can call a function or method that returns a list. If the list has 100 items but you need only the top 5, you iterate over it and take the first 5. The same thing can be done with streams when you have cancellation: consume the first 5 items emitted from the stream, then cancel the request so the other service stops emitting data.
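A sketch of that pattern with a Python async generator (the stream and its contents are invented): consume five items from an effectively unbounded stream, then stop consuming; this is the point where a real protocol would send a cancellation frame upstream so the producer stops doing work.

```python
import asyncio

async def infinite_stream():
    # Stand-in for a remote service emitting items indefinitely.
    i = 0
    while True:
        await asyncio.sleep(0)  # simulated item arriving off the network
        yield i
        i += 1

async def take(stream, n: int) -> list:
    out = []
    async for item in stream:
        out.append(item)
        if len(out) == n:
            # Stop consuming; a real protocol would cancel the
            # request here so the producer stops emitting.
            break
    return out

first_five = asyncio.run(take(infinite_stream(), 5))
```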
7) Routing
The seventh thing you need is an easy way to route between services, and between methods within a service. In a monolith, this is easy to do with namespaces and libraries: if you want to call a method, you reference a namespace and then call a function within it. You need something similar for microservices to make them easier to use.
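A toy sketch of route-based dispatch in Python (the route names and handlers are invented): "service.method" route strings play the role that namespaces and function names play in a monolith.

```python
def add(a: int, b: int) -> int:
    return a + b

def upper(s: str) -> str:
    return s.upper()

# Hypothetical routing table mapping "service.method" route strings
# to handlers, the way a namespace maps names to functions.
ROUTES = {
    "math.add": add,
    "text.upper": upper,
}

def dispatch(route: str, *args):
    # Look up the handler for a route and invoke it; a real broker
    # would carry the route in request metadata instead.
    return ROUTES[route](*args)
```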
8) Discovery with presence notification
The eighth thing you need is a way to discover when services are present and ready to send or receive requests. In the monolith world, this was done with DNS and maybe with a load-balancer. In the microservice world, where everything is async, you want to ask for a service and then be notified of its presence when it’s ready.
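One way to sketch presence notification, again with asyncio (the Registry class, service name, and address are invented for illustration): rather than resolving a name once, the caller awaits the service's presence and is woken the moment it announces itself.

```python
import asyncio

class Registry:
    """Toy service registry: callers ask for a service and are
    notified, via an Event, once it announces itself as ready."""

    def __init__(self):
        self._present = {}    # service name -> asyncio.Event
        self._addresses = {}  # service name -> address

    def _event(self, name):
        return self._present.setdefault(name, asyncio.Event())

    def announce(self, name, address):
        self._addresses[name] = address
        self._event(name).set()  # wake anyone awaiting this service

    async def discover(self, name):
        await self._event(name).wait()  # suspend until service appears
        return self._addresses[name]

async def main():
    registry = Registry()
    # Ask for the service before it exists; the lookup simply waits.
    lookup = asyncio.create_task(registry.discover("billing"))
    await asyncio.sleep(0.01)  # the service comes up later
    registry.announce("billing", "10.0.0.7:8080")
    return await lookup

address = asyncio.run(main())
```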
To recap, the 8 items are:
1) Async Abstraction
2) Easy Composition
3) Streams
4) Application-level Flow Control
5) Load-balancer designed for Microservices
6) Cancellation
7) Routing
8) Discovery with Presence Notification
In my next blog post I will detail a service that shows how these attributes can be used to make a complicated service call graph simple.