Coupling in Microservices, Part 1: Single vs. Multi-Service

Daniel Orner
Flipp Engineering
Feb 28, 2020 · 9 min read

Here at Flipp, we use Event-Driven Microservices pretty heavily. In particular, we use Apache Kafka as our data backbone. Microservices are (from a historical perspective) a relatively new pattern in computer science. One of the main touted advantages of microservices is the ability to decrease coupling between different pieces of functionality.

“Low coupling” is a term originally used to describe the relationship between modules or classes within a single codebase. More recently, it’s been co-opted in service-based architectures to describe the advantages of keeping different services independent.

Some of these advantages include:

  • Reducing the “blast radius” of issues; if one service goes down, the others can stay up;
  • Reducing the things that need to be changed when making code changes (if something is independent then theoretically only that part of the code needs to be changed and deployed);
  • Ability to scale different parts of the code differently if they have different needs (e.g. we may need only one front-end UI container, but five backend API containers if there are other services calling it);
  • Each service focuses on a single task to make each piece more easily digestible by programmers and more easily testable;
  • Ability to limit the access requirements of each service to tailor it to its own needs (no need for a single service to access a database, a messaging topic, a Redis cluster, an external API, other HTTP services, etc.)
  • Ability to make changes more nimbly (it’s easier to upgrade a library that’s used by less code).

I’d like to take a deep dive into what low coupling means and how it translates to real-life improvements.

What are we trying to improve?

In software architecture, there are always a number of things to consider before choosing a design. Some of these are on the technical level (speed, cost, reliability) while others are on the human level (ease of understanding, maintainability). You can consider each of these areas as KPIs (Key Performance Indicators) where we want to get the best possible result. Most KPIs come with tradeoffs that need to be considered.

Standard caveats:

  • I will be discussing a standard web application — let’s say hundreds of requests/messages per second, and tens of thousands of users.
  • Assume that the system uses some kind of relational database, although you can generally swap in a NoSQL or similar solution without much difference.
  • Assume that all services are in the same programming language.

In each of the following sections, I will be focusing on a single KPI at a time (note that the absolute best thing for one KPI can be disastrous for a different one — I am not advocating for any of these local optima!) and discussing what variables can affect it.

Speed and Performance


No matter what you’re doing, systems work faster if there’s zero network latency. A function call is always more performant than either a synchronous (HTTP, RPC) or an asynchronous (message broker/queue) message. This seems to indicate that putting services that constantly talk to each other into the same process will be the best bet for performance. This is particularly true of synchronous APIs.

This does depend on scaling issues — if your single service is too beefy to be reasonably scaled, then splitting it up can increase total throughput even if an individual task may take longer.

Performance in a multi-service setup can be improved by having all your services write their updates to a single data store. That way, updates are visible immediately across all services, with no propagation delay. (Remember my caveat above, though — there are plenty of reasons why this can be a terrible idea!)
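
The in-process vs. cross-process difference can be sketched in a few lines. This is an illustrative toy, not a benchmark: an in-memory `queue.Queue` stands in for a real broker like Kafka, and `resize_image` is a made-up workload. The point is that the queued version adds an extra hop (enqueue, context switch, dequeue) that the plain function call simply doesn’t have.

```python
import queue
import threading

# In-process version: a plain function call, with no serialization or
# network hop involved. (resize_image is a hypothetical workload.)
def resize_image(name: str) -> str:
    return f"resized:{name}"

# The same work split across a "service" boundary, sketched with an
# in-memory queue standing in for a message broker. The extra hop is
# pure overhead from the caller's point of view.
requests: "queue.Queue[str]" = queue.Queue()
responses: "queue.Queue[str]" = queue.Queue()

def worker() -> None:
    # Stand-in for the consuming service's processing loop.
    name = requests.get()
    responses.put(resize_image(name))

threading.Thread(target=worker, daemon=True).start()

direct = resize_image("logo.png")     # one synchronous function call
requests.put("logo.png")              # "publish" the request instead
via_queue = responses.get(timeout=1)  # wait for the "consumer" to reply

assert direct == via_queue == "resized:logo.png"
```

Both paths produce the same result; only the latency and failure modes differ.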

Cost


Cost is completely dependent on how your services run. In some offerings, you’re essentially paying for your CPU time, so the number of services doesn’t matter as long as they’re all doing work. Other times you’ll need a consistent box to run your services on, and in that case it may or may not make sense to bundle multiple services together, depending on their resource costs.

In general in today’s market, engineer hours are orders of magnitude more valuable than CPU time, so this KPI is (and should be) often ignored until it gets out of hand.

Blast Radius


If something goes wrong with Thing A, how likely is it that Thing B is also affected?

There are two aspects to this:

  • Point of failure: If something goes down, does it affect something else?
  • Resource utilization: Can something grab all the memory or CPU and stop the other thing from doing its job?

Here the clear winner is multiple services, full stop. Having two things running in the same container or process is a huge risk that introduces a single point of failure. Splitting out processes adds fault tolerance.

However, it should be noted that this only makes sense if those processes are truly independent. If (e.g.) one of them is a backend API and the other is a message consumer that does nothing but call that backend API, you don’t actually gain any fault tolerance — the single point of failure is still there. (However, you do still get the advantage of separating resource usage.)

Another aspect to this is the ability to separate out the access permissions of each service. If only Service A needs to access the database, then denying Service B access to the database cuts in half the chances of introducing a bug that makes the database go crazy.
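
The access-permission argument can be made concrete with a small sketch. In production this would be IAM roles or per-service credentials rather than constructor arguments, and the class names here (`Database`, `ServiceA`, `ServiceB`) are purely illustrative — but the principle is the same: a service that is never handed a database connection cannot introduce a database bug.

```python
class Database:
    """Toy stand-in for a real database connection."""
    def __init__(self) -> None:
        self.rows: list[str] = []

    def insert(self, row: str) -> None:
        self.rows.append(row)

class ServiceA:
    """Needs the database, so it is explicitly handed a connection."""
    def __init__(self, db: Database) -> None:
        self.db = db

    def handle(self, payload: str) -> None:
        self.db.insert(payload)

class ServiceB:
    """Never touches the database — so it is never given one."""
    def handle(self, payload: str) -> str:
        return payload.upper()

db = Database()
ServiceA(db).handle("order-1")
result = ServiceB().handle("order-1")

assert db.rows == ["order-1"]
assert result == "ORDER-1"
assert not hasattr(ServiceB(), "db")  # B holds no database handle at all
```

A bug in `ServiceB` can do many things, but corrupting the database isn’t one of them.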

Deciding what should be a separate process can follow business lines (a problem with item reviews should not affect buying the item) as well as technical ones (a broken image processor shouldn’t affect the creation of the reviews).

Scaling Efficiency


Here again, multiple services are the clear winner — you can scale your HTTP API separately from your front-end and separately from your message broker. Assuming cost (above) is not an issue, this allows you to tweak your processing load so it’s perfect for each job.

Ownership of Data


How confident can we be that one system or service owns a particular kind of data?

One of the constant refrains of microservices is that you shouldn’t have two services writing to the same database. The question is — does this really mean two services, or two systems? (Many people use the term service to really mean a set of processes working together. I’ve been using it to refer to a single process, and the word system to refer to the set of processes.)

I argue that because systems consist of services that work together for a shared business purpose, it’s entirely proper that more than one service within the same system write the same kind of data. If the only real difference between service A and service B is that A is triggered by an HTTP request but B is triggered by a message on a queue, A and B should be treated as identical from a business point of view.

We don’t gain anything by tying our hands behind our backs and forcing ourselves to create multiple data sources, pass the data around, etc., when all these services are managed by the same team and contributing to the same overall process — in fact, there are plenty of downsides to this which are covered by other KPIs.

In other words, for this question I believe it makes no difference whether a system contains one service or many — data ownership should live at the system level either way.
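
A minimal sketch of this argument: two services in the same system — one triggered by HTTP, one by a queue message — writing the same kind of data through a shared, system-owned store. The names here (`ItemStore`, `handle_http`, `handle_message`) are illustrative, not a real API; in practice the two entry points would be separate processes sharing one database.

```python
class ItemStore:
    """Stand-in for the system's single, shared data store."""
    def __init__(self) -> None:
        self.items: dict[str, str] = {}

    def save(self, item_id: str, body: str) -> None:
        self.items[item_id] = body

store = ItemStore()

def handle_http(item_id: str, body: str) -> None:
    """Service A: entry point triggered by an HTTP request."""
    store.save(item_id, body)

def handle_message(message: dict) -> None:
    """Service B: entry point triggered by a queue/Kafka message."""
    store.save(message["id"], message["body"])

handle_http("1", "created via HTTP")
handle_message({"id": "2", "body": "created via queue"})

# From a business point of view the two entry points are identical:
assert store.items == {"1": "created via HTTP", "2": "created via queue"}
```

Only the trigger differs; the ownership of the data stays with the system.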

Speed of System Creation


How fast can the system be written from scratch?

The main variable here is how good your tooling is at creating a new service. How much manual work is involved in creating tasks / lambda functions / etc., ensuring permissions are set correctly, setting up access rights, and all the other fun stuff?

The bigger the pain of setting up a new service, the more beneficial it is to bundle services together, e.g. by forking or threading.
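
One cheap form of bundling is running two logical services as threads in a single process. This is a toy sketch, not a recommendation: the two workers here (an “API” loop and a “consumer” loop) are hypothetical stand-ins, and real services would use proper frameworks. But it shows how one deployable unit can carry two jobs when provisioning a new service is painful.

```python
import queue
import threading

# Two logical services bundled into one process via threads.
events: "queue.Queue[str]" = queue.Queue()
results: list[str] = []

def api_worker() -> None:
    # Stand-in for an HTTP-serving loop: it produces an event.
    events.put("request-received")

def consumer_worker() -> None:
    # Stand-in for a message-consuming loop: it handles the event.
    results.append(events.get(timeout=1))

threads = [threading.Thread(target=api_worker),
           threading.Thread(target=consumer_worker)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert results == ["request-received"]
```

The tradeoff is exactly the blast-radius point above: these two “services” now share a process, its memory, and its failures.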

Investigations and Debugging


How quickly can problems be identified?

The fewer things you need to check, the better. This seems to point to a single service being the best of all worlds. Having multiple services that talk to each other can quickly lead you to a place where you need to start at the end and follow the trail all the way back till you see where the breakdown happened. (Service D got the message, did service C send it? No, so did service B send it? Yes, so something happened between B and C.)

However, putting multiple kinds of work into a single service can actually make it more confusing, not less (e.g. all your logs could be interleaved). Clean separation can reduce the cognitive overhead of an investigation, even if it takes longer to match up the streams.

Cross-service tracing tools like AWS X-Ray can help make this kind of debugging easier.
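
The core idea behind such tracing can be sketched simply: every message carries a correlation ID, and every service logs it, so the trail from service B through C to D can be reconstructed from the logs instead of guessed at. The `service` function and message shape below are illustrative, not any particular tool’s API.

```python
import uuid

log: list[str] = []

def service(name: str, message: dict) -> dict:
    """Stand-in for one hop in the pipeline: log the ID, pass it on."""
    log.append(f"{name} handled {message['correlation_id']}")
    return message  # the same correlation ID travels to the next hop

msg = {"correlation_id": str(uuid.uuid4()), "body": "update item 42"}
for name in ("service B", "service C", "service D"):
    msg = service(name, msg)

# Grepping the logs for one ID now yields the full trail, in order:
cid = msg["correlation_id"]
assert all(entry.endswith(cid) for entry in log)
assert [e.split(" handled ")[0] for e in log] == ["service B", "service C", "service D"]
```

If service D logs the ID but service C never did, you know exactly where the breakdown happened.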

Maintenance Coverage


How many things do we need to monitor or support?

One of the big challenges with creating services is having to make sure the lights are on constantly. With a single service, you can set up a single monitoring account, ensure that one system has the right amount of CPU, memory, etc. More services mean more monitors, more alerts, more dashboards and more knowledge of what runs where.

Summary

Although multiple services have a number of advantages in general, the disadvantages shouldn’t be discounted. Good advice is usually to start with a single service, design it so that bounded contexts map to clear code boundaries, and split it up when it becomes too much to handle.

Advantages of Multiple Services:

  • Multiple points of failure (if boundaries are drawn correctly)
  • Individualized resource utilization
  • Reduced possibility of shared resources being accidentally misused
  • Ease of scaling separately
  • Faster to investigate problems that are pegged to a single area/process

Advantages of a Single Service:

  • Performance
  • Faster setup time (e.g. access control, permissions)
  • Can be simpler to investigate more complex problems
  • Less monitoring and alerting necessary

In general

I’d recommend using multiple services only when:

  • Your single service has shown some of the disadvantages listed above and it’s hurting you (too many points of failure, scaling issues, trying to do too much at once).
  • You have a good way of spinning up new services (and setting up permissions, access control, resources, etc.). As of now, most companies seem to have to build their own tooling for this, although there are third parties that aim to simplify the process.
  • You have a good way for different services to talk to each other (e.g. Kafka or a good API framework and schema).
  • The services are doing wildly different things (e.g. database CRUD vs. image processing) or are acting on totally different types of data.

Next up — a discussion of whether to handle your multi-service setup within a single code repository or to have separate repos for each service.
