Microservices: Prioritize operational complexity over service templates

Vineeth Venudasan · Published in CodeX · Dec 31, 2023

There is a common myth among teams trying to jump into microservices: that the first step is to create a project template.

I disagree.

But, shouldn’t it be easy to create a new service?

Yes, but that is not the first thing you need.

I think the idea that we must come up with a ‘template’ project first is a result of seeing microservices primarily as separate repositories, before anything else.

And so a project template is often hastily born, and in many cases it’s just a Hello World application with maybe the infrastructure code that deploys it to an environment.

But something like that is not very valuable, in my opinion. All a “template project” of this sort does is solve a small part of the development complexity involved in a microservice style of deployment, and not much more.

An alternative approach

I think we should be spending more time focusing on operational complexity when considering using microservices to solve business problems.

Tackling operational complexity

One way to think about the various operational complexities one encounters with microservices is to go back to the often-quoted Fallacies of Distributed Computing.

Let’s go over each of them.

The network is reliable

Before we even head out into the world of microservices, do we have the following concerns addressed?

  1. For read-based remote calls, do we have an established method for resilience?
  2. For write-based calls, do we have a strategy to make repeated calls idempotent?
  3. And on the business side, are user flows and business processes set in such a way that they are resilient to technical failures?
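To make the first two concerns concrete, here is a minimal Python sketch (all names are illustrative, not a prescribed implementation): read-only calls are retried with exponential backoff because they have no side effects, while writes carry a client-generated idempotency key so the server can detect and deduplicate repeated deliveries of the same request.

```python
import time
import uuid

def call_with_retry(fn, attempts=3, base_delay=0.1):
    """Retry a read-only remote call with exponential backoff.

    Safe only because reads have no side effects; never blindly
    retry writes this way.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

# For writes: a client-generated idempotency key lets the server
# return the original result when the same request arrives twice.
_processed = {}  # server-side store: idempotency key -> result

def create_order(payload, idempotency_key):
    if idempotency_key in _processed:       # duplicate delivery
        return _processed[idempotency_key]  # return the original result
    result = {"order_id": str(uuid.uuid4()), **payload}
    _processed[idempotency_key] = result
    return result
```

With this in place, a client that times out and resends the same write (with the same key) cannot accidentally create two orders.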

Latency is zero

For every business operation,

  1. does the team have a way to measure P9x response times for core-business operations, and more importantly, do we have a baseline for this measure?
  2. do we have an SLA for our end users when it comes to response times? Do we have a way to monitor them continuously?
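As an illustration of establishing such a baseline, P9x values can be computed from recorded response times with a simple nearest-rank percentile; the latency samples below are made up.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at
    least p percent of all samples are <= it."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]

# Hypothetical response times (ms) collected for one business operation.
latencies_ms = [12, 15, 14, 18, 250, 16, 13, 17, 19, 900]

p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
```

Note how the two outliers dominate the tail percentiles even though the median is low; that is exactly why averages alone make a poor SLA baseline.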

Bandwidth is infinite / Transport cost is zero

I once worked on an engagement that adopted microservices with a high throughput of large payloads between services, and it had a noticeable negative impact on the performance of the application.

  1. do we know the amount of data that needs to flow in between services?
  2. do we know the typical pattern of this data-flow on an average day?

Our technical choices when we do microservices might change based on this answer.
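One low-tech way to get those numbers is to meter serialized payload sizes per downstream service at the call site. A hypothetical sketch (the transport itself is elided):

```python
import json
from collections import defaultdict

# Running totals of bytes sent per downstream service, so the team
# can see the typical data-flow pattern over a day.
bytes_sent = defaultdict(int)
call_count = defaultdict(int)

def send(service, payload):
    body = json.dumps(payload).encode("utf-8")
    bytes_sent[service] += len(body)
    call_count[service] += 1
    # ... actual transport (HTTP client, message broker, etc.) ...
    return len(body)

def average_payload(service):
    """Average bytes per call to a given downstream service."""
    return bytes_sent[service] / call_count[service]
```

Even crude counters like these reveal whether a proposed service boundary would push megabytes across the wire on every user action.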

The network is secure

For every service,

  1. Do we know how to ensure secure communication between services?
  2. are these policies signed off by the business/organization?
  3. do we have the necessary safety nets in place to ensure that the team has a multi-layered security approach (since microservices increase the surface area of a possible attack)?

We must remember that microservices can inadvertently increase the surface area of a potential attack if proper thought is not given to the security aspect of things.
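As one illustration, mutual TLS is a common way to secure service-to-service traffic. Below is a minimal client-side sketch using Python's standard `ssl` module; the certificate paths are placeholders and would in practice come from your platform's PKI.

```python
import ssl

def mutual_tls_context(ca_file=None):
    """Client-side TLS context for service-to-service calls.

    Verifies the server's certificate against a trusted CA and
    enforces a modern protocol floor. File paths are placeholders
    for illustration only.
    """
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # For *mutual* TLS, the client also presents its own identity:
    # ctx.load_cert_chain("client-cert.pem", "client-key.pem")
    return ctx
```

Whatever mechanism you pick (mTLS, a service mesh, signed tokens), the point is that it should be an explicit, signed-off policy rather than an afterthought.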

The topology does not change

When we think of the boundaries between microservices,

  1. are our microservice boundaries drawn so that if a single service goes down, the rest of the services or business capabilities can continue as usual?
  2. do we have the necessary infrastructure and knowledge of tools and techniques that will enable this sort of resilience between services?

For example, the often-seen ‘Order-Service’ in most e-Commerce platforms is a single point of failure, often because it gets tied to most business processes.

When the boundaries and dependencies are not thought through, what we have is a setup where most services will require the ‘central’ service to be up and running at all times.

If the boundaries and the technologies do not support this level of resiliency, microservices provide little benefit when it comes to the blast radius of a failure.
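One technique that limits that blast radius is a circuit breaker: after repeated failures, calls to a struggling dependency fail fast to a fallback (for example, a cached or degraded response) instead of piling up. A minimal sketch, not a production-ready implementation:

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: after max_failures consecutive
    failures the circuit opens, and calls short-circuit to the
    fallback until reset_after seconds pass (then one retry is
    allowed, the so-called half-open state)."""

    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                return fallback()          # open: fail fast
            self.opened_at = None          # half-open: allow a retry
            self.failures = 0
        try:
            result = fn()
        except ConnectionError:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            return fallback()
        self.failures = 0
        return result
```

Hardened versions of this pattern exist in libraries such as resilience4j (Java) or Polly (.NET); the sketch only shows the idea.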

There is one administrator

If splitting an existing service into different microservices,

  1. Do we have a monitoring and observability tooling setup for the current service that we can extend to the newly created service(s)?

And when these new microservices are to be maintained by different teams,

  1. Is there a way to have end-to-end visibility for the present state of the system at any given point in time? Does every team have a common observability and monitoring tooling setup?
  2. Do all the teams have the same capability and expertise using monitoring and observability tooling?
  3. Do all the teams have a similar infrastructure setup, and do we know if there are any downsides (especially around cost or long-term maintenance) to them being different or inconsistent?

I have come across a couple of microservice setups where even basics like aggregated logging or a correlation-id shared between services were not implemented before jumping into the world of microservices.
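As a sketch of those correlation-id basics: each service reuses the inbound id when one is present, stamps it on its own logs, and forwards it downstream. The `X-Correlation-Id` header name is just a common convention, not a standard.

```python
import contextvars
import logging
import uuid

# A correlation id travels with each request so logs emitted by
# different services can be stitched together end to end.
correlation_id = contextvars.ContextVar("correlation_id", default="-")

class CorrelationFilter(logging.Filter):
    """Injects the current correlation id into every log record."""
    def filter(self, record):
        record.correlation_id = correlation_id.get()
        return True

def handle_request(headers):
    """Reuse the inbound id if present; otherwise start a new trace."""
    cid = headers.get("X-Correlation-Id") or str(uuid.uuid4())
    correlation_id.set(cid)
    return cid

def outbound_headers():
    """Attach the current id to any call to a downstream service."""
    return {"X-Correlation-Id": correlation_id.get()}
```

Plumbing like this is far cheaper to add before the split into services than after, when a dozen codebases each need retrofitting.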

The network is homogeneous

Sometimes, choosing a single interoperable mechanism to communicate between microservices might not be the best approach depending on your use case.

For each of these mechanisms (e.g., an HTTP-based REST API with JSON payloads),

  1. Do we have a necessary safety net and tooling required to ensure that ‘contracts’ between services are not broken?

Remember, with microservices, we often split up what could have been a method call into a call over the wire.

The former is more performant, easier to modify and refactor, and often more maintainable in the long run. When moving to the latter, we should understand the trade-offs and minimize the downsides as much as possible.
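A lightweight flavor of that safety net is a consumer-driven contract check: the consumer records the fields it actually relies on, and a test run against the provider's real response flags breaking changes before they reach production. The field names below are illustrative; tools like Pact formalize this idea.

```python
# The consumer's declared expectations: field name -> required type.
ORDER_CONTRACT = {
    "order_id": str,
    "status": str,
    "total_cents": int,
}

def violations(contract, response):
    """Return a list of human-readable contract violations.

    Extra fields in the response are fine; consumers only assert
    on what they actually use (the 'tolerant reader' idea).
    """
    problems = []
    for field, expected_type in contract.items():
        if field not in response:
            problems.append(f"missing field: {field}")
        elif not isinstance(response[field], expected_type):
            problems.append(f"wrong type for {field}")
    return problems
```

Run in the provider's CI pipeline, a check like this turns a silent wire-format break into a failing build.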

One step at a time

Am I suggesting that, given the opportunity,

  1. one must not jump into creating a microservice “template” too soon? Yes.
  2. one must not jump into microservices without considering their operational complexity? Yes.

Here is my proposal: completely skip creating a template during the initial days, and choose to solve operational complexities while solving business problems.

Simpler, adaptable solutions first

As many in our industry advise, prefer an evolutionary approach to designing a software system.

Do we even need microservices implemented to resolve some of the operational complexities that we described earlier?
In most cases, no.
You could surely look into having a Monolith-First strategy to understand how to split your domains and the communication requirements between them.

Do we need 5 or 10 microservices already implemented to solve the operational complexities that require multiple services to be in place (e.g., maintaining contracts using contract testing)?
I believe the answer is no again. We could start small, with just a separate service to solve these specific operational concerns, and then replicate it to a full-blown microservice setup.

If you cannot answer some of these questions that help us understand operational complexity, should you not defer the decision to move to microservices?

Conclusion

Solve the right problems first, and solve the harder problems earlier.

Operational complexity is what will backfire if not taken care of early when considering microservices. Template projects ought to come much later, often derived from working solutions.



Backend Engineer / Software Consultant. The views expressed here are my own and do not necessarily represent my employer’s positions, strategies, or opinions.