The four horsemen of a potential microservices apocalypse

Arjun Dutt
Towards Application Data Monitoring
4 min read · Jan 7, 2021

The job of an engineering leader has never been more complex. They have to enable the business with scalable, resilient systems. They have to keep those systems tuned for peak performance. And they have to do all of this in an environment that is in perpetual flux.

With today’s highly distributed systems and teams, maintaining architectural standards is a real challenge. Architecture documents cannot keep up with the pace of CI/CD cultures, where hundreds of changes may be deployed every day. It’s no surprise that constant, small changes to different parts of a loosely coupled system add up to significant drift in the overall architecture as functional scope, complexity, dependencies, and teams grow. It is a rare team that has an accurate global view of its entire architecture at all times.

As microservice architectures become more ubiquitous, there is growing research into the anti-patterns that adopters see creeping in over time. In 2019, Taibi et al. categorized a range of microservices pitfalls, and their work is available on arXiv. Their most recent work is based on discussions with enterprises at varying stages of microservices maturity: 23% of their subjects had adopted microservices two years earlier, 60% between three and four years earlier, and the remaining 17% five or more years earlier.

The researchers identified organizational pitfalls ranging from “non-homogeneous adoption” to “sloth” to “magic pixie dust.” While these anti-patterns are clearly cultural in nature, there are several important technical watch-outs as well. The chart below shows the most common and most harmful anti-patterns their research subjects pointed out: the line shows how often, as a percentage of respondents, a particular anti-pattern was cited, and the bars show how harmful, on a 10-point Likert scale, respondents perceived it to be.

Looking at the three most harmful outliers and the three most frequently mentioned anti-patterns, two appear in both groups, leaving four distinct anti-patterns in total. Based on the researchers’ results, these are the four horsemen of a potential microservices apocalypse:

Hardcoded endpoints, where the IP addresses and ports of connected microservices are hardcoded into the services that call them (see the first sketch after this list).

Wrong cuts, where microservices are split along technical layers (presentation, logic, data) rather than along business capabilities.

Cyclic dependencies, where calls between microservices form a cycle, e.g., A calls B, B calls C, and C calls A (see the cycle-detection sketch after this list).

Local logging, where log data is stored locally on each microservice instance instead of being shipped to a distributed logging system (see the logging sketch after this list).
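To make the first horseman concrete, here is a minimal Python sketch, not taken from the paper, contrasting a hardcoded endpoint with one resolved from configuration at runtime. The service name, environment variable, and URLs are hypothetical:

```python
import os

# Anti-pattern: the address of a downstream service is baked into the code.
# Moving, scaling, or replacing that service now requires a code change and redeploy.
ORDERS_URL = "http://10.0.3.17:8080/orders"

# A common remedy: resolve the endpoint from configuration (or a service registry)
# at startup. "ORDERS_SERVICE_URL" is a made-up variable name for illustration.
def orders_endpoint() -> str:
    return os.environ.get("ORDERS_SERVICE_URL", "http://orders.internal:8080/orders")

if __name__ == "__main__":
    print(f"Calling orders service at {orders_endpoint()}")
```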
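Cyclic dependencies are also straightforward to check for once you have a call graph. The sketch below is a hypothetical illustration, with made-up service names, that finds a cycle in a dependency graph using a depth-first search:

```python
from typing import Dict, List

# Hypothetical service call graph: each key calls the services in its list.
CALLS: Dict[str, List[str]] = {
    "A": ["B"],
    "B": ["C"],
    "C": ["A"],  # closes the A -> B -> C -> A cycle
}

def find_cycle(graph: Dict[str, List[str]]) -> List[str]:
    """Return one cycle as a list of services, or [] if the graph is acyclic."""
    visiting, visited = set(), set()
    stack: List[str] = []

    def dfs(node: str) -> List[str]:
        visiting.add(node)
        stack.append(node)
        for callee in graph.get(node, []):
            if callee in visiting:  # back edge: we found a cycle
                return stack[stack.index(callee):] + [callee]
            if callee not in visited:
                cycle = dfs(callee)
                if cycle:
                    return cycle
        visiting.discard(node)
        visited.add(node)
        stack.pop()
        return []

    for service in graph:
        if service not in visited:
            cycle = dfs(service)
            if cycle:
                return cycle
    return []

if __name__ == "__main__":
    print(find_cycle(CALLS))  # e.g. ['A', 'B', 'C', 'A']
```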
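As for local logging, a common remedy is to stop writing per-instance log files and instead emit structured logs that a central collector can forward. The sketch below uses only Python’s standard library, with made-up service and field names, to write JSON lines to stdout, which log shippers typically pick up and route to a centralized logging system:

```python
import json
import logging
import sys

class JsonFormatter(logging.Formatter):
    """Render each record as one JSON line so a log shipper can forward it."""
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "service": "orders",  # hypothetical service name
            "message": record.getMessage(),
            "logger": record.name,
        })

handler = logging.StreamHandler(sys.stdout)  # stdout, not a local file
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("orders")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("order %s accepted", "1234")
```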

Our team takes issue with the authors’ perspective and results on one of the horsemen. They position “wrong cuts” as an anti-pattern to be avoided. In our experience, splitting microservices along technical layers is a common pattern that holds up well at scale, and industry leaders have shown that this approach can in fact be desirable. Uber’s engineering team, for example, has explicitly defined a thoughtful approach to layer design that minimizes the “failure blast radius” and focuses precisely on the infrastructure, business-logic, and presentation-layer “cuts” that the authors would deem “wrong.” With over 2,000 microservices in production, it’s not a stretch to argue that Uber’s approach is battle tested. This disagreement is likely an artifact of new ground being charted as more and more companies adopt microservices, and these differing points of view are probably healthy as adopters explore the approaches that deliver the best results for them.

Disagreements aside, several of the anti-patterns the authors highlight show up regularly in our client work. One of the fears we hear most frequently concerns logging: the constant specter of accidentally logging sensitive data and only realizing it after a manual audit uncovers the problem. We have also watched clients detect problematic cycles in their service dependency graph shortly after deploying our observability platform. Knowing that there is a problem, or a problem brewing, is critical to long-term architecture hygiene.
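As one illustration of the logging fear above, a lightweight guard is a logging filter that redacts obvious sensitive patterns before a record is ever written. This is only a sketch with a hypothetical email regex; real detection has to be tuned to the data your services actually handle:

```python
import logging
import re

# Hypothetical pattern; production systems would cover more data types.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

class RedactSensitive(logging.Filter):
    """Mask email addresses in log messages before they are emitted."""
    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = EMAIL_RE.sub("[REDACTED]", str(record.getMessage()))
        record.args = ()  # message is already fully rendered
        return True

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("payments")
logger.addFilter(RedactSensitive())

logger.info("refund issued to %s", "jane.doe@example.com")
# -> INFO:payments:refund issued to [REDACTED]
```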

Given the breadth and complexity of the anti-patterns that can appear over time, reliance on good practices is necessary but not sufficient. Capabilities that provide deep visibility into services, dependencies, health metrics, data traffic, and more become a “must-have” as the scope and scale of systems expand.

At Layer 9, we’re building a next-gen observability platform for modern stacks. If you’re interested in learning more, please drop us a note via layer9.ai, or follow us on Twitter @layer9ai or on LinkedIn at Layer 9 AI.

Arjun Dutt

Co-founder and CEO of Layer 9, the Application Data Monitoring company.