A journey through microservices
Table of Contents
- What are Microservices?
- Fiercely Independent
- Loosely Coupled
- Automation Friendly
- Decentralised Data
- Distributed Systems
- Smart Endpoints
- Organised by Domain
- Complete Team Ownership
- Microservices Technical Qualities
- Resilience & Reliability
- Microservices’ Challenges
- Manageability & Reliability
- Transaction Management & Scope
- Business & Technical Qualities
In my line of work, I’m often asked questions like: “What are microservices?” and “Why use microservices?”. Well… here goes…
What are Microservices?
Fundamentally, microservices are not a new concept. They build upon decades-old ideas (such as component-based software). However, technological advancements (such as the Cloud, REST APIs, and containerization) have enabled progress, and brought microservice architectures to the forefront of modern software design.
A microservice is a cohesive, independent software unit representing a specific business domain; for instance, a Customer microservice. A microservices architecture is a software architecture containing many (often hundreds of) microservices that collaborate to solve a larger problem.
Hint — Microservice Definitions
Microservice — a software unit representing a business domain; e.g. a Customer microservice.
Microservice architecture — a group of software units representing an application(s)/product(s).
One key differentiator of microservices (over, say, a componentized monolith) is that microservices can truly be independent (they are fiercely independent), by using:
- A loosely-coupled integration mechanism (e.g. HTTP REST).
- Decentralized data stores.
- Organisation around domain.
- Isolated development and deployments.
Microservices exhibit the following qualities; see Figure 1:
Fiercely Independent
Microservices are fiercely independent. They can be designed, developed, tested (somewhat), deployed, and released independently of others.
This promotes productivity at many levels, and across multiple roles. Developers can focus their attention on implementation, choosing the best technology for the job. Testers can collaborate with developers early on, to agree upon appropriate interface definitions and write tests first. Operational staff only need to deploy changes, without impacting other areas of the system.
Independence blurs the lines between roles, promoting collaboration. Teams can be built around specific domains, and become responsible for one or more microservices.
The isolated nature of microservices also promotes independent scaling (and some resilience; discussed later) to meet growing capacity needs.
Note — Monolith to Microservices
Many of the system-evolution challenges I hear discussed today involve replacing a monolithic system. Often, a microservices architecture is the next logical evolution. It’s chosen mainly for its flexible, decoupled model that supports efficient, yet reliable, change.
The common approach is to identify “seams”, and use techniques like the Strangler pattern, to migrate to microservices piece-by-piece. Basically, you chip away until there’s sufficient functionality present in microservices to replace the monolith. However, Domain Pollution can hamper, or even prevent, this migration.
Loosely Coupled

Typically, microservices expose capabilities using a loosely-coupled, technology-agnostic integration mechanism (e.g. REST over HTTP). This is powerful because:
- Different technology stacks can be easily combined to form one solution (e.g. combine Java, .NET, and Python implementation technologies into one seemingly seamless product).
- The integrating technology uses a Lowest Common Denominator (string-based HTTP communication), supported by many implementation technologies (even the more antiquated). There is no need to manage/release “stubs” to consumers, as is typical in other integration styles.
- The underlying technology can change without impacting consumers (see Evolvability); i.e. we hide implementation details.
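To make the technology-agnostic point concrete, here is a minimal sketch of the consumer’s view of such an interface: plain HTTP paths and JSON bodies, with the implementation language hidden entirely. The `/customers/<id>` route, the in-memory `CUSTOMERS` store, and the handler name are illustrative assumptions, not part of any real product.

```python
import json

# Consumers see only HTTP + JSON; they cannot tell (and need not care)
# what language or framework sits behind the endpoint.
CUSTOMERS = {"42": {"id": "42", "name": "Ada"}}  # stand-in data store

def handle_get_customer(path: str) -> tuple[int, str]:
    """Route GET /customers/<id> to an (HTTP status, JSON body) pair."""
    prefix = "/customers/"
    if not path.startswith(prefix):
        return 404, json.dumps({"error": "not found"})
    customer = CUSTOMERS.get(path[len(prefix):])
    if customer is None:
        return 404, json.dumps({"error": "no such customer"})
    return 200, json.dumps(customer)
```

In a real service this handler would sit behind an HTTP server (or framework), but the contract — path in, status and JSON out — is all the consumer ever depends on.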
Automation Friendly

Microservices are automation friendly; i.e. they promote practices and technologies built for automation. Typically, this relates to supporting “continuous” practices, deployment pipelines, test automation, and even Blue/Green deployments. Uniformity is also a key factor here; e.g. using Docker/Kubernetes once provides a blueprint for all other microservices to follow. Generally, this automation focus makes change much more palatable (i.e. less Change Friction).
Note — Automation and Business Qualities
Although embedding automation-friendly practices from the beginning of a project may have short-term TTM and ROI consequences, the rewards are reaped over the medium-to-long term. Additionally, these rewards can also strengthen Brand Reputation.
I’ll touch upon this later, but microservices’ independence, and integration mechanism, also simplify testing. There are fewer layers between how a tester interfaces with the feature and the underlying behavior under inspection. REST APIs are a more user-friendly communication mechanism than (say) SOAP (see Lower Representational Gap). And finally, we can theoretically deploy a single microservice for testing, without relying upon other system functions — i.e. there’s no need to identify and deploy other, unrelated parts of the system (of course this comes with some caveats).
Deployment Pipelines (key to many continuous practices) can be built around a small amount of microservice code, with minimal dependencies. The pipeline can automatically retrieve the source from Git (or another version-control system), compile it, execute unit tests, and then build it into a deployable artifact (e.g. a binary .jar file). From there, the pipeline could create a Docker image from it, deploy it to the desired environment, and start the container. Then, the pipeline(s) can execute further automated tests (e.g. acceptance tests), load tests, penetration tests; all through the same microservice REST interface.
When it’s time to deploy to production, another pipeline can mirror an earlier pipeline (e.g. the Staging pipeline) to push into production, where the microservice should function identically. Great eh?
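The pipeline described above can be sketched as a tiny stage runner: stages execute in order, and the pipeline fails fast at the first broken stage so later stages never run. This is a toy illustration, not a real CI tool; the stage names and the `run_pipeline` helper are assumptions for the example.

```python
# A toy deployment-pipeline runner: (name, action) stages run in order,
# and the pipeline stops at the first failing stage (fail fast).
def run_pipeline(stages):
    """Run each stage; return the names of the stages that executed."""
    executed = []
    for name, action in stages:
        executed.append(name)
        if not action():     # a falsy result means the stage failed
            break            # fail fast: later stages never run
    return executed

# Usage: simulate a pipeline where the unit-test stage fails.
stages = [
    ("checkout", lambda: True),
    ("unit-tests", lambda: False),   # simulated failure
    ("build-image", lambda: True),   # never reached
]
result = run_pipeline(stages)  # → ["checkout", "unit-tests"]
```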
Decentralised Data

Data that is decentralised may be managed in isolation, because others are not given direct access to it. This enables data structures, and technologies, to change (see Evolvability) with relative ease, and the database technology to be scaled independently of other parts of the system.
Distributed Systems

Delivering quality microservices is about more than just how they’re constructed; it’s also about how (well) they function in a production environment.
This raises the following questions:
- How do we find services that we can collaborate with to deliver some useful functionality (discovery)?
- How do we monitor and debug services running in a distributed production environment?
- How can we understand (as a whole) what our system does, in a distributed production environment?
Comparing the operational model of the classic, centralised monolith and microservices, we find distributed systems to be both a blessing, and a curse.
Typical operational challenges include:
- Distributed Logging — how can we understand the entirety of the system when logging is distributed across many ephemeral instances?
- Distributed Debugging — how can we trace each request across multiple system boundaries?
One operational benefit of a distributed system is decentralized monitoring; i.e. we may monitor a microservice’s health, without intrusion from other microservices.
Distributed Logging
The classic approach taken to logging on a centralized (monolithic) system (all logs on the same server, and easily gathered) doesn’t function in a distributed, cloud-based environment. The problem is twofold:
- By distributing software execution across many machines, each managing its own log files and using logging technologies (and formats) specific to its implementation technology, how do we find and combine these distributed logs into a form suitable for fault diagnosis? (You can’t expect operational staff to access each instance just to gather logs.)
- Ephemeral instances. Microservices (and the Cloud) represent an entirely different operational model from the familiar on-premise, centralized systems of bygone days. In this distributed, cloud-oriented world, software is (typically) executed within instances/containers (e.g. Docker) that we have little emotional tie to. We treat them as cattle (see later sidebar); we don’t nurse the sick back to health; we let them go, and raise another one (in a fraction of the time required of the monolith). However, this ephemeral nature has ramifications around log availability and management.
Note — Cattle, not Pets
The monolith makes many assumptions about its operational environment; thus, it is relatively arduous to spawn a healthy instance of it. Naturally, we (implicitly) grow attached to its safekeeping. In this model our mindset is preventative and remedial; our work, onerous and challenging.
However, being smaller and more independent, each distributed microservice makes fewer assumptions (than the monolith) about its runtime environment, enabling us to be more dispassionate about their health. This detachment from specific instances enables us to support the more chaotic cloud model; we can treat them more like cattle than pets (see https://www.theregister.co.uk/2013/03/18/servers_pets_or_cattle_cern).
Independent logging across distributed systems has little value unless we can centralize and aggregate it. Thus the use of log-aggregation tools (e.g. the ELK stack).
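Aggregation is far easier when every service emits structured log lines; one JSON object per line that an aggregator (such as the ELK stack) can index by field. Here is a minimal sketch of what such a log line might look like; the field names (`service`, `traceId`, etc.) are illustrative assumptions, not a fixed standard.

```python
import json
from datetime import datetime, timezone

# Structured logging sketch: one self-describing JSON object per line,
# so a centralized aggregator can filter/search on individual fields.
def log_record(service: str, trace_id: str, message: str) -> str:
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "service": service,
        "traceId": trace_id,
        "message": message,
    })
```

Each service writes these lines locally; a log shipper then forwards them to the central store, where queries like “all lines for traceId X” become trivial.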
Distributed Debugging

Why do we debug software? Well, mainly to track down and resolve software bugs in an efficient manner.
Debugging a centralized monolith is (relatively) straightforward, because the entirety of it executes in the same process. IDEs and application servers have great support to remotely debug a monolith. Unfortunately, creating this “single view” in microservices is harder, because of our decision to distribute workload.
We need some way to implicitly glue pieces of the system back together, and make them seem centralized, even though they’re not. Aggregating log files (see above) is all well and good, but this has little value if we can’t trace a request through the entirety of our system.
A common distributed system pattern to support distributed debugging involves wrapping each request with trace information, enabling us to aggregate an entire request’s routing behavior. See Figure 2.
I used an Edge Service here (for convenience) to generate a random traceId (e.g. 60989322456) and inject it into the HTTP header of each routed request (of course the orchestration might be done elsewhere; this is just an example). Each service can now include the traceId when logging to its own log files, which are subsequently shipped to a centralized log aggregator (using something like the ELK stack), allowing us to view the entire journey of a single request.
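The trace-propagation idea above can be sketched in a few lines: the edge injects a traceId if the request lacks one, and every downstream service copies it, unchanged, onto its own outbound calls. The header name `X-Trace-Id` is an assumption for illustration (real systems often use B3 headers or the W3C `traceparent` header).

```python
import uuid

TRACE_HEADER = "X-Trace-Id"  # illustrative header name (assumption)

def ensure_trace_id(headers: dict) -> dict:
    """At the edge: inject a fresh traceId if the inbound request lacks one."""
    if TRACE_HEADER not in headers:
        headers = {**headers, TRACE_HEADER: uuid.uuid4().hex}
    return headers

def propagate(inbound_headers: dict) -> dict:
    """In each service: copy the traceId onto outbound calls, unchanged."""
    return {TRACE_HEADER: inbound_headers[TRACE_HEADER]}
```

Because every hop logs (and forwards) the same traceId, the aggregator can stitch the hops back into a single end-to-end request view.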
Monitoring & Metrics
Monitoring tools provide a point-in-time view of a service’s health, performance metrics, etc. These tools promote better transparency, and earlier stakeholder engagement, which can be helpful in resolving potential issues quickly, prior to a failure.
At a grander scale, we have business metrics, used to measure business performance; e.g. the customer uptake of a new feature. Business metrics enable us to make better judgements on what constitutes value, and can be used to drive business actions and direction, such as defining the product road-map. This is an extremely important area for promoting TTM, ROI, and Business Agility.
Monitoring and metrics are typically managed with tools (e.g. ELK stack, Netflix Turbine).
Smart Endpoints

One of the pitfalls of the Enterprise Service Bus (ESB) is that it can cause a significant amount of logic (or “smarts”) to be embedded outwith where it most logically belongs. Vital business (workflow) logic is placed in a centralized system, which may become a monolith in its own right.
This means that we no longer organize behavior around a single domain, but spread it across several locations (I have similar reservations about solutions that mix back-end business logic written in 3GL languages (e.g. Java) with database stored procedures). Some functionality may be found in the service, some in the bus.
This is problematic. Not only must a developer look across multiple applications (and source) to understand/change a specific feature, but the bus becomes a hub of activity for development and runtime processing, creating a potential bottleneck. If the bus becomes a development bottleneck, then developers stumble over one another, reducing productivity; if a runtime bottleneck, then it may cause performance or scaling issues.
Anyway… a (true) microservice architecture doesn’t expect the use of some centralized orchestration engine. All “smarts” are expected to coexist in the same code/instance.
Note — Analogy
I sometimes think of these smart endpoints as similar to a cross-functional team. The team has complete ownership for something. There are no silos, where part of the solution is done elsewhere.
Organized by Domain
The organisation of software is important to support understanding and change. Poorly organised, incohesive software is difficult to find and follow, and suffers from Change Friction. This leads to maintainability challenges.
Organizing microservices by domain (behavior is built around a unit) lowers the representational gap (see LRG principle) between business and technology stakeholders. Common sense prevails. A customer function lives in the customer service, a catalogue action resides in the catalogue service.
Note — Analogy
Dorphor was vizier to the King Mazzan the Great. One morning he was summoned to court by Mazzan. “Find me my favorite gold trinket Dorphor,” commanded Mazzan of him, “and make haste. I want to wear it on my journey.” The King was about to begin a tour of his kingdom and wished to wear his favorite jewel.
Dorphor runs off to the grand treasure house with great alacrity; Mazzan is not renowned for his patience. When Dorphor reaches the treasure house he sees three doors ahead of him, indicating the Gold, Silver, and Bronze treasure rooms. Dorphor, naturally, makes for the Gold Room, turns the handle, and enters. A blaze of gold light greets him.
Dorphor immediately begins the task of searching for the gold trinket. He spends an age searching through all the room’s treasures, but fails to find it. “Blast!” He scratches his head and leaves the room perturbed, about to return to Mazzan empty-handed and with whispered apologies, when he decides, on the off-chance, to search the other rooms.
He enters the Silver Room. After a short search, he spies a gleam of gold, incongruous amongst the white of the silver. “Finally!” he shouts triumphantly. It’s the trinket! “Someone has put the trinket in the wrong room. Blast them for their negligence!”
Dorphor returns with great haste to Mazzan, carrying the trinket. When he gets outside, he realizes the sun has set, and the King has left the palace on his tour. Worse, to punish him for his tardiness, the King has offered Dorphor’s job to another. Dorphor returns home dejected, to tell his family the bad news.
Poor Dorphor lost his job for two reasons:
1. The person who stored the trinket in the first place hadn’t used a common-sense approach to filing it.
2. Dorphor had assumed (as most would) that the gold trinket would be correctly filed, amongst the other gold items, and that the person storing it used the same protocol.
The treasure in our context is the thing of value; a feature, for instance. Dorphor is the developer, trying to find the value (let’s say to improve it). Mazzan is the disgruntled customer, demanding immediate value (i.e. TTM). Each treasure room represents a software unit (aka a microservice) that should have been organised by domain.
Complete Team Ownership
The silo’ing of people is a common cause of complaint and concern within many established organisations. In this model, a person is grouped according to their skills and knowledge, and placed in a team of similar ilk; e.g. the “Development” team, the “Operations” team. See Figure 3.
In this type of team organisation, we have pools of staff; e.g. developers (the DEV team), operations (the OPS team), etc. Completed work items sit in a kind of purgatory whilst they await another team to pick them up; sometimes they’re immediately actioned, but often, that next team is constrained, and must complete its existing work items before pulling in more tasks.
I view these silo’d teams as workstations in Goldratt’s The Goal (I highly recommend it if you’ve never read it). Work items build up at whichever silos are “constrained”; the most constrained workstation slows the entire production line down to its own capacity.
Note — Example of Silo’ing
How often have you seen a DEV team throw something at the TEST team at the last minute, only for the TEST team to tell them they’ve no capacity to work on it?
So, we must either wait (affecting TTM, and potentially customer expectations), or we expedite it, affecting productivity and exacerbating future work.
I saw a good example of this siloing effect the other day. Three developers had spent two days (the equivalent of six man-days of effort) trying to understand why they couldn’t talk to an external provider. There was no operations support available to them, so they were figuring things out themselves. The team really struggled, however once the operations staff worked on it (they had been busy on other silo’d tasks), they resolved it in thirty minutes (it turned out the port hadn’t been opened to send out traffic).
Success story? Hardly. We lost six man-days (talk about poor ROI), simply because we didn’t have the appropriate resource on hand. Imagine how efficient we could be by having the right skill-set on hand, whenever needed?
Silo’ing can cause the following issues:
- A lack of transparency, collaboration, ownership, and accountability. Typically, this approach pushes specific activities onto specific skills-based teams (e.g. Ops does all monitoring; Developers are blind to it, and can offer no support/improvements).
- Quality is not introduced early enough. Building quality software is a highly challenging task due to all of the different considerations and viewpoints. By neglecting to involve all stakeholders early enough, you run the risk of delivering a substandard solution (or worse, one with zero value). How many times have you seen a Developer’s ear bent by Security, Test, or Ops practitioners because they weren’t involved in the decision-making, and an important aspect has been neglected?
- Rework is a common practice. See my earlier point on quality.
- Inefficiencies. If rework is required, then the individual responsible for it may be busy on another task (technology staff don’t tend to spend much time twiddling their thumbs), or may have forgotten all about it, and must re-acquaint themselves with it.
- It doesn’t work, provide value, or deliver what’s been asked for. Remember the Agile practice of involving stakeholders early and often? It’s mainly to ensure the right thing is delivered, or to pivot if not. This is harder to achieve in silos.
Recent years have witnessed a strong backlash against the organisation of staff around silos (mainly stemming — in the software industry — from the Agile and DevOps movements). DevOps, for instance, is about cultural change; achieved, in part, through practices such as cross-functional teams. By sharing information (at the appropriate time) with a more diverse audience (e.g. Customers/Product/Operations/Developers/Architects/Testers), we empower the team to own the entire quality life-cycle (from business requirements, to implementation, to operational use) of a solution, with early and regular group input.
And more diverse teams promote:
- Higher autonomy — less need to obtain feedback from other silos/individuals, where wait time is a factor.
- Greater team cohesion — they’re all striving for the same target and are similarly aligned to common goals/strategy.
- Quicker feedback cycles and decision making — the team has sufficient understanding, and diverse skills and knowledge, to make better judgements, sooner.
How does this relate to microservices, I hear you ask? Well… two points:
- Microservices better support a more diverse team structure, and…
- I firmly believe that a supporting technology architecture can better promote a cultural change.
Note — Technical Architecture can support Cultural Change
Part of the problem of silo’d teams is the cultural aspect (e.g. “I don’t know what you’re doing, and you don’t know what I’m doing, and by the time I realize what you’re doing, it’s too late to resolve”). However, I also believe that some of these cultural issues have stemmed from the technical architecture being supported. Let me elaborate.
Changes to a monolith tend to be slow, often lack automation, and suffer from lengthy release cycles. These failings deter practices such as Agile, DevOps, and other “continuous” approaches. In this model, value-adding activities (such as metrics and monitoring) belong in the Operations team, and are rarely part of the development team. This is, in part, due to the difficulty in isolating parts of the system, to (for instance) provide individuals with appropriate credentials to protect the system, yet still promote ownership (typical silo’d Ops teams tend to be very wary of offering DEVs too much access, particularly in production).
Anyway, smaller, independent units potentially equate to greater ownership and the uptake of better/modern practices. Case in point: some years ago I introduced decoupled (independently deployable) services into an existing product suite where the monolith had taken center stage. For years before, we had been toying with the Agile methodology, yet failing regularly, due in large part to the monolith. However, by breaking down the software into small, releasable units, we were in a significantly better position to embrace Agile, being less hampered by the monolith.
My argument is this then… if you wish to change a culture, you could do a lot worse than first aligning technology to your cultural aspirations; it’s likely easier to change technology, before attempting the cultural mindset change required of individuals.
Microservices Technical Qualities
Microservices promote the following technical qualities: scalability, productivity, evolvability, testability, and resilience & reliability. Let’s visit them.
Scalability

One of microservices’ key qualities is their ability to scale independently. This approach can be highly advantageous. There are no atomic deployments (deploy everything and scale it all, regardless of needs), typical of a monolith. Scaling may be either vertical or horizontal.
Note — Atomic Deployments
In a monolithic application, we’d drop everything into a deployment, regardless of what’s needed, wasting valuable system resources on areas of the system that don’t require them. I term this the atomic deployment strategy.
It’s an all-or-nothing deployment approach — everything is deployed (whether it’s needed or not), or nothing is deployed.
This deployment strategy is overly complex and not particularly scalable (from both a deployment and software hosting perspective), since a host of additional software components are deployed which are never used. Thus, valuable system resources (memory and CPU cycles) are wasted unnecessarily.
Consider a new startup business, called EventMix. Their main business model is the sale of Pay-Per-View (PPV) events to customers. To sell these (mainly sports) events, it needs an ecommerce platform supporting the following functions:
- A Storefront (supported by catalogues, discounts features etc).
- A Customer management solution (supported by customer capture, user management features etc).
The first few events were relatively small — EventMix were testing the water — but were extremely successful. They make sufficient profit to (successfully) bid for a significant upcoming PPV event, already receiving much popular media interest. There’s a lot of excitement in the office.
However, EventMix’s technology representatives quickly realize the PPV event will put significant additional stress on key parts of the system (the storefront, particularly the catalogue and cart areas, will receive significant additional load). The good news is that they can scale up that area independently (no scaling of discounts, no customer capture), and microservices provide the extra flexibility needed to do this. See Figure 4.
In this case, EventMix scales the environment from the Business-As-Usual (BAU) model (on the left), out to the Scaled model (on the right) for this event. When the event ends, to save money, they arrange for the environment to be returned back to the BAU model.
Note — Scaling the Monolith
It’s not to say that a monolithic application couldn’t handle the scale, only that it would be forced to handle it in a more rudimentary fashion. It may also involve significant (and unnecessary) setup to allow it to function.
Productivity

Whether you’re a developer, tester, or operations specialist, microservices promote a greater level of stakeholder confidence, and Productivity. Whilst with the (large) monolith, developers often wade through swathes of code (unrelated to the problem at hand) before finding what they’re after, with microservices we can more efficiently identify the change area, and thus reduce Change Friction. To me, focused change is a very important part of the management of software.
Being automation friendly, engineers can build capabilities around each microservice that promote uniformity (e.g. containerization), identical regardless of environment. We can build a reusable blueprint that all microservices can follow.
Microservice code is less brittle than the monolith, and better facilitates automated testing, leading to more assurance that a change will function as expected; i.e. greater surety (see Testability).
A greater level of confidence also exists around deployments. Independent deployments simplify releases to DEV and Test environments, reducing effort. Again, techniques such as containerization support more automated deployments, and thus better productivity.
Note — Containers
Containerized microservices (e.g. Docker) support a reusable runtime environment, identical across environments. This benefits all, as stakeholders have greater confidence the software will function identically on every (both development and production) environment; i.e. less of the “it works on my machine” mentality.
Note — Tools
There is a vast array of tools to facilitate microservice development.
Swagger Codegen and Swagger UI are examples of powerful tools that generate code (client- and server-side), including validation logic, tests, and documentation, from a single specification file.
All lead to increased long-term productivity.
Evolvability

Microservices encapsulate implementation details from consumers, typically by only exposing behavior through a REST HTTP interface. This has several advantages.
By using HTTP integration, we also hide versioning information (e.g. “should we use Java 6, or Java 8 here?” It doesn’t matter!). Assuming the HTTP interface remains consistent to consumers, this decoupling enables us to replace implementation technologies, or versions of the same technology, with relatively little cost.
Note — RMI
RMI (Remote Method Invocation) was a popular integration protocol (mainly used with Java technologies) around a decade ago.
RMI uses a stub/skeleton (remote proxy) approach, where the caller requires a stub class from the server to interact with it. The data passed must be serialized, which means knowing how to map bits to/from an object. In addition to the stub, the consumer also needs direct access to all classes (e.g. DTOs, exceptions) involved in the interaction. This gets quite messy from a distribution, class-loading, and versioning perspective (it works when both sides use the same technology, and version of that technology; not so well when they don’t).
Decentralized data stores are another important aspect of evolvability. The more coupled a set of data is, the harder it is to change, and thus evolve. If data is directly shared amongst domains (see Figure 5), evolution becomes harder.
In this case, all three domains access the same tables. There is no clear owner of the data. When that structure must change, or be replaced, we must change all three domains.
However, in the microservices world, each service owns (and manages) its own data (decentralized). All interactions with that data must originate from within the owning microservice; see Figure 6.
Controlled, decentralized data also prevents (or reduces) Domain Pollution.
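This ownership rule can be sketched in a few lines: the owning service mediates every interaction with its data, while consumers depend only on its public API, never its storage. The class and method names below are illustrative assumptions, not a real product’s design.

```python
# Sketch: the Customer service owns its data store outright. Consumers
# never touch the underlying dict/table, so the storage layout (or the
# database technology itself) can change freely (evolvability).
class CustomerService:
    def __init__(self):
        self._store = {}  # private: schema/technology may change at will

    def register(self, customer_id: str, name: str) -> None:
        self._store[customer_id] = {"name": name}

    def get_name(self, customer_id: str) -> str:
        return self._store[customer_id]["name"]

class OrderService:
    """A consumer: depends only on CustomerService's API, not its data."""
    def __init__(self, customers: CustomerService):
        self._customers = customers

    def describe_order(self, customer_id: str, item: str) -> str:
        return f"{item} for {self._customers.get_name(customer_id)}"
```

If `CustomerService` later swaps its dict for a document store, `OrderService` is untouched; that is precisely the evolvability decentralized data buys.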
Modernizing the monolith is one painful aspect I’ve directly witnessed; particularly around the incorporation of new technologies and practices, to improve productivity, security, performance etc. For instance, I worked on one monolith that suffered four failed modernization attempts, predominantly due to the scope of the project.
Mainly, the problem stemmed from Big Bang Change Friction; the friction resulting from the expectation that any change requires a Big Bang; all the code must be upgraded, or none of it. For most, this approach simply isn’t practical.
Note — The Big Bang
Most business stakeholders probably won’t appreciate their technology representatives requesting to down tools for (say) six months (with no functional improvements), just to migrate an existing product onto a modern technology stack (a hard-sell I can tell you!). What value will they see? When consumer demand is insatiable, businesses want to deliver more functionality, not less.
Microservices support product modernization, one service at a time. There’s no Big Bang, as required of the monolith. This is immensely powerful, and politically savvy.
Testability

Microservices also promote test automation (including Test-Driven Development (TDD) practices).
Note — Decoupling for Test Automation
Have you ever heard technologists commenting that, “it’d be easier to start again”? This is Change Friction. Often, it’s caused by a lack of stakeholder confidence, caused in turn, by a lack of unit tests (one of the monolith’s biggest challenges is how to sufficiently decouple it to support test automation).
I’ve seen this first-hand. The development staff could not retrofit unit tests, due to the sheer scale, and inherent risk, of the challenge. Of course, it could be refactored, but how would we know that it was successful (there’s no unit tests to provide you with the necessary confidence), without undertaking a significant regression test, that — to the business stakeholders — holds little value?
It’s more feasible to build a suite of unit and acceptance tests (at the API level) for microservices prior to writing the functional code (i.e. more of a test-first approach than strict TDD). Assuming you follow one implementation technology, the uniformity typical of microservices (e.g. they all use a similar package structure and file-naming convention) helps testability.
Note — Tool Support
We can use popular tools (e.g. Swagger Editor) to generate tests from an initial specification, prior to developing the source code for the solution.
This promotes TDD. It’s also a great way to indicate progress — we can use the tests to show how much effort is required to complete the feature.
Load testing is also simpler. Each microservice can be load-tested for its non-functional capabilities in isolation, using its REST HTTP interface (the same interface used for functional interactions). Thus, we can make judgements about a microservice prior to attempting to integrate it into the rest of the system.
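The API-level, test-first approach described above can be sketched as a contract check written before the service exists: assert on the status code and JSON shape, never on implementation details. The expected fields (`id`, `name`) and the helper name are assumptions for the example.

```python
import json

# Sketch of an API-level acceptance check, written test-first: it
# validates the contract (status + JSON shape) of a customer response.
def check_customer_contract(status: int, body: str) -> list:
    """Return a list of contract violations (empty means the check passes)."""
    violations = []
    if status != 200:
        violations.append(f"expected 200, got {status}")
    try:
        payload = json.loads(body)
    except ValueError:
        return violations + ["body is not valid JSON"]
    for field in ("id", "name"):   # assumed required fields
        if field not in payload:
            violations.append(f"missing field: {field}")
    return violations
```

Because the check only speaks HTTP status and JSON, the same test can run against a unit-level fake, a locally deployed container, or a staging environment.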
Resilience & Reliability
Microservices are loosely-coupled, and thus less brittle in the face of long-term failures than the monolith. In a monolithic system, a single failure (e.g. a memory leak) could undermine the entire system (a reliance upon a single database schema springs to mind; did someone say single point of failure?). However, to counter that argument, a centralized system tends to be more stable (the likelihood of a failure is lower), mainly because everything it needs is co-located.
Note — More Distributed, More Problems
Failure is an inevitable outcome in (parts of) a distributed system. Whilst the monolith had to cater for hardware, OS, and application failures, with a sprinkling of network issues, the distributed system leans heavily upon notoriously unreliable network communication. Thus, there is more chance of failure.
So, if the centralized system is so stable, why move towards a distributed architecture? Well, mainly for scale, flexibility, failure recovery etc…
The modern distributed (cloud) infrastructure is a more hostile, less predictable environment than the typical on-premise, centralized model. Thus, our software must be more resilient to hostility, as more things can go wrong.
Note — Containers, Resilience, and Reliability
Microservices run in their own environment, and typically use technologies such as containerization (e.g. Docker), orchestration (e.g. Kubernetes), and decentralized data stores. All promote resilience and fault-tolerance, but not (necessarily) reliability (it’s all still distributed after all).
Microservices enable us to isolate key parts of a system, impeding the escape of a failure into other areas of the system; i.e. hampering chain reactions that begin small but can quickly become a crisis; i.e. a failing instance need not impact others.
Returning to the ecommerce platform (described earlier), let’s assume that a critical bug is encountered in the Customer microservice, causing all running container instances in that area to fail. The orchestration technology (e.g. Kubernetes) attempts to resolve it, but fails. No new customers can now register for EventMix’s PPV service.
This is sobering, but not necessarily disastrous news. Even whilst the customer registration is failing, other parts (e.g. the Storefront) can still function; for instance, the Storefront could still present customers with a catalogue, or cart management support. See Figure 7.
The distributed model can provide a business with more flexibility, and make failures less all-encompassing. If EventMix has built the solution well, the system is autonomic (it heals itself), and the Customer microservice recovers on its own, limiting customer impact. If not, then at least we can pinpoint the issue (the Customer microservice), focus all attention on solving it, and then re-release only that service.
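The containment described here can be sketched as graceful degradation in the calling service (all names hypothetical; a real system would make HTTP calls, often behind a circuit breaker):

```python
class ServiceUnavailable(Exception):
    pass

def fetch_catalogue():
    # Healthy Catalogue microservice: returns the product list.
    return ["PPV: Title Fight", "PPV: The Rematch"]

def register_customer(name):
    # Simulates the failing Customer microservice.
    raise ServiceUnavailable("customer service is down")

def storefront_page(name=None):
    """Build the storefront view, degrading gracefully per dependency."""
    page = {"catalogue": fetch_catalogue()}
    try:
        register_customer(name)
        page["registration"] = "ok"
    except ServiceUnavailable:
        # The failure is contained: browsing still works,
        # only registration is reported as unavailable.
        page["registration"] = "temporarily unavailable"
    return page
```

The Storefront keeps serving the catalogue even while every Customer instance is down; the failure never becomes a chain reaction.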
Some aspects of Reliability are noticeably tougher (Manageability for instance). Misconfiguration (or missed configuration) is more likely; there’s a lot more to configure. Additionally, there’s more dependence upon the network (interactions are over different, distributed processes), which can be notoriously unreliable.
I’ve also seen a lack of confidence around dependencies in the monolith. This leads to an overcompensation elsewhere. We regression test everything, costing additional time and money, and moving important resources away from (potentially) key strategic work. It’s generally easier to identify one microservice’s dependents, enabling focused regression testing.
Note — Decoupling & Productivity
By decoupling dependencies, we can simplify releases. This promotes Productivity, enabling us to focus a release on the areas of impact.
I can’t mention Releasability without also discussing the “Continuous” practices that are an expectation of many modern software businesses; microservices help to facilitate them. Deployment Pipelines are built around each microservice, promoting automation, and thus, our ability to release software.
Docker and Kubernetes are good examples of this. They provide an infrastructural blueprint (cookie-cutter) that can be reused across different microservices, and manage our software for us. Uniformity is the key factor here in fast delivery; we use the same (or similar) technologies and techniques for each microservice, facilitating fast change and a strong competence. All this leads to faster, more robust releases. Another powerful feature of these tools is their support for concepts like Blue/Green Releases: a technique that removes the need for system downtime during an upgrade and provides a strong rollback mechanism (this is harder with the monolith).
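As a toy model of Blue/Green (in practice a Kubernetes Service selector or a load balancer plays the router’s role), note that the cut-over and the rollback are the same cheap operation:

```python
class BlueGreenRouter:
    """Toy traffic switch between two live deployments."""

    def __init__(self, blue, green):
        self.deployments = {"blue": blue, "green": green}
        self.live = "blue"  # blue serves traffic initially

    def handle(self, request):
        return self.deployments[self.live](request)

    def cut_over(self):
        # Instant switch; the old colour stays running for rollback.
        self.live = "green" if self.live == "blue" else "blue"

v1 = lambda req: f"v1:{req}"   # currently released version
v2 = lambda req: f"v2:{req}"   # new version, deployed but dark
router = BlueGreenRouter(blue=v1, green=v2)

assert router.handle("checkout") == "v1:checkout"
router.cut_over()                      # release v2 with no downtime
assert router.handle("checkout") == "v2:checkout"
router.cut_over()                      # rollback is the same operation
assert router.handle("checkout") == "v1:checkout"
```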
Security threats can come in several forms, including:
- Depending upon flawed libraries; i.e. unpatched software.
- Exposing too wide a privilege to untrusted parties.
Depending upon your outlook, software patching in microservices can be attractive. Containerisation enables us to patch software in a piecemeal fashion, allowing sections of a system to be patched, released, and regression tested one at a time. There’s the flexibility to patch a single microservice, or all of them.
Warning — Caveat
Although this piecemeal-patching approach may not endear you to your operations staff, it can be an attractive business strategy. For instance, the business may be preparing for a major release soon, and want to limit (unnecessary) external influences on it that would place additional risk on its delivery. If the security concern is deemed to be low impact, then in this case, we need only regression test the areas of change, and then undertake a migration path to secure all other microservices post-release.
I’m not suggesting you should do this, only that it extends your options.
Of course, piecemeal-patching may also be a hindrance; if the security vulnerability is critical to all system areas, then every microservice needs to be re-released. Also, the downside to Technology Choice per Microservice is that you must analyse a greater toolset for vulnerabilities.
All-party (Privileged) Access
By decentralising data, and promoting Technology Choice per Microservice (e.g. the customers microservice uses MySQL, whilst catalogues uses DynamoDB NoSQL), with independent access privileges, we harden software to some common attack vectors (e.g. cross-domain injection attacks). See Figure 8.
In Scenario A, a hacker who obtains access to the monolith (system and database) gets access to all data within it. This typically occurs when there’s a monolithic database that uses one set of credentials to access all data. In Scenario B, we have decentralized the data, per service. Any referential integrity is inferred only, and is linked through service (rather than data) orchestration. The two databases are entirely decoupled (including technologically); one cannot be accessed from another. Gaining access to Service A and its data does not give you access to Service B’s data.
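A small sketch of Scenario B’s configuration (all hosts, tables, and credential names hypothetical): each service is given access to its own store only, so compromising one reveals nothing about the other:

```python
# Each microservice holds credentials for its own data store only.
CUSTOMER_CONFIG = {"db": "mysql", "host": "customers-db.internal",
                   "user": "customers_svc"}
CATALOGUE_CONFIG = {"db": "dynamodb", "table": "catalogues",
                    "role": "catalogues_svc"}

def reachable_stores(config):
    """The stores a compromised service could reach: its own, nothing more."""
    return {config.get("host") or config.get("table")}

# Compromising the Customer service exposes only the customers store.
assert reachable_stores(CUSTOMER_CONFIG) == {"customers-db.internal"}
assert "catalogues" not in reachable_stores(CUSTOMER_CONFIG)
```

Contrast this with Scenario A, where one set of monolithic database credentials unlocks everything.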
Flexibility is about supporting diversity and change. We can do this by removing unnecessary assumptions from the solution and promoting decoupling and cohesion.
Breaking something down into small, cohesive units means that each unit makes fewer assumptions about its environment and use, enabling us to combine, unpackage, and then repackage these units with more ease. This can be particularly useful for supporting bespoke solutions; e.g. see Figure 9.
Note that I’ve intentionally left the implementation technology for orchestration vague. Some people might choose an ESB, for instance (whilst I’d argue that misses the point of Smart Endpoints), whereas I’d possibly look to a layered microservice architecture that keeps orchestration at a higher level than the functional unit, allowing it to change independently of any functional change.
We combine independent services to form part of some other solution, without necessarily changing those services (Open/Closed at the microservice level). Microservices make fewer assumptions (than, say, a monolith), and can be more easily combined into something else.
Note — Flexibility & Evolvability
Flexibility and Evolvability are linked. For instance, microservices provide us with the flexibility to replace one implementation with another (e.g. Java with node.js).
Microservices might currently be the industry darling, and the chosen architecture of many, but it’s not all plain sailing. They have several challenges:
- Manageability and reliability — you’re not releasing one large unit, but many little versioned units to combine into a whole. Additionally, distributed units cause additional reliability concerns.
- Uniformity — how do we ensure that all services get the same support and monitoring? How do we balance our ability to evolve against productivity?
- Performance — mainly around latency for large workflows.
Note — The Cost of a Decision
There’s always a cost associated with the technical decisions you make (you could view this as a form of technical debt accrual). Technical qualities are so closely related that it’s impossible not to demote some to promote others. One of an architect’s toughest jobs is to understand these relations, and how best to balance them.
Manageability & Reliability
I covered Releasability earlier, so I’ll focus here on Manageability.
Unpackaging a (monolithic) application into small, versionable units opens up potential software management problems. Theoretically, you can select any microservice and combine it with others, but you must still know (a) that all of them will function correctly together, (b) which version of each to use, and (c) where to find the executable.
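One common mitigation is a release manifest that answers (b) and (c) explicitly, pinning which versions are known to work together (a sketch; service names and registry paths hypothetical):

```python
# A release manifest pins compatible versions and where each executable lives.
MANIFEST = {
    "customer":   {"version": "2.3.1", "image": "registry/customer:2.3.1"},
    "catalogue":  {"version": "1.9.0", "image": "registry/catalogue:1.9.0"},
    "storefront": {"version": "4.0.2", "image": "registry/storefront:4.0.2"},
}

def resolve(service):
    """Which version to deploy, and where to find its container image."""
    entry = MANIFEST[service]
    return entry["version"], entry["image"]

assert resolve("customer") == ("2.3.1", "registry/customer:2.3.1")
```

The tested manifest, rather than any individual service, then becomes the releasable unit of record.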
Furthermore, by decentralizing data and supporting Technology Choice per Microservice, we may also complicate software upgrades; e.g. a data store upgrade per service, tailored to the underlying database technology chosen.
Configuration management is also more involved. Due to its distributed nature, a microservices architecture requires significantly more configuration than a centralized one; e.g. we must first discover our intended collaborators before communicating with them.
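That discovery step can be sketched as a registry lookup (a toy stand-in for what tools like Consul or Kubernetes DNS provide; addresses hypothetical):

```python
# Toy service registry: name -> currently known instance addresses.
REGISTRY = {
    "customer":  ["10.0.0.5:8080", "10.0.0.6:8080"],
    "catalogue": ["10.0.1.9:8080"],
}

def discover(service_name):
    """Find an instance before we can talk to it; fail fast if unknown."""
    instances = REGISTRY.get(service_name)
    if not instances:
        raise LookupError(f"no instances registered for {service_name}")
    return instances[0]  # a real client would load-balance across instances

assert discover("catalogue") == "10.0.1.9:8080"
```

Every one of those registry entries is configuration that simply doesn’t exist in a centralized deployment.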
However, in some ways (mainly due to modern technologies), manageability is less of an issue.
Note — Support for Manageability
Many modern microservice-friendly technologies (e.g. Kubernetes) support continuous practices such as Blue/Green deployments and Canary Releases (routing a portion of users down a new path and measuring its success). Not only do these practices support pressing business needs, they also promote manageability through a declarative, autonomic model.
Transaction Management & Scope
Other challenges lie in transaction management. Centralised (monolithic) applications, with more localised interactions, can better leverage database transaction scope; i.e. one transaction manages a series of database interactions, whilst still supporting relatively simple rollback and commit facilities.
Microservices are more isolated, and often use different database technologies. Thus, transaction scope is isolated to the microservice level; transactions are not shared. This leads to data-consistency and rollback challenges, and another practice must be used to orchestrate transactions. See Figure 10.
In the first scenario, typical of a monolith, one transaction (Tx A) manages all five database interactions, often into the same (monolithic) database schema. The second case, used in microservices, is quite different. In this case, a transaction is managed per action (assuming each database interaction is encapsulated by a single microservice interaction). This is fine if all transactions succeed, but challenging when part of the flow fails and remedial action is required.
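One such practice is the saga pattern: each local transaction is paired with a compensating action, and if a later step fails, the compensations for the completed steps run in reverse order. A minimal sketch, not tied to any particular framework (step names hypothetical):

```python
def run_saga(steps):
    """Each step is (action, compensation). On failure, undo completed steps."""
    completed = []
    try:
        for action, compensate in steps:
            action()
            completed.append(compensate)
    except Exception:
        for compensate in reversed(completed):
            compensate()  # remedial action, newest first
        return "rolled back"
    return "committed"

def fail_shipping():
    # Simulates the third microservice in the flow failing.
    raise RuntimeError("shipping service failed")

log = []
steps = [
    (lambda: log.append("reserve stock"), lambda: log.append("release stock")),
    (lambda: log.append("charge card"),   lambda: log.append("refund card")),
    (fail_shipping,                       lambda: log.append("cancel shipping")),
]
assert run_saga(steps) == "rolled back"
assert log == ["reserve stock", "charge card", "refund card", "release stock"]
```

Note what’s lost versus Tx A in Figure 10: the intermediate states are visible to other readers until the compensations complete (eventual, not immediate, consistency).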
If not carefully managed, microservices’ evolutionary benefit can also become a hindrance.
Being ultimately flexible in technological choice (Technology Choice per Microservice) runs the risk of such diversity that it may hamper change. For instance, if the implementation (and database) technology can be anything, there is a risk that the overall solution becomes so technologically diverse (i.e. a complex ecosystem) that (a) comprehension is hard, (b) security concerns are spread over a wider range of technologies, and (c) moving technical staff across domains is difficult (e.g. Simon may be an extremely competent Java developer, but he has no skills in node.js).
Uniformity is also useful for the non-functional aspects used for logging, alerting, monitoring, or any other metric-gathering tools. We don’t (particularly) want multiple ways of processing these actions (regardless of implementation technology).
There’s also something to be said from a container security perspective. By limiting the number of technologies, we should be able to more quickly patch a container, and then re-release the microservice on top of it. Patching multiple divergent technology stacks can be tougher, and suggests a higher likelihood that we must wait upon the vendor to release a patch.
Promoting a level of uniformity is therefore sensible. Better to select a limited technology set for most cases, than an unmanageable technology sprawl.
Because each microservice interaction is independent (including its transaction), any significant collaboration (i.e. a workflow involving many parties) can create performance challenges. Specifically, this relates to latency (the time it takes from the initiation of an action to receiving a response). See Figure 11.
No science was harmed in the making of this diagram! It’s merely meant to demonstrate the differing challenges of the two architectural styles.
The scenario represents a distributed (e.g. microservices) system. The workflow interacts with four different domains (1, 2, 3, and 4) to complete a job. The useful functional value (white, numbered boxes) may be of a relatively short duration, whilst the red bar represents the varying latency costs of network negotiation/transfer/marshalling to talk with the next microservice. The orange bar represents the overall time cost so far. There’s quite a bit of red involved in these distributed interactions.
In Figure 12 we have a centralised representation.
In this case, the workflow must interact with the same four services/domains, but the cost to communicate with each component is much less (i.e. the short red bars).
Note — Tactics to Reduce Latency Woes
There are a few tactics that can mitigate these latency issues, but no real definitive solution. You can:
1. Attempt to bring dependents closer together in the network, thus reducing latency.
2. Use an orchestration mechanism that sends messages to each, and compiles a response as they become available (assuming you can do this).
3. Go entirely asynchronous.
4. If the output is a visual representation, provide data in stages, using technologies like Ajax.
In the end, it depends upon the system. Most technologists I know would favour scalability over performance; i.e. ensure the system can scale to meet greater demands, at the (willing) cost of slightly reduced performance.
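Tactics 2 and 3 can be sketched with asyncio: fan the calls out concurrently so the total wait approaches the slowest single call, rather than the sum of every hop (domains and latencies are simulated):

```python
import asyncio
import time

async def call_service(name, latency):
    """Simulate one microservice call with its network latency (seconds)."""
    await asyncio.sleep(latency)
    return f"{name}:ok"

async def orchestrate(services):
    # Fan-out: all calls are in flight at once, so the total wait is
    # roughly the slowest call, not the sum of all latencies as in the
    # sequential workflow of Figure 11.
    return await asyncio.gather(*(call_service(n, l) for n, l in services))

services = [("domain1", 0.05), ("domain2", 0.05),
            ("domain3", 0.05), ("domain4", 0.05)]
start = time.perf_counter()
results = asyncio.run(orchestrate(services))
elapsed = time.perf_counter() - start

assert results == ["domain1:ok", "domain2:ok", "domain3:ok", "domain4:ok"]
assert elapsed < 0.15  # well under the ~0.2s a sequential flow would take
```

This only works when the calls are genuinely independent; a workflow where step 2 needs step 1’s output remains stubbornly sequential.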
Business & Technical Qualities
Microservices can (under the right conditions) promote the following qualities.
Some of my claims may not be obvious at the moment (e.g. how can scalability support TTM?); however, they will make more sense in future publications.