Evolutionary Decoupled (loosely-coupled) Services Architecture
📖 Overview
After reading the ‘AN ARCHITECTURE THAT ENABLES PRODUCTIVITY, TESTABILITY, AND SAFETY’ section of The DevOps Handbook, I couldn’t agree more with its opening lines. That text shed a lot of light on Software Architecture considerations for me, so I cite it below to put the reader in context:
In contrast to a tightly-coupled architecture that can impede everyone’s productivity and ability to safely make changes, a loosely-coupled architecture with well-defined interfaces that enforce how modules connect with each other promotes productivity and safety. It enables small, productive, two-pizza teams that are able to make small changes that can be safely and independently deployed. And because each service also has a well-defined API, it enables easier testing of services and the creation of contracts and SLAs between teams.
It made me realize that there are some widely adopted, battle-tested modern architecture design patterns and practices that can help most projects going through an architectural modernization: rewriting or replacing legacy applications. Many of them are also applicable to your cloud migration journey.
Moreover, studies from the DevOps Research and Assessment (DORA) team show that architecture is an important predictor of achieving continuous delivery (CD), whether you’re using Kubernetes, VMs, Serverless or even mainframes: a loosely-coupled architecture enables teams to adopt the practices that foster higher levels of software delivery performance.
From our own experience, we can say that infrastructure modernization, leveraging Cloud Providers like AWS, offers remarkable benefits, aligned with the AWS Well-Architected Framework pillars:
- ✅ reduced operating costs (cost optimization),
- ✅ scalable performance (performance efficiency),
- ✅ greater uptime (improved reliability & high availability),
- ✅ security (by design + shift left),
- ✅ operational excellence (an evolutionary architecture workflow that boosts maintainability & support),
by using technologies such as cloud managed services, which include serverless, container engines like managed Kubernetes, relational and non-relational databases, and message queues, among many others. However, migrating legacy systems to the cloud is not that simple and straightforward, which raises the following question…
❓How can we move faster while still dealing with legacy systems?
A modern Evolutionary Decoupled (loosely-coupled) Architecture supports small incremental changes with sufficient feedback loops to evaluate their effectiveness and efficiency. There are a number of approaches available when dealing with your legacy architecture and moving towards an evolutionary architecture, which we’ll try to summarize below 👇
⚠️ CONSIDERATION | Please take this post as a high-level, opinionated starting point; it is not intended to be a deep-dive technical reference for re-architecting your legacy platform 🙏 It is meant to encourage and motivate you on this migration journey, though.
☑️ Event-Driven Architecture
We’ll review and describe ‘How can event-driven architecture help?’ based on a great Indu Alagarsamy blog post.
An event-driven architecture uses events to trigger and communicate between decoupled services and is common in modern applications built with microservices. An event is a change in state, or an update, like an item being placed in a shopping cart on an e-commerce website. Events can either carry the state (the item purchased, its price, and a delivery address) or events can be identifiers (a notification that an order was shipped).
Event-driven architectures have three key components: event producers, event routers, and event consumers. A producer publishes an event to the router, which filters and pushes the events to consumers. Producer services and consumer services are decoupled, which allows them to be scaled, updated, and deployed independently.
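To make the two event styles above concrete, here is a minimal Python sketch; the event names, fields and JSON envelope are illustrative assumptions, not tied to any specific framework:

```python
from dataclasses import dataclass, asdict
import json

# Event that carries the full state (the item purchased, its price, a delivery address).
@dataclass
class ItemAddedToCart:
    cart_id: str
    item_sku: str
    price: float
    delivery_address: str

# Event that is only an identifier (a notification that an order was shipped).
@dataclass
class OrderShipped:
    order_id: str

def serialize(event) -> str:
    """Serialize an event DTO to JSON, tagging it with its type name."""
    return json.dumps({"type": type(event).__name__, "data": asdict(event)})

print(serialize(ItemAddedToCart("cart-42", "SKU-123", 19.99, "123 Main St")))
print(serialize(OrderShipped("order-9000")))
```

Consumers only ever see these small DTOs, never the producer’s internal objects, which is what keeps both sides decoupled.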
🔷 Stage 1 | Add Visibility to Your Monolith
- Event-driven architecture is a software design pattern that uses messaging techniques in conjunction with message queues to convey events to other services (e.g. APIs). It promotes an asynchronous workflow. Events are nothing but simple data transfer objects (DTOs) that convey something significant that happened within your business system.
- What characterizes this kind of system is its asynchronous working mode. The strength of this model is that the publisher (or sender) of the event just publishes it to a target message queue (like AWS SQS or RabbitMQ) without depending on a synchronous request-response interaction. It has no knowledge of what the subscriber intends to do with the event. Event-driven architecture leads to and enables a loosely-coupled service architecture.
- A way to start using this pattern is to first analyze your monolith code to find the lines that complete certain scoped (important) actions. Sometimes, as a proof of concept (POC), starting with the ones that allow the easiest decoupling (e.g. simpler DB datasets to migrate) can be a good idea. Once your analysis is done and the necessary code and services have been developed and deployed, you can start publishing those actions as events.
Repeat this process until you’ve captured all the important or necessary actions your monolith performs as events. Adding the code to publish an event inside your monolith is far less intrusive than attempting to write a brand new feature inside it, especially if you’re using an available library like Python Celery or Apache Airflow that lets you easily extend what you already have (of course this will depend on your language and framework of choice). A minimal sketch of this step follows below.
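As a minimal sketch, assuming AWS SQS via boto3 and a hypothetical “order placed” action inside the monolith (the queue URL, event name and helper functions are illustrative):

```python
import json
import boto3

# Illustrative placeholder; replace with your real SQS queue URL.
ORDER_EVENTS_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/order-events"

# Credentials and region are taken from the standard AWS environment/config.
sqs = boto3.client("sqs")

def publish_event(event_type: str, payload: dict) -> None:
    """Fire-and-forget: publish a small event DTO without waiting on any consumer."""
    sqs.send_message(
        QueueUrl=ORDER_EVENTS_QUEUE_URL,
        MessageBody=json.dumps({"type": event_type, "data": payload}),
    )

def save_order_to_legacy_db(order) -> None:
    """Stand-in for the monolith's existing persistence logic (purely illustrative)."""
    ...

def place_order(order) -> None:
    """Hypothetical monolith code path; the only new line is the publish call."""
    save_order_to_legacy_db(order)  # existing behavior stays untouched
    publish_event("OrderPlaced", {"order_id": order.id, "total": order.total})
```

The monolith keeps doing exactly what it did before; it just announces, asynchronously, that something important happened.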
🔷 Stage 2 | Add Your New Feature on the Outside.
- If you’re planning to add new functionality to your monolith, consider building the new features as a brand new service (or APIs): for instance, a new service that subscribes to the events published by the legacy monolith. When this service receives an event, it takes the relevant action that needs to occur. This service could in turn publish its own events for downstream business processes, taking advantage of the loosely-coupled event-driven design pattern you’ve adopted. A sketch of such a consumer follows below.
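A minimal sketch of such an outside consumer, assuming the same hypothetical SQS queue and “OrderPlaced” event as in the Stage 1 sketch (the loyalty-points logic is purely illustrative):

```python
import json
import boto3

# Same illustrative queue the monolith publishes to in the Stage 1 sketch.
ORDER_EVENTS_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/order-events"

sqs = boto3.client("sqs")

def handle_order_placed(data: dict) -> None:
    """The brand new feature lives here, outside the monolith (logic is illustrative)."""
    print(f"Granting loyalty points for order {data['order_id']}")
    # This service could in turn publish its own events for downstream processes.

def poll_forever() -> None:
    while True:
        response = sqs.receive_message(
            QueueUrl=ORDER_EVENTS_QUEUE_URL,
            MaxNumberOfMessages=10,
            WaitTimeSeconds=20,  # long polling keeps the consumer cheap and responsive
        )
        for message in response.get("Messages", []):
            event = json.loads(message["Body"])
            if event.get("type") == "OrderPlaced":
                handle_order_placed(event["data"])
            # Acknowledge the message so it is not redelivered.
            sqs.delete_message(
                QueueUrl=ORDER_EVENTS_QUEUE_URL,
                ReceiptHandle=message["ReceiptHandle"],
            )

if __name__ == "__main__":
    poll_forever()
```

Because the new feature only depends on the event contract, it can be developed, deployed and scaled independently of the monolith.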
☑️ Strangler Pattern
Definition and Overview
From The DevOps Handbook ‘USE THE STRANGLER APPLICATION PATTERN TO SAFELY EVOLVE OUR ENTERPRISE ARCHITECTURE’ section:
The term strangler application was coined by Martin Fowler in 2004 after he was inspired by seeing massive strangler vines during a trip to Australia, writing, “They seed in the upper branches of a fig tree and gradually work their way down the tree until they root in the soil. Over many years they grow into fantastic and beautiful shapes, meanwhile strangling and killing the tree that was their host.”
If we have determined that our current architecture is too tightly-coupled, we can start safely decoupling parts of the functionality from our existing architecture. By doing this, we enable teams supporting the decoupled functionality to independently develop, test, and deploy their code into production with autonomy and safety, and reduce architectural entropy.
As described earlier, the strangler application pattern involves placing existing functionality behind an API, where it remains unchanged, and implementing new functionality using our desired architecture, making calls to the old system when necessary. When we implement strangler applications, we seek to access all services through versioned APIs, also called versioned services or immutable services.
Versioned APIs enable us to modify the service without impacting the callers, which allows the system to be more loosely-coupled — if we need to modify the arguments, we create a new API version and migrate teams who depend on our service to the new version. After all, we are not achieving our re-architecting goals if we allow our new strangler application to get tightly-coupled into other services (e.g., connecting directly to another service’s database).
If the services we call do not have cleanly-defined APIs, we should build them or at least hide the complexity of communicating with such systems within a client library that has a cleanly defined API.
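As an illustration of that last point, here is a minimal sketch of a client library that hides a messy legacy interface behind a cleanly-defined API (the endpoint, parameters and response shape are assumptions about a typical legacy HTTP service, using the Python requests library):

```python
import requests

class LegacyBillingClient:
    """Hides the messy legacy interface behind a small, cleanly-defined API.

    The endpoint, parameter names and response shape below are illustrative
    assumptions about a typical legacy HTTP service.
    """

    def __init__(self, base_url: str, timeout: float = 5.0):
        self.base_url = base_url
        self.timeout = timeout

    def get_invoice_total(self, customer_id: str) -> float:
        # The awkward legacy call (odd query params, nested response) stays in one place.
        response = requests.get(
            f"{self.base_url}/cgi-bin/billing.pl",
            params={"action": "INVTOTAL", "custno": customer_id},
            timeout=self.timeout,
        )
        response.raise_for_status()
        return float(response.json()["RESULT"]["TOTAL_DUE"])

# Callers only ever see the clean method, never the legacy details:
# total = LegacyBillingClient("https://legacy.internal").get_invoice_total("C-001")
```

Callers depend only on get_invoice_total, so the legacy quirks can later be swapped for a call to the new service without touching them.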
❓How does the strangler pattern provide an approach to controlled re-architecture?
🔷 Stage 1 | Identify the system boundary you want to re-architect.
- As mentioned before: “Sometimes, as a proof of concept (POC), starting with the ones that allow the easiest decoupling (e.g. simpler DB datasets to migrate) can be a good idea.”
- Starting with a component that you can safely and easily decouple (or at least the easiest one you can find) will increase the positive outcomes.
🔷 Stage 2 | Create a facade.
- The facade ensures clients of the system are unaffected by the re-architecture activity and should therefore match an existing system API.
⚠️ If you don’t have an API, you must put the existing functionality behind one and avoid making further changes to it.
- The API for the new service should be carefully designed to ensure backward compatibility with the legacy API. If you want to make breaking changes to the new service’s API, do so once the re-architecture has been completed.
- Consider the diagram immediately below as an explanatory example
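Alongside the diagram, a minimal code sketch of such a facade, assuming a Flask service and the requests library (the routes, URLs and the orders resource are illustrative), could look like this:

```python
import requests
from flask import Flask, jsonify

app = Flask(__name__)

# Illustrative location of the untouched legacy backend.
LEGACY_BASE_URL = "http://legacy-monolith.internal"

@app.route("/api/orders/<order_id>", methods=["GET"])
def get_order(order_id: str):
    """Facade endpoint: same path and contract the existing clients already use.

    For now every request is simply forwarded to the legacy system, so callers
    are unaffected while the re-architecture happens behind the facade.
    """
    legacy = requests.get(f"{LEGACY_BASE_URL}/api/orders/{order_id}", timeout=5)
    return jsonify(legacy.json()), legacy.status_code

if __name__ == "__main__":
    app.run(port=8080)
```

At this point the new service does not even need to exist yet; the facade simply pins down the contract that the re-architecture has to preserve.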
🔷 Stage 3 | Re-architect.
- Implement your new service, wiring it in to the strangler facade.
- Ideally, dark launch the new service: tee traffic between the new and legacy implementations (so both are active, but transparently for the end user).
✔ Return responses from the legacy implementation.
✔ Capture, or discard, responses from the new implementation. This enables you to monitor and test the behavior of the new implementation and ensure it is fit for purpose before you switch over and go live (a minimal sketch of such a tee follows below).
- Continuous delivery, and if possible continuous deployment, is also a good option: continuously deploying partial implementations of the new service while development is still in progress lets you gain confidence, test, and mitigate the associated risk without compromising your current system’s behavior.
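A minimal sketch of the dark-launch tee, extending the facade sketch above (the backend URLs and the comparison logic are illustrative assumptions):

```python
import logging

import requests
from flask import Flask, jsonify

app = Flask(__name__)
log = logging.getLogger("dark-launch")

LEGACY_BASE_URL = "http://legacy-monolith.internal"   # illustrative
NEW_BASE_URL = "http://new-order-service.internal"    # illustrative

@app.route("/api/orders/<order_id>", methods=["GET"])
def get_order(order_id: str):
    # Always serve the legacy response: end users see no behavioral change.
    legacy = requests.get(f"{LEGACY_BASE_URL}/api/orders/{order_id}", timeout=5)

    # Tee the same request to the new implementation; its response is only
    # captured (or discarded) for comparison, never returned to the caller.
    try:
        new = requests.get(f"{NEW_BASE_URL}/api/orders/{order_id}", timeout=5)
        if new.json() != legacy.json():
            log.warning("Response mismatch for order %s (legacy vs new)", order_id)
    except requests.RequestException as exc:
        log.warning("New implementation failed for order %s: %s", order_id, exc)

    return jsonify(legacy.json()), legacy.status_code
```

Returning only the legacy response keeps behavior unchanged for end users, while the mismatch logs provide the feedback loop you need before cut-over.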
🔷 Stage 4 | Cut-over.
- Once you are confident that the new implementation meets your needs, move forward with the production traffic cut-over (go live switch-over) so requests and responses are served from the new service (API) implementation.
- Keep your legacy system (API) implementation around to allow for an easy rollback if needed; a simple configuration-toggle sketch follows below.
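A minimal sketch of a cut-over switch, again extending the facade sketch (the ORDERS_BACKEND environment variable is a hypothetical flag; a real setup might use a feature-flag service or load balancer weights instead):

```python
import os

import requests
from flask import Flask, jsonify

app = Flask(__name__)

LEGACY_BASE_URL = "http://legacy-monolith.internal"   # illustrative
NEW_BASE_URL = "http://new-order-service.internal"    # illustrative

def active_backend() -> str:
    """Read the hypothetical ORDERS_BACKEND flag per request: rollback is just a config change."""
    return NEW_BASE_URL if os.getenv("ORDERS_BACKEND", "legacy") == "new" else LEGACY_BASE_URL

@app.route("/api/orders/<order_id>", methods=["GET"])
def get_order(order_id: str):
    upstream = requests.get(f"{active_backend()}/api/orders/{order_id}", timeout=5)
    return jsonify(upstream.json()), upstream.status_code
```

Because the legacy backend is still running, flipping the flag back is all a rollback takes.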
🔷 Stage 5 | Post clean up (the mess) tasks.
- Once the new system is established, battle-tested and production ready, you should archive, or if you’re fully migrated and confident, even delete the legacy environment and code that has been fully replaced.
🔷 Stage 6 | Migrate client users to the new API and remove the strangler facade.
- Depending on how much your migration and re-platforming (refactoring) involved, this rewrite will likely introduce some breaking changes or new behaviors for your users, which should be managed accordingly, via API versioning if possible.
- To manage the transition, if you have a large number of clients, expose the new API while maintaining the strangler facade, allowing you to:
✔ Monitor usage of the strangler facade and proactively engage with users to help them migrate and validate any necessary associated tasks (a minimal monitoring sketch follows below).
✔ Once the strangler facade is no longer being used, it can be safely removed.
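As a minimal sketch of that monitoring step (the Deprecation and Link headers and the X-Client-Id header are illustrative conventions, not mandated by the strangler pattern), a small Flask hook on the facade could look like this:

```python
import logging

from flask import Flask, request

app = Flask(__name__)
log = logging.getLogger("strangler-facade")

@app.after_request
def flag_and_track_facade_usage(response):
    """Mark every facade response as deprecated and record which clients still use it."""
    response.headers["Deprecation"] = "true"
    # The successor URL is an illustrative assumption for the new versioned API.
    response.headers["Link"] = '<https://api.example.com/v2/>; rel="successor-version"'
    log.info(
        "Strangler facade still in use: %s %s by client %s",
        request.method,
        request.path,
        request.headers.get("X-Client-Id", "unknown"),  # illustrative client header
    )
    return response
```

When those logs go quiet, you have evidence that the facade can be safely removed.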
Recommended reading (w/ examples)
- 📙 https://aws.amazon.com/getting-started/hands-on/break-monolith-app-microservices-ecs-docker-ec2/
- 📙 https://www.jeremydaly.com/the-strangler-pattern/
Concerns and Important Considerations
- 📒 You can’t (and shouldn’t) stop everything, put down tools, environments and teams, and rewrite everything from scratch. What’s more, if you tried you would likely be caught out by the Second System Effect: a rewrite that is supposed to solve the issues of the past but actually results in increased complexity and overruns. 📖 “We can solve the issues of the past and rewrite the system that took us 6+ years, in 12 months.”
The actual reality?
❗After 18 months down the line you are still building on the original system; the new system supports only ~20% of the expected functionality (legacy system functions + new features); technical debt has started growing in the push to deliver; your beautiful brand new greenfield architecture has turned into a big ball of mud; and, what’s more, your disappointed and angry customers have left for the sake of their own business continuity.
- 📒 While applying the strangler pattern and going through the re-architecture process, some trade-offs may be required, like prioritizing the rewrite and migration over new feature development. This should most probably be accepted, since the strangler pattern is a means to an end: it will take you from point A to point B in a controlled and planned manner. Removal of the strangler facade, followed by further evolution, should be expected.
- 📒 In some circumstances the strangler pattern is overkill. When breaking up a monolith, if clear APIs are already available, a “slash and burn” technique could be simpler: duplicate the code and its environments, establish clear deployment boundaries and then remove the unused service functionality code (and environments).
- 📒 Where your system already supports migration patterns such as versioned APIs and multi-version deployment flexibility, the strangler pattern may not be required. Planning for continual iterative redesign and evolutionary architecture is always a better overall approach. Moreover, after your migration to a loosely-coupled architecture you should (ideally must) follow this evolutionary architecture workflow to avoid falling into the same trap.
- 📒 The problem with that shiny brand new platform you’ve got your eye on is that it will most probably become the unmanageable, clunky old dinosaur in a few years (possibly faster than you imagine, given today’s fast pace of software evolution). If you chase down every new tool/technology/platform that comes out, chasing is all you’ll ever be doing. Instead, a continuous, iterative, evolutionary architecture approach should be considered.
- 📒 Software systems are generally deceptively complex, and by the time you fully realize just how complicated your system is, you’re deep enough into the rewrite that you can’t turn back. That’s the sunk-cost fallacy hard at work. And guess what: the dirty little secret about refactor estimates is that they are often intentionally low, because the people who want to rewrite the software know the business would never sign on if it knew the true costs. Even when it’s not intentional, software development cognitive biases like
✔ the #optimistic bias,
✔ the #overconfidence bias,
✔ the #confirmation bias,
will favor unintentionally optimistic effort estimations.
- 📒 Be very careful about what you cut short and what you actually keep as part of a rewrite. It’s so easy to over-engineer when going into a rewrite or re-platform, and all the easier to slide into when you have a team of really smart engineers who don’t have a ton of experience.
- 📒 Understand software development biases, since we can all fall into cognitive bias traps. Being aware of the most common ones will give you an edge to avoid them. Don’t blame the person responsible for the bias; instead, fix the situations the bias is acting on. Be careful with quick decisions and take time to search for more information. Moreover, whenever possible, play devil’s advocate for important decisions and actions. Finally, always ask yourself what and who could unreasonably influence your judgment, to prevent cognitive biases.
Conclusion
1️⃣ Large-scale refactoring and paying down accrued technical debt are very challenging. History has shown that big-bang re-architecture is very risky and generally liable to crash-and-burn failure, so an incremental approach should always be preferred. The strangler pattern provides one such tool and technique to help manage the complexity of refactoring and re-architecture.
2️⃣ By making smaller changes with appropriate feedback loops, you can make a fundamental technology shift while maintaining your uptime (availability) and without compromising the project timeline, roadmap and milestones. When you build evolutionary change into your architecture, changes become cheaper and easier. The heart of evolutionary architecture is to make small changes and put in feedback loops that allow everyone to learn from how the system is developing.
3️⃣ An evolutionary architecture allows experimentation and gradual replacement of existing functionality. For this purpose it should support incremental, iterative and guided change as a first principle across multiple dimensions, as well as adaptability, using proper abstractions (modularity and loose coupling enabling composition and extensibility, and reducing tightly-coupled breaking changes), database migrations (as code), automated test suites (unit, integration and e2e testing), continuous integration and delivery (CI/CD + A/B testing and Canary Releases, among others), Infrastructure as Code (IaC + GitOps if possible), and refactoring to harvest reuse as it occurs within a system (reutilize as much as possible).
4️⃣ Before rewriting, remember that monolithic architectures are not inherently bad; in fact, they are often the best choice for an organization early in a product life cycle. As cited in The DevOps Handbook, Randy Shoup observes, “There is no one perfect architecture for all products and all scales. Any architecture meets a particular set of goals or range of requirements and constraints, such as time to market, ease of developing functionality, scaling, etc. The functionality of any product or service will almost certainly evolve over time — it should not be surprising that our architectural needs will change as well. What works at scale 1x rarely works at scale 10x or 100x.” We summarize this idea in the Monolithic vs Microservice architecture table presented below.
📚Reference Links
- 🔗 https://particular.net/blog/break-that-big-ball-of-mud
- 🔗 https://mechanicalrock.github.io/2020/05/04/strangler-pattern.html
- 🔗 https://www.simplethread.com/we-need-to-replatform/
- 🔗 https://cloud.google.com/solutions/devops/devops-tech-architecture
- 🔗 https://thevaluable.dev/cognitive-bias-software-development/
- 🔗 https://apiumacademy.com/blog/evolutionary-architectures-principles-common-characteristics/