The Purpose of (Software) Architecture
0. Preface
Recently I have been regularly involved in discussions around “the purpose” of architecture. I have also repeatedly run surveys on architecture among engineers and managers as part of our continuous improvement process. Working through the results, I realized there is no common understanding of the “why”: some people wanted more rigorous architecture work, while others wanted to avoid producing architecture artifacts at all and focus on well-written code in a fully agile way.
We have many good books and articles explaining the value of appropriate architecture. But explaining the value is not the same as explaining the purpose, so this is my attempt to cover that gap with my personal thoughts.
But before I start, here are some pieces of advice. They are pretty obvious and straightforward, but you will find the same advice repeated at the end of this article, where I hope you will see it from a different perspective. Here we go:
0. If you are building low-competition internal tooling, it might be very efficient to go with a 3rd-party SaaS like MS Office 365 or Salesforce Cloud (the Atlassian stack also belongs here). All other cases will benefit from using a cloud PaaS like AWS, MS Azure, or GCP.
1. To research a new software product hypothesis, it is beneficial to stay with a monolith until you reach product-market fit.
2. When product-market fit is confirmed, it is worth investing in infrastructure and microservices tech, but still keep the core functions in the monolith.
3. There might be a moment when you start investing in paying down tech debt. This investment will look like the acceptance of some over-engineering.
4. And finally, you will likely reach a pinnacle of combined architecture practices: strategic domain-driven design, modular (micro)services architecture, multi-layer platforming, distributed collaborative architecture decision-making, etc.
Back to Theory: The Architecture Mission
We need architecture to help organizations survive (or prosper) by giving control over three cost factors:
- Cost of Building the system (product, service)
- Cost of Owning the system
- Cost of Changing the system
You may ask: what about the “unblocking customer value” mission? It is hidden inside the control over the “cost of change.” My point is that it is possible to “unblock customer value” even without doing architecture work, but you will likely hit your head against the wall of an “extremely high cost of change.”
From this perspective, it is easy to explain such a broad spectrum of opinions about the necessity of architectural work.
In brief, each cost factor can be explained in the following way:
Cost of Building the system
is the most visible one, directly related to CapEx. It usually consists of the cost of implementation (team cost × duration) and the cost of acquisition (buy vs. build).
Cost of Owning the system
is usually treated as part of the “cost of Changing” and OpEx, but this is not always the case. E.g., we can have a system with a low cost of change but a high cost of ownership, and a system with a high cost of change but a low cost of ownership (see details below). It consists of work that is not related to the system’s (business) functions. “Ownership” cost may consist of activities like security patches, infrastructure upgrades, 3rd-party component upgrades, incident response, resiliency (HA/DR), monitoring/alerting, and L2/L3 support.
Cost of Changing the system
is the key characteristic of most software products. It includes both time and cost aspects: time to market (TTM), aka “lead time,” and effort (team cost) per feature. In brief, it is the average “cost of Building” per feature.
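To see all three factors in one place: the TCO mentioned later in this article is, in my rough framing (not a formal model), just their sum, with the cost of change behaving like an accumulated per-change build cost:

$$\mathrm{TCO} = C_{\mathrm{build}} + C_{\mathrm{own}} + C_{\mathrm{change}}, \qquad C_{\mathrm{change}} \approx n_{\mathrm{changes}} \cdot \bar{c}_{\mathrm{change}}$$

where $\bar{c}_{\mathrm{change}}$ is the average effort per feature and lead time (TTM) remains the second, non-monetary dimension of $C_{\mathrm{change}}$.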
What’s left aside?
1. Optimization as an Architecture Constraint.
There are extreme cases when we need to push a system to its limits of throughput or performance: the so-called “highload” architecture. There we have the same drivers (cost of building, changing, and owning) but with technological constraints dictated by the available tech ecosystem and stack. Such systems must be designed around known tech constraints, so our “cost aspects” play a less critical role in the architecture decision-making process. Still, the golden rule is that any “highload” system is just a subsystem of a real product/service, so our statement about the cost drivers remains valid.
2. Cloud-nativeness & PaaS.
AWS, Azure, GCP, and smaller PaaS providers have made significant progress in squeezing software TCO (Total Cost of Ownership). Engineers can start building software without infrastructure provisioning delays (cost of building); patching and maintenance operations do not interfere with engineering (cost of owning); deploying a new node or allocating a new resource is a low-cost task (cost of changing). But unfortunately, no cloud platform can handle domain complexity, and after the initial linear “TCO paradise,” we quickly slide down the “coupled (distributed) monolith” slope. After some point, technological complexity becomes significantly less important than domain complexity, so the added value of a PaaS is no longer a critical factor preventing migration to private data centers.
1. No Architecture, No Problems
Assume you prefer to run Agile like the folks from my survey who opted out of doing any architecture work beyond crafting well-written code. This way, you minimize the cost of building the system, with the vital assumption that your organization has very talented engineers like ours.
So what about other cost factors in this scenario? It depends on the architectural style you have.
Monolith Story
A monolithic application tempts your engineers into taking shortcuts in cross-domain data/model access with no API/contract updates. This gives you a nonlinearly growing cost of change. You may not notice a difference immediately, but your DORA metrics will continuously degrade. Soon, your managers may start blaming teams for decreased velocity and increased estimates (“You said you need two weeks for this feature? I remember three months ago you were able to implement it in one week! Something is broken here; we need to invest in engineering culture!”).
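To make the “shortcut” concrete, here is a minimal TypeScript sketch; the modules, names, and fields are invented for illustration:

```typescript
// One file standing in for two modules of a monolith; names are illustrative.

// --- "orders" module internals (not a published contract) ---
interface OrderRecord {
  id: string;
  customerId: string;
  totalCents: number; // internal representation, free to change... in theory
}

const ordersTable = new Map<string, OrderRecord>([
  ["o-1", { id: "o-1", customerId: "c-42", totalCents: 1999 }],
]);

// --- "billing" module: the shortcut ---
// Billing reads orders' internal table and model directly. It compiles,
// it ships fast, and nothing forces a contract update when orders change.
function invoiceAmountShortcut(orderId: string): number {
  const record = ordersTable.get(orderId)!;
  return record.totalCents / 100; // silently coupled to an internal field
}

// --- the disciplined alternative: a narrow, published contract ---
interface OrdersApi {
  getOrderTotal(orderId: string): { amount: number; currency: string };
}

const ordersApi: OrdersApi = {
  getOrderTotal(orderId) {
    const record = ordersTable.get(orderId)!;
    return { amount: record.totalCents / 100, currency: "USD" };
  },
};

function invoiceAmountViaApi(orderId: string): number {
  // renaming totalCents now breaks one adapter, not N hidden call sites
  return ordersApi.getOrderTotal(orderId).amount;
}

console.log(invoiceAmountShortcut("o-1"), invoiceAmountViaApi("o-1")); // 19.99 19.99
```

Each such shortcut is invisible in the short term; the cost shows up later as the nonlinear growth of the cost of change described above.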
Even worse, your releases can get stuck in the traffic of multiple teams, increasing the risk of complex rollback scenarios. Unfortunately, even if you start building internal APIs and put part of your capacity into refactoring for isolation (“paying back tech debt” :) ), a monolith cannot be healed completely. Of course, you can try to transition from a monolith to a monorepository by introducing independent provisioning for its internal subsystems, but this approach takes significant investment in the Internal Platform.
The cost of ownership is also compromised by increasing MTTR, longer bugfix cycles, a complex rollout process, and unclear accountability for maintenance tasks (unpleasant workarounds like “maintenance duty” or “maintenance shift” can appear).
Distributed System & Microservices Story
The cost of change will be affected by a high likelihood of boundary leakage. Maintaining conceptual integrity requires thorough work on isolating highly cohesive areas of the system to avoid unnecessary coupling. The only reliable way to work with functional coupling is Domain-Driven Design; code-level refactoring without tracing back to domain boundaries cannot protect you from leaky abstractions, unless your team employs freaking geniuses who can do domain modeling in their heads on the fly (that’s how programmers used to work 50 years ago :) ). In brief, your cost of change will increase.
The cost of ownership is affected in a similar way by unnecessary subsystem coupling: each such leakage will require efforts to cover it with regression tests, backward compatibility, monitoring/alerting, circuit breakers, etc.
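For illustration, here is a small TypeScript sketch of what such a boundary leakage looks like between two services; all event shapes and names are hypothetical:

```typescript
// Illustrative event shapes; all names and fields are made up.

// What the payments service emits today, internal details included:
interface PaymentEventInternal {
  paymentId: string;
  status: "captured" | "failed";
  gatewayRawResponse: { auth_code?: string }; // leaked implementation detail
}

// Leaky consumer: it digs into the gateway's raw response, so any gateway
// migration inside the payments service becomes this team's regression too.
function handleLeaky(event: PaymentEventInternal): void {
  console.log("auth code:", event.gatewayRawResponse.auth_code);
}

// Contract-first consumer: only the published, versioned fields are used;
// everything else stays a producer-side implementation detail.
interface PaymentEventV1 {
  paymentId: string;
  status: "captured" | "failed";
}

function handleViaContract(event: PaymentEventV1): void {
  console.log(`payment ${event.paymentId} is ${event.status}`);
}

const sample: PaymentEventInternal = {
  paymentId: "p-7",
  status: "captured",
  gatewayRawResponse: { auth_code: "A1B2" },
};
handleLeaky(sample);       // coupled to internals
handleViaContract(sample); // only the contract fields matter
```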
To sum it up:
- when you minimize the Cost of Build,
- you get the Cost of Ownership maximized,
- and the Cost of Change is also high.
But what if we want an absolutely minimal cost of change?
2. Over the Engineering
It is possible to enable change in any system using two architectural practices:
- Modularization and isolation inside the system: the trick here is to achieve “loose coupling, high cohesion.”
- Identifying potential “extensibility points” and adding extra layers of abstraction to localize the effect of a possible change even further.
For example, let's see how we can build a validation logic for input requests as part of the event processing pipeline. Straightforward implementation is to hardcode a validation logic into the pipeline. But to increase the system extensibility (in Architect’s lingua, it is the same as “decrease the cost of change in this part of the system”), we can introduce an abstraction like RequestValidator Interface, specify its contract with the pipeline, code a unified protocol for flow control, and decouple validation logic from the pipeline implementation (modularization!).
After that, we can easily add more validators if we need them. But what if we don’t? A single hardcoded validator from the initial spec could be implemented roughly 2x faster (say, “~2x cheaper”). We all know this scenario as an example of the infamous “over-engineering” evil.
The probabilistic nature of our change drivers makes such “over-engineering” useful as a risk mitigation investment, but I wish you luck explaining this to Product Managers. :)
The ownership cost is also affected: tracing and debugging can become challenging in a decoupled system (both monolithic and distributed) that follows inversion of control, dependency injection, and late binding principles. However, this situation is still better than owning a monolith due to better MTTR, failure localization, and parallel releases.
To sum it up:
- when you minimize the Cost of Change,
- you get the Cost of Build maximized,
- and the Cost of Ownership is fairly high.
And finally, let’s try to focus on optimizing the Cost of Ownership.
3. Build a Cutting-edge Tech or Don’t Build at All
There are two ways to build a system with minimal ownership cost:
Buy vs Build Story
The traditional way of optimizing the Cost of Ownership relies heavily on 3rd-party services and products. The ultimate optimization can be achieved by adopting a PaaS like Salesforce or Office 365. With a solid skillset available, you can get the system up and running very quickly (low cost of build), but the cost of change will be huge if you decide to build anything outside of the platform’s native capabilities.
Modularization for the Sake of Modularization
“We built 120 microservices, had a short break, and then started planning our 2nd sprint”
An alternative way is to focus on a highly modular structure by applying microservices or even serverless functions (nanoservices) wherever possible. With containers, Kubernetes, DBMS-as-a-service, a schema registry, and a service mesh, you can spawn a load-tolerant, highly available, auto-healing distributed application. Sounds good, but unfortunately, it is too easy to screw up once you start changing the application logic. A change of average scope can span multiple microservices and affect their data and API schemas. You will need advanced monitoring, deployment, data-consistency validation, and testing tools. You will need to watch performance baselines, apply circuit breakers everywhere, and track chains of failures. Still doable; lots of companies invest in this direction. The cost of building will also include significant investment in cutting-edge tooling adoption and infrastructure readiness.
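To give a feel for the “circuit breakers everywhere” part, here is a toy TypeScript breaker; the thresholds, endpoint, and names are invented, and a production system would rather use a hardened library or a service mesh policy:

```typescript
// A toy circuit breaker, just to show the kind of plumbing each
// service-to-service call ends up needing; thresholds are arbitrary.

class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(
    private readonly maxFailures = 3,
    private readonly resetMs = 10_000,
  ) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    const open = this.failures >= this.maxFailures;
    if (open && Date.now() - this.openedAt < this.resetMs) {
      throw new Error("circuit open: failing fast");
    }
    try {
      const result = await fn(); // after resetMs, one trial call gets through
      this.failures = 0; // success closes the circuit again
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.maxFailures) this.openedAt = Date.now();
      throw err;
    }
  }
}

// Usage: every remote dependency gets wrapped, and every wrapper needs
// tuning, metrics, and alerting -- that is the ownership cost in practice.
const inventoryBreaker = new CircuitBreaker();
async function getStock(sku: string): Promise<number> {
  return inventoryBreaker.call(async () => {
    const res = await fetch(`https://inventory.internal/stock/${sku}`); // hypothetical endpoint
    if (!res.ok) throw new Error(`inventory returned ${res.status}`);
    return (await res.json()).quantity;
  });
}
```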
All in all:
- when you minimize the Cost of Ownership,
- you get the Cost of Change maximized (or at least heavily affected),
- and the Cost of Build is affected as well.
In conclusion, I can repackage the mechanics described above into straightforward bits of advice. They are pretty common, but I want you to read them through the lens of how we control the production cost factors.
0. If you are building low-competition internal tooling where you can control the change drivers, it might be very efficient to go with a 3rd-party SaaS like MS Office 365 or Salesforce Cloud to get a low cost of build and a low cost of ownership. All other cases will benefit from using a cloud PaaS like AWS, MS Azure, or GCP, whose toolsets keep the TCO (the combined build, change, and ownership cost factors) very competitive for the technical baseline of the software you are building.
1. To research a new software product hypothesis, it is beneficial to stay with a monolith until you reach product-market fit, thanks to the lowest cost of build and an initially low (but growing) cost of change.
2. When product-market fit is confirmed, it is worth investing in infrastructure and microservices tech to build peripheral product functions in a controlled environment (lower cost of ownership) but still keep the core functions in the monolith (a risky but acceptable cost of change).
3. There might be a moment when you start investing in paying down tech debt to slow down the degradation of the cost of change. This investment will look like the acceptance of some over-engineering (adding abstractions to restore the modifiability of the software). At this stage, we can also see that domain complexity becomes more impactful to the overall system complexity than the technical baseline, meaning that a cloud PaaS investment is less valuable than before.
4. And finally, to unblock hypergrowth and horizontal org scalability with stream-aligned teams, you will need to reach a pinnacle of combined architecture practices: strategic domain-driven design, modular (micro)services architecture, multi-layer platforming, distributed collaborative architecture decision-making, etc., to balance all cost factors of software production: build, own, and change.