The Illusion of Runtime Flexibility
Systems that are easy to change are attractive to everyone. Developers work hard to keep the code clean so that they can make changes quickly. Ops engineers work hard to simplify the deployment process so that changes can be deployed easily. However, even after decades of improvements to the development process, the business is still not satisfied with the speed of change. Therefore, many people look to runtime flexibility, which allows the behaviour of the software to be changed without code changes. What they don’t realise is that this flexibility comes at a huge cost, one that could instead be invested in other opportunities that help the business survive in the future.
Changeability and Runtime Flexibility
There are two ways to change the behaviours of the software:
- by changing the code
- by changing the live data (without code changes)
Changeability refers to how easily software can be changed by making code changes. It is normally measured by lead time.
Runtime flexibility refers to the ability to change software behaviours without making code changes.
The Illusion of Runtime Flexibility
People normally feel that making code changes is slow and expensive. So it is attractive to build some flexibility into the software, allowing its behaviour to be changed by changing live data or configuration. One example is an access management service.
The example: Access Management Service (AMS)
The design
A generic AMS normally is designed with the following entities:
- Operations: define the actions, such as account:view
- Roles: groups of operations, such as admin
- Resources: define the data to access, such as the account information
- Users: define who wants to access the data
An AMS normally supports changing the relationship between roles and operations on the fly, so that what users can do is controlled dynamically.
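To make the entities concrete, here is a minimal sketch of such a service. Everything in it (the class name AccessManagementService, the methods defineRole, assign, revoke and hasPermission, and the sample user and account ids) is a hypothetical illustration, not the API of any real AMS.

```typescript
// Hypothetical data model for a generic AMS; all names are illustrative.
type AmsOperation = string; // e.g. "account:view"

interface AmsRole {
  roleName: string; // e.g. "admin"
  operations: AmsOperation[]; // operations currently granted to the role
}

interface RoleAssignment {
  userId: string;
  roleName: string;
  resourceId: string; // e.g. an account id
}

class AccessManagementService {
  private roles: { [roleName: string]: AmsRole } = {};
  private assignments: RoleAssignment[] = [];

  defineRole(roleName: string, operations: AmsOperation[]): void {
    this.roles[roleName] = { roleName: roleName, operations: operations.slice() };
  }

  assign(userId: string, roleName: string, resourceId: string): void {
    this.assignments.push({ userId: userId, roleName: roleName, resourceId: resourceId });
  }

  // The runtime flexibility: the role-to-operation mapping is live data,
  // so it can be changed without touching the code that checks permissions.
  revoke(roleName: string, operation: AmsOperation): void {
    const role = this.roles[roleName];
    if (role) {
      role.operations = role.operations.filter((op) => op !== operation);
    }
  }

  hasPermission(userId: string, operation: AmsOperation, resourceId: string): boolean {
    return this.assignments.some((a) => {
      if (a.userId !== userId || a.resourceId !== resourceId) {
        return false;
      }
      const role = this.roles[a.roleName];
      return role !== undefined && role.operations.indexOf(operation) !== -1;
    });
  }
}

const ams = new AccessManagementService();
ams.defineRole("admin", ["account:view"]);
ams.assign("alice", "admin", "account-1");
const allowedBefore = ams.hasPermission("alice", "account:view", "account-1"); // true
ams.revoke("admin", "account:view");
const allowedAfter = ams.hasPermission("alice", "account:view", "account-1"); // false
```

Note that the caller of hasPermission never mentions a role. That is what makes the mapping changeable at runtime, and also what makes its effects hard to predict.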
In reality, however, it usually leads to application errors. Let’s have a look at why.
How is AMS used?
Let’s say there is a simple page that displays information about clients’ invoices. The page consumes the following data from the AMS:
- the account id as a resource
- invoices:view as an operation
- billing as a role that can do invoices:view
- the user that is assigned to the billing role for the account access
Then on the page, the code checks the operation to decide whether the invoices are displayed or not, like this:
if (user.hasPermission("invoices:view", this.accountId))
  <invoiceList accountId={this.accountId} />
else
  <message>no permission</message>
Of course, a similar check is also on the backend API:
- httpApi:
    method: GET
    path: /invoices
    authorizer:
      name: myobIdAuthorizer
      scopes: invoices:view
The code is agnostic to roles, so we can easily remove the operation invoices:view from the billing role to remove the access of any users assigned to this role (this can happen when we want to split a role into two, such as billing into finance and manager). It is so flexible! Yay!
But is it really this simple?
The cost to maintain
The reality is much more complex than the example above. A page normally contains many sections of information that require permission checks for many operations. Take a real billing page as an example.
Permission checks on the page:
- It requires invoices:view and credit-notes:view to display the Current Bill and Bill History sections.
- It requires charges:view to display the Bill Details section.
- The Pay Now button requires payments:view to check whether the invoice has been paid and payments:write to make a new payment.
- The Refund button requires credit-notes:write to generate a new credit note and payments:write to make a new reversed payment.
That is 6 operations required on this one page. Since a role may or may not be granted each operation, there are 2^6 = 64 possible combinations of operations. To keep the flexibility of changing the relationship between a role and its operations, we need to make sure the page functions well under every combination, which means considering (and maybe testing) 64 scenarios for every change made to this page. This is extremely expensive.
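The count can be checked by enumerating the scenarios. The sketch below lists every subset of the six operations a role could be granted; the names pageOperations and allGrantCombinations are made up for this illustration.

```typescript
// The six operations checked on the page, taken from the list above.
const pageOperations: string[] = [
  "invoices:view",
  "credit-notes:view",
  "charges:view",
  "payments:view",
  "payments:write",
  "credit-notes:write",
];

// Enumerate every subset of operations a role could be granted.
// Bit i of `mask` decides whether operations[i] is granted.
function allGrantCombinations(operations: string[]): string[][] {
  const combinations: string[][] = [];
  const total = Math.pow(2, operations.length);
  for (let mask = 0; mask < total; mask++) {
    const granted: string[] = [];
    operations.forEach((op, i) => {
      if ((mask & (1 << i)) !== 0) {
        granted.push(op);
      }
    });
    combinations.push(granted);
  }
  return combinations;
}

const scenarios = allGrantCombinations(pageOperations);
console.log(scenarios.length); // 64
```

Each element of scenarios is one set of grants that the page would have to render correctly, from no operations at all to all six.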
Of course, we can make assumptions to reduce the number of scenarios, for example that any role that can do payments:write must also have payments:view. But the number of scenarios is still very high. And the entire application is much more complex than this page. The more operations the application depends on, the more expensive it is to maintain the flexibility.
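To see how little such assumptions help, the sketch below counts the combinations that survive a set of "write implies view" rules. Only the payments rule comes from the text; the credit-notes rule and all names here (ImplicationRule, countValidCombinations and so on) are assumptions added for illustration.

```typescript
const billingPageOperations: string[] = [
  "invoices:view",
  "credit-notes:view",
  "charges:view",
  "payments:view",
  "payments:write",
  "credit-notes:write",
];

// Hypothetical rule: a role granted `ifGranted` must also be granted `thenRequired`.
interface ImplicationRule {
  ifGranted: string;
  thenRequired: string;
}

// Count the grant combinations that satisfy every rule.
function countValidCombinations(operations: string[], rules: ImplicationRule[]): number {
  let valid = 0;
  const total = Math.pow(2, operations.length);
  for (let mask = 0; mask < total; mask++) {
    const granted = operations.filter((_op, i) => (mask & (1 << i)) !== 0);
    const satisfiesAll = rules.every(
      (rule) =>
        granted.indexOf(rule.ifGranted) === -1 ||
        granted.indexOf(rule.thenRequired) !== -1
    );
    if (satisfiesAll) {
      valid++;
    }
  }
  return valid;
}

// The rule mentioned in the text.
const paymentsRule: ImplicationRule = { ifGranted: "payments:write", thenRequired: "payments:view" };
// An additional, assumed rule of the same shape.
const creditNotesRule: ImplicationRule = { ifGranted: "credit-notes:write", thenRequired: "credit-notes:view" };

const unconstrainedCount = countValidCombinations(billingPageOperations, []);
const withPaymentsRule = countValidCombinations(billingPageOperations, [paymentsRule]);
const withBothRules = countValidCombinations(billingPageOperations, [paymentsRule, creditNotesRule]);
console.log(unconstrainedCount, withPaymentsRule, withBothRules); // 64 48 36
```

Each rule removes only a quarter of the remaining scenarios, so even with both rules in place there are still 36 combinations to consider for this single page.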
Thus, the flexibility will break unless we invest significantly in maintaining it.
Another well-known example is feature toggles. A feature toggle gives us the flexibility to switch functionality on and off in running software. However, it also doubles the cost to build and test, because both the on and the off paths must work. The longer we keep toggles alive, the higher the cost. That’s why we want to keep them as short-lived as possible.
So in general, runtime flexibility is very expensive. But why don’t people feel that way?
What is missing?
As I mentioned, people feel that making code changes is expensive and slow, because it requires going through a development process with many steps. Runtime flexibility, on the other hand, requires no process to make a change, so people feel it is cheap and fast. But they normally ignore the risk of making changes without tests.
The triangle of risk, cost, speed
In software development, when the size of a change is fixed, risk, cost and speed are bound together in a triangle, which means:
- We can reduce the risk by increasing the cost. E.g., we have to test all the scenarios to keep the flexibility alive, which is very costly.
- We can reduce the risk by slowing ourselves down. E.g., we can put an approval process in place for every change, which also costs extra to build.
- Risk cannot be reduced without increasing the cost or reducing the speed.
As we can see, the only way to reduce the risk without slowing down is to invest heavily.
Is investing in runtime flexibility worth it?
When the speed of delivery (by code changes) is slow, the business is usually willing to invest in runtime flexibility even when they know it is expensive. They want to keep the flexibility alive so that the business can make a fast change at any time.
But what if we put that investment into improving our development process (Changeability) to reduce the lead time? What if any code change needed only 30 minutes to be safely deployed into production (continuous deployment)? In that case, fast changes can also be achieved, but without the cost of maintaining runtime flexibility.
It is true that investing in the development process is also very expensive, but it is a standard process that can be scaled to all the delivery teams in your organization. It therefore has a higher rate of return than runtime flexibility, which is context-specific and cannot scale.
Summary
Runtime flexibility is sometimes necessary if you don’t have control of the data. But if you do have control, please put the data into the code and only allow it to be changed through the code. That way you minimise the risk of each change and save the cost, which can instead be invested in the changeability of all your teams, helping you survive in the future.
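As a rough sketch of what "putting the data into the code" could look like for the AMS example, the role-to-operation mapping can be a typed constant. The names PageOperation, RoleName, ROLE_OPERATIONS and roleCan are hypothetical, and the set of operations given to billing is an assumption; the text only says that billing can do invoices:view.

```typescript
// The operations are a closed set known to the compiler, so a typo such as
// "invoice:view" is rejected at build time instead of failing in production.
type PageOperation =
  | "invoices:view"
  | "credit-notes:view"
  | "charges:view"
  | "payments:view"
  | "payments:write"
  | "credit-notes:write";

type RoleName = "billing";

// The mapping lives in source code. Changing it is a code change that goes
// through the same review, tests and deployment pipeline as any other change.
// Which operations the billing role holds is an assumption for illustration.
const ROLE_OPERATIONS: { readonly [R in RoleName]: ReadonlyArray<PageOperation> } = {
  billing: ["invoices:view", "credit-notes:view", "charges:view", "payments:view"],
};

function roleCan(role: RoleName, operation: PageOperation): boolean {
  return ROLE_OPERATIONS[role].indexOf(operation) !== -1;
}

const billingCanViewInvoices = roleCan("billing", "invoices:view"); // true
const billingCanWritePayments = roleCan("billing", "payments:write"); // false
```

With the mapping fixed in code, only the combinations that actually exist need to be tested, rather than every combination a runtime change could produce.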