The Illusion of Runtime Flexibility

Aug 22, 2021

Systems that are easy to change are attractive to everyone. Developers try to keep the code clean so that they can make changes quickly. Ops engineers try to simplify the deployment process so that changes can be deployed easily. However, even after decades of improvement to the development process, the business is often still not satisfied with the speed of change. Therefore, many people look for runtime flexibility, which allows the software’s behaviour to be changed without code changes. What they often don’t realise is that this flexibility comes at a huge cost, one that could instead be invested in other opportunities that help the business survive in the future.

Changeability and Runtime Flexibility

There are two ways to change the behaviours of the software:

by changing the code
by changing the live data (without code changes)

Changeability refers to how easily software can be changed by making code changes. It is normally measured by lead time.

Runtime flexibility refers to the ability to change software behaviours without making code changes.

The Illusion of Runtime Flexibility

People normally feel that making code changes is slow and expensive. So it is very attractive to build some flexibility into the software, allowing its behaviour to be changed through live data or configuration. One example is the access management service.

The example: Access Management Service (AMS)

The design

A generic AMS is normally designed with the following entities:

Operations: define the actions, such as account:view
Roles: named groups of operations, such as admin
Resources: define the data to access, such as the account information
Users: define who wants to access the data

An AMS normally supports changing the relationship between roles and operations on the fly, to control dynamically what users can do.
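As a rough illustration, the entities above could be modelled as follows. This is a minimal sketch, not the API of any real AMS: the `AccessManagementService` class, its method names, and the sample ids (`alice`, `acc-1`) are all hypothetical.

```typescript
// Hypothetical in-memory AMS; every name here is an assumption for illustration.
type OperationName = string; // e.g. "account:view"
type RoleName = string;      // e.g. "admin"
type ResourceId = string;    // e.g. an account id
type UserId = string;

class AccessManagementService {
  // role -> operations it grants (the live data that can change at runtime)
  private roleOperations = new Map<RoleName, Set<OperationName>>();
  // "user|resource" -> roles the user holds on that resource
  private assignments = new Map<string, RoleName[]>();

  grant(role: RoleName, operation: OperationName): void {
    const operations = this.roleOperations.get(role) ?? new Set<OperationName>();
    operations.add(operation);
    this.roleOperations.set(role, operations);
  }

  revoke(role: RoleName, operation: OperationName): void {
    this.roleOperations.get(role)?.delete(operation);
  }

  assign(user: UserId, role: RoleName, resource: ResourceId): void {
    const key = `${user}|${resource}`;
    const roles = this.assignments.get(key) ?? [];
    roles.push(role);
    this.assignments.set(key, roles);
  }

  // True when any role the user holds on the resource grants the operation.
  hasPermission(user: UserId, operation: OperationName, resource: ResourceId): boolean {
    const roles = this.assignments.get(`${user}|${resource}`) ?? [];
    return roles.some((role) => this.roleOperations.get(role)?.has(operation) === true);
  }
}

const ams = new AccessManagementService();
ams.grant("admin", "account:view");
ams.assign("alice", "admin", "acc-1");
const before = ams.hasPermission("alice", "account:view", "acc-1"); // true

// Changing the role-to-operation data changes behaviour with no code change.
ams.revoke("admin", "account:view");
const after = ams.hasPermission("alice", "account:view", "acc-1"); // false
```

Note that nothing in `hasPermission` mentions a specific role; the outcome depends entirely on data that can be edited while the application is running.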

In reality, however, it usually leads to application errors. Let’s have a look at why.

How is AMS used?

Let’s say there is a simple page that displays information about clients’ invoices. The page consumes the following data from the AMS:

the account id as a resource
invoices:view as an operation
billing as a role that can do invoices:view
the user who is assigned to the billing role for access to the account

Then on the page, the code checks the operation to decide whether the invoices are displayed or not, like this:

{user.hasPermission("invoices:view", this.accountId)
  ? <InvoiceList accountId={this.accountId} />
  : <Message>no permission</Message>}

Of course, a similar check also exists on the backend API:

- httpApi:
    method: GET
    path: /invoices
    authorizer:
      name: myobIdAuthorizer
      scopes: invoices:view

The code is agnostic to roles, so we can simply remove the operation invoices:view from the billing role to revoke access for every user assigned to that role (this can happen when we want to split a role into two, such as billing into finance and manager). It is so flexible! Yay!

But is it really this simple?

The cost to maintain

The reality is much more complex than the example above. A page normally contains many sections of information that require permission checks for many operations. On a real page, the permission checks look like this:

It requires invoices:view and credit-notes:view to display the sections of the Current Bill and Bill History.
It requires charges:view to display the Bill Details section.
The Pay Now button requires payments:view to check if the invoice has been paid and payments:write to make a new payment.
The Refund button requires credit-notes:write to generate a new credit note and payments:write to make a new reversed payment.

The page requires 6 operations, and each one may or may not be granted, so there are 2^6 = 64 possible combinations of operations on this page. To keep the flexibility of changing the relationship between a role and its operations, we need to make sure the page functions well under every combination. That means considering (and perhaps testing) 64 scenarios for every change made to this page, which is extremely expensive.
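The 64 comes from treating each of the 6 operations as independently granted or not. A small sketch that enumerates the combinations (the `allCombinations` helper is made up for this illustration; the operation names are the ones listed above):

```typescript
// The six operations the page checks, taken from the list above.
const pageOperations: string[] = [
  "invoices:view",
  "credit-notes:view",
  "credit-notes:write",
  "charges:view",
  "payments:view",
  "payments:write",
];

// Hypothetical helper: every subset of operations a role could be granted.
function allCombinations(operations: string[]): string[][] {
  const combinations: string[][] = [];
  // Each bit of `mask` says whether the operation at that index is granted.
  for (let mask = 0; mask < (1 << operations.length); mask++) {
    combinations.push(operations.filter((_, index) => (mask & (1 << index)) !== 0));
  }
  return combinations;
}

const scenarios = allCombinations(pageOperations);
console.log(scenarios.length); // 64
```

Every additional operation the page depends on doubles the length of this list.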

Of course, we can reduce the number of scenarios by making assumptions, for example that a role that can do payments:write must also have payments:view. But the number of scenarios is still very high, and the entire application is much more complex than this page. The more operations the application depends on, the more expensive it is to maintain the flexibility.
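To get a feel for how much such an assumption helps, the sketch below counts only the combinations in which payments:write is accompanied by payments:view. The `prerequisites` rule table and the `countValidCombinations` helper are illustrative assumptions, not part of any AMS.

```typescript
const checkedOperations: string[] = [
  "invoices:view", "credit-notes:view", "credit-notes:write",
  "charges:view", "payments:view", "payments:write",
];

// Assumed rule: holding the key operation requires also holding the value.
const prerequisites: Record<string, string> = {
  "payments:write": "payments:view",
};

// Counts the subsets of `ops` that satisfy every rule in `rules`.
function countValidCombinations(ops: string[], rules: Record<string, string>): number {
  let count = 0;
  for (let mask = 0; mask < (1 << ops.length); mask++) {
    const granted = ops.filter((_, i) => (mask & (1 << i)) !== 0);
    const valid = granted.every((op) => {
      const prerequisite = rules[op];
      return prerequisite === undefined || granted.indexOf(prerequisite) !== -1;
    });
    if (valid) count++;
  }
  return count;
}

console.log(countValidCombinations(checkedOperations, prerequisites)); // 48
```

Adding the same kind of rule for credit-notes:write and credit-notes:view brings the count down to 36, which is still a lot of scenarios for a single page.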

Thus, the flexibility will break unless we keep paying a significant cost to maintain it.

Another well-known example is feature toggles. A feature toggle gives us the flexibility to switch functionality on and off in running software. However, it also doubles the cost to build and test. The longer we keep toggles alive, the higher the cost. That’s why we want to keep them as short-lived as possible.

So in general, runtime flexibility is very expensive. But why don’t people feel that it is?

What is missing?

As I mentioned, people feel that making code changes is expensive and slow, because every change has to go through a development process with many steps. Runtime flexibility, on the other hand, requires no process to make a change, so it feels cheap and fast. But people normally ignore the risk of making changes without tests.

The triangle of risk, cost, speed

In software development, when the size of a change is fixed, risk, cost, and speed are tied together in a triangle:

We can reduce the risk by increasing the cost. For example, we can test all the scenarios to keep the flexibility alive, but the cost of doing so is very high.
We can reduce the risk by slowing ourselves down. For example, we can put an approval process in place for every change, which also costs extra to build.
Risk cannot be reduced without increasing the cost or reducing the speed.

As we can see, either way, reducing the risk means investing a lot.

Is investing in runtime flexibility worth it?

When the speed of delivery (by code changes) is slow, the business is usually willing to invest in runtime flexibility even if it knows that it is expensive. It wants to keep the flexibility alive so that it can make a fast change at any time.

But what if we put that investment into improving our development process (changeability) to reduce the lead time? What if any code change needed only 30 minutes to be safely deployed to production (continuous deployment)? In that case, fast changes could also be achieved, but without the cost of maintaining runtime flexibility.

It is true that investing in the development process is also very expensive, but it is a standard process that can be scaled to all the delivery teams in your organization. It has a higher rate of return than runtime flexibility, which is context-specific and cannot scale.

Summary

Runtime flexibility is sometimes necessary if you don’t control the data. But if you do control it, put it into the code and only allow it to be changed through the code. That way you minimise the risk of each change and save cost, which can be invested in the changeability of all your teams and help you survive in the future.
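As one possible shape for this, the role-to-operation mapping from the AMS example could live in the codebase as a constant, so that every change to it goes through review, tests, and deployment like any other code change. This is a minimal sketch under assumptions: `ROLE_OPERATIONS` and `roleCan` are hypothetical names, and the operations given to admin are invented for illustration.

```typescript
// Role -> operations, fixed at build time. Changing access means changing this file.
const ROLE_OPERATIONS = {
  billing: ["invoices:view"],
  // Assumed for illustration: admin can do everything the page checks.
  admin: [
    "invoices:view", "credit-notes:view", "credit-notes:write",
    "charges:view", "payments:view", "payments:write",
  ],
} as const;

type KnownRole = keyof typeof ROLE_OPERATIONS;

// Hypothetical helper: does the role grant the operation?
function roleCan(role: KnownRole, operation: string): boolean {
  const granted: readonly string[] = ROLE_OPERATIONS[role];
  return granted.indexOf(operation) !== -1;
}

console.log(roleCan("billing", "invoices:view"));      // true
console.log(roleCan("billing", "credit-notes:write")); // false
```

Because the set of roles is known at compile time, a role that does not exist is rejected by the type checker, and the combinations that need testing are limited to the roles actually defined here.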