Legacy code: changing software you don’t understand

Luca Matteis
Sep 14, 2019

Software development in 2019 is a vibrant field, with new patterns, paradigms, languages, abstractions and runtimes being discovered and old ones being challenged every day.

I am happy to be part of an ecosystem where I am constantly inspired to think differently about things and to solve problems in alternative ways.

One major problem that I find myself tackling every day, and that I believe is critical, is changing code you don’t understand. There are entire books, practices and patterns devoted to writing code elegantly and cleanly, like poems that others will be able to read and understand years from now.

“Writing good code” seems to be the only way of fixing this problem.

I’d argue that the difficulty of changing software stems from something deeper and more fundamental about how programming is done.

Let’s take a look at a simple program that takes as input a number x and decides whether it is a multiple of 3:

const x = readInput();
if (isMultipleOfThree(x)) {
  return true;
} else {
  return false;
}

Now let’s imagine we want to change this program to “also check whether it ends with the digit 5”. To do this we can simply change our if statement to include this check (the added change is && endsWithDigitFive(x)):

const x = readInput();
if (isMultipleOfThree(x) && endsWithDigitFive(x)) {
  return true;
} else {
  return false;
}

I think the very fact that we had to make this modification to integrate this change is key to understanding why legacy code is so hard to change.

But this is crazy talk… how can we make changes to a program without doing what we just did? What kind of sorcery am I talking about?

Let’s rewrite the program above using a sort of new “language” with different execution semantics. It looks like this:

const x = sync({ waitFor: 'input' })
if (isMultipleOfThree(x)) {
  sync({ request: 'good', waitFor: 'bad' })
} else {
  sync({ request: 'bad', block: 'good' })
}

When we run this program and feed it an event such as input(6), we get this output:

input(6) // input event
good

and if we feed it a number that isn’t a multiple of 3 we get:

input(7)
bad

Nothing surprising. Let’s try to implement the same change we did earlier to “also check whether it ends with the digit 5”. Instead of changing the code we just wrote, we’ll write a new module that looks like this:

const x = sync({ waitFor: 'input' })
if (endsWithDigitFive(x)) {
  sync({ request: 'good', waitFor: 'bad' })
} else {
  sync({ request: 'bad', block: 'good' })
}

This new module will run in parallel with the other one. Both modules run symmetrically. They both wait for input events. Whenever the sync function is called, the two modules peek at each other’s declarations.

For instance, IF both modules reach this sync call:

if (isMultipleOfThree(x)) {
  sync({ request: 'good', waitFor: 'bad' })
...
if (endsWithDigitFive(x)) {
  sync({ request: 'good', waitFor: 'bad' })

They are both requesting good, hence that’s what the program will output.

IF one of them is in another state, such as when the number ISN’T a multiple of 3 and it ends with 5, they’ll find each other at this sync point:

if (isMultipleOfThree(x)) {
  sync({ request: 'good', waitFor: 'bad' })
} else {
  sync({ request: 'bad', block: 'good' })
}
...
if (endsWithDigitFive(x)) {
  sync({ request: 'good', waitFor: 'bad' })
} else {
  sync({ request: 'bad', block: 'good' })
}

At this point the first module is requesting bad and the other is requesting good. Who will win? Because the first module is also blocking the good event, this makes the bad event win. Hence the program will output bad.
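To make these semantics concrete, here is a minimal sketch of how such a runtime could work. It is an illustration under my own assumptions, not the actual implementation of any library: each module is a JavaScript generator, yield stands in for sync, and the run function, the { name, value } event shape and the two helper predicates are all hypothetical names introduced for this example.

```javascript
// Assumed helper implementations; the article only names these functions.
const isMultipleOfThree = (x) => x % 3 === 0;
const endsWithDigitFive = (x) => Math.abs(x) % 10 === 5;

// Each module is a generator; `yield` plays the role of sync().
function* multipleOfThreeModule() {
  const x = yield { waitFor: 'input' };
  if (isMultipleOfThree(x)) {
    yield { request: 'good', waitFor: 'bad' };
  } else {
    yield { request: 'bad', block: 'good' };
  }
}

function* endsWithFiveModule() {
  const x = yield { waitFor: 'input' };
  if (endsWithDigitFive(x)) {
    yield { request: 'good', waitFor: 'bad' };
  } else {
    yield { request: 'bad', block: 'good' };
  }
}

// Hypothetical scheduler: returns the names of the selected events, in order.
function run(bThreads, externalEvents) {
  const trace = [];
  const pending = [...externalEvents];
  // Start every module and record what it declares at its first sync point.
  let live = bThreads
    .map((makeThread) => {
      const thread = makeThread();
      return { thread, step: thread.next() };
    })
    .filter(({ step }) => !step.done);

  while (live.length > 0) {
    // Look at every module's declaration at its current sync point.
    const blocked = new Set(live.map(({ step }) => step.value.block).filter(Boolean));
    const requested = live
      .map(({ step }) => step.value.request)
      .filter((name) => name && !blocked.has(name));

    // Pick a requested event nobody blocks; otherwise take the next external event.
    // Simplification: external events are never blocked in this sketch.
    let event;
    if (requested.length > 0) {
      event = { name: requested[0] };
    } else if (pending.length > 0) {
      event = pending.shift();
    } else {
      break; // nothing can happen any more
    }
    trace.push(event.name);

    // Resume only the modules that requested or waited for the selected event.
    live = live
      .map((entry) => {
        const { request, waitFor } = entry.step.value;
        return request === event.name || waitFor === event.name
          ? { thread: entry.thread, step: entry.thread.next(event.value) }
          : entry;
      })
      .filter(({ step }) => !step.done);
  }
  return trace;
}

const modules = [multipleOfThreeModule, endsWithFiveModule];
console.log(run(modules, [{ name: 'input', value: 15 }]).join(', ')); // input, good
console.log(run(modules, [{ name: 'input', value: 6 }]).join(', '));  // input, bad
```

Note that neither generator mentions the other: the "and" between the two checks emerges from the scheduler honoring the block declaration, not from a line of code that combines them.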

Integrating changes is where complexity lies

You might be asking: what’s the point of programming this way using these sync calls, and these request/waitFor/block events?

Intuitively we just introduced a change to a program, albeit a simple one, without having to do any integration work.

Whereas before we had to write integration logic such as && endsWithDigitFive(x) in order for our change to work, in this new system we simply had to create a new module that did exactly what we intended. Both modules could be swapped out without them knowing of each other and without having to do any integration work.

This is a huge deal.

You might ask: but even with this new system we’ll eventually have to modify and refactor existing modules based on the new change.

Indeed, but the change will be about enriching modules with semantics that allow them to collaborate better as a whole (such as waiting for or blocking new events) rather than having to integrate or glue together parts of the modules to make them aware of how other modules work. The key difference is: there is no contact point between modules. They are always oblivious of each other.

But my pure functions are also oblivious of each other

Pure functions are just input→output, and in this context they are also written in a way that makes them unaware of each other.

For instance let’s look at a simple data-transformation operation using pipe:

pipe(
  getName,
  uppercase,
  get6Characters,
  reverse
)({ name: 'Buckethead' })
// 'TEKCUB'
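For readers who want to run the snippet above, here is a minimal sketch of what pipe and the four helpers might look like. The implementations are my assumptions (the snippet only names them); utility libraries ship their own versions of pipe with similar behavior.

```javascript
// Left-to-right composition: pipe(f, g)(x) is g(f(x)).
const pipe = (...fns) => (input) => fns.reduce((value, fn) => fn(value), input);

// Assumed helper implementations, inferred from their names.
const getName = (person) => person.name;
const uppercase = (text) => text.toUpperCase();
const get6Characters = (text) => text.slice(0, 6);
const reverse = (text) => [...text].reverse().join('');

const result = pipe(
  getName,
  uppercase,
  get6Characters,
  reverse
)({ name: 'Buckethead' });

console.log(result); // TEKCUB
```

Each helper is indeed unaware of the others, but the pipe(...) call is the one place that knows about all of them and about their order.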

The problem is that these functions still have a point of contact: the point where they’re used (aka the point of integration).

The difference is subtle but in my opinion crucial to understanding why the problem of integration will continue haunting developers for years to come.

Let’s make this a little more concrete and discuss a change to the flow above: “reverse the name before getting the first 6 chars”. Obviously this is yet again a simple change. But what if more changes follow, such as “the uppercase should only happen if the name is capitalized” and “reverse should only be done after successfully getting data from an API”?

Things are getting a bit more hairy and complicated and yet only resemble a tiny and minimal version of the requirements that usually come up in real-world scenarios.

By not having points of contact these requirements seem less intimidating to implement: for instance, a new module could be swapped in to block the reverse operation until the API successfully responds, without modifying existing code.

This new method of executing programs is actually something that exists and is called Behavioral Programming.

We can enhance or refine a system by simply adding modules, similarly to how one can enhance a requirements document by adding clarifications, refinements and exceptions in the form of new sentences in the body of the document or as independent appendices and footnotes.

As goals are refined and requirements added to a program, or when bugs appear, rather than enhancing and often complicating existing modules, we strive to add new modules that precisely address the difference, or the gap, between the goals and what the existing system accomplishes.

Modules can be thought of as physical servers in a rack that can be easily swapped out and back in, rather than lego pieces that might crumble or complicate existing structures.

Changing software you don’t understand

But how does all this help with the infamous question of changing legacy code?

Intuitively, a program written this way allows us to observe specific traces and swap modules in and out to implement a change without having to deeply understand the structure of the program, because the changes don’t depend on the structure but on the combined behavior of the modules.

For instance in a complex legacy program we might need to implement a requirement:

Given the user doesn't have a promo code
When the user adds an item to the shopping cart
And it is the first Monday of the month
Then they should not be able to add more than 3 items to the cart

In the common integration-style way we’d have to alter and somewhat complicate the modules that are responsible for these changes.

In this new Behavioral Programming style we can instead map these changes quite naturally to new modules that can be swapped into the program without touching or even seeing how the system works.

Which brings us to a new point: programming this way is more aligned with requirements:

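// Given the user doesn't have a promo code
const promoCode = sync({ waitFor: 'promoCode' })
if (promoCode) return;
// When the user adds an item to the shopping cart
sync({ waitFor: 'itemAddedToCart' })
// And it is the first Monday of the month
if (isFirstMondayOfMonth()) {
  sync({ waitFor: 'itemAddedToCart' })
  sync({ waitFor: 'itemAddedToCart' })
  // only 3 items max: block any further additions
  sync({ block: 'itemAddedToCart' })
}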
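As a rough, runnable sketch of how this requirement could play out, the example below runs the cart-limit module next to a simulated shopper. Everything beyond the event names is an assumption made for illustration: the run scheduler, the shopper module, the optional value attached to a request, and passing isFirstMondayOfMonth in as a parameter so that both outcomes can be exercised.

```javascript
// Tiny illustrative scheduler (an assumption, not a real library API).
// It repeatedly selects the first requested event that no module blocks.
function run(bThreads) {
  const trace = [];
  let live = bThreads
    .map((makeThread) => {
      const thread = makeThread();
      return { thread, step: thread.next() };
    })
    .filter(({ step }) => !step.done);

  for (;;) {
    const blocked = new Set(live.map(({ step }) => step.value.block).filter(Boolean));
    const chosen = live.find(
      ({ step }) => step.value.request && !blocked.has(step.value.request)
    );
    if (!chosen) return trace; // nothing left that can be selected
    const { request: name, value } = chosen.step.value;
    trace.push(name);
    // Resume the modules that requested or waited for the selected event.
    live = live
      .map((entry) => {
        const { request, waitFor } = entry.step.value;
        return request === name || waitFor === name
          ? { thread: entry.thread, step: entry.thread.next(value) }
          : entry;
      })
      .filter(({ step }) => !step.done);
  }
}

// Hypothetical stand-in for the existing system: a shopper adding 5 items.
const shopper = (promoCode) => function* () {
  yield { request: 'promoCode', value: promoCode };
  for (let i = 0; i < 5; i++) {
    yield { request: 'itemAddedToCart' };
  }
};

// The new requirement, written as its own module.
const cartLimit = (isFirstMondayOfMonth) => function* () {
  const promoCode = yield { waitFor: 'promoCode' };
  if (promoCode) return;
  yield { waitFor: 'itemAddedToCart' };
  if (isFirstMondayOfMonth()) {
    yield { waitFor: 'itemAddedToCart' };
    yield { waitFor: 'itemAddedToCart' };
    yield { block: 'itemAddedToCart' }; // only 3 items max
  }
};

const itemsAdded = (trace) =>
  trace.filter((name) => name === 'itemAddedToCart').length;

console.log(itemsAdded(run([shopper(null), cartLimit(() => true)])));     // 3
console.log(itemsAdded(run([shopper(null), cartLimit(() => false)])));    // 5
console.log(itemsAdded(run([shopper('SAVE10'), cartLimit(() => true)]))); // 5
```

The shopper module is never edited: the limit only takes effect because the new module blocks the event once its conditions are met.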

I hope to have shown you a different way of thinking about changing software.

Obviously this does not mean that by programming this way we do not have to think about good software development practices. But I do believe it makes it easier and less daunting to make changes to complex systems: you can swap out and back in new modules based on the changes needed rather than having to modify crucial points of integration within the structure of the program.

If you’re interested in learning more on how to program this way using these modules (formally called b-threads) please read my other article on the subject: B-threads: programming in a way that allows for easier changes.
