What is technical debt? And how to talk about it?

This is a transcript of my eponymous presentation at Confoo 2021.

Title slide

The metaphor

So first off, the term “technical debt” is a metaphor. It’s a figure of speech, a thing representing something abstract. So what is “technical debt” representing? Here’s what Wikipedia says:

The origin of the Metaphor

Now, let’s see what the person who coined the term “technical debt” has to say about “technical debt” and “agility”. That person is Ward Cunningham. And interestingly enough, Mr Cunningham co-authored the Agile Manifesto. He says:

We ❤️ agile ways of working

So what we learned here is that we want to deliver code sooner. We want to be agile. We also learned that agility comes with an explicit cost. As long as we build to learn, actually learn, and continuously consolidate that learning in our software, we fulfill the promise of the Agile Manifesto. If we only do the building, without the learning or the consolidation, what we get is “technical debt”, and this technical debt will make it harder and harder to build. That is not what the authors of the Agile Manifesto had in mind: “Agile processes promote sustainable development. The [whole team] should be able to maintain a constant pace indefinitely.”

Move fast, but…

So, we, as an industry, choose to deliver sooner to learn sooner, over delivering later for… for what? Increased confidence? Confidence about what? Safety? When we say “going faster”, there’s the thrill, right? But there is also the fear. We have the intuition that there is a change of balance in how we assess the risks we take. Like the risk of breaking things.

Mark Zuckerberg, standing in front of a screen saying “Move fast and break things”.

Move fast, and… The DevOps answer

For 6 years, with data from over 31 thousand professionals worldwide, the State of DevOps research program, the longest-running investigation of its kind, identified the most effective and efficient ways to develop and deliver software. And here’s what they found:

Mark Zuckerberg, in 2014, standing in front of a screen saying “Move fast with stable infra”. Credits: Mike Isaac

A better definition of Technical Debt

Technical debt, beyond a mess, is the byproduct of delivering sooner, to learn sooner, if we don’t consolidate that learning back into what we build.

A graph showing value over time. Simple interest grows value linearly over time. Compounded interest grows value exponentially over time.
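To put numbers on that graph, here is a minimal sketch in Python. The function names, the principal of 100 and the 10% rate are assumptions picked for illustration; they are not taken from the slide.

```python
def simple_interest(principal: float, rate: float, periods: int) -> float:
    """Value after `periods` when interest is paid on the principal only."""
    return principal * (1 + rate * periods)


def compound_interest(principal: float, rate: float, periods: int) -> float:
    """Value after `periods` when interest is also paid on past interest."""
    return principal * (1 + rate) ** periods


# Hypothetical numbers: a principal of 100 at 10% per period.
for periods in (1, 5, 10, 20):
    print(
        periods,
        round(simple_interest(100, 0.10, periods), 2),
        round(compound_interest(100, 0.10, periods), 2),
    )
```

With these numbers, after 10 periods simple interest gives 200 while compound interest gives roughly 259, and the gap keeps widening with every period.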

…but why are we taking on debt?

So now we have a better understanding of what the technical debt metaphor represents. But we don’t know why, despite our best efforts, we are involuntarily taking on debt.

Process Performance, a system thinking view

The actual performance of any process depends on two factors: the amount of time spent working and the capability of the process used to do that work.
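One way to read this is as a simple product. The sketch below assumes that performance is the time spent working multiplied by the capability of the process, and that the performance gap is desired performance minus actual performance. The function names and the numbers are hypothetical.

```python
def actual_performance(hours_working: float, capability: float) -> float:
    """Output of the process, assuming output = time spent working x capability."""
    return hours_working * capability


def performance_gap(desired: float, actual: float) -> float:
    """Shortfall between what is expected and what is delivered."""
    return desired - actual


# Hypothetical team: 40 hours of work at 2 units of output per hour.
actual = actual_performance(hours_working=40, capability=2.0)
print(actual, performance_gap(desired=100, actual=actual))
```

Under that assumption there are only two ways to close a gap: raise the hours, or raise the capability.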

Working Harder

A performance gap puts everyone under pressure to perform. That pressure can be explicit, like KPIs or velocity targets for instance, or implicit, like increased micro-management or a slight shift in culture. That pressure incentivizes teams to spend more time and energy doing work. And an increase in effort also increases the performance of the process, and closes the performance gap.

Working Smarter

Here we respond to a performance shortfall by increasing the pressure on people to improve the capability itself. We might kick off a new improvement project, or increase training. If successful, this investment will, with time, yield an improvement in process capability, increase throughput, and close the performance gap.

Limitations

From this vantage point, we all see that it’s better to “work smarter” than to “work harder”: an hour spent working produces an extra hour’s worth of output, while an hour spent on improvement may improve the productivity of every subsequent hour dedicated to production. Yet despite its obvious benefits, working smarter does have limitations.
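That comparison can be written as a small break-even calculation. This is a sketch under stated assumptions: capability is measured in units of output per hour, and one hour of improvement raises capability by a fixed `gain` for every remaining hour of production. The names and numbers are hypothetical.

```python
def payoff_of_working(capability: float) -> float:
    """Extra output from one more hour of work: one hour's worth of output."""
    return capability


def payoff_of_improving(gain: float, remaining_hours: float) -> float:
    """Extra output from one hour of improvement, spread over all later hours."""
    return gain * remaining_hours


def break_even_hours(capability: float, gain: float) -> float:
    """Production hours needed before the improvement hour pays for itself."""
    return capability / gain


# Hypothetical numbers: 10 units per hour, and 0.5 units per hour gained
# for each hour invested in improvement.
print(payoff_of_working(10), payoff_of_improving(0.5, 40), break_even_hours(10, 0.5))
```

With these numbers the improvement hour only breaks even after 20 hours of production, which is exactly why it is tempting to skip it when the deadline is closer than that.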

Reinvestment Loop

That connection exists because teams rarely have excess capacity. Increasing the pressure to do work first leads people to spend less time on non-work-related activities: they use the “work harder” loop. There are, however, obvious limits to this. After a while, one cannot continue to work harder. If the performance gap continues to widen, teams have no choice but to reduce the time they spend on improvement as they strive to meet expectations. This connection between “pressure to do work” and “time spent on improvement” creates the “Reinvestment loop”.

Shortcuts Loop

As we saw already, cutting investments in maintenance and improvement in favour of “working harder” erodes our capability, and hurts performance. But capability doesn’t drop right away. It takes time for our capability to decay. In the meantime, the decision to skimp on improvements boosts the time available to get work done right now.

System response

To illustrate these dynamics, let’s look at two different use cases and see how the process reacts to “working harder” versus “working smarter”. Both use cases begin in the same equilibrium state. Now, let’s increase our expectations!
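These dynamics can be sketched with a toy simulation. It is not the model behind the slides, only an illustration under assumptions of my own: total hours per week are fixed, weekly output is working hours times capability, capability decays by 5% per week, and each improvement hour adds 0.125 to capability. The “working harder” case is modelled here as taking the shortcut, that is, moving all improvement hours into production. All names and parameters are hypothetical.

```python
def simulate(improvement_hours, weeks=30, total_hours=40.0,
             capability=10.0, decay=0.05, gain=0.125):
    """Return the weekly output of a team spending a fixed number of hours on improvement."""
    outputs = []
    for _ in range(weeks):
        working_hours = total_hours - improvement_hours
        outputs.append(working_hours * capability)
        # Capability erodes a little every week and grows with improvement time.
        capability += gain * improvement_hours - decay * capability
    return outputs


baseline = simulate(improvement_hours=4)   # equilibrium: decay and improvement cancel out
shortcuts = simulate(improvement_hours=0)  # all the time goes to doing work
smarter = simulate(improvement_hours=8)    # more of the time goes to improvement

for week in (0, 3, 10, 29):
    print(week, round(baseline[week]), round(shortcuts[week]), round(smarter[week]))
```

Under these assumptions, skipping improvement delivers more in the first weeks and less afterwards, while investing in improvement delivers less at first and more later.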

The Capability Trap

The interaction between the “shortcuts loop” and the “reinvestment loop” creates the “Capability Trap”.

The Fundamental Attribution Error

We generally assume that cause and effect are closely related in time and space: to explain a surprising event we look for another recent and nearby event that might have triggered it.

Self-Confirming Attribution Error

Managers cannot observe all the activities of their teams, so they cannot easily determine how much of an increase in performance is due to “working harder” versus taking shortcuts. As a result, managers might overestimate their impact when increasing the “desired performance”, and are not aware of the trade-off they’re incentivizing their teams to make.

So what?

First, the most important implication of this research is that our experience often teaches us exactly the wrong lessons about how to maintain and improve the long-term health of our systems. This means that successfully reversing negative dynamics involves a significant mindset shift for both leaders and teams.

Visibility

We’ve seen that, due to a lack of visibility, managers can push their teams into the capability trap, by not realizing that their teams have started to skimp on improvement work under the high pressure to work harder. And because “you can’t have your cake, and eat it too”, that’s a problem managers and leadership need to deal with, or it will become someone else’s problem.

Flow Framework

For our needs, the Flow Framework has a solution. Here’s how Gene Kim describes it:

Flow Items

Each of these 4 items is a unit of business value, pulled by a stakeholder, through the software delivery process. Here’s the list.

  • Defects are quality problems that affect the customer experience. Work here is delivering external quality. This work is pulled by customers, and it looks like bugs, problems, and incidents.
  • Risks are security, regulation, and compliance exposures. Risks are not visible to customers. That is, not until it’s too late. Work delivers security, governance, and compliance. The work is pulled by a Risk Management Officer, and it looks like vulnerabilities, regulatory and contractual requirements, and internal compliance.
  • Debts are anything that reduces the ability to modify or maintain our software in the future. Work delivers removal of impediments to future delivery. It is pulled by architects, and looks like API additions, refactoring, process change and automation, or changes of architecture.
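To make these items visible, one option is to tag every piece of completed work with its flow item and look at the mix. The sketch below is a minimal illustration, not an implementation of any tool: the `FlowItem` enum and the `flow_distribution` function are hypothetical names, the sample data is made up, and `FEATURE` is included as the fourth flow item of the Flow Framework.

```python
from collections import Counter
from enum import Enum


class FlowItem(Enum):
    FEATURE = "feature"
    DEFECT = "defect"
    RISK = "risk"
    DEBT = "debt"


def flow_distribution(completed):
    """Share of completed work per flow item, as fractions that sum to 1."""
    counts = Counter(completed)
    total = sum(counts.values())
    return {item: (counts[item] / total if total else 0.0) for item in FlowItem}


# Made-up sample: ten work items completed during one iteration.
completed = [FlowItem.FEATURE] * 6 + [FlowItem.DEFECT] * 3 + [FlowItem.DEBT] * 1
for item, share in flow_distribution(completed).items():
    print(item.value, f"{share:.0%}")
```

In this made-up sample, no risk work was completed at all, which is the kind of thing that stays invisible until someone asks for the breakdown.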

A better metaphor to communicate: “Technical Delta”

Thinking in terms of a technical gap, or technical delta, can help us make that point. A technical gap, like the performance gap, is created by a discrepancy between what we have and what we want to have, between our current technical capabilities and the desired technical capabilities required to enact the business strategy.
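As a rough illustration, a technical delta can be written down as the difference between two lists of capabilities. This is only a sketch: the function name and the capability names are made up for the example.

```python
def technical_delta(current: set[str], desired: set[str]) -> set[str]:
    """Capabilities the strategy requires that we do not have yet."""
    return desired - current


# Made-up capabilities, for illustration only.
current = {"continuous integration", "unit tests"}
desired = {"continuous integration", "unit tests", "zero-downtime deployments"}
print(technical_delta(current, desired))
```

Framed this way, the conversation is about what the business strategy needs next, rather than about a mess someone left behind.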

Summary

So, let’s summarize:

  • Technical debt stems from the lack of visibility and awareness that managers have of the actual work being done (and not done). And using that metaphor doesn’t really help to make that work more transparent. So managers, stop blaming the developers.
  • We’ve also learned that the “fundamental attribution error” is pervasive, and that the best way to improve, is to improve the system. So developers, stop blaming the managers.
  • Managers: account for risks and debts. If you don’t talk about them, assume that no one does, and that nothing is getting done about them. And that is a problem.


50% system manager at Akeneo / 50% endurance cyclist. Will train for food and burn it for adventures.
