Strategic Technical Debt
This is an opinion piece, on a matter about which other people hold their own strong opinions. I would suggest also reading Ward Cunningham’s explanation of his own interpretation of the “technical debt” metaphor and Ron Jeffries’ criticism of the metaphor and its implication that there are valid reasons for temporarily cutting corners in software development. Obviously, I feel differently about this from Ron, but I also feel it’s often wiser to read a few contradictory perspectives on a matter and work out what applies to your own situation than it is to limit yourself to whichever perspective you first come across.
What Is Technical Debt?
Any decision made in the course of building or maintaining a system which yields some near-term benefit at the cost of:
- leaving work which still needs to be done eventually (the “principal”)
- incurring an additional upkeep cost until the deferred work has been done (the “interest”)
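As a rough illustration of those two components, here is a minimal sketch in Python. The `DebtItem` class, its field names, and the hour figures are all hypothetical, and the "interest" is modelled as a flat weekly cost purely for simplicity. Real technical debt rarely comes with numbers this tidy.

```python
from dataclasses import dataclass


@dataclass
class DebtItem:
    """One deferred piece of work (hypothetical model, for illustration only)."""
    description: str
    principal_hours: float          # work that still has to be done eventually
    interest_hours_per_week: float  # extra upkeep paid while the work stays deferred

    def total_cost(self, weeks_deferred: float) -> float:
        """Hours spent overall if the work is finally done after `weeks_deferred` weeks."""
        return self.principal_hours + self.interest_hours_per_week * weeks_deferred


# Assumed numbers: skipping deploy automation defers 40 hours of work,
# but manual deploys cost an extra 2 hours every week until it is built.
item = DebtItem("manual deploys", principal_hours=40.0, interest_hours_per_week=2.0)
print(item.total_cost(weeks_deferred=4))   # 48.0
print(item.total_cost(weeks_deferred=52))  # 144.0
```

The principal is the same in both cases. What changes with time is how much interest has been paid on top of it.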
How Could That Possibly Be a Good Thing?
One of the things I think is responsible for the sharp splits in opinions about whether there is such a thing as “good” technical debt is that some of us have experienced it primarily in contexts where it is taken on because of ignorance or carelessness while others have seen it more often in contexts where it is employed intentionally as part of a work-prioritization strategy.
Ward Cunningham’s assessment seems to be oriented more to the “prudent” side of this quadrant (on the idea that everything which was done either was a good idea at the time or at least seemed like a good idea at the time), while Ron Jeffries’ assessment seems to be oriented more to the “reckless” side of the quadrant (on the idea that technical debt seems to disproportionately come from “cowboys” who have little concern for structure or quality).
I’ve seen all of the above, and have been responsible for instances of all of the above at one point or another.
Going on the “debt” metaphor, though, let’s apply that same quadrant to financial decisions:
Not exactly the same thing (all non-trivial abstractions, this one included, leak at least a little), but it does illustrate that the metaphor works, in that debt can be accumulated because:
- not taking the debt on has an opportunity cost which outweighs the cost of the interest on the debt (deliberate/prudent)
- shit happens, sometimes without warning, and the debt is either a direct result of this or the best path forward — but is now one more thing that you need to take care of (inadvertent/prudent)
- the debt is convenient for now, and you can’t be bothered with thinking about the future (deliberate/reckless)
- the debt is the result of not paying attention to or fully understanding the consequences of your present decisions (inadvertent/reckless)
To Ron’s point, a lot of the technical debt in most systems comes from reckless actions — making it the technical equivalent of taking on an irregular mortgage assuming that the housing market can only go up or maxing out credit cards to buy things you can’t afford and could easily do without.
But there’s also technical debt that’s a result of things which simply weren’t known at the time a decision was made (like a system having a design which is a good fit for the original intended use case — but a poor fit for the emergent use case it ended up being applied to). And technical debt which isn’t much of a choice because there are even bigger problems which need to be solved right now (an ugly patch to temporarily fix a production scaling problem can be a lot like putting a car repair or medical bill on a credit card because you didn’t think you were going to run into that expense so soon and didn’t have sufficient liquid resources to take care of it properly when it happened). These are reasonable situations which reasonable people get into, and fit into a general strategy of “accept and pay down” where there’s not much to do other than to recognize that it is impractical to completely avoid such debts and make sure you have enough resource to spare in the long run to pay them down before the costs associated with them snowball to an unmanageable level.
There is also technical debt which is incurred while paying off some other, even more expensive technical debt. The next time you see a refactoring effort which goes through a few also-less-than-ideal iterations, think of it as “technical refinancing” where the new technical debts are (hopefully) at least fewer in number or lower in cost than the ones for whose elimination they were incurred.
Finally, there’s the kind of technical debt that I’m talking about for the rest of this piece: debt you knowingly take on because it lets you move faster or sooner on something where the delay would otherwise carry a huge opportunity cost.
As an important note, which some of you may already have been saying to yourself while reading this, it’s easy to be wrong about what kind of debt you’re taking on. Sometimes a “prudent” action still fails to work out — you could go to university and have trouble finding work in your field of study, you could take on a personal loan to launch a small business and find that the market you were counting on didn’t really exist, and you could charge those car repairs and find that the income you were counting on for the next month didn’t come through after all. The good news, with respect to technical debt, is that it’s easier to “default” on in such cases than financial debt. More on that later.
How Does This Work?
When thinking about intentionally incurring technical debt, the following four questions are what I see as key to determining whether that debt would be “reckless” or “prudent” debt:
What is the Opportunity Cost?
For technical debt to be worth taking on, there must be something of value which you can get by incurring it.
Taking a $10K cash advance and putting it in a shoebox wouldn’t make much sense. Taking that $10K and putting it into starting a hosting company which could bootstrap your way into being the founder of a successful cloud provider at least makes the risks and costs something worth considering. While I don’t believe any credit-card cash advances were involved, I do know someone who essentially did just that.
At DigitalOcean, later on, the company would use credit lines to fund hardware build-outs. The alternatives were to artificially slow its rate of growth so it could wait for the old servers to pay for the new (rather than buying the new servers and letting them pay for themselves in short order), or to dilute equity which was rising in value at a rate much faster than the equivalent interest cost on the credit lines.
Not taking on that debt would be very expensive. So the remaining questions became relevant.
On a technical level, the equivalent is a situation where you know that if you try to do everything according to “best practices” you’ll:
- be late to market — weakening your competitive position,
- delay the delivery of some real business value (like cost savings from introducing a better inventory-planning or delivery-optimization system),
- …or burn through your project’s or company’s funding before you get anything working at all.
In these cases, technical debt is at least worth considering because there are real costs in avoiding it.
What is the Interest Rate?
Okay, so you know that there’s a big opportunity that you can only take advantage of if you also take on some debt…
…what is the debt going to cost?
There are a lot of things that are worth financing at a 3% APR which wouldn’t be at a 20% APR. Or even a 6% APR.
For instance, a house is often worth buying at a 3% APR, if you are confident that you’ll be staying in one place for a while, because that interest rate is:
- not far off from the average rate of appreciation for real estate over long time scales
- tax-deductible, making it effectively even lower
- an enabler for applying some portion of what you used to spend on rent to what amounts to savings in the form of home equity
Buying a house at typical credit-card interest rates, however, would be insane — the overall cost would outweigh the costs of simply renting, while losing the relative flexibility of being a renter.
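To put rough numbers on that comparison, the sketch below applies the standard fixed-rate amortization formula to the three rates mentioned above. The loan amount and 30-year term are hypothetical, and the calculation ignores the tax deduction, appreciation, and rent savings discussed in the list, so treat it as a back-of-the-envelope illustration rather than financial advice.

```python
def monthly_payment(principal: float, apr: float, years: int) -> float:
    """Fixed-rate amortized payment, assuming monthly compounding."""
    r = apr / 12    # monthly interest rate
    n = years * 12  # number of monthly payments
    if r == 0:
        return principal / n
    return principal * r / (1 - (1 + r) ** -n)


loan = 300_000  # hypothetical loan amount
for apr in (0.03, 0.06, 0.20):
    pay = monthly_payment(loan, apr, years=30)
    total_interest = pay * 30 * 12 - loan
    print(f"{apr:.0%} APR: {pay:,.0f} per month, {total_interest:,.0f} total interest")
```

With these assumed inputs the payment comes to roughly 1,265 a month at 3%, roughly 1,799 at 6%, and a little over 5,000 at 20%. At the highest rate nearly every payment goes to interest, which is the sense in which the same purchase stops being worth financing.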
Technical debt, unfortunately, rarely has a nice “interest rate” number stamped on it to tell you exactly how many person-hours it’ll consume in a given time period. But you can get a feel over time for which shortcuts come at a small cost and which come at a dear one. I’d hand-wave it as follows:
- Bad architecture is like payday-loan debt. Avoid it if at all possible, because the costs of fixing bad architecture are insanely high. This goes double if you’re talking about a 24/7 service with active users where you can’t “fix” the bad architecture by building a nicer version of the product from scratch and then replacing the old one wholesale.
- Bad documentation is like credit-card debt. It’s okay in some circumstances, but should be paid off soon after it is incurred or else the costs associated with it will compound and make it hard to catch up with. If you pay it down soon enough (while the project is still fresh in your mind and you haven’t had to on-board any new maintainers) you might be able to get away without paying any “interest” at all.
- Bad tests are like personal loans. Having tests which are both robust in their coverage and automated such that they can be easily run is cheaper over the long run than having only spotty test coverage or only manual testing. But ad-hoc manual testing can work surprisingly well where necessary to defer the costs of actually writing a full battery of automated tests — with the downside of adding to the cost of applying any update to the code. Unlike sparse or bad documentation, there’s rarely any “free” period where not automating tests has zero cost — you will probably find bugs once you get those tests together which would have been cheaper to fix if you wrote those tests right before or right after the corresponding product code.
- Bad code is like a car loan. The costs of ugly, un-refactored code are real but often less than the costs of having no tests, shoddy tests, or only manual testing. Having ugly code does slow down maintenance and troubleshooting efforts, but often not as badly as missing/inaccurate documentation or a test suite that no one really trusts to validate that a change can sanely be deployed to production.
- Deferred “features” which are really developer conveniences are like a mortgage. It’s good to check these things off and pay down this “debt” over time, but the costs are low and the opportunity cost of not releasing a product until you have beautifully-convenient development and deployment tools can be quite high. These items are often best addressed after feature work and avoiding/repaying the above forms of tech debt, and after applying this line of thought as a sanity check:
There are probably a lot of people who would take issue with my feeling that documentation debt is worse (unless paid off very quickly) than testing debt or code debt. My take on the matter is that good documentation helps with the writing of good tests and good code, and that bad documentation can drive a misunderstanding of the system which leads to fundamentally-flawed design decisions in a way that substandard tests and code cannot. That documentation can also be the guidepost for making sure testing and coding debts don’t get buried and forgotten: it’s easier to pay off the testing debts when the docs are in good shape, just as it’s easier to pay off the coding debts when the tests are in good shape.
What are the Risks?
Okay, so the opportunity looks good and the costs seem manageable.
What could go wrong with the opportunity, leaving you with nothing to show for the debt other than the costs associated with it?
What could go wrong with the costs which might make that debt more expensive than you’re expecting it to be?
How likely are these things? Are they things you’d be able to see coming in advance? Is there anything you can do to manage or hedge against them?
Examples of opportunity risks:
- The project fails to reach completion anyway, due to some other technical problem
- The project fails to reach completion anyway, due to a retraction of support from the organization (politics, changing priorities, etc.)
- The project is successful, but the resulting product or system fails to show the expected value (the market didn’t buy into it, the expected cost savings didn’t materialize, etc.)
- The project is successful, and generates revenue or cuts costs — but the revenue ends up being cannibalized off of an existing product or new costs show up elsewhere in the system as an indirect result of the changes which were made
Examples of cost risks:
- The resource expected to be available for paying off short-term technical debts fails to materialize — turning them into long-term technical debts
- The team needs to increase its rate of growth and on-board a lot of new developers, who will have a harder time working with the system while those debts are unpaid
- Deferring test automation allows a critical flaw in the system to go unnoticed because the manual ad-hoc tests were not reviewed as rigorously as automated tests would be, causing expensive-to-fix problems shortly before or even after the software is released
- Careful design is deferred by creating a temporary system which is meant to be replaced on the “plan to throw one away” principle — and the “temporary” system ends up becoming a permanent one
As far as risk-management strategies go, one of the simplest is to limit how much outstanding technical debt can be incurred before some of it has to be paid off. In practice, there’s an inherent limit at the point where you can’t get anything else done because you’re spending all of your time dealing with technical-debt-related maintenance headaches. But ideally you can account (formally or informally) for how much time is being “wasted” on technical-debt related work or slowdowns — and use that as a gauge for when the team needs to have time explicitly allocated for documentation, test automation, code refactoring, or whatever else is needed to bring those costs back down to an acceptable level.
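One informal way to keep that gauge is sketched below. The function names, the 25% threshold, the three-period window, and the sprint figures are all assumptions made up for illustration. The right limit for a given team is a judgment call, not something this code can decide.

```python
def debt_overhead(hours_on_debt: float, hours_total: float) -> float:
    """Fraction of a period's effort lost to technical-debt-related work."""
    if hours_total <= 0:
        raise ValueError("hours_total must be positive")
    return hours_on_debt / hours_total


def needs_paydown(history: list[float], threshold: float = 0.25, window: int = 3) -> bool:
    """True when the average overhead over the last `window` periods exceeds `threshold`."""
    recent = history[-window:]
    return bool(recent) and sum(recent) / len(recent) > threshold


# Hypothetical sprint log: (hours lost to debt-related work, total hours worked)
sprints = [(20, 200), (35, 200), (50, 200), (60, 200), (70, 200)]
overheads = [debt_overhead(d, t) for d, t in sprints]

print(needs_paydown(overheads[:3]))  # False: early sprints average under the limit
print(needs_paydown(overheads))      # True: the last three sprints average about 30%
```

Averaging over a few periods rather than reacting to a single bad sprint is a design choice here. It keeps one unusually rough week from triggering a cleanup cycle on its own.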
What Happens in the Case of Default?
This is one of the fun points about technical debt (as opposed to the financial variety).
Most of the time, if the opportunity side of the equation doesn’t work out, the debt can be cancelled without repayment — and no one is going to have a problem with this.
If you incur a lot of technical debt early on in a project which is then cancelled because of a priority shift or poor reception in early market testing, you can count it as a list of things you didn’t waste time on. Unless you incurred some of that technical debt while altering something which will remain in service, all repayments and “interest” costs are cancelled along with the project.
So, in the case of projects with a high market-fit risk, technical debt can be used almost as an option strategy where you can abstain from paying for most of the project unless the additional information you gather while market testing it confirms that the opportunity side of the equation is looking good.
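The option-like shape of that trade can be sketched as a simple expected-cost comparison. Every number below (the hour estimates and the 30% chance of the market test succeeding) is hypothetical, and the model deliberately ignores the value of reaching the market sooner, which would tilt the comparison further.

```python
def expected_cost(upfront: float, deferred: float, p_success: float,
                  interest: float = 0.0) -> float:
    """Expected hours when `deferred` work plus `interest` is paid only if the project survives."""
    return upfront + p_success * (deferred + interest)


FULL_BUILD = 1000.0  # hypothetical hours to build everything "properly" up front
MVP = 400.0          # hypothetical hours for a debt-laden MVP
CLEANUP = 600.0      # deferred work, paid only if the market test succeeds
INTEREST = 100.0     # extra upkeep accrued on the debt before it is paid off
P_SUCCESS = 0.3      # assumed chance the market test confirms the opportunity

no_debt = expected_cost(FULL_BUILD, 0.0, P_SUCCESS)
with_debt = expected_cost(MVP, CLEANUP, P_SUCCESS, INTEREST)
print(f"no debt: {no_debt:.0f} hours, with debt: {with_debt:.0f} hours")
```

Under these assumptions the debt-funded route costs about 610 expected hours against 1,000, and it stays cheaper until the chance of success climbs past roughly 86%. The riskier the market fit, the more the "default" option is worth.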
That does, of course, rely on your organization being one which won’t try to rush people on to the next project after the beta is launched without giving them time to cross off any high-cost or high-risk TODOs that they had deferred while putting together an MVP to test the product’s viability.
Also, if technical debt was incurred in something which doesn’t go away along with the failed project, that cost will still need to be paid off. Which is a good reason for being more careful about what is considered “acceptable debt” in an existing system than you are with the unproven prototype you’re fitting into that system. And to make sure resource is going to be available to clean up anything which is hacked onto those systems (either by going back to do things nicely or by rolling back any hackish changes) even in the case of the project being cancelled.
What Do I Need to Look Out For?
There are some human factors which can make even a deliberate and reasonable use of technical debt difficult to manage constructively:
- Developers are often irrationally optimistic about their own work, which can make getting an accurate assessment of project opportunities, costs, and risks difficult.
- Developers are often unfairly critical of other developers’ work. By this, I mean that they tend to overlook the context within which other developers have made decisions, and instead judge those decisions as examples of what those developers would be capable of doing if not forced to make trade-off decisions as a concession to the realities of getting software out the door. Today’s strategic technical debt is tomorrow’s “incompetence” or “cowboy coding” when someone new sees the tech debt in the end product before understanding why these decisions were made. I’ve been on both ends of this, myself, and can say it’s frustrating for everyone involved when the historical background of decisions is lost.
- Sometimes a decision which seemed like a good idea at the time was a good idea at the time, and sometimes it only seemed like a good idea because critical information was not yet available. Because of this, some technical debts which were expected to be low-cost or short-lived end up lasting longer and costing more than they should. In an organization with a healthy learning culture, however, this can at least be used as a learning point around why these costs were misestimated and whether there were any blind spots which can be better covered the next time such a decision comes along.
- Management’s focus and incentive structures rarely put much value on “niceties” like cleaning up scaffolding and temporary patches once the user-visible features have been shipped. So technical debt should be approached with extreme caution unless management is actively involved in the process of making these decisions and aware of the necessity of paying off technical debts incurred in the implementation of a successful product.
Technical debt can be your friend, if used consciously and with care in making sure the organization will be both able and willing to pay it down on a sane schedule.
That said, it’s still something best avoided if you can’t make a sane argument based on what you’ll be able to get by taking it on and what costs and risks will come with it until it is “paid off”.