Technical Debt at Teamworks

Published in The Startup, Sep 29, 2020

Over the last year at Teamworks we’ve had a vibrant conversation going about technical debt: where it is in our codebase, obviously, but also what the different types of technical debt are, what they have in common, and how to communicate it to the rest of the company so that fixing it can be prioritized alongside the product roadmap, bug fixes, and sundry.

Communicating technical debt to people other than engineers is essential to getting work on that debt prioritized and valued alongside bugs and product roadmap work, and it’s not easy. For a long time we struggled because different parts of the company defined technical debt differently, and because we couldn’t communicate the urgency of tackling it.

We’ve arrived at this definition:

Technical debt is the net-difference between process or code which services a short-term need but fails to service a predictable interim or long-term need without negative impact to the organization.

This doesn’t attempt to provide a taxonomy of technical debt. It doesn’t establish a framework to determine priority (I’ll talk about how we do it a bit later). But it does establish a hard line between what is and what is not technical debt and gets everyone on the same page.

Ward Cunningham first used the debt metaphor to talk about code problems that weren’t exactly bugs but rather things that made the code harder to understand and modify. It’s a good metaphor. It describes these problems as having a cost associated with them. It provides for the idea that there’s a principal and an interest rate, even if it doesn’t define how you arrive at those things. And it’s good that it doesn’t.

Really, your organizational structure determines how you have to define the principal and interest rate of technical debt.

The necessity of technical debt in an investment-backed company.

Investment-backed startup companies generally share the characteristic that they spend money faster than they can make it to expand into new markets. This deficit spending is a conscious choice and makes long-term sense. Most importantly it tends to pervade every decision made about how to allocate capital in a company.

That includes how that company allocates technical capital. An investment-backed software company has to build software faster than it can refine it. Getting into new markets and getting to market fit require lean experimentation alongside a codebase that’s also serving a well-developed base of paying customers who count on an agreed-upon service. This leads to conscious adoption of technical debt in the service of growing the company.

In an investment-backed company, you need to strategically embrace technical debt. Just remember to understand it, document it, and budget for paying it down before it buries you.

What counts as technical debt?

In his article on Technical Debt, Martin Fowler characterizes it this way:

Software systems are prone to the build up of cruft — deficiencies in internal quality that make it harder than it would ideally be to modify and extend the system further. Technical Debt is a metaphor, coined by Ward Cunningham, that frames how to think about dealing with this cruft, thinking of it like a financial debt. The extra effort that it takes to add new features is the interest paid on the debt.

This definition is a good one for software development in a vacuum, but it’s not broad enough to characterize the consequent costs that build up as software is put into production in a company setting. In the enterprise, technical debt has impact well beyond engineering concerns. It includes:

The cost of providing adequate customer support.
Cost of providing performant and reliable software.
Cost of continued scaling of the customer base.
Cost of ensuring regulatory compliance and security.
Throughput of individual support requests and their impact on customer relationships and retention.
Cost of hiring engineers who can make system modifications reliably in bounded time.
Ability to execute on high- to medium-priority items in a product roadmap in a sane amount of time.
The impact of customer frustration from user “toil” and confusion necessitated by engineering around existing behavior, e.g. “You have to have this permission and go to that strangely named screen, and then do your task in 8 click-and-waits because that’s the only way we could build it.”

The impact of technical debt is the sum of these costs, which are themselves the result of solvable technical issues, shortcuts to market (like mechanical turking), and costly adaptations (hacks). Too much debt can drag on a company’s key performance indicators across the board. A pragmatic approach to planning and accounting for technical debt, on the other hand, lets you achieve things in a timeframe you couldn’t otherwise.

Principal vs. Interest

In our experience, the principal of the tech debt is the cost of what it will take to provide a solution that doesn’t have negative impacts. The interest is the toil involved in maintaining the debt-incurring solution, plus the growth of that toil over time as other code has to work around or incorporate the tech debt in order to get its job done.

Take, for example, a feature that lets you create and update a form but offers nothing self-service to delete it; imagine that deleting and cascading was more complicated, so you needed longer to consider how to do it right. The cost of building that missing deletion is the principal. The interest on that principal consists of the time and cost of every ticket your DBA had to take care of manually in SQL in order to delete a customer’s form. It’s also the cost to the company’s reputation of all the times that it took longer than its customers felt was reasonable to delete that form.
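To make the arithmetic concrete, here is a minimal Python sketch of the form-deletion example. Every name and number in it is a hypothetical assumption for illustration (80 hours to build deletion, 10 manual tickets a month at 1.5 hours each); none of it comes from Teamworks data. It counts only engineering hours, so the reputational part of the interest is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DebtItem:
    """One item of technical debt, measured in hours. All figures are hypothetical."""
    name: str
    principal_hours: float       # one-time cost of building the proper solution
    tickets_per_month: float     # manual requests caused by the missing feature
    hours_per_ticket: float      # hands-on time to resolve each request
    monthly_growth: float = 0.0  # fractional growth in ticket volume per month

    def interest_paid(self, months: int) -> float:
        """Total toil accumulated over `months` if the debt is left in place."""
        total = 0.0
        tickets = self.tickets_per_month
        for _ in range(months):
            total += tickets * self.hours_per_ticket
            tickets *= 1 + self.monthly_growth
        return total

    def months_until_interest_exceeds_principal(self, limit: int = 120) -> Optional[int]:
        """First month in which accumulated toil passes the cost of the fix."""
        for month in range(1, limit + 1):
            if self.interest_paid(month) > self.principal_hours:
                return month
        return None  # the debt never overtakes the fix within `limit` months


form_deletion = DebtItem(
    name="self-service form deletion",
    principal_hours=80,
    tickets_per_month=10,
    hours_per_ticket=1.5,
)
print(form_deletion.interest_paid(6))                           # 90.0
print(form_deletion.months_until_interest_exceeds_principal())  # 6
```

Under these made-up numbers the manual tickets cost more than the fix by month six. Setting `monthly_growth` above zero models toil that grows as the customer base grows, which pulls that crossover earlier.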

A scenario.

Sure, you managed to beat your competitor to market by three months by banking on the fact that the DBA can pre-populate the customer’s account with CSVs that have been emailed to a customer support specialist. But the DBA is a limited resource who spends hours a week on these requests.

She has her own backlog of important work that’s not getting done. She costs you six figures a year fully burdened. And the turnaround on each request means that the customer is waiting for 72 hours to use your product. So make sure that those costs don’t build up until they overbalance what you gained by releasing the product three months before your competitor could.

The technical debt items here look like:

Validate customer input.
Provide customer-readable error messages that help them fix the input.
Build an interface that lets the customer input bulk data.
Lock the interface down to the right key customer users.
Document and possibly train users on that interface.
If it’s not there already, add monitoring and rate-limiting that keeps customers from impacting service with large or frequent requests.

All of those things have a cost associated with their development and all are less exciting than building the next new product. But locking up those six items means that the DBA is freed up to work her backlog, the customer is self-servicing, and part of what you built may be usable for the next similar product (rate limiting, monitoring, generic bulk-data loading, and some of the validation and error handling).
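As a rough illustration of the first two items on that list (validation and customer-readable error messages), here is a minimal Python sketch using the standard `csv` module. The column names, the row cap, and the function name are hypothetical assumptions, and the row cap is only a crude stand-in for real rate limiting. It is a sketch of the idea, not a description of what we built.

```python
import csv
import io

REQUIRED_COLUMNS = ("name", "email")  # hypothetical schema for the bulk upload


def validate_bulk_upload(csv_text, max_rows=1000):
    """Return (valid_rows, errors); each error is a message a customer can act on."""
    reader = csv.DictReader(io.StringIO(csv_text))
    present = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in present]
    if missing:
        return [], [f"Your file is missing the column(s): {', '.join(missing)}."]

    valid, errors = [], []
    for row_count, row in enumerate(reader, start=1):
        line_no = reader.line_num  # line in the customer's file, header is line 1
        if row_count > max_rows:
            errors.append(
                f"Only the first {max_rows} rows were checked. "
                "Please split the file and upload the rest separately."
            )
            break
        name = (row["name"] or "").strip()
        email = (row["email"] or "").strip()
        if not name:
            errors.append(f"Line {line_no}: the name is empty. Please fill it in.")
        elif "@" not in email:
            errors.append(f"Line {line_no}: '{email}' does not look like an email address.")
        else:
            valid.append({"name": name, "email": email})
    return valid, errors


sample = "name,email\nAda,ada@example.com\n,bob@example.com\nCy,not-an-email\n"
valid, errors = validate_bulk_upload(sample)
print(valid)   # [{'name': 'Ada', 'email': 'ada@example.com'}]
for message in errors:
    print(message)
```

The point of returning messages keyed to the customer’s own line numbers is that the customer can fix the file without a support ticket, which is exactly the toil the DBA was absorbing before.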

Be outcome-oriented in communicating debt

One challenge in communicating technical debt is simply getting it on the docket. Describe the consequences to be mitigated instead of the solutions you plan to use to mitigate them. Imagine a system where you’re going from a single-instance cache to a highly-available, scalable cache. Your description of the work could be “Switch web caching from our managed Redis to a managed ElastiCache.” That communicates nothing about the why, and in a backlog of 1000 tickets the title tells the product owner nothing about how to prioritize that vs. everything else. A better ticket would be titled, “Cache misses are causing users to complain about slow performance at peak times.”

Technical debt is not a catch-all.

There is work to be done in engineering that is neither customer-facing nor technical debt but nonetheless needs to be done. This includes:

Proactive work. Anticipating a predictable change in scale or focus, e.g. scale we have not yet realized but expect to.
Engineering priorities.
Prerequisite work for future features / planned maintenance.
Bugfixes and routine maintenance.

Anticipate and plan for change: “Building Technical Capital”

There is an important distinction between doing work that anticipates change vs. work in reaction to it. Work that is done as a reaction to change is often paying down principal or interest on technical debt. Work that’s done in anticipation of change expands our overall technical capital.

To illustrate the difference, consider a mobile/web shared code project. At some point in the past we began using React on web as well as React Native to build our application. In the beginning, there was very little shared code, but we anticipated that much of the code between web and mobile would be shared in the future. If we had taken that foreknowledge and applied it then to solving the problem of “how to share code between web and mobile”, prioritized, and scheduled that work, that work would not have been “technical debt.”

Why is that not technical debt, when we had realized our code repo was inadequate to future needs? The answer is also the answer to the question “Was there realized impact or did we get ahead of it?” It’s the difference between being forced to react vs. having the advantage of the situation. If we had done it then, we could have done it without also being impacted by the negative consequences that came with waiting too long to address it.

Instead we didn’t plan the work ahead and we had to build a shared-code solution while also experiencing development drag from engineers manually keeping shared code in sync.

Prioritize proactive work by thinking about the technical debt you take on if it’s ignored.

Engineering priorities

This bucket of work is for experimentation and work that has the potential for positive disruption. It’s a bucket for work where the engineering department can be the force for innovation. Think “labs.” If, when you crafted your story, you thought, “Things are pretty good, but I think they could be way better,” then you have an engineering priority.

Prerequisite and requisite work

Prerequisite and requisite work is the work that should be done before building or revising a feature, or should be done in order to make the feature complete. This is often the work needed to make new development conform to engineering standards of quality, testability, and performance. Examples of this include:

Providing self-service admin functionality.
Refactoring code somewhere else in the stack that is common to the new / revised feature, so that it can be shared between the old and new.
Bulk uploads.
Settings screens.

Sometimes prerequisite work can be skipped and a feature can still be shipped, but it will be more costly to maintain and modify than a fully complete project. Skipping it incurs technical debt, so the priority of the work can be based on the impact of that debt.

Bugfixes and routine maintenance

The difference between a bug and an item of technical debt is obvious most of the time. The distinction is blurry when the bug doesn’t affect the correctness of output, only some aspect of importance to engineering or operations. In some cases, the distinction may be down to urgency or whether treating it as technical debt can bring a single item into the context of a wider cleanup push.

Prioritizing technical debt paydown

The key concepts are Urgency and Impact. Another key activity in grooming technical debt, however, is contextualization. This is the planning activity of organizing technical debt into well-scoped refactoring plans and epics, and of collapsing closely related stories. It lets us tackle more technical debt than we could by grabbing a few stories off the stack. Teams should groom technical debt carefully and, where possible, create proactive solutions like refactors vs. playing “whack-a-mole” with issues that haven’t changed since they were initially reported.
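One way to picture urgency, impact, and contextualization working together is the Python sketch below. The 1-to-3 scales, the multiplication, and the epic names are all assumptions made up for this illustration; the article does not prescribe a scoring formula, and a real team would likely weigh these by discussion rather than arithmetic.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DebtStory:
    title: str    # outcome-oriented title, as it would appear in the backlog
    epic: str     # the refactoring plan this story was grouped into
    urgency: int  # hypothetical scale: 1 (can wait) to 3 (hurting now)
    impact: int   # hypothetical scale: 1 (one team) to 3 (customers feel it)


def rank_epics(stories):
    """Group stories by epic and rank epics by summed urgency * impact."""
    totals = defaultdict(int)
    for story in stories:
        totals[story.epic] += story.urgency * story.impact
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


backlog = [
    DebtStory("Cache misses cause slow performance at peak times", "caching", 3, 3),
    DebtStory("DBA deletes customer forms by hand", "self-service admin", 2, 2),
    DebtStory("Customers wait 72 hours for bulk data loads", "self-service admin", 3, 2),
    DebtStory("Engineers keep shared code in sync manually", "shared code", 2, 2),
]
for epic, score in rank_epics(backlog):
    print(epic, score)
# self-service admin 10
# caching 9
# shared code 4
```

Note what grouping does here: neither self-service story outranks the caching problem on its own, but collapsed into one epic they do, which is the argument for contextualizing instead of pulling single stories off the stack.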

Conclusion.

In a company setting, technical debt has wider impact to the organization outside of engineering and product development. Understanding how to communicate it effectively and prioritize it against new development concerns means quantifying it against its cost to the organization as a whole.

Within Teamworks we use a framework of urgency and impact for characterizing technical debt that lets us describe the consequences to engineering, product development, customer success, and our customer base itself to put a value on what we tackle. This prevents “technical debt” from becoming a meaningless catch-all phrase that the rest of the company interprets as “stuff engineers want to do” and lets us be strategic about what we take on.