The Tech Debt Playbook

Chuck Groom
Jan 26 · 10 min read
Source

All software teams have technical debt — parts of the code that weren’t created with today’s challenges in mind, or were written poorly, or were expedient hacks that are now problematic. Having tech debt isn’t necessarily a bad thing; if people spent all their time making code perfect, nothing would ever get done. But too much accumulated debt makes it slower to deliver new features and is a source of bugs and quality issues.

What should an engineering leader do when a team is mired in tech debt? This post lays out the the general playbook I’ve followed. Many people have faced similar hurdles, and there are established best practices and accumulated wisdom to draw from.

The bad news is that addressing tech debt is a long, slow slog. People outside of engineering will struggle to see the value of things like “refactoring” or “data isolation.” They will need to be convinced to allocate time to things other than new features which add direct customer value.

The biggest challenges to addressing tech debt will be cultural, not technical.

The biggest challenges to addressing tech debt will be cultural, not technical. Engineering teams need to accept more mature processes, organizations need to commit to a shift from operating reactively to deliberate planning, and everyone must truly get behind addressing tech debt as an investment in future team capacity.

Rally leadership around tech debt

Tech debt manifests as engineering projects that take longer to ship than expected and quality being a problem. This almost always goes hand-in-hand with frequent interruptions from urgent bugs, operational problems, and customer support escalations. There may be an impression that engineering is in disarray and isn’t doing a good job writing software. But the root problem is that the system has grown too complex and haphazard to be manageable. It will require dedicated effort to untangle and clean up.

The system has grown too complex and haphazard to be manageable

Growing teams may face an additional organizational challenge of a culture where everything is a last-minute scramble. Sales may drive product development by demanding urgent features to meet commitments. The CEO may expect to set the engineering team’s focus on near daily basis. The mindset that all tasks are urgent and immediate encourages quick-and-dirty hacking to meet tight deadlines. Heroically responding to sudden changes is what gets rewarded. Plans tend to be sketchy at best, and process is undervalued and ignored. While any organization must have the ability to confront urgent changes, the question is whether operating reactively ought to be the norm.

You will need to convince your leadership that the engineering machine is in danger of grinding to a halt. The team wants to be productive but is constrained by the accumulated weight of maintenance costs and edge cases from past decisions. If people want to make this better, they must commit to these things:

Establish best engineering practices

Engineering teams should adopt best practices for shipping high-quality code and communicating effectively. This level of baseline maturity is a prerequisite for tackling tech debt initiatives so people don’t get lost in the weeds, and also prevents unsustainable hacks or errors from getting into the codebase. Set standards like:

Metrics

Choose essential metrics that show progress and help planning. Start with:

Build a list of tech debt projects

Maintain a living document that lists all your possible tech debt projects. For each project, include a brief description, the benefits, and a rough sense of effort. When there are incidents or regressions caused by unaddressed tech debt, add a note to the corresponding item to bolster support for fixing this.

Building this list will be surprisingly easy. Some of your engineers probably already have notes along these lines. Survey your team, review your backlog, and you’ll start to see patterns. Sharing this list with your team will help validate that their concerns are being heard.

Set expectations that engineering will be gradually chipping away at tech debt for months or even years. Be very judicious in how you prioritize work; start with smaller projects that yield easy wins to establish trust.

Tech debt projects are notoriously difficult to scope and to track progress against. You don’t want an engineer to dig into a massive re-write and have them disappear for two months; instead, you want there to be a plan with discrete milestones. Define a “pre-project” phase for an engineer to investigate a problem area and write up a proposal that will be used as the basis for estimating work and managing the project.

Commit capacity to tech debt

Work with your boss and product manager to budget what percent of team time will go into each of these areas:

When a team deep in tech debt does this accounting realistically, they may realize that their support overhead alone is consuming the lion’s share of their capacity. Unless they invest almost all their remaining capacity to paying down their debt, they will drown. Explaining the reality of the situation in terms of a “capacity budget” may help manage uncomfortable conversations with stakeholders about why new feature work will be limited.

Start the clean-up with DevOps-ing

The most effective starting point for cleaning up tech debt often isn’t the code itself, but infrastructure improvements around deploying and monitoring code. These have an outsize impact on cycle time. Worthwhile goals are:

You don’t need a super-advanced setup to get started. If your tests aren’t in good shape, don’t block your DevOps efforts waiting for teams to add test coverage; go ahead and focus on getting deployments and monitoring in good shape.

On-call rotation and bug rotation

Interrupting engineers hurts productivity. But someone needs to be available to fix urgent issues and answer questions.

The standard practice is to introduce an on-call rotation. Each week, a different engineer on the team is responsible for fielding all interruptions and urgent fixes. The goal is that the rest of the team can focus on planned work. Because the team is on the hook for the code it writes, there is real skin in the game for preventing problems. The sprint should be planned without any expectation that the on-call will have time for “normal” project work — that would be a happy bonus.

Many teams struggle with a long backlog of bugs and customer support requests. Inevitably, important features are prioritized over bugs, and the backlog only gets longer. Teams need a way to allocate protected capacity for working on bugs. One solution is to ask the on-call to work on bugs during their downtime. If that’s not sufficient, teams may consider setting up a separate bug rotation so someone is 100% dedicated to the bug backlog each sprint.

Putting it together — an example

As an engineering leader, you will need to spend a lot of your time selling people on the plan to address tech debt. You’ll need to remind people that ongoing support and tech debt are real costs, and part of engineering’s job. Reinforce that we as a company have decided tech debt matters, so clean-up tasks aren’t an indulgent engineering romp.

In addition to a more detailed quarterly roadmap that explains the rationale projects and estimates, I highly recommend sharing a distilled explainer for where engineer effort will be going broken into three tracks of work: support, tech debt, and features. Share this regularly and repeatedly, especially at all-hands and team meetings. Here’s a made-up example:

We have 7 engineers. For Q1:

Two engineers a week are on support:

These internal initiatives will make our system more stable and lay a foundation for upcoming projects: refactoring our scheduling system, moving software deployments from Elastic Beanstalk to ECS, and making user permissions a standalone service. We estimate this work to be about 40 engineer-weeks, or 3 dedicated engineers a week for the quarter.

This gives us capacity for about 2 engineers a week on features. We expect to roll out a partner integration with Sesame Co., add support for Outlook calendars, and provide customers with an internal dashboard showing their top performers and under-performers by market.

Don’t let the approach become the purpose

I’ve seen cases where a team addressing tech debt adopts an unnecessarily extreme technical position because they took their approach as a mantra, and let it became the goal. For example:

Let’s break the monolith into manageable chunks → everything should be a microservice.

The attention on “microservices” obscures the benefit we are seeking, which is to isolate code and data into isolated components so a developer can make changes quickly and without unintended consequences. Perhaps there are data that must be tightly coupled, so separating out a component as a larger service is appropriate. If a team bangs on the “micro” part of the “microservices” drum too loudly, that may lead to poor architecture decisions like having multiple codebases which update the same data set.

Another example:

We need to add unit tests → there must be 100% code coverage.

The statement that “parts of the code should have better tests” is very different from “every part of the code must have tests.” There may be old areas of the code which just aren’t worth testing; or highly stateful workflows where maintaining tests is more like updating a complicated simulator; or 3rd party integrations without test environments that make automation a challenge. While a leader may be tempted to frame a goal in the most ambitious terms (“everything must have tests!”) with the tacit understanding that people should apply discretion, that’s not a great idea because engineers do take things literally and it’s very easy for an initiative to spin out of control.

Keep tech debt projects on-track and focused on delivering value. It’s important that the organization sees regular progress and that projects are set up for success by having clearly defining objectives set ahead of time. And finally, make sure to recognize and reward engineers who work on code clean-up because it is just as important as feature work.

Other Resources

Geek Culture

Proud to geek out.

Sign up for Geek Culture Hits

By Geek Culture

Subscribe to receive top 10 most read stories of Geek Culture — delivered straight into your inbox, once a week. Take a look.

By signing up, you will create a Medium account if you don’t already have one. Review our Privacy Policy for more information about our privacy practices.

Check your inbox
Medium sent you an email at to complete your subscription.

Geek Culture

A new tech publication by Start it up (https://medium.com/swlh).

Chuck Groom

Written by

Consulting CTO open to projects. I’m a serial entrepreneur, software engineer, and leader at early- and mid-stage companies. https://www.chuckgroom.com

Geek Culture

A new tech publication by Start it up (https://medium.com/swlh).

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store