Tackling Tech Debt Effectively

Published in The Startup

11 min read, Aug 27, 2020

“Shipping first time code is like going into debt. A little debt speeds development so long as it is paid back promptly with a rewrite… The danger occurs when the debt is not repaid. Every minute spent on not-quite-right code counts as interest on that debt. Entire engineering organizations can be brought to a stand-still under the debt load of an unconsolidated implementation, object-oriented or otherwise.” — Ward Cunningham, 1992

Most teams deal with technical debt in one way or another. Just like with credit card debt, once you get to a certain point a feeling of defeat may take over and declaring bankruptcy may feel like the only way out. In software land, this is equivalent to declaring that we need a full rewrite. Fortunately, I have never had to dig myself out of financial debt, but I have had to do it for technical debt on more than one occasion. To extend the analogy, the usual advice for getting out of financial debt is to be strategic and to prioritize the debt that is hurting you the most, so that your financial situation steadily improves. In this article I’ll talk about similar strategies for dealing with technical debt, including concrete tips that can be applied to almost any codebase.

Understand Your Situation

The first step in the improvement journey is to determine where you currently stand. To do that, the team will need a good set of metrics. These metrics paint a clear picture of the current state of the codebase so developers and stakeholders can understand why things need to change. They also help track progress, which is extremely important for the success of the project: stakeholder support will continue (or increase) as progress is demonstrated, and the development team will feel a burst of motivation as they see the impact of their changes.

Different teams may choose different metrics and that’s all good. What really matters is whether these metrics are an effective measurement for your organization. If the organization already values (or collects) a certain set of metrics, the safest route may be to use those. Our team uses SonarQube (SQ), which exposes a sensible set of metrics:

Code coverage
Duplication
Code smells

SonarQube exposes many more in-depth metrics, as defined in their docs, but for the purposes of our project we found these three to be good for tracking progress for a few reasons. First, SQ is the official tool in our organization, so every other project uses the same metrics. Second, these are the main metrics in the dashboard, so anyone who looks at the report can clearly see them. Last, these metrics are fairly easy to explain to any stakeholder, unlike afferent coupling, cyclomatic complexity etc.

This is by no means a pitch for SQ, but the tool does have a very handy feature for measuring a change in the team’s behavior pretty quickly. In a codebase with hundreds of thousands (or millions) of lines of code, a few small changes may be a drop in the bucket, not affecting the overall metrics at all. SQ defines a Leak Period, which analyzes only the code that was modified since your last release. So any change in how the team is committing code will show up pretty quickly, even if its effect on the overall totals is insignificant.

Below is a sample screenshot from SQ showing the total numbers on the left and the leak period numbers on the right, in yellow.

Sample dashboard provided by SonarQube. On the left, the all time numbers for the code analyzed. On the right, in yellow, is the leak period showing the metrics for code that has changed in the current period.
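To make the "drop in the bucket" effect concrete, here is a small arithmetic sketch. The line counts are hypothetical and are not taken from our project, and the calculation is a simplification rather than SonarQube's actual coverage formula. It only illustrates why a metric scoped to recently changed code reacts much faster than the all-time total.

```python
def coverage(covered_lines: int, total_lines: int) -> float:
    """Coverage as a percentage; an empty codebase counts as 0%."""
    return 100.0 * covered_lines / total_lines if total_lines else 0.0


# Hypothetical numbers, chosen only for illustration.
legacy_total, legacy_covered = 500_000, 100_000  # existing code, 20% covered
new_total, new_covered = 2_000, 1_800            # code changed this period, 90% covered

before = coverage(legacy_covered, legacy_total)
overall = coverage(legacy_covered + new_covered, legacy_total + new_total)
new_code = coverage(new_covered, new_total)

print(f"overall before: {before:.2f}%")    # 20.00%
print(f"overall after:  {overall:.2f}%")   # 20.28%
print(f"new code only:  {new_code:.2f}%")  # 90.00%
```

The overall number moves by less than a third of a percentage point, while the number for the changed code shows clearly that the team is now writing tests.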

These metrics are great for tracking overall progress, but they don’t really tell you what the actual problem areas of the codebase are. For that we relied on another tool: JArchitect. This is the Java version of NDepend, a tool I’ve used a lot in the past, and it makes it easy to get very useful metrics about your code, such as:

Lines of code per class and method;
Methods per class and parameters per method;
Afferent and efferent coupling;
Cyclomatic complexity;
etc

These are just a few of the metrics but the JArchitect documentation lists dozens more. Below is a sample visualization with a treemap showing the number of instructions per method.

JArchitect snapshot showing a treemap with the number of bytecode instructions per method.

These are the metrics we initially used to determine the health of the codebase from the developers’ fine-grained perspective. Again, choose whichever ones make the most sense for your team.

Once metrics are defined the next step is to figure out where to get started. The natural place to start is to look at the metrics and find the worst parts of the codebase and tackle those, right? Well… not really. Sure, refactoring a huge class, breaking it down into smaller components and adding tests, will make that code easier to maintain in the future. These improvements will also show up in our dashboards as an improvement but that’s not guaranteed to be the most effective change for our team, especially in the short term.

The problem with static code analysis results is that they can’t really say what is impacting developers the most. A class may contain a lot of duplication, have high cyclomatic complexity or even be a God object, but if that class is part of the system that has been relatively stable and hasn’t required many changes, then in my opinion it is not the best place to start spending time.

Git can offer the insights that are missing: the temporal data that shows where the team is spending most of their time. A couple of good indicators are the number of commits per file and the number of months since a file was last modified. Below are two charts with real data from one of our projects.

Git log analysis showing a small number of files with a high number of commits and a long tail with almost no commits at all.

Git log analysis showing how long ago files were last changed. Most files haven’t been changed in the last two years.

The commits per file chart shows how a small number of files received the overwhelming majority of changes. The Age in Months chart shows that the great majority of files haven’t been changed in over two years. The files that show up on the far left of both charts are excellent candidates for further investigation. Cross-referencing these results with the hot spots revealed by the static code analysis tools produces a solid list of components that, when refactored, will have a high impact on the quality of life for the team.
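Both indicators can be derived from plain `git log` output. The script below is a minimal sketch of one way to do it, not the tooling behind the charts above. It assumes the log is produced with `git log --name-only --format=@%ct`, so each commit appears as an `@<unix timestamp>` line followed by the paths it touched. The function names and the 30-day month are illustrative choices, and renames are not followed.

```python
import subprocess
from collections import Counter

SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # rough approximation of a month


def summarize(log_text, now):
    """Map each path to (commit count, whole months since its last change).

    Paths are returned ordered from most to least commits.
    """
    commits = Counter()
    last_change = {}
    timestamp = None
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank separator between the header and the file list
        if line.startswith("@"):
            timestamp = int(line[1:])  # header line of the next commit
        elif timestamp is not None:
            commits[line] += 1
            last_change[line] = max(last_change.get(line, 0), timestamp)
    return {
        path: (count, int((now - last_change[path]) // SECONDS_PER_MONTH))
        for path, count in commits.most_common()
    }


def git_log(repo_dir):
    """Return the raw log of the repository at repo_dir (requires git on PATH)."""
    result = subprocess.run(
        ["git", "log", "--name-only", "--format=@%ct"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Running `summarize(git_log("."), time.time())` inside a repository gives the raw data for both charts. A path that itself starts with `@` would confuse this simple parser, so treat it as a starting point only.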

The overlap of code highlighted by the metrics, code that changes frequently and code that was modified recently makes for a great place to start an investigation.
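The overlap itself is a simple filter once both data sets are available. The sketch below assumes the churn data is a mapping of path to (commit count, age in months) and that the static analysis results have been reduced to a set of flagged paths. The file names and the thresholds are hypothetical and should be tuned to your own codebase.

```python
def refactoring_candidates(churn, flagged, min_commits=10, max_age_months=6):
    """Return flagged files that change often and changed recently.

    churn maps path -> (commit count, months since last change).
    The result is sorted with the most frequently changed files first.
    """
    hits = [
        (path, commits, age)
        for path, (commits, age) in churn.items()
        if path in flagged and commits >= min_commits and age <= max_age_months
    ]
    return sorted(hits, key=lambda hit: (-hit[1], hit[2], hit[0]))


# Hypothetical data for illustration only.
churn = {
    "src/Billing.java": (120, 0),
    "src/Invoice.java": (45, 2),
    "src/LegacyReport.java": (3, 30),  # flagged, but stable for years
    "src/Util.java": (80, 1),          # busy, but not flagged by analysis
}
flagged = {"src/Billing.java", "src/Invoice.java", "src/LegacyReport.java"}

for path, commits, age in refactoring_candidates(churn, flagged):
    print(f"{path}: {commits} commits, last changed {age} month(s) ago")
```

With this sample data only `src/Billing.java` and `src/Invoice.java` are listed: the stable report class and the unflagged utility class drop out.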

I find teams usually have a very good gut feeling about which parts of the codebase will be the most impactful if fixed. Having the data to back up these assumptions takes that confidence to the next level, especially among stakeholders such as managers, directors etc.

How to Get Started

Going back to the financial debt analogy, we need to find where to start making the incremental changes that are going to get us out of debt. Teams, like most people, can’t afford to put everything on hold. They have to continue to do everything else that keeps the lights on and keeps the business running while they start to introduce changes to get back on track. Even if they could afford to stop, they may not be ready yet, and they would risk spending their time and resources on the wrong things.

Starting slow is good: it allows time to analyze the results and to document the patterns that work along the way. In our case, we created a few helper classes on top of Mockito to simplify testing some parts of the system that were really difficult to test and were among our areas of interest. This work took a week or so, but it made it possible to test classes that hadn’t been tested in years due to the complexity of mocking their dependencies. That was a good investment.

Some components that were at the top of the priority list after our analysis became small refactoring projects. We used them as examples to show the rest of the team how complex changes could be tackled incrementally. Every change was done in small pull requests that attempted to keep the old code and new code running in parallel. I won’t go into the refactoring patterns here, but check out Working Effectively with Legacy Code by Michael Feathers for more details.
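One possible reading of "running in parallel" is a wrapper that calls both implementations, keeps returning the old result and only logs differences. The sketch below shows that idea under the assumption that the code being compared has no side effects. It is written in Python for brevity, and the function names are hypothetical rather than taken from our codebase.

```python
import logging

logger = logging.getLogger("parallel_run")


def parallel_run(old_impl, new_impl, name):
    """Wrap two implementations: always return the old result, log mismatches."""
    def wrapper(*args, **kwargs):
        expected = old_impl(*args, **kwargs)
        try:
            actual = new_impl(*args, **kwargs)
            if actual != expected:
                logger.warning("%s mismatch for %r: old=%r new=%r",
                               name, args, expected, actual)
        except Exception:
            # A failure in the new code must never affect callers.
            logger.exception("%s: new implementation raised for %r", name, args)
        return expected
    return wrapper


def old_total(prices):
    """Stand-in for the legacy implementation (hypothetical)."""
    total = 0
    for price in prices:
        total = total + price
    return total


def new_total(prices):
    """Stand-in for the refactored implementation (hypothetical)."""
    return sum(prices)


total = parallel_run(old_total, new_total, "total")
print(total([100, 250, 50]))  # callers still get the old implementation's answer
```

Once the log stays quiet for long enough, the wrapper and the old implementation can be deleted in a follow-up pull request.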

At first, the team might be skeptical of these changes, and you might hear statements like:

“This codebase is hopeless, we might as well just start from scratch”
“We won’t have the time to do this, we’re already behind as it is, management will never let us do it”
“There are too many risks, you might end up breaking something”

Don’t let that drag you down. There will also be some who believe in the cause and start showing changes in behavior early. Make these folks your allies, your champions. Work closely with them, support them and empower them to make the changes that need to be made. That group will grow and you’ll see the power of the multiplier effect.

Don’t let the time constraints affect you either. Think about Uncle Bob’s Boy Scout Rule: always leave the code a little better than you found it. Adding one test to a class may not seem like a big deal, but the first test is the hardest one to write. It’s hard because there’s all the initial setup to be done, no examples to follow and no precedent to inspire the current developer to add the next test. At first it won’t seem like much, but give it time and the impact will become really clear. Just keep watching the metrics.

Stuff Will Break

Stuff will break, that’s a fact! The question is how the team will deal with it. Taking on legacy codebases and dragging them into the future isn’t easy. Ideally one would write tests before every change, in order to define the base behavior and make sure it still holds after the change. This isn’t always possible, and at times you might want to take some calculated risks. First, a risky change might be the only way to make the code testable at all. Second, the code in question might be something no one understands, which you may want to remove or refactor. That’s what source control is for: make the changes and revert them if they don’t work.
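When it is possible to write tests first, "defining the base behavior" often means recording what the code does today rather than what a spec says it should do. The sketch below shows such a characterization test using Python's unittest for brevity. The `legacy_discount` function and its rules are a made-up stand-in for legacy code, and the recorded values were obtained by running it.

```python
import unittest


def legacy_discount(order_total, customer_years):
    """Stand-in for legacy code whose rules nobody fully remembers (hypothetical)."""
    if customer_years > 5 and order_total > 100:
        return order_total * 15 // 100
    if customer_years > 5 or order_total > 500:
        return order_total * 5 // 100
    return 0


# Outputs recorded from the current code, not derived from a specification.
RECORDED = {
    (50, 1): 0,
    (200, 6): 30,
    (50, 6): 2,
    (600, 1): 30,
}


class LegacyDiscountCharacterization(unittest.TestCase):
    def test_behaviour_is_unchanged(self):
        # Each recorded input must keep producing the same output after a refactor.
        for (order_total, years), expected in RECORDED.items():
            with self.subTest(order_total=order_total, years=years):
                self.assertEqual(legacy_discount(order_total, years), expected)
```

Running it with `python -m unittest` before and after each refactoring step shows whether the behavior has drifted, even when nobody can say whether that behavior is actually correct.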

None of that is to say a safety net isn’t required. One way of creating a safety net of sorts when unit tests aren’t there is to create a suite of integration tests. Testing your software through its external interface, be it a user interface or an API, is easier (at least initially) than trying to get good unit test coverage. Integration tests are generally slower and more prone to fail due to environmental issues, like latency for example. That’s why the test pyramid described by Martin Fowler recommends having more unit tests, which are fast and reliable, instead of relying too much on integration tests. As the project progresses, the integration tests can be replaced by unit tests wherever possible.

Code Reviews for the Win

Once the transformation begins, code reviews will become your best friend. In our case, that was the best way to keep sharing the work that was being done as well as starting the conversation about how to write better code, how to follow good patterns and how to test challenging parts of the system.

Code reviews aren’t meant to impose your style, to force people to do what you want or to demand changes to the code. Use them to ask questions, to understand how developers are thinking about the code and sometimes to suggest alternative ways in which problems can be solved. Be patient. The code doesn’t have to be perfect; it should just be trending in the right direction and showing improvement. It’s better for the code to be at 70% of the ideal state with the developers owning the solution than for you to obsess over every little detail. In other words, choose your battles wisely: save the strong opinions for when they are absolutely necessary and, for everything else, prioritize your team’s ownership.

How Far We’ve Come

We’ve been on this journey for two years at the time of writing. We’re not even close to paying off all of our debt, and there’s still a lot to be done, but our “quality of life” has changed completely and so has the way we work. When we started, most pull requests didn’t have tests in them. In fact, a good chunk of our team had never written unit tests at all.

Currently, most of our pull requests are at 85–100% coverage. New people on the team who are completely oblivious to our old state are creating pull requests with 100% coverage in their first week. That’s just our new normal. Thanks to one of the developers on our team, the code has been completely reformatted and follows a sensible style guide that is enforced on every build in our CI. Code duplication, which was at about 10%, is consistently at 0%. We went from roughly 800 unit tests written over many years by a few heroes to almost 7,000 written by all of us.

The following image shows the real metrics of our current (as I write this article) leak period without any embellishments. As you can see, it’s not perfect but it feels really good because we know the progress we’ve made.

Our actual metrics on the current leak period.

Honestly, I don’t care about hitting 100% coverage or any other specific number. The most important thing is that our team knows to test the code in a way that increases confidence in our releases. We are all empowered to make decisions about what is right; the metrics are just guidance.

Conclusion

Hopefully this article gave you a little insight into how to run your own revolution and dust off that old codebase that everyone says is a lost cause.

The main takeaway should be that you don’t need to fix everything, just what is slowing your team down the most. Finding the right metrics is a critical step for you and your team to be confident you’re heading in the right direction. Additionally, if you need the support of your stakeholders, these numbers will show them you mean business.

Changing the culture on the team is going to be the most difficult part of the work. Don’t overlook that aspect, or everything may go back to the old ways once you step away. One of my favorite quotes about culture is: “Culture is what happens when no one is looking”. For the change to be permanent, your team has to embrace this new way of working.