Refactoring Chapter 2 — Principles in Refactoring (Part II)

Rafael Melo
5 min read · Jan 26, 2020


Problems With Refactoring

You need to understand the tradeoffs to decide when and where to apply something. I do think refactoring is a valuable technique, but there are problems associated with it, and it’s important to understand how they manifest themselves and how we can react to them.

SLOWING DOWN NEW FEATURES

There is a genuine tradeoff here. I do run into situations where I see a (large-scale) refactoring that really needs to be done, but the new feature I want to add is so small that I prefer to add it and leave the larger refactoring alone. That’s a judgment call — part of my professional skills as a programmer. I can’t easily describe, let alone quantify, how I make that tradeoff.

I’m more likely to not refactor if it’s part of the code I rarely touch and the cost of the inconvenience isn’t something I feel very often. Sometimes, I delay a refactoring because I’m not sure what improvement to do, although at other times I’ll try something as an experiment to see if it makes things better.

Although it’s often managers that are criticised for the counterproductive habit of avoiding refactoring in the name of speed, I’ve often seen developers do it to themselves. Sometimes, they think they shouldn’t be refactoring even though their leadership is actually in favor. If you’re a tech lead in a team, it’s important to show team members that you value improving the health of a code base. That judgment I mentioned earlier on whether to refactor or not is something that takes years of experience to build up. Those with less experience in refactoring need lots of mentoring to accelerate them through the process.

We refactor because it makes us faster — faster to add features, faster to fix bugs. It’s important to keep that at the front of your mind and at the front of your communication with others. The economic benefits of refactoring should always be the driving factor.

CODE OWNERSHIP

Code ownership boundaries get in the way of refactoring because I cannot make the kinds of changes I want without breaking my clients. This doesn’t prevent refactoring — I can still do a great deal — but it does impose limitations.

Due to these complexities, I recommend against fine-grained strong code ownership, where an organization designates a single programmer as the owner of each piece of code and only allows that programmer to change it.

My preference is to allow team ownership of code — so that anyone in the same team can modify the team’s code, even if originally written by someone else. Programmers may have individual responsibility for areas of a system, but that should imply that they monitor changes to their area of responsibility, not block them by default.

A more permissive ownership scheme like this can even exist across teams. Some teams encourage an open-source-like model, where people from other teams can change a branch of the code and send the commit in to be approved. This can often be a good compromise between strong code ownership and chaotic changes in large systems.

BRANCHES

The problem of complicated merges gets exponentially worse as the length of feature branches increases. Integrating branches that are four weeks old is more than twice as hard as those that are a couple of weeks old. Many people, therefore, argue for keeping feature branches short — perhaps just a couple of days. Others, such as me, want them even shorter than that. This is an approach called Continuous Integration (CI), also known as Trunk-Based Development.

With CI, each team member integrates with mainline at least once per day. This prevents any branches diverging too far from each other and thus greatly reduces the complexity of merges. CI doesn’t come for free: It means you use practices to ensure the mainline is healthy, learn to break large features into smaller chunks, and use feature toggles (aka feature flags) to switch off any in-process features that can’t be broken down.
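As a minimal sketch of how a feature toggle lets an unfinished feature sit on mainline without being exposed — the flag name `new_pricing` and the pricing function here are hypothetical, purely for illustration:

```python
# Hypothetical feature-flag registry; in practice this would come from
# configuration or a feature-management service.
FEATURE_FLAGS = {"new_pricing": False}  # flipped on once the feature is ready

def is_enabled(flag: str) -> bool:
    """Return True if the named feature toggle is switched on."""
    return FEATURE_FLAGS.get(flag, False)

def price_for(order_total: float) -> float:
    if is_enabled("new_pricing"):
        # In-progress code path: integrated into mainline daily,
        # but invisible to users while the toggle is off.
        return order_total * 0.9
    return order_total  # current behaviour shipped to users

print(price_for(100.0))  # 100.0 while the toggle is off
FEATURE_FLAGS["new_pricing"] = True
print(price_for(100.0))  # 90.0 once the toggle is flipped
```

The half-built code path ships with every integration, yet stays dormant until the flag is flipped, which is what makes daily integration of incomplete features workable.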

I’m not saying that you should never use feature branches; if they are sufficiently short, their problems are much reduced. But the cost that longer-lived feature branches impose on refactoring is excessive, so even if you don’t go to full CI, I certainly urge you to integrate as frequently as possible.

TESTING

Mistakes happen, but they aren’t a problem provided I catch them quickly. Since each refactoring is a small change, if I break anything, I only have a small change to look at to find the fault.

The key here is being able to catch an error quickly. This means that in most cases, if I want to refactor, I need to have self-testing code. This also answers those who are concerned that refactoring carries too much risk of introducing bugs. Without self-testing code, that’s a reasonable worry — which is why I put so much emphasis on having solid tests.
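A minimal sketch of what self-testing code looks like in practice, using Python’s standard `unittest` module — the `total_price` function is a hypothetical stand-in for whatever code is being refactored:

```python
import unittest

def total_price(quantity: int, unit_price: float) -> float:
    """Hypothetical function currently under refactoring."""
    return quantity * unit_price

class TotalPriceTest(unittest.TestCase):
    """Self-checking tests: rerun after every small refactoring step,
    so a mistake is caught while the change is still small."""

    def test_total(self):
        self.assertEqual(total_price(3, 2.5), 7.5)

    def test_zero_quantity(self):
        self.assertEqual(total_price(0, 9.99), 0)

# Run the suite programmatically (a test runner would normally do this).
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TotalPriceTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the tests check their own results, a refactoring mistake surfaces within minutes of being made, narrowing the search for the fault to the last small change.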

LEGACY CODE

Refactoring can be a fantastic tool to help understand a legacy system. But the dragon guarding this happy tale is the common lack of tests. If you have a big legacy system with no tests, you can’t safely refactor it into clarity.

There’s no simple route to dealing with this. A good way is to get the system under test by finding parts in the program where you can insert tests. Creating these parts involves refactoring — which is much more dangerous since it’s done without tests, but is a necessary risk to make progress. Sadly, there’s no shortcut to getting out of a hole this deep — which is why I’m such a strong proponent of writing self-testing code from the start.
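One common way to start getting such a system under test — not named in the text above, but often called a characterization test — is to write assertions that simply record what the legacy code is observed to do today. A minimal sketch, with `legacy_discount` as a hypothetical stand-in for an untested legacy function:

```python
def legacy_discount(code):
    """Stand-in for an untested legacy function whose rules nobody remembers."""
    if code == "GOLD":
        return 0.2
    if code == "SILVER":
        return 0.1
    return 0.0

def test_characterize_legacy_discount():
    # These assertions record what the code was *observed* to do,
    # not what any spec says it should do. Once they pass, refactoring
    # inside legacy_discount proceeds with a safety net in place.
    assert legacy_discount("GOLD") == 0.2
    assert legacy_discount("SILVER") == 0.1
    assert legacy_discount("BRONZE") == 0.0  # observed: unknown codes get nothing

test_characterize_legacy_discount()
```

The point is to pin current behaviour in place first, even if it looks wrong, so that each small refactoring can be checked against it.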

I don’t advocate trying to refactor a complicated legacy mess into beautiful code all at once. What I prefer to do is tackle it in relevant pieces. Each time I pass through a section of the code, I try to make it a little bit better. If this is a large system, I’ll do more refactoring in areas I visit frequently — which is the right thing to do because, if I need to visit code frequently, I’ll get a bigger payoff by making it easier to understand.

DATABASES

As with regular refactoring, the key here is that each individual change is small yet captures a complete change, so the system still runs after applying the migration. Keeping them small means they are easy to write, but I can string many of them into a sequence that can make a significant change to the database’s structure and the data stored in it.

One difference from regular refactorings is that database changes often are best separated over multiple releases to production. This makes it easy to reverse any change that causes a problem in production. So, when renaming a field, my first commit would add the new database field but not use it.

I may then set up the updates so they update both old and new fields at once. I can then gradually move the readers over to the new field. Only once they have all moved to the new field, and I’ve given a little time for any bugs to show themselves, would I remove the now-unused old field.
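The staged rename described above can be sketched with SQLite — the table and column names are hypothetical, and each numbered step would normally ship as a separate release to production:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, cust_name TEXT)")
conn.execute("INSERT INTO customers (cust_name) VALUES ('Ada')")

# Release 1: add the new column, but nothing uses it yet.
conn.execute("ALTER TABLE customers ADD COLUMN customer_name TEXT")

# Release 2: writes go to both old and new columns; backfill existing rows.
conn.execute("UPDATE customers SET customer_name = cust_name")
conn.execute(
    "INSERT INTO customers (cust_name, customer_name) VALUES ('Grace', 'Grace')"
)

# Release 3: readers move over to the new column.
rows = conn.execute("SELECT customer_name FROM customers ORDER BY id").fetchall()
print(rows)  # [('Ada',), ('Grace',)]

# Release 4, after a soak period with no bugs surfacing:
# ALTER TABLE customers DROP COLUMN cust_name  (requires SQLite 3.35+)
```

Because the system keeps running after each small migration, any step that causes trouble in production can be reversed on its own, without unwinding the whole rename.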

End of Part II, continue on Part III
