Understand legacy code by refactoring it

Jordan Chapuy
4 min read · Aug 25, 2020

I’m currently working on an old project with a painful legacy. I often deal with code that I don’t understand and can’t get help with: other developers don’t know much more than I do, and Product Owners have lost the functional knowledge.

It’s a common observation. Legacy code is surrounded by a loss of technical and business knowledge. The team no longer understands what’s going on in the code, and doesn’t know what it should do functionally.

At such times, the only remaining answer is in the code. The code is the only truth. The truth of the software in production. But understanding legacy code is difficult. So, how do we get it to talk?

My two favorite methods are drawing and refactoring. Recently, I used refactoring to gain understanding, and it inspired me to write about it.

Gaining understanding

If modifying code to better understand it surprises you, let’s look at a (simple) definition of refactoring: it’s improving the code without changing its behavior. Making code more readable, easier to understand, is improving the code. This is exactly what we are looking for.

When I’m stuck on a block of code, I modify it. I try to make it my own, to gain understanding. If I break something, it doesn’t matter: I can go back thanks to version control.

What kind of refactoring can I do? I’d say anything that helps us. To illustrate my use of the different refactoring techniques, I wrote a small piece of code. This will be our legacy.

I often start with simple actions like renaming. Renaming a variable, a function, a class. I also do some cleaning: by deleting unused code I remove the superfluous.
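A minimal sketch of these first steps, written in Python with made-up names (it is an illustration only, not the project’s actual code). The same function is shown before and after renaming and removing dead code:

```python
# Before: cryptic names and an unused variable (hypothetical legacy code).
def calc(p, l):
    tmp = []  # never read anywhere: dead code
    t = 0
    for x in l:
        t += x
    return t / p if p else 0


# After: same behavior, but the names now carry the intent
# and the unused variable is gone.
def average_score_per_player(player_count, scores):
    total_score = 0
    for score in scores:
        total_score += score
    return total_score / player_count if player_count else 0
```

Nothing was added or redesigned here. The only change is that the code now says what it does.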

I like the invert if statement refactoring. I sometimes have trouble reading negative conditions, especially complex ones. I first need to understand the condition, map it mentally, then see the negation, reverse the whole logic… and finally get lost. Sometimes I even miss that tiny exclamation point stuck too close to the variable (ring the bell and say shame). So I invert the if statement. It becomes clearer, I know why I’m going into the if body, and it can reduce nesting.
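Here is a small Python sketch of the idea. The function and its parameters are hypothetical, chosen only to show the shape of the change:

```python
# Before: two negated conditions and two levels of nesting.
def bonus_before(is_injured, score):
    if not is_injured:
        if not score <= 10:
            return score * 2
        else:
            return score
    else:
        return 0


# After: each condition is stated positively. The inverted outer "if"
# becomes an early return, which removes one level of nesting.
def bonus_after(is_injured, score):
    if is_injured:
        return 0
    if score > 10:
        return score * 2
    return score
```

Both versions return the same values for every input. Only the reading effort changes.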

I’m also looking for better structuring and splitting to reduce the mental load needed to understand the code. Speaking of ifs, I like to use extract method to make a condition explicit. I also use extract method on the body of long functions or loops. The code becomes more readable, the function is shorter, and I gain insight into the intention. I gain understanding.
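As a rough illustration, here is extract method applied to both a condition and a loop body. This is a Python sketch, and the dictionary keys and function names are assumptions made up for the example:

```python
# Before: the reader has to decode the condition and the loop body in place.
def total_bonus_before(players):
    total = 0
    for p in players:
        if p["games_played"] > 0 and not p["injured"] and p["score"] > 10:
            total += p["score"] * 2
    return total


# After: the condition and the body each get a name that states the intent.
def is_eligible_for_bonus(player):
    return (
        player["games_played"] > 0
        and not player["injured"]
        and player["score"] > 10
    )


def bonus_for(player):
    return player["score"] * 2


def total_bonus_after(players):
    total = 0
    for player in players:
        if is_eligible_for_bonus(player):
            total += bonus_for(player)
    return total
```

The loop now reads almost like a sentence, and each extracted function can be studied on its own.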

For an even deeper restructuring, I sometimes use extract class or move method to organize the functions and improve readability. For example, here I could move some methods or create extensions on the Player class. The design of the code would be improved, but it wouldn’t necessarily be a huge help for my understanding. You have to think about when to stop.
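To give an idea of what move method looks like, here is a small Python sketch. The Player class below is a hypothetical stand-in with invented fields, not the actual project code:

```python
class Player:
    """Hypothetical stand-in for a Player class, with invented fields."""

    def __init__(self, name, score, injured=False):
        self.name = name
        self.score = score
        self.injured = injured

    # After "move method": the rule lives next to the data it reads.
    def can_earn_bonus(self):
        return not self.injured and self.score > 10


# Before: a free helper, typically lost in a large unrelated file,
# that reaches into the Player fields from the outside.
def can_earn_bonus(player):
    return not player.injured and player.score > 10
```

Call sites change from `can_earn_bonus(player)` to `player.can_earn_bonus()`. The design is tidier, but the logic was already readable, which is why this kind of change does not always pay off when the only goal is understanding.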

With these kinds of modifications, I went from a very dense area of 3000 lines (difficult to map mentally, with a lot of things that were superfluous in my context) to a few small functions (much better for my mental load). I was able to focus my effort on the parts I was most interested in.

Of course, there are a whole bunch of other refactoring techniques. Take a look at the refactoring.com and refactoring.guru websites: they offer an interesting overview. If you want to go deeper, I recommend reading Martin Fowler’s book Refactoring: Improving the Design of Existing Code.

Taking liberties

I don’t necessarily keep my changes. Sometimes my main goal is to gain understanding. Refactoring is a way to achieve this, not the final purpose. Therefore, I sometimes take liberties with the code I write.

I’m not trying to make the best code or the best design. For example, I allow myself to pass inout parameters for simplicity. Creating functions with many parameters doesn’t bother me either.

I don’t write unit tests to guarantee that my modifications won’t change the behavior. Again, my goal is to understand. I don’t plan to keep this code, so the behavior of the software in production won’t be altered.

These liberties allow me to quickly get a better global understanding of a piece of legacy code. Going back is easy with version control: either erase everything or use another branch (which allows small comprehension commits).

Pay attention

Refactoring is a discipline where maintaining the same behavior is essential. Even when the modifications are intended to be erased, we must be careful not to go too fast and make a big mistake that would distort our understanding.

To minimize this risk, I use safe refactoring techniques as much as possible, and I keep the same method signatures so that the compiler checks my changes. Some IDEs also offer automated refactoring actions that minimize human errors.

If you take liberties to go faster and plan to discard the code, discard it. Don’t get attached to it. With the understanding you’ve gained, your second refactoring will be better.

In a nutshell

I use refactoring to generate understanding. To make the code more readable, to discover the intent. It’s relatively fast and extremely powerful. I appropriate the code and it becomes an interesting starting point to improve the legacy code.

I also uncover pain points, or areas that raise deep questions. This is just as important: it helps identify possible future problems and share them with the rest of the team.

Next time you are facing a difficult area, give it a try. Anyone can do it, in any language and on any platform. Refactoring is a well-known discipline, and so are its techniques. All you have to do is remember it and try it.

And you, what do you do to understand legacy code? What are your techniques?
