A primer on the noble craft of refactoring
“What in the world… is this refactoring thing?” The question hung in the air one crisp overcast morning, intermingled with the varied roars and toots of stressed-out commuters accelerating to merge onto the Brooklyn-Queens Expressway. I’d recently ventured out from the lonely homestead of solo-cowboy programming and moved to a big-ranch team of seasoned developers. The term “refactoring” ricocheted in and out of every other tech-speak paragraph while they wrangled big sections of legacy code and pondered the design implications of introducing a new feature into the system.
What is refactoring?
“It sounds like math jargon, but I can tell from the context that it’s not.”
It’s true, when you stop and think about the word, it sounds like math, namely “factoring” — breaking down a number into its factors. But then you add the “re-” into the mix and it sounds like something you’ve done before, and that you’re doing again. Truth be told, though refactoring has nothing to do with math per se, the link in terminology does provide a hint as to the purpose and practice of refactoring as executed on the grassy plains of software engineering.
In math, to factor a number generally refers to prime factorization: breaking it down into its prime factors. In a sense, that’s often what we’re doing with refactoring. You have a class or method that’s too big or overly complex, so you break it down. You take a chunk of code here and extract it into its own well-named method, you scoop up a collection of methods and fields in a class and extract them into their own more cohesive class, etc. Well-factored code is clean code — easy to understand, easy to test, easy to modify.
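To make "extract it into its own well-named method" concrete, here is a minimal sketch of Extract Method in Python. The invoice scenario and every name in it (`invoice_summary_before`, `subtotal`, `apply_bulk_discount`, `format_total`) are hypothetical, invented purely for illustration; the point is only that the big function and the factored one return the same thing.

```python
# Before: one function mixes totaling, discounting, and formatting.
def invoice_summary_before(prices, discount_rate):
    total = 0
    for price in prices:
        total += price
    if total > 100:
        total -= total * discount_rate
    return f"Total due: {total:.2f}"


# After: each chunk is extracted into its own well-named function.
def subtotal(prices):
    return sum(prices)


def apply_bulk_discount(amount, discount_rate, threshold=100):
    # Same rule as before: the discount applies only above the threshold.
    if amount > threshold:
        return amount - amount * discount_rate
    return amount


def format_total(amount):
    return f"Total due: {amount:.2f}"


def invoice_summary_after(prices, discount_rate):
    return format_total(apply_bulk_discount(subtotal(prices), discount_rate))
```

Each extracted function can now be read, tested, and changed on its own, while callers of the summary function see no difference.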
What about the “re” part? Here’s where the software term diverges from its etymological math roots. When factoring a number, it quickly becomes impossible to reduce it further when the only remaining factors are primes, since a prime number has no factors except for 1 and itself. When you’ve reached that point, you’re done. This is rarely the case with software. It’s quite possible to never stop refactoring. The code will never be perfect. You’ll always find ways to tweak and spit-polish it here and there. You can refactor and re-refactor and re-re-refactor until pigs fly. We actually need self-control to know when good enough is good enough.
A formal definition of refactoring
While you’re mulling that over, take a look at the closest thing to a formal definition of refactoring that we’re likely to get:
Refactoring (noun): a change made to the internal structure of software to make it easier to understand and cheaper to modify without changing its observable behavior.
Refactoring (verb): to restructure software by applying a series of refactorings without changing its observable behavior.
— Martin Fowler, Refactoring (1999)
(I recommend reading those two definitions again before proceeding.)
So according to Mr. Fowler’s excellent definition, the key to refactoring is that you’re tidying up (re-factoring) the internal structure of the software without changing its observable behavior. This can be done within any given layer of encapsulation, but only within one layer at a time.
Don’t change the interface
If you’re refactoring a class, then the class’ users have every right to expect the class’ interface to behave exactly as before. The same goes for a method; if you refactor a method that receives a number as a parameter and returns its square root, the users of the method shouldn’t expect that behavior to change after your little refactoring incursion. Again, if you refactor the GUI code, the GUI itself should look exactly the same when you’re done. To illustrate, if lock engineers suddenly adopt the practice of refactoring and refactor the lock mechanism in your front door, you have every right to expect the doorknob to look and function exactly the same as before. Perhaps the lock internals have been improved with higher quality metals, the pins and tumblers made less likely to jam in cold weather, but you shouldn’t expect to see odd protrusions from the doorknob itself, or to have to turn the knob in the opposite direction to open the door. The external interface doesn’t change with refactoring. If it does, it’s no longer a refactoring — it’s a modification. Never venture outside of the cozy confines of the target object’s interface.
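The square root example can be sketched in a few lines of Python. This is an illustration under assumptions, not anyone's production code: the Newton's-method implementation, the tolerance, and all of the names (`square_root_before`, `square_root_after`, `_is_close_enough`, `_improve`) are made up for the occasion. What matters is that the signature, the return values, and the error for negative input are identical before and after.

```python
# Before: correct, but the loop condition and update step are cryptic.
def square_root_before(x):
    if x < 0:
        raise ValueError("cannot take the square root of a negative number")
    if x == 0:
        return 0.0
    g = x
    while abs(g * g - x) > 1e-12 * x:
        g = (g + x / g) / 2
    return g


# After: same interface, same results; only the internals are tidier.
TOLERANCE = 1e-12  # relative tolerance, unchanged from the original


def _is_close_enough(guess, x):
    return abs(guess * guess - x) <= TOLERANCE * x


def _improve(guess, x):
    # One step of Newton's method for solving guess * guess == x.
    return (guess + x / guess) / 2


def square_root_after(x):
    if x < 0:
        raise ValueError("cannot take the square root of a negative number")
    if x == 0:
        return 0.0
    guess = x
    while not _is_close_enough(guess, x):
        guess = _improve(guess, x)
    return guess
```

The leading underscores mark the helpers as internal details. The doorknob, so to speak, is still `square_root(x)`.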
Finding areas in need of refactoring
How do you find areas in need of refactoring? According to Fowler, you follow your nose. Have you ever tried to make heads or tails of a “clever” piece of code and finally sat back in your chair and yelled, “This stinks!”? Well then, you likely discovered a code smell. As an example, you find that every time you make a change to class A, you also have to make a change to classes B, C, D, and Y. This is truly a bad smell for a number of reasons, and Fowler refers to this code smell as Shotgun Surgery. In order to rid the world of such malodorous code, you must unbuckle your saddlebags, haul out your heavily laden, weather-beaten old refactoring tackle box and lug it over to the source of the smell. Then, with the delicate care of an archaeologist, the patience of a fly fisherman, and the grit of a rodeo king, you apply the refactorings that will eliminate this odious aroma. You might use Move Method and Move Field to move all of the methods and fields that change together into the same place. If you find that you can’t find a good home for the poor entangled orphans, use Extract Class to create a new class for them, a new dream home that makes Charles Dickens’ pen itch.
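Here is a small, hypothetical picture of Shotgun Surgery and one way out of it. The tax scenario and all names (`InvoiceBefore`, `QuoteBefore`, `TaxPolicy`, and so on) are assumptions made up for this sketch. In the "before" half, the same tax rule lives in two classes, so a rate change means a two-class edit. In the "after" half, Extract Class gives the rule a single home.

```python
# Before: the sales-tax rule is copied into several classes, so changing
# the rate means editing every one of them (Shotgun Surgery).
class InvoiceBefore:
    def __init__(self, net):
        self.net = net

    def total(self):
        return round(self.net * 1.08, 2)


class QuoteBefore:
    def __init__(self, net):
        self.net = net

    def total(self):
        return round(self.net * 1.08, 2)


# After: Extract Class moves the rule into one place.
class TaxPolicy:
    def __init__(self, multiplier=1.08):
        self.multiplier = multiplier

    def gross(self, net):
        return round(net * self.multiplier, 2)


class InvoiceAfter:
    def __init__(self, net, tax_policy):
        self.net = net
        self.tax_policy = tax_policy

    def total(self):
        return self.tax_policy.gross(self.net)


class QuoteAfter:
    def __init__(self, net, tax_policy):
        self.net = net
        self.tax_policy = tax_policy

    def total(self):
        return self.tax_policy.gross(self.net)
```

A future rate change now touches `TaxPolicy` and nothing else, which is exactly the property Shotgun Surgery was missing.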
Every time you come upon some poorly crafted class or method and out comes wafting the fetor of eau de odoriférant toilette, you’re presented with a sadistic choice: take the time to banish the smell by refactoring the code you love, or… suffer the children. Mwahahaha. No really. Left untreated, these code smells often result in more work, lots more work, further down the line, and taken as a whole are often referred to as the technical debt of your software. The more features and patches you throw at the code without fixing its underlying structural problems, the more interest accumulates on that debt, and the longer and harder you and your team will have to work to pay it off. In fact it could result in the literal bankruptcy of your project. Of course in many cases you didn’t introduce the smelly code; you inherited the sins of your predecessors through no fault of your own, but you still have a choice: You can let the smell fester and hope that your successors are understanding when eulogizing you under their breath, or… you can clean the code up now, choose to be the catalyst for change, the transitional character in the genealogy of your product, the grizzled old engineer that draws a line in the sand and screams defiantly at the stinking, putrid, rotting fish-gut garbage dump of spaghetti code chaos, “YOU — SHALL — NOT — PASS!!!”
Don’t overdo it
This is probably a good place to mention that while you are embroiled in this noble crusade of refactoring, to the project stakeholders it may look as though you’re accomplishing nothing. Why? Because quite frankly, you’re not accomplishing anything that moves the project closer to meeting its objectives. Refactoring is in some ways similar to cleaning up a construction site before and after work. If you don’t do it, construction will take longer than necessary because of lost time looking for tools, tripping over scrap lumber, injuries, etc. But if taken too far, site cleanup can take an exorbitant amount of time and greatly slow down a construction project. If you stop to completely sweep the area clean every time you cut a 2x4, the lumber is probably going to rot before you get a roof on the thing. Much like a skilled craftsman, a skilled developer must always keep the big picture clearly in mind, the Big Why, or he risks getting too sidetracked or even obsessive-compulsive about refactoring. In other words, in professional programming, we don’t write code in order to refactor — we refactor in order to write code.
“But how can I refactor safely, without shooting myself in the foot?”
If you’re brand new to refactoring, you’ll have a tickling, nagging doubt regarding the safety of the whole process. How do I refactor safely? How do I know I’m not introducing bugs into the system when I use, for example, Extract Class, and rip the guts out of a bloated class, carefully forming them into their own new class? The answer of course, is an automated suite of comprehensive tests. Whether you write the tests upfront, or rough them in after development, you need them if you are going to attempt even the simplest of refactorings. As Fowler puts it, good refactoring has a certain rhythm to it: Test, small change, test, small change, test, small change, test. (Note that you test first or you won’t be certain that it wasn’t already broken.) Just imagine Bill Murray in What About Bob? “Baby steps. Baby steps. Baby steps.”
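The "test, small change, test" rhythm might look something like the sketch below. Everything in it is hypothetical: the shipping-cost rules, the function names, and the practice of keeping each step as a separate `_v0`, `_v1`, `_v2` function (in real life you would edit one function in place and rerun your test suite). The plain `assert` statements stand in for whatever test framework you actually use.

```python
# Step 0: the original, slightly tangled implementation.
def shipping_cost_v0(weight_kg, express):
    if express:
        if weight_kg > 10:
            return 20 + weight_kg * 3
        else:
            return 20 + weight_kg * 2
    else:
        if weight_kg > 10:
            return 5 + weight_kg * 3
        else:
            return 5 + weight_kg * 2


# Small change 1: extract the per-kilogram rate.
def _rate_per_kg(weight_kg):
    return 3 if weight_kg > 10 else 2


def shipping_cost_v1(weight_kg, express):
    if express:
        return 20 + weight_kg * _rate_per_kg(weight_kg)
    return 5 + weight_kg * _rate_per_kg(weight_kg)


# Small change 2: extract the base fee.
def _base_fee(express):
    return 20 if express else 5


def shipping_cost_v2(weight_kg, express):
    return _base_fee(express) + weight_kg * _rate_per_kg(weight_kg)


def run_tests(shipping_cost):
    # The same checks run before the first change and after every step.
    assert shipping_cost(1, False) == 7
    assert shipping_cost(10, False) == 25
    assert shipping_cost(11, False) == 38
    assert shipping_cost(1, True) == 22
    assert shipping_cost(11, True) == 53
    return True


# Test, small change, test, small change, test.
for version in (shipping_cost_v0, shipping_cost_v1, shipping_cost_v2):
    run_tests(version)
```

Note that the tests run against `shipping_cost_v0` first. If they fail there, the code was already broken and refactoring is not yet the job at hand.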
What Refactoring is Not
Now that we’ve seen what refactoring is, we should take a minute to review what it is not.
Refactoring is not fixing bugs
Speaking of tests, this brings me to a good point: refactoring is not fixing bugs. To prove this to yourself, just think again about the definition of refactoring: changing the internal structure of software to make it easier to understand and cheaper to modify without changing its observable behavior. If your software has a bug, it is no doubt manifesting itself in some observable way. So if you refactor a bug, you are making the buggy code easier to understand and cheaper to modify without changing its observable behavior. So the bug is still there, you just made the underlying code prettier. How nice.
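A tiny, made-up example shows the difference. The leap-year functions below are hypothetical; the first one has a deliberate bug (it forgets that century years divisible by 400 are leap years). The refactored version is much easier on the eyes and faithfully preserves the bug. Only the third version, which is a behavior change and therefore not a refactoring, fixes it.

```python
def is_leap_year_before(year):
    # Bug: century years divisible by 400 (such as 2000) should be leap years.
    if year % 4 == 0:
        if year % 100 == 0:
            return False
        else:
            return True
    else:
        return False


def is_leap_year_refactored(year):
    # Prettier code, identical observable behavior, identical bug.
    return year % 4 == 0 and year % 100 != 0


def is_leap_year_fixed(year):
    # A bug fix: the observable behavior changes for years like 2000.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
```

On the bright side, the bug is a lot easier to spot in the one-line version than in the nested one.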
Refactoring is not performance optimization
Another thing that refactoring is not is performance optimization. Perf optimization hacks often have a negative impact on code legibility (and very often are unnecessary and premature), so they don’t fit in with the definition of making the software easier to understand and cheaper to modify.
Perhaps understandably, when perusing the catalog of refactorings in Fowler’s book, you might worry about degrading performance with some refactorings. For instance, Replace Temp with Query. Oh no! I’ll call a method more than once! It’s true, but more often than not, performance is not negatively impacted in a measurable way. Unless you’re writing a tight loop for an encryption algorithm or trying to cook your own Bitcoin mining software, or you’re traveling back in time because you want to program like it’s 1999… it’s not going to matter. Like Donald Knuth said, “Premature optimization is the root of all evil”. If you write beautiful software and discover that some aspect of it is slow, measure the performance; don’t guess. Then you’ll see where the real bottlenecks are. If you find that 10% of the runtime is being absorbed by a single method, now might be the time to introduce a temporary variable or two. This would also be a good time to add a comment explaining why you’re degrading readability, or a fellow refactovangelist will sweep through and refactor out your perf hack.
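For the curious, here is a rough sketch of Replace Temp with Query along with the "measure, don’t guess" step. The `Order` class, its pricing rule, and the iteration count are assumptions chosen for illustration. The timing uses `timeit` from Python’s standard library, and the numbers it prints will vary from machine to machine, so none are quoted here.

```python
import timeit


class Order:
    def __init__(self, quantity, item_price):
        self.quantity = quantity
        self.item_price = item_price

    # Before: the intermediate result lives in a temporary variable.
    def price_with_temp(self):
        base_price = self.quantity * self.item_price
        if base_price > 1000:
            return base_price * 0.95
        return base_price * 0.98

    # After: the temp is replaced by a query method that other methods can
    # reuse, at the cost of computing the product more than once.
    def base_price(self):
        return self.quantity * self.item_price

    def price_with_query(self):
        if self.base_price() > 1000:
            return self.base_price() * 0.95
        return self.base_price() * 0.98


order = Order(50, 30)

# Measure instead of guessing: time both versions over many calls.
temp_seconds = timeit.timeit(order.price_with_temp, number=10_000)
query_seconds = timeit.timeit(order.price_with_query, number=10_000)
print(f"temp:  {temp_seconds:.4f}s for 10,000 calls")
print(f"query: {query_seconds:.4f}s for 10,000 calls")
```

If a profiler later shows that a method like `price_with_query` really is a hot spot, going back to the temp is a fair trade, ideally with a comment saying why.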
One last thought: if you haven’t read Martin Fowler’s book on the subject, it’s a must-read. Indeed, it’s one of the defining books on modern software development.
Enjoy the journey!