Refactoring Chapter 6 — A First Set of Refactorings

Rafael Melo
9 min read · Jan 23, 2020

--

I’m starting the catalog with a set of refactorings that I consider the most useful to learn first.

Extract Function

Extract Function is one of the most common refactorings that I do. (Here, I use the term “function” but the same is true for a method in an object-oriented language, or any kind of procedure or subroutine). I look at a fragment of code, understand what it is doing, then extract it into its own function named after its purpose.

If you have to spend effort looking at a fragment of code and figuring out what it’s doing, then you should extract it into a function and name the function after the “what.” Then, when you read it again, the purpose of the function leaps right out at you. To me, any function with more than half-a-dozen lines of code starts to smell, and it’s not unusual for me to have functions that are a single line of code.

Some people are concerned about short functions because they worry about the performance cost of a function call, but this is rarely an issue: optimizing compilers often work better with shorter functions, which can be cached more easily.

Keep in mind that small functions like this only work if the names are good, so you need to pay close attention to naming. This takes practice — but once you get good at it, this approach can make code remarkably self-documenting.
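A minimal sketch of the before and after, with made-up names (not the book’s worked example):

```typescript
interface LineItem { amount: number; }

// Before: the reader has to pause and work out what the loop is doing.
function printStatement(name: string, items: LineItem[]): void {
  let outstanding = 0;
  for (const item of items) {
    outstanding += item.amount;
  }
  console.log(`name: ${name}`);
  console.log(`outstanding: ${outstanding}`);
}

// After Extract Function: each fragment is named after its purpose,
// so the top-level function reads as a sequence of intentions.
function printStatementRefactored(name: string, items: LineItem[]): void {
  const outstanding = calculateOutstanding(items);
  printDetails(name, outstanding);
}

function calculateOutstanding(items: LineItem[]): number {
  return items.reduce((sum, item) => sum + item.amount, 0);
}

function printDetails(name: string, outstanding: number): void {
  console.log(`name: ${name}`);
  console.log(`outstanding: ${outstanding}`);
}
```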

Inline Function

Sometimes, I do come across a function in which the body is as clear as the name. Or, I refactor the body of the code into something that is just as clear as the name. When this happens, I get rid of the function.

I also use Inline Function when I have a group of functions that seem badly factored. I can inline them all into one big function and then re-extract the functions the way I prefer. It is also useful when it seems that every function does nothing but delegate to another function, and I get lost in all the delegation. By inlining the functions, I can flush out the useful ones and eliminate the rest.
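A small sketch with hypothetical names, showing a helper whose body says no more than its name:

```typescript
interface Customer { delinquentPayments: number; }

// Before: the helper adds a level of indirection without adding clarity.
function creditScore(customer: Customer): number {
  return hasManyDelinquencies(customer) ? 300 : 700;
}

function hasManyDelinquencies(customer: Customer): boolean {
  return customer.delinquentPayments > 5;
}

// After Inline Function: the body is as clear as the name was, so the helper goes away.
function creditScoreInlined(customer: Customer): number {
  return customer.delinquentPayments > 5 ? 300 : 700;
}
```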

Extract Variable

Expressions can become very complex and hard to read. In such situations, local variables may help break the expression down into something more manageable. In particular, they give me an ability to name a part of a more complex piece of logic. This allows me to better understand the purpose of what’s happening.

If I’m considering Extract Variable, it means I want to add a name to an expression in my code. If it’s only meaningful within the function I’m working on, then Extract Variable is a good choice — but if it makes sense in a broader context, I’ll consider making the name available in that broader context, usually as a function.

If the name is available more widely, then other code can use that expression without having to repeat the expression, leading to less duplication and a better statement of my intent.
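A sketch of what this can look like, with an invented pricing formula purely for illustration:

```typescript
interface Order { quantity: number; itemPrice: number; }

// Before: one dense expression the reader has to decode in their head.
function price(order: Order): number {
  return order.quantity * order.itemPrice -
    Math.max(0, order.quantity - 500) * order.itemPrice * 0.05 +
    Math.min(order.quantity * order.itemPrice * 0.1, 100);
}

// After Extract Variable: each named part states its purpose.
function priceRefactored(order: Order): number {
  const basePrice = order.quantity * order.itemPrice;
  const quantityDiscount = Math.max(0, order.quantity - 500) * order.itemPrice * 0.05;
  const shipping = Math.min(basePrice * 0.1, 100);
  return basePrice - quantityDiscount + shipping;
}
```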

Inline Variable

Variables provide names for expressions within a function, and as such they are usually a Good Thing. But sometimes, the name doesn’t really communicate more than the expression itself. At other times, you may find that a variable gets in the way of refactoring the neighboring code. In these cases, it can be useful to inline the variable.
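A tiny hypothetical sketch of the case where the name adds nothing over the expression:

```typescript
interface Order { basePrice: number; }

// Before: the variable restates the expression without explaining it.
function isExpensive(order: Order): boolean {
  const basePrice = order.basePrice;
  return basePrice > 1000;
}

// After Inline Variable:
function isExpensiveInlined(order: Order): boolean {
  return order.basePrice > 1000;
}
```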

Change Function Declaration

Function declarations represent the joints in our software systems. Good joints allow me to add new parts to the system easily, but bad ones are a constant source of difficulty, making it harder to figure out what the software does and how to modify it as my needs change.

The most important element of such a joint is the name of the function. A good name allows me to understand what the function does when I see it called, without seeing the code that defines its implementation. If I see a function with the wrong name, it is imperative that I change it as soon as I understand what a better name could be. That way, the next time I’m looking at this code, I don’t have to figure out again what’s going on.

Similar logic applies to a function’s parameters. If I have a function to format a person’s telephone number, and that function takes a person as its argument, then I can’t use it to format a company’s telephone number. If I replace the person parameter with the telephone number itself, then the formatting code is more widely useful.

In most of the refactorings in this book, I present only a single set of mechanics. Change Function Declaration, however, is an exception.

- RENAMING A FUNCTION WITH SIMPLE MECHANICS

Suppose I have a poorly named function such as circum, which calculates a circumference. With the simple mechanics, I change the declaration to circumference, then find all the callers of circum and change them to use the new name.

I use the same approach for adding or removing parameters: find all the callers, change the declaration, and change the callers. It’s often better to do these as separate steps — so, if I’m both renaming the function and adding a parameter, I first do the rename, test, then add the parameter, and test again.
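A sketch of the simple mechanics, assuming a small circum function and a single caller:

```typescript
// Before: a poorly named function and one of its callers.
function circum(radius: number): number {
  return 2 * Math.PI * radius;
}
const fence = circum(5);

// After: the declaration and every call site are changed in the same step.
function circumference(radius: number): number {
  return 2 * Math.PI * radius;
}
const fenceLength = circumference(5);
```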

- RENAMING A FUNCTION WITH MIGRATION MECHANICS

With the migration mechanics, I first create the new function, typically by extracting the body of the old one into it. I then find all the calls of the old function and replace each one with a call of the new one. I can pause the refactoring after creating circumference and, if possible, mark circum as deprecated. I will then wait for callers to change to use circumference; once they do, I can delete circum. Even if I’m never able to reach the happy point of deleting circum, at least I have a better name for new code.
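A sketch of the intermediate state, where the old name simply forwards to the new one:

```typescript
// The new, well-named function carries the real implementation.
function circumference(radius: number): number {
  return 2 * Math.PI * radius;
}

/** @deprecated Use circumference instead. */
function circum(radius: number): number {
  return circumference(radius);   // forwarding keeps old callers working during the migration
}
```

The forwarding function buys time: existing callers keep compiling while new code adopts the better name.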

Automated refactoring tools make the migration mechanics both less necessary and more effective. They make it less necessary because they can handle even complicated renames and parameter changes safely, so I don’t have to use the migration approach as often as I would without that support. However, in cases where the tools can’t do the whole refactoring, they still make it much easier, as the key moves of extract and inline can be done more quickly and safely with the tool.

Encapsulate Variable

Refactoring is all about manipulating the elements of our programs. Since using a function usually means calling it, I can easily rename or move a function while keeping the old function intact as a forwarding function, as you saw in the last section.

Data is more awkward because I can’t do that. If I move data around, I have to change all the references to the data in a single cycle to keep the code working. For data with a very small scope of access, such as a temporary variable in a small function, this isn’t a problem. But as the scope grows, so does the difficulty, which is why global data is such a pain.

So if I want to move widely accessed data, often the best approach is to first encapsulate it by routing all its access through functions. That way, I turn the difficult task of reorganising data into the simpler task of reorganising functions.
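A hypothetical sketch of routing access to a shared variable through functions (the names and data are made up):

```typescript
// Before: any module can read or assign defaultOwner directly.
let defaultOwner = { firstName: "Ada", lastName: "Lovelace" };

// After Encapsulate Variable: every access goes through a function, which gives me
// one place to later add validation, change the representation, or return copies.
let ownerData = { firstName: "Ada", lastName: "Lovelace" };

export function getDefaultOwner() {
  return { ...ownerData };   // hand out a copy so callers can't mutate the shared data
}

export function setDefaultOwner(owner: { firstName: string; lastName: string }): void {
  ownerData = owner;
}
```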

My approach with legacy code is that whenever I need to change or add a new reference to a data variable, I should take the opportunity to encapsulate it. That way I prevent coupling to commonly used data from growing. This principle is why the object-oriented approach puts so much emphasis on keeping an object’s data private.

Encapsulating data is valuable, but often not straightforward. Exactly what to encapsulate — and how to do it — depends on the way the data is being used and the changes I have in mind. But the more widely it’s used, the more it’s worth my attention to encapsulate properly.

Keeping data encapsulated is much less important for immutable data. When the data doesn’t change, I don’t need a place to put in validation or other logic hooks before updates. I can also freely copy the data rather than move it — so I don’t have to change references from old locations, nor do I worry about sections of code getting stale data.

Immutability is a powerful preservative.

Rename Variable

Naming things well is the heart of clear programming. But, even more than most program elements, the importance of a name depends on how widely it’s used. A variable used in a one-line anonymous expression is usually easy to follow — I often use a single letter in that case since the variable’s purpose is clear from its context. Parameters for short functions can often be terse for the same reason, although in a dynamically typed language like JavaScript, I do like to put the type into the name.

Persistent fields that last beyond a single function invocation require more careful naming. This is where I’m likely to put most of my attention.
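A small hypothetical sketch of how scope changes how much a name has to carry:

```typescript
// A single letter is fine here: n lives for one tiny expression.
const doubled = [1, 2, 3].map(n => n * 2);

class Invoice {
  // A field outlives any single function call, so a cryptic name like "dt" costs
  // every future reader; renaming it to dueDate pays for itself quickly.
  private dueDate: Date;

  constructor(dueDate: Date) {
    this.dueDate = dueDate;
  }

  isOverdue(now: Date): boolean {
    return now > this.dueDate;
  }
}
```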

Introduce Parameter Object

I often see groups of data items that regularly travel together, appearing in function after function. Such a group is a data clump, and I like to replace it with a single data structure.

Grouping data into a structure is valuable because it makes explicit the relationship between the data items. It reduces the size of parameter lists for any function that uses the new structure. It helps consistency since all functions that use the structure will use the same names to get at its elements.
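A hypothetical sketch: a min/max pair that always travels together becomes a single object.

```typescript
interface Reading { temperature: number; time: Date; }

// Before: the clump shows up in every signature that needs the range.
function readingsOutsideRangeBefore(readings: Reading[], min: number, max: number): Reading[] {
  return readings.filter(r => r.temperature < min || r.temperature > max);
}

// After Introduce Parameter Object: the relationship between min and max is explicit,
// and behavior can start to gather on the new structure.
class NumberRange {
  constructor(readonly min: number, readonly max: number) {}

  contains(value: number): boolean {
    return value >= this.min && value <= this.max;
  }
}

function readingsOutsideRange(readings: Reading[], range: NumberRange): Reading[] {
  return readings.filter(r => !range.contains(r.temperature));
}
```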

When I identify these new structures, I can reorient the behavior of the program to use these structures. I will create functions that capture the common behavior over this data — either as a set of common functions or as a class that combines the data structure with these functions. This process can change the conceptual picture of the code, raising these structures as new abstractions that can greatly simplify my understanding of the domain.

When this works, it can have surprisingly powerful effects.

Combine Functions Into Class

When I see a group of functions that operate closely together on a common body of data (usually passed as arguments to the function call), I see an opportunity to form a class. Using a class makes the common environment that these functions share more explicit, allows me to simplify function calls inside the object by removing many of the arguments, and provides a reference to pass such an object to other parts of the system.

This refactoring also provides a good opportunity to identify other bits of computation and refactor them into methods on the new class.

As well as a class, functions like this can also be combined into a nested function. Usually I prefer a class to a nested function, as it can be difficult to test functions nested within another.
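A hypothetical sketch: functions that all take the same reading data become methods, so the data is passed once, at construction time. The rates below are invented purely for illustration.

```typescript
interface Reading { customer: string; quantity: number; }

class ReadingSummary {
  constructor(private reading: Reading) {}

  get baseCharge(): number {
    return this.reading.quantity * 0.1;        // assumed flat rate, for illustration only
  }

  get taxableCharge(): number {
    return Math.max(0, this.baseCharge - 5);   // assumed tax-free threshold
  }
}

// Callers no longer thread the reading through every function call.
const summary = new ReadingSummary({ customer: "ivan", quantity: 100 });
console.log(summary.baseCharge, summary.taxableCharge);
```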

Combine Functions Into Transform

Software often involves feeding data into programs that calculate various derived information from it. These derived values may be needed in several places, and those calculations are often repeated wherever the derived data is used. I prefer to bring all of these derivations together, so I have a consistent place to find and update them and avoid any duplicate logic.

One way to do this is to use a data transformation function that takes the source data as input and calculates all the derivations, putting each derived value as a field in the output data. Then, to examine the derivations, all I need do is look at the transform function.
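A hypothetical sketch: one transform computes every derived value and returns an enriched copy of the source record (the rates are made up).

```typescript
interface Reading { customer: string; quantity: number; }
interface EnrichedReading extends Reading { baseCharge: number; taxableCharge: number; }

function enrichReading(original: Reading): EnrichedReading {
  const baseCharge = original.quantity * 0.1;          // assumed rate, for illustration
  const taxableCharge = Math.max(0, baseCharge - 5);   // assumed threshold
  return { ...original, baseCharge, taxableCharge };
}

// Every consumer reads the same derived fields from the enriched record.
const reading = enrichReading({ customer: "ivan", quantity: 100 });
console.log(reading.taxableCharge);
```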

An alternative to Combine Functions into Transform is Combine Functions into Class that moves the logic into methods on a class formed from the source data, and my choice will often depend on the style of programming already in the software. Using a class is much better if the source data gets updated within the code. Using a transform stores derived data in the new record, so if the source data changes, I will run into inconsistencies.

If I’m in a language with immutable data structures, I don’t have this problem, so it’s more common to see transforms in those languages. But even in languages without immutability, I can use transforms if the data appears in a read-only context, such as deriving data to display on a web page.

Split Phase

When I run into code that’s dealing with two different things, I look for a way to split it into separate modules. I endeavor to make this split because, if I need to make a change, I can deal with each topic separately and not have to hold both in my head together.

One of the neatest ways to do a split like this is to divide the behavior into two sequential phases. You can massage the input into a convenient form before the main processing begins. Or, you can take the logic you need to carry out and break it down into sequential steps, where each step is significantly different in what it does.

The most obvious example of this is a compiler. Over time, we’ve found this can be usefully split into a chain of phases: tokenizing the text, parsing the tokens into a syntax tree, then various steps of transforming the syntax tree (e.g., for optimization), and finally generating the object code. Each step has a limited scope and I can think of one step without understanding the details of others.

Splitting phases like this is common in large software; the various phases in a compiler can each contain many functions and classes. But I can carry out the basic split-phase refactoring on any fragment of code — whenever I see an opportunity to usefully separate the code into different phases. The best clue is when different stages of the fragment use different sets of data and functions. By turning them into separate modules I can make this difference explicit, revealing the difference in the code.
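A hypothetical sketch on a much smaller scale than a compiler: parsing the raw input is one phase, pricing is another, with a small intermediate structure handed between them.

```typescript
interface OrderData { productId: string; quantity: number; }

// Phase 1: massage the input string into a convenient form.
function parseOrder(line: string): OrderData {
  const [productId, quantityText] = line.split(" ");
  return { productId, quantity: Number(quantityText) };
}

// Phase 2: the calculation only sees the parsed structure and the price list.
function priceOrder(order: OrderData, priceList: Record<string, number>): number {
  return order.quantity * (priceList[order.productId] ?? 0);
}

console.log(priceOrder(parseOrder("apple 3"), { apple: 2.5 }));   // 7.5
```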
