Clean your kitchen, refactor your code

Chores at a high level

Yordan Yordanov
12 min read · Jan 10, 2023

Every so often, you may find yourself in the corner of your project’s code base where you don’t feel comfortable. The thought of making a change there automatically triggers you to go to the coffee machine first instead of your IDE. Maybe it’s the line length, maybe the size of the methods — they don’t seem to fit your monitor, and you are struggling to grasp what is going on in this code. Maybe it’s the naming of things, or maybe the sheer complexity and lack of any sensible abstraction. Or maybe something else, but you don’t like spending time there.

These are all good signs that you have a mess in your code that needs refactoring: the discipline of changing existing code towards a specific goal without changing its functionality. These goals can include improved readability, reduced complexity, and so on. They usually all boil down to making your codebase easier to understand and thus maintain.

Much like with a dirty kitchen, you could have ended up like this in many ways. You could have got there gradually, leaving dish after dish in the sink until it became a breeding ground for germs. You could have been in a rush for several mornings and left the spent coffee in your coffee maker to grow a garden of its own. Someone you live with could have cooked and not cleaned up after themselves. Or, you could have just moved in and found the kitchen in a disastrous state. Very similar things can happen in a codebase.

Why should you care?

Leaving the benefits of basic hygiene aside for a second, maintaining a healthy codebase is essential for several reasons:

  • As it grows and becomes more complex, it can become increasingly difficult for developers to understand and modify. This can lead to a phenomenon known as “cognitive burden”, where every new change takes longer and becomes more tedious to implement.
  • If it is not well-maintained, it can be prone to introducing unexpected bugs when new changes are made. This can make it difficult for developers to predict the impact of their changes and can lead to a lot of wasted time debugging.
  • If it’s not healthy, it may contain a large number of bugs that are difficult to understand or reproduce. This can make it hard to identify and fix these bugs, leading to a sense of frustration and a decrease in developer productivity.

After all, code is meant to be understood by humans first and machines second. Humans usually deal better with clean, healthy code. Machines don’t mind unruly code as long as it is correct.

Codebase health can be difficult to define, but several key properties can help quantify it. These include size, complexity, correctness, and performance, and there are a number of metrics that can help measure them.

Codebase’s vitals

While Lines of Code is probably the most basic and straightforward metric there is, it can still tell you how things are going. It is best used in conjunction with other metrics to get any real benefit out of it. For example, knowing only that the line count of a certain project has dropped doesn’t tell you much. It could be that a developer simply used more complex expressions in their implementation, or it could be a genuine improvement in the readability or correctness of the code.

Various metrics are used to quantify the readability and complexity of software and to identify areas that may be more prone to bugs and maintenance issues. Cosmetic problems in code can include issues with formatting, naming conventions, and structure. These can be caught either by eye (e.g. while doing a code review) or by various language- or framework-specific tools, such as code linters and formatters, which can be easily integrated into your pipeline. Additionally, it is important to establish consistent naming conventions, an overall style, and a process for commenting (and for removing stale comments) to maintain a readable, simple and maintainable codebase.

Measuring complexity is a different beast. One of the earliest methods for measuring code complexity is the Halstead Complexity Measures. These are based on the number of distinct operators and operands used in the code and include measures such as the number of distinct operators, the number of distinct operands, and the program vocabulary.
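
As a minimal sketch of how these numbers combine, using the standard Halstead definitions (n1, n2 for distinct operators and operands; N1, N2 for their total occurrences):

import math

def halstead_measures(n1, n2, N1, N2):
    # n1, n2: distinct operators/operands; N1, N2: total occurrences
    vocabulary = n1 + n2
    length = N1 + N2
    volume = length * math.log2(vocabulary)   # program "size" in bits
    difficulty = (n1 / 2) * (N2 / n2)         # proneness to error
    effort = difficulty * volume              # estimated comprehension effort
    return vocabulary, length, volume, difficulty, effort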

While this was handy in the early days of imperative programming, nowadays these measures only partially capture the complexity of a codebase, missing things like control structure and data flow. Cyclomatic Complexity was developed around the same time. It measures the number of linearly independent paths through a program’s source code, calculated by counting the number of decision points (e.g., if statements) in the code.

To illustrate the difference in those two metrics, let’s consider the following short example written in Python.

def check_items(items, target):
    for item in items:
        if item == target:
            return True
        elif item < target:
            continue
        else:
            return False
    return None

In this example, the Halstead measures indicate that the function is relatively simple, with a small number of distinct operators and operands and a small vocabulary, so it would be relatively easy to understand and maintain. On the other hand, the cyclomatic complexity is high (with 4 independent paths), meaning there are many branches in the function, and it could be difficult to understand and test all possible cases. Changes made to this function in the future would be more likely to introduce bugs if the codebase is not well understood. It could go like this:

def check_items(items, target):
    for item in items:
        if item == target:
            return True
        elif item < target:
            continue
        else:  # item > target
            # note: this branch assumes item is itself a collection
            for nested_item in item:
                if nested_item == target:
                    return True
            return False
    return None

Halstead metrics are almost the same in the updated version, while the cyclomatic complexity increased to 6, or by 50%. We could be in the opposite situation, of course. Consider the following example:

def find_frequent_items(items, n):
    item_count = {}

    for item in items:
        if item not in item_count:
            item_count[item] = 1
        else:
            item_count[item] += 1

    frequent_items = sorted(
        item_count.items(),
        key=lambda x: x[1],
        reverse=True
    )[:n]

    return [item[0] for item in frequent_items]

With only a couple of decision points in this code, cyclomatic complexity stays low here, while the Halstead metrics indicate that the function is relatively complex, with a high number of distinct operators and operands, a large vocabulary and a high volume, so it would be relatively difficult to understand and maintain.
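
If you’d rather not count operators by hand, tools exist for most languages; for Python, the third-party radon package computes both families of metrics. A sketch, assuming radon is installed (its API may differ between versions):

# pip install radon  (third-party tool; calls sketched from its documentation)
from radon.complexity import cc_visit
from radon.metrics import h_visit

with open("my_module.py") as f:       # hypothetical module to analyse
    source = f.read()

for block in cc_visit(source):        # cyclomatic complexity per function
    print(block.name, block.complexity)

print(h_visit(source).total)          # aggregated Halstead measures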

Things got even more complicated with the evolution of programming styles. A wide variety of object-oriented complexity metrics followed, most of which are heavily critiqued due to their incompleteness (i.e. openness to interpretation) and failure to actually deliver what they promised (check out this study for more on this). That is not to say they are completely useless — it ultimately depends on how you define complexity for your case and what exactly you want to measure.

The Maintainability Index tries to provide a more holistic view of how complex and maintainable your code is by incorporating cyclomatic complexity and Halstead’s Volume together with lines of code and the percentage of comments. Cognitive Complexity, on the other hand, is a more recent approach that focuses on a human’s ability to reason about a piece of code. It takes into account factors such as the control flow of the code, the level of nesting, the level of abstraction, and the presence of cosmetic issues such as poor naming conventions and code structure.
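
One commonly cited formulation of the index looks like this; individual tools rescale or drop terms, so treat it as illustrative:

import math

def maintainability_index(volume, cyclomatic, loc, percent_comments):
    # Classic formulation; coefficients vary between tools.
    return (171
            - 5.2 * math.log(volume)      # Halstead Volume
            - 0.23 * cyclomatic           # cyclomatic complexity
            - 16.2 * math.log(loc)        # lines of code
            + 50 * math.sin(math.sqrt(2.4 * percent_comments)))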

But how readable and simple a codebase is says nothing about whether it does the right thing. The correctness of the code can be verified by different types of testing, and Test Coverage can be viewed as a measure of how well the code is covered by automated tests. While there is not always a direct relationship between test coverage and the overall health of the codebase (it is possible to have a clean, well-maintained codebase that is not covered by automated tests), a higher level of test coverage is a good indication that the codebase is in fact healthy. A well-constructed Testing Pyramid (rather than an “ice cream cone,” where there are many more high-level tests than low-level tests) is often a sign of a well-maintained codebase.

And finally, apart from readability and correctness, you may also want to improve the performance of your code. Different performance metrics can be used depending on the specific implementation and the properties that are of concern. Examples of common performance metrics include response time, throughput, availability, accuracy, and latency. These topics are broad in their own right and will not be covered here.

It is important to use multiple metrics to get a complete understanding of the health of a codebase. Each metric provides a different perspective on the code and highlights different issues that need to be addressed. For example, while cyclomatic complexity can provide insight into the structural complexity of the code, cognitive complexity can provide insight into how easy or difficult the code is to understand and reason about, and a high response time could mean a bad user experience. One way to make it easy to monitor these metrics is by integrating them into a CI/CD pipeline. This allows developers to get real-time feedback on the health of the codebase and take action to address any issues that are identified.
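
As a sketch of what such a pipeline gate might look like, reusing the radon tool from earlier with a hypothetical threshold (a real pipeline would run this as a step and fail on a non-zero exit code):

import sys
from radon.complexity import cc_visit   # third-party; see earlier sketch

MAX_COMPLEXITY = 10                      # hypothetical team-agreed threshold

with open("my_module.py") as f:          # hypothetical file under check
    source = f.read()

offenders = [b for b in cc_visit(source) if b.complexity > MAX_COMPLEXITY]
for block in offenders:
    print(f"{block.name}: complexity {block.complexity} > {MAX_COMPLEXITY}")
if offenders:
    sys.exit(1)                          # non-zero exit fails the pipeline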

It’s worth noting that having too many metrics that are not relevant to the specific project or not easy to understand can add noise and make it harder to get a clear picture of the project’s health. Therefore, it’s important to use relevant metrics that are easy to understand and interpret and that can be easily integrated into your pipelines.

Ok, let’s refactor… but how?

There are two ways to approach refactoring, and they are similar to how you would keep your kitchen clean.

The first approach is to do it frequently, similar to wiping down counters and removing any visible dirt after every use. In coding, this is often referred to as ‘The Boy Scout rule’: ensuring that code is left in a better state than when it was found. In this method, it’s important to ensure that the changes being made do not require any significant refactoring or introduce new technical debt. By making small, incremental changes to the codebase, developers can ensure that the codebase remains maintainable and easy to understand over time, reducing the risk of requiring a major refactoring effort later. Good coding conventions for your project, a practice of code reviews, and continuous tracking of (and acting on) the aforementioned metrics are a good way of achieving this.

But just as a kitchen will eventually require a deeper clean despite daily maintenance, code will also need deeper refactoring over time. Like how a deep cleaning of a kitchen involves tackling hard-to-reach places and reorganising, refactoring also involves a more thorough review and reorganisation of code.

Start with laying the foundation

Define the behaviour
When planning to rework a piece of code, one of the first tasks should be to define the behaviour of the code. This helps to ensure that the changes being made are aligned with the requirements and goals of the project. There are various ways to formalise the behaviour of the code, depending on the project and the team’s needs. Some teams may prefer to use formal diagrams, while others may find that user stories or simple sketches are more effective. The goal here is to have a clear and accurate representation of the behaviour of the code before beginning any refactoring or modification.

Find the edges
Just like when doing a puzzle you start by finding the frame pieces, when refactoring you want to frame what it is that you’re refactoring and find its edges. This can be especially challenging when dealing with spaghetti code that is poorly structured or difficult to understand. Aim to define the boundaries of what you are about to refactor. This may be anything from an API endpoint down to a single method, or something in between. Whatever it is, once you’ve identified it, you can formalise the interface, with its inputs and outputs, which will further help you understand the code and test it.
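
In Python, one lightweight way to nail down such an edge is to write the interface out explicitly, for example with typing.Protocol. A sketch with hypothetical names:

from typing import Protocol

class PriceCalculator(Protocol):
    # The edge of the code being refactored: explicit inputs and outputs.
    def calculate(self, order_lines: list[dict], currency: str) -> float:
        ...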

Ensure it’s appropriately tested
If you’ve reached the point of refactoring a big piece of code, it’s likely that the existing tests for that code may not be sufficient. In many cases, there may be only a small number of unit tests and a greater number of integration and end-to-end tests (a.k.a. the “Inverted Testing Pyramid”). There may even be low or no coverage at all.

The end game here is to flip the pyramid back so it sits on its base, but for a start, you must ensure that the existing suite reasonably covers all the behaviour you’re reworking. Having enough tests in place to check your work against will give you and your team the confidence to verify the correctness of the changes made. The number of tests required will vary largely from case to case, but it’s a good idea to balance the test pyramid where possible with an adequate amount of unit tests, integration tests, and end-to-end tests, as well as manual test scenarios.
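
One pragmatic way to build up that safety net is characterisation tests: pin down what the code actually does today, before changing anything. A sketch using pytest conventions, with a hypothetical stand-in for the legacy function:

def legacy_totals(amounts):
    # Hypothetical stand-in for the real legacy function being refactored.
    return sum(amounts)

def test_characterise_legacy_totals():
    # Pin down the current, observed behaviour, not the behaviour we wish for.
    assert legacy_totals([]) == 0
    assert legacy_totals([10, 20, 5]) == 35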

Define the architecture
A clear and well-defined architecture is crucial when refactoring a codebase, especially when making significant changes to the existing architecture. It’s likely that you’ve reached this point because you have a vision for how to improve the codebase, but it’s important to clearly outline and communicate this vision to the rest of the team. With a well-defined architecture, it becomes easier to scale and maintain the codebase over time and to add new features. Additionally, it allows for more targeted testing, enabling you to focus on specific components of the codebase instead of testing the entire codebase each time. By clearly defining the architecture, the entire team can work towards a common goal, which is important to ensure that the refactor is successful.

Get to work

Now that everything is in place, it is time to roll up your sleeves and do the work. There are a few refactoring strategies that can be employed in different situations.


Having enough tests and a clear goal architecture in hand puts us in a position where we can go ahead and implement the changes in a TDD manner.

Local improvements
If your goal is to tackle complexity, you can use techniques like “Extract Method”/“Composing Methods” and “Simplifying Conditional Expressions.” These aim to break down complex code into smaller, more manageable pieces, making it easier to understand and maintain. For example, with “Extract Method,” a complex method can be broken down into several smaller methods, each with a specific and well-defined purpose. Similarly, a tangle of conditional statements can be flattened with moves such as “Replace Nested Conditional with Guard Clause,” making the code more readable and maintainable, as in the sketch below. You can find a catalogue of similar techniques here.
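
For instance, here is what “Replace Nested Conditional with Guard Clause” can look like on a contrived example:

def payout_before(employee):
    if employee["active"]:
        if employee["retired"]:
            amount = 0
        else:
            amount = employee["salary"]
    else:
        amount = 0
    return amount

def payout_after(employee):
    # Guard clauses handle the edge cases up front, leaving one happy path.
    if not employee["active"]:
        return 0
    if employee["retired"]:
        return 0
    return employee["salary"]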

If your goal is to improve performance, you could look at techniques such as optimising algorithms, using more efficient data structures, and parallelising code. It’s important to note that performance optimisation is highly dependent on the specific use case and usage patterns of your system, and, just like with complexity, it’s good to measure the improvement at each iteration.
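
A classic example of the data-structure flavour: membership tests against a list are O(n) per lookup, while a set gives O(1). A small sketch (as always, profile your own workload before and after):

def find_known(items, known):
    # Convert once: set membership checks are O(1) instead of O(n) per lookup.
    known_set = set(known)
    return [item for item in items if item in known_set]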

Architectural strategies
Large-scale refactoring can be a significant undertaking, and it’s important to minimize the impact on the development and release process as much as possible.

One way to do this is by using Feature Toggles. Feature toggles are a technique where new features or changes to existing features are hidden behind a toggle that can be turned on or off as needed. This allows developers to merge work in progress into the main branch with the new code path switched off, so the development and release process can continue as normal, and the new features or refactored code can then be released incrementally by flipping the toggle, reducing the risk of disrupting the release cycle.
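
In its simplest form, a toggle is just a flag consulted at the branching point. A minimal sketch, assuming an environment variable as the flag and hypothetical pricing functions:

import os

# Hypothetical flag; in practice this often comes from a config service.
USE_NEW_PRICING = os.environ.get("USE_NEW_PRICING", "false").lower() == "true"

def legacy_price(order):
    return sum(line["amount"] for line in order)

def new_price(order):
    # The refactored path, shipped dark (identical here for brevity).
    return sum(line["amount"] for line in order)

def price(order):
    if USE_NEW_PRICING:
        return new_price(order)
    return legacy_price(order)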

Another technique that can be used is Branch by Abstraction. It involves creating an abstraction layer between the old and new code, which allows the new code to be developed and tested in parallel with the existing code. It can also be used to gradually introduce refactored code into the main branch, minimizing the risk of disrupting the release cycle.
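
A sketch of the same idea in code, with hypothetical names: callers depend only on the abstraction, so the implementations can be swapped underneath it:

class MessageStore:                    # the abstraction layer
    def save(self, message: str) -> None:
        raise NotImplementedError

class LegacyFileStore(MessageStore):   # existing behaviour, wrapped
    def save(self, message: str) -> None:
        with open("messages.log", "a") as f:
            f.write(message + "\n")

class NewDatabaseStore(MessageStore):  # built and tested in parallel
    def save(self, message: str) -> None:
        ...  # new implementation grows here behind the abstraction

store: MessageStore = LegacyFileStore()   # swap when the new store is ready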

Ultimately, the specific approach taken will depend on the needs of the project and the codebase, as well as the goals and constraints of the refactoring effort. With a clear goal architecture, sufficient tests, and a solid understanding of the codebase, it’s possible to successfully apply TDD and refactor a large-scale codebase in a controlled and incremental manner.

The takeaway

Refactoring is a process that can be applied to different areas of software development, and there are many tools available to help with it. It is an important aspect of the development cycle and it shouldn’t be done episodically and in isolated cases. It improves the overall quality and maintainability of the codebase, making it easier for developers to work on.

Now go clean your kitchen.
