Refactoring For Clean and Extensible Codebase

Himanshu Pandey
Tech @ Trell
9 min read · Dec 16, 2021

A few months ago, one of my colleagues wrote an article about the Importance of clean code. It hit close to home, as one of the services I maintain was becoming cluttered, and making changes or adding new features was taking longer and longer. I was already in the middle of a minor refactor and decided to go all out.

In this article, I will show some examples of how we went about refactoring our codebase to make things cleaner and lighter, and how that helped us deliver new features faster. This is still a WIP, as the codebase is fairly large and we are still finding things we can improve on.

This is going to be a long one so brace yourself and…

The If-Else Ladder 🪜

We will start with something simple that most of us have probably faced. One of the jobs our service does is to fetch data from a Kafka queue and process it further. Before processing, we have to filter trails based on some criteria. Initially, we had only two such filters. Our code looked something like this:
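The original gist is not shown here, so here is a minimal sketch of what that two-filter if-else might have looked like. The Trail type and the two filter conditions are my assumptions for illustration, not the real ones.

```go
package main

import "fmt"

// Trail is a stand-in for the event type consumed from Kafka.
type Trail struct {
	UserID   string
	Category string
}

// shouldProcess is the original shape: one if per filter.
func shouldProcess(t Trail) bool {
	// Filter 1: drop trails without a user ID.
	if t.UserID == "" {
		return false
	}
	// Filter 2: drop trails from a blocked category.
	if t.Category == "blocked" {
		return false
	}
	return true
}

func main() {
	fmt.Println(shouldProcess(Trail{UserID: "u1", Category: "food"}))
}
```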

With our logging and error handling added, it looked something like this:
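Again, the real code is not reproduced here; this is a hedged sketch of the same filters once logging and an error return are layered on top (the errFiltered sentinel and the log messages are illustrative).

```go
package main

import (
	"errors"
	"log"
)

// Trail is a stand-in for the event type consumed from Kafka.
type Trail struct {
	UserID   string
	Category string
}

var errFiltered = errors.New("trail filtered out")

// filterTrail returns an error when the trail fails any filter,
// logging which check rejected it.
func filterTrail(t Trail) error {
	if t.UserID == "" {
		log.Printf("dropping trail: empty user id")
		return errFiltered
	}
	if t.Category == "blocked" {
		log.Printf("dropping trail: blocked category for user %s", t.UserID)
		return errFiltered
	}
	return nil
}

func main() {
	if err := filterTrail(Trail{UserID: "u1", Category: "food"}); err != nil {
		log.Fatal(err)
	}
	log.Println("trail accepted")
}
```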

Now, this was fine when we had only two filters, but eventually the number of filters grew to SIX! The code started to look messy. These conditions took around 50–60 lines and made the whole thing less readable.

Refactor 🛗

The refactor for this issue is very easy and standard. Many of you might already be familiar with it. The idea here is that we want to group all these “Filters” under the same data type. This way, we can store them in an array and then loop over them.

We start by first defining an interface.
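The interface itself was not preserved in this copy of the article; a plausible shape, with method names that are my assumptions, looks like this (the nopFilter at the end is only there to show a trivial implementation):

```go
package main

import "fmt"

// Trail is a stand-in for the real event type.
type Trail struct{ UserID string }

// TrailFilter groups every filter under one data type so filters can
// be stored in a slice and looped over.
type TrailFilter interface {
	Name() string        // handy for logging which filter dropped a trail
	Filter(t Trail) bool // true means "drop this trail"
}

// nopFilter is a trivial implementation, just to show the shape.
type nopFilter struct{}

func (nopFilter) Name() string      { return "nop" }
func (nopFilter) Filter(Trail) bool { return false }

func main() {
	var f TrailFilter = nopFilter{}
	fmt.Println(f.Name())
}
```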

We don’t specifically need an interface here. We can also do this by defining a custom type in Golang. Something like this.
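The custom-type alternative would be a named function type, roughly like this (FilterFunc is a name I chose for illustration):

```go
package main

import "fmt"

// Trail is a stand-in for the real event type.
type Trail struct{ UserID string }

// FilterFunc is the interface-free alternative: a named function
// type. Filters can still live in a slice and be looped over, but
// there is no Name() method left for logging.
type FilterFunc func(t Trail) bool

func main() {
	filters := []FilterFunc{
		func(t Trail) bool { return t.UserID == "" }, // drop empty users
	}
	for _, f := range filters {
		fmt.Println("drop:", f(Trail{UserID: "u1"}))
	}
}
```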

But by doing this, we lose the Name() method, which is very useful while logging.

Now we have to rewrite our filters so that they satisfy our interface. You can see an example below.
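A sketch of one filter rewritten against the interface; the filter's name and condition are hypothetical, since the real six filters are not shown in this copy:

```go
package main

import "fmt"

type Trail struct {
	UserID   string
	Category string
}

type TrailFilter interface {
	Name() string
	Filter(t Trail) bool
}

// EmptyUserFilter is one hypothetical filter rewritten to satisfy the
// interface; the real filters check different criteria.
type EmptyUserFilter struct{}

func (EmptyUserFilter) Name() string        { return "EmptyUserFilter" }
func (EmptyUserFilter) Filter(t Trail) bool { return t.UserID == "" }

func main() {
	var f TrailFilter = EmptyUserFilter{}
	fmt.Println(f.Name(), "drops empty trail:", f.Filter(Trail{}))
}
```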

By doing this, all our filters are neatly packed into a small package that holds our current six filters and can hold more in the future. It also makes adding and removing filters easy: just add the filter to, or remove it from, the trailFilters array, and our new filter is ready to be used.
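Putting it together, the slice-plus-loop replacement for the ladder might look like this (filter names and conditions are still illustrative):

```go
package main

import "log"

type Trail struct {
	UserID   string
	Category string
}

type TrailFilter interface {
	Name() string
	Filter(t Trail) bool
}

type EmptyUserFilter struct{}

func (EmptyUserFilter) Name() string        { return "EmptyUserFilter" }
func (EmptyUserFilter) Filter(t Trail) bool { return t.UserID == "" }

type BlockedCategoryFilter struct{}

func (BlockedCategoryFilter) Name() string        { return "BlockedCategoryFilter" }
func (BlockedCategoryFilter) Filter(t Trail) bool { return t.Category == "blocked" }

// trailFilters is the single place to register filters; adding or
// removing one is a one-line change here.
var trailFilters = []TrailFilter{
	EmptyUserFilter{},
	BlockedCategoryFilter{},
}

// shouldDrop replaces the if-else ladder with one loop, and reports
// which filter (by Name) rejected the trail.
func shouldDrop(t Trail) (string, bool) {
	for _, f := range trailFilters {
		if f.Filter(t) {
			return f.Name(), true
		}
	}
	return "", false
}

func main() {
	if name, drop := shouldDrop(Trail{UserID: "u1", Category: "blocked"}); drop {
		log.Printf("trail dropped by %s", name)
	}
}
```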

And congratulations! We just converted our if-else ladder into a majestic elevator.

The Great Wall of Weights 🧱

This situation is very similar to the previous one. I included it here to show you how we can achieve our desired results by tweaking pre-existing patterns slightly.

The situation here is as follows. We are given an object and have to calculate a few weights based on some criteria. After calculating the weights, we have to plug them into a formula and calculate a final score. The catch is that the formula can change at any time. For example, the current formula might be a+b+c but might change in the future to a*b/c.

Here is what our previous code looked like
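The original snippet is missing from this copy, so here is a condensed sketch of the pre-refactor shape under my own assumptions: every weight is its own block of code, and the formula is hard-coded at the end (the Object fields and multipliers are made up).

```go
package main

import "fmt"

// Object is a stand-in for the object being scored.
type Object struct {
	Views, Likes, Shares int
}

// calculateScore shows the "wall of weights": each weight is computed
// inline, then combined in a hard-coded formula.
func calculateScore(o Object) float64 {
	// weight A (illustrative criterion)
	a := float64(o.Views) * 0.5

	// weight B
	b := float64(o.Likes) * 0.5

	// weight C
	c := float64(o.Shares) * 2

	// the formula; the article notes this can change at any time,
	// e.g. from a+b+c to a*b/c
	return a + b + c
}

func main() {
	fmt.Println(calculateScore(Object{Views: 100, Likes: 10, Shares: 2}))
}
```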

There were many more weights and a lot more error handling in our production code. The situation was slightly worse than the if-else ladder one.

Creating an interface works the same way as before, but two things are different.

  • We have to store the result of our weights somewhere
  • Since the weights will be used in a formula, we have to ensure we get the correct weight value when we want it.

To elaborate on the second point a bit: suppose you store the values in an array. Then, to use them in the formula, you would have to access them by array index, something like this: score = arr[0] + arr[1] + arr[2]. This is not ideal. What if the order of the values in the array changes? Your final score will be massively miscalculated. It also creates unnecessary overhead, both for yourself and for someone new who is not familiar with the codebase.

A better way to store these values is in a map (Go's equivalent of a HashMap). That looks something like this: score = map["weightA"] + map["weightB"] + map["weightC"].

I am not a big fan of using raw strings while doing such operations. There are multiple reasons which I will not dive into here. I will show a slightly better way to achieve this without using strings.

Refactor 🔨

We will first start by creating the interface
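The interface was a lost gist as well; a plausible shape, with method names that are my guesses, is below. The constWeight at the bottom exists only to show a minimal implementation.

```go
package main

import "fmt"

// Object is a stand-in for the object we score.
type Object struct{ Views, Likes int }

// WeightKey identifies a weight; it gets its own type so we can avoid
// raw string keys in the map later on.
type WeightKey int

// Weight is the common shape of every weight calculation.
type Weight interface {
	Key() WeightKey
	Calculate(o Object) float64
}

// constWeight is a trivial implementation, just to show the shape.
type constWeight struct{}

func (constWeight) Key() WeightKey             { return 0 }
func (constWeight) Calculate(o Object) float64 { return 1 }

func main() {
	var w Weight = constWeight{}
	fmt.Println(w.Calculate(Object{}))
}
```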

This is what our implementation of the interface will look like.
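A sketch of one such implementation; ViewsWeight and its half-a-point-per-view rule are hypothetical stand-ins for the real weights:

```go
package main

import "fmt"

type Object struct{ Views int }

type WeightKey int

// WeightViews is a placeholder key; the full enum is defined next.
const WeightViews WeightKey = 0

type Weight interface {
	Key() WeightKey
	Calculate(o Object) float64
}

// ViewsWeight is one hypothetical weight: half a point per view.
type ViewsWeight struct{}

func (ViewsWeight) Key() WeightKey             { return WeightViews }
func (ViewsWeight) Calculate(o Object) float64 { return float64(o.Views) * 0.5 }

func main() {
	var w Weight = ViewsWeight{}
	fmt.Println(w.Calculate(Object{Views: 10}))
}
```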

Now, since I don’t want to use raw strings in my HashMap, I will have to define a different data type that I can use.
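Such a type is defined with a const block and iota; the key names below are illustrative:

```go
package main

import "fmt"

// WeightKey replaces raw string map keys. iota auto-increments inside
// a const block, giving each key a distinct constant: Go's idiom for
// enums.
type WeightKey int

const (
	WeightA WeightKey = iota // 0
	WeightB                  // 1
	WeightC                  // 2
)

func main() {
	fmt.Println(WeightA, WeightB, WeightC)
}
```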

Hold up, what was that iota? you may ask. Why are imaginary numbers invading my programming career? you might question. Well, no need to panic. This is just how we define “Enums” in Golang. If you want to learn more about them, you can check this article or this tutorial.

Let’s look at our final product. Things will become much clearer.
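Assembled from the sketches above (and still using made-up weights and criteria), the final product might look like this: the weights live in a slice, their results land in a map keyed by the enum, and the formula reads in one line.

```go
package main

import "fmt"

type Object struct{ Views, Likes, Shares int }

type WeightKey int

const (
	WeightA WeightKey = iota
	WeightB
	WeightC
)

type Weight interface {
	Key() WeightKey
	Calculate(o Object) float64
}

type viewsWeight struct{}

func (viewsWeight) Key() WeightKey             { return WeightA }
func (viewsWeight) Calculate(o Object) float64 { return float64(o.Views) * 0.5 }

type likesWeight struct{}

func (likesWeight) Key() WeightKey             { return WeightB }
func (likesWeight) Calculate(o Object) float64 { return float64(o.Likes) * 0.5 }

type sharesWeight struct{}

func (sharesWeight) Key() WeightKey             { return WeightC }
func (sharesWeight) Calculate(o Object) float64 { return float64(o.Shares) * 2 }

// allWeights is the single registration point for weights.
var allWeights = []Weight{viewsWeight{}, likesWeight{}, sharesWeight{}}

// Score loops over the weights, stores results keyed by WeightKey,
// then applies the formula in one readable line.
func Score(o Object) float64 {
	m := make(map[WeightKey]float64, len(allWeights))
	for _, w := range allWeights {
		m[w.Key()] = w.Calculate(o)
	}
	// the only place to touch when the formula changes
	return m[WeightA] + m[WeightB] + m[WeightC]
}

func main() {
	fmt.Println(Score(Object{Views: 100, Likes: 10, Shares: 2}))
}
```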

The final result is clearer and more concise. It is also much more extensible, as we can now add as many weights as we want without bloating our code. The code that initializes the weights map can be moved to another method, or even to a separate package that deals only with initialization, making our code more concise still.

Take a deep breath. Take a break if you want to. The previous two sections were quite heavy. The next one is going to be heavier😈

Reusable Pipeline

Our service is a simple one. It takes data from one data source, processes it, and puts it into another data source. Clutter happens when we have to get data from multiple sources simultaneously. Depending on which data source the data came from, we have to process it slightly differently; specifically, we have to query different tables in the database.

I will explain the situation in more detail with an example.

One of our sub-services has three different “pipelines”.

  • MT5
  • D2D
  • YT

In all these pipelines, the work we are doing is the same. Get events from DB, process them, save them into another DB. The only difference is that we have to use different tables for all three pipelines.

Let’s look at the code that we were using to get events from the database. The challenges faced there apply to other parts of our code as well.

This is our model code. If we had only one pipeline, the flow would look like this. It is important to remember that this is just a skeleton of our production code; in production there is much more going on, such as logging, instrumentation, and thread management, to name a few.
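Since the gist is missing, here is a bare skeleton of what the single-pipeline flow might look like, with the repository stubbed out (type and function names are my assumptions):

```go
package main

import "fmt"

// Event is a stand-in for the event rows read from the DB.
type Event struct{ ID int }

// repository stands in for the DB layer.
type repository struct{}

func (repository) GetEvents() ([]Event, error) {
	// a stubbed DB call; production code queries a real table
	return []Event{{ID: 1}, {ID: 2}}, nil
}

// FetchEvents is the single-pipeline skeleton: get events from the DB
// and hand them over for processing.
func FetchEvents(repo repository) ([]Event, error) {
	events, err := repo.GetEvents()
	if err != nil {
		return nil, err
	}
	return events, nil
}

func main() {
	events, _ := FetchEvents(repository{})
	fmt.Println(len(events), "events fetched")
}
```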

Now, since we have multiple pipelines to work with, we would need to modify this code to handle them. Currently, there are only two ways we can add new pipelines.

  • By making a new FetchEvents() for each pipeline, for example FetchMT5Events(), FetchYTEvents(), FetchD2DEvents(). Each function queries its respective table to get events.
  • By putting an if-else ladder inside the FetchEvents() function. We would also need to pass a variable that tells us which pipeline we are currently working with and, depending on that, call the respective repository.Get[Pipeline]Events().

The second one is a big no in my book. This is what our code would end up looking like:
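A condensed sketch of that rejected option, with stubbed repository calls, so you can see the ladder creeping back in:

```go
package main

import (
	"errors"
	"fmt"
)

type Event struct{ ID int }

// repo stubs the DB layer; each method stands for a different table.
type repo struct{}

func (repo) GetMT5Events() ([]Event, error) { return []Event{{ID: 1}}, nil }
func (repo) GetD2DEvents() ([]Event, error) { return []Event{{ID: 2}}, nil }
func (repo) GetYTEvents() ([]Event, error)  { return []Event{{ID: 3}}, nil }

// FetchEvents with a pipeline switch: every new pipeline means
// another branch here.
func FetchEvents(r repo, pipeline string) ([]Event, error) {
	if pipeline == "MT5" {
		return r.GetMT5Events()
	} else if pipeline == "D2D" {
		return r.GetD2DEvents()
	} else if pipeline == "YT" {
		return r.GetYTEvents()
	}
	return nil, errors.New("unknown pipeline: " + pipeline)
}

func main() {
	events, _ := FetchEvents(repo{}, "YT")
	fmt.Println(events)
}
```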

We can already see the clutter. What happens when we have to add more pipelines?

The first idea is better. Creating separate functions follows the separation of concerns; things stay clear, and during debugging we can focus on one pipeline at a time. The issue is that we are repeating a lot of code. As mentioned above, these functions do much more in production, and replicating all of that is not a good idea.

Refactor

The solution that we found lies in Interfaces again. We can create an interface for our repository that has a method GetEvents() ([]Event, error) and all our pipelines can just define their repositories. Our fetcher will accept this interface and just call GetEvents() method. It does not need to concern itself with what pipeline it is dealing with or how many pipelines there are.

This is how our fetcher will end up looking.
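A sketch of that fetcher, assuming an interface name of EventRepository (the fakeRepo here exists only so the example runs):

```go
package main

import "fmt"

type Event struct{ ID int }

// EventRepository is the strategy interface: each pipeline supplies
// its own implementation.
type EventRepository interface {
	GetEvents() ([]Event, error)
}

// FetchEvents no longer knows or cares which pipeline it serves.
func FetchEvents(repo EventRepository) ([]Event, error) {
	return repo.GetEvents()
}

// fakeRepo is a toy implementation for demonstration.
type fakeRepo struct{}

func (fakeRepo) GetEvents() ([]Event, error) { return []Event{{ID: 42}}, nil }

func main() {
	events, _ := FetchEvents(fakeRepo{})
	fmt.Println(events[0].ID)
}
```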

If you are familiar with design patterns, you may find this similar to a certain “Strategy Pattern”. And indeed, this is inspired by that pattern. Our repositories are nothing but different strategies for getting data, whether from a different table or a different database.

And this is how our repositories will be defined.
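One repository per pipeline, each satisfying the same interface; the queries are stubbed here, and in reality the only difference is the table each one hits:

```go
package main

import "fmt"

type Event struct{ ID int }

type EventRepository interface {
	GetEvents() ([]Event, error)
}

// MT5Repository would query the MT5 events table.
type MT5Repository struct{}

func (MT5Repository) GetEvents() ([]Event, error) {
	// e.g. SELECT ... FROM mt5_events (stubbed)
	return []Event{{ID: 1}}, nil
}

// D2DRepository would query the D2D events table.
type D2DRepository struct{}

func (D2DRepository) GetEvents() ([]Event, error) {
	// e.g. SELECT ... FROM d2d_events (stubbed)
	return []Event{{ID: 2}}, nil
}

// YTRepository would query the YT events table.
type YTRepository struct{}

func (YTRepository) GetEvents() ([]Event, error) {
	// e.g. SELECT ... FROM yt_events (stubbed)
	return []Event{{ID: 3}}, nil
}

func main() {
	repos := []EventRepository{MT5Repository{}, D2DRepository{}, YTRepository{}}
	for _, r := range repos {
		events, _ := r.GetEvents()
		fmt.Println(events)
	}
}
```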

Now, starting new pipelines is this easy.
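A sketch of the startup code: one goroutine per pipeline, each getting its own repository injected (the repo implementations are stubs):

```go
package main

import (
	"fmt"
	"sync"
)

type Event struct{ ID int }

type EventRepository interface{ GetEvents() ([]Event, error) }

type mt5Repo struct{}

func (mt5Repo) GetEvents() ([]Event, error) { return []Event{{ID: 1}}, nil }

type d2dRepo struct{}

func (d2dRepo) GetEvents() ([]Event, error) { return []Event{{ID: 2}}, nil }

type ytRepo struct{}

func (ytRepo) GetEvents() ([]Event, error) { return []Event{{ID: 3}}, nil }

func FetchEvents(repo EventRepository) ([]Event, error) {
	return repo.GetEvents()
}

func main() {
	// starting all three pipelines is just injecting different repos
	repos := []EventRepository{mt5Repo{}, d2dRepo{}, ytRepo{}}
	var wg sync.WaitGroup
	for _, r := range repos {
		wg.Add(1)
		go func(r EventRepository) {
			defer wg.Done()
			events, _ := FetchEvents(r)
			fmt.Println("fetched", len(events), "events")
		}(r)
	}
	wg.Wait()
}
```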

And adding new pipelines is as easy as just defining new repositories. Your main logic in FetchEvents() need not be touched at all.

All of this applies to ProcessEvents() and ExportEvents() too. They just need to be injected with the appropriate repositories, and they will work. This is exactly what we did recently: we had to add two new pipelines called GS and CGS, and a new intern I am working with was able to create both entirely new pipelines within 1–2 days, because he only had to define repositories and tighten some screws here and there.

Ummm… 🤔

To those of you who are still paying attention: hello there, nice to see you. It was getting lonely here.

You might have noticed a small problem here. Our FetchEvents() is calling ProcessEvents() to process the events. So… how does it know which repository it needs to pass along? It has to know which pipeline it is dealing with and create the repository itself. Right?

Thanks for the question; I applaud your sharp eyes. The short answer is “Dependency Injection”, with, of course, “Interfaces”. The long answer, however… well, it is already getting late here, and I can’t stay up for too long. We will probably meet in the next blog post. Till then, I am attaching a piece of code that is closer to our production code than what has been shown so far. Who knows, you might figure it out by yourself.

These are the interfaces defined. I have left out Exporter as these two should make things clear.
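The interfaces were lost with the gist; here is my guess at their shape, with Exporter left out as the article says (nopFetcher is only there to show a minimal implementation):

```go
package main

import "fmt"

type Event struct{ ID int }

// Fetcher pulls events from a data source.
type Fetcher interface {
	FetchEvents() ([]Event, error)
}

// Processor consumes fetched events.
type Processor interface {
	ProcessEvents(events []Event) error
}

// nopFetcher is a trivial implementation, just to show the shape.
type nopFetcher struct{}

func (nopFetcher) FetchEvents() ([]Event, error) { return nil, nil }

func main() {
	var _ Fetcher = nopFetcher{}
	fmt.Println("interfaces compile")
}
```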

These are our factory methods used to get Fetcher and Processor.
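A sketch of the factories under my naming assumptions: callers pick the pipeline simply by choosing which repository to inject.

```go
package main

import "fmt"

type Event struct{ ID int }

type EventRepository interface{ GetEvents() ([]Event, error) }

type Fetcher interface{ FetchEvents() ([]Event, error) }
type Processor interface{ ProcessEvents([]Event) error }

type fetcher struct{ repo EventRepository }

func (f fetcher) FetchEvents() ([]Event, error) { return f.repo.GetEvents() }

type processor struct{}

func (processor) ProcessEvents(events []Event) error { return nil }

// NewFetcher and NewProcessor are hypothetical factory methods.
func NewFetcher(repo EventRepository) Fetcher { return fetcher{repo: repo} }
func NewProcessor() Processor                { return processor{} }

// ytRepo is a stub repository for demonstration.
type ytRepo struct{}

func (ytRepo) GetEvents() ([]Event, error) { return []Event{{ID: 3}}, nil }

func main() {
	f := NewFetcher(ytRepo{})
	events, _ := f.FetchEvents()
	fmt.Println(events)
}
```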

This is how our fetcher looks from the inside.
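Inside, the fetcher might hold both its repository and the next stage; both are injected, so it never needs to know which pipeline it serves (field and type names are assumptions):

```go
package main

import "fmt"

type Event struct{ ID int }

type EventRepository interface{ GetEvents() ([]Event, error) }
type Processor interface{ ProcessEvents([]Event) error }

// fetcher holds its repository and the processor it hands events to.
type fetcher struct {
	repo      EventRepository
	processor Processor
}

func (f fetcher) FetchEvents() error {
	events, err := f.repo.GetEvents()
	if err != nil {
		return err
	}
	return f.processor.ProcessEvents(events)
}

// stubRepo and printProcessor exist only to make the example run.
type stubRepo struct{}

func (stubRepo) GetEvents() ([]Event, error) { return []Event{{ID: 1}}, nil }

type printProcessor struct{}

func (printProcessor) ProcessEvents(events []Event) error {
	fmt.Println("processing", len(events), "events")
	return nil
}

func main() {
	f := fetcher{repo: stubRepo{}, processor: printProcessor{}}
	_ = f.FetchEvents()
}
```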

And then our main function. The place where it all begins. The code again is a bit more streamlined in production and we are still looking for ways to make it even cleaner.
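A sketch of how main might wire everything together: every pipeline is built from the same blocks, differing only in what gets injected (pipeline names from the article, everything else hypothetical):

```go
package main

import "fmt"

type Event struct{ ID int }

type EventRepository interface{ GetEvents() ([]Event, error) }
type Processor interface{ ProcessEvents([]Event) error }

type fetcher struct {
	repo      EventRepository
	processor Processor
}

// Run fetches events and hands them straight to the processor.
func (f fetcher) Run() error {
	events, err := f.repo.GetEvents()
	if err != nil {
		return err
	}
	return f.processor.ProcessEvents(events)
}

type mt5Repo struct{}

func (mt5Repo) GetEvents() ([]Event, error) { return []Event{{ID: 1}}, nil }

type ytRepo struct{}

func (ytRepo) GetEvents() ([]Event, error) { return []Event{{ID: 3}}, nil }

type logProcessor struct{ name string }

func (p logProcessor) ProcessEvents(events []Event) error {
	fmt.Printf("[%s] processed %d events\n", p.name, len(events))
	return nil
}

// main wires every pipeline from the same building blocks.
func main() {
	pipelines := []fetcher{
		{repo: mt5Repo{}, processor: logProcessor{name: "MT5"}},
		{repo: ytRepo{}, processor: logProcessor{name: "YT"}},
	}
	for _, p := range pipelines {
		if err := p.Run(); err != nil {
			fmt.Println("pipeline failed:", err)
		}
	}
}
```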

Parting words 👋

As mentioned above, this refactor is still a work in progress. We are constantly finding things to streamline. The more we refactor, the better abstractions and cleaner implementations we come up with. Hope this will help you in some way if you were looking for some inspiration.
