Refactoring Walkthrough – adding a service layer
How to code well? How to engineer software for scale?
After having spent a lot of time looking and coding in various code bases in different languages, different companies, different scale, different purposes — I have come to the conclusion that well structured code is the single most important thing for good engineering to brew. Language choice, scale of a startup, infrastructure choices are all secondary. Coding is Art. A good artist is defined not by the choice of tools but by his ability to conceptualise and express his thoughts for the others. As coding-artists, your architecture is your conceptualisation and your code is your expression. Be finicky about style, be OCD about beautiful code. Remember, someone else will be sitting and looking at your code after a few years for hours at a time. It needs to be understandable, look elegant and be soothing to the eyes.
In this post, I will cover my learnings from refactoring code and the best practices I applied from my experience — including better syntax styling, functional programming, building services and its interface definition and testing.
Note – The purpose of refactoring the code was to make it more readable and introduce better abstraction and a service layer. I have stayed away from changing too many things, like the API naming / folder structure, etc.
This exercise was started to add a new API to the existing 2 APIs to get similar data. In doing so, I realised a lot could be structured better. So, this became an exercise of refactoring first, then adding the additional API.
Lets begin ..
First off, lets have a look at the old code. This is written to expose a set of two APIs that deal with getting data from different collections/tables and expose them to the client. Lets take a look at the code now.
The old code as a router file which defines the route name-spacing, and in turn call the associated function from a controller file.
The controller here is responsible for parsing the data, querying the database, manipulating and transforming the data, and sending it back as a response. The database is accessed via an adapter/manipulation layer (***Dml). If you thought of this as the model, then the structure of the code here mimics the model-controller architecture.
Going deeper, there are two APIs —
These in turn use a few helper methods — ,
API and the helpers query for data via the Dml layer using —
Problem – where do I start? I only know the API route and the possible parameters. I don’t know anything else about this other than the code itself.
Problem – multiple database collections queried directly from the controller. Given the complexity, would be nice to have another abstraction layer instead of controllers do everything.
Problem – to rewrite this, I’d need to know the data that the Dml/Model returns. Not documented. Brute force of console-logging outputs.
Problem – long functions with little / no comments describing code blocks. Makes it very hard to comprehend what the flow does.
Problem – Variables used as global storage, hence requiring understanding of the whole code block.
Problem – code duplication between the two functions. If you look at it carefully, you’d realise that they both are very similar. One returns a list of sections, the other returns only one. Still yet, the code makes similar calls to the database and makes similar manipulations.
Net some, this doesn’t look like a code where someone would ‘like’ to go through for some changes. Typical example of it taking 2 days of understanding for a couple hours of work.
New code ..
To solve the above issues, I wrote new refactored code. This time around, I focused on creating a service-level layer for this, and using that service to cater to APIs and other services. This also includes the new API that had to be written.
Lets deep dive.
First off, the way the APIs work are – they need to query a config file, and two more sources of data. These data sources get merged and joined with each other to give the final data set. The two APIs are very similar. One queries a list, the other queries a specific section by its category. Yet still, both duplicate logic by calling database DMLs directly. Doing merges, joins, transformations, conditions, parsing, return object creation all under the API is infact quite a problem.
We should be able to create a service that joins the data sources and creates its own joint schema on top. This allows us to keep the whole data-base querying logic under one layer, and allows us to operate on top of it.
First, lets take a look at how the service would be laid out –
/index.js --- to contain the interface
/service_name.service.js --- to contain the implementation
Here is the schema for a service that gets provider sections/widgets/data options. This defines the service, its the interface of the service.
Now, here is the implementation of the above interface. Using that, API code has been re-written to use this service. Realise how simple and straightforward the code is since all the business logic is wrapped inside the service itself, which further relies on a helper to take away common logic.
Lets park the details in it for now, and instead, go back to re-writting the API. APIs should not have any logic. Their implementation should be validation, transformation on data from other services, and response creation / error handling. So here are the APIs re-written (left) using the above service. They simply utilise the service, extract and parse the required data into the required schema, and return.
Note – There is an additional API (right) which extracts widgets from sections, and combines it with a few other data-sources. While this needs further structuring, implementing the sections part has been very easy using the service.
Now jumping deeper into the code, another important piece of refactoring was how the schema was being extracted from the data base (3 different tables) and merged to give one object. In the old implementation, it starts happening from the very beginning where fields are getting added and deleted with no uniform or documented schema. As a result, the end schema is also very unpredictable.
While refactoring, this is well catered too. There is an explicit conversion of schemas happening where-ever required. Mostly, its broken into 3 steps –
* getting config data,
* getting provider data (from the 2 sets) and merging them,
* joining config data with provider data
This is exactly how the service does it – as different steps. Also, since this is done at a service level, it saves on code duplication, creating a linear chain of code. As in the service implementation, each block of code is independent, and in a linear flow.
Try to keep code structure linear, and branch off right at the end. It saves code duplication, allows better abstraction and layering.
Another important difference is the use and passage of variables. The older implementation makes use of various function-level (global) variables. Different parts of the code mutate those objects and use them as global-variables. This makes it very hard to understand code. It also makes it very error prone, because now the whole code maintains state. The right way to do it is to use a functional paradigm approach, where you create stateless blocks / abstractions. It makes readability better, and removes inter-dependence making it less error-prone.
Notice – code is neatly packed into code blocks, each independent of the other (no variable / global state). Very functional in style.
Notice – proper documentation of the return / parameter schemas for quick understanding.
Notice – proper comments to describe the function of each code block
Notice – USEFUL abstraction to avoid duplication and easy readability.
Service and implementation.
Unit testing the service.