Refactoring Legacy Application — Part II

Yared Ayalew
7 min read · Oct 7, 2017


In Part I of this series, I went over the historical background of the first iteration of the product I've been working on: the Commodity & Allocation Tracking System (CATS). In this post I will give a high-level view of the refactored architecture and delve deeper into one of its major components, the data maintenance part of the application.

Goals for architecture refactoring

From a business requirements point of view, the previous version of CATS addressed pretty much everything. The major challenges were related to:

  • Usability: the application imposed a very strict workflow in order to enforce data integrity.
  • Performance: as the application saw more use and more data was captured into it, some pages started to take several seconds or even minutes to load, causing frustration for users.
  • Consistency: even though CATS is composed of several modules, each one had a different layout and style, so users felt as if they were using a different system whenever they moved from one module to another.
  • Lots of code to maintain: in addition to the code implementing business requirements, there were hundreds of thousands of lines of code implementing permissions, localization, user management and so on.
  • Lack of automated tests
  • Manual deployment: deploying meant logging into the server remotely and copying files to it. No one wanted to do it because it was a laborious and error-prone ritual.

Taking the above points into consideration, the refactoring process should help alleviate the pain caused by the first version's shortcomings. I outlined the main goals as follows:

  1. There should be a single layout and style for all modules of the application. Deviating from the standard layout and style should be the exception.
  2. All views in the application should load in less than 5 seconds (considering the context in which the app is used, this was a realistic bar to meet). If a given view is going to take more than 5 seconds, load the mandatory parts first and pull in the remaining bits asynchronously.
  3. We should not write code to implement localization, permission management, logging and user management.
  4. Every feature should have a corresponding automated test
  5. Deployment should be automated from day one.

In addition, I created a checklist of things to try as experiments, which included using Docker containers both for development and for production deployment, as well as log visualization.

Based on the above assumptions and goals, I came up with a high-level architecture diagram to address the issues in the previous version as well as the new requirements for the second iteration.

CATS version 2 high level architecture

This architecture looks very ambitious and intimidating at first glance, raising concerns about whether it is something a 4-person team can pull off. Faced with a pressing schedule and only three developers, I decided to address parts of the architecture one at a time, focusing on the following aspects:

  1. Model driven web framework to reduce boilerplate code
  2. Separate data maintenance (CRUD) aspects of the app from reporting and business intelligence
  3. Address NFRs (non-functional requirements) with existing libraries rather than custom implementations
  4. Automating build, test and deploy workflow as much as possible

Model driven web layer

This is where most of the data maintenance (CRUD) features of the application reside, and it is also how most users will experience CATS, through the web frontend. My first instinct, and of course my team's recommendation, was to refactor the existing ASP.NET MVC application to address its shortcomings. But I wanted to raise the bar a little higher and compare the effort and time it would take to refactor the ASP.NET MVC (C#) codebase against rewriting the web part in a different framework, e.g. Rails or Django. My argument for adding new frameworks and languages to the mix was partly to reduce the amount of boilerplate code needed just to have basic data persistence and retrieval logic. I decided to use Rails for the web layer because of the speed at which one can implement data maintenance use cases compared to the existing codebase. One of the best parts of using Rails is the sheer number of available gems for just about everything my app needed. The following gems proved indispensable during the migration and cut down my original estimate by a significant proportion (a minimal Gemfile sketch follows the list):

  • Ancestry
  • Devise
  • Rolify
  • Pundit
  • Paranoia
  • Axlsx
  • Caracal
  • Capistrano
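
To give a sense of how little wiring this takes, here is a minimal Gemfile sketch pulling in the gems above. The versions and grouping are illustrative assumptions, not the actual CATS Gemfile.

```ruby
# Gemfile (excerpt) -- illustrative grouping, not the actual CATS Gemfile
source 'https://rubygems.org'

gem 'rails'

gem 'ancestry'   # tree structures for hierarchical records
gem 'devise'     # authentication
gem 'rolify'     # role management
gem 'pundit'     # authorization policies
gem 'paranoia'   # soft deletes
gem 'axlsx'      # Excel (.xlsx) exports
gem 'caracal'    # Word (.docx) report generation

group :development do
  gem 'capistrano', require: false   # automated deployment
end
```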

Reporting separated from data maintenance

One thing that’s always common in business applications is the need to extract and generate aggregated data in the form of reports and dashboards. The challenge here is that the way databases are designed for efficient write for data maintenance (CRUD) screens is quite different from that of required for reporting. I have always opted in for using normalized data models for data maintenance and creating views for report generation. This time around I wanted to experiment by separating the two by creating a separate database (MongoDB) to store aggregated and denormalized copy of the transactional data. This approach is not innovative by any standard as this is how data warehouses are implemented but what makes this a bit different is the use of Rails model hooks to create/update data for report generation as each record is created/modified in the data maintenance part of the application. The other advantage of this approach is the ability to re-construct reporting database from transactional one at any time. Additionally to avoid coupling between the two components, they communicate using a message based architecture. As records are created/updated in the Rails app, model hooks for create, update and read actions publish message to RabbitMQ so that it can be consumed by reporting API (see architecture diagram above).

Non-functional requirements (NFRs)

Well, this proved to be the biggest win! My biggest beef with the v1 approach was the fact that most of the NFRs were custom-built even though there was no need for that. This is actually one of the reasons I chose to go with Rails, as it provides several options for security, localization, and user and role management. This is not limited to the web frontend Rails app; the same holds for the reporting backend API built with Spring Boot. The good thing is that almost all of the application's NFRs are addressed with existing libraries and gems, without needing custom implementations.
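
As a rough illustration of how little code this takes, the sketch below wires Devise, Rolify and Pundit together. The model, role and policy names are hypothetical examples, not taken from the actual CATS codebase.

```ruby
# app/models/user.rb -- authentication (Devise) and roles (Rolify)
class User < ApplicationRecord
  rolify
  devise :database_authenticatable, :recoverable, :rememberable, :validatable
end

# app/policies/dispatch_policy.rb -- authorization rules live in Pundit policies
class DispatchPolicy < ApplicationPolicy
  def update?
    user.has_role?(:admin) || user.has_role?(:logistics_officer)
  end
end

# app/controllers/dispatches_controller.rb
# Assumes ApplicationController includes Pundit; Devise provides authenticate_user!.
class DispatchesController < ApplicationController
  before_action :authenticate_user!   # require a signed-in user

  def update
    @dispatch = Dispatch.find(params[:id])
    authorize @dispatch                # consults DispatchPolicy#update?
    @dispatch.update!(dispatch_params)
    redirect_to @dispatch
  end

  private

  def dispatch_params
    params.require(:dispatch).permit(:status, :remark)
  end
end
```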

Automation — migration, build and deployment

I have always been a fanatic when it comes to automating anything that can be automated. In v1, the tasks involved in deploying a new version of the app and updating the database required human intervention, making deployments dreaded tasks that usually resulted in errors. For v2, what paid off was setting a goal of automated database migration, build and deployment from the very beginning, making this requirement front and center rather than an afterthought. Having frameworks built around this concept, like Rails migrations and Spring Boot's self-contained jar packages, made things easier to approach. Now deploying a new version of the app only requires merging changes to a mainline branch (either develop or master) through a pull request and issuing a Capistrano deploy command. Everything needed for deployment and versioning lives in the git repo: the Travis CI configuration, versioning (through git tags), database migrations, custom tasks and Capistrano scripts are all part of it.
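
Here is a minimal sketch of what such a Capistrano setup looks like; the application name, repository URL and linked files are placeholders rather than the real CATS configuration.

```ruby
# config/deploy.rb -- placeholder values, not the real CATS configuration
lock '~> 3.10'

set :application, 'cats'
set :repo_url,    'git@github.com:example/cats.git'
set :branch,      ENV.fetch('BRANCH', 'master')
set :deploy_to,   '/var/www/cats'

# Shared files and directories that survive between releases
append :linked_files, 'config/database.yml', 'config/secrets.yml'
append :linked_dirs,  'log', 'tmp/pids', 'public/uploads'

set :keep_releases, 5
```

With `capistrano/rails` required in the Capfile, a single `bundle exec cap production deploy` checks out the mainline branch, runs pending migrations and precompiles assets on the server, so the only manual step left is merging the pull request.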

Results

I've seen significant results from this architectural effort, which are summarized below.

  • The model-driven framework significantly reduced the amount of boilerplate and ceremonial code, allowing team members to focus on business use cases.
  • Separating CRUD from reporting means each component can be optimized for performance and usability without over-generalizing the implementation.
  • Using existing libraries to implement NFRs resulted in less code to test and maintain compared to custom implementations.
  • Automating database migration, testing, build and deployment not only removed manual steps that required a human to run them but also raised confidence in the codebase enough to do multiple deployments daily.
  • The flexibility of the refactored architecture opened the door to integrating additional form factors, like native apps and SMS, as data entry and reporting channels.

In summary, I believe the v2 architecture addresses most of the pain from previous iterations, but there are still concerns to keep in mind to ensure the continued evolution of the project. Some of the things on my backlog for further improvement of the architecture include:

  • Pushing further on the preliminary work I did on using Docker for both development and deployment of the different application parts. Remember that the first version of the app had only two parts: the monolithic web application and the database. Now that more moving parts have been added to the architecture, it could be intimidating for anyone to begin work on the codebase.
  • Having several components (web frontend, APIs, message queues, SMS gateway, databases, etc.) means there is a lot of surface area to cover when tracing errors as they arise. I want a central place to store and manage logs from all the different sources, along with a friendly visualization tool such as Kibana or Grafana.

I guess this sums up my journey of evolving an architecture, which at times required rewriting some components rather than refactoring them.
