Turning back the clock on legacy systems

Anand Raman
Kingfisher-Technology
Jun 20, 2024

Having looked at an example where, even with modern architectural patterns and a modern tech stack, we ended up with a legacy system, let me share another where evolution went the other way.

The make-or-break extractor

Nearly a decade ago, we inherited code that generated a product catalogue extract, which the search engine used to refresh its index and return search results for an e-commerce site.

[Figure: Simplified view of key components]

The batch code ran as part of a series of scheduled jobs. Even though it was a recent implementation, it had an abundance of static methods, helpers and util classes, and it did not follow clean code or SOLID principles. We never paid it much attention because it worked, until IT DID NOT.

I remember the sequence of events vividly. We were packing our bags to leave for the day when the monitoring team reported that the job had failed. Retriggering the job failed again, with an unhelpful error message and a NullPointerException. After a few failed attempts, an engineer revealed that if the extract could not be generated before midnight, the search engine would stop returning any results because of stale data, triggering a global high-severity incident. Net-net, we were now also under time pressure to fix the issue.

With the clock ticking away, we came up with hypotheses on why the job failed. We argued that a “null” attribute in the product extract was causing the issue, and that wrapping the offending part in a try/catch block might fix it. We were about to perform open-heart surgery when an enlightened engineer stepped in. After hearing us out and taking in the impending doom, he asked for some time to investigate before we went ahead with any fixes.

He kept a level head and paired with another engineer. To our surprise, instead of attacking the code, he created a failing test case. Even during a high-severity incident, he kept the discipline to test the hypothesis before proceeding further. While the leadership was losing its cool, a different kind of transformation was happening on the floor. Writing tests can be hard when the code already exists and suffers from a proliferation of static blocks. This didn’t stop the team. They wrote tests to prove the hypothesis. Only then was the code packaged and deployed to the production environment, in time to avert a major incident.
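The article doesn't show the extractor's code, so what follows is only a sketch of what such a failing test might look like. It assumes Java (the NullPointerException suggests it), and every name in it is hypothetical: a `Product` type whose `brand` attribute can be null, and a static helper `ExtractUtils` in the style of the legacy code. The test states the expected behaviour for a product with a missing attribute; run against the legacy helper it fails with the production symptom, and it passes once the fix is in. It is written without a test framework so that it runs standalone.

```java
import java.util.function.Function;

// Hypothetical standalone test harness; in a real codebase this would typically be a JUnit test.
class ExtractorTest {

    // Expected behaviour: a product with no brand yields an empty brand field.
    // Returns false if the output is wrong or the helper throws the NullPointerException seen in production.
    static boolean nullBrandProducesEmptyField(Function<Product, String> toLine) {
        try {
            return "sku-1|".equals(toLine.apply(new Product("sku-1", null)));
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Red: the legacy helper fails, confirming the hypothesis before any fix is attempted.
        System.out.println("legacy: " + (nullBrandProducesEmptyField(ExtractUtils::toLineLegacy) ? "PASS" : "FAIL"));
        // Green: the fixed helper passes the very same test.
        System.out.println("fixed: " + (nullBrandProducesEmptyField(ExtractUtils::toLine) ? "PASS" : "FAIL"));
    }
}

// Hypothetical product record from the catalogue feed; brand may be null upstream.
final class Product {
    final String id;
    final String brand;

    Product(String id, String brand) {
        this.id = id;
        this.brand = brand;
    }
}

// Hypothetical static helper in the style of the legacy code.
final class ExtractUtils {
    private ExtractUtils() {}

    // Legacy version: dereferences brand unconditionally, so a null brand throws a NullPointerException.
    static String toLineLegacy(Product p) {
        return p.id + "|" + p.brand.trim();
    }

    // Fixed version: a missing brand becomes an empty field instead of aborting the whole extract.
    static String toLine(Product p) {
        String brand = p.brand == null ? "" : p.brand.trim();
        return p.id + "|" + brand;
    }
}
```

Passing the helper in as a `Function` is just a convenience so one test can exercise both versions side by side; in practice the test is written once, watched failing, and then kept green as the code changes.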

This was the moment when we took a step forward in reclaiming a legacy codebase.

It was a small fix for the NullPointerException, but a giant leap forward in how we approached legacy systems. The test case (written reactively) documented the expected functional behaviour in perpetuity. The tests took the guesswork out of the equation, enabling us to experiment more and retrofit code. As long as the tests passed, we knew the functionality would work. We continued to address issues reactively, following a test-first approach and gaining more confidence with every release.
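One common way to retrofit code under that kind of safety net is a characterization (or "golden master") check; the article doesn't say the team used exactly this. The sketch below assumes hypothetical names throughout: `LegacyExtract` stands in for an untouched legacy static helper that serves as the reference, `CatalogueLine` for its refactored replacement, and the sample SKUs are made up. The refactored version has to reproduce the legacy output exactly for every sample, including the awkward null cases, before it can take over.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical golden-master check comparing legacy and refactored output.
class GoldenMasterCheck {
    // Hypothetical sample inputs (id, title, price), including null edge cases.
    static final List<String[]> SAMPLES = Arrays.asList(
        new String[] {"sku-1", "Drill", "19.99"},
        new String[] {"sku-2", null, "5"},
        new String[] {"sku-3", " Saw ", null});

    // Returns the number of samples where the two implementations disagree.
    static int mismatches() {
        int count = 0;
        for (String[] s : SAMPLES) {
            String before = LegacyExtract.line(s[0], s[1], s[2]);
            String after = CatalogueLine.of(s[0], s[1], s[2]).render();
            if (!before.equals(after)) {
                System.out.println("MISMATCH: " + before + " vs " + after);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int n = mismatches();
        if (n != 0) throw new AssertionError(n + " mismatches");
        System.out.println("Refactored output matches legacy output for " + SAMPLES.size() + " samples");
    }
}

// Hypothetical legacy static helper, kept unchanged as the reference behaviour.
final class LegacyExtract {
    static String line(String id, String title, String price) {
        StringBuilder sb = new StringBuilder();
        sb.append(id).append('|');
        sb.append(title == null ? "" : title.trim()).append('|');
        sb.append(price == null ? "0.00" : String.format(Locale.ROOT, "%.2f", Double.parseDouble(price)));
        return sb.toString();
    }
}

// Hypothetical refactored replacement: an immutable value object instead of a static helper.
final class CatalogueLine {
    private final String id;
    private final String title;
    private final double price;

    private CatalogueLine(String id, String title, double price) {
        this.id = id;
        this.title = title;
        this.price = price;
    }

    // Normalises raw feed values once, at construction time.
    static CatalogueLine of(String id, String title, String price) {
        return new CatalogueLine(id,
            title == null ? "" : title.trim(),
            price == null ? 0.0 : Double.parseDouble(price));
    }

    String render() {
        return String.join("|", id, title, String.format(Locale.ROOT, "%.2f", price));
    }
}
```

Keeping the legacy helper as the oracle means nobody has to write down the expected output by hand; the existing behaviour, quirks included, becomes the specification until the team decides to change it deliberately.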

Thankfully, instead of spending effort on a business case to rewrite the legacy app, we regained control, made it fit for purpose and extended its shelf life.

My views on test-driven development also changed dramatically after this incident. Until then I had not appreciated it; afterwards I practised TDD and experienced first-hand how it enables teams to be fearless, or to regain control over their fears. As “fear” is an indicator of legacy systems, there isn’t a better antidote than tests (even ones written reactively).

When we practise this consistently, we regain control, drive away fear and can even breathe life back into legacy systems.

If you are interested in joining us on our journey, please do check out our careers page.

Thanks for reading
