Legacy stacks and innovation


If you have worked in successful companies, the odds are high that you have worked with legacy code. This is probably the code that founders wrote to get shit done in the initial stages of the company. It might be the code that a wave of contractors, with absolutely no ownership of its long-term implications, wrote and left behind. It might be the code incompetent engineers wrote, which is buggy as hell. This code should not exist, but companies continue to run on it. Millions of customers rely on this code. Billions in revenue are generated by this code. So the easiest thing to do is to just build on top of it.

This is fundamentally one of the reasons innovation stops. When an employee joins, they need to deliver to stay on track. They tend to do what needs to be done at that moment, in whatever way possible, just to deliver. However, the lack of understanding of the deeper problems, combined with band-aid solutions, invariably causes numerous bugs to appear. This leads to more bug fixes and more band-aids, and the cycle continues forever.

Here are some scenarios I have seen in the wild:

  1. Features are added to a product by an engineer who has started at a company that went public several years ago. The code is monolithic and this person has to add a new feature. She does not understand how the code works (actually, it is impossible to). Her only window into the code is a few tests that treat the code like a black box. She quickly writes a feature based on the data she sees and deploys it. It breaks on use cases that only show up live. The feature has to be turned off.
  2. The technology stack of a web services company is built in PHP. They want to build APIs. They put engineers to work on it. But the engineers are afraid to touch the code because the payment processing side is part of the single monolithic code base, and touching it means that they have to go through PCI audits. Instead of breaking the code into service containers, they write the API functions on top of the same code base. This makes the code uglier.
  3. An engineer is enhancing the API calls. He wants his Services API to return “country code” in addition to other shipping data. He sees that “country code” already exists in the object model and is set to “NULL”. The guy who wrote that code is no longer with the company. So the engineer just adds “country” as an additional field without fixing the NULL field, because he is not sure what it does.
  4. A team works on a database migration, moving billions of entries of user data from a single server to database clusters. During the migration, they see that some entries for country code are in caps and others are not. They make everything consistently lower case. During testing, QA and other vertical teams miss the change because their scripts are case-insensitive. When deployed in production, user accounts get locked up because users enter their country code in caps and it fails validation.
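The last scenario is easy to reproduce in a few lines. The sketch below is only an illustration: the function names, the row layout, and the assumption that the legacy validation does an exact, case-sensitive comparison are all hypothetical, not taken from any real system. It shows why the lower-casing migration locks users out, and one possible way to avoid it by normalizing both sides before comparing.

```python
def migrate_lowercase(rows):
    """The migration from the scenario: make every country code lower case."""
    return [{**row, "country_code": row["country_code"].lower()} for row in rows]


def legacy_validate(stored_code, entered_code):
    """Assumed legacy check: exact, case-sensitive comparison."""
    return stored_code == entered_code


def normalized_validate(stored_code, entered_code):
    """One possible fix: bring both sides to the same form before comparing."""
    return stored_code.strip().casefold() == entered_code.strip().casefold()


# Hypothetical pre-migration data with inconsistent casing.
rows = [
    {"user": "a", "country_code": "US"},
    {"user": "b", "country_code": "in"},
]
migrated = migrate_lowercase(rows)

# User "a" still types "US", as they always have.
locked_out = not legacy_validate(migrated[0]["country_code"], "US")
still_works = normalized_validate(migrated[0]["country_code"], "US")
```

The point for testing is that the check has to go through the same comparison production uses. A test script that ignores case will pass against the migrated data and hide exactly this failure.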

It is insanely frustrating to work on legacy stuff because you are starting in the middle of the story, without any perspective on how things got to where they are. An engineer starting with a legacy system needs a great deal of ramp-up time. Even with that, there is no guarantee that they will understand the internals of the system they are going to work on very well. In this case, how do we innovate?

Innovation becomes a myth here

Ideally, we would give employees time to understand and improve stacks (rewriting the parts that require it), test them thoroughly, and become experts in what they do. But that is never going to happen, nor is it going to scale. We need to figure out a way to change engines while the plane is flying, and hope that the new engines are easy to fly with. There is no better way to describe it.

At a recent place I worked, we had a monolithic PHP code base that grew in complexity as the company grew. When we took a serious look at ourselves, we knew that we wanted to be a platform and not just a product. The journey from product to platform is freaking hard if you did not build the product with such a vision in mind in the first place. And you cannot innovate your way to something like a platform purely with band-aid solutions. The approach we adopted was to separate the code and decouple services into their own containers. A single bare-bones code base became the web app. Everything that was not a core feature moved to its own service container, which communicated with the app via APIs. This gave us the luxury of adding services and features without having to touch the core code. It also allowed us to write services in any technology or stack suited to the problem at hand, rather than worrying about everything having to be in PHP as part of the monolithic code base. This liberated a lot of engineers as well.

If the employees of a company are dealing with complexity, more than half their time is spent figuring out what is going on rather than innovating. Their mindset is always defensive and cautious, which is the extreme opposite of what innovation requires.

Fixing legacy stacks

Is this BS? Or does such a thing as turning a legacy mess into manageable code exist? How do we even go about doing it? At the risk of repeating myself, this is akin to changing engines while the plane is flying.

  1. If you can separate the code, you need to. The breakup should result in your core code behaving like a service. It has to be supported by APIs, which let other services consume and interact with the core service.
  2. If you cannot separate the code into service containers, you need to freeze building on top of that code. You need to write wrappers and APIs on top of that code base and turn it into a giant service. It is uglier than it should be, but it is a contained kind of ugly, one that can be managed.
  3. Instrumentation is critical to understanding the performance of the code and the areas that have to be improved. When we identify the sections of the code where performance is poor, those functions have to be rewritten. Again, this also depends on how much specification and knowledge we have about that specific feature.
  4. Have your best SWAT team rewrite your core app as a bare-bones service, and start creating core services around it. This is a fundamentally critical strategy, which frees your engineers to do amazing stuff down the road instead of butting their heads against legacy bugs.
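To make the wrapper and instrumentation points a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: `legacy_calculate_shipping` stands in for some function buried in the frozen code base, and `ShippingService`, `instrumented`, and `call_stats` are hypothetical names. The idea is that new code only ever talks to the wrapper, and the wrapper records how often and for how long the legacy function runs, so you can see which parts deserve a rewrite first.

```python
import time
from collections import defaultdict
from functools import wraps


# Hypothetical stand-in for a function inside the frozen legacy code base.
def legacy_calculate_shipping(order):
    return {"cost": 5.0 * len(order["items"]), "country_code": None}


# Per-function call counts and cumulative time, keyed by function name.
call_stats = defaultdict(lambda: {"calls": 0, "total_seconds": 0.0})


def instrumented(func):
    """Record how often a function is called and how long it takes in total."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            stats = call_stats[func.__name__]
            stats["calls"] += 1
            stats["total_seconds"] += time.perf_counter() - start
    return wrapper


class ShippingService:
    """Wrapper: the only place new code is allowed to touch the legacy function."""

    def __init__(self, legacy_func=legacy_calculate_shipping):
        self._legacy = instrumented(legacy_func)

    def quote(self, items, country):
        # Translate between the clean interface and whatever the legacy code expects.
        raw = self._legacy({"items": items})
        return {"cost": raw["cost"], "country": country}


service = ShippingService()
quote = service.quote(["book", "pen"], "US")
stats = call_stats["legacy_calculate_shipping"]
```

Because callers depend on `ShippingService` and not on the legacy function, the implementation behind `quote` can later be swapped for a rewritten service without touching them, and `call_stats` gives a rough first signal of where the time is going.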

Unfortunately, legacy code is real, and as an engineer you will come face to face with it at some point. And the uglier and more complex the code is, the worse your nightmares are going to be. But here is your chance to kill the monster. Do it, and the future generation of coders will salute you for making their lives fun.

Thanks to @CaseySoftware and @ppalavilli for reading the early drafts of this article and providing super feedback.