The Elegance of Messy Code
Software that solves real-world problems is by definition a mess.
I rarely encounter a programmer who is excited to work on an existing project. Their eyes light up when they are invited to participate in a “greenfield rewrite,” where they have full freedom to fix all the mistakes of the past and design something new. When they are invited to tinker with and maintain an existing program, they pass on the opportunity. So do you let them start a lengthy and risky fresh rewrite, or do you crush their spirits and force them to work on legacy code?
I’m here to encourage you to find a third path.
This is a common problem in software development. Lots of people have written good and useful commentary on the problem of rewrites, the challenges of refactoring an existing code base, and the difficulty of taking on a spaghetti codebase.
What I’m going to tell you is something painful: you must understand your code before you can replace it. This is a lesson that has been learned by IT organizations since the dawn of time, but gap analysis projects are never appealing and often sound like drudgery.
Here’s how to make it work.
Begin by Understanding your Current Business
Let’s start off with a proposition. Even if your engineers tell you that you have awful legacy code, is that really true? Why would you rewrite something unless you know what you can actually gain from doing it?
What kind of metrics do you have to show that your business works? How do you judge that your business is successful?
Before embarking on a rewrite project, you must know what you are measuring yourself against. You need measurements that sit in the middle of the problem: more concrete than a high-level, abstract value, but broad enough to show the scope of the opportunity.
Some examples of bad measurements:
- Customer Satisfaction Scores (Yes, this is valuable, but customer satisfaction is often not directly related to the quality of the software)
- Number of Tickets (Employees can write tons of tickets, but just writing a ticket doesn’t mean that you have actually identified a real problem)
- Website Viewers (Marketing strategies often generate more interest in your website than the quality of your website code itself)
To develop a good measurement, you need to find one that is precise, unambiguous, and directly related to the result you care about. If you care about the speed of your factory assembly line, you might measure the speed of the slowest task in the assembly line. If you care about the accuracy of your customer results, you should measure and categorize customer support calls to identify the most frequent root causes.
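As a rough illustration of the support-call measurement, here is a minimal Python sketch. The call log, the root-cause labels, and the `top_root_causes` helper are all hypothetical; the only point is that categorized calls can be turned into a ranked list you can act on.

```python
from collections import Counter

# Hypothetical support-call log: (call_id, root_cause) pairs assigned during triage.
calls = [
    (101, "wrong search result"),
    (102, "login failure"),
    (103, "wrong search result"),
    (104, "slow page load"),
    (105, "wrong search result"),
    (106, "login failure"),
]

def top_root_causes(calls, n=3):
    """Return the n most frequent root causes with their count and share of all calls."""
    counts = Counter(cause for _, cause in calls)
    total = sum(counts.values())
    return [(cause, count, count / total) for cause, count in counts.most_common(n)]

for cause, count, share in top_root_causes(calls):
    print(f"{cause}: {count} calls ({share:.0%})")
```

A ranking like this is only as good as the categorization behind it, so the triage labels deserve as much care as the counting.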
What needs to be rewritten?
I rarely encounter a programmer who doesn’t want to rewrite everything. But the best way to succeed at a rewrite project is to rewrite one component at a time. By focusing your efforts on just one thing, you gain clarity of purpose, the ability to reuse this one component, and the ability to document and write integration tests comprehensively around this one element of your overall software stack.
Real World Examples
Here are some real-world examples of problems that were solved at a significantly smaller cost than a full rewrite. In each of these cases, executive management said “We need to rewrite the whole application!” but the problem was solved much more quickly by targeting the root cause.
Case #1: An Automobile Sales Company
Problem: It takes a week to code, test, and deploy a change to our core lead generation algorithm, which is written in a legacy programming language.
Consequence: When an opportunity is discovered to increase revenue, we often can’t respond quickly enough.
Measurement: I analyzed the changes made in the most recent deployments. The common thread among them was a change to a weighting constant in our algorithm or to the order in which the algorithms were prioritized.
Action: I developed an XML file that controlled the sequence of our algorithms and the weight that was attached to each one.
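To make the idea concrete, here is a minimal Python sketch of a configuration-driven scoring pipeline. It is not the original system: the XML element and attribute names, the scoring functions in `SCORERS`, and the sample lead are all hypothetical stand-ins. It only shows how order and weights can live in a file instead of in code.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration; element and attribute names are illustrative only.
CONFIG_XML = """
<algorithms>
  <algorithm name="recent_activity" weight="0.5"/>
  <algorithm name="location_match" weight="0.3"/>
  <algorithm name="price_range" weight="0.2"/>
</algorithms>
"""

# Placeholder scoring functions standing in for the real algorithms.
SCORERS = {
    "recent_activity": lambda lead: lead["visits"] / 10,
    "location_match": lambda lead: 1.0 if lead["nearby"] else 0.0,
    "price_range": lambda lead: 1.0 if lead["budget"] >= 20000 else 0.0,
}

def load_pipeline(xml_text):
    """Return (name, weight) pairs in the order they appear in the file."""
    root = ET.fromstring(xml_text.strip())
    return [(el.get("name"), float(el.get("weight"))) for el in root.findall("algorithm")]

def score_lead(lead, pipeline):
    """Apply each configured algorithm in order and sum the weighted scores."""
    return sum(weight * SCORERS[name](lead) for name, weight in pipeline)

pipeline = load_pipeline(CONFIG_XML)
lead = {"visits": 4, "nearby": True, "budget": 25000}
print(score_lead(lead, pipeline))
```

With a layout like this, changing a weight or reordering the algorithms is an edit to the file rather than a change to the code.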
Case #2: A Market Research Company
Problem: Exporting our weekly customer data set takes twelve hours.
Consequence: When an error is found in the dataset, we often don’t have enough time to regenerate it.
Measurement: I added instrumentation to our database layer to record the number of database calls.
Action: I worked with the lead engineer to identify unnecessary queries and refactor the existing code to pass data around rather than refetching it.
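Here is a small Python sketch of this kind of instrumentation, assuming a DB-API style connection. The `CountingConnection` wrapper, the `customers` table, and both export functions are hypothetical; an in-memory SQLite database stands in for the real one. The sketch counts queries for a refetching export and for one that fetches once and passes the rows along.

```python
import sqlite3

class CountingConnection:
    """Wraps a DB-API connection and counts every query sent through it."""
    def __init__(self, conn):
        self._conn = conn
        self.query_count = 0

    def execute(self, sql, params=()):
        self.query_count += 1
        return self._conn.execute(sql, params)

def make_db():
    """Build a throwaway in-memory database with 100 hypothetical customers."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(i, "north" if i % 2 else "south") for i in range(1, 101)],
    )
    return CountingConnection(conn)

def export_refetching(db):
    """Looks up each customer's region separately: one extra query per row."""
    ids = [row[0] for row in db.execute("SELECT id FROM customers ORDER BY id")]
    return [
        (i, db.execute("SELECT region FROM customers WHERE id = ?", (i,)).fetchone()[0])
        for i in ids
    ]

def export_passing_data(db):
    """Fetches every row once and passes the data along."""
    return list(db.execute("SELECT id, region FROM customers ORDER BY id"))

slow_db, fast_db = make_db(), make_db()
slow_rows = export_refetching(slow_db)
fast_rows = export_passing_data(fast_db)
print(slow_db.query_count, fast_db.query_count, slow_rows == fast_rows)
```

The counter does not fix anything by itself, but it turns "the export feels slow" into a number that can be compared before and after a refactoring.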
Case #3: A Tax Software Company
Problem: Our liability estimation software could not produce results fast enough to keep up with customer data.
Consequence: Our customers’ tax data was often not ready to view until five days after the end of the month.
Measurement: I added time tracking metrics to every section of the code and added a monitor that identified when our data transfer queue fell behind. After a few months of research, we realized that the key problem was that all data had to be migrated from its source database to a reporting database before being used in the estimation process.
Action: I carefully fixed and extended our comprehensive test suite to verify the estimation program’s accuracy, then rewrote just the one component of the estimation system so that it did not require data migration.
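The measurement step might look something like the following Python sketch. The section names, the `timed` helper, and the `queue_is_behind` thresholds are assumptions made for illustration; the `time.sleep` calls stand in for real work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per named section of the program.
section_seconds = defaultdict(float)

@contextmanager
def timed(section):
    """Add the elapsed time of the enclosed block to the section's total."""
    start = time.perf_counter()
    try:
        yield
    finally:
        section_seconds[section] += time.perf_counter() - start

def queue_is_behind(pending_items, items_per_minute, max_lag_minutes):
    """True when draining the queue at the current rate would exceed the allowed lag."""
    if items_per_minute <= 0:
        return pending_items > 0
    return pending_items / items_per_minute > max_lag_minutes

with timed("migrate"):
    time.sleep(0.05)   # stand-in for copying data to the reporting database
with timed("estimate"):
    time.sleep(0.001)  # stand-in for the estimation itself

for section, seconds in sorted(section_seconds.items(), key=lambda kv: -kv[1]):
    print(f"{section}: {seconds:.3f}s")
```

Collected over weeks rather than a single run, totals like these show which section dominates, which is what points to the one component worth rewriting.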
Are Greenfield Replacement Projects Doomed?
No, they’re not doomed. I have successfully led these kinds of projects myself, but I did them without making them all-or-nothing projects:
- Identify one key component that is at risk of failure.
- Establish measurement around what is wrong with this component.
- Isolate this component by creating an API, or a data contract, or a sequence diagram around it.
- Create a thin layer in your legacy program that isolates this one component.
- Write a set of real-world integration tests (more on this later!) around that component.
- Develop your rewrite for that component.
- When your tests pass and you have validated the rewrite, replace only that one component.
This project pattern has worked over and over again. It is most useful in companies that are successful: where you have a business you do not want to jeopardize, and where your customers need to be certain that you are not falling down on the job.
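The isolation and testing steps above can be sketched in Python as follows. Everything here is hypothetical: the `Estimator` contract, the two implementations, the flat 25% rate, and the recorded cases are invented for illustration. The idea is that both versions sit behind the same thin interface, and the rewrite is only switched in once it agrees with the legacy code on recorded real-world inputs.

```python
from typing import Protocol

class Estimator(Protocol):
    """The data contract: the only surface the rest of the program may call."""
    def estimate(self, amounts: list[float]) -> float: ...

class LegacyEstimator:
    """Stand-in for the existing component, wrapped behind the contract."""
    def estimate(self, amounts: list[float]) -> float:
        total = 0.0
        for amount in amounts:
            total += amount * 0.25
        return round(total, 2)

class RewrittenEstimator:
    """Stand-in for the rewrite of that one component."""
    def estimate(self, amounts: list[float]) -> float:
        return round(sum(amounts) * 0.25, 2)

def mismatches(old: Estimator, new: Estimator, cases: list[list[float]]) -> list[list[float]]:
    """Run both implementations on recorded inputs; return the cases where they disagree."""
    return [case for case in cases if old.estimate(case) != new.estimate(case)]

def choose_estimator(old: Estimator, new: Estimator, cases: list[list[float]]) -> Estimator:
    """Switch to the rewrite only when it matches the legacy results on every case."""
    return new if not mismatches(old, new, cases) else old

# Hypothetical inputs captured from production traffic.
recorded_cases = [[100.0, 250.0], [], [19.99, 5.01], [1200.0]]

active = choose_estimator(LegacyEstimator(), RewrittenEstimator(), recorded_cases)
print(type(active).__name__)
```

Because callers only see the contract, the swap is a one-line decision that can be reversed if the rewrite misbehaves.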