We Decided to Rewrite Our Software. You Won’t Believe What Happened Next!
Let’s get this out of the way: Much has been written about why a full software rewrite is a bad idea. I largely agree with the sentiment. It typically is a bad idea. It brings with it high risk and a high rate of failure. You should spend a meaningful amount of time considering the risks involved in a software rewrite before taking the plunge.
That said, I don’t particularly enjoy talking in absolutes, and I’m unconvinced that rewriting software is a clear-cut issue. Also, my team is coming up on the one-year anniversary of releasing Adobe Experience Platform Launch, a full rewrite of our previous tag manager, Dynamic Tag Management (DTM).
To understand the nature of the project, here is some background:
- The rewrite (meaning actual coding, not the design discussions that preceded it) started approximately four years ago.
- The rewrite was staffed by roughly six engineers on average from the beginning to the public release. That average is hard to pin down, since engineers had to spend time maintaining DTM while gradually transitioning to Launch. In other words, we started with zero engineering time and slowly ramped up until the full team was dedicated to the effort.
- DTM consisted of approximately 300,000 lines of code.
- Although we released the rewrite about a year ago, in some ways the effort continues today. Launch now has most of the features that existed in DTM and many that DTM never had, but we are still rewriting some less critical DTM features that have not yet made it into Launch.
By most accounts, the rewrite has been considered a success. Through the process, we learned quite a lot about how to execute a successful software rewrite and I’d like to share some of these learnings with you.
Rewrite for the right reason
“This isn’t how I would have written the code” doesn’t qualify as a good reason to rewrite a software project. Not even close.
In our case, DTM allowed Adobe customers to install and configure Adobe marketing technologies on their websites. Since DTM was created, the industry landscape and the needs of our users have shifted dramatically. While DTM did a respectable job of supporting Adobe marketing technologies, the number of non-Adobe marketing technologies grew into the thousands and showed no signs of slowing. Our users pleaded with us to make it easier to install and manage non-Adobe marketing technologies as well. We needed to adapt substantially or we would become irrelevant.
Because DTM was a monolithic repository wholly owned by the DTM team, it was obvious that our team would not be able to implement and maintain support for hundreds or thousands of technologies; we could barely keep up with supporting Adobe technologies. We had limited resources and weren’t experts in each individual marketing technology. We knew we needed to transform into a platform that would allow technology vendors and other third-party developers to build extensions that run on it. Vendors know their technologies best, and this way they could control their own release cycles.
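The difference between a monolith and a platform is easier to see in code. Below is a minimal Python sketch of a plugin model, assuming hypothetical names throughout (`Extension`, `Platform`, `send-beacon`, `acme-analytics`); it is not Launch’s actual extension API. The point is that the core contains no vendor-specific logic: it only dispatches to code that vendors register, so a vendor can ship a new version without a release of the core.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical: an action takes user-configured settings and returns a result.
Action = Callable[[dict], str]


@dataclass
class Extension:
    """A vendor-owned bundle of actions (illustrative, not Launch's real format)."""
    name: str
    version: str
    actions: Dict[str, Action]


@dataclass
class Platform:
    """The core only stores and dispatches; it knows nothing about specific vendors."""
    extensions: Dict[str, Extension] = field(default_factory=dict)

    def install(self, ext: Extension) -> None:
        # Installing a newer version replaces the old one, so vendors
        # control their own release cycle.
        self.extensions[ext.name] = ext

    def run(self, ext_name: str, action: str, settings: dict) -> str:
        # Look up the vendor's code and hand it the user's settings.
        return self.extensions[ext_name].actions[action](settings)


if __name__ == "__main__":
    platform = Platform()
    platform.install(
        Extension("acme-analytics", "1.0.0", {"send-beacon": lambda s: f"beacon:{s['id']}"})
    )
    print(platform.run("acme-analytics", "send-beacon", {"id": 42}))  # prints beacon:42
```

In a monolith, each of those actions would live in the core repository and ship on the core team’s schedule; here the core team maintains only `install` and `run`.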
Shifting to a platform would be no small task. It wouldn’t be as simple as refactoring a file here or there, or even hundreds of files. A platform required us to rethink from the ground up how the system fundamentally operated, how actors in the system would interact, and how entities would be modeled. After analyzing various strategies and long lists of pros and cons, it became evident that a full rewrite was the most appropriate approach for our situation.
Transforming from a monolithic, wholly-owned repo into a platform isn’t the only reason that might suggest a rewrite is the way to go. Here are a few others:
- The ramp-up time for new hires has become intolerably expensive and you’ve identified specific ways a rewrite would reduce this considerably.
- The codebase has become extremely fragile due to pervasive architectural problems. Tests have not overcome this fragility or have become intolerably expensive to write and maintain.
- The product is not sufficiently scalable due to pervasive architectural problems and other options for increasing scalability have been exhausted or are prohibitive.
- The technology used for the product makes recruiting unbearably difficult or expensive and is causing high employee attrition.
- Critical features or optimizations can’t be achieved without restructuring underlying architecture.
None of these reasons on its own is a clear indicator that a rewrite is necessary; each needs to be evaluated in the context of the specific software project.
For example, when I worked at ExxonMobil, the company housed a huge mainframe running COBOL that processed all transactions from every ExxonMobil gas station. Any change to the code was incredibly risky, and the system could not afford the luxury of downtime. Only a handful of people understood both the language and the system, and they were rewarded handsomely for their work.
For this particular software, a rewrite didn’t make sense. It was completely in maintenance mode, it was doing its job sufficiently well, it required infrequent changes, and the risk of a rewrite far outweighed the risk and cost of maintaining the existing system.
Also, consider the size of the project you are thinking of rewriting. In Software Estimating Rules of Thumb (a bit dated but still applicable), Capers Jones illustrates that as systems become larger, the cost per line of code increases dramatically.
In other words, as projects become larger, the cost of a rewrite grows disproportionately: a system twice the size costs more than twice as much to rewrite.
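To make that scaling concrete, here is a small Python sketch that assumes effort grows as a power of size with an exponent greater than 1. The exponent `b = 1.2` and constant `k = 1.0` are illustrative assumptions, not figures from Jones’s paper. Whenever the exponent exceeds 1, the effort per thousand lines (KLOC) rises with size, so doubling the size more than doubles the effort.

```python
def relative_effort(size_kloc: float, b: float = 1.2, k: float = 1.0) -> float:
    """Effort in arbitrary units, assuming effort = k * size**b.

    k and b are illustrative assumptions; any b > 1 gives the same qualitative result.
    """
    return k * size_kloc ** b


if __name__ == "__main__":
    # 300 KLOC is roughly the size of DTM; the smaller sizes are for comparison.
    for size in (10, 100, 300):
        effort = relative_effort(size)
        print(f"{size:>4} KLOC: effort {effort:7.1f}, effort per KLOC {effort / size:.2f}")
```

With `b = 1.0` the cost per line stays flat. The disproportionate growth comes entirely from the assumption that `b` is greater than 1.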
Take a deep, objective look at why you are considering a rewrite and make sure it’s for the right reasons.
A rewrite is more difficult than a write
It might seem a bit counterintuitive. After all, you now know the pitfalls of your existing system and likely have a better understanding of your use cases than when the existing system was written. Don’t be fooled.
First, it’s extremely difficult to properly evaluate the scope of an existing project. While gathering requirements for the rewrite of DTM, I would look at a particular user interface and write notes about its requirements: how it behaved and how that might map to the new system. Then, a week later, I would accidentally discover that checking two checkboxes in the UI revealed a subsection with additional options. Then, a month later, while reviewing code related to that section, I would notice that a certain combination of options behaved differently depending on the configuration of some other entity in a different part of the system. Pay attention to the details and assume you will miss something (many things, in fact) in the process.
Second, it’s easy to forget how much work was put into finding and fixing bugs in the existing system. If you’re rewriting the code, you can be sure you’re going to have your fair share of bugs that you must also find and fix. The stability of your code essentially resets to zero. On the flip side, you may stand to benefit if bugs were caused by poor architectural decisions that can be rectified with a rewrite.
Third, your existing system will likely be maintained or receive new features while it is also being rewritten. Any changes to the existing system must also be accounted for in the new system. Consider these changes in the scoping effort.
Fourth, migration is a significant effort in its own right. We’ll talk more about this later.
The existing system will still require work
The existing system isn’t going away when you begin the rewrite. Software still needs to be updated to patch security issues. Critical bugs will still need to be investigated and fixed. High-profile customers will still require changes. Other products in your company will still want or need integrations. Budget accordingly.
Release early and often
This principle applies to product management in general, but if the rewrite includes changes to the user interface, APIs, or behaviors, be sure to get feedback early.
With the rewrite from DTM to Launch, most of the core concepts carried over, but some changed dramatically. We also exposed new APIs and modernized our user interface. More than a year before the rewrite launched, we gave access to a group of real users who intended to use the rewritten product. Half of the rewrite wasn’t even complete at that point.
We paid great attention to user feedback. Based on this feedback, we redesigned and rewrote a core piece of the application four times until we got it right and users approved. These users came along with us for the ride and were candid in assessing and validating whether we were doing the right thing. I could sense they considered themselves to be an integral part of the building process — and they were. This was reflected in one of the highest NPS scores from an Adobe product to date.
When our general availability release date came, it was largely a formality. There were no surprises to our users. There were no new features revealed. In fact, there was no code release at all. It was uneventful and low-stress, which is exactly what we desired.
Reid Hoffman, a co-founder of LinkedIn, imparted some wisdom that I truly believe in:
If you are not embarrassed by the first version of your product, you’ve launched too late.
In our application, we have things called rules and data elements. They’re as integral as emails are to Gmail or posts are to Facebook. When we released Launch, a user couldn’t even delete rules or data elements. There were multiple significant features in DTM that did not yet exist in Launch. It was embarrassing, but it wasn’t an oversight. We knew we had built something that provided sufficient value for users — largely because they told us — and we knew we could add missing functionality soon afterward.
Have a migration plan
In our case, it wasn’t feasible to immediately replace DTM with Launch. DTM continues to exist alongside Launch today. Users must take the initiative to migrate from DTM to Launch.
We did not want to lose our users or make them frustrated, so we built a migration tool that made it as simple as possible for them to migrate resources from DTM to Launch. The user still needs to test the results and will most likely need to make some small changes, but the migration tool has greatly eased the burden of the transition.
Before we released the migration tool, we ran a test migration of all customer data. While we couldn’t fully verify that the results were logically and behaviorally sound for every user (this is where users must lend a hand), we could confirm that the data transformations yielded the expected data and did not throw errors.
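A test migration of this kind can be sketched in Python. The sketch assumes that records are dictionaries with an `id` and that `transform` and `validate` are hypothetical stand-ins for the real mapping and structural checks; none of these names come from the actual DTM-to-Launch tool. It runs every record through the transform without writing anything and collects failures instead of stopping at the first one. Like the real test migration, it can confirm only that the output is well formed. It cannot tell you whether a migrated rule behaves the way its owner intended.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple


@dataclass
class DryRunReport:
    migrated: int = 0
    errors: List[Tuple[str, str]] = field(default_factory=list)  # (record id, error message)
    invalid: List[str] = field(default_factory=list)  # ids whose output failed validation


def dry_run(records: Iterable[dict],
            transform: Callable[[dict], dict],
            validate: Callable[[dict], bool]) -> DryRunReport:
    """Transform every record without persisting anything; report every failure."""
    report = DryRunReport()
    for record in records:
        try:
            result = transform(record)
        except Exception as exc:  # a crash is a finding to fix, not a reason to stop
            report.errors.append((record["id"], str(exc)))
            continue
        if validate(result):
            report.migrated += 1
        else:
            report.invalid.append(record["id"])
    return report


# Hypothetical old-to-new rule mapping and structural check, for illustration only.
def to_new_rule(old: dict) -> dict:
    return {"name": old["name"].strip(), "events": [old["trigger"]]}


def is_valid_rule(new: dict) -> bool:
    return bool(new["name"]) and len(new["events"]) == 1


if __name__ == "__main__":
    records = [
        {"id": "r1", "name": "Page view", "trigger": "pageLoad"},
        {"id": "r2", "name": "   ", "trigger": "click"},  # blank name: fails validation
        {"id": "r3", "name": "Broken"},                    # missing trigger: raises KeyError
    ]
    print(dry_run(records, to_new_rule, is_valid_rule))
```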
It’s interesting to note that many users have decided to bypass the migration tool and instead have taken the opportunity to do some “spring cleaning” by rebuilding their resources in Launch from scratch. In part, this is due to cruft that has inevitably built up in their implementations over time as well as the changing landscape of the web. Another reason is that Launch allows them to do things more effectively than they could in DTM.
Shortly after we released Launch, we published a clear timeline for sunsetting DTM. It’s a tricky balance to get right, and we’ve already made a revision to the schedule. The schedule must be aggressive enough that you’re not spending an unreasonable amount of resources supporting the prior system while being lenient enough to give users time to make the transition. The more carrots you can provide in the new system, the quicker users will migrate. There must be carrots — abundant carrots.
You can bet that competitors will try to take advantage of this situation, especially if users must take any initiative at all to migrate to the new system. Their message will be, “If you have to migrate to a new system anyway, why not move to ours instead?” You can counter this by (1) providing a superior product and (2) making migration to your new product easier than migration to a competitor’s.
If you’re decommissioning your existing system at the same time the new system goes live (in other words, they never run simultaneously), you’ll have a slightly different set of concerns. Will the cutover be immediate, or must data be migrated? If it’s not immediate, will the system be down while migration scripts run? How might that affect the business? Have you done any testing on real data? What are the criteria for a successful migration? If the migration fails, is there a way to roll back?
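The last few questions (success criteria and rollback) can be made concrete. The Python sketch below assumes an in-memory key-value store and hypothetical `migrate` and `succeeded` functions. A real system would snapshot a database or keep the old data read-only rather than deep-copying a dict. The pattern is to take a rollback point first, run the migration, and accept the result only if explicitly defined success criteria pass.

```python
import copy
from typing import Callable, Dict

Store = Dict[str, dict]  # hypothetical: record id -> record


def cutover(store: Store,
            migrate: Callable[[Store], Store],
            succeeded: Callable[[Store, Store], bool]) -> Store:
    """Return the migrated store, or the untouched snapshot if anything goes wrong.

    Callers should use the returned store, because `migrate` may have
    mutated the one they passed in.
    """
    snapshot = copy.deepcopy(store)  # rollback point, taken before any change
    try:
        migrated = migrate(store)
    except Exception:
        return snapshot  # migration crashed: roll back
    if not succeeded(snapshot, migrated):
        return snapshot  # success criteria not met: roll back
    return migrated


# Hypothetical migration and success criteria, for illustration only.
def migrate_v1_to_v2(store: Store) -> Store:
    return {key: {**record, "schema_version": 2} for key, record in store.items()}


def same_count_all_v2(before: Store, after: Store) -> bool:
    # No records lost, and every record is on the new schema.
    return len(before) == len(after) and all(
        r["schema_version"] == 2 for r in after.values()
    )


if __name__ == "__main__":
    data = {"a": {"schema_version": 1}, "b": {"schema_version": 1}}
    print(cutover(data, migrate_v1_to_v2, same_count_all_v2))
```

Writing the success criteria as a function forces the team to answer “What are the criteria for a successful migration?” before the cutover rather than after it.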
Have a solid team
You must have organizational buy-in from the top to the bottom. Communicate the cost and benefits clearly. Have a plan. Define success and failure. Don’t attempt to hide the difficulty or investment required.
Consider the history and members of the team. Is there high turnover within the team? Have all the authors of the existing system left the team? Are team members new to the product or product domain? If you answered “yes” to any of these questions, the risk of a rewrite increases.
In the case of our rewrite, I was able to work with an excellent engineering and product team. About half the members had long-term experience with the existing system. The team bought into the vision and was fully on board. Remarkably, aside from a couple of engineers who were moved to a different team early on, nobody left the team during the rewrite, and everyone is still on the team today. Some of this is luck; most of it is treating your employees right. Treat employees in such a way that they will want to stay, and plan for when they don’t.
Conclusion
More than a year since Launch…launched, the number of marketing technologies supported through extensions developed by third-party developers is nearing 100 and growing at a rate of approximately 10% per month. Usage is increasing similarly, and users have reported greater satisfaction than with DTM. We could not have achieved this success if we had not taken the step of transforming into a platform and allowing third-party developers to participate. In my opinion, this would not have been feasible without a rewrite.
I hope you’ve found something informative in this article. Again, a rewrite is not for everyone, but I do believe it is for some. Be as realistic, objective, and transparent as possible and I hope you find success in whichever journey you take.
Follow the Adobe Tech Blog for more developer stories and resources, and check out Adobe Developers on Twitter for the latest news and developer products.