Three Tips for Managing Technical Debt While Maintaining Developer Velocity (and Sanity)

Every developer knows technical debt. No good app starts out as a legacy app full of spaghetti code: it takes a journey to get there, and often that journey is a short one. Because technology evolves constantly, apps and services written as recently as two years ago can already be running on an outdated stack or architecture that grows buggier, slower, and harder to maintain with every passing day. Once upon a time, the LAMP stack was the de facto standard for web applications; today it is considered legacy. The same is true of other once-cutting-edge technologies, from JBoss to VMware. And it will be true of the code you are writing right now.

Technology is a constantly moving target, and this is what causes even the best engineering teams to accumulate technical debt.

Time to market is often a serious consideration in the first iterations of an architecture and design, and getting to market with an MVP and iterating afterward is the right business decision for a nascent company. However, once adoption starts to grow, it quickly becomes clear that the application wasn’t built for scale or resilience, and it begins to show signs of wear and tear: it becomes slow or buggy, or even breaks.

The problem with legacy applications is that they are often the most profitable applications in the company, used by many customers, so migrating them to newer technologies is no simple undertaking.

According to Martin Fowler:

“Technical Debt should be reserved for cases when people have made a considered decision to adopt a design strategy that isn’t sustainable in the longer term, but yields a short term benefit, such as making a release. The point is that the debt yields value sooner, but needs to be paid off as soon as possible.”

Technical Debt Horror Stories

Not all companies understand that ignoring technical debt early on means spending far more resources on it later, when there is no choice but to deal with it. Even large web-scale companies learn this the hard way. In 2013, when Microsoft launched a new SaaS version of Visual Studio, the company was surprised to find that its servers could not handle the load generated by millions of users of the new online service, because the system wasn’t built for such an influx of simultaneous requests. After a seven-hour outage, the company realized that it had erred in how it handled the product’s technical debt.

Common Approaches to Managing Technical Debt

Approaches to grappling with technical debt vary from company to company. Some companies hold a “Quality Week” or a hackathon from time to time, in which they tackle the most pressing issues. But treating technical debt as a one-off or periodic problem is often not effective enough.

The common insight is that such patchwork solutions mostly touch only the edges of the issue and fail to address its root. This kind of superficial activity usually does not deal with the truly difficult debt, such as the need to replace an entire technology with a more suitable one, but rather with small-scale refactoring and bug fixes. The more critical problems cannot be solved in a week or a hack day.

In the end, after such a week, developers return to the roadmap and keep running into the issues caused by the truly complex technical debt, which continues to accumulate with ongoing work.

One way to cope with technical debt is to maintain the status quo while minimizing the amount of new debt created, until the technology can be migrated or rewritten. This is often the default when we’re afraid to touch legacy code for fear of breaking it, and the risk is that we don’t notice when a “minor” pull request merge has grave consequences.

Regular maintenance (e.g., running the full test suite) or scheduled deployments of legacy services help us avoid production issues that would otherwise have us “digging” through months of pull requests to find out what broke and when, and spending hours or days restoring full functionality to the production service.
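To make the cost of that “digging” concrete: finding the pull request that broke a legacy service is essentially a binary search over the merged changes, which is what `git bisect` automates. The Python sketch below is only an illustration; the `first_bad_change` helper and the list of PRs are hypothetical, and it assumes that once a change breaks the tests, every later change is broken too. Even with binary search, every probe is a full test run, so a couple of hundred unchecked PRs still cost several potentially slow runs, whereas running the suite regularly keeps the suspect range short.

```python
from typing import Callable, Optional, Sequence, Tuple


def first_bad_change(
    changes: Sequence[str], is_broken: Callable[[str], bool]
) -> Tuple[Optional[int], int]:
    """Return (index of the first breaking change, number of test runs used).

    Hypothetical helper. Assumes `changes` is in merge order, the state before
    changes[0] was known-good, and breakage is monotonic (once broken, stays
    broken). Returns (None, runs) if no change is broken.
    """
    lo, hi = 0, len(changes)  # the first bad change lies in [lo, hi)
    runs = 0
    while lo < hi:
        mid = (lo + hi) // 2
        runs += 1  # each probe stands for one full test-suite run
        if is_broken(changes[mid]):
            hi = mid  # the culprit is mid or earlier
        else:
            lo = mid + 1  # the culprit comes after mid
    return (lo if lo < len(changes) else None), runs


# Illustrative data: 200 merged PRs, where PR-137 introduced the regression.
prs = [f"PR-{n}" for n in range(1, 201)]
idx, runs = first_bad_change(prs, lambda pr: int(pr.split("-")[1]) >= 137)
print(prs[idx], runs)  # → PR-137 8
```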

Our Philosophy on Technical Debt

At AppsFlyer we are well acquainted with technical debt, and we have found that the most effective way to handle it is to make sure it is continuously taken into account as part of the technological decisions we make on an ongoing basis. Technical debt is a natural part of the engineering process. Once the R&D organization understands this, continuously managing the challenges it creates and mitigating its risks become an integral part of all development planning.

Make time for refactoring and dealing with technical debt.

The significant increase in AppsFlyer’s traffic volume, from hundreds of millions of events per day to more than 55 billion HTTP requests today, required us to learn how to build flexible systems designed for elasticity and resilience. It also made clear how important it is to focus on areas with a large accumulation of technical debt, which, if left unmanaged, can ultimately lead to cascading failures.

One of the critical goals for us is to preserve our startup DNA, with high developer velocity and a rapid pace of growth and change, while moving into our next phase of maturity as a well-established company. This can only be achieved through a continuous effort to keep technical debt to a minimum, and that requires a company-wide culture of code craftsmanship and engineering excellence. After all, a system is only as strong as its weakest link.

AppsFlyer has taken some real risks when it comes to managing technical debt, from switching out its core data store in production (see the excellent post “Large Scale NoSQL Database Migration Under Fire,” which dives into that story) to rewriting mission-critical services from Clojure to Go to improve functionality. Often we do this to stay ahead of the curve. If technical debt is postponed for too long, critical breakage can happen when you least expect it, and you’ll find yourselves in panic mode with production down and all of your engineering diverted for days or weeks to restoring functionality.

Another mistake many companies make is to hand the job of managing legacy applications to a junior developer, which is a gross miscalculation. Technical debt and legacy applications need to be managed by senior engineers who understand the entire system as a whole, and how even minor changes to complex code can affect other mission-critical services.

Three Tips We’ve Found Helpful

To wrap it up, here are some tips we learned the hard way for managing technical debt:

  1. Always stay aware of the problem. Continuously weigh the current status against the risk of migrating to newer systems and technologies, and do not postpone the migration indefinitely. The longer it is postponed, the larger the debt grows, and the harder the problem becomes to manage and the more it threatens the core business. When legacy code is known to generate bugs and issues on an ongoing basis, and touching it can mean weeks of work just to repair what broke, it is time to start planning a migration to newer technologies.
  2. Do not be afraid to write code knowing that it is only an interim solution and will be tossed out in a few months. This is part and parcel of building a more robust, long-term solution. Often we need a bridge between two technologies to maintain our SLAs and business continuity. This is technical debt in Martin Fowler’s loan-and-interest sense: the goal is important enough that it is worth paying the ‘interest’ later.
  3. Reward, and even celebrate, the success of engineers who focus on reducing technical debt, even more than the release of new features. Give the challenge of managing legacy code to senior engineers, who will take ownership of the mission-critical parts of the system that directly affect customers and production, and make sure the assignment is perceived as a vote of confidence and a mark of seniority. This is part of the engineering culture you create.
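The “bridge” in tip 2 can take many forms. One common pattern when migrating a data store in production is a dual-write adapter: every write goes to both the legacy and the new store, while reads stay on the legacy store until the new one is trusted. The sketch below is a hypothetical illustration of that general pattern, not AppsFlyer’s implementation; `KeyValueStore`, `DictStore`, `BridgeStore`, and the `read_from_new` flag are names invented for this example.

```python
from typing import Dict, List, Optional, Protocol


class KeyValueStore(Protocol):
    """Minimal interface both stores are assumed to share (hypothetical)."""

    def get(self, key: str) -> Optional[str]: ...
    def put(self, key: str, value: str) -> None: ...


class DictStore:
    """In-memory stand-in for a real data store (hypothetical)."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self.data.get(key)

    def put(self, key: str, value: str) -> None:
        self.data[key] = value


class BridgeStore:
    """Interim dual-write adapter, meant to be deleted after the cutover.

    Writes go to both stores; reads come from whichever store is currently
    authoritative. The legacy store stays the source of truth until
    `read_from_new` is flipped, so rolling back means flipping it back.
    """

    def __init__(self, legacy: KeyValueStore, new: KeyValueStore) -> None:
        self.legacy = legacy
        self.new = new
        self.read_from_new = False
        self.pending: List[str] = []  # keys whose write to the new store failed

    def put(self, key: str, value: str) -> None:
        self.legacy.put(key, value)  # must succeed: legacy is still authoritative
        try:
            self.new.put(key, value)
        except Exception:
            # Don't fail the request because the new store misbehaved;
            # record the key so a reconciliation job can retry it later.
            self.pending.append(key)

    def get(self, key: str) -> Optional[str]:
        store = self.new if self.read_from_new else self.legacy
        return store.get(key)


legacy, new = DictStore(), DictStore()
bridge = BridgeStore(legacy, new)
bridge.put("user:1", "alice")
bridge.read_from_new = True  # cutover, once the new store has been verified
print(bridge.get("user:1"))  # → alice
```

The adapter is deliberately disposable: once reads have moved to the new store and the pending keys have been reconciled, it and the legacy writes can be deleted. Records written before the bridge was deployed would still need a separate backfill, which this sketch leaves out.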

Come meet AppsFlyer at DevOps Days Tel Aviv!

And stay tuned for a recap of the excellent talk by our very own Michael Arenzon on Deploy & Destroy Testing environments. More to come…

(We’re hiring, btw.)