Dark Debt

John Allspaw
Nov 27, 2017

This is an excerpt from the Stella Report, which is the result of the first project “Coping With Complexity” performed by the Resilience Engineering In Business-Critical Software Consortium (“SNAFUcatchers”). A video introduction to the report and the consortium is here.

Dark debt is so named to draw a parallel with dark matter. Dark matter has detectable effects on the world but cannot be seen or detected directly. Matter that can be seen and measured directly accounts for only about 15% of the mass of the universe; the remaining 85% is dark matter.

In contrast with technical debt, dark debt:

  • arises from unforeseen interdependencies,
  • is invisible until revealed by anomalies,
  • is a product of creeping complexity,
  • cannot be seen by looking at pieces,
  • cannot be addressed by specific countermeasures, which are too narrow, and
  • is the source of SNAFUs that cascade to failure (technical debt is not).

(below begins the excerpt from the Stella Report, pg. 24)

4.6 Dark Debt

There was a wide-ranging discussion regarding decisions during development and the liabilities they introduce. In addition to ‘technical debt’ another sort of liability, dark debt, was suggested. This section reviews technical debt and proposes the notion of dark debt.

4.6.1 Technical debt

Origins of the debt metaphor:

In a 1992 “Experience Report”, Cunningham suggested that software development may incur future liability in order to achieve short-term goals in this oft-quoted portion of an object-oriented programming conference proceedings paper:

“Shipping first-time code is like going into debt. A little debt speeds development so long as it is paid back promptly with a rewrite. Objects make the cost of this transaction tolerable. The danger occurs when the debt is not repaid. Every minute spent on not-quite-right code counts as interest on that debt. Entire engineering organizations can be brought to a stand-still under the debt load of an unconsolidated implementation, object-oriented or otherwise.” Cunningham, 1992

The choice of 'debt' as the metaphorical foundation was, according to Cunningham, prompted partly because the system being developed, WyCash, was for use by institutional investors who understood debt as a technical management tool -- one part of their portfolio of regular methods of work.*

The paper in which this suggestion appears was not about debt per se but about the way that object-oriented programming was changing the way in which big systems were developed:

"...changing market demands often require massive revisions which we have been able to accommodate because of the modularity intrinsic in a totally object-oriented implementation. Our customers value our responsiveness as much as, if not more than, our product’s fit to their current needs... Mature sections of the program have been revised or rewritten many times..."

"...key implementation ideas were slow to emerge [during] development... [a] category of objects only surfaced through a process we could call Incremental Design Repair. We found these highly leveraged abstractions only because we were willing to reconsider architectural decisions in the light of recent experience. ...[P]ure object oriented programming... allowed us to include architectural revisions in the production program that would be judged too dangerous for inclusion under any other circumstance." [emphasis added]

This was a time of change, a decade prior to the "Manifesto for Agile Software Development". The waterfall development cycle was firmly entrenched and object-oriented programming was still novel.

"The traditional waterfall development cycle has endeavored to avoid programming catastrophe by working out a program in detail before programming begins. We watch with some interest as the community attempts to apply these techniques to objects. However, using our debt analogy, we recognize this amounts to preserving the concept of payment up-front and in-full. The modularity offered by objects and the practice of consolidation make the alternative, incremental growth, both feasible and desirable in the competitive financial software market." Cunningham, 1992

Cunningham's thesis was that the object-oriented programming method created an opportunity to build systems quickly, to deploy them, and from their use to discover new abstractions that could then be incorporated into the software. The advantage that objects and, in particular, inheritance brought to the party was the ease with which these changes could be made.

Technical debt and refactoring

A decade after Cunningham's paper, Fowler (2003) described technical debt as:

"...doing things the quick and dirty way... [After which, i]nterest payments... come in the form of the extra effort that we have to do in future development because of the quick and dirty design choice. We can choose to continue paying the interest, or we can pay down the principal by refactoring the quick and dirty design into the better design." [emphasis added]

In Fowler's formulation, technical debt is "that which can be corrected by refactoring". Refactoring

"...is a disciplined technique for restructuring an existing body of code, altering its internal structure without changing its external behavior."

Fowler and others have developed guides for refactoring (Fowler et al., 1999). Improving internal structure makes the software "cleaner" and, it is claimed, easier to understand, maintain, and modify. In a setting where frequent revision is expected, the benefit of clean software is to make these activities easier. Refactoring is not itself productive because it does not change the software's external behavior. Thus refactoring "pays back" technical debt but does not produce immediate value for users. Technical debt makes development less efficient, which makes new development harder; this inefficiency is -- in the language of the metaphor -- the 'interest' paid on technical debt.
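As a minimal illustration of the definition of refactoring quoted above (not taken from the report or from Fowler's guides), the Python sketch below shows a "quick and dirty" function and a refactored version with the same external behavior. The function names, the instrument kinds, and the fee rule are all hypothetical and invented for this example.

```python
# Hypothetical example: names, kinds, and fee rules are invented for illustration.

def fee_quick_and_dirty(kind, amount):
    # The fee rule is repeated inline for each kind; adding a kind means
    # editing this chain and copying the formula again.
    if kind == "bond":
        return round(amount * 0.001 + 5.0, 2)
    elif kind == "stock":
        return round(amount * 0.002 + 5.0, 2)
    elif kind == "fund":
        return round(amount * 0.0015 + 5.0, 2)
    raise ValueError(f"unknown kind: {kind}")


# Refactored: the same rules expressed as data plus a single formula.
BASE_FEE = 5.0
RATES = {"bond": 0.001, "stock": 0.002, "fund": 0.0015}


def fee_refactored(kind, amount):
    try:
        rate = RATES[kind]
    except KeyError:
        raise ValueError(f"unknown kind: {kind}") from None
    return round(amount * rate + BASE_FEE, 2)


def same_external_behavior(cases):
    """Return True if both versions agree on every (kind, amount) case."""
    return all(
        fee_quick_and_dirty(kind, amount) == fee_refactored(kind, amount)
        for kind, amount in cases
    )
```

The check `same_external_behavior(...)` stands in for the tests that usually guard a refactoring: the internal structure changes, while callers see the same results and so receive no new value from the change.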

There is a tension here. Taking on technical debt can make it easier to bring improvements to the user quickly but such debt will make it more difficult to do so in the future. Refactoring will remove ("pay back") technical debt and make further development easier but the effort used for refactoring is not available to add new functionality for users. Software development must strike a balance between these two extremes by wise choices based on accurate assessments. Accepting too much technical debt in order to bring product features to the customer may doom the long-term viability of the product by making it impossible to revise in the future. In contrast, concentrating exclusively on keeping the software spotlessly clean may cause the enterprise to miss opportunities for improving the current product and make it less competitive.

Technical debt 25 years on

Like many useful metaphors, technical debt has been expanded and exploited, sometimes in ways that do not do justice to the original notion (Stopford, Wallace, & Allspaw, 2017). There are now elaborate measurement tools, financial calculators, and programs that seek to quantify, track, and manage technical debt. The ease with which 'debt' and 'interest' can be understood can make the issues surrounding software design, development, and maintenance seem simple and easily managed. Managers with little understanding of software may perhaps be forgiven for so eagerly grasping the metaphor that so strongly resonates with finance. This is perhaps an example of the hazard of metaphor: it can encourage inaccurate or even misleading analogic reasoning.

The theme of technical debt is intertwined with organizational issues. Accounting for technical debt is not done at an organizational level; it is done at a team or individual level. The organization has little idea of how much technical debt it 'carries' in its code, and paying down technical debt is notoriously difficult to make visible to those setting business-level priorities. There is an expectation that technical debt will be managed locally, with individuals and teams devoting just enough effort to keep the debt low while still keeping the velocity of development high.

Critically, technical debt is, by definition, appreciated prior to its creation, visible in code, and can be eliminated by refactoring.

4.6.2 Dark debt

The three anomalies discussed in the workshop arose from unappreciated, subtle interactions between tenuously connected, distant parts of the system. It was proposed during the workshop that the anomalies revealed a particular type of vulnerability that one participant described as “dark debt” because the vulnerability was not recognized or recognizable until the anomaly revealed it.

Events that have the dark debt signature include:

  1. Knight Capital, August 2012
  2. AWS, October 2012
  3. Medstar, April 2015
  4. NYSE, July 2015
  5. UAL, July 2015
  6. Facebook, September 2015
  7. GitHub, January 2016
  8. Southwest Airlines, July 2016
  9. Delta, August 2016
  10. SSP Pure broking, August 2016

In each instance, the failure was generated by mechanisms unappreciated prior to the event. The event revealed the interaction potential of the contributors. Like the anomalies discussed during the workshop, these events were surprises. It takes an anomaly to bring the contributors and the interactions into view.

Whence cometh dark debt?

Dark debt is found in complex systems and the anomalies it generates are complex system failures. Dark debt is not recognizable at the time of creation. Its impact is not to foil development but to generate anomalies. It arises from the unforeseen interactions of hardware or software with other parts of the framework. There is no specific countermeasure that can be used against dark debt because it is invisible until an anomaly reveals its presence.

Dark debt is a product of complexity. To a large extent, adding complexity is unavoidable as systems change. Systems are designed and constructed from components that are expected to fail. This leads to incorporation of layers of defense against failure. Architectures, distributed systems, failovers, backups, exceptions and exception handlers, encapsulation, and other aspects of IT are explicit recognitions of the potential for failure. These layers contain multiple, constantly shifting, apparently innocuous defects. The logic of design ensures that no single fault can generate an anomaly.
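A toy sketch may help make the point about layered defenses concrete. It is not drawn from the workshop anomalies, and real dark debt is by definition not known in advance. The Python below assumes a hypothetical stack of layers, each of which defends itself with a modest retry policy. Each policy looks harmless when examined alone; the amplification appears only when the layers are composed and the lowest layer fails. The function names and retry counts are assumptions made for illustration.

```python
# Toy model: layer structure and retry counts are assumptions for illustration.

def call_with_retries(operation, retries):
    """Call operation(); on RuntimeError, retry up to `retries` more times."""
    for attempt in range(1 + retries):
        try:
            return operation()
        except RuntimeError:
            if attempt == retries:
                raise  # out of retries: propagate the failure upward


def count_backend_calls(retries_per_layer):
    """Count calls reaching a failing backend through stacked retry layers."""
    calls = 0

    def backend():
        nonlocal calls
        calls += 1
        raise RuntimeError("backend unavailable")

    operation = backend
    for retries in retries_per_layer:
        # Wrap the current operation in one more retrying layer.
        operation = lambda op=operation, r=retries: call_with_retries(op, r)

    try:
        operation()
    except RuntimeError:
        pass  # the request ultimately fails; only the call count matters here
    return calls


print(count_backend_calls([2]))        # one layer, two retries: 3
print(count_backend_calls([2, 2, 2]))  # three such layers stacked: 27
```

No single layer contains a defect in this sketch. Inspecting any one retry policy shows three attempts at most, yet the composed system sends 27 calls to a backend that is already failing.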

The challenge of dark debt is a difficult one. Because it exists mainly in interactions between pieces of the complex system, it cannot be appreciated by examination of those pieces. After anomalies have revealed the relationships they appear obvious but the appearance is mainly hindsight bias (Woods & Cook, 1999). The existence of dark debt poses a substantial challenge to system owners. Unlike technical debt, which can be detected and, in principle at least, corrected by refactoring, dark debt surfaces through anomalies. Spectacular failures like those listed above do not arise from technical debt. Critics of the notion of dark debt will argue that it is preventable by design, code review, thorough testing, etc. But these and many other preventative methods have already been used to create those systems where dark debt has created outages.

— —

*Cunningham W, personal communication, 2017.

References

Cunningham W. (1992) The WyCash Portfolio Management System. OOPSLA’92 Object Oriented Programming Systems, Languages and Applications, Vancouver, B.C., Canada — October 18–22, 1992. NY: ACM, 29–30.

Fowler M, Beck K, Brant J, Opdyke W, Roberts D. (1999) Refactoring: Improving the Design of Existing Code. Boston: Addison-Wesley Professional.

Fowler, M. (2003, October 1). Technical Debt. Retrieved October 4, 2017, from http://martinfowler.com/bliki/TechnicalDebt.html

Stopford, B., Wallace, K., & Allspaw, J. (2017). Technical Debt: Challenges and Perspectives. IEEE Software, 34(4), 79–81. doi:10.1109/ms.2017.99

Woods, DD, Cook RI (1999) Perspectives on Human Error: Hindsight Biases and Local Rationality. In Durso FT, Nickerson RS, Schvaneveldt RW, Dumais ST, Lindsay DS, Chi MTH, eds. Handbook of Applied Cognition. NY: John Wiley & Sons Ltd, 142–171.
