The Andon cord and ITSM’s DevOps challenge

If enterprise IT organisations seek to understand and integrate DevOps, they need to be aware of some of the key influences on the movement’s culture, such as its roots in Lean manufacturing.

One famously visible feature of the Toyota Production System is the Andon cord. Installed at each station of Toyota’s Lean manufacturing lines, the cord is within reach of every production worker. Anyone on the line can pull the cord (or push a button) to indicate that something is wrong. Pulling the cord triggers remedial action. It may even stop the line, so that the issue can be resolved before production restarts.

With Andon, if something is wrong, it does not need to be escalated laboriously up a chain of command. Any worker on the production line, whatever their pay grade or experience, has the right to pull the cord.

As a result, the Andon cord represents a degree of autonomy that exceeds most industrial conventions. This autonomy is one of the cultural building blocks of Lean methods. It is important for any outside observer seeking to understand a Lean-derived movement like DevOps to recognise this.

I have frequently discussed IT Service Management with DevOps practitioners. Sentiment isn’t always positive, particularly when certain topics are raised: If Andon is one symbol of the Lean movement, then ITIL’s Change Advisory Board (CAB) meeting is a potent symbol of ITSM for many in the DevOps world.

Remember: a key goal for DevOps teams is the establishment of a high cadence of trusted, incremental production releases. The CAB meeting is often seen as the antithesis of this: a cumbersome and infrequent process, sucking a large number of people into a room to discuss whether a change is allowed to go ahead in a week or two, without in reality doing much to ensure the safe implementation of that change.

This perception is perhaps unfair to a degree, but it has significant grounding in reality. I have myself sat through dozens of these meetings in previous roles. There were occasions when I’d probably have pulled the ceiling down, had there been an Andon cord attached to it.

Fortunately, ITSM has evolved its thinking over the years, and plenty of effort has been made to adapt its messaging. This article from ITIL custodians Axelos, for example, points out that…

Having the CAB review every single change request isn’t efficient, and it’s definitely not common sense. However, having the CAB review change requests of unknown risk, when parts of the business need to be consulted because they might be impacted, makes a lot of sense.

This might seem reasonable at first glance. However, does it really pass what might be called the “Andon test”?

One issue that stands out in the statement is the notion of changes “of unknown risk”. Much of the DevOps movement’s achievement has been built on removing the concept of “unknown risk” through automated testing and a safe-by-design deployment pipeline. This is illustrated nicely in Gene Kim’s brilliant article about the transformation of HP’s LaserJet firmware team:

To support self-testing builds, they built a set of automated unit, acceptance and integration tests, which would continually be run against trunk. Furthermore, they created a culture that “stopped the line” anytime a developer checked in code that broke the build, broke a unit test, etc.

This is Andon, automated: nothing gets to production if anything is going to break production.
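To make the idea concrete, here is a minimal sketch in Python of a "stop the line" gate. It is not the HP team's actual tooling: the names `Check`, `PipelineResult` and `run_pipeline` are hypothetical, and the checks are stand-in functions rather than real test suites. The point it illustrates is simply that the first failing check halts everything, and the release step is never reached.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Check:
    """One automated gate, e.g. a unit or integration test suite (hypothetical)."""
    name: str
    run: Callable[[], bool]  # returns True when the check passes


@dataclass
class PipelineResult:
    released: bool
    failed_check: Optional[str] = None  # name of the check that stopped the line


def run_pipeline(checks: List[Check], release: Callable[[], None]) -> PipelineResult:
    """Run checks in order; the first failure stops the line before release."""
    for check in checks:
        if not check.run():
            # The automated equivalent of pulling the cord:
            # later checks and the release step are skipped.
            return PipelineResult(released=False, failed_check=check.name)
    release()  # only reached when every check has passed
    return PipelineResult(released=True)


# Example usage with stand-in checks (assumed for illustration only).
result = run_pipeline(
    checks=[
        Check("unit tests", lambda: True),
        Check("integration tests", lambda: False),  # simulated breakage
        Check("acceptance tests", lambda: True),
    ],
    release=lambda: print("deploying to production"),
)
print(result)
```

In this sketch any check can stop the release, regardless of who wrote the change or where in the sequence the check sits, which is the property that makes the comparison with the Andon cord apt.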

But even if we set aside this point, there is still another significant issue with the Axelos statement: the assertion that CAB meetings “make a lot of sense” for identifying and managing stakeholder risk.

Here’s one problem with this: DevOps champions make the case that DevOps is reducing risk far more effectively than legacy processes. The 2016 State of DevOps Report (a commercially sponsored paper, but one which builds on academic research by Nicole Forsgren and Jez Humble) argues that the difference between “high performing” and “low performing” organisations is considerable. High performing organisations (that is, those that have embraced the best practices of Agile development and DevOps) were found to be achieving incredible numbers:

  • 200 times more frequent deployments than low performers
  • 2,555 times faster lead times
  • 24 times faster recovery times
  • Three times lower change failure rates

Fundamentally, these are all leading or trailing measures of the effectiveness of the Change Management process. So what does this do for the assertion that a structured CAB meeting “makes sense”? Why should this particular structure be accepted as the means of achieving the goal of determining impacts and involving stakeholders? DevOps seems to be doing just fine without it.

In his book Toyota Kata, Mike Rother highlights the problem with focusing on existing solutions rather than on new ways of solving challenges:

Toyota opens its factory doors to us again and again, but I imagine Toyota’s leaders may also be shaking their heads and thinking, “Sure, come have a look. But why are you so interested in the solutions we develop for our specific problems? Why do you never study how we go about developing those solutions?” Since the future lies beyond what we can see, the solutions we employ today may not continue to be effective. The competitive advantage of an organization lies not so much in the solutions themselves but in the ability of the organization to understand conditions and create fitting, smart solutions.

There is plenty of opportunity in enterprise IT to find such solutions. DevOps isn’t perfect, because no methodology is perfect. The business structure of an IT organisation is broader and deeper than DevOps, and there are many challenges to be solved now and in future. ITSM may offer the best solutions to many of those challenges, but that value needs to be proven, not asserted.