Why having ‘quality standards’ means you won’t build quality software

David Genn
Technology @ Goji
6 min read · Nov 1, 2015
Photo credit: http://n5.fotomaps.ru/tools.php

In an organisation with many development teams, how do we ensure that there is consistency in quality across the software these teams deliver?

As companies move from a small number of large apps to a more Service Oriented Architecture, they often end up building, deploying and maintaining many more apps. Previously, if a bad practice was identified, it might have meant updating a few apps; now it may mean fixing several dozen. To solve problems like this it can be tempting to introduce automated checkers that monitor apps for known issues or ‘bad smells’.

This approach can be really helpful in stopping well-meaning development teams from making accidental mistakes. These checks can act a bit like an automated checklist. Atul Gawande argued in his book ‘The Checklist Manifesto’ that checklists are a great way of ‘helping brilliant people avoid making dumb mistakes’ (paraphrase mine).

This automated checking for quality can run into problems when its governance is too centralised and the definition of ‘quality’ is taken out of the hands of the development teams. We’re going to look at why having a centralised definition of what a quality app looks like can actually result in lower quality software. Obviously this would never be our intention, so we’ll look at what we can do to help our development teams to intentionally produce quality applications.

Focus on excellence, not avoiding failure

In his excellent book ‘Turn the Ship Around!’, David Marquet describes his leadership journey as the commander of a US Navy nuclear submarine. As you can imagine, quality really matters on a nuclear submarine: not only are you carrying nuclear weapons and powered by a nuclear reactor, you are doing so with 130 or so men hundreds of feet under water. There is very little margin for error in this kind of environment, and the submarine crews have to follow very strict procedures and meet exacting standards. To ensure these standards are kept, the crews are subject to regular and stringent inspections and audits.

Having held senior positions on a number of boats throughout his career, Marquet observed that the submarines focussed on simply passing the inspections often performed the worst and had the lowest morale. These crews became afraid of failure, which stifled innovation, and no one was willing to take responsibility for fear of being blamed when (not if) something went wrong.

In contrast, the highest-performing crews (those that not only performed the best in the tests but also had the highest re-enlistment rates and saw the most promotions) did not focus on the tests. Instead they set their sights on being the best combat crew they could be, and then used the inspections as a helpful benchmark for checking their progress.

The same dynamics are at play in software development teams.

If we centralise the definition of what a quality app looks like and then rigidly enforce it, we run the risk of having teams that focus on ‘avoiding failure’ rather than ‘striving for excellence’.

Teams that see avoiding failure as their number one priority usually end up being defensive: they do just enough to clear the bar, but are afraid to innovate for fear of the consequences if they make mistakes. Whoever enforces the centralised definition of quality is seen as the ‘strict headmaster’ rather than a partner in the quest for operational excellence.

Making metrics useful

When a measure becomes a target, it ceases to be a good measure. (Charles Goodhart, 1975)

Goodhart’s observation originated in a 1975 paper on economics, arguing that monetary policy cannot be formulated purely from the metrics used to measure it.

Any metric that is used as a target can be ‘gamed’ so things look better than they are.

We see the same effect in software engineering. Code coverage, for example, gives teams insight into how they are testing their code. The metric is useful to the team because they understand the history of the app and know how far its tests can be trusted. If an app has 60% coverage and those tests are trustworthy, a drop in the percentage would be a cause for concern. If the tests are brittle and erroneous, a drop in coverage may, under some circumstances, be a good and deliberate thing!

However, if the same team is given a target of 70% coverage, it would be very easy to make up the missing 10% with pointless, but easy to write, tests.
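To make that gaming concrete, here is a minimal Kotlin/JUnit 5 sketch; the FeeCalculator class and its fee rule are invented purely for illustration. The test executes every line of the production code, so a coverage tool counts those lines as covered, yet it asserts nothing and so can never fail:

```kotlin
import org.junit.jupiter.api.Test

// Hypothetical production class, included only to keep the example self-contained.
class FeeCalculator {
    fun feeFor(amountInPence: Long): Long =
        if (amountInPence <= 0) 0 else amountInPence / 100 // a made-up 1% fee rule
}

// A 'coverage test': it drives both branches of feeFor, so the class shows up as
// fully covered, but there are no assertions, so no regression will ever be caught.
class FeeCalculatorCoverageTest {

    @Test
    fun `exercises the calculator without checking anything`() {
        val calculator = FeeCalculator()
        calculator.feeFor(10_000) // result silently discarded
        calculator.feeFor(0)      // covers the other branch, still verifies nothing
    }
}
```

Ten minutes of tests like this will move the percentage without adding an ounce of safety.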

Sandro Mancuso, in his excellent book The Software Craftsman, describes a company he once worked for that was hired to keep the test coverage of a body of code at a certain percentage. The team writing the application never used the tests; they were simply there to meet a centrally set target.

Any metric that becomes a target is open to this kind of abuse, and as a result the metric loses its value. Even well-intentioned teams can skew the way they view a metric and start to game it if there is external pressure to do so.

Instead, we should let teams collect the metrics that are useful to them in understanding their own performance. If targets are needed (and sometimes they are: percentage uptime, for example, is often measured in regulated environments), they should be kept separate from the team’s metrics, and a clear explanation should be offered for why the target is important.
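One practical way to keep that separation, sketched here as an assumption rather than a prescription (a Gradle 7+ Kotlin DSL build using the JaCoCo plugin), is to publish coverage on every build so the team can watch the trend, while deliberately not wiring any coverage threshold into the build’s checks:

```kotlin
// build.gradle.kts: a minimal sketch of 'coverage as a metric, not a gate'
plugins {
    java
    jacoco
}

repositories {
    mavenCentral()
}

dependencies {
    testImplementation("org.junit.jupiter:junit-jupiter:5.10.2")
    testRuntimeOnly("org.junit.platform:junit-platform-launcher")
}

tasks.test {
    useJUnitPlatform()
    finalizedBy(tasks.jacocoTestReport) // report coverage after every test run
}

tasks.jacocoTestReport {
    dependsOn(tasks.test)
    reports {
        html.required.set(true) // for the team to browse
        xml.required.set(true)  // for dashboards and trend graphs
    }
}

// Note what is *not* here: jacocoTestCoverageVerification is never wired into `check`,
// so no centrally imposed percentage can fail the build. The number informs the team;
// it does not govern it.
```

If an externally mandated threshold really is required, it can live in a separate, clearly explained verification task rather than hiding inside the team’s everyday feedback loop.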

There is often a temptation to set a team an ambitious target in the belief it will help them drive up performance. This is rarely effective and too easily shifts a team into ‘avoiding failure’ mode.

The tragedy of the commons

Garrett Hardin was the ecologist who coined the term ‘the tragedy of the commons’ to describe the social dynamics at play when a group of people has access to a common resource, such as grazing land: will they use the resource in the common interest, or will they act selfishly?

There is often a belief that people tend towards a selfish mindset, and we often believe the same of software development teams: given the chance, developers won’t worry about the effect of their code on other systems or teams, only about the impact on themselves. In other words, for development teams to build quality applications they need rules and standards, and penalties need to be doled out if they break them.

In their book ‘The Lean Mindset’, Mary and Tom Poppendieck cite research from Nobel Prize-winning economist Elinor Ostrom, who set out to question whether people really need as much centralised regulation as we might believe in order to preserve shared resources. She found numerous counter-examples demonstrating that self-governance can work very effectively in social groups. Ostrom’s research identified eight characteristics that these groups had in common:

  1. Clearly defined community boundaries
  2. The rules used are well matched to local conditions
  3. Most individuals who are affected by the rules can participate in defining the rules
  4. The community sets up a system for monitoring compliance
  5. A system of graduated sanctions is used
  6. Low-cost conflict resolution mechanisms are available
  7. External authorities respect the right of the community to be self-governing
  8. Governance activities are organised in multiple layers of nested enterprises

All these principles apply to groups of software development teams. Which leads us to the central point of this article:

The software development teams themselves are in the best place to define and enforce software application quality.

Conclusion

Given the three points this article makes (1. focus on excellence, not avoiding failure; 2. use metrics, not targets; 3. self-governance works), I’d like to suggest the following conclusions:

  1. Software quality matters: we may not be on a nuclear submarine, but whatever our software does will suffer if we don’t focus on quality.
  2. The software development teams should be setting their sights high — focussing on achieving excellence, not merely avoiding failure
  3. Metrics like code coverage or error rates can provide rich insight into a team’s performance but lose their value when they become a target
  4. Organised self-governance is the most effective way of creating a culture that both has well-defined quality standards and puts the responsibility for meeting them in the hands of the development teams

Let’s start trusting that our teams, given the chance, want to produce the best code they can. Let’s use Ostrom’s eight principles to create a culture where teams have both the authority and the responsibility for the quality of our software architecture. Let’s start aiming for excellence, and not settle for simply avoiding failure.
