Escape the trap! Avoiding traps in software development with systems thinking

This is the second of a two-part post on systems thinking in software development. Like any sequel, it’ll make more sense if you’ve already seen part 1, and like any good sequel, it attempts to resolve some unanswered questions posed by its predecessor.

Part 1 introduced 8 system archetypes as described by Dana Meadows in Thinking in Systems: A Primer and described them from my frame of reference as a software engineer. These system archetypes form traps that can lead to negative outcomes in many areas of life, and software engineering, or knowledge work in general, is no exception. We looked at the management of technical debt, collective code ownership, legacy code, motivation, incentives, code metrics, bug fixing, productivity and Agile, and saw that their challenges are symptomatic of the exact same systemic patterns that we see in global issues, ranging from decreasing fish stocks to addiction to the arms race.

I specifically avoided providing answers in part 1, preferring to stimulate thought through inquiry rather than advocacy. Entire Amazon wish-lists have been written on the topic of addressing these challenges so a full thesis might be beyond the means of a mere blog post but I hope to be able to shine some light on these traps, with the goal of helping you (if indeed you need such help) to illuminate your own path to safety.

So if you’re willing, let’s light up the torch and peer in…

The trap room

Laid out in front of you is a scene that resembles an Indiana Jones movie. An assortment of traps lie in wait. One false step, a trip, a momentary loss of concentration, a hasty choice and your insides will be outside before you can say “I hate snakes”.

The 8 system traps are now laid out in sequence before you. We’ll approach them one by one, and if we survive them all unscathed we’ll reflect on what we’ve learned about the approach of viewing software engineering challenges through the lens of systems thinking.

Hold onto your hat…

Policy Resistance

Otherwise known as ‘Fixes that fail’, this trap prevents system change due to the conflicting needs of multiple actors in the system.

The E Bomb

We looked at illicit drug laws in part 1 and observed that the conflicting needs of users, dealers, suppliers and law enforcement conspire together to dampen attempts to limit supply. We drew similarities between this and a software engineering team’s attempts to effectively address technical debt in an organisation where the needs of developers, managers and stakeholders conflict with respect to code quality.

I used the example of technical debt management not to reinforce the inaccurate stereotype that coders only care about code and managers only care about ‘delivery’, but to illustrate a potential example of policy resistance in software engineering, where change is impeded through conflicting needs. Equally valid (and equally stereotypical) examples might be a developer’s desire for autonomy contrasted with a manager’s desire to maintain control, or a regulatory body’s desire for exhaustive risk mitigation contrasted with a team’s desire to reduce waste. The point here is to illustrate systemic patterns that conspire to inhibit change.

So how do we disarm the policy resistance trap? Let’s start with some questions:-

  • Is the stereotypical pointy haired manager who cares little about technical debt and autonomy inherently evil?
  • Is the auditor who wants to insert a million checks and balances into your workflow doing this just to slow you down?

Can you empathise with these people? What would happen if you did?

The truth is that these folks are products of a system that makes their behaviour rational, understandable and predictable. These people simply have different needs than you. These needs are not inherently better or worse, they’re just different and they’re all part of the system that you inhabit.

What would happen if the energy expended on resistance was instead devoted to seeking out mutually satisfactory ways for all goals to be realised? What if you climbed a ladder and viewed the system from a higher level? Maybe you could re-frame the goal in a way that attends to everyone’s needs.

If you don’t, who will?

If you think about it, a system optimised for speed and speed only is a sub-optimisation, as is a system optimised for technical excellence and technical excellence only. Most systems require balance. In software engineering, low technical debt often means better productivity, which is “win win”. If you reframe the goal at the system level, “win win” becomes simply “win”. Is your challenge really a zero sum game?

So is attending to folks’ needs really the answer?

I could have waxed lyrical about the boy scout rule, TDD or the joys of static analysis and you’d have had an answer on a plate. Would that have met your needs more effectively?

A word of caution. If you start lobbing the “E Bomb” (Empathy) into workplace conversations, you’ll sooner or later have to face up to a common brand of corporate machismo which believes that such sappy wimpery is the purview of bleeding heart liberals and well-meaning hippies who have no place in the cut and thrust world of business. I’d argue that nothing could be further from the truth. Also, I’m Indiana Frickin’ Jones so get the hell out of my way.

Onto the next trap…

Tragedy of the commons

Declining fish stocks, global warming, deforestation: all examples of the tragedy of the commons.

Part 1 described how individual incentives can result in the depletion of a common resource, ultimately resulting in a system that simply can’t sustain itself or the individual. If fishermen are incentivised (through the promise of profit) to catch as many fish as possible, there will eventually be no fish, and no fishermen.

Meanwhile back in the workplace, we weave our very own tragedies with the thread of incentives and the fabric of our common resources. Part one outlined collective code ownership as a ‘commons’ that is susceptible to depletion: depletion in terms of quality, maintainability, cohesion, extensibility etc. I described how this erosion can be especially prevalent if the balance of developer incentives tend towards super quick value delivery rather than long term maintainability.

So what’s the fix? Should we focus on long term maintainability rather than development speed? Would other traps lie in wait if we did?

The key here is balance: the tug of war between long term maintainability and development speed needn’t be a zero sum game. In fact maintainability should reinforce development speed. So why does our pursuit of speed so frequently result in quality depletion? And how can we fix it?

Let’s answer this with another question (sorry folks, you were looking for answers, weren’t you?):-

Why does my pursuit of doughnuts result in me getting fat?

The answer is that I feel pure joy in the moments after a ‘doughnut event’ and the fat producing effect of one doughnut is tiny and doesn’t manifest until long after the sweetness has left my lips. In other words, the initial feedback rewards me, the results of one transgression are minimal, and there’s a massive delay in the feedback loop between cause and ultimate effect.

Imagine if you bust a shirt button every time you finished a doughnut. How would that affect your Krispy Kreme (other doughnuts are available) consumption?

Imagine if an alarm went off every time you used a global variable. How would that affect your awareness of how you were depleting a shared codebase?

I hope you’ve reached your own conclusion here about how to deactivate a tragedy of the commons. What do you do to shorten the length of the feedback loop between your code and its effects?

First you need awareness, then pick your solution: Pair programming, static analysis, code reviews, pull requests, TDD, performance benchmarking, mentoring, training, coaching.
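To make the “alarm” idea a little more concrete, here is a minimal sketch of a home-made static check in Python. It is purely illustrative: the function name `find_global_statements` and the `SAMPLE` snippet are my own inventions, and in practice an off-the-shelf linter would probably serve you better. All it does is walk a module’s syntax tree and report every `global` declaration, so the feedback arrives seconds after the transgression rather than months later.

```python
import ast


def find_global_statements(source: str) -> list[tuple[int, str]]:
    """Return (line number, name) for every name declared with `global`."""
    tree = ast.parse(source)
    hits = []
    for node in ast.walk(tree):
        # ast.Global represents a `global x, y` statement inside a function.
        if isinstance(node, ast.Global):
            for name in node.names:
                hits.append((node.lineno, name))
    return sorted(hits)


# Hypothetical code under inspection, used only for demonstration.
SAMPLE = '''
counter = 0

def increment():
    global counter
    counter += 1
'''

if __name__ == "__main__":
    for lineno, name in find_global_statements(SAMPLE):
        print(f"ALARM: line {lineno} touches shared state via 'global {name}'")
```

Wired into a commit hook or a build, a check like this could play the part of the shirt button. Whether a `global` statement is the right thing to sound the alarm about is, of course, for your team to decide.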

What feedback enhancing tools do you wield to avoid the tragedy of the commons?

Drift to low performance

We’ve all experienced the wide eyed optimism associated with a greenfield project.

“This time it’s going to be different. We’ll do TDD right from the start, we’ll be so SOLID that Uncle Bob himself will christen us the ‘So SOLID Crew’.”

Then someone forgets a unit test or hastily commits a class that adheres to the much lauded ‘single irresponsibility principle’. Suddenly the pristine paragon of coding perfection is tarnished and it becomes easier to omit the next unit test while design by contract starts to resemble design by cataract. This is the drift to low performance.

Part 1 reflected on this tendency for standards to drift in legacy code. It’s easy to understand why. It’s so tempting as a developer to focus on the negatives of a legacy codebase and to ignore the fact that it’s been successful and valuable enough to become a legacy. When we focus on the negatives there is little incentive to improve the code. After all, it’s already pretty bad so what harm will one more hack do?

So, what to do, Indy?

This trap is fundamentally caused by measuring the current state of a system against its previous state. When the current state is only a little bit worse than the previous state there’s little reason to panic and little reason to take action. This is why the drift to low performance trap is often called ‘boiled frog syndrome’.

What if instead we were to measure the current system state against its best ever state? How might that affect how you perceive the impact of one more hack?

First you’ll need an objective measure of the current quality and complexity of your code. This is necessary as we can’t trust ourselves not to focus on the bad more than the good. What tools and metrics might you use for this? Then you’ll need an agreement among your team to not allow those metrics to drift below a certain point.
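One way to picture such an agreement is as a ratchet. The sketch below is a hedged illustration, not a recommendation of any particular tool: the `Ratchet` class, the “higher is better” assumption, and the test coverage figures are all hypothetical. The important detail is that each reading is compared against the best value ever seen rather than against the previous reading, so a series of small slips can’t sneak through one at a time.

```python
from dataclasses import dataclass


@dataclass
class Ratchet:
    """Tracks the best-ever value of a 'higher is better' metric."""

    best: float
    tolerance: float = 0.0  # how far below the best the team agrees to tolerate

    def check(self, current: float) -> bool:
        """Return True if `current` is acceptable; raise the bar when it improves."""
        if current > self.best:
            self.best = current
            return True
        return current >= self.best - self.tolerance


if __name__ == "__main__":
    # Hypothetical coverage readings, each only 0.5 worse than the one before.
    coverage = Ratchet(best=82.0, tolerance=1.0)
    for reading in [81.5, 81.0, 80.5]:
        status = "ok" if coverage.check(reading) else "DRIFT"
        print(f"{reading:.1f}% against best {coverage.best:.1f}%: {status}")
```

Measured against its predecessor, every reading in that loop looks harmless. Measured against the best ever state, the third one trips the alarm. Because `check` also raises `best` whenever a reading improves on it, the bar only ever moves upwards.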

Might you be able to use a similar technique to turn this trap on its head, creating a drift to high performance?

Those of you who've read part 1 may now be seething at my hypocrisy. “Seeking the wrong goal” you cry! Well yeah…if a team was incentivised to improve their Sonar metrics over everything else then you’d have a problem of faulty goal seeking. Systems are a delicate balance and optimising one sub-system often causes imbalance in the whole.

This is life.

This is why silver bullets don’t exist.

Understand this and my posts will have been worthwhile.

Escalation

Part 1 described the escalation trap as an arms race, where multiple actors are motivated to outdo each other in a crescendo of one-upmanship.

This results in a reinforcing feedback loop that may start with positive results but normally ends with a horribly imbalanced system. An example might be a supermarket price war. Each supermarket undercuts the prices of the others, resulting in near term happiness for customers but low profit margins for the food industry, and farmers driven out of business.

“War” is the pertinent metaphor here. An escalation in competition until the system breaks or somebody gets so hurt that they can no longer continue. Is there a war in your organisation? What exactly is it good for?

War….Huh…What is it good for?

Part 1 described how some organisations attempt to motivate individuals through competition. This is a dangerous game as it typically flirts not only with the escalation trap but with ‘seeking the wrong goals’, ‘success to the successful’ and ‘tragedy of the commons’ as well. Have you ever worked for an organisation that publishes monthly lists of the top 10 committers for example? Top ten most prolific unit test writers? Top ten bug fixers? Are teams with vastly different remits compared against each other as if they’re apples being compared with other apples?

Harmless you say? What conditions might need to be in place to ensure that this is so?

So let’s say you’re stuck in an escalation trap. How do you get out? Well this one’s not so easy. The best cure is prevention. Turn the other cheek. Put a flower in that gun. If you’re unfortunate enough to find yourself in an ongoing war, consider negotiating for unilateral disarmament.

Fancy motivating via competition? Fine, give it a go but just make sure that it’s not a zero sum game because you may find that success in the game becomes more important to its players than the health of your organisation.

Success to the successful

Part 1 described ‘success to the successful’ as a trap that limits opportunity and privilege to only those who appear to be already successful.

Punish the fail whale?

This sentiment is encapsulated in the common phrase “success breeds success”. The obvious corollary to this is that failure breeds failure.

This trap hides in plain sight within the perceived-to-be benign embrace of ‘meritocracy’, a political philosophy espoused by many companies which holds that power should be afforded to individuals based solely on their demonstrated merit. This is admirable as it eschews factors such as privilege or wealth in determining opportunity and focusses entirely on individual merit. The idea is that the metaphorical cream rises to the top, regardless of the delineation of the metaphorical cow from whence it came.

Meritocracy is not without problems however, not least in its ability to achieve its stated objectives, but there is a fundamental issue with a system that consistently rewards success and penalises failure. The issue is that such a system incentivises people to appear to be successful at all costs. More than this, it disincentivises failure and in so doing disincentivises learning and innovation. Part 1 described this in terms of the fixed and growth mindsets, coined by Stanford psychologist Carol Dweck.

Let’s reduce this concept to the absurd and imagine for a moment that we are to apply the concept of success to the successful to software development.

I write my first unit test and in good ole’ TDD fashion I run it before I write the implementation code. It fails. Career over. The end.

Of course this is absurd but it illustrates how the pursuit of tight feedback loops within XP, agile and lean practices is fundamentally about surfacing failure early and often. This is baked-in all the way from TDD and pair programming to MVPs and innovation accounting. How might an organisation’s ability to profit from agility be affected by an unwillingness to flirt with failure?

So your organisation punishes failure. What can you do about it? It’s actually no surprise that many orgs despise failure. It’s normal. You’d have to be some kind of masochist to greet every failure with a grin and a click of the heels. Funnily enough though, the key to seeing failure’s silver lining might be to simply do more of it. Failing more means failing more frequently, and failing more frequently means that the cost of each individual failure is lower and the associated learning allows eventual success to arrive faster.

A similar concept called ‘limited blast radius’ is described in Henrik Kniberg’s now famous videos about Spotify’s engineering culture. This is the simple idea of ensuring that a failure in any part of your process, architecture or infrastructure affects as few parts of the system as possible. Continuous delivery, circuit breakers, microservices and MVPs are all tools that can be helpful in limiting the negative effects of failure, leaving just the positive learning opportunities remaining. A failure is only really a failure if you learn nothing from it.
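For anyone who hasn’t met a circuit breaker before, here is a rough sketch of the pattern in Python. It is a simplified illustration under my own assumptions (the class names, the thresholds and the injectable `clock` are all hypothetical) and not a production-ready implementation. After a run of consecutive failures it stops calling the troubled dependency and fails fast, then allows a single trial call once a cool-down period has passed.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker refuses to call the wrapped function."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cool-down in seconds
        self.clock = clock                # injectable so it can be tested
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Fail fast instead of hammering a dependency that is down.
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: allow one trial call. A single further
            # failure will reopen the circuit straight away.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success resets the count
        return result
```

You would wrap a risky call as something like `breaker.call(fetch_profile, user_id)`, where `fetch_profile` stands in for whatever remote dependency you are worried about. The failure still happens, but its effects stay contained and visible, which is rather the point.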

Failure needs a rebrand. There’s even a global conference called Failcon that specialises in studying organisational failures and the lessons learned.

Success to the successful is far broader than the narrow context of this discussion. The rich continue to get richer and the poor continue to get poorer. But within the limited confines of your workplace, is it possible for success to come to those who learn from frequent failure?

Shifting the burden to the intervenor

Part 1 described shifting the burden to the intervenor as addiction.

Addiction occurs when reliance on a particular solution to a problem undermines the capacity of the system to support itself. Such solutions often take the form of quick fixes. One might cope with stress by self-medicating with alcohol for example. This provides temporary relief but quickly undermines the body’s ability to cope effectively with further stress, resulting in a negative spiral. Put simply, shifting the burden is caused by treating the symptoms rather than the cause.

What is your organisation addicted to?

  • Patching bugs rather than the underlying causes?
  • Working overtime (more passion!) to meet deadlines rather than working to achieve a sustainable pace?
  • Hiring more staff to chase resource efficiency rather than addressing flow efficiency?
  • Using command and control rather than trusting and nurturing autonomy?

Each of the quick fixes on the left guarantees that more of the same will be required in the future, and all undermine the system’s ability to sustain itself.

How do you stop this?

Healing addiction is hard. It’s painful. Organisational addictions may not bring to mind the same visceral imagery associated with drug or alcohol addiction but I do argue that they still destroy lives. The first step is to acknowledge that you are addicted. The second is to understand why. Only then can you work on rehabilitation.

“The intervenor” is always a solution to a problem. Alcohol for stress, hacks for urgent production issues, overtime for pressing deadlines. What might be the benefits of looking into the root causes of these problems rather than jumping at solutions? What conditions might need to be in place for you to be able to do so?

Retrospectives in scrum are a great opportunity to delve deep into root causes and I’ve personally found exercises such as “the 5 whys” to be invaluable in looking beyond mere symptoms. Frequently however, individuals and teams lack the power or autonomy to address root causes that are beyond their remit. These circumstances are much more difficult to address in my experience, and success depends heavily on the willingness of individuals at all levels in an organisation to listen, to empathise and to attend to the needs of themselves and others.

As with many illnesses, prophylaxis (prevention) is the best approach and recognising when you’re descending into addiction is vital.

Rule Beating

You have a problem with rule beating if you find something stupid happening because it’s a way around a rule.

Part 1 described how drivers commonly slow down erratically for speed cameras only to speed up again once the threat to their licence and bank balance is past. The rule looks to the law as if it’s being upheld, but it’s being beaten. This might be described as following the letter rather than the spirit of the law.

Back in software engineering land you may have been subjected to rules that require lengthy sign-off for permission to start work estimated at more than x days. If this resulted in a glut of x-1 day estimates you’ll have witnessed rule beating. @matthewpskelton mischievously and ingeniously suggested after reading part 1 that this exact rule could be used to encourage small batch sizes. This might just work, though I imagine that an organisation which is aware enough of systems thinking to use false bureaucracy to dirty double cross their employees might be able to think of better ways to reduce batch sizes.

Talking about better ways, why are rules needed in the first place? Is it because organisations don’t trust their employees? Is it because organisational needs contradict the needs of employees, requiring behaviour to be brought into line with the threat of punitive action? Is it because organisations find it so difficult to describe their purpose and principles that the only way they can get their needs met is through the threat of punishment?

This last question just might contain the vaccine for rule beating. Purpose and principles trump rules any day of the week. What is your team’s purpose and what principles do you live and die by? Answer this question and you might not need rules and you certainly won’t need to keep a sharpened stick under the desk with which to threaten transgressors. The police force has a tough job in improving road safety as it has an entire population to contend with. Organisations have it easier in that they can select who works for them. Does your hiring process ensure that the people who you’re courting share your principles and agree with your purpose? If not you’ll have to get busy making rules.

Seeking the wrong goal

You have a problem with seeking the wrong goal when you find something stupid happening “because it’s the rule”.

The L Bomb

We looked at the legend of King Midas in part 1, who made a wish which he thought would bring him riches but merely brought him gold. Like wishes in a fairy tale, goals have a terrible tendency to produce exactly and only what you ask them to produce, so be careful what you aim for.

If you wish to increase code quality but define your goal in terms of increasing unit test coverage, you’ll increase your unit test coverage. Whether your code quality will increase at the same time is anyone’s guess.

If you wish to generate more value for your clients but you express this goal in terms of velocity increase, your velocity will increase. As for value, who knows? Add in a dash of rule beating and you might also find that the velocity increase is a mirage.

This of course is not to say that aiming to increase test coverage is a bad thing. Neither is aiming to improve development speed, or to hit enforced deadlines or to become ‘more agile’ or to improve your findbugs, checkstyle, PMD or lint metrics. Just don’t expect these goals to change anything else. They might do, but then again they probably won’t. Such goals can be genuinely helpful, important even, but they are rarely the goal. They are however easily measured: something which can’t be said for vague concepts such as “business value” or the extent to which users’ needs are being met.

What about happiness?

What about love?

Whoa there…did I just mention the “L” word? If you’ve tested the waters with the E-Bomb, try the L-Bomb out for size and watch on in amusement as the starch in your colleagues’ collars starts to run.

The infeasibility of measurement shouldn’t however preclude something from being the genuine goal. Furthermore, if proxy goals are leveraged for performance management it’s easy to see how the genuine goal can be subverted by the desire to hit proxy targets.

“It is wrong to suppose that if you can’t measure it, you can’t manage it — a costly myth” ~ W E Deming

Eric Ries’s The Lean Startup warned against the folly of vanity metrics: measures that boost the ego but yield nothing in terms of learning. I’m not even going to attempt to offer ‘advice’ on what your true goal is. This is something that only you can answer for yourself.

Just a hint though: If you do summon up the courage to ask the question and the answer comes back as “maximise shareholder value”, please don’t feel ashamed to vomit all over the boardroom table before politely excusing yourself, because frankly we’ve all got better things to do with our lives than that.

Summary

Though maybe, just maybe hold onto that breakfast… shareholder value, as dry and uninspiring as it is, is a desired output of many systems. It might not be the primary goal but everyone in a system has needs. Even shareholders. If you can put yourself in the shoes of the multiple actors in a system and see what they see, feel what they feel, even for a moment, you’ll have done something so vanishingly rare that you’ll shine brightly in whatever you choose to do with your life.

So is Admiral Ackbar right? Surely he knows a trap when he sees one. Are these traps no longer traps?

Of course they’re still traps, but that’s not really the point. The point is in recognising the situations that confront us for what they are, and in the ability to view a system as a whole rather than just the small piece which hovers in front of our noses.

After part 1, a few readers expressed a need for answers. I’ve tried to walk the tightrope between patronising prescription and unhelpful vaguery.

The silver bullet. This doesn’t exist.

The funny thing is that we’re often stuck in a perpetual search for the silver bullet, and a bullet, silver or otherwise is a hopeless metaphor for the solution to systemic problems (and most problems are systemic). A bullet’s trajectory is determined before it’s fired, it travels in a perfectly straight line, and takes out a small, single target. If systemic problems could be solved via such crude means, the world would be a rather different place than it is today. Hopefully this pair of posts has shown that solutions can affect systems in ways beyond initial intention and this is as true in software development as it is in the ‘real’ world. As a software engineer I sometimes feel that the problems that I face are unique to my narrow discipline and it’s comforting to me that in systems thinking I can lean on the experience and learnings from disciplines as diverse as ecology, politics and sociology to help broaden my perspective.

Would you be willing to share your thoughts with me about how this has helped, or failed to help, you in your search for answers?

Follow me on twitter @smrimell