Preventing Blame From Entering the Workplace: a DevOps Story

Dale Hopkins
Published in Vendasta
Sep 29, 2021 · 7 min read
The Scapegoat by William Holman Hunt

Blame has no place in a DevOps culture. This is showcased most clearly in the blameless postmortem espoused by Google in their book, Site Reliability Engineering. The reasoning behind this is what I want to discuss today, and it is succinctly captured in Devin Carraway’s maxim, “The cost of failure is education.” Of course, this is only true if we have processes in place to allow us to learn from our mistakes and capture that learning. Process management experts like W. Edwards Deming, the father of statistical process management, would have smiled to see the blameless postmortem at the core of our Agile software development practice (because Agile is just like any other process):

Cease to blame employees for problems of the system. Management should be held responsible for faults of the system.

- W. Edwards Deming, Out of the Crisis

So if we know that blame isn’t a part of a healthy software development process, why does it always seem to sneak back into the organization? I’d like to start with a story to illustrate.

Story Time

In June 2020, one of our software development teams wrote about their plans to modernize the technology in their application to better match that of our other applications. This work would be taken on slowly over the coming quarters. Then, in January 2021, the team presented a set of mocks at our Vision & Strategy meeting, showing off new functionality they would be adding to the software. The presentation was steeped in usability and a customer-first mantra, and everyone was excited about this great new feature.

Over the coming months, we were treated to more and more detailed mocks. In May, the team decided to start building the customer-facing features that would leverage the newly modernized infrastructure. The new functionality took a phased approach and would be put into users’ hands in early August. Then the team’s product manager parted ways with the company a few weeks before the initial delivery of the feature.

In order to bridge the transition to a new product manager, we brought in a senior leader to be the interim product manager and to help the team get this hotly anticipated feature delivered. In early August, the team was ready to show the first slice of the software to a set of trusted testers — customers who want to help us shape the product with early access to features.

During the demo, it became clear that the customers were not happy with the slice and went so far as to ask, “Why did you build this?” The feature was a revamped view of their customer relationship data that contained more information and allowed more flexible searching, but it didn’t offer all of the functionality of the old screen. Thus, it turned out that the new feature we were so excited about didn’t satisfy the most important use cases of the customer in its MVP form. This, unfortunately, was the moment when the interim product manager discovered that there wasn’t a written strategy to clearly articulate the “why”. Enter blame.

How Blame Enters

With a transition in product leadership for the team, it became natural to blame the previous product manager for the lack of strategy. It also became natural to blame the team. Why had they proceeded to develop software for over a year without releasing it to the customer? Why hadn’t the team spent more time with its customers along the way to ensure the first slice was a success?

Blame tends to surface in organizations when things do not go as planned and is often paired with the word “accountability”. This isn’t the accountability that Michael Lopp talks about, which refers to an ability to account for one’s actions and decisions. Instead, this version is the accountability that means “who do we blame when things go wrong,” so much so that an accountable person is often referred to as “the single wringable neck” (my least favourite phrase in business).

The logic is that if/when things go wrong, the organization needs to know who is responsible, which is a very natural human reaction to problems. We have to remember that the human brain is not optimized for statistical process control, which Daniel Kahneman points out in his incredible book Thinking, Fast and Slow. Indeed, knee-jerk reactions based on habits or emotions (System 1) take less mental effort and come more naturally than higher-level logical thoughts (System 2).

So while this is completely natural, it yields the following process for fixing problems:

Typical Process for Problem Solving

Now, imagine for a minute that you are the one affected by a problem (i.e. the customer). What is the problem with the above diagram? It seems obvious that the red portion of the picture, the time spent working out who is to blame, adds unnecessary delay to the process, which further erodes confidence and trust. If we could skip straight to understanding the problem, we would arrive at a workaround faster.

Why Blame Enters

This is where psychology comes into the picture. Fred Kofman talks about the Player vs. Victim mentality in his seminal book Conscious Business. When a problem is encountered, a person can choose to accept their part in creating the problem and thus allow themselves the opportunity to fix the problem. This is how Kofman defines the Player mentality, which is described as having an internal locus of control — they are in control and can influence the situation. A person can also choose to deny any part in creating the problem and thus avoid any blame for the problem. This is how Kofman defines the Victim mentality, which is described as having an external locus of control — they are not in control and can’t influence the situation.

In his book The Coaching Habit, Michael Bungay Stanier further breaks down the Victim mentality (i.e. negative thought patterns) based on how a person decides to approach the solution. For people with an external locus of control, it is easy to assume the role of the Victim because being unable to fix the problem renders them passive. It is also possible that they will assume the role of Persecutor, which means they will actively push others (those who are to blame) to fix the problem because they can’t fix the problem themselves. This is reminiscent of the “Silence” and “Violence” roles from the Crucial Conversations framework.

Even for people with an internal locus of control, one negative thought pattern remains possible: the Rescuer. This is a person who does not accept responsibility for the problem but chooses to take action to fix it anyway. They believe that someone else is to blame and that it falls to them to clean up that person’s mess.

Thinking Traps

All three of these negative patterns (Rescuer, Persecutor, Victim) require that blame be established upfront before proceeding towards a solution — they need to find the “accountable” person so as to make sure that person is made a scapegoat (as an interesting aside, the etymology of the word Scapegoat is incredible). This laying of blame allows everyone involved to feel that the situation has been resolved (a collective catharsis).

However, in addition to delaying a solution, this line of thought also sends a clear message through the organization: failure will not be tolerated. Unfortunately, this establishes a risk-averse culture and builds a type of perverse accountability often referred to as CYA (Cover Your Ass) where people are careful to keep a record of their actions so as to avoid any blame in future. This is the opposite of the Accountability that we want to see in DevOps as it reduces the ability of people to be vulnerable, which is an important precursor to any creative process. This type of culture quite effectively stymies creative problem-solving in favour of less creative approaches deemed less prone to criticism/blame.

How Can We Fix This?

We don’t need to fall prey to our instincts around blaming. Culture is a powerful tool for establishing social norms and it can be used to extricate blame from the workplace. This lowers the latency of problem-solving and increases innovation (i.e. calculated risk-taking) within the organization. It starts with a shift from blaming people to blaming systems and processes.

When something hasn’t gone to plan, this is exactly the moment when we need to acknowledge people’s efforts and rebuild their confidence. This shift encourages the iterative improvement of the process and over time disincentivizes playing the blame game. This change can originate anywhere inside of the organization but requires clear support from senior leadership as their actions will have a large influence on the culture, especially when it comes to deciding blame’s place inside the office.

In the earlier story involving a miss on customer expectations, we needed to pivot away from blame and restore the trust of the team. This effort started by removing the blame element from the equation. This problem quickly turned into fertile ground for the discovery of process improvements. The question of “Why didn’t you have a clear strategy?” became “How did our process fail to correct this problem?” And the question of “Why did we build such a large chunk of work before demoing to a customer?” became “How could our process bring the customer in earlier?”

While this change in approach certainly made improvements to our processes and the safety of future teams, it will be a long path to restoring the confidence of this particular team after allowing blame to enter their sphere. They say that time heals all wounds but in the software world, I prefer the adage “Shipping Fixes Everything” because it feels good to deliver a great product to customers. After a series of successful deliveries, the confidence of the team will return as long as we are able to successfully keep blame out.
