Why Blameless Post-mortems?

David Anderson
Sep 25, 2019 · 4 min read

We had a minor outage recently: we made a change to our systems and started serving errors to customers. We rolled back the change within minutes, and all was well again. This kind of thing happens at every company, but not every company is able to improve and learn from these situations.


After every outage, we write a blameless post-mortem to try to learn from our mistakes. It would be easy to slap a bandaid on whatever broke and move on, but we want to be more thorough. What exactly happened? Why did things go wrong? How do we learn from this and prevent the problem from happening again?

In this instance, I also took the opportunity to do a refresher on what “blameless post-mortem” means. Here’s a lightly edited version of what I told the team.

People are rarely the cause

As a rule, people are not the cause of an outage. The fault lies in the systems and software that should have done something reasonable but didn’t. Most outages are triggered by a change in the system, so there’s usually going to be a human pushing a button that sets things in motion.

We could just say “the reason this happened is that Dave pushed the button.” Instead, try asking:

  • Why did Dave push that button? Presumably it seemed like a good idea at the time. Why was that?

There are a number of ways to get at the underlying systemic causes of outages, for instance the Five Whys method pioneered by Toyota, or Fault Tree Analysis, which is popular in traditional engineering fields like aerospace.

Three key questions to ask

Personally, I’m a fan of a simple three-question prompt that Google’s post-mortem template used:

  • What went right? Document processes that worked as designed, safety systems that did their job, and so on. In post-mortems, this section is usually short, but it’s a chance to document the software and processes that are giving you good value during incident response.

All these things are tools, not algorithms to follow blindly. Think of them as ways to get the conversation started, just as brainstorming is a tool to get people into a creative frame of mind. And regardless of the specific tools you use, aim to push past “a person did a thing” and get at what in their environment made that action seem reasonable.

Why does all this matter?

Blameless post-mortems matter for a couple of reasons. The first is obvious: psychological safety. Being blamed for outages creates a crappy working environment, and people are going to look for another job.

More self-interestedly for the company, assigning blame for outages leads people to cover their asses. When that happens, the post-mortem ends up incomplete or outright incorrect, because people are not volunteering all the information about what happened, and that leads to drawing the wrong conclusions. Blaming people for outages makes the company worse at doing what it does, in addition to making it a bad place to be for employees.

And on a more personal note: regardless of these words about it not being a person’s fault that outages happen, I know firsthand that it feels really bad when you’re the one who pushes the button and sets things off. If you’re anything like me, you get a fight-or-flight adrenaline dump and generally have a very bad time.

So, when outages do happen, please look out for each other and, if you think it’s necessary, reinforce in that moment that it’s not the fault of whoever pushed the button. It may feel silly to do, because “obviously we all know we don’t blame people for this sort of thing”, but humans aren’t linear creatures that you can program once with information and forget about. A timely reminder will do absolute wonders for that person’s mood and feeling of safety.

Tock

Connecting the world’s great experiences
