For many IT operations professionals, the growing notion of “the blameless postmortem” in incident management just feels like a myth.
The idea of holding a blameless postmortem conjures up images of the awkward ceremony that is the holiday dinner: a broad spectrum of tastes, values, viewpoints, and generations seated around the table, carefully stepping through a field of land mines. Everyone tries to suppress their overt judgments and just get to dessert. Everyone smiles and nods — until they don’t. And when that inevitably happens, severe discomfort ensues.
Given this connotation, it’s no wonder their disbelief and disdain are often visceral. And no matter how many times they hear about how “DevOps unicorn companies” are conducting totally blameless, healthy, love-fest postmortems, they question how much of that narrative is a constructed public relations fantasy.
The fact of the matter is, they’re right. There is no such thing as a blameless postmortem.
We’re Wired for Blame
People’s incredulity at this proclaimed blameless utopia is firmly rooted in science. Brené Brown, a sociologist with a Ph.D. in social work from the University of Houston, has spent her career studying human interaction, leadership, vulnerability, and courage. Given that these are all critical aspects of an actionable, productive postmortem, what she has to say about them is instructive.
In a 2010 TED talk, she described the surprising findings of research on the human tendency to blame: It exists as “a way to discharge pain and discomfort,” she said.
In other words, humans are hardwired through millions of years of evolutionary neurobiology and thousands of years of social conditioning to use the technique of blaming as a way to give voice to painful and uncomfortable feelings, in order to effectively disperse them from our psyches.
Swimming upstream against millennia of biology and sociology is a tall order, so it makes sense that we (often secretly) shake our heads when we’re told, “Let’s all just agree not to blame anyone at our outage postmortem.” We know deep down that the blameless postmortem is a myth because it asks us to ignore an ingrained part of ourselves and to give up a built-in coping strategy.
The very idea of a blameless postmortem just feels weird to people because we’re hardwired for blame. Blameless doesn’t exist.
Beyond Blameless: The Blame-Aware Postmortem
The buzz around “blameless” is too loud to ignore, but what are successful “blameless teams” really experiencing? Their postmortems are not blameless; they are blame-aware: They collectively acknowledge the human tendency to blame, they allow for a productive form of its expression, and they constantly refocus the postmortem’s attention past it. Another term for blame-aware postmortems, one that I regularly use, is “actionable retrospectives,” where the goal is to produce outcomes on which the engineering teams and the business can act.
How can you move your incident postmortems toward blame awareness?
- Be deliberate about postmortems. They must be considered critical enough, organizationally, that you’re willing to halt other work to hold them, and you should schedule one within 48 hours of an incident. I commonly see clients schedule postmortems one to three days after the fact, but then it’s the first meeting to get kicked down the road in a pinch. Suddenly, two or three weeks go by, and by then, it’s a significantly less useful exercise. When you meet for a postmortem, hold people accountable for showing up, putting electronic distractions away, and actively participating.
- Develop the organizational muscle around postmortem fundamentals. If you have the mechanics memorized — meeting format, deliberate invocation of the space, same data prepared, same expected output deliverables — you can use the cognitive energy saved to pay attention to the softer, and more delicate, human interactions during your postmortem.
- Work as a group to recognize the human tendency to blame. You’ve probably seen engineers jump to blame themselves for fat-fingering a command or not monitoring a deployment closely enough. Blame is so pervasive a way of ridding ourselves of pain and discomfort that it isn’t always directed at others, and it doesn’t always carry a negative tone. But however people direct blame, the act itself doesn’t foster a deeper understanding of the context that led to the incident. On the contrary, it halts further exploration of how to avoid the same situation in the future.
- Blame-aware, actionable postmortems require a blame-aware culture. To move forward, managers must decide that they care more about improving organizational performance and outcomes when faced with incidents than they do about so-called blaming and shaming, and they must model this behavior. The management at one company I worked with couldn’t understand why its postmortems continued to be unhealthy, despite having worked to make them better. As it turned out, one senior vice president had scheduled weekly meetings where he expected his reports to explain outages and “take responsibility.” This had the effect of pitting members of the team against each other. Since the manager with the most “blame grenades” won, team-level postmortems were, unsurprisingly, effectively blame-munitions factories.
Getting to Blame Aware
Moving from a blameful to a blame-aware culture takes a lot of effort and time, no doubt. But research shows that this approach improves safety, reduces operator burnout, and ultimately improves business outcomes.
Ultimately, the secret of those mythical DevOps blameless cultures that hold the actionable postmortems we all crave is that they actively foster an environment that accepts the realities of the human brain and creates a space to acknowledge blame in a healthy way. Then they actively work to look beyond it.
You succeed in spite of the tendency toward blamefulness, not by pretending it doesn’t exist.