Blame process, not people
The latest project rollout didn’t go so well. The deploy went south, causing the site to be down all morning.
This was situation normal at a previous job. Every deploy went so poorly that we began having a “lessons learned” meeting the day after to try to talk through what went wrong. It would inevitably turn into a finger-pointing blamestorm.
After all, some person was responsible for each aspect of the deploy. It’s hard to avoid calling out a specific person, especially when everyone knows who was responsible. And when you’re put on the spot to explain the failure, you may not have a good answer ready, and you feel like you’re under fire. Instead of working toward answers, people get defensive and try to cover their asses.
So how can an organization learn from its mistakes without creating a negative blamestorming environment? What I do is blame process instead of people.
Humans (in other words, your coworkers) inevitably get tired, careless, forgetful, or just downright lazy. They take shortcuts and make mistakes. If this kind of behavior causes your company’s process to fail, then it’s a bad process. A robust and well-designed process will be resilient to the unpredictability of human behavior.
In this view of the world, all failures are process failures. Human behavior should be seen as an unchangeable fact of life that you must deal with rather than try to change.
Of course, you should always try to educate people about better practices. But don’t assume this will magically fix your process problems. At the end of the day, it’s far easier to change a process than to change human behavior.
Let’s take a code deploy process as an example. At my previous job mentioned above, all deploys were done manually. A sysadmin would log into the production machine, unzip the new code, and follow a step-by-step process written in a text document, which could be different for each deploy. Inevitably, one of these steps would not be done correctly, and we would all waste time trying to figure out what had gone wrong.
Even though this sysadmin was my friend, it was hard to avoid getting frustrated and calling him out for his mistakes. But the thing is, it was the process that was poorly designed. No human being can reasonably be expected to always be perfect.
An automated deploy process would have eliminated the manual steps where these errors crept in. If all you have to do is push a button, there is far less room to screw something up.
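To make “push a button” a little more concrete, here is a minimal Python sketch of what replacing that text-document checklist with code might look like. It assumes a release ships as a zip archive and that each release must contain an `app.py` entry point; the helpers `run_steps` and `deploy_steps`, the step names, and the `app.py` check are all hypothetical illustrations, not the process used at that job. The point is only that the steps run in the same order every time and the deploy halts at the first failing step, naming it.

```python
import zipfile
from pathlib import Path
from typing import Callable

# A deploy step is a human-readable name plus a function that raises on failure.
Step = tuple[str, Callable[[], None]]


def run_steps(steps: list[Step], log: Callable[[str], None] = print) -> bool:
    """Run every step in order; stop at the first failure and report which step broke."""
    for name, action in steps:
        log(f"-> {name}")
        try:
            action()
        except Exception as exc:  # any exception halts the whole deploy
            log(f"FAILED at step {name!r}: {exc}")
            return False
    log("deploy finished")
    return True


def deploy_steps(archive: Path, release_dir: Path) -> list[Step]:
    """Hypothetical steps standing in for the hand-written checklist for one release."""

    def check_archive() -> None:
        # testzip() returns the name of the first corrupt member, or None if all are fine.
        with zipfile.ZipFile(archive) as zf:
            bad = zf.testzip()
        if bad is not None:
            raise ValueError(f"corrupt file in archive: {bad}")

    def unpack() -> None:
        # exist_ok=False: refuse to overwrite a release directory that already exists.
        release_dir.mkdir(parents=True, exist_ok=False)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(release_dir)

    def smoke_test() -> None:
        # Assumed convention for this sketch: every release ships an app.py entry point.
        if not (release_dir / "app.py").is_file():
            raise FileNotFoundError("app.py missing from release")

    return [
        ("verify archive", check_archive),
        ("unpack release", unpack),
        ("smoke test", smoke_test),
    ]
```

Because the same list runs on every deploy, a failure points at a specific step rather than at whoever happened to be typing, and fixing that step fixes it for every future deploy.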
Whenever a failure of any kind occurs, ask yourself the following questions.
- Is there a standard process defined? If not, define one.
- What actually caused the process to fail?
- Is there a way to redesign the process in order to make that kind of failure impossible?
- Do we need a completely new process that is less prone to failure?