Getting Things Right With Checklists
I’ve always been a very organized person, sometimes to an obsessive degree (don’t touch my stuff!). I like being on top of things. I don’t want to miss any steps when carrying out a task. Naturally, checklists have long been one of my favorite tools to get things done.
When it comes to software development, for example, it’s hard to beat GitHub’s task lists. To plan and prioritize personal tasks, I also like to write checklists on paper. And of course, there are plenty of apps out there to choose from, all based on the simple but powerful idea of the checklist.
But although I’ve used different checklists for many years and firmly believe in their effectiveness, I realized that my knowledge of them was superficial at best. That’s why I decided to read The Checklist Manifesto by Atul Gawande, which had been recommended to me multiple times.
The book is about Gawande’s journey in designing a checklist for surgery. Quite surprisingly, he draws most of his lessons from aviation and construction, where checklists are standard practice, and applies them to medicine. He ultimately succeeds (spoiler alert). The final 19-item Surgical Safety Checklist has since been shown to significantly reduce complications and deaths by decreasing errors and improving teamwork in surgery.
The following is a summary of what I learned from reading The Checklist Manifesto.
Man is fallible
The first question that comes to mind is: why do we need checklists in the first place?
The short answer: we humans need checklists because we are fallible. More precisely, we fail at what we set out to do for three reasons:
- Necessary fallibility. Even enhanced by technology, some things are simply beyond our capacity and will remain outside our understanding and control.
- Ignorance. We make mistakes because we don’t know enough about the world and how it works. We still can’t predict the weather accurately. There are still diseases we cannot cure.
- Ineptitude. We fail to apply existing knowledge correctly. Despite our best efforts, we write buggy code. We construct buildings that collapse.
In today’s complex world, ineptitude has become as much of a problem as ignorance. Gawande writes:
science has filled in enough knowledge to make ineptitude as much our struggle as ignorance. […] Know-how and sophistication have increased remarkably across almost all our realms of endeavor, and as a result so has our struggle to deliver on them.
He goes on to depict in great detail how medicine in particular “has become the art of managing extreme complexity”:
To save this one child, scores of people had to carry out thousands of steps correctly: placing the heart-pump tubing into her without letting in air bubbles; maintaining the sterility of her lines, her open chest, the exposed fluid in her brain; keeping a temperamental battery of machines up and running. The degree of difficulty in any one of these steps is substantial. Then you must add the difficulties of orchestrating them in the right sequence, with nothing dropped, leaving some room for improvisation, but not too much.
Getting things right is becoming harder and harder every day — even for specialists who have received intense training. We need a different strategy.
Checklists not only compensate for the limits of our memory and attention but also lead to higher performance by forcing us to be disciplined:
people can lull themselves into skipping steps even when they remember them. In complex processes, after all, certain steps don’t always matter. […] Checklists seem to provide protection against such failures. They remind us of the minimum necessary steps and make them explicit. They not only offer the possibility of verification but also instill a kind of discipline of higher performance.
So checklists help us apply the knowledge we have consistently and correctly. However, that by itself is not enough. Gawande continues:
the volume and complexity of what we know has exceeded our individual ability to deliver its benefits correctly, safely, or reliably. Knowledge has both saved us and burdened us.
No one can do everything anymore, neither in aviation, nor in construction, nor in any other complex environment. The Genius Master Builder is dead. Besides, not everything can be reduced to a simple recipe.
But who says that a checklist only contains, say, tasks for constructing a building? In addition to decreasing errors, checklists can also be used to increase teamwork and communication in order to overcome individual weaknesses and deal with unexpected problems as a team. These communication checklists are used in construction and elsewhere to force specialists to talk to each other:
While no one could anticipate all the problems, [experts] could foresee where and when they might occur. The checklist therefore detailed who had to talk to whom, by which date, and about what aspect of construction — who had to share (or “submit”) particular kinds of information before the next steps could proceed.
Above all, the goal of checklists is to embrace a culture of teamwork and discipline. Complexity requires group success.
What makes a good checklist?
Now that we know the “why” of checklists, let’s take a look at the “how”. What exactly makes a good checklist?
According to the book, good checklists are:
- Precise, efficient, to the point
- Reminders of the most important steps, not comprehensive how-to guides
- Between five and nine items long (the limit of working memory)
- Quick and simple tools to support the skills of experts
- Practical — tested in the real world
- Easy to use even in difficult situations
- Written in simple and exact language, using familiar terms of the profession
- Frequently revisited and refined to help rather than hinder
Unless the moment is obvious, we also need to define pause points at which a checklist is supposed to be used in a process. We can choose between two options: a DO-CONFIRM checklist (perform tasks from memory, then stop to verify) or a READ-DO checklist (carry out tasks as you check them off). Both have pros and cons.
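For software teams, these properties can be written down as data. The following is only a minimal Python sketch, not something from the book: the names `Checklist`, `Mode`, `pause_point`, and `validate` are my own hypothetical choices, the deploy items are made up for illustration, and the five-to-nine bound simply mirrors the guideline above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    DO_CONFIRM = "do-confirm"  # perform tasks from memory, then stop to verify
    READ_DO = "read-do"        # carry out each task as you check it off


@dataclass
class Checklist:
    title: str
    pause_point: str  # the moment in the process at which the list is used
    mode: Mode
    items: list[str] = field(default_factory=list)

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the checklist passes."""
        problems = []
        if not 5 <= len(self.items) <= 9:
            problems.append(f"expected 5 to 9 items, found {len(self.items)}")
        if not self.pause_point.strip():
            problems.append("no pause point defined")
        return problems


# Hypothetical example: a DO-CONFIRM checklist used right before a deploy.
deploy = Checklist(
    title="Before deploying",
    pause_point="after the release branch is merged, before the deploy starts",
    mode=Mode.DO_CONFIRM,
    items=[
        "Tests pass on the release commit",
        "Database migrations reviewed",
        "Rollback plan written down",
        "On-call engineer notified",
        "Dashboards open",
    ],
)
print(deploy.validate())
```

A check like this only covers the mechanical properties (length, a defined pause point). Whether the items are precise, practical, and written in the team's own language still has to be tested in the real world.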
To reap the benefits of checklists, we need to be willing to adopt them as part of our daily work and, ultimately, our company culture. Checklists alone cannot make anyone follow them.
As Gawande knows all too well, we will meet resistance when introducing a checklist at a larger scale. His advice goes beyond eating your own dog food: the first people using a new checklist should “have the seniority and patience to make the necessary modifications and not dismiss the whole enterprise”. Sounds reasonable.
Some people may still object that checklists are merely about ticking boxes, as this passage from the book points out well:
It somehow feels beneath us to use a checklist, an embarrassment. It runs counter to deeply held beliefs about how the truly great among us — those we aspire to be — handle situations of high stakes and complexity. The truly great are daring. They improvise. They do not have protocols and checklists. Maybe our idea of heroism needs updating.
It’s important to understand that a checklist isn’t just a protocol you’re supposed to follow mindlessly. Rather, it supports us in our work:
The checklist gets the dumb stuff out of the way, the routines your brain shouldn’t have to occupy itself with (Are the elevator controls set? Did the patient get her antibiotics on time? Did the managers sell all their shares? Is everyone on the same page here?), and lets it rise above to focus on the hard stuff (Where should we land?).
Even with checklists, there’s still plenty of room for individual judgment and performance.
A production-readiness checklist
Of course, I wouldn’t be writing this article if there weren’t a direct relationship to software development and running systems in production. Web systems are also among the complex environments where we can, and should, use checklists for greater efficiency, consistency, and safety.
I want to wrap this up with a practical example. While runbooks make for great checklists in web operations, I found a different example by reading yet another book.
In Production-Ready Microservices, Susan J. Fowler provides a useful checklist to decide whether a microservice is ready for production. According to Fowler, a production-ready service must be:
- Stable and reliable
- Scalable and performant
- Fault tolerant and prepared for any catastrophe
- Properly monitored
- Documented and understood
For each item, the production-readiness checklist defines specific criteria that must be met. For example, this is what it takes for a microservice to be fault tolerant (the third item above):
- It has no single point of failure.
- All failure scenarios and possible catastrophes have been identified.
- It is tested for resiliency through code testing, load testing, and chaos testing.
- Failure detection and remediation have been automated.
- There are standardized incident and outage procedures in place within the microservice development team and across the organization.
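To show how such criteria could be tracked in practice, here is a small Python sketch. It is an assumption on my part rather than anything Fowler prescribes: the criteria strings paraphrase the list above, and `unmet_criteria` and `is_production_ready` are hypothetical helper names.

```python
# Criteria paraphrased from the fault-tolerance list above.
FAULT_TOLERANCE = [
    "No single point of failure",
    "Failure scenarios and possible catastrophes identified",
    "Resiliency tested through code, load, and chaos testing",
    "Failure detection and remediation automated",
    "Standardized incident and outage procedures in place",
]


def unmet_criteria(criteria: list[str], confirmed: set[str]) -> list[str]:
    """Return the criteria that have not been confirmed, in checklist order."""
    return [c for c in criteria if c not in confirmed]


def is_production_ready(criteria: list[str], confirmed: set[str]) -> bool:
    """A service passes only if every criterion has been confirmed."""
    return not unmet_criteria(criteria, confirmed)


# Hypothetical service that has confirmed only the first two criteria.
confirmed = set(FAULT_TOLERANCE[:2])
for item in unmet_criteria(FAULT_TOLERANCE, confirmed):
    print("still open:", item)
print("production ready:", is_production_ready(FAULT_TOLERANCE, confirmed))
```

Keeping the criteria as plain data means the same two helpers would work for the other four requirements as well, each with its own list.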
I think it’s a fantastic idea to create production-readiness checklists for system components if your goal is to build standardized systems across an engineering organization. For my part, I’m planning to go deeper into the topic and come up with some checklists of my own.
Our jobs aren’t too complicated to reduce to a checklist. In fact, we are more likely to fail if we don’t try.
P.S. This article first appeared on my Production Ready mailing list.