It’s nice to release software in small increments. Small deployments are more manageable, and they can be rolled back faster if there’s a problem. With good planning, even major changes can be broken down into smaller chunks and drip-fed into production. But as a project manager or team leader, every now and then you come across a gnarly project where — for whatever reason — there are a bunch of interlocking changes that all have to go out together. You’re responsible for updating a dozen critical services and databases. Everyone’s relying on you to get it done. And it’s scary.
A technique I’ve used for dealing with these situations is to set up a Mission Control Centre for the release. The idea is simple: get empowered representatives to commit to being present for the entire span of the release.
- Empowered representatives means people from all disciplines that have contributed to the project, or who will be affected by its outcome. In addition to the obvious software developers and operations staff, that could mean people like commercial stakeholders, designers, and customer support.
- Being present is more than just being in the room — it means that when you’re in the room, you’re paying attention to what’s going on, even if you’re not “at the wheel.” You shouldn’t have to ask for a recap of the last half hour of conversation if someone suddenly needs your opinion.
- The entire span really does mean the full duration of the deployment. Block out calendars. Rearrange meetings. Even if a representative only has one part to play during the deployment, they shouldn’t wander off as soon as they’re done.
The purpose of all this is that if something goes wrong, the people who can solve it are all in immediate communication with each other. You don’t have to go scrambling to find someone outside the group if you need to make a hard call. You don’t have to spend ten minutes bringing them up to speed on context. You can make decisions fast, and act on them immediately.
Setting it up
The idea may be simple, but it can be hard to arrange. People are busy. It can be hard to get them to clear an entire morning or afternoon in their calendar, let alone a whole day. Software developers are notorious for disliking meetings, and may not see the point of sitting around for hours when all they have to do is run one task in a checklist of twenty.
The first step towards success is therefore getting people on board with the idea. You can use the standard tricks of offering food and treats for everyone in the room. But don’t underestimate the power of cool code names and the very fact of calling it “Mission Control,” either! Stick up a poster on your meeting room door, and you’ll get people envious of the clearly important stuff going on inside. If you’ve got a talented designer on your team, get them to make some mission patch laptop stickers for participants.
When I talk about “being in the room,” I mean that in a broad sense. The point is to optimise communication flows, but “optimal” is different for every team. At FanDuel, our teams are spread over multiple offices in different time zones, and include remote workers in home offices. If it’s possible to get everyone in a single room, try for that. If not, use a video call and have everyone stay dialled in throughout the event. If you don’t have video calling available, get everyone together in a single chat room.
Atul Gawande’s book The Checklist Manifesto is about cross-functional teamwork in industries with a high degree of specialisation, such as construction, air travel, and medicine. The lessons apply to the software industry as well. Not every item on a checklist is a measurement to be taken, or a switch to be flipped. Some of them seem non-obvious, and perhaps even counter-intuitive. In the World Health Organization’s Surgical Safety Checklist, the first item before a surgeon makes a cut is “Confirm all team members have introduced themselves by name and role.”
This item acknowledges that surgical teams are composed of doctors, nurses, and other support staff who may never have worked together before. It saves junior staff the awkwardness of asking the head of surgery, “uh, who are you again?” It focuses attention on why everyone is there, and it promotes effective communication and teamwork.
For regularly repeated procedures, such as taking off in a commercial aircraft, practitioners will typically be familiar with a static checklist. For complex software releases, the checklist may be different every time. For the deployment our team performed last week, each task on the checklist had a verification condition (to make sure the step had completed correctly), and an “escape hatch” — instructions for what to do if it failed. Before we started the deployment, we ran through the list to make sure that everyone had a clear view of what lay ahead, and was happy with the role they had to play.
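To make the shape of that checklist concrete, here is a minimal sketch in Python. It is not the tooling our team used; the names (Step, run_checklist, owner, escape_hatch) are made up for illustration. It only shows the idea that every step carries its own verification condition and its own escape hatch, and that the run stops at the first step that fails to verify.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    """One checklist item. All field names here are illustrative assumptions."""
    name: str
    owner: str                         # who is "at the wheel" for this step
    action: Callable[[], None]         # the task itself
    verify: Callable[[], bool]         # verification condition: did it complete correctly?
    escape_hatch: Callable[[], None]   # what to do if it failed


def run_checklist(steps: List[Step]) -> Optional[str]:
    """Run steps in order.

    Stops at the first step whose action raises or whose verification
    fails, runs that step's escape hatch, and returns the step's name.
    Returns None if every step was verified.
    """
    for step in steps:
        print(f"[{step.owner}] {step.name}")
        try:
            step.action()
            ok = step.verify()
        except Exception as exc:
            # Treat an exception in the action or the check as a failed step.
            print(f"  error: {exc}")
            ok = False
        if not ok:
            print("  verification failed, taking the escape hatch")
            step.escape_hatch()
            return step.name
        print("  verified")
    return None
```

In a real release the action is usually a person doing something while everyone else watches, so in practice this might live in a shared document rather than a script. The structure is the useful part: no step is "done" until its check passes, and nobody has to improvise a rollback under pressure.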
Does it work?
Going by the times I’ve tried it, I give it a resounding yes. And not just for the obvious benefits of clarity and the speed at which decisions can be made.
Having a room full of smart, committed people around you gives emotional support and reassurance when things go wrong. At our deployment last week, we were all dialled in on a video call, and each of us shared our screen when it was our turn to perform a step. My own critical step was running a database migration script, and…it didn’t work properly. My stomach knotted up, and I broke out in a sweat. We had a rollback for that step, but before I could finish going through the error logs to identify the problem, my colleague John spotted it. I was using a different set of security permissions than I had been in our test environment.
I wasn’t doing this alone. The team was there to catch me when I fell.
The sense of teamwork and shared purpose on the ground is hard to overstate, and it makes the sense of shared accomplishment when the mission is over that much more meaningful.