Your Ops Review == Your Engineering Culture… Part 2

Bryan Dove
5 min read · Aug 19, 2015


(This is Part 2 of Your Ops Review == Your Engineering Culture. Read Part 1 here)

The second part of this article is to convert philosophy into specific instructions and help everyone envision what a “great” operations review looks like. I wish more organizations would talk about the mechanics of how they run their reviews so that we can all learn from each other. How will anyone ever know if they’re doing it “right”? How will anyone ever learn about a better set of mechanics for creating and improving this critical function of every modern software company?

I’m still learning about our current model in my new company, so I can’t share our specific details, but I can share what I believe are the mechanics of a great operations review. This is certainly not the only way, but if I had to start from scratch, these tenets would be my starting point.

The only goal is learning — These meetings are expensive to put on the calendar. Any operations review will involve a number of people spending a meaningful amount of time together on a regular basis. If you add up the hourly wage of everyone involved and multiply by fifty-two weeks in a year, these meetings can cost millions of dollars for even a moderately sized organization. It’s important to reinforce that the intended ROI is broad learning throughout the team. If all we cared about were inspection and measurement of each service, we could all sit at our desks and look at dashboards or read post-mortem emails when we had time. Dashboards and post-mortems are simply tools that allow us to pursue our goal.
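
To put rough numbers on that cost, here is a minimal sketch of the arithmetic. The headcount, average hourly wage, and meeting length are made-up assumptions for illustration, not figures from any real team.

```python
def annual_meeting_cost(attendees: int, hourly_wage: float,
                        meeting_hours: float, weeks_per_year: int = 52) -> float:
    """Yearly cost of a recurring weekly meeting, in the wage's currency."""
    return attendees * hourly_wage * meeting_hours * weeks_per_year


# Hypothetical inputs: 300 attendees, $75/hour on average, 90 minutes every week.
cost = annual_meeting_cost(attendees=300, hourly_wage=75.0, meeting_hours=1.5)
print(f"${cost:,.0f} per year")  # prints "$1,755,000 per year"
```

Plug in your own numbers; the point is only that the total is large enough to deserve a clear return.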

Executive Presence — If attention == importance, then there is no stronger sign of the criticality of this process than having your engineering leadership team in attendance and engaged every week. After seeing this done successfully elsewhere, I now expect the senior leaders of an organization to attend every week and to participate in the discussions. For an engineering team to see their most senior leaders participating and asking a lot of questions has a substantial influence on the culture of the product teams.

Write things down — For learning, writing trumps verbal. Every time.

“Written communication to engineering is superior because it is more consistent across an entire product team, it is more lasting, it raises accountability.” -Ben Horowitz

Verbal is a one-time event, or if it’s repeated, it loses details and focus between each iteration. Written forms can be consumed asynchronously and are far more durable. New person on the team? Onboard them by having them read the most impactful post-mortems. They will be learning by their first afternoon on the team, and it reinforces the team’s culture and lessons learned. The more people who learn these lessons early, the lower the probability of repeating the same mistake. This is how you change the future. (More on the importance of writing: good read, and read the deep links for Bezos here and here, and Horowitz’s classic on product managers here.)

Attendance — Send the invite out to your entire engineering org. Anyone who wants to learn can attend. Remote offices? Online meeting, with cameras as appropriate, screen sharing, and clear audio. The only required attendance is that each service team must have at least one representative at each session. If you’re doing it well, you’ll have much more attendance than the bare minimum because people are learning and find it a valuable use of their time.

Cadence — Weekly. Our services run every hour of every day, and there is no downtime for customers. If our customer-focused culture is reinforced by constantly getting better at serving our customers, we need to be consistent about dispersing knowledge throughout the organization.

Action items actually get done — Too often, it’s easy to say you’ll take some follow-up action and then never do it, while the broader team has moved on to ten other problems and forgotten about this one. You don’t need ultra-complex tools, but you do need a simple, public tracking system that flags overdue items in a public way. An example would be to create simple lists that are shared across your company and write a script to email your entire company for any item overdue by 30+ days. Action items are contracts between team members: everyone is agreeing to do their part. When you have a simple, no-exceptions policy and some healthy public awareness, the strong culture you’ve been building will naturally reinforce getting these action items completed.
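
As one possible shape for that overdue-item script, here is a minimal sketch using only the Python standard library. The `ActionItem` fields, the sample items, and the email addresses are all hypothetical; a real version would load items from whatever shared list you use. It builds the reminder email but deliberately does not send it.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from email.message import EmailMessage


@dataclass
class ActionItem:
    title: str
    owner: str
    due: date
    done: bool = False


def overdue_items(items, today, grace_days=30):
    """Return open items whose due date is at least `grace_days` in the past."""
    cutoff = today - timedelta(days=grace_days)
    return [item for item in items if not item.done and item.due <= cutoff]


def build_reminder(items, today, sender, recipient, grace_days=30):
    """Build (but do not send) one email listing every overdue item."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"{len(items)} action item(s) overdue by {grace_days}+ days"
    lines = [
        f"- {item.title} (owner: {item.owner}, {(today - item.due).days} days overdue)"
        for item in items
    ]
    msg.set_content("\n".join(lines))
    return msg


# Hypothetical data: only the first item is both open and 30+ days late.
today = date(2015, 8, 19)
items = [
    ActionItem("Add disk-space alert", "alice", date(2015, 7, 1)),
    ActionItem("Update runbook", "bob", date(2015, 8, 10)),
    ActionItem("Fix retry logic", "carol", date(2015, 6, 1), done=True),
]
late = overdue_items(items, today)
msg = build_reminder(late, today, "ops-review@example.com", "everyone@example.com")
print(msg["Subject"])
print(msg.get_content())
```

To actually deliver the message you could hand `msg` to `smtplib.SMTP.send_message` on a schedule; the mail server and the schedule are left out here because they depend on your environment.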

Meeting Agenda — This is where it all comes together. I believe there are only three important elements to a successful, culture-changing operations review.

  1. Ask the audience to share anything cool from the last week. No rules / guidelines / examples, just whatever anyone thinks is relevant to the audience.
  2. Teams present their post-mortems. Up to 15 minutes per team. Start with any Pri-1 (top priority) events since the last ops review. If your event was 3+ days ago, you’re expected to have completed your post-mortem. If the event was last night, be able to talk it through. Then, for the remaining priorities, teams nominate the lessons they believe are most applicable to the broader organization. Quality of self-selection will improve over time. This step is the single most important step of the process, and it has to happen every single week. The most important part is that you go through the same process for big issues and small issues. Hopefully, you’re lucky enough to start with the small issues. When you can build the culture of learning and not blaming on simple things, then when big issues come up, it just feels like the same learning routine instead of an opportunity for blame.
  3. Any time left? Dashboard lottery. Find a method to randomly choose a team and ask them to present their dashboard. This is why it’s mandatory each team has at least one person at the meeting. “Present their dashboard” means to walk through it, explain why they’ve chosen these metrics as their most critical, and the audience may see interesting anomalies, like “why is there a strange spike on your requests per second from last Saturday?”
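
The dashboard lottery in step 3 needs nothing fancy. Here is a minimal sketch of one way to do the random pick; the team names are placeholders, and the optional `rng` parameter is my own addition so the draw can be made repeatable when you want to test it.

```python
import random


def dashboard_lottery(teams, rng=None):
    """Pick one team uniformly at random to present its dashboard."""
    if not teams:
        raise ValueError("no teams in the lottery")
    if rng is None:
        rng = random.Random()  # unseeded, so each meeting gets a fresh draw
    return rng.choice(teams)


# Hypothetical team names.
teams = ["search", "payments", "accounts", "notifications"]
winner = dashboard_lottery(teams)
print(f"Dashboard lottery: {winner}, you're up.")
```

Every team has the same chance each week, which is exactly why each one needs a representative in the room.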

So Now What?

Spend a minute to think about what your operations culture says about you. Are you really showing your customers the utmost respect and earning their trust every day? Remember, without them, you don’t have a business.

If you see any ideas that you think could help you from this post, try them out. We’re all in the business of helping each other get better at serving our customers.

If you have other ideas that you think would make this process better, post them here in the comments. I’d love to get feedback that can improve beyond the framework here.

Bryan Dove

SVP Eng @ Skyscanner. Sharing a few things I learn along the way.