Also published at https://paulosman.me/2019/12/30/production-oriented-development.html
Throughout my career, I’ve developed some opinions. Some have worn particularly deep ruts, reinforced by years of experience. I tried to figure out what these had in common, and it turned out to be the idea that code in production is the only code that matters. Staging doesn’t matter, code on your laptop doesn’t matter, QA doesn’t matter; only production matters. Everything else is debt.
This perspective probably comes from years of sitting in between operations and product development. I strongly believe that teams should optimize for two things: getting code to production as quickly as possible, and responding to incidents in production.
This idea, and a lot of the practices it implies, can be counter-intuitive or controversial, so I want to dive in a bit. Below is a set of practices and principles that I believe hold, given my underlying belief that code working in production is the only code that matters.
1. Engineers should operate their code.
Engineers are the subject matter experts for the code they write and should be responsible for operating it in production. In this context, “operating” means deploying, instrumenting, and monitoring code as well as helping to resolve incidents related to or impacting that code. The responsibility of operating code aligns incentives — it encourages engineers to write code that is observable and easy to debug, and connects them to what customers care about. It encourages them to be curious about how their code is performing in production. Importantly, engineers should be on-call for their code — being on-call creates a positive feedback loop and makes it easier to know if their efforts in writing production-ready code are paying off. I’ve heard people complain about the prospect of being on-call, so I’ll just ask this: if you’re not on-call for your code, who is?
If you’re not currently on-call for your code but want to be, and can help influence this decision, there are some things you can do. Set up PagerDuty (or similar) schedules for each group of engineers responsible for specific services or parts of your code. A good schedule has 6–8 engineers. There are plenty of variations, but a typical template is one-week rotations, where you’ll be on-call as secondary for a week and then as primary for the following week. Configuring alerts is a separate topic, which probably…
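To make the rotation template concrete, here is a minimal sketch in Python of one way to lay it out. It doesn't touch PagerDuty or any other tool's API; the function name `build_rotation`, the team names, and the start date are all made up for illustration. It assumes each engineer is secondary for one week and becomes primary the week after.

```python
from datetime import date, timedelta


def build_rotation(engineers, start, weeks):
    """Return a list of weekly shifts with a primary and a secondary.

    Hypothetical helper: the secondary in one week becomes the
    primary in the following week, cycling through the whole team.
    """
    if len(engineers) < 2:
        raise ValueError("need at least two engineers for primary and secondary")

    team_size = len(engineers)
    schedule = []
    for week in range(weeks):
        schedule.append({
            "week_start": start + timedelta(weeks=week),
            "primary": engineers[week % team_size],
            # Next person in line shadows this week, then takes over.
            "secondary": engineers[(week + 1) % team_size],
        })
    return schedule


# Illustrative team of six (placeholder names) and an arbitrary Monday.
team = ["ana", "ben", "chi", "dev", "eli", "fay"]
rotation = build_rotation(team, date(2020, 1, 6), weeks=8)

for shift in rotation:
    print(shift["week_start"].isoformat(), shift["primary"], shift["secondary"])
```

With six people, each engineer carries a pager two weeks out of every six, one as secondary and one as primary. That's roughly why a team much smaller than this starts to feel the load.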