When are we going to deploy on Fridays?
I've been asked about Friday deploys so often that I thought it would be good to write down why I consistently resist the suggestion that they become a matter of policy.
As an operations guy, the most important thing for me, professionally, is keeping the site up and healthy. “Healthy” is a vague concept, but it includes keeping pages loading quickly and functioning properly.
Sometimes site disruptions result from external factors like denial of service attacks or power outages. However, far and away the most common cause of ill health for the site has been changes introduced by the people managing the code that runs it. I have in mind developers and operations engineers, and the changes are software and configuration changes.
If you care about the business and want to grow it, you have to embrace these changes to the system. That’s what being in the software world is all about.
However, as an operations engineer you want to minimize the risk these changes pose to the health of your site. This risk has, in general, two parts. The first is introducing a mistake that causes some kind of failure. The second is not being able to correct that failure as quickly as possible.
Not deploying on Fridays is a policy meant to address the second of these. A Friday deploy is followed by the period when you generally have the least coverage from your engineering team to respond to unexpected breakage. The same is true of late-in-the-day deploys from Monday to Thursday.
I know from experience that there is a great temptation to put code changes into production as soon as they are approved in code review. It feels good to make things better for your users. And then, it feels good to celebrate. But sometimes it takes a while for the failures introduced by these changes to appear, and outside of normal work hours it takes time to get the relevant engineers involved to work on fixing them (for example, by reverting those changes).
This is why I resist requests to establish a protocol that accommodates Friday deploys, or late-day deploys in general.
There are reasons to make exceptions to this rule. A good operations engineer will understand this and make them when necessary, but only when necessary, and will defend their status as exceptions rather than the rule.
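
For anyone who wants to turn this into an automated check, here is a minimal sketch of what a deploy gate might look like. It is an illustration, not something this post prescribes: the function name `deploy_allowed`, the 15:00 cutoff for "late in the day", the blocking of weekends, and the `approved_exception` flag are all assumptions of mine.

```python
from datetime import datetime, time

# Assumed cutoff for "late in the day"; the post does not name an hour.
LATE_DAY_CUTOFF = time(15, 0)


def deploy_allowed(now: datetime, approved_exception: bool = False) -> bool:
    """Return True if a deploy at local time `now` fits the policy.

    Hypothetical helper: blocks Fridays and late-day deploys from
    Monday to Thursday. Weekends are also blocked, which is an
    assumption that extends the same coverage argument.
    """
    # A deliberate, defended exception overrides the rule.
    if approved_exception:
        return True

    # datetime.weekday(): Monday is 0, Friday is 4, Sunday is 6.
    if now.weekday() >= 4:
        return False

    # Monday to Thursday: allow only before the cutoff.
    return now.time() < LATE_DAY_CUTOFF


if __name__ == "__main__":
    print(deploy_allowed(datetime.now()))
```

The exception flag is a plain argument on purpose, so that someone has to decide to pass it each time; that keeps exceptions visible rather than letting them become the default.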