Is Your CI/CD Pipeline Ready For Chaos?
Will it work when you absolutely need it to?
I continually stress the importance of creating bulkheads in our systems to mitigate the risk of honest human error, such as here and here. But what about our CI/CD pipelines? Do we have sufficient bulkheads between our services and their pipelines? When chaos breaks loose, we tend to find edge cases in our systems that must be fixed to restore order. But we need our CI/CD pipelines to deploy these fixes. Will your pipelines work when you absolutely need them to? Are your pipelines autonomous? Are they really?
If you are self-hosting your CI/CD tools, are they running on the same infrastructure as your system? Are they running in the same cluster? Are they running in the same account? If someone makes a lethal mistake on your infrastructure, how big is the blast radius? Do you have regional fail-over for your pipelines? This sounds like a lot of work.
Given my preference for a serverless-first approach, I recommend using fully managed, hosted CI/CD services. Delegate this responsibility to the provider. But does your CI/CD provider have sufficient continuity of service?
Is your pipeline hosted on the same cloud provider as your system? What happens when your shared cloud provider has a regional disruption? Is your pipeline hosted in the same regions? Does your pipeline have multi-regional support? Does your pipeline have multi-cloud support?
Unfortunately, the fallback is for someone on your team, with sufficient permissions, to execute deployments from their own machine. This may not seem like the end of the world, but that depends on the compliance regulations you are subject to. It also raises security concerns. Who has the necessary permissions? How many people have those permissions? How is your access key hygiene? Do you require MFA? What gaps now exist in your deployment audits?
It's best to just avoid this fallback position. Use a hosted CI/CD solution, and understand what your CI/CD provider offers. If they run on a different cloud provider, then you are pretty much set. If not, then you will likely have to ask them for detailed information about their continuity of service, so that you can be prepared for that inevitable bad day when chaos breaks loose.