It’s Friday: CI/CD as an unfinished journey

Alon Nisser
Zencity Engineering
5 min read · Jan 13, 2022

While many of us struggle with building our CI/CD systems, I’ve observed a mix-up between two things: having the technical pieces for running CI/CD pipelines set up and available, and having a real end-to-end CI/CD solution. Do you have a fully working end-to-end CI/CD solution? I’d suggest answering one simple question to find out:

“Do you deploy fearlessly to production on Fridays?”

By TheCustomOfLife (talk) (Uploads) — Taken by me on 5/28/2005., CC BY-SA 3.0, https://en.wikipedia.org/w/index.php?curid=1956450

If the answer is no, then you’re probably still on the path to implementing a CI/CD solution, but not quite there yet. You might have implemented some of the moving parts, such as a CI service triggered by git actions, an automated deployment script, or monitoring. However, you haven’t yet achieved the holy grail: Fearless Friday, an end-of-working-day deployment to prod.

Why? Because continuous deployment is not just about “moving code to production,” but also about delivering your changes to the production environment in a sustainable, automated, and continuous manner. If you’re afraid of deploying on Fridays, then you probably don’t fully trust your system to protect you from deploying a regression. You don’t believe it can detect a regression automatically and roll back autonomously. You probably aren’t sure that you can roll back any change at will. You are wary of needing to babysit the process, or of having someone call you on Friday night about some unexpected side effect of the deployed code.

But hey, it’s just Friday. It’s not a big deal. Removing 2–3 hours from the deployment slot each week isn’t a lot and won’t hurt us. Or will it?

It’s Friday again, Friday all week long

If you’re afraid of Friday deployments, then surely you won’t want to deploy at the end of a working day. Or before lunch. Or before starting a long meeting.

And when exactly is the “end of the day”? What is the time frame you need to verify, and if needed recover from, a regressive deployment? Half an hour? Three hours? What are the safety margins you take? Are you comfortable with them?

Soon we find out that it’s Friday all week long. If you’re wary of “push to deploy,” then having an automated pipeline that builds a container and deploys it to the production cluster isn’t enough: your commits aren’t getting to production as fast as they can.

Caught in the slow deployment loop

Now what? “Work in process”* code piles up all around, and you deploy a bigger batch of code each time you deploy. The risk of each deployment grows accordingly, which makes you more deploy-averse, which in turn makes the batches even bigger: a self-reinforcing vicious cycle.
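To make the batch-size argument concrete, here is a rough back-of-the-envelope sketch. It assumes a deliberately simplistic model that is mine, not something we measured: every change independently carries the same small probability of introducing a regression. The function name and the 5% figure are hypothetical and only for illustration.

```python
def batch_regression_risk(per_change_risk: float, batch_size: int) -> float:
    """Probability that at least one change in a batch introduces a regression.

    Assumes (simplistically) that changes are independent and equally risky.
    """
    if not 0.0 <= per_change_risk <= 1.0:
        raise ValueError("per_change_risk must be between 0 and 1")
    if batch_size < 0:
        raise ValueError("batch_size must be non-negative")
    # P(at least one regression) = 1 - P(no change regresses)
    return 1.0 - (1.0 - per_change_risk) ** batch_size


if __name__ == "__main__":
    # Hypothetical 5% risk per change, for batches of growing size.
    for size in (1, 5, 10, 20):
        risk = batch_regression_risk(0.05, size)
        print(f"batch of {size:2d} changes -> {risk:.0%} chance of a regression")
```

Under these assumptions, a single change at 5% risk becomes roughly a 40% chance of shipping a regression once ten changes are batched together, and it is also harder to tell which of the ten caused it.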

After one regression-introducing deployment, you might add process tasks after each deployment, such as demanding manual verification. That only slows deployments further, because a deployment is no longer simply a push but requires continuous “babysitting,” which no one wants to do. This spirals on and on. Clearly, this is the wrong direction.

Bold deployment and continuous improvement

A better alternative is constantly moving forward by adding the components needed for fearless Friday deployments: better monitoring, automated smoke tests, really easy safe rollbacks, and more. While you may never reach the ability to allow interviewees to safely write a feature deployed straight to production, as I’ve been told some (quite rare) companies do, you can certainly get closer to and better at this.
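As an illustration of how automated smoke tests and easy rollbacks fit together, here is a minimal sketch of a post-deploy gate. It is not our actual pipeline: `deploy_with_smoke_tests`, `DeployResult`, and the `deploy` callable are hypothetical names, and the real deploy step would be whatever your platform provides (for example, a Kubernetes rollout).

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class DeployResult:
    version: str              # version left running after the attempt
    rolled_back: bool         # True if the new version was reverted
    failed_checks: List[str]  # names of the smoke checks that failed


def deploy_with_smoke_tests(
    new_version: str,
    previous_version: str,
    deploy: Callable[[str], None],  # hypothetical hook into your deploy tooling
    smoke_checks: Sequence[Tuple[str, Callable[[], bool]]],
) -> DeployResult:
    """Deploy new_version, run smoke checks, and roll back if any check fails."""
    deploy(new_version)

    failed: List[str] = []
    for name, check in smoke_checks:
        try:
            ok = check()
        except Exception:
            # A check that crashes is treated as a failed check.
            ok = False
        if not ok:
            failed.append(name)

    if failed:
        # Rolling back is just deploying the previous known-good version.
        deploy(previous_version)
        return DeployResult(previous_version, True, failed)
    return DeployResult(new_version, False, [])


if __name__ == "__main__":
    history: List[str] = []
    result = deploy_with_smoke_tests(
        "v2",
        "v1",
        deploy=history.append,  # stand-in for a real deployment
        smoke_checks=[("health", lambda: True), ("login", lambda: False)],
    )
    print(result, history)
```

The point of the sketch is that nobody has to watch the deployment: the same automation that pushes the change also decides whether it stays, which is what makes a Friday-evening push uneventful.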

Since I view fearless Friday deployments as a litmus test for the robustness of my continuous deployment scheme, I try not to follow the famous software law about not deploying at the end of the workweek. Instead, I deploy whenever I commit. Yes, this comes with a price, but it also forces me to treat CI/CD as a first-class concern and builds incentives for treating CI/CD issues as a high priority.

Our CI/CD journey in Zencity’s R&D team

I think we got the general gist of CI/CD early on at Zencity. We adopted a CI mindset and worked hard on moving all our code (including that legacy monolith) to Docker (as a universally deployable artifact). We embraced Kubernetes, mostly for the easy deployment/rollback patterns it unlocks, and generally made CI/CD a priority in our R&D organization. But did we achieve fearless Friday deployments? Nope. We actually witnessed the downward spiral: deploy windows; code freezes during important customer demos; merging PRs without deploying them (leaving the scary part to someone else); a checklist of things “you must test” after a deployment. We also saw new microservices being developed without a strong CI/CD story, pushing us backwards.

So how did we tackle this?

Currently, we’re focused on converging on a single DevOps platform that helps us standardize CI/CD with base templates. In our case, the DevOps team decided to move us to GitLab and developed reusable templates for CI/CD pipelines, cutting the cost of CI/CD setup. We’ve also developed a number of cookiecutter-based templates for new projects (different templates for different tech stacks; my team, for example, has built one for Spark/Scala projects) with CI/CD, monitoring, and deployment concerns baked in.

Other teams have focused on reducing the instability baked into legacy parts of the monolith, which caused flaky behavior and monitors firing false positives. Being able to trust the monitoring to alert you when something is really broken is a key step towards fearless deployments.

Epilogue

This article was conceived in a constraints workshop during our “Zencity’s Technological Vision — 2022” offsite, where I tried to apply TOC (Theory of Constraints) techniques like CRT (Current Reality Tree) to our R&D pains. Deployment came up all over the place, and it set me thinking about deployment as a primary constraint, one that causes WIP* to pile up everywhere.

And one more thing: CI/CD is more about a mindset and a journey than tools and a set of final objectives. Treat it as such, keep learning from your experimentation and the experimentation of others — like ours at Zencity, and choose the route of continuous improvement.

* Limiting the amount of “work in process” is a central piece of multiple project management methods, for example Kanban.

Addendum

Of course, after I finished writing this post I found out that Charity Majors had already written about this in 2019, so here is a reading recommendation: “Friday Deploy Freezes Are Exactly Like Murdering Puppies.”
