In the good old days, when Jenkins was pretty much the only deployment tool out there, everything was clear: take it, install it on a server, configure it, and if the functionality is not enough, use plugins or write your own.
Things started to change when Travis CI was released in 2011. Travis aimed to bring CI/CD to GitHub projects with a new approach: you just put a .travis.yml file in the root of your project with instructions for builds, tests and deployments.
The approach was called Pipeline as Code (PaC), and it was so successful that it very quickly became a standard for CI/CD processes. In 2016 Jenkins released its second version with support for the Jenkinsfile, an analog of .travis.yml. And with the growth of GitLab and its wonderful SaaS platform (which, of course, implemented PaC), many teams happily moved their pipelines to GitLab. That’s completely understandable: many things become automatically managed and simple. And in 2019 GitHub came up with GitHub Actions, providing exactly the same pattern.
How does PaC actually work?
The main idea of PaC is to keep your application code together with its CI/CD pipeline: developers can adjust the pipeline flexibly and frequently with no additional permissions or extra steps, because the pipeline file sits right there in the root folder of your app. To handle differences between deployments, it’s suggested to use stages, branches and globally defined variables. Different providers offer different ways to describe deployment scenarios: it can be a declarative YAML approach (GitLab, GitHub, Travis) or a programming language, as with Jenkins (Groovy), but either way it usually lets you do almost any trick you need.
Did we miss something?
If we recall how people actually used Travis and how it became so popular, we can see that in most cases it was for unit tests. Indeed, GitHub is famous for its enormous amount of open-source software, much of which doesn’t need to be deployed, only tested or linted on push.
What do we have right now? A great number of companies use PaC for everything related to CI/CD: tests, builds and deployments, including deployments to multiple environments.
The problem comes when you realize that you have to somehow map your branches and Git flow onto deployments to different environments. To simplify this step GitLab came up with GitLab Flow, which very roughly suggests using only the master branch (no develop) along with feature branches. A merge to master is supposed to trigger a production deployment.
So far so good: you just need to check whether your commit comes from a merge request targeting master, and otherwise treat it as development.
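The check described above fits in a few lines. This is only a minimal sketch: it assumes the pipeline exposes GitLab’s predefined `CI_COMMIT_BRANCH` variable and that a merged request shows up as a commit on master. The function name `pick_environment`, the `BRANCH_TO_ENV` mapping and the environment labels are made up for illustration.

```python
# Hypothetical mapping: the branch a commit lands on decides the environment.
BRANCH_TO_ENV = {"master": "production"}


def pick_environment(ci_vars):
    """Map a pipeline's CI variables to a target environment.

    ci_vars is a plain dict, e.g. os.environ inside a CI job.
    """
    branch = ci_vars.get("CI_COMMIT_BRANCH")
    # Anything that is not a known deployment branch counts as development.
    return BRANCH_TO_ENV.get(branch, "development")


print(pick_environment({"CI_COMMIT_BRANCH": "master"}))
print(pick_environment({"CI_COMMIT_BRANCH": "feature/login"}))
```

Note that the environment is derived purely from the branch name, so the mapping is the only place where branches and environments meet.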
If you need another environment, say staging, you have to create the appropriate branch and merge into it. If you have bugs, you have to fix both master and staging. The same applies to release branches. The apparent simplicity of GitLab Flow in comparison with classical Git flow disappears.
Multi-environment deployments
PaC works perfectly for tests and linting, when you just need to run some local checks against the current version of the code. We don’t care about environments there. And those checks are directly bound to new commits, which makes Continuous Integration a natural fit for PaC.
When it comes to multi-environment deployments, PaC suggests using stages to distinguish deployments to different environments. So when a merge lands on the master branch we run the production deployment, and the other stages are skipped. Here we have the first limitation:
We cannot deploy any branch to any environment. They are tightly coupled.
Quite often developers want to test their feature branch on the test environment without merging it into develop or master. In GitLab you can pass variables to the pipeline, so theoretically we can say which commit to deploy to which environment, but that breaks the entire idea of PaC and makes the branch-driven logic of GitLab CI/CD pointless.
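To make the variable-passing workaround concrete, here is a rough sketch of the request one could send to GitLab’s pipeline trigger endpoint. It only builds the URL and the form body and sends nothing. The host, project id and token are placeholders, and `DEPLOY_ENV` is a hypothetical variable name that the pipeline itself would have to read and interpret.

```python
from urllib.parse import urlencode


def build_trigger_request(base_url, project_id, trigger_token, ref, environment):
    """Build (url, form_body) for a GitLab pipeline trigger call."""
    url = f"{base_url}/api/v4/projects/{project_id}/trigger/pipeline"
    form = {
        "token": trigger_token,
        "ref": ref,  # branch, tag or commit to run the pipeline for
        # Hypothetical variable: the pipeline must decide what to do with it.
        "variables[DEPLOY_ENV]": environment,
    }
    return url, urlencode(form)


url, body = build_trigger_request(
    "https://gitlab.example.com", 42, "TOKEN", "feature/login", "test"
)
print(url)
print(body)
```

The target environment now travels outside the repository, in the request, which is exactly why this no longer looks like Pipeline as Code.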
Is it fine to keep everything in one pipeline?
That may not be a big problem, but the next issue seems more serious:
PaC config files by their nature don’t support user management
It’s good practice to decouple builds and deployments, and there are three reasons for that:
1) Deployments involve production secrets that should not be accessible to each and every developer
2) The deployment process can be quite tricky, and developers without enough DevOps knowledge can accidentally break the production deployment or introduce a hidden bug
3) Deployment might require additional assets, e.g. infrastructure-as-code templates, Kubernetes manifests or Helm charts, which have nothing to do with application code and should not be in the same repository.
When we try to split build and deployment, we have to create another repository and bind the deployment pipeline to it. At that moment we completely break PaC, because we lose the connection to the original repository, although that is exactly the code we deploy.
To work around this, I have seen a few hacks in practice:
- Create environment branches with a config file that declares which version (in the simplest case, a commit hash) to deploy. Once you change it, the triggered pipeline figures things out, pulls the original repository and deploys it. Sometimes those branches are completely independent and can no longer be merged, since deployment scripts can look completely different from env to env.
- Use pipeline-specific variables to specify which branch of the original repository to deploy to which environment
- Make cross-repository commits in order to trigger the second pipeline
In all of these cases the coupling of Git and the pipeline looks like an obstructive leftover. All we need is to point to some external repository and define variables per environment.
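As a rough illustration of that last sentence, a deployment can be described by three things: an external repository, a ref, and an environment with its own variables. Everything in this sketch is hypothetical, including the class names, the `deploy.sh` placeholder script and the example URL. It only plans the commands and runs nothing.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Environment:
    """A deployment target with its own variables (hypothetical model)."""
    name: str
    variables: dict = field(default_factory=dict)


@dataclass(frozen=True)
class DeploymentRequest:
    """What to deploy and where, independent of any branch layout."""
    repository: str  # URL of the external application repository
    ref: str         # branch, tag or commit hash to deploy
    environment: Environment


def plan(request):
    """Return (commands, env_vars) for a deploy job; nothing is executed."""
    commands = [
        ["git", "clone", request.repository, "app"],
        ["git", "-C", "app", "checkout", request.ref],
        # deploy.sh stands in for whatever actually performs the rollout.
        ["./deploy.sh", request.environment.name],
    ]
    return commands, dict(request.environment.variables)


staging = Environment("staging", {"REPLICAS": "2"})
request = DeploymentRequest(
    "https://example.com/team/app.git", "feature/login", staging
)
commands, env_vars = plan(request)
```

With a model like this, any ref can go to any environment, and the deployment logic lives apart from the application repository.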
How about microservices?
Microservices running in a Docker cluster are today a de facto standard for big, complicated applications. If we look at this pattern, we can see that since microservices are independent, they should live in different Git repositories. But a Docker cluster, let’s say Kubernetes, is also a separate deployment entity. In order to deploy or upgrade the cluster you need your Kubernetes manifests, or charts in the case of Helm. It makes complete sense to move the cluster into a separate repository and pipeline (you can of course copy-paste your Kubernetes deployments across all microservices, but good luck with debugging then :) ), and there we again come to the previous point: we lose the connection to the original code base that needs to be deployed (in this case we don’t even have the code any more, since the Docker images should already have been pushed to the registry).
Waterfall of commits
With cloud CI/CD solutions like GitLab, practically the only way to test your pipeline is to commit a change to .gitlab-ci.yml and watch in the UI whether it works. Such iterations spawn hundreds of unnecessary commits and pollute the Git history. It’s not that critical, since you can squash them afterwards, but it can be pretty annoying and seems to be one more side effect of coupling deployment and code.
GitLab is a nice platform, and it has made great progress in recent years, but its popularity pushes people into wrong DevOps decisions. It’s completely fine to use PaC for unit tests and various code checks, but it’s completely unsuitable for complicated deployments in big projects. By building the deployment pipeline with separate tools, like classical Jenkins jobs or the pipeline tools of cloud providers such as AWS CodePipeline, you can shape your processes in a more harmonious and natural way and use all their power (the plugins and flexibility of Jenkins, the environment and ecosystem of the cloud).
I would really love to hear your opinion. And follow me on Twitter!