GitOps is a modern way to build better IaC for delivering apps to Kubernetes. It is all about determinism, idempotence, automation, observability… and many other exciting features! However, are you sure all this happens in the real world with the existing approaches and tools? Here’s our comprehensive analysis of GitOps and its features, a comparison with CIOps, as well as insights on how this all should be done to actually get what every DevOps engineer dreams of.
Please note that the article is based on this 30-minute video, which goes into a bit more detail. Enjoy the talk or its text version below:
How GitOps works
What comes to your head when you hear “GitOps”?
There is a Git repository. In that repository, we have YAML files describing the desired state for Kubernetes, e.g.:
- two Deployments,
- some StatefulSet,
- and an Ingress.
On the other side of our equation, there is a Kubernetes cluster with all our objects forming a simple application.
The only missing piece is a GitOps operator. It is responsible for syncing the state from Git into Kubernetes. To do this, it periodically (or on an event):
- reads the state from Git,
- reads the state from Kubernetes,
- compares them,
- changes the state of Kubernetes (if needed).
It’s as simple as that: a Git repo, a K8s cluster, and the thing that keeps them in sync (the GitOps operator).
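The four steps above can be sketched as a single reconciliation pass. This is only an illustration and not how any particular GitOps operator is implemented: the states are plain dictionaries, the object names are made up, and deleting objects that are absent from Git (pruning) is an assumption that real tools often make optional.

```python
from typing import Dict

# Hypothetical model: a "state" maps an object name to its manifest (a plain dict).
State = Dict[str, dict]


def diff(desired: State, actual: State) -> Dict[str, list]:
    """Compare the desired state (Git) with the actual state (cluster)."""
    return {
        "create": sorted(n for n in desired if n not in actual),
        "update": sorted(n for n in desired if n in actual and desired[n] != actual[n]),
        "delete": sorted(n for n in actual if n not in desired),
    }


def reconcile(desired: State, actual: State) -> State:
    """Run one sync pass and return the new cluster state."""
    changes = diff(desired, actual)
    new_state = dict(actual)
    for name in changes["create"] + changes["update"]:
        new_state[name] = desired[name]
    for name in changes["delete"]:  # assumption: pruning is enabled
        del new_state[name]
    return new_state


git_state = {
    "deployment/frontend": {"replicas": 2},
    "deployment/backend": {"replicas": 3},
    "ingress/app": {"host": "example.com"},
}
# Someone scaled the backend by hand and created an extra object.
cluster_state = {
    "deployment/frontend": {"replicas": 2},
    "deployment/backend": {"replicas": 10},
    "job/debug": {},
}

print(diff(git_state, cluster_state))
cluster_state = reconcile(git_state, cluster_state)
```

Running `reconcile` a second time on the result changes nothing, which is the convergence behaviour discussed later in the article.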
By the way, while the GitOps operator can be outside, usually (almost always) it resides inside the Kubernetes cluster. To keep things simple, we draw it outside.
Just by using this approach, we already get some safety features. If a user directly modifies anything in Kubernetes, the GitOps operator detects this change and reverts it to the state defined in Git.
This creates a small fence that forces users to make their changes in the single source of truth, i.e. in Git, instead of going directly to Kubernetes.
Instead of this small fence, we can build a solid wall (by not giving users any direct access to the cluster) or a “transparent” wall (i.e. read-only access). But that is not important; what is important is that Git is the only way in.
Wait… was that a full picture?
There is one more important part of the equation: the container registry.
If a new image arrives in the registry, the GitOps operator will detect this change at some point and propagate it to Kubernetes, so the new image will be pulled.
Clearly, the state of Kubernetes is not fully defined by Git. It is determined by Git and the container registry together.
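To make this dependency on the registry concrete, here is a minimal sketch. The registry is modelled as a dictionary from a mutable tag to a digest; the names and digest values are invented for illustration.

```python
# Hypothetical registry: maps a mutable tag to the digest it currently points at.
registry = {"app:latest": "sha256:aaa"}

# Manifest as committed in Git; it references the image by tag only.
manifest_in_git = {"name": "backend", "image": "app:latest"}


def effective_state(manifest: dict, registry: dict) -> dict:
    """What actually runs: the manifest plus whatever the tag resolves to."""
    return {**manifest, "resolved_digest": registry[manifest["image"]]}


before = effective_state(manifest_in_git, registry)

# A new image is pushed under the same tag; nothing changes in Git.
registry["app:latest"] = "sha256:bbb"
after = effective_state(manifest_in_git, registry)

print(before == after)  # False
```

The Git commit is identical in both cases, yet the effective state differs, because the second input (the registry) has changed.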
Pros & cons of GitOps
When we read about GitOps, we can find that it brings us a lot of great features:
- Automation ✓ We make neither manual changes in Kubernetes, nor manual actions to synchronize the state from the Git. The operator is responsible for keeping everything in sync, and it does it automatically.
- Convergence ✓ Our system tends toward the desired state, and even if it occasionally gets out of sync, it gets back on its own. What can bring it out of sync? Two major reasons: a) something has changed in the K8s cluster (a manual action, unauthorized activity…), b) there are new changes in Git that haven’t yet been delivered to Kubernetes. In both cases, the GitOps operator is responsible for getting the system back in sync.
- Idempotence ✓ If we repeat the synchronization several times against the same state in Git, every run leads to the same result: the first sync doesn’t affect the outcome of the second one, the first and second ones don’t affect the third, and so on. By the way, if you have compiled manifests committed in Git, that idempotence is provided by Kubernetes and its API, so… that’s not a GitOps merit.
- Determinism ✗ The state in Kubernetes is supposed to be solely and entirely determined by what is in the Git repo. As explained above, that’s not true since the state of the container registry also matters. If someone changes an image in the registry… everything falls apart.
- Observability ✗ We need to know at any moment whether our system is in sync or not, and we want to get an alert if it is not. It looks like we have observability since we know whether our Kubernetes state matches the manifests in Git. However, we don’t know whether our system is fully in the desired state, simply because we don’t even know the actual desired state: once again, it is a combination of the manifests (Git) and container images (registry). So only half of the state is determined by Git, and only that half can be observed.
- Audit ✗ All the changes made to our Kubernetes should be seen in one place reliably and conveniently. This place is Git… but that’s not true. We also have to rely on the audit functionality of our container registry. Combining audit data from two systems is far from being reliable or convenient.
- Security ✗ It seems to be achieved by not giving the CI system direct access to the K8s cluster. The operator resides inside the cluster and pulls the changes with no direct access from outside, so it appears to be more secure, doesn’t it? But what about the CI system (or the user) that still has to be able to push images to the container registry and to update manifests in the Git repo? There is still something (the CI system or a user) that effectively has all possible access to the cluster. Changing this access from direct to indirect does not improve security; it just gives a false sense of security that makes the whole environment even less secure. You have to keep your CI secure; there is no other way.
What can we compare GitOps against?
Let’s think: what is the usual way to deliver to Kubernetes? The most obvious and the most widely used way is a simple “deploy-from-CI” approach. Sometimes, it is called CIOps.
How CIOps works
There is a Git repository, and this time, it’s not just a repo with Kubernetes manifests. It contains:
- our application’s source code;
- Kubernetes manifests but now they are represented as a Helm chart;
- and, most likely, some tests.
There is a CI system connected to this repo. It can be anything: Jenkins, GitLab CI, GitHub Actions, etc. This CI system has several jobs (or tasks, actions, stages…):
- Build — to build the image;
- Unit test — to test this image;
- Publish — to put it into registry;
- Deploy to stage…
At first glance, this last job just executes helm, feeding it the chart from Git. However, the chart is not enough. Usually, you also need to feed in information about the freshly built images: the new image tags. Based on these new tags and the Helm chart, the manifests are rendered by Helm and sent to the Kubernetes API. Kubernetes converges to the defined state and pulls the new images.
NB. By the way, you can substitute Helm with any other tool; there will be no difference. In any case, you will have some kind of templates and image tags that are used to render (to compile) these templates.
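As a rough illustration of "templates plus image tags", the sketch below renders a manifest with Python's standard `string.Template`. It is not Helm and does not use Helm's template syntax; the template, the repository name and the tag are all made up.

```python
from string import Template

# Hypothetical, heavily simplified template; real charts use Helm's Go templates.
DEPLOYMENT_TEMPLATE = Template(
    "kind: Deployment\n"
    "metadata:\n"
    "  name: $name\n"
    "spec:\n"
    "  template:\n"
    "    spec:\n"
    "      containers:\n"
    "      - name: $name\n"
    "        image: $repo:$tag\n"
)


def render(name: str, repo: str, tag: str) -> str:
    """Render the manifest from the template and the freshly built image tag."""
    return DEPLOYMENT_TEMPLATE.substitute(name=name, repo=repo, tag=tag)


manifest = render("backend", "registry.example.com/app", "3f2a1bc")
print(manifest)
```

The rendering itself is a pure function: the same template and the same tag always give the same manifest. Whether the whole flow is repeatable therefore depends on where the tag comes from.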
Let’s come back to our jobs; we still have more of them:
- E2e test — to run end-to-end tests on the deployed app;
- Deploy to production that does the same as Deploy to stage but to the production environment.
Pros & cons of CIOps
Let’s just quickly evaluate how this workflow performs by the same “GitOps” criteria.
Is it deterministic? Well, usually it’s not:
- The most common problem lies in how we build our Docker images. Even with the source code and the Dockerfile in the same repository, and even when really trying to freeze all the external dependencies in that repo, we end up with no guarantee that our builds are reproducible. If we build our image twice (from the same Git commit), we might get two images with different content. Our challenge is to have consistent manifests. Since the consistency of the rendered results (rendered by Helm or whatever you use) heavily relies on the consistency of the build and publish stages, the final result is not consistent, and thus the whole flow is not deterministic.
- Another problem lies in the tagging strategy. Consistent tagging is not easy. The most natural implementation is to use Git commit IDs as tags and to check whether an image with this tag already exists in the container registry. If it does, we skip building and publishing it. Then, if you repeat this process many times, all the repetitions will do nothing. But have you ever seen this done right? Usually, either a tag is reused, so the image is repeatedly replaced in the registry (which gives you unpredictable results), or the tagging strategy is based on some data from the CI system (job IDs or something similar).
It also means “no” to idempotency. By the way, if some of the steps rely on the data from a CI system, the state of Kubernetes will be determined not only by Git, but also by this CI system.
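The two tagging strategies mentioned above can be contrasted in a short sketch. The registry is a plain in-memory set and the "build" is just a log entry, so this only illustrates the control flow, not a real build system; all names are hypothetical.

```python
# Hypothetical in-memory registry: the set of tags that have been published.
published = set()
build_log = []


def build_and_publish(repo: str, commit_id: str) -> str:
    """Tag by Git commit ID; build only if that tag is not published yet."""
    tag = f"{repo}:{commit_id}"
    if tag in published:
        return tag  # already built from this commit: do nothing
    build_log.append(tag)  # stands in for the real build and push
    published.add(tag)
    return tag


def tag_from_ci_job(repo: str, job_id: int) -> str:
    """Tag by CI job ID: every pipeline run yields a new tag for the same commit."""
    return f"{repo}:{job_id}"


first = build_and_publish("app", "3f2a1bc")
second = build_and_publish("app", "3f2a1bc")  # repeated run, nothing is rebuilt

rerun_a = tag_from_ci_job("app", 1041)
rerun_b = tag_from_ci_job("app", 1042)  # same commit, different tag
```

With commit-based tags, a repeated run is a no-op. With job-based tags, the rendered manifests differ between runs even though Git has not changed. Note that skipping the rebuild only sidesteps the reproducibility problem; it does not make the build itself reproducible.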
NB. There are no idempotency problems in the manifest applying process itself; the problem is in the consistency of the data we pass to Helm (i.e. to the rendering process).
Thus, despite the Helm chart residing in Git, despite Helm itself being idempotent (you can replace Helm with kubectl or any other similar tool), despite the Dockerfile and source code also residing in the same Git repo, the whole flow is not deterministic and not idempotent. There is no guarantee that we can restore our cluster to the state of a specific commit in Git.
When idempotence and determinism fall, everything else falls apart:
- No more convergence and observability as we don’t have a consistent and repeatable way to get to the desired state (and how can we observe it then?).
- Git does not fully define the state, so no real audit is possible. Git history may have something, but it’s just a fraction of what we need.
- Do we need to talk about security? Swapping direct and indirect access does not make a difference…
However, everything is good with automation: that’s what CI systems are all about, so obviously our changes are still delivered automatically.
NB. One important thing to mention about automation is feedback. For automation to be done properly, you should provide your user with clear feedback. When we deploy apps to Kubernetes, we quite often end up in a situation where Helm (or kubectl apply) says, “Successfully applied.” However, that does not actually mean that our change has been rolled out; it only means that the request to roll out the app has been successfully received. If that resonates with your experience, you might find either werf or kubedog useful.
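The difference between "applied" and "rolled out" can be sketched as a check on a Deployment object. The field names follow the Kubernetes Deployment API, but the check is a simplified assumption (it ignores conditions and progress deadlines) and is not how werf or kubedog track rollouts.

```python
def is_rolled_out(deployment: dict) -> bool:
    """Return True only when the new revision is fully rolled out and available."""
    desired = deployment["spec"]["replicas"]
    status = deployment.get("status", {})
    return (
        # The controller has seen the latest version of the spec.
        status.get("observedGeneration", 0) >= deployment["metadata"]["generation"]
        # All replicas run the new revision, and no old ones are left.
        and status.get("updatedReplicas", 0) == desired
        and status.get("replicas", 0) == desired
        # All of them are actually available.
        and status.get("availableReplicas", 0) == desired
    )


# Right after a successful apply: the request is accepted, nothing is updated yet.
just_applied = {
    "metadata": {"generation": 2},
    "spec": {"replicas": 3},
    "status": {"observedGeneration": 1, "replicas": 3,
               "updatedReplicas": 0, "availableReplicas": 3},
}
# Some time later: the new revision has fully replaced the old one.
finished = {
    "metadata": {"generation": 2},
    "spec": {"replicas": 3},
    "status": {"observedGeneration": 2, "replicas": 3,
               "updatedReplicas": 3, "availableReplicas": 3},
}

print(is_rolled_out(just_applied), is_rolled_out(finished))  # False True
```

A deploy job that wants to give clear feedback would keep polling until such a check passes or a timeout expires, and fail the job otherwise.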
To sum it up, CIOps done right might work. You need to pay enough attention to the build and tagging stages, making them truly idempotent and deterministic. Once that’s done, you instantly get everything else ✓ (more or less), especially if you don’t forget about “clear feedback.”
That’s what we should compare GitOps against. But there is one more important point here. Let’s be honest: how often do we need to redeploy our app, or to deploy it in some historical state? When bad times come, we can deploy only new changes and, doing so sequentially, live with a non-deterministic and non-idempotent flow. I’m not trying to convince you that it’s right; quite the opposite, I’m strongly in favor of determinism and idempotence.
What I’m trying to say is that in real world situations, you might be okay with a non-perfect flow, and you might not want to invest a lot of extra effort to achieve the “full safety.” So the question is, how much are you ready to pay for it?
GitOps vs CIOps
On paper, GitOps is almost perfect, and CIOps is just bad. However, the perfection of GitOps is itself quite questionable. Furthermore, CIOps done right might work or even be really good. But that’s not the full picture.
While CIOps describes the whole flow (from changes introduced in Git to their deployment to the production Kubernetes cluster), GitOps covers only a part of this process. Let’s look at that bigger picture.
The bigger picture of GitOps
Let’s come back to our diagram of GitOps. Actually, there are multiple branches in Git and, probably, several Kubernetes clusters, for example, for staging and production.
But what’s really important is that the main Git repository, which contains the application’s source code and all related stuff, has been missing from our picture. This is the same repo we’ve already seen in the CIOps flow. Let’s call it the Application repo, and our previous Git repo will be known as the Cluster repo.
Now, you can see here almost the same flow as in CIOps:
- Build job,
- Unit test job,
- Publish job,
- Deploy job: here, we start with the same actions (taking information about images from the Publish job and feeding it to Helm), but then comes the difference. Instead of applying changes straight to Kubernetes, we (directly or indirectly) make a commit to our Cluster repo.
Meanwhile, the GitOps operator is running. It notices either new images in the container registry or new manifests in the Cluster repo, and it does its job: it converges Kubernetes to the desired state.
A few more things to notice:
- We don’t have feedback in our CI system. The Deploy job says all is good because it has successfully committed the new manifests to the Cluster repo. However, that doesn’t mean the changes have been successfully rolled out to the cluster. Therefore, we have to look into some other system, even to learn simple things, e.g. whether our new manifests are valid.
- The system is asynchronous. For example, the GitOps operator might notice the new image in the container registry before new manifests have been committed (and seen). In this case, it might apply old manifests with new images, and they may not match.
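The race described in the last point can be reproduced in a toy model. Both the Cluster repo and the registry are plain dictionaries, and `operator_sync` is a hypothetical stand-in for one operator pass; the version labels are invented.

```python
def operator_sync(cluster_repo: dict, registry: dict) -> dict:
    """One sync pass: combine the current manifests with the current image."""
    return {
        "config_version": cluster_repo["config_version"],
        "image": registry["latest_image"],
    }


cluster_repo = {"config_version": "v1"}
registry = {"latest_image": "app:v1"}

# The Publish job pushes the new image first...
registry["latest_image"] = "app:v2"
# ...and the operator happens to sync before the Deploy job commits.
intermediate = operator_sync(cluster_repo, registry)

# Only now do the matching manifests land in the Cluster repo.
cluster_repo["config_version"] = "v2"
final = operator_sync(cluster_repo, registry)

print(intermediate)  # {'config_version': 'v1', 'image': 'app:v2'}
print(final)         # {'config_version': 'v2', 'image': 'app:v2'}
```

The system converges in the end, but for some time the cluster runs a combination (old manifests, new image) that never existed as a single commit anywhere.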
Finally, our deployment to production happens. The right way to do it is using the Cluster repo: merge the staging branch into the production one. (You might also need to promote images from staging to production in your registry, but that can be avoided by using the appropriate tagging strategies.)
That’s the whole picture, where you can see that “real-life GitOps” is not as neat as you might be used to imagining it.
Even if the GitOps part did everything that is expected of it (sadly, it doesn’t, as explained above), the overall flow would still inherit all the problems of CIOps and add an extra layer of complexity.
Is GitOps an anti-pattern?
I think that GitOps implemented in the described or a similar way is more of an anti-pattern. The whole DevOps culture is about the smoothness of the flow (the flow of changes from Git to production) and about collaboration. If GitOps promises us transparency and convenience, but actually obstructs the flow with an unnecessary intermediate repo and an unnecessary wall (between Dev and Ops)… well, that’s my point of view.
What’s evident is that you can’t compare GitOps (as a small part of the whole CI workflow) with CIOps. The right comparison would be between the whole process built around GitOps and CIOps: GitOps plus “Pipelines” versus CIOps. And what do we get then?
- CIOps might be non-idempotent and non-deterministic, and that can do harm.
- GitOps brings idempotency and determinism. But…
- GitOps’ determinism is mediocre because half of the truth is not in the Git repo but in the container registry.
- Other shortcomings of GitOps are the lack of feedback in the right place (in the CI system, where developers do all their work) and significantly increased complexity (due to the new elements and the introduced asynchrony).
I bet that the “standard” implementation of GitOps will be superseded by something that will give you both idempotency and determinism, but in a more practical & convenient way. In a way that will be compatible with existing CI approaches. In a way that doesn’t build the wall. Should it be called GitOps 2.0?..
P.S. about werf
For the last few years, we have been working on an open source CLI tool named werf. And this is how we have solved the main challenge of this story (as I see it). With werf, we’ve made it possible for the main Git repo (of your app) to really serve as the single source of truth.
werf is implemented in such a way that it guarantees idempotency and determinism of the build, tag, and deploy stages. And everything else is built upon that.
This article has been written by our CTO Dmitry Stolyarov.