Demystifying GitOps - Intro
This is the first part of a series of posts, and in this one I want to talk about the general concepts of GitOps. In the upcoming parts I plan to dive into the details. You can find links to the completed parts of the series at the bottom.
GitOps
GitOps is the cloud native way to achieve continuous deployment. Its name comes from Git, the dominant version control system. Git has become the de facto standard, and its success story deserves a series of posts of its own. In some ways Git is to GitOps what etcd is to Kubernetes, but it goes further, since etcd does not retain a long-term version history of the resources stored in it. Needless to say, any source code management service such as GitLab, GitHub, Bitbucket or Azure DevOps can be used. In GitOps, the version control system (VCS) is the single source of truth.
As developers, we define our infrastructure, database schemas, application configuration and policies declaratively, then push that code to a Git repository. An operator takes this desired state and applies it to our control plane, which in most cases is a Kubernetes control plane. GitOps is a term that emerged and evolved with the Cloud Native ecosystem. Because of this, the mainstream GitOps tools are built to work with Kubernetes. Kubernetes is evolving to be the one control plane to rule all other control planes. I will come back to this phrase a lot in the upcoming posts.
OpenGitOps
I think it is appropriate to start a GitOps discussion with the OpenGitOps community. OpenGitOps aims to standardize GitOps principles and to define and help people adopt GitOps best practices. OpenGitOps is also a CNCF sandbox project. It was founded by seven members: Amazon, Azure, GitHub, Red Hat, Codefresh, WeaveWorks and Crayon. It is fair to say that these members are among the companies contributing the most to the GitOps ecosystem. The OpenGitOps community defines four main GitOps principles:
- A system managed by GitOps must have its desired state expressed declaratively: Declarative state has become one of the hallmarks of Infrastructure as Code (IaC) tools. Declarative means describing what to achieve instead of ordering what to do. To put it in a sentence, “I want two servers to be running” is declarative, but “Run two servers” is imperative. If you repeat both sentences, the first one still results in two servers running, while the second one results in four. Declarative statements are naturally idempotent, which is very important in the IaC world.
- Desired state is stored in a way that enforces immutability and versioning, and retains a complete version history: These three requirements point to the use of a version control system, since configuration kept in a VCS is versioned by default. It can only be changed with a commit, a push and/or a pull request. This guarantees immutability to the desired degree, and all changes can be traced and rolled back if needed.
- Software agents automatically pull the desired state declarations from the source: In the traditional CI/CD approach, a container image is built and pushed to a container registry. The resource definition files (raw manifests, Helm charts, Kustomize files) are then updated with the new image tag and pushed to a cluster using CLIs such as helm and kubectl. This requires a service account token to be stored outside of the cluster and used to apply changes, so that credential has to be kept securely in the CI/CD pipeline. GitOps tools offer a pull-based approach instead: the GitOps operator runs inside the cluster and either periodically pulls configuration changes from a Git repository or subscribes to repository events and waits to be triggered by a change. Once a change is detected, the operator applies it to the cluster by calling the Kubernetes API. This is more secure, since no cluster credential has to leave the cluster, and it is a more natural fit.
- Software agents continuously observe actual system state and attempt to apply the desired state: GitOps operators pull changes from Git repositories and periodically compare the desired state to the cluster state. This is called drift detection. Any difference is applied to the cluster, and manual changes made by anyone to resources managed by a GitOps operator are reverted. This is called reconciling. Git pull requests become the only way to change a running cluster’s state. This brings the immutability we already use at the container level to the infrastructure level and prevents many problems.
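The two-servers example from the first principle can be sketched in a few lines of Python. This is only an illustration with a hypothetical in-memory list standing in for real infrastructure; the function names are made up for this post.

```python
# A list of names stands in for the servers that are actually running.
# Both functions are hypothetical helpers, not part of any real tool.

def run_servers(servers, count):
    """Imperative: "Run two servers". Always starts `count` new servers."""
    start = len(servers)
    return servers + [f"server-{start + i}" for i in range(count)]


def ensure_servers(servers, desired):
    """Declarative: "I want two servers to be running".

    Starts or stops servers only as needed to end up with `desired` servers.
    """
    if len(servers) < desired:
        return run_servers(servers, desired - len(servers))
    return servers[:desired]


# Repeat each statement twice, starting from nothing.
imperative = run_servers(run_servers([], 2), 2)
declarative = ensure_servers(ensure_servers([], 2), 2)

print(len(imperative))   # 4
print(len(declarative))  # 2
```

Repeating the declarative call changes nothing once the target is reached, which is exactly the idempotency a GitOps operator relies on when it applies the same desired state over and over.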
Big Bang Of GitOps
I mentioned OpenGitOps as the starting point, but this may be a little unfair to the people who coined the name, brought together the ideas that were floating around and built the very first GitOps tool. To recall how the term GitOps was first used, I dived into the WeaveWorks blog and went back to the ancient times of GitOps (2017). The first article to mention GitOps already emphasized many of the principles we use today, such as everything as code, declaratively defined desired state, automatic drift detection and so on. Three more articles followed this very first one. As these articles mention, WeaveWorks was also already using automatic reconciling.
Alexis Richardson (co-founder and CEO of WeaveWorks and the author of these articles) talked about the tools they used on Weave Cloud. He also mentioned the Ansiblediff, Terradiff and Kubediff jobs used to detect drift on a Kubernetes cluster and automatically reconcile it with the desired state committed to Git, for all kinds of resource definitions. Weave Flux (now simply Flux), one of the best GitOps tools of today, was mentioned under the name it had back then (the name changed a few times later on). These four articles included many of the principles we know today as GitOps standards.
Nowhere in these articles did Alexis Richardson say that they had invented GitOps from the ground up. He gave credit to those who had contributed to the continuous delivery ecosystem for years. (You can find all four articles in the references section below. The WeaveWorks team has also written lots of other great articles on their blog.)
We can think of what WeaveWorks did as drift detection between the best practices of continuous delivery and what was actually being applied in our environments. They reconciled our continuous delivery ecosystem to those best practices. Many of the things mentioned in those articles were already the consensus at that time, but bringing all of them together under the name we use today, with Git at the center, was the big bang of GitOps.
GitOps Tools
Since those days the principles have stayed more or less the same, but lots of different tools have emerged. Standardization followed, with massive contributions from the open source community, including tech giants.
The two biggest tools to emerge in the GitOps area are Argo CD and Flux. Both are CNCF incubating projects at the time of this writing, and both provide lots of great features along with helper tools. GitOps addresses only the continuous deployment part of CI/CD, so we need different products for the other parts. These include progressive delivery tools (blue-green deployment, canary deployment), continuous integration tools, image controllers (which wait for new images to be pushed and then update resource definitions) and many others.
In addition to the CD tool, the Argo project family includes Workflows (continuous integration), Events (event driven workflow automation) and Rollouts (progressive delivery), and it has lots of helper tools built around it. The Flux project family, on the other hand, includes Flagger as its progressive delivery tool in addition to Flux itself, and also has lots of helper tools under the GitOps Toolkit umbrella. Both are great projects when we compare the CD tools alone. If we expand the comparison to the other tools, the Argo family seems more complete because of Argo Workflows (although any CI tool can be used with Argo CD or Flux). Argo CD also has a dashboard, which doesn’t mean much to most people but definitely helps in some cases. If you already have other observability and visualization tools locked and loaded, you do not need a dashboard that much. Either way, we owe huge credit to the Flux inventors for their great contribution to the birth and improvement of GitOps. We must equally appreciate the Argo community for their great work in catching up with the Flux community.
There are also other GitOps tools which are not as mature as these two but are promising indeed. Rancher Fleet is one of them. Its motto is "GitOps at scale", and it is built around the idea of managing a fleet of clusters, which is what Rancher has been trying to do for years. We have to admit that this was not really possible without GitOps. You can manage a fleet of clusters with Argo CD and Flux too. I will personally keep an eye on Rancher Fleet.
There is an explosion in the number of GitOps projects. In addition to the familiar ones there are others like PipeCD, Werf, Atlantis and many more. I personally haven’t tried most of them.
Argo CD and Flux have become building blocks of many enterprise PaaS services. To name a few, Azure Arc and Weave Cloud (not surprisingly) use Flux, while OpenShift GitOps and Codefresh use Argo CD. There are surely many more that I missed.
Pros And Cons
For any new technology, one of the questions we have to ask is what problems it solves. What did prior tools miss? What new flavors do GitOps tools bring to the table?
Declarative desired state was in use before GitOps tools. As mentioned above, it was already one of the hallmarks of modern IaC tools before GitOps emerged. Declarative desired state is also how Kubernetes itself works. It is fair to say that GitOps tools have adopted this practice very well.
Storing desired state with versioning was also a common paradigm before GitOps tools, but GitOps takes this approach one step further by enforcing the use of a VCS. With other tools, desired state can be applied straight from a developer machine. With GitOps, changes must be pushed to a remote Git repository in order to be deployed.
Another new flavor GitOps brings to the IaC ecosystem is pulling the desired state from a Git repository instead of pushing it with a CD tool. This brings a new way of hardening security to the table. There is an article in the references section that discusses this in detail.
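To make the pull model a bit more concrete, here is a minimal Python sketch. Everything in it is a stand-in invented for this post: `FakeGitRepo` is an in-memory substitute for a remote repository and `PullAgent` is not the API of Argo CD, Flux or any other real tool. The point it tries to show is that the agent lives next to the cluster state and only needs to read from the repository, so nothing outside has to hold cluster credentials.

```python
import copy


class FakeGitRepo:
    """Hypothetical in-memory stand-in for a remote Git repository."""

    def __init__(self):
        self.commits = []  # list of (revision, manifests) tuples

    def push(self, manifests):
        revision = f"rev-{len(self.commits) + 1}"
        self.commits.append((revision, copy.deepcopy(manifests)))
        return revision

    def head(self):
        """Return the latest (revision, manifests), or None if empty."""
        return self.commits[-1] if self.commits else None


class PullAgent:
    """Hypothetical agent running inside the cluster.

    It only reads from the repository; the cluster dict it writes to
    is local to it.
    """

    def __init__(self, repo, cluster):
        self.repo = repo
        self.cluster = cluster  # dict standing in for cluster state
        self.applied_revision = None

    def poll_once(self):
        """Apply the latest revision if it is new. Return True if applied."""
        head = self.repo.head()
        if head is None or head[0] == self.applied_revision:
            return False
        revision, manifests = head
        self.cluster.clear()
        self.cluster.update(copy.deepcopy(manifests))
        self.applied_revision = revision
        return True


repo = FakeGitRepo()
cluster = {}
agent = PullAgent(repo, cluster)

repo.push({"web": {"image": "web:1.0"}})
print(agent.poll_once())  # True, the new revision was applied
print(agent.poll_once())  # False, nothing new to pull
```

A real operator would run `poll_once` on a timer or in response to a repository event, as described in the third principle.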
We can easily say that the biggest improvement GitOps tools bring us is continuous drift detection and reconciling. This is the real game changer. With this feature we can build immutable infrastructure and largely eliminate the human error that comes with manual changes. Nowadays there are tools that bring this feature even to the Terraform universe. Flux Terraform Provider (experimental) and the Crossplane Terrajet providers (in some way, since they use Terraform providers to generate Kubernetes CRDs) are two of these. We will also talk about these tools in detail in the upcoming parts of this series.
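Drift detection and reconciling can be sketched in the same spirit. The snippet below is a simplified illustration, not how any particular GitOps operator is implemented: plain dictionaries stand in for the desired state in Git and the live cluster state, and the function names are my own. It only looks at resources present in the desired state and leaves pruning of extra resources out.

```python
import copy


def detect_drift(desired, actual):
    """Return {name: (desired_spec, actual_spec)} for managed resources
    whose live spec differs from the desired one."""
    drift = {}
    for name, spec in desired.items():
        if actual.get(name) != spec:
            drift[name] = (spec, actual.get(name))
    return drift


def reconcile(desired, actual):
    """Overwrite drifted resources with their desired spec.

    Returns the sorted names of the resources that were corrected.
    """
    drift = detect_drift(desired, actual)
    for name, (spec, _live) in drift.items():
        actual[name] = copy.deepcopy(spec)
    return sorted(drift)


desired = {"web": {"image": "web:1.0", "replicas": 2}}
actual = copy.deepcopy(desired)

# Someone edits the live resource by hand.
actual["web"]["replicas"] = 5

print(reconcile(desired, actual))  # ['web'], the manual change is reverted
print(reconcile(desired, actual))  # [], nothing left to fix
```

Running `reconcile` in a loop is what turns a one-off deployment into continuously enforced state.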
As the famous saying goes, there is no silver bullet. Every technology comes with its pros and cons. We covered the pros of GitOps in the previous paragraphs. GitOps tools have solved many of their problems over the years with the help of the enormous community gathered around the Cloud Native ecosystem. If I were writing this post two years ago, I would have talked about the problems of storing secrets in a GitOps way, the lack of infrastructure management tools in the GitOps ecosystem and so on. Most of these problems were real for early GitOps tools, but the tools have evolved over time. Now is a great time to adopt GitOps. There is a great article by Codefresh in the references that discusses these problems. Some of the problems mentioned there still exist, or the promoted solutions do not completely solve them. To name a few problems that still exist today to some degree:
- There are not that many PaaS offerings from the big cloud vendors (we can name Anthos Config Management and Azure Arc here).
- We need a Kubernetes cluster to apply GitOps (not a problem for most of the community, including me).
- Git is promoted as the mechanism for rolling back changes, but Git has several different commands for undoing changes, so rollbacks have to be handled with care.
- Getting GitOps and the horizontal pod autoscaler to work together is another point that needs careful thinking.
- Propagating changes between different environments is not supported by GitOps itself; it has to be handled with another CI/CD tool.
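The autoscaler point deserves a small illustration. The sketch below is my own simplification with hypothetical names, not the configuration model of any GitOps tool (real tools have their own ways to express this, so check their documentation). It shows why a naive reconcile fights with an autoscaler that owns the replica count, and how leaving that field out of the comparison avoids the fight.

```python
def reconcile(desired, actual, ignored_fields=()):
    """Reset drifted fields to their desired value.

    Fields listed in `ignored_fields` are left alone, on the assumption
    that another controller (for example an autoscaler) owns them.
    Returns a list of (resource, field) pairs that were changed.
    """
    changed = []
    for name, spec in desired.items():
        live = actual.setdefault(name, {})
        for field, value in spec.items():
            if field in ignored_fields:
                continue
            if live.get(field) != value:
                live[field] = value
                changed.append((name, field))
    return changed


desired = {"web": {"image": "web:1.0", "replicas": 2}}

# The autoscaler has scaled the live deployment up to 6 replicas.
live_naive = {"web": {"image": "web:1.0", "replicas": 6}}
live_aware = {"web": {"image": "web:1.0", "replicas": 6}}

# Naive reconcile undoes the autoscaler's decision.
print(reconcile(desired, live_naive))  # [('web', 'replicas')]
print(live_naive["web"]["replicas"])   # 2

# Ignoring the field leaves the scaled-up replica count in place.
print(reconcile(desired, live_aware, ignored_fields=("replicas",)))  # []
print(live_aware["web"]["replicas"])   # 6
```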
Conclusion
When the aim is to manage everything as code, there are lots of challenges that have to be overcome. Some questions have to be asked and answered depending on the architecture we are trying to build. How can I deploy and manage the lifecycle of many Kubernetes clusters? What do I do in a multi-cloud or hybrid cloud environment? How can I keep my secrets in a GitOps friendly way? What about databases? How can we handle the integration between GitOps and CI? Do we prefer raw Kubernetes manifests, Helm, Kustomize or Jsonnet, or does it not really matter? And there are a bunch of others. I will talk about all of these in separate posts, with examples from different GitOps tools, so the upcoming posts will be deep dives into these topics. Of course a GitOps tool (Argo CD or Flux) alone cannot solve all of these challenges, and many other tools have to be integrated with it.
References
WeaveWorks First Article - Operations By Pull Request
WeaveWorks Second Article - The GitOps Pipeline
WeaveWorks Third Article - Observability
WeaveWorks Fourth Article - GitOps Compliance and Secure CI/CD
Why is a pull vs a push pipeline important?