An Observability Balancing Act With GitOps

Sarjeel Yusuf
Dec 16, 2020

As companies intensify their push to adopt DevOps practices and cultural values, several practical methodologies are cropping up. One of them is GitOps, which stems from the DevOps drive to automate everything and from the philosophy of “you build it, you run it”.

The fear of losing out to the competition creates the need to be agile, which in turn leads teams and organizations to re-examine how they get code from ideation to production. This is where DevOps comes in. The motivation is simple: the faster the release, the better the chances of your product surviving in the fast-moving world of technology. So why is this such a big problem?

Well, the success of an organization or product leads to the scaling up of teams and development capacity, which eventually becomes inimical to further success. Increased development capacity means more deployment coordination, larger and more complex systems to pull out of incidents, and more noise when building new features. Long story short, the smaller the team or organization, the quicker it can operate and push out code. This is quite intuitive, and it is also borne out by various industry research.

However, this is not the only problem. A less intuitive issue impeding teams in their efforts to increase velocity is adapting to new technologies. In this new age of software, building for the cloud is a popular choice. More specifically, Kubernetes as a container orchestration platform is quickly gaining popularity.

Kubernetes Usage 2020

Therefore, teams attempting to adapt their development practices to better their performance need to also ensure that these practices comply with their technology stack.

An Overview of GitOps

The aim of GitOps is to promote Git as the single source of truth, so that all other pipelines in the release cycle are triggered automatically by Git operations. The idea is to replace push-based pipelines with pull-based pipelines, enabling developers to perform deployments directly through their pull requests. This approach is supported by a simple yet sophisticated infrastructure that kicks off a series of events in the deployment process once developers merge or open pull requests.
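To make the pull-based idea more concrete, here is a minimal sketch in Python. It is not how Flux or any real operator is implemented. `GitRepo`, `Cluster`, and `poll_once` are hypothetical in-memory stand-ins, and the revision and manifest values are made up for illustration. The point is only that the operator pulls the head of the repo and deploys when it sees a revision it has not applied yet.

```python
from dataclasses import dataclass, field


@dataclass
class GitRepo:
    """Hypothetical in-memory stand-in for the config repository."""
    commits: list = field(default_factory=list)

    def merge(self, revision: str, manifest: dict) -> None:
        # A merged pull request becomes the new head of the repo.
        self.commits.append((revision, manifest))

    def head(self):
        return self.commits[-1] if self.commits else None


@dataclass
class Cluster:
    """Hypothetical stand-in for the running cluster."""
    applied_revision: str = ""
    state: dict = field(default_factory=dict)


def poll_once(repo: GitRepo, cluster: Cluster) -> bool:
    """Pull the repo head and apply it if the cluster is behind.

    Returns True when a deployment happened on this poll.
    """
    head = repo.head()
    if head is None:
        return False
    revision, manifest = head
    if revision == cluster.applied_revision:
        return False  # nothing new was merged
    cluster.state = dict(manifest)
    cluster.applied_revision = revision
    return True


repo = GitRepo()
cluster = Cluster()
repo.merge("a1b2c3", {"web": {"image": "web:1.4.0", "replicas": 2}})
deployed_first = poll_once(repo, cluster)   # merge detected, cluster updated
deployed_second = poll_once(repo, cluster)  # no new merge, nothing to do
```

Notice that nothing pushes to the cluster here. The developer only merges, and the deployment follows from the operator noticing the change.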

For example, GitOps can be achieved with the following stack of tools:

  • Bitbucket as your Git VCS tool
  • Docker to store your images
  • Amazon S3 to store Helm charts
  • AWS Lambda to pull the charts and commit to the cluster repo
  • Weaveworks Flux to detect changes in the cluster repo and make the appropriate changes
Fig 1: GitOps Infrastructure

With the infrastructure mapped out above, the overall flow is as follows:

  1. CI tools such as Bitbucket Pipelines push Docker images to hosting tools such as QuayCloud.
  2. Cloud functions copy the configs and Helm charts from the master storage bucket to the master Git repo.
  3. GitOps operators such as Weaveworks Flux then update the cluster according to the configs and Helm charts pulled by the Lambda function.

Considering what GitOps achieves, the benefits are quite evident. The first obvious advantages are an improved developer experience and progress toward DevOps goals.

Considering the DevOps goal of breaking down silos, GitOps brings more control to developers, in the place where developers work. The pull-based pipeline model allows developers to trigger the entire release and deployment process from within their code development tools, which reduces the need for two things: specialized expertise in separate release and deployment infrastructure and tools, and the effort of switching between these tools throughout the software building and operating process. GitOps really does promote the shift-left attitude, consolidating DevOps practices into a handful of tools that developers are familiar with, and hence increasing productivity.

Similarly, there are many more advantages that GitOps achieves, but this is where we also get reminded of the fact that there is no such thing as a silver bullet in software. Alas, we do see some sacrifices made in our endeavor to increase velocity in building our Kubernetes-based systems.

Not All Is Well

Going through the literature and resources regarding GitOps, there is a lot of praise for how it simplifies the otherwise difficult Kubernetes plumbing. The basic idea of deploying to production just by creating the necessary pull-request brings in required automation and ease for software development teams.

However, an intrinsic issue arises as we hand control over to an automated process, and that is the issue of observability. With the more traditional method of going to production, a team member is involved almost every step of the way. With GitOps we no longer need this proactive involvement. That also means, though, that there is a possibility of reduced reliability and assurance in what we are pushing to production.

Hence one thing we see with GitOps is that extensive testing becomes imperative. We need to ensure that whatever we are creating a pull request for does not break production. Nevertheless, no matter what our test coverage is, there is always a possibility of missing some edge cases or encountering new unintended behaviors with new code deploys.

This is where observability comes into play, adding to the practicality of GitOps by providing the right insights to increase the visibility and assurance of what is being sent to production.

Git With Observability Supports GitOps

As mentioned, Git is the single source of truth for the intended state of the system. Observability, on the other hand, provides the source of truth for the actual state of the system. Therefore, observability provides the insights that practitioners of GitOps need to understand the state of their system.

These insights are in the form of the three pillars of observability which are as follows:

  • Logs — A record of discrete events.
  • Metrics — Statistical numerical data collected and processed within time intervals.
  • Traces — A series of events mapping the path of logic taken.
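As a rough illustration of how the three differ, the sketch below has one made-up request handler emit all three signals. `Telemetry` and `handle_checkout` are hypothetical names, and the collector only keeps things in memory. A real setup would ship these signals to a monitoring backend instead.

```python
from dataclasses import dataclass, field


@dataclass
class Telemetry:
    """Hypothetical in-memory collector, for illustration only."""
    logs: list = field(default_factory=list)     # discrete events
    metrics: dict = field(default_factory=dict)  # numbers aggregated over an interval
    spans: list = field(default_factory=list)    # steps taken by one request

    def log(self, message: str) -> None:
        self.logs.append(message)

    def increment(self, name: str) -> None:
        self.metrics[name] = self.metrics.get(name, 0) + 1

    def span(self, trace_id: str, step: str) -> None:
        self.spans.append((trace_id, step))


def handle_checkout(telemetry: Telemetry, trace_id: str, cart_total: float) -> None:
    """Hypothetical request handler that emits all three signal types."""
    telemetry.span(trace_id, "validate_cart")
    telemetry.span(trace_id, "charge_card")
    telemetry.increment("checkout.requests")
    telemetry.log(f"checkout completed trace={trace_id} total={cart_total:.2f}")


telemetry = Telemetry()
handle_checkout(telemetry, "t-1", 42.5)
handle_checkout(telemetry, "t-2", 10.0)

# The trace answers "which path did this one request take?"
path_of_first_request = [step for tid, step in telemetry.spans if tid == "t-1"]
```

The metric tells us how many checkouts happened, the logs tell us what happened in each one, and the trace tells us which steps a single request went through.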

These three forms of insight allow us to answer the most crucial question: does the actual state match the intended state after the deployment? This question spans all facets of the system, including the intended UI, intended configurations, intended architecture, intended behavior, intended resources, and so on.

For example, if the system is meant to have four Redis instances as per the Kubernetes definitions in the Git repository of your GitOps system, then monitoring tools will periodically check for this. If the required number of Redis instances is not met, then diff alerts can be sent as per the alerting configuration.
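A minimal sketch of such a check might look like the following. The `diff_alerts` function and the shape of the alert are assumptions made for illustration, not the format of any particular monitoring tool. The desired counts would come from the Git repository and the actual counts from monitoring.

```python
def diff_alerts(desired: dict, actual: dict) -> list:
    """Compare desired and actual instance counts, one alert per mismatch.

    Hypothetical helper: both arguments map a workload name to a count.
    """
    alerts = []
    for name, want in sorted(desired.items()):
        have = actual.get(name, 0)  # a missing workload counts as zero instances
        if have != want:
            alerts.append({"workload": name, "desired": want, "actual": have})
    return alerts


desired = {"redis": 4, "web": 2}   # as declared in Git
actual = {"redis": 3, "web": 2}    # as reported by monitoring
alerts = diff_alerts(desired, actual)
# alerts == [{"workload": "redis", "desired": 4, "actual": 3}]
```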

These diff alerts not only inform us of divergence in the system but also play an integral role in the convergence of the system to the intended state within the scope of GitOps.

Considering our example above, when the system becomes aware that the actual number of Redis instances does not match the desired number, diff alerts trigger the Kubernetes convergence operator. The operator then attempts to sync the actual state with the desired state, leveraging Kubernetes’ convenient convergence property.

Finally, once there are no more ‘diff’ alerts, or once there is a ‘converged’ alert, the mechanism can conclude that the actual state has reached the desired state.
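The whole cycle can be sketched as a small loop. This is a toy model under stated assumptions, not the behavior of the actual Kubernetes controllers: `converge` is a hypothetical function, it moves one instance per cycle, and it only looks at workloads that appear in the desired state.

```python
def converge(desired: dict, actual: dict, max_cycles: int = 10) -> list:
    """Sync actual toward desired and return the events emitted on the way.

    Hypothetical model: each cycle emits a diff event per diverged workload
    and moves that workload one instance closer to the desired count.
    """
    events = []
    for _ in range(max_cycles):
        diverged = {n: w for n, w in desired.items() if actual.get(n, 0) != w}
        if not diverged:
            events.append("converged")  # no diffs left, states match
            return events
        for name, want in sorted(diverged.items()):
            have = actual.get(name, 0)
            events.append(f"diff {name}: actual={have} desired={want}")
            actual[name] = have + (1 if have < want else -1)
    events.append("not converged")  # gave up after max_cycles
    return events


actual = {"redis": 2}
events = converge({"redis": 4}, actual)
# events == ["diff redis: actual=2 desired=4",
#            "diff redis: actual=3 desired=4",
#            "converged"]
```

The loop can only start correcting once a diff has been observed, which is exactly the role the diff alerts play.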

In this entire process, the crucial point is that we were aware of the difference in the first place. After all, we can only fix or sync that which we know has diverged from the intended state. As a result, observability is not just a concept that supplements GitOps; it is absolutely necessary for achieving it.

Concluding Remarks on the Case for Observability

As shown, observability provides two crucial capabilities:

  1. Achieving the assurance and reliability that are needed when proactive human involvement is replaced by automation.
  2. Enabling the system to converge to the desired state in the interest of automation.

As a result, when considering our initial basic GitOps system illustrated in Fig 1, we need to supplement it with monitoring tools that enable the required observability.

Fig 2: GitOps Infrastructure with Monitoring and Alerting

At the end of the day, considering the benefits of GitOps, it is clear why the concept is gaining in popularity. However, premature adoption is something we need to avoid at all costs, and this can only be ensured by securing the right practices in operating our Kubernetes systems. That means making observability an integral part.

Originally published at https://blog.thundra.io.

Sarjeel Yusuf

An engineer turned product manager passionate about cloud computing and everything DevOps. Product Manager @Atlassian building DevOps capabilities.