Recently, while implementing a GitOps-based strategy, I wrote a post comparing the GitOps push- and pull-based models and evaluating their pros and cons. Since then, I have worked on an implementation of the process for our project team.
A short introduction to GitOps: everything related to operations in your project is kept in Git, and the repository is used as your single source of truth.
There are many other great articles that explain in detail what GitOps is and what its main benefits are. A quick search on Medium or Google will turn up plenty of introductory material.
This article centers on the next link in the chain: actually implementing this process for a product. The following sections give my opinions on how to do that.
All components related to infrastructure are checked into a single Git repository per project or team. This means not only Kubernetes templates but also your infrastructure as code such as Terraform, your Ansible playbooks, your Grafana dashboards, your alerts, docs, you name it.
- You get a single source of truth for what is actually running on your infrastructure, and one central place to find runbooks, alert manifests, and documentation.
- You can apply linting and policy checks in this repository's CI system, allowing you to scan everything that goes into your infrastructure and enforce your own rules. Ideally this is fully automated, but it can also be done by hand at the beginning.
- Disaster recovery becomes much easier, because GitOps is centered around operators that apply the contents of the repository for you, and everything lives in one central, ideally well-documented location. Adopting this reduced our disaster recovery time from one day to 50 minutes (much of which is waiting for infrastructure to be provisioned).
- Central control and auditability are provided by the Git platform you are using. With GitLab or GitHub, for example, you can enforce PGP-signed commits, enforce pull request strategies and policies, extract the Git log as an audit source, and so forth.
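As a sketch, such a CI check stage could run linters over the repository on every push before anything is applied. The tool choices below (terraform, kubeval, conftest) and the folder layout are suggestions, not requirements:

```shell
# Run on every push to the infra repo; fail the pipeline on any violation.
terraform fmt -check -recursive terraform/     # formatting of IaC
terraform -chdir=terraform/workspace validate  # syntax/semantic check
kubeval k8s/prod/*.yaml k8s/int/*.yaml         # manifests vs. the k8s schema
conftest test k8s/ --policy policy/            # custom OPA policy rules
```

Once such a stage is green on every merge, you know that nothing reaches the operators without passing your own rules first.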
A sample structure could be as follows:
/ terraform -> scanned by your Terraform operator (e.g. Atlantis)
— / modules -> can be extracted globally if required by multiple teams
— / workspace -> the actual folder Terraform is applied from
/ k8s -> scanned by your Kubernetes deploy operator (e.g. Flux)
— / prod -> your production k8s templates
— / int -> your integration k8s templates
/ templates -> this could contain your Kustomize overlays (rendered on each push to generate content in the k8s folder)
— / alerts
— / dashboards
The structure can be as flexible as you need; the only rule is that every operation on infrastructure starts from this repository.
Automation is key because it speeds you up immensely. If you are using Prometheus alerts or Grafana dashboards, for example, try to automate the creation of dashboards from generic templates that can be applied to every HTTP service, displaying RED or USE metrics, and also lint and validate these resources automatically so they all conform to a schema.
A simple example is defining the RED metric graphs (see "The RED Method: key metrics for microservices architecture") as code templates and deploying them to Grafana for each newly introduced service with a REST API. The same can be done, as stated above, with Prometheus alerts and other configs.
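As an illustration, a small script could render one RED dashboard per service, ready to push to the Grafana HTTP API or commit under /templates/dashboards. The service names and the PromQL below are assumptions (the common http_requests_total / http_request_duration_seconds metric conventions); adjust them to your own instrumentation:

```python
import json

# RED: Rate, Errors, Duration -- one graph panel per metric, per service.
# Metric and label names are the common Prometheus conventions (assumed).
RED_QUERIES = {
    "Request Rate":
        'sum(rate(http_requests_total{{job="{service}"}}[5m]))',
    "Error Rate":
        'sum(rate(http_requests_total{{job="{service}",code=~"5.."}}[5m]))',
    "Request Duration (p99)":
        'histogram_quantile(0.99, sum(rate('
        'http_request_duration_seconds_bucket{{job="{service}"}}[5m])) by (le))',
}

def red_dashboard(service: str) -> dict:
    """Build a Grafana dashboard definition for one HTTP service."""
    return {
        "title": f"{service} - RED metrics",
        "panels": [
            {"title": f"{service}: {name}",
             "type": "graph",
             "targets": [{"expr": query.format(service=service)}]}
            for name, query in RED_QUERIES.items()
        ],
    }

# Render one dashboard JSON per service; the resulting files can be
# committed to the repo and deployed by the pipeline.
for svc in ("orders", "payments"):
    print(json.dumps(red_dashboard(svc), indent=2))
```

Because the dashboards are generated rather than hand-built, every service gets the same schema, which is exactly what makes linting and validation feasible.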
Use Operators where possible.
They keep the burden of applying all of this manually away from you; it is that simple.
- For example, there is an operator called Atlantis that automates your Terraform workflow through pipelines driven by pull requests. It applies what you provide in the repository's terraform folder, based on pull requests, and with terradiff you can continuously check whether the applied state matches the declared one.
- To apply Kubernetes templates or Helm charts, there are operators called Flux and the Flux HelmOperator, by Weaveworks. They constantly reconcile the cluster state from inside the cluster itself, based on your template folder.
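For illustration, the Flux (v1) daemon is pointed at the repository and folder via its startup flags; the repository URL and paths below are placeholders for your own setup:

```yaml
# Excerpt from the Flux daemon's Deployment spec (Flux v1):
args:
  - --git-url=git@git.example.com:team/infrastructure.git
  - --git-branch=master
  - --git-path=k8s/prod        # only manifests under this folder are applied
  - --git-poll-interval=1m     # how often to check the repo for new commits
```

With this in place, merging to the watched branch is the deployment: the operator pulls the change and applies it, so nobody runs kubectl apply by hand.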
Store your secrets encrypted, as close to the repository as possible, or at least in a way that they are automatically picked up after deployment.
Secrets are still part of the deployment, which is why they are required, for example, for full disaster recovery. If you keep them close, you will be faster and complexity stays lower. The important point is to keep them encrypted at all times: a secret must never be pushed to the repository in unencrypted form.
- Use an operator like Bitnami Sealed Secrets, which lets you encrypt secrets before pushing them to Git and decrypts them when they arrive at the cluster (Mozilla SOPS may also be a good option here)
- Use HashiCorp Vault as a central secret store and let your deployments pull their secrets from there
- Use your cloud provider's offerings to fulfill this task
- Use something like git-crypt or git-secret to encrypt them manually
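As a sketch of the Sealed Secrets workflow (secret names and paths are illustrative): the plain secret is generated locally, sealed with the cluster's public key, and only the sealed version is committed:

```shell
# Generate the secret manifest locally, without applying it to the cluster.
kubectl create secret generic db-credentials \
  --from-literal=password='s3cr3t' \
  --dry-run=client -o yaml > secret.yaml

# Encrypt it with the cluster's public sealing key; only the controller
# running inside the cluster can decrypt the result.
kubeseal --format yaml < secret.yaml > k8s/prod/db-credentials-sealed.yaml

rm secret.yaml   # the unencrypted secret never enters Git
git add k8s/prod/db-credentials-sealed.yaml
```

The sealed manifest is safe to store in the repository, so secrets travel through the same GitOps pipeline as everything else, which is what keeps disaster recovery complete.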
GitOps is, fancy name aside, an approach that can make you more efficient and increase your deployment speed, but it requires a shift in mindset about how you handle your infrastructure deployments. In my opinion, the strategy above works best at project scope. If you want something added to the list, feel free to comment.