Efficient Infrastructure with Containerized Pipelines, Kubernetes and GitOps
This post was first published on The New Stack.
Infrastructure provisioning was largely a manual task until a few years ago. Ordering new hardware, mounting it into racks, and installing and configuring the operating system and software all required humans. Now, in a cloud native world, we are able to get compute within seconds, with just a few clicks. Even bare metal machines do not need much additional time to start up. We can also build advanced networks, define firewall rules or even run our containers without thinking about the underlying software and hardware stack. All of this can be done in a public, private or hybrid cloud; it just doesn’t matter.
We also started automating our whole software development lifecycle with practices like CI/CD (continuous integration/continuous delivery) to keep up with our fast-changing world. However, we quickly realized that our infrastructure, as we knew it, couldn’t keep up with this pace or meet our needs within the required timeframes. Because of this, we started to adopt development best practices to speed up operations as well.
That’s where infrastructure-as-code (IaC) came in. We started implementing IaC to be able to define our whole infrastructure as programmable code. This allowed us to work with our infrastructure definitions in the same way that we had done with code for some time already. We were able to:
- store and version our code in a version control system.
- build our infrastructure out of code.
- document within the code.
- use pull (merge) requests to manage code changes.
Rollbacks and immutable infrastructures were no longer rocket science.
With IaC, we began to use different tools and CLIs to support our infrastructure code changes, because each kind of infrastructure needed its own. We then discovered a new constraint: our deployment toolchain couldn’t scale.
Automated Deploy Pipelines Arrived
We ended up requiring many local dependencies in order to deploy our infrastructure. “We need to wait for our colleague because they have all the tools required to deploy this change” was something we, unfortunately, began to hear more often. Our main goal of delivering infrastructure changes faster was undermined. That’s why we started using automation and pipelines with CI/CD practices in mind.
Automation helped us get back on track: we could deliver changes and add new infrastructure within the timeframes we needed. Automated deployment pipelines were also instrumental in removing local dependencies, because all dependencies of our toolchain moved into a centralized, managed deployment environment instead of living on our individually managed local machines. With automated deployment pipelines in place, that constraint was gone: changes to our infrastructure could be deployed anytime by anyone.
As a consequence, however, we also acquired pipelines that needed to be defined and managed somewhere. We therefore decided to use Git as our single source of truth for all of our infrastructure. We now use Git to create, manage or remove infrastructure in an observable and verifiable manner. Git also lets us store any dependencies, such as deployment pipelines and other related code and definitions. We were finally able to create fully automated and integrated deployment pipelines with a single source of truth.
But the real question is: Are we satisfied? Maybe not all of us. Because we’ve combined all possible dependencies onto just one or two worker nodes, we’re now dealing with large and complex environments running all of our workloads. Many different toolchains, for example, may need to be available in different versions, and those versions may have different dependencies. You could almost call it a monolith. Wasn’t there a method that could help us get rid of these big monoliths? Yes: containerization.
Then, Containerized Pipelines
A pipeline, as we know it, is divided into chain links called stages. Each stage can contain one or more jobs. A job describes the commands that need to be executed to achieve the desired outcome. A command can be a single binary or a complex toolchain call. Regardless of complexity, the tools and their dependencies need to be available on the pipeline worker nodes. Depending on your project, you may also need to choose the right version and path when multiple versions are installed.
Containerized pipelines offer the following benefits:
- isolation between pipeline jobs.
- no dependency issues between pipeline jobs.
- immutability: every pipeline job runtime is exactly the same.
- easy scalability.
In a containerized pipeline, every single job runs in a container, based on an image that includes all of the dependencies and the particular toolchain versions needed by a single project. One of the many advantages is that there are no conflicts between different jobs in a project, or even between different project pipelines running on the same node. You can also run a given pipeline job on any of your pipeline worker nodes, because all the needed dependencies are baked into the container image.
The following example walks through how a simplified containerized pipeline could look (for the sake of clarity, it skips various pipeline stages).
Let’s assume we would like to build a pipeline for setting up a managed Kubernetes cluster on a public cloud and to manage some basic Kubernetes resources, such as a Pod Security Policy and role-based access control (RBAC), which should be in place in any production environment.
We would use Terraform to create the managed Kubernetes cluster (Terraform is used as an example, but other tools could be used as well). That means our first pipeline job image requires a single dependency: the Terraform CLI. In this example, we use GitLab CI, which automatically checks out our latest code from its Git repository (depending on your CI/CD toolchain, you may need to provide this on your own). An example Dockerfile of our pipeline job image could be:
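# Base image is an assumption (the FROM line is missing here); apk implies Alpine.
# Pin a tag whose package repository provides the Terraform version below.
FROM alpine:3.10
ENV TF_VERSION=0.12.0-r0
RUN apk add --update --no-cache ca-certificates terraform=$TF_VERSION
ENTRYPOINT ["terraform"]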
For our second pipeline job, we only need a second container image that provides the kubectl CLI, which allows us to communicate with our Kubernetes cluster and to manage resources. GitLab CI again makes this step much simpler by automatically providing all needed credentials at runtime through its Kubernetes integration (depending on your CI/CD toolchain, you might need to mount your credentials at runtime to access your cluster).
As you can see, every pipeline job uses its own dedicated container image, containing only what that single job needs.
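For illustration, the second job image could follow the same pattern as the Terraform image. This is only a sketch under assumptions: the Alpine base tag and the kubectl version are placeholders chosen for this example rather than values from the setup described here, and the binary is fetched from the official Kubernetes release download location.

```dockerfile
# Sketch of a kubectl job image. The base tag is an assumption.
FROM alpine:3.10

# Hypothetical version pin; adjust it to match your cluster.
ENV KUBECTL_VERSION=v1.14.3

# Download the official kubectl release binary and make it executable.
RUN apk add --update --no-cache ca-certificates curl \
 && curl -fsSLo /usr/local/bin/kubectl \
      "https://dl.k8s.io/release/${KUBECTL_VERSION}/bin/linux/amd64/kubectl" \
 && chmod +x /usr/local/bin/kubectl

ENTRYPOINT ["kubectl"]
```

Keeping the kubectl version within one minor version of the cluster's API server stays inside the version skew that Kubernetes supports, which is one more reason to pin the version per image.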
Add Kubernetes and GitOps
Speaking of Kubernetes, containerized pipelines offer an additional positive side effect when they run on it. Containerized pipelines already make it easier to run our pipeline jobs in different environments by packaging all dependencies into the container image. Running them on Kubernetes strengthens this upside even more: Kubernetes abstracts away everything underneath, which makes it even easier to run our pipelines wherever we want, while also helping us consolidate workloads into fewer environments.
Finally, a quick note about GitOps: automation, as well as Git as a single source of truth, is key. What really distinguishes GitOps is the use of a “pull” CD model, meaning that changes are no longer pushed through a pipeline.
Instead, the environment itself makes the changes and ensures that the desired state is achieved. This is done through agents that can analyze their environment and know how to get to the desired state. Because of these agents, GitOps is a Kubernetes-only topic so far.
To recap: The way we provide infrastructure has changed a lot over the last year. These changes have increased our efficiency and transparency, and they will continue to improve and evolve our processes in the future. You, too, can try out these new technologies and tell me what you think. As I’ve described above, it is possible, and it is becoming simpler as Git and Kubernetes mature.