Kargo Series: Part 3 — Create Kubernetes Clusters at scale with Cluster API

Tech@ProSiebenSat.1
ProSiebenSat.1 Tech Blog

By Constantin Geisler

This is the third post in our series on how we built KARGO, a container-focused developer platform, at ProSiebenSat.1 Tech & Services GmbH (the internal service provider at ProSiebenSat.1). If you want to know more about that decision or the overall architecture, take a look at our previous blog posts.

There are many ways to create an EKS cluster in AWS: clicking around in the AWS Management Console, or using command line tools such as eksctl or the AWS CLI. Terraform, the de facto standard for infrastructure as code, was high on our list and would have been the way to go if something else hadn’t worked better for us.

We, the Platform Engineering & Operations team, have built our own custom vanilla Kubernetes distribution, which creates a lot of work for the team just to maintain the platform so that all components are up-to-date. That was fine when we started this in 2016, but in 2024 that’s a solved problem. So the way to go is moving up the stack. We want to move away from maintaining a few large Kubernetes clusters to being able to deploy as many (smaller) clusters as we want. To achieve this, we need to automate everything. Creating or deleting a cluster needs to be easy and fast. It’s a challenging task to operate and maintain an unknown, but definitely growing number of Kubernetes clusters. We are lucky that there are tools in the CNCF ecosystem that make this possible.

An Introduction to Cluster API

Cluster API (CAPI) is an open source project that provides declarative APIs and tooling to provision and manage Kubernetes clusters. It was started by the Kubernetes SIG Cluster Lifecycle, which maintains it as a subproject. Cluster API makes it easier to manage Kubernetes clusters across different infrastructure providers like AWS, Azure, Google Cloud, vSphere etc. It’s the next level of Kubernetes cluster creation, configuration and management, and it builds on the lessons learned from earlier cluster managers such as kops and kubicorn.

Why use Cluster API for creating Kubernetes Clusters?

We chose Cluster API to create EKS clusters because it uses Kubernetes-style APIs, an approach we are very comfortable with. The team has been working with Kubernetes since 2016, so working with YAML manifests is second nature to us. Cluster API’s robust yet flexible architecture also contributed to the decision.

How the Cluster API works

To get started with CAPI, the first thing you need is a Kubernetes cluster. Wait, what? I thought I would be creating my Kubernetes cluster with CAPI? Yes, but these are the so-called workload clusters. First, you need a management cluster.

Okay, so you need Kubernetes to deploy your Kubernetes. Well, okay. That sounds pretty much like the good old chicken-and-egg problem.

The chicken egg problem (Picture by Constantin Geisler, via Bing — https://www.freepik.com/)

Let me explain our approach:

We found a nice Linux distribution for hosting Kubernetes clusters called Talos Linux. We use Talos to create a management cluster on EC2 instances in AWS. The management cluster manages the lifecycle of the workload clusters and hosts the Cluster API controllers. It is the “brain”. A management cluster is also where one or more providers run and where resources such as machines are stored.

Figure 1 — Talos overview (Picture by the author, drawn with draw.io)

Architecture

The following picture shows the architecture of Cluster API:

Cluster API concept (from https://cluster-api.sigs.k8s.io/user/concepts)

At its core, the Cluster API uses custom resources to define the desired state of a Kubernetes cluster. Here’s a breakdown of the key components:

  • Cluster: The central resource that represents the entire cluster.
  • Machines: Each Machine is a declarative specification for a piece of infrastructure (such as a VM) that hosts a Kubernetes node.
  • MachineDeployments: These manage the lifecycle and scaling of a set of machines.
  • MachineSets: These ensure a stable set of replica Machines are running at any given time.

With these resources, you can use familiar Kubernetes commands to manage your clusters, making the experience consistent and scalable.
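To make the idea of "desired state" more tangible, here is a toy model of what a MachineSet controller does conceptually. It is not Cluster API code: the `Machine` and `MachineSet` classes and the `reconcile` method are simplified stand-ins we made up for illustration, and no real infrastructure is involved.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Machine:
    """Hypothetical stand-in for a Machine resource."""
    name: str
    healthy: bool = True


@dataclass
class MachineSet:
    """Hypothetical stand-in for a MachineSet and its controller."""
    name: str
    replicas: int                                   # desired number of Machines
    machines: list = field(default_factory=list)    # observed Machines
    _ids: count = field(default_factory=count, repr=False)

    def reconcile(self) -> None:
        """Converge the observed Machines towards the desired replica count."""
        # Drop Machines that are no longer healthy.
        self.machines = [m for m in self.machines if m.healthy]
        # Create Machines until the desired count is reached.
        while len(self.machines) < self.replicas:
            self.machines.append(Machine(f"{self.name}-{next(self._ids)}"))
        # Remove surplus Machines after a scale-down.
        del self.machines[self.replicas:]


ms = MachineSet(name="workers", replicas=3)
ms.reconcile()
print([m.name for m in ms.machines])  # ['workers-0', 'workers-1', 'workers-2']

ms.machines[1].healthy = False        # simulate a failed node
ms.reconcile()
print([m.name for m in ms.machines])  # ['workers-0', 'workers-2', 'workers-3']

ms.replicas = 2                       # declarative scale-down
ms.reconcile()
print(len(ms.machines))               # 2
```

The point of the sketch is that you only ever change the desired state (`replicas`); the controller works out which Machines to create or delete.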

To create workload clusters on Amazon EKS, you need to install the Cluster API Provider AWS (CAPA) in the management cluster. With the management cluster up and running, we are able to deploy as many workload clusters as we need.

Cluster API in action

To see how infrastructure resources are defined, let’s look at a simple example:

Defining a Cluster:
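As a rough sketch (not the exact manifests from our setup), the following Python snippet builds the three resources that typically describe an EKS cluster with CAPA: a `Cluster`, an `AWSManagedCluster` and an `AWSManagedControlPlane`. The helper name `eks_cluster_manifests`, the cluster name, region and Kubernetes version are placeholder assumptions, and the `apiVersion` values depend on the Cluster API and CAPA releases installed in your management cluster. The output is JSON, which kubectl accepts just like YAML.

```python
import json


def eks_cluster_manifests(name: str, region: str, version: str) -> list[dict]:
    """Return a Cluster plus its CAPA infrastructure and control plane objects."""
    control_plane_name = f"{name}-control-plane"
    cluster = {
        "apiVersion": "cluster.x-k8s.io/v1beta1",
        "kind": "Cluster",
        "metadata": {"name": name},
        "spec": {
            # Points at the provider-specific infrastructure object.
            "infrastructureRef": {
                "apiVersion": "infrastructure.cluster.x-k8s.io/v1beta2",
                "kind": "AWSManagedCluster",
                "name": name,
            },
            # Points at the object describing the EKS control plane.
            "controlPlaneRef": {
                "apiVersion": "controlplane.cluster.x-k8s.io/v1beta2",
                "kind": "AWSManagedControlPlane",
                "name": control_plane_name,
            },
        },
    }
    infrastructure = {
        "apiVersion": "infrastructure.cluster.x-k8s.io/v1beta2",
        "kind": "AWSManagedCluster",
        "metadata": {"name": name},
        "spec": {},
    }
    control_plane = {
        "apiVersion": "controlplane.cluster.x-k8s.io/v1beta2",
        "kind": "AWSManagedControlPlane",
        "metadata": {"name": control_plane_name},
        "spec": {"region": region, "version": version},
    }
    return [cluster, infrastructure, control_plane]


# Placeholder values, chosen only for this example.
manifests = eks_cluster_manifests("team-a-dev", "eu-central-1", "v1.29.0")

# Wrap everything in a List so it can be applied in one go.
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2))
```

Saved to a file, the output could be applied to the management cluster with `kubectl apply -f <file>`. Because the manifests are produced by a function of a few parameters, creating another cluster is just another call with a different name, which is roughly the role a Helm chart plays.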

These examples highlight how you can streamline cluster management across different environments by using Kubernetes-native constructs to define and manipulate your clusters.

We’ve created a Helm chart, which installs all resources into separate AWS accounts with a single command. Before we install a new workload cluster, we run Terraform to prepare the AWS account with the IAM permissions required by Cluster API and by a set of basic tools (e.g. cert-manager) that are installed on each workload cluster. We also install Flux CD for our GitOps flow, so that these basic tools and all additional tools and applications end up on each workload cluster without any further interaction from our side. There will be a separate blog post on Flux CD, so stay tuned.

Pros of Cluster API

Let’s look at some of the benefits that the Cluster API brings to the table:

  • Declarative syntax: Define your cluster in YAML and manage it with kubectl, just like you would with other Kubernetes resources.
  • Infrastructure agnostic: Cluster API works with public clouds, on-premises VMs, and even bare metal, providing a consistent API across different environments.
  • Scalability: Easily scale your clusters up or down as demand changes.
  • Version management: Simplify upgrades and rollbacks of cluster versions.
  • Consistency: Standardize cluster deployments, reducing the likelihood of environment-specific issues.
  • Community-driven: With the backing of the Kubernetes SIGs, the Cluster API is supported by a strong community.
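The scalability and version management points boil down to editing a few fields of a resource. The sketch below illustrates this with a JSON merge patch (RFC 7386 semantics) applied to a heavily trimmed `MachineDeployment`. The resource names and versions are made-up placeholders, a real MachineDeployment needs more fields (bootstrap and infrastructure references, for example), and whether your worker nodes are modeled as MachineDeployments or as managed machine pools depends on your setup.

```python
import copy
import json


def merge_patch(target, patch):
    """Apply a JSON merge patch (RFC 7386) and return the patched copy."""
    if not isinstance(patch, dict):
        return copy.deepcopy(patch)
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)      # null removes the key
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


# Trimmed, hypothetical MachineDeployment; real ones carry more fields.
machine_deployment = {
    "apiVersion": "cluster.x-k8s.io/v1beta1",
    "kind": "MachineDeployment",
    "metadata": {"name": "team-a-dev-workers"},
    "spec": {
        "clusterName": "team-a-dev",
        "replicas": 3,
        "template": {"spec": {"version": "v1.29.0"}},
    },
}

# Scale out and move to a newer Kubernetes version in one declarative change.
patch = {"spec": {"replicas": 5, "template": {"spec": {"version": "v1.30.0"}}}}

updated = merge_patch(machine_deployment, patch)
print(json.dumps(patch))
print(updated["spec"]["replicas"], updated["spec"]["template"]["spec"]["version"])
```

The printed patch is the kind of document you could pass to `kubectl patch machinedeployment <name> --type merge -p '<patch>'`; the controllers in the management cluster then take care of adding nodes and rolling them to the new version.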

Cons of Cluster API

Despite its promise, the Cluster API isn’t without its challenges:

  • Learning curve: New users need time to get familiar with Cluster API’s concepts and custom resources.
  • Project maturity: Being relatively new, Cluster API may not be as mature as other Kubernetes tools and some features are still evolving.
  • Integration with existing tools: Not all existing tools and workflows are fully compatible with the Cluster API yet.

Conclusion

Cluster API provides a powerful way to provision and manage Kubernetes clusters declaratively. With support for multiple infrastructure providers, it makes it easy to consistently deploy clusters across environments. The modular architecture allows for flexibility and extensibility. Overall, we find Cluster API a very useful project for anyone who needs to operate Kubernetes at scale.

Related Blog posts:

  • Moving up the stack (https://medium.com/tech-p7s1/moving-up-the-stack-c680cebe234c)
  • Kargo — a container focused developer platform on AWS (https://medium.com/tech-p7s1/kargo-a-container-focused-developer-platform-on-aws-0bdc5262fa46)
  • Exposing workloads on EKS (https://medium.com/tech-p7s1/exposing-workloads-on-eks-0ec39acd5fa9)
