Orchestrating The Orchestrator: Kubernetes Management In Multi-Clustered Environments

Using Terraform, ArgoCD, and GitOps to automate clusters and deployments.

Sami Alakus
Aug 20, 2023
Simplifying complicated environments through automation.

Most people will agree that managing a single Kubernetes cluster can be difficult. So what happens when an environment expands and requires multiple Kubernetes clusters? Simply put, things don't get any easier.

In this article, I'll walk you through the GitOps approach I took to organise my multi-clustered environment and streamline deployment processes along the way.

Together we’ll run through some ideas and build out an orchestration platform. We’ll automate the provisioning of cluster infrastructure with Terraform and use ArgoCD to handle application deployments.

Once we’re finished we’ll use a single commit to spin up a new cluster and have our applications deployed there automatically, ready to go.

To keep this article from becoming longer than it needs to be, I won’t be sharing all of the code here. Instead, I’ve created a GitLab group with all repositories relevant to this article. I’m also making use of GitLab CI/CD to automate the majority of actions in this project. For more detail, dig into the repositories.

https://gitlab.com/multi-cluster-orchestration

So, How Do We Orchestrate Kubernetes Clusters?

I’ve created this diagram to better demonstrate how the platform works and what we’ll be looking to build. I’ll break it down as we go.

Our Kubernetes orchestration platform, divided into four sections: "Git", "Kubernetes", "Infrastructure", and "Applications".

Firstly, I want to frame how we need to start thinking about clusters to make this work. Rather than long-lived instances that require constant maintenance and upgrades, we want to think of clusters as immutable pieces of infrastructure that can be spun up and destroyed with ease.

To enable this thinking we’ll implement ArgoCD and automate the deployment of our applications. This means that once a new cluster has been created our applications are deployed to it automatically, leaving the process as hands-off as it can get.

To achieve this, the solution comprises these key components:

Management Cluster

  • Establishing a central management cluster configured with ArgoCD gives us a single point of management, rather than spreading that responsibility across every cluster.
  • Worker clusters register with the management cluster, enabling centralised deployments while freeing worker cluster resources to focus on application workloads.

Cluster Labelling

  • Worker clusters are labeled in ArgoCD upon registration, providing flexibility in cluster bootstrapping and application deployments
  • Each label has a one-to-one relationship with a Git repository that contains the application manifests as Helm charts
  • Each cluster can carry as many labels as required, adding further flexibility
  • Labels could, for example, separate a global set of applications deployed everywhere, a production set for production workloads, or a management set for tooling on the management cluster (see the sketch below for how a labeled cluster registration looks)
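Under the hood, ArgoCD stores each registered cluster as a Kubernetes Secret in the argocd namespace, and the labels on that Secret are what our repositories key off. As a minimal sketch (the names and label values here are illustrative, and in this platform the Secret is created by Terraform rather than by hand), a registered worker cluster can look like this:

apiVersion: v1
kind: Secret
metadata:
  name: do-production-cluster
  namespace: argocd
  labels:
    # Marks this Secret as a cluster registration for ArgoCD
    argocd.argoproj.io/secret-type: cluster
    # Our platform labels, matched against application manifest repositories
    global: "true"
    production: "true"
type: Opaque
stringData:
  name: do-production
  server: https://<cluster-api-endpoint>
  config: |
    {
      "bearerToken": "<service-account-token>",
      "tlsClientConfig": {
        "caData": "<base64-encoded-ca-certificate>"
      }
    }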

GitOps With ArgoCD

  • Git repositories linked to clusters through labels trigger automated application deployments through ArgoCD (sketched just below).
  • Unique values for each cluster are defined in the directory structure of the repositories, which provides support for various environments.
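One way to express this wiring in ArgoCD is with an ApplicationSet that combines the cluster generator (selecting every cluster carrying a label) with the git generator (discovering every chart directory in the matching repository). The Argo configuration in this project may differ in detail, so treat this as a hedged sketch; the repository URL is real, but the branch and naming pattern are assumptions:

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: production-applications
  namespace: argocd
spec:
  generators:
    # Pair every cluster labelled production=true with every chart directory
    - matrix:
        generators:
          - clusters:
              selector:
                matchLabels:
                  production: "true"
          - git:
              repoURL: https://gitlab.com/multi-cluster-orchestration/production-applications.git
              revision: main
              directories:
                - path: apps/*/*
  template:
    metadata:
      name: "{{path.basename}}-{{name}}"   # e.g. microservice-01-do-production
    spec:
      project: default
      source:
        repoURL: https://gitlab.com/multi-cluster-orchestration/production-applications.git
        targetRevision: main
        path: "{{path}}"
      destination:
        server: "{{server}}"          # API endpoint of the selected cluster
        namespace: "{{path[1]}}"      # second path segment is the namespace
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true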

Building The Platform

The implementation is divided into two parts. Firstly, we’ll create our application manifest repositories that will be our one-to-one link to the cluster labels. Then we can begin provisioning our clusters using Terraform.

Application Manifest Repositories

The application manifest repositories contain Helm charts that are used to deploy each one of our applications. Each repository uses an app-of-apps pattern that allows our applications to be deployed to a number of different cluster environments.

The bottom section of the diagram shows us the relationship between the manifest repos and deployments to the clusters. We can see our purple global applications are being deployed to all of our clusters, while the green production applications are only deployed to clusters with the production label.

Applications from our application manifest repositories deployed to clusters with the relevant labels.

Each manifest repository will have the same file structure with two root directories:

  • apps - The Helm charts for our applications
  • values - Unique values.yaml files used on specific clusters

Each subdirectory below these roots has a meaning as well. Under the apps directory, the first level is the namespace and the second is the application name. Under the values directory, the cluster name and application name are used to link the correct values file to a given cluster.

See below for an outline of what that looks like:

apps/
└── {namespace}/
    └── {application-name}/
        ├── templates/
        ├── Chart.yaml
        └── values.yaml
values/
└── {cluster}/
    └── {application-name}/
        └── values.yaml

In the diagram we can see that the production repository has values specified for both the eks-production and aks-production clusters, allowing each of them to have unique values, such as URLs, applied.
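In ArgoCD terms, that per-cluster layering can be expressed by listing the cluster's values file after the chart defaults so it wins on conflicts. Here's a hedged sketch of a single generated Application for a hypothetical microservice on eks-production (the application name, namespace, and endpoint are placeholders; the repoURL is the real production repository; ignoreMissingValueFiles lets clusters without overrides fall back to the defaults):

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: microservice-01-eks-production
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://gitlab.com/multi-cluster-orchestration/production-applications.git
    targetRevision: main
    path: apps/web/microservice-01          # apps/{namespace}/{application-name}
    helm:
      ignoreMissingValueFiles: true
      valueFiles:
        - values.yaml                       # chart defaults
        # Per-cluster overrides, resolved relative to the chart directory
        - ../../../values/eks-production/microservice-01/values.yaml
  destination:
    server: https://<eks-production-api-endpoint>
    namespace: web
  syncPolicy:
    automated:
      prune: true
      selfHeal: true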

Creating The Applications

Let’s create a set of repositories (and cluster labels) that we’ll use in our environment. If I think about what applications I want to deploy I get a better idea of the types of labels/app manifest repositories I want.

For example, I want Nginx Ingress and Cert Manager installed on all of my clusters. These applications are defined by the global label and added to the global application manifest repository. When I create a new cluster I add the global label to it and my applications are deployed there automatically.

https://gitlab.com/multi-cluster-orchestration/global-applications
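As an example of what a global application can look like inside that repository, a thin wrapper chart can declare the upstream project as a dependency so the repository only carries our configuration. This is a hypothetical sketch: the path follows the apps/{namespace}/{application-name} convention, and the chart version is illustrative:

# cat apps/cert-manager/cert-manager/Chart.yaml
apiVersion: v2
name: cert-manager
description: Wrapper chart pulling in upstream cert-manager with our values
version: 0.1.0
dependencies:
  - name: cert-manager
    version: v1.12.3
    repository: https://charts.jetstack.io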

I also want to be able to deploy my web application to my production clusters. These are defined by the production label and added to the production application manifest repository:

https://gitlab.com/multi-cluster-orchestration/production-applications

Finally, I want to create an ingress record for the Argo installation on the management cluster. For that I'll create a management label and a management application manifest repository containing a simple Helm chart with an ingress resource:

https://gitlab.com/multi-cluster-orchestration/management-applications

In the next stage, I will be creating a management cluster called do-management. Because of that, I can create a values file that will be unique to that cluster and specify my URL, like so:

# cat values/do-management/argocd-ingress/values.yaml
ingress:
  enabled: true
  hostname: argocd.example.com
  tls:
    enabled: true
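For completeness, the chart's template can consume those values along these lines. This is a hedged sketch rather than the exact chart in the management repository; the annotations and TLS secret handling in particular may differ:

# cat apps/argocd/argocd-ingress/templates/ingress.yaml
{{- if .Values.ingress.enabled }}
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: argocd-server-ingress
  namespace: argocd
  annotations:
    # The ArgoCD API server terminates TLS itself, so speak HTTPS to the backend
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
spec:
  ingressClassName: nginx
  rules:
    - host: {{ .Values.ingress.hostname }}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: argocd-server
                port:
                  number: 443
  {{- if .Values.ingress.tls.enabled }}
  tls:
    - hosts:
        - {{ .Values.ingress.hostname }}
      secretName: argocd-server-tls
  {{- end }}
{{- end }}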

Infrastructure Provisioning

Now that we have prepared our application manifest repositories we can go about provisioning our cluster infrastructure. We’ll be using Terraform as it enables a cloud-agnostic approach and lets us deploy infrastructure basically anywhere (AWS, GCP, Azure, etc).

Diagram of the platform describing the infrastructure section
Infrastructure being provisioned and labeled through the infrastructure repository.

We can see here that each cluster directory maps to a cluster that gets created, and that the cluster is added to ArgoCD. Both of these, along with the installation of the ArgoCD instance itself, happen during the infrastructure provisioning stages.

For this article, I’ll be creating two clusters:

  1. do-management - My management cluster with ArgoCD deployed onto Digital Ocean
  2. do-production - My Digital Ocean production cluster for my production workloads

For more information and the full code for these clusters see the infrastructure repository here:

https://gitlab.com/multi-cluster-orchestration/infrastructure

Creating the Management Cluster

Creating the management cluster consists of two parts, both handled with Terraform:

  1. The management cluster infrastructure itself
  2. The installation of ArgoCD to the management cluster (This will also be where we configure our application manifest repositories)

Add Infrastructure Files

I’ve created the directory clusters/do-management and added the Terraform files there. See the repository for more details, but the core components are as follows:

main.tf

This file is responsible for creating the cluster infrastructure and installing ArgoCD. We pass our cluster labels and application manifest repositories to the Argo module for its configuration.
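The full file lives in the repository; as a rough, hypothetical sketch (the module path and interface are assumptions, and the DigitalOcean region and version are purely illustrative), it looks something like this:

# clusters/do-management/main.tf (abridged sketch)
resource "digitalocean_kubernetes_cluster" "management" {
  name    = var.name
  region  = "syd1"           # illustrative region
  version = "1.27.4-do.0"    # illustrative DOKS version

  node_pool {
    name       = "default"
    size       = "s-2vcpu-4gb"
    node_count = 2
  }
}

module "argocd" {
  source = "../../modules/argocd"   # hypothetical module path

  cluster_name   = var.name
  cluster_labels = var.cluster_label
  repositories   = var.repositories

  # Connection details for the freshly created cluster
  host  = digitalocean_kubernetes_cluster.management.endpoint
  token = digitalocean_kubernetes_cluster.management.kube_config[0].token
  cluster_ca_certificate = base64decode(
    digitalocean_kubernetes_cluster.management.kube_config[0].cluster_ca_certificate
  )
}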

variables.auto.tfvars

Here we’re defining variables for our management cluster. Notably:

  • name — The name of our new cluster. This will be used to reference custom values in the application manifest repositories
  • cluster_label — How we define which labels we want to add to our cluster
  • repositories — The application manifest repositories to configure in ArgoCD during installation. If we want to add additional labels and repositories to the environment later, this is where it's done (a sketch of these values follows this list).
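As a hypothetical sketch of those values (the exact variable shapes are defined in the infrastructure repository):

# clusters/do-management/variables.auto.tfvars (sketch)
name = "do-management"

cluster_label = {
  global     = "true"
  management = "true"
}

repositories = {
  "global-applications" = {
    url   = "https://gitlab.com/multi-cluster-orchestration/global-applications.git"
    label = "global"
  }
  "production-applications" = {
    url   = "https://gitlab.com/multi-cluster-orchestration/production-applications.git"
    label = "production"
  }
  "management-applications" = {
    url   = "https://gitlab.com/multi-cluster-orchestration/management-applications.git"
    label = "management"
  }
}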

Commit the code and GitLab CI/CD will run terraform apply to create your management cluster, install ArgoCD, configure the application manifest repositories, and register the cluster with ArgoCD.
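The pipeline itself lives in the infrastructure repository; a stripped-down, hypothetical version of the job that applies a single cluster directory could look like this:

# .gitlab-ci.yml (abridged, hypothetical sketch)
stages:
  - apply

do-management:
  stage: apply
  image:
    name: hashicorp/terraform:1.5
    entrypoint: [""]
  rules:
    # Only run when this cluster's directory changes
    - changes:
        - clusters/do-management/**/*
  script:
    # Remote state / backend configuration omitted for brevity
    - cd clusters/do-management
    - terraform init
    - terraform apply -auto-approve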

Creating the do-management cluster and ArgoCD provisioning the applications.
Successful creation of our do-management cluster in GitLab CI/CD.

The Terraform project outputs a few variables in this step. We'll use terraform output to retrieve these three secrets:

  • Kubeconfig — terraform output kubeconfig
  • ArgoCD admin user password (used to login to Argo) — terraform output admin_password
  • ArgoCD automation token — terraform output automation_user_token (This token must be set as a GitLab CI/CD variable for worker cluster registration)

Note: When using Terraform in GitLab you can access your remote state by following this guide.

Viewing Cluster and Applications

We can now look at our new management cluster and see the successful ArgoCD installation:

# kubectl get pods -n argocd
NAME                                               READY   STATUS    RESTARTS   AGE
argocd-application-controller-0                    1/1     Running   0          83s
argocd-applicationset-controller-cc494cb48-92c6s   1/1     Running   0          85s
argocd-dex-server-7fc47868f5-vr5g2                 1/1     Running   0          85s
argocd-notifications-controller-686647c5d7-bnpvk   1/1     Running   0          86s
argocd-redis-5b5754454b-9pbsl                      1/1     Running   0          86s
argocd-repo-server-6cdc496b75-sgqjk                1/1     Running   0          86s
argocd-server-656cbcc798-jdjpp                     1/1     Running   0          86s

GitOps Automation

Here’s where the fun comes in. Because we’ve registered our management cluster in ArgoCD, and labeled it, we’ll already have some of our applications deployed.

Here we can see our ingress and cert-manager installations that have come from the global application manifest repository:

# kubectl get pods -n ingress
NAME                                      READY   STATUS    RESTARTS   AGE
ingress-nginx-controller-9cbdcdd9-tcvg2   1/1     Running   0          4m30s

# kubectl get pods -n cert-manager
NAME                                      READY   STATUS    RESTARTS   AGE
cert-manager-869f7b446-b92fx              1/1     Running   0          5m8s
cert-manager-cainjector-869d958c8-h2b5n   1/1     Running   0          5m8s
cert-manager-webhook-67cf5854d-bfmgs      1/1     Running   0          5m8s

We can also see our ArgoCD ingress record from the management application manifest repository, with the unique values file applied:

# kubectl get ingress -n argocd
NAME                    CLASS   HOSTS                ADDRESS          PORTS     AGE
argocd-server-ingress   nginx   argocd.example.com   170.64.123.123   80, 443   3d4h

Note: DNS records will need to be configured for the load balancer before your ingress is accessible. The tool external-dns can be used to automate this process.

ArgoCD

We can now access our ArgoCD installation on the management cluster and see the applications that have been deployed.

ArgoCD installation on the new do-management cluster.

That’s the management cluster setup and running. Now we can begin to stand up worker clusters and register them to ArgoCD.

Creating A Worker Cluster

Like the management cluster, creating a worker cluster also consists of two parts:

  1. The worker cluster infrastructure
  2. Linking the worker to the management cluster’s ArgoCD installation

Add Infrastructure Files

I’ve created the directory clusters/do-production and added the Terraform files in the infrastructure repository. See the repository for more details but again the core components are as follows:

main.tf

This file creates our Digital Ocean worker cluster and uses the argo-cluster module to attach it to the management cluster.

Here's where we define our management cluster's ArgoCD URL. Since we've created the ingress using the chart in our management repository, we'll use the URL we set there.

The argocd_auth_token variable uses the output of the terraform output automation_user_token command from the management cluster creation. In my case it has been set as a GitLab CI/CD variable and is referenced in the .gitlab-ci.yml file.
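In sketch form, the interesting part of clusters/do-production/main.tf is the registration itself. The cluster resource mirrors the management cluster, and the module interface below is again an assumption:

# clusters/do-production/main.tf (abridged sketch)
module "argo_cluster" {
  source = "../../modules/argo-cluster"   # hypothetical module path

  cluster_name   = var.name
  cluster_labels = var.cluster_label

  # The management cluster's ArgoCD, defined in variables.auto.tfvars below
  argocd_url = var.argocd_url
  # Automation token from the management cluster build, injected via GitLab CI/CD
  argocd_auth_token = var.argocd_auth_token

  # Connection details for the new worker cluster
  host  = digitalocean_kubernetes_cluster.production.endpoint
  token = digitalocean_kubernetes_cluster.production.kube_config[0].token
  cluster_ca_certificate = base64decode(
    digitalocean_kubernetes_cluster.production.kube_config[0].cluster_ca_certificate
  )
}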

variables.auto.tfvars

Here we define our argocd_url, as well as the cluster labels to apply to this cluster.
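Again as a hypothetical sketch, with labels matching the repositories we created earlier:

# clusters/do-production/variables.auto.tfvars (sketch)
name       = "do-production"
argocd_url = "https://argocd.example.com"

cluster_label = {
  global     = "true"
  production = "true"
}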

Once again, we commit our code and we can watch our infrastructure being provisioned in GitLab CI/CD.

Successful run creation of the do-production cluster through GitLab CI/CD.

We’ll also have our cluster labeled and automatically registered with ArgoCD. This diagram demonstrates the workflow that stems from the commit of our do-production cluster files.

Workflow creating the do-production cluster and ArgoCD deploying applications to it.

Now we can monitor ArgoCD and watch our platform fit all the pieces into place. We should start seeing applications from the global and production application manifest repositories being deployed automatically.

ArgoCD installation with applications deployed to both do-management and do-production clusters.

That’s it. With a couple of commits the platform is coming together and handling a load of work for us. Applications are automatically deployed thanks to ArgoCD and we can begin to send traffic to them straight away.

New clusters can be added by creating a new subdirectory in the infrastructure repository (clusters/{new-cluster}), and the platform will handle everything else for us.

Let’s Review

With all of this implemented we’ve managed to create a highly flexible and dynamic environment allowing Kubernetes clusters to be deployed across various clouds.

We've reframed how we think about clusters, ensuring we treat them as immutable pieces of infrastructure.

We’ve also reduced the creation of new cluster infrastructure and the initial deployment of our applications to a single commit.

Now we can imagine the different scenarios in which this platform can be used:

  • A new production cluster is required in a new cloud or region. Simply add the cluster Terraform files and label the cluster production
  • A new staging environment is required for applications in testing. Create a new staging label and repository and either create a new cluster or attach the label to an existing one
  • A new Kubernetes upgrade is available. Deploy a completely new cluster with the production label so we can test our application charts in an isolated environment
  • A new developer is coming on board. We can create a label and app manifest repository for them and add their label to a development cluster or create a new cluster for them to use

Application Deployments

Our platform has also enabled a GitOps deployment strategy for our applications, allowing centralised changes to be propagated across different clusters.

Deployment workflow to deploy a new image to the eks-production cluster

Here we can see the flow for deploying the microservice-01 application. We're changing the container's tag in the production application manifest repository from 8d785 to eff5c. The change targets the eks-production cluster specifically because we've committed the tag into that cluster's values file.
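Concretely, that deployment is nothing more than a one-line edit in the cluster's values file (the image repository shown here is a placeholder):

# cat values/eks-production/microservice-01/values.yaml
image:
  repository: registry.gitlab.com/multi-cluster-orchestration/microservice-01   # placeholder
  tag: "eff5c"   # previously "8d785"

ArgoCD notices the commit and rolls the new tag out to eks-production without any further manual steps.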

What’s Next

This platform by itself should be a jumping-off point to create further improvements and environment-specific changes. Some additional functionality I would introduce in a production environment would include:

  • HashiCorp Vault and the ArgoCD Vault Plugin — This way we can manage and deploy secrets across all of our clusters
  • External DNS — As mentioned previously, installing the external-dns tool adds another layer of automation and handles DNS configuration for us. This allows for a completely hands-off approach, and applications deployed on a new cluster will be accessible immediately.
  • Istio/Linkerd — Service mesh to enable communication across clusters

I hope that with this article I’ve been able to demonstrate a way of working with Kubernetes that can alleviate some of its management complexities.

I’d love to hear your thoughts about this implementation and what you would use or change in your environment.

Thanks a lot for reading, until next time.
