Kubeception with CAPOCI — ClusterAPI for OCI Part 2

Ali Mukadam
Published in Oracle Developers · 10 min read · Mar 3, 2023

I have previously written about using Cluster API (CAPI for short) on OCI (CAPOCI). Until recently, all you could use CAPOCI for was provisioning self-managed Kubernetes clusters on OCI.

Come again, what is a Kubernetes self-managed cluster?

In the Kubernetes world, if you classify clusters by how they are provisioned, there are two types:

  • Self-managed
  • Managed

Self-managed clusters are those you install and manage yourself. You have full control over them, including access to the control plane, the underlying infrastructure, storage, networking and so on. However, with great control also comes great responsibility: you have to handle the entire lifecycle, including installing packages, generating certificates, building images, provisioning infrastructure, scaling, handling security, and all the tooling and operations that go into running the cluster from provisioning to retirement. In other words, you are repeating Kubernetes The Hard Way every time you need to build a cluster, and then some. I’m stretching this quite a bit and it’s the kind of thing I occasionally do on Saturday nights just for the high of tinkering with technology, or to convince myself I can still hack it. But you get my point: this is still a very significant endeavour, let alone in an enterprise setting.

On the other hand, managed clusters are provisioned and managed by your infrastructure provider, e.g. Oracle Container Engine for Kubernetes (OKE), EKS, GKE etc. You don’t usually have full control over managed clusters; for example, most of them don’t give you access to the control plane. Your infrastructure provider also handles the entire lifecycle of the cluster, and you usually only need to make higher-level API calls to initiate its provisioning, scaling or retirement instead of having to go into the trenches.

I hear you already: What if a system integrator or third-party vendor installs it for me? Well, I would argue it’s still self-managed. Your vendor is handling the less silky tasks for you but they still need to manage it.

The type of cluster you deploy is essentially a tradeoff decision. In my not so humble opinion, managed clusters should be able to satisfy your requirements in the overwhelming majority of cases. On OCI, that means OKE. It’s battle-tested, we have both internal and external customers doing all kinds of funky things with it, and we are constantly working to improve it.

Now, occasionally, you may need to scratch an itch or meet specific requirements (technical, regulatory or other) that your provider’s managed service cannot support, although by and large these cases are rapidly shrinking. This is the situation where you pull the self-managed cluster out of the drawer so you have full control over the entire stack. You can most obviously use Terraform or other IaC tools for this. The challenge comes when you have to repeat this on multiple infrastructure providers. Each Terraform provider (OCI, AWS, Azure etc.) has its own implementation, and in a world increasingly moving towards multi-cloud, this constitutes a significant effort. This is where CAPI shines: it provides a common experience across all infrastructure providers. CAPI is like the One Ring: it rules them all.

All of this is just a long-winded way to announce that with the release of CAPOCI 0.7.0, you can now use CAPI to provision workload clusters that use OKE, and not just self-managed clusters on OCI.

Time to take this for a spin. If you are not familiar with CAPI, you can read the docs or my previous pontification on it for an abridged summary. You will also need a management cluster before you do anything; you can provision it anywhere. For the purpose of this article, my management cluster is conveniently on OKE.
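If you don’t have a cluster handy, a quick throwaway alternative (a minimal sketch, assuming Docker and kind are already installed) is a local kind cluster:

# create a local throwaway cluster to act as the CAPI management cluster
kind create cluster --name capi-mgmt

# confirm kubectl is pointing at it
kubectl cluster-info --context kind-capi-mgmt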

Install clusterctl

If you provisioned your management cluster with https://github.com/oracle-terraform-modules/terraform-oci-oke, ssh to the operator host to run the following commands.
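If you used the module’s defaults (bastion and operator hosts enabled), the SSH command is typically printed as a Terraform output; it looks roughly like the sketch below, with the IPs and key path as placeholders:

# jump through the bastion to reach the private operator host (placeholder IPs and key path)
ssh -i ~/.ssh/id_rsa -J opc@<bastion-public-ip> opc@<operator-private-ip>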

Install the latest version of clusterctl:

curl -L https://github.com/kubernetes-sigs/cluster-api/releases/download/v1.3.5/clusterctl-linux-amd64 -o clusterctl
chmod +x clusterctl
sudo mv clusterctl /usr/local/bin

Set the environment variables:

export OCI_TENANCY_ID=<insert-tenancy-id-here>
export OCI_USER_ID=<insert-user-ocid-here>
export OCI_CREDENTIALS_FINGERPRINT=<insert-fingerprint-here>
export OCI_REGION=<insert-region-here>
export OCI_TENANCY_ID_B64="$(echo -n "$OCI_TENANCY_ID" | base64 | tr -d '\n')"
export OCI_CREDENTIALS_FINGERPRINT_B64="$(echo -n "$OCI_CREDENTIALS_FINGERPRINT" | base64 | tr -d '\n')"
export OCI_USER_ID_B64="$(echo -n "$OCI_USER_ID" | base64 | tr -d '\n')"
export OCI_REGION_B64="$(echo -n "$OCI_REGION" | base64 | tr -d '\n')"
export OCI_CREDENTIALS_KEY_B64=$(base64 < <insert-path-to-api-private-key-file-here> | tr -d '\n')
# if Passphrase is present
export OCI_CREDENTIALS_PASSPHRASE=<insert-passphrase-here>
export OCI_CREDENTIALS_PASSPHRASE_B64="$(echo -n "$OCI_CREDENTIALS_PASSPHRASE" | base64 | tr -d '\n')"

You can now initialize the management cluster:

# for oke only
export EXP_MACHINE_POOL=true
export EXP_OKE=true

# for all clusters
clusterctl init --infrastructure oci

You should see the following output:

Fetching providers
Installing cert-manager Version="v1.11.0"
Waiting for cert-manager to be available...
Installing Provider="cluster-api" Version="v1.3.5" TargetNamespace="capi-system"
Installing Provider="bootstrap-kubeadm" Version="v1.3.5" TargetNamespace="capi-kubeadm-bootstrap-system"
Installing Provider="control-plane-kubeadm" Version="v1.3.5" TargetNamespace="capi-kubeadm-control-plane-system"
Installing Provider="infrastructure-oci" Version="v0.7.0" TargetNamespace="cluster-api-provider-oci-system"

Your management cluster has been initialized successfully!

You can now create your first workload cluster by running the following:

clusterctl generate cluster [name] --kubernetes-version [version] | kubectl apply -f -

Verify the pods are all in the running state:

kubectl get pods -A
NAMESPACE                           NAME                                                             READY   STATUS    RESTARTS       AGE
capi-kubeadm-bootstrap-system       capi-kubeadm-bootstrap-controller-manager-77d6bd5886-4k9w7       1/1     Running   0              96s
capi-kubeadm-control-plane-system   capi-kubeadm-control-plane-controller-manager-84c7c88ffd-l8gqw   1/1     Running   0              92s
capi-system                         capi-controller-manager-5bcb4fbdc6-wrzpx                         1/1     Running   0              99s
cert-manager                        cert-manager-6499989f7-rsbhr                                     1/1     Running   1 (111s ago)   2m4s
cert-manager                        cert-manager-cainjector-645b688547-rtp9p                         1/1     Running   0              2m4s
cert-manager                        cert-manager-webhook-6b7f49999f-vc6pt                            1/1     Running   0              2m4s
cluster-api-provider-oci-system     capoci-controller-manager-ddc784f67-s2vhl                        1/1     Running   0              86s
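If you would rather script this check than eyeball it, a hedged convenience (using the deployment names behind the pods listed above) is to wait for the controllers to report Available:

# block until the CAPI and CAPOCI controllers are available
kubectl -n capi-system wait --for=condition=Available deployment/capi-controller-manager --timeout=300s
kubectl -n cluster-api-provider-oci-system wait --for=condition=Available deployment/capoci-controller-manager --timeout=300s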

Set the additional environment variables to create the workload cluster:

export OCI_COMPARTMENT_ID=<insert-compartment-id-here>
export OCI_MANAGED_NODE_IMAGE_ID=<insert-node-image-ocid-here>
export OCI_MANAGED_NODE_SHAPE=VM.Standard.E4.Flex
export OCI_MANAGED_NODE_MACHINE_TYPE_OCPUS=2
export OCI_SSH_KEY=$(cat /home/opc/.ssh/id_rsa.pub)
export CLUSTER_NAME=capoke
export KUBERNETES_VERSION=v1.25.4
export NODE_MACHINE_COUNT=3

For worker node images, you can even pick one of the OKE-optimized images listed in the OCI documentation. Make sure you pick the image for the right region.
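If you prefer the CLI to the Console, you can also list the images OKE offers for node pools in your current region; a hedged example (tweak the JMESPath query to taste):

# list the node pool images available in the current region
oci ce node-pool-options get --node-pool-option-id all \
  --query 'data.sources[*].{name:"source-name", id:"image-id"}' --output table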

We can now create the OKE cluster:

clusterctl generate cluster capoke --flavor managed | kubectl apply -f -

cluster.cluster.x-k8s.io/capoke created
ocimanagedcluster.infrastructure.cluster.x-k8s.io/capoke created
ocimanagedcontrolplane.infrastructure.cluster.x-k8s.io/capoke created
machinepool.cluster.x-k8s.io/capoke-mp-0 created
ocimanagedmachinepool.infrastructure.cluster.x-k8s.io/capoke-mp-0 created

Get the controller manager pod name:

kubectl -n cluster-api-provider-oci-system get pods

NAME                                        READY   STATUS    RESTARTS   AGE
capoci-controller-manager-ddc784f67-s2vhl   1/1     Running   0          9m34s

Tail it to see what’s going on:

kubectl -n cluster-api-provider-oci-system logs -f capoci-controller-manager-ddc784f67-s2vhl

I0302 22:30:40.454173 1 ocimanagedcluster_controller.go:145] "Closing managed cluster scope" controller="ocimanagedcluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="OCIManagedCluster" OCIManagedCluster="default/capoke" namespace="default" name="capoke" reconcileID=562d6820-4736-4f8c-9adb-fbe36ab70fb6 OCIManagedCluster="default/capoke"
I0302 22:31:02.490370 1 ocimanagedcluster_controlplane_controller.go:74] "Inside managed control plane reconciler" controller="ocimanagedcontrolplane" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="OCIManagedControlPlane" OCIManagedControlPlane="default/capoke" namespace="default" name="capoke" reconcileID=3e096c5a-003b-4992-93b2-0824fd0bed36 OCIManagedCluster="default/capoke"

We can see it’s creating the cluster after it has created the underlying infrastructure. In the OCI Console, we can also see the cluster is now being created:
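You can also follow progress from the management cluster itself instead of tailing controller logs, e.g. with clusterctl’s built-in view:

# summarize the workload cluster and its CAPI resources
clusterctl describe cluster capoke

# or look at the individual objects
kubectl get cluster,ocimanagedcontrolplane,machinepool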

When the OKE cluster is active, use the OCI Console to copy the command to retrieve the kubeconfig:

oci ce cluster create-kubeconfig --cluster-id <cluster-id> --file $HOME/.kube/config --region <region> --token-version 2.0.0 --kube-endpoint PUBLIC_ENDPOINT

On the operator host, run kubectx (aliased to ktx):

[opc@gh-operator ~]$ ktx
context-cyssi2ijeua
oke

The context named ‘oke’ is the cluster I initially provisioned with Terraform and am using as the management cluster. The one named ‘context-cyssi2ijeua’ is what we just provisioned using CAPI.

Let’s rename the contexts:

ktx mgmt=oke
ktx capoke=context-cyssi2ijeua

Once the node pool is ready, get a list of nodes:

[opc@gh-operator ~]$ k get nodes
NAME          STATUS   ROLES   AGE     VERSION
10.0.65.209   Ready    node    8m18s   v1.25.4
10.0.67.244   Ready    node    8m58s   v1.25.4
10.0.78.177   Ready    node    8m21s   v1.25.4

Now, OKE has the concept of node pools. A node pool is essentially a group of compute instances that share the same configuration and function as worker nodes for a cluster. I wrote about them in more detail here. In CAPOCI, this is mapped to:

machinepools.cluster.x-k8s.io/v1beta1 

or simply machinepools. There’s even a helpful shorthand for it (mp). Let’s see the list of node pools from a CAPOCI perspective:

kubectx mgmt
kubectl get mp

NAME          CLUSTER   REPLICAS   PHASE     AGE   VERSION
capoke-mp-0   capoke    3          Running   41m   v1.25.4
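Each MachinePool is backed by an OCI-specific OCIManagedMachinePool object (created alongside it earlier), which you can inspect the same way:

# the infrastructure-side counterpart of the machine pool
kubectl get ocimanagedmachinepool
kubectl describe ocimanagedmachinepool capoke-mp-0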

Let’s verify in the OCI Console:

The cluster and node pool names and size all match. We have now successfully provisioned an OKE cluster using CAPI.

Scaling the OKE cluster

By default, the managed flavour template comes with a single node pool of 3 worker nodes. What if I want to scale this node pool? Or add additional node pools? This time, instead of generating the manifest and piping it straight to kubectl apply as we did earlier, we’ll save the manifest first:

clusterctl generate cluster capoke --flavor managed > capoke.yaml

We’ll now do the following operations:

  1. Scale the mp-0 node pool to 5 worker nodes.
  2. Add a new node pool mp-1 with 2 worker nodes.

Edit the capoke.yaml. Locate the following annotation and remove it:

  annotations:
    "cluster.x-k8s.io/replicas-managed-by": ""

Locate the replicas and increase it to 5:

replicas: 5

Copy the MachinePool and OCIManagedMachinePool objects and paste them at the end of the manifest, renaming them to capoke-mp-1 and setting replicas to 2. Ensure you also change the imageId to one available in the region you want to use:

apiVersion: cluster.x-k8s.io/v1beta1
kind: MachinePool
metadata:
  name: capoke-mp-1
  namespace: default
spec:
  clusterName: capoke
  replicas: 2
  template:
    spec:
      bootstrap:
        dataSecretName: ""
      clusterName: capoke
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: OCIManagedMachinePool
        name: capoke-mp-1
      version: v1.25.4
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: OCIManagedMachinePool
metadata:
  name: capoke-mp-1
  namespace: default
spec:
  nodeMetadata:
    user_data: IyEvYmluL2Jhc2gKY3VybCAtLWZhaWwgLUggIkF1dGhvcml6YXRpb246IEJlYXJlciBPcmFjbGUiIC1MMCBodHRwOi8vMTY5LjI1NC4xNjkuMjU0L29wYy92Mi9pbnN0YW5jZS9tZXRhZGF0YS9va2VfaW5pdF9zY3JpcHQgfCBiYXNlNjQgLS1kZWNvZGUgP...
  nodeShapeConfig:
    ocpus: "2"
  nodeSourceViaImage:
    bootVolumeSizeInGBs: 50
    imageId: ocid1.image.oc1.ap-melbourne-1.aaaaaaaassrxoh3or2aszqkpnfq4r6biqbgyyhgp6ag5xwpsywuvp4ra54zq
  sshPublicKey: ""
  version: v1.25.4

Now, instead of clusterctl, we’ll use the venerable kubectl:

kubectl apply -f capoke.yaml

cluster.cluster.x-k8s.io/capoke unchanged
ocimanagedcluster.infrastructure.cluster.x-k8s.io/capoke unchanged
ocimanagedcontrolplane.infrastructure.cluster.x-k8s.io/capoke unchanged
machinepool.cluster.x-k8s.io/capoke-mp-0 unchanged
ocimanagedmachinepool.infrastructure.cluster.x-k8s.io/capoke-mp-0 configured
machinepool.cluster.x-k8s.io/capoke-mp-1 created

Let’s verify by first getting the list of node pools:

[opc@gh-operator ~]$ k get mp
NAME          CLUSTER   REPLICAS   PHASE          AGE   VERSION
capoke-mp-0   capoke    3          ScalingUp      61m   v1.25.4
capoke-mp-1   capoke               Provisioning   24s   v1.25.4

In the OCI Console, we can see the 2 node pools scaling and being provisioned respectively:

Similarly, you can scale down or perform other operations on your OKE cluster.
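For a quick one-off change where you don’t want to touch the manifest (a hedged shortcut; the declarative route above remains the better habit, and it only works once the replicas-managed-by annotation has been removed), you can patch the replica count directly:

# imperatively scale mp-0 back down to 3 workers
kubectl patch machinepool capoke-mp-0 --type merge -p '{"spec":{"replicas":3}}'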

Using GitOps

Being able to use kubectl in that last step raises some tantalizing possibilities, namely to use GitOps principles to manage Kubernetes clusters wherever they are.

Instead of manually running clusterctl or kubectl, we want to check our code into a git repo and use something like ArgoCD or Flux to automatically deploy and maintain it for us. I’ve written about running ArgoCD on OKE before.

# install argocd
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

# download the cli
curl -L https://github.com/argoproj/argo-cd/releases/download/v2.6.3/argocd-linux-amd64 -o argocd
chmod +x argocd
sudo mv argocd /usr/local/bin

# get the argocd initial admin password
kns argocd
argocd admin initial-password

#port-forward to the argocd ui
kubectl port-forward svc/argocd-server -n argocd 8080:443

Add the generated cluster manifest to a GitHub repo, making sure you replace the compartment and image IDs.

We can now use the ArgoCD UI to create an application by pointing it to the GitHub repo and the manifest in it:
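If you prefer the CLI to the UI, the equivalent is roughly the following sketch, with the repo URL and path as placeholders:

# create and sync the Argo CD application from the CLI (placeholder repo and path)
argocd app create capoke \
  --repo https://github.com/<your-org>/<your-repo>.git \
  --path . \
  --dest-server https://kubernetes.default.svc \
  --dest-namespace default

argocd app sync capoke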

Hit ‘Sync’ and ‘Synchronize’:

We can now watch the cluster being created in Argo CD UI:

In the OCI Console, we can see this cluster being created too.

As you can see, we are now treating an infrastructure project as just another application, and in doing so we meet the GitOps principles:

  1. Declarative
  2. Versioned and immutable
  3. Pulled automatically
  4. Continuously reconciled

Cluster API and multi-cloud

In this post, we have seen how to provision an OKE cluster on OCI using CAPI. We can similarly use the CAPI implementations of other cloud providers to provision self-managed or managed Kubernetes clusters and deploy our workloads there. Or, if you are already using CAPI to provision your Kubernetes clusters, you can now also provision OKE with it. One cluster to rule them all. But this post is already lengthy as it is, so we’ll defer this to a future post.

Summary

Although this is the first release of CAPOCI with experimental OKE support, I hope this post excites you enough about the possibilities of what you can achieve with it that you’ll take it for your own spin. If you find any issues or have ideas for further improvement, do reach out to us on GitHub or on Slack.

I would like to thank my colleagues Shyam Radhakrishnan and Stuart Turner for their considerable help during the writing of this article.

References:

  1. https://piotrminkowski.com/2021/12/03/create-kubernetes-clusters-with-cluster-api-and-argocd/
  2. How Adobe Planned For Scale With Argo CD, Cluster API, And VCluster — Joseph Sandoval & Dan Garfield
  3. https://opengitops.dev/

If you’re curious about the goings-on of Oracle Developers in their natural habitat, come join us on our public Slack channel! We don’t mind being your fish bowl 🐠
