Building a Kubernetes Operator in Go: Reconciling our PdfDoc CRD to Convert Text into PDF Files

Sabuj Jana
11 min read · May 12, 2023


Kubernetes Operators are powerful tools that let us manage cluster state through custom resources. An operator watches for changes to your Custom Resources (CRs), whose schema you define in Custom Resource Definitions (CRDs), and reconciles every change by updating the state of the cluster to match.
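To make the "watch, then reconcile" idea concrete, here is a toy sketch in plain Go. It is not controller-runtime's code: the queue type and its methods are hypothetical and only mimic one property of a controller's work queue, namely that a reconcile request carries just the object's namespace and name, so several events for the same object that arrive before it is processed collapse into a single Reconcile call.

```go
package main

// NamespacedName mirrors the key a reconcile request carries: only the
// object's namespace and name, not the event that triggered it.
type NamespacedName struct {
	Namespace, Name string
}

// queue is a hypothetical, single-threaded stand-in for a controller work
// queue: adding a key that is already pending is a no-op.
type queue struct {
	order   []NamespacedName
	pending map[NamespacedName]bool
}

func newQueue() *queue {
	return &queue{pending: map[NamespacedName]bool{}}
}

// Add enqueues key unless it is already waiting to be processed.
func (q *queue) Add(key NamespacedName) {
	if q.pending[key] {
		return
	}
	q.pending[key] = true
	q.order = append(q.order, key)
}

// Drain hands each pending key to reconcile once, in arrival order,
// and returns how many reconcile calls were made.
func (q *queue) Drain(reconcile func(NamespacedName)) int {
	calls := 0
	for len(q.order) > 0 {
		key := q.order[0]
		q.order = q.order[1:]
		delete(q.pending, key)
		reconcile(key)
		calls++
	}
	return calls
}
```

Because Reconcile only learns which object changed, not what changed, it has to read the current state and work out what to do, which is why reconcilers are written to be safe to run repeatedly.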

And the best part: we can write these operators in Golang, and design and tune them to our own needs. The possibilities are endless!

In this blog, we will focus on writing our own custom operator (using the kubebuilder framework) that will watch custom resources of our own CRD (which we will also create) and update the cluster state as required.

The entire code is available here: https://github.com/JanaSabuj/PdfDocCrdOperator

Let us first visualise what we want to achieve.

Architecture

Operator Flow

Our k8s cluster will be pre-installed with our PdfDoc Custom Resource Definition (CRD).

A user will create a PdfDoc Custom Resource (CR) in our cluster, which will have a DocName and the corresponding RawText. As soon as the user applies the CR, our custom operator will kick in to reconcile it.

Our PdfDoc operator will pick up the PdfDoc CR spec and start a k8s Job, which will spawn a Pod. The Pod will have 2 init containers and a main container (we will explore what they do later).

Eventually, the final PDF file will be generated and saved in the shared volume.

Steps

Let’s break down the steps we will follow to build this flow:

  1. Scaffold the entire project using kubebuilder.
  2. Create a PdfDoc CRD that has 2 fields: DocName and RawText.
  3. Write the Reconcile function of our operator.
  4. Make the manifests and install the CRDs in the cluster.
  5. Test the operator functionality by creating a sample PdfDoc CR and pulling the generated PDF out of the shared volume.
  6. Build the operator image, upload to DockerHub and deploy the operator in the cluster.

1. Scaffolding an operator project using kubebuilder

Have you used Spring Boot before? If yes, you will be familiar with how Spring Initializr generates a bare-minimum Spring Boot project in a few clicks.

For building K8s operators, in comes the kubebuilder project. With just a few commands, we get a starter k8s operator project and can start writing the operator logic directly, without worrying too much about authentication, roles, deployments etc.

The high level of abstraction that kubebuilder's scaffolding provides is a big reason it has become one of the most widely used scaffolders for k8s operators.

a. Init

kubebuilder init \
  --domain janasabuj.github.io \
  --repo github.com/JanaSabuj/PdfDocCrdOperator
init command

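This initialises a new kubebuilder project. The domain is appended to the API group of our CRD (giving customtools.janasabuj.github.io, as we will see later), which forms part of the Kubernetes API endpoint path; --repo sets the Go module path of the project.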

b. Creating our CRD API Definition

Let us design the schema for the CRD we want. We will name the CRD as PdfDoc and it will contain 2 fields:

  • docName — the name of the PDF file we eventually want
  • rawText — the raw text that will eventually be converted into PDF

Now that we have the schema design in mind, let us ask kubebuilder to scaffold this API definition.

> kubebuilder create api --group customtools --version v1 --kind PdfDoc
  • We name our CRD group customtools, with version v1 (for a brand-new API, a pre-release version such as v1alpha1 would be more conventional), and PdfDoc as the kind, i.e. the name of our resource type.
  • When prompted, we ask the CLI to create both the Resource (the CRD) and the Controller (the component that watches for changes to our PdfDoc resources and reconciles them).
create api
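The group, domain, version and kind chosen here determine how the API server exposes the new resource. The standard-library sketch below assembles those pieces; apiGroup, apiVersion and resourcePath are hypothetical helpers, and the plural pdfdocs is passed in explicitly (it matches the generated CRD file name customtools.janasabuj.github.io_pdfdocs.yaml) rather than derived, since kubebuilder's pluralization rules are not reproduced here.

```go
package main

import "path"

// apiGroup joins the kubebuilder group and domain into the full API group.
func apiGroup(group, domain string) string {
	return group + "." + domain
}

// apiVersion is what a manifest puts in its apiVersion field: <group>/<version>.
func apiVersion(group, domain, version string) string {
	return apiGroup(group, domain) + "/" + version
}

// resourcePath is the REST path under which the API server serves namespaced
// custom resources: /apis/<group>/<version>/namespaces/<namespace>/<plural>.
func resourcePath(group, domain, version, namespace, plural string) string {
	return path.Join("/apis", apiGroup(group, domain), version, "namespaces", namespace, plural)
}
```

With group customtools, domain janasabuj.github.io and version v1, apiVersion returns customtools.janasabuj.github.io/v1, the same apiVersion the sample CR uses, and PdfDocs in the default namespace live under /apis/customtools.janasabuj.github.io/v1/namespaces/default/pdfdocs.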

Let us look at the file structure of our project after this.

operator tree

We will tinker with very few parts of this structure to get our operator up and running. The rest will be taken care of by kubebuilder.

  • api/v1 — will contain our CRD specs
  • The main controller logic will be in internal/controller
  • Sample CRs will be added in config/samples

c. Modifying the CRD specs

In the file api/v1/pdfdoc_types.go, we update the PdfDocSpec struct with our desired fields.

// PdfDocSpec defines the desired state of PdfDoc
type PdfDocSpec struct {
	// INSERT ADDITIONAL SPEC FIELDS - desired state of cluster
	// Important: Run "make" to regenerate code after modifying this file

	// Foo is an example field of PdfDoc. Edit pdfdoc_types.go to remove/update
	//Foo string `json:"foo,omitempty"`

	// DocName: defines the name of the PDF document
	DocName string `json:"docName,omitempty"`
	// RawText: defines the markdown content of the PDF document
	RawText string `json:"rawText,omitempty"`
}
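The json tags decide the field names users write in YAML (docName, rawText), because the API server and the Go client serialize these structs as JSON. A quick standard-library check, using a local copy of PdfDocSpec so that it compiles outside the operator module; specJSON and parseSpec are hypothetical helpers for illustration:

```go
package main

import "encoding/json"

// PdfDocSpec is a local copy of the spec from api/v1/pdfdoc_types.go,
// so this sketch compiles without the operator module.
type PdfDocSpec struct {
	DocName string `json:"docName,omitempty"`
	RawText string `json:"rawText,omitempty"`
}

// specJSON returns the wire (JSON) form of a spec.
func specJSON(s PdfDocSpec) (string, error) {
	b, err := json.Marshal(s)
	return string(b), err
}

// parseSpec does the reverse, as happens when a CR is read back.
func parseSpec(data string) (PdfDocSpec, error) {
	var s PdfDocSpec
	err := json.Unmarshal([]byte(data), &s)
	return s, err
}
```

Because of omitempty, an empty field is dropped entirely: PdfDocSpec{RawText: "hello"} marshals to {"rawText":"hello"}, with no docName key at all.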
We also fill in our fields in the sample CR, config/samples/customtools_v1_pdfdoc.yaml:

# customtools_v1_pdfdoc.yaml
apiVersion: customtools.janasabuj.github.io/v1
kind: PdfDoc
metadata:
  labels:
    app.kubernetes.io/name: pdfdoc
    app.kubernetes.io/instance: pdfdoc-sample
    app.kubernetes.io/part-of: pdfdoccrdoperator
    app.kubernetes.io/managed-by: kustomize
    app.kubernetes.io/created-by: pdfdoccrdoperator
  name: pdfdoc-sample
spec:
  docName: sabuj-document
  rawText: |
    # Shakespeare
    ---
    > Shall I compare thee to a summer's day?
    > Thou art more lovely and more temperate:
    > Rough winds do shake the darling buds of May,
    > And summer's lease hath all too short a date:
    > Sometime too hot the eye of heaven shines,
    > And often is his gold complexion dimmed;
    > And every fair from fair sometime declines,
    > By chance, or nature's changing course untrimmed:
    > But thy eternal summer shall not fade
    > Nor lose possession of that fair thou owest;
    > Nor shall Death brag thou wanderest in his shade,
    > When in eternal lines to time thou growest:
    > So long as men can breathe or eyes can see,
    > So long lives this, and this gives life to thee.
    ---
    THE END
  • Once the operator is up and running in the cluster, we will apply this CR and check whether the desired PDF file is generated.

Writing the Reconciler (Crux of the Operator)

Now we know what resource we will watch in the cluster. Time to write the logic for what happens when we capture such a resource event, i.e., how we will react and update the cluster state once such a CR is created in the cluster.

Plan

The Reconcile function can be found here: https://github.com/JanaSabuj/PdfDocCrdOperator/blob/81eb3d8b8ca25072cb22ef96d0b50400c95a552d/internal/controller/pdfdoc_controller.go#L54

func (r *PdfDocReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// WithValues returns a new logger, so keep its result
	lg := log.FromContext(ctx).WithValues("PdfDoc", req.NamespacedName)

	// get the PdfDoc; if it has been deleted, there is nothing to do
	var pdfDoc customtoolsv1.PdfDoc
	if err := r.Get(ctx, req.NamespacedName, &pdfDoc); err != nil {
		lg.Error(err, "Unable to fetch PdfDocument")
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// create the JobSpec
	jobSpec, err := r.CreateJobSpec(pdfDoc)
	if err != nil {
		lg.Error(err, "failed to create the desired Job Spec")
		return ctrl.Result{}, err
	}

	// create the Job (apierrors is k8s.io/apimachinery/pkg/api/errors)
	if err := r.Create(ctx, &jobSpec); err != nil {
		if apierrors.IsAlreadyExists(err) {
			// an earlier reconcile of this PdfDoc already created the Job
			return ctrl.Result{}, nil
		}
		lg.Error(err, "Unable to create Job")
		return ctrl.Result{}, err
	}

	return ctrl.Result{}, nil
}
  • I have annotated the code with comments for easier reading.
flow of controller Reconcile function
  • When a PdfDoc CR is received, a Job is created, which in turn spawns a Pod with init containers and a main container.

Understanding the JobSpec code

The Job creation logic has been abstracted out into a separate CreateJobSpec function: https://github.com/JanaSabuj/PdfDocCrdOperator/blob/81eb3d8b8ca25072cb22ef96d0b50400c95a552d/internal/controller/pdfdoc_controller.go#L89

func (r *PdfDocReconciler) CreateJobSpec(doc customtoolsv1.PdfDoc) (batchv1.Job, error) {
	fmt.Println(doc.Spec.DocName, doc.Spec.RawText, "being transformed....")
	// base64-encode the raw text so it survives being embedded in a shell command
	encodedText := base64.StdEncoding.EncodeToString([]byte(doc.Spec.RawText))
	docName := doc.Spec.DocName

	// init container 1: decode the text and write it to the shared volume as markdown
	// ('>' overwrites, so a re-run does not append a second copy)
	initContainer1 := corev1.Container{
		Name:    "text-to-md",
		Image:   "alpine",
		Command: []string{"/bin/sh"},
		Args: []string{
			"-c",
			fmt.Sprintf("echo %s | base64 -d > /data/%s.md", encodedText, docName),
		},
		VolumeMounts: []corev1.VolumeMount{
			{
				Name:      "dumpbox",
				MountPath: "/data",
			},
		},
	}

	// init container 2: convert the markdown file to PDF with pandoc
	// (the same volume is mounted at /opt/docs here)
	initContainer2 := corev1.Container{
		Name:    "md-to-pdf",
		Image:   "auchida/pandoc",
		Command: []string{"/bin/sh"},
		Args: []string{
			"-c",
			fmt.Sprintf("pandoc -V documentclass=ltjsarticle --pdf-engine=lualatex -o /opt/docs/%s.pdf /opt/docs/%s.md", docName, docName),
		},
		VolumeMounts: []corev1.VolumeMount{
			{
				Name:      "dumpbox",
				MountPath: "/opt/docs",
			},
		},
	}

	// main container: sleeps for an hour so the PDF can be copied out of /data
	mainContainer := corev1.Container{
		Name:    "mainc",
		Image:   "alpine",
		Command: []string{"/bin/sh", "-c", "sleep 3600"},
		VolumeMounts: []corev1.VolumeMount{
			{
				Name:      "dumpbox",
				MountPath: "/data",
			},
		},
	}

	// volume: an emptyDir shared by all three containers for the Pod's lifetime
	volume := corev1.Volume{
		Name: "dumpbox",
		VolumeSource: corev1.VolumeSource{
			EmptyDir: &corev1.EmptyDirVolumeSource{},
		},
	}

	// job: named after the document, in the same namespace as the PdfDoc
	job := batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{
			Name:      doc.Spec.DocName + "-job",
			Namespace: doc.Namespace,
		},
		Spec: batchv1.JobSpec{
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					InitContainers: []corev1.Container{initContainer1, initContainer2},
					Containers:     []corev1.Container{mainContainer},
					Volumes:        []corev1.Volume{volume},
					RestartPolicy:  corev1.RestartPolicyOnFailure,
				},
			},
		},
	}

	return job, nil
}
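Because the Job name is derived as doc.Spec.DocName + "-job", a docName containing uppercase letters, underscores or spaces will make r.Create fail. The Kubernetes Job documentation recommends that, for best compatibility, Job names follow DNS label rules: at most 63 characters, lowercase alphanumerics or '-', starting and ending with an alphanumeric. A standard-library check along the lines of the hypothetical validJobName below could run before the spec is built; in the real project, the validation helpers in k8s.io/apimachinery would be an alternative (not shown here).

```go
package main

import "regexp"

// dnsLabel matches an RFC 1123 label: lowercase alphanumerics and '-',
// starting and ending with an alphanumeric character.
var dnsLabel = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

// jobNameFor builds the Job name the same way CreateJobSpec does.
func jobNameFor(docName string) string {
	return docName + "-job"
}

// validJobName reports whether name satisfies the DNS label rules
// (including the 63-character limit) recommended for Job names.
func validJobName(name string) bool {
	return len(name) <= 63 && dnsLabel.MatchString(name)
}
```

Returning an error from CreateJobSpec for an invalid name would surface the problem in the operator logs instead of as a rejected Create call.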

Let us visualise the Pod Flow using a diagram:

Job Pod Flow
  • Init container 1: The text-to-md init container decodes the base64-encoded rawText from the PdfDoc spec and saves it as a .md file in the dumpbox shared volume.
  • Init container 2: As soon as the previous container finishes, the md-to-pdf container kicks in. It runs a pandoc image, which converts the markdown document into a PDF document. The PDF file is also saved in the dumpbox shared volume.
  • Main container: This container simply sleeps for 1 hour, which keeps the Pod alive long enough for us to copy the PDF file to our local machine with kubectl cp.

Generating the Manifest files

Now that we have the CRD and the controller in place, let us generate the WebhookConfiguration, ClusterRole and CustomResourceDefinition objects for our controller. Kubebuilder does it for us in one command.

> make manifests

Installing the CRD manifests

Let us install the CRD manifests into the cluster. Kubebuilder again does it for us in one command.

> make install
install the CRDs

Let us verify that the CRD was actually installed into the cluster.

pdfdoc CRD

Testing the operator

Let us test the functionality of this operator before we build an image out of it and deploy it inside a Pod.

  • Ensure that your current terminal session is pointing to your desired Kubernetes cluster.
> go run ./cmd/main.go

This runs the operator locally on your machine, talking to the cluster through your current kubeconfig context. In the logs, you will see that it is waiting for a PdfDoc CR to be created.

Generating a custom PdfDoc CR

We will apply the sample CR config/samples/customtools_v1_pdfdoc.yaml that we edited at the beginning of the blog (kubectl apply -f config/samples/customtools_v1_pdfdoc.yaml).

apply the CR

Let us check whether the corresponding Job and the Pods have been generated by the operator. You can also look at the operator logs for better tracking.

kubectl get all

As you can see, the Job has been created and its Pod has started running the init containers: the status Init:1/2 means that one of the two init containers has completed.

After some time, re-checking gives us:

k get all

The Pod is now Running, meaning that both init containers have finished and only the main container is running. The Job will complete once the main container exits (after 3600s).

Copying the PDF file out from the Pod

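Let us copy the PDF file out of the Pod. The main container mounts the shared volume at /data, so the file will be /data/sabuj-document.pdf, and kubectl cp <pod-name>:/data/sabuj-document.pdf ./sabuj-document.pdf -c mainc copies it to the local machine.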

kubectl copy

Let us open the PDF file now.

PDF file

The PDF file looks good. Hence, we can conclude that our controller is working fine!

Deploying the controller image in the cluster

First, let us build the Docker image of the controller. Kubebuilder reduces building and publishing it to 2 commands.

- sudo make docker-build IMG=sabujjana/pdfcrddoc-operator:v1

docker-build

I will be uploading the image to my public DockerHub account.

- sudo make docker-push IMG=sabujjana/pdfcrddoc-operator:v1

docker-push

Creating a release folder for the final deployment

Ref: https://github.com/JanaSabuj/PdfDocCrdOperator/tree/main/release

We will not need all the project files for the deployment, now that we have the image. Let us list the YAML files we need to deploy the controller image in our cluster.

# release files needed

1. ns.yaml # for the namespace (new file)
2. role.yaml # for the operator roles (taken from the project)
3. role_binding.yaml # for the operator rolebindings (taken from the project)
4. service_account.yaml # SA for the controller (taken from the project)
5. customtools.janasabuj.github.io_pdfdocs.yaml # CRD ( taken from the project)
6. controller.yaml # for the actual controller (new file)
  • Files 2, 3 and 4 are found in config/rbac/.
  • One small modification is needed in role.yaml: since our operator also creates Jobs, it needs RBAC permissions for them too. Kubebuilder did not generate this rule (make manifests only emits rules for the kubebuilder RBAC markers in the controller code, and there was none for Jobs), so we add it explicitly.
# role.yaml (tail)
- apiGroups:
  - batch
  resources:
  - jobs
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
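An alternative to hand-editing role.yaml: kubebuilder generates config/rbac/role.yaml from RBAC markers placed in the controller file above the Reconcile method, so adding a marker for Jobs and re-running make manifests should produce an equivalent rule. A sketch of such a marker, in the same format as the ones kubebuilder scaffolds for the pdfdocs resource:

```go
//+kubebuilder:rbac:groups=batch,resources=jobs,verbs=get;list;watch;create;update;patch;delete
```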
  • File 5 is found at config/crd/bases/customtools.janasabuj.github.io_pdfdocs.yaml.
  • If you look inside the service_account.yaml file, you will see that the SA is created in the namespace named system. So, we will create a ns.yaml for system.
# ns.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: system
  • Similarly, let us create a controller.yaml for the controller image.
# controller.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pdfcrddoc-operator-deployment
  namespace: system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pdfcrddoc-operator
  template:
    metadata:
      labels:
        app: pdfcrddoc-operator
    spec:
      containers:
      - name: pdfcrddoc-operator
        image: sabujjana/pdfcrddoc-operator:v1
        imagePullPolicy: IfNotPresent
      serviceAccountName: controller-manager
  • The final release folder looks like this (it is the only folder we need from now on to deploy this operator in any cluster):
release folder

Deploying the controller in the cluster

Set your cluster context appropriately and apply all the files in this release folder, starting with ns.yaml so that the system namespace exists before the objects that live in it.

After this, you will see that the controller has been deployed as a single Pod in the system namespace.

apply the release folder

Listing the Pods in the newly created system namespace confirms this:

Controller Pod

Now, go ahead and create any new PdfDoc CR in the cluster, and the operator will automatically kick in!

That is the power of K8s operators. You are now operating the cluster with your own Go code logic.

Conclusion

This was a beginner-level operator that we wrote from scratch using kubebuilder. In future blogs, we will focus more on writing advanced operators.

Follow my Medium space for more such writings!

Github code : https://github.com/JanaSabuj/PdfDocCrdOperator

DockerHub Image: https://hub.docker.com/r/sabujjana/pdfcrddoc-operator

Written by Sabuj Jana

Building software @Flipkart . ex-Amazon, Wells Fargo | Follow me for linux, k8s, go, elb, istio, cilium and other intriguing tech | https://janasabuj.github.io
