Building a Kubernetes Operator in Go: Reconciling our PdfDoc CRD to Convert Text to PDF Files
Kubernetes Operators are powerful tools that allow us to use custom resources to manage state in the cluster. They watch for state changes to your Custom Resources (CRs), whose schema you have defined in Custom Resource Definitions (CRDs). Any change to a CR is reconciled by the operator to update the state of the cluster.
And the best part: we can write these operators in Go, and design and tune them to our own needs. The possibilities are endless!
In this blog, we will focus on writing our own custom operator (using the kubebuilder framework) that will watch our own custom CRD (that we will create) and update the cluster state as required.
The entire code is available here: https://github.com/JanaSabuj/PdfDocCrdOperator
Let us first visualise what we want to achieve.
Architecture
Our k8s cluster will be pre-installed with our PdfDoc Custom Resource Definition (CRD).
A user will come and create a PdfDoc Custom Resource (CR) in our cluster, which will have a DocName and the corresponding RawText. As soon as they apply the CR, our custom operator will kick in to reconcile it.
Our PdfDoc operator will pick up the PdfDoc CR spec and start a k8s Job which will spawn a Pod. The Pod will have 2 Init containers and a main container (we will explore later what they do).
Eventually, the final PDF file will be generated and saved in the shared volume.
Steps
Let’s break down the steps we will follow to build this flow:
- Scaffold the entire project using kubebuilder.
- Create a PdfDoc CRD that has 2 fields: DocName and RawText.
- Write the Reconcile function of our operator.
- Make the manifests and install the CRDs in the cluster.
- Test the operator functionality by creating a sample PdfDoc CR and pull out the corresponding PDFs from the Shared Volume.
- Build the operator image, upload to DockerHub and deploy the operator in the cluster.
1. Scaffolding an operator project using kubebuilder
Have you used Spring Boot before? If yes, then you will be familiar with how the Spring Initializr spawns a bare-minimum Spring Boot project in a matter of a few clicks.
For building K8s operators, in comes the kubebuilder project. By simply applying a few commands, we will be bootstrapped with a starter k8s operator project and can start writing the operator logic directly, without worrying too much about the authentication, roles, deployments etc.
The high level of abstraction kubebuilder bootstrap provides is the reason it has become the de facto scaffolder for k8s operators.
a. Init
kubebuilder init \
  --domain janasabuj.github.io \
  --repo github.com/JanaSabuj/PdfDocCrdOperator
This will initialise a new kubebuilder project. The domain becomes part of the API group, and hence of the Kubernetes API endpoint path, for our CRD (we will see this later).
b. Creating our CRD Api Definition
Let us design the schema for the CRD we want. We will name the CRD as PdfDoc and it will contain 2 fields:
- docName: the name of the PDF file we eventually want
- rawText: the raw text that will eventually be converted into PDF
Now that we have the schema design in mind, let us ask kubebuilder to give us the platform for writing this api definition.
> kubebuilder create api --group customtools --version v1 --kind PdfDoc
- Let us name our CRD group customtools, with version v1 (a first iteration would preferably be named something like v1alpha1) and the kind set to the name of the CRD.
- We ask the CLI to create the Resource (which is the CRD) and also set up a controller (the functional component that will be watching for updates to our CRs).
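To make the role of the group, domain and version concrete, here is a small standalone sketch of how they combine into the apiVersion of a CR and into the REST path under which the API server serves namespaced custom resources. The gvk type and its methods are made up for illustration, and the plural resource name pdfdocs is an assumption taken from the name of the generated CRD file.

```go
package main

import "fmt"

// gvk holds the values passed to `kubebuilder init` and `kubebuilder create api`.
// This type is hypothetical and exists only for this illustration.
type gvk struct {
	Group   string // --group
	Domain  string // --domain
	Version string // --version
	Kind    string // --kind
}

// apiVersion returns the value that goes into the apiVersion field of a CR manifest.
func (g gvk) apiVersion() string {
	return fmt.Sprintf("%s.%s/%s", g.Group, g.Domain, g.Version)
}

// namespacedPath returns the REST path for namespaced custom resources.
// plural is the lowercase plural resource name of the CRD.
func (g gvk) namespacedPath(namespace, plural string) string {
	return fmt.Sprintf("/apis/%s.%s/%s/namespaces/%s/%s",
		g.Group, g.Domain, g.Version, namespace, plural)
}

func main() {
	g := gvk{Group: "customtools", Domain: "janasabuj.github.io", Version: "v1", Kind: "PdfDoc"}
	fmt.Println(g.apiVersion())
	fmt.Println(g.namespacedPath("default", "pdfdocs"))
}
```

The first printed line is the same string that appears as apiVersion in the sample CR later in this blog.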
Let us look at the file structure of our project after this.
We will tinker with very few parts of this structure to get our operator up and running. The rest will be taken care of by kubebuilder.
- api/v1 will contain our CRD specs
- The main controller logic will be in internal/controller
- Sample CRs will be added in config/samples
c. Modifying the CRD specs
In the file api/v1/pdfdoc_types.go, we will update our desired CRD spec.
// PdfDocSpec defines the desired state of PdfDoc
type PdfDocSpec struct {
	// INSERT ADDITIONAL SPEC FIELDS - desired state of cluster
	// Important: Run "make" to regenerate code after modifying this file

	// Foo is an example field of PdfDoc. Edit pdfdoc_types.go to remove/update
	//Foo string `json:"foo,omitempty"`

	// DocName defines the name of the PDF document
	DocName string `json:"docName,omitempty"`

	// RawText defines the markdown content of the PDF document
	RawText string `json:"rawText,omitempty"`
}
- As a good practice, we should update the sample CR in parallel, so that we can keep track of the changes being made to our CRD API specs.
- Ref: https://github.com/JanaSabuj/PdfDocCrdOperator/blob/main/config/samples/customtools_v1_pdfdoc.yaml
# customtools_v1_pdfdoc.yaml
apiVersion: customtools.janasabuj.github.io/v1
kind: PdfDoc
metadata:
  labels:
    app.kubernetes.io/name: pdfdoc
    app.kubernetes.io/instance: pdfdoc-sample
    app.kubernetes.io/part-of: pdfdoccrdoperator
    app.kubernetes.io/managed-by: kustomize
    app.kubernetes.io/created-by: pdfdoccrdoperator
  name: pdfdoc-sample
spec:
  # TODO(user): Add fields here
  docName: sabuj-document
  rawText: |
    # Shakespeare
    ---
    > Shall I compare thee to a summer's day?
    > Thou art more lovely and more temperate:
    > Rough winds do shake the darling buds of May,
    > And summer's lease hath all too short a date:
    > Sometime too hot the eye of heaven shines,
    > And often is his gold complexion dimmed;
    > And every fair from fair sometime declines,
    > By chance, or nature's changing course untrimmed:
    > But thy eternal summer shall not fade
    > Nor lose possession of that fair thou owest;
    > Nor shall Death brag thou wanderest in his shade,
    > When in eternal lines to time thou growest:
    > So long as men can breathe or eyes can see,
    > So long lives this, and this gives life to thee.
    ---
    THE END
- Once the operator is up and running in the cluster, we will apply this CR and check whether the desired PDF file is generated.
Writing the Reconciler (Crux of the Operator)
Now we know what resource we will watch in the cluster. Time to write the logic for what we will do once we capture such a resource event, i.e., how we will react and update the cluster state once such a CR is created in the cluster.
Plan
The Reconcile function can be found here: https://github.com/JanaSabuj/PdfDocCrdOperator/blob/81eb3d8b8ca25072cb22ef96d0b50400c95a552d/internal/controller/pdfdoc_controller.go#L54
func (r *PdfDocReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// WithValues returns a new logger, so we keep the result
	lg := log.FromContext(ctx).WithValues("PdfDoc", req.NamespacedName)

	// TODO(user): your logic here

	// get the PdfDoc
	var pdfDoc customtoolsv1.PdfDoc
	if err := r.Get(ctx, req.NamespacedName, &pdfDoc); err != nil {
		lg.Error(err, "Unable to fetch PdfDocument")
		// a NotFound error means the CR was deleted, so there is nothing to do
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// create the JobSpec
	jobSpec, err := r.CreateJobSpec(pdfDoc)
	if err != nil {
		lg.Error(err, "failed to create the desired Job Spec")
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// create the Job
	if err := r.Create(ctx, &jobSpec); err != nil {
		lg.Error(err, "Unable to create Job")
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	return ctrl.Result{}, nil
}
- I have annotated the code with comments for better understanding.
- On getting the CR, a Job is created, which in turn spawns a Pod with init containers and a main container.
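One thing worth keeping in mind: Reconcile can be called more than once for the same CR (for example when the operator restarts or the CR is updated), and creating a Job that already exists returns an error. The sketch below illustrates the idea of an idempotent reconcile with a made-up in-memory fakeCluster; it is not the code from the repository. In a real controller, the equivalent check would be apierrors.IsAlreadyExists(err) from k8s.io/apimachinery/pkg/api/errors.

```go
package main

import (
	"errors"
	"fmt"
)

// errAlreadyExists stands in for the AlreadyExists error of the real API.
var errAlreadyExists = errors.New("already exists")

// fakeCluster is a hypothetical in-memory stand-in for the Kubernetes API.
type fakeCluster struct {
	jobs map[string]bool
}

// createJob fails with errAlreadyExists if a Job with that name is present.
func (c *fakeCluster) createJob(name string) error {
	if c.jobs[name] {
		return errAlreadyExists
	}
	c.jobs[name] = true
	return nil
}

// reconcile makes sure the Job for docName exists.
// Calling it repeatedly leaves the cluster in the same state.
func reconcile(c *fakeCluster, docName string) error {
	err := c.createJob(docName + "-job")
	if errors.Is(err, errAlreadyExists) {
		return nil // desired state already reached, nothing to do
	}
	return err
}

func main() {
	c := &fakeCluster{jobs: map[string]bool{}}
	fmt.Println(reconcile(c, "sabuj-document")) // first call creates the Job
	fmt.Println(reconcile(c, "sabuj-document")) // second call is a no-op
	fmt.Println(len(c.jobs))
}
```

The design choice here is to treat "already exists" as success, because the reconciler only cares that the desired state is reached, not about who created the Job or when.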
Understanding the JobSpec code
The Job creation function has been abstracted out in a separate CreateJobSpec function: https://github.com/JanaSabuj/PdfDocCrdOperator/blob/81eb3d8b8ca25072cb22ef96d0b50400c95a552d/internal/controller/pdfdoc_controller.go#L89
func (r *PdfDocReconciler) CreateJobSpec(doc customtoolsv1.PdfDoc) (batchv1.Job, error) {
	fmt.Println(doc.Spec.DocName, doc.Spec.RawText, "being transformed....")

	// base64-encode the raw text so that it passes through the shell unchanged
	encodedText := base64.StdEncoding.EncodeToString([]byte(doc.Spec.RawText))
	docName := doc.Spec.DocName

	// init container 1 - decode the base64 text and dump it to the volume as a .md file
	initContainer1 := corev1.Container{
		Name:    "text-to-md",
		Image:   "alpine",
		Command: []string{"/bin/sh"},
		Args: []string{
			"-c",
			fmt.Sprintf("echo %s | base64 -d >> /data/%s.md", encodedText, docName),
		},
		VolumeMounts: []corev1.VolumeMount{
			{
				Name:      "dumpbox",
				MountPath: "/data",
			},
		},
	}

	// init container 2 - convert the dumped .md file to pdf using pandoc
	initContainer2 := corev1.Container{
		Name:    "md-to-pdf",
		Image:   "auchida/pandoc",
		Command: []string{"/bin/sh"},
		Args: []string{
			"-c",
			fmt.Sprintf("pandoc -V documentclass=ltjsarticle --pdf-engine=lualatex -o /opt/docs/%s.pdf /opt/docs/%s.md", docName, docName),
		},
		VolumeMounts: []corev1.VolumeMount{
			{
				Name:      "dumpbox",
				MountPath: "/opt/docs",
			},
		},
	}

	// main container - sleeps for an hour so that the pdf can be copied out
	mainContainer := corev1.Container{
		Name:    "mainc",
		Image:   "alpine",
		Command: []string{"/bin/sh", "-c", "sleep 3600"},
		VolumeMounts: []corev1.VolumeMount{
			{
				Name:      "dumpbox",
				MountPath: "/data",
			},
		},
	}

	// volume shared by all three containers
	volume := corev1.Volume{
		Name: "dumpbox",
		VolumeSource: corev1.VolumeSource{
			EmptyDir: &corev1.EmptyDirVolumeSource{},
		},
	}

	// job
	job := batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{
			Name:      doc.Spec.DocName + "-job",
			Namespace: doc.Namespace,
		},
		Spec: batchv1.JobSpec{
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					InitContainers: []corev1.Container{initContainer1, initContainer2},
					Containers:     []corev1.Container{mainContainer},
					Volumes:        []corev1.Volume{volume},
					RestartPolicy:  corev1.RestartPolicyOnFailure,
				},
			},
		},
	}

	return job, nil
}
Let us visualise the Pod flow using a diagram:
- Init container 1: The text-to-md init container takes the rawText field from the PdfDoc spec and saves it as a .md file in the dumpbox shared volume.
- Init container 2: As soon as the previous container finishes, the md-to-pdf container kicks in. It runs a pandoc image, which converts the Markdown document to a PDF document. The PDF file is then stored in the dumpbox shared volume.
- Main container: This container simply sleeps for 1 hour. That gives us time to copy the PDF file out of the Pod to our local machine (for example with kubectl cp).
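The command run by the first init container is easy to try out in isolation. Below is a minimal standalone sketch, assuming the same command format as in CreateJobSpec. The base64 output contains only letters, digits, +, / and =, so the encoded text cannot break the shell command. The docName, however, is interpolated as-is, so the sketch also adds a hypothetical validDocName check (not part of the repository) that restricts it to a DNS-label style name; the 59 character limit is an assumption that leaves room for the "-job" suffix within 63 characters.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"regexp"
)

// dns1123Label matches lowercase alphanumerics and '-', starting and ending
// with an alphanumeric character.
var dns1123Label = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

// validDocName reports whether docName is safe to use in the Job name and in
// the shell command. Hypothetical helper, not part of the original operator.
func validDocName(docName string) bool {
	return len(docName) <= 59 && dns1123Label.MatchString(docName)
}

// textToMdCommand builds the shell command of the text-to-md init container.
func textToMdCommand(docName, rawText string) string {
	encoded := base64.StdEncoding.EncodeToString([]byte(rawText))
	return fmt.Sprintf("echo %s | base64 -d >> /data/%s.md", encoded, docName)
}

func main() {
	raw := "# Shakespeare\n> Shall I compare thee to a summer's day?\n"
	fmt.Println(validDocName("sabuj-document"))
	fmt.Println(validDocName("bad name; rm -rf /"))
	fmt.Println(textToMdCommand("sabuj-document", raw))
}
```

An operator could run such a check at the start of CreateJobSpec and return an error for names it does not accept, instead of creating a Job that fails later.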
Generating the Manifest files
Now that we have the CRD and the controller in place, let us generate the appropriate WebhookConfiguration, ClusterRole and CustomResourceDefinition objects for our controller. Kubebuilder does it for us in one command.
> make manifests
Installing the CRD manifests
Let us install the CRD manifests into the cluster. Kubebuilder again does it for us in one command.
> make install
Let us verify that the CRD was actually installed into the cluster.
Testing the operator
Let us test the functionality of this operator before we build an image out of it and deploy it inside a Pod.
- Ensure that your current terminal session is pointing to your desired Kubernetes cluster.
> go run ./cmd/main.go
This will run the operator locally, connected to the cluster of your current context. In the logs, you will see that it is waiting for a PdfDoc CR to be created.
Generating a custom PdfDoc CR
We will be applying the sample CR config/samples/customtools_v1_pdfdoc.yaml that we edited at the beginning of the blog.
Let us check whether the corresponding Job and the Pods have been generated by the operator. You can also look at the operator logs for better tracking.
As you can see, the Job has been created and the Pod has already started with the init containers. Its status shows Init:1/2.
After some time, on re-checking, we get
The Pod is up and running, meaning that the init containers have finished. Only the main container is running now. The Job will complete once the main container exits (after 3600s).
Copying the PDF file out from the Pod
Let us copy the PDF file out of the Pod. We know that the file will be at /data/sabuj-document.pdf.
Let us open the PDF file now.
The PDF file looks good. Hence, we can conclude that our controller is working fine!
Deploying the controller image in the cluster
First, let us build the Docker image of the controller. Kubebuilder makes it as easy as 2 commands.
- sudo make docker-build IMG=sabujjana/pdfcrddoc-operator:v1
I will be uploading the image to my public DockerHub account.
- sudo make docker-push IMG=sabujjana/pdfcrddoc-operator:v1
- The final image can be viewed here: https://hub.docker.com/r/sabujjana/pdfcrddoc-operator
Creating a release folder for the final deployment
Ref: https://github.com/JanaSabuj/PdfDocCrdOperator/tree/main/release
We will not need all the files for the deployment, now that we have the image. Let us jot down which YAML files we need to deploy the controller image in our cluster.
# release files needed
1. ns.yaml # for the namespace (new file)
2. role.yaml # for the operator roles (taken from the project)
3. role_binding.yaml # for the operator rolebindings (taken from the project)
4. service_account.yaml # SA for the controller (taken from the project)
5. customtools.janasabuj.github.io_pdfdocs.yaml # CRD ( taken from the project)
6. controller.yaml # for the actual controller (new file)
- Files 2, 3 and 4 are found in the config/rbac/ path.
- Just a small modification in role.yaml: since our operator also creates a Job, we need an RBAC rule for that too. This was not generated by Kubebuilder, and hence we need to add it explicitly.
# role.yaml (tail)
- apiGroups:
  - batch
  resources:
  - jobs
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- File 5 is found in config/crd/bases/customtools.janasabuj.github.io_pdfdocs.yaml
- If you look inside the service_account.yaml file, you will see that the SA is created in the namespace named system. So, we will create a ns.yaml for system.
# ns.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: system
- Similarly, let us create a controller.yaml for the controller image.
# controller.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pdfcrddoc-operator-deployment
  namespace: system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pdfcrddoc-operator
  template:
    metadata:
      labels:
        app: pdfcrddoc-operator
    spec:
      containers:
      - name: pdfcrddoc-operator
        image: sabujjana/pdfcrddoc-operator:v1
        imagePullPolicy: IfNotPresent
      serviceAccountName: controller-manager
- The final release folder looks like this (the only folder we need from now on to deploy this operator in any cluster)
Deploying the controller in the cluster
Set your cluster context appropriately and apply all the files in this release folder.
After this, you will see that the system namespace has been created and that the controller has been deployed as a single Pod in it.
Now, go ahead and create any new PdfDoc CR in the cluster and the operator will automatically kick in!
That is the power of K8s operators. You are now operating the cluster with your own Go code logic.
Conclusion
This was a beginner-level operator that we wrote from scratch using kubebuilder. In future blogs, we will focus more on writing advanced operators.
Follow my Medium space for more such writings!
Github code : https://github.com/JanaSabuj/PdfDocCrdOperator
DockerHub Image: https://hub.docker.com/r/sabujjana/pdfcrddoc-operator
