Writing a Kubernetes Operator Using the Operator SDK

Shubhomoy Biswas
Jan 11 · 8 min read

Kubernetes (K8s) operators are a great way to deploy and manage your Kubernetes application.

An operator is basically a construct. In a cloud-native environment, anything that can package, deploy and manage your application in the cloud is an operator.

As developers, it is up to us whether we act as the operator ourselves or let software handle it.

In Kubernetes, we can leverage operators to extend, add and manage Kubernetes-specific functionality and automate administrative tasks as if we were working with a native K8s component.

Various communities have open-sourced many useful operators that achieve specific tasks. For example, CoreOS has released the Prometheus operator for cluster monitoring and the etcd operator for managing etcd database clusters in K8s, among many others.

It took me some time to understand how to write an operator from scratch. After digging through the user guide and going through existing operators, I was finally able to write a simple operator that handles log management in our cluster. So here I’ve put down the basic concepts of writing a simple operator using CoreOS’s Operator SDK.

Once our logging operator is deployed, it will set up and manage an EFK stack in the K8s cluster automatically, i.e. it will deploy Elasticsearch, FluentD and Kibana and keep them up to date whenever their configuration changes. GitHub link here.

Our architecture looks like this.

The Operator SDK released by CoreOS is a brilliant tool for writing your own operator from scratch. It provides the necessary boilerplate code and a high-level abstraction for communicating with the Kubernetes API. This lets you focus on writing code for the specific task at hand.

Writing a custom operator using this SDK involves the following steps:

  1. Design your own Custom Resource yaml file that will be read by the operator.


Install the Operator SDK first.

$ mkdir -p $GOPATH/src/github.com/operator-framework
$ cd $GOPATH/src/github.com/operator-framework
$ git clone https://github.com/operator-framework/operator-sdk
$ cd operator-sdk
$ git checkout master
$ make dep
$ make install

Once this is done, we will create a separate project directory and initialise a new operator.

$ mkdir -p $GOPATH/src/github.com/log_management/
$ cd $GOPATH/src/github.com/log_management/
$ operator-sdk new logging-operator --cluster-scoped
$ cd logging-operator

Note that we have added the --cluster-scoped flag when creating logging-operator. This is because the operator should be able to watch and manage resources cluster-wide, rather than being restricted to the namespace in which it is deployed.

LogManagement — A new kind of K8s resource!

Since our logging operator should be able to deploy and manage the above tools, we need a custom K8s resource for defining our architecture. For this, we use a native K8s component called CustomResourceDefinition, which registers the new kind of resource we will define later and specifies the custom API group to which it belongs.

Run the following to create the same.

$ operator-sdk add api --api-version=logging.example.com/v1alpha1 --kind=LogManagement

This creates a new CustomResourceDefinition of kind LogManagement in the API group logging.example.com at version v1alpha1, and adds the Go code needed to decode your custom resource yaml into a LogManagementSpec object.

I’ve modified logging_v1alpha1_logmanagement_cr.yaml according to the architecture we’ll follow. You can explore and add your own kind of design here.

Next, we update logmanagement_types.go under pkg/apis/ so that the custom resource yaml is decoded into the LogManagementSpec object, which is used throughout the operator.

This is quite simple. Our main object definition is LogManagementSpec, which has members of different types such as Output, Watch, ElasticSearch, etc. LogManagementSpec defines which attribute in the yaml corresponds to which member. Modify it if your yaml design differs from this one.
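As a rough illustration of that mapping, here is a minimal sketch using only the Go standard library. The member names Output, Watch and ElasticSearch come from the text above; every field inside them (namespace, type, replicas) is an assumption made up for this example and not the layout of the original yaml. The struct tags are json tags because Kubernetes tooling converts yaml to JSON before decoding it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// LogManagementSpec is a hypothetical layout for the custom resource's spec.
// Only the member names Output, Watch and ElasticSearch come from the article;
// the fields inside them are assumptions for illustration.
type LogManagementSpec struct {
	Watch         []Watch       `json:"watch"`
	Output        Output        `json:"output"`
	ElasticSearch ElasticSearch `json:"elasticsearch"`
}

// Watch names a namespace whose container logs should be collected (assumed field).
type Watch struct {
	Namespace string `json:"namespace"`
}

// Output says where collected logs are sent (assumed field).
type Output struct {
	Type string `json:"type"`
}

// ElasticSearch holds settings for the Elasticsearch deployment (assumed field).
type ElasticSearch struct {
	Replicas int32 `json:"replicas"`
}

// parseSpec decodes the JSON form of a spec section into a LogManagementSpec.
func parseSpec(raw []byte) (LogManagementSpec, error) {
	var spec LogManagementSpec
	err := json.Unmarshal(raw, &spec)
	return spec, err
}

func main() {
	raw := []byte(`{"watch":[{"namespace":"default"}],"output":{"type":"elasticsearch"},"elasticsearch":{"replicas":1}}`)
	spec, err := parseSpec(raw)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("%+v\n", spec)
}
```

Each tag names the yaml key that fills the member, so adding a new section to the custom resource means adding a tagged member here.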

Make sure to run the below command whenever you make changes to your Custom Resource structure.

$ operator-sdk generate k8s

This regenerates the generated code for your types (such as the deep-copy functions), so that the operator compiles against the new structure.

The second thing we need to take care of is the Manager, which is initialised when the main program of the operator runs. It is located in cmd/manager/main.go

The Manager will automatically register the scheme for all custom resources defined under pkg/apis/... and run all controllers under pkg/controller/...

The Manager restricts the namespace in which all controllers watch for resources. By default, this is the namespace the operator is running in. But here we need to watch all namespaces, so we leave the namespace option empty.

mgr, err := manager.New(cfg, manager.Options{Namespace: ""})

It’s all about “watching” others

With all the specs set, we need to add something that watches our LogManagement resource and performs reconciliation steps when the resources used for log management change. This is done by the controller. To add a new controller, run this

$ operator-sdk add controller --api-version=logging.example.com/v1alpha1 --kind=LogManagement

This will add logmanagement_controller.go, where we’ll write our logic for watching other resources and the reconciliation steps.

Inside logmanagement_controller.go, we first specify which resources to watch in the add method. First we watch our primary resource, LogManagement, for any add/delete/update operation. We also need to watch the following types of resources for the purpose of log management.

  1. Deployment

Here is a gist of the add function

You can add other resources if your operator needs to watch those too.
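The idea behind these watches can be modelled without the SDK. The sketch below is a simplified stand-in, not the controller-runtime API that the generated add function uses: an event on the primary resource enqueues that resource, while an event on a watched secondary resource enqueues the LogManagement that owns it. The Object and Request types, the requestFor helper and all resource names are hypothetical.

```go
package main

import "fmt"

// Object is a stand-in for a Kubernetes object. OwnerKind and OwnerName
// model the owner reference set on resources the operator creates.
type Object struct {
	Kind, Namespace, Name string
	OwnerKind, OwnerName  string
}

// Request identifies the primary resource to reconcile.
type Request struct {
	Namespace, Name string
}

// requestFor maps an event on obj to a reconcile request.
// Events on the primary kind enqueue the object itself; events on a watched
// secondary kind enqueue its owner, provided the owner is a LogManagement.
func requestFor(obj Object, watched map[string]bool) (Request, bool) {
	if obj.Kind == "LogManagement" {
		return Request{Namespace: obj.Namespace, Name: obj.Name}, true
	}
	if watched[obj.Kind] && obj.OwnerKind == "LogManagement" {
		return Request{Namespace: obj.Namespace, Name: obj.OwnerName}, true
	}
	return Request{}, false
}

func main() {
	// Secondary kinds the operator watches (illustrative choice).
	watched := map[string]bool{"Deployment": true, "ConfigMap": true}
	events := []Object{
		{Kind: "LogManagement", Namespace: "logging", Name: "example"},
		{Kind: "Deployment", Namespace: "logging", Name: "elasticsearch", OwnerKind: "LogManagement", OwnerName: "example"},
		{Kind: "Deployment", Namespace: "default", Name: "unrelated"},
	}
	for _, ev := range events {
		if req, ok := requestFor(ev, watched); ok {
			fmt.Printf("%s %s/%s -> reconcile %s/%s\n", ev.Kind, ev.Namespace, ev.Name, req.Namespace, req.Name)
		} else {
			fmt.Printf("%s %s/%s -> ignored\n", ev.Kind, ev.Namespace, ev.Name)
		}
	}
}
```

Mapping secondary events back to the owner keeps the reconcile loop keyed on a single kind of object, however many resource types are watched.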

Change detected! Reconcile

Every controller has a Reconciler object with a Reconcile() method, which is responsible for running the reconcile loop. The reconcile loop runs whenever the operator detects a change in any of the resources registered above.

The Reconcile() method returns a reconcile.Result and an error. Based on these, the request can be requeued and the reconcile loop run again.

There are 3 ways to do it.

  1. If the reconcile is successful and we don’t want to requeue — return reconcile.Result{}, nil

Automate everything

In this example, we want to do the following things

  1. Create the resource templates we want (FluentBit, FluentD, ElasticSearch, etc) in Go (yes, not in yaml files anymore!)

Creating resource template

This is similar to the code we wrote for our LogManagementSpec object. We look at the format of an existing yaml for the resource and map it into Go code. For example, here is a simple deployment for ElasticSearch

If you find it difficult to understand the corresponding Go structure of the yaml, just let your IDE dig into the operator SDK code to find that for you.

For example, for the pod template part, you can see that the structure is under the core package k8s.io/api/core/v1 in PodTemplateSpec struct.
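The following sketch shows what such a yaml-to-Go mapping looks like. To stay self-contained it defines trimmed stand-in types instead of importing k8s.io/api/apps/v1 and k8s.io/api/core/v1, keeping only a handful of fields; in the operator you would use the real types. The elasticsearchDeployment helper, the labels, the namespace and the image string are all placeholders, not values from the original project.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Trimmed stand-ins for the real Kubernetes API types; only a few fields are kept.
type ObjectMeta struct {
	Name      string            `json:"name,omitempty"`
	Namespace string            `json:"namespace,omitempty"`
	Labels    map[string]string `json:"labels,omitempty"`
}

type Container struct {
	Name  string `json:"name"`
	Image string `json:"image"`
}

type PodSpec struct {
	Containers []Container `json:"containers"`
}

type PodTemplateSpec struct {
	Metadata ObjectMeta `json:"metadata"`
	Spec     PodSpec    `json:"spec"`
}

type LabelSelector struct {
	MatchLabels map[string]string `json:"matchLabels"`
}

type DeploymentSpec struct {
	Replicas *int32          `json:"replicas"`
	Selector LabelSelector   `json:"selector"`
	Template PodTemplateSpec `json:"template"`
}

type Deployment struct {
	APIVersion string         `json:"apiVersion"`
	Kind       string         `json:"kind"`
	Metadata   ObjectMeta     `json:"metadata"`
	Spec       DeploymentSpec `json:"spec"`
}

// elasticsearchDeployment builds the template in Go instead of yaml.
// The name, labels and image are placeholders for illustration.
func elasticsearchDeployment(namespace string, replicas int32) Deployment {
	labels := map[string]string{"app": "elasticsearch"}
	return Deployment{
		APIVersion: "apps/v1",
		Kind:       "Deployment",
		Metadata:   ObjectMeta{Name: "elasticsearch", Namespace: namespace, Labels: labels},
		Spec: DeploymentSpec{
			Replicas: &replicas,
			Selector: LabelSelector{MatchLabels: labels},
			Template: PodTemplateSpec{
				Metadata: ObjectMeta{Labels: labels},
				Spec: PodSpec{Containers: []Container{
					{Name: "elasticsearch", Image: "elasticsearch:placeholder-tag"},
				}},
			},
		},
	}
}

func main() {
	out, err := json.MarshalIndent(elasticsearchDeployment("logging", 1), "", "  ")
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(string(out))
}
```

Printing the struct as JSON makes it easy to compare the Go template against the manifest it was derived from, key by key.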

Create the resource if it does not exist

Once your resource templates are created, declare an empty object of the same type for each resource. This object will receive the existing resource if there is one. If the resource does not exist, we create it. Here is an example gist

The steps we follow:

  1. We get the ES deployment template in esDeployment.
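The get-then-create pattern can be sketched as follows. This is not the original gist: it uses an in-memory fakeCluster in place of the SDK's Kubernetes client and a local errNotFound in place of the API server's not-found error, and ensureDeployment is a hypothetical helper name.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound plays the role of the API server's "not found" error.
var errNotFound = errors.New("not found")

// Deployment is a minimal stand-in for a Deployment object.
type Deployment struct {
	Namespace, Name string
	Replicas        int32
}

// fakeCluster is an in-memory stand-in for the Kubernetes API client.
type fakeCluster struct {
	deployments map[string]Deployment
}

func key(ns, name string) string { return ns + "/" + name }

// Get copies the stored object into out, or returns errNotFound.
func (c *fakeCluster) Get(ns, name string, out *Deployment) error {
	d, ok := c.deployments[key(ns, name)]
	if !ok {
		return errNotFound
	}
	*out = d
	return nil
}

// Create stores a new object.
func (c *fakeCluster) Create(d Deployment) error {
	c.deployments[key(d.Namespace, d.Name)] = d
	return nil
}

// ensureDeployment creates desired only when it does not exist yet.
// It reports whether a create happened.
func ensureDeployment(c *fakeCluster, desired Deployment) (bool, error) {
	found := Deployment{} // empty object that receives the existing resource
	err := c.Get(desired.Namespace, desired.Name, &found)
	if errors.Is(err, errNotFound) {
		return true, c.Create(desired)
	}
	if err != nil {
		return false, err // any other error: let the caller requeue
	}
	return false, nil // already exists, nothing to do
}

func main() {
	cluster := &fakeCluster{deployments: map[string]Deployment{}}
	esDeployment := Deployment{Namespace: "logging", Name: "elasticsearch", Replicas: 1}
	for i := 0; i < 2; i++ {
		created, err := ensureDeployment(cluster, esDeployment)
		fmt.Println("created:", created, "err:", err)
	}
}
```

Because the function does nothing when the resource already exists, it is safe to call on every pass of the reconcile loop.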

Handle ConfigMap Update

This is quite simple: use the DeepEqual method to check whether the existing config map is the same as the desired one. If it is not, update the config map resource.

If the resource needs to be restarted, just delete it and requeue the request to run the reconciliation loop again. This will create the resource again (provided you’ve written that resource creation code above).

Here is an example of updating the FluentBit configMap
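A minimal sketch of that comparison, assuming a stand-in ConfigMap type that keeps only a name and the Data map, could look like this. It is not the original gist. reflect.DeepEqual is the standard library function; syncConfigMap and the fluent-bit.conf key are made up for the example, and in the operator the change would be written back through the Kubernetes client rather than assigned in memory.

```go
package main

import (
	"fmt"
	"reflect"
)

// ConfigMap is a stand-in that keeps only the name and Data of a real ConfigMap.
type ConfigMap struct {
	Name string
	Data map[string]string
}

// syncConfigMap overwrites existing.Data with desired.Data when they differ.
// It returns true when an update was needed, which is the caller's cue to
// restart the component and requeue the request.
func syncConfigMap(existing *ConfigMap, desired ConfigMap) bool {
	if reflect.DeepEqual(existing.Data, desired.Data) {
		return false
	}
	existing.Data = desired.Data
	return true
}

func main() {
	existing := &ConfigMap{
		Name: "fluent-bit-config",
		Data: map[string]string{"fluent-bit.conf": "old settings"},
	}
	desired := ConfigMap{
		Name: "fluent-bit-config",
		Data: map[string]string{"fluent-bit.conf": "new settings"},
	}
	fmt.Println("updated:", syncConfigMap(existing, desired))
	fmt.Println("updated:", syncConfigMap(existing, desired))
}
```

The second call finds nothing to change, which is what stops the reconcile loop from restarting the component over and over.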

When you’ve created the resource templates and added the reconciliation logic, it’s time to run the operator.


Before running our operator, we first need to tell Kubernetes about the new Custom Resource type our operator will use. So deploy the CustomResourceDefinition auto-generated by the SDK.

$ kubectl create -f deploy/crds/logging_v1alpha1_logmanagement_crd.yaml

For debugging purposes, we can run the operator as a Go program outside our cluster. This saves us from building a Docker image every time we recompile our code.

$ operator-sdk up local

This will run the operator code as a Go program outside your cluster and provide you with the log output. Note that this will point to your K8s cluster defined in your ~/.kube/config.

Once the operator is running, we can apply our Custom Resource yaml file — logging_v1alpha1_logmanagement_cr.yaml.

$ kubectl apply -f deploy/crds/logging_v1alpha1_logmanagement_cr.yaml

Once deployed, check the operator logs. In our case, the operator deploys all the necessary tools and configures them to show container logs in our Kibana!

Since we’ve added the logic to watch for ConfigMap changes, any change under the parser or watch sections of the logging_v1alpha1_logmanagement_cr.yaml file is picked up by the operator, which updates the ConfigMaps of the corresponding components (FluentBit or FluentD) and restarts the deployments automatically.

Please note that the Operator SDK version I’ve used is v0.2.0, which is an alpha release. There might be changes in later versions of the SDK, but the core concepts remain the same.

I hope this helps you get started with writing your own K8s operators. Feedback is welcome :)

Go Cloud Native!