Kickstarting your First Kubernetes Operator

Ankur Soni
Published in Momenton
6 min read · Sep 6, 2021

Kubernetes Operators are a powerful mechanism for extending Kubernetes to perform application-specific functions. Operators can seem like magic, automating tasks that keep Kubernetes clusters healthy and optimal. Getting started with Operators may seem daunting (after all, Kubernetes is complex), but they are worth the effort to understand, to use and perhaps even to build yourself.

Read on to see how Operators can improve your Kubernetes experience and optimise the way you work with your Kubernetes cluster. This article paints a picture of why Operators matter, touching on what they are and when to use them, the existing operator ecosystem from both an implementation and a development lens, and some best practices for building your own Kubernetes Operator.

Why use Kubernetes Operators?

Operators are useful when there is a need to automate Kubernetes resources on an ongoing basis. This automation, once bundled as controller code together with its custom resource definition (CRD), forms a Kubernetes Operator.

‘Operators make use of custom resources to manage applications and their components. Operators follow Kubernetes principles, notably the control loop.’ [1]

- Kubernetes.io.

Operators allow an application to interact with the Kubernetes API, with behaviour driven by the application's own specific manifest, namely a custom resource. Figure 1 provides an overview of how Operators work, from a user modifying a custom resource through the reconciliation loop that keeps the current state in sync with the desired state.

Figure 1: Operator concept (source: blog.container-solutions.com)
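To make the control loop concrete, below is a minimal sketch of a reconcile function written with controller-runtime (the library that Kubebuilder builds on). The MyApp kind, its module path and its fields are placeholders rather than a real published API.

```go
package controllers

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"

	appsv1alpha1 "example.com/myapp-operator/api/v1alpha1" // hypothetical module path
)

// MyAppReconciler reconciles MyApp custom resources.
type MyAppReconciler struct {
	client.Client
}

// Reconcile is called whenever a watched MyApp object (or one of its child
// objects) changes, and is responsible for moving the current state towards
// the desired state declared in the custom resource.
func (r *MyAppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// 1. Read the custom resource that triggered this reconcile.
	var app appsv1alpha1.MyApp
	if err := r.Get(ctx, req.NamespacedName, &app); err != nil {
		// The resource may have been deleted in the meantime; nothing to do.
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// 2. Observe the current state of the objects the operator owns
	//    (Deployments, ConfigMaps, Services, ...) via the Kubernetes API.

	// 3. Create, update or delete child objects so the current state matches
	//    app.Spec, then requeue if further work is expected later.
	return ctrl.Result{}, nil
}
```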

You can use Kubernetes objects such as pods, config maps, and secrets to run your application. For simple applications, this is just fine. But what if you want an overarching management layer that observes these native Kubernetes objects and changes their state based on events? The simplest example of such a need was first expressed by distributed database clusters running natively on Kubernetes:

These operators encapsulate the automation code needed to scale a distributed database cluster out or in, without any manual intervention in the StatefulSets or other Kubernetes configuration required to reach the desired state.

A more complex example is the Elasticsearch operator, which spins up the many application pods, services and StatefulSets that make up its search, visualisation and log storage stack, reflecting the power of operators in the Kubernetes world.

Head to OperatorHub.io, a community hub for sharing operators across numerous categories such as AI/ML, database, cloud, networking, integration and OpenShift.

Figures 2 and 3 show an example that I built, describing how to run a machine learning workflow in Kubernetes. It is a Go-based workflow that runs either on the command line or on Kubernetes with the help of a custom operator, giving you a quick, automated data pipeline for your machine learning projects from an MLOps perspective.

Machine learning and similar programs need a workflow engine to orchestrate the steps of a pipeline, such as processing data, training a model and evaluating it. Popular CI platforms (like Jenkins) can sometimes run such pipelines, but they are better suited to application-centric build and integration pipelines. Then there are powerful workflow engines like Apache Airflow and Argo Workflows that provide rich workflow management capabilities. My motivation for building a new workflow engine, Roiergasias, is simply to:

1. Provide a very simple declarative workflow that keeps the machine learning engineer in the driver's seat, with complete control of the environment in which each step runs.
2. Reuse the capabilities of a Kubernetes operator as the core driver for the workflow engine, thereby making it Kubernetes-native.

Beyond this, the operator has a unique selling point over others: it allows a workflow to be split into smaller workflows that can run on multiple worker nodes.

The example given here starts with a set of Python scripts (one for each stage: process data, train model and evaluate model), along with unprocessed source data preloaded into an S3 bucket.
A simple custom resource manifest YAML then gives the instructions to the workflow engine, as shown below:

Figure 2: Example of a Custom Resource manifest
Figure 3: Example of an Operator https://github.com/ankursoni/kubernetes-operator-roiergasias
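Since the manifest in Figure 2 appears only as an image, the Go types below sketch one possible shape for such a workflow spec, inferred from the sequence of steps described next; the real Roiergasias schema may well differ.

```go
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// WorkflowTask is one split of the workflow, pinned to a worker node.
// Field names here are illustrative, not the actual Roiergasias API.
type WorkflowTask struct {
	Name    string   `json:"name"`              // e.g. "process-data", "train-model", "evaluate-model"
	Node    string   `json:"node"`              // worker node the split should run on, e.g. "node1"
	Command []string `json:"command,omitempty"` // the Python script invocation for this stage
}

// WorkflowSpec lists the splits to be executed in order.
type WorkflowSpec struct {
	Tasks []WorkflowTask `json:"tasks"`
}

// Workflow is the custom resource the operator reconciles.
type Workflow struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`
	Spec              WorkflowSpec `json:"spec,omitempty"`
}
```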

Notice how the sequence of actions unfolds:
1. Create config map 1 + job 1 for split workflow — “process data” on “node1”.
2. Wait for job 1 to complete.
3. Create config map 2 + job 2 for split workflow — “train model” on “node2”
4. Wait for job 2 to complete.
5. Create config map 3 + job 3 for split workflow — “evaluate model” on “node2”.
6. Wait for job 3 to complete.
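A minimal sketch of how an operator can drive such a sequence is shown below. It reuses the hypothetical WorkflowTask type from the earlier sketch and is not the actual Roiergasias implementation: for each split it ensures a config map and job exist, and only moves to the next split once the current job has succeeded.

```go
package controllers

import (
	"context"
	"fmt"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// reconcileTasks walks the split workflows in order: create config map + job
// for the current split, wait (by requeueing) until the job succeeds, then
// move on to the next split.
func reconcileTasks(ctx context.Context, c client.Client, namespace string, tasks []WorkflowTask) (ctrl.Result, error) {
	for i, task := range tasks {
		name := fmt.Sprintf("workflow-split-%d", i+1)

		var job batchv1.Job
		err := c.Get(ctx, types.NamespacedName{Namespace: namespace, Name: name}, &job)
		if apierrors.IsNotFound(err) {
			// Config map holding this split's workflow definition.
			cm := corev1.ConfigMap{
				ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: namespace},
				Data:       map[string]string{"task": task.Name},
			}
			if err := c.Create(ctx, &cm); err != nil && !apierrors.IsAlreadyExists(err) {
				return ctrl.Result{}, err
			}

			// Job pinned to the requested worker node.
			job = batchv1.Job{
				ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: namespace},
				Spec: batchv1.JobSpec{
					Template: corev1.PodTemplateSpec{
						Spec: corev1.PodSpec{
							NodeName:      task.Node,
							RestartPolicy: corev1.RestartPolicyNever,
							Containers: []corev1.Container{{
								Name:    "task",
								Image:   "example/roiergasias:latest", // placeholder image
								Command: task.Command,
							}},
						},
					},
				},
			}
			if err := c.Create(ctx, &job); err != nil {
				return ctrl.Result{}, err
			}
			// Check the job's progress on the next reconcile.
			return ctrl.Result{Requeue: true}, nil
		} else if err != nil {
			return ctrl.Result{}, err
		}

		// Wait for the current job to complete before starting the next split.
		if job.Status.Succeeded == 0 {
			return ctrl.Result{Requeue: true}, nil
		}
	}
	// All splits have completed; nothing left to do.
	return ctrl.Result{}, nil
}
```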

You may refer to this article for a more detailed explanation.

Applying the Operator Framework

Not all application stacks require an Operator. The flowchart below provides a decision process for understanding whether an application is a candidate for automation using an Operator:

Figure 4: Process for determining if you need an Operator

There are many options for building your first operator. Kubebuilder, for instance, provides a quick tutorial and easy scaffolding that generates the various components of an operator codebase, such as the CRD API types and the controller.
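As an illustration, the controller stub that Kubebuilder scaffolds looks roughly like the following; the group, kind and module path below are placeholders for whatever you generate in your own project.

```go
package controllers

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"

	batchv1alpha1 "example.com/workflow-operator/api/v1alpha1" // hypothetical module path
)

// WorkflowReconciler reconciles Workflow objects.
type WorkflowReconciler struct {
	client.Client
}

// RBAC markers like the one below are turned into Role/ClusterRole manifests
// by the Kubebuilder tooling.
//+kubebuilder:rbac:groups=batch.example.com,resources=workflows,verbs=get;list;watch;create;update;patch;delete

// Reconcile is scaffolded empty; the operator's logic is filled in here.
func (r *WorkflowReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	return ctrl.Result{}, nil
}

// SetupWithManager registers the controller with the manager so that it
// watches Workflow objects and triggers Reconcile when they change.
func (r *WorkflowReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&batchv1alpha1.Workflow{}).
		Complete(r)
}
```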

Momenton’s Best Practices

Setting up your first Kubernetes Operator can be a daunting task. From our experience, the considerations and best practices below should help you get started. These practices are not exhaustive; things like declarative APIs and leveraging SDKs are also important considerations for Operators.

Firstly, Operators should be specific to a single application. For example, Airflow is normally used with MySQL and Redis. You could develop an operator that automates the features of an application for all three: Airflow, MySQL and Redis. However, it is better to build three operators — one for each application. It gives you the flexibility to swap out MySQL or Redis for another database. Try to break down an operator into its smallest component that still provides value. This promotes separation of concerns, domain isolation and greater flexibility for the operator.

Secondly, the reconciliation code in the operator should be stateless and depend only on the current state of the Kubernetes objects returned by the Kubernetes API. Idempotency, meaning “the ability to apply the same operation multiple times without changing the result beyond the first try”, is crucial in reconciliation code. The Operator should achieve this by piggy-backing on the controllers of native resources (config maps, secrets and so on) rather than creating custom resources for every need of the application. This promotes maximum reuse and minimises custom resource management, i.e. reconciliation overhead [2, 3, 4].
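As a rough illustration of that idempotency, controller-runtime's controllerutil.CreateOrUpdate lets the reconcile loop declare a native object (here a config map with placeholder names) on every pass without tracking any state of its own:

```go
package controllers

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// ensureConfigMap can be called on every reconcile: it creates the config map
// if it is missing, updates it only when the data differs, and otherwise does
// nothing, so repeated calls do not change the result.
func ensureConfigMap(ctx context.Context, c client.Client, namespace, name string, data map[string]string) error {
	cm := &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: namespace},
	}
	_, err := controllerutil.CreateOrUpdate(ctx, c, cm, func() error {
		// The mutate function declares the desired state; the actual
		// diffing and API calls are handled by controller-runtime.
		cm.Data = data
		return nil
	})
	return err
}
```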

Reach out to hello@momenton.com.au or send us a message on LinkedIn to understand how Momenton can help your organisation and how we can help mature delivery of your products.

References

Cloud DevOps Engineer | AWS Certified | Certified Kubernetes Administrator | Ex — Microsoft | Connect here: https://www.linkedin.com/in/ankursoniji/