Comparing the Kubernetes Operator Pattern with alternatives

This post compares the Kubernetes Operator pattern with the alternatives available when building tailored workflows for your application platform on Kubernetes.

Kubernetes Operators are first-class citizens of a Kubernetes cluster. An Operator is an application-specific controller that extends Kubernetes to create, configure, and manage instances of complex applications. Several such Operators exist today. Newer Kubernetes-native software such as Jenkins X also relies on Kubernetes Custom Resources and the Operator pattern.

Let us explore how Kubernetes Operators compare with the alternatives using a real-life example. Recently, we have been investigating how to set up a Kubernetes-based application platform for WisdmLabs. They want to use Postgres as the backend database in their platform. The team needs a separate Postgres instance per customer. Each instance needs to satisfy the following requirements:

  1. Initialization requirements: A Postgres instance needs to be initialized with specific databases and users.
  2. Workflow actions: It should be possible to perform the following actions on an existing Postgres instance:
    - modify a user’s password
    - create new databases
    - create new users
    - delete existing users

The problem is that, with multiple Postgres instances to set up, the process of initializing instances and performing workflow actions on them needs to be repeatable and easy to follow.

To solve this problem, we investigated two approaches:

  1. Baseline approach
  2. Building and using a Kubernetes Postgres Operator

Here is a pictorial representation of the two approaches:

[Figure: Approaches overview]

Baseline approach

In the baseline approach, our goal was to investigate and use mechanisms that are already available in Kubernetes (or in its ecosystem) to solve the above-mentioned problem.

Three mechanisms exist for satisfying the initialization requirements, as shown in the following figure.

[Figure: Initialization options]

  1. Customer-specific Postgres images: This option consists of building a custom Postgres image from a customer-specific Dockerfile. The official Postgres Docker image supports creating a custom database, user, and password through the environment variables POSTGRES_DB, POSTGRES_USER, and POSTGRES_PASSWORD. We set these variables in the Dockerfile for each customer. The advantage of this option is that it is the easiest to get started with. The drawback is that a separate container image needs to be built for each customer.
  2. PostStart container lifecycle hook: This option consists of using a PostStart container lifecycle hook to satisfy the initialization requirements. A custom script is developed that creates databases, users, etc. based on the data passed to it at runtime. A custom Postgres container image is built once with this script embedded in it. Then, for each customer, the required data (database names, usernames, etc.) is generated and provided as input to the script through environment variables. The advantage of this approach is that only one custom Postgres image needs to be built, and it can be used for all customers.
  3. Postgres Helm chart with post-install hook: This option consists of using the Postgres Helm chart and enhancing it with a post-install hook. The hook works similarly to the container lifecycle hook mentioned above. The main complexity in this approach is learning and using Helm correctly.
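
To make the second option more concrete, here is a minimal sketch of what such an initialization script could look like. It is not the script from our experiments: the variable names INIT_DATABASES and INIT_USERS and their comma-separated format are assumptions made for illustration, and the sketch only generates SQL text rather than connecting to Postgres.

```python
import os


def quote_ident(name: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote a SQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_init_statements(env: dict) -> list:
    """Turn comma-separated environment variables into SQL statements."""
    statements = []
    # INIT_DATABASES is a hypothetical variable, e.g. "orders,billing".
    for db in filter(None, env.get("INIT_DATABASES", "").split(",")):
        statements.append(f"CREATE DATABASE {quote_ident(db.strip())};")
    # INIT_USERS is a hypothetical variable, e.g. "alice:pw1,bob:pw2".
    for entry in filter(None, env.get("INIT_USERS", "").split(",")):
        user, _, password = entry.strip().partition(":")
        statements.append(
            f"CREATE USER {quote_ident(user)} "
            f"WITH PASSWORD {quote_literal(password)};"
        )
    return statements


if __name__ == "__main__":
    # Inside the hook, this output would be piped to psql in the container.
    print("\n".join(build_init_statements(dict(os.environ))))
```

Because the per-customer data arrives only through the environment, the same image can serve every customer. Note that this simple format cannot represent passwords containing a comma or a colon; a real script would need a more robust input format.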

Once a Postgres instance has been created and initialized, we have to resort to out-of-band automation (i.e. automation that runs outside the Kubernetes cluster) to perform the required workflow actions. At a high level, this automation consists of keeping track of which Postgres Deployment/StatefulSet belongs to which customer, and performing the required workflow actions through the corresponding Service’s public IP when needed.
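
The bookkeeping this implies can be sketched as follows. This is an illustration only, under stated assumptions: the registry contents, the action names, and the plan_action helper are all hypothetical, and the addresses come from a range reserved for documentation. The sketch returns the target host and the SQL to run instead of opening a database connection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    workload: str  # name of the customer's Deployment or StatefulSet
    host: str      # public IP of the corresponding Service


# Hypothetical registry that the automation has to maintain by hand.
REGISTRY = {
    "customer-a": Instance("postgres-customer-a", "203.0.113.10"),
    "customer-b": Instance("postgres-customer-b", "203.0.113.11"),
}


def _ident(name: str) -> str:
    """Quote a SQL identifier."""
    return '"' + name.replace('"', '""') + '"'


def _literal(value: str) -> str:
    """Quote a SQL string literal."""
    return "'" + value.replace("'", "''") + "'"


# One SQL template per workflow action listed in the requirements.
ACTIONS = {
    "change_password": lambda p: (
        f"ALTER USER {_ident(p['user'])} "
        f"WITH PASSWORD {_literal(p['password'])};"
    ),
    "create_database": lambda p: f"CREATE DATABASE {_ident(p['database'])};",
    "create_user": lambda p: (
        f"CREATE USER {_ident(p['user'])} "
        f"WITH PASSWORD {_literal(p['password'])};"
    ),
    "delete_user": lambda p: f"DROP USER {_ident(p['user'])};",
}


def plan_action(customer: str, action: str, **params) -> tuple:
    """Return (host, sql) for a workflow action on a customer's instance."""
    instance = REGISTRY[customer]
    return instance.host, ACTIONS[action](params)
```

For example, plan_action("customer-a", "delete_user", user="bob") yields the host to connect to and the statement to execute. The registry and the action code both live outside the cluster and have to be kept in sync with it.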

Kubernetes Postgres Operator

In this approach, we built a Kubernetes Postgres Operator. It creates Postgres instances, initializes them, and performs workflow actions on already provisioned instances whenever requested. The Operator defines a Custom Resource for Postgres and uses a diff-based implementation to reconcile the existing state of an instance with the desired state. We followed REST PUT semantics when implementing updates; that is, each update supplies the complete desired state rather than a partial change. We considered different Patch options (e.g. JSON Patch and Strategic Merge Patch) but decided to go with PUT semantics since they are well known and straightforward to understand.
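
The idea behind diff-based reconciliation with PUT semantics can be sketched in a few lines. This is a simplified illustration, not the Operator's actual code: the reconcile function, the shape of the state dictionaries, and the action names are assumptions, and passwords are compared as plain strings purely to keep the example short.

```python
def reconcile(current: dict, desired: dict) -> list:
    """Compute the actions that move `current` towards `desired`.

    Both arguments have the (assumed) shape
    {"databases": [name, ...], "users": {name: password, ...}}.
    `desired` is the complete spec, as PUT semantics requires.
    """
    actions = []

    # Create databases that are requested but do not exist yet.
    current_dbs = set(current.get("databases", []))
    for db in desired.get("databases", []):
        if db not in current_dbs:
            actions.append(("create_database", db))

    current_users = current.get("users", {})
    desired_users = desired.get("users", {})

    # Create new users and change passwords that differ.
    for name, password in desired_users.items():
        if name not in current_users:
            actions.append(("create_user", name))
        elif current_users[name] != password:
            actions.append(("change_password", name))

    # Users absent from the complete desired spec are deleted.
    for name in current_users:
        if name not in desired_users:
            actions.append(("delete_user", name))

    return actions


current = {"databases": ["orders"], "users": {"alice": "pw1", "bob": "pw2"}}
desired = {"databases": ["orders", "billing"],
           "users": {"alice": "new", "carol": "pw3"}}
for action in reconcile(current, desired):
    print(action)
```

Since the desired state is always complete, leaving a user out of it is itself an instruction to delete that user. The sketch never drops databases because deleting databases is not among the workflow actions listed earlier. Applying the same desired state twice produces no actions the second time.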

Using this Operator has the following advantages over the baseline approach.

(a) Declarative inputs: The required databases and users are specified declaratively in the Spec of the Postgres Custom Resource.
(b) Declarative updates: Performing workflow actions on an existing Postgres instance is straightforward. Updating an instance amounts to updating the relevant declarative attributes in the Custom Resource YAML with new data and then re-applying it using kubectl. The Operator’s diff-based reconciliation logic ensures that only the required changes are made to a Postgres instance.
(c) No out-of-band custom automation needed: All the workflow actions are embedded in the Operator code. The Operator monitors create/update events for the Custom Resource and performs the required actions.
(d) Kubernetes-native: All the database setup and modification actions are done using ‘kubectl’. There is no need to use any other CLI.

Details of our experiments, in which we implemented some of the above-mentioned options, are available here.

Teams often complain that writing Kubernetes Operators is complicated compared to the alternatives. This should start changing with projects such as Kubebuilder and Operator SDK, which aim to lower the barrier to writing Kubernetes Operators.

In summary, our recommendation is this: if you need to perform any sort of initialization and workflow actions on platform elements in your Kubernetes cluster, consider embedding them in a Kubernetes Operator. That will make the process simple, repeatable, and easy to follow.