Comparing Kubernetes Operators with in-house scripts to build Platform automation
With the wide adoption of Kubernetes, enterprise challenges have progressed from deploying and configuring Kubernetes clusters to building and managing the desired platform stacks on Kubernetes. Teams adopting Kubernetes typically start by exploring a third-party Platform-as-a-Service (PaaS) solution built on Kubernetes, but soon realize that the platform elements chosen by the PaaS provider are not optimal for their needs. They then move to a DIY approach for building their platform stack. We have seen Kubernetes adopters create a number of snowflake scripts (using bash, Python, etc.) to create and manage their platform stacks on Kubernetes. This DIY approach of building platform automation with in-house scripts carries significant development and maintenance costs.
Kubernetes Custom Resource Definitions (CRDs) together with their custom controllers, popularly known as Operators, can act as a great replacement for many in-house scripts. Platform stacks built with one or more Operators not only prove efficient from the perspective of investment in platform engineering, but also offer inherent design benefits over in-house script-based automation. Read on to understand why CRDs/Operators are better than in-house scripts that are external to Kubernetes when building a platform layer on Kubernetes.
What is a Kubernetes CRD / Operator?
A Kubernetes Custom Resource Definition (CRD) / Operator consists of a new Type (Kind) and a backing controller. The controller is written to handle typical CRUD operations on instances of that Type (Kind). Once the custom Type definition and the controller are added to the cluster, the new Type becomes a first-class citizen of that cluster. You can then use Kubernetes mechanisms such as ‘kubectl’, Service Accounts, RBAC, helm tooling, etc. with the new Type. As an example, you can define a Type called ‘Postgres’, and the corresponding controller can be written to perform workflow actions such as provisioning a Postgres database container, updating it by adding or removing users, and deleting it. You install a CRD/Operator in your cluster by building a container image for its controller and running a Pod/Deployment with that container. There are more than 400 open source Operators on GitHub today. You can find Operators for databases (MySQL, Postgres, etc.), messaging and streaming systems (Redis, Kafka), AI/ML systems (Kubeflow, Spark), SSL certificate management, volume backup/restore, cloud resource provisioning, etc.
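To make the ‘Postgres’ example concrete, here is a minimal sketch of what registering such a Type and creating one instance of it could look like. The API group `example.com`, the plural name `postgreses`, the instance name `orders-db`, and the `users` field are assumptions made for illustration and are not taken from any particular Operator. The script only builds the two manifests and prints them as JSON, a format that `kubectl apply -f` accepts alongside YAML.

```python
import json


def build_postgres_crd(group="example.com"):
    """Build a CustomResourceDefinition for a hypothetical 'Postgres' Kind."""
    return {
        "apiVersion": "apiextensions.k8s.io/v1",
        "kind": "CustomResourceDefinition",
        # A CRD's name must have the form "<plural>.<group>".
        "metadata": {"name": "postgreses." + group},
        "spec": {
            "group": group,
            "scope": "Namespaced",
            "names": {
                "kind": "Postgres",
                "singular": "postgres",
                "plural": "postgreses",
            },
            "versions": [
                {
                    "name": "v1",
                    "served": True,
                    "storage": True,
                    "schema": {
                        "openAPIV3Schema": {
                            "type": "object",
                            "properties": {
                                "spec": {
                                    "type": "object",
                                    "properties": {
                                        # Hypothetical field: users the controller should create.
                                        "users": {
                                            "type": "array",
                                            "items": {"type": "string"},
                                        },
                                    },
                                },
                            },
                        }
                    },
                }
            ],
        },
    }


def build_postgres_instance(name, users, group="example.com"):
    """Build one instance (custom resource) of the hypothetical Postgres Kind."""
    return {
        "apiVersion": group + "/v1",
        "kind": "Postgres",
        "metadata": {"name": name},
        "spec": {"users": list(users)},
    }


if __name__ == "__main__":
    print(json.dumps(build_postgres_crd(), indent=2))
    print(json.dumps(build_postgres_instance("orders-db", ["alice", "bob"]), indent=2))
```

The CRD only teaches the API server about the new Type. Nothing happens to a ‘Postgres’ instance until a controller that watches this Type is also running in the cluster.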
Why are CRDs/Operators better than ad-hoc scripts for platform automation?
Here are five reasons why you should consider using Kubernetes CRDs/Operators when building your platform stack.
1. State maintenance:
Any platform automation script that is external to Kubernetes will typically need some sort of database to maintain state about the actions performed, their statuses, etc. When the platform automation exists outside of Kubernetes, it is your responsibility to provision and maintain the database for this state management. If you are using a Kubernetes CRD/Operator, the state maintenance mechanism comes for free: instances of the custom Type are stored, like any other Kubernetes object, in the etcd store of the Kubernetes cluster itself.
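As a rough illustration of state living on the object itself, the sketch below records the outcome of an action in the resource's `status` and uses the common `metadata.generation` / `status.observedGeneration` convention to decide whether work is pending. A plain dictionary stands in for the object that the API server would persist in etcd, and the `phase` field and the `orders-db` resource are hypothetical.

```python
import copy


def needs_reconcile(resource):
    """Return True when the controller has not yet acted on the current generation."""
    observed = resource.get("status", {}).get("observedGeneration")
    return observed != resource["metadata"]["generation"]


def mark_reconciled(resource, phase):
    """Return a copy of the resource with the outcome recorded in its status."""
    updated = copy.deepcopy(resource)
    updated["status"] = {
        "phase": phase,  # hypothetical field name
        "observedGeneration": resource["metadata"]["generation"],
    }
    return updated


if __name__ == "__main__":
    db = {
        "kind": "Postgres",
        "metadata": {"name": "orders-db", "generation": 2},
        "spec": {"users": ["alice"]},
    }
    print(needs_reconcile(db))   # True: no status recorded yet
    db = mark_reconciled(db, "Ready")
    print(needs_reconcile(db))   # False: status matches generation 2
```

In a real controller the updated status would be written back through the Kubernetes API, so no separate database needs to be provisioned for it.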
2. Configuration and control:
You will typically need some configurability/control for your platform automation. The standard mechanisms used for this are command-line flags or configuration files. These mechanisms are hard to manage and to build further automation on. With a CRD/Operator, the Spec definition of a Type is the mechanism through which you provide the various configuration inputs. Moreover, Spec definitions are declarative, which makes it straightforward to define the end goal of the automation rather than ‘how’ to get there.
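The difference between declaring the end goal and scripting the ‘how’ can be sketched as follows. The user states only the desired list of database users in the Spec, and the controller derives the steps by comparing it with what currently exists. The function name and the `add`/`remove` keys are illustrative assumptions, not part of any Operator's API.

```python
def plan_user_changes(desired_users, observed_users):
    """Compute the actions needed to move from the observed state to the desired one."""
    desired = set(desired_users)
    observed = set(observed_users)
    return {
        "add": sorted(desired - observed),
        "remove": sorted(observed - desired),
    }


if __name__ == "__main__":
    spec_users = ["alice", "carol"]      # what the Spec declares
    existing_users = ["alice", "bob"]    # what the database currently has
    print(plan_user_changes(spec_users, existing_users))
    # {'add': ['carol'], 'remove': ['bob']}
```

Because the plan is computed from the two states, applying the same Spec again yields no further actions, which is harder to guarantee with a sequence of imperative command-line calls.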
3. Discovery of supported functionality:
For typical script-based platform automation, the way to discover its capabilities varies: ‘help’ commands, internal shared documentation, source code comments, or even tribal knowledge. With CRDs/Operators, you can easily find information about the Spec properties of Custom Types using the OpenAPI Spec definitions that the Kubernetes API Server publishes. Moreover, you can leverage tools like KubePlus that help with discovering static and dynamic information about your Custom Types.
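As a small sketch of what such schema-based discovery amounts to, the function below walks an OpenAPI v3 schema fragment of the kind a CRD carries and lists every Spec property with its type and description. The schema shown here, with its `users` and `storage.size` fields, is a made-up example. On a live cluster, `kubectl explain` offers a similar view for Custom Types that publish a schema.

```python
def list_fields(schema, prefix="spec"):
    """Return (path, type, description) for every property under a schema node."""
    fields = []
    for name, node in sorted(schema.get("properties", {}).items()):
        path = prefix + "." + name
        fields.append((path, node.get("type", "unknown"), node.get("description", "")))
        # Descend into nested objects so their properties are listed too.
        if node.get("type") == "object":
            fields.extend(list_fields(node, path))
    return fields


# Hypothetical Spec schema for a 'Postgres' Type.
POSTGRES_SPEC_SCHEMA = {
    "type": "object",
    "properties": {
        "users": {"type": "array", "description": "Database users to create."},
        "storage": {
            "type": "object",
            "properties": {
                "size": {"type": "string", "description": "Requested volume size."},
            },
        },
    },
}

if __name__ == "__main__":
    for path, field_type, description in list_fields(POSTGRES_SPEC_SCHEMA):
        print(path, field_type, description)
```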
4. Automation robustness:
For platform automation that is external to Kubernetes, you will need to add extra tooling to make the automation robust in terms of failures and restarts. With a CRD/Operator, Kubernetes itself ensures recovery of the CRD/Operator Pod if it fails for some reason.
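One way to get this recovery behavior is to run the Operator's controller under a Deployment, so that Kubernetes replaces its Pod if the Pod is lost. The sketch below builds such a manifest. The Deployment name `postgres-operator` and the image `example.com/postgres-operator:0.1.0` are placeholders, not real artifacts.

```python
import json


def build_operator_deployment(name, image, replicas=1):
    """Build a Deployment manifest that keeps an Operator's controller Pod running."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": dict(labels)},
        "spec": {
            "replicas": replicas,
            # The selector must match the Pod template labels below.
            "selector": {"matchLabels": dict(labels)},
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {
                    "containers": [{"name": "controller", "image": image}],
                },
            },
        },
    }


if __name__ == "__main__":
    manifest = build_operator_deployment(
        "postgres-operator",                    # placeholder name
        "example.com/postgres-operator:0.1.0",  # placeholder image
    )
    print(json.dumps(manifest, indent=2))
```

Kubernetes restarting the Pod is only half of the picture: the controller still has to be written so that, after a restart, it picks up from the state stored on the custom resources rather than from its own memory.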
5. Minimize cost with reuse:
Platform automation that is developed in-house is hard to share with other Kubernetes adopters for reuse. With CRDs/Operators, the Spec definition essentially provides a contract around which your custom controller can evolve. This also makes it possible to re-use existing community CRDs/Operators where applicable.
At CloudARK, we are seeing our customers choose more than one Operator when building their platform stacks. We recently analyzed about 100 of the available open source Operators. When delivering purpose-built platform stacks built on Kubernetes to our customers, our strategy is to use existing community Operators whenever they seem appropriate for a target use case. If existing Operators do not satisfy all customer platform requirements, we augment the stack by developing new Operators or enhancing the existing ones. When using multiple Operators together, the main challenges arise with regard to their installation, interoperability, and consistent discovery and usage. To address this, we have developed Guidelines that can be used during Operator development. We are also building a platform toolkit, KubePlus, which simplifies discovery and use of the Operators installed in a cluster.
Building platform stacks using multiple Operators essentially enables a ‘Platform as Code’ approach towards platform assembly and management.