Analysis of Open Source Kubernetes Operators

In this post we present an analysis of open source Kubernetes Operators available on GitHub. Operators have now become mainstream in the Kubernetes world: searching GitHub for “kubernetes operators” returns over 400 repositories. At CloudARK, we provide Operator aggregation services that allow enterprises adopting Kubernetes to build their custom platforms on base Kubernetes clusters. Once multiple Operators are installed in a cluster, you can create application platform stacks declaratively by leveraging the Custom Resources introduced by the various Operators. This is essentially the ‘Platform-as-Code’ model of creating platform stacks on Kubernetes that are composable, repeatable, and shareable.

We performed this Operator analysis to answer the following questions:

  1. Which software is being managed through Kubernetes Operators?
  2. How many Operators are packaged as Helm charts?
  3. How many Operators register Custom Resource Definitions as YAML manifests vs. how many register them programmatically through Operator code?
  4. How many Operators correctly set Owner References?
  5. How many Operators use kube-openapi annotations?
  6. How many Operators define Custom Resource validation rules?
  7. Which programming languages are being used to develop Operators?
  8. Which tools are being used to develop Operators?

Out of the 400+ repositories, we selected for this analysis the 102 repositories that have ten or more stars. Here are the results of the analysis:

1) Operators are being created for many different kinds of software. Among the 102 repositories, here is the software for which Operators most commonly appear. In parentheses we give the number of Operators that exist today for each.

MongoDB (4), AWS management (4), Redis (3), Jenkins (3), Spark (2), Airflow (2), Vault (2), Postgres (2), MySQL (2), Kafka (2), Prometheus (2), Memcached (2), GCP management (2), Azure management (2), Cert management (2), Cassandra (1), Istio (1), Elasticsearch (1), Fluentd (1), Jaeger (1)

2) Languages used for writing Operators:

  • Java: 5
  • Python: 3
  • Go: 75

3) For the 75 repositories of Operators written in Go:

  • Helm charts defined: 37
  • Helm charts not defined: 38
  • CRD defined as YAML manifests / in Helm chart: 52
  • CRD registered in code: 19
  • CRD defined in YAML and also registered in code: 4
  • CRD validation defined in YAML manifests / in Helm chart: 26
  • Kube-openapi annotations on type definitions: 29
  • Owner references set on Custom Resources: 54
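
A count such as the CRD validation figure above could be gathered with a check along the following lines. This is a minimal sketch, not the script used for this analysis: it assumes each CRD manifest has already been parsed into a Python dict, and the function name `crd_has_validation` and the sample manifests are hypothetical. It looks for an OpenAPI v3 schema in the two places the CRD API allows: `spec.validation` (apiextensions.k8s.io/v1beta1) and `spec.versions[].schema` (apiextensions.k8s.io/v1).

```python
def crd_has_validation(crd: dict) -> bool:
    """Return True if a parsed CRD manifest declares an OpenAPI v3 schema."""
    spec = crd.get("spec") or {}

    # apiextensions.k8s.io/v1beta1: a single schema under spec.validation
    top_level = spec.get("validation") or {}
    if top_level.get("openAPIV3Schema"):
        return True

    # apiextensions.k8s.io/v1: one schema per served version
    for version in spec.get("versions") or []:
        schema = version.get("schema") or {}
        if schema.get("openAPIV3Schema"):
            return True
    return False


# Hypothetical manifests, written as dicts to keep the example dependency-free.
legacy_crd = {
    "apiVersion": "apiextensions.k8s.io/v1beta1",
    "kind": "CustomResourceDefinition",
    "spec": {
        "validation": {
            "openAPIV3Schema": {"properties": {"spec": {"required": ["replicas"]}}}
        }
    },
}
validated_crd = {
    "apiVersion": "apiextensions.k8s.io/v1",
    "kind": "CustomResourceDefinition",
    "spec": {
        "versions": [
            {"name": "v1", "schema": {"openAPIV3Schema": {"type": "object"}}}
        ]
    },
}
unvalidated_crd = {
    "apiVersion": "apiextensions.k8s.io/v1beta1",
    "kind": "CustomResourceDefinition",
    "spec": {"group": "example.com", "names": {"kind": "Example"}},
}

for name, crd in [
    ("legacy", legacy_crd),
    ("validated", validated_crd),
    ("unvalidated", unvalidated_crd),
]:
    print(name, crd_has_validation(crd))
```

A check like this only confirms that a schema is present; it says nothing about how thorough the validation rules are, which is one reason manual review remains useful.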

4) Tools used for creating Operators:

  • Operator SDK: 21
  • Kubebuilder: 8

For the remaining Operators, we did not find any apparent tool being used to create them.

Key Observations:

There is significant variability in how Kubernetes Operators are being written today.

  1. Helm charts are defined for only half the Operator repos.
  2. About 75% of Operators register their CRDs as YAML/Helm definitions, but 25% still register CRDs only in code, which makes evolving and maintaining those CRDs more difficult. Also, of the Operators that register CRDs as YAML/Helm definitions, only 50% include CRD validation in their YAML manifests. For the remaining Operators, Custom Resource creation has no Spec-level validation rules defined, which can make registering and handling Custom Resources error-prone.
  3. Only about 39% of Operators (29 of 75) have kube-openapi annotations defined on their Custom Resource type definitions. These annotations are essential for making Custom Resources work with ‘kubectl explain’, so for the remaining Operators it will be difficult to find information about their Custom Resources through ‘kubectl explain’.
  4. 72% of Operators have owner references set. For the remaining 28%, Custom Resource garbage collection and cleanup may not happen as intended. Moreover, users of such Operators will struggle to find the dependencies between Custom Resources and native resources during tracing or debugging.
  5. We found at least 3 Operators that did not define any Custom Resource. They worked with Kubernetes’s native resources.
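
The percentages in these observations follow from the counts reported for the 75 Go repositories. The arithmetic can be reproduced with a short sketch. The variable names are illustrative, and two assumptions are made about the denominators: the 75% figure counts the 4 Operators that register CRDs both ways together with the 52 that use YAML/Helm only, and the 50% validation figure is taken relative to those 52.

```python
GO_REPOS = 75  # Operators written in Go

counts = {
    "helm_chart": 37,
    "crd_yaml_only": 52,
    "crd_code_only": 19,
    "crd_yaml_and_code": 4,
    "crd_validation": 26,
    "kube_openapi": 29,
    "owner_references": 54,
}


def pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Assumption: "registered as YAML/Helm" includes the repos that do both.
crd_yaml_any = counts["crd_yaml_only"] + counts["crd_yaml_and_code"]

summary = {
    "helm_chart": pct(counts["helm_chart"], GO_REPOS),
    "crd_yaml_any": pct(crd_yaml_any, GO_REPOS),
    "crd_code_only": pct(counts["crd_code_only"], GO_REPOS),
    # Assumption: validation is measured against the YAML-only repos.
    "validation_among_yaml_only": pct(
        counts["crd_validation"], counts["crd_yaml_only"]
    ),
    "kube_openapi": pct(counts["kube_openapi"], GO_REPOS),
    "owner_references": pct(counts["owner_references"], GO_REPOS),
}

for name, value in summary.items():
    print(f"{name}: {value}%")
# helm_chart: 49.3%
# crd_yaml_any: 74.7%
# crd_code_only: 25.3%
# validation_among_yaml_only: 50.0%
# kube_openapi: 38.7%
# owner_references: 72.0%
```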

Note about the analysis:

This is the base list of all the operators: https://github.com/cloud-ark/kubeplus/blob/master/operator-analysis/operator-repos.txt

This list was generated by searching for “kubernetes+operators” using the GitHub API. While a direct search for this term on GitHub shows 400+ repos, through the API we got 379 results. From these we selected the repos that have 10 or more stars. In the final analysis we also included some repos that we knew to be Operators but which did not show up in the above list of 379. The analysis itself was done using a combination of automation and manual checks. Manual checks were performed to verify the automation output and to accommodate special cases, such as Helm charts being present in other places or differences in how owner references are set, which were not easily captured through automation.
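
The repository selection step can be approximated as follows. This is an illustrative sketch rather than the script used for the analysis: the function names and the sample data are hypothetical, and it assumes the GitHub REST search endpoint `GET /search/repositories`, whose JSON response lists repositories under `items` with `full_name` and `stargazers_count` fields.

```python
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://api.github.com/search/repositories"


def build_search_url(query: str, page: int = 1, per_page: int = 100) -> str:
    """Build the search URL; spaces in the query are encoded as '+'."""
    params = urllib.parse.urlencode(
        {"q": query, "page": page, "per_page": per_page}
    )
    return f"{SEARCH_URL}?{params}"


def fetch_page(query: str, page: int = 1) -> list:
    """Fetch one page of results (network call; unauthenticated use is rate-limited)."""
    with urllib.request.urlopen(build_search_url(query, page)) as resp:
        return json.load(resp)["items"]


def select_by_stars(items: list, min_stars: int = 10) -> list:
    """Return the sorted names of repositories with at least `min_stars` stars."""
    return sorted(
        item["full_name"]
        for item in items
        if item["stargazers_count"] >= min_stars
    )


# Hypothetical sample shaped like the API's `items` list, so the filter
# can be exercised without a network call.
sample_items = [
    {"full_name": "example-org/alpha-operator", "stargazers_count": 42},
    {"full_name": "example-org/beta-operator", "stargazers_count": 9},
    {"full_name": "example-org/gamma-operator", "stargazers_count": 10},
]

print(build_search_url("kubernetes operators"))
print(select_by_stars(sample_items))
# ['example-org/alpha-operator', 'example-org/gamma-operator']
```

Against the live API, the results of `fetch_page("kubernetes operators", page)` for successive pages would be concatenated before filtering. Repositories known to be Operators but missing from the search results would still need to be added by hand, as described above.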

Conclusion:

There is ample scope for bringing consistency to how Operators are developed, especially for environments where platforms will be built by assembling more than one Operator together. If you are looking to develop a new Operator, we encourage you to follow the guidelines that we have developed for Operator development, based on our experience of delivering platforms built on Kubernetes Operators.

If you liked this analysis, you might be interested in learning more about the Platform-as-Code approach of assembling Kubernetes platforms using one or more Operators. Sign up to get your free copy of the Platform-as-Code eBook here.

www.cloudark.io