Building security into your Kubernetes operators

Published in IBM Cloud, Apr 15, 2021

Authors: Srinivas Pothuraju, Yanni Zhang

Kubernetes has evolved into the dominant container orchestration platform. It is known as the “Linux of the cloud” for a reason, and continues to gain market share today. Kubernetes is open source, highly scalable and available, flexible enough to run on various cloud infrastructures, automatable, and most significantly, configurable and accessible on an on-demand basis. For example, you can deploy an application, increase its capacity on demand, upgrade to the next version seamlessly, gain insights into health and performance, and back up, restore, and reconfigure with ease. All of these tasks can be automated by using Kubernetes operators.

Operators are an extension of the Kubernetes API server. Operators automate the management of applications or service lifecycles on behalf of a human operator. More companies are moving toward building and using Kubernetes operators in production environments. While the full potential of operators hasn’t been reached, it’s important to build in security as early as possible in the creation process. As an extension of Kubernetes, operators might require more privileges to perform their tasks than ordinary application microservices. While the gains of switching to operators are great, from a security perspective, a compromised operator poses a greater risk.

IBM Cloud Pak foundational services went through the journey of converting all of their common service Helm charts into optimized operators. The purpose of this blog is to share what IBM Cloud Pak foundational services did to ensure the security of their operators.

Minimize cluster-scope and namespace-scope permissions

You can classify an operator as either namespace-scoped or cluster-scoped based on where you want the operator to operate. A namespace-scoped operator watches and manages resources within a namespace. A cluster-scoped operator watches and manages resources across multiple or all namespaces within a cluster.

Depending on how you classify your operator, it requires permissions at the namespace or cluster level to perform certain operations or access Kubernetes resources. These permissions can be granted by creating role bindings and cluster role bindings that bind the operator’s service account to the required roles and cluster roles. You must carefully assess which API groups, resources, and verbs the operator actually needs, and grant only those in the role and the cluster role that is associated with the operator’s service account. In other words, permissions should be as restrictive as possible.
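As a minimal sketch of the binding described above, the following manifests create a service account, a role, and a role binding for a namespace-scoped operator. All of the names (example-operator, example-ns) and the choice of the apps/deployments resource are hypothetical and only for illustration; your operator’s rules depend on what it actually manages.

```yaml
# Hypothetical names throughout: example-operator, example-ns
apiVersion: v1
kind: ServiceAccount
metadata:
  name: example-operator
  namespace: example-ns
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: example-operator
  namespace: example-ns
rules:
# Grant only the verbs that the operator needs on deployments (assumed here)
- apiGroups:
  - apps
  resources:
  - deployments
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - patch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: example-operator
  namespace: example-ns
subjects:
# Bind the role to the operator's service account
- kind: ServiceAccount
  name: example-operator
  namespace: example-ns
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: example-operator
```

Because a Role and a RoleBinding are namespaced, the access granted here stops at the boundary of example-ns.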

Avoid wildcards

In Kubernetes, you can use the wildcard character (*) in role or cluster role definitions. A role or cluster role includes a list of rules. Each rule is a set of API groups, resources, and verbs. The rule binds together resources in the API groups and actions. If the wildcard character (*) is used under resources and verbs (as shown in the following example), it means that we allow every possible operation to be performed on all of the resources under the API group:

rules:
- apiGroups:
  - admissionregistration.k8s.io
  resources:
  - '*'  # Quoted because a bare * is not valid YAML
  verbs:
  - '*'

A rule definition like the preceding example in the role or cluster role would not cause any problem with how the operator functions. The problem is that it’s too permissive and might be dangerous. You might unknowingly grant privileged access to resources that your requirements don’t call for. Instead of using the wildcard character, the best practice is to explicitly list out each API group, verb, and resource as shown in the following rule definition:

rules:
- apiGroups:
  - admissionregistration.k8s.io
  resources:
  - mutatingwebhookconfigurations
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - patch
  - delete

Reduce the usage of cluster level permissions

In general, cluster-scoped operators require access to resources across the entire cluster and need cluster level permissions. This permission is obtained by using cluster roles and cluster role bindings. Namespace-scoped operators require access only to resources in a single namespace and these permissions can be obtained by using roles and role bindings. You might come across situations where a namespace-scoped operator must create a cluster-scoped Kubernetes resource, for example, a CustomResourceDefinition (CRD) which is a cluster-scoped Kubernetes resource. Whenever there is a cluster-scoped Kubernetes resource involved, even a namespace-scoped operator requires cluster level permissions. Cluster roles and cluster role bindings have to be used instead of roles and role bindings.

Using cluster level permissions is often not the preferred way of handling permissions because it gives the operator’s service account access to the entire cluster. This might allow the operator to do things that it is not intended to do. You need to evaluate whether certain operations can be changed to avoid the dependency on cluster level permissions. For example, Jetstack’s cert-manager can automate the management and issuance of TLS certificates. Its custom resource types include Certificate, Issuer, and ClusterIssuer. An Issuer is namespaced, while a ClusterIssuer is cluster-scoped. When the operator uses an Issuer instead of a ClusterIssuer to sign a certificate, it avoids requiring the cluster permission.
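The cert-manager case can be sketched as follows, assuming the cert-manager.io/v1 API and a self-signed issuer. The resource names, namespace, and DNS name are hypothetical. Both resources live in the same namespace, so the operator that creates them needs only a role, not a cluster role.

```yaml
# Hypothetical names: example-issuer, example-cert, example-ns
apiVersion: cert-manager.io/v1
kind: Issuer            # Namespaced, unlike ClusterIssuer
metadata:
  name: example-issuer
  namespace: example-ns
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-cert
  namespace: example-ns
spec:
  secretName: example-cert-tls   # Secret that receives the signed key pair
  dnsNames:
  - example-service.example-ns.svc
  issuerRef:
    name: example-issuer
    kind: Issuer        # Refers to the namespaced issuer above
```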

Also, if there are static cluster-scoped resources whose definition won’t change based on the inputs given to the operators, you can move the creation of those resources to the Operator Lifecycle Manager (OLM) catalog. For example, you can move CRD creation from your operator to OLM since it doesn’t change throughout the operator’s lifecycle. IBM Cloud Pak foundational services did exactly this for the IBM IAM operator. Since OLM has cluster administrator privileges, it can deploy the cluster-scoped resource and no additional permissions have to be given to the operator for its creation.
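One way this might look in practice is the following fragment of a ClusterServiceVersion (CSV), which is the manifest that OLM reads when it installs an operator. This is an illustrative fragment under assumed names (example-operator, examples.example.com), not a complete CSV and not the actual IBM IAM operator manifest. The CRD is declared as owned so that OLM installs it from the operator bundle, and the operator’s own service account receives only namespaced permissions.

```yaml
# Fragment only; a complete CSV has more required fields (for example, deployments)
apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
metadata:
  name: example-operator.v1.0.0    # Hypothetical
spec:
  customresourcedefinitions:
    owned:
    # OLM creates this CRD from the bundle, so the operator
    # doesn't need permission to create CRDs itself
    - name: examples.example.com   # Hypothetical CRD
      version: v1
      kind: Example
  install:
    strategy: deployment
    spec:
      # Namespaced permissions only; no clusterPermissions entry for CRD creation
      permissions:
      - serviceAccountName: example-operator
        rules:
        - apiGroups:
          - example.com
          resources:
          - examples
          verbs:
          - get
          - list
          - watch
          - update
          - patch
```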

Document all cluster permissions for your customer

In spite of every effort, it’s sometimes unavoidable to give operators cluster level permissions. For example, some operators need to create customized cluster-scoped resources such as cluster role bindings or cluster roles depending on the use case. In that case, make sure that all of the cluster level permissions are properly documented and consumers of the product are aware of the permissions that you’re assigning.

Pod security policies and security context constraints

Containers are not totally isolated from their hosts. A privileged process running as root inside the container is similar to a privileged process running on the host itself. If that process has vulnerable code, it can affect the host and other containers running on the same host. Similarly, the usage of host path volumes allows files on the host node to be accessible from the container. If a container is compromised, the attacker can easily gain access to the host node and attack the host and other containers running on the host. To solve this problem, Kubernetes provides security context for containers to control the privileges of the processes running inside them. Kubernetes also provides pod security policies that enable the administrator to configure policies to enforce security on every container running on the cluster.

Security context and pod security policies are useful tools to secure any container. Since the operator process also runs as a container in a Kubernetes cluster, you can leverage the same concepts to enhance the security of your operator container too.

Security context

Security context defines the pod or container’s privileges and access control settings. Adding the following security context to a container ensures that the process inside the container doesn’t run as root or in privileged mode. It also ensures that the root file system of the container is read only. It’s important for the operator container to use such restrictive security context:

securityContext:
  allowPrivilegeEscalation: false
  privileged: false
  readOnlyRootFilesystem: true
  runAsNonRoot: true

For more information about security context, see Configure a Security Context for a Pod or Container in the Kubernetes documentation.
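To show where the security context sits in a manifest, here is a minimal sketch of an operator deployment. The names, namespace, and image are hypothetical. Dropping all Linux capabilities is an extra hardening step that is assumed here and is not part of the settings listed above.

```yaml
# Hypothetical names and image throughout
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-operator
  namespace: example-ns
spec:
  replicas: 1
  selector:
    matchLabels:
      name: example-operator
  template:
    metadata:
      labels:
        name: example-operator
    spec:
      serviceAccountName: example-operator
      containers:
      - name: operator
        image: registry.example.com/example-operator:1.0.0
        # The security context is set per container
        securityContext:
          allowPrivilegeEscalation: false
          privileged: false
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          capabilities:
            drop:
            - ALL   # Assumption: the operator needs no extra capabilities
```

With runAsNonRoot set to true, the container fails to start if the image would run as root, so the image must define a non-root user. With a read-only root file system, any path that the operator writes to needs its own writable volume, such as an emptyDir.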

Pod security policies

Administrators can use pod security policies to set policies centrally in the cluster and prevent pods and containers from running with high privileges and host access. Essentially, a Pod Security Policy is a cluster-scoped resource that defines a set of conditions that a pod must satisfy in order to be accepted into the system. For more information about the conditions that can be set, see What is a Pod Security Policy? in the Kubernetes documentation.

To leverage pod security policies, you have to enable the PodSecurityPolicy admission controller in the Kubernetes API server. For more information about enabling the controller, see How do I turn on an admission controller? in the Kubernetes documentation. After enabling the admission controller, you can create a restrictive pod security policy as shown in the following example, and all of the authenticated users and service accounts must be provided with use access to that pod security policy as demonstrated in Enabling pod security policies via RBAC. With use access limited to the restrictive policy, the admission controller rejects any pod that doesn’t satisfy the conditions that you set in the pod security policy.

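apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: example
spec:
  privileged: false  # Don't allow privileged pods!
  hostNetwork: false  # Don't allow pods running with hostNetwork
  # hostPorts is omitted, so pods can't use host ports
  volumes: [configMap, secret, emptyDir]  # hostPath isn't listed, so host paths aren't allowed
  seLinux:  # The four rules below are required fields
    rule: RunAsAny
  runAsUser:
    rule: MustRunAsNonRoot
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny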
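Granting use access to the policy named example might look like the following sketch. The cluster role and binding names are hypothetical, and the manifests assume a cluster version that still serves the PodSecurityPolicy API, which was removed in Kubernetes v1.25.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: psp-example-user     # Hypothetical
rules:
- apiGroups:
  - policy
  resources:
  - podsecuritypolicies
  resourceNames:
  - example                  # Only this policy can be used
  verbs:
  - use
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: psp-example-user     # Hypothetical
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: psp-example-user
subjects:
# All authenticated users
- kind: Group
  apiGroup: rbac.authorization.k8s.io
  name: system:authenticated
# All service accounts in the cluster
- kind: Group
  apiGroup: rbac.authorization.k8s.io
  name: system:serviceaccounts
```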

Some distributions of Kubernetes may not have pod security policies and provide a similar mechanism instead. For example, the OpenShift Container Platform has Security Context Constraints (SCCs), which serve a very similar function. For more information about SCCs, see Managing security context constraints in the OpenShift Container Platform documentation.
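For comparison, a restrictive SCC could be sketched as follows, assuming the security.openshift.io/v1 API. The SCC name and the service account are hypothetical, and the exact values that suit your operator may differ.

```yaml
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: example-restricted   # Hypothetical
allowPrivilegedContainer: false
allowPrivilegeEscalation: false
allowHostDirVolumePlugin: false  # No host path volumes
allowHostNetwork: false
allowHostPorts: false
allowHostPID: false
allowHostIPC: false
readOnlyRootFilesystem: true
runAsUser:
  type: MustRunAsNonRoot
seLinuxContext:
  type: MustRunAs
fsGroup:
  type: RunAsAny
supplementalGroups:
  type: RunAsAny
volumes:
- configMap
- emptyDir
- secret
users:
# Hypothetical operator service account that is allowed to use this SCC
- system:serviceaccount:example-ns:example-operator
```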

Continuous security scans

Finally, it’s extremely important to scan your source code and perform penetration testing regularly. Continuous scanning helps to identify vulnerabilities and pick up the latest security bug fixes in Go, Kubernetes, and the operator container’s base image. Being diligent in keeping your environment up-to-date pays off by mitigating the risk of vulnerabilities.

For more information about what you can do with foundational services, see IBM Cloud Pak foundational services in IBM Docs.