Using K8s Admission Controllers to Detect Container Drift at Runtime

Sai Diliyaer
Published in Box Tech Blog · Apr 9, 2021

Illustration by Munire Aireti

At Box, we use Kubernetes (K8s) to manage hundreds of microservices that enable Box to stream data at petabyte scale. For deployments, we run kube-applier as part of a GitOps workflow with declarative configuration and automated rollout. Developers declare their service configs (including K8s manifests, Docker images, etc.) in a Git repository that requires code reviews and automated checks to pass before any change can be merged and applied in our K8s clusters. With kubectl exec and similar commands, however, developers can interact directly with running containers and potentially alter them from their deployed state. This interaction can subvert the change-control and code-review processes enforced in our CI/CD pipeline, and it allows such containers to keep receiving production traffic long term. To solve this problem, we leveraged K8s admission controllers and kubectl plugins, which work together to detect and terminate potentially mutated containers, as well as to surface the events to service owners for better visibility.

Admission controllers for handling interactive kubectl commands

Once a request is sent to K8s, it must be authenticated and authorized by the API server to proceed. K8s additionally has a separate layer of protection called admission controllers, which can intercept the request before an object is persisted in etcd. There are various predefined admission controllers compiled into the API server binary (e.g. ResourceQuota, which enforces hard resource-usage limits per namespace). In addition, there are two dynamic admission controllers, MutatingAdmissionWebhook and ValidatingAdmissionWebhook, used to mutate or validate K8s requests respectively. The latter is what we adopted to detect potentially mutated containers. The whole process can be divided into the three steps explained in detail below.

1. Receive the admission request of an interactive kubectl command
We first define qualified admission webhooks to receive specific requests in our admission controller service. To capture kubectl exec, for example, we configure the webhook's rules with resources set to "pods/exec" and operations set to "CONNECT". These rules tell the K8s API server that all "exec" requests should be subject to our admission controller service. We also specify a "url" giving the location of our service and a "caBundle" providing its certificate, both under the "clientConfig" stanza. Here is a short example of what our ValidatingWebhookConfiguration object looks like:

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: "example-webhook"
webhooks:
  - name: "admit-pod-exec-webhook.example.com"
    rules:
      - apiGroups: ["*"]
        apiVersions: ["*"]
        operations: ["CONNECT"]
        resources: ["pods/exec"]
    clientConfig:
      url: "https://example:9443/admit-pod-exec"
      caBundle: "<PEM encoded CA bundle>"
    sideEffects: NoneOnDryRun
    admissionReviewVersions: ["v1"]
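The decision logic behind that webhook endpoint can be sketched as follows. This is a minimal illustration detached from any HTTP framework; the function and field names are hypothetical, not Box's actual implementation:

```python
def handle_pod_exec(review: dict) -> tuple:
    """Process an AdmissionReview for a pods/exec CONNECT request.

    Returns the admission response (always allowed, since labeling and
    eviction happen out of band) plus a reference to the target pod that
    a separate Kube client uses for labeling and event submission.
    """
    request = review["request"]
    # For pods/exec, the admitted object is PodExecOptions, so the pod
    # itself cannot be patched in the response; record a reference instead.
    pod_ref = {
        "namespace": request["namespace"],
        "name": request["name"],
        "interactor": request["userInfo"]["username"],
    }
    response = {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {"uid": request["uid"], "allowed": True},
    }
    return response, pod_ref
```

Note that the request is always allowed: the goal is detection and deferred eviction, not blocking developers outright.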

2. Identify the pod with a potentially mutated container
Once a kubectl exec request reaches our admission controller service, it marks the target pod by adding a custom K8s label. This way, we can not only query all pods with potentially mutated containers, but also let the service rediscover previously identified pods if it gets restarted. The service cannot directly label the pod and return the updated object in its admission response, because the object given in the admission request is of type "PodExecOptions" rather than the "Pod" itself. As a result, a separate process in our admission controller service patches the label via a Kube client connected to the cluster. The service always returns an allowed admission response after passing the pod reference to its Kube client, which is responsible for labeling the target pod and sending related K8s events to it. Developers can check whether their pods are affected simply by running a kubectl describe command:

$ kubectl describe pod test-pod
...
Events:
  Type     Reason          Age  From                          Message
  ----     ------          ---  ----                          -------
  Warning  PodInteraction  5s   admission-controller-service  Pod was interacted with 'kubectl exec' command by a user (username) initially at time 2021-04-01 10:00:00 -0800 PST
  Warning  PodInteraction  5s   admission-controller-service  Pod will be evicted at time 2021-04-01 12:00:00 -0800 PST due to running potentially mutated containers (in about 2h0m0s)
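The labeling step can be sketched as a pure function that builds the patch the Kube client applies. The label and annotation keys below are hypothetical placeholders, not Box's actual schema:

```python
# Hypothetical key prefix; the real keys are Box-internal.
DRIFT_LABEL = "pod-interaction"

def build_label_patch(username: str, timestamp: str) -> dict:
    """Build a JSON merge patch that marks the pod as potentially mutated,
    making affected pods queryable via a label selector."""
    return {
        "metadata": {
            "labels": {DRIFT_LABEL: "detected"},
            "annotations": {
                DRIFT_LABEL + "-interactor": username,
                DRIFT_LABEL + "-time": timestamp,
            },
        }
    }
```

With a patch of this shape applied, affected pods can be listed with a selector such as `kubectl get pods -l pod-interaction=detected`, and the service can rebuild its in-memory state from that query after a restart.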

3. Evict the target pod after a predefined period
As the event message above shows, the affected pod is not evicted immediately. At times, developers legitimately need to get into their running containers to debug live issues. Therefore, we define a TTL for affected pods based on the environment of the clusters they run in. In particular, we allow a longer time in our dev clusters, where running kubectl exec and other interactive commands is more common. In prod clusters, by contrast, the window is much shorter, to keep pods with potentially mutated containers from serving traffic long term. The controller service internally sets a timer for each affected pod according to the given TTL. Once the timer fires, it uses the K8s Eviction API to restart the pod. The Eviction API preserves service availability because it respects the PodDisruptionBudget (PDB) corresponding to the Deployment object: if a user's PDB requires X pods to stay available, an eviction requested by our admission controller service will not proceed if it would leave the Deployment with fewer than X pods running.
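The TTL and eviction mechanics can be sketched as below. The per-environment durations are hypothetical (the real values are Box-internal); the Eviction body follows the standard policy/v1 Eviction shape:

```python
from datetime import datetime, timedelta

# Hypothetical per-environment TTLs: dev is permissive, prod is short.
POD_TTL = {"dev": timedelta(hours=8), "prod": timedelta(hours=2)}

def eviction_time(interacted_at: datetime, env: str) -> datetime:
    """Compute when the affected pod's eviction timer should fire."""
    return interacted_at + POD_TTL[env]

def eviction_body(name: str, namespace: str) -> dict:
    """Body for a policy/v1 Eviction request; the API server rejects it
    if carrying it out would violate the pod's PodDisruptionBudget."""
    return {
        "apiVersion": "policy/v1",
        "kind": "Eviction",
        "metadata": {"name": name, "namespace": namespace},
    }
```

For the example event above, an exec at 10:00 in a prod cluster yields an eviction time of 12:00, matching the 2h0m0s countdown shown to the developer.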

The sequence diagram below summarizes the entire workflow described above:

New kubectl plugin for a better user experience

Our admission controller service works well for solving the container drift issue we had on the platform. It also submits related K8s events to the affected pod. However, since K8s events expire quickly (one hour of retention by default), we needed another mechanism for developers to see their pod interaction activity. A kubectl plugin was a natural choice for exposing this information. We named our plugin "kubectl-pi" (short for pod-interaction) and provided two subcommands: "get" and "extend". When the "get" command is called, the plugin simply reads the label attached by our admission controller service and translates it into human-readable info. Here is an example output of the kubectl pi get command:

$ kubectl pi get test-pod
POD_NAME   INTERACTOR   POD_TTL   EXTENSION   EXTENSION_REQUESTER   EVICTION_TIME
test-pod   username-1   2h0m0s    /           /                     2021-04-01 12:00:00 -0800 PST
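The "get" subcommand's translation step can be sketched as a mapping from the pod's drift metadata to the table columns above. The annotation keys here are hypothetical stand-ins for whatever the admission controller actually writes:

```python
# Hypothetical annotation keys mirroring what the admission controller
# writes; "/" marks fields that have no value yet.
def pi_get_row(pod: dict) -> dict:
    """Translate drift metadata on a pod object into the columns printed
    by `kubectl pi get`."""
    ann = pod["metadata"].get("annotations", {})
    return {
        "POD_NAME": pod["metadata"]["name"],
        "INTERACTOR": ann.get("pod-interaction-interactor", "/"),
        "POD_TTL": ann.get("pod-interaction-ttl", "/"),
        "EXTENSION": ann.get("pod-interaction-extension", "/"),
        "EXTENSION_REQUESTER": ann.get("pod-interaction-extension-requester", "/"),
        "EVICTION_TIME": ann.get("pod-interaction-eviction-time", "/"),
    }
```

Because the plugin only reads metadata that is already on the pod, it needs no connection to the admission controller service itself, just ordinary read access to the cluster.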

The plugin can also be used to extend the current eviction time of a pod, in case developers need more buffer to debug ongoing issues after exec'ing into their containers. This is achieved with the kubectl pi extend command, where the plugin patches a K8s annotation onto the given pod. For transparency, this annotation records the duration and the username of whoever made the extension request (displayed in the table returned by the "get" command). Another webhook defined in our admission controller service admits this annotation request and resets the eviction timer of the target pod accordingly. An example of requesting an extension from the developer side would be:

$ kubectl pi extend test-pod --duration=30m
Successfully extended the eviction time of pod/test-pod with a duration=30m

$ kubectl pi get test-pod
POD_NAME   INTERACTOR   POD_TTL   EXTENSION   EXTENSION_REQUESTER   EVICTION_TIME
test-pod   username-1   2h0m0s    30m         username-2            2021-04-01 12:30:00 -0800 PST
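The extend flow can be sketched as two small pieces: validating the Go-style duration string and building the annotation patch. Both the duration grammar (hours/minutes only) and the annotation keys are simplifying assumptions for illustration:

```python
import re
from datetime import timedelta

def parse_duration(s: str) -> timedelta:
    """Parse a Go-style duration such as "30m" or "1h30m".
    Only hours and minutes are handled in this sketch."""
    m = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?", s)
    if not m or not any(m.groups()):
        raise ValueError(f"bad duration: {s!r}")
    hours, minutes = (int(g) if g else 0 for g in m.groups())
    return timedelta(hours=hours, minutes=minutes)

def build_extension_patch(duration: str, requester: str) -> dict:
    """Annotation patch `kubectl pi extend` could apply; the webhook
    watching this annotation resets the pod's eviction timer."""
    parse_duration(duration)  # validate before patching
    return {"metadata": {"annotations": {
        "pod-interaction-extension": duration,
        "pod-interaction-extension-requester": requester,
    }}}
```

In the example above, the 30m extension pushes the eviction time from 12:00:00 to 12:30:00, and username-2 is recorded as the requester.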

Future improvement

Although our admission controller service works well for handling any interactive request to a pod, it may also evict pods when the commands in those requests are effectively no-ops. For example, developers sometimes run kubectl exec merely to check their service logs stored on hosts. The target pods would still get bounced even though the state of their containers has not changed at all. One improvement would be the ability to distinguish the commands passed to kubectl exec, so that no-op commands do not force a pod eviction by our admission controller service. This becomes challenging, however, when developers get a shell to a running container and execute commands inside the shell, since those commands are no longer visible to the admission controller service.
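A first cut at distinguishing no-op commands could be an allowlist of read-only binaries, with shells always treated as potentially mutating because in-shell activity is invisible to the webhook. The lists below are hypothetical examples, not a vetted policy:

```python
# Hypothetical read-only allowlist; a real policy would need careful review.
READ_ONLY_BINARIES = {"cat", "ls", "tail", "head", "grep"}
# Shells are never no-op: commands run inside them are invisible to the
# admission controller.
SHELLS = {"sh", "bash", "zsh", "ash"}

def is_noop_exec(command: list) -> bool:
    """Return True if an exec command is read-only and need not trigger
    eviction, based on the first element of the exec command array."""
    if not command:
        return False
    binary = command[0].rsplit("/", 1)[-1]  # strip any path prefix
    return binary not in SHELLS and binary in READ_ONLY_BINARIES
```

Even this simple check is easy to defeat (e.g. `cat` with shell redirection via a wrapper), which is part of why the current service conservatively treats every exec as a potential mutation.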

Summary

With the power of admission controllers, we are able to secure our K8s clusters by detecting pods with potentially mutated containers at runtime and later evicting them without affecting service availability. We also built a kubectl plugin that makes the eviction time flexible, giving service owners a better, more self-service experience. We are planning to open-source this admission controller service together with its kubectl plugin in the near future. Stay tuned, and feel free to follow us on GitHub for upcoming updates!

Special thanks to Ayush Sobti and Ethan Goldblum for their technical guidance on this project. If you are interested in joining us, please check out the open opportunities at Box.
