Kubernetes Security — Pod Security Standards using Kyverno

Charles-Edouard Brétéché
11 min read · Feb 24, 2022


The Pod Security Standards define three policies (privileged, baseline, and restricted) to broadly cover the security spectrum. These policies are cumulative and range from highly permissive to highly restrictive.

Unfortunately, of the two in-tree mechanisms that enforce these standards, one is being deprecated (Pod Security Policies) and the other is still in beta (Pod Security Admission).

In this story, I’m going to show how to implement Pod Security Standards with Kyverno, a policy engine for Kubernetes that can be used to describe policies and validate resource requests against those policies.
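To give an idea of what a Kyverno policy looks like, here is a simplified sketch in the spirit of the disallow-privileged-containers policy we will install later (the real policy also covers init and ephemeral containers):

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-privileged-containers
spec:
  validationFailureAction: enforce
  rules:
  - name: privileged-containers
    match:
      resources:
        kinds:
        - Pod
    validate:
      message: "Privileged mode is disallowed."
      pattern:
        spec:
          containers:
          # =() marks an optional field: if present, it must equal "false"
          - =(securityContext):
              =(privileged): "false"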

I will use the node shell attack as an example and show how we can defend against this attack using Kyverno by going through:

  • Create a local Kubernetes cluster
  • Perform a node shell attack on it
  • Deploy Kyverno and the Kyverno policies
  • Attempt the node shell attack again (and succeed)
  • Harden the Kyverno setup
  • Attempt the node shell attack again (and fail)

Create a local Kubernetes cluster

Let’s create a simple Kubernetes cluster with Kind by running the script below:

kind create cluster --image "kindest/node:v1.23.3" --config - <<EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: control-plane
- role: control-plane
- role: worker
- role: worker
- role: worker
EOF

This will give us a cluster with 3 control-plane nodes and 3 worker nodes.

Perform node shell attack

With the cluster created above, we are going to run a node shell attack using the kubectl node-shell plugin.

Basically, the node shell attack allows an attacker to get a shell as root on a node of the cluster by starting a privileged pod with access to host namespaces (hostPID, hostIPC and hostNetwork).

Once the attacker has a shell on the node, they can corrupt the file system, retrieve sensitive information, start and stop processes, and so on…
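To make the attack concrete, the pod the plugin spawns looks roughly like the sketch below (simplified; the exact manifest generated by kubectl node-shell differs in details such as the image and command):

apiVersion: v1
kind: Pod
metadata:
  name: nsenter-example        # the plugin generates a random suffix
spec:
  nodeName: kind-control-plane # pin the pod to the target node
  hostPID: true                # share the host PID namespace
  hostIPC: true                # share the host IPC namespace
  hostNetwork: true            # share the host network namespace
  containers:
  - name: shell
    image: alpine
    securityContext:
      privileged: true         # full, unrestricted access to the host
    # enter the namespaces of the host's PID 1 to get a root shell on the node
    command: ["nsenter", "-t", "1", "-m", "-u", "-i", "-n", "-p"]
    stdin: true
    tty: true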

Let’s pick up a node from our cluster (kind-control-plane for example should be a good candidate):

$ kubectl get node
NAME                  STATUS   ROLES                  AGE   VERSION
kind-control-plane    Ready    control-plane,master   13m   v1.23.3
kind-control-plane2   Ready    control-plane,master   13m   v1.23.3
kind-control-plane3   Ready    control-plane,master   12m   v1.23.3
kind-worker           Ready    <none>                 12m   v1.23.3
kind-worker2          Ready    <none>                 12m   v1.23.3
kind-worker3          Ready    <none>                 12m   v1.23.3

Without any kind of security policy, running kubectl node-shell kind-control-plane should bring us a shell on the master node:

$ kubectl node-shell kind-control-plane
spawning "nsenter-8ze5mw" on "kind-control-plane"
If you don't see a command prompt, try pressing enter.
root@kind-control-plane:/# ls -la
total 60
drwxr-xr-x 1 root root 4096 Feb 24 16:31 .
drwxr-xr-x 1 root root 4096 Feb 24 16:31 ..
-rwxr-xr-x 1 root root 0 Feb 24 16:31 .dockerenv
lrwxrwxrwx 1 root root 7 Nov 2 20:43 bin -> usr/bin
drwxr-xr-x 2 root root 4096 Oct 11 08:39 boot
drwxr-xr-x 17 root root 4440 Feb 24 16:31 dev
drwxr-xr-x 1 root root 4096 Feb 24 16:31 etc
drwxr-xr-x 2 root root 4096 Oct 11 08:39 home
drwxr-xr-x 1 root root 4096 Feb 24 16:31 kind
lrwxrwxrwx 1 root root 7 Nov 2 20:43 lib -> usr/lib
lrwxrwxrwx 1 root root 9 Nov 2 20:43 lib32 -> usr/lib32
lrwxrwxrwx 1 root root 9 Nov 2 20:43 lib64 -> usr/lib64
lrwxrwxrwx 1 root root 10 Nov 2 20:43 libx32 -> usr/libx32
drwxr-xr-x 2 root root 4096 Nov 2 20:43 media
drwxr-xr-x 2 root root 4096 Nov 2 20:43 mnt
drwxr-xr-x 1 root root 4096 Jan 26 08:06 opt
dr-xr-xr-x 524 root root 0 Feb 24 16:31 proc
drwx------ 1 root root 4096 Feb 24 16:32 root
drwxr-xr-x 11 root root 240 Feb 24 16:32 run
lrwxrwxrwx 1 root root 8 Nov 2 20:43 sbin -> usr/sbin
drwxr-xr-x 2 root root 4096 Nov 2 20:43 srv
dr-xr-xr-x 13 root root 0 Feb 24 16:31 sys
drwxrwxrwt 2 root root 40 Feb 24 16:48 tmp
drwxr-xr-x 1 root root 4096 Nov 2 20:43 usr
drwxr-xr-x 11 root root 4096 Feb 24 16:31 var

Now that’s scary: anyone with permission to create a pod in the cluster can get root access to the cluster nodes! 😱

Deploy Kyverno and Kyverno policies

In order to detect (and block) privileged pod creation requests, we will deploy the Kyverno policy engine using Helm.

Note that blocking all privileged pods is not a viable solution, as some pods in the cluster genuinely need elevated privileges (CNI, scheduler, api-server, etc.); we will come back to this later.

Let’s deploy Kyverno with a minimal configuration for now by running the command below:

helm upgrade --install --wait --timeout 15m --atomic \
--namespace kyverno --create-namespace \
--repo https://kyverno.github.io/kyverno kyverno kyverno \
--values - <<EOF
replicaCount: 3
EOF

This will deploy Kyverno but no policies. Fortunately, there is a second Helm chart that contains the Pod Security Standards policies.

Let’s deploy the Pod Security Standards policies by running the command below:

helm upgrade --install --wait --timeout 15m --atomic \
--namespace kyverno --create-namespace \
--repo https://kyverno.github.io/kyverno kyverno-policies \
kyverno-policies --values - <<EOF
podSecurityStandard: restricted
validationFailureAction: enforce
EOF

Now Kyverno should be running and the Pod Security Standards policies should be deployed and effective; we can check this with:

$ kubectl get clusterpolicies.kyverno.io
NAME                             BACKGROUND   ACTION    READY
disallow-capabilities            true         enforce   true
disallow-capabilities-strict     true         enforce   true
disallow-host-namespaces         true         enforce   true
disallow-host-path               true         enforce   true
disallow-host-ports              true         enforce   true
disallow-host-process            true         enforce   true
disallow-privilege-escalation    true         enforce   true
disallow-privileged-containers   true         enforce   true
disallow-proc-mount              true         enforce   true
disallow-selinux                 true         enforce   true
require-run-as-non-root-user     true         enforce   true
require-run-as-nonroot           true         enforce   true
restrict-apparmor-profiles       true         enforce   true
restrict-seccomp                 true         enforce   true
restrict-seccomp-strict          true         enforce   true
restrict-sysctls                 true         enforce   true
restrict-volume-types            true         enforce   true

Attempt to run a node shell attack again

With Kyverno running and the Pod Security Standards policies effective, let’s try the node shell attack again:

$ kubectl node-shell kind-control-plane
spawning "nsenter-xzw7ab" on "kind-control-plane"
Error from server: admission webhook "validate.kyverno.svc-fail" denied the request:

resource Pod/default/nsenter-xzw7ab was blocked due to the following policies

disallow-capabilities-strict:
require-drop-all: 'validation failure: Containers must drop `ALL` capabilities.'
disallow-host-namespaces:
host-namespaces: 'validation error: Sharing the host namespaces is disallowed. The
fields spec.hostNetwork, spec.hostIPC, and spec.hostPID must be unset or set to
`false`. Rule host-namespaces failed at path /spec/hostNetwork/'
disallow-privilege-escalation:
privilege-escalation: 'validation error: Privilege escalation is disallowed. The
fields spec.containers[*].securityContext.allowPrivilegeEscalation, spec.initContainers[*].securityContext.allowPrivilegeEscalation,
and spec.ephemeralContainers[*].securityContext.allowPrivilegeEscalation must
be set to `false`. Rule privilege-escalation failed at path /spec/containers/0/securityContext/allowPrivilegeEscalation/'
disallow-privileged-containers:
privileged-containers: 'validation error: Privileged mode is disallowed. The fields
spec.containers[*].securityContext.privileged and spec.initContainers[*].securityContext.privileged
must be unset or set to `false`. Rule privileged-containers failed at path /spec/containers/0/securityContext/privileged/'
require-run-as-nonroot:
run-as-non-root: 'validation error: Running as root is not allowed. Either the field
spec.securityContext.runAsNonRoot must be set to `true`, or the fields spec.containers[*].securityContext.runAsNonRoot,
spec.initContainers[*].securityContext.runAsNonRoot, and spec.ephemeralContainers[*].securityContext.runAsNonRoot
must be set to `true`. Rule run-as-non-root[0] failed at path /spec/securityContext/runAsNonRoot/.
Rule run-as-non-root[1] failed at path /spec/containers/0/securityContext/runAsNonRoot/.'
restrict-seccomp-strict:
check-seccomp-strict: 'validation error: Use of custom Seccomp profiles is disallowed.
The fields spec.securityContext.seccompProfile.type, spec.containers[*].securityContext.seccompProfile.type,
spec.initContainers[*].securityContext.seccompProfile.type, and spec.ephemeralContainers[*].securityContext.seccompProfile.type
must be set to `RuntimeDefault` or `Localhost`. Rule check-seccomp-strict[0] failed
at path /spec/securityContext/seccompProfile/. Rule check-seccomp-strict[1] failed
at path /spec/containers/0/securityContext/seccompProfile/.'

At first sight this looks good: Kyverno detected and denied the privileged pod creation request.

However, there might be configuration somewhere that explicitly allows certain privileged pods to run, since some system pods need such permissions.

Kyverno stores its configuration in the kyverno config map; we can look at it with:

$ kubectl describe cm -n kyverno kyverno
Name:         kyverno
Namespace: kyverno
Labels: app=kyverno
app.kubernetes.io/component=kyverno
app.kubernetes.io/instance=kyverno
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=kyverno
app.kubernetes.io/part-of=kyverno
app.kubernetes.io/version=v2.3.0
helm.sh/chart=kyverno-v2.3.0
Annotations: meta.helm.sh/release-name: kyverno
meta.helm.sh/release-namespace: kyverno
Data
====
generateSuccessEvents:
----
false
resourceFilters:
----
[Event,*,*][*,kube-system,*][*,kube-public,*][*,kube-node-lease,*][Node,*,*][APIService,*,*][TokenReview,*,*][SubjectAccessReview,*,*][SelfSubjectAccessReview,*,*][*,kyverno,*][Binding,*,*][ReplicaSet,*,*][ReportChangeRequest,*,*][ClusterReportChangeRequest,*,*]

From the config map above, we can see that Kyverno is configured to ignore all requests targeting the kube-system, kube-public, kube-node-lease and kyverno namespaces.
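Each filter entry follows the [Kind,Namespace,Name] format, where * is a wildcard. A few entries from the list above, annotated:

[Event,*,*]        # ignore Event resources in every namespace
[*,kube-system,*]  # ignore every resource kind in the kube-system namespace
[Node,*,*]         # ignore Node resources (cluster-scoped)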

With this in mind, running the node shell attack in one of the whitelisted namespaces should succeed:

$ kubectl node-shell kind-control-plane -n kube-system
spawning "nsenter-wwsbcz" on "kind-control-plane"
If you don't see a command prompt, try pressing enter.
root@kind-control-plane:/# ls -la
total 60
drwxr-xr-x 1 root root 4096 Feb 24 16:31 .
drwxr-xr-x 1 root root 4096 Feb 24 16:31 ..
-rwxr-xr-x 1 root root 0 Feb 24 16:31 .dockerenv
lrwxrwxrwx 1 root root 7 Nov 2 20:43 bin -> usr/bin
drwxr-xr-x 2 root root 4096 Oct 11 08:39 boot
drwxr-xr-x 17 root root 4440 Feb 24 16:31 dev
drwxr-xr-x 1 root root 4096 Feb 24 16:31 etc
drwxr-xr-x 2 root root 4096 Oct 11 08:39 home
drwxr-xr-x 1 root root 4096 Feb 24 16:31 kind
lrwxrwxrwx 1 root root 7 Nov 2 20:43 lib -> usr/lib
lrwxrwxrwx 1 root root 9 Nov 2 20:43 lib32 -> usr/lib32
lrwxrwxrwx 1 root root 9 Nov 2 20:43 lib64 -> usr/lib64
lrwxrwxrwx 1 root root 10 Nov 2 20:43 libx32 -> usr/libx32
drwxr-xr-x 2 root root 4096 Nov 2 20:43 media
drwxr-xr-x 2 root root 4096 Nov 2 20:43 mnt
drwxr-xr-x 1 root root 4096 Jan 26 08:06 opt
dr-xr-xr-x 516 root root 0 Feb 24 16:31 proc
drwx------ 1 root root 4096 Feb 24 17:07 root
drwxr-xr-x 11 root root 240 Feb 24 16:32 run
lrwxrwxrwx 1 root root 8 Nov 2 20:43 sbin -> usr/sbin
drwxr-xr-x 2 root root 4096 Nov 2 20:43 srv
dr-xr-xr-x 13 root root 0 Feb 24 16:31 sys
drwxrwxrwt 2 root root 40 Feb 24 17:30 tmp
drwxr-xr-x 1 root root 4096 Nov 2 20:43 usr
drwxr-xr-x 11 root root 4096 Feb 24 16:31 var

Damn, it looks like the default configuration is not secure enough to prevent the node shell attack.

One could argue that creating pods in the kube-system namespace should not be authorized in the first place, but that doesn’t mean our Kyverno setup shouldn’t be hardened where possible.

Harden Kyverno setup

We saw that Kyverno filters the resources considered by the admission controller and ignores everything in a handful of namespaces.

That leaves doors open that can be exploited, so the first thing we can do to harden Kyverno is to remove the filters:

helm upgrade --install --wait --timeout 15m --atomic --namespace kyverno --create-namespace \
--repo https://kyverno.github.io/kyverno kyverno kyverno \
--values - <<EOF
replicaCount: 3
config:
  resourceFilters: []
EOF

Now, no more resources will be filtered and every resource will be considered by Kyverno.
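One way to double-check is to read the config map back (the previously long filter list should be gone; depending on how the chart renders an empty list you may see [] or an empty string):

$ kubectl get cm -n kyverno kyverno -o jsonpath='{.data.resourceFilters}'
[]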

If we try to run a node shell attack in the kube-system namespace, it will now fail.

But now we have another issue: the system pods that live in the kube-system namespace won’t be able to restart, as they violate the Pod Security Standards policies.

We need to find another way to allow those system pods to get through our Kyverno policies.

Fortunately, most of the system pods are static pods, and static pods are created by the kubelet with credentials that belong to the system:nodes group.

We can add an exclude statement in our policies to allow requests coming from a user that belongs to the system:nodes group.

In the same spirit, we also need to allow all service accounts that live in the kube-system namespace, and we can use the system:serviceaccounts:kube-system group for that.
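Concretely, the chart renders something like the following exclude block into each ClusterPolicy rule (a sketch following Kyverno’s match/exclude schema):

exclude:
  any:
  - subjects:
    - kind: Group
      name: system:nodes
    - kind: Group
      name: system:serviceaccounts:kube-system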

Let’s deploy our Kyverno policies taking the observations above into account:

helm upgrade --install --wait --timeout 15m --atomic \
--namespace kyverno --create-namespace \
--repo https://kyverno.github.io/kyverno kyverno-policies \
kyverno-policies --values - <<EOF
podSecurityStandard: restricted
validationFailureAction: enforce
background: false
policyExclude:
  disallow-capabilities:
    any:
    - subjects:
      - kind: Group
        name: system:nodes
      - kind: Group
        name: system:serviceaccounts:kube-system
  disallow-capabilities-strict:
    any:
    - subjects:
      - kind: Group
        name: system:nodes
      - kind: Group
        name: system:serviceaccounts:kube-system
      - kind: Group
        name: system:serviceaccounts:kyverno
  disallow-host-namespaces:
    any:
    - subjects:
      - kind: Group
        name: system:nodes
      - kind: Group
        name: system:serviceaccounts:kube-system
  disallow-host-path:
    any:
    - subjects:
      - kind: Group
        name: system:nodes
      - kind: Group
        name: system:serviceaccounts:kube-system
  disallow-host-ports:
    any:
    - subjects:
      - kind: Group
        name: system:nodes
      - kind: Group
        name: system:serviceaccounts:kube-system
  disallow-host-process:
    any:
    - subjects:
      - kind: Group
        name: system:nodes
      - kind: Group
        name: system:serviceaccounts:kube-system
  disallow-privilege-escalation:
    any:
    - subjects:
      - kind: Group
        name: system:nodes
      - kind: Group
        name: system:serviceaccounts:kube-system
  disallow-privileged-containers:
    any:
    - subjects:
      - kind: Group
        name: system:nodes
      - kind: Group
        name: system:serviceaccounts:kube-system
  disallow-proc-mount:
    any:
    - subjects:
      - kind: Group
        name: system:nodes
      - kind: Group
        name: system:serviceaccounts:kube-system
  disallow-selinux:
    any:
    - subjects:
      - kind: Group
        name: system:nodes
      - kind: Group
        name: system:serviceaccounts:kube-system
  require-run-as-non-root-user:
    any:
    - subjects:
      - kind: Group
        name: system:nodes
      - kind: Group
        name: system:serviceaccounts:kube-system
  require-run-as-nonroot:
    any:
    - subjects:
      - kind: Group
        name: system:nodes
      - kind: Group
        name: system:serviceaccounts:kube-system
  restrict-apparmor-profiles:
    any:
    - subjects:
      - kind: Group
        name: system:nodes
      - kind: Group
        name: system:serviceaccounts:kube-system
  restrict-seccomp:
    any:
    - subjects:
      - kind: Group
        name: system:nodes
      - kind: Group
        name: system:serviceaccounts:kube-system
  restrict-seccomp-strict:
    any:
    - subjects:
      - kind: Group
        name: system:nodes
      - kind: Group
        name: system:serviceaccounts:kube-system
  restrict-sysctls:
    any:
    - subjects:
      - kind: Group
        name: system:nodes
      - kind: Group
        name: system:serviceaccounts:kube-system
  restrict-volume-types:
    any:
    - subjects:
      - kind: Group
        name: system:nodes
      - kind: Group
        name: system:serviceaccounts:kube-system

EOF

It takes some work to add an exclusion statement for each policy. While tedious and repetitive, this could easily be improved in the chart itself and is probably worth the effort.

Note that we also need to disable background scanning (background: false), as we can’t match on subjects in the background; it has to be done at the time of the request.
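In the rendered policies this simply translates to the background flag on each policy spec, roughly:

spec:
  validationFailureAction: enforce
  background: false  # subject/userInfo data only exists at admission time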

Attempt to run a node shell attack again

Unsurprisingly, running a node shell attack fails this time:

$ kubectl node-shell kind-control-plane -n kube-system
spawning "nsenter-dz6d2e" on "kind-control-plane"
Error from server: admission webhook "validate.kyverno.svc-fail" denied the request:

resource Pod/kube-system/nsenter-dz6d2e was blocked due to the following policies

disallow-capabilities-strict:
require-drop-all: 'validation failure: Containers must drop `ALL` capabilities.'
disallow-host-namespaces:
host-namespaces: 'validation error: Sharing the host namespaces is disallowed. The
fields spec.hostNetwork, spec.hostIPC, and spec.hostPID must be unset or set to
`false`. Rule host-namespaces failed at path /spec/hostNetwork/'
disallow-privilege-escalation:
privilege-escalation: 'validation error: Privilege escalation is disallowed. The
fields spec.containers[*].securityContext.allowPrivilegeEscalation, spec.initContainers[*].securityContext.allowPrivilegeEscalation,
and spec.ephemeralContainers[*].securityContext.allowPrivilegeEscalation must
be set to `false`. Rule privilege-escalation failed at path /spec/containers/0/securityContext/allowPrivilegeEscalation/'
disallow-privileged-containers:
privileged-containers: 'validation error: Privileged mode is disallowed. The fields
spec.containers[*].securityContext.privileged and spec.initContainers[*].securityContext.privileged
must be unset or set to `false`. Rule privileged-containers failed at path /spec/containers/0/securityContext/privileged/'
require-run-as-nonroot:
run-as-non-root: 'validation error: Running as root is not allowed. Either the field
spec.securityContext.runAsNonRoot must be set to `true`, or the fields spec.containers[*].securityContext.runAsNonRoot,
spec.initContainers[*].securityContext.runAsNonRoot, and spec.ephemeralContainers[*].securityContext.runAsNonRoot
must be set to `true`. Rule run-as-non-root[0] failed at path /spec/securityContext/runAsNonRoot/.
Rule run-as-non-root[1] failed at path /spec/containers/0/securityContext/runAsNonRoot/.'
restrict-seccomp-strict:
check-seccomp-strict: 'validation error: Use of custom Seccomp profiles is disallowed.
The fields spec.securityContext.seccompProfile.type, spec.containers[*].securityContext.seccompProfile.type,
spec.initContainers[*].securityContext.seccompProfile.type, and spec.ephemeralContainers[*].securityContext.seccompProfile.type
must be set to `RuntimeDefault` or `Localhost`. Rule check-seccomp-strict[0] failed
at path /spec/securityContext/seccompProfile/. Rule check-seccomp-strict[1] failed
at path /spec/containers/0/securityContext/seccompProfile/.'

The attack was blocked, even with kubernetes-admin access, while keeping a working cluster and preserving system pod permissions. 🎉

It makes complete sense that users are not allowed to create privileged pods; creating resources in a cluster directly should not be the responsibility of an end user anyway.

This clearly demonstrates how policies can be applied differently when a request comes from a user or a service account.

The bad news

Sadly, there is bad news to this story… what I presented here is not feasible yet:

  • Changing the background mode is not yet supported in the kyverno-policies helm chart (the PR has been merged though)
  • There is a bug in the kyverno helm chart that sets securityContext.capabilities.drop to all instead of ALL (this has been fixed too, but not released yet)

Once those fixes ship, implementing the Pod Security Standards as described here will be fully possible, and I will update this story accordingly.

Wrapping it up

Kubernetes security can be a big challenge, and tools like Kyverno help simplify deploying and maintaining security policies inside an organization.

Although it takes some work to understand how everything fits together and to make sure it doesn’t break existing workloads, this is an extremely important aspect of running a secure Kubernetes cluster.

Kyverno policies can run in audit mode instead of enforce, which allows testing policies without breaking existing workloads. In audit mode Kyverno generates warning events but won’t deny requests.
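Switching to audit mode is just a matter of changing one chart value, for example:

helm upgrade --install --wait --timeout 15m --atomic \
--namespace kyverno --create-namespace \
--repo https://kyverno.github.io/kyverno kyverno-policies \
kyverno-policies --values - <<EOF
podSecurityStandard: restricted
validationFailureAction: audit
EOF

That way, you can review the reported violations before flipping validationFailureAction back to enforce.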
