Kubernetes Security — Pod Security Standards using Kyverno

Charles-Edouard Brétéché
11 min read · Feb 24, 2022


The Pod Security Standards define three policies (privileged, baseline, and restricted) to broadly cover the security spectrum. These policies are cumulative and range from highly permissive to highly restrictive.

Unfortunately, of the two in-tree mechanisms that enforce these standards, one is being deprecated (Pod Security Policies) and the other is still in beta (Pod Security Admission).

In this story, I’m going to show how to implement Pod Security Standards with Kyverno, a policy engine for Kubernetes that can be used to describe policies and validate resource requests against those policies.
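To give an idea of what a Kyverno policy looks like, here is a simplified sketch in the spirit of the disallow-privileged-containers policy we will install later (the real policy also covers init and ephemeral containers):

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-privileged-containers
spec:
  validationFailureAction: enforce
  rules:
  - name: privileged-containers
    match:
      resources:
        kinds:
        - Pod
    validate:
      message: "Privileged mode is disallowed."
      pattern:
        spec:
          containers:
          # =() marks an optional field: if present, it must equal "false"
          - =(securityContext):
              =(privileged): "false"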

I will use the node shell attack as an example and show how we can defend against this attack using Kyverno by going through:

  • Create a local Kubernetes cluster
  • Perform a node shell attack on it
  • Deploy Kyverno and the Kyverno policies
  • Attempt the node shell attack again (and succeed)
  • Harden the Kyverno setup
  • Attempt the node shell attack again (and fail)

Create a local Kubernetes cluster

Let’s create a simple Kubernetes cluster with Kind by running the script below:

kind create cluster --image "kindest/node:v1.23.3" --config - <<EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: control-plane
- role: control-plane
- role: worker
- role: worker
- role: worker
EOF

This will give us a cluster with 3 control-plane nodes and 3 worker nodes.

Perform node shell attack

With the cluster created above, we are going to run a node shell attack using the kubectl node-shell plugin.

Basically, the node shell attack allows an attacker to get a shell as root on a node of the cluster by starting a privileged pod with access to host namespaces (hostPID, hostIPC and hostNetwork).

Once the attacker has a shell on the node, they can corrupt the file system, retrieve sensitive information, start and stop processes, and so on…
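To make the attack concrete, the pod the plugin spawns looks roughly like the sketch below (simplified; the exact manifest generated by kubectl node-shell differs in details such as the image and command):

apiVersion: v1
kind: Pod
metadata:
  name: nsenter-example        # the plugin generates a random suffix
spec:
  nodeName: kind-control-plane # pin the pod to the target node
  hostPID: true                # share the host PID namespace
  hostIPC: true                # share the host IPC namespace
  hostNetwork: true            # share the host network namespace
  containers:
  - name: shell
    image: alpine
    securityContext:
      privileged: true         # full, unrestricted access to the host
    # enter the namespaces of the host's PID 1 to get a root shell on the node
    command: ["nsenter", "-t", "1", "-m", "-u", "-i", "-n", "-p"]
    stdin: true
    tty: true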

Let’s pick up a node from our cluster (kind-control-plane for example should be a good candidate):

$ kubectl get node
NAME                  STATUS   ROLES                  AGE   VERSION
kind-control-plane    Ready    control-plane,master   13m   v1.23.3
kind-control-plane2   Ready    control-plane,master   13m   v1.23.3
kind-control-plane3   Ready    control-plane,master   12m   v1.23.3
kind-worker           Ready    <none>                 12m   v1.23.3
kind-worker2          Ready    <none>                 12m   v1.23.3
kind-worker3          Ready    <none>                 12m   v1.23.3

Without any kind of security policy, running kubectl node-shell kind-control-plane should bring us a shell on the master node:

$ kubectl node-shell kind-control-plane
spawning "nsenter-8ze5mw" on "kind-control-plane"
If you don't see a command prompt, try pressing enter.
root@kind-control-plane:/# ls -la
total 60
drwxr-xr-x 1 root root 4096 Feb 24 16:31 .
drwxr-xr-x 1 root root 4096 Feb 24 16:31 ..
-rwxr-xr-x 1 root root 0 Feb 24 16:31 .dockerenv
lrwxrwxrwx 1 root root 7 Nov 2 20:43 bin -> usr/bin
drwxr-xr-x 2 root root 4096 Oct 11 08:39 boot
drwxr-xr-x 17 root root 4440 Feb 24 16:31 dev
drwxr-xr-x 1 root root 4096 Feb 24 16:31 etc
drwxr-xr-x 2 root root 4096 Oct 11 08:39 home
drwxr-xr-x 1 root root 4096 Feb 24 16:31 kind
lrwxrwxrwx 1 root root 7 Nov 2 20:43 lib -> usr/lib
lrwxrwxrwx 1 root root 9 Nov 2 20:43 lib32 -> usr/lib32
lrwxrwxrwx 1 root root 9 Nov 2 20:43 lib64 -> usr/lib64
lrwxrwxrwx 1 root root 10 Nov 2 20:43 libx32 -> usr/libx32
drwxr-xr-x 2 root root 4096 Nov 2 20:43 media
drwxr-xr-x 2 root root 4096 Nov 2 20:43 mnt
drwxr-xr-x 1 root root 4096 Jan 26 08:06 opt
dr-xr-xr-x 524 root root 0 Feb 24 16:31 proc
drwx------ 1 root root 4096 Feb 24 16:32 root
drwxr-xr-x 11 root root 240 Feb 24 16:32 run
lrwxrwxrwx 1 root root 8 Nov 2 20:43 sbin -> usr/sbin
drwxr-xr-x 2 root root 4096 Nov 2 20:43 srv
dr-xr-xr-x 13 root root 0 Feb 24 16:31 sys
drwxrwxrwt 2 root root 40 Feb 24 16:48 tmp
drwxr-xr-x 1 root root 4096 Nov 2 20:43 usr
drwxr-xr-x 11 root root 4096 Feb 24 16:31 var

Now that’s scary: anyone with permission to create a pod in the cluster can get root access to the cluster nodes! 😱

Deploy Kyverno and Kyverno policies

In order to detect (and block) privileged pod creation requests, we will deploy the Kyverno policy engine using Helm.

Note that blocking all privileged pods is not a viable solution, as some pods in the cluster genuinely need elevated privileges (CNI, scheduler, api-server, etc.); we will come back to this later.

Let’s deploy Kyverno with a minimal configuration for now by running the command below:

helm upgrade --install --wait --timeout 15m --atomic \
--namespace kyverno --create-namespace \
--repo https://kyverno.github.io/kyverno kyverno kyverno \
--values - <<EOF
replicaCount: 3
EOF

This will deploy Kyverno but no policies. Fortunately, there is a second Helm chart that contains the Pod Security Standards policies.

Let’s deploy the Pod Security Standards policies by running the command below:

helm upgrade --install --wait --timeout 15m --atomic \
--namespace kyverno --create-namespace \
--repo https://kyverno.github.io/kyverno kyverno-policies \
kyverno-policies --values - <<EOF
podSecurityStandard: restricted
validationFailureAction: enforce
EOF

Now Kyverno should be running and the Pod Security Standards policies should be deployed and effective; we can check this with:

$ kubectl get clusterpolicies.kyverno.io
NAME                             BACKGROUND   ACTION    READY
disallow-capabilities            true         enforce   true
disallow-capabilities-strict     true         enforce   true
disallow-host-namespaces         true         enforce   true
disallow-host-path               true         enforce   true
disallow-host-ports              true         enforce   true
disallow-host-process            true         enforce   true
disallow-privilege-escalation    true         enforce   true
disallow-privileged-containers   true         enforce   true
disallow-proc-mount              true         enforce   true
disallow-selinux                 true         enforce   true
require-run-as-non-root-user     true         enforce   true
require-run-as-nonroot           true         enforce   true
restrict-apparmor-profiles       true         enforce   true
restrict-seccomp                 true         enforce   true
restrict-seccomp-strict          true         enforce   true
restrict-sysctls                 true         enforce   true
restrict-volume-types            true         enforce   true

Attempt to run a node shell attack again

With Kyverno running and the Pod Security Standards policies effective, let’s try the node shell attack again:

$ kubectl node-shell kind-control-plane
spawning "nsenter-xzw7ab" on "kind-control-plane"
Error from server: admission webhook "validate.kyverno.svc-fail" denied the request:

resource Pod/default/nsenter-xzw7ab was blocked due to the following policies

disallow-capabilities-strict:
require-drop-all: 'validation failure: Containers must drop `ALL` capabilities.'
disallow-host-namespaces:
host-namespaces: 'validation error: Sharing the host namespaces is disallowed. The
fields spec.hostNetwork, spec.hostIPC, and spec.hostPID must be unset or set to
`false`. Rule host-namespaces failed at path /spec/hostNetwork/'
disallow-privilege-escalation:
privilege-escalation: 'validation error: Privilege escalation is disallowed. The
fields spec.containers[*].securityContext.allowPrivilegeEscalation, spec.initContainers[*].securityContext.allowPrivilegeEscalation,
and spec.ephemeralContainers[*].securityContext.allowPrivilegeEscalation must
be set to `false`. Rule privilege-escalation failed at path /spec/containers/0/securityContext/allowPrivilegeEscalation/'
disallow-privileged-containers:
privileged-containers: 'validation error: Privileged mode is disallowed. The fields
spec.containers[*].securityContext.privileged and spec.initContainers[*].securityContext.privileged
must be unset or set to `false`. Rule privileged-containers failed at path /spec/containers/0/securityContext/privileged/'
require-run-as-nonroot:
run-as-non-root: 'validation error: Running as root is not allowed. Either the field
spec.securityContext.runAsNonRoot must be set to `true`, or the fields spec.containers[*].securityContext.runAsNonRoot,
spec.initContainers[*].securityContext.runAsNonRoot, and spec.ephemeralContainers[*].securityContext.runAsNonRoot
must be set to `true`. Rule run-as-non-root[0] failed at path /spec/securityContext/runAsNonRoot/.
Rule run-as-non-root[1] failed at path /spec/containers/0/securityContext/runAsNonRoot/.'
restrict-seccomp-strict:
check-seccomp-strict: 'validation error: Use of custom Seccomp profiles is disallowed.
The fields spec.securityContext.seccompProfile.type, spec.containers[*].securityContext.seccompProfile.type,
spec.initContainers[*].securityContext.seccompProfile.type, and spec.ephemeralContainers[*].securityContext.seccompProfile.type
must be set to `RuntimeDefault` or `Localhost`. Rule check-seccomp-strict[0] failed
at path /spec/securityContext/seccompProfile/. Rule check-seccomp-strict[1] failed
at path /spec/containers/0/securityContext/seccompProfile/.'

At first sight this looks good: Kyverno detected and denied the privileged pod creation request.

However, there might be configuration somewhere that explicitly allows certain privileged pods to run, since some system pods need such permissions.

Kyverno stores its configuration in the kyverno config map; we can look at it with:

$ kubectl describe cm -n kyverno kyverno
Name:         kyverno
Namespace: kyverno
Labels: app=kyverno
app.kubernetes.io/component=kyverno
app.kubernetes.io/instance=kyverno
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=kyverno
app.kubernetes.io/part-of=kyverno
app.kubernetes.io/version=v2.3.0
helm.sh/chart=kyverno-v2.3.0
Annotations: meta.helm.sh/release-name: kyverno
meta.helm.sh/release-namespace: kyverno
Data
====
generateSuccessEvents:
----
false
resourceFilters:
----
[Event,*,*][*,kube-system,*][*,kube-public,*][*,kube-node-lease,*][Node,*,*][APIService,*,*][TokenReview,*,*][SubjectAccessReview,*,*][SelfSubjectAccessReview,*,*][*,kyverno,*][Binding,*,*][ReplicaSet,*,*][ReportChangeRequest,*,*][ClusterReportChangeRequest,*,*]

From the config map above, we can see that Kyverno is configured to ignore all requests targeting the kube-system, kube-public, kube-node-lease and kyverno namespaces.
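Each filter entry follows the [Kind,Namespace,Name] format, where * is a wildcard. A few entries from the list above, annotated:

[Event,*,*]        # ignore Event resources in every namespace
[*,kube-system,*]  # ignore every resource kind in the kube-system namespace
[Node,*,*]         # ignore Node resources (cluster-scoped)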

With this in mind, running the node shell attack in one of the whitelisted namespaces should succeed:

$ kubectl node-shell kind-control-plane -n kube-system
spawning "nsenter-wwsbcz" on "kind-control-plane"
If you don't see a command prompt, try pressing enter.
root@kind-control-plane:/# ls -la
total 60
drwxr-xr-x 1 root root 4096 Feb 24 16:31 .
drwxr-xr-x 1 root root 4096 Feb 24 16:31 ..
-rwxr-xr-x 1 root root 0 Feb 24 16:31 .dockerenv
lrwxrwxrwx 1 root root 7 Nov 2 20:43 bin -> usr/bin
drwxr-xr-x 2 root root 4096 Oct 11 08:39 boot
drwxr-xr-x 17 root root 4440 Feb 24 16:31 dev
drwxr-xr-x 1 root root 4096 Feb 24 16:31 etc
drwxr-xr-x 2 root root 4096 Oct 11 08:39 home
drwxr-xr-x 1 root root 4096 Feb 24 16:31 kind
lrwxrwxrwx 1 root root 7 Nov 2 20:43 lib -> usr/lib
lrwxrwxrwx 1 root root 9 Nov 2 20:43 lib32 -> usr/lib32
lrwxrwxrwx 1 root root 9 Nov 2 20:43 lib64 -> usr/lib64
lrwxrwxrwx 1 root root 10 Nov 2 20:43 libx32 -> usr/libx32
drwxr-xr-x 2 root root 4096 Nov 2 20:43 media
drwxr-xr-x 2 root root 4096 Nov 2 20:43 mnt
drwxr-xr-x 1 root root 4096 Jan 26 08:06 opt
dr-xr-xr-x 516 root root 0 Feb 24 16:31 proc
drwx------ 1 root root 4096 Feb 24 17:07 root
drwxr-xr-x 11 root root 240 Feb 24 16:32 run
lrwxrwxrwx 1 root root 8 Nov 2 20:43 sbin -> usr/sbin
drwxr-xr-x 2 root root 4096 Nov 2 20:43 srv
dr-xr-xr-x 13 root root 0 Feb 24 16:31 sys
drwxrwxrwt 2 root root 40 Feb 24 17:30 tmp
drwxr-xr-x 1 root root 4096 Nov 2 20:43 usr
drwxr-xr-x 11 root root 4096 Feb 24 16:31 var

Damn, it looks like the default configuration is not secure enough to prevent the node shell attack.

One could argue that creating pods in the kube-system namespace should not be authorized in the first place, but that doesn’t mean our Kyverno setup shouldn’t be hardened where possible.

Harden Kyverno setup

We saw that Kyverno filters the resources considered by the admission controller and ignores everything in a handful of namespaces.

That leaves doors open that can be exploited, so the first thing we can do to harden Kyverno is to remove the filters:

helm upgrade --install --wait --timeout 15m --atomic --namespace kyverno --create-namespace \
--repo https://kyverno.github.io/kyverno kyverno kyverno \
--values - <<EOF
replicaCount: 3
config:
  resourceFilters: []
EOF

Now, no more resources will be filtered and every resource will be considered by Kyverno.
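One way to double-check is to read the config map back (the previously long filter list should be gone; depending on how the chart renders an empty list you may see [] or an empty string):

$ kubectl get cm -n kyverno kyverno -o jsonpath='{.data.resourceFilters}'
[]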

If we try to run a node shell attack in the kube-system namespace, it will now fail.

But now we have another issue: the system pods that live in the kube-system namespace won’t be able to restart, as they violate the Pod Security Standards policies.

We need to find another way to allow those system pods to get through our Kyverno policies.

Fortunately, most of the system pods are static pods, and static pods are created by the kubelet with credentials that belong to the system:nodes group.

We can add an exclude statement in our policies to allow requests coming from a user that belongs to the system:nodes group.

In the same spirit, we also need to allow all service accounts that live in the kube-system namespace, and we can use the system:serviceaccounts:kube-system group for that.
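Concretely, the chart renders something like the following exclude block into each ClusterPolicy rule (a sketch following Kyverno’s match/exclude schema):

exclude:
  any:
  - subjects:
    - kind: Group
      name: system:nodes
    - kind: Group
      name: system:serviceaccounts:kube-system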

Let’s deploy our Kyverno policies taking the observations above into account:

helm upgrade --install --wait --timeout 15m --atomic \
--namespace kyverno --create-namespace \
--repo https://kyverno.github.io/kyverno kyverno-policies \
kyverno-policies --values - <<EOF
podSecurityStandard: restricted
validationFailureAction: enforce
background: false
policyExclude:
  disallow-capabilities:
    any:
    - subjects:
      - kind: Group
        name: system:nodes
      - kind: Group
        name: system:serviceaccounts:kube-system
  disallow-capabilities-strict:
    any:
    - subjects:
      - kind: Group
        name: system:nodes
      - kind: Group
        name: system:serviceaccounts:kube-system
      - kind: Group
        name: system:serviceaccounts:kyverno
  disallow-host-namespaces:
    any:
    - subjects:
      - kind: Group
        name: system:nodes
      - kind: Group
        name: system:serviceaccounts:kube-system
  disallow-host-path:
    any:
    - subjects:
      - kind: Group
        name: system:nodes
      - kind: Group
        name: system:serviceaccounts:kube-system
  disallow-host-ports:
    any:
    - subjects:
      - kind: Group
        name: system:nodes
      - kind: Group
        name: system:serviceaccounts:kube-system
  disallow-host-process:
    any:
    - subjects:
      - kind: Group
        name: system:nodes
      - kind: Group
        name: system:serviceaccounts:kube-system
  disallow-privilege-escalation:
    any:
    - subjects:
      - kind: Group
        name: system:nodes
      - kind: Group
        name: system:serviceaccounts:kube-system
  disallow-privileged-containers:
    any:
    - subjects:
      - kind: Group
        name: system:nodes
      - kind: Group
        name: system:serviceaccounts:kube-system
  disallow-proc-mount:
    any:
    - subjects:
      - kind: Group
        name: system:nodes
      - kind: Group
        name: system:serviceaccounts:kube-system
  disallow-selinux:
    any:
    - subjects:
      - kind: Group
        name: system:nodes
      - kind: Group
        name: system:serviceaccounts:kube-system
  require-run-as-non-root-user:
    any:
    - subjects:
      - kind: Group
        name: system:nodes
      - kind: Group
        name: system:serviceaccounts:kube-system
  require-run-as-nonroot:
    any:
    - subjects:
      - kind: Group
        name: system:nodes
      - kind: Group
        name: system:serviceaccounts:kube-system
  restrict-apparmor-profiles:
    any:
    - subjects:
      - kind: Group
        name: system:nodes
      - kind: Group
        name: system:serviceaccounts:kube-system
  restrict-seccomp:
    any:
    - subjects:
      - kind: Group
        name: system:nodes
      - kind: Group
        name: system:serviceaccounts:kube-system
  restrict-seccomp-strict:
    any:
    - subjects:
      - kind: Group
        name: system:nodes
      - kind: Group
        name: system:serviceaccounts:kube-system
  restrict-sysctls:
    any:
    - subjects:
      - kind: Group
        name: system:nodes
      - kind: Group
        name: system:serviceaccounts:kube-system
  restrict-volume-types:
    any:
    - subjects:
      - kind: Group
        name: system:nodes
      - kind: Group
        name: system:serviceaccounts:kube-system

EOF

It takes some work to add an exclusion statement for each policy. While tedious and repetitive, this could easily be improved in the chart itself and is probably worth the effort.

Note that we also need to disable background scanning (background: false), as we can’t match on subjects in the background; it has to be done at the time of the request.
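In the rendered policies this simply translates to the background flag on each policy spec, roughly:

spec:
  validationFailureAction: enforce
  background: false  # subject/userInfo data only exists at admission time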

Attempt to run a node shell attack again

Unsurprisingly, running a node shell attack fails this time:

$ kubectl node-shell kind-control-plane -n kube-system
spawning "nsenter-dz6d2e" on "kind-control-plane"
Error from server: admission webhook "validate.kyverno.svc-fail" denied the request:

resource Pod/kube-system/nsenter-dz6d2e was blocked due to the following policies

disallow-capabilities-strict:
require-drop-all: 'validation failure: Containers must drop `ALL` capabilities.'
disallow-host-namespaces:
host-namespaces: 'validation error: Sharing the host namespaces is disallowed. The
fields spec.hostNetwork, spec.hostIPC, and spec.hostPID must be unset or set to
`false`. Rule host-namespaces failed at path /spec/hostNetwork/'
disallow-privilege-escalation:
privilege-escalation: 'validation error: Privilege escalation is disallowed. The
fields spec.containers[*].securityContext.allowPrivilegeEscalation, spec.initContainers[*].securityContext.allowPrivilegeEscalation,
and spec.ephemeralContainers[*].securityContext.allowPrivilegeEscalation must
be set to `false`. Rule privilege-escalation failed at path /spec/containers/0/securityContext/allowPrivilegeEscalation/'
disallow-privileged-containers:
privileged-containers: 'validation error: Privileged mode is disallowed. The fields
spec.containers[*].securityContext.privileged and spec.initContainers[*].securityContext.privileged
must be unset or set to `false`. Rule privileged-containers failed at path /spec/containers/0/securityContext/privileged/'
require-run-as-nonroot:
run-as-non-root: 'validation error: Running as root is not allowed. Either the field
spec.securityContext.runAsNonRoot must be set to `true`, or the fields spec.containers[*].securityContext.runAsNonRoot,
spec.initContainers[*].securityContext.runAsNonRoot, and spec.ephemeralContainers[*].securityContext.runAsNonRoot
must be set to `true`. Rule run-as-non-root[0] failed at path /spec/securityContext/runAsNonRoot/.
Rule run-as-non-root[1] failed at path /spec/containers/0/securityContext/runAsNonRoot/.'
restrict-seccomp-strict:
check-seccomp-strict: 'validation error: Use of custom Seccomp profiles is disallowed.
The fields spec.securityContext.seccompProfile.type, spec.containers[*].securityContext.seccompProfile.type,
spec.initContainers[*].securityContext.seccompProfile.type, and spec.ephemeralContainers[*].securityContext.seccompProfile.type
must be set to `RuntimeDefault` or `Localhost`. Rule check-seccomp-strict[0] failed
at path /spec/securityContext/seccompProfile/. Rule check-seccomp-strict[1] failed
at path /spec/containers/0/securityContext/seccompProfile/.'

The attack was blocked, even with kubernetes-admin access, while keeping a working cluster and preserving system pod permissions. 🎉

It makes complete sense that users are not allowed to create privileged pods; creating resources in a cluster directly should not be the responsibility of an end user anyway.

This clearly demonstrates how policies can be applied differently when a request comes from a user or a service account.

The bad news

Sadly, there is bad news to this story… what I presented here is not feasible yet:

  • Changing the background mode is not yet supported in the kyverno-policies helm chart (the PR has been merged though)
  • There is a bug in the kyverno helm chart that sets securityContext.capabilities.drop to all instead of ALL (this has been fixed too, but not released yet)

Once those fixes ship, implementing the Pod Security Standards as described here will be fully possible, and I will update this story accordingly.

Wrapping it up

Kubernetes security can be a big challenge, and tools like Kyverno help simplify deploying and maintaining security policies inside an organization.

Although it takes some work to understand how everything fits together and to make sure it doesn’t break existing workloads, this is an extremely important aspect of running a secure Kubernetes cluster.

Kyverno policies can run in audit mode instead of enforce, which allows testing policies without breaking existing workloads. In audit mode Kyverno generates warning events but won’t deny requests.
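Switching to audit mode is just a matter of changing one chart value, for example:

helm upgrade --install --wait --timeout 15m --atomic \
--namespace kyverno --create-namespace \
--repo https://kyverno.github.io/kyverno kyverno-policies \
kyverno-policies --values - <<EOF
podSecurityStandard: restricted
validationFailureAction: audit
EOF

That way, you can review the reported violations before flipping validationFailureAction back to enforce.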
