Google Kubernetes Engine: default deployment security problems

Nov 30, 2017

In last week’s story, I presented some security problems with Kops default deployments and how you can fix them.

A default deployment of a Kubernetes cluster using Google Kubernetes Engine suffers from two flaws of the same kind, which you must know about and fix before going to production.

Default Google Kubernetes Engine installation

Creating a cluster with GKE is straightforward with the gcloud command-line utility:

$ gcloud container clusters create mycluster
Creating cluster mycluster...
NAME       ZONE            MASTER_VERSION  MASTER_IP      MACHINE_TYPE   NODE_VERSION  NUM_NODES  STATUS
mycluster  europe-west1-d  1.7.8-gke.0     35.187.97.137  n1-standard-1  1.7.8-gke.0   3          RUNNING

At the time of writing, this installs Kubernetes 1.7.8. It takes a couple of minutes for everything to be ready, and after that you get a cluster hosted on Google infrastructure that is very easy to manage, upgrade and scale. This is my favorite way to get a Kubernetes cluster up and running, and also to administer one: every operation is a click away, and the QA Google performs for each feature makes upgrading your cluster a breeze.

That being said, if you care about security in depth (and you should), the default setup suffers from two major problems.
I strongly advise you to correct these problems before going to production.

Problem 1: No RBAC by default

The problem is the same for GKE as for Kops. By default, RBAC isn’t enabled yet. This means that anybody who gets access to one of your pods automatically gets access to the whole cluster: they can effortlessly jump from container to container, delete everything that is running, and read secrets.

Demonstration:

# Start a container and log into it
$ kubectl run --image=google/cloud-sdk gcloud -- sleep infinity
$ kubectl exec -ti gcloud-290-v51rv bash
root@gcloud-290-v51rv:/#

From the container, list other pods:

root@gcloud-290-v51rv:/# kubectl get pods --all-namespaces -o name
pods/gcloud-290-v51rv
pods/event-exporter-v0.1.7-958884745-m2hk6
pods/fluentd-gcp-v2.0.9-bz5hl
pods/fluentd-gcp-v2.0.9-ctlh6
pods/fluentd-gcp-v2.0.9-pwlz2
pods/heapster-v1.4.3-1096258942-bhpnq
pods/kube-dns-3092422022-h8w4n
pods/kube-dns-3092422022-vnkrl
pods/kube-dns-autoscaler-97162954-7m8zr
pods/kube-proxy-gke-mycluster-default-pool-cb7b0ee6-q2c0
pods/kube-proxy-gke-mycluster-default-pool-cb7b0ee6-v2c3
pods/kube-proxy-gke-mycluster-default-pool-cb7b0ee6-xcxn
pods/kubernetes-dashboard-1914926742-2p8hz
pods/l7-default-backend-1798834265-l6xsw

From our pod, run a command on another pod:

root@gcloud-290-v51rv:/# kubectl -n kube-system exec -ti -c event-exporter event-exporter-v0.1.7-958884745-m2hk6 ls
bin   dev  event-exporter  lib   media  opt   root  sbin  sys usr
boot  etc  home     lib64  mnt  proc  run   srv   tmp var

Fix: always enable RBAC on your cluster. When you create the cluster, add --no-enable-legacy-authorization to the command so that RBAC is enabled. This won’t be necessary from Kubernetes 1.8 onward, where the option is enabled by default. See the Google Cloud documentation for more details.
Note: if you are upgrading an existing cluster to 1.8, you still have to run gcloud with the --no-enable-legacy-authorization option to get a cluster with RBAC enabled after the upgrade.
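
For an existing cluster, the check and the switch could be scripted roughly as follows. This is a minimal sketch, not an official procedure: the function names are made up for illustration, the cluster name and zone are the ones from the example above, and it assumes that gcloud prints the legacyAbac.enabled field as True when legacy authorization is on.

```shell
#!/usr/bin/env bash
# Sketch: detect legacy (ABAC) authorization on a GKE cluster and turn it off.
# CLUSTER and ZONE default to the example cluster used in this article.
CLUSTER="${CLUSTER:-mycluster}"
ZONE="${ZONE:-europe-west1-d}"

# Prints the value of legacyAbac.enabled (assumed to be "True" when enabled,
# empty or "False" otherwise).
legacy_abac_status() {
  gcloud container clusters describe "$CLUSTER" --zone "$ZONE" \
    --format="value(legacyAbac.enabled)"
}

# Disables legacy authorization so that only RBAC decides what is allowed.
disable_legacy_abac() {
  gcloud container clusters update "$CLUSTER" --zone "$ZONE" \
    --no-enable-legacy-authorization
}

# Hypothetical helper: only touch the cluster when the flag is still on.
ensure_rbac_only() {
  local status
  status="$(legacy_abac_status | tr '[:upper:]' '[:lower:]')"
  if [ "$status" = "true" ]; then
    echo "legacy authorization is enabled on $CLUSTER, disabling it"
    disable_legacy_abac
  else
    echo "legacy authorization is already disabled on $CLUSTER"
  fi
}
```

After sourcing the file, running ensure_rbac_only against a test cluster first is a good idea: workloads that silently relied on the permissive default will start getting Forbidden errors.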

Create the cluster with RBAC, log into a pod:

$ gcloud container clusters create --no-enable-legacy-authorization mycluster
Creating cluster mycluster...done.
$ kubectl run --image=google/cloud-sdk gcloud -- sleep infinity
deployment "gcloud" created
$ kubectl exec -ti gcloud-2908167891-sdb3p bash

The Kubernetes API Server now rejects the requests because the pod doesn’t have enough permissions:

root@gcloud-2908167891-sdb3p:/# kubectl get pods
Error from server (Forbidden): User "system:serviceaccount:default:default" cannot list pods in the namespace "default".: "Unknown user \"system:serviceaccount:default:default\"" (get pods)
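
Once RBAC is enforced, a pod that legitimately needs to list pods has to be granted that right explicitly. The sketch below shows one way to do it for the service account named in the error message. The role and binding names (pod-reader, default-pod-reader) and the function names are hypothetical, and the permissions are deliberately limited to read-only access to pods in a single namespace.

```shell
#!/usr/bin/env bash
# Sketch: give one service account read-only access to pods in one namespace.
NAMESPACE="${NAMESPACE:-default}"
SERVICE_ACCOUNT="${SERVICE_ACCOUNT:-default}"

grant_pod_read() {
  # Role restricted to read-only verbs on pods (name is illustrative)
  kubectl create role pod-reader \
    --namespace "$NAMESPACE" --verb=get,list,watch --resource=pods
  # Bind that role to the service account used by the pod
  kubectl create rolebinding default-pod-reader \
    --namespace "$NAMESPACE" --role=pod-reader \
    --serviceaccount="$NAMESPACE:$SERVICE_ACCOUNT"
}

# Asks the API server whether the service account may list pods
# (kubectl prints "yes" or "no").
can_list_pods() {
  kubectl auth can-i list pods --namespace "$NAMESPACE" \
    --as="system:serviceaccount:$NAMESPACE:$SERVICE_ACCOUNT"
}
```

Running can_list_pods before and after grant_pod_read is a quick way to confirm that the binding does what you expect, without having to exec into a pod.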

Problem 2: Node scope includes Read/Write on Compute Engine

The second issue is the Google Kubernetes Engine equivalent of the AWS Metadata problem with Kops: by default, your instances have Read/Write permissions on Compute Engine, and pods running on them can ask the Google Metadata API server for an access token carrying the permissions of the instance. This means that on a default GKE cluster, pods are able to connect to or destroy any instance (VM) running in the project.
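
To make the mechanism concrete, here is a rough sketch of what any process inside a pod can do with plain curl. The metadata paths and the Metadata-Flavor header are those of the Compute Engine metadata server; the function names are hypothetical, and the sketch assumes the node uses its default service account.

```shell
#!/usr/bin/env bash
# Sketch: query the metadata server from inside a pod.
METADATA_URL="http://metadata.google.internal/computeMetadata/v1"
SA_PATH="instance/service-accounts/default"

metadata_get() {
  # The metadata server rejects requests without this header
  curl -s -H "Metadata-Flavor: Google" "$METADATA_URL/$1"
}

# OAuth scopes granted to the node's service account, one per line
node_scopes() { metadata_get "$SA_PATH/scopes"; }

# JSON document holding a short-lived access token for those scopes
node_token() { metadata_get "$SA_PATH/token"; }

# Succeeds when the Compute Engine read/write scope is present
# (-x matches the whole line, so compute.readonly does not count).
has_compute_rw() {
  node_scopes | grep -qxF "https://www.googleapis.com/auth/compute"
}
```

If has_compute_rw succeeds from inside one of your pods, that pod can obtain a token able to modify Compute Engine resources in the project.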

Demonstration:

Start a pod with gcloud and log into it:

$ kubectl run --image=google/cloud-sdk gcloud -- sleep infinity
$ kubectl exec -ti gcloud-290-ckfpq bash
root@gcloud-290-ckfpq:/#

From the pod, list the instances running on the project:

root@gcloud-290-ckfpq:/# gcloud compute instances list
NAME                                        ZONE            MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP     STATUS
gke-mycluster-default-pool-cb7b0ee6-q2c0    europe-west1-d  n1-standard-1               10.132.0.4   35.205.194.113  RUNNING
gke-mycluster-default-pool-cb7b0ee6-v2c3    europe-west1-d  n1-standard-1               10.132.0.5   35.205.239.8    RUNNING
gke-mycluster-default-pool-cb7b0ee6-xcxn    europe-west1-d  n1-standard-1               10.132.0.3   35.190.200.121  RUNNING
instance-1                                  europe-west1-d  n1-standard-1               10.132.0.6   35.195.48.69    RUNNING

Example 1: from the pod, SSH into instance-1 (a VM outside our GKE cluster):

root@gcloud-290-ckfpq:/# gcloud compute ssh joss@instance-1
Updating project ssh metadata...
joss@instance-1:~$

Example 2: from the pod, delete the whole cluster (Warning: this kills the cluster. Don’t do it):

root@gcloud-290-ckfpq:/# gcloud compute instance-groups list
NAME                                       LOCATION        SCOPE  NETWORK  MANAGED  INSTANCES
gke-mycluster-default-pool-cb7b0ee6-grp    europe-west1-d  zone   default  Yes      3

root@gcloud-290-ckfpq:/# gcloud compute instance-groups managed delete gke-mycluster-default-pool-cb7b0ee6-grp
The following instance group managers will be deleted:
 - [gke-mycluster-default-pool-cb7b0ee6-grp] in [europe-west1-d]

Do you want to continue (Y/n)? y

Deleting autoscalers...done.
Deleting Managed Instance Group...
# Logs stop here: the instance the pod is running on has been destroyed

Being able to kill the whole cluster from a pod is pretty bad. The escalation from an application breach to a full disaster is immediate.

Fix: add network filtering rules to prevent pods from reaching the Metadata API Server. You can do this by adding a Network Policy that blocks egress traffic to the metadata server. I’m not sure what the consequences of removing the “Compute Read/Write” scope from the instance authorizations would be.

Details and discussion about this issue can be found at #8867. An option to forbid pods from accessing the metadata server was merged a month ago and should be available with Kubernetes 1.9.
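
As an illustration of the Network Policy approach, the sketch below generates a policy that allows all egress traffic from the pods of one namespace except traffic to the metadata server address, 169.254.169.254. It rests on several assumptions: the cluster runs Kubernetes 1.8 or later (egress rules are not available before that), network policy enforcement is actually enabled on the cluster, and the policy name deny-metadata-egress and the function name are made up for the example. Pods using the host network are not covered by Network Policies.

```shell
#!/usr/bin/env bash
# Sketch: print a NetworkPolicy that blocks egress to the metadata server.
NAMESPACE="${NAMESPACE:-default}"
METADATA_IP="169.254.169.254"

metadata_egress_policy() {
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-metadata-egress
  namespace: $NAMESPACE
spec:
  # An empty selector applies the policy to every pod in the namespace
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  # Allow everything except the metadata server address
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
        except:
        - $METADATA_IP/32
EOF
}
```

The manifest can be reviewed first and then applied with metadata_egress_policy | kubectl apply -f -. Since a Network Policy is namespaced, the same step has to be repeated for every namespace that runs untrusted workloads.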

Conclusion

If you deployed your cluster using the default configuration, please take another look at it and plan an operation to fix these problems, or at least create a ticket to investigate them. You don’t want your cluster to disappear at the first breach in your system.

General security should get better over time for Kubernetes: RBAC has been out of beta since 1.8 and a fix for #8867 has been merged. It’s now a matter of finding a balance between safe and convenient defaults.

You may be interested in the same class of issues for Kubernetes clusters deployed with Kops: read here.

Written by Josselin Costanzi