Securing Kubernetes with K-rail

Workload policy enforcement in Kubernetes to manage security at speed

Oct 30, 2019

Written by Dustin Decker and Sam “Frenchie” Stewart

Safety and security are core values at Cruise, and for good reason: we are in a race for trust, as much as we are in a race for tech. “Move fast and break things” doesn’t cut it when dealing with safety-critical systems. (Quick plug: if you’d like to join the trust race, we are hiring.)

Previously, we have written about how we manage secrets across many services in GCP, AWS and on-prem, as well as how we are building, securing, and networking our multi-tenant Platform-as-a-Service (PaaS).

In this post, we will be addressing some common security misconfigurations in Kubernetes and Google Kubernetes Engine (GKE) beyond identity, secrets, and access management. We’re also announcing the release of k-rail, a tool we’ve built to help manage security in Kubernetes while maintaining high developer productivity.

As a workload policy enforcement tool for Kubernetes, k-rail enables you to:

  • Measure policy violations before and after enforcement
  • Use flexible and easy-to-use policy exemptions
  • Use many impactful policies out of the box
  • Give users real-time, interactive feedback when they apply resources, even high-level resources such as Deployments, DaemonSets, and CronJobs

Privilege escalation in Kubernetes

The most common ways to break out of Pods in Kubernetes have nothing to do with 0-day runtime or kernel exploits; instead they use standard Pod features like host mounts, host PID namespace sharing, or host networking. Kubernetes is a powerful and flexible tool, and the complexity of its configuration can often open routes for privilege escalation. Kubernetes Role-Based Access Control (RBAC) is an important security component that allows for granular access to resources and APIs, and Cruise has previously shared tooling to sync Google Groups to Kubernetes RBAC, which can help you manage RBAC for your engineers.

But even with robust RBAC implemented, if you have access to run Pods on Kubernetes, you likely have a few easy routes of privilege escalation available to you. These privilege escalations can lead to the compromise of every workload on the node, or in some cases the entire cluster and beyond.

Below are a few simple examples of common security misconfigurations.

Compromising environment secrets on a node

One path of exploitation involves hostPath volumes, also known as bind mounts. HostPath volumes allow you to mount directories from the host system into your container. Allowing them can give containers access to powerful application and kernel APIs as well as full access to filesystems on the host. Host bind mounts can enable an attacker to exfiltrate any data on the node, intercept syscalls, load malicious kernel modules, dump process memory, and escalate privileges further on the host and in the cluster.

In this example, the vulnerable configuration mounts the host’s root filesystem, which is then used to read environment variables from every process ID (PID) in the host’s /proc filesystem.
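Here is a minimal sketch of what that misconfiguration can look like; the Pod name, image, and exact commands are illustrative, not taken from our demo:

```yaml
# Illustrative only: a Pod that bind-mounts the node's root filesystem and
# dumps the environment variables of every process running on the host.
apiVersion: v1
kind: Pod
metadata:
  name: hostpath-env-dump
spec:
  containers:
    - name: shell
      image: alpine
      command: ["/bin/sh", "-c"]
      args:
        # /proc/<pid>/environ is NUL-delimited; translate NULs to newlines
        - for e in /host/proc/[0-9]*/environ; do tr '\0' '\n' < $e; done
      volumeMounts:
        - name: hostroot
          mountPath: /host
          readOnly: true
  volumes:
    - name: hostroot
      hostPath:
        path: /
```

Because the bind mount exposes the host’s /proc, the container can read secrets passed as environment variables to any process on the node.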

Taking over the host Docker daemon

Mounting the host Docker daemon socket is another common misconfiguration that can easily lead to full system access. Once you have access to the Docker socket, you can tamper with existing containers or spawn a highly privileged container with host mounts to gain further elevated access to the host system.
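A hedged sketch of that misconfiguration, with an illustrative Pod name and image:

```yaml
# Illustrative only: mounting the node's Docker socket lets this Pod drive the
# host's Docker daemon directly.
apiVersion: v1
kind: Pod
metadata:
  name: docker-sock
spec:
  containers:
    - name: docker-cli
      image: docker
      command: ["sleep", "3600"]
      volumeMounts:
        - name: docker-sock
          mountPath: /var/run/docker.sock
  volumes:
    - name: docker-sock
      hostPath:
        path: /var/run/docker.sock
```

From a shell in that container, something like `docker run --privileged -v /:/host -it alpine` runs against the host daemon and yields a root-equivalent foothold on the node.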

Becoming cluster-admin on Google Kubernetes Engine

Within Kubernetes RBAC, a role that grants full access to the cluster’s resources and APIs is typically referred to as cluster-admin. Often, at least one user or a Kubernetes Service Account attached to a system Pod is bound to this role. While common, this is certainly not good practice: if you are able to obtain that credential, it is game over.
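For illustration, a binding like the one below (the names are hypothetical) is all it takes to hand cluster-admin to a workload’s Service Account; every Pod running as that account then carries a full-access credential:

```yaml
# Illustrative only: binding the cluster-admin ClusterRole to a Service Account.
# Any Pod running as this Service Account receives a token with full cluster access.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system-component-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: system-component   # hypothetical system workload
    namespace: kube-system
```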

In GKE, the GCE metadata API’s kube-env endpoint is used to bootstrap Kubernetes nodes and provides a credential that can get an mTLS certificate signed for any node in the cluster, which enables an attacker to impersonate any worker node. Since each node can access the secrets for the workloads assigned to it, a little iteration will quickly dump all cluster secrets. Those secrets include Kubernetes Service Account credentials and will likely be enough to obtain cluster-admin and pivot throughout the rest of the organization’s infrastructure.

One important thing to note is that GCP recommends using a feature called metadata concealment, which hides certain Metadata API endpoints (including kube-env) from Pods. However, if you have access to a Pod with host networking or can create one, metadata concealment can be bypassed.
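To illustrate the bypass, a Pod with hostNetwork: true issues its metadata requests from the node’s network namespace, so the concealment proxy never sees them. A sketch, with an illustrative Pod name and image:

```yaml
# Illustrative only: with hostNetwork, metadata requests originate from the
# node's network namespace and bypass the metadata concealment proxy.
apiVersion: v1
kind: Pod
metadata:
  name: kube-env-read
spec:
  hostNetwork: true
  containers:
    - name: curl
      image: curlimages/curl
      command: ["/bin/sh", "-c"]
      args:
        - >-
          curl -s -H 'Metadata-Flavor: Google'
          http://metadata.google.internal/computeMetadata/v1/instance/attributes/kube-env
```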

The demo below uses Brad Geesaman’s kube-env-stealer to automate the extraction of a cluster’s secrets using the kube-env vulnerability (Brad is great, follow him on Twitter).

Dumping all of a GKE cluster’s secrets in seconds

The remediation problem

As with a lot of security challenges, the choice isn’t whether to fix these misconfigurations, but how to go about it. To make matters more complicated, Cruise has historically run many single- and multi-tenant clusters with a wide range of workloads and users. Many of the current workloads are short-lived jobs, making it difficult to know exactly what is running and what the requirements are at any given point in time. One of Cruise’s busier clusters has ~18,000 Pods across 68 Namespaces. It was vitally important to enable enforcement without breaking services and disrupting engineering productivity.

Tools such as Sysdig’s Kube PSP Advisor can help generate a PodSecurityPolicy (PSP) from point-in-time analysis of existing workloads. Unfortunately, this approach risks breaking the many intermittent workloads that were not present during analysis. Managing PSPs for each workload profile seemed daunting, and binding policies only to subjects like users, groups, and service accounts was too limiting. Many times, we wanted to apply a policy directly to a particular workload without requiring a team to create and attach a Kubernetes Service Account that we could bind the policy to.

The other possible solution at the time, Open Policy Agent (OPA), is a generic policy engine that supports Kubernetes Validating Webhooks. We found usability issues with this approach: the example policies did not always provide feedback at apply time, only after later inspection of the resources. This is because the examples inspected the podSpec of a Pod, but not the podSpecs embedded in the more than a dozen higher-level APIs that manage Pods. When a user applied a Deployment that violated a policy, it would be accepted by the apiserver, but the ReplicaSet it created would then fail to create its Pods. That sort of failure is not easily visible to the user, hurting their user experience and developer productivity.

Deployment configurations ultimately manage the creation of Pods
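The Pod template that a policy needs to inspect is embedded inside the higher-level resource, so a policy that only examines Pod objects never sees it at apply time. A trimmed, illustrative Deployment shows where it sits:

```yaml
# Illustrative: the podSpec lives under spec.template.spec of the Deployment,
# so at apply time there is no Pod object for a Pod-only policy to reject.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:                 # <- the embedded podSpec
      hostNetwork: true   # <- violation invisible to Pod-only policies at apply time
      containers:
        - name: app
          image: example-image
```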

In addition to the usability issues, there were some other requirements that were not satisfied by the existing solutions:

  • We needed to have flexible and easy-to-use support for policy exemptions (e.g. by namespace, resource names, and users) so we could roll out enforcement without breaking existing workloads and workflows.
  • We needed to be able to have flexibility in our policy implementations to support things like image signing and have confidence in them with tests and strong typing against the Kubernetes API.
  • We needed good telemetry so that we knew we were making informed decisions about enforcement and could share with engineers for transparency, awareness, and to gain trust.
  • In some cases, we wanted to mutate resources being applied for operational and security reasons, so we required support for policies that could patch resources.

While we liked the approach of using validating webhooks, we didn’t really need a general-purpose policy engine. We needed a robust and simple webhook server tailored for Kubernetes workload policy enforcement that we could easily shape to our needs.

Introducing k-rail, a workload policy enforcement tool for Kubernetes

k-rail, also known as a Jersey barrier

As mentioned earlier, k-rail’s features include:

  • Telemetry that enables you to measure policy violations before and after enforcement
  • Flexible and easy to use policy exemptions
  • Many impactful policies out of the box
  • Real-time, interactive feedback to users when they apply resources, even high-level resources such as Deployments, DaemonSets, and CronJobs

k-rail works as the recipient of a Mutating Webhook from the kube-apiserver. When a user or system applies a resource against Kubernetes, the apiserver delegates acceptance to the webhook endpoint:

K-rail leverages Admission Webhooks in Kubernetes
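Concretely, the admission flow is wired up with a MutatingWebhookConfiguration that points the apiserver at the k-rail service. The sketch below is an assumption-laden illustration; the service name, namespace, and resource list are not taken from k-rail’s shipped manifests:

```yaml
# Illustrative sketch: register a mutating admission webhook so the apiserver
# sends admission requests for workload resources to the k-rail service.
# Names, namespace, and the resource list are assumptions for this example.
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: k-rail
webhooks:
  - name: k-rail.example.com
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Fail
    clientConfig:
      service:
        name: k-rail
        namespace: k-rail
        path: /
      # caBundle: <base64-encoded CA that signed the webhook's serving cert>
    rules:
      - apiGroups: ["", "apps", "batch"]
        apiVersions: ["*"]
        operations: ["CREATE", "UPDATE"]
        resources: ["pods", "deployments", "daemonsets", "jobs", "cronjobs"]
```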

Let’s see that same kube-env exploit again using the hostNetwork bypass for GKE metadata concealment, but now with k-rail in enforcement mode:

Our experience rolling out k-rail at scale

We started by running k-rail with all policies in report-only mode for a few weeks. This allowed us to collect violation data for even the shortest-lived jobs so that we could analyze the data to see what would be impacted.

k-rail logs violations in JSON format, so we used Stackdriver Log Exports to shove them into BigQuery, a data warehouse and analytics tool in GCP:

Export structured logs from Stackdriver to BigQuery for analysis

Once the data is in BigQuery, you can visualize it using Data Studio, which is also included in the GCP platform. We shared the raw data and a few dashboards with all engineers and sent out notifications about the incoming enforcement:

Workload policy violations by namespace, visible to all engineers

We then notified teams that enforcement would be happening in the future and provided them with the data and tooling. We were pleasantly surprised when many engineers were eager to remediate the workload policy violations themselves. When presented with the right information and tools, most engineers want to do the right thing.

Hi Infrasec,

Thanks for the heads up!

We have removed our bind mount dependency, so feel free to apply the enforcement.

Thanks,
$COLLEAGUE

In our experience, most of the violations were accidental misconfigurations or cruft from deployments that were no longer needed. These were easily remediated.

For the few remaining violations, we collected data to make policy exemptions so that we would not break any workflows or deployments when we enabled enforcement in each environment. We then specified the exemption configuration in git for deployment and to facilitate future remediation work:
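For illustration, an exemption entry looks roughly like the sketch below. The field and policy names are approximations from memory, not the authoritative schema; consult the k-rail README for the exact format:

```yaml
# Illustrative sketch of a k-rail policy exemption checked into git.
# Field names and policy names here are approximations, not the exact schema.
- resource_name: "legacy-agent*"   # hypothetical workload that still needs an exemption
  namespace: "monitoring"          # hypothetical namespace
  username: "*"
  group: "*"
  exempt_policies:
    - "pod_no_bind_mounts"         # hypothetical policy name
```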

Besides the real-time, interactive feedback that engineers receive, any enforced violations they encounter are also observable on another Data Studio dashboard that is visible to all of engineering:

k-rail’s results

To recap, we:

  • Covered some common privilege escalation pathways found in Kubernetes deployments
  • Highlighted some of the challenges security teams often face when trying to balance engineering productivity and enforcement when using existing tooling
  • Shared our solution to these challenges by announcing the release of k-rail, a tool for workload policy enforcement in Kubernetes to manage security at speed

With k-rail, we were able to safely and transparently roll out workload policy enforcement across development, staging, and production environments for over a dozen PaaS clusters and hundreds of engineers with many thousands of diverse workloads. The rollout took just over a week with no impact on developer productivity.

We hope that you’ll try out k-rail, share your own policies back upstream (PRs appreciated!), and use it to secure your own clusters.

If you would like to create your own tools or work with the Security team, join us.
