Building fast and scalable security controls: Deep-dive into Google Kubernetes Engine (GKE)

Anders Nordin
Published in Oda Product & Tech
Apr 29, 2021

tl;dr Oda is a fast-moving organization with a lot of ambition. In this blog post you’ll learn from some examples how we work with security in a non-blocking way to support our engineering teams. We will take a deep-dive into Google Kubernetes Engine (GKE), and look at two examples of how we implement security in a scalable way.

Security often has the reputation of being a showstopper: we tend to involve ourselves late in the development process and point out security issues that need to be fixed before the release. Adding security downstream like this can be very frustrating for the rest of the organization, and it’s the anti-goal for how we think about security in Oda.

In Oda we work in cross-functional product teams, where each team has a high degree of autonomy. The cybersecurity team simply cannot scale with the number of product teams, so we need to think smart when introducing new security controls at scale. The security tools we introduce must be easy to use and understand, and the information from the tools must also be easily available to the relevant people. We believe that if security is everyone’s responsibility, then everyone should be involved.

We utilize a lot of cloud services, and our main workload runs on Google Cloud Platform (GCP). Most of the applications and services we build are running on Kubernetes in GCP, on the Google Kubernetes Engine (GKE). In the next section, we will have a closer look at two concrete examples of security processes that have helped us to scale security.

Deep-dive: Security in Google Kubernetes Engine

GCP already has a guide that explains how you should secure your workload, involving mechanisms such as access control, network policies to limit pod-to-pod communication and audit logging. If you are not familiar with this you should read the guide before reading the next parts.

Sombrero

Product teams in Oda have strong ownership of their applications and tooling. We try to give them as much space as possible while still adhering to the principle of least privilege. To make onboarding for new teams as smooth as possible, we created Sombrero.

Before we jump into Sombrero, I want to explain the technology behind it. As we already use a lot of Python in Oda, it is natural for us to also want to use Python for provisioning our cloud infrastructure. For this we use cdktf, which is short for Cloud Development Kit for Terraform. It lets us define networks, clusters, IAM roles and everything else we need in order to provision our cloud environment. The code is written in Python, and cdktf then synthesizes it into Terraform configuration.

Sombrero is basically where we define our shared resources such as network configuration and the GCP project structure. Using cdktf and Python makes it easy for anyone in the organization to understand the setup and for us to quickly make changes.

With Sombrero we can easily onboard a new team with just a couple lines of code, and provision a new GCP project with the right permissions and resources. In short, all it takes is two lines of code:

settings['my-awesome-team'] = Team('my-awesome-team')
settings['my-awesome-team'].projects.append(Project('myapp-prod', ProjectType.Prod))
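
The Team, Project and ProjectType classes are internal to Sombrero and are not shown in this post. Purely as an illustration, a minimal sketch of what such a settings model could look like in plain Python follows. The field names, the `owners_group` naming convention, the `Staging` type and the `project_ids` helper are all assumptions made for this example, not Sombrero’s actual implementation.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class ProjectType(Enum):
    # Hypothetical environment types; the post only shows ProjectType.Prod.
    Prod = "prod"
    Staging = "staging"


@dataclass
class Project:
    name: str
    type: ProjectType


@dataclass
class Team:
    name: str
    projects: List[Project] = field(default_factory=list)

    @property
    def owners_group(self) -> str:
        # Assumed naming convention for the team's Google Group.
        return f"{self.name}@example.com"

    def project_ids(self) -> List[str]:
        # Names of all GCP projects that belong to this team.
        return [p.name for p in self.projects]


# The two onboarding lines from the post, applied to the sketch above.
settings: Dict[str, Team] = {}
settings["my-awesome-team"] = Team("my-awesome-team")
settings["my-awesome-team"].projects.append(Project("myapp-prod", ProjectType.Prod))
```

In a setup like this, the provisioning code would iterate over `settings` and create one GCP project, with its permissions, per `Project` entry.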

Simplifying Role-based Access Control (RBAC) in GKE

In Oda, the teams share the Kubernetes clusters, which makes setting up access control a bit more complicated. The built-in roles in GCP (e.g. Kubernetes Engine Admin, Developer and Viewer) give the user cluster-wide access. With many different teams involved, we wanted to prevent the teams from breaking things for each other and, at the same time, reduce the blast radius. However, we still want to give the teams autonomy to do their work, so we ended up with namespace-level access (which is pretty common).

Doing proper RBAC in Kubernetes can be quite cumbersome since it takes a lot of YAML files to bind roles to accounts. Therefore, we use RBAC Manager. Using RBAC Manager together with Google Groups greatly reduced the number of changes and YAML files. With this approach we only need to maintain one YAML file per cluster, which makes the setup a lot easier to maintain and understand.
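
To illustrate the idea, here is a minimal sketch that generates a single RBAC Manager definition for a cluster from a mapping of Google Groups to namespaces. The group addresses, namespaces and the `build_rbac_definition` helper are hypothetical. The manifest layout follows the RBACDefinition format described in the RBAC Manager documentation, so verify it against the version you run. The output is JSON, which is also valid YAML.

```python
import json
from typing import Dict, List


def build_rbac_definition(name: str, team_namespaces: Dict[str, List[str]],
                          cluster_role: str = "edit") -> dict:
    """Build one RBACDefinition binding each team's Google Group to its namespaces."""
    bindings = []
    for group, namespaces in sorted(team_namespaces.items()):
        bindings.append({
            # Use the local part of the group address as the binding name.
            "name": group.split("@")[0],
            "subjects": [{"kind": "Group", "name": group}],
            # One namespaced role binding per namespace the team owns.
            "roleBindings": [
                {"clusterRole": cluster_role, "namespace": ns} for ns in namespaces
            ],
        })
    return {
        "apiVersion": "rbacmanager.reactiveops.io/v1beta1",
        "kind": "RBACDefinition",
        "metadata": {"name": name},
        "rbacBindings": bindings,
    }


# Hypothetical teams and namespaces, for illustration only.
definition = build_rbac_definition(
    "team-access",
    {
        "my-awesome-team@example.com": ["myapp"],
        "another-team@example.com": ["shop", "shop-workers"],
    },
)
manifest = json.dumps(definition, indent=2)
print(manifest)
```

With this shape, giving a team access to a new namespace is one more entry in the mapping, and who belongs to a team is managed in Google Groups rather than in YAML.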

Vulnerability Management

Many container images contain a lot of dependencies, which, like any other software, eventually need security patches. New vulnerabilities are found regularly, often long after the component has been released into production. Scanning containers only in the pipeline is therefore not enough. Fortunately, many products, both open source and commercial, are built to help with vulnerability scanning.

We’ve recently started using Starboard, a tool that combines other security tools for tasks such as configuration auditing and image vulnerability scanning. Under the hood, Starboard uses Trivy, a vulnerability scanner, and Polaris, which checks configurations against a defined set of rules. Every time a pod is created or changed, a scan of the deployment is triggered. The scan produces a configuration audit report and a vulnerability report in the same namespace as the pod. This makes it easy for the responsible team to identify security issues without help from the security team.

A summary of a deployment scanned with Trivy is available in the same namespace

The vulnerability reports are custom resources, defined through so-called Custom Resource Definitions (CRDs). A more detailed view of the findings is available by viewing the custom resource, but this requires access via kubectl (the Kubernetes command-line tool). The information can become quite overwhelming when there are many findings, and carving the important details out of a JSON output can be quite exhausting. We quickly found out that we needed to address this. We created a small Python application that uses the Python Kubernetes client to check the reports and publish the results to a Slack channel. The application also stores the findings in a database, where other applications can fetch the data. This also lets us keep track of the findings so that we avoid posting the same vulnerability multiple times.
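
The application itself is not shown in this post. The sketch below illustrates only the filtering, deduplication and message-formatting steps, under several assumptions: the report dictionary mimics the layout of a Starboard VulnerabilityReport (field names such as `vulnerabilityID` and `fixedVersion` should be checked against your Starboard version), SQLite stands in for the database, the severity threshold is arbitrary, and all function and table names are made up for the example. Fetching reports with the Kubernetes client and posting to Slack are left out.

```python
import sqlite3
from typing import List

SEVERITIES = ("CRITICAL", "HIGH")  # assumed threshold for notifications


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS findings ("
        "namespace TEXT, report TEXT, vuln_id TEXT, resource TEXT)"
    )


def new_findings(conn: sqlite3.Connection, report: dict) -> List[dict]:
    """Return relevant findings not seen before and record them in the database."""
    meta = report["metadata"]
    fresh = []
    for vuln in report["report"]["vulnerabilities"]:
        if vuln["severity"] not in SEVERITIES:
            continue
        key = (meta["namespace"], meta["name"], vuln["vulnerabilityID"], vuln["resource"])
        seen = conn.execute(
            "SELECT 1 FROM findings WHERE namespace = ? AND report = ? "
            "AND vuln_id = ? AND resource = ?", key,
        ).fetchone()
        if seen:
            continue  # already reported earlier, do not post again
        conn.execute("INSERT INTO findings VALUES (?, ?, ?, ?)", key)
        fresh.append(vuln)
    conn.commit()
    return fresh


def format_message(report: dict, findings: List[dict]) -> str:
    """Build the text that would be posted to the Slack channel."""
    meta = report["metadata"]
    lines = [f"New vulnerabilities in {meta['namespace']}/{meta['name']}:"]
    for v in findings:
        fixed = v.get("fixedVersion") or "none available"
        lines.append(f"- {v['severity']} {v['vulnerabilityID']} in {v['resource']} "
                     f"{v['installedVersion']} (fix: {fixed})")
    return "\n".join(lines)


# Placeholder data: the CVE IDs and versions are invented for this example.
sample_report = {
    "metadata": {"namespace": "myapp", "name": "deployment-myapp-web"},
    "report": {"vulnerabilities": [
        {"vulnerabilityID": "CVE-0000-0001", "resource": "openssl",
         "installedVersion": "1.1.1d", "fixedVersion": "1.1.1g", "severity": "HIGH"},
        {"vulnerabilityID": "CVE-0000-0002", "resource": "zlib",
         "installedVersion": "1.2.11", "fixedVersion": "", "severity": "LOW"},
    ]},
}

conn = sqlite3.connect(":memory:")
init_db(conn)
first = new_findings(conn, sample_report)   # first scan: the HIGH finding is new
second = new_findings(conn, sample_report)  # same report again: nothing new
message = format_message(sample_report, first)
print(message)
```

Keying the database on namespace, report, vulnerability ID and affected package is one possible choice; it means a finding is posted once per workload rather than once per scan.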

We’re hiring

Sounds interesting? We’re looking for Security Engineers to join our Cybersecurity team! Send an email to security@oda.com or take a look at our open roles.
