Securing container borders with cloud functions
Meetup’s server deployment pipeline is moving swiftly in the direction of containerization. This has created some new and interesting problems for us to solve, both in terms of security and platform tooling, and they are probably shared by others in our space. This is the story of how we secured our container borders.
We use Docker at Meetup for a number of reasons. It provides a nice, consistent packaging format and allows us to leverage an ever-expanding community ecosystem and set of tooling options.
One problem we had to solve early in our containerization process was finding a host for our Docker images that provided the right balance of security, flexibility, and ease of use. Docker images can be published to registries: servers that allow Docker clients to pull images down by name in order to run them. You can think of a registry as a storehouse for your hard work. If you’ve ever used Docker before, you are probably familiar with the following format for referring to Docker images:
$ docker pull user/image-name
Docker will expand this URI with an implicit default registry host and version tag. The expanded format looks more like this:
$ docker pull registry.com/user/image-name:tag
Here, registry.com identifies the host the Docker client will communicate with in order to negotiate permission to pull down the target image.
We at Meetup are big fans of GCP and Kubernetes. After evaluating existing options, including Docker Hub and GCR, we decided to go with running a private instance of the official registry on GKE. This frees us from low-level operational management, gives us full control over security, and lets us use the familiar deployment model Kubernetes provides.
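To sketch what this looks like (names are illustrative, and a production setup would add persistent storage and TLS on top), running the official registry image on a Kubernetes cluster can start out as simple as:

```
$ kubectl create deployment registry --image=registry:2
$ kubectl expose deployment registry --port=5000 --target-port=5000
```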
Privacy and ease of use were our next concerns, so we set out to solve those.
The official Docker registry provides no built-in security, but it does define a delegation protocol for implementing it. This protocol looks something like the following:
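The flow, as defined by the Docker registry token authentication specification, runs roughly like this (hostnames are placeholders):

```
1. Client: docker pull registry.example.com/user/image-name
2. Registry: 401 Unauthorized
   WWW-Authenticate: Bearer realm="https://auth.example.com/auth",
                     service="registry.example.com",
                     scope="repository:user/image-name:pull"
3. Client requests a token from the realm URL, passing along its
   credentials plus the service and scope parameters.
4. The auth server decides whether to grant access and, if so,
   returns a signed bearer token (a JWT).
5. Client retries the original request with
   Authorization: Bearer <token>
6. The registry verifies the token's signature against the auth
   server's certificate and serves the image.
```

The registry itself never holds credentials; it only trusts tokens signed by the auth server it is configured to delegate to.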
Lucky for us, a nice, production-ready implementation of that protocol exists in the form of a Golang project called Docker Auth. For bonus points, it ships as a Docker-ready package, letting us get it up and running in our existing infrastructure with ease.
Next up was configuring Docker Auth for our security needs. This project provides some straightforward examples to follow.
For all new services, we rely on the ability of Kubernetes deployments to pull published Docker images from our private registry. Docker Auth provides a way to define a static ACL listing for pattern matching on client information. Initially this feature seemed sufficient to whitelist our Kubernetes cluster IPs.
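A static ACL in Docker Auth’s YAML configuration looks something like the sketch below. The CIDR range and account name are illustrative, and the exact field names should be checked against the Docker Auth examples:

```yaml
acl:
  # Allow pulls from addresses inside our Kubernetes cluster range.
  - match: {ip: "10.128.0.0/16"}
    actions: ["pull"]
    comment: "Kubernetes cluster nodes"
  # A dedicated CI account may push and pull.
  - match: {account: "ci"}
    actions: ["push", "pull"]
```

Entries are evaluated top to bottom; a request that matches no entry is denied.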
After some time, we noticed a recurring problem. Every time we provisioned a new Kubernetes cluster, we had to manually update this IP whitelist and redeploy Docker Auth itself. Worse yet, we also routinely tear down clusters, leaving the configuration full of IP addresses that are no longer in use. We weren’t big fans of this kind of tedium, so we set out to come up with a better solution.
Around that time, the Docker Auth project added a new feature that allows you to call out to a script, enabling a form of dynamic configuration. Our initial idea was to have the script call the GCP API to validate that a given client IP belongs to an instance in one of our clusters and should therefore be allowed to perform a registry operation. The defined protocol is simple: Docker Auth pipes JSON-encoded client information into the script’s standard input stream and interprets the script’s unix exit code as the access decision.
The original idea was flawed because it creates a rate-limiting choke point: we’d have Docker Auth invoking a stateless script, providing no means for us to throttle our GCP API access. What we really needed was a persistent cache of IPs to query.
“All problems in computer science can be solved by another level of indirection.” — David Wheeler
We decided it would be better to instead have Docker Auth call a script that invokes an HTTP endpoint that stores a cache of our cluster IPs in memory. We could then translate the HTTP status code into unix exit codes, which Docker Auth can interpret. This script looks something like the following:
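A minimal sketch of that script is below. The endpoint URL is an assumption, and Docker Auth would invoke the file directly per request; the only contract that matters is JSON in on stdin, exit code out:

```shell
#!/bin/sh
# Sketch of a Docker Auth external-authorization script (URL is a placeholder).
# Docker Auth pipes JSON-encoded client info (IP, account, requested scope)
# on stdin; the exit code communicates the decision: 0 = allow, non-zero = deny.

IP_CACHE_URL="${IP_CACHE_URL:-https://ip-cache.example.com/check}"

# POST the client JSON from stdin to the IP-cache endpoint and print
# only the resulting HTTP status code.
http_status() {
  curl -s --max-time 5 -o /dev/null -w '%{http_code}' \
    -H 'Content-Type: application/json' \
    --data-binary @- "$IP_CACHE_URL"
}

# Translate HTTP status codes into the unix exit codes Docker Auth expects.
to_exit_code() {
  case "$1" in
    200) return 0 ;;  # IP found in the cache: allow the operation
    *)   return 1 ;;  # anything else (403, 5xx, timeout): deny
  esac
}

# When run by Docker Auth, the pipeline below produces the final exit code:
#   to_exit_code "$(http_status)"
```

Keeping the script this thin means all the interesting state lives behind the HTTP endpoint, where it can be cached and rate-limited in one place.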
We laid out a few options for hosting this IP-cache nanoservice. We had already experimented, with good success, with Google Cloud Functions (still in alpha) to fill infrastructure gaps, connecting various components of our processes with very low assembly and operations overhead. That made it an ideal candidate for prototyping an idea before committing to it.
Our cloud function periodically polls the GCP API to collect the current set of cluster IP addresses across all of our GCP projects. It exports an HTTP handler that expects the same client JSON Docker Auth pipes to the script and performs a lookup against this in-memory IP cache.
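The handler’s contract mirrors the script’s input. The URL, payload shape, and status codes here are assumptions for illustration; 200 means the IP belongs to a known cluster node, and anything else is a denial:

```
$ curl -s -o /dev/null -w '%{http_code}\n' \
    -d '{"ip": "10.128.0.7"}' \
    https://us-central1-our-project.cloudfunctions.net/ip-cache
```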
This solution has worked great for us. Having it in production has freed our team to focus on more important work: we no longer have to think about, or spend time manually maintaining, a static list of IP addresses for a dynamically changing set of hosts. We’ll leave that kind of work to the machines. I highly recommend automating the security in your process early on, wherever possible.
We have since extended this approach to provide automatic IP-based security for AWS as well, since we currently have some multi-cloud needs.
Google Cloud Functions provided a great way to implement a sustainable, reliable, and managed solution for securing our container borders. We care as much about development velocity as we do about security. In future posts, we’ll describe our pipeline for getting services from development into production, using our locked-down registry as an integration point.