Kubernetes Snapshot Deploys

Published in Open House (Opendoor Labs), Jun 14, 2017

By Brian Martin

As engineers and data scientists, we need tight feedback loops when developing new models and analyses. We also want all the data we can muster. Here at Opendoor, tweaking our models means backtesting and validating over years of data.

Deploying code is a key part of developing and validating model changes. Sometimes we’d even like to deploy code to a personal environment before committing or going through CI. This allows us to run custom backtests on the full dataset.

Here we’ll walk through a simple but effective technique we’ve been using for getting our “in-the-loop” Docker containers built and deployed quickly.

But isn’t this done already?

When we started... no. But the Deis team (now at Microsoft) announced Draft last week at CoreOS Fest 2017 (I was there!).

They say it best in their README:

Draft targets the “inner loop” of a developer’s workflow: as they hack on code, but before code is committed to version control.

Draft makes it simple to upload code to a remote server where the Docker image is then built and pushed to your registry. We’ll refer to these as snapshot deploys.

This is a great idea! But is there a simpler way than adopting a new tool? Yes!

Entering the Picture: Snapshot Deploys

Snapshot Deploys Overview

The flow for our snapshot deploy is:

  • edit code locally
  • rsync code to our build-server (a Kubernetes pod)
  • build and push the Docker image from the build-server
  • use that published image to do a deploy

But why go through the trouble of building remotely instead of just building the image locally?

A few niceties:

  • Bandwidth constraints aren’t an issue. By doing the Docker build on the Kubernetes cluster, we don’t need to rely on our local internet connection.
  • Shared Docker layer caching. The build server has a single layer cache shared by everyone doing builds. So when we update a dependency version or modify the Dockerfile, only one person has to wait for those layers to rebuild, and every subsequent build is faster.
  • No local Docker required.

This allows most of our snapshot deploys to take less than a minute from start to finish.

Set Up the Build “Server”

Our server is really just an Ubuntu container with rsync and docker installed. Here is the Dockerfile for our build server.

FROM ubuntu:16.04

RUN apt-get update -qq && \
    apt-get install -y --no-install-recommends \
        build-essential \
        curl \
        rsync \
        apt-transport-https \
        ca-certificates \
        software-properties-common && \
    rm -rf /var/lib/apt/lists/*

RUN curl -fsSL https://apt.dockerproject.org/gpg | apt-key add - && \
    add-apt-repository \
        "deb https://apt.dockerproject.org/repo/ ubuntu-$(lsb_release -cs) main" && \
    apt-get update -qq && \
    apt-get install -y \
        docker-engine && \
    rm -rf /var/lib/apt/lists/*

Once that is built and pushed, we can deploy using the following Kubernetes config:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: build-server
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: build-server
    spec:
      containers:
      - name: build-server

        # The image from the above Dockerfile.
        image: 'opendoor/build-server:0ae5649'

        # Just a busy loop which gives us some logging output.
        command: ['bash', '-c', 'while true; do (set -x; sleep 60); done']

        volumeMounts:
        - mountPath: /var/run/docker.sock
          name: docker-sock

      # This volume gives our container access to the
      # Docker daemon which is running on the underlying
      # k8s node.
      volumes:
      - name: docker-sock
        hostPath:
          path: /var/run/docker.sock
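
Rolling out the manifest above and checking that the pod can reach the node's Docker daemon could look roughly like the sketch below. The file name `build-server.yaml`, the function names, and the `KUBECTL` override (handy for dry runs) are assumptions for illustration, not part of our setup.

```shell
#!/bin/bash
# Hypothetical helpers; KUBECTL can be overridden (e.g. KUBECTL=echo) for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

deploy_build_server() {
  # $1: path to the Deployment manifest (file name is an assumption).
  "$KUBECTL" apply -f "$1" &&
    "$KUBECTL" rollout status deployment/build-server
}

build_server_pod() {
  # Print the name of the first pod carrying the app=build-server label.
  "$KUBECTL" get pod -l app=build-server \
    -o jsonpath='{.items[0].metadata.name}'
}

check_docker_socket() {
  # $1: pod name. `docker version` fails if the mounted socket is unusable.
  "$KUBECTL" exec "$1" -- docker version
}

# Example usage (commented out so that sourcing this file has no side effects):
#   deploy_build_server build-server.yaml
#   check_docker_socket "$(build_server_pod)"
```

If `docker version` prints both a client and a server section, the container is talking to the daemon on the underlying node through the mounted socket.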

Set up a Client

Our client is a shell script that a developer can execute to perform a snapshot deploy. It has two responsibilities:

  • rsync our code to the remote build “server”
  • trigger a remote build and push

#!/bin/bash -eux
set -o pipefail

# Just an example; you'll likely want to read this as an argument.
IMAGE="opendoor/our-repo:1"

# Name of the first pod matching the build-server label (line 2 skips the header row).
BUILD_SERVER_POD="$(kubectl get pod -lapp=build-server | sed -n '2p' | awk '{print $1}')"
FROM="./"
TO=./efs/kube-registry-working-dirs/"$USER/$(basename "$PWD")"

kubectl exec -it "$BUILD_SERVER_POD" -- mkdir -p "$TO"

# rsync runs "<rsh command> <host> rsync --server ...", so using "--" as the
# host turns the remote command into "kubectl exec -i <pod> -- rsync --server ...".
# The bare "--" before the paths stops rsync from parsing "--:..." as an option.
rsync \
    -av \
    --blocking-io \
    --rsh="kubectl exec -i $BUILD_SERVER_POD" \
    --exclude=".git" \
    --delete \
    --delete-excluded \
    -- \
    "$FROM" \
    --:"$TO" 1>&2

# Note: the trailing redirect prefixes every line of the remote stderr with "REMOTE: "
# so the end user can tell what is running where.
kubectl exec -i "$BUILD_SERVER_POD" -- bash -c \
    "cd $TO && docker build . -t $IMAGE && docker push $IMAGE" \
    2> >(sed 's/^/REMOTE: /' >&2)
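
The script stops once the image is pushed, so the last step of the flow (using the published image for a deploy) is still open. One possible way to close the loop is sketched below, assuming the target is an existing Kubernetes Deployment. The per-user tag scheme, the function names, and the `my-app` names are hypothetical; this is not necessarily how we deploy.

```shell
#!/bin/bash
# Hypothetical helpers; KUBECTL can be overridden (e.g. KUBECTL=echo) for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

snapshot_image() {
  # $1: repository, $2: user, $3: suffix (for example a timestamp).
  # Builds a per-user tag so snapshots from different people don't collide.
  printf '%s:snapshot-%s-%s\n' "$1" "$2" "$3"
}

deploy_snapshot() {
  # $1: deployment name, $2: container name, $3: image reference.
  # Points the container at the new image and waits for the rollout.
  "$KUBECTL" set image "deployment/$1" "$2=$3" &&
    "$KUBECTL" rollout status "deployment/$1"
}

# Example usage (commented out so that sourcing this file has no side effects):
#   IMAGE="$(snapshot_image opendoor/our-repo "$USER" "$(date +%Y%m%d%H%M%S)")"
#   ... build and push "$IMAGE" on the build server as shown above ...
#   deploy_snapshot my-app my-app "$IMAGE"
```

Giving every snapshot a unique tag also means the Deployment sees a changed image reference, so Kubernetes rolls the pods instead of reusing a cached image under the same tag.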

Conclusion

Snapshot deploys are really useful for deploying code quickly, especially when developers require remote resources. One great example is the parallel compute cluster (Dask) we use in our stack.

It’s also useful for developing APIs and hosting uncommitted prototypes.

There are now tools for helping with this, but sometimes it’s worthwhile and simple to roll your own!

If you are interested in building data platforms for modeling the largest asset class in the country, come check out our job listings!

Engineers and Data Scientists at Opendoor. Modernizing the real-estate transaction. https://labs.opendoor.com