Building and Shipping at Ludicrous Speed

Greg Poirier
Published in The Opsee Blog
Jan 27, 2016 · 4 min read

Organic and rapid growth of a codebase, number of repositories, and number of employees will undoubtedly lead to a Kafkaesque nightmare of build processes and runtime environments. Every repository becomes a snowflake, and moving from one to another necessitates considerable context-switching. Getting reproducible builds locally and in CI can be next to impossible. This is a clear indicator that it’s time to standardize build processes.

At Opsee, we were using various combinations of Make and Bash to cobble things together, and it “worked”, but when we started having difficult-to-diagnose issues with the Rube-Goldbergian build of our main Go project, we decided to take a step back.

Meager Beginnings

The heart of monitoring at Opsee is the Bastion Instance. It is a semi-autonomous agent that we manage remotely to check the health of services you’ve deployed to AWS or various AWS services you may be using (e.g. RDS, ElastiCache, ELB). It started out as a monolith, but was quickly broken apart into a collection of microservices which reside in a common Go repository. Along with that project are a supervisor (systemd) and a message bus (NSQ). A single Docker image contains all of the necessary binaries and each service is spawned in an individual container.
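
To make that concrete, here is a rough sketch of what one of those per-service systemd units could look like. The service name, binary path, and image name are invented for illustration and aren't taken from the actual Bastion code:

[Unit]
Description=Hypothetical Bastion checker service
Requires=docker.service
After=docker.service

[Service]
ExecStartPre=-/usr/bin/docker rm -f checker
ExecStart=/usr/bin/docker run --name checker quay.io/opsee/bastion /checker
ExecStop=/usr/bin/docker stop checker
Restart=always

[Install]
WantedBy=multi-user.target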

Given the heavy investment in microservices and containers (a conscious architectural decision, I swear), we realized we would need to be extremely good at building Go and shipping containers. We had done both for a while, but the process became brittle and unstable. We were fighting with build environments in CI, managing dependencies locally, and getting predictable and stable builds from one commit to the next. The containers you build on your laptop should be identical to those that are running in production. That’s the idea, right? There should be no guessing about it.

CI shouldn’t be the only place you can produce production artifacts.

When you're under pressure and have to diagnose a production issue without actually touching production, you can stand up an environment identical to production on your laptop in seconds.
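
In practice that can be as simple as pulling the exact image tag your deploy references and running it locally. The service name and tag here are placeholders:

docker pull quay.io/opsee/some-service:abc1234
docker run --rm -it quay.io/opsee/some-service:abc1234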

Build in Containers

We jumped into a hell-storm of mistakes when we began using containers, as one does. With no standardization, every project had its own hand-crafted Dockerfile with who-knows-what ending up in the containers being run in production. This cannot be. We started to explicitly outline our goals for containerization.

The method you use to build your development container on your laptop should yield a runtime environment identical to the one in production.

Do not ship a build environment in runtime containers.

We have separate build and runtime containers. In order to both do this well and easily onboard new employees into The Truth and The Way, we needed a standardized build container that allowed for project-specific build adjustments. Out of this desire came three Docker images:

Getting there required some initial constraints to be placed on our Go and Clojure projects. For example, all of our Go projects build with gb. We are okay with those constraints, for now, because it means building a project is as “simple” as running a container with your source in a Docker volume.

docker run -v `pwd`:/build quay.io/opsee/go-build

The container runs gb test and then gb build, so tests must pass before an artifact is emitted. After the container exits, your binaries are placed in target/$os/$arch.
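
End to end, a local build-and-package loop looks roughly like this sketch; the service name, platform directory, and image tag are illustrative rather than our actual values:

docker run -v `pwd`:/build quay.io/opsee/go-build   # gb test && gb build inside the container
ls target/linux/amd64                               # compiled binaries land here
docker build -t quay.io/opsee/some-service:$(git rev-parse --short HEAD) .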

Minimize the size of containers shipped to production.

We wanted to minimize the size of our deployed containers, but also wanted to allow for some flexibility (e.g. having a shell and simple debugging tools in production). For this, we chose the gliderlabs/alpine Docker image. Alpine Linux is a minimal Linux distribution that is easy to use in a Docker container. Jeff Lindsay and Glider Labs have been kind enough to maintain this Docker image for the community. The base image is only 5MB, which, combined with our statically-linked Go binaries, puts our deployed images anywhere from 8–50MB. They could be smaller, but we wanted the same base image for both build and runtime environments. Keeping the images small means we keep our S3 storage and inter-region transfer costs down. Hooray!

Each service has its own Dockerfile, but we have a couple of simple base images that include common utilities we use in production. This gives service owners the flexibility to determine what they want in their runtime containers.
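
A runtime Dockerfile then stays down to a handful of lines. This is only a sketch; the binary name and the extra packages are stand-ins for whatever a given service actually needs:

FROM gliderlabs/alpine:3.2
RUN apk add --update bash curl && rm -rf /var/cache/apk/*
COPY target/linux/amd64/checker /checker
ENTRYPOINT ["/checker"]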

Local test/build execution must be fast (on the order of seconds).

Our initial concern with Docker was slow builds. The last thing we wanted to do was foment rebellion during developer on-boarding by having their builds take a long time. In practice, that means both tests and docker run must execute quickly. Getting there is part development methodology, which we won’t discuss here, and part Dockerfile/container optimization.

By using a common build image, we minimize the amount of time it takes to go from initial git checkout to running tests and compiling. After the initial image download, the cached image will be used at compile time. Local build times are on the order of a few seconds, but because CI (in our case CircleCI) does not give us the ability to cache images between builds, our CI build times are on the order of 2–10 minutes.
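
For reference, wiring the same build container into CircleCI (1.x-era circle.yml) amounts to something like the sketch below; the exact contents of our configuration aren't reproduced here:

machine:
  services:
    - docker

test:
  override:
    - docker pull quay.io/opsee/go-build
    - docker run -v `pwd`:/build quay.io/opsee/go-build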

So Far, So Good

Getting to this point has involved a considerable amount of learning, trial and error, and failure. Our process continues to evolve, and we continue to learn. The nice part about our technological choices is that every single component is easy to remove and replace with something else when it’s shown to no longer be viable. Personally, this is my first time as the first hire at a startup, and this methodology has proven to be invaluable.
