Towards a better service development story
How Ironclad moved from monolith to microservices without sacrificing developer experience
Ironclad began as a monolithic Node application deployed on Heroku. For a small team without much time to devote to DevOps, Heroku was a great solution and made it very easy to automatically build and deploy from GitHub. As our team and application grew, we began to want to split portions of our application into separate services. We decided it was time to move off of Heroku, with the hope that we could keep the ease of development and deployment largely intact.
We ultimately settled on deploying our application with Kubernetes, hosted on Google Kubernetes Engine. With this approach we build a Docker image for each of our services and have Kubernetes handle scheduling these containers on a cluster. We simply tell Kubernetes how many replicas of each service we want running, and it takes care of running and restarting containers as needed and of load balancing traffic across replicas. Deploying on Kubernetes is a breeze: we build and push service images in CI and deploy with a simple kubectl apply invocation.
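Concretely, the flow looks something like the following sketch (the image names and manifest path are illustrative, not our exact CI scripts):

```sh
# Roughly the shape of the CI build/deploy flow (illustrative names/paths).
docker build -t gcr.io/our-project/web:"$GIT_SHA" ./web
docker push gcr.io/our-project/web:"$GIT_SHA"

# Apply the checked-in Kubernetes manifests (Deployments, Services, etc.).
kubectl apply -f k8s/
```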
We’ve gone through a couple of iterations of our service development environment, and have gotten to a place we’re pretty happy with. In the Heroku days, development was straightforward: we ran a local development database and a development build of the monolith. With multiple services, we wanted to avoid having developers manually orchestrate multiple development builds.
First pass: Docker Compose
Our initial approach after migrating to Docker and Kubernetes was to develop using Docker’s Compose tool. With Compose we could configure a development Docker build for each of our services, including any required environment variables and links to other services. Critically for development, Compose also allows you to “bind mount” local source directories into the running containers, so that each service can rebuild/restart to reflect local changes.
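As a rough illustration, a development Compose file along these lines lets each container see local source changes via a bind mount (the service names, ports, and paths below are hypothetical, not our real configuration):

```yaml
# Illustrative development docker-compose.yml (names, ports, and paths are hypothetical).
version: "3"
services:
  web:
    build: ./web
    ports:
      - "3000:3000"
    environment:
      - DATABASE_URL=postgres://postgres@db:5432/dev
    volumes:
      # Bind mount the local source tree so the dev build inside the
      # container picks up edits without rebuilding the image.
      - ./web/src:/app/src
    depends_on:
      - db
  db:
    image: postgres:9.6
```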
This was a solid setup, and it served us well for a time. As we used it and our codebase grew, we began to experience several pain points with it.
- Build Performance: As mentioned above, we used bind mounts of our local source directories so that development services could reflect local changes. We found that bind mounts in Docker for Mac imposed a significant performance overhead on our builds. It sounds like Docker has done a lot of work on osxfs performance since we moved away from Compose, but at the time our initial development builds were taking several times longer inside the containers than out.
- Unnecessary Builds: Typically when developing, we would only be changing one or maybe two services at a time. When we started the stack with Compose, however, we had to do an initial Docker build for every service. This could be somewhat mitigated with Dockerfiles that make good use of the build cache (see the sketch after this list), but these were tedious to maintain and often made more difficult by limitations of the Dockerfile language, like a lack of globbing for ADD and COPY commands.
- Configuration Duplication: Our Compose configuration had to specify which environment information each service needed and which ports it listened on. But we already had to supply this information to Kubernetes to configure our services in production. This wasn’t a huge deal, but it was an annoying source of duplication.
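For context, a cache-friendly Dockerfile for a Node service generally looks something like this generic sketch (not one of our actual Dockerfiles): copy the dependency manifests first so the install layer is reused unless dependencies change, then copy the rest of the source.

```dockerfile
# Illustrative cache-friendly layering for a Node service (a generic sketch).
FROM node:8

WORKDIR /app

# Copy only the dependency manifests first, so the expensive install
# layer stays cached as long as dependencies don't change.
COPY package.json yarn.lock ./
RUN yarn install --frozen-lockfile

# Copying the rest of the source only invalidates the layers below it.
COPY . .

CMD ["node", "src/server.js"]
```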
Minikube + Telepresence
We initially focused on pain point #2. It seemed wasteful to spend time building services you weren’t going to change. We already built and pushed images for each service from CI, so perhaps we could use these instead of building from scratch. This line of investigation led us to Minikube, a tool maintained by the Kubernetes team that makes it easy to run a single-node Kubernetes cluster in a VM on your local machine. With Minikube it was very easy to use our existing configurations to stand up a local Ironclad stack running the same images we deploy to production.
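Standing that up is roughly a couple of commands (the manifest path below is illustrative):

```sh
# Start a single-node Kubernetes cluster in a local VM.
minikube start

# Point kubectl at the local cluster and stand up the stack from the
# same manifests we deploy to production (path is illustrative).
kubectl config use-context minikube
kubectl apply -f k8s/
```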
This was promising, but we still needed a way to develop on individual services. Luckily, we stumbled upon a tool called Telepresence. With Telepresence, you can swap out services running in a (local or remote) Kubernetes cluster with ones running locally. It works by swapping the service image in Kubernetes for a proxy image that forwards requests to your local service, and similarly proxies requests from your local service back into the cluster. This seemed like exactly what we were looking for, so we started prototyping with it.
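A rough illustration of what a swap looks like with Telepresence 1.x (the deployment name, port, and run command here are hypothetical):

```sh
# Swap the in-cluster "web" deployment (hypothetical name) for a proxy,
# expose local port 3000, and run the local dev build in its place.
telepresence --swap-deployment web --expose 3000 --run yarn start
```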
Speed bump: SIP
One problem we ran into along the way has to do with how Telepresence proxies requests from your local process back into the Kubernetes cluster. Its main approach works by injecting shared libraries into the local process using the LD_PRELOAD/DYLD_INSERT_LIBRARIES environment variables, allowing it to intercept system calls like getaddrinfo. This works in many cases, but we ran into trouble with how it interacts with a macOS feature introduced in El Capitan called System Integrity Protection (SIP). Long story short, recent versions of macOS implement special protections for executables stored in certain distinguished locations like /bin and /usr. These protections include purging the aforementioned dynamic linker variables for these protected binaries, and they are in effect even when running as root.
Telepresence does what it can to work around SIP. It makes a temporary copy of some of these protected directories and prepends them to the $PATH of your local process. Unfortunately, we found that in practice it was very difficult to ensure that our development build processes did not at some point indirect through a process affected by SIP. Many build tools internally use shell scripts that directly reference /bin/sh or /usr/bin/env. For example, in the Node world most build tools are executable scripts starting with #!/usr/bin/env node (including the npm and yarn CLIs).
Our solution to this problem isn’t particularly elegant, but it works. If we can’t rely on Telepresence’s proxying to make our local services seem like they’re inside the cluster, we’ll just have to make our whole development machine seem like it’s inside the cluster. It works like this: in development, we configure our services to allocate a NodePort equal to their service port. This makes those services accessible from outside the cluster, on the IP bound by the Minikube VM. Separately, we have a development script we run when we start the stack. This script does some nice things like using kubetail to tail the logs of all running services. More importantly, it queries the Kubernetes cluster for an up-to-date list of all services and the ports they’re listening on, and updates a block in your /etc/hosts file to map each service name to $(minikube ip). This certainly feels like a hack, but it’s been a surprisingly robust solution.
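A minimal sketch of what such a script can look like, assuming services expose NodePorts as described above; the BEGIN/END markers and exact commands are our illustration, not Ironclad’s actual tooling:

```sh
#!/usr/bin/env bash
# Hypothetical sketch of a hosts-updating script (not the actual Ironclad script).
# Maps every Kubernetes service name to the Minikube VM's IP so local processes
# resolve service hostnames the same way pods inside the cluster do.
set -euo pipefail

MINIKUBE_IP="$(minikube ip)"

# One "<minikube-ip> <service-name>" line per service in the cluster.
HOSTS_BLOCK="$(kubectl get services -o jsonpath='{.items[*].metadata.name}' \
  | tr ' ' '\n' \
  | sed "s/^/${MINIKUBE_IP} /")"

# Rewrite the managed block between the markers in /etc/hosts.
sudo sed -i '' '/# BEGIN minikube services/,/# END minikube services/d' /etc/hosts
{
  echo "# BEGIN minikube services"
  echo "${HOSTS_BLOCK}"
  echo "# END minikube services"
} | sudo tee -a /etc/hosts > /dev/null
```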
We’ve been using Minikube and Telepresence for our development environment for about 6 months now and have been pretty happy with it. It provides a quick and easy way to get a full stack running locally, and each service’s development build can be fast and simple.
Dylan Scott is a software engineer at Ironclad. Join his team by applying at https://ironcladapp.com/careers.