Blue-green rolling redeployments in Docker Cloud with Harrow.io

Docker Cloud has helped my team centralize our deployment flow, which previously consisted of compiling jar files on dev machines, SSH’ing into EC2 instances, and uploading the files manually.

Sequential deployment helped with general availability, but although a container reported as ready, the application inside it still needed about 60 seconds of startup time for migrations and configuration. Waiting for that startup to finish before moving on would give a zero-downtime setup, but according to the Docker Forums, rolling redeployments were not natively possible.

Automated builds in our Docker Cloud infrastructure.

Our goals were a zero-downtime deployment with no manual steps and nobody from ops flipping a switch. Ideally this would include a health monitor that rolls back to the previous version when a newer version unexpectedly introduces breakage.

Martin Fowler wrote a piece on blue-green deployments, an approach and a sentiment that was echoed by Tutum, the product that eventually turned into Docker Cloud. Netflix has a longer piece on deployment. This includes something similar, which they refer to as red/black push.

Red/Black Push workflow from Netflix.

Our deployment workflow starts from a master branch in Bitbucket. Every commit to it is automatically pushed to Docker Hub and tested there. We also used CI for front-end apps using Bitbucket Pipelines. A success trigger from the tests potentially redeploys the image in Docker Cloud. Tooling that fits into this CI/CD workflow and connects to this infrastructure would be the ideal choice.

Enter Harrow.io, a task runner for executing tests and deployments. While its hosted Capistrano use case takes the least effort to set up, it is almost effortlessly extensible to other deployment scenarios. In our case we wanted to do the following:

  1. when a new image of our backend is available on Docker Hub, start a redeployment
  2. spin up new containers and execute health checks on them until those return successful
  3. switch over load to the new containers
  4. turn off the old containers

We set up a bash script to do just that. It installs Python to get the Docker Cloud CLI, then executes commands step by step until a full switch-over is done.
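The health-check step (step 2 in the list above) is the part that the container status alone cannot cover. A minimal sketch of such a polling loop is shown here. It is not the original deployment script: the function name `wait_until_healthy`, its defaults, and the choice of HTTP 200 as the "ready" signal are assumptions made for illustration.

```shell
# Hypothetical helper, not the original deployment script.
# Polls a health endpoint until it returns HTTP 200 or the attempts run out.
# Arguments: URL, max attempts (default 30), seconds between attempts (default 5).
wait_until_healthy() {
  local url="$1"
  local max_attempts="${2:-30}"
  local delay="${3:-5}"
  local attempt=1
  local status
  while [ "$attempt" -le "$max_attempts" ]; do
    # -s: no progress output, -o /dev/null: discard the body,
    # -w: print only the HTTP status code ("000" if the connection fails).
    status="$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url" || true)"
    if [ "$status" = "200" ]; then
      echo "healthy after ${attempt} attempt(s)"
      return 0
    fi
    # Only wait when another attempt is still to come.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  echo "not healthy after ${max_attempts} attempt(s)" >&2
  return 1
}
```

With the defaults, the loop allows roughly two and a half minutes, which leaves headroom over the 60 seconds of startup mentioned earlier. A deployment script could call `wait_until_healthy` with the health endpoint of the freshly started service and switch load only when it returns 0. A non-zero return is the natural place to abort and leave the old containers running.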

Important to note is that dockercloud/haproxy instances do the load balancing for us. We configured VIRTUAL_HOST on two separate services to be blue.backend.example.org and green.backend.example.org respectively, with both also listening on backend.example.org. It would be possible to group this into a single service and track containers as they spin up. For us, this is planned as a future improvement.

The full deployment script is contained below.

Slightly confusingly, we also tagged two services as GREEN and BLUE. Those are just for us to refer to different services and URLs and don’t correspond to the blue (container ready, app not ready) or green (container and app ready) status of a deployment.

The following environment variables need to be configured:

  • DOCKERHUB_USER, DOCKERHUB_PASS for login credentials, DOCKERCLOUD_NAMESPACE for any organisation name that this is hosted under.
  • GREEN_INSTANCE and BLUE_INSTANCE being aliases for the Docker services deployed in Docker Cloud
  • GREEN_URL and BLUE_URL being corresponding health endpoints
  • TARGET_NUM_CONTAINERS being the target number of containers. For us this is 2 for staging, 3 for production.
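Since a missing variable would otherwise only surface halfway through a deployment, it can help to check the configuration up front. The following is a sketch of such a check, not part of the original script. The helper names `missing_vars` and `is_positive_int` are hypothetical, and DOCKERCLOUD_NAMESPACE is treated as optional here on the assumption that it is only needed when deploying under an organisation.

```shell
# Hypothetical pre-flight check, not part of the original script.
# Prints the name of every required variable that is unset or empty.
missing_vars() {
  local name
  local value
  for name in DOCKERHUB_USER DOCKERHUB_PASS GREEN_INSTANCE BLUE_INSTANCE \
              GREEN_URL BLUE_URL TARGET_NUM_CONTAINERS; do
    # Indirect lookup of the variable called $name; the names are fixed above,
    # so eval never sees untrusted input.
    eval "value=\${$name:-}"
    [ -n "$value" ] || echo "$name"
  done
}

# Succeeds only if the argument consists of digits and is greater than zero.
is_positive_int() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$1" -gt 0 ]
}
```

A script could abort early when `missing_vars` prints anything, or when `is_positive_int "$TARGET_NUM_CONTAINERS"` fails, before touching any running containers.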

This is a sample rundown:

Sample deployment for staging

We configured this to run every time Docker Hub calls a webhook. It also reports success and failure to Slack.

What strategies are you using for zero downtime deployment? Do you have comparable scripts to also do automated tagging and rolling back? Did I over-complicate matters or did I take too many shortcuts? Other glaring oversights? Any feedback is welcome.