Blue-green rolling redeployments in Docker Cloud with Harrow.io
Docker Cloud has helped my team centralize the deployment flow, which previously consisted of compiling JAR files on dev machines, SSHing into EC2 instances and uploading the files manually.
Sequential deployment helped with general availability, but while a container reported as ready, the application inside it still needed startup time: about 60 seconds of migrations and configuration. Waiting for this to finish would result in a zero-downtime setup, but judging from the Docker Forums, rolling redeployments were not natively possible.
Our goal was a zero-downtime deployment with no manual steps and no ops person flipping the switch. Ideally this would include a health monitor that rolls back to the previous version when a newer version unexpectedly introduces breakage.
Martin Fowler wrote a piece on blue-green deployments, an approach and a sentiment that was echoed by Tutum, the product that eventually turned into Docker Cloud. Netflix has a longer piece on deployment. This includes something similar, which they refer to as red/black push.
Our deployment workflow starts from a master branch that, when committed to in Bitbucket, is automatically pushed to Docker Hub and tested there. We also used CI for front-end apps using Bitbucket Pipelines. A success trigger from the tests potentially redeploys the image in Docker Cloud. Tooling that fits into this CI/CD workflow and connects to this infrastructure would be the ideal choice.
Enter Harrow.io. It is a task runner for executing tests and deployments. While its use case of hosted Capistrano takes the least effort to set up, it is almost effortlessly extensible to other deployment scenarios. In our case we wanted to do the following:
- when a new image of our backend is available on Docker Hub, start a redeployment
- spin up new containers and execute health checks on them until those return successful
- switch over load to the new containers
- turn off the old containers
We set up a bash script to do just that. It installs Python to get the Docker Cloud CLI, then executes commands step by step until a full switch-over is done.
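The health-check and switch-over steps listed above can be sketched as follows. This is not the bash script itself but a minimal Python illustration of the same idea: poll a health endpoint until it answers, and only then move traffic and stop the old containers. All names here (`http_probe`, `wait_until_healthy`, `blue_green_switch`, and the three callbacks) are hypothetical, as is the assumption that a healthy app answers with HTTP 200. The callbacks stand in for whatever Docker Cloud commands actually start, switch and stop the services.

```python
import time
import urllib.error
import urllib.request


def http_probe(url, timeout=5.0):
    """Return True if a GET on url answers with HTTP 200 (assumed health signal)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, 5xx, ...: the app is not ready yet.
        return False


def wait_until_healthy(url, probe=http_probe, attempts=30, delay=5.0,
                       sleep=time.sleep):
    """Poll url until probe(url) succeeds or the attempts run out."""
    for attempt in range(1, attempts + 1):
        if probe(url):
            return True
        if attempt < attempts:
            sleep(delay)
    return False


def blue_green_switch(idle_url, start_idle, switch_traffic, stop_active,
                      wait=wait_until_healthy):
    """Run the steps in order; leave the old containers alone on failure."""
    start_idle()                 # spin up the new containers
    if not wait(idle_url):       # health checks until they return successful
        return False             # old containers keep serving traffic
    switch_traffic()             # switch over load to the new containers
    stop_active()                # turn off the old containers
    return True
```

With the defaults above, a new deployment gets 30 attempts spaced 5 seconds apart, which comfortably covers a startup time of about 60 seconds. If the health check never succeeds, nothing is switched and the previous version stays live.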
It is important to note that dockercloud/haproxy instances do the load balancing for us, and that we configured VIRTUAL_HOST on two separate services to be blue.backend.example.org and green.backend.example.org, with both also listening to backend.example.org. It would be possible to group this in a single service and track containers as they spin up. For us, this is planned as a future improvement.
The full deployment script is contained below.
Slightly confusingly, we also tagged two services as GREEN and BLUE. Those are just for us to refer to different services and URLs and don’t correspond to the blue (container ready, app not ready) or green (container and app ready) status of a deployment.
The following environment variables need to be configured:
- DOCKERHUB_USER and DOCKERHUB_PASS for login credentials
- DOCKERCLOUD_NAMESPACE for any organisation name that this is hosted under
- GREEN_INSTANCE and BLUE_INSTANCE being aliases for the Docker services deployed in Docker Cloud
- GREEN_URL and BLUE_URL being the corresponding health endpoints
- TARGET_NUM_CONTAINERS being the target number of containers. For us this is 2 for staging, 3 for production.
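As an illustration of how a deployment task might read and validate these variables before doing anything else, here is a minimal Python sketch. The variable names come from the list above; the `DeployConfig` class and `load_config` function are hypothetical and not part of the original script.

```python
import os
from dataclasses import dataclass

# Variable names as listed above.
REQUIRED = (
    "DOCKERHUB_USER", "DOCKERHUB_PASS", "DOCKERCLOUD_NAMESPACE",
    "GREEN_INSTANCE", "BLUE_INSTANCE", "GREEN_URL", "BLUE_URL",
    "TARGET_NUM_CONTAINERS",
)


@dataclass(frozen=True)
class DeployConfig:
    dockerhub_user: str
    dockerhub_pass: str
    namespace: str
    green_instance: str
    blue_instance: str
    green_url: str
    blue_url: str
    target_num_containers: int


def load_config(env=None):
    """Build a DeployConfig from env (defaults to os.environ); fail fast on gaps."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))
    return DeployConfig(
        dockerhub_user=env["DOCKERHUB_USER"],
        dockerhub_pass=env["DOCKERHUB_PASS"],
        namespace=env["DOCKERCLOUD_NAMESPACE"],
        green_instance=env["GREEN_INSTANCE"],
        blue_instance=env["BLUE_INSTANCE"],
        green_url=env["GREEN_URL"],
        blue_url=env["BLUE_URL"],
        # int() raises ValueError if the value is not a whole number.
        target_num_containers=int(env["TARGET_NUM_CONTAINERS"]),
    )
```

Failing before any container is touched keeps a misconfigured task from leaving a deployment half switched over.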
This is a sample rundown:
We configured this to run every time Docker Hub calls a webhook. It also reports success and failure to Slack.
What strategies are you using for zero downtime deployment? Do you have comparable scripts to also do automated tagging and rolling back? Did I over-complicate matters or did I take too many shortcuts? Other glaring oversights? Any feedback is welcome.