Blue/Green Deployments using Helm Charts


I am a big fan of the Labels and Selectors feature that Kubernetes offers. K8s Services use them to target Pods, which enables diverse traffic-routing use cases. In this article, I will demonstrate how we can leverage this mechanism to implement a Blue/Green deployment strategy. I will use Helm charts for deployment, and we will see how a little Helm templating trick enables this scenario.


Labels and Selectors: Basics

Labels are key/value pairs that can be attached to any K8s resource as metadata. K8s makes no assumptions about their semantics or usage. Labels can be used to organize objects and to select subsets of them.

Labels:
key1=value1
key2=value2

Services use Selectors to target the Pods with these labels:

Selector: key1=value1,key2=value2
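To make the matching rule concrete: an equality-based selector matches a Pod only when every term of the selector is present among the Pod's labels, while extra labels on the Pod are ignored. The following is a minimal sketch of that rule in plain bash. The function name matches_selector is hypothetical and exists only for illustration; it is not part of kubectl or Kubernetes.

```shell
#!/usr/bin/env bash
# matches_selector LABELS SELECTOR   (hypothetical helper, illustration only)
# Both arguments are comma-separated key=value lists.
# Returns 0 when every selector term appears among the labels, 1 otherwise.
matches_selector() {
  local labels=",$1," selector=$2 term
  local IFS=','                 # split the selector on commas
  for term in $selector; do
    case "$labels" in
      *",$term,"*) ;;           # this term is present, keep checking
      *) return 1 ;;            # one missing term means no match
    esac
  done
  return 0
}

# A Pod carrying an extra label still matches the selector:
#   matches_selector "key1=value1,key2=value2,slot=blue" "key1=value1,key2=value2"
```

On a real cluster, the same kind of query can be issued with kubectl get pods -l key1=value1,key2=value2.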

For comprehensive documentation on Labels and Selectors, please refer here.

Blue/Green deployment strategy:

The Blue/Green strategy is used to minimize downtime for production services. It is typically achieved by keeping two identical production environments (aka Blue/Green or Prod/Stage), only one of which serves live production traffic. When a new release has to be rolled out, it is first deployed to the Stage environment. When all the testing is done, live traffic is moved to Stage, which becomes the Prod environment, and the current Prod environment becomes Stage. An added benefit is fast rollback: if a new issue is found with live traffic, we just change the route back.

In the following sections, we will see how to achieve this for Kubernetes deployments using a Helm chart.

Topology

Slots: Here we call the two environments Slots, i.e. the blue slot and the green slot. This is a virtual concept which we realize by attaching a Label {slot=blue|green} to the Deployments, so that their Pods carry it.

Prod and Stage Service: We have two Services, i.e. service-prod and service-stage, one serving Production traffic and the other Staging. Both are exposed through one Ingress Controller, which is configured to route requests on the primary domain to service-prod and requests on the stage domain to service-stage.

The Services target Pods in one of the two Slots by using slot as a label selector. The slot for each Service is determined by the chart variable productionSlot: service-prod always points to the productionSlot, and service-stage points to the opposite slot.

In this example, service-prod has slot=blue, so all production traffic goes to the deployment in the blue slot.

Initial State: Production traffic served by pods in Blue Slot
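The selection rule described above (service-prod follows productionSlot, service-stage takes the opposite slot) can be sketched in bash as follows. This only mirrors the logic for illustration; in the chart itself the decision is made by the Service templates, which are not reproduced here. The function names opposite_slot and service_selector are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers mirroring the slot selection rule; the real logic
# lives in the chart's Service templates.

# opposite_slot SLOT -> prints the other slot, fails on unknown input
opposite_slot() {
  case "$1" in
    blue)  echo green ;;
    green) echo blue ;;
    *)     echo "unknown slot: $1" >&2; return 1 ;;
  esac
}

# service_selector SERVICE PRODUCTION_SLOT -> prints the slot selector
# that SERVICE would carry for the given productionSlot value
service_selector() {
  local service=$1 production_slot=$2 stage_slot
  stage_slot=$(opposite_slot "$production_slot") || return 1
  case "$service" in
    service-prod)  echo "slot=$production_slot" ;;
    service-stage) echo "slot=$stage_slot" ;;
    *)             echo "unknown service: $service" >&2; return 1 ;;
  esac
}

# With productionSlot=blue:
#   service_selector service-prod  blue   prints slot=blue
#   service_selector service-stage blue   prints slot=green
```

Because both selectors are derived from the single productionSlot value, changing that one value is enough to swap the two Services.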

Release Process

At the start of a new release, we read the current value of productionSlot and deploy the release to the other slot. The following script does this:

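# Read the current production slot from the release's computed values
currentSlot=$(helm get values --all yourReleaseName | grep -Po '^\s*productionSlot: \K.*')
# The new release goes to the opposite slot
if [ "$currentSlot" = "blue" ]; then
  newSlot="green"
else
  newSlot="blue"
fi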
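One caveat with the script above: if the helm call fails or prints nothing, the else branch still selects blue, which may be the slot currently serving production. A minimal sketch of a stricter reader is shown below. It assumes the values output contains a line of the form productionSlot: <value>; the function name read_production_slot is hypothetical.

```shell
#!/usr/bin/env bash
# read_production_slot   (hypothetical helper, illustration only)
# Reads `helm get values --all` style YAML on stdin and prints the value
# of productionSlot. Fails unless that value is exactly blue or green.
read_production_slot() {
  local slot
  # awk splits on whitespace, so indentation in front of the key is ignored
  slot=$(awk '$1 == "productionSlot:" { print $2; exit }')
  slot=${slot//\"/}             # tolerate a double-quoted value
  case "$slot" in
    blue|green) echo "$slot" ;;
    *) echo "could not determine productionSlot (got '$slot')" >&2
       return 1 ;;
  esac
}

# Intended use (not executed here, needs access to a release):
#   currentSlot=$(helm get values --all yourReleaseName | read_production_slot) || exit 1
```

With this guard, the release stops before anything is deployed whenever the current slot cannot be determined.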

Now the deployment to newSlot can be done with the following helm command:

deploymentOption=$newSlot.enabled=true
helm upgrade yourReleaseName repo/blue-green --set $deploymentOption --reuse-values

The following diagram depicts the state at this point. The green circles represent the new Pods deployed with the Label slot=green. All test workloads can now target stage.domain.com for verification.

Stage 1: New service version is deployed to Green Slot. No Production traffic on it yet.

Once the new deployment meets all quality bars, it can be promoted to serve production traffic using the following helm command:

deploymentOption=productionSlot=$newSlot
helm upgrade yourReleaseName repo/blue-green --set $deploymentOption --reuse-values

This flips the slot selectors on the Services, so service-prod now points to the green slot. All production traffic now goes to the new version in the green slot.

Stage 2: Service Slots swapped. Prod now points to new service version

At this point, we can (optionally) delete the deployment in the blue slot. Again, we query Helm for the value of productionSlot and delete the deployment in the other slot.

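currentSlot=$(helm get values --all yourReleaseName | grep -Po '^\s*productionSlot: \K.*')
if [ "$currentSlot" = "blue" ]; then oldSlot="green"; else oldSlot="blue"; fi
deploymentOption=$oldSlot.enabled=false
helm upgrade yourReleaseName repo/blue-green --set $deploymentOption --reuse-values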

The following diagram illustrates the current state, i.e. the blue slot is now empty.

Stage 3: Old service version is deleted [This is optional].

Notice that this state is equivalent to where we started, except that the Blue and Green slots have switched roles. The next release can now use the blue slot for staging.
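The whole cycle can be summarized as a dry-run plan. The sketch below only prints the three helm commands from the steps above for a given current production slot, and it never calls helm. The function name print_release_plan is hypothetical; the release and chart names are the placeholders used throughout this article.

```shell
#!/usr/bin/env bash
# print_release_plan RELEASE CHART CURRENT_SLOT   (hypothetical helper)
# Prints, without running them, the helm commands of one release cycle:
# deploy to the idle slot, swap production, then empty the old slot.
print_release_plan() {
  local release=$1 chart=$2 current=$3 new
  case "$current" in
    blue)  new=green ;;
    green) new=blue ;;
    *)     echo "unknown slot: $current" >&2; return 1 ;;
  esac
  echo "helm upgrade $release $chart --set ${new}.enabled=true --reuse-values"
  echo "helm upgrade $release $chart --set productionSlot=${new} --reuse-values"
  echo "helm upgrade $release $chart --set ${current}.enabled=false --reuse-values"
}

# print_release_plan yourReleaseName repo/blue-green blue
# prints:
#   helm upgrade yourReleaseName repo/blue-green --set green.enabled=true --reuse-values
#   helm upgrade yourReleaseName repo/blue-green --set productionSlot=green --reuse-values
#   helm upgrade yourReleaseName repo/blue-green --set blue.enabled=false --reuse-values
```

Running it again with green as the current slot yields the mirror-image plan, which is the role switch noted above.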

Hope this is helpful. The Helm chart for this is available on GitHub here. Happy Helming!!!