Reducing the blast radius during application deployment

James Mak · Published in Airwalk Reply · Feb 8, 2022

In the world of continuous integration (CI) and continuous deployment (CD), it is not uncommon to have dozens of minor application releases within a single day. If you are a believer in Murphy’s Law, don’t disregard the impact of such minor releases on your production operations, even if you have carefully crafted automated tests to ensure a high level of code quality and successful builds. There will always be a failure you didn’t expect.

So what can be done to minimize this risk? Canary testing.

The term canary testing originated in coal mining. In the old days, coal miners would take a canary with them into the mine to indicate the presence of odorless but toxic gases. If the canary died, the miners knew lethal gas was building up to dangerous levels and it was time to evacuate.

I.T. has borrowed the terminology. Canary deployment is a pattern for rolling out a release to a subset of users or servers, observing the outcome, rolling out to the rest if all is well, and withdrawing the release if the outcome is not as expected.

I am going to introduce two approaches to achieving canary deployment.

  1. By Kubernetes controls.
  2. By Istio controls.

Both have their merits and drawbacks.

Kubernetes controls

Simply put, you can perform a canary deployment by running your new app alongside the existing one. The trick is to use the same .spec.selector.matchLabels in both the old and new app deployments. As a result, your Kubernetes Service, which relies on .spec.selector to find its endpoints, will distribute requests across the two deployments.

Below is an example that deploys two versions of nginx at the same time. The new deployment’s pods will run alongside the old deployment’s pods, and traffic will be split between the version 1 and version 2 pods (the my-service .spec.selector picks all pods with the label app: my-proxy) by the load balancing built into the Kubernetes Service.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-proxy-v1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-proxy
  template:
    metadata:
      labels:
        app: my-proxy
        version: v1
    spec:
      containers:
      - name: nginx
        image: nginx:v1
        ports:
        - containerPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-proxy-v2
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-proxy
  template:
    metadata:
      labels:
        app: my-proxy
        version: v2
    spec:
      containers:
      - name: nginx
        image: nginx:v2
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: my-proxy
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
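
You can verify that the Service has picked up pods from both deployments before relying on the split (a quick check, assuming the manifests above have been applied):

# Pods matched by the Service selector, with their version label shown
kubectl get pods -l app=my-proxy -L version

# The Service's endpoints should include pods from both deployments
kubectl get endpoints my-service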

In the above example, each deployment spawns three pods, which means traffic is routed 50–50 between old and new pods (because the ratio of new to old instances is 1:1). You can adjust the old-to-new traffic distribution by scaling the replica counts of the two deployments. The drawback is the increased infrastructure cost, and if your infrastructure capacity supports only three pods in total, you can’t achieve a split like exactly 25% of traffic to new pods and 75% to old pods. When you feel comfortable with the outcome of the canary deployment, you can route all incoming traffic to the new version by deleting the v1 deployment.
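
For instance, with the deployments above you could approximate a 75/25 split by scaling to three v1 pods and one v2 pod (a sketch reusing the example’s deployment names; the split is approximate because it relies on the Service spreading requests evenly across pods):

# Three old pods and one new pod: roughly 75% of traffic stays on v1
kubectl scale deployment my-proxy-v1 --replicas=3
kubectl scale deployment my-proxy-v2 --replicas=1

# When the canary looks healthy, promote v2 by removing the v1 deployment
kubectl delete deployment my-proxy-v1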

Istio controls

Istio is a good partner to Kubernetes. To achieve canary testing, we send traffic to a virtual service host, which routes it to different versions of a service (service subsets). The percentage of traffic sent to each version is defined by virtual service rules. In the example below, 25% of requests go to the new version; to perform a canary rollout, we gradually increase the weight (percentage of traffic) sent to the new service version until it reaches 100%. The traffic routing doesn’t depend on the number of instances running the old and new service versions, so we can scale up and down based on traffic load without worrying about the traffic split, as we had to in the Kubernetes example above.

Istio virtual service and destination rule example.

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: my-service-vs
spec:
  hosts:
  - my-service.default.svc.cluster.local
  http:
  - route:
    - destination:
        host: my-service.default.svc.cluster.local
        subset: v1
      weight: 75
    - destination:
        host: my-service.default.svc.cluster.local
        subset: v2
      weight: 25
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: my-service-dr
spec:
  host: my-service.default.svc.cluster.local
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
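
To move the rollout along, you reapply the VirtualService with adjusted weights; only the route weights change. A sketch of the final step, reusing the names from the example above, might look like this:

# Final rollout step: all traffic goes to the v2 subset
  http:
  - route:
    - destination:
        host: my-service.default.svc.cluster.local
        subset: v1
      weight: 0
    - destination:
        host: my-service.default.svc.cluster.local
        subset: v2
      weight: 100

Rolling back is just as simple: set the v2 weight back to 0 and reapply.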

Which Deployment Strategy Should I Use?

Now that we know these different deployment techniques, which strategy should you use? The answer depends on the type of application you have and your target environment.

If you have a large farm of pods, it is wise to employ Istio to save cost, since the traffic split doesn’t require extra replicas. Istio also gives you more flexibility and granular control over service routing, in addition to capabilities such as chaos testing. However, if you are concerned about system complexity, controlling the rollout with Kubernetes Service and Deployment resources may be more suitable, as it requires less administration effort than Istio.

Overall, canary deployment allows organisations to test an app release in a low-risk environment and compare different service versions side by side. Rolling back to the previous version of an application is fast: just delete the new Kubernetes deployment or adjust the service subset weights.
