Kubernetes Canary deployment with Linkerd & Flagger

Cyril Becker
Mar 20 · 6 min read

In this article I’m going to introduce you to a useful technique for delivering web applications automatically and safely: the Canary deployment method.

What is Canary deployment?

Canary is a deployment method that reduces the risk of introducing a new software version in production by gradually shifting traffic to the new version while measuring metrics like HTTP request success rate and latency. It lets you do capacity testing of the new version in a production environment, with a safe rollback strategy if issues are found. By shifting the traffic slowly, you can monitor and capture metrics about how the new version impacts the production environment.

How does Canary work with Linkerd and Flagger?

Linkerd is responsible for collecting application metrics, and Flagger is responsible for executing the automated delivery. Flagger’s job is to create a Canary pod with the new release, send a small amount of traffic to it, and determine success from the Linkerd metrics. If the Canary pod is working correctly, Flagger triggers the deployment update.

What is Linkerd?

Linkerd is a service mesh for Kubernetes: it gives you monitoring of your services without requiring any changes to your applications. By default, Linkerd stores the monitoring metrics in a Prometheus TSDB and includes a web user interface.

What is Flagger?

Flagger is a progressive delivery operator for Kubernetes, designed to give you confidence in automating deployment releases with progressive delivery techniques. It supports many tools, including Linkerd, and other deployment methods such as A/B testing or classic Blue/Green.

Our use case: hosting a Node.js API

To demonstrate how to implement Canary, I will take a simple, common use case: hosting a Node.js API I wrote myself.

Requirements

The Test API

The test API is pretty simple: it just returns a Hello World message with the current API version: Hello v1. My Node.js source code and Dockerfiles are stored on my GitHub here if you want to take a look.

For this test I have built 3 Docker images with tags reflecting 3 releases of my API. My Docker images are available publicly on Docker Hub:

  • cyrilbkr/testapp:1.0
  • cyrilbkr/testapp:2.0
  • cyrilbkr/testapp:3.0

The 3.0 release returns a 404 error; it is used to simulate a bug in the application and to show you how the automatic rollback works.

Linkerd & Flagger setup

Create a namespace called linkerd, then install Linkerd with the CLI tool and set up Flagger with kubectl:
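Roughly, the commands look like this; treat it as a sketch and check the official docs for your versions (the Flagger repository path below is the one used in the current Flagger documentation and has changed over time):

```sh
# Create the namespace (the linkerd install manifest also defines it)
kubectl create namespace linkerd

# Install the Linkerd control plane with the CLI and verify it
linkerd install | kubectl apply -f -
linkerd check

# Install Flagger configured for Linkerd (kustomize overlay from the Flagger repo)
kubectl apply -k github.com/fluxcd/flagger//kustomize/linkerd
```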

Look at the documentation for more information, or to customize your Linkerd & Flagger setup if needed. For example, by default Linkerd ships with a Prometheus server, but you can plug in your own existing Prometheus.

Also, don’t forget to expose the web UI yourself with an Ingress definition, or use port-forwarding on your local PC:
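For a quick look without an Ingress, the Linkerd CLI can set up the port-forward for you:

```sh
# Opens the Linkerd web UI through a local port-forward
# (on recent Linkerd versions this command is `linkerd viz dashboard`)
linkerd dashboard &
```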

Deploying our V1 API on Kubernetes

We will deploy our API on Kubernetes based on a traditional Deployment & Service configuration.
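Here is a minimal sketch of what that configuration could look like; the resource names, port and replica count are my assumptions (the actual manifests are in the GitHub repo), the testapi namespace is assumed to exist, and the Linkerd proxy is injected through the pod annotation. Note that Flagger will also generate api-primary and api-canary services once the Canary resource below is applied.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  namespace: testapi
spec:
  replicas: 2
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
      annotations:
        linkerd.io/inject: enabled    # inject the Linkerd sidecar proxy
    spec:
      containers:
      - name: api
        image: cyrilbkr/testapp:1.0   # V1 release of the test API
        ports:
        - containerPort: 8080         # hypothetical port, adjust to your app
---
apiVersion: v1
kind: Service
metadata:
  name: api
  namespace: testapi
spec:
  selector:
    app: api
  ports:
  - port: 80
    targetPort: 8080
```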

Now it’s time to define the parameters of our Canary. In my case, I want to check that the HTTP request success rate of the new release stays higher than 99% over a one-minute interval. You can use other parameters, like latency, to define success.
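A Canary definition along these lines expresses that requirement. The 99% success rate and one-minute interval match the text above; the step weight, ports, and the loadtester webhook are my assumptions, and field names follow the current flagger.app/v1beta1 API (older Flagger versions use slightly different names):

```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: api
  namespace: testapi
spec:
  targetRef:                        # the Deployment Flagger takes over
    apiVersion: apps/v1
    kind: Deployment
    name: api
  service:
    port: 80
    targetPort: 8080
  analysis:
    interval: 1m                    # run the checks every minute
    threshold: 5                    # failed checks allowed before rollback
    stepWeight: 10                  # shift traffic to the canary 10% at a time
    maxWeight: 50                   # promote once checks keep passing at this weight
    metrics:
    - name: request-success-rate    # HTTP success rate reported by Linkerd
      thresholdRange:
        min: 99                     # must stay above 99%
      interval: 1m
    webhooks:
    - name: load-test               # hypothetical loadtester generating ~10 QPS
      url: http://flagger-loadtester.testapi/
      metadata:
        cmd: "hey -z 1m -q 10 -c 2 http://api-canary.testapi:80/"
```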

  • Deploy all the YAML files with kubectl and check the status:
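Assuming the manifests are saved as deployment.yaml and flagger.yaml (hypothetical file names), something like this applies them and checks that Flagger has initialised the canary:

```sh
kubectl apply -f deployment.yaml -f flagger.yaml

# Flagger clones the deployment into api-primary and reports the canary status
kubectl -n testapi get deployments,canaries
```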

As you can see, our deployment is now called api-primary; this is because the Canary configuration takes over the initial deployment.

On the Linkerd web UI you can now see HTTP service monitoring in real time in the testapi namespace.
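The same live metrics are also available from the CLI (on recent Linkerd versions this command lives under the viz extension, `linkerd viz stat`):

```sh
# Live success rate, requests per second and latency for the testapi namespace
linkerd -n testapi stat deploy
```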

You can check in Grafana that the API is working properly and returns an HTTP request success rate of 100%. The success rate is calculated by automatically sending HTTP requests from the load generator to our API, as defined in flagger.yaml (`- name: QPS`, `value: "10"`).

Upgrading to V2 using Canary progressive delivery

  • Update the Docker image tag to start the delivery:
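Assuming the Deployment and its container are both named api (as in the sketch above), bumping the image tag is what triggers Flagger to start a new analysis:

```sh
# Point the deployment at the V2 image; Flagger detects the change and starts the canary
kubectl -n testapi set image deployment/api api=cyrilbkr/testapp:2.0
```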

A single new pod, independent from the production one and running the new Docker image, is deployed; this is the Canary.

Flagger starts by shifting 10% of the traffic to it.

Then, after verifying the success rate, it keeps shifting the traffic to the Canary in 10% increments.

After 100% of the traffic has been switched to the new release, the old deployment and the original Canary pod are terminated.
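You can follow the whole promotion from the command line; the weight and status columns change at every interval:

```sh
# Watch the canary analysis progress until it reaches the Succeeded phase
kubectl -n testapi get canary api --watch
```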

Upgrading to V3 containing an error

Now we will deploy a new release of our API (3.0) containing a 404 error, to check how the system rejects this new release and does not deliver it to production.

Update the Docker image tag to start the delivery:
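Same command as before, this time pointing at the broken 3.0 image (deployment and container names are the assumed ones from the earlier sketch):

```sh
kubectl -n testapi set image deployment/api api=cyrilbkr/testapp:3.0
```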

Flagger starts sending 10% of the traffic to the Canary, and the API responds with a 404.

As you can see in Grafana, the HTTP success rate drops to zero because the Canary only returns 404 errors.

After reaching the threshold set up earlier in the configuration, the system puts the delivery in a rollback state, reroutes the 10% of traffic back to the production service and destroys the Canary.
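The failed checks and the rollback are recorded as events on the Canary resource, which you can inspect with:

```sh
# Shows the Halt/Rollback events and the Failed status of the analysis
kubectl -n testapi describe canary api
```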

Conclusion

Canary delivery with Linkerd & Flagger is a powerful technique that reduces errors by automatically ensuring your application is working before delivering it to production. It also gives you tools to monitor in real time what happens during a deployment.

References

Linkerd : https://linkerd.io/2/tasks/canary-release/
Flagger : https://docs.flagger.app/tutorials/linkerd-progressive-delivery


Written by Cyril Becker, CTO | Alter Way Cloud Consulting
