Flagger — Canary Deployments Tutorial


Flagger is a simple, easy-to-use progressive delivery tool for rolling out new versions of applications running on cloud-native platforms such as Kubernetes, GKE, EKS, or Alibaba Cloud. It integrates with service mesh and ingress tools like Istio, Linkerd, Gloo, or Contour to read application metrics and perform the rollout gradually.

Canaries are birds that were once used in coal mines to detect toxic gases and warn the miners; in a similar spirit, a canary deployment is a strategy for rolling out microservice-based applications from one version to another by exposing the new version to a small slice of traffic first.

Flagger is a Kubernetes operator built to help with canary deployments. It moves a microservice-based application to a new version in a cloud-native environment, or in simpler terms, it takes the application from one Docker image to a new one. To do this, it keeps checking the defined Service Level Agreements (SLAs) for a certain amount of time and gradually rolls the new version out if they are met, or rolls back in failure scenarios.

Internally, whenever an application is scheduled to change to a new version, Flagger starts reading the defined metric values from the service mesh, and based on the SLAs it either routes more traffic to the new version or marks the rollout as failed and retains the old version of the application in the ecosystem.

Flagger runs as a Deployment in the namespace of the service mesh it integrates with, which can be Istio, Linkerd, Gloo, etc. It ships with a CRD called 'Canary' that lets the user specify parameters such as the name of the deployment to watch for rollouts and the SLA definitions. It also has webhook options that can be used to send notifications or to run load-testing traffic during a rollout, as well as parameters that control how long the SLAs are monitored before the new version is promoted. Additionally, Flagger can be installed with Grafana enabled to monitor the traffic flow while a rollout happens.
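To make those pieces concrete, here is a minimal, illustrative skeleton of a Canary object (names and values are placeholders; the complete manifest used for this tutorial's app appears later, in the "Creating Canary CR" section):

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: myapp                  # placeholder name
spec:
  targetRef:                   # the Deployment Flagger watches for new versions
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  service:
    port: 8080                 # port exposed by the generated primary/canary services
  analysis:
    interval: 1m               # how often the SLA metrics are checked
    threshold: 5               # failed checks tolerated before rolling back
    metrics: []                # SLA definitions, e.g. request success rate
    webhooks: []               # optional hooks for notifications or load tests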

Flagger supports several deployment strategies (a short sketch of how each one is selected follows this list):

  • Canary release: gradually shifting traffic to the new version
  • A/B testing: routing to different versions based on HTTP header or cookie content
  • Blue/Green: switching all traffic at once after the checks pass; this strategy can also run without a service mesh, reading metrics through Prometheus in the cluster
  • Blue/Green mirroring: shadowing live traffic to the new version while the current version keeps serving responses
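Which of these strategies Flagger runs is determined by the fields set in the Canary's analysis block. The fragments below are a rough sketch based on the Flagger documentation (field names as documented there; the values are placeholders):

# Canary release: progressive traffic shifting
analysis:
  stepWeight: 10       # shift 10% more traffic each interval
  maxWeight: 50        # promote once 50% is reached with passing checks

# A/B testing: a fixed number of check iterations, traffic selected by match rules
analysis:
  iterations: 10
  match: []            # HTTP header/cookie conditions (a concrete sketch appears near the end of this post)

# Blue/Green: run the checks for a number of iterations, then switch all traffic at once
analysis:
  iterations: 10

# Blue/Green mirroring: additionally shadow live traffic to the new version during the checks
analysis:
  iterations: 10
  mirror: true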

In this blog, we will explore a canary-based deployment of a simple Python Flask app to see how Flagger works. We will install Flagger on a Kubernetes cluster with Istio enabled.

Prerequisites

  • Ubuntu 20.04 OS
  • Docker 20.10
  • Kubernetes cluster: version 1.21.1 (a single node with Calico CNI is used here)
  • Helm binary — Refer here for installation

Installing Istio and Flagger

We will now install Istio using the istioctl CLI. Download the binary using the commands below; we will then use it to bring up the Istio service mesh on the cluster. We are using Istio 1.10.0 here.

# To download istioctl
$ curl -L https://istio.io/downloadIstio | sh -
$ cd istio-1.10.0
$ export PATH=$PWD/bin:$PATH
# To install Istio using istioctl on the kubernetes cluster
$ istioctl install --set profile=demo -y
✔ Istio core installed
✔ Istiod installed
✔ Ingress gateways installed
✔ Egress gateways installed
✔ Installation complete
Thank you for installing Istio 1.10. Please take a few minutes to tell us about your install/upgrade experience! https://forms.gle/KjkrDnMPByq7akrYA
# Install prometheus
$ kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.10/samples/addons/prometheus.yaml
# Check that all the pods in the istio-system namespace are in Running state
$ kubectl -n istio-system get pods
NAME READY STATUS RESTARTS AGE
istio-egressgateway-55d4df6c6b-vn2gx 1/1 Running 0 2m58s
istio-ingressgateway-69dc4765b4-4p2vj 1/1 Running 0 2m58s
istiod-798c47d594-7nldj 1/1 Running 0 6m34s
prometheus-8958b965-z4bkc 2/2 Running 0 5m

Once the Kubernetes cluster has Istio running, we can use Helm to install Flagger.

# Adding flagger repo to helm and installing flagger CRD
$ helm repo add flagger https://flagger.app
$ kubectl apply -f https://raw.githubusercontent.com/fluxcd/flagger/main/artifacts/flagger/crd.yaml

# Installing flagger in istio's namespace
$ helm upgrade -i flagger flagger/flagger --namespace=istio-system --set crd.create=false --set meshProvider=istio --set metricsServer=http://prometheus:9090
# Enabling grafana
$ helm upgrade -i flagger-grafana flagger/grafana --namespace=istio-system --set url=http://prometheus.istio-system:9090 --set user=admin --set password=change-me
# Enable port-forwarding to access grafana on localhost:3000
$ kubectl -n istio-system port-forward svc/flagger-grafana 3000:80
# Check pods in istio-system namespace
$ kubectl -n istio-system get pods
NAME READY STATUS RESTARTS AGE
flagger-5c49576977-fvtgl 1/1 Running 0 3m28s
flagger-grafana-77b8c8df65-rszqm 1/1 Running 0 75s
istio-egressgateway-55d4df6c6b-vn2gx 1/1 Running 0 21m
istio-ingressgateway-69dc4765b4-4p2vj 1/1 Running 0 21m
istiod-798c47d594-7nldj 1/1 Running 0 24m
prometheus-8958b965-z4bkc 2/2 Running 0 26m

Deploying Python Flask app

Here is an example Python Flask app; clone the repository below to deploy it in the Kubernetes cluster.

# Clone python flask app
$ git clone https://github.com/SirishaGopigiri/python-flask-app.git
$ cd python-flask-app

Enable Istio sidecar injection in the default namespace and then deploy the app there using the deployment.yaml file, changing the YAML to match your cluster spec if needed.

Please note: all the YAML files used in this blog are available in the GitHub repo.
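For reference, the repo's deployment.yaml is expected to contain roughly the following: a Deployment and a ClusterIP Service, both named appdeploy, exposing the Flask app on port 5000. The sketch below is an assumption made for illustration (the labels and image tag in particular); check the repo for the exact manifest.

# Rough sketch of deployment.yaml (labels and image tag are assumptions; see the repo for the real file)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: appdeploy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: appdeploy
  template:
    metadata:
      labels:
        app: appdeploy
    spec:
      containers:
      - name: appdeploy                                   # container name used later by kubectl set image
        image: quay.io/sirishagopigiri/python-testapp:v1  # assumed initial tag; v2/v3 are rolled out later
        ports:
        - containerPort: 5000
---
apiVersion: v1
kind: Service
metadata:
  name: appdeploy
spec:
  selector:
    app: appdeploy
  ports:
  - port: 5000
    targetPort: 5000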

# Enable sidecar injection from istio in the default namespace
$ kubectl label namespace default istio-injection=enabled
# Deploying the application in kubernetes
$ kubectl apply -f deployment.yaml
deployment.apps/appdeploy created
service/appdeploy created
# Check pods and service in default namespace
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
appdeploy-7dcf9786cc-b2mjx 2/2 Running 0 2m17s
# Check service status in default namespace
$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
appdeploy ClusterIP 10.102.160.160 <none> 5000/TCP 2m28s
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 5h47m
# Test if service is accessible
$ kubectl run -i -t nginx --rm=true --image=nginx -- bash
# Once in the container execute the below commands
root@nginx:/# curl -X GET http://appdeploy:5000
hello world!
root@nginx:/# curl -X GET http://appdeploy:5000/return_version
Running test app on version 1.0 !!!
root@nginx:/# exit

Istio’s ingress gateway will be used by the canary to access the service. Use the command below to patch the istio-ingressgateway service to NodePort.

# patch istio-ingressgateway to nodeport
$ kubectl patch svc -n istio-system istio-ingressgateway --type='json' -p '[{"op":"replace","path":"/spec/type","value":"NodePort"}]'

Now let us create a VirtualService and Gateway so the application can be reached through Istio’s ingress gateway. Use the YAML files below.

# gateway.yaml
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: appdeploy-gateway
spec:
  selector:
    istio: ingressgateway # use Istio default gateway implementation
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
# virtualservice.yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: appdeploy
spec:
  hosts:
  - "*"
  gateways:
  - appdeploy-gateway
  http:
  - match:
    - uri:
        prefix: /
    route:
    - destination:
        port:
          number: 5000
        host: appdeploy
# Create the resources
$ kubectl apply -f gateway.yaml
gateway.networking.istio.io/appdeploy-gateway created
$ kubectl apply -f virtualservice.yaml
virtualservice.networking.istio.io/appdeploy created

Once the resources are created, extract the node port of Istio’s ingress-gateway service and try accessing the application.

# Node port for ingress-gateway
$ export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')
# Test application
$ curl -X GET "http://127.0.0.1:$INGRESS_PORT/"
hello world!
$ curl -X GET "http://127.0.0.1:$INGRESS_PORT/return_version"
Running test app on version 1.0 !!!

Creating Canary CR for python app

Once we have the Python Flask app deployment running, let’s configure it with a Canary CR. Use the YAML file below.

# Copy the below yaml file and make necessary changes according to your deployment
### canary.yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: appdeploy
spec:
  # deployment reference
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: appdeploy # deployment name
  service:
    # service port number
    port: 5000
    gateways:
    - appdeploy-gateway
    hosts:
    - "*"
  analysis:
    # schedule interval (default 60s)
    interval: 1m
    # max number of failed metric checks before rollback
    threshold: 5
    # max traffic percentage routed to canary (0-100)
    maxWeight: 50
    # canary increment step percentage (0-100)
    stepWeight: 10
    metrics:
    - name: request-success-rate
      # minimum req success rate (non 5xx responses)
      thresholdRange:
        min: 99
      interval: 1m
    - name: request-duration
      # maximum req duration P99
      thresholdRange:
        max: 500
      interval: 30s
    # testing (optional)
    webhooks:
    - name: acceptance-test
      type: pre-rollout
      url: http://flagger-loadtester.default/
      timeout: 30s
      metadata:
        type: bash
        cmd: "curl -sd 'test' http://appdeploy-canary:5000/return_version"
    - name: load-test
      url: http://flagger-loadtester.default/
      timeout: 5s
      metadata:
        cmd: "hey -z 1m -q 10 -c 2 http://appdeploy-canary:5000/return_version"

From the YAML above we can see that traffic is increased gradually by 10% every minute, and once the weight reaches 50% with the SLAs still met, Flagger promotes the new version. The SLA metrics are also evaluated every minute, and the traffic progression happens based on those checks. We use Flagger’s sample load tester to generate load while the canary analysis is running. Create the canary.yaml file with the required configuration.

# Before creating the canary we need to delete the virtual service, as it will now be managed by Flagger from the above canary.yaml file
$ kubectl delete -f virtualservice.yaml
virtualservice.networking.istio.io "appdeploy" deleted
# Create the canary
$ kubectl apply -f canary.yaml
canary.flagger.app/appdeploy created

Once created, wait for some time for Flagger to read the deployment and create the resources to manage the canary. It will do the following:

  • Scales the current appdeploy deployment down to 0
  • Brings up a new deployment, appdeploy-primary
  • Points the existing appdeploy service at the primary deployment (check using kubectl get endpoints)
  • Creates two new services, appdeploy-primary and appdeploy-canary, used to route traffic during canary analysis
  • Creates a virtual service that distributes the weight between the primary and canary services during canary analysis (a rough sketch of it follows this list)
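For instance, the VirtualService that Flagger generates for this setup looks roughly like the sketch below (illustrative only; the weights shown are the initial state before any analysis starts):

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: appdeploy
spec:
  gateways:
  - appdeploy-gateway
  hosts:
  - "*"
  http:
  - route:
    - destination:
        host: appdeploy-primary
      weight: 100              # all traffic stays on the primary until analysis begins
    - destination:
        host: appdeploy-canary
      weight: 0                # increased in stepWeight increments during analysis
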
# Check canary CRD
$ kubectl get canary
NAME STATUS WEIGHT LASTTRANSITIONTIME
appdeploy Initialized 0 2021-06-13T16:16:35Z
# Check pods and see the difference in names
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
appdeploy-primary-79595548f6-pcq9t 2/2 Running 0 118s
# Check service
$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
appdeploy ClusterIP 10.102.160.160 <none> 5000/TCP 5m12s
appdeploy-canary ClusterIP 10.100.219.243 <none> 5000/TCP 2m12s
appdeploy-primary ClusterIP 10.110.17.144 <none> 5000/TCP 2m11s
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 5h50m
# Check virtual service
$ kubectl get vs
NAME GATEWAYS HOSTS AGE
appdeploy ["appdeploy-gateway"] ["*"] 92s
# Check deployments
$ kubectl get deploy
NAME READY UP-TO-DATE AVAILABLE AGE
appdeploy 0/0 0 0 5m49s
appdeploy-primary 1/1 1 1 2m48s

Once all the resources are in place, we test the application using the service name and Istio’s ingress gateway.

# Test if service is accessible
$ kubectl run -i -t nginx --rm=true --image=nginx -- bash
# Once in the container execute the below commands
root@nginx:/# curl -X GET http://appdeploy:5000
hello world!
root@nginx:/# curl -X GET http://appdeploy:5000/return_version
Running test app on version 1.0 !!!
root@nginx:/# exit
# Test using ingress-gateway
$ curl -X GET "http://127.0.0.1:$INGRESS_PORT/"
hello world!
$ curl -X GET "http://127.0.0.1:$INGRESS_PORT/return_version"
Running test app on version 1.0 !!!

We will now deploy the load tester, which will later be used by the canary analysis.

# tester.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flagger-loadtester
  labels:
    app: flagger-loadtester
spec:
  selector:
    matchLabels:
      app: flagger-loadtester
  template:
    metadata:
      labels:
        app: flagger-loadtester
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
    spec:
      containers:
      - name: loadtester
        image: ghcr.io/fluxcd/flagger-loadtester:0.18.0
        imagePullPolicy: IfNotPresent
        ports:
        - name: http
          containerPort: 8080
        command:
        - ./loadtester
        - -port=8080
        - -log-level=info
        - -timeout=1h
        livenessProbe:
          exec:
            command:
            - wget
            - --quiet
            - --tries=1
            - --timeout=4
            - --spider
            - http://localhost:8080/healthz
          timeoutSeconds: 5
        readinessProbe:
          exec:
            command:
            - wget
            - --quiet
            - --tries=1
            - --timeout=4
            - --spider
            - http://localhost:8080/healthz
          timeoutSeconds: 5
        resources:
          limits:
            memory: "512Mi"
            cpu: "1000m"
          requests:
            memory: "32Mi"
            cpu: "10m"
        securityContext:
          readOnlyRootFilesystem: true
          runAsUser: 10001
---
apiVersion: v1
kind: Service
metadata:
  name: flagger-loadtester
  labels:
    app: flagger-loadtester
spec:
  type: ClusterIP
  selector:
    app: flagger-loadtester
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: http
# Create load tester
$ kubectl apply -f tester.yaml
deployment.apps/flagger-loadtester created
service/flagger-loadtester created
# Check pods and services
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
appdeploy-primary-79595548f6-pcq9t 2/2 Running 0 3m37s
flagger-loadtester-5b766b7ffc-ksl45 2/2 Running 0 32s
$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
appdeploy ClusterIP 10.102.160.160 <none> 5000/TCP 6m24s
appdeploy-canary ClusterIP 10.100.219.243 <none> 5000/TCP 3m24s
appdeploy-primary ClusterIP 10.110.17.144 <none> 5000/TCP 3m23s
flagger-loadtester ClusterIP 10.103.190.34 <none> 80/TCP 17s
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 5h51m

Rolling Update

Now we change the Python application image to a new version to start the canary analysis and see how Flagger does the rolling update.

# Change the image
$ kubectl set image deployment/appdeploy appdeploy=quay.io/sirishagopigiri/python-testapp:v2
deployment.apps/appdeploy image updated

Once updated, check the deployments: we will find two versions of the application running under different deployment names. We can also check the service endpoints to see which pod backs which service.

# Check deployments
$ kubectl get deployments
NAME READY UP-TO-DATE AVAILABLE AGE
appdeploy 1/1 1 1 8m38s
appdeploy-primary 1/1 1 1 5m37s
flagger-loadtester 1/1 1 1 2m31s
# Check pods
$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
appdeploy-9477df584-cg6kt 2/2 Running 0 86s 192.192.43.137 harrypotter <none> <none>
appdeploy-primary-79595548f6-pcq9t 2/2 Running 0 5m25s 192.192.43.190 harrypotter <none> <none>
flagger-loadtester-5b766b7ffc-ksl45 2/2 Running 0 2m20s 192.192.43.191 harrypotter <none> <none>
# Check service endpoints
$ kubectl get ep
NAME ENDPOINTS AGE
appdeploy 192.192.43.190:5000 8m41s
appdeploy-canary 192.192.43.137:5000 5m41s
appdeploy-primary 192.192.43.190:5000 5m41s
flagger-loadtester 192.192.43.191:8080 2m34s
kubernetes 192.168.1.102:6443 5h53m
# Check canary CRD
$ kubectl get canary
NAME STATUS WEIGHT LASTTRANSITIONTIME
appdeploy Progressing 20 2021-06-13T16:21:33Z

Finally, we can use curl to check whether service continuity is maintained.

# Testing using nginx
$ kubectl run -i -t nginx --rm=true --image=nginx -- bash
# Once in the container execute the below commands
root@nginx:/# curl -X GET http://appdeploy:5000
hello world!
root@nginx:/# curl -X GET http://appdeploy:5000/return_version
Running test app on version 1.0 !!!
root@nginx:/# curl -X GET http://appdeploy-primary:5000/return_version
Running test app on version 1.0 !!!
root@nginx:/# curl -X GET http://appdeploy-canary:5000/return_version
Running test app on version 2.0 !!!
# Testing using ingress-gateway
$ curl -X GET "http://127.0.0.1:$INGRESS_PORT/"
hello world!
$ curl -X GET "http://127.0.0.1:$INGRESS_PORT/return_version"
Running test app on version 1.0 !!!
$ curl -X GET "http://127.0.0.1:$INGRESS_PORT/return_version"
Running test app on version 2.0 !!!
$ curl -X GET "http://127.0.0.1:$INGRESS_PORT/return_version"
Running test app on version 1.0 !!!
$ curl -X GET "http://127.0.0.1:$INGRESS_PORT/return_version"
Running test app on version 1.0 !!!

Notice that requests are routed to the new version only when we call the appdeploy-canary service explicitly.

Service continuity is maintained, as curl requests to the appdeploy service still return version 1.0.

Through the ingress gateway, however, traffic is distributed between the two versions (here 1 out of 4 requests was routed to v2) because we specified a step-weight based canary strategy.

Please note: explore the A/B testing strategy if you want to stop the weighted traffic split at the ingress and route requests based on headers instead.
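As a rough sketch of that idea (field names follow the Flagger A/B testing docs; the header name and value are placeholders), the analysis block of canary.yaml could be rewritten so that only requests matching a header reach the canary, while all other ingress traffic stays on the primary:

# A/B-style analysis (sketch): replaces maxWeight/stepWeight in canary.yaml
analysis:
  interval: 1m
  threshold: 5
  iterations: 10               # number of checks to run before promotion
  match:
  - headers:
      x-canary:                # placeholder header; choose your own
        regex: ".*insider.*"
  metrics:
  - name: request-success-rate
    thresholdRange:
      min: 99
    interval: 1m

With something like this in place, only requests carrying a matching x-canary header would be answered by the new version; plain requests through the ingress keep hitting the primary.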

Check the load-tester and Flagger logs, or the Canary CR itself, for more information.

# Check logs
$ kubectl logs <loadtester pod>
$ kubectl -n istio-system logs <flagger-pod>
# Check canary CRD
$ kubectl get canary
NAME STATUS WEIGHT LASTTRANSITIONTIME
appdeploy Progressing 30 2021-06-13T16:22:32Z

Below are some logs from the Flagger pod:

{"level":"info","ts":"2021-06-13T16:16:35.835Z","caller":"controller/events.go:33","msg":"Initialization done! appdeploy.default","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:19:32.970Z","caller":"controller/events.go:33","msg":"New revision detected! Scaling up appdeploy.default","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:20:33.034Z","caller":"controller/events.go:33","msg":"Starting canary analysis for appdeploy.default","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:20:33.227Z","caller":"controller/events.go:33","msg":"Pre-rollout check acceptance-test passed","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:20:33.561Z","caller":"controller/events.go:33","msg":"Advance appdeploy.default canary weight 10","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:21:33.439Z","caller":"controller/events.go:33","msg":"Advance appdeploy.default canary weight 20","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:22:33.191Z","caller":"controller/events.go:33","msg":"Advance appdeploy.default canary weight 30","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:23:33.462Z","caller":"controller/events.go:33","msg":"Advance appdeploy.default canary weight 40","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:24:33.278Z","caller":"controller/events.go:33","msg":"Advance appdeploy.default canary weight 50","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:25:33.601Z","caller":"controller/events.go:33","msg":"Copying appdeploy.default template spec to appdeploy-primary.default","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:26:33.003Z","caller":"controller/events.go:45","msg":"appdeploy-primary.default not ready: waiting for rollout to finish: 1 old replicas are pending termination","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:27:35.184Z","caller":"controller/events.go:33","msg":"Routing all traffic to primary","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:28:43.266Z","caller":"controller/events.go:33","msg":"Promotion completed! Scaling down appdeploy.default","canary":"appdeploy.default"}

Once the canary weight reaches 50, the promotion happens automatically and the primary pod gets replaced with the v2 version.

# Check canary CRD
$ kubectl get canary
NAME STATUS WEIGHT LASTTRANSITIONTIME
appdeploy Succeeded 0 2021-06-13T16:28:42Z
# Check pods - new pod created
$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
appdeploy-primary-8ddc7bdfd-hwtm4 2/2 Running 0 4m15s 192.192.43.138 harrypotter <none> <none>
flagger-loadtester-5b766b7ffc-ksl45 2/2 Running 0 11m 192.192.43.191 harrypotter <none> <none>
# Check deployments
$ kubectl get deploy
NAME READY UP-TO-DATE AVAILABLE AGE
appdeploy 0/0 0 0 17m
appdeploy-primary 1/1 1 1 14m
flagger-loadtester 1/1 1 1 11m
# Check services
$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
appdeploy ClusterIP 10.102.160.160 <none> 5000/TCP 17m
appdeploy-canary ClusterIP 10.100.219.243 <none> 5000/TCP 14m
appdeploy-primary ClusterIP 10.110.17.144 <none> 5000/TCP 14m
flagger-loadtester ClusterIP 10.103.190.34 <none> 80/TCP 11m
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 6h2m
# Check endpoints
$ kubectl get ep
NAME ENDPOINTS AGE
appdeploy 192.192.43.138:5000 17m
appdeploy-canary <none> 14m
appdeploy-primary 192.192.43.138:5000 14m
flagger-loadtester 192.192.43.191:8080 11m
kubernetes 192.168.1.102:6443 6h3m
# Check service requests
$ curl -X GET "http://127.0.0.1:$INGRESS_PORT/return_version"
Running test app on version 2.0 !!!
# Check with nginx
$ kubectl run -i -t nginx --rm=true --image=nginx -- bash
# Once in the container execute the below commands
root@nginx:/# curl -X GET http://appdeploy:5000
hello world!
root@nginx:/# curl -X GET http://appdeploy:5000/return_version
Running test app on version 2.0 !!!

This shows that the rolling update completed successfully!

Check the Grafana dashboard for the service metrics and other details; access it at http://localhost:3000 using the port-forward set up earlier.

The dashboard shows the success rate and request duration for the primary and canary deployments.

Rollback Scenario

We will now try to upgrade the app to a new version that returns an HTTP response code of 500 instead of 200. In this case Flagger will attempt the update, but since the requests fail it will retain the previous version.

# Update image to a new version
$ kubectl set image deployment/appdeploy appdeploy=quay.io/sirishagopigiri/python-testapp:v3

Keep checking the canary CRD and logs for more information.

# Check Canary
$ kubectl get canary
NAME STATUS WEIGHT LASTTRANSITIONTIME
appdeploy Progressing 10 2021-06-13T16:40:33Z
# Check pods
$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
appdeploy-65fff955ff-t65fj 2/2 Running 0 62s 192.192.43.143 harrypotter <none> <none>
appdeploy-primary-8ddc7bdfd-hwtm4 2/2 Running 0 15m 192.192.43.138 harrypotter <none> <none>
flagger-loadtester-5b766b7ffc-ksl45 2/2 Running 0 21m 192.192.43.191 harrypotter <none> <none>
# Check endpoints
$ kubectl get ep
NAME ENDPOINTS AGE
appdeploy 192.192.43.138:5000 28m
appdeploy-canary 192.192.43.143:5000 25m
appdeploy-primary 192.192.43.138:5000 25m
flagger-loadtester 192.192.43.191:8080 22m
kubernetes 192.168.1.102:6443 6h13m
# Check service requests
# Using nginx
$ kubectl run -i -t nginx --rm=true --image=nginx -- bash
# Once in the container execute the below commands
root@nginx:/# curl -X GET http://appdeploy:5000
hello world!
root@nginx:/# curl -X GET http://appdeploy:5000/return_version
Running test app on version 2.0 !!!
root@nginx:/# curl -X GET http://appdeploy-primary:5000/return_version
Running test app on version 2.0 !!!
root@nginx:/# curl -v -X GET http://appdeploy-canary:5000/return_version
* Trying 10.100.219.243...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x5610595d8fb0)
* Connected to appdeploy-canary (10.104.13.182) port 5000 (#0)
> GET /return_version HTTP/1.1
> Host: appdeploy-canary:5000
> User-Agent: curl/7.64.0
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< content-type: text/html; charset=utf-8
< content-length: 35
< server: envoy
< date: Sun, 13 Jun 2021 15:58:50 GMT
< x-envoy-upstream-service-time: 26
<
* Connection #0 to host appdeploy-canary left intact
Running test app on version 3.0 !!!
# Using istio ingress-gateway
$ curl -v -X GET "http://127.0.0.1:$INGRESS_PORT/return_version"
Note: Unnecessary use of -X or --request, GET is already inferred.
* Trying 127.0.0.1:31543...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 31543 (#0)
> GET /return_version HTTP/1.1
> Host: 127.0.0.1:31543
> User-Agent: curl/7.68.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 500 Internal Server Error
< content-type: text/html; charset=utf-8
< content-length: 35
< server: istio-envoy
< date: Sun, 13 Jun 2021 15:57:37 GMT
< x-envoy-upstream-service-time: 2
<
* Connection #0 to host 127.0.0.1 left intact
Running test app on version 3.0 !!!

From the service request tests above we can see that even though the new version returns a response, the HTTP response code is 500. Because of this the canary analysis fails, as we configured request-success-rate as one of the metrics in the Canary CR (canary.yaml).
Flagger logs for reference:

{"level":"info","ts":"2021-06-13T16:39:32.888Z","caller":"controller/events.go:33","msg":"New revision detected! Scaling up appdeploy.default","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:40:33.136Z","caller":"controller/events.go:33","msg":"Starting canary analysis for appdeploy.default","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:40:33.203Z","caller":"controller/events.go:33","msg":"Pre-rollout check acceptance-test passed","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:40:33.353Z","caller":"controller/events.go:33","msg":"Advance appdeploy.default canary weight 10","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:41:33.603Z","caller":"controller/events.go:33","msg":"Advance appdeploy.default canary weight 20","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:42:32.944Z","caller":"controller/events.go:45","msg":"Halt appdeploy.default advancement success rate 0.00% < 99%","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:43:33.203Z","caller":"controller/events.go:33","msg":"Advance appdeploy.default canary weight 30","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:44:32.951Z","caller":"controller/events.go:45","msg":"Halt appdeploy.default advancement success rate 0.00% < 99%","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:45:33.084Z","caller":"controller/events.go:33","msg":"Advance appdeploy.default canary weight 40","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:46:32.911Z","caller":"controller/events.go:45","msg":"Halt appdeploy.default advancement success rate 0.00% < 99%","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:47:33.404Z","caller":"controller/events.go:33","msg":"Advance appdeploy.default canary weight 50","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:48:33.273Z","caller":"controller/events.go:45","msg":"Halt appdeploy.default advancement success rate 0.00% < 99%","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:49:32.907Z","caller":"controller/events.go:45","msg":"Halt appdeploy.default advancement success rate 0.00% < 99%","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:50:32.954Z","caller":"controller/events.go:45","msg":"Rolling back appdeploy.default failed checks threshold reached 5","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:50:33.154Z","caller":"controller/events.go:45","msg":"Canary failed! Scaling down appdeploy.default","canary":"appdeploy.default"}

Once the update fails, the Canary CR reflects the Failed status. Finally, we can also check the service with curl, which still returns version 2.0.

# Check Canary Status
$ kubectl get canary
NAME STATUS WEIGHT LASTTRANSITIONTIME
appdeploy Failed 0 2021-06-13T16:50:33Z
# Check service requests
$ curl -X GET "http://127.0.0.1:$INGRESS_PORT/return_version"
Running test app on version 2.0 !!!
# Check with nginx
$ kubectl run -i -t nginx --rm=true --image=nginx -- bash
# Once in the container execute the below commands
root@nginx:/# curl -X GET http://appdeploy:5000
hello world!
root@nginx:/# curl -X GET http://appdeploy:5000/return_version
Running test app on version 2.0 !!!

Check the service success rate in Grafana

The success rate is 0% on the canary deployment.

Conclusion

Flagger is a Kubernetes operator that, when integrated with GitOps, gives a great advantage in testing applications. It helps DevOps engineers confirm that integration checks pass before a version reaches production, and its progressive traffic routing maintains service continuity during the rollout. It also supports an A/B testing strategy, which is a great value-add when the application owner wants different versions to serve different sets of people, such as developers and testers working on the same Kubernetes cluster.

References:

  1. https://docs.flagger.app/
  2. https://istio.io/latest/
  3. https://istio.io/latest/docs/setup/getting-started/#download
  4. https://istio.io/latest/docs/ops/integrations/prometheus/
  5. https://istio.io/latest/docs/tasks/traffic-management/ingress/ingress-control/
  6. https://flask.palletsprojects.com/en/2.0.x/
