Flagger — Canary Deployments Tutorial


Flagger is a simple, easy-to-use progressive delivery tool for rolling out new versions of applications running on cloud-native platforms such as Kubernetes, GKE, EKS, or Alibaba Cloud. It integrates with service mesh and ingress tools like Istio, Linkerd, Gloo, or Contour to read application metrics and perform the rollout gradually.

Canaries are birds that were once used in coal mines to detect toxic gases and warn the miners; in a similar spirit, a canary deployment is a strategy for rolling out microservice-based applications from one version to another by exposing the new version to a small slice of traffic first.

Flagger is a Kubernetes operator built to help with canary deployments. It moves a microservice-based application to a new version in a cloud-native environment, or in simpler terms, it takes the application from one Docker image to a new one. To do this, it keeps checking the defined Service Level Agreements (SLAs) for a certain amount of time and gradually rolls the new version out if they are met, or rolls back in failure scenarios.

Internally, whenever an application is scheduled to change to a new version, Flagger starts reading the defined metric values from the service mesh, and based on the SLAs it either routes more traffic to the new version or marks the rollout as failed and retains the old version of the application in the ecosystem.

Flagger runs as a Deployment in the namespace of the service mesh it integrates with, which can be Istio, Linkerd, Gloo, etc. It ships with a CRD called 'Canary' that lets the user specify parameters such as the name of the deployment to watch for rollouts and the SLA definitions. It also has webhook options that can be used to send notifications or to run load-testing traffic during a rollout, as well as parameters that control how long the SLAs are monitored before the new version is promoted. Additionally, Flagger can be installed with Grafana enabled to monitor the traffic flow while a rollout happens.
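To make those pieces concrete, here is a minimal, illustrative skeleton of a Canary object (names and values are placeholders; the complete manifest used for this tutorial's app appears later, in the "Creating Canary CR" section):

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: myapp                  # placeholder name
spec:
  targetRef:                   # the Deployment Flagger watches for new versions
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  service:
    port: 8080                 # port exposed by the generated primary/canary services
  analysis:
    interval: 1m               # how often the SLA metrics are checked
    threshold: 5               # failed checks tolerated before rolling back
    metrics: []                # SLA definitions, e.g. request success rate
    webhooks: []               # optional hooks for notifications or load tests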

Flagger supports several deployment strategies (a short sketch of how each one is selected follows this list):

  • Canary release: gradually shifting traffic to the new version
  • A/B testing: routing to different versions based on HTTP header or cookie content
  • Blue/Green: switching all traffic at once after the checks pass; this strategy can also run without a service mesh, reading metrics through Prometheus in the cluster
  • Blue/Green mirroring: shadowing live traffic to the new version while the current version keeps serving responses
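Which of these strategies Flagger runs is determined by the fields set in the Canary's analysis block. The fragments below are a rough sketch based on the Flagger documentation (field names as documented there; the values are placeholders):

# Canary release: progressive traffic shifting
analysis:
  stepWeight: 10       # shift 10% more traffic each interval
  maxWeight: 50        # promote once 50% is reached with passing checks

# A/B testing: a fixed number of check iterations, traffic selected by match rules
analysis:
  iterations: 10
  match: []            # HTTP header/cookie conditions (a concrete sketch appears near the end of this post)

# Blue/Green: run the checks for a number of iterations, then switch all traffic at once
analysis:
  iterations: 10

# Blue/Green mirroring: additionally shadow live traffic to the new version during the checks
analysis:
  iterations: 10
  mirror: true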

In this blog, we will explore a canary-based deployment of a simple Python Flask app to see how Flagger works. We will install Flagger on a Kubernetes cluster with Istio enabled.

Prerequisites

  • Ubuntu 20.04 OS
  • Docker 20.10
  • Kubernetes cluster: version 1.21.1 (a single node with Calico CNI is used here)
  • Helm binary — Refer here for installation

Installing Istio and Flagger

We will now install Istio using the istioctl CLI. Download the binary using the commands below; we will then use it to bring up the Istio service mesh on the cluster. We are using Istio 1.10.0 here.

# To download istioctl
$ curl -L https://istio.io/downloadIstio | sh -
$ cd istio-1.10.0
$ export PATH=$PWD/bin:$PATH
# To install Istio using istioctl on the kubernetes cluster
$ istioctl install --set profile=demo -y
✔ Istio core installed
✔ Istiod installed
✔ Ingress gateways installed
✔ Egress gateways installed
✔ Installation complete
Thank you for installing Istio 1.10. Please take a few minutes to tell us about your install/upgrade experience! https://forms.gle/KjkrDnMPByq7akrYA
# Install prometheus
$ kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.10/samples/addons/prometheus.yaml
# Check that all the pods in the istio-system namespace are in Running state
$ kubectl -n istio-system get pods
NAME READY STATUS RESTARTS AGE
istio-egressgateway-55d4df6c6b-vn2gx 1/1 Running 0 2m58s
istio-ingressgateway-69dc4765b4-4p2vj 1/1 Running 0 2m58s
istiod-798c47d594-7nldj 1/1 Running 0 6m34s
prometheus-8958b965-z4bkc 2/2 Running 0 5m

Once the Kubernetes cluster has Istio running, we can use Helm to install Flagger.

# Adding flagger repo to helm and installing flagger CRD
$ helm repo add flagger https://flagger.app
$ kubectl apply -f https://raw.githubusercontent.com/fluxcd/flagger/main/artifacts/flagger/crd.yaml

# Installing flagger in istio's namespace
$ helm upgrade -i flagger flagger/flagger --namespace=istio-system --set crd.create=false --set meshProvider=istio --set metricsServer=http://prometheus:9090
# Enabling grafana
$ helm upgrade -i flagger-grafana flagger/grafana --namespace=istio-system --set url=http://prometheus.istio-system:9090 --set user=admin --set password=change-me
# Enable port-forwarding to access grafana on localhost:3000
$ kubectl -n istio-system port-forward svc/flagger-grafana 3000:80
# Check pods in istio-system namespace
$ kubectl -n istio-system get pods
NAME READY STATUS RESTARTS AGE
flagger-5c49576977-fvtgl 1/1 Running 0 3m28s
flagger-grafana-77b8c8df65-rszqm 1/1 Running 0 75s
istio-egressgateway-55d4df6c6b-vn2gx 1/1 Running 0 21m
istio-ingressgateway-69dc4765b4-4p2vj 1/1 Running 0 21m
istiod-798c47d594-7nldj 1/1 Running 0 24m
prometheus-8958b965-z4bkc 2/2 Running 0 26m

Deploying Python Flask app

Here is an example Python Flask app; clone the repository below to deploy it in the Kubernetes cluster.

# Clone python flask app
$ git clone https://github.com/SirishaGopigiri/python-flask-app.git
$ cd python-flask-app

Enable Istio sidecar injection in the default namespace and then deploy the app there using the deployment.yaml file, changing the YAML to match your cluster spec if needed.

Please note: all the YAML files used in this blog are available in the GitHub repo.
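For reference, the repo's deployment.yaml is expected to contain roughly the following: a Deployment and a ClusterIP Service, both named appdeploy, exposing the Flask app on port 5000. The sketch below is an assumption made for illustration (the labels and image tag in particular); check the repo for the exact manifest.

# Rough sketch of deployment.yaml (labels and image tag are assumptions; see the repo for the real file)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: appdeploy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: appdeploy
  template:
    metadata:
      labels:
        app: appdeploy
    spec:
      containers:
      - name: appdeploy                                   # container name used later by kubectl set image
        image: quay.io/sirishagopigiri/python-testapp:v1  # assumed initial tag; v2/v3 are rolled out later
        ports:
        - containerPort: 5000
---
apiVersion: v1
kind: Service
metadata:
  name: appdeploy
spec:
  selector:
    app: appdeploy
  ports:
  - port: 5000
    targetPort: 5000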

# Enable sidecar injection from istio in the default namespace
$ kubectl label namespace default istio-injection=enabled
# Deploying the application in kubernetes
$ kubectl apply -f deployment.yaml
deployment.apps/appdeploy created
service/appdeploy created
# Check pods and service in default namespace
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
appdeploy-7dcf9786cc-b2mjx 2/2 Running 0 2m17s
# Check service status in default namespace
$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
appdeploy ClusterIP 10.102.160.160 <none> 5000/TCP 2m28s
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 5h47m
# Test if service is accessible
$ kubectl run -i -t nginx --rm=true --image=nginx -- bash
# Once in the container execute the below commands
root@nginx:/# curl -X GET http://appdeploy:5000
hello world!
root@nginx:/# curl -X GET http://appdeploy:5000/return_version
Running test app on version 1.0 !!!
root@nginx:/# exit

Istio’s ingress gateway will be used by the canary to access the service. Use the command below to patch the istio-ingressgateway service to NodePort.

# patch istio-ingressgateway to nodeport
$ kubectl patch svc -n istio-system istio-ingressgateway --type='json' -p '[{"op":"replace","path":"/spec/type","value":"NodePort"}]'

Now let us create a VirtualService and Gateway so the application can be reached through Istio’s ingress gateway. Use the YAML files below.

# gateway.yaml
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: appdeploy-gateway
spec:
  selector:
    istio: ingressgateway # use Istio default gateway implementation
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
# virtualservice.yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: appdeploy
spec:
  hosts:
  - "*"
  gateways:
  - appdeploy-gateway
  http:
  - match:
    - uri:
        prefix: /
    route:
    - destination:
        port:
          number: 5000
        host: appdeploy
# Create the resources
$ kubectl apply -f gateway.yaml
gateway.networking.istio.io/appdeploy-gateway created
$ kubectl apply -f virtualservice.yaml
virtualservice.networking.istio.io/appdeploy created

Once the resources are created, extract the node port of Istio’s ingress-gateway service and try accessing the application.

# Node port for ingress-gateway
$ export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')
# Test application
$ curl -X GET "http://127.0.0.1:$INGRESS_PORT/"
hello world!
$ curl -X GET "http://127.0.0.1:$INGRESS_PORT/return_version"
Running test app on version 1.0 !!!

Creating Canary CR for python app

Once we have the Python Flask app deployment running, let’s configure it with a Canary CR. Use the YAML file below.

# Copy the below yaml file and make necessary changes according to your deployment
### canary.yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: appdeploy
spec:
  # deployment reference
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: appdeploy # deployment name
  service:
    # service port number
    port: 5000
    gateways:
    - appdeploy-gateway
    hosts:
    - "*"
  analysis:
    # schedule interval (default 60s)
    interval: 1m
    # max number of failed metric checks before rollback
    threshold: 5
    # max traffic percentage routed to canary (0-100)
    maxWeight: 50
    # canary increment step percentage (0-100)
    stepWeight: 10
    metrics:
    - name: request-success-rate
      # minimum req success rate (non 5xx responses)
      thresholdRange:
        min: 99
      interval: 1m
    - name: request-duration
      # maximum req duration P99
      thresholdRange:
        max: 500
      interval: 30s
    # testing (optional)
    webhooks:
    - name: acceptance-test
      type: pre-rollout
      url: http://flagger-loadtester.default/
      timeout: 30s
      metadata:
        type: bash
        cmd: "curl -sd 'test' http://appdeploy-canary:5000/return_version"
    - name: load-test
      url: http://flagger-loadtester.default/
      timeout: 5s
      metadata:
        cmd: "hey -z 1m -q 10 -c 2 http://appdeploy-canary:5000/return_version"

From the YAML above we can see that traffic is increased gradually by 10% every minute, and once the weight reaches 50% with the SLAs still met, Flagger promotes the new version. The SLA metrics are also evaluated every minute, and the traffic progression happens based on those checks. We use Flagger’s sample load tester to generate load while the canary analysis is running. Create the canary.yaml file with the required configuration.

# Before creating the canary we need to delete the virtual service, as it will now be managed by Flagger from the above canary.yaml file
$ kubectl delete -f virtualservice.yaml
virtualservice.networking.istio.io "appdeploy" deleted
# Create the canary
$ kubectl apply -f canary.yaml
canary.flagger.app/appdeploy created

Once created, wait for some time for Flagger to read the deployment and create the resources to manage the canary. It will do the following:

  • Scales the current appdeploy deployment down to 0
  • Brings up a new deployment, appdeploy-primary
  • Points the existing appdeploy service at the primary deployment (check using kubectl get endpoints)
  • Creates two new services, appdeploy-primary and appdeploy-canary, used to route traffic during canary analysis
  • Creates a virtual service that distributes the weight between the primary and canary services during canary analysis (a rough sketch of it follows this list)
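For instance, the VirtualService that Flagger generates for this setup looks roughly like the sketch below (illustrative only; the weights shown are the initial state before any analysis starts):

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: appdeploy
spec:
  gateways:
  - appdeploy-gateway
  hosts:
  - "*"
  http:
  - route:
    - destination:
        host: appdeploy-primary
      weight: 100              # all traffic stays on the primary until analysis begins
    - destination:
        host: appdeploy-canary
      weight: 0                # increased in stepWeight increments during analysis
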
# Check canary CRD
$ kubectl get canary
NAME STATUS WEIGHT LASTTRANSITIONTIME
appdeploy Initialized 0 2021-06-13T16:16:35Z
# Check pods and see the difference in names
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
appdeploy-primary-79595548f6-pcq9t 2/2 Running 0 118s
# Check service
$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
appdeploy ClusterIP 10.102.160.160 <none> 5000/TCP 5m12s
appdeploy-canary ClusterIP 10.100.219.243 <none> 5000/TCP 2m12s
appdeploy-primary ClusterIP 10.110.17.144 <none> 5000/TCP 2m11s
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 5h50m
# Check virtual service
$ kubectl get vs
NAME GATEWAYS HOSTS AGE
appdeploy ["appdeploy-gateway"] ["*"] 92s
# Check deployments
$ kubectl get deploy
NAME READY UP-TO-DATE AVAILABLE AGE
appdeploy 0/0 0 0 5m49s
appdeploy-primary 1/1 1 1 2m48s

Once all the resources are in place, we test the application using the service name and Istio’s ingress gateway.

# Test if service is accessible
$ kubectl run -i -t nginx --rm=true --image=nginx -- bash
# Once in the container execute the below commands
root@nginx:/# curl -X GET http://appdeploy:5000
hello world!
root@nginx:/# curl -X GET http://appdeploy:5000/return_version
Running test app on version 1.0 !!!
root@nginx:/# exit
# Test using ingress-gateway
$ curl -X GET "http://127.0.0.1:$INGRESS_PORT/"
hello world!
$ curl -X GET "http://127.0.0.1:$INGRESS_PORT/return_version"
Running test app on version 1.0 !!!

We will now deploy the load tester, which will later be used by the canary analysis.

# tester.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flagger-loadtester
  labels:
    app: flagger-loadtester
spec:
  selector:
    matchLabels:
      app: flagger-loadtester
  template:
    metadata:
      labels:
        app: flagger-loadtester
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
    spec:
      containers:
      - name: loadtester
        image: ghcr.io/fluxcd/flagger-loadtester:0.18.0
        imagePullPolicy: IfNotPresent
        ports:
        - name: http
          containerPort: 8080
        command:
        - ./loadtester
        - -port=8080
        - -log-level=info
        - -timeout=1h
        livenessProbe:
          exec:
            command:
            - wget
            - --quiet
            - --tries=1
            - --timeout=4
            - --spider
            - http://localhost:8080/healthz
          timeoutSeconds: 5
        readinessProbe:
          exec:
            command:
            - wget
            - --quiet
            - --tries=1
            - --timeout=4
            - --spider
            - http://localhost:8080/healthz
          timeoutSeconds: 5
        resources:
          limits:
            memory: "512Mi"
            cpu: "1000m"
          requests:
            memory: "32Mi"
            cpu: "10m"
        securityContext:
          readOnlyRootFilesystem: true
          runAsUser: 10001
---
apiVersion: v1
kind: Service
metadata:
  name: flagger-loadtester
  labels:
    app: flagger-loadtester
spec:
  type: ClusterIP
  selector:
    app: flagger-loadtester
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: http
# Create load tester
$ kubectl apply -f tester.yaml
deployment.apps/flagger-loadtester created
service/flagger-loadtester created
# Check pods and services
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
appdeploy-primary-79595548f6-pcq9t 2/2 Running 0 3m37s
flagger-loadtester-5b766b7ffc-ksl45 2/2 Running 0 32s
$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
appdeploy ClusterIP 10.102.160.160 <none> 5000/TCP 6m24s
appdeploy-canary ClusterIP 10.100.219.243 <none> 5000/TCP 3m24s
appdeploy-primary ClusterIP 10.110.17.144 <none> 5000/TCP 3m23s
flagger-loadtester ClusterIP 10.103.190.34 <none> 80/TCP 17s
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 5h51m

Rolling Update

Now we change the Python application image to a new version to start the canary analysis and see how Flagger does the rolling update.

# Change the image
$ kubectl set image deployment/appdeploy appdeploy=quay.io/sirishagopigiri/python-testapp:v2
deployment.apps/appdeploy image updated

Once updated, check the deployments: we will find two versions of the application running under different deployment names. We can also check the service endpoints to see which pod backs which service.

# Check deployments
$ kubectl get deployments
NAME READY UP-TO-DATE AVAILABLE AGE
appdeploy 1/1 1 1 8m38s
appdeploy-primary 1/1 1 1 5m37s
flagger-loadtester 1/1 1 1 2m31s
# Check pods
$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
appdeploy-9477df584-cg6kt 2/2 Running 0 86s 192.192.43.137 harrypotter <none> <none>
appdeploy-primary-79595548f6-pcq9t 2/2 Running 0 5m25s 192.192.43.190 harrypotter <none> <none>
flagger-loadtester-5b766b7ffc-ksl45 2/2 Running 0 2m20s 192.192.43.191 harrypotter <none> <none>
# Check service endpoints
$ kubectl get ep
NAME ENDPOINTS AGE
appdeploy 192.192.43.190:5000 8m41s
appdeploy-canary 192.192.43.137:5000 5m41s
appdeploy-primary 192.192.43.190:5000 5m41s
flagger-loadtester 192.192.43.191:8080 2m34s
kubernetes 192.168.1.102:6443 5h53m
# Check canary CRD
$ kubectl get canary
NAME STATUS WEIGHT LASTTRANSITIONTIME
appdeploy Progressing 20 2021-06-13T16:21:33Z

Finally, we can use curl to check whether service continuity is maintained.

# Testing using nginx
$ kubectl run -i -t nginx --rm=true --image=nginx -- bash
# Once in the container execute the below commands
root@nginx:/# curl -X GET http://appdeploy:5000
hello world!
root@nginx:/# curl -X GET http://appdeploy:5000/return_version
Running test app on version 1.0 !!!
root@nginx:/# curl -X GET http://appdeploy-primary:5000/return_version
Running test app on version 1.0 !!!
root@nginx:/# curl -X GET http://appdeploy-canary:5000/return_version
Running test app on version 2.0 !!!
# Testing using ingress-gateway
$ curl -X GET "http://127.0.0.1:$INGRESS_PORT/"
hello world!
$ curl -X GET "http://127.0.0.1:$INGRESS_PORT/return_version"
Running test app on version 1.0 !!!
$ curl -X GET "http://127.0.0.1:$INGRESS_PORT/return_version"
Running test app on version 2.0 !!!
$ curl -X GET "http://127.0.0.1:$INGRESS_PORT/return_version"
Running test app on version 1.0 !!!
$ curl -X GET "http://127.0.0.1:$INGRESS_PORT/return_version"
Running test app on version 1.0 !!!

Notice that requests are routed to the new version only when we call the appdeploy-canary service explicitly.

Service continuity is maintained, as curl requests to the appdeploy service still return version 1.0.

Through the ingress gateway, however, traffic is distributed between the two versions (here 1 out of 4 requests was routed to v2) because we specified a step-weight based canary strategy.

Please note: explore the A/B testing strategy if you want to stop the weighted traffic split at the ingress and route requests based on headers instead.
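As a rough sketch of that idea (field names follow the Flagger A/B testing docs; the header name and value are placeholders), the analysis block of canary.yaml could be rewritten so that only requests matching a header reach the canary, while all other ingress traffic stays on the primary:

# A/B-style analysis (sketch): replaces maxWeight/stepWeight in canary.yaml
analysis:
  interval: 1m
  threshold: 5
  iterations: 10               # number of checks to run before promotion
  match:
  - headers:
      x-canary:                # placeholder header; choose your own
        regex: ".*insider.*"
  metrics:
  - name: request-success-rate
    thresholdRange:
      min: 99
    interval: 1m

With something like this in place, only requests carrying a matching x-canary header would be answered by the new version; plain requests through the ingress keep hitting the primary.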

Check the load-tester and Flagger logs, or the Canary CR itself, for more information.

# Check logs
$ kubectl logs <loadtester pod>
$ kubectl -n istio-system logs <flagger-pod>
# Check canary CRD
$ kubectl get canary
NAME STATUS WEIGHT LASTTRANSITIONTIME
appdeploy Progressing 30 2021-06-13T16:22:32Z

Below are some logs from the Flagger pod:

{"level":"info","ts":"2021-06-13T16:16:35.835Z","caller":"controller/events.go:33","msg":"Initialization done! appdeploy.default","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:19:32.970Z","caller":"controller/events.go:33","msg":"New revision detected! Scaling up appdeploy.default","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:20:33.034Z","caller":"controller/events.go:33","msg":"Starting canary analysis for appdeploy.default","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:20:33.227Z","caller":"controller/events.go:33","msg":"Pre-rollout check acceptance-test passed","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:20:33.561Z","caller":"controller/events.go:33","msg":"Advance appdeploy.default canary weight 10","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:21:33.439Z","caller":"controller/events.go:33","msg":"Advance appdeploy.default canary weight 20","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:22:33.191Z","caller":"controller/events.go:33","msg":"Advance appdeploy.default canary weight 30","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:23:33.462Z","caller":"controller/events.go:33","msg":"Advance appdeploy.default canary weight 40","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:24:33.278Z","caller":"controller/events.go:33","msg":"Advance appdeploy.default canary weight 50","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:25:33.601Z","caller":"controller/events.go:33","msg":"Copying appdeploy.default template spec to appdeploy-primary.default","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:26:33.003Z","caller":"controller/events.go:45","msg":"appdeploy-primary.default not ready: waiting for rollout to finish: 1 old replicas are pending termination","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:27:35.184Z","caller":"controller/events.go:33","msg":"Routing all traffic to primary","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:28:43.266Z","caller":"controller/events.go:33","msg":"Promotion completed! Scaling down appdeploy.default","canary":"appdeploy.default"}

Once the canary weight reaches 50, the promotion happens automatically and the primary pod gets replaced with the v2 version.

# Check canary CRD
$ kubectl get canary
NAME STATUS WEIGHT LASTTRANSITIONTIME
appdeploy Succeeded 0 2021-06-13T16:28:42Z
# Check pods - new pod created
$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
appdeploy-primary-8ddc7bdfd-hwtm4 2/2 Running 0 4m15s 192.192.43.138 harrypotter <none> <none>
flagger-loadtester-5b766b7ffc-ksl45 2/2 Running 0 11m 192.192.43.191 harrypotter <none> <none>
# Check deployments
$ kubectl get deploy
NAME READY UP-TO-DATE AVAILABLE AGE
appdeploy 0/0 0 0 17m
appdeploy-primary 1/1 1 1 14m
flagger-loadtester 1/1 1 1 11m
# Check services
$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
appdeploy ClusterIP 10.102.160.160 <none> 5000/TCP 17m
appdeploy-canary ClusterIP 10.100.219.243 <none> 5000/TCP 14m
appdeploy-primary ClusterIP 10.110.17.144 <none> 5000/TCP 14m
flagger-loadtester ClusterIP 10.103.190.34 <none> 80/TCP 11m
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 6h2m
# Check endpoints
$ kubectl get ep
NAME ENDPOINTS AGE
appdeploy 192.192.43.138:5000 17m
appdeploy-canary <none> 14m
appdeploy-primary 192.192.43.138:5000 14m
flagger-loadtester 192.192.43.191:8080 11m
kubernetes 192.168.1.102:6443 6h3m
# Check service requests
$ curl -X GET "http://127.0.0.1:$INGRESS_PORT/return_version"
Running test app on version 2.0 !!!
# Check with nginx
$ kubectl run -i -t nginx --rm=true --image=nginx -- bash
# Once in the container execute the below commands
root@nginx:/# curl -X GET http://appdeploy:5000
hello world!
root@nginx:/# curl -X GET http://appdeploy:5000/return_version
Running test app on version 2.0 !!!

This shows that the rolling update completed successfully!

Check the Grafana dashboard for the service metrics and other details; access it at http://localhost:3000 using the port-forward set up earlier.

The dashboard shows the success rate and request duration for the primary and canary deployments.

Rollback Scenario

We will now try to upgrade the app to a new version that returns an HTTP response code of 500 instead of 200. In this case Flagger will attempt the update, but since the requests fail it will retain the previous version.

# Update image to a new version
$ kubectl set image deployment/appdeploy appdeploy=quay.io/sirishagopigiri/python-testapp:v3

Keep checking the canary CRD and logs for more information.

# Check Canary
$ kubectl get canary
NAME STATUS WEIGHT LASTTRANSITIONTIME
appdeploy Progressing 10 2021-06-13T16:40:33Z
# Check pods
$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
appdeploy-65fff955ff-t65fj 2/2 Running 0 62s 192.192.43.143 harrypotter <none> <none>
appdeploy-primary-8ddc7bdfd-hwtm4 2/2 Running 0 15m 192.192.43.138 harrypotter <none> <none>
flagger-loadtester-5b766b7ffc-ksl45 2/2 Running 0 21m 192.192.43.191 harrypotter <none> <none>
# Check endpoints
$ kubectl get ep
NAME ENDPOINTS AGE
appdeploy 192.192.43.138:5000 28m
appdeploy-canary 192.192.43.143:5000 25m
appdeploy-primary 192.192.43.138:5000 25m
flagger-loadtester 192.192.43.191:8080 22m
kubernetes 192.168.1.102:6443 6h13m
# Check service requests
# Using nginx
$ kubectl run -i -t nginx --rm=true --image=nginx -- bash
# Once in the container execute the below commands
root@nginx:/# curl -X GET http://appdeploy:5000
hello world!
root@nginx:/# curl -X GET http://appdeploy:5000/return_version
Running test app on version 2.0 !!!
root@nginx:/# curl -X GET http://appdeploy-primary:5000/return_version
Running test app on version 2.0 !!!
root@nginx:/# curl -v -X GET http://appdeploy-canary:5000/return_version
* Trying 10.100.219.243...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x5610595d8fb0)
* Connected to appdeploy-canary (10.104.13.182) port 5000 (#0)
> GET /return_version HTTP/1.1
> Host: appdeploy-canary:5000
> User-Agent: curl/7.64.0
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< content-type: text/html; charset=utf-8
< content-length: 35
< server: envoy
< date: Sun, 13 Jun 2021 15:58:50 GMT
< x-envoy-upstream-service-time: 26
<
* Connection #0 to host appdeploy-canary left intact
Running test app on version 3.0 !!!
# Using istio ingress-gateway
$ curl -v -X GET "http://127.0.0.1:$INGRESS_PORT/return_version"
Note: Unnecessary use of -X or --request, GET is already inferred.
* Trying 127.0.0.1:31543...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 31543 (#0)
> GET /return_version HTTP/1.1
> Host: 127.0.0.1:31543
> User-Agent: curl/7.68.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 500 Internal Server Error
< content-type: text/html; charset=utf-8
< content-length: 35
< server: istio-envoy
< date: Sun, 13 Jun 2021 15:57:37 GMT
< x-envoy-upstream-service-time: 2
<
* Connection #0 to host 127.0.0.1 left intact
Running test app on version 3.0 !!!

From the service request tests above we can see that even though the new version returns a response, the HTTP response code is 500. Because of this the canary analysis fails, as we configured request-success-rate as one of the metrics in the Canary CR (canary.yaml).
Flagger logs for reference:

{"level":"info","ts":"2021-06-13T16:39:32.888Z","caller":"controller/events.go:33","msg":"New revision detected! Scaling up appdeploy.default","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:40:33.136Z","caller":"controller/events.go:33","msg":"Starting canary analysis for appdeploy.default","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:40:33.203Z","caller":"controller/events.go:33","msg":"Pre-rollout check acceptance-test passed","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:40:33.353Z","caller":"controller/events.go:33","msg":"Advance appdeploy.default canary weight 10","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:41:33.603Z","caller":"controller/events.go:33","msg":"Advance appdeploy.default canary weight 20","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:42:32.944Z","caller":"controller/events.go:45","msg":"Halt appdeploy.default advancement success rate 0.00% < 99%","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:43:33.203Z","caller":"controller/events.go:33","msg":"Advance appdeploy.default canary weight 30","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:44:32.951Z","caller":"controller/events.go:45","msg":"Halt appdeploy.default advancement success rate 0.00% < 99%","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:45:33.084Z","caller":"controller/events.go:33","msg":"Advance appdeploy.default canary weight 40","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:46:32.911Z","caller":"controller/events.go:45","msg":"Halt appdeploy.default advancement success rate 0.00% < 99%","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:47:33.404Z","caller":"controller/events.go:33","msg":"Advance appdeploy.default canary weight 50","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:48:33.273Z","caller":"controller/events.go:45","msg":"Halt appdeploy.default advancement success rate 0.00% < 99%","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:49:32.907Z","caller":"controller/events.go:45","msg":"Halt appdeploy.default advancement success rate 0.00% < 99%","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:50:32.954Z","caller":"controller/events.go:45","msg":"Rolling back appdeploy.default failed checks threshold reached 5","canary":"appdeploy.default"}
{"level":"info","ts":"2021-06-13T16:50:33.154Z","caller":"controller/events.go:45","msg":"Canary failed! Scaling down appdeploy.default","canary":"appdeploy.default"}

Once the update fails, the Canary CR reflects the Failed status. Finally, we can also check the service with curl, which still returns version 2.0.

# Check Canary Status
$ kubectl get canary
NAME STATUS WEIGHT LASTTRANSITIONTIME
appdeploy Failed 0 2021-06-13T16:50:33Z
# Check service requests
$ curl -X GET "http://127.0.0.1:$INGRESS_PORT/return_version"
Running test app on version 2.0 !!!
# Check with nginx
$ kubectl run -i -t nginx --rm=true --image=nginx -- bash
# Once in the container execute the below commands
root@nginx:/# curl -X GET http://appdeploy:5000
hello world!
root@nginx:/# curl -X GET http://appdeploy:5000/return_version
Running test app on version 2.0 !!!

Check the service success rate in Grafana

The success rate is 0% on the canary deployment.

Conclusion

Flagger is a Kubernetes operator that, when integrated with GitOps, gives a great advantage in testing applications. It helps DevOps engineers confirm that integration checks pass before a version reaches production, and its progressive traffic routing maintains service continuity during the rollout. It also supports an A/B testing strategy, which is a great value-add when the application owner wants different versions to serve different sets of people, such as developers and testers working on the same Kubernetes cluster.

References:

  1. https://docs.flagger.app/
  2. https://istio.io/latest/
  3. https://istio.io/latest/docs/setup/getting-started/#download
  4. https://istio.io/latest/docs/ops/integrations/prometheus/
  5. https://istio.io/latest/docs/tasks/traffic-management/ingress/ingress-control/
  6. https://flask.palletsprojects.com/en/2.0.x/
