Progressive Delivery: How To Implement Flagger with Istio
What will be discussed:
- Progressive Delivery
- What is the added value of Flagger
- Flagger’s Deployment Strategies
- Canary Release Demo: how to implement Flagger with Istio
Versions:
- Istio: v1.12.*
- Flagger: v1.16.0
Progressive Delivery
Progressive delivery is a modern software development practice for gradually rolling out new features in order to limit the potential negative impact of a new product feature.
Flagger helps us to manage traffic routing between our current release and our new release. Flagger uses a service mesh (App Mesh, Istio, Linkerd, Open Service Mesh) or an ingress controller (Contour, Gloo, NGINX, Skipper, Traefik) for traffic routing.
To sum up:
Flagger is a progressive delivery tool that automates the release process for applications running on Kubernetes. It reduces the risk of introducing a new software version in production by gradually shifting traffic to the new version while measuring metrics and running conformance tests.
https://docs.flagger.app/
What is the added value of Flagger
Flagger monitors the traffic routed to the canary release and “decides” whether or not to route more traffic to it. On top of that, we have fine-grained control over how much traffic is shifted at each step. For example, we can configure Flagger to query all the response codes returned by the canary release service and route an additional 5% of the traffic to the new release every 30 minutes. Flagger evaluates the traffic at each step, and as long as the check passes it keeps routing more traffic as configured in the Canary release.
Example: if more than 1% of the response codes returned by the canary release are from the 5XX family, we can configure Flagger to halt the advance of traffic to the new release. Otherwise, Flagger keeps routing more traffic to the canary release, as configured in the Canary.
Flagger can query Prometheus, Datadog, New Relic, CloudWatch, or Graphite, and can send alerts via Slack, MS Teams, Discord, and Rocket.
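For example, Slack alerting can be configured when installing Flagger. A minimal sketch via the Flagger Helm chart (the slack.* parameters are assumed from the chart's values, and the webhook URL is a placeholder you need to replace):
$ helm upgrade -i flagger flagger/flagger \
    --namespace=istio-system \
    --set slack.url=https://hooks.slack.com/services/YOUR/WEBHOOK/URL \
    --set slack.channel=general \
    --set slack.user=flagger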
Flagger’s Deployment Strategies:
- Canary Release (progressive traffic shifting): Istio, Linkerd, App Mesh, NGINX, Skipper, Contour, Gloo Edge, Traefik, Open Service Mesh
- A/B Testing (HTTP headers and cookies traffic routing): Istio, App Mesh, NGINX, Contour, Gloo Edge
- Blue/Green (traffic switching): Kubernetes CNI, Istio, Linkerd, App Mesh, NGINX, Contour, Gloo Edge, Open Service Mesh
- Blue/Green Mirroring (traffic shadowing): Istio
Demo Canary Release: How To Implement Flagger with Istio
STEP 1: Add Flagger Helm repository
$ helm repo add flagger https://flagger.app
STEP 2: Deploy Flagger’s Canary CRD
$ kubectl apply -f https://raw.githubusercontent.com/fluxcd/flagger/main/artifacts/flagger/crd.yaml
customresourcedefinition.apiextensions.k8s.io "canaries.flagger.app" created
customresourcedefinition.apiextensions.k8s.io "metrictemplates.flagger.app" created
customresourcedefinition.apiextensions.k8s.io "alertproviders.flagger.app" created
STEP 3: Deploy Istio 1.12.0
https://istio.io/latest/docs/setup/additional-setup/gateway/
As a security best practice, it is recommended to deploy the gateway in a different namespace from the control plane.
$ helm repo add istio https://istio-release.storage.googleapis.com/charts
$ helm repo update
$ kubectl create namespace istio-system
$ helm install istio-base istio/base -n istio-system
$ helm install istiod istio/istiod -n istio-system --wait
$ kubectl create namespace istio-ingress
$ kubectl label namespace istio-ingress istio-injection=enabled
$ helm install istio-ingress istio/gateway -n istio-ingress --wait
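- Before moving on, it is worth verifying that the control plane and the ingress gateway pods are running (a simple check, not part of the official install steps):
$ kubectl get pods -n istio-system
$ kubectl get pods -n istio-ingress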
STEP 4: Deploy Istio Prometheus add-on
- Istio provides a basic sample installation to get Prometheus up and running quickly.
$ kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.12/samples/addons/prometheus.yaml
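- To confirm the add-on is up, check the prometheus deployment and service that the sample manifest creates in istio-system (the resource names assume the defaults of the 1.12 sample):
$ kubectl -n istio-system get deploy,svc prometheus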
STEP 5: Deploy Flagger
- We will use Istio as our service mesh and Prometheus as our metrics server
https://github.com/fluxcd/flagger/tree/main/charts/flagger
Note: the chart parameter crd.create creates Flagger's CRDs when set to true (this should be enabled for Helm v2 only). Since we already applied the CRDs in STEP 2, we set it to false.
$ helm upgrade -i flagger flagger/flagger \
    --namespace=istio-system \
    --set crd.create=false \
    --set meshProvider=istio \
    --set metricsServer=http://prometheus.istio-system:9090
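- Once the chart is installed, you can make sure Flagger is running and tail its logs to confirm it connected to Istio and Prometheus (a verification step, not from the official tutorial):
$ kubectl -n istio-system rollout status deployment/flagger
$ kubectl -n istio-system logs deployment/flagger -f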
STEP 6: Deploy test application and add Istio sidecar
$ kubectl create ns test
$ kubectl label namespace test istio-injection=enabled
$ kubectl apply -k https://github.com/fluxcd/flagger//kustomize/podinfo?ref=main
$ kubectl apply -k https://github.com/fluxcd/flagger//kustomize/tester?ref=main
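- If sidecar injection is working, every pod in the test namespace should report two ready containers (2/2): the application container plus the istio-proxy sidecar.
$ kubectl -n test get pods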
STEP 7: Create Metric Template
- Flagger will use the following metric. The query takes all the responses the podinfo-canary service received in the last 30 seconds and returns 100 minus the percentage of 5XX responses out of all response codes, i.e. the success rate as a percentage.
Note: in most documentation, the Istio metrics used are “request-success-rate” or “request-duration”, but Istio has since changed the names of its metrics. The new metric names are “istio_requests_total” and “istio_request_duration_milliseconds_bucket”.
For more about Istio metrics, see: https://istio.io/latest/docs/reference/config/metrics/
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: error-rate
  namespace: istio-system
spec:
  provider:
    address: http://prometheus.istio-system.svc.cluster.local:9090
    type: prometheus
  query: |
    100 - (
      sum(rate(istio_requests_total{destination_service="podinfo-canary.test.svc.cluster.local", response_code=~"5.*"}[30s]))
      /
      sum(rate(istio_requests_total{destination_service="podinfo-canary.test.svc.cluster.local"}[30s]))
      * 100
    )
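- Save the manifest and apply it (the file name error-rate.yaml is only an example):
$ kubectl apply -f error-rate.yaml
$ kubectl -n istio-system get metrictemplates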
STEP 8: Deploy Horizontal Pod Autoscaler
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo
  namespace: test
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  minReplicas: 2
  maxReplicas: 4
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        # scale up if usage is above
        # 99% of the requested CPU (100m)
        averageUtilization: 99
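- Apply the autoscaler in the test namespace (again, the file name is only an example):
$ kubectl apply -f podinfo-hpa.yaml
$ kubectl -n test get hpa podinfo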
STEP 9: Deploy the Canary for podinfo application
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  # deployment reference
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  service:
    # service port number
    port: 9898
    # Istio traffic policy (optional)
    trafficPolicy:
      tls:
        # use ISTIO_MUTUAL when mTLS is enabled
        mode: DISABLE
  analysis:
    # schedule interval (default 60s)
    interval: 10s
    # max number of failed metric checks before rollback
    threshold: 5
    # max traffic percentage routed to canary
    # percentage (0-100)
    maxWeight: 100
    # canary increment step
    # percentage (0-100)
    stepWeight: 5
    metrics:
    - name: "500 percentage"
      templateRef:
        name: error-rate
        namespace: istio-system
      thresholdRange:
        min: 99
      interval: 15s
    webhooks:
    - name: acceptance-test
      type: pre-rollout
      url: http://flagger-loadtester.test/
      timeout: 30s
      metadata:
        type: bash
        cmd: "curl -sd 'test' http://podinfo-canary:9898/token | grep token"
    - name: load-test
      url: http://flagger-loadtester.test/
      timeout: 5s
      metadata:
        cmd: "hey -z 1m -q 10 -c 2 http://podinfo-canary.test:9898/"
A few things to note regarding the Canary release:
1. The service and the deployment that Flagger will take control of.
# deployment reference
targetRef:
  apiVersion: apps/v1
  kind: Deployment
  name: podinfo
service:
  # service port number
  port: 9898
2. How Flagger will route traffic during the analysis.
analysis:
  # schedule interval (default 60s)
  interval: 10s
  # max number of failed metric checks before rollback
  threshold: 5
  # max traffic percentage routed to canary
  # percentage (0-100)
  maxWeight: 100
  # canary increment step
  # percentage (0-100)
  stepWeight: 5
3. The metric template that Flagger will use for its analysis.
metrics:
- name: "500 percentage"
  templateRef:
    name: error-rate
    namespace: istio-system
  thresholdRange:
    min: 99
  interval: 15s
STEP 10: Trigger a canary deployment by updating the container image
$ kubectl -n test set image deployment/podinfo \
podinfod=stefanprodan/podinfo:3.1.1
- Describe the canary release to check for errors and to follow the progress of the analysis.
- In a successful release, Flagger gradually increases the canary weight until the new version is promoted.
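- A simple way to follow the analysis is to describe the Canary object and watch its status and events (the names assume this demo's setup):
$ kubectl -n test describe canary/podinfo
$ watch kubectl -n test get canary podinfo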
STEP 11: Check that the canary release will stop when generating error 500
- Upgrade your image again
$ kubectl -n test set image deployment/podinfo \
podinfod=stefanprodan/podinfo:3.1.2
- Generate error 500 — run from flagger-loadtester pod
$ kubectl -n test exec -it deployment/flagger-loadtester -- bash
$ watch -n 1 curl http://podinfo-canary:9898/status/500
- If everything is configured correctly, Flagger will halt the advance of traffic to the canary release.
- In this demo, we can see that Flagger reached the number of failed analyses we set and performed a rollback.
Note: Flagger might fail, for example, 3 analyses out of the allowed 5 over the whole release cycle. In that case Flagger still considers the new release successful and fully promotes the new version.
In other words, when an analysis fails, Flagger halts the advance of the traffic routing; only when the failure threshold is reached does it perform a rollback. If the next analysis passes, Flagger keeps routing traffic as configured in the Canary release.
References:
- https://docs.flagger.app/tutorials/istio-progressive-delivery
- https://docs.flagger.app/usage/deployment-strategies
- https://istio.io/latest/docs/reference/config/metrics/
- https://istio.io/latest/docs/setup/install/helm/
- https://istio.io/latest/docs/ops/integrations/prometheus/
- https://www.optimizely.com/optimization-glossary/progressive-delivery/