Deployment on GKE with Flagger using GCP Log Based Metrics

Overview

Pawan Phalak
Google Cloud - Community
Nov 15, 2022


Flagger is a progressive delivery tool that automates the release process for applications running on Kubernetes. Flagger can execute different deployment strategies (canary, A/B testing, blue/green), and every Flagger deployment includes a metric analysis stage. We can use GCP log-based metrics to define custom metrics that control the rollout in a GKE cluster. In this blog, we will cover how to use Flagger to perform a canary deployment with GCP log-based metrics as the Flagger metrics.

Prerequisites

  1. GKE Cluster with Istio installed and workload identity enabled.
  2. Flagger requires a Kubernetes cluster v1.16 or newer and Istio v1.5 or newer.
  3. Enable Istio access logs in JSON format (see the sketch after this list).
  4. For a private GKE cluster, update the firewall to open port 15017 (see the sketch after this list).
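
The exact commands for prerequisites 3 and 4 depend on how Istio was installed and how the cluster network is set up. A minimal sketch for enabling JSON access logs, assuming Istio was installed with istioctl and can be reconfigured through an IstioOperator overlay:

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    # write Envoy access logs to stdout in JSON format
    accessLogFile: /dev/stdout
    accessLogEncoding: JSON

For a private GKE cluster, the control-plane-to-node firewall rule (its name varies per cluster) can be updated to allow port 15017, which Istio's sidecar injection webhook uses. Note that --allow replaces the rule's existing port list, so keep the ports that are already open:

gcloud compute firewall-rules update <FIREWALL_RULE_NAME> \
  --allow tcp:10250,tcp:443,tcp:15017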

Steps

1. Install Flagger in the istio-system namespace of the GKE cluster with the following command:

kubectl apply -k github.com/fluxcd/flagger//kustomize/istio
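
Optionally, confirm that the Flagger deployment is running before moving on:

kubectl -n istio-system rollout status deployment/flagger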

2. Deploy the sample application and the load-testing pods in the test namespace:

kubectl create ns test
kubectl label namespace test istio-injection=enabled
kubectl apply -k https://github.com/fluxcd/flagger//kustomize/podinfo?ref=main
kubectl apply -k https://github.com/fluxcd/flagger//kustomize/tester?ref=main
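
The podinfo and flagger-loadtester deployments should come up in the test namespace after a few moments:

kubectl -n test get deployments,pods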

3. Expose the sample application through the Istio ingress gateway:

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: public-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "*"

4. Create a log-based metric of type Distribution using the following Cloud Logging query, extracting the latency of the canary pods of the sample application (from the jsonPayload.duration field) out of the Istio access logs:

resource.type="k8s_container"
resource.labels.project_id="<GCP_PROJECT_ID>"
resource.labels.location="<GKE_LOCATION>"
resource.labels.cluster_name="<GKE_CLUSTER_NAME>"
resource.labels.namespace_name="test"
resource.labels.container_name="istio-proxy"
jsonPayload.authority="podinfo-canary.test:9898"
jsonPayload.duration>0
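
The metric can be created from the Logs-based Metrics page in the Cloud Logging console (metric type Distribution, value field jsonPayload.duration) or from the command line. Below is a minimal sketch, assuming the metric is named podinfo-canary-latency (the name referenced by the MetricTemplate in step 7), that jsonPayload.duration holds the request latency in milliseconds, and that your gcloud release supports the --config-from-file flag; the filter is abbreviated from the query above:

cat > podinfo-canary-latency.yaml <<'EOF'
# Log-based Distribution metric extracting jsonPayload.duration as the value
name: podinfo-canary-latency
description: Latency of podinfo canary pods from Istio access logs
filter: >-
  resource.type="k8s_container" AND
  resource.labels.namespace_name="test" AND
  resource.labels.container_name="istio-proxy" AND
  jsonPayload.authority="podinfo-canary.test:9898" AND
  jsonPayload.duration>0
valueExtractor: EXTRACT(jsonPayload.duration)
metricDescriptor:
  metricKind: DELTA
  valueType: DISTRIBUTION
  unit: ms
bucketOptions:
  exponentialBuckets:
    numFiniteBuckets: 64
    growthFactor: 2
    scale: 0.01
EOF

gcloud logging metrics create podinfo-canary-latency \
  --config-from-file=podinfo-canary-latency.yaml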

5. Create a service account (GCP_SA) with the Monitoring Admin role from the GCP IAM page.
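
A sketch of the equivalent gcloud commands, assuming a hypothetical service account name flagger-metrics:

# create the GCP service account
gcloud iam service-accounts create flagger-metrics \
  --project <GCP_PROJECT_ID> \
  --display-name "Flagger metrics reader"

# grant it the Monitoring Admin role on the project
gcloud projects add-iam-policy-binding <GCP_PROJECT_ID> \
  --member "serviceAccount:flagger-metrics@<GCP_PROJECT_ID>.iam.gserviceaccount.com" \
  --role roles/monitoring.admin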

6. Perform the following steps on the Workload Identity enabled GKE cluster:

a) Add an IAM policy binding between the Flagger Kubernetes service account and the GCP service account. This binding allows the Flagger Kubernetes service account to act as the IAM service account:

gcloud iam service-accounts add-iam-policy-binding <GCP_SA> \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:<GCP_PROJECT_ID>.svc.id.goog[<NAMESPACE>/<FLAGGER_KUBERNETES_SA>]"

b) Annotate the flagger Kubernetes service account:

kubectl annotate serviceaccount <FLAGGER_KUBERNETES_SA> \
--namespace <NAMESPACE> \
iam.gke.io/gcp-service-account=<GCP_SA>
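
With the default kustomize install, the Flagger service account is typically named flagger in the istio-system namespace, so NAMESPACE would be istio-system and FLAGGER_KUBERNETES_SA would be flagger. You can verify both sides of the binding:

# inspect the Workload Identity binding on the GCP service account
gcloud iam service-accounts get-iam-policy <GCP_SA>

# confirm the annotation on the Flagger Kubernetes service account
kubectl -n istio-system get serviceaccount flagger -o yaml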

7. Create a MetricTemplate that fetches the latency from the GCP log-based metric created in step 4:

apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: canary-latency
  namespace: test
spec:
  provider:
    type: stackdriver
    secretRef:
      name: gcloud-sa
  query: |
    fetch k8s_container
    | metric 'logging.googleapis.com/user/podinfo-canary-latency'
    | every 1m
    | group_by [],
        [value_response_latencies_percentile:
          percentile(value.latency, 99)]
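
The MQL query can be tested beforehand in the Cloud Monitoring Metrics Explorer (query editor mode). Save and apply the template, then confirm it is registered (canary-latency.yaml is a hypothetical file name):

kubectl apply -f canary-latency.yaml
kubectl -n test get metrictemplates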

8. Create a Kubernetes secret with the gcloud project ID:

kubectl create secret generic gcloud-sa --from-literal=project=<project-id> -n test

9. Apply the Canary manifest with the following configuration; update the hosts and gateways parameters in the manifest if required:

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  # deployment reference
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    # mention the deployment name for which the canary has to be triggered
    name: podinfo
  service:
    # service port number
    port: 9898
    # container port number or name (optional)
    targetPort: 9898
    # Istio gateways (optional)
    gateways:
      - public-gateway.istio-system.svc.cluster.local
      - istio-ingressgateway
    # Istio virtual service host names (optional)
    hosts:
      # - app.example.com
      - "*"
    # Istio traffic policy (optional)
    trafficPolicy:
      tls:
        # use ISTIO_MUTUAL when mTLS is enabled
        mode: DISABLE
  analysis:
    # schedule interval (default 60s)
    interval: 30s
    # max number of failed metric checks before rollback
    threshold: 10
    # max traffic percentage routed to canary
    # percentage (0-100)
    maxWeight: 100
    # canary increment step
    # percentage (0-100)
    stepWeight: 5
    match:
      - headers:
          cookie:
            regex: "^(.*?;)?(type=insider)(;.*)?$"
    metrics:
      - name: "GCP log based metrics for canary podinfo"
        templateRef:
          name: canary-latency
          namespace: test
        thresholdRange:
          max: 100
        interval: 30s
    webhooks:
      - name: load-test
        url: http://flagger-loadtester.test/
        timeout: 15s
        metadata:
          cmd: "hey -z 10m -q 10 -c 2 -H 'Cookie: type=insider' http://podinfo-canary.test:9898/"
      - name: "promotion gate"
        type: confirm-promotion
        url: http://flagger-loadtester.test/gate/approve
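
Save the manifest (for example as podinfo-canary.yaml, a hypothetical file name) and apply it. Flagger will then create the podinfo-primary deployment along with the primary and canary services, and the Canary object should eventually report an Initialized status:

kubectl apply -f podinfo-canary.yaml

# wait until the canary reports Initialized
kubectl -n test get canary podinfo -w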

10. Trigger the canary deployment by updating the image version of the sample application:

kubectl -n test set image deployment/podinfo podinfod=stefanprodan/podinfo:6.0.1
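
While the analysis runs, the traffic weight and the result of each metric check can be followed from the Canary object and the Flagger logs:

watch kubectl -n test get canary podinfo
kubectl -n test describe canary/podinfo
kubectl -n istio-system logs deployment/flagger -f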

11. Once the canary is initialized, the rollout starts and the new version of the deployment is rolled out in increments of weight 5, as long as the metric analysis succeeds.

12. During the canary analysis, Flagger checks whether the latency reported by the GCP log-based metric stays below the threshold value of 100 for the new canary version. If the check passes, the canary traffic weight is increased by the step weight of 5; this repeats until 100 percent of the traffic is routed to the new canary release.
