Deployment on GKE with Flagger using GCP Log Based Metrics

Overview

Pawan Phalak
Google Cloud - Community
Nov 15, 2022


Flagger is a progressive delivery tool that automates the release process for applications running on Kubernetes. Flagger can execute different deployment strategies (canary, A/B testing, blue/green), and every Flagger deployment includes a metric analysis stage. We can use GCP log-based metrics to define custom metrics that control the rollout in a GKE cluster. In this blog, we will cover how to use Flagger to perform a canary deployment with GCP log-based metrics as the Flagger metrics.

Prerequisites

  1. GKE Cluster with Istio installed and workload identity enabled.
  2. Flagger requires a Kubernetes cluster v1.16 or newer and Istio v1.5 or newer.
  3. Enable Istio access logs in JSON format (see the sketch after this list).
  4. For a private GKE cluster, update the firewall to open port 15017 (see the sketch after this list).
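
The exact commands for prerequisites 3 and 4 depend on how Istio was installed and how the cluster network is set up. A minimal sketch for enabling JSON access logs, assuming Istio was installed with istioctl and can be reconfigured through an IstioOperator overlay:

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    # write Envoy access logs to stdout in JSON format
    accessLogFile: /dev/stdout
    accessLogEncoding: JSON

For a private GKE cluster, the control-plane-to-node firewall rule (its name varies per cluster) can be updated to allow port 15017, which Istio's sidecar injection webhook uses. Note that --allow replaces the rule's existing port list, so keep the ports that are already open:

gcloud compute firewall-rules update <FIREWALL_RULE_NAME> \
  --allow tcp:10250,tcp:443,tcp:15017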

Steps

1. Install Flagger in the istio-system namespace of the GKE cluster with the following command:

kubectl apply -k github.com/fluxcd/flagger//kustomize/istio
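
Optionally, confirm that the Flagger deployment is running before moving on:

kubectl -n istio-system rollout status deployment/flagger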

2. Deploy the sample application and the load-testing pods in the test namespace:

kubectl create ns test
kubectl label namespace test istio-injection=enabled
kubectl apply -k https://github.com/fluxcd/flagger//kustomize/podinfo?ref=main
kubectl apply -k https://github.com/fluxcd/flagger//kustomize/tester?ref=main
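
The podinfo and flagger-loadtester deployments should come up in the test namespace after a few moments:

kubectl -n test get deployments,pods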

3. Expose the sample application through the Istio ingress gateway:

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: public-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "*"

4. Create a log-based metric of type Distribution using the following Cloud Logging query, extracting the latency of the canary pods of the sample application (from the jsonPayload.duration field) out of the Istio access logs:

resource.type="k8s_container"
resource.labels.project_id="<GCP_PROJECT_ID>"
resource.labels.location="<GKE_LOCATION>"
resource.labels.cluster_name="<GKE_CLUSTER_NAME>"
resource.labels.namespace_name="test"
resource.labels.container_name="istio-proxy"
jsonPayload.authority="podinfo-canary.test:9898"
jsonPayload.duration>0
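
The metric can be created from the Logs-based Metrics page in the Cloud Logging console (metric type Distribution, value field jsonPayload.duration) or from the command line. Below is a minimal sketch, assuming the metric is named podinfo-canary-latency (the name referenced by the MetricTemplate in step 7), that jsonPayload.duration holds the request latency in milliseconds, and that your gcloud release supports the --config-from-file flag; the filter is abbreviated from the query above:

cat > podinfo-canary-latency.yaml <<'EOF'
# Log-based Distribution metric extracting jsonPayload.duration as the value
name: podinfo-canary-latency
description: Latency of podinfo canary pods from Istio access logs
filter: >-
  resource.type="k8s_container" AND
  resource.labels.namespace_name="test" AND
  resource.labels.container_name="istio-proxy" AND
  jsonPayload.authority="podinfo-canary.test:9898" AND
  jsonPayload.duration>0
valueExtractor: EXTRACT(jsonPayload.duration)
metricDescriptor:
  metricKind: DELTA
  valueType: DISTRIBUTION
  unit: ms
bucketOptions:
  exponentialBuckets:
    numFiniteBuckets: 64
    growthFactor: 2
    scale: 0.01
EOF

gcloud logging metrics create podinfo-canary-latency \
  --config-from-file=podinfo-canary-latency.yaml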

5. Create a service account (GCP_SA) with the Monitoring Admin role from the GCP IAM page.
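
A sketch of the equivalent gcloud commands, assuming a hypothetical service account name flagger-metrics:

# create the GCP service account
gcloud iam service-accounts create flagger-metrics \
  --project <GCP_PROJECT_ID> \
  --display-name "Flagger metrics reader"

# grant it the Monitoring Admin role on the project
gcloud projects add-iam-policy-binding <GCP_PROJECT_ID> \
  --member "serviceAccount:flagger-metrics@<GCP_PROJECT_ID>.iam.gserviceaccount.com" \
  --role roles/monitoring.admin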

6. Perform the following steps on the Workload Identity enabled GKE cluster:

a) Add an IAM policy binding between the Flagger Kubernetes service account and the GCP service account. This binding allows the Flagger Kubernetes service account to act as the IAM service account:

gcloud iam service-accounts add-iam-policy-binding <GCP_SA> \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:<GCP_PROJECT_ID>.svc.id.goog[<NAMESPACE>/<FLAGGER_KUBERNETES_SA>]"

b) Annotate the flagger Kubernetes service account:

kubectl annotate serviceaccount <FLAGGER_KUBERNETES_SA> \
--namespace <NAMESPACE> \
iam.gke.io/gcp-service-account=<GCP_SA>
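
With the default kustomize install, the Flagger service account is typically named flagger in the istio-system namespace, so NAMESPACE would be istio-system and FLAGGER_KUBERNETES_SA would be flagger. You can verify both sides of the binding:

# inspect the Workload Identity binding on the GCP service account
gcloud iam service-accounts get-iam-policy <GCP_SA>

# confirm the annotation on the Flagger Kubernetes service account
kubectl -n istio-system get serviceaccount flagger -o yaml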

7. Create a MetricTemplate that fetches the latency from the GCP log-based metric created in step 4:

apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: canary-latency
  namespace: test
spec:
  provider:
    type: stackdriver
    secretRef:
      name: gcloud-sa
  query: |
    fetch k8s_container
    | metric 'logging.googleapis.com/user/podinfo-canary-latency'
    | every 1m
    | group_by [],
        [value_response_latencies_percentile:
          percentile(value.latency, 99)]
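
The MQL query can be tested beforehand in the Cloud Monitoring Metrics Explorer (query editor mode). Save and apply the template, then confirm it is registered (canary-latency.yaml is a hypothetical file name):

kubectl apply -f canary-latency.yaml
kubectl -n test get metrictemplates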

8. Create a Kubernetes secret with the gcloud project ID:

kubectl create secret generic gcloud-sa --from-literal=project=<project-id> -n test

9. Apply the Canary manifest with the following configuration; update the hosts and gateways parameters in the manifest if required:

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  # deployment reference
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    # mention the deployment name for which the canary has to be triggered
    name: podinfo
  service:
    # service port number
    port: 9898
    # container port number or name (optional)
    targetPort: 9898
    # Istio gateways (optional)
    gateways:
      - public-gateway.istio-system.svc.cluster.local
      - istio-ingressgateway
    # Istio virtual service host names (optional)
    hosts:
      # - app.example.com
      - "*"
    # Istio traffic policy (optional)
    trafficPolicy:
      tls:
        # use ISTIO_MUTUAL when mTLS is enabled
        mode: DISABLE
  analysis:
    # schedule interval (default 60s)
    interval: 30s
    # max number of failed metric checks before rollback
    threshold: 10
    # max traffic percentage routed to canary
    # percentage (0-100)
    maxWeight: 100
    # canary increment step
    # percentage (0-100)
    stepWeight: 5
    match:
      - headers:
          cookie:
            regex: "^(.*?;)?(type=insider)(;.*)?$"
    metrics:
      - name: "GCP log based metrics for canary podinfo"
        templateRef:
          name: canary-latency
          namespace: test
        thresholdRange:
          max: 100
        interval: 30s
    webhooks:
      - name: load-test
        url: http://flagger-loadtester.test/
        timeout: 15s
        metadata:
          cmd: "hey -z 10m -q 10 -c 2 -H 'Cookie: type=insider' http://podinfo-canary.test:9898/"
      - name: "promotion gate"
        type: confirm-promotion
        url: http://flagger-loadtester.test/gate/approve
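
Save the manifest (for example as podinfo-canary.yaml, a hypothetical file name) and apply it. Flagger will then create the podinfo-primary deployment along with the primary and canary services, and the Canary object should eventually report an Initialized status:

kubectl apply -f podinfo-canary.yaml

# wait until the canary reports Initialized
kubectl -n test get canary podinfo -w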

10. Trigger the canary deployment by updating the image version of the sample application:

kubectl -n test set image deployment/podinfo podinfod=stefanprodan/podinfo:6.0.1
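
While the analysis runs, the traffic weight and the result of each metric check can be followed from the Canary object and the Flagger logs:

watch kubectl -n test get canary podinfo
kubectl -n test describe canary/podinfo
kubectl -n istio-system logs deployment/flagger -f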

11. Once the canary is initialized, the rollout starts and the new version of the deployment is rolled out in increments of weight 5, as long as the metric analysis succeeds.

12. During the canary analysis, Flagger checks whether the latency reported by the GCP log-based metric stays below the threshold value of 100 for the new canary version. If the check passes, the canary traffic weight is increased by the step weight of 5; this repeats until 100 percent of the traffic is routed to the new canary release.
