Istio Series — 2: Deployment: Canary Deployments with Istio and Argo Rollouts in Kubernetes

May 26, 2022

Hi folks, let’s think about the canary deployment strategy in Kubernetes. Besides RollingUpdate, the default strategy for Deployments, there are other useful deployment strategies, such as Blue-Green and Canary. (Here is some nice documentation on these deployment strategies.)

In this post, we’ll look at implementing the Canary deployment strategy with Argo Rollouts, using Istio as our service-mesh layer.

A glimpse of how canary deployment works in Argo Rollouts: https://argoproj.github.io/argo-rollouts/features/canary/

The plain-vanilla version of canary deployment using Argo Rollouts and Istio: https://argoproj.github.io/argo-rollouts/features/traffic-management/istio/#host-level-traffic-splitting

Note: Only host-level traffic splitting is considered in this document.

Let’s see my way of doing canary deployment…

How does my way of canary deployment differ from the one in the official Argo Rollouts documentation? Let’s scroll down…

Let’s put the spotlight on each resource we are going to use here:

An actual application endpoint and a canary endpoint (it is advisable to keep the canary endpoint on a private domain).
Istio Gateways: Stable/Actual (A) for the actual endpoint and Canary (B) for the canary endpoint.
Istio Virtual Services: one bound to Gateway A (C) and one bound to Gateway B (D).
Kubernetes Services: a Service pointing to production pods (E) and a Service pointing to canary pods (F).
Argo Rollout: the Rollout resource (G), which replaces the Deployment and controls the Virtual Service weights, the Service selectors and the ReplicaSets.

Note: A,B,C,D,E,F,G are marked in Architectural Overview above.

Resource Definitions:

Gateways

# (A)
kind: Gateway
apiVersion: networking.istio.io/v1alpha3
metadata:
  name: somegateway
  namespace: somenamespace
spec:
  servers:
    - hosts:
        - dummy.example.com
      port:
        name: http
        number: 80
        protocol: HTTP
  selector:
    istio: ingressgateway
---
# (B)
kind: Gateway
apiVersion: networking.istio.io/v1alpha3
metadata:
  name: somegateway-canary
  namespace: somenamespace
spec:
  servers:
    - hosts:
        - dummy-canary.example.com
      port:
        name: http
        number: 80
        protocol: HTTP
  selector:
    istio: ingressgateway

Virtual-Services

# (C)
kind: VirtualService
apiVersion: networking.istio.io/v1alpha3
metadata:
  name: somevs
  namespace: somenamespace
spec:
  hosts:
    - dummy.example.com
  gateways:
    - somegateway
  http:
    - match:
        - uri:
            prefix: /
      route:
      - destination:
          host: somesvc-stable.somenamespace.svc.cluster.local
          port:
            number: 80
        weight: 100
      - destination:
          host: somesvc-canary.somenamespace.svc.cluster.local
          port:
            number: 80
        weight: 0
---
# (D)
kind: VirtualService
apiVersion: networking.istio.io/v1alpha3
metadata:
  name: somevs-canary
  namespace: somenamespace
spec:
  hosts:
    - dummy-canary.example.com
  gateways:
    - somegateway-canary
  http:
    - match:
        - uri:
            prefix: /
      route:
      - destination:
          host: somesvc-canary.somenamespace.svc.cluster.local
          port:
            number: 80
        weight: 100

Services

# (E)
apiVersion: v1
kind: Service
metadata:
  name: somesvc-stable
  namespace: somenamespace
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: someapp
  type: ClusterIP
---
# (F)
apiVersion: v1
kind: Service
metadata:
  name: somesvc-canary
  namespace: somenamespace
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: someapp
  type: ClusterIP

Argo-Rollout

# (G)
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  creationTimestamp: null
  name: someapp
  namespace: somenamespace
spec:
  replicas: 1
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      app: someapp
  strategy:
    canary:
      maxUnavailable: "10%"
      abortScaleDownDelaySeconds: 30
      canaryMetadata:
        labels:
          release: canary
      canaryService: somesvc-canary
      stableMetadata:
        labels:
          release: stable
      stableService: somesvc-stable
      steps:
      - setCanaryScale:
          replicas: 1
      - pause: {}
      - setWeight: 10
      - pause: {}
      - setWeight: 50
      - pause: {}
      - setWeight: 75
      - pause: {}
      trafficRouting:
        istio:
          virtualServices:
          - name: somevs
  template:
    metadata:
      annotations:
        prometheus.io/path: /stats/prometheus
        prometheus.io/port: "15020"
        prometheus.io/scheme: "http"
        prometheus.io/scrape: "true"
        sidecar.istio.io/status: '{"version":"xxxx","initContainers":["istio-init"],"containers":["istio-proxy"],"volumes":["istio-envoy","istio-data","istio-podinfo","istio-token","istiod-ca-cert"],"imagePullSecrets":null}'
      creationTimestamp: null
      labels:
        app: someapp
        istio.io/rev: ""
        security.istio.io/tlsMode: istio
        service.istio.io/canonical-name: someapp
        service.istio.io/canonical-revision: latest
    spec:
      containers:
      - image: <APP-IMAGE>
        imagePullPolicy: Always
        name: someapp
        ports:
        - containerPort: 80
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /app-health/someapp/readyz
            port: 15020
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 2
          timeoutSeconds: 1
        terminationMessagePolicy: FallbackToLogsOnError
      - args:
        - proxy
        - sidecar
        - --domain
#...<truncated just has injected istio-proxy definition>...
      - name: istio-token
        projected:
          sources:
          - serviceAccountToken:
              audience: istio-ca
              expirationSeconds: 43200
              path: istio-token
      - configMap:
          name: istio-ca-root-cert
        name: istiod-ca-cert

Enough snippets! Now some explanations!

As visualised in the Architectural Overview Diagram,

We create Gateways [A] and [B] for the actual and canary endpoints respectively.
Virtual Service [C] has two destination services for the same endpoint and gateway: weight 100 for somesvc-stable (i.e. [E]) and weight 0 for somesvc-canary (i.e. [F]). Argo Rollouts controls these weights during the deployment steps (setWeight: n).
Virtual Service [D] is bound to Gateway [B] and the canary endpoint, and routes all of its traffic to somesvc-canary [F].
Looking at Services [E] and [F], we cannot spot any difference except the name. But Argo Rollouts labels the pods with “release: canary” or “release: stable” and adds the pod-template hash of the corresponding ReplicaSet to each Service’s spec.selector. This way, during a deployment (once the canary pods are up), Service [E] points to the stable pods and Service [F] points to the canary pods; until then, both [E] and [F] point to the same stable pods.
In the Argo Rollout resource [G], we specify Services [E] and [F], but only Virtual Service [C]. The deployment step “setCanaryScale” only creates “n” canary pods (“replicas: n”); it does not change any weights in Virtual Service [C].
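
To make the points above concrete, here is a rough sketch of what Virtual Service [C] and Service [F] might look like while the rollout is paused after “setWeight: 10”. These are illustrations of the live objects, not manifests to apply. The hash value is a made-up placeholder (Argo Rollouts computes the real one), and rollouts-pod-template-hash is the selector key Argo Rollouts injects.

```yaml
# Virtual Service [C] after the "setWeight: 10" step.
# Only the two weights differ from the authored manifest.
kind: VirtualService
apiVersion: networking.istio.io/v1alpha3
metadata:
  name: somevs
  namespace: somenamespace
spec:
  hosts:
    - dummy.example.com
  gateways:
    - somegateway
  http:
    - match:
        - uri:
            prefix: /
      route:
      - destination:
          host: somesvc-stable.somenamespace.svc.cluster.local
          port:
            number: 80
        weight: 90   # was 100
      - destination:
          host: somesvc-canary.somenamespace.svc.cluster.local
          port:
            number: 80
        weight: 10   # was 0
---
# Service [F] while canary pods exist (ports omitted for brevity).
# The extra selector entry pins it to the canary ReplicaSet's pods.
apiVersion: v1
kind: Service
metadata:
  name: somesvc-canary
  namespace: somenamespace
spec:
  selector:
    app: someapp
    rollouts-pod-template-hash: "<canary-replicaset-hash>"   # placeholder, set by Argo Rollouts
```

Service [E] gets the same extra selector key, but with the hash of the stable ReplicaSet.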

So what’s different in my way??

We create a Gateway [B] and a Virtual Service [D] that always point to the canary environment (if there are no canary pods, they point to the stable pods), and we also have a dedicated canary endpoint (dummy-canary.example.com). With this canary endpoint in place, testing the canary environment is easy: it can be done before any live traffic is sent to the canary pods during the deployment steps (especially at the first “pause: {}” step).
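
Read against the steps of Rollout [G], the flow looks roughly like this. The excerpt repeats the steps from [G] unchanged; only the comments are added, and they are an interpretation of the explanation above rather than anything Argo Rollouts outputs.

```yaml
# Excerpt of spec.strategy.canary.steps from Rollout [G], with comments added.
steps:
- setCanaryScale:
    replicas: 1    # start one canary pod; weights in [C] stay at 100/0
- pause: {}        # test via dummy-canary.example.com; no live traffic reaches the canary yet
- setWeight: 10    # [C] becomes 90 stable / 10 canary
- pause: {}
- setWeight: 50    # [C] becomes 50 / 50
- pause: {}
- setWeight: 75    # [C] becomes 25 stable / 75 canary
- pause: {}        # once promoted past this step, the new version becomes stable
```

Each “pause: {}” has no duration, so the rollout waits at that step until it is promoted manually.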

I hope this document helps anyone who wants to kick-start canary deployments in production but is still stuck in the demo phase.