Istio Series — 2: Deployment: Canary Deployments with Istio and Argo Rollouts in Kubernetes

Anand Thiyagarajan
4 min read · May 26, 2022

Hi Folks, let's look at the canary deployment strategy in Kubernetes. Rolling Update is the default strategy for Deployments, but there are other useful deployment strategies, such as Blue-Green and Canary. (Here is some nice documentation on these deployment strategies.)

In this post, we'll look at implementing the Canary deployment strategy with Argo Rollouts, using Istio as our service-mesh layer.

Architectural Overview

A glimpse of how canary deployment works in Argo Rollouts: https://argoproj.github.io/argo-rollouts/features/canary/

The plain-vanilla version of canary deployment using Argo Rollouts and Istio: https://argoproj.github.io/argo-rollouts/features/traffic-management/istio/#host-level-traffic-splitting

Note: Only host-level traffic splitting is considered in this document.

Let's see my way of doing canary deployments…

How does my way differ from the one described in the official Argo Rollouts documentation? Let's scroll down…

Let's put the spotlight on each resource we are going to use here:

  1. An actual application endpoint and a canary endpoint (the canary endpoint should preferably be on a private domain).
  2. Istio Gateways: Stable/Actual (A) for the actual endpoint, and Canary (B) for the canary endpoint.
  3. Istio Virtual Services: one bound to Gateway A (C) and one bound to Gateway B (D).
  4. Kubernetes Services: a Service pointing to the production pods (E) and a Service pointing to the canary pods (F).
  5. Argo Rollout: a Rollout resource (G) that replaces the Deployment and controls the Virtual Service, the Services and the ReplicaSets.

Note: A, B, C, D, E, F and G are marked in the Architectural Overview above.

Resource Definitions:

Gateways

# (A)
kind: Gateway
apiVersion: networking.istio.io/v1alpha3
metadata:
  name: somegateway
  namespace: somenamespace
spec:
  servers:
  - hosts:
    - dummy.example.com
    port:
      name: http
      number: 80
      protocol: HTTP
  selector:
    istio: ingressgateway
---
# (B)
kind: Gateway
apiVersion: networking.istio.io/v1alpha3
metadata:
  name: somegateway-canary
  namespace: somenamespace
spec:
  servers:
  - hosts:
    - dummy-canary.example.com
    port:
      name: http
      number: 80
      protocol: HTTP
  selector:
    istio: ingressgateway

Virtual-Services

# (C)
kind: VirtualService
apiVersion: networking.istio.io/v1alpha3
metadata:
  name: somevs
  namespace: somenamespace
spec:
  hosts:
  - dummy.example.com
  gateways:
  - somegateway
  http:
  - match:
    - uri:
        prefix: /
    route:
    - destination:
        host: somesvc-stable.somenamespace.svc.cluster.local
        port:
          number: 80
      weight: 100
    - destination:
        host: somesvc-canary.somenamespace.svc.cluster.local
        port:
          number: 80
      weight: 0
---
# (D)
kind: VirtualService
apiVersion: networking.istio.io/v1alpha3
metadata:
  name: somevs-canary
  namespace: somenamespace
spec:
  hosts:
  - dummy-canary.example.com
  gateways:
  - somegateway-canary
  http:
  - match:
    - uri:
        prefix: /
    route:
    - destination:
        host: somesvc-canary.somenamespace.svc.cluster.local
        port:
          number: 80
      weight: 100

Services

# (E)
apiVersion: v1
kind: Service
metadata:
  name: somesvc-stable
  namespace: somenamespace
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: someapp
  type: ClusterIP
---
# (F)
apiVersion: v1
kind: Service
metadata:
  name: somesvc-canary
  namespace: somenamespace
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: someapp
  type: ClusterIP

Argo-Rollout

# (G)
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  creationTimestamp: null
  name: someapp
  namespace: somenamespace
spec:
  replicas: 1
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      app: someapp
  strategy:
    canary:
      maxUnavailable: "10%"
      abortScaleDownDelaySeconds: 30
      canaryMetadata:
        labels:
          release: canary
      canaryService: somesvc-canary
      stableMetadata:
        labels:
          release: stable
      stableService: somesvc-stable
      steps:
      - setCanaryScale:
          replicas: 1
      - pause: {}
      - setWeight: 10
      - pause: {}
      - setWeight: 50
      - pause: {}
      - setWeight: 75
      - pause: {}
      trafficRouting:
        istio:
          virtualServices:
          - name: somevs
  template:
    metadata:
      annotations:
        prometheus.io/path: /stats/prometheus
        prometheus.io/port: "15020"
        prometheus.io/scheme: "http"
        prometheus.io/scrape: "true"
        sidecar.istio.io/status: '{"version":"xxxx","initContainers":["istio-init"],"containers":["istio-proxy"],"volumes":["istio-envoy","istio-data","istio-podinfo","istio-token","istiod-ca-cert"],"imagePullSecrets":null}'
      creationTimestamp: null
      labels:
        app: someapp
        istio.io/rev: ""
        security.istio.io/tlsMode: istio
        service.istio.io/canonical-name: someapp
        service.istio.io/canonical-revision: latest
    spec:
      containers:
      - image: <APP-IMAGE>
        imagePullPolicy: Always
        name: someapp
        ports:
        - containerPort: 80
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /app-health/someapp/readyz
            port: 15020
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 2
          timeoutSeconds: 1
        terminationMessagePolicy: FallbackToLogsOnError
      - args:
        - proxy
        - sidecar
        - --domain
        #...<truncated just has injected istio-proxy definition>...
      - name: istio-token
        projected:
          sources:
          - serviceAccountToken:
              audience: istio-ca
              expirationSeconds: 43200
              path: istio-token
      - configMap:
          name: istio-ca-root-cert
        name: istiod-ca-cert

Enough snippets!! Some explanations!!

As visualised in the Architectural Overview Diagram,

  1. We create Gateways [A] and [B] for the actual and canary endpoints respectively.
  2. Virtual Service [C] has two destination services for the same endpoint and gateway, with weight 100 for somesvc-stable (i.e. [E]) and weight 0 for somesvc-canary (i.e. [F]). Argo Rollouts controls these weights during the deployment steps (setWeight: n).
  3. Virtual Service [D] has Gateway [B] and the canary endpoint bound to it, and sends its traffic to somesvc-canary [F].
  4. Looking at Services [E] and [F], we cannot spot any difference except the Service name. During a rollout, however, Argo Rollouts adds the pod-template hash of the matching ReplicaSet (the rollouts-pod-template-hash label) to the spec.selector of each Service, and attaches the “release: canary/stable” label metadata to the corresponding pods. This way, during the deployment (only while canary pods are up), Service [E] points to the stable pods and Service [F] points to the canary pods; until then, both Services [E] and [F] point to the same stable pods.
  5. In the Argo Rollout resource [G], we specify Services [E] and [F], and only Virtual Service [C]. The “setCanaryScale” deployment step only creates “n” canary pods (“replicas: n”); it does not change any weights in Virtual Service [C].
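
To make points 2 and 4 concrete, here is a rough sketch of what the relevant fragments of Virtual Service [C] and Service [F] might look like while the Rollout is paused after “setWeight: 10”. This is an illustration, not output captured from a cluster: the hash value is a hypothetical placeholder, and the exact fields the controller writes may differ between Argo Rollouts versions.

```yaml
# Fragment of Virtual Service [C] after "setWeight: 10" (illustrative only).
# Argo Rollouts rewrites the two weights; everything else stays as authored.
http:
- match:
  - uri:
      prefix: /
  route:
  - destination:
      host: somesvc-stable.somenamespace.svc.cluster.local
      port:
        number: 80
    weight: 90   # remainder stays on stable
  - destination:
      host: somesvc-canary.somenamespace.svc.cluster.local
      port:
        number: 80
    weight: 10   # value taken from the setWeight step
---
# Fragment of Service [F] during the rollout (illustrative only).
# The hash below is a placeholder, not a real value.
spec:
  selector:
    app: someapp
    rollouts-pod-template-hash: "<canary-replicaset-hash>"
```

Service [E] would carry the same kind of selector, but with the hash of the stable ReplicaSet.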

So what's different in my way?

We create a Gateway [B] and a Virtual Service [D] that always point to the canary environment (if there are no canary pods, they point to the stable pods), and we also have a dedicated canary endpoint (dummy-canary.example.com). With this dedicated canary endpoint, testing the canary environment is very easy, because it can be done before any live traffic is sent to the canary pods during the deployment steps (especially at the first “pause: {}” step).

I hope this document is helpful for anyone who wants to kick-start canary deployments in production but is still stuck in the demo phase.
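
As a sketch of where that testing window sits, here are the first steps of Rollout [G] again, annotated with comments. The comments are my reading of the flow described above, and the promote command assumes the Argo Rollouts kubectl plugin is installed.

```yaml
# First steps of the canary strategy in Rollout [G], annotated.
steps:
- setCanaryScale:
    replicas: 1      # start one canary pod; weights in [C] are not changed
- pause: {}          # indefinite pause: live traffic on dummy.example.com
                     # still goes 100% to stable, while
                     # dummy-canary.example.com reaches the canary pod
                     # through Gateway [B] and Virtual Service [D].
                     # Test the canary here, then resume, e.g. with:
                     #   kubectl argo rollouts promote someapp -n somenamespace
- setWeight: 10      # only now does live traffic start reaching the canary
- pause: {}
```

If the checks on the canary endpoint fail at the first pause, the rollout can be aborted before any production request has reached the new version.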
