Istio Series — 2: Deployment: Canary Deployments with Istio and Argo Rollouts in Kubernetes
Hi Folks, let’s think about the canary deployment strategy in Kubernetes. Besides RollingUpdate, the default strategy for Kubernetes Deployments, there are other useful deployment strategies, such as Blue-Green and Canary deployments. (Here is some good documentation on these deployment strategies.)
In this discussion, we’ll walk through implementing the Canary deployment strategy with Argo Rollouts, using Istio as our service-mesh layer.
A quick overview of how canary deployments work in Argo Rollouts: https://argoproj.github.io/argo-rollouts/features/canary/
The plain-vanilla version of a canary deployment using Argo Rollouts and Istio: https://argoproj.github.io/argo-rollouts/features/traffic-management/istio/#host-level-traffic-splitting
Note: Only canary deployments with host-level traffic splitting are considered in this document.
Let’s look at my way of doing canary deployments…
How does my way differ from the one in the official Argo Rollouts documentation? Let’s scroll down…
Let’s put the spotlight on each resource we are going to use here:
- An actual application endpoint and a canary endpoint (the canary endpoint is best kept on a private domain).
- Istio Gateways: Stable/Actual (A) — for actual endpoint and Canary (B) — for canary endpoint.
- Istio Virtual Services: one bound to Gateway A (C) and one bound to Gateway B (D)
- Kubernetes Services: Service pointing to production pods (E) and Service pointing to canary pods (F)
- Argo Rollout: the Rollout resource (G), which replaces the Deployment and controls the Virtual Service weights, the Services' selectors and the ReplicaSets.
Note: A, B, C, D, E, F and G are marked in the Architectural Overview above.
Resource Definitions:
Gateways
# (A)
kind: Gateway
apiVersion: networking.istio.io/v1alpha3
metadata:
  name: somegateway
  namespace: somenamespace
spec:
  servers:
    - hosts:
        - dummy.example.com
      port:
        name: http
        number: 80
        protocol: HTTP
  selector:
    istio: ingressgateway
---
# (B)
kind: Gateway
apiVersion: networking.istio.io/v1alpha3
metadata:
  name: somegateway-canary
  namespace: somenamespace
spec:
  servers:
    - hosts:
        - dummy-canary.example.com
      port:
        name: http
        number: 80
        protocol: HTTP
  selector:
    istio: ingressgateway
Virtual-Services
# (C)
kind: VirtualService
apiVersion: networking.istio.io/v1alpha3
metadata:
  name: somevs
  namespace: somenamespace
spec:
  hosts:
    - dummy.example.com
  gateways:
    - somegateway
  http:
    - match:
        - uri:
            prefix: /
      route:
        - destination:
            host: somesvc-stable.somenamespace.svc.cluster.local
            port:
              number: 80
          weight: 100
        - destination:
            host: somesvc-canary.somenamespace.svc.cluster.local
            port:
              number: 80
          weight: 0
---
# (D)
kind: VirtualService
apiVersion: networking.istio.io/v1alpha3
metadata:
  name: somevs-canary
  namespace: somenamespace
spec:
  hosts:
    - dummy-canary.example.com
  gateways:
    - somegateway-canary
  http:
    - match:
        - uri:
            prefix: /
      route:
        - destination:
            host: somesvc-canary.somenamespace.svc.cluster.local
            port:
              number: 80
          weight: 100
Services
# (E)
apiVersion: v1
kind: Service
metadata:
  name: somesvc-stable
  namespace: somenamespace
spec:
  ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: 80
  selector:
    app: someapp
  type: ClusterIP
---
# (F)
apiVersion: v1
kind: Service
metadata:
  name: somesvc-canary
  namespace: somenamespace
spec:
  ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: 80
  selector:
    app: someapp
  type: ClusterIP
Argo-Rollout
# (G)
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  creationTimestamp: null
  name: someapp
  namespace: somenamespace
spec:
  replicas: 1
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      app: someapp
  strategy:
    canary:
      maxUnavailable: "10%"
      abortScaleDownDelaySeconds: 30
      canaryMetadata:
        labels:
          release: canary
      canaryService: somesvc-canary
      stableMetadata:
        labels:
          release: stable
      stableService: somesvc-stable
      steps:
        - setCanaryScale:
            replicas: 1
        - pause: {}
        - setWeight: 10
        - pause: {}
        - setWeight: 50
        - pause: {}
        - setWeight: 75
        - pause: {}
      trafficRouting:
        istio:
          virtualServices:
            - name: somevs
  template:
    metadata:
      annotations:
        prometheus.io/path: /stats/prometheus
        prometheus.io/port: "15020"
        prometheus.io/scheme: "http"
        prometheus.io/scrape: "true"
        sidecar.istio.io/status: '{"version":"xxxx","initContainers":["istio-init"],"containers":["istio-proxy"],"volumes":["istio-envoy","istio-data","istio-podinfo","istio-token","istiod-ca-cert"],"imagePullSecrets":null}'
      creationTimestamp: null
      labels:
        app: someapp
        istio.io/rev: ""
        security.istio.io/tlsMode: istio
        service.istio.io/canonical-name: someapp
        service.istio.io/canonical-revision: latest
    spec:
      containers:
        - image: <APP-IMAGE>
          imagePullPolicy: Always
          name: someapp
          ports:
            - containerPort: 80
              protocol: TCP
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /app-health/someapp/readyz
              port: 15020
              scheme: HTTP
            initialDelaySeconds: 30
            periodSeconds: 10
            successThreshold: 2
            timeoutSeconds: 1
          terminationMessagePolicy: FallbackToLogsOnError
        - args:
            - proxy
            - sidecar
            - --domain
        #...<truncated just has injected istio-proxy definition>...
        - name: istio-token
          projected:
            sources:
              - serviceAccountToken:
                  audience: istio-ca
                  expirationSeconds: 43200
                  path: istio-token
        - configMap:
            name: istio-ca-root-cert
          name: istiod-ca-cert
Enough snippets!! Some explanations!!
As visualised in the Architectural Overview Diagram,
- We are creating Gateways [A] and [B] for actual and canary endpoints respectively.
- Virtual Service [C] has two destination services for the same endpoint and gateway: weight 100 for somesvc-stable (i.e. [E]) and weight 0 for somesvc-canary (i.e. [F]). Argo Rollouts controls these weights during the deployment steps (setWeight: n), setting the canary weight to n and the stable weight to 100 - n.
- Virtual Service [D] is bound to gateway [B] and the canary endpoint, and routes that traffic to somesvc-canary [F].
- Looking at services [E] and [F], we cannot spot any difference except the service name. But Argo Rollouts adds the ReplicaSet's pod-template hash (the rollouts-pod-template-hash label) to the spec.selector of the corresponding service, and labels the pods themselves with the canaryMetadata/stableMetadata labels (“release: canary” / “release: stable”). In this way, during the deployment (only once canary pods are up), service [E] points to the stable pods and service [F] points to the canary pods; until then, both services [E] and [F] point to the same stable pods.
- In the Rollout resource [G], we specify services [E] and [F], but only Virtual Service [C]; [D] is deliberately left out so Argo Rollouts never touches its weights. The “setCanaryScale” step (“replicas: n”) only creates n canary pods; it does not change any weights in Virtual Service [C].
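To make the mechanics above more concrete, here is a hedged illustration (not captured cluster output) of the fields Argo Rollouts would be expected to have changed while the rollout is paused after “setWeight: 10”. These are fragments rather than apply-able manifests: unchanged fields such as match and port are omitted, and <stable-hash> / <canary-hash> are hypothetical placeholders for the real ReplicaSet pod-template hashes.

```yaml
# Illustrative state while paused after "setWeight: 10" (not real output).
# (C) VirtualService somevs: only the route weights are changed by the controller.
spec:
  http:
    - route:
        - destination:
            host: somesvc-stable.somenamespace.svc.cluster.local
          weight: 90
        - destination:
            host: somesvc-canary.somenamespace.svc.cluster.local
          weight: 10
---
# (E) Service somesvc-stable: selector narrowed to the stable ReplicaSet.
spec:
  selector:
    app: someapp
    rollouts-pod-template-hash: <stable-hash>
---
# (F) Service somesvc-canary: selector narrowed to the canary ReplicaSet.
spec:
  selector:
    app: someapp
    rollouts-pod-template-hash: <canary-hash>
---
# Canary pod: "release: canary" comes from canaryMetadata in the Rollout (G).
metadata:
  labels:
    app: someapp
    release: canary
    rollouts-pod-template-hash: <canary-hash>
```

Note that the “release” labels end up on the pods, while the service selectors are narrowed through the pod-template hash.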
So what’s different about my way??
We create a Gateway [B] and a Virtual Service [D] that always point to the canary environment (if there are no canary pods, they point to the stable pods), and we also have a dedicated canary endpoint (dummy-canary.example.com). With this canary endpoint in place, testing the canary environment is easy, because it can be done before any live traffic is sent to the canary pods during the deployment steps (especially at the first “pause: {}” step).
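As a sketch of why the first pause is a good testing window, these fragments show the relevant state you would expect right after “setCanaryScale” (assuming its single canary pod has become ready). This is an illustration, not captured output, and <canary-hash> is a hypothetical placeholder for the canary ReplicaSet's pod-template hash.

```yaml
# Illustrative state at the first "pause: {}" (right after setCanaryScale: replicas: 1).
# (C) VirtualService somevs: weights untouched, so all live traffic stays on stable.
spec:
  http:
    - route:
        - destination:
            host: somesvc-stable.somenamespace.svc.cluster.local
          weight: 100
        - destination:
            host: somesvc-canary.somenamespace.svc.cluster.local
          weight: 0
---
# (F) Service somesvc-canary: already selects the canary pod, so the path
# dummy-canary.example.com -> Gateway (B) -> VirtualService (D) -> Service (F)
# ends at the canary pod.
spec:
  selector:
    app: someapp
    rollouts-pod-template-hash: <canary-hash>
```

At this point, requests to dummy-canary.example.com reach the canary pod, while dummy.example.com still sends 100% of live traffic to the stable pods.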
I hope this document helps anyone who wants to kick-start canary deployments in production but is still stuck in the demo phase.