Auto Scaling for Google Kubernetes Engine

Frank Chung
DeepQ Research Engineering Blog
Oct 17, 2018

This post covers three ways to auto-scale a service on GKE (Google Kubernetes Engine): sizing each pod, scaling the number of pods, and scaling the number of nodes.

The basic unit of CPU in Kubernetes is one CPU, which is equivalent to

  • 1 AWS vCPU
  • 1 GCP vCPU
  • 1 Azure vCore
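In manifests, CPU amounts are written as Kubernetes quantity strings, and one CPU can be split into 1000 millicores written with a lowercase `m`. The Python sketch below converts such strings to millicores. `cpu_to_millicores` is a hypothetical helper for illustration that handles only a few decimal suffixes. It also shows why the case of the suffix matters: uppercase `M` means mega, not milli.

```python
from decimal import Decimal

# A subset of the decimal suffixes Kubernetes accepts on quantities.
_SUFFIXES = {
    "m": Decimal("0.001"),  # milli: "500m" is half a CPU
    "k": Decimal(10) ** 3,
    "M": Decimal(10) ** 6,  # mega, NOT milli: "500M" is 500 million CPUs
    "G": Decimal(10) ** 9,
}


def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity string to millicores (1 CPU = 1000m).

    Hypothetical helper: binary suffixes such as "Ki" are not handled.
    """
    suffix = quantity[-1]
    if suffix in _SUFFIXES:
        cores = Decimal(quantity[:-1]) * _SUFFIXES[suffix]
    else:
        cores = Decimal(quantity)  # plain number of CPUs, e.g. "1" or "0.5"
    return int(cores * 1000)


print(cpu_to_millicores("1"))     # 1000
print(cpu_to_millicores("0.5"))   # 500
print(cpu_to_millicores("500m"))  # 500
print(cpu_to_millicores("500M"))  # 500000000000
```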

A Deployment in Kubernetes defines how many replicated pods run the same container to serve requests.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo
spec:
  replicas: 4
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
      - name: demo
        image: vish/stress
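Conceptually, the Deployment's controller keeps comparing the desired `replicas` with the pods that actually exist, then creates or deletes pods to close the gap. The toy Python sketch below shows only that comparison. The function `reconcile` and its inputs are hypothetical; the real controller works through ReplicaSets and picks which pods to delete by its own ranking rules.

```python
def reconcile(desired_replicas: int, running_pods: list) -> tuple:
    """Toy reconcile step: return (number_of_pods_to_create, pods_to_delete).

    Hypothetical helper; the real Deployment controller acts through a
    ReplicaSet and chooses deletion victims with its own ranking rules.
    """
    missing = desired_replicas - len(running_pods)
    if missing > 0:
        # Too few pods: create the missing ones.
        return missing, []
    # Enough or too many pods: delete the surplus (here simply the last ones).
    return 0, running_pods[desired_replicas:]


print(reconcile(4, ["demo-1", "demo-2"]))  # (2, [])
print(reconcile(4, ["demo-1", "demo-2", "demo-3", "demo-4", "demo-5"]))  # (0, ['demo-5'])
```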

Vertical Pod Auto Scaling

Vertical scaling concerns how much CPU and memory each pod gets. We can specify a request and a limit for CPU and memory on each container in the pod template as follows:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo
spec:
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
      - name: demo
        image: vish/stress
        resources:
          limits:
            cpu: "1"
            memory: "1Gi"
          requests:
            cpu: "500m"
            memory: "200Mi"

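In the above example, each pod requests 500 millicores of CPU (1 vCPU = 1000m, with a lowercase m; an uppercase M would mean mega) and 200 MiB of memory, and it can use at most 1 vCPU and 1 GiB of memory. CPU use above the limit is throttled, while a container that exceeds its memory limit is terminated. The requests are what the scheduler reserves: a pod is placed only on a node whose unreserved capacity covers its requests. Note that if no node in the cluster (node pool) has enough room left, the pod stays Pending and is not scheduled.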
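The scheduling rule described above can be sketched in Python as a simplified model. It assumes a hypothetical node with 940m of allocatable CPU and 2 GiB of memory, and the names `Node`, `Pod`, `fits`, and `schedule` are illustrative only. The real kube-scheduler filters and scores every node instead of taking the first fit. Limits are not consulted, because capacity is reserved by requests.

```python
from dataclasses import dataclass
from typing import List, Optional

MI = 2**20  # bytes in one mebibyte ("Mi")
GI = 2**30  # bytes in one gibibyte ("Gi")


@dataclass
class Node:
    name: str
    allocatable_millicpu: int
    allocatable_bytes: int
    requested_millicpu: int = 0  # sum of requests of pods already placed here
    requested_bytes: int = 0


@dataclass
class Pod:
    cpu_request_millicpu: int
    memory_request_bytes: int


def fits(node: Node, pod: Pod) -> bool:
    """A pod fits if the node's unreserved capacity covers its requests."""
    return (node.requested_millicpu + pod.cpu_request_millicpu <= node.allocatable_millicpu
            and node.requested_bytes + pod.memory_request_bytes <= node.allocatable_bytes)


def schedule(nodes: List[Node], pod: Pod) -> Optional[str]:
    """Place the pod on the first node it fits; None means it stays Pending."""
    for node in nodes:
        if fits(node, pod):
            node.requested_millicpu += pod.cpu_request_millicpu
            node.requested_bytes += pod.memory_request_bytes
            return node.name
    return None


# Requests from the manifest above: cpu "500m", memory "200Mi".
demo = Pod(cpu_request_millicpu=500, memory_request_bytes=200 * MI)
cluster = [Node("node-a", allocatable_millicpu=940, allocatable_bytes=2 * GI)]
print([schedule(cluster, demo) for _ in range(3)])  # ['node-a', None, None]
```

Only one demo pod fits on this node, because a second one would push the reserved CPU to 1000m, above the assumed 940m.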

Horizontal Pod Auto Scaling

We can dynamically adjust the number of replicas according to the average CPU utilization of the pods. The Kubernetes HPA (Horizontal Pod Autoscaler) does this:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo
  targetCPUUtilizationPercentage: 60

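In the above example, the HPA compares the CPU usage of the demo pods with their CPU requests. It adds replicas (up to 10) when the average utilization rises above 60%, and removes replicas (down to 1) when it falls below. Because utilization is measured relative to requests, the pods must declare a CPU request for this to work.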
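The Kubernetes documentation describes the HPA's core calculation as desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), with no change when the ratio is within a tolerance (0.1 by default). The Python sketch below applies that rule with the minReplicas/maxReplicas bounds from the manifest above. It is a simplification: the scale-down stabilization window, readiness handling, and missing-metric logic of the real controller are omitted, and `desired_replicas` is a hypothetical name.

```python
import math
from typing import Sequence


def desired_replicas(usage_millicpu: Sequence[int],
                     request_millicpu: Sequence[int],
                     target_utilization: float = 60.0,
                     min_replicas: int = 1,
                     max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA step for CPU utilization (hypothetical helper).

    usage_millicpu and request_millicpu hold one entry per running pod.
    """
    current_replicas = len(usage_millicpu)
    # Utilization is measured against the pods' CPU requests, not their limits.
    utilization = 100.0 * sum(usage_millicpu) / sum(request_millicpu)
    ratio = utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to the target: no change
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the bounds from the HorizontalPodAutoscaler spec.
    return max(min_replicas, min(max_replicas, desired))


requests = [500] * 4  # four demo pods, each requesting 500m
print(desired_replicas([450] * 4, requests))  # 90% vs 60% target -> 6
print(desired_replicas([310] * 4, requests))  # 62%, within tolerance -> 4
print(desired_replicas([100] * 4, requests))  # 20% -> 2
```

Note that a single step can add or remove several pods at once, since the replica count scales with the utilization ratio.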

Cluster Auto Scaling

If every node is fully reserved, no more pods can be scheduled. We can enable GKE's cluster autoscaler to add nodes dynamically so that these unscheduled (Pending) pods can run.

gcloud container clusters create example-cluster \
--num-nodes 2 --enable-autoscaling --min-nodes 1 --max-nodes 4

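The GKE cluster autoscaler then keeps the number of nodes between --min-nodes and --max-nodes according to the resource requests of the pods, not their actual usage: it adds nodes when pods are Pending for lack of room and removes nodes that stay underutilized. Note that it does not react instantly. A new node takes a few minutes to become ready, and a node is typically removed only after it has been unneeded for about 10 minutes.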
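To show how pending pods translate into new nodes, the Python sketch below packs the pending pods' CPU requests first-fit onto hypothetical empty nodes and caps the result at --max-nodes. This is a deliberate simplification under stated assumptions: the real cluster autoscaler simulates the full scheduler (memory, affinity, and other constraints) against a template node from the node pool, while here only CPU is considered. The function name `nodes_to_add` and the allocatable figures are illustrative only.

```python
from typing import List


def nodes_to_add(pending_requests_millicpu: List[int],
                 node_allocatable_millicpu: int,
                 current_nodes: int,
                 max_nodes: int) -> int:
    """First-fit estimate of new nodes needed for Pending pods (CPU only).

    Hypothetical helper; the result is capped so the pool never exceeds max_nodes.
    """
    new_nodes: List[int] = []  # free millicpu left on each hypothetical new node
    for request in pending_requests_millicpu:
        if request > node_allocatable_millicpu:
            continue  # cannot fit on this machine type, so adding nodes will not help
        for i, free in enumerate(new_nodes):
            if request <= free:
                new_nodes[i] -= request
                break
        else:
            # No existing new node has room: open another one.
            new_nodes.append(node_allocatable_millicpu - request)
    return max(0, min(len(new_nodes), max_nodes - current_nodes))


# Five Pending demo pods (500m each), 2 nodes running, --max-nodes 4.
print(nodes_to_add([500] * 5, 940, current_nodes=2, max_nodes=4))   # needs 5, capped -> 2
print(nodes_to_add([500] * 3, 1930, current_nodes=2, max_nodes=4))  # all fit on one -> 1
```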
