Using preemptible VMs to cut Kubernetes Engine bills in half

Preemptible VMs have fixed pricing at up to 80% off regular instances, but unfortunately they are mostly advertised for running short-lived batch jobs. In this blog post we’ll see how a mix of preemptible and regular Kubernetes nodes on Google Cloud Platform can save a substantial amount of money without sacrificing application stability.


First things first: this won’t work for every application. It’s not a silver bullet! Your application should tolerate unexpected failures of some of its pods. Let’s look at an example of what we mean by that.

In our case, we run Cirrus CI on Kubernetes Engine. Cirrus CI is powered by 20 microservices and some other cool tech. Each service is backed by at least two pods, with a Horizontal Pod Autoscaler configured to scale up if needed. All of the microservices but two are stateless and can tolerate the failure of any pod. The other two are, generally speaking, stateless too, but in their case we want to minimize unexpected pod failures because it’s very expensive for clients to reconnect and continue working with them. For example, one of them is responsible for uploading caches from CI agents, and it would be unreasonably expensive to re-upload a cache after a failure.

The idea is simple: we want to schedule pods whose failures we can tolerate on preemptible nodes. We also need to make sure that pods whose failures we can’t tolerate are never scheduled on preemptible nodes.

Kubernetes Engine Setup

Your Kubernetes cluster should have a pool of preemptible nodes. If it doesn’t, here is a gcloud command to create an auto-scalable node pool of preemptible VMs:

gcloud container node-pools create preemptible-pool \
  --cluster $CLUSTER_NAME \
  --zone $CLUSTER_ZONE \
  --scopes cloud-platform \
  --enable-autoupgrade \
  --preemptible \
  --num-nodes 1 --machine-type n1-standard-8 \
  --enable-autoscaling --min-nodes=0 --max-nodes=6

Note: nodes in the preemptible pool will have the cloud.google.com/gke-preemptible: true label.

Node Affinity

To control the scheduling of pods we can use the Node Affinity feature. With Node Affinity we can express hard and soft preferences for pod scheduling.

Hard Preference

A hard preference is ideal for making sure that some pods, the ones whose failures we can’t tolerate, are never scheduled on preemptible nodes. With requiredDuringSchedulingIgnoredDuringExecution we can make sure pods won’t be scheduled on nodes that have the cloud.google.com/gke-preemptible label. Simply add the following lines to your Pod or Deployment spec:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: cloud.google.com/gke-preemptible
          operator: DoesNotExist
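
To show where such a block sits in a complete manifest, here is a minimal sketch of a bare Pod that must never land on a preemptible node. The Pod name, container name, and image are hypothetical placeholders, not taken from our setup. In a Pod the affinity block goes directly under spec.

```yaml
# Hypothetical Pod manifest: the name and image are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: cache-uploader
spec:
  affinity:
    nodeAffinity:
      # Hard rule: only nodes WITHOUT the preemptible label are eligible.
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: cloud.google.com/gke-preemptible
            operator: DoesNotExist
  containers:
  - name: cache-uploader
    image: example.com/cache-uploader:latest  # placeholder image
```

Keep in mind that the rule is only evaluated at scheduling time (that is the IgnoredDuringExecution part), and that if no regular node has room the Pod stays Pending instead of falling back to a preemptible node.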

Soft Preference

A soft preference, on the other hand, only tells Kubernetes what we would prefer. For example, Google Compute Engine gives no guarantee that preemptible VMs will always be available. For our use case we just want to tell Kubernetes something like “Please schedule these pods on nodes with the cloud.google.com/gke-preemptible label if you can. If not, it doesn’t matter.” Here is how to modify your Pod or Deployment spec to achieve this using preferredDuringSchedulingIgnoredDuringExecution:

affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - preference:
        matchExpressions:
        - key: cloud.google.com/gke-preemptible
          operator: Exists
      weight: 100
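
For a Deployment, the same block goes one level deeper, under spec.template.spec. Below is a minimal sketch of what that could look like for a stateless service. The Deployment name, labels, and image are hypothetical placeholders, and the replica count of two simply mirrors the "at least two pods" setup described earlier.

```yaml
# Hypothetical Deployment: names, labels, and image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stateless-service
spec:
  replicas: 2  # at least two pods, so losing one to preemption is tolerable
  selector:
    matchLabels:
      app: stateless-service
  template:
    metadata:
      labels:
        app: stateless-service
    spec:
      affinity:
        nodeAffinity:
          # Soft rule: prefer preemptible nodes, fall back to regular ones.
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100  # allowed range is 1-100; higher means stronger preference
            preference:
              matchExpressions:
              - key: cloud.google.com/gke-preemptible
                operator: Exists
      containers:
      - name: stateless-service
        image: example.com/stateless-service:latest  # placeholder image
```

Because this is only a preference, the scheduler can still place these pods on regular nodes when no preemptible capacity is available.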

Results

We have such soft preferences for our 18 stateless microservices and hard preferences for the two special cases described above. And the results are pretty impressive! On average, 75% of our production workload runs on preemptible nodes, which saves 50% on our GCP bills.
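
As a rough sanity check on these numbers, here is a back-of-the-envelope estimate. It rests on two simplifying assumptions that are not measured facts: that the bill is dominated by node compute cost, and that the share of workload on preemptible nodes equals the share of regular-priced compute spend moved to them. The symbols f (preemptible share), d (preemptible discount), and s (overall saving) are introduced here purely for illustration.

```latex
s = f \cdot d, \qquad
s_{\max} = 0.75 \times 0.80 = 0.60, \qquad
d_{\text{implied}} = \frac{0.50}{0.75} \approx 0.67
```

Under these assumptions, the 50% saving sits below the 60% ceiling set by the "up to 80%" discount and would correspond to an effective discount of roughly two thirds.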

Please follow us on Twitter, and if you have any questions, don’t hesitate to ask!