Using preemptible instances in Google Kubernetes Engine

Rohan Jaswal
Pasarpolis Product + Tech
4 min read · Jun 19, 2020

As your organization grows, your cloud infrastructure grows with it, and so do your bills from your cloud vendors. The same happened to us at Pasarpolis. To reduce our Google Cloud bill, we took several measures, and using preemptible VM instances was one of them.

In this post, I will be sharing how we are using preemptible instances in our Kubernetes cluster.

We use Kubernetes to host our various microservices. The majority of our applications can be divided into three components:
1. API deployment
2. Celery workers to process asynchronous tasks
3. Pub/Sub subscriber clients to receive messages from other services

Before I explain how we have utilized preemptible instances in our Kubernetes clusters, you should know about the preemptible instances offered by Google Cloud. You can read more about them at https://cloud.google.com/kubernetes-engine/docs/how-to/preemptible-vms

Given the nature of preemptible instances, we decided to use them for deploying the “asynchronous” parts of our applications: our Celery workers and subscriber clients. These applications work on a “pull” mechanism; for example, a Celery worker pulls tasks from the broker queue, and a subscriber client pulls messages from the Pub/Sub server. Other components, like the API deployments, work on a “push” mechanism: tasks, requests, etc. are pushed to them by clients.

You can afford some downtime in the pull-based asynchronous components, but not in the end-user-facing API deployments.
Note: I will be using the term “asynchronous” to refer to these components in this post.

Node Pool Configuration:

We added two new node pools to our k8s clusters. Let’s call them the following:
1. preemptible-pool
2. preemption-fallback-pool

As the names suggest, preemptible-pool contains preemptible nodes, and preemption-fallback-pool contains regular nodes that act as a fallback when preemptible resources are not available. Both node pools contain nodes with the same CPU and memory configuration.

We enabled autoscaling on both node pools. Under normal conditions, the fallback pool should have zero nodes and all pods should be deployed on the preemptible-pool. The fallback pool should only scale up when preemptible resources are not available.
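
For reference, node pools like these can be created with gcloud. The commands below are only a sketch: the pool names match this post, while the cluster name, machine type and autoscaling limits are placeholders (location flags are omitted).

# Preemptible pool: autoscaled, can shrink back to zero nodes
gcloud container node-pools create preemptible-pool \
    --cluster=my-cluster \
    --preemptible \
    --machine-type=n1-standard-4 \
    --enable-autoscaling --min-nodes=0 --max-nodes=10

# Fallback pool: regular nodes with the same machine type, normally at zero nodes
gcloud container node-pools create preemption-fallback-pool \
    --cluster=my-cluster \
    --machine-type=n1-standard-4 \
    --enable-autoscaling --min-nodes=0 --max-nodes=10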

Pod Scheduling:

During and after the creation of these node pools, the most important part is the scheduling configuration of your pods. The following things needed to be taken care of:

1. “asynchronous” pods should only be deployed on our new nodes and not on any other nodes.
2. No pods other than these “asynchronous” pods should be deployed on our new nodes.

Node taints:

In order to fulfill the above conditions, we first have to taint our new nodes. We added a taint with the following configuration:

key: pp-preemptible
value: "true"
effect: NoSchedule
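
One GKE-specific caveat: taints added to individual nodes with kubectl are not applied to nodes the cluster autoscaler creates later, so for autoscaled pools like ours the taint is better supplied at pool-creation time with --node-taints. A sketch extending the command above, with the same placeholders (the fallback pool gets the same flag, just without --preemptible):

gcloud container node-pools create preemptible-pool \
    --cluster=my-cluster \
    --preemptible \
    --machine-type=n1-standard-4 \
    --enable-autoscaling --min-nodes=0 --max-nodes=10 \
    --node-taints=pp-preemptible=true:NoSchedule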

This taint tells k8s not to schedule any pods on these nodes. But we do want our “asynchronous” pods to be scheduled there, so we have to add a “toleration” to the pod spec.

Adding tolerations for taints

To add a toleration, the pod spec should contain:

tolerations:
- key: "pp-preemptible"
  operator: "Equal"
  value: "true"
  effect: "NoSchedule"

With the configuration so far, we have ensured that no pods other than our “asynchronous” pods will be scheduled on these nodes. The toleration allows the “asynchronous” pods onto the tainted nodes, but it does not yet force them to land there.

K8s labels for “asynchronous” pods:

Now, to ensure that our “asynchronous” pods get scheduled only on these new nodes, we added a Kubernetes label to the nodes in both new pools:

key: application-type
value: async
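
As a sketch of how this label can be applied to the nodes in both new pools (GKE attaches a cloud.google.com/gke-nodepool label to every node, so it can be used as a selector):

# Label the existing nodes in both new pools; nodes added later by the
# autoscaler will NOT inherit this, so on GKE it is safer to set the label
# at pool-creation time with --node-labels=application-type=async
kubectl label nodes -l cloud.google.com/gke-nodepool=preemptible-pool application-type=async
kubectl label nodes -l cloud.google.com/gke-nodepool=preemption-fallback-pool application-type=async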

Node affinity spec:

And then we updated “node affinity” configuration in the pod spec.

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: application-type
            operator: In
            values:
            - async

This will tell k8s to schedule our “asynchronous” pods on nodes carrying the label “application-type: async”, which exists only on our new nodes.

There is one more affinity rule that we need to add to our pod spec, which will tell k8s to prefer preemptible nodes for our “asynchronous” pods.

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: application-type
            operator: In
            values:
            - async
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: cloud.google.com/gke-preemptible
            operator: In
            values:
            - "true"

This will make our pods prefer preemptible nodes over the regular nodes from the preemption-fallback-pool.

The final pod spec looks like this:

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: application-type
            operator: In
            values:
            - async
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: cloud.google.com/gke-preemptible
            operator: In
            values:
            - "true"
  tolerations:
  - key: "pp-preemptible"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"

So, with this configuration, we were able to transfer some of our load to preemptible nodes.
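
To sanity-check the setup, a couple of kubectl queries show where the “asynchronous” pods actually landed; a minimal sketch:

# Nodes currently dedicated to the asynchronous workloads
kubectl get nodes -l application-type=async

# The NODE column shows whether each pod ended up on a preemptible node
# or on a node from the fallback pool
kubectl get pods -o wide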

In another post, I will be sharing how we do the following two things:

1. Ensure graceful termination of pods on preemptible nodes.
2. Shift pods back to preemptible nodes from the fallback pool.

Stay tuned and feel free to share your comments, suggestions and ask any questions.
