How we reduced our ML cluster cost by 60% with K8s

Thaworn Kangwansinghanat
Graffity Technologies
3 min read · Oct 18, 2021

Intro

Running a GPU cluster for ML jobs on On-Demand instances can burn through all your funding, especially for a seed-stage startup like us. So we need to optimize every dollar we pay while still serving our needs.

In this post, we’ll walk through the concepts and guidelines we used to run our Machine Learning cluster cost-effectively.

Concepts

The concept is to set up K8s with an On-Demand instance for the managed node group and GPU Spot instances (Preemptible VMs on GCP) as worker nodes. Your compute nodes then stay in Spot pools, saving up to 90% of your cost compared to On-Demand.

However, your Spot instances can be terminated at any time. That’s where K8s comes in: it automates interruption handling, provisioning, and autoscaling of Spot instances for your workload. Once you complete this setup, you can leave the Ops work to K8s.

At least one On-Demand instance for the managed node group, with GPU Spot instances as worker nodes.
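
To make that layout concrete, here is a minimal sketch of it as an eksctl config. The cluster name, region, instance types, sizes, and labels are placeholders for illustration, not our actual configuration:

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: ml-cluster          # placeholder cluster name
  region: ap-southeast-1    # placeholder region

managedNodeGroups:
  # Small On-Demand group that hosts cluster-level pods such as the Cluster Autoscaler.
  - name: system-on-demand
    instanceType: m5.large
    minSize: 1
    maxSize: 2
    desiredCapacity: 1
    labels:
      lifecycle: on-demand

  # GPU Spot group that runs the ML jobs; instance-type choices are discussed in the Guidelines.
  - name: gpu-spot
    instanceTypes: ["g4dn.xlarge", "g4dn.2xlarge"]
    spot: true
    minSize: 1
    maxSize: 8
    desiredCapacity: 1
    labels:
      lifecycle: spot
```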

The key component is the Cluster Autoscaler, which can be provisioned as a single-pod Deployment on an On-Demand instance. It manages scaling activities by changing the Auto Scaling group’s DesiredCapacity and by terminating instances directly.
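
As a rough sketch, the Deployment below pins the Cluster Autoscaler to the On-Demand group via the placeholder lifecycle: on-demand label from the config above. The ServiceAccount, RBAC, and most flags are omitted, and the cluster name ml-cluster is again a placeholder:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      nodeSelector:
        lifecycle: on-demand      # keep the autoscaler itself off Spot nodes
      containers:
        - name: cluster-autoscaler
          image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.21.0
          command:
            - ./cluster-autoscaler
            - --cloud-provider=aws
            - --balance-similar-node-groups
            - --expander=least-waste
            # Discover the cluster's Auto Scaling groups by their tags.
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/ml-cluster
```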

About Spot Instances (or Preemptible VM)

Spot instances are spare, unused standard VMs suited to stateless, fault-tolerant applications. Compared to On-Demand instances, Spots are usually available at a 60–90% discount. However, if the provider wants to reclaim those resources for other use, these instances can be terminated on very short notice (a two-minute warning on AWS, 30 seconds on GCP).

Read more: AWS EC2 Spot and GCP Preemptible VM

Guidelines

The main goal of this concept is to make sure you can draw on enough Spot capacity (Spot pools) to reduce interruptions and decrease provisioning time.

Spot Pools = (Availability Zones) * (Instance Types)
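
For example, allowing four instance types across three Availability Zones gives 3 × 4 = 12 Spot pools the autoscaler can draw from when one pool runs dry.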

AWS recommends picking instances of the same size for each node group, for example with a 1:4 vCPU-to-memory ratio:

  • 4vCPU / 16GB Node Group : m5.xlarge, m5d.xlarge, m5n.xlarge, m5dn.xlarge
  • 8vCPU / 32GB Node Group : m5.2xlarge, m5d.2xlarge, m5n.2xlarge, m5dn.2xlarge, m5a.2xlarge, m4.2xlarge

But for a GPU cluster, there aren’t nearly as many instance types to shop from. So, based on our observations, we found that grouping instances by GPU type works best for us. For example (see the sketch after this list):

  • NVIDIA T4 Node Group : g4dn.xlarge, g4dn.2xlarge, g4dn.4xlarge, g4dn.8xlarge
  • NVIDIA V100 Node Group : p3.2xlarge, p3.8xlarge, p3.16xlarge
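
In eksctl terms, the NVIDIA T4 group above would simply list all of those sizes in a single node group; this is a sketch with placeholder sizing and labels, and the V100 group would look analogous:

```yaml
managedNodeGroups:
  - name: gpu-t4-spot
    # Same GPU (NVIDIA T4) in several sizes: each extra size and AZ deepens the Spot pools.
    instanceTypes: ["g4dn.xlarge", "g4dn.2xlarge", "g4dn.4xlarge", "g4dn.8xlarge"]
    spot: true
    minSize: 1
    maxSize: 8
    desiredCapacity: 1
    labels:
      lifecycle: spot
```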

For a hands-on version, you can follow the “Walkthrough” step below to try this concept on your own.

Fault-Tolerant vs. High Availability

Last but not least, you can trade lower cost against higher cluster availability, depending on whether your requirements call for a fault-tolerant or a high-availability cluster.

To achieve either, you can tune the three variables that control how many instances a node group reserves: minSize, maxSize, and desiredCapacity.

Example of reserving Spot instances for each node group
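
Since the example image isn’t reproduced here, the snippet below is a rough sketch of how those three settings could differ between a cost-first (fault-tolerant) group and an availability-first group; the numbers are purely illustrative:

```yaml
# Cost-first / fault-tolerant: reserve as little as possible; jobs may wait
# for a fresh Spot node after an interruption.
- name: gpu-spot-fault-tolerant
  instanceTypes: ["g4dn.xlarge", "g4dn.2xlarge"]
  spot: true
  minSize: 1
  maxSize: 10
  desiredCapacity: 1

# Availability-first: keep more capacity warm so workloads reschedule quickly.
- name: gpu-spot-high-availability
  instanceTypes: ["g4dn.xlarge", "g4dn.2xlarge"]
  spot: true
  minSize: 3
  maxSize: 10
  desiredCapacity: 3
```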

Our observation: we found that GPU Spot instances on AWS are not terminated as often as CPU Spot instances.

This post keeps the ideas and guidelines brief to stay readable, so feel free to contact me for further information.

About us

We’re a tech startup based in Southeast Asia. We build an AR Cloud Platform using our VPS technologies to make the Metaverse happen in the real world.
