Turbocharge Kubernetes Scaling: Autoscaling with CPA and Karpenter

Shubham Tanwar
Deutsche Telekom Digital Labs
4 min read · Sep 10, 2024

In a Kubernetes cluster, having the ability to quickly scale up resources can be crucial for maintaining application performance, especially under fluctuating workloads. One strategy to ensure that your cluster is always ready to handle an increase in demand is through overprovisioning. In this blog, we'll explore how to implement overprovisioning using Karpenter for autoscaling, coupled with the Cluster Proportional Autoscaler (CPA), to efficiently manage resources, including spot instances.

What is Overprovisioning?

Overprovisioning in Kubernetes involves running placeholder pods that reserve extra resources on a node, ensuring that there is always spare capacity available. When real workloads spike, these placeholders are evicted, allowing the cluster to quickly accommodate new pods without waiting for new nodes to be provisioned.

Why Use Overprovisioning?

- Faster Scaling: Overprovisioning ensures that there is always a node with spare capacity, reducing the time it takes to scale out additional pods.
- Proactive Resource Allocation: By reserving space ahead of time, overprovisioning enables your cluster to handle unexpected spikes in demand more gracefully.
- Flexibility with Spot Instances: With tools like Karpenter, you can leverage cost-effective spot instances while still ensuring high availability through overprovisioning.

Karpenter Installation

Karpenter installation and the spot/on-demand deployment split are covered in an earlier post:

https://medium.com/@shubham.foss/karpenter-magic-deployments-splitting-spot-on-demand-2e0edf1df8de

Implementing Overprovisioning with Karpenter and Cluster Proportional Autoscaler

Step 1: Configure Overprovisioning with PriorityClass

The first step is to set up a PriorityClass that will be used by the placeholder pods. This class ensures that placeholder pods are the first to be evicted when the cluster needs to free up resources for higher-priority pods.

Here’s a sample configuration for creating a PriorityClass:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -1
globalDefault: false
description: 'Priority class used by overprovisioning.'

This `PriorityClass` assigns a value of `-1`, making these pods the lowest priority in your cluster.

Step 2: Deploy the Placeholder Pod

Next, you’ll create a Deployment that runs a placeholder pod using the `overprovisioning` PriorityClass. This pod will reserve resources but will not perform any actual work.

Here’s an example deployment configuration:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
  namespace: test-shared
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: overprovisioning
  template:
    metadata:
      labels:
        app: overprovisioning
    spec:
      priorityClassName: overprovisioning
      containers:
        - image: busybox:latest
          name: overprovisioning
          command: ["/bin/sh", "-ec", "while :; do echo '.'; sleep 5 ; done"]
          resources:
            requests:
              memory: "12000Mi"
              cpu: "3.5"
            limits:
              memory: "14000Mi"
              cpu: "4"
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: karpenter.sh/capacity-type
                    operator: In
                    values:
                      - spot
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - overprovisioning
              # topologyKey is required for a hard pod anti-affinity rule;
              # hostname spreads replicas across nodes
              topologyKey: kubernetes.io/hostname

The required node affinity on `karpenter.sh/capacity-type: spot` schedules the placeholder pods onto spot nodes, while the pod anti-affinity keeps each replica on a separate node so the reserved capacity is spread across the cluster.
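For the affinity above to be satisfiable, Karpenter must have a node pool that allows spot capacity. A minimal sketch using the Karpenter v1 `NodePool` API is shown below; the pool name and the `EC2NodeClass` it references are illustrative and should match your own Karpenter setup:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot                      # illustrative name
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]        # allow only spot capacity in this pool
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default             # replace with your EC2NodeClass
```

Older Karpenter releases use the `Provisioner` CRD instead; the `karpenter.sh/capacity-type` requirement works the same way there.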

Step 3: Integrate with Cluster Proportional Autoscaler

The Cluster Proportional Autoscaler (CPA) dynamically adjusts the number of replicas for deployments based on the cluster size. This tool is particularly useful for services that need to scale proportionally with the number of nodes or cores in your cluster, such as DNS services.

You can install CPA using Helm:

helm repo add cluster-proportional-autoscaler https://kubernetes-sigs.github.io/cluster-proportional-autoscaler
helm repo update
helm upgrade --install cluster-proportional-autoscaler cluster-proportional-autoscaler/cluster-proportional-autoscaler --values <name_of_your_values_file>.yaml
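A minimal values file for the chart might look like the sketch below. The `config` and `options` keys follow the chart's README, and the target and namespace assume the overprovisioning Deployment from Step 2; verify the exact key names against your chart version:

```yaml
config:
  ladder:
    coresToReplicas:
      - [ 1, 1 ]
      - [ 64, 3 ]
    nodesToReplicas:
      - [ 1, 1 ]
      - [ 10, 2 ]
options:
  namespace: test-shared                # where the placeholder Deployment runs
  target: deployment/overprovisioning   # workload whose replica count CPA manages
```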

Step 4: CPA Configuration for Ladder Scaling

CPA supports different scaling modes. Here’s how you can configure it for ladder scaling based on cores and nodes:

ladder: |-
  {
    "coresToReplicas": [
      [ 1, 1 ],
      [ 64, 3 ],
      [ 512, 5 ],
      [ 1024, 7 ],
      [ 2048, 10 ],
      [ 4096, 15 ]
    ],
    "nodesToReplicas": [
      [ 1, 1 ],
      [ 5, 1 ],
      [ 10, 2 ],
      [ 15, 3 ],
      [ 20, 4 ]
    ]
  }

This configuration steps the replica count up as the cluster grows. CPA looks up the current core count and node count in their respective ladders and uses whichever yields more replicas. For example, a cluster with 100 cores and 12 nodes maps to 3 replicas from `coresToReplicas` and 2 from `nodesToReplicas`, so CPA runs 3 replicas.

Step 5: Using Karpenter for Cost-Efficient Scaling

Karpenter is an open-source Kubernetes node provisioning system that automatically adjusts the compute capacity of your cluster. By integrating Karpenter with overprovisioning, you can use spot instances to minimize costs while maintaining the ability to quickly scale up resources.

Step 6: Managing Downscaling

When your cluster's workload decreases, CPA and Karpenter downscale together: CPA polls the cluster (every 10 seconds by default) and lowers the placeholder replica count as the node and core counts step down the ladder, and Karpenter's consolidation then removes nodes that become empty or underutilized.
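On the Karpenter side, node removal is governed by the NodePool's disruption settings. A sketch under the v1 API (older releases configure this via a `consolidation` block on the Provisioner instead) might look like:

```yaml
# Fragment of a NodePool spec (Karpenter v1 API; values are illustrative)
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized  # reclaim empty or underutilized nodes
    consolidateAfter: 30s                          # settle time before consolidating
```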

Conclusion

Overprovisioning with Karpenter and Cluster Proportional Autoscaler is a powerful strategy for maintaining high availability and rapid scaling in Kubernetes. By reserving resources in advance and leveraging spot instances, you can balance cost with performance, ensuring that your cluster is always ready to handle unexpected spikes in demand. Whether you’re running a high-traffic application or preparing for unpredictable workloads, this approach provides a robust foundation for Kubernetes resource management.

By integrating KEDA, Cluster Proportional Autoscaler (CPA), and Karpenter in your Kubernetes environment, you can revolutionize your scaling process, making it not just faster but more cost-efficient and reliable. Previously, if a node wasn't available, it took 6 minutes to get a pod running: 5 minutes to add a new node and 1 minute to schedule the pod. With Karpenter, we've cut node provisioning time from 5 minutes to just 1, bringing total pod scheduling time down to an impressive 2 minutes.

This setup allows us to scale aggressively without overspending. By harnessing spot instances, we've cut our infrastructure costs by 60%. Splitting deployments 50% on-demand and 50% spot strikes a balance between cost-efficiency and system reliability, keeping our platform resilient to spot disruptions.

We’ve also enhanced overprovisioning with CPA using a ladder approach, which keeps spare nodes ready for immediate pod scheduling. By carefully calculating the nodes to overprovision relative to our current running capacity, we ensure that scaling is always fast and efficient.

This powerful combination of Karpenter, CPA, and KEDA enables us to scale rapidly, optimize resources, ensure high availability, and maintain an aggressive yet cost-effective strategy that drives both performance and reliability.

Acknowledgments

A sincere thank you to Abhishek Srivastava, Akshay Sharma, and Randhir Thakur for playing pivotal roles in guiding this work and presenting the problem statements. Abhishek's leadership, combined with Akshay's and Randhir's invaluable support, enriched the content with their expertise.

For more detailed information and examples,
refer to the [Cluster Proportional Autoscaler GitHub repository](https://github.com/kubernetes-sigs/cluster-proportional-autoscaler/tree/master/charts/cluster-proportional-autoscaler).
