Time Based Scaling for Kubernetes Deployments

Armanit Garg
Conversation Intelligence
4 min read · Mar 19, 2020


At Symbl, as we worked to support our growing customer base while staying cost efficient, we ran into the problem of scaling our services only during peak usage hours.

The Problem

We currently run a set of machine learning microservices as Kubernetes Deployments. They are all very compute intensive and have to be scaled on the fly when extra load arrives, which we do with HorizontalPodAutoscalers (HPAs) that scale based on CPU utilization. This tends to fail when we hit our peak hours: the service pods take a long time to boot up, and connections get dropped while they do.
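
For context, the kind of CPU-based HPA described above looks roughly like this. This is a sketch, not our production config: the names, thresholds, and replica counts are illustrative.

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: service
  # minReplicas is the field our CronJobs will patch later in this post
  minReplicas: 3
  maxReplicas: 20
  targetCPUUtilizationPercentage: 70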

We could fix this simply by setting the HPA's minimum pod count high enough to handle peak traffic. However, the costs add up, since those extra pods sit unused for most of the day and over the weekend. The approach we decided to explore was to scale up only during peak hours.

How did we fix this?

TL;DR: a Kubernetes CronJob and kubectl patch

Since we can’t directly change the Deployment’s replica count (the final scaling decision lies with the HPA), we decided to create a Kubernetes CronJob that patches the HPA’s minReplicas value instead. This way Kubernetes still decides how to scale the Deployment as traffic arrives. Additionally, by restricting ourselves to modifying only the minimum replica count, we ensure the HPA doesn’t accidentally kill running pods, which could happen if we modified the maxReplicas requirement instead.

We did this by having the CronJob run a Docker image that bundles gcloud and kubectl and accepts a kubectl command as an entrypoint argument. The image contains a bash script that validates the service account credentials and configures kubectl to access our GKE cluster; we pass these credentials to the script through a Kubernetes Secret (shown in the manifests below). The script then executes the kubectl command passed to it as an argument.

FROM google/cloud-sdk:alpine
# The alpine cloud-sdk image does not ship kubectl; install it explicitly
RUN gcloud components install kubectl
WORKDIR /app
COPY . .
# execute.sh configures kubectl for our cluster, then runs the command passed as an argument
ENTRYPOINT ["./execute.sh"]
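
The script itself isn’t included in this post, but a minimal sketch of what execute.sh could look like is below. Anything beyond what’s described above is an assumption: the base64 encoding of the key and the CLUSTER_NAME, CLUSTER_ZONE, and PROJECT_ID variables are illustrative.

#!/bin/sh
# Hypothetical sketch of execute.sh; the real script is not shown in this post.
# Assumes SERVICE_ACCOUNT_KEY holds a base64-encoded service account JSON key,
# and that CLUSTER_NAME, CLUSTER_ZONE and PROJECT_ID are provided as env vars.
set -e

# Restore the key file and authenticate gcloud with the service account
echo "$SERVICE_ACCOUNT_KEY" | base64 -d > /tmp/key.json
gcloud auth activate-service-account --key-file=/tmp/key.json

# Fetch credentials for the GKE cluster so kubectl can talk to it
gcloud container clusters get-credentials "$CLUSTER_NAME" \
  --zone "$CLUSTER_ZONE" --project "$PROJECT_ID"

# Run the kubectl command that was passed to the container as its argument
exec sh -c "$1"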

For each of our services we now define two separate CronJobs: one to scale the service up and one to scale it back down. Each CronJob is passed a kubectl patch command for that service’s HPA that patches only the minimum replica count.

Scale-Up Job:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: service-scale-up-job
  namespace: scaling-jobs
spec:
  schedule: "0 17 * * 1-5"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: service-scheduled-job
              image: service-scheduled-job:latest
              args:
                - "kubectl patch hpa service-hpa --patch '{\"spec\":{\"minReplicas\":15}}'"
              env:
                - name: "SERVICE_ACCOUNT_KEY"
                  valueFrom:
                    secretKeyRef:
                      key: "SERVICE_ACCOUNT_KEY"
                      name: "scale-job-secret"
          # Job pod templates require an explicit restart policy (Never or OnFailure)
          restartPolicy: OnFailure

Scale-Down Job:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: service-scale-down-job
  namespace: scaling-jobs
spec:
  schedule: "0 22 * * 1-5"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: service-scheduled-job
              image: service-scheduled-job:latest
              args:
                - "kubectl patch hpa service-hpa --patch '{\"spec\":{\"minReplicas\":3}}'"
              env:
                - name: "SERVICE_ACCOUNT_KEY"
                  valueFrom:
                    secretKeyRef:
                      key: "SERVICE_ACCOUNT_KEY"
                      name: "scale-job-secret"
          # Job pod templates require an explicit restart policy (Never or OnFailure)
          restartPolicy: OnFailure

All CronJob schedule times are interpreted in the timezone of the kube-controller-manager (more on that here). GKE’s master follows UTC, so our cron schedules were adjusted so that the jobs run at 9 AM and 2 PM CST. We also avoid scaling over the weekend by specifying 1-5 (Monday to Friday) in the crontab expression.
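
Once the jobs are in place, a quick spot check confirms the patches are landing. The HPA name and namespace below match the manifests above; the generated job name is illustrative.

# List the scheduled jobs and when they last ran
kubectl get cronjobs -n scaling-jobs

# MINPODS should show 15 after the scale-up job and 3 after the scale-down job
kubectl get hpa service-hpa

# Inspect the output of a specific run (the generated job name will vary)
kubectl logs -n scaling-jobs job/service-scale-up-job-1584638100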

To add more visibility into this scaling change, we also modified the Deployments with container lifecycle hooks that notify our Slack channel, so we can keep track of whether a CronJob run failed. We also added a PodDisruptionBudget for each service to ensure at least one pod is always maintained, guarding against the entire Deployment being evicted at once (for example during node drains).
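
A minimal sketch of such a PodDisruptionBudget, assuming the service pods carry an app: service label (both the name and the selector are illustrative):

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: service-pdb
spec:
  # Never let voluntary disruptions (e.g. node drains) take the last pod
  minAvailable: 1
  selector:
    matchLabels:
      app: service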

Finally, we ran tests under load to see how patching the HPA affects the service. The scenarios we ran were:

  • Running the scale-up CronJob under enough load that new nodes had to be spun up in the Kubernetes cluster. In every case the HPA had already kicked in and scaled the deployment appropriately before the CronJob even patched it.
  • Running the scale-down CronJob under load. Here the HPA does not scale the deployment down until the load has passed, even after the CronJob has patched its minReplicas value.

In all cases the CronJob’s patch interfered minimally with the HPA and still gave us the desired replica count.

Other Approach

Kubernetes supports feeding custom metrics to HPAs for scaling. With this, we could write a custom metric service that exposes a time-based metric through the Kubernetes metrics APIs, and the HPA would scale to the appropriate number of pods whenever that service raises the target metric. Unfortunately, the HPA only lets us express the target as a ratio of the custom metric rather than as a desired pod count, so writing an algorithm to map time of day onto that metric felt hacky and unreliable, and we decided not to follow this approach.
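
To illustrate the mechanism, an HPA driven by a custom metric looks roughly like this. This is a generic sketch, not something we deployed: desired_capacity is a hypothetical metric that a custom metrics adapter would have to expose.

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: service
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Pods
      pods:
        metric:
          name: desired_capacity   # hypothetical time-based metric
        target:
          # Replicas are derived from the ratio of the observed average to this target,
          # not set directly, which is the limitation described above
          type: AverageValue
          averageValue: "1"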

Conclusion

Ultimately, the approach that worked best for us was to keep the problem as simple as possible and use what we already had. In the future we might look into turning this flow into an open-source CRD.

What are other approaches you have come across for solving this problem? Happy to chat!
