Taming the Ebb and Flow: Cost-Efficient Time-Based Autoscaling

Aditya Bennur · Published in engineering-udaan · Aug 1, 2024

The Challenge: Predictable Peaks and Unpredictable Costs

At Udaan, we operate a complex microservices architecture with more than 200 Kubernetes deployments. Analysis of our application traffic patterns reveals distinct “ebbs” and “flows”.

During business hours, our servers are bustling city squares; by night, they’re quiet ghost towns. This cyclical nature of our workloads presented an opportunity to optimize compute resource utilization and reduce operational costs.

[Figure: requests per second (y) over time (x), showing the daily traffic cycle]

Primer: Autoscaling

Kubernetes’ autoscaling capabilities play a pivotal role in its widespread adoption across enterprises. Among the autoscaling methods available for microservices, such as Vertical Pod Autoscaling (VPA), Horizontal Pod Autoscaling (HPA), and Event-Driven Autoscaling, HPA stands out as the most commonly adopted solution.

HPA dynamically adjusts the number of replicas for a Deployment or ReplicaSet based on metrics such as CPU and memory utilization, or custom metrics. This scaling is reactive: it responds to incoming traffic as it arrives, with no notion of a predefined schedule.
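
For context, a typical HPA object looks something like the sketch below; the deployment name and thresholds are illustrative, not taken from our setup. It keeps the Deployment between a floor (minReplicas) and a ceiling (maxReplicas), scaling on average CPU utilization.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: orders-service            # illustrative deployment name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: orders-service
  minReplicas: 2                  # floor the HPA never goes below
  maxReplicas: 14                 # ceiling under peak traffic
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # scale out when average CPU crosses 70%
```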

In contrast, time-based scaling lets applications scale according to user-defined schedules, which is advantageous for workloads with predictable traffic patterns.

While HPA excels in handling dynamic and unpredictable traffic, it may not be the ideal choice for environments where traffic patterns are well-understood and predictable. Time-based autoscaling empowers organizations to efficiently scale resources up or down ahead of expected demand.

Approach: Time-Driven Scaling

After evaluating the available autoscaling solutions against our requirements and constraints, we opted to implement time-based autoscaling. Enter “Service Scaler”, a home-grown Kubernetes operator that proactively monitors and controls the HPA object of a corresponding deployment, scaling workloads gradually according to a specified configuration.

The Configuration (CRD)

[Figure: ServiceScaler CRD definition]

Time can be specified in the following formats:

  • ZonedTime: HH:MM<tz-offset>
  • ZonedDateTime: RFC 3339 format, e.g. 2023-01-11T08:00:00+05:30

If no time range matches, the default configuration is applied.
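
As a rough sketch of how these appear in a ServiceScaler object (the field names below are assumptions for illustration, not the published schema), a schedule can mix both formats alongside the default:

```yaml
# Sketch only: field names are assumptions, not the actual ServiceScaler schema.
schedules:
  - start: "08:00+05:30"                   # ZonedTime
    end: "16:00+05:30"
    minReplicas: 10
  - start: "2023-01-11T08:00:00+05:30"     # ZonedDateTime (RFC 3339)
    end: "2023-01-11T20:00:00+05:30"
    minReplicas: 14
default:
  minReplicas: 2                           # used when no time range matches
```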

Operator “Mechanics”

Let’s explore the inner workings of the Service Scaler through visual representations.

[Figure: Service Scaler state machine]
[Figure: Service Scaler architecture]

Operational Overview

  • The Controller monitors and reacts to three types of events on ServiceScaler resources:
    1. Create
    2. Update
    3. Delete
  • Reconciliation:
    1. Verifies if the “time-range” matches.
    2. Assesses “early-exit” conditions.
    3. Executes HPA operations accordingly.
  • Given the dependency on the “time” dimension, reconciliation is forced every 5 minutes to maintain the desired state of the HPA consistently, even when Create/Update/Delete events are not triggered.
  • Invalid HPAs (HPAs with minReplicas == maxReplicas) are automatically removed, preserving configuration sanity.

Ramp-up/ramp-down “mechanics”

Scaling activities commence 30 minutes in advance of both anticipated traffic increases and decreases, ensuring systems are primed and ready.

[Figure: ramp-up/ramp-down module]

Enough talk! Let’s look at a scenario where the Service Scaler begins scaling down from 14 to 2 replicas ahead of midnight (00:00), so that the deployment has already settled at 2 replicas, ready to handle the lower traffic expected at the start of the day.

[Figure: replica count (y) over time (x) during the ramp-down]

Kill Switch

For those rare instances when things might not go as planned, we’ve built in a manual override. By adding a simple annotation to the HPA, the Service Scaler can be bypassed, putting control back in the hands of our operators.

service-scaler.kubernetes.io/managed: "false"
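
In manifest form, that means carrying the annotation on the HPA object itself. The fragment below (HPA name illustrative, spec omitted for brevity) shows where it sits:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: orders-service                      # illustrative HPA name
  annotations:
    # With this annotation present, the Service Scaler leaves the HPA untouched.
    service-scaler.kubernetes.io/managed: "false"
```

The same override can also be applied to a running object on the fly with kubectl annotate.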

Game Plan: patch “minReplicas”

With the ServiceScaler framework in place, we proceeded to adjust the minReplicas field of the Horizontal Pod Autoscaler (HPA) based on the time of day for 200+ deployments. We divided the 24-hour period into three distinct intervals:

  1. Morning (8am — 4pm)
  2. Evening (4pm — 12am)
  3. Night (12am — 8am)

The minReplicas value for each interval was calculated statistically, proportional to the observed load patterns.
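
Tying it together, the per-deployment configuration ends up looking roughly like the sketch below; as before, the apiVersion, field names, and replica counts are illustrative assumptions rather than our actual values:

```yaml
# Illustrative sketch: apiVersion, field names, and counts are assumptions.
apiVersion: scaling.udaan.com/v1
kind: ServiceScaler
metadata:
  name: orders-service
spec:
  hpaRef: orders-service        # HPA whose minReplicas is patched
  default:
    minReplicas: 2
  schedules:
    - start: "08:00+05:30"      # Morning (8am - 4pm)
      end: "16:00+05:30"
      minReplicas: 10
    - start: "16:00+05:30"      # Evening (4pm - 12am)
      end: "00:00+05:30"
      minReplicas: 14
    - start: "00:00+05:30"      # Night (12am - 8am)
      end: "08:00+05:30"
      minReplicas: 2
```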

Aftermath: More Than Just Promising Numbers

We achieved a 25% reduction in VM costs whilst ensuring our infrastructure was breathing in sync with our business demands. Let’s take a look at some visuals portraying the reduction.

[Figure: node-count reduction]
[Figure: VM-count reduction]

We’ve open-sourced the Service Scaler, so be sure to give it a spin! Contributions and feedback are always welcome.
