Custom Kubernetes Scaling via Envoy Metrics

Louis Vernon
May 20 · 4 min read
[Figure: A selection of data products returned from our Raster service]

Scaling Challenges

Raster can be called directly from our Python client or via a RESTful API, and response times can vary significantly depending on the nature of the request.

[Figure: Raster request latencies at a random moment in time]
With such a varied mix of requests, CPU utilization was not a great indicator of load. At Descartes Labs we run Managed Istio (1.0.6) on GKE, which gave us access to a better signal to scale on.

Tapping into Envoy Metrics

Fortunately for us, Envoy, the sidecar proxy used by Istio, allows us to directly measure the current saturation of our Raster service. The Envoy docs describe the upstream_rq_active cluster statistic as a gauge of the total active requests to an upstream cluster.
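As a rough illustration (our own sketch, not code from the post), the same gauge can be read straight off the sidecar's admin endpoint, which serves stats in a simple plaintext format. The sample below parses that format rather than hitting a live proxy:

```python
import re

def parse_active_requests(stats_text, cluster):
    """Extract the upstream_rq_active gauge for one cluster from
    Envoy's plaintext /stats output."""
    pattern = rf"^cluster\.{re.escape(cluster)}\.upstream_rq_active: (\d+)$"
    match = re.search(pattern, stats_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None

# Sample lines in the format the Envoy admin endpoint returns
# (e.g. from curl localhost:15000/stats on the sidecar).
sample = """\
cluster.inbound|8000||raster-release.raster.svc.cluster.local.upstream_cx_active: 7
cluster.inbound|8000||raster-release.raster.svc.cluster.local.upstream_rq_active: 3
"""

print(parse_active_requests(
    sample, "inbound|8000||raster-release.raster.svc.cluster.local"))  # → 3
```

In practice Prometheus does this scraping for us, as described next.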

Prometheus Configuration

In Prometheus this metric is surfaced as envoy_cluster_upstream_rq_active
(active connections are similarly exposed via envoy_cluster_upstream_cx_active).

[Figure: Number of active requests to Raster as reported by Envoy's upstream_rq_active metric]
metric_relabel_configs:
- source_labels: [ cluster_name ]
  regex: '(outbound|inbound|prometheus_stats).*'
  action: drop
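Prometheus anchors relabel regexes, so a pattern must match the entire joined source-label value. A quick standalone sketch (the cluster name is taken from the HPA query below) confirms that a drop rule like the one above matches the inbound Raster cluster's series, which is worth keeping in mind when it appears in a default scrape config:

```python
import re

# Prometheus relabel regexes are fully anchored, i.e. the pattern
# must match the whole label value, not just a substring.
rule = re.compile(r"(?:outbound|inbound|prometheus_stats).*")

cluster_name = "inbound|8000||raster-release.raster.svc.cluster.local"
print(rule.fullmatch(cluster_name) is not None)  # → True: this series is dropped
```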

Implementing Our Custom Autoscaler

With the number of active requests per pod being tracked by Prometheus we could then implement our custom HPA:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: raster-release
  namespace: raster
  annotations:
    metric-config.object.raster-release-rq-active.prometheus/per-replica: "true"
    metric-config.object.raster-release-rq-active.prometheus/query: |
      sum(max_over_time(envoy_cluster_upstream_rq_active{app="raster",cluster_name="inbound|8000||raster-release.raster.svc.cluster.local",namespace="raster",stage="release"}[1m]))
spec:
  maxReplicas: 1500
  minReplicas: 12
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: raster-release
  metrics:
  - type: Object
    object:
      metricName: raster-release-rq-active
      target:
        apiVersion: v1
        kind: Pod
        name: raster-release # needed for schema consistency
      targetValue: 2
  • Tell the metrics adapter to normalize the metric with respect to the number of pods: per-replica: "true"
  • Provide a PromQL query that returns the sum of the maximum number of active requests per Raster pod over the last minute.
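To make the scaling arithmetic concrete, here is a small sketch (our own approximation, not the adapter's actual code) of the standard HPA ratio calculation applied to a per-replica metric with targetValue: 2:

```python
import math

def desired_replicas(current_replicas, total_active_requests,
                     target_per_replica=2):
    """Approximate HPA arithmetic for a per-replica Object metric:
    the adapter reports total/replicas, and the HPA scales by the
    ratio of that value to targetValue. The real controller also
    clamps the result to minReplicas/maxReplicas and applies
    tolerances and stabilization windows."""
    per_replica = total_active_requests / current_replicas
    return math.ceil(current_replicas * per_replica / target_per_replica)

# 100 pods carrying 300 active requests (~3 per pod) against a target of 2:
print(desired_replicas(100, 300))  # → 150, i.e. scale up by 50%
```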

Does It Work?

As shown below, our rq_active HPA roughly halved the requested resources. Even accounting for the delay introduced between Prometheus scraping Envoy and the metrics adapter querying Prometheus, we still get more responsive scaling than using CPU utilization, resulting in a lower 503 rate overall. We saw these trends continue once we applied our custom HPA in production.

[Figure: Requested cores for raster-release (using CPU scaling) vs raster-mirror (using rq_active)]

Final Thoughts

  • Istio and Envoy made collecting telemetry and safely testing with production traffic (via mirroring) simple.
  • We’re now rolling this methodology out to multiple services throughout our stack.
  • How often do you get to improve quality of service and reduce costs in the process?



Louis Vernon

Security, Operations and Site Reliability Engineering (SOS) at Descartes Labs.