Louis Vernon
May 20 · 4 min read
A selection of data products returned from our Raster service

The Descartes Labs Platform runs on Kubernetes and scales from hundreds to tens of thousands of cores in response to customer traffic.

Much of this load comes from our Tasks service, which allows users to scale out analytics models with high throughput access to petabytes of geospatial data. The bulk of the heavy lifting behind the retrieval, transformation, and delivery of this data is handled by our Raster service.

Scaling Challenges

Raster can be called directly from our Python client or via a RESTful API, and response times can vary significantly depending on the nature of the request.

Raster request latencies at a random moment in time
CPU utilization was not a great indicator of load

Our original approach to scaling Raster used a standard horizontal pod autoscaler (HPA) that tracked CPU utilization per pod. Unfortunately, variation in compute characteristics (requests could be I/O- or CPU-bound) made CPU utilization a poor indicator, and we needed a low threshold to stay ahead of the load.
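For context, the kind of CPU-based HPA we started with looks roughly like the sketch below (the threshold and replica bounds are illustrative, not our exact production values):

```yaml
# Sketch of a standard CPU-utilization HPA, the approach we moved away from.
# The low CPU target keeps capacity ahead of bursts, at the cost of over-provisioning.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: raster-release
  namespace: raster
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: raster-release
  minReplicas: 12
  maxReplicas: 1500
  targetCPUUtilizationPercentage: 50  # illustrative "low threshold"
```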

Variation in both the nature and duration of requests meant that scaling based on request rate was also not ideal.

At Descartes Labs we are using Managed Istio (1.0.6) on GKE.

We have been using Istio for a long time and took note of Istio metrics-based autoscaling, but these higher-level metrics (e.g., labeled request counts, durations, rates) were not an obvious fit for our service.

Tapping into Envoy Metrics

Fortunately for us, Envoy, the sidecar proxy used by Istio, allows us to directly measure current saturation of our Raster service. From the Envoy docs:

upstream_rq_active — Gauge — Total active requests

By summing the upstream_rq_active across all Raster pods we get an effective measure of how many requests are currently being handled by our service.

To allow Kubernetes to scale on this metric we installed the Zalando kube-metrics-adapter as packaged by Stefan Prodan. We could technically configure the metrics adapter to scrape this metric directly from Envoy using the JSON stats endpoint, but it made more sense to let our existing Prometheus infrastructure handle scraping and aggregation.

Prometheus Configuration

In Prometheus this metric is surfaced as: envoy_cluster_upstream_rq_active
(a related connection-level gauge, envoy_cluster_upstream_cx_active, is also exposed).

Number of active requests to Raster as reported by Envoy's upstream_rq_active metric

Unfortunately the Istio-bundled Prometheus configuration scrapes but then drops this metric. To retain it you must modify or remove the following metric_relabel_configs rule from the Prometheus config.

- source_labels: [ cluster_name ]
  regex: '(outbound|inbound|prometheus_stats).*'
  action: drop
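If you would rather not remove the rule wholesale, one option (a sketch; the __tmp_keep temporary label is our own invention, not part of the bundled config) is to exempt just the active-request gauge before the drop rule fires:

```yaml
# Mark the metric we want to keep, then only drop unmarked Envoy cluster metrics.
- source_labels: [ __name__ ]
  regex: 'envoy_cluster_upstream_rq_active'
  target_label: __tmp_keep
  replacement: 'true'
- source_labels: [ __tmp_keep, cluster_name ]
  separator: ';'
  regex: ';(outbound|inbound|prometheus_stats).*'  # __tmp_keep empty => not exempted
  action: drop
- regex: '__tmp_keep'
  action: labeldrop  # clean up the temporary label
```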

Implementing Our Custom Autoscaler

With the number of active requests per pod being tracked by Prometheus we could then implement our custom HPA:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: raster-release
  namespace: raster
  annotations:
    metric-config.object.raster-release-rq-active.prometheus/per-replica: "true"
    metric-config.object.raster-release-rq-active.prometheus/query: |
      sum(max_over_time(envoy_cluster_upstream_rq_active{app="raster",cluster_name="inbound|8000||raster-release.raster.svc.cluster.local",namespace="raster",stage="release"}[1m]))
spec:
  maxReplicas: 1500
  minReplicas: 12
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: raster-release
  metrics:
  - type: Object
    object:
      metricName: raster-release-rq-active
      target:
        apiVersion: v1
        kind: Pod
        name: raster-release # needed for schema consistency
      targetValue: 2

The critical part of this HPA config is the annotation block where we:

  • Provide a consistent label for the metric: raster-release-rq-active
  • Tell the metrics adapter to normalize the metric with respect to the number of pods: per-replica: "true"
  • Provide a PromQL query that returns the sum of the max number of active requests per Raster pod over the last minute.

Raster can handle four concurrent requests per pod, so we set the targetValue to two active requests per pod.

In our small-scale testing we found that simply taking the sum of envoy_cluster_upstream_rq_active yielded accurate numbers, but when we tested with production traffic (yay for Istio traffic mirroring!) and large numbers of pods, we needed to use a window of at least one minute to get consistent numbers.
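Concretely, the difference is just the windowing in the PromQL (label matchers trimmed to app alone here for brevity):

```promql
# Instantaneous sum — accurate at small scale, noisy with many pods:
sum(envoy_cluster_upstream_rq_active{app="raster"})

# Per-pod maximum over the last minute, then summed — what the HPA scales on:
sum(max_over_time(envoy_cluster_upstream_rq_active{app="raster"}[1m]))
```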

Does It Work?

As shown below, our rq_active HPA roughly halved the requested resources. Even accounting for the delay introduced between Prometheus scraping Envoy and the metrics adapter querying Prometheus, we still get more responsive scaling than using CPU utilization, resulting in a lower 503 rate overall. We saw these trends continue once we applied our custom HPA in production.

Requested cores for raster-release (using CPU scaling) vs raster-mirror (using rq_active)

Final Thoughts

  • Istio and Envoy made collecting telemetry and safely testing with production traffic (via mirroring) simple.
  • We’re now rolling this methodology out to multiple services throughout our stack.
  • How often do you get to improve quality of service and reduce costs in the process?

descarteslabs-team

Explore posts from the Descartes Labs team

Written by Louis Vernon
Security, Operations and Site Reliability Engineering (SOS) at Descartes Labs.

