Take scaling to the next level
The vanilla Kubernetes Horizontal Pod Autoscaler (HPA) provides limited configuration options. If your workload is not computationally heavy, or you would like to base your pod count on a metric other than the default CPU and memory metrics, then you will not find any easy solution out of the box. Perhaps you would like to scale your workers based on the number of requests pending to be acknowledged…
We had to face this situation here at Mercadona Tech. The solution is quite easy to set up and configure, and involves the following components:
- Custom metrics API server + k8s-prometheus adapter
- Stackdriver exporter (for our use case; optional/replaceable with other exporters)
The Problem we had to face
Our use case involves a vehicle routing service that needs to scale based on the pending Pub/Sub message count regardless of CPU or memory.
Due to the architecture of our vehicle routing service, one Relé worker pod is designed to handle one message at a time. In case we receive multiple requests, the number of workers should be increased.
Our Solution
To set up the above-mentioned tools, please refer to their documentation. The magic happens afterwards.
Kubernetes does not understand the Prometheus metric format, so we need a translator between the two services. The k8s-prometheus adapter does just that: it queries Prometheus for the defined metrics, transforms them for the custom metrics API to ingest, and the API in turn serves them to the HPA resources. Pub/Sub publishes its metrics in Stackdriver, from where we need to export them to Prometheus.
First of all, configure the Stackdriver exporter to expose the needed metric(s):
```yaml
env:
  - name: STACKDRIVER_EXPORTER_MONITORING_METRICS_TYPE_PREFIXES
    value: "pubsub.googleapis.com/subscription/num_undelivered_messages"
```
With the above environment variable defined, you should start to see the metric under the name
stackdriver_pubsub_subscription_pubsub_googleapis_com_subscription_num_undelivered_messages
The Stackdriver exporter does not publish labels that could be used to reference the connected Kubernetes resources, so some relabeling trickery is necessary to link the Stackdriver metric with a k8s resource. This means we have to expose, in the metric's labels, the Kubernetes resource it relates to: at minimum the name of the deployment and the namespace the deployment resides in. One possible way is that, when your subscription ID (a default label of the above metric) contains the necessary information, you can transform it with some regex magic into useful labels.
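As a sketch, assuming the exporter is scraped under a job called stackdriver-exporter and that your subscription IDs follow a "<namespace>--<deployment>" naming convention (both are assumptions, adapt them to your setup), the relabeling could look like this:
```yaml
scrape_configs:
  - job_name: stackdriver-exporter             # assumed job name
    static_configs:
      - targets: ["stackdriver-exporter:9255"] # default stackdriver-exporter port
    metric_relabel_configs:
      # subscription_id is the default label carrying the subscription name.
      # Assumption: it looks like "<namespace>--<deployment>"; adjust the regex
      # to whatever convention you actually use.
      - source_labels: [subscription_id]
        regex: "(.+)--(.+)"
        target_label: namespace
        replacement: "$1"
      - source_labels: [subscription_id]
        regex: "(.+)--(.+)"
        target_label: deployment
        replacement: "$2"
```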
After you have the labels exposed, the k8s-prometheus adapter configuration should contain a rule like the following:
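Here is a sketch using the metric and label names from the example above; the new name pubsub_num_undelivered_messages is just our choice, and it is the name we will reference from the HPA later:
```yaml
rules:
  - seriesQuery: 'stackdriver_pubsub_subscription_pubsub_googleapis_com_subscription_num_undelivered_messages{namespace!="",deployment!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        deployment: {group: "apps", resource: "deployment"}
    name:
      as: "pubsub_num_undelivered_messages"
    metricsQuery: 'sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
```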
A little explanation for the above fields:
- seriesQuery selects the metric in general
- overrides creates the associations between the Prometheus labels and the respective k8s resources
- optionally, the metric can be renamed under the name key
- metricsQuery should define the exact query that provides the value for each of our Relé workers. The keys between ‘<<>>’ are built-in variables
If everything works fine, you should see the pubsub metric by querying the custom-metric API:
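For example, assuming you have jq installed for readable output (plain kubectl works too):
```sh
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .
```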
To check for the actual metric values, the following query should be used:
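Something along these lines, with the placeholders replaced by your own namespace and deployment name, and the metric name being the one we defined in the adapter rule:
```sh
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/<namespace>/deployments.apps/<deployment>/pubsub_num_undelivered_messages" | jq .
```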
Please note the value of ‘apiVersion’. When you associate a custom metric to a k8s deployment resource, the following API version will be used:
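```
custom.metrics.k8s.io/v1beta1
```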
Now that you have the metric successfully exposed through the k8s custom-metric API, an HPA can be configured to use this metric:
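A minimal sketch, assuming a deployment called rele-worker (the name and the replica limits are illustrative), scaling on the metric attached to the deployment object itself:
```yaml
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: rele-worker          # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: rele-worker
  minReplicas: 1
  maxReplicas: 10            # illustrative upper bound
  metrics:
    - type: Object
      object:
        target:
          apiVersion: apps/v1
          kind: Deployment
          name: rele-worker
        metricName: pubsub_num_undelivered_messages
        # Target of 1 undelivered message; with more pending messages
        # the HPA scales out proportionally.
        targetValue: 1
```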
It is possible with the autoscaling/v2beta1 API version to scale based on multiple metrics. Simply add e.g. the CPU target under the metrics list.
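For instance, appending an entry like this to the metrics list (the 70% utilization target is just an illustrative value) would make the HPA consider CPU as well:
```yaml
    - type: Resource
      resource:
        name: cpu
        targetAverageUtilization: 70
```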
Let’s observe the behaviour
Since the metric we scale on is exported to Prometheus, and hopefully you keep an eye on the number of your pods too, creating some Grafana visualizations should be quite easy.
Below we can see that, with the scaling threshold set to 1, the number of workers increased along with the unacknowledged message count. The base value of the second graph is 1.
Conclusion
Extending your Kubernetes cluster with the custom metrics API opens up a whole new world in terms of workload scaling. The solution presented above is just one specific use case, but it shows the endless possibilities.
Additional use cases could include, e.g., scaling web services based on HTTP request count, or virtually any metric that is exported to your Prometheus.
If these kinds of challenges are appealing to you, come and join our team at Mercadona Tech.