Building Your Own Custom Metrics API for Kubernetes Horizontal Pod Autoscaler

Ido Ozeri · Published in The Startup · 9 min read · Aug 31, 2020

Preface

Kubernetes is a lot of fun, has lots of features, and as a container orchestration tool it usually supports most of one's whims in a straightforward fashion.

However, one request from my direct manager made me sweat while I tried to achieve it: auto-scaling pods according to a complicated logical criterion.

When I tried to tackle the task, my online research yielded only partial solutions, and I ran into so many brick walls trying to crack this one that I had to write an article about it, in order to spare future confusion for all the poor souls who might try to scale their micro-services on a criterion other than CPU/memory.

The Challenge

It all started when we needed to scale one of our deployments according to the number of pending messages in a certain queue of RabbitMQ.
That is a cool, not overly complicated task that can be achieved by utilizing Prometheus, Rabbitmq-exporter, and Prometheus-adapter together (hereinafter referred to as “the trio”).

With much enthusiasm and anticipation, I jumped right into the implementation only to later discover that one of my manager’s magic light-bulbs had switched on in his brain. It happens quite often, fortunately for him, and less fortunately for me as this usually means stretching the capabilities of the technology at hand with advanced and not-often-supported demands.

He came up with a better, more accurate scaling criterion for our deployment. In a nutshell: it measures how long a message has been waiting in queue "A" using the message's timestamp, and then performs some logic to determine the final value of the metric, which is always returned as a positive integer.

Well, that's nice and all, but as far as my knowledge extends, the trio mentioned above is not able to perform the advanced logic my manager desired. After all, it relies solely on metrics that RabbitMQ exposes, so I was left to figure out a solution on my own.

The experience of trying to implement the trio helped me gain a better view of how the Horizontal Pod Autoscaler (HPA) works and where it reads its data from.

As per the documentation, HPA works mainly against 3 APIs:

  • Metrics
  • Custom Metrics
  • External Metrics
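A quick way to see which of these are actually registered in a cluster is to list the aggregated API services (the grep pattern below is just an assumption about the default group names):

kubectl get apiservices | grep metrics.k8s.io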

My plan was to somehow harness the ‘custom metrics’ API and have it work against an internal application metrics API of our own, with the intention that the HPA would be able to read data from the internal API and scale accordingly.

This API could, in the future, be extended to serve application metrics for other deployments that need scaling based on internal application metrics, or any kind of metric for that matter.

This in essence involves the following tasks:

  1. Writing the code for our internal API
  2. Creating a Kubernetes deployment and service for our internal API
  3. Creating a Custom Metrics APIService in Kubernetes
  4. Creating the HPA resource

And with that in mind, let’s get to work.

Please note that for the sake of demonstration, I used the ‘custom-metrics’ namespace in all yaml definitions. However, it’s an arbitrary selection so feel free to deploy it anywhere you want.
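If you'd like to follow along verbatim, the namespace itself is the only prerequisite:

kubectl create namespace custom-metrics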

Writing the Internal API Metrics Server

Since web development isn't a DevOps specialty, I turned to the pros and asked my talented colleague Naama Yochai to help get this part of the equation done. She's the one on the team in charge of the deployment we're trying to scale, so she also had high stakes in this project.

Our basic requirement was a fast and simple web application, so we used JavaScript and the Express web-server, nothing fancy.

For the sake of the POC we only needed to return some mock integer value; Naama would later on write the code performing the advanced logic behind the scaling criterion.

However, we had no idea how to return the results in a way that the HPA would understand. What's the expected output when querying our API?
We knew it was probably JSON, but had no idea of the structure, nor of the path on the web-server that the HPA would query when looking for that scaling metric. Would it hit '/' by default?

To better understand how APIs in Kubernetes work, we needed to take a look at the ‘metrics’ API (implemented by ‘metrics-server’) that was already deployed in our cluster. This API gives you CPU/Mem metrics for pods and worker nodes.

When we described the service, we noticed the 'Self Link' value in its metadata.

This gave us a hint as to the path in which APIs are expected to work in Kubernetes.
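To illustrate the convention (assuming metrics-server is installed in your cluster), the resource metrics API is served under a path built from its group and version, and every object it returns lives under namespaces/<namespace>/<resource>/<name> beneath that:

kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods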

Once the root path was discovered, we created a small Docker container running NodeJS and ended up with a simple HTTP server listening on port 6443.

However, we soon found the first chink in the armor: the Kubernetes API works exclusively with HTTPS web-servers (namely port 443) and cannot use insecure HTTP.

A couple of code tweaks later, and voilà! The server was speaking HTTPS and listening on the same port.
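For the curious, here's a minimal sketch of what the exporter looked like at this stage (the certificate file names and the route are our own choices; only the HTTPS part is mandated by Kubernetes):

// index.js -- a minimal sketch: an Express app served over HTTPS on port 6443.
// A self-signed certificate is fine here, since the APIService we create later
// is configured with insecureSkipTLSVerify: true.
const express = require('express');
const https = require('https');
const fs = require('fs');

const app = express();

// Health check at the API root -- this is what the APIService will probe.
app.get('/apis/custom.metrics.k8s.io/v1beta1/', (req, res) => {
  res.json({ status: 'healthy' });
});

const options = {
  key: fs.readFileSync('server.key'),
  cert: fs.readFileSync('server.crt'),
};

https.createServer(options, app).listen(6443);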

Here’s the Dockerfile we used:

FROM ubuntu:18.04

# Install some debugging/editor tools
RUN apt-get update --fix-missing && \
    apt-get install -y \
    build-essential \
    net-tools \
    iputils-ping \
    vim \
    curl \
    nginx \
    wget

RUN curl -sL https://deb.nodesource.com/setup_10.x | bash -
RUN apt-get install nodejs -y

WORKDIR /metric-exporter
COPY . /metric-exporter
RUN npm install

EXPOSE 6443

ENTRYPOINT ["/usr/bin/node"]
CMD ["./index.js"]

Creating a Kubernetes Deployment & Service

Now that the basic code outline for the internal API was ready, we proceeded with creating a deployment.

The metrics-exporter.yaml deployment file is fairly simple and looks like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: metrics-exporter
  template:
    metadata:
      labels:
        app: metrics-exporter
    spec:
      containers:
      - image: our-company/metrics-exporter
        name: metrics-exporter
        ports:
        - containerPort: 6443

Once the pod was up and running, it needed to be backed by a corresponding service, so we applied the following metrics-exporter-service.yaml file:

apiVersion: v1
kind: Service
metadata:
  name: metrics-exporter
spec:
  ports:
  - port: 443
    protocol: TCP
    targetPort: 6443
  selector:
    app: metrics-exporter

Notice that the service is configured to run on port 443, but the container port is the aforementioned 6443 port.

Let's test the internal API from within the pod. Since our web-server doesn't have a signed certificate, we used the '-k' switch:

curl -k https://localhost:6443/apis/custom.metrics.k8s.io/v1beta1/

And received the following output:

{"status":"healthy"}

This would later serve as a liveness probe, confirming that the web-server is healthy and accepting requests.
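A sketch of how that could be wired into the deployment's pod spec (the path and port match our exporter; the probe timings are arbitrary):

livenessProbe:
  httpGet:
    path: /apis/custom.metrics.k8s.io/v1beta1/
    port: 6443
    scheme: HTTPS
  initialDelaySeconds: 5
  periodSeconds: 10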

So up to this point we had a working web-server on port 6443 that, hopefully, the APIService we were about to create would be able to communicate with. We had yet to figure out the path to retrieve metrics from, but more on that later.

Creating the Custom Metrics APIService

When applied, the below resource definition creates and registers the 'v1beta1.custom.metrics.k8s.io' APIService. Note that it has no namespace affinity and is thus a cluster-scoped (global) resource.

apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.custom.metrics.k8s.io
spec:
  insecureSkipTLSVerify: true
  group: custom.metrics.k8s.io
  groupPriorityMinimum: 1000
  versionPriority: 5
  service:
    name: metrics-exporter
    namespace: custom-metrics
  version: v1beta1

The most interesting bit in this APIService resource definition is the 'service' stanza. It registers a new APIService object that binds the new API path to the Kubernetes service implementing it; in our case, that's the 'metrics-exporter' service in the 'custom-metrics' namespace.

This APIService would connect to our internal API's Kubernetes service in an attempt to retrieve our custom application metrics.

The chain of information looks like this:

APIService → Metrics-Exporter Kubernetes-Service → Metrics-Exporter Pod

If you're using RBAC in your cluster, you'll also need to create a ServiceAccount, ClusterRole, and ClusterRoleBinding for the custom metrics API. I've added all the yaml definitions for the above in here.
Without these, you'll get "Unauthorized" error messages, since the API would fail to interact with Kubernetes resources.
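As an illustration, the part that lets the HPA controller read from the new API group looks roughly like this (names follow the upstream custom-metrics walkthrough; your manifests may differ, and the exporter's own ServiceAccount additionally needs the standard auth-delegator bindings):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: custom-metrics-server-resources
rules:
- apiGroups: ["custom.metrics.k8s.io"]
  resources: ["*"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: hpa-controller-custom-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: custom-metrics-server-resources
subjects:
- kind: ServiceAccount
  name: horizontal-pod-autoscaler
  namespace: kube-system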

Now, let’s examine the newly-created APIService:

kubectl get apiservices v1beta1.custom.metrics.k8s.io
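The output looked roughly like this (abridged):

NAME                            SERVICE                           AVAILABLE   AGE
v1beta1.custom.metrics.k8s.io   custom-metrics/metrics-exporter   True        2m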

We can see that it's available; let's describe it:
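kubectl describe apiservice v1beta1.custom.metrics.k8s.io

The interesting part of the output looks roughly like this (abridged):

Status:
  Conditions:
    Message:  all checks passed
    Reason:   Passed
    Status:   True
    Type:     Available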

It reports back that “all checks passed”, which bodes well for now. It would not have passed if we omitted the “Insecure Skip TLS Verify: true” line (again, there’s no signed certificate for our web-server), or if the deployment/service underneath it were not in place.

We’re almost done!

Creating the HPA resource

Now the last missing piece of the puzzle is what I described earlier in this article and what consumed a lot of my time: the path from which the APIService would retrieve the metrics for our deployment. So far, our internal API only returns a health check status in JSON format, which is just enough to satisfy the APIService's sanity checks.

We created the below HPA resource and started experimenting:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: binder-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: binder
  minReplicas: 1
  maxReplicas: 30
  metrics:
  - type: Object
    object:
      target:
        kind: Service
        name: metrics-exporter
      metricName: seconds
      targetValue: 100

Note that the deployment we're asking to scale is called "binder"; it's one of our prominent micro-services and gets a lot of requests.

When describing the HPA resource we created, we got an error saying that the HPA cannot retrieve the value of a metric named "seconds", and that as a result nothing would be scaled:
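The relevant event looked roughly like this (abridged; the exact wording varies between Kubernetes versions):

Warning  FailedGetObjectMetric  unable to get metric seconds: Service on custom-metrics metrics-exporter/unable to fetch metrics from custom metrics API: the server could not find the requested resource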

At this stage we were still ignorant and baffled as to the format that needs to be adhered to when Kubernetes makes these API calls.

Since we use a managed Kubernetes cluster (EKS) on AWS, we turned to their premium support for help with troubleshooting (as we don’t have the master logs and can’t see what’s really going on under the hood). After several back-and-forth correspondences and failed attempts to help with the matter, the support engineer sent me off with:

"Custom metrics API implementation or implementing any Kubernetes configurations falls outside of the scope of support for us Premium Support Engineers and is done on a best-effort basis"

So much for best effort :-/

With the little new debug information we received from AWS, we carried on with our own experimentation and research.

After lots of reading on how APIs interact in Kubernetes and hours of trial and error, we had managed to crack this one! We finally understood what the proper way of interacting with the API was, and where the APIService was expecting to find the “seconds” metric.

It appears that the path convention is as follows:

/apis/custom.metrics.k8s.io/v1beta1/namespaces/<NAMESPACE>/services/<NAME_OF_CUSTOM_METRICS_SERVICE>/<METRIC_NAME>

In our case, the full path was:

/apis/custom.metrics.k8s.io/v1beta1/namespaces/custom-metrics/services/metrics-exporter/seconds

And could be verified via:

kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/custom-metrics/services/metrics-exporter/seconds

However, that was not enough! We knew the path, but what was the expected output format when hitting that path?

We were naive at first to think that a simple {"seconds":"10"} would do. Needless to say, it wasn't that simple, so again we used the 'metrics' API as a reference and deduced the supported JSON output format.

When piping it all into ‘jq’ for better readability, we get the following JSON output that originates from our internal API:

kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/custom-metrics/services/metrics-exporter/seconds | jq
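The response is a MetricValueList object from the custom.metrics.k8s.io group; it looks roughly like the following (the timestamp and value here are illustrative):

{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/custom-metrics/services/metrics-exporter/seconds"
  },
  "items": [
    {
      "describedObject": {
        "kind": "Service",
        "namespace": "custom-metrics",
        "name": "metrics-exporter",
        "apiVersion": "/v1"
      },
      "metricName": "seconds",
      "timestamp": "2020-08-31T10:00:00Z",
      "value": "42"
    }
  ]
}

Whatever code computes the metric just needs to return this structure from a GET handler on the path above.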

This, ladies and gentlemen, is the proper way of serving metric values to the APIService. Your internal API code must output its metrics in the above format, or else no metric values will be processed by the HPA.

Now the HPA is happy and can retrieve the current value of the ‘seconds’ metric:

kubectl -n custom-metrics describe hpa binder-hpa
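An abridged, illustrative version of the relevant lines:

Metrics:                                  ( current / target )
  "seconds" on Service/metrics-exporter:  42 / 100
Min replicas:                             1
Max replicas:                             30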

Conclusion

Once again we're witnessing the sheer brilliance and flexibility of Kubernetes, as it offers its users a way to scale pods based on whatever criteria they desire. This internal metrics exporter of ours is now being further developed to support any kind of deployment in our cluster that needs custom-metric auto-scaling.

Hope you enjoyed reading and please feel free to ask questions.
