Going the Extra Mile with Our k8s Setup

Rafael Prado
Published in Xandr-Tech
8 min read · Apr 11, 2022
Photo by Conny Schneider on Unsplash. The Kubernetes logo is a registered trademark of the Linux Foundation.

As part of Xandr’s long-term effort to migrate our core engines from a mixture of virtual machines (VM) and bare-metal servers to the new world of Kubernetes (k8s), my team has been charged with migrating any of our applications still residing on VMs onto k8s pods. By moving to k8s, Xandr is taking advantage of enhanced possibilities for application orchestration and maintenance. Our ultimate plan is to have 100% of our servers running on k8s.

Overall, our previous migrations have gone smoothly, with predictable results. As we began scheduling and prioritizing our remaining applications, we discovered that migrating certain complex, event-driven apps posed new challenges around variable resource usage. However, with some research and a proof of concept, we were able to demonstrate how these challenges could be addressed effectively using k8s autoscaling.

The challenge

Most of the core applications we’d previously migrated ended up using a predictable 2 pods per data center. However, when we put certain event-driven applications under the microscope, we found that this formula didn’t account for the complexity of these applications. These apps typically rely on parallel processing, concurrency, workers, child processes, and other complex operations that can result in variable resourcing requirements. As a result, assigning a fixed number of pods wouldn’t work.

In the short term, assigning a standard number of pods for each of these applications would result in over-provisioning. Calculating a threshold based on the current peak usage over X months would result in excess resource allocation beyond real-world usage.

In the long term, assigning a standard number of pods would also result in under-provisioning. Because demand grows over time, the previously established threshold would eventually be insufficient, causing contention for CPU and memory and — in the worst-case scenario — a growing backlog.

The problem, illustrated

In a VM and bare-metal deployment, application 1 spawns a total of 144 processes. Taking into account 3 different data centers, matching its throughput would theoretically require a total of 48 (144/3) pods per data center. Application 2, also on the to-be-migrated list, spawns a total of 3072 processes.

These applications often belong to an interdependent workflow, as shown in the following illustration:

App dependency chain sample

As a result of these dependencies, we may see:

  • Static “tide” throughput: the chain processes at a fixed rate regardless of demand.
  • Idle resources on an upstream app while the next app in the chain is struggling.
  • Backlogs building up on every app once the “magic number” of pods is no longer valid and demand spikes occur.

To head off these problems, we needed to tread carefully and consider how to proceed with provisioning in a scalable manner.

Proof of Concept time

If you are a tech enthusiast, you know the primary issue here is scalability. Furthermore, depending on how critical the data generated by those applications is, scalability may also translate to stability, as shown in the following graphics.

Illustration of pod scaling vertically to meet demand
Illustration of pod scaling horizontally to meet demand

After digging into the k8s documentation to understand this problem more deeply, we decided to create a proof of concept (POC) of the k8s scaling mechanism in our company test environment.

In this environment, we created a Node Express application that exposes a single POST endpoint, /work. In the first stage of the POC the endpoint burns CPU; in the second, it pushes messages to RabbitMQ and reports a custom metric, giving us a lever for each kind of autoscaling we wanted to test.
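
For context, the snippets in the rest of this post assume a minimal Express skeleton along these lines; the port and bootstrap code are our illustration rather than the exact POC source:

const express = require('express');
const app = express();

// the /work handlers shown in the sections below are registered on this app instance

app.listen(3000, () => {
  console.log('POC app listening on port 3000');
});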

Autoscaling based on CPU

Starting with the basics (k8s pod metrics for memory and CPU), the first stage of our POC was to scale based on the CPU used by the application. The following code samples illustrate our approach:

Node.js code sample

// source: https://gist.github.com/sqren/5083d73f184acae0c5b7
// higher number => more iterations => slower
function intenseCalculation(baseNumber) {
  console.time('mySlowFunction');
  let result = 0;
  for (let i = Math.pow(baseNumber, 7); i >= 0; i--) {
    result += Math.atan(i) * Math.tan(i);
  }
  console.timeEnd('mySlowFunction');
}

app.post('/work', async (req, res) => {
  intenseCalculation(5);
  res.status(200).send('ok');
});

Autoscaling sample for CPU

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: poc-k8s-vpa-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: poc-k8s-vpa-hpa-app-creat120
  minReplicas: 1
  maxReplicas: 100
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: AverageValue
        averageValue: 0.01
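
The HPA above points at a Deployment named poc-k8s-vpa-hpa-app-creat120. We haven’t reproduced our real Deployment here, but a minimal sketch might look like the following; the image, port, and resource values are illustrative assumptions rather than our actual configuration:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: poc-k8s-vpa-hpa-app-creat120
  namespace: creative
spec:
  replicas: 1
  selector:
    matchLabels:
      app: poc-k8s-vpa-hpa
  template:
    metadata:
      labels:
        app: poc-k8s-vpa-hpa
    spec:
      containers:
      - name: poc-app
        image: registry.example.com/poc-k8s-vpa-hpa:latest  # placeholder image
        ports:
        - containerPort: 3000
        resources:
          requests:
            cpu: 100m      # CPU requests also allow Utilization-style targets, not just AverageValue
            memory: 128Mi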

When our engines were properly deployed, all we had to do was call the /work endpoint multiple times from several terminals, then stop the calls gradually to watch k8s scale up and down.

We used the following sequence of commands:

  1. Trigger command

for id in {1..1000}; do curl -X POST 'ENVIRONMENT_URL/work'; done;

  2. kubectl command

kubectl describe horizontalpodautoscaler --context qa01nym2 -n creative poc-k8s-vpa-hpa

Based on the CPU being used, the deployment was scaled up to 3 pods. As we stopped the calls, it was reduced to only 1, as shown below.

Output of kubectl describing horizontalpodautoscaler

Plot twist

As shown above, the initial POC worked like a charm. Now it was time to go a step further and leverage autoscaling based on custom metrics — metrics that are reported by the application and read by the custom metrics API (custom.metrics.k8s.io).

As the error messages below demonstrate, we were in unknown territory. We couldn’t apply the autoscaling configuration or retrieve metrics from the k8s metrics API. In fact, the custom metrics API was not registered at all, which also told us that no one in our company had tried this before.

kubectl describe horizontalpodautoscaler --context qa01nym2 -n creative poc-k8s-vpa-hpa

Warning  FailedComputeMetricsReplicas  18m (x12 over 21m)  horizontal-pod-autoscaler  invalid metrics (1 invalid out of 1), first error is: failed to get object metric value: unable to get metric requests-per-second: Ingress on creative main-route/unable to fetch metrics from custom metrics API: no custom metrics API (custom.metrics.k8s.io) registered
Warning  FailedGetObjectMetric         77s (x80 over 21m)  horizontal-pod-autoscaler  unable to get metric requests-per-second: Ingress on creative main-route/unable to fetch metrics from custom metrics API: no custom metrics API (custom.metrics.k8s.io) registered

kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/creative/pods/poc-k8s-vpa-hpa-app-creat120-5c794d9bb4-jxxsl --context qa01nym2 -n creative | jq

Error from server (Forbidden): pods.metrics.k8s.io "poc-k8s-vpa-hpa-app-creat120-5c794d9bb4-jxxsl" is forbidden: User "REDACTED" cannot get resource "pods" in API group "metrics.k8s.io" in the namespace "creative"

kubectl get --raw /apis/custom.metrics.k8s.io/ --context qa01nym2

Error from server (NotFound): the server could not find the requested resource

We couldn’t touch the core pieces ourselves to get the metrics to be exported, nor could we see the metrics already exported by the k8s custom metrics API.
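
As a generic sanity check (not something specific to our setup), you can confirm whether these aggregated APIs are registered in a cluster by querying the corresponding APIService objects:

kubectl get apiservice v1beta1.metrics.k8s.io --context qa01nym2
kubectl get apiservice v1beta1.custom.metrics.k8s.io --context qa01nym2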

Solving this problem required both research and the knowledge of our internal k8s experts, who provided the engines our test environment was missing. We found these articles especially helpful:

Autoscaling based on custom metrics

To create a POC of k8s autoscaling on custom metrics, we needed the application to scale up and down based on a metric we generated inside the app: the number of messages being pushed to RabbitMQ.

To achieve this, we made the following changes at the app code level:

  • We updated the code so it would push items to RabbitMQ. This wasn’t technically required for the POC, since all that really mattered was that the metric was pushed.
  • We ensured that the metric was reported to Prometheus.

Example: a Node.js API exposes a POST endpoint, /work, and pushes a metric each time it is called.

app.post('/work', async (req, res) => {
  const message = req.query.message;

  // pushes a message to RabbitMQ
  await rabbitmq.publishMessage(
    "my_queue",
    JSON.stringify({
      created_on: new Date().getTime(),
      message
    })
  );

  // sends a metric to Prometheus
  // metric name: total_messages_pushed
  addTotalPushedData();

  res.status(200).send('ok');
});
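
The handler above relies on two helpers we haven’t shown: rabbitmq.publishMessage and addTotalPushedData. A minimal sketch of how they could be implemented with the amqplib and prom-client libraries follows, collapsed into one file for brevity; the connection URL, queue options, and metric help text are illustrative assumptions:

const amqp = require('amqplib');
const client = require('prom-client');

// illustrative RabbitMQ publisher (connection URL is a placeholder)
let channel;
async function getChannel() {
  if (!channel) {
    const conn = await amqp.connect('amqp://localhost');
    channel = await conn.createChannel();
  }
  return channel;
}

const rabbitmq = {
  async publishMessage(queue, payload) {
    const ch = await getChannel();
    await ch.assertQueue(queue, { durable: true });
    ch.sendToQueue(queue, Buffer.from(payload));
  }
};

// illustrative Prometheus counter; the name must match the seriesQuery in the adapter rule below
const totalMessagesPushed = new client.Counter({
  name: 'total_messages_pushed',
  help: 'Total messages pushed to RabbitMQ by the POC app',
});

function addTotalPushedData() {
  totalMessagesPushed.inc();
}

// expose the metrics endpoint for Prometheus to scrape
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', client.register.contentType);
  res.end(await client.register.metrics());
});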

Example: a YAML configuration sets up autoscaling based on the custom metric.

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: custom-metric-hpa
  namespace: creative
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: poc-k8s-vpa-hpa-{{ .Release.Name }}
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Object
    object:
      metric:
        name: messages_pushed_per_second
      describedObject:
        kind: Namespace
        name: creative
        apiVersion: v1
      target:
        type: Value
        value: 2

prometheus-adapter custom metric configuration

prometheus-adapter is the engine responsible for fetching the metric from Prometheus and exposing it through the k8s custom metrics API, so that it can then be used as a criterion for autoscaling.

The following prometheus-adapter configuration turns total_messages_pushed into the custom metric messages_pushed_per_second, computed as the per-second rate of messages over the past 5 minutes:

apiVersion: v1
data:
  config.yaml: |
    rules:
    - seriesQuery: 'total_messages_pushed'
      resources:
        overrides:
          namespace: {resource: "namespace"}
      name:
        matches: "^total_(.*)"
        as: "${1}_per_second"
      metricsQuery: "rate(total_messages_pushed{<<.LabelMatchers>>}[5m])"
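
With the rule in place, one way to sanity-check that the metric is actually being served, before the HPA consumes it, is to query the custom metrics API directly; the path below follows the custom metrics API convention for a metric described on a Namespace object:

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/creative/metrics/messages_pushed_per_second" --context qa01nym2 | jq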

Once again, all we had to do to evaluate the POC was to call /work several times from different terminals to see the autoscaling taking action, using the following command.

for id in {1..1000}; do curl -X POST 'ENVIRONMENT_URL/work?message=example'; done;

Then we invoked kubectl to get the autoscaling data, using the following command:

kubectl describe horizontalpodautoscaler --context qa01nym2 -n creative poc-k8s-vpa-hpa

Based on the number of messages pushed to RabbitMQ (messages_pushed_per_second), the k8s pods were successfully scaled up to 10, and then down to 1 as we stopped the calls.

Output of k8s horizontal pod auto scaler

What’s next?

As our internal k8s experts add these engines to our staging and production environments, we’ll be able to apply the k8s autoscaling mechanism to our applications, especially the event-driven ones I’ve described in this blog post. This technological advance will empower us to:

  • Maintain a pool of resources to allow autoscaling across multiple apps, rather than hardcoding a set number of resources per application.
  • Achieve a moving tide throughput. Pods scale up or down on demand, with resources allocated accordingly.
  • Reduce and optimize our team’s resource footprint.

Ultimately, by taking advantage of the benefits of k8s containerization, and in particular a responsive back end that can be autoscaled up or down and efficiently maintained with k8s orchestration tools, this migration gives Xandr not only a greater ability to optimize infrastructure budgets around actual demand, but also the potential to innovate on a larger scale. In a space evolving as rapidly as ad tech, we’re poised to roll with whatever the future brings.
