Autoscaling your Airflow using DataDog External Metrics

Follow this step-by-step guide and learn how to autoscale your Airflow K8s pods based on ExternalMetrics

Lidor Ettinger
NI Tech Blog
3 min read · Jan 16, 2022


One of the strengths of Kubernetes lies in its ability to achieve effective resource autoscaling. To make scaling decisions, however, Kubernetes needs meaningful service metrics to base them on.

In this post we will look at how that scaling decision is made, and how to extract, monitor, and autoscale Airflow workers based on Datadog external metrics.

The following steps accomplish Airflow HPA (Horizontal Pod Autoscaling) within a Kubernetes environment:

Set up the Datadog Cluster Agent

The External Metrics Provider resource makes it possible for monitoring systems like Datadog to expose application metrics to the HPA controller. Our scaling policy will be based on external metrics collected from our applications.

To learn how to activate and use DatadogMetric, refer to Datadog’s documentation.
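
As a reference point, here is a minimal sketch of the datadog/datadog Helm chart values that enable the Cluster Agent together with its External Metrics Provider and DatadogMetric CRD support (verify the key names against your chart version):

```yaml
# values-datadog.yaml -- minimal sketch for the datadog/datadog Helm chart
datadog:
  apiKeyExistingSecret: datadog-secret   # assumed pre-created secret
  appKeyExistingSecret: datadog-secret   # an app key is required for external metrics
clusterAgent:
  enabled: true
  metricsProvider:
    enabled: true            # register the External Metrics API service
    useDatadogMetrics: true  # enable DatadogMetric CRD support
```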

Enable Airflow StatsD

Now, let’s configure Airflow to send metrics to StatsD. Our scaling metric is executor.running_tasks: by observing the number of tasks currently running on the executor, we can make an informed decision about whether to scale the Airflow worker pods up or down.

Step 1: Install StatsD
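
StatsD support in Airflow is an optional extra. A common way to install it, for example in a custom Airflow image, is:

```sh
# Install Airflow's StatsD extra (pin to your Airflow version in practice)
pip install 'apache-airflow[statsd]'
```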

Step 2: Configure StatsD environment variables:

Decide which metrics to expose by configuring an allow list of prefixes. Here we choose to send the executor metrics, since we are interested in the running-tasks metric, as in the sketch below.
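
A sketch of the relevant environment variables, using Airflow 2.x configuration names; the host below assumes a DogStatsD endpoint exposed by the Datadog agent at that address, so adjust it to your agent deployment:

```yaml
env:
  - name: AIRFLOW__METRICS__STATSD_ON
    value: "True"
  - name: AIRFLOW__METRICS__STATSD_HOST
    value: "datadog.datadog.svc.cluster.local"  # assumed DogStatsD address
  - name: AIRFLOW__METRICS__STATSD_PORT
    value: "8125"
  - name: AIRFLOW__METRICS__STATSD_PREFIX
    value: "airflow"
  - name: AIRFLOW__METRICS__STATSD_ALLOW_LIST
    value: "executor"  # only executor.* metrics are emitted
```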

Create a Custom Resource Definition (CRD) like DatadogMetric

DatadogMetric allows the HPA to autoscale applications based on metrics that are provided by third-party monitoring systems. External metrics support target types of Value and AverageValue.

The Horizontal Pod Autoscaler is a great tool for scaling stateless applications, but you can also use it to scale StatefulSets.

Following is an example of how to create it using Terraform (a sketch using the kubernetes_manifest resource; the query assumes Datadog receives the metric as airflow.executor.running_tasks, so adjust the prefix and tags to your setup):
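
```hcl
# Sketch: a DatadogMetric custom resource via the hashicorp/kubernetes provider
resource "kubernetes_manifest" "airflow_running_tasks" {
  manifest = {
    apiVersion = "datadoghq.com/v1alpha1"
    kind       = "DatadogMetric"
    metadata = {
      name      = "airflow-executor-running-tasks"
      namespace = "airflow"
    }
    spec = {
      # Average number of tasks currently running on the executor
      query = "avg:airflow.executor.running_tasks{*}"
    }
  }
}
```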

Validate the value of the metric by sending a raw GET request to the Kubernetes API server. With the DatadogMetric CRD, the external metric name takes the form datadogmetric@&lt;namespace&gt;:&lt;name&gt;:
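
```sh
# Namespace and resource name follow the Terraform sketch above
kubectl get --raw \
  "/apis/external.metrics.k8s.io/v1beta1/namespaces/airflow/datadogmetric@airflow:airflow-executor-running-tasks"
```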

An illustrative response (actual values will vary):
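
```json
{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "metadata": {},
  "items": [
    {
      "metricName": "datadogmetric@airflow:airflow-executor-running-tasks",
      "metricLabels": null,
      "timestamp": "2022-01-16T10:00:00Z",
      "value": "3"
    }
  ]
}
```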

Create an HPA using the DatadogMetric

Next, we will expose executor.running_tasks, which the HPA will query.

The HPA fetches the executor.running_tasks value from the external metrics API and scales up airflow-worker whenever the number of currently running tasks exceeds 1.
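
A sketch of such an HPA manifest, assuming the airflow-worker StatefulSet created by the official Apache Airflow Helm chart (on clusters older than Kubernetes 1.23, use apiVersion autoscaling/v2beta2):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: airflow-worker
  namespace: airflow
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: airflow-worker
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: External
      external:
        metric:
          name: datadogmetric@airflow:airflow-executor-running-tasks
        target:
          type: Value
          value: "1"  # scale up once running tasks exceed 1
```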

Using HPA in combination with cluster autoscaling can help you achieve cost savings for workloads that see regular changes in demand by reducing the number of active nodes as the number of pods decreases.

Here is the corresponding configuration in the Helm chart values, sketched against the official Apache Airflow chart (key names may differ in other charts):
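
```yaml
# Sketch of custom-values.yaml for the official Apache Airflow Helm chart.
# The bundled statsd-exporter is disabled since metrics go straight to the
# Datadog agent via the env block from Step 2 above.
executor: CeleryExecutor
statsd:
  enabled: false
workers:
  replicas: 1  # starting point; the HPA adjusts this at runtime
env:
  - name: AIRFLOW__METRICS__STATSD_ON
    value: "True"
  # ...remaining StatsD variables as configured in Step 2
```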

Conclusion

DatadogMetric allows the HPA to autoscale Airflow’s workers based on StatsD metrics provided by a third-party monitoring system.

It’s possible to take advantage of these insights and implement them in other areas within your production environment.

Airflow HPA workers with Datadog External Metrics

In the following, you can see a full working example of how to deploy Airflow with a Helm chart, including the HPA and the DatadogMetric.
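
Assuming the official Apache Airflow Helm repository and the custom-values.yaml above, the deployment itself boils down to:

```sh
helm repo add apache-airflow https://airflow.apache.org
helm upgrade --install airflow apache-airflow/airflow \
  --namespace airflow --create-namespace \
  -f custom-values.yaml
```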

custom-values.yaml can be found here.
