Integrating Airflow + Datadog using docker-compose

Lukas Tarasevicius
3 min read · Jan 24, 2020


Integrating Airflow running on Docker + Datadog took way longer than I expected, so I decided to simplify it.

A couple of requirements before you get into this guide:

  • Your Airflow instance is Dockerized
  • You are familiar with docker-compose

Configuring Airflow to write to Datadog’s StatsD agent

You can do this a few ways: by editing airflow.cfg directly or by setting configuration options through ENV vars. I’m going to outline how to configure statsd on Airflow through ENV vars (the equivalent airflow.cfg snippet is shown after the list):

  1. Install airflow with statsd packaged in it. This is as easy as updating your requirements.txt from apache-airflow==1.10.7 to apache-airflow[statsd]==1.10.7
  2. In the environment section of your docker-compose.yml, add the ENV vars below to enable statsd:
environment:
  - AIRFLOW__SCHEDULER__STATSD_ON=True
  - AIRFLOW__SCHEDULER__STATSD_HOST=datadog
  - AIRFLOW__SCHEDULER__STATSD_PORT=8125
  - AIRFLOW__SCHEDULER__STATSD_PREFIX=airflow

  3. Congratulations, you are done.
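
If you’d rather edit airflow.cfg than set ENV vars, the same four options live under the [scheduler] section in Airflow 1.10.x. This is just a sketch of what that would look like; double-check it against your own airflow.cfg:

[scheduler]
statsd_on = True
statsd_host = datadog
statsd_port = 8125
statsd_prefix = airflow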

Adding Datadog to your docker-compose file

  1. If you follow Datadog’s example for configuring docker-compose.yml, integrating Datadog into your network is pretty simple. But there are a few settings that you (probably) need.
  2. You’ll need to expose the 8125/udp port.
  3. I think you’ll need to enable a couple of other ENV vars: DD_APM_ENABLED, DD_DOGSTATSD_NON_LOCAL_TRAFFIC, DD_DOGSTATSD_ORIGIN_DETECTION, DD_AC_INCLUDE.
  4. I’d recommend setting your DD_HOSTNAME ENV var as well. I chose AIRFLOW-PROD (don’t fuck up and use an underscore, that was my rookie mistake).
  5. The datadog section should look like:
datadog:
  image: datadog/agent:latest
  container_name: datadog
  ports:
    - "8125:8125/udp"
  links:
    - airflow
  environment:
    - DD_API_KEY=
    - DD_HOSTNAME=AIRFLOW-PROD
    - DD_APM_ENABLED=true
    - DD_DOGSTATSD_NON_LOCAL_TRAFFIC=true
    - DD_DOGSTATSD_ORIGIN_DETECTION=true
    - DD_AC_INCLUDE="image:*"
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock
    - /proc/:/host/proc/:ro
    - /sys/fs/cgroup:/host/sys/fs/cgroup:ro
  networks:
    - main
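
One aside on the blank DD_API_KEY above: rather than pasting the key into the file, you can let docker-compose substitute it from your shell environment. This is standard compose variable substitution and assumes DD_API_KEY is exported on the host (or set in a .env file next to the compose file):

environment:
  - DD_API_KEY=${DD_API_KEY}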

The finished docker-compose.yml file

version: "3"
services:
airflow:
image: apache/airflow
container_name: airflow
restart: unless-stopped
ports:
- "8080:8080"
- "8888:8888"
environment:
- AIRFLOW__SCHEDULER__STATSD_ON=True
- AIRFLOW__SCHEDULER__STATSD_HOST=datadog
- AIRFLOW__SCHEDULER__STATSD_PORT=8125
- AIRFLOW__SCHEDULER__STATSD_PREFIX=airflow
networks:
- main
# agent section
datadog:
image: datadog/agent:latest
container_name: datadog
ports:
- "8125:8125/udp"
links:
- airflow
environment:
- DD_API_KEY=
- DD_HOSTNAME=AIRFLOW-PROD
- DD_APM_ENABLED=true
- DD_DOGSTATSD_NON_LOCAL_TRAFFIC=true
- DD_DOGSTATSD_ORIGIN_DETECTION=true
- DD_AC_INCLUDE="image:*"
volumes:
- /var/run/docker.sock:/var/run/docker.sock
- /proc/:/host/proc/:ro
- /sys/fs/cgroup:/host/sys/fs/cgroup:ro
networks:
- main
networks:
main:

Running it

Run docker-compose up and wait to see:

Creating airflow ... done
Creating datadog ... done
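
Before trusting the dashboard, it’s worth a quick sanity check that DogStatsD is actually receiving packets. Two quick ideas (both assume the container names from the compose file above):

# Check the agent's status output and look for packet counters in the DogStatsD section
docker exec -it datadog agent status

# Or fire a throwaway test counter from inside the airflow container over the
# compose network (UDP, so "no error" only means the packet was sent)
docker exec -it airflow python -c "import socket; socket.socket(socket.AF_INET, socket.SOCK_DGRAM).sendto(b'airflow_test.heartbeat:1|c', ('datadog', 8125))"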

Hopefully your dashboard starts populating; this was my (shitty) finished product.

Mistakes to avoid

  • Following this guide for using the Datadog Airflow integration. The huge issue here is that it’s aimed at users running Airflow directly on the local machine, with the Datadog agent on that same machine. If you just have your Airflow up on an EC2 box, feel free. But it looks like most Airflow metrics come through to Datadog without it!
  • Following the Datadog docker-compose guide and trying to use autodiscovery for the Airflow integration above. It turns out the Airflow integration isn’t in the public catalog and cannot be autodiscovered!

But I really want to use the airflow integration…

Neither I nor Datadog support recommends this approach, but here’s their response:

Hi Lukas,

I just wanted to follow up with you based on our chat from earlier as it seems like we got disconnected.

The airflow integration is viable, but just not available to use with autodiscovery.

As a workaround, you could run this Airflow integration as a custom check: https://docs.datadoghq.com/developers/write_agent_check/?tab=agentv6v7#overview. This would involve copying the Airflow python code into a file in the agent's checks.d directory.

Once this integration has been set up as a custom check, you can mount your custom yaml file using a ConfigMap: https://docs.datadoghq.com/agent/kubernetes/integrations/#configmap.

Would this suit your use case?

Best,
Redacted | Solutions Engineer | Datadog
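
For completeness (and again, not something I’d recommend), the custom-check route boils down to dropping a Python file into the agent’s checks.d directory plus a matching conf file in conf.d. The sketch below is only the bare custom-check skeleton from Datadog’s write_agent_check docs, not the actual Airflow integration code; the name airflow_custom is mine and arbitrary, it just has to match between the two files.

checks.d/airflow_custom.py:

from datadog_checks.base import AgentCheck

__version__ = "1.0.0"

class AirflowCustomCheck(AgentCheck):
    def check(self, instance):
        # A real check would pull Airflow data here; this gauge just proves the plumbing works
        self.gauge("airflow_custom.heartbeat", 1, tags=["source:custom_check"])

conf.d/airflow_custom.yaml:

instances: [{}]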

Good luck in your Airflow adventures.
