Enable Kerberos with Airflow KubernetesExecutor

Joffrey Bienvenu
Published in Apache Airflow
4 min read · Jun 9, 2023

Kerberos (Cerberus), Hades’ dog, guards Airflow — Dog icon from svgrepo.com

While the integration of the Kerberos sidecar with the CeleryExecutor works out of the box in Airflow 2.6.0, further configuration is required with the KubernetesExecutor and the CeleryKubernetesExecutor to shut down the Kerberos sidecar attached to the main container.

In this article, we will walk through a practical example of a pod_template_file and introduce the two entrypoints that need to be added to the Airflow image in order to use the Kerberos sidecar with the KubernetesExecutor.

Sidecar shutdown problem

When using the KubernetesExecutor in conjunction with a sidecar, the following sequence of events occurs within the Pod:

The sidecar is still up and running while the main container has exited — Image by author
  1. The main container, also referred to as the “base” container, initiates and begins executing its designated task.
  2. Simultaneously, the sidecar container starts alongside the main container. In our case, it will create a ticket from a Kerberos keytab and schedule a refresh every hour.
  3. When the main container completes its task, it gracefully exits, signaling the end of its execution to Kubernetes.
  4. However, despite the main container’s graceful exit, the sidecar container continues to run, causing the Pod to remain active indefinitely.

Solution: Communicate the exit signal to the sidecar

To terminate the Pod, the sidecar needs to receive the exit signal.

To do so, we need to establish communication between the main ‘base’ container and the sidecar, so that an EXIT signal can be passed from the former to the latter. There are two approaches to achieve this:

  • Shared Process: This first method uses the shareProcessNamespace functionality to pass an exit signal directly to the sidecar’s process (see the one-field sketch right after this list).
  • Shared Volume with File: This second approach, which we will explore in this article, leverages a shared volume to facilitate communication: a file stored on the shared volume is used to exchange information between the main ‘base’ container and the sidecar. It works well in environments with restricted permissions, where using shareProcessNamespace may not be feasible.
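For reference, a minimal sketch of the first approach. shareProcessNamespace is a standard field of the Pod spec; when enabled, all containers in the Pod share a single process namespace, so the main container can signal the sidecar’s process directly:

```yaml
# Sketch: share one process namespace across the Pod's containers,
# letting the 'base' container send signals to the sidecar's processes.
spec:
  shareProcessNamespace: true
```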

Adding a Pod Template

The following Pod template introduces several enhancements to the “DAG in a PersistentVolume” example template proposed by Airflow:

  • Sidecar Container: The Pod template includes a Kerberos sidecar container, working in tandem with the main ‘base’ container. Again, this sidecar creates a ticket from a Kerberos keytab and schedules a refresh every hour.
  • Volumes and VolumeMounts: To facilitate the integration of Kerberos dependencies (keytab, krb5.conf, …), the Pod template defines the necessary volumes and VolumeMounts.
  • ‘exit-signals’ Volume: To coordinate the termination of the main ‘base’ container and the sidecar, an ‘exit-signals’ volume is created. It enables the main container to pass an exit signal via file to the sidecar.
  • Custom Container Entrypoints: The Pod template also incorporates custom container entrypoints. Let’s see further below what they do. ⬇️

The Pod Template:

Note: This is an example template, not usable as-is. Be sure to replace the image, Secret, PVC, and ConfigMap names with your own.
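A minimal sketch of what such a pod_template_file can look like; every name below (image, PVC, Secret, ConfigMap, mount paths) is a placeholder assumption to adapt:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: placeholder-name  # overridden by the KubernetesExecutor
spec:
  restartPolicy: Never
  containers:
    # Main 'base' container: runs the task sent by the scheduler.
    - name: base
      image: my-registry/airflow:2.6.0   # placeholder: your Airflow image
      volumeMounts:
        - name: dags
          mountPath: /opt/airflow/dags
          readOnly: true
        - name: krb5-config
          mountPath: /etc/krb5.conf
          subPath: krb5.conf
          readOnly: true
        - name: kerberos-ccache          # ticket cache shared with the sidecar
          mountPath: /var/kerberos-ccache
        - name: exit-signals             # used to signal the sidecar to stop
          mountPath: /opt/exit-signals
    # Kerberos sidecar: creates the ticket and refreshes it every hour.
    - name: kerberos-sidecar
      image: my-registry/airflow:2.6.0   # placeholder: your Airflow image
      command: ["/sidecar_entrypoint.sh"]
      volumeMounts:
        - name: kerberos-keytab
          mountPath: /etc/airflow.keytab
          subPath: airflow.keytab
          readOnly: true
        - name: krb5-config
          mountPath: /etc/krb5.conf
          subPath: krb5.conf
          readOnly: true
        - name: kerberos-ccache
          mountPath: /var/kerberos-ccache
        - name: exit-signals
          mountPath: /opt/exit-signals
  volumes:
    - name: dags
      persistentVolumeClaim:
        claimName: airflow-dags          # placeholder: your PVC
    - name: kerberos-keytab
      secret:
        secretName: airflow-keytab       # placeholder: your Secret
    - name: krb5-config
      configMap:
        name: krb5-config                # placeholder: your ConfigMap
    - name: kerberos-ccache
      emptyDir: {}
    - name: exit-signals
      emptyDir: {}
```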

Adding entrypoints

To pass the EXIT signal between the main ‘base’ container and the sidecar, we define specific entrypoints for the worker and the sidecar. They need to be added to the Docker image of your Airflow setup.

Worker entrypoint:

The custom entrypoint for the Worker has two functionalities:

  1. Execute Airflow Command: The script runs "$@", which expands to the command received from the scheduler. It can be any valid Airflow command, such as triggering a task, running a DAG, or performing other workflow-related operations.
    Note: Do not use exec here. In a nutshell, exec replaces the current shell process, which prevents any subsequent line of your bash script from being executed.
  2. Create “EXIT” File: Following the execution of the Airflow command, the script creates a file named “EXIT” within the shared volume. This file serves as an exit signal and its presence indicates to the sidecar container that the Airflow worker has completed its task and is ready for termination.
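A minimal sketch of such a worker entrypoint, assuming the /opt/exit-signals mount path from the template sketch above:

```bash
#!/usr/bin/env bash
# Worker entrypoint (sketch).

# 1. Run the Airflow command passed in by the scheduler.
#    No 'exec' here: the script must keep running after the command ends.
"$@"

# 2. Signal the sidecar that the task is done by creating the EXIT file
#    on the shared 'exit-signals' volume.
touch /opt/exit-signals/EXIT
```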

Sidecar entrypoint:

The second entrypoint, for the sidecar, operates in conjunction with the worker. Let’s see what it does:

  1. Start Airflow Sidecar: The script launches the Kerberos ticket renewer via nohup airflow kerberos &, which runs it as a background subprocess that ignores hangup signals.
  2. Catch the subprocess PID: The echo $! > kerberos.pid and CHILD_PID=$(cat kerberos.pid) lines capture the PID of the subprocess, so that the next part of the script can interact with it.
  3. Monitor for Termination Signal: An infinite loop continuously checks for the presence of an “EXIT” file in the /opt/exit-signals folder. When the file is detected, the script terminates the Kerberos subprocess by killing $CHILD_PID.
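A minimal sketch of such a sidecar entrypoint, under the same path assumptions:

```bash
#!/usr/bin/env bash
# Sidecar entrypoint (sketch).

# 1. Start the Kerberos ticket renewer in the background.
nohup airflow kerberos &

# 2. Capture the PID of the subprocess for later use.
echo $! > kerberos.pid
CHILD_PID=$(cat kerberos.pid)

# 3. Wait until the worker drops the EXIT file, then stop the renewer.
while true; do
  if [ -f /opt/exit-signals/EXIT ]; then
    kill "$CHILD_PID"
    exit 0
  fi
  sleep 1
done
```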

And that’s it! You now have Kerberos ticket refresh with both the KubernetesExecutor and the CeleryKubernetesExecutor 🎉

The integration of Kerberos sidecar support with Airflow presents new opportunities for secure and efficient workflow execution in Kubernetes environments. With the shared volume approach and custom entrypoints, communication between the main ‘base’ container and the sidecar becomes seamless, enabling secure data workflows and streamlined Kerberos management.

Credits: Jed Cunningham, Airflow PMC member, for pointing out this solution on the Airflow Slack. Steven Aldigen, writer of a nice Medium article about EXIT signals on Kubernetes. And the Stack Overflow community for detailing how to use nohup.
