Figure: Overview of an Airflow setup using Google Cloud Composer, with Airflow variables and connections managed through Google Secret Manager (highlighted in green).

Manage Airflow variables in Terraform using Google Secret Manager

Mads Emil Dahlgaard
7 min read · May 30, 2023


Airflow Variables can be highly useful for retrieving dynamic content at runtime. However, as the scale and complexity of your Airflow setup grows, managing these variables becomes a considerable challenge. This guide provides a practical, step-by-step approach to managing Airflow variables in Terraform using Google Secret Manager as a backend. This approach allows you to version control variables and improve security for variables containing sensitive content. The result is a scalable, secure, and easily manageable solution that improves the efficiency of your data workflows.

The steps are outlined below:

  1. Configure Airflow to use Secret Manager
  2. Manage creation of secrets in Terraform
  3. Manage secret payloads in Secret Manager
  4. Test the setup

This guide references official documentation where possible. The focus will be on Cloud Composer, Google’s managed version of Airflow. However, the concepts are generally applicable to any setup of Airflow.

Configure Airflow to use Secret Manager

The first step is to configure your Airflow setup to use Google Secret Manager as a secrets backend. You can find the official documentation on how to do that here. You need to set the backend option in the [secrets] section. This can be configured either through the airflow.cfg file or using environment variables like below.

export AIRFLOW__SECRETS__BACKEND=airflow.providers.google.cloud.secrets.secret_manager.CloudSecretManagerBackend

For Cloud Composer this can be done through the Google Cloud UI. You can find the documentation here. You need to enter the following values.

section: secrets
key: backend
value: airflow.providers.google.cloud.secrets.secret_manager.CloudSecretManagerBackend
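If you prefer the gcloud CLI over the UI, the same override can be applied with a command along these lines (the environment name and location below are placeholders, not values from this article):

```shell
# Placeholder environment name and location; replace with your own.
gcloud composer environments update my-composer-env \
  --location europe-west1 \
  --update-airflow-configs=secrets-backend=airflow.providers.google.cloud.secrets.secret_manager.CloudSecretManagerBackend
```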

If you manage Cloud Composer through Terraform, you can specify the configuration like this:

resource "google_composer_environment" "local_name" {
  name   = "environment_name"
  region = "..."

  config {
    node_config {
      service_account = google_service_account.composer-sa.name
    }

    software_config {
      image_version = "composer-2-airflow-2"
      airflow_config_overrides = {
        webserver-instance_name = "🚀"
        secrets-backend         = "airflow.providers.google.cloud.secrets.secret_manager.CloudSecretManagerBackend"
      }
    }

    workloads_config {
      ...
    }

    environment_size = "ENVIRONMENT_SIZE_MEDIUM"
  }
}

In the Terraform block above, the configuration is specified under software_config. Additionally, you can specify many other configurations such as webserver-instance_name (see full list here).

There is another configuration in the [secrets] section called backend_kwargs that controls the integration with Secret Manager, where you can specify a path to a service account keyfile and the project ID for reading secrets. By default, Airflow uses Application Default Credentials and expects the secrets in Secret Manager to be prefixed by airflow-variables. You can see the full list of backend parameters here.
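As a sketch, backend_kwargs can also be set as a JSON string through an environment variable. The project ID below is illustrative; the keys are documented parameters of CloudSecretManagerBackend:

```shell
# Illustrative project ID; variables_prefix and sep shown here are the defaults.
export AIRFLOW__SECRETS__BACKEND_KWARGS='{"project_id": "my-gcp-project", "variables_prefix": "airflow-variables", "sep": "-"}'
```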

Make sure your Airflow setup has the necessary permissions, specifically the secretmanager.versions.access permission. The easiest way is to grant the roles/secretmanager.secretAccessor role to the relevant service account.
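For example, the role could be granted at the project level with the gcloud CLI. The project and service account names below are placeholders; in practice you may want to scope the role to individual secrets instead:

```shell
# Placeholder project and service account; prefer scoping to specific secrets where possible.
gcloud projects add-iam-policy-binding my-gcp-project \
  --member="serviceAccount:composer-sa@my-gcp-project.iam.gserviceaccount.com" \
  --role="roles/secretmanager.secretAccessor"
```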

You can safely apply these settings even if you have variables specified directly in Airflow. Airflow will search the secrets backend first and fall back to the metastore when necessary. This means that you can gradually migrate your secrets to Secret Manager. Once the configuration is up and running, you are ready for the next part.
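This lookup order can be pictured as a simple chain. The sketch below is plain Python, not Airflow's actual implementation; the two dictionaries stand in for Secret Manager and the metastore:

```python
def lookup_variable(key, backends):
    # Try each backend in order; the first backend that knows the key wins.
    for backend in backends:
        value = backend.get(key)
        if value is not None:
            return value
    raise KeyError(key)

# Stand-ins for the two stores: Secret Manager is always searched first.
secret_manager = {"mysecret": "abc123"}                      # already migrated
metastore = {"mysecret": "stale-value", "legacy_var": "42"}  # not yet migrated

print(lookup_variable("mysecret", [secret_manager, metastore]))    # abc123 (Secret Manager wins)
print(lookup_variable("legacy_var", [secret_manager, metastore]))  # 42 (falls back to the metastore)
```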

Manage creation of secrets in Terraform

Now that we have configured Airflow to read variables from Secret Manager, the next step is to effectively manage these secrets. This is done through Terraform. The secret name must be prefixed with airflow-variables, and we can create secrets using the google_secret_manager_secret resource in Terraform.

# Add an Airflow variable as a Google secret
resource "google_secret_manager_secret" "airflow-variables" {
  secret_id = "airflow-variables-my_secret"

  replication {
    automatic = true
  }
}

As the number of Airflow variables increases, duplicating the above block of code for every variable becomes inconvenient. We can utilise Terraform locals and the for_each meta-argument to simplify this.

locals {
  airflow-variables = [
    "my_secret-1",
    "my_secret-2",
    "...",
    "my_secret-n",
  ]
}

# Add Airflow variables as Google secrets
resource "google_secret_manager_secret" "airflow-variables" {
  for_each  = toset(local.airflow-variables)
  secret_id = "airflow-variables-${each.key}"

  replication {
    automatic = true
  }
}

The Terraform code above creates the secrets, but it does not add any content to the secrets. It is important to note that we are not interested in managing the actual payloads in Terraform. Instead, secret payloads are handled in Google Secret Manager which allows for secure version control of payloads. A secret can have multiple versions and each version has a corresponding payload.

Access control to the secrets can also be managed through Terraform. A common scenario is to let users of your Airflow setup create new secret versions but restrict viewing of existing secret versions. For instance, we can create two access groups: one where users can add new payloads to a secret without viewing existing payloads, and one where users can both add payloads and view the current payload. This can be handled in Terraform as shown below.


# Allow certain members to add versions to Airflow variables
resource "google_secret_manager_secret_iam_binding" "secretVersionAdder" {
  for_each  = google_secret_manager_secret.airflow-variables
  secret_id = each.value.id
  role      = "roles/secretmanager.secretVersionAdder"
  members = [
    "group:data-analysts@yourcompany.com",
    "group:data-engineers@yourcompany.com",
    "...",
    "user:jane@yourcompany.com",
  ]
}

# Allow certain members to view payloads of Airflow variables
resource "google_secret_manager_secret_iam_binding" "secretAccessor" {
  for_each  = google_secret_manager_secret.airflow-variables
  secret_id = each.value.id
  role      = "roles/secretmanager.secretAccessor"
  members = [
    "group:data-engineers@yourcompany.com",
  ]
}

The members field could also include service accounts, domains, or other identifiers in addition to Google groups and emails used in the example above. Note that we use the google_secret_manager_secret_iam_binding which is authoritative for the given role. While you should generally be careful when using authoritative resources, it is acceptable in this case as the role is assigned for specific variables, not at the project level. Alternatively, you can use google_secret_manager_secret_iam_member which is non-authoritative. With these configurations in place, you are now ready for the final step — adding secret payloads.

Manage secret payloads in Secret Manager

The final step in this process is to manage secret payloads directly within Google Secret Manager. The actual payloads are created through the Google Cloud UI. After applying the previous Terraform code, you should be able to view the secrets in the Google Cloud UI. Adding payloads is a straightforward process. Navigate to Secret Manager, select a secret, and opt to add a new version to it. Each version represents a different payload, providing a clear version history of your secrets and making rollback easier if necessary.
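Equivalently, a payload can be added from the command line instead of the UI. The secret name and value below are examples:

```shell
# Example secret name and value; -n avoids a trailing newline in the payload.
echo -n "abc123" | gcloud secrets versions add airflow-variables-mysecret --data-file=-
```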

Once the payloads have been added to your secrets, they are readily accessible in Airflow through Airflow Variables. This means that Airflow DAGs can fetch these variables, providing a secure means to handle sensitive information such as API keys or other confidential configuration data.

Test the setup

Suppose you’ve created a secret called airflow-variables-mysecret. One of the most straightforward ways to test the connection is by accessing your Airflow environment and executing:

airflow variables get mysecret

This command uses the Airflow CLI to fetch the value of the mysecret variable. Note that the airflow-variables- prefix is dropped; Airflow adds it automatically when querying the secrets backend.

Another method, and arguably the primary use case for Airflow variables, is to incorporate them within your DAGs directly. A basic DAG leveraging the BashOperator could look like this.

import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(dag_id="my_dag", start_date=datetime.datetime(2023, 1, 1)) as dag:
    BashOperator(
        task_id="t1",
        env={"API_KEY": "{{ var.value.mysecret }}"},
        bash_command='[[ "$API_KEY" == "abc123" ]] && echo It is a match',
    )

Airflow masks the variables in any plaintext output, so we need to use a conditional statement in the bash command to verify the variable’s value. In this case, we are using Jinja templating to access the variable. There are generally two ways to fetch variables in Airflow: Variable.get("mysecret") or through Jinja templating like {{ var.value.mysecret }}.

In line with best practices, it’s generally advisable to avoid using Variable.get() in top-level Python code. Jinja templates do not make requests when scanning the DAG, while Variable.get() produces a request every time the DAG is parsed. Airflow scans the DAGs folder every 30 seconds by default and extensive use of Variable.get() could significantly slow down your Airflow setup due to repeated requests.
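To make the cost concrete, here is a toy simulation in plain Python (not Airflow code): a counter stands in for the secrets backend, and we replay an hour of DAG parsing at the default 30-second interval.

```python
calls = {"count": 0}

def variable_get(key):
    # Stand-in for Airflow's Variable.get(): every call hits the secrets backend.
    calls["count"] += 1
    return "abc123"

def parse_dag_top_level():
    # Anti-pattern: the variable is fetched at parse time.
    return variable_get("mysecret")

def parse_dag_templated():
    # With Jinja templating, the placeholder string is only resolved at task
    # runtime, so parsing the DAG file makes no backend request.
    return "{{ var.value.mysecret }}"

# One hour of parsing at the default 30-second interval.
for _ in range(120):
    parse_dag_top_level()
    parse_dag_templated()

print(calls["count"])  # 120 backend requests, all from the top-level pattern
```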

Extending to Airflow Connections

The same setup can be extended to manage Airflow connections. The slight difference lies in changing the prefix from airflow-variables to airflow-connections. The creation of Airflow connections as Google secrets through Terraform would look like this.

locals {
  airflow-variables = [
    ...
  ]

  airflow-connections = [
    "my_connection-1",
    "my_connection-2",
    "...",
    "my_connection-n",
  ]
}

# Add Airflow connections as Google secrets
resource "google_secret_manager_secret" "airflow-connections" {
  for_each  = toset(local.airflow-connections)
  secret_id = "airflow-connections-${each.key}"

  replication {
    automatic = true
  }
}

Once this is set up, you can add payloads to these newly created secrets, which now represent Airflow connections, directly through the Secret Manager UI. However, this requires specifying the connection details in a URI format. You can convert a connection into the URI format by changing the values in the Python code snippet below. Just replace the placeholder values with your actual connection details.

import json

from airflow.models.connection import Connection

c = Connection(
    conn_id="some_conn",
    conn_type="mysql",  # Or any other type
    description="connection description",
    host="myhost.com",
    login="myname",
    password="mypassword",
    extra=json.dumps(dict(this_param="some val", that_param="other val*")),
)

print(f"airflow-connections-{c.conn_id}", c.get_uri())
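If you don't have Airflow installed locally, the shape of the URI can be approximated with the standard library alone. This is a simplified sketch of the format, not Airflow's exact get_uri() implementation (which encodes some fields slightly differently):

```python
from urllib.parse import quote, urlencode

def connection_uri(conn_type, host, login, password, extra=None):
    # Approximate Airflow connection URI: <type>://<login>:<password>@<host>/?<extras>
    uri = f"{conn_type}://{quote(login)}:{quote(password)}@{host}/"
    if extra:
        uri += "?" + urlencode(extra)
    return uri

print(connection_uri("mysql", "myhost.com", "myname", "mypassword",
                     {"this_param": "some val"}))
# mysql://myname:mypassword@myhost.com/?this_param=some+val
```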

Conclusion

We’ve outlined how to configure your Airflow environment to fetch secrets from Google Secret Manager, use Terraform for defining and controlling access to these secrets, and manage the actual secret payloads.

In conclusion, leveraging Terraform and Google Secret Manager for managing Airflow variables and connections provides a structured, scalable, and secure approach for any growing data team. The setup might seem a bit complex at first, but the payoffs in terms of increased manageability and improved security are significant in the long run.
