Airflow + AWS Secrets Manager + LocalStack

Giuliana Belli
5 min read · Mar 13, 2023


When developing data pipelines, we often need to interact with cloud services such as AWS. For example, we may need to move data from a database to an AWS S3 bucket.

My goal was to switch the secrets backend of an Airflow instance to AWS Secrets Manager. My motivation came from the fact that I had to access the same connection information from different microservices, so I needed a central place to store it and avoid keeping confidential information in several places.

Photo: on the way to El Mojón, Córdoba, Argentina

By default, Airflow first tries to find your secrets in environment variables and then in the metastore DB. When we enable an alternative secrets backend, that backend is searched first, then environment variables, and finally the metastore. You can read more about this in the Airflow documentation on secrets backends.

The most uncertain part for me was how to develop this integration locally before moving to production, without spending my entire AWS budget at once.

How can we avoid that? According to its documentation, “LocalStack” makes this possible.

LocalStack is a cloud service emulator that runs in a single container on your laptop or in your CI environment. With LocalStack, you can run your AWS applications or Lambdas entirely on your local machine without connecting to a remote cloud provider!

With LocalStack you can basically emulate an AWS service and run the same commands you would run in production, but locally. For each AWS CLI command, you add the parameter --endpoint-url=http://localstack:4566 to point the CLI at your LocalStack instance instead of the default AWS endpoint.
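For example, here is a quick check (a minimal sketch; the dummy credentials and region are arbitrary, since LocalStack validates neither) that lists the secrets stored in LocalStack from another container on the same Docker network:

# List the secrets stored in LocalStack instead of real AWS.
# Credentials are dummy values; LocalStack accepts any string.
AWS_ACCESS_KEY_ID=test \
AWS_SECRET_ACCESS_KEY=test \
aws --endpoint-url=http://localstack:4566 secretsmanager list-secrets --region us-east-1

(If you run this from your host instead of from a container, replace localstack with localhost.)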

I found many tutorials on how to use LocalStack with docker-compose and AWS S3, but almost none combining Airflow and AWS Secrets Manager. So I thought someone should write one (at least for the future me, to remember how to make it work).

The recipe 🍳

First, add the LocalStack service to your Airflow docker-compose file. You can check an example of how the LocalStack service should look in the LocalStack documentation. Mine ended up looking like this:

localstack:
  image: localstack/localstack:1.4
  environment:
    DEBUG: 1
    DOCKER_HOST: unix:///var/run/docker.sock
  ports:
    - 4566:4566
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock
    - ./localstack/secretsmanager:/docker-entrypoint-initaws.d

I removed some parts I didn’t need and added a volume mapped to the init entrypoint directory, so that as soon as the LocalStack service starts, it runs the commands that create the credentials my local environment needs.
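To verify that the container is healthy before wiring Airflow to it, you can hit LocalStack’s health endpoint from the host (a quick sanity check; /_localstack/health is the path exposed by LocalStack 1.x):

# Ask LocalStack which services are available;
# "secretsmanager" should appear in the response.
curl http://localhost:4566/_localstack/health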

You can find these commands in the AWS documentation; the only difference, as I mentioned before, is adding the --endpoint-url parameter to each command.

In ./localstack/secretsmanager I created a file named init.sh with these commands:

#!/bin/bash
AWS_ACCESS_KEY_ID=jon \
AWS_SECRET_ACCESS_KEY=doe \
aws --endpoint-url=http://localstack:4566 secretsmanager create-secret --name connections/connection1 --region us-east-1 --secret-string '{"password":"root","conn_type":"mysql","port":"3306","host":"db","login":"root","schema":"db1"}'

AWS_ACCESS_KEY_ID=jon \
AWS_SECRET_ACCESS_KEY=doe \
aws --endpoint-url=http://localstack:4566 secretsmanager create-secret --name connections/connection2 --region us-east-1 --secret-string '{"password":"root","conn_type":"mysql","port":"3306","host":"db2","login":"root"}'

AWS_ACCESS_KEY_ID=jon \
AWS_SECRET_ACCESS_KEY=doe \
aws --endpoint-url=http://localstack:4566 secretsmanager create-secret --name connections/connection3 --region us-east-1 --secret-string '{"password":"root","conn_type":"postgres","port":"5432","host":"db3","login":"postgres"}'

To execute commands against your LocalStack instance, your credentials (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY) can be any string you want; LocalStack won’t validate them.
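After the container starts, you can read one of the secrets back to confirm init.sh ran (a sketch; note localhost instead of localstack because this runs from the host, outside the compose network):

# Read back one of the secrets created by init.sh.
AWS_ACCESS_KEY_ID=jon \
AWS_SECRET_ACCESS_KEY=doe \
aws --endpoint-url=http://localhost:4566 secretsmanager get-secret-value --secret-id connections/connection1 --region us-east-1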

Now let’s tell Airflow about the new secrets backend we will use for development. For this, we need to add the following environment variables to the Airflow services in the docker-compose file:

AIRFLOW__SECRETS__BACKEND: airflow.providers.amazon.aws.secrets.secrets_manager.SecretsManagerBackend
AIRFLOW__SECRETS__BACKEND_KWARGS: '{"connections_prefix": "connections", "variables_prefix": null, "region_name": "us-east-1", "endpoint_url": "http://localstack:4566"}'
AWS_ACCESS_KEY_ID: john
AWS_SECRET_ACCESS_KEY: doe

Let’s review each of them:

AIRFLOW__SECRETS__BACKEND: the fully-qualified class name of the backend we want to enable. This tells Airflow to look for secrets in this backend first, instead of going straight to environment variables and the metadata DB.

AIRFLOW__SECRETS__BACKEND_KWARGS: the parameters the backend needs to talk to AWS Secrets Manager. Let’s check each of them:

  • connections_prefix: the prefix that the connection secrets will have. Airflow will only be able to find and read the ones whose names start with this prefix (here, “connections”, matching the names created by init.sh).
  • variables_prefix: the same as the previous one but for Variables. In my case, I didn’t need to store Variables in AWS Secrets Manager, so I left this as null, which tells Airflow not to look for Variables in this backend.
  • region_name: LocalStack does not validate this. Any AWS region name will work.
  • endpoint_url: the LocalStack service URL (so requests go to the local service instead of the default AWS endpoint).

This JSON will be passed as kwargs to the __init__ method of the AWS Secrets Manager Backend class.

AWS_ACCESS_KEY_ID: again, LocalStack does not validate this, so it can be any string; the same goes for AWS_SECRET_ACCESS_KEY.

And that’s pretty much it! docker compose up and let the magic begin.
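Once everything is up, you can confirm from inside an Airflow container that a connection now resolves from the new backend (a sketch; the service name airflow-webserver is an assumption taken from the official Airflow docker-compose file):

# Resolve "connection1" through Airflow; with the backend enabled,
# this comes from the secret "connections/connection1" in LocalStack.
docker compose exec airflow-webserver airflow connections get connection1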

Now you can check the logs, and you will see Airflow looking up your secrets in your LocalStack instance.

2023-03-10 08:46:48 /usr/local/bin/docker-entrypoint.sh: running /docker-entrypoint-initaws.d/init.sh
2023-03-10 08:46:49 2023-03-10T11:46:49.273 DEBUG --- [ asgi_gw_0] plugin.manager : instantiating plugin PluginSpec(localstack.aws.provider.secretsmanager:default = <function secretsmanager at 0x7f486c4b4dc0>)
2023-03-10 08:46:49 2023-03-10T11:46:49.273 DEBUG --- [ asgi_gw_0] plugin.manager : loading plugin localstack.aws.provider:secretsmanager:default
2023-03-10 08:46:49 2023-03-10T11:46:49.726 DEBUG --- [ asgi_gw_0] l.services.plugins : checking service health secretsmanager:4566
2023-03-10 08:46:50 2023-03-10T11:46:50.080 INFO --- [ asgi_gw_0] localstack.request.aws : AWS secretsmanager.CreateSecret => 200
2023-03-10 08:46:50 {
2023-03-10 08:46:50 "ARN": "arn:aws:secretsmanager:us-east-1:000000000000:secret:connections/connection1-YLOzPW",
2023-03-10 08:46:50 "Name": "connections/connection1",
2023-03-10 08:46:50 "VersionId": "21481a92-ae27-46e9-8625-7afd183671bd"
2023-03-10 08:46:50 }
2023-03-10 08:46:50 2023-03-10T11:46:50.683 INFO --- [ asgi_gw_0] localstack.request.aws : AWS secretsmanager.CreateSecret => 200
2023-03-10 08:46:50 {
2023-03-10 08:46:50 "ARN": "arn:aws:secretsmanager:us-east-1:000000000000:secret:connections/connection2-rrWIGu",
2023-03-10 08:46:50 "Name": "connections/connection2",
2023-03-10 08:46:50 "VersionId": "e3793c82-c528-42b0-8ab1-0ed3aa2314df"
2023-03-10 08:46:50 }
2023-03-10 08:46:51 2023-03-10T11:46:51.335 INFO --- [ asgi_gw_0] localstack.request.aws : AWS secretsmanager.CreateSecret => 200
2023-03-10 08:46:51 {
2023-03-10 08:46:51 "ARN": "arn:aws:secretsmanager:us-east-1:000000000000:secret:connections/connection3-uAuRmF",
2023-03-10 08:46:51 "Name": "connections/connection3",
2023-03-10 08:46:51 "VersionId": "3dddf8d6-b86c-48df-a066-bd77b9066ab2"
2023-03-10 08:46:51 }
2023-03-10 08:46:51
2023-03-10 08:57:09 2023-03-10T11:57:09.357 INFO --- [ asgi_gw_0] localstack.request.aws : AWS secretsmanager.GetSecretValue => 200
2023-03-10 09:10:06 2023-03-10T12:10:06.226 INFO --- [ asgi_gw_0] localstack.request.aws : AWS secretsmanager.GetSecretValue => 200
2023-03-10 09:10:11 2023-03-10T12:10:11.271 INFO --- [ asgi_gw_0] localstack.request.aws : AWS secretsmanager.GetSecretValue => 200
2023-03-10 09:10:16 2023-03-10T12:10:16.032 INFO --- [ asgi_gw_0] localstack.request.aws : AWS secretsmanager.GetSecretValue => 200

This way you will be able to:

  • Guarantee a local development environment that is as close to production as possible, without spending budget on DEV accounts in AWS.
  • Share the exact same development environment with your teammates.
  • Estimate, from the LocalStack logs, the number of requests that will be sent to AWS.
  • Get a better estimate of costs before implementing this new AWS service in your production infrastructure.

Hopefully, this will help someone!
