How to enable MWAA to be multi-tenant with RBAC and SSO

David Fustero
Published in adidoescode
6 min read · Jun 16, 2023
Photo by Yevhenii Dubrovskyi on Unsplash

Before we started this project, we faced the problem of how to schedule and execute data workloads.

Once upon a time, our data engineers used Jenkins, our CI/CD tool, to run data workloads, which turned out to be a non-scalable solution. Long jobs ran periodically to perform ETL (extract, transform, and load) tasks, and we found two main problems. First, to know whether a process had finished, the pipeline had to ask the tool executing the task for its state, for example EMR (Elastic MapReduce).

Jenkins does not have any sensor to detect when a task has finished, so a running node has to poll the state periodically. This busy waiting is very costly, since a job can take hours to finish. Second, if an error happens at the end of the job, it has to start all over again.

With this non-scalable solution the cost was huge, so we needed to migrate to something else.

We looked into different tools such as Argo, and after analysing all of them we chose Airflow for our data orchestration workloads. We then needed to decide which cloud infrastructure was best for keeping its components (scheduler, workers, etc.) up and running.

The goal of this article is not to explain what Airflow is or how to use it, but how to get a workflow management tool with role-based access control and single sign-on, which are not available out of the box in the AWS solution.

AIRFLOW INFRASTRUCTURE OPTIONS

We considered three different solutions:

  • Self-managed Airflow: deployed in a Kubernetes cluster using a Helm chart
  • PaaS Airflow
    - MWAA(Amazon Managed Workflows for Apache Airflow)
    - Any other managed Airflow

Self-managed Airflow provides more customization than the other options but has a big drawback: we have to operate the executor, the databases (Redis and PostgreSQL), and so on ourselves.

We discarded it because we are a small team without enough expertise in handling operations on Kubernetes (K8s).

MWAA solves the previous problem: AWS manages the required resources for you and makes deployment very easy. However, it only offers some Airflow versions and, unlike self-managed Airflow, does not provide RBAC (role-based access control) by default.

We evaluated other PaaS Airflow solutions, but it would have taken some time to get up to speed with them in the company, and the cost was too high for the features offered.

Given the data points we had, we decided to go for MWAA as our workload management tool.

Our requirements for MWAA

MWAA cannot be configured for multi-tenancy out of the box: it does not provide RBAC, and SSO (single sign-on) is not available for accessing the instance without logging in to the AWS console. We needed to build these features ourselves with different AWS resources.

First, how could we implement SSO with authentication against Azure Active Directory? Our first approach was to use AWS Single Sign-On, recently renamed IAM Identity Center.

This implementation had one main problem for us: users need to log in to the AWS console to access MWAA environments. It is true that their access to AWS resources can be restricted. Nevertheless, it is not ideal, so we looked for another implementation.

Our final solution was to combine an Application Load Balancer and Cognito to log in to the instance. Cognito uses an identity provider to connect to Azure AD so that we can log in with our own users, and the load balancer provides a URL to access the instance without giving users AWS console access.
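To make the load balancer part more concrete, here is a minimal sketch of the two ordered actions such a listener rule could carry: authenticate with Cognito first, then forward to the target group. The function name and all ARNs and IDs passed to it are hypothetical; the dictionary keys follow the `Actions` structure that boto3's `elbv2.create_rule` accepts. It illustrates the idea and is not our exact configuration.

```python
from typing import Any


def cognito_then_forward_actions(
    user_pool_arn: str,
    user_pool_client_id: str,
    user_pool_domain: str,
    target_group_arn: str,
) -> list[dict[str, Any]]:
    """Build the ordered action list for an ALB listener rule.

    Action 1 sends unauthenticated users through the Cognito login flow.
    Action 2 forwards authenticated requests to the target group.
    """
    return [
        {
            "Type": "authenticate-cognito",
            "Order": 1,
            "AuthenticateCognitoConfig": {
                "UserPoolArn": user_pool_arn,
                "UserPoolClientId": user_pool_client_id,
                "UserPoolDomain": user_pool_domain,
                "Scope": "openid",
                # Redirect to the login page instead of rejecting the request.
                "OnUnauthenticatedRequest": "authenticate",
            },
        },
        {"Type": "forward", "Order": 2, "TargetGroupArn": target_group_arn},
    ]
```

The returned list would be passed as the `Actions` argument of `create_rule`, together with the listener ARN, the conditions and the priority of the rule.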

We also need RBAC so that different teams can use the same instance without breaking security principles. To accomplish this we need the following:

  • Role creation for new teams, ideally automated
  • DAG permissions assigned to those roles; a wildcard or adding each DAG to the role automatically should be enough
  • Secrets isolation for variables and connections

First of all, we wanted to create roles automatically, and our initial idea was to use the Airflow API. However, AWS had disabled the Airflow API completely.

MWAA does offer a CLI option to perform some automated requests, but AWS blocks some of the commands.

Role creation is available, but you cannot choose the permissions to assign; you can only create a role with a custom name, so we discarded this option at the very beginning.
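For reference, this is roughly what a call to the MWAA CLI endpoint looks like. The sketch only builds the HTTP request and decodes the response; it does not call AWS. The helper names and the role name `team_a` are hypothetical. The `/aws_mwaa/cli` path, the bearer token header and the base64-encoded `stdout` and `stderr` fields follow the AWS documentation for CLI tokens, where the token and hostname come from `boto3.client("mwaa").create_cli_token(Name=...)`.

```python
import base64
import json
import urllib.request


def build_cli_request(hostname: str, cli_token: str, command: str) -> urllib.request.Request:
    """Build the POST request that runs one Airflow CLI command in MWAA."""
    return urllib.request.Request(
        url=f"https://{hostname}/aws_mwaa/cli",
        data=command.encode("utf-8"),  # e.g. "roles create team_a"
        headers={
            "Authorization": f"Bearer {cli_token}",
            "Content-Type": "text/plain",
        },
        method="POST",
    )


def decode_cli_response(body: bytes) -> tuple[str, str]:
    """Decode the base64-encoded stdout and stderr of the CLI response."""
    payload = json.loads(body)
    stdout = base64.b64decode(payload["stdout"]).decode("utf-8")
    stderr = base64.b64decode(payload["stderr"]).decode("utf-8")
    return stdout, stderr
```

Sending the request with `urllib.request.urlopen` and passing the response body to `decode_cli_response` would return the command output. As described above, this creates a role with a name but without any permissions.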

We tried to avoid manual steps as much as we could, but granting permissions was a challenge. At that point we did not find an easy and scalable solution, so we decided to grant roles manually.

There is another constraint: a user cannot log in with the team role the first time. Why? The Airflow role used at login has to be one of the default ones: Admin, Public, User, Op, or Viewer. We tried to log in with a custom role, but it did not work. All default roles have access to DAGs except Public, which has no permissions at all, not even for the UI. Users just need to tell us once they have access to the instance so that we can assign them their team role.

Secondly, we need to assign permissions to the team role in order to have DAG isolation. The previous manual step is a one-time action when a new team is created, but here we must avoid any manual action for every new DAG. To accomplish that, we decided to use the access_control DAG parameter: when a DAG is created, you can select the roles you want to grant permissions to.
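As an illustration, a DAG can declare its own role permissions through `access_control`, a mapping from role name to a set of DAG-level permissions such as `can_read` and `can_edit`. The helper names, the role name and the DAG ID below are hypothetical, and the chosen permissions are an assumption about what a team would typically need.

```python
from datetime import datetime


def team_access_control(team_role: str, read_only: bool = False) -> dict[str, set[str]]:
    """Return the access_control mapping for a single team role."""
    permissions = {"can_read"} if read_only else {"can_read", "can_edit"}
    return {team_role: permissions}


def build_team_dag(team_role: str, dag_id: str):
    """Create a DAG that only the given team role (and Admin) can see."""
    # Imported lazily so the helper above works without Airflow installed.
    from airflow import DAG

    return DAG(
        dag_id=dag_id,
        start_date=datetime(2023, 1, 1),
        catchup=False,
        access_control=team_access_control(team_role),
    )
```

In a DAG file this could be used as `dag = build_team_dag("team_a", "team_a_daily_load")`. The role itself must already exist in Airflow, which is the manual step described earlier.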

Finally, we must isolate Airflow secrets. MWAA does not offer any fine-grained control over variables or connections. However, MWAA can read secrets from AWS Secrets Manager, where we can isolate them using team paths. We share IAM user credentials with each team, and each IAM user has access only to its team's path in Secrets Manager.
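The path-based isolation can be sketched as an IAM policy that only allows reading secrets under one team's path. The function name, the `airflow/<team>/` path layout, the region and the account ID are assumptions made for the example and not necessarily the layout we use.

```python
import json


def team_secrets_policy(region: str, account_id: str, team: str, prefix: str = "airflow") -> dict:
    """Build an IAM policy document limited to one team's secrets path."""
    resource = f"arn:aws:secretsmanager:{region}:{account_id}:secret:{prefix}/{team}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadOwnTeamSecretsOnly",
                "Effect": "Allow",
                "Action": [
                    "secretsmanager:GetSecretValue",
                    "secretsmanager:DescribeSecret",
                ],
                "Resource": resource,
            }
        ],
    }


# Hypothetical values, shown only to illustrate the resulting document.
policy_json = json.dumps(team_secrets_policy("eu-west-1", "123456789012", "team-a"), indent=2)
```

Attaching a policy like this to the IAM user of a team means its credentials can read `airflow/team-a/...` secrets but nothing under another team's path.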

ARCHITECTURE

We have mentioned the main resources we created to achieve our objective, a multi-tenant RBAC environment with SSO access. These are not the only ones, though: we also have a DynamoDB table and a Lambda function for the login, Route 53 to have prettier URLs with certificates, target groups to let the load balancer forward requests properly, and security additions such as a Web Application Firewall to deny fraudulent requests.

SSO and RBAC Architecture

The diagram above shows how the resources are connected. Route 53 is straightforward: one alias per load balancer. There is one load balancer per MWAA environment, with rules to authenticate with Cognito and either create the web login token or redirect to the instance.

There is one Cognito user pool with an identity provider that authenticates against Azure AD using an enterprise application. We also need one Cognito app client per instance to retrieve user information from Azure: the user's email and the user's groups that are part of the enterprise application.

The last part consists of a shared Lambda function and DynamoDB table. To check whether a user has access, we store the required information in the database. We also store additional data to know whether the user is Admin or Viewer and to handle logout. The instance name and AD group tell us which groups have permissions in which instances. The Lambda checks whether the user has access and creates the web login token to redirect to the instance. The following JSON shows the structure of a DynamoDB item.

{
  "airflow-instance-name": "my-instance",
  "ad-group-id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxx",
  "ad-group-type": "Admin",
  "airflow-private-url": "xxxx-vpce.c19.eu-west-1.airflow.amazonaws.com",
  "airflow-project-name": "my-team-name",
  "cognito-app-client-id": "xxxxxxxxxxxxxxxxxxxxxxxxxx"
}
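The access check in the Lambda can be sketched with two small functions working on items shaped like the JSON above. The function names are hypothetical, and preferring Admin over Viewer when a user belongs to several groups is an assumption. The redirect URL follows the format AWS documents for web login tokens, where the hostname and token come from `boto3.client("mwaa").create_web_login_token(Name=...)`.

```python
from typing import Optional


def resolve_group_type(
    items: list[dict], instance: str, user_group_ids: set[str]
) -> Optional[str]:
    """Return "Admin", "Viewer" or None for a user on one Airflow instance."""
    types = {
        item["ad-group-type"]
        for item in items
        if item["airflow-instance-name"] == instance
        and item["ad-group-id"] in user_group_ids
    }
    if "Admin" in types:  # assumption: the most privileged match wins
        return "Admin"
    if "Viewer" in types:
        return "Viewer"
    return None


def alb_redirect(hostname: str, web_token: str) -> dict:
    """Build the ALB-compatible Lambda response that logs the user in."""
    location = f"https://{hostname}/aws_mwaa/aws-console-sso?login=true#{web_token}"
    return {
        "statusCode": 302,
        "statusDescription": "302 Found",
        "headers": {"Location": location},
        "isBase64Encoded": False,
    }
```

A handler would read the user's groups from the claims forwarded by the load balancer, call `resolve_group_type` with the items read from DynamoDB, and return either `alb_redirect(...)` or a 403 response when the result is `None`.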

What’s next

We are happy with the work done so far: an automated multi-tenant RBAC instance with SSO access. However, there is still work to be done.

  • The Airflow role is currently created manually. There is an automated solution that creates the roles using DAGs: https://gist.github.com/TristanBilot/465d2e4aebb359cb1e0e1259978f49ef
  • We detected one problem we should solve: if a new DAG with the same ID as an existing one is uploaded, the existing DAG is overwritten. A temporary solution is to prefix DAG IDs with the team name, but it is still not perfect.
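One way to enforce the team-name prefix is a small check that runs before DAGs are uploaded, for example in CI. This is only a sketch: the function name and the `<team>_` naming convention are assumptions, and how the DAG IDs are collected is left out.

```python
def dag_ids_missing_team_prefix(team: str, dag_ids: list[str]) -> list[str]:
    """Return the DAG IDs that do not start with the '<team>_' prefix."""
    prefix = f"{team}_"
    return [dag_id for dag_id in dag_ids if not dag_id.startswith(prefix)]
```

A pipeline could fail the upload whenever the returned list is not empty, which reduces the risk of one team overwriting a DAG that belongs to another.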

I want to thank the people who participated in this: Ruben, Ismael, and Suraj; Jose, who has been a great leader; and Javier, who has been an inspiration throughout my career and also led this at the beginning.

The views, thoughts, and opinions expressed in the text belong solely to the author, and do not represent the opinion, strategy or goals of the author’s employer, organization, committee or any other group or individual.
