Triggering Airflow Workflows After Data Modification in Amazon Aurora

Deniz Parmaksız
Insider Engineering
3 min read · Mar 28, 2022

Amazon Aurora is a great choice for a scalable, available, reliable, and cost-effective MySQL- and PostgreSQL-compatible RDBMS, and we use it heavily at Insider. It enables us to focus more on product development while the managed service handles time-consuming tasks like hardware provisioning, database setup, patching, and backups. One of the coolest features of Amazon Aurora is its integration with other AWS services such as AWS Lambda, Amazon S3, and even Amazon SageMaker to run ML tasks.

Integration with AWS Lambda

Our use case integrates Amazon Aurora with AWS Lambda to call the REST API of our Apache Airflow deployment and trigger the respective workflows when a configuration changes. We store customer configurations in a MySQL database, and changes to several configurations require immediate action, such as a product or feature activation.

Amazon Aurora integration with AWS Lambda which calls Apache Airflow REST API.

An example case is as follows: one of our customers creates a new Custom Conversion Prediction action using our product, which allows them to generate predictions for a custom goal they define, such as a page visit. Of course, to start generating predictions, there must be a deployed machine learning model, and such models need a dataset of features and labels for training. So there is a whole machine learning pipeline to run. If we waited for the next scheduled training, it could take a long time before we start delivering predictions to the customer, which is not ideal.

CREATE TRIGGER lambda_trigger AFTER INSERT ON TABLE_NAME
FOR EACH ROW
BEGIN
  CALL mysql.lambda_async('LAMBDA_FUNCTION_ARN', 'JSON_PAYLOAD');
END

Our solution is to use a database trigger on insert events to our changelog table to invoke a Lambda function that receives the parameters about the customer and the current configuration. That Lambda function calls the dagRuns endpoint of the Airflow REST API to trigger an Airflow DAG, which runs the tasks of our machine learning pipeline. At the end of the DAG run, a machine learning model is deployed so that we can start generating real-time predictions for the defined goal.
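A minimal sketch of such a Lambda function is shown below. It assumes Airflow 2's stable REST API (POST /api/v1/dags/{dag_id}/dagRuns) and that the JSON payload sent by the database trigger carries the DAG id and run parameters; the environment variable names and the authentication header are illustrative placeholders, not our exact implementation.

```python
import json
import os
import urllib.request

# Hypothetical configuration -- adjust to your Airflow deployment.
AIRFLOW_BASE_URL = os.environ.get("AIRFLOW_BASE_URL", "https://airflow.example.com")
AIRFLOW_AUTH_HEADER = os.environ.get("AIRFLOW_AUTH_HEADER", "")  # e.g. "Basic <base64 credentials>"


def build_dag_run_request(base_url, dag_id, conf):
    """Build the URL and JSON body for Airflow's stable REST API dagRuns endpoint."""
    url = f"{base_url}/api/v1/dags/{dag_id}/dagRuns"
    body = json.dumps({"conf": conf})
    return url, body


def handler(event, context):
    # `event` is the JSON payload passed by mysql.lambda_async from the trigger;
    # here we assume it carries the target DAG id and the run parameters.
    url, body = build_dag_run_request(AIRFLOW_BASE_URL, event["dag_id"], event.get("conf", {}))
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": AIRFLOW_AUTH_HEADER},
    )
    # Queues a new DAG run and returns Airflow's response to the caller.
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

Keeping the request construction in a separate pure function makes the payload format easy to unit-test without hitting the Airflow API.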

Configuring Amazon Aurora

To invoke a Lambda function from Aurora, the cluster first needs permission to access it. We manage that by attaching an IAM policy that grants Lambda invoke access to an IAM role, and then attaching that IAM role to the RDS cluster. It is also important to set the aws_default_lambda_role DB cluster parameter to the ARN of the same IAM role, so that the role is used when invoking a Lambda function. Finally, a VPC endpoint for Lambda (com.amazonaws.region.lambda) is required if the RDS cluster is in a private subnet.
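As a sketch, the attached policy can be as narrow as a single lambda:InvokeFunction grant on the target function; the region, account id, and function name below are placeholders:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "lambda:InvokeFunction",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:FUNCTION_NAME"
    }
  ]
}
```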

lambda_async(
  lambda_function_ARN,
  JSON_payload
)

Invoking a Lambda function from Aurora MySQL works through the built-in functions lambda_sync and lambda_async. The database user needs to be granted access to call those functions; once granted, they can invoke any Lambda function that the RDS cluster's role permits. In our case, we invoke the Lambda function that calls the dagRuns endpoint of the Airflow REST API with the DAG id and parameters, so that the required DAG runs with the required parameters to execute all the necessary tasks.
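On Aurora MySQL 2.x, granting that access looks like the statement below (the user name and host are placeholders); note that Aurora MySQL 3 instead grants the AWS_LAMBDA_ACCESS role to the user.

```sql
-- Allow this database user to call lambda_sync / lambda_async
GRANT INVOKE LAMBDA ON *.* TO 'app_user'@'%';
```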

Conclusion

The seamless integration of Amazon Aurora with other AWS services unlocks unique flows that cannot be achieved natively with other RDBMS solutions. We have been using the Lambda integration for several solutions, as a Lambda function is capable of solving a wide variety of problems with its event-driven, serverless architecture. Finally, adding Apache Airflow to the solution creates a robust pipeline and enables running complicated workflows via a single trigger.

Deniz Parmaksız
Staff Machine Learning Engineer at Insider | AWS Ambassador