End-to-end tests using Apache Airflow

Pedro Galvao
3 min read · Mar 16, 2020


Apache Airflow is a platform created by the community to programmatically author, schedule, and monitor workflows. You create the “flow” using its operators, or by writing your own, and Apache Airflow takes care of the execution and scheduling.

Photo by @markusspiske

I’ve seen lots of examples that use this platform as a CI/CD tool. But there are other applications too.

Imagine writing end-to-end test scenarios that already come with retries, queueing, automatic scheduling, and reusable flows, and that let you see everything from a user interface. Apache Airflow to the rescue! I’ve written this article to show that it can be done, but it doesn’t cover every detail, such as installation, configuration, or implementation. I strongly suggest taking a look at the documentation tutorial if this is your first time using it.

Ok, sounds interesting, but how?

First of all, let’s create an example. To keep it simple, suppose we have a new API that returns this expected message:

{"msg": "this new api return"}

And we need to make sure it’s always returning the expected result. Our testing plan is really simple:

Testing plan

With the test steps in mind, the only thing left is to implement them on Apache Airflow.

Creating the Operations

Operations are the actual implementation of the steps. Our step to check the API response is an HTTP request that checks the response. Luckily, Apache Airflow already has an operator that does exactly this, called SimpleHttpOperator.

Apache Airflow lets you create all the connection parameters, such as host, port, and authentication, separately in the user interface. For our example, the connection parameters were already saved under the my_api ID.

To check the response, we can use a method similar to the one below.
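Here is a minimal sketch of that idea, assuming an Airflow 1.10-style import path and the my_api connection mentioned above; the endpoint value and the check_response name are only illustrative:

from airflow.operators.http_operator import SimpleHttpOperator


def check_response(response):
    # Compare the API body against the expected message
    return response.json() == {"msg": "this new api return"}


check_api_response = SimpleHttpOperator(
    task_id='check_response',
    http_conn_id='my_api',          # connection configured in the UI
    endpoint='/',                   # hypothetical endpoint path
    method='GET',
    response_check=check_response,
    dag=dag,                        # DAG object defined in the full example below
)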

Even if we couldn’t find an existing operator, it is possible to create custom operators. The only restriction is that they need to be implemented in Python, and they can be easily integrated into all your DAGs.
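As a rough illustration (not part of our example), a custom operator could look something like this, assuming Airflow 1.10-style imports; CheckApiOperator, endpoint, and expected_body are hypothetical names:

import requests

from airflow.models import BaseOperator
from airflow.utils.decorators import apply_defaults


class CheckApiOperator(BaseOperator):
    """Hypothetical operator that calls an API and validates its JSON body."""

    @apply_defaults
    def __init__(self, endpoint, expected_body, *args, **kwargs):
        super(CheckApiOperator, self).__init__(*args, **kwargs)
        self.endpoint = endpoint
        self.expected_body = expected_body

    def execute(self, context):
        # Fail the task when the API does not return the expected body
        response = requests.get(self.endpoint)
        if response.json() != self.expected_body:
            raise ValueError('Unexpected API response: %s' % response.text)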

Creating the DAG

In Airflow, Directed Acyclic Graphs (DAGs) are collections of all the tasks we want to run, organized in a way that reflects their relationships and dependencies.

In our example, we have a Start operation, calling the Check API Response operation, which subsequently calls the End operation. Although our example could not be simpler, you can create big and complex DAGs, retrying operations when necessary, or executing operations in parallel and merging them.

Everything together should look like this:
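Below is a sketch of what the check_api DAG file could look like, assuming Airflow 1.10-style imports and the my_api connection from earlier; the start date, schedule, and endpoint values are illustrative:

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.http_operator import SimpleHttpOperator

default_args = {
    'owner': 'airflow',
    'retries': 1,                        # one retry, i.e. two attempts in total
    'retry_delay': timedelta(minutes=5),
}


def check_response(response):
    # The API is expected to return exactly this message
    return response.json() == {"msg": "this new api return"}


with DAG(
    dag_id='check_api',
    default_args=default_args,
    start_date=datetime(2020, 3, 16),    # illustrative start date
    schedule_interval='@daily',
    catchup=False,
) as dag:

    start = DummyOperator(task_id='start')

    check_api_response = SimpleHttpOperator(
        task_id='check_response',
        http_conn_id='my_api',           # connection configured in the UI
        endpoint='/',                    # hypothetical endpoint path
        method='GET',
        response_check=check_response,
    )

    end = DummyOperator(task_id='end')

    start >> check_api_response >> end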

Apache Airflow even lets us test our operations to check that everything is working as intended. After testing our check_response operation, it’s time to put the DAG file under the /dags folder.

Checking the DAG execution

Checking the user interface, we can see our new DAG listed:

Dags list in the platform user interface

Checking the DAG graph view, we can see that it contains all the steps declared in our testing plan

check_api DAG graphic view

To make sure that our operation is accessing the correct API, we can look at the operation log

check_response operation logs

And there we have it. We created a test with just a few lines of code that will run daily, and even has two attempts!

For the next posts, I intend to cover:

  • E2E using graphical interfaces
  • Writing E2E tests using Apache Airflow in Cloud Composer
