Apache Airflow as an external scheduler for distributed systems
Have you ever needed a reliable external scheduler for your distributed system? Apache Airflow (originally built at Airbnb) ships with a good, stable scheduler.
So how can we use Airflow for this? Here's how we did it.
It was easier to create an endpoint than a worker, so we created an endpoint in the service and deployed it:
https://domain.com/api/1.0/purpose?params=value
Then we just created a simple Airflow DAG to trigger this endpoint on a schedule.
So what are the advantages of this method?
- Airflow has SLA management, failure notifications, execution timeouts, and retries, all of which are very useful here.
- If we need to trigger more than one job, we just run a loop and create multiple tasks. Airflow triggers them in parallel, and as long as you have a good load balancer in front, the load is spread across your distributed servers. So it's extremely modular and scalable.
- The Airflow UI makes it easy to check the logs/response from the endpoint when it fails.
- It also gives you a nice graph like the one below, where we can keep track of the time taken by the job.
Takeaway
Next time you need to build a scheduled worker on a distributed system, this could be a good option.
Airflow is also easy to set up and get started with, so take a look at it first!