Apache Airflow as an External Scheduler for Distributed Systems

Arunkumar
Feb 16, 2018

Have you ever needed a reliable external scheduler for your distributed systems? Apache Airflow (originally built at Airbnb) has a good, stable scheduler.

So how can we use Airflow for this purpose? Here's how we did it.

It was easier to create an endpoint than a worker, so we added an endpoint to the service and deployed it:

https://domain.com/api/1.0/purpose?params=value
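The post doesn't say which framework the service uses, so here is a minimal sketch assuming a plain Python service built on the standard library's `http.server`. It exposes the job at the same path as the URL above; `run_purpose_job` and `serve` are hypothetical names, with `run_purpose_job` standing in for the real work. The important detail is that a failed job returns a non-2xx status, because that is what lets the scheduler mark the run as failed and retry it.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def run_purpose_job(params):
    """Hypothetical job body; replace with the real scheduled work."""
    # parse_qs returns a list of values per key, so keep the first of each.
    return {"processed": {key: values[0] for key, values in params.items()}}


class JobHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        parsed = urlparse(self.path)
        if parsed.path != "/api/1.0/purpose":
            self._send_json(404, {"error": "not found"})
            return
        try:
            result = run_purpose_job(parse_qs(parsed.query))
        except Exception as exc:
            # Report failures as HTTP 500 so the caller (the scheduler) sees them.
            self._send_json(500, {"error": str(exc)})
            return
        self._send_json(200, result)

    def _send_json(self, status, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host="0.0.0.0", port=8080):
    """Run the endpoint; each server behind the load balancer would call this."""
    HTTPServer((host, port), JobHandler).serve_forever()
```

Keeping the job behind a plain HTTP GET means the scheduler only needs to make a request and check the status code; it doesn't need to know anything about how the work is done.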

Then we created a simple Airflow DAG to trigger this endpoint on a schedule.
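Here is a minimal sketch of such a DAG, assuming Airflow 2.4 or later in the 2.x line (for the `schedule` argument; the `sla` argument was removed in Airflow 3). It uses `PythonOperator` with the standard library's `urllib`; the HTTP provider's `SimpleHttpOperator` would be an alternative, but it needs an Airflow connection configured. The DAG id, schedule, URL and alert address are hypothetical placeholders, and the `default_args` show where retries, an execution timeout, an SLA and failure e-mails (which need SMTP configured) plug in.

```python
from datetime import datetime, timedelta
import urllib.request

from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical URL; point it at the load balancer in front of the service.
ENDPOINT_URL = "https://service.example.com/api/1.0/purpose?params=value"


def trigger_endpoint(url, timeout_seconds=300):
    """Call the job endpoint; urlopen raises on an HTTP error status, failing the task."""
    with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
        status = response.status
        body = response.read().decode("utf-8")
    # Printed output goes to the task log, which is viewable in the Airflow UI.
    print(f"HTTP {status}: {body}")


default_args = {
    "owner": "airflow",
    "retries": 3,  # re-trigger the endpoint up to 3 times on failure
    "retry_delay": timedelta(minutes=5),
    "execution_timeout": timedelta(minutes=30),  # fail runs that hang
    "sla": timedelta(hours=1),  # Airflow 2.x only: flags runs that finish late
    "email": ["oncall@example.com"],  # hypothetical alert address
    "email_on_failure": True,
}

with DAG(
    dag_id="trigger_purpose_endpoint",  # hypothetical name
    default_args=default_args,
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",
    catchup=False,  # don't backfill runs missed before the DAG was enabled
) as dag:
    PythonOperator(
        task_id="call_purpose_endpoint",
        python_callable=trigger_endpoint,
        op_kwargs={"url": ENDPOINT_URL},
    )
```

The work itself stays in the service; Airflow only decides when to call it and what to do when the call fails.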

So what are the advantages of this approach?

  • Airflow has SLA management, failure notifications, execution timeouts and retries, all of which are very useful here.
  • If we need to trigger more than one job, we just loop over the jobs and create one task per job. Airflow can run these tasks in parallel, and as long as you have a good load balancer (LB) in front of the service, the load is spread across your distributed servers. This keeps the setup modular and easy to scale.
  • The Airflow UI also makes it easy to check the logs and the endpoint's response when a run fails.
  • Airflow also gives you a graph like the one below, which tracks how long each run of the job takes.
Time Taken Graph
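The loop from the second point can be sketched the same way. This assumes the same Airflow 2.4+ setup as a single-endpoint DAG, and the base URL, job names and concurrency cap are hypothetical placeholders. The tasks have no dependencies on each other, so the scheduler may run them in parallel, provided the executor supports it (for example `LocalExecutor` or `CeleryExecutor`, not `SequentialExecutor`), and each request is spread across servers by the load balancer.

```python
from datetime import datetime, timedelta
import urllib.request

from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical load balancer address and job names.
BASE_URL = "https://service.example.com/api/1.0/"
JOBS = ["purpose", "reindex", "cleanup"]


def trigger_endpoint(url, timeout_seconds=300):
    """Call one job endpoint; urlopen raises on an HTTP error status, failing the task."""
    with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
        print(f"HTTP {response.status}: {response.read().decode('utf-8')}")


with DAG(
    dag_id="trigger_all_jobs",  # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",
    catchup=False,
    max_active_tasks=8,  # cap on how many endpoint calls this DAG runs at once
    default_args={"retries": 3, "retry_delay": timedelta(minutes=5)},
) as dag:
    for job in JOBS:
        # One independent task per job; no >> dependencies, so they can run in parallel.
        PythonOperator(
            task_id=f"call_{job}",
            python_callable=trigger_endpoint,
            op_kwargs={"url": f"{BASE_URL}{job}?params=value"},
        )
```

Because each job is its own task, it gets its own retries, logs and duration history in the UI, and a failure in one job doesn't block the others.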

Takeaway

Next time you need to build a scheduled worker on a distributed system, this could be a good option.

Airflow is also easy to set up and get started with, so take a look at it first!
