
Tutorial

Introduction to Apache Airflow for Job Orchestration: a Quick Tutorial for Beginners

Web scraping of COVID-19 data in Indonesia (with notifications via email and Telegram)

Published in Analytics Vidhya · 6 min read · Sep 23, 2020


Overview

Airflow is a platform that takes cron jobs to the next level: it lets you author, schedule, and monitor workflows. Airflow represents each workflow as a directed acyclic graph (DAG) of tasks, automating the execution of scripts.
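As a minimal sketch of the idea, a DAG file that runs a Python callable once a day might look like the following (assuming Airflow 1.x, as at the time of writing; the operator import path differs in Airflow 2, and the DAG id and function name here are illustrative):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator


def scrape_covid_data():
    # Placeholder for the scraping logic described later in the tutorial
    pass


default_args = {
    "owner": "airflow",
    "start_date": datetime(2020, 9, 1),
}

# The DAG replaces a daily cron entry; Airflow adds retries,
# logging, and a UI for monitoring each run.
with DAG(
    "covid19_scraper",
    default_args=default_args,
    schedule_interval="@daily",
    catchup=False,
) as dag:
    scrape = PythonOperator(
        task_id="scrape",
        python_callable=scrape_covid_data,
    )
```

Dropping a file like this into Airflow's `dags/` folder is enough for the scheduler to pick it up.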

Meanwhile, the COVID-19 pandemic remains a grave concern. The outbreak was first identified in Wuhan, China, in December 2019 and spread worldwide within a month. It is therefore important to monitor daily COVID-19 patient data in Indonesia. Kompas News is one of the platforms that publish daily COVID-19 data on a dedicated dashboard. This tutorial scrapes that data with Python and schedules the job with Apache Airflow as the workflow scheduler.
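The parsing half of the scraping step can be sketched with the standard library alone. The `data-label` attributes and the sample numbers below are hypothetical; inspect the actual Kompas dashboard markup to find the real selectors:

```python
import re


def parse_covid_counts(html: str) -> dict:
    """Extract case counts from dashboard HTML.

    Assumes elements like <span data-label="confirmed">257.388</span>;
    the attribute names are illustrative, not the real Kompas markup.
    """
    counts = {}
    for label in ("confirmed", "recovered", "deaths"):
        match = re.search(rf'data-label="{label}"[^>]*>([\d.,]+)<', html)
        if match:
            # Indonesian formatting uses "." as the thousands separator
            counts[label] = int(match.group(1).replace(".", "").replace(",", ""))
    return counts


sample = (
    '<span data-label="confirmed">257.388</span>'
    '<span data-label="recovered">187.958</span>'
    '<span data-label="deaths">9.977</span>'
)
print(parse_covid_counts(sample))
# → {'confirmed': 257388, 'recovered': 187958, 'deaths': 9977}
```

In the tutorial itself the HTML would come from a `requests.get(...)` call rather than a hard-coded string.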

Prerequisites

Prior to delving into a more in-depth discussion, kindly ensure that you have thoroughly reviewed and properly set up the following tools:

1. Install Apache Airflow (read here)

2. Install the module dependencies

  • requests — it simplifies the…


Audhi Aprilliant

Data Scientist. Tech Writer. Statistics, Data Analytics, and Computer Science Enthusiast. Portfolio & social media links at http://audhiaprilliant.github.io/