Your Data in a Single Place with Airflow

Anderson Rocha
Oct 30, 2017 · 3 min read

So how do we build reports when the data is spread across many sources, such as databases, third-party files (CSVs, TXTs, etc.), logs, and third-party APIs?

Let's analyze the scenario below:

Modeling Using Microservices

Let's get practical.

from datetime import datetime, timedelta

from airflow import DAG

# Custom operators (not part of Airflow itself)
from operators.data_transfer import DataTransfer
from operators.insert_non_duplicate import InsertNonDuplicate

# Size of the extraction window, exposed to the SQL template as params.interval
interval = timedelta(hours=24)
extract_params = {'interval': str(interval)}

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2017, 11, 1, 0, 0, 0),
    'email': ['seu-email.com.br'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 3,
    'retry_delay': timedelta(minutes=5),
}

dag = DAG(
    'drivers',
    default_args=default_args,
    schedule_interval='@daily',
)

# Step 1: copy the rows selected by extract_drivers.sql into the staging table
extract_drivers = DataTransfer(
    task_id='extrac_drivers',
    source_conn_id='drivers',
    destination_conn_id='consolidate_database',
    destination_table='drivers_stage',
    preoperator="TRUNCATE TABLE drivers_stage",
    sql='extract_drivers.sql',
    params=extract_params,
    conflict_action=None,
    commit_every=5000,
    dag=dag,
)

# Step 2: merge the staging table into the final table, keyed by driver_id
merger_drivers = InsertNonDuplicate(
    task_id='merger_drivers',
    conn_id='consolidate_database',
    origin_table='drivers_stage',
    destination_table='drivers',
    key_field=['driver_id'],
    truncate_on_end=True,
    dag=dag,
)

# Run the merge only after the extraction has finished
merger_drivers.set_upstream(extract_drivers)
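
The source of the `DataTransfer` and `InsertNonDuplicate` operators is not shown here, so the following is only a rough sketch of the stage-then-merge pattern that their parameters suggest. It uses in-memory SQLite databases in place of the real connections, and the helper functions `transfer` and `insert_non_duplicate`, as well as the sample rows, are hypothetical names and data made up for this illustration.

```python
import sqlite3


def transfer(source, destination, select_sql, table, commit_every=5000):
    """Copy the rows returned by select_sql into `table`, committing in batches."""
    # Stands in for the "TRUNCATE TABLE drivers_stage" preoperator (SQLite has no TRUNCATE)
    destination.execute(f"DELETE FROM {table}")
    destination.commit()
    cursor = source.execute(select_sql)
    copied = 0
    while True:
        rows = cursor.fetchmany(commit_every)
        if not rows:
            break
        placeholders = ", ".join("?" * len(rows[0]))
        destination.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
        destination.commit()
        copied += len(rows)
    return copied


def insert_non_duplicate(conn, origin, destination, key_field, truncate_on_end=True):
    """Insert rows from `origin` whose key is not yet present in `destination`."""
    match = " AND ".join(f"d.{key} = o.{key}" for key in key_field)
    conn.execute(
        f"INSERT INTO {destination} "
        f"SELECT o.* FROM {origin} AS o "
        f"WHERE NOT EXISTS (SELECT 1 FROM {destination} AS d WHERE {match})"
    )
    if truncate_on_end:
        conn.execute(f"DELETE FROM {origin}")
    conn.commit()


# Hypothetical source system holding three drivers
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE drivers (driver_id INTEGER, name TEXT)")
source.executemany(
    "INSERT INTO drivers VALUES (?, ?)",
    [(1, "Ana"), (2, "Bruno"), (3, "Carla")],
)

# Hypothetical consolidated database that already knows driver 1
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE drivers_stage (driver_id INTEGER, name TEXT)")
warehouse.execute("CREATE TABLE drivers (driver_id INTEGER, name TEXT)")
warehouse.execute("INSERT INTO drivers VALUES (1, 'Ana')")
warehouse.commit()

copied = transfer(source, warehouse, "SELECT * FROM drivers", "drivers_stage", commit_every=2)
insert_non_duplicate(warehouse, "drivers_stage", "drivers", ["driver_id"])

final_rows = warehouse.execute(
    "SELECT driver_id, name FROM drivers ORDER BY driver_id"
).fetchall()
stage_count = warehouse.execute("SELECT COUNT(*) FROM drivers_stage").fetchone()[0]
```

Loading into a staging table first means the final `drivers` table is only touched by one set-based statement, so a rerun of the same day should not create duplicate rows for keys that are already present.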

Airflow Web Interface

-- extract_drivers.sql
SELECT *
FROM drivers
WHERE created_at >= Timestamp('{{ ts }}', '-{{ params.interval }}')
  AND created_at < Timestamp('{{ ts }}')
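
To make the time window in this query more concrete, here is a small sketch of the half-open range it appears intended to select: from `ts` minus the interval (inclusive) up to `ts` (exclusive). The helper `extraction_window` and the sample timestamp are hypothetical, and the sketch assumes `ts` arrives as an ISO 8601 string without a timezone offset.

```python
from datetime import datetime, timedelta

# Same interval as in the DAG definition
interval = timedelta(hours=24)

# This is the string the template receives as params.interval
interval_text = str(interval)


def extraction_window(ts, interval):
    """Return (start, end) for the half-open window [ts - interval, ts)."""
    end = datetime.fromisoformat(ts)
    return end - interval, end


# Hypothetical execution timestamp for a daily run
start, end = extraction_window("2017-11-02T00:00:00", interval)
```

Because the lower bound is inclusive and the upper bound is exclusive, consecutive daily runs cover adjacent windows without overlapping, so each row should be extracted by exactly one run.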

Conclusion

Airflow Tutorials

Sources

Reviewers
