ELT Airflow Pipeline Project
About
Project using data engineering concepts.
The project is an ELT (Extract, Load, Transform) data pipeline, orchestrated with Apache Airflow through Docker containers.
Faker is used as a package to generate data to a mysql database. The data is extracted from mysql, transformed with pandas and Sql and then loaded into an Olap postgres database. A notification is then sent by email once the whole process is completed.
Architecture
Prerequisites
Setup mailtrap
One platform toTest, Send, Control your emails:
Setup
Clone the project to your desired location:
$ git clone https://github.com/Stefen-Taime/airflow_etl.git
fill the AIRFLOW__SMTP__SMTP_USER, AIRFLOW__SMTP__SMTP_PASSWORD, AIRFLOW__SMTP__SMTP_MAIL_FROM in .envExample file:
AIRFLOW_ADMIN_MAIL=airflow
AIRFLOW_ADMIN_FIRSTNAME=airflow
AIRFLOW_ADMIN_NAME=airflow
AIRFLOW_ADMIN_PASSWORD=airflowpassword
AIRFLOW__CORE__LOAD_DEFAULT_CONNECTIONS=False
AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgres+psycopg2://airflow:airflowpassword@postgres:5432/airflow
AIRFLOW__CORE__FERNET_KEY=81HqDtbqAywKSOumSha3BhWNOdQ26slT6K0YaZeZyPs=
AIRFLOW_CONN_METADATA_DB=postgres+psycopg2://airflow:airflowpassword@postgres:5432/airflow
AIRFLOW_VAR__METADATA_DB_SCHEMA=airflow
AIRFLOW__SCHEDULER__SCHEDULER_HEARTBEAT_SEC=5
AIRFLOW__CORE__EXECUTOR=LocalExecutor
AIRFLOW__SMTP__SMTP_HOST=smtp.mailtrap.io
AIRFLOW__SMTP__SMTP_PORT=2525
AIRFLOW__SMTP__SMTP_USER=xxxxxxxxxxx
AIRFLOW__SMTP__SMTP_PASSWORD=xxxxxxx
AIRFLOW__SMTP__SMTP_MAIL_FROM=your_email@gmail.com
AIRFLOW__WEBSERVER__BASE_URL=http://localhost:8080
POSTGRES_USER=airflow
POSTGRES_PASSWORD=airflowpassword
POSTGRES_DB=airflow
AIRFLOW_UID=1000
AIRFLOW_GID=0
AIRFLOW_UID=1000
AIRFLOW_GID=0
AIRFLOW_UID=1000
AIRFLOW_GID=0
PG_VER=14-alpine
POSTGRES_SRC_PASSWORD=Sup3rS3c3t
PORT=5432
POSTGRES_USER_OLAP=postgres
HOSTNAME=olap
ONTAINER_NAME=postgres
POSTGRES_DB_OLAP=postgres
grant permissions to the bash script:
chmod a+x build_Services.sh
Bash:
$ ./build_Services.sh
Build Docker:
$ docker-compose up --build -d
When everything is done, you can check all the containers running:
$ docker ps
oltp Interface
Now you can access adminer web interface by going to http://localhost:8085 with the default user which is in the docker-compose.yml:
Système MySQL
Serveur oltp
user root
password myrootpassword
Database testdb
olap Interface
Now you can access new adminer web interface by going to http://localhost:8085 with the default user which is in the docker-compose.yml:
Système PostgesSQL
Serveur olap
user postgres
password Sup3rS3c3t
Database postgres
Airflow Interface
Now you can access Airflow web interface by going to http://localhost:8080 with the default user which is in the docker-compose.yml. Username/Password: airflow/airflowpassword:
Airflow DAG
Now you can run Airflow etl dag:
Check oltp and olap database
:)
Check your mailtrap.io/inboxes
Shut down or restart Airflow
If you need to make changes or shut down:
$ docker-compose down