Data Engineering with Bash Script and MySQL

Level up your Data loading and ETL (Extract, Transform and Load) skills by learning Bash Script.

Shariful Hasnine Sabuj
CodeX
2 min readAug 20, 2022

--

Photo by Kevin Ku on Unsplash

Problem Statement

Let's understand the problem statement first. Suppose from the different source systems data is provided into a common directory as CSV files. Your job is to create tables, read data from that directory, truncate the table if it is a full load or just load data into the MySQL database if it is an incremental load. Doing some basic transformation and loading the data into the transform table. Uploading the transformation table as a CSV file for business and archiving the source file. And also scheduling the job every day.

ETL With Bash Script

We will work with PRODUCT (P_ID, PRODUCT_NAME, DESCRIPTION) master data, CUSTOMER (C_ID, CUSTOMER_NAME, AGE) master data and SALES (S_ID, C_ID, P_ID, QUANTITY, AMOUNT) transactional data. Assume that the PRODUCT.csv file is shared in the /home/user/PRODUCT directory, CUSTOMER.csv. file in /home/user/CUSTOMER directory and SALES.csv file in /home/user/SALES directory.

Let's Load the PRODUCT.csv file into the Database.

Same steps for loading the CUSTOMER.csv file

We will follow similar steps for the SLAES.csv file except we will not truncate the table as it is a transactional table.

Now all the data is loaded into the MySQL database. Now we will develop a store procedure to populate the transform table and will run the SP from the script. The purpose of the SP is to populate the transform table with Customer Name, Product Name, Quantity and Amount.

Now we will call the store procedure from the script, generate a file from the transform table and upload the file for business and remove it from the directory.

Now we need to write a cronjob to schedule the task at regular intervals. This task will now run every day at 23:59 PM.

We have completed the journey of ETL using a bash script.

Thank you for reading!

Stay tuned for more.

--

--

Shariful Hasnine Sabuj
CodeX

Java, Angular, Spring Boot, Ionic, Android, Oracle, MSSQL, AWS, ETL(ADF + Databricks), Data Lake and Data warehousing on Azure Platform, Power BI