Road to Data Engineer 2.0!

Orapin Anonthanasap
Road to Data Engineer Student Blogs
5 min read · Oct 10, 2021

Over the last two months, I had an excellent opportunity to take the Road to Data Engineer (R2DE 2.0) course from DATAth.com. Road to Data Engineer (R2DE) is a course that covers fundamental to advanced knowledge in the Data Engineer track, with a workshop in every chapter. We can apply that knowledge to build our own end-to-end data pipeline using technologies that are popular today.

This dashboard was inspired by the Road to Data Engineer course :)

LINK to my Google Data Studio Dashboard

My GitHub Code: https://github.com/orapinanon/dataengineer_proj

Tech Stack

✔️ Python (Google Colab)
✔️ SQL
✔️ Pandas
✔️ Apache Spark, PySpark
✔️ Apache Airflow, DAG: Directed Acyclic Graph
✔️ Google Cloud Platform (GCP)
✔️ Google BigQuery
✔️ Google Data Studio
✔️ Databricks
✔️ Basic and Advanced Git + CI/CD
✔️ Docker + Airflow + Kubernetes

Course Curriculum

Data Engineer end-to-end flow

✔️ CH0 Introduction to Data Engineering — Python & SQL

  • Basics of Data Engineering
  • Concept of Big Data & Hadoop
  • Database, Data Warehouse, Data Lake

✔️ CH1 Data Pipeline & ETL

  • Data Pipeline (ETL/ELT)
  • Data Pipeline design
  • Data Integration
  • Workshop 1: Data Collection with Python (Google Colab), Pandas, REST API

✔️ CH2 Data Quality & Wrangling

  • Data Cleansing
  • Data Quality (Data Lineage and Data Dictionary)
  • EDA (Exploratory Data Analysis) and Data Profiling
  • Handling anomalies and missing data
  • Distributed Data Processing
  • Concept of Apache Spark
  • Workshop 2: Data Wrangling, Data Cleansing with Apache Spark (Colab and PySpark)

✔️ CH3 Basic Cloud — Google Cloud

  • Cloud Computing concept with Google Cloud Platform (GCP)
  • Concept of public / private / hybrid cloud
  • Cloud vs on-premise
  • Cloud computing concepts, e.g., managed services and serverless
  • GCP (Google Cloud Platform) services
  • Data Processing and storage on cloud
  • Basics of Bash commands
  • Workshop 3: Data Storage with GCS

✔️ CH4 Data Pipeline Orchestration with Airflow

  • Data Pipeline Orchestration
  • Data Pipeline tool
  • Concept of Apache Airflow
  • DAG: Directed Acyclic Graph
  • Cloud Composer (GCP)
  • Create data pipeline with Apache Airflow
  • Workshop 4: Automated Data Pipeline with Airflow

✔️ CH5 Introduction to BigQuery

  • Data Warehouse
  • BigQuery and its core concepts
  • Load data into BigQuery
  • Design schema in BigQuery
  • Workshop 5: Building Data Warehouse with BigQuery (feeding data from Airflow)

✔️ CH6 Introduction to Google Data Studio

  • Data visualization
  • Google Data Studio
  • Connect Data Studio with data sources
  • Dimensions and metrics in charts
  • Workshop 6: Building dashboard with Google Data Studio (Data from BigQuery)

✔️ CH7 Advanced Data Engineering

  • Case study for using data pipeline
  • Data Architecture
  • Git and Docker container
  • Data Privacy
  • Introduction to Machine Learning Engineering
  • ML deployment pipeline (MLOps)
  • The future of Data Warehouse e.g., Snowflake

✔️ Special Classes

  • Intro to Databricks
  • What I have learned working with data
  • Enterprise Data Architecture
  • Slowly Changing Dimension
  • What recruiters look for in a DE LinkedIn profile
  • Intro to PowerBI
  • Soft skills for Data Engineers
  • Intro to Data Monitoring
  • Basic and Advanced Git + CI/CD
  • Docker + Airflow + Kubernetes

The Data Engineer workshop

Data Engineer end-to-end flow
  • Workshop 1: Data Collection with Python (Google Colab), Pandas, REST API
  • Workshop 2: Data Wrangling, Data Cleansing with Apache Spark (Colab and PySpark)
Example of our dataset (Audible book data)
  • Workshop 3: Data Storage with Google Cloud Storage
  • Workshop 4: Automated Data Pipeline with Airflow
  • Workshop 5: Building Data Warehouse with BigQuery (feeding data from Airflow)
  • Workshop 6: Building dashboard with Google Data Studio (Data from BigQuery)
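
The workshops above chain into one pipeline in which each stage depends on the one before it. As a rough sketch only (this is not the course's Airflow DAG), the dependency order can be modeled with Python's standard-library graphlib module. The stage names are hypothetical labels I made up for the workshops; an orchestrator such as Airflow (Workshop 4) would be the tool that actually runs the stages in this order.

```python
from graphlib import TopologicalSorter

# Hypothetical stage names; each key maps to the set of stages it depends on.
STAGES = {
    "collect_data": set(),                    # Workshop 1
    "clean_data": {"collect_data"},           # Workshop 2
    "upload_to_gcs": {"clean_data"},          # Workshop 3
    "load_to_bigquery": {"upload_to_gcs"},    # Workshop 5
    "build_dashboard": {"load_to_bigquery"},  # Workshop 6
}


def run_order(stages):
    """Return the stages in an order that respects every dependency."""
    return list(TopologicalSorter(stages).static_order())


print(run_order(STAGES))
```

Because the graph is acyclic, a valid run order always exists; if a cycle were introduced, graphlib would raise an error instead of returning an order.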

Workshop 6: Building dashboard with Google Data Studio (Data from BigQuery)

The dashboard is the last step of the Data Engineer/Data Science workflow. This article walks through the last workshop: building a dashboard with Google Data Studio.

Google Data Studio: Data Studio is a free tool that turns your data into informative, easy to read, easy to share, and fully customizable dashboards and reports.

Input: Data stored in BigQuery -> Output: Report and Dashboard on Google Data Studio

Example of our dataset (Audible book data)

Steps to build the dashboard:

1. Create a table in BigQuery
2. Create a view that filters the data for the dashboard

-- Count the number of purchases per customer from the sales table.
CREATE VIEW vw_customer_purchase AS
SELECT customer AS customer_name, COUNT(*) AS purchase_count
FROM sales_table
GROUP BY customer_name;

Create a view so that data analysts see only the information they need:

  • Income (Thai Baht)
  • Country
  • Book name
  • Customer ID (used to count total customers)
  • Book category
  • Purchase time
  • Book ID (for future use)
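
A minimal sketch of the idea behind such a view, using Python's built-in sqlite3 module as a stand-in for BigQuery. The table name audible_sales, the view name vw_dashboard, and every column name are assumptions made for illustration, not the course's actual schema.

```python
import sqlite3

# SQLite stands in for BigQuery here; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE audible_sales (
        book_id TEXT, book_name TEXT, category TEXT, country TEXT,
        customer_id TEXT, purchased_at TEXT, thb_price REAL,
        internal_note TEXT
    )
    """
)
conn.execute(
    "INSERT INTO audible_sales VALUES "
    "('b1', 'Sample Book', 'Fiction', 'Thailand', 'c1', "
    "'2021-05-01 10:00:00', 350.0, 'raw')"
)

# The view exposes only the columns the dashboard needs;
# internal_note stays hidden from anyone who queries the view.
conn.execute(
    """
    CREATE VIEW vw_dashboard AS
    SELECT thb_price, country, book_name, customer_id,
           category, purchased_at, book_id
    FROM audible_sales
    """
)

cursor = conn.execute("SELECT * FROM vw_dashboard")
view_columns = [col[0] for col in cursor.description]
print(view_columns)
```

Pointing the dashboard at the view instead of the base table means the source table can gain extra columns later without changing what analysts see.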

3. Create Dashboard

LINK to my Google Data Studio Dashboard

Dashboard 1, “Overview”, contains:

A summary of:
- Business income
- Number of customers
- Number of purchases in each country
- Bestsellers
- Best-selling book category
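
To make the summary figures concrete, here is a small sketch of how each one could be computed from raw purchase rows. Data Studio does this aggregation for you inside its charts; the sample rows and field names below are invented for illustration.

```python
from collections import Counter

# Hypothetical sample purchases; field names are assumptions.
SALES = [
    {"customer_id": "c1", "country": "Thailand", "book": "Book A",
     "category": "Fiction", "thb_price": 300.0},
    {"customer_id": "c2", "country": "Thailand", "book": "Book B",
     "category": "Business", "thb_price": 450.0},
    {"customer_id": "c1", "country": "Japan", "book": "Book A",
     "category": "Fiction", "thb_price": 300.0},
    {"customer_id": "c3", "country": "Japan", "book": "Book A",
     "category": "Fiction", "thb_price": 300.0},
]


def overview(rows):
    """Compute the summary figures shown on the Overview page."""
    return {
        "income_thb": sum(r["thb_price"] for r in rows),
        "customers": len({r["customer_id"] for r in rows}),
        "purchases_by_country": dict(Counter(r["country"] for r in rows)),
        # "Bestseller" here means the book with the most purchases.
        "bestseller": Counter(r["book"] for r in rows).most_common(1)[0][0],
        "top_category": Counter(r["category"] for r in rows).most_common(1)[0][0],
    }


summary = overview(SALES)
print(summary)
```

Note that the customer count uses distinct customer IDs, so a customer who buys twice is counted once.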

Dashboard 1: Overview

Dashboard 2, “Search book by revenue”, contains:

A system for searching books by sales:
- Users can select a country and the sales amount they want to search for
- Only books that meet the search criteria are displayed

Dashboard 2: Search book by revenue

Note 1: A parameter lets the user supply their own input. We can create a variable (a parameter) that users fill in or choose a value for.
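
One way to picture a parameter is as a function argument that the dashboard user controls. The sketch below mimics the "Search book by revenue" page in plain Python; the function name, the sample rows, and the choice of a minimum-revenue threshold are my own assumptions, not the dashboard's actual configuration.

```python
from collections import defaultdict

# Hypothetical purchases: (country, book name, price in THB).
SALES = [
    ("Thailand", "Book A", 300.0),
    ("Thailand", "Book A", 300.0),
    ("Thailand", "Book B", 450.0),
    ("Japan", "Book A", 300.0),
]


def search_books(rows, country, min_revenue):
    """Return {book: revenue} for books in `country` earning at least `min_revenue`.

    `country` and `min_revenue` play the role of user-supplied parameters.
    """
    revenue = defaultdict(float)
    for row_country, book, price in rows:
        if row_country == country:
            revenue[book] += price
    return {book: total for book, total in revenue.items() if total >= min_revenue}


print(search_books(SALES, "Thailand", 500.0))
```

Changing either argument changes which books are returned, which is the same effect a user gets by changing a parameter control on the dashboard.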

Note 2: A calculated field lets us write our own formula. If an existing column doesn’t meet our needs, we can write a formula over the existing data to create the column we need.
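
As an illustration of the calculated-field idea, the sketch below derives a new column from two existing ones. The column names (price_usd, conversion_rate, thb_price) and the values are hypothetical; in Data Studio the same thing is written as a formula in the field editor rather than in Python.

```python
def add_calculated_field(rows, name, formula):
    """Return copies of `rows` with a new column computed from existing ones."""
    return [{**row, name: formula(row)} for row in rows]


# Hypothetical columns: a USD price and a USD-to-THB conversion rate.
rows = [
    {"book_name": "Book A", "price_usd": 10.0, "conversion_rate": 33.0},
    {"book_name": "Book B", "price_usd": 20.0, "conversion_rate": 32.5},
]

with_thb = add_calculated_field(
    rows, "thb_price", lambda r: r["price_usd"] * r["conversion_rate"]
)
print(with_thb)
```

The original rows are left unchanged, which matches how a calculated field adds a column without altering the underlying data source.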

Certificate after graduation

After completing the course and passing the final exam, you will immediately receive a certificate from the Road to Data Engineer instructor!

References

  1. Road to Data Engineer course by DataTH
  2. LINK to my Google Data Studio Dashboard

Orapin Anonthanasap

Digital Specialist – Data & AI 📊 | Live in Sydney, Australia 🇦🇺 | Data Science student, Ireland 🇮🇪 #UCD2018 | YWC#11 | https://www.linkedin.com/in/orapina