Road to Data Engineer 2.0!
Over the last two months, I had an excellent opportunity to take the Road to Data Engineer (R2DE 2.0) course from DATAth.com. Road to Data Engineer (R2DE) is a course that covers fundamental to advanced knowledge in the Data Engineer track, with a workshop in every chapter. We can apply that knowledge to build our own end-to-end data pipeline using technologies that are popular today.
This dashboard was inspired by the Road to Data Engineer course :)
LINK to my Google Data Studio Dashboard
My GitHub Code: https://github.com/orapinanon/dataengineer_proj
Tech Stack
✔️ Python (Google Colab)
✔️ SQL
✔️ Pandas
✔️ Apache Spark, PySpark
✔️ Apache Airflow, DAG: Directed Acyclic Graph
✔️ Google Cloud Platform (GCP)
✔️ Google BigQuery
✔️ Google Data Studio
✔️ Databricks
✔️ Basic and Advanced Git + CI/CD
✔️ Docker + Airflow + Kubernetes
Course Curriculum
✔️ CH0 Introduction to Data Engineering — Python & SQL
- Basics of Data Engineering
- Concept of Big Data & Hadoop
- Database, Data Warehouse, Data Lake
✔️ CH1 Data Pipeline & ETL
- Data Pipeline (ETL/ELT)
- Data Pipeline design
- Data Integration
- Workshop 1: Data Collection with Python (Google Colab), Pandas, REST API
✔️ CH2 Data Quality & Wrangling
- Data Cleansing
- Data Quality (Data Lineage and Data Dictionary)
- EDA (Exploratory Data Analysis) and Data Profiling
- Handling anomalies and missing data
- Distributed Data Processing
- Concept of Apache Spark
- Workshop 2: Data Wrangling, Data Cleansing with Apache Spark (Colab and PySpark)
✔️ CH3 Basic Cloud — Google Cloud
- Cloud Computing concept with Google Cloud Platform (GCP)
- Concept of public / private / hybrid cloud
- Cloud vs on-premise
- Concept of Cloud computing e.g., Managed Service and serverless
- GCP (Google Cloud Platform) services
- Data Processing and storage on cloud
- Basics of Bash commands
- Workshop 3: Data Storage with GCS
✔️ CH4 Data Pipeline Orchestration with Airflow
- Data Pipeline Orchestration
- Data Pipeline tool
- Concept of Apache Airflow
- DAG: Directed Acyclic Graph
- Cloud Composer (GCP)
- Create data pipeline with Apache Airflow
- Workshop 4: Automated Data Pipeline with Airflow
✔️ CH5 Introduction to BigQuery
- Data Warehouse
- BigQuery and the concept of BigQuery
- Load data into BigQuery
- Design schema in BigQuery
- Workshop 5: Building Data Warehouse with BigQuery (feeding data from Airflow)
✔️ CH6 Introduction to Google Data Studio
- Data visualization
- Google Data Studio
- Connect Data Studio with data sources
- Dimension and Metric in chart
- Workshop 6: Building dashboard with Google Data Studio (Data from BigQuery)
✔️ CH7 Advanced Data Engineering
- Case study for using data pipeline
- Data Architecture
- Git and Docker container
- Data Privacy
- Introduction to Machine Learning Engineering
- ML deployment pipeline (MLOps)
- The future of data warehouses, e.g., Snowflake
✔️ Special Classes
- Intro to Databricks
- What I have learned working with data
- Enterprise Data Architecture
- Slowly Changing Dimension
- What recruiters look for in a DE LinkedIn profile
- Intro to PowerBI
- Soft skills for Data Engineers
- Intro to Data Monitoring
- Basic and Advanced Git + CI/CD
- Docker + Airflow + Kubernetes
The Data Engineer workshops
- Workshop 1: Data Collection with Python (Google Colab), Pandas, REST API
- Workshop 2: Data Wrangling, Data Cleansing with Apache Spark (Colab and PySpark)
- Workshop 3: Data Storage with Google Cloud Storage
- Workshop 4: Automated Data Pipeline with Airflow
- Workshop 5: Building Data Warehouse with BigQuery (feeding data from Airflow)
- Workshop 6: Building dashboard with Google Data Studio (Data from BigQuery)
Workshop 6: Building dashboard with Google Data Studio (Data from BigQuery)
The dashboard is the last step of Data Engineer/Data Science work. This article walks through the last workshop: building a dashboard with Google Data Studio.
Google Data Studio: Data Studio is a free tool that turns your data into informative, easy to read, easy to share, and fully customizable dashboards and reports.
Input: data stored in BigQuery -> Output: report and dashboard in Google Data Studio
Steps to build the dashboard:
1. Create a table in BigQuery
2. Create a view that filters the data for the dashboard
-- Example view: number of purchases per customer
CREATE VIEW vw_customer_purchase AS
SELECT customer AS customer_name, COUNT(*) AS purchase_count
FROM sales_table
GROUP BY customer_name;
Create a view so that data analysts see only the information they need:
- Income (Thai Baht)
- Country
- Book name
- Customer ID (used to count total customers)
- Book category
- Purchase time
- Book ID (for future use)
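The idea of a view that exposes only the dashboard columns can be sketched in Python, with the standard-library sqlite3 module standing in for BigQuery. This is only an illustration under stated assumptions: the table name sales_table comes from the query above, but the column names, the sample rows, and the view name vw_dashboard are hypothetical, and BigQuery's own SQL dialect and dataset-qualified names may differ.

```python
import sqlite3

# In-memory SQLite database as a stand-in for BigQuery (assumption for this sketch).
conn = sqlite3.connect(":memory:")

# Hypothetical raw table: it holds more columns than analysts need.
conn.execute(
    """
    CREATE TABLE sales_table (
        book_id TEXT, book_name TEXT, category TEXT,
        customer_id TEXT, customer_email TEXT,
        country TEXT, thb_amount REAL, purchased_at TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO sales_table VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("b1", "Book A", "Fiction", "c1", "c1@example.com",
         "Thailand", 350.0, "2021-05-01 10:00:00"),
        ("b2", "Book B", "Science", "c2", "c2@example.com",
         "Japan", 500.0, "2021-05-02 12:30:00"),
    ],
)

# The view exposes only the dashboard columns; customer_email stays hidden.
conn.execute(
    """
    CREATE VIEW vw_dashboard AS
    SELECT thb_amount, country, book_name, customer_id,
           category, purchased_at, book_id
    FROM sales_table
    """
)

cursor = conn.execute("SELECT * FROM vw_dashboard")
view_columns = [description[0] for description in cursor.description]
view_rows = cursor.fetchall()
print(view_columns)
```

Analysts who query the view get the seven listed columns and nothing else, while the underlying table stays unchanged.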
3. Create Dashboard
LINK to my Google Data Studio Dashboard
Dashboard 1 “Overview” contains:
A summary of:
- Business income
- Number of customers
- Number of purchases in each country
- Bestsellers
- Best-selling book category
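As a rough illustration of what the summary figures above mean, here is a minimal Python sketch that computes them from a few rows. The rows and field names are hypothetical, and ranking bestsellers by number of purchases (rather than by revenue) is an assumption; in the workshop these numbers are produced by Data Studio charts on top of the BigQuery view, not by Python code.

```python
from collections import Counter

# Hypothetical purchase rows; in the workshop the data comes from the BigQuery view.
purchases = [
    {"customer_id": "c1", "country": "Thailand", "book_name": "Book A",
     "category": "Fiction", "thb_amount": 350.0},
    {"customer_id": "c2", "country": "Japan", "book_name": "Book B",
     "category": "Science", "thb_amount": 500.0},
    {"customer_id": "c1", "country": "Thailand", "book_name": "Book B",
     "category": "Science", "thb_amount": 500.0},
]

# Business income: sum of all purchase amounts.
total_income = sum(row["thb_amount"] for row in purchases)

# Number of customers: count of distinct customer IDs.
customer_count = len({row["customer_id"] for row in purchases})

# Number of purchases in each country.
purchases_per_country = Counter(row["country"] for row in purchases)

# Bestseller and best-selling category, ranked by purchase count (assumption).
bestsellers = Counter(row["book_name"] for row in purchases).most_common(1)
top_category = Counter(row["category"] for row in purchases).most_common(1)

print(total_income, customer_count, dict(purchases_per_country))
```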
Dashboard 2 “Search book by revenue” contains:
A system for searching books by sales:
- Users can select the country and the sales range they want to search
- Displays only the books that meet the search criteria
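The search behaviour described above amounts to a filter on country and a sales range. A minimal Python sketch follows, assuming hypothetical field names (book_name, country, total_sales) and a hypothetical helper search_books; the real dashboard does this with Data Studio filter controls rather than code.

```python
def search_books(rows, country, min_sales, max_sales):
    """Return rows for one country whose sales fall within [min_sales, max_sales]."""
    return [
        row for row in rows
        if row["country"] == country
        and min_sales <= row["total_sales"] <= max_sales
    ]


# Hypothetical per-book sales totals.
book_sales = [
    {"book_name": "Book A", "country": "Thailand", "total_sales": 350.0},
    {"book_name": "Book B", "country": "Thailand", "total_sales": 1200.0},
    {"book_name": "Book C", "country": "Japan", "total_sales": 500.0},
]

# Only books sold in Thailand with sales between 300 and 1000 THB are kept.
matches = search_books(book_sales, "Thailand", 300.0, 1000.0)
print([row["book_name"] for row in matches])
```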
Note 1: A parameter lets users supply their own input. We can create a variable (a parameter) that users fill in or choose a value for.
Note 2: A calculated field lets us write our own formula. If an existing column doesn’t meet our needs, we can write a formula over the existing data to create the column we need.
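To make the two notes concrete, here is a small Python sketch of the same idea: a user-supplied value (the parameter) feeds a formula that derives a new column (the calculated field) from an existing one. The names add_calculated_field, target_thb, and above_target are hypothetical; Data Studio defines parameters and calculated fields in its own editor with its own formula syntax, which this sketch does not reproduce.

```python
def add_calculated_field(rows, target_thb):
    """Return copies of rows with an extra 'above_target' column.

    target_thb plays the role of a parameter the user fills in.
    """
    return [
        {**row, "above_target": row["thb_amount"] >= target_thb}
        for row in rows
    ]


# Hypothetical rows with an existing income column.
rows = [
    {"book_name": "Book A", "thb_amount": 350.0},
    {"book_name": "Book B", "thb_amount": 500.0},
]

# The user "enters" 400 as the parameter value.
with_field = add_calculated_field(rows, target_thb=400.0)
print(with_field)
```

The original rows are left untouched, which mirrors how a calculated field adds a column to the report without changing the source table.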
Certificate after graduation
After completing the course and passing the final exam, you will immediately receive a certificate from the Road to Data Engineer instructor!