Building an end-to-end data pipeline using Azure Databricks (Part-1)
Sep 10, 2022
In this article you will learn how to develop an end-to-end data pipeline using Delta Lake, an open-source storage layer that provides ACID transactions and metadata handling. You will also learn how data moves from the bronze to the gold container, how to perform an incremental load, how to create external tables for data analysis, and how to orchestrate your pipeline. We will use technologies such as PySpark, ADLS, Azure Databricks, Azure Data Factory, and Power BI.
Requirements
- Azure Account (Free Trial Subscription)
- Basic knowledge of Azure
- Basic knowledge of Databricks
- Basic knowledge of Python and PySpark
The process will be divided into the following sub-articles:
- Requirements
- Set up Azure services
- Mount Azure storage containers to Databricks
- Use case explanation
- Data Ingestion and Transformation
- Data Enrichment
- Pipeline using Data Factory