Building an end-to-end data pipeline using Azure Databricks (Part-1)

--

In this article, you will learn how to develop an end-to-end data pipeline using Delta Lake, an open-source storage layer that provides ACID transactions and scalable metadata handling. You will also learn how data moves from the bronze to the gold container, how to perform an incremental load, create external tables for data analysis, and orchestrate your pipeline. We will use technologies such as PySpark, ADLS, Azure Databricks, Azure Data Factory, and Power BI.
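To give a feel for what the later parts cover, here is a minimal sketch of a bronze-to-silver step in a Databricks notebook using Delta Lake. The mount paths, source format, and column names are illustrative assumptions, not the dataset used later in the series.

```python
# Minimal sketch of promoting raw bronze data to a silver Delta table.
# Assumes the storage containers are already mounted (covered in sub-article 3)
# and that `spark` is the session provided by the Databricks notebook.
from pyspark.sql import functions as F

# Read raw files landed in the bronze container
bronze_df = spark.read.format("json").load("/mnt/bronze/sales/")

# Light cleanup before promoting to silver
silver_df = (
    bronze_df
    .dropDuplicates()
    .withColumn("ingestion_date", F.current_timestamp())
)

# Write as Delta so we get ACID transactions and schema enforcement
(
    silver_df.write
    .format("delta")
    .mode("overwrite")
    .save("/mnt/silver/sales/")
)
```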

Requirements

  • Azure Account (Free Trial Subscription)
  • Basic knowledge of Azure
  • Basic knowledge of Databricks
  • Basic knowledge of Python and PySpark

The process will be divided into the following sub-articles:

  1. Requirements
  2. Set up Azure services
  3. Mount Azure storage containers to Databricks
  4. Use case explanation
  5. Data Ingestion and Transformation
  6. Data Enrichment
  7. Pipeline using Data Factory

--
