Building an end-to-end data pipeline using Azure Databricks (Part-1)

--

In this article, you will learn how to develop an end-to-end data pipeline using Delta Lake, an open-source storage layer that provides ACID transactions and scalable metadata handling. You will also learn how data moves from the bronze to the gold container, how to perform an incremental load, create external tables for data analysis, and orchestrate your pipeline. We will use technologies such as PySpark, ADLS, Azure Databricks, Azure Data Factory, and Power BI.
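To give a feel for what the later parts cover, here is a minimal sketch of a bronze-to-silver step in a Databricks notebook using Delta Lake. The mount paths, source format, and column names are illustrative assumptions, not the dataset used later in the series.

```python
# Minimal sketch of promoting raw bronze data to a silver Delta table.
# Assumes the storage containers are already mounted (covered in sub-article 3)
# and that `spark` is the session provided by the Databricks notebook.
from pyspark.sql import functions as F

# Read raw files landed in the bronze container
bronze_df = spark.read.format("json").load("/mnt/bronze/sales/")

# Light cleanup before promoting to silver
silver_df = (
    bronze_df
    .dropDuplicates()
    .withColumn("ingestion_date", F.current_timestamp())
)

# Write as Delta so we get ACID transactions and schema enforcement
(
    silver_df.write
    .format("delta")
    .mode("overwrite")
    .save("/mnt/silver/sales/")
)
```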

Requirements

  • Azure Account (Free Trial Subscription)
  • Basic knowledge of Azure
  • Basic knowledge of Databricks
  • Basic knowledge of Python and PySpark

The process will be divided into the following sub-articles:

  1. Requirements
  2. Set up Azure services
  3. Mount Azure storage containers to Databricks
  4. Use case explanation
  5. Data Ingestion and Transformation
  6. Data Enrichment
  7. Pipeline using Data Factory

--
