Building an end-to-end data pipeline using Azure Databricks (Part-4)

Alonso Medina Donayre
2 min read · Sep 14, 2022


Use Case Explanation

We will be working with transactional data, specifically loan transactions and customer records, from GeekBankPE (a famous bank known around the world).

We have two requirements from different areas of the bank:

  • The Marketing area needs up-to-date customer data so it can contact customers and make them offers.
  • The Finance area needs daily loan transactions enriched with customer drivers so it can analyze them and improve revenue.

To meet both requests, we are going to perform incremental loads and apply techniques such as upserts (Delta Lake MERGE).
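As a preview, here is a minimal sketch of what such an upsert looks like with Delta Lake's MERGE in PySpark. The paths, the key column (customer_id), and the incremental batch are placeholders for illustration, not the exact names used later in this series.

```python
from delta.tables import DeltaTable

# Illustrative incremental batch of new/changed customer records
updates_df = spark.read.csv("/mnt/bronze/customers_daily",
                            header=True, inferSchema=True)

# Target Silver table of customers (path assumes the mounts from Part 3)
customers = DeltaTable.forPath(spark, "/mnt/silver/customers")

(customers.alias("target")
    .merge(
        updates_df.alias("source"),
        "target.customer_id = source.customer_id"  # match on the business key
    )
    .whenMatchedUpdateAll()     # refresh rows for existing customers
    .whenNotMatchedInsertAll()  # insert rows for new customers
    .execute())
```

Because MERGE updates matched rows and inserts unmatched ones in a single atomic operation, Marketing always sees one current row per customer instead of duplicates from each daily load.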

Architecture

We are going to follow the Delta Lake medallion architecture, organized in three layers:

  • Bronze: raw data, stored in its original format
  • Silver: transformed data, stored in Delta format
  • Gold: feature/aggregated data, stored in Delta format
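To make the layers concrete, here is a rough sketch of how the loan data might move from Bronze to Gold in a Databricks notebook. The mount points assume the containers mounted in Part 3; the file and column names are illustrative only.

```python
from pyspark.sql import functions as F

# Bronze: read raw loan transactions in their original CSV format
raw = spark.read.csv("/mnt/bronze/loan_transactions",
                     header=True, inferSchema=True)

# Silver: apply basic typing/cleaning and store as Delta
(raw.withColumn("transaction_date", F.to_date("transaction_date"))
    .write.format("delta").mode("append")
    .save("/mnt/silver/loan_transactions"))

# Gold: aggregate per customer and day for the Finance team, stored as Delta
(spark.read.format("delta").load("/mnt/silver/loan_transactions")
    .groupBy("customer_id", "transaction_date")
    .agg(F.sum("amount").alias("daily_loan_amount"))
    .write.format("delta").mode("overwrite")
    .save("/mnt/gold/daily_loan_summary"))
```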

  1. Requirements
  2. Set up Azure services
  3. Mount Azure storage containers to Databricks
  4. Use case explanation
  5. Data Ingestion and Transformation
  6. Data Enrichment
  7. Pipeline using Data Factory
