Building an end-to-end data pipeline using Azure Databricks (Part-6)

Alonso Medina Donayre
3 min read · Sep 16, 2022


Data Enrichment

In this article, we are going to join our tables, compute aggregations, apply the upsert technique, and finally query our tables.

Step 1 — Preparing the scenario

Create a new folder in your workspace named enrichment.

Inside enrichment, create two new Python files named:
- customer.py
- loantTrx.py

Inside the set-up folder, create a new Python file named:
- database.py

Step 2 — Creating the database

Copy the code below into the database notebook and execute it. Once execution finishes, your database will be created with its location set to the mounted gold container.
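As a reference, here is a minimal sketch of such a database notebook; the mount path and the database name loan_db are assumptions, not necessarily the article's exact values.

```python
# Sketch of set-up/database.py: create a database whose location is the
# mounted gold container, so its tables are stored there.
# The mount path and database name below are placeholders.
# `spark` is provided by the Databricks runtime.

gold_folder_path = "/mnt/<your-storage-account>/gold"  # assumed mount point from Part 3

spark.sql(f"""
    CREATE DATABASE IF NOT EXISTS loan_db
    LOCATION '{gold_folder_path}'
""")
```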

Note: if you drop a database whose location is the gold container, all the files inside that container will be deleted too.

Step 3 — Customer enrichment

Copy the code below into your customer notebook inside the enrichment folder, but do not execute it yet, because we are going to test it later.
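As a rough guide, a minimal sketch of what such an enrichment notebook can look like follows; the table names, columns, and mount paths are assumptions, and the upsert uses the standard Delta Lake MERGE pattern.

```python
# Sketch of enrichment/customer.py: join silver customer data and upsert the
# result into a gold Delta table. Table names, columns and paths are placeholders.
# `spark` and `dbutils` are provided by the Databricks runtime.
from pyspark.sql import functions as F
from delta.tables import DeltaTable

dbutils.widgets.text("p_file_date", "2022-09-10")
p_file_date = dbutils.widgets.get("p_file_date")

silver_path = "/mnt/<your-storage-account>/silver"  # assumed mount point

customers_df = spark.read.format("delta").load(f"{silver_path}/customers")
addresses_df = spark.read.format("delta").load(f"{silver_path}/addresses")

# Join the customer data with a related table and tag it with the load date
enriched_df = (
    customers_df
    .join(addresses_df, on="customer_id", how="left")
    .withColumn("file_date", F.lit(p_file_date))
)

# Upsert (MERGE) into the gold table, keyed on customer_id
table_exists = (
    spark.sql("SHOW TABLES IN loan_db")
         .filter("tableName = 'customer_enriched'")
         .count() > 0
)

if table_exists:
    target = DeltaTable.forName(spark, "loan_db.customer_enriched")
    (target.alias("t")
           .merge(enriched_df.alias("s"), "t.customer_id = s.customer_id")
           .whenMatchedUpdateAll()
           .whenNotMatchedInsertAll()
           .execute())
else:
    enriched_df.write.format("delta").saveAsTable("loan_db.customer_enriched")
```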

Step 4 — Loan transactions enrichment

We are going to generate two tables from the loan transaction data: one will be a feature table and the other will be an aggregate table. Copy the code below into your Python notebook.
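A minimal sketch of the shape of this notebook follows; the column names, feature logic, and paths are assumptions, and the upsert into the gold tables would follow the same MERGE pattern shown in the customer sketch.

```python
# Sketch of enrichment/loantTrx.py: build a per-transaction feature table and a
# per-customer aggregate table. Column names, feature logic and paths are placeholders.
# `spark` and `dbutils` are provided by the Databricks runtime.
from pyspark.sql import functions as F

dbutils.widgets.text("p_file_date", "2022-09-10")
p_file_date = dbutils.widgets.get("p_file_date")

silver_path = "/mnt/<your-storage-account>/silver"  # assumed mount point

loan_trx_df = (
    spark.read.format("delta")
         .load(f"{silver_path}/loan_trx")
         .filter(F.col("file_date") == p_file_date)
)

# Feature table: one row per transaction with derived columns
loan_trx_features_df = loan_trx_df.withColumn(
    "is_high_value", F.col("amount") > 10000
)

# Aggregate table: one row per customer with summary metrics
loan_trx_agg_df = (
    loan_trx_df.groupBy("customer_id")
               .agg(F.count("*").alias("trx_count"),
                    F.sum("amount").alias("total_amount"))
               .withColumn("file_date", F.lit(p_file_date))
)

# Both DataFrames are then upserted into gold tables using the same
# MERGE pattern as in the customer sketch above.
```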

Step 5 — Testing the pipeline

It’s time to uncomment our last two code blocks. You can run your scripts for the following p_file_date values (see the sketch after this list):
- “2022-09-10”
- “2022-09-11”
- “2022-09-12”
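If you prefer to drive both enrichment notebooks from a single driver notebook instead of changing the widget value by hand for each date, a minimal sketch like the one below works; the relative notebook paths are assumptions based on the folder layout from Step 1.

```python
# Sketch of a driver cell: run both enrichment notebooks for each file date.
# dbutils.notebook.run(path, timeout_seconds, arguments); 0 means no timeout.
for p_file_date in ["2022-09-10", "2022-09-11", "2022-09-12"]:
    dbutils.notebook.run("../enrichment/customer", 0, {"p_file_date": p_file_date})
    dbutils.notebook.run("../enrichment/loantTrx", 0, {"p_file_date": p_file_date})
```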

Step 6 — Querying the database tables

Inside your utilities folder, create a SQL notebook; we will use it to query the tables of our database. You can use SQL for exploratory analysis on your data.

Copy the code below to your notebook and run it.
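As an example, assuming the placeholder table names used in the sketches above, the exploratory queries could look like the following; they are shown here via spark.sql so the snippet stays in Python, while in the SQL notebook you would write the SQL statements directly.

```python
# Sketch of exploratory queries; table names are the placeholders used above.
# `display` and `spark` are provided by Databricks notebooks.

display(spark.sql("SELECT * FROM loan_db.customer_enriched LIMIT 10"))

display(spark.sql("""
    SELECT file_date, COUNT(*) AS customers
    FROM loan_db.customer_enriched
    GROUP BY file_date
    ORDER BY file_date
"""))

display(spark.sql("""
    SELECT c.customer_id, a.trx_count, a.total_amount
    FROM loan_db.customer_enriched c
    JOIN loan_db.loan_trx_agg a ON c.customer_id = a.customer_id
"""))
```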

That’s everything for this article (Part 6); now we are ready to move on to Azure Data Factory.

  1. Requirements
  2. Set up azure services
  3. Mount azure storage containers to Databricks
  4. Use case explanation
  5. Data Ingestion and Transformation
  6. Data Enrichment
  7. Pipeline using Data Factory
