End-to-end Azure data engineering project — Part 2: Using Databricks to ingest and transform data

Patrick Nguyen
5 min read · Jun 3, 2023

This is a series of four articles demonstrating an end-to-end data engineering process on the Azure platform, using Azure Data Lake, Databricks, Azure Data Factory, Python, Power BI and Spark. In Part 2, we will use Databricks to ingest and transform the JSON files that we imported in Part 1.

Please review the related articles in the series here:

End-to-end Azure data engineering project — Part 1: Project Requirement, Solution Architecture and ADF reading data from API

End-to-end Azure data engineering project — Part 2: Using Databricks to ingest and transform data

End-to-end Azure data engineering project — Part 3: Creating data pipelines and scheduling using Azure Data Factory

End-to-end Azure data engineering project — Part 4: Data Analysis and Data Visualization (Power BI)

5. Data Ingestion

After you mount Databricks to ADLS and load the source data into the raw container (both covered in Part 1), you can run this code to confirm that Databricks can read the JSON file from ADLS:

display(spark.read.json("/mnt/formula1datalake133/raw/circuits.json"))
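If you need a refresher on the mount itself, a minimal sketch is shown below. It assumes authentication with a service principal whose credentials live in a secret scope; the scope name formula1-scope and its key names are hypothetical placeholders, so substitute your own.

# Sketch of an ADLS mount using a service principal (Part 1 covers the setup).
# "formula1-scope" and its keys are placeholder names for your secret scope.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": dbutils.secrets.get("formula1-scope", "client-id"),
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get("formula1-scope", "client-secret"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/"
        + dbutils.secrets.get("formula1-scope", "tenant-id") + "/oauth2/token",
}

# Mount the raw container of the formula1datalake133 storage account,
# skipping the call if the mount point already exists.
if not any(m.mountPoint == "/mnt/formula1datalake133/raw" for m in dbutils.fs.mounts()):
    dbutils.fs.mount(
        source="abfss://raw@formula1datalake133.dfs.core.windows.net/",
        mount_point="/mnt/formula1datalake133/raw",
        extra_configs=configs,
    )

Once the read works, it is also worth replacing schema inference with an explicit schema, which is faster and catches type drift early. The sketch below assumes the circuits file follows the Ergast API layout (circuitId, circuitRef, name, and so on); adjust the fields to match the actual JSON.

# Sketch: read the same file with an explicit schema instead of inference.
# The column names assume the Ergast circuits layout; verify against your file.
from pyspark.sql.types import StructType, StructField, IntegerType, StringType, DoubleType

circuits_schema = StructType([
    StructField("circuitId", IntegerType(), False),
    StructField("circuitRef", StringType(), True),
    StructField("name", StringType(), True),
    StructField("location", StringType(), True),
    StructField("country", StringType(), True),
    StructField("lat", DoubleType(), True),
    StructField("lng", DoubleType(), True),
    StructField("alt", IntegerType(), True),
    StructField("url", StringType(), True),
])

circuits_df = (
    spark.read
    .schema(circuits_schema)  # skip inference: one pass over the data, stable types
    .json("/mnt/formula1datalake133/raw/circuits.json")
)
display(circuits_df)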

