The simplest ETL stack in Azure — Data Engineering
Starting a Data Engineering project can be daunting, especially for newcomers. One of the initial challenges is understanding the flow of activities required to begin processing data effectively. This post aims to provide an example of a fundamental ETL process using Excel files, orchestrated with Data Factory, Databricks, and Data Lake.
I’m not going to explain how to create the resources in Azure, since that part is mostly self-explanatory in the Azure Portal, but we will go over the configuration needed in each resource.
Data Lake Storage
We need to create the following containers:
raw: Used to get the data “as-is” from our sources.
staging: Used to put the data once it has been processed. This is the data that’s going to end up in our Warehouse Database.
The container “simulatingasource” is only used to host our Excel file as a data source.
You can download data samples from this site: https://www.contextures.com/xlsampledata01.html
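If you’d rather script this step than click through the Portal, a minimal sketch with the azure-storage-file-datalake Python SDK could look like the following (the storage account name matches the one used later in this post; the key is a placeholder):
from azure.storage.filedatalake import DataLakeServiceClient

# Placeholder credential: use your own Storage Account key (or, better, an Azure AD credential)
service = DataLakeServiceClient(
    account_url="https://stdemoana001.dfs.core.windows.net",
    credential="<storage-account-key>",
)

# Create the three containers used in this post
for container in ["raw", "staging", "simulatingasource"]:
    service.create_file_system(file_system=container)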
Databricks
You must create a cluster; its access mode must be “No isolation shared” for it to work with Data Factory.
We also need to generate an access token to create the linked service in Data Factory.
Data Factory
We need to create the linked services for our Databricks and Storage Account resources.
Once we have this configuration set, we can start our ETL process:
We are going to create a Binary dataset for the Excel file; we won’t use the Excel format because we’re going to process the data with Databricks.
We’ll use parameters to make the Dataset dynamic.
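For example, the Binary dataset’s file path can be built from a dynamic content expression such as @dataset().fileName (the parameter name here is just illustrative), so the pipeline can decide at run time which file to pick up.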
We are also going to create a Parquet dataset as our processed file is going to be in that format.
After that, we’ll create a pipeline, and the first activity will be to copy the data from the source to our raw container.
Then, we’ll debug the pipeline so we can check if the activity works as intended.
Finally, we are going to check both the source and the sink folders in our datalake.
We can see that the Excel file has been copied between the two folders.
Our next step will be extracting the data from the Excel file and using that to create a parquet file:
We need to create a notebook in our workspace, then we’ll reference that notebook in a Databricks Notebook activity in Data Factory.
To read the Excel file in the Databricks notebook, we are going to use the Storage Account key, as it’s the quickest way to connect, but I recommend using a Service Principal in a real project.
After setting the configuration, we’ll execute the notebook to see if we can reach the file.
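A minimal sketch of that configuration cell could look like this, assuming the key is kept in a Databricks secret scope (the scope and secret names are placeholders; pasting the key directly also works for a quick test, but avoid that outside of a demo):
storage_account = "stdemoana001"
# Placeholder secret scope/key names
account_key = dbutils.secrets.get(scope="demo-scope", key="storage-account-key")

# Tell Spark how to authenticate against the Storage Account (ABFS driver)
spark.conf.set(
    f"fs.azure.account.key.{storage_account}.dfs.core.windows.net",
    account_key,
)

# Quick check that the raw container is reachable
display(dbutils.fs.ls(f"abfss://raw@{storage_account}.dfs.core.windows.net/"))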
Once the connection is set, we’ll load the data from the file into a pandas DataFrame (it’s easier to work with Excel files in pandas, but if you need to handle a large amount of data, you’ll have to use Spark DataFrames).
We’ll add the following code to our notebook:
# pandas API on Spark, so we can use pandas-style syntax on the cluster
import pyspark.pandas as pd

# Read the "SalesOrders" sheet of the Excel file in the raw container
file_location = "abfss://raw@stdemoana001.dfs.core.windows.net/sales/SampleData.xlsx"
df = pd.read_excel(file_location, sheet_name="SalesOrders", engine="openpyxl")
display(df)
To be able to execute this code, we need to install the “openpyxl” library in our cluster.
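If you prefer a notebook-scoped install instead of a cluster library, running this in a cell at the top of the notebook also works:
%pip install openpyxl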
Usually, we would filter our data at this step, but this time we’ll convert the whole dataset into a Parquet file.
# Write the DataFrame to the staging container as snappy-compressed Parquet
file_destination = "abfss://staging@stdemoana001.dfs.core.windows.net/sales/SampleData.parquet"
df.to_parquet(file_destination, compression="snappy")

# Read the file back to verify the write
pd.read_parquet(file_destination)
In the Data Lake, we can see that there’s a parquet file (folder) with our data.
As a final step, we’ll simulate loading the Parquet file into our data warehouse; in this case, we’ll use a .csv file as the destination.
Since our .parquet file is actually a folder, we’ll have to use a wildcard to load all the contents.
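For example, the wildcard in the Copy activity source settings can be configured along these lines (the folder path matches the destination used in the notebook; treat it as a sketch):
Wildcard folder path: sales/SampleData.parquet
Wildcard file name: *.parquet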
Finally, we’ll create a CSV dataset, and use it as the Sink destination.
Once we debug the complete pipeline, we’ll see the final result in the .csv file in our Data Lake.
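If you want to double-check from the notebook, a quick read like the one below also works (the exact file name depends on how you configured the sink, so treat the path as a placeholder):
# Placeholder path: adjust it to the sink file name configured in Data Factory
display(pd.read_csv("abfss://staging@stdemoana001.dfs.core.windows.net/sales/SalesOrders.csv"))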
This guide will provide you with a solid foundation to kickstart your Data Engineering projects. However, it’s important to recognize that there are several essential steps you’ll need to undertake before you can successfully deploy an ETL solution. A crucial initial task is addressing the significant security weakness in this guide: the inadequate protection of credentials such as the Storage Account key and the Databricks access token. But these tasks and more will be explained in future posts. Feel free to ask questions in the comments, until next time.