The perfect data pipeline doesn’t exist: Azure

Hugo Lu
Orchestra’s Data Release Pipeline Blog
5 min read · Jun 11, 2023


This is a multi-part article series in which I dive into the ideal stack for a data engineering pipeline, given constraints on which software providers you can use. I aim to give some indication of cost, ease of use, and functional limitations / cool features.

A simple data engineering architecture in Azure, with limited bells and whistles

Azure

Azure is not a huge fan favourite in the data engineering community, but I thought it was worth covering given that Fabric has been released.

This is definitely the v1.0 pipeline. You will probably do bits and bobs differently with Fabric.

So without further ado:

Data stack structure

Here are the pieces of a “Modern Data Stack” I’ll try to replicate only using Azure services:

- Data ingestion service to Data Lake
- Data Lake (Azure Data Lake)
- Data warehouse (Azure Synapse)
- Workflow orchestrator (Azure Data Factory)
- Transformation tool:
- Visualisation: Power BI
- rELT (reverse ETL): self-hosted App Services
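To make the dependencies between these pieces concrete, here is a minimal Python sketch that models the stack as a dependency graph and derives one valid run order, which is the kind of scheduling an orchestrator such as Azure Data Factory would handle. The stage names and the dependency edges are hypothetical assumptions for illustration; the sketch calls no Azure APIs and only uses the standard library's `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical stage names, each mapped to the service from the list above.
STACK = {
    "ingest_to_data_lake": "Azure Data Lake",
    "load_warehouse": "Azure Synapse",
    "transform": "transformation tool (unspecified)",
    "refresh_dashboards": "Power BI",
    "reverse_etl": "self-hosted App Services",
}

# Assumed dependencies: each stage runs only after every stage in its set finishes.
DEPENDENCIES = {
    "ingest_to_data_lake": set(),
    "load_warehouse": {"ingest_to_data_lake"},
    "transform": {"load_warehouse"},
    "refresh_dashboards": {"transform"},
    "reverse_etl": {"transform"},
}


def run_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return one valid execution order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    # Print each stage alongside the Azure service that would carry it out.
    for stage in run_order(DEPENDENCIES):
        print(f"{stage:<22} -> {STACK[stage]}")
```

Note that Power BI refreshes and reverse ETL both depend only on the transformation step, so an orchestrator is free to run those two in parallel.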

Data ingestion

Data ingestion to the data lake is pretty straightforward. If the underlying databases are hosted in…

I write about data engineering and the coolest data stuff. CEO @ Orchestra, the best-in-class unified control plane for DataOps. https://app.getorchestra.io/signup