The perfect data pipeline doesn’t exist: Azure

Published in Orchestra’s Data Release Pipeline Blog

5 min read · Jun 11, 2023

This is a multi-part article series in which I dive into the ideal stack for a data engineering pipeline, given constraints on which software providers you can use. I aim to give some indication of cost, ease of use, functional limitations, and cool features.

Azure

Azure is not a huge fan favourite in the data engineering community, but I thought it was worth covering now that Microsoft Fabric has been released.

This is definitely the v1.0 pipeline. You will probably do bits and bobs differently with Fabric.

So without further ado:

Data stack structure

Here are the pieces of a “Modern Data Stack” I’ll try to replicate using only Azure services:

- Data ingestion service to Data Lake
- Data Lake (Azure Data Lake)
- Data warehouse (Azure Synapse)
- Workflow orchestrator (Azure Data Factory)
- Transformation tool:
- Visualisation: Power BI
- rELT: self-hosted App Services
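If it helps to picture the stack as a checklist, here is a minimal, purely illustrative Python sketch. The class and field names are hypothetical and not part of any Azure SDK. Layers the list above does not tie to a specific service (ingestion and transformation) are left as `None`, so the helper reports them as still open.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass(frozen=True)
class AzureDataStack:
    """Hypothetical model of the Azure-only stack listed above."""
    ingestion: Optional[str]
    data_lake: Optional[str]
    warehouse: Optional[str]
    orchestrator: Optional[str]
    transformation: Optional[str]
    visualisation: Optional[str]
    relt: Optional[str]

    def missing_layers(self) -> List[str]:
        """Return the names of layers with no service assigned, in field order."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


stack = AzureDataStack(
    ingestion=None,        # the list names no specific ingestion service
    data_lake="Azure Data Lake",
    warehouse="Azure Synapse",
    orchestrator="Azure Data Factory",
    transformation=None,   # left blank in the list above
    visualisation="Power BI",
    relt="self-hosted App Services",
)

print(stack.missing_layers())  # ['ingestion', 'transformation']
```

A frozen dataclass keeps the mapping immutable and makes gaps explicit. That suits a v1.0 pipeline whose pieces may change once Fabric is in the picture.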

Data ingestion

Data ingestion to the data lake is pretty straightforward. If the underlying databases are hosted in…

Written by Hugo Lu