Pipelines in Azure Synapse (& Data Factory)

Joao Maria Jesus
5 min read · Apr 20, 2022

It is often the case that we need to ingest data in the same format from multiple providers. If we have a relatively small number of providers, we can create individual pipelines; however, if this number scales to 100s or 1000s, we need to iterate our ingestion process automatically. This short tutorial covers exactly that.

Classic ETL structure in ADF (Azure Data Factory)

Problem

I have 150 providers of data, and they all deliver CSVs with the same schema. I want to copy this data from external storage or SFTP into my data warehouse and (optionally) do some transformation along the way. In this case, the files sit in the following structure (a sketch of the iteration over these folders appears after the listing):

raw/ProviderA/FileA.csv
             /FileB.csv
             /FileC.csv
   /ProviderB/FileA.csv
             /FileB.csv
             /FileC.csv
   (...)
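
To iterate over the provider folders automatically, the usual pattern in ADF/Synapse is a Get Metadata activity (requesting the childItems field of the raw folder) feeding a ForEach activity. Below is a minimal sketch of that skeleton in pipeline JSON; the pipeline and dataset names (IngestAllProviders, RawRoot) are placeholders of mine, not names from this setup, and the inner Copy is left empty here:

{
  "name": "IngestAllProviders",
  "properties": {
    "activities": [
      {
        "name": "GetProviderFolders",
        "type": "GetMetadata",
        "typeProperties": {
          "dataset": { "referenceName": "RawRoot", "type": "DatasetReference" },
          "fieldList": [ "childItems" ]
        }
      },
      {
        "name": "ForEachProvider",
        "type": "ForEach",
        "dependsOn": [
          { "activity": "GetProviderFolders", "dependencyConditions": [ "Succeeded" ] }
        ],
        "typeProperties": {
          "items": {
            "value": "@activity('GetProviderFolders').output.childItems",
            "type": "Expression"
          },
          "activities": [
            { "name": "CopyProviderFiles", "type": "Copy" }
          ]
        }
      }
    ]
  }
}

Inside the loop, each provider folder is exposed as @item(), so @item().name resolves to ProviderA, ProviderB, and so on.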

And I want to move them to my processed folder with the following structure while converting them to Parquet (each batch is saved in a folder named with the load date). I also want to log my actions and delete all files in raw at the end. A sketch of the copy and clean-up steps follows the listing.

processed/20042022/ProviderA/FileA.parquet
                            /FileB.parquet
                            /FileC.parquet
                  /ProviderB/FileA.parquet
                            /FileB.parquet
                            /FileC.parquet
                  (…)
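
Inside the ForEach from the earlier sketch, a Copy activity can read each provider's CSVs and write Parquet into a date-stamped path, and a Delete activity can clear raw afterwards. Here is a hedged sketch of those two inner activities; the dataset names (RawCsvFiles, ProcessedParquet) and their provider/loadDate parameters are assumptions of mine, while formatDateTime and utcNow are standard ADF expression functions:

[
  {
    "name": "CopyProviderFiles",
    "type": "Copy",
    "inputs": [
      {
        "referenceName": "RawCsvFiles",
        "type": "DatasetReference",
        "parameters": { "provider": "@item().name" }
      }
    ],
    "outputs": [
      {
        "referenceName": "ProcessedParquet",
        "type": "DatasetReference",
        "parameters": {
          "provider": "@item().name",
          "loadDate": "@formatDateTime(utcNow(), 'ddMMyyyy')"
        }
      }
    ],
    "typeProperties": {
      "source": { "type": "DelimitedTextSource" },
      "sink": { "type": "ParquetSink" }
    }
  },
  {
    "name": "DeleteRawFiles",
    "type": "Delete",
    "dependsOn": [
      { "activity": "CopyProviderFiles", "dependencyConditions": [ "Succeeded" ] }
    ],
    "typeProperties": {
      "dataset": {
        "referenceName": "RawCsvFiles",
        "type": "DatasetReference",
        "parameters": { "provider": "@item().name" }
      },
      "recursive": true,
      "enableLogging": true
    }
  }
]

The 'ddMMyyyy' format string reproduces the 20042022 folder from the example above, and the Delete activity's enableLogging option covers the logging requirement (in practice it also needs a logStorageSettings target pointing at a storage account).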
