CI/CD for Azure Synapse Analytics Pipelines with Azure DevOps yaml pipelines (Part 1)

Stefan Graf
Published in CodeX · 3 min read · Jun 29, 2022


In this short story I’ll show you how to implement a CI/CD pipeline for Synapse Analytics using Azure DevOps. The approach is very similar to the one for Azure Data Factory, but it’s not 100% the same.

Photo by Danil Shostak on Unsplash

In the meantime, there is a new and better way of achieving the goal of this article. You can find more details here: New Approach — CI/CD for Synapse Analytics Pipelines with Azure DevOps yaml pipelines (Part 1.1) | by Stefan Graf | Feb, 2023 | Medium

Background

Before we jump in, we need to clarify some basic concepts of how this CI/CD pipeline will work. Its foundation is the repo integration provided by Azure Synapse Analytics, which lets us connect our Synapse Workspace to either Azure DevOps or GitHub (Enterprise) repos.

All you need to do is create the Git connection by providing basic information about your repo, and you’re ready to start. This Git integration then replaces your standard “Synapse live” code management. Additionally, you need to define two branches: the collaboration branch (most likely your main branch, where the most current stable version of your code lives) and a publish branch (called workspace_publish by default), which the system creates automatically and where your code lives in an ARM-template-like form.

But be aware that, according to the Microsoft documentation, not every Synapse artifact is a pure ARM template anymore, as was the case with ADF. Simply deploying the ARM templates therefore won’t work, and we have to deploy in another way.

CI/CD approach

The approach to CI/CD with Azure Synapse differs quite a lot from your standard approach. The only branch you can deploy your code from is the publish branch (workspace_publish). This branch is created or updated when you press publish in the Synapse UI after you have made changes.

The actual working branch, into which all Pull Requests are merged to implement new features, is the collaboration branch (main branch). It is also the basis for everything you publish.

CI/CD Concept Synapse

Azure DevOps yaml Pipeline

CI

The build is pretty much already done, because the publish branch already contains a deployment-ready solution. That is why you only need to package your code, for traceability and reusability purposes.
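As a minimal sketch of this packaging step, the snippet below publishes the template folder of the publish branch as a pipeline artifact. The folder placeholder `<source-workspace-name>` and the artifact name `synapse` are my own assumptions, not fixed names. Adjust them to wherever your workspace_publish branch stores TemplateForWorkspace.json.

```yaml
# Sketch only: folder and artifact names are assumptions, not fixed values.
steps:
  - task: PublishPipelineArtifact@1
    displayName: Package Synapse templates
    inputs:
      # Folder in workspace_publish that holds the generated templates
      targetPath: '$(Build.SourcesDirectory)/<source-workspace-name>'
      # Hypothetical artifact name, reused later by the deployment
      artifact: 'synapse'
      publishLocation: 'pipeline'
```

Because the artifact is attached to the pipeline run, every deployment can be traced back to the exact templates it shipped.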

CD

This task is also quite easy, because you can use a predefined task called “Synapse workspace deployment@2”. Here you only need to insert your target Synapse Workspace and authenticate via a Service Connection (Subscription). You also need to turn your triggers off for a clean deployment, and there is a prebuilt task for this in Azure DevOps as well, called “toggle-triggers-dev@2”.
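The deployment steps could look roughly like the sketch below: stop the triggers, deploy, then start the triggers again. All values in angle brackets are placeholders, and the artifact name `synapse` is an assumption. The input names reflect my understanding of the Synapse deployment extension, so please verify them against the version installed in your organization.

```yaml
# Sketch only: placeholders in <...> and the artifact name are assumptions.
steps:
  # Fetch the packaged templates into $(Pipeline.Workspace)/synapse
  - download: current
    artifact: synapse

  # Turn all triggers off before deploying
  - task: toggle-triggers-dev@2
    displayName: Stop triggers
    inputs:
      azureSubscription: '<service-connection-name>'
      ResourceGroupName: '<target-resource-group>'
      WorkspaceName: '<target-workspace-name>'
      ToggleOn: false
      Triggers: '*'

  # Deploy the published templates to the target workspace
  - task: Synapse workspace deployment@2
    displayName: Deploy Synapse workspace
    inputs:
      operation: 'deploy'
      TemplateFile: '$(Pipeline.Workspace)/synapse/TemplateForWorkspace.json'
      ParametersFile: '$(Pipeline.Workspace)/synapse/TemplateParametersForWorkspace.json'
      azureSubscription: '<service-connection-name>'
      ResourceGroupName: '<target-resource-group>'
      TargetWorkspaceName: '<target-workspace-name>'

  # Turn the triggers back on after the deployment
  - task: toggle-triggers-dev@2
    displayName: Start triggers
    inputs:
      azureSubscription: '<service-connection-name>'
      ResourceGroupName: '<target-resource-group>'
      WorkspaceName: '<target-workspace-name>'
      ToggleOn: true
      Triggers: '*'
```

Note that the last step starts every trigger in the target workspace. If some triggers should stay off there, list the ones to start explicitly instead of using the wildcard.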

CI/CD put together

And now both put together in a fully working yaml pipeline. Keep in mind that a Windows VM is needed, because the predefined tasks we are using are based on PowerShell scripts, which didn’t work on an Ubuntu machine for me. The trigger fires whenever a new Synapse template gets published into our publish branch.

Keep in mind that everything here happens in our workspace_publish branch. That means the pipeline also needs to be started from this branch when you run it manually.
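A minimal skeleton of such a pipeline is sketched below, showing only the branch trigger, the Windows agent and the two stages. It assumes the packaging and deployment steps are kept in two template files, `ci-steps.yml` and `cd-steps.yml`. Both file names are hypothetical, and both files would have to live in the workspace_publish branch next to the pipeline definition.

```yaml
# Sketch only: the stage, job and template file names are assumptions.
trigger:
  branches:
    include:
      - workspace_publish    # fires when Synapse publishes new templates

pool:
  vmImage: 'windows-latest'  # Windows agent for the PowerShell based tasks

stages:
  - stage: Build
    jobs:
      - job: Package
        steps:
          - template: ci-steps.yml   # hypothetical file with the packaging steps

  - stage: Deploy
    dependsOn: Build
    jobs:
      - job: DeployWorkspace
        steps:
          - template: cd-steps.yml   # hypothetical file with the deployment steps
```

When you queue a run by hand, select workspace_publish as the branch in the run dialog, since the pipeline definition and the templates only exist there.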

Conclusion

This story should enable you to use Synapse in a more stable way, with a working CI/CD pipeline in the background.

Stefan Graf
Data Engineer Consultant @Microsoft — Data and Cloud Enthusiast