A roundtrip on Modern Data Platform with Azure Synapse Analytics and Azure DevOps — Part Two

Kevin Chant
Sogeti Data | Netherlands
7 min readMay 17, 2022

This post is part two in the series of posts which are a roundtrip on the Modern Data Platform with Azure Synapse Analytics and Azure DevOps.

The aim of this series is to get you familiar with modern data platforms such as Azure Synapse Analytics and to show you the advantages of the integration with Azure DevOps.

This is the second blog post in the series where I do an introduction to Azure DevOps. Along with how you can use it with Azure Synapse Analytics to strengthen the chain of deployment and iterate on standardized building blocks and automation.

For an introduction to Azure Synapse Analytics you can read part one of the series.

Introduction to Azure DevOps

Azure DevOps is a platform provided by Microsoft which you can use to manage various stages of your development work. Including managing work items, storing your code and deploying your releases.

It also allows you to manage your testing at various levels as well. Because you can automate various tests within a pipeline and manage manual tests using Azure Test Plans.

Below is one way that you can use the default services in Azure DevOps together.

Example of Azure DevOps services working together

Azure DevOps is available to use either in the cloud (Azure DevOps cloud services) or on-premises (Azure DevOps Server). This post will focus on the cloud version. However, the concepts can be adapted to use with Azure DevOps Server.

I wrote a whitepaper especially for this post which provides a more detailed introduction. So that I can focus on using Azure DevOps with Azure Synapse Analytics. It is called ‘A quick introduction to Azure DevOps’.

Synapse Studio objects in Azure DevOps

You can setup Git configuration in Synapse Studio to store various objects that exist within Synapse Studio together in one central Git repository. For example, linked services, notebooks and SQL scripts.

The screenshot below gives you an idea where and how to configure this inside the Manage Hub in Synapse Studio.

Git configuration in Synapse Studio

You can see that I set the root folder to be called resources. I did this so that the objects are stored in a subfolder in the Git repository. Below is how the repository looks in Azure Repos.

Objects stored in Azure Repos

Notice as well that I setup a collaboration branch. This is the default branch people will work on in Synapse Studio. You can use your default branch (which tends to be main). However, using a different branch makes it easier to implement CI/CD of individual objects.

You and your colleagues can create other branches within Synapse Studio.

Creating a new branch in Git

When you create a new branch you copy the contents of an existing branch that you can work on independently. Afterwards, you can merge your changes back into the collaboration branch if you want to by creating something called a pull request. Just like developers do within version control.

Note that when you go to create a pull request in Synapse Studio, it will open up a pull request in Azure DevOps.

Pull request in Azure DevOps

Another key point I want to highlight is that the Git configuration in Synapse Studio does not automatically store objects that exist within SQL Pools or Spark clusters.

Special branch

You will notice above that there is a workspace_publish branch in the Git repository that you cannot select in Synapse Studio. This is because this is a special branch which is used to store all the objects you have published to Azure within Azure Synapse.

After you have setup Git configuration you can use ‘Commit’ to save the changes of your changes to the Git repository in Azure DevOps. However, you can also select ‘Publish’ to have all your Synapse objects saved in Azure instead.

When you Publish objects in Synapse Studio, they are added to a json file in the workspace_publish branch inside your Git repository. Which contains the definition of every single object saved in Azure.

Example of the workspace_publish branch

You can use the contents of this branch to completely migrate all the workspace contents you have published to Azure to another workspace.

For example, if you have a development workspace you can publish all your objects. Which will then update the json files in your Git repository. From there, you can either use a YAML pipeline or the Releases feature in Azure Pipelines to deploy the updated contents to a Production workspace.

Below you can see how it looks in the Releases feature. Which is the older GUI-based way of doing Releases. If you are adopting modern Infrastructure as Code practices then I recommend using the Pipelines feature to do it within a YAML pipeline instead.

Using Releases feature to perform CI/CD for your workspace

Microsoft provides a guide on how to perform CI/CD using this method called ‘Continuous integration and delivery for an Azure Synapse Analytics workspace’.

As it mentions in the ‘Prerequisites’ section you need to install the Synapse Workspace Deployment extension in Azure DevOps before you look to do this.

Copying branch objects

Another method you can use is to copy the individual objects from your existing Git repository into another Git repository for a different workspace.

For example, I can copy the below SQL script from this Git repository to another Git repository that has been setup for Production use.

Individual object in Git

I can do this a few different ways. For example, I can do a manual copy and paste of the object into the new repository. However, a more graceful method is to automate it using Azure Pipelines.

I wrote a post a while back called ‘Automate a pipeline migration to a Synapse workspace using Azure DevOps’. Which gives you an idea of how to do this in Azure DevOps using a YAML pipeline.

SQL Pools

As I mentioned earlier in this post, objects in SQL Pools are not stored within Git repository that you setup in Synapse Studio. So, you must use other methods instead.

One key point to remember is that when you look to do CI/CD for SQL Pools you must connect using a SQL endpoint instead of a server name. You can find these endpoints in the Overview section of your Synapse workspace in the Azure Portal.

SQL Endpoints in the Azure Portal

You can perform CI/CD for dedicated SQL Pools using state-based or migration-based deployments. State-based migration is where you copy over an existing state, whereas migration-based is where you run all the incremental changes individually.

Dedicated SQL Pools

You can use a lot of the same tools you use with SQL Server to perform CI/CD with dedicated SQL Pools.

For example, to do state-based deployment you can use the popular dacpac method. Where you create a special file known as a dacpac and then update a dedicated SQL Pool using the dacpac file.

Most people either create a dacpac either from an existing database or from within a collection of files that represent the database schema known as a Database Project.

You can create a Database Project from an existing dedicated SQL Pool using a variety of applications including Visual Studio and Azure Data Studio. You can read one way to do this in a post I wrote called ‘Create a dacpac for an Azure Synapse Analytics dedicated SQL Pool using Azure Data Studio’.

As it shows in that post, you can create a dacpac directly from a Database Project within various applications. However, if you want to perform CI/CD you must make sure that a dacpac is created every time a Database Project is updated.

To do this in Azure DevOps you store the Database Project in Azure Repos. From there you can setup a deployment pipeline in Azure DevOps to build the dacpac every time the Database Project is updated.

I show how you can set this up in a post I wrote called ‘Create a dacpac for an Azure Synapse Analytics dedicated SQL Pool using Azure DevOps’.

I cover the method for migration-based deployments next.

Serverless SQL Pools

Currently, you cannot use the dacpac method to perform CI/CD for Serverless SQL Pools. In fact, it caused a lot of discussions in the Microsoft Data Platform community.

However, I came up with a way for this to be done by using a migration-based method.

To cut a long story short, in Visual Studio you can create a C# project for the DbUp .NET library. Once done you can then make the folder that contains the project a Git repository and synchronize the contents with a Git repository in Azure Repos.

From there, you can implement CI/CD in a deployment pipeline using a PowerShell module called DBOps.

You can find out more about this in detail in a post I wrote called ‘CI/CD for serverless SQL pools using Azure DevOps’.

Final words

I hope you found part two of this series about Modern Data Platform Azure DevOps with Azure Synapse Analytics useful.

Of course, if you have any comments or queries about this feel free to reach out to me.

--

--