Global Parameters 101 in Azure Data Factory

Dian Germishuizen
7 min read · May 21, 2022


What the heck are they?

Global Parameters are fixed values shared across the entire Data Factory that can be referenced in a pipeline at execution time. They have many applications, e.g. when multiple pipelines require identical parameters and values at run time and you don’t want to duplicate them as variables across said pipelines.

When you utilize the continuous integration and deployment process (CI/CD) via Azure Pipelines to promote your code to a new environment, you can also override the parameters at deployment time to ensure the values are applicable to your target environment.

This is the use case I will focus on in this article (not necessarily how to configure CI/CD itself, just the use case of using global parameters to store critical config options; a CI/CD setup tutorial article is still in the works).

How to create Global Parameters

1. Once logged into your Data Factory workspace, navigate to the Manage tab on the left-hand side, then to the Global Parameters section.

2. Click on the “+ New” button just underneath the page heading.

New Global Parameter in Azure Data Factory

3. In the popup window that appears to the right hand side of the screen:

  • Supply the name of the parameter (avoid spaces and dashes in the name; at the moment these cause runtime errors due to how parameter names are referenced in expressions)
  • Define the data type of the parameter, e.g. string, integer, etc.
  • Provide the value. Note that this value is read-only at runtime and cannot be updated during a pipeline run like a variable can
New Global Parameter Attributes Creation
Data Type Selection

4. Once created, you can edit it if need be. If you need to apply bulk changes, the “Edit All” button can assist with that.

Edit All Button
Edit All View

Referencing the Global Parameters at runtime

You can reference the global parameters in any pipeline activity that supports custom expressions. Global parameters are referenced as pipeline().globalParameters.<parameterName>.
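
For example, assuming a global parameter named KeyVaultUrl exists (the name here is purely illustrative), you can use it on its own or inside string interpolation in any dynamic content field:

    @pipeline().globalParameters.KeyVaultUrl

    @{pipeline().globalParameters.KeyVaultUrl}/secrets/MySecretName?api-version=7.3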

Storing source and target resource connection details

Even if your solution does not have any CI/CD pipelines configured as yet, you can still leverage Global Parameters to easily define environment-specific values for your pipelines.

The use case I use them for most often is storing the connection details of the resources I will be connecting to in the current environment, such as the URL of the key vault, the URL of the Azure SQL Server, and the names of the SQL Server databases I need to pull data from or push data to. Note, this covers only the URLs and database names of the resources, not sensitive information such as passwords, which should be maintained via Key Vault.

Using this pattern allows you to make your linked services and datasets dynamic, so they do not require changes when promoting code from one environment to another. It also helps to limit the number of linked services and datasets you need per technology: even if you are connecting to 10+ Azure SQL Server databases, you only need one linked service and one dataset to connect to them all. You use the global parameters to alter the connection string details at runtime as needed. A full rundown of how to limit the number of datasets and linked services by using parameterization will be published soon as well.
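
As a rough sketch of that pattern (all names below are illustrative, not a definitive implementation), the single Azure SQL linked service defines its own parameters for the server and database, and the connection string is built from them:

    {
        "name": "LS_AzureSqlDatabase_Generic",
        "properties": {
            "type": "AzureSqlDatabase",
            "parameters": {
                "ServerName": { "type": "String" },
                "DatabaseName": { "type": "String" }
            },
            "typeProperties": {
                "connectionString": "Data Source=@{linkedService().ServerName};Initial Catalog=@{linkedService().DatabaseName};Integrated Security=False;Encrypt=True;"
            }
        }
    }

The dataset built on top of it exposes matching parameters, and the pipeline feeds in values such as @pipeline().globalParameters.SqlServerUrl and @pipeline().globalParameters.SourceDatabaseName, so the same linked service and dataset pair can be reused for every database in every environment.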

A method I have used with great success in the past, when CI/CD was not an option for promoting my UAT data factory to PROD:

  1. Connect the UAT data factory to the Git repo’s UAT branch.
  2. Once all code changes have been made as needed, perform a pull request from the UAT branch to the PROD branch.
  3. Perform a commit in the PROD branch to update the .json file containing the global parameters to the PROD values.
  4. Connect the PROD data factory to the PROD branch.
  5. Publish the PROD branch’s code to the Live Mode of the PROD Data factory.

Something to consider here is that human error can easily introduce incorrect config values during these manual updates, so please be careful when using this solution. This is really only a band-aid for when outside factors delay or prevent the acquisition of the resources necessary to set up CI/CD, which is the preferred approach.

When using CI/CD

You can manipulate your global parameters in CI/CD setups using one of two methods:

  1. Choose to include them in the ARM template generated at publish time and alter them using Azure Pipelines
  2. Update them using post-deployment PowerShell scripts

How to setup the actual CI/CD pipeline and alter these values is beyond the scope of this article.

Choose to include them in the ARM template generated at publish time

This is the preferred method, since fewer custom scripting steps are required. The method integrates natively with Microsoft’s guidelines on how to implement CI/CD found here.

When using this method, the global parameters are added as ARM Template Parameters. You can enable this using the checkbox labelled “Include in ARM Template” in the global parameters management area.

Note, this option will only be available when the data factory is connected to a Git repo.

Include in ARM Template Tooltip
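
As a rough illustration of what this produces downstream (the factory and parameter names below are made up), each included global parameter surfaces in the generated ARM template parameters file under a name along the lines of <factoryName>_properties_globalParameters_<parameterName>_value, which you can then override in the ARM template deployment step of your release pipeline, for example:

    -MyDataFactoryProd_properties_globalParameters_EnvironmentName_value "PROD" -MyDataFactoryProd_properties_globalParameters_KeyVaultUrl_value "https://kv-myproject-prod.vault.azure.net/"

This string goes into the override parameters setting of the deployment task, so the published ARM template stays identical across environments and only the parameter values change.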

Update the parameters using post-deployment PowerShell scripts

This method is useful in cases where you need to override factory level settings and you do not want to perform an entire data factory override using ARM Template deployments.

When you define global parameters in your data factory and connect it to a Git repo, a file titled <DataFactoryName>.json is generated in the factory folder of the repo. The contents of this file may look something like the below:
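
(Everything in this sketch is made up for illustration; the structure is what matters.)

    {
        "name": "MyDataFactory",
        "properties": {
            "globalParameters": {
                "EnvironmentName": {
                    "type": "string",
                    "value": "UAT"
                },
                "KeyVaultUrl": {
                    "type": "string",
                    "value": "https://kv-myproject-uat.vault.azure.net/"
                }
            }
        },
        "location": "westeurope"
    }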

Please note, the official documentation here appears to be outdated regarding how these global parameters are stored.
The article mentions that the global parameters are stored in a dedicated JSON file in the Git repo, in a folder called globalParameters.
However, when I set this up myself, I found they were stored as noted above in the factory config file, as a nested attribute.

The actual setup of the activity in the Azure Pipelines to run PowerShell code is beyond the scope of this article.

Microsoft-provided PowerShell for altering the dedicated globalParameters file

Here is a sample PowerShell script provided by Microsoft to get the contents of the source data factory’s dedicated parameters .json file and push altered values to a target data factory.
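
I have paraphrased it below rather than copying it verbatim, so treat this as a sketch: it reads the dedicated global parameters .json file, builds a dictionary of GlobalParameterSpecification objects, and pushes them to the target factory using the Az.DataFactory module (which is assumed to be installed on the deployment agent).

    param (
        [Parameter(Mandatory = $true)] [string] $globalParametersFilePath,
        [Parameter(Mandatory = $true)] [string] $resourceGroupName,
        [Parameter(Mandatory = $true)] [string] $dataFactoryName
    )

    Import-Module Az.DataFactory

    # Read and parse the dedicated global parameters .json file
    $globalParametersJson = Get-Content -Raw -Path $globalParametersFilePath
    $globalParametersObject = [Newtonsoft.Json.Linq.JObject]::Parse($globalParametersJson)

    # Build the dictionary of global parameters expected by the data factory object
    $newGlobalParameters = New-Object 'System.Collections.Generic.Dictionary[string,Microsoft.Azure.Management.DataFactory.Models.GlobalParameterSpecification]'
    foreach ($parameter in $globalParametersObject.GetEnumerator()) {
        Write-Host "Adding global parameter: $($parameter.Key)"
        $parameterValue = $parameter.Value.ToObject([Microsoft.Azure.Management.DataFactory.Models.GlobalParameterSpecification])
        $newGlobalParameters.Add($parameter.Key, $parameterValue)
    }

    # Fetch the target data factory, replace its global parameters and publish the change
    $dataFactory = Get-AzDataFactoryV2 -ResourceGroupName $resourceGroupName -Name $dataFactoryName
    $dataFactory.GlobalParameters = $newGlobalParameters
    Set-AzDataFactoryV2 -InputObject $dataFactory -Force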

Custom PowerShell for altering the globalParameters attribute in the factory config file

Here is an altered version I created using Microsoft’s example as the base. The only difference is the line where you reference the area in the JSON file which contains the global parameter definitions.

Instead of taking the entire source file contents as the list of parameters, you reference the nested attribute that contains the values you need.
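
Again as a sketch rather than the exact code, the loop from the script above changes to enumerate the nested properties.globalParameters attribute of the factory config file instead of the root of the document:

    # Read the factory config file (<DataFactoryName>.json) instead of a dedicated parameters file
    $factoryJson = Get-Content -Raw -Path $globalParametersFilePath
    $factoryObject = [Newtonsoft.Json.Linq.JObject]::Parse($factoryJson)

    # Reference the nested attribute that holds the global parameter definitions
    $globalParametersObject = $factoryObject.SelectToken("properties.globalParameters")

    foreach ($parameter in $globalParametersObject.GetEnumerator()) {
        Write-Host "Adding global parameter: $($parameter.Key)"
        $parameterValue = $parameter.Value.ToObject([Microsoft.Azure.Management.DataFactory.Models.GlobalParameterSpecification])
        $newGlobalParameters.Add($parameter.Key, $parameterValue)
    }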

Disclaimer: I have not yet tried the PowerShell code above myself; I only came across it while researching for this article. So if you find any piece of the code not working as expected, please leave a comment below with your findings so we can update the article to ensure the code is usable for anyone referencing it in future.

Pipelines in Synapse Analytics Consideration

At the time of writing this article, global parameters are not yet available for Synapse Analytics Pipelines.

However, the workaround I have used in the past is:

  1. For each “global variable” you need, define a variable in the pipeline itself
  2. Use a “Set Variable” activity at the start of the pipeline flow to update this variable
  3. For the expression, use an if() statement that checks what the current Synapse workspace name is, by referencing the system variable that contains it.
  4. If it is the DEV instance, provide the DEV version of your parameter value; if it is the UAT instance, provide the UAT value, and so on (see the sketch below this list).
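
A minimal sketch of such a Set Variable expression, assuming the workspace names are syn-myproject-dev, syn-myproject-uat and syn-myproject-prod, that the value being set is a Key Vault URL, and that pipeline().DataFactory returns the current workspace name in Synapse pipelines (all names are illustrative):

    @if(equals(pipeline().DataFactory, 'syn-myproject-prod'),
        'https://kv-myproject-prod.vault.azure.net/',
        if(equals(pipeline().DataFactory, 'syn-myproject-uat'),
            'https://kv-myproject-uat.vault.azure.net/',
            'https://kv-myproject-dev.vault.azure.net/'))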

This is a more cumbersome workaround, so if anyone has more efficient ideas, please comment below.

Tags

#automation, #data-factory, #synapse-analytics

Thank you for reading my ramblings. If you want to, you can buy me a coffee here: Support Dian Germishuizen on Ko-fi! ❤️

My Socials

My Blog: diangermishuizen.com

Linked In: Dian Germishuizen | LinkedIn

Twitter: Dian Germishuizen (@D_Germishuizen) / Twitter

Credly: Dian Germishuizen — Badges — Credly


Dian Germishuizen

I have been working in the Technology Industry as a Data Engineer since 2016. I have a passion for learning new things and sharing that knowledge with others.