Azure Synapse Environment Setup Key Considerations — Get started using ARM Template Provisioning Scripts

Inderjit Rana
Microsoft Azure
Sep 15, 2021

Objective

Azure Synapse Analytics is one of the core services in the Azure Data platform. It is a composite service with many components, and getting started can require a decent understanding of both those components and the broader Azure platform. The goal of this blog post is to summarize the important concepts needed to implement an Azure Synapse environment in an enterprise setting, and to share a reference architecture along with provisioning scripts that make it easy to get started with a secure Azure Synapse environment: https://github.com/isinghrana/my-azure-utils/tree/master/synapse

Note: Just to warn you, this is not a basic article; it goes fairly deep into concepts for real-world implementation. I assume you are familiar with the Azure platform and have a high-level understanding of Azure Synapse. If not, please check out the Azure Synapse Basics section of my previous blog post.

Table of Contents

Since a lot of ground is covered in this blog, it's important to add a Table of Contents. It is a good idea to go through all sections if you want to understand the concepts better (what is being done and why). If your goal is to quickly get started with a secure Synapse environment, you can jump directly to the last two sections (Sections 9 and 10).

  1. Context
  2. Key Consideration — Managed VNET
  3. Key Consideration — Allow Azure Services
  4. Key Consideration — Public Endpoint Enabled or Disabled
  5. Key Consideration — Data Exfiltration Protection
  6. Key Consideration — Storage Accounts
  7. Key Consideration — Workspace Managed Identity
  8. Azure Synapse Studio Storage Explorer — Considerations for Network Security and Azure AD Passthrough for Authorization
  9. Putting it all together — Reference Architecture for Azure Synapse Environment
  10. Granting end-user permissions to use Azure Synapse Workspace

Context

There are quite a few key considerations for setting up an Azure Synapse environment in an enterprise. One common security requirement is that resources are network protected. These include Synapse Studio (the web interface for Synapse), the SQL Dedicated Pool, the SQL Serverless Pool and the Storage Account associated with Synapse, so network security will be an integral part of the discussion here. Whenever network security comes up, I like to reiterate that security has multiple layers. Network security is in addition to the authentication/authorization layer, not a replacement for it.

In the following sections I will go over the key considerations in detail and help you weigh the pros and cons of choosing one setting value over another.

Key Consideration — Managed VNET

Managed Virtual Network is a relatively new concept in which a PaaS service is created in an isolated Virtual Network (aka VNET). The VNET is completely managed by the Azure platform, so there is no need to specify IP ranges, subnets, etc. Whether to enable the Managed VNET is one of the initial decision points when creating an Azure Synapse Workspace resource, and it cannot be changed once the Workspace has been created. My guidance is to choose the Managed VNET. More details and its advantages are well documented in the public docs here.

Key Consideration — Allow Azure Services

Customers with strict network security requirements usually prefer the "Allow Azure services and resources to access this server" setting to be No (unchecked), and my guidance is to stick with No as well. More details on this setting are documented in the public docs here, under the Connections from inside Azure section. It's possible to change this setting after the workspace is created. Figure 1 below shows the Azure Synapse Workspace network settings, including the Allow Azure Services checkbox.

Figure 1: Synapse Workspace Network Settings
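In template terms, my understanding is that the Allow Azure Services checkbox corresponds to a special firewall rule named AllowAllWindowsAzureIps, with both start and end address set to 0.0.0.0. This is the same convention Azure SQL uses. The minimal Python sketch below uses a hypothetical helper, build_firewall_rules, to assemble workspace firewall rules as plain dictionaries. It adds that rule only when the setting is explicitly turned on. It does not call any Azure API.

```python
from typing import Dict, List, Tuple

# Rule name assumed for the "Allow Azure services" checkbox (Azure SQL / Synapse
# convention of a 0.0.0.0 to 0.0.0.0 firewall rule).
ALLOW_AZURE_RULE = "AllowAllWindowsAzureIps"


def build_firewall_rules(corporate_ranges: Dict[str, Tuple[str, str]],
                         allow_azure_services: bool = False) -> List[dict]:
    """Build workspace firewall rules as plain dicts (hypothetical helper, no Azure calls)."""
    rules = [
        {"name": name, "startIpAddress": start, "endIpAddress": end}
        for name, (start, end) in corporate_ranges.items()
    ]
    if allow_azure_services:
        # Opt-in only: this rule admits connections from any Azure-hosted resource,
        # including resources in other customers' subscriptions.
        rules.append({"name": ALLOW_AZURE_RULE,
                      "startIpAddress": "0.0.0.0",
                      "endIpAddress": "0.0.0.0"})
    return rules


# Hypothetical corporate NAT range, with Allow Azure Services left at No.
rules = build_firewall_rules({"corp-nat-1": ("203.0.113.0", "203.0.113.255")})
print([r["name"] for r in rules])  # ['corp-nat-1']
```

Keeping the default at False mirrors the guidance above: the broad Azure-wide rule has to be requested deliberately.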

Key Consideration — Public Endpoint Enabled or Disabled

The screenshot in Figure 1 above also shows the Public Endpoint Enabled or Disabled setting. These represent two different paths to network protection:

Public Endpoint Enabled

This method is much easier to get started with. In this implementation, public endpoints exist for the resources, but they are still secure because you, as the customer, use firewall settings to control which IP ranges can connect to Azure Synapse and its associated resources. Connections originating from corporate networks usually come from a finite list of public IP ranges (also referred to as NAT IP ranges), and those ranges can be allowed through the Synapse IP firewall. In this blog and the companion provisioning scripts, I have chosen to focus on this method of applying network security because I consider it acceptable. I acknowledge this might not be good enough for some folks, so I will cover full private mode with Private Endpoint/Private Link in a subsequent blog post.
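Conceptually, an IP firewall with start/end ranges admits a connection only when the client's public (NAT) address falls inside one of the allowed ranges. The following is a simplified Python model of that check using the standard ipaddress module. It is not Azure's implementation. It handles IPv4 only, and the ranges are hypothetical documentation addresses.

```python
import ipaddress
from typing import Iterable, Tuple


def is_allowed(client_ip: str, ranges: Iterable[Tuple[str, str]]) -> bool:
    """Return True if client_ip falls inside any inclusive (start, end) IPv4 range."""
    ip = ipaddress.ip_address(client_ip)
    for start, end in ranges:
        if ipaddress.ip_address(start) <= ip <= ipaddress.ip_address(end):
            return True
    return False


# Hypothetical corporate NAT ranges (documentation address blocks).
corporate_nat_ranges = [
    ("203.0.113.0", "203.0.113.255"),
    ("198.51.100.10", "198.51.100.20"),
]

print(is_allowed("203.0.113.42", corporate_nat_ranges))   # True: inside the first range
print(is_allowed("198.51.100.21", corporate_nat_ranges))  # False: just past the second range
```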

Public Endpoint Disabled (Full Private Mode)

This method requires Private Endpoints. Connections are allowed only from specific customer-owned VNETs using Private Endpoints/Private Link, and connections from on-premises require VPN or ExpressRoute.

Note: You can change these settings after creating a Synapse Workspace, but doing so will have a major impact on how you consume the service. Try to avoid it once you are beyond the proof-of-concept stage.

Key Consideration — Data Exfiltration Protection

This is another security aspect you might want to consider. I chose not to focus on this particular aspect of network security and did not enable Data Exfiltration Protection. The reason to enable it would be to protect against an internal threat, where a rogue authorized user exfiltrates data to an external destination. You can read more about this in the public docs here. It would also be good to be aware of some limitations around Spark package management for workspaces with Data Exfiltration Protection enabled, which are documented here.

Note: You cannot change the workspace configuration for managed virtual network and data exfiltration protection after the workspace is created.
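To make the creation-time decisions concrete, here is a minimal Python sketch of the network-related workspace properties. It uses a hypothetical helper and is not the author's ARM template. The property names (publicNetworkAccess, managedVirtualNetwork, managedVirtualNetworkSettings.preventDataExfiltration) reflect my understanding of the Microsoft.Synapse/workspaces ARM schema and should be verified against the template reference. The sketch also flags the two settings the note above says are fixed at creation.

```python
import json

# Settings the article notes cannot be changed after the workspace is created.
CREATION_TIME_ONLY = {"managedVirtualNetwork", "managedVirtualNetworkSettings"}


def workspace_network_properties(managed_vnet: bool = True,
                                 prevent_exfiltration: bool = False,
                                 public_network_access: bool = True) -> dict:
    """Sketch of network-related Synapse workspace properties (hypothetical helper)."""
    props = {
        # Can be changed later, but affects how users consume the workspace.
        "publicNetworkAccess": "Enabled" if public_network_access else "Disabled",
    }
    if managed_vnet:
        props["managedVirtualNetwork"] = "default"
        props["managedVirtualNetworkSettings"] = {
            "preventDataExfiltration": prevent_exfiltration,
        }
    elif prevent_exfiltration:
        # Data exfiltration protection is only available with a Managed VNET.
        raise ValueError("Data exfiltration protection requires a Managed VNET")
    return props


# The configuration used in this post: Managed VNET, no exfiltration protection, public access on.
props = workspace_network_properties()
print(json.dumps(props, indent=2))
print(sorted(CREATION_TIME_ONLY.intersection(props)))
```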

Key Consideration — Storage Accounts

Azure Synapse requires a default Storage Account, and additional Storage Accounts can be added as linked services. My guidance is to use the default Storage Account for Azure Synapse metadata needs and to associate a secondary Storage Account for business data. It's a common requirement for Storage Accounts to be network protected. In practice, this means allowing access from Selected Networks only rather than All Networks, as shown in Figure 2 below.

Figure 2: Storage Selected Networks Only
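In an ARM template, the Selected Networks choice shown in Figure 2 lives in the storage account's networkAcls block. As I understand the Microsoft.Storage schema, defaultAction "Deny" corresponds to Selected Networks and "Allow" to All Networks. The short Python sketch below builds that block from a list of allowed CIDR ranges. The helper name and the range are hypothetical.

```python
import json
from typing import List


def storage_network_acls(allowed_cidrs: List[str], selected_networks: bool = True) -> dict:
    """Sketch of a storage account networkAcls block (hypothetical helper, no Azure calls)."""
    return {
        # "Deny" is what the portal shows as "Selected networks"; with "Allow"
        # ("All networks") the IP rules below have no practical effect.
        "defaultAction": "Deny" if selected_networks else "Allow",
        "ipRules": [{"value": cidr, "action": "Allow"} for cidr in allowed_cidrs],
        "virtualNetworkRules": [],
    }


acls = storage_network_acls(["203.0.113.0/24"])
print(json.dumps(acls, indent=2))
```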

Key Consideration — Workspace Managed Identity

Managed Identity is a very useful feature of the Azure platform and Azure AD. It is available for some Azure native services, including the Azure Synapse Workspace, and the idea is easy, platform-managed authentication between Azure resources. For example, if a Synapse Pipeline has to connect to a SQL Dedicated Pool, it needs a connection string, and the usual implementation puts a username/password in that string. With Managed Identity, the resource (in this case the Synapse Pipeline) has an Azure AD identity associated with it. If that identity is authorized in the SQL Dedicated Pool, the platform manages authentication completely, with no password or key in the connection string. Read here for more details on Azure Synapse Managed Identity.
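The difference shows up directly in the connection string. The Python sketch below builds a SqlClient-style string for a dedicated pool endpoint (<workspace>.sql.azuresynapse.net). It uses either a username/password or the "Authentication=Active Directory Managed Identity" keyword from Microsoft.Data.SqlClient. The helper, workspace and database names are hypothetical. Synapse Pipelines configure managed identity through the linked service definition rather than a hand-built string, so this only illustrates the "no secret" idea.

```python
from typing import Optional


def dedicated_pool_connection_string(workspace: str, database: str,
                                     user: Optional[str] = None,
                                     password: Optional[str] = None) -> str:
    """Build a SqlClient-style connection string (hypothetical helper)."""
    parts = [f"Server=tcp:{workspace}.sql.azuresynapse.net,1433", f"Database={database}"]
    if user and password:
        # Classic approach: a secret lives in the connection string.
        parts += [f"User ID={user}", f"Password={password}"]
    else:
        # Managed identity: the platform obtains an Azure AD token, so no secret is stored.
        parts.append("Authentication=Active Directory Managed Identity")
    return ";".join(parts) + ";"


print(dedicated_pool_connection_string("contoso-ws", "sqlpool01"))
```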

Permissions assigned to Synapse Workspace Managed Identity

  • Granting the Synapse Workspace Managed Identity CONTROL permissions on the SQL Dedicated Pool is optional; CONTROL permissions are the equivalent of admin-level permissions on the SQL Dedicated Pool. These permissions can also be granted after the Workspace has been created, so I would suggest not granting them to begin with unless you have the need. For example, PolyBase loads using Synapse Pipelines would require such high-level permissions to be granted to the Synapse Workspace Managed Identity.
  • The Workspace Managed Identity is required to have Storage Blob Data Contributor permissions on the default Storage Account for certain components of Azure Synapse to work (documented in the public docs here).
  • The Workspace Managed Identity is not required to have permissions on the secondary Storage Account.
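The permission list above amounts to a least-privilege baseline. The following Python sketch models role assignments as a dictionary keyed by (principal, scope). The principal and scope labels are hypothetical, and nothing here calls Azure RBAC. The sketch checks that the one required assignment is present and that nothing is granted on the secondary storage account.

```python
from typing import Dict, List, Set, Tuple

# (principal, scope) -> role names. Principal and scope labels are hypothetical.
Assignments = Dict[Tuple[str, str], Set[str]]

# Required for some Synapse components: Storage Blob Data Contributor on default storage.
REQUIRED = [("workspace-mi", "storage/default", "Storage Blob Data Contributor")]


def baseline_assignments(grant_sql_control: bool = False) -> Assignments:
    """Least-privilege starting point described in the list above."""
    assignments: Assignments = {
        ("workspace-mi", "storage/default"): {"Storage Blob Data Contributor"},
    }
    if grant_sql_control:
        # Only if needed, e.g. PolyBase loads from Synapse Pipelines.
        assignments[("workspace-mi", "sql/dedicated-pool")] = {"CONTROL"}
    return assignments


def missing_required(assignments: Assignments) -> List[Tuple[str, str, str]]:
    """Return required (principal, scope, role) entries that are not assigned."""
    return [(p, s, r) for p, s, r in REQUIRED if r not in assignments.get((p, s), set())]


baseline = baseline_assignments()
print(missing_required(baseline))                        # []
print(("workspace-mi", "storage/secondary") in baseline)  # False: not required
```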

Permissions for Synapse users to use Synapse Workspace Managed Identity

At the time of writing, the Synapse security model requires users to have permissions on the Synapse Workspace Managed Identity to be able to run Synapse Pipelines. If multiple teams use the same Synapse Workspace and the intention is to strictly control access to data in the Storage Account, be careful about giving end users permissions on the Synapse Workspace Managed Identity. I would go as far as to avoid two things: giving end users permissions to use the Synapse Managed Identity, and granting the Synapse Managed Identity permissions on the secondary Storage Account that holds business data. You will need to evaluate the feasibility of this design yourself, because it means end users cannot run Synapse Pipelines. One alternative is to have a small group of Data Engineers implement Synapse Pipelines and give only them, not the wider user base, permissions on the Workspace Managed Identity. Another is to simply use the T-SQL COPY command or PolyBase if the only purpose of the Synapse Pipeline is to push data from the Storage Account into the SQL Dedicated Pool.

The Synapse Administrator and Synapse Credential User roles can be used to control who has permissions to use the Workspace Managed Identity. The overall Synapse access control model is documented here, and the following screenshot shows the Synapse interface where this can be managed.

Figure 3: Permissions to Workspace Managed Identity
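The "small group of Data Engineers" pattern boils down to a role-membership check. This Python sketch is deliberately simplified. It treats Synapse Administrator and Synapse Credential User as the roles that allow using the workspace managed identity. It ignores role scopes, such as a Credential User assignment scoped to the workspace system identity credential, and the other permissions a pipeline run needs. The role sets for each persona are hypothetical. Check the Synapse access control docs for the real rules.

```python
from typing import Set

# Simplified: roles that let a user run work under the workspace managed identity.
MI_ROLES = {"Synapse Administrator", "Synapse Credential User"}


def can_use_workspace_mi(user_roles: Set[str]) -> bool:
    """True if any of the user's Synapse roles grants use of the workspace MI."""
    return bool(MI_ROLES & user_roles)


# Hypothetical split: only data engineers may run pipelines under the MI.
data_engineer_roles = {"Synapse Contributor", "Synapse Credential User"}
analyst_roles = {"Synapse Contributor"}
print(can_use_workspace_mi(data_engineer_roles))  # True
print(can_use_workspace_mi(analyst_roles))        # False
```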

Azure Synapse Studio Storage Explorer — Considerations for Network Security and Azure AD Passthrough for Authorization

Connections to the Storage Account go from the user's browser to the Storage Account, so network settings need to be set appropriately

When end users browse data in a Storage Account using the Synapse Studio web interface, the connections go from the end user's web browser to the Storage Account, not from the Synapse Workspace to the Storage Account. Network security rules therefore need to be configured accordingly:

  • Connections from on-premises to Azure Storage over the public internet require that corporate IP ranges are allowed on the Storage Account firewall.
  • Connections from a VM inside an Azure VNET require one of two setups: allow that VNET using a VNET Service Endpoint, or create an additional Private Endpoint for the Storage Account in that VNET. I will cover this in the full private mode setup (not the focus of this blog post).
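Because the browser connects directly, passing the Synapse workspace firewall says nothing about the storage firewall. The browser's own address has to match a storage IP rule, or come through an allowed VNET path. The Python sketch below models that check with IPv4 CIDR rules and a single flag for the VNET case (service endpoint or private endpoint). It is a simplification with hypothetical names and addresses, not Azure's evaluation logic.

```python
import ipaddress
from typing import Iterable


def browser_can_reach_storage(client_public_ip: str, storage_ip_rules: Iterable[str],
                              allowed_via_vnet: bool = False) -> bool:
    """Storage Explorer traffic goes browser -> storage, so the browser's address
    must pass the storage firewall (simplified: CIDR rules plus a VNET flag)."""
    if allowed_via_vnet:
        # VM in an Azure VNET allowed via a service endpoint or a private endpoint.
        return True
    ip = ipaddress.ip_address(client_public_ip)
    return any(ip in ipaddress.ip_network(cidr) for cidr in storage_ip_rules)


corp_rules = ["203.0.113.0/24"]  # hypothetical corporate NAT range
print(browser_can_reach_storage("203.0.113.42", corp_rules))  # True: office user
print(browser_can_reach_storage("192.0.2.7", corp_rules))     # False: home IP not allowed
```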

End User Azure AD Authorization Context, not Workspace Managed Identity

The default Storage Account Linked Service is set up out of the box using the Synapse Workspace Managed Identity, so it's logical to set up the Linked Service for the secondary Storage Account using the Workspace Managed Identity as well. It might not be obvious, but the Linked Service setup is responsible only for making the Storage Account show up as a node in the Synapse Studio Storage Explorer. When a user browses the contents of the Storage Account, the user's own Azure AD passthrough credentials take effect. When the Linked Service is used within a Synapse Pipeline, the Workspace Managed Identity is used. For the interactive storage browsing experience in Synapse Studio, however, the Workspace Managed Identity plays no role beyond showing a node for the Storage Account in the Storage Explorer interface.
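The behaviour described above fits in a two-case lookup. This small Python sketch encodes which identity Azure Storage sees on each access path. The enum and function names are hypothetical, and the mapping is exactly the one stated in the paragraph.

```python
from enum import Enum


class AccessPath(Enum):
    PIPELINE_ACTIVITY = "Synapse Pipeline activity using the linked service"
    STUDIO_BROWSING = "Interactive browsing in Synapse Studio Storage Explorer"


def identity_used(path: AccessPath) -> str:
    """Identity that Azure Storage authorizes for a given access path."""
    if path is AccessPath.PIPELINE_ACTIVITY:
        return "workspace managed identity"
    # The linked service only makes the storage node appear; data access uses the
    # signed-in user's Azure AD credentials (passthrough).
    return "end user Azure AD identity"


for p in AccessPath:
    print(f"{p.value}: {identity_used(p)}")
```

The practical consequence: granting the managed identity a data role does not let users browse data, and users need their own data-plane permissions.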

For use cases where end users are given access to the entire Storage Account, the setup is simply to add users to one of the data plane roles (Storage Blob Data Reader, Storage Blob Data Contributor or Storage Blob Data Owner) at the Storage Account level.

Some use cases involve multiple teams sharing the same Storage Account, each needing finer-grained access control over their own containers, or folders within a container. There, granting data roles at the Storage Account level won't be feasible, and you will need the following setup:

  • Grant users Reader permissions at the Storage Account level. This does not give users access to the data or the access keys. It is only a control plane permission that lets them list Storage Accounts and containers without seeing any data.
  • Once users have Reader permission they can list containers, but to see the data inside a container, standard RBAC or ACL permissions need to be configured.
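For the ACL route it helps to know how ADLS Gen2 evaluates access. RBAC is checked first, and ACLs are consulted only if no role grants access. Under ACLs, reading a file requires Execute (x) on every parent folder, including the container root, plus Read (r) on the file itself. The Python sketch below models that rule with a hypothetical path-to-permissions dictionary. It ignores owning user/group, named groups, the mask, "other", default ACLs and superusers, so treat it as an illustration rather than the service's algorithm.

```python
from typing import AbstractSet, Dict, Set

# path -> {user: permission letters}; "/" is the container root. Hypothetical structure.
Acls = Dict[str, Dict[str, Set[str]]]


def can_read_file(user: str, path: str, acls: Acls,
                  rbac_readers: AbstractSet[str] = frozenset()) -> bool:
    """Simplified ADLS Gen2 check: RBAC first, then ACLs (x on each parent, r on the file)."""
    if user in rbac_readers:
        # A data-plane role such as Storage Blob Data Reader scoped to the container;
        # when RBAC grants access, ACLs are not evaluated.
        return True
    parts = path.strip("/").split("/")
    file_path = "/" + "/".join(parts)
    parents = ["/"] + ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]
    for folder in parents:
        if "x" not in acls.get(folder, {}).get(user, set()):
            return False  # missing Execute on a folder blocks traversal
    return "r" in acls.get(file_path, {}).get(user, set())


team_acls: Acls = {
    "/": {"alice": {"x"}},
    "/team-a": {"alice": {"x"}},
    "/team-a/sales.csv": {"alice": {"r"}},
}
print(can_read_file("alice", "/team-a/sales.csv", team_acls))  # True
print(can_read_file("bob", "/team-a/sales.csv", team_acls))    # False: no data role or ACL entries
print(can_read_file("bob", "/team-a/sales.csv", team_acls, rbac_readers={"bob"}))  # True
```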

Please check out my blog on Azure Storage Explorer with Azure AD, which goes into the nuts and bolts of how Storage RBAC and ACLs work with Storage Explorer (the desktop client tool for Storage). It's not the Synapse Storage browser, but the same concepts carry over.

Putting it all together — Reference Architecture for Azure Synapse Environment

Now that we have covered the concepts, I would like to tie things together using a couple of architecture diagrams. First, I must reference the blog post Understanding Azure Synapse Private Endpoints. It explains Azure Synapse networking concepts in great detail and helped me build a better understanding of Synapse Analytics infrastructure. The diagram below is also taken from that blog post and gives a very accurate picture of the various components of a bare-bones Azure Synapse Workspace in a Managed VNET.

Figure 4: Azure Synapse

  • Spark and Synapse Pipeline Integration Runtimes are inside the Managed VNET, but the SQL Dedicated Pool and SQL Serverless are not inside a VNET.
  • By default, Private Endpoints are created for SQL Serverless and the SQL Dedicated Pool.
  • By default, no Private Endpoints are created for the Storage Account.

Challenge

The main challenge of the default setup is that the Storage Accounts have the All Networks setting enabled, which is not acceptable for the majority of enterprises. Once the Storage Accounts are changed to Selected Networks:

  • Spark cannot talk to the Storage Account unless a Managed Private Endpoint is created for the Storage Account.
  • SQL Pools cannot interact with the Storage Account unless a Resource Access Rule is added on the Storage Account to allow connections from the Workspace.
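The two bullets above act as a small checklist. The Python sketch below, with hypothetical names, returns what is still missing for a storage account once it is switched to Selected Networks. It models only the two fixes named in the text, nothing else.

```python
from typing import List


def required_fixes(selected_networks: bool,
                   managed_private_endpoint: bool,
                   resource_access_rule: bool) -> List[str]:
    """List what still blocks Synapse compute from a storage account (simplified model)."""
    if not selected_networks:
        # "All networks": connectivity works, but most enterprises reject this posture.
        return []
    fixes = []
    if not managed_private_endpoint:
        fixes.append("Spark: create a Managed Private Endpoint for the storage account")
    if not resource_access_rule:
        fixes.append("SQL pools: add a Resource Access Rule for the workspace")
    return fixes


# Default setup after switching storage to Selected Networks: both fixes are needed.
for fix in required_fixes(True, False, False):
    print(fix)
```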

Synapse Environment Provisioned using scripts

The Synapse environment provisioned by the automation scripts in the GitHub repository (https://github.com/isinghrana/my-azure-utils/tree/master/synapse) tackles the challenges mentioned above and quickly gets you started with a ready-to-use environment. The diagram below shows the configuration.

Figure 5: Synapse Workspace Environment Setup using scripts

Synapse Workspace created inside a Managed VNET:

  • Public Network Enabled
  • IP Firewall Rules applied to allow connections from approved IP Ranges (usually Corporate IP Ranges)
  • Allow Azure Services set to No
  • Workspace Managed Identity not granted permissions on the SQL Dedicated Pool (the ARM template has a parameter you can use to grant these permissions if you decide to do so).

Storage Accounts:

  • IP Firewall Rules applied to allow connections from approved IP Ranges (usually Corporate IP Ranges).
  • SQL Serverless and the SQL Dedicated Pool are not natively inside a VNET. Private Endpoints just make them look like they are in the VNET, but they still reside outside it. Hence Resource Access Rules are configured on the Storage Accounts to allow connections from the SQL Pools (read more here).
  • Synapse Workspace Managed Identity granted Storage Blob Data Contributor Role on the default Storage Account.
  • Managed Private Endpoints created for both Default and Secondary Storage Accounts so that connections from Spark and Synapse Pipeline Integration Runtimes can work with network protected Storage Accounts.
  • Linked Service for the Secondary Storage Account set up so that the Secondary Storage Account node shows up in Synapse Studio Storage Explorer.
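The Resource Access Rule part of this configuration sits in each storage account's networkAcls block. As I understand the Microsoft.Storage ARM schema, each rule is a tenantId plus the resourceId of the allowed resource instance. The minimal Python sketch below is not the author's ARM template, and its helper names and IDs are hypothetical placeholders. It builds the Synapse workspace resource ID and adds that rule to an existing networkAcls dictionary without mutating the input or adding duplicates.

```python
import json


def workspace_resource_id(subscription_id: str, resource_group: str, workspace: str) -> str:
    """ARM resource ID of a Synapse workspace."""
    return (f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
            f"/providers/Microsoft.Synapse/workspaces/{workspace}")


def with_resource_access_rule(network_acls: dict, tenant_id: str, resource_id: str) -> dict:
    """Return a copy of network_acls that allows the given resource instance (idempotent)."""
    updated = dict(network_acls)
    rules = list(updated.get("resourceAccessRules", []))
    rule = {"tenantId": tenant_id, "resourceId": resource_id}
    if rule not in rules:
        rules.append(rule)
    updated["resourceAccessRules"] = rules
    return updated


# Placeholder IDs for illustration only.
ws_id = workspace_resource_id("00000000-0000-0000-0000-000000000000", "rg-synapse", "contoso-ws")
acls = {"defaultAction": "Deny", "ipRules": [{"value": "203.0.113.0/24", "action": "Allow"}]}
acls = with_resource_access_rule(acls, "11111111-1111-1111-1111-111111111111", ws_id)
print(json.dumps(acls, indent=2))
```

The same rule would be applied to both the default and the secondary storage account, since the SQL pools need to reach each of them.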

Granting end-user permissions to use Azure Synapse Workspace

The script above provisions a functional environment but the following steps need to be taken to grant end users permissions to use Synapse Workspace components:

Credit: I would like to thank my colleague Ken Kilty for sharing his real-world knowledge, which greatly helped me improve my understanding of Azure Synapse and put this blog post together.

Disclaimer: Things change fast in the world of technology, and I will try my best to keep this article up to date as time permits. Also, this is my personal work based on my own learning, not an officially prescribed solution from my employer. The deployment scripts are open source, and you can use them as you see fit. If you notice issues, please feel free to file a bug or send me a pull request on GitHub.
