Azure SQL network settings (Private Link, VNET Service Endpoint) and Azure Data Factory

Inderjit Rana
Microsoft Azure
Published in
10 min readApr 13, 2020

Azure SQL Database has a few extra settings on the Firewalls and Virtual Networks tab in addition to Private Link and VNET Service Endpoint which might not be very clear so in this blog post I will be discussing these settings in detail and their impact on how Azure Data Factory will communicate with Azure SQL Database within the Azure environment. If you are just trying to understand how these settings play together in general sense and Azure Data Factory is not relevant for you following would be a really good article to go through — https://techcommunity.microsoft.com/t5/azure-database-support-blog/lesson-learned-126-deny-public-network-access-allow-azure/ba-p/1244037

Please note that when I use the term Azure SQL Database it is applicable to both Azure SQL Database as well as Azure Synapse (formerly Azure SQL Data Warehouse). Lastly, I will add a disclaimer that things change at a fast pace so the information/screenshots are accurate at the time of writing this post but might get outdated in near future and I will do my best to come back to update.

Updates on May 7, 2020

This article focuses on moving data from Azure Storage to Azure SQL Database, I am not addressing data movement from on-premises to Azure in this article. The simpler technique used by quite a few customers is just to use AzCopy cross-platform command line utility for transferring data from on-premises to Azure Storage and then handle the load to Azure SQL Database using Azure Data Factory.

Updates on July 27, 2020

I highly encourage you to read the new Managed VNET for ADF feature which was very recently released in Preview.

Azure Data Factory Mapping Data Flows

Azure Data Factory has a relatively new component for performing transformations referred to as Mapping Data Flows, b̵u̵t̵ ̵t̵h̵i̵s̵ ̵a̵r̵t̵i̵c̵l̵e̵ ̵h̵a̵s̵ ̵n̵o̵t̵ ̵b̵e̵e̵n̵ ̵v̵e̵r̵i̵f̵i̵e̵d̵ ̵f̵o̵r̵ ̵D̵a̵t̵a̵ ̵F̵l̵o̵w̵s̵ ̵c̵o̵m̵p̵o̵n̵e̵n̵t̵ ̵o̵f̵ ̵A̵D̵F̵,̵ ̵v̵e̵r̵y̵ ̵s̵o̵o̵n̵ ̵I̵ ̵p̵l̵a̵n̵ ̵t̵o̵ ̵u̵p̵d̵a̵t̵e̵ ̵t̵h̵e̵ ̵a̵r̵t̵i̵c̵l̵e̵ ̵o̵n̵c̵e̵ ̵I̵ ̵v̵a̵l̵i̵d̵a̵t̵e̵ ̵i̵f̵ ̵t̵h̵e̵ ̵s̵a̵m̵e̵ ̵c̵o̵n̵c̵e̵p̵t̵s̵ ̵a̵p̵p̵l̵y̵ ̵t̵o̵ ̵D̵a̵t̵a̵ ̵F̵l̵o̵w̵s̵ ̵a̵s̵ ̵w̵e̵l̵l̵ , at this time Azure SQL Database needs to have Deny Public Network Access set to No and Allow All Azure Services Set to Yes for Data Flows to be able to communicate with Azure SQL Database. The ADF Self-Hosted Integration Runtime is not an option for Data Flow at this time. The best guidance I can provide if you need to use Mapping Data Flows is to read/write to Azure Storage Account and then use regular ADF copy activity to load to database. The Storage account can be configured to allow only specific networks, as long as the All Microsoft Trusted Services checkbox is checked and Managed Identity is used Mapping Data Flow will be able to communicate with the Storage Account.

Background

Azure SQL Database is an Azure PaaS Services and by default not launched in Private VNET within customer Azure Subscription but there are methods to apply network controls to make it look and feel like private resource meeting the most strict security requirements for majority of corporate environments. Please keep in mind that we are strictly talking Network Layer Security here, access controls is another layer of security that uses authentication and authorization which is a different layer of security. Majority of corporate customers especially in Regulated Industry like to have tight network security controls in place so that even if somebody gains access to the credentials there are network boundaries in place to provide an extra layer of protection.

Concepts and Terminology

VNET Service Endpoint

VNET Service Endpoint does not create any Private IP for Azure SQL Database but it allows customer to define the network boundary letting them whitelist who has network path open to connect to the Azure SQL Database:

1. Explicit List of Public IPs

2. Specific VNETs within customer owned Subscriptions

You can read more about VNET Service Endpoint on the public documentation page — https://docs.microsoft.com/en-us/azure/virtual-network/virtual-network-service-endpoints-overview

Private Link

Private Link is a relatively new technology and associates a Private IP in a chosen customer VNET for the Azure SQL Database hence effectively bringing the Azure SQL Database inside a customer VNET. Couple very important aspects to keep in mind:

You can read more about the Private Link on the public documentation page — https://docs.microsoft.com/en-us/azure/private-link/private-link-overview

Azure Data Factory Integration Runtime

Azure Data Factory components do require a compute infrastructure to run on and this is referred to as Integration Runtime. The ADF Pipeline components do require developers to specify Integration Runtime. Following are the two most common Integration Runtimes used in ADF

  1. Azure Integration Runtime — This provides the true PaaS Service experience where compute infrastructure is managed by Azure Platform and does not require any Virtual Machines managed by customer.
  2. Self-Hosted Integration Runtime — The primary use case of this Integration Runtime was for corporate network where it runs on servers inside Customer Data Center and moves data from on-premises data sources to Azure Data Services. It might come as a surprise to quite a few folks that if really tight security controls are applied on Azure Data Sources this Integration Runtime might need to installed on Azure Virtual Machine inside a VNET for ADF to communicate with Azure native data service like Azure SQL Database, Azure Storage, etc. If you are applying really tight network controls on Azure Data Services you are pretty much simulating your very locked down corporate network and hence you might Self-Hosted Integration Runtime on VMs in Azure.

You can read about Azure Data Factory Integration Runtime here — https://docs.microsoft.com/en-us/azure/data-factory/concepts-integration-runtime

Client communication Requirements to be Satisfied

As far the network communication goes the clients/consumers of Azure SQL Database can be divided into two categories :

  1. How on-premises users on corporate network will access Azure SQL Database?
  2. How other resources like Azure Data Factory inside Azure boundary will access Azure SQL Database?

Azure SQL Database Network Settings

Following screenshot shows the Firewall and virtual networks tab for the Logical Sql Server for and Azure SQL Database.

Azure SQL Database — Firewall and Virtual Networks

I will be addressing how the following four different settings play together:

  1. Deny public network access
  2. Allow Azure Services and resources to access this server — In many cases setting Yes is more permissive than most customers want because any resource within the Azure boundary irrespective of Subscription or Customer can access Azure SQL Database provided it has the credentials to authenticate. Please read in more detail here — https://docs.microsoft.com/en-us/azure/sql-database/sql-database-networkaccess-overview#allow-azure-services
  3. White listed IP Addresses — This is the box with Rule Name, Start IP and End IP
  4. White listed Virtual Network — This the bottom box with Rule Name, Virtual Network, Subnet and Address Range

I am summarizing the settings and the impact in the following table and sections below address each combination of settings (row from the table) in detail.

Summary of Settings and Impact

Deny Public Network Access set to Yes

Setting Deny Public Network Access to Yes completely disables the Public Endpoint for the Azure SQL Database and the result is that Azure SQL Database connectivity is completely closed and only way to access the Azure SQL Database is to create a Private Endpoint for Azure SQL Database.

  • Private Endpoints are always created inside a VNET so only the resources from within that VNET or peered VNET can access the Azure SQL Database.
  • On-Premises users need to access the Azure SQL Database through Express Route Private Peering or VPN.

Azure Data Factory Connectivity in this configuration

Azure Data Factory Azure Integration Runtime is not inside a VNET so it cannot be used hence if ADF needs to read or write to Azure SQL Database Self-Hosted Integration Runtime will need to be used.

The other settings (2), (3) and (4) do not matter at all hence can be ignored. It is important to point out that Allow Azure Services (2) is completely overridden and does not apply at all.

Deny Public Network Access set to No and Allow Azure Services set to No

Setting Deny Public Network Access to No might be mistakenly perceived as not secure but that is not the case at all, having a Public Endpoint does not mean it’s open to the world. Even if the person trying to access Azure SQL Database has the credentials to authenticate, network controls are in place by default and clients need to be white listed to allow connections. Two categories for white listing:

  1. Explicit List of IP address ranges — This can be used to allow on-premises users, it’s important to understand that most corporations use Network Address Translation (NAT) so that when a user inside a corporate network connects to a resource on the internet the Public IP seen by the destination is a finite list of Public IP Ranges owned by the corporation. So, in simpler terms corporation NAT IP Range is white listed on the Azure SQL Database thus restricting access to users coming only from the corporate network. Further Summarization, Public Endpoint exists for Azure SQL Database but connections allowed only from corporate network, traffic goes over internet but its encrypted in transit (TLS) so the setup is pretty secure.
  2. Specific Customer Owned VNETs — This is used to define specific subnets within a VNET which can connect to the Azure SQL Database

Azure Data Factory Connectivity to Azure SQL Database in this configuration

As you remember Azure Data Factory Azure Integration Runtime is not inside a VNET so by default it cannot connect to your Azure SQL Database. You can use ADF Self-Hosted Integration Runtime in Azure VMs but if you desire to avoid VM (which result in higher operational costs) the best you can do is to whitelist the IP Ranges for Azure Data Factory Integration Runtime on your Azure SQL Database, architecture diagrams for both of the design shown below. The following link discusses Azure Data Factory Azure Integration Runtime IP Ranges - https://docs.microsoft.com/en-us/azure/data-factory/azure-integration-runtime-ip-addresses The exact IP Ranges will need to be looked up from the JSON file for which the link is available here — https://docs.microsoft.com/en-us/azure/virtual-network/service-tags-overview#discover-service-tags-by-using-downloadable-json-files The IP Ranges don’t change frequently but still it would be good idea to make sure the IP Rules are not outdated, Azure Powershell and Azure CLI programmatic method of downloading IP Ranges is in Preview. I would reiterate that Managed VNET for ADF feature would be preferred solution rather than going in this direction of IP Whitelisting.

For example, if you are using Azure SQL Database in East US and the Azure Integration Runtime which needs to connect is in East US white list the IP Range of Azure Data Factory Integration Runtime IP Range for East US.

Azure SQL Database Explicit IP White listing section on Azure Portal requires Start and End IP Addresses, you can use the following tool to convert the CIDR Range specified on the above link to Start and End IP address format — https://www.ipaddressguide.com/cidr

Please note that Azure Data Factory Integration Runtime IP Range is for Integration Runtime in that region irrespective of customer subscription, tenant, etc. so this solution might not be most restrictive network control but still reduces the risk surface considerably. Following are the other measures which can be taken to further enhance the security of Azure SQL Database but its customer who needs to determine if this is sufficient to meet organizations security requirements.

  1. Advanced Threat Protection — https://docs.microsoft.com/en-us/azure/sql-database/sql-database-threat-detection
  2. Azure AD Authentication — Limit the use of SQL Authentication, instead use Azure AD Authentication with MFA and/or Conditional Access wherever possible (Azure AD MFA — https://docs.microsoft.com/en-us/azure/active-directory/authentication/concept-mfa-howitworks , Azure AD Conditional Access — https://docs.microsoft.com/en-us/azure/active-directory/conditional-access/overview)

Deny Public Network Access set to No and Allow Azure Services set to Yes

Any client trying to connect from outside Azure would still need to be white listed by IP Range on the Azure SQL Database but this configuration allows communication from all resources inside the Azure boundary (they may or not be part of the customer’s subscription). In many cases this configuration is more permissive than most customer want. Please read in more detail on the following documentation link — https://docs.microsoft.com/en-us/azure/sql-database/sql-database-networkaccess-overview#allow-azure-services

Azure Data Factory Connectivity to Azure SQL Database in this configuration

Azure Data Factory Azure Integration Runtime can connect with Azure SQL Database without any additional configuration on the Azure SQL Database.

Using Private Link and VNET Service Endpoint Together

Private Link and VNET Service Endpoint can be used together as well. When Deny Public Network Access setting is No it is still possible to use Private Link. This would result in both Public and Private Endpoints to be operational, depending on the DNS settings of the network you are connecting from client will communicate with Public or Private Endpoint. This might not be the most common setup but there are use cases where you might choose to go this route. For example, access within Azure uses Public Endpoint using VNET Service Endpoint because the traffic is internal to Azure Network but on-premises access is over a VPN using the Private Endpoint.

Related Links

--

--