Sharing Data between Azure Subscriptions (Azure Storage, Azure SQL or Azure Synapse)

Inderjit Rana
Published in Microsoft Azure
Feb 27, 2021

Data sharing in cloud platforms is a very big topic. Azure Data Share is a more advanced Azure service for sharing data, especially between different organizations, which you can read more about in its documentation. This blog post is about a more basic scenario: the Data Producer (or data asset owner) makes a data resource available to a Data Consumer outside the Data Producer's Azure Subscription by using a VNET Service Endpoint or a Private Endpoint.

What will you learn?

I have observed this pattern emerge among large enterprises: VNET-protected data assets are owned by one team and stored in one Azure Subscription (which I will refer to as the Data Provider Subscription), while another team with its own Azure Subscription (which I will refer to as the Data Consumer Subscription) wants to perform analytics on the data residing in the Data Provider Subscription without explicitly copying it to the Data Consumer Subscription. In relation to this requirement I have been asked several times, "How can we share data between different Azure Subscriptions?", and I will show some reference architectures that answer this question.

I will share my learnings and experience on the following aspects:

  • How can this kind of data sharing be achieved?
  • Why would one pursue such a multiple Azure Subscription model for the Data Provider and Data Consumer?

Lastly, some of these concepts might port over to other data storage services as well, but my thought process is based solely on Azure Storage, Azure SQL and Azure Synapse.

Background — VNET Protected Data Store

Private Link and VNET Service Endpoints are the two Azure Platform features used to give PaaS services the look and feel of being in a private VNET. VNET security is at times not well understood, so I have included the following screenshots to be clear about what I mean by a VNET-protected resource.

Azure Storage — Selected Networks only

The following screenshot shows the Allow Azure Services setting for Azure SQL or Azure Synapse set to No (you can read more about this setting here: https://docs.microsoft.com/en-us/azure/azure-sql/database/network-access-controls-overview#allow-azure-services).

Azure SQL Network Setting — Allow Azure Services No

How can this kind of data sharing be achieved between Data Provider and Data Consumer Subscriptions?

Before I explain the solution, I would like to point out that Azure Subscriptions are just billing/administration boundaries and not necessarily security boundaries. To illustrate this point further:

  • If you look at the connection string for an Azure SQL Database, it doesn't contain the Azure Subscription name anywhere:
Server=tcp:<srvname>.database.windows.net,1433;Initial Catalog=<dbname>;Persist Security Info=False;User ID=<userid>;Password=<password>;MultipleActiveResultSets=False;Encrypt=True;TrustServerCertificate=False;Connection Timeout=30;
  • If you look at the ADLS Gen2 (Azure Storage) path that an Azure Databricks Notebook uses to read/write data, it again doesn't contain the Azure Subscription name:
readdf = spark.read.format("<file format>").load("abfss://<filesys>@<storageacc>.dfs.core.windows.net/<path>")
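To make the point concrete, here is a small standard-library Python sketch that splits both identifiers above into their parts. The server, database, file system and account names are hypothetical placeholders, and the parser is deliberately simple (it does not handle quoted values that contain `;`). Whatever the values, what locates the data is a host name plus a database or file system and a path; there is no subscription field to parse.

```python
from urllib.parse import urlsplit


def parse_sql_connection_string(conn_str: str) -> dict:
    """Split an ADO.NET-style 'Key=Value;Key=Value;' string into a dict."""
    parts = {}
    for segment in conn_str.split(";"):
        if not segment.strip():
            continue  # skip the empty piece after the trailing ';'
        key, _, value = segment.partition("=")  # split on the first '=' only
        parts[key.strip()] = value.strip()
    return parts


def parse_abfss_uri(uri: str) -> dict:
    """Break an abfss:// URI into file system, storage account and path."""
    split = urlsplit(uri)
    filesystem, _, host = split.netloc.partition("@")
    return {
        "filesystem": filesystem,
        "account": host.split(".")[0],  # first label of <storageacc>.dfs.core.windows.net
        "path": split.path.lstrip("/"),
    }


# Hypothetical values standing in for the placeholders in the text.
sql = parse_sql_connection_string(
    "Server=tcp:myserver.database.windows.net,1433;Initial Catalog=salesdb;"
    "Encrypt=True;TrustServerCertificate=False;Connection Timeout=30;"
)
adls = parse_abfss_uri("abfss://raw@datalakeacct.dfs.core.windows.net/sales/2021/")
print(sql["Server"])  # tcp:myserver.database.windows.net,1433
print(adls)  # {'filesystem': 'raw', 'account': 'datalakeacct', 'path': 'sales/2021/'}
```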

Solution

There are two requirements for a consumer in the Data Consumer Subscription to access data stored in the Data Provider Azure Subscription (in fact, the same requirements hold true even if the consumer is completely outside Azure):

  1. The consumer needs authentication/authorization credentials for the data store
  2. The consumer also needs network line of sight to the data store
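Requirement 2 can be checked from a VM or notebook inside the Data Consumer VNET. The following is a minimal standard-library sketch; the function names are my own and not part of any Azure SDK. With a Private Endpoint and correctly configured private DNS, the data store's host name should resolve to a private IP address in the consumer VNET. With a VNET Service Endpoint the name still resolves to a public IP, so only the TCP check is meaningful there.

```python
import ipaddress
import socket


def resolved_addresses(host: str, port: int) -> list[str]:
    """Return the distinct IP addresses that DNS gives for host."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def is_private_address(ip: str) -> bool:
    """True for RFC 1918 and similar non-public ranges (what a Private Endpoint gets)."""
    return ipaddress.ip_address(ip).is_private


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port can be opened (network line of sight)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unresolvable, etc.
        return False


# Example (run inside the consumer VNET; the host name is a placeholder):
# ips = resolved_addresses("<storageacc>.dfs.core.windows.net", 443)
# print(ips, [is_private_address(ip) for ip in ips])
# print(tcp_reachable("<storageacc>.dfs.core.windows.net", 443))
```

Use port 443 for Azure Storage and 1433 for Azure SQL or Azure Synapse SQL. A successful check only proves requirement 2; it says nothing about requirement 1 (credentials).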

As far as consumption from another Azure Subscription is concerned, the solution lies in a detail that is embedded in the networking documentation (not the most obvious place for Data Analytics professionals to hunt for it): the VNET Service Endpoint and Private Link technologies that provide a network path to a VNET-protected PaaS resource can be in any Azure Subscription, irrespective of the Data Provider Azure Subscription.

Update made September 2021: It's possible to create a Private Endpoint in a different Azure AD Tenant than the data resource. For some services, VNET Service Endpoints can also be created in a different Azure AD Tenant than the data source.

As the VNET Service Endpoint documentation explains, certain Azure services (not all), such as Azure Storage and Azure Key Vault, also support service endpoints across different Active Directory (AD) tenants, i.e., the virtual network and the Azure service resource can be in different AD tenants. Please check the individual service documentation for more details.

As mentioned in the Private Endpoint FAQ section of the public docs, private endpoints can connect to Private Link services or to an Azure PaaS resource across Azure Active Directory tenants.

The screenshot shows that the VNET Service Endpoint creation form has Azure Subscription selection as the first drop-down (a similar selection is available when creating a Private Endpoint).

VNET Service Endpoint or Private Endpoint — Any Subscription in same Azure AD Tenant

The fact that a VNET Service Endpoint can be in a different Subscription is documented in the Configuration section: https://docs.microsoft.com/en-us/azure/virtual-network/virtual-network-service-endpoints-overview#secure-azure-service-access-from-on-premises

Of course, appropriate permissions will be required: https://docs.microsoft.com/en-us/azure/virtual-network/virtual-network-service-endpoints-overview#provisioning

Both VNET Service Endpoints and Private Endpoints are secure methods: traffic between your virtual network and the service travels over the Microsoft backbone network. You can also allow many VNET Service Endpoint subnets or create many Private Endpoints for one resource to accommodate multiple Data Consumer Subscriptions.
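To illustrate the multiple-consumer case for Azure Storage, the sketch below builds the `networkAcls` section of a storage account as it appears in an ARM template, with one VNET rule per consumer subnet. The subscription IDs, resource groups, VNETs and subnets are hypothetical, and `bypass` is set to `None` only to keep the example strict. Each subnet ID carries the consumer's subscription ID, not the provider's, and each consumer subnet must have the `Microsoft.Storage` service endpoint enabled for its rule to take effect.

```python
import json


def subnet_id(subscription_id: str, resource_group: str, vnet: str, subnet: str) -> str:
    """ARM resource ID of a subnet; the subscription ID is part of it."""
    return (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Network/virtualNetworks/{vnet}/subnets/{subnet}"
    )


def storage_network_acls(consumer_subnet_ids: list) -> dict:
    """networkAcls for a storage account restricted to 'Selected networks'."""
    return {
        "defaultAction": "Deny",  # block everything not explicitly allowed
        "bypass": "None",         # no trusted-service exception in this sketch
        "ipRules": [],
        "virtualNetworkRules": [
            {"id": sid, "action": "Allow"} for sid in consumer_subnet_ids
        ],
    }


# Hypothetical consumer subscriptions for two business units.
consumers = [
    subnet_id("11111111-1111-1111-1111-111111111111", "bu1-rg", "bu1-vnet", "databricks-private"),
    subnet_id("22222222-2222-2222-2222-222222222222", "bu2-rg", "bu2-vnet", "vm-subnet"),
]
print(json.dumps(storage_network_acls(consumers), indent=2))
```

Adding another Data Consumer Subscription is then just one more entry in `virtualNetworkRules`; the storage account itself never moves.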

Example 1

A VNET-protected Azure Storage account in one Azure Subscription can add a Private Endpoint or a VNET Service Endpoint rule for a VNET in a completely different Azure Subscription, thereby giving resources in a VNET in the Data Consumer Subscription network line of sight to Azure Storage in the Data Provider Azure Subscription.

A simple example is Azure Databricks in the Data Consumer Subscription with Storage in the Data Provider Azure Subscription. It's important to understand that Azure Databricks must be deployed in a customer-owned VNET (VNET injection), as documented here: https://docs.microsoft.com/en-us/azure/databricks/administration-guide/cloud-configurations/azure/vnet-inject

Storage Sharing using VNET Service Endpoint
Storage Sharing using Private Endpoint
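For the Private Endpoint variant, the Private Endpoint resource lives in the Data Consumer Subscription and points at the storage account in the Data Provider Subscription. Below is a hedged sketch of the resource body in the ARM/REST shape of `Microsoft.Network/privateEndpoints`; all IDs, names and the region are hypothetical. When the requester lacks permissions on the target resource (typical when the two subscriptions are owned by different teams), the connection goes under `manualPrivateLinkServiceConnections` and stays pending until the Data Provider approves it. The group ID `dfs` targets the ADLS Gen2 endpoint (`blob` would target the Blob endpoint).

```python
import json


def storage_account_id(subscription_id: str, resource_group: str, account: str) -> str:
    """ARM resource ID of the provider's storage account."""
    return (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account}"
    )


def private_endpoint_body(location: str, consumer_subnet_id: str, target_id: str,
                          group_id: str, needs_approval: bool) -> dict:
    """Body for a private endpoint created in the consumer subscription."""
    connection = {
        "name": "to-data-provider",
        "properties": {"privateLinkServiceId": target_id, "groupIds": [group_id]},
    }
    if needs_approval:
        # Stays pending until the Data Provider approves it on the storage account.
        connection["properties"]["requestMessage"] = "Access request from Data Consumer Subscription"
        key = "manualPrivateLinkServiceConnections"
    else:
        key = "privateLinkServiceConnections"
    return {
        "location": location,
        "properties": {"subnet": {"id": consumer_subnet_id}, key: [connection]},
    }


provider_storage = storage_account_id("aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa", "datalake-rg", "datalakeacct")
consumer_subnet = ("/subscriptions/bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb/resourceGroups/bu1-rg"
                   "/providers/Microsoft.Network/virtualNetworks/bu1-vnet/subnets/endpoints")
body = private_endpoint_body("eastus", consumer_subnet, provider_storage, "dfs", needs_approval=True)
print(json.dumps(body, indent=2))
```

For name resolution to work from the consumer VNET, the `privatelink.dfs.core.windows.net` private DNS zone (or an equivalent DNS setup) also needs to be linked to that VNET.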

Example 2

A VNET-protected Azure SQL or Azure Synapse in the Data Provider Azure Subscription, and Azure Databricks, a VM or any other resource in a VNET in the Data Consumer Subscription. Please note that Azure Synapse Workspace supports only Private Endpoint (VNET Service Endpoint is not available), whereas Azure SQL and Azure Synapse (formerly SQL DW) support both VNET Service Endpoint and Private Endpoint.
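For the VNET Service Endpoint variant on Azure SQL, the rule is created on the logical server in the Data Provider Subscription but references a subnet in the Data Consumer Subscription. The sketch below only assembles the Azure Resource Manager REST request (method, URL, body); it does not send it, which would also require an Azure AD bearer token with permissions on the server. The `api-version` value, subscription IDs and all names are assumptions for illustration, and the consumer subnet must have the `Microsoft.Sql` service endpoint enabled.

```python
API_VERSION = "2021-11-01"  # assumption: check the current Microsoft.Sql REST docs


def sql_vnet_rule_request(provider_sub: str, provider_rg: str, server: str,
                          rule_name: str, consumer_subnet_id: str):
    """Return (method, url, body) for a virtual network rule on a logical SQL server."""
    url = (
        "https://management.azure.com"
        f"/subscriptions/{provider_sub}/resourceGroups/{provider_rg}"
        f"/providers/Microsoft.Sql/servers/{server}/virtualNetworkRules/{rule_name}"
        f"?api-version={API_VERSION}"
    )
    body = {
        "properties": {
            # The subnet lives in the Data Consumer Subscription.
            "virtualNetworkSubnetId": consumer_subnet_id,
            # False: don't create the rule unless the subnet already has the Microsoft.Sql endpoint.
            "ignoreMissingVnetServiceEndpoint": False,
        }
    }
    return "PUT", url, body


method, url, body = sql_vnet_rule_request(
    "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa", "data-rg", "provider-sqlsrv", "allow-bu1",
    "/subscriptions/bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb/resourceGroups/bu1-rg"
    "/providers/Microsoft.Network/virtualNetworks/bu1-vnet/subnets/databricks-private",
)
print(method, url)
```

The two subscription IDs in the output make the cross-subscription nature explicit: the URL names the provider's subscription, the body names the consumer's.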

SQL Sharing

Why would one pursue such a multiple Azure Subscription model for the Data Provider and Data Consumer?

The answer is cost chargeback, i.e., how individual groups or business units within a large enterprise pay the bills for Azure Platform usage (the Data Provider can also extend the same pattern to external customers by inviting them into its Azure AD Tenant as guests). In cloud platforms, storage is usually cheaper and it's the compute that costs more. I will dissect this statement a little more, as things can vary between storage services and the separation between storage and compute can be interpreted in multiple ways.

Example 1

The most common pattern I want to highlight for this discussion is where Azure Storage (or Azure Data Lake Storage Gen2) is in the Data Producer Azure Subscription (a Data Lake set up by the central team) and Azure Databricks is in the Data Consumer Azure Subscription (business groups).

When this pattern of a separate Azure Subscription for the consumer (Azure Databricks) is used, compute charges are billed directly to the consumer. Although storage is relatively inexpensive compared to compute, Data Producers ideally want to charge back the consumer groups for storage as well. This is a little trickier, but there are still options to reach some level of approximation (not super accurate, though):

  • A simple option is to divide the costs among the consumers in some proportion.
  • A more involved method: exclusively use Azure AD for all interactions with Azure Storage from consumer tools, and enable Diagnostic Settings on the Storage Accounts to send all storage transactions to Log Analytics, so that the log records can be analyzed to bill users based on their actual usage.
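As a rough illustration of usage-based chargeback, the sketch below splits a storage bill in proportion to the bytes each requester moved. It assumes the log records were exported from Log Analytics with column names as in the `StorageBlobLogs` table (`RequesterUpn`, `RequestBodySize`, `ResponseBodySize`); check your workspace's schema. Using bytes transferred as the metric is my own choice for the example; transaction counts or capacity per folder are equally valid bases. The sample users and the bill amount are hypothetical.

```python
from collections import defaultdict


def allocate_storage_cost(records, total_cost: float) -> dict:
    """Split total_cost across requesters in proportion to the bytes they moved."""
    bytes_by_requester = defaultdict(int)
    for rec in records:
        # RequesterUpn is only filled in for Azure AD (OAuth) requests,
        # which is why the text recommends using Azure AD exclusively.
        who = rec.get("RequesterUpn") or "unattributed"
        moved = (rec.get("RequestBodySize") or 0) + (rec.get("ResponseBodySize") or 0)
        bytes_by_requester[who] += moved
    total_bytes = sum(bytes_by_requester.values())
    if total_bytes == 0:
        return {}
    return {who: round(total_cost * b / total_bytes, 2) for who, b in bytes_by_requester.items()}


sample = [
    {"RequesterUpn": "analyst@bu1.contoso.com", "RequestBodySize": 0, "ResponseBodySize": 600},
    {"RequesterUpn": "etl@bu2.contoso.com", "RequestBodySize": 100, "ResponseBodySize": 300},
]
print(allocate_storage_cost(sample, 100.0))  # {'analyst@bu1.contoso.com': 60.0, 'etl@bu2.contoso.com': 40.0}
```

Mapping each user principal to its business unit (and hence to its Data Consumer Subscription) is a separate lookup that depends on how your organization names and groups its identities.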


Note: Resource Groups can also be used instead of Azure Subscriptions to achieve a very similar separation between Data Producer and Data Consumer for chargeback, but in my experience it is more common in large enterprises to use Azure Subscriptions as a clear boundary of separation.

Example 2

When this pattern of separate Data Provider and Data Consumer Subscriptions is used for Azure SQL or Azure Synapse data stores, storage is somewhat coupled with compute as far as chargeback is concerned. You will find documentation stating that storage and compute are separate for Azure SQL or Synapse, because compute can be scaled up or down (or, in the case of Synapse, paused). For this chargeback conversation, however, I describe storage and compute as coupled because the compute of the SQL or Synapse data store cannot be placed in the Consumer Subscription. In this case the data store costs are higher than for plain Azure Storage, so the desire to charge back the consumer will be even stronger. The solution is again very similar to the one used for Azure Storage in the previous example: charge back either by proportion or based on the usage recorded in Audit Logs, just as with the Azure Storage logs.

  • Instructions to enable Audit Logs on Azure SQL or Azure Synapse (formerly SQL DW) are documented here: https://docs.microsoft.com/en-us/azure/azure-sql/database/auditing-overview
  • Azure Synapse Analytics (Workspace) only allows auditing to Azure Storage at the time of writing. I am sure an option to send logs to Log Analytics will come later, but if your need is more immediate, you will need to run an analytics engine like Azure Databricks to analyze the audit logs.

Azure Synapse Private Link Hubs

Azure Synapse Workspace is relatively new and I don't have experience with Azure Synapse Private Link Hubs, but they also seem to be an interesting option that can allow cross-subscription access, even across different Azure AD Tenants, so I wanted to mention them here. You can read more here: https://docs.microsoft.com/en-us/azure/synapse-analytics/security/synapse-private-link-hubs
