Firewall Protected Azure Storage and Azure SQL Database — Load Data using T-SQL BULK INSERT command

Inderjit Rana
Published in Microsoft Azure
May 28, 2020

*******************

Important Note (Jan 29, 2021): I have been informed that this method of using Managed Identity to interact with a network-protected storage account for BULK INSERT is no longer supported and can result in errors. The following official documentation explicitly calls out that this is not supported — https://docs.microsoft.com/en-us/azure/azure-sql/database/doc-changes-updates-release-notes?tabs=single-database#bulk-insert-and-backuprestore-statements-cannot-use-managed-identity-to-access-azure-storage

*******************

In this post I will share detailed instructions on how to load data from a firewall-protected Azure Storage Account into Azure SQL Database using the T-SQL BULK INSERT command. Azure Data Factory is another mechanism to perform such loads, but at times it is handy to be able to do a quick load from the command line using sqlcmd or SQL Server Management Studio.

Architecture

It’s a very common requirement, especially among customers in regulated industries, that the Azure Storage Account and Azure SQL Database be firewall protected. To be more specific, customers usually favor running these Azure services in a configuration with strict network access controls, where they explicitly whitelist (allow) connections from within customer-owned VNETs in Azure or from a specific list of IP ranges (commonly NAT IP ranges for the corporate data center). It’s important to note that I have validated the method documented here with the VNET Service Endpoint method of applying network controls, and not the relatively new Private Link (I will try to test Private Link over the next few weeks and update the post with my findings). In case you are interested in understanding the differences between those methods, please refer to my previous post Azure SQL network settings (Private Link, VNET Service Endpoint) and Azure Data Factory. The diagram below shows the architecture.

Summary of the Solution

Following is a summary of the solution:

  1. Azure Storage Account and Azure SQL Database Firewall Settings: The VNET of the VM where the sqlcmd command or SQL Server Management Studio is run needs to be whitelisted on both the Storage Account and Azure SQL DB (see the screenshots below).
Azure Storage Account Firewall Settings
Azure SQL Database Firewall Settings

Note - The same principles apply to any resource inside a VNET used to run the SQL command; it doesn’t have to be a VM, it can be a Docker container running on ACI or AKS in a VNET whitelisted on the Storage Account and SQL Database.
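The firewall settings shown in the screenshots can also be scripted. Below is a sketch using the Azure CLI; all resource names (myRG, mystorageacct, mysqlserver, myVnet, mySubnet) are placeholders for illustration, and the subnet is assumed to already have the Microsoft.Storage and Microsoft.Sql service endpoints enabled.

```shell
# Allow the subnet through the Storage Account firewall
# (resource names are hypothetical -- substitute your own)
az storage account network-rule add \
    --resource-group myRG \
    --account-name mystorageacct \
    --vnet-name myVnet \
    --subnet mySubnet

# Allow the same subnet through the Azure SQL logical server firewall
az sql server vnet-rule create \
    --resource-group myRG \
    --server mysqlserver \
    --name allow-myVnet \
    --vnet-name myVnet \
    --subnet mySubnet
```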

2. Enable Managed Identity on the Azure SQL Database: This is the key element and an often-missed step, because currently this can only be done from the command line (and not from the Azure Portal). The following link is for Synapse, but Steps 1 & 2 need to be followed to enable Managed Identity on Azure SQL DB — https://docs.microsoft.com/en-us/azure/azure-sql/database/vnet-service-endpoint-rule-overview#impact-of-using-vnet-service-endpoints-with-azure-storage [Updated July 21, 2020 — Fixed the out of date link]
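Since this step cannot be done from the portal, here is a sketch using the Az PowerShell module, in line with the linked documentation; the resource group and server names are placeholders. The returned object carries the identity's principal ID, which is needed for the role assignment in the next step.

```powershell
# Assign a system-assigned Managed Identity to the logical SQL server
# (resource names are hypothetical -- substitute your own)
$server = Set-AzSqlServer -ResourceGroupName "myRG" `
    -ServerName "mysqlserver" -AssignIdentity

# Principal ID of the Managed Identity, used in step 3
$server.Identity.PrincipalId
```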

3. Grant Azure SQL Database appropriate permissions on the Azure Storage Account: The Azure SQL Database Managed Identity needs to be granted the Storage Blob Data Contributor role on the Azure Storage Account. Please refer to the instructions here — https://docs.microsoft.com/en-us/azure/storage/common/storage-auth-aad-rbac-portal?toc=/azure/storage/blobs/toc.json#assign-rbac-roles-using-the-azure-portal
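The role assignment can also be done with the Azure CLI instead of the portal. A sketch, where the principal ID, subscription ID, and resource names are placeholders you must substitute:

```shell
# Grant the SQL server's Managed Identity the Storage Blob Data Contributor
# role, scoped to the storage account (all IDs/names below are placeholders)
az role assignment create \
    --assignee "<principal-id-of-managed-identity>" \
    --role "Storage Blob Data Contributor" \
    --scope "/subscriptions/<sub-id>/resourceGroups/myRG/providers/Microsoft.Storage/storageAccounts/mystorageacct"
```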

4. Load the data file to Azure Storage Account and perform the load: Follow the Bulk Insert instructions from the following document (Examples section with title F. Importing data from a file in Azure blob storage) — https://docs.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql?view=sql-server-ver15#examples

Test Scripts

I am sharing test scripts and data in case you would like to perform a test run in your own environment. Please ensure the above setup is in place before doing a test run.

  1. Create a CSV file and load it to the Azure Storage Account (please be mindful of end-of-line characters when you copy/paste; I used Notepad++ on a Windows machine, so CRLF is the end of line)
10165,Adak Island,AK,Adak
10299,Anchorage,AK,Ted Stevens Anchorage Internationa
10304,Aniak,AK,Aniak Airport
10754,Barrow,AK,Wiley Post/Will Rogers Memorial
10551,Bethel,AK,Bethel Airport
10926,Cordova,AK,Merle K Mudhole Smith
14709,Deadhorse,AK,Deadhorse Airport
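If you prefer to generate the file from a shell rather than copy/paste, the rows above can be written with explicit CRLF line endings like this (the container and storage account names in the commented upload command are placeholders):

```shell
# Write the sample rows with explicit CRLF (\r\n) line endings,
# matching what BULK INSERT's default ROWTERMINATOR expects
printf '%s\r\n' \
    '10165,Adak Island,AK,Adak' \
    '10299,Anchorage,AK,Ted Stevens Anchorage Internationa' \
    '10304,Aniak,AK,Aniak Airport' \
    '10754,Barrow,AK,Wiley Post/Will Rogers Memorial' \
    '10551,Bethel,AK,Bethel Airport' \
    '10926,Cordova,AK,Merle K Mudhole Smith' \
    '14709,Deadhorse,AK,Deadhorse Airport' \
    > airports_noheader.csv

# Then upload it to the container used by the external data source, e.g.:
# az storage blob upload --account-name mystorageacct \
#     --container-name mycontainer --name airports_noheader.csv \
#     --file airports_noheader.csv --auth-mode login
```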

2. Create Table in Azure SQL Database

CREATE TABLE [dbo].[airports](
    [airport_id] [int] NULL,
    [city] [varchar](50) NULL,
    [state] [varchar](50) NULL,
    [name] [varchar](50) NULL
) ON [PRIMARY]

3. Run the following commands, replacing the appropriate values for your environment, using sqlcmd on a command line or SQL Server Management Studio

CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<<YourStrongPassword1>>';
GO

CREATE DATABASE SCOPED CREDENTIAL msi_cred WITH IDENTITY = 'Managed Identity';
GO

CREATE EXTERNAL DATA SOURCE MyAzureBlobStorage2 WITH ( TYPE = BLOB_STORAGE, LOCATION = 'https://<<stgacct>>.blob.core.windows.net/<<stgacctcontainername>>' , CREDENTIAL= msi_cred);
GO

BULK INSERT dbo.airports FROM 'airports_noheader.csv' WITH (DATA_SOURCE = 'MyAzureBlobStorage2', FIELDTERMINATOR=',')
GO
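To sanity-check the load from the command line, a quick row count via sqlcmd works; the server, database, and login below are placeholders for your own environment:

```shell
# Verify the BULK INSERT loaded the 7 sample rows
# (server/database/login values are hypothetical -- substitute your own)
sqlcmd -S myserver.database.windows.net -d mydb -U myuser -P 'myPassword' \
    -Q "SELECT COUNT(*) AS row_count FROM dbo.airports;"
```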
