Query Azure Data Lake via Synapse Serverless Security Credentials Setup

Dian Germishuizen
5 min readMay 21, 2022

--

Synapse SQL Serverless Logo

Overview

Azure Synapse Analytics Serverless SQL Endpoint has the capability to query files that are stored in an Azure Data Lake using T-SQL code as if they were regular tables in a relational database.

These files can be semi-structured or unstructured in nature.

Using the Create External Table As Select (CETAS) functionality, you can even generate new data in the Data Lake using T-SQL.

Synapse Serverless Data Lake Access via External Tables Illustration

The end user experience is very similar to simply querying a normal relational database such as SQL Server.

However, the architecture in the backend is quite different. As such, there are a few considerations to take into account when going down the route of querying files in a Data Lake via SQL Serverless.

This article will focus on the initial setup required to grant the Serverless Database permission to query the Data Lake.

Using Azure Active Directory Authentication

When you connect to the SQL Database using your Azure Active Directory account, you still need to be authorized on the Data Lake itself to access the files.

Currently, there are five methods to do this — Azure AD Identity, Shared Access Signature, Service Principle, Managed Identity, Public Access.

  • Public Access and Service Principle will not be covered in this article.

Note, the below examples assume you are querying parquet files.

Authorize Using AD Passthrough Permissions

Using AD Passthrough means your AD Account has directly been assigned the Storage Blob Data Contributor role on the Data Lake. Thus, you can instruct SQL Server to passthrough those credentials directly to the Data Lake and utilize those permissions to read the files.

Using Full File Path — NO DATASOURCE

Applies To: (1) Ad-Hoc Query with OPENROWSET using full file path , (2) External Tables

  1. Assign Storage Blob Data Contributor RBAC permissions to the AD User on the storage account via the Azure Portal.
  2. Reference full file path in OPENROWSET, don’t specify a DATASOURCE.

Using DATASOURCE

Applies To: (1) Ad-Hoc Query with OPENROWSET using a DATASOURCE, (2) External Tables

  1. Assign Storage Blob Data Contributor RBAC permissions to the AD User on the storage account via the Azure Portal.
  2. Create a DATASOURCE object referencing the path in data lake without specifying a credential. AD User’s permissions on the data lake are inherited.
  3. Reference the DATASOURCE in OPENROWSET, only specify the relative file path from the parent directory listed in the DATASOURCE in the query.

Authorize Using Synapse Workspace Managed Identity Permissions

The Azure Synapse Workspace has a Managed Identity AD Account assigned to it at creation time. You can use this account for access delegation / impersonation. This avoids granting highly privileged permissions directly to users.

Using Full File Path — NO DATASOURCE

Applies To: (1) AdHoc Query with OPENROWSET using full file path

  1. Create Server Scoped Credential that uses the workspace’s managed identity to provide access. Notice the URL of the storage account is used as the name of the credential.
  2. Assign Storage Blob Data Contributor RBAC permissions to the Workspace Managed Identity via the Azure Portal.
  3. Grant REFERENCES permissions on the Credential to any AD user that needs to use it.
  4. No need to reference the credential in OPENROWSET to gain access to files using a full file path. If the URL in the credential name matches the URL being queried, the credential will be used.

Using DATASOURCE

Applies To: (1) Ad-Hoc Query with OPENROWSET using DATASOURCE , (2) External Tables

  1. Create Database Scoped Credential that uses the workspace’s managed identity to provide access.
  2. Assign Storage Blob Data Contributor RBAC permissions to the Workspace Managed Identity via the Azure Portal.
  3. Grant REFERENCES permissions on the credential to any AD user that needs to use it.
  4. Create a DATASOURCE that uses that Database Scoped Credential for Authorization as well as Authentication. Reference a parent container and directory in the lake.
  5. Use that DATASOURCE in OPENROWSET to access files, only specifying the relative path to the directory references in the DATASOURCE.

Authorize Using SAS Key Permissions

An SAS Key is an Authentication and Authorization delegation mechanism you can utilize on Data Lakes and Storage Accounts to minimize direct access granted to users.

Using Full File Path — NO DATASOURCE

Applies To: (1) Ad-Hoc Query with OPENROWSET using full file path

  1. Create a Server Scoped Credential that uses an SAS Key generated in the storage account to provide authorization as well as authentication. Notice that the full URL to the storage account is embedded in the name of the credential. This is crucial since no data source will be used.
  2. Grant REFERENCES permissions on the Credential to any AD user that needs to use it.
  3. No need to reference the credential in OPENROWSET to gain access to files using a full file path. If the URL in the credential name matches the URL being queried, the credential will be used.

Using DATASOURCE

Applies To: (1) Ad-Hoc Query with OPENROWSET using DATASOURCE , (2) External Tables

  1. Create a Database Scoped Credential that uses an SAS Key generated in the storage account to provide authorization as well as authentication. Notice the URL to the storage account is not yet referenced in the credential. That comes later when defining the Data Source.
  2. GRANT REFERENCES permissions on the credential to any AD user that needs to use it.
  3. Create a DATASOURCE that uses the credential for access to the data lake. Reference a parent container and directory in the lake.
  4. Use that DATASOURCE in OPENROWSET to access files, only specifying the relative path to the directory references in the DATASOURCE.

Using SQL Server Authentication

When you authenticate to the Serverless SQL Server Database using a SQL Authentication user, the instructions above are all valid for the most part.

The differences are

  1. You cannot use Azure Active Directory Passthrough as the Authentication and Authorization method to the Data Lake, since you are connecting via SQL Authentication. You must use a Credential tied to either the Synapse Workspace Managed Identity, or a SAS Token.
  2. When granting the REFERENCES permissions on the CREDENTIALS, you assign it to as SQL Authentication user instead of an Azure Directory user. However, the syntax is exactly the same.

Tags

#analytics, #data, #data-lake, #sqlserver, #synapse-analytics-serverless

Thank you for reading my ramblings, if you want to you can buy me a coffee here: Support Dian Germishuizen on Ko-fi! ❤️

My Socials

My Blog: diangermishuizen.com

Linked In: Dian Germishuizen | LinkedIn

Twitter: Dian Germishuizen (@D_Germishuizen) / Twitter

Credly: Dian Germishuizen — Badges — Credly

--

--

Dian Germishuizen

I have been working in the Technology Industry as a Data Engineer since 2016. I have a passion for learning new things and sharing that knowledge with others.