Creating a Mount Point with SAS Token for ADLS Gen2 in Databricks

Rahul Gosavi
4 min read · Jul 5, 2024


Azure Databricks provides a powerful platform for big data analytics and machine learning. A common requirement is securely accessing Azure Data Lake Storage (ADLS) Gen2 from Databricks clusters using a SAS (Shared Access Signature) token. This post walks through setting up a mount point in Databricks for ADLS Gen2 using a SAS token stored in Azure Key Vault.

Prerequisites

Before we begin, make sure you have the following set up:

  1. Azure Databricks Workspace: Access to an Azure Databricks workspace with the necessary permissions to create secrets and mount points.
  2. Azure Data Lake Storage (ADLS) Gen2: An ADLS Gen2 storage account created and configured with a container where data resides.
  3. Azure Key Vault: A Key Vault in Azure to securely store and manage the SAS token.

Step 1: Creating a Secret Scope in Databricks

Secret scopes in Databricks are used to store secrets securely, such as SAS tokens or other credentials.

Navigate to Databricks Workspace:

  • Go to your Azure Databricks workspace.
  • In your browser's address bar, open https://<databricks-instance>#secrets/createScope, replacing <databricks-instance> with your Databricks workspace URL.
  • From the Azure portal, copy your Key Vault's Resource ID and Vault URI (DNS Name); you will need them on the scope-creation page.

Create a New Secret Scope:

  • Provide a name for your scope (e.g., Azure_KeyVault) and paste the DNS Name and Resource ID you copied from the Azure portal.
  • Click Create to create the secret scope.
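
Once the scope is created, you can confirm the workspace can see it. Here is a minimal check from a notebook; Azure_KeyVault is the example scope name used throughout this post.

# List all secret scopes visible to this workspace
for scope in dbutils.secrets.listScopes():
    print(scope.name)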

Step 2: Integrating Azure Key Vault with Databricks

Integrating Azure Key Vault allows Databricks to securely access secrets such as SAS tokens.

Grant Permissions to Databricks:

  • In the Azure portal, navigate to your Key Vault.
  • Go to Access Control (IAM) under Settings.
  • Click on Add role assignment.
  • Select a role (e.g., Key Vault Secrets User or Key Vault Contributor depending on your organization’s policies).
  • Under Select, search for and select your Databricks service principal; its name typically starts with AzureDatabricks.
  • Click Review & Assign to grant permissions.

Step 3: Mounting ADLS Gen2 in Databricks Using a SAS Token

Now that we have set up the environment and integrated Key Vault, let’s proceed with mounting ADLS Gen2 using a SAS token.

Generate a SAS Token:

  • In the Azure portal, navigate to your ADLS Gen2 account.
  • Go to Shared access signature under Security + networking.
  • Configure the necessary permissions and expiry for your SAS token.
  • Copy the generated SAS token.
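
If you prefer to generate the token programmatically rather than through the portal, the azure-storage-blob SDK can produce an account-level SAS. The sketch below is optional and uses placeholder values for the account name and key; keep the real key out of source control.

from datetime import datetime, timedelta
from azure.storage.blob import generate_account_sas, ResourceTypes, AccountSasPermissions

# Placeholders: substitute your real storage account name and key
sas_token = generate_account_sas(
    account_name='<your-storage-account-name>',
    account_key='<your-storage-account-key>',
    resource_types=ResourceTypes(service=True, container=True, object=True),
    permission=AccountSasPermissions(read=True, write=True, list=True),
    expiry=datetime.utcnow() + timedelta(days=30),
)
print(sas_token)  # store this value in Key Vault in the next step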

Store SAS Token in Azure Key Vault:

  • In the Azure portal, navigate to your Key Vault.
  • Go to Secrets under Settings.
  • Click Generate/Import to create a new secret.
  • Enter a name for the secret (e.g., sastokenforstorage1) and paste the SAS token value in the Value field.
  • Set any necessary properties (e.g., expiry date, activation date).
  • Click Create to store the SAS token securely in Key Vault.
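
To confirm that Databricks can read the secret through the Key Vault-backed scope, fetch it in a notebook. Databricks redacts secret values in notebook output, so printing the token shows [REDACTED] rather than the value itself:

# Fetch the SAS token; notebook output redacts the actual value
token = dbutils.secrets.get(scope='Azure_KeyVault', key='sastokenforstorage1')
print(token)            # prints [REDACTED]
print(len(token) > 0)   # confirms a non-empty value was returned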

Write PySpark Code:

  • Open a new or existing notebook in your Databricks workspace and add one of the two variants below.
# Replace placeholders with actual values
storageContainer = 'landing'
storageAccount = '<your-storage-account-name>'
landingMountPoint = '/mnt/landing'
databricksScopeName = 'Azure_KeyVault'
StorageSAStoken = 'sastokenforstorage1'  # name of the secret in Key Vault

# Mount the ADLS Gen2 container to DBFS, retrieving the SAS token
# from Azure Key Vault via the Databricks secret scope inline
dbutils.fs.mount(
    source='wasbs://{}@{}.blob.core.windows.net'.format(storageContainer, storageAccount),
    mount_point=landingMountPoint,
    extra_configs={
        'fs.azure.sas.{}.{}.blob.core.windows.net'.format(storageContainer, storageAccount):
            dbutils.secrets.get(scope=databricksScopeName, key=StorageSAStoken)
    }
)
print('Mounted the storage account successfully')

Or, equivalently, retrieve the token into a variable first:

# Replace placeholders with actual values
storageContainer = 'landing'
storageAccount = '<your-storage-account-name>'
landingMountPoint = '/mnt/landing'
databricksScopeName = 'Azure_KeyVault'
StorageSAStoken = 'sastokenforstorage1'  # name of the secret in Key Vault

# Retrieve the SAS token from Azure Key Vault via the Databricks secret scope
sas_token = dbutils.secrets.get(scope=databricksScopeName, key=StorageSAStoken)

# Mount the ADLS Gen2 container to DBFS
dbutils.fs.mount(
    source='wasbs://{}@{}.blob.core.windows.net'.format(storageContainer, storageAccount),
    mount_point=landingMountPoint,
    extra_configs={
        'fs.azure.sas.{}.{}.blob.core.windows.net'.format(storageContainer, storageAccount): sas_token
    }
)
print('Mounted the storage account successfully')
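
Note that dbutils.fs.mount raises an exception if the mount point already exists. A small, optional guard (assuming the same variable names as above) makes the cell safe to re-run:

# Unmount first if the mount point already exists, so the cell can be re-run
if any(m.mountPoint == landingMountPoint for m in dbutils.fs.mounts()):
    dbutils.fs.unmount(landingMountPoint)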

Explanation:

  • storageAccount: Your ADLS Gen2 storage account name.
  • storageContainer: Name of the container in ADLS Gen2 where your data resides.
  • databricksScopeName: Name of the Key Vault-backed Databricks secret scope (Azure_KeyVault in this example).
  • StorageSAStoken: Name of the secret in Azure Key Vault where the SAS token is stored (sastokenforstorage1 in this example).
  • landingMountPoint: Desired mount point path in Databricks (/mnt/landing in this example).
  • extra_configs: Configuration dictionary that supplies the SAS token to the mount.

Run the PySpark Code:

  • Execute the code cell in your Databricks notebook.
  • Verify that the mount point (/mnt/landing) is created and accessible.
  • You can now read the files and directories under the mount point, as shown below.
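
For a quick verification, list the mounted directory and read a file through the mount. The CSV path below is a hypothetical example; adjust it to your data.

# List the contents of the mount point
display(dbutils.fs.ls('/mnt/landing'))

# Read a sample file through the mount (example path)
df = spark.read.option('header', 'true').csv('/mnt/landing/sample.csv')
df.show(5)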

Conclusion

In this blog post, I've covered the step-by-step process of setting up a mount point in Azure Databricks to access ADLS Gen2 securely using a SAS token stored in Azure Key Vault. By keeping sensitive information like SAS tokens in Key Vault and integrating it securely with Databricks, you ensure data access is managed according to best practices and security standards. This setup lets you use Azure Databricks seamlessly with Azure Data Lake Storage for data processing and analytics.
