Accessing Azure Data Lake Storage Gen2 using a secret scope in Databricks
Introduction
Data lakes hold massive amounts of data and serve Big Data workloads well. Databricks, on the other hand, is an Apache Spark-based unified data analytics platform — making big data simple.
Let’s see how we can connect to raw data dumped into a Data Lake using a Databricks secret scope. The file path is given below. This file will be accessed from the Databricks workspace, transformed, and saved back to Azure Data Lake Storage.
https://<storageaccount>.dfs.core.windows.net/<container>/<folder>/HouseData.csv
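Note that the path above is the HTTPS (dfs) endpoint, while Spark reads the same file through the abfss:// scheme used later in this article. As a rough sketch (the helper name `to_abfss_uri` and the sample account/container names are my own, not from Azure), the translation between the two forms looks like this:

```python
def to_abfss_uri(account: str, container: str, path: str) -> str:
    """Build the abfss:// URI Spark expects for an ADLS Gen2 file.

    `account` is the storage account name, `container` the filesystem
    (container) name, and `path` the file path inside the container.
    """
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"

# Hypothetical values standing in for the placeholders above:
uri = to_abfss_uri("mystorageaccount", "mycontainer", "myfolder/HouseData.csv")
# → "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/myfolder/HouseData.csv"
```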
What is a secret and a secret scope? A secret is a key-value pair that stores secret material, and a secret scope is a collection of secrets identified by a name. There are two types of secret scope: Azure Key Vault-backed and Databricks-backed.
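Conceptually, a secret scope behaves like a named collection of key-value pairs. The sketch below is only an analogy — the `scopes` dictionary and `get_secret` function are made up, and real secrets live on the Databricks/Key Vault side, never in plain Python — but it mirrors the shape of the `dbutils.secrets.get(scope, key)` lookup used later:

```python
# Toy model: each scope name maps to its own set of secrets (key-value pairs).
scopes = {
    "my-scope": {
        "storage-account-key": "<redacted-access-key>",
    }
}

def get_secret(scope: str, key: str) -> str:
    """Look up a secret by scope name and key, like dbutils.secrets.get."""
    return scopes[scope][key]

get_secret("my-scope", "storage-account-key")
```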
Here we look specifically at the Azure Key Vault-backed scope — the later steps supply the Key Vault’s DNS name and Resource ID. Before creating the secret scope, we must create a secret in Azure Key Vault.
Step 1.
We use the access keys of the storage account to authenticate applications making requests to this Azure storage account, so one of them must be copied.
Go to the Azure Storage account and, in the left pane under Settings, find “Access keys”. Once you click it, you will see two auto-generated keys; copy one of them. It will be used when creating the secret in Azure Key Vault.
Step 2.
Go to Azure Key Vault and, in the resource menu, click Secrets under the Settings category. Then click the + (Generate/Import) button in the command bar; you will be prompted with a creation form.
Give a unique name for Name and paste the access key copied in Step 1 into the Value field. Leave the other fields at their defaults and click Create. You are done creating the secret for accessing the storage account.
Step 3.
Navigate to Properties under the resource menu of the Key Vault, then copy the DNS Name and Resource ID and save them in a notepad. They will be used while creating the secret scope.
Step 4.
Go to https://<your_azure_databricks_url>#secrets/createScope
Ex- https://southeastasia.azuredatabricks.net#secrets/createScope
You will be directed to the scope-creation page.
Give a scope name that uniquely identifies it in the database maintained by Databricks, leave Manage Principal as Creator, and paste the DNS Name and Resource ID values copied in Step 3 into the DNS Name and Resource ID fields. Click Create and you will see a success message. Remember the scope name or save it in a file.
Step 5.
Configure the connection string in a Python notebook in the Databricks workspace.
spark.conf.set("fs.azure.account.key.<storage_account>.dfs.core.windows.net", dbutils.secrets.get(scope = "<scope_name>", key = "<secret_name>"))
Replace storage_account, scope_name, and secret_name with the values we created (the secret name comes from Step 2), then execute it.
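The first argument to spark.conf.set is just a string keyed by your storage account name. If you connect to more than one account, a small helper keeps the pattern in one place — a minimal sketch, assuming a hypothetical function name `account_key_conf` and a made-up account name:

```python
def account_key_conf(storage_account: str) -> str:
    """Return the Spark config key holding an ADLS Gen2 account's access key."""
    return f"fs.azure.account.key.{storage_account}.dfs.core.windows.net"

# Inside a Databricks notebook this would be used as (not runnable locally):
# spark.conf.set(account_key_conf("mystorageaccount"),
#                dbutils.secrets.get(scope="<scope_name>", key="<secret_name>"))
conf_key = account_key_conf("mystorageaccount")
# → "fs.azure.account.key.mystorageaccount.dfs.core.windows.net"
```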
Done. Congrats! You have successfully connected to Azure Data Lake Storage Gen2.
Now, we are going to create a DataFrame using the spark object, apply a simple transformation, and save it back (in a different file format) to Azure Data Lake, which can store any type of file.
df = spark.read.csv("abfss://<containername>@<storage_account>.dfs.core.windows.net/path/to/file", header=True)
df1 = df.limit(10)
df1.write.format('parquet').save("abfss://<containername>@<storage_account>.dfs.core.windows.net/output")
Once it executes successfully, you can see in Azure Storage a new folder ‘output’ created under the container you specified.
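The same read–limit–write shape can be tried locally without a cluster. The sketch below uses Python’s standard csv module and a temporary directory purely to illustrate the transformation (a sample file standing in for HouseData.csv, local paths instead of abfss URIs) — it is not the Databricks API:

```python
import csv
import os
import tempfile

# Create a small sample CSV standing in for HouseData.csv.
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "HouseData.csv")
with open(src, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "price"])                    # header row
    writer.writerows([[i, 100 + i] for i in range(25)])  # 25 data rows

# Read with a header and keep only the first 10 rows, like df.limit(10).
with open(src, newline="") as f:
    rows = list(csv.DictReader(f))
first_ten = rows[:10]

# Write the result to an "output" folder, mirroring the write...save step.
out_dir = os.path.join(tmp, "output")
os.makedirs(out_dir, exist_ok=True)
with open(os.path.join(out_dir, "part-0.csv"), "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "price"])
    writer.writeheader()
    writer.writerows(first_ten)
```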
Conclusion
This section covered a basic direct-access connection to a storage account using a secret scope, which lets us hide authentication details from other users. Creating a secret in Azure Key Vault and creating a secret scope in Databricks are the major steps. We then accessed a file stored in the Data Lake using the Spark connection configuration in Databricks and performed a simple ETL job as well.
That’s it! If you run into any difficulties or need any clarification, feel free to leave a comment. I will try my best to help, or other Medium users might be able to help out!