Using the Secrets API to Securely Mount Azure Blob Storage on Databricks File Storage

Raghav Matta · Published in The Startup
6 min read · May 17, 2020

In this tutorial, you will learn how to use the Databricks CLI Secrets API to achieve the following objectives:

  • Create an Azure Storage Account using Azure Portal
  • Install and configure Databricks CLI - Secrets API
  • Mount Blob storage on your Azure Databricks File Storage

Requirements

  • A Microsoft Azure subscription (pay-as-you-go, MSDN, or trial)
  • An existing Databricks workspace with a cluster; steps to create one can be found here
  • Python 2.7.9 or above if you’re using Python 2, or Python 3.6 or above if you’re using Python 3

What are the advantages of using Secrets API?

Databricks provides your team with a collaborative environment using notebooks, which has many advantages. One challenge that comes with this, however, is that you end up passing all your access secrets (resource keys, service principal passwords, server addresses, etc.) between your team members.

The Secrets API is a REST API 2.0 exposed through the Databricks CLI, and it helps you:

  • Store all your secrets under a single repository
  • Access these secrets over an encrypted REST API 2.0 channel
  • Avoid entering or displaying secrets in your Databricks notebooks
  • Prevent secrets from being viewed in plain text once stored through the CLI
  • Manage your secrets by category under a given scope
  • Limit the scope of the secrets to a given Databricks workspace
  • Version-control your notebooks in a Git repo without pushing your secrets
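
For example, once a scope and a secret exist (you will create both in Exercise 2), any notebook in the workspace can fetch the key without ever exposing it. A minimal sketch, using the scope and key names from this tutorial:

# Fetch the storage key from the secret scope instead of hard-coding it in the notebook
storage_key = dbutils.secrets.get(scope = "secrets-demo-storage-keys", key = "mattadb-accesskey")
# Secret values are redacted in notebook output, so even printing shows [REDACTED]
print(storage_key)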

Exercise 1: Create an Azure Storage Account using Azure Portal

  • Navigate to https://portal.azure.com/ and sign in using your Microsoft account credentials with an Azure subscription
  • From your Azure dashboard, click on + Create a resource, then scroll down to Storage and select Storage account - blob, file, table, and queue
  • Under the Basics tab on the Create storage account page, enter the following details:
Subscription: Select your Azure subscription
Resource Group (Create new): adb-secrets-rg
Storage Account Name: <YourName>adbmount
Location: East US
Performance: Standard
Account Kind: StorageV2
Replication: Read-access geo-redundant storage (RA-GRS)
Access tier (default): Hot
  • Click on Review + Create, and then click on Create. It will take less than 5 minutes for your deployment to complete
  • After your deployment succeeds, click on Go to resource; this will open your Azure Storage Account
  • On your Storage Account blade, from the left-hand menu, click on Access Keys under Settings
  • Copy and temporarily save the Storage Account Name and Storage Account Key1 in Notepad on your PC
  • Now we will create a container and upload a sample file. Click on the Overview tab, and then click on Containers
  • In the Containers tab, click on + Container, enter a Name for the container (e.g., sample-container), make sure the Public access level is set to Private, and click on Create
  • Open sample-container, and click on + Upload to upload a sample file which you will be accessing through a Databricks notebook
  • The sample data used in this tutorial can be downloaded from the link below:
https://data.cityofchicago.org/api/views/3i3m-jwuy/rows.csv?accessType=DOWNLOAD
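
If you prefer scripting over the portal, the same resources can be created with the Azure CLI. This is only a sketch, assuming the Azure CLI is installed and you are signed in via az login; it reuses the names from the steps above:

# Create the resource group, storage account, and private container
az group create --name adb-secrets-rg --location eastus
az storage account create --name <yourname>adbmount --resource-group adb-secrets-rg --location eastus --sku Standard_RAGRS --kind StorageV2 --access-tier Hot
az storage container create --name sample-container --account-name <yourname>adbmount --account-key <Key1> --public-access off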

Exercise 2: Install and configure Databricks CLI - Secrets API

  • From your command-line interpreter with Python installed, run the following pip command to install the Databricks CLI:
pip install databricks-cli
  • You can confirm that everything is working by running the following command:
databricks --version
  • Now, we will have to set up authentication to the Databricks workspace from the CLI. Navigate to your Databricks Workspace, and from the top right-hand side click on User Settings
  • Under User Settings, click on Generate New Token
  • Provide a Comment (optional) and set a Lifetime of the token in days and click on Generate
  • Make sure you copy the token and click on Done. Verify now that your token is visible under the Access Tokens tab in the User Settings
  • Navigate back to your command-line interpreter and type in the following command:
databricks configure --token
  • Enter your Workspace URL and Token to authenticate your CLI; the prompts look like the sketch below:
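(The host URL below is a placeholder; use your own workspace URL.)
Databricks Host (should begin with https://): https://<your-workspace-url>
Token: <paste-your-token>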
  • You can confirm that you have successfully authenticated to your workspace and view all the notebooks created within your workspace by running the following command:
databricks workspace ls
  • Create a new Scope (‘secrets-demo-storage-keys’) by running the following command (--initial-manage-principal users grants all workspace users MANAGE permission, which is required on standard-tier workspaces):
databricks secrets create-scope --scope secrets-demo-storage-keys --initial-manage-principal users
  • Add a new Secret (‘mattadb-accesskey’) to the existing scope by running the following command:
databricks secrets put --scope secrets-demo-storage-keys --key mattadb-accesskey
  • In the pop-up text editor, paste the Storage Account Key1 you had copied in the previous exercise and Save
  • You can confirm that you have successfully created Scope and added a Secret by running the following command:
databricks secrets list --scope secrets-demo-storage-keys
  • If you want to get some help as to what command parameters you can use in the CLI, you can run the following command:
databricks <resource> -h
  • For example, if you wanted to know what commands you could use when working with the Secrets API, you’d run the following command:
databricks secrets -h

Exercise 3: Mount Blob storage on your Azure Databricks File Storage

  • Navigate to your Databricks Workspace homepage, and click on New Notebook. Enter a Name for your Notebook, select Python as the Language, and click on Create
  • In your Notebook, copy and paste the below command:
dbutils.fs.mount(
  source = "wasbs://<ContainerName>@<StorageAccountName>.blob.core.windows.net",
  mount_point = "/mnt/<MountName>/",
  extra_configs = {"fs.azure.account.key.<StorageAccountName>.blob.core.windows.net": dbutils.secrets.get(scope = "<ScopeName>", key = "<SecretName>")})
  • Make sure you replace <StorageAccountName>, <ContainerName>, <ScopeName>, and <SecretName> with the values created in Exercise 1 and Exercise 2 respectively. Specify a name for the Mount Point, and press Ctrl + Enter to run the cell. For example:
dbutils.fs.mount(
  source = "wasbs://sample-container@mattaadbmount.blob.core.windows.net",
  mount_point = "/mnt/sample-container/",
  extra_configs = {"fs.azure.account.key.mattaadbmount.blob.core.windows.net": dbutils.secrets.get(scope = "secrets-demo-storage-keys", key = "mattadb-accesskey")})
  • You can confirm that you have successfully mounted your container and list its contents by running the following command:
%fs
ls '/mnt/sample-container/'
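
If you ever need to remount the container (for example, after rotating the storage key), unmount the mount point first. A one-line sketch using the mount point created above:

dbutils.fs.unmount('/mnt/sample-container/')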
  • You can read the data uploaded earlier to your Blob storage and create a temporary view by running the following commands (adjust the path to match where you uploaded the sample file):
crime_records = spark.read.option('header','True').option('delimiter',',').csv('/mnt/sample-container/crime_data/')
crime_records.createOrReplaceTempView('crime_records')
  • Try running some sample queries and start analyzing your data:
%sql
SELECT COUNT(*) as Count,`Primary Type`,Arrest FROM crime_records
GROUP BY 2,3
ORDER BY 1 DESC
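
If you prefer to stay in Python, the same aggregation can be written in PySpark; a minimal sketch against the DataFrame loaded above:

from pyspark.sql import functions as F

# Count crimes by type and arrest status, most frequent first
(crime_records
  .groupBy('Primary Type', 'Arrest')
  .agg(F.count('*').alias('Count'))
  .orderBy(F.desc('Count'))
  .show())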
  • Next, you can start using this Mount Point with RWX permissions. To set up more granular access at the file/folder level, you must have a Databricks Premium workspace and set up ACLs, which will be covered in the next tutorial

References:

  1. https://docs.databricks.com/dev-tools/api/latest/secrets.html#secretsecretserviceputacl
  2. https://data.cityofchicago.org/Public-Safety/Crimes-2018/3i3m-jwuy
  3. https://medium.com/@willvelida/installing-configuring-and-using-the-azure-databricks-cli-5d0381e662a1
