Using Secrets API to Mount Azure Blob securely on Databricks File Storage

Raghav Matta
May 17

In this tutorial, you will learn how to use the Databricks CLI Secrets API to achieve the following objectives:

  • Create an Azure Storage Account using Azure Portal
  • Install and configure Databricks CLI - Secrets API
  • Mount Blob storage on your Azure Databricks File Storage

Requirements

  • A Microsoft Azure subscription (pay-as-you-go, MSDN, or trial)
  • An existing Databricks Workspace with a cluster; steps to create one can be found here
  • Python 2.7.9 or above if you’re using Python 2, or Python 3.6 or above if you’re using Python 3

What are the advantages of using Secrets API?

Databricks provides your team with a collaborative environment built around Notebooks. This has many advantages, but one challenge that comes with it is that you end up passing all your access secrets (resource keys, service principal passwords, server addresses, etc.) between your team members.

The Secrets API is a REST API 2.0 exposed through the Databricks CLI, and it helps you with the following:

  • Store all your secrets under a single repository
  • Access these secrets over an encrypted REST API 2.0 channel
  • Avoid entering or displaying secrets in your Databricks Notebook (see the sketch after this list)
  • Secrets cannot be viewed again once stored through the CLI
  • Manage your secrets category-wise under a given scope
  • Limit the scope of the secrets to a given Databricks Workspace
  • Version-control your Notebooks on a Git repo without pushing your secrets
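
As a quick illustration of how secrets stay hidden, here is a minimal sketch (assuming the scope and secret names you will create in Exercise 2) of how a Notebook retrieves a secret without ever displaying it:

# Fetch a secret by scope and key; the value never appears in the Notebook
storage_key = dbutils.secrets.get(scope = "secrets-demo-storage-keys", key = "mattadb-accesskey")
print(storage_key)  # prints [REDACTED] rather than the actual key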

Exercise 1: Create an Azure Storage Account using Azure Portal

  • Navigate to https://portal.azure.com/ and sign in using the Microsoft account credentials associated with your Azure subscription
  • From your Azure dashboard, click on + Create a resource, then scroll down, click on Storage, and select Storage account - blob, file, table, queue
  • Under the Basics tab on the Create storage account page, enter the details below:
Subscription: Select your Azure subscription
Resource group (create new): adb-secrets-rg
Storage account name: <YourName>adbmount
Location: East US
Performance: Standard
Account kind: StorageV2
Replication: Read-access geo-redundant storage (RA-GRS)
Access tier (default): Hot
  • Click on Review + create, and then click on Create. It will take less than 5 minutes for your deployment to complete
  • After your deployment succeeds, click on Go to resource; this will open your Azure Storage Account
  • On your Storage Account blade, from the left-hand menu, click on Access keys under Settings
  • Copy the Storage Account Name and Key1 and temporarily save them in Notepad on your PC
  • Now we will create a container and upload a sample file. Click on the Overview tab, and then click on Containers
  • In the Containers tab, click on + Container, enter a Name for the container (sample-container), make sure the Public access level is set to Private, and click on Create
  • Open sample-container and click on + Upload to upload a sample file, which you will later access through a Databricks Notebook (a scripted alternative is sketched below)
  • The sample data used in this tutorial can be exported from the following link:
https://data.cityofchicago.org/api/views/3i3m-jwuy/rows.csv?accessType=DOWNLOAD
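
If you prefer scripting over portal clicks, a minimal sketch using the azure-storage-blob Python package (an assumption; install it with pip install azure-storage-blob) could download the sample data and upload it into the private container created above. The blob name under crime_data/ is hypothetical, chosen to match the folder the Notebook reads in Exercise 3:

# Hypothetical scripted alternative to the portal upload
import requests
from azure.storage.blob import BlobServiceClient

account_name = "<StorageAccountName>"  # from this exercise
account_key = "<StorageAccountKey1>"   # from the Access keys blade

service = BlobServiceClient(
    account_url = f"https://{account_name}.blob.core.windows.net",
    credential = account_key)

# Download the Chicago crimes sample and upload it under crime_data/
csv_bytes = requests.get(
    "https://data.cityofchicago.org/api/views/3i3m-jwuy/rows.csv?accessType=DOWNLOAD").content
service.get_container_client("sample-container").upload_blob(
    name = "crime_data/crimes_2018.csv", data = csv_bytes, overwrite = True)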

Exercise 2: Install and configure Databricks CLI - Secrets API

  • From your command-line interpreter with Python installed, run the following pip command to install the Databricks CLI:
pip install databricks-cli
  • You can confirm that everything is working by running the following command:
databricks --version
  • Now, we will have to set up authentication to the Databricks workspace from the CLI. Navigate to your Databricks Workspace, and from the top right-hand side click on User Settings
  • Under User Settings, click on Generate New Token
  • Provide a Comment (optional), set the token’s Lifetime in days, and click on Generate
  • Make sure you copy the token, then click on Done. Verify that your token is now visible under the Access Tokens tab in User Settings
  • Navigate back to your command-line interpreter and type in the following command:
databricks configure --token
  • Enter your Workspace URL and Token to authenticate your CLI, as shown below:
  • You can confirm that you have successfully authenticated to your workspace and list all the notebooks created within it by running the following command:
databricks workspace ls
  • Create a new Scope (‘secrets-demo-storage-keys’) by running the following command:
databricks secrets create-scope --scope secrets-demo-storage-keys --initial-manage-principal users
  • Add a new Secret (‘mattadb-accesskey’) to the existing scope by running the following command:
databricks secrets put --scope secrets-demo-storage-keys --key mattadb-accesskey
  • In the pop-up text editor, paste the Storage Account Key1 you had copied in the previous exercise and Save
  • You can confirm that you have successfully created Scope and added a Secret by running the following command:
databricks secrets list --scope secrets-demo-storage-keys
  • To see which command parameters you can use in the CLI, run the following command:
databricks <resource> -h
  • For example, to see which commands you can use when working with the Secrets API, you would run the following command:
databricks secrets -h
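
Under the hood, the CLI is calling the Secrets REST API 2.0 directly. As an illustration (assuming your workspace URL and the token you generated above), the same listing can be reproduced with a plain HTTP call:

# Hypothetical direct call to the Secrets API 2.0 endpoint the CLI wraps
import requests

workspace_url = "https://<your-workspace>.azuredatabricks.net"  # your workspace URL
token = "<personal-access-token>"  # the token generated earlier in this exercise

resp = requests.get(
    f"{workspace_url}/api/2.0/secrets/list",
    headers = {"Authorization": f"Bearer {token}"},
    params = {"scope": "secrets-demo-storage-keys"})
resp.raise_for_status()
print(resp.json())  # returns key names and timestamps, never the secret values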

Exercise 3: Mount Blob storage on your Azure Databricks File Storage

  • Navigate to your Databricks Workspace homepage, and click on New Notebook. Enter a Name for your Notebook, select Python as the Language, and click on Create
  • In your Notebook, copy and paste the below command:
dbutils.fs.mount(
  source = "wasbs://<ContainerName>@<StorageAccountName>.blob.core.windows.net",
  mount_point = "/mnt/<MountName>/",
  extra_configs = {"fs.azure.account.key.<StorageAccountName>.blob.core.windows.net": dbutils.secrets.get(scope = "<ScopeName>", key = "<SecretName>")})
  • Make sure you replace the Storage Account Name, Container Name, Scope Name, and Secret Name with the values created in Exercises 1 and 2. Specify a name for the Mount Point and press Ctrl + Enter to run the cell:
dbutils.fs.mount(
  source = "wasbs://sample-container@mattaadbmount.blob.core.windows.net",
  mount_point = "/mnt/sample-container/",
  extra_configs = {"fs.azure.account.key.mattaadbmount.blob.core.windows.net": dbutils.secrets.get(scope = "secrets-demo-storage-keys", key = "mattadb-accesskey")})
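
Note that dbutils.fs.mount raises an error if the Mount Point already exists. A small guard like the sketch below (using the same names as above) makes the cell safe to re-run:

# Unmount first if this Mount Point already exists, so the cell can be re-run
mount_point = "/mnt/sample-container"
if any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
    dbutils.fs.unmount(mount_point)
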
  • You can confirm that you have successfully mounted your container and list its contents by running the following command:
%fs
ls '/mnt/sample-container/'
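
The %fs magic is shorthand for dbutils.fs; the same listing can be done in Python, which is handy if you want to process the results programmatically:

# Python equivalent of the %fs ls magic
for f in dbutils.fs.ls("/mnt/sample-container/"):
    print(f.path, f.size)
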
  • You can read the data you uploaded earlier to your Blob Storage and create a temporary view by running the following commands:
crime_records = spark.read.option('header','True').option('delimiter',',').csv('/mnt/sample-container/crime_data/')
crime_records.createOrReplaceTempView('crime_records')
  • Try running some sample queries and start analyzing your data:
%sql
SELECT COUNT(*) as Count,`Primary Type`,Arrest FROM crime_records
GROUP BY 2,3
ORDER BY 1 DESC
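
If you prefer staying in Python, the same aggregation can be expressed with the DataFrame API instead of %sql:

# DataFrame equivalent of the SQL query above
from pyspark.sql import functions as F

(crime_records
    .groupBy("Primary Type", "Arrest")
    .agg(F.count("*").alias("Count"))
    .orderBy(F.desc("Count"))
    .show())
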
  • Next, you can start using this Mount Point with RWX permissions. To set up more granular access at a file/folder level, you must have a Databricks Premium Workspace and set up ACLs, which will be covered in the next tutorial

References:

  1. https://docs.databricks.com/dev-tools/api/latest/secrets.html#secretsecretserviceputacl
  2. https://data.cityofchicago.org/Public-Safety/Crimes-2018/3i3m-jwuy
  3. https://medium.com/@willvelida/installing-configuring-and-using-the-azure-databricks-cli-5d0381e662a1
