Secret Management in Azure Databricks

Ruwan Sri Wickaramarathna
Nov 29, 2018


photo courtesy https://docs.microsoft.com/en-us/azure/azure-databricks/what-is-azure-databricks

Introduction

Databricks Unified Analytics Platform, from the original creators of Apache Spark, unifies data science and engineering across the Machine Learning lifecycle from data preparation, to experimentation and deployment of Machine Learning applications.

Sometimes accessing data requires that you authenticate to external data sources. Instead of directly entering your credentials into a notebook, use Azure Databricks secrets to store your credentials and reference them in notebooks and jobs.

Databricks has introduced Secret Management, which allows users to store and share credentials securely within Databricks.

Overview

There are two types of secret scopes:

  1. Azure Key Vault-backed : To reference secrets stored in an Azure Key Vault, you can create a secret scope backed by Azure Key Vault. You can then leverage all of the secrets in the corresponding Key Vault instance from that secret scope. Because the Azure Key Vault-backed secret scope is a read-only interface to the Key Vault, the PutSecret and DeleteSecret Secrets API operations are not allowed. To manage secrets in Azure Key Vault, you must use the Azure SetSecret REST API or Azure portal UI.
  2. Databricks-backed : A Databricks-backed scope is stored in (backed by) an Azure Databricks database. You create a Databricks-backed secret scope using the Databricks CLI (version 0.7.1 and above). Alternatively, you can use the Secrets API.

This article focuses on Databricks-backed secret scopes.

How it works

Concepts

  • Secret Scopes : Managing secrets begins with creating a secret scope. A secret scope is identified by its name, which is unique within a workspace. Scope names are considered non-sensitive and are readable by all users in the workspace. A workspace is limited to a maximum of 100 scopes. (A scope acts as a logical grouping for secrets: for example, one scope for JDBC credentials and another for blob storage credentials.)
  • Secrets : A secret is a key-value pair that stores secret material, with a key name unique within a secret scope. Each scope is limited to 1000 secrets. The maximum allowed secret value size is 128 KB. (This is where the actual credential information lives. Each secret is assigned to a scope.)
  • Secret ACLs : By default, all users in all pricing plans can create secrets and secret scopes. Using secret access control, available with the Azure Databricks Premium Plan, you can configure fine-grained permissions for managing access. (Secret ACLs define who can access a secret scope.)

Steps

1 : Setting up Databricks-CLI on Machine

1.1 Requirements and limitations

  • Python 2: 2.7.9 and above
  • Python 3: 3.6 and above

1.2 Install CLI

  • Open Command Prompt (cmd)
  • Run pip install databricks-cli using the appropriate version of pip for your Python installation. If you are using Python 3, run pip3.

1.3 Authentication

To authenticate and access Databricks REST APIs, you use personal access tokens. Tokens are similar to passwords; you should treat them with care. Tokens expire and can be revoked.

Requirements : Your administrator must enable personal access tokens for your organization’s Azure Databricks account.

Generate token :

  • Click the user profile icon in the upper right corner of your Azure Databricks workspace.
  • Click User Settings.
  • Go to the Access Tokens tab.
photo courtesy https://docs.azuredatabricks.net/api/latest/authentication.html#authentication
  • Click the Generate New Token button.
  • Optionally enter a description (comment) and expiration period.
photo courtesy https://docs.azuredatabricks.net/api/latest/authentication.html#authentication
  • Click the Generate button.
  • Copy the generated token and store it in a secure location.

1.4 Set up Authentication

  • Open Command Prompt (cmd)
  • Run the following command:
databricks configure --token
  • When prompted for Databricks Host, enter https://<region>.azuredatabricks.net
  • When prompted for Token, enter <personal-access-token>
  • After you complete the prompts, your access credentials are stored in the file ~/.databrickscfg.
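To check what the CLI stored, the file can be read back from Python. The sketch below assumes the INI layout that databricks-cli writes (a [DEFAULT] section with host and token entries); the function name read_databricks_profile and its parameters are illustrative and not part of any Databricks library.

```python
import configparser
from pathlib import Path


def read_databricks_profile(path=None, profile="DEFAULT"):
    """Return (host, token) for one profile of a databricks-cli config file.

    Assumes an INI file with `host` and `token` keys, which is the layout
    `databricks configure --token` is expected to write.
    """
    cfg_path = Path(path) if path else Path.home() / ".databrickscfg"
    parser = configparser.ConfigParser()
    # parser.read() returns the list of files it managed to read.
    if not parser.read(str(cfg_path)):
        raise FileNotFoundError(f"No config file found at {cfg_path}")
    if profile != "DEFAULT" and not parser.has_section(profile):
        raise KeyError(f"Profile {profile!r} not found in {cfg_path}")
    section = parser[profile]
    return section["host"], section["token"]
```

Calling read_databricks_profile() with no arguments reads ~/.databrickscfg. Treat the returned token like a password and avoid printing or logging it.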

2 : Create a Databricks-backed secret scope

By default, scopes are created with MANAGE permission for the user who created the scope. If your account does not have the Azure Databricks Premium Plan, you must override that default and explicitly grant the MANAGE permission to “users” (all users) when you create the scope.

Open Command Prompt and run the following command:

databricks secrets create-scope --scope <scope-name> --initial-manage-principal users
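As the Overview notes, the Secrets API can be used instead of the CLI. The following is a minimal sketch of the equivalent request, assuming the Secrets API 2.0 endpoint /api/2.0/secrets/scopes/create and bearer-token authentication with a personal access token. The helper name build_create_scope_request is hypothetical, and the function only builds the request without sending it.

```python
import json
import urllib.request


def build_create_scope_request(host, token, scope, initial_manage_principal="users"):
    """Build (but do not send) a request that creates a Databricks-backed scope.

    Assumes the Secrets API 2.0 endpoint and a personal access token.
    """
    url = host.rstrip("/") + "/api/2.0/secrets/scopes/create"
    body = {
        "scope": scope,
        # Mirrors --initial-manage-principal users from the CLI command.
        "initial_manage_principal": initial_manage_principal,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To actually create the scope, pass the result to urllib.request.urlopen(); a failed call (for example, a scope name that already exists) is expected to surface as an HTTPError.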

3 : Set up a secret

  • Open Command Prompt and run the following command:
databricks secrets put --scope <scope-name> --key <key-name>
  • A text editor (Notepad on Windows) opens. Enter the secret value there, then save and close the file; otherwise the secret will not be created.
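If the editor prompt is inconvenient, for example in a script, the same step could be done through the Secrets API. The sketch below only prepares the JSON body, assuming the Secrets API 2.0 operation /api/2.0/secrets/put with scope, key and string_value fields, and checks the 128 KB value limit mentioned under Concepts before anything is sent. The helper name build_put_secret_payload is hypothetical.

```python
MAX_SECRET_BYTES = 128 * 1024  # 128 KB limit on a secret value


def build_put_secret_payload(scope, key, value):
    """Return the JSON body for POST /api/2.0/secrets/put (assumed endpoint).

    Raises ValueError if the UTF-8 encoded value exceeds the size limit.
    """
    size = len(value.encode("utf-8"))
    if size > MAX_SECRET_BYTES:
        raise ValueError(
            f"Secret value is {size} bytes; the limit is {MAX_SECRET_BYTES} bytes"
        )
    return {"scope": scope, "key": key, "string_value": value}
```

The returned dictionary would be sent with the same bearer-token header used for other API calls. Because the secret value travels in the request body, avoid logging the payload.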

Voilà! You can now reference Databricks-backed secrets in a notebook instead of entering credentials directly.
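Inside a notebook, secrets are read with dbutils.secrets.get(scope=..., key=...). The following sketch shows one way to assemble JDBC connection properties from a scope. The scope name "jdbc", the key names "username" and "password", and the FakeSecrets stand-in (which lets the function run outside Databricks) are all assumptions made for illustration.

```python
def jdbc_properties(secrets, scope, user_key="username", password_key="password"):
    """Build a JDBC properties dict from secrets stored in one scope.

    `secrets` is any object with get(scope=..., key=...); in a notebook
    that would be dbutils.secrets.
    """
    return {
        "user": secrets.get(scope=scope, key=user_key),
        "password": secrets.get(scope=scope, key=password_key),
    }


class FakeSecrets:
    """Local stand-in for dbutils.secrets, for trying the helper offline."""

    def __init__(self, store):
        self._store = store

    def get(self, scope, key):
        return self._store[(scope, key)]
```

In a notebook the call would look like jdbc_properties(dbutils.secrets, "jdbc"), and the result can be passed as the properties argument of spark.read.jdbc. Databricks redacts secret values printed in notebook output, but it is still best not to display them.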

Additionally

List secret scopes

databricks secrets list-scopes

Delete a secret scope

databricks secrets delete-scope --scope <scope-name>

Reference

https://docs.azuredatabricks.net/user-guide/secrets/secret-scopes.html#id3
