Secret Management in Azure Databricks

Photo courtesy: https://docs.microsoft.com/en-us/azure/databricks/scenarios/what-is-azure-databricks

“Databricks Unified Analytics Platform, from the original creators of Apache Spark, unifies data science and engineering across the Machine Learning lifecycle, from data preparation to experimentation to deployment.”

Quoted from https://docs.azuredatabricks.net/user-guide/secrets/secret-scopes.html#id3

Accessing data often requires authentication to external data sources. Instead of entering the credentials directly into a notebook, use Azure Databricks secrets to store them and reference them in notebooks and jobs. Databricks has introduced Secret Management, which allows users to store and share credentials within Databricks in a secure manner.

OVERVIEW

There are two types of secret scopes:

Azure Key Vault-backed: To reference secrets stored in an Azure Key Vault, you can create a secret scope backed by Azure Key Vault. You can then leverage all the secrets in the corresponding Key Vault instance from that secret scope. Because the Azure Key Vault-backed secret scope is a read-only interface to the Key Vault, the PutSecret and DeleteSecret Secrets API operations are not allowed. To manage secrets in Azure Key Vault, you must use the Azure SetSecret REST API or Azure portal UI.
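
Because the Secrets API cannot write to a Key Vault-backed scope, the secret itself is created and updated with Azure tooling. A minimal sketch using the Azure CLI, where the vault name, secret name, and value are placeholders for your own:

az keyvault secret set --vault-name <key-vault-name> --name <secret-name> --value <secret-value>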

Databricks-backed: A Databricks-backed scope is stored in (backed by) an Azure Databricks database. You must create a Databricks-backed secret scope using the Databricks CLI (version 0.7.1 and above). Alternatively, you can use the Secrets API.

This article focuses on Databricks-backed secret scopes: how to create them, store secrets in them, and reference those secrets from a notebook.

CONCEPTS

· Secret Scopes: Managing secrets begins with creating a secret scope. A secret scope is identified by its name, unique within a workspace. The names are considered non-sensitive and are readable by all users in the workspace. A workspace is limited to a maximum of 100 scopes. (This acts as the logical grouping for the secrets. For example, one scope to hold JDBC credentials, one scope to hold blob storage credentials, etc.)

· Secrets: A secret is a key-value pair that stores secret material, with a key name unique within a secret scope. Each scope is limited to 1000 secrets. The maximum allowed secret value size is 128 KB. (This contains all the credential information, which is assigned to a scope.)

· Secret ACLs: By default, all users in all pricing plans can create secrets and secret scopes. Using secret access control, available with the Azure Databricks Premium Plan, you can configure fine-grained permissions on those scopes. (Here we can define who has access to the secret scopes; a minimal CLI sketch follows this list.)
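
As an illustration of such a permission (assuming the Premium Plan and the same Databricks CLI used in the steps below), a READ ACL could be granted on a scope like this; the scope name and user are placeholders:

databricks secrets put-acl --scope <scope-name> --principal someone@example.com --permission READ

databricks secrets list-acls --scope <scope-name>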

STEPS

1.0 Setting up Databricks-CLI on Machine

1.1 Requirements and limitations

· Python 2: 2.7.9 and above

· Python 3: 3.6 and above

1.2 Install CLI

· Open Command Prompt (cmd)

· Run pip install databricks-cli using the appropriate version of pip for your Python installation. If you are using Python 3, run pip3 install databricks-cli instead.
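
To confirm the installation, you can ask the CLI for its version; the exact output depends on the version installed:

databricks --version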

1.3 Authentication

To authenticate to and access the Databricks REST APIs, use your personal access tokens. Tokens are like passwords; keep them secret and do not share them. Tokens expire and can be revoked.

Requirements: Your administrator must enable personal access tokens for your organization’s Azure Databricks account.

Generate token:

· Click the user profile icon in the upper-right corner of your Azure Databricks workspace.

· Click User Settings.

· Go to the Access Tokens tab.

· Click the Generate New Token button.

· Optionally enter a description (comment) and expiration period.

· Click the Generate button.

· Copy the generated token and store it in a secure location.

1.4 Set up Authentication

· Open Command Prompt (cmd)

· Run the following command.

databricks configure --token

· Databricks Host: https://<region>.azuredatabricks.net

· Token: <personal-access-token>

· After you complete the prompts, your access credentials are stored in the file ~/.databrickscfg.
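
After configuration, the file holds the host and token you entered; its contents look roughly like this (the token is a placeholder):

[DEFAULT]
host = https://<region>.azuredatabricks.net
token = <personal-access-token>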

2.0 Create a Databricks-backed secret scope

By default, a scope is created with the MANAGE permission for the user who created it. If your account does not have the Azure Databricks Premium Plan, you must override that default and explicitly grant the MANAGE permission to “users” (all users) when you create the scope.

Go to cmd and run the following command:

databricks secrets create-scope --scope <scope-name> --initial-manage-principal users

3.0 Set up a secret

· Go to cmd and run the following command:

databricks secrets put --scope <scope-name> --key <key-name>

· An editor (Notepad on Windows) will open. Enter the secret value there, then save and close the file; otherwise the secret will not be created.
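
If you prefer to skip the editor, the CLI also accepts the value inline; note that the value then ends up in your shell history, so handle it with care (the value below is a placeholder):

databricks secrets put --scope <scope-name> --key <key-name> --string-value <secret-value>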

Now you can reference Databricks-backed secrets in your notebooks instead of entering credentials directly.
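
For example, a notebook can fetch the stored value at run time with the Databricks secrets utility. A minimal Python sketch, where the scope, key, and JDBC connection details are placeholders:

# Fetch the secret at run time; Databricks redacts the value if you try to print it.
jdbc_password = dbutils.secrets.get(scope="<scope-name>", key="<key-name>")

# Use it where the credential would otherwise be hard-coded, e.g. a JDBC read.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=<database>")
      .option("user", "<jdbc-user>")
      .option("password", jdbc_password)
      .option("dbtable", "<table>")
      .load())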

ADDITIONAL STEPS

List secret scopes

databricks secrets list-scopes

Delete a secret scope

databricks secrets delete-scope --scope <scope-name>
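
Individual secrets within a scope can be listed and removed in the same way; listing shows only the key names, never the values:

databricks secrets list --scope <scope-name>

databricks secrets delete --scope <scope-name> --key <key-name>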

Written by Ruwan Sri Wickramarathna, Data Scientist. Originally published on his Medium Account.

FURTHER READING

https://docs.azuredatabricks.net/user-guide/secrets/secret-scopes.html#id3
