Azure Databricks User Token Management — building a key expiry notification and auto-rotation system
Azure Databricks can be connected to in several ways, and all of them need a valid Databricks user token to connect and invoke jobs. Using a user token is straightforward; maintaining it, as of now, takes some extra effort.
Below are the three most popular ways to connect to Databricks:
1. Azure Data Factory (ADF) v2 — Linked Services
First, an ADF — Databricks Linked Service needs to be created, where we can add the user token to connect.
2. Azure Databricks Rest API calls
The REST POST call carries the user token in its Authorization header.
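As a minimal sketch (the workspace URL, token, and job ID below are placeholders, not real values), the Jobs `run-now` call with the token in the Authorization header looks like:

```python
import json
import urllib.request

def build_run_now_request(workspace_url: str, token: str, job_id: int) -> urllib.request.Request:
    """Build the REST POST request; the user token goes in the Authorization header."""
    body = json.dumps({"job_id": job_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{workspace_url}/api/2.1/jobs/run-now",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",  # Databricks user (PAT) token
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Placeholder workspace URL, token, and job ID for illustration only:
req = build_run_now_request("https://adb-1234567890123456.7.azuredatabricks.net", "dapiXXXX", 42)
# urllib.request.urlopen(req)  # would actually trigger the job
```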
As a user token is tied to a specific user, the real problem arises if that user gets deactivated (e.g. leaves the organization or is blocked) or the token expires (we can make a token ‘never expiring’, but that’s not a good practice)!
We would then have to identify all the client applications (ADF/REST client/JDBC-Hive client) and update them with the new token. This can be very difficult and time consuming if many ADF pipelines or REST clients use the expired token, and after updating the token we would also need to take care of the failed jobs 😟.
A better approach is to keep the user token in Azure Key Vault (as a secret value) and retrieve it by its secret name. When a new user token is generated, only the Key Vault secret value needs to be updated manually; every Databricks client using the secret then gets the latest token without any further intervention.
Next, we’ll see how we can do that.
1. Azure Data Factory (ADF) v2 — Linked Services
Instead of hard-coding the Databricks user token, we can store it in Azure Key Vault as a secret and refer to that secret from the Data Factory Linked Service.
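As an illustration, the Databricks Linked Service definition would then use an `AzureKeyVaultSecret` reference instead of an inline token. A hedged sketch (the linked-service names, domain, cluster ID, and secret name below are all placeholders):

```json
{
  "name": "AzureDatabricksLS",
  "properties": {
    "type": "AzureDatabricks",
    "typeProperties": {
      "domain": "https://adb-1234567890123456.7.azuredatabricks.net",
      "accessToken": {
        "type": "AzureKeyVaultSecret",
        "store": {
          "referenceName": "AzureKeyVaultLS",
          "type": "LinkedServiceReference"
        },
        "secretName": "databricks-user-token"
      },
      "existingClusterId": "0000-000000-cluster00"
    }
  }
}
```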
2/3. REST API calls / Using JDBC-ODBC
For the detailed steps you can follow this. Here, we store the Databricks user token in Azure Key Vault and retrieve it each time before calling the Databricks REST API or constructing the JDBC-Hive connection string.
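A rough sketch of those two REST steps in Python, assuming a service principal with read access to the secret (the tenant, client, vault, and secret names are placeholders):

```python
import json
import urllib.parse
import urllib.request

def build_oauth2_request(tenant_id: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Get an OAuth2 access token for Key Vault using service principal credentials."""
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "resource": "https://vault.azure.net",  # audience for Key Vault
    }).encode()
    return urllib.request.Request(
        f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
        data=body, method="POST")

def build_get_secret_request(vault_name: str, secret_name: str, access_token: str) -> urllib.request.Request:
    """Read the Databricks user token stored as a Key Vault secret."""
    return urllib.request.Request(
        f"https://{vault_name}.vault.azure.net/secrets/{secret_name}?api-version=7.4",
        headers={"Authorization": f"Bearer {access_token}"})

# Actual usage would look like (commented out, as it needs live credentials):
# with urllib.request.urlopen(build_get_secret_request("myvault", "databricks-user-token", access_token)) as r:
#     databricks_token = json.load(r)["value"]
```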
A few points to note:
(i) The OAuth2 token received in Step 4 lives for only an hour.
(ii) The Service Principal key, the Azure Key Vault secret, and the Databricks user token should all have expiry dates, generally aligned with the organization’s password expiration policy.
(iii) In Step 3, the Service Principal should have only the required access to the Azure Key Vault.
User Token Refreshment
With the above approach, if we now need to replace the user token (because the user has been decommissioned or blocked), a new token from a different user can be created as a new secret version. Any further REST calls (Step 5 in the above flow diagram) will fetch the new token, and ADFv2 pipeline invocations will also refer to the updated Key Vault secret value. So we don’t need to manually update the user token in any client code or configuration.
Now, there could be two scenarios:
(i) At the client end, we can store the user token in a local database and refresh it whenever an authentication or invalid-access-token exception is received. We need to be extra cautious about storing the token locally.
(ii) Otherwise, we can retrieve the token every time a Databricks REST API needs to be invoked (without storing it at the client side).
Note that the OAuth2 token lives for only an hour ("expires_in": 3600), so we may need to refresh it again before refreshing the Databricks user token.
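Scenario (i) can be captured with a small refresh-on-failure wrapper. This is an illustrative sketch, where `invoke` and `fetch_token` are stand-ins for the actual Databricks REST call and the Key Vault secret retrieval:

```python
def call_with_refresh(invoke, fetch_token, cached_token):
    """invoke(token) -> HTTP status code; on 401/403, fetch a fresh token and retry once.

    'invoke' and 'fetch_token' are assumed callables standing in for the real
    Databricks call and the Key Vault retrieval respectively.
    """
    status = invoke(cached_token)
    if status in (401, 403):          # authentication / invalid access token
        cached_token = fetch_token()  # pull the latest secret version
        status = invoke(cached_token)
    return status, cached_token
```

Retrying just once keeps the client from looping forever if the freshly fetched token is also invalid.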
Key / Secret / User Token Expiration
Instead of managing only the Databricks user token expiration, we now have to manage the expiration of three credentials (the Service Principal key, the Azure Key Vault secret, and the Databricks token) 😮!
As of now, Azure will not send any alert when any of these is about to expire! So we need a solution that updates the keys/secrets before they expire.
A simple manual solution would be to maintain an offline list of the keys/secrets with their expiry dates, review it regularly, and change or create a new one before expiry. But that goes against the spirit of automation, and manual expiry tracking and rotation will become a daunting task as we use more cloud services.
To overcome that (until an ‘off-the-shelf’ cloud service becomes available), we can use the existing REST APIs or Azure PowerShell commands to renew the keys/secrets.
Below are a few of the options we have:
Databricks User Token Expiration:
Updating the Azure Key Vault secret requires the secrets/set permission.
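A sketch of that rotation step, assuming the caller already holds a valid Databricks token and a Key Vault access token with the secrets/set permission; the endpoints follow the public Databricks Token API and Key Vault REST API, and all names/values below are placeholders:

```python
import json
import urllib.request

def build_token_create_request(workspace_url: str, current_token: str,
                               lifetime_seconds: int = 7776000) -> urllib.request.Request:
    """Mint a new Databricks token via the Token API (response carries 'token_value')."""
    body = json.dumps({"lifetime_seconds": lifetime_seconds,
                       "comment": "auto-rotated"}).encode()
    return urllib.request.Request(
        f"{workspace_url}/api/2.0/token/create",
        data=body,
        headers={"Authorization": f"Bearer {current_token}",
                 "Content-Type": "application/json"},
        method="POST")

def build_set_secret_request(vault_name: str, secret_name: str,
                             kv_access_token: str, new_token: str) -> urllib.request.Request:
    """Write the new token to Key Vault as a new secret version (needs secrets/set)."""
    body = json.dumps({"value": new_token}).encode()
    return urllib.request.Request(
        f"https://{vault_name}.vault.azure.net/secrets/{secret_name}?api-version=7.4",
        data=body,
        headers={"Authorization": f"Bearer {kv_access_token}",
                 "Content-Type": "application/json"},
        method="PUT")
```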
Azure Key Vault Secret Expiration:
Listing and updating the Azure Key Vault secrets requires the secrets/list and secrets/set permissions.
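Once the secrets can be listed, flagging the ones close to expiry is straightforward. A sketch that works on the JSON shape returned by the Key Vault ‘list secrets’ REST call, where `attributes.exp` is a Unix timestamp (absent if no expiry was set):

```python
import time

def secrets_expiring_within(list_response: dict, days: int, now: float = None) -> list:
    """Return the IDs of secrets whose 'exp' falls within the next 'days' days."""
    now = time.time() if now is None else now
    horizon = now + days * 86400
    soon = []
    for item in list_response.get("value", []):
        exp = item.get("attributes", {}).get("exp")  # Unix timestamp, or None
        if exp is not None and now < exp <= horizon:
            soon.append(item["id"])
    return soon
```

Secrets flagged by this check are the candidates for the rotation calls above.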
Service Principal Key Expiration:
Currently there is an open bug that throws an error while trying to create a new key.
I’m yet to find the appropriate, up-to-date REST API documentation for Service Principal key creation, though further details of the internal REST calls can be found by appending the `--debug` global parameter to `az ad app credential reset`.
Though the above approaches can rotate the keys/secrets/tokens without much human interaction, we also need a notification system to alert the appropriate support group once auto-rotation completes. There are some good examples available:
In a similar way, notifications can also be sent if keys are about to expire.
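For instance, assuming a generic incoming-webhook endpoint (Teams/Slack style; the URL and the `{"text": ...}` payload shape are assumptions, not a specific product’s contract), the notification POST could be sketched as:

```python
import json
import urllib.request

def build_notification_request(webhook_url: str, message: str) -> urllib.request.Request:
    """POST a short text message to an assumed incoming-webhook endpoint."""
    return urllib.request.Request(
        webhook_url,
        data=json.dumps({"text": message}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST")

# e.g. after rotation completes, or from the near-expiry check:
# urllib.request.urlopen(build_notification_request(
#     "https://example.com/webhook", "Databricks token rotated; new secret version created"))
```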
Solving the Misleading Identity Problem
Databricks user tokens are created by a user, so the Databricks job invocation logs will show that user’s ID as the job invoker. This could create confusion.
As of now, there is no option to integrate an Azure Service Principal with Databricks as a system ‘user’.
As a workaround, we can create a dummy ‘user’ account with a valid email ID and add it to the Azure Active Directory tenant.
Finally, we shouldn’t create unnecessary keys/secrets, and we shouldn’t auto-rotate everything indiscriminately. Check which keys/secrets are set to expire after the chosen date, and consider only those that are really required and could take down the system if they expired.