Delta Sharing is Caring: How to set up a Delta Share with Databricks
Learn how to share data from your Databricks environment with external recipients without them needing access to a Databricks cluster.
In this article we show how to use the Delta Sharing protocol with a Unity Catalog-enabled Databricks workspace. The Delta Sharing protocol allows you to share data directly from your Databricks workspace without the data recipient needing their own Databricks cluster or access to your workspace (Databricks, 2023). For infrequent data requests, Delta Sharing can therefore be used instead of granting direct access, which limits the number of people with access to your Databricks environment.
Article overview
If the Databricks Delta Sharing integration is useful to you and you would like to implement it yourself, this article provides step-by-step instructions on how to share a table from Databricks using the Delta Sharing protocol. The instructions are divided into four sections: we create a share (Section 1), create a recipient (Section 2), and add a data table to the share (Section 3). Finally, we show how a data recipient can use the credential file to access the data with Power BI (Section 4).
1.) Creating a share
In Delta Sharing, a share is a read-only collection of tables and table partitions that a provider wants to share with one or more recipients. If your recipient uses a Unity Catalog-enabled Databricks workspace, you can also include notebook files, views (including dynamic views that restrict access at the row and column level), and Unity Catalog volumes in a share. You can add or remove tables, views, volumes, and notebook files from a share at any time, and you can assign or revoke data recipient access to a share at any time. In a Unity Catalog-enabled Databricks workspace, a share is a securable object registered in Unity Catalog. If you remove a share from your Unity Catalog metastore, all recipients of that share lose the ability to access it.
Requirements share
In our example, we want to create a share named opendatasharepoc. Before we can create a share, however, we need to meet the following two prerequisites:
1. Be a metastore admin or have the CREATE SHARE privilege for the Unity Catalog metastore where the data you want to share is registered.
2. Create the share using a Unity Catalog enabled Databricks workspace.
Share Setup
To create a share, go to the Delta Sharing tab in the Catalog Explorer and click on the blue “Share Data” button in the top-right corner (Figure 1). Then fill in the name of the share you want to create and any comments you would like to add to it. Once you have created a share, it should appear in the ‘Shared by me’ list (Figure 2). Only share data with recipients that you trust, and ensure that the share does not contain any sensitive data.
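For readers who prefer a notebook over the UI, the same step can, to our knowledge, also be done with the Databricks SQL statement CREATE SHARE. The sketch below only builds that statement as a string. The helper create_share_sql and the comment text are our own illustrations, not part of any Databricks API, and running the statement assumes a Unity Catalog-enabled notebook in which `spark` is predefined.

```python
import re

# Assumption: this sketch only supports simple, unquoted identifiers.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def create_share_sql(share_name: str, comment: str = "") -> str:
    """Build a CREATE SHARE statement (hypothetical helper)."""
    if not _IDENTIFIER.fullmatch(share_name):
        raise ValueError(f"unsupported share name: {share_name!r}")
    if "'" in comment:
        raise ValueError("this sketch does not handle quotes in comments")
    statement = f"CREATE SHARE IF NOT EXISTS {share_name}"
    if comment:
        statement += f" COMMENT '{comment}'"
    return statement


statement = create_share_sql("opendatasharepoc", "Proof-of-concept share")
print(statement)

# In a Databricks notebook one would then run (not executed here):
# spark.sql(statement)
```

Building the statement separately from executing it keeps the example checkable outside Databricks.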
2.) Create a data recipient
To share data with someone, we need to create a recipient within the Databricks environment. As a data provider, you can define multiple recipients for any given Unity Catalog metastore. If you want to share data from multiple metastores with a particular user or group of users, you must define the recipient separately for each metastore. A recipient can have access to multiple shares.
Requirements data recipient
Before you can create recipients within the Databricks workspace, you need to meet the following three prerequisites:
- You must be a metastore admin or have the CREATE_RECIPIENT privilege for the Unity Catalog metastore in which the data is located.
- You must create the recipient using a Unity Catalog enabled Databricks workspace.
- If you use a Databricks notebook to create the recipient, your cluster must use Databricks Runtime 11.3 LTS or above and have either shared or single-user cluster access mode enabled.
Data recipient Setup
To create a new recipient, click the “New Recipient” button next to the blue “Share Data” button. Fill in the name of the recipient, the sharing identifier of the recipient if they are a Databricks user, and any comment you would like to add to this recipient (Figure 3). Clicking on ‘Create’ generates an activation link (Figure 4) that you share with your recipient; it is used in Section 4.
Once you have created the recipient, you can check information such as the authentication type, activation link, and recipient properties by clicking on the details tab of the recipient (Figure 5).
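If you create the recipient from a notebook instead of the UI, the Databricks SQL statements CREATE RECIPIENT and DESCRIBE RECIPIENT cover the same ground, as far as we are aware. The following is a minimal sketch that builds both statements. The helper names are hypothetical, and we assume that leaving out the sharing identifier yields a recipient that authenticates with an activation link, as in our Power BI example.

```python
import re
from typing import Optional

# Assumption: this sketch only supports simple, unquoted identifiers.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def create_recipient_sql(name: str, sharing_identifier: Optional[str] = None) -> str:
    """Build a CREATE RECIPIENT statement (hypothetical helper).

    Pass `sharing_identifier` only when the recipient is a Databricks user.
    """
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsupported recipient name: {name!r}")
    statement = f"CREATE RECIPIENT IF NOT EXISTS {name}"
    if sharing_identifier is not None:
        if "'" in sharing_identifier:
            raise ValueError("sharing identifier must not contain quotes")
        statement += f" USING ID '{sharing_identifier}'"
    return statement


def describe_recipient_sql(name: str) -> str:
    """Build the statement that lists the recipient's details."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsupported recipient name: {name!r}")
    return f"DESCRIBE RECIPIENT {name}"


create_statement = create_recipient_sql("opendatasharerecipientttbx")
describe_statement = describe_recipient_sql("opendatasharerecipientttbx")
print(create_statement)
print(describe_statement)

# In a Databricks notebook one would then run (not executed here):
# spark.sql(create_statement)
# spark.sql(describe_statement).show(truncate=False)
```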
3.) Add data table to the share
Now that you have set up a share and a recipient, you need to add assets to the share. These are data assets from the Databricks workspace: catalogs, schemas, tables, and views.
In this example, we created a single table which we want to share. The Catalog, Schema, and Table name of this single table are:
- Catalog: devdev
- Schema: testtbx
- Table: test3
To add assets to the share, select the share, click on the ‘Add assets’ button in the top-right corner, and select whether you want to add an asset or a notebook file (Figure 6). Clicking on ‘Add assets’ opens a tab in which you can select the table(s) in the metastore that you want to add to the share (Figure 7). Click on ‘Save’ to confirm your choice.
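The same table can presumably also be added from a notebook with the Databricks SQL statement ALTER SHARE ... ADD TABLE. The sketch below builds that statement for our example table devdev.testtbx.test3. The helper add_table_sql is hypothetical and only accepts simple, unquoted names.

```python
import re

# Assumption: this sketch only supports simple, unquoted identifiers.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def add_table_sql(share: str, catalog: str, schema: str, table: str) -> str:
    """Build an ALTER SHARE ... ADD TABLE statement (hypothetical helper)."""
    for part in (share, catalog, schema, table):
        if not _IDENTIFIER.fullmatch(part):
            raise ValueError(f"unsupported identifier: {part!r}")
    return f"ALTER SHARE {share} ADD TABLE {catalog}.{schema}.{table}"


statement = add_table_sql("opendatasharepoc", "devdev", "testtbx", "test3")
print(statement)

# In a Databricks notebook one would then run (not executed here):
# spark.sql(statement)
```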
Adding recipients to the share
Now that you have added a data asset, you need to add the data recipient you created in Section 2. In the data share window, click on the “Add recipient” button next to the “Manage assets” button. This opens a window where you can select the recipient(s) you want to add to the share (Figure 8). If all went well, the recipient should appear in the Recipients tab of the share (Figure 9).
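In SQL terms, adding a recipient to a share corresponds, as far as we know, to granting SELECT on the share to that recipient, and SHOW GRANTS ON SHARE lets you verify the result. Below is a minimal sketch of both statements with hypothetical helper names, again restricted to simple, unquoted identifiers.

```python
import re

# Assumption: this sketch only supports simple, unquoted identifiers.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _check(name: str) -> str:
    """Return the name unchanged if it is a simple identifier."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def grant_share_sql(share: str, recipient: str) -> str:
    """Build the statement that gives a recipient read access to a share."""
    return f"GRANT SELECT ON SHARE {_check(share)} TO RECIPIENT {_check(recipient)}"


def show_share_grants_sql(share: str) -> str:
    """Build the statement that lists which recipients can read a share."""
    return f"SHOW GRANTS ON SHARE {_check(share)}"


grant_statement = grant_share_sql("opendatasharepoc", "opendatasharerecipientttbx")
show_statement = show_share_grants_sql("opendatasharepoc")
print(grant_statement)
print(show_statement)

# In a Databricks notebook one would then run (not executed here):
# spark.sql(grant_statement)
# spark.sql(show_statement).show(truncate=False)
```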
4.) Access the shared data as data recipient
Now that we have added the recipient to the share, the recipient can access the data. As an example, we set up our own data share using the steps described above; an overview is given in Figure 10. Our share (opendatasharepoc) contains a single data table named test3 and is shared with a data recipient named opendatasharerecipientttbx. To access this data, we will use Power BI. Any client that supports the Delta Sharing protocol can access the data share, so tools and languages such as Tableau, Python, and Spark (and many more) can be used instead, depending on your requirements.
Credential File
When access is granted by the data provider, the data recipient will receive an activation link. Opening this link in any internet browser should show a window like the one in Figure 11. By clicking ‘Download Credential File’, the data recipient can download the credential file. For security reasons, the credential file can only be downloaded once, after which the download link is deactivated. Be sure to notify the recipient about this. In case they lose the credential file, a workaround is possible: the recipient can download it again from a different browser session, provided that the credential file has not yet expired.
The credential file is a JSON file (Figure 12), and is structured as follows:
- bearerToken: the bearerToken is used to authorize your connection with the data share.
- endpoint: a URL containing the metastore id used to make the connection to the data share.
- expirationTime: the timestamp at which the bearerToken expires.
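Before handing the file to a client, a recipient may want to check that it contains these three fields and that the token has not expired yet. The following is a minimal sketch of such a check. The example values are made up, the function names are our own, and we assume that expirationTime is an ISO 8601 timestamp in UTC with a trailing 'Z'.

```python
import json
from datetime import datetime, timezone

REQUIRED_KEYS = ("bearerToken", "endpoint", "expirationTime")


def parse_expiration(timestamp: str) -> datetime:
    """Parse a timestamp such as '2030-01-01T00:00:00.000Z' as UTC.

    Assumption: ISO 8601 in UTC with a trailing 'Z'. Fractional seconds
    are dropped, which is precise enough for an expiry check.
    """
    base = timestamp.rstrip("Z").split(".")[0]
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc)


def check_credentials(raw_json: str, now: datetime) -> dict:
    """Return the parsed credentials, or raise ValueError if unusable."""
    creds = json.loads(raw_json)
    missing = [key for key in REQUIRED_KEYS if key not in creds]
    if missing:
        raise ValueError(f"credential file is missing: {', '.join(missing)}")
    if parse_expiration(creds["expirationTime"]) <= now:
        raise ValueError("bearer token has expired")
    return creds


# Made-up example content; a real file holds your own token and endpoint.
example = json.dumps({
    "bearerToken": "<token>",
    "endpoint": "https://example.invalid/delta-sharing/",
    "expirationTime": "2030-01-01T00:00:00.000Z",
})
creds = check_credentials(example, now=datetime(2024, 1, 1, tzinfo=timezone.utc))
print(creds["endpoint"])
```

In practice you would read the downloaded file from disk and pass datetime.now(timezone.utc) as `now`; the fixed date above only keeps the example reproducible.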
Access Data Share with Power BI
With this JSON file, it is easy to connect to the Delta Sharing source with Power BI. Click on the ‘Get Data’ button in Power BI, select ‘Delta Sharing’ from the data source options, and click ‘Connect’ (see Figure 13).
A new window opens, in which you can connect to the data share in four simple steps (Figure 14). First, copy the endpoint URL from the JSON file and paste it into the Delta Sharing Server URL field (1). Click on OK (2). Power BI then requests the bearer token. Copy it from the JSON file and paste it here (3). Finally, click ‘Connect’ (4). If everything goes well, the dataset within the data share should be added to Power BI (Figure 15).
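For recipients who prefer Python over Power BI, the open-source delta-sharing connector can read the same credential file. The sketch below is written under a few assumptions: the package is installed with `pip install delta-sharing`, the credential file was saved locally under the hypothetical name config.share, and the table keeps its schema name testtbx inside the share. The connector addresses a table as the profile file path, a '#', and then share, schema, and table separated by dots.

```python
def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build '<profile-file>#<share>.<schema>.<table>' for the connector."""
    return f"{profile_path}#{share}.{schema}.{table}"


def load_shared_table(profile_path: str, share: str, schema: str, table: str):
    """Load a shared table into a pandas DataFrame."""
    # Imported lazily so that table_url works without the package installed.
    import delta_sharing

    return delta_sharing.load_as_pandas(
        table_url(profile_path, share, schema, table)
    )


# "config.share" is a hypothetical local name for the downloaded credential file.
url = table_url("config.share", "opendatasharepoc", "testtbx", "test3")
print(url)

# With the package installed and a valid credential file (not executed here):
# df = load_shared_table("config.share", "opendatasharepoc", "testtbx", "test3")
# print(df.head())
```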
Conclusion
In this article we have given a step-by-step guide that shows you how to create a share and a recipient, and how to access the data within the share using Power BI. If used correctly, Delta Sharing could significantly simplify data sharing within your organization and reduce the number of people who have direct access to your Databricks environment. Furthermore, while we only showed how to access the data with Power BI, this protocol can be used to share data with other clients, both open source and commercial. We highly recommend trying Delta Sharing yourself to see what it can do for you.
Be mindful
While the Delta Sharing protocol is a great tool to quickly share data, there are some things you need to consider before using it. First, keep in mind that the credential file you send to the data recipient is the recipient’s key to the shared data. Handing over a credential file is easier than managing access permissions, but this convenience also brings risks. If the credential file falls into the wrong hands, whoever holds it has access to the data in the shared table. This exposure could be significant if you share sensitive data with this protocol. Second, the Delta Sharing protocol uses public servers for authentication, which might be an additional concern when sharing sensitive data.
It is important to be mindful of these risks and to use common sense to reduce them: share the links only with trustworthy recipients, ask recipients not to share their credential file, set an expiration date on the credential file that suits your purposes, and do not share any sensitive data with this protocol.
References
Databricks. (2023, May 26). Introducing Delta Sharing: An Open Protocol for Secure Data Sharing. Retrieved from Databricks: https://www.databricks.com/blog/2021/05/26/introducing-delta-sharing-an-open-protocol-for-secure-data-sharing.html
This is the second part of a two-part Medium article about Delta Sharing from a Databricks workspace. In the first part we describe what Delta Sharing is exactly. For the full picture, we highly recommend reading the first part as well: https://medium.com/nntech/delta-sharing-is-caring-how-the-delta-sharing-protocol-can-save-the-day-e824e868bf65
These articles were co-written with a group of amazing people whose contributions I want to acknowledge:
Luiz Izidorio Vidal,
Stijn Mohr,
Ton Brokx,
Massimo Iannelli, and
Francisco Mercado Rueda.