Delta Sharing is Caring: How to set up a Delta Share with Databricks

Learn how to share your Databricks environment data to external recipients without them having access to a Databricks cluster.

Thijs de Goede
NN Tech
9 min read · Mar 28, 2024

In this article we show how to use the Delta Sharing protocol with a Unity Catalog enabled Databricks workspace. The Delta Sharing protocol allows you to share data directly from your Databricks workspace without the data recipient needing their own Databricks cluster or access to your workspace (Databricks, 2023). For infrequent data requests, Delta Sharing can therefore be used instead of granting direct access, which limits the number of people with access to your Databricks environment.

Article overview

If the Databricks Delta Sharing integration is useful to you and you would like to implement it yourself, this article provides step-by-step instructions on how to share a table from Databricks using the Delta Sharing protocol. The instructions are divided into four sections: creating a share (Section 1), creating a recipient (Section 2), and adding a data table to the created share (Section 3). Finally, we show how a data recipient can use the credential file to access the data using Power BI (Section 4).

1.) Creating a share

In Delta Sharing, a share is a read-only collection of tables and table partitions that a provider wants to share with one or more recipients. If your recipient uses a Unity Catalog-enabled Databricks workspace, you can also include notebook files, views (including dynamic views that restrict access at the row and column level), and Unity Catalog volumes in a share. You can add or remove tables, views, volumes, and notebook files from a share at any time, and you can assign or revoke data recipient access to a share at any time. In a Unity Catalog-enabled Databricks workspace, a share is a securable object registered in Unity Catalog. If you remove a share from your Unity Catalog metastore, all recipients of that share lose the ability to access it.

Requirements share

In our example, we want to create a share named opendatasharepoc for our recipient. Before we can create a share, however, we need to meet the following two prerequisites:

1. Be a metastore admin or have the CREATE SHARE privilege for the Unity Catalog metastore where the data you want to share is registered.

2. Create the share using a Unity Catalog enabled Databricks workspace.

Share Setup

Figure 1: By clicking the “Share Data” button in the top-right corner, you can create a share by filling in the name of the share and any optional comments you would like to add.

To create a share, go to the Delta Sharing tab in the Catalog Explorer and click on the blue “Share Data” button in the top-right corner (Figure 1). Then fill in the name of the share you want to create and any comments you would like to add to it. Once you have created a share, it should appear in the ‘Shared by me’ list (Figure 2). Only share data with recipients that you trust, and ensure that the share does not contain any sensitive data.

Figure 2: When the Data Share is created, it will be visible in the “Shared by me” of the Delta Sharing menu of your Catalog Explorer.
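For readers who prefer a notebook over the Catalog Explorer, the same share can also be created with the Databricks SQL statement CREATE SHARE. The sketch below only builds that statement for our example share. The helper name build_create_share_sql is our own invention, and running the result assumes a Unity Catalog enabled Databricks notebook in which `spark` is already defined.

```python
import re

# Hypothetical helper, not part of any Databricks library.
# Only simple, unquoted names are accepted to keep the sketch safe.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_create_share_sql(share_name: str, comment: str = "") -> str:
    """Return a CREATE SHARE statement for the given share name."""
    if not _IDENTIFIER.fullmatch(share_name):
        raise ValueError(f"Unsupported share name: {share_name!r}")
    statement = f"CREATE SHARE IF NOT EXISTS {share_name}"
    if comment:
        # Reject characters that would need escaping inside a SQL string.
        if "'" in comment or "\\" in comment:
            raise ValueError("Comment must not contain quotes or backslashes")
        statement += f" COMMENT '{comment}'"
    return statement


create_share_sql = build_create_share_sql(
    "opendatasharepoc", "Proof of concept share"
)
print(create_share_sql)

# In a Databricks notebook (assumption: `spark` is predefined there):
# spark.sql(create_share_sql)
# spark.sql("SHOW SHARES").show()
```

The statement needs the same privileges as the UI route described above: metastore admin or the CREATE SHARE privilege.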

2.) Create a data recipient.

To share data with someone, we need to create a recipient within the Databricks environment. As a data provider, you can define multiple recipients for any given Unity Catalog metastore. If you want to share data from multiple metastores with a particular user or group of users, you must define the recipient separately for each metastore. A recipient can have access to multiple shares.

Requirements data recipient

Before you can create recipients within the Databricks workspace, you need to meet the following three prerequisites:

  1. You must be a metastore admin or have the CREATE_RECIPIENT privilege for the Unity Catalog metastore in which the data is located.
  2. You must create the recipient using a Unity Catalog enabled Databricks workspace.
  3. If you use a Databricks notebook to create the recipient, your cluster must use Databricks Runtime 11.3 LTS or above and have either shared or single-user cluster access mode enabled.

Data recipient Setup

Figure 3: Creating a Recipient only requires a Recipient Name, a Databricks Sharing identifier if the recipient has their own Databricks workspace, and any additional comment you would like to add to the recipient.

To create a new recipient, click the “New Recipient” button next to the blue “Share Data” button. Fill in the name of the recipient, the sharing identifier of the recipient if they are a Databricks user, and any comment you would like to add to this recipient (Figure 3). Clicking ‘Create’ generates an activation link that you can share with your recipient (Figure 4); the recipient uses it in Section 4 to download the credential file.

Figure 4: Creating a recipient gives you access to an activation link that the recipient requires to access the credential file.

Once you have created the recipient you can check information such as authentication type, activation link and recipient properties by clicking on the details tab of the recipient (Figure 5).

Figure 5: Overview of a created recipient.
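The recipient can also be created from a notebook with the CREATE RECIPIENT statement. The following is a minimal sketch rather than a definitive implementation: build_create_recipient_sql is a hypothetical helper of ours, and the sharing identifier shown in the second call is a made-up placeholder. Without a sharing identifier, the statement creates a recipient that authenticates with an activation link and credential file, as in our example.

```python
import re
from typing import Optional

# Hypothetical helper, not part of any Databricks library.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_create_recipient_sql(
    recipient_name: str, sharing_identifier: Optional[str] = None
) -> str:
    """Return a CREATE RECIPIENT statement.

    Pass sharing_identifier only when the recipient has their own
    Unity Catalog enabled Databricks workspace.
    """
    if not _IDENTIFIER.fullmatch(recipient_name):
        raise ValueError(f"Unsupported recipient name: {recipient_name!r}")
    statement = f"CREATE RECIPIENT IF NOT EXISTS {recipient_name}"
    if sharing_identifier:
        if "'" in sharing_identifier or "\\" in sharing_identifier:
            raise ValueError("Sharing identifier contains unsupported characters")
        statement += f" USING ID '{sharing_identifier}'"
    return statement


# Recipient without a Databricks workspace (the case in this tutorial).
open_recipient_sql = build_create_recipient_sql("opendatasharerecipientttbx")
print(open_recipient_sql)

# Recipient with a Databricks workspace; the identifier is a placeholder.
databricks_recipient_sql = build_create_recipient_sql(
    "some_databricks_recipient",
    "aws:eu-west-1:00000000-0000-0000-0000-000000000000",
)
print(databricks_recipient_sql)

# In a Databricks notebook (assumption: `spark` is predefined there):
# spark.sql(open_recipient_sql)
# spark.sql("DESCRIBE RECIPIENT opendatasharerecipientttbx").show(truncate=False)
```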

3.) Add data table to the share.

Now that you have set up a share and a recipient, assets must be added to the share. These can be any of the data assets in the Databricks workspace: catalogs, schemas, tables, and views.

Figure 6: Adding assets to a Delta Share is as easy as pushing a button.

In this example, we created a single table which we want to share. The Catalog, Schema, and Table name of this single table are:

  • Catalog: devdev
  • Schema: testtbx
  • Table: test3

To add assets to the share, select the share, click on the ‘Add assets’ button in the top-right corner and select whether you want to add an asset or a notebook file (Figure 6). Clicking ‘Add assets’ opens a tab in which you can select the table(s) in the metastore you want to add to the share (Figure 7). Click on ‘Save’ to finalize your choice.

Figure 7: Overview of how you can select any data table within a Unity Catalog enabled workspace the Delta Share is a part of.
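As a notebook alternative to the ‘Add assets’ button, a table can be added with ALTER SHARE ... ADD TABLE. The sketch below builds that statement for our example table; build_add_table_sql is a hypothetical helper name, and the commented-out calls assume a Databricks notebook where `spark` is available.

```python
import re

# Hypothetical helper, not part of any Databricks library.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_add_table_sql(share: str, catalog: str, schema: str, table: str) -> str:
    """Return an ALTER SHARE statement that adds catalog.schema.table."""
    for name in (share, catalog, schema, table):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"Unsupported identifier: {name!r}")
    return f"ALTER SHARE {share} ADD TABLE {catalog}.{schema}.{table}"


add_table_sql = build_add_table_sql(
    share="opendatasharepoc", catalog="devdev", schema="testtbx", table="test3"
)
print(add_table_sql)

# In a Databricks notebook (assumption: `spark` is predefined there):
# spark.sql(add_table_sql)
# spark.sql("SHOW ALL IN SHARE opendatasharepoc").show(truncate=False)
```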

Adding recipients to the share

Figure 8: Before the recipient can access the activation link, you need to add them to the Delta Share.

Now that you have added a data asset, you need to add the data recipient you created in Section 2. In the data share window, click on the “Add recipient” button next to the “Manage assets” button. This opens a window where you can select the recipient(s) you want to add to the share (Figure 8). If all went well, the recipient should appear in the Recipients tab of the Delta Share (Figure 9).

Figure 9: Once the recipient is added, their name is shown in the Recipients Tab.
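Granting a recipient access to a share can likewise be expressed in SQL, and so can revoking it again. This sketch builds both statements for our example names; the helper names are our own, and executing the statements assumes a Databricks notebook with `spark` defined.

```python
import re

# Hypothetical helpers, not part of any Databricks library.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _check(*names: str) -> None:
    """Accept only simple, unquoted identifiers."""
    for name in names:
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"Unsupported identifier: {name!r}")


def build_grant_sql(share: str, recipient: str) -> str:
    """Return the statement that gives a recipient read access to a share."""
    _check(share, recipient)
    return f"GRANT SELECT ON SHARE {share} TO RECIPIENT {recipient}"


def build_revoke_sql(share: str, recipient: str) -> str:
    """Return the statement that removes a recipient's access to a share."""
    _check(share, recipient)
    return f"REVOKE SELECT ON SHARE {share} FROM RECIPIENT {recipient}"


grant_sql = build_grant_sql("opendatasharepoc", "opendatasharerecipientttbx")
revoke_sql = build_revoke_sql("opendatasharepoc", "opendatasharerecipientttbx")
print(grant_sql)
print(revoke_sql)

# In a Databricks notebook (assumption: `spark` is predefined there):
# spark.sql(grant_sql)
# spark.sql("SHOW GRANTS ON SHARE opendatasharepoc").show(truncate=False)
```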

4.) Access the shared data as Data recipient.

Now that we have added the recipient to the share, the recipient can access the data. To provide an example of how to access a data share, we have set up our own data share using the steps described above. An overview is given in Figure 10. For our Delta Share (opendatasharepoc), we will be sharing a single data table named test3 with a data recipient named opendatasharerecipientttbx. To access this data, we will use Power BI. As we have mentioned before, anything that can access the Delta Sharing protocol can access the data share, so tools and programming languages like Tableau, Python, and Spark (and many more) can be used for this purpose depending on your requirements.

Figure 10: Schematic overview of our Proof of Concept, including the names of the Delta Share, Test Table, and Test recipient from this tutorial. This image has been modified from: (Databricks, 2023) ©2023 Databricks Inc. — All rights reserved.

Credential File

When access is granted by the data provider, the data recipient will receive an activation link. Opening this link in any internet browser should show a window like Figure 11. By clicking ‘Download Credential File’, the data recipient can download the credential file. For security reasons, the credential file can only be downloaded once, after which the download link is deactivated. Be sure to notify the recipient about this. In case they lose the credential file, a workaround is possible: the recipient can download it again from a different browser session, provided that the credential file has not yet expired.

Figure 11: When the recipient clicks on the activation link, it opens this web page where the recipient can download the Credential File by clicking the button. This credential file can only be downloaded once, so be sure to mention this to the data recipient.

The credential file is a JSON file (Figure 12), and is structured as follows:

  • bearerToken: the token used to authorize your connection with the data share.
  • endpoint: a URL containing the metastore id, used to make the connection to the data share.
  • expirationTime: the timestamp at which the bearerToken expires.

Figure 12: The contents of the JSON credential file.
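To make the structure of the credential file concrete, the sketch below parses a credential file with the three fields listed above and checks whether the token has expired. All values in EXAMPLE_PROFILE are made-up placeholders, the helper names are our own, and we assume the expiration time is an ISO 8601 timestamp in UTC.

```python
import json
from datetime import datetime, timezone
from typing import Optional

# Placeholder credential content: token, endpoint and date are made up.
EXAMPLE_PROFILE = """
{
  "bearerToken": "placeholder-token",
  "endpoint": "https://sharing.example.com/delta-sharing",
  "expirationTime": "2030-01-01T00:00:00.000Z"
}
"""


def parse_expiration(profile: dict) -> datetime:
    """Convert the expirationTime field into a timezone-aware datetime."""
    raw = profile["expirationTime"]
    # Older Python versions do not accept a trailing "Z", so normalise it.
    return datetime.fromisoformat(raw.replace("Z", "+00:00"))


def is_expired(profile: dict, now: Optional[datetime] = None) -> bool:
    """Return True when the bearer token is no longer valid at `now`."""
    now = now or datetime.now(timezone.utc)
    return parse_expiration(profile) <= now


profile = json.loads(EXAMPLE_PROFILE)
print("Fields:", sorted(profile))
print("Expires at:", parse_expiration(profile).isoformat())
print("Expired today:", is_expired(profile))
```

A check like this can be useful for a recipient who wants to know in advance when to ask the provider for a new credential file.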

Access Data Share with Power BI

Figure 13: To use the endpoint and Bearer Token in Power BI, click on “Get Data” and select the “Delta Sharing” option.

With this JSON file, it is easy to connect to the Delta Sharing source with Power BI. Click on the ‘Get Data’ button in Power BI, select ‘Delta Sharing’ from the different data source options, and click ‘Connect’ (see Figure 13).

Figure 14: Adding a Delta share can be done in four steps: Add the endpoint URL (1), click OK (2) and copy the Bearer Token (3). Finally click connect (4) and you should be able to access the data.

A new window opens, in which you can connect to the Data Share in four simple steps (Figure 14). First, copy the endpoint URL from the JSON file and paste it into the Delta Sharing Server URL field (1). Click on OK (2). Power BI then requests the Bearer Token. Copy it from the JSON file and paste it here (3). Finally, click ‘Connect’ (4). If everything goes well, the dataset within the Data Share should now be available in Power BI (Figure 15).

Figure 15: With all steps complete, you are now able to access the data within the Delta Share without needing a Databricks authorization.
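Power BI is only one possible client. As an illustration of the Python route, the sketch below uses the open source delta-sharing package to load our example table into a pandas DataFrame. It rests on a few assumptions: the credential file was saved as config.share, the package is installed with `pip install delta-sharing`, and the table is exposed under the schema name testtbx. The helper names are our own.

```python
def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the '<profile-file>#<share>.<schema>.<table>' address
    that the delta-sharing client expects."""
    return f"{profile_path}#{share}.{schema}.{table}"


def load_shared_table(profile_path: str, share: str, schema: str, table: str):
    """Load a shared table as a pandas DataFrame.

    Needs network access and a valid, unexpired credential file.
    """
    # Imported lazily so table_url works without the package installed.
    import delta_sharing

    return delta_sharing.load_as_pandas(
        table_url(profile_path, share, schema, table)
    )


# "config.share" is an assumed file name for the downloaded credential file.
url = table_url("config.share", "opendatasharepoc", "testtbx", "test3")
print(url)

# Uncomment once the credential file is in place:
# df = load_shared_table("config.share", "opendatasharepoc", "testtbx", "test3")
# print(df.head())
```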

Conclusion.

In this article we have given a step-by-step manual that shows you how to create a share and a recipient, and how to access the data within the Delta Share using Power BI. If used correctly, Delta Sharing could significantly simplify data sharing within your organization and reduce the number of people with direct access to your Databricks environment. Furthermore, while we only showed how to access the data with Power BI, this protocol can be used to share data with other clients, both open source and commercial. We highly recommend trying Delta Sharing for yourself to see what it can do for you.

Be mindful

While the Delta Sharing protocol is an amazing tool to quickly share data, there are some things you need to consider before using it. First, keep in mind that the credential file you send to the data recipient is the recipient’s key to the shared data. While handing over a credential file is easier than managing access permissions, this convenience also brings risks. If the credential file falls into the wrong hands, those people have access to the data inside the shared table. This data exposure could be significant if you decided to share sensitive data with this protocol. Secondly, the Delta Sharing protocol uses public servers for authentication, which might be an additional concern when sharing sensitive data.

It is important to be mindful of these possible risks and use common sense to reduce them: share the links only with trustworthy recipients, notify the recipients to not share their credential file, set an expiration date on the credential file suitable for your purposes, and do not share any sensitive data with this protocol.
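To illustrate why the credential file deserves this care, the sketch below shows that the endpoint and bearer token are all a client needs to address the sharing server. It builds, but deliberately does not send, the request that lists the available shares in the open Delta Sharing REST protocol. The endpoint and token are made-up placeholders and build_list_shares_request is a hypothetical helper of ours.

```python
import urllib.request

# Placeholder values; a real credential file contains a real token.
profile = {
    "bearerToken": "placeholder-token",
    "endpoint": "https://sharing.example.com/delta-sharing",
}


def build_list_shares_request(profile: dict) -> urllib.request.Request:
    """Prepare (but do not send) the request that lists the shares."""
    url = profile["endpoint"].rstrip("/") + "/shares"
    # The bearer token is the only credential attached to the request.
    headers = {"Authorization": f"Bearer {profile['bearerToken']}"}
    return urllib.request.Request(url, headers=headers)


request = build_list_shares_request(profile)
print(request.get_method(), request.full_url)
print("Authorization header present:", request.has_header("Authorization"))
```

Nothing in this request identifies who is sending it, which is why a leaked credential file is equivalent to leaked data until the token expires or is revoked.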

References

Databricks. (2023, May 26). Introducing Delta Sharing: An Open Protocol for Secure Data Sharing. Retrieved from Databricks: https://www.databricks.com/blog/2021/05/26/introducing-delta-sharing-an-open-protocol-for-secure-data-sharing.html

This is the second part of a two-part Medium article about Delta Sharing from a Databricks workspace. In the first part we describe what exactly Delta Sharing is. Interested readers are encouraged to read the first part as well for the full picture: https://medium.com/nntech/delta-sharing-is-caring-how-the-delta-sharing-protocol-can-save-the-day-e824e868bf65

These articles were co-written with a group of amazing people whose contributions I want to acknowledge:

Luiz Izidorio Vidal,
Stijn Mohr,
Ton Brokx,
Massimo Iannelli, and
Francisco Mercado Rueda.
