Delta Sharing is Caring: How the Delta Sharing Protocol can save the day

Share data from your Databricks environment with (external) recipients who do not have access to a Databricks cluster.

Thijs de Goede
NN Tech
3 min read · Mar 28, 2024


Picture this: you are working as a Data Engineer at a large, successful company. The data analysts on your team have an important deadline coming up for a large Report built on the data you maintain in your shared Databricks workspace. On the day of the deadline, you set everything up so that your colleagues can refresh their Data Model one last time and finish their Report that evening.

Unforeseen Circumstances

Then disaster strikes. For some unknown reason, a single data Table failed to load during the refresh and is missing from the Report just 30 minutes before the deadline. Although small, this Table links all the other Tables in the Report together, so the Report is worthless without it. The queries that create this Table are simple, but the only way to add it to the Report is to refresh the whole Data Model, which would take too long to finish before the deadline. All hope seems lost. But because you recently enabled Unity Catalog on your Databricks workspace, you think you might have the solution: the Delta Sharing protocol.

What is the Delta Sharing Protocol?

This open-source protocol builds on Delta Lake, the open-source storage framework project governed by the Linux Foundation. You know that Delta Sharing is supported by a wide range of open-source and commercial clients, business intelligence and analytics tools, governance solutions, cloud providers, and SaaS/multi-cloud infrastructure integrations from both commercial and open-source third-party vendors. Its main selling points are that it is easy to use, lets you share data directly without copying it, scales to large datasets, and comes with built-in auditing and governance.

Figure 1: Schematic overview of the Delta Sharing protocol. This schematic uses AWS cloud services as an example, but it also works for sharing Parquet files from Azure Storage Accounts. Image Source: (Databricks, 2023). ©2023 Databricks Inc. — All rights reserved.

You remember how the Delta Sharing protocol works: a data recipient's client authenticates to the data provider's server, normally with a bearer token, and asks the server to query a specific Table. The server checks whether the client is authorized to read the data, logs the request, and determines which data to send back. It then generates a set of pre-signed URLs that let the client read the Parquet files directly from the cloud provider's storage. This process of Parquet file sharing is shown in Figure 1. You therefore know that you can share data with someone simply by giving them the credentials, even if they do not have access to a Databricks cluster. This sounds like the solution to your current crisis.
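To make this request/response flow more concrete, below is a minimal Python sketch of the recipient side, using only the standard library. It assumes the REST layout described in the open Delta Sharing protocol specification: a POST to /shares/{share}/schemas/{schema}/tables/{table}/query carrying the bearer token, answered by newline-delimited JSON in which each "file" line holds a pre-signed "url". The endpoint, token, and share/schema/table names are hypothetical placeholders, and the response shown is simplified. The sketch only builds the request and parses a response; it does not send anything over the network.

```python
import json
import urllib.request
from typing import Any, Dict, List


def build_query_request(profile: Dict[str, Any], share: str, schema: str,
                        table: str, limit_hint: int = 1000) -> urllib.request.Request:
    """Build (but do not send) the request a client makes to query one shared table."""
    endpoint = profile["endpoint"].rstrip("/")
    url = f"{endpoint}/shares/{share}/schemas/{schema}/tables/{table}/query"
    # limitHint is optional and only a hint; the server may return more data.
    body = json.dumps({"limitHint": limit_hint}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            # The recipient authenticates with the bearer token from its credential file.
            "Authorization": f"Bearer {profile['bearerToken']}",
            "Content-Type": "application/json; charset=utf-8",
        },
    )


def presigned_urls(ndjson_response: str) -> List[str]:
    """Extract the pre-signed Parquet file URLs from a newline-delimited JSON response."""
    urls = []
    for line in ndjson_response.splitlines():
        if not line.strip():
            continue
        action = json.loads(line)
        # 'protocol' and 'metaData' lines describe the table; 'file' lines point at data.
        if "file" in action:
            urls.append(action["file"]["url"])
    return urls


# Hypothetical credential (profile) values, for illustration only.
profile = {
    "shareCredentialsVersion": 1,
    "endpoint": "https://sharing.example.com/delta-sharing/",
    "bearerToken": "<token-from-credential-file>",
}
request = build_query_request(profile, "report_share", "reporting", "link_table")
print(request.full_url)
# https://sharing.example.com/delta-sharing/shares/report_share/schemas/reporting/tables/link_table/query

sample_response = "\n".join([
    '{"protocol": {"minReaderVersion": 1}}',
    '{"metaData": {"id": "tbl-1", "format": {"provider": "parquet"}}}',
    '{"file": {"url": "https://bucket.example.com/part-0.parquet?sig=abc", "id": "f1"}}',
])
print(presigned_urls(sample_response))
# ['https://bucket.example.com/part-0.parquet?sig=abc']
```

The client then downloads each returned URL directly from cloud storage. This is why the recipient never needs a Databricks cluster: the server only authorizes the request and hands out short-lived links.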

Saving the day

You spring into action. You set up the missing Table and add it to a Delta Share. You grant your colleagues the necessary permissions and send them the credential file they need to read the Table. With the Delta Share, the analysts finish the Report and deploy it in the final seconds before the deadline. All thanks to your quick thinking and the Delta Sharing protocol.
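The provider-side steps in that rescue (create a share, add the Table, create a recipient, grant access) map onto a handful of Unity Catalog SQL statements. The Python helper below is a hypothetical sketch that only assembles those statements. The share, table, and recipient names are made-up placeholders, and the names are assumed to be plain identifiers that need no quoting.

```python
from typing import List


def delta_share_statements(share: str, table: str, recipient: str) -> List[str]:
    """Return Unity Catalog SQL statements that share one table with one recipient.

    `table` must be a three-level name (catalog.schema.table).
    """
    if table.count(".") != 2:
        raise ValueError("expected a catalog.schema.table name")
    return [
        # 1. Create the share: a named, read-only collection of tables.
        f"CREATE SHARE IF NOT EXISTS {share}",
        # 2. Add the (freshly rebuilt) table to the share.
        f"ALTER SHARE {share} ADD TABLE {table}",
        # 3. Create a recipient; for open sharing, Databricks issues an activation
        #    link from which the recipient downloads the credential file.
        f"CREATE RECIPIENT IF NOT EXISTS {recipient}",
        # 4. Allow the recipient to read the tables in the share.
        f"GRANT SELECT ON SHARE {share} TO RECIPIENT {recipient}",
    ]


for stmt in delta_share_statements("report_share", "main.reporting.link_table", "analysts"):
    print(stmt)
```

In a Databricks notebook you would run each statement with spark.sql(stmt) instead of printing it. On the receiving end, the open-source delta-sharing Python connector can then read the Table from the downloaded credential file, for example with delta_sharing.load_as_pandas("config.share#report_share.reporting.link_table"). This assumes the credential file is saved as config.share and that the table keeps its default schema.table name inside the share.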

This story shows how useful the Delta Sharing protocol can be as a tool for sharing data. It helps with quick last-minute shares, and also when the recipient has no direct access to a Databricks cluster and only needs a few data Tables. Do you want to learn how to set up your own Delta Shares with Unity Catalog? We have written a detailed step-by-step guide to setting up a Delta Share and sharing it with any data recipient. You can find that article here: https://medium.com/nntech/delta-sharing-is-caring-how-to-set-up-a-delta-share-with-databricks-1f541b66c66e.

This is the first part of a two-part article about Delta Sharing with Databricks. The second part describes in detail how to create a Delta Share, and interested readers are highly encouraged to read it.

These articles were co-written with a group of amazing people whose contributions to this article I want to acknowledge:

Luiz Izidorio Vidal,
Stijn Mohr,
Ton Brokx,
Massimo Iannelli, and
Francisco Mercado Rueda.
