Data Sharing using Databricks and Oracle Autonomous Database (Part 2: Databricks as a consumer)

Laurent Leturgez
Apr 22, 2024


In a previous story, we saw how to consume data shared by Databricks through the Delta Sharing protocol from within an Oracle Autonomous Database instance.

As Delta Sharing is based on an open protocol, every vendor can use it. We already mentioned that Oracle has implemented the Delta Sharing protocol within its Autonomous Database offering. In this second part, the Autonomous Database acts as the data provider and Databricks as the consumer.

When Oracle Autonomous Database is used as a data provider, the database has to be configured with an object store attached. When a table is shared, its data is copied as Parquet datasets into that object store. (Note that this can have an impact on costs, especially for big data sets, as you’ll pay both for the Autonomous Database storage reservation and for the object store data.)

So the first step is to create an object store and associate a storage link with it using the right credentials. Then we’ll create a share referencing the shared tables, and finally create a recipient and associate it with the share.

The last step will be to download the Delta Sharing credentials file.

CREATE AN OBJECT STORE ON OCI AND ASSOCIATE A STORAGE LINK

To do so, you’ll need to create an object store (bucket) on OCI and then get the URL based on the location of your bucket, its name, and the namespace of this bucket.

Once you have the bucket URI (e.g. https://objectstorage.eu-frankfurt-1.oraclecloud.com/n/<namespace>/b/<bucketName>/o/ for a bucket located in the Frankfurt region), you can create a storage link pointing to this location. In the following example, I created a storage link named 'DATA_SHARE_STORAGE_LINK'.

BEGIN
DBMS_SHARE.CREATE_CLOUD_STORAGE_LINK(
STORAGE_LINK_NAME => 'DATA_SHARE_STORAGE_LINK',
URI => 'https://objectstorage.eu-frankfurt-1.oraclecloud.com/n/<namespace>/b/<bucketName>/o/');
END;
/

The next step is to create credentials / signing keys associated with your OCI Cloud user. I will not deep dive into this, but the goal is to gather four pieces of information: the user OCID, the tenancy OCID, the fingerprint, and the private key of the signing keys. For more information, you can refer to the Oracle documentation: https://docs.oracle.com/en-us/iaas/Content/API/Concepts/apisigningkey.htm#two

In the following snippet, executed as the user who will own the share (LAURENT), I created a credential object named DATASHARE_BUCKET_CREDENTIAL that bundles the information mentioned just before. This is a minimal sketch based on the DBMS_CLOUD.CREATE_CREDENTIAL procedure; the placeholder values are assumptions to replace with your own OCIDs and key material:
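BEGIN
-- placeholder values: replace with your own OCIDs, private key, and fingerprint
DBMS_CLOUD.CREATE_CREDENTIAL(
credential_name => 'DATASHARE_BUCKET_CREDENTIAL',
user_ocid => 'ocid1.user.oc1..<your user OCID>',
tenancy_ocid => 'ocid1.tenancy.oc1..<your tenancy OCID>',
private_key => '<content of the private signing key>',
fingerprint => '<fingerprint of the signing key>');
END;
/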

To check that the credentials are correctly set to read the bucket, we can use the following query, which lists the content of the bucket.

SELECT OBJECT_NAME, BYTES
FROM
DBMS_CLOUD.LIST_OBJECTS(
credential_name=>'DATASHARE_BUCKET_CREDENTIAL',
LOCATION_URI => 'https://objectstorage.eu-frankfurt-1.oraclecloud.com/n/<namespace>/b/<bucketName>/o/');


OBJECT_NAME BYTES
-------------------------------------------------------------------------- -----
test-adb/ 0
test-adb/config.share 289
test/ 0

We are now ready to associate the credentials with the storage link, and then we’ll be ready to move on to the data share creation.

BEGIN
DBMS_SHARE.set_storage_credential
(storage_link_name=>'DATA_SHARE_STORAGE_LINK',
credential_name=>'DATASHARE_BUCKET_CREDENTIAL');
END;
/

CREATE A DATA SHARE AND ADD SOME ASSETS

To create a share, we’ll use the CREATE_SHARE procedure from the DBMS_SHARE package, providing a name and a share type.

If the share type is 'VERSIONED', the share will be used to share versioned data; otherwise, set it to 'LIVE'.

BEGIN
DBMS_SHARE.CREATE_SHARE(
share_name=>'DEMO_SHARE',
share_type=>'LIVE');
END;
/

As with Databricks, as soon as the share is created, we can associate some tables with it. In our case, I added two tables from LAURENT’s schema (REGIONS and USERS). To simplify the process, these tables are exposed under their original names (as set in the share_table_name parameter):

BEGIN
DBMS_SHARE.ADD_TO_SHARE(
share_name=>'DEMO_SHARE',
owner=> 'LAURENT',
table_name=> 'USERS',
share_table_name=> 'USERS');


DBMS_SHARE.ADD_TO_SHARE(
share_name=>'DEMO_SHARE',
owner=> 'LAURENT',
table_name=> 'REGIONS',
share_table_name=> 'REGIONS');
END;
/

The content of the share can be double-checked with the following query:

SELECT share_name,share_table_name, table_name, table_owner
FROM user_share_tables;


SHARE_NAME SHARE_TABLE_NAME TABLE_NAME TABLE_OWNER
---------- ---------------- ---------- -----------
DEMO_SHARE USERS USERS LAURENT
DEMO_SHARE REGIONS REGIONS LAURENT

Finally, the share can be published, and we’ll be ready to move on to the recipient creation.

BEGIN
DBMS_SHARE.PUBLISH_SHARE(share_name=>'DEMO_SHARE');
END;
/

CREATE A RECIPIENT

Creating a recipient is a simple operation that associates a recipient name with an email address.

BEGIN
DBMS_SHARE.CREATE_SHARE_RECIPIENT(
recipient_name => 'DATABRICKS_RECIPIENT',
email => 'laurent.leturgez@xxxx.com');
END;
/

The USER_SHARE_RECIPIENTS view contains information about the created recipient.

SELECT recipient_name, updated
FROM user_share_recipients;


RECIPIENT_NAME UPDATED
-------------------- ---------------------------
DATABRICKS_RECIPIENT 2024-04-22T07:47:56.537958Z

Recipients are also associated with some properties. In the following block, I updated the lifetime of the token and added a description to the created recipient. The TOKEN_LIFETIME value uses the 'DD HH:MM:SS' interval format, so '90 00:00:00' sets a 90-day token lifetime.

BEGIN
DBMS_SHARE.UPDATE_RECIPIENT_PROPERTY (
recipient_name => 'DATABRICKS_RECIPIENT',
recipient_property => 'TOKEN_LIFETIME',
new_value => '90 00:00:00');


DBMS_SHARE.UPDATE_RECIPIENT_PROPERTY (
recipient_name => 'DATABRICKS_RECIPIENT',
recipient_property => 'DESCRIPTION',
new_value => 'A recipient to share with databricks');

commit;
END;
/

Now that we have a recipient and a share, we just have to grant the share to the recipient. This is done by executing the following PL/SQL block, and it can be verified with the query given after it.

BEGIN
DBMS_SHARE.GRANT_TO_RECIPIENT(
share_name=>'DEMO_SHARE',
recipient_name=> 'DATABRICKS_RECIPIENT',
AUTO_COMMIT=>true);
END;
/


SELECT recipient_name, share_name
FROM user_share_recipient_grants
WHERE recipient_name ='DATABRICKS_RECIPIENT';


RECIPIENT_NAME SHARE_NAME
-------------------- ----------
DATABRICKS_RECIPIENT DEMO_SHARE

Finally, we have to get the activation link associated with this recipient and use it to download the credential file.

BEGIN
DBMS_OUTPUT.PUT_LINE(DBMS_SHARE.GET_ACTIVATION_LINK
('DATABRICKS_RECIPIENT'));
END;
/
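Note that DBMS_OUTPUT only prints the link when server output is enabled in your client; in SQL*Plus or SQLcl, run the following first:

SET SERVEROUTPUT ON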

The procedure prints an HTTPS activation link; opening it in a browser lets the recipient download the Delta Sharing credential file (a JSON profile).

Once the credential file is downloaded, transfer it to Databricks (or share it with your stakeholder/partner). In my example, I put it on a Unity Catalog volume.
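As a sketch, assuming the file was first uploaded to the driver’s local filesystem (the path /tmp/OCI_delta_share_profile.json is an assumption), it can be copied to the volume from a notebook cell with dbutils:

# hypothetical local path; copy the profile to the Unity Catalog volume
dbutils.fs.cp(
    "file:/tmp/OCI_delta_share_profile.json",
    "/Volumes/laurent_adb/data_consumer/credentials/OCI_delta_share_profile.json")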

CONSUME THE SHARE ON A DATABRICKS PySpark NOTEBOOK

Now, on Databricks, we can use the following code to list the tables contained in the share and then create a DataFrame based on it. First, install the delta-sharing Python library and restart the Python interpreter:

%pip install delta-sharing
dbutils.library.restartPython()

Then, in another cell, instantiate a Delta Sharing client object, providing the location of the credential file downloaded from the activation link delivered by Oracle in the previous step.

import delta_sharing
client = delta_sharing.SharingClient("/Volumes/laurent_adb/data_consumer/credentials/OCI_delta_share_profile.json")
client.list_all_tables()

In our case, it gives an output containing, for each shared table, the share name, the schema, and the table name to use.
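Given the share defined earlier, the output should look similar to the following (ordering may differ):

[Table(name='REGIONS', share='DEMO_SHARE', schema='LAURENT'), Table(name='USERS', share='DEMO_SHARE', schema='LAURENT')]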

Finally, we can write some PySpark code that leverages the Delta Sharing profile to read a table shared from Oracle:

df = spark.read.format("deltaSharing").load("/Volumes/laurent_adb/data_consumer/credentials/OCI_delta_share_profile.json#DEMO_SHARE.LAURENT.USERS")
display(df)

Once executed, it displays the content of the shared USERS table as a DataFrame.
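Note that the shared table can also be read without Spark: the delta-sharing library provides a load_as_pandas function that accepts the same '<profile>#<share>.<schema>.<table>' URL. A minimal sketch:

import delta_sharing

# load the shared table directly into a pandas DataFrame
pdf = delta_sharing.load_as_pandas(
    "/Volumes/laurent_adb/data_consumer/credentials/OCI_delta_share_profile.json#DEMO_SHARE.LAURENT.USERS")
print(pdf.head())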

If you go back to the Oracle Autonomous Database instance and look at the content of the bucket, you will see that it has been updated with new Parquet files containing the shared data:

SELECT OBJECT_NAME, BYTES
FROM
DBMS_CLOUD.LIST_OBJECTS(
credential_name=>'DATASHARE_BUCKET_CREDENTIAL',
LOCATION_URI => 'https://objectstorage.eu-frankfurt-1.oraclecloud.com/n/<namespace>/b/<bucketName>/o/');

OBJECT_NAME BYTES
-------------------------------------------------------------------------- -----
DEMO_SHARE_851142/LAURENT/REGIONS/V21_2S_1_20240422T090906241514Z.parquet 781
DEMO_SHARE_851142/LAURENT/REGIONS/V42_21S_1_20240422T091335438608Z.parquet 769
DEMO_SHARE_851142/LAURENT/REGIONS/V43_22S_1_20240422T091422103372Z.parquet 747
DEMO_SHARE_851142/LAURENT/USERS/V43_1S_1_20240422T091818194320Z.parquet 1230
DEMO_SHARE_851142/LAURENT/USERS/V44_23S_1_20240422T092411814382Z.parquet 1248
DEMO_SHARE_851142/LAURENT/USERS/V45_24S_1_20240422T093854824398Z.parquet 1220
test-adb/ 0
test-adb/config.share 289
test/ 0

This concludes this two-part series on how to share data in both directions between Databricks and Oracle Autonomous Database.
