Securely share unstructured data with your stakeholders using the Snowflake Data Cloud

Photo by Chris Liverani on Unsplash

Snowflake recently launched unstructured data management support in public preview. Data sharing is one of the key functionalities in the Snowflake Data Cloud, using which customers can share structured and semi-structured data with their clients, partners, or customers. The latest launch of unstructured data management enables customers to securely share unstructured data as well with their stakeholders.

The unstructured data management feature takes Snowflake to the next level and customers can now store ALL of their data in Snowflake.

In the rest of the blog, let’s see a simple scenario where an administrator in an organization wants to securely share unstructured data files with various partners that they do business with.

Let’s consider that the central data lake team in an organization is managing an invoices dataset. They want to share invoices related to supplierA with the analysts in the supplierA organization, share invoices related to supplierB with the analysts in the supplierB organization, and so on.

Here is a simple architecture on how they can achieve this

  1. Store invoices dataset in producer account (owned by the organization)
  2. Securely share the invoices dataset with the consumer account via a secure view (owned by the organization).
  3. In the consumer account, create a mapping table to map roles and unstructured data.
  4. Create a row-access policy and attach it to the secure view on unstructured data.

That’s it.. and when the analysts access the secure view, they will get access only to unstructured data that they are supposed to.

Here is the sample code

SQL code for producer account

use database sshah;
use schema unstructured_data;
create stage invoices directory = (enable = true);
alter stage invoices refresh;
select * from directory(@invoices);-- Create table to store invoice metadata
create or replace table invoice_info
(
relative_path string,
order_number string,
supplier string,
estimated_tax number(10,2),
grand_total number(10,2),
items_sub_total number (10,2),
order_placed date,
order_total number(10,2),
shipping_and_handling number(10,2),
sold_by string,
total_before_tax number(10,2)
);
create or replace secure view invoice_info_v
as
select
build_scoped_file_url(@invoices, relative_path) as scoped_url,
*
from
invoice_info
;
-- Now create a share to share the unstructured data with consumer accountcreate or replace share invoice_info;
grant usage on database sshah to share invoice_info;
grant usage on schema sshah.unstructured_data to share invoice_info;
grant select on view sshah.unstructured_data.invoice_info_v to share invoice_info;

alter share invoice_info ADD ACCOUNTS = <consumer>;

SQL code for consumer account

-- Create database from share
create database invoices_share from share <producer>.invoice_info;
select * from invoice_info_v;-- Create another database to apply row access policies
create database invoices_share_p;
create schema unstructured_data;
-- create a shim view on top of the secure view
create view invoice_info_v as
select * from invoices_share.unstructured_data.invoice_info_v;
-- Create mapping table
create table supplier_role_mapping (
role varchar,
supplier varchar
);
-- Create row access policies
create row access policy supplier_role_policy as (supplier_role varchar) returns boolean ->
'ACCOUNTADMIN' = current_role()
or exists (
select 1 from supplier_role_mapping
where role = current_role()
and supplier = supplier_role
)
;
select * from invoice_info_v;alter view invoice_info_v add row access policy supplier_role_policy on (supplier);-- Test out
insert into supplier_role_mapping values ('SUPPLIER_A_ROLE', 'supplierA');
-- Now when a user with SUPPLIER_A_ROLE role tries to access unstructured data using invoice_info_v, the user will only see files that are related to supplierA.

Try out this simple scenario and let us know what you think! You can use these building blocks to implement a lot of complex scenarios that use data sharing, unstructured data, and data governance.

--

--

Saurin Shah
Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

Product @ Snowflake. Passionate about building products to help customers derive more value out of their data