Best Practices for Data Sharing On Analytics Hub

Thakurswati
Google Cloud - Community
Feb 27, 2023

Analytics Hub is a data exchange platform that enables you to share data and insights at scale across organizational boundaries with a robust security and privacy framework. With Analytics Hub, you can discover and access a data library curated by various data providers.

If you're familiar with Analytics Hub, Google Cloud Platform's data sharing platform, you likely know how easily it lets you share analytics-ready datasets without having to copy the data.

Analytics Hub Publisher-Subscriber workflow

As data product owners, we want to ensure that only the users who should have access to a specific dataset have permission to use that data.

When sharing data, it is important to keep fine-grained access control as the guiding principle. At the same time, subscribers must be able to access the data shared with them without requiring an additional, broader role.

In this blog, we will discuss a few of the important considerations that can help you architect data sharing on Analytics Hub more effectively.

Share Views in Authorized Datasets

Sharing views directly in a shared dataset from a publisher project might not let your subscriber access the view's data, because the subscriber does not have access to the underlying table that the view is built on.

The subscriber is not able to query the view that the publisher placed in the shared dataset

Granting the subscriber additional access to the parent dataset is not a good way to solve this, because it gives the subscriber a broader role than the shared view requires.

To let subscribers query views whose underlying tables live in a different dataset from the shared dataset, authorize the shared dataset on the parent dataset.

An authorized dataset lets you authorize all of the views in a specified dataset to access the data in a second dataset. With an authorized dataset, you don't need to configure individual authorized views.
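As a rough sketch of what this authorization looks like, the snippet below builds the access entry that would be appended to the parent dataset's access list, following the shape of the `access` field in the BigQuery REST dataset resource. The names `publisher-project` and `shared_ds` are placeholders, not values from this post, and applying the updated list to the parent dataset (through the console, the `bq` tool, or a client library) is deliberately left out.

```python
import json


def authorized_dataset_entry(project_id: str, dataset_id: str) -> dict:
    """Build an access entry that authorizes all views in the given dataset."""
    return {
        "dataset": {
            "dataset": {"projectId": project_id, "datasetId": dataset_id},
            "targetTypes": ["VIEWS"],
        }
    }


def add_authorization(parent_access: list, entry: dict) -> list:
    """Return a new access list with the entry appended, skipping duplicates."""
    if entry in parent_access:
        return list(parent_access)
    return list(parent_access) + [entry]


# Hypothetical names: the shared dataset that holds the views.
entry = authorized_dataset_entry("publisher-project", "shared_ds")

# Simplified stand-in for the parent dataset's existing access list.
parent_access = [{"role": "OWNER", "specialGroup": "projectOwners"}]

updated = add_authorization(parent_access, entry)
print(json.dumps(updated, indent=2))
```

The key point is that the authorization is recorded on the parent dataset and names the shared dataset, so the subscriber never needs a role on the parent dataset itself.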

Authorizing the shared dataset from the parent dataset (test_csv), which contains the table referenced by the view in the shared dataset
Publisher added authorization for the shared dataset

Share External Tables as BigLake Tables

When a publisher shares an external table, the subscriber might not be able to query or view its data through the linked dataset, because the subscriber has no access to the underlying GCS bucket object.

The subscriber is not able to query the external table provided in the shared dataset by the publisher

The recommended approach is therefore to share the external table as a BigLake table. This lets the subscriber query the external table without being granted any extra access to the GCS bucket object.

BigLake tables let you query structured data in external data stores with access delegation. Access delegation decouples access to the BigLake table from access to the underlying data store.
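To make this concrete, here is a minimal sketch that assembles the DDL a publisher could run to create a BigLake table over CSV files in GCS. The table, connection, and bucket names are hypothetical, and the sketch assumes a Cloud resource connection already exists whose service account can read the bucket. The snippet only builds the statement; running it in BigQuery is left to the reader.

```python
def biglake_table_ddl(table: str, connection: str, uris: list, fmt: str = "CSV") -> str:
    """Build a CREATE EXTERNAL TABLE statement that uses a connection,
    which is what makes the external table a BigLake table."""
    uri_list = ", ".join(f"'{uri}'" for uri in uris)
    return (
        f"CREATE EXTERNAL TABLE `{table}`\n"
        f"WITH CONNECTION `{connection}`\n"
        f"OPTIONS (format = '{fmt}', uris = [{uri_list}]);"
    )


# All identifiers below are placeholders, not values from this post.
ddl = biglake_table_ddl(
    table="publisher-project.shared_ds.sales_biglake",
    connection="publisher-project.us.gcs-conn",  # project.region.connection_id
    uris=["gs://example-bucket/sales/*.csv"],
)
print(ddl)
```

Because the connection's service account reads the files, the subscriber only needs access to the table through the linked dataset, not to the bucket.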

Publisher creating a BigLake table over an external data source, i.e. a GCS object
Subscriber accessing the BigLake table without requiring access to the underlying GCS object (left). In contrast, the subscriber is not able to access data in a regular external table without access to the GCS bucket (right)

Column Level Filtering

As a recommended practice, to avoid sharing sensitive columns with the subscriber, you can implement column-level filtering on a shared table using policy tags.

Applying policy tags to a source-layer table lets you share a subset of its columns and keeps sensitive or unneeded columns from being shared with the subscriber.

BigQuery provides fine-grained access to sensitive columns using policy tags, or type-based classification, of data. Using BigQuery column-level access control, you can create policies that check, at query time, whether a user has proper access.
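As an illustration, the sketch below attaches a policy tag to selected columns of a table schema, using the `policyTags` field from the BigQuery REST table schema. The column names and the policy tag resource name are made up for the example, and the taxonomy and policy tag are assumed to exist already. Updating the actual table with the tagged schema is not shown.

```python
def tag_columns(schema: list, policy_tag: str, sensitive: set) -> list:
    """Return a copy of the schema with the policy tag attached to sensitive columns."""
    tagged = []
    for field in schema:
        field = dict(field)  # copy so the original schema is left untouched
        if field["name"] in sensitive:
            field["policyTags"] = {"names": [policy_tag]}
        tagged.append(field)
    return tagged


# Hypothetical policy tag resource name and table schema.
PII_TAG = "projects/publisher-project/locations/us/taxonomies/1234/policyTags/5678"
schema = [
    {"name": "state", "type": "STRING"},
    {"name": "email", "type": "STRING"},
    {"name": "order_total", "type": "NUMERIC"},
]

tagged = tag_columns(schema, PII_TAG, sensitive={"email"})
for field in tagged:
    print(field)
```

In this example a subscriber without a reader role for the tag could still select `state` and `order_total`, while queries that touch `email` would be denied.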

Policy tags applied to selected columns of a table in the shared dataset
The subscriber is not able to access any of the policy-tagged columns in the linked dataset.

Note, however, that the subscriber will not be able to access any policy-tagged column in the linked dataset unless granted the Fine-Grained Reader or Masked Reader role for that tagged column.

Note: The publisher needs to grant the Fine-Grained Reader or Masked Reader role to the subscriber individually for each policy-tagged column the subscriber requires (for example, columns holding low- or medium-sensitivity data).

Row Level Filtering

To avoid sharing entire sets of records with the subscriber, a good practice is to implement row-level access policies. These filter out the records that are not relevant to the subscriber.

Row-level security lets you filter data and enables access to specific rows in a table based on qualifying user conditions. The row-level access policies act as filters to hide or display certain rows of data, depending on whether a user or group is in an allowed list.

Note: Row-level access policies need to be created explicitly for the subscriber; they are not inherited if created only for the publisher.
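A minimal sketch of such a policy is shown below. It builds a `CREATE ROW ACCESS POLICY` statement that names the subscriber as a grantee and filters on the `state` column used in this post's example. The policy name, table name, and subscriber email are placeholders, and executing the statement in BigQuery is left out.

```python
def row_access_policy_ddl(policy: str, table: str, grantees: list, filter_expr: str) -> str:
    """Build a CREATE ROW ACCESS POLICY statement for the given grantees."""
    grantee_list = ", ".join(f"'{g}'" for g in grantees)
    return (
        f"CREATE ROW ACCESS POLICY {policy}\n"
        f"ON `{table}`\n"
        f"GRANT TO ({grantee_list})\n"
        f"FILTER USING ({filter_expr});"
    )


# Hypothetical policy, table, and subscriber identity.
ddl = row_access_policy_ddl(
    policy="ny_rows_for_subscriber",
    table="publisher-project.shared_ds.customers",
    grantees=["user:subscriber@example.com"],
    filter_expr="state = 'New York'",
)
print(ddl)
```

Listing the subscriber in `GRANT TO` is what makes the filtered rows visible to them; a policy that names only the publisher leaves the subscriber with no rows.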

Row-level access policy created for the publisher, providing access only to records with state = 'New York'
The subscriber is not able to access data in the same table in the shared dataset

Row-level access policy created for the subscriber

Row-level access policy created explicitly for the subscriber
Only rows for the state 'New York' are now accessible to the subscriber

In conclusion, when sharing data on Analytics Hub, it is crucial to follow the best practice that fits each type of BigQuery object you share.

It is also important to consider factors such as permission and role settings, and how views are shared, so that subscribers can access the data they need. By implementing these practices, organizations can ensure that data sharing through Analytics Hub is secure, efficient, and effective!
