Column Level Security in BigQuery

Leo Zhang
6 min readOct 21, 2020

--

Exploring GCP Data Catalog Policy Tags

This article explains the GCP feature: Data Catalog Policy Tags, and how it can help manage access control to sensitive columns in BigQuery tables based on data classes (email, financials, etc.).

Creating Policy Tags

BigQuery provides fine-grained access to sensitive columns using policy tags, or type-based classification, of data. Using BigQuery Column-level security, you can create policies that check, at query time, whether a user has proper access.

First go to ‘Data Catalog’ section, click ‘Create and manage policy tags’.

Next, we need to create a Taxonomy which is essentially a logical group of tags, click ‘+CREATE

Please note the naming convention requirement for the Taxonomy Name.

Please also note that the Taxonomy (and all tags within it) needs to be specifically stored at a location (as of Oct 2020, only these 3 Locations are supported: US, EU and Seoul).

The Taxonomy and tags can only be applied to BigQuery tables residing in same Location. In this case, we will create the Taxonomy and tags in US location.

Next, we can create all the tags within this Taxonomy. The hierarchy of the tags could be very flexible, in the following example we are showing a tags structure like this:

Let’s have a look at a more realistic scenario of the tags Taxonomy, for data classifications:

Click ‘Save’ to have the Taxonomy and tags created.

Applying tags to BigQuery data

Let’s move the BigQuery and start tagging the data!

A table has been created with typical PII data like names, addresses, DOB etc.

Have a look at this table, please note that it needs to reside in the same Location ‘US’ where the Taxonomy and all policy tags were created.

A quick query reveals that there is one record

Next we can start tagging these fields, click ‘Edit Schema’. Check one or more fields then we will be able to click ‘Add policy tag’. Please note that 1 field can only have 1 policy tag associated.

In next page, we can select a tag to be applied to the selected field. Again you will only be able to see the Taxonomy and its tags within same Location.

After saving, we have successfully tagged one field.

Continue until you have tagged all fields which need to be restricted. Please note that different tags are used based on the nature of the data, this can help separating the access based on roles later on.

Now let’s try to make a query to the table again, what happened? We still can see the data!

It turns out that we haven’t enabled the Taxonomy and tags yet, let’s go back to Data Catalog page and find the Taxonomy we have just created. Toggle ‘Enforce access control’ to be ON.

Go back to BigQuery page and check the table schema, we can see following notification.

In the Preview tab, we can only see the column and its content that is not restricted.

After waiting for around 30 minutes (so the Taxomony and tags can take effect), if we now try to make a query to the table, we will see following error message indicating that my login does not have the permission to see the data which has been tagged.

Granting Access

In order to access the BigQuery data that has been tagged, a user needs to be granted ‘Fine-Grained Reader’ permission on the Policy Tag.

Go to Data Catalog, then enter the Taxonomy would like to grant permission to users.

Once selected the Policy Tag, we could add Member to role ‘Data Catalog’->’Fine-grained Reader’. So now the my user account ‘Leo.Zhang’ can access the BigQuery data which was tagged with ‘C4’!

Let’s test it in BigQuery:

The error message has changed, this is because I am trying to query all the columns though I am only entitled to see the columns that are tagged with ‘C4’.

If we preview the table and we can see that BigQuery automatically hide the columns PostCode and Country that I am not entitled to see.

By explicitly excluding the columns PostCode and Country, now I can query the table successfully!

’Fine-grained Reader’ permission can be applied at different levels: Taxonomy, Policy Tags and any Child Policy Tags. This capability provides flexibility if we want to archive more granularity of the access control. E.g. Data Governance Admin could be granted at Taxonomy level while Data Analyst can be granted access for data classification C1 and C2 but not C3 or C4.

Test in monitor mode

This is one of the best practice introduced by Google: https://cloud.google.com/bigquery/docs/best-practices-policy-tags#monitor_mode

Before enforcing access policies for your organization, you can run in monitor only mode. Monitor only mode is where you are not yet enforcing access control but you are auditing the effects of your policy tags.

This best practice assumes:

You already have a set of users authorized to access your data.

You want to find out if enforcement of new column-level security changes would unexpectedly prevent those users from accessing data.

To use monitor only mode, create a taxonomy and policy tags, assign the policy tags to columns, but do not yet enforce access control. Then, have your previously authorized users continue to use the system. As they use the system, an audit trail is generated. You can scan the audit logs to view accesses to columns protected by policy tags. Determine if any of the accesses were unexpected. That is, whether the access would have resulted in a PERMISSION_DENIED error, had the policy been enforced. After you are satisfied that the column-level security is properly set up, enforce access control. Note that you won’t see PERMISSION_DENIED errors unless access control is enforced.

--

--