GCP — BigQuery — Data Security at rest (Part 2)

Murli Krishnan
Google Cloud - Community
Nov 15, 2022

In part 1 of this series, we focused on static security controls such as IAM access control, BigQuery dataset/table encryption, the facade layer and authorized views. In this part, we continue the discussion with dynamic security controls.

This blog is part of a 5-part series on “Bigquery — Data Security at rest”.
Visit here for part 1 of the blog series
Visit here for the series menu

BigQuery Security Controls

Column Level Security

Why is column-level security important?

Depending on their country and industry, organizations must follow different regulations, such as GDPR, HIPAA and COPPA, to meet data compliance and security standards.
In most organizations, different personas access the data, each with a different security clearance.
Not all groups have the same permissions. For example, support staff might not need access to sensitive columns but do need access to the table data for support purposes.
Data analysts might not need access to PII data but might need to correlate the data for analysis and segmentation.
Regulatory reporting, on the other hand, needs access to the PII data.

Because of this dynamic nature of security requirements, BigQuery provides multiple ways to meet them.

Let’s start with policy tag templates.

Column Level Security — Policy Tag Templates

The typical flow for enforcing column-level protection using policy tag templates is shown below.

Column Level Security — Policy Tags

Typically, as part of data governance in any organisation, data stewards are responsible for managing the data lifecycle and for defining the business rules, data quality rules, and policies and standards for the data.

Based on this, the data steward defines the taxonomy of policy tags (data classes such as PII, finance data, etc.).

An example of such a taxonomy is shown below.

Policy Tag Taxonomy defined in Data Catalog

The top-level data classes are severity-high and severity-medium, which define the criticality of the data elements.

The next level lists the attributes in each class: first-name and ssn are classified as severity-high, and married as severity-medium.

Policy tags can be hierarchical. Roles are checked from the bottom of the hierarchy to the top, and the first match in the hierarchy is enforced.

Access Resolution based on direct access and hierarchy
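To make the bottom-up rule concrete, here is a minimal Python sketch that models the example taxonomy in memory. The `PARENT` map, the `ancestry` and `can_read` helpers and the principal name are hypothetical illustrations, not a BigQuery API; BigQuery performs this resolution itself at query time.

```python
from typing import Dict, List, Optional, Set

# Hypothetical in-memory copy of the example taxonomy: policy tag -> parent tag.
PARENT: Dict[str, Optional[str]] = {
    "severity-high": None,
    "severity-medium": None,
    "first-name": "severity-high",
    "ssn": "severity-high",
    "married": "severity-medium",
}


def ancestry(tag: str) -> List[str]:
    """Return the tag followed by its ancestors, ordered bottom to top."""
    path: List[str] = []
    current: Optional[str] = tag
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return path


def can_read(principal: str, column_tag: str, grants: Dict[str, Set[str]]) -> bool:
    """True if the principal holds Fine-Grained Reader on the column's tag or on
    any ancestor; the walk stops at the first tag that grants it."""
    for tag in ancestry(column_tag):
        if principal in grants.get(tag, set()):
            return True
    return False


# Fine-Grained Reader granted once, at the severity-high level.
grants = {"severity-high": {"user:analyst@example.com"}}
print(can_read("user:analyst@example.com", "ssn", grants))      # True
print(can_read("user:analyst@example.com", "married", grants))  # False
```

A single grant on the parent covers every child tag, which is exactly what the next steps of the walkthrough rely on.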

Next, the policy tags are assigned to the table columns as shown below.

Table with Policy Tags enforced

Now a user who previously had access to the table gets the following error when querying it:

Fine Grained Reader Permission Missing.
Table indicating policy tag enforced columns

The user can still query the columns which are not protected by policy tags.

Other column data not protected by policy tags.

Let’s grant the Fine-Grained Reader role to the principal at the severity-high level.

Now the user can access both the first-name and ssn attributes (bottom-up traversal, first match in the hierarchy).

First Name and SSN protected columns are accessible (fictitious data)
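Granting the role at the severity-high level is an IAM policy change on that policy tag. Because `setIamPolicy` replaces the whole policy, a safe pattern is read-modify-write: fetch the current policy with `getIamPolicy`, add the member, and send it back with the same `etag`. The Python sketch below assumes the Data Catalog v1 REST endpoint and only builds the request; the helper names, the example member and the dummy etag are hypothetical, and the HTTP call itself is left out.

```python
import copy
from typing import Dict, Tuple

FINE_GRAINED_READER = "roles/datacatalog.categoryFineGrainedReader"


def with_member(policy: Dict, role: str, member: str) -> Dict:
    """Return a copy of an IAM policy (as returned by getIamPolicy) with
    `member` added to `role`; other bindings and the etag are kept."""
    updated = copy.deepcopy(policy)
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        # Conditional bindings are skipped so they are not widened by accident.
        if binding["role"] == role and "condition" not in binding:
            if member not in binding["members"]:
                binding["members"].append(member)
            return updated
    bindings.append({"role": role, "members": [member]})
    return updated


def set_iam_policy_request(policy_tag: str, policy: Dict) -> Tuple[str, Dict]:
    """URL and JSON body for setIamPolicy on a policy tag resource name."""
    url = f"https://datacatalog.googleapis.com/v1/{policy_tag}:setIamPolicy"
    return url, {"policy": policy}


current = {"etag": "BwX=", "bindings": []}  # dummy getIamPolicy result
url, body = set_iam_policy_request(
    "projects/<project>/locations/us/taxonomies/1/policyTags/2",
    with_member(current, FINE_GRAINED_READER, "user:analyst@example.com"),
)
print(url)
```

Keeping the `etag` lets the service reject the write if someone else changed the policy in between.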

The creation of policy tag templates can be managed with Terraform templates.
Policy tags can also be assigned from the Dataplex console (Data Catalog is part of the Dataplex offering).
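Whichever tool creates the taxonomy, parent tags have to exist before their children, because each child references its parent's resource name in `parentPolicyTag`. The Python sketch below (hypothetical helper names) flattens a nested taxonomy definition into a parents-first creation order and enforces the 5-level hierarchy limit; it also shows a taxonomy body with `activatedPolicyTypes` set to `FINE_GRAINED_ACCESS_CONTROL`, which turns on enforcement.

```python
from typing import Dict, List, Optional, Tuple

MAX_LEVELS = 5  # hierarchy depth limit for a policy tag taxonomy


def taxonomy_body(display_name: str) -> Dict:
    """Body for creating a taxonomy with access control enforcement activated."""
    return {
        "displayName": display_name,
        "activatedPolicyTypes": ["FINE_GRAINED_ACCESS_CONTROL"],
    }


def creation_plan(
    tree: Dict[str, Dict], parent: Optional[str] = None, level: int = 1
) -> List[Tuple[str, Optional[str]]]:
    """Flatten {tag: {child: {...}}} into (tag, parent) pairs, parents first."""
    plan: List[Tuple[str, Optional[str]]] = []
    for tag, children in tree.items():
        if level > MAX_LEVELS:
            raise ValueError(f"{tag!r} would sit at level {level}; the limit is {MAX_LEVELS}")
        plan.append((tag, parent))
        plan.extend(creation_plan(children, tag, level + 1))
    return plan


example = {"severity-high": {"first-name": {}, "ssn": {}}, "severity-medium": {"married": {}}}
for tag, parent in creation_plan(example):
    print(tag, "->", parent)
```

Each pair becomes one policy tag creation call, with `parentPolicyTag` filled in from the resource name returned when the parent was created.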

Advantages
1. The role-based access provided by policy tags avoids duplicating table content, and the processes otherwise needed to provision the right data to each role.
2. The Fine-Grained Reader role grants fine-grained access to a table at the column level.
3. Policy tags are managed through Data Catalog, with which they are well integrated.
4. Views created on a base table with policy tags are also protected: querying the tagged columns through a view still requires the appropriate access.

Considerations
1. Deliberate carefully over the taxonomy and policy tag design up front to avoid frequent changes.
2. Test access thoroughly in a lower environment.
3. Only one policy tag can be assigned per column.
4. Creating another table from query results does not carry over the policy enforcement; only BigQuery copy jobs preserve it.
5. To find which table columns are secured with policy tags, you need to write a program that uses the BigQuery APIs to list the columns and their associated policy tags.
6. Policy tag templates can have up to 5 levels of hierarchy.
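For consideration 5, one approach is to read the table schema, either from the `tables.get` REST response or from `bq show --schema --format=prettyjson project:dataset.table`, and collect every field that carries `policyTags.names`. The sketch below works on the parsed JSON, so it needs no client library; nested `RECORD` columns are walked recursively and reported with dotted paths. The function name and the sample schema are illustrative.

```python
import json
from typing import Dict, List


def protected_columns(fields: List[Dict], prefix: str = "") -> Dict[str, List[str]]:
    """Map each column path to the policy tag resource names attached to it."""
    found: Dict[str, List[str]] = {}
    for field in fields:
        path = prefix + field["name"]
        names = field.get("policyTags", {}).get("names", [])
        if names:
            found[path] = names
        # RECORD (STRUCT) columns list their sub-columns under "fields".
        found.update(protected_columns(field.get("fields", []), path + "."))
    return found


# Illustrative schema in the shape printed by `bq show --schema --format=prettyjson`.
schema_json = """
[
  {"name": "first_name", "type": "STRING",
   "policyTags": {"names": ["projects/p/locations/us/taxonomies/1/policyTags/11"]}},
  {"name": "city", "type": "STRING"},
  {"name": "profile", "type": "RECORD", "fields": [
    {"name": "married", "type": "STRING",
     "policyTags": {"names": ["projects/p/locations/us/taxonomies/1/policyTags/33"]}}
  ]}
]
"""

for column, tags in protected_columns(json.loads(schema_json)).items():
    print(column, tags)
```

Running this over every table in a dataset gives an inventory of protected columns to check against the taxonomy.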

Policy tags also support dynamic data masking through data masking rules attached to the policy tags.

Dynamic data masking supports 3 kinds of masking rules, defined through data policies on policy tags: Default, Nullify and Hash.
When more than one rule applies, the priority is Hash, then Default, then Nullify.
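The priority order can be expressed as a small lookup. In this Python sketch the rule identifiers are the predefined expression names used by the BigQuery Data Policy API (`SHA256`, `DEFAULT_MASKING_VALUE`, `ALWAYS_NULL`); the `effective_rule` helper is hypothetical and simply returns the highest-priority rule among those that apply to a user.

```python
from typing import Iterable

# Highest priority first: Hash, then Default, then Nullify.
RULE_PRIORITY = ("SHA256", "DEFAULT_MASKING_VALUE", "ALWAYS_NULL")


def effective_rule(applicable: Iterable[str]) -> str:
    """Return the masking rule that wins when several data policies apply."""
    rules = set(applicable)
    for rule in RULE_PRIORITY:
        if rule in rules:
            return rule
    raise ValueError("no supported masking rule applies")


print(effective_rule(["ALWAYS_NULL", "DEFAULT_MASKING_VALUE"]))  # DEFAULT_MASKING_VALUE
```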

If a policy tag has a masking rule associated with it, access is resolved as follows:
1. If the user resolves to Masked Reader on the policy hierarchy, the user can query the column but sees masked data.
2. If the user resolves to Fine-Grained Reader on the policy hierarchy, the user can query the actual data.
3. If the user has both Fine-Grained Reader and Masked Reader, the first match from the bottom to the top of the hierarchy is enforced.
4. If neither role resolves, the action is denied for the principal.
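The four steps can be sketched as a bottom-up walk that stops at the first tag where the principal holds either role. This is a simplified Python model of the rules as stated above, not BigQuery's implementation. The taxonomy map and helper are hypothetical, and treating Fine-Grained Reader as the winner when both roles sit on the same tag is an assumption of the sketch.

```python
from typing import Dict, Optional, Set

# Hypothetical copy of the example taxonomy: policy tag -> parent tag.
PARENT: Dict[str, Optional[str]] = {
    "severity-high": None,
    "severity-medium": None,
    "first-name": "severity-high",
    "ssn": "severity-high",
    "married": "severity-medium",
}


def resolve_access(column_tag: str, fine_grained: Set[str], masked: Set[str]) -> str:
    """Return "raw", "masked" or "denied" for one principal on one column.

    fine_grained / masked are the policy tags on which the principal holds
    Fine-Grained Reader / Masked Reader (the latter via a data policy)."""
    tag: Optional[str] = column_tag
    while tag is not None:          # bottom to top
        if tag in fine_grained:     # assumption: wins a same-level tie
            return "raw"
        if tag in masked:
            return "masked"
        tag = PARENT[tag]
    return "denied"                 # no role resolved on the hierarchy


print(resolve_access("married", set(), {"severity-medium"}))  # masked
```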

Review the considerations below to check which BigQuery features are compatible with dynamic data masking.

Let’s look at an example where the severity-medium policy tag is assigned a data policy with the default masking rule, and the principal is granted access through that data policy.

We already know that the married column is tagged with the married policy tag (a child of severity-medium).

Table indicating policy tag enforced columns

Now, when the user queries the married column, the query succeeds, but the values are replaced with the default value for the type, an empty string.

Married column results
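The default masking rule replaces each value with a type-specific default rather than NULL, which is why the STRING column married comes back as an empty string. A minimal Python sketch of that behaviour follows. The table of defaults covers only a few types and reflects the BigQuery documentation as we understand it, so verify it against the documentation; the row and column names are illustrative.

```python
import datetime
from typing import Any, Dict, Set

# Assumed default masking values for a few BigQuery types (verify in the docs).
DEFAULT_MASK: Dict[str, Any] = {
    "STRING": "",
    "BYTES": b"",
    "INTEGER": 0,
    "FLOAT": 0.0,
    "BOOLEAN": False,
    "DATE": datetime.date(1970, 1, 1),
}


def apply_default_masking(row: Dict[str, Any], types: Dict[str, str], masked: Set[str]) -> Dict[str, Any]:
    """Return a copy of the row with masked columns replaced by their type default."""
    return {
        column: DEFAULT_MASK[types[column]] if column in masked else value
        for column, value in row.items()
    }


row = {"customer_id": 42, "married": "yes"}
types = {"customer_id": "INTEGER", "married": "STRING"}
print(apply_default_masking(row, types, {"married"}))  # {'customer_id': 42, 'married': ''}
```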

Advantages
1. The Hash rule hashes data consistently, so it is typically used to correlate the same values across tables via joins (beware of brute-force attacks).
2. Dynamic data masking makes it easier to manage access to the data.
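Both sides of the hashing trade-off fit in a few lines of Python. Here `hashlib.sha256` stands in for the Hash rule, modelling it as an unsalted SHA-256 of the UTF-8 value. BigQuery's exact output encoding may differ, and the rows are fictitious. The first part joins two masked tables on the hashed key. The second part shows why a low-cardinality column such as married gives little protection once hashed: every candidate value can simply be hashed and compared.

```python
import hashlib
from typing import Iterable, Optional


def hash_mask(value: str) -> str:
    """Stand-in for the Hash rule: unsalted SHA-256, hex-encoded."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Fictitious rows as a masked reader would see them: ssn already hashed.
customers = [{"ssn": hash_mask("123-45-6789"), "segment": "gold"},
             {"ssn": hash_mask("987-65-4321"), "segment": "silver"}]
claims = [{"ssn": hash_mask("123-45-6789"), "amount": 250}]

# Same input -> same digest, so the masked columns still join.
segment_by_ssn = {c["ssn"]: c["segment"] for c in customers}
joined = [(segment_by_ssn[c["ssn"]], c["amount"]) for c in claims if c["ssn"] in segment_by_ssn]
print(joined)  # [('gold', 250)]


def dictionary_attack(digest: str, candidates: Iterable[str]) -> Optional[str]:
    """Recover a value by hashing every candidate: trivial for small domains."""
    for candidate in candidates:
        if hash_mask(candidate) == digest:
            return candidate
    return None


print(dictionary_attack(hash_mask("yes"), ["yes", "no"]))  # yes
```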

Considerations
1. Check the compatibility of policy tag enforcement with different BigQuery features.
2. Check the limitations and restrictions of policy tag templates with masking rules.

Moving Policy Tags

One of the most important features to understand is that a policy tag can be moved to a different position in the hierarchy within the same taxonomy, using the PATCH method as shown below, without changing any policy assignments.

curl -X PATCH \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -d @request.json \
  --verbose \
  "https://datacatalog.googleapis.com/v1beta1/projects/<project>/locations/us/taxonomies/2823830268236482597/policyTags/4631524100804902143"

Contents of request.json:
{
"displayName" : "social-security-number",
"parentPolicyTag": "projects/<project>/locations/us/taxonomies/2823830268236482597/policyTags/1563760639735907832"
}
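The same request can be prepared programmatically. This Python sketch (hypothetical helper names) builds the PATCH URL and the request.json body from full resource names, and refuses a move whose new parent belongs to a different taxonomy, since moves are only supported within one taxonomy. Sending the request is left to curl as above or to an HTTP client of your choice.

```python
import json
from typing import Dict, Tuple


def taxonomy_of(resource: str) -> str:
    """projects/P/locations/L/taxonomies/T/policyTags/ID -> projects/P/locations/L/taxonomies/T"""
    return resource.split("/policyTags/")[0]


def move_policy_tag_request(policy_tag: str, new_parent: str, display_name: str) -> Tuple[str, Dict]:
    """URL and JSON body for re-parenting a policy tag within its taxonomy."""
    if policy_tag == new_parent:
        raise ValueError("a policy tag cannot be its own parent")
    if taxonomy_of(policy_tag) != taxonomy_of(new_parent):
        raise ValueError("policy tags can only be moved within the same taxonomy")
    url = f"https://datacatalog.googleapis.com/v1beta1/{policy_tag}"
    return url, {"displayName": display_name, "parentPolicyTag": new_parent}


taxonomy = "projects/<project>/locations/us/taxonomies/2823830268236482597"
url, body = move_policy_tag_request(
    f"{taxonomy}/policyTags/4631524100804902143",
    f"{taxonomy}/policyTags/1563760639735907832",
    "social-security-number",
)
print(url)
print(json.dumps(body, indent=2))  # what goes into request.json
```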

The best practice is to use attribute groupings as the top level of the hierarchy, followed by the individual attribute tags.

Assign any common default data policy that must apply to multiple groups at a higher level of the hierarchy, so that it is inherited by all policy tags below it.

Assigning default permissions to multiple groups.
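Inheritance is what makes a single assignment at the grouping level enough. The hypothetical Python sketch below collects the data policies that apply to a column's tag by walking up to the root, so a default masking policy attached once to severity-medium reaches every attribute tag beneath it. The taxonomy map and the data policy name are illustrative.

```python
from typing import Dict, List, Optional

# Hypothetical copy of the example taxonomy: policy tag -> parent tag.
PARENT: Dict[str, Optional[str]] = {
    "severity-high": None,
    "severity-medium": None,
    "first-name": "severity-high",
    "ssn": "severity-high",
    "married": "severity-medium",
}


def inherited_data_policies(tag: str, attached: Dict[str, List[str]]) -> List[str]:
    """Data policies attached to the tag or any ancestor, nearest first."""
    found: List[str] = []
    current: Optional[str] = tag
    while current is not None:
        found.extend(attached.get(current, []))
        current = PARENT[current]
    return found


# One assignment at the grouping level instead of one per attribute tag.
attached = {"severity-medium": ["medium_default_mask"]}
print(inherited_data_policies("married", attached))  # ['medium_default_mask']
print(inherited_data_policies("ssn", attached))      # []
```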

Grant the Fine-Grained Reader and Masked Reader roles only on the required policy tags and data policies, not on the entire project.
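A simple audit can catch grants that are broader than intended. The sketch below assumes IAM grants have already been exported into flat records with hypothetical `resource`, `role` and `member` keys (how they are exported is out of scope). It flags the two roles when they are granted on a bare project rather than on a policy tag or data policy.

```python
from typing import Dict, List

SCOPED_ROLES = {
    "roles/datacatalog.categoryFineGrainedReader",
    "roles/bigquerydatapolicy.maskedReader",
}


def is_project(resource: str) -> bool:
    """True for a bare project name such as 'projects/my-project'."""
    parts = resource.split("/")
    return len(parts) == 2 and parts[0] == "projects"


def overly_broad(grants: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Grants of the fine-grained or masked reader role made at project level."""
    return [g for g in grants if g["role"] in SCOPED_ROLES and is_project(g["resource"])]


grants = [
    {"resource": "projects/analytics", "role": "roles/datacatalog.categoryFineGrainedReader",
     "member": "group:support@example.com"},
    {"resource": "projects/analytics/locations/us/taxonomies/1/policyTags/2",
     "role": "roles/datacatalog.categoryFineGrainedReader", "member": "group:reporting@example.com"},
    {"resource": "projects/analytics", "role": "roles/bigquery.dataViewer",
     "member": "group:support@example.com"},
]
for grant in overly_broad(grants):
    print(grant["member"], grant["role"])
```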

We will continue with the encryption portion of the dynamic security controls in part 3 of the series.

Please connect with me on https://www.linkedin.com/in/murli-krishnan-a1319842/ for any queries.

Happy Learning.
