GCP — BigQuery — Data Security at rest (Part 1)

Murli Krishnan
Google Cloud - Community
5 min readNov 15, 2022

Often times it has been seen organisations have been penalised great deal due to either loss of data or data theft or data leakage happening due to loosened security controls. Such leakages not only impacts the organisation financially but puts the customer’s data at risk which can be easily misused.

This blog is a part of 5 part series on “Bigquery — Data Security at rest”.
This blog is the first part in the series.
Visit here for the series menu

With the advent of cloud technologies, data security is a paramount driver which determines the adoption of cloud technologies.

While security is a very wide topic which ranges from hardware security, network security, software patching, access controls, data at rest and data in transit, this blog will focus on security implementations available to secure the data at rest in Big Query.

Overview

Below image shows a collection of options which can be used individually and together depending on the requirement. I have classified these security controls as static and dynamic owing to the nature of the security each provides.

High Level Security Controls

Client Side Encryption will not be covered as part of this series as its a wide topic on its own.

Access Controls (IAM)

IAM can be considered as the first entry point to provide controlled access to resources.
It is very important to understand the different roles that can have interaction with Bigquery.
Below is the list of roles (not exhaustive) which has permissions to interact with Big Query.

Roles with Bigquery permissions

There are roles with services like Data Catalog, DLP and Dataplex which can provide a direct access to Big Query Data.
Apart from these roles, there are Security Admin and Reviewer, Recommender and Big Query Migration Roles that has foot print of Big Query permissions.

Big Query roles can be granted at different granularity eg: Project, Dataset, Table level as well. Do review it thoroughly following principle of least privilege.

Assess the need of providing any admin roles properly to service accounts.

Understand the permissions granted in each of the roles and follow the principle of least privilege to provide access

Do not grant individual permissions to principals, create a custom role if needed and assign to group.

Important NoteService Agent Roles do provide access to Big Query but should not be granted to principals, they are meant for service usage. They have powerful permissions and can change without notice.

Tables/Views with limited information

One of the simplest ways of ensuring limited exposure is to create second layer of tables that has limited information.

Facade layer — Table with limited information

In this type of setup, the sensitive columns from Raw Layer is not exposed to the user.
The user only gets access to the required derived table which has limited information required for the use case.

Lets say for a raw table — customers_raw with the below information

Schema for Customers Raw Table

Lets create another table that exposes only limited information as below

customer_table_with_limited_info

The above table has first_name column hashed using SHA1 function.
The columns SSN and married are not exposed for consumption.

Advantages
1.
This pattern has its advantages as the parent table information or any relation to the lineage is not revealed as this is altogether different materialised table with its own storage.
2. This pattern works when the use case of data consumption is clearly known
3. For repeated query patterns, the query hitting the derived table can be cached and served, further it can have its own partitioning and clustering schemes

Considerations
1. If the patterns are not known before hand, then this might easily lead into creation of lots of fragmented pieces of information.
2. This requires a setup of stored procedure or scheduled query to do a periodic refresh
3. The derived table created requires its own storage

Authorized Datasets/Views

Why authorized views ?
With normal views, the permissions are required to be provided at the base table as well since views in turn query the base table.

Authorized views can be considered as views with explicit authorization to access the base table data.

Authorized View
Parent Dataset — Sharing Option
Parent Dataset authorising the child view

Authorized Datasets is a scenario where a group of views are clubbed together in a dataset and the entire dataset is authorized by the parent dataset.

Authorized Datasets

Authorized Datasets are authorized for new views that will be created in future.

Advantages
1.
There is no need to provide explicit access on the base table
2. There is no need for any additional storage to be provided as the views are logical in nature.

Considerations
1.
The authorized views are views which are executed on every query and relies on the base table for any partitioning or clustering setup
2. The view is currently authorized at a dataset level not an individual table or view level.

Data Encryption

By default all the data in Bigquery is encrypted at rest using AES encryption (Advanced Encryption Standard).

As added security mechanism, Bigquery support CMEK (customer managed encryption keys) for the entire dataset. The encryption keys are managed by Cloud KMS.

For dataset/table encrypted by CMEK, the bigquery service account requires additional role on the cloud KMS key — Encrypter/Decrypter

Advantages
1. Provides an added layer of security with encryption
2. Suitable for security compliance requirements

Considerations
1. The key rotation needs to be handled manually from updating the keys to their latest version.
2. The switching between default and CMEK is not possible without re-creating the table with different encryption option
3. The batch queries are required to mention the destination_kms_key option for the queries to work properly.
4. Datasets with default customer key encryption mentioned cannot have tables without customer key encryption.

We will be taking the dynamic security controls in the part 2 of the series.

Please connect with me on https://www.linkedin.com/in/murli-krishnan-a1319842/ for any queries.

Happy Learning.

--

--