Conditionally unmask columns in BigQuery using Authorized UDFs

Patrick Dunn
Oct 29, 2020 · 5 min read

The BigQuery data warehouse gives administrators extensive flexibility in setting up access control and column-level security. However, BigQuery’s column-level security currently allows only one of two outcomes when a sensitive column is accessed: the query succeeds and shows the clear text of the column, or the query fails.

Organizations frequently need more flexibility; for example, they often want to truncate a timestamp or show only the last four digits of a phone number. In complex scenarios, an organization may want to share Tink-encrypted data while also controlling access to the symmetric key.
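As a sketch of the kinds of transforms involved, here is what day-level timestamp truncation and phone-number masking look like as plain functions (the names and formats below are illustrative, not part of BigQuery):

```python
from datetime import datetime

def truncate_timestamp(ts: datetime) -> datetime:
    """Keep only the date portion, dropping sub-day precision."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)

def mask_phone(phone: str) -> str:
    """Show only the last four digits of a phone number."""
    digits = [c for c in phone if c.isdigit()]
    return "***-***-" + "".join(digits[-4:])

print(truncate_timestamp(datetime(2020, 10, 29, 14, 35, 7)))  # 2020-10-29 00:00:00
print(mask_phone("415-555-0123"))                             # ***-***-0123
```

The rest of this tutorial implements the timestamp variant of this idea inside BigQuery itself.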

In this tutorial I will show you how to embed BigQuery’s AEAD encryption functions in authorized UDFs to conditionally manipulate decrypted values without exposing the encryption key to the end user.

This guide assumes you are following the instructions in Cloud Shell and makes use of bash scripting to format gcloud and bq commands. All of the operations, except querying BigQuery as a service account, can also be performed in the Google Cloud Console.

Creating a secondary user

A second account is needed to simulate a user with conditional data access. This guide simulates that second user with a service account.

gcloud iam service-accounts create bq-test-acct \
--display-name="bq-test-acct"
#assign the full service account name to a variable
SERV_ACCT=$(gcloud iam service-accounts list --format="value(email)" | grep bq-test-acct)
gcloud projects add-iam-policy-binding $DEVSHELL_PROJECT_ID \
--member="serviceAccount:${SERV_ACCT}" \
--role='roles/bigquery.jobUser'

Setting up the test data

In a private dataset, one that is secured from normal users, create a table that maps AEAD keysets to data categories.

ACCT=$(gcloud auth list \
--filter=status:ACTIVE --format="value(account)")
bq mk --dataset --location=US private
SQL=$(cat <<- EOM
CREATE OR REPLACE TABLE private.keys
CLUSTER BY category
AS (
SELECT
'category_a' as category,
KEYS.NEW_KEYSET('AEAD_AES_GCM_256') as keys
UNION ALL
SELECT
'category_b' as category,
KEYS.NEW_KEYSET('AEAD_AES_GCM_256') as keys
)
EOM
)
bq query --use_legacy_sql=false "${SQL}"

Now create the private.user_keys table, which maps users to an access level for each category. This script gives the service account user “level 1” access and your current user “level 2” access. The level value determines how the decrypted value is transformed before being returned to the requesting user.

SQL=$(cat <<- EOM
CREATE OR REPLACE TABLE private.user_keys
CLUSTER BY category, username
AS (
SELECT
'${SERV_ACCT}' as username,
1 as level,
category,
keys
FROM private.keys a
WHERE category = 'category_a'
UNION ALL
SELECT
'${ACCT}' as username,
2 as level,
category,
keys
FROM private.keys a
WHERE category = 'category_a'
)
EOM
)
bq query --use_legacy_sql=false "${SQL}"

Using the encryption keys created in the private.keys table, create a table in the public dataset with the AEAD.ENCRYPT function; this table will present encrypted values to end users.

bq mk --dataset --location=US public
SQL=$(cat <<- EOM
DECLARE KEYSTRING BYTES;
SET KEYSTRING = (SELECT keys FROM private.keys a WHERE category='category_a');
CREATE OR REPLACE TABLE public.sample_data
AS (
SELECT
unique_key,
AEAD.encrypt(
KEYSTRING,
CAST(created_date as STRING),
CAST(unique_key as string)
) AS sec_created_date,
status_notes
FROM \`bigquery-public-data.san_francisco.311_service_requests\`
LIMIT 10000
);
EOM
)
bq query --use_legacy_sql=false "${SQL}"

To complete the environment setup, grant the service account the permissions necessary to access objects in the public dataset, but not the private dataset. In this example I am using the jq utility to simplify the JSON parsing required to update dataset permissions on the command line.

bq show --format=prettyjson public > dataset.json
JQE=$(cat <<- EOM
.access +=
[{
"role":"WRITER",
"userByEmail":"${SERV_ACCT}"
}]
EOM
)
cat dataset.json | jq "${JQE}" > grant.json
bq update --source grant.json public
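The jq expression simply appends one entry to the dataset’s access list. The same edit can be sketched in Python; the service account address and dataset contents below are placeholders:

```python
import json

def add_access_entry(dataset: dict, service_account: str) -> dict:
    """Mirror the jq expression: .access += [{"role": "WRITER", "userByEmail": ...}]."""
    dataset.setdefault("access", []).append(
        {"role": "WRITER", "userByEmail": service_account}
    )
    return dataset

# Same general shape as the output of: bq show --format=prettyjson public
dataset = {
    "datasetReference": {"datasetId": "public"},
    "access": [{"role": "OWNER", "userByEmail": "admin@example.com"}],
}
updated = add_access_entry(dataset, "bq-test-acct@my-project.iam.gserviceaccount.com")
print(json.dumps(updated, indent=2))
# Write the result to a file, then apply it with: bq update --source grant.json public
```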

Creating an Authorized UDF

An authorized UDF differs from a regular UDF in that the function itself is authorized to access the underlying GCP resource, even if the session user is not.

This UDF checks the current user and applies a transform based on their access level. A user with “level 1” access sees a truncated version of the results, while a user with “level 2” access receives the full value.

Logged into Cloud Shell as your project admin, create the function.

SQL=$(cat <<- EOM
CREATE OR REPLACE FUNCTION public.cond_timestamp_unmask(col BYTES, add_data STRING, key_category STRING)
AS ((
SELECT
CASE b.level
WHEN 1
THEN TIMESTAMP_TRUNC(
CAST(aead.decrypt_string(b.keys, col, add_data) as TIMESTAMP),
DAY
)
WHEN 2
THEN CAST(aead.decrypt_string(b.keys, col, add_data) as TIMESTAMP)
ELSE NULL
END AS datevalue
FROM private.user_keys b
WHERE b.category = key_category
AND b.username = session_user()
));
EOM
)
bq query --use_legacy_sql=false "${SQL}"
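Setting the decryption aside, the level-based branching in the UDF can be modeled as follows (an illustrative sketch, not BigQuery code):

```python
from datetime import datetime
from typing import Optional

def cond_timestamp_unmask(ts: datetime, level: Optional[int]) -> Optional[datetime]:
    """Mirror the UDF's CASE: level 1 -> day truncation, level 2 -> full value, else NULL."""
    if level == 1:
        # Equivalent to TIMESTAMP_TRUNC(..., DAY)
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if level == 2:
        return ts
    return None  # no matching row in private.user_keys for this user and category

created = datetime(2020, 10, 29, 14, 35, 7)
print(cond_timestamp_unmask(created, 1))     # 2020-10-29 00:00:00
print(cond_timestamp_unmask(created, 2))     # 2020-10-29 14:35:07
print(cond_timestamp_unmask(created, None))  # None
```

Note the ELSE NULL branch: a user with no row in private.user_keys for the requested category gets NULL rather than an error.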

Authorize the UDF by granting it access to the private dataset. In this example I am, again, using the jq utility to simplify the JSON parsing required to update the dataset permissions that authorize the UDF.

bq show --format=prettyjson private > priv.json
JQE=$(cat <<- EOM
.access +=
[{ "routine":
{"datasetId":"public",
"projectId":"${DEVSHELL_PROJECT_ID}",
"routineId":"cond_timestamp_unmask"}
}]
EOM
)
cat priv.json | jq "${JQE}" > authorized.json
bq update --source authorized.json private

Using the service account

To keep the instructions simple, this guide downloads a service account key into Cloud Shell. As an alternative, you can consider creating a VM with ‘bq-test-acct’ as the VM’s service account.

In either case, it is recommended to open a second terminal session within Cloud Shell to operate as the service account user.

SERV_ACCT=$(gcloud iam service-accounts list --format="value(email)" | grep bq-test-acct)
gcloud iam service-accounts keys create key.json \
--iam-account ${SERV_ACCT}
gcloud auth activate-service-account ${SERV_ACCT} \
--key-file="key.json"
#should show the service account
bq query --use_legacy_sql=false 'select session_user()'

Confirm that the service account can access the table with encrypted data while being denied access to the table holding the key material.

#Query should be successful
bq query --use_legacy_sql=false \
'select * from public.sample_data limit 1'
#Query should return denied
bq query --use_legacy_sql=false \
'select * from private.keys limit 1'

Testing the UDF

Issue a query using the conditional function both as your normal user account and as the service account. The normal user account will see a full-resolution timestamp, while the service account will see a timestamp truncated to the day.

SQL=$(cat <<- EOM
SELECT
unique_key,
session_user() as user,
public.cond_timestamp_unmask(
sec_created_date,
cast(unique_key as string),
'category_a'
) as unmask
FROM public.sample_data
LIMIT 10;
EOM
)
bq query --use_legacy_sql=false ${SQL}

Results for the service account: a truncated timestamp, as expected for a “level 1” user.

Results for the user account: the full timestamp, available to a “level 2” user.

Performance impact

When working with both encryption operations and UDFs, query cost is a frequent concern. Looking at the execution details in the Cloud Console shows that this UDF is implemented efficiently, using a hash join.
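Conceptually, a hash join builds an in-memory hash table on the small side of the join (here, private.user_keys) and probes it once per row of the large table, so the per-row cost of the key lookup stays constant. A rough sketch of the idea, with placeholder data:

```python
# Build side: the small private.user_keys table, hashed on (category, username).
user_keys = [
    ("category_a", "sa@example.com", 1),
    ("category_a", "admin@example.com", 2),
]
build = {(category, user): level for category, user, level in user_keys}

# Probe side: each row of public.sample_data performs one O(1) lookup
# for the current session user, instead of rescanning the key table.
session_user = "sa@example.com"
probe_rows = [("row1", "category_a"), ("row2", "category_a")]
joined = [(key, build.get((category, session_user))) for key, category in probe_rows]
print(joined)  # [('row1', 1), ('row2', 1)]
```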

Query plan executed with a hash join

Additional resources

  • A how-to on synchronizing Google Cloud IAM group information into BigQuery
  • A guide on importing Tink encrypted data into BigQuery


Google Cloud - Community

A collection of technical articles and blogs published or curated by Google Cloud Developer Advocates. The views expressed are those of the authors and don't necessarily reflect those of Google.

Written by Patrick Dunn, Solutions Architect @ Google Cloud
