Data Obfuscation for BigQuery Row Level Security Lookup Table

Yunus Durmuş
Google Cloud - Community
3 min readJul 17, 2024

BigQuery Row Level Security (RLS) via lookup tables enable you to easily change access rules by modifying data in a lookup table instead of managing many RLS rules with gcloud or Terraform. There’s a caveat though, all the users must have access to this lookup table. So users can see what others have access to. If you think that it is a problem for your company, the solution is to use data obfuscation by using hashing.

Background on RLS

BigQuery Row Level Security (RLS) allows you to give access to individual users or groups to specified rows based on given filters. Below is an example query where we give the individual user Jon, and the group sales-us access to rows where region is US.

Image taken from official documentation.

If you want to add more users to this policy, you may either add the user directly to the rule or add to one of the groups. That’s easy but for each filter, you have to create a new rule. In the above example, if you have 10 regions in your company, you need 10 different rules.

Moreover, many data engineering teams who handle RLS rules, don’t have required privileges or access to systems to add/remove users from groups. So they avoid using groups and change users in the rules almost daily. With such an architecture, you frequently hit IAM rule change limits.

Lookup Tables for Dynamic Rules

With lookup tables you may embed all different combinations into a separate table. Now you only work with this additional table, you don’t have to modify the rules.

-- Example Lookup Table:
+-------------------+--------------+
| email | region |
+-------------------+--------------+
| John@example.com | europe-west1 |
| Alice@example.com | us-west1 |
| Alice@example.com | us-west2 |
+-----------------+--------------+

CREATE OR REPLACE ROW ACCESS POLICY apac_filter
ON project.dataset.my_table
GRANT TO ('domain:example.com')
FILTER USING (region IN (
SELECT
region
FROM
lookup_table
WHERE
email = SESSION_USER()));

In the above example, you give access to users based on their assigned regions in the lookup table. The caveat in this scheme is that all the users should have Data Viewer privilege on the lookup table. So John knows that Alice can read data of us-west1 and us-west2. This may be a problem for some companies. The solution for this is using data obfuscation.

Data Obfuscation for Lookup Tables

If everyone can see the lookup table, then we can at least obfuscate the table data and hence, making it meaningless. For obfuscation, I used hashing. As you see below code block, user_email and region data are stored as hashed. They generate meaningless values that cannot be traced unless you know the HARDCODED_SALT.

-- example data table:
CREATE TABLE project.dataset.secret_data (
region STRING,
secret_data STRING,
secret_data_2 INT64
)

-- LOOKUP TABLE:
-- We use a HARCODED_SALT text to prevent users running SHA256
-- by themselves and reverse engineer the values.
-- This table has values like below which are not meaningful:

-- user_email_salted_hashed, region_email_salted_hash
-- ------------------------ -------------------------
-- 5po6QkYE/8c89.. , 8LDKyIgzEWajS5k...

CREATE TABLE IF NOT EXISTS project.dataset.lookup_rls (
-- TO_BASE64(SHA256(CONCAT(user_email, 'HARDCODED_SALT')))
user_email_salted_hashed STRING,
-- TO_BASE64(SHA256(CONCAT(region, user_email, 'HARDCODED_SALT')))
region_email_salted_hash STRING
);

-- Row Level Security rule for the lookup table
-- Ideally the salt should be different per-user but
-- (i) lookup table is visible to all users so everyone can see the salt
-- (ii) lookup table salt is not available to the filter statement during rule creation
CREATE OR REPLACE ROW ACCESS POLICY hash_user_filter
ON project.dataset.secret_data
GRANT TO ('domain:example.com')
FILTER USING (TO_BASE64(SHA256(CONCAT(region, SESSION_USER(), 'HARCODED_SALT'))) IN (
SELECT
region_email_salted_hash
FROM
lookup_rls
WHERE
user_email_salted_hashed = TO_BASE64(SHA256(CONCAT(SESSION_USER(), 'HARDCODED_SALT')));

The achilles heel in this design is the salt. If a user knows the salt, they can reverse engineer the values by comparing them with all possible email/region combinations. Unfortunately, it is not possible to make it individual per user since (i) lookup table is visible to everyone, and hence, the salt itself is visible (ii) salt from the lookup table is not available to the filter statement. Solution: The data engineer team should protect the salt value by injecting it during a CI/CD pipeline and by not allowing any user to read the rules that are in production even the data engineer team themselves.

With lookup table feature of row level security it is easier to manage the rules. The lookup table has to be visible to all users and they can see each other’s access rights but with obfuscation via hashes, we can hide that information.

--

--