How To Control Access To BigQuery At Row Level With Groups

Cristiano Breuel
Feb 19, 2019 · 6 min read

Introduction

BigQuery is a serverless, highly scalable, data warehouse system that provides enormous flexibility for varied use cases. BigQuery has a security model based on capability, which lets administrators control access to datasets by roles, groups and individual users.

In some use cases, however, it may be necessary to control access to data at a more granular level than the dataset. For example, different users might need access to different subsets of the rows in a table. For that situation, BigQuery has the functionality, which lets you grant a view access to a dataset, instead of an IAM entity like a group. That view can then implement any necessary filters and be shared with end users, who will then only be able to view the filtered data.

A common (and recommended) way to set up access control in GCP is by adding users to groups, and granting permissions to those groups. Often, organizations will already do this for other resources and keep group membership updated in their G Suite Admin or corporate LDAP system. This article explains how to integrate this group membership data with Authorized Views, via Google Groups and the G Suite Admin API (or the Cloud Identity API), in order to create a row level access control scheme based on groups.

Architecture

In order to synchronize group membership, we’ll use both the (or the ) and the . A Google Cloud Function will be scheduled to perform the periodic synchronization, populating a BigQuery table with all group memberships. An Authorized View is then created to join group membership with the business data tables. The following diagram shows how it works:

Image for post
Image for post

Prerequisites

You will need a GCP project with the following APIs enabled:

You will also need a G Suite or Cloud Identity organization, and have administrative access to both.

Setup

In order to automate the process and get access to the admin APIs, the first thing you need to do is to create a Service Account with permission to impersonate a G Suite administrator user, which in GCP parlance is called “Google Apps Domain-wide Delegation”.

Start by creating a service account:

  1. Open the . If prompted, select a project.
  2. Click Create service account.
  3. In the Create service account window, type a name for the service account, and select Enable Google Apps Domain-wide Delegation. Then click Save.

Then, you need to authorize this service account to access your G Suite/Cloud Identity domain, by following these steps:

  1. Go to your G Suite domain’s .
  2. Select Security from the list of controls. If you don’t see Security listed, select More controls from the gray bar at the bottom of the page, then select Security from the list of controls.
  3. Select Advanced settings from the list of options.
  4. Select Manage API client access in the Authentication section.
  5. In the Client name field enter the service account’s Client ID.
  6. In the One or More API Scopes field enter one of these scopes: for the G Suite Admin Directory API (Google Groups) or for the Cloud Identity API
  7. Click the Authorize button.

This is where we start to use the command line, so let’s define some variables that we’ll need:

export PROJECT_ID= # your project idexport REGION= # your region, e.g. us-central1export SERVICE_ACCOUNT_USERNAME= # service account you created aboveexport SERVICE_ACCOUNT=$SERVICE_ACCOUNT_USERNAME@$PROJECT_ID.iam.gserviceaccount.com

The service account also needs permission to use BigQuery and issue authentication tokens. You can grant these roles with the following commands:

gcloud projects add-iam-policy-binding $PROJECT_ID \ --member serviceAccount:$SERVICE_ACCOUNT \ --role roles/bigquery.usergcloud projects add-iam-policy-binding $PROJECT_ID \ --member serviceAccount:$SERVICE_ACCOUNT \ --role roles/iam.serviceAccountTokenCreator

Now we get to the . First, check out the code:

git clone git@github.com:GoogleCloudPlatform/professional-services.git

Next, open main.py with a text editor and update the values of the following variables to reflect your environment:

DOMAIN = # Your G Suite/Cloud Identity domain, e.g. 'cbcloudtest.com'ADMIN_EMAIL = # The email of a domain administrator, e.g. 'admin@cbcloudtest.com'DATASET = # The name of a dataset that will be created to hold the user mapping table, e.g. 'bq_iam'GROUPS_USERS_TABLE_NAME = # Name to give the user mapping table, e.g. 'groups_users'

The code is prepared to use the G Suite API by default. If you’re using Cloud Identity, there are a few lines of code you need to change in the script group_sync.py. Open the file and look for comments that start with “For Cloud Identity”, which indicates lines that need to be changed.

The next step is deploying the code as a Cloud Function, by running the following:

gcloud beta functions deploy sync_groups \ --project $PROJECT_ID \ --runtime python37 \ --trigger-http \ --timeout 300 \ --service-account $SERVICE_ACCOUNT

At this point, you can manually run the function with gcloud functions run sync_groups. After if finishes, you should be able to see the new dataset and table populated with the data in BigQuery:

Image for post
Image for post

When you’re satisfied with the results, you can schedule the function to synchronize periodically. You can run the following command to schedule it, adapting the — schedule parameter to your needs according to the :

gcloud beta scheduler jobs create http group_sync \ --project $PROJECT_ID \ --schedule=”11 * * * *” \ --uri=https://$REGION-$PROJECT_ID.cloudfunctions.net/group_sync \ --description=”Synchronizes group membership between G Suite and BigQuery”

Creating the filtered views

Now that you have your group membership data in BigQuery, you can start joining it with your own business tables to filter it appropriately for each user.

Let’s say you have the following table with customer data, and you want to limit the data a certain analyst can access to their own geographic region:

Image for post
Image for post

Furthermore, assume your group membership table looks like this:

Image for post
Image for post

In order to apply the appropriate filter for each user, you will need to use the function. This function always returns the email of the current logged in Google account.

You could with the following query (in Standard SQL):

SELECT c.customer_name, c.customer_idFROM private.customers cJOIN user_groups g  ON SESSION_USER() = g.user_idWHERE c.country IN (  CASE    WHEN g.group='analysts_br' THEN 'br'    WHEN g.group='analysts_ca' THEN 'ca'    WHEN g.group='analysts_us' THEN 'us'  END)

You can then grant users access to this view, instead of the original table, and they will only be able to see the data they are authorized to.

Limitations

There are a few aspects that are important to keep in mind when using this technique.

First, it only works for native BigQuery tables, with their data in BigQuery storage. For federated sources, i.e. those that pull data from GCS or other databases, the user executing the query needs to have access to the underlying resource, which would defeat the purpose in this case.

Another important limitation is that, since the SESSION_USER function is non-deterministic, BigQuery will not cache results for queries on these views, which could have an impact in costs.

Related work

Also check out the article, which lays out a comprehensive set of techniques to control access to BigQuery data, given a mapping of user permissions.


Special thanks to “Woogie” Wolgemuth for contributing the authentication code, Jake Ferriero for reviewing both the code and the article, and Edu Marreto for article input.

Google Cloud - Community

Google Cloud community articles and blogs

Thanks to Felipe Hoffa

Cristiano Breuel

Written by

Machine Learning Engineer

Google Cloud - Community

A collection of technical articles and blogs published or curated by Google Cloud Developer Advocates. The views expressed are those of the authors and don't necessarily reflect those of Google.

Cristiano Breuel

Written by

Machine Learning Engineer

Google Cloud - Community

A collection of technical articles and blogs published or curated by Google Cloud Developer Advocates. The views expressed are those of the authors and don't necessarily reflect those of Google.

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch

Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore

Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store