Guarding BigQuery: Enhancing Data Security with VPC Service Control

Yusuke Enami(Kishishita)
8 min read · Oct 15, 2023

Introduction

You’ve probably come across the saying, “data is the new oil.” It’s an apt description. While data does hold immense potential, much like crude oil, its real value is unlocked only when refined and processed. This is where BigQuery shines. It’s a state-of-the-art analytics tool, designed for quick and efficient data processing on a vast scale.

However, as data grows in importance, safeguarding it has become paramount. Data breaches are constantly in the news.

It’s akin to realizing the value of a treasure and then leaving it unguarded. So, it’s not just about harnessing the power of data, but also about ensuring its security.

This article looks at VPC Service Controls and how it serves as a robust shield for your data in BigQuery, striking a balance between accessibility and security.

Check out my BigQuery post:

https://medium.com/@bigface00/implementing-tag-based-access-control-in-bigquery-6ec09bc31ae

What is VPC Service Controls?

VPC Service Controls lets you define a security perimeter around your Google Cloud resources. While this article focuses on BigQuery, VPC Service Controls also supports many other services, which are listed in Google Cloud's documentation of supported products.

This tool is designed to guard against both intentional and unintentional actions by internal or external entities, thus significantly reducing the risk of unwarranted data exfiltration from Google Cloud services like Cloud Storage and BigQuery.

Important Note: VPC Service Controls regulates the use of a service's API rather than the resource itself. In other words, it controls calls to APIs such as “bigquery.googleapis.com” or “storage.googleapis.com”. Access through the Google Cloud console also invokes these APIs and is therefore regulated as well. Because it works at the API level, VPC Service Controls is neither a replacement for nor an equivalent of a traditional firewall.

Sample Service Perimeter Image

In the illustrated example, a service perimeter protects BigQuery and Cloud Storage. The perimeter allows requests from authorized IPs through BigQuery's ingress and egress rules while restricting Cloud Storage egress. Consequently, unauthorized users and VMs in unauthorized projects cannot access the protected BigQuery resources inside the perimeter. Likewise, VMs at authorized IPs cannot access Cloud Storage in an unauthorized project.

The details of this mechanism can seem intricate, so we will explore VPC Service Controls through a hands-on demo.

Aligning VPC Service Control and BigQuery

Let’s walk through the details of integrating VPC Service Controls with BigQuery.

Establishing an Organization

Using VPC Service Controls requires a Google Cloud organization that your project belongs to. You create the organization through Cloud Identity, and for that you need your own domain. If you already have a domain, feel free to use it. If you don't, your first step is to acquire one from a domain registrar such as Squarespace.

If you’re charting unfamiliar territory in creating an organization, the article below can be your guide through the process:

Create the Access Policy

Think of the Access Policy as a container for all your Access Context Manager resources, such as Access Levels and Service Perimeters. Let’s create an Access Policy at the organization level.

# access_context_manager_access_policy.tf

resource "google_access_context_manager_access_policy" "access_policy" {
  parent = "organizations/1234567890123"
  title  = "KSST access policy"
}

In this configuration, parent is the organization the policy belongs to, written as organizations/<YOUR_ORGANIZATION_ID>, while title is the display name of the Access Policy.
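Rather than hard-coding the organization ID, the parent could be resolved from your domain. The following is a minimal sketch, assuming the Google provider's google_organization data source; the domain example.com is a placeholder and the resource name access_policy_by_domain is hypothetical, neither comes from the original setup.

```hcl
# Sketch only: look up the organization by domain instead of hard-coding its ID.
# "example.com" is a placeholder; replace it with your organization's domain.
data "google_organization" "org" {
  domain = "example.com"
}

# Hypothetical alternative to the access_policy resource above.
# An organization has a single organization-level Access Policy,
# so define this resource or the hard-coded one, not both.
resource "google_access_context_manager_access_policy" "access_policy_by_domain" {
  # org_id is the numeric organization ID returned by the data source.
  parent = "organizations/${data.google_organization.org.org_id}"
  title  = "KSST access policy"
}
```

The lookup requires credentials that are allowed to read the organization; if that is not the case in your environment, the hard-coded form above is the simpler choice.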

Allow Authorized ID

Without a perimeter in place, any user with the necessary IAM permissions can see the datasets and tables in BigQuery.

BigQuery Console

We’ll create a service perimeter that allows ingress to BigQuery only from an authorized user.

Architecture of the Service Perimeter and Access Level

Setting Up the Access Level

An Access Level defines the conditions under which resources may be accessed. Conditions can involve identities (user accounts and service accounts) and IP addresses, and they can be combined with logical operators such as AND and OR.

Here’s a Terraform snippet to give you a glimpse of how it’s set up:

# access_context_manager_access_level.tf

resource "google_access_context_manager_access_level" "test_access_level" {
  parent = "accessPolicies/${google_access_context_manager_access_policy.access_policy.name}"
  name   = "accessPolicies/${google_access_context_manager_access_policy.access_policy.name}/accessLevels/test_access_level"
  title  = "Test Access Level"

  basic {
    conditions {
      # Only the listed members are granted access.
      members = [
        "user:xxx@yyy.com"
      ]
    }
  }
}

Setting Up the Service Perimeter

To shield BigQuery from external access, we’ll establish a Service Perimeter. Let’s walk through the process:

# access_context_manager_service_perimeter.tf

resource "google_access_context_manager_service_perimeter" "test_perimeter" {
  parent = "accessPolicies/${google_access_context_manager_access_policy.access_policy.name}"
  name   = "accessPolicies/${google_access_context_manager_access_policy.access_policy.name}/servicePerimeters/my_project_perimeter"
  title  = "My Project Perimeter"

  status {
    # Enumerate the services to be protected by VPC SC.
    restricted_services = [
      "bigquery.googleapis.com",
    ]
    # Specify the projects within the perimeter.
    resources = [
      "projects/1234567890123", # Replace with your project number (not the project ID string).
    ]
    # Identify which Access Levels are allowed access to the restricted services.
    access_levels = [
      google_access_context_manager_access_level.test_access_level.name
    ]
  }
}

I’ve added comments to explain each parameter in the Terraform code. After applying it, you can verify the Access Policy in the Google Cloud console by navigating to “Organization” -> “Security” -> “VPC Service Controls”.

Access Policy

Initially, when an authorized user attempts to access the dataset from outside the designated perimeter, the system will deny the request.

Access Declined from Outside of the Perimeter

Wondering why? The VPC Service Control Troubleshooter can shed light on the situation.

Enter the vpcServiceControlsUniqueIdentifier into the Troubleshooter, and you’ll receive feedback similar to what’s shown below:

The violation reason is clear: “Resources not in same service perimeter”. As we attempted to access the dataset from outside its perimeter, the request was denied. If you ever encounter issues with VPC Service Control, this tool can be immensely helpful.

But rest assured, authorized users can still view the dataset without a hitch.

Authorized User Access

Next, let’s see what happens when an unauthorized user tries to gain access.

Unauthorized User Access

As expected, the unauthorized user cannot view the dataset, and querying is blocked too. Instead of the expected results, a “VPC Service Controls: …” error message is displayed.

Curious about what happens with command-line attempts? We put it to the test. When an unauthorized user tries using the bq command in the terminal, here's the outcome:

$ bq query --use_legacy_sql=false --project_id <YOUR_PROJECT_ID>  'select * from `<YOUR_PROJECT_ID>.dataset.test_table`'

BigQuery error in query operation: VPC Service Controls: Request is prohibited by organization's policy.
vpcServiceControlsUniqueIdentifier: ***.

No surprises here — the command-line attempt by the unauthorized user also hits a roadblock.
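Because an enforced perimeter blocks requests immediately, as seen above, it can help to trial a configuration in dry-run mode first, where violations are logged but requests are not denied. The sketch below assumes the provider's use_explicit_dry_run_spec argument and spec block; the resource name dry_run_perimeter and its perimeter ID are hypothetical, and the project number is the same placeholder used earlier.

```hcl
# Sketch only: a perimeter that is evaluated in dry-run mode.
# Violations are logged, but requests are not blocked.
resource "google_access_context_manager_service_perimeter" "dry_run_perimeter" {
  parent = "accessPolicies/${google_access_context_manager_access_policy.access_policy.name}"
  name   = "accessPolicies/${google_access_context_manager_access_policy.access_policy.name}/servicePerimeters/dry_run_perimeter"
  title  = "Dry Run Perimeter"

  # Treat the spec block below as the dry-run configuration.
  use_explicit_dry_run_spec = true

  spec {
    restricted_services = [
      "bigquery.googleapis.com",
    ]
    resources = [
      "projects/1234567890123", # Placeholder project number.
    ]
    access_levels = [
      google_access_context_manager_access_level.test_access_level.name
    ]
  }
}
```

Once the logged violations look as expected, the same settings could be moved into a status block to enforce them.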

Incorporating IP Address Restrictions at the Access Level

Adding another layer to access control, IP address restrictions can be utilized to further secure resource accessibility.

# access_context_manager_access_level.tf

resource "google_access_context_manager_access_level" "test_access_level" {
  parent = "accessPolicies/${google_access_context_manager_access_policy.access_policy.name}"
  name   = "accessPolicies/${google_access_context_manager_access_policy.access_policy.name}/accessLevels/test_access_level"
  title  = "Test Access Level"

  basic {
    # The default combining function is "AND"
    combining_function = "AND"
    conditions {
      ip_subnetworks = [
        "x.x.x.x/32" # Please specify your IP address
      ]
    }
    conditions {
      members = [
        "user:xxx@yyy.com"
      ]
    }
  }
}

With this configuration, the accounts listed in members can access the dataset, but only from the specified IP address.
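For contrast, here is a minimal sketch of the same two conditions combined with OR. The resource name test_access_level_or and its Access Level ID are hypothetical; the IP address and member are the same placeholders as above.

```hcl
# Sketch only: the same conditions as above, combined with "OR".
resource "google_access_context_manager_access_level" "test_access_level_or" {
  parent = "accessPolicies/${google_access_context_manager_access_policy.access_policy.name}"
  name   = "accessPolicies/${google_access_context_manager_access_policy.access_policy.name}/accessLevels/test_access_level_or"
  title  = "Test Access Level (OR)"

  basic {
    # With "OR", satisfying any one of the conditions is enough.
    combining_function = "OR"
    conditions {
      ip_subnetworks = [
        "x.x.x.x/32" # Placeholder: specify your IP address
      ]
    }
    conditions {
      members = [
        "user:xxx@yyy.com"
      ]
    }
  }
}
```

This variant is considerably looser than the AND version: any request from the listed IP address satisfies it, as does the listed user from any address, so it is usually worth double-checking that OR is really what you intend.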

Add IP address’s condition

Finely Tuning Access with Specific Network Directions and API Operations

Service Perimeters also support more granular rules. With the ingress_policies and egress_policies blocks, you can define rules per direction (ingress or egress) and limit them to specific API operations.

# access_context_manager_service_perimeter.tf

resource "google_access_context_manager_service_perimeter" "test_perimeter" {
  parent = "accessPolicies/${google_access_context_manager_access_policy.access_policy.name}"
  name   = "accessPolicies/${google_access_context_manager_access_policy.access_policy.name}/servicePerimeters/my_project_perimeter"
  title  = "My Project Perimeter"

  status {
    # Enumerate the services to be protected by VPC SC.
    restricted_services = [
      "bigquery.googleapis.com",
    ]
    # Specify the projects within the perimeter.
    resources = [
      "projects/1234567890123", # Replace with your project number (not the project ID string).
    ]
    # Identify which Access Levels are allowed access to the restricted services.
    access_levels = [
      google_access_context_manager_access_level.test_access_level.name
    ]
    ingress_policies {
      ingress_from {
        # Identify which entities are allowed access to the restricted services.
        # Valid entities are user:***, serviceAccount:***, or group:***.
        identities = [
          "user:test@test.com"
        ]
        # Define the sources that the IngressPolicy permits access from.
        # "*" allows all sources.
        # Alternatively, reference a specific Access Level here.
        sources {
          access_level = "*"
        }
      }
      ingress_to {
        # Define the resources that the IngressPolicy permits access to.
        resources = ["*"]
        # Define permitted operations.
        operations {
          service_name = "bigquery.googleapis.com"
          # You can specify either a method or a permission in this block.
          method_selectors {
            # If you specify "*", all permissions/methods of service_name are allowed.
            permission = "bigquery.datasets.list"
            # method = "DatasetService.ListDatasets"
          }
        }
      }
    }
  }
}

With the above ingress configuration, user:test@test.com is enabled to view a list of datasets. Note that the user isn't required to utilize the IP address specified in the Access Level for this operation.

Be careful: some permissions can only be used in a particular direction, so check each rule carefully before applying it.
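The egress direction follows the same pattern. Below is a minimal sketch of a perimeter variant that uses an egress_policies block, assuming a second project outside the perimeter. The resource name egress_example_perimeter, its perimeter ID, and the project number 9876543210987 are hypothetical. It is meant as an alternative form of the perimeter above rather than an additional one, since a project can only be protected by one enforced perimeter.

```hcl
# Sketch only: a variant of the perimeter with an egress rule.
resource "google_access_context_manager_service_perimeter" "egress_example_perimeter" {
  parent = "accessPolicies/${google_access_context_manager_access_policy.access_policy.name}"
  name   = "accessPolicies/${google_access_context_manager_access_policy.access_policy.name}/servicePerimeters/egress_example_perimeter"
  title  = "Egress Example Perimeter"

  status {
    restricted_services = [
      "bigquery.googleapis.com",
    ]
    resources = [
      "projects/1234567890123", # Placeholder project number inside the perimeter.
    ]
    egress_policies {
      egress_from {
        # Identities allowed to send requests out of the perimeter.
        identities = [
          "user:test@test.com"
        ]
      }
      egress_to {
        # Hypothetical project number of a project outside the perimeter.
        resources = ["projects/9876543210987"]
        operations {
          service_name = "bigquery.googleapis.com"
          method_selectors {
            # "*" allows all BigQuery methods; narrow this down where possible.
            method = "*"
          }
        }
      }
    }
  }
}
```

Under these assumptions, user:test@test.com could call BigQuery in the outside project from within the perimeter, while other egress to BigQuery stays blocked.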

To find valid values for method within method_selectors, refer to the VPC Service Controls documentation on supported service method restrictions.

To find valid values for permission, check the IAM permissions reference.

In Conclusion

Throughout this article, detailed steps have been articulated for utilizing VPC Service Controls through Terraform. VPC Service Controls empower you to exert granular control over data access, thereby safeguarding your project’s security. Employing this tool is worthy of consideration in your security strategy!

Yusuke Enami(Kishishita)

DevOps engineer at a Japanese company. I love Google Cloud/Kubernetes/Machine Learning/Raspberry Pi and Workout🏋️‍♂️ https://bigface0202.github.io/portfolio/