Google Cloud VPC-Service Controls: Lessons Learned

Andrea Gandolfi
Google Cloud - Community
8 min read · Jan 24, 2022

Being part of the Google Cloud Professional Services Organization gives me the opportunity to work with some of the largest customers in my region, supporting them in the design and implementation of very interesting infrastructures in real-world scenarios.

I decided to write this article to share the important lessons I learned while applying VPC Service Controls (a.k.a. VPC-SC) to the complex network infrastructure of one of our largest customers.

Brief overview of customer’s network infrastructure

Let’s first take a very high-level look at the customer’s projects and infrastructure.

Partial view of customer’s network on GCP

Starting from the left, Cloud Interconnect connects the on-prem systems to the Landing (untrusted zone) VPC in Google Cloud.

The network traffic then flows through IPS appliances, which inspect the packets and send them to the transit VPC of the appropriate security zone. The same security zones exist on-prem, so the customer treats Google Cloud as an extension of their on-prem environment.

The transit VPCs are then connected, via HA VPN, to spoke projects that share their VPCs with the application service projects.

Applying VPC-Service Controls

With this infrastructure already in place, my customer’s goal was to protect the Google services (accessible via the Google APIs) of their many GCP projects from the internet and, just as importantly, from the other security zones (apart from some exceptional cases). This means, for example, that the critical security zone on-prem must be able to access resources in the critical security zone on GCP, while the other security zones must not.

This goal can be achieved by using Google Cloud VPC-Service Controls to create secure perimeters around those projects that need to be protected.

Once a security perimeter is in place and enforced, it blocks access to the services inside the perimeter from outside, and the other way around: workloads inside the perimeter can no longer access protected services in projects outside the perimeter.

Here’s how the VPC-SC perimeters were configured around the customer’s projects to protect the different security zones and the network infrastructure:

  • 3 perimeters for the different security zones
  • 1 perimeter for the landing zone
  • 1 perimeter for the transit zone

VPC-SC bridges will be created to allow communication between different perimeters as needed.
Private Service Connect endpoints will be configured to allow communication between on-prem security zone X and Google Cloud security zone X (more on Private Service Connect endpoints later in the article).

VPC-SC perimeters around customer’s projects

VPC-Service Controls: Lessons Learned

After this short introduction to the customer’s network architecture and goals, I would like to share the lessons I learned while designing and implementing the solution:

Keep it simple (when possible)

If your goal is to protect the Google APIs and services only from the internet, consider a single VPC-SC perimeter containing all of your projects. If you also need to separate security zones or environments, create perimeters around the corresponding groups of projects (or around their dedicated Shared VPCs).

Having many perimeters might give you more control over individual requests between projects, but it will require more attention and maintenance for all of those exceptional cross-perimeter communications (and trust me, there will be some).

Leverage dry-run mode before enforcing the perimeters

A VPC-SC perimeter can be created in “dry-run” mode before being enforced. In this mode the perimeter’s violations are logged to Cloud Logging, but the offending API requests are not actually blocked.

This approach lets you analyze those violations and decide how to handle them before enforcing the perimeter. A colleague of mine wrote a great article about that.

Carefully analyze the violations reported in Cloud Logging

To be sure you’re not going to disrupt anything when you enforce the VPC-SC perimeters, let them run in dry-run mode for several weeks (I’d say at least one month). This is because some scheduled processes run only once a week or once a month.

Carefully analyze the violations you find in the logs, distinguishing the traffic flows that should be blocked from the exceptions that should somehow be allowed.

The logs can be exported to a BigQuery table for easier querying and analysis. The article I mentioned before describes how to do that.

The most useful fields to check in the logs to better understand a violation are:

  • timestamp (the date and time when violation happened)
  • protopayload_auditlog.requestMetadata.callerIp (client IP address)
  • protopayload_auditlog.methodName (API and method invoked)
  • protopayload_auditlog.authenticationInfo.principalEmail (user email or service account who invoked the API)
  • protopayload_auditlog.metadataJson (this field contains interesting information in JSON format, e.g.: name of violated perimeter, direction (ingress/egress), source project, target project, unique id)
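As a sketch of how these fields fit together, here is a minimal Python snippet that extracts them from a violation row. The row shape loosely follows the BigQuery export field names listed above, and all values are invented for illustration:

```python
import json

# A hypothetical violation row, shaped like a BigQuery audit-log export.
# Every value here is an invented placeholder.
row = {
    "timestamp": "2022-01-10T09:15:00Z",
    "protopayload_auditlog": {
        "requestMetadata": {"callerIp": "203.0.113.7"},
        "methodName": "google.storage.objects.get",
        "authenticationInfo": {
            "principalEmail": "app-sa@example-prj.iam.gserviceaccount.com"
        },
        # metadataJson is a JSON string with the VPC-SC details.
        "metadataJson": json.dumps({
            "dryRun": True,
            "ingressViolations": [{
                "targetResource": "projects/111111111111",
                "servicePerimeter": "accessPolicies/123/servicePerimeters/zone_a",
            }],
            "violationReason": "NO_MATCHING_ACCESS_LEVEL",
        }),
    },
}

def summarize_violation(row):
    """Pull out the fields that identify who called what, from where."""
    audit = row["protopayload_auditlog"]
    meta = json.loads(audit["metadataJson"])  # perimeter, direction, reason
    return {
        "when": row["timestamp"],
        "caller_ip": audit["requestMetadata"]["callerIp"],
        "method": audit["methodName"],
        "principal": audit["authenticationInfo"]["principalEmail"],
        "perimeter": meta["ingressViolations"][0]["servicePerimeter"],
        "reason": meta["violationReason"],
    }

print(summarize_violation(row))
```

Grouping these summaries by principal and method is usually enough to separate legitimate automation traffic from flows that should stay blocked.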

Once the perimeters are enforced, access to the protected resources is blocked for API calls coming from outside the perimeter (even via the Cloud Console). When this happens, a unique identifier is displayed to the user. This ID can be looked up in the logs or in BigQuery to see all the details of the violation.

Include VPC-SC in the design of your infrastructure from the beginning

Dry-run mode and violation logs are great tools for understanding and troubleshooting cross-perimeter communications.

An even better approach is to design the VPC-SC perimeters from the beginning, together with the rest of the infrastructure.

In my experience, this approach is much easier to introduce and maintain than retrofitting the perimeters into an existing and complex infrastructure and network design.

Cloud Shell

If you’re using Cloud Shell for interacting with the APIs and resources in your projects, you should plan for an alternative.

Cloud Shell sessions run on virtual machines that Google provides to Google Cloud users, and they’re a very handy tool in many situations.

Those virtual machines belong to a Google-owned project that sits outside any of your VPC-SC perimeters. For this reason, once the perimeters are enforced, Cloud Shell can no longer access the protected resources.

A popular alternative is a Compute Engine VM inside the perimeter, or an on-prem machine that is granted access to it.

Terraform (or automation in general)

Before enforcing the VPC-SC perimeters, make sure your automation tools, wherever they run (e.g. on-prem, in a public SaaS, or on GCP), will still have access to the resources inside the perimeters.

It is possible to configure exceptions at the perimeter level (see access levels and ingress/egress policies) to allow inbound or outbound connections from a specific source IP range or for a specific identity (e.g. the automation service account).
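As a sketch, an ingress rule for a hypothetical automation service account could look like the following YAML file, to be applied with `gcloud access-context-manager perimeters update --set-ingress-policies` (the service account, access level, access policy number, and project number are all invented placeholders):

```yaml
# ingress.yaml -- all names and numbers below are placeholders
- ingressFrom:
    identities:
    - serviceAccount:terraform-sa@automation-prj.iam.gserviceaccount.com
    sources:
    - accessLevel: accessPolicies/123456789/accessLevels/automation_ips
  ingressTo:
    operations:
    - serviceName: storage.googleapis.com
      methodSelectors:
      - method: '*'
    resources:
    - projects/111111111111
```

Scoping the rule to a single identity and service, as above, keeps the exception as narrow as possible.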

Services not fully supported by VPC-SC

Carefully review the list of products supported by VPC-SC and their limitations.

Some Google Cloud services are not supported while others might have limitations when used in a VPC-SC context.

An example is Cloud Build, which works well with VPC-SC only when private pools are used. By default, Cloud Build runs the build steps on Google-managed compute resources (outside the customer’s perimeters); with private pools, the customer dedicates private compute resources (inside a VPC-SC perimeter) to run their builds.
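Pointing a build at a private pool is a one-line option in the build config. As a sketch (the project, region, pool name, and bucket are invented placeholders):

```yaml
# cloudbuild.yaml -- run the build steps on a private pool inside the perimeter
steps:
- name: gcr.io/cloud-builders/gsutil
  args: ['cp', 'gs://protected-bucket/artifact', '.']
options:
  pool:
    name: projects/my-project/locations/europe-west1/workerPools/my-private-pool
```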

Manage exceptions with Access Levels, Ingress/Egress policies and VPC-SC Bridges

There might be cases where you need to add exceptions to the perimeters: to allow connections from outside the perimeter to reach the protected resources, or to allow a workload inside the perimeter to reach resources outside of it. Exceptions can be managed either via VPC-SC perimeter bridges or via ingress/egress policies (access levels are essentially a kind of ingress policy).

When you need to allow communication between protected services in different GCP projects that belong to different VPC-SC perimeters you can simply create a VPC-SC perimeter bridge containing those projects.

VPC-SC perimeter bridge

When, instead, the source or destination of the communication is outside any VPC-SC perimeter (e.g. the internet, on-prem, or a GCP project not in a perimeter), ingress/egress policies can be used (they can also replace VPC-SC bridges when you need finer-grained control).

Incoming requests (from outside into the perimeter) are covered by ingress policies. They let you specify which sources and identities are allowed to reach which destination projects and services.

In the same way, egress policies cover requests leaving the perimeter: they specify which identities can call services outside the perimeter, as well as the destination projects and services.
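As a sketch, an egress rule allowing one workload identity to reach a service in a project outside the perimeter could look like the following YAML file, to be applied with `gcloud access-context-manager perimeters update --set-egress-policies` (the service account and project number are invented placeholders):

```yaml
# egress.yaml -- all names and numbers below are placeholders
- egressFrom:
    identities:
    - serviceAccount:app-sa@zone-a-prj.iam.gserviceaccount.com
  egressTo:
    operations:
    - serviceName: bigquery.googleapis.com
      methodSelectors:
      - method: '*'
    resources:
    - projects/222222222222
```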

I want to stress that these are exceptions, and they shouldn’t become the rule. If you have too many of them, you should probably step back for a moment and rethink the design of your perimeters.

Shared VPCs

Sharing a host project’s VPC with one or more service projects is a popular way to separate responsibilities between the team that manages the network and the teams that own the different application projects.

Virtual machines (or other compute services) running in this setup behave in a way that can be confusing.

The point is that even though a VM belongs to the service project, it effectively uses the VPC network shared by the host project. For this reason, if the host project and the service project are in different perimeters, the VM will only have access to the protected services of the host project’s perimeter (and can’t access protected resources inside its own project).

To get around these limitations we can use one of the following approaches:

  • Put the service projects and host projects in the same perimeter
  • Create bridges only for those cross-perimeter communications that should be allowed
  • Use Private Service Connect endpoints to access Google APIs (see next section)
  • Configure ingress/egress policies on the perimeters for specific identities, projects and services.

Use Private Service Connect endpoints for Google APIs to access resources inside the perimeters from the outside (even from on-prem)

Private Service Connect endpoints can be created inside a VPC of a project protected by a VPC-SC perimeter to give access to the Google APIs.

We can leverage these endpoints by configuring DNS records and routes so that requests to the Google APIs resolve to them.

The interesting part is that when an API request goes through one of these endpoints, it is granted access to the protected services of the perimeter containing that endpoint, even when the request comes from a project in a different perimeter or from on-prem.

If you’re interested in the details of this procedure you can read my other article where I leveraged the same concept to invoke a private Cloud Function from on-prem: Calling a private Google Cloud Function from on-prem.

Conclusions

VPC Service Controls is the tool for protecting Google Cloud APIs and services from undesired access.

In this article I covered some important points to be aware of when designing VPC-SC security perimeters.

I hope you found the information useful; feel free to leave a like or a comment.
