Centralised audit logs in GCP in a secure environment with VPC Service Controls

Natalie Godec
Google Cloud - Community
7 min read · Oct 11, 2021

On paper, setting up GCP logging sinks is relatively easy. In reality, as soon as VPC Service Controls are involved, you suddenly find yourself locked out of every “how-to” tutorial — they likely won’t work with your security setup.

But there is hope :)

Update 2023: there is a new way to get insights from centralised logs in GCP — Log Analytics! Read all about it in this article.

In this article you will learn how to set up aggregated logging in an organization that has VPC Service Controls, and find a Terraform module that lets you automate the setup for your own Google Cloud infrastructure.

First, let’s cover the basics: what is GCP audit logging, centralised logging, and logging sinks.

Audit logging is a service that records the API calls in your cloud resources corresponding to admin activity, data access, system events, and policy violations. This is the data you want to keep in order to monitor activity in the cloud: who did what, and when. When you want to store those logs centrally, GCP provides an easily configurable way of routing them, called logging sinks.

Here is what you need to do in order to get the logs:

1) Enable audit logs in your projects (we prefer enabling them for ALL_SERVICES)

2) Create logging sinks to send the logs to a storage solution of choice
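As a minimal sketch of step 1 in Terraform (assuming a hypothetical project ID my-project; in the Terraform provider the catch-all service is spelled allServices):

# Hypothetical example: turn on all audit log types for every service in one project.
resource "google_project_iam_audit_config" "all_services" {
  project = "my-project"   # placeholder project ID
  service = "allServices"  # applies the config to every GCP service

  audit_log_config {
    log_type = "ADMIN_READ"
  }
  audit_log_config {
    log_type = "DATA_READ"
  }
  audit_log_config {
    log_type = "DATA_WRITE"
  }
}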

You can configure logging sinks at four levels:

  1. Project
  2. Folder
  3. Organization
  4. Billing account

Each sink type captures all the logs inside its resource (project/folder/organization/billing account), plus logs from all children — folders and projects — if you set a special flag include_children to true.
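For instance, a folder-level sink with include_children set to true might look like this in Terraform (the folder ID, project and dataset names are placeholders, not values from this article):

# Hypothetical folder sink: include_children = true also captures logs
# from every project and sub-folder underneath this folder.
resource "google_logging_folder_sink" "audit" {
  name             = "audit-logs-sink"
  folder           = "folders/123456"  # placeholder folder ID
  destination      = "bigquery.googleapis.com/projects/audit-logs-prod/datasets/logs"
  filter           = "logName:\"cloudaudit.googleapis.com\""  # audit logs only
  include_children = true
}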

Organization requirements

Now let’s cover our company’s internal requirements for audit logs and their collection. These reflect our general architecture choices, as well as the regulatory requirements we had to adhere to:

  • All audit logs must be centrally stored in a separate GCP project
  • Logs contain sensitive data, such as emails and IP addresses, and therefore need to be protected with a VPC Service Perimeter
  • Storage: BigQuery for analytics and GCS for long-term storage
  • Logs from different environments (dev, stage, prod) and locations (UK, USA, Canada) must be stored separately
  • We do not need to store Kubernetes and Cloud Composer logs, nor the logs from our security scanning software
A representative schema of an organisation with multiple environments, GCP projects, VPC Service Perimeters and a requirement to audit logs in a central place

The Challenge

Most of our GCP projects reside inside VPC Service Perimeters, but not all. For example, dev projects, or projects used for the Maps and Translation APIs, do not contain sensitive data and do not need to be in a perimeter. The APIs that we restrict include:

  • BigQuery
  • Storage
  • Logging
  • Containers
This creates a challenge: data flow between our projects and the audit logs project (itself in a separate perimeter) is blocked unless we set up service perimeter bridges between them. On our first attempt, we opted for exactly that: project logging sinks and bridges.

This, however, did not work: projects that are not in a service perimeter cannot be added to a bridge, so there was no way to connect the non-perimetered projects with the logging project inside its perimeter.

An error email kindly sent by Google to everybody in our Cloud Admins group 👀 Note: the content has been modified for illustration and privacy purposes

Such a setup also opens up a security risk: say you have projects A and B in one perimeter, and C and D in another, and they must not communicate with each other. It is easy to overlook this when creating bridges with the audit logs perimeter and add all of A, B, C and D into one bridge, thereby allowing that communication.

The Solution

In February 2021 Google released a fantastic feature to the VPC Service Controls suite: Ingress and Egress rules. This allows you to specify rules under which API calls to restricted services are allowed to traverse the perimeter boundary. The rules extend the existing Access Levels capabilities, are available in Terraform, and one of the documented use-cases is centralised logging!

Here is how it works:

Diagram to illustrate logging sinks between different VPC SC perimeters, using a combination of Access Levels to allow logs entering the logging SP, and Egress rules to allow logs exiting the SP of origin

You have a regular project A with a logging sink, inside a service perimeter SPA, and a logging project inside a service perimeter LSP.

To enable communication between the two, you will:

  1. Allow logging (and storage/bigquery if you are writing there) API calls to exit the VPC SC service perimeter of origin with an Egress policy
  2. Allow the logging sink writer identity to enter the logging service perimeter with an Access Level

Here is what an Egress Policy looks like when you want to set up logging sinks to BigQuery and Storage, with the logging project number being 12345678:

egressPolicies:
- egressFrom:
    identityType: ANY_SERVICE_ACCOUNT
  egressTo:
    operations:
    - methodSelectors:
      - method: '*'
      serviceName: bigquery.googleapis.com
    - methodSelectors:
      - method: '*'
      serviceName: logging.googleapis.com
    - methodSelectors:
      - method: '*'
      serviceName: storage.googleapis.com
    resources:
    - projects/12345678

Note: at the time of writing, BigQuery did not support method-specific restrictions in egress rules.

Logic of the Egress Policy on the Service Perimeter A: only allow logs to exit the perimeter if they are signed by a service account, are being written to BigQuery or GCS, and are going to GCP project #12345678
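If you manage your perimeters with Terraform, the same rule can be expressed as an egress_policies block on the perimeter resource. Here is a minimal sketch, assuming placeholder policy, perimeter and project identifiers, and assuming the Containers restriction maps to container.googleapis.com:

# Sketch: Service Perimeter A with an egress rule allowing service accounts
# to write logs to BigQuery, Logging and Storage in the logging project.
resource "google_access_context_manager_service_perimeter" "spa" {
  parent = "accessPolicies/0987654321"
  name   = "accessPolicies/0987654321/servicePerimeters/spa"
  title  = "spa"

  status {
    resources = ["projects/11111111"]  # placeholder: project A's number
    restricted_services = [
      "bigquery.googleapis.com",
      "storage.googleapis.com",
      "logging.googleapis.com",
      "container.googleapis.com",
    ]

    egress_policies {
      egress_from {
        identity_type = "ANY_SERVICE_ACCOUNT"
      }
      egress_to {
        resources = ["projects/12345678"]  # the logging project number
        operations {
          service_name = "bigquery.googleapis.com"
          method_selectors {
            method = "*"
          }
        }
        operations {
          service_name = "logging.googleapis.com"
          method_selectors {
            method = "*"
          }
        }
        operations {
          service_name = "storage.googleapis.com"
          method_selectors {
            method = "*"
          }
        }
      }
    }
  }
}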

If you are not using Terraform, you could apply the above policy by saving it to logsegress.yaml and running the following command:

gcloud beta access-context-manager perimeters update $SERVICE_PERIMETER_NAME --set-egress-policies=logsegress.yaml

We will go through the Terraform setup in a bit.

For the Access Level that will allow the logging sink’s writer identity to write data into the logging project, we simply need to list the identities of the relevant logging sinks as the Members of the Access Level:

echo "- members:
- serviceAccount:f123456345-09876@gcp-sa-logging.iam.gserviceaccount.com
- serviceAccount:f0987654345-2345@gcp-sa-logging.iam.gserviceaccount.com" > CONDITIONS.yaml
gcloud access-context-manager levels create logging_al \
--title logging_al \
--basic-level-spec CONDITIONS.yaml \
--combine-function=OR \
--policy=0987654321

Note that the service account emails are automatically generated by GCP, and we will really benefit from the powers of Terraform to automate this step.
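Here is a sketch of what that automation can look like: the sink’s writer_identity output feeds straight into the Access Level members and the destination dataset IAM, so the generated email never has to be copied by hand. The project, dataset and policy names below are placeholders:

# Hypothetical project sink with its own (auto-generated) writer identity.
resource "google_logging_project_sink" "audit" {
  name                   = "audit-logs-sink"
  project                = "project-a"  # placeholder source project
  destination            = "bigquery.googleapis.com/projects/audit-logs-prod/datasets/logs"
  unique_writer_identity = true
}

# Let that identity through the logging perimeter via the Access Level.
# writer_identity already comes in the "serviceAccount:..." form that members expects.
resource "google_access_context_manager_access_level" "logging_al" {
  parent = "accessPolicies/0987654321"
  name   = "accessPolicies/0987654321/accessLevels/logging_al"
  title  = "logging_al"

  basic {
    conditions {
      members = [google_logging_project_sink.audit.writer_identity]
    }
  }
}

# The identity also needs permission to write into the destination dataset.
resource "google_bigquery_dataset_iam_member" "sink_writer" {
  project    = "audit-logs-prod"
  dataset_id = "logs"
  role       = "roles/bigquery.dataEditor"
  member     = google_logging_project_sink.audit.writer_identity
}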

When we add an Access Level to the diagram above, we get an additional check that allows the data to enter the Logging Service Perimeter

Let’s get the logs from the entire Organization

Here is the proposed design:

Creating 2 logging sinks on the organization level would be the cleanest solution:

  • 2 sinks with carefully calibrated filters
  • 2 service accounts
  • 2 access levels
  • 1 Egress Policy on each of the other perimeters.
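As a sketch, one of those two organization-level sinks could look like the following; the org ID, destination and filter are placeholders, and the real filters would encode your prod/non-prod split:

# Hypothetical organization-level sink for production audit logs.
resource "google_logging_organization_sink" "prod_audit" {
  name             = "prod-audit-logs"
  org_id           = "123456789"  # placeholder organization ID
  destination      = "bigquery.googleapis.com/projects/audit-logs-prod/datasets/logs"
  include_children = true

  # Placeholder filter: audit logs only, from projects following a "-prod" naming convention.
  filter = "logName:\"cloudaudit.googleapis.com\" AND resource.labels.project_id:\"-prod\""
}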

This is simple enough if you have a consistent way of filtering projects into production and non-production. But what if you don’t?

Or you have a requirement to separate logs into more granular datasets, such as “environment-location”, or “department”?

Enter folders

GCP folders, folder logging sinks, VPC SC access levels and centralised logging projects

The idea here is to create top-level folders that represent the granularity of the environments (and logs) that you need, create a folder logging sink for each folder, and enforce a policy in your organization so that projects can only be created under one of those top folders!
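One way to enforce that last point (my assumption, not necessarily how the author’s organization does it) is to grant the project-creator role on the top-level folders only, instead of at the organization level:

# Sketch: allow project creation only inside a specific top-level folder,
# so nobody can create projects directly under the organization.
resource "google_folder_iam_member" "project_creators" {
  folder = "folders/123456"                       # placeholder: e.g. the Prod UK folder
  role   = "roles/resourcemanager.projectCreator"
  member = "group:cloud-admins@example.com"       # placeholder group
}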

Architecture diagram of centralised logging setup for a Google Cloud Organization with a sample Prod UK environment

Automating it with Terraform

This section describes the setup for creating folders and logging sinks, and assumes that you are familiar with Terraform and are using version 0.13+.

In order to automatically set up everything that is needed for a cross-perimeter logging sink, we will need a module that creates the folder, the folder-level logging sinks, and the Access Level membership that lets the sinks’ writer identities into the logging perimeter.

Here is the complete module for you to use: terraform-gcp-audited-folder

This module assumes that you have decided which GCP project will host your aggregated logs (your logging project), created a BigQuery dataset or a GCS bucket to stream the logs to, and have a VPC Service Perimeter with an Access Level protecting your logging project.
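For completeness, here is a sketch of one such destination, a BigQuery dataset in the logging project (the names and region are placeholders):

# Hypothetical destination dataset inside the logging project,
# created separately from the folder module.
resource "google_bigquery_dataset" "logs" {
  project    = "audit-logs-prod"  # placeholder logging project
  dataset_id = "logs"
  location   = "europe-west2"     # placeholder region
}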

For each folder, you will need to provide the following inputs:

  • folder_name
  • parent_id in the format folders/12345 or organizations/6789
  • logging_project_id
  • Target logging config for BigQuery and/or Cloud Storage: dataset/bucket name, an optional logging filter, and optional exclusions
  • logging_access_level_name in the format accessPolicies/${org_access_policy_number}/accessLevels/${access_level_name}

module "folder" {
source = "git@github.com:ngodec/terraform-gcp-audited-folder.git"
folder_name = "My Folder"
parent_id = "folders/123456"
logging_project_id = "audit-logs-prod"
bigquery_logging_sink = {
dataset_id = "logs"
filter = ""
exclusions = []
}
logging_access_level_name = "accessPolicies/123456/accessLevels/logs-access-level"
}

You can skip the logging vars and only provide the folder name and the parent — in which case the module will only create the folder, and not the logging sinks. This is so that you can use the same module for managing all your folders, not only the top-level ones:

module "folder" {
source = "git@github.com:ngodec/terraform-gcp-audited-folder.git"
name = "My Folder"
parent = "folders/123456"
}

Now all the projects that you create within the audited folders will stream their logs to your destination of choice. I like using BigQuery as my log destination: it enables cool things such as threat analysis and BigQuery usage dashboards like this one (all numbers were randomly generated):

BigQuery usage dashboard built on top of audit logs data. All numbers here are randomly generated

Phew. Security can make things very complicated, right? And the beauty of engineering is in finding elegant solutions for the most complex problems.

Have you got any questions? Don’t hesitate to pop a comment below or @ me on Twitter.
