Protecting Sensitive Data: Securing Data Pipelines on Google Cloud (part 1)

Jérôme NAHELOU
Published in CodeShake
Mar 8, 2023

This series of stories will help you design and secure workloads on GCP with different levels of protection.

For this series, I chose a simple architecture within a single GCP project, composed of:

  • A front application hosted on Cloud Run
  • A Pub/Sub topic with a Push Subscription to the backend
  • A backend layer composed of a storage service hosted on Cloud Run and a Datastore table

Most of the code can be found on GitHub.

Authentication within Google Cloud

First of all, good identity and access management is the top priority when protecting sensitive data.

If you are running on GCP, you can get an ephemeral identity token or access token from the metadata service:

# Identity token (OIDC) issued for the given audience
curl -H "Metadata-Flavor: Google" \
"http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/identity?audience=AUDIENCE"
# OAuth 2.0 access token for calling Google APIs
curl -H "Metadata-Flavor: Google" \
"http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token"
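The same two calls can be made from application code when no Google client library is available. Here is a minimal Python sketch using only the standard library. The helper names (metadata_request, fetch) and the example audience URL are my own illustrative assumptions, and the fetch step only succeeds when the code runs on a Google Cloud resource.

```python
import urllib.parse
import urllib.request

METADATA_ROOT = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default"
)


def metadata_request(kind, audience=None):
    """Build (but do not send) a request for an 'identity' or access 'token'."""
    if kind not in ("identity", "token"):
        raise ValueError("kind must be 'identity' or 'token'")
    url = f"{METADATA_ROOT}/{kind}"
    if kind == "identity":
        if not audience:
            raise ValueError("an identity token needs an audience")
        url += "?" + urllib.parse.urlencode({"audience": audience})
    # The metadata server rejects requests that lack this header.
    return urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})


def fetch(request, timeout=5):
    """Send the request; this only works from inside Google Cloud."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Hypothetical audience: the URL of the service you want to call.
    req = metadata_request("identity", audience="https://back-example.a.run.app")
    print(req.full_url)
```

Building the request separately from sending it keeps the URL and header logic easy to check outside of GCP.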

Most Google client libraries automatically obtain an identity from the metadata service, so the only thing you need to do is set the right level of privileges for the service account.

The identity provided by the metadata server is that of the service account attached to the resource.
Resources like Cloud Run accept a user-managed service account, which lets you assign a distinct identity to each part of the application and then grant it the right level of permissions.

Authentication outside of Google Cloud

If your applications aren’t running on Google Cloud, you’ll need to provide credentials.

Beyond least privilege (authorization), authentication becomes a challenge. Static service account keys can lead to a security breach if they leak.
For example, if your key ends up on GitHub, it will take very little time for people to misuse the stolen credentials.
This is why you should avoid static keys and consider workload identity federation as the first choice.

Workload Identity Federation provides OAuth 2.0-based authentication flows that let a third-party identity access Google Cloud. Products like GitHub, GitLab or other cloud service providers are good use cases.
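To make the flow more concrete: the external token is exchanged at Google's Security Token Service for a Google access token, following the OAuth 2.0 token exchange standard (RFC 8693). The Python sketch below only builds the request body. The function names, the pool and provider IDs and the assumption that the external token is an OIDC JWT are mine, and in practice the Google auth libraries perform this exchange for you from a credential configuration file.

```python
import urllib.parse

# Google's Security Token Service endpoint for token exchange.
STS_TOKEN_URL = "https://sts.googleapis.com/v1/token"


def provider_audience(project_number, pool_id, provider_id):
    """Full resource name of a workload identity pool provider."""
    return (
        f"//iam.googleapis.com/projects/{project_number}"
        f"/locations/global/workloadIdentityPools/{pool_id}"
        f"/providers/{provider_id}"
    )


def token_exchange_form(external_token, audience):
    """URL-encoded body for an OAuth 2.0 token exchange (RFC 8693)."""
    return urllib.parse.urlencode({
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "audience": audience,
        "scope": "https://www.googleapis.com/auth/cloud-platform",
        "requested_token_type": "urn:ietf:params:oauth:token-type:access_token",
        # Assumption: the third-party identity provider issues OIDC JWTs.
        "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
        "subject_token": external_token,
    })


if __name__ == "__main__":
    # Hypothetical project number, pool and provider names.
    audience = provider_audience("123456789012", "ci-pool", "ci-provider")
    print(STS_TOKEN_URL)
    print(token_exchange_form("EXTERNAL_OIDC_TOKEN", audience))
```

No long-lived secret appears anywhere in this exchange, which is the whole point compared with a downloaded key file.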

To prevent keys from being created by mistake, you can restrict the creation of service account keys using the organisation policy constraint constraints/iam.disableServiceAccountKeyCreation at project, folder or organisation level.

As a second choice, you can adopt short-lived service account keys using solutions like HashiCorp Vault. With its GCP secrets backend, Vault dynamically creates and revokes credentials for you, and you are free to configure the duration of the lease.

If none of the above solutions works for your case, static credentials are the last resort. You are strongly encouraged to enforce network-level protection to prevent data exfiltration or mining (covered in the next part).

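Keep in mind that service accounts are only required by applications. Users can authenticate with their own identity through gcloud, either for client libraries (Application Default Credentials) or for the gcloud CLI itself: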

$ gcloud auth application-default login
$ gcloud auth login

Authorization layer

Granting a service account only the privileges it needs is called least privilege: give the permissions the application needs to run, and no more.
Basic (or primitive) roles like Owner or Editor are strongly discouraged, and predefined roles may be too broad for applications (but they are a good compromise for your users).
Custom roles are exactly what we want: they contain only the permissions the application needs.
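As an illustration, a custom role is little more than a title and a list of permissions. The Python sketch below builds a body in the shape used by the IAM API's roles.create method. The role ID, title and the choice of pubsub.topics.publish as the single permission for the front service are my own assumptions for this example, not the exact role used in the repository.

```python
import json


def custom_role_body(role_id, title, permissions):
    """Build a request body shaped like the IAM API's roles.create payload."""
    if not permissions:
        raise ValueError("a custom role needs at least one permission")
    return {
        "roleId": role_id,
        "role": {
            "title": title,
            "stage": "GA",
            # De-duplicate and sort so the role definition is stable in reviews.
            "includedPermissions": sorted(set(permissions)),
        },
    }


if __name__ == "__main__":
    # Hypothetical role for the front service: it only publishes messages.
    body = custom_role_body(
        "frontPublisher",
        "Front publisher",
        ["pubsub.topics.publish"],
    )
    print(json.dumps(body, indent=2))
```

Keeping role definitions as data like this makes it easy to review every permission added over time.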

If you want to go deeper into restrictions, IAM bindings can be subject to conditions. Conditions are boolean expressions that limit when a binding applies, based on the resource or the request context (resource name, request time, etc.). Refer to the IAM Conditions documentation for the list of services that support conditions.
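A conditional binding looks like a regular binding with an extra condition block whose expression is written in Common Expression Language (CEL). The Python sketch below builds such a policy fragment as plain data. The member, the title and the expiry date are hypothetical values chosen for illustration, and you should check the documentation to confirm that the target service supports the attribute you use.

```python
def conditional_binding(role, member, title, expression):
    """One IAM binding restricted by a CEL condition."""
    return {
        "role": role,
        "members": [member],
        "condition": {"title": title, "expression": expression},
    }


def policy_with(bindings):
    """Wrap bindings in a policy; conditional bindings require version 3."""
    has_condition = any("condition" in binding for binding in bindings)
    return {"version": 3 if has_condition else 1, "bindings": list(bindings)}


if __name__ == "__main__":
    # Hypothetical example: temporary invoker access that expires on a date.
    binding = conditional_binding(
        "roles/run.invoker",
        "user:me@example.com",
        "expires-end-of-2023",
        'request.time < timestamp("2024-01-01T00:00:00Z")',
    )
    print(policy_with([binding]))
```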

Organization policies and IAM deny policies can also be very useful for restricting the permissions of users and service accounts.

Theory versus Practice

  • My front Cloud Run service is IAM protected and only my user is granted roles/run.invoker on this service
  • The service account used by the “front” Cloud Run service is granted roles/pubsub.publisher on the Pub/Sub topic
  • The service account used by the Pub/Sub subscription is granted roles/run.invoker on the “back” service
  • Finally, the service account used by the “back” service is granted roles/datastore.user on my project
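The bindings above can be written down as data and checked automatically, for example to make sure no basic role slips in. This is a small Python sketch of that idea. The project name, the service account emails and the helper basic_role_bindings are hypothetical names I introduce for the example.

```python
from typing import NamedTuple

# Basic (primitive) roles that should not be granted to applications.
BASIC_ROLES = {"roles/owner", "roles/editor", "roles/viewer"}


class Binding(NamedTuple):
    member: str
    role: str
    resource: str


# Hypothetical identities; the roles and resources follow the list above.
ARCHITECTURE = [
    Binding("user:me@example.com", "roles/run.invoker", "Cloud Run service front"),
    Binding(
        "serviceAccount:front-sa@my-project.iam.gserviceaccount.com",
        "roles/pubsub.publisher",
        "Pub/Sub topic",
    ),
    Binding(
        "serviceAccount:push-sa@my-project.iam.gserviceaccount.com",
        "roles/run.invoker",
        "Cloud Run service back",
    ),
    Binding(
        "serviceAccount:back-sa@my-project.iam.gserviceaccount.com",
        "roles/datastore.user",
        "project my-project",
    ),
]


def basic_role_bindings(bindings):
    """Return the bindings that grant a basic role."""
    return [binding for binding in bindings if binding.role in BASIC_ROLES]


if __name__ == "__main__":
    offenders = basic_role_bindings(ARCHITECTURE)
    print("basic roles found:", offenders)
```

A check like this can run in CI so that a later change granting Editor to a service account is caught before it is deployed.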

I also added default logging and monitoring permissions to the service accounts used by my Cloud Run services.

Pros:

  • Service account keys in Google Cloud are managed by Google
  • Malicious use of managed services is limited by the permissions set for each layer

Cons:

  • No egress traffic filtering on compute resources, so code injection may result in data exfiltration
  • Internet exposure without ingress filtering
  • A bad actor can use a stolen service account key

In the next part, we will see how to improve ingress and egress traffic filtering using VPC Service Controls.

Jérôme NAHELOU
CodeShake

Cloud rider at SFEIR by day, Akita Inu lover #MyAkitaInuIsNotAWolf