Secure Deployments from Gitlab to Google Cloud Platform

Paul Durivage
Google Cloud - Community
7 min read · Feb 8, 2022

Time to come clean: do you feel the pangs of guilt whenever you generate keys for your Google Cloud Platform Service Accounts and upload them to Gitlab for your Gitlab CI pipelines? Do you lose sleep at night over all the service account keys lying around in plain text, stuffed into CI variables? Do you joke to yourself that a hacker will inevitably compromise your keys and perhaps waste thousands of dollars mining cryptocurrency on your employer’s dime?

If you’re doing this now, all it takes is one Gitlab permission misconfiguration for your keys to be at risk of exposure. Then you’re in real trouble. Using manually generated keys not only invites risk, it violates one of the core, implied, least-talked-about principles of service account best practices: don’t use manually generated keys if you don’t have to! This is such a frustrating and obvious problem. There must be a better way to use authenticated Google APIs from Gitlab.

I find that it’s always hardest to do the right thing when I’m in a hurry, and in the course of my work with many of Google Cloud’s customers, I’m always trying to get pipelines going with tremendous urgency, when time is at a premium, at a point in my projects where I can least afford technical debt! How can I do the right thing with the least amount of effort?

How do I give Gitlab CI pipelines access to Google credentials without downloading and uploading manually-generated service account keys?

Run Your Own Runners

Gitlab Runner is an application that works with Gitlab CI/CD to run pipeline jobs, and it can be installed on your own infrastructure. This means that we can install Gitlab Runner to instances under our control in any GCP environment. By installing to a Google Compute Engine instance, we can associate a user-managed service account with that instance. Gitlab Runner jobs scheduled to that host will automatically assume the GCP service account credentials associated with that host without ever needing to manually generate, download, and upload service account keys.
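
For example, associating a user-managed service account when creating a runner instance might look like the following sketch (the instance name, zone, project, and service account email are illustrative placeholders):

# Illustrative only: create a runner VM with a user-managed service account attached.
# The instance name, zone, and service account email are placeholders.
gcloud compute instances create gitlab-runner-1 \
  --zone=us-central1-a \
  --image-family=cos-stable \
  --image-project=cos-cloud \
  --service-account=gitlab-runner@my-gcp-project.iam.gserviceaccount.com \
  --scopes=cloud-platform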

Another benefit to using self-hosted runners in your own environment is that you may place the runner on or adjacent to your GCP environment’s network, giving jobs the ability to connect directly to services running in your private VPC. This is particularly useful for deploying workloads to Google Kubernetes Engine when the cluster control plane is configured to be fully private. Typically, in order to connect to fully private clusters, one must create and connect to a bastion instance which proxies traffic across your private network to the control plane endpoint. However, when your Gitlab Runner is on your VPC, you may connect directly from your Gitlab CI jobs: no proxies, no migraines, no stress.

We are not limited to installing Gitlab Runner on GCE instances; it can run in any general compute environment, including on GKE.

The Easy Way

Perhaps the simplest way to integrate Gitlab Runner into your environment is by using the runner’s Docker executor on a GCE instance running Container-Optimized OS. A startup script installs and registers the Gitlab Runner, which then connects to Gitlab’s hosted SaaS platform, or optionally to a self-hosted Gitlab instance running on-premises or in another cloud platform.

Example: Self-Hosted Gitlab Runner deploying Infrastructure as Code

In the above diagram we see the simplest possible example architecture, using Gitlab Runner to deploy infrastructure as code to Google Cloud Platform. While the code (as in Infrastructure as Code) is hosted externally to GCP on Gitlab’s SaaS platform, the runner, once registered, accepts CI jobs from Gitlab SaaS. In this case, a job to deploy infrastructure as code like Terraform would inherit its Google identity from the service account associated with the runner instance. That service account has IAM permissions assigned as needed to deploy its infrastructure in GCP. No credentials are ever manually generated, downloaded, or exposed to the CI job; a short-lived token is simply made available by GCP to the instance via its metadata server.
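
To illustrate the mechanism, a job on the runner instance could fetch one of these short-lived tokens from the metadata server itself, though in practice gcloud and the Google client libraries do this for you automatically:

# From a job on the runner instance: fetch a short-lived access token for the attached
# service account from the GCE metadata server (shown only to illustrate the mechanism).
curl -s -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token"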

Self-Hosted Gitlab Runner deploying to GKE

Another extremely common use case for Gitlab CI is deploying Kubernetes workloads to private GKE clusters, which is pictured above. In contrast to the previous example, this architecture’s Gitlab Runner is adjacent to the cluster that is the target of our deployment; to be specific, it resides in the same VPC. This is required because when a GKE cluster is created, GCP establishes a VPC peering connection between your VPC, where your cluster nodes reside, and GCP’s service producer network, where the cluster’s private endpoint resides. This is represented by the dotted blue line in the above diagram. When clusters are private, their endpoint is only accessible from your end of the peering connection, on your private VPC, transiting this peering into the service producer network. As such, when we put our runner on the same VPC as our GKE cluster, we can utilize this peering connection to connect directly to the cluster’s endpoint and use its API to deploy our workloads.
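
As a rough sketch of what such a deploy job might look like (the cluster name, region, job image, and manifest path are all assumptions), the job resolves credentials against the cluster’s internal endpoint and applies manifests directly over the peering:

deploy-gke:
  stage: deploy
  # Assumes a job image that ships both gcloud and kubectl.
  image: google/cloud-sdk:latest
  script:
    # gcloud picks up the runner instance's service account from the metadata server;
    # --internal-ip targets the cluster's private endpoint across the VPC peering.
    - gcloud container clusters get-credentials my-private-cluster --region us-central1 --internal-ip
    - kubectl apply -f k8s/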

Example Code

The following startup script is intended to be used with an image that has Docker installed by default, such as Container-Optimized OS. The only piece of data that must be provided is the Gitlab Runner registration token, which can be scoped to be shared with all of your groups and projects or minimally scoped to a single project.

Gitlab Runner Registration Startup Script
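
A minimal sketch of such a startup script, assuming Gitlab SaaS, the Docker executor, and a registration token injected via templating (see the Terraform snippets further below), might look like this:

#!/bin/bash
# Minimal sketch of a Container-Optimized OS startup script that runs Gitlab Runner
# in Docker and registers it against Gitlab SaaS. The ${runner_registration_token}
# placeholder is meant to be filled in by the Terraform template shown later.
set -euo pipefail

# Run the Gitlab Runner service as a container, persisting its config on the host.
docker run -d --name gitlab-runner --restart always \
  -v /etc/gitlab-runner:/etc/gitlab-runner \
  -v /var/run/docker.sock:/var/run/docker.sock \
  gitlab/gitlab-runner:latest

# Register this runner with Gitlab using the Docker executor; the running service
# picks up the new configuration automatically.
docker run --rm \
  -v /etc/gitlab-runner:/etc/gitlab-runner \
  gitlab/gitlab-runner:latest register \
  --non-interactive \
  --url "https://gitlab.com/" \
  --registration-token "${runner_registration_token}" \
  --executor docker \
  --docker-image "docker:stable"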

In order to reflect modern standards for deploying cloud resources, let’s make this more supportable with something we can template and deploy with Terraform as infrastructure as code. Terraform code snippets are included below.

Terraform Template File Resource for Gitlab Runner Startup Script
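
A hedged sketch of this templating, using Terraform’s built-in templatefile() function rather than the older template_file data source; the template path and variable name are illustrative:

# Render the runner startup script from a template, injecting the registration token.
variable "runner_registration_token" {
  type        = string
  description = "Gitlab Runner registration token"
  sensitive   = true
}

locals {
  runner_startup_script = templatefile("${path.module}/templates/gitlab-runner-startup.sh.tpl", {
    runner_registration_token = var.runner_registration_token
  })
}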

Using Terraform, we can automate the creation of Gitlab Runners in their most basic configuration. In the Terraform snippet above, we create a reusable startup script template which we can reference when creating a runner instance. Below, we use that startup script template to create a managed instance group with one node, ensuring that at least one Gitlab Runner instance is always running to accept CI/CD jobs:

Gitlab Runner Managed Instance Group resource definition for Terraform
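
A plausible sketch of that definition, with machine type, image, network, zone, and naming as placeholder assumptions:

# Instance template plus a single-node managed instance group for the runner.
resource "google_compute_instance_template" "gitlab_runner" {
  name_prefix  = "gitlab-runner-"
  machine_type = "e2-standard-2"

  disk {
    source_image = "cos-cloud/cos-stable"
    boot         = true
    auto_delete  = true
  }

  network_interface {
    # Placeholder network; use your own VPC/subnetwork in practice.
    network = "default"
  }

  metadata = {
    # The templated startup script from the previous snippet.
    startup-script = local.runner_startup_script
  }

  # The service account whose identity Gitlab Runner jobs will inherit
  # (created in the snippet following the note below).
  service_account {
    email  = google_service_account.gitlab_runner.email
    scopes = ["cloud-platform"]
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "google_compute_instance_group_manager" "gitlab_runner" {
  name               = "gitlab-runner"
  zone               = "us-central1-a"
  base_instance_name = "gitlab-runner"
  target_size        = 1

  version {
    instance_template = google_compute_instance_template.gitlab_runner.id
  }
}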

Note: Be aware of the service_account block in the instance template above. This is the service account Gitlab Runner jobs will use to access Google APIs. If this block is not set, the instance uses the Compute Engine default service account, which by design is highly privileged and likely not at all representative of the IAM permissions you wish to assign to your Gitlab Runner jobs. Please be sure to create your own service account with IAM permissions scoped no more broadly than necessary to perform your CI/CD tasks.
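
As a sketch, creating a dedicated runner service account and granting it a narrowly scoped role might look like this (the project ID and role are examples only; grant whatever your pipelines actually need, and nothing more):

# Dedicated, least-privileged service account for Gitlab Runner jobs.
resource "google_service_account" "gitlab_runner" {
  account_id   = "gitlab-runner"
  display_name = "Gitlab Runner CI/CD"
}

# Example grant only; the project ID and role are placeholders.
resource "google_project_iam_member" "gitlab_runner_gke" {
  project = "my-gcp-project"
  role    = "roles/container.developer"
  member  = "serviceAccount:${google_service_account.gitlab_runner.email}"
}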

Segregating Access By Environment

By now we understand how we can leverage Gitlab Runners hosted on GCE, using a single GCE instance to run Gitlab CI jobs, to give Gitlab CI privileged access to GCP without explicitly creating, downloading, and then uploading service account credentials to Gitlab. Simple enough.

But how might we quickly adapt what we have just learned to support more granular access to GCP, such as supporting a deployment into a single environment (like production) without also granting IAM access to all other environments?

One useful feature that we can leverage is Gitlab CI tags, which allow one to specify a tag, or several, corresponding to a Gitlab Runner instance. This ensures jobs with a given tag, like prod for example, only run on specific runners.

Below is an example block from a .gitlab-ci.yml pipeline file which demonstrates how one would specify a build tag:

build-app:
  stage: build
  tags:
    - prod

But how will our registered Gitlab Runners know which tags they are responsible for running? This is important: we must specify this when we register our runners with Gitlab. Note the additional command line arguments in the registration command with regard to tags:

Gitlab Runner registration supporting CI tags
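
A sketch of the relevant change follows: the same register command as before, now with a tag list added and untagged jobs disabled.

# Register the runner with a tag list so that only jobs tagged "prod" are scheduled
# here, and disable untagged jobs.
docker run --rm \
  -v /etc/gitlab-runner:/etc/gitlab-runner \
  gitlab/gitlab-runner:latest register \
  --non-interactive \
  --url "https://gitlab.com/" \
  --registration-token "${runner_registration_token}" \
  --executor docker \
  --docker-image "docker:stable" \
  --tag-list "prod" \
  --run-untagged="false"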

Evolving this Strategy

While using Google Compute Engine to host Gitlab Runners is a reasonable approach, it does come with some notable drawbacks. For instance, this strategy supports neither autoscaling nor scale-to-zero for cost savings. If the instance runs out of compute resources, your build jobs will inevitably suffer from resource contention and slowdown. Conversely, if no build jobs are available, the runner instance will continue to run in perpetuity, racking up compute charges while there is no active compute work to support. Last, and perhaps most notably, one would need to create an instance, runner registration, and service account per required isolated deployment identity. If, for example, you needed many deployments to have segregated, least-privileged access to GCP granted on a per-deployment basis, creating a GCE instance for each deployment probably doesn’t seem reasonable.

What is an engineer to do?

In my next blog post, we’ll discuss how you can combine what we have learned about custom Gitlab Runners with Google Kubernetes Engine and Workload Identity Federation, which enables workloads running in a GKE namespace to access a GCP identity in much the same way that workloads on Google Compute Engine do. This enables one to take advantage of all the scaling solutions available in Kubernetes and GKE (both up and down) while giving runners access to identities with the fewest IAM privileges, using something that is relatively cheap and lightweight (Kubernetes namespaces) instead of something far more heavy-handed and costly (compute instances).
