Autoscaling Runners on Google Cloud

Jasbir Singh
Google Cloud - Community
7 min read · Aug 30, 2023

GitHub Actions helps you automate your software development workflows. You are probably already familiar with the built-in runners for Windows, Linux, and macOS, but what if your workloads require custom hardware, a specific operating system, or software tools that aren’t available on these runners?

With self-hosted runners, you can host your own runners and customize the environment used to run jobs in your GitHub Actions workflows. You can create custom hardware configurations with the processing power or memory needed to run larger jobs, install software available on your local network, and choose an operating system not offered by GitHub-hosted runners. Self-hosted runners can be physical, virtual, or in a container, and they can run on-premises, on a public cloud like Google Cloud, or on both using a platform like Anthos.

You can add self-hosted runners at various levels in the management hierarchy:

  • Repository-level runners are dedicated to a single repository.
  • Organization-level runners can process jobs for multiple repositories in an organization.
  • Enterprise-level runners can be assigned to multiple organizations in an enterprise account.

Differences between GitHub-Hosted and Self-Hosted Runners

GitHub-hosted runners offer a quicker, simpler way to run your workflows, while self-hosted runners are a highly configurable way to run workflows in your own custom environment.

GitHub-hosted runners:

  • Receive automatic updates for the operating system, preinstalled packages and tools, and the self-hosted runner application.
  • Are managed and maintained by GitHub.
  • Provide a clean instance for every job execution.
  • Use free minutes on your GitHub plan, with per-minute rates applied after surpassing the free minutes.

Self-hosted runners:

  • Receive automatic updates for the self-hosted runner application only, though you may disable automatic updates of the runner. You are responsible for updating the operating system and all other software.
  • Can use cloud services or local machines that you already pay for.
  • Are customizable to your hardware, operating system, software, and security requirements.
  • Don’t need to have a clean instance for every job execution.
  • Are free to use with GitHub Actions, but you are responsible for the cost of maintaining your runner machines.

Autoscaling

You can automatically increase or decrease the number of self-hosted runners in your environment in response to the webhook events you receive with a particular label. You can even create automation that adds a new self-hosted runner each time you receive a workflow_job webhook event with the queued activity, which notifies you that a new job is ready for processing. The webhook payload includes label data, so you can identify the type of runner the job is requesting. Once the job has finished, you can then create automation that removes the runner in response to the workflow_job completed activity.
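
The decision logic described above can be sketched in a few lines of Python. This is only an illustration, not part of the setup built later in this article: the function name scaling_action, the returned strings, and the default label "self-hosted" are assumptions made for the example. Only the action values (queued, completed) and the workflow_job.labels field come from the webhook payload.

```python
from typing import Optional


def scaling_action(payload: dict, wanted_label: str = "self-hosted") -> Optional[str]:
    """Map a workflow_job webhook payload to a scaling decision.

    Returns "add_runner", "remove_runner", or None when nothing should happen.
    The return values are hypothetical names chosen for this sketch.
    """
    job = payload.get("workflow_job", {})
    labels = job.get("labels", [])
    if wanted_label not in labels:
        return None  # the job asks for a different type of runner
    action = payload.get("action")
    if action == "queued":
        return "add_runner"  # a new job is waiting for a runner
    if action == "completed":
        return "remove_runner"  # the job finished, the runner can go away
    return None  # e.g. "in_progress": no scaling needed


if __name__ == "__main__":
    queued = {"action": "queued", "workflow_job": {"labels": ["self-hosted", "linux"]}}
    print(scaling_action(queued))
```

In a real deployment the returned decision would be handed to whatever creates or deletes the runner machine.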

Ephemeral Runners

GitHub recommends implementing autoscaling with ephemeral self-hosted runners; autoscaling with persistent self-hosted runners is not recommended. In certain cases, GitHub cannot guarantee that jobs are not assigned to persistent runners while they are shut down. With ephemeral runners, this can be guaranteed because GitHub only assigns one job to a runner.

This approach allows you to manage your runners as ephemeral systems, since you can use automation to provide a clean environment for each job. This helps limit the exposure of any sensitive resources from previous jobs, and also helps mitigate the risk of a compromised runner receiving new jobs.

To add an ephemeral runner to your environment, include the --ephemeral parameter when registering your runner using config.sh. For example:

./config.sh --url https://github.com/octo-org --token example-token --ephemeral

Controlling Software Updates

By default, self-hosted runners will automatically perform a software update whenever a new version of the runner software is available. If you use ephemeral runners in containers then this can lead to repeated software updates when a new runner version is released. Turning off automatic updates allows you to update the runner version on the container image directly on your own schedule.

To turn off automatic software updates and install software updates yourself, specify the --disableupdate flag when registering your runner using config.sh. For example:

./config.sh --url https://github.com/YOUR-ORGANIZATION --token EXAMPLE-TOKEN --disableupdate

If you disable automatic updates, you must still update your runner version regularly. New functionality in GitHub Actions requires changes in both the GitHub Actions service and the runner software. The runner may not be able to correctly process jobs that take advantage of new features in GitHub Actions without a software update.
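
If you generate the registration command from automation, the two flags above can be toggled independently. The helper below is a minimal sketch under that assumption; config_command and its parameters are hypothetical names, while the flags themselves (--unattended, --url, --token, --labels, --ephemeral, --disableupdate) are the ones used in this article.

```python
import shlex
from typing import List, Sequence


def config_command(
    url: str,
    token: str,
    labels: Sequence[str] = (),
    ephemeral: bool = True,
    disable_update: bool = False,
) -> List[str]:
    """Build the argument list for registering a runner with config.sh."""
    cmd = ["./config.sh", "--unattended", "--url", url, "--token", token]
    if labels:
        cmd += ["--labels", ",".join(labels)]
    if ephemeral:
        cmd.append("--ephemeral")  # runner accepts a single job
    if disable_update:
        cmd.append("--disableupdate")  # runner version is managed by you
    return cmd


if __name__ == "__main__":
    cmd = config_command(
        "https://github.com/octo-org", "example-token", disable_update=True
    )
    print(shlex.join(cmd))
```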

Elastic GitHub Runners in Google Cloud Platform

Now I will show how to create elastic GitHub self-hosted runners on demand that scale to zero when idle. Having idle runners waiting for jobs is a waste of resources and can be very expensive for organisations that have hundreds of runners.

Requirements

You will need access to GCP; the services we are going to use are Secret Manager, Cloud Build, and Compute Engine. You also need access to GitHub and permission to generate a registration token.

Cloud Build Webhook Trigger

We are going to configure a Cloud Build Webhook Trigger to run when a job is queued. The trigger will spin up a VM that will register a new runner. The new runner will be registered using the --ephemeral flag.

Cloud Build can be triggered by code changes but also via webhooks, and it can extract information from the payload sent by the caller. So, we are going to pass the following variables to the Cloud Build configuration.

Substitution variables:

  • _ACTION = $(body.action)
  • _JOB_NAME = $(body.workflow_job.name)
  • _ORG_NAME = $(body.organization.login)
  • _REPO_FULLNAME = $(body.repository.full_name)
  • _REPO_NAME = $(body.repository.name)
  • _RUNNER_LABELS = $(body.workflow_job.labels)
  • _TIMEOUT = 600
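
To make the bindings concrete, here is a small Python sketch that resolves the same expressions against a sample payload. It only mimics the idea and is not Cloud Build's implementation: the resolve function is hypothetical, the sample body is trimmed to the fields used here, and rendering the labels list as compact JSON is an assumption inferred from the way the build step later pipes _RUNNER_LABELS through jq.

```python
import json
import re

BINDINGS = {
    "_ACTION": "$(body.action)",
    "_JOB_NAME": "$(body.workflow_job.name)",
    "_ORG_NAME": "$(body.organization.login)",
    "_REPO_FULLNAME": "$(body.repository.full_name)",
    "_REPO_NAME": "$(body.repository.name)",
    "_RUNNER_LABELS": "$(body.workflow_job.labels)",
    "_TIMEOUT": "600",
}

# Matches expressions of the form $(body.some.path)
_BODY_PATH = re.compile(r"^\$\(body\.([A-Za-z0-9_.]+)\)$")


def resolve(bindings: dict, body: dict) -> dict:
    """Resolve $(body.x.y) expressions against a webhook payload."""
    resolved = {}
    for name, expr in bindings.items():
        match = _BODY_PATH.match(expr)
        if match is None:
            resolved[name] = expr  # plain literal such as "600"
            continue
        value = body
        for key in match.group(1).split("."):
            value = value[key]  # walk down the payload one key at a time
        if not isinstance(value, str):
            # Assumption: non-string values are passed on as compact JSON.
            value = json.dumps(value, separators=(",", ":"))
        resolved[name] = value
    return resolved


if __name__ == "__main__":
    sample_body = {
        "action": "queued",
        "workflow_job": {"name": "build", "labels": ["self-hosted", "linux"]},
        "organization": {"login": "octo-org"},
        "repository": {"name": "demo", "full_name": "octo-org/demo"},
    }
    for key, value in resolve(BINDINGS, sample_body).items():
        print(f"{key} = {value}")
```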

gcloud commands on the terminal

Create a file named build-config.yaml with this content:

steps:
# Step 1: write the cloud-init config (ci.yml) that provisions the runner VM.
- name: gcr.io/cloud-builders/gcloud
  entrypoint: 'bash'
  args:
  - '-c'
  - |
    #### Create ci.yml
    cat > /workspace/ci.yml <<- EOF
    #cloud-config
    users:
    - name: ghr
      groups: docker
      homedir: /home/ghr
      uid: 2000
    package_upgrade: true
    yum_repos:
      docker-ce:
        name: Docker CE Stable - \$basearch
        baseurl: https://download.docker.com/linux/centos/\$releasever/\$basearch/stable
        enabled: true
        gpgcheck: true
        gpgkey: https://download.docker.com/linux/centos/gpg
    packages:
    - yum-utils
    - git
    - docker-ce
    - docker-ce-cli
    - containerd.io
    write_files:
    - path: /etc/systemd/system/shutdown.service
      permissions: 0644
      owner: root
      content: |
        [Unit]
        Description=Shutdown Service
        [Service]
        Type=simple
        Restart=no
        ExecStart=/bin/bash /etc/systemd/system/shutdown.sh
        [Install]
        WantedBy=multi-user.target
    - path: /etc/systemd/system/ghr.service
      permissions: 0644
      owner: root
      content: |
        [Unit]
        Description=GitHub Self-Hosted Runner
        Requires=network.target
        After=network.target
        [Service]
        User=ghr
        Type=oneshot
        Restart=no
        WorkingDirectory=/home/ghr
        Environment="HOME=/home/ghr"
        ExecStartPre=/bin/bash /home/ghr/pre-start.sh
        ExecStart=/bin/bash /home/ghr/run.sh
        ExecStartPost=/bin/bash /home/ghr/terminate.sh
        [Install]
        WantedBy=multi-user.target
    - path: /home/ghr/pre-start.sh
      permissions: 0755
      owner: root
      content: |
        #!/usr/bin/env bash
        set -euo pipefail
        REGISTRATION_TOKEN=\$(curl -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/attributes/REGISTRATION_TOKEN)
        RUNNER_LABELS=\$(curl -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/attributes/RUNNER_LABELS)
        REPO_FULLNAME=\$(curl -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/attributes/REPO_FULLNAME)
        curl -o /home/ghr/actions-runner-linux-x64-2.285.1.tar.gz -L https://github.com/actions/runner/releases/download/v2.285.1/actions-runner-linux-x64-2.285.1.tar.gz
        tar xzf /home/ghr/actions-runner-linux-x64-2.285.1.tar.gz
        /home/ghr/config.sh --unattended --url https://github.com/\$${REPO_FULLNAME} --token \$${REGISTRATION_TOKEN} --labels \$${RUNNER_LABELS} --ephemeral
    - path: /home/ghr/terminate.sh
      permissions: 0755
      owner: root
      content: |
        #!/usr/bin/env bash
        set -euo pipefail
        NAME=\$(curl -X GET http://metadata.google.internal/computeMetadata/v1/instance/name -H 'Metadata-Flavor: Google')
        ZONE=\$(curl -X GET http://metadata.google.internal/computeMetadata/v1/instance/zone -H 'Metadata-Flavor: Google')
        gcloud --quiet compute instances delete \$$NAME --zone=\$$ZONE
    - path: /etc/systemd/system/shutdown.sh
      permissions: 0755
      owner: root
      content: |
        #!/usr/bin/env bash
        set -euo pipefail
        NAME=\$(curl -X GET http://metadata.google.internal/computeMetadata/v1/instance/name -H 'Metadata-Flavor: Google')
        ZONE=\$(curl -X GET http://metadata.google.internal/computeMetadata/v1/instance/zone -H 'Metadata-Flavor: Google')
        TIMEOUT=\$(curl -X GET http://metadata.google.internal/computeMetadata/v1/instance/attributes/TIMEOUT -H 'Metadata-Flavor: Google')
        re='^[0-9]+$'
        if [[ \$$TIMEOUT =~ \$$re ]] ; then
          echo "TIMEOUT is a valid number" >&2
          sleep \$$TIMEOUT
          shutdown +2
          gcloud --quiet compute instances delete \$$NAME --zone=\$$ZONE
        fi
    runcmd:
    - chown -R ghr /home/ghr
    - systemctl daemon-reload
    - systemctl start docker
    - systemctl start shutdown.service
    - systemctl start ghr.service
    EOF
# Step 2: write the startup script that installs cloud-init on first boot.
- name: gcr.io/cloud-builders/gcloud
  entrypoint: 'bash'
  args:
  - '-c'
  - |
    #### Create startup.sh
    # \$(date) is escaped so it runs on the VM, not while the file is written.
    cat > /workspace/startup.sh <<- EOF
    #!/bin/bash

    if ! type cloud-init > /dev/null 2>&1 ; then
      echo "Ran - \$(date)" >> /root/startup
      sleep 30
      yum install -y cloud-init

      if [ \$? == 0 ]; then
        echo "Ran - Success - \$(date)" >> /root/startup
        systemctl enable cloud-init
      else
        echo "Ran - Fail - \$(date)" >> /root/startup
      fi

      # Reboot either way
      reboot
    fi
    EOF
# Step 3: write the shutdown script that unregisters the runner.
- name: gcr.io/cloud-builders/gcloud
  entrypoint: 'bash'
  args:
  - '-c'
  - |
    #### Create shutdown.sh
    cat > /workspace/shutdown.sh <<- EOF
    #!/bin/bash

    export RUNNER_ALLOW_RUNASROOT=1
    REGISTRATION_TOKEN=\$(curl -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/attributes/REGISTRATION_TOKEN)
    /home/ghr/config.sh remove --token \$${REGISTRATION_TOKEN}
    EOF
# Step 4: request a registration token from GitHub and create the runner VM.
- name: gcr.io/cloud-builders/gcloud
  entrypoint: 'bash'
  args:
  - '-c'
  - |
    apt update && apt install jq -y
    REGISTRATION_TOKEN=$(curl -H "Authorization: token $$GITHUB_TOKEN" -X POST https://api.github.com/repos/${_REPO_FULLNAME}/actions/runners/registration-token | jq -r .token)
    RUNNER_NAME=ghr-$(cat /proc/sys/kernel/random/uuid | sed 's/[-]//g' | head -c 6; echo;)
    RUNNER_LABELS=$(echo '${_RUNNER_LABELS}' | jq -r '@csv' | sed 's/"//g')
    gcloud compute instances create $$RUNNER_NAME \
      --project=$PROJECT_ID \
      --zone=europe-west1-b \
      --machine-type=e2-standard-2 \
      --network-interface=network-tier=PREMIUM,subnet=runners \
      --maintenance-policy=TERMINATE \
      --service-account=github-runner@$PROJECT_ID.iam.gserviceaccount.com \
      --scopes=https://www.googleapis.com/auth/cloud-platform \
      --create-disk=auto-delete=yes,boot=yes,device-name=instance-1,image=projects/centos-cloud/global/images/centos-7-v20211214,mode=rw,size=20,type=projects/$PROJECT_ID/zones/europe-west1-b/diskTypes/pd-ssd \
      --metadata=^:^REGISTRATION_TOKEN=$$REGISTRATION_TOKEN:RUNNER_LABELS=$$RUNNER_LABELS:REPO_FULLNAME=${_REPO_FULLNAME}:TIMEOUT=${_TIMEOUT} \
      --metadata-from-file user-data=/workspace/ci.yml,startup-script=/workspace/startup.sh,shutdown-script=/workspace/shutdown.sh \
      --no-shielded-secure-boot \
      --preemptible \
      --no-restart-on-failure \
      --shielded-vtpm \
      --shielded-integrity-monitoring \
      --reservation-affinity=any \
      --labels org_name=${_ORG_NAME},repo_name=${_REPO_NAME},job_name=${_JOB_NAME}
  secretEnv: ['GITHUB_TOKEN']
options:
  logging: CLOUD_LOGGING_ONLY
availableSecrets:
  secretManager:
  - versionName: projects/$PROJECT_NUMBER/secrets/github-token/versions/latest
    env: 'GITHUB_TOKEN'
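
Two one-liners in the last build step are easy to misread: the runner name is "ghr-" plus the first six hex characters of a random UUID, and the labels are turned from a JSON array into a comma-separated string. The Python sketch below reproduces both transformations for illustration only; labels_to_csv and runner_name are hypothetical helper names, and the sketch assumes simple labels without commas or quotes.

```python
import json
import uuid


def labels_to_csv(labels_json: str) -> str:
    """Turn a JSON array of labels into the comma-separated form config.sh expects.

    Mirrors: echo '<json>' | jq -r '@csv' | sed 's/"//g'
    """
    labels = json.loads(labels_json)
    return ",".join(str(label) for label in labels)


def runner_name(prefix: str = "ghr") -> str:
    """Build a short random instance name such as ghr-3f9a1c.

    Mirrors: ghr-$(cat /proc/sys/kernel/random/uuid | sed 's/[-]//g' | head -c 6)
    """
    return f"{prefix}-{uuid.uuid4().hex[:6]}"


if __name__ == "__main__":
    print(labels_to_csv('["self-hosted","linux"]'))
    print(runner_name())
```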

Now we can create the trigger with the following command:

PROJECT_ID="THE-PROJECT-ID"
SA="projects/${PROJECT_ID}/serviceAccounts/runner-bootstrap@${PROJECT_ID}.iam.gserviceaccount.com"
gcloud alpha builds triggers create webhook \
--name=elastic-runner-webhook \
--secret=projects/$PROJECT_ID/secrets/webhook-secret/versions/latest \
--substitutions=_ACTION='$(body.action)',_JOB_NAME='$(body.workflow_job.name)',_ORG_NAME='$(body.organization.login)',_REPO_FULLNAME='$(body.repository.full_name)',_REPO_NAME='$(body.repository.name)',_RUNNER_LABELS='$(body.workflow_job.labels)',_TIMEOUT=600 \
--filter='_ACTION == "queued"' \
--service-account=$SA \
--inline-config=build-config.yaml

Now that our trigger is created, let’s configure the webhook on GitHub. From your GitHub repo, click on Settings -> Webhooks.

Now click on Add webhook

In the Payload URL field, paste the Cloud Build webhook trigger URL copied from the trigger configuration in Cloud Build.

Now scroll down and click on “Let me select individual events.” We only want to be triggered when a job is queued, so we only select Workflow jobs. Make sure Active is selected and click on Add webhook.

Jasbir Singh

Consulting Cloud Architect, Public Cloud@Rackspace Technology