Self-hosted runner vs. GitHub-hosted runner

mogavenasan
ServiceRocket Engineering
Dec 23, 2022

Background

I’m Moga, I’m a Software Engineer at ServiceRocket and I’m part of the Developer Experience squad. The Developer Experience squad is a Platform type team — our Vision is to help Stream-Aligned teams move faster by defining Principles, Practices and Patterns, removing redundancy and providing psychological safety for Software Engineering here in ServiceRocket.

One of the major decisions we took quite some time ago was to decommission our Jenkins and replace it with GitHub Actions. The maintenance cost of Jenkins was slowly chipping away at our engineers’ time, and GitHub Actions was the answer we were looking for — fully managed, secure and already included within our GitHub subscription.

Along the way, while all the squads were slowly migrating their pipelines to GitHub Actions, one of our squads, which supports one of our chunkiest apps, ran into a problem with their integration tests and E2E tests in GitHub Actions. After looking into the problem, we discovered that the default GitHub Actions runner machine is not powerful enough to run these tests. The standard GitHub Actions runner machine (for Linux) comes with a 2-core CPU (x86_64), 7 GB of RAM and 14 GB of SSD space; see Supported runners and hardware resources.

Coincidentally, at the same time, GitHub announced the GitHub-hosted larger runner. Currently, this feature is in beta and you will need to send a request to GitHub to enable it for your GitHub organization.

For comparison, we took a job from our current Jenkins and performed some estimation of how it would look in GitHub Actions.

  • Machine specs:
    - CPU: 4 cores
    - Memory: 16GB
    - OS: Ubuntu
    - Instance type: m6i.xlarge (Spot instance)
  • Job:
    - Execution duration: ~2h (~1h for each job that runs in parallel)
    - Frequency (in a month): 12–15 (on the main branch)
    - Triggered based on: push to any branch

For simplicity’s sake, we will just look at the main branch to get an idea of how it would look.

Pricing comparison (AWS vs. GHA larger runners)

  • Based on 15 runs on the main branch (15 x 2h = 30h).

[Table: Pricing comparison (AWS vs. GHA larger runner)]
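The comparison table itself was an image, but the back-of-the-envelope arithmetic behind it can be sketched. The prices below are rough late-2022 list prices and are assumptions for illustration only (m6i.xlarge on-demand at about $0.192/h in us-west-2, Spot often 60–70% cheaper, and the 4-core Linux larger runner billed at $0.016/min), not a quote:

```shell
# Illustrative monthly cost arithmetic for main-branch runs only.
hours=30   # 15 runs x 2h of machine time
awk -v h="$hours" 'BEGIN {
  on_demand = h * 0.192          # AWS on-demand, assumed ~$0.192/h
  spot      = on_demand * 0.35   # assuming a ~65% Spot discount
  larger    = h * 60 * 0.016     # GHA larger runner, billed per minute
  printf "AWS on-demand: $%.2f\n", on_demand
  printf "AWS Spot:      $%.2f\n", spot
  printf "GHA larger:    $%.2f\n", larger
}'
```

Under these assumed prices, 30 machine-hours comes to roughly $5.76 on-demand, about $2 on Spot, and $28.80 on a GHA larger runner, which is the shape of the gap that drove our decision.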

The idea behind the AWS (Spot) and AWS (On-demand) (spin-for-use) options is that we will have a custom shared GitHub Action that provisions the resources in AWS and, once we are done with the runner, destroys them.

Pros and cons

The pros and cons of each approach are documented below:

[Table: Pros and cons]

Decision

Well, we went with AWS (Spot) for obvious reasons. It is already the existing design and we just needed to implement something very minimal to get it to work with GitHub Actions — a custom shared GitHub Action.

The idea was taken from ec2-github-runner, but we wanted it to be more templatized and automated: the EC2 instance gets the latest AMI and runs in a specific AWS account, in a specific VPC and subnet, so we can be sure all the runners run in the architecture we designed and have peace of mind that they are all configured securely. Templatizing also removed the cognitive load from the squads — they do not have to worry about the AMI ID, how the network should be configured, and so on — provisioning and destroying the runners is very close to plug-and-play.

Custom GitHub Action

The usage will look something like the following — you will have a job that provisions the runner and another job that destroys it once it is no longer needed, and both can exist in the same GitHub Actions workflow.

- name: Start runner / Destroy runner
  id: start-runner / destroy-runner
  uses: servicerocket/actions/self-hosted-runner@{ref}
  with:
    # AWS access key id (do not change the value)
    aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
    # AWS secret access key (do not change the value)
    aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
    # Robot GitHub token (do not change the value)
    robot-token: ${{ secrets.ROBOT_TOKEN }}
    # Squad identifier for AWS resources
    team:
    # Operation mode; start/stop
    mode:
    # Number of runner machines (optional, only for mode: start, default 1)
    capacity:
    # Indicates that this is a test; true/false (optional, only for mode: start, default false)
    # By default (test: false), the action provisions m6i.xlarge instance(s);
    # with test: true, it provisions t2.micro instance(s).
    test:
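Putting the pieces together, an end-to-end workflow might look roughly like the following sketch. The job names, the team value and the test command here are illustrative, not part of the action; the provisioned runner is targeted via its stack-name label, and the teardown job uses if: always() so the instance is destroyed even when the tests fail.

```yaml
name: integration-tests
on:
  push:

jobs:
  start-runner:
    runs-on: ubuntu-latest
    outputs:
      runner-name: ${{ steps.start-runner.outputs.runner-name }}
    steps:
      - name: Start runner
        id: start-runner
        uses: servicerocket/actions/self-hosted-runner@{ref}
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          robot-token: ${{ secrets.ROBOT_TOKEN }}
          team: my-squad
          mode: start

  integration-tests:
    needs: start-runner
    # The runner is registered with the stack name as a label
    runs-on: ${{ needs.start-runner.outputs.runner-name }}
    steps:
      - uses: actions/checkout@v3
      - run: make integration-tests

  destroy-runner:
    needs: [start-runner, integration-tests]
    if: ${{ always() }}   # tear down even if the tests fail
    runs-on: ubuntu-latest
    steps:
      - name: Destroy runner
        uses: servicerocket/actions/self-hosted-runner@{ref}
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          robot-token: ${{ secrets.ROBOT_TOKEN }}
          team: my-squad
          mode: stop
```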

In the background, the custom action will do the following:

name: 'Manage Self Hosted Runner'
description: "A shared action to create/destroy AWS Self Hosted GitHub Action runner"
author: mogavenasan, raeveen
inputs:
  aws-access-key-id:
    description: "The aws-access-key-id used to authenticate with AWS"
    required: true
  aws-secret-access-key:
    description: "The aws-secret-access-key used to authenticate with AWS"
    required: true
  robot-token:
    description: "ROBOT GitHub token"
    required: true
  team:
    description: "Team tag for AWS resources"
    required: true
  mode:
    description: "Operation type; start/stop"
    required: true
  capacity:
    description: "Number of nodes to provision"
    default: "1"
    required: false
  test:
    description: "To provision smaller instance(s) for test"
    default: "false"
    required: false
outputs:
  runner-name:
    description: "Self hosted runner name"
    value: ${{ steps.set-env.outputs.runner-name }}
runs:
  using: "composite"
  steps:
    - name: Set environment variables
      id: set-env
      run: |
        echo "run_id=${{ github.run_id }}" >> $GITHUB_ENV
        echo "repo_owner=${{ github.repository_owner }}" >> $GITHUB_ENV
        echo "repo_name=${{ github.event.repository.name }}" >> $GITHUB_ENV
        stack_name=${{ github.repository_owner }}-${{ github.event.repository.name }}-${{ github.run_id }}
        echo "stack_name=$stack_name" >> $GITHUB_ENV
        echo "runner-name=$stack_name" >> $GITHUB_OUTPUT
      shell: bash
    - name: Configure AWS credentials
      uses: aws-actions/configure-aws-credentials@v1
      with:
        aws-access-key-id: ${{ inputs.aws-access-key-id }}
        aws-secret-access-key: ${{ inputs.aws-secret-access-key }}
        aws-region: us-west-2
    - name: Get latest version of AMI ID
      if: ${{ inputs.mode == 'start' }}
      run: |
        ubuntu_release_url="https://changelogs.ubuntu.com/meta-release-lts"
        ubuntu_release_info=$(curl -s "$ubuntu_release_url")
        ubuntu_name=$(echo "$ubuntu_release_info" \
          | grep Dist: \
          | tail -n1 \
          | sed 's/Dist: //'
        )
        ubuntu_version=$(echo "$ubuntu_release_info" \
          | grep Version: \
          | tail -n1 \
          | sed 's/Version: //;s/ LTS//' \
          | awk -F '.' '{print $1"."$2}'
        )
        if [[ -z "$ubuntu_name" ]] || [[ -z "$ubuntu_version" ]]
        then
          echo "Unable to find the latest Ubuntu LTS version."
          echo "Please check if $ubuntu_release_url returns the metadata correctly."
        else
          id=$(aws ec2 describe-images \
            --owners 099720109477 \
            --filters "Name=name,Values=ubuntu/images/hvm-ssd/ubuntu-$ubuntu_name-$ubuntu_version-amd64-server-*" \
            --query 'sort_by(Images,&CreationDate)[-1].ImageId' \
            --output text
          )
          echo "ami_id=$id" >> $GITHUB_ENV
        fi
        # Uncomment the next line as a fallback if the script is unable to get the latest Ubuntu version
        # echo "ami_id=ami-08df94af6199f15b6" >> $GITHUB_ENV
      shell: bash
    - name: Create runner
      if: ${{ inputs.mode == 'start' }}
      run: |
        aws cloudformation create-stack \
          --stack-name ${{ env.stack_name }} \
          --template-body file://${{ github.action_path }}/spot-fleet.yml \
          --capabilities CAPABILITY_NAMED_IAM \
          --parameters \
            ParameterKey=GHARunID,ParameterValue=${{ env.run_id }} \
            ParameterKey=RepoOwner,ParameterValue=${{ env.repo_owner }} \
            ParameterKey=RepoName,ParameterValue=${{ env.repo_name }} \
            ParameterKey=RobotToken,ParameterValue=${{ inputs.robot-token }} \
            ParameterKey=AmiId,ParameterValue=${{ env.ami_id }} \
            ParameterKey=Origin,ParameterValue=${{ env.stack_name }} \
            ParameterKey=Team,ParameterValue=${{ inputs.team }} \
            ParameterKey=Capacity,ParameterValue=${{ inputs.capacity }} \
            ParameterKey=Test,ParameterValue=${{ inputs.test }}
      shell: bash
    - name: Destroy runner
      if: ${{ inputs.mode == 'stop' }}
      run: |
        aws cloudformation delete-stack --stack-name "${{ env.stack_name }}"
        sleep 30
        offline_runners=$(curl -s \
          -H "Accept: application/vnd.github+json" \
          -H "Authorization: Bearer ${{ inputs.robot-token }}" \
          https://api.github.com/repos/${{ env.repo_owner }}/${{ env.repo_name }}/actions/runners \
          | jq -r ".runners[] | select((.status == \"offline\") and (.labels[].name == \"${{ env.stack_name }}\")) | .name"
        )
        export RUNNER_CFG_PAT=${{ inputs.robot-token }}
        for runner in $offline_runners
        do
          curl -s https://raw.githubusercontent.com/actions/runner/main/scripts/delete.sh \
            | bash -s ${{ env.repo_owner }}/${{ env.repo_name }} $runner
        done
      shell: bash

And then there is the CloudFormation YAML file, spot-fleet.yml. In this file, you define the AWS resources and the architecture of the EC2 machine. We went with Spot Fleet since there is a possibility that we will need to request more than one EC2 instance. The stacks are created and destroyed using the AWS CLI’s cloudformation commands.
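We cannot share our exact template, but a minimal spot-fleet.yml sketch could look roughly like the following. The IAM fleet role ARN and subnet ID are placeholders, and the real template declares all the parameters passed by create-stack and also attaches the stack name as a runner label:

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Parameters:
  AmiId:
    Type: AWS::EC2::Image::Id
  Capacity:
    Type: Number
    Default: 1
  RepoOwner:
    Type: String
  RepoName:
    Type: String
  RobotToken:
    Type: String
    NoEcho: true
  # GHARunID, Origin, Team and Test are declared the same way

Resources:
  RunnerFleet:
    Type: AWS::EC2::SpotFleet
    Properties:
      SpotFleetRequestConfigData:
        IamFleetRole: arn:aws:iam::111111111111:role/spot-fleet-role   # placeholder
        TargetCapacity: !Ref Capacity
        AllocationStrategy: lowestPrice
        TerminateInstancesWithExpiration: true
        LaunchSpecifications:
          - ImageId: !Ref AmiId
            InstanceType: m6i.xlarge
            SubnetId: subnet-0123456789abcdef0                         # placeholder
            UserData:
              Fn::Base64: !Sub |
                #!/bin/bash
                # Register this instance as a repo-level runner on first boot
                export RUNNER_CFG_PAT=${RobotToken}
                curl -s https://raw.githubusercontent.com/actions/runner/main/scripts/create-latest-svc.sh \
                  | bash -s ${RepoOwner}/${RepoName}
```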

To register and de-register the GitHub runner, we are using the scripts from Automate Configuring Self-Hosted Runners. The registration script is embedded within the EC2 UserData, while the de-registration script is used within the custom action itself.

In order for this to work, you will need a PAT (Personal Access Token) from a GitHub account that has Admin privilege on the GitHub repository. To be clear, we have designed the custom action to register the runner to the repository instead of the organization — this gives us some auditability and visibility when needed.

Future work

We are continuously improving the custom action to make it work seamlessly in the squads’ CI/CD pipelines in GitHub Actions.

There are a few ongoing efforts right now, such as making this custom action work with the matrix strategy and computing the capacity automatically instead of hardcoding it in the action’s input.

For the time being, this solution works for us and helps us with the migration. We will revisit it sometime in the future to re-evaluate the design. We are excited to see what the future holds for us!
