Supercharging your CICD game with Self-hosted Runners from GitHub Actions

Published in Sysco LABS Sri Lanka

Feb 22, 2024

CI/CD and its current landscape

Continuous Integration and Continuous Deployment (CI/CD) has significantly transformed how software is built and delivered. CI/CD automates the path from code commit to deployment, making it faster and more reliable. With the explosion of Software as a Service (SaaS) solutions in recent years, the market has seen an array of CI/CD tools from a multitude of vendors. Among them, GitHub has emerged as a unique and powerful platform that streamlines and enhances the CI/CD experience.

GitHub Actions for CI/CD

The GitHub platform has been a cornerstone of the technology community for over fifteen years, and it has consistently led the version control SaaS market since its inception. Over the years, GitHub has expanded its portfolio with a diverse range of supporting tools for application development, project management, and other functions. It has also remained at the vanguard of open-source development, making it a household name in open-source initiatives.

Many organizations in the IT industry rely on GitHub as a cornerstone of their development processes. Given this, a natural way to package and deliver code managed on GitHub is to use GitHub itself, and this is where GitHub Actions, GitHub's CI/CD platform, becomes a pivotal tool. Historically, CI/CD procedures were predominantly managed by DevOps engineers. GitHub Actions changed that by bringing CI/CD directly into application repositories, democratizing the process.

The secret sauce of GitHub Actions:

Simplified Setup: No dedicated resources are needed. Just add a single file to your repository for an effortless CI/CD setup.
Flexible Webhook Integration: GitHub Actions integrates seamlessly with GitHub, allowing various event triggers, including external app webhooks.
Community-Powered Reusability: Share or access pre-built CI/CD workflows with over 11,000 available actions in the GitHub Marketplace.
Platform Neutrality: GitHub Actions supports various platforms, languages, and clouds, offering flexibility for your technology choices.

Components of the GitHub Actions Platform

If you adopt GitHub Actions and the surrounding GitHub platform, there are two paths to choose from.

GitHub-hosted Runners: all of the CI/CD infrastructure needed to run your workloads is managed by GitHub. You choose from the limited set of VM options and sizes that GitHub currently offers, with room for certain customizations such as installing additional dependencies.
Self-hosted Runners: these offer more control over hardware, operating system, and software than GitHub-hosted runners. You take control of the underlying infrastructure and can keep your CI/CD solution in-house, within your organization's network if required.

In this article, I'll focus on self-hosted runners and their nuances, starting with why they might be a good fit for your organization's use cases.

Your organization has likely already invested in a cloud service provider for its applications. With GitHub's self-hosted runners, you simply bring that infrastructure to run your GitHub workflows. All you need is a properly configured runtime with outbound connectivity to GitHub, which the runner uses to poll for jobs, so no inbound ports need to be opened.

Now that we know the lay of the land, let's delve into how you can set up your own CI/CD platform based on self-hosted runners. Before getting started, though, let's look at how a sample workflow is composed in GitHub.

Fig 1. Illustration of the composition of a GitHub workflow

A GitHub event initiates a workflow. There are various events to choose from, such as a pull request being opened, changes being pushed, or a release being tagged.
A workflow is an automated process configured with a YAML file. Workflows are defined in the .github/workflows directory of a repository, and a repository can have multiple workflows, each performing a different set of tasks.
A workflow has one or more jobs, and each job has one or more steps. A step either runs a script that you define or runs an action, which is a reusable extension that can simplify your workflow. Jobs run in parallel by default, or sequentially when a job declares a dependency on another job with needs. Each job runs on a runner of your choosing: a GitHub-hosted runner or a self-hosted runner. We'll be using the latter in this article.

Put into a workflow file, the sample sequence of jobs above would look something like this:

name: CI/CD Workflow

on:
  push:
    branches:
      - main

jobs:
  build:
    name: Build Job
    # Use labels such as [self-hosted, linux] instead to target a self-hosted runner
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Repository
        uses: actions/checkout@v4
      - name: Build Application
        run: |
          # Add your build commands here
  test:
    name: Test Job
    needs: build  # start only after the build job succeeds
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Repository
        uses: actions/checkout@v4
      - name: Run Tests
        run: |
          # Add your test commands here
  deploy:
    name: Deploy Job
    needs: test  # start only after the test job succeeds
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Repository
        uses: actions/checkout@v4
      - name: Deploy Application
        run: |
          # Add your deployment commands here

A dynamic self-hosted runner generator

Fig 2. Component diagram of the Dynamic GitHub self-hosted runner generator

The diagram above illustrates a prospective design for a dynamic GitHub self-hosted runner generator, utilizing GitHub events as its foundation and AWS as the cloud service provider. Let’s analyze the components and associated services to understand the system further.

GitHub Events

The starting point of this implementation is a GitHub event defined in the workflow file (e.g. a code push, PR creation or merge, or release tagging) that triggers a GitHub workflow. For this event to result in a runner being created in your cloud provider, you need to configure a webhook that sends a POST request with the details of each subscribed event to a specified endpoint. The webhook is configured with the following settings:

Payload URL: The endpoint that receives the webhook requests (see HTTP API below)
Content type: The content type of the webhook payload (application/json or application/x-www-form-urlencoded)
Secret: A secret of your choice, used to verify the origin of the webhook request. GitHub uses it to sign each payload and sends the signature in the X-Hub-Signature-256 header
Events that trigger the webhook: Since this webhook is meant to signal that a workflow has started, the relevant event subscriptions are workflow_run and workflow_job
- workflow_run event: triggered at the workflow level, when an entire workflow run is requested, is in progress, or completes.
- workflow_job event: triggered at the job level, when a specific job within a workflow is queued, starts, is waiting, or completes.
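Because the secret is what lets the receiving side confirm that a delivery really came from GitHub, here is a minimal Python sketch of that check. GitHub computes an HMAC-SHA256 of the raw request body keyed with the webhook secret and sends it as sha256=<hex digest> in the X-Hub-Signature-256 header. The function name, secret, and payload below are illustrative assumptions, not part of the original setup.

```python
import hashlib
import hmac
from typing import Optional


def verify_github_signature(secret: str, body: bytes, signature_header: Optional[str]) -> bool:
    """Check a webhook delivery against its X-Hub-Signature-256 header.

    GitHub computes an HMAC-SHA256 of the raw request body, keyed with the
    webhook secret, and sends it as "sha256=<hex digest>".
    """
    if not signature_header or not signature_header.startswith("sha256="):
        return False
    expected = "sha256=" + hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    # Constant-time comparison so response timing does not reveal how much matched.
    return hmac.compare_digest(expected.encode("utf-8"), signature_header.encode("utf-8"))


# Usage with a hypothetical secret and payload:
secret = "my-webhook-secret"
body = b'{"action": "queued"}'
header = "sha256=" + hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
print(verify_github_signature(secret, body, header))          # True
print(verify_github_signature(secret, body, "sha256=bogus"))  # False
```

The body must be the raw bytes exactly as received (base64-decoded first if API Gateway flags the body as encoded), because parsing and re-serializing the JSON changes the digest.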

HTTP API

We use a public API endpoint to receive GitHub webhook requests, which then initiate runner creation on AWS infrastructure. An Amazon API Gateway HTTP API is used because it has built-in OIDC and OAuth 2.0 integrations, which make it easy to add an extra layer of security. A Lambda function URL could also work as an alternative.

IP Whitelister

This Lambda function adds a layer of validation on top of the webhook secret, to make sure that only the intended party (GitHub) can call the above endpoint. It uses the Lambda authorizer feature of HTTP APIs, which lets a Lambda function filter API traffic and act as an authorizer. The function returns an IAM policy that either allows or denies the API execution based on the source IP of the request. The source IP is checked against the GitHub webhook IP ranges stored in Parameter Store, which are fetched daily from the GitHub meta endpoint.
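A minimal Python sketch of such an authorizer follows. It assumes payload format 2.0 (where API Gateway supplies requestContext.http.sourceIp and routeArn), simple responses disabled so that an IAM policy is returned, and a hypothetical Parameter Store parameter /github/hook-cidrs holding the ranges as a comma-separated list. boto3 is imported lazily so the pure helpers run without AWS.

```python
import ipaddress
from typing import Iterable, List


def is_github_ip(source_ip: str, cidrs: Iterable[str]) -> bool:
    """True if source_ip falls inside any of the CIDR ranges (IPv4 or IPv6)."""
    try:
        addr = ipaddress.ip_address(source_ip)
    except ValueError:
        return False
    # An IPv4 address tested against an IPv6 network is simply "not in", so one list can hold both.
    return any(addr in ipaddress.ip_network(cidr, strict=False) for cidr in cidrs)


def build_policy(source_ip: str, route_arn: str, cidrs: Iterable[str]) -> dict:
    """IAM-policy response for an HTTP API Lambda authorizer."""
    effect = "Allow" if is_github_ip(source_ip, cidrs) else "Deny"
    return {
        "principalId": "github-webhook",
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {"Action": "execute-api:Invoke", "Effect": effect, "Resource": route_arn}
            ],
        },
    }


def load_allowed_cidrs(parameter_name: str = "/github/hook-cidrs") -> List[str]:
    """Read the comma-separated ranges from Parameter Store (parameter name is hypothetical)."""
    import boto3  # lazy import keeps the helpers above testable offline

    value = boto3.client("ssm").get_parameter(Name=parameter_name)["Parameter"]["Value"]
    return [cidr.strip() for cidr in value.split(",") if cidr.strip()]


def handler(event, context):
    source_ip = event["requestContext"]["http"]["sourceIp"]
    return build_policy(source_ip, event["routeArn"], load_allowed_cidrs())
```

In practice you would probably cache the parameter value between invocations rather than reading Parameter Store on every request.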

GitHub Metadata Fetcher

This Lambda function fetches data from the GitHub meta endpoint (https://api.github.com/meta), specifically the IP ranges that GitHub uses for its webhook requests (listed under the hooks key). Once fetched, the ranges are stored in a Parameter Store parameter for other services to use when needed.
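A minimal Python sketch of the GitHub Metadata Fetcher might look like this. It assumes the function runs on a daily schedule (for example an EventBridge rule) and reuses the hypothetical /github/hook-cidrs parameter, stored as a StringList. The empty-list guard reflects a design choice: a bad response should never wipe out a working allowlist.

```python
import json
import urllib.request
from typing import List

META_URL = "https://api.github.com/meta"
PARAMETER_NAME = "/github/hook-cidrs"  # hypothetical parameter name


def extract_hook_cidrs(meta: dict) -> List[str]:
    """Return the webhook source ranges ("hooks" key), de-duplicated and sorted."""
    return sorted(set(meta.get("hooks", [])))


def fetch_meta(url: str = META_URL) -> dict:
    """Download the GitHub meta document (a public endpoint)."""
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def handler(event, context):
    cidrs = extract_hook_cidrs(fetch_meta())
    if not cidrs:
        # Never overwrite a working allowlist with an empty one.
        raise RuntimeError("GitHub meta response contained no hook ranges")
    import boto3  # lazy import keeps extract_hook_cidrs testable without AWS

    boto3.client("ssm").put_parameter(
        Name=PARAMETER_NAME, Value=",".join(cidrs), Type="StringList", Overwrite=True
    )
    return {"stored": len(cidrs)}
```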

Webhook Listener

This Lambda function digests the GitHub workflow events that arrive via webhook requests. If the webhook has multiple event subscriptions, it filters for the relevant events (workflow_job events in this case). Once it detects a valid event, it adds the event to the request queue, a standard SQS queue, signaling that there's an incoming request to spin up a runner for a GitHub workflow job.
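A minimal Python sketch of the Webhook Listener's filtering follows. It makes several assumptions. The event comes from an API Gateway HTTP API using payload format 2.0, where header names arrive lower-cased. Only workflow_job deliveries with the action queued are forwarded, which is our assumption about what "valid" means. REQUEST_QUEUE_URL is a hypothetical environment variable. Signature verification is omitted for brevity.

```python
import base64
import json
import os
from typing import Optional


def extract_job_request(event: dict) -> Optional[dict]:
    """Return a runner request for queued workflow_job events, else None."""
    headers = event.get("headers") or {}
    if headers.get("x-github-event") != "workflow_job":
        return None  # ignore other subscribed events, e.g. workflow_run
    body = event.get("body") or "{}"
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode("utf-8")
    payload = json.loads(body)
    if payload.get("action") != "queued":
        return None  # in_progress/completed/waiting deliveries need no new runner
    job = payload["workflow_job"]
    return {
        "job_id": job["id"],
        "run_id": job["run_id"],
        "labels": job.get("labels", []),
        "repository": payload["repository"]["full_name"],
    }


def handler(event, context):
    request = extract_job_request(event)
    if request is not None:
        import boto3  # lazy import so extract_job_request is testable offline

        boto3.client("sqs").send_message(
            QueueUrl=os.environ["REQUEST_QUEUE_URL"],  # hypothetical env var
            MessageBody=json.dumps(request),
        )
    # Acknowledge every delivery so GitHub does not mark it as failed.
    return {"statusCode": 202}
```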

Request Queue

A queue buffers incoming requests so that we don't blindly create a runner for every request. It's a good idea to define a limit on the maximum number of runners you are willing to run at any given time; otherwise, when multiple developers on a team create pull requests frequently, things can quickly get out of hand. You can configure the queue's delivery delay based on the event consumption rate for your use case. A one-minute delay means queued messages only reach the consumer function after one minute.

Runner Generator

This Lambda function is the highlight of this implementation, as it's responsible for the actual runner creation in ECS. The Runner Generator receives messages from the request queue after the configured delivery delay, and each request goes through the following sequence.

Fig 3. Flow chart depicting the runner generation logic at the generator Lambda

First, the function checks the current state of the GitHub workflow job in the incoming message (queued, in_progress, completed, or waiting). If the job has already moved past the queued state (for example, to in_progress or completed), an existing runner picked it up while the request was sitting in the queue, so no new runner is needed.
Next, it checks the number of active runners against an upper bound that you define based on your workloads and cost expectations.
Based on the request, it then decides which type of runner to spin up. You could have multiple types of runners, differing in operating system, allocated resources, installed dependencies, and so on.
Once the runner is created, its details, such as the ECS task ARN and the repository that requested it, are sent to the Infrastructure Info Queue for use in the clean-up flow.
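The first three checks in the sequence above can be expressed as a pure decision function. This minimal Python sketch rests on several assumptions. The caller has already looked up the job's status (for example via GitHub's REST endpoint for getting a workflow job) and counted running runner tasks (for example with ECS ListTasks). The label-to-task-definition mapping is hypothetical. Raising RetryLater when the cap is reached, so that SQS redelivers the message later, is one possible policy and not necessarily the one used in the original implementation.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical mapping from a workflow job label to an ECS task definition.
TASK_DEFINITIONS = {
    "linux-small": "gh-runner-linux-small",
    "linux-large": "gh-runner-linux-large",
}
DEFAULT_RUNNER_TYPE = "linux-small"


class RetryLater(Exception):
    """Raised so the SQS message returns to the queue and is retried later."""


@dataclass
class Decision:
    launch: bool
    task_definition: Optional[str] = None
    reason: str = ""


def decide(job_status: str, active_runners: int, max_runners: int, labels: List[str]) -> Decision:
    # Check 1: a job that has left the queued state no longer needs a new runner.
    if job_status != "queued":
        return Decision(False, reason=f"job is {job_status}; skipping")
    # Check 2: respect the upper bound on concurrently active runners.
    if active_runners >= max_runners:
        raise RetryLater(f"{active_runners}/{max_runners} runners already active")
    # Check 3: choose a runner type from the job's labels, else fall back to the default.
    runner_type = next((label for label in labels if label in TASK_DEFINITIONS), DEFAULT_RUNNER_TYPE)
    return Decision(True, TASK_DEFINITIONS[runner_type], f"launching {runner_type}")
```

Keeping the decision separate from the AWS and GitHub calls makes the cap and routing rules easy to unit test.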

Runner Task

Once the runner request flow is sorted, what's left is to spin up the GitHub runner itself. The runner application is an open-source project, so we can use it as we see fit. The approach explained in this article is to containerize the runner, which lets us pick the libraries and tools each runner needs to run workflows of different flavors.

You can build the Dockerfile however you like, on a base image of your choice (pick an OS and distribution from the list of supported operating systems based on your workload requirements). Below are a few pointers on setting up the runner with Docker. You can download the GitHub runner application at https://github.com/actions/runner/releases/.

1. The runner requires the following options to configure it.

Repository URL (--url):
URL of the repository the runner will be registered to
Runner name (--name):
An identifier to be assigned to the runner
Runner token (--token):
A short-lived registration token is required during configuration. Rather than generating one by hand, we can pass a GitHub personal access token at the docker run stage and have the image's entrypoint script exchange it for a registration token, which makes the setup more dynamic and convenient.

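# SCOPE is "repos" (with _PATH="<owner>/<repo>") or "orgs" (with _PATH="<org>"),
# matching GitHub's repository- and organization-level registration-token endpoints
RUNNER_TOKEN="$(curl -X POST -fsSL \
  -H "Authorization: token ${GITHUB_ACCESS_TOKEN}" \
  -H "Accept: application/vnd.github.v3+json" \
  "https://api.github.com/${SCOPE}/${_PATH}/actions/runners/registration-token" \
  | jq -r '.token')"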

Runner labels (--labels) (optional):
Custom labels that identify the runner when a workflow job selects from multiple runners
Runner working directory (--work) (optional):
The working directory for job files. Defaults to _work, relative to the runner directory.
Runner group (--runnergroup) (optional):
Name of the runner group to add this runner to (defaults to the default runner group)

These values need to be available inside the container (for example, as environment variables defined in your Dockerfile or passed at run time) so that the entrypoint can pass them to the GitHub runner's configuration script, config.sh (see https://github.com/actions/runner/blob/main/src/Misc/layoutroot/config.sh).

2. You can use a RUN command like the one below in your Dockerfile to download and set up the GitHub runner application:

RUN GH_RUNNER_VERSION=${GH_RUNNER_VERSION:-$(curl --silent "https://api.github.com/repos/actions/runner/releases/latest" | grep tag_name | sed -E 's/.*"v([^"]+)".*/\1/')} \
    && curl -L -O https://github.com/actions/runner/releases/download/v${GH_RUNNER_VERSION}/actions-runner-linux-x64-${GH_RUNNER_VERSION}.tar.gz \
    && tar -zxf actions-runner-linux-x64-${GH_RUNNER_VERSION}.tar.gz \
    && rm -f actions-runner-linux-x64-${GH_RUNNER_VERSION}.tar.gz \
    && ./bin/installdependencies.sh \
    && chown -R root: /home/runner \
    && rm -rf /var/lib/apt/lists/* \
    && apt-get clean

3. By default, GitHub runners automatically check whether a newer runner version is available when executing a job. Issues raised in the open-source community describe runner auto-updates causing a loop when the runner runs as a container (the container starts, tries to update, shuts down, and repeats). GitHub suggests using runsvc.sh as the Docker entrypoint to run the runner as a service to mitigate this. Alternatively, you can choose one of the following two options:

Set up a process manager such as supervisord so that the runner can exit and be restarted automatically. The snippet below shows a sample supervisord configuration for the runner. Copy it to the supervisord configuration location and add a CMD entry to the Dockerfile so that the container starts with supervisord.

[supervisord]
user=root
nodaemon=true
logfile=/dev/fd/1
logfile_maxbytes=0
loglevel=error

[program:runner]
directory=/home/runner
command=/home/runner/bin/runsvc.sh
; restart the runner whenever it exits, e.g. after a self-update
autorestart=true
stdout_logfile=/dev/fd/1
stdout_logfile_maxbytes=0
redirect_stderr=true

CMD ["/usr/bin/supervisord", "-c", "/etc/supervisor/conf.d/supervisord.conf"]

Turn off the runner's automatic updates by passing the --disableupdate flag when configuring it, and update the runner version yourself (a tool like Watchtower can help manage image updates). Note, however, that to stay compatible with the GitHub Actions service, you must update your runner within 30 days of a new runner version becoming available.

4. Several open-source projects provide pre-configured Docker images with the GitHub runner included. These come with the required dependencies pre-installed, so all you have to do is pull the image and run it on the infrastructure of your choice.
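The entrypoint logic described earlier in this section exchanges a personal access token for a registration token and then passes the configuration options to config.sh. It is usually a shell script. Here it is sketched in Python for illustration, under stated assumptions. GITHUB_ACCESS_TOKEN, SCOPE, and _PATH follow the curl snippet. The RUNNER_* variable names are hypothetical. The token request mirrors that curl call.

```python
import json
import os
import socket
import subprocess
import urllib.request
from typing import Dict, List


def fetch_registration_token(pat: str, scope: str, path: str) -> str:
    """Same API call as the curl snippet: exchange a PAT for a registration token."""
    req = urllib.request.Request(
        f"https://api.github.com/{scope}/{path}/actions/runners/registration-token",
        method="POST",
        headers={"Authorization": f"token {pat}", "Accept": "application/vnd.github.v3+json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["token"]


def build_config_args(env: Dict[str, str], token: str) -> List[str]:
    """Translate environment variables into ./config.sh flags."""
    args = [
        "./config.sh",
        "--unattended",  # never prompt for input inside a container
        "--url", f"https://github.com/{env['_PATH']}",  # repo or org URL, matching _PATH
        "--token", token,
        "--name", env.get("RUNNER_NAME") or socket.gethostname(),
    ]
    # Optional settings; the RUNNER_* variable names are hypothetical.
    optional = {"--labels": "RUNNER_LABELS", "--work": "RUNNER_WORKDIR", "--runnergroup": "RUNNER_GROUP"}
    for flag, var in optional.items():
        if env.get(var):
            args += [flag, env[var]]
    return args


def main() -> None:
    env = dict(os.environ)
    token = fetch_registration_token(env["GITHUB_ACCESS_TOKEN"], env["SCOPE"], env["_PATH"])
    subprocess.run(build_config_args(env, token), check=True)
    # Replace this process with the runner so it receives container signals directly.
    os.execv("./run.sh", ["./run.sh"])
```

main() would be the container's entrypoint. If you follow the supervisord approach, the final step would hand over to bin/runsvc.sh instead of run.sh.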

Once your runner image is sorted, you'll have to decide on a runtime. On AWS, that could be Amazon ECS on EC2 or AWS Fargate. If Kubernetes is your cup of tea, it can be done there as well with a few additional configurations. In our setup, we use AWS Fargate as the compute engine for the containers.

Once the runner creation request is verified, we spawn a GitHub runner task that runs the runner image created above inside an ECS cluster. After the runner application starts up and registers with a repository, it can pick up the queued workflow jobs in that repository and run them to completion.
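On Fargate, spawning the runner task comes down to one ECS RunTask call. This minimal Python sketch builds its arguments and passes per-runner settings as container environment overrides. The cluster, task definition, container name, subnets, and variable names are hypothetical. assignPublicIp set to DISABLED assumes private subnets with a NAT gateway for outbound access to GitHub.

```python
from typing import Dict, List


def build_run_task_kwargs(
    cluster: str,
    task_definition: str,
    container_name: str,
    subnets: List[str],
    runner_env: Dict[str, str],
) -> dict:
    """Arguments for ECS RunTask that start one runner container on Fargate."""
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "launchType": "FARGATE",
        "count": 1,
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": subnets,
                # DISABLED assumes private subnets whose NAT gateway gives the
                # runner the outbound HTTPS access it needs to reach GitHub.
                "assignPublicIp": "DISABLED",
            }
        },
        "overrides": {
            "containerOverrides": [
                {
                    "name": container_name,
                    # Per-runner settings (e.g. the target repository) as env vars.
                    "environment": [
                        {"name": key, "value": value} for key, value in sorted(runner_env.items())
                    ],
                }
            ]
        },
    }


def launch_runner(**run_task_kwargs) -> str:
    """Start the task and return its ARN."""
    import boto3  # lazy import so the builder above is testable without AWS

    response = boto3.client("ecs").run_task(**run_task_kwargs)
    if response.get("failures"):
        raise RuntimeError(f"RunTask failed: {response['failures']}")
    return response["tasks"][0]["taskArn"]
```

Container overrides are visible in the task's details. A credential such as the personal access token is therefore better injected through the task definition's secrets (from Secrets Manager or Parameter Store) than passed as a plain override.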

Infrastructure Info Queue

This queue facilitates cleaning up idle GitHub runners on both the AWS and GitHub sides. When the Runner Generator spins up a runner, it sends the runner's details to this queue. The queue is configured as a FIFO queue with a 5-minute delivery delay and a 1-minute visibility timeout.
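Because this is a FIFO queue, every message the Runner Generator sends needs a MessageGroupId. It also needs a MessageDeduplicationId unless content-based deduplication is enabled on the queue. The Python sketch below builds the send_message arguments. The body field names and the choice of one message group per runner are assumptions. The queue-level attributes mirror the configuration described above.

```python
import json
from typing import Dict

# Queue-level settings from the text: FIFO, 5-minute delivery delay, 1-minute
# visibility timeout, e.g. create_queue(QueueName="runner-info.fifo", Attributes=...).
INFO_QUEUE_ATTRIBUTES = {"FifoQueue": "true", "DelaySeconds": "300", "VisibilityTimeout": "60"}


def build_info_message(queue_url: str, task_arn: str, repository: str, runner_name: str) -> Dict:
    """Keyword arguments for sqs.send_message to the FIFO info queue."""
    return {
        "QueueUrl": queue_url,
        "MessageBody": json.dumps(
            {"task_arn": task_arn, "repository": repository, "runner_name": runner_name}
        ),
        # FIFO queues require a message group. One group per runner means a
        # message that is being retried does not hold up clean-up of other runners.
        "MessageGroupId": runner_name,
        # Required unless content-based deduplication is enabled; the ECS task ID is unique.
        "MessageDeduplicationId": task_arn.rsplit("/", 1)[-1],
    }
```

With a single shared group, one runner whose clean-up keeps being retried would block the messages queued behind it. That is why per-runner groups suit this flow.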

Note: You can instead configure the runner as an ephemeral runner by passing the --ephemeral flag when configuring it. Such a runner unregisters after a single job, which is ideal for workflow jobs that require a fresh image, and it does not need this clean-up flow. You can read more about it in GitHub's documentation on autoscaling with self-hosted runners.

Runner Cleaner

This function receives messages from the Infrastructure Info Queue and decides which runner to clean up, and when. Given the queue configuration, a clean-up request first reaches this function 5 minutes after the runner's Fargate task was launched. At that point the function checks:

Whether the task is still in the running state, since it may already have stopped for other reasons. If it has already stopped, no further action is taken and the Lambda SQS integration deletes the message from the queue.
Whether there are any pending workflows in the repository that originally requested the runner, so that the runner isn't stopped abruptly. If there are, the function throws an error and the message stays in the queue. Because of the visibility timeout, the message stays invisible for that period (1 minute) and is then redelivered. This continues until the runner is cleaned up or the message's retention period expires. Once no workflows are pending, the runner is removed from GitHub and its task is stopped.
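The two checks above can be sketched in Python as follows. The sketch assumes a Lambda SQS trigger with a batch size of 1, so a raised exception returns the message to the queue. It also assumes message bodies carrying task_arn and repository fields, plus hypothetical RUNNER_CLUSTER and GITHUB_ACCESS_TOKEN environment variables. Pending work is counted through GitHub's list-workflow-runs endpoint filtered by status.

```python
import json
import os
import urllib.request
from typing import Optional


class PendingWork(Exception):
    """Raised to keep the message on the queue; SQS redelivers it after the visibility timeout."""


def decide_cleanup(task_status: Optional[str], pending_runs: int) -> str:
    """Return "ack" (nothing to do) or "cleanup"; raise PendingWork to retry later."""
    if task_status != "RUNNING":
        return "ack"  # task already stopped or missing: let the message be deleted
    if pending_runs > 0:
        raise PendingWork(f"{pending_runs} workflow run(s) still queued or in progress")
    return "cleanup"


def count_pending_runs(repository: str, token: str) -> int:
    """Count queued and in-progress workflow runs for "owner/repo" via the REST API."""
    total = 0
    for status in ("queued", "in_progress"):
        req = urllib.request.Request(
            f"https://api.github.com/repos/{repository}/actions/runs?status={status}&per_page=1",
            headers={"Authorization": f"token {token}", "Accept": "application/vnd.github+json"},
        )
        with urllib.request.urlopen(req, timeout=10) as resp:
            total += json.load(resp)["total_count"]
    return total


def handler(event, context):
    import boto3  # lazy import so decide_cleanup is testable without AWS

    ecs = boto3.client("ecs")
    cluster = os.environ["RUNNER_CLUSTER"]  # hypothetical environment variable
    for record in event["Records"]:  # assumes a batch size of 1
        info = json.loads(record["body"])
        tasks = ecs.describe_tasks(cluster=cluster, tasks=[info["task_arn"]])["tasks"]
        status = tasks[0]["lastStatus"] if tasks else None
        pending = 0
        if status == "RUNNING":
            pending = count_pending_runs(info["repository"], os.environ["GITHUB_ACCESS_TOKEN"])
        if decide_cleanup(status, pending) == "cleanup":
            # Removing the runner registration from GitHub
            # (DELETE /repos/{owner}/{repo}/actions/runners/{runner_id}) is omitted here.
            ecs.stop_task(cluster=cluster, task=info["task_arn"], reason="Idle GitHub runner")
```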

Integrations

With the widespread adoption of GitHub Actions, it has become one of the main CI/CD solutions, which has led to better integrations with other IT solution providers. Listed below are a few integrations that can make your use of the GitHub Actions platform more meaningful.

1. Something that goes hand in hand with any IT platform is an observability service that provides insight into the system's inner workings. Datadog, a leading observability vendor, provides an integration to monitor tests and pipeline executions in GitHub Actions.

Fig 4. Datadog CI Visibility view of Pipeline executions in GitHub Action

2. Apache DevLake, an up-and-coming open-source dev data platform, ingests and analyzes fragmented data from various DevOps tools and turns it into visualizations and analyses, such as DORA metrics, for engineering excellence. GitHub Actions is one of the platforms Apache DevLake supports.

3. Jira, Atlassian's widely used issue-tracking product, also supports GitHub-related operations. When you include a Jira issue key in the commits of a pull request and that pull request triggers a workflow, the workflow information is linked to the issue. This lets you track workflow runs from within Jira issues and follow them through to deployments.

Security Considerations

When designing a system like the above, it’s paramount that you pay attention to how secure it is and what needs to be done to make it more secure. Outlined below are two such security considerations when designing a similar system.

GitHub strongly advises against using self-hosted runners with public repositories, because a malicious user can fork the repository and run malicious code on your self-hosted runner machine by opening a pull request (given that you have PR-triggered workflows enabled). If your repository is public and you still want to use self-hosted runners, enable Require approval for all outside collaborators in the repository settings so that workflows triggered by fork pull requests from outside collaborators need manual approval before they run. Another option is to set up a private companion repository to your public repository that handles all the workflows.
Extending from the above, make sure to use secrets whenever you introduce sensitive information into your workflows.
- Refrain from using structured data as secrets, as it can cause secret redaction in logs to fail, exposing the secret.
- Register all sensitive information used in workflows as secrets, including any values derived from another registered secret (values produced at runtime can be masked with the ::add-mask:: workflow command).
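To see why the two secret-handling guidelines above matter, consider a simplified model of log masking that replaces exact occurrences of registered values. This Python sketch is an illustration under that assumption, not GitHub's actual redaction implementation. The secret values are made up.

```python
import json
from typing import List


def mask(log_line: str, registered: List[str]) -> str:
    """Simplified model of exact-match log masking (not GitHub's actual implementation)."""
    for value in registered:
        log_line = log_line.replace(value, "***")
    return log_line


# A hypothetical structured secret, registered as one JSON string.
db_secret = json.dumps({"user": "ci", "password": "hunter2"})
password = json.loads(db_secret)["password"]  # a value derived from the secret

print(mask(f"config={db_secret}", [db_secret]))                    # config=***
print(mask(f"connecting with {password}", [db_secret]))            # connecting with hunter2
print(mask(f"connecting with {password}", [db_secret, password]))  # connecting with ***
```

The structured secret is only hidden when it is printed verbatim. A field extracted from it leaks until that derived value is registered for masking as well.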

Additional Considerations

Given the high adoption of containerized application development with tools like Docker, your workflows will likely include steps that build Docker images. If you run your GitHub runners as containers, you may have difficulty getting Docker commands to work inside the runner (that is, Docker-in-Docker). AWS Fargate, for example, does not support privileged containers. In such a scenario, you'd most likely have to rely on GitHub-hosted runners (refer to https://docs.github.com/en/billing/managing-billing-for-github-actions/about-billing-for-github-actions for details such as execution minutes and storage for GitHub-hosted runners) or a VM-based self-hosted runner.

However, if your application workloads are

Java-based, you can leverage Google's Jib, which builds container images without a Docker daemon, via Maven, Gradle, or as a Java library. This is the approach we've used for our Java-based microservices, and it brings additional benefits like better layering and reproducible images.
Deployed as serverless functions, you can simply use a runner with the required dependencies (e.g. the AWS SDK or the Serverless Framework) pre-installed.

Conclusion

CI/CD has taken center stage in revolutionizing software development, automating processes from code commit to deployment. Among the array of tools, GitHub Actions stands out, streamlining CI/CD with simplified setup, webhook integration, and community-powered flexibility. Focusing on GitHub's self-hosted runners, this article explored their usage nuances, how to set up a dynamic self-hosted runner generator on AWS, the security considerations around self-hosted runners, and how to design them around your workloads. Overall, GitHub Actions redefines CI/CD, fostering adaptability and innovation in software development.