Implementing a Custom GitHub Actions Cache

Marc Khair
Published in Nerd For Tech · Apr 12, 2023

While building CI pipelines on GitHub Actions, I noticed a big shortcoming in the official GitHub cache functionality: for security reasons, it imposes strict access restrictions that quickly impede productivity when using ephemeral runners.

GitHub Actions supports data caching through a dedicated action. I was surprised to discover, though, that this caching has a severe limitation: caches are not shared across pull requests, which drastically reduces their usefulness. [1]

Usage Restrictions

The GitHub cache action actions/cache@v3 is the official way of caching dependencies in workflows. Access restrictions were implemented to create logical boundaries between different branches or tags.

Workflow runs can restore caches created in either the current branch or the default branch (usually main). If a workflow run is triggered for a pull request, it can also restore caches created in the base branch, including base branches of forked repositories. [2]

When the workflow is triggered by a pull_request event, the logical boundaries are tightened further: the cache is scoped to that particular pull request.

When a cache is created by a workflow run triggered on a pull request, the cache is created for the merge ref (refs/pull/.../merge). Because of this, the cache will have a limited scope and can only be restored by re-runs of the pull request. It cannot be restored by the base branch or other pull requests targeting that base branch. [2]

Usage Limits

In addition to these productivity-impeding usage restrictions, there is a 10 GB total cache size limit per repository, and caches that have not been accessed in 7 days are evicted.

The Problem

Because of the usage restrictions and limits highlighted above, the first job(s) of the first workflow run(s) of every new pull request will not benefit from dependency caching, which can result in significant pipeline delays on ephemeral runners. This is a big deal for CI pipelines relying on the GitHub flow.

The Solution

A custom cache implementation built on top of object storage can speed up the first workflow runs of new pull requests. The custom cache has looser access restrictions, making inter-repository and even inter-organization caching possible if desired. It also supports caching dependencies that only materialize at workflow run time, which the official GitHub Actions cache cannot do, since its hash keys (hashFiles()) are computed from the branch state at workflow trigger time.

For example, if the Python version is switched with asdf during a workflow run to test an artifact against multiple Python versions, we can invoke our caching logic and generate cache hash keys after the switch. Future runs that test against the same Python version will then get a cache hit.
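As a sketch of that scenario, using the pre-cache.sh script introduced in the implementation section below (the asdf invocation, Python version, and key layout are illustrative):

# Hypothetical: switch interpreters mid-run, then key the cache off the
# *current* interpreter version rather than the trigger-time branch state.
asdf local python 3.10.11
py_version="$(python3 --version | cut -d ' ' -f2)"
CACHE_TYPE=PYTHON ./scripts/pre-cache.sh \
  "${GITHUB_WORKSPACE}/.venv" \
  "Linux-python-${py_version}-$(sha1sum ./requirements/requirements.txt | cut -d ' ' -f1)-1"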

The Implementation

Pre-requisites:

  • GitHub Actions Secrets defined for AWS IAM Credentials: SRV_GHA_AWS_ACCESS_KEY_ID and SRV_GHA_AWS_SECRET_ACCESS_KEY.
  • An S3 bucket named custom-actions-cache created in the AWS account (a provisioning sketch follows the list).
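
If the bucket doesn't exist yet, here is a minimal provisioning sketch with the AWS CLI. Note that S3 bucket names are globally unique, so custom-actions-cache may already be taken; the public-access hardening is an optional assumption, not something the article requires:

# Create the cache bucket (adjust the name if it is already taken).
aws s3 mb s3://custom-actions-cache --region us-east-1

# Optional hardening: block all public access to the bucket.
aws s3api put-public-access-block \
  --bucket custom-actions-cache \
  --public-access-block-configuration \
  "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"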

Demo:

For the sake of this demo, we are going to build a custom GitHub Actions cache implementation on top of AWS S3.

Let’s assume we only need to set up a Python virtual environment and install tensorflow.

Consider the following GitHub Actions workflow:

---
name: topic-venv
on:
  pull_request:
    types:
      - opened
      - synchronize
env:
  CACHE_VERSION: 1

jobs:
  Configure-venv:
    runs-on:
      - ubuntu-latest
    steps:
      - name: Checkout Repo
        uses: actions/checkout@v3
      - name: Configure AWS Profile
        run: |
          mkdir -p ~/.aws && touch ~/.aws/credentials
          cat << EOT > ~/.aws/credentials
          [srv_gha]
          aws_access_key_id=${{ secrets.SRV_GHA_AWS_ACCESS_KEY_ID }}
          aws_secret_access_key=${{ secrets.SRV_GHA_AWS_SECRET_ACCESS_KEY }}
          EOT
      - name: Configure AWS Config
        run: |
          mkdir -p ~/.aws && touch ~/.aws/config
          cat << EOT > ~/.aws/config
          [default]
          region = us-east-1
          output = json
          EOT
      - uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      - name: Setup Virtual Environment and Install Requirements
        run: |
          if [[ ! -f "${GITHUB_WORKSPACE}/.venv/bin/python" ]]; then
            echo "Creating Virtual Environment"
            python3 -m venv --clear .venv
          else
            echo "Virtual Environment has already been restored from cache"
          fi

          .venv/bin/pip install -r requirements/requirements.txt

With the following requirements.txt:

tensorflow==2.12.0

Without any cache setup, this workflow run takes 57 seconds to complete.

Note that this workflow only installs tensorflow, and the caching logic is scoped to pip dependencies. In production CI pipelines, you may have simultaneous caching strategies for pre-commit or other tool dependencies, and workflow time can increase considerably.
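
As an illustration, a second cache scope for pre-commit could reuse the same scripts (introduced in the next section); the cache path and key below are hypothetical:

# Hypothetical second cache scope, keyed on the pre-commit config file.
CACHE_TYPE=PRECOMMIT ./scripts/pre-cache.sh \
  "${HOME}/.cache/pre-commit" \
  "Linux-precommit-$(sha1sum .pre-commit-config.yaml | cut -d ' ' -f1)-1"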

Custom Cache

.
├── .github
│   └── workflows
│       └── topic-build.yaml
├── README.md
├── requirements
│   └── requirements.txt
└── scripts
    ├── post-cache.sh
    └── pre-cache.sh

For this demo, we are going to cache the virtual environment, using the requirements.txt file to derive the cache hash key:

"${{ runner.os }}-python-$(sha1sum ./requirements/requirements.txt | cut -d ' ' -f1)-${{ env.CACHE_VERSION }}"
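
On an ubuntu-latest runner, the key expands to something like the following (the sha1 digest is illustrative, not a real digest of the file):

# ${{ runner.os }} -> Linux, ${{ env.CACHE_VERSION }} -> 1
Linux-python-4f2d...a9c1-1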

For the custom cache, we are going to rely on two scripts for the caching logic:

  • pre-cache.sh

: '
args:
  $1: Path.
    Example: ${GITHUB_WORKSPACE}/.venv
  $2: Key.
    Example: Linux-python-3.11-$(sha1sum ./requirements/requirements.txt | cut -d ' ' -f1)-1
input vars:
  CACHE_TYPE (required).
    Example: PYTHON
'

This script evaluates whether there is a cache hit of the given cache type for the cache key specified in $2. In the case of a cache hit, it restores the cache to the path specified in $1. In the case of a cache miss, it sets the ${CACHE_TYPE}_CACHE_HIT variable to false.
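
The body of pre-cache.sh isn't shown in this article, so here is a minimal sketch consistent with the documented contract and with post-cache.sh below. Exposing ${CACHE_TYPE}_CACHE_HIT through GITHUB_ENV (so later steps can read it as ${{ env.PYTHON_CACHE_HIT }}) and extracting the archive relative to / (mirroring how post-cache.sh archives the absolute path) are assumptions:

#!/usr/bin/env bash

set -euo pipefail

s3_bucket_name=custom-actions-cache

# Assumed restore logic: look the key up in S3; on a hit, pull the archive
# and extract it relative to / (tar strips the leading slash when saving,
# so this restores the cache to the absolute path given in $1).
if aws --profile srv_gha s3 ls "s3://${s3_bucket_name}/$2/archive.tar" --region us-east-1 > /dev/null 2>&1; then
  temp_dir="$(mktemp -d)"
  aws --profile srv_gha s3 cp "s3://${s3_bucket_name}/$2/archive.tar" "${temp_dir}/archive.tar" --region us-east-1 > /dev/null
  tar xf "${temp_dir}/archive.tar" -C /
  rm -rf "${temp_dir}"
  echo "Cache restored to $1 for key: $2"
  echo "${CACHE_TYPE}_CACHE_HIT=true" >> "$GITHUB_ENV"
else
  echo "Cache miss for key: $2"
  echo "${CACHE_TYPE}_CACHE_HIT=false" >> "$GITHUB_ENV"
fi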

  • post-cache.sh

: '
args:
  $1: Path.
    Example: ${GITHUB_WORKSPACE}/.venv
  $2: Key.
    Example: Linux-python-3.11-$(sha1sum ./requirements/requirements.txt | cut -d ' ' -f1)-1
input vars:
  CACHE_HIT (required): true | false
'

If CACHE_HIT is set to false, this script saves the cache path specified in $1 to the AWS S3 bucket custom-actions-cache:

#!/usr/bin/env bash

set -euo pipefail

script_directory="$( cd "$( dirname "${BASH_SOURCE[0]}" )" > /dev/null 2>&1 && pwd)"
root_directory="$(realpath "${script_directory}/..")"

pushd "${root_directory}" > /dev/null

s3_bucket_name=custom-actions-cache

mkdir -p "$1"

if [[ "$CACHE_HIT" == true ]]; then
  echo "Cache hit occurred on the primary key $2, not saving cache."
elif [[ "$CACHE_HIT" == false ]]; then
  if [[ $(aws --profile srv_gha s3 ls "s3://${s3_bucket_name}/$2/" --region us-east-1 | head) ]]; then
    echo "Cache is already saved for key: $2"
  else
    temp_dir="$(mktemp -d)"
    tar cf "${temp_dir}/archive.tar" "$1" > /dev/null
    size_in_b="$(ls -l "${temp_dir}/archive.tar" | cut -d ' ' -f 5)"
    size_in_mb="$(echo "scale=2 ; ${size_in_b} / 1000000" | bc)"
    aws --profile srv_gha s3 cp "${temp_dir}/archive.tar" "s3://${s3_bucket_name}/$2/archive.tar" --region us-east-1 > /dev/null
    copy_exit_code=$?
    rm -rf "${temp_dir}"
    echo "Cache size: ~${size_in_mb} MB (${size_in_b} B)"
    if [[ "${copy_exit_code}" == 0 ]]; then
      echo "Cache saved successfully for key: $2"
    fi
  fi
fi

popd > /dev/null
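
Invoked outside the workflow, a save would look like this; the key construction mirrors the workflow below, and CACHE_HIT would normally come from the pre-cache step rather than being set by hand:

CACHE_HIT=false ./scripts/post-cache.sh \
  "${GITHUB_WORKSPACE}/.venv" \
  "Linux-python-$(sha1sum ./requirements/requirements.txt | cut -d ' ' -f1)-1"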

Implementing the custom cache logic in our workflow:

---
name: topic-venv
on:
  pull_request:
    types:
      - opened
      - synchronize
env:
  CACHE_VERSION: 1

jobs:
  Configure-venv:
    runs-on:
      - ubuntu-latest
    steps:
      - name: Checkout Repo
        uses: actions/checkout@v3
      - name: Configure AWS Profile
        run: |
          mkdir -p ~/.aws && touch ~/.aws/credentials
          cat << EOT > ~/.aws/credentials
          [srv_gha]
          aws_access_key_id=${{ secrets.SRV_GHA_AWS_ACCESS_KEY_ID }}
          aws_secret_access_key=${{ secrets.SRV_GHA_AWS_SECRET_ACCESS_KEY }}
          EOT
      - name: Configure AWS Config
        run: |
          mkdir -p ~/.aws && touch ~/.aws/config
          cat << EOT > ~/.aws/config
          [default]
          region = us-east-1
          output = json
          EOT
      - uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      - name: Custom Pre Cache - Python Virtual Environment
        run: |
          CACHE_TYPE=PYTHON ./scripts/pre-cache.sh \
            "${GITHUB_WORKSPACE}/.venv" \
            "${{ runner.os }}-python-$(sha1sum ./requirements/requirements.txt | cut -d ' ' -f1)-${{ env.CACHE_VERSION }}"
      - name: Setup Virtual Environment and Install Requirements
        run: |
          if [[ ! -f "${GITHUB_WORKSPACE}/.venv/bin/python" ]]; then
            echo "Creating Virtual Environment"
            python3 -m venv --clear .venv
          else
            echo "Virtual Environment has already been restored from cache"
          fi

          .venv/bin/pip install -r requirements/requirements.txt

      - name: Custom Post Cache - Python Virtual Environment
        run: |
          CACHE_HIT=${{ env.PYTHON_CACHE_HIT }} ./scripts/post-cache.sh \
            "${GITHUB_WORKSPACE}/.venv" \
            "${{ runner.os }}-python-$(sha1sum ./requirements/requirements.txt | cut -d ' ' -f1)-${{ env.CACHE_VERSION }}"

The Custom Implementation in Action

On a cache miss:

On a cache hit:

On a cache hit, the workflow took 32 seconds to complete, down from 57. In this demo setup, the custom cache implementation speeds up the first workflow run of a new pull request by roughly 44% when a cache hit occurs. That speedup is crucial for pipelines relying on the GitHub flow.

References:

[1]: https://saveriomiroddi.github.io/Improving-the-Github-Actions-caching-effectiveness/

[2]: https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows
