Optimising GHA Test Pipelines: A Guide to Efficient Caching
Introduction
In the realm of Microfrontends, where each repository functions as a powerhouse in itself, it is more important than ever to establish reliable and swift pipelines for building code and running test suites. This article explores the optimisation of GitHub Actions (GHA) test pipelines through caching strategies, focusing specifically on caching node_modules and yarn packages.
The Challenge
As software repositories grow in complexity, the installation of dependencies becomes a bottleneck in continuous integration pipelines. Most pull requests involve unchanged dependencies, making it inefficient to download and install them repeatedly. The primary objective is to reduce the installation time of modules from the typical 5–15 minutes to a matter of seconds.
Caching Strategy
To cache node modules effectively, the two following folders need to be considered.
- yarn_cache: This directory functions as a repository for yarn packages, ensuring their swift availability on the cache server. The “prefer-offline” flag is employed to prioritise cache utilisation during installations.
- node_modules_cache: Dedicated to storing installed packages and their dependencies for each unique “yarn.lock” file.
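The resulting layout can be sketched locally; the paths below are illustrative stand-ins for the real NFS mount, and the only load-bearing detail is that each node_modules snapshot is keyed by the SHA-1 checksum of its "yarn.lock":

```shell
#!/usr/bin/env bash
# Sketch of the cache layout; all paths and the lockfile contents are
# illustrative stand-ins, not the real production values.
set -euo pipefail

CACHE_ROOT="$(mktemp -d)"    # stands in for /mnt/cache/<org>/<repo>
cd "$CACHE_ROOT"

printf 'left-pad@^1.3.0:\n  version "1.3.0"\n' > yarn.lock  # stand-in lockfile

# yarn_cache: shared package tarballs, reused across installs.
mkdir -p cache

# node_modules_cache: one snapshot per unique yarn.lock, keyed by checksum.
checksum="$(sha1sum yarn.lock | awk '{print $1}')"
mkdir -p "lock_node_modules/${checksum}/node_modules"

find . -type d | sort
```

Because the key is derived from the lockfile's content, any dependency change produces a new checksum and therefore a fresh, isolated snapshot.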
Infrastructure Setup
Utilising an NFS cache, each repository within the organisation has its designated folder, facilitating isolated caching for every project.
First of all, the cache is mounted in the container:
container:
  image: <custom_image>:latest
  volumes:
    - /mnt/nfs-ci-cache:/mnt/cache
  env:
    YARN_CACHE_FOLDER: "/mnt/cache/${{ github.repository }}/cache"
    NODE_PATH: "/mnt/cache/${{ github.repository }}/lock_node_modules"
Caching Initialization
The checkout step employs the sparse-checkout option to fetch only the files we actually need: "package.json" and "yarn.lock". For big repositories this can save a lot of time.
- uses: actions/checkout@v4
  with:
    ref: ${{ github.event.pull_request.head.ref }}
    sparse-checkout: |
      package.json
      yarn.lock
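What this achieves can be reproduced with plain git against a throwaway local repository; this is a hedged sketch (requires git >= 2.35 for `sparse-checkout set --no-cone`), not the exact internals of actions/checkout, and the repository contents are illustrative:

```shell
#!/usr/bin/env bash
# Local sketch of a sparse checkout: clone without materialising files,
# then check out only package.json and yarn.lock.
set -euo pipefail
work="$(mktemp -d)"; cd "$work"

# Throwaway "remote" with the two manifest files plus a source tree.
git init -q origin-repo
(
  cd origin-repo
  echo '{"name":"demo"}' > package.json
  echo '# stand-in lockfile' > yarn.lock
  mkdir -p src && echo 'console.log(1)' > src/index.js
  git add .
  git -c user.email=ci@example.com -c user.name=ci commit -q -m files
)

# Clone without a checkout, restrict the worktree, then materialise it.
git clone -q --no-checkout origin-repo clone
cd clone
git sparse-checkout set --no-cone package.json yarn.lock
git checkout -q
ls
```

Only the two listed files end up on disk; the rest of the tree (here, src/) is never materialised, which is where the time saving comes from on large repositories.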
Before proceeding with installation, a quick check determines whether a cache already exists for the given yarn.lock file. If not, the script initiates the creation process:
- name: Install node_modules for new yarn.lock
  id: yarnInstall
  shell: bash
  run: |
    set -x
    export checksum="$(sha1sum yarn.lock | awk '{print $1}')"
    if [ -d "${NODE_PATH}/${checksum}/node_modules" ]; then
      echo "Cache folder already exists for yarn.lock: ${NODE_PATH}/${checksum}/node_modules"
    else
      echo "Creating cache folder for yarn.lock: ${NODE_PATH}/${checksum}"
      echo "started=true" >> $GITHUB_OUTPUT
      mkdir -p ${NODE_PATH}/${checksum}/node_modules
      ln -s ${NODE_PATH}/${checksum}/node_modules node_modules
      yarn install --frozen-lockfile --prefer-offline
      echo "Done!"
    fi
Notice that when the creation procedure starts, we echo a started=true variable to GITHUB_OUTPUT; the cleanup step below uses it to detect runs that began populating the cache.
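GITHUB_OUTPUT is simply a file of key=value lines that the runner parses and exposes to later steps as steps.<id>.outputs.<key>. The mechanism can be sketched locally by pointing the variable at a temp file (an illustrative stand-in for the runner-provided path):

```shell
#!/usr/bin/env bash
# Minimal sketch of the GITHUB_OUTPUT mechanism, simulated locally.
set -euo pipefail

# On a real runner, GITHUB_OUTPUT is already set; here we stand it in.
export GITHUB_OUTPUT="$(mktemp)"

# The step appends key=value lines, exactly as in the workflow above.
echo "started=true" >> "$GITHUB_OUTPUT"

# Later steps read the value via the steps.<id>.outputs.<key> context.
grep '^started=' "$GITHUB_OUTPUT"
```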
In case of a failure during this step, a failsafe mechanism is in place to archive the potentially corrupted cache, allowing for a fresh start.
- name: Delete residues on failure
  if: ${{ always() && steps.yarnInstall.outputs.started == 'true' && steps.yarnInstall.outcome != 'success' }}
  shell: bash
  run: |
    set -x
    export checksum="$(sha1sum yarn.lock | awk '{print $1}')"
    if [ -d "${NODE_PATH}/${checksum}/node_modules" ]; then
      echo "Archiving cache folder as a safety measure"
      mv ${NODE_PATH}/${checksum} ${NODE_PATH}/${checksum}_archive${{ github.run_number }}
      echo "Done!"
    fi
Utilising the Cache
To use the generated cache, we link the cached modules and verify the setup is correct before building the code and running our test scripts.
- name: Link node_modules
  run: |
    set -x
    export checksum="$(sha1sum yarn.lock | awk '{print $1}')"
    ln -s ${NODE_PATH}/${checksum}/node_modules node_modules
- name: Yarn Install (NFS Cache)
  run: |
    yarn global add serve && yarn install --frozen-lockfile --prefer-offline
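Why a plain symlink suffices can be sketched locally: module resolution follows the link, so each job sees the shared snapshot as if node_modules were local. The package name and directory layout below are illustrative:

```shell
#!/usr/bin/env bash
# Sketch of linking a shared node_modules snapshot into a job workspace.
set -euo pipefail
work="$(mktemp -d)"; cd "$work"

# Shared cache entry with one "installed" package (illustrative).
mkdir -p cache/node_modules/left-pad
echo 'module.exports = {}' > cache/node_modules/left-pad/index.js

# A job links the shared snapshot instead of installing its own copy.
mkdir job && cd job
ln -s ../cache/node_modules node_modules

# Files resolve through the symlink as if the modules were local.
test -f node_modules/left-pad/index.js && echo "resolved"
```

This is also why the subsequent `yarn install --prefer-offline` completes in seconds: the snapshot behind the link is already fully populated.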
Putting everything together
To reuse everything across our pipelines, we can move these steps into reusable workflows and composite actions. In the following image we can see an example of cache population for a new "yarn.lock" file.
When the pipeline is triggered again, the install node_modules step finds the existing cached folder and continues with the test execution.
The prepare-the-cache job ensures that everything is set up correctly so we can then proceed with our test execution:
Let’s have a look now inside a test job:
The linking and installation process takes just a few seconds. It is also worth highlighting that if we installed everything from scratch in each of the parallel jobs we run, we would need considerably more storage, memory, and CPU.
Conclusion
By strategically implementing caching at key stages of the GitHub Actions workflow, we have significantly reduced dependency installation times, streamlining the continuous integration process. This approach ensures that Microfrontend repositories can be developed and tested with greater efficiency and speed.