How Fever sped up e2e execution time by 16.5% using Cypress and GitHub Actions

Antonio Jiménez

Published in Fever Engineering

6 min read, Apr 12, 2022

By actively managing the cache, we have significantly reduced the combined execution time.

Background

For some context: at Fever we already had a fairly solid e2e test workflow. Our tests are written with the Cypress framework, which has become a de facto standard in the industry. They run on every new PR opened in our web client repository and are launched on 8 parallel machines. If the tests fail, the pull request cannot be merged.

As our developer staff grew, so did the number of contributors to the repository, and in recent weeks these tests were being launched very frequently. So we started looking for new ways to reduce the runners' execution time.

Our old configuration

Schema of our old workflow configuration

Our old workflow configuration had 2 jobs:

Prepare: this job only waits for the lint and unit test workflows to complete successfully. We do this for 2 reasons:
1. To avoid launching the e2e tests on code that does not even pass our lint rules, or whose unit tests fail.
2. To avoid launching 8 separate jobs that would each sit waiting for 3 to 5 minutes, consuming unnecessary compute time in GitHub Actions.
Runner: this job executes the e2e tests. It runs on 8 machines in parallel to reduce the execution time of the suite, and does the following:
- Checks out the repository.
- Retrieves the /.npm cache, if it exists.
- Retrieves the /.cache/Cypress cache, if it exists.
- Installs dependencies, launches the web client and runs the tests. The Cypress Action takes care of all of this: besides installing dependencies and launching the web client, it has its own cache management.
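To make the two-job layout above concrete, here is a minimal sketch of how such a workflow could be written. It is not Fever's actual file: the workflow name, the placeholder wait step, the `npm start` script, port 4200 (the Angular dev server default), the `containers` matrix name and the use of Cypress Dashboard recording (`record`/`parallel`) to split specs across the 8 machines are all assumptions. The cache paths are assumed to refer to `~/.npm` and `~/.cache/Cypress` in the runner's home directory.

```yaml
# Hypothetical e2e workflow approximating the old two-job setup.
name: e2e

on: pull_request

jobs:
  prepare:
    runs-on: ubuntu-latest
    steps:
      # Placeholder: how the job waits for the lint and unit test
      # workflows is not described in the article.
      - name: Wait for lint and unit tests
        run: echo "Waiting for lint and unit_tests to succeed"

  runner:
    needs: prepare            # runners start only after prepare succeeds
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        containers: [1, 2, 3, 4, 5, 6, 7, 8]   # 8 parallel machines
    steps:
      - uses: actions/checkout@v4

      - name: Restore ~/.npm cache
        uses: actions/cache@v4
        with:
          path: ~/.npm
          key: npm-${{ runner.os }}-${{ hashFiles('**/package-lock.json') }}

      - name: Restore Cypress binary cache
        uses: actions/cache@v4
        with:
          path: ~/.cache/Cypress
          key: cypress-${{ runner.os }}-${{ hashFiles('**/package-lock.json') }}

      # The Cypress Action installs dependencies (npm ci), starts the
      # client and runs the specs.
      - name: Run e2e tests
        uses: cypress-io/github-action@v6
        with:
          start: npm start              # hypothetical script serving the Angular app
          wait-on: 'http://localhost:4200'
          record: true                  # required for parallel: true
          parallel: true                # split specs across the 8 runners
        env:
          CYPRESS_RECORD_KEY: ${{ secrets.CYPRESS_RECORD_KEY }}
```

In this layout every one of the 8 runners repeats the full install, start and test cycle, which is the cost the rest of the article tries to cut.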

With this configuration, the average execution time per runner was about 7 min 58 sec with cache and about 8 min 3 sec without it.

Our main goal in looking for a better solution was to reduce the time each runner took to complete the full cycle:

Install dependencies → Launch Angular server → Run tests.

The ideal solution would have been to launch the client on a single machine and have all the runners run their tests against that one client, but this is not possible in GitHub Actions, so we had to find another way to reduce the total execution time. We focused on avoiding the dependency installation on each of the runners, and tried several approaches.

Objective: avoid installing dependencies on the runners

Install dependencies & build on a separate job

After seeing this repository from Gleb Bahmutov, a former Cypress employee and a leading contributor to the Cypress community, we started to wonder whether we could take advantage of the prepare job, which sat “idle” while lint and unit tests were running. We thought that by doing something similar to that repository we could get rid of the dependency installation step in each of the runners, which was our main goal.

We tried two approaches:

Install dependencies & build on “prepare” job
We followed the steps of Gleb’s solution to the letter, uploading the build’s dist folder as an artifact and then downloading it on each of the 8 runners.

Install dependencies & build on “build” workflow
In the second approach we wanted to go one step further and take advantage of our build.yml workflow, which also runs on each PR: the artifact was uploaded in that workflow and then downloaded on each of the “runners”, while the prepare job was kept only to wait for lint and unit_tests.
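The first approach (build in “prepare”, reuse the output in every runner) can be sketched roughly as follows with `actions/upload-artifact` and `actions/download-artifact`. This is an illustration, not the exact setup from Gleb's repository or ours: the `npm run build` script, the `dist` path, serving it with `http-server` on port 4200 and the Dashboard recording are assumptions, and the wait-for-lint step is omitted. The Cypress step keeps the action's default install, since the sketch only shows the artifact hand-off between jobs.

```yaml
# Sketch of approach 1: build once in "prepare", reuse it in every runner.
jobs:
  prepare:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Install and build while lint and unit tests are still running.
      - run: npm ci
      - run: npm run build            # hypothetical build script (e.g. ng build)
      - name: Upload build output
        uses: actions/upload-artifact@v4
        with:
          name: dist
          path: dist                  # adjust to the outputPath in angular.json

  runner:
    needs: prepare
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        containers: [1, 2, 3, 4, 5, 6, 7, 8]
    steps:
      - uses: actions/checkout@v4
      - name: Download build output
        uses: actions/download-artifact@v4
        with:
          name: dist
          path: dist
      # Serve the prebuilt files instead of compiling the client again.
      # The action still runs its default dependency install here.
      - name: Run e2e tests
        uses: cypress-io/github-action@v6
        with:
          start: npx --yes http-server dist -p 4200
          wait-on: 'http://localhost:4200'
          record: true
          parallel: true
        env:
          CYPRESS_RECORD_KEY: ${{ secrets.CYPRESS_RECORD_KEY }}
```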

These approaches have some disadvantages. For example, uploading an artifact to GitHub Actions consumes storage space, and if the workflow that uploads it runs very frequently, the artifacts may take up too much of our GitHub storage. By default, GitHub keeps uploaded artifacts for 90 days. To avoid accumulating so many artifacts, you can define a custom number of retention days for them.
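As an illustration, `actions/upload-artifact` accepts a `retention-days` input that overrides the default for that artifact (a repository or organization setting can also lower the maximum). The step name, artifact name, path and the value of 1 day below are illustrative choices.

```yaml
      - name: Upload build output
        uses: actions/upload-artifact@v4
        with:
          name: dist
          path: dist
          retention-days: 1   # delete after 1 day instead of the default
```

For an artifact that is only consumed by later jobs of the same run, a very short retention period is usually enough.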

After testing different configurations with builds and artifacts, we discarded these options: because of the way our web project is configured, they did not reduce the execution time. So we continued searching for a better solution, one that did not involve building the project.

Install dependencies & upload node_modules as an artifact on “prepare” job

Let us tell you right away: this is a bad idea, don’t do it! Never upload node_modules as an artifact to GitHub unless you want to die of resource starvation.

The idea was similar to the previous ones: use the prepare job, while waiting for lint and unit_tests, to install the project dependencies and create a node_modules artifact. The problem is that our node_modules weighs about 950 MB, and uploading an artifact of this size is very slow, so the workflow became much slower.

Add a cache step to retrieve node_modules

Having discarded the artifact-based options, our only remaining option was to keep using caches. Our original workflow already retrieved the /.npm and /.cache/Cypress caches when they existed. To eliminate the install step in each of our 8 runners, the node_modules folder had to be already present on each of the 8 machines, so that the client could be built on each of them without an installation. Adding a node_modules cache was the configuration we finally chose.

New configuration

With this configuration, we skip installing dependencies whenever a node_modules cache is found, which is exactly what we wanted. If for some reason node_modules is not found in the cache, or a new commit changes the dependencies, npm ci runs and the node_modules cache is updated.
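A minimal sketch of a runner job with such a node_modules cache, assuming the cache key is derived from `package-lock.json` and using the `cache-hit` output of `actions/cache` to skip `npm ci`. Script names, port, key prefixes and the Dashboard recording are assumptions, and only the runner job is shown. The `npx cypress install` step is an addition not mentioned in the article: it guards against the Cypress binary cache being evicted while the node_modules cache survives.

```yaml
# Sketch of the runner job only; the prepare job is omitted.
jobs:
  runner:
    needs: prepare
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        containers: [1, 2, 3, 4, 5, 6, 7, 8]
    steps:
      - uses: actions/checkout@v4

      - name: Restore ~/.npm cache
        uses: actions/cache@v4
        with:
          path: ~/.npm
          key: npm-${{ runner.os }}-${{ hashFiles('**/package-lock.json') }}

      - name: Restore Cypress binary cache
        id: cache-cypress
        uses: actions/cache@v4
        with:
          path: ~/.cache/Cypress
          key: cypress-${{ runner.os }}-${{ hashFiles('**/package-lock.json') }}

      # New step: restore node_modules, keyed on the lockfile hash so that
      # any dependency change produces a cache miss.
      - name: Restore node_modules cache
        id: cache-node-modules
        uses: actions/cache@v4
        with:
          path: node_modules
          key: node-modules-${{ runner.os }}-${{ hashFiles('**/package-lock.json') }}

      # Install only on a miss; the cache action then saves the new
      # node_modules in its post-job step (by default only if the job succeeds).
      - name: Install dependencies
        if: steps.cache-node-modules.outputs.cache-hit != 'true'
        run: npm ci

      # Not in the article: fetch the Cypress binary if only its cache was evicted.
      - name: Install Cypress binary if missing
        if: steps.cache-node-modules.outputs.cache-hit == 'true' && steps.cache-cypress.outputs.cache-hit != 'true'
        run: npx cypress install

      - name: Run e2e tests
        uses: cypress-io/github-action@v6
        with:
          install: false                # dependencies are already in place
          start: npm start              # hypothetical script serving the Angular app
          wait-on: 'http://localhost:4200'
          record: true
          parallel: true
        env:
          CYPRESS_RECORD_KEY: ${{ secrets.CYPRESS_RECORD_KEY }}
```

Keying the cache on the lockfile hash is what makes the invalidation automatic: a commit that changes `package-lock.json` yields a new key, so the first run after it misses, runs `npm ci`, and stores the fresh node_modules for the runs that follow.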

Results

These are the results we obtained:
(Note: we consider the results when no cache is found to be identical for the previous and the new implementation, since npm ci is performed in both cases. So we do not distinguish between the two.)

Average duration of each runner:
Cache not found: 8 min 3 sec
Before: 7 min 58 sec
After: 6 min 40 sec
Savings: ~1 min 20 sec per runner

Average duration of the complete workflow (sum of the 8 parallel runners’ execution times):

Cache not found: 1 hour 4 min 22 sec
Before: 1 hour 3 min 43 sec
After: 53 min 23 sec
Savings: ~10 min 20 sec per workflow execution
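As a sanity check, converting the averages above to seconds reproduces the savings. The per-runner saving comes out at 1 min 18 sec (reported as ~1 min 20 sec), and the workflow totals are about 8 times the per-runner averages, as expected for 8 parallel runners:

```latex
\begin{align*}
\Delta_{\text{runner}} &= (7 \cdot 60 + 58)\,\text{s} - (6 \cdot 60 + 40)\,\text{s} = 478\,\text{s} - 400\,\text{s} = 78\,\text{s} = 1\,\text{min}\ 18\,\text{s} \\
\Delta_{\text{workflow}} &= (3600 + 3 \cdot 60 + 43)\,\text{s} - (53 \cdot 60 + 23)\,\text{s} = 3823\,\text{s} - 3203\,\text{s} = 620\,\text{s} = 10\,\text{min}\ 20\,\text{s} \\
8 \cdot 478\,\text{s} &= 3824\,\text{s} \approx 3823\,\text{s}
\end{align*}
```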

This workflow runs about 190 times per month on average (and the number keeps growing as more and more developers work on the web client).
That adds up to a total of 32 hours 43 minutes and 20 seconds of GitHub Actions compute time saved per month.
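The monthly figure follows from multiplying the per-workflow saving of 620 seconds by the approximately 190 monthly executions:

```latex
190 \times 620\,\text{s} = 117\,800\,\text{s} = 32 \cdot 3600\,\text{s} + 43 \cdot 60\,\text{s} + 20\,\text{s} = 32\,\text{h}\ 43\,\text{min}\ 20\,\text{s}
```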