How to lower your Gitlab CI costs using AWS?

Raphael Lumbroso
Legalstart
7 min read · Jun 7, 2022


Nowadays, having a great CI/CD pipeline is very important to sustain the productivity of a tech team on many levels.

Usually what you are looking for, especially in bigger teams, is a good balance between efficiency (speed of jobs) and costs. Indeed, the more your team grows, the more projects you have and the more CI jobs you have to run (concurrently or not).

One very famous and well-integrated CI/CD platform is Gitlab CI. It runs alongside your git repositories inside the Gitlab infrastructure and relies on what is called a Gitlab Runner. You can think of Gitlab runners as “orchestrators” that run jobs with different types of executors (docker, shell on the runner itself, docker-machine, etc.).

Here we will show some ways to reduce Gitlab CI costs by taking advantage of the different types of runners Gitlab offers:

  • Spot Instances requested when needed
  • Custom Gitlab runner on AWS Lambda

Using AWS Spot instances to run jobs

On AWS, one can make substantial savings by using AWS spot instances. These are instances that are not “reserved” for your usage and that can be interrupted if there is a lot of demand on the type of instance you are requesting. In exchange, you can save over 70% of the regular price for the same performance and instance type.

When it comes to running CI jobs, this is a great opportunity: CI jobs usually don’t run for hours or days, so given the nature of spot instances, we are less at risk of being interrupted. In practice, we can do this using a great project from Niek Palm, a Terraform module for running Gitlab runners on AWS.

This project uses the docker+machine Gitlab runner executor, meaning that for each job queued on the runner, it launches an AWS instance to run the job. The particularity of this project is that it permits requesting spot instances to run jobs with Gitlab, as sketched below.
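To make this concrete, here is a rough, illustrative sketch of the kind of runner configuration the module generates under the hood; the instance type, spot price, and idle count shown here are hypothetical, and in practice they are driven by the module’s Terraform variables:

# config.toml (illustrative sketch, normally generated by the Terraform module)
[[runners]]
  name = "docker-machine-spot-runner"
  executor = "docker+machine"
  [runners.machine]
    # scale down to zero instances when no jobs are queued
    IdleCount = 0
    MachineDriver = "amazonec2"
    MachineOptions = [
      "amazonec2-instance-type=m5.large",
      # this flag makes docker-machine request spot instances
      "amazonec2-request-spot-instance=true",
      "amazonec2-spot-price=0.09",
    ]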

Pros:

  • cost savings
  • auto scale as needed
  • common cache on S3 for jobs

Cons:

  • it might take some time for a spot request to be fulfilled (sometimes around 1–2 minutes of waiting for an instance before a job starts)
  • you need one runner per instance type requested (for example, if you want both m5.large and c5.large instances in order to spread the risk of a spot request not being fulfilled for lack of capacity, you need 2 different Gitlab runners; Gitlab does permit multiple runners on a single instance, though, so this is not too problematic)

Using AWS Lambda to run jobs

By default, running jobs on AWS Lambda is not possible out of the box on Gitlab, but it is an interesting option: jobs start extremely fast, you only pay for the time spent running them (so it doesn’t cost too much), and Lambda can handle a fair amount of concurrency.

To do so, you can use a Gitlab runner with a custom executor. This means writing your own implementation of how jobs are handled, based on the Gitlab documentation. One useful detail: the runner calls the configured run script with the path of the generated job script as its first argument and the name of the current sub-stage as its second argument; we will rely on this below.

Let’s see how to do so.

The runner itself

When registering the runner, we tell Gitlab it is a custom executor and point it to the script to run when executing the jobs’ script:

gitlab-runner register \
  --non-interactive \
  --registration-token="<YOUR_GITLAB_REGISTRATION_TOKEN>" \
  --url="<YOUR_GITLAB_URL>" \
  --limit=100 \
  --executor custom \
  --custom-run-exec=executor.sh \
  --builds-dir=/tmp/builds \
  --cache-dir=/tmp/cache \
  --description "gitlab-runner-with-lambda-executor"

The bash wrapper (executor.sh)

Because of a limitation in the custom executor, we cannot directly call the Python script below, so we make a little bash wrapper around it (called by the runner). The runner invokes this script once per sub-stage of the job (prepare_script, get_sources, build_script, after_script, etc.), so we only forward the build_script sub-stage, which contains the job’s actual commands:

#!/bin/bash

# $1 is the path of the script generated by Gitlab, $2 is the sub-stage name
if [[ "$2" = "build_script" ]]; then
    python3 /usr/local/bin/lambda_executor.py "$1"
fi

The runner script (lambda_executor.py)

The runner script basically needs to call the Lambda function, passing it the script that needs to run along with the environment variables.

Here is an example of a job written in a Gitlab CI yaml file.

my_job:
  script:
    - echo "Hello World"
    - echo "Goodbye"

The script of the CI job itself (i.e. the two echo commands) is automatically stored by Gitlab in a temporary bash script and handed to the executor’s run script as its first argument. So we first read this script:

with open(sys.argv[1]) as file:
    command = file.read()

Then, all the job’s environment variables are prefixed with CUSTOM_ENV_, as stated in the Gitlab documentation, so we extract them and prepare the environment variables to pass to the Lambda function, without the prefix:

job_environment = {
    name[len("CUSTOM_ENV_"):]: value
    for name, value in os.environ.items()
    if name.startswith("CUSTOM_ENV_")
}
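For instance, with hypothetical values like these set by the runner, the dictionary would contain the unprefixed names:

# hypothetical input, set by the runner:
#   CUSTOM_ENV_CI_JOB_ID=1234
#   CUSTOM_ENV_CI_PROJECT_PATH=my-group/my-project
# resulting job_environment:
#   {"CI_JOB_ID": "1234", "CI_PROJECT_PATH": "my-group/my-project"}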

Then we prepare the payload to send to the Lambda function, with the command and the environment variables, and we call the Lambda function synchronously. We use the Tail LogType to get the execution output in the response:

payload = json.dumps({
    "command": command,
    "environment": job_environment,
})

# we set a high read_timeout to give the Lambda time to finish
client = boto3.client('lambda', region_name="<AWS_REGION>", config=Config(read_timeout=1200))
response = client.invoke(
    FunctionName='test-lambda-function-arn',
    Payload=payload,
    LogType='Tail',
)

We then read the response and print the output of the Lambda function to the Gitlab CI console:

json_result = response['Payload'].read().decode('utf-8')
result = json.loads(json_result)
print(result["output"])

Finally, we return an error code if the Lambda function was unsuccessful so that the job can fail properly in the Gitlab UI:

if result["return_code"] != 0:
    # we terminate as recommended by the Gitlab documentation
    # https://docs.gitlab.com/runner/executors/custom.html#build-failure
    exit(int(os.environ["BUILD_FAILURE_EXIT_CODE"]))

That’s it! This is enough for the Gitlab runner to call a Lambda function for each job queued on the runner.
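Putting the pieces together, the full lambda_executor.py could look like this (a minimal sketch assembled from the snippets above, with the region and function ARN placeholders left to be filled in):

import json
import os
import sys

import boto3
from botocore.config import Config

# the temporary script generated by Gitlab is passed as the first argument
with open(sys.argv[1]) as file:
    command = file.read()

# extract the job's variables, prefixed with CUSTOM_ENV_ by the runner
job_environment = {
    name[len("CUSTOM_ENV_"):]: value
    for name, value in os.environ.items()
    if name.startswith("CUSTOM_ENV_")
}

payload = json.dumps({
    "command": command,
    "environment": job_environment,
})

# high read_timeout to give the Lambda time to finish
client = boto3.client('lambda', region_name="<AWS_REGION>", config=Config(read_timeout=1200))
response = client.invoke(
    FunctionName='test-lambda-function-arn',
    Payload=payload,
    LogType='Tail',
)

# print the job output to the Gitlab CI console
result = json.loads(response['Payload'].read().decode('utf-8'))
print(result["output"])

if result["return_code"] != 0:
    # terminate as recommended by the Gitlab documentation
    exit(int(os.environ["BUILD_FAILURE_EXIT_CODE"]))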

Now let’s see the code that runs the CI scripts themselves.

The Lambda function

First, you need a Lambda function set up with a git executable available. This can be done by using a Git Layer on your function. The function also needs the GitPython package for cloning from the Python Lambda function.

With Git available, the function needs to prepare the Lambda environment with the custom variables passed by the executor:

environment_variables = event["environment"]
os.environ.update(environment_variables)

Then we make sure that we run from a clean filesystem (the /tmp directory, as this is the only place where Lambda is allowed to write) and create the directories required by the runner:

shutil.rmtree("/tmp/builds", ignore_errors=True)
os.mkdir("/tmp/builds")
shutil.rmtree("/tmp/cache", ignore_errors=True)
os.mkdir("/tmp/cache")

Then we clone the project repository, which the job scripts will almost certainly need:

Repo.clone_from(
    (
        "https://gitlab-ci-token:"
        + os.environ["CI_JOB_TOKEN"]
        + "@"
        + "<YOUR_GITLAB_URL>"  # the host, e.g. gitlab.com
        + "/"
        + os.environ["CI_PROJECT_PATH"]
        + ".git"
    ),
    "/tmp/builds/" + os.environ["CI_PROJECT_PATH"],
    # 99% of the time we only need a shallow clone
    depth=1,
    single_branch=True,
    branch=os.environ["CI_COMMIT_REF_NAME"],
)

Inside the command argument of the Lambda payload lies the bash script written in the Gitlab CI job’s script key. As it is passed as a string, we must write it to disk and make the file executable:

with open("/tmp/script.sh", "w") as script:
script.write(event["command"])
st = os.stat("/tmp/script.sh")
os.chmod("/tmp/script.sh", st.st_mode | stat.S_IEXEC)

Before calling the script, we make the places where pip-installed executables are stored available to it (via PATH and PYTHONPATH):

# the copy is just to be extra safe
copy_env_variables = os.environ.copy()
# for all the binaries added by requirements.txt
lambda_root_dir = os.environ["LAMBDA_TASK_ROOT"]
# we add lambda_root_dir for git-lfs
# and lambda_root_dir/bin for binaries installed by pip
copy_env_variables["PATH"] = (
    copy_env_variables["PATH"] + f":{lambda_root_dir}:{lambda_root_dir}/bin"
)
copy_env_variables["PYTHONPATH"] = (
    copy_env_variables.get("PYTHONPATH", "") + ":" + lambda_root_dir
)

Now that everything is in place, we can call the script itself and get back the result to return to the runner:

proc = subprocess.Popen(
    "/tmp/script.sh",
    shell=False,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    env=copy_env_variables,
)
(stdout, _) = proc.communicate()
return {
    "return_code": proc.returncode,
    "output": stdout.decode("utf-8"),
}
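Assembled into a complete handler, the function could look like this (a minimal sketch built from the snippets above; the handler name and the Gitlab URL placeholder are assumptions to adapt to your setup, and error handling is left out for brevity):

import os
import shutil
import stat
import subprocess

from git import Repo  # provided by the GitPython package


def handler(event, context):
    # expose the job's variables to the whole function
    os.environ.update(event["environment"])

    # start from a clean /tmp, the only writable directory on Lambda
    shutil.rmtree("/tmp/builds", ignore_errors=True)
    os.mkdir("/tmp/builds")
    shutil.rmtree("/tmp/cache", ignore_errors=True)
    os.mkdir("/tmp/cache")

    # shallow clone of the project repository
    Repo.clone_from(
        (
            "https://gitlab-ci-token:"
            + os.environ["CI_JOB_TOKEN"]
            + "@<YOUR_GITLAB_URL>/"
            + os.environ["CI_PROJECT_PATH"]
            + ".git"
        ),
        "/tmp/builds/" + os.environ["CI_PROJECT_PATH"],
        depth=1,
        single_branch=True,
        branch=os.environ["CI_COMMIT_REF_NAME"],
    )

    # write the job's script to disk and make it executable
    with open("/tmp/script.sh", "w") as script:
        script.write(event["command"])
    st = os.stat("/tmp/script.sh")
    os.chmod("/tmp/script.sh", st.st_mode | stat.S_IEXEC)

    # make pip-installed binaries and modules visible to the script
    copy_env_variables = os.environ.copy()
    lambda_root_dir = os.environ["LAMBDA_TASK_ROOT"]
    copy_env_variables["PATH"] += f":{lambda_root_dir}:{lambda_root_dir}/bin"
    copy_env_variables["PYTHONPATH"] = (
        copy_env_variables.get("PYTHONPATH", "") + ":" + lambda_root_dir
    )

    # run the job and capture its combined output
    proc = subprocess.Popen(
        "/tmp/script.sh",
        shell=False,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        env=copy_env_variables,
    )
    (stdout, _) = proc.communicate()
    return {
        "return_code": proc.returncode,
        "output": stdout.decode("utf-8"),
    }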

Et voilà! You now have a Lambda function that can run Gitlab CI jobs, orchestrated by a custom Gitlab runner.

The packaging of the Lambda function

With the code above, you can now package your function along with other needed binaries (such as linters). You just need a requirements.txt file with your dependencies, install them locally alongside the above function, and then zip everything to be uploaded as the function’s code:

python3 -m pip install -r requirements.txt -t ./
zip -r archive.zip . -x \*.pyc \*.git\*
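You can then upload the archive, for instance with the AWS CLI (the function name placeholder is to be replaced by your own):

aws lambda update-function-code \
  --function-name <YOUR_FUNCTION_NAME> \
  --zip-file fileb://archive.zip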

Notes and downsides

As we are running on AWS Lambda, this comes with some limitations that need to be pointed out:

  • the execution is limited to 15 minutes before reaching a timeout
  • the deployment package cannot be too big, so not too many executables can be stored (50MB zipped)
  • by default, only the /tmp directory is writable

Thanks to https://github.com/jean553/gitlab-runner-lambda-executor for the inspiration on this topic.

Complete snippets can be found on my Github.

Conclusions

These are two ways of drastically reducing Gitlab CI costs while offering dynamic auto-scaling for CI jobs in Gitlab. We are using them extensively today at Legalstart, with great success, for hundreds of jobs daily, without exploding our AWS bill!

Feel free to reach out for more details or just to chat about anything around infra/devops/CI/Terraform etc… 😃
