On-demand CI/CD infrastructure with GitLab and AWS Fargate

How to reduce costs and scale GitLab Runner down to zero

Daniel Coutinho de Miranda
CI&T
9 min read · Jul 22, 2020

Photo by Andrew Neel on Unsplash

In a previous article, I explained how to deploy the GitLab Runner manager and Fargate driver on AWS Fargate with no virtual machine setup. This way, your GitLab CI/CD jobs can run serverless.

In the present article, I will show how you can use AWS Lambda functions to stop the Runner manager hosted on AWS Fargate when there are no CI/CD jobs to process and start it when a new pipeline is triggered. This configuration can significantly reduce the costs when you have considerable idle times between builds.

How GitLab Runner works and the challenges to scale it down to zero

GitLab Runner is open-source software used to run your jobs and send the results back to GitLab. In summary, it works as an agent that periodically polls a GitLab instance, asking for a pipeline job to process. It executes the assigned job and returns the job output and final status to the GitLab instance.

Since GitLab Runner needs to actively query the GitLab instance for pending jobs, and not the other way around, it is expected to be always up and running. Given this scenario, a possible approach to achieve the "scale down to zero" behavior when idle is to integrate with the GitLab instance so that we can identify when a new pipeline has started and when no more jobs are pending. For this, we can use two integrations provided by GitLab: Webhooks and the GitLab API.

Limitations of our solution

For the sake of simplicity, we did not consider more complex scenarios in the present solution. Below I list the most relevant limitations:

  • We assume all jobs from the GitLab project should be processed by the same Runner. Nevertheless, it is not hard to adapt the solution if you need distinct Runners for different types of builds.
  • We did not consider group runners in the present solution. In the present scenario, the Runner is dedicated to processing the builds of a given GitLab project.

Premises

For the scope of this article, we assume the following:

  • All the necessary AWS infrastructure (VPC network, subnet, security group, and Fargate cluster) has already been created and configured in your AWS account.
  • Creating the container image and Task Definition for the Runner manager is also out of scope, since both were presented in a previous article.
  • The AWS SAM CLI will be used in all deployments mentioned throughout the text. The reader may refer to the AWS documentation for instructions on how to install and use it.

Solution overview

The image below presents a high-level overview of the solution.

Image showing all components involved in the solution: for example the Lambda functions and the Runner Fargate Task.

The core of the solution consists of two AWS Lambda functions:

  • The first function is triggered by a GitLab Webhook when pipeline events occur in a given GitLab project. The function is responsible for starting a new GitLab Runner manager Fargate Task, in case it is not already running. Since the AWS Fargate driver uses SSH to connect to other Fargate Tasks, the function will also create an inbound rule in a specified Security Group to allow connections from the Runner's container IP address.
  • The second function is triggered on a regular schedule by an Amazon CloudWatch Events rule and is responsible for stopping the GitLab Runner manager Fargate Task when no more CI jobs are pending execution. To discover pending CI jobs, it queries the GitLab Jobs API for the given GitLab project. Finally, the function removes the inbound rule created in the Security Group by the first function.

Please note this is a deliberately simple architecture, meant to make the flow and the idea behind this article easier to follow. In a more microservice-oriented architecture, the functions presented above could be split into multiple functions with a more focused scope.

It is worth mentioning that AWS offers a generous free usage tier for Lambda functions. The free tier also applies to API Gateway and CloudWatch.

From here, I will focus on detailing the steps necessary to implement the presented solution:

  1. Store AWS Lambda functions' configs in AWS Parameter Store
  2. Create the API Gateway and AWS Lambda function to start the GitLab Runner
  3. Configure a GitLab Webhook to trigger the function on pipeline events
  4. Create the CloudWatch event rule and AWS Lambda function to stop GitLab Runner when idle
  5. Test the configuration

Step 1: Store AWS Lambda functions’ configs in AWS Parameter Store

Both Lambda functions that are part of the proposed solution will need to receive some information about the available AWS infrastructure or your GitLab project to be able to start/stop the Runner manager Fargate Task. We will use the AWS Parameter Store to centralize this information.

  1. Go to AWS Parameter Store in your AWS account.
  2. Click Create parameter.
  3. Name it lambda-gitlab-runner. Note that the functions will search for a parameter having exactly this name.
  4. For the Type field, you can choose SecureString or String.
  5. For the Value field, enter the following JSON, replacing the attribute values with the correct information:
{
  "clusterName": "yourClusterName",
  "subnet": "subnet-XYZ",
  "securityGroup": "sg-XYZ",
  "runnerTaskDefinition": "yourTaskDefinition",
  "gitlabProjectId": "yourPrivateId",
  "gitlabApiPrivateToken": "yourPrivateToken",
  "gitlabHeaderToken": "yourGitLabToken"
}

Below is the explanation for each attribute:

  • clusterName: Name of the Fargate cluster where the GitLab Runner Task should be started/stopped.
  • subnet: Subnet where the GitLab Runner Task should be started/stopped.
  • securityGroup: Security group used by your GitLab Runner Task.
  • runnerTaskDefinition: Task Definition used to create the GitLab Runner Task.
  • gitlabProjectId: Your GitLab project ID. You can find it on your GitLab project's overview page, below the project name.
  • gitlabApiPrivateToken: GitLab personal access token necessary to use the GitLab API. If you don't have one already created, just follow the GitLab documentation for generating one.
  • gitlabHeaderToken: String you will use as the GitLab Webhook Secret Token. You can think of it as a password you create for authenticating requests between GitLab Webhook and the AWS Lambda function.

  6. Click Create parameter.
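
To make the role of this parameter more concrete, here is a minimal sketch of how a function could read and validate it. It is not the code from the repositories mentioned in this article. The names load_config and REQUIRED_KEYS are hypothetical, and the ssm_client argument is assumed to be a boto3 SSM client (for example boto3.client("ssm")).

```python
import json

PARAMETER_NAME = "lambda-gitlab-runner"

# Attributes described in the list above.
REQUIRED_KEYS = (
    "clusterName",
    "subnet",
    "securityGroup",
    "runnerTaskDefinition",
    "gitlabProjectId",
    "gitlabApiPrivateToken",
    "gitlabHeaderToken",
)


def load_config(ssm_client, name=PARAMETER_NAME):
    """Fetch the JSON parameter and return it as a dict (hypothetical helper)."""
    # WithDecryption is needed for SecureString and ignored for String.
    response = ssm_client.get_parameter(Name=name, WithDecryption=True)
    config = json.loads(response["Parameter"]["Value"])
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"Missing configuration keys: {', '.join(missing)}")
    return config
```

Passing the client in as an argument keeps the helper easy to exercise locally with a stub, without touching AWS.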

Step 2: Create the API Gateway and AWS Lambda function to start the GitLab Runner

To simplify this step, we provide a public Git repository containing the function implementation, as well as an AWS SAM template we will use for deploying both the API Gateway and the AWS Lambda function.

Below are the two commands you need to run to deploy the resources. Please refer to the function documentation for more information about the IAM permissions your AWS user needs to complete the deployment.

sam package \
  --template-file template.yml \
  --output-template-file package.yml \
  --s3-bucket <your-s3-bucket>

sam deploy \
  --template-file package.yml \
  --stack-name <your-stack-name> \
  --capabilities CAPABILITY_IAM

Note: Remember to replace the S3 bucket and stack name with the correct values.

If everything works as expected, the deployment will output the Amazon Resource Name (ARN) of the function, as well as the API Gateway endpoint URL used to trigger the function. You will need to provide this URL when configuring the GitLab Webhook in the next step.

Detailing how this function works

This section describes in more detail how the function works. The reader not interested in a deeper understanding of it may jump to the next section.

In summary, the function performs the following steps:

  • Read config parameters: When the function is started, it reads the lambda-gitlab-runner parameter from the AWS Parameter Store. As presented previously, this parameter contains a JSON with several configuration values to be used by the function.
  • Authentication: The function uses the gitlabHeaderToken configuration to compare its value with the value received within the "X-Gitlab-Token" HTTP header of the request. If those values differ, the authentication will fail.
  • Start a new Runner Fargate Task: If no Runner manager Task is currently running, a new Task is started with a specific value in the “started-by” field, so that the other function, which stops the Runner, can identify it easily.
  • Add an inbound rule to Security Group: The function creates an inbound rule in the Security Group specified by the securityGroup configuration parameter to allow SSH connections from the Runner manager.

Below is the Python code at the core of the function, where you can identify some of the steps described above.

def _process_request(cluster_name, subnet, security_group, task_definition):
    message = None
    # Start a Runner manager only if none is already running.
    task_count = count_tasks_running(cluster_name, task_definition)
    if task_count == 0:
        task_arn = _run_task(
            cluster_name, task_definition, subnet, security_group
        )
        # Allow SSH connections from the new Runner manager Task.
        _create_ssh_inbound_rule(cluster_name, security_group, task_arn)
        message = "Task successfully created"
    else:
        LOGGER.info("Task already exist, will abort")
        message = "Task already exist on cluster"
    return {"message": message}
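
The bodies of _run_task and _create_ssh_inbound_rule are not shown in this article. As a hedged sketch of what such helpers might pass to boto3, the two functions below only build the keyword arguments for the ECS run_task and EC2 authorize_security_group_ingress calls. The function names, the "lambda-gitlab-runner" started-by value, and the assumption that the Runner's IP address has already been resolved are all hypothetical.

```python
def build_run_task_args(cluster_name, task_definition, subnet, security_group,
                        started_by="lambda-gitlab-runner",
                        assign_public_ip="DISABLED"):
    """Keyword arguments for the boto3 ECS client's run_task call."""
    return {
        "cluster": cluster_name,
        "taskDefinition": task_definition,
        "launchType": "FARGATE",
        # Marker that lets another function find this Task later.
        "startedBy": started_by,
        "count": 1,
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": [subnet],
                "securityGroups": [security_group],
                # Assumption: whether a public IP is needed depends on your subnet.
                "assignPublicIp": assign_public_ip,
            }
        },
    }


def build_ssh_ingress_args(security_group, runner_ip):
    """Keyword arguments for the EC2 client's authorize_security_group_ingress."""
    return {
        "GroupId": security_group,
        "IpPermissions": [
            {
                "IpProtocol": "tcp",
                "FromPort": 22,
                "ToPort": 22,
                # A /32 range restricts the rule to the Runner's address only.
                "IpRanges": [{"CidrIp": f"{runner_ip}/32"}],
            }
        ],
    }
```

They could then be used as ecs.run_task(**build_run_task_args(...)) and ec2.authorize_security_group_ingress(**build_ssh_ingress_args(...)), where ecs and ec2 are boto3 clients.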

Step 3: Configure a GitLab Webhook to trigger the function on pipeline events

Below we show how to configure a GitLab Webhook to trigger the function every time a pipeline event happens in your GitLab project.

  1. In the GitLab project, go to the Settings menu and click Webhooks.
  2. For the Secret Token field, use the same value as you used for the gitlabHeaderToken configuration in the AWS Parameter Store.
  3. Fill the URL field with the API Gateway endpoint URL printed in your console when you deployed the function.
  4. In the Trigger field, leave only the Pipeline events checkbox selected.
  5. Click Add webhook.
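
If you want to check the endpoint before relying on GitLab, a request resembling a pipeline webhook can be built by hand. The sketch below is only an approximation: build_pipeline_hook_request is a hypothetical name, and the body carries just the object_kind field, whereas a real GitLab payload contains many more attributes.

```python
import json
import urllib.request


def build_pipeline_hook_request(endpoint_url, secret_token):
    """Build (without sending) a POST that resembles a GitLab pipeline webhook."""
    body = json.dumps({"object_kind": "pipeline"}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Gitlab-Event": "Pipeline Hook",
            # Must match the gitlabHeaderToken stored in the Parameter Store.
            "X-Gitlab-Token": secret_token,
        },
    )
```

Sending it is a matter of calling urllib.request.urlopen(request) with your API Gateway endpoint URL and secret token.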

Step 4: Create the CloudWatch event rule and AWS Lambda function to stop GitLab Runner when idle

As with the first AWS Lambda function, we provide another public Git repository containing the function implementation, as well as an AWS SAM template we will use for deploying both the CloudWatch Events rule and the AWS Lambda function.

You will need to use similar commands to deploy this new function. Please refer to the function documentation for information about the IAM permissions necessary for the deployment.

sam package \
  --template-file template.yml \
  --output-template-file package.yml \
  --s3-bucket <your-s3-bucket>

sam deploy \
  --template-file package.yml \
  --stack-name <your-stack-name> \
  --capabilities CAPABILITY_IAM

Note: Remember to replace the S3 bucket and stack name with the correct values.

If everything works as expected, the deployment will output the function ARN.

Note: With the default settings, the CloudWatch Events rule triggers this function every 10 minutes. You can customize this value in the SAM template file.

Detailing how this function works

The reader not interested in a deeper understanding of the function may jump to the next section.

In summary, the function performs the following steps:

  • Search Runner managers currently running: The function initially searches for all Runner Fargate Tasks created by the function presented in Step 2. For that, it uses the “started-by” field of the Fargate Task.
  • Check if there are pending Jobs to process: The function then uses the GitLab API to search for jobs in the GitLab project that are currently in pending or running states, ignoring those being processed by shared Runners.
  • Remove the Security Group inbound rule: If no CI job to process is found, it will remove the inbound rule used to allow SSH connections from the Runner manager.
  • Stop Runner Fargate Tasks: Finally, the function stops the Runner manager.
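
The pending-jobs check can be sketched as follows. This is an illustration under assumptions, not the repository code: the function names are hypothetical, the API base URL assumes GitLab.com, and the filter relies on the is_shared field of the runner object returned by the GitLab Jobs API. Jobs that no Runner has picked up yet are conservatively counted as work to be done.

```python
import urllib.parse
import urllib.request

GITLAB_API = "https://gitlab.com/api/v4"  # assumption: GitLab.com, not self-managed


def build_jobs_request(project_id, private_token, api_url=GITLAB_API):
    """Build a request listing the project's pending and running jobs."""
    query = urllib.parse.urlencode(
        [("scope[]", "pending"), ("scope[]", "running"), ("per_page", "100")]
    )
    project = urllib.parse.quote(str(project_id), safe="")
    return urllib.request.Request(
        f"{api_url}/projects/{project}/jobs?{query}",
        headers={"PRIVATE-TOKEN": private_token},
    )


def has_jobs_for_project_runner(jobs):
    """True if any job is waiting for, or running on, a non-shared Runner."""
    for job in jobs:
        runner = job.get("runner")
        # No runner yet: the job is still pending, so keep the manager alive.
        if runner is None or not runner.get("is_shared", False):
            return True
    return False
```

The list passed to has_jobs_for_project_runner would be the decoded JSON response, for example json.load(urllib.request.urlopen(request)).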

Below we show the Python code for the core part of the function.

def _process_request(
    cluster_name, security_group, gitlab_token, gitlab_project
):
    runner_arn_list = _search_for_runner_manager_tasks(cluster_name)
    if len(runner_arn_list) > 0:
        exist_job = _exist_ci_jobs_being_processed(
            gitlab_project, gitlab_token
        )
        if not exist_job:
            _remove_ssh_inbound_rules(
                cluster_name, runner_arn_list, security_group
            )
            _stop_runner_managers(cluster_name, runner_arn_list)
Step 5: Test the configuration

At this point, you should be able to trigger your pipeline and check if the Runner is properly started and stopped by the Lambda functions.

  1. In your GitLab project, go to the CI/CD menu and click Pipelines.
  2. Click Run Pipeline.
  3. Select the correct branch in the Run for field and add any variable your build requires in the Variables field.
  4. Click Run Pipeline.

Conclusion

This article presented a tutorial on how to use AWS Lambda functions to keep the GitLab Runner up and running in AWS Fargate only while there are CI jobs to process. We focused on a simple solution, but we believe it can be enhanced and evolved to fit more complex scenarios.

I hope you found this article helpful. Thanks for reading!
