Dynamic build agents in Jenkins using AWS

Published in Globant, Sep 28, 2021

Introduction

Have you ever had a Jenkins build sit in the queue for a long time, waiting for the next available executor? This usually happens when other collaborators on your project push changes at the same time and the builds triggered by their changes occupy every available executor. It is especially frustrating when your change is an urgent bug fix or is nearing its deadline. On top of that, delayed execution means problems are detected late, when fixes are more expensive.

In this article, we will cover the following points:
1- Solution
2- Prerequisites
3- Configuration in AWS
4- Installing plugin in Jenkins
5- Configuring EC2-fleet plugin in Jenkins
6- Plugin in Action
7- Troubleshooting
8- Summary
9- References

1- Solution

Jenkins can run jobs on agents, which lets us scale the number of simultaneous jobs beyond the capacity of the master. Traditionally, we would scale up by adding servers and configuring them as part of our cluster, but this involves the upfront cost of setting them up and recurring maintenance. We also have to consider that there will be high-demand phases in which we use most of the resources and low-demand phases in which they sit idle. A solution dynamic enough to scale with our workloads is therefore preferable.

We can utilize the “pay as you go” model of the cloud, in which we do not have to provision infrastructure in advance and are billed according to our usage. It also offers features such as Auto Scaling, which scales fault-tolerant infrastructure capacity in and out based on the workload.

The “EC2-Fleet” plugin for Jenkins provisions build agents in AWS and uses Auto Scaling to scale up or down depending on the jobs in the queue. If we have configured our capacity to use Spot Instances, it also replaces instances that were terminated as a result of demand spikes in specific Spot Instance pools.

2- Prerequisites

1. An AWS IAM user with permissions for the required EC2 and Auto Scaling actions.

2. A user in Jenkins having permissions to install and configure plugins.
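
As a rough illustration of prerequisite 1, the sketch below writes a candidate IAM policy to a file. The action list is an assumption about what a fleet backed by an Auto Scaling group needs (describing and resizing the group, describing and terminating instances). The function name write_fleet_policy and the file name are hypothetical, and the plugin's own documentation should be treated as the authoritative list.

```bash
# Write a candidate IAM policy for the Jenkins IAM user to the given file.
# ASSUMPTION: this action list is illustrative only; confirm it against
# the EC2 Fleet plugin documentation before relying on it.
write_fleet_policy() {
  local policy_file="$1"
  cat > "$policy_file" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:UpdateAutoScalingGroup",
        "ec2:DescribeInstances",
        "ec2:DescribeInstanceStatus",
        "ec2:DescribeInstanceTypes",
        "ec2:DescribeRegions",
        "ec2:TerminateInstances"
      ],
      "Resource": "*"
    }
  ]
}
EOF
}
```

Calling write_fleet_policy jenkins-fleet-policy.json and then passing the file to aws iam create-policy with --policy-document file://jenkins-fleet-policy.json would register the policy, which can then be attached to the IAM user.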

3- Configuration in AWS

1. Go to the EC2 console > Key Pairs. Click on the “Create Key Pair” button. The key will be used when we configure our Jenkins plugin to launch the agents over SSH.

a) Provide the name of the key pair.

b) Select RSA as the key pair type and .pem as the format.

This will generate the key. Make sure to download and save it, as it will be required later in the setup.
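
For readers who prefer the command line, the same key pair could be created with the AWS CLI. This is a minimal sketch, assuming the CLI is installed and configured; the function name create_agent_key and both arguments are placeholders.

```bash
# Create an RSA key pair in EC2 and save the private key locally.
# ASSUMPTION: the key name and output path are placeholders you choose.
create_agent_key() {
  local key_name="$1"
  local key_file="$2"
  # --query/--output text keep only the private key material (PEM text).
  aws ec2 create-key-pair \
    --key-name "$key_name" \
    --key-type rsa \
    --query 'KeyMaterial' \
    --output text > "$key_file"
  # Restrict permissions so SSH clients accept the key file.
  chmod 400 "$key_file"
}
```

For example, create_agent_key jenkins-agents ./jenkins-agents.pem leaves the private key in the current directory, ready to be pasted into the Jenkins credential later.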

2. Go to EC2 console > Launch Templates. Click on the “Create launch template” button.

Auto Scaling accepts either a launch template or a launch configuration to define the settings used to launch an instance. Launch configurations are the legacy option; launch templates are recommended because they support multiple versions and can provision both On-Demand and Spot Instances, among other benefits. We will therefore use a launch template for our implementation.

In the launch template we can configure:

a) AMI used to launch the instance. We can use our own customized AMIs as well.

b) Instance type depending upon the number of concurrent builds and workload we would want to run on these instances.

c) The SSH key pair used to log in to the instances once they are launched.

d) Security group settings.

e) Spot Instance details, if we plan to use Spot Instances to save costs.

f) User data that will be executed at the launch of the instance.

Once the template is created, it will be visible under the “Launch Templates” section, showing its name, default version and latest version among other details.
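
The launch template options listed above can also be supplied through the AWS CLI. The sketch below is one possible shape, not the setup used in this article: the function name and all six arguments are hypothetical, and it covers only the AMI, instance type, key pair, one security group and user data.

```bash
# Create a launch template for the build agents.
# ASSUMPTION: all arguments are placeholders for your own values.
create_agent_launch_template() {
  local lt_name="$1"
  local ami_id="$2"
  local instance_type="$3"
  local key_name="$4"
  local sg_id="$5"
  local user_data_file="$6"
  local user_data
  local lt_data
  # Launch template user data must be base64-encoded; strip line wraps.
  user_data=$(base64 < "$user_data_file" | tr -d '\n')
  lt_data=$(printf '{"ImageId":"%s","InstanceType":"%s","KeyName":"%s","SecurityGroupIds":["%s"],"UserData":"%s"}' \
    "$ami_id" "$instance_type" "$key_name" "$sg_id" "$user_data")
  aws ec2 create-launch-template \
    --launch-template-name "$lt_name" \
    --launch-template-data "$lt_data"
}
```

A call might look like create_agent_launch_template jenkins-agents ami-EXAMPLE t3.medium jenkins-agents sg-EXAMPLE ./user-data.sh, where every value is a stand-in.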

3. Now that the launch template is available, we can create the Auto Scaling group. Go to EC2 console > Auto Scaling groups, then click on the “Create an Auto Scaling group” button.

We need to configure the Auto Scaling group as per our requirements:

a) Select the launch template and the version created in the earlier step from the drop-down list in the Launch Templates section.

b) Define network settings for our Auto Scaling group like what VPC and subnets it will use. Use of multiple subnets within the selected VPC is advised to ensure high availability and fault tolerance.

c) Optionally, attach a load balancer to this configuration.

d) Configure the group size by specifying the minimum, maximum and desired capacity of our Auto Scaling group. The settings in the EC2 Fleet plugin will override the numbers specified here.

e) Optionally, configure notifications, such as sending an email, for any scale in/out activity.

Once created, the Auto Scaling group will be visible under the Auto Scaling groups section.
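
The equivalent AWS CLI call is sketched below. The function name and arguments are hypothetical, and the sizes (minimum 0, maximum 2, desired 0) are only example numbers, since the plugin overrides them later anyway.

```bash
# Create an Auto Scaling group from an existing launch template.
# ASSUMPTION: names, subnet IDs and sizes are placeholders.
create_agent_asg() {
  local asg_name="$1"
  local lt_name="$2"
  local subnets="$3"   # comma-separated subnet IDs, ideally in several AZs
  # \$Latest is passed literally so the group tracks the newest version.
  aws autoscaling create-auto-scaling-group \
    --auto-scaling-group-name "$asg_name" \
    --launch-template "LaunchTemplateName=$lt_name,Version=\$Latest" \
    --min-size 0 \
    --max-size 2 \
    --desired-capacity 0 \
    --vpc-zone-identifier "$subnets"
}
```

Passing two or more subnets, for example create_agent_asg jenkins-agents-asg jenkins-agents "subnet-AAA,subnet-BBB", follows the multi-subnet advice given in step b).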

4- Installing plugin in Jenkins

Install the latest version of the EC2-Fleet plugin in Jenkins.

Go to Manage Jenkins > Plugin Manager, then install the EC2 Fleet Jenkins Plugin.
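
If Jenkins is built as an image rather than configured by hand, the plugin can usually be installed without the UI. This sketch assumes the jenkins-plugin-cli tool is on the PATH (it ships with the official Jenkins Docker image) and that the plugin ID is ec2-fleet; the wrapper function name is hypothetical.

```bash
# Install the EC2 Fleet plugin from the command line.
# ASSUMPTION: jenkins-plugin-cli is available and the plugin ID is "ec2-fleet".
install_fleet_plugin() {
  jenkins-plugin-cli --plugins ec2-fleet
}
```

This is typically run at image build time, before Jenkins starts, so the plugin is already present on first boot.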

5- Configuring EC2-fleet plugin in Jenkins

1. Go to Manage Jenkins > Manage Nodes and Clouds > Configure Clouds > Select “Amazon EC2 Fleet”.

2. Configure the AWS credentials using the access key ID and secret access key of the IAM user. The fleet list becomes available only once the AWS region and credentials are specified. As a security best practice, scope the credentials to “System” rather than “Global”.

3. Select the AWS Region where we have created our Auto Scaling group.

4. The list of available Auto Scaling groups will be populated, and we choose the one we created. Click on “Test Connection” to verify that it is working.

5. We now have to configure how the agents will be launched through the “Launcher” configuration.

a) Select “Launch agents via SSH” from the drop down list.

b) Select “ec2-user” from the drop down list under the Credentials section.

c) Add the ssh key to be used by ec2-user by selecting the “SSH Username with private key” option from the drop down list. We will use the key downloaded earlier from the AWS EC2 console and paste its content into the text area of Key.

d) Select “Non verifying Verification Strategy” from the “Host Key Verification Strategy” drop down list. This option is useful where we use Spot Instances, since they have a random SSH host fingerprint.

6. We can check the “Private IP” box if the Jenkins master and the agents are in the same network or in peered networks, so that they communicate over private IP addresses. If this option is not selected, the Jenkins master will use the public IP address of the agents. Choose this option based on your network configuration.

7. Specify the label for our fleet, which will later be used in our pipeline or freestyle jobs to run builds on our Cloud agents.

8. Configure “Max Idle Minutes Before Scaledown” to a non-zero value of your choice. This value will determine how long an agent can remain idle before it is scaled down. Setting it to a value of 0 means it will never be scaled down.

9. Determine the size of our cluster by defining “Minimum Cluster Size” and “Maximum Cluster Size”. The values mentioned here will override capacity defined in our original Auto Scaling group configuration.

10. Configure the number of executors, that is, the number of concurrent builds each agent runs, based on the physical capacity of the AWS instances in the Auto Scaling group. We have to be careful not to define too many executors per machine, as this will increase build execution times.

6- Plugin in Action

Once the plugin is configured, the status of the fleet will show under the Jenkins dashboard:

Since we have set the minimum capacity of our fleet to 0 and there is no build running, the value for nodes and target is also 0.

We can configure our pipeline to use the Cloud agents using the agent directive within the pipeline block.

The agent directive can also be set within individual stages so that only those stages of the pipeline use the Cloud agents. To make freestyle jobs use the Cloud agent, we can enter the fleet label in the “Label expression” field of the “Restrict where this project can be run” option, within the General tab of the job configuration page.
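
As a concrete illustration of the agent directive, the sketch below generates a minimal declarative Jenkinsfile from the shell. The function name, the file path and the label value are assumptions; the label must match whatever was entered in the fleet configuration.

```bash
# Write a minimal declarative Jenkinsfile that targets the fleet label.
# ASSUMPTION: the label passed in matches the one set on the fleet.
write_jenkinsfile() {
  local jenkinsfile="$1"
  local fleet_label="$2"
  cat > "$jenkinsfile" <<EOF
pipeline {
    // Run every stage on an agent carrying the fleet label.
    agent { label '${fleet_label}' }
    stages {
        stage('Build') {
            steps {
                // Print the host name to confirm where the build ran.
                sh 'hostname'
            }
        }
    }
}
EOF
}
```

Running write_jenkinsfile Jenkinsfile ec2-fleet produces a pipeline whose stages all run on the Cloud agents. To restrict only some stages, the top-level directive would become agent none and the agent { label ... } block would move inside the relevant stage.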

Once a pipeline or freestyle job configured to use the Cloud agent is run, we can see the following changes in Jenkins:

The fleet status shows as launching (the ID appended to the name of the fleet is the ID of the EC2 instance that is being launched in AWS).

The agent now appears under Dashboard > Nodes.

Point to note:

We set “Minimum Cluster Size” in our fleet settings to 0, so the build has to wait for an instance to become available. This can be seen in both the pipeline logs and the build agent logs.

The agent log shows Jenkins trying to connect to the build agent while the instance is not yet available, so there will be some delay before our build starts. If your builds are time sensitive and cannot afford to wait, make sure that a certain number of instances are already running and available.

Once the agent is available and connected, we can see the status in the agent logs.

We can also see the updated capacity in our Auto Scaling group.

Once the build completes, if there is no other build to run, the agent is allowed to stay idle for the duration configured in “Max Idle Minutes Before Scaledown” in our fleet settings, which in our case is 5 minutes. We have to choose this value carefully; otherwise the agents will be scaled down too soon and will have to be relaunched for new builds.

Once the agent is terminated, we can verify it in the Activity History of our Auto Scaling group.

The capacity in the Auto Scaling group reflects the same.
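
The same checks can be made from a terminal instead of the console. This is a small sketch using the AWS CLI; the function name and the group name passed to it are placeholders.

```bash
# Show current capacity and recent scaling activity for a group.
# ASSUMPTION: the group name is a placeholder for your own ASG.
show_fleet_capacity() {
  local asg_name="$1"
  # Current minimum, maximum and desired capacity of the group.
  aws autoscaling describe-auto-scaling-groups \
    --auto-scaling-group-names "$asg_name" \
    --query 'AutoScalingGroups[0].[MinSize,MaxSize,DesiredCapacity]' \
    --output text
  # Most recent launch and terminate activities.
  aws autoscaling describe-scaling-activities \
    --auto-scaling-group-name "$asg_name" \
    --max-items 5 \
    --query 'Activities[].[StartTime,StatusCode,Description]' \
    --output table
}
```

After the idle timeout has passed, show_fleet_capacity jenkins-agents-asg should list a terminate activity and a desired capacity back at the configured minimum.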

7- Troubleshooting

If your agent log contains errors stating “Java not found”, you can install Java on your build agents through the user data of the launch template:

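#!/bin/bash
# Executed at instance launch via EC2 user data.
yum -y update
# Install a Java runtime so the Jenkins agent process can start.
yum install -y java-1.8.0-openjdk.x86_64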

If there are connectivity issues, check whether the rules in the security group associated with the Auto Scaling group allow the traffic. If Jenkins and the EC2 fleet are in different networks, verify that the instance has a public IP address to connect to.
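
A quick way to test reachability from the Jenkins host is to probe the agent's SSH port. The sketch below assumes a netcat (nc) that supports the -z and -w options and that the agents listen on the default port 22; the function name is hypothetical.

```bash
# Probe the SSH port of an agent from the Jenkins host.
# ASSUMPTION: nc supports -z/-w and the agent uses port 22.
check_agent_ssh() {
  local agent_host="$1"
  # -z: only probe the port, -w 5: give up after 5 seconds.
  if nc -z -w 5 "$agent_host" 22; then
    echo "SSH port reachable on $agent_host"
  else
    echo "SSH port NOT reachable on $agent_host; check security group rules and routing"
    return 1
  fi
}
```

Use the private or public IP of the instance depending on whether the “Private IP” box is checked in the fleet configuration.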

8- Summary

Our main objective was to set up dynamic capacity in Jenkins that automatically scales in and out with our workloads, using a Jenkins plugin and AWS features. The above setup is flexible, but we still need to carefully tune the following areas to get the efficiency suited to our workloads:

Cluster size for our fleet as this will determine the minimum and maximum capacity that will be available.
Number of executors that we would want to run on each machine.
Configuring idle time disconnection for our fleet.
Inclusion of Spot instances in our fleet to maximize cost savings.

We may also have to revisit the capacity if builds start waiting for an agent more frequently.