Securing DevOps workhorse
Considerations for deploying and securing DevOps remote agents
Migrating to a DevOps culture with Continuous Integration and Delivery infrastructure is an important part of an organization's overall digital transformation. As part of this process, most organizations end up developing a new DevOps platform that automates build and deployment while addressing the organization's operational and security requirements.
Most components of DevOps infrastructure, such as the code repository and registry/artifact services, are well understood within the organization. They typically operate and are secured in much the same way across on-premise infrastructure and cloud, as covered by the CNCF security white paper and my previous article in Geek Culture. The placement of agents, the workhorses that run build and deployment, in the cloud provides unique capabilities that can improve the security of DevOps infrastructure.
Most CI/CD automation platforms, such as Jenkins, GitLab CI, Atlassian Bamboo, GitHub Actions, Terraform Cloud, AWS CodePipeline, and Azure DevOps Pipelines, support remote agent capabilities (under different names) and provide varying levels of information on their use in the cloud and the associated security (available in the link). This article provides a common set of security capabilities that should be evaluated, along with considerations specific to the cloud.
Considerations
Before starting the journey towards securing the remote agents, it is important to understand some of the basic considerations.
Self-Hosted or Platform agents/runner
The majority of “DevOps as a Service” platforms, such as GitHub, AWS CodePipeline, and Azure DevOps Pipelines, provide agents that can be used to run build and deployment processes. An organization may choose to restrict the use of the out-of-the-box hosted platform for build and deployment based on the following considerations:
- Geo-restrictions: The organization may want all build processes to run in a specific geography to reduce the chance of proprietary code leaking.
- Use of machine identity for access to services: When the build process needs to access shared services and other dependencies, the machine identity of the agent can be used to authenticate. This removes the need to embed passwords in the build process. Such a setup, if required or preferred, is not possible with platform agents.
- Custom security controls: The organization may want to use specific security controls, such as EDR, log collection agents, and vulnerability scanners, which cannot be deployed on a platform-hosted environment.
- Access restriction: The organization may want to limit the external locations and repositories from which the build process can retrieve dependencies such as libraries and container images.
It is important to ensure that the organization can enforce any such restrictions on the use of platform agents/runners at the platform level, either through preventive controls such as platform policies/configuration or through detective/corrective controls such as Security Posture Management platforms.
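The access restriction consideration above can be illustrated with a small allowlist check. This is only a sketch under stated assumptions: the host names in `ALLOWED_HOSTS` and the function names are hypothetical, and in practice such a rule would usually be enforced at the network or proxy layer rather than in a script.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; replace with the hosts approved by the organization.
ALLOWED_HOSTS = {"registry.internal.example.com", "artifacts.internal.example.com"}


def is_allowed_source(url: str, allowed_hosts: set = ALLOWED_HOSTS) -> bool:
    """Return True only for HTTPS URLs whose host is on the allowlist."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    return parsed.scheme == "https" and host in allowed_hosts


def find_unapproved_sources(urls: list) -> list:
    """Return the dependency URLs that a build should not be allowed to use."""
    return [url for url in urls if not is_allowed_source(url)]
```

A pipeline step could call `find_unapproved_sources` on the dependency locations declared by a build and fail the run if the result is not empty.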
Virtual Machines or Containers
Most build and deployment steps of the pipeline need a compute environment to generate artifacts and deploy them. At this time, the options are virtual machines, containers, and serverless/functions platforms. In the majority of cases, serverless platforms are not a viable alternative for builds because of their specific constraints (limited runtime, on-demand start, etc.).
Virtual machines are typically deployed with the agent running under a non-root account. If a VM is shared across multiple teams and projects, it is important to ensure that the platform has a built-in capability to clean up build artifacts after every run.
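Where the platform does not provide cleanup out of the box, a post-run step can approximate it. The following is a minimal sketch, assuming the agent keeps each run's files under a single workspace directory; the function name `clean_workspace` is hypothetical and not part of any CI/CD product.

```python
import shutil
import tempfile
from pathlib import Path


def clean_workspace(workspace: Path) -> int:
    """Delete everything inside `workspace` but keep the directory itself.

    Returns the number of top-level entries that were removed.
    """
    removed = 0
    # Materialize the listing first so deletion does not disturb iteration.
    for entry in list(workspace.iterdir()):
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed += 1
    return removed


if __name__ == "__main__":
    # Demonstration on a throwaway directory instead of a real agent workspace.
    with tempfile.TemporaryDirectory() as tmp:
        ws = Path(tmp)
        (ws / "build").mkdir()
        (ws / "build" / "app.bin").write_text("artifact")
        (ws / "notes.txt").write_text("leftover")
        print(clean_workspace(ws), "entries removed")
```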
As an alternative, containers with pre-packaged scripts and tools, which build the code pulled from the repository during the process, can be used to ensure a consistent environment for build and deployment. A container-based approach allows an isolated environment to be set up for each pipeline so that it does not interfere with other build pipelines. It can also be used to store content (e.g. Terraform state files) across builds. At the same time, adequate controls should be in place to ensure that images and containers remain secure across multiple runs of a pipeline.
Containers, in combination with an appropriate cloud technology such as Google Cloud Run or AWS Fargate, can reduce the user's security responsibility under the shared responsibility model and improve the overall security of the build infrastructure.
Shared or dedicated
An important design decision is whether the remote agent VM or container will be dedicated to one application/pipeline or shared across several. In a shared model, it is recommended that an application/pipeline-specific account be used to run each CI/CD pipeline to limit the possibility of credentials and content leaking or being shared across applications. This requirement adds the burden of managing application-specific accounts, including creating local accounts, as part of the application on-boarding process.
Securing remote agent
Remote agents should be secured by applying the applicable security controls to the platform in line with the V.L.A.D.R approach (Vulnerability, Logging, Access, Data, Resilience) or an equivalent.
Vulnerability and drifts
At the very basic level, it is important to ensure that private or well-known base images are used to create the VM image (e.g. AMI) or container image for remote agents.
Regardless of whether a VM or container is used, it is important to build standard security controls, such as vulnerability scanning (for long-running containers and/or VMs) and runtime threat protection (EDR), into the remote agent compute as part of the DevOps process. It is also important to monitor configuration drift within the VM and/or container so that any insecure configuration can be identified and remediated. This may be a lower priority if the containers are ephemeral and created anew for every build execution.
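Configuration drift monitoring can be pictured as comparing a current fingerprint of selected files against a known-good baseline. The sketch below assumes drift is judged by SHA-256 hashes of configuration files; the function names are hypothetical, and a dedicated drift or posture tool would normally do this work.

```python
import hashlib
from pathlib import Path


def fingerprint(paths: list) -> dict:
    """Map each existing file path to the SHA-256 hex digest of its content."""
    return {
        str(path): hashlib.sha256(Path(path).read_bytes()).hexdigest()
        for path in paths
        if Path(path).is_file()
    }


def detect_drift(baseline: dict, current: dict) -> dict:
    """Compare two fingerprints and report changed, added, and removed files."""
    shared = baseline.keys() & current.keys()
    return {
        "changed": sorted(k for k in shared if baseline[k] != current[k]),
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
    }
```

The baseline would be captured when the agent image is built, and any non-empty list in the result would be raised for review or remediation.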
In addition to the remote agent, it is important to monitor configuration drift in the DevOps platform itself. This includes scanning the platform configuration and pipeline configuration to ensure that users cannot use a platform runner unless authorized to do so. Identifying “rogue” remote agents should also be an important part of such scans to reduce the possibility of code and credential leaks.
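Identifying “rogue” agents amounts to reconciling what the platform reports against an approved inventory. As a minimal sketch, assuming the list of registered agent names has already been exported from the DevOps platform (how that export is obtained is platform specific and not shown), the comparison could look like this; all names are hypothetical.

```python
def find_rogue_agents(registered: set, approved: set) -> set:
    """Agents known to the platform that are missing from the approved inventory."""
    return registered - approved


def find_missing_agents(registered: set, approved: set) -> set:
    """Approved agents that are no longer registered with the platform."""
    return approved - registered
```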
Logging
It is important to ensure that remote agent logs are collected and aggregated for monitoring, health checks, and future forensics. This is in addition to the standard log aggregation performed for operating system and critical service logs.
Access
Depending on the type of remote agent, it may either run as a service listening for incoming connections (e.g. Jenkins) or initiate the connection to the DevOps platform (e.g. GitLab, Bamboo, Azure DevOps, etc.). The security posture of the remote agent varies with the direction of the connection, incoming or outgoing. The majority of platforms use the outgoing connection model, which reduces the network attack surface. At the same time, issues associated with outgoing connections, such as DNS poisoning and enforcing SSL connectivity, should be addressed to ensure secure connectivity between the remote agent and the DevOps platform. In addition, appropriate controls such as firewalls and network segmentation should be in place to ensure that the CI/CD pipeline cannot access unauthorized external locations and repositories.
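For agents that initiate outgoing connections, enforcing SSL (today, TLS) on the client side could look roughly like the following. This is a sketch using Python's standard `ssl` module; the TLS 1.2 minimum and the helper names are assumptions for illustration, not a requirement stated by any particular platform.

```python
import ssl
from urllib.parse import urlparse


def strict_tls_context() -> ssl.SSLContext:
    """Build a client context that verifies certificates and host names."""
    # create_default_context() enables certificate and host name verification.
    context = ssl.create_default_context()
    # Assumed policy: refuse anything older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def is_https(url: str) -> bool:
    """Return True if the platform URL uses HTTPS and names a host."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and bool(parsed.hostname)
```

An agent wrapper could refuse to start when `is_https` returns False for the configured platform URL and pass the context to whichever HTTP client it uses.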
It is important to ensure that each remote agent is uniquely identified (through a token or certificate, as the case may be). If the agent is packaged as a VM or container image, ensure that the registration token is not stored in the image. The initial script should retrieve the token from a vault (authenticating with the machine identity) and then use it to register the runner with a unique identity. Such an approach removes the possibility of tokens and identities being reused.
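The registration flow described above can be sketched as a small bootstrap routine. The vault lookup and the platform registration call differ per product, so they are passed in as plain callables here; `fetch_token`, `register`, and `bootstrap_agent` are hypothetical names, not the API of any vault or CI/CD platform.

```python
import uuid
from typing import Callable


def bootstrap_agent(
    fetch_token: Callable[[], str],
    register: Callable[[str, str], None],
) -> str:
    """Register this agent under a unique name with a token fetched at start-up.

    fetch_token: retrieves the registration token from a vault, authenticated
                 with the machine identity (implementation not shown).
    register:    registers the agent name with the DevOps platform.
    """
    agent_name = f"agent-{uuid.uuid4()}"  # unique identity per agent instance
    token = fetch_token()  # fetched at run time, never baked into the image
    if not token:
        raise RuntimeError("vault returned an empty registration token")
    register(agent_name, token)
    return agent_name
```

Because the token only exists in memory during start-up, copying the VM or container image does not copy a usable credential.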
In addition, setting a token/registration expiry ensures that remote agents go through a regular certification process that reviews the need for the agent, its security level, its access, and its other controls.
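Where the platform does not enforce expiry itself, a periodic job can flag registrations that are due for review. A minimal sketch follows, assuming a 90-day certification window (an illustrative value, not a recommendation from any platform) and a hypothetical function name.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def needs_recertification(
    registered_at: datetime,
    max_age: timedelta = timedelta(days=90),  # assumed review window
    now: Optional[datetime] = None,
) -> bool:
    """Return True once the registration is at least `max_age` old."""
    current = now if now is not None else datetime.now(timezone.utc)
    return current - registered_at >= max_age
```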
If remote agents are deployed in an application-specific landing zone for deployment purposes, ensure that appropriate network and access controls are in place to reduce the blast radius.
It is also important to split the execution of build and deployment pipelines, which have very different permission requirements, to reduce the possibility of over-provisioning account access.
Data
The artifacts retrieved (e.g. code), generated (e.g. executables), and pushed out (e.g. packages) during pipeline execution should be adequately secured for provenance purposes. This can be achieved by ensuring that all components of the process are appropriately packaged and signed for future reference.
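To make the idea of signing for provenance concrete, the sketch below records a SHA-256 digest per artifact in a manifest and signs the manifest. It uses an HMAC with a shared key purely as a stand-in to keep the example self-contained; a real pipeline would more likely rely on asymmetric signing tooling, and all function names here are hypothetical.

```python
import hashlib
import hmac
import json


def build_manifest(artifacts: dict) -> dict:
    """Map each artifact name to the SHA-256 hex digest of its bytes."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in artifacts.items()}


def sign_manifest(manifest: dict, key: bytes) -> str:
    """Sign a canonical JSON encoding of the manifest (HMAC-SHA256 stand-in)."""
    payload = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_manifest(manifest: dict, key: bytes, signature: str) -> bool:
    """Check the signature in constant time."""
    return hmac.compare_digest(sign_manifest(manifest, key), signature)
```

Storing the manifest and signature next to the published package lets a later stage confirm that what it deploys is what the build produced.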
Securing the secrets used in the build and deployment process is an important part of data protection. Ensure adequate controls are in place so that secrets are not stored on the runner. For shared, long-running remote agents, deploy credential scanners that can identify build processes that leak credentials. There may also be intermediate artifacts, such as Terraform state files, that contain sensitive data. Such data should be stored securely between runs.
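A credential scanner, at its simplest, searches build output and leftover files for patterns that look like secrets. The following sketch checks for just two well-known shapes (an AWS access key ID and a PEM private key header); the pattern set is deliberately small and illustrative, and a production scanner would cover far more cases.

```python
import re

# Illustrative patterns only; real scanners ship much larger rule sets.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}


def scan_text(text: str) -> list:
    """Return (line_number, pattern_name) pairs for every suspicious line."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((line_no, name))
    return findings
```

Running `scan_text` over build logs and workspace files after each job gives an early signal that a pipeline is writing credentials to disk.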
In addition, platform-specific capabilities such as defining pull_policy can ensure that content of appropriate provenance is used in the build process.
Remote agents should not be used with public repositories, as explained well by GitHub.
Resilience
Resilience of the platform can be achieved by defining remote runner groups in the DevOps platform, each containing two or more available remote runners.
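A simple health check can confirm that every runner group still meets this minimum. The sketch assumes runner status has already been exported from the platform as a mapping of group name to runner records with an `online` flag; that data shape and the function name are hypothetical.

```python
def under_provisioned_groups(groups: dict, minimum: int = 2) -> list:
    """Return the names of groups with fewer than `minimum` online runners."""
    flagged = []
    for name, runners in groups.items():
        online = sum(1 for runner in runners if runner.get("online"))
        if online < minimum:
            flagged.append(name)
    return sorted(flagged)
```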
Remote agents form an important part of the DevOps infrastructure, yet adequate controls for them are sometimes missed during development. Securing these agents is an important part of ensuring a secure software supply chain.