Dynamic AWS Credentials in GitLab Pipelines with Hashicorp Vault

Just create a set of access and secret keys and call it a day? At my large corporation... nope, I don't think so. I need something more secure.

Jack Lei
12 min read · Oct 12, 2021

Background: I work in a large enterprise with many teams across different departments, which means very different requirements and a large number of AWS accounts. Any solution must be scalable and maintainable. Our Cloud team manages the associated roles and policies, while our team manages all of the pipelines and access to those roles.


Problem: GitLab pipelines need a set of credentials to interact with the AWS CLI for various actions. These include deployments (CloudFormation, ECS, EKS, CloudFront invalidations, etc.) and uploading/retrieving files to/from S3. Our Cloud team has some strong security requirements:

  • Avoid creating IAM Users (if possible)
  • Avoid creating long-lived credentials (access key + secret key, SSH key pairs, …)
  • Credentials should not be exposed to developers; if they are, the credentials should be unusable
  • Credentials should be rotated
  • Follow AWS best practices as much as possible (such as using IAM roles instead of IAM users with access keys and secret keys)

So why not generate a set of access keys and secret keys from IAM in the AWS Console?

Two reasons: it is neither secure nor scalable. These long-lived credentials would need to be used by multiple projects with many developers. When the AWS CLI runs, the credentials must be readable and can therefore be exported. This is a risk, since a user could use these credentials outside of the pipeline.


Solution: Hashicorp Vault
Hashicorp Vault can generate AWS credentials for any role it has sts:AssumeRole permissions on. This is the key. Access to Vault and the credentials is restricted to the GitLab project. Since authentication and credential generation are dynamic, there is nothing for a malicious party to retrieve and reuse.

Everything revolves around this capability. None of the pieces hold any sensitive data or generate any long-lived credentials.

Managing Vault

I chose Hashicorp Terraform for managing Vault. This greatly benefited us:

  • The entire management of Vault is contained in a GitLab project and treated as Infrastructure as Code.
  • No credentials need to be stored for applying changes (the root token is only needed for the initial apply).
  • Terraform uses the same authentication we are setting up in this write-up (Inception, I know).
  • It removes the need for backup and restore logic. If the infrastructure goes down, we just rerun the pipeline.

All examples will show Terraform code instead of the Vault CLI commands.

The GitLab project has the following pipeline to allow terraform validate, plan, and apply:

# Fork of Autodevops' Terraform pipeline template:
# https://gitlab.com/gitlab-org/gitlab-foss/-/blob/master/lib/gitlab/ci/templates/Terraform.gitlab-ci.yml
# Official image for Hashicorp's Terraform. It uses the light tag, which is
# Alpine based and much smaller.
stages:
  - validate
  - build
  - test
  - deploy

.terraform-jobs:
  image:
    name: hashicorp/terraform:light
    entrypoint:
      - '/usr/bin/env'
      - 'PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin'
  # Default output file for Terraform plan
  variables:
    PLAN: plan.tfplan
    JSON_PLAN_FILE: tfplan.json
    VAULT_ADDR: https://vault.example.com
    TF_HTTP_ADDRESS: ${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/terraform/state/production
  cache:
    paths:
      - .terraform
      - .terraform.lock.hcl
  before_script:
    - apk add --no-cache vault libcap jq
    - setcap cap_ipc_lock= /usr/sbin/vault
    - |
      [[ -z "$VAULT_TOKEN" ]] && export VAULT_TOKEN=$(vault write -field=token auth/gitlab/login role=vault-provisioner jwt=$CI_JOB_JWT)
    - alias convert_report="jq -r '([.resource_changes[]?.change.actions?]|flatten)|{\"create\":(map(select(.==\"create\"))|length),\"update\":(map(select(.==\"update\"))|length),\"delete\":(map(select(.==\"delete\"))|length)}'"
    - terraform --version
    - terraform init
  rules:
    - if: $CI_PIPELINE_SOURCE != 'schedule' && $CI_PIPELINE_SOURCE != 'merge_request_event'
      when: always
      exists:
        - '**/*.tf'

validate:
  extends: .terraform-jobs
  stage: validate
  script:
    - terraform validate

plan:
  extends: .terraform-jobs
  stage: build
  script:
    - terraform plan -out=$PLAN
    - "terraform show --json $PLAN | convert_report > $JSON_PLAN_FILE"
  artifacts:
    paths:
      - $PLAN
    reports:
      terraform: $JSON_PLAN_FILE

drift-detection:
  extends: .terraform-jobs
  stage: build
  script:
    - terraform refresh
    - ec=0
    - terraform plan -detailed-exitcode -out=$PLAN 2> /dev/null || ec=$?
    - |
      case $ec in
        0) echo "No Changes Found"; exit 0;;
        1) printf '%s\n' "Command exited with non-zero"; exit 1;;
        2) echo "Changes Found, opening Issue";
           echo \`\`\`diff > plan.txt;
           terraform show -no-color ${PLAN} | tee -a plan.txt;
           echo \`\`\` >> plan.txt;
           sed -i -e 's/ +/+/g' plan.txt;
           sed -i -e 's/ ~/~/g' plan.txt;
           sed -i -e 's/ -/-/g' plan.txt;
           MESSAGE=$(cat plan.txt);
           apk add curl;
           curl -X POST -g -H "PRIVATE-TOKEN: ${GITLAB_TOKEN}" \
             --data-urlencode "title=Drift has been detected" \
             --data-urlencode "description=${MESSAGE}" \
             "${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/issues";;
      esac
  rules:
    - if: $CI_PIPELINE_SOURCE == 'schedule'
      exists:
        - '**/*.tf'

# Separate apply job for manually launching Terraform, as it can be a
# destructive action.
apply:
  extends: .terraform-jobs
  stage: deploy
  environment:
    name: production
  script:
    - terraform apply -input=false $PLAN
  dependencies:
    - plan
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH && $CI_PIPELINE_SOURCE != 'schedule'
      when: manual
      exists:
        - '**/*.tf'

I can further explain the pipeline if asked, otherwise I will focus on the topic at hand.
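To unpack one piece of it: the convert_report alias defined in before_script condenses Terraform's JSON plan into the create/update/delete counts that GitLab's plan report widget displays. Here is that same jq filter run against a small hand-made plan document (the resource changes are fabricated for illustration; -c is used here for compact one-line output where the pipeline uses -r):

```shell
# The jq filter behind the convert_report alias: gather every change's
# actions from the plan, flatten them, then count each action type.
# sample-plan.json below is a fabricated stand-in for `terraform show --json`.
cat > sample-plan.json <<'EOF'
{"resource_changes": [
  {"change": {"actions": ["create"]}},
  {"change": {"actions": ["update"]}},
  {"change": {"actions": ["delete", "create"]}}
]}
EOF

jq -c '([.resource_changes[]?.change.actions?]|flatten)|{"create":(map(select(.=="create"))|length),"update":(map(select(.=="update"))|length),"delete":(map(select(.=="delete"))|length)}' sample-plan.json
# → {"create":2,"update":1,"delete":1}
```

Note the `?` operators make the filter tolerant of resources without a `change.actions` field, so an empty plan still yields zero counts rather than an error.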

Code for using GitLab for Terraform state management + Vault provider

terraform {
  backend "http" {
  }
}

provider "vault" {
  address = "https://vault.example.com"
}

Vault Authentication

GitLab will be using the JWT auth method to authenticate with Vault. Vault will use the GitLab JWKS endpoint to validate tokens.

How this works:

  1. When a pipeline job starts, a JWT is created and signed by the GitLab server.
  2. As part of the job, the JWT is sent to Vault to authenticate itself.
  3. Vault verifies the JWT by using the public keys found in the JWKS endpoint.
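To see what Vault is actually inspecting, you can decode the JWT's payload segment yourself. A minimal sketch with a hand-made, unsigned token (real GitLab JWTs carry many more claims and an RS256 signature):

```shell
# A JWT is three base64url segments: header.payload.signature.
# Build a fake payload, wrap it like a JWT, then decode the middle
# segment -- this is the claim set Vault checks bound_claims against.
payload='{"iss":"git.example.com","project_id":"365","environment":"dev"}'
b64=$(printf '%s' "$payload" | base64 -w0 | tr '+/' '-_')  # padding kept here for easy decoding; real JWTs strip it
jwt="fake-header.${b64}.fake-signature"

decoded=$(printf '%s' "$jwt" | cut -d. -f2 | tr '_-' '/+' | base64 -d)
echo "$decoded"
# → {"iss":"git.example.com","project_id":"365","environment":"dev"}
```

The signature segment is what step 3 verifies against the JWKS public keys; the payload itself is only base64url-encoded, not encrypted, which is why the token must stay short-lived.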


Implementation

Create the JWT auth method for GitLab at the gitlab path. This lets Vault know how to validate the JWTs being passed. Once a JWT has been authenticated, the resulting Vault token defaults to the aws auth method role.

// Auth method for all GitLab projects
resource "vault_jwt_auth_backend" "gitlab-jwt" {
  description  = "GitLab authentication via JWKS"
  path         = "gitlab"
  jwks_url     = "https://git.example.com/-/jwks"
  bound_issuer = "git.example.com"
  default_role = "aws"
}

Create the auth method role for all projects. We automatically assign the gitlab-projects-aws-assumed-role policy to all authenticated jobs. The claim mappings let the Vault entity alias retain the JWT payload as metadata, which is used to automatically grant access to secrets in the policy.

// Role for the projects to obtain AWS credentials
resource "vault_jwt_auth_backend_role" "aws" {
  backend                = vault_jwt_auth_backend.gitlab-jwt.path
  role_name              = "aws"
  role_type              = "jwt"
  token_policies         = ["gitlab-projects-aws-assumed-role"]
  token_explicit_max_ttl = 600
  user_claim             = "project_path"
  claim_mappings = {
    environment           = "environment",
    environment_protected = "environment_protected",
    job_id                = "job_id",
    namespace_id          = "namespace_id",
    namespace_path        = "namespace_path",
    pipeline_id           = "pipeline_id",
    project_id            = "project_id",
    project_path          = "project_path",
    ref                   = "ref",
    ref_protected         = "ref_protected",
    ref_type              = "ref_type",
    user_email            = "user_email",
    user_id               = "user_id",
    user_login            = "user_login"
  }
  bound_claims = {
    iss = "git.example.com"
  }
}

We also create a special auth role for applying changes to the Terraform code. The auth role and policy are both named vault-provisioner. The project's ID in GitLab is 7012; this matters because the bound claim allows only this project to use this auth role and no others.

// Role for the vault-provisioner gitlab project
resource "vault_jwt_auth_backend_role" "vault-provisioner" {
  backend                = vault_jwt_auth_backend.gitlab-jwt.path
  role_name              = "vault-provisioner"
  role_type              = "jwt"
  token_policies         = ["vault-provisioner"]
  token_explicit_max_ttl = 600
  user_claim             = "project_path"
  claim_mappings = {
    environment           = "environment",
    environment_protected = "environment_protected",
    job_id                = "job_id",
    namespace_id          = "namespace_id",
    namespace_path        = "namespace_path",
    pipeline_id           = "pipeline_id",
    project_id            = "project_id",
    project_path          = "project_path",
    ref                   = "ref",
    ref_protected         = "ref_protected",
    ref_type              = "ref_type",
    user_email            = "user_email",
    user_id               = "user_id",
    user_login            = "user_login"
  }
  bound_claims = {
    iss        = "git.example.com"
    project_id = "7012"
  }
}

Vault Policies

gitlab-projects-aws-assumed-role

This policy grants access to a shared AWS secrets role, used for common actions such as running aws cloudformation validate-template or uploading to our company's shared S3 bucket. The Vault internal path for this shared role:

aws/sts/gitlab-tools

This policy also grants access to a project-specific AWS secrets role, dynamically mapped to the project's ID and environment name. The Vault internal path for the specific role:

aws/sts/{{identity.entity.aliases.${vault_jwt_auth_backend.gitlab-jwt.accessor}.metadata.project_id}}-{{identity.entity.aliases.${vault_jwt_auth_backend.gitlab-jwt.accessor}.metadata.environment}}

The path will use the metadata from the JWT payload/Vault entity alias.


Implementation

First, we create the gitlab-projects-aws-assumed-role policy, which has access to a shared role, a role specific to just the project ID, and a role specific to the project ID and environment name. This works by using the entity alias metadata in the policy template. The project ID is not something developers within the project can modify.

// This policy will be given to all gitlab projects authenticating
// with the gitlab jwt auth. The project will be given access to a
// path matching the metadata from the JWT payload.
data "vault_policy_document" "gitlab-projects-aws-assumed-role" {
  rule {
    path         = "${vault_aws_secret_backend.aws.path}/sts/{{identity.entity.aliases.${vault_jwt_auth_backend.gitlab-jwt.accessor}.metadata.project_id}}-{{identity.entity.aliases.${vault_jwt_auth_backend.gitlab-jwt.accessor}.metadata.environment}}"
    capabilities = ["create", "update"]
    description  = "Projects will be given access to the path matching the metadata from the JWT payload"
  }
  rule {
    path         = "${vault_aws_secret_backend.aws.path}/sts/gitlab-tools"
    capabilities = ["create", "update"]
    description  = "Common gitlab-tools role for validation and packaging"
  }
  rule {
    path         = "${vault_aws_secret_backend.aws.path}/sts/{{identity.entity.aliases.${vault_jwt_auth_backend.gitlab-jwt.accessor}.metadata.project_id}}"
    capabilities = ["create", "update"]
    description  = "Projects will be given access to the path matching the metadata from the JWT payload"
  }
}

resource "vault_policy" "gitlab-projects-aws-assumed-role" {
  name   = "gitlab-projects-aws-assumed-role"
  policy = data.vault_policy_document.gitlab-projects-aws-assumed-role.hcl
}

Next, we create the vault-provisioner policy for the Terraform project maintaining all of this code. This lets us revoke the root token after the initial terraform apply, and lets the project use the same authentication pattern as all the GitLab projects it provides AWS role mappings for.

// This policy is only for the vault-config-nextcloud project to
// provision Vault.
data "vault_policy_document" "vault-provisioner" {
  rule {
    path         = "sys/auth"
    capabilities = ["read"]
    description  = "list auth methods"
  }
  rule {
    path         = "auth/token/create"
    capabilities = ["update"]
    description  = "allow creation of token"
  }
  rule {
    path         = "sys/auth/${vault_jwt_auth_backend.gitlab-jwt.path}"
    capabilities = ["create", "update", "delete", "sudo"]
    description  = "create, update, delete only the gitlab jwt auth method"
  }
  rule {
    path         = "auth/${vault_jwt_auth_backend.gitlab-jwt.path}/*"
    capabilities = ["create", "read", "update", "delete", "sudo"]
    description  = "manage only the gitlab jwt auth method"
  }
  rule {
    path         = "sys/policies/acl/${vault_policy.gitlab-projects-aws-assumed-role.name}"
    capabilities = ["read"]
    description  = "manage only the project to role mapping policy"
  }
  rule {
    path         = "sys/policies/acl/${vault_policy.gitlab-projects-aws-assumed-role.name}/*"
    capabilities = ["read"]
    description  = "manage only the project to role mapping policy"
  }
  rule {
    path         = "sys/policies/acl/${vault_policy.gitlab-projects-all-environments.name}"
    capabilities = ["read"]
    description  = "manage only the project to role mapping policy"
  }
  rule {
    path         = "sys/policies/acl/${vault_policy.gitlab-projects-all-environments.name}/*"
    capabilities = ["read"]
    description  = "manage only the project to role mapping policy"
  }
  rule {
    path         = "sys/policies/acl/vault-provisioner"
    capabilities = ["read"]
    description  = "manage only the vault-provisioner policy"
  }
  rule {
    path         = "sys/policies/acl/vault-provisioner/*"
    capabilities = ["read"]
    description  = "manage only the vault-provisioner policy"
  }
  rule {
    path         = "${vault_aws_secret_backend.aws.path}/*"
    capabilities = ["create", "read", "update", "delete", "list"]
    description  = "manage only the AWS secrets engine"
  }
  rule {
    path         = "sys/mounts"
    capabilities = ["read"]
    description  = "list existing secrets engines"
  }
  rule {
    path         = "sys/mounts/${vault_aws_secret_backend.aws.path}"
    capabilities = ["create", "read", "update", "delete", "list", "sudo"]
    description  = "manage aws secrets engine"
  }
  rule {
    path         = "sys/mounts/auth/${vault_jwt_auth_backend.gitlab-jwt.path}/tune"
    capabilities = ["read"]
    description  = "allow for gitlab auth tune"
  }
  rule {
    path         = "identity/*"
    capabilities = ["create", "read", "update", "delete", "list"]
    description  = "Create and manage entities and groups"
  }
}

resource "vault_policy" "vault-provisioner" {
  name   = "vault-provisioner"
  policy = data.vault_policy_document.vault-provisioner.hcl
}

Secrets Engine + AWS Role Mapping

The AWS secrets engine generates AWS access credentials dynamically, based on IAM policies. The credentials are time-based and are automatically revoked when the Vault lease expires. In our use case, the Vault lease expires when the GitLab job finishes and revokes the JWT.

Since Vault runs in AWS, the secrets engine inherits the associated EC2 instance role. This means no root configuration needs to be done, and our Cloud team can set up IAM policies based on that EC2 role.

We will be using the assumed_role credential type (Vault Docs). Vault calls sts:AssumeRole and returns the access key, secret key, and session token to the caller. The Cloud team sets up the permissions in AWS IAM, not in Vault.

A new AWS secret backend role is created for each project and environment. For example, the GitLab project somenamespace/someproject will be mapped to aws/sts/365-dev, where 365 is the project ID and dev is the environment.


Implementation

We need to create the Vault secrets engine; this is defined once and does not need a new entry per mapping. Since I rely on the EC2 role, I do not need to pass in the root configuration, which usually holds the access_key, secret_key, and region. If you are not following the same pattern, see the Vault docs and the corresponding Terraform docs.

// AWS Secrets Engine
resource "vault_aws_secret_backend" "aws" {
  path                      = "aws"
  description               = "AWS Assumed roles"
  default_lease_ttl_seconds = 1800
  max_lease_ttl_seconds     = 3600
}

Now for the GitLab project to AWS role mapping. In my design, I chose to pass only one ARN into the role_arns list, so users never need to know the ARNs: when there is only one ARN, Vault assumes it automatically when the role is accessed.

// https://git.example.com/somenamespace/someproject - DEV
resource "vault_aws_secret_backend_role" "someproject-dev" {
  backend         = vault_aws_secret_backend.aws.path
  name            = "365-dev"
  credential_type = "assumed_role"
  role_arns       = ["arn:aws:iam::1231231231:role/gitlab-service-role"]
}

If you recall from the policy template, the ${project_id}-${environment_name} secret role was given access. In our case, project id is 365 and environment name is dev.
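This naming convention means a job can derive its secrets path entirely from GitLab's predefined CI variables, with no per-project configuration in the pipeline itself. A sketch with simulated variable values:

```shell
# Simulated values of GitLab's predefined CI variables for the
# hypothetical project above (set automatically in a real job).
CI_PROJECT_ID=365
CI_ENVIRONMENT_NAME=dev

# The Vault secrets path the job will write to:
SECRET_PATH="aws/sts/${CI_PROJECT_ID}-${CI_ENVIRONMENT_NAME}"
echo "$SECRET_PATH"
# → aws/sts/365-dev

# In a real job this feeds straight into:
#   vault write -format=json "$SECRET_PATH" ttl=20m
```

Because the project ID comes from the signed JWT rather than from anything the developer controls, the derived path always lines up with what the templated policy permits.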

Design

Process: GitLab pipeline obtaining assumed role credentials

  1. When a job starts, a JWT (as the $CI_JOB_JWT variable) is created and is accessible by the job script.
  2. In the job, we authenticate to Vault with the JWT provided. We get back a Vault token for all following Vault requests.
    * command: export VAULT_TOKEN=$(vault write -field=token auth/gitlab/login jwt=${CI_JOB_JWT})
    * When Vault receives the authentication request through the JWT authentication method, Vault will validate the JWT using GitLab’s JWKS. The JSON Web Key Set (JWKS) is a set of keys containing the public keys used to verify any JSON Web Token (JWT) issued by the authorization server and signed using the RS256 signing algorithm.
    * Upon successful validation, a Vault token will be generated with the attached policy: gitlab-projects-aws-assumed-role. This Vault token will be used for all following Vault requests from the job.
  3. In the job, we request the credentials by writing to the secrets path
    * command: export CREDS=$(vault write -format=json aws/sts/${CI_PROJECT_ID}-${CI_ENVIRONMENT_NAME} ttl=20m)
    * If the policy allows this project+environment to generate a new set of credentials, then credentials are returned in JSON format.
    * This is a write/POST command because Vault will need to make a request to AWS to create a new set of credentials.
  4. In the job, we set the AWS CLI variables for consumption. All three are required.
    * command: export AWS_ACCESS_KEY_ID=$(echo ${CREDS} | jq -r .data.access_key)
    * command: export AWS_SECRET_ACCESS_KEY=$(echo ${CREDS} | jq -r .data.secret_key)
    * command: export AWS_SESSION_TOKEN=$(echo ${CREDS} | jq -r .data.security_token)
  5. In the job, perform allowed AWS CLI commands as Cloud team has governed the role to do so.
  6. When the job finishes, the JWT provided will be revoked by GitLab.
    * This has a domino effect. The Vault token and the generated AWS assumed role credentials will be revoked.
    * The Vault authentication method and secrets engine both have timeouts defined. If any timeout occurs before the job completes, the AWS assumed role credentials are revoked.
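The extraction in step 4 can be sketched on its own. The vault call is replaced here with a canned response, since what matters is the JSON shape of the .data block that the jq paths pull from (the credential values are fabricated):

```shell
# Step 3 returns JSON shaped like this; the values below are fakes.
# In a real job:
#   CREDS=$(vault write -format=json aws/sts/${CI_PROJECT_ID}-${CI_ENVIRONMENT_NAME} ttl=20m)
CREDS='{"data":{"access_key":"ASIAFAKEACCESSKEY","secret_key":"fakeSecretKey","security_token":"fakeSessionToken"}}'

# Step 4: export the three variables the AWS CLI expects.
export AWS_ACCESS_KEY_ID=$(echo ${CREDS} | jq -r .data.access_key)
export AWS_SECRET_ACCESS_KEY=$(echo ${CREDS} | jq -r .data.secret_key)
export AWS_SESSION_TOKEN=$(echo ${CREDS} | jq -r .data.security_token)
echo "$AWS_ACCESS_KEY_ID"
# → ASIAFAKEACCESSKEY
```

Note that Vault returns the session token under the `security_token` key, while the AWS CLI expects it as `AWS_SESSION_TOKEN`; forgetting that third variable is the most common cause of "invalid token" errors with assumed-role credentials.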

Implementation

This is a stripped-down version of the pipeline templates to reduce confusion. Also note that the image I am using is based on python:alpine and has the following packages installed: the AWS CLI v2, Vault, and jq.

deploy cft dev:
  image: $CI_REGISTRY/devops/builders/aws:latest
  stage: dev
  environment:
    name: dev
  script:
    - >
      [[ "$TRACE" ]] && set -x
    - |
      echo "** Getting AWS credentials"
      export VAULT_TOKEN=$(vault write -field=token auth/gitlab/login jwt=${CI_JOB_JWT})
      export CREDS=$(vault write -format=json aws/sts/${CI_PROJECT_ID}-${CI_ENVIRONMENT_NAME} ttl=1h)
      export AWS_ACCESS_KEY_ID=$(echo ${CREDS} | jq -r .data.access_key)
      export AWS_SECRET_ACCESS_KEY=$(echo ${CREDS} | jq -r .data.secret_key)
      export AWS_SESSION_TOKEN=$(echo ${CREDS} | jq -r .data.security_token)
      aws sts get-caller-identity
  rules:
    - if: ($CI_COMMIT_BRANCH == 'master' || $CI_COMMIT_BRANCH == 'main')


Conclusion

This solution allowed us to scale fast during our massive migration from on-premise infrastructure to AWS. All pipeline access to AWS follows the same infrastructure-as-code development patterns. Developers can open merge requests to request access for their own projects, and our DevOps engineers can grant it within minutes.

I am quite happy with our solution. I'd love to hear your thoughts or any improvements on the approach outlined above.


Jack Lei

Currently a Site Reliability Engineer. Previously a Sr. Software Developer and Sr. DevOps Engineer. https://www.linkedin.com/in/jack-lei