GitLab pipeline template for environment variable substitution

Anne Lohmeijer
7 min read · Jul 19, 2023



The fields of Platform Engineering, Machine Learning (ML) Engineering, and Site Reliability Engineering (SRE) have one thing in common: the craft of building and maintaining scalable, reliable platforms that run applications on premises or in the cloud.

These platforms can vary from hosting simple web applications to scalable and dynamic ML platforms, to highly advanced super systems like Netflix or Spotify.

While developing these platforms, engineers deploy to multiple environments to ensure a seamless experience for their developers and users.

One common challenge is that some variables needed for deployment are only available at runtime, i.e. when resources are deployed. Linux has several tools to substitute them, e.g. sed or envsubst. Both are useful, and both have their limitations.

After multiple iterations, the deployment pipelines of our cloud infrastructure got so cluttered with sed commands that we decided to create a Gitlab pipeline template to solve this issue once and for all.

The problem we face is deploying cloud resources with infrastructure as code (IaC) to a multi-environment platform, using Terraform as the tool of choice, with state files that we host ourselves in a remote backend.

Does IaC or Terraform sound like gibberish to you? Then first dive into those topics.

Let’s outline the stack

Terraform is the IaC tool of choice here; it allows you to declare your cloud infrastructure in code and deploy it in pipelines. Compare this to manual deployments or alterations, which pose a challenge: they are manual labour, changes are hard to track, and results are therefore hard to reproduce.

The state of the deployed infrastructure is tracked in so-called state files. You can keep them on your local machine, but when you work in a team they are stored remotely with your cloud provider (e.g. an S3 bucket in AWS or a Storage Account in Azure). Using the credentials of your cloud provider, you allow Terraform to connect to that storage so it can fetch the state of the infrastructure via API calls. This is called a remote backend.

# provider.tf
terraform {
  backend "azurerm" {
    resource_group_name  = "state-rg"
    storage_account_name = "applicationstorage"
    container_name       = "statefiles"
    key                  = "app/container.tfstate"
  }
}

Another remote provider option is Terraform Cloud. The idea is the same, except that it manages the state files for you. Are you using Terraform Cloud? Then your setup is probably significantly different.

The deployment pattern of Terraform consists of three stages: plan, apply and destroy. The DevOps platform of your choice (Gitlab, Azure DevOps, GitHub, to name a few) can be used to automate these deployments.

If you keep track of the Terraform state yourself using cloud storage, the Terraform backend is defined in a file provider.tf and can be parametrised with the environment being deployed to, giving you a different state file per environment. Instead of hardcoding the state file key, we can declare the following:

# provider.tf
terraform {
  backend "azurerm" {
    resource_group_name  = "state-rg"
    storage_account_name = "applicationstorage"
    container_name       = "statefiles"
    key                  = "app/containers/$ENVIRONMENT.tfstate"
  }
}

By using this approach, you can have a different state file for each DTAP environment, and possibly for each resource in the corresponding environment. This flexibility matters; I can tell you from personal experience that migrating resources from one remote Terraform state to another is a nightmare, often ending in nuking the whole thing and starting over, if that is even an option (it’s no coincidence this thread starts with “the least painful way…”).

Environment variable substitution

So, we have a parametrized path to a remote Terraform state file, and on deployment this $ENVIRONMENT needs to be replaced with e.g. dev or prod such that we use a different state file for each environment. This we achieve with environment variable substitution.

Let me emphasize that it is not best practice (and quite complex, even) to alter source code on the fly. Replacing a reference to a CI/CD variable, however, is quite common: it allows you to reuse code across environments. A simple reference replacement can be achieved with

sed -i 's/\$ENVIRONMENT/dev/g' provider.tf

All fun and games so far, but this does not scale nicely. With Terraform’s plan, apply and destroy stages, each variable reference you add requires three more sed commands.

For that, Linux has the envsubst command, which replaces all environment variable references in a file with the values present in your actual environment. envsubst cannot edit a file in place, so to substitute in-place you create a temporary file with the desired output and then replace the original file with it.

envsubst < "$file" > "$file.tmp"
mv "$file.tmp" "$file"
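One caveat worth noting: by default envsubst substitutes every variable reference it recognises, which can clobber unrelated dollar signs in your files. GNU envsubst accepts a SHELL-FORMAT argument that limits substitution to the variables you list. A minimal sketch (the file content and variable names are made up for the demo):

```shell
#!/bin/bash
# Restrict envsubst to a single variable so other $-strings survive.
printf 'key = "app/$ENVIRONMENT.tfstate"\nnote = "$NOT_A_CI_VAR"\n' > provider.tf

export ENVIRONMENT=dev
# Only $ENVIRONMENT is substituted; $NOT_A_CI_VAR is left untouched.
envsubst '$ENVIRONMENT' < provider.tf > provider.tf.tmp
mv provider.tf.tmp provider.tf

cat provider.tf
# key = "app/dev.tfstate"
# note = "$NOT_A_CI_VAR"
```

This is especially relevant for Terraform files, where dollar signs also appear in interpolation syntax you do not want rewritten.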

Often multiple files need substitution, so to apply it recursively over a nested folder structure we write a function that invokes itself for every subdirectory it encounters.

#!/bin/bash
function recurse() {
  for file in "$1"/*; do
    if [[ -d "$file" ]]; then
      recurse "$file"
    elif [[ -f "$file" ]]; then
      envsubst < "$file" > "$file.tmp"
      mv "$file.tmp" "$file"
    fi
  done
}

recurse "$FOLDER_PATH"

Gitlab pipeline template

Include the script in a pipeline .yml

If, as a team, you do not work in a monolithic repository (let’s not dive into that discussion…), you probably have multiple repos where this script would be useful. To make it available everywhere, set up a template repository for your pipeline scripts. In that repo, in a folder templates, create a file envsubst.yml and populate it with the script. If you want to source a particular env file on the fly, include that too.

# templates/envsubst.yml
variables:
  ENV_FILE: null
  ENVSUBST_PATH: '.'

.envsubst:
  script:
    - |
      # conditionally read environment variables from file
      if [ -z "$ENV_FILE" ]; then
        echo "No .env file given, only taking vars in current env"
      else
        echo "Sourcing .env file $ENV_FILE"
        set -o allexport
        source "$ENV_FILE"
        set +o allexport
      fi

      # define function to recursively apply envsubst to files
      function recurse() {
        for file in "$1"/*; do
          if [[ -d "$file" ]]; then
            recurse "$file"
          elif [[ -f "$file" ]]; then
            envsubst < "$file" > "$file.tmp"
            mv "$file.tmp" "$file"
          fi
        done
      }

      # apply substitution to files in specified directory
      recurse "$ENVSUBST_PATH"

This defines a .envsubst job with a script task that you can refer to from other places. A modular building block for your pipeline, basically.

With the optional ENV_FILE and ENVSUBST_PATH variables we allow for flexibility on the consuming side: it might be that, due to side effects, you do not want to apply the script to all files in the repo, but only to one particular folder. In that case, restrict the script to a subset of folders by setting the ENVSUBST_PATH variable where you consume the template.

On top of that, if you have additional .env files that you want to include (e.g. for different environments) you can pass the ENV_FILE variable too, which will be sourced in the script.
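To illustrate what such an env file looks like, and why the script wraps the source in allexport: a plain assignment in a sourced file is not exported by default, so envsubst (a child process) would not see it. The file name and variables below are made up for the demo:

```shell
#!/bin/bash
# A .env file contains plain KEY=value assignments; 'set -o allexport'
# makes every assignment during the source an exported variable,
# visible to child processes such as envsubst.
cd "$(mktemp -d)"
cat > .env.dev <<'EOF'
ENVIRONMENT=dev
LOCATION=westeurope
EOF

set -o allexport
source .env.dev
set +o allexport

# Child processes now see the variables:
bash -c 'echo "$ENVIRONMENT in $LOCATION"'
# dev in westeurope
```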

Consume the template

Once your template repo is set up, you want to be able to consume the template in other places. In your main .gitlab-ci.yml, make it available by including it from the remote project.

# .gitlab-ci.yml
include:
  - project: <root-project>/<your-template-repo>/templates/pipeline
    ref: 0.0.1
    file: templates/envsubst.yml

Here ref refers to a git tag of the template repo, something you can set up in your pipelines yourself (see e.g. this thread), but you can also specify a branch (e.g. main, or a feature branch during development).

Once available in your pipeline, refer to the template in a pipeline job with the !reference syntax, which is a Gitlab feature added a few years ago.

# .gitlab/container/dev.yml
plan:container:dev:
  stage: build
  script:
    - !reference [.envsubst, script]
    - terraform ...

And that’s it. If you want to specify additional variables to be consumed by the template that are only applicable to this job, specify them with the variables keyword.

# .gitlab/container/dev.yml
plan:container:dev:
  stage: build
  variables:
    ENVSUBST_PATH: app/container
    ENV_FILE: .env.dev
  script:
    - !reference [.envsubst, script]
    - terraform ...

Or, if you want to optimise your pipeline a step further, define another hidden job with variables that you can refer to.

.containerdev:
  variables:
    ENVSUBST_PATH: app/container
    ENV_FILE: .env.dev

This way you can use one set of variables in multiple stages without defining it more than once, adhering to the DRY principle.

# .gitlab/container/dev.yml
plan:container:dev:
  stage: build
  variables: !reference [.containerdev, variables]
  script:
    - !reference [.envsubst, script]
    - terraform init ...
    - terraform plan ...

apply:container:dev:
  stage: deploy
  when: manual
  variables: !reference [.containerdev, variables]
  script:
    - !reference [.envsubst, script]
    - terraform init ...
    - terraform apply ...

The when: manual keyword is added because you often do not want to apply infrastructure changes automatically. Rather, first check in the plan stage what changes Terraform intends to make, and if they make sense, apply them manually.

Closing

Did you find this useful, or do you have suggestions or comments? Leave them below!

Feel like you are just getting started with IaC, shell scripting or Gitlab pipelines? Below are a few useful resources to get you started.


Anne Lohmeijer

DevOps engineer, learning about blockchain and the EVM.