Terraform with Workspaces on Google Cloud

Mazlum Tosun
Google Cloud - Community
12 min read · Jun 26, 2023

1. Explanation of the use case presented in this article

The goal of this article is to show a complete example using Terraform Workspaces on Google Cloud.

Workspaces make it possible to isolate an infrastructure: each workspace has its own infrastructure and state, and there is no relationship between them.

This system is powerful and very practical because, in real-life projects, we often need to isolate the infrastructure.

For example, for testing purposes, a Cloud Engineer may need to validate updates to the IaC code.

It is sometimes complicated to validate them against a shared state in a dev project, because several developers can work on the same resources at the same time.

In this case it is better to have an isolated environment per developer: each one can test the code and the resulting infra without conflicts, with total control over the IaC code and the freedom to break and recreate the infra if needed.

Here is the use case diagram:

  • The use case creates BigQuery Datasets and Tables to illustrate the concept of workspaces
  • We will create this infra on different workspaces in the same GCP project
  • To prevent duplicates in the real infra (GCP project and BigQuery), the workspace name will be used as a prefix on the dataset name
  • The CI part is done with Cloud Build, which will execute the Terraform commands

I also created a video on this topic on my GCP YouTube channel; please subscribe to the channel to support my work for the Google Cloud community:

English version

French version

2. Structure of the project

2.1 The Terraform part

To illustrate the creation of isolated infra with Terraform workspaces, this use case creates BigQuery Datasets and Tables.

This video explains the use case in detail, and a future article will cover it as well:

We will focus on the elements that concern Terraform workspaces, and on preventing duplicates in the real infra, since in this case every workspace targets the same GCP dev project and BigQuery.

The parent resources in this infra are the Datasets, and we will add the workspace name as a prefix to the dataset name to prevent duplicates.

When the Terraform code is executed, a workspace can be created and selected.

We can then retrieve the current workspace in the Terraform code with the following syntax, shown here in the locals.tf file:

locals {
  datasetPrefix = terraform.workspace != "default" ? "${terraform.workspace}_" : ""
  datasetsArray = jsondecode(file("${path.module}/resource/datasets_with_tables.json"))
  datasetsMap   = { for idx, val in local.datasetsArray : idx => val }

  tables_flattened = flatten([
    for dataset in local.datasetsMap : [
      for table in dataset["tables"] : {
        datasetId              = dataset["datasetId"]
        tableId                = table["tableId"]
        tableSchemaPath        = table["tableSchemaPath"]
        partitionType          = try(table["partitionType"], null)
        partitionField         = try(table["partitionField"], null)
        expirationMs           = try(table["expirationMs"], null)
        requirePartitionFilter = try(table["requirePartitionFilter"], null)
        clustering             = try(table["clustering"], [])
      }
    ]
  ])
}
  • The Terraform workspace name is retrieved with the terraform.workspace expression
  • If no workspace is passed, Terraform will execute the infra in the default workspace and infrastructure
  • We apply the following logic: if a workspace is given and injected in the CI part by the command line, we build a dataset prefix as {workspace_name}_, otherwise there is no prefix
  • The rest of the code retrieves the Datasets and Tables from a JSON configuration file, datasets_with_tables.json :
[
  {
    "datasetId": "team_league_raw",
    "datasetRegion": "EU",
    "datasetFriendlyName": "Team league Dataset containing raw data",
    "datasetDescription": "Team league raw Dataset description",
    "tables": [
      {
        "tableId": "team_stat_raw",
        "tableSchemaPath": "resource/schema/team_league_raw/team_stat_raw.json"
      }
    ]
  },
  {
    "datasetId": "team_league",
    "datasetRegion": "EU",
    "datasetFriendlyName": "Team league Dataset containing domain data",
    "datasetDescription": "Team league domain Dataset description",
    "tables": [
      {
        "tableId": "team_stat",
        "tableSchemaPath": "resource/schema/team_league/team_stat.json",
        "partitionType": "DAY",
        "partitionField": "ingestionDate",
        "clustering": [
          "teamName",
          "teamSlogan"
        ]
      }
    ]
  }
]
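To make the flatten logic in locals.tf concrete, here is a small Python sketch (hypothetical, not part of the repository) that reproduces the same transformation on a shortened version of this JSON configuration, with try(..., default) emulated by dict.get:

```python
import json

# Same shape as resource/datasets_with_tables.json, shortened to the fields we need
config = json.loads("""
[
  {"datasetId": "team_league_raw",
   "tables": [{"tableId": "team_stat_raw",
               "tableSchemaPath": "resource/schema/team_league_raw/team_stat_raw.json"}]},
  {"datasetId": "team_league",
   "tables": [{"tableId": "team_stat",
               "tableSchemaPath": "resource/schema/team_league/team_stat.json",
               "partitionType": "DAY"}]}
]
""")

# Equivalent of the nested for + flatten() in locals.tf: one flat entry per table
tables_flattened = [
    {
        "datasetId": dataset["datasetId"],
        "tableId": table["tableId"],
        "tableSchemaPath": table["tableSchemaPath"],
        "partitionType": table.get("partitionType"),   # try(..., null)
        "clustering": table.get("clustering", []),     # try(..., [])
    }
    for dataset in config
    for table in dataset["tables"]
]

print([t["tableId"] for t in tables_flattened])  # → ['team_stat_raw', 'team_stat']
```

Each table carries its parent datasetId, which is what later allows a single for_each over all tables in main.tf.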

Then in the main.tf file, we will retrieve the dataset prefix from the locals.tf file :

resource "google_bigquery_dataset" "datasets" {
  for_each = local.datasetsMap

  project       = var.project_id
  dataset_id    = "${local.datasetPrefix}${each.value["datasetId"]}"
  friendly_name = each.value["datasetFriendlyName"]
  description   = each.value["datasetDescription"]
  location      = each.value["datasetRegion"]
}

resource "google_bigquery_table" "tables" {
  for_each = { for idx, table in local.tables_flattened : "${table["datasetId"]}_${table["tableId"]}" => table }

  project             = var.project_id
  depends_on          = [google_bigquery_dataset.datasets]
  dataset_id          = "${local.datasetPrefix}${each.value["datasetId"]}"
  table_id            = each.value["tableId"]
  deletion_protection = false
  clustering          = each.value["clustering"]

  dynamic "time_partitioning" {
    for_each = each.value["partitionType"] != null ? [1] : []

    content {
      type                     = each.value["partitionType"]
      field                    = each.value["partitionField"]
      expiration_ms            = each.value["expirationMs"]
      require_partition_filter = each.value["requirePartitionFilter"]
    }
  }

  schema = file("${path.module}/${each.value["tableSchemaPath"]}")
}

In the google_bigquery_dataset resource, the dataset_id param value is the concatenation {locals.datasetPrefix}{datasetIdFromConfig}.

The same applies in the google_bigquery_table resource: dataset_id = {locals.datasetPrefix}{datasetIdFromConfig}
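As a quick illustration of the resulting IDs, here is a tiny Python sketch (an illustration, not code from the repo) that mimics the datasetPrefix ternary from locals.tf:

```python
def dataset_id(workspace: str, dataset_id_from_config: str) -> str:
    # Mirrors: terraform.workspace != "default" ? "${terraform.workspace}_" : ""
    prefix = f"{workspace}_" if workspace != "default" else ""
    return f"{prefix}{dataset_id_from_config}"

print(dataset_id("default", "team_league_raw"))        # → team_league_raw
print(dataset_id("workspacetest", "team_league_raw"))  # → workspacetest_team_league_raw
```

So the default workspace keeps the original dataset IDs, while every other workspace gets its own prefixed copies side by side in the same project.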

2.2 The CI part with Cloud Build and Shell scripts

The CI logic is managed by Cloud Build, and the executions can be done with gcloud commands from our local machine or with Cloud Build Triggers.

There are three files to execute the classic IaC commands with Terraform:

  • plan
  • apply
  • destroy

2.2.1 The plan part

The plan file :

steps:
  - name: hashicorp/terraform:1.5.0
    entrypoint: 'sh'
    args:
      - '-c'
      - |
        ./scripts/init.sh \
        && ./scripts/select_workspace.sh \
        && ./scripts/plan.sh
    env:
      - 'TF_VAR_project_id=$PROJECT_ID'
      - 'TF_STATE_BUCKET=$_TF_STATE_BUCKET'
      - 'TF_STATE_PREFIX=$_TF_STATE_PREFIX'
      - 'WORKSPACE=$_WORKSPACE'
      - 'INFRA_ROOT_FOLDER=$_INFRA_ROOT_FOLDER'
      - 'MODULE_NAME=$_MODULE_NAME'
      - 'GOOGLE_PROVIDER_VERSION=$_GOOGLE_PROVIDER_VERSION'

There is one step from the official hashicorp/terraform:1.5.0 Docker image.

Three scripts are executed.

init.sh file :

set -e
set -o pipefail
set -u

echo "#######Init the Terraform module"

cd "$INFRA_ROOT_FOLDER/$MODULE_NAME" &&
  terraform init \
    -backend-config="bucket=${TF_STATE_BUCKET}" \
    -backend-config="prefix=${TF_STATE_PREFIX}/${MODULE_NAME}"

We need to execute a terraform init command to initialize our module. The bucket and prefix are given to specify the remote state.

The init will also download the terraform providers given by the versions.tf file. In our case, there is only the Google Cloud official provider.

We need to execute the command in the Terraform module folder, datasets_and_tables; that's why we go inside the module with a cd instruction before executing the init command.
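For reference, the versions.tf file for this module could look like the following sketch (an assumption based on the variables used in the CI, not the repository's exact file): the GCS backend block is left empty because the bucket and prefix are injected at init time via -backend-config, and the provider version matches the GOOGLE_PROVIDER_VERSION CI variable:

```hcl
terraform {
  required_version = "~> 1.5"

  # Bucket and prefix are supplied by init.sh through -backend-config flags
  backend "gcs" {}

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "= 4.47.0"
    }
  }
}
```

Keeping the backend block empty is a common pattern when the same module must point at different remote states depending on the environment.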

select_workspace.sh file :

set -e
set -o pipefail
set -u

echo "#######Create and Select the Terraform workspace ${WORKSPACE}"

if [ -z "${WORKSPACE}" ] || [ "${WORKSPACE}" = "" ] || [ "${WORKSPACE}" = " " ]; then
  echo "#######No workspace passed, the default workspace will be used"
  exit 0
fi

cd "$INFRA_ROOT_FOLDER/$MODULE_NAME" &&
  # Select the workspace if it already exists, otherwise create it (new also selects it)
  terraform workspace select "${WORKSPACE}" 2>/dev/null || terraform workspace new "${WORKSPACE}"

If a Terraform workspace is given by the Cloud Build command line, this script has the responsibility to create a workspace and select it.

If no workspace is passed, we exit the script and let Terraform select the default workspace.

After selecting a workspace with Terraform, all the commands executed afterwards will run in this workspace: plan, apply, destroy.

plan.sh file :

set -e
set -o pipefail
set -u

echo "#######Plan the Terraform module"

cd "$INFRA_ROOT_FOLDER/$MODULE_NAME" &&
  terraform plan --out tfplan.out

We go inside the module and execute the terraform plan command.

This command will log all the changes brought by the IAC code and generate a tfplan.out file.

2.2.2 The apply part

The apply file :

steps:
  - name: hashicorp/terraform:1.5.0
    entrypoint: 'sh'
    args:
      - '-c'
      - |
        ./scripts/init.sh \
        && ./scripts/select_workspace.sh \
        && ./scripts/plan.sh \
        && ./scripts/apply.sh
    env:
      - 'TF_VAR_project_id=$PROJECT_ID'
      - 'TF_STATE_BUCKET=$_TF_STATE_BUCKET'
      - 'TF_STATE_PREFIX=$_TF_STATE_PREFIX'
      - 'WORKSPACE=$_WORKSPACE'
      - 'INFRA_ROOT_FOLDER=$_INFRA_ROOT_FOLDER'
      - 'MODULE_NAME=$_MODULE_NAME'
      - 'GOOGLE_PROVIDER_VERSION=$_GOOGLE_PROVIDER_VERSION'

The principle is the same as in the plan part, but the apply.sh script is executed at the end.

apply.sh file :

set -e
set -o pipefail
set -u

echo "#######Apply the Terraform module"

cd "$INFRA_ROOT_FOLDER/$MODULE_NAME" &&
  terraform apply -auto-approve tfplan.out

The terraform apply command is based on the tfplan.out generated by the plan part.

2.2.3 The destroy part

The destroy file :

steps:
  - name: 'hashicorp/terraform:1.5.0'
    entrypoint: 'sh'
    args:
      - '-c'
      - |
        ./scripts/init.sh \
        && ./scripts/select_workspace.sh \
        && ./scripts/destroy.sh
    env:
      - 'TF_VAR_project_id=$PROJECT_ID'
      - 'TF_STATE_BUCKET=$_TF_STATE_BUCKET'
      - 'TF_STATE_PREFIX=$_TF_STATE_PREFIX'
      - 'WORKSPACE=$_WORKSPACE'
      - 'INFRA_ROOT_FOLDER=$_INFRA_ROOT_FOLDER'
      - 'MODULE_NAME=$_MODULE_NAME'
      - 'GOOGLE_PROVIDER_VERSION=$_GOOGLE_PROVIDER_VERSION'

The logic is the same here: we need to init the module and select the workspace, then the destroy.sh script is executed:

set -e
set -o pipefail
set -u

echo "#######Destroy the Terraform module"

cd "$INFRA_ROOT_FOLDER/$MODULE_NAME" &&
  terraform apply -auto-approve &&
  terraform destroy -auto-approve

There is a subtlety here: generally a destroy command is sufficient to destroy a resource with Terraform, but with recent versions of the Google provider and the resource concerning BigQuery tables, we need to launch the apply command before the destroy. We also need to set the deletion_protection param to false in this Terraform resource:

On newer versions of the provider, you must explicitly set deletion_protection=false (and run terraform apply to write the field to state) in order to destroy an instance. It is recommended to not set this field (or set it to true) until you're ready to destroy.

3. Execution of the Cloud Build jobs and creation of the infra on different workspaces

3.1 Set environment variables

Set the following environment variables :

export PROJECT_ID={{project_id}}
export LOCATION=europe-west1
export TF_STATE_BUCKET=gb-poc-terraform-state
export TF_STATE_PREFIX=testmazlum
export WORKSPACE=workspacetest
export INFRA_ROOT_FOLDER=infra
export MODULE_NAME=datasets_and_tables
export GOOGLE_PROVIDER_VERSION="= 4.47.0"
  • By default, a workspace with the name workspacetest is given
  • The infra root folder in the project is infra
  • The Terraform module name is datasets_and_tables

3.2 Create the infra without workspace and in the default state

All the needed commands are provided in the README.md file.

For simplicity we will use the Cloud Build gcloud commands from our local machine.

We will first execute the plan job without a workspace, in the default infra, to check the changes planned by the IaC code:

gcloud builds submit \
  --project=$PROJECT_ID \
  --region=$LOCATION \
  --config terraform-plan-modules.yaml \
  --substitutions _TF_STATE_BUCKET=$TF_STATE_BUCKET,_TF_STATE_PREFIX=$TF_STATE_PREFIX,_WORKSPACE=,_INFRA_ROOT_FOLDER=$INFRA_ROOT_FOLDER,_MODULE_NAME=$MODULE_NAME,_GOOGLE_PROVIDER_VERSION=$GOOGLE_PROVIDER_VERSION \
  --verbosity="debug" .

In this case, the _WORKSPACE substitution variable is empty.

The log indicates that the Datasets and Tables will be created :

  # google_bigquery_dataset.datasets["0"] will be created
  + resource "google_bigquery_dataset" "datasets" {
      + creation_time              = (known after apply)
      + dataset_id                 = "team_league_raw"
      + delete_contents_on_destroy = false
      + description                = "Team league raw Dataset description"
      + etag                       = (known after apply)
      + friendly_name              = "Team league Dataset containing raw data"
      + id                         = (known after apply)
      + last_modified_time         = (known after apply)
      + location                   = "EU"
      + project                    = "gb-poc-373711"
      + self_link                  = (known after apply)
    }

  # google_bigquery_table.tables["team_league_raw_team_stat_raw"] will be created
  + resource "google_bigquery_table" "tables" {
      + clustering          = []
      + creation_time       = (known after apply)
      + dataset_id          = "team_league_raw"
      + deletion_protection = false
      + etag                = (known after apply)
      + expiration_time     = (known after apply)
      + id                  = (known after apply)
      + last_modified_time  = (known after apply)
      + location            = (known after apply)
      + num_bytes           = (known after apply)
      + num_long_term_bytes = (known after apply)
      + num_rows            = (known after apply)
      + project             = "gb-poc-373711"
As expected, there is no prefix before the dataset id in the two resources.

We execute the apply for the default workspace :

gcloud builds submit \
  --project=$PROJECT_ID \
  --region=$LOCATION \
  --config terraform-apply-modules.yaml \
  --substitutions _TF_STATE_BUCKET=$TF_STATE_BUCKET,_TF_STATE_PREFIX=$TF_STATE_PREFIX,_WORKSPACE=,_INFRA_ROOT_FOLDER=$INFRA_ROOT_FOLDER,_MODULE_NAME=$MODULE_NAME,_GOOGLE_PROVIDER_VERSION=$GOOGLE_PROVIDER_VERSION \
  --verbosity="debug" .

The two datasets were created in BigQuery without the workspace name as prefix:

In the Cloud Storage bucket used by Terraform for its remote state, a default.tfstate file was generated for the default workspace and infra.

3.3 Create the infra on a workspace called workspacetest

We execute the apply job with a workspace called workspacetest :

export WORKSPACE=workspacetest

gcloud builds submit \
  --project=$PROJECT_ID \
  --region=$LOCATION \
  --config terraform-apply-modules.yaml \
  --substitutions _TF_STATE_BUCKET=$TF_STATE_BUCKET,_TF_STATE_PREFIX=$TF_STATE_PREFIX,_WORKSPACE=$WORKSPACE,_INFRA_ROOT_FOLDER=$INFRA_ROOT_FOLDER,_MODULE_NAME=$MODULE_NAME,_GOOGLE_PROVIDER_VERSION=$GOOGLE_PROVIDER_VERSION \
  --verbosity="debug" .

The Terraform logs:

  # google_bigquery_dataset.datasets["0"] will be created
  + resource "google_bigquery_dataset" "datasets" {
      + creation_time              = (known after apply)
      + dataset_id                 = "workspacetest_team_league_raw"
      + delete_contents_on_destroy = false
      + description                = "Team league raw Dataset description"
      + etag                       = (known after apply)
      + friendly_name              = "Team league Dataset containing raw data"
      + id                         = (known after apply)
      + last_modified_time         = (known after apply)
      + location                   = "EU"
      + project                    = "gb-poc-373711"
      + self_link                  = (known after apply)
    }

  # google_bigquery_table.tables["team_league_raw_team_stat_raw"] will be created
  + resource "google_bigquery_table" "tables" {
      + clustering          = []
      + creation_time       = (known after apply)
      + dataset_id          = "workspacetest_team_league_raw"
      + deletion_protection = false

In this case, we have the workspace name as a prefix before the dataset ID, workspacetest_team_league_raw, which prevents duplicates in the real infra and in BigQuery.

The two datasets were created with the workspace name as prefix:

In the Cloud Storage bucket used by Terraform for its remote state, a workspacetest.tfstate file was generated for the given workspace. This infra is isolated and has no coupling with the default.tfstate file and the default infra.
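To visualize the isolation, the state objects in the bucket would look roughly like this (illustrative paths, derived from the TF_STATE_BUCKET, TF_STATE_PREFIX and MODULE_NAME values set earlier, not a captured listing):

```
gs://gb-poc-terraform-state/testmazlum/datasets_and_tables/default.tfstate
gs://gb-poc-terraform-state/testmazlum/datasets_and_tables/workspacetest.tfstate
```

Each workspace gets its own .tfstate object under the same prefix, which is why a change applied in one workspace can never touch another one's state.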

3.4 Create the infra on a workspace called workspacetest2

We will execute the same command as in the previous section, but with a different workspace name in the environment variable:

export WORKSPACE=workspacetest2

Execute the same command for the apply :

gcloud builds submit \
  --project=$PROJECT_ID \
  --region=$LOCATION \
  --config terraform-apply-modules.yaml \
  --substitutions _TF_STATE_BUCKET=$TF_STATE_BUCKET,_TF_STATE_PREFIX=$TF_STATE_PREFIX,_WORKSPACE=$WORKSPACE,_INFRA_ROOT_FOLDER=$INFRA_ROOT_FOLDER,_MODULE_NAME=$MODULE_NAME,_GOOGLE_PROVIDER_VERSION=$GOOGLE_PROVIDER_VERSION \
  --verbosity="debug" .

The expected datasets/tables and the isolated infra were created for this workspace in BigQuery and Cloud Storage:

3.5 Simulate an infra change only on the workspacetest workspace and infra

We will simulate a change only on the infra of the workspacetest workspace, to confirm the isolation and the absence of coupling between the different workspaces.

We add a new column called teamFakeColumn to the team_stat BigQuery table in the team_league dataset:

{
  "name": "teamName",
  "type": "STRING",
  "mode": "NULLABLE",
  "description": "Team name"
},
{
  "name": "teamScore",
  "type": "INTEGER",
  "mode": "NULLABLE",
  "description": "Team score"
},
.........
{
  "name": "teamFakeColumn",
  "type": "STRING",
  "mode": "NULLABLE",
  "description": "Fake column"
}
.........

We set the env variable to workspacetest:

export WORKSPACE=workspacetest

We execute the apply command :

gcloud builds submit \
  --project=$PROJECT_ID \
  --region=$LOCATION \
  --config terraform-apply-modules.yaml \
  --substitutions _TF_STATE_BUCKET=$TF_STATE_BUCKET,_TF_STATE_PREFIX=$TF_STATE_PREFIX,_WORKSPACE=$WORKSPACE,_INFRA_ROOT_FOLDER=$INFRA_ROOT_FOLDER,_MODULE_NAME=$MODULE_NAME,_GOOGLE_PROVIDER_VERSION=$GOOGLE_PROVIDER_VERSION \
  --verbosity="debug" .

Changes are indicated by Terraform on workspacetest :

  # google_bigquery_table.tables["team_league_team_stat"] will be updated in-place
  ~ resource "google_bigquery_table" "tables" {
        id     = "projects/gb-poc-373711/datasets/workspacetest_team_league/tables/team_stat"
      ~ schema = jsonencode(
          ~ [
                # (1 unchanged element hidden)
                {
                    description = "Team score"
                    mode        = "NULLABLE"
                    name        = "teamScore"
                    type        = "INTEGER"
                },
              + {
                  + description = "Fake column"
                  + mode        = "NULLABLE"
                  + name        = "teamFakeColumn"
                  + type        = "STRING"
                },
                {
                    description = "Team total goals"
                    mode        = "NULLABLE"
                    name        = "teamTotalGoals"
                    type        = "INTEGER"
                },
                # (4 unchanged elements hidden)
            ]
        )

Then we go to BigQuery: this new column was added only to the team_stat table of the workspacetest_team_league dataset:

This column was not added to the team_league (default) and workspacetest2_team_league datasets:

team_league :

workspacetest2_team_league :

3.6 Destroying the isolated infra for the workspace workspacetest2

In this section, we will destroy the isolated infra for workspacetest2 and show again how simple it is to destroy an infra for a workspace without affecting the rest of the infra.

Change the env variable for the workspace :

export WORKSPACE=workspacetest2

Execute the destroy command line with Cloud Build :

gcloud builds submit \
  --project=$PROJECT_ID \
  --region=$LOCATION \
  --config terraform-destroy-modules.yaml \
  --substitutions _TF_STATE_BUCKET=$TF_STATE_BUCKET,_TF_STATE_PREFIX=$TF_STATE_PREFIX,_WORKSPACE=$WORKSPACE,_INFRA_ROOT_FOLDER=$INFRA_ROOT_FOLDER,_MODULE_NAME=$MODULE_NAME,_GOOGLE_PROVIDER_VERSION=$GOOGLE_PROVIDER_VERSION \
  --verbosity="debug" .

The Terraform logs :

# google_bigquery_dataset.datasets["0"] will be destroyed

......

# google_bigquery_table.tables["team_league_raw_team_stat_raw"] will be destroyed
......

Only the datasets and tables concerning workspacetest2 were deleted.

Conclusion

This use case showed how to isolate the creation of an infrastructure with the concept of Terraform workspaces.

In Cloud projects, sometimes Cloud Engineers and Developers need to validate the IAC code in an isolated environment.

In the example of a BigQuery table, if several developers work on a use case that implies changes to its schema in a shared infra, they can run into conflicts and issues.

The use of workspaces is a way to prevent this kind of situation, and each developer can easily have an isolated infra to test their changes and IAC code evolution.

Developers can also easily destroy and recreate an isolated infra for a workspace on a dev project if needed, with the necessary protections to prevent an accidental destruction of a shared dev infrastructure.

This system could also be used for integration tests based on a short-lived infra.

All the code shared in this article is accessible from my GitHub repository:

If you like my articles, videos and want to see my posts, follow me on :
