FinOps Cost Management using Terraform Cloud Run Tasks

Narish Samplay
Google Cloud - Community
10 min read · Nov 14, 2023

Learn how to manage Google Cloud costs using a Terraform Cloud Custom Run Task to dynamically control infrastructure as code deployments.

Introduction

“FinOps is an evolving cloud financial management discipline and cultural practice that enables organizations to get maximum business value by helping engineering, finance, technology and business teams to collaborate on data-driven spending decisions.” — as per FinOps Foundation

Terraform Cloud infrastructure as code orchestration includes features such as Cost Estimation, Policy as Code and Run Tasks.

Terraform Cloud Run Task integrations can be found in the Terraform Registry. Run Tasks interact with a Terraform Cloud run at specific points in its lifecycle, e.g. pre-plan, post-plan, and pre-apply. Each Run Task has an enforcement level of advisory or mandatory, set within the Terraform Cloud workspace. If a Run Task returns a failed status and its enforcement level is mandatory, the Terraform run stops and the deployment does not proceed.

Google Cloud Billing Budgets can set up alerts and triggers, but these do not prevent further costs from being incurred once the allocated budget has been exceeded. To cap spending, preventative measures need to be implemented, such as a Cloud Pub/Sub and Cloud Function integration. The diagram below shows an example that removes the billing account from the Google project. This approach is recommended only for non-production environments because it disrupts service, so a more sophisticated approach is needed: dynamically blocking new deployments.

Google Cloud Billing Budgets automation could add a key/value label such as ‘tfc-deploy’: ‘false’ to the Google Project. A Run Task can then evaluate the Google Project label during a Terraform run and block any further deployments.
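As a rough sketch of that automation, the Python below decodes a Cloud Billing budget notification delivered through Pub/Sub and decides which value the ‘tfc-deploy’ label should take. The costAmount and budgetAmount fields come from the budget notification format. The function names and the choice to block at or above 100% of the budget are assumptions made for illustration, and the call that actually writes the label to the project is deliberately left out.

```python
import base64
import json


def decode_budget_notification(pubsub_data: str) -> dict:
    """Decode the base64-encoded JSON carried in a Pub/Sub message's data field."""
    return json.loads(base64.b64decode(pubsub_data).decode("utf-8"))


def desired_deploy_label(notification: dict) -> str:
    """Return "false" once cost reaches the budget, otherwise "true".

    Assumption: deployments are blocked at or above 100% of the budget.
    """
    cost = float(notification["costAmount"])
    budget = float(notification["budgetAmount"])
    return "false" if cost >= budget else "true"


# Example with a hand-built notification (the values are made up).
sample = {"budgetDisplayName": "sandbox", "costAmount": 120.0, "budgetAmount": 100.0}
encoded = base64.b64encode(json.dumps(sample).encode("utf-8")).decode("ascii")
label_value = desired_deploy_label(decode_budget_notification(encoded))
print({"tfc-deploy": label_value})  # {'tfc-deploy': 'false'}
```

Keeping the decision in a small pure function makes the threshold easy to unit test before wiring it to Pub/Sub.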

Finally, the recently launched Terraform ephemeral workspaces beta can automatically delete resources after a set period. Combined with the Run Task, it can help optimize cloud consumption costs. This feature is ideal for developer sandboxes that are needed for a fixed duration and are then destroyed automatically.

Architecture

In this blog you will create a custom Terraform Cloud Run Task for FinOps cost control. The Run Task is implemented in Python on Google Cloud serverless resources, which scale in and out automatically; usage within the free tier results in minimal Google Cloud running costs. Alternatively, you can implement the Run Task on any architecture that provides an HTTPS endpoint, such as Google Cloud Compute Engine, but this will incur higher running costs.

The Google Cloud resources used in the Run Task are:

  • API Gateway
  • Cloud Functions
  • Cloud Storage Bucket
  • Service Accounts
  • Workflows

The diagram below shows the Run Task workflow:

The sequence of events is as follows:

  1. Terraform Cloud sends a POST request to Google Cloud API Gateway in the ‘post-plan’ stage of the Terraform run
  2. API Gateway forwards the request header & payload to the Request Google Cloud Function
  3. Request Google Cloud Function performs the following tasks:
    - Validates the request and payload
    - Invokes the Google Cloud Workflow
    - Sends an HTTP 200 return code to Terraform Cloud
  4. Google Cloud Workflow performs the following tasks:
    - Triggers Process Google Cloud Function
    - Downloads the Terraform Plan JSON file
    - Parses the JSON file for Google Projects in the provider configuration blocks
    - Validates each Google Project and checks the state of the tfc-deploy label
    - Returns true or false result
    - Triggers Callback Google Cloud Function
    - Sends a PATCH request to Terraform Cloud with the result
  5. Terraform Cloud Run continues if status = passed and errors if status = failed
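
The label check in step 4 can be pictured with the minimal sketch below. It is not the repository's implementation: the function name is hypothetical, and treating a missing label as a failure is an assumption chosen here so that unlabeled projects are blocked by default.

```python
def evaluate_projects(labels_by_project: dict) -> tuple:
    """Return (status, message) for Terraform Cloud from per-project labels.

    Assumption: only an explicit tfc-deploy=true label passes; a missing
    label or any other value counts as failed.
    """
    passed = [
        project
        for project, labels in labels_by_project.items()
        if labels.get("tfc-deploy", "").lower() == "true"
    ]
    failed = [project for project in labels_by_project if project not in passed]
    status = "passed" if not failed else "failed"
    message = f"{len(passed)} passed, {len(failed)} failed"
    return status, message


# Example: one project allows deployments, one blocks them, one has no label.
status, message = evaluate_projects(
    {
        "project-a": {"tfc-deploy": "true"},
        "project-b": {"tfc-deploy": "false"},
        "project-c": {},
    }
)
print(status, "-", message)  # failed - 1 passed, 2 failed
```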

Google API Gateway / Cloud Functions

HTTP-triggered Google Cloud Functions can be set to allow authenticated or unauthenticated invocations. Public unauthenticated access requires the Cloud Run Invoker role to be assigned to the allUsers principal, which can be blocked by Google Organization policies. To overcome this restriction, Google API Gateway can be deployed in front of the Google Cloud Function: it forwards public requests from Terraform Cloud to the Request Google Cloud Function and authenticates using its assigned Google Service Account principal.

Google Workflows

Google Workflows is a fully managed serverless resource for orchestrating services; it can hold state, retry, poll, or wait for up to a year. The workflow YAML file is rendered by the Terraform template function, which substitutes the URLs with the deployment values. Steps are defined to execute the Google Cloud Functions, passing inputs as required.

Terraform Cloud POST Request

Below is an example of a POST request sent by Terraform Cloud.

{
  "payload_version": 1,
  "stage": "post_plan",
  "access_token": "4QEuyyxug1f2rw.atlasv1.iDyxqhXGVZ0ykes53YdQyHyYtFOrdAWNBxcVUgWvzb64NFHjcquu8gJMEdUwoSLRu4Q",
  "configuration_version_download_url": "https://app.terraform.io/api/v2/configuration-versions/cv-ntv3HbhJqvFzamy7/download",
  "configuration_version_id": "cv-ntv3HbhJqvFzamy7",
  "is_speculative": false,
  "organization_name": "hashicorp",
  "plan_json_api_url": "https://app.terraform.io/api/v2/plans/plan-6AFmRJW1PFJ7qbAh/json-output",
  "run_app_url": "https://app.terraform.io/app/hashicorp/my-workspace/runs/run-i3Df5to9ELvibKpQ",
  "run_created_at": "2021-09-02T14:47:13.036Z",
  "run_created_by": "username",
  "run_id": "run-i3Df5to9ELvibKpQ",
  "run_message": "Triggered via UI",
  "task_result_callback_url": "https://app.terraform.io/api/v2/task-results/5ea8d46c-2ceb-42cd-83f2-82e54697bddd/callback"
}

The following parameters are used by the Run Task:

  • access_token — Authentication token to authenticate with Terraform Cloud
  • plan_json_api_url — URL to download Terraform Cloud Plan JSON file
  • task_result_callback_url — URL to send PATCH result
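
One way the Request function could pull these three parameters out of the POST body is sketched below. The helper name and the choice to raise ValueError on a missing field are assumptions for illustration; the repository's validation may differ.

```python
import json

# The three payload fields the Run Task relies on, as listed above.
REQUIRED_FIELDS = ("access_token", "plan_json_api_url", "task_result_callback_url")


def extract_runtask_fields(raw_body: bytes) -> dict:
    """Parse the POST body and return only the fields the Run Task needs."""
    payload = json.loads(raw_body)
    missing = [name for name in REQUIRED_FIELDS if not payload.get(name)]
    if missing:
        raise ValueError("missing payload fields: " + ", ".join(missing))
    return {name: payload[name] for name in REQUIRED_FIELDS}


# Example with placeholder values.
body = json.dumps(
    {
        "payload_version": 1,
        "stage": "post_plan",
        "access_token": "example-token",
        "plan_json_api_url": "https://app.terraform.io/api/v2/plans/plan-example/json-output",
        "task_result_callback_url": "https://app.terraform.io/api/v2/task-results/example/callback",
    }
).encode("utf-8")
fields = extract_runtask_fields(body)
print(sorted(fields))
```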

Run Task PATCH Request

Below is an example of a PATCH request sent to Terraform Cloud by the Run Task.

{
  "data": {
    "type": "task-results",
    "attributes": {
      "status": "passed",
      "message": "4 passed, 0 skipped, 0 failed"
    }
  }
}

The PATCH request back to Terraform Cloud uses the access_token and task_result_callback_url from the original POST request. The Terraform Cloud run waits a maximum of 10 minutes to receive the Run Task result, after which it fails automatically.

The ‘message’ attribute of the PATCH request will be displayed in the Terraform Cloud run.
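
The sketch below assembles such a PATCH request with the Python standard library. The function name is hypothetical and the request is only built, not sent; it assumes the access_token is passed as a Bearer token with a JSON:API content type, which is how the Terraform Cloud API is normally called.

```python
import json
import urllib.request


def build_callback_request(callback_url, access_token, passed, message):
    """Build (but do not send) the PATCH request that reports the result."""
    body = {
        "data": {
            "type": "task-results",
            "attributes": {
                "status": "passed" if passed else "failed",
                "message": message,
            },
        }
    }
    return urllib.request.Request(
        callback_url,
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/vnd.api+json",
        },
    )


# Example with placeholder values; urllib.request.urlopen(req) would send it.
req = build_callback_request(
    "https://app.terraform.io/api/v2/task-results/example/callback",
    "example-token",
    passed=False,
    message="0 passed, 0 skipped, 1 failed",
)
print(req.get_method(), json.loads(req.data)["data"]["attributes"]["status"])
```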

Run Task Verification

An optional HMAC key is used to verify the Terraform Cloud POST request. Terraform Cloud uses the HMAC key to generate a SHA-512 signature of the payload, which is then added to the POST request in the ‘x-tfc-task-signature’ header.

{
"x-tfc-task-signature": "b7832ce69b791e39105e50ca55039aede4778caec8e24a20c2f6acaa5274e397cef80a450b44acd6fcc62517784ae970021c338314301c5f39e7ae579db29249"
}

The Run Task generates the sha512 signature for the payload again and compares this to the ‘x-tfc-task-signature’ attribute in the header. If the signatures match, the request has not been tampered with.
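
A minimal sketch of that comparison using Python's hmac and hashlib modules is shown below. The function names are illustrative rather than taken from the repository, and the signature is assumed to be the hex digest of the raw request body, matching the header format shown above.

```python
import hashlib
import hmac


def compute_signature(hmac_key: str, payload: bytes) -> str:
    """Return the hex-encoded HMAC-SHA512 signature of the raw request body."""
    return hmac.new(hmac_key.encode("utf-8"), payload, hashlib.sha512).hexdigest()


def signature_is_valid(hmac_key: str, payload: bytes, received_signature: str) -> bool:
    """Compare signatures in constant time to avoid leaking timing information."""
    expected = compute_signature(hmac_key, payload)
    return hmac.compare_digest(expected, received_signature)


# Example: sign a body, then verify the original and a modified copy.
body = b'{"payload_version": 1, "stage": "post_plan"}'
signature = compute_signature("secret", body)
print(signature_is_valid("secret", body, signature))         # True
print(signature_is_valid("secret", body + b" ", signature))  # False
```

The signature must be computed over the exact bytes received, so verify before parsing or re-serializing the JSON.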

Google Project ID Detection

To correctly identify the target Google Project in the Terraform run, the provider configuration block needs to specify the project parameter explicitly or via an input variable.

terraformplan.py parses the Terraform plan and searches for the JSON paths below:

jsonpath_references_expressions = [
    '$.configuration.provider_config[?(@.name = "google")].expressions.project.references',
    '$.configuration.provider_config[?(@.name = "google-beta")].expressions.project.references',
]

jsonpath_values_expressions = [
    '$.configuration.provider_config[?(@.name = "google")].expressions.project.constant_value',
    '$.configuration.provider_config[?(@.name = "google-beta")].expressions.project.constant_value',
]

If multiple google / google-beta provider blocks are detected, the JSON path search returns them all. The results are de-duplicated and passed to googleproject.py to fetch the project label.
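
To make the detection logic concrete, here is a simplified sketch that walks the plan JSON with plain dictionary access instead of a JSONPath library. It is not the repository's terraformplan.py: the function name is hypothetical, and it assumes that references of the form var.<name> can be resolved from the plan's top-level variables section.

```python
GOOGLE_PROVIDERS = {"google", "google-beta"}


def find_google_projects(plan: dict) -> list:
    """Return the de-duplicated, sorted project IDs used by google providers."""
    variables = plan.get("variables", {})
    provider_config = plan.get("configuration", {}).get("provider_config", {})
    projects = set()
    for provider in provider_config.values():
        if provider.get("name") not in GOOGLE_PROVIDERS:
            continue
        project = provider.get("expressions", {}).get("project", {})
        # Case 1: the project is set explicitly in the provider block.
        if "constant_value" in project:
            projects.add(project["constant_value"])
        # Case 2: the project comes from an input variable (var.<name>).
        for reference in project.get("references", []):
            if reference.startswith("var."):
                value = variables.get(reference[len("var."):], {}).get("value")
                if value:
                    projects.add(value)
    return sorted(projects)


# Example plan fragment with three provider blocks and two distinct projects.
plan = {
    "variables": {"project_id": {"value": "my-project"}},
    "configuration": {
        "provider_config": {
            "google": {
                "name": "google",
                "expressions": {"project": {"references": ["var.project_id"]}},
            },
            "google-beta": {
                "name": "google-beta",
                "expressions": {"project": {"constant_value": "my-project"}},
            },
            "google.other": {
                "name": "google",
                "expressions": {"project": {"constant_value": "other-project"}},
            },
        }
    },
}
print(find_google_projects(plan))  # ['my-project', 'other-project']
```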

Tutorial

The code for this Run Task can be found in the GitHub repository below:

https://github.com/nhsy-hcp/terraform-gcp-runtask-budgets

Download the repository to your local workstation or Google Cloud Shell using git.

git clone https://github.com/nhsy-hcp/terraform-gcp-runtask-budgets.git

Pre-requisites

Prerequisites for the Terraform Run Task deployment are:

  • Google Cloud SDK
  • Google Cloud project with owner permissions
  • Google Cloud credentials setup
    - gcloud auth application-default login or gcloud auth login
  • Terraform v1.4+
  • Terraform Cloud account — https://app.terraform.io

Terraform Cloud Workspace

Use the following tutorial to set up a Terraform Cloud workspace with Google Cloud Workload Identity. Dynamic credentials are the recommended approach for authentication because they use short-lived, ephemeral secrets.

https://developer.hashicorp.com/terraform/tutorials/cloud/dynamic-credentials

Google Cloud Deployment

Create a file in the terraform folder named terraform.tfvars.

project_id = "_DEPLOYMENT_GOOGLE_PROJECT_"
project_viewer = ["_BUDGET_GOOGLE_PROJECT_"]

  • project_id — Google Project ID for deploying the Run Task
  • project_viewer — Google Project IDs to which the viewer IAM role is assigned, allowing the Process Google Cloud Function service account to read project labels.

Execute the commands below in your terminal to deploy the Google Cloud resources.

cd terraform
terraform init
terraform apply

Apply complete! Resources: 35 added, 0 changed, 0 destroyed.

Outputs:

api_gateway_endpoint_uri = "https://apigw-s0sib-2007ua9j.ew.gateway.dev/runtask-budgets"
cloud_functions_bucket = "runtask-cloud-functions-s0sib"
runtask_callback_uri = "https://europe-west1-runtask-budgets-100181.cloudfunctions.net/runtask-callback-s0sib"
runtask_process_uri = "https://europe-west1-runtask-budgets-100181.cloudfunctions.net/runtask-process-s0sib"
runtask_request_uri = "https://europe-west1-runtask-budgets-100181.cloudfunctions.net/runtask-request-s0sib"

On success, the resources will be deployed to the Google Project.

Terraform check blocks have been added to validate that the Google Cloud Functions have been deployed successfully and return the HTTP status code 403. A 403 status code is expected because these Google Cloud Functions do not accept unauthenticated connections and are triggered by the Google Workflow.

check "cloudfunction_callback_health" {
  data "http" "cloudfunction_callback" {
    url = google_cloudfunctions2_function.runtask_callback.url
  }
  assert {
    condition     = data.http.cloudfunction_callback.status_code == 403
    error_message = format("Cloud function request unhealthy: %s - %s", data.http.cloudfunction_callback.status_code, data.http.cloudfunction_callback.response_body)
  }
}

The Google Cloud Console also shows the Cloud Functions deployed with a green status.

Selecting one of the Google Cloud Functions displays detailed metrics, e.g. invocations/second, execution time, and memory utilization.

The Logs tab will show any outputs from the Google Cloud Function.

Terraform Cloud Configuration

Terraform Cloud Run Task set up is required for integration. Under Settings/Run tasks create a Run Task with the following settings:

  • Name — Enter a suitable name for the Run Task
  • Endpoint URL — Terraform output variable api_gateway_endpoint_uri
  • HMAC key — Should match the terraform input variable hmac_key in variables.tf or overridden value

Next, go to the Terraform Cloud workspace. Under Settings/Run Tasks, add the Run Task with the following settings:

  • Run stage — Post-plan
  • Enforcement level — Mandatory

Google Cloud Configuration

Set the ‘tfc-deploy’ label on the Google Project to true.

Terraform Run

Execute a new ‘plan and apply’ run. The output should be similar to the screenshot below, and the deployment should succeed.

Update the ‘tfc-deploy’ label on the Google Project to false.

Execute a new ‘plan and apply’ run. The output should be similar to the screenshot below, and the deployment should be blocked.

Cleanup

Execute the commands below in your terminal to destroy the Google Cloud Run Task resources.

cd terraform
terraform destroy

Destroy complete! Resources: 35 destroyed.

Run Task Development Guidance

The GitHub repository contains the complete implementation for the Run Task. This section describes how to customize the Run Task and test locally.

Pre-requisites

  • Pycharm IDE or equivalent
  • Python virtual environment, e.g. venv
  • Install pip dependencies in each cloud function folder
  • Install pip dependencies in cloud_functions/tests folder
  • Default Google Project set with `gcloud config set project PROJECT_ID`

Google Cloud Functions

The Python source code for the Google Cloud Functions is located in the folders below:

Each function has an entrypoint in the file main.py that is automatically triggered when the Google Cloud Function is invoked. The example below is of the Request function `request_handler` entrypoint.

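import logging

import functions_framework


@functions_framework.http
def request_handler(request):
    try:
        logging.info("headers: " + str(request.headers))
        logging.info("payload: " + str(request.get_data()))

        request_headers = request.headers
        # ... excerpt only: see the repository for the rest of the handler ...
    except Exception:  # illustrative handler so the excerpt is valid Python
        logging.exception("request handling failed")
        return "Internal error", 500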

Google Cloud Logging integration has been implemented to aid troubleshooting and to view the output from the Google Cloud Functions.

Pytest

Pytest tests for the Google Cloud Functions are provided in the folder cloud_functions/tests to aid local development and unit testing. Using pytest streamlines the developer workflow and helps ensure the Cloud Functions work as expected before Terraform deploys them to Google Cloud.

Update the pytest.ini file with the correct values for your environment.

DISABLE_GOOGLE_LOGGING=1
HMAC_KEY=secret
LOG_LEVEL=INFO
RUNTASK_PROJECT=runtask-budgets-10181
RUNTASK_REGION=europe-west1
RUNTASK_WORKFLOW=runtask-workflow

Navigate to the cloud_functions/tests folder in the terminal and execute the commands below to perform unit testing of the Google Cloud Functions:

cd cloud_functions/tests
pytest

=============================== test session starts ===============================
platform darwin -- Python 3.9.6, pytest-7.4.0, pluggy-1.2.0
rootdir: /work/terraform-gcp-runtask-budgets/cloud_functions/tests
configfile: pytest.ini
plugins: requests-mock-1.11.0, env-0.8.2
collected 20 items

test_runtask_callback.py ....... [ 35%]
test_runtask_process.py ... [ 50%]
test_runtask_process_integration.py . [ 55%]
test_runtask_process_projects.py ... [ 70%]
test_runtask_process_terraformplan.py .. [ 80%]
test_runtask_request.py .... [100%]

=============================== 20 passed in 8.43s ===============================

Google Cloud Functions Framework

Google Cloud Functions Framework allows local testing of Cloud Functions during development. This library is installed via the pip dependencies and used by the pytest integration tests.

Further information on Google Cloud Functions Framework can be found below:

https://cloud.google.com/functions/docs/running/function-frameworks#functions-local-ff-install-python

Google Workflows

The workflow YAML file below defines the steps to be executed. It can be customized as required if additional Google Cloud Functions need to be added.

main:
  params: [input]
  steps:
    - process:
        call: http.post
        args:
          url: ${process_url} # replaced by terraform
          body: $${input} # replaced by workflow
          auth:
            type: OIDC
        result: process_result
        next: callback

    - callback:
        call: http.post
        args:
          url: ${callback_url} # replaced by terraform
          body:
            task: $${input} # replaced by workflow
            result: $${process_result.body} # replaced by workflows
          auth:
            type: OIDC
        result: callback_result

    - complete:
        return: '$${callback_result}' # replaced by workflows

The Callback Google Cloud Function expects the JSON body to contain the original task payload and result to send back to Terraform Cloud.

Summary

In this blog you learned how to deploy a Google Cloud Run Task and integrate with Terraform Cloud. The custom Run Task can perform any validation to meet your use case. Google Cloud Functions and Workflows were used, but it could have been implemented using any architecture or programming language that supports asynchronous HTTPS requests.

A Google Cloud Project that has exceeded the budget or otherwise been disabled will now be blocked dynamically from deploying new resources.

What Next?

Terraform Cloud Custom Run Tasks can be integrated at the pre-plan, post-plan, pre-apply, and post-apply stages. Using this workflow, other use cases such as AI, static code scanning, etc. can be implemented.

Further Reading

https://developer.hashicorp.com/terraform/cloud-docs/integrations/run-tasks

https://github.com/aws-ia/terraform-aws-runtask-iam-access-analyzer

https://awstip.com/build-a-custom-terraform-run-task-using-python-and-aws-lambda-4a1558ba903b
