Troubleshooting Terraform in a serverless world

Genesis Alvarez · Stash-consulting · May 5, 2020 · 12 min read


I will show you how to deploy infrastructure with Terraform on GCP for an application that generates an RSS feed file inside a bucket; another function returns the generated file as JSON. If you are interested in the whole source code that inspired this blog, I’m glad to share it: click here. I’m also going to cover the problems I ran into during the implementation and the decisions I made to fix them.

The infrastructure would look like this:

[Architecture diagram: an RSS Feed Generator function writing to a bucket and a Get as JSON function reading from it]

In this article I’m going to be leveraging the following technologies on GCP:

  • Google Cloud Storage
  • Cloud Firestore Database
  • Google Cloud Functions
  • Cloud Run
  • Cloud Endpoints Service
  • Google Cloud SDK

Requirements:

I’ll try to make this as clear as possible, but you’ll need at least basic knowledge of the following topics to follow this article efficiently:

  • Serverless.
  • Service Accounts and Roles.
  • Cloud Computing.
  • Terraform or other Infrastructure as code tools.

Before you begin:

It’s important to make sure that Cloud SDK is authorized to access your data and services.

gcloud auth login

In the browser tab that opens when you run the command, choose an account that has the Editor or Owner role in the Google Cloud project.

Configuring the Google Provider

Let’s create a Terraform config file named "main.tf". Inside, we are going to include the following configuration:

The project should be your project ID; the assignment is declared in a locals block.

The Google provider block is used to configure your Google Cloud Platform infrastructure, the credentials you use to authenticate with the cloud, as well as a default project and location for your resources.
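Since the embedded config isn’t reproduced here, this is a minimal sketch of what main.tf could contain; treat the project ID, region, and key file name as placeholders for your own values.

# main.tf: a minimal sketch, values are placeholders.
locals {
  project = "your-project-id"
  region  = "us-central1"
}

provider "google" {
  credentials = file("service-account-key.json") # or supply the key via the environment variable shown below
  project     = local.project
  region      = local.region
}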

To make successful requests against the GCP API, we need to authenticate to prove who is making the request. The method I used is a Service Account. In a couple of words, it is an account with a limited set of IAM permissions that should follow the principle of least privilege: give the user or the service only the permissions it needs to do its work.
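I created my account and key through the Cloud Console (next step), but if you prefer to manage the account itself with Terraform too, the least-privilege idea could be sketched like this; the account id and the role are only examples, grant whatever your workload actually needs.

# Hypothetical example: a dedicated service account with a narrowly scoped role.
resource "google_service_account" "rss_feed" {
  account_id   = "rss-feed-app"
  display_name = "RSS feed application"
}

resource "google_project_iam_member" "rss_feed_storage" {
  project = local.project
  role    = "roles/storage.objectAdmin" # example role only
  member  = "serviceAccount:${google_service_account.rss_feed.email}"
}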

Use the service account key page in the Cloud Console to choose an existing account or create a new one, then download the JSON key file. Give it a name you will remember and store it somewhere secure on your machine. If you don’t specify the credentials in the config file, you can supply the key to Terraform through an environment variable, setting its value to the location of the file. If you are on Windows, use set instead of export.

export GOOGLE_CLOUD_KEYFILE_JSON={{path}}

If you have any problems with this step check this Issue.

Now we initialize a working directory containing Terraform configuration files.

terraform init

Google Cloud Storage

Let's define the bucket infrastructure. It's good to know that the Google Cloud Storage resource is named google_storage_bucket in Terraform: the google part of the name identifies the provider, storage indicates the GCP product family, and bucket specifies the resource name. Keep in mind how resources are named in Terraform and which parameters each one accepts.

We are going to use the bucket named bucket_file_tf (it was available) to store any file sent in the request as well as the RSS feed.

resource "google_storage_bucket" "bucket_file" {
name = "bucket_file_tf"
location = local.region
force_destroy = true
}
resource "google_storage_bucket_access_control" "public_rule" {
bucket = google_storage_bucket.bucket_file.name
role = "READER"
entity = "allUsers"
}

Bucket ACLs can be managed with the google_storage_bucket_access_control resource. In this configuration the bucket will be public on the internet, which means all users are allowed to see its content. Something to keep in mind: bucket_file is the identifier for the bucket; if this name changes after the apply, Terraform will treat it as a new resource.

google_storage_bucket.bucket_file.name in the access control resource references the name of the bucket declared in the google_storage_bucket resource to get its value.

It’s time to apply our changes by running the command:

terraform apply

If the apply succeeds, we can see the changes made to the resources Terraform manages. In my case, I always check in the Google Cloud Console even though I know it worked.

Cloud Firestore Database

In my first attempts I was trying to create the database without good results; I didn’t realize I was just creating an index. Going through Google I found an Issue that made me feel dumb. We are always learning.

I realized that collections are created implicitly when you add a document to them, which is the same situation as with Datastore. At the moment we can only create indexes via Terraform; collections are managed by application-level code instead of infrastructure, so my Cloud Functions will take care of that.
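For reference, the only Firestore object we can describe in Terraform here is an index; a hedged sketch could look like the following, where the field names are made up for illustration (the collection name matches the one my function writes to).

# Hypothetical composite index; field names are illustrative only.
resource "google_firestore_index" "image_files_by_date" {
  project    = local.project
  collection = "image_file_details"

  fields {
    field_path = "file_name"
    order      = "ASCENDING"
  }

  fields {
    field_path = "created_at"
    order      = "DESCENDING"
  }
}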

Google Cloud Functions

At first, I thought it would be very simple to create the Cloud Function infrastructure because I had already created my bucket, and I thought I understood enough of how Terraform works.

resource "google_cloudfunctions_function" "function_insert_data" {
name = "function_insert_data"
description = "My function"
runtime = "python37"
available_memory_mb = 128
trigger_http = true
entry_point = "main"
environment_variables = {
collection = "image_file_details"
DESTINATION_BUCKET = "bucket_file_tf"
}
}

Does this work? The answer is no. I ran into this error: even though the Terraform documentation says source_archive_bucket and source_repository are optional, we have to specify where the code is stored so it can be deployed to Cloud Functions, which makes sense. It happened to me because I didn’t read the optional arguments at the beginning; their descriptions explain that one of them is required. At the moment the Cloud Functions documentation in the Terraform registry is being updated and it is a bit of a mess.

I chose source_archive_bucket, which points to the zip archive containing the function. That archive must exist before the Cloud Function infrastructure is created, so we have to find a way for Terraform to upload my files to the bucket; it would not make sense to do it manually.

# Archive a single file.

data "archive_file" "init" {
  type        = "zip"
  source_dir  = "${path.root}/source_code/"
  output_path = "${path.root}/source_code.zip"
}

If we get a terminal error like Error: provider.resource: no suitable version installed…, it is solved by running terraform init, which starts downloading the plugin for that provider.

My first idea was to save all the code in one archive. What’s wrong with this? After following this idea, I needed a way to place the zipped code in the bucket, and when I finished and wanted to connect it with my service, a question came up:

How could I find a way to specify which code I would need with its respective requirements.txt for the function?

Deployment failure:
Function failed on loading user code. Error message: File main.py that is expected to define function doesn't exist

Then I realized that there was a way to save multiple files. ${path.root} is the filesystem path of the root module of the configuration and ${path.module} is the filesystem path of the module where the expression is placed. In my case, both are functional because I am not using submodules.

I am archiving the function code insert_data together with a requirements.txt; this step will be repeated for each function your application has. I use file(path) in the source block to copy the contents of the insert_data.py file and save it under the name main.py inside the zip; the same happens with requirements.txt.

# Archive multiple files.

data "archive_file" "insert_data" {
  type        = "zip"
  output_path = "${path.module}/source_code/insert_data.zip"

  source {
    content  = file("${path.module}/source_code/insert_data.py")
    filename = "main.py"
  }

  source {
    content  = file("${path.module}/source_code/requirements.txt")
    filename = "requirements.txt"
  }
}

You will place the zipped code in the bucket for each archive file you have. The reason I didn’t choose source_repository is that I would have to split each program with its respective requirements.txt into different repositories. I prefer to keep all functions in the same repo so we only need to set up one CI service.
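Putting the pieces together, here is a hedged sketch of uploading the zip with google_storage_bucket_object and pointing the function at it; it supersedes the incomplete function block shown earlier, and the object name is arbitrary.

# Upload the zip produced by archive_file, then reference it from the function.
resource "google_storage_bucket_object" "insert_data_zip" {
  name   = "insert_data.zip"
  bucket = google_storage_bucket.bucket_file.name
  source = data.archive_file.insert_data.output_path
}

resource "google_cloudfunctions_function" "function_insert_data" {
  name                  = "function_insert_data"
  runtime               = "python37"
  available_memory_mb   = 128
  trigger_http          = true
  entry_point           = "main"
  source_archive_bucket = google_storage_bucket.bucket_file.name
  source_archive_object = google_storage_bucket_object.insert_data_zip.name

  environment_variables = {
    collection         = "image_file_details"
    DESTINATION_BUCKET = "bucket_file_tf"
  }
}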

Cloud Run

This may be one of the most uncomfortable steps. According to the Google documentation, to set up Cloud Endpoints for Cloud Functions you need Cloud Run to deploy the prebuilt Extensible Service Proxy V2 Beta (ESPv2 Beta), which intercepts all requests to your functions. Creating a service name is easy, but the steps from here get complicated. I had to change the authentication manually in Cloud Run to accept unauthenticated requests because it was not clear to me which argument I could use for this.

resource "google_cloud_run_service" "cloudrunsrv" {
name = "cloudrunsrv"
location = local.region

template {
spec {
container {
image = "gcr.io/endpoints-release/endpoints-runtime- serverless:2"
}
}
}
traffic {
percent = 100
latest_revision = true
}
}

The most important thing when creating the service is to know your cloud_run_hostname; you will need it for the host field of your OpenAPI document later. It has a structure similar to {service-name}-{project-hash}-{cluster_level_suffix}.a.run.app. In this example the service name is cloudrunsrv and the cloud_run_hostname is cloudrunsrv-cvrukhkgjq-uc.a.run.app. We are done with the initial version of ESPv2 Beta.
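If you don’t want to hunt for the hostname in the console, one option (not part of the original setup) is to expose it as a Terraform output built from the same status[0].url attribute used later in this post.

# Optional helper: print the Cloud Run hostname after apply.
output "cloud_run_hostname" {
  value = replace(google_cloud_run_service.cloudrunsrv.status[0].url, "https://", "")
}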

Previously, when a service name was deleted it could not be used again for up to 30 days. I wanted a screenshot of that error for the blog, but I just realized they have solved the problem, which is better for the developer experience; before, I had to change my service name every time I ran the terraform destroy command.

Cloud Endpoints Service

I call this part my headache. Each of us has a headache about something, and that is normal when working with advanced topics. This part affected me a lot because I was used to creating endpoints without problems, but expressing what Google needed in Terraform was difficult.

resource "google_endpoints_service" "openapi_service" {
service_name = "api-name.endpoints.project-id.cloud.goog"
openapi_config = file("openapi_spec.yml")
}

Endpoints and ESP require the following Google services to be enabled:

gcloud services enable servicemanagement.googleapis.com
gcloud services enable servicecontrol.googleapis.com
gcloud services enable endpoints.googleapis.com

At the moment Cloud Endpoints cannot be configured through the Google console. Before, I did not have to specify a name for the service: I just deployed the Endpoints configuration using the gcloud endpoints services deploy command. With Terraform, apparently the service name has to comply with a specific structure, but it’s confusing.

  • I specified the name openapi_service.endpoints.project-test-270001.cloud.goog.
  • I specified the name service.endpoints.project-test-270001.cloud.goog.
  • I tried to specify a name that I had deleted before.

It’s strange that Cloud Endpoints accepted the name test.endpoints.project-test-270001.cloud.goog in the first place and didn’t want to accept openapi.endpoints.project-test-270001.cloud.goog or the other names I tried at random. That’s why I showed you the screenshots: possibly it was a Google error (I’m not sure) that allowed the name, and I could see the Endpoints service on the console under that name (though I think it doesn’t work), which confused me. It’s also possible that other service names will appear even if the terminal shows you an error.

The second image explains the error: the Endpoints service uses the name that we specify in the host field of the openapi-functions.yaml file. So, this is the solution:

resource "google_endpoints_service" "openapi_service" {
service_name = replace(
"${google_cloud_run_service.cloudrunsrv.status[0].url}",
"https://",
""
)
}

We take the cloud_run_service_url value, which would be https://cloudrunsrv-cvrukhkgjq-uc.a.run.app, and strip the https:// string with replace(string, substring, replacement). We are going to configure Cloud Endpoints according to our OpenAPI document: we need to specify cloud_run_hostname in the host field, and each of our functions that works with an HTTP trigger must be listed in the paths section of the openapi-functions.yaml file, but we don’t want to do it manually. So, we put replaceable patterns in the OpenAPI document.

resource "google_endpoints_service" "openapi_service" {
service_name = replace(
"${google_cloud_run_service.cloudrunsrv.status[0].url}",
"https://",
""
)
openapi_config = templatefile("${path.module}/openapi-functions.yaml", {
cloud_run_hostname = replace(
"${google_cloud_run_service.cloudrunsrv.status[0].url}",
"https://",
""
),
inserdata_functionurl = google_cloudfunctions_function.function_insert_data.https_trigger_url,
getjson_functionurl = google_cloudfunctions_function.function_convert_xml_to_json.https_trigger_url
}

A view of my editor.

The templatefile(path, vars) function reads the file at the given path and renders its content as a template using a supplied set of template variables (the content of the document itself is not modified). According to this Issue, status is a list; if you don’t use status[0] you’ll get this kind of error.

Checking the developer portal I can see my API, but is it functional? I send a POST request from send_test_request.py (for the moment we have to specify cloud_run_service_url manually to send the file).

The Google documentation tells us that we have to build the Endpoints service config into a new ESPv2 Beta Docker image, using the gcloud_build_image script that we download to our local machine and run. config_id is the configuration ID created by the deployment; it usually looks like 2020-04-01-r0.

chmod +x gcloud_build_image

./gcloud_build_image -s CLOUD_RUN_HOSTNAME \
-c CONFIG_ID -p ESP_PROJECT_ID

We have to tell Terraform to run the commands above. Variables referenced as $var in the command are resolved from the environment map defined in the same provisioner block.

resource "null_resource" "building_new_image" {
provisioner = "local-exec" {
command = "chmod +x gcloud_build_image; ./gcloud_build_image -s $cloud_run_hostname -c $config_id -p ${local.project}"
environment = {
config_id = google_endpoints_service.openapi_service.config_id
cloud_run_hostname = google_endpoints_service.openapi_service.service_name
}
}
}

Now we have to redeploy the ESPv2 Beta Cloud Run service with the new image. Remember that I changed the authentication manually in Cloud Run to accept unauthenticated requests; with the commands below, it is done automatically.

gcloud run deploy CLOUD_RUN_SERVICE_NAME \
--image="gcr.io/ESP_PROJECT_ID/endpoints-runtime-serverless:CLOUD_RUN_HOSTNAME-CONFIG_ID" \
--allow-unauthenticated \
--platform managed \
--project=ESP_PROJECT_ID

At first, I thought of using the cloud_run block to redeploy the new image, but I got this error. Now I use cloudrunsrv500 as the service name (I deleted the previous one and have to wait 30 days before I can reuse it for Endpoints), and the configuration updates automatically whether you run terraform destroy to eliminate resources you won’t use or simply change the service name and run terraform apply.

I thought about changing the cloud_run block to a null_resource to deploy the initial version of ESPv2 Beta and then using the cloud_run block to redeploy the new image, but even if that worked, how would I get the service name that I need for Cloud Endpoints? Those arguments belong to the resource block, so I discarded this option.

The option we have left is to use null_resource again to redeploy the new image. Now imagine putting all those commands together: if you use string interpolation, remember that single quotes do not allow bash to do the replacement. At this point I didn’t know about <<EOF.

# using string interpolation
config_id = "${google_endpoints_service.openapi_service.config_id}"

The most logical thing is to use <<EOF and put all the commands together in the same block; this way the environment variables are not repeated.
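A hedged sketch of that combined block follows; it reuses the environment variables from before and the image naming from the gcloud run deploy command above, and it assumes the gcloud_build_image script sits next to the Terraform config.

# Sketch only: build the new ESPv2 image and redeploy it in a single provisioner.
resource "null_resource" "build_and_redeploy" {
  provisioner "local-exec" {
    command = <<EOF
chmod +x gcloud_build_image
./gcloud_build_image -s $cloud_run_hostname -c $config_id -p ${local.project}
gcloud run deploy ${google_cloud_run_service.cloudrunsrv.name} \
  --image="gcr.io/${local.project}/endpoints-runtime-serverless:$cloud_run_hostname-$config_id" \
  --allow-unauthenticated \
  --platform managed \
  --project=${local.project}
EOF
    environment = {
      config_id          = google_endpoints_service.openapi_service.config_id
      cloud_run_hostname = google_endpoints_service.openapi_service.service_name
    }
  }
}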

It did not make any changes, and my API was still not working. Why didn’t it change? Because modifying the command does not trigger re-execution if the command already succeeded the first time. One standard way around this is sketched below.
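null_resource has a triggers argument that forces the resource (and its provisioners) to run again when a referenced value changes; a sketch of that alternative, which is not what I ended up doing:

# Alternative to renaming: re-run whenever the Endpoints config id changes.
resource "null_resource" "build_and_redeploy" {
  triggers = {
    config_id = google_endpoints_service.openapi_service.config_id
  }

  # provisioner "local-exec" { ... } exactly as in the block above
}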

What I actually did to solve the problem was simpler: change the identifier of the null_resource, and we are finally done! It works. We are going to send our request.

[Screenshots: the data stored in Google Cloud Firestore and the Get as JSON response]

We changed the resource outside of Terraform; that’s why we receive this error. I tried to use terraform import to bring in the current state of the resource, but Terraform does not have it well developed for this case. Although it shows you this error, the API keeps working.
