I use Terraform to manage all my infrastructure components, from GCP projects to the monitoring stack.
Like many Terraform users, my main difficulty is managing versions across multiple deployments, providers, and software or API versions.
Last week, I had to redeploy code I last used a year ago (Terraform 0.12) to deploy a predefined dashboard in Grafana (with its data source, notification channel, …).
So I decided to upgrade the code to Terraform 1.0+ and started my deployment, without success 💣
In this story, I will show you how I applied the concepts described in the documentation to debug the Grafana Terraform provider with Delve and VSCode, in a real-life situation.
Let’s take a look at my context
My advice is to reproduce the issue with a minimal piece of Terraform code.
Most of the users I have worked with try to debug more than 200 lines of Terraform code spread across multiple files and resources.
What’s more boring than waiting for a terraform plan?
For this issue, I used only one resource with a function to reproduce my content:
Then I ran my terraform init and
Stack trace from the terraform-provider-grafana plugin:
panic: Attempted to unmarshal invalid JSON. This unexpectedly got past schema validation.
goroutine 54 [running]:
github.com/grafana/terraform-provider-grafana/grafana.unmarshalDashboardConfigJSON(0x110cae4, 0x24, 0x0)
Three notable changes had been made to the environment since my last deployment:
- Terraform version 0.12 to 1.0+ (what I wanted)
- Grafana 7 to 8 (done by the administrators)
- Provider version from 1.5 to 1.13.2 (required by Grafana 8)
After a quick read of the changelogs, I found that the dashboard json_config changed between Grafana 7 and 8.
Thanks to the sample above (the smallest dynamic content required to create a dashboard) and the unit tests in the Grafana provider (static content for each of them), I could rule out typos in my template.
I found that a GitHub issue was already open for the same bug:
Plan crashes when undetermined resource used in templated JSON · Issue #246 ·…
jnahelou added a commit to jnahelou/terraform-provider-grafana that referenced this issue Jul 26, 2021
Of course, I could add log.Printf calls in the provider code…
But let’s try to improve the debugging experience with a real debugger like Delve.
Plugin Development — Debugging Providers — Terraform by HashiCorp
This guide documents a few different ways to access more information about the runtime operations of Terraform…
Thanks to the great work on terraform-plugin-sdk v2 and the examples provided by the community, it’s easy to get started!
- Start by compiling the provider as described in the Terraform documentation:
go build -gcflags="all=-N -l"
- Configure VSCode to attach to an already running debugging session
- Set breakpoints on the functions where you want to stop (in my case, the function called just before the panic is triggered)
- Start the debugger session with the dlv command:
dlv exec --listen=:8888 --headless --api-version=2 ./terraform-provider-grafana -- -debug
- Connect to the debugging session in VSCode (press F5 or Run > Start Debugging)
- In the Delve logs, you will be prompted to export the `TF_REATTACH_PROVIDERS` environment variable
- Now, in the shell, export `TF_REATTACH_PROVIDERS`, run terraform as usual and follow the execution in VSCode.
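To make that last step less opaque: the value Delve asks you to export is a JSON document telling Terraform how to reach the provider process that is already running. The Go sketch below decodes such a value. The field names are an assumption based on what the SDK prints in debug mode, and the sample (provider address, pid, socket path) is made up for illustration, so compare it with your own output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// reattachConfig mirrors the fields expected in each entry of
// TF_REATTACH_PROVIDERS. The field names are an assumption, not a
// copy of the SDK's own type.
type reattachConfig struct {
	Protocol string
	Pid      int
	Test     bool
	Addr     struct {
		Network string
		String  string
	}
}

// parseReattach decodes a TF_REATTACH_PROVIDERS value into a map
// keyed by provider address.
func parseReattach(raw string) (map[string]reattachConfig, error) {
	cfg := map[string]reattachConfig{}
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		return nil, err
	}
	return cfg, nil
}

func main() {
	// Illustrative value only: the pid and socket path are invented.
	sample := `{"registry.terraform.io/grafana/grafana":{"Protocol":"grpc","Pid":12345,"Test":true,"Addr":{"Network":"unix","String":"/tmp/plugin-example"}}}`

	cfg, err := parseReattach(sample)
	if err != nil {
		fmt.Println("cannot parse value:", err)
		return
	}
	for name, c := range cfg {
		fmt.Printf("%s -> %s over %s socket %s (pid %d)\n",
			name, c.Protocol, c.Addr.Network, c.Addr.String, c.Pid)
	}
}
```

If the provider address in your value does not match the source declared in your Terraform configuration, Terraform will most likely ignore the reattach entry and start its own copy of the plugin, so that key is worth checking first.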
You can also configure Terraform to use your local provider:
Let’s go back to my issue…
It seems that, for a new plan, the StateFunc defined on the json_config attribute (normalizeDashboardConfigJSON) is called with the string “74D93920-ED26-11E3-AC10-0800200C9A66” instead of the result of my template.
You can refer to the documentation to read more about schema helpers:
Plugin Development - Schema Behaviors - Terraform by HashiCorp
Schema fields that can have an effect at plan or apply time are collectively referred to as "Behavioral fields", or an…
In fact, at this stage, Terraform cannot know the content of the template, because the template depends on another resource (“known after apply”).
This value comes from the terraform-plugin-sdk itself:
// UnknownVariableValue is a sentinel value that can be used
// to denote that the value of a variable is unknown at this time.
// RawConfig uses this information to build up data about
// unknown keys.
const UnknownVariableValue = "74D93920-ED26-11E3-AC10-0800200C9A66"
terraform-plugin-sdk/values.go at v2.7.0 · hashicorp/terraform-plugin-sdk
Terraform Plugin SDK enables building plugins (providers) to manage any service providers or custom in-house solutions…
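To see why this placeholder is a problem, here is a minimal Go sketch. It is not the provider’s actual code: strictNormalize is a hypothetical stand-in for a StateFunc that assumes its input is always valid JSON. Fed the sentinel string, it panics, which is in essence what the stack trace above reports.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// unknownSentinel is the placeholder string observed in the debugger.
const unknownSentinel = "74D93920-ED26-11E3-AC10-0800200C9A66"

// strictNormalize is a hypothetical StateFunc-like helper: it re-encodes
// a JSON object in canonical form and panics on anything else.
func strictNormalize(raw string) string {
	var cfg map[string]interface{}
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		panic("attempted to unmarshal invalid JSON: " + err.Error())
	}
	out, _ := json.Marshal(cfg)
	return string(out)
}

// panics reports whether strictNormalize panics for the given input.
func panics(raw string) (didPanic bool) {
	defer func() {
		if r := recover(); r != nil {
			didPanic = true
		}
	}()
	strictNormalize(raw)
	return false
}

func main() {
	fmt.Println(panics(`{"title": "demo"}`)) // value known at plan time
	fmt.Println(panics(unknownSentinel))     // value "known after apply"
}
```

The first call succeeds and the second one panics: the sentinel is a plain string, not a JSON document, so any code that blindly unmarshals it will fail during the plan.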
As a workaround, I opened a pull request suggesting that the maintainers replace the panic() with error handling and skip this step when json_config is not in the expected format.
It’s now merged, and I’m happy it helps!
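The idea behind that workaround can be sketched as follows. This is a simplified illustration, not the code that was merged: safeNormalize is a hypothetical name, and returning the raw input unchanged is just one possible way to skip the normalization step.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// unknownSentinel is the placeholder the SDK passes for unknown values.
const unknownSentinel = "74D93920-ED26-11E3-AC10-0800200C9A66"

// safeNormalize is a hypothetical replacement for a panicking StateFunc.
// When the input is not a JSON object, it returns the input unchanged
// instead of panicking.
func safeNormalize(raw string) string {
	var cfg map[string]interface{}
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		// Not valid JSON (for example the unknown-value sentinel during
		// a plan): skip normalization and keep the value as-is.
		return raw
	}
	out, err := json.Marshal(cfg)
	if err != nil {
		return raw
	}
	return string(out)
}

func main() {
	// A real dashboard body is normalized (keys sorted, whitespace removed).
	fmt.Println(safeNormalize(`{"title": "demo",  "panels": []}`))
	// The sentinel passes through untouched.
	fmt.Println(safeNormalize(unknownSentinel))
}
```

With this shape, the plan should be able to proceed while the value is still unknown, and the JSON gets normalized later, once the template has actually been rendered.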
There is no magic way to avoid this situation. Applications evolve, users ask for new features, security issues must be fixed… But you want to choose when to apply these changes.
- Use version constraints for modules, providers and Terraform to avoid unpredictable deployment issues
- Schedule Terraform plans using the Terraform Cloud API and monitor the results
- Follow upgrades!
During my Grafana provider upgrade from 1.5 to 1.13, I skipped important versions that changed the dashboard format in the tfstate from slug to uid.
Now I have to manually update the tfstate of all my existing workspaces.