Accuracy and Completeness of Terraform’s HCL Data Format

HCL is the HashiCorp Configuration Language

Peter Bi
Jan 7, 2024

JSON is the predominant data interchange format. It is precise and consistent, which means that a programmed object can be encoded into a plain-text JSON string, transferred, and then decoded into the same object at the destination.

JSON is also complete, meaning that we can encode a data object from any programming language into JSON, and vice versa.

Lastly, JSON is objective and independent, as the logic to decode and encode a JSON string is universally applicable to all objects without any exceptions or special cases.
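The round trip described above can be sketched in a few lines of Python using only the standard json module. The sample object is a hypothetical one chosen for illustration; the point is simply that decoding the encoded string gives back an equal object.

```python
import json

# Hypothetical sample object, used only for illustration.
original = {
    "resource": {
        "aws_instance": {
            "example": {"instance_type": "t2.micro", "ami": "ami-abc123"}
        }
    }
}

encoded = json.dumps(original)  # object -> plain-text JSON string
decoded = json.loads(encoded)   # JSON string -> object at the destination

# The decoded object compares equal to the one we started with.
round_trip_ok = decoded == original
print(round_trip_ok)
```

The same generic encode and decode calls work for any JSON-compatible object, with no per-type special cases.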

However, JSON does have a few drawbacks:

  • It does not support comments within the data.
  • It does not accommodate variables, functions, or logical expressions.
  • More than half of the data blocks in JSON are key-value pairs where the keys are strings. JSON requires these string keys to be quoted, which can be cumbersome. In practice, we often prefer unquoted keys for simplicity and readability.

HCL is a data format that enhances JSON by incorporating the aforementioned features. So far, it’s primarily used by HashiCorp in their infrastructure provisioning products such as Terraform and Nomad. However, it has potential for broader application, such as serving as the backend for no-code platforms.

As a data format, does HCL maintain the same completeness and accuracy as JSON? In other words, can we convert HCL to a JSON format and restore it back accurately and uniquely?

The answer isn’t entirely clear. Terraform is specifically designed for convenience and relevance in infrastructure provisioning, so some HCL features depend on underlying objects. In a previous article, I introduced the package github.com/genelet/determined, which converts among the three formats HCL, JSON and YAML. Since v1.8.x, it has implemented two small but significant features, bringing HCL a step closer to being as complete and accurate a data format as JSON.

Here they are.

1. Array Key of Multiple Labels

Utilizing array keys of string labels is highly convenient and commonly used in Terraform and HCL. For instance:

resource "aws_instance" "example" {
  instance_type = "t2.micro"
  ami           = "ami-abc123"
}

Here, the data structure is identified by an array key named resource, which has two labels: aws_instance and example. The value of the key, a map, contains two variables: instance_type and ami.

Since JSON does not support keys of array type, the equivalent JSON string must be expressed as a nested map of maps:

{
  "resource": {
    "aws_instance": {
      "example": {
        "instance_type": "t2.micro",
        "ami": "ami-abc123"
      }
    }
  }
}

Clearly, using an array key in HCL is more intuitive and readable than using a map of maps in JSON.
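As a minimal sketch of this equivalence, the Python function below wraps a block body in one map level per key: the block name first, then each label. The function name block_to_nested is hypothetical and is not part of determined or of any HCL library.

```python
def block_to_nested(block_type, labels, body):
    """Wrap body in one dict level per key: block type first, then each label."""
    nested = body
    # Build from the innermost label outwards.
    for key in reversed([block_type, *labels]):
        nested = {key: nested}
    return nested


# The resource block from the example above.
result = block_to_nested(
    "resource",
    ["aws_instance", "example"],
    {"instance_type": "t2.micro", "ami": "ami-abc123"},
)
print(result)
```

Each label adds exactly one nesting level, which is why a block with two labels becomes a three-level map of maps around its body.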

But how deep can an array key be? For instance, how would we interpret the following five-level map in JSON using HCL?

{
"resource": {
"aws_instance": {
"example": {
"lifecycle": {
"create_before_destroy": true
}
}
}
}
}

We may express it as:

resource "aws_instance" "example" {
  lifecycle {
    create_before_destroy = true
  }
}

where lifecycle is a variable name within the value of the two-label array key resource.

Alternatively, we could write:

resource "aws_instance" "example" "lifecycle" {
  create_before_destroy = true
}

In this case, lifecycle is the last element in the three-label array key.

The official documentation leans towards the first interpretation, implicitly suggesting that the size of the array should be limited to two, with any additional levels placed within the value body.

The latest version of determined adopts this rule, converting a map of maps in JSON to a two-label array key. Future HCL specifications could potentially allow for an unlimited size of the array key, but impose a size limitation in dialects like Terraform.

In determined, array keys are currently limited to two labels.
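The ambiguity and the two-label rule can be illustrated with a short Python sketch. The helper split_labels is hypothetical and is not how determined is implemented. It assumes a simple heuristic: while the current level is a map with exactly one key whose value is also a map, that key is treated as a label, until the label limit is reached.

```python
def split_labels(value, max_labels=2):
    """Peel up to max_labels single-key map levels off value as labels."""
    labels = []
    body = value
    while len(labels) < max_labels and isinstance(body, dict) and len(body) == 1:
        key, inner = next(iter(body.items()))
        if not isinstance(inner, dict):
            break  # a scalar or list value belongs to the body, not to a label
        labels.append(key)
        body = inner
    return labels, body


# The value of the top-level "resource" key from the five-level JSON above.
resource_value = {
    "aws_instance": {"example": {"lifecycle": {"create_before_destroy": True}}}
}

two = split_labels(resource_value, max_labels=2)
three = split_labels(resource_value, max_labels=3)
print(two)
print(three)
```

With a limit of two, lifecycle stays inside the body. With a limit of three, it becomes the last label. The same JSON therefore maps to different HCL depending on the limit, which is why a fixed rule is needed.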

2. Array or Map?

Another feature addressed in determined pertains to the following JSON:

{
  "resource": {
    "aws_instance": {
      "example": {
        "provisioner": [
          {
            "local-exec": {
              "command": "echo 'Hello World' >example.txt"
            }
          },
          {
            "file": {
              "source": "example.txt",
              "destination": "/tmp/example.txt"
            }
          },
          {
            "remote-exec": {
              "inline": ["sudo install-something -f /tmp/example.txt"]
            }
          }
        ]
      }
    }
  }
}

Here provisioner is an array of maps containing three elements.

Terraform suggests the following HCL to match the JSON:

resource "aws_instance" "example" {

  provisioner "local-exec" {
    command = "echo 'Hello World' >example.txt"
  }
  provisioner "file" {
    source      = "example.txt"
    destination = "/tmp/example.txt"
  }
  provisioner "remote-exec" {
    inline = [
      "sudo install-something -f /tmp/example.txt",
    ]
  }

}

2.1) Anonymous Parsing

If we don’t know the struct type of provisioner and need to parse the array data anonymously, the above HCL is inconsistent with most other cases, where data enclosed within curly brackets { and } is always interpreted as a map.

determined converts it into the following HCL, using square brackets [ and ] as array markers for the provisioner body:

resource "aws_instance" "example" {
  provisioner = [
    {
      local-exec = {
        command = "echo 'Hello World' >example.txt"
      }
    },
    {
      file = {
        source      = "example.txt"
        destination = "/tmp/example.txt"
      }
    },
    {
      remote-exec = {
        inline = [
          "sudo install-something -f /tmp/example.txt"
        ]
      }
    }
  ]
}

If we convert this HCL string back to JSON using determined, we will retrieve the original JSON.

Therefore, we propose a rule in HCL:

A sorted array should be enclosed in square brackets if it is parsed anonymously in HCL.

(Note that in the current HCL parsing package, data blocks within square brackets must be assigned with the equal sign =.)
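To make the proposed rule concrete, here is a small Python sketch of an anonymous renderer: maps become curly-bracket bodies, arrays become square-bracket lists, and every nested value is assigned with =. The function to_hcl is hypothetical and heavily simplified. It is not determined's implementation, and it assumes that every key is a valid HCL identifier and that strings contain no ${ template sequences.

```python
import json


def to_hcl(value, indent=0):
    """Render a JSON-like value as HCL-style text (simplified sketch)."""
    pad = "  " * indent
    inner = "  " * (indent + 1)
    if isinstance(value, dict):
        # Maps: curly brackets, one "key = value" line per entry.
        lines = [f"{inner}{k} = {to_hcl(v, indent + 1)}" for k, v in value.items()]
        return "{\n" + "\n".join(lines) + "\n" + pad + "}"
    if isinstance(value, list):
        # Arrays: square brackets, elements separated by commas, order preserved.
        items = [f"{inner}{to_hcl(v, indent + 1)}" for v in value]
        return "[\n" + ",\n".join(items) + "\n" + pad + "]"
    # Scalars: reuse JSON literals ("text", 42, true, null).
    return json.dumps(value)


provisioners = [
    {"local-exec": {"command": "echo 'Hello World' >example.txt"}},
    {"file": {"source": "example.txt", "destination": "/tmp/example.txt"}},
]
text = "provisioner = " + to_hcl(provisioners)
print(text)
```

Because arrays and maps use different brackets in the output, a parser with no knowledge of the provisioner type can tell them apart and restore the original JSON structure.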

2.2) Struct Parsing

If provisioner is a defined struct, as in Terraform, a data block like

provisioner "local-exec" {
  command = "echo 'Hello World' >example.txt"
}
provisioner "file" {
  source      = "example.txt"
  destination = "/tmp/example.txt"
}
...

will be parsed as a map of structs, or as an array of structs with one label, depending on how the outer struct defines it in determined.
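The two target shapes can be sketched in Python as follows. This only illustrates the resulting data structures, not determined's Go API. The field name type used to hold the label in the array form is a hypothetical choice.

```python
# Labeled blocks as (label, body) pairs, in source order.
blocks = [
    ("local-exec", {"command": "echo 'Hello World' >example.txt"}),
    ("file", {"source": "example.txt", "destination": "/tmp/example.txt"}),
]


def as_map(blocks):
    """Map of structs: the label becomes the map key."""
    return {label: body for label, body in blocks}


def as_array(blocks, label_field="type"):
    """Array of structs: the label is stored in a field of each element."""
    return [{label_field: label, **body} for label, body in blocks]


by_label = as_map(blocks)
in_order = as_array(blocks)
print(by_label)
print(in_order)
```

The map form gives direct lookup by label but keeps only one block per label. The array form keeps source order and allows repeated labels, which matters when blocks such as provisioners must run in sequence.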

Enjoy determined!
