Developing a public Terraform provider — Part 3: data representation

Steve Strutt
IBM Cloud Infrastructure as Code
7 min read · Mar 25, 2019

Resource data representation in Terraform

This is Part 3 in a series on writing Terraform providers. In this post I look at the data representation format used by Terraform and its relationship to HCL, the cloud resource API and the schema. The previous post looked at the structure of a provider.

Where do you start writing a Terraform resource provider? In my experience, it starts with the REST API for the resource. Only once the developer understands the API (the functions it implements, the data passed and returned, and how it behaves) can they move on to writing the provider that configures the resource. But before I look at the client API in this series, it is necessary to take a deeper look at the provider data model and how it relates to the client API.

Photo by John Barkiple on Unsplash

The first time I wrote a provider, I followed the section on Writing Custom Providers and built the example. It is purposely a simple example to get started, but the doc left me with unanswered questions about what the other elements of a provider are and how to interact with the client API. The recommendation in Next Steps is to reference the source code of the internal providers for examples. This was quite a leap and required a lot of code reading and experimentation. To help bridge this gap, I will dig deeper into the internals of a resource provider and its data model in the next few posts.

What makes up a provider?

From a 60,000 ft viewpoint, a provider implements Create, Read, Update and Delete (CRUD) functions to configure cloud resources. This is the starting point for understanding what functions a provider implements. It is also where the Terraform documentation starts in Writing Custom Providers.

But how are these functions implemented? The hint is the documentation on Terraform Schemas. A large part of a provider is about data transformation to and from the data model defined by the schema. With the cloud API interaction code isolated to the client library package, most of the logic and coding of a provider is data transformation as input to and output from the client API CRUD functions.

I find visual representations easier to understand. The following schematic is a simplified illustration of the Terraform provider data structures, data flows and transformations.

On an apply, resource configuration parameters go through several transformations: from HCL to Terraform’s internal representation, then to native Go data structures, and finally to JSON to be passed to the cloud service’s REST API. JSON data read from the cloud REST API, describing the resulting resource configuration, follows the reverse path before being written out as JSON in the Terraform state file.

Central to these transformations is the Terraform resource schema. It defines how resource attribute data will be represented internally within Terraform as key value pairs and parsed from HCL. This key value pair format is also how Terraform core expects to receive state information from the client API library. As Terraform is written in Go, an additional transformation is introduced. Data is passed between the provider package and client library functions as Go structs.

The general Terraform execution flow for a Create or Update operation is shown below:

HCL > Parsed > Internal Representation > Expand > native Go struct > Marshal > API JSON

Similarly, the execution flow for a Read operation:

API JSON > Unmarshal > native Go struct > Flatten > Internal Representation > State JSON

Marshal and Unmarshal are the functions from Go’s standard encoding/json package, used in the Go REST client libraries to convert native Go structs to JSON and back.
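To make the Marshal and Unmarshal steps concrete, here is a minimal sketch using Go’s standard encoding/json package. The Pool and Origin structs, their JSON field names and the example address are hypothetical, loosely modelled on the origin pool example later in this post. They are not the actual IBM Cloud client library types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Origin and Pool are hypothetical client-library structs. The json tags
// name each field as it would appear in the API JSON.
type Origin struct {
	Name    string `json:"name"`
	Address string `json:"address"`
	Enabled bool   `json:"enabled"`
}

type Pool struct {
	Name         string   `json:"name"`
	CheckRegions []string `json:"check_regions"`
	Origins      []Origin `json:"origins"`
}

// toJSON marshals a native Go struct into API JSON (Create/Update path).
func toJSON(p Pool) (string, error) {
	b, err := json.Marshal(p)
	return string(b), err
}

// fromJSON unmarshals API JSON back into a native Go struct (Read path).
func fromJSON(s string) (Pool, error) {
	var p Pool
	err := json.Unmarshal([]byte(s), &p)
	return p, err
}

func main() {
	p := Pool{
		Name:         "lon02",
		CheckRegions: []string{"WEU"},
		Origins:      []Origin{{Name: "lon02", Address: "lb.example.com", Enabled: true}},
	}

	body, err := toJSON(p)
	if err != nil {
		panic(err)
	}
	fmt.Println(body)

	back, err := fromJSON(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(back.Origins[0].Address)
}
```

In a real client library these calls are usually hidden inside the functions that send and receive HTTP requests, so the provider code only ever sees the Go structs.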

Expand and Flatten are Terraform data transformations implemented in the provider by the developer. They convert between the HCL config parameters, represented as flat key value pairs, and the native Go structs required by the client API library. These functions are briefly introduced in Implementing a more complex Read.

The provider CRUD implementations consist of a series of expanders or flatteners for each of the resource configuration parameters and objects. These ‘expand’ the key value pair data Terraform parsed from the HCL, into arrays/slices in native Go structs to pass to the client API library functions and similarly ‘flatten’ on return. A large part of the developer’s role is to create the flatteners and expanders that are specific to the resource’s configuration parameters. More on flatteners and expanders in the next post.
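As a rough illustration of what an expander and flattener pair looks like, the sketch below uses only the Go standard library. The Origin struct and the function names expandOrigins and flattenOrigins are hypothetical, and the generic []interface{} of maps is an assumed stand-in for the values a provider reads from and writes to Terraform’s resource data. A real provider would use the Terraform schema package here.

```go
package main

import "fmt"

// Origin is a hypothetical client-library struct.
type Origin struct {
	Name    string
	Address string
	Enabled bool
}

// expandOrigins converts generic values (modelled as a slice of maps) into
// typed structs for the client API library. The type assertions panic if a
// field is missing or has an unexpected type, so this sketch assumes the
// data has already been validated against the schema.
func expandOrigins(raw []interface{}) []Origin {
	origins := make([]Origin, 0, len(raw))
	for _, item := range raw {
		m := item.(map[string]interface{})
		origins = append(origins, Origin{
			Name:    m["name"].(string),
			Address: m["address"].(string),
			Enabled: m["enabled"].(bool),
		})
	}
	return origins
}

// flattenOrigins does the reverse: typed structs returned by the client
// library become generic maps keyed by attribute name.
func flattenOrigins(origins []Origin) []interface{} {
	flat := make([]interface{}, 0, len(origins))
	for _, o := range origins {
		flat = append(flat, map[string]interface{}{
			"name":    o.Name,
			"address": o.Address,
			"enabled": o.Enabled,
		})
	}
	return flat
}

func main() {
	raw := []interface{}{
		map[string]interface{}{"name": "lon02", "address": "lb.example.com", "enabled": true},
	}
	origins := expandOrigins(raw)
	fmt.Printf("%+v\n", origins)
	fmt.Println(flattenOrigins(origins))
}
```

Each nested object or array in the resource typically gets its own pair of functions like these, which is why they make up so much of a provider’s code.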

Terraform’s flat data representation

One of the aspects of Terraform that initially surprised me was that resource and state data is stored in a ‘flat’ form. All Terraform resource attribute data is stored internally as strings in flat key value pairs. There are no nested structures. JSON Strings, Ints and Bools easily transform to key value pairs. Object lists and arrays in API JSON or HCL must be ‘flattened’ into multiple key value pairs as in the following example.

"check_regions": [ "WEU", "EEU" ]

check_regions.# = 2
check_regions.1997072153 = WEU
check_regions.3284254904 = EEU

Each resource attribute in the HCL array becomes a key value pair, with a unique key being assigned for each value. The type of key used depends on how the data array is described in the schema as a List, Set or Map. The example here is a ‘Set’, where a unique numeric key (ten digits in this example) is assigned to each value. I will look at Lists, Sets and Maps in more detail in a future post, along with their associated flatteners and expanders.
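The difference between list-style and set-style keys can be sketched in a few lines of Go. The function names flattenList and flattenSet are hypothetical, and the CRC32 hash is an assumption used only to produce a numeric key per value. The keys it generates should not be expected to match the ones shown above.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"sort"
	"strconv"
)

// flattenList writes a list of strings as flat key value pairs keyed by
// position, plus a "<name>.#" entry holding the element count.
func flattenList(name string, values []string) map[string]string {
	flat := map[string]string{name + ".#": strconv.Itoa(len(values))}
	for i, v := range values {
		flat[name+"."+strconv.Itoa(i)] = v
	}
	return flat
}

// flattenSet keys each value by a numeric hash of the value itself, so
// ordering does not matter and duplicate values collapse into one entry.
// CRC32 is an assumed stand-in for whatever hash the schema specifies.
func flattenSet(name string, values []string) map[string]string {
	flat := map[string]string{}
	for _, v := range values {
		key := strconv.FormatUint(uint64(crc32.ChecksumIEEE([]byte(v))), 10)
		flat[name+"."+key] = v
	}
	// Count the distinct elements before adding the count entry itself.
	flat[name+".#"] = strconv.Itoa(len(flat))
	return flat
}

// printSorted prints the pairs in key order for readable output.
func printSorted(flat map[string]string) {
	keys := make([]string, 0, len(flat))
	for k := range flat {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s = %s\n", k, flat[k])
	}
}

func main() {
	regions := []string{"WEU", "EEU"}
	printSorted(flattenList("check_regions", regions))
	printSorted(flattenSet("check_regions", regions))
}
```

With list keys, reordering the values changes which key holds which value. With set keys, the same value always maps to the same key regardless of position.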

To a Terraform user, this flattened format is most evident when using interpolation with resource attributes or configuring outputs, though a user would be forgiven for thinking that they are accessing a native Go data structure using array/slice syntax.

${data.ibm_subnet.example.0.cidr_block}

In this example the schema defines that the subnet resource has one or more CIDR blocks in a ‘List’ format, accessed using a numeric key. The best way to gain an appreciation of the native schema format is to list the Terraform state file with `terraform show`. The example here is the IBM Cloud Internet Services (CloudFlare) origin pool resource with a single item in the flattened array of origin objects.

ibm_cis_origin_pool.lon:
id = 12e…..c4a5::
check_regions.# = 1
check_regions.1997072153 = WEU
cis_id = crn…..c4a5::
description = LON pool
enabled = true
minimum_origins = 1
monitor = ad9…c4a5::
name = lon02
notification_email =
origins.# = 1
origins.3284254904.address = lb1-1530547-lon02.lb.bluemix.net
origins.3284254904.enabled = true
origins.3284254904.name = lon02
origins.3284254904.weight = 0
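To see how the flat keys in this listing relate back to nested objects, the following sketch regroups the `origins.<key>.<field>` entries by element key. The function name groupElements is hypothetical, and this is only an illustration of the key format, not how Terraform itself reads state.

```go
package main

import (
	"fmt"
	"strings"
)

// groupElements collects the flat keys "<attr>.<key>.<field>" for one
// attribute into a nested map of element key -> field -> value.
// The "<attr>.#" count entry and simple values without a field are skipped.
func groupElements(flat map[string]string, attr string) map[string]map[string]string {
	out := map[string]map[string]string{}
	prefix := attr + "."
	for k, v := range flat {
		if !strings.HasPrefix(k, prefix) {
			continue
		}
		parts := strings.SplitN(strings.TrimPrefix(k, prefix), ".", 2)
		if len(parts) != 2 {
			continue // count entry ("#") or a simple value with no field
		}
		if out[parts[0]] == nil {
			out[parts[0]] = map[string]string{}
		}
		out[parts[0]][parts[1]] = v
	}
	return out
}

func main() {
	// A subset of the flat state shown above.
	state := map[string]string{
		"name":                       "lon02",
		"origins.#":                  "1",
		"origins.3284254904.address": "lb1-1530547-lon02.lb.bluemix.net",
		"origins.3284254904.enabled": "true",
		"origins.3284254904.name":    "lon02",
		"origins.3284254904.weight":  "0",
	}
	for key, fields := range groupElements(state, "origins") {
		fmt.Println(key, fields)
	}
}
```

Note that every value, including the boolean and the integer, is held as a string in the flat form.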

Provider data representations

How do HCL, API JSON, the internal format and the schema relate? To illustrate the transformations and their relationship to the schema, I’ve again used the example of the IBM Cloud Internet Services origin pool resource. I have purposely omitted some parameters and fields to emphasize the commonality between the three data formats and the schema. In practice the schema, HCL and internal representation are all derived from the API JSON.

The API JSON is determined by the developer of the cloud API. It is a fixed starting point for the provider developer to define the schema and other representations. If you compare columns one and four, you can see that for this resource the HCL maps directly onto the client API JSON data structure. In fact, there is a one to one correspondence.

The HCL could be structured and named differently to the API, but I’ve found the IBM Cloud API JSON to be well structured and the one to one mapping reduces the effort of writing the provider. For IBM Cloud, the CLI and UI interfaces are implemented using the API, so the parameter names and structure are already familiar to users and this is carried over into the HCL.

There is a similar correspondence between the client API JSON and the Terraform key value pair representation in column three for the simple data types: string, int and bool. The complex JSON data types of nested arrays and objects (lists) are represented in their ‘flattened’ format (Set, List, Map). This can be seen in column three for the array of ‘origin’ JSON objects. In column two the origin object is defined as schema.TypeSet, which determines the flattening in ‘set’ format in column three. The type assignment for the array is important as it determines the resource configuration actions Terraform takes if any of the parameters in the HCL array are changed. More in the next post.

So where does the developer start?

For the provider developer, the schema and data representation touch all aspects of provider development. This goes from defining the HCL syntax, the coding of expanders and flatteners, right through to the specification of provider unit/acceptance tests.

The schema is central to writing a provider but, as I hope I have illustrated, the client API and library come first. Once the client API JSON data structures have been codified as native Go structs in the client library, the schema can be defined, and the required flatteners and expanders written.

Before getting to the client API library, I will continue this top down exploration of the Terraform provider. In the next post I will take a deeper look at the resource schema and its role in the Terraform user experience.
