Developing a public Terraform provider — Part 4: resource schema

Defining resources and behaviours with the resource schema

Steve Strutt
IBM Cloud Infrastructure as Code
May 19, 2019


What attributes and input values does the provider accept? Can Terraform modify a resource attribute using the cloud service API, or must it delete and recreate the resource? In this post I look at the role of the provider schema in the Terraform user experience, and at customizing its behavior for the cloud provider under development.


Because Terraform core is intentionally vendor agnostic, it has no built-in understanding of the internal implementation of a cloud resource or its API operations. There is no UI to guide the user writing configs, so it is the provider schema that tells Terraform how to validate user input. The schema also determines the actions performed (behavior) via the service API when attribute updates are requested, or when the provisioned resource is found to have changed. As before, the schema definition starts with the resource API.

The following is an example of the documentation for the IBM Cloud Internet Services API to create a new origin pool. A Swagger/OpenAPI definition could likewise be used as the source reference.

It is a typical JSON data structure with strings, bools, ints and arrays. The schema is essentially an enriched version of this data structure that determines how Terraform interprets the config as input to the API. Here is the CIS provider schema equivalent.

The JSON key-value pairs are represented in Terraform as schema `Types`. The schema also defines additional `Behaviors` that are not represented in the API data structure. Some behaviors are self-evident: `ValidateFunc`, `Required`, `Optional` and `Default`. Note that the example used here is the first resource I wrote. It falls short of best practice because validation is largely omitted.
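The following toy model in Go shows what "an enriched data structure" means in practice. It is written without the Terraform SDK so that it runs on its own. The `attrSchema` type is a hypothetical stand-in for a few fields of `schema.Schema`, and the attribute list is an assumed subset of the origin pool fields. `applySchema` roughly mimics what the schema lets Terraform do before anything reaches the API: it rejects undeclared attributes, flags missing `Required` ones and fills in `Default` values.

```go
package main

import (
	"fmt"
	"sort"
)

// valueType is a hypothetical stand-in for the SDK's schema.ValueType
// (schema.TypeString, schema.TypeBool, ...), so this sketch needs no SDK.
type valueType int

const (
	typeString valueType = iota
	typeBool
	typeInt
	typeSet
)

// attrSchema mirrors a small subset of schema.Schema's fields.
// Type is recorded but not checked here, for brevity.
type attrSchema struct {
	Type     valueType
	Required bool
	Optional bool
	Default  interface{}
}

// originPoolSchema is an assumed subset of the CIS origin pool attributes.
var originPoolSchema = map[string]attrSchema{
	"name":            {Type: typeString, Required: true},
	"check_regions":   {Type: typeSet, Required: true},
	"description":     {Type: typeString, Optional: true},
	"enabled":         {Type: typeBool, Optional: true, Default: true},
	"minimum_origins": {Type: typeInt, Optional: true, Default: 1},
}

// applySchema rejects undeclared attributes, reports missing required
// attributes and fills in defaults for unset optional attributes.
func applySchema(s map[string]attrSchema, config map[string]interface{}) (map[string]interface{}, []error) {
	var errs []error

	// Reject attributes the schema does not declare (sorted for stable output).
	var unknown []string
	for k := range config {
		if _, ok := s[k]; !ok {
			unknown = append(unknown, k)
		}
	}
	sort.Strings(unknown)
	for _, k := range unknown {
		errs = append(errs, fmt.Errorf("%q: attribute not defined in schema", k))
	}

	// Visit declared attributes in a stable order.
	names := make([]string, 0, len(s))
	for k := range s {
		names = append(names, k)
	}
	sort.Strings(names)

	out := make(map[string]interface{})
	for _, k := range names {
		v, set := config[k]
		switch {
		case set:
			out[k] = v
		case s[k].Required:
			errs = append(errs, fmt.Errorf("%q: required field is not set", k))
		case s[k].Default != nil:
			out[k] = s[k].Default
		}
	}
	return out, errs
}

func main() {
	resolved, errs := applySchema(originPoolSchema, map[string]interface{}{
		"name":   "web-pool",
		"colour": "blue", // not in the schema
	})
	fmt.Println(resolved)
	for _, e := range errs {
		fmt.Println("error:", e)
	}
}
```

In a real provider the same information lives in a `map[string]*schema.Schema`, and Terraform core performs these checks for you.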

Array handling introduces new terms. Here `check_regions` is defined as `TypeSet`, an unordered collection of string elements. TypeSet is one of a group of types known as ‘aggregate types’, which tell Terraform how to handle arrays of attributes. More on this later.

As the documentation covers Types and Behaviors in detail, I will limit myself to covering observations from developing the provider for IBM Cloud resources.

Resource Attributes

At a high level, the schema parameterizes the configuration attributes accepted by the cloud service API. Each attribute is defined as a key-value pair that specifies three things:

- its data type
- any input validation to be performed
- whether it is a required input parameter or a read-only output value

Schema Types

Attributes can be one of seven types: four primitive (`TypeBool`, `TypeInt`, `TypeFloat`, `TypeString`) and three aggregate (`TypeList`, `TypeSet`, `TypeMap`).

From the docs: ‘Primitive types are simple values such as integers, booleans, and strings’. They behave as we would expect in any programming language.

‘Aggregate types form more complicated data types by combining primitive types’. Again these are familiar: collections of values or objects, as lists, sets and maps. I cover the behaviors of aggregate types in more detail later, as I didn’t find them obvious from the documentation.

Schema Behaviours

Behavior fields determine how Terraform responds at plan or apply time to attribute input values or changes. Several additional behaviors, such as `ConflictsWith` and `Sensitive`, are documented in the schema helper library. I strongly recommend going to the code, as it documents far more about writing providers than you will find on the website. The other great source of information is the issues in the Terraform GitHub repo.

The flags `Required` and `Optional` are self-describing for input fields. `Computed` is the term used to describe an output field, where a value is returned by the cloud API. `Default` provides a default value if one is not specified.

One non-obvious case was an optional input field for which the API always returns an output value. The API may apply an internal default, or it may support two mutually exclusive attributes that input the same data. The solution I found was `Optional: true` together with `Computed: true`.

Without the computed flag, a plan run after an apply always reports a diff. Strictly speaking that is correct: the config specifies no value while the state holds the one returned by the API. But it is not helpful. A `DiffSuppressFunc` is another way of handling this.
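The perpetual diff is easier to see with a deliberately simplified model. `hasDiff` below is not Terraform's diff algorithm. It only captures the decision described above for a single optional string attribute. `suppressWhenUnset` is a hypothetical function that takes the first three parameters of the SDK's `DiffSuppressFunc`; the real signature also receives `*schema.ResourceData`.

```go
package main

import "fmt"

// attr holds the two schema flags this sketch cares about.
type attr struct {
	Optional bool
	Computed bool
}

// hasDiff is a simplified model of the plan decision for one attribute.
// configSet reports whether the user wrote the attribute in the config.
func hasDiff(a attr, configSet bool, configVal, stateVal string) bool {
	if !configSet {
		// With Computed, an unset config value means "accept whatever the
		// API returned", so the value held in state is not a change.
		if a.Computed {
			return false
		}
		// Without Computed, the empty config disagrees with the value the
		// API returned and was saved to state.
		return stateVal != ""
	}
	return configVal != stateVal
}

// suppressWhenUnset mimics the first three parameters of a DiffSuppressFunc
// (k, old, new): returning true hides the diff when the config is empty.
func suppressWhenUnset(k, oldValue, newValue string) bool {
	return newValue == ""
}

func main() {
	state := "created by API" // default filled in by the API after apply
	fmt.Println(hasDiff(attr{Optional: true}, false, "", state))                 // true: perpetual diff
	fmt.Println(hasDiff(attr{Optional: true, Computed: true}, false, "", state)) // false: no diff
	fmt.Println(suppressWhenUnset("description", state, ""))                     // true: diff suppressed
}
```

The two approaches differ in scope. `Computed` changes how Terraform treats the attribute everywhere. A suppress function lets the provider decide case by case.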

Input Validation

A major function of the schema is to let Terraform catch attribute input errors early, before a plan or apply makes any calls to the cloud API. Validation is vital to the user’s experience of automating cloud infrastructure with Terraform, though it is not strictly essential. Because Terraform and providers are idempotent, errors can be corrected and the operation retried without undue effect. But as a user it’s annoying and time consuming when an apply fails due to a typo or an invalid character. I would rather find out earlier.

Input validation should be robust, but it is a double-edged sword, because cloud services and APIs change. Firstly, there is the developer overhead of keeping the provider’s validation routines up to date. Secondly, users have to wait for a new release of the provider before they can use newly supported values. Where change is frequent and expected, patterns are better than absolute values, as they allow for extension. As I’ve highlighted in previous posts, a detailed knowledge of the service API is essential.
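The following sketch compares a fixed list of values with a pattern. The region codes and the regular expression are hypothetical. The point is that a value added later by the service passes the pattern check without a provider release, while the fixed list rejects it.

```go
package main

import (
	"fmt"
	"regexp"
)

// knownRegions is an assumed snapshot of region codes at release time.
var knownRegions = map[string]bool{"WNAM": true, "ENAM": true, "WEU": true, "EEU": true}

// strictRegion accepts only values known when the provider was released.
func strictRegion(v string) bool { return knownRegions[v] }

// regionPattern describes the shape of a region code instead of listing
// every value: an upper-case letter followed by upper-case letters or '_'.
var regionPattern = regexp.MustCompile(`^[A-Z][A-Z_]*$`)

// patternRegion accepts any value with the expected shape.
func patternRegion(v string) bool { return regionPattern.MatchString(v) }

func main() {
	// Suppose the service later adds "SEAS"; "weu" is a typo.
	for _, v := range []string{"WNAM", "SEAS", "weu"} {
		fmt.Printf("%-5s strict=%v pattern=%v\n", v, strictRegion(v), patternRegion(v))
	}
}
```

The pattern still catches typos such as the lower-case `weu`. It cannot catch a well-formed but non-existent code, which the API will reject at apply time instead.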

In the absence of a UI, validation and the provider documentation are the user’s best friends. To handle the complex data types required, Terraform includes a set of built-in validation functions for many common cloud provider input parameters, such as CIDR formats, `IPRange` and `validateAllowedStringValue`. I found the latter particularly useful. If your API has custom attributes, you can write your own validation functions using the built-in functions as examples, or use a regular expression with `validateRegexp`.
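Validation functions share one shape. In helper/schema a `ValidateFunc` receives the raw value and the attribute key, and returns a slice of warnings and a slice of errors. The sketch below defines a local `validateFunc` type with that shape so it runs without the SDK. `allowedStrings` and `cidrNetwork` are hypothetical helpers in the spirit of `validateAllowedStringValue` and the CIDR validators, not their actual source. The monitor type values are made up for illustration.

```go
package main

import (
	"fmt"
	"net"
)

// validateFunc has the same shape as helper/schema's SchemaValidateFunc.
type validateFunc func(v interface{}, k string) (ws []string, errs []error)

// allowedStrings returns a validator that accepts only the listed values.
func allowedStrings(valid []string) validateFunc {
	return func(v interface{}, k string) ([]string, []error) {
		s, ok := v.(string)
		if !ok {
			return nil, []error{fmt.Errorf("%q: expected a string", k)}
		}
		for _, a := range valid {
			if s == a {
				return nil, nil
			}
		}
		return nil, []error{fmt.Errorf("%q must be one of %v, got %q", k, valid, s)}
	}
}

// cidrNetwork accepts values that net.ParseCIDR can parse.
func cidrNetwork(v interface{}, k string) ([]string, []error) {
	s, ok := v.(string)
	if !ok {
		return nil, []error{fmt.Errorf("%q: expected a string", k)}
	}
	if _, _, err := net.ParseCIDR(s); err != nil {
		return nil, []error{fmt.Errorf("%q: %q is not a valid CIDR: %v", k, s, err)}
	}
	return nil, nil
}

func main() {
	monitorType := allowedStrings([]string{"HTTP", "HTTPS", "TCP"})
	_, errs := monitorType("FTP", "type")
	fmt.Println(errs)
	_, errs = cidrNetwork("10.0.0.0/33", "subnet") // prefix length too long
	fmt.Println(errs)
}
```

Closing over the allowed values, as `allowedStrings` does, lets one helper serve every enumerated attribute in the provider.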

Lifecycle operations

During plan and apply, the schema determines how Terraform responds to attribute changes. These may be explicit user changes made via the source configuration, or detected drift of the cloud resources from the current configuration. Can an attribute be changed in place, or must the entire resource be deleted and recreated? This is defined in the schema.

Most APIs support modifying existing resources in place. Attributes that cannot be changed require the whole resource to be deleted and recreated. `ForceNew` exposes this requirement from the provider to Terraform, telling it to call the provider’s resource delete and create functions rather than update.
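The following is a simplified model of how `ForceNew` changes the plan. If any changed attribute is marked `ForceNew`, the resource is replaced. Otherwise the provider’s update function handles the change. The `forceNew` map and the attribute names are hypothetical, and this is not Terraform’s planner.

```go
package main

import (
	"fmt"
	"sort"
)

// forceNew lists attributes that, in this hypothetical schema, cannot be
// updated in place (the equivalent of ForceNew: true).
var forceNew = map[string]bool{"name": true}

// planAction returns the planned action and, for a replacement, the
// ForceNew attributes that triggered it.
func planAction(oldAttrs, newAttrs map[string]string) (string, []string) {
	keys := make(map[string]bool)
	for k := range oldAttrs {
		keys[k] = true
	}
	for k := range newAttrs {
		keys[k] = true
	}
	changed := false
	var triggers []string
	for k := range keys {
		if oldAttrs[k] == newAttrs[k] {
			continue
		}
		changed = true
		if forceNew[k] {
			triggers = append(triggers, k)
		}
	}
	sort.Strings(triggers)
	switch {
	case len(triggers) > 0:
		return "replace", triggers // provider Delete, then Create
	case changed:
		return "update", nil // provider Update
	default:
		return "no-op", nil
	}
}

func main() {
	current := map[string]string{"name": "pool-a", "description": "old"}
	fmt.Println(planAction(current, map[string]string{"name": "pool-a", "description": "new"}))
	fmt.Println(planAction(current, map[string]string{"name": "pool-b", "description": "old"}))
}
```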

Changes to standalone independent attributes are relatively easy for the code to identify and action. More challenging are changes in nested attribute groups, lists and repeating groups of attributes, such as firewall rules. Can each attribute in a rule be changed separately, or must the entire rule be deleted and recreated? Is the order of a list important or not?

For these, Terraform uses aggregate types.

Aggregate Types

A word of warning: HCL2 and Terraform 0.12.x introduce a richer type system in this area of aggregate types. Many of the observations here apply to 0.11.x releases. See Terraform 0.12 Compatibility for Providers for some discussion of the changes. I also recommend looking at the issues in the Terraform GitHub repo.

One of the important roles of aggregate types is to tell Terraform how to store and access attribute data in the state file. This covers what the index to an attribute is. It also covers whether a single attribute can be accessed directly, or whether the attributes have to be treated as a block for update or creation purposes. There are frequently multiple ways to represent attributes. The choice depends on the simplicity of the provider code and the restrictions of 0.11.x releases. I suggest freely plagiarising from the many examples on GitHub.

TypeMap

This is a key based map (also known as a dictionary) with string keys and values. TypeMap items are stored in state with the key as the index.
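In 0.11.x state files, attributes are stored as a flat map of strings. For a TypeMap, that means a `<name>.%` entry holding the element count, plus one `<name>.<key>` entry per item, so the map key is the index. `flattenMap` below is a minimal sketch of that layout, not code taken from Terraform. The `tags` attribute is hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// flattenMap renders a map attribute in the flattened state layout:
// "<name>.%" holds the count, "<name>.<key>" holds each value.
func flattenMap(name string, m map[string]string) map[string]string {
	out := map[string]string{name + ".%": strconv.Itoa(len(m))}
	for k, v := range m {
		out[name+"."+k] = v
	}
	return out
}

func main() {
	flat := flattenMap("tags", map[string]string{"env": "prod", "team": "network"})
	keys := make([]string, 0, len(flat))
	for k := range flat {
		keys = append(keys, k)
	}
	sort.Strings(keys) // print in a stable order
	for _, k := range keys {
		fmt.Printf("%s = %q\n", k, flat[k])
	}
}
```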

TypeMap is reasonably self-evident in terms of the attributes it can be applied to. However, in 0.11.x releases it supports only primitive types and does not allow nesting of aggregate types. In 0.12.x nested aggregate types are supported. Check the issues log in the Terraform GitHub repo for more details about the changes to TypeMap.

TypeList

TypeList is used to represent an ordered collection of items, where the order of the items can affect the behavior of the resource being modelled. An example of ordered items would be network routing rules, where rules are examined in the order they are given until a match is found.

TypeList ensures that Terraform preserves this structure in strict order when it is stored and retrieved, and that the provider updates the entire list as a whole when any item changes. I’ve found that TypeList tends to be needed only for older devices or services, or for short lists. If the list is likely to be long, the service API itself usually supports indexing, and each element can be updated independently. In that case the simpler TypeSet attribute can be used as an unordered collection. TypeList items are stored in state with a zero-based integer index.
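A TypeList is indexed by position. In 0.11.x state, `rules.#` holds the count and `rules.0.action`, `rules.1.action` and so on hold the elements. As a result, the same rules in a different order produce a diff. `rule` and `listChanges` are hypothetical names for this sketch, which compares two lists position by position.

```go
package main

import "fmt"

// rule is a hypothetical ordered routing/firewall rule.
type rule struct {
	Action string
	CIDR   string
}

// listChanges returns every index whose element differs between the two
// lists, including indexes present in only one of them.
func listChanges(oldRules, newRules []rule) []int {
	n := len(oldRules)
	if len(newRules) > n {
		n = len(newRules)
	}
	var changed []int
	for i := 0; i < n; i++ {
		inOld, inNew := i < len(oldRules), i < len(newRules)
		if inOld != inNew || (inOld && oldRules[i] != newRules[i]) {
			changed = append(changed, i)
		}
	}
	return changed
}

func main() {
	current := []rule{{"allow", "10.0.0.0/8"}, {"deny", "0.0.0.0/0"}}
	reordered := []rule{{"deny", "0.0.0.0/0"}, {"allow", "10.0.0.0/8"}}
	fmt.Println(listChanges(current, reordered)) // [0 1]
}
```

For ordered rules this is the desired behavior: swapping allow and deny changes what the rules do, so the provider must send the whole list again.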

TypeList with nested element schema

TypeSet

TypeSet is used to represent an unordered collection of items and implements mathematical set behaviour. The set helper functions `Difference`, `Intersection` and `Union` can be used in the provider code to create new sets and identify differences. With sets, the ordering of the items is unimportant and has no impact on the behavior of the resource. The elements of a set can be any of the other types allowed by Terraform, including a nested schema.

TypeSet items are stored in state with an index value calculated from a hash of all of the element’s attributes. If anything about an element changes, be it a primitive type or a complex aggregate, a new hash is generated for the element. Terraform treats it as a completely different element: the old version of the element has to be deleted and the new version created. An example is a whitelist of IP addresses. The `Difference` function can be used to determine both the IPs that have been added to the set and those to be removed. The provider then adds and deletes the IPs individually.
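Here is a sketch of the whitelist example. Elements are keyed by a hash, as a TypeSet is in state. The SDK’s `schema.HashString` is also CRC-32 based. However, `hashString`, `set` and `difference` here are local stand-ins rather than the SDK’s `*schema.Set` methods, and hash collisions are ignored. Taking the difference in both directions gives the IPs to add and the IPs to delete, regardless of order.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"sort"
)

// hashString derives a set index from an element's value (CRC-32 here).
func hashString(s string) uint32 {
	return crc32.ChecksumIEEE([]byte(s))
}

// set stores elements keyed by their hash, like a TypeSet in state.
type set map[uint32]string

func newSet(items ...string) set {
	s := set{}
	for _, it := range items {
		s[hashString(it)] = it
	}
	return s
}

// difference returns the elements of a whose hash is not in b, sorted.
func difference(a, b set) []string {
	var out []string
	for h, v := range a {
		if _, ok := b[h]; !ok {
			out = append(out, v)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	oldIPs := newSet("192.0.2.10", "192.0.2.11", "198.51.100.7")
	newIPs := newSet("198.51.100.7", "192.0.2.12", "192.0.2.10") // order is irrelevant
	fmt.Println("add:", difference(newIPs, oldIPs))    // add: [192.0.2.12]
	fmt.Println("remove:", difference(oldIPs, newIPs)) // remove: [192.0.2.11]
}
```

A provider would then call its (hypothetical) add and delete API operations once per IP in each list. Editing an IP in place shows up in both lists, because the edited value has a new hash.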

TypeSet with nested schema

So where does the developer start?

When it comes to defining the schema, the cloud service API is again the starting point. The schema enriches the raw API parameters with behaviors that affect both the user experience and how Terraform interacts with the provider.

We are getting closer to the service API and client library. But first, in the next post I will take a deeper look at the implementation of the Create, Read, Update and Delete functions for a resource, and at the role of flatteners and expanders.
