When to allow your environment to drift, and other edge cases for Terraform

Published in Google Cloud - Community · 6 min read · Nov 21, 2022

Terraform, the industry standard for provisioning infrastructure in the cloud, is great for many reasons: it’s open source, cloud agnostic, and enables single-click instantiation and immutable infrastructure. That brings us to the problem at hand, which goes hand in hand with one of Terraform’s key benefits, immutability: environment drift.

Let’s start by talking about environment drift. Environment drift is the phenomenon where the infrastructure code and the real-time state of the environment do not match. If you work in the cloud-native space you know this is frowned upon, and rightly so, because it goes against the cloud-native principle that your infrastructure code should be the single source of truth for your environment. But are there cases where drift is acceptable?

Let’s look at a scenario where we wouldn’t want environment drift to be corrected. Let’s say our infrastructure code defines 3 labels: LOB name, Cost centre and Billing ID. Imagine that you are running a self-serve platform on a public cloud and use these labels to charge the appropriate department for its cloud consumption. These labels are persistent and are created as part of the automated infrastructure instantiation pipelines. They are therefore tracked in the “terraform.tfstate” file, which is Terraform’s map of real-world resources to your configuration and also keeps track of metadata; in simpler terms, it captures the state your environment was in when Terraform “apply” or “refresh” was last run. If one of these labels’ key-value pairs were updated from the console or CLI, the environment would no longer match “terraform.tfstate”, causing a drift.

As a result, if you re-run Terraform for the environment, it will rewrite these values to match the infrastructure code. This is one of Terraform’s strongest abilities and a reason it is so widely adopted. It also brings us to the edge case: let’s say a re-org happens (it does happen more than we’d like, right?) and the Cost centre and Billing ID change as a result. This is a critical update that determines who gets billed for the respective resources, and it can be time sensitive if your billing cycle is ending soon. Another factor to consider is that these changes might be frequent.

We have 2 options: update the Terraform code to reflect the change, or update the key-value pair via the console or CLI.

Updating the values outside of Terraform might be preferable in scenarios like the above, where getting the code updated is an overhead and/or the people who manage the Cost centre and Billing IDs want the freedom to manage these labels in the cloud without engaging the team that manages Terraform.

Luckily, Terraform anticipated this edge case and has a feature built for it.

Enter (drum roll, please) “The lifecycle Meta-Argument”. lifecycle is a nested block that can appear within a resource block. The lifecycle block and its contents are meta-arguments.

Note: It is available for all resource blocks regardless of type.

The argument within the lifecycle block that is specific to our edge case here is the ignore_changes meta-argument.

“By default, Terraform detects any difference in the current settings of a real infrastructure object and plans to update the remote object to match configuration.

The ignore_changes meta-argument is intended to be used when a resource is created with references to data that may change in the future, but should not impact said resource after its creation. In some rare cases, settings of a remote object are modified by processes outside of Terraform, which Terraform would then attempt to "fix" on the next run. In order to make Terraform share management responsibilities of a single object with a separate process, the ignore_changes meta-argument specifies resource attributes that Terraform should ignore when planning updates to the associated remote object.

The arguments corresponding to the given attribute names are considered when planning a create operation, but are ignored when planning an update. The arguments are the relative address of the attributes in the resource. Map and list elements can be referenced using index notation, like tags["Name"] and list[0] respectively.” (from the Terraform documentation)

The text above means that attributes listed in the ignore_changes meta-argument will still be created on “terraform apply” and removed on “terraform destroy”, but will not be updated (rewritten or “fixed”) on subsequent “terraform apply” runs. This solves our problem exactly. Whenever a business event leads to a change in the Cost centre and Billing ID, the associated FinOps team(s) can update the key-value pairs of these labels without having to worry about Terraform and the associated delay(s).

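In practice, this means adding a lifecycle block to the resource that carries the labels, with ignore_changes listing the two label keys that the FinOps team manages.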
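A minimal sketch of such a configuration, assuming the labels live on a Compute Engine instance. The resource name, machine type, zone, image and label values are hypothetical placeholders, and how label drift is detected may vary between versions of the Google provider:

```hcl
# Hypothetical instance carrying the three billing labels.
resource "google_compute_instance" "app_server" {
  name         = "app-server"
  machine_type = "e2-medium"
  zone         = "us-central1-a"

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-12"
    }
  }

  network_interface {
    network = "default"
  }

  # These values are applied when the instance is created.
  labels = {
    lob_name    = "retail"
    cost_centre = "cc-1001"
    billing_id  = "bill-42"
  }

  lifecycle {
    # After creation, changes made to these two label values outside
    # Terraform are left alone on later plans; lob_name stays managed.
    ignore_changes = [
      labels["cost_centre"],
      labels["billing_id"],
    ]
  }
}
```

Listing individual map keys rather than the whole labels attribute keeps the ignored surface small: Terraform still corrects drift on every other label.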

The other arguments available within a lifecycle block are create_before_destroy, prevent_destroy and replace_triggered_by. Each of these arguments suits its own edge cases. Let’s look at them at a high level.

create_before_destroy: By default, when Terraform must change a resource argument that cannot be updated in-place due to remote API limitations, Terraform instead destroys the existing object and then creates a new replacement object with the new configured arguments. The create_before_destroy meta-argument changes this behaviour so that the new replacement object is created first, and the prior object is destroyed after the replacement is created. This can be useful for edge cases where we do not want downtime, i.e. we always want a copy of the resource to be running.
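As an illustration, here is a minimal sketch using an instance template, a resource that cannot be updated in place. All names and values are hypothetical placeholders:

```hcl
# Hypothetical instance template; any change to it forces a replacement.
resource "google_compute_instance_template" "web" {
  # name_prefix lets Terraform generate a unique name, so the old and the
  # new template can exist side by side during the replacement.
  name_prefix  = "web-template-"
  machine_type = "e2-medium"

  disk {
    source_image = "debian-cloud/debian-12"
    boot         = true
    auto_delete  = true
  }

  network_interface {
    network = "default"
  }

  lifecycle {
    # Create the replacement first, then destroy the old template.
    create_before_destroy = true
  }
}
```

A fixed name would likely collide here, because both objects briefly exist at the same time; that is why the sketch uses name_prefix instead of name.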

prevent_destroy: This meta-argument, when set to true, will cause Terraform to reject with an error any plan that would destroy the infrastructure object associated with the resource, as long as the argument remains present in the configuration. This can be used as a safety measure against the accidental replacement of objects that may be costly to reproduce, such as business-critical compute. There is no confirmation prompt to bypass it: to destroy a resource that has the meta-argument set to “true”, you either change the Terraform code to remove the argument or set it to false (a change that would go through a PR process before it gets committed), or delete the resource manually from the console (which ideally only a handful of people would have permissions to do). Note that the protection only applies while the resource block is present in the configuration; removing the block entirely also removes the protection.
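A minimal sketch, assuming a persistent disk that holds data we cannot afford to lose. The name, type, zone and size are hypothetical placeholders:

```hcl
# Hypothetical disk holding business-critical data.
resource "google_compute_disk" "critical_data" {
  name = "critical-data"
  type = "pd-ssd"
  zone = "us-central1-a"
  size = 200 # GB

  lifecycle {
    # Any plan that would destroy this disk, including a replacement
    # or "terraform destroy", fails with an error.
    prevent_destroy = true
  }
}
```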

replace_triggered_by: Replaces the resource when any of the referenced resources or attributes change (available in Terraform v1.2 and later). Note that “change” here covers both in-place updates and replacements of the referenced resource. You can supply a list of expressions referencing multiple resources.

References trigger replacement in the following conditions:

If the reference is to a resource with multiple instances, a plan to update or replace any instance will trigger replacement.
If the reference is to a single resource instance, a plan to update or replace that instance will trigger replacement.
If the reference is to a single attribute of a resource instance, any change to the attribute value will trigger replacement.

For example, consider a scenario where we want the Load Balancer (backend service) and Instance Template resources to be coupled with each other: if the instance template is replaced, the backend service should be replaced along with it.
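The coupling described above can be sketched roughly as follows. This is an assumption-laden illustration rather than the article's original configuration: all resource names and values are hypothetical, and the backend service is kept to the bare minimum.

```hcl
# Hypothetical instance template that the backend service is coupled to.
resource "google_compute_instance_template" "web" {
  name_prefix  = "web-template-"
  machine_type = "e2-medium"

  disk {
    source_image = "debian-cloud/debian-12"
    boot         = true
  }

  network_interface {
    network = "default"
  }

  lifecycle {
    create_before_destroy = true
  }
}

# Hypothetical health check required by the backend service.
resource "google_compute_health_check" "http" {
  name = "http-health-check"

  http_health_check {
    port = 80
  }
}

resource "google_compute_backend_service" "web" {
  name          = "web-backend"
  protocol      = "HTTP"
  health_checks = [google_compute_health_check.http.id]

  lifecycle {
    # A planned update or replacement of the template also forces
    # this backend service to be replaced.
    replace_triggered_by = [
      google_compute_instance_template.web,
    ]
  }
}
```

To react only to a specific setting, the list entry can reference a single attribute of the template instead of the whole resource.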

I was prompted to write this article because I feel that the edge cases these meta-arguments address are becoming more and more common as larger enterprises adopt Terraform. I hope this article reduces the time it takes you to discover that the edge case you are working with has already been thought of.

To learn more about the lifecycle meta-arguments and their implementations, visit the official HashiCorp page below.

https://developer.hashicorp.com/terraform/language/meta-arguments/lifecycle

Written by Vipul Raja