Terraform at LumApps: Part 3

Jerome Pin
Published in LumApps Experts
5 min read · Jun 10, 2024

In our first two articles we talked about the structure of our code and our platform.

As a reminder, we currently have 15 cells (where the entirety of the LumApps product is running) and around 60 services in each cell.

This means we have:

  • 15 terragrunt.hcl files for every service.
  • Around 900 terragrunt.hcl files (15*60) in total.

We saw how we built our repo to manage such a scale, and how we rely on our in-house CLI infracli to handle tedious tasks such as editing all the terragrunt.hcl files for any given service.

So… life is better in our repository, but it's not great (yet). We still have two issues that arise whenever we make a change to our main Terragrunt module:

  • How can we refactor the terragrunt.hcl files' inputs while avoiding both tedium and human error?
  • How can we move resources in the Terraform state when their ID is dynamic?

The answer is simple and elegant. We made a new addition to our infracli tool: migrations!

Variable refactoring & human error

Let’s take an example for this one.

In front of our database instances, we have PGBouncer running. Each service has its own PGBouncer pods that can be configured:

  • has_pgbouncer (bool, defaults to false): whether to deploy PGBouncer for the service.
  • pgbouncer_transaction_mode (bool, defaults to false): whether to use transaction pooling instead of session pooling.
  • pgbouncer_resources_requests (map): the CPU and RAM requests to assign to the pods for the Kubernetes scheduler.

Now, we want to modify these variables according to a few rules:

  • Group them under a single map like:
pgbouncer = {
  transaction_mode = ...
  requests         = {...}
  ...
}
  • Remove the pgbouncer prefix from each variable
  • Rename has_pgbouncer to pgbouncer.enabled
  • Always set pgbouncer.enabled to true
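The four rules above can be sketched as a simple remapping function. This is a minimal illustration assuming the terragrunt.hcl inputs have been parsed into a plain Python dict; the function name and shape are ours for this example, not infracli's actual API:

```python
# Illustrative sketch of the PGBouncer remapping rules (not infracli's real code).
def remap_pgbouncer_inputs(inputs: dict) -> dict:
    # has_pgbouncer becomes pgbouncer.enabled, which is always set to True
    pgbouncer = {"enabled": True}
    # Move the prefixed variables inside the pgbouncer map, dropping the prefix
    if "pgbouncer_transaction_mode" in inputs:
        pgbouncer["transaction_mode"] = inputs.pop("pgbouncer_transaction_mode")
    if "pgbouncer_resources_requests" in inputs:
        pgbouncer["requests"] = inputs.pop("pgbouncer_resources_requests")
    # The old flag disappears entirely
    inputs.pop("has_pgbouncer", None)
    inputs["pgbouncer"] = pgbouncer
    return inputs
```

Encoding the rules once like this is what makes them repeatable across hundreds of files.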

It's not very hard to do by hand, obviously. But it's very tedious, it's easy to forget one of the rules, and doing it ourselves adds no value.

Additionally, since a service configuration could be spread into multiple files (because of possible per-environment and per-cell overrides, as shown in the “Overridable inputs” part of our second article), it adds even more places to make these changes.

We need a solution to make it easy and less error-prone.

State manipulation

Then we have an issue with state manipulation: moving, importing and deleting resources.

Terraform currently provides two ways of manipulating its state safely: import blocks and moved blocks.

They seem very cool from an IaC standpoint because they can be written next to the code they pair with, so everyone knows what's been done to the state. However, references in the state must be static. Here is an example:

We have a simple resource:

resource "local_file" "file" {
  content  = "LumApps is a cool product!"
  filename = "my_file"
}

But then, we want to change its reference in the state to take a variable into account:

variable "cell_name" {
  type    = string
  default = "foo"
}

resource "local_file" "file" {
  for_each = toset([var.cell_name])
  content  = "LumApps is a cool product!"
  filename = "my_file"
}

Applying this will force a re-creation of our resource. So, of course, we create a moved block:

moved {
  from = local_file.file
  to   = local_file.file[var.cell_name]
}
│ Error: Invalid expression
│
│   on main.tf line 19, in moved:
│   19:   to = local_file.file[var.cell_name]
│
│ A single static variable reference is required: only attribute access and indexing with constant keys. No calculations, function calls, template expressions, etc are allowed here.

Unfortunately, Terraform cannot perform this move by itself, so we must fall back on the terraform state mv command.

For this simple case we would need to run a terragrunt command to make the move. And sometimes, even worse, two commands: one to remove the old resource from the state and another to import the new resource.

Since we have 15 cells at LumApps, we would need to do this 15 times, each time with a slightly different var.cell_name.

Multiply this by 60 if we have to make the change for every service, and we'd spend our whole week watching Terraform commands run.
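To make the repetition concrete, here is a hypothetical sketch of what generating those 15 near-identical commands looks like. The cell names are made up for the example; the only real parts are the terragrunt state mv subcommand and the resource addresses from the snippet above:

```python
# Illustrative: build one `terragrunt state mv` invocation per cell.
cell_names = [f"cell-{i:02d}" for i in range(1, 16)]  # 15 made-up cell names

def state_mv_command(cell_name: str) -> list[str]:
    # String keys in a for_each resource address must be quoted,
    # e.g. local_file.file["cell-01"].
    return [
        "terragrunt", "state", "mv",
        "local_file.file",
        f'local_file.file["{cell_name}"]',
    ]

commands = [state_mv_command(name) for name in cell_names]
```

Each entry in commands would then be fed to something like subprocess.run, 15 times, in the right folder, with the right credentials. Hence the drawbacks listed below.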

Manipulating the state with such commands has quite a few drawbacks:

  • It’s error-prone
  • It cannot be reviewed in advance
  • It is tedious to write multiple times and adapt every time

Infracli models

In infracli, every “thing” related to Terraform is described by a Python class.

Here is an overview of our implementation:

class Cell:
    name: str
    provider: CloudProvider

    def get_subfolder(self, name: str) -> Subfolder: ...


class Subfolder(TerragruntFolder):
    path: pathlib.Path
    terragrunt_file: TerragruntFile


class TerragruntFile(TerraformFile):
    dependencies: dict[str, str]
    source: str | None
    inputs: dict[str, str | dict]
    generate: dict[str, Any]
    remote_state: dict[str, Any]
    before_hooks: dict[str, Any]

    def read_file(self) -> str: ...         # read the file's content
    def load(self) -> dict: ...             # parse the HCL content and store it into attributes
    def write(self) -> TerragruntFile: ...  # write the attributes back into the file as HCL

From any larger element, we can easily dig down the stack. Starting from a cell, it's easy to reference a subfolder's input:

cell.get_subfolder("foo").terragrunt_file.inputs.get("has_pgbouncer")
# => True

Infracli migrations

With our two issues in mind, we settled on running some kind of command(s) before Terraform runs, with access to the whole Terraform context: state, variables, etc.

Armed with our Python classes, our life is quite easy when we want to modify anything in our repository.

Based on the issues explained above, we ended up building two classes for our migrations: one to modify the state (TerraformSubfolderStateMigration) and one to modify a module's inputs (InputsFileMigration).

Then we can subclass each of them based on our needs and write our migration. Here are two migrations corresponding to our two earlier examples:

class MigratePGBouncerFlatVariablesToObject(InputsFileMigration):
    """Convert pgbouncer_* variables to a pgbouncer object."""

    # The version of our Terragrunt module required to run this migration
    TERRAGRUNT_MODULE_MIN_VERSION = "6.0.0"

    # What will happen to our current variables
    VARIABLES_MAPPING = {
        "has_pgbouncer": None,  # Remove has_pgbouncer
        "pgbouncer_transaction_mode": "pgbouncer.transaction_mode",  # Put transaction_mode inside the pgbouncer map
        "pgbouncer_resources_requests": "pgbouncer.requests",  # Same
    }

    # Run only if the subfolder is recent enough AND the keys declared
    # in the mapping above are present in the terragrunt.hcl file.
    def is_necessary(self, input_file: TerragruntFile) -> bool:
        return (
            self.is_above_minimal_module_version()
            and input_file.has_variables_requiring_migration()
        )

    # Remap the variables according to the mapping above, then re-write
    # the terragrunt.hcl file and format it.
    def migrate(self, input_file: TerragruntFile) -> None:
        input_file.inputs = self.remap_variables(input_file)
        TerragruntFile.write(input_file.path, input_file)
        input_file.format()


class MoveLocalFileInState(TerraformSubfolderStateMigration):
    """Move the local_file to support the for_each addition."""

    # Manipulate the state for the subfolder
    def migrate(self, cell: Cell, subfolder: Subfolder) -> None:
        self.terragrunt_state_mv(
            old_id="local_file.file",
            new_id=f'local_file.file["{cell.name}"]',
        )
        ...
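The is_necessary/migrate contract above is what makes these migrations safe to run and re-run: a migration that already happened is simply skipped. Here is a minimal, self-contained sketch of that driver loop; FakeTerragruntFile, FakeMigration and apply_migration are illustrative stand-ins, not infracli's real classes:

```python
# Illustrative stand-ins showing the is_necessary/migrate contract.
class FakeTerragruntFile:
    def __init__(self, inputs):
        self.inputs = inputs

class FakeMigration:
    """Mimics a tiny inputs migration: has_pgbouncer -> pgbouncer.enabled."""
    def is_necessary(self, input_file):
        # Skip files that were already migrated (or never had the variable)
        return "has_pgbouncer" in input_file.inputs

    def migrate(self, input_file):
        input_file.inputs.pop("has_pgbouncer", None)
        input_file.inputs["pgbouncer"] = {"enabled": True}

def apply_migration(migration, files):
    """Run the migration on every file that needs it; return how many ran."""
    migrated = 0
    for f in files:
        if migration.is_necessary(f):
            migration.migrate(f)
            migrated += 1
    return migrated
```

Running apply_migration twice on the same files migrates nothing the second time, which is exactly the idempotency our old ad-hoc scripts lacked.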

Conclusion

We built a few proofs of concept for such migrations over the years, with more or less satisfaction.

Most of the time, we thought it would only take a small bash script written in five minutes. We would end up with an unmanageable monstrosity, feared by everyone: never safe to run and re-run, and always missing some features.

With the design of infracli well thought out, we finally had all the cards we needed to write great migrations, and we did!

Nowadays, this is a much-appreciated part of our toolbox that takes care of many previously tedious tasks, and we are very happy with it!

In addition to the peace of mind we gained, the investment is paying off: it has already saved us countless hours.

Jerome Pin
LumApps Experts

Platform Engineer @LumApps working with Terraform, Kubernetes and Elasticsearch.