Terraforming MediaMarktSaturn

MediaMarktSaturn Tech Blog
12 min read · Feb 7, 2023

by Rene Schach and Anna Käfferlein


At MediaMarktSaturn (MMS), our cloud infrastructure runs on the Google Cloud Platform (GCP) and Microsoft Azure. This article describes how we use Terraform to provision and manage this infrastructure. It also briefly covers our history and explains the reasoning behind the setup that is in place today.

Background and evolution process

When we took our first steps into the awesome world of the cloud, the concept of Infrastructure as Code (IaC) was not really present at MediaMarktSaturn yet. This quickly became an issue for obvious reasons (this nice post explains the benefits of IaC), so we decided to overcome the dark times of manually clicking and configuring resources in different portals and UIs (e.g. the Google Cloud Console) by introducing Terraform as our IaC tool. We chose it because it is independent of any particular cloud provider and easily extensible (there are providers for nearly everything, you can even automate your pizza delivery!). We started implementing Terraform modules intended to be used across the organization for creating common resource types (GCP projects, Google Kubernetes Engine clusters, GCP VPC networks/subnets etc.).

As we are an organization that is split into many different product teams, the next step was to introduce a unified onboarding process that enables teams to use Terraform easily and with as little setup as possible on their side. This process and the solutions behind it were to be maintained by a central team to ensure traceability and to avoid uncontrolled/untracked usage. As we did not find any publicly available tool that fulfilled all these requirements, we implemented a customized solution.

The main component was a central API deployed as a GCP Cloud Function (the so-called “orchestrator”) that had different endpoints used for:

  • an initialization process: whenever a team wanted to get onboarded, this endpoint was called. It provisioned:
      • a GitHub repository that was intended to contain the Terraform setup of the team’s cloud infrastructure
      • a folder in our GCP organization dedicated to the team; the team members were granted read permissions on that specific folder (there was no Azure setup back then because it had not been considered at that time)
      • an additional GCP project placed in the created folder; in this project the actual Terraform runs (plans/applies) were executed as GCP Cloud Builds
  • endpoints for all Terraform actions (plan/apply/destroy): each endpoint was used to trigger the respective action. The API created a Cloud Build running in the aforementioned GCP project. The Cloud Build checked out the Terraform GitHub repository (also created by the initialization endpoint) and planned/applied/destroyed the infrastructure defined in the code

The rise of a new solution

The previous paragraph might give you an idea of how much maintenance work needed to be done to keep everything up and running, and up to date. Still, the solution was usable and accepted within the company and was running fine.

However, the implementation had its disadvantages:

  • decentralized setup: each team had its own Terraform execution environment (GCP project), with no good visibility or traceability from a management perspective
  • no developer workflow integration: all Terraform operations had to be triggered manually when pushing changes
  • auditing/traceability: there was no real Terraform run history connected to the Git history
  • governance/security: enforcing guidelines and restrictions on created infrastructure was not possible

Over time, all these aspects started to bother us, and we began thinking about a new solution. And then one day, we stumbled upon the beautiful Terraform Enterprise.

The setup running today

Terraform Enterprise (TFE) is the self-hosted distribution of Terraform Cloud. We decided to go with the self-hosted model because Terraform Cloud did not fulfill all existing MediaMarktSaturn requirements, e.g. regarding data storage locations. TFE basically covers all the points mentioned in the section above. Its purpose is to provide one central platform that securely stores the Terraform state and executes all Terraform runs and operations across an entire organization.

Its essential building blocks are the so-called workspaces, in which the Terraform state is stored. Each workspace is connected to a “chunk” of Terraform code (e.g. everything regarding the setup of a GCP project) that is stored in a GitHub repository. These are so-called VCS-driven workspaces, which is the only workspace type supported in our implementation. Other types are available (CLI- and API-driven) but have been excluded because they did not have any use case in our desired setup.

VCS-driven workspaces integrate nicely into a standard Git-driven developer workflow. Once you create a pull request to merge changes into the main branch of a repository connected to a Terraform Enterprise workspace, each commit will automatically trigger a Terraform plan run and post the result as a status check to the pull request (via a standard webhook integration created by TFE itself).
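For illustration, here is a minimal, hedged sketch of how such a VCS-driven workspace could be declared with the official tfe Terraform provider. The organization, workspace and repository names as well as the OAuth token variable are placeholders, not our actual configuration:

# Minimal sketch of a VCS-driven workspace declared via the tfe provider
# (all names and the OAuth token variable are placeholders)
variable "github_oauth_token_id" {
  description = "OAuth token ID of the GitHub VCS connection configured in TFE"
  type        = string
}

resource "tfe_workspace" "dev" {
  name              = "tfe-customer-dev"
  organization      = "mediamarktsaturn"
  auto_apply        = false
  working_directory = "infra"
  terraform_version = "1.3.0"

  vcs_repo {
    identifier     = "MediaMarktSaturn/tfe-customer-infrastructure" # placeholder repository
    branch         = "main"
    oauth_token_id = var.github_oauth_token_id
  }
}

Commits on pull request branches of the connected repository only produce speculative plans, while runs on the configured branch plan and apply the code.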

TFE furthermore provides the tool Sentinel for enforcing policies on the infrastructure managed by Terraform (see the “You shall not pass” section for a deeper insight). Another feature is a central private module registry that is used to host our internally developed modules and to prevent the usage of other private module sources.

That’s nice, let’s go!

One of the main goals of our setup was to enable self-service onboarding to TFE so that the process is as easy as possible for the product teams across MediaMarktSaturn. We started this by creating a single organization in Terraform Enterprise called mediamarktsaturn that contains all workspaces.

The essential part of the new setup is again an “orchestrator” component (remember the old one mentioned earlier? This one is way cooler!). The TFE orchestrator is a VCS-driven workspace of its own, connected to an internal GitHub repository. In that repository, we define via Terraform the set of resources that is created when a team gets onboarded. The onboarding process can be triggered by simply creating a pull request to the repository and submitting a JSON configuration file containing all required information. The review of the initial pull request is done by the central MediaMarktSaturn Cloud Platform team that manages the TFE installation.
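To give a rough idea of how such an orchestrator can work, here is a hedged sketch of how the per-team JSON files could be read with plain Terraform functions; the customers/ directory name is an assumption for illustration and not necessarily our actual repository layout:

# Hypothetical sketch: decode every customer JSON file into a map keyed by customer name
# (the customers/ directory is an assumed layout, for illustration only)
locals {
  customer_files = fileset("${path.module}/customers", "*.json")

  customers = {
    for f in local.customer_files :
    trimsuffix(f, ".json") => jsondecode(file("${path.module}/customers/${f}"))
  }

  # Flatten all workspaces of all customers into one list for later for_each loops
  workspaces = flatten([
    for name, cfg in local.customers : [
      for ws in cfg.workspaces : merge(ws, { customer = name })
    ]
  ])
}

The resources described below can then be created with for_each loops over these locals.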

The onboarding process also requires setting a code owner for the JSON configuration file pointing to a specific GitHub team. This way, product teams can review all further changes to their JSON configuration file on their own and no longer require any approval from the central team. To avoid unwanted changes (e.g. a team creating a PR that adds 327 workspaces), automatic pull request status checks have been implemented as GitHub Actions workflows that prevent the merge of such changes.

Here is an excerpt of an example JSON configuration file defining the TFE setup of one product team:

{
  "customer": "tfe-customer",
  "github_team_name": "tfe-customer-git",
  "teammembers": [
    "karl@mediamarktsaturn.com",
    "larry@mediamarktsaturn.com"
  ],
  "workspaces": [
    {
      "name": "dev",
      "auto_apply": false,
      "config_file_name": "dev.tfvars",
      "working_directory": "infra",
      "provider": "gcp",
      "branch": "",
      "terraform_version": "1.3.0"
    }
  ]
}

The following definition list explains each top-level attribute of the JSON configuration file. All attributes are mandatory.

  • customer : Name of the customer
  • github_team_name : Name of the GitHub team
  • teammembers : List of team member email addresses; the members are added to the MediaMarktSaturn TFE organization, to the Azure AD groups if an Azure workspace is defined, and to the customer’s TFE Google Group if a GCP workspace is defined
  • workspaces : List of workspace objects

The section below covers the subfields of the workspaces attribute. Each subfield needs to be defined separately for each workspace.

  • name : Name of the workspace
  • auto_apply : Whether to automatically apply changes after a successful plan. Defaults to false
  • config_file_name : The variable file to use for this workspace. If specified, a .tfvars or .tfvars.json ending is required. Defaults to ""
  • working_directory : A relative path that Terraform will execute within. Defaults to the root of your repository
  • provider : Whether this workspace will be used to manage GCP or Azure infrastructure. Defaults to gcp
  • branch : The repository branch that Terraform will execute from. Defaults to master
  • terraform_version : Terraform version to use for the specific workspace. Defaults to 1.3.2

The Terraform Enterprise orchestrator parses all product team configurations and creates the skeleton infrastructure accordingly, depending on the defined workspaces. The following resources are always created (independent of the provider that is defined for the workspaces):

  • a GitHub repository that is used by the team to define the infrastructure
  • a basic TFE setup (creating the specified workspaces, adding members to Terraform Enterprise organization, creating teams in TFE with all team members that get access to the workspaces)
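As a rough, hedged illustration of what this base setup could look like inside the orchestrator (resource names, the repository naming scheme and the member list are placeholders, and the workspace reference points to a workspace resource like the one sketched earlier):

# Hypothetical sketch of the per-team base setup (all names are placeholders)
resource "github_repository" "infrastructure" {
  name       = "tfe-customer-infrastructure"
  visibility = "private"
}

resource "tfe_team" "customer" {
  name         = "tfe-customer"
  organization = "mediamarktsaturn"
}

resource "tfe_organization_membership" "members" {
  for_each     = toset(["karl@mediamarktsaturn.com", "larry@mediamarktsaturn.com"])
  organization = "mediamarktsaturn"
  email        = each.value
}

resource "tfe_team_organization_membership" "members" {
  for_each                   = tfe_organization_membership.members
  team_id                    = tfe_team.customer.id
  organization_membership_id = each.value.id
}

resource "tfe_team_access" "workspaces" {
  access       = "write"
  team_id      = tfe_team.customer.id
  workspace_id = tfe_workspace.dev.id # one tfe_team_access entry per workspace
}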

If a product team has at least one GCP workspace defined, the orchestrator creates the following resource set:

  • a dedicated GCP folder
  • a Google Workspace group that contains all team members and which is granted viewer permissions of the GCP folder
  • a GCP service account and a service account key; the service account is granted permissions to be able to create infrastructure in the GCP folder; the key is added as a sensitive variable to each workspace by the orchestrator and is used by the product team to initialize the Terraform Google provider
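To make this more concrete, here is a heavily simplified, hypothetical sketch of that GCP resource set; the organization ID, the project hosting the service account, the granted role and the workspace reference are illustrative assumptions, and the Google Workspace group is omitted for brevity:

# Hypothetical sketch of the per-team GCP resources (all values are placeholders)
resource "google_folder" "customer" {
  display_name = "tfe-customer"
  parent       = "organizations/000000000000" # placeholder organization ID
}

resource "google_service_account" "terraform" {
  project    = "orchestrator-seed-project" # placeholder project hosting the service account
  account_id = "tfe-customer-terraform"
}

resource "google_service_account_key" "terraform" {
  service_account_id = google_service_account.terraform.name
}

resource "google_folder_iam_member" "terraform" {
  folder = google_folder.customer.name
  role   = "roles/resourcemanager.projectCreator" # one of several granted roles
  member = "serviceAccount:${google_service_account.terraform.email}"
}

# The key is injected into each GCP workspace as a sensitive environment variable
# (in practice the decoded JSON key may need its newlines stripped)
resource "tfe_variable" "google_credentials" {
  workspace_id = tfe_workspace.dev.id # placeholder workspace reference
  key          = "GOOGLE_CREDENTIALS"
  value        = base64decode(google_service_account_key.terraform.private_key)
  category     = "env"
  sensitive    = true
}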

As soon as the product team has at least one Azure workspace defined, the orchestrator creates these resources:

  • an Azure management group
  • an Azure Active Directory group containing all team members that gets granted viewer permissions on the management group
  • an Azure Active Directory client ID and client secret; the values are added as sensitive variables to each Azure workspace by the orchestrator to be able to initialize the Terraform Azure provider
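Again, a hedged and simplified sketch of what that Azure resource set could look like; the group and application names, the granted role and the workspace reference are illustrative assumptions:

# Hypothetical sketch of the per-team Azure resources (all names are placeholders)
resource "azurerm_management_group" "customer" {
  display_name = "tfe-customer"
}

resource "azuread_group" "customer" {
  display_name     = "tfe-customer"
  security_enabled = true
}

resource "azurerm_role_assignment" "reader" {
  scope                = azurerm_management_group.customer.id
  role_definition_name = "Reader"
  principal_id         = azuread_group.customer.object_id
}

# App registration whose credentials are used by the team's Azure workspaces
# (the matching service principal and its role assignments are omitted for brevity)
resource "azuread_application" "terraform" {
  display_name = "tfe-customer-terraform"
}

resource "azuread_application_password" "terraform" {
  application_object_id = azuread_application.terraform.object_id
}

# Client ID and secret are injected into each Azure workspace as sensitive variables
resource "tfe_variable" "arm_client_secret" {
  workspace_id = tfe_workspace.dev.id # placeholder workspace reference
  key          = "ARM_CLIENT_SECRET"
  value        = azuread_application_password.terraform.value
  category     = "env"
  sensitive    = true
}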

The image below gives a rough overview of the resources created by the Terraform Enterprise orchestrator:

To summarize, this setup enables the following:

  • easy self-service onboarding process by simply creating and adding a JSON configuration file to one GitHub repository
  • further changes to the configuration file can be reviewed by the team itself
  • the orchestrator automatically creates a starter set of resources when changes to the config file are merged and also takes care of applying subsequent changes to this resource set
  • teams set up their cloud infrastructure (GCP projects/Azure subscriptions and all of their contents) via Terraform in their own GitHub repositories
  • the TFE workspaces are set up as VCS-driven workspaces connected to the respective repositories
  • teams benefit from the built-in VCS integration (automatic plans and status checks in pull requests) that fits nicely into a Git-driven developer workflow

You shall not pass

Want to enforce CIS benchmarks and standards on your infrastructure? Want to prevent resources from being deployed into specific locations around the world? Say no more, Sentinel is here to save your day. Sentinel is a Policy as Code framework, meaning that you can code your own policies and enforce them on the infrastructure defined in the Terraform code before changes get applied. It is directly provided and usable when installing TFE. Sentinel policy checks run in a separate step after a successful Terraform plan, and their results can be viewed in the Terraform run itself. The picture below shows the different steps of a Terraform run in TFE (plan, policy check, apply):

At the time that this run was triggered, there were 39 policies defined. 37 of them passed, but two (so-called “advisory”) policy checks failed. Sentinel policies are separated into three enforcement levels:

  • advisory: a failing policy will be displayed, but applying the changes will not be prevented
  • soft-mandatory: if the policy fails, an “override” has to happen before the changes get applied. An override can mean, for example, requiring an approval from a security team that is allowed to trigger these overrides in TFE
  • hard-mandatory: the policy has to pass; a failing policy will prevent the changes from being applied

In the example shown above, only two advisory policies failed, which is why the changes to the infrastructure were still applied.

Now, how are the policies connected to Terraform Enterprise? At MediaMarktSaturn, we leverage the feature of creating so-called policy sets. It enables the integration of policies stored in GitHub into Terraform Enterprise. All of our policies are stored in a separate internal GitHub repository that is connected to the policy sets in TFE (we separate different policy sets into different subfolders of the repository).

In our setup, we have three policy sets, each serving a specific purpose (e.g. the sec-global-policies policy set contains policies regarding CIS benchmarks and the gov-global-policies set contains governance policies like enforcing specific labels on resources). All policies are enforced on all workspaces automatically. When a change is committed to our Sentinel GitHub repository, the updated policies are directly synced to TFE.

As with almost everything set up in TFE, the policy sets are provisioned by our TFE orchestrator (simplified code for easier understanding):

locals {
  base_sentinel_policies = [
    "clp",
    "gov",
    "sec"
  ]
}

resource "tfe_policy_set" "sentinel_policy_set" {
  for_each      = toset(local.base_sentinel_policies)
  name          = "${each.key}-global-policies"
  organization  = "mediamarktsaturn"
  policies_path = each.key
  global        = true

  vcs_repo {
    identifier     = var.sentinel_policy_repo
    branch         = var.base_sentinel_policies_vcs_branch
    oauth_token_id = data.tfe_oauth_client.github_oauth_client.oauth_token_id
  }
}

As a side note, HashiCorp provides a set of foundational policies to get you started that can be easily integrated into the Sentinel code like this:

policy "gcp-cis-6.2-databases-cloud-sql-databases-instances-are-not-open-to-the-world" {
source = "https://raw.githubusercontent.com/hashicorp/terraform-foundational-policies-library/master/cis/gcp/databases/gcp-cis-6.2-databases-cloud-sql-databases-instances-are-not-open-to-the-world/gcp-cis-6.2-databases-cloud-sql-databases-instances-are-not-open-to-the-world.sentinel"
enforcement_level = "hard-mandatory"
}

Module registry

Another very nice feature of Terraform Enterprise is the private registry. In this registry, MMS-internal providers and modules can be shared within the organization.

At MMS we created several Terraform modules in order to provide reusable Terraform code to all developers in our organization. Unlike public module registries (e.g. the official registry), the private registry can only be used within our own organization.

As we have quite a lot of legal requirements to fulfill, we built a lot of code that guides developers to create their infrastructure in the defined way. We do not want the teams to write duplicate code, but rather provide reusable Terraform modules centrally. That way we are able to establish standards and best practices via modules. In addition, the usage of specific modules can be checked via Sentinel policies.

The module code is stored in GitHub and can easily be added to TFE via our Terraform Enterprise orchestrator (simplified code for easier understanding):

locals {
  terraform_modules = [
    "MediaMarktSaturn/terraform-google-project",
    "MediaMarktSaturn/terraform-azurerm-subscription",
  ]
}

# Add each module provided in the locals to the module registry
resource "tfe_registry_module" "registry-modules" {
  for_each = toset(local.terraform_modules)

  vcs_repo {
    display_identifier = each.key
    identifier         = each.key
    oauth_token_id     = data.tfe_oauth_client.github_oauth_client.oauth_token_id
  }
}

After a successful apply run of the orchestrator, the modules are available in the registry and ready to use within the organization.
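For illustration, consuming such a module from the private registry could look roughly like this; the TFE hostname, module name and inputs are placeholders and not the actual module interface:

# Hypothetical usage of a module from the private registry
# (hostname, module name and inputs are placeholders)
module "project" {
  source  = "tfe.example.com/mediamarktsaturn/project/google"
  version = "~> 1.0"

  name      = "tfe-customer-dev-project"
  folder_id = "folders/000000000000"
}

The source follows the <hostname>/<organization>/<module>/<provider> scheme of the private registry.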

Each module comes with an overview page that contains detailed information regarding e.g. inputs, outputs and resources that will be created.

How to handle Terraform versions

Terraform Enterprise allows using all available versions of Terraform by default. As we have shown already, teams are able to set Terraform versions for their workspaces on their own. But of course we only want to support the most recent versions of Terraform.

Fortunately, Terraform Enterprise comes with the possibility to either allow or disallow specific versions.

This way we can limit the number of allowed Terraform versions. All versions that are set to Disabled cannot be used in any workspace across the whole organization. To harden this setup, there is a hard-mandatory Sentinel policy in place that fails if a disabled Terraform version is used. The code snippet below shows the basic policy code:

import "tfplan/v2" as tfplan

// Regex to check for allowed versions
// Currently allowed in this example:
// >= 1.x.x
versionRegex = "^1\\.[0-9]+\\.[0-9]+$"

// Get Terraform version that was used for the plan
tfVersion = tfplan.terraform_version

isAllowedVersion = rule { tfVersion matches versionRegex }

main = rule {
  isAllowedVersion
}

Summary

We hope this article gave you some nice insights into our implementation of Terraform Enterprise at MediaMarktSaturn. All in all, this setup has now been running for roughly two years and is happily used by the different product teams in the company.

As a side note, we are hosting TFE itself on the Google Cloud Platform following the reference architecture. We are even managing the hosting infrastructure of TFE with Terraform to automate the management of TFE itself (but that’s a topic for another day and another blog post).

If you have any feedback or comments on our setup or this post in general, just let us know!

get to know us 👉 https://MediaMarktSaturn.tech 👈
