Mastering Multi-Cloud Management with Terraform

Oct 25, 2023

Using multiple public clouds like Azure, AWS, and GCP provides flexibility, optimizes costs, and reduces vendor lock-in. However, managing infrastructure and services across diverse cloud platforms can be challenging.

Let’s explore how Terraform enables consistent and automated multi-cloud management. We will cover multi-cloud architectures, Terraform abstractions, provisioning, governance, networking, deployment patterns, testing, and monitoring across Azure, AWS, and GCP.

Overview

We will cover the following topics:

Multi-cloud architectures with Terraform
Abstracting infrastructure differences
Provisioning resources and managing dependencies
Policy enforcement and governance
Networking topologies and connectivity
Blue-green, canary, and multi-region deployment patterns
Integration testing and mocks
Centralized logging, metrics, and observability

We will use code examples that deploy infrastructure and services to Azure, AWS, and GCP with Terraform. They illustrate real-world multi-cloud management with Terraform as the consistent abstraction layer.

Multi-Cloud Architecture

In a basic multi-cloud architecture, Terraform provisions infrastructure in multiple cloud accounts from one set of configurations and abstracts away many of the differences between platforms. The state is stored remotely and shared across accounts.

Some key design principles for multi-cloud architecture:

Abstract differences — Minimize provider-specific logic and hide differences behind abstractions
Modularize environments — Separate configurations into modular files by environment and components
Standardize naming — Use a consistent naming scheme across clouds for easy correlation
Encapsulate policies — Keep policy and governance rules in modular, reusable files
Centralize state — Use remote state to share state across services and clouds
Extract shared services — Build shared services like identity, DNS, CDN once and reuse
Unify pipelines — Standard CI/CD pipelines that deploy to any cloud target

With these principles, you can build portable configurations that remain consistent across any number of target clouds.
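As a rough sketch of how these principles translate into a root configuration, the block below declares two providers and a shared remote backend. The bucket name, prefix, and project ID are placeholders invented for illustration, and the choice of a GCS backend is an assumption; any remote backend would serve the same purpose.

```hcl
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0"
    }
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
  }

  # Remote state shared by everyone who runs this configuration.
  # Bucket and prefix are hypothetical placeholders.
  backend "gcs" {
    bucket = "example-terraform-state"
    prefix = "multi-cloud/prod"
  }
}

provider "azurerm" {
  features {}
}

provider "google" {
  project = "example-project" # hypothetical project ID
  region  = "us-central1"
}
```

Keeping the provider and backend declarations in one place makes it easier to see which clouds a configuration touches and where its state lives.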

Next, let’s look at how Terraform helps abstract away cloud differences.

Abstracting Infrastructure

One of Terraform’s strengths is its ability to provide a uniform abstraction layer above the differences between cloud platforms.

For example, provisioning a MySQL database on Azure vs GCP:

Azure

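resource "azurerm_mysql_server" "db" {
  name                = "mysqlserver"
  location            = "eastus"
  resource_group_name = azurerm_resource_group.rg.name

  # Recent azurerm releases take the SKU as a single string rather than a
  # nested sku block. Newer provider versions also replace this resource
  # with azurerm_mysql_flexible_server.
  sku_name = "B_Gen5_2"
  version  = "5.7"

  #... administrator login, storage, and SSL settings omitted
}

resource "azurerm_mysql_database" "db" {
  name                = "mydatabase"
  resource_group_name = azurerm_resource_group.rg.name
  server_name         = azurerm_mysql_server.db.name
  charset             = "utf8"
  collation           = "utf8_unicode_ci"
}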

GCP

resource "google_sql_database_instance" "db" {
  name             = "mysql-instance"
  database_version = "MYSQL_5_7"
  region           = "us-central1"

  settings {
    tier = "db-f1-micro"
  }
}

resource "google_sql_database" "database" {
  name     = "my-database"
  instance = google_sql_database_instance.db.name
}

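Each provider exposes its own resource types and arguments, so the two configurations above are not interchangeable. What stays consistent is everything around them: the HCL language, the plan and apply workflow, and the state model.

This consistency makes switching between clouds easier, especially when provider-specific resources sit behind modules with a common interface. You customize for each cloud provider at the edges where necessary.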
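One way to hide provider-specific resources behind a common interface is a wrapper module per cloud with identical inputs and outputs. The sketch below assumes two hypothetical local modules, ./modules/mysql_azure and ./modules/mysql_gcp, that each accept a name input and expose a hostname output; none of these names come from the providers themselves.

```hcl
variable "cloud" {
  type        = string
  description = "Target cloud for the database."

  validation {
    condition     = contains(["azure", "gcp"], var.cloud)
    error_message = "The cloud variable must be either azure or gcp."
  }
}

# Exactly one of the two modules is instantiated, depending on var.cloud.
module "mysql_azure" {
  source = "./modules/mysql_azure" # hypothetical module
  count  = var.cloud == "azure" ? 1 : 0
  name   = "mydatabase"
}

module "mysql_gcp" {
  source = "./modules/mysql_gcp" # hypothetical module
  count  = var.cloud == "gcp" ? 1 : 0
  name   = "mydatabase"
}

# Callers read one output regardless of which cloud was selected.
output "db_hostname" {
  value = one(concat(module.mysql_azure[*].hostname, module.mysql_gcp[*].hostname))
}
```

The rest of the configuration can then depend on db_hostname without knowing which provider created the database.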

Some key areas Terraform provides multi-cloud abstractions:

Compute — VMs, containers, Kubernetes, scaling
Network — Subnets, routing, security groups, load balancing
Storage — Blobs, disks, databases
Identity — Roles, permissions, access controls
Infrastructure — DNS, VPNs, rules, policies

Build your configurations using these multi-cloud abstractions as much as possible for ultimate portability.

Provisioning and Dependencies

A best practice when provisioning multi-cloud infrastructure is structuring resources into layers of dependencies.

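Higher-level constructs depend on lower-level ones. For example, a virtual machine depends on a network interface, which in turn depends on a subnet.

Terraform infers most dependencies from references between resources. Use the depends_on meta-argument only for dependencies that Terraform cannot see through a reference:

resource "azurerm_subnet" "public" {
  #...
}

resource "azurerm_network_interface" "nic" {
  #...
  ip_configuration {
    #...
    subnet_id = azurerm_subnet.public.id # implicit dependency on the subnet
  }
}

resource "azurerm_virtual_machine" "main" {
  #...

  network_interface_ids = [
    azurerm_network_interface.nic.id, # implicit dependency on the NIC
  ]

  # Redundant here because of the reference above; shown only to
  # illustrate the syntax for hidden dependencies.
  depends_on = [
    azurerm_network_interface.nic
  ]
}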

Terraform analyzes dependencies between resources and plans and applies changes in the right order.

Some tips for dependencies in multi-cloud setups:

Put common building blocks like networks in base modules
Separate distinct components into layers
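Prefer implicit dependencies through references, and reserve depends_on for hidden dependencies
Watch for dependency cycles, which Terraform rejects with a cycle error
Apply separate layers in dependency order, for example networking before compute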

Getting resource dependencies right is crucial for successful multi-cloud provisioning.
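When layers live in separate configurations, one possible way to pass values between them is the terraform_remote_state data source. The sketch below assumes the network layer stores its state in a GCS bucket and publishes an output named public_subnet_id; the bucket, prefix, resource group, and output name are all hypothetical.

```hcl
# In the compute layer: read outputs published by the network layer.
data "terraform_remote_state" "network" {
  backend = "gcs"

  config = {
    bucket = "example-terraform-state" # hypothetical bucket
    prefix = "multi-cloud/network"     # hypothetical state path
  }
}

resource "azurerm_network_interface" "nic" {
  name                = "app-nic"
  location            = "eastus"
  resource_group_name = "app-rg" # hypothetical resource group

  ip_configuration {
    name                          = "internal"
    subnet_id                     = data.terraform_remote_state.network.outputs.public_subnet_id
    private_ip_address_allocation = "Dynamic"
  }
}
```

This keeps the layers loosely coupled: the compute layer only needs the network layer's outputs, not its resources.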

Policy Enforcement and Governance

Enforcing organizational policies and compliance requirements is key for multi-cloud management.

Some examples of governance rules:

Restricting VM types
Controlling regions and zones for data sovereignty
Setting tagging rules and standards
Limiting networking exposure
Enforcing encryption requirements
Restricting high-risk resource usage

Terraform enables policy enforcement via:

Variables

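Constrain allowed values with a validation block (a default alone does not restrict what callers can pass):

variable "region" {
  type    = string
  default = "us-east-1"

  validation {
    condition     = contains(["us-east-1"], var.region)
    error_message = "Region must be us-east-1."
  }
}

provider "aws" {
  region = var.region # Can only be us-east-1
}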

Modules

Encapsulate and reuse governance logic in modules:

module "server" {
  source = "./modules/certified_server"
  # Certified server sets sane defaults
}

Sentinel Policies

Apply targeted policies that restrict resources:

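# Assumes aws_instance is a collection of planned aws_instance resources,
# for example filtered from the tfplan import.
aws_instance_disallowed_type = rule {
  all aws_instance as _, instance {
    instance.instance_type not in ["t2.micro", "t3.micro"]
  }
}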

These mechanisms enable embedding governance directly into configurations and make compliance easier.
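Tagging standards can be checked the same way as regions. The following is a minimal sketch that assumes an organization requires owner and cost_center tags; both key names are invented for illustration.

```hcl
variable "tags" {
  type        = map(string)
  description = "Tags applied to every resource."

  validation {
    # Every required key must be present in the supplied map.
    condition = alltrue([
      for key in ["owner", "cost_center"] : contains(keys(var.tags), key)
    ])
    error_message = "Tags must include the owner and cost_center keys."
  }
}
```

Because the check runs during plan, a missing tag fails before any resource is created in either cloud.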

Networking Topologies

Managing network connectivity across clouds adds complexity. Some hybrid cloud network topologies include:

Pairing

Connect cloud regions to on-prem data centers. Useful for migration and hybrid workloads.

Hub-and-Spoke

Central hub VPC with connectivity to multiple cloud spokes. Enables transitivity between spokes.

Mesh

Fully meshed network with site-to-site VPN between clouds. Provides direct region communication.

Terraform simplifies orchestrating these network topologies because the AWS, Azure (azurerm), and Google providers each expose the networking resources of their cloud.

For example, creating a site-to-site VPN from AWS to Azure:

# AWS side
resource "aws_customer_gateway" "gw" {
  bgp_asn    = 65002
  ip_address = "172.0.0.1" # placeholder: public IP of the Azure VPN gateway
  type       = "ipsec.1"
}

resource "aws_vpn_connection" "main" {
  vpn_gateway_id      = aws_vpn_gateway.vgw.id
  customer_gateway_id = aws_customer_gateway.gw.id
  type                = "ipsec.1"
  static_routes_only  = true

  tunnel1_ike_versions = ["ikev2"]
  tunnel2_ike_versions = ["ikev2"]

  # Group 14 is supported on both sides; group 31 is not
  tunnel1_phase1_dh_group_numbers = [14]
  tunnel2_phase1_dh_group_numbers = [14]
}

# Azure side
resource "azurerm_local_network_gateway" "lgw" {
  name                = "aws-conn"
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
  gateway_address     = aws_vpn_connection.main.tunnel1_address # AWS tunnel endpoint
  address_space       = ["172.16.0.0/16"]                       # AWS-side network range
}

resource "azurerm_virtual_network_gateway_connection" "main" {
  name                       = "aws-conn"
  resource_group_name        = azurerm_resource_group.rg.name
  location                   = azurerm_resource_group.rg.location
  type                       = "IPsec"
  virtual_network_gateway_id = azurerm_virtual_network_gateway.vgw.id
  local_network_gateway_id   = azurerm_local_network_gateway.lgw.id

  shared_key = aws_vpn_connection.main.tunnel1_preshared_key

  ipsec_policy {
    dh_group         = "DHGroup14"
    ike_encryption   = "AES256"
    ike_integrity    = "SHA256"
    ipsec_encryption = "AES256"
    ipsec_integrity  = "SHA256"
    pfs_group        = "PFS2048"
    sa_datasize      = 536870912
  }
}

This allows defining the full connectivity orchestration in a consistent way for multi-cloud.
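Because the connection above sets static_routes_only = true, AWS also needs to be told which prefixes sit behind the tunnel. A minimal sketch, assuming the Azure VNet range is supplied through a hypothetical azure_vnet_cidr variable:

```hcl
variable "azure_vnet_cidr" {
  type        = string
  description = "Address range of the Azure VNet (hypothetical input)."
}

# Static route telling the AWS VPN connection which prefix lives in Azure.
resource "aws_vpn_connection_route" "azure" {
  vpn_connection_id      = aws_vpn_connection.main.id
  destination_cidr_block = var.azure_vnet_cidr
}
```

The VPC route tables still need a route to the same prefix through the virtual private gateway, either added explicitly or through route propagation.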

Deployment Patterns

When deploying to multiple clouds, using patterns like blue-green, canary, and multi-region can simplify management.

Blue-Green

Blue-green runs the new version in parallel with the old one and then switches traffic over, either all at once or in steps. Because the old environment stays up, rollback is a matter of switching traffic back.

For example:

# Blue environment
module "blue_env" {
  source = "./env"
  color  = "blue"
}

# Green environment
module "green_env" {
  source = "./env"
  color  = "green"

  # Initially no traffic
  traffic_weight = 0
}

# Traffic splitter
resource "aws_lb" "main" {
  # Send 100% traffic to blue
}

Then gradually shift traffic from blue to green via the load balancer.
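On AWS, one way to express that shift is a weighted forward action on the load balancer listener. The sketch below attaches to the aws_lb.main resource from the example above; the target group names, the vpc_id input, and the green_weight variable are assumptions made for illustration.

```hcl
variable "vpc_id" {
  type = string
}

variable "green_weight" {
  type        = number
  default     = 0
  description = "Share of traffic (0-100) sent to the green environment."
}

resource "aws_lb_target_group" "blue" {
  name     = "app-blue"
  port     = 80
  protocol = "HTTP"
  vpc_id   = var.vpc_id
}

resource "aws_lb_target_group" "green" {
  name     = "app-green"
  port     = 80
  protocol = "HTTP"
  vpc_id   = var.vpc_id
}

resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.main.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type = "forward"

    forward {
      # Blue receives whatever share is not assigned to green.
      target_group {
        arn    = aws_lb_target_group.blue.arn
        weight = 100 - var.green_weight
      }

      target_group {
        arn    = aws_lb_target_group.green.arn
        weight = var.green_weight
      }
    }
  }
}
```

Raising green_weight from 0 to 100 across successive applies moves traffic step by step, and setting it back to 0 is the rollback.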

Canary

A canary release is like blue-green, but it sends a small percentage of traffic to the new version first.

module "prod_env" {
  source = "./env"

  # Most traffic goes to prod
}

module "canary_env" {
  source = "./env"

  # Small % to canary
  traffic_weight = 0.1
}

Multi-Region

Provision resources in multiple regions for high availability and low latency.

For example:

# West region
module "west" {
  source = "./region"

  # The key is the provider name inside the module;
  # the value is the aliased configuration to pass in.
  providers = {
    azurerm = azurerm.west
  }
}

# East region
module "east" {
  source = "./region"

  providers = {
    azurerm = azurerm.east
  }
}

Then use DNS and load balancing to distribute globally.
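The providers maps above refer to aliased provider configurations that must be declared in the root module. A minimal sketch of those aliases follows. It assumes the aliases distinguish Azure subscriptions, since the azurerm provider has no region setting of its own and each resource takes its own location; the subscription ID variables are hypothetical.

```hcl
variable "west_subscription_id" {
  type = string
}

variable "east_subscription_id" {
  type = string
}

provider "azurerm" {
  alias           = "west"
  subscription_id = var.west_subscription_id
  features {}
}

provider "azurerm" {
  alias           = "east"
  subscription_id = var.east_subscription_id
  features {}
}
```

With this layout, the region itself would typically be passed to the module as an ordinary input such as a location variable.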

These patterns simplify multi-cloud deployments. Module reuse helps consistency.

Integration Testing

Validating deployments across multiple clouds requires automated integration testing.

Some best practices for multi-cloud testing with Terraform:

Infrastructure tests — Validate resources in the right configuration using terraform plan and terraform show
Provisioning tests — Deploy disposable test environments from scratch
Failure tests — Simulate failures like terminated instances
Service tests — Validate service reachability and behavior with mocking

For example, creating a disposable test environment:

module "test_env" {
  source = "./env"

  providers = {
    aws = aws.test
  }
}

And a simple connectivity check:

resource "null_resource" "check_connectivity" {
  provisioner "local-exec" {
    command = "ping -c 3 ${module.test_env.ip}"
  }
}

These automated tests provide safety for frequent multi-cloud changes.
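A ping only shows that the host answers. As a possible complement, a check block (available in Terraform 1.5 and later) can assert on an HTTP response during plan and apply. The sketch below assumes the test environment serves HTTP on port 80 at the module.test_env.ip output used above.

```hcl
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    http = {
      source = "hashicorp/http"
    }
  }
}

check "app_responds" {
  # Scoped data source: a failure here is reported as a warning
  # rather than aborting the run.
  data "http" "app" {
    url = "http://${module.test_env.ip}/"
  }

  assert {
    condition     = data.http.app.status_code == 200
    error_message = "Expected HTTP 200 from the test environment."
  }
}
```

Unlike a local-exec provisioner, the check is re-evaluated on every plan and apply, so it keeps validating the environment after it is first created.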

Multi-Cloud Testing

Validating deployments across heterogeneous environments requires automated testing.

Some best practices using Terraform:

Provision disposable test environments
Verify resource configurations
Simulate different failure scenarios
Use mocks for dependencies

For example:

module "test_env" {
  source = "./env"

  # Test AWS account credentials
}

resource "null_resource" "check_connectivity" {
  provisioner "local-exec" {
    # Windows form of the ping command; use "ping -c 3" on Linux or macOS
    command = "ping.exe -n 3 ${module.test_env.ip}"
  }
}

This creates an isolated environment to safely test changes.

Other examples:

Replay historical traffic for performance tests
Inject faults like terminated instances
Stub out external services for deterministic tests

Automated testing is key to maintaining confidence when refactoring components.
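For stubbing out a cloud entirely, Terraform's built-in test framework is one option: terraform test runs files ending in .tftest.hcl, and Terraform 1.7 and later can replace a provider with a mock. The sketch below shows two files in one listing. The module under test, its env variable, and the bucket naming rule are all invented for illustration.

```hcl
# main.tf (hypothetical module under test)
variable "env" {
  type = string
}

resource "aws_s3_bucket" "logs" {
  bucket = "logs-${var.env}"
}

# tests/naming.tftest.hcl
# The mock provider needs no credentials and creates nothing in AWS.
mock_provider "aws" {}

run "bucket_name_follows_convention" {
  command = plan

  variables {
    env = "test"
  }

  assert {
    condition     = aws_s3_bucket.logs.bucket == "logs-test"
    error_message = "Bucket name should be logs-<env>."
  }
}
```

Tests like this run quickly and deterministically, which makes them a reasonable first gate before provisioning a disposable environment.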

Observability

Observability into logs, metrics, and traces across heterogeneous clouds is difficult.

Terraform enables aggregating observability data into shared platforms:

Logging

resource "aws_cloudwatch_log_group" "app" {
  name = "/aws/lambda/app"
}

resource "azurerm_monitor_diagnostic_setting" "app" {
  name                       = "diag"
  target_resource_id         = azurerm_function_app.app.id
  log_analytics_workspace_id = azurerm_log_analytics_workspace.main.id

  # Recent azurerm releases use enabled_log in place of the older log block
  enabled_log {
    category = "FunctionAppLogs"
  }
}

resource "google_logging_project_sink" "app" {
  name = "app-sink"

  # Cloud Storage destinations take the form storage.googleapis.com/BUCKET
  destination = "storage.googleapis.com/${google_storage_bucket.logs.name}"
}

Centrally aggregate logs in tools like Splunk.

Metrics

resource "signalfx_detector" "latency" {
  name = "High latency"

  program_text = <<-EOF
    A = data('latency', filter=filter('cloud', '*') and filter('env', '*')).publish(label='A')
    B = (A).sum(by=['cloud', 'env']).publish(label='B')
    detect(when(B > 1000, '5m')).publish('High latency!')
  EOF

  # Each published detect() label needs a matching rule
  rule {
    detect_label = "High latency!"
    severity     = "Critical"
  }
}

Unify metrics in platforms like SignalFx (used above) or Datadog.

Tracing

module "opentelemetry" {
  source = "./opentelemetry"

  # Keys are the provider names the child module expects
  providers = {
    aws     = aws
    azurerm = azurerm
    google  = google
  }
}

Common formats like OpenTelemetry simplify correlating traces.

With Terraform, you can build shared observability patterns across diverse platforms.

Example: Multi-Cloud Web Application

Let’s go through a real-world example deploying a multi-cloud web application with Terraform.

We will deploy the app infrastructure across clusters in AWS ECS and Azure Container Instances (ACI). A global load balancer will distribute traffic between the platforms.

Networking

First, we need to connect the AWS VPC and Azure VNet:

# AWS VPC and subnets
resource "aws_vpc" "main" {
  cidr_block = "10.1.0.0/16"
}

resource "aws_subnet" "public" {
  vpc_id     = aws_vpc.main.id
  cidr_block = "10.1.1.0/24"
}

# Azure VNet
resource "azurerm_virtual_network" "main" {
  name                = "app-network"
  address_space       = ["10.2.0.0/16"]
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
}

resource "azurerm_subnet" "public" {
  name                 = "public-subnet"
  resource_group_name  = azurerm_resource_group.main.name
  virtual_network_name = azurerm_virtual_network.main.name
  address_prefixes     = ["10.2.1.0/24"]
}

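AWS VPC peering and Azure VNet peering only connect networks within the same cloud, so they cannot link an AWS VPC to an Azure VNet. Connect the two with a site-to-site VPN instead, following the pattern from the Networking Topologies section:

# AWS side: virtual private gateway attached to the VPC
resource "aws_vpn_gateway" "vgw" {
  vpc_id = aws_vpc.main.id
}

# Azure side: the local network gateway describes the AWS network
resource "azurerm_local_network_gateway" "lgw" {
  name                = "aws-conn"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
  gateway_address     = aws_vpn_connection.main.tunnel1_address
  address_space       = [aws_vpc.main.cidr_block]
}

# The aws_customer_gateway, aws_vpn_connection, and
# azurerm_virtual_network_gateway_connection resources follow the
# site-to-site VPN example shown earlier.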

Compute

Deploy Azure Container Instances:

resource "azurerm_container_group" "app" {
  name                = "app-aci"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
  ip_address_type     = "Public"
  dns_name_label      = "app-aci"
  os_type             = "Linux"

  container {
    name   = "app"
    image  = "myapp:v1"
    cpu    = "1"
    memory = "1"

    ports {
      port     = 80
      protocol = "TCP"
    }
  }
}

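And an Amazon ECS cluster and service:

resource "aws_ecs_cluster" "main" {
  name = "myapp-cluster"
}

resource "aws_ecs_service" "web" {
  name            = "web"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = 3
  launch_type     = "FARGATE"

  # Fargate tasks use awsvpc networking, which requires this block
  network_configuration {
    subnets          = [aws_subnet.public.id]
    assign_public_ip = true
  }
}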

Load Balancing

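Deploy a load balancer in each cloud. DNS then distributes traffic between them:

resource "aws_lb" "web" {
  name     = "myapp-lb"
  internal = false

  # An Application Load Balancer needs subnets in at least two
  # Availability Zones; only one is shown here for brevity.
  subnets = [
    aws_subnet.public.id
  ]
}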

resource "azurerm_lb" "web" {
  name                = "myapp-lb"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
   
  frontend_ip_configuration {
    name                 = "public"
    public_ip_address_id = azurerm_public_ip.main.id
  }
}

DNS

Route a single domain across load balancers:

resource "aws_route53_zone" "main" {
  name = "myapp.com"
}

resource "aws_route53_record" "webapp" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "webapp.myapp.com"
  type    = "CNAME"
  ttl     = 300
  records = [aws_lb.web.dns_name]
}

resource "azurerm_dns_cname_record" "webapp" {
  name                = "webapp"
  zone_name           = azurerm_dns_zone.main.name
  resource_group_name = azurerm_resource_group.main.name
  ttl                 = 300

  # azurerm_lb exports no FQDN; use the public IP's fqdn, which is
  # set when the public IP has a domain_name_label.
  record = azurerm_public_ip.main.fqdn
}
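Two CNAME records in separate zones do not split traffic by themselves. One possible way to spread a single name across both clouds is Route 53 weighted routing, sketched below. These two records would take the place of the single webapp record in Route 53; the 50/50 weights and the 60-second TTL are arbitrary choices, and the Azure target assumes the public IP has a domain_name_label so that its fqdn attribute is populated.

```hcl
resource "aws_route53_record" "webapp_aws" {
  zone_id        = aws_route53_zone.main.zone_id
  name           = "webapp.myapp.com"
  type           = "CNAME"
  ttl            = 60
  set_identifier = "aws"
  records        = [aws_lb.web.dns_name]

  weighted_routing_policy {
    weight = 50
  }
}

resource "aws_route53_record" "webapp_azure" {
  zone_id        = aws_route53_zone.main.zone_id
  name           = "webapp.myapp.com"
  type           = "CNAME"
  ttl            = 60
  set_identifier = "azure"
  records        = [azurerm_public_ip.main.fqdn]

  weighted_routing_policy {
    weight = 50
  }
}
```

Adjusting the weights shifts traffic between clouds without touching either load balancer, and a short TTL keeps those changes from lingering in resolver caches.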

This provides the core infrastructure to run the web application across multiple clouds, connected together via load balancing and DNS.

The Terraform abstractions allow us to express the architecture in a consistent way across cloud platforms. We can extend the deployment and add additional components like databases, object storage, and cache as needed.

Conclusion

In this post, I covered patterns and best practices for managing multi-cloud infrastructure, networking, deployment, testing, and monitoring with Terraform, including:

Abstractions — Build portable configs using abstracted providers and resources
Modules — Encapsulate complex components into reusable modules
State — Store remote state to enable collaboration
Networking — Orchestrate connectivity across VPCs and VNETs
Deployment — Use blue-green, canary, and multi-region patterns
Testing — Automated provisioning of test environments
Observability — Centralize logs, metrics, and traces

Terraform provides a consistent workflow to provision and manage infrastructure across public clouds, private data centers, and SaaS. With the patterns covered in this guide, you can overcome the complexity of bridging across diverse APIs, platforms, and topologies to enable easier multi-cloud automation and orchestration.


Written by Bijit Ghosh