Mastering Multi-Cloud Management with Terraform

Bijit Ghosh
9 min read · Oct 25, 2023

Using multiple public clouds like Azure, AWS, and GCP provides flexibility, optimizes costs, and reduces vendor lock-in. However, managing infrastructure and services across diverse cloud platforms can be challenging.

Let’s explore how Terraform enables consistent and automated multi-cloud management. We will cover multi-cloud architectures, Terraform abstractions, provisioning, governance, networking, deployment patterns, testing, and monitoring across AWS, Azure, and GCP.

Overview

We will cover the following topics:

  • Multi-cloud architectures with Terraform
  • Abstracting infrastructure differences
  • Provisioning resources and managing dependencies
  • Policy enforcement and governance
  • Networking topologies and connectivity
  • Blue-green, canary, and multi-region deployment patterns
  • Integration testing and mocks
  • Centralized logging, metrics, and observability

We will use code examples that deploy infrastructure and services to AWS, Azure, and GCP with Terraform. They illustrate real-world multi-cloud management using Terraform as the consistent abstraction layer.

Multi-Cloud Architecture

In a basic multi-cloud architecture, Terraform provisions infrastructure in multiple cloud accounts from one set of configurations and abstracts away many of the differences between platforms. The state is stored remotely and shared across accounts.

Some key design principles for multi-cloud architecture:

  • Abstract differences — Minimize provider-specific logic and hide differences behind abstractions
  • Modularize environments — Separate configurations into modular files by environment and components
  • Standardize naming — Use a consistent naming scheme across clouds for easy correlation
  • Encapsulate policies — Keep policy and governance rules in modular, reusable files
  • Centralize state — Use remote state to share state across services and clouds
  • Extract shared services — Build shared services like identity, DNS, CDN once and reuse
  • Unify pipelines — Standard CI/CD pipelines that deploy to any cloud target

With these principles, you can build portable configurations that remain consistent across any number of target clouds.
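
As a rough sketch of the "centralize state" principle, a root configuration might pin all three providers and share one remote backend. The bucket name, state key, project ID, and regions below are placeholders, and the S3 backend is only one of several possible choices.

```hcl
terraform {
  required_version = ">= 1.3"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0"
    }
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
  }

  # Remote state shared by everyone working on this configuration
  # (bucket and key are placeholder names)
  backend "s3" {
    bucket = "example-terraform-state"
    key    = "multi-cloud/terraform.tfstate"
    region = "us-east-1"
  }
}

provider "aws" {
  region = "us-east-1"
}

provider "azurerm" {
  features {}
}

provider "google" {
  project = "example-project" # placeholder project ID
  region  = "us-central1"
}
```

Pinning provider versions in one place keeps every environment on the same provider behavior, which matters more as the number of target clouds grows.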

Next, let’s look at how Terraform helps abstract away cloud differences.

Abstracting Infrastructure

One of Terraform’s strengths is its ability to provide a uniform abstraction layer above the differences between cloud platforms.

For example, provisioning a MySQL database on Azure vs GCP:

Azure

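# Note: azurerm_mysql_server (Single Server) is deprecated in favor of azurerm_mysql_flexible_server
resource "azurerm_mysql_server" "db" {
  name                = "mysqlserver"
  location            = "eastus"
  resource_group_name = azurerm_resource_group.rg.name

  # azurerm 2.x and later take one sku_name string in place of the old sku block
  sku_name   = "B_Gen5_2"
  version    = "5.7"
  storage_mb = 5120

  # var.db_admin_login and var.db_admin_password are assumed to be declared elsewhere
  administrator_login          = var.db_admin_login
  administrator_login_password = var.db_admin_password
  ssl_enforcement_enabled      = true
}
resource "azurerm_mysql_database" "db" {
  name                = "mydatabase"
  resource_group_name = azurerm_resource_group.rg.name
  server_name         = azurerm_mysql_server.db.name
  charset             = "utf8"
  collation           = "utf8_unicode_ci"
}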

GCP

resource "google_sql_database_instance" "db" {
  name             = "mysql-instance"
  database_version = "MYSQL_5_7"
  region           = "us-central1"

  settings {
    tier = "db-f1-micro"
  }
}
resource "google_sql_database" "database" {
  name     = "my-database"
  instance = google_sql_database_instance.db.name
}

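The resource types themselves are provider-specific (azurerm_mysql_server and google_sql_database_instance take different arguments), but both are written in the same language and go through the same plan and apply workflow. That gives you a consistent interface over very different underlying APIs.

This consistency makes switching between clouds easier without massive changes, especially when provider-specific resources are wrapped in modules that expose the same inputs and outputs. You customize for each cloud provider at the edges where necessary.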

Some key areas Terraform provides multi-cloud abstractions:

  • Compute — VMs, containers, Kubernetes, scaling
  • Network — Subnets, routing, security groups, load balancing
  • Storage — Blobs, disks, databases
  • Identity — Roles, permissions, access controls
  • Infrastructure — DNS, VPNs, rules, policies

Build your configurations using these multi-cloud abstractions as much as possible for ultimate portability.
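
One way to hide a provider-specific detail behind a common input is a lookup table. The sketch below maps a generic database size to the Azure SKU and GCP tier used in the examples above; the variable names, the "small" size label, and the output are assumptions made for illustration.

```hcl
variable "cloud" {
  type = string

  validation {
    condition     = contains(["azure", "gcp"], var.cloud)
    error_message = "The cloud variable must be \"azure\" or \"gcp\"."
  }
}

variable "db_size" {
  type    = string
  default = "small"
}

locals {
  # Provider-specific SKU names hidden behind one generic size label
  db_sku_by_cloud = {
    azure = { small = "B_Gen5_2" }
    gcp   = { small = "db-f1-micro" }
  }

  db_sku = local.db_sku_by_cloud[var.cloud][var.db_size]
}

output "db_sku" {
  value = local.db_sku
}
```

Callers then only choose a cloud and a size, and the provider-specific string stays in one place.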

Provisioning and Dependencies

A best practice when provisioning multi-cloud infrastructure is structuring resources into layers of dependencies.

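Higher-level constructs depend on lower-level ones. For example, a virtual machine depends on a network interface, which in turn depends on a subnet.

Terraform infers most dependencies from references between resources. For dependencies it cannot see from references, use the depends_on meta-argument to define them explicitly: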

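resource "azurerm_subnet" "public" {
  #...
}
resource "azurerm_network_interface" "nic" {
  #...
  ip_configuration {
    name                          = "internal"
    subnet_id                     = azurerm_subnet.public.id # implicit dependency on the subnet
    private_ip_address_allocation = "Dynamic"
  }
}
resource "azurerm_virtual_machine" "main" {
  #...

  network_interface_ids = [
    azurerm_network_interface.nic.id, # implicit dependency on the NIC
  ]
  # Redundant here because the reference above already orders the two resources;
  # shown only to illustrate the syntax
  depends_on = [
    azurerm_network_interface.nic
  ]
}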

Terraform analyzes dependencies between resources and plans and applies changes in the right order.

Some tips for dependencies in multi-cloud setups:

  • Put common building blocks like networks in base modules
  • Separate distinct components into layers
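  • Rely on implicit references where possible and add depends_on only for dependencies Terraform cannot infer
  • Watch for dependency cycles, which Terraform rejects with a cycle error
  • When layers live in separate configurations, plan and apply them in dependency order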

Getting resource dependencies right is crucial for successful multi-cloud provisioning.
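
When layers are split into separate configurations, a higher layer can read the outputs of a lower one from remote state. A minimal sketch follows, assuming the network layer stores its state in an S3 bucket and exports a public_subnet_id output. The bucket, key, output name, and var.ami_id are all hypothetical.

```hcl
variable "ami_id" {
  type = string
}

# Read the outputs of the (separately applied) network layer
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "example-terraform-state"
    key    = "network/terraform.tfstate"
    region = "us-east-1"
  }
}

# The compute layer consumes the subnet created by the network layer
resource "aws_instance" "app" {
  ami           = var.ami_id
  instance_type = "t3.micro"
  subnet_id     = data.terraform_remote_state.network.outputs.public_subnet_id
}
```

The network layer has to be applied first, since the data source fails if the output does not exist yet.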

Policy Enforcement and Governance

Enforcing organizational policies and compliance requirements is key for multi-cloud management.

Some examples of governance rules:

  • Restricting VM types
  • Controlling regions and zones for data sovereignty
  • Setting tagging rules and standards
  • Limiting networking exposure
  • Enforcing encryption requirements
  • Restricting high-risk resource usage

Terraform enables policy enforcement via:

Variables

Constrain allowed values:

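variable "region" {
  type    = string
  default = "us-east-1"

  # A default alone does not restrict anything; the validation block does
  validation {
    condition     = contains(["us-east-1"], var.region)
    error_message = "Region must be us-east-1."
  }
}
provider "aws" {
  region = var.region # Can only be us-east-1
}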

Modules

Encapsulate and reuse governance logic in modules:

module "server" {
  source = "./modules/certified_server"
  # Certified server sets sane defaults
}

Sentinel Policies

Apply targeted policies that restrict resources:

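import "tfplan/v2" as tfplan
disallowed_types = ["t2.micro", "t3.micro"]

# Passes only if no planned aws_instance uses a disallowed instance type
main = rule {
  all tfplan.resource_changes as _, rc {
    rc.type is not "aws_instance" or rc.change.after is null or
      rc.change.after.instance_type not in disallowed_types
  }
}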

These mechanisms enable embedding governance directly into configurations and make compliance easier.
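
Tagging rules can also be checked inside the configuration itself. The sketch below uses a precondition (available since Terraform 1.2) to refuse a plan when required tags are missing. The bucket name and the required tag keys are assumptions chosen for illustration.

```hcl
variable "tags" {
  type = map(string)
}

locals {
  # Hypothetical organization-wide required tag keys
  required_tags = ["owner", "cost-center"]
}

resource "aws_s3_bucket" "data" {
  bucket = "example-governed-bucket"
  tags   = var.tags

  lifecycle {
    precondition {
      condition = alltrue([
        for key in local.required_tags : contains(keys(var.tags), key)
      ])
      error_message = "Every resource must carry the owner and cost-center tags."
    }
  }
}
```

Unlike a Sentinel policy, this check travels with the module, so it also runs for teams that use plain open-source Terraform.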

Networking Topologies

Managing network connectivity across clouds adds complexity. Some hybrid cloud network topologies include:

Pairing

Connect cloud regions to on-prem data centers. Useful for migration and hybrid workloads.

Hub-and-Spoke

Central hub VPC with connectivity to multiple cloud spokes. Enables transitivity between spokes.

Mesh

Fully meshed network with site-to-site VPN between clouds. Provides direct region communication.

Terraform simplifies orchestrating these network topologies via the AWS, Azure (azurerm), and Google providers, each of which exposes its cloud's networking resources.

For example, creating a site-to-site VPN from AWS to Azure:

# AWS side
resource "aws_customer_gateway" "gw" {
  bgp_asn    = 65002
  ip_address = "172.0.0.1" # placeholder: use the public IP of the Azure VPN gateway
  type       = "ipsec.1"
}
resource "aws_vpn_connection" "main" {
  vpn_gateway_id      = aws_vpn_gateway.vgw.id
  customer_gateway_id = aws_customer_gateway.gw.id
  type                = "ipsec.1"
  static_routes_only  = true

  tunnel1_ike_versions = ["ikev2"]
  tunnel2_ike_versions = ["ikev2"]
  # Group 14 is accepted on both sides; group 31 is not among the DH groups
  # that AWS and Azure VPN gateways support
  tunnel1_phase1_dh_group_numbers = [14]
  tunnel2_phase1_dh_group_numbers = [14]
}
# Azure side
resource "azurerm_local_network_gateway" "lgw" {
  name                = "aws-conn"
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
  # The local network gateway describes the remote (AWS) end of the tunnel
  gateway_address = aws_vpn_connection.main.tunnel1_address
  address_space   = ["172.16.0.0/16"]
}
resource "azurerm_virtual_network_gateway_connection" "main" {
  name                       = "aws-conn"
  resource_group_name        = azurerm_resource_group.rg.name
  location                   = azurerm_resource_group.rg.location
  type                       = "IPsec"
  virtual_network_gateway_id = azurerm_virtual_network_gateway.vgw.id
  local_network_gateway_id   = azurerm_local_network_gateway.lgw.id
  shared_key                 = aws_vpn_connection.main.tunnel1_preshared_key

  ipsec_policy {
    dh_group         = "DHGroup14"
    ike_encryption   = "AES256"
    ike_integrity    = "SHA256"
    ipsec_encryption = "AES256"
    ipsec_integrity  = "SHA256"
    pfs_group        = "PFS2048"
    sa_datasize      = 536870912
  }
}

Both ends of the tunnel are defined in one configuration, so values such as the pre-shared key pass directly from the AWS resources to the Azure ones without manual copying.
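
Because the connection above sets static_routes_only = true, AWS also needs to be told which address range sits behind the tunnel. A minimal sketch, assuming the Azure VNet uses 10.2.0.0/16 and that a route table named aws_route_table.private exists (both are assumptions):

```hcl
# Static route for the address range on the Azure side of the tunnel
resource "aws_vpn_connection_route" "azure" {
  vpn_connection_id      = aws_vpn_connection.main.id
  destination_cidr_block = "10.2.0.0/16" # assumed Azure VNet range
}

# Propagate VPN routes into a VPC route table
# (aws_route_table.private is a hypothetical route table)
resource "aws_vpn_gateway_route_propagation" "azure" {
  vpn_gateway_id = aws_vpn_gateway.vgw.id
  route_table_id = aws_route_table.private.id
}
```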

Deployment Patterns

When deploying to multiple clouds, using patterns like blue-green, canary, and multi-region can simplify management.

Blue-Green

Blue-green deployment runs the new version alongside the old one and then switches traffic over. Because the old environment stays up, rollback is a matter of switching traffic back.

For example:

# Blue environment
module "blue_env" {
  source = "./env"
  color  = "blue"
}
# Green environment
module "green_env" {
  source = "./env"
  color  = "green"
  # Initially no traffic
  traffic_weight = 0
}
# Traffic splitter
resource "aws_lb" "main" {
  # Send 100% traffic to blue
}

Then shift traffic from blue to green via the load balancer, either all at once or gradually.
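
On an AWS Application Load Balancer, the traffic split can be expressed as weighted target groups on a listener. The sketch below is one possible way to do it. The green_weight variable and the aws_lb_target_group.blue and aws_lb_target_group.green resources are hypothetical and would need to be defined by the environments.

```hcl
variable "green_weight" {
  type    = number
  default = 0

  validation {
    condition     = var.green_weight >= 0 && var.green_weight <= 100
    error_message = "green_weight must be between 0 and 100."
  }
}

resource "aws_lb_listener" "web" {
  load_balancer_arn = aws_lb.main.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type = "forward"

    forward {
      # Blue receives whatever share green does not
      target_group {
        arn    = aws_lb_target_group.blue.arn
        weight = 100 - var.green_weight
      }
      target_group {
        arn    = aws_lb_target_group.green.arn
        weight = var.green_weight
      }
    }
  }
}
```

Raising green_weight from 0 to 100 over several applies gives a gradual cutover, and setting it back to 0 is the rollback.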

Canary

Like blue-green but releases to a small percentage of users first.

module "prod_env" {
  source = "./env"
  # Most traffic goes to prod
}
module "canary_env" {
  source = "./env"
  # Small % to canary
  traffic_weight = 0.1
}

Multi-Region

Provision resources in multiple regions for high availability and low latency.

For example:

# West region
module "west" {
  source = "./region"
  providers = {
    # Key is the provider name inside the module, value is the alias to use
    azurerm = azurerm.west
  }
}
# East region
module "east" {
  source = "./region"
  providers = {
    azurerm = azurerm.east
  }
}

Then use DNS and load balancing to distribute globally.
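
The aliased providers passed to the modules above have to be declared in the root configuration. A minimal sketch follows. Note that azurerm has no region setting at the provider level, so in this example the aliases differ by subscription, and the region itself would be passed to each module as an input. The subscription variables are assumptions.

```hcl
variable "west_subscription_id" {
  type = string
}

variable "east_subscription_id" {
  type = string
}

provider "azurerm" {
  alias           = "west"
  subscription_id = var.west_subscription_id
  features {}
}

provider "azurerm" {
  alias           = "east"
  subscription_id = var.east_subscription_id
  features {}
}
```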

These patterns simplify multi-cloud deployments. Module reuse helps consistency.

Integration Testing

Validating deployments across multiple clouds requires automated integration testing.

Some best practices for multi-cloud testing with Terraform:

  • Infrastructure tests — Validate resources in the right configuration using terraform plan and terraform show
  • Provisioning tests — Deploy disposable test environments from scratch
  • Failure tests — Simulate failures like terminated instances
  • Service tests — Validate service reachability and behavior with mocking

For example, creating a disposable test environment:

module "test_env" {
  source = "./env"
  providers = {
    aws = aws.test
  }
}

And a simple connectivity check:

resource "null_resource" "check_connectivity" {
  provisioner "local-exec" {
    command = "ping -c 3 ${module.test_env.ip}"
  }
}

These automated tests provide safety for frequent multi-cloud changes.
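
Terraform 1.6 and later also ship a native test framework that fits the "disposable environment" idea. A minimal sketch, assuming the ./env module exposes an ip output (as the ping example does) and that the file lives at the hypothetical path env/tests/connectivity.tftest.hcl:

```hcl
# Run with: terraform test (from inside ./env)
# The run block applies the module, evaluates the assertion,
# and Terraform destroys the resources when the test finishes.
run "create_env" {
  command = apply

  assert {
    condition     = output.ip != ""
    error_message = "The env module should output a non-empty IP address."
  }
}
```

If the module has required input variables, a variables block in the same file would have to supply them.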

Multi-Cloud Testing

Validating deployments across heterogeneous environments requires automated testing.

Some best practices using Terraform:

  • Provision disposable test environments
  • Verify resource configurations
  • Simulate different failure scenarios
  • Use mocks for dependencies

For example:

module "test_env" {
  source = "./env"
  # Test AWS account credentials
}
# Windows variant of the connectivity check (ping.exe -n in place of ping -c)
resource "null_resource" "check_connectivity" {
  provisioner "local-exec" {
    command = "ping.exe -n 3 ${module.test_env.ip}"
  }
}

This creates an isolated environment to safely test changes.

Other examples:

  • Replay historical traffic for performance tests
  • Inject faults like terminated instances
  • Stub out external services for deterministic tests

Automated testing is key to maintaining confidence when refactoring components.
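
For the "use mocks for dependencies" point, Terraform 1.7 and later can replace a real provider with a mock inside a test file, so no cloud credentials or real resources are needed. This is a sketch under the same assumption as before, namely that the module under test exposes an ip output. The file name is hypothetical.

```hcl
# tests/mocked.tftest.hcl (hypothetical file name)
# The mock provider generates placeholder values for computed attributes
# in place of calling the AWS API.
mock_provider "aws" {}

run "apply_with_mocks" {
  command = apply

  assert {
    condition     = length(output.ip) > 0
    error_message = "The module should still produce an ip output under mocks."
  }
}
```

Mocked runs check wiring and logic only. They say nothing about whether the real cloud would accept the configuration, so they complement the disposable environments but do not replace them.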

Observability

Observability into logs, metrics, and traces across heterogeneous clouds is difficult.

Terraform enables aggregating observability data into shared platforms:

Logging

resource "aws_cloudwatch_log_group" "app" {
  name = "/aws/lambda/app"
}
resource "azurerm_monitor_diagnostic_setting" "app" {
  name                       = "diag"
  target_resource_id         = azurerm_function_app.app.id
  log_analytics_workspace_id = azurerm_log_analytics_workspace.main.id

  # enabled_log replaces the deprecated log block in recent azurerm releases
  enabled_log {
    category = "FunctionAppLogs"
  }
}
resource "google_logging_project_sink" "app" {
  name = "app-sink"
  # Sink destinations are service-qualified, not bare bucket names
  destination = "storage.googleapis.com/${google_storage_bucket.logs.name}"
}

Centrally aggregate logs in tools like Splunk.

Metrics

resource "signalfx_detector" "latency" {
  name = "High latency"

  program_text = <<-EOF
    A = data('latency', filter=filter('cloud', '*') and filter('env', '*')).publish(label='A')
    B = (A).sum(by=['cloud', 'env']).publish(label='B')
    detect(when(B > 1000, '5m')).publish('High latency!')
  EOF

  # Each detect() label needs a matching rule
  rule {
    detect_label = "High latency!"
    severity     = "Critical"
  }
}

Unify metrics in platforms like Splunk Observability Cloud (the SignalFx provider used above) or Datadog.

Tracing

module "opentelemetry" {
  source = "./opentelemetry"
  providers = {
    aws     = aws
    azurerm = azurerm # the key must match the provider's local name, azurerm
    google  = google
  }
}

Common formats like OpenTelemetry simplify correlating traces.

With Terraform, you can build shared observability patterns across diverse platforms.

Example: Multi-Cloud Web Application

Let’s go through a real-world example deploying a multi-cloud web application with Terraform.

We will deploy the app infrastructure across clusters in AWS ECS and Azure Container Instances (ACI). A global load balancer will distribute traffic between the platforms.

Networking

First, we define the AWS VPC and the Azure VNet with non-overlapping address ranges:

# AWS VPC and subnets
resource "aws_vpc" "main" {
  cidr_block = "10.1.0.0/16"
}
resource "aws_subnet" "public" {
  vpc_id     = aws_vpc.main.id
  cidr_block = "10.1.1.0/24"
}
# Azure VNet
resource "azurerm_virtual_network" "main" {
  name                = "app-network"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
  address_space       = ["10.2.0.0/16"]
}
resource "azurerm_subnet" "public" {
  name                 = "public-subnet"
  resource_group_name  = azurerm_resource_group.main.name
  virtual_network_name = azurerm_virtual_network.main.name
  address_prefixes     = ["10.2.1.0/24"]
}

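Native peering cannot join these two networks: aws_vpc_peering_connection only links AWS VPCs, and azurerm_virtual_network_peering only links Azure VNets. Connect them with a site-to-site VPN instead, using the same resources as in the Networking Topologies section:

# AWS side: VPN gateway attached to the VPC
resource "aws_vpn_gateway" "vgw" {
  vpc_id = aws_vpc.main.id
}
# Azure side: local network gateway describing the AWS network.
# aws_vpn_connection.main is the connection from the Networking Topologies section.
resource "azurerm_local_network_gateway" "lgw" {
  name                = "aws-conn"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
  gateway_address     = aws_vpn_connection.main.tunnel1_address
  address_space       = [aws_vpc.main.cidr_block]
}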

Compute

Deploy Azure Container Instances:

resource "azurerm_container_group" "app" {
  name                = "app-aci"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
  ip_address_type     = "Public" # the value is case-sensitive
  dns_name_label      = "app-aci"
  os_type             = "Linux"

  container {
    name   = "app"
    image  = "myapp:v1"
    cpu    = 1
    memory = 1

    ports {
      port     = 80
      protocol = "TCP"
    }
  }
}

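And an Amazon ECS cluster and service:

resource "aws_ecs_cluster" "main" {
  name = "myapp-cluster"
}
resource "aws_ecs_service" "web" {
  name            = "web"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = 3
  launch_type     = "FARGATE"

  # Fargate tasks use awsvpc networking, so the service needs a network configuration
  network_configuration {
    subnets          = [aws_subnet.public.id]
    assign_public_ip = true
  }
}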

Load Balancing

Deploy a load balancer in each cloud. Neither is global on its own; DNS spreads traffic between them:

resource "aws_lb" "web" {
  name     = "myapp-lb"
  internal = false

  # An Application Load Balancer needs subnets in at least two Availability Zones,
  # so a second public subnet would have to be added to this list
  subnets = [
    aws_subnet.public.id
  ]
}
resource "azurerm_lb" "web" {
  name                = "myapp-lb"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name

  frontend_ip_configuration {
    name                 = "public"
    public_ip_address_id = azurerm_public_ip.main.id
  }
}

DNS

Route a single domain across load balancers:

resource "aws_route53_zone" "main" {
  name = "myapp.com"
}
resource "aws_route53_record" "webapp" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "webapp.myapp.com"
  type    = "CNAME"
  ttl     = 300
  records = [aws_lb.web.dns_name]
}
resource "azurerm_dns_cname_record" "webapp" {
  name                = "webapp"
  zone_name           = azurerm_dns_zone.main.name
  resource_group_name = azurerm_resource_group.main.name
  ttl                 = 300
  # azurerm_lb exports no fqdn attribute; the public IP does once domain_name_label is set
  record = azurerm_public_ip.main.fqdn
}
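
To actually split one hostname between the two clouds, a single DNS zone has to hold records for both load balancers. One possible approach is Route 53 weighted routing, sketched below. These two records would take the place of the simple webapp record, and the 50/50 weights and 60-second TTL are assumptions.

```hcl
# Weighted record pointing at the AWS load balancer
resource "aws_route53_record" "webapp_aws" {
  zone_id        = aws_route53_zone.main.zone_id
  name           = "webapp.myapp.com"
  type           = "CNAME"
  ttl            = 60
  set_identifier = "aws"
  records        = [aws_lb.web.dns_name]

  weighted_routing_policy {
    weight = 50
  }
}

# Weighted record pointing at the Azure load balancer's public IP
resource "aws_route53_record" "webapp_azure" {
  zone_id        = aws_route53_zone.main.zone_id
  name           = "webapp.myapp.com"
  type           = "CNAME"
  ttl            = 60
  set_identifier = "azure"
  records        = [azurerm_public_ip.main.fqdn]

  weighted_routing_policy {
    weight = 50
  }
}
```

Adjusting the weights shifts traffic between clouds, and attaching health checks to the records would let DNS drain a cloud that is failing.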

This provides the core infrastructure to run the web application across multiple clouds, connected together via load balancing and DNS.

The Terraform abstractions allow us to express the architecture in a consistent way across cloud platforms. We can extend the deployment and add additional components like databases, object storage, and cache as needed.

Conclusion

In this blog, I covered patterns and best practices for managing multi-cloud infrastructure, networking, deployment, testing, and monitoring using Terraform including:

  • Abstractions — Build portable configs using abstracted providers and resources
  • Modules — Encapsulate complex components into reusable modules
  • State — Store remote state to enable collaboration
  • Networking — Orchestrate connectivity across VPCs and VNETs
  • Deployment — Use blue-green, canary, and multi-region patterns
  • Testing — Automated provisioning of test environments
  • Observability — Centralize logs, metrics, and traces

Terraform provides a consistent workflow to provision and manage infrastructure across public clouds, private data centers, and SaaS. With the patterns covered in this guide, you can overcome the complexity of bridging across diverse APIs, platforms, and topologies to enable easier multi-cloud automation and orchestration.


Bijit Ghosh

CTO | Senior Engineering Leader focused on Cloud Native | AI/ML | DevSecOps