Terraform: Building Multi-Cloud Infrastructure for AWS and GCP: A Comparative Analysis
In this article, the objective is to build similar infrastructure on both AWS and GCP, so that an organization can run its business-critical applications on multi-cloud infrastructure. Using Terraform, we can define this multi-cloud infrastructure as “Infrastructure as Code” (IaC), with better consistency, maintainability, and reduced cost.
Prerequisites
We won’t be able to cover the basics of each cloud provider’s infrastructure here, so an understanding of the core AWS and GCP offerings and resources, as well as the basics of Terraform, is desirable.
Summary of Topics
- Understanding the requirement for a simple customer-facing application
- Cloud Architecture for a simple application — AWS & GCP
- Terraform Resources for AWS & GCP
- Code Structure — Modules
- Detailed walkthrough of the code
- GCP & AWS Console Output of resource creation
- Conclusion
Understanding the Requirement
Let’s take the example of a simple application: web servers serve traffic from across different regions, micro-services/apps on the backend implement the business logic, and a database provides persistence. Since this is a business-critical application, you want it to be secure, highly available, and fault-tolerant. So we need auto scaling, load balancing, and security rules to define and limit access to these resources.
I have also made a few features customizable so that you can define them differently for your Dev, Test, and Prod environments. For example, we will see later that we can control how many micro-service/app instances to create in each zone, whether load balancing for the web servers is enabled or turned off, and the autoscaler policies.
The idea is to eventually take this to the next level, where you define what resources your application needs in a cloud-agnostic way, and a simple Python script generates the configuration (the variables which control what infra to create) for each cloud provider. I will cover that in the next part of this article. For now, let’s get started creating infrastructure for an application on multiple clouds, specifically AWS and GCP.
Cloud Architecture: AWS & GCP
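The high-level architecture is the same on both clouds: web servers behind an HTTP load balancer with auto scaling, micro-service/app VMs spread across zones, and a managed PostgreSQL database, all inside a custom VPC with firewall rules restricting access.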
Terraform Resources for AWS & GCP
We will be building the following resources in both cloud providers:
a) VMs for micro-services deployment
b) VMs for hosting web servers
c) An RDBMS (PostgreSQL or MySQL) for persistence
d) Auto scalers and load balancers
e) Security rules/firewalls for restrictive access
Both GCP and AWS have their own ways of defining these resources, and hence there are subtle differences in the Terraform code for each provider. The table below shows a rough mapping of the AWS and GCP resources needed to create the above list for this simple app (the AWS names shown are the common equivalents; the exact resources are in the repo’s AWS code):
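| What we need | AWS | GCP |
| --- | --- | --- |
| VMs for micro-services/apps | aws_instance | google_compute_instance |
| Web server template | aws_launch_configuration | google_compute_instance_template |
| Managed RDBMS | aws_db_instance | google_sql_database_instance, google_sql_user |
| Auto scaling | aws_autoscaling_group | google_compute_instance_group_manager, google_compute_autoscaler |
| Load balancing | aws_elb | google_compute_backend_service, google_compute_url_map, google_compute_target_http_proxy, google_compute_global_forwarding_rule, google_compute_http_health_check |
| Networking | aws_vpc, aws_subnet | google_compute_network, google_compute_subnetwork |
| Security rules | aws_security_group | google_compute_firewall |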
As you can see, there are quite a few differences in how you create the same infrastructure on the two providers. For example, in AWS you create an elastic load balancer with the aws_elb resource, whereas GCP pairs a backend service with a forwarding rule, a URL map, and an HTTP proxy to define a complete load balancer configuration.
Code Structure — Modules
Before I explain the code snippets for each provider, let’s look at the directory structure for GCP and AWS, sketched after the module list below. I have tried not to over-modularize it, to keep the discussion simple.
I have created 4 modules for AWS and 5 for GCP, because of the way I had to define an Instance Template for GCP.
Database Module (db) - creates a PostgreSQL database and a DB user.
Load Balancer Module (lb) - creates the necessary load balancer infrastructure for each provider.
Microservice Instance Module - creates VM instances in multiple zones as per the configuration defined through variables. I am simply deploying multiple VM instances for the backend services and keeping Docker deployment out of scope for this discussion.
VPC Module - creates a non-default VPC and subnets for each zone.
Instance Template Module - GCP only; defines a template for creating the web servers.
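Based on the module source paths and file names referenced below, the GCP side of the repo is laid out roughly like this (the AWS tree mirrors it, minus the instance-template module):

```
GCP/
├── main.tf
├── main-variables.tf
├── terraform.tfvars
└── modules/
    ├── database/
    │   └── database.tf
    ├── instance-template/
    │   ├── instance-template.tf
    │   ├── it-outputs.tf
    │   └── scripts/
    │       └── startup.sh
    ├── lb/
    │   └── lb.tf
    ├── microservice-instance/
    │   └── microservice-instance.tf
    └── vpc/
        ├── vpc.tf
        └── vpc-output.tf
```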
You can find all the code in the following GitHub repo:
Detailed Walkthrough of the Code
Let’s look at the code now for one of the providers (GCP). You can find the AWS-specific code in the above GitHub repo.
The code is structured into various modules; main.tf mostly just includes and wires up all the modules:
GCP/main.tf:

```hcl
provider "google" {
  credentials = file(var.credentials)
  project     = var.project
  region      = var.region
}

# Include modules

# Create a PostgreSQL database
module "database" {
  count  = var.no_of_db_instances
  source = "./modules/database"

  # nat_ip = module.microservice-instance[0].nat_ip[0]
  no_of_db_instances = var.no_of_db_instances
  db_user            = var.db_user
  db_password        = var.db_password
}

# Create load balancer and auto scaling
module "lb" {
  count  = var.enable_autoscaling ? 1 : 0
  source = "./modules/lb"

  name                 = var.name
  project              = var.project
  region               = var.region
  webserver_count      = var.webserver_count
  ws-instance_template = module.instance-template.ws-instance_template
  zones                = var.zones
}

# Create an instance template for the web servers, used for auto scaling
module "instance-template" {
  source = "./modules/instance-template"

  name               = var.project
  env                = var.env
  project            = var.project
  region             = var.region
  vpc-name           = module.vpc.vpc-name
  image              = var.image
  instance_type      = var.instance_type
  subnet-self_link   = module.vpc.subnet-self_link
  enable_autoscaling = var.enable_autoscaling
}

# Create microservices/apps in each zone
module "microservice-instance" {
  count  = var.appserver_count
  source = "./modules/microservice-instance"

  appserver_count  = var.appserver_count
  image            = var.image
  instance_type    = var.instance_type
  zones            = var.zones
  vpc-name         = module.vpc.vpc-name
  subnet-self_link = module.vpc.subnet-self_link
}

# Create VPC and subnets
module "vpc" {
  source = "./modules/vpc"

  region = var.region
  name   = var.name
}
```
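One thing worth noting in main.tf: count is used on the module blocks themselves (database, lb, microservice-instance). Module-level count and for_each require Terraform 0.13 or later; on older versions you would have to move the count down into the modules’ resources.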
In main-variables.tf, I have declared all the variables needed to customize the infrastructure. Some variables have default values; for the others, I provide values from terraform.tfvars.
GCP/main-variables.tf:

```hcl
variable "name" { default = "xcloud-project" }
variable "project" {}
variable "credentials" {}
variable "region" { default = "europe-west2" }
variable "zones" { default = ["europe-west2-a", "europe-west2-b"] }
variable "env" { default = "dev" }
variable "network_name" { default = "xcloud network" }
variable "image" { default = "ubuntu-os-cloud/ubuntu-1804-lts" }
variable "appserver_count" { default = 1 }
variable "webserver_count" { default = 2 }
variable "app_image" { default = "centos-7-v20170918" }
variable "instance_type" { default = "f1-micro" }
variable "no_of_db_instances" { default = 1 }
# variable "create_default_vpc" { default = true }
variable "enable_autoscaling" { default = true }
variable "db_user" {}
variable "db_password" {}
```
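For reference, a minimal terraform.tfvars could look like the following (all values here are placeholders, not from the original repo):

```hcl
project     = "my-gcp-project-id" # hypothetical project ID
credentials = "account.json"      # path to a service account key file
db_user     = "appuser"           # example value
db_password = "change-me"         # example only; prefer TF_VAR_db_password or a secret store
```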
In the database module, I create a PostgreSQL database instance and a DB user.
GCP/modules/database/database.tf:

```hcl
# Random suffix, so repeated runs don't collide with recently deleted instance names.
resource "random_id" "db_name_suffix" {
  byte_length = 4
}

resource "google_sql_database_instance" "postgres_db" {
  name                = "postgres-instance-${random_id.db_name_suffix.hex}"
  database_version    = "POSTGRES_11"
  deletion_protection = false

  settings {
    tier = "db-f1-micro"

    ip_configuration {
      # Note: 0.0.0.0/0 opens the instance to the world - fine for a demo,
      # but restrict this to known CIDRs in a real environment.
      authorized_networks {
        value = "0.0.0.0/0"
        name  = "all"
      }
    }
  }
}

resource "google_sql_user" "users" {
  name     = var.db_user
  instance = google_sql_database_instance.postgres_db.name
  password = var.db_password
}
```
The instance-template module defines a template which the load balancer and autoscaler use to create web server instances. It runs a simple startup script (GCP’s counterpart to AWS user_data) which installs an Apache server and creates a hello HTML page. It outputs ws-instance_template, which is used by the Instance Group Manager in the load balancer module.
GCP/modules/instance-template/instance-template.tf:

```hcl
# Render the startup script that installs the web server.
data "template_file" "init" {
  template = file("${path.module}/scripts/startup.sh")
}

resource "google_compute_instance_template" "ws-instance_template" {
  count        = var.enable_autoscaling ? 1 : 0
  name         = "${var.name}-webserver-instance-template"
  project      = var.project
  machine_type = var.instance_type
  region       = var.region

  metadata = {
    # ssh-keys = "${var.user}:${file("${var.ssh_key}")}"
  }

  disk {
    source_image = var.image
    auto_delete  = true
    boot         = true
  }

  network_interface {
    network    = var.vpc-name
    subnetwork = var.subnet-self_link

    access_config {
      # Ephemeral IP - leaving this block empty will generate a new external IP
      # and assign it to the machine.
    }
  }

  metadata_startup_script = data.template_file.init.rendered

  tags = ["http"]

  labels = {
    environment = var.env
  }
}
```

GCP/modules/instance-template/it-outputs.tf:

```hcl
output "ws-instance_template" {
  # Empty string when autoscaling is disabled and the template isn't created.
  value = var.enable_autoscaling ? google_compute_instance_template.ws-instance_template[0].self_link : ""
}
```
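The startup script itself isn’t shown in this walkthrough; a minimal version matching the description above (Apache plus a hello page, assuming a Debian/Ubuntu image) would be something like this sketch (see the repo for the actual scripts/startup.sh):

```bash
#!/bin/bash
# Install Apache and publish a trivial hello page (Debian/Ubuntu image assumed).
apt-get update
apt-get install -y apache2
echo "<h1>Hello from $(hostname)</h1>" > /var/www/html/index.html
systemctl enable --now apache2
```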
The load balancer module (“lb”) creates an Instance Group Manager per zone, an autoscaler, an HTTP health check, a forwarding rule, a URL map, and a backend service, which together define how the load balancer and autoscaler work.
GCP/modules/lb/lb.tf:

```hcl
# One zonal managed instance group per zone.
resource "google_compute_instance_group_manager" "webservers-group-manager" {
  count              = var.webserver_count
  name               = "${var.name}-instance-group-manager-${count.index}"
  project            = var.project
  base_instance_name = "${var.name}-webserver-instance"
  zone               = element(var.zones, count.index)

  version {
    name = "ws1"
    # instance_template = "${module.instance-template.instance-template}"
    instance_template = var.ws-instance_template
  }

  named_port {
    name = "http"
    port = 80
  }
}

# One autoscaler per instance group.
resource "google_compute_autoscaler" "autoscaler" {
  count   = var.webserver_count
  name    = "${var.name}-scaler-${count.index}"
  project = var.project
  zone    = element(var.zones, count.index)
  target  = element(google_compute_instance_group_manager.webservers-group-manager.*.self_link, count.index)

  autoscaling_policy {
    max_replicas    = 2
    min_replicas    = 1
    cooldown_period = 90

    cpu_utilization {
      target = 0.8
    }
  }
}

resource "google_compute_http_health_check" "healthcheck" {
  name         = "${var.name}-healthcheck"
  project      = var.project
  port         = 80
  request_path = "/"
}

# The backend service fronts the instance groups (one backend per zone).
resource "google_compute_backend_service" "backend_service" {
  name      = "${var.name}-backend-service"
  project   = var.project
  port_name = "http"
  protocol  = "HTTP"

  backend {
    group                 = element(google_compute_instance_group_manager.webservers-group-manager.*.instance_group, 0)
    balancing_mode        = "RATE"
    max_rate_per_instance = 100
  }

  backend {
    group                 = element(google_compute_instance_group_manager.webservers-group-manager.*.instance_group, 1)
    balancing_mode        = "RATE"
    max_rate_per_instance = 100
  }

  health_checks = [google_compute_http_health_check.healthcheck.self_link]
}

resource "google_compute_url_map" "url_map" {
  name            = "${var.name}-url-map"
  project         = var.project
  default_service = google_compute_backend_service.backend_service.self_link
}

resource "google_compute_target_http_proxy" "target_http_proxy" {
  name    = "${var.name}-proxy"
  project = var.project
  url_map = google_compute_url_map.url_map.self_link
}

resource "google_compute_global_forwarding_rule" "global_forwarding_rule" {
  name       = "${var.name}-global-forwarding-rule"
  project    = var.project
  target     = google_compute_target_http_proxy.target_http_proxy.self_link
  port_range = "80"
}
```
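One observation on the backend service above: it hard-codes one backend block per zone, which works because webserver_count matches the two configured zones. If you wanted it to track var.zones automatically, a dynamic block is one way to do it; the following is a sketch, not code from the original repo:

```hcl
# Generate one backend per managed instance group, however many zones exist.
dynamic "backend" {
  for_each = google_compute_instance_group_manager.webservers-group-manager.*.instance_group
  content {
    group                 = backend.value
    balancing_mode        = "RATE"
    max_rate_per_instance = 100
  }
}
```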
The microservice module creates VMs to deploy your business services/apps in each zone. The appserver_count variable controls how many copies of the module are instantiated, and each copy creates one VM per configured zone, so the total number of app VMs is appserver_count times the number of zones.
GCP/modules/microservice-instance/microservice-instance.tf:

```hcl
# Random suffix keeps instance names unique across module copies.
resource "random_id" "app_name_suffix" {
  byte_length = 4
}

# One app VM per configured zone.
resource "google_compute_instance" "apps" {
  for_each     = toset(var.zones)
  name         = "apps-${random_id.app_name_suffix.hex}"
  machine_type = var.instance_type
  zone         = each.key

  boot_disk {
    initialize_params {
      image = var.image
    }
  }

  network_interface {
    # network = "default"
    # network = "google_compute_network.xcloud-vpc.self_link"
    network    = var.vpc-name
    subnetwork = var.subnet-self_link

    access_config {
      # Ephemeral IP
    }
  }
}
```
The VPC module creates a non-default VPC, a subnet, and a firewall rule for the region defined in the variables. Its output file defines two output variables, which are passed to the microservice-instance and instance-template modules.
GCP/modules/vpc/vpc.tf:

```hcl
resource "google_compute_network" "xcloud-vpc" {
  name                    = "xcloud-vpc"
  auto_create_subnetworks = false
}

resource "google_compute_subnetwork" "xcloud-subnet" {
  name          = "${var.name}-subnet"
  ip_cidr_range = "10.0.0.0/24"
  network       = google_compute_network.xcloud-vpc.id
  region        = var.region
}

resource "google_compute_firewall" "xcloud-vpc-firewall" {
  name    = "xcloud-firewall"
  network = google_compute_network.xcloud-vpc.id

  allow {
    protocol = "icmp"
  }

  allow {
    protocol = "tcp"
    ports    = ["80", "8080", "1000-2000"]
  }

  source_tags = ["web"]
}
```

GCP/modules/vpc/vpc-output.tf:

```hcl
output "vpc-name" {
  value = google_compute_network.xcloud-vpc.name
}

output "subnet-self_link" {
  value = google_compute_subnetwork.xcloud-subnet.self_link
}
```
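To stand all of this up, the usual Terraform workflow applies (run from the GCP directory; terraform.tfvars is picked up automatically):

```bash
terraform init      # download the Google provider and initialize the local modules
terraform plan      # preview the resources that will be created
terraform apply     # create the infrastructure
terraform destroy   # tear everything down when you are done
```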
That was all! You can take a look at the corresponding AWS code in the GitHub repo linked above.
GCP & AWS Console Output
GCP Resources created by Terraform Scripts:
AWS Resources created by Terraform Scripts:
Conclusion
Terraform enables us to build ephemeral infrastructure across multiple cloud providers with great consistency and repeatability. It gives you extensive customization, scalability, and version control of infrastructure code, and it ensures parity between environments (Dev, QA, Prod). For most organizations it is also extremely important to keep critical business applications live every second, 24x7, 365 days a year. There have been outages every year at each of the major cloud providers, impacting big organizations such as Salesforce, Netflix, and Adobe. Terraform helps you build consistent infrastructure across different cloud providers and thereby reduce the damage caused by these outages.
As I said at the beginning, the idea is to take this to the next level, where you define resources in a cloud-agnostic way in a simple config and a script creates the same infra on multiple clouds. That’s coming up next; I am still working on it…