Multi-region HA in Google Cloud

Sergey Shcherbakov
Google Cloud - Community
21 min read · Apr 18, 2024


Google Cloud is one of the major cloud “hyperscalers”. Hyperscalers are designed for massive capacity: they operate immense, globally distributed data center networks that can handle the enormous computing demands of large enterprises and applications with vast user bases. As a result, hyperscalers can give applications unsurpassed scalability, reliability and global reach.

In this article we will explore the levels of application availability that are possible in Google Cloud, with a focus on private internal networks, and provide concrete infrastructure configuration examples.

Let’s imagine a business-critical web application or API that provides an important service to business customers or end users, potentially internal ones. The business typically needs the application to minimize downtime and to remain accessible and responsive most of the time. A common measure of success is the service uptime metric, often aiming for targets like “99.99%” (“four nines”) or even “99.999%” (“five nines”), which translate into roughly 52 minutes or 5 minutes of allowed downtime per year, respectively.

The typical mechanisms that an application design can rely upon to improve application availability (as measured by uptime) are:

  • Redundancy — run application on multiple independent hardware instances
  • Load Balancing — distribute incoming network traffic across multiple application instances running on multiple independent hardware instances
  • Failover — mechanisms to automatically detect failures and switch operation to a working application instance seamlessly
  • Monitoring & Alerting — robust monitoring systems to detect problems quickly and preferably proactively notify the team responsible for addressing them
  • Self-healing — the ability of application components to restart themselves or re-provision failing resources with minimal manual intervention

In this article we will concentrate on how Google Cloud can help with the first three means of improving cloud application availability: redundancy, load balancing, failover.

Redundancy

A single application instance, or an application running in a single failure domain, cannot survive an underlying hardware failure, so the application would be unavailable to end users during such an outage:

Fig. 1: Single application instance on single GCE VM

If our business objectives only require surviving the outage of a single Google Compute Engine (GCE) VM, we can apply redundancy and load balancing to improve application availability and resilience to that failure scenario:

Fig. 2: Multiple application instances in single GCE Zone

This setup addresses the failure of a single GCE VM or application instance.

Google Cloud hardware is organized into clusters. A cluster represents a set of compute, network, and storage resources supported by building, power, and cooling infrastructure. Infrastructure components typically support a single cluster, ensuring that clusters share few dependencies. However, components with demonstrated high reliability and downstream redundancy can be shared between clusters. For example, multiple clusters typically share a utility grid substation because substations are extremely reliable and clusters use redundant power systems.

A zone is a deployment area within a region and Compute Engine implements a layer of abstraction between zones and the physical clusters where the zones are hosted. Each zone is hosted in one or more clusters and you can check the Zone virtualization article for more details about that mapping.

To simplify reasoning without sacrificing accuracy, it is fair to assume that a GCE zone is a deployment area within a geographic region, mapped to one or more clusters that can fail together, e.g. because of a power supply outage.

A GCE zone outage is not an impossible scenario, and a highly reliable application on Google Cloud typically sustains its service during such an event by running application replicas in multiple zones:

Fig. 3: Multiple application instances in multiple GCE Zones

With the high zone availability SLA levels provided by Google Compute Engine, the setup in Figure 3 should be sufficient for the majority of business use cases, even for very demanding customers with high application service SLA requirements.

Unfortunately, a full region outage is also not an impossible scenario.

The power of cloud hyperscalers lies especially in the tools they give customers to survive disasters like a full region outage, tools that go significantly beyond what small-scale or localized cloud service providers can offer. In Google Cloud an application can run its replicas not only on power-independent hardware within one data center or geographic location (probably connected to the same power plant in the neighborhood), but also across geographic locations and even across continents!

So we are coming to the next level of application redundancy that is possible with Google Cloud: multi-regional application deployment.

Fig. 4: Multiple application instances in multiple GCE Regions with global external load balancing

With that, a business-critical application can have a strategy for an entire site (region) failure and keep its uptime promise to critical clients even in that unlikely case.

Load Balancing

There needs to be some magic happening in order to seamlessly direct clients from around the world to the application replicas running in multiple geographic locations. And not only that. Whenever a VM, a GCE zone or even a full region goes down, that magic needs to seamlessly redirect application clients to the healthy locations in the surviving regions.

What are the options for load balancing that Google Cloud provides?

In Figure 4 the load balancer is located in Google Cloud but outside of any particular region. That kind of global service can be provided by the following types of Google Cloud load balancers:

  • Global External Application Load Balancer
  • Classic Application Load Balancer in Premium Tier
  • Global External proxy Network Load Balancer
  • Classic Proxy Network Load Balancer

Load balancers of all these types distribute traffic coming from clients on the internet to workloads running on Google Cloud.

An enterprise organization on Google Cloud would typically keep its VPC networks private and expose application workloads only to internal company clients, who are often also spread across the world.

Internal load balancers on Google Cloud restrict access to the application to clients in internal networks only. Unlike the global external ones, internal load balancers currently rely on regional infrastructure. The availability of applications exposed by internal load balancers can therefore be affected by a single cloud region outage.

That means that for the internal clients the multi-regional application deployment depicted in Figure 4 logically changes to:

Fig. 5: Multiple application instances in multiple GCE Regions with internal load balancing

The choice of internal load balancers on Google Cloud is even broader:

  • Regional Internal Application Load Balancer
  • Cross-region Internal Application Load Balancer
  • Regional internal proxy Network Load Balancer
  • Cross-region internal proxy Network Load Balancer
  • Internal passthrough Network Load Balancer

There is an open question with regional internal load balancing though: how would application clients know about a full region outage and seamlessly fail over to the healthy region (the high bar we have set for ourselves)?

To address that challenge we can resort to a well-known technique called DNS load balancing (or round-robin DNS).

Fig. 6: Internal DNS load balancing with regional application backends

The fully managed Cloud DNS service offers an important and convenient tool for setting up such cross-regional client access: Geolocation routing policies. This feature “lets you map traffic originating from source geographies (Google Cloud regions) to specific DNS targets. Use this policy to distribute incoming requests to different service instances based on the traffic’s origin. You can use this feature with the internet, with external traffic, or with traffic originating within Google Cloud and bound for internal passthrough Network Load Balancers. Cloud DNS uses the region where the queries enter Google Cloud as the source geography.”

With Cloud DNS geolocation routing policies in the setup depicted in Figure 6, application clients automatically receive from the Cloud DNS server the IP address of the internal load balancer nearest to their geographic location.
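As an illustration, a geolocation-routed record can also be created with the gcloud CLI. This is a minimal sketch with a made-up zone name, hostname and regional load balancer IP addresses; the health-checked internal load balancer targets used later in this article are configured with Terraform instead:

# Hypothetical managed zone, hostname and regional ILB addresses, for illustration only
gcloud dns record-sets create app.hello.zone. \
    --zone="hello-zone" \
    --type="A" \
    --ttl="30" \
    --routing-policy-type="GEO" \
    --routing-policy-data="europe-west3=10.156.0.11;us-central1=10.199.0.48"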

Please note that Cloud DNS is a fully managed global service offering impressive SLO targets. The DNS cache on the application client side helps keep the application service reachable in the rare case of a Cloud DNS service outage.

In fact, many parts of the Cross-region Internal Application Load Balancers are global Google Cloud resources as well. Here is a more detailed diagram borrowed from the public Google Cloud pages:

Fig. 7: Global resources of the internal cross-region load balancer

Failover

But what exactly happens to the client connections and overall application availability in case of an individual GCE VM, zone or region outage? How would a DNS service know that it needs to resolve application hostnames to a different IP address to direct clients to another (healthy) region?

The Cloud DNS service, and its geolocation routing policies in particular, has yet another feature that completes the multi-regional application deployment puzzle: health checks.

For internal passthrough Network Load Balancers (L4), Cloud DNS checks the health information of the load balancer’s individual backend instances to determine whether the load balancer is healthy or unhealthy. Cloud DNS applies a default 20% threshold: if at least 20% of backend instances are healthy, the load balancer endpoint is considered healthy. For example, with five backend VMs the endpoint stays healthy as long as at least one backend passes its health checks. DNS routing policies mark the endpoint as healthy or unhealthy based on this threshold and route traffic accordingly.

For Internal Application Load Balancers and Cross-region Internal Application Load Balancers, Cloud DNS checks the overall health of the internal Application Load Balancer, and lets the internal Application Load Balancer itself check the health of its backend instances.

Fig. 8: Internal DNS load balancing with cross-region application backends with health checks

Cloud DNS health checks are a crucial solution component for achieving maximum application uptime. Without the ability to test the current status of the internal load balancers and application backends, Cloud DNS could not reason about which IP address should be returned to clients in case of a region outage, and seamless client failover to the healthy application instances would not be possible.

Please note that in order to achieve the best results, the Time-To-Live (TTL) of the application A record in Cloud DNS needs to be set to a minimal value. It could even be zero, in which case clients would contact DNS for the current IP before every call to the application service. The choice of the DNS record TTL is a tradeoff between the application availability requirements on one side and DNS service load and client response latency on the other.
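To see what clients actually cache, you can query the record from a VM inside the VPC; the hostname below is the hypothetical one from the earlier sketch:

# Prints the A record together with the remaining TTL (second column)
dig +noall +answer app.hello.zone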

Internal load balancers maintain their own application backend health checks (Cloud DNS health checks use a different mechanism), and in the case of cross-region internal Application Load Balancers, a load balancer operating in a particular region can automatically fail over and redirect client requests to the application replicas running in another, healthy region.

This setup addresses the “partial” region outage scenario, that is, when only the application backend instances are unavailable (e.g. GCE VMs are down or there is an error in the application preventing it from accepting incoming connections) but other services in the affected region (such as networking and load balancing) continue working.

Configuration with Managed Instance Groups

Let’s combine all pieces of an HA solution discussed before into a single picture and see how the Google Cloud resources need to be configured together to achieve the desired effect.

The GCE Managed Instance Group based scenarios discussed in this article are also relevant to GKE, the managed Kubernetes service on Google Cloud. Kubernetes node pools in GKE are implemented as GCE MIGs, so a Kubernetes workload deployed to GKE can be made multi-regional by deploying the application service to several GKE clusters in distinct regions. The load balancer resources for such a setup can be provisioned in several ways.

In this example we will use Terraform, a common tool for declarative cloud infrastructure definition and provisioning, to set up the load balancers, but the same setup can also be achieved with other approaches, such as the Google Cloud console or the gcloud CLI. You can find the full Terraform example in this GitHub project.

Our first example will be based on the regional Internal Network Passthrough Load Balancers and Google Compute Engine (GCE) Managed Instance Groups (MIGs).

In the last section of this article we’ll discuss the pros and cons of different load balancer and application backend combinations.

First we define the GCE MIGs in two Google Cloud regions:

// modules/mig/mig.tf:
module "gce-container" {
  source = "terraform-google-modules/container-vm/google"
  container = {
    image = var.image
    env = [
      {
        name  = "NAME"
        value = "hello"
      }
    ]
  }
}

data "google_compute_default_service_account" "default" {
}

module "mig_template" {
  source      = "terraform-google-modules/vm/google//modules/instance_template"
  version     = "~> 10.1"
  network     = var.network_id
  subnetwork  = var.subnetwork_id
  name_prefix = "mig-l4rilb"
  service_account = {
    email  = data.google_compute_default_service_account.default.email
    scopes = ["cloud-platform"]
  }
  source_image_family  = "cos-stable"
  source_image_project = "cos-cloud"
  machine_type         = "e2-small"
  source_image         = reverse(split("/", module.gce-container.source_image))[0]
  metadata             = merge(var.additional_metadata, { "gce-container-declaration" = module.gce-container.metadata_value })
  tags = [
    "container-vm-test-mig"
  ]
  labels = {
    "container-vm" = module.gce-container.vm_container_label
  }
}

module "mig" {
  source     = "terraform-google-modules/vm/google//modules/mig"
  version    = "~> 10.1"
  project_id = var.project_id

  region            = var.location
  instance_template = module.mig_template.self_link
  hostname          = "${var.name}"
  target_size       = "1"

  autoscaling_enabled = "true"
  min_replicas        = "1"
  max_replicas        = "1"
  named_ports = [{
    name = var.lb_proto
    port = var.lb_port
  }]

  health_check_name = "${var.name}-http-healthcheck"
  health_check = {
    type                = "http"
    initial_delay_sec   = 10
    check_interval_sec  = 2
    healthy_threshold   = 1
    timeout_sec         = 1
    unhealthy_threshold = 1
    port                = 8080
    response            = ""
    proxy_header        = "NONE"
    request             = ""
    request_path        = "/"
    host                = ""
    enable_logging      = true
  }
}

// mig.tf
module "mig-l4" {
  for_each      = var.locations
  source        = "./mig"
  project_id    = var.project_id
  location      = each.key
  network_id    = data.google_compute_network.lb_network.id
  subnetwork_id = data.google_compute_subnetwork.lb_subnetwork[each.key].id
  name          = "failover-l4-${each.key}"
  image         = var.image
}

Then, let’s define two regional internal passthrough Network Load Balancers (L4 ILBs), one in each respective region:

// modules/l4rilb/l4-rilb.tf
locals {
  named_ports = [{
    name = var.lb_proto
    port = var.lb_port
  }]
  health_check = {
    type                = var.lb_proto
    check_interval_sec  = 1
    healthy_threshold   = 4
    timeout_sec         = 1
    unhealthy_threshold = 5
    response            = ""
    proxy_header        = "NONE"
    port                = var.lb_port
    port_name           = "health-check-port"
    request             = ""
    request_path        = "/"
    host                = "1.2.3.4"
    enable_log          = false
  }
}

module "l4rilb" {
  source        = "GoogleCloudPlatform/lb-internal/google"
  project       = var.project_id
  region        = var.location
  name          = "${var.lb_name}"
  ports         = [local.named_ports[0].port]
  source_tags   = ["allow-group1"]
  target_tags   = ["container-vm-test-mig"]
  health_check  = local.health_check
  global_access = true

  backends = [
    {
      group       = var.mig_instance_group
      description = ""
      failover    = false
    },
  ]
}

// l4-rilb-mig.tf
module "l4-rilb" {
  for_each           = var.locations
  source             = "./modules/l4rilb"
  project_id         = var.project_id
  location           = each.key
  lb_name            = "l4-rilb-${each.key}"
  mig_instance_group = module.mig-l4[each.key].instance_group
  image              = var.image
  network_id         = data.google_compute_network.lb_network.id
  subnetwork_id      = data.google_compute_subnetwork.lb_subnetwork[each.key].name

  depends_on = [
    google_compute_subnetwork.proxy_subnetwork
  ]
}

And now let’s also add the global Cloud DNS record set configuration:

// dns-l4-rilb-mig.tf
resource "google_dns_record_set" "a_l4_rilb_mig_hello" {
  name         = "l4-rilb-mig.${google_dns_managed_zone.hello_zone.dns_name}"
  managed_zone = google_dns_managed_zone.hello_zone.name
  type         = "A"
  ttl          = 1

  routing_policy {
    dynamic "geo" {
      for_each = var.locations
      content {
        location = geo.key
        health_checked_targets {
          internal_load_balancers {
            ip_address         = module.l4-rilb[geo.key].lb_ip_address
            ip_protocol        = "tcp"
            load_balancer_type = "regionalL4ilb"
            network_url        = data.google_compute_network.lb_network.id
            port               = "8080"
            region             = geo.key
            project            = var.project_id
          }
        }
      }
    }
  }
}

After we apply the Terraform configuration to the target Google Cloud project:

terraform init
terraform plan
terraform apply

we get all the solution infrastructure components needed to perform end-to-end testing, including a test application running in GCE VMs in two distinct Google Cloud regions.

Let’s see how the clients can now access our application.

To generate a continuous request flow we can use Fortio, a common tool for testing service mesh application performance. We will run it from a GCE VM attached to the same VPC where the load balancers are installed:

gcloud compute ssh jumpbox

docker run fortio/fortio load --https-insecure -t 1m -qps 1 http://l4mig.hello.zone:8080

The results after a minute of execution should look similar to the following:

IP addresses distribution:
10.156.0.11:8080: 1
Code 200 : 258 (100.0 %)
Response Header Sizes : count 258 avg 390 +/- 0 min 390 max 390 sum 100620
Response Body/Total Sizes : count 258 avg 7759.624 +/- 1.497 min 7758 max 7763 sum 2001983
All done 258 calls (plus 4 warmup) 233.180 ms avg, 17.1 qps

Note the IP address of the L4 regional internal load balancer in the nearest region, which is receiving all of the calls.

In the second console window, SSH into the VM of the GCE MIG in the nearest region and find the application container:

export MIG_VM=$(gcloud compute instances list --format="value(name)" --filter="name~l4-europe-west3")
export MIG_VM_ZONE=$(gcloud compute instances list --format="value(zone)" --filter="name=${MIG_VM}")

gcloud compute ssh --zone $MIG_VM_ZONE $MIG_VM --tunnel-through-iap --project $PROJECT_ID

docker ps
# Pick the application container ID from the docker ps output, e.g.:
export CONTAINER=$(docker ps -q | head -n 1)   # assumes the application container is listed first; adjust if needed

Now let’s run the load test in the first console window again.

While the test is running switch to the second console window and execute

docker stop ${CONTAINER}

Switch to the first console window and notice the failover happening. The output at the end of the execution should look like the following:

IP addresses distribution:
10.156.0.11:8080: 16
10.199.0.48:8080: 4
Code -1 : 12 (10.0 %)
Code 200 : 108 (90.0 %)
Response Header Sizes : count 258 avg 390 +/- 0 min 390 max 390 sum 100620
Response Body/Total Sizes : count 258 avg 7759.624 +/- 1.497 min 7758 max 7763 sum 2001983
All done 120 calls (plus 4 warmup) 83.180 ms avg, 2.0 qps

Please note that the service VM in the Managed Instance Group is automatically restarted. This functionality is provided by Google Compute Engine Managed Instance Groups and implements another of the high availability mechanisms listed at the beginning of the article: self-healing.
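To observe the self-healing in action you can watch the instance list of the regional MIG while the container is stopped; the group name below is an assumption derived from the Terraform variables used earlier and may differ in your project:

# Shows instance status and current actions (e.g. RECREATING) of the regional MIG
gcloud compute instance-groups managed list-instances failover-l4-europe-west3-mig \
    --region=europe-west3 --project=$PROJECT_ID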

Cloud Run Backends

Let’s consider a second scenario and assume that our cloud-native application is implemented as a Cloud Run service.

The Cloud Run based scenarios discussed in this article are relevant for Cloud Functions (2nd generation) application backends as well. Cloud Functions can be configured as load balancer backends similarly to Cloud Run instances, using the same Serverless Network Endpoint Group resources.

The overall multi-region application deployment changes slightly.

Fig. 9: Internal DNS load balancing with cross-region application backends in Cloud Run

We start by defining two regional Cloud Run services that allow invocations from unauthenticated clients:

// modules/cr/cr2.tf
resource "google_cloud_run_v2_service" "cr_service" {
  project      = var.project_id
  name         = "cr2-service"
  location     = var.location
  launch_stage = "GA"

  ingress          = "INGRESS_TRAFFIC_INTERNAL_LOAD_BALANCER"
  custom_audiences = ["cr-service"]

  template {
    containers {
      image = "gcr.io/cloudrun/hello" # public image for your service
    }
  }
  traffic {
    percent = 100
    type    = "TRAFFIC_TARGET_ALLOCATION_TYPE_LATEST"
  }
}

resource "google_compute_region_network_endpoint_group" "cloudrun_v2_sneg" {
  name                  = "cloudrun-sneg"
  network_endpoint_type = "SERVERLESS"
  region                = var.location
  cloud_run {
    service = google_cloud_run_v2_service.cr_service.name
  }
}

resource "google_cloud_run_v2_service_iam_member" "public-access" {
  name     = google_cloud_run_v2_service.cr_service.name
  location = google_cloud_run_v2_service.cr_service.location
  project  = google_cloud_run_v2_service.cr_service.project
  role     = "roles/run.invoker"
  member   = "allUsers"
}

// cr.tf
module "cr-service" {
  for_each   = var.locations
  source     = "./modules/cr"
  project_id = var.project_id
  location   = each.key
  image      = var.image
}

And then define global Internal Cross-Region Application Load Balancer resources:

// modules/l7crilb/l7-crilb.tf
resource "google_compute_global_forwarding_rule" "forwarding_rule" {
  for_each = var.subnetwork_ids
  project  = var.project_id

  name = "${var.lb_name}-${each.key}"

  ip_protocol           = "TCP"
  load_balancing_scheme = "INTERNAL_MANAGED"
  port_range            = var.lb_port
  target                = google_compute_target_https_proxy.https_proxy.self_link
  network               = var.network_id
  subnetwork            = each.value
}

resource "google_compute_target_https_proxy" "https_proxy" {
  project = var.project_id

  name    = "${var.lb_name}"
  url_map = google_compute_url_map.url_map.self_link

  certificate_manager_certificates = [
    var.certificate_id
  ]
  lifecycle {
    ignore_changes = [
      certificate_manager_certificates
    ]
  }
}

resource "google_compute_url_map" "url_map" {
  project = var.project_id

  name            = "${var.lb_name}"
  default_service = google_compute_backend_service.backend_service.self_link
}

resource "google_compute_backend_service" "backend_service" {
  project = var.project_id

  load_balancing_scheme = "INTERNAL_MANAGED"
  session_affinity      = "NONE"

  dynamic "backend" {
    for_each = var.backend_group_ids
    content {
      group           = backend.value
      balancing_mode  = var.balancing_mode
      capacity_scaler = 1.0
    }
  }

  name        = "${var.lb_name}"
  protocol    = var.backend_protocol
  timeout_sec = 30

  // "A backend service cannot have a healthcheck with Serverless network endpoint group backends"
  health_checks = var.is_sneg ? null : [google_compute_health_check.health_check.self_link]

  outlier_detection {
    base_ejection_time {
      nanos   = 0
      seconds = 1
    }
    consecutive_errors           = 3
    enforcing_consecutive_errors = 100
    interval {
      nanos   = 0
      seconds = 1
    }
    max_ejection_percent = 50
  }
}

resource "google_compute_health_check" "health_check" {
  project = var.project_id

  name = "${var.lb_name}"
  http_health_check {
    port_specification = "USE_SERVING_PORT"
  }
}

// l7-crilb-cr.tf
module "l7-crilb-cr" {
  source     = "./modules/l7crilb"
  project_id = var.project_id
  lb_name    = "l7-crilb-cr"

  network_id        = data.google_compute_network.lb_network.name
  subnetwork_ids    = { for k, v in data.google_compute_subnetwork.lb_subnetwork : k => v.id }
  certificate_id    = google_certificate_manager_certificate.ccm-cert.id
  backend_group_ids = [for k, v in module.cr-service : v.sneg_id]
  is_sneg           = true
}

Please note that all load balancer related resources in this case are global (not regional).

In this demo case we need to define the Cloud DNS resources as well:

// dns-l7-crilb-cr.tf
resource "google_dns_record_set" "a_l7_crilb_cr_hello" {
  name         = "l7-crilb-cr.${google_dns_managed_zone.hello_zone.dns_name}"
  managed_zone = google_dns_managed_zone.hello_zone.name
  type         = "A"
  ttl          = 1

  routing_policy {
    dynamic "geo" {
      for_each = var.locations
      content {
        location = geo.key
        health_checked_targets {
          internal_load_balancers {
            ip_address         = module.l7-crilb-cr.lb_ip_address[geo.key]
            ip_protocol        = "tcp"
            load_balancer_type = "globalL7ilb"
            network_url        = data.google_compute_network.lb_network.id
            port               = "443"
            project            = var.project_id
          }
        }
      }
    }
  }
}

For each region where the Cloud Run instance with our application is running we need to create a dedicated Cloud DNS routing policy.

Let’s now apply the Terraform configuration to the target Google Cloud project and see how clients can access our Cloud Run application.

Similarly to the passthrough Network Load Balancer case described in the previous section, we’ll call our application endpoint exposed by Cloud Run via the configured FQDN hostname:

gcloud compute ssh jumpbox

docker run fortio/fortio load --https-insecure \
    -t 5m -qps 1 https://l7-crilb-cr.hello.zone

The results after a minute of execution should look similar to the following:

IP addresses distribution:
10.156.0.55:443: 4
Code 200 : 8 (100.0 %)
Response Header Sizes : count 8 avg 216 +/- 0 min 216 max 216 sum 1728
Response Body/Total Sizes : count 8 avg 226 +/- 0 min 226 max 226 sum 1808
All done 8 calls (plus 4 warmup) 17.066 ms avg, 1.4 qps

With our Fortio tool setup of one call per second, all calls have reached their destination.

The IP address that shows up in the output is the IP of the L7 internal cross-regional load balancer in the nearest region that is receiving all of our calls at the moment.

To simulate a Cloud Run backend service outage, while the Fortio tool started in the previous step keeps running, we can remove the backend resource in the nearest region from the load balancer backend service definition in the second console window, e.g.:

gcloud compute backend-services remove-backend l7-crilb-cr \
--network-endpoint-group=cloudrun-sneg \
--network-endpoint-group-region=europe-west3 \
--global

We can also check to which regions the load balancer sends traffic:

gcloud compute backend-services list --filter="name:l7-crilb-cr"

NAME BACKENDS PROTOCOL
l7-crilb-cr us-central1/networkEndpointGroups/cloudrun-sneg HTTPS

There is only one backend left running in the remote region. Yet, the Fortio results in the first console session show no hiccup or interruption:

IP addresses distribution:
10.156.0.55:443: 4
Code 200 : 300 (100.0 %)
Response Header Sizes : count 300 avg 216.33333 +/- 1.106 min 216 max 220 sum 64900
Response Body/Total Sizes : count 300 avg 226.33333 +/- 1.106 min 226 max 230 sum 67900
All done 300 calls (plus 4 warmup) 193.048 ms avg, 1.0 qps

What we have seen so far is failover on the backend side of the internal cross-region Application Load Balancer. That is, the client application (Fortio) was still accessing the load balancer IP address in the nearest europe-west3 region. This can be verified by running host l7-crilb-cr.hello.zone, which returns the internal load balancer IP address from the subnetwork in the europe-west3 region.

What would happen in case of a full local region outage?

The first use case discussed above (passthrough Network Load Balancer with MIG backends) illustrates that scenario. The Cloud DNS L4 health checks for the passthrough Network Load Balancer test the connection all the way through to the actual application process running in the GCE VMs (it is not possible to configure this type of load balancer with Serverless Network Endpoint Group backends for Cloud Run instances), and Cloud DNS automatically flips the IP address for the application service hostname to the load balancer IP address in another region.
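One way to observe that flip during a failover test is to keep resolving the record from the jumpbox while the backends in the nearest region are down; the hostname below is the A record defined in the first example:

# Re-resolves the record every second; the returned ILB address changes
# once Cloud DNS marks the nearest region's load balancer as unhealthy
watch -n 1 dig +short l4-rilb-mig.hello.zone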

Unfortunately, the Cloud DNS health checks for Application (L7) Load Balancers cannot yet detect an outage of the application backend service with that level of fidelity. Regional and cross-region Application Load Balancers are built on Envoy proxies internally, and for Envoy-based load balancers Cloud DNS only health-checks the state and availability of the Envoy proxy instances, not the application backends themselves.

If an application running in Cloud Run malfunctions (e.g. as a result of an internal program error) and returns 500 response codes, Cloud DNS won’t detect that and won’t switch the load balancer IPs for the application hostname. That situation would be detected by the outlier detection feature of the internal cross-region Application Load Balancer, and the load balancer will redirect traffic to the healthy backend based on the rate of successful calls towards each backend.
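You can inspect the outlier detection settings that the Terraform code above configures on the backend service:

# Prints only the outlier detection portion of the backend service resource
gcloud compute backend-services describe l7-crilb-cr --global \
    --format="yaml(outlierDetection)"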

A missing load balancer backend is not considered an outage by the Cloud DNS health checks either. When the backend resources of a cross-region Application Load Balancer are misconfigured, missing altogether, or malfunctioning, Cloud DNS won’t take action and won’t flip the load balancer IPs automatically. The Cloud DNS health checks only verify the availability of the internal Google Cloud infrastructure (the Envoy proxies) supporting the Application (L7) Load Balancers.

Yet, in the case of a full Google Cloud region outage the load balancer infrastructure itself would not be fully available, and the Cloud DNS health checks would detect that and act as expected.

Options Choice

When designing a highly available application service distributed across multiple regions on Google Cloud, we need to consider the current constraints of the Google Cloud services and pick a combination that supports the application requirements.

Here are the constraints and trade-offs that you should consider when picking the Google Cloud load balancer type for your distributed application.

1. External vs Internal Load Balancers

The global external Application Load Balancer offers the tools for building a geographically distributed application service with the best availability guarantees.

Fig. 10: External application load balancer with backends in Cloud Run

Instead of relying upon the DNS load balancing trick, it provides application endpoint availability to the clients via global anycast Virtual IP addresses and smart global network infrastructure that cleverly routes client traffic to the infrastructure and services in the healthy regions.

The Google Cloud internal load balancer infrastructure, in contrast, is available via regional IP addresses and hence requires an additional mechanism, such as the Cloud DNS load balancing suggested in this article, to address the full region outage scenario.

2. Managed Instance Groups vs Cloud Run Backends

The passthrough Network Load Balancers (L4) cannot be configured with Cloud Run backends: they don’t currently support the Serverless Network Endpoint Groups (NEGs) required for Cloud Run and Cloud Functions based backends.

Hence, if you are building a multi-regional application service that should only be available in the internal company VPC network and would like to address the majority of possible outage scenarios (full region outage, individual regional Google Cloud service outage, application service malfunction), then the passthrough Network Load Balancer (L4) with GCE Managed Instance Group backends is the only option. Remember that GCE MIGs are also the mechanism behind GKE node pools, so the MIG backend option is applicable to Kubernetes workloads running in GKE as well.

An important consideration for the Cloud Run backends in multiple regions is authentication.

In order to seamlessly continue serving authenticated Cloud Run clients, the Cloud Run services in different regions must be configured with custom audiences. That way, the identity token that a client passes along with an authenticated call can be validated and accepted by the Cloud Run backends in all regions. Please note that custom audiences are a feature of 2nd generation Cloud Run services; 1st generation Cloud Run services can be used in the suggested multi-regional setup only when the application service does not need to authenticate clients.
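The demo above allows unauthenticated invocations, but as a sketch of what an authenticated call could look like with the custom audience from the Terraform example (“cr-service”), assuming a hypothetical service account that you are allowed to impersonate:

# Mint an ID token whose audience matches the Cloud Run custom audience
# (client-sa@my-project.iam.gserviceaccount.com is a placeholder)
TOKEN=$(gcloud auth print-identity-token \
    --impersonate-service-account=client-sa@my-project.iam.gserviceaccount.com \
    --audiences="cr-service")

# The same token is accepted by the Cloud Run backends in every region
curl -H "Authorization: Bearer ${TOKEN}" https://l7-crilb-cr.hello.zone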

3. Network Passthrough (L4) vs Application (L7) Load Balancers

The selection of the load balancer type also depends on the functionality the load balancer can provide. In the case of a passthrough (L4) load balancer, the application would need to implement the following tasks by itself (to name a few):

  • terminate TLS connections
  • authenticate incoming calls
  • implement request routing

Application (L7) Load Balancers can help with all of that, but their internal versions address fewer failure scenarios than the passthrough Network Load Balancer (L4) based solution because of the current feature level of the Cloud DNS health check mechanism. For example, Cloud DNS would not flip an application service IP address if the application is experiencing an internal malfunction (e.g. returning 50x error codes) or if the load balancer backend is unavailable or missing altogether.

This is not a problem with external Application (L7) Load Balancers, since no Cloud DNS load balancing solution is needed to expose the application in a highly available way across multiple regions.

These “partial” or individual regional service infrastructure outage scenarios are handled by the internal cross-region Application Load Balancers on their backend side though. In addition, the optional outlier detection configuration can help detect application-level malfunctions, at the cost of losing a certain percentage of actual client requests during an outage.

Conclusion

Google Cloud goes beyond the usual redundant deployments and offers architects and developers tools for building highly available application services across multiple geographic locations, including internal, security-restricted corporate use cases.

The choice of a particular combination of Google Cloud resources for improving multi-regional application availability depends on the individual application’s requirements and on the features currently supported in Google Cloud services such as the load balancers and Cloud DNS.

Enterprise security features in Google Cloud services get special attention and differentiate Google Cloud from other cloud hyperscalers. Please check one of my previous articles on Application Secrets Encryption in Google Cloud Kubernetes products for an example of what is possible with Google Cloud products.
