Capacity Management with Load Balancing

Get Cooking in Cloud

Stephanie Wong
Google Cloud - Community
9 min read · Apr 21, 2020

Authors: Stephanie Wong, Priyanka Vergadia

Introduction

“Get Cooking in Cloud” is a blog and video series to help enterprises and developers build business solutions on Google Cloud. In this series, we identify specific topics that developers want to architect on Google Cloud, and we create a miniseries on each one.

In this miniseries, we will go over Google Cloud load balancing.

  1. Choosing the right load balancer
  2. Application Capacity Optimizations with Global Load Balancing
  3. Capacity Management with load balancing (this article)
  4. Load Balancing to GKE network endpoint groups

In this article, we’ll show you how to compare the performance of a multi-region load balancer with that of a single-region load balancer. We’ll do this by setting up a simple web server running a CPU-intensive application that computes Mandelbrot sets.

Check out the video

Review

In the last video, we showcased how multi-region HTTP(S) load balancing uses the Waterfall by Region algorithm to overflow traffic to the region with the next closest instances. This means that using a global load balancer instead of regional load balancers can be particularly effective for global e-commerce sites like Beyond Treat. Apps that use regional backends have nominally lower latency, but they can become overloaded easily.

Let’s put this to a real test by walking through a tutorial that shows how Cloud Load Balancing optimizes your global application capacity, resulting in a better user experience and lower costs compared to most load balancing implementations.

In this tutorial, you’ll set up a simple web server running a CPU-intensive application that computes Mandelbrot sets.

  1. You start by load-testing multiple VM instances in a single region with httperf and measuring response time under load.
  2. Then you scale the setup to multiple regions using global load balancing, measure the server’s response time under load again, and compare it to the single-region results. Performing this sequence of tests lets you see the positive effects of the cross-regional load management of Cloud Load Balancing.

What you’ll learn, and use

  • Learn how to use load testing tools (httperf).
  • Measure effects of overload with single-region load balancing.
  • Measure effects of overflow to another region with global load balancing.

You’ll be using:

  • Compute Engine
  • Load Balancing and forwarding rules

The Solution: Capacity Management with Load Balancing

Start a Cloud Shell instance

  1. Set your default project, using your project ID for [PROJECT_ID]:
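
For example (with [PROJECT_ID] left for you to fill in):

    gcloud config set project [PROJECT_ID]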

2. Set your default Compute Engine zone (or replace it with your preferred zone), and set it as an environment variable for later use:
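
For example, using us-central1-b as an illustrative zone:

    gcloud config set compute/zone us-central1-b
    export ZONE=us-central1-b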

Create and configure the VPC network

  1. Create a VPC network for testing:
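
Something like the following, where the network name lb-testing (used throughout this walkthrough) is an illustrative placeholder:

    gcloud compute networks create lb-testing --subnet-mode=auto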

2. Define a firewall rule to allow internal traffic:
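
A sketch that allows all traffic between instances inside the auto-mode network’s 10.128.0.0/9 range:

    gcloud compute firewall-rules create lb-testing-internal \
        --network=lb-testing --allow=all \
        --source-ranges=10.128.0.0/9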

3. Define a firewall rule to allow SSH traffic to communicate with the VPC network:
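
For example:

    gcloud compute firewall-rules create lb-testing-ssh \
        --network=lb-testing --allow=tcp:22 \
        --source-ranges=0.0.0.0/0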

Measuring overload effects with a single-region load balancer

First, let’s examine the effects of overload on single-region load balancers, such as typical on-premises load balancers or an HTTP(S) load balancer used in a regional (rather than global) deployment.

Creating the single-region HTTP(S) load balancer

First, create a single-region HTTP(S) load balancer backed by a fixed group of 3 VM instances.

  1. Create an instance template for the web server VM instances that installs a Python Mandelbrot generation script. Run the following commands in Cloud Shell:
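
The exact Mandelbrot server script lives in the full tutorial; the sketch below uses a placeholder startup script and illustrative names:

    gcloud compute instance-templates create webservers \
        --machine-type=n1-standard-1 \
        --network=lb-testing \
        --tags=http-server \
        --metadata=startup-script='#! /bin/bash
    # Placeholder: the full tutorial installs and starts a Python
    # web server here that renders Mandelbrot sets on port 80.
    apt-get update
    apt-get install -y python3'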

2. Next create a managed instance group with 3 instances based on the template from the previous step:
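
Continuing with the illustrative names:

    gcloud compute instance-groups managed create webservers \
        --template=webservers --size=3 --zone=$ZONE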

3. Create a firewall rule to allow external access to the webserver instance from your own machine:
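
For example, scoped to the http-server tag set in the instance template:

    gcloud compute firewall-rules create lb-testing-web \
        --network=lb-testing --allow=tcp:80 \
        --source-ranges=0.0.0.0/0 --target-tags=http-server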

4. Create the health check, backend service, URL map, target proxy, and global-forwarding rule needed in order to set up HTTP load balancing:
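
A sketch of the whole chain, one resource per command (all names illustrative; depending on your setup, the instance group may also need a named port, shown first):

    gcloud compute instance-groups set-named-ports webservers \
        --named-ports=http:80 --zone=$ZONE
    gcloud compute health-checks create http basic-check --port=80
    gcloud compute backend-services create web-service \
        --protocol=HTTP --health-checks=basic-check --global
    gcloud compute backend-services add-backend web-service \
        --instance-group=webservers \
        --instance-group-zone=$ZONE --global
    gcloud compute url-maps create web-map \
        --default-service=web-service
    gcloud compute target-http-proxies create web-proxy \
        --url-map=web-map
    gcloud compute forwarding-rules create web-rule \
        --global --target-http-proxy=web-proxy --ports=80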

5. Get the IP address of the forwarding rule:
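
For example:

    gcloud compute forwarding-rules describe web-rule --global \
        --format="value(IPAddress)"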

The output is the public IP address of the load balancer you created.

6. In a browser, go to the IP address returned by the previous command. Wait a few minutes, and you should see a computed Mandelbrot set. The image is being served from one of the VM instances in the newly created group.

Test the single-region load balancer with a load test

  1. Create the load-testing instance:
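
For example, placing it in the same zone and network:

    gcloud compute instances create loadtest \
        --zone=$ZONE --network=lb-testing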

2. SSH in to the loadtest machine from the Compute Engine instances page:
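
Alternatively, from Cloud Shell:

    gcloud compute ssh loadtest --zone=$ONE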

3. On the load-testing instance, install httperf as your load testing tool:
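
For example, on a Debian-based image (package availability may vary):

    sudo apt-get update
    sudo apt-get install -y httperf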

4. Test the server response by sending requests at various rates (requests per second, or RPS). Make sure that you test RPS values at least across the range from 5 to 20. For example, the following command generates 10 RPS. Replace [IP_address] with the IP address of the load balancer from Step 5.
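
A sketch of such an invocation (the connection count and timeout are illustrative):

    httperf --server [IP_address] --port 80 --uri / \
        --rate 10 --num-conns 500 --timeout 60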

You see output similar to the following:

The response latency goes up significantly as the request rate increases past 12 or 13 RPS. Here is a visualization of typical results:

5. Sign out of the loadtest VM instance:
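
For example:

    exit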

With a single-region load balancer, the average request latency spikes as the load increases past serving capacity. At 10 RPS, the average request latency is close to 500 ms, but at 20 RPS it is 5000 ms. Latency has increased 10x, and your users aren’t going to be happy with application timeouts.

In the next section, you’ll add a second region to the load-balancing topology and compare how the cross-region failover affects end-user latency.

Measuring overflow effects to another region

If you use HTTP(S) Load Balancing for a global application with backends deployed in multiple regions, traffic automatically overflows to another region when capacity in one region is exceeded. Let’s test this by adding a second VM instance group in another region.

Creating servers in multiple regions

Let’s add another group of backends in another region and assign a capacity of 10 RPS per region. You can then see how load balancing reacts when this limit is exceeded.

  1. In Cloud Shell, choose a zone in a region different from your default zone and set it as an environment variable:
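
For example, using europe-west1-b as an illustrative second zone:

    export ZONE2=europe-west1-b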

2. Create a new instance group in the second region with 3 VM instances:
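
Reusing the illustrative instance template from the first region:

    gcloud compute instance-groups managed create webservers2 \
        --template=webservers --size=3 --zone=$ZONE2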

3. Add the instance group to the existing backend service with a maximum capacity of 10 RPS:
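
A sketch, using RATE balancing mode to cap the group at 10 RPS:

    gcloud compute backend-services add-backend web-service \
        --instance-group=webservers2 \
        --instance-group-zone=$ZONE2 \
        --balancing-mode=RATE --max-rate=10 --global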

4. Adjust the max-rate to 10 RPS for the existing backend service:
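
For example:

    gcloud compute backend-services update-backend web-service \
        --instance-group=webservers \
        --instance-group-zone=$ZONE \
        --balancing-mode=RATE --max-rate=10 --global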

Test the multi-regional load balancer with a load test

  1. After all instances boot up, SSH in to the loadtest VM instance.
  2. Run 500 requests at 10 RPS. Replace [IP_address] with the IP address of the load balancer:
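
For example:

    httperf --server [IP_address] --port 80 --uri / \
        --rate 10 --num-conns 500 --timeout 60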

3. You see results like the following:

The results are similar to those produced by the regional load balancer.

4. Because your testing tool immediately applies the full load rather than ramping up slowly the way real-world traffic does, you have to repeat the test a few times for the overflow mechanism to take effect. Run 500 requests 5 times at 20 RPS, as in the loop below. Replace [IP_address] with the IP address of the load balancer.
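
A simple shell loop works for the repetition:

    for i in 1 2 3 4 5; do
      httperf --server [IP_address] --port 80 --uri / \
          --rate 20 --num-conns 500 --timeout 60
    done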

You see results like the following:

After the system stabilizes, the average response time is 400 ms at 10 RPS, and it increases to only 700 ms at 20 RPS. This is a huge improvement over the 5000 ms delay seen with a regional load balancer, and it results in a much better user experience.

The following graph shows the measured response time by RPS using global load balancing:

Comparing results of regional vs. global load balancing

You just explored the latency effects of backends over capacity when using regional load balancing vs. global load balancing!

Regional load-balancing solutions become overloaded when traffic increases past capacity, since traffic can’t flow anywhere but to the overloaded backend VM instances. This fits the mold of traditional on-premises load balancers, network load balancing on GCP, and HTTP(S) Load Balancing in a single region. The average latency increases by more than a factor of 10!

Meanwhile, global HTTP(S) Load Balancing with backends in multiple regions allows traffic to overflow to the closest region that has available serving capacity. This leads to a measurable but comparatively small increase in end-user latency, and provides a much better user experience. If your application can’t scale out in a region quickly enough, global HTTP(S) Load Balancing is the best option, since traffic can be redirected to other regions, helping to avoid a full service outage.

Congrats! You just compared the performance of regional vs. global HTTP load balancing!

For more about this recipe, and to walk through the full tutorial, check out this solution.

Next steps and references:
