Capacity Management with Load Balancing

Get Cooking in Cloud

Stephanie Wong

Published in

Google Cloud - Community

9 min readApr 21, 2020

Authors: Stephanie Wong, Priyanka Vergadia

Introduction

“Get Cooking in Cloud” is a blog and video series to help enterprises and developers build business solutions on Google Cloud. In this series we plan on identifying specific topics that developers are looking to architect on Google cloud. Once identified we will create a mini series on that topic.

In this miniseries, we will go over Google Cloud load balancing.

Choosing the right load balancer
Application Capacity Optimizations with Global Load Balancing
Capacity Management with load balancing (this article)
Load Balancing to GKE network endpoint groups

In this article we will show you how to compare performance with a multi-region load balancer versus a single region load balancer. We’ll do this by setting up a simple web server running a CPU-intensive application that computes Mandelbrot sets.

Check out the video

Review

In the last video, we showcased how multi-region HTTP(S) load balancing uses the Waterfall by Region algorithm to overflow traffic to the region with the next closest instances. This means that using a global load balancer over regional load balancers can be particularly effective when it comes to global e-commerce sites like Beyond Treat. Apps that use regional backends have nominally lower latency but they can become overloaded easily.

Let’s put this to a real test by walking through a tutorial that shows how Cloud Load Balancing optimizes your global application capacity, resulting in a better user experience and lower costs compared to most load balancing implementations.

In this tutorial, you’ll set up a simple web server running a CPU-intensive application that computes Mandelbrot sets.

You start by measuring network capacity using load-testing tools (httperf) of multiple VM instances in a single region and measure response time under load.
Then scale the network to multiple regions using global load balancing and then measure the server’s response time under load and compare it to single-region load balancing. Performing this sequence of tests lets you see the positive effects of the cross-regional load management of Cloud Load Balancing.

What you’ll learn, and Use

Learn how to use load testing tools (httperf).
Measure effects of overload with single-region load balancing.
Measure effects of overflow to another region with global load balancing.

You’ll be using:

Compute Engine
Load Balancing and forwarding rules

This Solution for Capacity Management with Load Balancing

Start a Cloud Shell instance

Set your default project, using your project ID for [PROJECT_ID]:

gcloud config set project [PROJECT_ID]

2. Set your default Compute Engine zone, using the following (or replace with your preferred zone) and then set this as an environment variable for later use:

gcloud config set compute/zone us-central1-cexport ZONE=us-central1-c

Create and configure the VPC network.

Create a VPC network for testing:

gcloud compute networks create lb-testing --subnet-mode auto

2. Define a firewall rule to allow internal traffic:

gcloud compute firewall-rules create lb-testing-internal \--network lb-testing --allow all --source-ranges 10.128.0.0/11

3. Define a firewall rule to allow SSH traffic to communicate with the VPC network:

gcloud compute firewall-rules create lb-testing-ssh \--network lb-testing --allow tcp:22 --source-ranges 0.0.0.0/0

Measuring overload effects with a single-region load balancer

First, let’s examine the effects of overload on single-region load balancers, like typical load balancers used on premises, or an HTTP(S) load balancer when the load balancer is used in a regional (rather than a global) deployment.

Creating the single-region HTTP(S) load balancer

First create a single region HTTP(S) load balancer with a fixed size of 3 VM instances.

Create an instance template for the web server VM instances that installs a Python Mandelbrot generation script. Run the following commands in Cloud Shell:

gcloud compute instance-templates create webservers \--machine-type n1-highcpu-4 \--image-family=debian-10 --image-project=debian-cloud \--tags=http-server \--network=lb-testing \--metadata startup-script='#! /bin/bashapt-get -y updateapt-get install -y git python-numpy python-matplotlibgit clone \https://github.com/GoogleCloudPlatform/lb-app-capacity-tutorial-python.gitcd lb-app-capacity-tutorial-pythonpython webserver.py'

2. Next create a managed instance group with 3 instances based on the template from the previous step:

gcloud compute instance-groups managed create webserver-region1 \--size=3 --template=webservers

3. Create a firewall rule to allow external access to the webserver instance from your own machine:

gcloud compute firewall-rules create lb-testing-http \--network lb-testing --allow tcp:80 --source-ranges 0.0.0.0/0 \--target-tags http-server

4. Create the health check, backend service, URL map, target proxy, and global-forwarding rule needed in order to set up HTTP load balancing:

gcloud compute health-checks create http basic-check \--request-path="/health-check" --check-interval=60sgcloud compute backend-services create web-service \--health-checks basic-check --globalgcloud compute backend-services add-backend web-service \--global --instance-group=webserver-region1 \--instance-group-zone $ZONEgcloud compute url-maps create web-map --default-service web-servicegcloud compute target-http-proxies create web-proxy --url-map web-map`gcloud compute forwarding-rules create web-rule --global \--target-http-proxy web-proxy --ports 80

5. Get the IP address of the forwarding rule:

gcloud compute forwarding-rules describe --global web-rule --format "value(IPAddress)"

The output is the public IP address of the load balancer you created.

6. In a browser, go to the IP address returned by the previous command. Wait a few minutes, and you should see a computed Mandelbrot set. The image is being served from one of the VM instances in the newly created group.

Test the single-region load balancer with a load test

Create the load-testing instance:

gcloud compute instances create loadtest --machine-type n1-standard-1 \--network=lb-testing --image-family=debian-10 \--image-project=debian-cloud

2. SSH in to the loadtest machine on the Compute Engine instances page:

3. On the load-testing instance, install httperf as your load testing tool:

sudo apt-get install -y httperf

4. Test the server response by sending various requests per second (RPS). Make sure that you use RPS values at least in the range from 5 to 20. For example, the following command generates 10 RPS. Replace [IP_address] with the IP address of the load balancer from Step 5.

httperf --server [IP_address] --num-conns 500 --rate 10 2>&1| grep 'Errors\|ion time'

You see output similar to the following:

httperf --server 35.190.77.137 --num-conns 500 --rate 10 2>&1|grep 'Errors\|ion time'Connection time [ms]: min 418.6 avg 607.5 max 1963.5 median 522.5 stddev 249.2Connection time [ms]: connect 0.5Errors: total 0 client-timo 0 socket-timo 0 connrefused 0 connreset 0Errors: fd-unavail 0 addrunavail 0 ftab-full 0 other 0

The response latency goes up significantly as the number of RPS increases past 12 or 13 RPS. Here is a visualization of typical results:

5. Sign out of the loadtest VM instance:

exit

With a single region load balancer, the average request latency spikes as the load increases past serving capacity. With 10 RPS, the average request latency is close to 500 ms, but with 20 RPS the latency is 5000 ms. Latency has increased 10x — your users aren’t going to be happy with application timeouts.

In the next section, you’ll add a second region to the load-balancing topology and compare how the cross-region failover affects end-user latency.

Measuring overflow effects to another region

If you use a global application with HTTP(S) Load Balancing and if you have backends deployed in multiple regions, you’ll get automatic traffic overflow to another region when capacity is overloaded in a single region. Let’s test this by adding a second VM instance group in another region.

Creating servers in multiple regions

Let’s add another group of backends in another region and assign a capacity of 10 RPS per region. You can then see how load balancing reacts when this limit is exceeded.

In Cloud Shell, choose a zone in a region different than your default zone and set it as an environment variable:

export ZONE2=us-east1-b

2. Create a new instance group in the second region with 3 VM instances:

gcloud compute instance-groups managed create webserver-region2 \--size=3 --template=webservers --zone $ZONE2

3. Add the instance group to the existing backend service with a maximum capacity of 10 RPS:

gcloud compute backend-services add-backend web-service \--global --instance-group=webserver-region2 \--instance-group-zone $ZONE2 --max-rate 10

4. Adjust the max-rate to 10 RPS for the existing backend service:

gcloud compute backend-services update-backend web-service \--global --instance-group=webserver-region1 \--instance-group-zone $ZONE --max-rate 10

Test the multi-regional load balancer with a load test

After all instances boot up, SSH in to the loadtest VM instance.
Run 500 requests at 10 RPS. Replace [IP_address] with the IP address of the load balancer:

httperf --server [IP_address] --num-conns 500 --rate 10 2>&1| grep 'ion time'

3. You see results like the following:

Connection time [ms]: min 405.9 avg 584.7 max 1390.4 median 531.5 stddev181.3Connection time [ms]: connect 1.1

The results are similar to those produced by the regional load balancer.

4. Because your testing tool immediately runs a full load and doesn’t slowly increase the load like a real-world implementation, you have to repeat the test a couple of times for the overflow mechanism to take effect. Run 500 requests 5 times at 20 RPS. Replace [IP_address] with the IP address of the load balancer.

for a in \`seq 1 5\`; do httperf --server [IP_address] \--num-conns 500 --rate 20 2>&1| grep 'ion time' ; done

You see results like the following:

Connection time [ms]: min 426.7 avg 6396.8 max 13615.1 median 7351.5 stddev3226.8Connection time [ms]: connect 0.9Connection time [ms]: min 417.2 avg 3782.9 max 7979.5 median 3623.5 stddev2479.8Connection time [ms]: connect 0.9Connection time [ms]: min 411.6 avg 860.0 max 3971.2 median 705.5 stddev 492.9Connection time [ms]: connect 0.7Connection time [ms]: min 407.3 avg 700.8 max 1927.8 median 667.5 stddev 232.1Connection time [ms]: connect 0.7Connection time [ms]: min 410.8 avg 701.8 max 1612.3 median 669.5 stddev 209.0Connection time [ms]: connect 0.8

After the system stabilizes, the average response time is 400 ms at 10 RPS, and only increases to 700 ms at 20 RPS. This is a huge improvement over the 5000 ms delay offered by a regional load balancer, and results in a much better user experience.

The following graph shows the measured response time by RPS using global load balancing:

Comparing results of regional vs. global load balancing

You just explored the latency effects of backends over capacity when using regional load balancing vs. global load balancing!

Regional load-balancing solutions become overloaded when traffic increases past capacity, since traffic can’t flow anywhere but to the overloaded backend VM instances. This fits the mold of traditional on-premises load balancers, network load balancing on GCP, and HTTP(S) Load Balancing in a single region. The average latency increases by more than a factor of 10!

Meanwhile Global HTTP(S) Load Balancing with backends in multiple regions allows traffic to overflow to the closest region that has available serving capacity. This leads to a measurable but comparatively low increase in end-user latency, and provides a much better user experience. If your application can’t scale out in a region quickly enough, global HTTP(S) Load Balancing is the best option since traffic can be redirected to other regions, helping to avoid a full service outage.

Congrats! You just compared the performance of regional vs. global HTTP load balancing!

For more about this recipe, and to walk through the full tutorial, check out this solution.

Next steps and references:

Follow this blog series on Google Cloud Platform Medium.
Reference: Capacity Management with Load Balancing
Follow Get Cooking in Cloud video series and subscribe to Google cloud platform YouTube channel
Want more stories? Follow me on Medium, and on Twitter.
Enjoy the ride with us through this miniseries and learn more about more such Google Cloud solutions :)