Globally Autoscaling Web Services with Health Checks

Season of Scale

“Season of Scale” is a blog and video series that helps enterprises and developers build scale and resilience into their design patterns. In this series we walk you through patterns and practices for creating apps that are resilient and scalable, two essential goals of many modern architecture exercises.

In Season 1, we’re covering Infrastructure Automation and High Availability:

  1. Patterns for scalable and resilient applications
  2. Infrastructure as code
  3. Immutable infrastructure
  4. Where to scale your workloads
  5. Globally autoscaling web services (this article)
  6. High Availability (Autohealing & Auto updates)

In this article I’ll walk you through how to globally scale your web services on Google Cloud.

So far we’ve looked at how Critter Junction was able to launch a new app on Google Cloud. We covered the various compute options Google Cloud has to offer, some of which include powerful autoscaling capabilities. Which option fits depends on your language requirements, the level of control you need, access to the OS, and other application characteristics such as containerization. Today let’s take a look at how Critter Junction can enable their apps to gracefully handle peaks and dips in traffic.

Preparation is everything

Critter Junction is becoming very popular with more users than ever. The game is all about playing daily, collecting items and furniture to decorate your house, and interacting with other players.

As we saw in the previous article, they chose to run their Layout App on Cloud Run, but they still chose to migrate some game servers to Compute Engine. As their traffic grew, they struggled to provision additional instances globally at any given time of day. This led to overutilized compute and put constant pressure on their operations team. So now they’re looking for an automated way to handle their growing user base and maintain performance, so that users keep coming back daily. In other words, how can they set up instances that autoscale, and that are checked for health and replaced when needed?

Global Load Balancer

The answer is to use Google Cloud’s global load balancer together with managed instance groups to scale and distribute the traffic automatically. This keeps the operations team happy and keeps users satisfied with the improved performance.

Managed instance groups provide features such as autoscaling, autohealing, auto-updating and regional (multiple zone) deployments. To understand this better, let’s step back and understand how a Compute Engine instance is created.

Instance creation

You create a custom image for your application, which is then used to create an instance. To make this reusable, you create an instance template. With an instance template, not only can you set up the configuration of the VM, but you can also run startup scripts that pull down the latest version of your code when the machine starts up. You can also attach disk templates with all the software dependencies your app requires, or leave it as an empty shell that gets populated by a CI/CD pipeline. These templates then automate the creation of Compute Engine instances at scale through managed instance groups.

MIG + Health Check Walkthrough

Let’s see how this works with a simple web app example!

Create firewall rules

  1. In the Google Cloud console, create a firewall rule under VPC networks with the following attributes:
  • Allow HTTP traffic to the app you’re about to deploy.
  • Provide a name: default-allow-http
  • Select the default network.
  • For Targets, select Specified target tags.
  • Set the target tag to http-server.
  • Set the source filter to IP ranges and provide 0.0.0.0/0 to allow access from all IP addresses.
  • For ports and protocols select TCP and enter 80.
  • Now click Create.
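If you prefer the command line, the console steps above map roughly onto a single gcloud command. The sketch below is one possible equivalent, not the method used in the video. It assumes the default network and reuses the rule name and target tag from the list. The run and create_firewall_rule helpers are hypothetical names introduced here; by default they only print the command, and they execute it only when APPLY=1 is set.

```shell
#!/bin/sh
# Hypothetical helper: print a command, and execute it only when APPLY=1.
run() {
  printf '+ %s\n' "$*"
  if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

# Sketch of the firewall rule from the steps above (assumes the default network).
create_firewall_rule() {
  run gcloud compute firewall-rules create default-allow-http \
    --network=default \
    --direction=INGRESS \
    --allow=tcp:80 \
    --source-ranges=0.0.0.0/0 \
    --target-tags=http-server
}

create_firewall_rule
```

Running the snippet as is gives you a dry run you can review; prefix it with APPLY=1 once the printed command looks right for your project.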

Create an instance template

  1. Head over to Compute Engine and create an instance template with the following attributes:
  • Give it a name: instance-template.
  • Select machine type.
  • Set boot disk image to Debian 9.
  • Check Allow HTTP traffic.
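  • Under the Management tab, find Automation and add the following startup script:
# Install git, the gunicorn WSGI server, and pip
sudo apt update && sudo apt -y install git gunicorn3 python3-pip
# The repository URL was lost from the next line; the path below points into the python-docs-samples repository
git clone
cd python-docs-samples/compute/managed-instances/demo
sudo pip3 install -r requirements.txt
# The bind address was missing here; 0.0.0.0:80 is assumed so the app listens on the port opened by the firewall rule
sudo gunicorn3 --bind 0.0.0.0:80 app:app --daemon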

This script causes each instance to run a simple web application during startup.

2. Finally, click Create.
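The same template can also be sketched from the command line. This is a hedged illustration rather than the exact setup from the video: the e2-medium machine type is an assumption (the steps above leave the choice to you), startup.sh is a hypothetical local file holding the startup script, and the debian-9 image family may no longer be offered, in which case you would substitute a current Debian family. In the console, checking Allow HTTP traffic applies the http-server network tag, so the sketch sets that tag directly. The run helper is a hypothetical dry-run wrapper that executes only when APPLY=1 is set.

```shell
#!/bin/sh
# Hypothetical helper: print a command, and execute it only when APPLY=1.
run() {
  printf '+ %s\n' "$*"
  if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

# Sketch of the instance template from the steps above.
# MACHINE_TYPE and startup.sh are assumptions, not values from the article.
create_instance_template() {
  run gcloud compute instance-templates create instance-template \
    --machine-type="${MACHINE_TYPE:-e2-medium}" \
    --image-family=debian-9 \
    --image-project=debian-cloud \
    --tags=http-server \
    --metadata-from-file=startup-script=startup.sh
}

create_instance_template
```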

Create an instance group using your new template

Now that you have an instance template, you can create an instance group using this template.

  1. Create an instance group on the Compute Engine instance groups page with the following attributes:
  • Give it a name.
  • Under location, select multiple zones. This protects you from zonal failures.
  • Select a region and under instance template select the template you just created.
  • Now set autoscaling mode to Autoscale.
  • Set the Autoscaling policy to CPU utilization. You can also base the policy on HTTP load balancing utilization or on monitoring metrics.
  • Set the target CPU usage to 60%.
  • Set the minimum number of instances to 3.

It’s recommended that you provision enough instances so that if an entire zone were to go down, the remaining instances would still meet the minimum number required.

  • Set the maximum number of instances to 6 to cap how far the group can grow and keep costs under control.
  • Set the cool down period to 120 seconds.

Make sure this number is higher than the time it takes for CPU utilization of the VM to initially stabilize.

  • Skip setting a health check for now; we’ll cover that in the next article.

2. Click Create, then wait a few minutes until all the instances are running.

3. Then go to VM instances and click the external IP of one of the instances to see the demo web app page.
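For reference, here is a rough command-line sketch of the same instance group and autoscaling policy. It is an illustration under assumptions, not the author's exact procedure: the us-central1 region is an assumed default, and the group name autoscaling-web-app-group is borrowed from the grep pattern in the load-test script later in this article. The run helper is a hypothetical dry-run wrapper that executes only when APPLY=1 is set.

```shell
#!/bin/sh
# Hypothetical helper: print a command, and execute it only when APPLY=1.
run() {
  printf '+ %s\n' "$*"
  if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

GROUP="autoscaling-web-app-group"   # name matched by the load-test script's grep
REGION="${REGION:-us-central1}"     # assumption: pick the region you selected

create_autoscaled_group() {
  # A regional group spreads its instances across multiple zones.
  run gcloud compute instance-groups managed create "$GROUP" \
    --region="$REGION" \
    --template=instance-template \
    --size=3
  # CPU-based autoscaling: 60% target, 3 to 6 instances, 120 s cool down.
  run gcloud compute instance-groups managed set-autoscaling "$GROUP" \
    --region="$REGION" \
    --min-num-replicas=3 \
    --max-num-replicas=6 \
    --target-cpu-utilization=0.60 \
    --cool-down-period=120
}

create_autoscaled_group
```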

Traffic load-test

Now that we have it all set up, let’s generate traffic so we can see the autoscaling in action.

  1. Open Cloud Shell.

2. Set a PROJECT_ID environment variable with the export command, using your own project ID as the value.

3. Run the bash script below.

# List each instance as "name,external-ip" and keep only the instances in our group
export MACHINES=$(gcloud --project="$PROJECT_ID" compute instances list --format="csv(name,networkInterfaces[0].accessConfigs[0].natIP)" | grep "autoscaling-web-app-group")
for i in $MACHINES; do
  NAME=$(echo "$i" | cut -f1 -d,)
  IP=$(echo "$i" | cut -f2 -d,)
  echo "Simulating high load for instance $NAME"
  # Ask the demo app on this instance to start generating CPU load
  curl -q -s "http://$IP/startLoad" >/dev/null --retry 2
done

This script increases load, which leads to an increase in CPU utilization for our demo app. When utilization reaches the target value of 60%, the autoscaler starts increasing the size of our instance group.
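To get a feel for how the group size reacts to load, here is a simplified proportional model of the decision: pick the smallest group size that would bring average CPU back to the target, then clamp it to the configured minimum and maximum. This is an illustrative sketch only, not Google Cloud’s actual autoscaler algorithm, and recommended_size is a hypothetical helper name.

```shell
#!/bin/sh
# Simplified model (an assumption, not the real autoscaler):
#   size = ceil(current_instances * average_cpu / target_cpu), clamped to [min, max]
# Arguments: current instances, average CPU %, target CPU %, min, max
recommended_size() {
  rs_current=$1; rs_cpu=$2; rs_target=$3; rs_min=$4; rs_max=$5
  # Integer ceiling division
  rs_size=$(( (rs_current * rs_cpu + rs_target - 1) / rs_target ))
  if [ "$rs_size" -lt "$rs_min" ]; then rs_size=$rs_min; fi
  if [ "$rs_size" -gt "$rs_max" ]; then rs_size=$rs_max; fi
  echo "$rs_size"
}

recommended_size 3 90 60 3 6    # prints 5: 3 instances at 90% need about 4.5 instances at 60%
recommended_size 3 150 60 3 6   # prints 6: demand exceeds the maximum, so the cap applies
recommended_size 6 20 60 3 6    # prints 3: load dropped, but the minimum is kept
```

With the settings from this walkthrough (target 60%, minimum 3, maximum 6), the model shows why the maximum protects your bill under heavy load and why the minimum keeps capacity in place when traffic is low.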

4. Now navigate to the Monitoring tab of the instance group, where you can see the number of instances increasing as CPU usage rises.

You can see the scale-down effect by running a similar bash script that decreases the load, which lowers CPU utilization. After a stabilization period of a few minutes, the autoscaler reduces the instance group size, which is visible in the Monitoring tab.
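A scale-down script could look roughly like the sketch below. It assumes the demo app exposes a /stopLoad endpoint as the counterpart of /startLoad; that endpoint is not shown in this article, so verify it against the demo code before relying on it. The stop_load and run helpers are hypothetical names. stop_load reads "name,ip" rows on standard input, the same CSV shape the load script gets from gcloud, and run only prints the curl command unless APPLY=1 is set.

```shell
#!/bin/sh
# Hypothetical helper: print a command, and execute it only when APPLY=1.
run() {
  printf '+ %s\n' "$*"
  if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

# Reads "name,ip" rows on stdin and asks each instance to stop generating load.
# Assumption: the demo app serves /stopLoad as the counterpart of /startLoad.
stop_load() {
  while IFS=, read -r name ip; do
    echo "Stopping load on instance $name"
    run curl -s --retry 2 -o /dev/null "http://$ip/stopLoad"
  done
}

# Example with a made-up row; in Cloud Shell you would pipe in the
# gcloud CSV output filtered by grep, as in the load script above.
printf '%s\n' "autoscaling-web-app-group-abcd,203.0.113.10" | stop_load
```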

Use load balancers at each layer

Critter Junction has global users. They want users from Singapore to land on a web server in the Asia East region, while users in the US land in the US Central region.

For this they would use global load balancing, which routes traffic to the nearest web server instance, helping to reduce latency and improve performance. From there, the internal load balancer distributes the traffic across the backend to keep the load balanced.

These instance groups in different regions autoscale using an HTTP load balancing policy to scale seamlessly regardless of where the traffic is coming from.
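As a rough sketch, switching a regional group to a load-balancing-based policy could look like the following. Everything specific here is an assumption for illustration: the per-region group names, the two regions (chosen to match the Asia East and US Central example), the 3 to 6 instance range carried over from the walkthrough, and the 0.8 utilization target. The policy only takes effect once each group serves as a backend of an HTTP(S) load balancer, which this sketch does not set up. The run helper is a hypothetical dry-run wrapper that executes only when APPLY=1 is set.

```shell
#!/bin/sh
# Hypothetical helper: print a command, and execute it only when APPLY=1.
run() {
  printf '+ %s\n' "$*"
  if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

# Apply a load-balancing-utilization autoscaling policy to one group per region.
# Group names, regions, and the 0.8 target are illustrative assumptions.
configure_lb_autoscaling() {
  for region in us-central1 asia-east1; do
    run gcloud compute instance-groups managed set-autoscaling "web-app-group-$region" \
      --region="$region" \
      --min-num-replicas=3 \
      --max-num-replicas=6 \
      --target-load-balancing-utilization=0.8
  done
}

configure_lb_autoscaling
```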

Launch day success

Not only was Critter Junction able to automate the scaling of their Compute instances using autoscaling and managed instance groups, they were also able to improve performance by serving traffic from instances closest to their users using the Global Load Balancer.

But there’s one more step to scaling: identifying the instances that are unhealthy and replacing them automatically! So stay tuned for the next episode, where we will cover how Critter Junction can set up autohealing and keep their users happy.

And remember, always be architecting.

Stephanie Wong


Google Cloud Developer Advocate and producer of awesome online content. Creator of the series, GCP Networking End-to-End; host of Google’s Next onAir. @swongful