Scaling Web Apps on Google Compute Engine

Get Cooking in Cloud

Priyanka Vergadia
Google Cloud - Community
5 min read · Oct 8, 2019

Introduction

In this mini-series we are covering how to create websites on Google Cloud. This is the fifth article in the series.

  1. Hosting web apps on Google Cloud: An Overview
  2. Hosting a web app on Google Cloud using Google Cloud Storage
  3. Hosting a web app on Google Cloud using Cloud Run
  4. 5 steps to deploy website using Google Compute Engine
  5. Scaling Web app on Google Compute Engine (This article)
  6. Case Study

In the previous article we looked at a sample architecture to deploy a web application on Google Cloud using Google Compute Engine for application and web servers. When your website becomes popular and the users grow from one to one million, these instances autoscale as the requests increase or decrease by using instance templates. In this article let’s dive deeper into scaling and learn how it works.

Sample Architecture to host a website using Google Compute Engine

What you’ll learn

  • Create a website using managed instance groups in five simple steps.

Prerequisites

Check out the video

Scaling Web application on Google Compute Engine

What is an Instance Template?

Instance templates are designed to create instances with identical configurations.

An instance template is a resource that you can use to create VM instances and managed instance groups. Instance templates define the machine type, boot disk image or container image, labels, and other instance properties. You can then use an instance template to create a managed instance group or to create individual VM instances. Instance templates are a convenient way to save a VM instance’s configuration so you can use it later to create new VM instances or groups of VM instances. You can learn more about instance templates here.
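To make the idea concrete, here is a minimal sketch in plain Python. It is only a model of the concept, not the Compute Engine API: the `InstanceTemplate` and `Instance` classes, the `create_instances` helper, and the example values (machine type, image, label) are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class InstanceTemplate:
    """Hypothetical stand-in for an instance template (not the real API)."""
    machine_type: str
    boot_image: str
    labels: tuple = ()  # (key, value) pairs; a tuple keeps the template immutable


@dataclass
class Instance:
    """Hypothetical stand-in for a VM instance created from a template."""
    name: str
    machine_type: str
    boot_image: str
    labels: dict = field(default_factory=dict)


def create_instances(template, base_name, count):
    """Stamp out `count` instances that differ only by name."""
    return [
        Instance(
            name=f"{base_name}-{i}",
            machine_type=template.machine_type,
            boot_image=template.boot_image,
            labels=dict(template.labels),
        )
        for i in range(count)
    ]


# Illustrative values only; pick whatever your application needs.
template = InstanceTemplate(
    machine_type="n1-standard-1",
    boot_image="debian-10",
    labels=(("tier", "web"),),
)
web_servers = create_instances(template, "web", 3)
```

Because the template is frozen, every instance created from it starts from the same configuration, which is the property a managed instance group relies on.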

What is a Managed Instance Group?

A managed instance group (MIG) contains identical instances that are based on an instance template. Managed instance groups maintain high availability of your apps by proactively keeping your instances available, that is, in the RUNNING state. Managed instance groups support autohealing, load balancing, autoscaling, and auto-updating.

Autoscaling Policies

Autoscaling policies provide a way to add or remove instances as needed.

The impact of the autoscaling policies is twofold:

  • Your users get a great experience using your application because there are always enough resources to meet demand.
  • You maintain better control over your costs because the autoscaler removes instances when demand falls below a specified threshold.

To create an autoscaler, you must specify the autoscaling policy and a target utilization level that the autoscaler uses to determine when to scale the group.

You can choose to scale using the following policies:

  • Average CPU utilization
  • Stackdriver Monitoring metrics
  • HTTP load balancing serving capacity, which can be based on either utilization or requests per second
  • Network load balancing

The autoscaler continuously collects usage information based on the policy, compares actual utilization to your desired target utilization, and determines if the group needs to be scaled up or down.

Autoscaling based on CPU Utilization

For example, if you scale based on CPU utilization, you can set your target utilization level to 80%. The autoscaler polls the CPU utilization of the group: if it is above 80%, the autoscaler adds instances to the instance group, and if it falls below 80%, the autoscaler removes instances from the group, so that capacity keeps tracking demand.
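The comparison against the target can be sketched as a simple proportional rule. This is a simplified model and an assumption on my part, not the actual algorithm the Compute Engine autoscaler runs. The real autoscaler also applies safeguards such as a cool down period, which this sketch ignores. The function name `recommended_size` and the `min_size`/`max_size` defaults are hypothetical.

```python
def recommended_size(current_size, avg_cpu_percent, target_percent=80,
                     min_size=1, max_size=10):
    """Suggest a group size that brings average CPU back to the target.

    Simplified model: total load is current_size * avg_cpu_percent, and we
    want enough instances that each one carries at most target_percent.
    """
    if current_size < 1:
        raise ValueError("current_size must be at least 1")
    total_load = current_size * avg_cpu_percent
    # Integer ceiling division avoids floating point rounding surprises.
    needed = -(-total_load // target_percent)
    # Never go outside the configured group bounds.
    return max(min_size, min(max_size, needed))


scale_out = recommended_size(current_size=4, avg_cpu_percent=95)  # above target
steady = recommended_size(current_size=4, avg_cpu_percent=80)     # at target
scale_in = recommended_size(current_size=4, avg_cpu_percent=40)   # below target
```

With four instances at 95% CPU the sketch suggests five instances, at 80% it keeps four, and at 40% it shrinks the group to two.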

Autoscaling + Load Balancing

You can use autoscaling in conjunction with load balancing by setting up an autoscaler that scales based on the load of your instances.

For example, assume the load balancing serving capacity of a managed instance group is defined as 100 RPS per instance. If you create an autoscaler with the HTTP(S) load balancing policy and set it to maintain a target utilization level of 0.8 or 80%, the autoscaler will add or remove instances from the managed instance group to maintain 80% of the serving capacity, or 80 RPS per instance.
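The arithmetic in this example can be written out as a short sketch. It assumes the autoscaler simply aims for the target RPS per instance (capacity multiplied by target utilization). The function name `instances_for_load` and the `min_size` default are hypothetical, and the real autoscaler's behavior is more involved than this.

```python
def instances_for_load(total_rps, capacity_rps_per_instance=100,
                       target_utilization_percent=80, min_size=1):
    """Return how many instances keep each one at or below the target RPS.

    With the defaults, the target is 100 RPS * 80% = 80 RPS per instance.
    """
    # Compare total_rps against capacity * target / 100 using integers only.
    numerator = total_rps * 100
    denominator = capacity_rps_per_instance * target_utilization_percent
    needed = -(-numerator // denominator)  # ceiling division
    return max(min_size, needed)


size_at_400_rps = instances_for_load(400)    # 400 / 80 = 5 instances
size_at_1000_rps = instances_for_load(1000)  # 1000 / 80 = 12.5, rounded up
```

Rounding up matters here: at 1000 RPS, twelve instances would each serve more than 80 RPS, so the sketch asks for thirteen.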

The architecture needs to automatically replace instances that have failed or have become unavailable. And when the new instance comes online it should:

  • Understand its role in the system
  • Configure itself automatically
  • Discover any of the dependencies
  • Start handling requests automatically

Automatically replacing instances

To replace a failed instance automatically, we can use several Compute Engine components together. You could create instance templates that use a public image and a startup script to prepare the instance after it starts running. However, we recommend that you use deterministic instance templates, which minimize risk and unexpected behavior from your instance templates.

Thanks to Managed Instance Groups, now we have a system that can replace unhealthy instances with new ones.

But we still have a challenge: how are we going to know which instance to replace?

Health Checks

In order to know which instance to replace, we need to define what an unhealthy instance is, and to do that, we use health checks! It is recommended that you use separate health checks for load balancing and for autohealing. Autohealing health checks are set up at the managed instance group level.

You create a health check that looks for a response on port 80 and that can tolerate some failure before it marks instances as unhealthy and causes them to be recreated.

In this example, an instance is marked as healthy if its health check succeeds two times. It is marked as unhealthy if the check fails three consecutive times.
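The thresholds in this example behave like a small state machine, sketched below. This is an illustrative model, not how Compute Engine implements health checks. The `HealthTracker` class is hypothetical, and it assumes that the two successes must be consecutive, which the example above does not state explicitly.

```python
class HealthTracker:
    """Track one instance's health from a stream of probe results."""

    def __init__(self, healthy_threshold=2, unhealthy_threshold=3,
                 initially_healthy=True):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = initially_healthy
        self._successes = 0  # consecutive successful probes
        self._failures = 0   # consecutive failed probes

    def record(self, probe_succeeded):
        """Record one probe result and return the current health state."""
        if probe_succeeded:
            self._successes += 1
            self._failures = 0
            if not self.healthy and self._successes >= self.healthy_threshold:
                self.healthy = True
        else:
            self._failures += 1
            self._successes = 0
            if self.healthy and self._failures >= self.unhealthy_threshold:
                self.healthy = False
        return self.healthy


tracker = HealthTracker()
probes = [True, False, False, True, False, False, False, True, True]
states = [tracker.record(ok) for ok in probes]
```

In this run, two isolated failures are tolerated, the third consecutive failure flips the instance to unhealthy (the point at which autohealing would recreate it), and two successes in a row mark it healthy again.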

Conclusion

You have learned some tricks to make a web architecture resilient and scalable: replacing instances automatically when compute resources fail, and adding or removing instances as your traffic grows or shrinks!

Next steps

Priyanka Vergadia
Google Cloud - Community

Developer Advocate @Google, Artist & Traveler! Twitter @pvergadia