Scaling Google App Engine to No Instances (or maybe just 1)
Lightly used applications do not need to be running 100% of the time and you shouldn’t be paying for it.
If you deploy applications on GAE you will start to notice that you are paying for two instances instead of just one whenever your application *is* running. This how-to will walk through scaling back to a single instance for applications in the flexible environment that do not need to be up and running 100% of the time and do not have strict reliability or latency constraints.
Environment: Standard vs. Flexible
Before we look at how services are configured, or even at the cost data, we need to make sure we understand the difference between the two environment types offered by GAE. The documentation here outlines the differences, several of which relate directly to cost.
Flexible: In a flexible environment you are paying for usage of vCPU, memory, and persistent disks. It is generally more cost effective if you have regular traffic patterns that require scaling up and down gradually.
Standard: In a standard environment you are paying for only what you need (e.g. instance hours) and can scale to 0 instances when there is no traffic. It is generally more cost effective for small applications that do not take traffic all of the time.
With this understanding in mind, you can take a look at your applications’ app.yaml files to see which environment they are configured to use. If there is no env key specified, it is deployed to the standard environment.
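As a quick illustration, a flexible-environment app.yaml declares the env key explicitly (the nodejs runtime here is just an example):

```yaml
# app.yaml for a flexible-environment service.
# Without "env: flex", the service deploys to the standard environment.
runtime: nodejs
env: flex
```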
Scaling Down Flexible Environments
If you need to run a Flexible environment for technical reasons, such as the ability to run Node.js or Ruby, because you require SSH debugging (do you, really?), or because you need background processes, you can still tune the scaling parameters of your service.
If you have a small application that does not have any scaling or resource parameters specified, that is a good place to start.
The first parameter that we will look at tuning is the automatic_scaling parameter. Since our example has a runtime of nodejs, we will continue with that.
There are two documents to review: 1) An Overview of App Engine, and 2) Configuring your App with app.yaml. You may already have these pages open or have read them previously but they are good references.
If your application does not require redundancy or high availability, you can actually scale it down to a single instance. By default GAE will deploy 2 instances for latency and redundancy/reliability purposes.
To understand how to set parameters such as max_num_instances (from the second document above), you will need to review the instance metrics for your application, keeping in mind that if you scale below 2 instances (the default) you will take a hit on latency and redundancy/reliability.
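To pin a service at a single instance, both bounds of automatic_scaling can be set to 1. A sketch, assuming the nodejs runtime from the example:

```yaml
runtime: nodejs
env: flex

automatic_scaling:
  min_num_instances: 1  # GAE deploys 2 instances by default
  max_num_instances: 1  # never scale beyond a single instance
```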
In the Google Cloud Platform console, under App Engine, under Instances, select your service. In the drop-down below your service, you will be able to select several metrics to review:
Set the time period to something a bit longer, maybe 14 or 30 days, and browse through the metrics.
For my example application, which I will likely scale down to a single instance (since it doesn't require low latency or strict reliability), I looked at the Summary, Traffic, VM Traffic, CPU Utilization, Instances, Memory Usage, and Disk bytes metrics.
This is a relatively small application, as the metrics show: it is not auto-scaling beyond the 2-instance default, and CPU utilization never exceeds 10%.
With this in mind I am pretty comfortable scaling down the number of instances below the default of 2 to a single instance. For this particular application, I don’t have any latency or redundancy/reliability concerns.
We can also use these metrics to set the resource sizing that we would like for the application. By setting appropriate resource sizes we will be matched to an appropriate machine type. The documentation for this is a short scroll up in the previously linked document (or here).
There are three parameters that we will focus on setting: cpu, memory_gb, and disk_size_gb. The other parameters for volumes we don’t need to focus on in this example because we are not mounting any volumes.
By default, you get 1 cpu core and we will set it as such since we are barely hitting 10% CPU Utilization.
By default, you get 0.6 GB of memory. There is a formula in the linked documentation showing that each instance needs ~0.4 GB for overhead and a minimum of 0.9 GB in total. The formula from the documentation is memory_gb = cpu * [0.9 - 6.5] - 0.4. For a single core, the absolute minimum is therefore 1 * 0.9 - 0.4 = 0.5 GB, and that is what we will set.
By default, you get 10 GB of disk and since that is the minimum we will set that as such.
Your configuration for the smallest GAE resource setting possible would look something like this:
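Piecing together the values above, a minimal app.yaml might look like the following (the nodejs runtime is an assumption carried over from the earlier example):

```yaml
runtime: nodejs
env: flex

# Pin the service at a single instance (the default is 2)
automatic_scaling:
  min_num_instances: 1
  max_num_instances: 1

# Smallest resource settings: 1 core, 0.5 GB memory, 10 GB disk
resources:
  cpu: 1
  memory_gb: 0.5
  disk_size_gb: 10
```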
Now that we have set the resources and scaling parameters we can redeploy the application (assuming familiarity with that).
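Redeploying is a single command run from the directory containing app.yaml (assuming gcloud is already authenticated and pointed at the right project):

```shell
# Deploy the updated app.yaml; --quiet skips the interactive confirmation
gcloud app deploy app.yaml --quiet
```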
Assuming your application redeployed successfully, you can review the various metrics in the console and on the command line. First, I recommend confirming the configuration you set, since you have changed fairly significant parameters. This can be done with the app versions describe command as follows:
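A sketch of that check (the service name and VERSION_ID below are placeholders; the list command shows the real values for your project):

```shell
# Find the currently serving version of your service
gcloud app versions list --service=default

# Dump the deployed configuration, including scaling and resource settings
gcloud app versions describe VERSION_ID --service=default
```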
Notice the automaticScaling and resources key/values. These should match what you have set.
In the GCP console, the App Engine > Dashboard should show the reduction from 2 instances to 1.
The Versions and Instances tabs should also show a single instance.
Congrats
You probably just cut your GAE bill in half if you scaled down the number of instances in your Flexible Environment from two to one. Take a break, grab some coffee, celebrate!
Next post: How-to analyze your GCP Bills and save bazillions through visibility.