Auto-scaling from zero machines on Google Compute Engine


Auto-Scaling Instance Groups

An auto-scaling group (ASG) allows you, as the name implies, to group machines of the same type under a single autoscaling policy.

Usually, you can choose to scale according to CPU or memory usage. Most cloud providers also offer a monitoring tool that lets you create a custom metric and scale up or down according to it.

On Google Compute Engine (GCE), you don’t directly have the notion of an ASG like in AWS; instead, you group your machines in an Instance Group (IG), to which you can attach an autoscaler. However, with this autoscaler you can’t scale an instance group down completely (i.e. you can’t have zero machines in a group). There’s a logic behind this: if you scale down to 0 machines, you won’t ever receive any metric that could trigger a scale-up.

In our case, many of our machines work at specific times of the day, usually 8 to 10 hours straight, and have nothing else to do for the rest of the day. An example of scaling policy execution for one of our instance groups is as follows:

For up to 2/3rd of the day, we have a single machine that does nothing.

Auto-scaling down to zero machine

It would be great if we could avoid paying for this idle machine. It becomes even more important when the group is dedicated to resource-intensive tasks (e.g. machine learning), since those machines are more expensive.

We’ll see that it is actually possible to have zero machines and still have an autoscaling system, but there’s one requirement: you need your own way of knowing when it’s time to scale up again. Let’s see how we did this.

At Adenlab we’re using a taskqueue (MRQ) for our recurring tasks. MRQ lets us dispatch our tasks to specific worker groups, depending on the queue and/or on the task itself. Each worker group is bound to an IG, so in our case we have a clear metric: for a given IG, do we have tasks waiting to be dequeued?

With that in mind we can use the following IG structure:

  • One tiny (read: cheap) machine that will always be up (no autoscaling)
  • One or more IGs with no autoscaling policy

By creating an IG with no autoscaling policy, GCE will effectively let us scale to zero if we want to. On our tiny machine we’ll have a script running every 5 minutes, which will check, for each MRQ worker group, whether any task is queued. If so, the script will create an autoscaling policy for the corresponding Instance Group and scale it to 1. The autoscaler will then scale further if needed, according to whatever metric we configured (in our case, CPU usage).

Let’s write an MRQ task for that script. The imports we need:

from mrq.task import Task
from google.oauth2 import service_account
from googleapiclient import discovery
import time
import re

First we need to configure a few settings related to GCP:

class Scale(Task):
    project_name = "name of your GCP project"
    zone = "europe-west1-c"  # adapt your zone here
    service_account_path = "service_account_credentials.json"

    # Groups that are not concerned by this task.
    # If your tiny machine is part of an IG, add its name here
    groups_to_skip = ("group1",)

Add to groups_to_skip the worker groups that should not be autoscaled: at least add the group of the machine that will run this task.

Now let’s see the task entry point:

def run(self, params):
    credentials = service_account.Credentials \
        .from_service_account_file(self.service_account_path)
    self.service = discovery.build(
        "compute", "v1", credentials=credentials)

    # We need to have a way to know what we want our different
    # autoscaling policies to be.
    # We could store them in a DB and fetch them here,
    # so that it is shared with our Ansible playbooks for instance.
    # For simplicity we'll just hardcode them here:
    self.autoscaler_configs = {
        "group2": {
            "min_replicas": 1,
            "max_replicas": 10,
            "cooldown": 180,
            "cpu_target": 0.80
        },
        "group3": {
            "min_replicas": 1,
            "max_replicas": 8,
            "cooldown": 180,
            "cpu_target": 0.90
        }
    }

    # First we need to fetch existing IGs
    self.fetch_groups()
    # Next we want to know which groups currently have an autoscaler
    # (fetch_autoscalers, not shown here, populates self.autoscalers
    # by listing self.service.autoscalers() the same way)
    self.fetch_autoscalers()
    # Check each group and see if we should scale it up or down
    self.check_groups()

Here is how to fetch IGs infos:

def fetch_groups(self):
    self.groups = {}
    request = self.service.instanceGroupManagers() \
        .list(project=self.project_name, zone=self.zone)
    while request is not None:
        response = request.execute()
        for ig in response["items"]:
            group_name = ig["baseInstanceName"]
            self.groups[group_name] = {
                "base_name": group_name,
                "name": ig["name"],
                "size": ig["targetSize"],
                "link": ig["selfLink"]
            }
        request = self.service.instanceGroupManagers() \
            .list_next(previous_request=request,
                       previous_response=response)

We populate self.groups with data about IGs. For more info about the IG response structure, see the GCE instanceGroupManagers API reference.

Now let’s look at the code that we’ll use to create an autoscaler for a given worker group name:

def create_autoscaler(self, group):
    autoscaler_config = self.autoscaler_configs[group]
    config = {
        "target": self.groups[group]["link"],
        "name": "%s-as" % group,
        "autoscalingPolicy": {
            "minNumReplicas": autoscaler_config["min_replicas"],
            "maxNumReplicas": autoscaler_config["max_replicas"],
            "coolDownPeriodSec": autoscaler_config["cooldown"],
            "cpuUtilization": {
                "utilizationTarget": autoscaler_config["cpu_target"]
            }
        }
    }
    operation = self.service.autoscalers().insert(
        project=self.project_name, zone=self.zone,
        body=config).execute()
    self.wait_for_operation(operation["name"])

You can find the code of wait_for_operation in this example repository.
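If you don’t have that repository at hand, here is a minimal sketch of what such a helper could look like: it polls the zoneOperations endpoint until the operation reaches DONE status. The function name matches the one used above, but the exact signature and polling interval here are assumptions, not the repository’s code:

```python
import time

def wait_for_operation(service, project, zone, operation_name,
                       poll_seconds=2):
    """Block until a GCE zonal operation completes, raising on error."""
    while True:
        result = service.zoneOperations().get(
            project=project, zone=zone,
            operation=operation_name).execute()
        if result["status"] == "DONE":
            # A finished operation may still carry an error payload
            if "error" in result:
                raise RuntimeError(result["error"])
            return result
        time.sleep(poll_seconds)
```

In the task methods above it would be wrapped as `self.wait_for_operation(...)`, with `service`, `project` and `zone` taken from the instance attributes.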

The last thing we need is a method to scale up an IG:

def scale_up(self, group, size=1):
    if self.groups[group]["size"] > 0:
        # Already scaled up
        return

    # Make sure we have an autoscaler
    if not self.autoscalers.get(group):
        self.create_autoscaler(group)

    operation = self.service.instanceGroupManagers().resize(
        project=self.project_name, zone=self.zone,
        instanceGroupManager=self.groups[group]["name"],
        size=size).execute()
    self.wait_for_operation(operation["name"])

Finally, the actual logic is pretty straightforward:

class ScaleUp(Scale):
    def check_groups(self):
        # Now we have everything we need for the actual task logic:
        for group in self.groups:
            if group in self.groups_to_skip:
                continue
            if self.should_scale_up(group):
                self.scale_up(group)

should_scale_up is where your scaling logic should be. It is not provided here, but remember that in our case it checks whether we have queued tasks or not.
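As a rough illustration, the decision can be reduced to a queue-depth check. The `queued_task_count` callable here is hypothetical: with MRQ it would look up the queue sizes for the worker group bound to the IG, but any lookup that maps a group name to a number of pending tasks works:

```python
def should_scale_up(group, queued_task_count):
    """Return True when at least one task is waiting for this group.

    queued_task_count is any callable mapping a group name to the
    number of queued tasks (hypothetical; plug in your taskqueue).
    """
    return queued_task_count(group) > 0

# Usage with a stubbed queue lookup:
counts = {"group2": 3, "group3": 0}
should_scale_up("group2", counts.get)  # True: 3 tasks queued
should_scale_up("group3", counts.get)  # False: queue is empty
```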

This task is scheduled every 5 minutes. This is handy for our use case because even when no task is queued, a team member may trigger an action that creates a new task, and we don’t want to wait too long before it actually starts. Of course, for user actions that create tasks and are expected to get quick responses, we have a dedicated machine that is never controlled by an autoscaling policy. In most cases, though, you should avoid needing asynchronous tasks for user interactions that require feedback.
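For reference, the scheduling itself can be done with MRQ’s built-in scheduler. A sketch of the relevant config fragment is below; the module path is hypothetical and the exact config keys may differ between MRQ versions, so check the MRQ documentation for your release:

```python
# mrq-config.py (sketch; verify against your MRQ version)
SCHEDULER_TASKS = [
    {
        "path": "tasks.scale.ScaleUp",    # hypothetical module path
        "params": {},
        "interval": 5 * 60                # every 5 minutes
    },
    {
        "path": "tasks.scale.ScaleDown",  # the scale-down task below
        "params": {},
        "interval": 30 * 60               # every 30 minutes
    }
]
```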

So now we can scale up when needed and GCE will take it from here. But we also need a way to scale back to zero! For this we have a second task, scheduled every 30 minutes, which will run the same check as the first task, but will instead delete the autoscaler and scale down if there’s no task running or queued.

We need a few more methods to scale down:

class ScaleDown(Scale):
    def delete_autoscaler(self, group):
        autoscaler = self.autoscalers[group]
        operation = self.service.autoscalers().delete(
            project=self.project_name, zone=self.zone,
            autoscaler=autoscaler["name"]).execute()
        self.wait_for_operation(operation["name"])

    def scale_down(self, group):
        if self.groups[group]["size"] == 0:
            # Already scaled down
            return
        # Delete the autoscaler so that we can scale to zero machines
        if self.autoscalers.get(group):
            self.delete_autoscaler(group)
        operation = self.service.instanceGroupManagers().resize(
            project=self.project_name, zone=self.zone,
            instanceGroupManager=self.groups[group]["name"],
            size=0).execute()
        self.wait_for_operation(operation["name"])

    def check_groups(self):
        for group in self.groups:
            if group in self.groups_to_skip:
                continue
            if self.should_scale_down(group):
                self.scale_down(group)

When we know we have no more tasks to dequeue, we delete the autoscaler and scale the remaining machine down. Again, should_scale_down should implement your own logic.

The upside of this approach is that we can combine multiple criteria for scaling up or down. For instance, to avoid scaling up and down repeatedly, we can decide to scale down only after a certain amount of time has passed without any task queued.
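That debounced scale-down check could be sketched as follows. The parameters (queued and running counts, last-activity timestamp) are hypothetical stand-ins for your taskqueue lookups, and the grace period is an assumed value:

```python
import time

# Assumed grace period; tune to how bursty your queues are
IDLE_GRACE_SECONDS = 30 * 60

def should_scale_down(group, queued, running, last_activity_ts,
                      now=None):
    """Scale down only when nothing is queued or running for this
    group AND it has been idle longer than the grace period."""
    now = time.time() if now is None else now
    if queued > 0 or running > 0:
        return False
    return (now - last_activity_ts) >= IDLE_GRACE_SECONDS
```

With a 30-minute grace period and the ScaleDown task running every 30 minutes, a group survives at most one idle hour before being shut down, while short gaps between tasks don’t churn machines.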


We’ve seen that by deleting an autoscaler we can remove all machines from an instance group. We created an autoscaling task that uses custom logic to decide whether a specific group should be up (1 machine) or down (0 machines). The drawback is that we need a separate, always-up machine that is ready to dequeue our autoscaling tasks.