[GCP] Put your staging ENV into sleep mode during non-working hours

Mar 8, 2019

Objectives

One of the key benefits of cloud services is how easy it is to control your bill. A natural step is to turn off all servers and resources in lower environments (e.g. DEV and UAT) during non-office hours, a so-called ‘sleep mode’ that saves money, resources and the earth.

On AWS, this is packaged via CloudFormation as a solution called Instance Scheduler: https://docs.aws.amazon.com/solutions/latest/instance-scheduler/welcome.html, which is more mature and easy to use.

The problem for GCP

Although GCP has no such out-of-the-box component yet, it does offer Cloud Scheduler, currently in beta.

Following the instructions in this tutorial, it’s not difficult to create scheduler jobs so that all VMs only run on workdays from 9 to 6.

However, this solution has some limitations: each scheduler job handles only 1 instance, and every instance needs both a start and a stop job.

The tricky part is: Google will charge you $5 per scheduler every month if you have more than 3 scheduler jobs, and you have to maintain a lot of schedulers for each resource.

This article addresses that problem by enhancing the scripts provided in the tutorial so that all instances are managed with only 2 scheduler jobs.

The following components will be used to achieve it:

GCloud command-line tools
Cloud Function
Cloud Pub/Sub
Cloud Scheduler
GCP Compute Engine
Bash shell

The source code is available at: https://github.com/CyanZero/gcp-vms-sleep

Step 1: Create VM instances

Assuming 2 instances are created: instance-a, instance-b

Step 2: Set up Cloud Pub/Sub

gcloud pubsub topics create start-instance-event
gcloud pubsub topics create stop-instance-event

Step 3: Set up Cloud Functions

git clone https://github.com/CyanZero/gcp-vms-sleep

index.js exports 2 new functions, startInstancePubSubMul and stopInstancePubSubMul. Run the following gcloud commands to deploy them as 2 Cloud Functions:

# Notice the limited region support for Cloud Functions; the region used here is asia-northeast1
gcloud functions deploy startInstancePubSubMul \
    --trigger-topic start-instance-event \
    --runtime nodejs6 \
    --region asia-northeast1

# The stop function must listen on the stop topic, not the start topic
gcloud functions deploy stopInstancePubSubMul \
    --trigger-topic stop-instance-event \
    --runtime nodejs6 \
    --region asia-northeast1

Step 4: Verify the functions

# The zone value depends on your actual settings
# tr strips the line wrapping that some base64 implementations add
data=$(echo '{ "zone": "YOUR_ZONE", "instances": ["instance-a", "instance-b"] }' | base64 | tr -d '\n')

# Notice the limited region support for Cloud Functions; the region used here is asia-northeast1
# Double quotes are needed so that $data is expanded by the shell
gcloud functions call stopInstancePubSubMul \
    --region asia-northeast1 \
    --project YOUR_PROJECT_NAME \
    --data "{\"data\":\"$data\"}"

# Check if the instances are in terminated status
gcloud compute instances describe instance-a \
    --zone YOUR_ZONE \
    | grep status

status: TERMINATED
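Before calling the function, it can help to confirm that the encoded payload decodes back to the JSON you intended. The snippet below is only a sketch and is not taken from the tutorial or the repo: `encode_payload` is a hypothetical helper name, and it assumes a `base64` binary that accepts `--decode`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: base64-encode a JSON payload onto a single line.
encode_payload() {
    # tr removes the line wrapping that GNU base64 adds to long output
    printf '%s' "$1" | base64 | tr -d '\n'
}

payload='{ "zone": "YOUR_ZONE", "instances": ["instance-a", "instance-b"] }'
encoded=$(encode_payload "$payload")

# Decode again to verify the round trip before sending it anywhere
decoded=$(printf '%s' "$encoded" | base64 --decode)

if [ "$decoded" = "$payload" ]; then
    echo "payload OK"
else
    echo "payload mismatch" >&2
fi
```

If the check prints a mismatch, the usual suspects are curly quotes pasted from a web page or a wrapped base64 string.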

Great! The semi-automatic scripts are now ready, and you can start/stop any GCE VMs with these shell commands.

Step 5: Set up the Cloud Scheduler jobs to call Cloud Pub/Sub

# For a one-time scheduler setup, just trigger the following 2 GCloud API commands
gcloud beta scheduler jobs create pubsub startup-workday-instance \
    --schedule '0 9 * * 1-5' \
    --topic start-instance-event \
    --message-body '{ "zone": "YOUR_ZONE", "instances": ["instance-a", "instance-b"] }' \
    --time-zone 'YOUR_TIMEZONE'

gcloud beta scheduler jobs create pubsub shutdown-workday-instance \
    --schedule '0 18 * * 1-5' \
    --topic stop-instance-event \
    --message-body '{ "zone": "YOUR_ZONE", "instances": ["instance-a", "instance-b"] }' \
    --time-zone 'YOUR_TIMEZONE'
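The two schedules use the usual five-field cron layout: minute, hour, day of month, month and day of week, where 1-5 means Monday to Friday. As a small sketch (the helper name `workday_cron` is hypothetical and not part of the repo), the expressions can be generated from the desired hour so that start and stop times are easy to change:

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a Monday-to-Friday cron expression for a given hour.
# Fields: minute hour day-of-month month day-of-week (1-5 = Mon-Fri)
workday_cron() {
    local hour="$1"
    # Reject anything that is not a non-negative integer
    case "$hour" in
        ''|*[!0-9]*) echo "hour must be 0-23" >&2; return 1 ;;
    esac
    # Reject hours beyond the end of the day
    if [ "$hour" -gt 23 ]; then
        echo "hour must be 0-23" >&2
        return 1
    fi
    echo "0 $hour * * 1-5"
}

start_cron=$(workday_cron 9)
stop_cron=$(workday_cron 18)
echo "start: $start_cron"
echo "stop:  $stop_cron"
```

The resulting strings can be passed to --schedule as they are.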

Servers are frequently created and destroyed in a testing ENV, so a small bash script that updates or removes the list of instances helps reduce repeated work. Below is a snippet of the function:

Note: Only a limited set of regions support Cloud Scheduler, and a region has to be set up before the first run, so go to GCP console → Cloud Scheduler and select a region first.

# $1: ENV name, $2: JSON message body, $start_cron: cron schedule set by the caller
# Look up the existing job name (the 6th "/"-separated part of the "name:" line)
start_job=$(gcloud beta scheduler jobs describe "startup-instance-$1" --quiet | grep '^name' | awk '{split($0,a,"/");print a[6]}')

# Spaces around = are required, otherwise the test is always true
if [ "startup-instance-$1" = "$start_job" ]
then
    gcloud beta scheduler jobs delete "startup-instance-$1"
fi

gcloud beta scheduler jobs create pubsub "startup-instance-$1" \
    --schedule "$start_cron" \
    --topic start-instance-event \
    --message-body "${2}" \
    --time-zone 'YOUR_TIMEZONE'

This simple Bash script checks whether a scheduler job already exists, deletes it if so, and then creates a new one with the given parameters (zone and instances).

The script can be re-run whenever servers are added to or removed from the workday-only group; it removes and recreates the same scheduler jobs, for example:
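The delete-then-create flow can also be tried without touching a real project. The sketch below is an assumption-laden variant, not the code from the repo: `recreate_job` and the `GCLOUD` override are hypothetical names, and it relies on `gcloud ... describe` exiting with a non-zero status when the job does not exist.

```shell
#!/usr/bin/env bash
# GCLOUD is a hypothetical override so the flow can be dry-run with "echo"
GCLOUD=${GCLOUD:-gcloud}

# Hypothetical helper: delete the job if it exists, then create it again.
# $1: job name, $2: cron schedule, $3: Pub/Sub topic, $4: JSON message body
recreate_job() {
    local job="$1" schedule="$2" topic="$3" body="$4"
    # describe is assumed to fail when the job is missing, so delete is skipped
    if "$GCLOUD" beta scheduler jobs describe "$job" >/dev/null 2>&1; then
        "$GCLOUD" beta scheduler jobs delete "$job" --quiet
    fi
    "$GCLOUD" beta scheduler jobs create pubsub "$job" \
        --schedule "$schedule" \
        --topic "$topic" \
        --message-body "$body" \
        --time-zone 'YOUR_TIMEZONE'
}

# Dry run: print the gcloud arguments instead of executing them
GCLOUD=echo recreate_job "startup-instance-dev" "0 9 * * 1-5" \
    start-instance-event '{ "zone": "YOUR_ZONE", "instances": ["instance-a"] }'
```

Dropping the `GCLOUD=echo` prefix runs the real commands, so review the dry-run output first.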

bash mul_sleep_scheduler.sh dev "YOUR_ZONE" "instance-a instance-b"

Pass in the ENV name, the GCP zone and the instances that should go into sleep mode, and the job is done!
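To turn the zone and the space-separated instance list into the JSON message body that the scheduler jobs send, something like the following could be used. This is a minimal sketch under assumptions: `build_message_body` is a hypothetical name, and the actual script in the repo may build the payload differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a zone and a space-separated instance list
# into the JSON message body used by the scheduler jobs.
build_message_body() {
    local zone="$1" instances="$2" list="" name
    # Rely on word splitting to walk the space-separated names
    for name in $instances; do
        # Prefix a comma separator for every element except the first
        list="${list:+$list, }\"$name\""
    done
    printf '{ "zone": "%s", "instances": [%s] }\n' "$zone" "$list"
}

build_message_body "YOUR_ZONE" "instance-a instance-b"
```

The output has the same shape as the --message-body value used in Step 5.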

Conclusion

Now you can put all VMs of your testing environment into sleep mode during non-office hours using just these 2 schedulers. With the 9-to-6 weekday schedule the instances run only 45 of the 168 hours in a week, which cuts instance uptime by roughly 73%.
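As a rough sanity check on the savings, the share of the week during which instances are switched off can be computed directly. This sketch assumes the 09:00 to 18:00 weekday schedule from Step 5 and that cost scales only with instance uptime; the exact figure depends on the hours you choose, and `uptime_saving_percent` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: percentage of weekly hours an instance is switched off.
# $1: hours running per workday, $2: workdays per week
uptime_saving_percent() {
    local hours_per_day="$1" days="$2"
    local total=$((24 * 7))
    local running=$((hours_per_day * days))
    # Integer arithmetic, so the result is rounded down
    echo $(( (total - running) * 100 / total ))
}

# 09:00 to 18:00 on Monday to Friday is 45 of 168 hours
uptime_saving_percent 9 5   # prints 73
```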

On the other hand, GCP is still young compared to AWS, esp. for certain features like Cloud Functions.

It won’t be a surprise if a better way to maintain sleep mode appears in a few months, as Cloud Scheduler is still in beta and sometimes even pops up a confusing error like “You need to create an app engine to continue”.

Do let me know if you have any comments on this.