[GCP] Put your staging ENV into sleep mode during non-working hours
Objectives
One of the key benefits of cloud services is how easy it is to control your bill. A natural step is to turn off all servers and resources in lower environments (e.g. DEV and UAT) outside office hours: switch on a so-called ‘sleep mode’ to save money, resources and the earth
On AWS, this is packaged as a CloudFormation solution called Instance Scheduler: https://docs.aws.amazon.com/solutions/latest/instance-scheduler/welcome.html, which is more mature and easy to use
The problem for GCP
Although GCP has no such out-of-the-box component yet, there is Cloud Scheduler, currently in beta
By following the instructions in this tutorial, it’s not difficult to create scheduler jobs so that all VMs run only on workdays from 9 to 6
However, this solution has some limitations: each scheduler job handles only 1 instance, and every instance needs both a start job and a stop job
The tricky part is that Google will charge you $5 per scheduler job every month once you have more than 3 jobs, and you end up maintaining a lot of scheduler jobs, two for each resource.
This article addresses that problem by enhancing the scripts provided in that tutorial so that all instances are managed with only 2 scheduler jobs
The following components will be used to achieve it:
- GCloud command-line tools
- Cloud Function
- Cloud Pub/Sub
- Cloud Scheduler
- GCP Compute Engine
- Bash shell
The source code is available at: https://github.com/CyanZero/gcp-vms-sleep
Step 1: Create VM instances
Assume 2 instances have already been created: instance-a and instance-b
Step 2: Set up Cloud Pub/Sub
gcloud pubsub topics create start-instance-event
gcloud pubsub topics create stop-instance-event
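Re-running the two commands above fails once the topics exist. A minimal sketch of a re-runnable variant follows; `ensure_topic` is a hypothetical helper name, not part of the repository, and it assumes `gcloud pubsub topics describe` returns a non-zero exit code for a missing topic.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): create a Pub/Sub topic
# only if it does not exist yet, so the setup can be re-run safely.
ensure_topic() {
  local topic="$1"
  # Assumption: `describe` exits non-zero when the topic is missing.
  if gcloud pubsub topics describe "$topic" >/dev/null 2>&1; then
    echo "topic $topic already exists"
  else
    gcloud pubsub topics create "$topic"
  fi
}
```

It could then be called as `ensure_topic start-instance-event` and `ensure_topic stop-instance-event`.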
Step 3: Set up Cloud Functions
git clone https://github.com/CyanZero/gcp-vms-sleep
index.js exports 2 new functions: startInstancePubSubMul and stopInstancePubSubMul. Run the following gcloud commands to create the 2 Cloud Functions:
# Cloud Functions is only available in some regions; asia-northeast1 is used here
gcloud functions deploy startInstancePubSubMul \
--trigger-topic start-instance-event \
--runtime nodejs6 \
--region asia-northeast1
# The stop function must be triggered by the stop topic
gcloud functions deploy stopInstancePubSubMul \
--trigger-topic stop-instance-event \
--runtime nodejs6 \
--region asia-northeast1
Step 4: Verify the functions
# The zone value depends on your actual settings
# tr strips the line wrapping that some base64 implementations add
data=$(echo '{ "zone": "YOUR_ZONE", "instances": ["instance-a", "instance-b"] }' | base64 | tr -d '\n')
# Cloud Functions is only available in some regions; asia-northeast1 is used here
# Double quotes are needed so that $data is expanded by the shell
gcloud functions call stopInstancePubSubMul \
--region asia-northeast1 \
--project YOUR_PROJECT_NAME \
--data "{\"data\":\"$data\"}"
# Check if the instances are in terminated status
gcloud compute instances describe instance-a \
--zone YOUR_ZONE \
| grep status
# Expected output: status: TERMINATED
Great! The semi-automatic scripts are now ready, and you can start/stop any GCE VMs with this piece of shell script
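The `data` field passed to the function is the base64-encoded JSON message, mimicking what Pub/Sub delivers. A small sketch of helpers for building that payload is shown here; `encode_message` and `build_call_payload` are hypothetical names introduced for illustration, not functions from the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not in the repository).

# Base64-encode a JSON message on a single line.
# tr removes the line wrapping that some base64 implementations add.
encode_message() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Wrap the encoded message in the {"data": "..."} envelope
# passed to `gcloud functions call --data`.
build_call_payload() {
  printf '{"data":"%s"}' "$(encode_message "$1")"
}
```

For example, `gcloud functions call stopInstancePubSubMul --data "$(build_call_payload '{ "zone": "YOUR_ZONE", "instances": ["instance-a"] }')"` would send a single-instance message.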
Step 5: Set up the Cloud Scheduler jobs to call Cloud Pub/Sub
# For a one-time scheduler setup, just run the following 2 gcloud commands
gcloud beta scheduler jobs create pubsub startup-workday-instance \
--schedule '0 9 * * 1-5' \
--topic start-instance-event \
--message-body '{ "zone": "YOUR_ZONE", "instances": ["instance-a", "instance-b"] }' \
--time-zone 'YOUR_TIMEZONE'
gcloud beta scheduler jobs create pubsub shutdown-workday-instance \
--schedule '0 18 * * 1-5' \
--topic stop-instance-event \
--message-body '{ "zone": "YOUR_ZONE", "instances": ["instance-a", "instance-b"] }' \
--time-zone 'YOUR_TIMEZONE'
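The schedules above use the unix-cron field order: minute, hour, day of month, month, day of week, where 1-5 means Monday to Friday. If the office hours change often, the expression could be generated rather than typed by hand. The sketch below assumes a hypothetical helper named `workday_cron`, which is not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): build a unix-cron
# expression that fires at HOUR:00 from Monday to Friday.
workday_cron() {
  local hour="$1"
  # Reject empty or non-numeric input.
  case "$hour" in
    ''|*[!0-9]*)
      echo "hour must be an integer between 0 and 23" >&2
      return 1
      ;;
  esac
  if [ "$hour" -gt 23 ]; then
    echo "hour must be an integer between 0 and 23" >&2
    return 1
  fi
  # Fields: minute hour day-of-month month day-of-week
  echo "0 $hour * * 1-5"
}
```

`workday_cron 9` yields the start schedule used above and `workday_cron 18` the stop schedule.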
Servers are frequently created and destroyed in a testing ENV, so we need a small bash script that updates or removes the list of instances to reduce repeated work. Below is a snippet of the function
Note: Cloud Scheduler is supported in a limited number of regions and must be set up before the first run, so go to GCP console → Cloud Scheduler and select a region first
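# Inside this function, $1 is the ENV name and $2 is the message body
# Look up the existing job name, if any (6th "/"-separated field of the name line)
start_job=$(gcloud beta scheduler jobs describe "startup-instance-$1" --quiet | grep name | awk '{split($0,a,"/"); print a[6]}')
# Delete the existing job before recreating it; note the spaces around "="
if [ "startup-instance-$1" = "$start_job" ]
then
gcloud beta scheduler jobs delete "startup-instance-$1"
fi
gcloud beta scheduler jobs create pubsub "startup-instance-$1" \
--schedule "$start_cron" \
--topic start-instance-event \
--message-body "${2}" \
--time-zone 'YOUR_TIMEZONE'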
This simple Bash script checks whether a scheduler job already exists, deletes it if so, and creates a new one with the given parameters (zone and instances)
Because of this, the script can be re-run to recreate the same scheduler job whenever servers are added to or removed from the workday-only server group, for example:
bash mul_sleep_scheduler.sh dev "YOUR_ZONE" "instance-a instance-b"
Pass in the ENV name, the GCP zone and the instances that should go into sleep mode, and the job is done!
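The scheduler jobs expect a JSON message body, while the script takes the zone and a space-separated instance list. One way that conversion might be done is sketched below; `build_message_body` is a hypothetical helper written for illustration, and the actual script in the repository may do this differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not necessarily how the repository does it):
# turn a zone and a space-separated instance list into the JSON
# message body used by the scheduler jobs.
build_message_body() {
  local zone="$1" instances="$2" json_list="" name
  # Word splitting on $instances is intentional: one name per word.
  for name in $instances; do
    # Append each name as a quoted JSON string, comma-separated.
    json_list="${json_list:+$json_list, }\"$name\""
  done
  printf '{ "zone": "%s", "instances": [%s] }\n' "$zone" "$json_list"
}
```

With the arguments from the example above, `build_message_body "YOUR_ZONE" "instance-a instance-b"` produces the same message body that was typed by hand in Step 5.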
Conclusion
Now you can put all VMs of your testing environment into sleep mode during non-office hours, saving up to 77% of the cost by turning them off with just these 2 scheduler jobs
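The exact saving depends on the schedule and on what is still billed while instances are stopped, such as persistent disks. As a rough check, assuming compute cost is simply proportional to running hours, the share of the week spent stopped can be estimated like this (the function names are made up for this sketch):

```shell
#!/usr/bin/env bash
# Rough estimate only: assumes cost is proportional to running hours.

# Hours per week the instances are running.
# Args: start_hour stop_hour days_per_week
running_hours_per_week() {
  echo $(( ($2 - $1) * $3 ))
}

# Integer percentage of the 168-hour week spent stopped.
savings_percent() {
  local running
  running=$(running_hours_per_week "$1" "$2" "$3")
  echo $(( (168 - running) * 100 / 168 ))
}

# 09:00 to 18:00, Monday to Friday, as in Step 5
savings_percent 9 18 5   # prints 73
```

For the 9 to 6 weekday schedule this comes to about 73%; shorter working hours push the figure higher.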
On the other hand, GCP is still young compared to AWS, especially for certain features such as Cloud Functions
It won’t be a surprise if a better way to maintain sleep mode appears a few months from now, as Cloud Scheduler is still in beta and sometimes even pops up a confusing error like “You need to create an app engine to continue”
Do let me know if you have any comments on this