GCP Cost Efficiency: Scheduled Cloud Functions for Infra Start/Stop

Prajakta Shete
Google Cloud - Community
8 min read · Jan 2, 2024
[Image: start GCP infra via scheduled Cloud Function]
[Image: stop GCP infra via scheduled Cloud Function]

In this blog, we'll streamline cloud operations through automated infrastructure management, using Cloud Scheduler and Cloud Functions to orchestrate the scheduled shutdown and startup of provisioned infrastructure. This automation optimizes your cloud resources and significantly reduces operational costs, especially in the lower environments of your cloud ecosystem.

Execution steps:

  1. Create a service account with limited access.
  2. Prepare Python code for each cloud function.
  3. Build two Cloud Functions: one for starting GCP services and another for stopping them.
  4. Create two Cloud Scheduler jobs to trigger these Cloud Functions.

Cloud Functions is Google Cloud's serverless compute service, which lets developers run code in response to events without managing server infrastructure. It's designed for executing individual functions or tasks triggered by events such as HTTP requests, messages on a Pub/Sub topic, changes in Cloud Storage, or schedules from Cloud Scheduler. This serverless architecture enables quick development, scalability, and cost efficiency, since you're charged only for the resources used during execution.
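
For a sense of the programming model, here is a minimal HTTP-triggered function using the Python Functions Framework (a bare-bones sketch, separate from this tutorial's code):

import functions_framework

@functions_framework.http
def hello(request):
    # Runs only when the HTTP trigger fires; no servers to manage
    return "Hello from a Cloud Function!"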

Cloud Scheduler is a fully managed enterprise-grade cron job scheduler. It allows you to schedule virtually any job, including batch, big data jobs, cloud infrastructure operations, and more. You can automate everything, including retries in case of failure to reduce manual toil and intervention. Cloud Scheduler even acts as a single pane of glass, allowing you to manage all your automation tasks from one place.
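
For example, a hypothetical job that posts to an endpoint every 30 minutes with up to three automatic retries (the job name and URL here are placeholders):

gcloud scheduler jobs create http example-job \
--location us-central1 \
--schedule "*/30 * * * *" \
--uri "https://example.com/task" \
--http-method POST \
--max-retry-attempts 3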

1. Create a service account with limited access

Create a restricted service account to manage infrastructure tasks. Begin by generating the service account:

gcloud iam service-accounts create sample-sa \
--display-name "sample service account for infra automation"

Grant the service account the permissions it needs for infrastructure management, including starting and stopping GCP services, plus the Cloud Functions invoker role. Because gcloud projects add-iam-policy-binding accepts a single --role per invocation, grant the roles one at a time:

for role in \
  roles/cloudfunctions.invoker \
  roles/iam.serviceAccountUser \
  roles/compute.admin \
  roles/container.admin \
  roles/secretmanager.secretAccessor; do
  gcloud projects add-iam-policy-binding ${PROJECT_ID} \
  --member serviceAccount:sample-sa@${PROJECT_ID}.iam.gserviceaccount.com \
  --role ${role}
done
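
To verify the bindings, list the roles granted to the service account (assuming ${PROJECT_ID} is set in your shell):

gcloud projects get-iam-policy ${PROJECT_ID} \
--flatten="bindings[].members" \
--filter="bindings.members:serviceAccount:sample-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
--format="value(bindings.role)"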

2. Prepare Python code for each cloud function

Use the ‘stop_all_services.py’ script to halt key GCP services in your project, including VM instances and Cloud SQL instances. GKE clusters can't be stopped directly, so in non-production environments such as sandbox or testing the script instead scales each node pool down to 0 every evening for optimal resource utilization.

stop_all_services.py file content:

import functions_framework  # Functions Framework for the HTTP entry point

# Imports for Cloud APIs
from googleapiclient import discovery
from oauth2client.client import GoogleCredentials
from google.cloud import container_v1beta1
from google.cloud import compute_v1


# Stop all Cloud SQL instances by setting their activation policy to NEVER
def stop_all_db_instances(project):
    # Obtain default application credentials
    credentials = GoogleCredentials.get_application_default()

    # Build a client for the Cloud SQL Admin API
    service = discovery.build('sqladmin', 'v1beta4', credentials=credentials)

    # List all database instances in the project
    request = service.instances().list(project=project)
    response = request.execute()

    for database_instance in response.get('items', []):
        instance_name = database_instance.get('name')
        if instance_name:
            print(f"Stopping database instance: {instance_name}")

            # Patch the instance to set the activation policy to 'NEVER'
            stop_request = service.instances().patch(
                project=project,
                instance=instance_name,
                body={'settings': {'activationPolicy': 'NEVER'}}
            )
            stop_request.execute()
            print(f"Instance {instance_name} stopped.")


# Scale every node pool in every GKE cluster down to zero nodes
def resize_nodes_in_all_clusters(project_id):
    # Initialize the Cluster Manager client
    client = container_v1beta1.ClusterManagerClient()

    # List clusters across all locations in the project
    request = container_v1beta1.ListClustersRequest(
        parent=f"projects/{project_id}/locations/-"
    )
    response = client.list_clusters(request=request)

    for cluster in response.clusters:
        for node_pool in cluster.node_pools:
            # Set the node pool size to 0 (adjust to your desired count)
            resize_request = container_v1beta1.SetNodePoolSizeRequest(
                name=(f"projects/{project_id}/locations/{cluster.location}/"
                      f"clusters/{cluster.name}/nodePools/{node_pool.name}"),
                node_count=0
            )
            operation = client.set_node_pool_size(request=resize_request)
            print(operation)  # Handle the operation as needed


# Stop all Compute Engine instances in the project
def stop_all_instances(project_id):
    # Initialize the Compute Engine client
    compute_client = compute_v1.InstancesClient()

    # List instances across all zones in the project
    request = compute_v1.AggregatedListInstancesRequest(
        project=project_id, max_results=50
    )
    agg_list = compute_client.aggregated_list(request=request)

    for zone, response in agg_list:
        for instance in response.instances:
            zone_name = zone.split('/')[-1]
            # Request a stop for each instance (returns a long-running operation)
            compute_client.stop(
                project=project_id,
                zone=zone_name,
                instance=instance.name
            )
            print(f"Stop requested for VM instance '{instance.name}' in zone '{zone_name}'.")


@functions_framework.http  # HTTP-triggered entry point
def main(request):
    project_id = 'PROJECT_ID'  # Replace with your actual project ID

    # Stop databases, scale GKE node pools to zero, then stop VM instances
    stop_all_db_instances(project_id)
    resize_nodes_in_all_clusters(project_id)
    stop_all_instances(project_id)

    return 'OK'  # HTTP functions must return a valid response


if __name__ == "__main__":
    main(None)  # Local smoke test; the request object is unused
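
Note that compute_client.stop() only requests the stop; it returns a long-running operation, and the print statement fires before the VM has actually shut down. If you need to block until each instance is down, recent versions of google-cloud-compute return an ExtendedOperation you can wait on; a minimal sketch, assuming google-cloud-compute >= 1.0:

operation = compute_client.stop(
    project=project_id, zone=zone_name, instance=instance.name
)
operation.result(timeout=300)  # Block until the stop completes or times out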

requirements.txt file containing the necessary packages:

functions-framework==3.*
google-api-python-client
google-auth
google-auth-oauthlib
google-cloud-container
google-cloud-compute
oauth2client
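
Before deploying, you can optionally smoke-test the function locally with the Functions Framework CLI (assuming application-default credentials are configured and the dependencies are installed):

pip install -r requirements.txt
functions-framework --target=main --source=stop_all_services.py --port=8080
curl -X POST http://localhost:8080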

Execute the ‘start_all_services.py’ script each morning to resume GCP services such as VM instances and Cloud SQL instances. It reactivates the services that were previously stopped within the project.

start_all_services.py file content:

import functions_framework  # Functions Framework for the HTTP entry point

# Imports for Cloud APIs
from googleapiclient import discovery
from oauth2client.client import GoogleCredentials
from google.cloud import container_v1beta1
from google.cloud import compute_v1


# Start all Cloud SQL instances by setting their activation policy to ALWAYS
def start_all_db_instances(project):
    # Obtain default application credentials
    credentials = GoogleCredentials.get_application_default()

    # Build a client for the Cloud SQL Admin API
    service = discovery.build('sqladmin', 'v1beta4', credentials=credentials)

    # List all database instances in the project
    request = service.instances().list(project=project)
    response = request.execute()

    for database_instance in response.get('items', []):
        instance_name = database_instance.get('name')
        if instance_name:
            print(f"Starting database instance: {instance_name}")

            # Patch the instance to set the activation policy to 'ALWAYS'
            start_request = service.instances().patch(
                project=project,
                instance=instance_name,
                body={'settings': {'activationPolicy': 'ALWAYS'}}
            )
            start_request.execute()
            print(f"Instance {instance_name} started.")


# Scale every node pool in every GKE cluster back up
def resize_nodes_in_all_clusters(project_id):
    # Initialize the Cluster Manager client
    client = container_v1beta1.ClusterManagerClient()

    # List clusters across all locations in the project
    request = container_v1beta1.ListClustersRequest(
        parent=f"projects/{project_id}/locations/-"
    )
    response = client.list_clusters(request=request)

    for cluster in response.clusters:
        for node_pool in cluster.node_pools:
            # Set the node pool size to 1 (adjust to your desired count)
            resize_request = container_v1beta1.SetNodePoolSizeRequest(
                name=(f"projects/{project_id}/locations/{cluster.location}/"
                      f"clusters/{cluster.name}/nodePools/{node_pool.name}"),
                node_count=1
            )
            operation = client.set_node_pool_size(request=resize_request)
            print(operation)  # Handle the operation as needed


# Start all Compute Engine instances in the project
def start_all_instances(project_id):
    # Initialize the Compute Engine client
    compute_client = compute_v1.InstancesClient()

    # List instances across all zones in the project
    request = compute_v1.AggregatedListInstancesRequest(
        project=project_id, max_results=50
    )
    agg_list = compute_client.aggregated_list(request=request)

    for zone, response in agg_list:
        for instance in response.instances:
            zone_name = zone.split('/')[-1]
            # Request a start for each instance (returns a long-running operation)
            compute_client.start(
                project=project_id,
                zone=zone_name,
                instance=instance.name
            )
            print(f"Start requested for VM instance '{instance.name}' in zone '{zone_name}'.")


@functions_framework.http  # HTTP-triggered entry point
def main(request):
    project_id = 'PROJECT_ID'  # Replace with your actual project ID

    # Start VM instances and databases, then scale GKE node pools back up
    start_all_instances(project_id)
    start_all_db_instances(project_id)
    resize_nodes_in_all_clusters(project_id)

    return 'OK'  # HTTP functions must return a valid response


if __name__ == "__main__":
    main(None)  # Local smoke test; the request object is unused

requirements.txt file containing the necessary packages:

functions-framework==3.*
google-api-python-client
google-auth
google-auth-oauthlib
google-cloud-container
google-cloud-compute
oauth2client

3. Build two Cloud Functions: one for starting GCP services and another for stopping them

Deploy these Cloud Functions using the source code provided above, and attach the previously configured service account so the functions are properly authorized. Run the gcloud commands below from the directory containing the respective Python code; note that the Python runtime expects the entry point to live in a file named main.py, so save each script as main.py alongside its requirements.txt before deploying. Both functions, responsible for the start and stop actions of the infrastructure, disallow unauthenticated access, run in the us-central1 region, and use the Python 3.11 runtime.

Cloud Function to Stop GCP Services:

gcloud functions deploy stop-infra-function \
--trigger-http \
--region us-central1 \
--runtime python311 \
--service-account sample-sa@${PROJECT_ID}.iam.gserviceaccount.com \
--no-allow-unauthenticated --entry-point=main

Cloud Function to Start GCP Services:

gcloud functions deploy start-infra-function \
--trigger-http \
--region us-central1 \
--runtime python311 \
--service-account sample-sa@${PROJECT_ID}.iam.gserviceaccount.com \
--no-allow-unauthenticated --entry-point=main
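
The scheduler configuration below references the function URL ($FUNCTION_URL). Assuming a 1st-gen function, you can capture it with the following command (for 2nd-gen functions the describe output field differs):

FUNCTION_URL=$(gcloud functions describe start-infra-function \
--region us-central1 --format='value(httpsTrigger.url)')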

4. Create two Cloud Scheduler jobs to trigger these Cloud Functions

Create scheduled jobs that invoke the Cloud Functions deployed above, authenticating with the service account you created earlier.

Schedule the start-infra-job to trigger the Cloud Function at 9 AM IST:

  1. In the Cloud console, go to the Cloud Scheduler page.
  2. Click Schedule a job.
  3. In the Define the schedule section, enter the following:
  • Name: start-infra-job
  • Region: us-central1
  • Frequency: 0 9 * * * (every morning at 9 AM IST)
  • Timezone: Asia/Calcutta
  4. Click Continue.
  5. In the Configure the execution section, enter the following:
  • Target type: HTTP
  • URL: the URL of the Cloud Function deployed earlier ($FUNCTION_URL)
  • HTTP Method: POST
  • Body: {"name": "Scheduler"}
  • Auth Header: Add OIDC token
  • Service account: sample-sa
  6. Click Continue, and then click Create.

Alternatively, you can run the following equivalent gcloud command:

gcloud scheduler jobs create http start-infra-job \
--location us-central1 \
--schedule "0 9 * * *" \
--time-zone "Asia/Calcutta" \
--uri "https://${REGION}-${PROJECT_ID}.cloudfunctions.net/start-infra-function" \
--http-method POST \
--message-body '{"name": "Scheduler"}' \
--oidc-service-account-email sample-sa@${PROJECT_ID}.iam.gserviceaccount.com --project=${PROJECT_ID}

Similarly, schedule the stop-infra-job to trigger the Cloud Function at 7 PM IST:

gcloud scheduler jobs create http stop-infra-job \
--location us-central1 \
--schedule "0 19 * * *" \
--time-zone "Asia/Calcutta" \
--uri "https://${REGION}-${PROJECT_ID}.cloudfunctions.net/stop-infra-function" \
--http-method POST \
--message-body '{"name": "Scheduler"}' \
--oidc-service-account-email sample-sa@${PROJECT_ID}.iam.gserviceaccount.com --project=${PROJECT_ID}
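
Rather than waiting for the next scheduled run, you can trigger either job on demand to verify the end-to-end setup:

gcloud scheduler jobs run start-infra-job --location us-central1
gcloud scheduler jobs run stop-infra-job --location us-central1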

After deploying both Cloud Functions and Cloud Scheduler jobs, you'll see informative logs in the ‘start-infra-function’ Cloud Function indicating that instances were successfully started at 9 AM IST. Likewise, following the shutdown of GCP services in the evening, the log output of the ‘stop-infra-function’ Cloud Function shows the stop process and related activity.
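
You can also read these logs from the command line:

gcloud functions logs read start-infra-function --region us-central1 --limit 50
gcloud functions logs read stop-infra-function --region us-central1 --limit 50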

Impressive work! You've implemented automated shutdown and startup for provisioned infrastructure, reducing developer overhead, optimizing cloud resource usage, and cutting costs. Stay tuned for more content like this on unlocking greater efficiency and cost-effectiveness in your cloud environment.

Questions?
If you have any questions, I’ll be happy to read them in the comments. Follow me on medium or LinkedIn.
