Scheduling BigQuery Procedures using Cloud Scheduler/BQ Scheduled Queries

Abhik Saha
6 min read · May 7, 2023


1. Using Google Cloud Scheduler


Google Cloud Scheduler is a fully managed service provided by Google Cloud Platform (GCP) that allows users to schedule and automate recurring tasks or jobs across different GCP services or external endpoints. With Cloud Scheduler, users can easily set up automated workflows to execute tasks at specific times or intervals, without having to write any code or manage any infrastructure.

Cloud Scheduler supports scheduling of several types of jobs such as HTTP requests, Pub/Sub messages, App Engine applications, and Cloud Functions. Users can specify the schedule for each job, such as a specific time, a recurring interval, or a cron expression. Cloud Scheduler also provides features such as retries, time zones, and notifications to help ensure the jobs are executed reliably and according to schedule.
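For reference, Cloud Scheduler uses the standard unix-cron format (minute, hour, day of month, month, day of week). Two examples, which correspond to the schedules used later in this post:

*/10 * * * *    runs every 10 minutes
0 10 * * 0      runs every Sunday at 10:00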

One of the key benefits of Cloud Scheduler is its tight integration with other GCP services such as BigQuery, Cloud Functions, and App Engine. This enables users to easily set up complex workflows that involve multiple services, without having to manage the underlying infrastructure. Cloud Scheduler also provides a REST API and client libraries in different programming languages, making it easy to programmatically create and manage scheduled jobs.
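For example, with the Python client library (google-cloud-scheduler), existing jobs can be listed in a few lines. This is a minimal sketch; the project and region below are placeholders.

from google.cloud import scheduler_v1

# List the scheduled jobs in one project/region (placeholders below).
client = scheduler_v1.CloudSchedulerClient()
parent = client.common_location_path("my-project", "us-central1")

for job in client.list_jobs(parent=parent):
    print(job.name, job.schedule, job.state)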

Scheduling Stored Procedures with Cloud Scheduler

1. To schedule using Google Cloud Scheduler, we first need the Cloud Functions and Cloud Scheduler APIs enabled. You can do that from APIs & Services in the console. In my project they are already enabled.

2. a) After that, we need to create an HTTP-triggered Cloud Function using the configuration below.

b) Note down the URL with which the function will be triggered.

c) Set the allocated memory to 256 MB, the timeout to 60 seconds, and both the minimum and maximum number of instances to 1, since we do not want auto-scaling for this process.

d) For the service account, create one from IAM & Admin with the permissions below. The Cloud Function should be able to edit BigQuery data, create query (execution) jobs in BigQuery, and invoke a Cloud Function.
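In IAM terms, the predefined roles matching these permissions would typically be the following (your exact setup may differ):

roles/bigquery.dataEditor      (edit BigQuery data)
roles/bigquery.jobUser         (create query/execution jobs)
roles/cloudfunctions.invoker   (invoke the Cloud Function)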

e) Deploy the Cloud Function with the source code below. The code is quite simple: it calls a stored procedure named sp_dynamic_inserts using the BigQuery client library, waits for the query job to finish, and prints the result.

Please note that since it is an HTTP-triggered Cloud Function, it must return an HTTP response to send the status back. Here we return “OK”.

If you want to know more about the procedure sp_dynamic_inserts, click here.

from google.cloud import bigquery
import functions_framework


# HTTP entry point; the Functions Framework routes incoming requests here.
@functions_framework.http
def bq_sp_scheduler(request):
    client = bigquery.Client()

    # Call the stored procedure as a regular query job.
    query_job = client.query(
        """
        CALL `plated-field-383807.prod_dataset.sp_dynamic_inserts`();
        """
    )

    try:
        # result() blocks until the job finishes and raises on failure.
        query_job.result()
        print("Procedure sp_dynamic_inserts executed")
    except Exception as err:
        print(f"Procedure sp_dynamic_inserts failed: {err}")

    # An HTTP-triggered function must return a response; "OK" maps to HTTP 200.
    return "OK"

f) The “requirements.txt” section must include the following dependencies; functions-framework is required because the code imports it.

# Function dependencies, for example:
# package>=version
functions-framework
google-cloud-bigquery

3. Now it is time to create a scheduled job using Cloud Scheduler.

a) Specify the name, region, and description of the job. We will set the frequency to run the job every 10 minutes, using the unix-cron expression */10 * * * *.

b) Select the target type as HTTP. Paste the trigger URL of the Cloud Function you noted earlier. Set the HTTP method to GET. Also set the Auth header to “OIDC token” and the service account to the one you created earlier.

c) We make no modifications to the optional settings. Our job configuration is now complete.
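For completeness, the same job can be created programmatically with the Cloud Scheduler Python client. This is a minimal sketch of the configuration above; the region, function URL, and service account email are placeholders to replace with your own values.

from google.cloud import scheduler_v1

client = scheduler_v1.CloudSchedulerClient()
parent = client.common_location_path("plated-field-383807", "us-central1")

job = scheduler_v1.Job(
    name=f"{parent}/jobs/bq-sp-scheduler-job",
    schedule="*/10 * * * *",  # every 10 minutes, as configured above
    time_zone="UTC",
    http_target=scheduler_v1.HttpTarget(
        uri="https://REGION-plated-field-383807.cloudfunctions.net/bq_sp_scheduler",
        http_method=scheduler_v1.HttpMethod.GET,
        # The OIDC token is what authenticates the call to the Cloud Function.
        oidc_token=scheduler_v1.OidcToken(
            service_account_email="YOUR-SA@plated-field-383807.iam.gserviceaccount.com"
        ),
    ),
)

client.create_job(parent=parent, job=job)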

Testing the Scheduled Job

We use the Force Run option to trigger the job immediately. We can see the status of the last execution as Success.

We can verify the same from the Cloud Function logs.

We can also verify the same from the BigQuery UI. It shows the last modified date as 7th May, which is today.

We can also see in the project history that the SP was triggered via our service account.

The testing is now complete.

2. Using BigQuery Scheduler

To schedule using the BigQuery scheduler, we need to enable the BigQuery Data Transfer API first. You can find it in the left-hand pane of the BigQuery service.

We will schedule the SP sp_table_snapshots to run every Sunday at 10:00 UTC. To do that, we go to the BigQuery UI and click on Schedule after writing the code to be scheduled. For our case, it is the CALL statement below.

CALL `plated-field-383807.snapshot_dataset.sp_table_snapshots`();

To know more about the Stored Procedure sp_table_snapshots, click here.

We will get the following screen. Copy the configuration shown and click on Save.
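The same scheduled query can also be created programmatically with the BigQuery Data Transfer Python client (google-cloud-bigquery-datatransfer). A minimal sketch; the display name is arbitrary and the default US location is assumed:

from google.cloud import bigquery_datatransfer

client = bigquery_datatransfer.DataTransferServiceClient()
parent = client.common_project_path("plated-field-383807")

transfer_config = bigquery_datatransfer.TransferConfig(
    display_name="weekly_table_snapshots",  # arbitrary name
    data_source_id="scheduled_query",       # marks this as a scheduled query
    params={
        "query": "CALL `plated-field-383807.snapshot_dataset.sp_table_snapshots`();"
    },
    schedule="every sunday 10:00",          # Data Transfer schedule syntax, in UTC
)

transfer_config = client.create_transfer_config(
    parent=parent, transfer_config=transfer_config
)
print(f"Created scheduled query: {transfer_config.name}")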

We can see the status of the Scheduled jobs in the Scheduled queries pane.

After the run, we can see the status of the jobs as Success.

We can verify the same from the BigQuery UI.

We can see that all the snapshots have been created successfully, each with a snapshot expiration of 2 days.
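For context, table snapshots with an expiration like this are created with BigQuery's CREATE SNAPSHOT TABLE DDL. A hypothetical example (the table names here are made up; the actual statements live inside sp_table_snapshots):

-- Create a snapshot of a table that expires after 2 days.
-- The source and snapshot table names below are hypothetical.
CREATE SNAPSHOT TABLE `plated-field-383807.snapshot_dataset.orders_snapshot`
CLONE `plated-field-383807.prod_dataset.orders`
OPTIONS (
  expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 2 DAY)
);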

The testing is now complete.

Conclusion

Depending on your specific use case, Google Cloud Scheduler and BigQuery Scheduler offer different advantages and serve different purposes. Below are some potential benefits of each:

Advantages of Google Cloud Scheduler:

  • Enables scheduling of a diverse range of jobs, such as HTTP requests, Pub/Sub messages, and App Engine applications in addition to BigQuery jobs.
  • Provides more granular control over scheduling options, including time zone specification and retry and timeout configurations.
  • Integrates seamlessly with numerous other Google Cloud services, simplifying the orchestration of workflows across multiple services.
  • Allows for the creation of custom job types through Cloud Functions, which can be beneficial for more complex use cases.

Advantages of BigQuery Scheduler:

  • Designed specifically for scheduling BigQuery jobs, streamlining the setup process for this particular use case.
  • Automatically handles credentials and authorization for BigQuery jobs, making it easier to schedule jobs securely.
  • Offers comprehensive logging and monitoring of job execution, simplifying the identification and resolution of any issues that may arise.

Ultimately, determining the best option between these two schedulers depends on the specific requirements of your use case. If you need to schedule a diverse range of jobs or orchestrate workflows across multiple services, Google Cloud Scheduler may be a better fit. However, if you primarily need to schedule BigQuery jobs and want a more straightforward setup, BigQuery Scheduler may be the optimal choice.

Follow me on LinkedIn and Medium to get more content like this!
