How to Create Remote Functions in BigQuery
Use Remote Functions to process and augment BigQuery data in unlimited ways
Introduction
Google BigQuery is a serverless data warehouse optimized for analytical workloads. BigQuery uses Google Standard SQL commands and supports a wide range of built-in functions. However, there are situations where the built-in functions are not sufficient. BigQuery Remote Functions allow BigQuery users to process their data with custom function calls to Cloud Functions.
For example, BigQuery Remote Functions could be used for things like:
- Custom hashing and encryption/decryption workloads
- Machine Learning Inference
- Reverse Geocoding
- Data Augmentation
This how-to guide will illustrate a step-by-step example of creating and using a BigQuery Remote Function. This function will use the SHA-384 hash function, which is not natively available as a function in BigQuery.
Names and References
Names I’m using in this how-to are:
Project ID: bq-remote-functions
Cloud Function Name: remote_hash
Cloud Function Entrypoint: remote_hash
BigQuery Dataset: remote_functions_dataset
BigQuery Table: remote_functions_table_native
BigQuery Remote Function Connection: remote-function-connection
Step 1: Enable the needed APIs
Open Cloud Shell and set your project with gcloud config set project [PROJECT ID]
if needed.
Enable the necessary APIs:
gcloud services enable cloudfunctions.googleapis.com --async
gcloud services enable cloudbuild.googleapis.com --async
gcloud services enable bigqueryconnection.googleapis.com --async
Step 2: Write the Function Code
Create folder called function/
.
In that folder, using your editor of choice, save the following Cloud Function code as main.py
:
Also in that folder, add a requirements file with the following command: echo functions-framework==3.2.0 >> requirements.txt
Step 3: Deploy Code to Cloud Function
Deploy the Cloud Function with this command:
gcloud functions deploy remote_hash \
--runtime python310 \
--trigger-http \
--ingress-settings all \
--source ./function/
If asked “Allow unauthenticated invocations of new function [remote_hash]? (y/N)?”, select “N”
Note that--ingress-settings
needs to be all
because otherwise the remote function call from BigQuery will be blocked.
Step 4: Create Sample BigQuery Dataset and Table
Create a dataset and table in BigQuery:
bq mk remote_functions_dataset
bq mk --table remote_functions_dataset.remote_functions_table_native id:INTEGER,name:STRING
Populate the table:
The table should look like this:
Step 5: Create the BigQuery Remote Connection
Create the BigQuery remote connection:
bq mk --connection \
--display_name=’Connection to remote Cloud Function’ \
--connection_type=CLOUD_RESOURCE \
--project_id=bq-remote-functions \
--location=US \
remote-function-connection
Show the details of the newly created connection using this command:
bq show --location=US --connection remote-function-connection
Take note of the serviceAccountID
under properties
. It should look something like this:
{“serviceAccountId”: “bqcx-012345678901-drct@gcp-sa-bigquery-condel.iam.gserviceaccount.com”}
Step 6: Add Role to Service Account
Add the invoker role to the SA displayed displayed above under serviceAccountID
:
gcloud projects add-iam-policy-binding bq-remote-functions \
--member="serviceAccount:bqcx-012345678901-drct@gcp-sa-bigquery-condel.iam.gserviceaccount.com" \
--role='roles/cloudfunctions.invoker'
Step 7: Create the BigQuery Remote Function
Find and take note of the Cloud Function endpoint URL using: gcloud functions describe remote_hash
Run this command in BQ to create the remote function:
Step 8: Test the Function
Test the function with this query:
The result should now look like this:
Conclusion
This has been a brief how-to on Remote Functions with BigQuery. Remote Functions are a powerful way to enable functionality beyond what is natively available in BigQuery.
References and Further Reading
1. https://cloud.google.com/bigquery/docs/reference/standard-sql/remote-functions
2. https://cloud.google.com/blog/products/data-analytics/extending-bigquery-functions
Addendum: Request/Response Schema and Structure
For this table:
+------+-----+------+
| Row | id | name |
+------+-----+------+
| 1 | 10 | John |
| 2 | 333 | Luke |
| 3 | 400 | Mike |
+------+-----+------+
During a Remote Function call, the request object sent to the Cloud Function will look like this, with the calls being a list of lists:
{'requestId': '3b7456fd-22ed-4241-b444-00f98756cbe3',
'caller': '//bigquery.googleapis.com/projects/bq-remote-functions/jobs/bquxjob_5ca2b63f_182cc1a93f8',
'sessionUser': 'user@XXXXXXXXXXXXXX.com',
'calls': [["John"], ["Luke"], ["Mike"]]}
The response object needs to be structured like this, with the returned data being a list:
{"replies": ["9db044f2b3b06c6f83e78b491cd6d65b38fa10d9fbb47abca9954ec9524b69fd47dfc327f6c73ef4ea8f53007e66d024","bec067cb433d75794150b9e52aef3ec2b59be0e8229e8d6c5aebdec05a1a628813790a206f4614a2ed281cd552aaab6c","1b5621919d40b106be695c51d268aa4f168f1237edea9e8dba1bd63111f23236130661c173c925942ee2e5d5ff23f26d"]}
Note that a response object structured like this (a list of lists) will result in a Failed to decode JSON to SQL type.
error on the BQ side; You can’t have a list of lists as the response from the function:
{"replies": [["9db044f2b3b06c6f83e78b491cd6d65b38fa10d9fbb47abca9954ec9524b69fd47dfc327f6c73ef4ea8f53007e66d024"],["bec067cb433d75794150b9e52aef3ec2b59be0e8229e8d6c5aebdec05a1a628813790a206f4614a2ed281cd552aaab6c"],["1b5621919d40b106be695c51d268aa4f168f1237edea9e8dba1bd63111f23236130661c173c925942ee2e5d5ff23f26d"]]}