How to Create Remote Functions in BigQuery

Use Remote Functions to process and augment BigQuery data in unlimited ways

John Grinalds
Google Cloud - Community
4 min read · Sep 15, 2022

Introduction

Google BigQuery is a serverless data warehouse optimized for analytical workloads. BigQuery uses Google Standard SQL commands and supports a wide range of built-in functions. However, there are situations where the built-in functions are not sufficient. BigQuery Remote Functions allow BigQuery users to process their data with custom function calls to Cloud Functions.

For example, BigQuery Remote Functions could be used for things like:

  1. Custom hashing and encryption/decryption workloads
  2. Machine Learning Inference
  3. Reverse Geocoding
  4. Data Augmentation

This how-to guide will illustrate a step-by-step example of creating and using a BigQuery Remote Function. This function will use the SHA-384 hash function, which is not natively available as a function in BigQuery.

Names and References

Names I’m using in this how-to are:

Project ID: bq-remote-functions
Cloud Function Name: remote_hash
Cloud Function Entrypoint: remote_hash
BigQuery Dataset: remote_functions_dataset
BigQuery Table: remote_functions_table_native
BigQuery Remote Function Connection: remote-function-connection

Step 1: Enable the needed APIs

Open Cloud Shell and, if needed, set your project with gcloud config set project [PROJECT ID].

Enable the necessary APIs:

gcloud services enable cloudfunctions.googleapis.com --async
gcloud services enable cloudbuild.googleapis.com --async
gcloud services enable bigqueryconnection.googleapis.com --async

Step 2: Write the Function Code

Create a folder called function/.

In that folder, using your editor of choice, save the following Cloud Function code as main.py:
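
A minimal sketch of main.py, assuming the request/response shape described in the addendum at the end of this post: one STRING argument per row in "calls", and a flat list of SHA-384 hex digests back in "replies".

import hashlib
import json

import functions_framework


@functions_framework.http
def remote_hash(request):
    """Compute the SHA-384 hex digest for each value BigQuery sends."""
    try:
        request_json = request.get_json()
        # 'calls' is a list of lists: one inner list of arguments per input row.
        calls = request_json["calls"]
        replies = [
            hashlib.sha384(call[0].encode("utf-8")).hexdigest() for call in calls
        ]
        # 'replies' must be a flat list, one result per input row.
        return json.dumps({"replies": replies})
    except Exception as e:
        # On failure, return a non-200 status; BigQuery surfaces 'errorMessage' in the job error.
        return json.dumps({"errorMessage": str(e)}), 400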

Also in that folder, add a requirements file with the following command: echo functions-framework==3.2.0 >> requirements.txt

Step 3: Deploy Code to Cloud Function

Deploy the Cloud Function with this command:

gcloud functions deploy remote_hash \
--runtime python310 \
--trigger-http \
--ingress-settings all \
--source ./function/

If asked “Allow unauthenticated invocations of new function [remote_hash]? (y/N)?”, select “N”.

Note that --ingress-settings needs to be all; otherwise, the remote function call from BigQuery will be blocked.

Step 4: Create Sample BigQuery Dataset and Table

Create a dataset and table in BigQuery:

bq mk remote_functions_dataset
bq mk --table remote_functions_dataset.remote_functions_table_native id:INTEGER,name:STRING

Populate the table:
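
One way to do this is a simple INSERT run in the BigQuery console, using the three sample rows shown in the addendum below:

INSERT INTO remote_functions_dataset.remote_functions_table_native (id, name)
VALUES
  (10, 'John'),
  (333, 'Luke'),
  (400, 'Mike');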

The table should look like this:

+-----+-----+------+
| Row | id  | name |
+-----+-----+------+
| 1   | 10  | John |
| 2   | 333 | Luke |
| 3   | 400 | Mike |
+-----+-----+------+

Step 5: Create the BigQuery Remote Connection

Create the BigQuery remote connection:

bq mk --connection \
--display_name='Connection to remote Cloud Function' \
--connection_type=CLOUD_RESOURCE \
--project_id=bq-remote-functions \
--location=US \
remote-function-connection

Show the details of the newly created connection using this command:

bq show --location=US --connection remote-function-connection

Take note of the serviceAccountId under properties. It should look something like this:

{"serviceAccountId": "bqcx-012345678901-drct@gcp-sa-bigquery-condel.iam.gserviceaccount.com"}

Step 6: Add Role to Service Account

Add the invoker role to the service account displayed above as serviceAccountId:

gcloud projects add-iam-policy-binding bq-remote-functions \
--member="serviceAccount:bqcx-012345678901-drct@gcp-sa-bigquery-condel.iam.gserviceaccount.com" \
--role='roles/cloudfunctions.invoker'

Step 7: Create the BigQuery Remote Function

Find and take note of the Cloud Function endpoint URL using: gcloud functions describe remote_hash

Run this command in BQ to create the remote function:
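A sketch of the DDL, assuming the function is created in remote_functions_dataset; the endpoint must be the URL returned by gcloud functions describe, so the us-central1 URL below is only a placeholder:

CREATE FUNCTION remote_functions_dataset.remote_hash(name STRING) RETURNS STRING
REMOTE WITH CONNECTION `bq-remote-functions.us.remote-function-connection`
OPTIONS (
  -- Replace with the endpoint URL from `gcloud functions describe remote_hash`
  endpoint = 'https://us-central1-bq-remote-functions.cloudfunctions.net/remote_hash'
);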

Step 8: Test the Function

Test the function with this query:
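
For example, a query along these lines, assuming the function was created as remote_functions_dataset.remote_hash in the previous step:

SELECT
  id,
  name,
  remote_functions_dataset.remote_hash(name) AS hash
FROM remote_functions_dataset.remote_functions_table_native;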

The result should now look like this:

remote_functions_table_native with hash

Conclusion

This has been a brief how-to on Remote Functions with BigQuery. Remote Functions are a powerful way to enable functionality beyond what is natively available in BigQuery.

References and Further Reading

1. https://cloud.google.com/bigquery/docs/reference/standard-sql/remote-functions
2. https://cloud.google.com/blog/products/data-analytics/extending-bigquery-functions

Addendum: Request/Response Schema and Structure

For this table:

+-----+-----+------+
| Row | id  | name |
+-----+-----+------+
| 1   | 10  | John |
| 2   | 333 | Luke |
| 3   | 400 | Mike |
+-----+-----+------+

During a Remote Function call, the request object sent to the Cloud Function will look like this, with the calls being a list of lists:

{'requestId': '3b7456fd-22ed-4241-b444-00f98756cbe3',
'caller': '//bigquery.googleapis.com/projects/bq-remote-functions/jobs/bquxjob_5ca2b63f_182cc1a93f8',
'sessionUser': 'user@XXXXXXXXXXXXXX.com',
'calls': [["John"], ["Luke"], ["Mike"]]}

The response object needs to be structured like this, with the returned data being a list:

{"replies": ["9db044f2b3b06c6f83e78b491cd6d65b38fa10d9fbb47abca9954ec9524b69fd47dfc327f6c73ef4ea8f53007e66d024","bec067cb433d75794150b9e52aef3ec2b59be0e8229e8d6c5aebdec05a1a628813790a206f4614a2ed281cd552aaab6c","1b5621919d40b106be695c51d268aa4f168f1237edea9e8dba1bd63111f23236130661c173c925942ee2e5d5ff23f26d"]}

Note that a response object structured like this (a list of lists) will result in a “Failed to decode JSON to SQL type.” error on the BigQuery side; you can’t return a list of lists from the function:

{"replies": [["9db044f2b3b06c6f83e78b491cd6d65b38fa10d9fbb47abca9954ec9524b69fd47dfc327f6c73ef4ea8f53007e66d024"],["bec067cb433d75794150b9e52aef3ec2b59be0e8229e8d6c5aebdec05a1a628813790a206f4614a2ed281cd552aaab6c"],["1b5621919d40b106be695c51d268aa4f168f1237edea9e8dba1bd63111f23236130661c173c925942ee2e5d5ff23f26d"]]}
