New Updates on Pub/Sub to BigQuery Dataflow Templates from GCP
We are pleased to announce several new features in the Cloud Pub/Sub to BigQuery template, including support for subscriptions (!!!) as well as some error-handling improvements. We detail these updates below.
In the past, the Pub/Sub to BigQuery Dataflow template only supported reading messages from Pub/Sub topics, using the inputTopic parameter. We have created a second Dataflow template that reads messages from Pub/Sub subscriptions, using the inputSubscription parameter. We delineate these two templates on the Dataflow console under the CREATE JOB FROM TEMPLATE button as “Cloud Pub/Sub Subscription to BigQuery” and “Cloud Pub/Sub Topic to BigQuery”. The code for generating both of these templates can also be found on GitHub. Note one caveat of using subscriptions instead of topics: each message on a subscription is delivered to only one consumer, whereas a topic can fan out to multiple subscriptions. The subscription template therefore cannot support multiple concurrent pipelines reading from the same subscription.
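To make the two read modes concrete, here is a minimal Apache Beam sketch in Python (the templates themselves are written in Java; the project, subscription, and topic names below are placeholders):

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholder resource names; substitute your own project, subscription, and topic.
SUBSCRIPTION = "projects/my-project/subscriptions/my-subscription"
TOPIC = "projects/my-project/topics/my-topic"

options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    # Subscription mode: the pipeline consumes an existing subscription.
    # Only one pipeline should read a given subscription at a time.
    messages = (
        p
        | "ReadFromSubscription" >> beam.io.ReadFromPubSub(subscription=SUBSCRIPTION)
        | "DecodeBytes" >> beam.Map(lambda b: b.decode("utf-8")))

    # Topic mode: Dataflow creates its own subscription on the topic, so
    # multiple independent pipelines can each read the same topic.
    # (Commented out here because one read suffices for the sketch.)
    # from_topic = p | "ReadFromTopic" >> beam.io.ReadFromPubSub(topic=TOPIC)
```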
Once a message is read, whether from a subscription or from a topic, the remaining details are mostly the same:
- The user specifies an existing BigQuery table where the input messages land, using the outputTableSpec parameter. As with all pipelines, users also need to specify a GCS bucket location for writing/staging temp files.
- Messages that fail processing are written to a separate dead-letter BigQuery table, specified using the outputDeadletterTable parameter (see GitHub for the dead-letter schema); a sketch of this pattern follows below.
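The template's actual implementation lives in Java on GitHub; purely as an illustration of the dead-letter pattern, here is a minimal Python/Beam sketch that routes messages that fail JSON parsing to a separate table. The table names and the simplified two-column error schema are assumptions for this example, not the template's exact schema:

```python
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

OUTPUT_TABLE = "my-project:my_dataset.events"             # outputTableSpec (placeholder)
DEADLETTER_TABLE = "my-project:my_dataset.events_errors"  # outputDeadletterTable (placeholder)

def parse_message(message):
    """Try to parse a Pub/Sub payload as JSON; tag failures for the dead-letter table."""
    try:
        yield json.loads(message.decode("utf-8"))
    except Exception as e:
        # Simplified error record; the real template defines its own schema (see GitHub).
        yield beam.pvalue.TaggedOutput(
            "deadletter",
            {"payload": message.decode("utf-8", errors="replace"), "error": str(e)})

with beam.Pipeline(options=PipelineOptions(streaming=True)) as p:
    parsed = (
        p
        | "Read" >> beam.io.ReadFromPubSub(
            subscription="projects/my-project/subscriptions/my-subscription")
        | "Parse" >> beam.FlatMap(parse_message).with_outputs(
            "deadletter", main="rows"))

    # Successfully parsed rows go to the existing output table.
    parsed.rows | "WriteRows" >> beam.io.WriteToBigQuery(
        OUTPUT_TABLE,
        create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)

    # Failures go to the dead-letter table instead of failing the pipeline.
    parsed.deadletter | "WriteDeadletter" >> beam.io.WriteToBigQuery(
        DEADLETTER_TABLE,
        schema="payload:STRING,error:STRING",
        create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
        write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
```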
Future Dataflow Pipelines + Features
Our Dataflow templates can be accessed in the Dataflow UI under the CREATE JOB FROM TEMPLATE button, or found open-sourced on GitHub for further customization as users need (see this link for the Cloud Pub/Sub to BigQuery code). Keep an eye out for continued work on Dataflow Templates from the GCP team.
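For users who prefer launching a template programmatically rather than through the UI, a sketch along the following lines, using the Dataflow REST API via the google-api-python-client library, should work. The template GCS path and all resource names here are assumptions to verify against the current documentation:

```python
from googleapiclient.discovery import build

PROJECT = "my-project"  # placeholder project ID

dataflow = build("dataflow", "v1b3")
request = dataflow.projects().templates().launch(
    projectId=PROJECT,
    # Assumed path of the Google-provided subscription template; check the docs.
    gcsPath="gs://dataflow-templates/latest/PubSub_Subscription_to_BigQuery",
    body={
        "jobName": "pubsub-subscription-to-bq",
        "parameters": {
            "inputSubscription": "projects/my-project/subscriptions/my-subscription",
            "outputTableSpec": "my-project:my_dataset.events",
        },
        # GCS bucket location for writing/staging temp files, as noted above.
        "environment": {"tempLocation": "gs://my-bucket/temp"},
    },
)
response = request.execute()
print(response)
```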