Cloud Spanner Change Publisher

Knut Olav Løite
Google Cloud - Community
Jul 8, 2020

Google Cloud Spanner is a fully managed, scalable, relational database service for regional and global application data. It is the first scalable, enterprise-grade, globally-distributed, and strongly consistent database service built for the cloud specifically to combine the benefits of relational database structure with non-relational horizontal scale.

Google Cloud Spanner

Multiple applications and services can interact with the same Cloud Spanner database simultaneously, and a common pattern is for one service to trigger other services based on certain events, such as inserting or updating a record in a database. This article describes the open source spanner-change-publisher service, which publishes data changes in a Cloud Spanner database to Pubsub and can run as a standalone application.

The open source project also contains the spanner-change-watcher library, which can be integrated into an existing Java application to deliver the data change events directly to that application instead of publishing them to Pubsub. An introduction to that library can be found here.

Prerequisites and Limitations

  • Spanner Change Publisher uses commit timestamps to determine when a row has been updated. Only tables that contain a column with the commit timestamp of the last update can be monitored using this library. The name of this column may be chosen freely.
  • The library can only detect inserts and updates, because it relies on the commit timestamp value of a row to detect a change. To detect deletes, implement logical deletes by setting a deleted flag on the row instead of actually deleting it. A logical delete is an update, so it does not trigger ON DELETE CASCADE. If you logically delete a row in a parent table that has one or more child tables marked with ON DELETE CASCADE, you must also set the flag on the child rows for those changes to be picked up by the watcher. A separate background task can then use the flag to find the rows that should be physically deleted at a later moment.
  • The library monitors the database for changes by polling the tables that are being watched. The default polling interval is one second, and it can be configured. If the same row is changed multiple times within one polling interval, the library may only report the last change.
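Commit-timestamp polling can be illustrated with a small in-memory simulation. The sketch below is not the Spanner Change Watcher implementation. The Row class, its deleted flag and the single high-water mark are simplifying assumptions. It shows two things: two updates of the same row within one polling interval are reported as a single change, and a logical delete is still detected because it is an update with a new commit timestamp.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * In-memory sketch of change detection by commit-timestamp polling.
 * Illustration only: not the Spanner Change Watcher implementation.
 * Row and its "deleted" flag are hypothetical.
 */
public class CommitTimestampPollingSketch {

  /** A row version: key, value, logical-delete flag and commit timestamp of its last write. */
  static final class Row {
    final String key;
    final String value;
    final boolean deleted;
    final Instant lastModified;

    Row(String key, String value, boolean deleted, Instant lastModified) {
      this.key = key;
      this.value = value;
      this.deleted = deleted;
      this.lastModified = lastModified;
    }
  }

  // Simulated table: each write replaces the previous version of the row, like an UPDATE.
  private final Map<String, Row> table = new LinkedHashMap<>();
  // Highest commit timestamp reported so far (the "high-water mark").
  private Instant lastSeen = Instant.EPOCH;

  void write(String key, String value, boolean deleted, Instant commitTimestamp) {
    table.put(key, new Row(key, value, deleted, commitTimestamp));
  }

  /** One poll: every row committed after the previous poll, in commit timestamp order. */
  List<Row> poll() {
    List<Row> changes = new ArrayList<>();
    for (Row row : table.values()) {
      if (row.lastModified.isAfter(lastSeen)) {
        changes.add(row);
      }
    }
    changes.sort(Comparator.comparing((Row r) -> r.lastModified));
    if (!changes.isEmpty()) {
      lastSeen = changes.get(changes.size() - 1).lastModified;
    }
    return changes;
  }

  public static void main(String[] args) {
    CommitTimestampPollingSketch sketch = new CommitTimestampPollingSketch();
    Instant t0 = Instant.parse("2020-07-08T10:00:00Z");

    // Two updates of the same row between two polls: only the latest version is seen.
    sketch.write("1", "One", false, t0.plusMillis(100));
    sketch.write("1", "Uno", false, t0.plusMillis(400));
    for (Row row : sketch.poll()) {
      System.out.println(row.key + " -> " + row.value);
    }

    // Logical delete: set the flag instead of deleting, so the next poll still finds the row.
    sketch.write("1", "Uno", true, t0.plusMillis(1200));
    for (Row row : sketch.poll()) {
      System.out.println(row.key + " deleted=" + row.deleted);
    }
  }
}
```

A physical DELETE would remove the row from the table, leaving nothing with a newer commit timestamp for the poll to find. That is why the logical-delete flag is needed.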

Spanner Change Publisher

Spanner Change Publisher is a service that monitors one, some or all tables in a Cloud Spanner database for data changes and publishes the detected changes to Pubsub. It can be executed either as part of an existing application or as a standalone application. It has built-in support for publishing the changed data in both Avro and JSON formats. The data conversion is configurable, so you can also supply your own converter for any other format you might want to use.
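A pluggable converter can be sketched as a small interface that turns one changed row into a message payload. RowConverter and CsvRowConverter below are hypothetical names used only for illustration. The library's own converters are configured through a converter factory (for example SpannerToJsonFactory in the standalone configuration), and their actual interfaces are not shown here.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/**
 * Sketch of a pluggable row converter. RowConverter and CsvRowConverter are
 * hypothetical; they are NOT the converter interfaces of Spanner Change Publisher.
 */
public class ConverterSketch {

  /** Turns one changed row (column name -> value) into a Pubsub message payload. */
  interface RowConverter {
    byte[] convert(Map<String, Object> row);
  }

  /** Example custom format: one CSV line with the column values in column order. */
  static final class CsvRowConverter implements RowConverter {
    @Override
    public byte[] convert(Map<String, Object> row) {
      StringJoiner line = new StringJoiner(",");
      for (Object value : row.values()) {
        // NULL values become empty fields.
        line.add(escape(value == null ? "" : value.toString()));
      }
      return line.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Quote values that contain a separator, a quote or a line break (RFC 4180 style).
    private static String escape(String s) {
      if (s.contains(",") || s.contains("\"") || s.contains("\n") || s.contains("\r")) {
        return "\"" + s.replace("\"", "\"\"") + "\"";
      }
      return s;
    }
  }

  public static void main(String[] args) {
    Map<String, Object> row = new LinkedHashMap<>();
    row.put("SingerId", 1L);
    row.put("Name", "Doe, Jane");
    RowConverter converter = new CsvRowConverter();
    System.out.println(new String(converter.convert(row), StandardCharsets.UTF_8));
  }
}
```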

Spanner Change Publisher relies on Spanner Change Watcher for detecting changes in a database. It is recommended to have a basic knowledge of Spanner Change Watcher before working with Spanner Change Publisher. An introduction to Spanner Change Watcher can be found here.

Changes to one table are guaranteed to be published in order of commit timestamp, but Pubsub does not guarantee that the messages will be delivered to subscribers in the same order as they were published. See https://cloud.google.com/pubsub/docs/ordering for more information on Pubsub message ordering.
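If a subscriber only needs the latest state of each row, it can protect itself against out-of-order and redelivered messages by comparing commit timestamps. The following is a minimal sketch. It assumes the subscriber can derive a row key from the message data and read the commit timestamp from the message attributes. LatestChangeFilter is a hypothetical helper, not part of the library.

```java
import java.time.Instant;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Sketch of a subscriber-side guard against late or duplicate deliveries.
 * Hypothetical helper; assumes a row key and a commit timestamp are available per message.
 */
public class LatestChangeFilter {
  // Newest commit timestamp applied so far, per row key.
  private final ConcurrentHashMap<String, Instant> latest = new ConcurrentHashMap<>();

  /**
   * Returns true (and records the timestamp) if this change is newer than anything
   * already applied for the row. Older or equal timestamps return false.
   * ConcurrentHashMap.compute is atomic per key, so concurrent callbacks are safe.
   */
  public boolean shouldApply(String rowKey, Instant commitTimestamp) {
    boolean[] newer = {false};
    latest.compute(rowKey, (key, current) -> {
      if (current == null || commitTimestamp.isAfter(current)) {
        newer[0] = true;
        return commitTimestamp;
      }
      return current;
    });
    return newer[0];
  }

  public static void main(String[] args) {
    LatestChangeFilter filter = new LatestChangeFilter();
    Instant t1 = Instant.parse("2020-07-08T10:00:01Z");
    Instant t2 = Instant.parse("2020-07-08T10:00:02Z");
    System.out.println(filter.shouldApply("Singers/1", t2)); // true: first change seen
    System.out.println(filter.shouldApply("Singers/1", t1)); // false: arrived late
    System.out.println(filter.shouldApply("Singers/1", t2)); // false: redelivery
  }
}
```

This filter only fits applications that can skip intermediate versions of a row. An application that must process every change would need to buffer and reorder messages instead, or use the Pubsub ordering features described in the Pubsub ordering documentation.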

Sample Application

The project contains a small sample application that will:

  1. Create a database with two tables (or use an existing database if it already exists).
  2. Create a Spanner Change Publisher that will monitor all tables in the database for changes and publish these to a Pubsub topic.
  3. Create a Pubsub Subscriber that receives the changes from Pubsub and writes these to the console.
  4. Write some changes to the Cloud Spanner database. The changes are written to the console by the Subscriber.

Data Model

The application uses the following data model. It will be created automatically for you if you start with an empty database.

Creating a Spanner Change Publisher

You can configure, create and start a change publisher from your own application if you include spanner-change-publisher as a dependency in your application. The sample application creates a publisher with the following properties:

  • Watch all tables (with a commit timestamp column) in the Cloud Spanner database for changes.
  • Publish the changed data in JSON format.
  • Publish all changes to a single Pubsub topic. The attributes of each Pubsub message will also contain the name of the table where the change occurred. Spanner Change Publisher also supports publishing changes to a different topic for each table. See the additional samples in the project and the documentation for the method Builder.setTopicNameFormat(String) for more information on how to do that.

Create a Spanner Change Publisher for an entire Database

Once the publisher is running, all changes to any of the tables in the given database will be published to Pubsub and will be delivered to subscribers of the topic(s) where the changes are published.

Subscribing to Changes from Cloud Spanner

The changes from Cloud Spanner are published to the chosen Pubsub topic in the chosen format. You subscribe to those changes with a normal Pubsub subscriber. The message data contains the actual Cloud Spanner data, while the message attributes contain metadata for the change, such as the database name, table name and commit timestamp.

The following code snippet shows how to subscribe to changes in JSON format and write them to the console.

Example Subscribing to Spanner Change Publisher
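When all tables publish to a single topic, a subscriber can route each message using the table name in its attributes. The sketch below shows that routing in plain Java, independent of the Pubsub client library. The attribute key "Table" and the table name MyTable are assumptions for illustration, so check the attribute names the publisher actually sets. In a real subscriber, dispatch would be called from a google-cloud-pubsub MessageReceiver with message.getAttributesMap() and message.getData().toStringUtf8(), followed by consumer.ack().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

/**
 * Sketch: route changes from one shared topic to per-table handlers based on the
 * table name in the message attributes. The "Table" attribute key is an assumption.
 */
public class TableChangeDispatcher {
  static final String TABLE_ATTRIBUTE = "Table"; // hypothetical attribute key

  private final Map<String, Consumer<String>> handlers = new HashMap<>();

  /** Registers a handler that receives the message data (e.g. a JSON row) for one table. */
  TableChangeDispatcher on(String table, Consumer<String> handler) {
    handlers.put(table, handler);
    return this;
  }

  /** Passes the data to the handler for the message's table; returns false if there is none. */
  boolean dispatch(Map<String, String> attributes, String data) {
    Consumer<String> handler = handlers.get(attributes.get(TABLE_ATTRIBUTE));
    if (handler == null) {
      return false;
    }
    handler.accept(data);
    return true;
  }

  public static void main(String[] args) {
    TableChangeDispatcher dispatcher = new TableChangeDispatcher()
        .on("MyTable", json -> System.out.println("MyTable changed: " + json));
    dispatcher.dispatch(Map.of(TABLE_ATTRIBUTE, "MyTable"), "{\"ID\":1,\"NAME\":\"One\"}");
  }
}
```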

Running the Sample Application

To execute the sample application, follow these steps:

  1. Clone the project from GitHub.
  2. Navigate to the spanner-change-watcher/samples/spanner-change-publisher-samples folder.
  3. Create a .jar by executing mvn clean package.
  4. Execute the following command, where <instance-id>, <database-id>, <topic-id> and <subscription-id> must be replaced by actual values. The instance must already exist. The database will automatically be created if it does not already exist. The Pubsub topic and Pubsub subscription will also be created automatically if they do not exist. Note that the account that is being used must have permission to create topics and subscriptions for these to be created automatically. If the account does not have these permissions, you can create the topic and subscription manually.
java -cp target/spanner-change-publisher-samples.jar \
com.google.cloud.spanner.publisher.sample.SimpleChangePublisherSample \
<instance-id> <database-id> <topic-id> <subscription-id>

The sample application will start the Spanner Change Publisher and the Pubsub subscriber, and then write some data to Cloud Spanner. The changes received by the subscriber are written to the console.

Spanner Change Publisher Standalone Application

Spanner Change Publisher can be configured and executed as a standalone application that does not need to be integrated into an existing Java application. The database to monitor, the topics where the changes should be published and other configuration options can be set in a properties file.

A complete example configuration file can be found on GitHub. The most important configuration parameters are listed below.

# ----- SPANNER SETTINGS ----- #
scep.spanner.project=my-spanner-project
scep.spanner.instance=my-instance
scep.spanner.database=my-database
# Credentials to use for Spanner if these differ from the default.
scep.spanner.credentials=/path/to/spanner-credentials.json
# Turn watching all tables in the database for changes on/off.
scep.spanner.allTables=false
# A list of tables that should be excluded from monitoring.
# This may only be used in combination with allTables=true.
scep.spanner.excludedTables=TABLE1,TABLE2,TABLE3
# A list of tables that should be included in the monitoring.
# This may only be used in combination with allTables=false.
scep.spanner.includedTables=TABLE1,TABLE2,TABLE3
# The poll interval of the Spanner watcher.
scep.spanner.pollInterval=PT0.5S
# ----- PUBSUB SETTINGS ----- #
scep.pubsub.project=my-pubsub-project
# Credentials to use for Pubsub if these differ from the default.
scep.pubsub.credentials=/path/to/pubsub-credentials.json
# Converter to use to convert a Spanner row to a Pubsub message.
scep.pubsub.converterFactory=com.google.cloud.spanner.publisher.SpannerToJsonFactory
# Topic(s) where to publish the changes.
scep.pubsub.topicNameFormat=spanner-update-%database%-%table%

The configuration file to use is passed as the system property scep.properties during startup. You can also override or set additional configuration values by passing them as system properties.
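The interplay between the properties file, system-property overrides and the table-selection rules can be sketched as follows. This is a minimal illustration, not the standalone application's code. It assumes that a missing scep.spanner.allTables value means false. It also rejects the table-list combinations that the comments in the configuration listing rule out, which means a real file would set only one of the two lists. The poll interval is parsed with java.time.Duration, which accepts ISO-8601 values such as the PT0.5S in the listing.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.time.Duration;
import java.util.Properties;

/** Sketch of combining a scep.properties file with -D overrides; not the real application. */
public class ScepConfigSketch {

  /** Loads the file contents, then lets any scep.* system property override or add a value. */
  static Properties load(String fileContents, Properties systemProperties) {
    Properties config = new Properties();
    try {
      config.load(new StringReader(fileContents));
    } catch (IOException e) {
      // StringReader does not fail in practice; wrap to keep the signature simple.
      throw new UncheckedIOException(e);
    }
    for (String name : systemProperties.stringPropertyNames()) {
      if (name.startsWith("scep.")) {
        config.setProperty(name, systemProperties.getProperty(name));
      }
    }
    return config;
  }

  /** excludedTables requires allTables=true; includedTables requires allTables=false. */
  static void validate(Properties config) {
    // Assumption: a missing allTables value is treated as false.
    boolean allTables = Boolean.parseBoolean(config.getProperty("scep.spanner.allTables"));
    if (allTables && config.getProperty("scep.spanner.includedTables") != null) {
      throw new IllegalArgumentException("includedTables may only be used with allTables=false");
    }
    if (!allTables && config.getProperty("scep.spanner.excludedTables") != null) {
      throw new IllegalArgumentException("excludedTables may only be used with allTables=true");
    }
  }

  /** pollInterval is an ISO-8601 duration; the default is one second. */
  static Duration pollInterval(Properties config) {
    return Duration.parse(config.getProperty("scep.spanner.pollInterval", "PT1S"));
  }

  public static void main(String[] args) {
    String file = String.join("\n",
        "scep.spanner.allTables=false",
        "scep.spanner.includedTables=TABLE1,TABLE2",
        "scep.spanner.pollInterval=PT0.5S");
    // Running with -Dscep.spanner.pollInterval=PT2S would override the file value.
    Properties config = load(file, System.getProperties());
    validate(config);
    System.out.println("Poll interval: " + pollInterval(config).toMillis() + " ms");
  }
}
```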

To run the Spanner Change Publisher standalone application, follow these steps:

  1. Clone the project from GitHub.
  2. Navigate to the spanner-change-watcher/google-cloud-spanner-change-publisher folder.
  3. Create a .jar by executing mvn clean package.
  4. Execute java -Dscep.properties=/path/to/your-scep.properties -jar target/spanner-publisher.jar, where /path/to/your-scep.properties is your configuration file.

Pubsub Topic Name Format

The Pubsub topic name format configuration value is used to determine where to publish changes from a specific table. The configuration value allows for multiple wildcards to be used to publish changes from different tables to different topics. You can also specify a fixed format without any wildcards. This will cause all changes to be published to the same topic. A subscriber can still see which table the change originated from, as the full table name is included in the attributes of the Pubsub message.

The wildcards that are supported in the Pubsub topic name format configuration are:

  • %project%: The project id of the Cloud Spanner database
  • %instance%: The instance id of the Cloud Spanner database
  • %database%: The database id of the Cloud Spanner database
  • %table%: The name of the Cloud Spanner table where the change occurred

For example, the following configuration publishes changes to the table MyTable in the database my-database to the topic projects/my-pubsub-project/topics/spanner-update-my-database-MyTable.

scep.pubsub.project=my-pubsub-project
scep.pubsub.topicNameFormat=spanner-update-%database%-%table%
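The wildcard substitution can be sketched in a few lines of Java. This illustrates the four documented wildcards and is not the publisher's own implementation; the method names are hypothetical. Note that %project% refers to the Cloud Spanner project, while the topic itself is created in the Pubsub project given by scep.pubsub.project.

```java
/**
 * Sketch of topic-name expansion for scep.pubsub.topicNameFormat.
 * Not the publisher's own code; it only applies the documented wildcards.
 */
public class TopicNameFormatSketch {

  /** Replaces %project%, %instance%, %database% and %table% with the Spanner identifiers. */
  static String expand(String format, String spannerProject, String instance,
      String database, String table) {
    return format
        .replace("%project%", spannerProject)
        .replace("%instance%", instance)
        .replace("%database%", database)
        .replace("%table%", table);
  }

  /** The topic lives in the Pubsub project (scep.pubsub.project), not the Spanner project. */
  static String fullTopicName(String pubsubProject, String topicId) {
    return "projects/" + pubsubProject + "/topics/" + topicId;
  }

  public static void main(String[] args) {
    String topicId = expand("spanner-update-%database%-%table%",
        "my-spanner-project", "my-instance", "my-database", "MyTable");
    System.out.println(fullTopicName("my-pubsub-project", topicId));
  }
}
```

Running main prints projects/my-pubsub-project/topics/spanner-update-my-database-MyTable, which is the topic from the example above. A format without wildcards expands to the same topic for every table.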
