Cloud Spanner Change Watcher

Knut Olav Løite
Google Cloud - Community
6 min readJul 8, 2020

Google Cloud Spanner is a fully managed, scalable, relational database service for regional and global application data. It is the first scalable, enterprise-grade, globally-distributed, and strongly consistent database service built for the cloud specifically to combine the benefits of relational database structure with non-relational horizontal scale.

Google Cloud Spanner
Google Cloud Spanner

Multiple applications and services can interact with the same Cloud Spanner database simultaneously, and a common pattern is for one service to trigger other services based on certain events, such as inserting or updating a record in a database. This article describes the open source spanner-change-watcher library that can monitor and publish events for data changes in a Cloud Spanner database as an in-process service of a Java application.

The open source library also contains the spanner-change-publisher service that can run as a standalone application that will publish changes from a Cloud Spanner database to one or more Pubsub topics. An introduction to that service can be found here.

Prerequisites and Limitations

  • Spanner Change Watcher uses commit timestamps to determine when a row has been updated. Only tables that contain a column with the commit timestamp of the last update can be monitored using this library. The name of this column may be chosen freely.
  • The library can only detect inserts and updates, as it relies on the commit timestamp value of a row to detect a change. To detect deletes, you should implement logical deletes by setting a deleted flag on the row instead of actually deleting it. If the delete is executed on a parent table with one or more child tables marked with ON DELETE CASCADE, the child tables would also need to be updated accordingly to be picked up by the watcher. The flag can be used by a separate background task to detect which rows should be deleted at a later moment.
  • The library monitors the database for changes by polling the tables that are being watched. The polling interval is default one second and can be configured. If there are multiple changes to the same row within a time frame that is smaller than the polling interval, the library may only report the last change.

Components

The Spanner Change Watcher project contains three main components:

  • Spanner Change Watcher: A Java background service that will watch one, some or all tables of a Cloud Spanner database for changes, and publish these as in-process events to an application. This library can be integrated into existing Java applications that need to be notified of data updates. This article will focus on this component.
  • Spanner Change Publisher: A standalone application that will watch one, some or all tables of a Cloud Spanner database for changes, and publish these changes to Pubsub in JSON or Avro format. The application can be configured to publish all changes to one topic, or it can publish the changes to separate topics per table. See this article for more information on the Spanner Change Publisher component.
  • Spanner Change Archiver: An example application that uses Spanner Change Publisher to publish change events to a Pubsub topic, which again will trigger a Google Cloud Function that writes the change to Google Cloud Storage.

Spanner Change Watcher

Spanner Change Watcher is an in-process background service that can be included in existing Java applications. It will watch a set of Cloud Spanner tables for changes and notify the application through a callback of any changes that are detected.

Below is a simple example of a callback that will print the contents of the updated row to the console.

Spanner Change Watcher — Callback Sample

Spanner Change Watcher allows you to create watchers for individual tables, sets of tables or an entire database. Each watcher can be assigned one or more callbacks that will receive notifications when a change is detected, including the new data and the commit timestamp of the change.

Changes to one table are guaranteed to be delivered to the callback in order of commit timestamp, and the library guarantees that at most one callback will be active for a table at any given time. When watching multiple tables, the tables are polled in parallel and there is no guarantee to the ordering of callbacks to the different tables. This means that a callback for a change to TABLE1 may be called before a callback to TABLE2, even when the change to TABLE2 occurred before the change to TABLE1.

Watching a Single Table

Watchers for a single table implement the interface SpannerTableChangeWatcher. Currently, the only implementation of this interface is SpannerTableTailer that polls the table that is being monitored at fixed intervals.

Create a SpannerTableTailer

Watching an Entire Database

Watchers for an entire database, or a set of tables in a database, implement the interface SpannerDatabaseChangeWatcher. Currently, the only implementation of this interface is SpannerDatabaseTailer that polls each table that is being monitored at fixed intervals. The tables are polled independently of each other and in parallel.

Create a SpannerDatabaseTailer

Using the option allTables() when creating the SpannerDatabaseTailer will automatically create a watcher that will watch all tables that have a commit timestamp column. All tables that do not have a commit timestamp column will be ignored and do not need to be explicitly excluded from the set of tables to watch.

It is also possible to exclude certain tables from being watched. Use the option allTables().except(String...) to exclude specific tables from being watched.

Tutorial: Sample Application

We will now go through a step-by-step guide to create a simple application that will watch two tables for changes and write all changes to the console. The entire sample application can be found here.

Data Model

This sample assumes that the following tables will be present in your database. The provided sample application will automatically create these for you if they are not already present.

Each table contains a LastUpdateTime column of type TIMESTAMP with the option allow_commit_timestamp=true. This means that column may be automatically set by Cloud Spanner using the commit timestamp of the transaction that updated the row. Spanner Change Watcher can only monitor tables that have such a column. The name of the column may be chosen freely and the library will automatically detect which column to use.

Sample Data Model

Dependency

The spanner-change-watcher is needed as a dependency in your project.

Spanner Change Watcher Dependency

Create a Spanner Change Watcher

With the google-cloud-spanner-change-watcher added to your project, you can now create a change watcher. For this example, we will create a watcher for the entire database. This watcher will automatically include all tables in the database that have a commit timestamp column. All tables without a commit timestamp column will be ignored.

You can also set a specific set of tables to watch, or to set the watcher to monitor all tables except a specific set of tables. See the documentation of SpannerDatabaseTailer.Builder for more information.

Create a Database Watcher for all Tables

You can add one or more callbacks to a watcher to receive notifications of all changes that are detected. The callback in this example just prints the new contents of the row that has been updated to the console.

Write Changes

All changes that are written to the Spanner database will now also be printed to the console by the watcher. The sample application writes 5 Singers and 5 Albums to the database. It is important that the application that is writing the data to Cloud Spanner also fills the commit timestamp column with the commit timestamp of the current transaction. Use the Value.COMMIT_TIMESTAMP placeholder value to do so when using Mutation objects, and use the PENDING_COMMIT_TIMESTAMP() function when writing data through DML.

Use Value.COMMIT_TIMESTAMP to write the commit timestamp when using Mutation objects

Running the Sample Application

To execute the sample application, follow these steps:

  1. Clone the project from GitHub.
  2. Navigate to the spanner-change-watcher/samples/spanner-change-watcher-samples folder.
  3. Create a .jar by executing mvn clean package.
  4. Execute the following command, where the <instance-id> and <database-id> must be replaced by actual values. The instance must already exist, the database will automatically be created if it does not already exist.
java -cp target/spanner-change-watcher-samples.jar \
com.google.cloud.spanner.watcher.sample.SimpleChangeWatcherSample \
<instance-id> <database-id>

The sample application will start and the output to the console should look something like below.

Sample Output

If you insert or update more data in the database through any other application, such as for example the Google Cloud Spanner web interface or the gcloud util, the watcher will print out those changes as well.

Further Samples

Further samples on how to use spanner-change-watcher can be found in the spanner-change-watcher-samples/Samples.java file, with samples for:

  • Error handling
  • Configuring the poll interval
  • Using a custom commit timestamp repository
  • Using sharding to allow indexing of the commit timestamp column to optimize polling

--

--