Cloud Spanner Change Watcher
Google Cloud Spanner is a fully managed, scalable, relational database service for regional and global application data. It is the first scalable, enterprise-grade, globally-distributed, and strongly consistent database service built for the cloud specifically to combine the benefits of relational database structure with non-relational horizontal scale.
Multiple applications and services can interact with the same Cloud Spanner database simultaneously, and a common pattern is for one service to trigger other services based on certain events, such as inserting or updating a record in a database. This article describes the open source spanner-change-watcher library that can monitor and publish events for data changes in a Cloud Spanner database as an in-process service of a Java application.
The open source library also contains the spanner-change-publisher service that can run as a standalone application that will publish changes from a Cloud Spanner database to one or more Pubsub topics. An introduction to that service can be found here.
Prerequisites and Limitations
- Spanner Change Watcher uses commit timestamps to determine when a row has been updated. Only tables that contain a column with the commit timestamp of the last update can be monitored using this library. The name of this column may be chosen freely.
- The library can only detect inserts and updates, as it relies on the commit timestamp value of a row to detect a change. To detect deletes, you should implement logical deletes by setting a deleted flag on the row instead of actually deleting it. If the delete is executed on a parent table with one or more child tables marked with ON DELETE CASCADE, the child tables would also need to be updated accordingly to be picked up by the watcher. The flag can be used by a separate background task to detect which rows should be deleted at a later moment.
- The library monitors the database for changes by polling the tables that are being watched. The polling interval is default one second and can be configured. If there are multiple changes to the same row within a time frame that is smaller than the polling interval, the library may only report the last change.
Components
The Spanner Change Watcher project contains three main components:
- Spanner Change Watcher: A Java background service that will watch one, some or all tables of a Cloud Spanner database for changes, and publish these as in-process events to an application. This library can be integrated into existing Java applications that need to be notified of data updates. This article will focus on this component.
- Spanner Change Publisher: A standalone application that will watch one, some or all tables of a Cloud Spanner database for changes, and publish these changes to Pubsub in JSON or Avro format. The application can be configured to publish all changes to one topic, or it can publish the changes to separate topics per table. See this article for more information on the Spanner Change Publisher component.
- Spanner Change Archiver: An example application that uses Spanner Change Publisher to publish change events to a Pubsub topic, which again will trigger a Google Cloud Function that writes the change to Google Cloud Storage.
Spanner Change Watcher
Spanner Change Watcher is an in-process background service that can be included in existing Java applications. It will watch a set of Cloud Spanner tables for changes and notify the application through a callback of any changes that are detected.
Below is a simple example of a callback that will print the contents of the updated row to the console.
Spanner Change Watcher allows you to create watchers for individual tables, sets of tables or an entire database. Each watcher can be assigned one or more callbacks that will receive notifications when a change is detected, including the new data and the commit timestamp of the change.
Changes to one table are guaranteed to be delivered to the callback in order of commit timestamp, and the library guarantees that at most one callback will be active for a table at any given time. When watching multiple tables, the tables are polled in parallel and there is no guarantee to the ordering of callbacks to the different tables. This means that a callback for a change to TABLE1
may be called before a callback to TABLE2
, even when the change to TABLE2
occurred before the change to TABLE1
.
Watching a Single Table
Watchers for a single table implement the interface SpannerTableChangeWatcher. Currently, the only implementation of this interface is SpannerTableTailer that polls the table that is being monitored at fixed intervals.
Watching an Entire Database
Watchers for an entire database, or a set of tables in a database, implement the interface SpannerDatabaseChangeWatcher. Currently, the only implementation of this interface is SpannerDatabaseTailer that polls each table that is being monitored at fixed intervals. The tables are polled independently of each other and in parallel.
Using the option allTables()
when creating the SpannerDatabaseTailer
will automatically create a watcher that will watch all tables that have a commit timestamp column. All tables that do not have a commit timestamp column will be ignored and do not need to be explicitly excluded from the set of tables to watch.
It is also possible to exclude certain tables from being watched. Use the option allTables().except(String...)
to exclude specific tables from being watched.
Tutorial: Sample Application
We will now go through a step-by-step guide to create a simple application that will watch two tables for changes and write all changes to the console. The entire sample application can be found here.
Data Model
This sample assumes that the following tables will be present in your database. The provided sample application will automatically create these for you if they are not already present.
Each table contains a LastUpdateTime column of type TIMESTAMP with the option allow_commit_timestamp=true. This means that column may be automatically set by Cloud Spanner using the commit timestamp of the transaction that updated the row. Spanner Change Watcher can only monitor tables that have such a column. The name of the column may be chosen freely and the library will automatically detect which column to use.
Dependency
The spanner-change-watcher is needed as a dependency in your project.
Create a Spanner Change Watcher
With the google-cloud-spanner-change-watcher added to your project, you can now create a change watcher. For this example, we will create a watcher for the entire database. This watcher will automatically include all tables in the database that have a commit timestamp column. All tables without a commit timestamp column will be ignored.
You can also set a specific set of tables to watch, or to set the watcher to monitor all tables except a specific set of tables. See the documentation of SpannerDatabaseTailer.Builder for more information.
You can add one or more callbacks to a watcher to receive notifications of all changes that are detected. The callback in this example just prints the new contents of the row that has been updated to the console.
Write Changes
All changes that are written to the Spanner database will now also be printed to the console by the watcher. The sample application writes 5 Singers and 5 Albums to the database. It is important that the application that is writing the data to Cloud Spanner also fills the commit timestamp column with the commit timestamp of the current transaction. Use the Value.COMMIT_TIMESTAMP
placeholder value to do so when using Mutation objects, and use the PENDING_COMMIT_TIMESTAMP()
function when writing data through DML.
Running the Sample Application
To execute the sample application, follow these steps:
- Clone the project from GitHub.
- Navigate to the
spanner-change-watcher/samples/spanner-change-watcher-samples
folder. - Create a .jar by executing
mvn clean package
. - Execute the following command, where the
<instance-id>
and<database-id>
must be replaced by actual values. The instance must already exist, the database will automatically be created if it does not already exist.
java -cp target/spanner-change-watcher-samples.jar \
com.google.cloud.spanner.watcher.sample.SimpleChangeWatcherSample \
<instance-id> <database-id>
The sample application will start and the output to the console should look something like below.
If you insert or update more data in the database through any other application, such as for example the Google Cloud Spanner web interface or the gcloud util, the watcher will print out those changes as well.
Further Samples
Further samples on how to use spanner-change-watcher can be found in the spanner-change-watcher-samples/Samples.java file, with samples for:
- Error handling
- Configuring the poll interval
- Using a custom commit timestamp repository
- Using sharding to allow indexing of the commit timestamp column to optimize polling