Working with Cloud Spanner and Java
We’ve gone into the architectural details of Google Cloud Spanner in previous posts, and now it is time to get a little deeper into the details of building an application using Google Cloud Spanner.
If you decide to build your application on Cloud Spanner, you can rely on ANSI 2011 SQL support and client libraries for multiple languages. There are great tutorials that help you get started, though they don’t go into much depth regarding the different options when using Java; Data Manipulation Language or Mutations via the client libraries, or SQL/DML via the two JDBC drivers.
I’m not going to go full depth on these concepts, but I hope to provide enough information to help you understand your different options as a Java developer working with Cloud Spanner. To make sure I got the details right, this article and the code written for it (which you can clone here) got technical expertise assistance from Java expert Peter Runge (more appropriately prunge-helix on Github)
We will be using the same schema as the Google Cloud Spanner getting started guides, which is explained in detail on the Schema and data model page in the Cloud Spanner documentation.
We are essentially creating a music application, and our catalog contains details on Singers and their Albums. The strong parent child relationship between singers and albums lends itself well to a unique Cloud Spanner optimisation called interleaved tables, which are described on that page, and well worth understanding.
Examples and Options
ORMs and the JDBC driver
If you are a seasoned Java programmer, it may be easier or more relevant to use an ORM or the JDBC driver to interact with Cloud Spanner.
ORMs can also make it easier to manipulate data in Cloud Spanner in your language of choice without having to write DML. In many cases these are a wrapper around the existing Cloud Spanner APIs. For example in Java with spring, spring-cloud-gcp-starter-data-spanner uses the Cloud Spanner APIs (com.google.cloud.spanner.*) to execute statements.
When following modern programming practices, it is much easier and consistent to use ORMs to interact with the database compared with interspersing DML in your code. As ORMs often make use of the existing client libraries, all the benefits of working with DML vs Mutations etc. are maintained.
For ORM with SpringData, we will first create the singers table:
Now we will create the Albums table :
And of course we have to create the interfaces:
And now we can use the tables:
There are two JDBC drivers including an open source driver written by Google. It makes use of the client libraries to connect to Cloud Spanner, and allows you to execute SQL and by extension DML.
If your statements require many objects to be held in memory prior to execution, it may be more efficient to use the JDBC driver to execute statements against the database. Large statements that require multiple joins, group-bys, and aggregations, may be onerous to manage in an object oriented manner, and it may be simpler to write a single DML statement containing those actions instead. Though, in terms of execution, the latter example is not expected to be roughly the same in either ORM and DML
Of course, if you are connecting an off the shelf application it is likely that the simplest integration would be by connecting via the JDBC driver.
A quick note on SQL/DML
Cloud Spanner supports ANSI 2011 compatible SQL, enabling you to query databases using declarative SQL statements that specify what data you want to retrieve.
There are SQL best practices that can help Cloud Spanner to find the relevant data in the most efficient way, and understanding how Cloud Spanner executes SQL statements can go a long way to improve performance. For example, use of parameters and secondary indexes are two of the ways that query performance can be improved.
Data Manipulation Language (DML) and Partitioned DML
DML can be used to INSERT, UPDATE, and DELETE statements in the Cloud Console, gcloud command-line tool, and client libraries. DML is designed for transaction processing, where Partitioned DML is designed for bulk updates and deletes, with minimal impact on concurrent transaction processing. This is achieved in Partitioned DML by partitioning the key space and running the statement over partitions in separate, smaller-scoped transactions.
DML statements are executed inside read-write transactions acquiring locks only on the columns you are accessing. For reads, shared locks are used to ensure consistency, with writes or modifications resulting in exclusive locks.
The following DML best practices will help improve performance, and minimise locking.
Now we will execute the same steps illustrated in the ORM example, by using the Java JDBC to execute DDL and DML statements
Mutations
A Mutation represents a sequence of inserts, updates, and deletes that Cloud Spanner applies atomically to different rows and tables in a Cloud Spanner database. These are executed via the Mutation API.
Although you can commit mutations by using gRPC or REST, it is more common to access the APIs through the client libraries.
Peter Runge will publish a post on working with DML and Mutations next week if you want to delve a little deeper into that topic.
Since this is the third example, we are going to assume you have created the tables, and save some time by just using the Mutation API to add data to our Singers and Albums tables
If you just wanted to use the standard client library, the getting started guide takes you through the same example which we reference below, and the code is published on github
The client libraries are also used by the ORM and JDBC drivers, so you can also use them to execute DDL: