DML and Mutations - a tale of two data altering techniques in Cloud Spanner
Data Manipulation Language (DML) and Mutations are two APIs in Cloud Spanner that you can use to modify data. You may find yourself asking, “Why two alternative APIs?” or “When should I choose one over the other?”. Let’s demystify each and explore their similarities and differences, so you can answer those questions.
The code samples in this post refer to a fictitious music industry database with two tables Singers and Albums with the following schema. Albums is an interleaved table of Singers, which stores the albums list of each Singer.
CREATE TABLE Singers (
SingerId INT64 NOT NULL,
) PRIMARY KEY(SingerId);CREATE TABLE Albums (
SingerId INT64 NOT NULL,
AlbumId INT64 NOT NULL,
) PRIMARY KEY(SingerId, AlbumId),
INTERLEAVE IN PARENT Singers ON DELETE CASCADE;
The Cloud Spanner DML enables you to INSERT, UPDATE, and DELETE data in Spanner tables. You can run DML statements using the client libraries, the Google Cloud Console, and the gcloud command-line tool.
Spanner offers two implementations of DML execution, each with different properties. At a high level, there is:
- Standard DML: suitable for OLTP (Online Transaction Processing) workloads. The following Java sample code creates a read-write transaction and calls
executeUpdate()to insert 4 records into the Singers table. To see examples in other supported languages, see Modifying data using DML in our official Spanner documentation.
- Partitioned DML: enables large-scale, database-wide operations with minimal impact on concurrent OLTP operations by partitioning the keyspace and running the statement over the partitions in separate, smaller scoped transactions. Partitioned DML is designed for bulk updates and deletes: 1) Periodic cleanup and garbage collection. Examples are deleting old rows or setting columns to NULL. 2) Backfilling new columns with default values. An example is using an UPDATE statement to set a new column’s value to False where it is currently NULL.
Besides DML, users can use Mutations to write data to Cloud Spanner. A Mutation represents a sequence of inserts, updates, and deletes that Spanner applies atomically to different rows and tables in a Spanner database.
You write data using a Mutation object. A Mutation object is a container for mutation operations to update, delete and insert data. You can include operations that apply to different rows, or different tables, in a mutation. After you define one or more mutations that contain one or more writes, you must apply the mutation to commit the write(s). Each change is applied in the order in which they were added to the Mutation.
Here’s an example, again in Java, of inserting 4 rows into Singers using the Mutation API. For each row, we add an insert mutation and then call
write() to execute the mutations. To see examples in other supported languages, see Modifying data using mutations in our official Spanner documentation.
Comparing DML and Mutations
Let’s take a deeper look at the differences.
- Read Your Writes (reading uncommitted results within an active transaction) semantics: changes you make using DML statements are visible to subsequent statements in the same transaction. This is different from using mutations, where changes are not visible in any reads (including reads done in the same transaction) until the transaction commits. This is because mutations in a transaction are buffered client-side (locally) and sent to the server as part of the commit operation. As a result, mutations in the commit request are not visible to SQL or DML statements within the same transaction.
- Upsert: DML does not support a native upsert, an operation that inserts rows into a database table if they do not already exist, or updates them if they do. However, there is an insert_or_update operation in the mutations world. Applications using DML can work around this limitation by reading the row first, and then using the appropriate DML statement.
- Constraint Checking: Spanner checks the constraints after every DML statement. This is different from using mutations, where Spanner buffers mutations in the client until commit and checks constraints at commit time. Evaluating the constraints after each statement allows Spanner to guarantee that the data that a DML statement returns is consistent with the schema.
- SQL Syntax: DML provides a conventional way to manipulate data, you can reuse SQL skills to alter the data using the DML API.
Avoid using DML and Mutations in the same transaction
If a transaction contains both DML statements and mutations in the commit request, Spanner executes the DML statements before the mutations. To avoid having to account for the order of execution in your client library code, you should use either DML statements or the mutations in a single transaction, but not both.
Here is an example that illustrates potentially surprising behavior. The code inserts two rows into Albums using the Mutation API. The snippet, then calls
executeUpdate() to update the newly inserted rows and calls
executeQuery() to read updated albums.
If you were to execute this code, you’d see 0 records updated. This happens because the changes we made using Mutations are not visible to subsequent statements until the transaction commits. Ideally, we should have buffered writes only at the very end of the transaction.
As we’ve seen, DML and Mutations support similar data manipulation features. Choosing which approach to use will depend on whether some of the richer features such as “Read Your Writes”, “Upsert” or support for SQL syntax are important to your use case.