Cloud Spanner — Latency comparison between DML and Mutations

Published in

Google Cloud - Community

5 min readMay 23, 2023

Data Manipulation Language (DML) and Mutations are two APIs in Spanner that you can use to modify data. Each offers similar data manipulation features.

The documentation compares and describes both approaches, but I’m surprised that it doesn’t cover important topics such as performance and latency.

Should we expect similar write performance using any of both APIs?

Let’s do some tests to answer this question.

I’ll use a simple python program which creates a single session on Cloud Spanner and runs transactions composed by insert statements (using DML or Mutations) over a table “TEST_TABLE”.

Let’s get started

For our first test, we’ll run a workload writing a single row per commit. So, we can measure the latency of a single write operation for both APIs.

Workload using Mutations

python3 load_generator.py “test-dba” “dbtest” “MUTATIONS” 200000 1

A single write using Mutations has an avg latency of about 3.7ms.

Now, let’s run the program using DML instead.

Workload using DML

python3 load_generator.py “test-dba” “dbtest” “DML” 200000 1

Using DML our avg transaction latency is about 12.3ms, this is 4x higher than using Mutations!

But, why does Mutation have better performance than DML?

First, there is something important to notice. Commit latency for both APIs was very close. Therefore, there is no difference in the write operation itself (applying mutation + replication) but the additional latency for DML must be caused by how the API works.

Let’s take a look to our transactions detail:

Using Mutations

Using DML

Ok, here is something interesting. The transaction using DML is composed for an additional read operation (a read over “_exists” column, this is a check of uniqueness for the id to be inserted). I think that this gives us a clue.

Now, let’s check the API method metrics to obtain a full picture of how both APIs works:

API methods during workload using Mutations

With Mutations we have just the Commit method related to the write operation.

API methods during workload using DML

We can see that in our test using DML, in addition to the Commit method we have the ExecuteSql method.

This method (link) is responsible for executing an SQL statement. So, this method parses the Insert statement and checks the uniqueness for the id (this is the read over “_exists” column we saw in the transaction details).

If everything goes ok, return an ACK to the client which then calls the Commit method to persist the mutation.

How much latency is adding the ExecuteSql method?

The avg latency for the ExecuteSql method call is 3ms. Also, we have to consider an extra network latency (because the round trip for ExecuteSql method call) and additional latency by the client between the different API method calls ( ExecuteSql and Commit).

I guess that considering the above information the extra 9ms of latency for DML is justified and makes sense.

Multi statement Transactions

With Mutations the changes in a transaction are buffered client-side (locally) and sent to the server as part of the commit operation in batch mode, checking constraints at commit time ( single round trip).

However, with DML the constraints are checked after every DML statement (multiple round trips). Knowing this, we should expect an ExecuteSql method call for each DML statement hence we’ll get a major difference in latency between DML and Mutations in a multi statement transaction.

Let’s check it out:

I’m gonna run the same program as before but this time writing 4 rows per commit.

Multi statement workload using Mutations

python3 load_generator.py “test-dba” “dbtest” “MUTATIONS” 200000 4

Latency is similar to our test writing a single row. The avg latency in this case was 4.6ms (vs 3.6 ms writing one row).

Multi statement workload using DML

python3 load_generator.py “test-dba” “dbtest” “DML” 200000 4

Transaction avg latency is about 50ms. This is 4x time higher than our test with a single insert statement. Therefore, latency increases linearly according to the number of statements per transaction.

API method calls:

Checking the API method call metric, we confirm that we have an ExecuteSql method call for each statement involved in the transaction (In this case, we have four ExecuteSql method calls for each Commit). So, our thoughts were right.

Summary

Writes using Mutations are faster than using DML. This is because DML uses an additional API method call (ExecuteSql), adding overhead over the transaction.
Latency for transactions using DML statements increases linearly according to the number of statements involved in it.