Cloud Spanner — How deleted rows could affect query latency

Pablo Arrojo
Google Cloud - Community
8 min read · Nov 10, 2023

Removing unneeded data is a good practice: it reduces storage and backup costs, and it can improve query latency by reducing the number of rows the database has to scan.

However, there are cases where deleting data doesn’t provide the expected latency improvement. In fact, it can even cause performance degradation.

Let’s run some tests to study these cases.

Schema

CREATE SEQUENCE id_sequence OPTIONS (
  sequence_kind = 'bit_reversed_positive'
);

CREATE TABLE table_test (
  id INT64 DEFAULT (GET_NEXT_SEQUENCE_VALUE(SEQUENCE id_sequence)),
  field1 STRING(MAX),
  field2 STRING(MAX),
  field3 STRING(MAX),
  ts TIMESTAMP,
) PRIMARY KEY(id);

CREATE INDEX idx_ts ON table_test(ts);
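
If you prefer to apply the schema programmatically, here is a minimal sketch using the google-cloud-spanner Python client; the project, instance and database IDs are placeholders, and later snippets in this post reuse this database handle:

from google.cloud import spanner

# Placeholders: replace with your own project, instance and database IDs.
client = spanner.Client(project="my-project")
instance = client.instance("my-instance")
database = instance.database("my-database")

ddl = [
    """CREATE SEQUENCE id_sequence OPTIONS (
         sequence_kind = 'bit_reversed_positive')""",
    """CREATE TABLE table_test (
         id INT64 DEFAULT (GET_NEXT_SEQUENCE_VALUE(SEQUENCE id_sequence)),
         field1 STRING(MAX),
         field2 STRING(MAX),
         field3 STRING(MAX),
         ts TIMESTAMP,
       ) PRIMARY KEY(id)""",
    "CREATE INDEX idx_ts ON table_test(ts)",
]

# update_ddl returns a long-running operation; wait for it to finish.
operation = database.update_ddl(ddl)
operation.result()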

Tests

Let’s start with a simple query:

spanner> explain analyze select min(ts) from table_test;
+----+--------------------------------------------------------------------------------------+---------------+------------+---------------+
| ID | Query_Execution_Plan | Rows_Returned | Executions | Total_Latency |
+----+--------------------------------------------------------------------------------------+---------------+------------+---------------+
| 0 | Serialize Result | 1 | 1 | 7.71 msecs |
| 1 | +- Stream Aggregate (scalar_aggregate: true) | 1 | 1 | 7.71 msecs |
| 2 | +- Global Limit | 1 | 1 | 7.71 msecs |
| *3 | +- Distributed Union (distribution_table: idx_ts, split_ranges_aligned: false) | 1 | 1 | 7.71 msecs |
| 4 | +- Local Limit | 1 | 1 | 0.22 msecs |
| 5 | +- Local Distributed Union | 1 | 1 | 0.22 msecs |
| *6 | +- Filter Scan (seekable_key_size: 1) | 1 | 1 | 0.21 msecs |
| 7 | +- Index Scan (Index: idx_ts, scan_method: Scalar) | 1 | 1 | 0.21 msecs |
+----+--------------------------------------------------------------------------------------+---------------+------------+---------------+
Predicates(identified by ID):
3: Split Range: ISNOTNULL($ts)
6: Seek Condition: ISNOTNULL($ts)

1 rows in set (16.56 msecs)
timestamp: 2023-11-06T10:40:02.472676-04:00
cpu time: 9.65 msecs
rows scanned: 1 rows
deleted rows scanned: 0 rows
optimizer version: 6
optimizer statistics: auto_20231106_13_44_21UTC

As we can see, this query runs really fast. It takes just a few milliseconds.

We’re looking for the minimum value over an index sorted in ascending order, so Spanner only has to scan the first row of the index.

Now we’ll delete the “oldest” rows based on the ts column. In this case we’re deleting 20M rows:

spanner> select count(*) from table_test where ts <'2023-11-06T10:18:11.196232Z';
+----------+
| |
+----------+
| 20086549 |
+----------+
1 rows in set (14.52 secs)

Snippet:

row_ct = database.execute_partitioned_dml(
    "DELETE FROM table_test WHERE ts < '2023-11-06T10:18:11.196232Z'"
)
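
For completeness, here is a sketch of the same Partitioned DML call using a query parameter instead of an inline string literal, reusing the database handle from the earlier sketch:

from datetime import datetime, timezone
from google.cloud.spanner_v1 import param_types

# Cutoff for the delete job; same value as the literal used above.
cutoff = datetime(2023, 11, 6, 10, 18, 11, 196232, tzinfo=timezone.utc)

row_ct = database.execute_partitioned_dml(
    "DELETE FROM table_test WHERE ts < @cutoff",
    params={"cutoff": cutoff},
    param_types={"cutoff": param_types.TIMESTAMP},
)
print(f"{row_ct} rows deleted")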

Let’s run our query again:

spanner> explain analyze select min(ts) from table_test;
+----+--------------------------------------------------------------------------------------+---------------+------------+---------------+
| ID | Query_Execution_Plan | Rows_Returned | Executions | Total_Latency |
+----+--------------------------------------------------------------------------------------+---------------+------------+---------------+
| 0 | Serialize Result | 1 | 1 | 7.08 secs |
| 1 | +- Stream Aggregate (scalar_aggregate: true) | 1 | 1 | 7.08 secs |
| 2 | +- Global Limit | 1 | 1 | 7.08 secs |
| *3 | +- Distributed Union (distribution_table: idx_ts, split_ranges_aligned: false) | 1 | 1 | 7.08 secs |
| 4 | +- Local Limit | 1 | 2 | 10.67 secs |
| 5 | +- Local Distributed Union | 1 | 2 | 10.67 secs |
| *6 | +- Filter Scan (seekable_key_size: 1) | 1 | 2 | 10.67 secs |
| 7 | +- Index Scan (Index: idx_ts, scan_method: Scalar) | 1 | 2 | 10.67 secs |
+----+--------------------------------------------------------------------------------------+---------------+------------+---------------+
Predicates(identified by ID):
3: Split Range: ISNOTNULL($ts)
6: Seek Condition: ISNOTNULL($ts)

1 rows in set (7.09 secs) <<<<<<<<<
timestamp: 2023-11-06T11:23:56.009557-04:00
cpu time: 10.72 secs
rows scanned: 1 rows
deleted rows scanned: 20086549 rows <<<<<<<
optimizer version: 6
optimizer statistics: auto_20231106_13_44_21UTC

After deleting rows, our query takes 7 seconds to finish. This is a huge difference compared with the 16 ms execution time before the deletion.

To understand the reason for this latency increase, we just have to look at the deleted_rows_scanned statistic: Spanner is scanning 20M deleted rows.
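
If you aren’t using a CLI, the same execution statistics can be retrieved from the Python client by running the query in PROFILE mode. A minimal sketch, reusing the database handle from the earlier snippets (the exact keys exposed in query_stats may vary by client version):

from google.cloud.spanner_v1 import ExecuteSqlRequest

with database.snapshot() as snapshot:
    results = snapshot.execute_sql(
        "SELECT MIN(ts) FROM table_test",
        query_mode=ExecuteSqlRequest.QueryMode.PROFILE,
    )
    rows = list(results)  # stats are only populated after all rows are consumed

# results.stats.query_stats should include entries such as rows_scanned
# and deleted_rows_scanned, matching what the CLI prints.
print(results.stats)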

Why?

First of all, Spanner uses MVCC (multi-version concurrency control), which means it keeps multiple immutable versions of the data. For each deleted row, Spanner creates a new immutable version of that row with some kind of “DELETED” marker and the timestamp of the write transaction.

This deleted data stays in the database at least for the configured version retention period.

That explains why deleted data remains in the database, but why does Spanner scan these rows?

Well, it seems that if our query filter matches deleted rows, Spanner has to scan them during the execution phase (these rows are then discarded from the result set).

In the specific case of our query, we’re searching for the minimum ts value. Since the index is in ascending order and we’ve deleted the “first” 20M rows, Spanner has to scan all of these deleted rows until it finds the first non-deleted row.

So the total number of scanned rows increased from 1 to 20 million, significantly increasing query latency.

Let’s do some additional tests to confirm our thoughts.

Since we’ve deleted all rows with ts lower than 2023-11-06T10:18:11.196232Z, a query that searches for rows matching this condition won’t return anything. However, there is no improvement here, because all the recently deleted rows still have to be scanned.

Here is the evidence:

spanner> explain analyze select count(*) from table_test where ts <'2023-11-06T10:18:11.196232Z';
+----+-----------------------------------------------------------------------------------+---------------+------------+---------------+
| ID | Query_Execution_Plan | Rows_Returned | Executions | Total_Latency |
+----+-----------------------------------------------------------------------------------+---------------+------------+---------------+
| 0 | Serialize Result | 1 | 1 | 7.22 secs |
| 1 | +- Global Stream Aggregate (scalar_aggregate: true) | 1 | 1 | 7.22 secs |
| *2 | +- Distributed Union (distribution_table: idx_ts, split_ranges_aligned: false) | 0 | 1 | 7.22 secs |
| 3 | +- Local Stream Aggregate (scalar_aggregate: true) | 0 | 2 | 9.56 secs |
| 4 | +- Local Distributed Union | 0 | 2 | 9.56 secs |
| *5 | +- Filter Scan (seekable_key_size: 1) | 0 | 2 | 9.56 secs |
| 6 | +- Index Scan (Index: idx_ts, scan_method: Scalar) | 0 | 2 | 9.56 secs |
+----+-----------------------------------------------------------------------------------+---------------+------------+---------------+
Predicates(identified by ID):
2: Split Range: ($ts < timestamp (2023-11-06 02:18:11.196232-08:00))
5: Seek Condition: ($ts < timestamp (2023-11-06 02:18:11.196232-08:00))

1 rows in set (7.22 secs)
timestamp: 2023-11-06T14:34:56.03827-04:00
cpu time: 9.61 secs
rows scanned: 0 rows <<<<<<<
deleted rows scanned: 20086549 rows <<<<<<<
optimizer version: 6
optimizer statistics: auto_20231106_13_44_21UTC

0 rows scanned, but 20M deleted rows scanned. This resulted in a latency of 7 seconds for a query that should take just a few milliseconds.

Now let’s test a query filter that excludes deleted rows. Since there are no deleted rows with ts higher than 2023-11-06T10:18:11.196232Z, Spanner doesn’t scan deleted data and latency isn’t affected:

spanner> explain analyze select count(*) from table_test where ts  > '2023-11-06T10:18:11.196232Z';
+----+-----------------------------------------------------------------------------------+---------------+------------+---------------+
| ID | Query_Execution_Plan | Rows_Returned | Executions | Total_Latency |
+----+-----------------------------------------------------------------------------------+---------------+------------+---------------+
| 0 | Serialize Result | 1 | 1 | 6.25 secs |
| 1 | +- Global Stream Aggregate (scalar_aggregate: true) | 1 | 1 | 6.25 secs |
| *2 | +- Distributed Union (distribution_table: idx_ts, split_ranges_aligned: false) | 1 | 1 | 6.25 secs |
| 3 | +- Local Stream Aggregate (scalar_aggregate: true) | 1 | 1 | 6.25 secs |
| 4 | +- Local Distributed Union | 13431550 | 1 | 5.9 secs |
| *5 | +- Filter Scan (seekable_key_size: 1) | 13431550 | 1 | 5.25 secs |
| 6 | +- Index Scan (Index: idx_ts, scan_method: Scalar) | 13431550 | 1 | 3.86 secs |
+----+-----------------------------------------------------------------------------------+---------------+------------+---------------+
Predicates(identified by ID):
2: Split Range: ($ts > timestamp (2023-11-06 02:18:11.196232-08:00))
5: Seek Condition: ($ts > timestamp (2023-11-06 02:18:11.196232-08:00))

1 rows in set (6.26 secs)
timestamp: 2023-11-06T14:35:10.438704-04:00
cpu time: 6.25 secs
rows scanned: 13431550 rows
deleted rows scanned: 0 rows <<<<<<<<<<<<<<<<
optimizer version: 6
optimizer statistics: auto_20231106_13_44_21UTC

As we can see, deleted_rows_scanned is 0.

Finally, based on the evidence gathered in our tests, we can optimize our original query by excluding all deleted rows, as shown below:

spanner>  explain analyze select min(ts) from table_test where ts  > '2023-11-06T10:18:11.196232Z';
+----+--------------------------------------------------------------------------------------+---------------+------------+---------------+
| ID | Query_Execution_Plan | Rows_Returned | Executions | Total_Latency |
+----+--------------------------------------------------------------------------------------+---------------+------------+---------------+
| 0 | Serialize Result | 1 | 1 | 0.25 msecs |
| 1 | +- Stream Aggregate (scalar_aggregate: true) | 1 | 1 | 0.25 msecs |
| 2 | +- Global Limit | 1 | 1 | 0.24 msecs |
| *3 | +- Distributed Union (distribution_table: idx_ts, split_ranges_aligned: false) | 1 | 1 | 0.24 msecs |
| 4 | +- Local Limit | 1 | 1 | 0.22 msecs |
| 5 | +- Local Distributed Union | 1 | 1 | 0.22 msecs |
| *6 | +- Filter Scan (seekable_key_size: 1) | 1 | 1 | 0.22 msecs |
| 7 | +- Index Scan (Index: idx_ts, scan_method: Scalar) | 1 | 1 | 0.21 msecs |
+----+--------------------------------------------------------------------------------------+---------------+------------+---------------+
Predicates(identified by ID):
3: Split Range: (($ts > timestamp (2023-11-06 02:18:11.196232-08:00)) AND ISNOTNULL($ts))
6: Seek Condition: (($ts > timestamp (2023-11-06 02:18:11.196232-08:00)) AND ISNOTNULL($ts))

1 rows in set (4.09 msecs) <<<<<<<<<<<<
timestamp: 2023-11-06T14:35:25.954096-04:00
cpu time: 3.25 msecs
rows scanned: 1 rows <<<<<<<<<<<<
deleted rows scanned: 0 rows <<<<<<<<<<<<
optimizer version: 6
optimizer statistics: auto_20231106_13_44_21UTC
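
In an application, one way to apply this optimization is to keep track of the cutoff used by the last delete job (a deletion “watermark”) and pass it as a lower bound on queries that would otherwise start scanning inside the deleted range. A minimal sketch, assuming the cutoff variable and database handle from the earlier snippets; how the watermark is persisted is up to the application:

from google.cloud.spanner_v1 import param_types

# Application-managed watermark: the cutoff used by the most recent delete job.
deletion_watermark = cutoff

with database.snapshot() as snapshot:
    results = snapshot.execute_sql(
        "SELECT MIN(ts) FROM table_test WHERE ts > @watermark",
        params={"watermark": deletion_watermark},
        param_types={"watermark": param_types.TIMESTAMP},
    )
    min_ts = list(results)[0][0]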

Deleted data retention

We can control the version retention period of a database by setting the version_retention_period option. Once data versions exceed the configured retention period they can no longer be queried. However, the data still stays in the database until the compaction process runs and reclaims the storage used by deleted rows.
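
For reference, the retention period is changed with a database-level DDL statement. A minimal sketch using the Python client handle from the earlier snippets (the database name and the 1h value are placeholders):

# ALTER DATABASE takes the database name; backticks because of the hyphen.
operation = database.update_ddl(
    ["ALTER DATABASE `my-database` SET OPTIONS (version_retention_period = '1h')"]
)
operation.result()  # wait for the schema change to complete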

This means that even after deleted data has expired, Spanner will still scan those rows, if query filters match them, until the compaction process runs.

My database has the default retention period of 1 hour, but even 4 hours after the data was deleted, Spanner still scans the deleted rows:

spanner>  explain analyze select min(ts) from table_test;
+----+--------------------------------------------------------------------------------------+---------------+------------+---------------+
| ID | Query_Execution_Plan | Rows_Returned | Executions | Total_Latency |
+----+--------------------------------------------------------------------------------------+---------------+------------+---------------+
| 0 | Serialize Result | 1 | 1 | 8.58 secs |
| 1 | +- Stream Aggregate (scalar_aggregate: true) | 1 | 1 | 8.58 secs |
| 2 | +- Global Limit | 1 | 1 | 8.58 secs |
| *3 | +- Distributed Union (distribution_table: idx_ts, split_ranges_aligned: false) | 1 | 1 | 8.58 secs |
| 4 | +- Local Limit | 1 | 2 | 11.18 secs |
| 5 | +- Local Distributed Union | 1 | 2 | 11.18 secs |
| *6 | +- Filter Scan (seekable_key_size: 1) | 1 | 2 | 11.18 secs |
| 7 | +- Index Scan (Index: idx_ts, scan_method: Scalar) | 1 | 2 | 11.18 secs |
+----+--------------------------------------------------------------------------------------+---------------+------------+---------------+
Predicates(identified by ID):
3: Split Range: ISNOTNULL($ts)
6: Seek Condition: ISNOTNULL($ts)

1 rows in set (8.59 secs)
timestamp: 2023-11-06T15:50:03.165712-04:00
cpu time: 11.23 secs
rows scanned: 1 rows
deleted rows scanned: 20086549 rows <<<<<<<<<<<
optimizer version: 6
optimizer statistics: auto_20231106_13_44_21UTC

Conclusions

  • Spanner scans deleted rows if query filter conditions match them.
  • This deleted data stays in the database until the compaction process runs, which happens within 7 days after the data expires.
  • If you delete data periodically or use TTL, design your schema and queries so that filters don’t match deleted rows; avoiding those scans improves query performance (see the TTL sketch below).
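
For example, TTL in Spanner is configured with a row deletion policy. A minimal sketch that would expire rows of table_test 30 days after their ts value, again using the Python handle from the earlier snippets (the 30-day interval is just an example):

# Adds a TTL (row deletion policy) on table_test, keyed on the ts column.
operation = database.update_ddl(
    ["ALTER TABLE table_test ADD ROW DELETION POLICY (OLDER_THAN(ts, INTERVAL 30 DAY))"]
)
operation.result()

Rows removed by the TTL background job are deleted like any other rows, so the same deleted-row scanning considerations apply until compaction reclaims them.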
