This is part of a series on Google Cloud Spanner. The first post provided a technical overview. This post details some of the concepts worth understanding when working with the database. During our work with Google Cloud Spanner, we quickly realised that we were not dealing with just another OLTP database, however, there are a few key pieces in its architecture that were specifically designed to allow performance at a global scale. These features need to be considered when designing your application as well as how data is managed within the database. Failure to consider these critical concepts whilst developing your application will likely lead to performance issues when the system is under load in production.
Google’s Cloud Spanner is a database solution with no equal. I specifically say solution, because it is the combination of software and hardware that truly makes Cloud Spanner unique. It is specifically well suited for globally distributed systems that require transactional consistency. It relies on TrueTime, which uses synchronised GPS and atomic clocks to serialize most transactions via a customised Paxos engine
The specific features are discussed at length in Google’s technical documentation, however, it is easy to miss the significance of these concepts when starting out. I have tried to provide a summary of the most critical concepts to help people starting out with Google Cloud Spanner
Primary keys play a large role in Cloud Spanner architecture and performance. However, primary keys have to be generated by the user or application. Primary Key indexes are automatically created to maintain performance. More detail on indexes can be found in the section below.
Monotonically increasing primary keys can create hotspots which impact query performance under load and at scale. There are multiple recommended primary key strategies, and choosing a primary key is a very important part of schema design.
Hot spots and Splits
For redundancy and performance, Cloud Spanner will divide and shard data into splits. Splits can happen because of load or data size, and data will be moved to balance load and increase performance.
Cloud Spanner typically creates a new split when an existing split reaches 4 GB. Single rows in excess of 4 GB will impact both query and transaction performance.
Cloud Spanner divides data among servers by key ranges, and as such Primary Keys need to be carefully considered to ensure performance. Monotonically increasing primary keys create a hotspot, as data is always added to the end of the same split. This can easily create a performance bottleneck under load.
This image illustrates how Spanner could split data across key ranges
- In this example, all new writes would end up in split 8.
The hotspot created by such consistent reads and writes in the same split will impede performance. This is especially true for bulk loads and write performance under load , and the continual resplitting that results will add latency to writes. Implementing a correct primary key strategy will thus lead improved performance.
Indexes and Secondary Indexes
Cloud Spanner stores indexes in separate index tables. Since index tables are treated as actual tables, when data not stored in the index table is retrieved, Cloud Spanner will create a join back to the source table from the index table. For secondary indexes additional fields can be specified and stored along with the indexes eliminating the back-join and improving performance.
As data is distributed across Cloud Spanner servers for redundancy and performance, index tables can be stored in different servers from the source data which can also be split across multiple servers. Indexes do come at a small storage cost as well as a small performance overhead for writes, but proper indexing can result in significant performance increases for queries.
- Cloud Spanner has additional schema structure functionality that allows for increased performance where parent-child relationships exist in the data, by making use of Interleaved Tables.
- Interleaved tables allow parent-child relationships to be returned with simple co-located joins which can be processed by the single server where the data is located.
- Storing data with this structure in interleaved tables will improve query and write performance, and avoid costly back joins and cross applies when non-interleaved data is automatically sharded into separate splits located on different servers.
- Splits only occur at the root of a hierarchy of interleaved tables. For row size and splitting, Cloud spanner considers a parent row and all its interleaved children to be part of the same spit.
This diagram illustrates split boundaries for multiple interleaved tables The tables are as follows:
- Parent: Singers
- Interleaved child: Albums
- Interleaved child: Songs
Each top level row, in this case a Singer row, will be split along with all the interleaved children in its hierarchy.
This white paper on optimizing schema design provides a good explanation of the mechanics involved.
Reading and Writing
Cloud Spanner provides externally consistent writes, even though those transactions could span multiple servers and even data centres across the globe. This guarantee is stronger than even the strongest ANSI/ISO standard SQL 92 isolation level — serializable, and is made possible by TrueTime. Each write creates a new immutable version of the data whose timestamp is that of the write’s transaction.
The data is stored for an hour, enabling multi-version concurrency control.
Timestamps are also assigned to each read transaction, and provides consistency guarantees for reads called strong reads. These timestamps also allow read-only transactions to execute without requiring read locks as there is a copy of the data state at the point in time that the query was executed.
Cloud Spanner’s MVCC also allows reads at a specific time within that history or timestamp bound. This is called a stale read, and in situations where the recency of a read isn’t critical, stale reads can be used to improve performance for multi-region configurations.
As is expected from a globally distributed database, write transactions carry an overhead of replication, and can also span multiple servers. For read-write transactions this results in a read-lock until the writes have been replicated and committed. Transactions that are read only should therefore take care to use read-only functionality over read-write.
Reads and writes in the same keyspace increases the chances of reading from and writing to a single split on a server. Minimizing transactions that write to or read from multiple rows for the same transaction decreases the chances of the transaction spanning multiple servers which come with a performance cost.
For architectural reasons, and to maintain its high performance and redundancy, Cloud Spanner has certain quotas and limits that should be considered in application design. Quotas can be increased with a lead-time.
For example, there is a limit of 20,000 mutations per commit, and a maximum of 15 joins per query.
NOTE: Insert and update operations count with the multiplicity of the number of columns they affect. For example, inserting values into one key column and four non-key columns count as five mutations total for the insert. Delete and delete range operations count as one mutation regardless of the number of columns affected.
These limits along with schema design and hotspot prevention have an impact on bulk loading, so ensure bulk loading best practices are being followed.
The size limit for a split is approximately 4GB, so a single row in excess of 4 GB will impact both query and transaction performance. A parent row and all its interleaved children rows contribute to row size as data is split at the parent row level.
Nodes & replication
To ensure availability and performance, Cloud Spanner replicates data across multiple zones as defined in the instance configuration, and shards them into splits.
The instance configuration defines the geographic placement and replication of the databases in that instance.
Cloud Spanner servers use a synchronous, Paxos-based replication, and every write is replicated to at least a majority of replicas before a transaction gets committed. This happens whether the data is being written to a single split managed by a server, or many splits in multiple servers in a multi-region instance.
The node count determines the amount of compute and storage resources that are available to the databases in that instance.
NOTE: It is important to monitor node utilisation to ensure performance and linear scaling of your application.
Google recommend keeping CPU utilization below 65% for region specific instances, and below 45% for multi regional instances
There is an abundance of documentation for Google Cloud Spanner, though if you are starting out and need some more information based on what you just learned, the following links are worth reading too.
If you are really keen to understand the clockwork inside Google Cloud Spanner, they publish the White Papers that define the technology.