Google Cloud Spanner Technical Overview
Cloud Spanner is a relational database with transactional consistency at scale.
As I was reading and learning about Cloud Spanner, I realized there was a need for a short overview. Based on my notes and my understanding of the documentation, I put together the following overview, hoping that it will help others interested in this technology.
Google Cloud Spanner is fully managed and employs automatic sharding (into units called splits) and replication to scale up to millions of nodes and trillions of database rows while remaining highly available. Spanner is used at Google for large-scale, mission-critical applications that require strong consistency, including Google AdWords.
When to use Google Cloud Spanner
Google Cloud Spanner can be used to meet the following requirements for your application:
- OLTP (Online Transactional Processing)
- Global scale
- Relational data model
- ACID/Strong or External consistency
- Low latency
- Fully managed and highly available
- Automatic replication
Google Cloud Spanner has been used in the following use cases:
Critical high load transactions
- Financial trading
- Insurance
- Telecom and billing
- Global call centers
- Supply-chain management and manufacturing
- Logistics and Transportation
- E-Commerce (High Availability)
Features
Google Cloud Spanner is fully managed and requires very little administration. It offers the following availability SLAs: multi-region deployments provide 99.999% availability, which equates to about 5.26 minutes of downtime per year, and single-region deployments provide 99.99% availability, or about 52.6 minutes of downtime per year. Database administrators only need to choose the region configuration when the instance is created, and then resize compute resources to manage performance at scale. Other administrative functions such as replication and sharding are managed for you, and system updates occur transparently without requiring database outages.
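The downtime figures above follow directly from the availability percentages. Here is a minimal sketch of the arithmetic, assuming a 365-day year (the function name is only illustrative):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year


def max_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget implied by an availability SLA."""
    unavailable_fraction = 1 - availability_percent / 100
    return MINUTES_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    for sla in (99.999, 99.99):
        print(f"{sla}% -> {max_downtime_minutes(sla):.2f} min/year")
```

Under that assumption, 99.999% works out to roughly 5.26 minutes per year and 99.99% to roughly 52.56 minutes per year.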
Because the system is fully managed, transient failures are handled internally and do not need to be accounted for in the application layer. Transaction failures caused by potential deadlocks or other reasons, however, still need to be handled in the application layer.
Instances can be either regional or multi-regional. In a regional instance, data is bound to that region to provide locality. Multi-regional instances use Paxos-based replication, TrueTime, and leader election to provide global consistency and higher availability. Google Cloud Spanner instances have:
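One common way to handle failed transactions in the application layer is to rerun the whole transaction with backoff. The sketch below is generic Python rather than the Spanner client API: `TransactionAborted` and `run_with_retry` are hypothetical names used only for illustration. Some client libraries provide a helper that retries aborted transactions for you, so check the documentation for your language before writing your own.

```python
import random
import time


class TransactionAborted(Exception):
    """Hypothetical stand-in for the error raised when a transaction aborts."""


def run_with_retry(work, max_attempts=5, base_delay=0.05, sleep=time.sleep):
    """Run work() and rerun it from the start if the transaction aborts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return work()
        except TransactionAborted:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            # Exponential backoff with jitter so competing clients spread out.
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay))
```

The `work` callable should contain the entire transaction, reads included, so that every retry starts from fresh data.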
- At least three read-write replicas of the database each in a different zone
- Each zone is a separate, isolated fault domain
- Paxos distributed consensus protocol used for writes/transaction commits
- Synchronous replication of writes to all zones across all regions
- Database is available even if one zone fails (99.999% availability SLA for multi-region and 99.99% availability SLA for regional)
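To see why three read-write replicas can tolerate the loss of one zone, recall that a majority-based consensus protocol such as Paxos needs more than half of the voting replicas to agree on a write. A small sketch of that arithmetic (the function names are illustrative, and real configurations may include replica types that do not vote):

```python
def quorum_size(voting_replicas: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    return voting_replicas // 2 + 1


def tolerated_failures(voting_replicas: int) -> int:
    """How many voting replicas can fail while a majority is still reachable."""
    return voting_replicas - quorum_size(voting_replicas)


if __name__ == "__main__":
    for n in (3, 5):
        print(n, "replicas: quorum", quorum_size(n),
              "tolerates", tolerated_failures(n), "failure(s)")
```

With three voting replicas the quorum is two, so writes can still commit when a single zone is unavailable.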
Google Cloud Spanner provides security through IAM integration, with permissions and access configurable for groups and users at the instance and database level. Data stored within Google Cloud Spanner is also encrypted at rest. Comprehensive audit logging is provided for both Admin Activity and Data Access. Admin Activity logs include any operation that modifies the configuration or metadata of a resource. Data Access logs contain API calls that read the configuration or metadata of resources, as well as user-driven API calls that create, modify, or read database content.
Replication is used for both global availability and geographic locality, with fail-over between replicas being transparent to the client. Transactions are replicated using a Paxos distributed consensus protocol to ensure that each transaction is available in sufficient replicas before it is committed. Google Cloud Spanner automatically reshards data into splits and automatically migrates data across machines (even across datacenters) to balance load and to respond to failures. Spanner’s sharding considers the parent-child relationships in interleaved tables, and related data is migrated together to preserve query performance.
Google Cloud Spanner is exposed to applications through multiple channels:
- Client libraries
  - C#
  - C++
  - Go
  - Java
  - Node.js
  - PHP
  - Python
  - Ruby
- REST API, including support for instance and database management, as well as CRUD and more
- JDBC driver
- Google Cloud console, for full administration support, as well as control plane and data plane operations
- The gcloud command line tool, including instance and database management, full CRUD and also operations management
Data in Google Cloud Spanner can be queried/modified using SQL queries (ANSI 2011), Data Manipulation Language (DML) as well as mutations. Schema updates occur on the live database without requiring any downtime.
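The three access paths can be sketched with the Python client library. This is a minimal sketch under several assumptions: `database` is a database object obtained from the `google-cloud-spanner` package, the `Singers` table and its `SingerId` and `FirstName` columns are hypothetical, and the method names are those of the Python client as I understand it, so verify them against the current client documentation before relying on them.

```python
def query_singers(database):
    """Read rows with a SQL query inside a read-only snapshot."""
    with database.snapshot() as snapshot:
        rows = snapshot.execute_sql("SELECT SingerId, FirstName FROM Singers")
        return [tuple(row) for row in rows]


def insert_singers(database, rows):
    """Write rows with a mutation; the batch commits when the block exits."""
    with database.batch() as batch:
        batch.insert(
            table="Singers",  # hypothetical table
            columns=("SingerId", "FirstName"),  # hypothetical columns
            values=rows,
        )


def fill_missing_names(database):
    """Modify rows with a DML statement inside a read-write transaction."""

    def work(transaction):
        # execute_update returns the number of rows the statement modified.
        return transaction.execute_update(
            "UPDATE Singers SET FirstName = 'Unknown' WHERE FirstName IS NULL"
        )

    return database.run_in_transaction(work)
```

Mutations are a good fit for bulk writes that do not need to read first, while DML keeps everything in SQL and can read its own writes within the same transaction.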
Google Cloud Spanner provides external consistency for all transactions, which is a stronger guarantee than both ACID and Serializability. It does this by using TrueTime, a globally distributed clock with high accuracy and availability. TrueTime consists of synchronised GPS and atomic clocks with minimal drift. TrueTime generates monotonically increasing timestamps, which allows globally consistent reads across the database at a timestamp without requiring locks. These highly accurate timestamps provide serialization of transactions minimising transaction contention and failures due to timestamp clashes.
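To illustrate how timestamps make lock-free consistent reads possible, here is a toy multi-version store: every write is kept under its commit timestamp, and a read at timestamp `t` sees the latest version committed at or before `t`. This is a simplified illustration of the idea, not Spanner's implementation. The class and method names are hypothetical, and the timestamps are plain integers rather than TrueTime values.

```python
import bisect
from collections import defaultdict


class VersionedStore:
    """Toy multi-version store that supports reads at a chosen timestamp."""

    def __init__(self):
        # key -> parallel lists of commit timestamps (sorted) and values
        self._timestamps = defaultdict(list)
        self._values = defaultdict(list)

    def write(self, key, value, commit_ts):
        timestamps = self._timestamps[key]
        if timestamps and commit_ts <= timestamps[-1]:
            raise ValueError("commit timestamps must increase monotonically")
        timestamps.append(commit_ts)
        self._values[key].append(value)

    def read(self, key, read_ts):
        """Return the value visible at read_ts, or None if none existed yet."""
        timestamps = self._timestamps.get(key, [])
        index = bisect.bisect_right(timestamps, read_ts)
        return self._values[key][index - 1] if index else None
```

Because a reader only looks at versions at or before its own timestamp, later writes never disturb it, which is why no read locks are needed in this model.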
When reading data in Google Cloud Spanner in either a read-only transaction or a single read call, you can set a timestamp bound, which tells Google Cloud Spanner how to choose a timestamp at which to read the data. The document on reads provides further information on the types of reads and when they would be useful.
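As a rough sketch of what choosing a timestamp bound looks like in practice, the functions below contrast a strong read with a read at a fixed staleness. The assumptions: `database` is a database object from the `google-cloud-spanner` Python package, and `exact_staleness` is the keyword that client accepts for this bound as I understand it, so confirm it against the current client documentation. The helper names are my own.

```python
import datetime


def strong_read(database, sql):
    """Read the latest committed data (no explicit bound means a strong read)."""
    with database.snapshot() as snapshot:
        return list(snapshot.execute_sql(sql))


def stale_read(database, sql, staleness_seconds=15):
    """Read the data as it was exactly `staleness_seconds` ago."""
    staleness = datetime.timedelta(seconds=staleness_seconds)
    with database.snapshot(exact_staleness=staleness) as snapshot:
        return list(snapshot.execute_sql(sql))
```

A stale read trades freshness for latency, since a nearby replica can usually answer it without first checking that it holds the very latest data.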
Scaling is managed by the addition and removal of nodes:
- Adding nodes scales your Google Cloud Spanner instance linearly
- Splits are automatically distributed to the new nodes
- Each node allows for an additional 2TB of data storage
- Nodes provide additional compute resources to increase throughput
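The storage rule above gives a quick lower bound on instance size. Here is a minimal sketch, assuming the 2TB-per-node figure quoted in the list (check the current limits before using it) and an illustrative function name:

```python
import math

STORAGE_PER_NODE_TB = 2  # per-node figure quoted above; may change over time


def min_nodes_for_storage(data_tb: float) -> int:
    """Smallest node count whose combined storage limit covers data_tb."""
    return max(1, math.ceil(data_tb / STORAGE_PER_NODE_TB))


if __name__ == "__main__":
    for size in (0.5, 2, 7):
        print(f"{size} TB needs at least {min_nodes_for_storage(size)} node(s)")
```

This only covers storage. Throughput requirements may call for more nodes than the storage bound alone suggests.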
How Cloud Spanner compares to traditional databases
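Google Cloud Spanner differs from traditional databases in some key ways and is similar in others. In the comparison below, each line names a traditional database feature, followed by its Cloud Spanner counterpart.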
Relational tables: Relational with optimisation for Interleaved tables
Foreign Keys: Foreign Keys and Interleaved tables
SQL: SQL, DML and mutations
JDBC: Supported with 2 drivers
Client Libraries: Client Libraries for: C#, C++, Go, Java, Node.js, PHP, Python, Ruby
Stored procedures: Can use Cloud Functions to manage regular long-running transactions
Triggers: Triggers are not currently supported
Cursors: Paged results if required
Views: Not currently supported
Data Definition Language: Data Definition Language
Performance Schemas: Audit and Performance Schemas
ACID compliance: External Consistency (even more than ACID)
SSL support: SSL support Out of the box
Query caching: Query Restart tokens with some caching
Sub-SELECTs: Supports “sub-selects” and “with” clauses
Replication support: Fully managed replication with 0 downtime failover
Partitioned tables: Automatic table splits and sharding for performance and fail-over
Clustering: Fully managed multi-zone replication
Multiple storage engines: Fully managed optimised storage
Schema updates with potential downtime: Live schema updates with no downtime
Key links
IMPORTANT READING
There is an abundance of documentation for Google Cloud Spanner, though if you are starting out and need some more information based on what you just learned, the following links are worth reading too.
Monitoring with the Console or Cloud Monitoring
TrueTime and External Consistency
Life of Cloud Spanner Reads & Writes
If you are really keen to understand the clockwork inside Google Cloud Spanner, Google publishes the white papers that define the technology.