Google Cloud Spanner Technical Overview

Ash van der Spuy
Google Cloud - Community
Oct 21, 2020

Cloud Spanner is a relational database with transactional consistency at scale.

While reading and learning about Cloud Spanner, I realized there was a need for a short overview of it. Based on my notes and my understanding of the documentation, I put together the following overview in the hope that it will help others interested in this technology.

Google Cloud Spanner is fully managed and employs automatic sharding (the shards are called splits) and replication to scale up to millions of nodes and trillions of database rows while remaining highly available. Spanner is used at Google for large-scale, mission-critical applications that require strong consistency, including Google AdWords.

When to use Google Cloud Spanner

Google Cloud Spanner can be used to meet the following requirements for your application:

  • OLTP (Online Transactional Processing)
  • Global scale
  • Relational data model
  • ACID/Strong or External consistency
  • Low latency
  • Fully managed and highly available
  • Automatic replication

Google Cloud Spanner has been used in the following use cases:

Critical high load transactions

  • Financial trading
  • Insurance
  • Telecom and billing
  • Global call centers

Event-sourced systems

  • Supply-chain management and manufacturing
  • Logistics and Transportation
  • E-Commerce (High Availability)

Gaming

Features

Google Cloud Spanner is fully managed and requires no administration overhead. It maintains the following SLAs: multi-region deployments provide a 99.999% availability SLA, which equates to about 5 minutes of downtime per year, and single-region deployments provide a 99.99% availability SLA, or about 52.6 minutes of downtime per year. Database administrators are only required to choose the region configuration when the database is created, and then resize compute resources to manage performance at scale. Other administrative functions such as replication and sharding are managed for you, and system updates occur transparently without requiring database outages.
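
The downtime figures attached to each SLA follow from simple arithmetic. The sketch below assumes a 365-day year and uses a hypothetical helper name, max_downtime_minutes, purely for illustration:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def max_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, implied by an availability SLA."""
    unavailable_fraction = 1 - availability_percent / 100
    return MINUTES_PER_YEAR * unavailable_fraction


for sla in (99.999, 99.99):
    print(f"{sla}% -> {max_downtime_minutes(sla):.2f} minutes per year")
```

This works out to roughly 5.26 minutes per year for 99.999% and roughly 52.56 minutes per year for 99.99%.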

Because Cloud Spanner is fully managed, transient infrastructure failures are handled internally and do not need to be accounted for in the application layer. Transaction failures, whether due to potential deadlocks or other reasons, still need to be handled in the application layer.
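
One common way to handle aborted transactions in the application layer is a retry loop with backoff. The following is a minimal sketch, not the Spanner client's own logic: TransactionAborted and run_with_retry are hypothetical names introduced here for illustration.

```python
import random
import time


class TransactionAborted(Exception):
    """Hypothetical stand-in for the error raised when a transaction is aborted."""


def run_with_retry(work, max_attempts=5, base_delay=0.05):
    """Call work() until it succeeds, backing off after each abort."""
    for attempt in range(1, max_attempts + 1):
        try:
            return work()
        except TransactionAborted:
            if attempt == max_attempts:
                raise  # give up and surface the failure to the caller
            # Exponential backoff with jitter so competing transactions spread out.
            time.sleep(base_delay * (2 ** (attempt - 1)) * random.random())
```

In practice you may not need to write this yourself: the Python client's Database.run_in_transaction, for example, re-runs the supplied function when the transaction is aborted, so the function you pass in should be safe to execute more than once.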

Instances can be either regional or multi-regional. In a regional instance, data is bound to that region to provide locality. Multi-regional instances use Paxos-based replication, TrueTime, and leader election to provide global consistency and higher availability. Google Cloud Spanner instances have:

  • At least three read-write replicas of the database, each in a different zone
  • Zones that each act as a separate fault-isolation domain
  • The Paxos distributed consensus protocol for writes and transaction commits
  • Synchronous replication of writes, which commit once a quorum of replicas has accepted them
  • Availability even if one zone fails (99.999% availability SLA for multi-region and 99.99% availability SLA for regional)
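
The reason three replicas in three zones can survive the loss of one zone comes down to majority voting. The sketch below is a simplified illustration of that idea only, not Spanner's actual Paxos implementation, and the function names are hypothetical:

```python
def majority_quorum(voting_replicas: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return voting_replicas // 2 + 1


def can_commit(voting_replicas: int, failed_replicas: int) -> bool:
    """A write can commit while a majority of voting replicas is still reachable."""
    healthy = voting_replicas - failed_replicas
    return healthy >= majority_quorum(voting_replicas)


print(can_commit(3, 1))  # one zone down: a majority of 2 out of 3 remains
print(can_commit(3, 2))  # two zones down: no majority
```

With three voting replicas the quorum is two, so a single zone failure still leaves enough replicas to agree on a write.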

Google Cloud Spanner provides security through IAM integration, with permissions and access configurable for groups and users at the instance and database level. Data stored within Google Cloud Spanner is also encrypted at rest. Comprehensive audit logging is provided for both Admin Activity and Data Access. Admin Activity logs include any operation that modifies the configuration or metadata of a resource. Data Access logs contain API calls that read the configuration or metadata of resources, as well as user-driven API calls that create, modify, or read database content.

Replication is used for both global availability and geographic locality, with fail-over between replicas being transparent to the client. Transactions are replicated using the Paxos distributed consensus protocol, which ensures a write is stored on a sufficient number of replicas before it is committed. Google Cloud Spanner automatically reshards data into splits and automatically migrates data across machines (even across datacenters) to balance load and to respond to failures. Spanner's sharding takes the parent-child relationships of interleaved tables into account, so related data is migrated together to preserve query performance.
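
To see why interleaved parent and child rows can move together, it helps to picture rows ordered by primary key, with each child key starting with its parent's key. In Spanner DDL this relationship is declared with the INTERLEAVE IN PARENT clause. The sketch below is a toy model with hypothetical Singers and Albums rows, not a description of Spanner's storage engine:

```python
# Hypothetical rows keyed by primary-key tuples. Child rows (Albums) reuse the
# parent key (SingerId) as a prefix, which is what interleaving relies on.
rows = [
    ((2,), "Singer 2"),
    ((1, 10), "Album 10 of singer 1"),
    ((1,), "Singer 1"),
    ((2, 20), "Album 20 of singer 2"),
    ((1, 11), "Album 11 of singer 1"),
]


def split_by_parent(rows, boundary_singer_id):
    """Sort rows by key and cut them into two splits at a parent-key boundary."""
    ordered = sorted(rows)  # a parent key sorts just before its children
    left = [row for row in ordered if row[0][0] < boundary_singer_id]
    right = [row for row in ordered if row[0][0] >= boundary_singer_id]
    return left, right


left, right = split_by_parent(rows, boundary_singer_id=2)
print(left)   # singer 1 followed by both of its albums
print(right)  # singer 2 followed by its album
```

Because the cut falls between parents, each split holds a parent together with all of its children, so a query joining the two never has to cross splits.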

Google Cloud Spanner is exposed to applications through multiple channels:

  • Client libraries for C#, C++, Go, Java, Node.js, PHP, Python, and Ruby
  • REST API, including support for instance and database management, as well as CRUD and more
  • JDBC driver
  • Google Cloud console, for full administration support, as well as control plane and data plane operations
  • The gcloud command line tool, including instance and database management, full CRUD and also operations management

Data in Google Cloud Spanner can be queried using SQL (ANSI 2011 with extensions) and modified using Data Manipulation Language (DML) statements or mutations. Schema updates occur on the live database without requiring any downtime.
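
The three access styles look roughly like this with the Python client library. This is a sketch under assumptions: database is expected to be a Database object obtained from the google-cloud-spanner client, and the Singers table with its SingerId and FirstName columns is a hypothetical schema.

```python
def query_singers(database):
    """Read rows with a SQL query inside a read-only snapshot."""
    with database.snapshot() as snapshot:
        return list(snapshot.execute_sql("SELECT SingerId, FirstName FROM Singers"))


def rename_singer_with_dml(database):
    """Modify a row with a DML statement inside a read-write transaction."""

    def work(transaction):
        # execute_update returns the number of rows the statement changed.
        return transaction.execute_update(
            "UPDATE Singers SET FirstName = 'Marc' WHERE SingerId = 1"
        )

    return database.run_in_transaction(work)


def insert_singer_with_mutation(database, singer_id, first_name):
    """Write a row with a mutation, which is applied when the batch exits."""
    with database.batch() as batch:
        batch.insert(
            table="Singers",
            columns=("SingerId", "FirstName"),
            values=[(singer_id, first_name)],
        )
```

DML reads naturally if you already think in SQL, while mutations describe the rows to write directly and are buffered until commit.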

Google Cloud Spanner provides external consistency for all transactions, which is a stronger guarantee than both ACID and serializability. It does this by using TrueTime, a globally distributed clock with high accuracy and availability. TrueTime is built on synchronised GPS and atomic clocks with minimal drift. It generates monotonically increasing timestamps, which allow globally consistent reads across the database at a given timestamp without requiring locks. These highly accurate timestamps let Spanner serialize transactions, minimising transaction contention and failures due to timestamp clashes.
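
The link between increasing timestamps and lock-free reads can be illustrated with a tiny multi-version store: every write is kept under its commit timestamp, and a read at timestamp T simply picks the newest version at or before T. This is a conceptual sketch with a hypothetical VersionedCell class, not Spanner's storage format:

```python
import bisect


class VersionedCell:
    """Keeps every committed version of one value, ordered by commit timestamp."""

    def __init__(self):
        self._timestamps = []  # strictly increasing commit timestamps
        self._values = []

    def write(self, commit_ts, value):
        # Monotonically increasing timestamps mean new versions are always appended.
        if self._timestamps and commit_ts <= self._timestamps[-1]:
            raise ValueError("commit timestamps must increase")
        self._timestamps.append(commit_ts)
        self._values.append(value)

    def read_at(self, read_ts):
        """Return the newest value committed at or before read_ts, or None."""
        index = bisect.bisect_right(self._timestamps, read_ts)
        return self._values[index - 1] if index else None


cell = VersionedCell()
cell.write(100, "v1")
cell.write(200, "v2")
print(cell.read_at(150))  # v1
```

A reader at timestamp 150 never blocks the writer at 200 and never sees its result, because older versions stay readable.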

When reading data in Google Cloud Spanner in either a read-only transaction or a single read call, you can set a timestamp bound, which tells Google Cloud Spanner how to choose a timestamp at which to read the data. The Cloud Spanner documentation on reads provides further information on the types of reads and when they would be useful.
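
With the Python client library, a timestamp bound is passed when the snapshot is created. The sketch below assumes database is a Database object from the google-cloud-spanner client; the function names and the 15-second default are illustrative choices only.

```python
import datetime


def read_strong(database, sql):
    """Strong read (the default): sees everything committed before the read started."""
    with database.snapshot() as snapshot:
        return list(snapshot.execute_sql(sql))


def read_stale(database, sql, seconds=15):
    """Stale read: reads the data as it was exactly `seconds` ago."""
    staleness = datetime.timedelta(seconds=seconds)
    with database.snapshot(exact_staleness=staleness) as snapshot:
        return list(snapshot.execute_sql(sql))
```

Stale reads trade freshness for latency, which can be a reasonable choice when slightly old data is acceptable.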

Scaling is managed by the addition and removal of nodes:

  • Adding nodes scales your Google Cloud Spanner instance linearly
  • Splits are automatically distributed to the new nodes
  • Each node allows for an additional 2TB of data storage
  • Nodes provide additional compute resources to increase throughput
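
As a rough sizing aid, the storage figure above gives a lower bound on node count. The sketch below uses the 2TB per node number quoted in this article as an assumption (check the current quotas before relying on it), and nodes_needed_for_storage is a hypothetical helper:

```python
import math

STORAGE_PER_NODE_TB = 2  # figure quoted in this article; an assumption, may differ today


def nodes_needed_for_storage(data_tb: float) -> int:
    """Return the minimum node count that can hold data_tb terabytes."""
    return max(1, math.ceil(data_tb / STORAGE_PER_NODE_TB))


print(nodes_needed_for_storage(5))  # 3
```

This only covers storage. Throughput requirements may call for more nodes than the storage minimum.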

How Cloud Spanner compares to traditional databases

Google Cloud Spanner differs from traditional databases in some key ways and is similar in others. Each line below names a traditional database feature, followed by how Cloud Spanner handles it.

Relational tables: Relational with optimisation for Interleaved tables

Foreign Keys: Foreign Keys and Interleaved tables

SQL: SQL, DML and mutations

JDBC: Supported with 2 drivers

Client Libraries: Client Libraries for: C#, C++, Go, Java, Node.js, PHP, Python, Ruby

Stored procedures: Can use cloud functions to manage regular long running transactions

Triggers: Triggers are not currently supported

Cursors: Paged results if required

Views: Not currently supported

Data Definition Language: Data Definition Language

Performance Schemas: Audit and Performance Schemas

ACID compliance: External Consistency (even more than ACID)

SSL support: Supported out of the box

Query caching: Query Restart tokens with some caching

Sub-SELECTs: Supports “sub-selects” and “with” clauses

Replication support: Fully managed replication with 0 downtime failover

Partitioned tables: Automatic table splits and sharding for performance and fail-over

Clustering: Fully managed multi-zone replication

Multiple storage engines: Fully managed optimised storage

Schema updates with potential downtime: Live schema updates with no downtime

Key links

IMPORTANT READING

There is an abundance of documentation for Google Cloud Spanner, though if you are starting out and need some more information based on what you just learned, the following links are worth reading too.

Best Practices

Cloud Platform Youtube Videos

Monitoring with the Console or Cloud Monitoring

Latency Metrics

Latency Troubleshooting Demo

Quotas and Limits

Audit Logging

Client Libraries

Event-Sourced Systems

DDL Reference

Gaming Best Practice

SQL Syntax

Query Execution Plans

TrueTime and External Consistency

CPU Utilisation Metrics

Information Schema

Life of Cloud Spanner Reads & Writes

Sessions

Bulk Loading Best Practice

Access Control

Long Running Operations

Performance Regressions

If you are really keen to understand the clockwork inside Google Cloud Spanner, Google publishes the white papers that define the technology.
