
6 Steps to Migrate to Cloud Spanner

Jerene Yang
Google Cloud - Community
4 min read · Jul 6, 2020


This is an overview of the steps necessary to migrate to Cloud Spanner with some application downtime (zero-downtime migration is also possible with Cloud Spanner, but is outside the scope of this blog post). We assume you have evaluated Cloud Spanner (perhaps using our free emulator; try it here!) and determined it is optimal for your use case. This blog post aims to provide a framework for the migration process. Depending on your source database, some steps may be more involved than others.

Step 1: Size Instance and Prepare App

Based on your QPS and data-size requirements, determine the number of Spanner nodes you will need. You can use 10,000 read QPS / 2,000 write QPS per Spanner node as a starting point, but note that actual throughput varies with the size of your requests and your read/write mix.
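As a back-of-the-envelope sizing aid, the rule of thumb above can be turned into a small helper. This is a sketch, not an official formula: the function name, the headroom factor, and the per-node numbers are illustrative defaults you should replace with your own load-test results.

```python
import math

def estimate_spanner_nodes(read_qps, write_qps,
                           reads_per_node=10_000, writes_per_node=2_000,
                           headroom=1.3):
    """Rough node-count estimate from the per-node rules of thumb.

    headroom adds spare capacity on top of the raw estimate; actual
    throughput depends on request size and read/write mix, so treat
    this as a starting point for load testing, not a final answer.
    """
    nodes_for_reads = read_qps / reads_per_node
    nodes_for_writes = write_qps / writes_per_node
    return max(1, math.ceil(max(nodes_for_reads, nodes_for_writes) * headroom))

# Example: 50,000 reads/s and 6,000 writes/s
print(estimate_spanner_nodes(50_000, 6_000))  # -> 7
```

The estimate is driven by whichever of reads or writes needs more nodes, since both must fit on the same instance.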

To prepare your app for migration, create one database access object for your original database and one for Spanner, and make it easy to toggle between the two with a flag. This will greatly help you later in the migration process. Optionally, you can set up dual writes (writes to both the original database and Spanner) to reduce the impact on users in case the migration is not successful. NOTE: The purpose of dual writes is to serve as a fallback mechanism, not to reduce the amount of downtime the application experiences.
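One way to structure the flag-driven toggle and the optional dual writes is sketched below. The class and function names are illustrative; `InMemoryDao` stands in for your real access objects, and in production you would wrap your actual database clients instead.

```python
class InMemoryDao:
    """Stand-in for a real database access object."""
    def __init__(self):
        self.rows = {}
    def write(self, key, value):
        self.rows[key] = value
    def read(self, key):
        return self.rows.get(key)

class DualWriteDao:
    """Reads from the primary; writes go to both databases.

    The secondary copy exists only as a fallback if the migration
    fails, not to shrink the downtime window.
    """
    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary
    def write(self, key, value):
        self.primary.write(key, value)
        self.secondary.write(key, value)
    def read(self, key):
        return self.primary.read(key)

def make_dao(use_spanner, dual_write, original, spanner):
    """Pick the active DAO from a flag, optionally wrapping in dual writes."""
    primary = spanner if use_spanner else original
    if dual_write:
        secondary = original if use_spanner else spanner
        return DualWriteDao(primary, secondary)
    return primary
```

Flipping `use_spanner` at cutover then becomes a config change rather than a code change, which is exactly what makes a fast rollback possible.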

Step 2: Load Test

To ensure you have provisioned the instance sufficiently for your workload, run a load test against it with a synthetic but representative workload. Track QPS; P50, P95, and P99 latencies for reads and writes; throughput; CPU utilization; and any other metrics you are interested in. If CPU utilization exceeds 65% for regional instances or 45% for multi-regional instances, you will need to scale up. The load test should run over a long period of time (several hours) for reliable results, because the instance needs time to tune itself to the workload for optimal performance. We also recommend rerunning the test on larger and smaller instance sizes. When in doubt, err on the side of having more nodes in the instance.

Step 3: Bulk Migration

Take a snapshot/export of your current database and import it into Spanner. (If you are doing a bulk migration only, with no deltas, turn off your application before taking the snapshot.) If you are migrating from another Google Cloud database (or from another Cloud Spanner instance, say from a regional instance to a multi-regional one), we recommend using Dataflow. Follow the instructions here on how to export from your origin database and import into Spanner. If you are migrating from a non-Google Cloud source, refer to your current database's documentation regarding how best to perform this step.

If moving the entire dataset takes longer than the allowable downtime of the application, you will need to perform a one-time bulk migration followed by a one-time delta migration.
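The decision above, together with TIP 1's node scaling, can be sketched as a quick estimate. The throughput model here (migration speed scaling linearly with node count) is an assumption for illustration; measure your actual export/import rates and plug those in.

```python
def migration_plan(dataset_gb, gb_per_hour_per_node, nodes,
                   downtime_budget_hours):
    """Estimate bulk-migration time and whether a delta phase is needed.

    Assumes throughput scales roughly linearly with node count, which
    is an approximation; validate with a test migration of a subset.
    """
    bulk_hours = dataset_gb / (gb_per_hour_per_node * nodes)
    return {
        "bulk_hours": bulk_hours,
        "delta_needed": bulk_hours > downtime_budget_hours,
    }

# 1 TB at a measured 50 GB/hour/node on 4 nodes, 2-hour downtime budget:
plan = migration_plan(1000, 50, 4, 2)
print(plan)  # -> {'bulk_hours': 5.0, 'delta_needed': True}
```

Rerunning this with more nodes shows how far TIP 1's temporary scale-up can shrink the bulk phase before a delta migration becomes unavoidable.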

TIP 1: You can scale up your Spanner instance for this stage to decrease the time the data migration takes, and scale down post-migration. Conduct some tests to get a sense of how your migration time changes as you increase the number of nodes.

TIP 2: A one-time bulk migration is a lot simpler than bulk + delta. Think of your database as a collection of tables: determine which tables can be migrated beforehand while the application is still live, and check whether the allowable downtime can accommodate just the tables that experience rapid changes. For example, log data and write-only data can be moved beforehand.

If a delta migration is needed, you will need to turn on Change Data Capture (CDC) or have some other way to identify data that has changed since the bulk migration was done (timestamps, etc.).
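For the timestamp-based approach, the delta selection amounts to a filter on a last-modified column. This sketch assumes every table carries an `updated_at` column, which is my assumption for illustration; with real CDC you would consume a change stream instead of scanning for timestamps.

```python
def extract_delta(rows, snapshot_time):
    """Return the rows changed after the bulk-migration snapshot.

    Assumes each row dict carries an 'updated_at' timestamp that is
    set on every write; rows updated at or before the snapshot were
    already captured by the bulk export.
    """
    return [r for r in rows if r["updated_at"] > snapshot_time]

rows = [
    {"id": 1, "updated_at": 5},   # captured by the bulk export
    {"id": 2, "updated_at": 12},  # changed afterwards: part of the delta
]
print(extract_delta(rows, snapshot_time=10))  # -> [{'id': 2, 'updated_at': 12}]
```

Note the strict inequality: a row touched exactly at the snapshot boundary is assumed to be in the export. If your export semantics differ, adjust the comparison accordingly.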

Step 3b: Delta Migration (optional: only for applications whose allowable downtime is less than the time needed to move the entire dataset)

After the bulk migration is complete and validated, turn off the application. Begin the delta migration by moving the new data that has come in since the snapshot/export. We recommend using a Dataflow template for this as well, to parallelize the migration.

Step 4: Validation

After delta migration is done, we recommend you run a few validation queries to ensure that the migration was successful. If anything goes wrong, you can fall back to your origin database.
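A simple and effective validation query is a per-table row count on both sides, compared table by table. The sketch below works on count dictionaries you would populate with `SELECT COUNT(*)` results from each database; the function name and shape are illustrative, and for stronger guarantees you would add per-column checksums or sampled row comparisons.

```python
def validate_migration(source_counts, spanner_counts):
    """Compare per-table row counts from the two databases.

    Returns a dict of {table: (source_count, spanner_count)} for every
    table that disagrees or is missing on one side; empty means the
    counts match everywhere.
    """
    mismatches = {}
    for table in set(source_counts) | set(spanner_counts):
        a = source_counts.get(table)
        b = spanner_counts.get(table)
        if a != b:
            mismatches[table] = (a, b)
    return mismatches

print(validate_migration({"users": 100, "orders": 250},
                         {"users": 100, "orders": 249}))
# -> {'orders': (250, 249)}
```

Run the counts while the application is still off, so neither side is moving under you; a non-empty result is the signal to fall back to the origin database.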

Step 5: Warm Up Spanner and Activate Application

If you can activate your application for users in stages (for example, give only 10% of users access, then ramp up to 20%, and so on), you will not need to warm up Spanner. Otherwise, we recommend running simulated production traffic (note: non-representative traffic may end up making latencies worse) for 30 minutes to 1 hour before activating your application. This allows Spanner to tune itself to best serve your workload. Without warmup, you may experience high tail latency immediately after activating the application, though it will eventually stabilize.
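One common way to implement the staged activation is to hash each user id into a stable bucket and admit the first N percent of buckets. This is a generic rollout sketch, not Spanner-specific; the function name is illustrative.

```python
import hashlib

def in_rollout(user_id, percent):
    """Deterministically place a user in the first `percent` of traffic.

    Hashing the id into one of 100 stable buckets means each ramp step
    (10% -> 20% -> ...) is a superset of the previous one, so users who
    are already on the new path never flip back as the ramp widens.
    """
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < percent

# During the ramp, route admitted users to the Spanner-backed DAO and
# everyone else to the original database via your Step 1 flag.
```

Because assignment is deterministic, the ramp is also easy to reason about during an incident: a given user is always on exactly one side of the migration at any ramp level.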

Reads and writes should now be served entirely from Spanner. If you have dual writes activated, you can optionally fall back to your original database in the event something goes wrong, and you can also validate that Spanner and your original database are returning the same results.

If you have any further questions, reach out to us here!
