Online Schema Migration with MySQL on GCP (Cloud SQL)
This is a note on trying online schema migration with MySQL on Cloud SQL. I tried those two tools:
- pt-online-schema-change
- gh-ost
Terms
I tried them with MySQL under the following conditions:
- Cloud SQL Second Generation
- MySQL Version 5.7 (default as of writing)
- Binary logging is enabled
- Access via Cloud SQL Proxy
Online DDL
The first thing that comes to mind for when performing online migration is online DDL. However, there are limitations.
The main advantage of online DDL is that you don’t need to rely on any additional tools, as MySQL provides all of them for you. Under certain conditions, you can access the table being migrated (Concurrent DML) and perform an in-place migration that doesn’t require a large amount of disk space or I/O.
However, there is a major concern regarding Online DDL: even if you run a query with LOCK=None
MySQL will only perform it online on the master, blocking the replicas from applying DML until the master completes your DDL. Therefore, running DDL on a large table can cause significant replication lag.
In addition, it is not possible to throttle or pause the migration process, and so the load on the master cannot be controlled.
These problems are not specific to Cloud SQL, but to MySQL (InnoDB) in general.
If you don’t care about load or replication lag, online DDL might be a good option, but it wouldn’t be a common case if you run a service that offers a higher service level. So, I recently tried out two online migration tools to see if they could be used with Cloud SQL.
pt-online-schema-change (pt-osc)
Percona’s pt-osc is an online schema migration tool that has been popular for many years.
It creates a new table with the desired schema, then starts copying rows from the original table. It also creates triggers to mirror the changes that occurred in the original table during the migration process to the new table. Finally, pt-osc renames the table, switching between the original table and the new table.
Run pt-osc with Cloud SQL
When I tried to run pt-osc with Cloud SQL, I had to pay a touch of attention to the following points.
1) log_bin_trust_function_creators
must be enabled
As pt-osc creates triggers, it requires a user with SUPER previlige but this is impossible in Cloud SQL. Instead, you can enable log_bin_trust_function_creators
in your instance to allow all users to create stored functions and triggers. You can find how to change database flags on Cloud SQL here.
2) Explicitly specify the replica with --check-slave-lag
pt-osc monitors replication lag, pauses and throttles itself if it detects a lag greater than --max-lag
. pt-osc tries to find replicas by runningSHOW SLAVE HOSTS
and SHOW PROCESSLIST
on the master by default. However, as long as you have pt-osc connect to MySQL servers via Cloud SQL Proxy, pt-osc should not be able to reach the host information retrieved from the MySQL server. You would see the following warning message:
No slaves found. See — recursion-method if host localhost has slaves.
Not checking slave lag because no slaves were found and — check-slave-lag was not specified.
pt-osc provides a way of explicitly passing replica host information. I was able to get it to recognize replica by adding a flag like —-check-salve-lag D=foo,h=127.0.0.1,P=33001
. To see if this has succeeded, you can check the log.
Will check slave lag on:
localhost -> 127.0.0.1:33001
Example of pt-osc command execution
$ pt-online-schema-change \
--alter=“DROP COLUNN bar ”D=test,t=foo,h=127.0.0.1,P=33000,u=root,p=pass \
--check-slave-lag="D=test,h=127.0.0.1,P=33001,u=root,p=pass" \
--print \
--execute
gh-ost
gh-ost is designed as TRIGGER-less to avoid problems caused by triggers. You can see here what kind of problems triggers could cause.
Online migration with gh-ost goes like this: gh-ost creates a new table with the desired schema and copies rows from the original table to the new one. To mirror the changes that occurred in the original table to the new table, gh-ost connects to MySQL server and receives binlog events as though it were a replica server. It then extracts the changes from the binlog events and applies them to the new table in the master. Finally, it performs the final phase called cut-over, where it switches the tables.
Thanks to the triggerless design, gh-ost provides a more flexible throttling and some interactive commands that can control its behavior without pausing the migration process.
Run gh-ost with Cloud SQL
When I tried to run gh-ost with Cloud SQL, I had to pay a touch of attention to the following points.
1) run with --allow-on-master
flag
With gh-ost, it is recommended the binlog events be read from a replica in order to reduce the load on the master. It also offers a way to safely test the migration process, where you don’t have to touch the master server by having gh-ost connect to the replica (with — migrate-on-replica
or--test-on-replica
option).
However, as Cloud SQL doesn't allow to create a replica of a replica, there is no way to have gh-ost receive binlogs from a replica with Cloud SQL. (As mentioned here, it might be possible with an external replica, though)
To have gh-ost connect to the master, --allow-on-master
flag is required. Please see this document for more details.
2) run with --gcp
flag
I faced the following error when running via Cloud SQL Proxy without --gcp
option:
FATAL Unexpected database port reported: 3306
The document says it’s for the 1st generation, but it seems to be required if you run it via a tunnel connection (i.e. if the port specified for the connection is different from the port that is actually being served).
Example of gh-ost command execution
$ gh-ost \
--max-load=Threads_running=25 \
--critical-load=Threads_running=1000 \
--chunk-size=1000 \
--throttle-control-replicas=127.0.0.1:33001 \
--max-lag-millis=3000 \
--user=root \
--password=pass \
--host=127.0.0.1 \
--port=33000 \
--allow-on-master \
--database=test \
--table=foo \
--verbose \
--alter="DROP COLUMN bar" \
--default-retries=120 \
--panic-flag-file=/tmp/host.panic.flag \
--serve-socket-file=/tmp/ghost.test.foo.sock \
--gcp \
--execute
Afterword
I found that both tools somehow work with Cloud SQL. The next step is to run it in my production environment. I will update this note if I find any issues specific to Cloud SQL with schema migration throughout operations.
Works Cited: