Moving from AWS Aurora to Google CloudSQL

Karst Hammer
Published in YoungCapital dev
Aug 28, 2018 · 7 min read

YoungCapital has been using the Amazon Web Services ecosystem for many years, but with the recent developments in Google Cloud we have decided to move our infrastructure there completely. It is a big and time-consuming operation that involves transferring many fragile projects (monoliths and microservices) and technologies.

Transferring our enormous databases without causing downtime turned out to be one of the biggest challenges we have faced so far.

When we first started exploring the options for transferring our databases, there was no native way to launch a Google CloudSQL 2nd generation instance that replicated from an external master (AWS Aurora in our case). This meant we had to rely on external tools to provide this functionality.

We tried using Tungsten Replicator to handle the replication by reading from our Aurora instance and writing to a new Google CloudSQL instance, but we quickly ran into several issues that made us reconsider this choice. Luckily for us, just as we were trying to get Tungsten to work, we noticed that Google was offering a new migration feature that allows CloudSQL to replicate from an external master (https://cloud.google.com/sql/docs/mysql/migrate-data).

There is, however, one caveat that prevented us from using our AWS Aurora machine directly as an external master. One of the requirements Google poses on the external master is support for GTID transactions, which AWS Aurora unfortunately does not offer.

We attempted to set up a custom MySQL machine between our Aurora machine and CloudSQL to see whether MySQL natively offers a way to read from a server using ANONYMOUS transactions (non-GTID) while acting as a master to a GTID-enabled slave. This attempt failed out of the box, since standard MySQL only supports either ANONYMOUS or GTID transactions, and the setting needs to be consistent across the entire replication chain.

We found help in an article posted by Booking.com that explains how they introduced GTID-based replication into their MySQL setup. While our goal was different, we were essentially running into the same problem: how do we replicate from a master using ANONYMOUS transactions to a GTID-enabled slave?

The solution proposed in the article involves a patched MySQL that disables the verification that GTID-based replication is enabled across the entire chain, a mode they named ANONYMOUS_IN-GTID_OUT. They also created a feature request in the MySQL bug tracker to have this functionality supported in upstream MySQL, but it has not been released yet.

Inspired by their article, we built a new version of MySQL that includes the ANONYMOUS_IN-GTID_OUT patch to see whether we could set up replication between Aurora and Google Cloud with the patched MySQL in between.

Our attempts succeeded, and we were able to get replication working between AWS Aurora and Google CloudSQL using the patched MySQL as a middle-man. It takes ANONYMOUS transactions from AWS and adds GTIDs to all transactions, allowing direct replication to Google Cloud.

Building the patched MySQL

We use Amazon Linux on our EC2 instances in AWS, so the easiest way to build a custom MySQL was to spin up a new machine in Amazon and use the MySQL-provided source RPM package as a starting point. We used the MySQL 5.6.41 RHEL 6 source package downloaded from the MySQL website.

# Download the MySQL source RPM
curl -LO https://dev.mysql.com/get/Downloads/MySQL-5.6/MySQL-5.6.41-1.el6.src.rpm
# Extract the source RPM package
rpm -i MySQL-5.6.41-1.el6.src.rpm
# cd into the extracted package
cd rpmbuild/SOURCES
# And extract the MySQL sources
tar -xzvf mysql-5.6.41.tar.gz

Now we are ready to apply the patch suggested by the Booking.com article. Open the mysql-5.6.41/sql/rpl_slave.cc file, look for the get_master_version_and_clock() function and remove the following code block:

if (mi->master_gtid_mode > gtid_mode + 1 ||
    gtid_mode > mi->master_gtid_mode + 1)
{
  mi->report(ERROR_LEVEL, ER_SLAVE_FATAL_ERROR,
             "The slave IO thread stops because the master has "
             "@@GLOBAL.GTID_MODE %s and this server has "
             "@@GLOBAL.GTID_MODE %s",
             gtid_mode_names[mi->master_gtid_mode],
             gtid_mode_names[gtid_mode]);
  DBUG_RETURN(1);
}
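Rather than editing the file in place on every build machine, the change can also be captured as a patch file so it is easy to re-apply to a clean source tree later. A minimal sketch, assuming the directory layout from the extracted source package above:

# keep a pristine copy outside the source tree, remove the block shown above from the working copy, then record the diff
cp mysql-5.6.41/sql/rpl_slave.cc /tmp/rpl_slave.cc.orig
# (edit mysql-5.6.41/sql/rpl_slave.cc and delete the GTID_MODE check)
diff -u /tmp/rpl_slave.cc.orig mysql-5.6.41/sql/rpl_slave.cc > anonymous-in-gtid-out.patch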

Once this code block is removed, we can build a new set of packages to install the patched MySQL on our machines:

# Remove the original tar.gz file
rm mysql-5.6.41.tar.gz
# Build a new tar.gz file using the changed source code
tar czf mysql-5.6.41.tar.gz mysql-5.6.41/
# And remove the directory
rm -rf mysql-5.6.41/
# To build the new packages we need some dependencies
sudo yum install -y rpm-build gcc gperf ncurses-devel zlib-devel gcc-c++ libaio-devel cmake openssl-devel cyrus-sasl-devel openldap-devel perl-Data-Dumper perl-JSON
# And build the RPM from the rpmbuild root (this will take a while)
cd ~/rpmbuild
rpmbuild -ba SPECS/mysql.spec

We have a new set of RPM packages in the rpmbuild/RPMS folder after rpmbuild has finished. We downloaded these RPM files from the machine and stored them in a Google Cloud Storage bucket.
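For reference, copying the freshly built packages to a bucket can be done with gsutil, assuming the Google Cloud SDK is installed and authenticated on the build machine and the bucket name below is replaced with your own:

gsutil cp ~/rpmbuild/RPMS/x86_64/MySQL-*.rpm gs://<rpm-bucket>/mysql-rpms/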

Setting up the replication

Installing the middle-man MySQL machine

At this stage we have our newly patched MySQL. It is time to set up the middle-man MySQL machine.

Create a new EC2 instance, preferably in the same zone as the database you want to replicate, and install the custom MySQL using the RPMs we've created.

yum install perl-Data-Dumper MySQL-shared.rpm MySQL-server.rpm MySQL-client.rpm
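The RPMs of course need to be on the new machine before that install can run; if they were stored in a GCS bucket as described above, fetching them could look like this (assuming the Google Cloud SDK is available on the EC2 instance):

gsutil cp gs://<rpm-bucket>/mysql-rpms/*.rpm .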

Add the following configuration settings to my.cnf after installing the RPMs:

[mysqld]
# server-id is required when binary logging is enabled; any unique, non-zero value works
server-id = 2
gtid_mode = ON
enforce-gtid-consistency = ON
log-bin = /var/lib/mysql/mysql-bin
log-slave-updates = ON
expire-logs-days = 14
sync-binlog = 1
binlog-format = ROW
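After restarting mysqld with this configuration, it is worth verifying that GTID mode is really active on the middle-man before going any further; a quick check:

mysql -u root -e "SHOW GLOBAL VARIABLES LIKE 'gtid_mode'; SHOW GLOBAL VARIABLES LIKE 'enforce_gtid_consistency';"

Both variables should report ON.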

With these settings in place, we’re ready to start the actual replication.

Starting the replication

Starting replication from an existing machine is usually done by calling mysqldump with the master-data=1 option enabled, but Aurora doesn't support this. Luckily, there is a trick that lets us get a consistent snapshot with the correct master data position: make a snapshot of the Aurora instance you want to replicate, and restore that snapshot to a new Aurora instance.
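The snapshot and restore can be done from the RDS console, or scripted with the AWS CLI; a rough sketch with placeholder identifiers (note that a restored Aurora cluster also needs an instance added before it accepts connections):

aws rds create-db-cluster-snapshot --db-cluster-identifier <original-cluster> --db-cluster-snapshot-identifier migration-snapshot
aws rds restore-db-cluster-from-snapshot --db-cluster-identifier <snapshot-cluster> --snapshot-identifier migration-snapshot --engine aurora
aws rds create-db-instance --db-instance-identifier <snapshot-instance> --db-cluster-identifier <snapshot-cluster> --db-instance-class db.r4.large --engine aurora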

On the new Aurora machine, the RDS dashboard shows an event containing the exact binlog file and position that was used to restore the snapshot. Write this down somewhere; we will need it later.
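The event can also be retrieved with the AWS CLI instead of the console; something along these lines lists the recent events for the restored instance (identifier is a placeholder):

aws rds describe-events --source-type db-instance --source-identifier <snapshot-instance> --duration 1440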

It's finally time to do a mysqldump of the snapshot machine and restore it to the middle-man MySQL we've just set up.

mysqldump -h <snapshot-hostname> -u <username> --password=<password> --databases <list-of-databases> --order-by-primary --skip-add-drop-table --skip-add-locks --skip-disable-keys --skip-set-charset --no-autocommit --default-character-set=utf8 --single-transaction | mysql -u root
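The replication setup below assumes a replication user exists on the original Aurora instance; if it does not yet, it can be created along these lines (the user name and password match the placeholders used in the next statement):

mysql -h <original-rds-hostname> -u <admin-user> -p -e "CREATE USER 'replication'@'%' IDENTIFIED BY 'mysecretpassword'; GRANT REPLICATION SLAVE ON *.* TO 'replication'@'%';"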

Configure (but don’t start) the replication from our original Aurora to the middle-man MySQL:

CHANGE MASTER TO MASTER_HOST = '<original-rds-hostname>', MASTER_PORT = 3306, MASTER_USER = 'replication', MASTER_PASSWORD = 'mysecretpassword', MASTER_LOG_FILE = '<binlog-file>', MASTER_LOG_POS = <binlog-pos>;

It is important to note that we replicate from the original Aurora instance, not from the snapshot. The snapshot is only used to get a consistent start position. MASTER_LOG_FILE and MASTER_LOG_POS should be filled with the values from the event that was logged when the snapshot restore finished.

Finally, we create a backup of the middle-man MySQL to be used as the starting point for the Google CloudSQL migration process. Create a Google Cloud Storage bucket in the project you want to migrate the database to, and start a backup from the middle-man MySQL machine to the bucket:

mysqldump -u root --databases <list-of-databases> --order-by-primary --skip-add-drop-table --skip-add-locks --skip-disable-keys --skip-set-charset --no-autocommit --default-character-set=utf8 --single-transaction --set-gtid-purged=ON --master-data=1 | gzip | gsutil cp - gs://<your-gcs-bucket>/dump.sql.gz

Having created the backup, start the replication by running START SLAVE on the middle-man MySQL machine. We are now finally ready to start the Migrate data process in Google CloudSQL.
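A quick way to confirm that the middle-man is replicating and catching up before kicking off the CloudSQL side (run on the middle-man machine):

mysql -u root -e "START SLAVE;"
mysql -u root -e "SHOW SLAVE STATUS\G" | grep -E 'Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master'

Both threads should report Yes, and Seconds_Behind_Master should drop towards zero as the middle-man catches up.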

Point Google Cloud to use the middle-man MySQL machine as a source, follow the guide, and you should be good to go! Once Google Cloud finishes reading the database dump we put in GCS, it should automatically start replicating.

Automating these steps

Having done this by hand multiple times and verified that it worked on multiple sets of databases, we decided to automate as many steps as we could.

We created a Terraform project that automates setting up the EC2 machine, creating the GCS bucket, getting credentials, and placing a scripted version of the commands mentioned above on the EC2 machine.

With the Terraform automation in place, we only need to log in to the machine and start the script to kick off the backups and start the replication. After the script succeeds, it outputs instructions for setting up the replication on Google Cloud.
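Using the project follows the usual Terraform workflow; the variables (region, instance type, bucket name and so on) depend on your setup:

terraform init
terraform plan
terraform apply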

We published the Terraform code and shell scripts we wrote to GitHub; the repository can be found at https://github.com/karsthammer/migrate-rds-to-cloudsql.

Finding this solution took quite some time. Our Google searches didn't turn up many results for moving a database between AWS and Google, and experimenting with some of the suggested external tools took more time than we originally expected.

We hope that this article can help others who are facing the same issue, whether it is moving between cloud providers or moving between different versions of MySQL.
