How we migrated our Postgres database to Google Cloud SQL with minimal downtime.
In a recent post we described the migration of Mavenlink’s integrations platform from Heroku to Kubernetes running on Google Kubernetes Engine. Part of that process involved migrating 35G of data from Heroku’s hosted Postgres service to Google’s Cloud SQL service.
We were able to complete the majority of the migration of our production app to Kubernetes without tangible downtime. We did this by spinning up new worker and web pods on the Kubernetes cluster and pointing them to the original database. We could then point the DNS to the Google Cloud load balancer, and process incoming traffic seamlessly on either system while that propagated.
Finding the Right Approach
Option 1: Replication with Seamless Cutover
Our first thought for migrating the Postgres database was replication. In Heroku we had a primary database and several followers, giving us both redundancy and the ability to keep the app up while the infrastructure was rotated or upgraded. In theory we could create a replica in the destination system following the primary in the source system, and then switch our app to use the replica as a primary with no interruption. Unfortunately, Cloud SQL doesn’t support replicating from external databases so we had to table that option.
Option 2: Dump, Transfer, and Import
Next, we shifted our focus to transferring the data quickly so that the downtime required would be minimal. We considered a few options. Google has some helpful documentation for exporting and importing data to Cloud SQL, including a particularly useful command to format a dump correctly for use with
gcloud sql import.
pg_dump -U [USERNAME] --format=plain --no-owner --no-acl [DATABASE_NAME] | sed -E 's/(DROP|CREATE|COMMENT ON) EXTENSION/-- \1 EXTENSION/g' > [SQL_FILE].sql
The rub there for a production migration is that we’re still stuck with a three-step operation, each with its own transfer.
- First, we would export the data using
pg_dump. This could be optimized by running the export as close to the database as possible, like in a Heroku dyno in our case.
- Then we have to transfer the dump to Google Cloud Storage, which is the only storage device from which Cloud SQL can import. If you controlled the source system you could use GCS FUSE to mount your GCS bucket into the filesystem, export directly there, and eliminate this step.
- Finally, we would run the import. This is presumably quite optimized because Google owns both the storage bucket and the SQL server. There may be some delay here before the import is kicked off as well. They don’t specify import performance in their SLA, but there is a ‘pending’ status. This is probably minimal, but in a production migration we want to control as many variables as possible.
This process would have worked, but the multiple steps provide more points of failure, and redundant transfers.
Option 3: Streaming Dump and Import
The third option, and the one we chose to employ, was to do a streaming export / import by piping
pg_restore. We actually used the Heroku CLI's
pg:pull command, which is a nice wrapper around
pg_dump, essentially boiling down to:
pg_dump --verbose --format=custom --compress=0 | pg_restore --verbose --no-acl --no-owner
This approach does the transfer in one step, using Postgres’ custom archive format. Compression is turned off because with plenty of bandwidth on the source and target systems, and the intermediary machine running the command, the overhead of compression would become the bottleneck.
We wanted to run this command on a high availability machine in the GKE cluster, keeping it as close to the database instance as possible. This makes the longest leg of our trip (and likely the main bottleneck to speed) the one-time transfer over the wire between the source database and the intermediary machine.
This achieved a transfer of our production data set, 35G, in around 50 minutes, or about 12MBps.
Dialing In the Details
To automate the process as much as possible we prepared a docker image based on
ubuntu:18.04 with the
psql command and
RUN apt update
RUN apt install -y curl postgresql-client gnupg tmux screen
RUN curl https://cli-assets.heroku.com/install-ubuntu.sh | sh
COPY db-migrator.sh /usr/local/bin/db-migrator
We injected the database credentials into the container as environment variables by setting them as Kubernetes secrets, and then created a simple bash script to use those secrets with
#!/bin/bashheroku pg:pull $SOURCE_DB postgres://$PG_USERNAME:$PG_PASSWORD@$PG_HOST/$PG_DBNAME -a $SOURCE_APP
This allowed us to set up everything and be ready to go. When doing a mission-critical operation it’s a good idea to keep it simple and remove as many points of human error as possible. This way, when the time came to perform the actual migration, we could just
kubectl exec into the container running on the cluster, and type