As part of the data center migration project, we had the challenging task of migrating our existing PostgreSQL clusters from one AWS region to another. In total, this added up to around 6 TB of data.
In this blog post we’d like to share how we used AWS DMS for this task and the things we learnt along the way.
Before getting into what AWS DMS is, let’s understand our PostgreSQL setup.
This diagram shows our setup, where the applications connect to the pgpool nodes via HAProxy. We have added the PG master and one standby as pgpool backends; read-write requests are served by the master node and read-only requests by the standby node.
Options for PG migration
PostgreSQL comes with the pg_dump and pg_restore utilities, which can be used to copy data between clusters in a clean way, but this may involve significant downtime for large databases. There are also some restrictions if there are large objects in the db.
As we had a couple of very large databases which couldn’t afford a long write downtime, this approach was out of the question for us.
Bucardo
Bucardo is an asynchronous PostgreSQL replication system which allows multi-source and multi-target operations. But it has some limitations:
- Performance and maintenance overhead on source db.
- No support for replicating schema changes (DDLs).
- Consistency can be impacted if infra is not optimized correctly.
AWS Data Migration Service (DMS)
AWS Database Migration Service (DMS) is a cloud service that makes it easy to migrate relational databases, data warehouses, NoSQL databases, and other types of data stores. It can be used to migrate data into the AWS Cloud or between combinations of cloud and on-premises setups.
After evaluating all the options we ended up choosing DMS.
At a basic level, DMS provides a server in the AWS cloud that runs replication software. You create source and target connections to tell DMS where to extract data from and where to load it. Then you schedule a task that runs on this server to move your data. DMS creates the tables and associated primary keys if they don’t exist on the target; you may also create the target tables beforehand yourself if you prefer. You can perform one-time migrations (full load) and/or replicate ongoing changes (CDC, change data capture) to keep sources and targets in sync. It also provides HA/resiliency via Multi-AZ replication instances.
The following diagram illustrates the replication process:
Prerequisites for DMS
There are a few prerequisites for using DMS, mainly:
- Either one of the source or target endpoints should be on AWS i.e. you can’t have source and target both outside AWS.
- For PostgreSQL, the server version needs to be 9.4+ and pglogical extension needs to be installed on both source and target clusters.
Refer to the AWS documentation for the complete list of prerequisites.
Checking for datatype compatibility
There are certain limitations on which data/column types are supported. Some PG data types are not supported at all, e.g. tsvector, whereas others are supported partially, e.g. json. One good thing is that databases can be checked beforehand for compatibility so that there are no surprises during migration. For that, make sure you check the “Pre-migration assessment” option when starting the task. You can even create dummy endpoints and tasks just for this purpose; a source db with just the schema loaded should be sufficient.
If you are using data types which are not supported, then those columns won’t get migrated. If that’s a deal breaker then you may need to evaluate other options for migration.
Partial support on the other hand means DMS can migrate these data types but only under certain conditions. If you use json data type in your db, you may come across this case when migrating from PG to PG. For homogeneous migration (i.e both source and target dbs are same), partially supported data types are mostly what DMS calls “Large Objects” (LOBs). The problem with such columns is that they can potentially contain unbounded data and hence DMS has no clue about how much memory to pre-allocate for them during migration.
This can be worked around by choosing “Limited LOB mode” when creating the task, which allows you to set a known limit which is applicable to all the LOBs in the db. If the data exceeds the specified size limit, then it gets truncated, so beware of using it with json types. We based this limit on the max size of the json columns that existed in our db using the following query:
Warning! This query can be very expensive on a large db. To ensure it doesn’t impact normal db operations, we temporarily provisioned a hidden replica just for running it. It took ~27 hours!
SELECT max(coalesce(octet_length(c1::text) / 1024.0, 0)) AS size_kb
FROM t1;  -- substitute your table and json column for t1/c1
Logical replication plugin
Before getting into which logical replication plugin to use, let’s look under the hood and understand logical decoding.
The traditional mechanism in PostgreSQL for keeping standby/replica nodes in sync with the primary is physical replication. It is implemented by streaming the write-ahead log (WAL) records in binary format to the replica, where they get applied byte-by-byte, i.e. in their entirety. This ensures that the replica has all the changes required for it to be promoted as the new primary in case the original one fails. This works great for synchronization and failover, but it’s not so suitable for data migration. For one, selective replication is not possible, i.e. there’s no way to replicate only a certain db on the cluster or a subset of tables in a db. Secondly, the PG installation on the primary and standby nodes needs to be exactly the same: same version, dbs, schemas, etc.
These limitations have been worked around to an extent by various trigger-based approaches such as Slony, Londiste, and Bucardo.
In version 9.4, PostgreSQL introduced logical decoding which paved the way for logical replication. It is implemented by decoding the contents of the WAL into an application-specific format. Moreover, it supports selective replication. Logical replication is built on top of the same streaming architecture that physical replication uses and “output plugins” can be implemented to transform the WAL’s internal representation into the desired application-specific format.
test_decoding is a basic output plugin included in the postgresql-contrib package. Directly quoting from the docs: “It doesn’t do anything especially useful, but can serve as a starting point for developing your own output plugin”.
pglogical is a more advanced output plugin, implemented by 2ndQuadrant as a PostgreSQL extension following the publish/subscribe model.
DMS uses either test_decoding or pglogical depending on whether the pglogical plugin is available on the source. However, the plugin can also be specified explicitly by setting the pluginName extra connection attribute on the source endpoint.
Aside: PostgreSQL 10+ comes with a builtin implementation of logical replication, but doesn’t look like AWS DMS uses it even for those versions of pg. All our pg instances were on version 9.6 so we didn’t need to go deeper into this.
Which plugin to use?
We started with pglogical as we had installed it on our source clusters. But one of our tasks failed in the CDC stage after running for a few days. As per AWS support, there’s a known issue with DMS and pglogical, and we were advised to use test_decoding for the failed task.
Following that, we chose test_decoding for the rest of the dbs. Thus we ended up using pglogical for some tasks and test_decoding for others. Due to time constraints, we didn’t really dig into what exactly the known issue is.
Outside the context of DMS, pglogical is more feature rich than both test_decoding as well as the builtin logical decoding implementation in pg 10. So it can be a good first choice.
PG config changes for DMS
If you don’t have pglogical installed already, you will need to install the extension on the source as well as target. This can be done using the OS’s package manager.
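On Ubuntu this might look like the following (a sketch; the package name assumes PG 9.6 and that the 2ndQuadrant apt repository is already configured, and mydb is a placeholder):

```shell
# Install the pglogical package for the running PG major version
sudo apt-get install postgresql-9.6-pglogical

# After updating postgresql.conf and restarting, create the extension
# in each database to be migrated:
sudo -u postgres psql -d mydb -c "CREATE EXTENSION pglogical;"
```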
Once that is done, modify the postgresql.conf parameters as follows:
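As a sketch, the parameters involved look like this (values here are illustrative, based on the AWS DMS prerequisites for a self-managed PostgreSQL source; size them for your own workload and number of tasks):

```ini
wal_level = logical
max_replication_slots = 10              # at least one per DMS task
max_wal_senders = 10                    # at least one per DMS task
max_worker_processes = 10               # see the note about standbys below
shared_preload_libraries = 'pglogical'  # needed for the pglogical plugin
```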
Some of the config parameters above require a server restart, so you may need to schedule a planned maintenance / downtime.
Additionally, there are a few more points to note:
- If you are on Ubuntu and behind the latest minor version of PostgreSQL, then installing the latest version of pglogical may cause PostgreSQL to be upgraded as well. Being a minor version upgrade, it’s not a problem from a compatibility standpoint, but it will cause the postgresql process to restart, which may be undesirable. apt-get will prompt you for confirmation about this, so make sure you say “no” to it, unless you are doing this during a planned maintenance window where the restart is acceptable.
- If you are running a highly available setup with one or more standby nodes, then shared_preload_libraries will have to be modified on the standby nodes as well. Also, max_worker_processes has a default value of 8; if it’s set to a higher value, the value on the standby nodes must be at least equal to the one on the primary. So configure and restart the standby nodes first, otherwise a standby will fail to rejoin the primary when it’s restarted.
- The AWS documentation recommends setting wal_sender_timeout to 0 on the source db to prevent the sending server from ending replication connections that are inactive for longer than 60 seconds (the default). This change doesn’t require a postmaster restart to take effect, so you can make it just before starting the first DMS task on this source db. If your pg instance is configured with a non-default value, take note of it so that it can be restored after the DMS tasks using this source endpoint are completed.
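As a sketch, on a self-managed source this setting can be applied (and later reverted) without a restart using ALTER SYSTEM, available since PG 9.4:

```sql
ALTER SYSTEM SET wal_sender_timeout = 0;
SELECT pg_reload_conf();   -- applies the change without a restart

-- after the DMS tasks are done, restore the previous setting:
ALTER SYSTEM RESET wal_sender_timeout;
SELECT pg_reload_conf();
```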
Choosing the replication instance
AWS DMS uses a replication instance to connect to your source data store, read the source data, and format the data for consumption by the target data store. It also loads the data into the target data store. Most of this processing happens in memory. However, large transactions might require some buffering on disk. Cached transactions and log files are also written to disk.
AWS DMS currently supports the T2/T3 (burstable performance instances), C4/C5 (CPU optimized), R4/R5 (Memory optimized) Amazon EC2 instance classes for replication instances.
As discussed earlier, DMS has two stages: (1) full load and (2) CDC.
In the full load stage, DMS starts capturing changes for the tables it’s migrating from the source engine’s transaction log (cached changes). After full load is done, these cached changes are collected and applied on the target. Depending on the volume of cached changes, these changes can directly be applied from memory, where they are collected first, up to a set threshold. Or they can be applied from disk, where changes are written when they can’t be held in memory.
After cached changes are applied, by default AWS DMS starts a transactional apply process on the target instance (CDC).
In the entire process, DMS uses memory buffers in various places, e.g. two stream buffers (one each for incoming and outgoing data) as well as a memory buffer for a component named sorter, which is responsible for maintaining the commit order. If any of these buffers experience memory pressure, the migration can have performance issues and potential failures.
So for heavy workloads, the extra memory provided by R5 instances is useful. R5 instances can also help with holding a large number of transactions in memory, thereby preventing memory-pressure issues during ongoing replications.
For more details about this, please refer to the AWS documentation.
Dropping/restoring indexes and constraints
The full load stage of DMS is essentially a COPY operation. During this stage, the presence of indexes and constraints can affect the task performance negatively. When data is copied, additional cost is incurred in updating the indexes. Also, the full load task loads groups of tables at a time, so referential integrity constraints may get violated, causing failures.
It’s far more efficient to create the indexes and constraints later on, i.e. after full load completes and before CDC begins. Thanks to pg_restore’s --section option, this can be achieved rather easily.
- Before starting the DMS task, create the database on the target and restore the schema using pg_restore with the flag --section=pre-data. This will import all table definitions but leave out the indexes and constraints.
- When creating the DMS task make sure to select the DMS config as follows: (a) “Do nothing” option in the table preparation mode and (b) “Stop task after full load completes” option. This will give us time to restore the indexes and constraints before the CDC stage.
- Run the task.
- When the full load stage completes, the task will stop. At this point, run the same pg_restore command again, this time with the flag --section=post-data. It will now create the indexes and constraints.
- Once the above command finishes, “resume” the DMS task.
If you take this approach, don’t forget to account for the time required to restore the indexes and constraints while estimating the total migration time.
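As a sketch, the two restore steps might look like this (the host, user, db, and dump file names are illustrative; the dump is assumed to be in pg_dump’s custom format):

```shell
# Before starting the DMS task: table definitions only
pg_restore --host=target-host --username=postgres --dbname=mydb \
           --section=pre-data mydb.dump

# After full load stops the task: build indexes and constraints
pg_restore --host=target-host --username=postgres --dbname=mydb \
           --section=post-data mydb.dump
```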
Sequences need to be synced manually
If your tables use auto-incremented primary keys implemented as sequences, then the sequences will not be advanced on target even though data is being copied from source. This is something DMS doesn’t support with any of the logical replication plugins.
This means once you cut over the writes to the target db, the first insert in such a table will result in a conflict for the primary key.
To work around this, add a step in your checklist just after stopping the DMS task to reset the sequence to the next value.
For example, let’s say there is a table t1 with a column c1 that is auto-incremented using a sequence s1. Do the following just after stopping the DMS task:
1. Find the max value of the column in the target db and add 1 to it:
select coalesce(max(c1), 0) + 1 from t1;
2. Let’s say the above query returned 451 as the result. Reset the sequence on the target db to that value:
alter sequence s1 restart with 451;
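Alternatively, the two steps can be collapsed into a single statement using setval (passing false as the third argument means the next nextval call will return exactly this value):

```sql
SELECT setval('s1', (SELECT coalesce(max(c1), 0) + 1 FROM t1), false);
```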
Monitoring the DMS tasks
Monitoring of DMS tasks is very crucial in production, particularly for tasks that may run for a long period of time.
When a DMS task fails, the replication slot created on the source db is not dropped (it just becomes “inactive”). This causes WAL segments to be retained on the source db, thereby increasing its disk usage. If the slot is not dropped within a reasonable time frame, there’s a risk of the source db running out of disk space, so make sure this is not missed. (More on how to drop the slot later in the “Stopping the DMS tasks” section.)
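To keep an eye on how much WAL a slot is holding back, a query like this can help (a sketch using the pre-PG-10 function names, since our clusters were on 9.6; on PG 10+ these are pg_current_wal_lsn and pg_wal_lsn_diff):

```sql
SELECT slot_name, active,
       pg_size_pretty(pg_xlog_location_diff(pg_current_xlog_location(),
                                            restart_lsn)) AS retained_wal
FROM pg_replication_slots;
```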
One way to monitor DMS tasks is by using CloudWatch alarms, but we felt that was overkill. Instead, we implemented a Python script that runs as a cron job. It reads a list of task ids from a file and checks the status of each task using the AWS APIs. If any task in the list is found to be in a stopped or failed state, the on-call person gets paged.
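The core of such a check can be sketched as follows. The alerting decision is isolated into a plain function; fetching the real statuses (e.g. via boto3’s describe_replication_tasks) and the paging integration are assumptions left out of the sketch:

```python
# Sketch of the alert-decision logic for a DMS-monitoring cron job.
# Fetching live statuses (e.g. boto3 DMS describe_replication_tasks)
# and paging the on-call are intentionally left out.
ALERT_STATUSES = {"stopped", "failed"}

def tasks_needing_alert(task_statuses):
    """Given {task_id: status}, return the task ids worth paging about."""
    return sorted(t for t, s in task_statuses.items() if s in ALERT_STATUSES)

statuses = {"task-a": "running", "task-b": "stopped", "task-c": "failed"}
print(tasks_needing_alert(statuses))  # ['task-b', 'task-c']
```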
Stopping the DMS tasks
If you stop the DMS task in the CDC stage, you still have the option to resume it as long as the replication slot is present on the source. This means that for the duration the task is stopped, WALs will be retained on the source and its disk usage will keep increasing rapidly. At this point, you need to decide whether to drop the slot or add more disk space.
To drop a slot, first find the one which has been created for the particular task:
SELECT slot_name, restart_lsn
FROM pg_replication_slots
WHERE slot_type = 'logical'
  AND database = '<your-dbname>'
  AND active IS FALSE;
This query should mostly return a single row, but if that’s not the case, you can compare the restart_lsn column with the value displayed as part of the “CDC recovery checkpoint” on the DMS task page in the AWS console.
Once you figure out the slot corresponding to the task, drop it as follows:
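The drop itself uses the built-in function, substituting the slot_name found above:

```sql
SELECT pg_drop_replication_slot('<slot-name>');
```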
So these were some of the things we learnt from migrating large PostgreSQL databases using AWS DMS. We hope these points are useful to someone who plans to use DMS for migrating PostgreSQL databases.
After using DMS extensively, we are pretty impressed by it. While a lot of heavy lifting is done by the underlying logical decoding framework and plugins, DMS does provide a good hosted solution by managing the complexity. Other than moving large amounts of data between regions, we can definitely see ourselves using it in future, particularly for database upgrades.