Master Database Migration & Replication with AWS DMS

Jens Båvenmark · Published in AWS Specialists · 13 min read · Aug 22, 2023

Migrating databases can be intimidating, but AWS Database Migration Service (DMS) streamlines the process. DMS not only assists in identifying databases ripe for migration but also aids in converting from systems like MSSQL or Oracle to alternatives like MySQL or PostgreSQL. Beyond migration, DMS excels at database replication.

In this article, we’ll focus on DMS’s replication abilities. Users can opt for a full data replication, replicate only data changes, or maintain an ongoing replication. The choice hinges on the specific requirements of your database environment.

A replication consists of four parts:

  • A replication instance
  • A source endpoint
  • A target endpoint
  • A migration task

The Replication Instance

The replication instance is an EC2 instance that actually performs the replication. As with any EC2 instance, you need to select an instance class, the VPC it should be deployed in, and a security group for it to use, but there are also some DMS-specific settings.

Before creating your replication instance, you must first create a subnet group.

Create Subnet Group

A subnet group is, as it sounds, a grouping of the subnets that DMS can deploy the replication instance(s) into. Here you will have to do some planning. If you want the replication instance to be reachable from the internet, you must select public subnets. Otherwise, I recommend using private subnets. A scripted sketch of creating the group follows the steps below.

  1. Select the VPC you want to use.
  2. Select the subnets you want to be part of the subnet group. The replication instance also needs routes to the source and target endpoints.
  3. Set a name and a description for the group.
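If you prefer to script the setup, here is a minimal boto3 sketch of the same step; the identifier and subnet IDs are placeholders, not values from this article:

import boto3

dms = boto3.client("dms", region_name="eu-north-1")

# Group the private subnets that the replication instance may be placed in.
dms.create_replication_subnet_group(
    ReplicationSubnetGroupIdentifier="dms-private-subnets",  # hypothetical name
    ReplicationSubnetGroupDescription="Private subnets for the DMS replication instance",
    SubnetIds=["subnet-0aaa1111", "subnet-0bbb2222"],  # placeholder subnet IDs
)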

Replication Instance Configuration

Specify a name and an optional description for the replication instance.

Select the instance class for the replication instance. The right size depends on the amount of data and the speed required. If you plan to enable validation of the replication, you might also need a larger instance, as validation can take some time. But if you are just running a one-time migration and time isn’t that important, you can make do with one of the smaller instance classes.

Always use the latest engine version if you don’t have a specific need for a previous one, since the latest version should work with all RDS-supported versions of the DB engines.

Select the high-availability setup for your replication instance: either a single instance, or a Multi-AZ setup with a primary instance in one AZ and a standby in another.

Select the network setup for the replication instance: either standard IPv4 or dual-stack mode, which supports both IPv4 and IPv6.

Select the VPC you want to deploy the replication instance in, and then the subnet group (from that VPC) to use. Choose whether the instance should be publicly accessible.

Under advanced settings for Connectivity and Security, you can specify which AZ to deploy the replication instance in.

Specify which security group(s) the replication instance should use. Allow this security group in the inbound rules of the databases’ security groups on the database port.
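As a hedged sketch of that inbound rule in boto3 (the group IDs are hypothetical, and the port assumes a PostgreSQL database):

import boto3

ec2 = boto3.client("ec2", region_name="eu-north-1")

# Allow the replication instance's security group to reach the database
# on port 5432 (PostgreSQL).
ec2.authorize_security_group_ingress(
    GroupId="sg-0db1111111111111",  # the database's security group (placeholder)
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 5432,
        "ToPort": 5432,
        "UserIdGroupPairs": [{"GroupId": "sg-0dms222222222222"}],  # the DMS SG
    }],
)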

You can also specify a KMS key for the replication instance to use. If you don’t specify one, the instance will use the default KMS key for DMS.

Select a maintenance window for the replication instance and choose whether the DMS engine should automatically upgrade minor versions during that window.

Then you are ready to create the replication instance.
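Putting the settings above together, a minimal boto3 sketch; every identifier and ARN is a placeholder:

import boto3

dms = boto3.client("dms", region_name="eu-north-1")

dms.create_replication_instance(
    ReplicationInstanceIdentifier="replication-demo",  # hypothetical name
    ReplicationInstanceClass="dms.t3.medium",          # size to data volume and speed needs
    AllocatedStorage=50,                               # GB
    ReplicationSubnetGroupIdentifier="dms-private-subnets",
    VpcSecurityGroupIds=["sg-0dms222222222222"],       # placeholder
    MultiAZ=False,                                     # True = primary plus standby in another AZ
    PubliclyAccessible=False,
    PreferredMaintenanceWindow="sun:02:00-sun:03:00",
    AutoMinorVersionUpgrade=True,
    # KmsKeyId="arn:aws:kms:...",                      # omit to use the default DMS key
)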

The Endpoint Types

The endpoints are the source and target databases that the data will be replicated from and to. Both the source and target endpoints require an identifier (name) and an optional descriptive ARN.

Replication requires a user in both the source and target databases.

The required permissions for these users depend on the specific settings of the migration task, but at a minimum, the user for the source database needs read rights to the database being replicated, and the user for the target database needs write rights (and probably more).

The endpoints must be given the username and password, as well as the server address, database name, and port. The best way to handle DB credentials is to create a secret in AWS Secrets Manager with this information, but the credentials can also be added directly to each endpoint.
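A sketch of creating such a secret with boto3; the secret name and connection details are hypothetical, and the key names follow the username/password/host/port structure that DMS reads from endpoint secrets:

import boto3, json

sm = boto3.client("secretsmanager", region_name="eu-north-1")

# Store the connection details DMS needs for the endpoint.
sm.create_secret(
    Name="dms/source/postgres",  # hypothetical secret name
    SecretString=json.dumps({
        "username": "dms_user",
        "password": "change-me",
        "host": "db.example.internal",
        "port": 5432,
    }),
)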

If you use AWS Secrets Manager to store the database credentials, the endpoint also needs an IAM role that can access the secret. The role needs a trust policy that lets DMS assume it, and a permissions policy that grants access to the secret:

Trust policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "dms.eu-north-1.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Permissions policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "GetDMSSource",
      "Effect": "Allow",
      "Action": "secretsmanager:GetSecretValue",
      "Resource": "arn:aws:secretsmanager:eu-north-1:12345678910:secret:{SECRET}"
    }
  ]
}

Set the Source Endpoint

The source endpoint is the database from which the data will be replicated. If the source is an RDS instance, you can select it, and most settings will be filled in automatically. If the source endpoint is not an RDS instance, you must fill in the settings yourself.

There are various source engines to choose from, all with their own required settings. For our example, we will use PostgreSQL as the source.

Choose the access method for the source database: manual configuration, or AWS Secrets Manager with the IAM role.

Specify whether the connection should be unencrypted or encrypted with SSL, with or without a CA.

We also need to specify the database name.

More advanced endpoint settings can also be configured via a wizard or JSON, but we will not go into them in this article.

DMS uses KMS to encrypt the endpoints’ passwords and other sensitive information. DMS will create a default KMS key if you don’t specify your own.

Tags can be added to the endpoint and are a great way to organize resources and track costs.

At this point, the endpoint configuration is done, but you can also test the connection, which is always a good idea. You specify the VPC and replication instance to use for the test. The test will either report success or, if it fails, give a reason.

When you test an endpoint, it is first created and then tested, so even if the test fails, the endpoint exists. Multiple tests can then be run against the created endpoint.
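A boto3 sketch of creating and testing a PostgreSQL source endpoint under the assumptions above; the identifiers, ARNs, and database name are placeholders:

import boto3

dms = boto3.client("dms", region_name="eu-north-1")

endpoint = dms.create_endpoint(
    EndpointIdentifier="source-postgres",  # hypothetical name
    EndpointType="source",
    EngineName="postgres",
    DatabaseName="appdb",                  # placeholder database name
    SslMode="require",                     # or "none", "verify-ca", "verify-full"
    PostgreSQLSettings={
        "SecretsManagerSecretId": "dms/source/postgres",
        "SecretsManagerAccessRoleArn": "arn:aws:iam::12345678910:role/dms-secret-access",
    },
)

# The test runs asynchronously; poll describe_connections for the result.
dms.test_connection(
    ReplicationInstanceArn="arn:aws:dms:eu-north-1:12345678910:rep:EXAMPLE",
    EndpointArn=endpoint["Endpoint"]["EndpointArn"],
)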

Set up the Target Endpoint

The target endpoint is set up in the same way as the source endpoint, but with the user and settings for that specific database.

Set up the Database Migration Task

Now that the replication instance and the endpoints are created, we can set up the database migration task.

The task requires an identifier (name) and an optional descriptive ARN. Always give tasks good names so you can easily tell your migration tasks apart.

You then specify the replication instance, the source, and the target endpoints.

Then you select the migration type; a scripted sketch of creating the task follows the list. You can select between:

  • Migrate existing data. This performs a one-time migration of the selected data from the source to the target endpoint.
  • Migrate existing data and replicate ongoing changes. This performs a one-time migration of the selected data from the source to the target endpoint and then continues to replicate changes.
  • Replicate data changes only. This replicates changes from the source to the target endpoint. When selecting this, you can pick a start point (or date/time) from which to start the replication. Different DB engines handle this differently.
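In the API, the three choices map to the MigrationType values full-load, full-load-and-cdc, and cdc. A minimal boto3 sketch, with placeholder ARNs and a catch-all table mapping:

import boto3, json

dms = boto3.client("dms", region_name="eu-north-1")

dms.create_replication_task(
    ReplicationTaskIdentifier="postgres-full-load-and-cdc",  # hypothetical name
    SourceEndpointArn="arn:aws:dms:eu-north-1:12345678910:endpoint:SOURCE",
    TargetEndpointArn="arn:aws:dms:eu-north-1:12345678910:endpoint:TARGET",
    ReplicationInstanceArn="arn:aws:dms:eu-north-1:12345678910:rep:EXAMPLE",
    MigrationType="full-load-and-cdc",  # or "full-load" / "cdc"
    TableMappings=json.dumps({          # migrate everything; see Table mappings below
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "1",
            "object-locator": {"schema-name": "%", "table-name": "%"},
            "rule-action": "include",
        }]
    }),
)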

Configure the Task Settings

To configure the task settings, you can either use the wizard or write them in JSON. If you don’t need any special configuration, I recommend using the wizard, since it covers the necessary settings.

Target table preparation mode

First, you need to decide how DMS should handle tables that already exist in the target database.

You can choose between:

  • Do nothing
  • Drop tables on target
  • Truncate

If you choose Do nothing, DMS will not do any preparations to the target table if it exists.

If you choose Drop tables on target, DMS will drop the tables on the target database before recreating them and transferring the data.

If you choose Truncate, DMS will truncate the data in the table but keep the table and all its settings.

What you choose may vary depending on your use case.

I usually choose Drop tables on target for my first test migration, to see that the migration works. Then I might need to make configuration changes to the tables in the target DB to fix any issues I encountered during the test migration (removing NOT NULL from columns, changing owners, etc.).

To avoid losing these changes, I then switch the migration task to truncate the data instead.
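In the task settings JSON, this choice lives under FullLoadSettings; a sketch of the relevant excerpt as a Python dict (serialize it with json.dumps and pass it as ReplicationTaskSettings when creating the task):

# Excerpt of the task settings controlling target table preparation.
task_settings = {
    "FullLoadSettings": {
        # "DO_NOTHING", "DROP_AND_CREATE" (drop tables on target),
        # or "TRUNCATE_BEFORE_LOAD" (truncate)
        "TargetTablePrepMode": "TRUNCATE_BEFORE_LOAD",
    }
}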

Selecting the Right LOB Column Settings

One of the things that can create the most issues when migrating a database is LOB columns. A LOB column is a column whose data type can hold a large object (LOB). The column doesn’t need to actually contain a large object; if the column type allows it, the column is categorized as a LOB column.

When selecting the LOB settings, you need to know how your database is configured. The settings you can choose from for LOB columns are:

  • Don’t include LOB columns
  • Full LOB mode
  • Limited LOB mode

If you select Don’t include LOB columns, they will not be migrated, as you probably guessed.

If you select Full LOB mode, DMS will migrate all the LOB columns, but it can take a very long time since DMS migrates them one at a time. You get to specify the size of the chunks that DMS divides each LOB into.

If you select Limited LOB mode, you specify a maximum LOB size; anything larger than this will be truncated. This is much faster than Full LOB mode since DMS loads the LOBs in bulk.

A warning will be logged to the logs if a LOB is truncated due to being larger than the specified LOB size.

A LOB column migration also places some requirements on its table. The table needs a primary key or unique constraint, and the LOB column cannot have a NOT NULL constraint (it must allow NULL as a value). This is because DMS first migrates all non-LOB columns, setting the LOB column to NULL for each record; it then migrates the LOB data and needs a key to identify each record.

I recommend using Limited LOB mode during a test migration and checking the logs to see if anything was truncated. If you find tables where data is truncated, create a separate migration task for those tables and select Full LOB mode. If you find issues with NOT NULL LOB columns, you can change this in the target DB after the test migration; if you select truncate for the next migration, the table settings will remain.
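The LOB choices map to the TargetMetadata section of the task settings; a hedged excerpt, with the sizes as assumptions to adjust to your data:

# Excerpt of task settings for LOB handling.
task_settings = {
    "TargetMetadata": {
        "SupportLobs": True,         # False = don't include LOB columns
        "FullLobMode": False,        # True = full LOB mode, migrated in chunks
        "LobChunkSize": 64,          # chunk size (KB) used in full LOB mode
        "LimitedSizeLobMode": True,  # limited LOB mode, loaded in bulk
        "LobMaxSize": 32,            # max LOB size (KB); larger LOBs are truncated
    }
}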

Validate migration of the table(s)

When migrating, you can choose to have DMS validate the table migration. It will take some time, but it goes through each record and compares the source and target databases.

You will see in the migration tab (on the replication task) how the validation has gone, and any issues are written to a separate table in the target database named ‘awsdms_control.awsdms_validation_failures_v1’:

SELECT * FROM public.awsdms_validation_failures_v1;
-[ RECORD 1 ]--------------------------------------
TASK_NAME    | VWQDG4UTIQXWB3KDMHJA2EKAF2RFPXJVNRXWDLA
TABLE_OWNER  | public
TABLE_NAME   | customer_info
FAILURE_TIME | 2023-08-18 14:28:05.416654
KEY_TYPE     | Row
KEY          | {               +
             |   "key": ["10"] +
             | }
FAILURE_TYPE | MISSING_TARGET
DETAILS      |
-[ RECORD 2 ]--------------------------------------
TASK_NAME    | VWQDG4UTIQXWB3KDMHJA2EKAF2RFPXJVNRXWDLA
TABLE_OWNER  | public
TABLE_NAME   | customer_info
FAILURE_TIME | 2023-08-18 14:28:05.585561
KEY_TYPE     | Row
KEY          | {               +
             |   "key": ["11"] +
             | }
FAILURE_TYPE | MISSING_TARGET
DETAILS      |

I recommend using validation when migrating.
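To turn it on in the task settings JSON, a minimal excerpt (the thread count is an assumption; raise it to validate faster at the cost of more load on the databases):

# Excerpt of task settings enabling row-by-row validation.
task_settings = {
    "ValidationSettings": {
        "EnableValidation": True,
        "ThreadCount": 5,  # number of validation threads
    }
}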

Enabling Task Logs

Enabling task logs is usually a must, since otherwise you will have a hard time trying to fix a failed migration. You can specify the level of logging for each part of the migration.

For example, if you are having issues with the target database, you can set the Target Load and Target Apply components to the DEBUG log level.
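That example corresponds to the Logging section of the task settings; a sketch:

# Excerpt of task settings raising the log level for the target-side components.
task_settings = {
    "Logging": {
        "EnableLogging": True,
        "LogComponents": [
            {"Id": "TARGET_LOAD",  "Severity": "LOGGER_SEVERITY_DEBUG"},
            {"Id": "TARGET_APPLY", "Severity": "LOGGER_SEVERITY_DEBUG"},
        ],
    }
}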

Table mappings

The next step is to specify which tables should be migrated. This can be done either with the wizard or in JSON.

Depending on which database engines you are migrating from and to, these settings can differ, but in this article, we use a PostgreSQL migration as an example.

When using the wizard, you specify which schema (or all schemas by using %) and which tables (or all tables by using %) should be migrated.

If you are not selecting all tables in a schema and can’t use a wildcard to select multiple, you need one mapping per table in that schema that you want to migrate.

Each rule will either include or exclude the selection you make. For example, if you want to migrate all tables in a schema except one, you can create one mapping that selects all tables in that schema and another mapping that excludes that specific table. This makes sure that all tables except that one are migrated. An exclude rule always overrides any include rule.

{
  "rules": [
    {
      "rule-type": "selection",
      "rule-id": "144832447",
      "rule-name": "144832447",
      "object-locator": {
        "schema-name": "myschema",
        "table-name": "%"
      },
      "rule-action": "include",
      "filters": []
    }
  ]
}
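An exclude rule has the same shape; a hypothetical example (the table name is made up) that skips one table while the include rule above selects the whole schema:

# Hypothetical exclude rule to append to the "rules" list above.
exclude_rule = {
    "rule-type": "selection",
    "rule-id": "144832448",
    "rule-name": "exclude-audit-log",
    "object-locator": {"schema-name": "myschema", "table-name": "audit_log"},
    "rule-action": "exclude",
    "filters": [],
}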

Premigration assessment

A premigration assessment runs a check for issues between the source and target table configurations. Enabling this is always a good idea, as it will show you possible problems before you migrate. The time you save will be well worth the small effort of setting it up.

A premigration assessment requires an S3 bucket and an IAM role that allows DMS to write the assessment results to the bucket.

After selecting to enable a premigration assessment report, you can give it a specific name, but since each assessment belongs to a single replication task, this is usually unnecessary.

Depending on the migration type of the replication task, the available assessments will vary.

For a one-time migration, the premigration assessment will assess:

  • Large objects (LOBs) are used, but LOB columns on the target are not nullable.
  • Source tables have LOBs but no primary keys or unique constraints.
  • Unsupported data types.

The premigration assessment will not show you everything that can become an issue, but it will give you a good heads-up on any major table or column configuration issues.

The IAM role that allows DMS to access the S3 bucket needs the following trust and permissions policies:

Trust policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "dms.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Permissions policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:GetObject",
        "s3:PutObjectTagging"
      ],
      "Resource": [
        "arn:aws:s3:::{BUCKETNAME}/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::{BUCKETNAME}"
      ]
    }
  ]
}
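With the bucket and role in place, an assessment run can also be started from boto3; a sketch with placeholder ARNs and bucket name:

import boto3

dms = boto3.client("dms", region_name="eu-north-1")

dms.start_replication_task_assessment_run(
    ReplicationTaskArn="arn:aws:dms:eu-north-1:12345678910:task:EXAMPLE",
    ServiceAccessRoleArn="arn:aws:iam::12345678910:role/dms-assessment-role",
    ResultLocationBucket="my-dms-assessment-bucket",  # placeholder bucket
    ResultLocationFolder="premigration",
    AssessmentRunName="premigration-check",
)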

Configuring the Migration Task Startup

The last thing to decide is whether the migration task should start directly or be started manually later.

If you have enabled a premigration assessment, you can safely select to start the task automatically, since the premigration assessment has to run first.

See the results of the Premigration Assessments

After the database migration task is created, and if you chose to run a premigration assessment, you can see its results by selecting the task and opening the Premigration assessment tab. You can see the status of each premigration assessment run, and if you select one, you will get detailed information about it.

The assessment will detail each table and column and, if it did not pass, include a message explaining why it failed or produced a warning.

Conclusion

AWS Database Migration Service (DMS) has proven to be an essential solution for those seeking to transition databases across platforms with minimal downtime. Offering both continuous data replication and the ability to manage a diverse range of source and target database endpoints, it presents a significant advantage over traditional migration techniques.

However, it’s essential to understand its limitations. In the context of a PostgreSQL database, while DMS efficiently handles data migration, aspects like sequences, indexes, users, roles, and some other database objects need manual intervention post-migration. This means it doesn’t offer the complete convenience of a database dump, where every element is captured in one go.

Nevertheless, armed with an understanding of these nuances and supported by this comprehensive guide, I hope you can get up to speed with DMS and harness its power to achieve smooth and effective database migrations.
