How to Migrate MySQL DB to AWS RDS Using mysqldump and DMS

Sojeong Baek
Published in Cloud Villains
Sep 30, 2022 · 14 min read

Introduction

This blog explains how to migrate a MySQL DB to AWS RDS in two different ways: with mysqldump and with DMS.

Data Migration

Generally, AWS DB migration requires the following steps.

Build a DB migration plan → Convert the schema (for heterogeneous DB migrations) → Migrate data → Cut over → (If issues arise) roll back to the source DB → Catch up with CDC → Switch the endpoint back to the target DB

Build DB Migration Plan

When planning, you decide which tools to use to migrate the database. Tools that are widely used for MySQL DB migration include mysqldump and DMS. First, mysqldump is a command-line program that lets you back up your MySQL DB: you create a dump file (a backup file) and import it into the target DB. Second, DMS stands for AWS Database Migration Service. As the name suggests, DMS is a DB migration service provided by AWS. You can migrate the DB by connecting the source DB and the target DB through the service in an easy and convenient manner.

The following factors should be considered when selecting the DB transfer method.

  1. Downtime: Consider whether DB downtime is acceptable. If you need continuous service delivery, you can use DMS with minimal downtime. With mysqldump, DB access is interrupted because tables are locked while the dump file is created and transferred.
  2. Schema: With mysqldump migration, the DB structure (schema) is applied to the target DB as-is. With DMS, basic schema objects such as tables and primary keys are created, but objects such as indexes and foreign keys are not.
  3. Security: RDS supports SSL communication, but not SSL communication for external replication. To meet such requirements, secure data migration can be achieved using DMS.
  4. Data Size: With DMS, even TB-scale DBs can be migrated relatively quickly, and the data transfer progress can be monitored. Meanwhile, when migrating more than 100 GB of data, the dump file export/import may be slow with mysqldump. Using MySQL-related open-source utilities such as mysqlpump or mydumper lets you migrate data in parallel, much like DMS; see the sketch after this list.
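For example, a minimal parallel dump and restore with mydumper and its companion myloader could look like the sketch below; the password, thread count, output directory, and host are illustrative assumptions.

# dump testdb with 4 parallel threads
mydumper -u admin -p '<password>' -B testdb -t 4 -o /backup/testdb

# load the dump into the target RDS in parallel
myloader -u admin -p '<password>' -h <target RDS endpoint> -B testdb -d /backup/testdb -t 4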

Once you have chosen the tool for DB migration, you should plan a cut-over strategy for moving the service from the existing source DB to the target DB, and a recovery strategy just in case.

  1. Cut-over method
  • Offline migration: Control access to the source DB so that it can be read but not updated while the migration is in progress. For instance, during the DB migration, allow users only to browse products at a shopping mall, and block transactions such as shopping carts and orders. Since there are no updates to the source DB during the migration, the entire data set is migrated in one pass.
  • Incremental migration: Run CDC for the additional changes made after a full-load data transfer, while keeping the source DB fully accessible. When the full-load migration is complete, change the application's endpoint from the existing source DB to the target DB endpoint. Any data updated after the full load is transferred with the CDC migration method. Controlling access to the source DB is still required, but this can significantly reduce source DB downtime compared to a full load of the whole data.

2. How to recover

  • If certain issues arise after migration, access to the target DB may need to be rolled back to the source DB. At this point, switch the endpoint the application points to from the target DB back to the source DB endpoint. Any data written while the target DB was serving traffic then needs to be reconciled manually.

3. Target DB-related settings

  • To avoid a performance spike, activate Multi-AZ and backup retention after the migration.

If you've considered all the factors above, let's get the migration started!

1. DB Migration with mysqldump

📕Scenario

Let's assume that an EC2 instance in the IDC VPC is the on-premises server and the MySQL engine is installed on it. A Site-to-Site VPN is configured to enable communication between the IDC and AWS VPCs. Create the dump file on the IDC server and transfer it to a server in AWS with SCP. Once the file is uploaded, import the data from that server directly into RDS.

  • Source DB: MySQL 5.7.37 on an on-premises server
  • DB: testdb (around 5 GB)
  • User: admin
  • Target DB: AWS RDS for MySQL 8.0.27

Prerequisites

  • Grant permission to admin user in source DB
GRANT SELECT, LOCK TABLES, RELOAD, SHOW VIEW ON *.* TO 'admin'@'localhost';
GRANT PROCESS ON *.* TO 'admin'@'localhost';
  • Configure Site-to-Site VPN between source DB VPC and target DB VPC

There is a DB called testdb in the source DB. With the queries below, you can see the 8 tables; the total data size is around 5 GB.

mysql -u admin -p
USE testdb;
SHOW TABLES;
SELECT * FROM customer LIMIT 10;

Create a dump file for the DB first. You can specify which directory to create a dump file in. Here I’ll create a dump file in ec2-user’s home directory.

mysqldump -u admin -p testdb > dump.sql
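If the source tables use the InnoDB engine, you can take a consistent dump without holding table locks for the whole run by adding --single-transaction; a minimal sketch (--routines and --triggers additionally include stored routines and triggers):

mysqldump -u admin -p --single-transaction --routines --triggers testdb > dump.sql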

It might take some time to create the dump file if the data size is big. You can check the progress with the command below in a new session.

watch ls -al dump.sql

When the process is completed, check if the dump file is created properly.

ls -lh
cat dump.sql | more

Now we need to transfer the dump file to the replication server in AWS. Prior to this step, create the key file for the replication server on the source DB server, since the scp command requires it. Paste the key contents with the vi editor.

sudo vi testkey.pem
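SSH and SCP refuse a private key file with open permissions, so tighten the file mode first:

chmod 400 testkey.pem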

Now, copy the dump file to the replication server.

scp -i testkey.pem dump.sql ec2-user@${Replication server private IP}:dump.sql

Log in to the replication server in AWS and check that the dump file was transferred properly.

ls -lh
cat dump.sql | more

Import the dump file into RDS.

mysql --user=${user} --password --database=${db} --host=${target RDS endpoint} < dump.sql
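For a large dump file, the import above gives no progress feedback. If the pv utility happens to be installed, you can pipe the file through it to get a progress bar; a sketch, assuming pv is available on the server:

pv dump.sql | mysql --user=${user} --password --database=${db} --host=${target RDS endpoint}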

When the dump file is imported, log in to RDS and check that the tables in testdb were imported properly.

mysql -u admin -p -h ${target RDS endpoint}

Select 10 rows in customer table to validate imported data.
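A minimal check, mirroring the query run on the source DB earlier:

USE testdb;
SELECT * FROM customer LIMIT 10;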

2. DB Migration with DMS

📕Scenario

Let's assume that an EC2 instance in the IDC VPC is the on-premises server and the MySQL engine is installed on it. A Site-to-Site VPN is configured to enable communication between the IDC and AWS VPCs. Use DMS to migrate data from the source DB to the target RDS in a simple and easy manner.

  • Source DB: MySQL 5.7.37 on an on-premises server
  • DB: testdb (around 5 GB)
  • User: admin
  • Target DB: AWS RDS for MySQL 8.0.27

Prerequisites

  • Grant permissions on testdb to the migration user in the source DB
GRANT ALTER, CREATE, DROP, INDEX, INSERT, UPDATE, DELETE, SELECT ON testdb.* TO 'testuser'@'%';
  • Grant permissions on the awsdms_control DB, which DMS uses to control its tasks, to a user in the target RDS
GRANT ALL PRIVILEGES ON awsdms_control.* TO 'testuser'@'%';
  • Configure Site-to-Site VPN between source DB VPC and target DB VPC

💡Tips for DMS

  • Check whether your DB engines are supported by DMS. (Source engine, Target engine)
  • Check whether the specific data types to be migrated are supported by DMS. (Data type)
  • As DMS creates only tables with a primary key, make sure each table has a primary key prior to the initial full load.
  • DMS doesn't migrate data in a specific order. If you run a full-load migration with foreign keys in place, child table data can be migrated before the parent table data, and the migration can proceed with errors.

1. Full Load

There is a DB called testdb in the source DB. With the queries below, you can see the 8 tables; the total data size is around 5 GB.

mysql -u admin -p
USE testdb;
SHOW TABLES;
SELECT * FROM customer LIMIT 10;

Migration with DMS requires a subnet group where the DMS replication instance will be located, endpoints to connect to the source DB and the target DB, and a replication instance to perform the replication. Once those three are ready, create a database migration task to start the migration.

Create a subnet group from the Subnet groups tab on the DMS page. This is where the replication instance will be located. Select the VPC and the private subnets where the target DB is located. By doing so, the replication instance will be placed in the same network as the target DB.
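The same subnet group can also be created with the AWS CLI; a sketch, where the group name and subnet IDs are illustrative placeholders:

aws dms create-replication-subnet-group \
  --replication-subnet-group-identifier dms-subnet-group \
  --replication-subnet-group-description "Subnet group for the DMS replication instance" \
  --subnet-ids subnet-0example1 subnet-0example2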

Conceptually, the replication instance reads the data from the source DB and then sends it on to the target DB. As the data travels from the replication instance to the target DB within the same AWS network, that leg of the transfer is fast, going over the AWS backbone.

Before you create a DMS replication instance, create a security group to apply to it. After selecting the AWS VPC where the target DB is located for the security group, no additional rules are required on the group itself.

After creating the DMS replication instance security group, add inbound rules allowing traffic from it to both the source DB and the target DB security groups.
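With the AWS CLI, allowing MySQL traffic (port 3306) from the DMS security group into a DB security group looks roughly like this; the group IDs are placeholders:

aws ec2 authorize-security-group-ingress \
  --group-id sg-0targetdb \
  --protocol tcp \
  --port 3306 \
  --source-group sg-0dmsreplication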

Return to the Replication instances tab on the DMS page, and create an instance to perform the migration. For the replication instance configuration, select an appropriate instance class and Multi-AZ setting. I'll go with dms.t3.small for testing.

The replication instance storage holds ongoing migration task logs and cached changes. If you want to enable CloudWatch logs to collect migration process logs when creating tasks later, you will need to select larger storage.

Under Advanced security and network configuration, select the Replication subnet group and VPC security group that you just created.
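The equivalent CLI call might look like the following; the identifier, security group ID, and storage size are illustrative:

aws dms create-replication-instance \
  --replication-instance-identifier dms-test-instance \
  --replication-instance-class dms.t3.small \
  --replication-subnet-group-identifier dms-subnet-group \
  --vpc-security-group-ids sg-0dmsreplication \
  --no-multi-az \
  --allocated-storage 50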

When the instance is ready, create endpoints that are associated with the source and target DB.

On the Endpoints tab, create a Source endpoint.

Under Endpoint configuration, enter source DB information. For Server name, enter the private IP of the on-premise DB server, and for User name and Password, enter the source DB username and password.

When the source endpoint is ready, check on the Status tab whether the endpoint and the source DB are connected properly.

After creating the source endpoint, create the target endpoint as well. Check Select RDS DB instance to choose an RDS instance within your account as the target endpoint.

Enter the target RDS information under Endpoint configuration. Enter the RDS endpoint for Server name, and the target DB username and password for User name and Password.

As the user created when launching RDS is a master user with all permissions, enter a newly created user's information instead. It's better to use a user with the least privileges for security reasons.

Check whether the target DB and the endpoint are connected properly on the Connections tab. Connecting the target endpoint is normally easier than connecting the source endpoint.
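For reference, endpoints can also be created from the CLI; a sketch for the source side, with placeholder server IP and credentials:

aws dms create-endpoint \
  --endpoint-identifier source-mysql \
  --endpoint-type source \
  --engine-name mysql \
  --server-name 10.0.0.10 \
  --port 3306 \
  --username testuser \
  --password '<password>'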

As shown above, you've created the subnet group where the DMS replication instance is located, the endpoints connected to the source DB and the target DB, and the replication instance that performs the migration. Now create the database migration task to start the migration.

Create a task on the Database migration tasks tab. Select the instances and endpoints that you’ve created.

  1. Do nothing keeps the data and metadata in the existing table. Check this option carefully: when there is existing data in the table, the data can be duplicated by a new migration. Drop tables on target drops the existing table and creates a new table to migrate the data into. Truncate keeps the table metadata and truncates the data in the existing table. Use this option when you need to migrate new data without changing the table schema.
    All of the above options create a new table if there is no target table.
  2. LOB stands for 'large object' and covers types such as CLOB and BLOB, which are used to store large data such as images, videos, and long text. Among the options above, Full LOB mode migrates all LOB columns in full, which may lengthen the migration time.
    If you have a column that contains a LOB, it's better to select Limited LOB mode and limit the maximum size of the column. It is important to set the Max LOB size to match the actual data size, since data that exceeds the Max LOB size is truncated. If a particular table has many LOB columns, you can also create a separate task for that table.
  3. When the Validation option is enabled, the data is validated to ensure that it was migrated from the source DB to the target properly. Data validation is strongly recommended for SOX* or other security compliance.

*SOX: The Sarbanes-Oxley Act, a US law regulating financial reporting and audit activities of listed companies

In Table mappings, select Add new selection rule and enter the schema and table of the source DB to migrate. '%' acts as a wildcard that selects all tables.
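In the JSON editor, the same wildcard selection rule looks like this (the rule ID and name are arbitrary):

{
  "rules": [
    {
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "1",
      "object-locator": {
        "schema-name": "testdb",
        "table-name": "%"
      },
      "rule-action": "include"
    }
  ]
}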

For a real-world migration, enable the premigration assessment to identify data type conversion issues. For this test, create the task without checking it.

When the migration starts after the task is created, you can check the progress in Table statistics.

Once the DMS task is completed, log in to the target DB and check that all the data was migrated properly with a SELECT statement.

mysql -h ${RDS endpoint} -u admin -p
USE ${DB name};
SELECT * FROM ${table name} LIMIT 10;

2. CDC

When the full load is completed, the data that changed afterward also needs to be migrated. Capturing those changes and applying them to the target DB is called CDC (Change Data Capture). Before implementing CDC, I will modify the data in the source DB a little.

*Note: Binary logging must be enabled on the source DB for CDC. The binary log stores a history of data updates and DDL statements within a MySQL instance.

Some DB engines and versions do not support the DMS CDC function, so you need to check this first. (Reference)

Edit the MySQL configuration file with the command below. The my.cnf file stores various DB-related settings, similar to a parameter group in RDS.

# check the datadir location in the file for the log-bin location
sudo vi /etc/my.cnf

# add the following in /etc/my.cnf
[mysqld]
server-id = 1
expire_logs_days = 10
binlog_format = row
log-bin = /var/lib/mysql/mysql-bin
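These settings take effect only after the server restarts, since log-bin cannot be changed at runtime. On a systemd-based host the restart would look like this (the service name mysqld is an assumption; it may be mysql on some distributions):

sudo systemctl restart mysqld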

Grant required permission to a user.

# grant permission to read the server's bin log
GRANT REPLICATION SLAVE ON *.* TO 'admin'@'%';
# grant permission to read master status, slave status and the bin log
GRANT REPLICATION CLIENT ON *.* TO 'admin'@'%';

After editing the configuration file, enter MySQL to check if binary logging is enabled with the command below.

SHOW GLOBAL VARIABLES LIKE 'log_bin';

Now that everything is ready for the CDC, I will update the data a little before creating the task.
Add 5 new records to the part table with the query below.

INSERT INTO testdb.part VALUES (default, 'blush thistle blue yellow saddle', 'Manufacturer#9', 'Brand#7', 'STANDARD POLISHED BRASS', 18, 'JMED DRUM', 904.00, 'egular deposits hag');
INSERT INTO testdb.part VALUES (default, 'cornflower chocolate smoke green pink', 'Manufacturer#8', 'Brand#36', 'LARGE BURNISHED STEEL', 21, 'JMED DRUM', 903.00, 'sual a');
INSERT INTO testdb.part VALUES (default, 'blush thistle blue yellow saddle', 'Manufacturer#9', 'Brand#7', 'STANDARD POLISHED BRASS', 7, 'JMED DRUM', 906.00, 'N/A');
INSERT INTO testdb.part VALUES (default, 'thistle dim navajo dark gainsboro', 'Manufacturer#8', 'Brand#9', 'PROMO PLATED STEEL', 53, 'JMED DRUM', 801.00, 'ironic foxe');
INSERT INTO testdb.part VALUES (default, 'forest brown coral puff cream', 'Manufacturer#15', 'Brand#12', 'SMALL PLATED BRASS', 13, 'SM PKG ', 908.00, 'N/A');

Check if the new rows are inserted properly.

SELECT * FROM testdb.part ORDER BY P_PARTKEY DESC LIMIT 5;

Delete records where C_NATIONKEY is 2 in the customer table. A total of 59,952 records are deleted.

DELETE FROM testdb.customer WHERE C_NATIONKEY=2;

Let’s create a DMS task for CDC.

Create a task on the Database migration tasks tab. Select the instances and endpoints that you have created.

Select Replicate data changes only for the Migration type to migrate the changed data only.

Enable Custom CDC start mode to specify the point in time from which CDC will run. Select the previous day as the start point so that it captures all the updates that occurred after the full-load transfer.

Select Do nothing for Target table preparation mode to update changed data in existing tables.

Specify the schema and table of the source DB from which to migrate the updated data, and then finally create a task.
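For reference, the equivalent CLI call would look roughly like this; the ARNs, mapping file, and start timestamp are placeholders:

aws dms create-replication-task \
  --replication-task-identifier cdc-task \
  --migration-type cdc \
  --source-endpoint-arn <source endpoint ARN> \
  --target-endpoint-arn <target endpoint ARN> \
  --replication-instance-arn <replication instance ARN> \
  --table-mappings file://mappings.json \
  --cdc-start-time 2022-09-29T00:00:00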

When the Task starts and progress reaches 100%, you can check the updated tables in Table statistics, and enter the target DB to see if the data is updated.

Tadaa, all done!

Thanks for reading the article.
