MyDumper, MyLoader and My Experience of migrating to AWS RDS

Dilip Kola
Jul 2 · 6 min read
Ref: https://tinyurl.com/yymj43hn

Setting up RDS Instance

Setting up a server for mydumper

# Installers: https://github.com/maxbube/mydumper/releases# You may choose to take latest available release here.
sudo yum install https://github.com/maxbube/mydumper/releases/download/v0.9.5/mydumper-0.9.5-2.el7.x86_64.rpm
# Now we should have both mydumper and myloader commands installed on the server

Dumping data from the source

# Remember to run the following commands in the screen as it is a long running process.# Example1: Following will dump data from only DbName1 and DbName2
time \
mydumper \
--host=<db-server-address> \
--user=<mydumper-username> \
--password=<mydumper-password> \
--outputdir=/db-dump/mydumper-files/ \
--rows=50000 \
-G -E -R \
--compress \
--build-empty-files \
--threads=16 \
--compress-protocol \
--regex '^(DbName1\.|DbName2\.)' \
-L /<mydumper-logs-dir>/mydumper-logs.txt
# Example2: Following will dump data from all databases except DbName1 and DbName2
time \
mydumper \
--host=<db-server-address> \
--user=<mydumper-username> \
--password=<mydumper-password> \
--outputdir=/db-dump/mydumper-files/ \
--rows=50000 \
-G -E -R \
--compress \
--build-empty-files \
--threads=16 \
--compress-protocol \
--regex '^(?!(mysql|test|performance_schema|information_schema|DbName1|DbName2))' \
-L /<mydumper-logs-dir>/mydumper-logs.txt
cd <dump-directory># Check if any schema files are using DEFINER, as files are compressed, we need to use zgrep to search
zgrep DEFINER *schema*
# Uncompress the schema files
find . -name "*schema*" | xargs gunzip
# Remove definers using sed
find . -name "*schema*" | xargs sed -i -e 's/DEFINER=`[A-Za-z0-9_]*`@`localhost`//g'
find . -name "*schema*" | xargs sed -i -e 's/SQL SECURITY DEFINER//g'
# Compress again
find . -name "*schema*" | xargs gzip

Restoring data to RDS instance

transaction-isolation=READ-COMMITTED
innodb_log_buffer_size = 256M
innodb_log_file_size = 1G
innodb_buffer_pool_size = {DBInstanceClassMemory*4/5}
innodb_io_capacity = 2000
innodb_io_capacity_max = 3000
innodb_read_io_threads = 8
innodb_write_io_threads = 16
innodb_purge_threads = 2
innodb_buffer_pool_instances = 16
innodb_flush_log_at_trx_commit = 0
max_allowed_packet = 900MB
time_zone = <use-source-DB-time-zone>
# Remember to run the following commands in the screen as it is a long running process.time myloader --host=<rds-instance-endpoint> --user=<db-username> --password=<db-password> --directory=<mydumper-output-dir> --queries-per-transaction=50000 --threads=8 --compress-protocol --verbose=3 -e 2><myload-output-logs-path>
# Following gives approximate number of data files already restored
grep restoring <myloader-output-logs-path>|grep Thread|grep -v schema|wc -l
# Following gives total number of data files to be restored
ls -l <mydumper-output-dir>|grep -v schema|wc -l
# Following gives information about errors
grep -i error <myloader-output-logs-path>

Verification of data on RDS against the source DB

# Check the databases
show databases;
# Check the tables count in each database
SELECT table_schema, COUNT(*) as tables_count FROM information_schema.tables group by table_schema;
# Check the triggers count in each database
select trigger_schema, COUNT(*) as triggers_count
from information_schema.triggers group by trigger_schema;
# Check the routines count in each database
select routine_schema, COUNT(*) as routines_count
from information_schema.routines group by routine_schema;
# Check the events count in each database
select event_schema, COUNT(*) as events_count
from information_schema.events group by event_schema;
# Check the rows count of all tables from a database. Create the following procedure:
Get Rows counts from all the tables of a database
# Run the following in both DB servers and compare for each database.
call COUNT_ROWS_COUNTS_BY_TABLE('DbName1');

Replication

# Get bin-log info of source DB from mydumber metadata file
cat <mydumper-output-dir>/metadata
# It should show something like below:
SHOW MASTER STATUS:
Log: mysql-bin-changelog.000856 # This is bin log path
Pos: 154 # This is bin log postion
# Set external master
CALL mysql.rds_set_external_master(
'<source-db-server>',
3306,
'<source-db-replication-user>',
'<source-db-replication-password>',
'<source-db-bin-log-path>',
<source-db-bin-log-postion>,
0);
# Start the replication
CALL mysql.rds_start_replication;
# Check the replication status
show slave status \G;
# Make sure that there are no replication errors and Seconds_Behind_Master should reduce to 0.

Conclusion

Tensult Blogs

Stories on Cloud computing, Analytics, Automation and Security

Dilip Kola

Written by

IIT Kanpur | Ex-Amazon | Ex-AWS | Co-founder @Tensult

Tensult Blogs

Stories on Cloud computing, Analytics, Automation and Security