Navigating Data Migration Challenges with AWS Database Migration Service (DMS)

Anusaya Rajkumar Tantak
Ankercloud Engineering
8 min read · Aug 17, 2023

Introduction:
Troubleshooting AWS Database Migration Service (DMS) issues can teach you a great deal about how the service behaves, and a database migration often reveals its most valuable lessons when things do not go as planned.
In this blog, we walk through the troubleshooting we did while migrating from Azure PostgreSQL to self-managed PostgreSQL on AWS using DMS. Our aim is to explain the obstacles we encountered and the solutions we devised, so that fellow data migrators can navigate similar scenarios with confidence.

Understanding the Migration Landscape:
The foundation of our migration project was the creation of a replication instance, endpoints, and DMS tasks. With a dataset of around 1 TB spread across tables containing duplicate records and multiple primary keys, it was the beginning of an expedition that tested both our technical skills and our patience.

Let's quickly walk through the initial obstacles we faced during the migration process.

Migration Set-Up and Initial Hurdles:
Our migration journey began with setting up a replication instance, source and target endpoints, and DMS tasks to facilitate the data transfer. The duplicate records and multiple primary keys mentioned above added complexity from the start, and our task layout had to evolve as the migration progressed.

Collaborative Task Setup and Evolution:
In the early stages we ran a single task covering all tables. After seeking counsel from the AWS support team, we split the migration into three tasks. This strategic shift gave us better control over the data flow and helped mitigate potential bottlenecks.

Scaling Data and Addressing Size Constraints:
With the dataset at around 1 TB, several tables individually crossed the 100 GB mark, which presented its own challenges. Our initial attempts to migrate these tables independently ran into unforeseen complications, and we again turned to the AWS support team for assistance.

Below are some of the issues we faced during the migration, along with the solutions that worked for us.

1) Error -> Column “name” has a unique constraint; duplicate key insertion error.
Issue: Column Uniqueness Constraint
Solution: Evaluate whether column uniqueness is actually required on the target tables. If it is not, drop the unique constraint. If it is, check the source for duplicated records, delete the duplicate rows, and reload the affected tables (a query sketch follows below).
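
As an illustration, here is one way to find and remove duplicate rows on the source before reloading. This is only a sketch: table_name and column_name are placeholders for the affected table and the column that should be unique, and it keeps one physical row per duplicate group.

-- Find values that appear more than once in the column expected to be unique.
SELECT column_name, count(*)
FROM table_name
GROUP BY column_name
HAVING count(*) > 1;

-- Delete all but one physical row per duplicate value (ctid identifies the
-- physical row); run inside a transaction and verify before committing.
DELETE FROM table_name a
USING table_name b
WHERE a.column_name = b.column_name
  AND a.ctid > b.ctid;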

PostgreSQL column sizes affect the conversion of PostgreSQL LOB data types to AWS DMS data types.
To work with this, take the following steps for the following AWS DMS data types:
->BLOB — Set Limit LOB size to the Maximum LOB size (KB) value at task creation.
->CLOB — Replication handles each character as a UTF8 character.
Therefore, find the length of the longest character text in the column, shown here as max_num_chars_text. Use this length to specify the value for Limit LOB size to. If the data includes 4-byte characters, multiply by 2 to specify the Limit LOB size to value, which is in bytes.
In this case, Limit LOB size to is equal to max_num_chars_text multiplied by 2.
->NCLOB — Replication handles each character as a double-byte character.
Therefore, find the length of the longest character text in the column (max_num_chars_text) and multiply by 2. You do this to specify the value for Limit LOB size to. In this case, Limit LOB size to is equal to max_num_chars_text multiplied by 2.
If the data includes 4-byte characters, multiply by 2 again. In this case, Limit LOB size to is equal to max_num_chars_text multiplied by 4.
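
For example, to find max_num_chars_text for a text column that maps to CLOB or NCLOB, a query like the following can be used (a sketch; column_name and table_name are placeholders):

-- Length, in characters, of the longest value stored in the column.
SELECT max(char_length(column_name)) AS max_num_chars_text
FROM table_name;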

References:
============
[+]
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.PostgreSQL.html#CHAP_Source.PostgreSQL.Homogeneous.DMS

For PostgreSQL, AWS DMS treats JSON data types like LOBs. Make sure that if you choose Limited LOB mode, the Max LOB size option is set to a value that doesn’t cause the JSON data to be truncated.

References:
============
[+]
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_BestPractices.html#CHAP_BestPractices.LOBS.LimitedLOBMode

You can refer to the following document on best practices to improve LOB performance:

References:
============
[+]
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_BestPractices.html#CHAP_BestPractices.LOBS.Performance

By default, AWS DMS maps the following supported PostgreSQL source data types to AWS DMS LOB data types:
— — — — — —
PostgreSQL data type => DMS data type
TEXT => NCLOB
BYTEA => BLOB
TSVECTOR => CLOB
TSQUERY => CLOB
XML => CLOB
JSON => NCLOB
JSONB => NCLOB
ARRAY => NCLOB
COMPOSITE => NCLOB
HSTORE => NCLOB
— — — — — —

References:
============
[+]
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.PostgreSQL.html#CHAP_Source-PostgreSQL-DataTypes
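
To get a quick inventory of which source columns will fall under these default LOB mappings, a simple catalog query can help. This is a rough sketch that assumes the tables live in the public schema; note that array columns are reported as 'ARRAY' and composite or hstore columns as 'USER-DEFINED'.

-- List columns whose data types AWS DMS treats as LOBs by default.
SELECT table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'public'
  AND data_type IN ('text', 'bytea', 'xml', 'json', 'jsonb', 'ARRAY', 'USER-DEFINED');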

2) Error: Value of column ‘data’ was truncated.
Issue: LOB (Large Object) Data Truncation
Solution: Calculate the maximum size of the JSON data, set "LobMaxSize" slightly above that value, and restart the task so the data is no longer truncated.

To avoid this issue, the support team suggested calculating the maximum size of the JSON data with the command below, setting "LobMaxSize" slightly above that value, and restarting the task.
— — — — — — — —
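-- Largest stored size, in bytes, of any value in column_name; note that the
-- task's LOB size limit (LobMaxSize) is specified in kilobytes.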
select max(pg_column_size(column_name)) from table_name;

If we truncate the JSON record and try to insert it, PostgreSQL throws the same error observed in the task log:
Failure happened as the JSON data got truncated while inserting into the target column "data", and the table 'public'.'instagram_post' was placed into a suspended state.
- This behavior comes from PostgreSQL itself, which validates that JSON data is well formed before inserting it into the column.
- In this case, the JSON data was truncated because of the "LobMaxSize" setting, which broke the JSON formatting and caused the failure.

Now, let us quickly understand how LobMaxSize works for DMS tasks.
LobMaxSize is a parameter used in AWS Database Migration Service (DMS) tasks to control the maximum size of Large Object (LOB) data that can be migrated. LOBs are data types used to store large amounts of data, such as images, documents, or other binary or text content.
The LobMaxSize parameter lets us define the maximum size for LOB data during the migration. If a piece of LOB data exceeds this size, DMS handles it according to the task's LOB settings.
Large binary objects (LOBs) can sometimes be difficult to migrate between systems. AWS DMS offers a number of options to help with the tuning of LOB columns.

References:
============
[+]
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Tasks.LOBSupport.html

3) Error: Statement timeout; canceling command due to execution time.
Issue: Long-Running Commands and Statement Timeout
Solution: Identify what is causing commands to run for a long time. Terminate blocking processes, increase the execution timeout values, and consider the parallel load and filter options for faster migration.
Parallel Load and Filter Strategies:
For large and complex tables, employing parallel load and filter strategies proved advantageous. Creating boundaries on a unique indexed column allowed us to segment the data efficiently, which resulted in a noticeably faster migration.

When AWS DMS tries to either get data from the source or put data into the target, it uses a default timeout of 60 seconds. If the source or target is heavily loaded, or there are locks on the tables, AWS DMS can't finish running those commands within 60 seconds. The task then fails with the error "canceling statement due to statement timeout", which is what we saw above.

-> Identify the cause of the long-running commands. The table might have been locked by another process, preventing DMS from reading the table data and ultimately causing the failure.

-> If any command is being blocked, find the PID of the blocking process and terminate it (see the query sketch after this list).

-> Increase the timeout value: since the issue happened during SOURCE_UNLOAD, it was recommended to increase the timeout on the source.

-> We can set "executeTimeout=3600;" in the "Extra connection attributes" for the source and target endpoints.
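
As an illustration, the queries below show one way to spot and terminate a blocking session on the PostgreSQL source. This is only a sketch, and pg_terminate_backend should be used with care on a production database.

-- Sessions that are currently blocked, together with the PIDs blocking them.
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       state,
       query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;

-- Terminate a specific blocking backend (replace 12345 with a PID found above).
SELECT pg_terminate_backend(12345);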

References:
============
[+]
https://repost.aws/knowledge-center/dms-error-canceling-statement-timeout

4) Error: WAL reader terminated with broken connection / recoverable error.
This indicates that there was a connection issue with the source during the CDC phase, while reading the WAL files.

“The wal_sender_timeout parameter ends replication connections that are inactive longer than the specified number of milliseconds. The default is 60000 milliseconds (60 seconds). Setting the value to 0 (zero) disables the timeout mechanism, and is a valid setting for DMS.”

Hence it is suggested to check and set wal_sender_timeout to zero, which disables the timeout mechanism for replication connections. After setting the timeout to zero, we should no longer see WAL timeout errors in either the DMS task log or the source error log.
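
On a self-managed PostgreSQL source, this can be checked and changed with something like the sketch below; on managed services such as Azure Database for PostgreSQL, the same parameter is instead changed through the server parameters.

-- Check the current value.
SHOW wal_sender_timeout;

-- Disable the timeout; requires sufficient privileges and a configuration reload.
ALTER SYSTEM SET wal_sender_timeout = 0;
SELECT pg_reload_conf();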

Additionally, set the following two parameters in the extra connection attributes on the source and target endpoints; this increases the timeout value and allows extended time for the SELECT statements to complete.
— — — — — — — — — -
executeTimeout=86400000;heartbeatEnable=true

References:
============
[+] wal_sender_timeout :
https://www.postgresql.org/docs/current/runtime-config-replication.html

[+]
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.PostgreSQL.html#CHAP_Source.PostgreSQL.Prerequisites

Here's an in-depth exploration of a few more strategies that we used during the migration.

Parallel Load and Filter Techniques:
We harnessed parallel load and filter strategies in our most complex migration scenarios; the notes below walk through how we applied them.

Mitigating Long-Running Commands and Optimizing Performance:
We also spent considerable time tackling long-running commands and tuning overall performance; the guidelines and references above capture the practices that worked best for us.

AWS DMS provides the filters option to limit the rows and break large tables into multiple chunks by creating a WHERE clause. These tasks work independently and can run concurrently. First, you need to find an efficient filter that can be used to break up the table.
During a full load migration task, you can accelerate the migration of large or partitioned tables by splitting the table into segments and loading the segments in parallel into the target.

Row filter for parallel load:

Date columns generally come in handy here: find a non-null date column and use it to define the segment boundaries. It's always a good idea to break the table into segments with roughly the same number of rows, so that the load is spread evenly.
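
As an illustration, a window-function query like the sketch below can suggest boundary dates that split a table into roughly equal segments; it assumes a hypothetical table public.big_table with a non-null created_at column, and the resulting boundaries would then be used in the task's row filters.

-- Boundary dates that divide the table into four segments of roughly equal row counts.
SELECT seg, max(created_at) AS upper_boundary
FROM (
    SELECT created_at,
           ntile(4) OVER (ORDER BY created_at) AS seg
    FROM public.big_table
) t
GROUP BY seg
ORDER BY seg;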

References:
============
[+]
https://aws.amazon.com/blogs/database/speed-up-database-migration-by-using-aws-dms-with-parallel-load-and-filter-options/
[+]
https://repost.aws/knowledge-center/dms-filter-task-by-date

We gained valuable experience throughout this migration and, through persistent trial and error, ultimately succeeded in completing it with DMS.

Conclusion:
Our journey through data migration proved to be an intensive learning experience, yielding valuable insights into troubleshooting various challenges. By sharing our experiences and solutions, we hope this blog serves as a guiding light for fellow migration enthusiasts, offering strategies to tackle issues head-on. Data migration is a dynamic process that demands adaptability, persistence, and collaboration. Embracing the trials and triumphs, we pave the way for smoother migration journeys for all.
