Best Practices for Using AWS DMS: CDC on Super-Large Tables

Admiral Wiem
SiCepat Ekspres Tech
Mar 15, 2021
Continuous Data Replication (Source: https://aws.amazon.com/dms/)

Before we start, what is DMS (AWS Database Migration Service)? Basically, it's a managed migration tool with a GUI and plenty of metrics for monitoring through CloudWatch. You can view the documentation here: https://aws.amazon.com/dms/

In our case, SiCepat Ekspres is in the middle of migrating its whole architecture from a monolith to microservices. To feed data from the old, big monolith DB to the new, smaller microservice databases, we use Kafka as our messaging service. Every row insert and update must be delivered in near real time, which is why we chose DMS as the tool for CDC (change data capture) from the database into Kafka.

Here are some tips we found during a few months of testing DMS (CDC) from PostgreSQL 9.6 to Kafka:

1. Start DMS during low I/O traffic

When we started a DMS task for a high-traffic table (10 million rows/day or more), transaction log usage grew very high while CDC was running. Analyze your database traffic beforehand and start the replication task when traffic is at its lowest to avoid heavy transaction log usage (check your CloudWatch metrics!). Once DMS has started, monitor transaction log usage carefully so you catch unusual disk usage early.
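To find that quiet window, you can look at a write metric over the last few days and pick the hour of day with the lowest average. Below is a minimal sketch, assuming you have already fetched hourly datapoints of an RDS metric such as `WriteIOPS` (namespace `AWS/RDS`) with boto3's CloudWatch `get_metric_statistics`, whose `Datapoints` entries carry a `Timestamp` and an `Average`. The function name and the sample data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def quietest_hour(datapoints, stat="Average"):
    """Return the UTC hour of day (0-23) with the lowest mean value.

    `datapoints` is assumed to look like the "Datapoints" list returned by
    CloudWatch get_metric_statistics: dicts with a timezone-aware
    "Timestamp" and a statistic key such as "Average".
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for point in datapoints:
        hour = point["Timestamp"].astimezone(timezone.utc).hour
        totals[hour] += point[stat]
        counts[hour] += 1
    if not counts:
        raise ValueError("no datapoints to analyze")
    # Average per hour of day, then pick the hour with the smallest average.
    return min(counts, key=lambda h: totals[h] / counts[h])


# Synthetic example (not real traffic): two days of hourly WriteIOPS,
# quietest at 03:00 UTC.
sample = [
    {"Timestamp": datetime(2021, 3, day, hour, tzinfo=timezone.utc),
     "Average": 50.0 if hour == 3 else 400.0 + hour}
    for day in (1, 2)
    for hour in range(24)
]
print(quietest_hour(sample))  # 3
```

Averaging per hour of day (rather than taking the single lowest datapoint) smooths out one-off dips, so the chosen hour reflects a recurring quiet period.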

2. Tables used as materialized-view source data always caused DMS failures

Any table with high read/write traffic that is also used as source data for a materialized view made DMS fail when we started the task in CDC mode. Make sure you start DMS (CDC) when traffic is at its lowest point.

3. For high-traffic tables, "Batch Apply Enabled: true" in DMS is a must

For tables with more than 1 million updated rows per day, "Batch Apply Enabled: true" is the de facto setting to avoid high transaction log disk usage.
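In the task settings JSON this option lives at `TargetMetadata.BatchApplyEnabled`. Here is a small sketch that turns the 1 million rows/day rule of thumb into a partial settings document; the function name is hypothetical, and we assume DMS keeps its defaults for keys you leave out, so verify the resulting settings on your task. Whether batch apply is honored also depends on the target endpoint type, so check the DMS documentation for your target.

```python
import json

# Rule of thumb from the tip above: more than 1 million updated rows per day.
BATCH_APPLY_THRESHOLD_ROWS_PER_DAY = 1_000_000


def task_settings(updated_rows_per_day):
    """Build a partial DMS ReplicationTaskSettings JSON string.

    Only TargetMetadata.BatchApplyEnabled is set; every other task
    setting is intentionally left out of this sketch.
    """
    settings = {
        "TargetMetadata": {
            "BatchApplyEnabled": updated_rows_per_day > BATCH_APPLY_THRESHOLD_ROWS_PER_DAY
        }
    }
    return json.dumps(settings)


print(task_settings(3_000_000))  # {"TargetMetadata": {"BatchApplyEnabled": true}}
```

The resulting string is what you would pass as the `ReplicationTaskSettings` argument of boto3's DMS `create_replication_task` or `modify_replication_task`.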

4. Use separate DMS replication instances for high-traffic tables

DMS would fail if too many tasks were running on one replication instance. Consider splitting replication instances between jobs for big tables and jobs for small tables. Monitoring the EC2 instances behind your DMS replication instances is absolutely important: the more DMS tasks run on one instance, the more memory they take.
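One way to plan that split is sketched below: every big table gets its own replication instance, and small tables share instances up to a cap. The 10 million rows/day cut-off comes from tip 1, but `MAX_SMALL_TASKS_PER_INSTANCE`, the instance names, and the table names are all hypothetical; one instance per big table is our illustrative choice, not a DMS rule.

```python
from dataclasses import dataclass, field

# "Big" follows tip 1 (10 million rows/day or more).
BIG_TABLE_ROWS_PER_DAY = 10_000_000
# Hypothetical cap for shared instances; not a DMS limit.
MAX_SMALL_TASKS_PER_INSTANCE = 5


@dataclass
class InstancePlan:
    name: str
    tables: list = field(default_factory=list)


def plan_instances(table_traffic):
    """Assign tables (name -> rows/day) to replication instances.

    Each big table gets a dedicated instance; small tables are packed
    onto shared instances, at most MAX_SMALL_TASKS_PER_INSTANCE each.
    """
    plans = []
    small = []
    for table, rows in sorted(table_traffic.items()):
        if rows >= BIG_TABLE_ROWS_PER_DAY:
            plans.append(InstancePlan(f"dms-big-{len(plans)}", [table]))
        else:
            small.append(table)
    for i in range(0, len(small), MAX_SMALL_TASKS_PER_INSTANCE):
        chunk = small[i:i + MAX_SMALL_TASKS_PER_INSTANCE]
        plans.append(InstancePlan(f"dms-small-{i // MAX_SMALL_TASKS_PER_INSTANCE}", chunk))
    return plans


# Hypothetical tables and daily traffic.
traffic = {"orders": 12_000_000, "waybills": 25_000_000,
           "branches": 2_000, "couriers": 40_000}
for plan in plan_instances(traffic):
    print(plan.name, plan.tables)
# dms-big-0 ['orders']
# dms-big-1 ['waybills']
# dms-small-0 ['branches', 'couriers']
```

Keeping big tables isolated means a memory spike on one heavy CDC task cannot starve the small-table tasks sharing an instance.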

Setting up a CloudWatch alarm for your DMS tasks and integrating it with Slack (or any other channel) would be a nice option.
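As an illustration, the sketch below builds the keyword arguments for boto3 CloudWatch `put_metric_alarm` on the DMS task metric `CDCLatencyTarget` (namespace `AWS/DMS`, reported in seconds). We assume the task dimensions are `ReplicationInstanceIdentifier` and `ReplicationTaskIdentifier` and that the SNS topic is already wired to Slack (for example through AWS Chatbot or a small Lambda); the identifiers, ARN, and threshold are hypothetical, so confirm the exact dimension values in your CloudWatch console.

```python
def cdc_latency_alarm(instance_id, task_resource_id, sns_topic_arn,
                      threshold_seconds=600):
    """Build kwargs for CloudWatch put_metric_alarm on a DMS task.

    Fires when the average CDCLatencyTarget stays above the threshold
    for three consecutive 5-minute periods.
    """
    return {
        "AlarmName": f"dms-{task_resource_id}-cdc-latency",
        "Namespace": "AWS/DMS",
        "MetricName": "CDCLatencyTarget",
        "Dimensions": [
            {"Name": "ReplicationInstanceIdentifier", "Value": instance_id},
            {"Name": "ReplicationTaskIdentifier", "Value": task_resource_id},
        ],
        "Statistic": "Average",
        "Period": 300,
        "EvaluationPeriods": 3,
        "Threshold": float(threshold_seconds),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # SNS topic assumed to notify Slack
    }


params = cdc_latency_alarm(
    "my-dms-instance",    # hypothetical replication instance id
    "ABCDEFGHIJKLMNOP",   # hypothetical task resource id
    "arn:aws:sns:ap-southeast-1:123456789012:dms-alerts",  # hypothetical ARN
)
# With AWS credentials configured:
# import boto3
# boto3.client("cloudwatch").put_metric_alarm(**params)
```

Keeping the alarm definition as a plain dict makes it easy to review or generate one alarm per task before calling AWS.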

That’s all. Cheers!
