Securely Syncing DBs with AWS DMS

Mark Carrington
Finimize Engineering
6 min read · Jan 6, 2020

tl;dr: How we set up AWS Data Migration Service to securely move data between our Production and Staging RDS DBs.

Migrating data across databases in a reliable and scalable way is hard, and even harder when you need to do this across isolated Virtual Private Clouds (VPCs). Read on to find out how we leveraged AWS services to bridge this chasm.

Finimize iOS App — Packs Screen

The Finimize app is very data-driven, and sensitive to what our content team put in the database.

Because of this, it was always a problem getting staging data to look like production. This made testing changes there difficult, and often failed to reveal potential performance issues under real-life conditions.

Given these problems, it was clear we needed a solution that allowed us to bring production-like data into our staging environment. We looked at a number of potential solutions, but the one that seemed to make the most sense was AWS Data Migration Service (DMS). We liked the idea of using a managed service that was highly configurable, could operate at scale, and allowed us to move data securely.

Securely Connecting our Production and Staging DBs

The first step when configuring DMS is to securely connect the Production and Staging Databases. There are various ways this can be done; we opted to set up a new VPC with Peering Connections to the Staging and Production RDS VPCs.

A Peering Connection is a network connection between two VPCs using private IPv4 or IPv6 addresses. Instances in either VPC can communicate with each other as if they were within the same network.

To set up a VPC Peering connection you need to follow these steps:

  • Go to the VPC dashboard in the AWS console.
  • Click to create a Peering connection.
  • Choose the Requester and Accepter VPCs to create the connection.
  • Once created, the connection is in a ‘Pending Acceptance’ status; click to manually accept the request to make it active.
  • Create an entry in the route table for the Accepting VPC, where the destination is the CIDR of the Requester VPC, and the target is the Peering Connection.

For a more detailed guide to the steps above, the VPC Peering Medium article from Mohamed Jawad P is a great resource.
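
If you'd rather script these steps than click through the console, the same flow can be expressed with boto3. This is only a minimal sketch: the region, VPC IDs, route table ID and CIDR block below are placeholders you'd swap for your own, and you'd repeat the process for the second RDS VPC.

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")  # region is an assumption

# Request a peering connection between the new DMS VPC (requester)
# and one of the RDS VPCs (accepter).
peering = ec2.create_vpc_peering_connection(
    VpcId="vpc-0aaaaaaaaaaaaaaaa",      # placeholder: requester (DMS) VPC
    PeerVpcId="vpc-0bbbbbbbbbbbbbbbb",  # placeholder: accepter (Production RDS) VPC
)
peering_id = peering["VpcPeeringConnection"]["VpcPeeringConnectionId"]

# The connection sits in 'pending-acceptance' until it is accepted.
ec2.accept_vpc_peering_connection(VpcPeeringConnectionId=peering_id)

# Route in the accepter VPC's route table: destination is the requester VPC's CIDR,
# target is the peering connection. A mirror-image route on the requester side
# lets traffic flow back the other way.
ec2.create_route(
    RouteTableId="rtb-0ccccccccccccccc",  # placeholder: accepter VPC route table
    DestinationCidrBlock="10.1.0.0/16",   # placeholder: requester VPC CIDR
    VpcPeeringConnectionId=peering_id,
)
```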

VPC Peering Connection

Setting up DMS

We added a Source DMS endpoint connected to the Production RDS cluster, and a Target DMS endpoint connected to the Staging RDS cluster. Next we added a DMS Replication Instance, which is responsible for running the replication process. This Replication Instance sits in the newly created VPC, and has secure routes to each of the DMS endpoints. The instance is managed by DMS and is not treated like a regular EC2 instance; we are only billed for its hourly usage while running data replication tasks.
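
For reference, here's roughly what that setup looks like via boto3. It's a sketch rather than our exact configuration: the engine, hostnames, credentials, instance class and subnet group identifier are all placeholder assumptions.

```python
import boto3

dms = boto3.client("dms", region_name="eu-west-1")  # region is an assumption

def make_endpoint(identifier, endpoint_type, host):
    """Create a DMS endpoint; the engine and credentials here are placeholders."""
    return dms.create_endpoint(
        EndpointIdentifier=identifier,
        EndpointType=endpoint_type,   # "source" or "target"
        EngineName="postgres",        # assumed engine
        ServerName=host,
        Port=5432,
        DatabaseName="app",
        Username="dms_user",
        Password="********",
    )

source = make_endpoint("production-source", "source",
                       "prod-cluster.xxxxxx.eu-west-1.rds.amazonaws.com")
target = make_endpoint("staging-target", "target",
                       "staging-cluster.xxxxxx.eu-west-1.rds.amazonaws.com")

# The replication instance is placed in the new peering VPC
# via a DMS replication subnet group created in that VPC.
instance = dms.create_replication_instance(
    ReplicationInstanceIdentifier="prod-to-staging",
    ReplicationInstanceClass="dms.t2.medium",                    # placeholder class
    ReplicationSubnetGroupIdentifier="dms-peering-vpc-subnets",  # placeholder subnet group
    PubliclyAccessible=False,
    MultiAZ=False,
)
```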

To enable DMS to communicate with our Staging and Production RDS clusters, we also need to add an Inbound Rule to each of the RDS cluster Security Groups. For this we need the private IP of the network interface associated with our DMS replication instance. We can find this by taking the VPC ID and public IP of the DMS replication instance and finding the matching network interface in the AWS console. Bear in mind that if you select the Multi-AZ option in your DMS replication instance configuration, you'll need to repeat this for the failover network interface.
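
The same lookup and rule can be scripted with boto3. Again this is just a sketch; the VPC ID, public IP, security group ID and database port are placeholders.

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")  # region is an assumption

# Find the network interface DMS attached to the replication instance,
# filtering by the peering VPC and the instance's public IP (both placeholders).
enis = ec2.describe_network_interfaces(
    Filters=[
        {"Name": "vpc-id", "Values": ["vpc-0aaaaaaaaaaaaaaaa"]},
        {"Name": "association.public-ip", "Values": ["52.0.0.10"]},
    ]
)
private_ip = enis["NetworkInterfaces"][0]["PrivateIpAddress"]

# Allow that private IP to reach the RDS cluster on the DB port (5432 assumed).
ec2.authorize_security_group_ingress(
    GroupId="sg-0ddddddddddddddddd",  # placeholder: Production RDS security group
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 5432,
        "ToPort": 5432,
        "IpRanges": [{"CidrIp": f"{private_ip}/32",
                      "Description": "DMS replication instance"}],
    }],
)
```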

Network Architecture for the DMS-based solution

We then set up a Replication Task in DMS that connects the Source and Target endpoints with the Replication Instance, and allows us to configure the migration job settings. These settings include the following:

  • Task Configuration — For Migration type, DMS has three options. For our purposes we use ‘Migrate existing data’, as we only need to sync on a semi-regular basis.
  • Task Settings — We specify the Truncate table preparation mode (for the target DB) and enable CloudWatch logging for the task.
  • Table Mappings — Here we specify exactly which tables we want copied across. We use Selection Rules that include all tables in the public schema, and then add explicit exclude rules to blacklist tables with sensitive data (e.g. a user’s APNS or GCM push notification tokens) that we don’t want moved across. A sketch of the task definition follows this list.
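
Here's an illustrative version of that task definition via boto3. The selection-rule format is standard DMS table mapping JSON, but the endpoint and instance ARNs are placeholders and the excluded table name is a hypothetical example of where push notification tokens might live.

```python
import json
import boto3

dms = boto3.client("dms", region_name="eu-west-1")  # region is an assumption

# Include every table in the public schema, then explicitly exclude sensitive tables.
table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-public-schema",
            "object-locator": {"schema-name": "public", "table-name": "%"},
            "rule-action": "include",
        },
        {
            "rule-type": "selection",
            "rule-id": "2",
            "rule-name": "exclude-push-tokens",
            "object-locator": {"schema-name": "public",
                               "table-name": "users_pushtoken"},  # hypothetical table
            "rule-action": "exclude",
        },
    ]
}

# Truncate target tables before loading, and send task logs to CloudWatch.
task_settings = {
    "FullLoadSettings": {"TargetTablePrepMode": "TRUNCATE_BEFORE_LOAD"},
    "Logging": {"EnableLogging": True},
}

task = dms.create_replication_task(
    ReplicationTaskIdentifier="prod-to-staging-full-load",
    SourceEndpointArn="arn:aws:dms:eu-west-1:123456789012:endpoint:PRODSOURCE",   # placeholder
    TargetEndpointArn="arn:aws:dms:eu-west-1:123456789012:endpoint:STAGINGTARGET",  # placeholder
    ReplicationInstanceArn="arn:aws:dms:eu-west-1:123456789012:rep:INSTANCE",     # placeholder
    MigrationType="full-load",  # the 'Migrate existing data' option
    TableMappings=json.dumps(table_mappings),
    ReplicationTaskSettings=json.dumps(task_settings),
)
```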

DMS sounds great, is this all we need?

Unfortunately DMS on its own is not enough to manage the entire syncing process. We initially tried to run the replication task after setting all of the above up, but ran into issues where data inserted by DMS violated foreign key constraints on the target Staging DB. To get around this we needed a way to temporarily remove all foreign key constraints before the DMS task started, and then add them back on completion. We also noticed that sequences (for auto-increment ids) were not being updated as part of the DMS sync, so we needed to correct these post-replication.
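
To give a flavour of those pre- and post-steps, here's a simplified sketch assuming a PostgreSQL target (the public-schema selection rules above imply Postgres) and psycopg2. The connection details and the example table/column in the sequence fix are placeholders, and this isn't our exact production script.

```python
import psycopg2

# Placeholder connection details for the Staging (target) database.
conn = psycopg2.connect(host="staging-cluster.xxxxxx.eu-west-1.rds.amazonaws.com",
                        dbname="app", user="admin", password="********")
conn.autocommit = True
cur = conn.cursor()

# 1. Before the DMS task: capture each foreign key's definition, then drop it.
cur.execute("""
    SELECT conrelid::regclass AS table_name,
           conname,
           pg_get_constraintdef(oid) AS definition
    FROM pg_constraint
    WHERE contype = 'f' AND connamespace = 'public'::regnamespace
""")
foreign_keys = cur.fetchall()

for table_name, conname, definition in foreign_keys:
    cur.execute(f'ALTER TABLE {table_name} DROP CONSTRAINT "{conname}"')

# ... the DMS replication task runs here ...

# 2. After the task: re-add the constraints from their saved definitions.
for table_name, conname, definition in foreign_keys:
    cur.execute(f'ALTER TABLE {table_name} ADD CONSTRAINT "{conname}" {definition}')

# 3. Fix auto-increment sequences so new inserts don't collide with copied ids.
#    'users' / 'id' are a hypothetical table and primary key column.
cur.execute("""
    SELECT setval(pg_get_serial_sequence('users', 'id'),
                  COALESCE((SELECT MAX(id) FROM users), 1))
""")
```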

Another consideration for us is that the DB tables have a corresponding (Django) data model in code, so we need to ensure we only run the DB sync when the data model on Production and Staging is the same.
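
One way to check this (a sketch, not necessarily how we do it) is to compare the applied Django migrations recorded in each database's django_migrations table; the hostnames and credentials are placeholders.

```python
import psycopg2

def applied_migrations(host):
    """Return the set of applied Django migrations on a database."""
    conn = psycopg2.connect(host=host, dbname="app",
                            user="readonly", password="********")
    with conn, conn.cursor() as cur:
        cur.execute("SELECT app, name FROM django_migrations")
        return set(cur.fetchall())

production = applied_migrations("prod-cluster.xxxxxx.eu-west-1.rds.amazonaws.com")
staging = applied_migrations("staging-cluster.xxxxxx.eu-west-1.rds.amazonaws.com")

if production != staging:
    raise RuntimeError("Data models differ between Production and Staging; skipping the sync")
```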

Tying this all together

Given all of the steps required before and after DMS replication, this process now looks a lot like a data workflow, and we wanted a nice way to tie it all together. Enter AWS Step Functions. This is a serverless orchestration service where you can configure state machines to model workflows like this very simply using JSON. Step Functions lets you define complex branching and error handling out of the box, so we could ensure the process stopped safely if there was a problem at any point.
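
As a taste of what that looks like (the full workflow is covered in part two), here's a heavily trimmed state machine definition created via boto3. The Lambda function ARNs, role ARN and state names are hypothetical; the real workflow has more states, including the sequence fix and the check that the data models match.

```python
import json
import boto3

sfn = boto3.client("stepfunctions", region_name="eu-west-1")  # region is an assumption

# Drop the FK constraints, start the DMS task, then restore the constraints,
# routing to a failure state if any step errors.
definition = {
    "StartAt": "DropConstraints",
    "States": {
        "DropConstraints": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:eu-west-1:123456789012:function:drop-constraints",
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "SyncFailed"}],
            "Next": "StartDmsTask",
        },
        "StartDmsTask": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:eu-west-1:123456789012:function:start-dms-task",
            "Retry": [{"ErrorEquals": ["States.ALL"], "MaxAttempts": 2}],
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "SyncFailed"}],
            "Next": "RestoreConstraints",
        },
        "RestoreConstraints": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:eu-west-1:123456789012:function:restore-constraints",
            "End": True,
        },
        "SyncFailed": {"Type": "Fail", "Error": "DbSyncError"},
    },
}

sfn.create_state_machine(
    name="prod-to-staging-db-sync",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/step-functions-db-sync",  # placeholder
)
```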

Read the second part of this blog to find out how we automated this whole workflow using Step Functions, and the key learnings along the way.

Our Learnings

  • Setting up the network infrastructure to connect DMS across private VPCs was not straightforward. Although there are some recipe-based migration guides, the examples I saw didn’t cover the VPC peering case. In the end, a combination of AWS docs, online articles and forum posts allowed me to piece the puzzle together. I’ve tried to bring all of these learnings together here, so others don’t have to go through the same pain.
  • DMS is a bit of a black box. Given that this is a managed service, you might assume that once everything is connected and migration jobs are configured there’s nothing more to do. That proved not to be the case for us. We had to come up with a custom solution to work around DB integrity constraints and fix auto-increment sequences. Perhaps this is less of an issue if you use a Migration type that replicates ongoing data changes, but that approach would have been much more costly, given that we only want to perform periodic updates.
  • The Validation process, which you can toggle on or off for a Migration task, takes a significant amount of time to run. With a reasonably small dataset we saw the migration of data take around 2 minutes to complete, but the validation ran for over an hour. As DMS is billed by the hour, you should really consider whether you need this option for your use case.
  • At the time of writing there is a promotion running where DMS is free for 6 months if you are migrating to Amazon Aurora, Amazon Redshift, Amazon DynamoDB or Amazon DocumentDB (with MongoDB compatibility). Thank you AWS!
  • AWS Step Functions is the real MVP here! Being able to define semi-complex workflows with branching, error handling and retry logic in a small JSON document is very powerful. And it’s basically what’s made our solution manageable and maintainable. Read more about this in the second part of this blog.

If you need to move data across isolated DBs in AWS, I hope this helps alleviate some of the pain we had to go through.

New to Finimize? We’re on a mission to empower millennials to become their own financial advisors.

Find out more by subscribing to our financial email newsletter or downloading our app. By the way, we’re hiring!

(You can also get £20 off our premium subscription with this link!)
