Migrating DynamoDB from one AWS account to another with Zero Downtime
Intuit is moving fast from its private data centers to AWS; all new services are written for and hosted in AWS. Initially, teams created their own AWS accounts and deployed their services there. Later we decided to migrate them all to central AWS accounts, for obvious reasons (easier maintenance, less inter-VPC traffic, etc.).
One such service used DynamoDB as its persistence layer, so we needed to move its DynamoDB tables from one AWS account to another.
Service Migration
There are two important aspects of migrating a service:
Application Migration
This is the easier part: you just need to bring up new servers in the other AWS account, test them out, and do a DNS cutover whenever you are ready. You may consider percentage-based dialing vs. a big-bang cutover depending on your traffic and usage patterns.
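If you go the percentage-based route, Route 53 weighted records are one way to dial traffic gradually. Below is a minimal boto3 sketch; the hosted zone ID, record name, and load balancer DNS names are placeholders, not values from our setup:

# Sketch: shift a percentage of traffic to the new account by adjusting
# the weights on two Route 53 weighted CNAME records.
import boto3

route53 = boto3.client("route53")

HOSTED_ZONE_ID = "ZXXXXXXXXXXXXX"        # placeholder hosted zone
RECORD_NAME = "myservice.example.com."   # placeholder service DNS name

def set_weights(old_weight, new_weight):
    changes = []
    for identifier, target, weight in [
        ("old-account", "old-elb.us-west-2.elb.amazonaws.com", old_weight),
        ("new-account", "new-elb.us-west-2.elb.amazonaws.com", new_weight),
    ]:
        changes.append({
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": RECORD_NAME,
                "Type": "CNAME",
                "SetIdentifier": identifier,
                "Weight": weight,
                "TTL": 60,
                "ResourceRecords": [{"Value": target}],
            },
        })
    route53.change_resource_record_sets(
        HostedZoneId=HOSTED_ZONE_ID,
        ChangeBatch={"Changes": changes},
    )

set_weights(90, 10)   # start by sending 10% of traffic to the new account

Keep increasing the weight on the new account as you gain confidence, and delete the old record set once the cutover is complete.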
Data Migration
Before you can dial traffic to the new AWS account, you need to move your data from the old database to the new one. In our case the service was already in production, so we needed both a one-time Bulk Data Migration and a Delta Migration (for new data written during the bulk migration).
Bulk Data Migration
We evaluated the options below.
=> DynamoDB Backups: AWS provides an option to back up tables via the AWS console, but the backup is only available within the same AWS region and AWS account, so it could not be used in our case.
=> AWS Data Pipeline: AWS Data Pipeline provides a pre-defined template to dump a DynamoDB table to S3 and another template to restore from S3 to DynamoDB.
Annette Wilson has described it nicely in this article.
They were able to use it successfully in their POC. Unfortunately, I couldn't, because we don't use the default VPC at Intuit. Data Pipeline failed to bring up the cluster, got stuck at the WAITING_FOR_RUNNER step, and did not write any logs to S3. When we contacted the support team, they responded with:
We note that no logs are present on S3 due to the fact that the instances are failing to communicate with the datapipeline service. This is a requirement of the service.
I am taking a look at some of the internal logs and this seems to point to an internal name resolution issue. This should not be occurring. I am investigating with the datapipeline service team as to possible reasons why this should be.
— — — — —
This error may be related due to the fact that you do not have a default VPC set in the region you are running the pipeline.
Our security guidelines did not allow us to use the default VPC, especially in a shared account used by multiple services.
=> dynamo-backup-to-s3 / dynamo-restore-from-s3: In the same article, Annette Wilson described how they finally used the dynamo-backup-to-s3 Node.js module.
The module has 97 stars and 63 forks and is used by a few other companies, such as Skyscanner, for backing up data to another AWS account. However, there are some limitations to this script:
- It creates a single S3 file as the backup, which slows down the restore because it is a synchronous read from one S3 file. This should not be a problem if your data is not huge; we tested it with around 10 GB of data and it worked fine. You can either split the files and live with this, or modify the module to write to multiple files.
- Write capacity is limited because of the synchronous write to one partition at a time (data for one partition is sequential in the file), which limits writes to 1000 TPS (500 TPS for a global table).
- Global tables were not replicating: AWS keeps some extra fields to manage replication, and these fields were also coming through in the backup, so AWS thinks those records have already been replicated. I fixed this by removing the extra AWS fields while taking the backup and raised a pull request to the original repo. You might want to use the code from my forked repo for global table support (a rough sketch of the idea follows this list).
- It does not give you very good control over WCUs; I added a few sleep statements in my forked repo to work around this, which might not be needed in your case.
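For context, the global tables fix boils down to dropping DynamoDB's replication bookkeeping attributes (the aws:rep:-prefixed fields that 2017-version global tables add to every item) from each record before it is written to the backup file, so the restored items look like fresh writes and get replicated in the destination account. The module itself is Node.js; what follows is only a rough Python sketch of the idea, not the actual fork code:

# Rough sketch: scan a table for backup and strip the replication
# bookkeeping attributes (aws:rep:*) from every item. Table name and
# region are placeholders.
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-west-2")

def strip_replication_attributes(item):
    # item is in the low-level DynamoDB JSON format returned by scan()
    return {k: v for k, v in item.items() if not k.startswith("aws:rep:")}

def scan_for_backup(table_name):
    paginator = dynamodb.get_paginator("scan")
    for page in paginator.paginate(TableName=table_name):
        for item in page["Items"]:
            yield strip_replication_attributes(item)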
Here are typical backup and restore commands:
#Backup the table to an S3 bucket
./bin/dynamo-backup-to-s3 -i $table_name --read-percentage 0.5 --bucket $bucket_name --aws-region us-west-2 --backup-path $table_name --global-table true

#Restore the table from S3 to a table
./bin/dynamo-restore-from-s3 -s $s3_path -t $table_name --partitionkey partitionkey --sortkey sortkey --aws-region us-west-1 --overwrite true -c 20 -sf true
Delta Migration
Here comes the trickier part. The bulk migration can take some time (hours, for both backup and restore) depending on the data size, and we did not want to bring down our service during this period, so we also needed to take care of the delta records.
We can handle this via double writes / double reads. There are multiple options, the simplest one being:
Double write from the old account to the new account before starting the backup, but be careful while restoring the bulk data: it might overwrite more recent data.
Typically you would modify your application code to write to both accounts, but since we are in AWS, we can use an AWS Lambda on DynamoDB Streams to copy new data to S3, and then another AWS Lambda, triggered by S3 event notifications, to write to the DynamoDB table in the new account.
This keeps your application code clean: no changes to your already complex code, no cleanup required after migration, just cut off the Lambda triggers once you are done with the migration.
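To make the first half of that pipeline concrete, here is a minimal sketch of what a DynamoToS3 handler can look like. It is not our exact function; the bucket name, the key layout, and the assumption that the stream is configured with new images are all placeholders:

# Sketch of a "DynamoToS3" handler: triggered by the DynamoDB stream on the
# source table, it dumps the new item images from each batch to S3.
# Assumes the stream view type includes NEW_IMAGE.
import json
import time
import boto3

s3 = boto3.client("s3")
BUCKET = "my-migration-delta-bucket"   # placeholder bucket in the source account

def handler(event, context):
    records = []
    for record in event["Records"]:
        if record["eventName"] in ("INSERT", "MODIFY"):
            # NewImage is in low-level DynamoDB JSON; keep it as-is so the
            # restore side can write it back without re-marshalling.
            records.append(record["dynamodb"]["NewImage"])

    if not records:
        return

    key = "delta/{}-{}.json".format(int(time.time() * 1000), context.aws_request_id)
    s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(records))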
Here is how it looks

On the day of Cutover
- Add a Lambda trigger on the source table to write to S3
- Start the full DB backup
- Restore the dump to the destination table
- Create an S3 bucket in the new account with a Lambda event to write to the DynamoDB table
- Sync the source S3 files to the destination S3 bucket.
- Here you might want a cron job that does the sync every minute, OR modify the Lambda function on the source to write directly to the S3 bucket in the destination account (see the sketch after this list)
- Add the Lambda trigger on the destination table for rollback
- Test out the new setup and switch the traffic
- Have a cup of coffee and monitor the logs
- After a few days, clean up the S3 buckets, Lambdas, etc.
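For the sync step above, the simplest option is running aws s3 sync from a one-minute cron. A boto3 equivalent, shown only as a sketch with placeholder bucket names, looks like this; re-copying an object simply re-triggers the destination Lambda, which the timestamp check described in the next section keeps safe:

# Sketch of the cron-driven sync: copy delta files from the source bucket
# to the destination bucket. The caller needs read access on the source
# bucket and write access on the destination bucket.
import boto3

s3 = boto3.client("s3")
SOURCE_BUCKET = "my-migration-delta-bucket"        # placeholder, source account
DEST_BUCKET = "my-migration-delta-bucket-dest"     # placeholder, destination account

def sync_delta_files():
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=SOURCE_BUCKET, Prefix="delta/"):
        for obj in page.get("Contents", []):
            # Server-side copy; no download/upload through the cron host.
            s3.copy_object(
                Bucket=DEST_BUCKET,
                Key=obj["Key"],
                CopySource={"Bucket": SOURCE_BUCKET, "Key": obj["Key"]},
            )

if __name__ == "__main__":
    sync_delta_files()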
A few important things
- The DynamoToS3 function (the trigger on the DynamoDB table) does not have a parallel-execution problem, because DynamoDB Streams takes care of that. But S3ToDynamoDB can: it is triggered as soon as you put a new object in S3 and does not wait for the previous Lambda to finish. So you need a way to decide whether to overwrite data in the table with the new event; we did that based on a timestamp (a sketch of this guard follows this list).
- You might need to tune the Lambda settings: batch size, parallel execution rate, timeout, etc.
- We did not have a delete use case; if you do, you will need to give deletes some thought.
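Here is a minimal sketch of the timestamp guard in an S3ToDynamoDB handler. The destination table name and the lastUpdated attribute are assumptions about your schema; a conditional put lets DynamoDB reject stale writes atomically instead of doing a read-then-write:

# Sketch of an "S3ToDynamoDB" handler in the destination account: triggered by
# S3 object-created events, it writes each record to the destination table,
# but only if the incoming copy is at least as new as what is already stored.
import json
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
dynamodb = boto3.client("dynamodb")
TABLE_NAME = "my-destination-table"   # placeholder

def handler(event, context):
    for s3_record in event["Records"]:
        bucket = s3_record["s3"]["bucket"]["name"]
        key = s3_record["s3"]["object"]["key"]
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()

        for item in json.loads(body):   # items are in low-level DynamoDB JSON
            try:
                dynamodb.put_item(
                    TableName=TABLE_NAME,
                    Item=item,
                    # Overwrite only if the stored copy is missing or not newer;
                    # this guards against out-of-order parallel invocations.
                    ConditionExpression="attribute_not_exists(lastUpdated) OR lastUpdated <= :ts",
                    ExpressionAttributeValues={":ts": item["lastUpdated"]},
                )
            except ClientError as e:
                # A failed condition means our copy is stale, which is expected
                # and safe to skip; anything else should bubble up.
                if e.response["Error"]["Code"] != "ConditionalCheckFailedException":
                    raise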
Here are the DynamoToS3 and S3ToDynamoDB Lambda functions for your reference.
Innovate at Intuit
Intuit is hiring across the globe for multiple positions.
