How to Easily Replicate DynamoDB across Regions

Why would you want to replicate your data across several physical locations? Here is a short list off the top of my head:

  • Redundancy. If something bad happens to your data in one location, you still have copies in all the other locations. Note that multi-regional redundancy is much safer than replicating data across multiple servers within a single location; besides, AWS already does the latter for you automatically.
  • High availability. As long as at least one location is live, your service can keep functioning.
  • Co-location. Some applications need to be optimized for users distributed across the world. In that case you don’t want users from Europe hitting servers in Singapore, because the latency will be noticeable. Instead, you should deploy your application in several locations and route each user to the closest one.
  • Compliance requirements.

Now, I have already talked about the things you need to build a multi-regional application. However, DynamoDB deserves to be mentioned separately.

Beating Global Tables

DynamoDB Global Tables are the easiest way to have your data replicated across several regions.

However, there are drawbacks you should consider:

  • Since it’s a multi-master setup, some conflict resolution strategy has to be applied when the same record is updated simultaneously in several regions. AWS resolves conflicts by itself, so you have no control over that process.
  • There are many requirements for global tables; most of them make sense and are easy to follow. One, however, can be troublesome: AWS requires a global table to be completely empty before a new replica table can be added.

DynamoDB Cross-Region Replication Using Lambda

Basically, all you need for such replication is a very simple AWS Lambda that watches for changes in your DynamoDB tables (via DynamoDB Streams) and replicates them.

For this purpose, I’ve created a micro Lambda anyone may reuse: dynamodb-replicator. We have actually been using it in production for more than a year with no problems. Here is an example CloudFormation template:

If you want to use the dynamodb-replicator package from my S3 buckets, you will have to create the stack in two regions: eu-west-1 (use the bleshik bucket) and ap-southeast-1 (use the bleshik-singapore bucket). After that, you’ll have a DynamoDB table “Table” that is automatically replicated between those two regions. I tried creating a record in the table in the ap-southeast-1 region; here is what I ended up with in the other region:

__originRegion is a system attribute that is added automatically when a record is replicated. It ensures that already-replicated data is NOT replicated again, i.e. it prevents an infinite replication cycle.
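The core of such a replicator can be sketched as a pure transform over stream records. This is a minimal sketch with hypothetical names (toReplicaWrite and so on), not the actual dynamodb-replicator internals:

```javascript
// Turn one DynamoDB Streams record into a write request for a replica
// region, or null if the record must not be replicated again.
function toReplicaWrite(record, selfRegion) {
  if (record.eventName === "REMOVE") {
    // Deletions are replicated as deletes, keyed by the record's keys.
    return { action: "delete", key: record.dynamodb.Keys };
  }
  const image = record.dynamodb.NewImage;
  // Records already carrying __originRegion were produced by replication;
  // forwarding them again would cause an infinite replication loop.
  if (image.__originRegion) return null;
  // Stamp the item with the region it came from before writing it out.
  return {
    action: "put",
    item: { ...image, __originRegion: { S: selfRegion } },
  };
}
```

In the real Lambda you would apply this to every record in the incoming stream event and send each resulting write to every target region with the AWS SDK (PutItem or DeleteItem).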

Handling Conflicts

In order to keep your replicated tables in an (eventually) consistent state, you will have to set one system attribute on each record: __timestamp, which is simply a UTC timestamp.

In case of a conflict, the record with the larger __timestamp “wins”. Note that if you do NOT set this timestamp attribute, you might end up in an inconsistent state (not even an eventually consistent one).
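On the receiving side, last-writer-wins can be enforced by DynamoDB itself through a conditional write, so a stale replicated update is rejected atomically. A sketch of the PutItem parameters (DocumentClient-style values; the helper name is hypothetical, not part of dynamodb-replicator):

```javascript
// Build PutItem parameters that only overwrite an existing record when
// the incoming __timestamp is newer (last-writer-wins).
function conditionalPutParams(tableName, item) {
  return {
    TableName: tableName,
    Item: item,
    // Succeeds if the record doesn't exist yet, or is older than ours;
    // otherwise DynamoDB rejects the write with a condition failure.
    ConditionExpression: "attribute_not_exists(#ts) OR #ts < :ts",
    ExpressionAttributeNames: { "#ts": "__timestamp" },
    ExpressionAttributeValues: { ":ts": item.__timestamp },
  };
}
```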

This is similar to what AWS implemented for Global Tables, but if you want different behavior, feel free to fork dynamodb-replicator or send me a pull request making that behavior configurable.

Adding a New Region

It’s not trivial, but it is a quite doable three-step process, and it’s still easier than adding a replica to non-empty DynamoDB Global Tables:

  1. Create the required tables in the new region. Then add the new region to the replication list of regions of the replicator Lambdas in the other regions. With that done, the new region will start receiving new data.
  2. Choose any existing replica, dump its data using the scan operation (or any dump/restore utility for DynamoDB), and restore the dump in the new region by the same means.
  3. Deploy the replicator in the new region. Do not forget to list all the regions, including the new one, in its replication list of regions.
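For step 2, if you roll your own restore on top of scan, keep in mind that BatchWriteItem accepts at most 25 items per call, so the dump has to be chunked. A small hypothetical helper (any dump/restore utility does the same internally):

```javascript
// Split scanned items into BatchWriteItem payloads of at most 25
// PutRequests each, the maximum DynamoDB accepts per call.
function toBatchWriteRequests(tableName, items, batchSize = 25) {
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push({
      RequestItems: {
        [tableName]: items
          .slice(i, i + batchSize)
          .map((item) => ({ PutRequest: { Item: item } })),
      },
    });
  }
  return batches;
}
```

Each element of the returned array can be passed to BatchWriteItem as-is; remember to retry any UnprocessedItems the call returns.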

Remember that you must not use the new replica until you have completed all the steps.

Subscribing Lambdas to Changes

I must mention that we do NOT use CloudFormation for creating and changing DynamoDB tables (by changes I mean creating indexes, moving data, and other housekeeping). We have a separate migrations utility tool; I will definitely write an article about implementing such a tool in NodeJS and TypeScript.

The point is that without CloudFormation you need another way of subscribing your replicating Lambdas to the changes. We just use a tiny bash script that is executed during our deployment process. It is one of the most embarrassing things I’ve ever written, but it worked fine in the beginning and, you know, stayed in the codebase until now. Feel free to reuse it or make a better version:

STREAMS="`aws dynamodbstreams list-streams`"
if [ $? -ne 0 ]; then
    exit 1
fi
for stream in `echo $STREAMS | node -pe "JSON.parse(require('fs').readFileSync('/dev/stdin').toString()).Streams.filter(function(i) { return i.TableName.indexOf('$1_') == 0; }).map(function(s) { return s.StreamArn; }).join(\"\n\");"` ; do
    echo $stream
    aws lambda create-event-source-mapping --starting-position LATEST --event-source-arn $stream --function-name DynamoDbReplicatorLambda-$1
done
exit 0

The script expects one argument, an environment name. It subscribes the replicating Lambda (its name is suffixed with the environment name) to the changes of this environment’s DynamoDB tables (they are all prefixed with the environment name).
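The inlined node -pe one-liner is dense; the filtering it performs is equivalent to this plain function (the name is hypothetical, the behavior is the same):

```javascript
// Given the parsed JSON output of `aws dynamodbstreams list-streams`,
// return the stream ARNs of all tables prefixed with "<env>_".
function streamsForEnvironment(listStreamsOutput, env) {
  return listStreamsOutput.Streams
    .filter((s) => s.TableName.indexOf(env + "_") === 0)
    .map((s) => s.StreamArn);
}
```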

What’s Next?

Since we’re talking about DynamoDB, you might want to check out this article as well: How to migrate DBs with NodeJS and TypeScript, where I described a way of applying changes to DynamoDB tables using migrations.

Comments, likes, and shares are highly appreciated. Cheers! ❤️