What you should know about DynamoDB Global Tables and Streams

As mentioned in a previous blog post, DAZN is a global sports streaming application that aims to provide a fluent, multi-region experience for users. We use DynamoDB global tables to ensure high availability, low turnaround, and eventually consistent data across regions.

We are also in the process of migrating user data from our old monolith service (run by an external company) to our new in-house microservices architecture. We use a canary release to limit the blast radius and to assess the stability of our new service. We therefore need to keep data consistent between the old monolith and the new microservices.

Our architecture is multi-regional. We ingest data in one region and rely on DynamoDB global tables to replicate it to the other regions. We also need to synchronize data changes between the monolith and the microservices. We rely on Lambdas and DynamoDB Streams to send our changes to the monolith application.
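As a rough illustration of that last step, a stream-triggered Lambda can be sketched as below. This is a minimal sketch and not our production code: the types are trimmed-down local stand-ins for the DynamoDB Streams event shape, and makeHandler and the forward callback are hypothetical names for whatever transport delivers a change to the monolith. It forwards every record it receives, with no filtering.

```typescript
// Trimmed-down local stand-ins for the DynamoDB Streams event shape.
type AttributeValue = { S?: string; N?: string; BOOL?: boolean };
type Image = Record<string, AttributeValue>;

interface StreamRecord {
  eventName: "INSERT" | "MODIFY" | "REMOVE";
  dynamodb: { Keys: Image; NewImage?: Image; OldImage?: Image };
}

interface StreamEvent {
  Records: StreamRecord[];
}

// Hypothetical transport: whatever call delivers one change to the monolith.
type Forwarder = (record: StreamRecord) => Promise<void>;

// Builds a Lambda-style handler that forwards every stream record it receives.
function makeHandler(forward: Forwarder) {
  return async (event: StreamEvent): Promise<number> => {
    let forwarded = 0;
    for (const record of event.Records) {
      // Await each call so records are forwarded in the order they arrived.
      await forward(record);
      forwarded += 1;
    }
    return forwarded;
  };
}
```

Passing the transport in as a parameter keeps the handler easy to exercise in tests without a real monolith endpoint.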

We noticed that we always received two change records for every insert or modification. After some thought, we realised that this was due to the internal behaviour of DynamoDB Global Tables.

AWS automatically adds a few attributes to a record when using a Global Table: aws:rep:deleting is a flag that indicates whether the record has been deleted, aws:rep:updatetime is a unix timestamp that records when the change happened on the local database, and aws:rep:updateregion contains the region in which the update was made.

These two events would occur in quick succession. Notice that the ids are the same and that the aws fields have been added.

A DynamoDB Stream Record can provide a NewImage for newly updated data and an OldImage of the previous data. We noticed that the first record would contain only changes in the NewImage that we had made, and the second record would include updated aws:rep:deleting, aws:rep:updatetime and aws:rep:updateregion attributes.
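To make the difference between the two records concrete, here is an illustrative pair. Every value is made up (the id, the email attribute, the timestamp and the region), the types are simplified local stand-ins, and we assume the second record arrives as a MODIFY. The hypothetical addedAttributes helper lists the attributes that appear in NewImage but not in OldImage.

```typescript
type AttributeValue = { S?: string; N?: string; BOOL?: boolean };
type Image = Record<string, AttributeValue>;

interface StreamRecord {
  eventName: "INSERT" | "MODIFY" | "REMOVE";
  dynamodb: { Keys: Image; NewImage?: Image; OldImage?: Image };
}

// First record: our own write. NewImage holds only what we stored.
const ourWrite: StreamRecord = {
  eventName: "INSERT",
  dynamodb: {
    Keys: { id: { S: "user-123" } },
    NewImage: { id: { S: "user-123" }, email: { S: "fan@example.com" } },
  },
};

// Second record: same id, but NewImage now carries the aws:rep:* attributes.
// (Assumed to be a MODIFY; all values here are illustrative.)
const replicationWrite: StreamRecord = {
  eventName: "MODIFY",
  dynamodb: {
    Keys: { id: { S: "user-123" } },
    OldImage: { id: { S: "user-123" }, email: { S: "fan@example.com" } },
    NewImage: {
      id: { S: "user-123" },
      email: { S: "fan@example.com" },
      "aws:rep:deleting": { BOOL: false },
      "aws:rep:updatetime": { N: "1545216000.000001" },
      "aws:rep:updateregion": { S: "eu-central-1" },
    },
  },
};

// Names of attributes present in NewImage but absent from OldImage.
function addedAttributes(record: StreamRecord): string[] {
  const oldImage = record.dynamodb.OldImage ?? {};
  const newImage = record.dynamodb.NewImage ?? {};
  return Object.keys(newImage).filter((name) => !(name in oldImage));
}
```

Calling addedAttributes(replicationWrite) on this pair returns only the three aws:rep:* attribute names, which is the pattern we saw in our logs.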

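While the cross-region replication logic for DynamoDB Global Tables is a black box to us, we reasoned that the following must be happening: our write is stored and emits the first stream record, then the replication process updates the same item with the aws:rep attributes, which emits the second.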

Since we are only interested in our changes, and want to ignore the internal changes by DynamoDB, we had to figure out a way to drop the duplicate records.

Our updates do not affect the aws:rep:updatetime attribute, while the AWS Blackbox record does. We used this information to determine whether or not we should forward events on. Our logic becomes…

When newTime is undefined, it’s the first insert of a record into the table. This can only occur on the local DynamoDB table, as any replication to other regions includes the aws:rep:updateregion and aws:rep:updatetime fields. We use this to determine whether the record is an update from us and, if it is, we consume it.

The aws:rep:updatetime attribute is only updated when the AWS Blackbox modifies the data. Modifications that leave aws:rep:updatetime unchanged (oldTime === newTime) are updates we have made; anything else is AWS noise.
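The two rules above can be sketched as a small predicate. This is a minimal reconstruction under stated assumptions rather than our exact code: the types are simplified local stand-ins, isOurChange and updateTime are hypothetical names, and aws:rep:updatetime is assumed to arrive as a DynamoDB number, which the stream delivers as a string under N.

```typescript
type AttributeValue = { S?: string; N?: string; BOOL?: boolean };
type Image = Record<string, AttributeValue>;

interface StreamRecord {
  eventName: "INSERT" | "MODIFY" | "REMOVE";
  dynamodb: { Keys: Image; NewImage?: Image; OldImage?: Image };
}

const UPDATE_TIME = "aws:rep:updatetime";

// Reads aws:rep:updatetime from an image, or undefined if it is not there.
function updateTime(image?: Image): string | undefined {
  return image?.[UPDATE_TIME]?.N;
}

// True when the record reflects a change we made, false for replication noise.
function isOurChange(record: StreamRecord): boolean {
  const oldTime = updateTime(record.dynamodb.OldImage);
  const newTime = updateTime(record.dynamodb.NewImage);

  // No timestamp on the new image: first local insert, so it is ours.
  if (newTime === undefined) {
    return true;
  }
  // Timestamp untouched: our modification. Timestamp changed: AWS noise.
  return oldTime === newTime;
}
```

A consumer would call isOurChange on each record and forward only those for which it returns true. Note that a record without a NewImage (such as a delete) also passes the first rule in this sketch; how deletes should be treated is not covered here.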

We noticed these changes when testing a feature addition to our Event Consumer. When searching our logs, we saw that every time we inserted or modified a row, there would be two change records.

We found the documentation on how Global Tables perform cross-region replication, and assumed that the additional fields would be added before insertion, not after.

This highlights the risk of making assumptions about, and relying on, the internals of a black-box system.

It’s only thanks to our logging that we were able to catch this bug. We also improved our integration tests so that if AWS changes this behaviour in future, we will be able to catch it.
