As mentioned in a previous blog post, DAZN is a global sports streaming application that aims to provide a fluent, multi-region experience for users. We use DynamoDB's global tables to ensure high availability, low turnaround, and eventual data consistency between regions.
We are also in the process of migrating user data from our old monolith service (run by an external company) to our new in-house microservices architecture. We use a canary release to limit the blast radius and to assess the stability of our new service. We therefore need to keep data consistent between the old monolith and the new microservices.
Our architecture is multi-regional. We ingest data in one region and rely on DynamoDB's global tables to replicate data to other regions. We also need to synchronize data changes between the monolith and the microservices. We rely on Lambdas and DynamoDB Streams to send our changes to the monolith application.
We noticed that we always receive two duplicate change Records for every insert or modification. After some thought, we realised that it was due to the internal behaviour of DynamoDB Global Tables.
AWS automatically adds a few attributes to a record when using a Global Table:
aws:rep:deleting is a flag that indicates whether the record has been deleted,
aws:rep:updatetime is a Unix timestamp that records when the change happened on the local database, and
aws:rep:updateregion contains the region in which the update was made.
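To make the shape of these attributes concrete, here is a minimal sketch that separates the aws:rep:* attributes from the rest of a stream image. The `AttributeValue` and `Image` types and the `splitImage` helper are hypothetical names introduced for illustration, and the sketch assumes images arrive in DynamoDB's attribute-value JSON form, as they do in a Lambda stream event.

```typescript
// Minimal local model of a DynamoDB-JSON attribute value (assumption:
// only the value kinds needed for this illustration are listed).
type AttributeValue = { S?: string; N?: string; BOOL?: boolean };
type Image = Record<string, AttributeValue>;

// Prefix shared by the three attributes Global Tables add to each record.
const REPLICATION_PREFIX = "aws:rep:";

// Hypothetical helper: split an image into the attributes we wrote
// ourselves and the replication attributes added by AWS.
function splitImage(image: Image): { item: Image; replication: Image } {
  const item: Image = {};
  const replication: Image = {};
  for (const name of Object.keys(image)) {
    if (name.indexOf(REPLICATION_PREFIX) === 0) {
      replication[name] = image[name];
    } else {
      item[name] = image[name];
    }
  }
  return { item, replication };
}
```

Passing a NewImage through `splitImage` gives the application data in `item` and the three replication attributes, when present, in `replication`.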
A DynamoDB Stream Record can provide a NewImage for newly updated data and an OldImage of the previous data. We noticed that the first record would contain only changes in the NewImage that we had made, and the second record would include updated
While the cross region replication logic for DynamoDB Tables is a black box to us, we reasoned that the following must be happening.
Since we are only interested in our changes, and want to ignore the internal changes by DynamoDB, we had to figure out a way to drop the duplicate records.
Our updates do not affect the aws:rep:updatetime attribute, while the AWS Blackbox record does. We used this information to determine whether or not we should forward events on. Our logic becomes…
When newTime (the aws:rep:updatetime value in the NewImage) is undefined, it's the first insert of a record into the table. This can only occur on the local DynamoDB table, as any replication to other regions includes the aws:rep:updatetime field. We use this to determine whether the record is an update from us and, if it is, we consume it.
The attribute is only updated when the AWS Blackbox modifies the data. Modifications that do not update the aws:rep:updatetime attribute (oldTime === newTime) are updates we have made; otherwise it's AWS noise.
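The rules described above can be sketched as a small filter. This is an illustration under assumptions rather than our exact production code: the `StreamRecord` type is a pared-down local stand-in for a Lambda stream record, `updateTime` and `isOwnChange` are hypothetical names, and oldTime and newTime are assumed to be the aws:rep:updatetime values read from the OldImage and NewImage.

```typescript
// Pared-down local model of a DynamoDB stream record (assumption: only
// the fields this filter needs are included).
type AttributeValue = { S?: string; N?: string; BOOL?: boolean };
type Image = Record<string, AttributeValue>;
interface StreamRecord {
  dynamodb: { OldImage?: Image; NewImage?: Image };
}

const UPDATE_TIME = "aws:rep:updatetime";

// Read aws:rep:updatetime from an image, or undefined when the image or
// the attribute is absent.
function updateTime(image?: Image): string | undefined {
  const attr = image ? image[UPDATE_TIME] : undefined;
  return attr ? attr.N : undefined;
}

// True when the record looks like one of our own writes and should be
// forwarded; false when it looks like replication noise.
function isOwnChange(record: StreamRecord): boolean {
  const oldTime = updateTime(record.dynamodb.OldImage);
  const newTime = updateTime(record.dynamodb.NewImage);

  // No update time on the new image: first insert on the local table.
  // Records without a NewImage (such as removals) also take this branch;
  // deletes are not covered in this post.
  if (newTime === undefined) {
    return true;
  }

  // Unchanged update time: a modification we made. Anything else means
  // the replication logic touched the record.
  return oldTime === newTime;
}
```

A stream handler would call `isOwnChange` on each incoming record and forward only those for which it returns true.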
We noticed these changes when testing a feature addition to our Event Consumer. Searching our logs, we saw that every time we inserted or modified a row there were two change records.
We found the documentation on the internals of how Global Tables do cross region replication, and made an assumption that the additional fields would be added before insertion, not after.
This highlights the issue with making assumptions about, and relying on, the internals of a black-box system.
It’s only thanks to our logging that we were able to catch this bug. We also improved our integration tests so that if AWS changes this behaviour in future we will be able to catch it.
Following on from this post, there were some other issues with relying on streams from a global table.
We have a User History table that consumes the streams from the global tables. This table is single-region: it consumes from the streams in each region and records the changes made to each record.
We noticed that when firing updates in quick succession, where several updates touch the same field of a record, the streams we received did not contain all the modifications.
We deduced that the black box mentioned above only sends the records that would lead to the final state of the database, discarding any intermediate events.
This means that the Stream you receive from a global table only contains a subset of the changes actually made to the database.