At DAZN we are currently migrating from an external monolith application towards microservices. To support this, we are using canary deployments to shift traffic gradually from the old system to the new one.
As such, we need to ensure data parity between the two systems: if a user signs up on our system, their data must sync to the old monolith, and vice versa.
To do this we use Lambda, DynamoDB, Kinesis, SNS and SQS. This post will outline how we achieved this, and where Kinesis fits into our architecture.
Our microservices write to DynamoDB tables. We use DynamoDB Streams to capture the changes and push them to a Kinesis stream. A Lambda function then converts each change into a custom event and pushes it to the external monolith via SNS and SQS.
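The conversion step can be sketched as a small pure function plus a thin handler. The field names below (`userId`, `eventType`) and the `publish` hook are illustrative assumptions, not our actual schema; in production the publish step would be an SNS `publish` call.

```python
import json

def to_custom_event(record):
    """Convert a DynamoDB Streams record into a custom event shape.

    The "userId" field and the flattened payload are illustrative; the
    real mapping depends on the table schema.
    """
    image = record["dynamodb"].get("NewImage", {})
    return {
        "eventType": record["eventName"],  # INSERT / MODIFY / REMOVE
        "userId": image.get("userId", {}).get("S"),
        # Strip the DynamoDB type descriptors ({"S": "GB"} -> "GB")
        "payload": {k: list(v.values())[0] for k, v in image.items()},
    }

def handler(event, context, publish=print):
    # In production `publish` would be something like:
    #   sns.publish(TopicArn=TOPIC_ARN, Message=json.dumps(custom_event))
    for record in event["Records"]:
        publish(json.dumps(to_custom_event(record)))
```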
We originally used the DynamoDB Stream as our sole event source.
A poll-based event source (e.g. DynamoDB Streams or Kinesis) will, on failure, retry until the data expires. We rely on this to ensure that if the external monolith is unresponsive we can retry until it recovers.
This means we have to handle corrupt or incorrectly formatted data in the Lambda itself. We push such data to a dead letter queue for investigation and allow the Lambda to finish execution without error.
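A minimal sketch of that pattern, assuming JSON payloads on a Kinesis batch; `process` and `send_to_dlq` are injected stand-ins for the real business logic and an SQS `send_message` call:

```python
import base64
import json

def handler(event, context, process=lambda e: None, send_to_dlq=lambda r: None):
    """Process a batch, routing unparseable records to a dead letter queue."""
    failed = 0
    for record in event["Records"]:
        try:
            payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
            process(payload)
        except (ValueError, KeyError):
            # Corrupt data: park it for investigation rather than raising,
            # which would make the poll-based source retry the whole batch.
            send_to_dlq(record)
            failed += 1
    # Returning normally tells Lambda the batch succeeded, so the stream
    # does not retry records we have deliberately dead-lettered.
    return {"processed": len(event["Records"]) - failed, "deadLettered": failed}
```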
DynamoDB Streams also allows us to replay data. It retains records for 24 hours, so if there were any issues with data synchronisation we could replay the last day's events to try to resolve them.
The problem was that any data synchronisation issue had to be caught, by us or our external provider, and rectified within a 24 hour window. This timeframe didn't give us enough confidence, so we had to find an alternative.
Kinesis let us keep all of this functionality while extending the retention window to 7 days and allowing us to replay events from a timestamp.
We added a new Lambda to forward events from DynamoDB Streams to Kinesis, and modified our existing Lambda to ingest data from the Kinesis stream. Thanks to the similarities between Kinesis and DynamoDB Streams, this involved minimal code changes.
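The forwarding Lambda can be sketched by building the `Records` list that `kinesis.put_records` expects; the partition-key choice here is an assumption on our part, not a detail from the post:

```python
import json

def to_kinesis_records(stream_event):
    """Map a DynamoDB Streams batch to the Records shape of
    `kinesis.put_records`. Deriving the partition key from the item's
    primary key keeps per-item ordering within a shard."""
    records = []
    for record in stream_event["Records"]:
        keys = record["dynamodb"]["Keys"]
        records.append({
            "Data": json.dumps(record).encode("utf-8"),
            "PartitionKey": json.dumps(keys, sort_keys=True),
        })
    return records

def handler(event, context):
    # In production a boto3 client created outside the handler would send
    # these:  kinesis.put_records(StreamName=STREAM, Records=...)
    return to_kinesis_records(event)
```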
One thing Kinesis provides that DynamoDB Streams does not is the AT_TIMESTAMP starting position. With it we can supply any UTC timestamp from the past 7 days and begin replaying events from that point. This gives us a huge amount of flexibility: rather than replaying the full 7 days of events and potentially delaying new ones, we can replay only the relevant subset.
We added a continuous integration job that takes a timestamp parameter and recreates the event source mapping with the starting position set to that timestamp. The events then replay through the Lambda and are sent back to the external monolith.
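As a sketch of what that job does, the replay boils down to the arguments passed to `lambda.create_event_source_mapping`; the function name, ARN and batch size below are placeholders:

```python
from datetime import datetime, timezone

def replay_mapping_config(function_name, stream_arn, timestamp):
    """Arguments for `lambda.create_event_source_mapping` that make the
    function re-consume the stream from `timestamp`.

    The job first deletes the existing mapping
    (`lambda.delete_event_source_mapping(UUID=...)`) before creating
    this one, since starting position is fixed at creation time.
    """
    return {
        "FunctionName": function_name,
        "EventSourceArn": stream_arn,
        "StartingPosition": "AT_TIMESTAMP",
        "StartingPositionTimestamp": timestamp,
        "BatchSize": 100,  # illustrative; tune for your workload
    }
```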
Adding Kinesis to our architecture gave us more flexibility in our disaster recovery plan. We retained the functionality of DynamoDB Streams while gaining a larger 7-day replay window, along with the AT_TIMESTAMP option to replay events from a chosen point in time.
With minimal changes to our codebase we were able to switch from DynamoDB Streams to Kinesis, which gave us additional confidence in our disaster recovery plans. Kinesis makes for an effective event broker, with tight Lambda integration, an easy-to-use interface, and replayability.
Interested in seeing how we work first hand? WE ARE HIRING!