101 level overview
Backups in AWS? What are you talking about? It’s all in the cloud, isn’t it?
Well, yes… and no. AWS has something called the Shared Responsibility Model. This model clearly states what is in AWS’s zone of responsibility and what is in yours. Even though the SRM is about security and compliance, the idea behind it is broader: AWS provides you (as a customer) with a hardware foundation, tools and ‘building blocks’, and it’s up to you to make them work securely and robustly. But maybe the SRM is too theoretical and loose an example. Let’s take a look at Amazon DynamoDB instead. AWS says that DynamoDB is
a fully managed, multiregion, multimaster database with built-in security, backup and restore, and in-memory caching for internet-scale applications
Sounds really cool! What should I be worried about? For starters, imagine your GDPR service having a bad day and erasing not one, but all clients’ emails from a DynamoDB table. Luckily, DynamoDB has point-in-time recovery… wait, what do you mean it’s not enabled by default and we have never enabled it? Or imagine a small mistake sneaking into your Terraform manifest, and the whole DynamoDB table is gone. And you never wrote a Lambda to create the table’s backups. Or you do have DynamoDB backups, but something very, very bad happened and the whole production account was deleted. With all tables and all backups (DynamoDB backups are stored in the same account as the DynamoDB table). Do you have a copy of the data in another AWS account?
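On the bright side, enabling point-in-time recovery is a single API call. A minimal boto3 sketch (the table name is a made-up placeholder) might look like this:

```python
def pitr_params(table_name):
    # Request payload for update_continuous_backups, kept as a
    # plain dict so it is easy to inspect and unit-test.
    return {
        "TableName": table_name,
        "PointInTimeRecoverySpecification": {
            "PointInTimeRecoveryEnabled": True,
        },
    }

def enable_pitr(table_name):
    # boto3 is imported here so the payload builder above
    # stays usable without AWS credentials installed.
    import boto3
    client = boto3.client("dynamodb")
    return client.update_continuous_backups(**pitr_params(table_name))

# enable_pitr("clients-emails")  # "clients-emails" is a hypothetical table
```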
I hope I have scared you enough :) Let’s take a look together at how data backup can be set up for the most popular AWS resources, such as S3, Aurora/RDS and DynamoDB. But before going there, it might be useful to think about which data loss scenarios we are trying to protect ourselves and our business from.
Scenario 1: programmatic error, or “something is wrong in my code”. Something can go wrong. All the time. The list of items you passed to a function was way too long. Or a loop was wrong. Or exception handling was not so awesome or well-documented.
Scenario 2: infrastructure-as-code error. A human mistake or a not-so-ideal CI/CD pipeline for Terraform and whoosh: DynamoDB (or RDS, or S3) itself is gone. Or someone deleted the whole AWS account; shit happens.
Scenario 3: an AWS Region is down. Well, you know: a tsunami or a war. Not the most realistic scenario, but regulations are regulations.
Scenario 4: the AWS account has been removed. Well, you know: vengeful employees, dumb mistakes; bad stuff happens.
How do all 4 scenarios look in comparison?
So, it’s quite clear that AWS covers 5 out of 6 of all data backup cases out of the box:
- S3 has versioning and cross-account replication.
- Amazon Aurora has both scheduled and on-demand snapshots. Read replicas and multi-region deployments provide significant data availability.
- Amazon DynamoDB has snapshots, point-in-time recovery and DynamoDB global tables to cover most of the cases.
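Most of these built-in features still have to be switched on explicitly. For example, enabling S3 versioning is one boto3 call away (the bucket name below is a made-up placeholder):

```python
def versioning_params(bucket):
    # Request payload for put_bucket_versioning, split out
    # so it can be unit-tested without touching AWS.
    return {
        "Bucket": bucket,
        "VersioningConfiguration": {"Status": "Enabled"},
    }

def enable_versioning(bucket):
    # boto3 is imported lazily so the helper above needs no AWS deps.
    import boto3
    s3 = boto3.client("s3")
    s3.put_bucket_versioning(**versioning_params(bucket))

# enable_versioning("my-production-data")  # hypothetical bucket name
```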
However, there are 2 situations where some additional code has to be written:
- cross-account Aurora/RDS snapshot copying
- cross-account DynamoDB data copying
To address the first problem, the RDS snapshot tool can be used. Being maintained by AWS Labs, it provides quite rich functionality: tag-based backups, scheduling and copying snapshots to another AWS account.
Writing your own tooling is an alternative. If you are using Python, the copy_db_snapshot method of the boto3 RDS.Client class is what you are looking for.
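A rough sketch of such tooling, assuming a snapshot is first shared from the source account and then copied from inside the destination account (all account IDs and snapshot names below are made up):

```python
def share_and_copy_params(snapshot_arn, target_account, target_name):
    # Payload for modify_db_snapshot_attribute (run in the SOURCE account):
    # grants the destination account permission to copy/restore the snapshot.
    share = {
        "DBSnapshotIdentifier": snapshot_arn.split(":")[-1],
        "AttributeName": "restore",
        "ValuesToAdd": [target_account],
    }
    # Payload for copy_db_snapshot (run in the DESTINATION account).
    copy = {
        "SourceDBSnapshotIdentifier": snapshot_arn,
        "TargetDBSnapshotIdentifier": target_name,
    }
    return share, copy

def copy_snapshot_cross_account(snapshot_arn, target_account, target_name):
    # boto3 is imported here so the payload builder stays testable without AWS.
    import boto3
    share, copy = share_and_copy_params(snapshot_arn, target_account, target_name)
    boto3.client("rds").modify_db_snapshot_attribute(**share)
    # NOTE: this second call must run with credentials
    # belonging to the destination account.
    boto3.client("rds").copy_db_snapshot(**copy)
```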
To have cross-account data replication for Amazon DynamoDB, you can use:
- Amazon DynamoDB Streams (to track changes)
- AWS Lambda (to send changes to the destination account)
The final solution schema looks like the following: