Backing up an Amazon Web Services DynamoDB

Sketch by Annette Wilson

Backups with Data Pipeline

DynamoDB doesn’t really support backups as a first-class operation. Amazon suggests using AWS Data Pipeline instead. Through the AWS web console you can fairly quickly and easily create a new pipeline that backs a table up to S3. Unfortunately, the console masks a lot of complexity, which leads to surprises as soon as you want to customise the pipeline to your needs.

Data Pipeline overview
Creating the pipeline through the console didn’t fit how we work, for a few reasons:
  • By policy we do not use the web console for deployments in production, and strongly prefer to use CloudFormation
  • We want to put our backups into Skyscanner’s isolated ‘data’ AWS account
  • By policy we do not grant AWS services broad permissions that would allow them to interfere with the private resources of other services running in the same AWS account
Data Pipeline copies data from the DynamoDB table to an S3 bucket in the same account

Cross-account S3 bucket access

Writing to S3 like this cross-account presents a number of problems. Firstly, we have to find a way to grant the data pipeline permission to write to the bucket in the data account. After that, we’ll worry about making sure it can be read back again.

A bucket policy in the data account can grant permissions to the production account (its root principal), and an IAM policy on the data pipeline’s role in the production account can then delegate that permission to the role.
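
Concretely, a minimal sketch of the bucket policy in the data account might look like the following (the bucket name and account ID are placeholders, not our real ones); an almost identical statement, minus the Principal, then goes into an IAM policy attached to the pipeline’s resource role in the production account:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowProductionAccountBackupAccess",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111111111111:root" },
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::example-dynamodb-backups",
        "arn:aws:s3:::example-dynamodb-backups/*"
      ]
    }
  ]
}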

Cross-account S3 object ownership

That’s all great, and is enough to save and even restore backups, but it suffers from a subtle but critical flaw. The backups are only readable by the account that wrote them. This is frustrating when it stops us from restoring a production backup into a test account to run destructive tests using production data. It could, however, render the backups useless if something bad happens to the production account, which is exactly why we’re writing them into another account in the first place. If some catastrophe prevents us from using the production account anymore, we want to be able to recreate everything in a fresh account.

S3 layers of permissions
S3 permissions come in layers: the bucket policy and IAM policies control who can act on the bucket, but each object also carries its own ACL, and by default only the account that wrote an object can read it. The fix is to have the EMR cluster that Data Pipeline launches write every object with the BucketOwnerFullControl canned ACL, which grants the bucket-owning account (our data account) full control. In Hadoop terms, that is the fs.s3.canned.acl property in core-site:

<property>
  <name>fs.s3.canned.acl</name>
  <value>BucketOwnerFullControl</value>
</property>

EMR expresses the same setting as a configuration object with the core-site classification:

{
  "classification": "core-site",
  "properties": {
    "fs.s3.canned.acl": "BucketOwnerFullControl"
  }
}

In a Data Pipeline definition, the EmrCluster object references an EmrConfiguration object, which in turn references a Property holding the key and value:

…
{
  "name": "EmrClusterForBackup",
  "id": "EmrClusterForBackup",
  …
  "configuration": { "ref": "EmrClusterConfigurationForBackup" }
},
{
  "name": "EmrClusterConfigurationForBackup",
  "id": "EmrClusterConfigurationForBackup",
  "type": "EmrConfiguration",
  "classification": "core-site",
  "property": [{ "ref": "FsS3CannedAcl" }]
},
{
  "name": "FsS3CannedAcl",
  "id": "FsS3CannedAcl",
  "type": "Property",
  "key": "fs.s3.canned.acl",
  "value": "BucketOwnerFullControl"
}

And in the lower-level pipeline-object format that CloudFormation uses, the same three objects look like this:

{
  "Id": "EmrClusterForBackup",
  "Name": "EmrClusterForBackup",
  "Fields": [
    …
    { "Key": "configuration", "RefValue": "EmrClusterConfigurationForBackup" }
  ]
},
{
  "Id": "EmrClusterConfigurationForBackup",
  "Name": "EmrClusterConfigurationForBackup",
  "Fields": [
    { "Key": "type", "StringValue": "EmrConfiguration" },
    { "Key": "classification", "StringValue": "core-site" },
    { "Key": "property", "RefValue": "FsS3CannedAcl" }
  ]
},
{
  "Id": "FsS3CannedAcl",
  "Name": "FsS3CannedAcl",
  "Fields": [
    { "Key": "type", "StringValue": "Property" },
    { "Key": "key", "StringValue": "fs.s3.canned.acl" },
    { "Key": "value", "StringValue": "BucketOwnerFullControl" }
  ]
}

Weighing up the Data Pipeline approach, here is what we liked:
  • It schedules and executes backups for us
  • Notifications are easy
  • It should scale to massive table sizes

And what we didn’t:
  • It is very slow to start up: each attempt to run a backup takes a minimum of 15 minutes, and by default it retries three times, so a failure can take 45 minutes
  • It is (relatively) expensive: every backup is charged a minimum of one hour of usage of quite a large EC2 instance (EMR doesn’t let us use a smaller instance size)
  • It is complex and difficult to understand
  • More work would still be required to minimise permissions in production
Sketch by Annette Wilson

Backups without Data Pipeline

We decided to investigate other options for backup. Fundamentally, the backup operation is not complex. It needs to perform a scan operation on the DynamoDB table, reading records and writing them to S3. It should be aware of the table’s provisioned throughput to avoid saturating it and preventing anyone else from reading the table.
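
To make that concrete, here is a rough sketch of such a loop in Node.js using the AWS SDK; the table name, bucket name and throughput figure are placeholders, and a real implementation would stream results to S3 rather than buffering the whole table in memory:

// A minimal sketch of the core backup loop (not our production script).
const AWS = require('aws-sdk');

const dynamo = new AWS.DynamoDB.DocumentClient({ region: 'eu-west-1' });
const s3 = new AWS.S3({ region: 'eu-west-1' });

const TABLE_NAME = 'my-table';                 // placeholder
const BUCKET = 'example-dynamodb-backups';     // placeholder
const READ_CAPACITY_PER_SECOND = 25;           // stay well below provisioned throughput

const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

async function backupTable() {
  const lines = [];
  let lastEvaluatedKey;

  do {
    // Read one page of the table, asking DynamoDB to report consumed capacity.
    const params = { TableName: TABLE_NAME, ReturnConsumedCapacity: 'TOTAL' };
    if (lastEvaluatedKey) params.ExclusiveStartKey = lastEvaluatedKey;
    const page = await dynamo.scan(params).promise();

    page.Items.forEach(item => lines.push(JSON.stringify(item)));
    lastEvaluatedKey = page.LastEvaluatedKey;

    // Throttle: if this page consumed N read units, wait long enough that we
    // average no more than READ_CAPACITY_PER_SECOND units per second.
    const consumed = page.ConsumedCapacity ? page.ConsumedCapacity.CapacityUnits : 0;
    await sleep((consumed / READ_CAPACITY_PER_SECOND) * 1000);
  } while (lastEvaluatedKey);

  // Write the dump as a single object, granting the bucket owner (the data
  // account) full control so it can actually read its own backups.
  await s3.putObject({
    Bucket: BUCKET,
    Key: `backups/${TABLE_NAME}/${new Date().toISOString()}.json`,
    Body: lines.join('\n'),
    ACL: 'bucket-owner-full-control',
  }).promise();
}

backupTable().catch(err => { console.error(err); process.exit(1); });

The script we actually run does essentially this, with the cross-account details handled as described below.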

The same cross-account concerns apply to this script as to Data Pipeline:
  • The script needs to run as a role with permission to write to the bucket
  • The bucket policy must also allow those operations
  • We need to set an appropriate ACL on any objects stored in the bucket

The last point needed a small patch to the backup script’s uploader, passing a canned ACL through to S3:

 var upload = new Uploader({
   accessKey: self.awsAccessKey,
   secretKey: self.awsSecretKey,
   region: self.awsRegion,
   bucket: self.bucket,
   objectName: path.join(backupPath, tableName + '.json'),
   stream: stream,
-  debug: self.debug
+  debug: self.debug,
+  objectParams: {
+    ACL: 'bucket-owner-full-control'
+  }
 });

We packaged the script in a Docker container and run it on our existing ECS cluster.

What we like about this approach:
  • We can see and understand all the parts involved
  • It uses Docker, for which we already have great internal tools to manage deployments
  • It took less time and was less confusing to set up than Data Pipeline
  • It runs on the ECS cluster we already have set up

The trade-offs:
  • It is harder to scale if the table gets really big
  • We need to do a bit more work if we want email notifications
  • We need to take care that our ECS cluster has enough capacity

