AWS DynamoDB: Backup and Restore Strategies

By: Shashank Anumula, Amit Naik

At Financial Engines, we are big fans of AWS and the “Serverless Paradigm” where we do not have to provision servers, in the form of EC2 instances, in order to get all the benefits of the cloud, such as on-demand scaling and per-call pricing.

DynamoDB is a popular NoSQL database offering from AWS that integrates very nicely in the serverless ecosystem. It is a fully managed, auto-scaling, serverless NoSQL database based on Amazon’s 2007 Dynamo paper [1].
An additional bonus is that it integrates well with AWS Lambda and S3, making DynamoDB the default choice for projects that can be rapidly deployed to production. Indeed, at Financial Engines, we have multiple microservices in production that use DynamoDB as the backing data store. Amazon has worked hard to continuously improve the enterprise aspects of DynamoDB, and one key requirement for wide enterprise adoption is backup and restore capabilities.


DynamoDB Back-up and Restore

Prior to June/July 2018, DynamoDB did not offer built-in back-up and restore capabilities that matched the ease and sophistication of RDS, AWS's managed relational database offering. There were two main methods that AWS suggested for DynamoDB back-up:

  1. Use AWS Data Pipeline to set up periodic backups of specific DynamoDB tables, which writes full point-in-time copies of those tables to Amazon S3 buckets, as outlined here: https://github.com/awslabs/dynamodb-continuous-backup
  2. Enable DynamoDB Streams to capture all table activity to S3 and use the resulting log output to reconstruct the table.

Compare these with the back-up methods available on AWS RDS, the SQL (RDBMS) service, which creates automated backups of the DB instance during the backup window with a couple of clicks (or AWS CloudFormation settings). These back-ups are available for archiving, restore, etc. The lack of a convenient “one-click” backup option was one of the pain points for DynamoDB enterprise adoption.

At AWS re:Invent 2017, there were a couple of announcements that went a long way toward addressing the enterprise aspects, especially those related to DynamoDB back-up and restore [2]:

  1. Point-in-Time Recovery for DynamoDB Tables
  2. On Demand Backups for DynamoDB

These were GA’d in July 2018 and are available for general use. Let us look at each of these two options in detail:


Point in Time Recovery (PITR):

AWS added a new feature called Point-in-time Recovery for DynamoDB [3]. Once this feature is enabled for a given table, DynamoDB maintains continuous backups of that table for the last 35 days and can restore it to any point in time between EarliestRestorableDateTime and LatestRestorableDateTime. LatestRestorableDateTime is typically 5 minutes behind the current time. PITR needs to be enabled on a per-table basis. We can enable it via the UI, API, CLI, or CloudFormation (the preferred way).

Below is a snippet to enable PITR via CloudFormation.

PointInTimeRecoverySpecification:
  PointInTimeRecoveryEnabled: true

The CloudFormation template (in YAML) for the DynamoDB table looks like this.

tableEmployee:
  Type: AWS::DynamoDB::Table
  Properties:
    TableName: employee
    PointInTimeRecoverySpecification:
      PointInTimeRecoveryEnabled: true
    ProvisionedThroughput:
      ReadCapacityUnits: 10
      WriteCapacityUnits: 20
    AttributeDefinitions:
      # Key attributes must be declared; a string (S) key is assumed here.
      - AttributeName: employeeId
        AttributeType: S
    KeySchema:
      - AttributeName: employeeId
        KeyType: HASH
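
For tables that are not managed through CloudFormation, the same setting can be applied through the API. The following is a minimal sketch, assuming Python with boto3; the helper name `enable_pitr` is hypothetical, and the client is passed in so that the function does not depend on any particular credential setup.

```python
def enable_pitr(client, table_name):
    """Turn on point-in-time recovery for one table and report its status.

    `client` is expected to be a boto3 DynamoDB client,
    e.g. boto3.client("dynamodb").
    """
    client.update_continuous_backups(
        TableName=table_name,
        PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
    )
    description = client.describe_continuous_backups(TableName=table_name)
    pitr = description["ContinuousBackupsDescription"]["PointInTimeRecoveryDescription"]
    # The restorable-window keys may be absent until PITR is fully enabled,
    # so they are read with .get() rather than indexed directly.
    return {
        "status": pitr["PointInTimeRecoveryStatus"],
        "earliest": pitr.get("EarliestRestorableDateTime"),
        "latest": pitr.get("LatestRestorableDateTime"),
    }
```

A call such as `enable_pitr(boto3.client("dynamodb"), "employee")` would return the PITR status together with the restorable window, which is handy when choosing a restore time later.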

Now that we have enabled PITR for our DynamoDB table, let's see how we can restore the data.

Restoring DynamoDB table via CLI:

The point-in-time recovery process always restores to a new table; it never overwrites the source table. If the application needs the restored data, it is up to the developer to point it at the restored table instead of the original.
Below is a CLI command to restore table “employee” to “employee-restore” to the latest restorable point.

aws dynamodb restore-table-to-point-in-time --source-table-name employee --target-table-name employee-restore --use-latest-restorable-time

In order to restore to a specific point in time, use the following command.

aws dynamodb restore-table-to-point-in-time --source-table-name employee --target-table-name employee-restore-pit --no-use-latest-restorable-time --restore-date-time 1534285294.754
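
The value passed to --restore-date-time above is a Unix epoch timestamp in seconds. As a small illustration, the sketch below (plain Python, standard library only; the function names are hypothetical) converts a timezone-aware datetime into that form and checks that it falls inside a table's restorable window.

```python
from datetime import datetime, timezone


def to_restore_timestamp(moment):
    """Convert a timezone-aware datetime to epoch seconds."""
    if moment.tzinfo is None:
        # A naive datetime would be interpreted in local time, which is
        # easy to get wrong when picking a restore point.
        raise ValueError("use a timezone-aware datetime")
    return moment.timestamp()


def is_restorable(moment, earliest, latest):
    """True if `moment` lies within [earliest, latest], inclusive."""
    return earliest <= moment <= latest


# 2018-08-14 22:21:34.754 UTC is the instant used in the command above.
target = datetime(2018, 8, 14, 22, 21, 34, 754000, tzinfo=timezone.utc)
print(to_restore_timestamp(target))
```

The `earliest` and `latest` arguments would normally be the EarliestRestorableDateTime and LatestRestorableDateTime values reported for the table.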

Along with data, the following settings are also restored to the new DynamoDB table:

  • Global secondary indexes (GSIs)
  • Local secondary indexes (LSIs)
  • Provisioned read and write capacity
  • Encryption settings

There are a few settings that will need to be manually set up on the restored table and are not copied over:

  • Auto scaling policies
  • AWS Identity and Access Management (IAM) policies
  • Amazon CloudWatch metrics and alarms
  • Tags
  • Stream settings
  • Time-to-Live (TTL) settings
  • Point-in-time recovery settings
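
Some of the settings in the list above can be re-applied with a short script once the restored table is active. The following is a rough sketch, assuming Python with boto3; it covers only TTL, tags, and PITR, and the function name `reapply_table_settings` is hypothetical. Auto scaling, IAM policies, alarms, and streams would need similar treatment with their own APIs.

```python
def reapply_table_settings(client, source_table, restored_table):
    """Copy TTL and tags from the source table, then enable PITR.

    `client` is expected to be a boto3 DynamoDB client. Returns the list
    of settings that were applied to the restored table.
    """
    applied = []

    ttl = client.describe_time_to_live(TableName=source_table)["TimeToLiveDescription"]
    if ttl.get("TimeToLiveStatus") == "ENABLED":
        client.update_time_to_live(
            TableName=restored_table,
            TimeToLiveSpecification={"Enabled": True, "AttributeName": ttl["AttributeName"]},
        )
        applied.append("ttl")

    source_arn = client.describe_table(TableName=source_table)["Table"]["TableArn"]
    restored_arn = client.describe_table(TableName=restored_table)["Table"]["TableArn"]
    # Only the first page of tags is read here; a fuller version would
    # follow NextToken. Tags with the reserved "aws:" prefix are skipped
    # because they cannot be set by users.
    tags = client.list_tags_of_resource(ResourceArn=source_arn).get("Tags", [])
    tags = [tag for tag in tags if not tag["Key"].startswith("aws:")]
    if tags:
        client.tag_resource(ResourceArn=restored_arn, Tags=tags)
        applied.append("tags")

    client.update_continuous_backups(
        TableName=restored_table,
        PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
    )
    applied.append("pitr")
    return applied
```

The restored table has to finish creating before these updates will succeed; with boto3, waiting on `client.get_waiter("table_exists")` before calling the function is one way to handle that.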

On Demand Backup

Creating on-demand backup for DynamoDB Tables

We can create on-demand backups via the CLI using the following command.

aws dynamodb create-backup --table-name employee --backup-name employee-backup

Backup names do not have to be unique: we can reuse the same backup name, and every backup gets its own ARN. We could also create a job that backs up the tables periodically.
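
One possible shape for such a periodic job is sketched below, assuming Python with boto3 and some external scheduler (for example a scheduled Lambda function) that invokes it. The function names, the timestamped naming scheme, and the 30-day retention default are illustrative assumptions, not AWS conventions.

```python
from datetime import datetime, timedelta, timezone


def backup_name_for(table_name, now):
    """Build a timestamped backup name such as employee-20180814T222134Z."""
    return f"{table_name}-{now:%Y%m%dT%H%M%SZ}"


def expired_backup_arns(backup_summaries, now, retention_days):
    """Return the ARNs of backups created before the retention cutoff."""
    cutoff = now - timedelta(days=retention_days)
    return [
        summary["BackupArn"]
        for summary in backup_summaries
        if summary["BackupCreationDateTime"] < cutoff
    ]


def run_backup_job(client, table_name, retention_days=30, now=None):
    """Create a new on-demand backup, then delete backups past retention.

    `client` is expected to be a boto3 DynamoDB client.
    """
    now = now or datetime.now(timezone.utc)
    created = client.create_backup(
        TableName=table_name, BackupName=backup_name_for(table_name, now)
    )
    # list_backups is paginated; this sketch only looks at the first page.
    summaries = client.list_backups(TableName=table_name, BackupType="USER").get(
        "BackupSummaries", []
    )
    for arn in expired_backup_arns(summaries, now, retention_days):
        client.delete_backup(BackupArn=arn)
    return created["BackupDetails"]["BackupArn"]
```

Pruning old backups matters because on-demand backups are billed until they are deleted.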

Restoring from Backup:

We can restore the table using the following command.

aws dynamodb restore-table-from-backup --target-table-name employee-restore --backup-arn arnresource

Note: The above command throws an error if the target-table exists already.
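
Because the restore fails when the target table already exists, a script can check for the table first and fail with a clearer message. This is a minimal sketch, assuming Python with boto3; the function name `restore_from_backup` is hypothetical.

```python
def restore_from_backup(client, backup_arn, target_table):
    """Restore a backup into `target_table` unless that table already exists.

    `client` is expected to be a boto3 DynamoDB client. Returns the status
    of the new table as reported by the restore call.
    """
    try:
        client.describe_table(TableName=target_table)
    except client.exceptions.ResourceNotFoundException:
        # No table with this name yet, so it is safe to restore into it.
        response = client.restore_table_from_backup(
            TargetTableName=target_table, BackupArn=backup_arn
        )
        return response["TableDescription"]["TableStatus"]
    raise ValueError(f"table {target_table!r} already exists; choose another name")
```

The check and the restore are two separate calls, so a table created in between would still make the restore fail; the sketch only improves the common case.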

There are other backup strategies that we could leverage for DynamoDB like creating a data pipeline to move the data from DynamoDB table to S3 or leveraging Global tables which can replicate the DynamoDB data from one region to another. We will discuss these in future posts.


Back-up Pricing

Point-in-time recovery

Point-in-time recovery (PITR) is charged based on the current size of each DynamoDB table (table data and local secondary indexes) where it is enabled. AWS will continue to bill you until you disable PITR on each table. As of late 2018, the rate was about $0.224 per GB-month for the us-west-1 (Northern California) region.

On-demand backup

With on-demand backups, you can create full backups of your Amazon DynamoDB table data and settings for data archiving. On-demand backup is charged based on the storage size of the table (in other words, the table data and local secondary indexes). The size of each backup is determined at the time of each backup request. The total backup storage size billed each month is the sum of all backups of DynamoDB tables in an AWS account by AWS Region. You will be billed for the total size of your backups for the month less a prorated credit for any backups that are deleted within that billing month. AWS will continue to bill you for on-demand backups at the same rate until you delete the backups. As of late 2018, the rate was about $0.112 per GB-month for the us-west-1 (Northern California) region.
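
To make the two pricing models concrete, here is a small worked estimate in Python. It uses the late-2018 us-west-1 rates quoted in this section; the 50 GB table size and the four retained backups are made-up inputs, and the calculation ignores the prorated credit for backups deleted mid-month.

```python
# USD per GB-month, us-west-1, late 2018 (rates quoted in this post).
PITR_RATE = 0.224
ON_DEMAND_RATE = 0.112


def pitr_monthly_cost(table_size_gb, rate=PITR_RATE):
    """PITR is billed on the current size of the table."""
    return table_size_gb * rate


def on_demand_monthly_cost(backup_sizes_gb, rate=ON_DEMAND_RATE):
    """On-demand backups are billed on the sum of all retained backups."""
    return sum(backup_sizes_gb) * rate


# A hypothetical 50 GB table with PITR enabled.
print(f"PITR: ${pitr_monthly_cost(50):.2f}")  # PITR: $11.20
# Four retained on-demand backups of the same 50 GB table.
print(f"On-demand: ${on_demand_monthly_cost([50, 50, 50, 50]):.2f}")  # On-demand: $22.40
```

The on-demand rate per GB is lower, but the total grows with every backup that is kept, so retention policy drives the cost.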


Conclusion

With AWS adding critical features such as the ability to easily back up DynamoDB data, DynamoDB should be an easy choice for future data storage scenarios that can benefit from a NoSQL data store. DynamoDB is a critical component in our microservice strategy at Financial Engines and we are looking forward to driving even more innovation in the days ahead.

References:

[1]: https://www.allthingsdistributed.com/2017/10/a-decade-of-dynamo.html

[2]: https://www.slideshare.net/AmazonWebServices/dynamodb-whats-new-dat304-reinvent-2017

[3]: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/PointInTimeRecovery.html