DynamoDB and backups

Nate Kupp
Jul 8, 2017

12/21/2017 Edit: This is no longer relevant now that AWS has shipped backups for DynamoDB. Leaving it here for the sake of posterity :)

So how do we back up this data again?

I just saw Jono’s DynamoDB post fly by on HN, and wanted to share our experiences with DynamoDB and backups. We heavily use DynamoDB at Thumbtack (alongside PostgreSQL, our primary data store).

One of our biggest challenges with DynamoDB is creating backups. There is no great way to create point-in-time backups of production data in DynamoDB.

AWS Data Pipeline is an official data export solution, but (1) it does not produce a point-in-time snapshot, and (2) it does not adjust your table's read throughput, so you cannot easily export very large tables with it.

At Thumbtack, we ended up building our own batch export solution: it scales read throughput up to a significant multiple of our default production capacity, runs a complete batch export of the table as a Spark job, and then scales read throughput back down to normal production capacity (a rough sketch of this flow follows the list below). Still, there is some fine print involved:

  • DynamoDB silently re-shards your data at certain thresholds of read throughput, so you need to be cautious about scaling up read throughput for batch exports.
  • DynamoDB only permits ~3–4 changes to read throughput capacity per day, so you only get one attempt at a daily backup — otherwise you could be stuck at an excessively high read throughput value until the next day.
  • This still doesn’t get us point-in-time backups; some of our largest tables take O(hours) to export.
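
To make the scale-up / export / scale-down flow above concrete, here is a minimal sketch using boto3. It is not our production code: the table name, capacity values, and the run_spark_export placeholder are hypothetical, and in our actual pipeline the export itself is a Spark job.

```python
import boto3

# Illustrative values only -- not our production configuration.
TABLE_NAME = "my-production-table"   # hypothetical table name
BASELINE_READ_CAPACITY = 500         # normal provisioned read capacity
EXPORT_READ_CAPACITY = 5000          # temporarily boosted for the export

dynamodb = boto3.client("dynamodb")


def set_read_capacity(table_name, read_capacity):
    """Update provisioned read throughput and wait for the table to become ACTIVE again."""
    table = dynamodb.describe_table(TableName=table_name)["Table"]
    write_capacity = table["ProvisionedThroughput"]["WriteCapacityUnits"]
    dynamodb.update_table(
        TableName=table_name,
        ProvisionedThroughput={
            "ReadCapacityUnits": read_capacity,
            "WriteCapacityUnits": write_capacity,  # leave write throughput untouched
        },
    )
    # The table sits in UPDATING for a while after the change; wait for it to settle.
    dynamodb.get_waiter("table_exists").wait(TableName=table_name)


def run_spark_export(table_name):
    """Placeholder for the actual batch export (in our case, a Spark job)."""
    raise NotImplementedError


if __name__ == "__main__":
    set_read_capacity(TABLE_NAME, EXPORT_READ_CAPACITY)
    try:
        run_spark_export(TABLE_NAME)
    finally:
        # Always scale back down, keeping in mind the daily limit on throughput changes.
        set_read_capacity(TABLE_NAME, BASELINE_READ_CAPACITY)
```

The finally block is there because of the second bullet above: even if the export fails, you want to land back at normal read capacity rather than sit at the boosted value until the next day.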

Would love to hear from others if anyone has found a better way to back up large (100GB+) DynamoDB tables.


Nate Kupp

Data Infrastructure @ Instacart. Previous Elementl, Thumbtack, Apple.