AWS Backups and Retention Policies | Part 2

Exequiel Barrirero · Published in binbash · 5 min read · Sep 20, 2021

Article based on our experience with dozens of AWS customer projects at https://www.binbash.com.ar

Recommended Solutions

This article follows AWS Backups and Retention Policies | Part 1

This article is based on the Binbash Leverage Reference Architecture for AWS backup feature. The original article that inspired it was written and shared by Diego Ojeda (DevOps Cloud Solutions & Software Architecture Consultant at Binbash).

Backup and Disaster Recovery Concepts

When you are dealing with disaster recovery and backups, there are two concepts that show up early on: RTO and RPO.

RTO (Recovery Time Objective) dictates how quickly your infrastructure needs to be back online after a disaster; it is typically a target time set for restoring services. In other words: how long can it take for our system to recover after we are notified of a business disruption?

RPO (Recovery Point Objective) measures the acceptable amount of data loss after a disruption of service. For instance: how much data (created or updated) will be lost or need to be re-entered after an outage? In other words, you need to know how much data loss is acceptable.

In general, when it comes to databases, the more frequently you back up your data, the less data you are likely to lose.

Our Recommended Solutions

At this point we were satisfied with the alternatives we had analyzed, and given the limited time frame we had for a first iteration on this subject (keep in mind that backup and disaster recovery should be a continuous effort), it was time to put together our recommendation. We came up with the following:

RDS Aurora

1 - RDS Automated Backups

This will allow you to restore to a specific point in time (within your retention period); a minimal Terraform sketch follows the list below.

  • Set retention to 1 day in order to be able to restore to any point within the last 24 hours. A higher retention can be configured if needed.
  • Frequency is managed by AWS.
  • The backup window needs to be set so that it won’t collide with other backup strategies.
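As a reference, this is a minimal Terraform sketch of those settings on an Aurora cluster; the cluster identifier, engine, and backup window are illustrative assumptions rather than values from any particular environment:

```hcl
variable "db_password" {
  type      = string
  sensitive = true
}

resource "aws_rds_cluster" "main" {
  cluster_identifier = "apps-aurora"   # hypothetical identifier
  engine             = "aurora-mysql"
  master_username    = "admin"
  master_password    = var.db_password

  # Automated backups: allows point-in-time restore within the last 24 hours.
  backup_retention_period = 1

  # Keep this window away from the AWS Backup windows used further below.
  preferred_backup_window = "03:00-04:00"
}
```

Raising backup_retention_period (up to the 35-day maximum) is all that is needed if a longer point-in-time window is required.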

2 - AWS Backup

We can use it to retain backups beyond the 35-day maximum supported by automated backups.

In this case we will use it to take daily snapshots, which are kept for 30 days (a sample Terraform plan follows the list below):

  • Frequency: daily — but we can configure it to run more frequently.
  • Retention: 30 days — but we can configure this to accommodate our needs.
  • Monitoring: configure alerts to trigger when a Backup Job fails.
  • Backup Window: set it in a way that won’t collide with automated backups.
  • Copy: configure it to copy to the DR region.
Figure: AWS Backup service Slack notifications (Binbash screenshot)
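A Terraform sketch of such a plan might look like the following; the vault names, schedule, DR region and SNS topic are illustrative assumptions:

```hcl
provider "aws" {
  alias  = "dr"
  region = "us-west-2"   # hypothetical DR region
}

resource "aws_backup_vault" "main" {
  name = "rds-daily-vault"
}

resource "aws_backup_vault" "dr" {
  provider = aws.dr
  name     = "rds-daily-vault-dr"
}

resource "aws_backup_plan" "rds_daily" {
  name = "rds-daily"

  rule {
    rule_name         = "daily-30d"
    target_vault_name = aws_backup_vault.main.name
    schedule          = "cron(0 5 * * ? *)"   # daily at 05:00 UTC, outside the RDS backup window

    lifecycle {
      delete_after = 30   # retention: 30 days
    }

    # Copy every recovery point to the DR-region vault.
    copy_action {
      destination_vault_arn = aws_backup_vault.dr.arn
      lifecycle {
        delete_after = 30
      }
    }
  }
}

# Publish backup/copy job events to SNS, which can then be forwarded to Slack.
resource "aws_sns_topic" "backup_alerts" {
  name = "backup-alerts"
}

resource "aws_backup_vault_notifications" "main" {
  backup_vault_name = aws_backup_vault.main.name
  sns_topic_arn     = aws_sns_topic.backup_alerts.arn

  # BACKUP_JOB_COMPLETED fires for both successful and failed jobs; filter on
  # the job status downstream to alert only on failures.
  backup_vault_events = ["BACKUP_JOB_COMPLETED", "COPY_JOB_FAILED"]
}
```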

3 - For longer-retention backups, we suggest a couple of options:

AWS Backup

  • Implementation: very easy (see the sketch after this list).
  • Frequency: monthly (end of month).
  • Retention: 10 years.
  • Monitoring: configure alerts to trigger when a Backup Job fails.
  • Cost: it can be somewhat expensive, as we will store 12 snapshots per year, which results in 120 snapshots kept over 10 years. However, since we only take monthly snapshots, it is not nearly as expensive as keeping daily snapshots for 10 years (roughly 3,650 snapshots).
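A monthly rule with 10-year retention can be added alongside the daily plan. This sketch reuses the vault from the previous example; the end-of-month schedule is an illustrative assumption:

```hcl
resource "aws_backup_plan" "rds_monthly" {
  name = "rds-monthly-10y"

  rule {
    rule_name         = "monthly-10y"
    target_vault_name = aws_backup_vault.main.name   # vault from the previous sketch
    schedule          = "cron(0 6 L * ? *)"          # last day of each month, 06:00 UTC

    lifecycle {
      delete_after = 3650   # roughly 10 years
    }
  }
}
```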

RDS Export to S3

Binbash Leverage Terraform module

  • Implementation: custom; we can use Lambda to trigger it (the scheduling side is sketched after this list).
  • Frequency: monthly (end of month) — via CloudWatch Event Rule (cron-like).
  • Retention: 10 years — managed by the destination S3 bucket.
  • Additional setup: we need to configure the destination bucket to transition the backups to the appropriate storage classes. For instance: keep the backups in the Standard class for 15 days, move them to Infrequent Access after 30 days, then to Glacier after 60 days, and expire them after 10 years. We also need to replicate the bucket to the DR region.
  • Cost: this is the cheapest alternative as it leverages S3’s cheaper storage classes.
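The scheduling piece of this approach could look like the sketch below: a cron-like CloudWatch Event (EventBridge) rule that fires at the end of each month and invokes a Lambda function that calls the RDS StartExportTask API. The Lambda itself and its ARN are assumed to already exist:

```hcl
variable "export_lambda_arn" {
  type        = string
  description = "ARN of the (assumed) Lambda that calls rds:StartExportTask"
}

resource "aws_cloudwatch_event_rule" "monthly_export" {
  name                = "rds-export-to-s3-monthly"
  schedule_expression = "cron(0 7 L * ? *)"   # last day of each month, 07:00 UTC
}

resource "aws_cloudwatch_event_target" "monthly_export" {
  rule = aws_cloudwatch_event_rule.monthly_export.name
  arn  = var.export_lambda_arn
}

# Allow EventBridge to invoke the export Lambda.
resource "aws_lambda_permission" "allow_eventbridge" {
  statement_id  = "AllowEventBridgeInvoke"
  action        = "lambda:InvokeFunction"
  function_name = var.export_lambda_arn
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.monthly_export.arn
}
```

The destination bucket’s lifecycle rules (see the S3 section below) then take care of retention and storage-class transitions.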

EBS

AWS Backup

📆 Daily snapshots

  • Frequency: daily — but we can configure it to run more frequently.
  • Retention: 30 days — but we can configure this to accommodate our needs.
  • Monitoring: configure alerts to trigger when a Backup Job fails.
  • Copy: configure it to copy to the DR region.

📆 Monthly snapshots

  • Frequency: monthly.
  • Retention: 10 years.
  • Monitoring: configure alerts to trigger when a Backup Job fails (a combined plan sketch follows this list).
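For EBS, both rules can live in a single AWS Backup plan applied to volumes by tag. The sketch below reuses the vaults from the earlier example; the IAM role ARN and the Backup=true tag are illustrative assumptions:

```hcl
variable "backup_role_arn" {
  type        = string
  description = "IAM role that AWS Backup assumes to create and manage snapshots"
}

resource "aws_backup_plan" "ebs" {
  name = "ebs-volumes"

  # Daily snapshots, kept 30 days, copied to the DR-region vault.
  rule {
    rule_name         = "daily-30d"
    target_vault_name = aws_backup_vault.main.name
    schedule          = "cron(0 5 * * ? *)"

    lifecycle {
      delete_after = 30
    }

    copy_action {
      destination_vault_arn = aws_backup_vault.dr.arn
      lifecycle {
        delete_after = 30
      }
    }
  }

  # Monthly snapshots, kept roughly 10 years.
  rule {
    rule_name         = "monthly-10y"
    target_vault_name = aws_backup_vault.main.name
    schedule          = "cron(0 6 1 * ? *)"   # first day of each month

    lifecycle {
      delete_after = 3650
    }
  }
}

# Apply the plan to every EBS volume tagged Backup=true (hypothetical tag).
resource "aws_backup_selection" "ebs_by_tag" {
  name         = "ebs-tagged-backup"
  iam_role_arn = var.backup_role_arn
  plan_id      = aws_backup_plan.ebs.id

  selection_tag {
    type  = "STRINGEQUALS"
    key   = "Backup"
    value = "true"
  }
}
```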

Cheaper Alternatives

  • Use a third-party service for long-term retention.

S3

Configure a lifecycle policy to retain objects for 10 years:

  • Define lifecycle rules to make all objects expire after 10 years.

Configure objects to transition to cheaper storage classes based on access needs. We have two choices here:

✅ We could rely on S3 Intelligent-Tiering (a sketch follows this list), but keep in mind the following:

  • It is better suited for data that has unknown or changing access patterns.
  • Based on that, it can move objects that have not been accessed in 30 consecutive days to the Infrequent Access tier.
  • And optionally, it can move objects that haven’t been accessed for 90 consecutive days to the Archive Access tier, and after 180 consecutive days of no access, to the Deep Archive Access tier.
  • Pay close attention to the behaviors above and confirm that your data access patterns are compatible with them.
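If Intelligent-Tiering fits those access patterns, enabling its optional archive tiers could look like this sketch. The bucket name is a hypothetical example, the day thresholds follow the text above, and objects must actually be stored in the INTELLIGENT_TIERING storage class for the configuration to apply:

```hcl
resource "aws_s3_bucket" "backups" {
  bucket = "example-long-term-backups"   # hypothetical bucket name
}

resource "aws_s3_bucket_intelligent_tiering_configuration" "backups" {
  bucket = aws_s3_bucket.backups.id
  name   = "EntireBucket"

  # Objects not accessed for 90 consecutive days move to Archive Access,
  # and after 180 consecutive days of no access to Deep Archive Access.
  tiering {
    access_tier = "ARCHIVE_ACCESS"
    days        = 90
  }

  tiering {
    access_tier = "DEEP_ARCHIVE_ACCESS"
    days        = 180
  }
}
```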

✅ Or we could set our own transition rules (sketched below):

  • For instance, we could transition objects to S3 Standard-IA after 30 days, then to Glacier after 180 days, and then to Glacier Deep Archive after 360 days.
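With explicit rules, the same bucket from the previous sketch could be configured as follows; the thresholds mirror the example above plus the 10-year expiration:

```hcl
resource "aws_s3_bucket_lifecycle_configuration" "backups" {
  bucket = aws_s3_bucket.backups.id

  rule {
    id     = "backup-retention-10y"
    status = "Enabled"
    filter {}   # apply to all objects in the bucket

    transition {
      days          = 30
      storage_class = "STANDARD_IA"
    }

    transition {
      days          = 180
      storage_class = "GLACIER"
    }

    transition {
      days          = 360
      storage_class = "DEEP_ARCHIVE"
    }

    expiration {
      days = 3650   # roughly 10 years
    }
  }
}
```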

Configure bucket replication to the DR region if needed.

References

  1. Overview of backing up and restoring an Aurora DB cluster — Amazon Aurora
  2. Backing up and restoring an Amazon Aurora DB cluster — Amazon Aurora
  3. Amazon EBS snapshots — Amazon Elastic Compute Cloud
  4. Managing your storage lifecycle — Amazon Simple Storage Service
  5. What is AWS Backup? — AWS Backup
  6. Amazon Data Lifecycle Manager — Amazon Elastic Compute Cloud

Exequiel Barrirero | Co-Founder & Director of Engineering @ binbash | AWS Community Builder 🏗️☁️