Cloud outages are inevitable. Plan ahead so that your recovery efforts are sufficient when one happens.

Disaster Recovery (DR) is the process of recovering technology infrastructure and systems after a disaster. There are two types of disasters:
Natural: natural calamities such as floods, tornadoes, and earthquakes.
Man-Made: disasters caused by human negligence or error, such as infrastructure failures, software bugs, and cyber-terrorism.
In either case, having backups is not enough; the backups should also be copied across multiple regions and multiple accounts.

Here is a 5-point guide for AWS DR automation:

Types of Backups

There are three major levels of recovery that organizations should consider while designing their recovery solution:

File Level Recovery — from files stored in S3.

Volume Level Recovery — from snapshots.

Database Level Recovery — from DB Snapshots.
In a typical AWS infrastructure, many kinds of resources need to be backed up for DR purposes:
EC2 Instance Backups (EC2 AMIs)
EBS Volume Backups (Snapshots)
RDS DB Backups (DB Snapshots)
ElastiCache Cluster Backups (ElastiCache Snapshots)
Redshift Cluster Backups (Redshift Snapshots)
Route 53 Hosted Zone Backups (hosted zone files copied to S3)
CloudFormation Template Backups (copies of the CloudFormation templates)

Critical vs Less Critical vs Non-Critical

Depending on each system's potential impact on the business, backup strategies can be classified into three tiers:
Most Critical Systems: frequency 1 hour, retention 1 year.
Less Critical Systems: frequency 1 day, retention 180 days.
Non-Critical Systems: frequency 1 week, retention 4 weeks.
Back up manually if required.
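A quick way to see what these tiers imply is to divide retention by frequency, which gives the steady-state number of backups kept per resource. The sketch below simply restates the three tiers; the `Tier` class and tier names are hypothetical illustrations, not part of any AWS or Botmetric API, and "1 year" is approximated as 365 days.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Tier:
    """A hypothetical backup tier: how often backups run and how long they are kept."""
    name: str
    frequency: timedelta
    retention: timedelta

    def backups_kept(self) -> int:
        """Steady-state number of backups kept per resource (retention / frequency)."""
        return self.retention // self.frequency


# The three tiers from the list above (1 year approximated as 365 days).
TIERS = [
    Tier("most-critical", timedelta(hours=1), timedelta(days=365)),
    Tier("less-critical", timedelta(days=1), timedelta(days=180)),
    Tier("non-critical", timedelta(weeks=1), timedelta(weeks=4)),
]

for tier in TIERS:
    print(f"{tier.name}: {tier.backups_kept()} backups per resource")
```

The hourly tier alone keeps 8,760 backups per resource, which is why cleaning up old backups becomes important as the number of critical resources grows.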

Automated vs Manual backups

In a dynamic cloud environment with a wide range of services, it is extremely difficult to manage resources by hand and keep up with the continuous changes beneath them.
For example, if an organization has hundreds of instances of different types playing different roles, it becomes impractical to create and monitor their backups manually.
With automation, you just need to tag every instance with its role. The tags then let you define an individual backup policy for each role.
Let’s say you have the following instances:
Tag               Instance Count   Backup Policy
ENV/DEVELOPMENT   30               Once a week
ENV/MONITORING    5                Once a month
ENV/PRODUCTION    60               Every 4 hours
ENV/OTHERS        5                Not required (manual backups only)

With 100 instances spread across four different policies, automation is a clear winner over manual backups.
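Tag-driven policies can be sketched as a plain lookup from tag value to backup interval. This assumes the table's ENV/DEVELOPMENT notation means a tag with key `ENV` and value `DEVELOPMENT`, approximates "once a month" as 30 days, and treats untagged or unknown instances as OTHERS; `POLICIES`, `backup_interval`, and `scheduled_backups_per_day` are hypothetical names, not a Botmetric or AWS API.

```python
from datetime import timedelta
from typing import Optional

# Hypothetical mapping from the ENV tag value to a backup interval,
# restating the table above. None means "no scheduled backup; back up manually".
POLICIES: dict[str, Optional[timedelta]] = {
    "DEVELOPMENT": timedelta(weeks=1),
    "MONITORING": timedelta(days=30),  # "once a month", approximated as 30 days
    "PRODUCTION": timedelta(hours=4),
    "OTHERS": None,
}


def backup_interval(tags: dict[str, str]) -> Optional[timedelta]:
    """Return the backup interval for an instance based on its ENV tag.

    Instances without an ENV tag, or with an unknown value, fall back to the
    OTHERS policy (manual backups only).
    """
    env = tags.get("ENV", "OTHERS")
    return POLICIES.get(env, POLICIES["OTHERS"])


def scheduled_backups_per_day(instances: list[dict[str, str]]) -> float:
    """Estimate how many automated backups run per day across a fleet."""
    total = 0.0
    for tags in instances:
        interval = backup_interval(tags)
        if interval is not None:
            total += timedelta(days=1) / interval
    return total


# The fleet from the table: 30 + 5 + 60 + 5 = 100 instances.
fleet = (
    [{"ENV": "DEVELOPMENT"}] * 30
    + [{"ENV": "MONITORING"}] * 5
    + [{"ENV": "PRODUCTION"}] * 60
    + [{"ENV": "OTHERS"}] * 5
)
print(round(scheduled_backups_per_day(fleet)))  # → 364
```

Roughly 364 backups a day, almost all of them from the 60 production instances, is far more than anyone should be creating and tracking by hand.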

Cost Optimized backups

Organizations should define strategies to clean up old backups that are no longer required. This can drastically reduce AWS infrastructure costs.
AWS also limits the number of backups that can exist in an account. For example, the EBS snapshot limit is 10,000 (at the time of writing).

A cost-optimized DR strategy is therefore required to keep the number of backups bounded.
In Botmetric backup jobs, the Snapshots to retain parameter sets how many snapshots are kept per volume.
Similarly, AMIs to retain sets how many AMIs are kept per instance.
For example, if Snapshots to retain is 180 and the job runs once a day, the job keeps the last 180 days (about 6 months) of snapshots.

If Snapshots to retain is 360 and the job runs twice a day, it still covers the last 180 days (about 6 months), but with 2 snapshots per volume for each of those days.

Note: For safety, we try to keep one more snapshot than Snapshots to retain (i.e., Snapshots to retain + 1).
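The retention arithmetic above, including the +1 safety margin, can be sketched as a few pure functions. This is not Botmetric's implementation; `retain` stands in for the Snapshots to retain parameter, snapshots are represented only by their creation times, and all function names are hypothetical.

```python
from datetime import datetime, timedelta


def snapshots_to_delete(snapshot_times: list[datetime], retain: int) -> list[datetime]:
    """Return the snapshots a cleanup job would delete for one volume.

    Keeps the newest ``retain + 1`` snapshots (the extra one is the safety
    margin from the note above) and returns the rest, newest first.
    """
    if retain < 0:
        raise ValueError("retain must be non-negative")
    newest_first = sorted(snapshot_times, reverse=True)
    return newest_first[retain + 1:]


def days_covered(retain: int, runs_per_day: int) -> float:
    """Days of history that ``retain`` snapshots span at a given job frequency."""
    return retain / runs_per_day


def snapshots_needed(volume_count: int, retain: int) -> int:
    """Steady-state snapshot count for a fleet, including the +1 safety margin."""
    return volume_count * (retain + 1)


# Example: 200 daily snapshots of one volume with retain=180.
start = datetime(2024, 1, 1)
daily = [start + timedelta(days=i) for i in range(200)]
print(len(snapshots_to_delete(daily, retain=180)))  # → 19
print(days_covered(180, 1), days_covered(360, 2))    # → 180.0 180.0
print(snapshots_needed(50, 180))                     # → 9050
```

The last line shows why the snapshot limit matters: just 50 volumes at retain=180 already need 9,050 snapshots, close to the 10,000 EBS snapshot limit mentioned earlier.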

DR Automation for Various AWS Resources

Depending on the AWS infrastructure and the DR strategy, backups can be copied across regions and across accounts.
Botmetric provides a wide variety of jobs for various services:

EC2:
Create EC2 AMIs based on EC2 instance tags
Copy EC2 AMIs based on EC2 instance tags across regions
Copy EC2 AMIs based on EC2 AMI tags across regions
Copy EC2 AMIs based on EC2 instance tags across accounts

EBS:
Create EBS snapshots based on EBS volume tags
Create EBS snapshots based on EC2 instance tags
Create EBS snapshots based on EC2 instance IDs
Copy EBS snapshots based on EBS volume tags across regions
Copy EBS snapshots based on EC2 instance tags across regions
Copy EBS snapshots based on EBS volume tags across accounts

RDS:
Create RDS snapshots based on DB instance tags
Copy RDS snapshot based on DB Instance tags across regions

REDSHIFT:
Create Redshift snapshots based on Redshift cluster tags

ROUTE53:
Create Route53 Hosted Zone backups

In addition, for cleaning up old backups, there are De-register Old EC2 AMIs and Delete Old EBS Snapshots jobs.
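The article does not say how these cleanup jobs decide what counts as "old". One plausible rule, assumed here, is an age cutoff equal to the tier's retention period, with the newest backup always preserved so that a stalled backup job never leaves a resource with nothing to restore from. The `Backup` record and `expired_backups` function below are hypothetical; a real job would then pass each returned ID to the EC2 `deregister_image` or `delete_snapshot` call, which this sketch deliberately leaves out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Backup:
    """A hypothetical record for an AMI or EBS snapshot."""
    backup_id: str
    created_at: datetime


def expired_backups(backups: list[Backup], retention: timedelta, now: datetime) -> list[str]:
    """Return IDs of backups older than ``retention``, oldest first.

    The newest backup is never returned, even if it is past the cutoff, so a
    resource whose backup job has stalled is not left with no backups at all.
    """
    if not backups:
        return []
    cutoff = now - retention
    newest = max(backups, key=lambda b: b.created_at)
    expired = [b for b in backups if b.created_at < cutoff and b is not newest]
    return [b.backup_id for b in sorted(expired, key=lambda b: b.created_at)]


# Non-critical tier: 4-week retention.
now = datetime(2024, 7, 1)
backups = [
    Backup("snap-a", datetime(2024, 1, 1)),
    Backup("snap-b", datetime(2024, 6, 20)),
    Backup("snap-c", datetime(2024, 6, 30)),
]
print(expired_backups(backups, timedelta(weeks=4), now))  # → ['snap-a']
```

Age-based cleanup complements the count-based Snapshots to retain approach: it bounds how far back backups reach regardless of how often the backup job runs.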

Conclusion

In today’s ever-changing cloud environment, the drive for continuous availability, robustness, scalability, and dynamism has spawned the rise of ‘Backup as a Service’ (BaaS). With AWS DR automation and smart strategies, you can make your business ‘disaster-free’. Read about the do’s and don’ts of DR automation strategy.

Botmetric is an intelligent cloud management platform designed to make the cloud easy for engineers. Sign up now to see how Botmetric can help you with your disaster recovery planning.

Written by Nutanix

We make infrastructure invisible, elevating IT to focus on the applications and services that power their business.
