
Cloud outages are inevitable. Plan ahead so that your recovery efforts are sufficient when one happens.
Disaster Recovery (DR) is a procedure to recover technology infrastructure and systems following a disaster. There are two types of disasters:
Natural: These include natural calamities such as floods, tornadoes, and earthquakes.
Man-Made: These are disasters caused by human negligence or error, such as infrastructure failure, IT bugs, and cyber-terrorism.
In such cases, having backups is not enough: the backups should also be copied across multiple regions and multiple accounts.
Here is a 5-point guide for AWS DR automation:
Types of Backups
There are three major levels of recovery that an organization should consider while designing its recovery solution:
File Level Recovery — from files stored in S3.
Volume Level Recovery — from snapshots.
Database Level Recovery — from DB Snapshots.
In any AWS infrastructure, many kinds of resources need to be backed up for DR purposes:
EC2 Instance Backups (EC2 AMIs)
EBS Volume Backups (Snapshots)
RDS DB Backups (DBSnapshots)
ElastiCache Cluster Backups (ElastiCache Snapshots)
Redshift DB Cluster Backups (Redshift Snapshots)
Route 53 Hosted Zone Backups (hosted zone files copied to S3)
CloudFormation Template Backups (CloudFormation Template)
Critical vs Less Critical vs Non-Critical
Depending on the systems and their potential impact on the business, we can classify backup strategies into three tiers:
Most Critical System: frequency 1 hour, retention 1 year.
Less Critical System: frequency 1 day, retention 180 days.
Non-Critical System: frequency 1 week, retention 4 weeks. Back up manually if required.
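To make the cost of each tier concrete, the sketch below computes how many backups per resource exist at steady state for the three tiers listed above. The dictionary layout and the function name are illustrative, and "1 year" is assumed to mean 365 days.

```python
from datetime import timedelta

# Tiers taken from the list above; the dictionary layout is illustrative
# and "1 year" is assumed to be 365 days.
POLICIES = {
    "most_critical": {"frequency": timedelta(hours=1), "retention": timedelta(days=365)},
    "less_critical": {"frequency": timedelta(days=1), "retention": timedelta(days=180)},
    "non_critical": {"frequency": timedelta(weeks=1), "retention": timedelta(weeks=4)},
}


def backups_retained(policy):
    """Number of backups per resource that exist once the policy is at steady state."""
    return policy["retention"] // policy["frequency"]


for name, policy in POLICIES.items():
    print(name, backups_retained(policy))
# most_critical 8760
# less_critical 180
# non_critical 4
```

An hourly schedule with one year of retention keeps 8,760 backups per resource, which is why the most critical tier is usually reserved for a small set of systems.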
Automated vs Manual backups
In a dynamic cloud environment with a wide range of services, it is extremely difficult to manage resources manually and keep up with the continuous changes underneath them.
For example:
If an organization has hundreds of instances of different types, each with a different role to play, it becomes impractical to create backups manually and monitor them.
With automation, you just need to tag every instance with its role. The tags then let you create individual backup policies based on role.
Let's say you have the following instance definitions:
Tag               Instance Count   Backup Policy
ENV/DEVELOPMENT   30               Once a week
ENV/MONITORING    5                Once a month
ENV/PRODUCTION    60               Every 4 hours
ENV/OTHERS        5                Not required (manual)
In the example shown above, with 100 instances on four different schedules, automation is the clear winner over manual backups.
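A tag-driven policy like the one in the table can be sketched in a few lines. This is a minimal illustration, not Botmetric's implementation: it assumes the tag key is `ENV`, that instances are dictionaries with a `Tags` list in the shape the EC2 API returns, and that "once a month" means every 30 days. The names `BACKUP_INTERVAL_HOURS`, `env_of`, and `is_backup_due` are hypothetical.

```python
# Hypothetical mapping from the ENV tag value to a backup interval in hours.
# None means "no automated backup; take one manually if needed".
BACKUP_INTERVAL_HOURS = {
    "DEVELOPMENT": 7 * 24,   # once a week
    "MONITORING": 30 * 24,   # once a month (approximated as 30 days)
    "PRODUCTION": 4,         # every 4 hours
    "OTHERS": None,          # manual only
}


def env_of(instance):
    """Return the value of the ENV tag, or "OTHERS" when the tag is missing."""
    for tag in instance.get("Tags", []):
        if tag["Key"] == "ENV":
            return tag["Value"]
    return "OTHERS"


def is_backup_due(instance, hours_since_last_backup):
    """True when the instance's policy says a new automated backup should run."""
    interval = BACKUP_INTERVAL_HOURS.get(env_of(instance))
    if interval is None:  # manual-only or unknown environment
        return False
    return hours_since_last_backup >= interval


production = {"InstanceId": "i-example", "Tags": [{"Key": "ENV", "Value": "PRODUCTION"}]}
print(is_backup_due(production, 5))  # True
print(is_backup_due(production, 3))  # False
```

Because the policy is looked up from the tag, adding a new instance to a role requires only tagging it; no schedule has to be edited.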
Cost Optimized backups
Organizations should have a strategy for cleaning up old backups that are no longer required. This can significantly reduce AWS infrastructure cost.
Also, AWS limits the number of backups that can be created in an account. For example, the EBS snapshot limit is 10,000 (AWS adjusts such quotas over time, so check the current value for your account).
A cost-optimized DR strategy is therefore required to keep the number of backups bounded.
In Botmetric backup jobs, the ‘Snapshots to retain’ parameter sets the number of snapshots kept per volume.
Similarly, ‘AMIs to retain’ sets the number of AMIs kept per instance.
Let us understand it with an example. If ‘Snapshots to retain’ is 180 and the job runs once a day, it keeps snapshots going back 180 days (about 6 months).
If ‘Snapshots to retain’ is 360 and the job runs twice a day, it also keeps backups going back 180 days (about 6 months), but with 2 snapshots per volume for each of those days.
Note: For safety, we try to keep ‘Snapshots to retain’ + 1 snapshots.
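The retention arithmetic above can be written out as a short sketch. It is not Botmetric's code: the function names are hypothetical, and each snapshot is assumed to be a `(snapshot_id, start_time)` pair for a single volume.

```python
from datetime import datetime, timedelta


def days_covered(snapshots_to_retain, runs_per_day):
    """How many days of history the retained snapshots span."""
    return snapshots_to_retain / runs_per_day


def snapshots_to_delete(snapshots, snapshots_to_retain):
    """Return the ids of snapshots that fall outside the retention window.

    snapshots: list of (snapshot_id, start_time) pairs for one volume.
    Keeps the newest snapshots_to_retain + 1, matching the safety margin above.
    """
    newest_first = sorted(snapshots, key=lambda s: s[1], reverse=True)
    return [snap_id for snap_id, _ in newest_first[snapshots_to_retain + 1:]]


# Example: five daily snapshots of one volume, retaining 2 (+1 for safety).
now = datetime(2024, 1, 10)
snapshots = [(f"snap-{i}", now - timedelta(days=i)) for i in range(5)]

print(days_covered(180, 1))               # 180.0
print(days_covered(360, 2))               # 180.0
print(snapshots_to_delete(snapshots, 2))  # ['snap-3', 'snap-4']
```

Both configurations cover the same 180 days; doubling the run frequency doubles the snapshot count rather than extending the history.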
DR Automation for Various AWS Resources
Depending on the AWS infrastructure and the DR strategy, backups can be taken across regions or across accounts.
In Botmetric, we have a wide variety of jobs for various services:
EC2:
Create EC2 AMI based on EC2 instance tags
Copy EC2 AMI based on EC2 instance tags across regions
Copy EC2 AMI based on EC2 AMI tags across regions
Copy EC2 AMI based on EC2 instance tags across accounts
EBS:
Create EBS snapshot based on EBS volume tags
Create EBS snapshot based on EC2 instance tags
Create EBS snapshot based on EC2 instance IDs
Copy EBS snapshot based on EBS volume tags across regions
Copy EBS snapshot based on EC2 instance tags across regions
Copy EBS snapshot based on EBS volume tags across accounts
RDS:
Create RDS snapshot based on DB instance tags
Copy RDS snapshot based on DB Instance tags across regions
REDSHIFT:
Create Redshift snapshot based on Redshift cluster tags
ROUTE53:
Create Route53 Hosted Zone backups
In addition, for cleaning up old backups, we have the ‘De-register Old EC2 AMIs’ and ‘Delete Old EBS Snapshots’ jobs.
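For readers who want to see what a tag-based EBS job might look like outside Botmetric, here is a minimal sketch. It is not Botmetric's implementation. The function names are hypothetical, and the `ec2` argument is assumed to behave like a `boto3.client("ec2")` object, using its `describe_volumes`, `create_snapshot`, and `copy_snapshot` calls. Pagination, error handling, and waiting for snapshots to complete are left out.

```python
def snapshot_volumes_by_tag(ec2, tag_key, tag_value):
    """Create a snapshot of every EBS volume carrying the given tag.

    `ec2` is assumed to behave like boto3.client("ec2").
    Pagination of describe_volumes is omitted for brevity.
    """
    response = ec2.describe_volumes(
        Filters=[{"Name": f"tag:{tag_key}", "Values": [tag_value]}]
    )
    snapshot_ids = []
    for volume in response["Volumes"]:
        snapshot = ec2.create_snapshot(
            VolumeId=volume["VolumeId"],
            Description=f"DR backup of {volume['VolumeId']}",
        )
        snapshot_ids.append(snapshot["SnapshotId"])
    return snapshot_ids


def copy_snapshots_to_region(dest_ec2, source_region, snapshot_ids):
    """Copy snapshots into the region that `dest_ec2` is bound to.

    A snapshot must be in the "completed" state before it can be copied;
    waiting for that state is not shown here.
    """
    copied = []
    for snapshot_id in snapshot_ids:
        result = dest_ec2.copy_snapshot(
            SourceRegion=source_region,
            SourceSnapshotId=snapshot_id,
            Description=f"DR copy of {snapshot_id} from {source_region}",
        )
        copied.append(result["SnapshotId"])
    return copied


# Possible usage (requires boto3 and AWS credentials; regions are examples):
#   import boto3
#   source = boto3.client("ec2", region_name="us-east-1")
#   dest = boto3.client("ec2", region_name="us-west-2")
#   ids = snapshot_volumes_by_tag(source, "ENV", "PRODUCTION")
#   copy_snapshots_to_region(dest, "us-east-1", ids)
```

Passing the client in as an argument keeps the functions easy to test with a stub and lets the same code target any region or account the client is configured for.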
Conclusion
In today’s ever-changing cloud environment, the drive for continuous availability, robustness, scalability, and dynamism has spawned the rise of ‘Backup as a Service’ (BaaS). With AWS DR automation and smart strategies, you can make your business ‘disaster-free’. Read about the do’s and don’ts of DR automation strategy.
Botmetric is an intelligent cloud management platform designed to make the cloud easy for engineers. Sign up now to see how Botmetric can help you with your disaster recovery planning.
