AWS Disaster Recovery Part 1 — Backup

Paul Zhao · Published in Paul Zhao Projects · 10 min read · Mar 7, 2021
Project Infrastructure

Before we begin with AWS Disaster Recovery, let us discuss where the majority of organizations stand in terms of disaster recovery. As we all know, on-premises servers have long been used to store an organization's critical data. But the cloud, as a booming industry, gives organizations the edge to handle both storage and disaster recovery in a more efficient and cost-effective manner. AWS Disaster Recovery is no doubt among the leading options.

Disaster recovery, however, is not "one solution fits all": depending on the size, nature, requirements, and other characteristics of a business, the strategies can vary. Before jumping into our backup, we will touch upon the following generic strategies:

Identify and Describe All of Your Infrastructure

It's essential to have a clear picture of your own infrastructure before coming up with a disaster recovery plan.

Involve the Entire Development Team

It would not be possible to have a comprehensive disaster recovery plan without consulting the entire development team. Establishing dependencies and mapping infrastructure is a time-consuming process, and one that must occur over time. It's a bit like a grocery list: you keep adding to it as new items come to mind.

Identify the Importance of Each Infrastructure Element

Prioritize elements according to their importance to the organization. For example, if a key business service is down, it has to be restored in a timely manner, while less critical elements can be allowed a longer recovery time.

Discuss RTO and RPO with Stakeholders

  • RTO (recovery time objective): when stakeholders can expect to be back up and running
  • RPO (recovery point objective): the point in the past to which they will be taken once recovery occurs

Based on different RTOs and RPOs, we may end up with the following scenarios:

Diagram of disaster recovery plans

Use Your Accumulated Information to Create Your Plan

Downtime Cost vs. Backup/Recovery Cost: As indicated by our RTO/RPO recovery plans, there is no solution that fits all. For instance, a giant transaction-based e-commerce website like Amazon can't afford even a few seconds of downtime, so Multi-Site is clearly the right alternative. A small enterprise, however, can adopt either Backup & Restore or Pilot Light, since the longer time allowed for recovery outweighs the enormous cost of a Multi-Site disaster recovery plan.

How much data loss is acceptable? Even a small amount of data loss can cause big issues for large financial institutions, while for small firms losing some data might not be a big deal.

Which specific backup options are best suited to your circumstances? Within EC2, for example, you can choose between Amazon Machine Images (AMIs) and EBS snapshots. In general, instance store-backed AMIs are slower, less flexible, and more expensive than EBS snapshots, but there are also strengths that come with that additional cost.
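
For illustration, here is roughly what each option looks like from the CLI (the instance and volume IDs below are placeholders, not resources from this walkthrough):

$ aws ec2 create-image --instance-id i-0123456789abcdef0 --name "app-backup-2021-03-07" --no-reboot
$ aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 --description "Daily backup of the app data volume"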

How will you automate your backup, and how should you choose an additional region for copies of those backups? There is a wealth of AWS tools for automating your backups, so you can rest easy knowing that the process goes on without your intervention. And you can literally select an additional backup region half a world away. You will want to make use of AWS disaster recovery management tools, many of which can be had with a few clicks in the console. There are also custom solutions available via the AWS Marketplace, covering strategies ranging from "pilot light" to "hot standby."
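
As a small sketch of the cross-region idea (the snapshot ID is a placeholder), copying an EBS snapshot into a second region is a single call, run against the destination region:

$ aws ec2 copy-snapshot --region us-west-2 --source-region us-east-1 --source-snapshot-id snap-0123456789abcdef0 --description "DR copy in us-west-2"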

Establish the In-House Communication Network

Clearly, there are two ways to set it up:

Re-assign developers from your in-house team to monitor and fine-tune your infrastructure, and run DR scenarios

Hire a DevOps support team, who will manage your IT support 24/7, report on new findings and continuously optimize your infrastructure performance

Testing and Re-Testing

What matters most is testing, testing, and more testing. Without enough testing, you will not be able to foresee what works best for your organization.

In this project, we will use AWS Backup to back up and recover an EC2 instance.

Prerequisites

For this walkthrough, you need the following:

  • An AWS account with a non-root user (take security into consideration)
  • In terms of system, we will be using RHEL 8.3 in Oracle VirtualBox on Windows 10, accessed via PuTTY
  • AWS CLI installed

Let us work on them one by one.

Creating a non-root user

Per AWS best practice, the root user is not recommended for everyday tasks, even administrative ones. Rather, the root user is used to create your first IAM user, groups, and roles. Then you securely lock away the root user credentials and use them to perform only a few account and service management tasks.

Notes: If you would like to learn more about why we should not use root user for operations and more about AWS account, please find more here.

Login as a Root user
Create a user under IAM service
Choose programmatic access
Keep credentials (Access key ID and Secret access key)

Set up RHEL 8.3 in Oracle VirtualBox on Windows 10 using PuTTY

First, we will download Oracle VirtualBox for Windows 10; please click Windows hosts.

Second, we will download the RHEL ISO.

Let us make it work now!

Open the Oracle VirtualBox application and follow the instructions here; you will install RHEL 8.3 as shown below.

Oracle VM VirtualBox

Notes: In case you are unable to install RHEL 8.3 successfully, please find solutions here. Also, after you create your developer's account with Red Hat, you have to wait for some time before registering it. Otherwise, you may receive errors as well.

Now it’s time for us to connect to RHEL 8.3 from Windows 10 using VirtualBox.

Login RHEL 8.3

Click activities and open terminal

Open terminal

Notes: In order to be able to connect to RHEL 8.3 from Windows 10 using PuTTY later, we must enable the setting shown below.

Bridged Adapter selected

Now we will get the IP address that we will use to connect to RHEL 8.3 from Windows 10 using PuTTY (the highlighted IP address for enp0s3 is the right one to use).

IP address

Then we will install PuTTY.

ssh-keygen with a passphrase

Creating a passphrase-protected key looks something like this:

$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/pzhao/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/pzhao/.ssh/id_rsa.
Your public key has been saved in /home/pzhao/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:RXPnUZg/fGgRGTOxEfbo3VOMo/Yp4Gi80has/iR4m/A pzhao@localhost.localdomain
The key's randomart image is:
+---[RSA 3072]----+
| o . %X.|
| . o +=@ |
| . B++|
| . oo==|
| .S . o...=|
| . .oo o . ..|
| o oo=.. . o |
| +o*o. . |
| .E+o |
+----[SHA256]-----+

To view the private key:

$ cat .ssh/id_rsa
-----BEGIN OPENSSH PRIVATE KEY-----
b3BlbnNzaC1rZXktdjEAAAAABG5vbmUAAAAEbm9uZQAAAAAAAAABAAABlwAAAAdzc2gtcn
NhAAAAAwEAAQAAAYEAwoavXHvZCYPO/sbMD0ibtkvF+9/NmSm2m/Z8wRy7O2A012YS98ap
8aq18PXfKPyyAMNF3hdG3xi1KMD7DSIb/C1gunjTREEJRfYjydOjFBFtZWY78Mj4eQkrPJ
.
.
.
-----END OPENSSH PRIVATE KEY-----

Notes: You may take advantage of the RHEL GUI to send the private key to yourself as an email, then open the mail and copy the private key from it.
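
One step that is easy to miss: key-based login only works if the public key is listed in the RHEL user's authorized_keys file. A minimal sketch, assuming the default key location used above:

$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys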

Open Notepad in Windows 10 and save the private key as an ansiblekey.pem file.

Ansiblekey.pem

Then open PuTTY Key Generator and load the private key ansiblekey.pem

Load private key in putty key generator

Then save it as a private key named ansible.ppk.

We now open PuTTY and enter the IP address we saved previously, 192.168.0.18, as the Host Name (or IP address).

Load private key in putty

We then move on to Session and input the IP address.

IP address saved

For convenience, we may save it as a predefined session, as shown below.

Saved session

You should see the pop-up below if you are logging in for the very first time.

First time log in

Then input your username and password to log in. You will see the image below after logging in.

Login successfully

Installing AWS CLI

To install the AWS CLI after logging into RHEL 8:

$ curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
$ unzip awscliv2.zip
$ sudo ./aws/install

To verify the installation:

$ aws --version
aws-cli/2.0.46 Python/3.7.4 Darwin/19.6.0 exe/x86_64

To use the AWS CLI, we need to configure it with our AWS access key, AWS secret access key, default region, and output format.

$ aws configure
AWS Access Key ID [****************46P7]:
AWS Secret Access Key [****************SoXF]:
Default region name [us-east-1]:
Default output format [json]:
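
To confirm the credentials work before moving on, you can ask STS who you are (the UserId and Arn below are placeholders for your own IAM user):

$ aws sts get-caller-identity
{
    "UserId": "AIDAXXXXXXXXXXXXXXXXX",
    "Account": "464392538707",
    "Arn": "arn:aws:iam::464392538707:user/your-iam-user"
}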

Now we may begin our project to back up our EC2 instance.

Since YAML files are indentation-sensitive, we'll be using GitHub gists for our project files.

Create a YAML file named default-region-infrastructure.yaml

$ vim default-region-infrastructure.yaml

default-region-infrastructure.yaml

Create a CloudFormation stack named default-region-infrastructure, assigning your email to the NotificationEmail parameter and us-east-1b to the AvailabilityZone parameter.

$ aws cloudformation create-stack --template-body file://default-region-infrastructure.yaml --stack-name default-region-infrastructure --parameters ParameterKey=NotificationEmail,ParameterValue=zhaofeng8711@gmail.com ParameterKey=AvailabilityZone,ParameterValue=us-east-1b --capabilities CAPABILITY_IAM
{
    "StackId": "arn:aws:cloudformation:us-east-1:464392538707:stack/default-region-infrastructure/7b9db5d0-7ed1-11eb-b4ce-12ab197e9ef5"
}

Email confirmation from AWS

Cross check resources created

CloudFormation resources
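
The same cross check can be done from the CLI, waiting for the stack to finish and then listing its resources:

$ aws cloudformation wait stack-create-complete --stack-name default-region-infrastructure
$ aws cloudformation describe-stack-resources --stack-name default-region-infrastructure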

Create Backup Plan

Firstly, we create a backup vault named BackupVault

$ aws backup create-backup-vault --backup-vault-name BackupVault
{
    "BackupVaultName": "BackupVault",
    "BackupVaultArn": "arn:aws:backup:us-east-1:464392538707:backup-vault:BackupVault",
    "CreationDate": 1615077316.706
}

Then, we create a backup plan named Backupplan, with a rule named DailyBackups that runs every day at 5:00 AM UTC, must start within 480 minutes of the scheduled time, and deletes recovery points after 35 days. We also tag the plan with key workload and value APP, matching the tag on the EC2 instance we created using CloudFormation.

$ aws backup create-backup-plan --backup-plan "{\"BackupPlanName\":\"Backupplan\",\"Rules\":[{\"RuleName\":\"DailyBackups\",\"ScheduleExpression\":\"cron(0 5 * * ? *)\",\"StartWindowMinutes\":480,\"TargetBackupVaultName\":\"BackupVault\",\"Lifecycle\":{\"DeleteAfterDays\":35}}]}" --backup-plan-tags workload=APP
{
    "BackupPlanId": "a56a1fc9-b176-46b8-80b8-a7fa5679ec1f",
    "BackupPlanArn": "arn:aws:backup:us-east-1:464392538707:backup-plan:a56a1fc9-b176-46b8-80b8-a7fa5679ec1f",
    "CreationDate": 1615077342.351,
    "VersionId": "NTBjZmU4MzUtNmZkNS00YzBlLWFjMjUtM2Q4N2U5YTlmZTg4"
}
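
Note that a backup plan by itself does not protect any resources; for scheduled backups, resources are attached to the plan through a backup selection. We test with an on-demand backup below, but for reference, a selection targeting the workload=APP tag would look something like this (the IamRoleArn is an assumption, using the default service role AWS Backup can create):

$ aws backup create-backup-selection --backup-plan-id a56a1fc9-b176-46b8-80b8-a7fa5679ec1f --backup-selection '{"SelectionName":"AppSelection","IamRoleArn":"arn:aws:iam::464392538707:role/service-role/AWSBackupDefaultServiceRole","ListOfTags":[{"ConditionType":"STRINGEQUALS","ConditionKey":"workload","ConditionValue":"APP"}]}'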

Enable Notifications

We set up notifications for the BACKUP_JOB_COMPLETED and RESTORE_JOB_COMPLETED events, then verify them:

$ aws backup put-backup-vault-notifications --region us-east-1 --backup-vault-name BackupVault --backup-vault-events BACKUP_JOB_COMPLETED RESTORE_JOB_COMPLETED --sns-topic-arn arn:aws:sns:us-east-1:464392538707:BackupNotificationTopic-default-region-infrastructure
$ aws backup get-backup-vault-notifications --backup-vault-name BackupVault --region us-east-1
{
    "BackupVaultName": "BackupVault",
    "BackupVaultArn": "arn:aws:backup:us-east-1:464392538707:backup-vault:BackupVault",
    "SNSTopicArn": "arn:aws:sns:us-east-1:464392538707:BackupNotificationTopic-default-region-infrastructure",
    "BackupVaultEvents": [
        "BACKUP_JOB_COMPLETED",
        "RESTORE_JOB_COMPLETED"
    ]
}
Testing Recovery

Now we will test our recovery plan using an on-demand backup.

Create an on-demand backup to simulate the EC2 backup and restore process.

Locate on-demand backup

Select EC2 as the resource type, enter the instance ID created previously (i-09f31cc79bb63e142), and leave the IAM role as Default, since a corresponding IAM role will be created automatically.

On-demand backup settings
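
The same on-demand backup can also be started from the CLI; a sketch, assuming the default AWS Backup service role:

$ aws backup start-backup-job --backup-vault-name BackupVault --resource-arn arn:aws:ec2:us-east-1:464392538707:instance/i-09f31cc79bb63e142 --iam-role-arn arn:aws:iam::464392538707:role/service-role/AWSBackupDefaultServiceRole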

Upon setting up the on-demand backup, a backup job was initiated. Around 10 minutes later, the job was done.

Backup job was completed

Email received upon completion of backup

Email confirmation for backup

Right after, a restore job is triggered by the Lambda function. Ten minutes later, the restore job was done.

Restore job was completed
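
Both jobs can also be monitored from the CLI:

$ aws backup list-backup-jobs --by-backup-vault-name BackupVault
$ aws backup list-restore-jobs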

Email received for restore job completion

Email received for completion of restore job

The restore job launched a brand-new EC2 instance. Upon completion of the recovery test, the Lambda function automatically terminates this newly created instance to save costs.

Newly created ec2 terminated

Clean up

Delete the CloudFormation stack:

$ aws cloudformation delete-stack --stack-name default-region-infrastructure
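
Stack deletion is asynchronous; if you want the CLI to block until it finishes, you can wait on it:

$ aws cloudformation wait stack-delete-complete --stack-name default-region-infrastructure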

Cross check in AWS CloudFormation

CloudFormation deleted

Delete AWS Backup Resources

$ aws backup delete-backup-plan --backup-plan-id a56a1fc9-b176-46b8-80b8-a7fa5679ec1f
{
    "BackupPlanId": "a56a1fc9-b176-46b8-80b8-a7fa5679ec1f",
    "BackupPlanArn": "arn:aws:backup:us-east-1:464392538707:backup-plan:a56a1fc9-b176-46b8-80b8-a7fa5679ec1f",
    "DeletionDate": 1615096227.504,
    "VersionId": "NTBjZmU4MzUtNmZkNS00YzBlLWFjMjUtM2Q4N2U5YTlmZTg4"
}

Cross check in AWS Backup

Backup plan deleted

Delete Backup Vault Recovery Points

$ aws backup delete-recovery-point --backup-vault-name BackupVault --recovery-point-arn arn:aws:ec2:us-east-1::image/ami-089590ebcfd373c24
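
If you do not have the recovery point ARN handy, you can list the recovery points in the vault first:

$ aws backup list-recovery-points-by-backup-vault --backup-vault-name BackupVault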

Cross check recovery points in the AWS Backup vault

Backup vault points deleted

Delete Backup Vault

$ aws backup delete-backup-vault --backup-vault-name BackupVault

Cross check in AWS Backup Vault

Backup vault deleted

Delete CloudWatch Log Groups

$ aws logs delete-log-group --log-group-name /aws/lambda/RestoreTestFunction-default-region-infrastructure

CloudWatch log groups deleted

Conclusion:

In part one of the AWS Disaster Recovery series, we focused on backup and restore of an EC2 instance. That said, we only have a daily backup set up; in case of disaster, the EC2 instance cannot be recovered automatically. We'll discuss automatic recovery and the Pilot Light strategy in the next part of this series.
