AWS Disaster Recovery Part 1 — Backup
Before we begin with AWS Disaster Recovery, let us discuss where the majority of organizations stand in terms of disaster recovery. As we all know, on-premises servers have long been used to store an organization's critical data. But the cloud, as a booming industry, gives organizations the edge to handle both storage and disaster recovery in a more efficient and cost-effective manner, and AWS Disaster Recovery is no doubt among the leading options.
Disaster recovery is not "one solution fits all". Depending on the size, nature, requirements, and other factors of a business, the strategies can vary. Before jumping into our backup walkthrough, we will touch upon the following generic strategies:
Identify and Describe All of Your Infrastructure
It’s essential to have a clear picture of your own infrastructure prior to coming up with a disaster recovery plan.
Involve the Entire Development Team
It would not be possible to have a comprehensive disaster recovery plan without consulting the entire development team. Establishing dependencies and mapping infrastructure is a time-consuming process and one that must occur over time. It’s a bit like a grocery list — you keep adding to it as new items come to mind
Identify the Importance of Each Infrastructure Element
Prioritize elements according to their importance to the organization. If a key business service is down, it has to be restored in a timely manner; less critical elements can be allowed a longer recovery time.
Discuss RTO and RPO with Stakeholders
- RTO (recovery time objective) — how soon stakeholders can expect to be back up and running
- RPO (recovery point objective) — the point in the past to which their data will be restored once recovery occurs
Based on different RTOs and RPOs, we may end up with the following scenarios:
Use Your Accumulated Information to Create Your Plan
Downtime Cost vs. Backup/Recovery Cost: As indicated by our RTO/RPO recovery plans, there is no solution that fits all. For instance, a giant transaction-based e-commerce website like Amazon can't afford even a few seconds of downtime, so Multi-Site is clearly the right choice. A small enterprise, however, can adopt either Backup & Restore or Pilot Light, since the longer time it is allowed to recover outweighs the enormous cost of a Multi-Site disaster recovery plan.
How much data loss is acceptable? A given amount of data loss could cause big issues for a large financial institution, while for a small firm losing the same data might not be a big deal.
Which specific backup options are best suited to your circumstances? Within EC2, for example, you can choose between Amazon Machine Images (AMIs) and EBS snapshots. In general, instance store-backed AMIs are slower, less flexible, and cost more than EBS snapshots, but there are also strengths that come with that additional cost.
How will you automate your backup, and how should you choose an additional region for copies of those backups? There is a wealth of AWS tools for automating your backups so that you can rest easy knowing that the process goes on without your intervention, and you can literally select an additional region for backup half a world away. You will want to make use of AWS disaster recovery management tools, many of which can be set up with a few clicks in the console. There are also custom solutions available via the AWS Marketplace, including options ranging from "pilot light" to "hot standby."
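To make the automation point concrete, here is a minimal sketch of scripting a cross-region copy with the AWS CLI; the vault names, account ID, and ARNs below are illustrative placeholders, assuming a destination vault already exists in the second region:
$ aws backup start-copy-job --recovery-point-arn arn:aws:ec2:us-east-1::image/ami-0123456789abcdef0 --source-backup-vault-name SourceVault --destination-backup-vault-arn arn:aws:backup:ap-southeast-1:111122223333:backup-vault:OffsiteVault --iam-role-arn arn:aws:iam::111122223333:role/service-role/AWSBackupDefaultServiceRole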
Establish the In-House Communication Network
Clearly, there are two ways to set this up:
- Re-assign developers from your in-house team to monitor and fine-tune your infrastructure, and run DR scenarios
- Hire a DevOps support team who will manage your IT support 24/7, report on new findings, and continuously optimize your infrastructure performance
Testing and Re-Testing
What matters most is testing, testing, and more testing. Without enough testing, you will not be able to tell which plan works best for your organization.
In this project, we will use AWS Backup to back up and recover an EC2 instance.
Prerequisites
For this walkthrough, you need the following:
- An AWS account with a non-root IAM user (a security best practice)
- RHEL 8.3 running in Oracle VirtualBox on Windows 10, accessed via PuTTY
- AWS CLI installed
Let us work on them one by one.
Creating a non-root user
Per AWS best practice, the root user is not recommended for performing everyday tasks, even administrative ones. Rather, the root user is used to create your first IAM user, groups, and roles. Then you should securely lock away the root user credentials and use them to perform only a few account and service management tasks.
Notes: If you would like to learn more about why we should not use root user for operations and more about AWS account, please find more here.
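For reference, those first IAM resources can also be created from the CLI while still signed in as the root user; a minimal sketch, with the group name Admins and user name dr-admin as illustrative choices:
$ aws iam create-group --group-name Admins
$ aws iam attach-group-policy --group-name Admins --policy-arn arn:aws:iam::aws:policy/AdministratorAccess
$ aws iam create-user --user-name dr-admin
$ aws iam add-user-to-group --group-name Admins --user-name dr-admin
$ aws iam create-access-key --user-name dr-admin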
Set up RHEL 8.3 in Oracle VirtualBox on Windows 10 using PuTTY
First, download Oracle VirtualBox for Windows 10 by clicking Windows hosts.
Second, download the RHEL ISO.
Let us make it work now!
Open Oracle VirtualBox and follow the instructions here; you will have RHEL 8.3 installed as shown below.
Notes: In case you are unable to install RHEL 8.3 successfully, please find solutions here. Also, after you create your developer account with Red Hat, you have to wait for some time before registering it; otherwise, you may receive errors as well.
Now it’s time for us to connect to RHEL 8.3 from Windows 10 using VirtualBox.
Click Activities and open a terminal.
Notes: In order to be able to connect to RHEL 8.3 from Windows 10 using PuTTY later, we must enable the setting shown below.
Now we will get the IP address that we will use to connect to RHEL 8.3 from Windows 10 using PuTTY (the highlighted IP address for enp0s3 is the right one to use).
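If you prefer the terminal, something like the following prints that same address (the interface name enp0s3 matches this setup; yours may differ):
$ ip addr show enp0s3 | grep "inet "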
Then we will install PuTTY.
ssh-keygen with a passphrase
Creating a passphrase-protected key looks something like this:
$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/pzhao/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/pzhao/.ssh/id_rsa.
Your public key has been saved in /home/pzhao/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:RXPnUZg/fGgRGTOxEfbo3VOMo/Yp4Gi80has/iR4m/A pzhao@localhost.localdomain
The key's randomart image is:
+---[RSA 3072]----+
| o . %X.|
| . o +=@ |
| . B++|
| . oo==|
| .S . o...=|
| . .oo o . ..|
| o oo=.. . o |
| +o*o. . |
| .E+o |
+----[SHA256]-----+
To view the private key:
$ cat .ssh/id_rsa
-----BEGIN OPENSSH PRIVATE KEY-----
b3BlbnNzaC1rZXktdjEAAAAABG5vbmUAAAAEbm9uZQAAAAAAAAABAAABlwAAAAdzc2gtcn
NhAAAAAwEAAQAAAYEAwoavXHvZCYPO/sbMD0ibtkvF+9/NmSm2m/Z8wRy7O2A012YS98ap
8aq18PXfKPyyAMNF3hdG3xi1KMD7DSIb/C1gunjTREEJRfYjydOjFBFtZWY78Mj4eQkrPJ
.
.
.
-----END OPENSSH PRIVATE KEY-----
Notes: You may take advantage of the RHEL GUI to send the private key to yourself in an email, then open the mail on Windows and copy the private key from it.
Open Notepad in Windows 10 and save the private key as an ansiblekey.pem file.
Then open PuTTY Key Generator and load the private key ansiblekey.pem.
Then save it as a private key named ansible.ppk.
We now open PuTTY and input the IP address we saved previously, 192.168.0.18, as the Host Name (or IP address).
We then move on to Session and input the IP address.
For convenience, we may save it as a predefined session as shown below
You should see the pop-up below if you are logging in for the very first time.
Then input your username and password to log in. You will see the image below after logging in.
Installing AWS CLI
To install the AWS CLI after logging into RHEL 8:
$ curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
$ unzip awscliv2.zip
$ sudo ./aws/install
To verify the installation
$ aws --version
aws-cli/2.0.46 Python/3.7.4 Darwin/19.6.0 exe/x86_64
To use the AWS CLI, we need to configure it with an AWS access key, AWS secret access key, default region, and output format.
$ aws configure
AWS Access Key ID [****************46P7]:
AWS Secret Access Key [****************SoXF]:
Default region name [us-east-1]:
Default output format [json]:
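Though not part of the original steps, a quick sanity check that the configured credentials actually work is:
$ aws sts get-caller-identity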
Now we may begin our project to back up our EC2 instance.
Since YAML files are indentation-sensitive, we'll be using GitHub gists for our project files.
Create a YAML file named default-region-infrastructure.yaml
$ vim default-region-infrastructure.yaml
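Before creating the stack, it can help to validate the template syntax locally; a quick check, assuming the file sits in the current directory:
$ aws cloudformation validate-template --template-body file://default-region-infrastructure.yaml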
Create a CloudFormation stack named default-region-infrastructure, assigning your email to the NotificationEmail parameter and us-east-1b as the Availability Zone to the AvailabilityZone parameter:
$ aws cloudformation create-stack --template-body file://default-region-infrastructure.yaml --stack-name default-region-infrastructure --parameters ParameterKey=NotificationEmail,ParameterValue=zhaofeng8711@gmail.com ParameterKey=AvailabilityZone,ParameterValue=us-east-1b --capabilities CAPABILITY_IAM
{
"StackId": "arn:aws:cloudformation:us-east-1:464392538707:stack/default-region-infrastructure/7b9db5d0-7ed1-11eb-b4ce-12ab197e9ef5"
}
Cross check resources created
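Besides the console, the same cross-check can be done from the CLI:
$ aws cloudformation describe-stack-resources --stack-name default-region-infrastructure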
Create Backup Plan
Firstly, we create a backup vault named BackupVault
$ aws backup create-backup-vault --backup-vault-name BackupVault
{
"BackupVaultName": "BackupVault",
"BackupVaultArn": "arn:aws:backup:us-east-1:464392538707:backup-vault:BackupVault",
"CreationDate": 1615077316.706
}
Then, we create a backup plan named Backupplan, with a rule set up to back up every day at 5:00 AM UTC; each backup starts within a 480-minute window and is deleted after 35 days. We also assign the tag key workload with the value APP, targeting the EC2 instance we created using CloudFormation.
$ aws backup create-backup-plan --backup-plan "{\"BackupPlanName\":\"Backupplan\",\"Rules\":[{\"RuleName\":\"DailyBackups\",\"ScheduleExpression\":\"cron(0 5 * * ? *)\",\"StartWindowMinutes\":480,\"TargetBackupVaultName\":\"BackupVault\",\"Lifecycle\":{\"DeleteAfterDays\":35}}]}" --backup-plan-tags workload=APP
{
"BackupPlanId": "a56a1fc9-b176-46b8-80b8-a7fa5679ec1f",
"BackupPlanArn": "arn:aws:backup:us-east-1:464392538707:backup-plan:a56a1fc9-b176-46b8-80b8-a7fa5679ec1f",
"CreationDate": 1615077342.351,
"VersionId": "NTBjZmU4MzUtNmZkNS00YzBlLWFjMjUtM2Q4N2U5YTlmZTg4"
}
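Note that a backup plan on its own does not know which resources to protect; resources are attached through a backup selection. Here is a minimal sketch using the instance from our stack, assuming the default AWS Backup service role exists (the selection name is an illustrative choice):
$ aws backup create-backup-selection --backup-plan-id a56a1fc9-b176-46b8-80b8-a7fa5679ec1f --backup-selection '{"SelectionName":"Ec2Selection","IamRoleArn":"arn:aws:iam::464392538707:role/service-role/AWSBackupDefaultServiceRole","Resources":["arn:aws:ec2:us-east-1:464392538707:instance/i-09f31cc79bb63e142"]}'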
Enable Notifications
We set up our notifications for the BACKUP_JOB_COMPLETED and RESTORE_JOB_COMPLETED events:
$ aws backup put-backup-vault-notifications --region us-east-1 --backup-vault-name BackupVault --backup-vault-events BACKUP_JOB_COMPLETED RESTORE_JOB_COMPLETED --sns-topic-arn arn:aws:sns:us-east-1:464392538707:BackupNotificationTopic-default-region-infrastructure
$ aws backup get-backup-vault-notifications --backup-vault-name BackupVault --region us-east-1
{
"BackupVaultName": "BackupVault",
"BackupVaultArn": "arn:aws:backup:us-east-1:464392538707:backup-vault:BackupVault",
"SNSTopicArn": "arn:aws:sns:us-east-1:464392538707:BackupNotificationTopic-default-region-infrastructure",
"BackupVaultEvents": [
"BACKUP_JOB_COMPLETED",
"RESTORE_JOB_COMPLETED"
Testing Recovery
Now we will be testing our recovery plan using an on-demand backup.
Create an on-demand backup to simulate an EC2 backup and restore process
Select EC2 as the Resource type, choose the Instance ID created previously (i-09f31cc79bb63e142), and leave the IAM role as Default, since AWS Backup will automatically create a corresponding IAM role.
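The steps above use the console; for reference, the CLI equivalent of kicking off the on-demand backup would look roughly like this (the service role ARN is an assumption based on the Default role mentioned above):
$ aws backup start-backup-job --backup-vault-name BackupVault --resource-arn arn:aws:ec2:us-east-1:464392538707:instance/i-09f31cc79bb63e142 --iam-role-arn arn:aws:iam::464392538707:role/service-role/AWSBackupDefaultServiceRole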
Upon setting up the on-demand backup, a backup job was initiated. Around 10 minutes later, the job was done.
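For reference, backup job status can also be checked from the CLI instead of the console:
$ aws backup list-backup-jobs --by-backup-vault-name BackupVault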
Email received upon completion of backup
Right after that, a restore job is triggered by a Lambda function. Ten minutes later, the restore job was done.
Email received for restore job completion
The restore created a brand-new EC2 instance. Upon completion of the recovery, a Lambda function is automatically triggered to terminate the newly created EC2 instance to save costs.
Clean up
Delete the CloudFormation stack:
$ aws cloudformation delete-stack --stack-name default-region-infrastructure
Cross check in AWS CloudFormation
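Alternatively, the CLI can block until the deletion finishes:
$ aws cloudformation wait stack-delete-complete --stack-name default-region-infrastructure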
Delete AWS Backup Resources
$ aws backup delete-backup-plan --backup-plan-id a56a1fc9-b176-46b8-80b8-a7fa5679ec1f
{
"BackupPlanId": "a56a1fc9-b176-46b8-80b8-a7fa5679ec1f",
"BackupPlanArn": "arn:aws:backup:us-east-1:464392538707:backup-plan:a56a1fc9-b176-46b8-80b8-a7fa5679ec1f",
"DeletionDate": 1615096227.504,
"VersionId": "NTBjZmU4MzUtNmZkNS00YzBlLWFjMjUtM2Q4N2U5YTlmZTg4"
}
Cross check in AWS Backup
Delete Backup Vault Recovery Points
$ aws backup delete-recovery-point --backup-vault-name BackupVault --recovery-point-arn arn:aws:ec2:us-east-1::image/ami-089590ebcfd373c24
Cross check recovery points in the AWS Backup vault
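To confirm no recovery points remain (or to look up their ARNs before deleting), you can list them:
$ aws backup list-recovery-points-by-backup-vault --backup-vault-name BackupVault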
Delete Backup Vault
$ aws backup delete-backup-vault --backup-vault-name BackupVault
Cross check in AWS Backup Vault
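As one last CLI cross-check that the vault is gone:
$ aws backup list-backup-vaults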
Delete CloudWatch Log Groups
$ aws logs delete-log-group --log-group-name /aws/lambda/RestoreTestFunction-default-region-infrastructure
Conclusion:
In this part one of the AWS Disaster Recovery series, we focused on backup and restore of an EC2 instance. That said, we only have a daily backup set up, so in case of disaster the EC2 instance can't be recovered automatically. We'll be discussing automatic recovery and Pilot Light recovery in the next part of this series.