Even a brief interruption of service can mean disaster for an organization, potentially costing thousands of dollars in lost data. An outage can be caused by a security attack, a natural disaster, or human error. Business continuity is critical for any company in the cloud. A solid disaster recovery plan helps organizations stay up in the event of failure or attack.
One of the leading cloud vendors, Amazon Web Services (AWS), provides its users with features to help them build their own disaster recovery solution. In this article, I explain what a Disaster Recovery Plan (DRP) for AWS is and offer 10 tips for using the functions in your AWS console to prevent and recover from a disaster.
AWS Disaster Recovery Plan Overview
A Disaster Recovery Plan (DRP) is a structured, detailed set of instructions for recovering systems and networks in the event of failure or attack, with the aim of getting the organization operational again as fast as possible.
Deploying an on-premises disaster recovery solution usually involves high implementation and maintenance costs. Therefore, many companies leverage the disaster recovery tools and solutions available through their cloud vendors, such as AWS or Azure. These solutions may also be offered by third-party vendors. For example, AWS partners with companies such as N2WS and Cloudberrylab that offer disaster recovery solutions tailored to AWS.
AWS users can derive several benefits from developing a recovery plan and having it ready, such as:
- Minimizing data loss: protects critical data by establishing replication intervals
- Restoring critical applications quickly: minimizes downtime
- Distributing risk: uses AWS cross-region disaster recovery
- Bouncing back quickly: requires minimal time to retrieve files and data, so operations resume sooner
10 Tips For Developing an AWS Disaster Recovery Plan
#1. Identify critical resources and assets
What resources make up the core of your business? A Business Impact Analysis (BIA) can give you a picture of which areas would be most affected in the event of a threat. It can also help you anticipate the potential impact of a disaster on operations.
#2. Define your recovery time objective (RTO) and your recovery point objective (RPO)
You should know how much system downtime your organization can afford before suffering irreparable monetary losses. Calculating your recovery time objective is therefore critical for a successful recovery plan. You also need to calculate how much data loss your organization can absorb before incurring too much damage; that is the recovery point objective. For example, if losing 4 hours of data would cause too much damage, then you need to plan for an RPO of well under 4 hours.
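To make the arithmetic concrete, here is a minimal sketch in Python. It assumes backups start at a fixed interval and take a known time to complete. The function names and sample values are illustrative and do not come from any AWS API.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta,
                         backup_duration: timedelta = timedelta(0)) -> timedelta:
    """Worst case: the failure hits just before the next backup completes,
    so the newest usable backup started one interval plus one duration ago."""
    return backup_interval + backup_duration


def meets_rpo(backup_interval: timedelta, rpo: timedelta,
              backup_duration: timedelta = timedelta(0)) -> bool:
    """True if the worst-case data loss stays within the RPO."""
    return worst_case_data_loss(backup_interval, backup_duration) <= rpo


def meets_rto(estimated_recovery_time: timedelta, rto: timedelta) -> bool:
    """True if the (measured or estimated) recovery time fits within the RTO."""
    return estimated_recovery_time <= rto


if __name__ == "__main__":
    rpo = timedelta(hours=4)  # sample value matching the 4-hour example
    for interval in (timedelta(hours=1), timedelta(hours=6)):
        print(interval, "meets RPO:", meets_rpo(interval, rpo))
```

The recovery time passed to `meets_rto` is best taken from an actual recovery drill rather than a guess.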
#3. Choose a disaster recovery planning method
There are four main recovery methods you can choose from, according to your organization’s requirements and preferences:
- Backup and restore: you can use a managed solution to back up and restore data on an as-needed basis. However, restoration can consume a lot of time and resources, as the system does not keep data on standby.
- Pilot light: keep a core of critical applications and data running to enable quick recovery in the event of a disaster.
- Warm standby: duplicate the system’s core elements and keep them running on standby at all times. In the event of a disaster, this duplicate can be promoted to primary to maintain operations.
- Hot standby: make a full replica of the data and applications and deploy it in two or more active locations. You can then split the traffic between them so that, in the event of a disaster, the system simply reroutes everything to an undamaged region.
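One way to compare the four methods is to pick the cheapest one whose recovery time still fits your RTO. The sketch below assumes that running cost rises from backup and restore up to hot standby, which is the usual trade-off. The recovery times are hypothetical placeholders, and the `Strategy` class and function name are illustrative. Replace the figures with results from your own tests.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Strategy:
    name: str
    relative_cost: int               # 1 = cheapest to run, 4 = most expensive (assumed ranking)
    estimated_recovery: timedelta    # hypothetical placeholder, not a measured value


# Placeholder recovery times for illustration only.
STRATEGIES: List[Strategy] = [
    Strategy("backup and restore", 1, timedelta(hours=24)),
    Strategy("pilot light", 2, timedelta(hours=4)),
    Strategy("warm standby", 3, timedelta(minutes=30)),
    Strategy("hot standby", 4, timedelta(minutes=1)),
]


def cheapest_strategy_for(rto: timedelta,
                          strategies: List[Strategy] = STRATEGIES) -> Optional[Strategy]:
    """Return the lowest-cost strategy whose estimated recovery fits the RTO,
    or None if no listed strategy is fast enough."""
    candidates = [s for s in strategies if s.estimated_recovery <= rto]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.relative_cost)


if __name__ == "__main__":
    choice = cheapest_strategy_for(timedelta(hours=6))
    print(choice.name if choice else "no strategy meets this RTO")
```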
#4. Define and implement security and corrective measures
For example, you can implement detective measures such as server and network monitoring software. Corrective measures, such as remediation tools, can help restore a system after a disaster.
#5. Test your plan before implementing it
Scheduling tests while developing your DRP can help you catch flaws before you need to implement the plan. This helps ensure your plan is well oiled before a disaster or threat occurs.
#6. Schedule maintenance
You should update your plan on a regular basis to keep up with system changes. After an incident, fold the lessons learned back into the plan to help prevent further attacks or failures.
#7. Back up your data
Scheduling regular backups of what you have stored on Amazon EC2 instances and EBS volumes may not be enough on its own to face a disaster. You also need quick access to the data when a disaster occurs. A detailed and up-to-date AWS disaster recovery plan can help you recover and restore the backup data from the cloud environment with minimal downtime.
#8. Use cross-region backups
While developing your plan, you need to decide where the critical data will be stored. To avoid getting your entire system knocked offline, you should distribute the data across different AWS regions around the world, not only across availability zones (AZs) within a single region.
For example, you can use cross-region replication for S3. By default, S3 stores the data redundantly in multiple locations within a region, which gives it high durability. However, this does not eliminate the risk of data loss in a given region. To prevent this, you can use the cross-region replication option, which automatically copies the data to a designated bucket in another region.
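As a rough sketch of what enabling this might look like with boto3, the snippet below builds a replication configuration and passes it to the S3 `put_bucket_replication` call. It assumes that versioning is already enabled on both buckets and that an IAM role S3 can assume for replication already exists. The bucket names, role ARN, rule ID, and helper names are hypothetical.

```python
def build_replication_config(role_arn: str, destination_bucket: str) -> dict:
    """Build a ReplicationConfiguration that copies every new object
    to the destination bucket (expected to live in another region)."""
    return {
        "Role": role_arn,  # IAM role that S3 assumes to replicate objects
        "Rules": [
            {
                "ID": "dr-cross-region-copy",   # hypothetical rule name
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},       # empty prefix matches all objects
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": f"arn:aws:s3:::{destination_bucket}"},
            }
        ],
    }


def enable_replication(source_bucket: str, destination_bucket: str, role_arn: str) -> None:
    """Apply the configuration. Prerequisite (assumed done elsewhere):
    versioning is enabled on both the source and the destination bucket."""
    import boto3  # imported lazily so the builder above works without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_replication(
        Bucket=source_bucket,
        ReplicationConfiguration=build_replication_config(role_arn, destination_bucket),
    )


if __name__ == "__main__":
    # Placeholder values; printing the config makes no AWS calls.
    print(build_replication_config(
        "arn:aws:iam::123456789012:role/example-replication-role",
        "example-dr-bucket",
    ))
```

Replication only applies to objects written after the rule is in place, so existing data would need to be copied separately.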
You can also use global tables in DynamoDB to deploy a multi-region, multi-master database. DynamoDB propagates changes to the replica tables in each region. Since the data is distributed across different regions, the risk of data loss is minimized.
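A minimal sketch of adding replicas with boto3 follows. It assumes the current global tables version (2019.11.21), where replicas are added through `update_table` with `ReplicaUpdates`, and an existing table that meets the global table prerequisites. The table name, regions, and helper names are hypothetical.

```python
from typing import Dict, List


def build_replica_requests(table_name: str, regions: List[str]) -> List[Dict]:
    """One update_table request per region; replicas are added one at a time."""
    return [
        {
            "TableName": table_name,
            "ReplicaUpdates": [{"Create": {"RegionName": region}}],
        }
        for region in regions
    ]


def add_replicas(table_name: str, home_region: str, replica_regions: List[str]) -> None:
    """Send the requests to DynamoDB in the table's home region."""
    import boto3  # imported lazily so the builder above works without boto3

    dynamodb = boto3.client("dynamodb", region_name=home_region)
    waiter = dynamodb.get_waiter("table_exists")
    for request in build_replica_requests(table_name, replica_regions):
        dynamodb.update_table(**request)
        # Wait until the table reports ACTIVE again before the next change.
        waiter.wait(TableName=table_name)


if __name__ == "__main__":
    # Placeholder names; printing the requests makes no AWS calls.
    for req in build_replica_requests("example-orders", ["us-west-2", "eu-west-1"]):
        print(req)
```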
#9. Use multi-factor authentication
Needless to say, you should keep your root passwords and credentials secure and hidden from unauthorized users, and disable programmatic access keys once they are no longer needed, to prevent internal threats. Setting up multi-factor authentication (MFA) helps ensure that administrator and programmatic privileges don’t fall into malicious hands.
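One common way to enforce MFA is an IAM policy that denies sensitive actions unless the request was authenticated with MFA, using the `aws:MultiFactorAuthPresent` condition key. The sketch below only builds the policy document. The list of protected actions and the statement ID are examples chosen for illustration, so adapt them to your own environment.

```python
import json
from typing import Dict, List


def build_mfa_deny_policy(actions: List[str]) -> Dict:
    """Build an IAM policy that denies the given actions when MFA is absent.

    BoolIfExists also matches requests where the MFA key is missing entirely,
    such as calls made with long-term access keys, so those are denied too.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenySensitiveActionsWithoutMFA",  # illustrative name
                "Effect": "Deny",
                "Action": actions,
                "Resource": "*",
                "Condition": {
                    "BoolIfExists": {"aws:MultiFactorAuthPresent": "false"}
                },
            }
        ],
    }


if __name__ == "__main__":
    # Example actions that could destroy backups or running systems.
    policy = build_mfa_deny_policy(
        ["ec2:TerminateInstances", "ec2:DeleteSnapshot", "s3:DeleteBucket"]
    )
    print(json.dumps(policy, indent=2))
```

The resulting JSON could be attached to a user, group, or role through the IAM console or API. Because it is a deny statement, it overrides any allow granted elsewhere.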
#10. Consider a third-party Disaster Recovery-as-a-Service (DRaaS)
While it may be tempting to implement all steps of a disaster recovery plan in-house, smaller companies lacking a dedicated IT team may find it easier to use a third-party solution. Disaster recovery-as-a-service companies help organizations develop, implement, and maintain their DRPs, enabling them to focus on growing their businesses.
AWS Disaster Recovery Options
Let’s say you migrated to the cloud using the rehosting method and you use EC2 instances for your application. There are several ways to begin leveraging AWS functions to develop a DR plan:
- EC2 EBS snapshots: allow you to make incremental backups of an EBS volume.
- EC2 AMIs: work similarly to EBS snapshots but also contain the metadata for the EC2 instance, which allows the entire instance to be restored.
- Lambda: a serverless product that allows you to run code outside the application environment while still accessing your AWS resources. You can use Lambda to automate tasks such as EBS snapshots.
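To illustrate the Lambda option, here is a minimal sketch of a handler that snapshots a set of EBS volumes with the EC2 `create_snapshot` call. It assumes the volume IDs are supplied in a comma-separated `VOLUME_IDS` environment variable and that the function's role is allowed to create snapshots. The variable name, tag values, and helper function are hypothetical.

```python
import os
from datetime import datetime, timezone
from typing import List


def snapshot_volumes(ec2_client, volume_ids: List[str]) -> List[str]:
    """Create one snapshot per volume and return the new snapshot IDs."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    snapshot_ids = []
    for volume_id in volume_ids:
        response = ec2_client.create_snapshot(
            VolumeId=volume_id,
            Description=f"Scheduled DR snapshot of {volume_id} at {stamp}",
            TagSpecifications=[
                {
                    "ResourceType": "snapshot",
                    # Illustrative tag, useful for finding or pruning snapshots later.
                    "Tags": [{"Key": "Purpose", "Value": "disaster-recovery"}],
                }
            ],
        )
        snapshot_ids.append(response["SnapshotId"])
    return snapshot_ids


def lambda_handler(event, context):
    """Lambda entry point; reads volume IDs from the VOLUME_IDS variable."""
    import boto3  # provided by the AWS Lambda Python runtime

    volume_ids = [v.strip() for v in os.environ.get("VOLUME_IDS", "").split(",") if v.strip()]
    return {"snapshots": snapshot_volumes(boto3.client("ec2"), volume_ids)}
```

A function like this could be invoked on a schedule, for example by an Amazon EventBridge rule, with the interval chosen to fit your RPO. Passing the client into `snapshot_volumes` keeps the logic easy to test with a stub.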
Developing and implementing a disaster recovery plan for AWS requires a certain degree of ingenuity, since AWS does not offer its own DR solution. However, the platform enables users to build a customized DR solution by repurposing some of its features and tools. In this article, I’ve aimed to give you some tips and tools for developing your own disaster recovery plan that leverages the AWS environment.