Business Continuity and Disaster Recovery at AWS for WordPress

Backups and disaster recovery are something you will have to consider sooner or later. Even if you run your infrastructure on a cloud provider as durable as Amazon, don’t neglect having at least a minimal backup/restore strategy.

According to the AWS disaster recovery whitepaper, we have four options:

  • Backup and Restore: A pretty common solution for on-premises infrastructure. Data is backed up to tape or some other remote storage, and AWS can act as that “remote storage”.
  • Pilot Light: Based on the idea of keeping a small “something” running in a remote location to speed up recovery after a disaster, for example a copy of the database running in another region. Amazon offers a good analogy:
“The idea of the pilot light is an analogy that comes from the gas heater. In a gas heater, a small flame that’s always on can quickly ignite the entire furnace to heat up a house.”
  • Warm Standby: This solution extends the pilot light elements and preparation. It further decreases the recovery time, and with it the recovery time objective (RTO) you can commit to, because some services are always running.
  • Multi-Site: A full copy of the infrastructure in another region. I won’t go deep into details here, since the name of this strategy pretty much explains itself.

We will use a simple and well-known WordPress-based web application. Let’s say we have the following setup (of course we might have more resources, such as CloudWatch alarms, SNS topics, and a launch configuration for Auto Scaling, but those are considered basics and are not shown on the following diagram):

A few words about the setup:

  • Network: 4 subnets in 4 Availability Zones of the US East region;
  • S3: stores WordPress content and themes;
  • EFS: stores website uploads, pictures, icons, and so on.

We have now covered the four available DR strategies. But who said you should be limited to just one of them? Feel free to improvise: take one as the base strategy and combine it with another. We rely on Amazon quite a lot and consider it durable enough to take care of some backups for us, such as RDS backups. Almost everything is stored on Amazon S3 under the hood (AMIs, DB backups, and so on), and we trust S3 because it stores data redundantly across multiple Availability Zones within a region. Note that this redundancy does not extend to other regions unless you copy or replicate the data there yourself. To back up an EC2 instance we created an AMI, which is also stored on S3 behind the scenes. Roughly speaking, we have followed the Backup and Restore approach so far, plus some tools that I am going to shed some light on now.

Backup remarks:

It’s OK to rely on Amazon as long as there is no nuclear war in the area where you host your environment. Just imagine that a whole AWS data center or region is destroyed: what would you do to restore your services? For instance, you will not be able to restore an RDS database from a snapshot unless you copied that snapshot to another region BEFORE the disaster. Here’s another thing to consider: copying a snapshot of an encrypted database to another region requires a KMS key in the destination region, so prepare that key in advance as well. Another example is an EC2 instance “backed up” to an AMI: the AMI is available only in the region where it was created, and again you have to copy it to another region BEFORE the disaster to be able to restore there. The purpose of this article is to help you prepare your target environment for a restore, so I assume you have already chosen the right backup strategy to be safe if an AWS region becomes unavailable.
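As an illustration of copying backups out of the source region ahead of time, here is a minimal Python sketch built on boto3’s `copy_db_snapshot` and `copy_image` calls. It is not part of the original setup: the function name, the region names, and all identifiers in the usage note are assumptions made for the example. The clients are passed in, so they must be created for the destination region.

```python
def copy_backups_to_dr_region(rds_dr, ec2_dr, source_region, snapshot_arn,
                              ami_id, copy_name, kms_key_id=None):
    """Copy one RDS snapshot and one AMI into the disaster recovery region.

    rds_dr and ec2_dr are boto3 clients created for the DESTINATION region.
    snapshot_arn must be the full ARN of the snapshot in the source region.
    kms_key_id is only needed for encrypted snapshots and must name a KMS key
    that lives in the destination region.
    """
    snapshot_args = {
        "SourceDBSnapshotIdentifier": snapshot_arn,
        "TargetDBSnapshotIdentifier": copy_name,
        "SourceRegion": source_region,
    }
    if kms_key_id is not None:
        snapshot_args["KmsKeyId"] = kms_key_id
    rds_dr.copy_db_snapshot(**snapshot_args)

    # The copied AMI gets a new ID that is valid only in the destination region.
    image = ec2_dr.copy_image(
        Name=copy_name,
        SourceImageId=ami_id,
        SourceRegion=source_region,
    )
    return image["ImageId"]
```

A possible invocation, with made-up regions and identifiers, would create the clients with `boto3.client("rds", region_name="us-west-2")` and `boto3.client("ec2", region_name="us-west-2")` and pass `"us-east-1"` as `source_region`. Both copies run asynchronously on the AWS side, so the snapshot and the AMI should be checked for availability before they are relied upon.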

Even though our data is pretty safe on S3, building an environment from scratch after a disaster would take a long time, even with all backups close at hand. To reduce RTO we used AWS CloudFormation to create a template containing everything in our current setup. It lets us build a platform for restoring our environment on a cold site in about 30 minutes.
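To give an idea of what rolling out such a template can look like outside the console, here is a minimal Python sketch using boto3’s CloudFormation client (`create_stack`, the `stack_create_complete` waiter, and `describe_stacks`). The function name, the stack name, and the parameter names in the usage note are hypothetical; the real names depend on how your template is written.

```python
def launch_cold_site(cfn, stack_name, template_body, parameters):
    """Create a stack from a template and return its outputs as a dict.

    cfn is a boto3 CloudFormation client for the region of the cold site.
    parameters is a plain dict of template parameter names to string values.
    """
    cfn.create_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        Parameters=[
            {"ParameterKey": key, "ParameterValue": value}
            for key, value in sorted(parameters.items())
        ],
    )
    # Block until CloudFormation reports that the stack has been created.
    cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)

    stack = cfn.describe_stacks(StackName=stack_name)["Stacks"][0]
    return {o["OutputKey"]: o["OutputValue"] for o in stack.get("Outputs", [])}
```

Usage could look like `launch_cold_site(boto3.client("cloudformation", region_name="us-west-2"), "wordpress-cold-site", open("Cold_site_CF.template").read(), {"KeyPairName": "dr-key"})`, where the region, stack name, and parameter are assumptions. Large templates cannot be passed inline and have to be uploaded to S3 and referenced with `TemplateURL` instead of `TemplateBody`.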

Technical details

Before trying to roll out infrastructure from this template, open it in the CloudFormation Designer and get acquainted with it. Remove components you might not need, check the instance types for RDS and EC2, specify correct email addresses in the SNS topic “Support”, and also take care of the following:

  1. You need a pre-configured AMI with WordPress installed and configured for hosting your website. Change the launch configuration in the CF template to refer to your AMI;
  2. Manually create a key pair in the AWS console and specify its name in the CF template;
  3. Change the database password in the CF template.
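One way to avoid editing the template by hand for these three values is to expose them as template parameters. The sketch below builds such a fragment as a Python dict and prints it as JSON. It is an assumption about how a template could be organized, not an excerpt from Cold_site_CF.template: the parameter names, the logical resource name, and the instance type are all hypothetical.

```python
import json


def template_inputs_sketch():
    """Return a template fragment that takes the AMI, key pair, and DB password as inputs."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            # CloudFormation checks that the AMI and key pair exist in the target region.
            "WordPressAmiId": {"Type": "AWS::EC2::Image::Id"},
            "KeyPairName": {"Type": "AWS::EC2::KeyPair::KeyName"},
            # NoEcho masks the value in the console and in API responses.
            "DbPassword": {"Type": "String", "NoEcho": True},
        },
        "Resources": {
            "LaunchConfig": {
                "Type": "AWS::AutoScaling::LaunchConfiguration",
                "Properties": {
                    "ImageId": {"Ref": "WordPressAmiId"},
                    "KeyName": {"Ref": "KeyPairName"},
                    "InstanceType": "t2.micro",  # placeholder instance type
                },
            },
            # The RDS resource (not shown) would use {"Ref": "DbPassword"}
            # for its MasterUserPassword property.
        },
    }


if __name__ == "__main__":
    print(json.dumps(template_inputs_sketch(), indent=2))
```

With this layout the AMI ID must be the one valid in the region of the cold site, which is the ID of the copied image rather than the original one.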

Components that are created from this CF template:

  • VPC with subnets, internet gateway, routing table, custom routes;
  • Security groups;
  • Elastic Load Balancer;
  • Auto Scaling group with launch configuration and scaling policies;
  • SNS topics and CloudWatch alarms;
  • RDS;
  • S3 bucket;
  • EFS;
  • ElastiCache;

Some resources are created but require additional configuration or data load:

  1. An empty RDS instance is created by this template, and we need to populate it with our data (restore from an RDS daily backup, point-in-time restore, or restore from a dump);
  2. An empty S3 bucket and an empty EFS file system are created, and we have to upload data to them;
  3. Import an existing certificate or request a new one, and configure an HTTPS listener on the Elastic Load Balancer;
  4. Modify the DB details in the WordPress config to point it to the new DB address;
  5. Configure Route 53.
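Steps 4 and 5 above lend themselves to scripting. The following Python sketch rewrites the `DB_HOST` constant in the text of wp-config.php and builds a change batch for Route 53’s `change_resource_record_sets` call. The function names and all host names are hypothetical, and the sketch assumes that `DB_HOST` is defined exactly once with a quoted string literal.

```python
import re

# Matches define('DB_HOST', '...') with either quote style and optional spaces.
DB_HOST_PATTERN = re.compile(
    r"""(define\(\s*['"]DB_HOST['"]\s*,\s*)(['"]).*?\2(\s*\))"""
)


def point_wordpress_at(config_text, new_db_host):
    """Return the wp-config.php text with DB_HOST replaced by new_db_host."""
    new_text, count = DB_HOST_PATTERN.subn(
        lambda m: f"{m.group(1)}'{new_db_host}'{m.group(3)}", config_text
    )
    if count != 1:
        raise ValueError(f"expected exactly one DB_HOST definition, found {count}")
    return new_text


def dns_cutover_batch(record_name, load_balancer_dns, ttl=60):
    """Build a Route 53 change batch that points record_name at the new load balancer."""
    return {
        "Changes": [{
            "Action": "UPSERT",  # create the record or overwrite the existing one
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "CNAME",
                "TTL": ttl,
                "ResourceRecords": [{"Value": load_balancer_dns}],
            },
        }]
    }
```

The batch would be submitted with `route53.change_resource_record_sets(HostedZoneId=..., ChangeBatch=dns_cutover_batch(...))`. A CNAME cannot be used at the zone apex, so a site served from the bare domain needs an alias record instead. Since the instances come from an AMI, the rewritten config has to end up either in a fresh AMI or be applied at instance boot.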

Overall visualization of our template:

Cold_site_CF.template

In this particular example we used a combination of best practices and different tools. Since our ultimate goal was to reduce the time required to restore the environment on a cold site, we can say that everything is now tuned for that.

Accept yourself, your strengths, your weaknesses, your truths, and know what tools you have to fulfill your purpose.
-Steve Maraboli, Life, the Truth, and Being Free