Performing an Aurora Restore-To-Point-In-Time with Terraform

The backup and recovery tools available in Amazon Aurora allow teams to meet tighter RPO (Recovery Point Objective) requirements. By default, Aurora backs up its cluster volume automatically and retains the backup data for a user-defined backup retention period. Because Aurora backups are continuous and incremental, data can be restored to any point in time within that retention period. You can even restore your data to what it was 5 minutes ago.

In the lead-up to our MVP launch on the CorenetX team, I took on the task of identifying the data recovery activities for our database. Thanks to “Restore to point in time”, we were able to meet the RPO set by our stakeholders.

[Figure: Restore-To-Point-In-Time via the AWS Admin Console]

Restoring via AWS Admin Console

This creates a new Aurora cluster populated with the data as it was at the restore point. The same cannot be said for the new cluster’s configuration, however. In the “Restore to point in time” UI, some inputs are not auto-populated even though they were set on the original cluster (e.g. Initial database name, Copy tags to snapshots, Log exports).

[Figure: Inputs to check against your original cluster’s configuration]

Restoring via AWS CLI

When creating a new recovery cluster from the restore point, I needed its configuration to be as close to the original’s as possible. One option was to create the recovery cluster via the AWS CLI instead. I could write a shell script and hardcode the configuration I needed replicated onto the recovery cluster. The script would need both the restore-db-cluster-to-point-in-time and create-db-instance commands, because the first command only creates a cluster without any instances. However, we provision all of our infrastructure via Terraform, so running a separate shell script would cause drift between what Terraform manages and what actually exists in AWS.
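A minimal sketch of what such a script might look like is shown below. The two aws rds subcommands are the ones named above. Every identifier, the engine, the instance class, the subnet group and the timestamp are hypothetical placeholders rather than values from our setup, and the set of flags a real script needs depends on what the original cluster has configured. The run helper and the DRY_RUN switch are my own additions so that the script only prints the commands by default.

```shell
#!/bin/sh
# Sketch only: every value below is a hypothetical placeholder.
set -eu

SOURCE_CLUSTER="${SOURCE_CLUSTER:-my-aurora-cluster}"
RECOVERY_CLUSTER="${RECOVERY_CLUSTER:-my-aurora-cluster-recovery}"
RECOVERY_INSTANCE="${RECOVERY_INSTANCE:-my-aurora-cluster-recovery-1}"
RESTORE_TO_TIME="${RESTORE_TO_TIME:-2022-07-19T08:00:00Z}"
DB_ENGINE="${DB_ENGINE:-aurora-postgresql}"
DB_INSTANCE_CLASS="${DB_INSTANCE_CLASS:-db.r5.large}"
DB_SUBNET_GROUP="${DB_SUBNET_GROUP:-my-db-subnet-group}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode (the default), otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Step 1: restore the data into a new cluster. This creates no instances.
restore_cluster() {
  run aws rds restore-db-cluster-to-point-in-time \
    --source-db-cluster-identifier "$SOURCE_CLUSTER" \
    --db-cluster-identifier "$RECOVERY_CLUSTER" \
    --restore-to-time "$RESTORE_TO_TIME" \
    --db-subnet-group-name "$DB_SUBNET_GROUP" \
    --copy-tags-to-snapshot
}

# Step 2: add an instance to the restored cluster and wait for it.
create_instance() {
  run aws rds create-db-instance \
    --db-instance-identifier "$RECOVERY_INSTANCE" \
    --db-cluster-identifier "$RECOVERY_CLUSTER" \
    --engine "$DB_ENGINE" \
    --db-instance-class "$DB_INSTANCE_CLASS"
  run aws rds wait db-instance-available \
    --db-instance-identifier "$RECOVERY_INSTANCE"
}

restore_cluster
create_instance
```

Running it with DRY_RUN=0 would execute the commands for real. Even in this small form, each setting has to be kept in sync with the original cluster by hand, which is the maintenance burden that pushed me towards Terraform.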

Restoring via Terraform

Thankfully, Terraform provides options to perform “Restore to point in time” recovery. It is also a simple task to copy the configurations of the original cluster into the recovery cluster. Yay for Infra-as-code! ^_^

The team also uses Terragrunt to keep our Terraform DRY. If you are using plain Terraform, every terragrunt command in this article can be replaced with terraform, as I will not be using any Terragrunt-specific commands.

Our existing Aurora cluster was provisioned using the Terraform module “terraform-aws-modules/rds-aurora/aws”. To create a new cluster populated with the restore point’s data, I added another module block identical to the original and included two additions.

module "aurora-recovery" {
  source  = "terraform-aws-modules/rds-aurora/aws"
  version = "6.2.0"

  create_cluster = var.enable_restore

  restore_to_point_in_time = {
    source_cluster_identifier = local.db_name
    restore_to_time           = var.restore_to_time
  }

  # The rest of the configuration is copied from the original cluster...
  ...
  ...
}

var.enable_restore is a boolean variable that I use as a trigger switch. Its default value is false, which prevents this module from accidentally provisioning resources.

local.db_name is the identifier of the cluster to copy data from.

restore_to_time is the point in time to restore to, in UTC format (e.g. 2022-07-19T08:00:00+00:00).
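As a small convenience, a timestamp in that format can be generated from the shell rather than typed by hand. The sketch below is not part of our setup: the helper name minutes_ago_utc is hypothetical, and it assumes either GNU date (Linux) or BSD date (macOS) is available. It prints the apply command instead of running it, passing the two variables with Terraform's -var flag.

```shell
#!/bin/sh
# Sketch only: prints a UTC timestamp N minutes in the past.
set -eu

minutes_ago_utc() {
  minutes="$1"
  fmt='+%Y-%m-%dT%H:%M:%S+00:00'
  # GNU date (Linux) understands -d; BSD/macOS date uses -v instead.
  date -u -d "$minutes minutes ago" "$fmt" 2>/dev/null ||
    date -u -v "-${minutes}M" "$fmt"
}

RESTORE_TO_TIME="$(minutes_ago_utc 10)"

# Printed rather than executed, so nothing is provisioned by accident.
echo "terragrunt apply -var enable_restore=true -var restore_to_time=$RESTORE_TO_TIME"
```

The chosen time still has to fall within the cluster's backup retention period, otherwise the restore will be rejected.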

If you did not provision by using the module “terraform-aws-modules/rds-aurora/aws”, you can provision using the resource block aws_rds_cluster instead. It also accepts the restore_to_point_in_time block. Don’t forget to also provision an instance with aws_rds_cluster_instance if you go down this route.

With all of that done, we can run terragrunt apply to provision the new cluster. Once the new cluster and its writer instance are up and running, you can update your application to point to the new database endpoint. Ideally, though, you would reuse the old endpoint and avoid changing your application code…

Renaming the database cluster and instance identifiers

Unfortunately, as of the time of writing, changing the database cluster and instance identifiers in Terraform causes it to destroy and re-create the resources, and the data is lost.

But! Not all is lost. It is possible to rename the identifiers, and keep them maintained in your Terraform, without destroying and re-creating the cluster and the instance.

Enter: modifying the Terraform state. Assuming you have either created a backup snapshot of the original cluster and then deleted the cluster, or renamed the original cluster (sorry… a bit of a chicken-and-egg situation, but you can use the steps below to rename your original too), the gist of it is:

  • Remove the states for your cluster and instance
  • Head back to the Admin Console and rename the identifiers
  • Perform a terraform import

By doing the above, you are removing the state that contains the identifiers used when the recovery cluster was created, and then replacing them with a new state that includes the updated identifiers.

The steps to achieve the above:

  • I highly recommend backing up your existing state first: terragrunt state pull > backup.tfstate
  • From the Admin Console, create a backup snapshot of the original cluster before deleting the cluster. I did the deletion by removing the module block for the original and then running terragrunt apply. Alternatively, you can rename the original cluster. Either way, the old identifiers have to be freed up first, because DB identifiers must be unique.
  • While still on the Admin Console, rename the identifiers of the recovery cluster and instance.
  • Get the addresses of your resources in your Terraform state. Because my resources were provisioned by the module block “aurora-recovery”, I am interested in addresses beginning with “module.aurora-recovery”. I run the command terragrunt state list | grep module.aurora-recovery to get them.
module.aurora-recovery.aws_appautoscaling_policy.this[0]
module.aurora-recovery.aws_appautoscaling_target.this[0]
module.aurora-recovery.aws_db_subnet_group.this[0]
module.aurora-recovery.aws_iam_role.rds_enhanced_monitoring[0]
module.aurora-recovery.aws_iam_role_policy_attachment.rds_enhanced_monitoring[0]
module.aurora-recovery.aws_rds_cluster.this[0]
module.aurora-recovery.aws_rds_cluster_instance.this["1"]
  • In my case, I am interested in the resources aws_rds_cluster and aws_rds_cluster_instance. I remove them from the state using terragrunt state rm 'module.aurora-recovery.aws_rds_cluster.this[0]' and terragrunt state rm 'module.aurora-recovery.aws_rds_cluster_instance.this["1"]'
  • Next, I import the database resources from AWS into the state with terragrunt import 'module.aurora-recovery.aws_rds_cluster.this[0]' <the updated cluster identifier> and terragrunt import 'module.aurora-recovery.aws_rds_cluster_instance.this["1"]' <the updated cluster instance identifier>. Note that import is a top-level command, not a subcommand of state.
  • Finally, I remove the restore_to_point_in_time block from module “aurora-recovery” and run terragrunt plan to do a check. There should be no attempt by Terraform to destroy the Aurora cluster or instance, an indication that the new identifiers are reflected in the state.
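The command-line portion of the steps above can be sketched as a single script. This is an illustration rather than the script I actually ran: the new identifiers are hypothetical placeholders, the TF and DRY_RUN variables and the run helper are my own additions, and it assumes the renaming in the Admin Console has already been done. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch only: the new identifiers are hypothetical placeholders.
# Assumes the cluster and instance were already renamed in the console.
set -eu

TF="${TF:-terragrunt}"   # set TF=terraform if you do not use Terragrunt
CLUSTER_ADDR='module.aurora-recovery.aws_rds_cluster.this[0]'
INSTANCE_ADDR='module.aurora-recovery.aws_rds_cluster_instance.this["1"]'
NEW_CLUSTER_ID="${NEW_CLUSTER_ID:-my-aurora-cluster}"
NEW_INSTANCE_ID="${NEW_INSTANCE_ID:-my-aurora-cluster-1}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode (the default), otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Keep a copy of the current state before touching anything.
backup_state() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $TF state pull > backup.tfstate"
  else
    "$TF" state pull > backup.tfstate
  fi
}

# Drop the old state entries, import under the new identifiers, then check.
swap_identifiers() {
  run "$TF" state rm "$CLUSTER_ADDR"
  run "$TF" state rm "$INSTANCE_ADDR"
  run "$TF" import "$CLUSTER_ADDR" "$NEW_CLUSTER_ID"
  run "$TF" import "$INSTANCE_ADDR" "$NEW_INSTANCE_ID"
  run "$TF" plan
}

backup_state
swap_identifiers
```

The resource addresses are single-quoted so that the shell does not interpret the square brackets or the double quotes around the instance key. If the final plan proposes destroying either resource, stop and restore from backup.tfstate before applying anything.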

Summary

With this, you can use Terraform to perform a point-in-time recovery for your Aurora clusters, and with a bit of state manipulation you can reuse your old endpoints to connect to your new recovery cluster. This should minimise application code changes in the event of a data recovery exercise.

Edmund Loh

Software engineer, casual gamer, cycling enthusiast