Thirikandanathan Sivaraj
Ankercloud Engineering
7 min read · Mar 14, 2024


Asynchronous Replication using Persistent Disk

In today’s business landscape, data availability and integrity are paramount. Disasters, whether natural or man-made, can disrupt operations and pose a significant risk to critical information.

To mitigate this risk, GCP has introduced Persistent Disk Asynchronous Replication, which enables disaster recovery for Compute Engine workloads by replicating data between Google Cloud regions. It provides a Recovery Point Objective (RPO) of under one minute and a low Recovery Time Objective (RTO).

Setting up replication is simple. Replication is managed with a few API calls — no VM agents or dedicated replication VMs are required. PD Async Replication supports the full lifecycle of DR testing, failover, and failback.

Setting up replication-

PD Async Replication can be enabled on existing PD disks using the API, gcloud, or the Google Cloud console. First, create a new blank disk in the secondary region with a reference to the primary disk you want to protect. Then, start replication from the primary disk with a reference to the secondary disk. From that point on, data is automatically replicated between the disks, typically with an RPO of less than a minute, depending on the change rate of the disk. This workflow ensures that an explicit action is taken in both regions before any data is transferred. You don’t need to reconfigure your network to use PD Async Replication.
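The two setup steps above can be sketched with gcloud. All disk names, zones, and sizes below are illustrative placeholders; confirm the chosen regions form a supported Asynchronous Replication pair and check the flag names against your gcloud version.

```shell
# Step 1: create a blank secondary disk in the DR region, referencing the
# primary disk it will protect (names and zones are placeholders).
gcloud compute disks create my-secondary-disk \
    --size=100GB \
    --zone=us-west1-a \
    --primary-disk=my-primary-disk \
    --primary-disk-zone=us-central1-a

# Step 2: start replication from the primary disk to the secondary disk.
gcloud compute disks start-async-replication my-primary-disk \
    --zone=us-central1-a \
    --secondary-disk=my-secondary-disk \
    --secondary-disk-zone=us-west1-a
```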

After the setup is done and replication has started, you can monitor the time since the last replication and the network bytes sent in Cloud Monitoring.

Asynchronous Replication setup-

  • Create a virtual machine in the primary region.
  • To replicate a persistent disk, you can create the disk while creating the VM, or create it separately and attach it to the VM.
  • Here, the primary disk is created along with the VM.
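Creating the VM together with its data disk can be done in one gcloud call. The VM name, zone, image, and disk settings below are placeholder assumptions:

```shell
# Create a VM in the primary region with an additional data disk in one call.
# Name, zone, image, and disk size are illustrative placeholders.
gcloud compute instances create my-primary-vm \
    --zone=us-central1-a \
    --image-family=debian-12 \
    --image-project=debian-cloud \
    --create-disk=name=my-primary-disk,size=100GB,auto-delete=no
```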

Create a secondary disk-

  • In the Google Cloud console, go to the Disks page.
  • Click the name of the primary disk, then create the secondary disk.

Give the disk a name and select a region that supports Asynchronous Replication.

Supported region pairs for Asynchronous Replication

Start replication-

  • In the Google Cloud console, go to the Asynchronous replication page.
  • Click the name of the secondary disk for which you want to start replication.
  • Click Start replication. The Start replication window opens.
  • Click Start replication.

Note: if you do not see the Start replication option, replication was most likely started by default when the secondary disk was created.

  • Replication status can be found in the details of the secondary (replicating) disk, as shown in the image.
  • The time since the last replication and the network bytes sent can be monitored in Cloud Monitoring.

Stop replication-

  • In the Google Cloud console, go to the Asynchronous replication page.
  • Click the disk for which you want to stop replication. The Manage disk page opens.
  • Click Terminate replication.
  • Replication for the disk is terminated.

Note: once replication is stopped, it cannot be resumed. To replicate again, you need to create a new secondary disk and start replication.

  • To confirm that replication has stopped, check the disk details under the replication status, as shown in the image.
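The console steps above correspond to a single gcloud call. The disk name and zone are placeholders; the command can also be run against the secondary disk instead:

```shell
# Permanently stop replication for the disk pair (placeholder names).
# A new secondary disk is required to replicate again afterwards.
gcloud compute disks stop-async-replication my-primary-disk \
    --zone=us-central1-a
```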

Failover to the secondary region-

Whenever a disaster occurs in the primary region, standby or backup resources from the secondary region come into action. Operations and site engineers are responsible for deciding when a disaster has occurred in the primary region and when to initiate a failover.

To begin the failover, stop replication between disks and attach the secondary disk to a VM in the secondary region. The secondary region will now act as the primary region.
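A failover can be sketched as two gcloud calls; the disk, VM, and zone names are placeholder assumptions:

```shell
# 1. Stop replication so the secondary disk becomes writable (placeholder names).
gcloud compute disks stop-async-replication my-secondary-disk \
    --zone=us-west1-a

# 2. Attach the secondary disk to a standby VM in the secondary region.
gcloud compute instances attach-disk my-dr-vm \
    --disk=my-secondary-disk \
    --zone=us-west1-a
```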


  • Instance created in the secondary region

Disks after creating the VM in the secondary region.

Failback to the original primary region-

After a disaster is resolved, initiate a failback to the original primary region. This configures and starts replication from the acting primary disk to a new secondary disk in the acting secondary region.

  • Create a secondary disk in the acting secondary region. The acting secondary region is the original primary region.
  • Start replication from the acting primary disk to the new secondary disk.
  • Optional: Move the workload from the acting primary region to the original primary region by doing the following:
  • Wait for the initial replication to complete. The initial replication is complete when the disk/async_replication/time_since_last_replication metric is available in Cloud Monitoring. If you don’t see this RPO metric in Metrics Explorer, the initial replication isn’t complete yet.
  • Recommended: to avoid data loss, schedule downtime for the workload and bring the workload offline.
  • Stop replication.
  • If you don’t already have a VM in the same region as the secondary disk, create one.
  • When creating the new VM, create a boot disk and attach the replicated secondary disk as an additional disk. The secondary disk is now the workload’s primary disk in the original primary region.
  • Reconfigure replication in the original primary region by doing the following-
  • Create a new secondary disk in the original secondary region.
  • Start replication from the primary disk to the new secondary disk.
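The failback steps mirror the original setup, with the replication direction reversed. A sketch, again with placeholder names and zones (us-west1 acting as primary, us-central1 as the original primary region):

```shell
# Create a new secondary disk back in the original primary region,
# referencing the acting primary disk (all names are placeholders).
gcloud compute disks create my-failback-disk \
    --size=100GB \
    --zone=us-central1-a \
    --primary-disk=my-secondary-disk \
    --primary-disk-zone=us-west1-a

# Start replication from the acting primary disk to the new secondary disk.
gcloud compute disks start-async-replication my-secondary-disk \
    --zone=us-west1-a \
    --secondary-disk=my-failback-disk \
    --secondary-disk-zone=us-central1-a
```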

Steps in CLI-

  • Once the VM is created in the primary region, SSH into the VM.
  • Run `lsblk`; the additional disk appears at the bottom of the output, unmounted.
  • If the disk is new and has no filesystem yet, format it first, e.g. `sudo mkfs.ext4 /dev/sdb` (this erases any existing data on the disk).
  • Create a mount point using `sudo mkdir /mnt/mydisk`.
  • Mount the disk with `sudo mount /dev/sdb /mnt/mydisk`.
  • Once the disk is mounted, run `lsblk` again to confirm the corresponding mount point.
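Put together, the steps above look like this (the device path `/dev/sdb` and mount point are the examples used in this walkthrough; verify the device name from `lsblk` on your VM before formatting):

```shell
# Format the disk only if it is new and has no filesystem yet
# (this destroys any existing data on the device).
sudo mkfs.ext4 /dev/sdb

# Create a mount point and mount the disk.
sudo mkdir -p /mnt/mydisk
sudo mount /dev/sdb /mnt/mydisk

# Verify: /mnt/mydisk should now appear against sdb in the output.
lsblk
```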

Automount on Boot (Optional)-

  • When the virtual machine is stopped and started again, the additional disk will not be remounted automatically.
  • If you want the disk to be mounted automatically whenever the VM boots, add an entry to the /etc/fstab file.
  • Open the /etc/fstab file with `sudo nano /etc/fstab`.
  • Add an entry for the disk using its UUID or device path, along with the mount point and filesystem type:

UUID=<your disk’s UUID> /mnt/mydisk ext4 defaults 0 0

  • Use the command `blkid` to find the disk’s UUID.
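A small sketch of composing the fstab entry. The UUID below is a made-up placeholder; on a real VM you would read it from `blkid` as shown in the comment:

```shell
# On a real VM, read the UUID from the device:
#   UUID=$(sudo blkid -s UUID -o value /dev/sdb)
UUID="d8f6f3a2-0000-0000-0000-000000000000"   # placeholder for this sketch

# Compose the fstab entry: device, mount point, filesystem, options,
# dump flag, and fsck pass order.
ENTRY="UUID=${UUID} /mnt/mydisk ext4 defaults 0 0"
echo "${ENTRY}"

# Append it to /etc/fstab (requires root):
#   echo "${ENTRY}" | sudo tee -a /etc/fstab
```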


  • Now you can go to the mount point and add the data you want to replicate.
  • The data starts replicating; you can monitor it in Cloud Monitoring.
  • After failover, once a VM is created in the secondary region and the secondary disk is attached to it, SSH into the VM and repeat the same mounting process.
  • Once the disk is mounted, you can see the data and files replicated from the primary region, as shown in the image.
