Kubernetes volume backup for disaster recovery

Amitabh Prasad
7 min read · Dec 15, 2021


Authors: Ranjeet Singh, Amitabh Prasad

Containers have always been considered a great fit for stateless applications, but most real-world use cases require some sort of data persistence. To support such use cases, Kubernetes introduced the concept of stateful workloads.

Some of the cloud-native applications we recently developed use in-cluster databases (such as MongoDB deployed using its operator) to take advantage of the resilience, auto-scaling, and other lifecycle features that have made Kubernetes the go-to platform for cloud deployments.

When running an in-cluster database, it is equally important to take regular backups. For native Kubernetes resources, an etcd backup is sufficient, and tools such as Velero do an excellent job of backing up and restoring Kubernetes resources. The challenge comes with volume backup. Velero does support volume backup using restic, but that is a fairly rudimentary way of taking backups; moreover, restic always performs full backups, which can result in extended RTOs, among other complexities, as the volume of data grows.

In this blog post, we will show one more way of taking volume backups: using volume snapshots in the AWS cloud. Please note that, at the time of writing, volume backups rely heavily on cloud providers, and Kubernetes does not inherently support DR with volume snapshots, i.e. it does not support recovering data into a new cluster in a new region.

Thankfully, most cloud providers do support taking volume snapshots, moving snapshots to a recovery region, and restoring from those snapshots.

Here is the list of cloud providers that support volume snapshots.

Some of the Kubernetes components used in this blog are:

Persistent Volume (PV)

The abstraction that represents physical storage.

StorageClass

The type of storage from which a PV is created. For example, a PV can be created from an AWS gp2 or io1 volume. It can also be backed by a local disk with specific RAID options or an NFS partition.

PersistentVolumeClaim (PVC)

A claim to a specific resource on a PV. The PVC consumes a specific size, access modes, and other characteristics of the PV. For a pod to mount a PV-backed storage type, it must first create a claim on that PV. When mounting storage, a pod operates as though the PVC were the volume, while in fact the PVC itself does not represent any physical storage but rather a right to use physical storage.

VolumeSnapshotContent

This is similar to the PV in that it represents an actual snapshot of data on some physical storage. This is typically created from a PVC but can also be pre-provisioned.

VolumeSnapshot

The VolumeSnapshot's relationship to the VolumeSnapshotContent is akin to the PVC's relationship to the PV. The VolumeSnapshotContent represents the physical snapshot, while the VolumeSnapshot is the interface to it. The VolumeSnapshot represents a request for a snapshot of a PVC, and it tracks the status of creating the snapshot from the data held by the PVC onto the VolumeSnapshotContent. It can also be used as a source for creating a PVC from the contents of its bound VolumeSnapshotContent.

VolumeSnapshotClass

This is similar to the StorageClass in that it describes attributes of the VolumeSnapshotContent. The most important attribute in this case is the driver, which for AWS is ebs.csi.aws.com.

The diagram below represents the use case we will walk through in this blog.

Let's walk through all the steps involved.

  1. Deploy a sample application that uses a PVC for persistence and add some sample data to its volume
  2. Take a backup of the volume using a volume snapshot
  3. Copy the snapshot to another region (eu-west-2) where the second cluster is deployed
  4. Use the copied AWS snapshot to create a VolumeSnapshotContent & VolumeSnapshot in the second OCP cluster
  5. Create the application PVC in the second cluster using the newly created VolumeSnapshot and verify the restore

Prerequisites

  1. We have two OpenShift clusters deployed in AWS, in two different regions (us-east-1 & eu-west-2).
  2. An application is installed and running on one of the clusters; we will back up its volume there and restore it on the other cluster.
  3. A storage class compatible with the CSI driver should be installed on both clusters.
  4. Ensure that the volume snapshot CRDs & volume snapshot controller are installed on both clusters; if not, install them first (they are provided by the kubernetes-csi external-snapshotter project).

  5. A VolumeSnapshotClass should also be created in both clusters.

  • Make sure to set `deletionPolicy` to Retain; otherwise the snapshot will be deleted from AWS as soon as the VolumeSnapshot is deleted, whether by deleting the application namespace or by deleting the Kubernetes cluster. A sample class is shown below.
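A minimal VolumeSnapshotClass for AWS could look like the following sketch; the class name csi-aws-vsc matches the one referenced later in this post, and the API version should match the snapshot CRDs installed on your clusters:

apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshotClass
metadata:
  name: csi-aws-vsc
driver: ebs.csi.aws.com
deletionPolicy: Retain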

The Solution:-

We now execute the following steps sequentially to demonstrate backup & restore of an application along with its volume (PVC) using an AWS snapshot.

Link to git repo that has code used in this blog

Step 1. Deploy a sample application that uses a PVC for persistence and add some sample data to its volume

We deploy a simple application with a pod (test-pod) and a PVC (pvc-002) attached to the pod for data persistence. This application is installed on the first cluster, running in the AWS us-east-1 region.

Pod definition:-

apiVersion: v1
kind: Pod
metadata:
  name: test-pod
  namespace: csi-app
  labels:
    app: test-pod
spec:
  containers:
  - image: quay.io/libpod/ubuntu
    name: test-pod
    command: ["sleep", "1000"]
    volumeMounts:
    - mountPath: /myvolumepath
      name: mydata
  volumes:
  - name: mydata
    persistentVolumeClaim:
      claimName: pvc-002

PVC definition:-

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  namespace: csi-app
  name: pvc-002
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: gp2-csi
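The two manifests above can be applied and verified roughly as follows (a sketch; the file names are placeholders):

oc new-project csi-app          # if the namespace does not already exist
oc apply -f pvc.yaml -f pod.yaml
oc get pvc pvc-002 -n csi-app   # should show STATUS Bound once the pod is scheduled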

Then we add some sample data to the application at its mount point (/myvolumepath) in the form of a sample file, sample.txt.
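For example, the file can be created by exec'ing into the pod (a minimal sketch; the file content is arbitrary):

oc exec -n csi-app test-pod -- sh -c 'echo "hello from us-east-1" > /myvolumepath/sample.txt'
oc exec -n csi-app test-pod -- cat /myvolumepath/sample.txt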

Step 2. Take a backup of the volume using a volume snapshot

Now we create a VolumeSnapshot that backs up the application PVC; this also creates a snapshot in AWS. We apply the following definition to snapshot the PVC.

VolumeSnapshot definition:-

apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: pvc002-snapshot
spec:
  volumeSnapshotClassName: csi-aws-vsc
  source:
    persistentVolumeClaimName: pvc-002
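Applying this definition and checking its status might look like the following (a sketch; the manifest file name is a placeholder, and the exact output columns depend on the snapshot CRD version installed):

oc apply -f volumesnapshot.yaml -n csi-app
oc get volumesnapshot pvc002-snapshot -n csi-app   # READYTOUSE should become true
oc get volumesnapshotcontent                       # shows the bound content and the AWS snapshot handle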

Once we create the VolumeSnapshot for the PVC, a snapshot is also created in the same region (us-east-1) in the AWS environment.

Step 3. Copy the snapshot to another AWS region (eu-west-2) where the second cluster is deployed

We can copy this snapshot using the AWS console, as shown in the following image. We need to specify the destination region and a description on the page, and a snapshot that is a copy of the first one will then be created in the second region.
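The same copy can also be initiated from the AWS CLI, run against the destination region (a sketch; the source snapshot ID is a placeholder for the one created in step 2):

aws ec2 copy-snapshot \
  --region eu-west-2 \
  --source-region us-east-1 \
  --source-snapshot-id snap-<source-snapshot-id> \
  --description "pvc-002 snapshot copied for DR"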

The copied snapshot is created in the eu-west-2 region in AWS.

Step 4. Use the copied AWS snapshot to create a VolumeSnapshotContent & VolumeSnapshot in the second OCP cluster, which is our recovery cluster.

Now we create a VolumeSnapshotContent in the second OCP cluster, deployed in the eu-west-2 region in AWS, using the newly copied snapshot.

We use the following definition to create it. Notice that we put the snapshot ID in the snapshotHandle field of the VolumeSnapshotContent definition. We can get this value (the snapshot ID starts with snap-) from the AWS console.
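If needed, the copied snapshot's ID can also be listed with the AWS CLI in the destination region (a sketch):

aws ec2 describe-snapshots \
  --region eu-west-2 \
  --owner-ids self \
  --query "Snapshots[*].[SnapshotId,Description,StartTime]" \
  --output table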

VolumeSnapshotContent definition:-

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotContent
metadata:
  name: test-snapshot-content
spec:
  deletionPolicy: Delete
  driver: ebs.csi.aws.com
  source:
    snapshotHandle: snap-026780e8ff3bd6e74
  volumeSnapshotClassName: csi-aws-vsc
  volumeSnapshotRef:
    name: restore-snapshot
    namespace: csi-app

Next, we create a VolumeSnapshot in the same (second) OCP cluster using the newly created VolumeSnapshotContent.

VolumeSnapshot definition:-

apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: restore-snapshot
spec:
  source:
    volumeSnapshotContentName: test-snapshot-content
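Before restoring, it is worth confirming that the VolumeSnapshot has bound to the VolumeSnapshotContent and reports itself as ready (a sketch; it assumes the VolumeSnapshot is created in the csi-app namespace referenced by the VolumeSnapshotContent above):

oc get volumesnapshot restore-snapshot -n csi-app
oc get volumesnapshotcontent test-snapshot-content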

Step 5. Create the application PVC in the second cluster using the newly created VolumeSnapshot and verify the restore

We need to specify the VolumeSnapshot name as the dataSource and the storage class name in the spec section.

PVC definition:-

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  namespace: csi-app
  name: pvc-002
spec:
  storageClassName: gp2-csi
  dataSource:
    name: restore-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

Finally, we create a pod in the second OCP cluster that uses the newly created PVC pointing to the restored data, using the same pod definition as in step 1.

Log into the pod and check whether the data has been correctly restored at the mount point from the volume snapshot.
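For example (a sketch, reusing the pod name and mount path from step 1):

oc exec -n csi-app test-pod -- cat /myvolumepath/sample.txt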

We can see that the sample.txt file is present at the mount path and contains the same data that we wrote in the application running on the first cluster.

Conclusion:

In this blog post, we demonstrated how to take a volume snapshot, move it to a different region, and recreate a volume from that snapshot. A snapshot can be taken with Velero as well, but moving it to a different region and restoring into a new cluster/region does not work out of the box with Velero. This was one way to meet our requirement of addressing DR for volumes.

For native Kubernetes resources, we continue to use Velero, but we purposefully left out those details to keep the focus on volume backup/restore.
