Using Velero and Restic to Backup Kubernetes Resources

Kubernetes Bare-metal with Rook-Ceph

Alex Punnen
Techlogs
Aug 6, 2020


Environment

We are doing this in a bare-metal Kubernetes cluster that has a Rook-Ceph based StorageClass for Persistent Volumes: Rook 1.2, which supports the CSI driver, with Ceph 14.2. The exact version is not that important here, as the CSI Snapshot feature is not yet available in Rook 1.2; we will use Restic instead.

Need for Velero and Restic

With Rook-Ceph 1.2 or higher and the CSI volume driver, you can take a VolumeSnapshot, but it is stored as a local persistent volume and cannot be moved out of the cluster. So you need Velero to get the backup outside of the cluster.

Update: The Ceph-CSI team has now released ceph-csi-v3.0.0, which supports the v1beta1 Kubernetes snapshot storage API — https://www.humblec.com/ceph-csi-v3-0-0-released-snapshot-clone-multi-arch-rox/ . This is in Rook now, and we could use that instead of Restic, though I hit bugs when I tried. See my other blog post related to this: https://medium.com/techlogs/velero-with-csi-a883e8a24710

Step 1: Install S3 Object Storage. The simplest option is MinIO
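A minimal sketch, assuming you use the example MinIO manifest that ships inside the Velero release tarball (downloaded in Step 2); it creates a velero namespace with a MinIO Deployment and Service:

```
# Apply the example MinIO deployment bundled with the Velero release
kubectl apply -f velero-v1.5.1-linux-amd64/examples/minio/00-minio-deployment.yaml

# Check that MinIO is running
kubectl get pods -n velero
```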

After this, port-forward so that you can access MinIO externally.
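For example, assuming the example MinIO Service in the velero namespace:

```
# Forward the MinIO service to localhost:9000
kubectl port-forward -n velero svc/minio 9000:9000
```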

Or you can create an Ingress and use it instead.
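A rough sketch of such an Ingress, assuming an NGINX ingress controller and a nip.io hostname (replace 10.x.y.z with your ingress IP):

```
cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: minio
  namespace: velero
spec:
  rules:
  - host: minio.10.x.y.z.nip.io
    http:
      paths:
      - path: /
        backend:
          serviceName: minio
          servicePort: 9000
EOF
```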

Step 2: Install Velero with Restic

Create an S3 credentials file credentials-velero with the following content.
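For example, with the default credentials used by the MinIO example deployment (minio / minio123); adjust to whatever your S3 store actually uses:

```
cat > credentials-velero <<EOF
[default]
aws_access_key_id = minio
aws_secret_access_key = minio123
EOF
```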

Next, download the Velero tarball from https://github.com/vmware-tanzu/velero/releases/download/v1.5.1/velero-v1.5.1-linux-amd64.tar.gz, extract it, and move the binary to a folder on your PATH.
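Roughly:

```
wget https://github.com/vmware-tanzu/velero/releases/download/v1.5.1/velero-v1.5.1-linux-amd64.tar.gz
tar -xvf velero-v1.5.1-linux-amd64.tar.gz
sudo mv velero-v1.5.1-linux-amd64/velero /usr/local/bin/

# Verify the client works
velero version --client-only
```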

Note that it expects a kubeconfig with cluster-admin privileges; with that, you can install it from your local machine into any cluster.
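A sketch of the install command, assuming a bucket named velero exists in MinIO and using the in-cluster MinIO Service URL (the plugin version is indicative):

```
velero install \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.1.0 \
    --bucket velero \
    --secret-file ./credentials-velero \
    --use-restic \
    --use-volume-snapshots=false \
    --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://minio.velero.svc:9000
```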

If you are using an Ingress, the s3Url needs to change, e.g. s3Url=http://minio.10.x.y.z.nip.io.

Without CSI, the Restic-based install above is what we will use in this post.

Without Restic, and using the CSI Snapshot class of your provider, is another option.

Let's tackle that in another blog — https://medium.com/techlogs/velero-with-csi-a883e8a24710

Using Restic

“We integrated restic with Velero so that users have an out-of-the-box solution for backing up and restoring almost any type of Kubernetes volume*. This is a new capability for Velero, not a replacement for existing functionality. If you’re running on AWS, and taking EBS snapshots as part of your regular Velero backups, there’s no need to switch to using restic. However, if you’ve been waiting for a snapshot plugin for your storage platform, or if you’re using EFS, AzureFile, NFS, emptyDir, local, or any other volume type that doesn’t have a native snapshot concept, restic might be for you.”

Step 3: Test it out — Create a Test pod and PV and add some data

Pre-requisite: if you are using Rook-Ceph or similar for storage, ensure that you have the same storage driver (CSI or Flex) in both source and target. This matters for the case where you back up in one cluster and restore in another (target) cluster. For that scenario to work, the same StorageClass and storage plugin must be available in the target.

We will be using the same StorageClass in both source and target, as in the sketch below. Note again that we are not backing up the StorageClass from source to target; that is usually not a good idea, as versions etc. could differ.
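For illustration, a hypothetical test namespace with a PVC and an nginx Pod; the StorageClass name rook-ceph-block, the namespace, and all object names are assumptions, so substitute whatever exists in both of your clusters:

```
kubectl create namespace test-nginx

cat <<EOF | kubectl apply -n test-nginx -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-nginx-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: rook-ceph-block   # assumed Rook-Ceph CSI StorageClass
  resources:
    requests:
      storage: 5Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: test-nginx
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    volumeMounts:
    - name: data
      mountPath: /usr/share/nginx/html
  volumes:
  - name: data                         # volume name used later for the backup annotation
    persistentVolumeClaim:
      claimName: test-nginx-pvc
EOF
```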

Let’s write some random data in the Pod. Restic deduplicates data, so if the data is not random you will not get a realistic picture of backup performance.
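For example, using the hypothetical pod from the sketch above:

```
# Write ~1 GiB of incompressible random data into the mounted volume
kubectl exec -n test-nginx test-nginx -- \
    dd if=/dev/urandom of=/usr/share/nginx/html/random.dat bs=1M count=1024
```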

The state of the Pods and PVCs in the namespace:
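Check with:

```
kubectl get pods,pvc -n test-nginx
```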

Now we need to annotate the Pod volumes that we want to back up. Note that in a future version* of Velero this will not be necessary, but for now it is needed; without it the PV and PVC are copied but the data is not.

*From v1.5 onward this is not needed — https://github.com/vmware-tanzu/velero/issues/1871
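The annotation key is backup.velero.io/backup-volumes and the value is the volume name from the Pod spec; the pod and volume names below come from the hypothetical manifest above:

```
kubectl -n test-nginx annotate pod test-nginx backup.velero.io/backup-volumes=data
```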

Now let’s use Velero to take the backup.
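For example, backing up just the test namespace:

```
velero backup create nginx-backup --include-namespaces test-nginx

# Watch the progress and logs
velero backup describe nginx-backup
velero backup logs nginx-backup
```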

That’s it. In your MinIO server, you can see the backup stored.

Now, let’s delete our `test-nginx` namespace, or restore this in a different cluster.
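For the in-place test:

```
kubectl delete namespace test-nginx
```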

And restore it back via Velero
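Using the backup name from the example above:

```
velero restore create --from-backup nginx-backup
```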

You can see that everything in the namespace is restored
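For example:

```
velero restore get
kubectl get pods,pvc -n test-nginx
```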

Here is the restore part

https://asciinema.org/a/358619

That’s it.

A few more details

Limitations of Restic?

https://velero.io/docs/v1.4/restic/

Restic scans each file in a single thread. This means that large files (such as ones storing a database) will take a long time to scan for data deduplication, even if the actual difference is small. If you plan to use the Velero restic integration to backup 100GB of data or more, you may need to customize the resource limits to make sure backups complete successfully.

Here is a test with 5G and 10G data

How to restore to another Cluster?

Install Velero in the second cluster, pointing to the same S3 bucket.
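Roughly, run the same install against the second cluster's kubeconfig; the s3Url must be reachable from that cluster (e.g. via the Ingress), and the backups from the first cluster then become visible for restore:

```
export KUBECONFIG=~/.kube/config-cluster2   # assumed path to the second cluster's kubeconfig

velero install \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.1.0 \
    --bucket velero \
    --secret-file ./credentials-velero \
    --use-restic \
    --use-volume-snapshots=false \
    --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://minio.10.x.y.z.nip.io

# Backups from the first cluster are visible here
velero backup get
velero restore create --from-backup nginx-backup
```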

How to Schedule backups?

https://velero.io/docs/v1.4/disaster-case/

Example: every five minutes.
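A sketch using cron syntax (the schedule name is arbitrary):

```
velero schedule create test-nginx-every-5m \
    --schedule="*/5 * * * *" \
    --include-namespaces test-nginx
```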

If S3 is down, that run will fail; the next one will succeed.

Can I back up only particular resources, e.g. Persistent Volumes or Secrets?

Yes, you can filter via --include-resources.
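A sketch, with an arbitrary backup name:

```
velero backup create pv-and-secrets-backup \
    --include-namespaces test-nginx \
    --include-resources persistentvolumeclaims,persistentvolumes,secrets
```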

Can I filter by a particular name?

Partially, yes, via --selector; you can give a label here.
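For example, using the app=nginx label from the hypothetical Pod above:

```
velero backup create nginx-label-backup \
    --include-namespaces test-nginx \
    --selector app=nginx
```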

Can I restore to another namespace in same cluster?

Yes, via namespace mapping (--namespace-mappings). Note that this does not work for PVCs in the same cluster; you can do it when restoring into another cluster.
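A sketch, mapping the backed-up namespace to a new one on restore:

```
velero restore create --from-backup nginx-backup \
    --namespace-mappings test-nginx:test-nginx-copy
```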

Are backups incremental?

Yes

Are restores incremental?

No

What if the connection to S3 breaks?

For scheduled backups, the next schedule will trigger

For an in-progress backup, it will stay InProgress forever. You need to delete the backup and delete the Velero operator pod to recover; this looks like a bug.
