K8s DR in GCP using Velero

Kirat pal Singh Lamba
Published in Quinbay
5 min read · Feb 8, 2022

(Velero v1.6)

Why is a backup of a Kubernetes cluster needed?

The main point that is often lost when running services in high availability (HA) mode is that HA (and thus replication) is not the same as having backups. HA protects against zonal failures, but it will not protect against data corruption or accidental removals. It is very easy to mix up the context or namespaces and accidentally delete or update the wrong Kubernetes resources. This may be a Custom Resource Definition (CRD), a secret, or a namespace. If you are running a StatefulSet in your cluster (e.g. ELK stack for logging), backups are needed to recover from persistent volume failures.

What is Velero?

Velero (formerly Heptio Ark) is an open-source tool for backing up and restoring Kubernetes cluster resources and persistent volumes. You can run Velero with a public cloud platform or on-premises. Velero lets you:

  • Take backups of your cluster and restore in case of loss.
  • Migrate cluster resources to other clusters.
  • Replicate your production cluster to development and testing clusters.

Installation and Configuration

Velero consists of:

  • A server that runs on your cluster
  • A command-line client that runs locally

The recommended way to install the server component is via the Helm chart, whereas the command-line client can be set up using the velero binary, which is publicly available in the official GitHub repo.

  1. Before installing the server component, you need to create a Velero service account and attach policies that give Velero the permissions it needs to function:

# Permissions Velero needs to create, read, and delete disk snapshots
$ ROLE_PERMISSIONS=(
    compute.disks.get
    compute.disks.create
    compute.disks.createSnapshot
    compute.snapshots.get
    compute.snapshots.create
    compute.snapshots.useReadOnly
    compute.snapshots.delete
    compute.zones.get
)

# Create a custom role that carries those permissions
$ gcloud iam roles create velero.server \
    --project "$PROJECT_ID" \
    --title "Velero Server" \
    --permissions "$(IFS=","; echo "${ROLE_PERMISSIONS[*]}")"

# Bind the role to the Velero service account
$ gcloud projects add-iam-policy-binding "$PROJECT_ID" \
    --member "serviceAccount:$SERVICE_ACCOUNT_EMAIL" \
    --role "projects/$PROJECT_ID/roles/velero.server"

# Let the service account read and write objects in the backup bucket
$ gsutil iam ch "serviceAccount:$SERVICE_ACCOUNT_EMAIL:objectAdmin" "gs://${BUCKET}"
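
The commands in this step assume that `$PROJECT_ID`, `$SERVICE_ACCOUNT_EMAIL`, and `$BUCKET` are already set and that the service account itself exists. A minimal sketch of that preparation follows. The service account name `velero`, the key file name `credentials-velero`, and the helper function names are illustrative assumptions, not values from this setup:

```shell
#!/usr/bin/env bash
# Sketch: create the Velero service account and a JSON key for it.
# SA_NAME and KEY_FILE are illustrative defaults (assumptions).
SA_NAME="${SA_NAME:-velero}"
KEY_FILE="${KEY_FILE:-credentials-velero}"

# Service account emails have the form NAME@PROJECT_ID.iam.gserviceaccount.com.
sa_email() {
  echo "${SA_NAME}@${1}.iam.gserviceaccount.com"
}

create_velero_sa() {
  local project_id="$1"
  # Create the service account in the given project.
  gcloud iam service-accounts create "$SA_NAME" \
    --project "$project_id" \
    --display-name "Velero service account" || return 1
  # Download a key file; this is the file later passed to the Helm chart
  # through credentials.secretContents.cloud.
  gcloud iam service-accounts keys create "$KEY_FILE" \
    --iam-account "$(sa_email "$project_id")"
}
```

You could then run `create_velero_sa "$PROJECT_ID"` once and set `SERVICE_ACCOUNT_EMAIL=$(sa_email "$PROJECT_ID")` before creating the role and bindings.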

2. Once the service account is created, you can use the publicly available Velero Helm chart to install the server component as follows:

$ helm install --wait --timeout 600s --namespace velero \
    --set configuration.provider=gcp \
    --set-file credentials.secretContents.cloud=<SA_key_file> \
    --set configuration.backupStorageLocation.name=<backup_location> \
    --set configuration.backupStorageLocation.bucket=<bucket_name> \
    --set configuration.backupStorageLocation.prefix=<bucket_prefix> \
    --set image.repository=velero/velero \
    --set image.pullPolicy=IfNotPresent \
    --set initContainers[0].name=velero-plugin-for-gcp \
    --set initContainers[0].image=velero/velero-plugin-for-gcp:v1.0.2 \
    --set initContainers[0].volumeMounts[0].mountPath=/target \
    --set initContainers[0].volumeMounts[0].name=plugins \
    --set deployRestic=false vmware-tanzu/velero --generate-name
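
The install command refers to the chart as `vmware-tanzu/velero` and targets the `velero` namespace, so both need to exist first. The sketch below shows one way to prepare for the install and to check it afterwards. It assumes Helm 3 is available as `helm` and that the Velero CLI is already on your path; the function names are illustrative:

```shell
#!/usr/bin/env bash
# Sketch: steps to run around the helm install.

# Before the install: register the chart repository and create the namespace.
prepare_velero_install() {
  helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts || return 1
  helm repo update || return 1
  kubectl create namespace velero
}

# After the install: check that the server pod is running and list the
# backup storage location that the chart configured.
verify_velero_install() {
  kubectl -n velero get pods || return 1
  velero backup-location get
}
```

Run `prepare_velero_install` before the `helm install` command and `verify_velero_install` once it returns.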

3. Once the server and CLI components are installed, Velero can be configured by modifying values.yaml. The key configuration steps are installing the plugins for the storage provider and defining the Storage Location as well as the Volume Snapshot Location. The keys under config are provider-specific; the ones shown here are those documented for the GCP plugin, and their availability depends on the plugin version:

configuration:
  provider: gcp
  backupStorageLocation:
    name: velero-cluster-backup
    bucket: <gcp-bucket-name>
    prefix: prod
    config:
      kmsKeyName: <my-kms-key>
  volumeSnapshotLocation:
    name: gcp
    config:
      snapshotLocation: <gcp-region>
  logLevel: debug

Backup Strategy

We recommend backing up the whole cluster rather than individual namespaces, so that after a disaster the entire cluster can be restored in one go from a single backup.

Creating a Backup

Velero lets you scope a backup at three different levels:
1. Namespace
2. Resource type
3. Selector

$ velero backup create <backup-name> --storage-location=velero-cluster-backup --include-namespaces <namespace> --selector <label>
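
To make the three levels concrete, here is a small sketch that maps each level to the corresponding `velero backup create` flag. The function name `scoped_backup` and the example backup names are illustrative; the storage location name is the one used in this article:

```shell
#!/usr/bin/env bash
# Sketch: create a backup scoped by namespace, resource type, or label selector.
scoped_backup() {
  local name="$1" level="$2" value="$3" flag
  case "$level" in
    namespace) flag="--include-namespaces" ;;  # e.g. logging
    resource)  flag="--include-resources" ;;   # e.g. secrets,configmaps
    selector)  flag="--selector" ;;            # e.g. app=elasticsearch
    *) echo "unknown level: $level" >&2; return 2 ;;
  esac
  velero backup create "$name" \
    --storage-location=velero-cluster-backup "$flag" "$value"
}
```

For example, `scoped_backup logging-backup namespace logging` backs up a single namespace, while `scoped_backup secrets-backup resource secrets` backs up one resource type across the cluster.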

When the backup command is issued, Velero runs through the following steps:

  1. The Velero client calls the Kubernetes API to create a Backup custom resource
  2. The Velero BackupController notices the new Backup object and validates the request
  3. Once the request is validated, it queries the Kubernetes API for the resources to back up, takes snapshots of the disks, and packs the resource data into a tarball
  4. Finally, it initiates the upload of the backup objects to the configured storage service
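
Because the backup is tracked as a custom resource, you can follow these steps by watching its `status.phase` field. The sketch below polls that field until the backup reaches a terminal phase. The function name, the retry count, and the delay are illustrative assumptions:

```shell
#!/usr/bin/env bash
# Sketch: wait until a backup finishes, printing its final phase.
# Arguments: backup name, max attempts (default 60), seconds between attempts (default 10).
wait_for_backup() {
  local name="$1" tries="${2:-60}" delay="${3:-10}" phase i=0
  while [ "$i" -lt "$tries" ]; do
    phase=$(kubectl -n velero get backups.velero.io "$name" \
      -o jsonpath='{.status.phase}')
    case "$phase" in
      Completed) echo "$phase"; return 0 ;;
      PartiallyFailed|Failed|FailedValidation) echo "$phase"; return 1 ;;
    esac
    # Any other phase (for example New or InProgress): keep waiting.
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timeout"
  return 1
}
```

A call such as `wait_for_backup <backup-name>` returns success only when the phase is `Completed`, which makes it usable as a gate in a script.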

Once the backup is created, you can describe the backup to see the details of resources that got backed up.

$ velero backup describe <backup-name> --details

You can also check all the backup files stored in the GCS bucket.

Scheduled Backups

Instead of only creating backups on-demand, you can also configure scheduled backups for critical components:

$ velero schedule create schedule-gke-prod --schedule="@every 24h"
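
The `--schedule` flag also accepts a standard cron expression, which is useful when the backup should run at a fixed time of day. A minimal sketch, with illustrative function and schedule names:

```shell
#!/usr/bin/env bash
# Sketch: create a schedule from a cron expression and trigger it on demand.

# Arguments: schedule name, cron expression (quote it so the shell
# does not expand the asterisks).
create_schedule() {
  velero schedule create "$1" --schedule="$2"
}

# Run a backup immediately using the settings of an existing schedule.
backup_from_schedule() {
  velero backup create --from-schedule "$1"
}
```

For example, `create_schedule nightly "0 1 * * *"` runs a backup every day at 01:00, and `backup_from_schedule nightly` takes one right away without waiting for the next run.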

Here the backup TTL is left at its default of 30 days. You can set a different TTL when creating a backup by using the --ttl flag.
For backups that already exist, you can change the TTL by editing the backup with this command:

$ kubectl -n velero edit backup <backupName>
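
The `--ttl` flag takes a duration such as `720h0m0s` (the 30-day default). The sketch below converts a number of days into that form, creates a backup with it, and prints the TTL and expiration recorded on the backup object. The function names are illustrative:

```shell
#!/usr/bin/env bash
# Sketch: work with backup TTLs expressed in days.

# Convert days to the hours-based duration format, e.g. 7 -> 168h0m0s.
days_to_ttl() {
  echo "$(( $1 * 24 ))h0m0s"
}

# Arguments: backup name, retention in days.
backup_with_ttl() {
  velero backup create "$1" --ttl "$(days_to_ttl "$2")"
}

# Print the TTL and the expiration timestamp stored on a backup.
show_backup_expiry() {
  kubectl -n velero get backups.velero.io "$1" \
    -o jsonpath='{.spec.ttl}{"\t"}{.status.expiration}{"\n"}'
}
```

After editing an existing backup, `show_backup_expiry <backupName>` is a quick way to confirm which values the object now carries.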

Restore Strategy

Since the backup covers the whole cluster, at restore time you can name the specific namespace or resource type that you want to restore.

For restoring a stateful application, it is recommended to restore the StatefulSet, PersistentVolume, and PersistentVolumeClaim together in one go.

Creating a Restore

$ velero restore create <restore-name> --from-backup <backup-name> \
    --include-resources statefulsets,persistentvolumes,persistentvolumeclaims

Once the restore is created, you can describe the restore to see the details of resources that got restored.

$ velero restore describe <restore-name>
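
A restore can also be limited to a single namespace and, if needed, redirected into a differently named namespace, which is handy for testing a backup without touching the original workload. A minimal sketch, with an illustrative function name:

```shell
#!/usr/bin/env bash
# Sketch: restore one namespace from a backup.
# Arguments: backup name, source namespace, optional target namespace.
restore_namespace() {
  local backup="$1" ns="$2" target="${3:-}"
  if [ -n "$target" ]; then
    # Restore the namespace's resources under a different namespace name.
    velero restore create --from-backup "$backup" \
      --include-namespaces "$ns" --namespace-mappings "$ns:$target"
  else
    velero restore create --from-backup "$backup" --include-namespaces "$ns"
  fi
}
```

For example, `restore_namespace <backup-name> logging logging-test` restores the `logging` namespace into `logging-test`. If a restore does not behave as expected, `velero restore logs <restore-name>` shows what the server did.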

Advantages of using Velero:

  • Back up your cluster and restore it in case of loss.
  • Recover from disaster.
  • Copy cluster resources to other clusters.
  • Replicate your production environment to create development and testing environments.
  • Take a snapshot of your application’s state before upgrading a cluster.

Happy Learning… Thanks for taking time out to read it.
